JAVA stream 커스텀 컬렉터 구현하기

2021. 11. 9. 19:00

빠른 성능을 요하는 상황에서 기존 컬렉터의 한계를 보고,

이를 해결하기 위한 커스텀 컬렉터를 작성하여 이에 대한 필요성을 확인해보도록 하겠습니다.

먼저, 어떤 태스크가 주어졌을 때 시간 측정을 하는 간단한 클래스를 구현해보겠습니다.

package example;

import java.time.Duration;
import java.util.function.Consumer;

public class Timer {
    public <T> Duration measure(T t, Consumer<T> task) {
        long start = System.currentTimeMillis();
        task.accept(t);
        return Duration.ofMillis(System.currentTimeMillis() - start);
    }
}

단순히 어떤 태스크와 이를 수행하기 위한 객체를 받아서 시간을 측정해주는 메소드를 가지고 있습니다.

이를 이용하여, 성능 향상을 위한 커스텀 컬렉터를 구현했을 때와 기존 자바 스트림 API의 컬렉터를 사용했을 때의 수행 시간을 비교해보겠습니다.

수행할 기능은 간단합니다.

어떤 정수 범위가 주어졌을 때, 소수와 소수가 아닌 수로 구분해주는 작업을 수행합니다.

먼저 인터페이스를 정의해줍니다.

public interface PrimePartitioner {
    Map<Boolean, List<Integer>> partition(int max);
}

정수 max를 파라미터로 받았을 때,

2~max 사이에서 소수인 수와 소수가 아닌 수를 구분하여 순서대로 맵에 저장하고 반환해줍니다.

(true = 소수, false = 소수가 아닌 수)

이 기능은 두 가지 형태로 작성될 것입니다.

첫째로, 자바 스트림 API에서 기본적으로 제공해주는 partioningBy Collector를 이용하여 작성합니다.

정의된 인터페이스를 통해 기능을 구현해줍니다.

package example.partitioner.normal;

import example.partitioner.PrimePartitioner;

import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

import static java.util.stream.Collectors.partitioningBy;

public class NormalPrimePartitioner implements PrimePartitioner {

    @Override
    public Map<Boolean, List<Integer>> partition(int max) {
        return IntStream.rangeClosed(2, max).boxed()
                .collect(partitioningBy(i -> isPrime(i)));
    }

    public boolean isPrime(int num) {
        int root = (int)Math.sqrt(num);
        return IntStream.rangeClosed(2, root).noneMatch(i -> num % i == 0);
    }
}

기본적인 소수 판별 알고리즘입니다.

2부터 max까지 정수 스트림을 생성하고, partitioningBy 컬렉터를 사용해서 소수와 소수가 아닌 수를 파티션해줍니다.

이 때, partitioningBy의 인자로 들어가는 Predicate의 제너릭 타입에는 기본 타입이 들어갈 수 없으므로,

boxed로 박싱을 해주었습니다.

위 코드는 잘 동작하지만, 성능 상의 문제가 있습니다.

이미 소수로 판명난 수의 배수인지만 판별해도, 해당하는 수가 소수인지 확인할 수 있습니다.

하지만, 위 코드는 2부터 모든 정수에 대해서 다 검사하기때문에 불필요한 검사 과정이 계속 됩니다.

어떤 수가 소수인지 판별할 때, 이미 소수로 판별된 리스트를 이용할 수 있다면, 그 리스트만 이용해도 빠르게 소수인지 판별할 수 있을 것입니다.

하지만 기본적으로 제공해주는 partitioningBy 컬렉터를 사용해서는, 그 중간 과정에 접근할 수 없습니다.

이러한 문제를 해결하기 위해, 이번에는 커스텀 컬렉터를 구현해보겠습니다.

그 전에 컬렉터 인터페이스를 살펴보도록 하겠습니다.

public interface Collector<T, A, R> {
    Supplier<A> supplier();
    BiConsumer<A, T> accumulator();
    BinaryOperator<A> combiner();
    Function<A, R> finisher();
    Set<Characteristics> characteristics();

    enum Characteristics {
        CONCURRENT,
        UNORDERED,
        IDENTITY_FINISH
    }
}

Collector 인터페이스를 구현하기 위해서는 위와 같은 5가지 메소드를 구현해야합니다.

먼저 3개의 타입 파라미터를 살펴보겠습니다.

T : reduce 작업을 수행할 스트림 요소의 타입

A : reduce 작업을 수행하며 중간 결과가 누적될 타입

R : reduce 작업의 결과 타입

5가지의 메소드는 다음을 구현해야합니다.

Supplier<A> supplier() : 중간 결과가 누적될 컨테이너를 생성하여 반환하는 Supplier를 정의

BiConsumer<A, T> accumulator() : reduce 연산을 수행하는 작업을 정의(T 타입의 데이터를 이용하여 A 타입의 컨테이너에 결과 누적)

BinaryOperator<A> combiner() : 두 작업이 병렬로 처리되었을 때, 이 결과를 어떻게 처리할지 정의(예를 들어, 하나의 결과 리스트에다가 다른 결과 리스트를 addAll)

Function<A, R> finisher() : A 타입으로 정의되어 중간 결과가 누적된 컨테이너를, 최종 결과 R 타입으로 변환하는 Function 정의

Set<Characteristics> characteristics : 컬렉터의 연산을 정의하는 Characteristics 불변 집합을 반환

Characterisitics는 다음 세 항목이 포함된 Enum 입니다.

CONCURRENT : 컬렉터가 accumulator를 여러 스레드에서 동시에 호출할 수 있어서 병렬로 수행할 수 있음

UNORDER : 리듀싱 결과는 스트림 요소의 방문 순서나 누적 순서에 영향을 받지 않음

IDENTITY_FINISH : finisher가 identity 기능이며 생략할 수 있음

(더 구체적인 내용은 문서 확인)

이제 Collector 인터페이스를 구현하는 코드를 작성해보겠습니다.

package example.partitioner.custom;

import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class PrimeNumberCollector implements Collector<Integer, Map<Boolean, List<Integer>>, Map<Boolean, List<Integer>>> {

    @Override
    public Supplier<Map<Boolean, List<Integer>>> supplier() {
        return () -> {
            Map<Boolean, List<Integer>> map = new HashMap<>();
            map.put(true, new ArrayList<>());
            map.put(false, new ArrayList<>());
            return map;
        };
    }

    @Override
    public BiConsumer<Map<Boolean, List<Integer>>, Integer> accumulator() {
        return (acc, num) -> {
            List<Integer> primes = acc.get(true);
            acc.get(isPrime(primes, num)).add(num);
        };
    }

    @Override
    public BinaryOperator<Map<Boolean, List<Integer>>> combiner() {
        return (m1, m2) -> {
          m1.get(true).addAll(m2.get(true));
          m1.get(false).addAll(m2.get(false));
          return m1;
        };
    }

    @Override
    public Function<Map<Boolean, List<Integer>>, Map<Boolean, List<Integer>>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
    }

    private boolean isPrime(List<Integer> primes, int num) {
        int root = (int)Math.sqrt(num);
        return primes.stream()
                .takeWhile(i -> i <= root)
                .noneMatch(i -> num % i == 0);
    }
}

각 요소로는 정수가 들어오므로 T는 Integer,

중간 결과가 누적될 타입 A와 결과 타입 R은 Map<Boolean, List<Integer>>로 선언해줍니다.

각각을 자세히 살펴보겠습니다.

@Override
public Supplier<Map<Boolean, List<Integer>>> supplier() {
    return () -> {
        Map<Boolean, List<Integer>> map = new HashMap<>();
        map.put(true, new ArrayList<>());
        map.put(false, new ArrayList<>());
        return map;
    };
}

중간 결과를 누적할 컨테이너를 만드는 함수를 반환해줍니다.

소수라면 true, 소수가 아니라면 false로 매핑되는 리스트에 담기게 됩니다.

@Override
public BiConsumer<Map<Boolean, List<Integer>>, Integer> accumulator() {
    return (acc, num) -> {
        List<Integer> primes = acc.get(true);
        acc.get(isPrime(primes, num)).add(num);
    };
}

중간 결과가 누적되는 컨테이너 acc에, num이 소수인지 판별하여 true 또는 false로 매핑되는 리스트에 넣어줍니다.

이 때, 소수인지 빠르게 판별할 수 있기 위해, 이때까지 누적된 소수 리스트를 이용할 수 있습니다.

private boolean isPrime(List<Integer> primes, int num) {
    int root = (int)Math.sqrt(num);
    return primes.stream()
            .takeWhile(i -> i <= root)
            .noneMatch(i -> num % i == 0);
}

이번에는 중간 누적 결과를 이용해서 불필요한 검사를 줄일 수 있습니다.

takeWhile은, 주어진 Predicate이 거짓이 되면 스트림이 끝나게 됩니다.

@Override
public BinaryOperator<Map<Boolean, List<Integer>>> combiner() {
    return (m1, m2) -> {
      m1.get(true).addAll(m2.get(true));
      m1.get(false).addAll(m2.get(false));
      return m1;
    };
}

2부터 소수를 구해나가는 순차적인 알고리즘이기 때문에, 컬렉터는 병렬로 수행될 수 없습니다.

따라서, combiner는 구현할 필요가 없습니다.

그래서 UnsupportedOperationException와 같은 예외를 던져도 되지만,

혹시 구현하게 된다면 위와 같은 방식으로 작성할 수 있습니다.

m1에 m2을 addAll하여 m1을 반환하여 결합해줍니다.

@Override
public Function<Map<Boolean, List<Integer>>, Map<Boolean, List<Integer>>> finisher() {
    return Function.identity();
}

중간 결과 타입과 최종 결과 타입이 동일하기 때문에, 항등 함수를 반환해줍니다.

@Override
public Set<Characteristics> characteristics() {
    return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
}

순차적으로 발견된 소수가 필요하므로 IDENTITY_FINISH만 불변으로 반환해줍니다.

이제 작성된 커스텀 컬렉터를 이용하는 파티셔너를 구현해보겠습니다.

package example.partitioner.custom;

import example.partitioner.PrimePartitioner;

import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

public class CustomPrimePartitioner implements PrimePartitioner {
    @Override
    public Map<Boolean, List<Integer>> partition(int max) {
        return IntStream.rangeClosed(2, max)
                .boxed()
                .collect(new PrimeNumberCollector());
    }
}

단순히 컬렉터만 바뀌고 완전히 동일한 코드입니다.

(중복 코드지만, 타입 추론때문에 그대로 남겨두었습니다..)

이제 각각의 컬렉터로 테스트를 수행하기 위한 코드를 작성해보겠습니다.

package example;

import example.partitioner.PrimePartitioner;
import example.partitioner.custom.CustomPrimePartitioner;
import example.partitioner.normal.NormalPrimePartitioner;

import java.time.Duration;

public class PrimePartitionTimerTest {

    private static final int MAX = 10000000;

    public static void main(String[] args) {
        test(new NormalPrimePartitioner(), "기본");
//        test(new CustomPrimePartitioner(), "커스텀");
    }

    public static void test(PrimePartitioner partitioner, String title) {
        System.out.println("=== " + title + " 측정 시작 ===");

        Timer timer = new Timer();

        long fastestSeconds = Long.MAX_VALUE;
        for(int i=0; i<10; i++) {
            Duration measuredTime = timer.measure(partitioner, p -> p.partition(MAX));
            long measuredSeconds = measuredTime.getSeconds();
            fastestSeconds = Math.min(fastestSeconds, measuredSeconds);
            System.out.println(i + " : measuredSeconds = " + measuredSeconds);
        }
        System.out.println("fastestSeconds = " + fastestSeconds);

        System.out.println("=== " + title + " 측정 완료 ===");
    }
}

처음 작성했던 타이머를 이용해서 실행 시간을 측정해주는 코드입니다.

1000만까지의 정수에 대해서 소수를 분류하고, 각각의 케이스를 10번씩 수행해보겠습니다.

먼저, 기본 컬렉터를 이용한 방식의 실행 결과입니다.

=== 기본 측정 시작 ===
0 : measuredSeconds = 14
1 : measuredSeconds = 10
2 : measuredSeconds = 11
3 : measuredSeconds = 10
4 : measuredSeconds = 11
5 : measuredSeconds = 11
6 : measuredSeconds = 11
7 : measuredSeconds = 11
8 : measuredSeconds = 11
9 : measuredSeconds = 10
fastestSeconds = 10
=== 기본 측정 완료 ===

다음으로, 커스텀 컬렉터를 이용한 방식의 실행 결과입니다.

=== 커스텀 측정 시작 ===
0 : measuredSeconds = 7
1 : measuredSeconds = 5
2 : measuredSeconds = 4
3 : measuredSeconds = 4
4 : measuredSeconds = 4
5 : measuredSeconds = 4
6 : measuredSeconds = 4
7 : measuredSeconds = 4
8 : measuredSeconds = 4
9 : measuredSeconds = 4
fastestSeconds = 4
=== 커스텀 측정 완료 ===

확연하게 빨라진 것을 확인할 수 있습니다.

이렇게 해서 커스텀 컬렉터의 필요성과 작성 방법을 알아보았습니다.

출처 및 참고 자료 : 모던 자바 인 액션

'Java' 카테고리의 다른 글

Java 8 Default Method 정리 및 우선 순위 알아보기 (0)	2021.11.10
빌더 패턴(Builder pattern) 변형하기 - 3 (0)	2021.11.09
빌더 패턴(Builder pattern) 변형하기 - 2 (0)	2021.11.09
빌더 패턴(Builder pattern) 변형하기 - 1 (0)	2021.11.09
double brace initialization (0)	2021.11.09

쿠케캬캬 개발 기록

JAVA stream 커스텀 컬렉터 구현하기

'Java' 카테고리의 다른 글

+ Recent posts

티스토리툴바