When (Not) to Choose Sequences over Lists in Kotlin

In my previous post, I hinted that using sequences over lists can provide substantial performance benefits. Today, I’d like to delve deeper into this topic. We’ll explore how sequences operate, identify when they can offer faster processing times and significant memory savings, as well as discuss when sequences are best to be avoided.

Using sequences instead of collections can present a significant performance boost. Especially when performing multiple operations such as filter and map on large data sets. Sequences execute these operations lazily, meaning they only process elements as required.

Let’s create an example where we define a list and a sequence. First, we perform operations using the list and measure the time and memory consumed. Afterwards, we do the same operations with the sequence, and again measure the time and memory used.

Here is a helper method that prints out the computation results along with stats about time and memory consumption.

import kotlin.time.measureTimedValue

private fun<T> measureTimeAndMemory(block: () -> T) {
    val runtime = Runtime.getRuntime()
    val usedMemoryBefore = runtime.totalMemory() - runtime.freeMemory()
    val (result, duration) = measureTimedValue { block() }
    val usedMemoryAfter = runtime.totalMemory() - runtime.freeMemory()
    println("Result: $result")
    println("Time: ${duration.inWholeMilliseconds} ms")
    println("Memory: ${usedMemoryAfter - usedMemoryBefore} bytes")
}

Sifting Through a Large Data Set

Suppose we work with a large data set and want to find the first 3 even numbers that are divisible by 11. We’ll compare the performance of a List approach and a Sequence approach to see the difference.

Using a list, the implementation looks as follows.

measureTimeAndMemory {
  (1..1_000_000)
    .filter { it % 2 == 0 }
    .filter { it % 11 == 0 }
    .take(3)
}

The exact timing and memory consumption will vary of course, but it’s the relative comparison that matters. Here is the result I got on my machine. The operation took 20 milliseconds and consumed about 10MB of memory.

Result: [22, 44, 66]
Time: 20 ms
Memory: 10736640 bytes

It only takes a minor adjustment to switch to using a sequence instead of a list.

measureTimeAndMemory {
  (1..1_000_000).asSequence()
    .filter { it % 2 == 0 }
    .filter { it % 11 == 0 }
    .take(3)
    .toList()
}

The difference in outcome is rather substantial. The computation was almost 7 times faster and consumed ten times less memory!

Result: [22, 44, 66]
Time: 3 ms
Memory: 420056 bytes

In this example, the main performance advantage of the Sequence approach is that its execution stops as soon as it finds the first 3 even numbers that are divisible by 11. The List approach, however, first filters all even numbers and then filters the ones divisible by 11. It creates multiple intermediate lists in the process which explains increased memory usage and, since the full filtering is applied to the whole range, it also takes longer to process.

The Blessing of Lazy Computations

Sequences process elements lazily, only generating the output as it becomes necessary. In our example, the computation stops as soon as it finds the first 3 even numbers that are divisible by 11. This results in a significant performance advantage when it comes to both time and memory consumption, especially with large data sets.

Getting back to our example, let’s visualize the intermediate throw-away lists we produce when using the List approach.

measureTimeAndMemory {
  // Creates a list
  val divisibleByTwo = (1..1_000_000).filter { it % 2 == 0 } 
  println("Divisible by 2: ${divisibleByTwo.size} elements")
  
  // Another list
  val divisibleByTwoAndEleven = divisibleByTwo.filter { it % 11 == 0 }
  println("Divisible by 2 and 11: ${divisibleByTwoAndEleven.size} elements")
  
  divisibleByTwoAndEleven.take(3)
}

Now we can clearly see that the entire data set was processed in order to arrive at the first 3 elements that satisfy the filter criteria. That’s wasteful.

Divisible by 2: 500000 elements
Divisible by 2 and 11: 45454 elements
Result: [22, 44, 66]
Time: 28 ms
Memory: 10788872 bytes

Laziness Prevents Waste

In our example, using a sequence nips the creation of wasteful side effects in the bud. Trying to visualise the footprint in the same way we did with a list won’t even work.

// Creates a sequence
val divisibleByTwo = (1..max).asSequence().filter { it % 2 == 0 }

// This won't compile. A sequence is not a collection!
println("Divisible by 2: ${divisibleByTwo.size} elements")

// Turning the sequence into a list ruins all performance gains!
println("Divisible by 2: ${divisibleByTwo.toList().size} elements")

The moment we turn a sequence into a list by using toList, we immediately lose all benefits. From this moment on, the computation examines every single element in the list just like with the sub-optimal solution presented earlier.

Use with Care

Sequences in Kotlin are beneficial when working with large collections, especially when chaining multiple operations that can benefit from lazy evaluation. However, in certain situations using a sequence is not the best choice.

Small collections where performance gain is negligible. The overhead of setting up the sequence may surpass the benefits of lazy evaluation.
When the whole collection is needed immediately: Sequences are beneficial for early termination in a chain. On the other hand, if you need the entire collection processed right away, using a sequence brings no value.
Operations that require random access or index-based lookups. Sequences don’t support efficient indexed access. They are iterated each time from the beginning!
When dealing with sequences of operations that have short-circuiting conditions, but every element is still processed. A typical scenario is a map operation followed by a firstOrNull or lastOrNull without any other short-circuit operations in between.

Here’s an example reflecting the last point. Let’s say we start with a collection of numbers and want to find the first negative number after squaring each number. This is intentionally a nonsensical operation to show inappropriate use of sequences.

fun main() {
    val numbers = listOf(1, 2, 3, 4, -5, 6, -7)

    // List approach:
    val firstNegativeSquaredList = numbers.map { it * it }.firstOrNull { it < 0 }
    println("List result: $firstNegativeSquaredList")

    // Sequence approach:
    val firstNegativeSquaredSequence = numbers.asSequence().map { it * it }.firstOrNull { it < 0 }
    println("Sequence result: $firstNegativeSquaredSequence")
}

In this example, using a sequence offers no advantage over a list. The map function generates a sequence of positive squares, and since there are no negative squares, the firstOrNull will always process the entire collection regardless of whether it’s a List or a Sequence. Moreover, there’s an overhead in creating the sequence.

The example demonstrates that without the possibility of short-circuiting operations, sequences do not offer a performance benefit and may introduce a slight overhead. It’s always important to use the right tool for the job, which can sometimes be a simple list or a direct iteration without sequences.

2 Comments


Petr Šabata March 7, 2024

Nicely explained with the short-circuiting. Let’s not forget that .filter { condition1 }.filter { condition2 } can be written down as .filter { condition1 && condition2 }.
- Reply
  
  Tomas Zezula March 8, 2024
  
  Good point, thank you for highlight that.