Async/Threads Benchmark

Real time and memory usage micro benchmarks

2023/07

Background

A couple of months ago, I saw this posted on Hacker News:

How much memory do you need to run 1M concurrent tasks?

pkolaczk wrote a series of synthetic benchmarks comparing the memory overhead of async tasks versus threads. These benchmarks mostly started a thread or task and then slept for 10 seconds. In the post, pkolaczk stated that three Rust programs were used (threads, async-std, and tokio), but no source code was provided. I had started a TFTP project, then went down a rabbit hole and was reading Hands-On Concurrency with Rust by Brian L. Troutwine, which I need to read again.

Here is my addition/attempt. I forked pkolaczk’s GitHub repo and saw there was no Rust code, so I wrote my own.

Benchmarks

I wrote two tests. The first was a running average adapted from Dr. John D. Cook’s post on Accurately computing running variance. My initial intent was to create a running temperature average, but I ended up with a law of large numbers test of the random number generator. The second test spawned a task to perform a pittance of work: the task generated a UUID and then SHA-512 hashed it. The exceptions were Java, where I had to settle for SHA-256, and C#, where I gave up.
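As a rough illustration of the first test, here is a minimal sketch of a running mean and variance using the recurrence described in Cook's post (Welford's method). The `RunningStat` name and its fields are my own placeholders, not the code from the repository.

```rust
/// Running mean and variance, updated one sample at a time.
/// Hypothetical type for illustration; not the benchmark's actual code.
#[derive(Debug, Default, Clone, Copy)]
pub struct RunningStat {
    count: u64,
    mean: f64,
    sum_sq: f64, // running sum of squared deviations from the current mean
}

impl RunningStat {
    /// Fold one sample into the running statistics.
    pub fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        // Uses the deviation from both the old and the new mean.
        self.sum_sq += delta * (x - self.mean);
    }

    /// Mean of the samples seen so far (0.0 when empty).
    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample variance (0.0 until at least two samples have been pushed).
    pub fn variance(&self) -> f64 {
        if self.count > 1 {
            self.sum_sq / (self.count - 1) as f64
        } else {
            0.0
        }
    }
}
```

Pushing uniformly distributed random numbers into a struct like this is what turns it into a law of large numbers check: the mean should settle near the middle of the range as the count grows.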

In the running average, access is coordinated through mutex or message passing.
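To make the mutex variant concrete, here is a small sketch using only the standard library: one native thread per sample, all updating a shared running mean behind a `Mutex`. The function name `mean_with_mutex` and the `(count, mean)` tuple are assumptions for illustration; the real benchmarks also cover async runtimes, which are not shown here.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns one thread per sample. Each thread locks the shared
/// (count, mean) pair and folds its value in.
/// Hypothetical helper; not the benchmark's actual code.
pub fn mean_with_mutex(samples: Vec<f64>) -> (u64, f64) {
    let shared = Arc::new(Mutex::new((0u64, 0.0f64)));

    let handles: Vec<_> = samples
        .into_iter()
        .map(|x| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let mut guard = shared.lock().unwrap();
                let state = &mut *guard;
                state.0 += 1;
                // Incremental mean update; only one thread runs this at a time.
                state.1 += (x - state.1) / state.0 as f64;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the final (count, mean) out of the mutex.
    let result = *shared.lock().unwrap();
    result
}
```

Because threads take the lock in an unpredictable order, the floating-point result can differ in the last few bits between runs, so compare it with a tolerance rather than for exact equality.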

In the hashing, each task returns the hash to a single resource and access is controlled through mutex or message passing. Memory usage here also accounts for storing hashes of n tasks.
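The message-passing variant of the hashing test can be sketched the same way. This version uses a standard library channel, and because UUID and SHA-512 need external crates, it substitutes `DefaultHasher` over the task id as a stand-in digest. Both `fake_digest` and `collect_digests` are made-up names, and the stand-in hash is not cryptographic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

/// Stand-in for "generate a UUID, then SHA hash it": hashes the task id
/// with the standard library's DefaultHasher. Illustration only.
fn fake_digest(task_id: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    task_id.hash(&mut hasher);
    hasher.finish()
}

/// Spawns one thread per task; each sends its digest over a channel and
/// a single receiver stores them all.
pub fn collect_digests(tasks: u64) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for id in 0..tasks {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            tx.send(fake_digest(id)).unwrap();
        }));
    }
    // Drop the original sender so the receiver's iterator ends once
    // every worker has finished and dropped its clone.
    drop(tx);

    let digests: Vec<u64> = rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    digests
}
```

Holding every digest in one `Vec` is why memory in this test grows with the task count even when the tasks themselves are cheap.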

I ran 10,000, 100,000, and 1,000,000 tasks. The Rust native-thread versions capped out at 10,000 tasks, so they only appear in the 10,000-task tables.

The source code is located here.

Languages

I wrote these tests in the following languages.

Running Average

| Language | Synchronization | How |
|---|---|---|
| Dotnet 6 and 7 C# | ??? | async task |
| Go | Mutex | goroutines |
| Java | Mutex | threads, virtual threads (JDK 20, experimental) |
| Rust | Mutex | async-std, rayon, threads, tokio |
| Rust | Messages | threads, tokio |

Hashing

| Language | Synchronization | How |
|---|---|---|
| Go | Mutex | goroutines |
| Java | Mutex | threads, virtual threads (JDK 20, experimental) |
| Rust | Mutex | async-std, rayon, threads, tokio |
| Rust | Messages | threads, tokio |

Dotnet or .Net?

Overall I like Dotnet and C#. It has all the batteries included. However, Microsoft’s documentation is difficult for me to read and understand. I had to search the internet too many times, and because I don’t get paid for this, I gave up. This implementation drops about 2-3% of tasks, is the worst of the bunch, and is likely not indicative of Dotnet. In college, I took a C# course and did well until the threading assignment. Ten years later and I’ve learned nothing.

Go, Java, and Rust

These took me the least amount of time to write. Surprisingly, I wrote all the Go, Java, and Rust benchmarks in about a day and a half. Rust and Go were the easiest for me to wrap my head around. I’ve been slowly working my way through Zero to Production in Rust by Luca Palmieri and Let’s Go! by Alex Edwards, which may have helped.

The Rust rayon tests are the most synthetic. Because all of the tasks are semi-independent, I used a broadcast method that wouldn’t be suitable for all workloads.

Setup

OS: Fedora Linux 38 (KDE Plasma) x86_64
Kernel: 6.4.4-200.fc38.x86_64
DE: Plasma 5.27.6
CPU: 12th Gen Intel i5-12600K (16)
Memory: 64070MiB

Dotnet 6.0.20
Dotnet 7.0.109
Go 1.26.0
Java openjdk 20.0.1
Rust 1.71.0

Results

Writing the benchmarks took a couple of days. Writing the scripts to process the data took about two weeks. I tried to generate fancy charts, but they were hard to understand, so I went with tables. I also need to adjust the table CSS.

Running Average

| 10,000 Tasks | Real Time (seconds) | Memory (MiB) |
|---|---|---|
| Go | 0.121 | 42.1 |
| Java Threads | 0.292 | 46.9 |
| Java VirtualThreads | 0.067 | 58.0 |
| Net6 | 0.802 | 141.7 |
| Net7 | 0.794 | 141.7 |
| Rust async_std mutex | 0.053 | 27.2 |
| Rust rayon mutex | 0.054 | 27.1 |
| Rust threads message | 0.221 | 81.9 |
| Rust threads mutex | 0.218 | 81.9 |
| Rust tokio message | 0.050 | 27.1 |
| Rust tokio mutex | 0.050 | 27.1 |

| 100,000 Tasks | Real Time (seconds) | Memory (MiB) |
|---|---|---|
| Go | 0.176 | 41.7 |
| Java Threads | 4.147 | 84.1 |
| Java VirtualThreads | 0.165 | 120.2 |
| Net6 | 0.886 | 142.0 |
| Net7 | 0.870 | 141.9 |
| Rust async_std mutex | 0.137 | 39.5 |
| Rust rayon mutex | 0.121 | 27.2 |
| Rust tokio message | 0.101 | 27.2 |
| Rust tokio mutex | 0.108 | 27.2 |

| 1,000,000 Tasks | Real Time (seconds) | Memory (MiB) |
|---|---|---|
| Go | 0.554 | 250.5 |
| Java Threads | 75.863 | 321.8 |
| Java VirtualThreads | 2.856 | 586.5 |
| Net6 | 1.781 | 142.8 |
| Net7 | 1.595 | 142.9 |
| Rust async_std mutex | 0.983 | 403.4 |
| Rust rayon mutex | 0.804 | 27.2 |
| Rust tokio message | 0.618 | 137.4 |
| Rust tokio mutex | 0.723 | 218.0 |

Hashing

| 10,000 Tasks | Real Time (seconds) | Memory (MiB) |
|---|---|---|
| Go | 0.159 | 59.6 |
| Java Threads | 0.382 | 183.1 |
| Java VirtualThreads | 0.385 | 132.9 |
| Rust async_std mutex | 0.060 | 27.5 |
| Rust rayon mutex | 0.060 | 27.4 |
| Rust threads message | 0.218 | 82.4 |
| Rust threads mutex | 0.228 | 83.3 |
| Rust tokio message | 0.044 | 26.1 |
| Rust tokio mutex | 0.053 | 27.4 |

| 100,000 Tasks | Real Time (seconds) | Memory (MiB) |
|---|---|---|
| Go | 0.395 | 212.9 |
| Java Threads | 3.593 | 505.6 |
| Java VirtualThreads | 2.635 | 688.0 |
| Rust async_std mutex | 0.183 | 107.1 |
| Rust rayon mutex | 0.170 | 27.3 |
| Rust tokio message | 0.102 | 30.2 |
| Rust tokio mutex | 0.113 | 66.1 |

| 1,000,000 Tasks | Real Time (seconds) | Memory (MiB) |
|---|---|---|
| Go | 2.560 | 2048.0 |
| Java Threads | 51.137 | 1740.8 |
| Java VirtualThreads | 18.461 | 4198.4 |
| Rust async_std mutex | 1.454 | 1024.0 |
| Rust rayon mutex | 1.284 | 158.3 |
| Rust tokio message | 0.739 | 286.3 |
| Rust tokio mutex | 0.754 | 642.6 |

Here are both one-million-task benchmarks side by side for an apples-to-oranges comparison.

| 1,000,000 Tasks | Running Average Real Time (seconds) | Running Average Memory (MiB) | Hashing Real Time (seconds) | Hashing Memory (MiB) |
|---|---|---|---|---|
| Go | 0.554 | 250.5 | 2.560 | 2048.0 |
| Java Threads | 75.863 | 321.8 | 51.137 | 1740.8 |
| Java VirtualThreads | 2.856 | 586.5 | 18.461 | 4198.4 |
| Rust async_std mutex | 0.983 | 403.4 | 1.454 | 1024.0 |
| Rust rayon mutex | 0.804 | 27.2 | 1.284 | 158.3 |
| Rust tokio message | 0.618 | 137.4 | 0.739 | 286.3 |
| Rust tokio mutex | 0.723 | 218.0 | 0.754 | 642.6 |