Async/Threads Benchmark
Real time and memory usage micro benchmarks
2023/07
Background
A couple of months ago, I saw this posted on Hacker News:
How much memory do you need to run 1M concurrent tasks?
pkolaczk wrote a series of synthetic benchmarks comparing the memory overhead of async tasks and threads. These benchmarks mostly started a thread or task and then slept for 10 seconds. In the post, pkolaczk stated that three Rust programs were used (threads, async-std, and tokio), but no source code was provided. I started a TFTP project, went down a rabbit hole, and ended up reading Hands-On Concurrency with Rust by Brian L. Troutwine, which I need to read again.
Here is my addition/attempt. I forked pkolaczk’s GitHub repo and saw there was no Rust code, so I wrote my own.
Benchmarks
I wrote two tests. The first is a running average adapted from Dr. John D. Cook’s post on accurately computing running variance. My initial intent was to create a running temperature average, but I ended up with a law-of-large-numbers test of the random number generator. The second test spawns a task to perform a pittance of work: the task generates a UUID and then hashes it with SHA-512. The exceptions are Java, where I had to settle for SHA-256, and C#, where I gave up.
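For readers who have not seen Cook’s post, here is a minimal sketch of the running mean and variance update it describes (often called Welford’s method). The `RunningStat` name and its methods are my own choices for illustration, not necessarily what the benchmark code uses.

```rust
/// Running mean and variance in O(1) memory, following the update rule
/// described in John D. Cook's post. `RunningStat` is a hypothetical name
/// chosen for this sketch.
#[derive(Debug, Default, Clone, Copy)]
pub struct RunningStat {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the current mean
}

impl RunningStat {
    pub fn new() -> Self {
        Self::default()
    }

    /// Fold one sample into the statistic without storing it.
    pub fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        // Uses the old delta and the updated mean, which keeps the
        // computation numerically stable.
        self.m2 += delta * (x - self.mean);
    }

    pub fn count(&self) -> u64 {
        self.count
    }

    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample variance (divides by n - 1); 0.0 with fewer than two samples.
    pub fn variance(&self) -> f64 {
        if self.count > 1 {
            self.m2 / (self.count - 1) as f64
        } else {
            0.0
        }
    }
}
```

Pushing uniformly distributed random samples into a structure like this should drive `mean()` toward the midpoint of the range, which is why the test ended up exercising the random number generator more than anything else.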
In the running average test, access to the shared statistic is coordinated through a mutex or message passing.
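As a rough illustration of the mutex variant, the sketch below spawns native threads that each push one sample into a shared average. It assumes a deterministic sample (`i % 100`) in place of the random number used in the benchmark, and the `Average` type and `average_with_mutex` function are names made up for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared running mean; only the fields needed for this sketch.
#[derive(Default)]
struct Average {
    count: u64,
    mean: f64,
}

impl Average {
    fn push(&mut self, x: f64) {
        self.count += 1;
        self.mean += (x - self.mean) / self.count as f64;
    }
}

/// Spawn `tasks` OS threads; each pushes one sample while holding the lock.
/// Returns (sample count, mean).
fn average_with_mutex(tasks: u32) -> (u64, f64) {
    let shared = Arc::new(Mutex::new(Average::default()));
    let handles: Vec<_> = (0..tasks)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Assumption: a deterministic stand-in for the random sample.
                let sample = f64::from(i % 100);
                shared.lock().unwrap().push(sample);
            })
        })
        .collect();
    // Wait for every task before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }
    let avg = shared.lock().unwrap();
    (avg.count, avg.mean)
}
```

The async runtimes follow the same shape, with the runtime's task spawn in place of `thread::spawn`. The work done under the lock is tiny, so the cost measured is mostly task creation and scheduling.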
In the hashing test, each task returns its hash to a single shared resource, and access is controlled through a mutex or message passing. Memory usage here also includes storing the hashes of all n tasks.
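The message-passing variant can be sketched with the standard library’s channel. To keep the example free of external crates, it swaps in `DefaultHasher` over the task id where the benchmark generates a UUID and computes a SHA-512 digest, so `tiny_work` and `collect_hashes` are illustrative names and the hash is not cryptographic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

/// Stand-in for "generate a UUID, then SHA-512 it": hashes the task id with
/// the standard library's DefaultHasher. This is an assumption made to keep
/// the sketch dependency-free; it is not a cryptographic hash.
fn tiny_work(task_id: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    task_id.hash(&mut hasher);
    hasher.finish()
}

/// Spawn `tasks` threads that each send one hash to a single collector.
fn collect_hashes(tasks: u64) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = (0..tasks)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                tx.send(tiny_work(id)).expect("collector hung up");
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends once every task
    // has sent its hash and dropped its clone.
    drop(tx);
    // The collector owns the only Vec, so no lock is needed.
    let hashes: Vec<u64> = rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    hashes
}
```

Because every result is kept, the collected vector grows with the number of tasks, which is part of why the hashing tables show higher memory than the running average tables.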
I ran each test with 10,000, 100,000, and 1,000,000 tasks. The Rust native-thread versions capped out at 10,000; Java threads ran at every size.
The source code is located here.
Languages
I wrote these tests in the following languages.
Running Average
Language | Synchronization | How |
---|---|---|
Dotnet 6 and 7 C# | ??? | async task |
Go | Mutex | goroutines |
Java | Mutex | threads, virtual threads (JDK 20, experimental) |
Rust | Mutex | async-std, rayon, threads, tokio |
Rust | Messages | threads, tokio |
Hashing
Language | Synchronization | How |
---|---|---|
Go | Mutex | goroutines |
Java | Mutex | threads, virtual threads (JDK 20, experimental) |
Rust | Mutex | async-std, rayon, threads, tokio |
Rust | Messages | threads, tokio |
Dotnet or .Net?
Overall, I like Dotnet and C#. It has all the batteries included. However, I find Microsoft’s documentation difficult to read and understand. I had to search the internet too many times, and because I don’t get paid for this, I gave up. This implementation drops about 2-3% of tasks, is the worst of the bunch, and is likely not indicative of Dotnet. In college, I took a C# course and did well until the threading assignment. Ten years later, I’ve learned nothing.
Go, Java, and Rust
These took me the least amount of time to write. Surprisingly, I wrote all the Go, Java, and Rust benchmarks in about a day and a half. Rust and Go were the easiest for me to wrap my head around. I’ve been slowly working my way through Zero to Production in Rust by Luca Palmieri and Let’s Go! by Alex Edwards, which may have helped.
The Rust rayon tests are the most synthetic. Because all of the tasks are semi-independent, I used a broadcast method that wouldn’t be suitable for all workloads.
Setup
OS: Fedora Linux 38 (KDE Plasma) x86_64
Kernel: 6.4.4-200.fc38.x86_64
DE: Plasma 5.27.6
CPU: 12th Gen Intel i5-12600K (16)
Memory: 64070MiB
Dotnet 6.0.20
Dotnet 7.0.109
Go 1.26.0
Java openjdk 20.0.1
Rust 1.71.0
Results
Writing the benchmarks took a couple of days. Writing the scripts to process the data took about two weeks. I tried to generate fancy charts, but they were hard to understand, so I went with tables. I also need to adjust the table CSS.
Running Average
10,000 Tasks | Real Time (seconds) | Memory (MiB) |
---|---|---|
Go | 0.121 | 42.1 |
Java Threads | 0.292 | 46.9 |
Java VirtualThreads | 0.067 | 58.0 |
Net6 | 0.802 | 141.7 |
Net7 | 0.794 | 141.7 |
Rust async_std mutex | 0.053 | 27.2 |
Rust rayon mutex | 0.054 | 27.1 |
Rust threads message | 0.221 | 81.9 |
Rust threads mutex | 0.218 | 81.9 |
Rust tokio message | 0.050 | 27.1 |
Rust tokio mutex | 0.050 | 27.1 |
100,000 Tasks | Real Time (seconds) | Memory (MiB) |
---|---|---|
Go | 0.176 | 41.7 |
Java Threads | 4.147 | 84.1 |
Java VirtualThreads | 0.165 | 120.2 |
Net6 | 0.886 | 142.0 |
Net7 | 0.870 | 141.9 |
Rust async_std mutex | 0.137 | 39.5 |
Rust rayon mutex | 0.121 | 27.2 |
Rust tokio message | 0.101 | 27.2 |
Rust tokio mutex | 0.108 | 27.2 |
1,000,000 Tasks | Real Time (seconds) | Memory (MiB) |
---|---|---|
Go | 0.554 | 250.5 |
Java Threads | 75.863 | 321.8 |
Java VirtualThreads | 2.856 | 586.5 |
Net6 | 1.781 | 142.8 |
Net7 | 1.595 | 142.9 |
Rust async_std mutex | 0.983 | 403.4 |
Rust rayon mutex | 0.804 | 27.2 |
Rust tokio message | 0.618 | 137.4 |
Rust tokio mutex | 0.723 | 218.0 |
Hashing
10,000 Tasks | Real Time (seconds) | Memory (MiB) |
---|---|---|
Go | 0.159 | 59.6 |
Java Threads | 0.382 | 183.1 |
Java VirtualThreads | 0.385 | 132.9 |
Rust async_std mutex | 0.060 | 27.5 |
Rust rayon mutex | 0.060 | 27.4 |
Rust threads message | 0.218 | 82.4 |
Rust threads mutex | 0.228 | 83.3 |
Rust tokio message | 0.044 | 26.1 |
Rust tokio mutex | 0.053 | 27.4 |
100,000 Tasks | Real Time (seconds) | Memory (MiB) |
---|---|---|
Go | 0.395 | 212.9 |
Java Threads | 3.593 | 505.6 |
Java VirtualThreads | 2.635 | 688.0 |
Rust async_std mutex | 0.183 | 107.1 |
Rust rayon mutex | 0.170 | 27.3 |
Rust tokio message | 0.102 | 30.2 |
Rust tokio mutex | 0.113 | 66.1 |
1,000,000 Tasks | Real Time (seconds) | Memory (MiB) |
---|---|---|
Go | 2.560 | 2048.0 |
Java Threads | 51.137 | 1740.8 |
Java VirtualThreads | 18.461 | 4198.4 |
Rust async_std mutex | 1.454 | 1024.0 |
Rust rayon mutex | 1.284 | 158.3 |
Rust tokio message | 0.739 | 286.3 |
Rust tokio mutex | 0.754 | 642.6 |
Here are both one-million-task benchmarks side by side, for an apples-to-oranges comparison.
1,000,000 Tasks | Running Average Real Time (seconds) | Running Average Memory (MiB) | Hashing Real Time (seconds) | Hashing Memory (MiB) |
---|---|---|---|---|
Go | 0.554 | 250.5 | 2.560 | 2048.0 |
Java Threads | 75.863 | 321.8 | 51.137 | 1740.8 |
Java VirtualThreads | 2.856 | 586.5 | 18.461 | 4198.4 |
Rust async_std mutex | 0.983 | 403.4 | 1.454 | 1024.0 |
Rust rayon mutex | 0.804 | 27.2 | 1.284 | 158.3 |
Rust tokio message | 0.618 | 137.4 | 0.739 | 286.3 |
Rust tokio mutex | 0.723 | 218.0 | 0.754 | 642.6 |