Comparing performance of GraalJS interpreted vs compiled mode

In the previous post, we saw how to run JavaScript using GraalVM and how it can be optimized for better performance. Now, let’s see how those two modes – interpreted vs. compiled – actually perform when tested with a benchmark program.

The benchmark program used for this test is a simple implementation of the “Sieve of Eratosthenes”. The Java Microbenchmark Harness (JMH) framework is used to measure the GraalJS execution of the benchmark program in both interpreted and compiled mode, in two forked JVMs, with enough warmup and measurement iterations to expose any variation in the results. I used a laptop with an AMD Ryzen 5 2500U (8 CPUs) to run all the tests on a single thread.
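The post doesn’t show the source of the benchmark program, but a minimal sketch of a “Sieve of Eratosthenes” in JavaScript – the function name and limit here are assumptions – might look like this:

```javascript
// Hypothetical sketch of the sieve benchmark body; the exact program,
// names, and limit used in the post are not shown, so these are assumptions.
function sieve(limit) {
  // composite[n] becomes true when n is a multiple of a smaller prime
  const composite = new Array(limit + 1).fill(false);
  const primes = [];
  for (let n = 2; n <= limit; n++) {
    if (!composite[n]) {
      primes.push(n);
      // mark all multiples of n starting from n*n
      for (let m = n * n; m <= limit; m += n) {
        composite[m] = true;
      }
    }
  }
  return primes;
}

console.log(sieve(30).join(",")); // primes up to 30
```

In a JMH benchmark, a program like this would typically be evaluated repeatedly through the GraalJS polyglot API, so each JMH invocation corresponds to one full sieve run.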

Average Time, Latency

There are different benchmark modes available in the JMH framework. We’re interested in “Average Time” – the time taken per invocation of the benchmark program. This is also sometimes called “latency” in the performance benchmarking community. The results below are in milliseconds per invocation.

Interpreted Mode

Iteration | Forked VM 1 | Forked VM 2
----------|-------------|------------
1         | 64.308      | 67.666
2         | 37.64       | 38.007
3         | 34.646      | 34.997
4         | 34.373      | 34.803
5         | 34.092      | 34.656
6         | 32.955      | 33.498
7         | 32.921      | 33.31
8         | 32.345      | 32.904
9         | 32.409      | 32.579
10        | 32.447      | 32.716

Interpreted Mode – Time per Operation (Average Time) in Milliseconds

Compiled Mode

Iteration | Forked VM 1 | Forked VM 2
----------|-------------|------------
1         | 2.171       | 2.255
2         | 1.166       | 1.195
3         | 1.081       | 1.138
4         | 1.085       | 0.982
5         | 0.976       | 0.969
6         | 0.976       | 0.969
7         | 0.976       | 0.976
8         | 0.975       | 0.968
9         | 0.975       | 0.97
10        | 0.977       | 0.969

Compiled Mode – Time per Operation (Average Time) in Milliseconds

Throughput

Throughput is technically the inverse of average time, but some teams prefer one metric over the other. For easy comparison, the benchmark program was also run in throughput mode (operations per second), and the results are below.
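Since one operation takes `avgTimeMs` milliseconds, throughput is simply 1000 divided by that number. Plugging in the average figures from the tables in this post shows the two metrics agree closely (they won’t match exactly, because JMH measures each mode independently):

```javascript
// Convert average time per operation (ms/op) into throughput (ops/sec).
function opsPerSec(avgTimeMs) {
  return 1000 / avgTimeMs;
}

// Averages taken from the comparison tables in this post:
console.log(opsPerSec(32.808).toFixed(2)); // interpreted: ~30.48 ops/sec
console.log(opsPerSec(0.978).toFixed(2));  // compiled: ~1022.49 ops/sec
```

Both values are within a couple of percent of the throughput actually measured by JMH (31.321 and 1029.996 ops/sec respectively).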

Interpreted Mode

Iteration | Forked VM 1 | Forked VM 2
----------|-------------|------------
1         | 18.358      | 17.853
2         | 27.806      | 28.081
3         | 29.263      | 29.975
4         | 29.85       | 29.852
5         | 30.477      | 30.184
6         | 30.799      | 30.738
7         | 31.184      | 31.153
8         | 31.571      | 31.501
9         | 31.551      | 31.42
10        | 31.516      | 31.779

Interpreted Mode – Throughput (Operations per Second)

Compiled Mode

Iteration | Forked VM 1 | Forked VM 2
----------|-------------|------------
1         | 524.753     | 521.322
2         | 887.251     | 888.703
3         | 908.369     | 903.94
4         | 1024.868    | 988.931
5         | 1036.641    | 1032.677
6         | 1031.32     | 1021.7
7         | 1033.649    | 1021.353
8         | 1037.124    | 1025.252
9         | 1040.935    | 1023.142
10        | 1040.531    | 1024.955

Compiled Mode – Throughput (Operations per Second)

Comparison

Throughput (Ops/Sec) | Minimum  | Average  | Maximum
---------------------|----------|----------|---------
Interpreted Mode     | 30.738   | 31.321   | 31.779
Compiled Mode        | 1021.353 | 1029.996 | 1040.935

Average Time (ms/op) | Minimum  | Average  | Maximum
---------------------|----------|----------|---------
Interpreted Mode     | 32.345   | 32.808   | 33.498
Compiled Mode        | 0.968    | 0.978    | 0.977
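The summary throughput numbers put the speedup at roughly 33x, which is where the “30+ times” figure comes from. A one-line check:

```javascript
// Speedup of compiled over interpreted mode, using the average
// throughput values from the comparison table above.
const interpretedOps = 31.321;  // ops/sec, interpreted mode
const compiledOps = 1029.996;   // ops/sec, compiled mode

console.log((compiledOps / interpretedOps).toFixed(1) + "x"); // ~32.9x
```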

The performance of compiled mode looks a staggering 30+ times better than interpreted mode, but performance results always depend on multiple factors. To start with, our benchmark is a CPU-bound program that spends the majority of its time doing nothing but reading from and writing to memory, and produces an output at the end. This may not resemble the real-world JavaScript programs we want to use for extending our Java applications, but it nevertheless shows that compiled mode is indeed the better choice when the JavaScript is going to be executed over and over again.

Is that all? Not really. CPU consumption during execution in interpreted mode is far lower than in compiled mode. How to systematically capture and compare this CPU information is a topic for another post.

The final question is: how much CPU can we afford to give the JVM during startup and the early iterations? And what happens after a week or so, when we no longer need it? The VMs where such a Java application runs won’t have optimal CPU usage. So what’s the target?

It boils down to the usual trade-off engineers have to make in discussion with the business: resource cost vs. performance.

Stay tuned.