In the previous post, we saw how to run JavaScript using GraalVM and how it can be optimized for better performance. Now let's see how those two modes, interpreted vs. compiled, actually perform when tested with a benchmark program.
The benchmark program used for this test is a simple implementation of the Sieve of Eratosthenes. The Java Microbenchmark Harness (JMH) is used to test GraalJS execution of the benchmark in both interpreted and compiled mode, in two forked JVMs, with enough warmup and iterations to expose any variation in the results. All tests were run single-threaded on a laptop with an AMD Ryzen 5 2500U (8 logical CPUs).
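For reference, a Sieve of Eratosthenes in JavaScript might look like the sketch below. The actual benchmark source and the sieve limit are not shown in this post, so the function name `sieve` and the limit of 100000 are assumptions for illustration.

```javascript
// Sieve of Eratosthenes: count the primes below `limit`.
// A minimal sketch; the benchmark's actual code and limit may differ.
function sieve(limit) {
  const composite = new Uint8Array(limit); // 1 = marked composite
  let count = 0;
  for (let p = 2; p < limit; p++) {
    if (composite[p]) continue; // p is prime
    count++;
    for (let m = p * p; m < limit; m += p) {
      composite[m] = 1; // mark every multiple of p as composite
    }
  }
  return count;
}

console.log(sieve(100000)); // count of primes below 100000
```

A tight inner loop over a typed array like this is exactly the kind of hot code where a JIT compiler can shine, which is why it makes a useful micro-benchmark here.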
Average Time, Latency
There are different benchmark modes available in JMH. We're interested in "Average Time", i.e., the time taken per invocation of the benchmark program, sometimes called "latency" in the performance benchmarking community. The results below are in milliseconds per invocation.
Interpreted Mode
| Iteration | Forked VM 1 | Forked VM 2 |
|---|---|---|
| 1 | 64.308 | 67.666 |
| 2 | 37.64 | 38.007 |
| 3 | 34.646 | 34.997 |
| 4 | 34.373 | 34.803 |
| 5 | 34.092 | 34.656 |
| 6 | 32.955 | 33.498 |
| 7 | 32.921 | 33.31 |
| 8 | 32.345 | 32.904 |
| 9 | 32.409 | 32.579 |
| 10 | 32.447 | 32.716 |

Compiled Mode
| Iteration | Forked VM 1 | Forked VM 2 |
|---|---|---|
| 1 | 2.171 | 2.255 |
| 2 | 1.166 | 1.195 |
| 3 | 1.081 | 1.138 |
| 4 | 1.085 | 0.982 |
| 5 | 0.976 | 0.969 |
| 6 | 0.976 | 0.969 |
| 7 | 0.976 | 0.976 |
| 8 | 0.975 | 0.968 |
| 9 | 0.975 | 0.97 |
| 10 | 0.977 | 0.969 |

Throughput
Throughput is simply the inverse of average time, but some teams prefer one metric over the other. For easy comparison, the benchmark program was also run in throughput mode (operations per second); the results follow.
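Because the two metrics are reciprocals, one can be derived from the other. As a quick sanity check, converting a per-invocation time in milliseconds to operations per second:

```javascript
// Convert an average time in ms/op to throughput in ops/sec.
function msPerOpToOpsPerSec(ms) {
  return 1000 / ms;
}

// e.g. a steady-state interpreted average of ~32.8 ms/op corresponds to
// roughly 30.5 ops/sec, in line with the measured throughput numbers.
console.log(msPerOpToOpsPerSec(32.8).toFixed(1));
```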
Interpreted Mode
| Iteration | Forked VM 1 | Forked VM 2 |
|---|---|---|
| 1 | 18.358 | 17.853 |
| 2 | 27.806 | 28.081 |
| 3 | 29.263 | 29.975 |
| 4 | 29.85 | 29.852 |
| 5 | 30.477 | 30.184 |
| 6 | 30.799 | 30.738 |
| 7 | 31.184 | 31.153 |
| 8 | 31.571 | 31.501 |
| 9 | 31.551 | 31.42 |
| 10 | 31.516 | 31.779 |

Compiled Mode
| Iteration | Forked VM 1 | Forked VM 2 |
|---|---|---|
| 1 | 524.753 | 521.322 |
| 2 | 887.251 | 888.703 |
| 3 | 908.369 | 903.94 |
| 4 | 1024.868 | 988.931 |
| 5 | 1036.641 | 1032.677 |
| 6 | 1031.32 | 1021.7 |
| 7 | 1033.649 | 1021.353 |
| 8 | 1037.124 | 1025.252 |
| 9 | 1040.935 | 1023.142 |
| 10 | 1040.531 | 1024.955 |

Comparison
| Throughput (Ops/Sec) | Minimum | Average | Maximum |
|---|---|---|---|
| Interpreted Mode | 30.738 | 31.321 | 31.779 |
| Compiled Mode | 1021.353 | 1029.996 | 1040.935 |

| Average Time (ms/op) | Minimum | Average | Maximum |
|---|---|---|---|
| Interpreted Mode | 32.345 | 32.808 | 33.498 |
| Compiled Mode | 0.968 | 0.978 | 0.977 |
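The headline speedup follows directly from the comparison table; using the average throughput figures (a choice I'm making here, since min or max would give slightly different ratios):

```javascript
// Speedup of compiled over interpreted mode, computed from the average
// throughput figures in the comparison table above.
const interpretedOpsPerSec = 31.321;
const compiledOpsPerSec = 1029.996;
const speedup = compiledOpsPerSec / interpretedOpsPerSec;
console.log(speedup.toFixed(1)); // ~32.9x
```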
The performance of compiled mode looks staggering, more than 30 times better than interpreted mode, but performance results always depend on multiple factors. To start with, our benchmark program is CPU-bound: it spends most of its time reading from and writing to memory, producing a single output at the end. It may not resemble the real-world JavaScript programs we want to use to extend our Java applications, but it does show that compiled mode is the better choice when the same JavaScript is going to be executed over and over again.
Is that all? Not really. CPU consumption during interpreted execution is far lower than during compiled execution. How to systematically capture and compare that CPU information is a topic for another post.
The final question is: how much CPU can we afford to give the JVM during startup and the early iterations? And what happens after a week or so, when we no longer need it? The VMs running this Java application won't be using their CPU optimally. So what's the target?
It boils down to the usual trade-off that engineers have to make in discussion with the business: resource cost vs. performance.
Stay tuned.