CaffeineMark™ 2.5

Benchmark Test Descriptions



Overview of CaffeineMark 2.5

The original CaffeineMark benchmark consisted of four tests: a prime number sieve, a tight integer loop, an image blasting test and a BitBlting test. The tests gave a fairly accurate measure of applet performance. While these simplistic tests were generally successful, they had some shortcomings.

For CaffeineMark 2.01, we attempted to address some of the shortcomings of the 1.0 version by incorporating nine tests instead of four and by changing the formula for the overall CaffeineMark. Each graphics test is still weighted as highly as any other test, but because there are more tests overall, the graphics tests account for a smaller share of the score.

We also added two tests that can only be run locally. One measures Allocation/Garbage Collection and the other measures JIT compiler speed.

Finally, version 2.5 fixed some problems with the string and sieve tests. The string test was too memory-intensive to run on embedded systems, e.g., systems intended for use in consumer electronics. The sieve test contained a bug which made it act more like a loop test than a sieve test. All other tests were left unchanged.

You may also notice that the CaffeineMark theorizes about the types of optimizations being performed by the Virtual Machine. For example, the CaffeineMark will try to detect a non-rendering graphics system.


The Tests

Sieve: A prime number sieve test. The sieve locates all the prime numbers under 2048.
Loop: Runs several types of integer loops. The loops are sensitive to common compiler optimizations such as inline substitution, register variables and constant subexpression optimization.
String: Tests string concatenation and search.
Method: Tests how fast the VM performs method calls.
Floating-Point: Simulates the calculations needed to rotate 50 three-dimensional points through 90 degrees, 5 degrees at a time. This primarily tests matrix multiplication, but also involves some trigonometric function evaluation and division.
Logic: Executes loops containing decision trees. The test is sensitive to short-circuit boolean expression evaluation (variable expressions only) and branching speed.
Image: Tests the speed of the drawImage() call. This is a very small part of the Java system, but a very important one for animation.
Graphics: Tests the speed of the drawLine(), setColor() and fillRect() calls. Again, a small yet important part of the Java class library.
Dialog: Measures access time for component properties on a dialog box.
Allocation and Garbage Collection (AGC) (Local Only)
The AGC test runs an animation in the standard way (using a thread that sleeps between frames). The time in milliseconds between the drawing of each frame is measured. If little or no processing is done between frames, the time between frames will be approximately the sleep time of the thread.

The AGC test allocates space in the paint() method of the animator. This allocated space becomes garbage at the end of the paint() call. The allocation and garbage collection increase the time between frames.

The AGC test completes after about two minutes, when the test determines that, statistically, the frames have been delayed from the initial 50-100ms per frame to 300ms per frame. Since garbage collection runs in a separately scheduled thread, it can take some time for the statistics to stabilize.

Dynamic Compilation (DC) (Local Only)
The DC test recompiles a class file repeatedly. This requires the ability to subclass ClassLoader, so not all Java implementations will support this test. The Sun and Cafe appletviewers and Netscape 2.01 will run the test locally under Windows 95.
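
To illustrate the kind of work the Sieve test describes, here is a minimal sketch of a Sieve of Eratosthenes that counts the primes under 2048. The CaffeineMark source is not published, so the class name SieveSketch and the method countPrimes are assumptions made for this example and do not reflect the benchmark's actual code.

```java
public class SieveSketch {

    // Counts the primes strictly below 'limit' with a Sieve of Eratosthenes.
    // Hypothetical helper, not taken from the CaffeineMark source.
    static int countPrimes(int limit) {
        if (limit <= 2) {
            return 0; // there are no primes below 2
        }
        boolean[] composite = new boolean[limit];
        int count = 0;
        for (int i = 2; i < limit; i++) {
            if (!composite[i]) {
                count++;
                // Mark every multiple of i, starting at i*i, as composite.
                // long arithmetic avoids int overflow for large limits.
                for (long j = (long) i * i; j < limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Primes under 2048: " + countPrimes(2048));
    }
}
```

A benchmark built on such a routine would call it repeatedly and count completed passes, rather than time a single pass.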


General Testing Algorithm

The general testing algorithm is as follows:
  1. Get the current time in milliseconds.
  2. Perform the benchmark operation.
  3. Check the elapsed time. If the elapsed time is less than one second, return to step 2. This determines the number of times to run step 5.
  4. Get the current time in milliseconds.
  5. Perform the benchmark operation triple the number of times it was executed in step 2.
  6. The test result is a constant factor times the number of operations performed in step 5, divided by the number of milliseconds step 5 took (approx. 3000).
The AGC test uses a different method (see the table above).
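
The six steps above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the CaffeineMark implementation: the class TimingSketch, the method run, and the parameters calibrationMillis and scale are hypothetical names. The calibration time is a parameter here (the text uses one second), and scale stands in for the unspecified constant factor.

```java
public class TimingSketch {

    // Runs 'op' following the six-step scheme and returns the score.
    // 'calibrationMillis' is 1000 in the description above; 'scale' is a
    // placeholder for the constant factor, whose real value is not given.
    static double run(Runnable op, long calibrationMillis, double scale) {
        long start = System.currentTimeMillis();              // step 1
        long calibrationRuns = 0;
        do {
            op.run();                                          // step 2
            calibrationRuns++;
        } while (System.currentTimeMillis() - start < calibrationMillis); // step 3

        long timedRuns = 3 * calibrationRuns;
        long timedStart = System.currentTimeMillis();         // step 4
        for (long i = 0; i < timedRuns; i++) {
            op.run();                                          // step 5
        }
        // Guard against a zero reading from a coarse millisecond clock.
        long elapsed = Math.max(1, System.currentTimeMillis() - timedStart);
        return scale * timedRuns / elapsed;                    // step 6
    }

    public static void main(String[] args) {
        double score = run(() -> Math.sqrt(12345.678), 1000, 1.0);
        System.out.println("Score: " + score);
    }
}
```

Calibrating first keeps the timed phase near three seconds regardless of how fast the system is, so the score depends on how many operations complete rather than on timer resolution.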


Test Weighting - How To Calculate The CaffeineMark

CaffeineMark = 0.5 x Sieve
             + 0.5 x Loop
             + 0.5 x Method
             + 0.5 x Logic
             + String
             + Floating-Point
             + Image
             + Graphics
             + Dialog
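
As a worked example, the weighting above can be written as a small Java method. The class and method names are hypothetical. Note that the weights sum to 7, so nine test scores of 100 give 700 under the formula exactly as written. Since the reference system is stated to score 100 overall, a division by 7 may be implied; that normalization is an assumption of this sketch, not something the text states.

```java
public class CaffeineMarkScore {

    // Weighted sum exactly as given in the formula above.
    static double weightedSum(double sieve, double loop, double method,
                              double logic, double string, double floatingPoint,
                              double image, double graphics, double dialog) {
        return 0.5 * (sieve + loop + method + logic)
                + string + floatingPoint + image + graphics + dialog;
    }

    // ASSUMPTION: divides by the total weight (4 x 0.5 + 5 x 1 = 7) so that
    // scores of 100 on every test yield 100. Not stated in the document.
    static double normalized(double weightedSum) {
        return weightedSum / 7.0;
    }

    public static void main(String[] args) {
        double sum = weightedSum(100, 100, 100, 100, 100, 100, 100, 100, 100);
        System.out.println("Weighted sum: " + sum);
        System.out.println("Normalized:   " + normalized(sum));
    }
}
```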


Reference System

Unlike benchmarks designed to test hardware system performance (or even combined hardware and OS performance), the CaffeineMark tests Java system performance, which is a function of hardware, OS, and VM performance. As such, a reference system is relatively unimportant. Indeed, the CaffeineMark scores for the reference system vary by a factor of 8 across VMs!

For completeness, our reference system consisted of the following:

The reference system has test scores of 100 across the board and has a CaffeineMark of 100.


Source Code

There are no plans to release the CaffeineMark source code at this time.
The CaffeineMark™ benchmark is Copyright © 1996 Pendragon Software, All Rights Reserved. CaffeineMark is a trademark of Pendragon Software. Java and HotJava are trademarks of Sun Microsystems, Inc., and refer to Sun's Java programming language and HotJava browser technologies.