A big flaw in Java's nanoTime() / by Your Name

I'm working on a project that requires me to measure time and timestamp events in a concurrent/asynchronous system. I didn't want to use System.currentTimeMillis() because it's not returning a linearly increasing number of time stamps. The user or services such as NTP can change the system clock at any time and the time will leap back and forward.

So what about System.nanoTime()? It seems to return a linearly increasing number of time stamps for the next 292 years from the point you turn on the computer. The JavaDoc doesn't mention any potential problems besides that nanosecond precision is maybe not supported by the underlying operating system. It should be sufficient to measure time or to timestamp events.

It turns out nanoTime()'s behavior is actually quite unspecified and it's unclear what the clock's expected properties are. Chances are it's using the Time Stamp Counter (TSC) and modern computers have one of them for each core. It's up to the underlying Operating System or Hardware to keep these TSCs in sync (or not) or provide some other high precision clock etc. etc. From the developers perspective it's completely unclear what nanoTime() is actually returning.

There are a few things that can happen if nanoTime() is based on TSCs and if there are multiple CPUs or COREs at play or if the underlying Operating System is messing with the CPU/CORE frequencies (e.g. switching to a lower clock speed to save energy). This can be best seen if you're trying to measure time by doing two subsequent nanoTime() calls and something happens in between these calls. Let's take a look at the following example.

while (true) {
    long startTime = System.nanoTime();
    // Let's hope the current Thread is switched to an another CORE or 
    // enters some power saving mode that lowers the TSC's frequency
    Thread.yield();
    long time = System.nanoTime() - startTime;
    assert (time >= 0L);
}

It's pretty simple and straight forward. We're simply measuring the amount of time that elapses between the two nanoTime() calls and the yield() tries to enforce a context switch (hopefully to a different CPU or CORE). In practice you've to run the code on multiple Threads to catch these fluke events but I've managed to get negative time values in my experiments*.

The code in my project looks actually more like this (the subsequent nanoTime() calls are quite "far apart" from each other) and I see these fluke events once or twice an hour.

ExecutorService executor
    = Executors.newCachedThreadPool();

final long startTime = System.nanoTime(); executor.execute(new Runnable() { @Override public void run() { doSomething(); long time = System.nanoTime() - startTime; assert (time >= 0L); } });

You may ask what the outcome is? Well, the outcome is that you've to deal with the exact same problems as with currentTimeMillis(). The problems have very different origins but at the end of day you'll see the same errors (time is leaping back and forth). So neither can be really trusted and if you don't need the additional precision you may equally well stick with currentTimeMillis() to get nice things such as the fact that it's based on the UNIX System Time.

A few links that might be useful: [1] [2] [3] [4] [5]

* MacBook, Mac OS X 10.6.5, Java 1.6.0_22 and Intel Core 2 Duo