I have been trying the C version and running testperf on an Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz. The average cost per add is around 40 ns. (I used 2-digit precision and a range of 1 usec to 10 msec.)
So I was wondering under what conditions the numbers reported at http://hdrhistogram.org were measured. The site says "Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2014) Intel CPUs", which is roughly an order of magnitude faster than what I am seeing.