Java Tuning and Trouble Shooting

Useful JVM Settings

-Xmx512m (maximum Heap space) -Xms1024m (minimum Heap size)
starting points 2G 32 bit 3-4G 64 bit and 1:3 young vs old
-XX:+HeapDumpOnOutOfMemoryError

Java Trouble Shooting Tools

  • heap dump
    • jmap -dump:file=filename.bin [pid]
      • hprof binary format
      • needs to be analysed in another took like MAT
      • or jhat to see in a web browser on port 7000 on local host
        • jhat -J-mx2000m filename.bin
    • jmap -histo [pid]
      • histogram of java object heap
      • quick text output
    • jmap -heap [pid]
      • to print java heap summary
  • thread dump / thread stack
    • jstack [pid]
  • enabling gc logs
    • -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:[logname]
  • visualJVM
  • jconsole
  • list open files for process
    • sudo lsof -p [pid]
  • list max open file limit
    • ulimit -S -n

Big O Times

n O(1) O(log(n)) O(n) O(n log(n)) O(N^2) O(2^N) O(N!)
1 1 0.0 1 0.0 1 2 1
5 1 0.7 5 3.5 25 32 120
10 1 1.0 10 10.0 100 1024 3628800
25 1 1.4 25 34.9 625 33554432 1.55112E+25
50 1 1.7 50 84.9 2,500 1.1259E+15 3.04141E+64
100 1 2.0 100 200.0 10,000 1.26765E+30 9.3326E+157
500 1 2.7 500 1,349.5 250,000 3.2734E+150 too large for excel
1000 1 3.0 1,000 3,000.0 1,000,000 1.0715E+301 too large for excel
10000 1 4.0 10,000 40,000.0 100,000,000 too large for excel too large for excel
100000 1 5.0 100,000 500,000.0 10,000,000,000 too large for excel too large for excel

Java Data Structures

Here are list Java 7 data structures and their thread model and expected performance. (See BigO Cheat Sheet for more algorithm speeds):-

Name Overiew Threading Method Sorted Index Search / Contiains or Get Insert Delete Who
Maps
HashMap Hashtable Not thread safe No n/a O(1) O(1) O(1) Java
ConcurrentHashMap Segmented hashtable Non blocking No n/a Java
Hashtable Old School, don’t use Intrinsic blocks n/a O(1) O(1) O(1) Java
ConcurrentSkipListMap Tree like skipped list Lock free CAS Yes n/a O(log(N)) O(log(N)) O(log(N)) Java
Lists
ArrayList Resizable array Not thread safe No O(1) O(N) O(N) O(N) Java
LinkedList Double linked list Not thread safe No O(N) O(N) O(1) O(1) Java
Vector Old School, don’t use Intrinsic blocks No O(N) O(N) O(1) O(1) Java
Sets
HashSet HashMap backed Not thread safe No n/a O(1) O(1) O(1) Java
TresSet Tree backed Not thread safe Yes n/a O(log(N)) O(log(N)) O(log(N)) Java
ConcurrentSkipListSet As ConcurrentSkipListMap As ConcurrentSkipListMap yes n/a O(log(N)) O(log(N)) O(log(N)) Java
Queue  / FIFO
ConcurrentLinkedQueue Non blocking No Java
PriorityQueue Balanced binary heap Not thread safe Yes n/a O(N) O(log(N)) O(log(N)) Java
ArrayBlockingQueue Bounded array Reentrant lock No O(1) O(N) O(1) O(N) Java
PriorityBlockingQueue Array based binary heap Reentrant lock Yes n/a O(N) O(log(N)) ? O(log(N)) ? Java
LinkedBlockingQueue Typically greater throughput than array backed Reentrant lock No Java
LinkedTransferQueue Producer waits for consumer Short spin then park/unpark No Java
SynchronousQueue Insert waits for remove Short spin then park/unpark No Java
DelayQueue Backed by PriorityQueue Reentrant lock Yes Java
ForwardingBlockingQueue Blocking queue to another Blocking queue Not thread safe No Google
EvictingQueue Evicts from head Not thread safe No Google
Dequeue / FIFO or LIFO
LinkedList List, double linked list Not thread safe No O(N) O(N) O(1) O(1) Java
LinkedBlockingDeque Linked list Reentrant lock No Java
ArrayDeque Array backed Not thread safe No Java
ConcurrentLinkedDeque Non blocking No Java
ForwardingDeque Abstract Not thread safe No Google
ForwardingBlockingDeque Abstract Not thread safe No Google

Cost of Locks, CAS and Memory Access

The absolute numbers below will change as technology progresses but the relative numbers should stay the same.

According to LMAX the relative cost of locks and CAS is as follows (May 2011):-

Operation (500 million times) Cost (ms) Relative Cost
Single thread 300 1
Single thread with lock 10000 33
Two threads with lock 224000 747
Single thread with CAS 5700 19
Two threads with CAS 30000 100
Single thread with volatile write 4700 16
And according to Pinku Surana (Handwaving), here are the numbers for memory access (Jan 2009):-
Operation Cost (ns) Relative Cost
L1 cache reference 0.5 1
Branch mispredict 5 10
L2 cache reference 7 14
Mutex lock/unlock 100 200
Main memory reference 100 200
Compress 1K bytes with Zippy 10,000 20,000
Send 2K bytes over 1 Gbps network 20,000 40,000
Read 1 MB sequentially from memory 250,000 500,000
Round trip within same datacenter 500,000 1,000,000
Disk seek 10,000,000 20,000,000
Read 1 MB sequentially from network 10,000,000 20,000,000
Read 1 MB sequentially from disk 30,000,000 60,000,000
Send packet CA->Netherlands->CA 150,000,000 300,000,000

Java Garbage Collection

TLAB and the Cost of Object Allocation

To avoid contention each thread is assigned a Thread Local Allocation Buffer (TLAB) from which it allocates objects. Using TLABs allows object allocation to scale with number of threads by avoiding contention on a single memory resource. Object allocation via a TLAB is a very cheap operation; it simply bumps a pointer for the object size which takes roughly 10 instructions on most platforms. Heap memory allocation for Java is even cheaper than using malloc from the C runtime. When a TLAB is exhausted a thread simply requests a new one from the Eden space. When Eden has been filled a minor collection commences.  [ref]

Weak References explained.

Floating Point Numbers in Low Latency Applications

Clearly there are lots of issues with using float and double and using BigDecimal is not going to cut it an low latency financial world unless you are not really that low latency.  So what is the answer. As usual it depends, but here are some suggestions.

  • Look after the pennies rather than the pounds. That is use cents rather than dollars, or what ever is the lowest precision in your world.
  • Ensure intermediate calculations are as exact as possible so that rounding errors are not accumulated.
  • Build or borrow a maths library to compare floating point numbers for instance (i.e float and double) with an epsilon value, i.e. a nearly equals method so that you can hide all the gory details in a library. Also add rounding and precision to the laundry list. Slightly surprised Java’s core libraries don’t cover this better.
  • Use string representations on the wire rounded to appropriate levels of precision. Sure there is an overhead in conversion but at least its accurate.
  • A reminder of sizes:-
    • Integer = 32 bits, 4 bytes = 2^31 =+- 2147483648 = 2.14748E+9
    • Long = 64 bits, 8 byes = 2^63 = +-9.22337E+18
    • Float = 32 bits, 4 bytes = +-3.5E+38 and +-1.4E-45, about 7 digits
    • Double = 64 bits, 8 bytes = +-1.7E+308 and +-4.9E-324, about 15 digits
    • String = words, 2 bytes, short

References

Java theory and practice: Where’s your point?
Java’s new math, Part 2: Floating-point numbers
Java float / double comparison

Low Latency Java Blogs and Websites

Here are some interesting low latency blogs and websites I’ve been reading:-

Low Latency Java Techniques

Here are a list of web pages I’ve been reading recently:-

In due course I will be adding posts for each of these topics.