Java Tuning and Trouble Shooting

Posted on 31st July 2015 by peter2014

Useful JVM Settings

-Xmx512m (maximum Heap space) -Xms1024m (minimum Heap size)
starting points 2G 32 bit 3-4G 64 bit and 1:3 young vs old
-XX:+HeapDumpOnOutOfMemoryError

Java Trouble Shooting Tools

heap dump
- jmap -dump:file=filename.bin [pid]
  - hprof binary format
  - needs to be analysed in another took like MAT
  - or jhat to see in a web browser on port 7000 on local host
    - jhat -J-mx2000m filename.bin
- jmap -histo [pid]
  - histogram of java object heap
  - quick text output
- jmap -heap [pid]
  - to print java heap summary
thread dump / thread stack
- jstack [pid]
enabling gc logs
- -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:[logname]
visualJVM
jconsole
list open files for process
- sudo lsof -p [pid]
list max open file limit
- ulimit -S -n

Big O Times

Posted on 2nd February 2014 by peter2014

n	O(1)	O(log(n))	O(n)	O(n log(n))	O(N^2)	O(2^N)	O(N!)
1	1	0.0	1	0.0	1	2	1
5	1	0.7	5	3.5	25	32	120
10	1	1.0	10	10.0	100	1024	3628800
25	1	1.4	25	34.9	625	33554432	1.55112E+25
50	1	1.7	50	84.9	2,500	1.1259E+15	3.04141E+64
100	1	2.0	100	200.0	10,000	1.26765E+30	9.3326E+157
500	1	2.7	500	1,349.5	250,000	3.2734E+150	too large for excel
1000	1	3.0	1,000	3,000.0	1,000,000	1.0715E+301	too large for excel
10000	1	4.0	10,000	40,000.0	100,000,000	too large for excel	too large for excel
100000	1	5.0	100,000	500,000.0	10,000,000,000	too large for excel	too large for excel

Java Data Structures

Posted on 1st February 2014 by peter2014

Here are list Java 7 data structures and their thread model and expected performance. (See BigO Cheat Sheet for more algorithm speeds):-

Name	Overiew	Threading Method	Sorted	Index	Search / Contiains or Get	Insert	Delete	Who
Maps
HashMap	Hashtable	Not thread safe	No	n/a	O(1)	O(1)	O(1)	Java
ConcurrentHashMap	Segmented hashtable	Non blocking	No	n/a				Java
Hashtable	Old School, don’t use	Intrinsic blocks		n/a	O(1)	O(1)	O(1)	Java
ConcurrentSkipListMap	Tree like skipped list	Lock free CAS	Yes	n/a	O(log(N))	O(log(N))	O(log(N))	Java

Lists
ArrayList	Resizable array	Not thread safe	No	O(1)	O(N)	O(N)	O(N)	Java
LinkedList	Double linked list	Not thread safe	No	O(N)	O(N)	O(1)	O(1)	Java
Vector	Old School, don’t use	Intrinsic blocks	No	O(N)	O(N)	O(1)	O(1)	Java

Sets
HashSet	HashMap backed	Not thread safe	No	n/a	O(1)	O(1)	O(1)	Java
TresSet	Tree backed	Not thread safe	Yes	n/a	O(log(N))	O(log(N))	O(log(N))	Java
ConcurrentSkipListSet	As ConcurrentSkipListMap	As ConcurrentSkipListMap	yes	n/a	O(log(N))	O(log(N))	O(log(N))	Java

Queue / FIFO
ConcurrentLinkedQueue		Non blocking	No					Java
PriorityQueue	Balanced binary heap	Not thread safe	Yes	n/a	O(N)	O(log(N))	O(log(N))	Java
ArrayBlockingQueue	Bounded array	Reentrant lock	No	O(1)	O(N)	O(1)	O(N)	Java
PriorityBlockingQueue	Array based binary heap	Reentrant lock	Yes	n/a	O(N)	O(log(N)) ?	O(log(N)) ?	Java
LinkedBlockingQueue	Typically greater throughput than array backed	Reentrant lock	No					Java
LinkedTransferQueue	Producer waits for consumer	Short spin then park/unpark	No					Java
SynchronousQueue	Insert waits for remove	Short spin then park/unpark	No					Java
DelayQueue	Backed by PriorityQueue	Reentrant lock	Yes					Java
ForwardingBlockingQueue	Blocking queue to another Blocking queue	Not thread safe	No					Google
EvictingQueue	Evicts from head	Not thread safe	No					Google

Dequeue / FIFO or LIFO
LinkedList	List, double linked list	Not thread safe	No	O(N)	O(N)	O(1)	O(1)	Java
LinkedBlockingDeque	Linked list	Reentrant lock	No					Java
ArrayDeque	Array backed	Not thread safe	No					Java
ConcurrentLinkedDeque		Non blocking	No					Java
ForwardingDeque	Abstract	Not thread safe	No					Google
ForwardingBlockingDeque	Abstract	Not thread safe	No					Google

Cost of Locks, CAS and Memory Access

Posted on 29th January 2014 by peter2014

The absolute numbers below will change as technology progresses but the relative numbers should stay the same.

According to LMAX the relative cost of locks and CAS is as follows (May 2011):-

Operation (500 million times)	Cost (ms)	Relative Cost
Single thread	300	1
Single thread with lock	10000	33
Two threads with lock	224000	747
Single thread with CAS	5700	19
Two threads with CAS	30000	100
Single thread with volatile write	4700	16

And according to Pinku Surana (Handwaving), here are the numbers for memory access (Jan 2009):-

Operation	Cost (ns)	Relative Cost
L1 cache reference	0.5	1
Branch mispredict	5	10
L2 cache reference	7	14
Mutex lock/unlock	100	200
Main memory reference	100	200
Compress 1K bytes with Zippy	10,000	20,000
Send 2K bytes over 1 Gbps network	20,000	40,000
Read 1 MB sequentially from memory	250,000	500,000
Round trip within same datacenter	500,000	1,000,000
Disk seek	10,000,000	20,000,000
Read 1 MB sequentially from network	10,000,000	20,000,000
Read 1 MB sequentially from disk	30,000,000	60,000,000
Send packet CA->Netherlands->CA	150,000,000	300,000,000

Java Garbage Collection

Posted on 21st January 2014 by peter2014

TLAB and the Cost of Object Allocation

To avoid contention each thread is assigned a Thread Local Allocation Buffer (TLAB) from which it allocates objects. Using TLABs allows object allocation to scale with number of threads by avoiding contention on a single memory resource. Object allocation via a TLAB is a very cheap operation; it simply bumps a pointer for the object size which takes roughly 10 instructions on most platforms. Heap memory allocation for Java is even cheaper than using malloc from the C runtime. When a TLAB is exhausted a thread simply requests a new one from the Eden space. When Eden has been filled a minor collection commences. [ref]

Weak References explained.

Floating Point Numbers in Low Latency Applications

Posted on 15th January 2014 by peter2014

Clearly there are lots of issues with using float and double and using BigDecimal is not going to cut it an low latency financial world unless you are not really that low latency. So what is the answer. As usual it depends, but here are some suggestions.

Look after the pennies rather than the pounds. That is use cents rather than dollars, or what ever is the lowest precision in your world.
Ensure intermediate calculations are as exact as possible so that rounding errors are not accumulated.
Build or borrow a maths library to compare floating point numbers for instance (i.e float and double) with an epsilon value, i.e. a nearly equals method so that you can hide all the gory details in a library. Also add rounding and precision to the laundry list. Slightly surprised Java’s core libraries don’t cover this better.
Use string representations on the wire rounded to appropriate levels of precision. Sure there is an overhead in conversion but at least its accurate.
A reminder of sizes:-
- Integer = 32 bits, 4 bytes = 2^31 =+- 2147483648 = 2.14748E+9
- Long = 64 bits, 8 byes = 2^63 = +-9.22337E+18
- Float = 32 bits, 4 bytes = +-3.5E+38 and +-1.4E-45, about 7 digits
- Double = 64 bits, 8 bytes = +-1.7E+308 and +-4.9E-324, about 15 digits
- String = words, 2 bytes, short

References

Java theory and practice: Where’s your point?
Java’s new math, Part 2: Floating-point numbers
Java float / double comparison

Low Latency Java Blogs and Websites

Posted on 13th January 2014 by peter2014

Here are some interesting low latency blogs and websites I’ve been reading:-

http://psy-lob-saw.blogspot.co.uk Various Low latency
http://mechanical-sympathy.blogspot.co.uk Various Low latency
http://mentablog.soliveirajr.com Various Low latency
http://jeremymanson.blogspot.co.uk Java Concurrency
http://alexanderpodelko.com Performance Management
http://bigocheatsheet.com Algorithm Speed
http://highscalability.com Mainly for Web Site
http://www.low-latency.com Aggregator website includes big data and analytics
http://www.hftreview.com Aggregator website
http://java-performance.com A bunch of tricks here
http://vanillajava.blogspot.co.uk Writing-and-testing-high-frequency-trading-engines-in-java
OpenHFT

Low Latency Java Techniques

Posted on 12th January 2014 by peter2014

Here are a list of web pages I’ve been reading recently:-

Disruptor (lock free and garbage free) see also
Pinning Threads
Pinning Processes
JIT Warming (but warming time can be complex)
Memory Mapped IPC (within hosts)
Lock Free Algorithms and Courses
Inter Socket IPC with UDP (between hosts)
SDP with Infiband (if you have the hardware)
Java Large Pages (not always faster)
Roll Your Own Everything including fix layer
Immutable objects, but not for really low latency as need to create objects in the first place
Cache friendly data structures cache misses are expensive
O(1) or O(log2 n) at worst algorithms
Single writer principle, all others are readers
Number of threads = number of cores + 1
Keep call stacks small and don’t use Spring as GC is harder with a big stack
No locks or IO on the main flow
Keep the flow simple and uncluttered
Buy big machines with large RAM to avoid GC all together
Use Unsafe for low level data manipulation and compare and swap.

In due course I will be adding posts for each of these topics.

SMR

Kotlin | Spring | Python | Java | Scala | C++ | Linux | IOS | Agile BDD | Low Latency Trading | Market Data

Category Archives: Java