Jacob Maloney

Jacob is a Senior Software Engineer with the AdServer team at Conversant.

FlameGraphs are an incredibly useful way to visualize profiled software data. The FlameGraph tools aggregate stack trace data into an interactive SVG (click any of the FlameGraph pictures in this post to use them interactively), which helps to visualize the program’s execution as a whole.

flamegraph

A brief description from the author’s website:

The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the stacks. The top edge shows what is on-CPU, and beneath it is its ancestry. The colors are usually not significant, picked randomly to differentiate frames.

FlameGraphs are typically used as a different way of analyzing CPU profiling data (as opposed to a generic profiler). Their ability to visualize the whole program often exposes things that a normal profiler can’t. However, any type of stack trace data can be used to create FlameGraphs, so they can also analyze things like memory allocations, Off-CPU time, Hot/Cold, etc. To learn more about FlameGraphs and existing tools go to the source.

Java Memory Allocation

For JVM users, FlameGraphs are more complicated to use due to the presence of the virtual machine. There are workarounds for CPU profiling that range from using JVM stack traces to mapping JIT-compiled code to perf events. However, I was unable to find anything that profiled memory allocations. In fact, the tooling around memory allocations in Java is pretty sparse in general. You can use jmap to see how many instances and bytes each class occupies, but that doesn’t provide any context about which code allocated those objects.

 num     #instances         #bytes  class name
----------------------------------------------
   1:        175984    28937793560  [B
   2:      18812763      933450896  [C
   3:         47520      630796472  [J
   4:      21819420      523666080  java.lang.String
   5:        394681      314138640  [Ljava.lang.Object;
... 

The HPROF tool can provide heap allocation profiles and stack traces through the heap=sites option.

SITES BEGIN (ordered by live bytes) Fri Oct 22 11:52:24 2004
          percent          live          alloc'ed  stack class
 rank   self  accum     bytes objs     bytes  objs trace name
    1 44.73% 44.73%   1161280 14516  1161280 14516 302032 java.util.zip.ZipEntry
    2  8.95% 53.67%    232256 14516   232256 14516 302033 com.sun.tools.javac.util.List
    3  5.06% 58.74%    131504    2    131504     2 301029 com.sun.tools.javac.util.Name[]
    4  5.05% 63.79%    131088    1    131088     1 301030 byte[]
    5  5.05% 68.84%    131072    1    131072     1 301710 byte[]

However, in practice the overhead of measuring every allocation is often too high. To reduce that overhead, the code is instrumented so that every memory allocation calls into a sampling framework, which can take a stack trace that identifies the code path responsible for the allocation. This stack trace data can then be used to construct a FlameGraph that represents the memory allocation profile of the program.

flamegraph

In this example, a standalone instance of HBase was instrumented and YCSB stress tests were run against it. You can see there are two major memory-allocating parts of this application: the RpcServer and the RpcExecutor. In the RpcServer you can see a few things: KQueueSelectorImpl (because I ran the test on a Mac) converting large numbers of ints -> Integers, HeapByteBuffers being allocated, and protobuf parsing.

How To Run

To build your own Java memory allocation FlameGraphs, you first need to get the Java agent and build it.

$ git clone https://github.com/jmaloney10/allocation-instrumenter.git
$ cd allocation-instrumenter
$ mvn clean package

The agent jar will be built in the target directory:

$ ls -l target
total 248
drwxr-xr-x  16 jmaloney  admin    544 Aug 18 10:07 apidocs
drwxr-xr-x   4 jmaloney  admin    136 Aug 18 10:07 classes
drwxr-xr-x   3 jmaloney  admin    102 Aug 18 10:07 generated-sources
drwxr-xr-x   4 jmaloney  admin    136 Aug 18 10:07 jarjar
-rw-r--r--   1 jmaloney  admin  74471 Aug 18 10:07 java-allocation-instrumenter-3.0-SNAPSHOT-javadoc.jar
-rw-r--r--   1 jmaloney  admin  34017 Aug 18 10:07 java-allocation-instrumenter-3.0-SNAPSHOT-sources.jar
-rw-r--r--   1 jmaloney  admin   9327 Aug 18 10:07 java-allocation-instrumenter-3.0-SNAPSHOT.jar
drwxr-xr-x   4 jmaloney  admin    136 Aug 18 10:07 javadoc-bundle-options
drwxr-xr-x   3 jmaloney  admin    102 Aug 18 10:07 maven-archiver
drwxr-xr-x   3 jmaloney  admin    102 Aug 18 10:07 maven-status
drwxr-xr-x   3 jmaloney  admin    102 Aug 18 10:07 original-classes

The java-allocation-instrumenter-3.0-SNAPSHOT.jar file is the jar to specify in the -javaagent parameter, and it is also the jar that contains the FlameCollapse utility. You will need to add the -javaagent parameter to your JVM start command (substituting <path> with yours):

$ java -javaagent:<path>/java-allocation-instrumenter-3.0-SNAPSHOT.jar=<path>/flame.properties ...

This will instrument your JVM and generate the stack trace information needed to build FlameGraphs. By default the stack trace information is written to /tmp/stacks.txt, but this is configurable in your flame.properties (see the readme for all available properties). After you have gathered enough stack trace information you are ready to turn it into a FlameGraph! There are two ways to do this: use the helper generate.sh script, or manually call the FlameCollapse and flamegraph.pl tools. Using the generate.sh script is easy:

$ ./generate.sh -i /tmp/stacks.txt -f my_flamegraph.svg

This takes the raw stack trace data generated by the agent as parameter -i and writes the resulting FlameGraph to the file specified by -f. You can then view the interactive SVG in a web browser. If you look inside generate.sh you will see that it’s a very simple wrapper which does the collapse -> flamegraph steps for you. If you want to use custom flamegraph.pl options, you can run the steps manually:

$ java -jar target/java-allocation-instrumenter-3.0-SNAPSHOT.jar /tmp/stacks.txt
$ FlameGraph/flamegraph.pl --inverted collapsed.txt > my_flamegraph.svg
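
To make the collapse step more concrete, here is a minimal sketch of the idea. It is not the project's FlameCollapse utility, and it does not parse the agent's raw /tmp/stacks.txt format. It assumes each sample is already available as a root-first list of frame names plus a byte count, and `FoldStacksSketch` and `Sample` are hypothetical names. What it does show is the folded format that flamegraph.pl reads: frames joined by semicolons, then a space, then a numeric value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not the project's FlameCollapse class.
final class FoldStacksSketch {

    /** One sampled allocation: frames ordered root first, plus bytes allocated. */
    static final class Sample {
        final List<String> frames;
        final long bytes;

        Sample(List<String> frames, long bytes) {
            this.frames = frames;
            this.bytes = bytes;
        }
    }

    /** Sums the bytes of all samples that share an identical stack. */
    static Map<String, Long> fold(List<Sample> samples) {
        Map<String, Long> folded = new TreeMap<>(); // sorted keys give stable output
        for (Sample s : samples) {
            String key = String.join(";", s.frames); // "root;child;leaf"
            folded.merge(key, s.bytes, Long::sum);
        }
        return folded;
    }

    /** Renders one "stack value" line per distinct stack. */
    static String render(Map<String, Long> folded) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Long> e : folded.entrySet()) {
            out.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
            new Sample(Arrays.asList("main", "handle", "parse"), 64),
            new Sample(Arrays.asList("main", "handle", "parse"), 32),
            new Sample(Arrays.asList("main", "flush"), 16));
        System.out.print(render(fold(samples)));
    }
}
```

Weighting each stack by bytes rather than by sample count is a design choice in this sketch: it makes frame width proportional to memory allocated instead of to the number of allocations.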

How It Works

The Java agent, based largely on the Allocation Instrumenter library, instruments classes by inserting calls to the AllocationRecorder framework after every memory-allocating bytecode.
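
The effect of that instrumentation can be pictured at the source level. The real agent rewrites bytecode when classes are loaded; the sketch below writes the inserted call by hand instead. `Recorder` and `recordAllocation` are hypothetical stand-ins for the AllocationRecorder framework, not its actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Source-level picture of instrumentation; all names here are hypothetical.
final class InstrumentedViewSketch {

    /** Stand-in for the recorder that instrumented code calls into. */
    static final class Recorder {
        static final List<String> recorded = new ArrayList<>();

        static void recordAllocation(Object newObject) {
            // Store the JVM class name, e.g. "[B" for byte[] (as seen in jmap output).
            recorded.add(newObject.getClass().getName());
        }
    }

    /** What the application author wrote. */
    static byte[] original() {
        return new byte[1024];
    }

    /** Roughly what runs after instrumentation: same allocation, plus a callback. */
    static byte[] instrumented() {
        byte[] buf = new byte[1024];
        Recorder.recordAllocation(buf); // the call an agent would insert
        return buf;
    }

    public static void main(String[] args) {
        original();     // not recorded
        instrumented(); // recorded once
        System.out.println(Recorder.recorded);
    }
}
```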

instrumenting

To reduce the overhead of the AllocationRecorder framework, various sampling strategies can be employed to limit the number of allocations measured. When a sample is taken, the AllocationRecorder creates an AllocationEvent, which contains the object’s size (via Instrumentation.getObjectSize()), the allocation site (via a stack trace), and the name of the class allocated. The event is then offered to a queue.
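
One possible sampling strategy is to record every Nth allocation. The sketch below illustrates that idea under stated assumptions: `EveryNthSampler` and this `AllocationEvent` are hypothetical and may not match the agent's own classes, and the object size is passed in as a parameter because Instrumentation.getObjectSize() is only available inside a running agent.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; the real agent's classes may differ.
final class SamplingSketch {

    /** Data captured for one sampled allocation. */
    static final class AllocationEvent {
        final String className;
        final long sizeBytes;
        final StackTraceElement[] stack;

        AllocationEvent(String className, long sizeBytes, StackTraceElement[] stack) {
            this.className = className;
            this.sizeBytes = sizeBytes;
            this.stack = stack;
        }
    }

    /** Samples every Nth allocation and offers it to a bounded queue. */
    static final class EveryNthSampler {
        private final int interval;
        private final AtomicLong seen = new AtomicLong();
        private final BlockingQueue<AllocationEvent> queue;

        EveryNthSampler(int interval, BlockingQueue<AllocationEvent> queue) {
            this.interval = interval;
            this.queue = queue;
        }

        /** Returns true if this allocation was sampled and enqueued. */
        boolean onAllocation(String className, long sizeBytes) {
            if (seen.incrementAndGet() % interval != 0) {
                return false; // skipped: the costly stack trace is never taken
            }
            StackTraceElement[] stack = new Throwable().getStackTrace();
            // offer() never blocks; if the queue is full the sample is dropped.
            return queue.offer(new AllocationEvent(className, sizeBytes, stack));
        }
    }

    public static void main(String[] args) {
        BlockingQueue<AllocationEvent> queue = new ArrayBlockingQueue<>(1024);
        EveryNthSampler sampler = new EveryNthSampler(10, queue);
        for (int i = 0; i < 100; i++) {
            sampler.onAllocation("byte[]", 128);
        }
        System.out.println("sampled events: " + queue.size());
    }
}
```

Offering to a bounded queue rather than blocking means the allocating thread is never stalled by the profiler; the trade-off is that samples are lost if the consumer falls behind.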

sampling

Events on the queue are consumed by the FlamePrinter thread, which parses the allocation information and writes it to a log that later tooling can consume to create a FlameGraph.
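
The consumer side can be sketched in the same spirit. This is not the FlamePrinter implementation: `PrinterSketch`, its `Event` type, the `STOP` marker, and the one-line-per-event log format are all assumptions made for illustration. The point is only the shape of the loop: a dedicated thread blocks on the queue and writes each event out, so the allocating threads never perform I/O themselves.

```java
import java.io.PrintWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical consumer loop; names and log format are assumptions.
final class PrinterSketch {

    /** Simplified event: an already-joined stack string and a byte count. */
    static final class Event {
        final String stack;
        final long bytes;

        Event(String stack, long bytes) {
            this.stack = stack;
            this.bytes = bytes;
        }
    }

    /** Marker object that tells the consumer to stop. */
    static final Event STOP = new Event("", -1);

    /** Takes events until STOP is seen; returns how many lines were written. */
    static int drain(BlockingQueue<Event> queue, PrintWriter log) {
        int written = 0;
        try {
            while (true) {
                Event e = queue.take(); // blocks until an event is available
                if (e == STOP) {
                    break;
                }
                log.print(e.stack + " " + e.bytes + "\n");
                written++;
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
        log.flush();
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        PrintWriter log = new PrintWriter(System.out);
        Thread printer = new Thread(() -> drain(queue, log), "printer-sketch");
        printer.start();
        queue.put(new Event("main;handle;parse", 64));
        queue.put(new Event("main;flush", 16));
        queue.put(STOP);
        printer.join();
    }
}
```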

printing