FlameGraphs are an incredibly useful way to visualize profiling data. The FlameGraph tools aggregate stack trace data into an interactive SVG (click any of the FlameGraph pictures in this post to use them interactively), which helps to visualize the program’s execution as a whole.
A brief description from the author’s website:
The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the stacks. The top edge shows what is on-CPU, and beneath it is its ancestry. The colors are usually not significant, picked randomly to differentiate frames.
FlameGraphs are typically used as a different way of analyzing CPU profiling data (as opposed to a generic profiler). Because they show the whole program at once, they often expose things that are hard to spot in a normal profiler’s output. However, any type of stack trace data can be used to create FlameGraphs, so they can also analyze things like memory allocations, Off-CPU time, Hot/Cold, etc. To learn more about FlameGraphs and existing tools, go to the source.
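To make the "wider means more frequent" rule concrete, here is a minimal sketch in Java. It assumes the folded-stack text format that flamegraph.pl reads (frames joined by semicolons, root first, followed by a space and a sample count). The FrameWidths class and its widths method are hypothetical names for illustration, not part of the FlameGraph tools.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the FlameGraph tools.
final class FrameWidths {
    // For every frame path (root down to some frame), sums the counts of all
    // stacks that pass through it. That total is the rectangle's width in samples.
    static Map<String, Long> widths(Iterable<String> foldedLines) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys give stable output
        for (String line : foldedLines) {
            int split = line.lastIndexOf(' ');
            String[] frames = line.substring(0, split).split(";");
            long count = Long.parseLong(line.substring(split + 1).trim());
            StringBuilder path = new StringBuilder();
            for (String frame : frames) {
                if (path.length() > 0) {
                    path.append(';');
                }
                path.append(frame);
                totals.merge(path.toString(), count, Long::sum);
            }
        }
        return totals;
    }
}
```

Given the lines "main;a;b 3", "main;a;c 2" and "main;d 5", the path main gets a width of 10, main;a gets 5, and main;a;b gets 3, which is why a parent frame is always at least as wide as its children.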
Java Memory Allocation
For JVM users, FlameGraphs are more complicated to use due to the presence of the virtual machine. There are work-arounds for CPU profiling that range from using JVM stack traces to mapping JIT-compiled code to perf events. However, I was unable to find anything that profiled memory allocations. In fact, the tooling around memory allocations in Java is pretty sparse in general. You can use jmap to get a histogram of the objects on the heap, but that doesn’t provide any context about what code allocated those objects.
num #instances #bytes class name
----------------------------------------------
1: 175984 28937793560 [B
2: 18812763 933450896 [C
3: 47520 630796472 [J
4: 21819420 523666080 java.lang.String
5: 394681 314138640 [Ljava.lang.Object;
...
The HPROF tool can provide heap allocation profiles and stack traces through the heap=sites option.
SITES BEGIN (ordered by live bytes) Fri Oct 22 11:52:24 2004
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 44.73% 44.73% 1161280 14516 1161280 14516 302032 java.util.zip.ZipEntry
2 8.95% 53.67% 232256 14516 232256 14516 302033 com.sun.tools.javac.util.List
3 5.06% 58.74% 131504 2 131504 2 301029 com.sun.tools.javac.util.Name[]
4 5.05% 63.79% 131088 1 131088 1 301030 byte[]
5 5.05% 68.84% 131072 1 131072 1 301710 byte[]
However, in practice the overhead of measuring every allocation is often too high. To reduce that overhead, the code is instrumented so that all memory allocations call into a sampling framework, which can take a stack trace to record the code path that was responsible for the allocation. This stack trace data can then be used to construct a FlameGraph that represents the memory allocation profile of the program.
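As a rough illustration of the "take a stack trace" step, here is a small Java sketch that turns the current call stack into a single root-first, semicolon-separated string. StackCapture and currentStack are hypothetical names, and the output format is my assumption of what a later collapse step would want; the agent’s real capture code may look quite different.

```java
// Hypothetical helper for illustration; not taken from the agent's source.
final class StackCapture {
    // Returns the caller's stack as "outermost;...;innermost".
    static String currentStack() {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        StringBuilder sb = new StringBuilder();
        // trace[0] is currentStack() itself, so stop at index 1 (the caller).
        // Walking from the end of the array puts the outermost frame first.
        for (int i = trace.length - 1; i >= 1; i--) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(trace[i].getClassName())
              .append('.')
              .append(trace[i].getMethodName());
        }
        return sb.toString();
    }
}
```

Capturing a stack trace is relatively expensive, which is why it makes sense to do it only for the allocations that are actually sampled.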
In this example a standalone instance of HBase was instrumented and YCSB stress tests were run against it. You can see there are two major memory-allocating parts of this application: the RpcServer and the RpcExecutor. In the RpcServer you can see a few things: KQueueSelectorImpl (because I ran the test on a Mac) converting large numbers of ints to Integers, HeapByteBuffers being allocated, and protobuf parsing.
How To Run
To build your own Java memory allocation FlameGraphs, you first need to get the Java agent and build it.
$ git clone https://github.com/jmaloney10/allocation-instrumenter.git
$ cd allocation-instrumenter
$ mvn clean package
The agent jar will be built in the target directory.
$ ls -l target
total 248
drwxr-xr-x 16 jmaloney admin 544 Aug 18 10:07 apidocs
drwxr-xr-x 4 jmaloney admin 136 Aug 18 10:07 classes
drwxr-xr-x 3 jmaloney admin 102 Aug 18 10:07 generated-sources
drwxr-xr-x 4 jmaloney admin 136 Aug 18 10:07 jarjar
-rw-r--r-- 1 jmaloney admin 74471 Aug 18 10:07 java-allocation-instrumenter-3.0-SNAPSHOT-javadoc.jar
-rw-r--r-- 1 jmaloney admin 34017 Aug 18 10:07 java-allocation-instrumenter-3.0-SNAPSHOT-sources.jar
-rw-r--r-- 1 jmaloney admin 9327 Aug 18 10:07 java-allocation-instrumenter-3.0-SNAPSHOT.jar
drwxr-xr-x 4 jmaloney admin 136 Aug 18 10:07 javadoc-bundle-options
drwxr-xr-x 3 jmaloney admin 102 Aug 18 10:07 maven-archiver
drwxr-xr-x 3 jmaloney admin 102 Aug 18 10:07 maven-status
drwxr-xr-x 3 jmaloney admin 102 Aug 18 10:07 original-classes
The java-allocation-instrumenter-3.0-SNAPSHOT.jar file is the jar to specify in the javaagent parameter, and it is also the jar that contains the FlameCollapse utility. You will need to add the -javaagent parameter to your JVM start command (substituting in your own locations for <path>):
$ java -javaagent:<path>/java-allocation-instrumenter-3.0-SNAPSHOT.jar=<path>/flame.properties ...
This will instrument your JVM and generate the stack trace information needed for generating FlameGraphs. By default the stack trace information is written to /tmp/stacks.txt, but this is configurable in your flame.properties (see the readme for all available properties). After you have gathered enough stack trace information you are ready to turn it into a FlameGraph! There are two ways to do this: use the helper generate.sh script, or manually call the FlameCollapse and flamegraph.pl tools. Using the generate.sh script is easy.
$ ./generate.sh -i /tmp/stacks.txt -f my_flamegraph.svg
This takes the raw stack trace data generated by the agent as parameter -i and writes the FlameGraph to the file specified by -f. You can then view the interactive SVG in a web browser. If you look inside generate.sh you will see that it’s a very simple wrapper which does the collapse -> flamegraph steps for you. If you want to use custom flamegraph.pl options you can run the steps manually by doing the following:
$ java -jar target/java-allocation-instrumenter-3.0-SNAPSHOT.jar /tmp/stacks.txt
$ FlameGraph/flamegraph.pl --inverted collapsed.txt > my_flamegraph.svg
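To give a feel for what a collapse step does, here is a minimal Java sketch that merges identical stacks and sums their weights, producing the "stack count" lines that flamegraph.pl accepts. This is not the FlameCollapse source. The raw format the agent writes isn’t described in this post, so the input format below (one "frame;frame;frame bytes" record per line) and the CollapseSketch name are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a collapse step; the input format is assumed.
final class CollapseSketch {
    // Merges records that share a stack and sums their weights.
    static List<String> collapse(List<String> rawLines) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys give stable output
        for (String raw : rawLines) {
            String line = raw.trim();
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                continue; // blank or malformed record: no weight field
            }
            String stack = line.substring(0, split);
            long weight = Long.parseLong(line.substring(split + 1));
            totals.merge(stack, weight, Long::sum);
        }
        List<String> folded = new ArrayList<>();
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            folded.add(entry.getKey() + " " + entry.getValue());
        }
        return folded;
    }
}
```

Using allocated bytes as the weight means the resulting graph shows where the bytes come from; using 1 per record instead would show where the allocation count comes from.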
How It Works
The Java agent, based largely on the Allocation Instrumenter library, instruments classes by inserting calls to the AllocationRecorder framework after every memory-allocating bytecode.
To reduce the overhead of the AllocationRecorder framework, various sampling strategies can be employed to limit the number of allocations measured. When a sample is taken, the AllocationRecorder creates an AllocationEvent which contains the object’s size (via Instrumentation.getObjectSize()), the allocation site (via a stack trace), and the name of the class allocated. The event is then offered to a queue.
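One possible sampling strategy is "record every Nth allocation". The Java sketch below shows that idea together with a bounded queue, assuming hypothetical AllocationEventSketch and EveryNthSampler classes that are not the agent’s real types. The size is passed in as a plain number here because an Instrumentation instance is only available inside an agent.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical event type: class name, size in bytes, and allocation stack.
final class AllocationEventSketch {
    final String className;
    final long sizeBytes;
    final String stack;

    AllocationEventSketch(String className, long sizeBytes, String stack) {
        this.className = className;
        this.sizeBytes = sizeBytes;
        this.stack = stack;
    }
}

// Hypothetical sampler: keeps one allocation out of every `interval`.
final class EveryNthSampler {
    private final int interval;
    private final AtomicLong seen = new AtomicLong();
    private final BlockingQueue<AllocationEventSketch> queue;

    EveryNthSampler(int interval, BlockingQueue<AllocationEventSketch> queue) {
        this.interval = interval;
        this.queue = queue;
    }

    // Called for each allocation; returns true only if an event was queued.
    boolean onAllocation(String className, long sizeBytes, String stack) {
        if (seen.incrementAndGet() % interval != 0) {
            return false; // not sampled
        }
        // offer() never blocks: if the queue is full the sample is dropped,
        // so the allocating thread is not stalled by a slow consumer.
        return queue.offer(new AllocationEventSketch(className, sizeBytes, stack));
    }
}
```

With an interval of 3 and a queue capacity of 2, nine allocations produce three samples, of which the first two are queued and the third is dropped because the queue is full.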
Events on the queue are consumed by the FlamePrinter thread, which parses the allocation information and writes it to a log that later tooling can consume to create a FlameGraph.
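The consumer side can be sketched as a simple loop that takes events off a queue and writes one line per event. FlamePrinterSketch, its nested Event class, the STOP marker, and the "stack bytes" line format are all hypothetical; the real FlamePrinter and its log format may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;

// Hypothetical consumer for illustration; not the agent's FlamePrinter.
final class FlamePrinterSketch implements Runnable {
    static final class Event {
        final String stack;
        final long sizeBytes;

        Event(String stack, long sizeBytes) {
            this.stack = stack;
            this.sizeBytes = sizeBytes;
        }
    }

    // Marker event that tells the loop to finish (compared by identity).
    static final Event STOP = new Event("", 0);

    private final BlockingQueue<Event> queue;
    private final Writer out;

    FlamePrinterSketch(BlockingQueue<Event> queue, Writer out) {
        this.queue = queue;
        this.out = out;
    }

    @Override
    public void run() {
        try {
            // take() blocks until an event is available.
            for (Event e = queue.take(); e != STOP; e = queue.take()) {
                out.write(e.stack + " " + e.sizeBytes + "\n");
            }
            out.flush();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Doing the formatting and file I/O on a dedicated thread keeps that work off the application threads that are doing the allocating.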