John Cairns bio photo

John Cairns

John is an engineer, architect and mentor who focuses on extremly performance sensitive and ultra high volume applications.

Email Twitter Github

I’m used to people saying “don’t sweat the small stuff.” As a reformed C programmer writing Java, I’ve mostly gotten myself comfortable doing things the Java way. That means I usually try to structure my code to give the JVM the best advantage in terms of the optimization it can do. I typically leave fine tuning for rainy days and times when I want to have a whiteboard debate involving half of the team calling me names.

Graph: Decreased Process Time

The graph shows the process time for our application in milliseconds dropping as a result of the small change described in this article.

On the other hand habitually ignoring slow code is a recipe for mediocrity in a large system. I recently took a look at the profile results for our latest release. I like to make sure nothing “unusual” sticks out. Interestingly I noticed something usual sticking out instead. I had frequently overlooked a small method getIpFromString() because I imagined it was small enough and tight enough code that either the JVM would take care of the problem or whatever slight tuning I could make on it would not be valuable on a grand scale.

Nevertheless, the significance of this method in our overall CPU time bothered me enough that I decided to take a quick detour and consider the code in question. First, “Find Usages” revealed that there are over 200 occurrences of this method in our codebase. Apparently we are doing this calculation almost everywhere, and many others were also assuming that our library implementation was “good enough.”

Here is the code I found when I investigated further:

private static final Pattern IP_SPLITTER_PATTERN = Pattern.compile("\\.");

public static int getIpFromString(final String ipString) {
    try {
        final String[] ipStrs = IP_SPLITTER_PATTERN.split(ipString, 0);
        return	(Integer.parseInt(ipStrs[0], 10) << 24) +
		(Integer.parseInt(ipStrs[1], 10) << 16) +
		(Integer.parseInt(ipStrs[2], 10) << 8) +
		Integer.parseInt(ipStrs[3], 10);
	}
	catch (Exception e) {
		return 0;
	}
}

@95th percentile, 360ns

The nice thing about this code is it is fairly concise and easy to understand. However, a few things immediately struck me about this. First and foremost, this code is doing a regex match just to find the “.” I know from experience that regex is generally the most painful and slow way to do string analysis in Java. But, this is crushing an ant with a bazooka if ever anybody tried it. To make things worse, Java pours salt in the fresh bazooka wound by allocating an array and four substrings of octets on our behalf. After this, each octet is passed down to Integer.parseInt.

You might reasonably consider that parseInt is beyond reproach for this use case and it is. But when you look at the implementation you see that it is full of protective code and checking for things like negative values. The JDK must handle this, but my code need not. Also, the Achilles heel of Integer.parseInt seems to be that it throws an exception when it fails to parse the input token. Once again this sort of checking is properly the domain of a general purpose Java library but in a system that is parsing the output of machine generated IP addresses, it leads to branching code paths that seem only to serve the Java API requirements.

In the annoyance category, I note that the code is catching Exception rather than the less general NumberFormatException. Of course, Effective Java reasons against this sort of generality.

With resolve, I wrote my own version of this method to replace the established one. My first requirement is that I would jettison any type of String splitting, array creation, or substring based string analysis in favor of a more direct approach. I decided that I could parse integer numbers inline like your typical reformed C programmer would.

Taking into account endianness considerations, I have my solution in short order. With a quick unit test, I’m able to do some performance testing. Here is the replacement code:

 public static int getIpFromString(final String ipString) {
        int octet = 0;
        int digit = 1;

        int ip = 0;
        int upShift = 0;

        for(int i=ipString.length()-1; i>=0; i--) {
            final char c = ipString.charAt(i);
            if(c != '.') {
                octet += Character.getNumericValue(c)*digit;
                digit*=10;
            } else {
                ip |= (octet << upShift);
                upShift += 8;
                octet = 0;
                digit = 1;
            }
        }

        ip |= octet << upShift;

        return ip;
    };

The fundamental approach is the same as parsing a single int. Roll over the string backwards and increase a digit index that gives the power of 10. Finally when we encounter a “.”, simply save the octet in the appropriate location in the expected result. We must increase the shift by 1 byte (8 bits) every time through. After resetting the initial values for parsing the octet, I’m ready to start the loop again. This code suffers slightly from the complexity of having some of the parsing code inlined up into the main loop, however at 14 lines its hardly overweight in terms of method size.

One final catch, you have to grab the last octet, since there are only three dots and 4 octets, the last one will tarry if you don’t merge that down. Of course, I added that code after my unit test failed.

What were the results of this exploration? My benchmark comparison of the two original methods revealed that my code is easily a factor of 3 speedup on the original code. I pushed that to a factor of 5 by switching out the Character.getNumericValue(c) with the far simpler ASCII equivalent. Sorry UNICODE aficionados, the “Java Way” strikes again!

 public static int getIpFromString(final String ipString) {
        int octet = 0;
        int digit = 1;

        int ip = 0;
        int upShift = 0;

        for(int i=ipString.length()-1; i>=0; i--) {
            final char c = ipString.charAt(i);
            if(c != '.') {
                octet += (c - '0')*digit;
                digit*=10;
            } else {
                ip |= (octet << upShift);
                upShift += 8;
                octet = 0;
                digit = 1;
            }
        }

        ip |= octet << ups;

        return ip;
    };

95th percentile, 52ns

Notice the implementation of getNumericValue() is deeply nested in method calls that can’t possibly all be inlined properly in any real world execution scenario. In my testing I noticed a reduction in overall system process time, which I think is fairly amazing given the small fraction of our code that this tiny method represents.

But the most amazing result is this tiny change lowered GC bandwidth utilization considerably. Our peak young generation utilization went down by roughly a factor of 2. Considering that this problem shouldn’t properly involve any memory allocations in the first place, that is a nice improvement.

PostScript

What happened to getIpFromString() in our system profiler results? It went from 10th to not found! Not bad for a quick detour.

Benchmark

The benchmark results reported above were obtained by running 4 million samples over 4096 random generated IP addresses. 95th percentile is reported, but other percentiles are consistent. The hardware is a notebook with Intel Core i7-4600U.