Jim Driscoll's Blog

Notes on Technology and the Web

Java 8 Streams

leave a comment »

A short diversion from blogging about Web technologies: I was reading about some common programming questions asked in interviews, and I came across this problem: “Determine if a word is a palindrome”.

Leaving aside whether you want to really be testing anyone but fresh grads with a fresh-grad level problem, it did occur to me that I hadn’t done anything like this in many, many years. And Java has some new features, right? Why not try them out? So, inspired by Cay Horstman’s blog post on Java 8 (and aided by his excellent book Java SE 8 for the Really Impatient), I took a stab at finding all the palindromes in English. (This code is available as a gist.)

    Path dict = Paths.get("/usr/share/dict/words");
    Stream wordStream = Files.lines(dict);
    long palindrome = wordStream
        .filter(TestPalindrome::isPalindrome ) // find all palindromes
        .peek(System.out::println)  // write each match 
        .count(); // terminal - return a count
    System.out.println("Count of palindromes: "+palindrome);

If you’re not familiar with Java 8 Streams, well, it’s actually kind of a treat: It’s become incredibly easy to generate lists of various objects (in this case, Strings) from input, convert back and forth between formats, filter results, and get summaries. You just need to get used to a slightly different syntax.

Let’s go through the code above. Find the list of all English words (on my (Mac) system, it’s in /usr/share/dict/words), get that list as a Stream of Strings. Then filter that list, via a test function (isPalindrome) that returns a boolean. I use peek to print the matching list to standard out, then get a count of the number of palindromes. By the way – according to this program, it’s only 161 words – who knew?

The two code sections containing :: are Method References, new to Java 8. They function like function pointers in many other languages (though they’re better typed, since this is Java). peek(System.out::println) means “for each String in the filtered Stream, call System.out.println, passing as a parameter the each String in turn”. Similarly, for TestPalindrome::isPalindrome – I had to use the class name instead of being able to use this, since it was called in a static method (the main method of the program).

The code to actually determine the palindrome is called inside the filter, and looks like this:

    private static boolean isPalindrome(String word) {
        boolean isEven = word.length() % 2 == 0;
        return isPalindrome(word, isEven, isEven ? 0 : 1);
    }
    
    private static boolean isPalindrome(String word, boolean isEven, int offset) {
        int midpoint = word.length() / 2;
        if (offset > midpoint + (isEven ? -1 : 0) ) {
            return true;
        } 
        char xchar = word.charAt(midpoint - offset + (isEven ? -1 : 0));
        char ychar = word.charAt(midpoint + offset);
        if (xchar != ychar) {
            return false;
        }
        return isPalindrome(word, isEven, offset + 1);
    }

This code could certainly be shrunk down further, but I thought it struck the right balance of readability and performance. The entire run to determine every palindrome in the English language now takes about a second, of which, 10% of that is spent printing them. Moore’s law continues to catch me off guard. (This is on a brand new top of the line MacBook Pro, though, so your milage may vary. But it’s still a far cry from the execution time which would have been measured in minutes that I still remember from… well, a long time ago.)

One last note about this code: It’s trivial to change this from a sequential examination of all 200k words in the dictionary to a parallel search – just add .parallel() after either the creation of the stream from Files or the beginning of the Stream chain, before .filter. Can you guess how much faster this makes the code? I bet you’re wrong (I was!): It’s about 10% slower, on a machine with 8 cores. And if you don’t know why that is, just think about this: How expensive is it to set up all those threads? How cheap is the Palindrome function? The answer lies in the balance of those two. So, lesson relearned (for me): Always test before assuming performance gains.

Hope this was interesting. Check out Cay’s book – it’s a good overview of what’s changed. (It’s currently in rough draft form, and only available from Safari Books Online, but expected publishing date is listed as Jan 17, so it might be more widely available by the time you read this.)

Written by jamesgdriscoll

January 2, 2014 at 5:36 PM

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 413 other followers

%d bloggers like this: