Jim Driscoll's Blog

Notes on Technology and the Web

Bacon Numbers with Java 8, part 4

leave a comment »

This fourth part of the series of posts on getting a Bacon Number with Java 8 will probably be a bit of an anticlimax. As a reminder, we’re covering:

Given a large hash table whose keys are movie names and whose values 
are a list of actors in those movies, write a function to determine 
the Bacon Number of a particular actor.

We created sample data with Stream.generate, we also created a micro benchmark with functional interfaces, to measure how we were doing. And last post, we modified the data to be easier to search.

As the last step, we actually need to search that modified data. Since we only want to find the closest relationship, a Breadth First Search is certainly the way to go.

Set<Integer> visited = new HashSet<>();
Integer result = null;
Set<Integer> degreelist = actors.get(firstactor);
if (degreelist == null || actors.get(secondactor) == null) {   //short circuit
    result = -1;
} else {
    int degree = 1;
    while (result == null) {
        if (degreelist.contains(secondactor)) {
            result = degree;
            break;
        }
        degree++;
        if (degree > DEEPESTSEARCH) {
            result = DEEPESTSEARCH;
            break;
        }
        visited.addAll(degreelist);
        Set<Integer> nextDegreelist = new HashSet<>();
        degreelist.stream().forEach((i) -> {
            nextDegreelist.addAll(actors.get(i));
        });
        nextDegreelist.removeAll(visited);
        if (nextDegreelist.isEmpty()) {
            result = -1;
            break;
        }
        degreelist = nextDegreelist;
    }
}

Assuming we’re starting with a Map called actors, and two ints for the first and second actors to find. Create a Set of all visited actors (starting at the first one), and then just start iterating through each level, looking for the second one. When we find it, we’re done. Note that I use a forEach on the Stream from each degree to build the Set that I search. But we can do better:

        Set<Integer> visited = new HashSet<>();
        Integer result = null;
        Set<Integer> degreelist = actors.get(firstactor);
        if (degreelist == null || actors.get(secondactor) == null) {   //short circuit
            result = -1;
        } else {
            int degree = 1;
            while (result == null) {
                if (degreelist.contains(secondactor)) {
                    result = degree;
                    break;
                }
                degree++;
                if (degree > DEEPESTSEARCH) {
                    result = DEEPESTSEARCH;
                    break;
                }
                visited.addAll(degreelist);
                Set<Integer> nextDegreelist = degreelist.parallelStream().unordered()
                        .flatMap((i) -> actors.get(i).stream())
                        .collect(Collectors.toSet());
                nextDegreelist.removeAll(visited);
                if (nextDegreelist.isEmpty()) {
                    result = -1;
                    break;
                }
                degreelist = nextDegreelist;
            }
        }

Here, I build the Set I’m going to search through using a unordered parallelStream. I collapse the results of that Stream using flatMap into a Stream of Integers, then collect it at the end into a Set. All this is much more efficient, and results in a clock time savings of about 50% on an 8 core machine.

A word about what map and flatMap do, since I haven’t mentioned flatMap before: map will take a result on your Stream, and map it to a different value, applying the function you supply to each element of the Stream. Similarly, flatMap does the same thing, but it’s built to work with Streams as the returned object, and will unpack them from a Stream of Streams (of some Class) into a single continuous Stream of the Class. So this mapping function is taking a Stream of Integers, using them to look up a Set of Integers, and then unpacking that into individual Integers that it then returns as a Stream. All in one concise function.

At the end of that flatMap process, I bundle that Stream back up into a Set via the call to collect, and that’s what’s used for the next search.

And there you have it: We’ve not just found the Bacon Number as specified, we’ve enabled discovering any relationship degree at all.

All the source code for this and the previous 3 posts can be found in the gist. I’d love to know what you thought.

About these ads

Written by jamesgdriscoll

January 11, 2014 at 3:52 PM

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 413 other followers

%d bloggers like this: