Kurser i Domain-Driven Design - Våren 2012




Tuesday, October 27, 2009

From Java to Clojure Followup

Interest in the From Java to Clojure article has been pretty big, rarely before have we seen a PNEHM!-article getting so much attention!

I have also gotten some great feedback, in particular Stuart Halloway suggests that we use Clojure-contrib's seq-utils/frequencies to improve the code. Since we use Apache Commons Collections in the Java version of the code, its only fair for us to dive into the goodness of Clojure-contrib to see what can be of use there.

Let us start by a quick review of what our final version of the Clojure code looked like:

(ns step4.pnehm.clojure-orderer)

(defn count-words [coll]
(reduce #(merge-with + %1 {%2 1}) {} coll))

(defn cmpr [[val1 freq1] [val2 freq2]]
(let [freq (compare freq2 freq1)]
(if-not (zero? freq) freq (compare val1 val2))))

(defn order-by-freq [coll]
(keys (sort cmpr (count-words coll))))

As it turns out, clojure.contrib.seq-utils/frequencies does exactly what our function count-words does, as such we can use it as a drop-in replacement. A version which uses contrib now looks like this:

(ns withcontrib.pnehm.clojure-orderer
(:use clojure.contrib.seq-utils))

(defn cmpr [[val1 freq1] [val2 freq2]]
(let [freq (compare freq2 freq1)]
(if-not (zero? freq) freq (compare val1 val2))))

(defn order-by-freq [coll]
(keys (sort cmpr (frequencies coll))))

We have to make sure we :use clojure.contrib.seq-utils, and then we can replace our call to count-words, with a call to frequencies.

Now, for extra credits, let's look inside the frequencies function in Clojure-contrib, to see what i looks like:

(defn frequencies
"Returns a map from distinct items in coll to the number of times
they appear."
[coll]
(reduce (fn [counts x]
(assoc counts x (inc (get counts x 0))))
{} coll))

The implementation is quite different from our own, it feels a bit more straight-forward and intuitive. Initially, an empty map is created. As the collection is reduced a copy of the map is created for each processed item and the item is added with an incremented count (if the item already is in the map) or added to the new map with a count of one (if it is the first time the item is processed). get, gets a value from the map given a key, if there is no match the default, '0', is returned. inc increments the value, and assoc associates the value to a key in the map.

Not only is this version simpler than our own (which is good), it's also much faster (also good). Using seq-utils/frequencies a sample run with our micro-benchmark now looks like this (sorting 100 characters with 10000 samples):

Java: 120 ms
Groovy: 538 ms
Time Clojure: 563 ms

Excellent!

So, Joshua Bloch's item 47 in Effective Java (2 ed) applies as always: Know and use the libraries. If you get the feeling that someone must have done what you are about to do before you, someone most probably have.

Many thanks to Stuart Halloway (who will be at Øredev next week, don't miss it!).

The sources at http://code.google.com/p/pnehm-java-to-cool-language-x/ have been updated with the alternative version.

No comments: