PATRICIA Trie / by Your Name

Practical Algorithm to Retrieve Information Coded in Alphanumeric
A radix tree, Patricia trie/tree, or crit bit tree is a specialized set data structure based on the trie that is used to store a set of strings. In contrast with a regular trie, the edges of a Patricia trie are labelled with sequences of characters rather than with single characters. These can be strings of characters, bit strings such as integers or IP addresses, or generally arbitrary sequences of objects in lexicographical order. Sometimes the names radix tree and crit bit tree are only applied to trees storing integers and Patricia trie is retained for more general inputs, but the structure works the same way in all cases. -- Wikipedia
This blog post is not about the technical details of the PATRICIA Trie but about the historical background of the implementation I've just released on Google Code.

Some time in late spring of 2005 I was skimming through Robert Sedgewick's book "Algorithms in Java" and the PATRICIA Trie caught my eye (Chapter 15.3, Page 650). I thought it's a neat Algorithm and started to work on an implementation that extends the Map interface. The remove operation turned out to be very tricky. I couldn't find any books or other implementations that would either explain how to do it or do it more efficiently than O(n) (i.e. rebuilding the Trie). I spent quite a few nights with figuring out how to do it and came up with a couple of solutions. It's probably worth a dedicated blog post.

I sent it to LimeWire along with a patch but they didn't use it before I started my internship at LimeWire in late 2005. The same summer we were talking about DHTs and in particular Kademlia. I kind of figured that PATRICIA can be used for the routing table but it was too early to talk about it and I was too busy with writing my final paper for school but that is how the Mojito DHT [1] project was eventually started in December 2005.

Sam Berlin returned in 2006 to LimeWire and he was looking for ways to filter IP-Addresses very efficiently. We were facing the problem that each Message needed to be matched against a huge blacklist of ranges with roughly 100k entries.

That is how he got interested in PATRICIA. He extended it by the SortedMap interface and more importantly he figured out how to implement the various views like entrySet or the Iterators without having to pre-traverse the entire Trie.

We decided to contribute it to the Apache Commons Collections and Google Collections Library but they are quite slow with picking up new things.

I have therefore started a new Google Code project that is hosting the most recent version of the Trie. You can find and download it from here.