John Cairns bio photo

John Cairns

John is an engineer, architect and mentor who focuses on extremly performance sensitive and ultra high volume applications.

Email Twitter Github

Conversant R-Tree is an index for high dimensional range data. It is well suited for indexing spatial data and provides efficient range searching across all dimensions. The Conversant R-Tree is a hyperdimensional implementation that supports data with arbitrarily large numbers of orthogonal relations cast as dimensions.

In the simplest case, an R-Tree is comparable with a B-Tree, with bounding rectangles for entries. Each node of the tree contains a rectangle representing the minimum bounds of all of the child rectangles. Querying this tree is possible, with a simple containment test in the bounding rectangle. First, the root node is checked, then each matching subtree. In a B-Tree only two branches are possible, however an R-Tree extends the B-Tree by checking containment across an arbitrary number of child branches. The R-Tree is constructed in such a way to get a close to optimal arrangement for searching. If instead of 2-D rectangles, data contains third, fourth, or nth dimensions, typical R-Tree implementations are limited. However Conversant R-Tree supports an arbitrarily large number of dimensions. The only requirement is for the implementer to provide a custom ‘contains’ and ‘intersects’ method for their HyperRect.

Conversant’s implementation supports entries of ‘n’ dimensions by appropriately extending the concept of extent (area) and range to a hypercube. We employ a mechanism to create ranges out of our data, allowing us to build and efficiently query a tree of bounding ‘rectangles’ of 20 or more dimensions. A fundamental concept in many R-Tree split mechanism is the extent of the rectangle. We provide several splitting approaches to work with multiple types of data.

Two important decisions need to be made when building an R-Tree. First, the number of entries per node and second how to handle splitting up nodes that fill up. These decisions are largely based on the rectangle sizes and amount of overlap of the bounding regions to be indexed by the tree. We measured the optimal values based on our specific data and included the code for all strategies we evaluated for node-splitting. The table above gives some of the results in our evaluation with split type down the left column and number of child branches across the top row. The number indicates the typical insert or search depth for a tree of 1000 range objects.

Check out a working example or try the code out for yourself by cloning the Conversant R-Tree github project

Further Reading