I've recently been looking at location based services for mobile devices. In particular I wanted to implement a service that my Android based application could call, providing 'my location', and find the location of nearest points of interest. In my example, I've used UK train stations.
The brute force approach would certainly work – looping through all stations and calculating the distance from my location and then sorting by distance – but that of course would not be a scalable solution.
Instead, after doing a bit of research, I decided use a kd-tree to model the station locations and implement a search algorithm to find the nearest stations. This is based on a paper by Andrew Moore.
This paper discusses a general algorithm for finding the nearest neighbours in k-dimensional Cartesian space. The following example shows how the algorithm works for finding a single nearest neighbour in 2-d space.
A 2 dimensional kd-tree divides the space into rectangles, each node representing a point of interest that divides the space up on either x or y axis at each level of the tree.
So the root node, [5,14] divides the space on the y-axis into two areas. The left side of the tree has y-values of less than 14 and the right side of the tree has y-values greater than 14. The second level of the tree splits the space along the x-axis, so the left branch, below [22,7], has x-values less than 22 and the right branch has x-values greater than 22; similarly for [12,22], that splits at x-value 12.
Therefore it is easy to traverse down the tree picking the “nearer” branch below each node, that is the branch in which our target lays, and calculating which of the nodes is the candidate “nearest” point.
So if our target point is [7,6], then at the root, [5,14], we'd look in the left branch of the tree since 6 is less than 14, then at [22,7] we'd look again in the left branch since 7 is less than 22; and we find our nearest neighbour is [4,4].
However, if we consider a target point of [32,12] then the initial traversal doesn't give us the actual nearest neighbour. It finds [34,6]. The real nearest neighbour is [30,16] but it was in the “further” branch of the tree when we considered where to traverse at the root.
The algorithm described by Moore accounts for this. It back tracks and considered each “further” branch of the tree to determine if a nearer neighbour than the current candidate could exist in that branch. i.e. does any of the rectangle defined by the “further” branch intersect with the circle around the target point with a radius equal to the distance from the target to the candidate nearest point.
Therefore, after finding the initial candidate nearest point, [34,6], we knows that any nearer point must be within the circle, which shaded blue in the diagram.
We back track to [22,7] and considers the “further” branch (the left hand branch in this case). In this case, we can see that there is no intersection between the rectangle defined by the “further” branch, here shaded red, and the circle and therefore we can immediately discount it. There can be no nearer point in that branch of the tree.
We again back track, this time to the root [5,14] and consider the “further” branch, shaded yellow, and find that this does insect with the circle, so we traverse this branch, to see if there are any nearer points than the current candidate. In this case we do find that [30,16] is nearer.
This algorithm work as it is for Cartesian space. However, I needed to adjust it to work with the spherical co-ordinate system of the earth. I defined my kd-tree to be based on longitude and latitude and used the Haversine formula to calculate distances between points and “my location”. However, the calculation to see if the “rectangle” defined by a branch of the tree intersects the circle around my location is not so straight-forward. For example one must consider how it works around the poles and at -/+ 180 degrees longitude.
In a Cartesian system, points A and B are not considered to be near to each other. However, in reality they are when seen on the sphere.
Luckily there are some excellent libraries available that make this simple. In this case I picked up the OpenMap library which provides a function that does exactly the intersection calculation I needed.
With this, it was easy to implement a variant of Moore's algorithm that works with latitude and longitude co-ordinates.
In my next post I'll discuss converting from the Ordnance Survey national grid co-ordinates I had from the publicly available NaPTAN data for UK rail stations into latitude and longitude co-ordinates for my nearest neighbour algorithm and into correct latitude and longitude system for use with GPS and mobile device mapping.