Google maps has surely made life easy for all of us hasn’t it. Whether it is to explore a new city, planning a road trip, find the quickest route to the airport, avoid traffic on the way to office! Just type in your destination and you have the perfect route! Seems so simple!! But all that goes on in the background from typing your destination to getting your route is anything BUT simple. So let's have a look at what does go on behind the scenes. Lets design Google Maps today!
The solution to most of these challenges lies in Dynamic Programming. And this is where we will introduce something called Segment.
A segment is a small area that we can operate on easily. For example, a city can be divided into hundreds of segments of size, let’s say, 1km X 1km. Each segment will have four coordinates and based on these coordinates we can identify which segment the user is in. We will map the globe into segments, find paths within these segments and keep building up our solution till we find the path between two required locations. Just to clarify here, when we say coordinates we mean latitude and longitude of the location. The easiest way to visualise a road network is as a graph, where each junction will be a node and each road will be an edge. Each edge can have multiple weights like distance, time, traffic etc. which we can use to find the shortest/quickest path. From this graph visualisation we can safely say there will be multiple paths between various points, so we will run any of the graph algorithms to find shortest paths between various points within the segment. Also to avoid recalculating these paths again and again, we will cache this information and we will call this shortest path a calculated edge or calculated path.
We will also cache the distances and calculated edges of each vertex from exit points of the segment. Once we know the distance from segment exits we can easily calculate inter-segment distances.
When we are navigating inter-segment, it is important that we identify how many segments we need to consider for our algorithm, because we can’t run a Dijkstra’s for all the segments across the globe. To restrict the number of segments we use the aerial distance between the 2 points. Suppose aerial distance between points A and B is 10km, we can consider 10 segments across each direction i.e 400 segments, which is much better than the otherwise massive graph.
Once the number of segments is restricted, we can restrict our graph such that the exit points of each segment will become the vertices of the graph and the calculated paths between the two points and all the exit points will become the edges of the graph. Now all we need to do to find the route is run a Dijkstra’s on this graph.
So we have now divided the map into segments and we know how to calculate inter-segment routes. But what if the route spans over thousands of segments. Running Dijkstra’s on this huge graph will be very time consuming. So we introduce another level of nesting called Mega Segments. We will now divide the map into mega-segments which will be further divided into segments. Similar to how we cached calculated edges between exit points for segments, we will also cache calculated paths between exit points for mega-segments and run dijktra’s on this graph of exit points of mega segments.
Now that we have a graph, how do we come up with weights for the edges? As we mentioned previously, our edges can have multiple weights. There can be -
We will not consider traffic, weather etc as weights because the are not that easily quantifiable. Instead we will consider them as attributes that can effect the average speed. But now the question is how to update the route when traffic/weather conditions change? Any change in these attributes will change the average speed and in turn the ETA. As soon as the weight(ETA, avg speed) of the edges is impacted the computed paths are recalculated to come up with a new route. This recalculation is only done when the weight changes more than a threshold percentage. For eg, if the ETA changes by more than x%, we will recalculate the path, otherwise we will stick to the current route.
And that is the algorithm used by google maps to compute the route between two points. Now let us have a look at the system architecture.
For ease of understanding we will understand the architecture in two parts. First we will see how the users are connected to the application and how their location information is captured. In the next part we will see the architecture for navigation flow.
And that is all from architecture point of view. I am sure you guys have some questions about why we are using which service like why Redis, why Cassandra, why Kafka. Well that is content for another article. We have written an article on Choosing The Best Storage Solutions For Your Requirements. You can go through it to understand when to use which databases or services and their alternatives.
Now if you remember when we discussed some of the challenges while building this system, we mentioned something called Disputed areas.
Let us consider the example of the dispute between the countries of India, China and Pakistan over the state of Kashmir. How does Google mark the boundaries of the state when Pakistan claims some part of the state belongs to them, China makes similar claims about another part of the state while India claims whole of Kashmir is under their jurisdiction. Here Google does something really cool. They mark the boundaries based on the country you are coming from! If the user is coming from India, whole of Kashmir will be marked as a part of India, where as if the user is travelling from Pakistan, the part that Pakistan claims to be under their territory is marked as Pakistan’s land and the part that is disputed between India and china is marked as a dotted line and same for China.
That should be it for Google Maps System Design! Send in your thoughts on our youtube video!