by Keshan de Silva and Yudhanjaya Wijeratne
One of the most useful datasets we have is a collection of pseudonymized call data records (CDRs) for all of Sri Lanka, largely from the year 2013. Given that Sri Lanka has extremely high cell coverage and subscription rates (we’re actually oversubscribed: there are more subscribers than people in the country, an artifact of people owning multiple SIMs), this dataset is ripe for analysis at big data scale.
We recently used it to examine attendance at the annual Nallur festival in Jaffna, Sri Lanka. Using these records, we were able to analyze the increase in the region's population during the festival. A lengthy writeup on Medium describes the work, explaining the importance of the festival and the logic for picking it.
In short, we conducted three main analyses:
1. We needed to find whether there was in fact a sizeable population increase, and if so, whether it was made up of visitors from outside the region of the festival.
To do this, we first filtered the raw dataset for CDRs originating from 2013-06-01 to 2013-10-31. This gave us the people who were in the region around the time of the festival.
Next, we ran an analysis to find the cell ID that each of these people frequents, using the tower of origin, and filtered the results by the Jaffna District, where the town of Nallur is located.
This allowed us to estimate the population of the Jaffna District and plot it against time to see the influx during the festival. It also neatly prevents one subscriber from being counted in more than one cell, which makes these counts as close to unique people as we can get.
As expected, there was a population increase that showed up very clearly in the analysis. By comparing this against the baseline population of Jaffna, we were able to determine that these were, indeed, visitors, and not intra-regional shifts.
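A minimal sketch of this first analysis, in Python, might look like the following. It is an illustration under stated assumptions rather than the code used for the study: the record layout `(subscriber_id, cell_id, call_date)`, the `CELL_DISTRICT` lookup, the sample rows, and the choice to assign each subscriber to their most-used cell per day are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical record layout: (subscriber_id, cell_id, call_date).
# Real CDRs carry more fields; these three are all the sketch needs.
SAMPLE_CDRS = [
    ("a", "J1", date(2013, 8, 1)),
    ("a", "J1", date(2013, 8, 1)),
    ("a", "C7", date(2013, 8, 1)),
    ("b", "C7", date(2013, 8, 1)),
    ("b", "J2", date(2013, 8, 20)),
    ("a", "J1", date(2013, 8, 20)),
    ("c", "J2", date(2013, 11, 5)),  # outside the study window
]

# Hypothetical lookup from cell ID to district.
CELL_DISTRICT = {"J1": "Jaffna", "J2": "Jaffna", "C7": "Colombo"}

WINDOW_START = date(2013, 6, 1)
WINDOW_END = date(2013, 10, 31)


def daily_population(cdrs, cell_district, district, start, end):
    """Count subscribers per day whose most-used cell that day is in `district`."""
    # Tally how often each subscriber used each cell on each day in the window.
    usage = defaultdict(Counter)  # (day, subscriber) -> Counter of cell IDs
    for subscriber, cell, day in cdrs:
        if start <= day <= end:
            usage[(day, subscriber)][cell] += 1

    # Assign each subscriber to exactly one cell per day, then count by district.
    # Ties go to whichever cell appeared first in the records.
    population = Counter()
    for (day, _subscriber), cells in usage.items():
        top_cell = cells.most_common(1)[0][0]
        if cell_district.get(top_cell) == district:
            population[day] += 1
    return dict(population)


counts = daily_population(
    SAMPLE_CDRS, CELL_DISTRICT, "Jaffna", WINDOW_START, WINDOW_END
)
for day in sorted(counts):
    print(day.isoformat(), counts[day])
```

Because each subscriber is pinned to a single cell per day, nobody is counted twice on the same day, which is the property described above. Plotting `counts` over the whole window is what would reveal a festival-time bump against the baseline.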
2. To confirm this, we had to identify where this influx came from. By working backward and identifying which towers these subscribers frequently connected to before the festival (over a period of months), we were able to chart out the home regions of these visitors, and thus estimate how much of that population increase came from different regions of the country.
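The idea of working backward to a home region can be sketched as follows. This is a simplified illustration, assuming that towers have already been mapped to districts and that "home" means the district a visitor was seen in most often before the festival. The record layout, the sample rows, and `FESTIVAL_START` are hypothetical placeholders, not values from the study.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical records: (subscriber_id, district, call_date), with the tower
# already resolved to a district.
RECORDS = [
    ("a", "Colombo", date(2013, 6, 3)),
    ("a", "Colombo", date(2013, 7, 10)),
    ("a", "Kandy", date(2013, 7, 11)),
    ("b", "Kandy", date(2013, 6, 20)),
    ("b", "Jaffna", date(2013, 8, 15)),  # on or after the cutoff, ignored
    ("c", "Galle", date(2013, 6, 1)),    # not a festival visitor, ignored
]

# Hypothetical set of subscribers identified as festival visitors.
VISITORS = {"a", "b"}

# Placeholder cutoff, not the actual festival start date.
FESTIVAL_START = date(2013, 8, 1)


def home_regions(records, visitors, festival_start):
    """Map each visitor to the district they were seen in most before the festival."""
    seen = defaultdict(Counter)  # subscriber -> Counter of districts
    for subscriber, district, day in records:
        if subscriber in visitors and day < festival_start:
            seen[subscriber][district] += 1
    return {sub: tally.most_common(1)[0][0] for sub, tally in seen.items()}


homes = home_regions(RECORDS, VISITORS, FESTIVAL_START)
origin_counts = Counter(homes.values())  # visitors per home district
print(homes)
print(origin_counts)
```

Dividing each entry of `origin_counts` by the total number of visitors would give the share of the influx attributable to each region.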
3. The next analysis was where these people went during and after the festival, which spans multiple consecutive days. By tracking the cell towers that these visitors connected to, and examining the areas of coverage for these cell towers, we were able to derive insights as to where people stayed and went. For example, more people flock to Nallur during the latter parts of the festival: a confirmation of vague anecdotal reports (often found only second- or third-hand on the Internet) that certain ceremonies of the festival are more important than others.
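Tracking where visitors went comes down to counting distinct visitors per tower per day. The sketch below shows one way to do that; the cell IDs, dates, and record layout are made up for illustration and do not come from the dataset.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (subscriber_id, cell_id, call_date) records for known visitors.
VISITOR_RECORDS = [
    ("a", "NALLUR_1", date(2013, 8, 12)),
    ("a", "NALLUR_1", date(2013, 8, 12)),  # repeat calls count once
    ("b", "JAFFNA_TOWN", date(2013, 8, 12)),
    ("a", "NALLUR_1", date(2013, 8, 30)),
    ("b", "NALLUR_1", date(2013, 8, 30)),
    ("c", "NALLUR_1", date(2013, 8, 30)),
]


def visitors_per_cell_per_day(records):
    """Count distinct visitors seen at each cell on each day."""
    seen = defaultdict(set)  # (day, cell) -> set of subscriber IDs
    for subscriber, cell, day in records:
        seen[(day, cell)].add(subscriber)
    return {key: len(subscribers) for key, subscribers in seen.items()}


footfall = visitors_per_cell_per_day(VISITOR_RECORDS)
for (day, cell), n in sorted(footfall.items()):
    print(day.isoformat(), cell, n)
```

In this toy data the Nallur cell goes from one visitor on the earlier day to three on the later one, the same kind of late-festival rise described above. Joining each cell to its coverage area would turn these counts into a map of where people stayed and moved.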
So what of Nallur?
It’s up to the government and the organizers to understand and apply these insights. Perhaps Kandy needs more transport arranged; perhaps the town of Jaffna might need more bare space next time to accommodate everyone. Perhaps a tourism industry can be encouraged around events like these. There are certainly business opportunities that can be encouraged: for example, all those people coming to watch definitely need places to stay, especially in those last, unexpectedly popular days.
Large events happen all across the world for all kinds of reasons: consider, for example, the Kandy Perahera, a T20 cricket match, a May Day rally, or even something as exotic as Burning Man. Nallur is a case study, but it is not the limit of this kind of analysis. With Call Data Records, we can figure out where people flock to, and where they come from, when these events happen. As we saw here, you do find unexpected insights in these things, and the solutions are diverse.
Imagine what else we could do next.