Data, Algorithms and Policy - LIRNEasia


We present a dataset consisting of 3576 documents in Sinhala, drawn from Sri Lankan news websites and factchecking operations, annotated as CREDIBLE, FALSE, PARTIAL or UNCERTAIN. The dataset has markers for the content of the document, the classification, the web domain from which each document was retrieved, and the date on which the document was published. We also present the results of misinformation classification models built for the Sinhala language, as well as comparisons to English benchmarks, and suggest that for smaller media ecosystems it may make more practical sense to model uncertainty instead of truth vs falsehood binaries.
As hate speech on social media becomes an ever-increasing problem, policymakers may look to more authoritarian measures for policing content. Several countries have already, at some stage, banned networks such as Facebook and Twitter (Liebelson, 2017).
This paper presents two colloquial Sinhala language corpora from the language efforts of the Data, Analysis and Policy team of LIRNEasia, as well as a list of algorithmically derived stopwords. The larger of the two corpora spans 2010 to 2020 and contains 28,825,820 to 29,549,672 words of multilingual text posted by 533 Sri Lankan Facebook pages, including politics, media, celebrities, and other categories; the smaller corpus amounts to 5,402,76 words of only Sinhala text extracted from the larger.
We summarize the state of progress in artificial intelligence as used for classifying misinformation, or ‘fake news’. Making a case for AI in an assistive capacity for factchecking, we briefly examine the history of the field, divide current work into ‘classical machine learning’ and ‘deep learning’, and for both, examine the work that has led to certain algorithms becoming the de facto standards for this type of text classification task.
In a practical experiment, we benchmark five common text classification algorithms - Naive Bayes, Logistic Regression, Support Vector Machines, Random Forests, and eXtreme Gradient Boosting - on multiple misinformation datasets, accounting for both data-rich and data-poor environments.
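To give a concrete sense of the simplest of the five algorithms named above, here is a minimal multinomial Naive Bayes sketch in plain Python. It is not the code used in the experiment: the function names, the whitespace tokenizer, the add-one smoothing, the labels and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, entirely hypothetical training data; not drawn from any real dataset.
train_docs = [
    "officials confirm vaccine schedule",
    "ministry publishes official statistics",
    "miracle cure doctors hate",
    "shocking secret cure revealed",
]
train_labels = ["reliable", "reliable", "misinformation", "misinformation"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "secret miracle cure"))
print(predict(model, "ministry confirm statistics"))
```

A real benchmark would swap in an established library and held-out evaluation data; the sketch only shows the shape of a bag-of-words classifier of this kind.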
LIRNEasia Chair Rohan Samarajiva shared a message on SLVLOG Good Vibes with students in Sri Lanka who have completed their formal education.
Intended for policymakers, technologists, educators and others, this international collection of 19 short stories delves into AI’s cultural impacts with hesitation and wonder.
Information collection (or data collection) is vital during an epidemic, especially for purposes such as contact tracing and quarantine monitoring. However, it also poses challenges such as keeping up with the spread of the infectious disease, and the need to protect personally identifiable information. We explore some of the methods of information collection deployed in Sri Lanka and Thailand during the COVID-19 pandemic, and offer policy recommendations for future pandemics.
The fears are that those who are connected or corrupt will get free vaccines, even if they are not on the priority list; or that vaccines obtained for the free channel will be diverted to the pay channel, allowing private providers to make excessive profits which will feed the corruption.
Rohan Samarajiva and Ramathi Bandaranayake presented preliminary findings from our work on risk communication during COVID-19.
Chair Rohan Samarajiva was interviewed by Roar Media on the implications of using drones for identifying those violating curfew orders.
Key considerations and recommendations for public health officials developing wearable contact tracing solutions during COVID-19.
This policy brief details guidance on making decisions in a pandemic.
Sometime in March 2018, the Sri Lankan government blocked access to Facebook, citing the spread of hate speech on the platform and tying it to the incidents of mob violence in Digana, Kandy.
Wijeratne, Y., & de Silva, N. (2020). Sinhala Language Corpora and Stopwords from a Decade of Sri Lankan Facebook. LIRNEasia.