Data, Algorithms and Policy


On two occasions I have been asked [by members of Parliament], “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. 

— Charles Babbage, Passages from the Life of a Philosopher (1864), Chap. 5, 59.

We live in a society where machines, algorithms and humans intertwine; where the “consensual hallucination” of cyberspace is no longer a separate part of our lives but a swamp through which we wade, leaving data trails for the world to see; where the wrong people put the wrong figures into the wrong machines and wonder why the answer isn’t right. Terms like “Big Data” and “AI” have become Rorschach blots on the public consciousness.

LIRNEasia’s role is to participate in the public policy dialogue around our algorithmically inclined society with critical research and technical expertise. Since 2013, as a cross-disciplinary team of data scientists, lawyers, and social scientists, we have conducted our own analyses and engaged deeply with policymakers and with private, data-heavy organizations.


Documents

  • Misinformation in Bangladesh: A Brief Primer

    Over the past decade, both internet penetration and the digital media user base in Bangladesh have increased substantially.

  • A Corpus and Machine Learning Models for Fake News Classification in Bengali

    We present a dataset consisting of 3468 documents in Bengali, drawn from Bangladeshi news websites and factchecking operations, annotated as CREDIBLE, FALSE, PARTIAL or UNCERTAIN. The dataset has markers for the content of the document, the classification, the web domain from which each document was retrieved, and the date on which the document was published. We also present the results of misinformation classification models built for the Bengali language, as well as comparisons to prior work in English and Sinhala.

  • (Research Report) AI Ethics in Practice

    This research report analyses the implementation of AI ethics principles in the policy, legal and regulatory, and technical arenas in Singapore and India.

  • CPM’s 26th webinar on ‘Safety of information in a technically driven world’

    The Institute of Chartered Professional Managers of Sri Lanka (CPM Sri Lanka) held its 26th webinar on 27 August 2021, focusing on ‘Safety of information in a technically driven world’, a timely subject in cyber security.

    LIRNEasia Chair Prof. Rohan Samarajiva shared his expertise in the key presentation, focusing on current issues in information security, including potential risks to organizations and their management, data storage and back-ups, and prevention and recovery.

  • A Corpus and Machine Learning Models for Fake News Classification in Sinhala

    We present a dataset consisting of 3576 documents in Sinhala, drawn from Sri Lankan news websites and factchecking operations, annotated as CREDIBLE, FALSE, PARTIAL or UNCERTAIN. The dataset has markers for the content of the document, the classification, the web domain from which each document was retrieved, and the date on which the document was published. We also present the results of misinformation classification models built for the Sinhala language, as well as comparisons to English benchmarks, and suggest that for smaller media ecosystems it may make more practical sense to model uncertainty instead of true-versus-false binaries.

  • The Control of Hate Speech on Social Media: Lessons from Sri Lanka

    As hate speech on social media becomes an ever-increasing problem, policymakers may look to more authoritarian measures for policing content. Several countries have already, at some stage, banned networks such as Facebook and Twitter (Liebelson, 2017).

  • Sinhala Language Corpora and Stopwords from a Decade of Sri Lankan Facebook

    This paper presents two colloquial Sinhala language corpora from the language efforts of the Data, Analysis and Policy team of LIRNEasia, as well as a list of algorithmically derived stopwords. The larger of the two corpora spans 2010 to 2020 and contains 28,825,820 to 29,549,672 words of multilingual text posted by 533 Sri Lankan Facebook pages across categories including politics, media and celebrities; the smaller corpus amounts to 5,402,76 words of Sinhala-only text extracted from the larger.

  • Artificial Intelligence for Factchecking: Observations on the State and Practicality of the Art

    We summarize the state of progress in artificial intelligence as used for classifying misinformation, or “fake news”. Making a case for AI in an assistive capacity for factchecking, we briefly examine the history of the field, divide current work into “classical machine learning” and “deep learning”, and for both, examine the work that has led to certain algorithms becoming the de facto standards for this type of text classification task.


