Misinformation and Language Resources — LIRNEasia


LIRNEasia began exploring misinformation in 2018, following ethnic riots in Digana, Sri Lanka. The focus was on automated rogue actors on social media, using data analytics on tweets to understand their impact. In 2020, the scope of our work expanded to examine AI’s role in misinformation. This involved reviewing the state of the art, then designing and testing over 400 machine learning models to assess algorithmic efficacy, data requirements, and hardware and liveware (human) costs. The outcomes included new misinformation datasets and models tailored for Sinhala and Bengali.

Leveraging our strengths in qualitative research, the team also examined the challenges faced by regional fact checkers and journalists, exploring the practical aspects of technology adoption in this context through key informant interviews. In 2021, LIRNEasia conducted a scoping study, funded by IDRC, to understand the nature of information disorder and strategies to counteract it. The study output comprises a comprehensive map of actors and frameworks, an evaluation of current approaches and tools used by stakeholder groups, and an overview of the research landscape. The scoping study involved both desk research and key informant interviews.

Previous research by LIRNEasia delved into the nature of information disorder in Asia, mapping the involved actors and examining the actions and strategies employed. The findings highlighted that fact-checking, awareness campaigns, training programs (including digital literacy initiatives), and policy changes were among the widely adopted strategies to combat information disorder.

Currently, LIRNEasia is engaged in two projects related to information disorder. The first, titled “Human Factors in Information Disorder and Finding Measures to Counter: An Experimental Approach Leading to New Knowledge Creation,” is conducted in collaboration with Watchdog and Sarvodaya Fusion. Simultaneously, the second project addresses information disorder at the grassroots level, with training programs specifically designed for school children.


Documents

  • Tackling online misinformation while protecting freedom of expression (Event Report)

    An expert round-table discussion on “Tackling online misinformation while protecting freedom of expression” was held on 11 October 2021, as the second in a series of discussions under the theme “Frontiers of Digital Economy”.

  • Webinar on Information Disorder

    LIRNEasia joined a webinar on Information Disorder organized by the University of Cape Town on 6 May 2022. The event was based on the collaborative Global South report on Information Disorder, for which LIRNEasia authored the chapter on the Asian region.

  • Use of AI in classifying Misinformation [White Paper]

    A white paper exploring the use of AI in classifying misinformation. 

  • Misinformation in Bangladesh: A Brief Primer

    Over the past decade, both internet penetration and the digital media user base have increased substantially.

  • A Corpus and Machine Learning Models for Fake News Classification in Bengali

    We present a dataset consisting of 3468 documents in Bengali, drawn from Bangladeshi news websites and fact-checking operations, annotated as CREDIBLE, FALSE, PARTIAL or UNCERTAIN. The dataset has markers for the content of the document, the classification, the web domain from which each document was retrieved, and the date on which the document was published. We also present the results of misinformation classification models built for the Bengali language, as well as comparisons to prior work in English and Sinhala.

  • A Corpus and Machine Learning Models for Fake News Classification in Sinhala

    We present a dataset consisting of 3576 documents in Sinhala, drawn from Sri Lankan news websites and fact-checking operations, annotated as CREDIBLE, FALSE, PARTIAL or UNCERTAIN. The dataset has markers for the content of the document, the classification, the web domain from which each document was retrieved, and the date on which the document was published. We also present the results of misinformation classification models built for the Sinhala language, as well as comparisons to English benchmarks, and suggest that for smaller media ecosystems it may make more practical sense to model uncertainty instead of truth vs falsehood binaries.

  • The Control of Hate Speech on Social Media: Lessons from Sri Lanka

    As hate speech on social media becomes an ever-increasing problem, policymakers may look to more authoritarian measures for policing content. Several countries have already, at some stage, banned networks such as Facebook and Twitter (Liebelson, 2017).

  • Sinhala Language Corpora and Stopwords from a Decade of Sri Lankan Facebook

    This paper presents two colloquial Sinhala language corpora from the language efforts of the Data, Analysis and Policy team of LIRNEasia, as well as a list of algorithmically derived stopwords. The larger of the two corpora spans 2010 to 2020 and contains 28,825,820 to 29,549,672 words of multilingual text posted by 533 Sri Lankan Facebook pages, including politics, media, celebrities, and other categories; the smaller corpus amounts to 5,402,76 words of only Sinhala text extracted from the larger.


