In a practical experiment, we benchmark five common text classification algorithms – Naive Bayes, Logistic Regression, Support Vector Machines, Random Forests, and eXtreme Gradient Boosting – on multiple misinformation datasets, covering both data-rich and data-poor environments. We test these methods by repeatedly reducing the size of the training data, producing 435 AI models in total. From these models we draw observations on the data requirements and the training times such models demand in practice, and on how the availability of data and compute affects accuracy and algorithm choice in practical scenarios. We then discuss the implications and avenues for further research.
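The sketch below illustrates one way such a shrinking-training-set benchmark could be structured, assuming scikit-learn-style estimators and the xgboost package; the placeholder corpus, the subsampling fractions, and the specific hyperparameters are illustrative assumptions, not the exact protocol used in the experiment.

```python
# Sketch of a shrinking-training-set benchmark (assumes scikit-learn and
# xgboost are installed; `texts`/`labels` stand in for one of the
# misinformation datasets, which are not reproduced here).
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

# Placeholder corpus: swap in a real misinformation dataset here.
texts = ["claim one ...", "claim two ..."] * 200
labels = np.array([0, 1] * 200)

X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0
)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}

# Repeatedly shrink the training set to simulate data-poor settings,
# recording accuracy and wall-clock training time for each model.
for fraction in (1.0, 0.5, 0.25, 0.1, 0.05):
    n = max(2, int(fraction * X_train.shape[0]))
    X_sub, y_sub = X_train[:n], y_train[:n]
    for name, clf in classifiers.items():
        start = time.perf_counter()
        clf.fit(X_sub, y_sub)
        elapsed = time.perf_counter() - start
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"{name:20s} frac={fraction:.2f} "
              f"acc={acc:.3f} time={elapsed:.2f}s")
```

Holding the test split fixed while only the training subset shrinks keeps the accuracy numbers comparable across fractions, which is what makes data-requirement and training-time comparisons of this kind meaningful.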