Predicting News Virality Using Machine Learning Models

Authors

  • Ayesha Noor, Assistant Professor of Data Science, Bahauddin Zakariya University
  • Hamza Rauf, Lecturer in Computer Science, University of Lahore

Keywords:

News Virality, Machine Learning, XGBoost, Ensemble Models, Computational Journalism, Sentiment Analysis

Abstract

This paper evaluates how effectively machine learning models predict the virality of online news stories, using a mixed-methods approach that combines quantitative performance indicators with qualitative linguistic evaluation. Textual features and engagement statistics were preprocessed and transformed using TF-IDF and embedding techniques, and further features were engineered for sentiment polarity, thematic diversity, and article length. The supervised models we trained and tested include Logistic Regression, Support Vector Machines, Random Forest, Gradient Boosting, and XGBoost. We used stratified splits and bootstrap validation, and measured performance with Accuracy, Precision, Recall, F1-score, and ROC-AUC. Ensemble methods consistently achieved higher accuracy and F1-scores than the baseline models. Recall was higher for Logistic Regression and Naive Bayes, which implies that these models retrieve a larger share of the viral articles. TF-IDF features improved performance more than embeddings in sparse-text settings. The HAP analysis added further insight into how sentiment, thematic shifts, and language style may all contribute to virality. The findings suggest that hybrid approaches, which combine algorithmic power with linguistic understanding, offer the most robust framework for predicting newsworthiness. The study contributes to research on computational journalism and digital communication by offering practical suggestions for improving engagement forecasts and curbing the spread of misinformation, as guidance for media platforms, marketers, and politicians.
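The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the bootstrap validation was configured, so the percentile bootstrap over test-set predictions shown here is an assumption, as are the function names `precision_recall_f1` and `bootstrap_f1` and the toy labels, which are not data from the study.

```python
import random


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = viral, 0 = not viral)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # viral, predicted viral
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not viral, predicted viral
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # viral, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_f1(y_true, y_pred, n_resamples=1000, seed=0):
    """Approximate 95% percentile interval for F1 (assumed bootstrap scheme)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample article indices with replacement, keeping label/prediction pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        _, _, f1 = precision_recall_f1(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        scores.append(f1)
    scores.sort()
    lower = scores[int(0.025 * n_resamples)]
    upper = scores[int(0.975 * n_resamples) - 1]
    return lower, upper


# Toy labels for illustration only (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
lower, upper = bootstrap_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"bootstrap 95% interval for F1: [{lower:.2f}, {upper:.2f}]")
```

Recall measures the share of truly viral articles a model retrieves, which is why a model with lower accuracy can still be preferable when missing a viral article is the costlier error.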

Published

2025-06-30