Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation

Research output: Contribution to journal › Article › peer-review

Abstract

This study tackles the problem of geolocating Reddit users without access to the author information API. Using subreddit data, we inferred user locations from their interactions within location-specific subreddits. Using unsupervised learning methods, namely Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF), we examined conversations about COVID-19 and immunization across the U.S., focusing on COVID-19 vaccination. Our topic modeling identifies four themes: humor and sarcasm (e.g., jokes about microchips), conspiracy theories (e.g., tracking devices and microchips in the COVID-19 vaccine), public skepticism (e.g., debates over vaccine safety and freedom), and vaccine brand concerns (e.g., Pfizer, Moderna, and booster shots). Our geolocation analysis shows that regions with lower vaccination rates often exhibit a higher prevalence of misinformation-labeled comments. For example, Ada County (Idaho), Newton County (Missouri), and Flathead County (Montana) showed both low vaccine uptake and a high rate of false information. This study documents the varied forms of misinformation disseminated online and gives a better understanding of how people in different parts of the U.S. think about getting a COVID-19 vaccine.
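The abstract does not spell out the geolocation procedure, so the following is only a minimal sketch of one plausible reading: assign each user to the location whose subreddits they comment in most often. The subreddit-to-state mapping, the function name `infer_user_states`, the `min_comments` threshold, and the tie handling are all illustrative assumptions, not details taken from the article.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from location-specific subreddits to U.S. states.
# A real study would need a much larger, curated list.
SUBREDDIT_TO_STATE = {
    "boise": "Idaho",
    "idaho": "Idaho",
    "montana": "Montana",
    "missouri": "Missouri",
}


def infer_user_states(comments, min_comments=2):
    """Assign each user the state whose subreddits they comment in most.

    comments: iterable of (username, subreddit) pairs.
    A user is left unassigned when their top state has fewer than
    min_comments comments, or when two states tie for first place.
    Both rules are assumptions made for this sketch.
    """
    counts = defaultdict(Counter)
    for user, subreddit in comments:
        state = SUBREDDIT_TO_STATE.get(subreddit.lower())
        if state is not None:  # ignore subreddits with no known location
            counts[user][state] += 1

    assigned = {}
    for user, state_counts in counts.items():
        ranked = state_counts.most_common(2)
        top_state, top_n = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_n
        if top_n >= min_comments and not tied:
            assigned[user] = top_state
    return assigned


sample_comments = [
    ("a", "Boise"), ("a", "Idaho"), ("a", "montana"),
    ("b", "montana"),
    ("c", "missouri"), ("c", "missouri"), ("c", "idaho"), ("c", "idaho"),
    ("d", "news"),
]
sample_result = infer_user_states(sample_comments)
```

With the sample data, only user `a` receives a state: `b` has too few location-specific comments, `c` is tied between two states, and `d` never posts in a mapped subreddit. Leaving ambiguous users unassigned trades coverage for precision, which matters if the inferred locations are later compared against county-level vaccination rates.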

Original language: English
Article number: 748
Journal: Information (Switzerland)
Volume: 16
Issue number: 9
State: Published - Sep 2025

Keywords

  • COVID-19
  • fake news
  • geolocation
  • misinformation
  • topic modeling
  • unsupervised learning
