Beyond Counting Words: Assessing Performance of Dictionaries, Supervised Machine Learning, and Embeddings in Topic and Frame Classification

Anne C. Kroon*, Toni van der Meer, Rens Vliegenthart

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Topics and frames are at the heart of various theories in communication science and other social sciences, making their measurement of key interest to many scholars. The current study compares and contrasts two main deductive computational approaches to measuring policy topics and frames: dictionary-based (lexicon-based) identification and supervised machine learning. Additionally, we introduce domain-specific word embeddings to these classification tasks. Drawing on a manually coded dataset of Dutch news articles and parliamentary questions, our results indicate that supervised machine learning outperforms dictionary-based classification for both tasks. Furthermore, results show that word embeddings may boost performance at relatively low cost by introducing relevant and domain-specific semantic information to the classification model.
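
To make the contrast between the two deductive approaches concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-topic keyword dictionary and four invented English training sentences (the study itself uses manually coded Dutch texts), and it uses a small hand-written multinomial Naive Bayes model as a stand-in for "supervised machine learning" in general. Word embeddings are not covered here. All names (`TOPIC_DICTIONARY`, `dictionary_classify`, `NaiveBayes`, `TRAIN_TEXTS`) are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword dictionary; these word lists are NOT taken from the study.
TOPIC_DICTIONARY = {
    "economy": {"tax", "budget", "inflation", "unemployment"},
    "health": {"hospital", "vaccine", "doctor", "patients"},
}

# Invented toy training data standing in for a manually coded dataset.
TRAIN_TEXTS = [
    "the government raised the tax rate in the new budget",
    "inflation and unemployment worry the central bank",
    "the hospital hired a new doctor for its patients",
    "the vaccine campaign reached more clinics this week",
]
TRAIN_LABELS = ["economy", "economy", "health", "health"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def dictionary_classify(text, dictionary=TOPIC_DICTIONARY):
    """Return the topic with the most keyword hits, or None if nothing matches."""
    tokens = tokenize(text)
    hits = {
        topic: sum(token in words for token in tokens)
        for topic, words in dictionary.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing, learned from labeled texts."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)
```

In this toy setup, a sentence such as "clinics report shorter waiting lists" contains no dictionary keyword, so `dictionary_classify` returns `None`, whereas `NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS).predict(...)` can still assign it to "health" because "clinics" appeared in a labeled health example. This illustrates only the mechanism by which a trained model can pick up vocabulary a fixed lexicon lacks; it says nothing about the size of the effect reported in the study.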
Original language: English
Pages (from-to): 528-570
Journal: Computational Communication Research
Volume: 4
Issue number: 2
Publication status: Published - 1 Oct 2022
