TY - JOUR
T1 - What’s the Tone? Easy Doesn’t Do It
T2 - Analyzing Performance and Agreement Between Off-the-Shelf Sentiment Analysis Tools
AU - Boukes, Mark
AU - van de Velde, Bob
AU - Araujo, Theo
AU - Vliegenthart, Rens
PY - 2020/4/2
Y1 - 2020/4/2
AB - This article scrutinizes the method of automated content analysis to measure the tone of news coverage. We compare a range of off-the-shelf sentiment analysis tools to manually coded economic news as well as examine the agreement between these dictionary approaches themselves. We assess the performance of five off-the-shelf sentiment analysis tools and two tailor-made dictionary-based approaches. The analyses result in five conclusions. First, there is little overlap between the off-the-shelf tools, causing wide divergence in terms of tone measurement. Second, there is no stronger overlap with manual coding for short texts (i.e., headlines) than for long texts (i.e., full articles). Third, an approach that combines individual dictionaries achieves a comparably good performance. Fourth, precision may increase to acceptable levels at higher levels of granularity. Fifth, performance of dictionary approaches depends more on the number of relevant keywords in the dictionary than on the number of valenced words as such; a small tailor-made lexicon was not inferior to large established dictionaries. Altogether, we conclude that off-the-shelf sentiment analysis tools are mostly unreliable and unsuitable for research purposes, at least in the context of Dutch economic news, and manual validation for the specific language, domain, and genre of the research project at hand is always warranted.
U2 - 10.1080/19312458.2019.1671966
DO - 10.1080/19312458.2019.1671966
M3 - Article
AN - SCOPUS:85074338709
SN - 1931-2458
VL - 14
SP - 83
EP - 104
JO - Communication Methods and Measures
JF - Communication Methods and Measures
IS - 2
ER -