Strategies for Improving Semi-automated Topic Classification of Media and Parliamentary documents

G.E. Breeman, H. Then, J. Kleinnijenhuis, W. van Atteveldt, A. Timmermans

Research output: Contribution to conference > Conference paper > Academic

Abstract

Since 1995, the techniques and capacity to store new electronic data and make it widely available have become commonplace. Since then, organizations such as research institutes, universities, libraries, and private companies (e.g., Google) have also started to scan older documents and make them available electronically. This has created many new research opportunities across academic disciplines, and using software to analyze large datasets has become an important part of research in the social sciences. Most academics rely on human-coded datasets, in both qualitative and quantitative research. However, as the number of datasets grows and the questions scholars pose to them become more complex, the search for more efficient and effective methods is now on the agenda.
One of the most common content-analysis techniques is the Boolean keyword search. To find certain topics in a dataset, the researcher first creates a list of keywords combined with Boolean operators (AND, OR, etc.). The keywords are usually grouped into families, and the entire list of keywords and groups is called the ontology. The dataset is then searched for these keywords, retrieving all documents that contain them. The online newspaper database LexisNexis offers such a Boolean search method. However, Boolean keyword search is not always satisfactory in terms of reliability and validity.
For that reason, social scientists rely on hand-coding. Two projects that do so are the congressional bills project (www.congressionalbills.org) and the policy agenda-setting project (see www.policyagendas.org). They developed a topic codebook and coded a variety of sources, such as State of the Union speeches, bills, and newspaper articles.
However, the continuous improvement of automated coding techniques and the increasing number of agenda-setting projects (especially in European countries) have made automated coding software both a feasible option and a necessity.
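
The Boolean keyword search described in the abstract can be illustrated with a minimal sketch. It is not the authors' software or the LexisNexis query engine. The ontology contents, the rule representation (a set of keywords that must all occur, with alternative rules per family), and the names `ONTOLOGY`, `tokenize`, `match_families`, and `search` are all assumptions made for illustration.

```python
import re

# Hypothetical ontology: each family (topic) maps to a list of keyword rules.
# A rule is a set of keywords that must ALL occur in a document (AND);
# a family matches when ANY of its rules matches (OR).
ONTOLOGY = {
    "agriculture": [{"farm", "subsidy"}, {"livestock"}],
    "health": [{"hospital"}, {"health", "insurance"}],
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def match_families(text, ontology=ONTOLOGY):
    """Return the sorted list of families whose rules match the text."""
    tokens = tokenize(text)
    return sorted(
        family
        for family, rules in ontology.items()
        if any(rule <= tokens for rule in rules)  # rule is a subset of tokens
    )


def search(documents, ontology=ONTOLOGY):
    """Map each family to the ids of the documents that match it."""
    hits = {family: [] for family in ontology}
    for doc_id, text in documents.items():
        for family in match_families(text, ontology):
            hits[family].append(doc_id)
    return hits


# Invented example documents, for illustration only.
DOCS = {
    "d1": "Parliament debates a new farm subsidy scheme.",
    "d2": "The hospital merger raises health insurance premiums.",
    "d3": "A farm hospital for injured livestock opens.",
}

print(search(DOCS))
# {'agriculture': ['d1', 'd3'], 'health': ['d2', 'd3']}
```

In this toy data, document `d3` is retrieved under "health" only because it contains the word "hospital", although it concerns animal care. This is the kind of validity problem that leads researchers to prefer hand-coding.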
Original language: English
Publication status: Published - 2009
Event: 2nd Annual Meeting of the Comparative Policy Agendas Conference, The Hague
Duration: 18 Jun 2009 to 19 Jun 2009

