AG Kommunikationstheorie


Improving Part-of-Speech Tagging, with Application to German Blog Comments


Studying technology acceptance, the main topic of the HUMIC project, requires surveying and analysing user opinions. Web 2.0 provides a huge collection of user comments on different technologies. Natural Language Processing (NLP) methods are essential for classifying the opinions expressed in such comments, but they are not yet readily applicable to blog comments. The most fundamental step of the linguistic pipeline in NLP is Part-of-Speech (POS) tagging, in which each word/token in a text is labelled with its word form and syntactic function. Most POS taggers are trained on newspaper corpora, and their performance degrades when they are applied to blog comments. Blog comments pose additional challenges because of ungrammatical text and other peculiarities (e.g. missing pronouns, use of emoticons, letter iterations). In this talk we present a couple of extensions to a basic Markov Model tagger. These extensions improve its accuracy when it is trained on a combination of annotated newspaper texts and blog comments.
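
To make the idea of a basic Markov Model tagger concrete, the following is a minimal sketch of a bigram tagger with Viterbi decoding. It is not the tagger presented in the talk: the function names, the add-one smoothing, the letter-iteration heuristic in normalize, and the two-sentence toy corpus (tagged with STTS-style labels) are all assumptions made for illustration only.

```python
import math
import re
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start pseudo tag


def normalize(token):
    """Lowercase and collapse runs of 3+ identical letters ('suuuuper' -> 'super').

    A crude, assumed heuristic for letter iterations; it is not taken from the talk.
    """
    return re.sub(r"([^\W\d_])\1{2,}", r"\1", token.lower())


def train(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))  # previous tag -> tag -> count
    emit = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][normalize(word)] += 1
            prev = tag
    return trans, emit


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counts`."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram model."""
    if not words:
        return []
    tags = list(emit)
    vocab_size = len({w for t in emit for w in emit[t]}) + 1  # +1 for unknown words
    words = [normalize(w) for w in words]
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans[START], t, len(tags))
                 + log_prob(emit[t], words[0], vocab_size), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], vocab_size)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, len(tags)) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Invented toy corpus; a real setup would combine annotated newspaper text
# with annotated blog comments.
toy_corpus = [
    [("das", "ART"), ("Handy", "NN"), ("ist", "VAFIN"), ("super", "ADJD")],
    [("die", "ART"), ("Kamera", "NN"), ("ist", "VAFIN"), ("gut", "ADJD")],
]
trans, emit = train(toy_corpus)
print(viterbi(["Das", "Handy", "ist", "suuuuper"], trans, emit))
```

In this sketch the iterated form "suuuuper" is mapped to a word seen in training before decoding, which illustrates one way a tagger trained mostly on edited text might cope with a blog-specific peculiarity.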
