Detecting Irregularities in Blog Comment Language Affecting POS Tagging Accuracy

Authors

M. Neunerdt, B. Trevisan, R. Mathar, E. Jakobs

Abstract

Studying technology acceptance requires surveying and analyzing user opinions to identify acceptance-relevant factors. In addition to surveys, Web 2.0 offers a huge collection of user comments on different technologies. Extracting acceptance-relevant factors and user opinions from these comments requires Natural Language Processing (NLP) methods, particularly part-of-speech (POS) tagging. Because of the language used in blogs, POS tagging results suffer from high error rates. In this paper, we present a user-specific study of blog comments to analyze the relation between blog language and the performance of NLP methods. Applying the proposed approach improves POS tagging and lemmatization quality. Furthermore, we present an ontology-based corpus generation tool to improve the identification of topic- and user-specific blog comments. Using this tool, we create an example corpus dealing with mobile communication systems (MCS). Finally, we analyze the identified comments and transform them into structured datasets.

BibTeX Reference Entry

@article{NeTrMaJa12,
	author = {Melanie Neunerdt and Bianka Trevisan and Rudolf Mathar and Eva-Maria Jakobs},
	title = "Detecting Irregularities in Blog Comment Language Affecting {POS} Tagging Accuracy",
	pages = "71--88",
	journal = "International Journal of Computational Linguistics and Applications",
	volume = "3",
	number = "1",
	month = jun,
	year = 2012,
	hsb = {hsb999910272430},
}

Downloads

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Institute for Theoretical Information Technology
