Automatic Genre Classification in Web Pages Applied to Web Comments

Authors

M. Neunerdt, M. Reyer, R. Mathar

Abstract

Automatic Web comment detection could significantly facilitate information retrieval systems, e.g., a focused Web crawler. In this paper, we propose a text genre classifier for Web text segments as an intermediate step for Web comment detection in Web pages. Different feature types and classifiers are analyzed for this purpose. We compare this two-level approach to state-of-the-art techniques operating on the whole Web page text and show that accuracy can be improved significantly. Finally, we illustrate the applicability for information retrieval systems by evaluating our approach on Web pages obtained by a Web crawler.

BibTeX Reference Entry

@inproceedings{NeReMa14,
	author = {Melanie Neunerdt and Michael Reyer and Rudolf Mathar},
	title = "Automatic Genre Classification in Web Pages Applied to Web Comments",
	pages = "145--151",
	booktitle = "12th Conference on Natural Language Processing (KONVENS)",
	address = {Hildesheim, Germany},
	month = Oct,
	year = 2014,
	hsb = hsb999910363744 ,
	}

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.