In:
Natural Language Engineering, Cambridge University Press (CUP), Vol. 23, No. 5 (2017-09), pp. 687-707
Abstract:
Newspaper text can be broadly divided into the classes ‘opinion’ (editorials, commentary, letters to the editor) and ‘neutral’ (reports). We describe a classification system for performing this separation, which uses a set of linguistically motivated features. Working with various English newspaper corpora, we demonstrate that it significantly outperforms bag-of-lemma and PoS-tag models. We conclude that the linguistic features constitute the best method for achieving robustness against change of newspaper or domain.
Type of Medium:
Online Resource
ISSN:
1351-3249, 1469-8110
DOI:
10.1017/S1351324917000043
Language:
English
Publisher:
Cambridge University Press (CUP)
Publication Date:
2017
ZDB-ID:
1481165-0
SSG:
7,11