In:
Natural Language Engineering, Cambridge University Press (CUP), Vol. 21, No. 5 (2015-11), p. 773-798
Abstract:
In this paper, we propose an unsupervised and automated method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books and millions of tweets posted per day. We construct distributional-thesauri-based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we propose a split/join based approach that compares the sense clusters at two different time points to determine whether a new sense has undergone ‘birth’. The approach also helps us determine whether an older sense has ‘split’ into more than one sense, whether a newer sense has formed from the ‘join’ of older senses, or whether a particular sense has undergone ‘death’. We use this completely unsupervised approach (a) within the Google Books data to identify word sense differences within a single medium, and (b) across Google Books and Twitter data to identify differences in word sense distribution across different media. We conduct a thorough evaluation of the proposed methodology both manually and through comparison with WordNet.
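The split/join comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`contained`, `compare_senses`), the overlap measure, the 0.5 threshold, and the toy clusters are all assumptions chosen for illustration. Each sense cluster is modelled as a set of neighbouring words for one target noun at one time point.

```python
from typing import Dict, List, Set


def contained(a: Set[str], b: Set[str]) -> float:
    """Fraction of the words in cluster a that also occur in cluster b."""
    return len(a & b) / len(a) if a else 0.0


def compare_senses(
    old: List[Set[str]],
    new: List[Set[str]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, list]:
    """Label sense changes between two time points (illustrative criteria)."""
    events: Dict[str, list] = {"birth": [], "death": [], "split": [], "join": []}

    for i, o in enumerate(old):
        # New clusters that draw most of their words from this old cluster.
        children = [j for j, n in enumerate(new) if contained(n, o) >= threshold]
        if len(children) >= 2:
            events["split"].append((i, children))
        # Death: no new cluster retains enough of the old cluster's words.
        if all(contained(o, n) < threshold for n in new):
            events["death"].append(i)

    for j, n in enumerate(new):
        # Old clusters that are largely absorbed by this new cluster.
        parents = [i for i, o in enumerate(old) if contained(o, n) >= threshold]
        if len(parents) >= 2:
            events["join"].append((parents, j))
        # Birth: the new cluster's words come from no old cluster.
        if all(contained(n, o) < threshold for o in old):
            events["birth"].append(j)

    return events


# Invented toy clusters for one hypothetical target noun.
old_clusters = [{"rat", "hamster", "rabbit", "squirrel"}, {"timid", "shy"}]
new_clusters = [
    {"rat", "hamster"},
    {"rabbit", "squirrel"},
    {"keyboard", "cursor", "trackpad"},
]
events = compare_senses(old_clusters, new_clusters)
```

Under these assumed criteria, the first old cluster is reported as a split, the second as a death, and the third new cluster as a birth. A real pipeline would first need the clustering step over distributional-thesaurus networks, which this sketch takes as given.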
Type of Medium:
Online Resource
ISSN:
1351-3249, 1469-8110
DOI:
10.1017/S135132491500011X
Language:
English
Publisher:
Cambridge University Press (CUP)
Publication Date:
2015
ZDB ID:
1481165-0
SSG:
7,11