    In: Natural Language Engineering, Cambridge University Press (CUP), Vol. 17, No. 1 (2011-01), p. 37-70
    Abstract: In this article, we demonstrate several novel ways in which insights from information theory (IT) and computational linguistics (CL) can be woven into a vector-space-model (VSM) approach to information retrieval (IR). Our proposals focus, essentially, on three areas: pre-processing (morphological analysis), term weighting, and alternative geometrical models to the widely used term-by-document matrix. The latter include (1) PARAFAC2 decomposition of a term-by-document-by-language tensor, and (2) eigenvalue decomposition of a term-by-term matrix (inspired by Statistical Machine Translation). We evaluate all proposals, comparing them to a ‘standard’ approach based on Latent Semantic Analysis, on a multilingual document clustering task. The evidence suggests that proper consideration of IT within IR is indeed called for: in all cases, our best results are achieved using the information-theoretic variations upon the standard approach. Furthermore, we show that different information-theoretic options can be combined for still better results. A key function of language is to encode and convey information, and contributions of IT to the field of CL can be traced back a number of decades. We think that our proposals help bring IR and CL more into line with one another. In our conclusion, we suggest that the fact that our proposals yield empirical improvements is not coincidental given that they increase the theoretical transparency of VSM approaches to IR; on the contrary, they help shed light on why aspects of these approaches work as they do.
    Material type: Online resource
    ISSN: 1351-3249, 1469-8110
    Language: English
    Publisher: Cambridge University Press (CUP)
    Publication date: 2011
    ZDB ID: 1481165-0
    SSG: 7,11