Extent:
1 online resource (xv, 183 pages), illustrations
ISBN:
1681733935, 9781681733937
Series:
Synthesis lectures on data mining and knowledge discovery #15
Contents:
Real-world data, though massive, is largely unstructured, in the form of natural-language text. It is challenging but highly desirable to mine structures from massive text data without extensive human annotation and labeling. In this book, we investigate the principles and methodologies of mining structures of factual knowledge (e.g., entities and their relationships) from massive, unstructured text corpora. Departing from many existing structure extraction methods that rely heavily on human-annotated data for model training, our effort-light approach leverages human-curated facts stored in external knowledge bases as distant supervision and exploits rich data redundancy in large text corpora for context understanding. This effort-light mining approach leads to a series of new principles and powerful methodologies for structuring text corpora, including: (1) entity recognition, typing, and synonym discovery; (2) entity relation extraction; and (3) open-domain attribute-value mining and information extraction. This book introduces this new research frontier and points out some promising research directions.
Contents:
1. Introduction -- 1.1 Overview of the book -- 1.1.1 Part I: Identifying typed entities -- 1.1.2 Part II: Extracting typed entity relationships -- 1.1.3 Part III: Toward automated factual structure mining -- 2. Background -- 2.1 Entity structures -- 2.2 Relation structures -- 2.3 Distant supervision from knowledge bases -- 2.4 Mining entity and relation structures -- 2.5 Common notations -- 3. Literature review -- 3.1 Hand-crafted methods -- 3.2 Traditional supervised learning methods -- 3.2.1 Sequence labeling methods -- 3.2.2 Supervised relation extraction methods -- 3.3 Weakly supervised extraction methods -- 3.3.1 Semi-supervised learning -- 3.3.2 Pattern-based bootstrapping -- 3.4 Distantly supervised learning methods -- 3.5 Learning with noisy labeled data -- 3.6 Open-domain information extraction --
Contents:
Bibliography -- Authors' biographies
Contents:
Part I. Identifying typed entities -- 4. Entity recognition and typing with knowledge bases -- 4.1 Overview and motivation -- 4.2 Problem definition -- 4.3 Relation phrase-based graph construction -- 4.3.1 Candidate generation -- 4.3.2 Mention-name subgraph -- 4.3.3 Name-relation phrase subgraph -- 4.3.4 Mention correlation subgraph -- 4.4 Clustering-integrated type propagation on graphs -- 4.4.1 Seed mention generation -- 4.4.2 Relation phrase clustering -- 4.4.3 The joint optimization problem -- 4.4.4 The ClusType algorithm -- 4.4.5 Computational complexity analysis -- 4.5 Experiments -- 4.5.1 Data preparation -- 4.5.2 Experimental settings -- 4.5.3 Experiments and performance study -- 4.6 Discussion -- 4.7 Summary -- 5. Fine-grained entity typing with knowledge bases -- 5.1 Overview and motivation -- 5.2 Preliminaries -- 5.3 The AFET framework -- 5.3.1 Text feature generation -- 5.3.2 Training set partition -- 5.3.3 The joint mention-type model -- 5.3.4 Modeling type correlation -- 5.3.5 Modeling noisy type labels -- 5.3.6 Hierarchical partial-label embedding -- 5.4 Experiments -- 5.4.1 Data preparation -- 5.4.2 Evaluation settings -- 5.4.3 Performance comparison and analyses -- 5.5 Discussion and case analysis -- 5.6 Summary -- 6. Synonym discovery from large corpus / Meng Qu -- 6.1 Overview and motivation -- 6.1.1 Challenges -- 6.1.2 Proposed solution -- 6.2 The DPE framework -- 6.2.1 Synonym seed collection -- 6.2.2 Joint optimization problem -- 6.2.3 Distributional module -- 6.2.4 Pattern module -- 6.3 Experiment -- 6.4 Summary --
Contents:
Part II. Extracting typed relationships -- 7. Joint extraction of typed entities and relationships -- 7.1 Overview and motivation -- 7.2 Preliminaries -- 7.3 The CoType framework -- 7.3.1 Candidate generation -- 7.3.2 Joint entity and relation embedding -- 7.3.3 Model learning and type inference -- 7.4 Experiments -- 7.4.1 Data preparation and experiment setting -- 7.4.2 Experiments and performance study -- 7.5 Discussion -- 7.6 Summary -- 8. Pattern-enhanced embedding learning for relation extraction / Meng Qu -- 8.1 Overview and motivation -- 8.1.1 Challenges -- 8.1.2 Proposed solution -- 8.2 The REPEL framework -- 8.3 Experiment -- 8.4 Summary -- 9. Heterogeneous supervision for relation extraction / Liyuan Liu -- 9.1 Overview and motivation -- 9.2 Preliminaries -- 9.2.1 Relation extraction -- 9.2.2 Heterogeneous supervision -- 9.2.3 Problem definition -- 9.3 The REHession framework -- 9.3.1 Modeling relation mention -- 9.3.2 True label discovery -- 9.3.3 Modeling relation type -- 9.3.4 Model learning -- 9.3.5 Relation type inference -- 9.4 Experiments -- 9.5 Summary -- 10. Indirect supervision: leveraging knowledge from auxiliary tasks / Zeqiu Wu -- 10.1 Overview and motivation -- 10.1.1 Challenges -- 10.1.2 Proposed solution -- 10.2 The proposed approach -- 10.2.1 Heterogeneous network construction -- 10.2.2 Joint RE and QA embedding -- 10.2.3 Type inference -- 10.3 Experiments -- 10.4 Summary --
Contents:
Part III. Toward automated factual structure mining -- 11. Mining entity attribute values with meta patterns / Meng Jiang -- 11.1 Overview and motivation -- 11.1.1 Challenges -- 11.1.2 Proposed solution -- 11.1.3 Problem formulation -- 11.2 The MetaPAD framework -- 11.2.1 Generating meta patterns by context-aware segmentation -- 11.2.2 Grouping synonymous meta patterns -- 11.2.3 Adjusting type levels for preciseness -- 11.3 Summary -- 12. Open information extraction with global structure cohesiveness / Qi Zhu -- 12.1 Overview and motivation -- 12.1.1 Proposed solution -- 12.2 The ReMine framework -- 12.2.1 The joint optimization problem -- 12.3 Summary -- 13. Applications -- 13.1 Structuring life science papers: the Life-iNet system -- 13.2 Extracting document facets from technical corpora -- 13.3 Comparative document analysis -- 14. Conclusions -- 14.1 Effort-light StructMine: summary -- 14.2 Conclusion -- 15. Vision and future work -- 15.1 Extracting implicit patterns from massive unlabeled corpora -- 15.2 Enriching factual structure representation --
Note:
Includes bibliographical references (pages 167-181)
Other editions:
ISBN 9781681733944
Other editions:
Print version ISBN 9781681733920
Language:
English