KOBV Portal

Online Resource

Biomedical Information Extraction : Mining Disease Associated Genes from Literature (2014)

Huang, Zhong [author]

Details

UID: (DE-627)858878399

Format: 1 online resource

Content: Discovering disease-associated genes is a critical step toward realizing personalized medicine. However, empirical and clinical validation of disease-associated genes is time-consuming and expensive. In silico discovery of disease-associated genes from the literature is therefore becoming an essential first step in biomarker discovery, supporting hypothesis formulation and decision making. The completion of the Human Genome Project and the advent of high-throughput technology have produced a tremendous amount of data, resulting in exponential growth of the biomedical knowledge deposited in literature databases. The sheer quantity of unexplored information causes information overload for biomedical researchers and poses a major challenge for informatics researchers trying to meet users' information extraction needs. This thesis focuses on mining disease-associated genes from the PubMed literature database using information extraction (IE) methods based on machine learning and graph theory. Mining disease-associated genes is not trivial and requires a pipeline of information extraction steps and methods. Beginning with named entity recognition (NER), the author introduced semantic concept type into the feature space for conditional random field learning and demonstrated the effectiveness of the concept feature for disease NER. The effects of domain-specific part-of-speech (POS) tagging, domain-specific dictionaries, and the named entity encoding scheme on NER performance were also explored. Experimental results show that combining a knowledge base with the concept feature space significantly improves overall disease NER performance. The thesis also shows that shallow linguistic features of global and local word-sequence context can be used with a string-kernel-based support vector machine (SVM) for efficient disease-gene relation extraction.
Lastly, a disease-associated gene network was constructed from a concept co-occurrence matrix computed over a disease-focused document collection and subjected to systematic topology analysis. The gene network was then merged with a seed-gene expanded network to form a heterogeneous disease-gene network. The author identified and prioritized disease-associated genes using graph centrality measures. This novel approach provides a new means of extracting disease-associated genes from large corpora.
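
The final step of the abstract (co-occurrence network, then centrality-based prioritization) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say which centrality measures were used, so normalized degree centrality is an assumption here, and the function names and toy gene lists are hypothetical. Each document is reduced to the set of gene concepts it mentions, two genes are linked if they co-occur in at least one document, and genes are ranked by how many distinct genes they are linked to.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how often each unordered gene pair appears in the same document."""
    counts = Counter()
    for genes in documents:
        # sorted(set(...)) removes repeats and gives every pair a canonical order
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    return counts


def degree_centrality(counts):
    """Normalized degree centrality: distinct neighbours divided by (n - 1)."""
    neighbours = {}
    for a, b in counts:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {gene: 0.0 for gene in neighbours}
    return {gene: len(linked) / (n - 1) for gene, linked in neighbours.items()}


def rank_genes(documents):
    """Rank genes by centrality, highest first; ties are broken by gene name."""
    scores = degree_centrality(cooccurrence_counts(documents))
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy input: gene mentions per document, not data from the thesis.
docs = [
    ["BRCA1", "TP53"],
    ["BRCA1", "BRCA2"],
    ["BRCA1", "ATM", "TP53"],
]
ranking = rank_genes(docs)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

In this toy input BRCA1 co-occurs with every other gene and therefore ranks first. A fuller version would weight edges by co-occurrence count and compare several centrality measures, which the sketch leaves out.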

Note: Dissertation 2014

Language: English

Keywords: Thesis

URL: Full text (free of charge)
