Language Processing System

The Experts below are selected from a list of 74076 Experts worldwide ranked by ideXlab platform

Li Zhou - One of the best experts on this subject based on the ideXlab platform.

  • Using machine learning to identify health outcomes from electronic health record data
    Current Epidemiology Reports, 2018
    Co-Authors: Jenna Wong, Li Zhou, Mara Murray E Horwitz
    Abstract:

    Electronic health records (EHRs) contain valuable data for identifying health outcomes, but these data also present numerous challenges when creating computable phenotyping algorithms. Machine learning methods could help with some of these challenges. In this review, we discuss four common scenarios that researchers may find helpful for thinking critically about when and for what tasks machine learning may be used to identify health outcomes from EHR data. We first consider the conditions in which machine learning may be especially useful with respect to two dimensions of a health outcome: (1) the characteristics of its diagnostic criteria and (2) the format in which its diagnostic data are usually stored within EHR systems. In the first dimension, we propose that for health outcomes with diagnostic criteria involving many clinical factors, vague definitions, or subjective interpretations, machine learning may be useful for modeling the complex diagnostic decision-making process from a vector of clinical inputs to identify individuals with the health outcome. In the second dimension, we propose that for health outcomes where diagnostic information is largely stored in unstructured formats such as free text or images, machine learning may be useful for extracting and structuring this information as part of a natural language processing system or an image recognition task. We then consider these two dimensions jointly to define four common scenarios of health outcomes. For each scenario, we discuss the potential uses for machine learning, first assuming accurate and complete EHR data and then relaxing these assumptions to accommodate the limitations of real-world EHR systems. We illustrate these four scenarios using concrete examples and describe how recent studies have used machine learning to identify these health outcomes from EHR data.
    Machine learning has great potential to improve the accuracy and efficiency of health outcome identification from EHR systems, especially under certain conditions. To promote the use of machine learning in EHR-based phenotyping tasks, future work should prioritize efforts to increase the transportability of machine learning algorithms for use in multi-site settings.

  • Automated identification of wound information in clinical notes of patients with heart diseases: Developing and validating a natural language processing application
    International Journal of Nursing Studies, 2016
    Co-Authors: Maxim Topaz, Dawn Dowding, Anna Zisberg, Kathryn H Bowles, Li Zhou
    Abstract:

    Background: Electronic health records are increasingly used by nurses, with up to 80% of the health data recorded as free text. However, only a few studies have developed nursing-relevant tools that help busy clinicians identify the information they need at the point of care. Objective: This study developed and validated one of the first automated natural language processing applications to extract wound information (wound type, pressure ulcer stage, wound size, anatomic location, and wound treatment) from free-text clinical notes. Methods and design: First, two human annotators manually reviewed a purposeful training sample (n=360) and a random test sample (n=1100) of clinical notes (including 50% discharge summaries and 50% outpatient notes), identified wound cases, and created a gold standard dataset. We then trained and tested our natural language processing system (known as MTERMS) to process the wound information. Finally, we assessed our automated approach by comparing system-generated findings against the gold standard. We also compared the prevalence of wound cases identified from free-text data with coded diagnoses in the structured data. Results: The testing dataset included 101 notes (9.2%) with wound information. The overall system performance was good (F-measure, a composite measure of the system's accuracy, was 92.7%), with the best results for wound treatment (F-measure=95.7%) and the poorest results for wound size (F-measure=81.9%). Only 46.5% of wound notes had a structured code for a wound diagnosis. Conclusions: The natural language processing system achieved good performance on a subset of randomly selected discharge summaries and outpatient notes. More than half of the wound notes had no coded wound diagnosis, which highlights the significance of using natural language processing to enrich clinical decision making. Our future steps will include expansion of the application's information coverage to other relevant wound factors and validation of the model with external data.
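
The F-measure quoted in the wound-information abstract above is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that computation, assuming the balanced (F1) form; the function name and the counts are hypothetical and are not taken from the study.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if true_pos == 0:
        # No correct extractions: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not figures from the study).
score = f_measure(true_pos=90, false_pos=10, false_neg=10)
print(f"F-measure = {score:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot reach a high F-measure by maximizing precision or recall alone.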

Carol Friedman - One of the best experts on this subject based on the ideXlab platform.

  • Deriving comorbidities from medical records using natural language processing
    Journal of the American Medical Informatics Association, 2013
    Co-Authors: Hojjat Salmasian, Daniel E. Freedberg, Carol Friedman
    Abstract:

    Extracting comorbidity information is crucial for phenotypic studies because of the confounding effect of comorbidities. We developed an automated method that accurately determines comorbidities from electronic medical records. Using a modified version of the Charlson comorbidity index (CCI), two physicians created a reference standard of comorbidities by manual review of 100 admission notes. We processed the notes using the MedLEE natural language processing system, and wrote queries to extract comorbidities automatically from its structured output. Interrater agreement for the reference set was very high (97.7%). Our method yielded an F1 score of 0.761 and the summed CCI score was not different from the reference standard (p=0.329, power 80.4%). In comparison, obtaining comorbidities from claims data yielded an F1 score of 0.741, due to lower sensitivity (66.1%). Because CCI has previously been validated as a predictor of mortality and readmission, our method could allow automated prediction of these outcomes.

  • GENIES: A natural-language processing system for the extraction of molecular pathways from journal articles
    Bioinformatics, 2001
    Co-Authors: Carol Friedman, Pauline Kra, Michael Krauthammer, Hong Yu, Andrey Rzhetsky
    Abstract:

    Systems that extract structured information from natural language passages have been highly successful in specialized domains. The time is opportune for developing analogous applications for molecular biology and genomics. We present a system, GENIES, that extracts and structures information about cellular pathways from the biological literature in accordance with a knowledge model that we developed earlier. We implemented GENIES by modifying an existing medical natural language processing system, MedLEE, and performed a preliminary evaluation study. Our results demonstrate the value of the underlying techniques for the purpose of acquiring valuable knowledge from biological journals.

  • A broad-coverage natural language processing system
    Proceedings / AMIA Symposium, 2000
    Co-Authors: Carol Friedman
    Abstract:

    Natural language processing (NLP) systems that extract clinical information from textual reports were shown to be effective for limited domains and for particular applications. Because an NLP system typically requires substantial resources to develop, it is beneficial if it is designed to be easily extendible to multiple domains and applications. This paper describes multiple extensions of an NLP system called MedLEE, which was originally developed for the domain of radiological reports of the chest, but has subsequently been extended to mammography, discharge summaries, all of radiology, electrocardiography, echocardiography, and pathology.

  • Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports.
    Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium, 1997
    Co-Authors: N. L. Jain, Carol Friedman
    Abstract:

    There is a need for encoded data for computerized clinical decision support, but most such data are unavailable because they are in free-text reports. Natural language processing offers one alternative for encoding such data. MedLEE is a natural language processing system that is in routine use for encoding chest radiograph and mammogram reports. In this paper, we study MedLEE's ability to identify mammogram findings suspicious for breast cancer by comparing MedLEE's encoding with a logbook of all suspicious findings maintained by the mammography center. While MedLEE was able to identify all the suspicious findings, it varied in the level of granularity, particularly about the location of the suspicious finding. Thus, natural language processing is a useful technique for encoding mammogram reports in order to detect suspicious findings.

Lolkje T W de Jong-van den Berg - One of the best experts on this subject based on the ideXlab platform.

  • Using concepts in literature-based discovery: Simulating Swanson's Raynaud-fish oil and migraine-magnesium discoveries
    Journal of the Association for Information Science and Technology, 2001
    Co-Authors: Marc Weeber, Henny Klein, Lolkje T W de Jong-van den Berg
    Abstract:

    Literature-based discovery has resulted in new knowledge. In the biomedical context, Don R. Swanson has generated several literature-based hypotheses that have been corroborated experimentally and clinically. In this paper, we propose a two-step model of the discovery process in which hypotheses are generated and subsequently tested. We have implemented this model in a natural language processing system that uses biomedical Unified Medical Language System (UMLS) concepts as its unit of analysis. We use the semantic information that is provided with these concepts as a powerful filter to successfully simulate Swanson's discoveries of connecting Raynaud's disease with fish oil and migraine with a magnesium deficiency.

  • Text-based discovery in biomedicine: The architecture of the DAD-system
    American Medical Informatics Association Annual Symposium, 2000
    Co-Authors: Marc Weeber, Alan R. Aronson, Henny Klein, James G Mork, Lolkje T W de Jong-van den Berg
    Abstract:

    Current scientific research takes place in highly specialized contexts with poor communication between disciplines as a likely consequence. Knowledge from one discipline may be useful for the other without researchers knowing it. As scientific publications are a condensation of this knowledge, literature-based discovery tools may help the individual scientist to explore new useful domains. We report on the development of the DAD-system, a concept-based natural language processing system for PubMed citations that provides the biomedical researcher such a tool. We describe the general architecture and illustrate its operation by a simulation of a well-known text-based discovery: the favorable effects of fish oil on patients suffering from Raynaud's disease [1].
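
The two abstracts above describe generating hypotheses by linking concepts across citations and using semantic information as a filter. The sketch below illustrates the hypothesis-generation step with a plain co-occurrence search over a toy corpus. It is not the DAD-system's implementation: the citations, the semantic category labels, and the function names are all hypothetical, and real UMLS concepts and semantic types are not used.

```python
from collections import defaultdict

# Toy corpus: each "citation" is the set of concepts found in it (hypothetical data).
CITATIONS = [
    {"raynaud disease", "blood viscosity"},
    {"raynaud disease", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "platelet aggregation"},
    {"raynaud disease", "nifedipine"},
]

# Hypothetical semantic categories, standing in for the semantic filter.
SEMANTIC_TYPE = {
    "raynaud disease": "disease",
    "blood viscosity": "physiologic function",
    "platelet aggregation": "physiologic function",
    "fish oil": "substance",
    "aspirin": "substance",
    "nifedipine": "substance",
}


def cooccurring(concept, citations):
    """Return every concept that shares at least one citation with `concept`."""
    found = set()
    for citation in citations:
        if concept in citation:
            found |= citation - {concept}
    return found


def generate_hypotheses(start, citations, link_type, target_type):
    """List targets connected to `start` only indirectly, via linking concepts.

    Linking concepts must have semantic type `link_type`; targets must have
    `target_type` and must never co-occur directly with `start`.
    """
    direct = cooccurring(start, citations)
    links = {c for c in direct if SEMANTIC_TYPE.get(c) == link_type}
    support = defaultdict(set)
    for link in links:
        for target in cooccurring(link, citations):
            if target == start or target in direct:
                continue  # already known to be related; not a new hypothesis
            if SEMANTIC_TYPE.get(target) == target_type:
                support[target].add(link)
    # Rank by number of supporting links, then alphabetically for stability.
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))


hypotheses = generate_hypotheses(
    "raynaud disease", CITATIONS, "physiologic function", "substance"
)
for target, links in hypotheses:
    print(target, "via", sorted(links))
```

In this toy corpus, nifedipine is excluded because it already co-occurs with the starting concept, so only indirectly connected substances are proposed. The second step of the model, testing a generated hypothesis, is not covered by this sketch.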

Guergana K Savova - One of the best experts on this subject based on the ideXlab platform.

  • DeepPhe: A natural language processing system for extracting cancer phenotypes from clinical records
    Cancer Research, 2017
    Co-Authors: Guergana K Savova, Timothy Miller, Sean Finan, Harry Hochheiser, Chen Lin, David Harris, Olga Medvedeva, Eugene Tseytlin, Melissa Castine, Girish Chavan
    Abstract:

    Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes is currently mostly performed manually, making it difficult to correlate phenotypic data to genomic data. In addition, genomic data are being produced at an increasingly faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from electronic medical records of cancer patients. The system implements advanced natural language processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of the University of Pittsburgh Medical Center breast cancer patients. The resulting platform provides critical and missing computational methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment. Cancer Res; 77(21); e115–8. ©2017 AACR.

  • Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, component evaluation and applications
    Journal of the American Medical Informatics Association, 2010
    Co-Authors: Guergana K Savova, Karin C. Kipper-Schuler, James J Masanz, Jiaping Zheng, Sunghwan Sohn, Philip V Ogren, Christopher G. Chute
    Abstract:

    We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
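
The cTAKES abstract above reports named entity F-scores separately for exact and overlapping spans. The sketch below shows one common way to define the two matching criteria over character offsets. It is an illustration under stated assumptions, not the cTAKES evaluation code: spans are assumed to be half-open `(start, end)` offset pairs, and the function names and example spans are hypothetical.

```python
def spans_match(gold, pred, mode):
    """Compare two half-open (start, end) character spans.

    mode="exact": both boundaries must agree.
    mode="overlap": the spans must share at least one character.
    """
    (g_start, g_end), (p_start, p_end) = gold, pred
    if mode == "exact":
        return g_start == p_start and g_end == p_end
    return p_start < g_end and g_start < p_end


def span_f_score(gold_spans, pred_spans, mode="exact"):
    """F-score of predicted entity spans against gold-standard spans."""
    # Predictions that match some gold span count toward precision.
    matched_pred = sum(
        any(spans_match(g, p, mode) for g in gold_spans) for p in pred_spans
    )
    # Gold spans that are matched by some prediction count toward recall.
    matched_gold = sum(
        any(spans_match(g, p, mode) for p in pred_spans) for g in gold_spans
    )
    precision = matched_pred / len(pred_spans) if pred_spans else 0.0
    recall = matched_gold / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: the second prediction has a shifted left boundary.
gold = [(0, 5), (10, 20), (30, 35)]
pred = [(0, 5), (12, 20), (40, 45)]
print("exact  :", round(span_f_score(gold, pred, "exact"), 3))
print("overlap:", round(span_f_score(gold, pred, "overlap"), 3))
```

A prediction with a slightly wrong boundary is counted as an error under exact matching but as a hit under overlap matching, which is why the overlapping-span score in an evaluation of this kind is never lower than the exact-span score.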

Girish Chavan - One of the best experts on this subject based on the ideXlab platform.

  • DeepPhe: A natural language processing system for extracting cancer phenotypes from clinical records
    Cancer Research, 2017
    Co-Authors: Guergana K Savova, Timothy Miller, Sean Finan, Harry Hochheiser, Chen Lin, David Harris, Olga Medvedeva, Eugene Tseytlin, Melissa Castine, Girish Chavan
    Abstract:

    Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes is currently mostly performed manually, making it difficult to correlate phenotypic data to genomic data. In addition, genomic data are being produced at an increasingly faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from electronic medical records of cancer patients. The system implements advanced natural language processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of the University of Pittsburgh Medical Center breast cancer patients. The resulting platform provides critical and missing computational methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment. Cancer Res; 77(21); e115–8. ©2017 AACR.

  • A federated network for translational cancer research using clinical data and biospecimens
    Cancer Research, 2015
    Co-Authors: Rebecca S Jacobson, Girish Chavan, Michael J Becich, Roni J Bollag, Julia Corrigan, Rajiv Dhir, Michael Feldman, Carmelo Gaudioso, Elizabeth Legowski, Nita J Maihle
    Abstract:

    Advances in cancer research and personalized medicine will require significant new bridging infrastructures, including more robust biorepositories that link human tissue to clinical phenotypes and outcomes. In order to meet that challenge, four cancer centers formed the Text Information Extraction System (TIES) Cancer Research Network, a federated network that facilitates data and biospecimen sharing among member institutions. Member sites can access pathology data that are de-identified and processed with the TIES natural language processing system, which creates a repository of rich phenotype data linked to clinical biospecimens. TIES incorporates multiple security and privacy best practices that, combined with legal agreements, network policies, and procedures, enable regulatory compliance. The TIES Cancer Research Network now provides integrated access to investigators at all member institutions, where multiple investigator-driven pilot projects are underway. Examples of federated search across the network illustrate the potential impact on translational research, particularly for studies involving rare cancers, rare phenotypes, and specific biologic behaviors. The network satisfies several key desiderata including local control of data and credentialing, inclusion of rich phenotype information, and applicability to diverse research objectives. The TIES Cancer Research Network presents a model for a national data and biospecimen network.