Unstructured Information

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The experts below are selected from a list of 360 experts worldwide, as ranked by the ideXlab platform.

Michael Beetz - One of the best experts on this subject based on the ideXlab platform.

  • RoboSherlock: Unstructured information processing for robot perception
    International Conference on Robotics and Automation, 2015
    Co-Authors: Michael Beetz, Ferenc Bálint-Benczédi, Nico Blodow, Daniel Nyga, Thiemo Wiedemeyer, Zoltán-Csaba Márton
    Abstract:

    We present RoboSherlock, an open-source software framework for implementing perception systems for robots performing human-scale everyday manipulation tasks. In RoboSherlock, the perception and interpretation of realistic scenes are formulated as an unstructured information management (UIM) problem. Applying the UIM principle supports the implementation of perception systems that can answer task-relevant queries about objects in a scene, boost object recognition performance by combining the strengths of multiple perception algorithms, support knowledge-enabled reasoning about objects, and enable automatic, knowledge-driven generation of processing pipelines. We demonstrate the potential of the proposed framework with three feasibility studies of systems for real-world scene perception that have been built on top of RoboSherlock.

  • PR2 looking at things: Ensemble learning for unstructured information processing with Markov logic networks
    International Conference on Robotics and Automation, 2014
    Co-Authors: Daniel Nyga, Ferenc Bálint-Benczédi, Michael Beetz
    Abstract:

    We investigate the perception and reasoning task of answering queries about realistic scenes with objects of daily use perceived by a robot. A key problem implied by the task is the variety of perceivable object properties, such as shape, texture, color, size, text pieces and logos, which goes beyond the capabilities of individual state-of-the-art perception methods. A promising alternative is to employ combinations of more specialized perception methods. In this paper, we propose a novel combination method, which structures perception as a two-step process, and apply this method in our object perception system. In the first step, specialized methods annotate detected object hypotheses with symbolic pieces of information. In the second step, the given query Q is answered by inferring the conditional probability P(Q | E), where E denotes the symbolic pieces of information considered as evidence. In this setting, Q and E are part of a probabilistic model of scenes, objects and their annotations, for which the perception method has learned a joint probability distribution beforehand. Our proposed method has substantial advantages over alternative methods in terms of the generality of queries that can be answered, the generation of information that can actively guide perception, the ease of extension, the possibility of including additional kinds of evidence, and its potential for the realization of self-improving and self-specializing perception systems. We show for object categorization, which is a subclass of the probabilistic inferences, that impressive categorization performance can be achieved by combining the employed expert perception methods in a synergistic manner.
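The second step of the two-step process described in the abstract above, answering a query Q by inferring P(Q | E), can be illustrated with a minimal sketch. This is not the authors' Markov logic network: it assumes a tiny, hand-written joint table over three hypothetical variables (`category`, `shape`, `color`) and computes the conditional by brute-force summation. All names and probabilities are invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy joint distribution P(category, shape, color).
# The values are made up and sum to 1; a real system would learn them.
VARIABLES = ("category", "shape", "color")
JOINT = {
    ("cereal_box", "box", "yellow"): 0.30,
    ("cereal_box", "box", "red"): 0.10,
    ("ketchup", "cylinder", "red"): 0.35,
    ("ketchup", "box", "red"): 0.05,
    ("mug", "cylinder", "yellow"): 0.20,
}


def conditional(query_var, evidence):
    """Return P(query_var | evidence) as a dict, by summing joint entries.

    `evidence` maps variable names to observed values, playing the role
    of the symbolic annotations E produced by the first step.
    """
    q_idx = VARIABLES.index(query_var)
    scores = defaultdict(float)
    for assignment, p in JOINT.items():
        # Keep only joint entries consistent with every piece of evidence.
        if all(assignment[VARIABLES.index(k)] == v for k, v in evidence.items()):
            scores[assignment[q_idx]] += p
    total = sum(scores.values())  # this is P(E)
    if total == 0:
        raise ValueError("evidence has zero probability under the model")
    return {value: p / total for value, p in scores.items()}


# Annotators reported a red, box-shaped object; query its category.
posterior = conditional("category", {"shape": "box", "color": "red"})
best_category = max(posterior, key=posterior.get)
```

Enumeration like this only scales to toy models; it is meant to show what "inferring P(Q | E) from a learned joint distribution" means, not how inference is performed in the paper.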

Daniel Nyga - One of the best experts on this subject based on the ideXlab platform.

  • RoboSherlock: Unstructured information processing for robot perception
    International Conference on Robotics and Automation, 2015
    Co-Authors: Michael Beetz, Ferenc Bálint-Benczédi, Nico Blodow, Daniel Nyga, Thiemo Wiedemeyer, Zoltán-Csaba Márton

  • PR2 looking at things: Ensemble learning for unstructured information processing with Markov logic networks
    International Conference on Robotics and Automation, 2014
    Co-Authors: Daniel Nyga, Ferenc Bálint-Benczédi, Michael Beetz

Adam Lally - One of the best experts on this subject based on the ideXlab platform.

  • WatsonPaths: Scenario-based question answering and inference over unstructured information
    AI Magazine, 2017
    Co-Authors: Adam Lally, Erik T Mueller, Sugato Bagchi, Michael A Barborak, David W Buchanan, Jennifer Chu-Carroll, David A Ferrucci, Michael R Glass, Aditya Kalyanpur, William J Murdock
    Abstract:

    We present WatsonPaths, a novel system that can answer scenario-based questions. These include medical questions that present a patient summary and ask for the most likely diagnosis or most appropriate treatment. WatsonPaths builds on the IBM Watson question answering system. WatsonPaths breaks down the input scenario into individual pieces of information, asks relevant subquestions of Watson to conclude new information, and represents these results in a graphical model. Probabilistic inference is performed over the graph to conclude the answer. On a set of medical test preparation questions, WatsonPaths shows a significant improvement in accuracy over multiple baselines.

  • UIMA: An architectural approach to unstructured information processing in the corporate research environment
    Natural Language Engineering, 2004
    Co-Authors: David Ferrucci, Adam Lally
    Abstract:

    IBM Research has over 200 people working on Unstructured Information Management (UIM) technologies with a strong focus on Natural Language Processing (NLP). These researchers are engaged in activities ranging from natural language dialog, information retrieval, topic-tracking, named-entity detection, document classification and machine translation to bioinformatics and open-domain question answering. An analysis of these activities strongly suggested that improving the organization's ability to quickly discover each other's results and rapidly combine different technologies and approaches would accelerate scientific advance. Furthermore, the ability to reuse and combine results through a common architecture and a robust software framework would accelerate the transfer of research results in NLP into IBM's product platforms. Market analyses indicating a growing need to process unstructured information, specifically multilingual natural language text, coupled with IBM Research's investment in NLP, led to the development of a middleware architecture for processing unstructured information, dubbed UIMA. At the heart of UIMA are powerful search capabilities and a data-driven framework for the development, composition and distributed deployment of analysis engines. In this paper, we give a general introduction to UIMA, focusing on the design points of its analysis engine architecture, and we discuss how UIMA is helping to accelerate research and technology transfer.

  • Accelerating corporate research in the development, application and deployment of human language technologies
    North American Chapter of the Association for Computational Linguistics, 2003
    Co-Authors: David A Ferrucci, Adam Lally
    Abstract:

    IBM Research has over 200 people working on Unstructured Information Management (UIM) technologies with a strong focus on human language technologies (HLT). Spread out over the globe, they are engaged in activities ranging from natural language dialog to machine translation to bioinformatics to open-domain question answering. An analysis of these activities strongly suggested that improving the organization's ability to quickly discover each other's results and rapidly combine different technologies and approaches would accelerate scientific advance. Furthermore, the ability to reuse and combine results through a common architecture and a robust software framework would accelerate the transfer of research results in HLT into IBM's product platforms. Market analyses indicating a growing need to process unstructured information, specifically multilingual natural language text, coupled with IBM Research's investment in HLT, led to the development of a middleware architecture for processing unstructured information, dubbed UIMA. At the heart of UIMA are powerful search capabilities and a data-driven framework for the development, composition and distributed deployment of analysis engines. In this paper, we give a general introduction to UIMA, focusing on the design points of its analysis engine architecture, and we discuss how UIMA is helping to accelerate research and technology transfer.
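The idea of composing analysis engines, as described in the two UIMA abstracts above, can be sketched in a few lines. This is not the UIMA API: the `Document`, `Annotation`, and `run_pipeline` names are hypothetical, and the sketch only assumes that each engine reads a shared document object and adds annotations that later engines can build on.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "token" or "capitalized"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    """Hypothetical shared analysis structure passed between engines."""
    text: str
    annotations: list = field(default_factory=list)


def token_engine(doc):
    # First engine: mark every run of word characters as a token.
    for match in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("token", match.start(), match.end()))


def capitalized_engine(doc):
    # Second engine: reuses the tokens produced by the first engine.
    tokens = [a for a in doc.annotations if a.kind == "token"]
    for tok in tokens:
        if doc.text[tok.begin].isupper():
            doc.annotations.append(Annotation("capitalized", tok.begin, tok.end))


def run_pipeline(doc, engines):
    # Engines run in order and communicate only through the document.
    for engine in engines:
        engine(doc)
    return doc


doc = run_pipeline(Document("IBM Research built UIMA"),
                   [token_engine, capitalized_engine])
capitalized = [doc.text[a.begin:a.end]
               for a in doc.annotations if a.kind == "capitalized"]
```

Because engines share nothing but the document, they can be reordered, swapped, or reused in another pipeline, which is the kind of reuse and combination of results the abstracts argue for.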

Christopher G. Chute - One of the best experts on this subject based on the ideXlab platform.

  • MedXN: an open source medication extraction and normalization tool for clinical text.
    Journal of the American Medical Informatics Association (JAMIA), 2014
    Co-Authors: Sunghwan Sohn, Christopher G. Chute, Sean P. Murphy, Cheryl Clark, Scott Halgrim, Hongfang Liu
    Abstract:

    Objective: We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. Methods: Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expressions. Then, each medication name and its attributes were combined according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture. We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. Results: An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. Conclusions: The MedXN system was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions.

  • Modeling and executing electronic health records driven phenotyping algorithms using the NQF Quality Data Model and JBoss Drools engine
    American Medical Informatics Association Annual Symposium, 2012
    Co-Authors: Cory M Endle, Christopher G. Chute, Sahana Murthy, Craig Stancl, Dale Suesse, Davide Sottara, Stanley M Huff, Jyotishman Pathak
    Abstract:

    With increasing adoption of electronic health records (EHRs), the need for formal representations for EHR-driven phenotyping algorithms has been recognized for some time. The recently proposed Quality Data Model (QDM) from the National Quality Forum (NQF) provides an information model and a grammar that is intended to represent data collected during routine clinical care in EHRs as well as the basic logic required to represent the algorithmic criteria for phenotype definitions. The QDM is further aligned with Meaningful Use standards to ensure that the clinical data and algorithmic criteria are represented in a consistent, unambiguous and reproducible manner. However, phenotype definitions represented in QDM, while structured, cannot be executed readily on existing EHRs. Rather, human interpretation and subsequent implementation are required steps in this process. To address this need, the current study investigates the open-source JBoss® Drools rules engine for automatic translation of QDM criteria into rules for execution over EHR data. In particular, using the Apache Foundation's Unstructured Information Management Architecture (UIMA) platform, we developed a translator tool for converting QDM-defined phenotyping algorithm criteria into executable Drools rules scripts, and demonstrated their execution on real patient data from Mayo Clinic to identify cases for Coronary Artery Disease and Diabetes. To the best of our knowledge, this is the first study illustrating a framework and an approach for executing phenotyping criteria modeled in QDM using the Drools business rules management system.

  • Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, component evaluation and applications
    Journal of the American Medical Informatics Association, 2010
    Co-Authors: Guergana K Savova, Karin C. Kipper-Schuler, James J Masanz, Jiaping Zheng, Sunghwan Sohn, Philip V Ogren, Christopher G. Chute
    Abstract:

    We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans; and accuracy for concept mapping, negation, and status attributes of 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.

  • Mayo Clinic NLP system for patient smoking status identification
    Journal of the American Medical Informatics Association, 2008
    Co-Authors: Guergana K Savova, Philip V Ogren, Patrick H Duffy, James D Buntrock, Christopher G. Chute
    Abstract:

    This article describes our system entry for the 2006 i2b2 contest "Challenges in Natural Language Processing for Clinical Data" for the task of identifying the smoking status of patients. Our system makes the simplifying assumption that patient-level smoking status determination can be achieved by accurately classifying individual sentences from a patient's record. We created our system with reusable text analysis components built on the Unstructured Information Management Architecture and Weka. This reuse of code minimized the development effort related specifically to our smoking status classifier. We report precision, recall, F-score, and 95% exact confidence intervals for each metric. Recasting the classification task at the sentence level and reusing code from other text analysis projects allowed us to quickly build a classification system that achieves a system F-score of 92.64 on held-out data tests and of 85.57 on the formal evaluation data. Our general medical natural language engine is easily adaptable to a real-world medical informatics application. Limitations in this use case include negation detection and temporal resolution.
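The simplifying assumption in the smoking status abstract above, that a patient-level label can be derived from sentence-level classifications, can be sketched as follows. This is not the authors' system, which used trained classifiers built on UIMA and Weka: the keyword rules, label names, and precedence order below are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import re

# Hypothetical keyword rules standing in for a trained sentence classifier.
# Rules are tried in order; the first matching pattern decides the label.
RULES = [
    ("PAST_SMOKER", re.compile(r"\b(quit smoking|former smoker|ex-smoker)\b", re.I)),
    ("NON_SMOKER", re.compile(r"\b(never smoked|non-?smoker|denies smoking)\b", re.I)),
    ("CURRENT_SMOKER", re.compile(r"\b(smokes|current smoker|packs? per day)\b", re.I)),
]

# Assumed precedence for resolving conflicting sentence labels in one record.
PRECEDENCE = ["CURRENT_SMOKER", "PAST_SMOKER", "NON_SMOKER", "UNKNOWN"]


def classify_sentence(sentence):
    """Label one sentence; sentences with no smoking mention are UNKNOWN."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "UNKNOWN"


def classify_patient(record):
    """Split a record into sentences and roll the labels up to one status."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", record) if s.strip()]
    labels = {classify_sentence(s) for s in sentences}
    for label in PRECEDENCE:
        if label in labels:
            return label
    return "UNKNOWN"


status = classify_patient(
    "Patient presents with cough. He quit smoking in 1999. No fever."
)
```

The rule set also shows why the abstract lists negation detection and temporal resolution as limitations: a sentence such as "denies smoking" only works here because it is hard-coded, and nothing in the sketch reasons about when a statement was true.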

Ferenc Bálint-Benczédi - One of the best experts on this subject based on the ideXlab platform.

  • RoboSherlock: Unstructured information processing for robot perception
    International Conference on Robotics and Automation, 2015
    Co-Authors: Michael Beetz, Ferenc Bálint-Benczédi, Nico Blodow, Daniel Nyga, Thiemo Wiedemeyer, Zoltán-Csaba Márton

  • PR2 looking at things: Ensemble learning for unstructured information processing with Markov logic networks
    International Conference on Robotics and Automation, 2014
    Co-Authors: Daniel Nyga, Ferenc Bálint-Benczédi, Michael Beetz