Syntactic Level

The experts below are selected from a list of 32,970 experts worldwide, ranked by the ideXlab platform.

Houkuan Huang - One of the best experts on this subject based on the ideXlab platform.

  • A multi-layer text classification framework based on two-level representation model
    Expert Systems With Applications, 2012
    Co-Authors: Liping Jing, Jian Yu, Houkuan Huang
    Abstract:

    Text categorization is one of the most common themes in data mining and machine learning. Unlike structured data, unstructured text is more difficult to analyze because it carries both complicated syntactic and semantic information. In this paper, we propose a two-level representation model (2RM) for text data: one level represents syntactic information and the other semantic information. At the syntactic level, each document is represented as a term vector whose components are term frequency-inverse document frequency (tf-idf) values. At the semantic level, a document is represented by the Wikipedia concepts related to the terms at the syntactic level. We also design a multi-layer classification framework (MLCLA) to exploit the syntactic and semantic information captured by the 2RM model. The MLCLA framework contains three classifiers: two are applied in parallel, one at the syntactic level and one at the semantic level, and their outputs are combined and fed to a third classifier, which produces the final result. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) show that the proposed 2RM model plus the MLCLA framework improves text classification performance compared with existing flat text representation models (term-based VSM, Term Semantic Kernel Model, concept-based VSM, Concept Semantic Kernel Model and Term+Concept VSM) combined with existing classification methods.
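
    To make the architecture concrete, here is a minimal sketch of the two-level idea with a three-classifier stack in the spirit of MLCLA, using scikit-learn. The toy corpus, labels and the `term_to_concepts` mapping are invented stand-ins for the paper's Wikipedia concept linking, not its actual implementation:

    ```python
    # Sketch: two-level representation (terms + concepts) with a three-classifier
    # stack in the spirit of 2RM/MLCLA. The concept mapping is a toy stand-in
    # for Wikipedia concept linking.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    docs = ["the court ruled on the patent case",
            "the team won the championship game",
            "the judge dismissed the lawsuit",
            "the striker scored in the final match"]
    labels = np.array([0, 1, 0, 1])  # 0 = legal, 1 = sports

    # Syntactic level: tf-idf term vectors.
    term_vec = TfidfVectorizer()
    X_term = term_vec.fit_transform(docs)

    # Semantic level: replace terms with (hypothetical) linked concepts.
    term_to_concepts = {"court": "Law", "patent": "Law", "judge": "Law",
                        "lawsuit": "Law", "championship": "Sport",
                        "game": "Sport", "striker": "Sport", "match": "Sport"}
    concept_docs = [" ".join(term_to_concepts.get(w, "") for w in d.split())
                    for d in docs]
    X_concept = TfidfVectorizer().fit_transform(concept_docs)

    # Two parallel level classifiers.
    clf_term = LogisticRegression().fit(X_term, labels)
    clf_concept = LogisticRegression().fit(X_concept, labels)

    # Third classifier combines the two levels' outputs. (Training the meta
    # classifier on in-sample predictions is a simplification; proper stacking
    # would use held-out predictions.)
    meta_features = np.hstack([clf_term.predict_proba(X_term),
                               clf_concept.predict_proba(X_concept)])
    clf_meta = LogisticRegression().fit(meta_features, labels)
    print(clf_meta.predict(meta_features))
    ```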

  • Semantics-based representation model for multi-layer text classification
    Knowledge-Based and Intelligent Information and Engineering Systems, 2010
    Co-Authors: Jiali Yun, Liping Jing, Houkuan Huang
    Abstract:

    Text categorization is one of the most common themes in data mining and machine learning. Unlike structured data, unstructured text is more complicated to analyze because it carries several kinds of information at once, e.g. syntactic and semantic. In this paper, we propose a semantics-based model that represents text data at two levels: one for syntactic information and the other for semantic information. The syntactic level represents each document as a term vector whose components record the tf-idf value of each term. The semantic level represents the document with the Wikipedia concepts related to the terms at the syntactic level. The syntactic and semantic information are efficiently combined by our proposed multi-layer classification framework. Experimental results on a benchmark dataset (Reuters-21578) show that the proposed representation model, combined with the proposed classification framework, improves text classification performance compared with flat text representation models (term VSM, concept VSM, term+concept VSM) plus existing classification methods.
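
    For reference, a small hand-rolled computation of the tf-idf values that the syntactic level records, using the common log-scaled idf (implementations differ in smoothing and normalization; the toy corpus is illustrative only):

    ```python
    # Sketch: tf-idf for a tiny corpus, tfidf(t, d) = tf(t, d) * log(N / df(t)).
    import math
    from collections import Counter

    corpus = [["data", "mining", "text"],
              ["text", "classification"],
              ["machine", "learning", "text"]]
    N = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))  # document frequency

    def tfidf(doc):
        tf = Counter(doc)
        return {t: tf[t] * math.log(N / df[t]) for t in tf}

    print(tfidf(corpus[0]))  # "text" gets weight 0: it appears in every document
    ```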

Charles L A Clarke - One of the best experts on this subject based on the ideXlab platform.

  • A comparative evaluation of techniques for syntactic-level source code analysis
    Asia-Pacific Software Engineering Conference, 2000
    Co-Authors: Anthony Cox, Charles L A Clarke
    Abstract:

    Many program maintenance tools rely on traditional parsing techniques to obtain syntactic-level models of the code being maintained. When, for some reason, code cannot be parsed, software maintainers are forced to fall back on ad hoc tools and techniques, such as grep. As an alternative, hierarchical lexical analysis augmented with simple data structures can be used to extract an approximation of the abstract syntax for a source file. Experiments indicate that such an approach is feasible and produces results comparable to those obtained using a parser.
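
    A small sketch of the lexical-approximation idea the abstract describes: a heuristic regular expression recovers an approximate list of C function definitions without a full parse, even from code a parser would reject. The pattern and sample source are illustrative, not the authors' tool:

    ```python
    # Sketch: approximate syntactic-level analysis by lexical means. A regex
    # recovers likely C function definitions even from code a parser may reject.
    import re

    source = """
    int add(int a, int b) { return a + b; }
    void broken(int x { /* missing paren: a parser would choke here */
    static double scale(double v) {
        return v * 2.0;
    }
    """

    # Heuristic: one or more type-ish words, a name, a same-line argument
    # list, then an opening brace.
    func_def = re.compile(
        r"^[ \t]*(?:[A-Za-z_]\w*[ \t*]+)+([A-Za-z_]\w*)[ \t]*\([^)\n]*\)[ \t]*\{",
        re.MULTILINE)

    for match in func_def.finditer(source):
        print("function:", match.group(1))  # add, scale; the malformed line is skipped
    ```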

György Szaszák - One of the best experts on this subject based on the ideXlab platform.

  • Using prosody to improve automatic speech recognition
    Speech Communication, 2010
    Co-Authors: Klára Vicsi, György Szaszák
    Abstract:

    This paper addresses acoustic processing and modelling of the supra-segmental characteristics of speech, with the aim of incorporating advanced syntactic- and semantic-level processing of spoken language into speech recognition and understanding tasks. The proposed modelling approach is very similar to that of standard speech recognition: basic HMM units (most often acoustic phoneme models) are trained and then connected according to a dictionary and a grammar (language model) to obtain a recognition network, along which recognition can also be interpreted as an alignment process. Here, the HMM framework is used to model speech prosody and to perform initial syntactic- and/or semantic-level processing of the input speech in parallel with standard speech recognition. Fundamental frequency and energy are used as acoustic-prosodic features.

    A method was implemented for syntactic-level information extraction from speech. The method was designed for fixed-stress languages, and it segments the input speech into syntactically linked word groups, or even single words corresponding to a syntactic unit (such word groups are sometimes called phonological phrases in psycholinguistics and can consist of one or more words). These so-called word-stress units are marked by prosody and have an associated fundamental frequency and/or energy contour that allows them to be discovered: HMMs for the different types of word-stress unit contour were trained and then used to recognize and align such units in the input speech. This prosodic segmentation also allows word-boundary recovery and can be used for N-best lattice rescoring based on prosodic information. The syntactic-level segmentation algorithm was evaluated for Hungarian and Finnish, two languages with fixed stress on the first syllable (that is, if a word is stressed, the stress falls on its first syllable). N-best rescoring based on syntactic-level word-stress unit alignment was shown to increase the number of correctly recognized words.

    For further syntactic- and semantic-level processing of the input speech in ASR, clause and sentence boundary detection and modality (sentence type) recognition were implemented. Again, the classification was carried out by HMMs, which model the prosodic contour of each clause and/or sentence modality type. Clause (and hence sentence) boundary detection exploits the HMM's capacity to dynamically align the reference prosodic structure to the utterance at the ASR input; this also allows punctuation to be marked automatically. This semantic-level processing was investigated for Hungarian and German; the correctness of the recognized modality types was 69% for Hungarian and 78% for German.
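
    A minimal sketch of the contour-classification step, assuming the hmmlearn library and synthetic F0/energy tracks in place of real prosodic features; the two contour types and all data are invented for illustration and do not reproduce the paper's models:

    ```python
    # Sketch: one Gaussian HMM per prosodic contour type; a segment is assigned
    # the type whose model gives the highest log-likelihood. Features are
    # illustrative stand-ins for fundamental frequency (Hz) and energy.
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)

    def synthetic_contour(falling):
        """Toy (F0, energy) track: falling or rising pitch plus noise."""
        n = 30
        f0 = np.linspace(200, 120, n) if falling else np.linspace(120, 200, n)
        energy = np.linspace(1.0, 0.5, n)
        return np.column_stack([f0, energy]) + rng.normal(0, [5, 0.05], (n, 2))

    # Train one HMM per contour type on several example segments.
    models = {}
    for name, falling in [("falling", True), ("rising", False)]:
        segs = [synthetic_contour(falling) for _ in range(20)]
        X, lengths = np.vstack(segs), [len(s) for s in segs]
        models[name] = GaussianHMM(n_components=3, random_state=0).fit(X, lengths)

    # Classify an unseen segment by maximum log-likelihood.
    test = synthetic_contour(falling=True)
    print(max(models, key=lambda m: models[m].score(test)))  # -> "falling"
    ```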

Yinuo Guo - One of the best experts on this subject based on the ideXlab platform.

  • Meteor++ 2.0: adopt syntactic-level paraphrase knowledge into machine translation evaluation
    Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers Day 1), 2019
    Co-Authors: Yinuo Guo
    Abstract:

    This paper describes Meteor++ 2.0, our submission to the WMT19 Metrics Shared Task. The well-known Meteor metric improves machine translation evaluation by introducing paraphrase knowledge. However, it focuses only on the lexical level and utilizes consecutive n-gram paraphrases. In this work, we take syntactic-level paraphrase knowledge into consideration, which may take the form of skip-grams. We describe how such knowledge can be extracted from the Paraphrase Database (PPDB) and integrated into Meteor-based metrics. Experiments on the WMT15 and WMT17 evaluation datasets show that the newly proposed metric outperforms all previous versions of Meteor.
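
    A rough sketch of the skip-gram paraphrase-matching idea, with a two-entry hand-written table standing in for PPDB. The gap notation, the sentences and the counting are illustrative only and are not Meteor++ 2.0's actual scoring:

    ```python
    # Sketch: credit matches between hypothesis and reference via a paraphrase
    # table whose entries may be skip-grams, written with "_" for a gap of one
    # arbitrary token.
    import re

    # Toy stand-in for PPDB syntactic-level paraphrase pairs.
    paraphrases = [("took _ into account", "considered _"),
                   ("in spite of", "despite")]

    def to_regex(pattern):
        """Turn a gapped phrase into a regex; '_' matches one token."""
        parts = [r"\w+" if tok == "_" else re.escape(tok)
                 for tok in pattern.split()]
        return re.compile(r"\b" + r"\s+".join(parts) + r"\b")

    def paraphrase_matches(hyp, ref):
        """Count pairs where one side occurs in hyp and the other in ref."""
        count = 0
        for a, b in paraphrases:
            if (to_regex(a).search(hyp) and to_regex(b).search(ref)) or \
               (to_regex(b).search(hyp) and to_regex(a).search(ref)):
                count += 1
        return count

    hyp = "the model took context into account"
    ref = "the model considered the context"
    print(paraphrase_matches(hyp, ref))  # -> 1
    ```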

Suhyun Park - One of the best experts on this subject based on the ideXlab platform.

  • Syntactic-level integration and display of multiple domains' S-100-based data for e-navigation
    Cluster Computing, 2017
    Co-Authors: Daewon Park, Suhyun Park
    Abstract:

    In the maritime field, interest in utilizing data from multiple and various domains to provide relevant, accurate and timely information, ensuring the safety and security of navigation at sea, has been growing. For example, discussion of the implementation of e-navigation, a new maritime service paradigm introduced by the International Maritime Organization, is ongoing. E-navigation enables and facilitates the on- and off-shore exchange, sharing and utilization of marine and marine-related domains' data in support of users' decision-making. For consistent exchange and sharing of such data in the e-navigation environment, the International Hydrographic Organization's S-100 standard has been adopted as the baseline of the common maritime data structure. S-100 provides common data models for consistently defining the data elements that represent data contents, and thereby supports syntactic-level data interoperability among S-100-based e-navigation systems. However, the current S-100 does not provide methods or models for the formal representation of data semantics or for semantic-level harmonization of data; current e-navigation efforts therefore focus on utilizing multiple domains' data at the syntactic level. To provide relevant information for various marine activities, e-navigation systems should be able to handle various domains' data and integrate them at the syntactic level. In this paper, we introduce a method by which multiple domains' S-100-based data can be integrated according to the characteristics of such data, and we present a method for the consistent display and integration of those data. These methods promise to greatly facilitate S-100-based e-navigation systems' handling of data from multiple and various domains.
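
    A minimal sketch of what syntactic-level integration of two domains' records into one common feature model might look like. The classes, field names and sample records below are invented for illustration and are not part of the S-100 specification:

    ```python
    # Sketch: normalize two domains' records into one common feature model so a
    # single display pipeline can render both. The schema is a toy stand-in for
    # S-100's feature/attribute structure, not the real specification.
    from dataclasses import dataclass

    @dataclass
    class Feature:
        domain: str          # producing domain, e.g. "hydrography", "weather"
        feature_type: str    # common type name shared across domains
        position: tuple      # (latitude, longitude)
        attributes: dict     # domain attributes carried through unchanged

    # Domain-specific records as they might arrive from two producers.
    hydro_record = {"kind": "buoy", "lat": 35.1, "lon": 129.0, "colour": "red"}
    weather_record = {"phenomenon": "fog", "position": (35.2, 128.9),
                      "visibility_m": 400}

    def from_hydro(rec):
        return Feature("hydrography", rec["kind"], (rec["lat"], rec["lon"]),
                       {k: v for k, v in rec.items()
                        if k not in ("kind", "lat", "lon")})

    def from_weather(rec):
        return Feature("weather", rec["phenomenon"], rec["position"],
                       {k: v for k, v in rec.items()
                        if k not in ("phenomenon", "position")})

    # One integrated list, displayable by a single renderer.
    for f in [from_hydro(hydro_record), from_weather(weather_record)]:
        print(f"{f.domain}: {f.feature_type} at {f.position} {f.attributes}")
    ```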