Structured Language

The Experts below are selected from a list of 258 Experts worldwide, ranked by the ideXlab platform

Frederick Jelinek - One of the best experts on this subject based on the ideXlab platform.

  • NIPS - Using Random Forests in the Structured Language Model
    2004
    Co-Authors: Frederick Jelinek
    Abstract:

    In this paper, we explore the use of Random Forests (RFs) in the Structured Language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in Language modeling. RFs have been studied in the context of n-gram Language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing.
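
    For illustration, the minimal sketch below (plain Python; all class and function names are invented for the example, not taken from the paper) shows the random-forest idea applied to next-word prediction: each decision tree partitions histories by asking a randomly chosen question about a randomly chosen history position, each leaf stores a smoothed word distribution, and the forest probability is the average of the per-tree probabilities. In the paper the questions are asked about syntactic heads and tags supplied by the SLM and the smoothing is Kneser-Ney; here the history is just words and the smoothing is add-one, purely as a stand-in.

      import random
      from collections import Counter

      class DecisionTreeLM:
          """One randomly grown decision tree over (history, next_word) events."""
          def __init__(self, max_depth=3, rng=None):
              self.max_depth = max_depth
              self.rng = rng or random.Random()

          def fit(self, data, vocab):
              self.vocab = list(vocab)
              self.root = self._grow(data, depth=0)
              return self

          def _grow(self, data, depth):
              node = {"dist": self._smooth(Counter(w for _, w in data))}
              if depth >= self.max_depth or len(data) < 10:
                  return node
              # random question: "does history position `pos` equal `pivot`?"
              pos = self.rng.randrange(len(data[0][0]))
              pivot = self.rng.choice([h[pos] for h, _ in data])
              yes = [(h, w) for h, w in data if h[pos] == pivot]
              no = [(h, w) for h, w in data if h[pos] != pivot]
              if yes and no:
                  node.update(pos=pos, pivot=pivot,
                              yes=self._grow(yes, depth + 1),
                              no=self._grow(no, depth + 1))
              return node

          def _smooth(self, counts):
              # add-one smoothing over the vocabulary (stand-in for Kneser-Ney)
              total = sum(counts.values()) + len(self.vocab)
              return {w: (counts[w] + 1) / total for w in self.vocab}

          def prob(self, history, word):
              node = self.root
              while "pos" in node:
                  node = node["yes"] if history[node["pos"]] == node["pivot"] else node["no"]
              return node["dist"].get(word, 1e-9)

      class RandomForestLM:
          """Average of independently, randomly grown decision-tree language models."""
          def __init__(self, n_trees=10, seed=0):
              self.n_trees, self.rng = n_trees, random.Random(seed)

          def fit(self, data, vocab):
              # each tree is grown on its own bootstrap sample of the training events
              self.trees = [DecisionTreeLM(rng=self.rng)
                            .fit([self.rng.choice(data) for _ in data], vocab)
                            for _ in range(self.n_trees)]
              return self

          def prob(self, history, word):
              return sum(t.prob(history, word) for t in self.trees) / len(self.trees)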

  • exact training of a neural syntactic Language model
    International Conference on Acoustics, Speech and Signal Processing, 2004
    Co-Authors: Ahmad Emami, Frederick Jelinek
    Abstract:

    The Structured Language model (SLM) aims at predicting the next word in a given word string by performing a syntactic analysis of the preceding words. However, it faces the data sparseness problem because of the large dimensionality and diversity of the information available in the syntactic parse. Previously, we proposed using neural network models for the SLM (Emami, A. et al., Proc. ICASSP, 2003; Emami, Proc. EUROSPEECH '03, 2003). The neural network model is better suited to tackle the data sparseness problem, and its use gave significant improvements in perplexity and word error rate over the baseline SLM. We present a new method of training the neural-net-based SLM. This procedure makes use of the partial parses hypothesized by the SLM itself, and is more expensive than the approximate training method used previously. Experiments with the new training method on the UPenn and WSJ corpora show significant reductions in perplexity and word error rate, achieving the lowest published results for the given corpora.
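
    For reference, the quantity the neural components model has the following general shape in the SLM literature (a sketch in the usual notation, not a formula quoted from this paper): the probability of the next word marginalizes over the partial parses T_k retained for the word prefix W_k, with each parse contributing a prediction conditioned on its two most recent exposed head words,

      P(w_{k+1} \mid W_k) = \sum_{T_k} P(w_{k+1} \mid h_{-1}(T_k), h_0(T_k)) \, P(T_k \mid W_k).

    The "exact" training discussed above optimizes the neural model under this full sum over hypothesized partial parses, rather than under a single fixed parse as in the earlier approximate procedure.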

  • Stochastic Analysis of Structured Language Modeling
    Mathematical Foundations of Speech and Language Processing, 2004
    Co-Authors: Frederick Jelinek
    Abstract:

    As previously introduced, the Structured Language Model (SLM) operated with the help of stacks from which less probable sub-parse entries were purged before further words were generated. In this article we generalize the CKY algorithm to obtain a chart that allows the direct computation of Language model probabilities, thus rendering the stacks unnecessary. An analysis of the behavior of the SLM leads to a generalization of the Inside-Outside algorithm and thus to rigorous EM-type re-estimation of the SLM parameters. The derived algorithms are computationally expensive, but their demands can be mitigated by use of appropriate thresholding.
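
    To make the chart construction concrete, recall the standard inside recursion that a CKY-style chart computes for a binary-branching grammar (shown here in generic PCFG notation as a sketch; the SLM chart entries are richer, carrying headword and tag information, but the recursion has the same shape):

      \beta(A, i, i) = P(A \to w_i)
      \beta(A, i, j) = \sum_{A \to B\,C} \sum_{k=i}^{j-1} P(A \to B\,C) \, \beta(B, i, k) \, \beta(C, k+1, j)

    so that the Language model probability of the whole string is read off the top chart entry, P(w_1 \cdots w_n) = \beta(S, 1, n). The outside pass, and hence the EM (Inside-Outside) re-estimation mentioned above, is built on the same chart.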

  • training connectionist models for the Structured Language model
    Empirical Methods in Natural Language Processing, 2003
    Co-Authors: Ahmad Emami, Frederick Jelinek
    Abstract:

    We investigate the performance of the Structured Language Model (SLM) in terms of perplexity (PPL) when its components are modeled by connectionist models. The connectionist models use a distributed representation of the items in the history and make much better use of contexts than currently used interpolated or back-off models, not only because of the inherent capability of the connectionist model in fighting the data sparseness problem, but also because of the sublinear growth in the model size when the context length is increased. The connectionist models can be further trained by an EM procedure, similar to the previously used procedure for training the SLM. Our experiments show that the connectionist models can significantly improve the PPL over the interpolated and back-off models on the UPenn Treebank corpus, after interpolating with a baseline trigram Language model. The EM training procedure can improve the connectionist models further, by using hidden events obtained by the SLM parser.
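
    As a concrete illustration of a "distributed representation of the items in the history", the sketch below (NumPy, forward pass only; all sizes and names are invented for the example, and training by backpropagation or the EM procedure is omitted) shows a connectionist next-item model of the kind described: each history item is mapped to an embedding vector, the vectors are concatenated, passed through a tanh hidden layer, and a softmax gives the distribution over the next item. The parameter count grows only gently with the context length (one extra block of input weights per position), in contrast to the combinatorial growth of count-based n-gram tables.

      import numpy as np

      rng = np.random.default_rng(0)
      V, d, H, n_ctx = 1000, 32, 64, 3           # vocabulary, embedding dim, hidden units, history length

      E  = rng.normal(0, 0.1, (V, d))            # shared embedding table (distributed representations)
      W1 = rng.normal(0, 0.1, (n_ctx * d, H))    # concatenated history -> hidden
      b1 = np.zeros(H)
      W2 = rng.normal(0, 0.1, (H, V))            # hidden -> output scores
      b2 = np.zeros(V)

      def next_item_probs(history_ids):
          """history_ids: n_ctx indices of history items (words, or SLM heads/tags)."""
          x = np.concatenate([E[i] for i in history_ids])   # concatenated distributed history
          h = np.tanh(x @ W1 + b1)
          logits = h @ W2 + b2
          logits -= logits.max()                            # numerical stability
          p = np.exp(logits)
          return p / p.sum()

      p = next_item_probs([5, 42, 7])
      print(p.shape, round(float(p.sum()), 6))              # (1000,) 1.0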

  • EMNLP - Training connectionist models for the Structured Language model
    Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 2003
    Co-Authors: Ahmad Emami, Frederick Jelinek
    Abstract:

    We investigate the performance of the Structured Language Model (SLM) in terms of perplexity (PPL) when its components are modeled by connectionist models. The connectionist models use a distributed representation of the items in the history and make much better use of contexts than currently used interpolated or back-off models, not only because of the inherent capability of the connectionist model in fighting the data sparseness problem, but also because of the sublinear growth in the model size when the context length is increased. The connectionist models can be further trained by an EM procedure, similar to the previously used procedure for training the SLM. Our experiments show that the connectionist models can significantly improve the PPL over the interpolated and back-off models on the UPenn Treebank corpus, after interpolating with a baseline trigram Language model. The EM training procedure can improve the connectionist models further, by using hidden events obtained by the SLM parser.

Ciprian Chelba - One of the best experts on this subject based on the ideXlab platform.

  • a study on richer syntactic dependencies for Structured Language modeling
    Meeting of the Association for Computational Linguistics, 2002
    Co-Authors: Ciprian Chelba, Frederick Jelinek
    Abstract:

    We study the impact of richer syntactic dependencies on the performance of the Structured Language model (SLM) along three dimensions: parsing accuracy (LP/LR), perplexity (PPL) and word-error-rate (WER, N-best re-scoring). We show that our models achieve an improvement in LP/LR, PPL and/or WER over the reported baseline results using the SLM on the UPenn Treebank and Wall Street Journal (WSJ) corpora, respectively. Analysis of parsing performance shows correlation between the quality of the parser (as measured by precision/recall) and the Language model performance (PPL and WER). A remarkable fact is that the enriched SLM outperforms the baseline 3-gram model in terms of WER by 10% when used in isolation as a second pass (N-best re-scoring) Language model.
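
    The WER figure above comes from second-pass N-best re-scoring, which reduces to re-ranking the first-pass hypotheses with the new Language model score. A minimal sketch of that step (illustrative names and generic weights, not values from the paper):

      def rescore_nbest(nbest, lm_logprob, lm_weight=12.0, word_penalty=0.0):
          """nbest: list of (word_list, acoustic_logprob) pairs from the first pass;
          lm_logprob: function mapping a word list to a language-model log-probability."""
          def total(hyp):
              words, acoustic = hyp
              return acoustic + lm_weight * lm_logprob(words) + word_penalty * len(words)
          return max(nbest, key=total)[0]   # hypothesis with the best combined score

      # toy usage with a dummy stand-in LM that happens to prefer hypotheses containing "cat"
      best = rescore_nbest(
          [(["the", "cat", "sat"], -120.0), (["the", "cats", "at"], -118.0)],
          lm_logprob=lambda ws: -1.0 if "cat" in ws else -5.0)
      print(best)   # ['the', 'cat', 'sat']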

  • Information Extraction Using the Structured Language Model
    arXiv: Computation and Language, 2001
    Co-Authors: Ciprian Chelba, Milind Mahajan
    Abstract:

    The paper presents a data-driven approach to information extraction (viewed as template filling) using the Structured Language model (SLM) as a statistical parser. The task of template filling is cast as constrained parsing using the SLM. The model is automatically trained from a set of sentences annotated with frame/slot labels and spans. Training proceeds in stages: first a constrained syntactic parser is trained such that the parses on training data meet the specified semantic spans, then the non-terminal labels are enriched to contain semantic information and finally a constrained syntactic+semantic parser is trained on the parse trees resulting from the previous stage. Despite the small amount of training data used, the model is shown to outperform the slot level accuracy of a simple semantic grammar authored manually for the MiPad --- personal information management --- task.
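
    The key mechanism, constrained parsing, amounts to allowing only parser moves whose constituents are consistent with the annotated frame/slot spans. A small sketch of that consistency test (function and variable names are illustrative, not from the paper):

      def consistent_with_spans(start, end, semantic_spans):
          """A candidate constituent over words [start, end] (inclusive) is rejected
          only if it partially overlaps, i.e. crosses, an annotated semantic span."""
          for s, e in semantic_spans:
              if (start < s <= end < e) or (s < start <= e < end):
                  return False
          return True

      spans = [(2, 4)]                                  # e.g. a slot covering words 2..4
      print(consistent_with_spans(2, 4, spans))         # True: matches the slot exactly
      print(consistent_with_spans(0, 1, spans))         # True: disjoint from the slot
      print(consistent_with_spans(1, 3, spans))         # False: crosses the slot boundary
      print(consistent_with_spans(1, 5, spans))         # True: properly contains the slot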

  • EMNLP - Information Extraction Using the Structured Language Model.
    2001
    Co-Authors: Ciprian Chelba, Milind Mahajan
    Abstract:

    The paper presents a data-driven approach to information extraction (viewed as template filling) using the Structured Language model (SLM) as a statistical parser. The task of template filling is cast as constrained parsing using the SLM. The model is automatically trained from a set of sentences annotated with frame/slot labels and spans. Training proceeds in stages: first a constrained syntactic parser is trained such that the parses on training data meet the specified semantic spans, then the non-terminal labels are enriched to contain semantic information and finally a constrained syntactic+semantic parser is trained on the parse trees resulting from the previous stage. Despite the small amount of training data used, the model is shown to outperform the slot level accuracy of a simple semantic grammar authored manually for the MiPad --- personal information management --- task.

  • ACL - A Study on Richer Syntactic Dependencies for Structured Language Modeling
    Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), 2002
    Co-Authors: Ciprian Chelba, Frederick Jelinek
    Abstract:

    We study the impact of richer syntactic dependencies on the performance of the Structured Language model (SLM) along three dimensions: parsing accuracy (LP/LR), perplexity (PPL) and word-error-rate (WER, N-best re-scoring). We show that our models achieve an improvement in LP/LR, PPL and/or WER over the reported baseline results using the SLM on the UPenn Treebank and Wall Street Journal (WSJ) corpora, respectively. Analysis of parsing performance shows correlation between the quality of the parser (as measured by precision/recall) and the Language model performance (PPL and WER). A remarkable fact is that the enriched SLM outperforms the baseline 3-gram model in terms of WER by 10% when used in isolation as a second pass (N-best re-scoring) Language model.

  • Structured Language Modeling for Speech Recognition
    arXiv: Computation and Language, 2000
    Co-Authors: Ciprian Chelba, Frederick Jelinek
    Abstract:

    A new Language model for speech recognition is presented. The model develops hidden hierarchical syntactic-like structure incrementally and uses it to extract meaningful information from the word history, thus complementing the locality of currently used trigram models. The Structured Language model (SLM) and its performance in a two-pass speech recognizer --- lattice decoding --- are presented. Experiments on the WSJ corpus show an improvement in both perplexity (PPL) and word error rate (WER) over conventional trigram models.

Ahmad Emami - One of the best experts on this subject based on the ideXlab platform.

  • exact training of a neural syntactic Language model
    International Conference on Acoustics, Speech and Signal Processing, 2004
    Co-Authors: Ahmad Emami, Frederick Jelinek
    Abstract:

    The Structured Language model (SLM) aims at predicting the next word in a given word string by performing a syntactic analysis of the preceding words. However, it faces the data sparseness problem because of the large dimensionality and diversity of the information available in the syntactic parse. Previously, we proposed using neural network models for the SLM (Emami, A. et al., Proc. ICASSP, 2003; Emami, Proc. EUROSPEECH '03, 2003). The neural network model is better suited to tackle the data sparseness problem, and its use gave significant improvements in perplexity and word error rate over the baseline SLM. We present a new method of training the neural-net-based SLM. This procedure makes use of the partial parses hypothesized by the SLM itself, and is more expensive than the approximate training method used previously. Experiments with the new training method on the UPenn and WSJ corpora show significant reductions in perplexity and word error rate, achieving the lowest published results for the given corpora.

  • training connectionist models for the Structured Language model
    Empirical Methods in Natural Language Processing, 2003
    Co-Authors: Ahmad Emami, Frederick Jelinek
    Abstract:

    We investigate the performance of the Structured Language Model (SLM) in terms of perplexity (PPL) when its components are modeled by connectionist models. The connectionist models use a distributed representation of the items in the history and make much better use of contexts than currently used interpolated or back-off models, not only because of the inherent capability of the connectionist model in fighting the data sparseness problem, but also because of the sublinear growth in the model size when the context length is increased. The connectionist models can be further trained by an EM procedure, similar to the previously used procedure for training the SLM. Our experiments show that the connectionist models can significantly improve the PPL over the interpolated and back-off models on the UPenn Treebank corpus, after interpolating with a baseline trigram Language model. The EM training procedure can improve the connectionist models further, by using hidden events obtained by the SLM parser.

  • EMNLP - Training connectionist models for the Structured Language model
    Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 2003
    Co-Authors: Ahmad Emami, Frederick Jelinek
    Abstract:

    We investigate the performance of the Structured Language Model (SLM) in terms of perplexity (PPL) when its components are modeled by connectionist models. The connectionist models use a distributed representation of the items in the history and make much better use of contexts than currently used interpolated or back-off models, not only because of the inherent capability of the connectionist model in fighting the data sparseness problem, but also because of the sublinear growth in the model size when the context length is increased. The connectionist models can be further trained by an EM procedure, similar to the previously used procedure for training the SLM. Our experiments show that the connectionist models can significantly improve the PPL over the interpolated and back-off models on the UPenn Treebank corpus, after interpolating with a baseline trigram Language model. The EM training procedure can improve the connectionist models further, by using hidden events obtained by the SLM parser.

Ufuk Topcu - One of the best experts on this subject based on the ideXlab platform.

  • Counterexamples for Robotic Planning Explained in Structured Language
    arXiv: Robotics, 2018
    Co-Authors: Lu Feng, Mahsa Ghasemi, Kai-wei Chang, Ufuk Topcu
    Abstract:

    Automated techniques such as model checking have been used to verify models of robotic mission plans based on Markov decision processes (MDPs) and to generate counterexamples that may help diagnose requirement violations. However, such artifacts may be too complex for humans to understand, because existing representations of counterexamples typically include a large number of paths or a complex automaton. To help improve the interpretability of counterexamples, we define a notion of explainable counterexample, which includes a set of Structured natural Language sentences describing the robotic behavior that leads to a requirement violation in an MDP model of a robotic mission plan. We propose an approach based on mixed-integer linear programming for generating explainable counterexamples that are minimal, sound and complete. We demonstrate the usefulness of the proposed approach via a case study of warehouse robot planning.
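
    At its core, the optimization described above has to pick out a small set of offending runs whose combined probability already witnesses the violated threshold; the paper encodes this selection (together with its grouping into structured natural language sentences) as a mixed-integer linear program. The brute-force sketch below illustrates only that selection problem, with invented example data, and is merely a stand-in for the MILP solver:

      from itertools import combinations

      def minimal_counterexample(paths, threshold):
          """paths: list of (description, probability); return a smallest subset whose
          probabilities sum to more than `threshold`, or None if no subset suffices."""
          for k in range(1, len(paths) + 1):
              for subset in combinations(paths, k):
                  if sum(p for _, p in subset) > threshold:
                      return [description for description, _ in subset]
          return None

      # toy usage: the requirement "P(two robots collide) <= 0.2" is violated
      paths = [("robot A and robot B enter aisle 3 at the same time", 0.15),
               ("robot A skips charging and stalls in aisle 1", 0.08),
               ("robot B misses its pick-up and re-enters aisle 3", 0.05)]
      print(minimal_counterexample(paths, 0.2))   # the first two descriptions suffice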

  • ICRA - Counterexamples for Robotic Planning Explained in Structured Language
    2018 IEEE International Conference on Robotics and Automation (ICRA), 2018
    Co-Authors: Lu Feng, Mahsa Ghasemi, Kai-wei Chang, Ufuk Topcu
    Abstract:

    Automated techniques such as model checking have been used to verify models of robotic mission plans based on Markov decision processes (MDPs) and to generate counterexamples that may help diagnose requirement violations. However, such artifacts may be too complex for humans to understand, because existing representations of counterexamples typically include a large number of paths or a complex automaton. To help improve the interpretability of counterexamples, we define a notion of explainable counterexample, which includes a set of Structured natural Language sentences describing the robotic behavior that leads to a requirement violation in an MDP model of a robotic mission plan. We propose an approach based on mixed-integer linear programming for generating explainable counterexamples that are minimal, sound and complete. We demonstrate the usefulness of the proposed approach via a case study of warehouse robot planning.

Richard L. Sparks - One of the best experts on this subject based on the ideXlab platform.

  • Teaching a foreign Language using multisensory Structured Language techniques to at-risk learners: a review.
    Dyslexia (Chichester, England), 2000
    Co-Authors: Richard L. Sparks, Karen Miller
    Abstract:

    An overview of multisensory Structured Language (MSL) techniques used to teach a foreign Language to at-risk students is outlined. Research supporting the use of MSL techniques is reviewed. Specific activities using the MSL approach to teach the phonology/orthography, grammar and vocabulary of the foreign Language as well as reading and communicative activities in the foreign Language are presented.

  • Benefits of multisensory Structured Language instruction for at-risk foreign Language learners: A comparison study of high school Spanish students
    Annals of Dyslexia, 1998
    Co-Authors: Richard L. Sparks, Karen Miller, Marjorie Artzer, Jon M. Patton, Leonore Ganschow, Dorothy J. Hordubay, Geri Walsh
    Abstract:

    In this study, the benefits of multisensory Structured Language (MSL) instruction in Spanish were examined. Participants were students in high-school-level Spanish attending girls’ preparatory schools. Of the 55 participants, 39 qualified as at-risk for foreign Language learning difficulties and 16 were deemed not-at-risk. The at-risk students were assigned to one of three conditions: (1) MSL—multisensory Spanish instruction in self-contained classrooms (n=14); (2) SC—traditional Spanish instruction provided in self-contained classrooms (n=11); and (3) NSC—traditional Spanish instruction in regular (not self-contained) Spanish classes (n=14). Not-at-risk students (n=16) received traditional Spanish instruction in regular classes similar to the instruction provided to the NSC group.

  • The Effects of Multisensory Structured Language Instruction on Native Language and Foreign Language Aptitude Skills of At-Risk High School Foreign Language Learners: A Replication and Follow-up Study
    Annals of dyslexia, 1993
    Co-Authors: Richard L. Sparks, Leonore Ganschow
    Abstract:

    According to research findings, most students who experience foreign Language learning problems are thought to have overt or subtle native Language learning difficulties, primarily with phonological processing. A recent study by the authors showed that when a multisensory Structured Language approach to teaching Spanish was used with a group of at-risk high school students, the group’s pre- and posttest scores on native Language phonological processing, verbal memory and vocabulary, and foreign Language aptitude measures significantly improved. In this replication and follow-up study, the authors compared pre- and posttest scores of a second group of students (Cohort 2) who received MSL instruction in Spanish on native Language and foreign Language aptitude measures. They also followed students from the first study (Cohort 1) over a second year of foreign Language instruction. Findings showed that the second cohort made significant gains on three native Language phonological measures and a test of foreign Language aptitude. Follow-up testing on the first cohort showed that the group maintained its initial gains on all native Language and foreign Language aptitude measures. Implications for the authors’ Linguistic Coding Deficit Hypothesis are discussed and linked with current reading research, in particular the concepts of the assumption of specificity and modularity.

  • The effects of multisensory Structured Language instruction on native Language and foreign Language aptitude skills of at-risk high school foreign Language learners.
    Annals of dyslexia, 1992
    Co-Authors: Richard L. Sparks, Leonore Ganschow, Jane Pohlman, Sue Skinner, Marjorie Artzer
    Abstract:

    Research findings suggest that most students who have foreign Language learning problems have Language-based difficulties and, in particular, phonological processing problems. Authors of the present study examined pre- and posttest scores on native Language and foreign Language aptitude tests of three groups of at-risk high school students enrolled in special, self-contained sections of first-year Spanish. Two groups were instructed using a multisensory Structured Language (MSL) approach. One of the groups was taught in both English and Spanish (MSL/ES), the other only in Spanish (MSL/S). The third group (NO-MSL) was instructed using more traditional second Language teaching methodologies. Significant gains were made by the MSL/ES group on measures of native Language phonology, vocabulary, and verbal memory and on a test of foreign Language aptitude; the MSL/S group made significant gains on the test of foreign Language aptitude. No significant gains on the native Language or foreign Language aptitude measures were made by the NO-MSL group. Implications for foreign Language classroom instruction of at-risk students are discussed.