Disfluency

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The experts below are selected from a list of 5,379 experts worldwide, ranked by the ideXlab platform.

Gokhan Tur - One of the best experts on this subject based on the ideXlab platform.

  • INTERSPEECH - Segmentation and disfluency removal for conversational speech translation
    2014
    Co-Authors: Hany Hassan, Lee Schwartz, Dilek Hakkani-Tur, Gokhan Tur
    Abstract:

    In this paper we focus on the effect of on-line speech segmentation and disfluency removal methods on conversational speech translation. In a real-time conversational speech-to-speech translation system, on-line segmentation of speech is required to avoid latency beyond a few seconds. While sentential unit segmentation and disfluency removal have been heavily studied, mainly for off-line speech processing, to the best of our knowledge the combined effect of these tasks on conversational speech translation has not been investigated. Furthermore, optimizing performance under the maximum system latency that still allows a conversation is a newer problem for these tasks. We show that the conventional assumption of performing segmentation followed by disfluency removal is not the best practice. We propose a new approach that performs simple-disfluency removal, followed by segmentation, and then complex-disfluency removal. The proposed approach yields a significant gain in translation performance of up to 3 BLEU points with only 6 seconds of look-ahead latency, using state-of-the-art machine translation and speech recognition systems. Index Terms: speech translation, disfluency removal, segmentation, sentence units, speech processing

  • ICASSP - Automatic disfluency removal for improving spoken language translation
    2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010
    Co-Authors: Wen Wang, Gokhan Tur, Jing Zheng, Necip Fazil Ayan
    Abstract:

    Statistical machine translation (SMT) systems for spoken languages suffer from conversational speech phenomena, in particular the presence of speech disfluencies. We examine the impact of disfluencies in broadcast conversation data on our hierarchical phrase-based SMT system and implement automatic disfluency removal approaches for cleansing the MT input. We evaluate the efficacy of the proposed approaches and investigate the impact of disfluency removal on SMT performance across different disfluency types. We show that, for translating Mandarin broadcast conversational transcripts into English, our automatic disfluency removal approaches could produce significant improvements in BLEU and TER.
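The three-stage ordering proposed in the 2014 INTERSPEECH paper above (simple-disfluency removal, then segmentation, then complex-disfluency removal) can be pictured as a toy pipeline. This is a minimal sketch under stated assumptions, not the authors' system: the filler list `FILLED_PAUSES`, the punctuation-or-length segmenter, and the repetition-deleting rule in `remove_complex` are all hypothetical stand-ins for the statistical models a real speech translation system would use. In this sketch, word-local filler removal runs first so the segmenter sees cleaner input, and the costlier repetition removal then runs once per segment.

```python
# Toy sketch: simple-disfluency removal -> segmentation -> complex-disfluency removal.
# All word lists, thresholds, and rules are hypothetical illustrations.

FILLED_PAUSES = {"uh", "um", "er", "ah"}  # hypothetical filler inventory
SENTENCE_END = {".", "?", "!"}


def remove_simple(tokens):
    """Stage 1: drop filled pauses, which can be removed word by word."""
    return [t for t in tokens if t.lower() not in FILLED_PAUSES]


def segment(tokens, max_len=8):
    """Stage 2: stand-in segmenter that cuts after sentence-final punctuation
    or once a segment reaches max_len tokens (a crude latency bound)."""
    segments, current = [], []
    for t in tokens:
        current.append(t)
        if t in SENTENCE_END or len(current) >= max_len:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def remove_complex(seg, max_span=3):
    """Stage 3: delete the first copy of an immediately repeated n-gram
    (n <= max_span), e.g. "I want I want to go" -> "I want to go"."""
    out = list(seg)
    i = 0
    while i < len(out):
        for n in range(max_span, 0, -1):
            if out[i:i + n] == out[i + n:i + 2 * n]:
                del out[i:i + n]  # drop the reparandum, keep the repair
                break
        else:
            i += 1  # no repetition starts at position i; move on
    return out


def clean(text):
    """Run the three stages in the order described above."""
    tokens = text.split()
    return [remove_complex(s) for s in segment(remove_simple(tokens))]


print(clean("uh I want I want to go . um the the train leaves at noon ."))
```

On this input the pipeline yields two segments, "I want to go ." and "the train leaves at noon .". The repetition rule is deliberately naive: it would also delete a legitimate "that that" as in "I know that that is true", which is why complex disfluencies are normally handled by a trained model rather than a string rule.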

Elizabeth Shriberg - One of the best experts on this subject based on the ideXlab platform.

  • INTERSPEECH - Automatic disfluency identification in conversational speech using multiple knowledge sources
    2003
    Co-Authors: Yang Liu, Elizabeth Shriberg, Andreas Stolcke
    Abstract:

    Disfluencies occur frequently in spontaneous speech. Detection and correction of disfluencies can make automatic speech recognition transcripts more readable for human readers, and can aid downstream processing by machine. This work investigates a number of knowledge sources for disfluency detection, including acoustic-prosodic features, a language model (LM) to account for repetition patterns, a part-of-speech (POS)-based LM, and rule-based knowledge. Different components are designed for different purposes in the system. Results show that detection of disfluency interruption points is best achieved by a combination of prosodic cues, word-based cues, and POS-based cues. The onset of a disfluency to be removed, in contrast, is best found using knowledge-based rules. Finally, detection of specific disfluency types can be aided by modeling word patterns.

  • Phonetic Consequences of Speech Disfluency
    1999
    Co-Authors: Elizabeth Shriberg
    Abstract:

    Unlike read or laboratory speech, spontaneous speech contains high rates of disfluencies (e.g., repetitions, repairs, filled pauses). Such events reflect production problems frequently encountered in everyday conversation. Analyses of American English show that disfluency affects a variety of phonetic aspects of speech, including segment durations, intonation, voice quality, vowel quality, and coarticulation patterns. These effects provide clues about production processes and can guide methods for disfluency processing in speech recognition applications.

  • A prosody-only decision-tree model for disfluency detection
    Conference of the International Speech Communication Association (EUROSPEECH), 1997
    Co-Authors: Elizabeth Shriberg, Rebecca Bates, A. Stolcke
    Abstract:

    Speech disfluencies (filled pauses, repetitions, repairs, and false starts) are pervasive in spontaneous speech. The ability to detect and correct disfluencies automatically is important for effective natural language understanding, as well as for improving speech models in general. Previous approaches to disfluency detection have relied heavily on lexical information, which makes them less applicable when word recognition is unreliable. We have developed a disfluency detection method using decision tree classifiers that use only local and automatically extracted prosodic features. Because the model doesn’t rely on lexical information, it is widely applicable even when word recognition is unreliable. The model performed significantly better than chance at detecting four disfluency types. It also outperformed a language model in the detection of false starts, given the correct transcription. Combining the prosody model with a specialized language model improved accuracy over either model alone for the detection of false starts. Results suggest that a prosody-only model can aid the automatic detection of disfluencies in spontaneous speech.
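The prosody-only approach of Shriberg, Bates and Stolcke (1997) above classifies word boundaries with decision trees over automatically extracted prosodic features. The sketch below grows a tiny CART-style tree with Gini impurity from scratch to show the mechanism. The two features (pause duration after a word in seconds, and a normalized word duration) and the six training examples are invented for illustration; they are not the paper's feature set, data, or tree-growing procedure.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair that most reduces weighted Gini
    impurity, or None if no split improves on the unsplit node."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a depth-limited tree; internal nodes are (feature, threshold,
    left, right) tuples and leaves hold the majority label."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, thr = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= thr]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > thr]
    return (
        f,
        thr,
        build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


# Hypothetical prosodic features per word boundary:
# [pause after word (s), normalized word duration]; label 1 = disfluency.
X = [[0.02, 0.9], [0.05, 1.0], [0.40, 1.8], [0.35, 1.6], [0.03, 1.1], [0.50, 2.0]]
y = [0, 0, 1, 1, 0, 1]

tree = build_tree(X, y)
print(tree, predict(tree, [0.45, 1.7]))
```

Because the tree never looks at word identities, it can still be applied when the recognizer's word hypotheses are unreliable, which is the property the abstract emphasizes. For simplicity, thresholds are taken at observed feature values; a fuller CART implementation would use midpoints between values and prune the grown tree.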

Necip Fazil Ayan - One of the best experts on this subject based on the ideXlab platform.

  • ICASSP - Automatic disfluency removal for improving spoken language translation
    2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010
    Co-Authors: Wen Wang, Gokhan Tur, Jing Zheng, Necip Fazil Ayan
    Abstract:

    Statistical machine translation (SMT) systems for spoken languages suffer from conversational speech phenomena, in particular the presence of speech disfluencies. We examine the impact of disfluencies in broadcast conversation data on our hierarchical phrase-based SMT system and implement automatic disfluency removal approaches for cleansing the MT input. We evaluate the efficacy of the proposed approaches and investigate the impact of disfluency removal on SMT performance across different disfluency types. We show that, for translating Mandarin broadcast conversational transcripts into English, our automatic disfluency removal approaches could produce significant improvements in BLEU and TER.
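The per-type analysis described in the ICASSP 2010 abstract above can be pictured as an ablation: strip one class of disfluency at a time from the MT input and compare translation quality for each variant. The sketch below covers only the input-cleaning half, on a single hand-tagged sentence. The tag names `FP`, `REP`, and `RPR`, the `CONFIGS` settings, and the example sentence are hypothetical, not the annotation scheme or data of the paper, and scoring each variant with BLEU or TER is left out.

```python
# Hypothetical tags: FP = filled pause, REP = repeated-word reparandum,
# RPR = reparandum of a repair; None marks a fluent token.
FILLED_PAUSE, REPETITION, REPAIR = "FP", "REP", "RPR"

sentence = [
    ("uh", FILLED_PAUSE), ("I", REPETITION), ("I", None), ("think", None),
    ("we", REPAIR), ("should", REPAIR), ("um", FILLED_PAUSE),
    ("we", None), ("must", None), ("leave", None), ("now", None),
]

# Cumulative ablation settings; each yields one cleaned MT input.
CONFIGS = {
    "none": set(),
    "filled pauses": {FILLED_PAUSE},
    "+ repetitions": {FILLED_PAUSE, REPETITION},
    "+ repairs": {FILLED_PAUSE, REPETITION, REPAIR},
}


def remove_types(tagged, types):
    """Return the sentence with every token whose tag is in `types` removed."""
    return " ".join(word for word, tag in tagged if tag not in types)


for name, types in CONFIGS.items():
    print(f"{name:>14}: {remove_types(sentence, types)}")
```

Each printed line would be translated and scored separately, so that any change in BLEU or TER can be attributed to the disfluency type added in that step.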

Alex Waibel - One of the best experts on this subject based on the ideXlab platform.

  • EACL - Tight Integration of Speech Disfluency Removal into SMT
    Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, 2014
    Co-Authors: Eunah Cho, Jan Niehues, Alex Waibel
    Abstract:

    Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems make a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluency probability for each word. The SMT decoder then skips potentially disfluent words based on their disfluency probabilities. Using the suggested scheme, the translation scores of both the manual transcript and the ASR output improve by around 0.35 BLEU points compared to the CRF hard-decision system.
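The difference between a hard decision and the soft integration described in the EACL 2014 abstract above can be illustrated without a full decoder. Assuming per-word disfluency probabilities are already available (in the paper they come from a CRF; the numbers below are made up), the sketch contrasts deleting every word above a 0.5 threshold with keeping a ranked list of cleaned source variants, each scored by the product of per-word keep/skip probabilities. In the actual system the skip decision is made inside SMT decoding, where translation and language model scores also weigh in; the exhaustive enumeration in `soft_variants` is only a hypothetical stand-in that is feasible for short sentences.

```python
import itertools
import math

EPS = 1e-9  # guard against log(0) for probabilities of exactly 0 or 1


def hard_decision(words, probs, threshold=0.5):
    """Conventional pipeline: delete every word whose disfluency probability exceeds threshold."""
    return [w for w, p in zip(words, probs) if p <= threshold]


def soft_variants(words, probs, k=3):
    """Score every keep/skip pattern by
    log P = sum(log p_i over skipped words) + sum(log(1 - p_i) over kept words),
    keep the best score per distinct cleaned sentence, and return the top k."""
    best = {}
    for skips in itertools.product([False, True], repeat=len(words)):
        logp = 0.0
        for p, skip in zip(probs, skips):
            p = min(max(p, EPS), 1.0 - EPS)
            logp += math.log(p if skip else 1.0 - p)
        kept = tuple(w for w, skip in zip(words, skips) if not skip)
        if kept not in best or logp > best[kept]:
            best[kept] = logp
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


words = ["uh", "I", "think", "that", "that", "is", "right"]
probs = [0.98, 0.02, 0.03, 0.45, 0.05, 0.02, 0.01]  # hypothetical CRF outputs

print("hard:", " ".join(hard_decision(words, probs)))
for kept, logp in soft_variants(words, probs):
    print(f"soft: {' '.join(kept):<28} P = {math.exp(logp):.3f}")
```

With these numbers the hard decision keeps the first "that" (0.45 is below the threshold) and commits to a single source sentence, while the soft list also retains "I think that is right" as a close second, leaving the final choice to downstream scoring.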

Martin Corley - One of the best experts on this subject based on the ideXlab platform.

  • Speaker- versus listener-oriented disfluency: A re-examination of arguments and assumptions from autism spectrum disorder
    Journal of Autism and Developmental Disorders, 2017
    Co-Authors: Paul E. Engelhardt, Oliver Alfridijanta, Mhairi E. G. McMullon, Martin Corley
    Abstract:

    We re-evaluate conclusions about disfluency production in high-functioning forms of autism spectrum disorder (HFA). Previous studies examined individuals with HFA to address a theoretical question regarding speaker- and listener-oriented disfluencies. Individuals with HFA tend to be self-centric and have poor pragmatic language skills, and so should be less likely to produce listener-oriented disfluency. However, previous studies did not account for individual-differences variables that affect disfluency. We show that both matched and unmatched controls produce fewer repairs than individuals with HFA. For silent pauses, there was no difference between matched controls and HFA, but both groups produced more than unmatched controls. These results identify limitations in prior research and shed light on the relationship between autism spectrum disorders and disfluent speech.

  • Disfluency in dialogue: an intentional signal from the speaker?
    Psychonomic Bulletin & Review, 2012
    Co-Authors: Ian R. Finlayson, Martin Corley
    Abstract:

    Disfluency is a characteristic feature of spontaneous human speech, commonly seen as a consequence of problems with production. However, the question remains open as to why speakers are disfluent: Is it a mechanical by-product of planning difficulty, or do speakers use disfluency in dialogue to manage listeners’ expectations? To address this question, we present two experiments investigating the production of disfluency in monologue and dialogue situations. Dialogue affected the linguistic choices made by participants, who aligned on referring expressions by choosing less frequent names for ambiguous images where those names had previously been mentioned. However, participants were no more disfluent in dialogue than in monologue situations, and the distribution of types of disfluency used remained constant. Our evidence rules out at least a straightforward interpretation of the view that disfluencies are an intentional signal in dialogue.

  • It's the way that you, er, say it: Hesitations in speech affect language comprehension
    Cognition, 2007
    Co-Authors: Martin Corley, Lucy J. MacGregor, David I. Donaldson
    Abstract:

    Everyday speech is littered with disfluency, often correlated with the production of less predictable words (e.g., Beattie & Butterworth, 1979 [Contextual probability and word frequency as determinants of pauses in spontaneous speech. Language and Speech, 22, 201-211]). But what are the effects of disfluency on listeners? In an ERP experiment that compared fluent with disfluent utterances, we established an N400 effect for unpredictable compared to predictable words. This effect, reflecting the difference in ease of integrating words into their contexts, was reduced in cases where the target words were preceded by a hesitation marked by the word er. Moreover, a subsequent recognition memory test showed that words preceded by disfluency were more likely to be remembered. The study demonstrates that hesitation affects the way in which listeners process spoken language, and that these changes are associated with longer-term consequences for the representation of the message.

  • The Influence of Lexical, Conceptual and Planning Based Factors on Disfluency Production
    2006
    Co-Authors: Michael J. Schnadt, Martin Corley
    Abstract:

    Michael J. Schnadt (m.j.schnadt@sms.ed.ac.uk) and Martin Corley (martin.corley@ed.ac.uk), School of Philosophy, Psychology and Language Sciences, Edinburgh University, 7 George Square, Edinburgh, EH8 9JZ, UK

    Two experiments were conducted to elicit naturalistic speech while manipulating factors thought to influence disfluency production. Participants described the route taken by a marker through visually presented networks of objects linked via one or more paths. In Experiment 1, lexical frequency and name agreement of the object names were manipulated; in Experiment 2, linguistic properties were kept constant and accessibility was manipulated through visual blurring. An increase in disfluency was observed immediately preceding object names in cases where the objects named were either low frequency or blurred. In both experiments, prolongations were the most frequently occurring class of disfluency. Additionally, when disfluencies during the path description were examined, more possible path choices led to greater numbers of disfluencies, which were predominantly filled pauses and repairs. This study allows us to draw preliminary conclusions about the influence of lexical, conceptual and planning-based factors on disfluency production and to begin to determine precisely the circumstances under which disfluencies occur in natural speech.

    Introduction: Natural spoken language is full of disfluency, generally defined as “phenomena that interrupt the flow of speech and do not add propositional content to an utterance” (Fox Tree, 1995). These phenomena include pauses, interruptions, substitutions, repeated words or phrases, prolongations such as the pronounced “thee”, and filled pauses such as um and uh. Such disruptions are very frequent: Brennan and Schober (2001) estimate that around 10% of utterances contain at least one disfluency, whereas Fox Tree (1995), averaging across a number of studies, estimated that the rate of disfluencies in spontaneous speech is about 6 per 100 words. While the distribution of disfluencies in spontaneous speech is relatively well understood from corpus-based studies (e.g., Bortfeld, Leon, Bloom, Schober & Brennan, 2001; Clark & Fox Tree, 2002; Shriberg, 1996), and claims have been made as to the differing functions of different types of disfluency (e.g., Clark & Fox Tree, 2002), experimental studies to date have proved somewhat inconclusive as to their cause. Take, for example, the finding that disfluencies are more likely to precede open-class words (Maclay & Osgood, 1959). On current evidence, it is entirely unclear what the underlying cause of these disfluencies might be. They could be a consequence of the relatively low frequencies (compared to closed-class words) with which open-class words are likely to occur (Levelt, 1983); in other words, they could be caused by lexical retrieval difficulties. Alternatively, disfluencies could be attributed to the vastly greater choice of open-class words available to the speaker (Schachter, Christenfeld, Ravina & Bilous, 1991), or to difficulties with lexical choice and access. Finally, they may be due to causes outwith the language system: if, for example, a speaker is trying to name an unfamiliar or ambiguous object (Siegman & Pope, 1966). Effectively, the difficulties signaled by disfluencies could occur at any stage of the speech process (during planning, lexical retrieval or articulation of the speech plan), and it has been argued that different types of disfluency may signal different kinds of problems during production (Bortfeld et al., 2001). Clearly, a better understanding of the underlying causes of disfluency would make an important contribution to our understanding of language production in general.

    In this paper, we present two experiments which use the Network Task (Levelt, 1983; Oomen & Postma, 2001) to explicitly manipulate the content of what people say when describing a network of objects, in order to investigate a priori what factors influence the production of disfluency. In Experiment 1, the frequency and name agreement of items in the networks are varied; Experiment 2 varies the visual accessibility of pictures used, reflecting difficulties that do not have their origin in the linguistic system. These experiments allow us to investigate the causes of disfluency directly, establishing whether different disfluencies serve different purposes.

    Experiment 1, Method: The experiment was presented as a communication task. Participants described the route taken by a marker through a network of objects to a listener situated behind a screen.
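The two frequency measures cited in the introduction above, disfluencies per 100 words (Fox Tree, 1995) and the share of utterances containing at least one disfluency (Brennan & Schober, 2001), are simple ratios over an annotated transcript. The sketch assumes a hypothetical annotation in which each utterance already carries a count of disfluency events; the `Utterance` class and the example transcript are illustrative only. How those events are identified, and whether filled pauses such as um count toward the word total (here they do), are conventions a corpus must fix before rates from different studies can be compared.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    words: list  # tokens of the utterance; filled pauses are counted as words here
    disfluencies: int  # number of annotated disfluency events (hypothetical annotation)


def rate_per_100_words(utterances):
    """Disfluency events per 100 words across all utterances."""
    total_words = sum(len(u.words) for u in utterances)
    total_events = sum(u.disfluencies for u in utterances)
    return 100.0 * total_events / total_words if total_words else 0.0


def share_with_disfluency(utterances):
    """Fraction of utterances containing at least one disfluency."""
    if not utterances:
        return 0.0
    return sum(1 for u in utterances if u.disfluencies > 0) / len(utterances)


transcript = [
    Utterance("then the marker moves um to the the chair".split(), 2),  # um + repetition
    Utterance("and then it goes to the house".split(), 0),
    Utterance("uh the route ends at the cake".split(), 1),  # uh
]
print(f"{rate_per_100_words(transcript):.1f} per 100 words")
print(f"{share_with_disfluency(transcript):.0%} of utterances contain a disfluency")
```

Note that the two measures answer different questions: a per-word rate is sensitive to how long utterances are, while the per-utterance share is not, so the roughly 6 per 100 words and 10% of utterances figures quoted above are not directly interchangeable.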