Speech Application

The Experts below are selected from a list of 324 Experts worldwide ranked by ideXlab platform

D.A. Reynolds - One of the best experts on this subject based on the ideXlab platform.

  • Measuring fine structure in speech: Application to speaker identification
    1995 International Conference on Acoustics, Speech, and Signal Processing, 1995
    Co-Authors: C.R. Jankowski, T.F. Quatieri, D.A. Reynolds
    Abstract:

    The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude and frequency modulations of speech formants, high-resolution measurement of fundamental frequency, and location of "secondary pulses", measured using a high-resolution energy operator. When these features are added to traditional features using an existing SID system with a 168-speaker telephone speech database, SID performance improved by as much as 4% for male speakers and 8.2% for female speakers.
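
The abstract above does not name the energy operator it uses. As an illustration only, the sketch below assumes the discrete Teager-Kaiser energy operator, a common choice for tracking amplitude and frequency modulations; the function name and the test signal are hypothetical and not taken from the paper.

```python
import math

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Assumed operator for illustration; the abstract only says
    "high-resolution energy operator".
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical test signal: a pure sinusoid A * cos(omega * n).
A, omega = 2.0, 0.3
signal = [A * math.cos(omega * n) for n in range(50)]

energy = teager_kaiser(signal)

# For a pure sinusoid the operator is constant: A^2 * sin(omega)^2,
# so it responds to both amplitude and frequency using three samples.
expected = A ** 2 * math.sin(omega) ** 2
print(all(abs(e - expected) < 1e-9 for e in energy))
```

Because the operator needs only three adjacent samples, it can follow rapid changes within a pitch period, which is the kind of time resolution the abstract appeals to.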

Frédéric Béchet - One of the best experts on this subject based on the ideXlab platform.

  • Adapting lexical representation and OOV handling from written to spoken language with word embedding
    Proceedings of the Annual Conference of the International Speech Communication Association INTERSPEECH, 2015
    Co-Authors: Jeremie Tafforeau, Thierry Artières, Benoit Favre, Frédéric Béchet
    Abstract:

    Word embeddings have become ubiquitous in NLP, especially when using neural networks. One of the assumptions of such representations is that words with similar properties have similar representations, allowing for better generalization from subsequent models. In the standard setting, two kinds of training corpora are used: a very large unlabeled corpus for learning the word embedding representations, and an in-domain training corpus with gold labels for training classifiers on the target NLP task. Because of the amount of data required to learn embeddings, they are trained on large corpora of written text. This can be an issue when dealing with non-canonical language, such as spontaneous speech: embeddings have to be adapted to fit the particularities of spoken transcriptions. However, the adaptation corpus available for a given speech application can be limited, resulting in a high number of words from the embedding space not occurring in the adaptation space. We present in this paper a method for adapting an embedding space trained on written text to a spoken corpus of limited size. In particular, we deal with words from the embedding space not occurring in the adaptation data. We report experiments done on a part-of-speech task on spontaneous speech transcriptions collected in a call-centre. We show that our word embedding adaptation approach outperforms a state-of-the-art Conditional Random Field approach when little in-domain adaptation data is available. Copyright © 2015 ISCA.

  • Syntactic annotation of spontaneous speech: application to call center conversation data
    Language Resources and Evaluation, 2012
    Co-Authors: Thierry Bazillon, Frédéric Béchet, Melanie Deplano, Alexis Nasr, Benoit Favre
    Abstract:

    This paper describes the syntactic annotation process of the DECODA corpus. This corpus contains manual transcriptions of spoken conversations recorded in the French call-center of the Paris Public Transport Authority (RATP). Three levels of syntactic annotation have been performed with a semi-supervised approach: POS tags, syntactic chunks, and dependency parses. The main idea is to use off-the-shelf NLP tools and models, originally developed and trained on written text, to perform a first automatic annotation on the manually transcribed corpus. At the same time, a fully manual annotation process is performed on a subset of the original corpus, called the GOLD corpus. An iterative process is then applied, consisting of manually correcting errors found in the automatic annotations, retraining the linguistic models of the NLP tools on this corrected corpus, then checking the quality of the adapted models on the fully manual annotations of the GOLD corpus. This process iterates until a certain error rate is reached. This paper describes this process and the main issues arising when adapting NLP tools to process speech transcriptions, and presents the first evaluations performed with these newly adapted tools.

Paul Deléglise - One of the best experts on this subject based on the ideXlab platform.

  • Characterizing and detecting spontaneous speech: Application to speaker role recognition
    Speech Communication, 2014
    Co-Authors: Richard Dufour, Yannick Estève, Paul Deléglise
    Abstract:

    Processing spontaneous speech is one of the many challenges that automatic speech recognition systems have to deal with. The main characteristics of this kind of speech are disfluencies (filled pause, repetition, false start, etc.), and many studies have focused on their detection and correction. Spontaneous speech is defined in opposition to prepared speech, where utterances contain well-formed sentences close to those found in written documents. Acoustic and linguistic features made available by the use of an automatic speech recognition system are proposed to characterize and detect spontaneous speech segments from large audio databases. To better define this notion of spontaneous speech, segments of an 11-hour corpus (French Broadcast News) were manually labeled according to three classes of spontaneity. First, we present a study of these features. We then propose a two-level strategy to automatically assign a class of spontaneity to each speech segment. The proposed system reaches 73.0% precision and 73.5% recall on high spontaneous speech segments, and 66.8% precision and 69.6% recall on prepared speech segments. A quantitative study shows that the classes of spontaneity are useful information to characterize the speaker roles. This is confirmed by extending the speech spontaneity characterization approach to build an efficient automatic speaker role recognition system. © 2013 Elsevier B.V. All rights reserved.

C.R. Jankowski - One of the best experts on this subject based on the ideXlab platform.

  • Measuring fine structure in speech: Application to speaker identification
    1995 International Conference on Acoustics, Speech, and Signal Processing, 1995
    Co-Authors: C.R. Jankowski, T.F. Quatieri, D.A. Reynolds
    Abstract:

    The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude and frequency modulations of speech formants, high-resolution measurement of fundamental frequency, and location of "secondary pulses", measured using a high-resolution energy operator. When these features are added to traditional features using an existing SID system with a 168-speaker telephone speech database, SID performance improved by as much as 4% for male speakers and 8.2% for female speakers.

R. Dahiya - One of the best experts on this subject based on the ideXlab platform.

  • SAMVAAD: Speech applications made viable for access-anywhere devices
    (WiMob'2005) IEEE International Conference on Wireless and Mobile Computing, Networking and Communications, 2005
    Co-Authors: N. Rajput, A.A. Nanavati, M. Kumar, P. Kankar, R. Dahiya
    Abstract:

    The proliferation of pervasive devices has stimulated the development of applications that support ubiquitous access via multiple modalities. Since the processing capabilities of pervasive devices differ vastly, device-specific application adaptation becomes essential. We address the problem of speech application adaptation by dialog call-flow reorganisation for pervasive devices with different memory constraints. Given an atomic dialog call-flow A and device memory size m, we present optimal deterministic algorithms, RESEQUENCE and BALANCE-TREE, which minimise the number of questions in the reorganised output call-flow A_m. Algorithms MASQ and MATREE produce C_m, minimally distant from input call-flow A_m while accommodating the memory constraint m. These two minimisation criteria are capable of capturing various usability requirements important in dialog call-flow design. The following observation forms the cornerstone of all the algorithms in this paper: two grammars g_1 and g_2 comprising |g_1| and |g_2| elements respectively can be merged into a single grammar g = g_1 × g_2 having |g_1|·|g_2| elements for the sequential case, and g = g_1 + g_2 having |g_1| + |g_2| elements for the tree case. Device-specific considerations lead us to introduce the concept of an -characterisation of a call-flow, defined as the set of pairs {(m_i, q_i) | i ∈ N}, where q_i is the minimum number of questions required for memory size m_i. Each call-flow has a unique, device-independent signature in its -characterisation - a measure of its adaptability. We present SAMVAAD, a system that implements these algorithms on call-flows authored in VXML containing SRGS grammars. The system was tested on an IBM voice browser using a sample airline reservation system call-flow reorganised for memories ranging from 64 MB to 210 KB. We ran an experiment with 14 users to obtain feedback on the usability of the adapted call-flows.
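
The grammar-merging observation in the abstract above can be illustrated with a small sketch. This is not the SAMVAAD implementation: grammars are modelled here as plain lists of phrases, the two input grammars are assumed to be disjoint, and the function names and sample phrases are hypothetical.

```python
from itertools import product

def merge_sequential(g1, g2):
    """Merge two questions asked one after the other into a single question.

    Every phrase of g1 can be followed by every phrase of g2, so the merged
    grammar has |g1| * |g2| elements (the g = g1 x g2 case).
    """
    return [f"{a} {b}" for a, b in product(g1, g2)]

def merge_tree(g1, g2):
    """Merge two alternative branches into a single question.

    The user says a phrase from either grammar, so the merged grammar has
    |g1| + |g2| elements (the g = g1 + g2 case), assuming no shared phrases.
    """
    return list(g1) + list(g2)

# Hypothetical grammars for an airline reservation call-flow.
cities = ["boston", "denver", "austin"]
days = ["monday", "tuesday"]

sequential = merge_sequential(cities, days)
tree = merge_tree(cities, days)

print(len(sequential), len(tree))
```

Merging reduces the number of questions the caller hears but enlarges the grammar that must fit in device memory, which is the trade-off the abstract describes between question count and the memory constraint m.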
