The experts below were selected from a list of 20,898 experts worldwide, ranked by the ideXlab platform.
Janet M. Baker - One of the best experts on this subject based on the ideXlab platform.
-
Large vocabulary continuous speech recognition of Wall Street Journal data
International Conference on Acoustics, Speech, and Signal Processing, 1993
Co-Authors: R. Roth, Janet M. Baker, L. Gillick, M. Hunt, Y. Ito, S. Lowe, J. Orloff, Barbara Peskin, F. Scattone
Abstract: The authors report on the progress that has been made at Dragon Systems in speaker-independent large-vocabulary speech recognition using speech from DARPA's Wall Street Journal corpus. First they present an overview of the recognition and training algorithms. Then they describe experiments involving two improvements to these algorithms: moving to higher-dimensional streams and using an IMELDA transformation. They also present results showing the reduction in error rates.
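The IMELDA transformation mentioned in this abstract (integrated mel-scale linear discriminant analysis, associated with co-author M. Hunt) is, at its core, an LDA projection that maps a high-dimensional stream of combined static and dynamic mel-scale features onto a few discriminant directions. A minimal sketch follows; the feature shapes, phonetic class labels, and output dimensionality are illustrative assumptions, not Dragon's actual configuration.

```python
# Minimal sketch of an IMELDA-style linear discriminant transform:
# stacked static + dynamic mel features (one "higher-dimensional
# stream") are projected onto the leading discriminant directions.
# Class labels would come from phonetic alignments in practice.
import numpy as np
from scipy.linalg import eigh

def imelda_transform(frames, labels, n_out=16):
    """frames: (N, D) feature vectors; labels: (N,) phonetic class ids."""
    classes = np.unique(labels)
    mean_all = frames.mean(axis=0)
    d = frames.shape[1]
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        x = frames[labels == c]
        mu = x.mean(axis=0)
        s_w += (x - mu).T @ (x - mu)
        diff = (mu - mean_all)[:, None]
        s_b += len(x) * (diff @ diff.T)
    # Generalized eigenproblem S_b v = lambda S_w v; keep top n_out directions
    # (assumes enough frames that S_w is positive definite).
    vals, vecs = eigh(s_b, s_w)
    order = np.argsort(vals)[::-1][:n_out]
    return vecs[:, order]  # (D, n_out) projection matrix

# Usage: w = imelda_transform(X, y); projected = X @ w
```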
-
Large vocabulary recognition of Wall Street Journal sentences at Dragon Systems
Human Language Technology, 1992
Co-Authors: James K. Baker, Janet M. Baker, L. Gillick, Y. Ito, S. Lowe, Paul G. Bamberg, Kathleen Bishop, Vera Helman, Zezhen Huang, Barbara Peskin
Abstract: In this paper we present some of the algorithm improvements that have been made to Dragon's continuous speech recognition and training programs, improvements that have more than halved our error rate on the Resource Management task since the last SLS meeting in February 1991. We also report the "dry run" results that we have obtained on the 5000-word speaker-dependent Wall Street Journal recognition task, and outline our overall research strategy and plans for the future.
In our system, a set of output distributions, known as the set of PELs (phonetic elements), is associated with each phoneme. The HMM for a PIC (phoneme-in-context) is represented as a linear sequence of states, each having an output distribution chosen from the set of PELs for the given phoneme, and a (double exponential) duration distribution.
We report on two methods of acoustic modeling and training. The first method involves generating a set of (unimodal) PELs for a given speaker by clustering the hypothetical frames found in the spectral models for that speaker, and then constructing speaker-dependent PEL sequences to represent each PIC. The "spectral model" for a PIC is simply the expected value of the sequence of frames that would be generated by the PIC. The second method represents the probability distribution for each parameter in a PEL as a mixture of a fixed set of unimodal components, the mixing weights being estimated using the EM algorithm. In both models we assume that the parameters are statistically independent.
We report results obtained using each of these two methods (RePELing/respelling and univariate "tied mixtures") on the 5000-word closed-vocabulary verbalized-punctuation version of the Wall Street Journal task.
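The second method described in this abstract, univariate "tied mixtures", keeps a fixed pool of unimodal components per parameter and re-estimates only the mixing weights with EM. A hedged sketch of that weight update, assuming Gaussian components; the component parameters and data below are illustrative, not Dragon's actual values:

```python
# EM for mixing weights over a FIXED set of unimodal (Gaussian)
# components, one PEL parameter at a time. Only the weights move;
# component means and variances stay tied and untouched.
import numpy as np

def em_mixing_weights(x, means, variances, n_iter=50):
    """x: (N,) observed values for one PEL parameter.
    means, variances: (K,) fixed tied-mixture components."""
    k = len(means)
    w = np.full(k, 1.0 / k)                       # uniform initial weights
    # Component likelihoods p(x_n | k) are fixed across iterations.
    lik = np.exp(-0.5 * (x[:, None] - means) ** 2 / variances) \
          / np.sqrt(2.0 * np.pi * variances)      # (N, K)
    for _ in range(n_iter):
        resp = w * lik                            # E-step: responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)                     # M-step: new mixing weights
    return w

# Usage with synthetic data: weights should concentrate on the
# component nearest the true generating distribution.
rng = np.random.default_rng(0)
x = rng.normal(1.0, 0.5, size=1000)
print(em_mixing_weights(x, means=np.array([0.0, 1.0, 2.0]),
                        variances=np.array([0.25, 0.25, 0.25])))
```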
-
The design for the Wall Street Journal-based CSR corpus
Proceedings of the workshop on Speech and Natural Language - HLT '91, 1992
Co-Authors: Douglas B. Paul, Janet M. Baker
Abstract: The DARPA Spoken Language System (SLS) community has long taken a leadership position in designing, implementing, and globally distributing significant speech corpora widely used for advancing speech recognition research. The Wall Street Journal (WSJ) CSR corpus described here is the newest addition to this valuable set of resources. In contrast to previous corpora, the WSJ corpus will provide DARPA its first general-purpose English, large-vocabulary, natural-language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value. This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR corpus.
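One of the text-processing steps implied by this corpus design (and referenced by the "verbalized punctuation" task in the Dragon entry above) is converting raw newspaper text into prompts in which punctuation marks are read aloud as words. The sketch below is a hypothetical simplification of that normalization; the mapping table and handling rules are assumptions for illustration, not the corpus's actual processing pipeline.

```python
# Toy "verbalized punctuation" normalizer: trailing punctuation on
# each word is replaced by a spoken-form token. Real corpus text
# processing handled many more cases (numbers, abbreviations, etc.).
VP_MAP = {",": ",COMMA", ".": ".PERIOD", '"': '"QUOTE'}

def verbalize_punctuation(sentence: str) -> str:
    tokens = []
    for word in sentence.split():
        trailing = []
        while word and word[-1] in VP_MAP:
            trailing.append(VP_MAP[word[-1]])
            word = word[:-1]
        if word:
            tokens.append(word)
        tokens.extend(reversed(trailing))
    return " ".join(tokens)

print(verbalize_punctuation("Stocks fell, analysts said."))
# -> Stocks fell ,COMMA analysts said .PERIOD
```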
Salim Roukos - One of the best experts on this subject based on the ideXlab platform.
-
Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task
International Conference on Acoustics, Speech, and Signal Processing, 1995
Co-Authors: Lalit R. Bahl, S. Balakrishnan-Aiyer, J. R. Bellegarda, Martin Franz, Ponani S. Gopalakrishnan, David Nahamoo, Miroslav Novak, Mukund Padmanabhan, Michael Picheny, Salim Roukos
Abstract: In this paper we discuss various experimental results using our continuous speech recognition system on the Wall Street Journal task. Experiments with different feature extraction methods, varying amounts and types of training data, and different vocabulary sizes are reported.
Volker Steinbiss - One of the best experts on this subject based on the ideXlab platform.
-
Large vocabulary continuous speech recognition of Wall Street Journal data
International Conference on Acoustics, Speech, and Signal Processing, 1994
Co-Authors: Xavier L. Aubert, Christian Dugast, Hermann Ney, Volker Steinbiss
Abstract: We report on recent developments of the Philips large vocabulary speech recognition system and on our experiments with the Wall Street Journal (WSJ) corpus. A two-pass decoding has been devised that allows an easy integration of more complex language models. First, a word lattice is produced using a time-synchronous beam search with a bigram language model. Next, a higher-order language model is applied to the lattice at the phrase level. The conditions ensuring the validity of this approach are explained, and practical results for trigrams demonstrate its usefulness. The main system development stages on WSJ data are presented, and our final recognizers are evaluated on Nov. '92 and Nov. '93 test data for both 5K and 20K vocabularies.
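The two-pass scheme can be pictured concretely: the bigram first pass leaves behind a word lattice (a DAG of word hypotheses with acoustic scores), and the second pass re-ranks paths through it under a trigram. A minimal sketch, with a toy lattice and a stub trigram model standing in for the real components:

```python
# Second-pass lattice rescoring: enumerate word paths through a toy
# lattice and pick the best under trigram LM + acoustic scores.
# Edge format: (start_node, end_node, word, acoustic_logprob).
LATTICE = [(0, 1, "stocks", -2.1), (0, 1, "stock", -2.4),
           (1, 2, "fell", -1.3), (1, 2, "fail", -1.9),
           (2, 3, "sharply", -1.7)]
FINAL = 3

def trigram_logprob(w1, w2, w3):
    # Stub standing in for a real trigram language model.
    return -6.0 if "fail" in (w2, w3) else -1.0

def paths(node=0, prefix=()):
    """Enumerate all (word, acoustic_score) paths to the final node."""
    if node == FINAL:
        yield prefix
        return
    for s, e, w, ac in LATTICE:
        if s == node:
            yield from paths(e, prefix + ((w, ac),))

def rescore(path):
    """Total score = trigram LM logprob + stored acoustic logprobs."""
    words = ["<s>", "<s>"] + [w for w, _ in path]
    lm = sum(trigram_logprob(*words[i:i + 3]) for i in range(len(words) - 2))
    return lm + sum(ac for _, ac in path)

best = max(paths(), key=rescore)
print([w for w, _ in best])  # -> ['stocks', 'fell', 'sharply']
```

A real decoder would apply the higher-order model by dynamic programming over lattice states (keyed on the last two words) rather than enumerating paths, but the scoring logic is the same.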
F Scattone - One of the best experts on this subject based on the ideXlab platform.
-
large vocabulary continuous speech recognition of Wall Street Journal data
International Conference on Acoustics Speech and Signal Processing, 1993Co-Authors: R Roth, Janet M. Baker, L Gillick, M Hunt, Y Ito, S Lowe, J Orloff, Barbara Peskin, F ScattoneAbstract:The authors report on the progress that has been made at Dragon Systems in speaker-independent large-vocabulary speech recognition using speech from DARPA's Wall Street Journal corpus. First they present an overview of the recognition and training algorithms. Then, they describe experiments involving two improvements to these algorithms, moving to higher-dimensional streams and using an IMELDA transformation. They also present some results showing the reduction in error rates. >
Barbara Peskin - One of the best experts on this subject based on the ideXlab platform.
-
large vocabulary continuous speech recognition of Wall Street Journal data
International Conference on Acoustics Speech and Signal Processing, 1993Co-Authors: R Roth, Janet M. Baker, L Gillick, M Hunt, Y Ito, S Lowe, J Orloff, Barbara Peskin, F ScattoneAbstract:The authors report on the progress that has been made at Dragon Systems in speaker-independent large-vocabulary speech recognition using speech from DARPA's Wall Street Journal corpus. First they present an overview of the recognition and training algorithms. Then, they describe experiments involving two improvements to these algorithms, moving to higher-dimensional streams and using an IMELDA transformation. They also present some results showing the reduction in error rates. >
-
Large vocabulary recognition of Wall Street Journal sentences at Dragon Systems
Human Language Technology, 1992
Co-Authors: James K. Baker, Janet M. Baker, L. Gillick, Y. Ito, S. Lowe, Paul G. Bamberg, Kathleen Bishop, Vera Helman, Zezhen Huang, Barbara Peskin
Abstract: In this paper we present some of the algorithm improvements that have been made to Dragon's continuous speech recognition and training programs, improvements that have more than halved our error rate on the Resource Management task since the last SLS meeting in February 1991. We also report the "dry run" results that we have obtained on the 5000-word speaker-dependent Wall Street Journal recognition task, and outline our overall research strategy and plans for the future.
In our system, a set of output distributions, known as the set of PELs (phonetic elements), is associated with each phoneme. The HMM for a PIC (phoneme-in-context) is represented as a linear sequence of states, each having an output distribution chosen from the set of PELs for the given phoneme, and a (double exponential) duration distribution.
We report on two methods of acoustic modeling and training. The first method involves generating a set of (unimodal) PELs for a given speaker by clustering the hypothetical frames found in the spectral models for that speaker, and then constructing speaker-dependent PEL sequences to represent each PIC. The "spectral model" for a PIC is simply the expected value of the sequence of frames that would be generated by the PIC. The second method represents the probability distribution for each parameter in a PEL as a mixture of a fixed set of unimodal components, the mixing weights being estimated using the EM algorithm. In both models we assume that the parameters are statistically independent.
We report results obtained using each of these two methods (RePELing/respelling and univariate "tied mixtures") on the 5000-word closed-vocabulary verbalized-punctuation version of the Wall Street Journal task.