The experts below are selected from a list of 9726 experts worldwide, ranked by the ideXlab platform.
Harvey F. Silverman - One of the best experts on this subject based on the ideXlab platform.
-
Performance of an HMM Speech Recognizer using a real-time tracking microphone array as input
IEEE Transactions on Speech and Audio Processing, 1999
Co-Authors: T. B. Hughes, Hongseok Kim, J. H. DiBiase, Harvey F. Silverman
Abstract: This correspondence reports results for a tracking, real-time microphone array used as input to a hidden Markov model based (HMM-based) connected alpha-digits speech recognizer. For a talker in the near field of the array (within 0.5 m), performance approaches that of a close-talking microphone input device.
-
Using a Real-Time Tracking Microphone Array as Input to an HMM Speech Recognizer
International Conference on Acoustics, Speech and Signal Processing, 1998
Co-Authors: T. B. Hughes, Hongseok Kim, J. H. DiBiase, Harvey F. Silverman
Abstract: A major problem for speech recognition systems is relieving the talker of the need to use a close-talking, head-mounted, or desk-stand microphone. A likely solution is an array of microphones that can steer itself to the talker and apply a beamforming algorithm to overcome the reduced signal-to-noise ratio caused by room acoustics. This paper reports results for a tracking, real-time microphone array used as input to an HMM-based connected alpha-digits speech recognizer. For a talker in the very near field of the array (within a meter), performance approaches that of a close-talking microphone input device. The effects of both the noise-reducing steered array and a maximum a posteriori (MAP) training step are shown to be significant. Here, the array system and the recognizer are described, experiments are presented, and the implications of combining the two systems are discussed.
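The steered-array idea described above is commonly realized with delay-and-sum beamforming: each microphone's signal is delayed so that wavefronts from the tracked talker align, then the channels are averaged. The sketch below is illustrative only (the papers do not specify their exact algorithm); all names and the integer-sample delay approximation are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def delay_and_sum(signals, mic_positions, source_pos, fs):
    """Steer a microphone array toward source_pos by delay-and-sum.

    signals: (n_mics, n_samples) array of simultaneously recorded channels
    mic_positions: (n_mics, 3) microphone coordinates in meters
    source_pos: (3,) estimated talker position in meters
    fs: sampling rate in Hz
    """
    dists = np.linalg.norm(mic_positions - source_pos, axis=1)
    # Delay each channel so all wavefronts line up with the farthest mic.
    delays = (dists.max() - dists) / SPEED_OF_SOUND
    shifts = np.round(delays * fs).astype(int)  # integer-sample approximation
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for m in range(n_mics):
        s = shifts[m]
        out[s:] += signals[m, :n_samples - s]
    return out / n_mics  # coherent speech adds up; diffuse noise averages down
```

Signals arriving from the steered direction add coherently while uncorrelated room noise is attenuated, which is the SNR gain the abstract refers to.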
Wonyong Sung - One of the best experts on this subject based on the ideXlab platform.
-
A Real-Time FPGA-Based 20 000-Word Speech Recognizer with Optimized DRAM Access
IEEE Transactions on Circuits and Systems, 2010
Co-Authors: Youngkyu Choi, Jungwook Choi, Wonyong Sung
Abstract: A real-time hardware-based large-vocabulary speech recognizer requires high memory bandwidth. We have developed a field-programmable-gate-array (FPGA)-based 20 000-word speech recognizer built around efficient dynamic random access memory (DRAM) access. The system contains all the functional blocks for hidden-Markov-model-based speaker-independent continuous speech recognition: feature extraction, emission probability computation, and intraword and interword Viterbi beam search. Feature extraction runs in software on a soft-core CPU, while the other functional units are implemented as parallel, pipelined hardware blocks. To reduce the number of memory access operations, we used several techniques, including bit-width reduction of the Gaussian parameters, multiframe computation of the emission probability, and two-stage language model pruning. We also employ a customized DRAM controller that supports access patterns optimized for each functional unit of the recognizer. The speech recognition hardware was synthesized for the Virtex-4 FPGA and operates at 100 MHz. Experiments on the November '92 20k test set show that the system runs 1.52 and 1.39 times faster than real time using the bigram and trigram language models, respectively.
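The multiframe trick mentioned in the abstract amortizes each fetch of Gaussian parameters over several feature frames, so parameters cross the DRAM bus once instead of once per frame. A minimal sketch of the idea for diagonal-covariance Gaussians (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def log_emission_multiframe(frames, means, inv_vars, log_consts):
    """Log-likelihood of several frames against all Gaussians at once.

    frames: (F, D) feature vectors for F consecutive frames
    means, inv_vars: (G, D) diagonal-Gaussian parameters
    log_consts: (G,) precomputed -0.5 * (D*log(2*pi) + sum(log var))

    Each Gaussian's parameters are loaded once and reused for all F
    frames, mirroring the multiframe computation that cuts DRAM traffic.
    Returns an (F, G) matrix of log-likelihoods.
    """
    diff = frames[:, None, :] - means[None, :, :]          # (F, G, D)
    quad = np.sum(diff * diff * inv_vars[None, :, :], axis=2)
    return log_consts[None, :] - 0.5 * quad
```

On real hardware the same batching shows up as reading a Gaussian's mean/variance block once and streaming F frames past it; here NumPy broadcasting plays that role.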
-
ICASSP - OpenMP-based parallel implementation of a continuous Speech Recognizer on a multi-core system
2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Co-Authors: Kisun You, Young-joon Lee, Wonyong Sung
Abstract: We have implemented a 20,000-word continuous speech recognizer on a multi-core system. A fine-grained parallel processing approach is employed for good scalability, and the OpenMP library is used for portability. In the emission probability computation, a dynamic workload distribution method provides good load balancing, while the search network used in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. To further improve performance, a workload-predictive thread assignment strategy and a false-cache-line-sharing prevention method are employed. Tests were conducted on the WSJ1 20k test and development sets. We achieved a speed-up of 3.90 with four-thread parallelization on a four-core system, compared to four copies of the baseline single-thread recognizer running simultaneously. The final system runs at about twice the speed required for real time.
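The two scheduling policies named in the abstract, dynamic distribution for the uneven emission-probability workload and static subtree partitioning for the search network, can be sketched as follows. This is a policy illustration only: the paper uses OpenMP in C, whereas this sketch uses Python's thread pool, and `score_state` and the subtree representation are invented stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def score_state(s):
    """Stand-in for the per-state emission probability computation."""
    return s * s

def emission_pass_dynamic(states, n_workers=4):
    """Dynamic distribution: idle workers grab the next small chunk of
    states, balancing per-state workloads that vary in cost."""
    with ThreadPoolExecutor(n_workers) as ex:
        return list(ex.map(score_state, states, chunksize=8))

def search_pass_static(subtrees, n_workers=4):
    """Static partition: each worker owns whole subtrees of the search
    network, so no synchronization on shared nodes is needed."""
    def walk(tree):
        return sum(score_state(s) for s in tree)
    with ThreadPoolExecutor(n_workers) as ex:
        return list(ex.map(walk, subtrees))
```

The trade-off mirrored here is the one the paper describes: dynamic chunking pays some scheduling overhead for balance, while static ownership of subtrees avoids synchronization at the cost of possible imbalance.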
-
FPGA-Based Implementation of a Real-Time 5000-Word Continuous Speech Recognizer
European Signal Processing Conference, 2008
Co-Authors: Youngkyu Choi, Wonyong Sung
Abstract: We have developed a hidden-Markov-model-based 5000-word speaker-independent continuous speech recognizer on a Field-Programmable Gate Array (FPGA). Feature extraction runs in software on a soft-core CPU, while the emission probability computation and the Viterbi beam search are implemented as parallel, pipelined hardware blocks. To reduce the bandwidth requirement on external DRAM, we employed bit-width reduction of the Gaussian parameters, multi-block computation of the emission probability, and two-stage language model pruning. These optimizations reduce the memory bandwidth required for emission probability computation and inter-word transitions by 81% and 44%, respectively. The speech recognition hardware was synthesized for the Virtex-4 FPGA and operates at 100 MHz. Experiments on the Wall Street Journal 5k-vocabulary task show that the system runs 1.52 times faster than real time.
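Bit-width reduction of the Gaussian parameters, used in both FPGA recognizers above, trades a little precision for a large cut in DRAM bandwidth (e.g. 8-bit integers instead of 32-bit floats). A minimal linear-quantization sketch, with all names and the uniform-scale scheme assumed rather than taken from the papers:

```python
import numpy as np

def quantize_params(params, bits=8):
    """Linearly quantize an array of Gaussian parameters to `bits`-bit
    unsigned integers.  Returns (codes, offset, scale) so the values can
    be reconstructed as codes * scale + offset."""
    lo, hi = params.min(), params.max()
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    dtype = np.uint8 if bits <= 8 else np.uint16
    codes = np.round((params - lo) / scale).astype(dtype)
    return codes, lo, scale

def dequantize_params(codes, offset, scale):
    """Reconstruct approximate parameter values from integer codes."""
    return codes.astype(np.float64) * scale + offset
```

The rounding error per value is at most half a quantization step, which is why modest bit widths can leave recognition accuracy nearly unchanged while shrinking each parameter fetch by 4x or more.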
Milind Mahajan - One of the best experts on this subject based on the ideXlab platform.
-
Microsoft Windows Highly Intelligent Speech Recognizer: Whisper
International Conference on Acoustics, Speech and Signal Processing, 1995
Co-Authors: Xuedong Huang, Mei-Yuh Hwang, Alejandro Acero, F. Alleva, Li Jiang, Milind Mahajan
Abstract: Since January 1993, the authors have been working to refine and extend Sphinx-II technologies in order to develop practical speech recognition at Microsoft. The result of that work is Whisper (Windows Highly Intelligent Speech Recognizer). Whisper offers significantly improved recognition efficiency, usability, and accuracy compared with the Sphinx-II system. In addition, Whisper provides speech input capabilities for Microsoft Windows and can be scaled to different PC platform configurations. It supports continuous speech recognition, speaker independence, on-line adaptation, noise robustness, and dynamic vocabularies and grammars. For typical Windows command-and-control applications (fewer than 1000 words), Whisper provides a software-only solution on PCs equipped with a 486DX, 4 MB of memory, a standard sound card, and a desktop microphone.
T.b. Hughes - One of the best experts on this subject based on the ideXlab platform.
-
Performance of an HMM Speech Recognizer using a real-time tracking microphone array as input
IEEE Transactions on Speech and Audio Processing, 1999
Co-Authors: T. B. Hughes, Hongseok Kim, J. H. DiBiase, Harvey F. Silverman
Abstract: This correspondence reports results for a tracking, real-time microphone array used as input to a hidden Markov model based (HMM-based) connected alpha-digits speech recognizer. For a talker in the near field of the array (within 0.5 m), performance approaches that of a close-talking microphone input device.
-
Using a Real-Time Tracking Microphone Array as Input to an HMM Speech Recognizer
International Conference on Acoustics, Speech and Signal Processing, 1998
Co-Authors: T. B. Hughes, Hongseok Kim, J. H. DiBiase, Harvey F. Silverman
Abstract: A major problem for speech recognition systems is relieving the talker of the need to use a close-talking, head-mounted, or desk-stand microphone. A likely solution is an array of microphones that can steer itself to the talker and apply a beamforming algorithm to overcome the reduced signal-to-noise ratio caused by room acoustics. This paper reports results for a tracking, real-time microphone array used as input to an HMM-based connected alpha-digits speech recognizer. For a talker in the very near field of the array (within a meter), performance approaches that of a close-talking microphone input device. The effects of both the noise-reducing steered array and a maximum a posteriori (MAP) training step are shown to be significant. Here, the array system and the recognizer are described, experiments are presented, and the implications of combining the two systems are discussed.
Kuldip K. Paliwal - One of the best experts on this subject based on the ideXlab platform.
-
ICASSP (2) - Use of temporal correlation between successive frames in a hidden Markov model based Speech Recognizer
IEEE International Conference on Acoustics, Speech and Signal Processing, 1993
Co-Authors: Kuldip K. Paliwal
Abstract: The temporal correlation between successive frames is incorporated into an HMM (hidden Markov model) based speech recognizer. This is done by making the probability of the current observation vector depend on the previous observation vectors. Preliminary results show that this approach provides a significant improvement in recognition performance, even when only the temporal correlation between two successive frames is used.
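One common way to make the current observation's probability depend on the previous frame, as the abstract describes, is a conditional Gaussian whose mean is shifted by a linear function of the preceding observation. The abstract does not give Paliwal's exact parameterization, so the linear-predictive form and all names below are assumptions:

```python
import numpy as np

def cond_log_prob(o_t, o_prev, mean, A, inv_var, log_const):
    """Log p(o_t | o_{t-1}) for a conditional diagonal Gaussian whose
    mean is shifted by a linear function of the previous frame:

        mu_t = mean + A @ o_prev

    Setting A = 0 recovers the standard frame-independent HMM output
    distribution; a nonzero A captures inter-frame temporal correlation.
    log_const is the precomputed Gaussian normalization term.
    """
    mu_t = mean + A @ o_prev
    d = o_t - mu_t
    return log_const - 0.5 * np.sum(d * d * inv_var)
```

In a recognizer this score simply replaces the usual emission probability inside the Viterbi recursion; the search itself is unchanged because the dependence only reaches one frame back.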
-
ICASSP - Lexicon-building methods for an acoustic sub-word based Speech Recognizer
International Conference on Acoustics, Speech and Signal Processing
Co-Authors: Kuldip K. Paliwal
Abstract: The use of an acoustic subword unit (ASWU) based speech recognition system for the recognition of isolated words is discussed. Methods are proposed for generating both deterministic and statistical word lexicons. It is shown that applying a modified k-means algorithm to the likelihoods derived through the Viterbi algorithm yields the best deterministic word lexicon. However, the ASWU-based speech recognizer performs better with the statistical word lexicon than with the deterministic one. Improving the design of the word lexicon considerably narrows the gap between the recognition performance of whole-word-unit (WWU) based and ASWU-based speech recognizers, and further gains are expected from better lexicon design.
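The "modified k-means on Viterbi likelihoods" step can be pictured as clustering training tokens of a word by their log-likelihood profiles under candidate sub-word model sequences, with each resulting cluster contributing one lexicon entry. This is a loose sketch of the clustering idea only; the paper's actual modification to k-means is not described in the abstract, and all names here are invented:

```python
import numpy as np

def kmeans_likelihood(ll, k, iters=20, seed=0):
    """Cluster tokens by their Viterbi log-likelihood profiles.

    ll: (n_tokens, n_models) matrix where ll[i, j] is the log-likelihood
    of training token i under candidate sub-word model sequence j.
    Tokens with similar profiles are grouped; each cluster would yield
    one pronunciation entry in the word lexicon.
    """
    rng = np.random.default_rng(seed)
    centers = ll[rng.choice(len(ll), k, replace=False)]
    for _ in range(iters):
        # Assign each token to the nearest cluster center in profile space.
        d = np.linalg.norm(ll[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Recompute each non-empty cluster's center.
        for j in range(k):
            if np.any(assign == j):
                centers[j] = ll[assign == j].mean(axis=0)
    return assign, centers
```

A deterministic lexicon would keep one representative sequence per cluster, while a statistical lexicon would keep the cluster proportions as pronunciation probabilities, matching the deterministic/statistical distinction in the abstract.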