Speech Recognizer

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 9726 Experts worldwide, ranked by the ideXlab platform

Harvey F. Silverman - One of the best experts on this subject based on the ideXlab platform.

  • Performance of an HMM Speech Recognizer using a real-time tracking microphone array as input
    IEEE Transactions on Speech and Audio Processing, 1999
    Co-Authors: T.b. Hughes, Hongseok Kim, J.h. Dibiase, Harvey F. Silverman
    Abstract:

    This correspondence reports results for a tracking, real-time microphone array as an input to a hidden Markov model (HMM) based connected alpha-digits Speech Recognizer. For a talker in the near field of the array (within 0.5 m), performance approaches that of a close-talking microphone input device.

  • Using a real-time tracking microphone array as input to an HMM Speech Recognizer
    International Conference on Acoustics Speech and Signal Processing, 1998
    Co-Authors: T.b. Hughes, Hongseok Kim, J.h. Dibiase, Harvey F. Silverman
    Abstract:

    A major problem for Speech recognition systems is relieving the talker of the need to use a close-talking, head-mounted, or desk-stand microphone. A likely solution is an array of microphones that can steer itself to the talker and use a beamforming algorithm to overcome the reduced signal-to-noise ratio due to room acoustics. This paper reports results for a tracking, real-time microphone array as an input to an HMM-based connected alpha-digits Speech Recognizer. For a talker in the very near field of the array (within a meter), performance approaches that of a close-talking microphone input device. The effects of both the noise-reducing steered array and the use of a maximum a posteriori (MAP) training step are shown to be significant. Here, the array system and the Recognizer are described, experiments are presented, and the implications of combining the two systems are discussed.
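
    The steering described above is classically done with delay-and-sum beamforming. A minimal Python sketch (not the authors' implementation; the 2-D geometry, integer-sample delays, and 343 m/s speed of sound are illustrative assumptions):

    ```python
    import math

    SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumption)

    def delay_and_sum(signals, mic_positions, source_position, sample_rate):
        """Steer a microphone array at a talker by delay-and-sum beamforming.

        signals: list of per-microphone sample lists (equal lengths)
        mic_positions / source_position: 2-D coordinates in metres
        Delays are rounded to whole samples for simplicity.
        """
        # Propagation distance from the source to each microphone
        dists = [math.dist(source_position, m) for m in mic_positions]
        # Delay each channel relative to the farthest microphone so
        # all channels line up on the talker before summing
        max_d = max(dists)
        lags = [round((max_d - d) * sample_rate / SPEED_OF_SOUND) for d in dists]
        n = len(signals[0])
        out = [0.0] * n
        for sig, lag in zip(signals, lags):
            for i in range(n - lag):
                out[i + lag] += sig[i]
        # Average so the beamformed output has the same scale as one mic
        return [x / len(signals) for x in out]
    ```

    With the delays aligned on the talker, the speech adds coherently while reverberation and noise arriving from other directions add incoherently — the signal-to-noise gain the paper exploits.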

Wonyong Sung - One of the best experts on this subject based on the ideXlab platform.

  • A real-time FPGA-based 20,000-word Speech Recognizer with optimized DRAM access
    IEEE Transactions on Circuits and Systems, 2010
    Co-Authors: Youngkyu Choi, Jungwook Choi, Wonyong Sung
    Abstract:

    A real-time hardware-based large-vocabulary Speech Recognizer requires high memory bandwidth. We have developed a field-programmable gate array (FPGA) based 20,000-word Speech Recognizer utilizing efficient dynamic random access memory (DRAM) access. This system contains all the functional blocks for hidden-Markov-model-based speaker-independent continuous Speech recognition: feature extraction, emission probability computation, and intraword and interword Viterbi beam search. The feature extraction is conducted in software on a soft-core-based CPU, while the other functional units are implemented using parallel and pipelined hardware blocks. To reduce the number of memory access operations, we used several techniques, such as bit-width reduction of the Gaussian parameters, multiframe computation of the emission probability, and two-stage language model pruning. We also employ a customized DRAM controller that supports various access patterns optimized for each functional unit of the Speech Recognizer. The Speech recognition hardware was synthesized for the Virtex-4 FPGA, and it operates at 100 MHz. The experimental result on the Nov '92 20k test set shows that the developed system runs 1.52 and 1.39 times faster than real time using the bigram and trigram language models, respectively.
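
    One of the bandwidth-saving techniques named above, bit-width reduction of the Gaussian parameters, can be sketched as uniform quantization. A toy Python version (the paper's actual fixed-point format and bit allocation are not given here, so the 8-bit uniform scheme is an assumption):

    ```python
    def quantize_params(values, bits=8):
        """Uniformly quantize Gaussian parameters (means, variances) to
        `bits`-bit integer codes, cutting DRAM traffic versus 32-bit floats.
        Returns (codes, scale, offset) so that values ~= codes * scale + offset.
        """
        lo, hi = min(values), max(values)
        levels = (1 << bits) - 1
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = [round((v - lo) / scale) for v in values]
        return codes, scale, lo

    def dequantize_params(codes, scale, offset):
        """Reconstruct approximate parameter values from the integer codes."""
        return [c * scale + offset for c in codes]
    ```

    Storing 8-bit codes instead of 32-bit floats cuts the DRAM traffic for the mixture parameters by roughly a factor of four, at the cost of a small, bounded quantization error in the emission scores.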

  • OpenMP-based parallel implementation of a continuous Speech Recognizer on a multi-core system
    International Conference on Acoustics Speech and Signal Processing, 2009
    Co-Authors: Kisun You, Young-joon Lee, Wonyong Sung
    Abstract:

    We have implemented a 20,000-word continuous Speech Recognizer on a multi-core system. A fine-grained parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. The search network involved in the Viterbi beam search, however, is statically partitioned into independent subtrees to reduce memory synchronization overhead. To further improve performance, a workload-predictive thread assignment strategy and a false-cache-line-sharing prevention method are employed. The test was conducted using the WSJ1 20k test and development sets. We achieved a speedup of 3.90 with four-thread parallelization on a four-core system, compared to four copies of the baseline single-thread Speech Recognizer running simultaneously. The final recognition system runs at about twice real-time speed.
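
    The dynamic workload distribution used for the emission probabilities corresponds to OpenMP's `schedule(dynamic)`: idle threads pull the next state rather than receiving a fixed slice, which balances mixtures of varying size. A rough Python analogue (the original is C/OpenMP; the thread pool and the toy distance-style score are illustrative assumptions):

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def emission_scores(state_means, features, n_workers=4):
        """Evaluate per-state emission scores with dynamic load balancing:
        idle workers pull the next state from a shared work list, the same
        idea as OpenMP schedule(dynamic) for mixtures of varying size."""
        def score(means):
            # toy stand-in for a Gaussian-mixture log-likelihood
            return -sum((f - m) ** 2 for f, m in zip(features, means))
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(score, state_means))
    ```

    The Viterbi search, by contrast, is statically partitioned into subtrees precisely because this kind of fine-grained dynamic dispatch would force the threads to synchronize on shared search-network state.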

  • FPGA-based implementation of a real-time 5000-word continuous Speech Recognizer
    European Signal Processing Conference, 2008
    Co-Authors: Youngkyu Choi, Wonyong Sung
    Abstract:

    We have developed a hidden-Markov-model-based 5000-word speaker-independent continuous Speech Recognizer using a Field-Programmable Gate Array (FPGA). The feature extraction is conducted in software on a soft-core-based CPU, while the emission probability computation and the Viterbi beam search are implemented using parallel and pipelined hardware blocks. To reduce the bandwidth requirement to external DRAM, we employed bit-width reduction of the Gaussian parameters, multi-block computation of the emission probability, and two-stage language model pruning. These optimizations reduce the memory bandwidth requirement for emission probability computation and inter-word transition by 81% and 44%, respectively. The Speech recognition hardware was synthesized for the Virtex-4 FPGA, and it operates at 100 MHz. The experimental result on the Wall Street Journal 5k-vocabulary task shows that the developed system runs 1.52 times faster than real time.
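
    The Viterbi beam search at the core of both FPGA decoders can be sketched as one score-propagation step followed by pruning. A minimal Python version (the dictionaries and log-domain scores are illustrative, not the hardware data layout):

    ```python
    def viterbi_step(active, transitions, emission, beam_width):
        """One frame of beam-pruned Viterbi search.

        active:      {state: log score} of currently active hypotheses
        transitions: {state: [(next_state, log transition prob), ...]}
        emission:    {state: log emission prob for this frame}
        """
        nxt = {}
        for s, score in active.items():
            for t, logp in transitions.get(s, []):
                cand = score + logp + emission.get(t, float("-inf"))
                if cand > nxt.get(t, float("-inf")):
                    nxt[t] = cand  # keep only the best path into each state
        if not nxt:
            return nxt
        best = max(nxt.values())
        # Beam pruning keeps the active set tractable for large vocabularies
        return {s: v for s, v in nxt.items() if v >= best - beam_width}
    ```

    Pruning everything more than `beam_width` below the frame's best score is what bounds the per-frame work — and hence the memory bandwidth — regardless of vocabulary size.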

Milind Mahajan - One of the best experts on this subject based on the ideXlab platform.

  • Microsoft Windows Highly Intelligent Speech Recognizer: Whisper
    International Conference on Acoustics Speech and Signal Processing, 1995
    Co-Authors: Xuedong Huang, Meiyuh Hwang, Alejandro Acero, F Alleva, Li Jiang, Milind Mahajan
    Abstract:

    Since January 1993, the authors have been working to refine and extend Sphinx-II technologies in order to develop practical Speech recognition at Microsoft. The result of that work is Whisper (Windows Highly Intelligent Speech Recognizer). Whisper offers significantly improved recognition efficiency, usability, and accuracy compared with the Sphinx-II system. In addition, Whisper provides Speech input capabilities for Microsoft Windows and can be scaled to meet different PC platform configurations. It provides features such as continuous Speech recognition, speaker independence, on-line adaptation, noise robustness, and dynamic vocabularies and grammars. For typical Windows command-and-control applications (fewer than 1000 words), Whisper provides a software-only solution on PCs equipped with a 486DX, 4 MB of memory, a standard sound card, and a desktop microphone.

T.b. Hughes - One of the best experts on this subject based on the ideXlab platform.

  • Performance of an HMM Speech Recognizer using a real-time tracking microphone array as input
    IEEE Transactions on Speech and Audio Processing, 1999
    Co-Authors: T.b. Hughes, Hongseok Kim, J.h. Dibiase, Harvey F. Silverman
    Abstract:

    This correspondence reports results for a tracking, real-time microphone array as an input to a hidden Markov model (HMM) based connected alpha-digits Speech Recognizer. For a talker in the near field of the array (within 0.5 m), performance approaches that of a close-talking microphone input device.

  • Using a real-time tracking microphone array as input to an HMM Speech Recognizer
    International Conference on Acoustics Speech and Signal Processing, 1998
    Co-Authors: T.b. Hughes, Hongseok Kim, J.h. Dibiase, Harvey F. Silverman
    Abstract:

    A major problem for Speech recognition systems is relieving the talker of the need to use a close-talking, head-mounted, or desk-stand microphone. A likely solution is an array of microphones that can steer itself to the talker and use a beamforming algorithm to overcome the reduced signal-to-noise ratio due to room acoustics. This paper reports results for a tracking, real-time microphone array as an input to an HMM-based connected alpha-digits Speech Recognizer. For a talker in the very near field of the array (within a meter), performance approaches that of a close-talking microphone input device. The effects of both the noise-reducing steered array and the use of a maximum a posteriori (MAP) training step are shown to be significant. Here, the array system and the Recognizer are described, experiments are presented, and the implications of combining the two systems are discussed.

Kuldip K. Paliwal - One of the best experts on this subject based on the ideXlab platform.