Language Modeling

The experts below are selected from a list of 135,492 experts worldwide, ranked by the ideXlab platform.

Tomas Mikolov - One of the best experts on this subject based on the ideXlab platform.

  • One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
    arXiv: Computation and Language, 2013
    Co-Authors: Ciprian Chelba, Tomas Mikolov, Thorsten Brants, Mike Schuster, Phillipp Koehn, Tony Robinson
    Abstract:

    We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful for quickly evaluating novel language modeling techniques, and for comparing their contribution when combined with other advanced techniques. We show the performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves a perplexity of 67.6; a combination of techniques leads to a 35% reduction in perplexity, or a 10% reduction in cross-entropy (bits), over that baseline. The benchmark is available as a code.google.com project; besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the baseline n-gram models.
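
    The 35% perplexity reduction and the 10% cross-entropy reduction quoted above are two views of the same gain, since perplexity is two raised to the cross-entropy in bits. A minimal Python sketch of the arithmetic (the 67.6 baseline is taken from the abstract; the combined-model perplexity is derived here from the stated 35% reduction rather than reported directly):

      import math

      baseline_ppl = 67.6                        # unpruned Kneser-Ney 5-gram (from the abstract)
      combined_ppl = baseline_ppl * (1 - 0.35)   # 35% perplexity reduction -> about 43.9

      # Cross-entropy in bits per word is log2 of perplexity.
      h_baseline = math.log2(baseline_ppl)       # about 6.08 bits
      h_combined = math.log2(combined_ppl)       # about 5.46 bits

      relative_bits_reduction = (h_baseline - h_combined) / h_baseline
      print(f"{relative_bits_reduction:.1%}")    # about 10%, matching the abstract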

  • Empirical Evaluation and Combination of Advanced Language Modeling Techniques
    Conference of the International Speech Communication Association, 2011
    Co-Authors: Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Černocký
    Abstract:

    We present results obtained with several advanced language modeling techniques, including a class-based model, a cache model, a maximum entropy model, a structured language model, a random forest language model, and several types of neural network based language models. We show results obtained after combining all these models by linear interpolation. We conclude that for both small and moderately sized tasks, we obtain new state-of-the-art results with a combination of models that is significantly better than the performance of any individual model. The perplexity reductions obtained against a Good-Turing trigram baseline are over 50%, and against a modified Kneser-Ney smoothed 5-gram over 40%. Index Terms: language modeling, neural networks, model combination, speech recognition
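
    The combination method named in the abstract is linear interpolation: every component model assigns a probability to the next word given the history, and the combined probability is a weighted average of those probabilities, with non-negative weights summing to one. A minimal Python sketch, where the model objects and their prob() interface are hypothetical stand-ins rather than anything prescribed by the paper:

      import math

      def interpolated_prob(models, weights, word, history):
          # Linear interpolation: P(word | history) = sum_i w_i * P_i(word | history).
          # The `models` exposing prob(word, history) and the `weights` are hypothetical.
          assert abs(sum(weights) - 1.0) < 1e-9
          return sum(w * m.prob(word, history) for w, m in zip(weights, models))

      def perplexity(models, weights, sentences):
          # Evaluate the interpolated model on held-out sentences (lists of words).
          log_prob, n_words = 0.0, 0
          for sentence in sentences:
              history = []
              for word in sentence:
                  log_prob += math.log2(interpolated_prob(models, weights, word, history))
                  history.append(word)
                  n_words += 1
          return 2.0 ** (-log_prob / n_words)

    The interpolation weights are typically tuned on held-out data, for example with the EM algorithm, which is how such model combinations are usually fit in practice.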

  • RNNLM --- Recurrent Neural Network Language Modeling Toolkit
    Proceedings of ASRU 2011, 2011
    Co-Authors: Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Honza Černocký
    Abstract:

    We present a freely available open-source toolkit for training recurrent neural network based language models. It can easily be used to improve existing speech recognition and machine translation systems. It can also be used as a baseline for future research on advanced language modeling techniques. In the paper, we discuss optimal parameter selection and the different modes of functionality. The toolkit, example scripts and basic setups are freely available at http://rnnlm.sourceforge.net/.
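
    The toolkit trains simple recurrent (Elman-style) language models. Below is a minimal NumPy sketch of one forward step of such a model, not the toolkit's own code; the vocabulary and hidden sizes and the weight initialization are hypothetical.

      import numpy as np

      V, H = 10000, 100                       # hypothetical vocabulary and hidden-layer sizes
      rng = np.random.default_rng(0)
      U = rng.normal(0.0, 0.1, (H, V))        # input word -> hidden weights
      W = rng.normal(0.0, 0.1, (H, H))        # hidden -> hidden (recurrent) weights
      O = rng.normal(0.0, 0.1, (V, H))        # hidden -> output weights

      def step(word_id, h_prev):
          # One time step: consume the current word, return P(next word) and the new state.
          h = 1.0 / (1.0 + np.exp(-(U[:, word_id] + W @ h_prev)))   # sigmoid hidden layer
          logits = O @ h
          logits -= logits.max()                                    # for numerical stability
          probs = np.exp(logits) / np.exp(logits).sum()             # softmax over the vocabulary
          return probs, h

      h = np.zeros(H)
      probs, h = step(42, h)    # distribution over the word following (hypothetical) word id 42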

  • Recurrent Neural Network Based Language Modeling in Meeting Recognition
    Conference of the International Speech Communication Association, 2011
    Co-Authors: Stefan Kombrink, Martin Karafiat, Tomas Mikolov, Lukas Burget
    Abstract:

    We use recurrent neural network (RNN) based language models to improve the BUT English meeting recognizer. On the baseline setup using the original language models, we decrease the word error rate (WER) by more than 1% absolute through n-best list rescoring and language model adaptation. When n-gram language models are trained on the same moderately sized data set as the RNN models, the improvements are higher, yielding a system which performs comparably to the baseline. A noticeable improvement was observed with unsupervised adaptation of the RNN models. Furthermore, we examine the influence of word history on WER and show how to speed up rescoring by caching common prefix strings. Index Terms: automatic speech recognition, language modeling, recurrent neural networks, rescoring, adaptation
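
    A short Python sketch of the rescoring loop described above: each n-best hypothesis gets a new language model score from the RNN LM, combined with its original scores, and RNN states for shared hypothesis prefixes are cached so that common prefixes are evaluated only once. The rnnlm object, its score_word() interface, and the score combination are hypothetical illustrations, not the paper's implementation.

      def rescore_nbest(hypotheses, rnnlm, lm_weight=0.5):
          # `hypotheses` is a list of (words, acoustic_score, ngram_lm_score) tuples with
          # log-domain scores; `rnnlm.score_word(word, state)` is assumed to return
          # (log_prob, new_state). Prefix caching avoids rescoring shared prefixes twice.
          cache = {(): (0.0, rnnlm.initial_state())}   # prefix -> (cumulative log-prob, state)

          def prefix_score(words):
              prefix = tuple(words)
              if prefix not in cache:
                  logp, state = prefix_score(words[:-1])
                  w_logp, state = rnnlm.score_word(words[-1], state)
                  cache[prefix] = (logp + w_logp, state)
              return cache[prefix]

          rescored = []
          for words, acoustic, ngram_lm in hypotheses:
              rnn_lm, _ = prefix_score(list(words))
              # Simple log-domain combination of the two LM scores (a sketch, not the exact scheme).
              lm_score = lm_weight * rnn_lm + (1.0 - lm_weight) * ngram_lm
              rescored.append((acoustic + lm_score, words))
          return [words for _, words in sorted(rescored, key=lambda x: x[0], reverse=True)]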

Lukas Burget - One of the best experts on this subject based on the ideXlab platform.

  • Empirical Evaluation and Combination of Advanced Language Modeling Techniques
    Conference of the International Speech Communication Association, 2011
    Co-Authors: Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Černocký
    Abstract:

    We present results obtained with several advanced language modeling techniques, including a class-based model, a cache model, a maximum entropy model, a structured language model, a random forest language model, and several types of neural network based language models. We show results obtained after combining all these models by linear interpolation. We conclude that for both small and moderately sized tasks, we obtain new state-of-the-art results with a combination of models that is significantly better than the performance of any individual model. The perplexity reductions obtained against a Good-Turing trigram baseline are over 50%, and against a modified Kneser-Ney smoothed 5-gram over 40%. Index Terms: language modeling, neural networks, model combination, speech recognition

  • RNNLM --- Recurrent Neural Network Language Modeling Toolkit
    Proceedings of ASRU 2011, 2011
    Co-Authors: Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Honza Černocký
    Abstract:

    We present a freely available open-source toolkit for training recurrent neural network based language models. It can easily be used to improve existing speech recognition and machine translation systems. It can also be used as a baseline for future research on advanced language modeling techniques. In the paper, we discuss optimal parameter selection and the different modes of functionality. The toolkit, example scripts and basic setups are freely available at http://rnnlm.sourceforge.net/.

  • Recurrent Neural Network Based Language Modeling in Meeting Recognition
    Conference of the International Speech Communication Association, 2011
    Co-Authors: Stefan Kombrink, Martin Karafiat, Tomas Mikolov, Lukas Burget
    Abstract:

    We use recurrent neural network (RNN) based language models to improve the BUT English meeting recognizer. On the baseline setup using the original language models, we decrease the word error rate (WER) by more than 1% absolute through n-best list rescoring and language model adaptation. When n-gram language models are trained on the same moderately sized data set as the RNN models, the improvements are higher, yielding a system which performs comparably to the baseline. A noticeable improvement was observed with unsupervised adaptation of the RNN models. Furthermore, we examine the influence of word history on WER and show how to speed up rescoring by caching common prefix strings. Index Terms: automatic speech recognition, language modeling, recurrent neural networks, rescoring, adaptation

Stefan Kombrink - One of the best experts on this subject based on the ideXlab platform.

  • Empirical Evaluation and Combination of Advanced Language Modeling Techniques
    Conference of the International Speech Communication Association, 2011
    Co-Authors: Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Černocký
    Abstract:

    We present results obtained with several advanced language modeling techniques, including a class-based model, a cache model, a maximum entropy model, a structured language model, a random forest language model, and several types of neural network based language models. We show results obtained after combining all these models by linear interpolation. We conclude that for both small and moderately sized tasks, we obtain new state-of-the-art results with a combination of models that is significantly better than the performance of any individual model. The perplexity reductions obtained against a Good-Turing trigram baseline are over 50%, and against a modified Kneser-Ney smoothed 5-gram over 40%. Index Terms: language modeling, neural networks, model combination, speech recognition

  • RNNLM --- Recurrent Neural Network Language Modeling Toolkit
    Proceedings of ASRU 2011, 2011
    Co-Authors: Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Honza Černocký
    Abstract:

    We present a freely available open-source toolkit for training recurrent neural network based language models. It can easily be used to improve existing speech recognition and machine translation systems. It can also be used as a baseline for future research on advanced language modeling techniques. In the paper, we discuss optimal parameter selection and the different modes of functionality. The toolkit, example scripts and basic setups are freely available at http://rnnlm.sourceforge.net/.

  • Recurrent Neural Network Based Language Modeling in Meeting Recognition
    Conference of the International Speech Communication Association, 2011
    Co-Authors: Stefan Kombrink, Martin Karafiat, Tomas Mikolov, Lukas Burget
    Abstract:

    We use recurrent neural network (RNN) based language models to improve the BUT English meeting recognizer. On the baseline setup using the original language models, we decrease the word error rate (WER) by more than 1% absolute through n-best list rescoring and language model adaptation. When n-gram language models are trained on the same moderately sized data set as the RNN models, the improvements are higher, yielding a system which performs comparably to the baseline. A noticeable improvement was observed with unsupervised adaptation of the RNN models. Furthermore, we examine the influence of word history on WER and show how to speed up rescoring by caching common prefix strings. Index Terms: automatic speech recognition, language modeling, recurrent neural networks, rescoring, adaptation

Hermann Ney - One of the best experts on this subject based on the ideXlab platform.

  • Language Modeling with Deep Transformers
    Conference of the International Speech Communication Association, 2019
    Co-Authors: Kazuki Irie, Ralf Schlüter, Albert Zeyer, Hermann Ney
    Abstract:

    We explore deep autoregressive Transformer models for language modeling in speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well-configured Transformer models outperform our baseline models based on a shallow stack of LSTM recurrent neural network layers. We carry out experiments on the open-source LibriSpeech 960hr task, for both 200K vocabulary word-level and 10K byte-pair encoding subword-level language modeling. We apply our word-level models to conventional hybrid speech recognition by lattice rescoring, and the subword-level models to attention-based encoder-decoder models by shallow fusion. Second, we show that deep Transformer language models do not require positional encoding. Positional encoding is an essential augmentation for the self-attention mechanism, which is invariant to sequence ordering. However, in the autoregressive setup, as is the case for language modeling, the amount of information increases along the position dimension, which is a positional signal in its own right. The analysis of attention weights shows that deep autoregressive self-attention models can automatically make use of such positional information. We find that removing the positional encoding even slightly improves the performance of these models.
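
    The second finding, that autoregressive (causally masked) self-attention works without explicit positional encoding, can be illustrated with a minimal single-head attention sketch in NumPy. The shapes and random weights are hypothetical, not the paper's configuration; the point is only that the inputs carry no position information, while the causal mask still makes the visible context grow with position.

      import numpy as np

      def causal_self_attention(X, Wq, Wk, Wv):
          # X: (T, d) token embeddings with no positional encoding added.
          # The causal mask lets position t attend only to positions <= t, so the amount
          # of visible context itself grows along the sequence and acts as a positional signal.
          T, d = X.shape
          Q, K, V = X @ Wq, X @ Wk, X @ Wv
          scores = Q @ K.T / np.sqrt(d)
          scores[np.triu_indices(T, k=1)] = -np.inf        # mask out future positions
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
          return weights @ V

      rng = np.random.default_rng(0)
      T, d = 8, 16                                         # hypothetical sequence length and width
      X = rng.normal(size=(T, d))                          # token embeddings only, no positions
      Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
      out = causal_self_attention(X, Wq, Wk, Wv)           # shape (8, 16)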

  • LSTM Neural Networks for Language Modeling
    Conference of the International Speech Communication Association, 2012
    Co-Authors: Martin Sundermeyer, Ralf Schlüter, Hermann Ney
    Abstract:

    Neural networks have become increasingly popular for the task of language modeling. Whereas feed-forward networks only exploit a fixed context length to predict the next word of a sequence, conceptually, standard recurrent neural networks can take into account all of the predecessor words. On the other hand, it is well known that recurrent networks are difficult to train and are therefore unlikely to show the full potential of recurrent models. These problems are addressed by the Long Short-Term Memory (LSTM) neural network architecture. In this work, we analyze this type of network on an English and a large French language modeling task. Experiments show improvements of about 8% relative in perplexity over standard recurrent neural network LMs. In addition, we gain considerable improvements in WER on top of a state-of-the-art speech recognition system.
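
    A compact PyTorch sketch of the model class analyzed in the paper: previous words are embedded, passed through stacked LSTM layers, and a softmax layer predicts the next word. The layer sizes and the toy evaluation below are hypothetical, not the paper's configuration.

      import torch
      import torch.nn as nn

      class LSTMLanguageModel(nn.Module):
          # Unlike a feed-forward LM with a fixed context window, the LSTM state can, in
          # principle, carry information about all predecessor words.
          def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, num_layers=2):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, emb_dim)
              self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
              self.out = nn.Linear(hidden_dim, vocab_size)

          def forward(self, tokens, state=None):
              # tokens: (batch, time) word ids; returns next-word logits at every position.
              hidden, state = self.lstm(self.embed(tokens), state)
              return self.out(hidden), state

      model = LSTMLanguageModel(vocab_size=10000)
      tokens = torch.randint(0, 10000, (4, 20))            # hypothetical batch of word ids
      logits, _ = model(tokens)                            # (4, 20, 10000)
      loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 10000),
                                         tokens[:, 1:].reshape(-1))
      print(torch.exp(loss))                               # perplexity of the untrained model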

  • Improved Backing-Off for M-gram Language Modeling
    International Conference on Acoustics Speech and Signal Processing, 1995
    Co-Authors: Reinhard Kneser, Hermann Ney
    Abstract:

    In stochastic language modeling, backing-off is a widely used method to cope with the sparse data problem. In the case of unseen events, this method backs off to a less specific distribution. In this paper we propose to use distributions which are especially optimized for the task of backing-off. Two different theoretical derivations lead to distributions which are quite different from the probability distributions usually used for backing-off. Experiments show an improvement of about 10% in terms of perplexity and 5% in terms of word error rate.
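
    The proposal is that the distribution used for backing-off should be estimated specifically for that role rather than taken from raw lower-order counts. The Python sketch below uses the continuation-count unigram distribution commonly associated with this line of work, inside an interpolated bigram model with absolute discounting; it is a simplified illustration, not the exact formulation or experimental setup of the paper.

      from collections import Counter, defaultdict

      def kneser_ney_bigram(corpus, discount=0.75):
          # corpus: a list of words. Returns prob(word, context) for an interpolated
          # bigram model whose backoff unigram uses continuation counts (in how many
          # distinct contexts a word was seen), not raw word frequencies.
          bigrams = Counter(zip(corpus, corpus[1:]))
          context_counts = Counter(corpus[:-1])
          followers = defaultdict(set)      # context -> distinct following words
          contexts_of = defaultdict(set)    # word -> distinct preceding contexts
          for (u, w) in bigrams:
              followers[u].add(w)
              contexts_of[w].add(u)
          total_bigram_types = len(bigrams)

          def p_continuation(w):
              return len(contexts_of[w]) / total_bigram_types

          def prob(word, context):
              c_uw, c_u = bigrams[(context, word)], context_counts[context]
              if c_u == 0:
                  return p_continuation(word)                    # unseen context: pure backoff
              backoff_weight = discount * len(followers[context]) / c_u
              return max(c_uw - discount, 0.0) / c_u + backoff_weight * p_continuation(word)

          return prob

      prob = kneser_ney_bigram("the cat sat on the mat".split())
      print(prob("cat", "the"))    # discounted bigram estimate plus weighted continuation backoff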

Jan Honza Černocký - One of the best experts on this subject based on the ideXlab platform.

  • RNNLM --- Recurrent Neural Network Language Modeling Toolkit
    Proceedings of ASRU 2011, 2011
    Co-Authors: Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Honza Černocký
    Abstract:

    We present a freely available open-source toolkit for training recurrent neural network based language models. It can easily be used to improve existing speech recognition and machine translation systems. It can also be used as a baseline for future research on advanced language modeling techniques. In the paper, we discuss optimal parameter selection and the different modes of functionality. The toolkit, example scripts and basic setups are freely available at http://rnnlm.sourceforge.net/.