Phonemes

The experts below are selected from a list of 57,726 experts worldwide, ranked by the ideXlab platform.

Kandarpa Kumar Sarma - One of the best experts on this subject based on the ideXlab platform.

  • An ANN-based approach to recognize initial phonemes of spoken words of Assamese language
    Applied Soft Computing, 2013
    Co-Authors: Mousmita Sarma, Kandarpa Kumar Sarma
    Abstract:

    The initial phoneme of a word plays a central role in spoken word recognition models, where it activates candidate words that begin with that phoneme; classifying an initial phoneme into its phonetic group is therefore a critical task. This paper describes an artificial neural network (ANN) based approach to recognizing the initial consonant phonemes of Assamese words. A self-organizing map (SOM) based algorithm is developed to segment the initial phoneme from its word counterpart. Using a combination of three ANN structures, namely a recurrent neural network (RNN), a SOM, and a probabilistic neural network (PNN), the proposed algorithm proves superior to conventional discrete wavelet transform (DWT) based phoneme segmentation. The algorithm is designed specifically for the phonemical structure of Assamese, whose consonant phonemes have certain unique features and are grouped into six distinct phoneme families. Before the SOM-based segmentation is applied, an RNN makes a localized decision to classify each word into one of the six phoneme families. The SOM-segmented phonemes are then classified into individual phonemes: a two-class PNN, trained with clean Assamese phonemes, recognizes each segmented phoneme. Each recognized phoneme is validated by matching its first formant frequency. The formant frequencies of Assamese phonemes, estimated from the pole (formant) locations of a linear prediction model of the vocal tract, are used effectively as a priori knowledge in the proposed algorithm.
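
The formant-validation step in this abstract rests on a standard technique: estimating formant frequencies from the pole locations of a linear prediction (LPC) model of the vocal tract. Below is a minimal sketch of that general technique, not the authors' implementation; the model order, frame length, and sample rate are illustrative assumptions.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= (1.0 - k * k)                  # prediction error shrinks each order
    return a

def formant_frequencies(frame, sample_rate, order=10):
    """Formant candidates from the angles of the LPC poles."""
    a = lpc_coefficients(frame, order)
    poles = np.roots(a)                       # roots of z**order * A(z)
    poles = poles[np.imag(poles) > 1e-2]      # keep one of each conjugate pair
    freqs = np.angle(poles) * sample_rate / (2.0 * np.pi)
    return np.sort(freqs)
```

The angles of the complex poles of the prediction polynomial, scaled by the sample rate, give candidate formant frequencies; the lowest surviving candidate plays the role of the first formant used for validation above.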

  • Segmentation of Assamese phonemes using SOM
    2012 3rd National Conference on Emerging Trends and Applications in Computer Science, 2012
    Co-Authors: Mousmita Sarma, Kandarpa Kumar Sarma
    Abstract:

    Phonemes are the smallest distinguishable units of a speech signal. Segmenting a phoneme from its word counterpart is a fundamental and crucial step in speech processing, since the initial phoneme is used to activate words that start with it. This work describes an artificial neural network (ANN) based algorithm for the segmentation and classification of consonant phonemes of the Assamese language. The algorithm uses weight vectors obtained by training a self-organizing map (SOM) for different numbers of iterations: segments of the different phonemes constituting a word, whose LPC samples are used for training, are obtained from the SOM weights. A two-class probabilistic neural network (PNN), trained with clean Assamese phonemes, identifies each phoneme segment. Classification of the phoneme segments follows the consonant phoneme structure of Assamese, which consists of six phoneme families. Experimental results establish the superiority of the SOM-based segmentation over speaker-independent phoneme segmentation methods reported so far, including those based on the discrete wavelet transform (DWT).
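
As a rough illustration of the SOM machinery this abstract relies on, here is a minimal 1-D self-organizing map in Python. The map size, learning-rate schedule, and feature dimensionality are invented for the example and are not the authors' configuration.

```python
import numpy as np

def train_som(data, n_units=6, n_iter=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D self-organizing map; returns the unit weight vectors."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(n_units, data.shape[1]))
    for it in range(n_iter):
        lr = lr0 * (1.0 - it / n_iter)        # decaying learning rate
        sigma = sigma0 * (1.0 - it / n_iter) + 1e-3
        x = data[rng.integers(len(data))]     # random training vector
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        dist = np.abs(np.arange(n_units) - bmu)
        h = np.exp(-dist ** 2 / (2.0 * sigma ** 2))   # neighborhood function
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign(data, weights):
    """Label each feature frame with its best-matching SOM unit."""
    return np.argmin(np.linalg.norm(data[:, None] - weights[None], axis=2), axis=1)
```

After training on feature frames of a word (LPC samples in the paper), consecutive frames can be labeled with their best-matching unit; runs of the same label indicate candidate phoneme segments.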

Richard P. Harvey - One of the best experts on this subject based on the ideXlab platform.

  • Phoneme-to-viseme mappings
    Speech Communication, 2017
    Co-Authors: Helen L. Bear, Richard P. Harvey
    Abstract:

    Visemes are the visual equivalent of phonemes. Although not precisely defined, a common working definition of a viseme is a set of phonemes that have an identical appearance on the lips. A phoneme therefore falls into exactly one viseme class, but a viseme may represent many phonemes: a one-to-many mapping. This mapping introduces ambiguity between phonemes when viseme classifiers are used. Not only is this ambiguity damaging to the performance of audio-visual classifiers operating on real expressive speech; there is also considerable choice between possible mappings. In this paper we explore this choice of phoneme-to-viseme map. We show that there is a definite difference in performance between mappings and explore why some maps appear to work better than others. We also devise a new algorithm for constructing phoneme-to-viseme mappings from labeled speech data. These new visemes, Bear visemes, are shown to perform better than previously known units.
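
The one-to-many character of the mapping is easy to see in code. The fragment below is a toy map with illustrative groupings only (bilabials sharing one viseme, and so on); it is not the Bear visemes the paper constructs.

```python
# Toy phoneme-to-viseme map: each phoneme maps to exactly one viseme
# class, but a viseme class covers many phonemes.
PHONEME_TO_VISEME = {
    "p": "V1", "b": "V1", "m": "V1",   # bilabials look identical on the lips
    "f": "V2", "v": "V2",              # labiodentals
    "t": "V3", "d": "V3", "s": "V3", "z": "V3",
}

def viseme_of(phoneme):
    """The many-to-one direction: unambiguous."""
    return PHONEME_TO_VISEME[phoneme]

def phonemes_of(viseme):
    """The inverse, one-to-many direction: the source of ambiguity."""
    return sorted(p for p, v in PHONEME_TO_VISEME.items() if v == viseme)
```

A viseme classifier can at best recover the class `V1`; which of /p/, /b/, /m/ was actually spoken must be resolved by other means, which is exactly the ambiguity the abstract describes.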

  • Finding phonemes: improving machine lip-reading
    AVSP, 2015
    Co-Authors: Helen L. Bear, Richard P. Harvey, Yuxuan Lan
    Abstract:

    In machine lip-reading there is continued debate and research around the correct classes to use for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps whose viseme counts range from two to 45. Viseme classes are based upon mapping articulated phonemes that have been confused during phoneme recognition into viseme groups. Using these maps with the LiLIR dataset, we show the effect of changing the viseme map size in speaker-dependent machine lip-reading, measured by word recognition correctness, and so demonstrate that word recognition with phoneme classifiers is not just possible but often better than word recognition with viseme classifiers. Furthermore, there are intermediate units between visemes and phonemes which are better still.
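
The grouping step described here, merging phonemes that are confused during phoneme recognition, can be sketched as greedy agglomerative clustering over a confusion matrix. This is a simplified stand-in for the paper's procedure; the function name, the greedy merge strategy, and the matrix in the test are invented for illustration.

```python
import numpy as np

def confusion_clusters(conf, labels, n_classes):
    """Greedily merge the most-confused pair of clusters until
    n_classes remain. conf[i, j] counts how often label i was
    recognized as label j during phoneme recognition."""
    sym = conf + conf.T                       # symmetrize confusion counts
    idx = {l: i for i, l in enumerate(labels)}
    clusters = [{l} for l in labels]

    def score(a, b):
        # Total confusion mass between two clusters of labels.
        return sum(sym[idx[p], idx[q]] for p in a for q in b)

    while len(clusters) > n_classes:
        i, j = max(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: score(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters.pop(j)        # merge the most-confused pair
    return [sorted(c) for c in clusters]
```

Varying `n_classes` from two up to the number of phonemes reproduces, in miniature, the sweep over viseme map sizes that the paper evaluates.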

Tarek Sherif - One of the best experts on this subject based on the ideXlab platform.

  • Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion
    North American Chapter of the Association for Computational Linguistics, 2007
    Co-Authors: Sittichai Jiampojamarn, Grzegorz Kondrak, Tarek Sherif
    Abstract:

    Letter-to-phoneme conversion generally requires training data of aligned letters and phonemes. Typically the alignments are limited to one-to-one. We present a novel technique for training with many-to-many alignments. A letter-chunking bigram predictor handles double letters and double phonemes automatically, as opposed to preprocessing with fixed lists. We also apply an HMM method, in conjunction with a local classification model, to predict a global phoneme sequence for a given word. The many-to-many alignments yield significant improvements over the traditional one-to-one approach. Our system achieves state-of-the-art performance on several languages and data sets.
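
The advantage of many-to-many alignments can be illustrated with a toy rule table: a letter chunk such as "ph" yields one phoneme, and a single letter such as "x" yields two, neither of which a one-to-one alignment can express. The rules below are hand-written for illustration; the paper learns such correspondences from aligned data and decodes with an HMM rather than greedy matching.

```python
# Toy many-to-many letter-to-phoneme rules (hand-written, not learned).
# Phoneme symbols follow ARPAbet-style names.
RULES = {
    "ph": ["F"], "sh": ["SH"],         # two letters -> one phoneme
    "x": ["K", "S"],                   # one letter  -> two phonemes
    "a": ["AE"], "e": ["EH"], "i": ["IH"], "o": ["OW"],
    "n": ["N"], "t": ["T"], "f": ["F"], "s": ["S"],
}

def to_phonemes(word):
    """Greedy longest-match-first decoding with the chunk rules."""
    out, i = [], 0
    while i < len(word):
        for size in (2, 1):            # try a two-letter chunk first
            chunk = word[i:i + size]
            if chunk in RULES:
                out.extend(RULES[chunk])
                i += size
                break
        else:
            i += 1                     # no rule for this letter: skip it
    return out
```

With one-to-one alignment, "ph" would have to align its two letters to the single phoneme F via an artificial null symbol; the many-to-many formulation handles the chunk directly.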

Jessica Crossland - One of the best experts on this subject based on the ideXlab platform.

  • Rhyme and alliteration, phoneme detection, and learning to read
    Developmental Psychology, 1990
    Co-Authors: Morag MacLean, Lynette Bradley, Jessica Crossland
    Abstract:

    In this article, three views of the relation between various forms of phonological awareness (detection of rhyme and alliteration, and detection of phonemes) and children's reading were tested. These are (a) that the experience of learning to read leads to phoneme awareness and that neither of these is connected to awareness of rhyme; (b) that sensitivity to rhyme leads to awareness of phonemes, which in turn affects reading; and (c) that rhyme makes a direct contribution to reading that is independent of the connection between reading and phoneme awareness. The results of a longitudinal study, which monitored the phonological awareness and progress in reading and spelling of 65 children from the ages of 4 years 7 months to 6 years 7 months, produced strong support for a combination of the 2nd and 3rd models and none at all for the 1st model.