Speech Synthesis

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies


The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Junichi Yamagishi - One of the best experts on this subject based on the ideXlab platform.

  • A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric Speech Synthesis
    2016 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2016
    Co-Authors: Shinji Takaki, Junichi Yamagishi
    Abstract:

    In the state-of-the-art statistical parametric Speech Synthesis system, a Speech analysis module, e.g. STRAIGHT spectral analysis, is generally used to obtain accurate and stable spectral envelopes, and low-dimensional acoustic features extracted from the obtained spectral envelopes are then used for training acoustic models. However, the spectral envelope estimation algorithm used in such a Speech analysis module involves various processing steps derived from human knowledge. In this paper, we present our investigation of deep auto-encoder based, non-linear, data-driven and unsupervised low-dimensional feature extraction using FFT spectral envelopes for statistical parametric Speech Synthesis. Experimental results showed that a text-to-Speech Synthesis system using deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes is indeed a promising approach.
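The bottleneck idea described above can be sketched with a toy auto-encoder. This is only an illustration under stated assumptions: random data stands in for log FFT spectral envelopes, and a single hidden layer replaces the paper's deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for (mean-normalised) log FFT spectral envelopes:
# 200 frames x 64 frequency bins. Real envelopes would come from |FFT|.
X = rng.standard_normal((200, 64))

# Minimal one-hidden-layer auto-encoder, 64 -> 8 -> 64, trained by plain
# gradient descent on the squared reconstruction error.
W_enc = 0.1 * rng.standard_normal((64, 8))
W_dec = 0.1 * rng.standard_normal((8, 64))

def reconstruct(frames):
    code = np.tanh(frames @ W_enc)   # 8-dim bottleneck feature per frame
    return code, code @ W_dec        # low-dim code and envelope reconstruction

_, X_hat0 = reconstruct(X)
initial_mse = np.mean((X_hat0 - X) ** 2)

lr = 1e-3
for _ in range(500):
    H = np.tanh(X @ W_enc)
    err = H @ W_dec - X
    g_dec = H.T @ err                                 # gradient w.r.t. decoder
    g_enc = X.T @ ((err @ W_dec.T) * (1 - H ** 2))    # backprop through tanh
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

codes, X_hat = reconstruct(X)
final_mse = np.mean((X_hat - X) ** 2)
```

The `codes` array plays the role of the low-dimensional acoustic features that would be fed to the acoustic model.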

  • Speech Synthesis based on hidden Markov models
    Proceedings of the IEEE, 2013
    Co-Authors: Keiichi Tokuda, Tomoki Toda, Junichi Yamagishi, Yoshihiko Nankaku, Keiichiro Oura
    Abstract:

    This paper gives a general overview of hidden Markov model (HMM)-based Speech Synthesis, which has recently been demonstrated to be very effective in synthesizing Speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. This paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated over the last decades. Finally, advanced techniques for future developments are described.

  • The Romanian Speech Synthesis (RSS) corpus: building a high-quality HMM-based Speech Synthesis system using a high sampling rate
    Speech Communication, 2011
    Co-Authors: Adriana Stan, Simon King, Junichi Yamagishi, Matthew P Aylett
    Abstract:

    This paper first introduces a newly-recorded high quality Romanian Speech corpus designed for Speech Synthesis, called “RSS”, along with Romanian front-end text processing modules and HMM-based synthetic voices built from the corpus. All of these are now freely available for academic use in order to promote Romanian Speech technology research. The RSS corpus comprises 3500 training sentences and 500 test sentences uttered by a female speaker and was recorded using multiple microphones at a 96 kHz sampling frequency in a hemianechoic chamber. The details of the new Romanian text processor we have developed are also given. Using the database, we then revisit some basic configuration choices of Speech Synthesis, such as waveform sampling frequency and auditory frequency warping scale, with the aim of improving speaker similarity, which is an acknowledged weakness of current HMM-based Speech Synthesisers. As we demonstrate using perceptual tests, these configuration choices can make substantial differences to the quality of the synthetic Speech. Contrary to common practice in automatic Speech recognition, higher waveform sampling frequencies can offer enhanced feature extraction and improved speaker similarity for HMM-based Speech Synthesis.
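The auditory frequency warping mentioned above can be illustrated with the common mel mapping. The O'Shaughnessy formula below is an assumption for illustration; the paper itself compares several warping scales.

```python
import numpy as np

def hz_to_mel(f_hz):
    """O'Shaughnessy mel scale: mel = 1127 * ln(1 + f / 700)."""
    return 1127.0 * np.log1p(np.asarray(f_hz, dtype=float) / 700.0)

# Auditory warping compresses high frequencies: equal mel steps cover
# ever-wider Hz ranges, so raising the sampling rate (and thus Nyquist)
# mostly adds spectral detail where the ear is least sensitive.
for fs in (16000, 48000, 96000):
    nyquist = fs / 2
    print(fs, round(float(hz_to_mel(nyquist)), 1))
```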

  • Roles of the average voice in speaker-adaptive HMM-based Speech Synthesis
    Conference of the International Speech Communication Association, 2010
    Co-Authors: Junichi Yamagishi, Simon King, Oliver Watts, Bela Usabaev
    Abstract:

    In speaker-adaptive HMM-based Speech Synthesis, there are typically a few speakers for which the output synthetic Speech sounds worse than that of other speakers, despite having the same amount of adaptation data from within the same corpus. This paper investigates these fluctuations in quality and concludes that as mel-cepstral distance from the average voice becomes larger, the MOS naturalness scores generally become worse. Although this negative correlation is not that strong, it suggests a way to improve the training and adaptation strategies. We also draw comparisons between our findings and the work of other researchers regarding “vocal attractiveness.” Index Terms: Speech Synthesis, HMM, average voice, speaker adaptation
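The mel-cepstral distance used above to measure distance from the average voice can be sketched with its standard dB formulation. The coefficient range and the assumption of pre-aligned frames are simplifications for illustration.

```python
import numpy as np

def mel_cepstral_distance_db(c_ref, c_test):
    """Frame-wise mel-cepstral distance in dB, averaged over frames:
    MCD = (10 / ln 10) * sqrt(2 * sum_d (c_ref_d - c_test_d)^2),
    with the 0th (energy) coefficient excluded, frames assumed aligned."""
    diff = np.asarray(c_ref)[:, 1:] - np.asarray(c_test)[:, 1:]
    return float(np.mean(10.0 / np.log(10.0)
                         * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))))
```

A larger average distance from the average-voice cepstra would, per the finding above, predict a lower MOS naturalness score.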

  • Recent development of the HMM-based Speech Synthesis system HTS
    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2009
    Co-Authors: Keiichiro Oura, Shinji Sako, Takashi Nose, Alan W Black, Tomoki Toda, Junichi Yamagishi, Takashi Masuko, Keiichi Tokuda
    Abstract:

    A statistical parametric approach to Speech Synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of Speech are simultaneously modeled by context-dependent HMMs, and Speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named “HMM-based Speech Synthesis system (HTS)” to provide a research and development toolkit for statistical parametric Speech Synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

Keiichi Tokuda - One of the best experts on this subject based on the ideXlab platform.

  • Speech Synthesis based on hidden Markov models
    Proceedings of the IEEE, 2013
    Co-Authors: Keiichi Tokuda, Tomoki Toda, Junichi Yamagishi, Yoshihiko Nankaku, Keiichiro Oura
    Abstract:

    This paper gives a general overview of hidden Markov model (HMM)-based Speech Synthesis, which has recently been demonstrated to be very effective in synthesizing Speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. This paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated over the last decades. Finally, advanced techniques for future developments are described.

  • Overview of NIT HMM-based Speech Synthesis system for Blizzard Challenge 2012
    2012
    Co-Authors: Shinji Takaki, Keiichiro Oura, Kei Hashimoto, Kei Sawada, Keiichi Tokuda
    Abstract:

    This paper describes a hidden Markov model (HMM) based Speech Synthesis system developed for the Blizzard Challenge 2012. In the Blizzard Challenge 2012, we focused on the design of contexts for using audio books as training data and on duration modeling of silence between sentences for synthesizing paragraphs. It is well known that contextual factors affect Speech. We use extended contexts for audio books to construct appropriate model parameter tying structures. In addition, duration models of silence between sentences are created to synthesize more natural Speech, because connections between sentences are important when synthesizing paragraphs. Subjective evaluation results show that the system synthesized highly intelligible Speech. Index Terms: Speech Synthesis, hidden Markov model, context clustering

  • Overview of NIT HMM-based Speech Synthesis system for Blizzard Challenge 2011
    2011
    Co-Authors: Kei Hashimoto, Keiichiro Oura, Shinji Takaki, Keiichi Tokuda
    Abstract:

    This paper describes a hidden Markov model (HMM) based Speech Synthesis system developed for the Blizzard Challenge 2011. In the Blizzard Challenge 2011, we focused on the training algorithm for HMM-based Speech Synthesis systems. To alleviate the local maxima problem in maximum likelihood estimation, we apply the deterministic annealing expectation maximization (DAEM) algorithm for training HMMs. By using the DAEM algorithm, reliable acoustic model parameters can be estimated. In addition, we apply stepwise model selection to the model training. Decision-tree-based context clustering is used for model selection in HMM-based Speech Synthesis. With the stepwise model selection method, decision trees are gradually grown from small trees into large trees to estimate reliable acoustic models. Subjective evaluation results show that the system synthesized highly intelligible Speech. Index Terms: Speech Synthesis, hidden Markov model, deterministic annealing, model structure
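The DAEM idea above can be sketched on a toy problem. This is a minimal illustration, not the paper's HMM training: a two-component 1-D Gaussian mixture with unit variances, where the E-step posteriors are flattened by a temperature parameter beta that is raised toward 1.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy bimodal data; real HMM training has far harder local maxima,
# but the annealing schedule is the same in spirit.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])

mu = np.array([-0.5, 0.5])   # component means (unit variances, equal weights)

# DAEM E-step: posteriors are raised to the power beta (flattened when
# beta < 1), and beta is increased gradually toward 1 (standard EM).
for beta in (0.5, 0.75, 1.0):
    for _ in range(30):
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
        r = np.exp(beta * (logp - logp.max(axis=1, keepdims=True)))
        r /= r.sum(axis=1, keepdims=True)                  # annealed posteriors
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)  # usual M-step
```

By the final stage (beta = 1) the updates are exactly standard EM, so the schedule only changes which basin of attraction is reached.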

  • Recent development of the HMM-based Speech Synthesis system HTS
    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2009
    Co-Authors: Keiichiro Oura, Shinji Sako, Takashi Nose, Alan W Black, Tomoki Toda, Junichi Yamagishi, Takashi Masuko, Keiichi Tokuda
    Abstract:

    A statistical parametric approach to Speech Synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of Speech are simultaneously modeled by context-dependent HMMs, and Speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named “HMM-based Speech Synthesis system (HTS)” to provide a research and development toolkit for statistical parametric Speech Synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

  • Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis
    IEEE Transactions on Audio Speech and Language Processing, 2009
    Co-Authors: Junichi Yamagishi, Steve Renals, Takashi Nose, Zhen-hua Ling, Keiichi Tokuda, Simon King, Heiga Zen, Tomoki Toda
    Abstract:

    This paper describes a speaker-adaptive HMM-based Speech Synthesis system. The new system, called “HTS-2007,” employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better quality synthetic Speech than speaker-dependent approaches with realistic amounts of Speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of Speech data are available. In addition, a comparison study with several Speech Synthesis techniques shows the new system is very robust: It is able to build voices from less-than-ideal Speech data and synthesize good-quality Speech even for out-of-domain sentences.

Alan W Black - One of the best experts on this subject based on the ideXlab platform.

  • Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis
    IEEE ACM Transactions on Audio Speech and Language Processing, 2016
    Co-Authors: Shinnosuke Takamichi, Alan W Black, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
    Abstract:

    This paper presents novel approaches based on modulation spectrum (MS) for high-quality statistical parametric Speech Synthesis, including text-to-Speech (TTS) and voice conversion (VC). Although statistical parametric Speech Synthesis offers various advantages over concatenative Speech Synthesis, the synthetic Speech quality is still not as good as that of concatenative Speech Synthesis or the quality of natural Speech. One of the biggest issues causing the quality degradation is the over-smoothing effect often observed in the generated Speech parameter trajectories. Global variance (GV) is known as a feature well correlated with the over-smoothing effect, and the effectiveness of keeping the GV of the generated Speech parameter trajectories similar to that of natural Speech has been confirmed. However, the quality gap between natural Speech and synthetic Speech is still large. In this paper, we propose using the MS of the generated Speech parameter trajectories as a new feature to effectively quantify the over-smoothing effect. Moreover, we propose postfilters to modify the MS utterance by utterance or segment by segment to make the MS of synthetic Speech close to that of natural Speech. The proposed postfilters are applicable to various synthesizers based on statistical parametric Speech Synthesis. We first evaluate the proposed method in the framework of hidden Markov model (HMM)-based TTS, examining its properties from different perspectives. Furthermore, the effectiveness of the proposed postfilters is also evaluated in Gaussian mixture model (GMM)-based VC and classification and regression trees (CART)-based TTS (a.k.a. CLUSTERGEN). The experimental results demonstrate that 1) the proposed utterance-level postfilter achieves quality comparable to the conventional generation algorithm considering the GV, and yields significant improvements when applied to the GV-based generation algorithm in HMM-based TTS, 2) the proposed segment-level postfilter, capable of achieving low-delay Synthesis, also yields significant improvements in synthetic Speech quality, and 3) the proposed postfilters are effective not only in HMM-based TTS but also in GMM-based VC and CLUSTERGEN.
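The two over-smoothing measures named above, GV and MS, can be sketched directly from their definitions. This is a minimal numpy illustration; a sine trajectory and a crude scaling stand in for natural and over-smoothed parameter trajectories.

```python
import numpy as np

def global_variance(traj):
    """GV: per-dimension variance of a parameter trajectory (frames x dims);
    over-smoothed trajectories show reduced GV."""
    return np.var(traj, axis=0)

def modulation_spectrum(traj, n_fft=64):
    """MS: log power spectrum of each mean-removed trajectory dimension,
    i.e. the trajectory's energy at each modulation frequency."""
    centred = traj - traj.mean(axis=0, keepdims=True)
    return np.log(np.abs(np.fft.rfft(centred, n=n_fft, axis=0)) ** 2 + 1e-10)

t = np.linspace(0.0, 2.0 * np.pi, 50)
natural = np.sin(3.0 * t)[:, None]
smoothed = 0.5 * natural   # crude stand-in for an over-smoothed trajectory
```

An MS-based postfilter in this spirit would scale the synthetic trajectory's modulation components so its MS matches statistics collected from natural Speech.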

  • Articulatory features for expressive Speech Synthesis
    2012 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2012
    Co-Authors: Alan W Black, Prasanna Kumar Muthukumar, Tim Polzehl, Timothy H. Bunnell, Daniel Perry, Stefan Steidl, Florian Metze, Kishore Prahallad, Callie Vaughn
    Abstract:

    This paper describes some of the results from the project entitled “New Parameterization for Emotional Speech Synthesis” held at the Summer 2011 JHU CLSP workshop. We describe experiments on how to use articulatory features as a meaningful intermediate representation for Speech Synthesis. This parameterization not only allows us to reproduce natural sounding Speech but also allows us to generate stylistically varying Speech.

  • Recent development of the HMM-based Speech Synthesis system HTS
    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2009
    Co-Authors: Keiichiro Oura, Shinji Sako, Takashi Nose, Alan W Black, Tomoki Toda, Junichi Yamagishi, Takashi Masuko, Keiichi Tokuda
    Abstract:

    A statistical parametric approach to Speech Synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of Speech are simultaneously modeled by context-dependent HMMs, and Speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named “HMM-based Speech Synthesis system (HTS)” to provide a research and development toolkit for statistical parametric Speech Synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

  • Statistical Parametric Speech Synthesis
    2007 IEEE International Conference on Acoustics Speech and Signal Processing - ICASSP '07, 2007
    Co-Authors: Alan W Black, Heiga Zen, Kazuhiro Tokuda
    Abstract:

    This paper gives a general overview of techniques in statistical parametric Speech Synthesis. One of these techniques, called HMM-based generation Synthesis (or simply HMM-based Synthesis), has recently been shown to be very effective in generating acceptable synthetic Speech. This paper also contrasts these techniques with the more conventional unit-selection technology that has dominated Speech Synthesis over the last ten years. Advantages and disadvantages of statistical parametric Synthesis are highlighted, as well as where we expect the key developments to appear in the immediate future.

  • The HMM-based Speech Synthesis system (HTS) version 2.0
    SSW, 2007
    Co-Authors: Takashi Nose, Shinji Sako, Alan W Black, Junichi Yamagishi, Takashi Masuko, Keiichi Tokuda
    Abstract:

    A statistical parametric Speech Synthesis system based on hidden Markov models (HMMs) has grown in popularity over the last few years. This system simultaneously models spectrum, excitation, and duration of Speech using context-dependent HMMs and generates Speech waveforms from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named HMM-based Speech Synthesis system (HTS) to provide a research and development platform for the Speech Synthesis community. In December 2006, HTS version 2.0 was released. This version includes a number of new features which are useful for both Speech Synthesis researchers and developers. This paper describes HTS version 2.0 in detail, as well as future release plans.

Tomoki Toda - One of the best experts on this subject based on the ideXlab platform.

  • Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis
    IEEE ACM Transactions on Audio Speech and Language Processing, 2016
    Co-Authors: Shinnosuke Takamichi, Alan W Black, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
    Abstract:

    This paper presents novel approaches based on modulation spectrum (MS) for high-quality statistical parametric Speech Synthesis, including text-to-Speech (TTS) and voice conversion (VC). Although statistical parametric Speech Synthesis offers various advantages over concatenative Speech Synthesis, the synthetic Speech quality is still not as good as that of concatenative Speech Synthesis or the quality of natural Speech. One of the biggest issues causing the quality degradation is the over-smoothing effect often observed in the generated Speech parameter trajectories. Global variance (GV) is known as a feature well correlated with the over-smoothing effect, and the effectiveness of keeping the GV of the generated Speech parameter trajectories similar to that of natural Speech has been confirmed. However, the quality gap between natural Speech and synthetic Speech is still large. In this paper, we propose using the MS of the generated Speech parameter trajectories as a new feature to effectively quantify the over-smoothing effect. Moreover, we propose postfilters to modify the MS utterance by utterance or segment by segment to make the MS of synthetic Speech close to that of natural Speech. The proposed postfilters are applicable to various synthesizers based on statistical parametric Speech Synthesis. We first evaluate the proposed method in the framework of hidden Markov model (HMM)-based TTS, examining its properties from different perspectives. Furthermore, the effectiveness of the proposed postfilters is also evaluated in Gaussian mixture model (GMM)-based VC and classification and regression trees (CART)-based TTS (a.k.a. CLUSTERGEN). The experimental results demonstrate that 1) the proposed utterance-level postfilter achieves quality comparable to the conventional generation algorithm considering the GV, and yields significant improvements when applied to the GV-based generation algorithm in HMM-based TTS, 2) the proposed segment-level postfilter, capable of achieving low-delay Synthesis, also yields significant improvements in synthetic Speech quality, and 3) the proposed postfilters are effective not only in HMM-based TTS but also in GMM-based VC and CLUSTERGEN.

  • Speech Synthesis based on hidden Markov models
    Proceedings of the IEEE, 2013
    Co-Authors: Keiichi Tokuda, Tomoki Toda, Junichi Yamagishi, Yoshihiko Nankaku, Keiichiro Oura
    Abstract:

    This paper gives a general overview of hidden Markov model (HMM)-based Speech Synthesis, which has recently been demonstrated to be very effective in synthesizing Speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. This paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated over the last decades. Finally, advanced techniques for future developments are described.

  • Recent development of the HMM-based Speech Synthesis system HTS
    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2009
    Co-Authors: Keiichiro Oura, Shinji Sako, Takashi Nose, Alan W Black, Tomoki Toda, Junichi Yamagishi, Takashi Masuko, Keiichi Tokuda
    Abstract:

    A statistical parametric approach to Speech Synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of Speech are simultaneously modeled by context-dependent HMMs, and Speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named “HMM-based Speech Synthesis system (HTS)” to provide a research and development toolkit for statistical parametric Speech Synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

  • Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis
    IEEE Transactions on Audio Speech and Language Processing, 2009
    Co-Authors: Junichi Yamagishi, Steve Renals, Takashi Nose, Zhen-hua Ling, Keiichi Tokuda, Simon King, Heiga Zen, Tomoki Toda
    Abstract:

    This paper describes a speaker-adaptive HMM-based Speech Synthesis system. The new system, called “HTS-2007,” employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better quality synthetic Speech than speaker-dependent approaches with realistic amounts of Speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of Speech data are available. In addition, a comparison study with several Speech Synthesis techniques shows the new system is very robust: It is able to build voices from less-than-ideal Speech data and synthesize good-quality Speech even for out-of-domain sentences.

  • Speaker-independent HMM-based Speech Synthesis system: HTS-2007 system for the Blizzard Challenge 2007
    2007
    Co-Authors: Junichi Yamagishi, Tomoki Toda, Keiichi Tokuda
    Abstract:

    This paper describes an HMM-based Speech Synthesis system developed by the HTS working group for the Blizzard Challenge 2007. To further explore the potential of HMM-based Speech Synthesis, we incorporate new features in our conventional system which underpin a speaker-independent approach: speaker adaptation techniques; adaptive training for HSMMs; and full covariance modeling using the CSMAPLR transforms.

Takao Kobayashi - One of the best experts on this subject based on the ideXlab platform.

  • Duration prediction using multiple Gaussian process experts for GPR-based Speech Synthesis
    International Conference on Acoustics Speech and Signal Processing, 2017
    Co-Authors: Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
    Abstract:

    This paper proposes an alternative multi-level approach to duration prediction for improving prosody generation in statistical parametric Speech Synthesis using multiple Gaussian process experts. We use two duration models at different levels, specifically, syllable and phone. First, we individually train syllable- and phone-level duration models. Then, the predictive distributions of syllable and phone duration models are combined by product of Gaussians. The means of combined predictive distributions are used as predicted durations for synthetic Speech. We show objective and subjective evaluation results for the proposed technique by comparing with the conventional ones when the techniques are applied to Gaussian process regression (GPR)-based Speech Synthesis.
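The product-of-Gaussians combination described above has a simple closed form: the product of Gaussian experts is itself Gaussian, with precision equal to the sum of the expert precisions and a precision-weighted mean. A minimal sketch, with hypothetical syllable- and phone-level duration predictions in frames:

```python
import numpy as np

def product_of_gaussians(means, variances):
    """Combine Gaussian experts N(m_i, v_i) by multiplication:
    the result is N(m, v) with 1/v = sum_i 1/v_i and
    m = v * sum_i m_i / v_i (precision-weighted mean)."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum()
    mean = var * (precisions * means).sum()
    return mean, var

# Hypothetical predictive distributions for one phone's duration (frames):
# the syllable-level expert is less certain than the phone-level expert.
mean, var = product_of_gaussians([12.0, 10.0], [4.0, 1.0])
```

Note how the combined mean is pulled toward the more confident (lower-variance) expert, which is the point of the multi-level combination.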

  • Prosody Control and Variation Enhancement Techniques for HMM-Based Expressive Speech Synthesis
    Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis, 2015
    Co-Authors: Takao Kobayashi
    Abstract:

    Natural Speech has diverse forms of expressiveness including emotions, speaking styles, and voice characteristics. Moreover, the expressivity changes depending on many factors at the phrase level, such as the speaker’s temporal emotional state, focus, feelings, and intention. Thus taking into account such variations in modeling of Speech Synthesis units is crucial to generating natural-sounding expressive Speech. In this context, two approaches to HMM-based expressive Speech Synthesis are described: a technique for intuitively controlling style expressivity appearing in synthetic Speech by incorporating subjective intensity scores in the model training and a technique for enhancing prosodic variations of synthetic Speech using a newly defined phrase-level context for HMM-based Speech Synthesis and its unsupervised annotation for training data consisting of expressive Speech.

  • A Style Control Technique for HMM-Based Expressive Speech Synthesis
    IEICE Transactions on Information and Systems, 2007
    Co-Authors: Takashi Nose, Junichi Yamagishi, Takashi Masuko, Takao Kobayashi
    Abstract:

    This paper describes a technique for controlling the degree of expressivity of a desired emotional expression and/or speaking style of synthesized Speech in an HMM-based Speech Synthesis framework. With this technique, multiple emotional expressions and speaking styles of Speech are modeled in a single model by using a multiple-regression hidden semi-Markov model (MRHSMM). A set of control parameters, called the style vector, is defined, and each Speech Synthesis unit is modeled by using the MRHSMM, in which mean parameters of the state output and duration distributions are expressed by multiple-regression of the style vector. In the Synthesis stage, the mean parameters of the Synthesis units are modified by transforming an arbitrarily given style vector that corresponds to a point in a low-dimensional space, called style space, each of whose coordinates represents a certain specific speaking style or emotion of Speech. The results of subjective evaluation tests show that style and its intensity can be controlled by changing the style vector.
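The multiple-regression idea above can be sketched as follows: each state's mean parameter is a linear function of the style vector, so moving through style space moves the synthesis statistics continuously. The regression matrix and the two style axes below are hypothetical illustrations, not values from the paper.

```python
import numpy as np

# In an MRHSMM, a state's mean is a multiple regression of the style
# vector s: mu(s) = H @ [1, s_1, ..., s_L]^T, with H learned per state.
def state_mean(H, style_vector):
    xi = np.concatenate(([1.0], np.asarray(style_vector, dtype=float)))
    return H @ xi

# Hypothetical 2-dim output, 2-dim style space (say, two expressivity axes).
H = np.array([[5.0, 1.0, -0.5],
              [0.0, 2.0,  0.0]])
neutral = state_mean(H, [0.0, 0.0])   # style-space origin: the bias column
shifted = state_mean(H, [1.0, 0.0])   # move one unit along the first axis
```

Scaling the style vector between these points would scale the intensity of the corresponding expression, which is the control the subjective tests evaluate.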

  • Hidden semi-Markov model based Speech Synthesis
    Conference of the International Speech Communication Association, 2004
    Co-Authors: Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura
    Abstract:

    In the present paper, a hidden semi-Markov model (HSMM) based Speech Synthesis system is proposed. In the hidden Markov model (HMM) based Speech Synthesis system we have previously proposed, rhythm and tempo are controlled by state duration probability distributions modeled by single Gaussian distributions. To synthesize Speech, the system constructs a sentence HMM corresponding to an arbitrarily given text, determines the state durations that maximize their probabilities, and then generates a Speech parameter vector sequence for the resulting state sequence. However, there is an inconsistency: although Speech is synthesized from HMMs with explicit state duration probability distributions, the HMMs are trained without them. In the present paper, we introduce an HSMM, which is an HMM with explicit state duration probability distributions, into the HMM-based Speech Synthesis system. Experimental results show that HSMM training improves the naturalness of the synthesized Speech.
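The duration step described above can be sketched directly: with Gaussian duration distributions, the duration maximizing each state's probability is just the (rounded) mean. The `rho` speaking-rate shift below is an assumption borrowed from the common HTS duration-control formulation (d_i = m_i + rho * sigma_i^2), not something this abstract specifies.

```python
import numpy as np

# Explicit Gaussian duration model per state: mean m_i, variance v_i.
def determine_durations(means, variances, rho=0.0):
    """Pick each state's most probable duration (the mean), optionally
    shifted along the variance by a speaking-rate factor rho."""
    d = np.asarray(means, dtype=float) + rho * np.asarray(variances, dtype=float)
    return np.maximum(1, np.rint(d).astype(int))

durations = determine_durations([3.2, 7.8, 4.1], [1.0, 4.0, 1.5])
# Expand the per-state durations into an explicit state sequence for
# parameter generation.
state_sequence = np.repeat([0, 1, 2], durations)
```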

  • Speaker interpolation for HMM-based Speech Synthesis system
    The Journal of The Acoustical Society of Japan (e), 2000
    Co-Authors: Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura
    Abstract:

    This paper describes an approach to voice characteristics conversion for an HMM-based text-to-Speech Synthesis system using speaker interpolation. Although most text-to-Speech Synthesis systems that synthesize Speech by concatenating Speech units can produce Speech of acceptable quality, they still cannot synthesize Speech with varied voice qualities such as speaker individualities and emotions; to control speaker individualities and emotions, they need a large database recording Speech units with various voice characteristics for the Synthesis phase. Our system, in contrast, synthesizes Speech with an untrained speaker’s voice quality by interpolating HMM parameters among the HMM sets of several representative speakers, and can therefore produce varied voice qualities without a large database in the Synthesis phase. The HMM interpolation technique is derived from a probabilistic similarity measure for HMMs. The results of subjective experiments show that the voice quality of synthesized Speech can be changed gradually from one speaker’s to the other’s by varying the interpolation ratio.
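The simplest form of the interpolation described above is a convex combination of the representative speakers' HMM mean vectors. This is a minimal sketch of that one operation; the paper's full technique also accounts for covariances via the probabilistic similarity measure. The two speaker means below are hypothetical.

```python
import numpy as np

def interpolate_means(speaker_means, weights):
    """Interpolate HMM mean vectors of representative speakers:
    mu = sum_k a_k * mu_k, with interpolation ratios a_k summing to one."""
    speaker_means = np.asarray(speaker_means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0)
    return np.tensordot(weights, speaker_means, axes=1)

# Two hypothetical speakers; sliding the ratio morphs voice A into voice B.
mu_a = np.array([1.0, 2.0])
mu_b = np.array([3.0, 6.0])
midpoint = interpolate_means([mu_a, mu_b], [0.5, 0.5])
```

Varying the weight pair from (1, 0) to (0, 1) reproduces the gradual voice-quality change reported in the subjective experiments.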