Spoken Dialogue System


The experts below are selected from a list of 7,197 experts worldwide, ranked by the ideXlab platform.

Diane J. Litman - One of the best experts on this subject based on the ideXlab platform.

  • The Relative Impact of Student Affect on Performance Models in a Spoken Dialogue Tutoring System
    User Modeling and User-Adapted Interaction, 2008
    Co-Authors: Kate Forbes-Riley, Mihai Rotaru, Diane J. Litman
    Abstract:

    We hypothesize that student affect is a useful predictor of spoken dialogue system performance, relative to other parameters. We test this hypothesis in the context of our spoken dialogue tutoring system, where student learning is the primary performance metric. We first present our system and corpora, which have been annotated with several student affective states, student correctness and discourse structure. We then discuss unigram and bigram parameters derived from these annotations. The unigram parameters represent each annotation type individually, as well as system-generic features. The bigram parameters represent annotation combinations, including student state sequences and student states in the discourse structure context. We then use these parameters to build learning models. First, we build simple models based on correlations between each of our parameters and learning. Our results suggest that our affect parameters are among our most useful predictors of learning, particularly in specific discourse structure contexts. Next, we use the PARADISE framework (multiple linear regression) to build complex learning models containing only the most useful subset of parameters. Our approach is a value-added one: we perform a number of model-building experiments, both with and without our affect parameters, and then compare the performance of the models on the training and test sets. Our results show that when included as inputs, our affect parameters are selected as predictors in most models, and many of these models show high generalizability in testing. Our results also show that, overall, the affect-included models significantly outperform the affect-excluded models.
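
    The value-added, PARADISE-style comparison can be pictured with a small sketch: fit one multiple linear regression with the affect parameters and one without, then compare test-set fit. Everything below (feature names, data, sizes) is an invented stand-in for illustration, not the paper's corpus or code.

      # Hedged sketch of a value-added PARADISE-style comparison: multiple
      # linear regression of learning on dialogue parameters, with and
      # without affect features. All data here are synthetic placeholders.
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import r2_score

      rng = np.random.default_rng(0)
      n = 40  # hypothetical number of student dialogues

      correct = rng.integers(5, 30, n)        # unigram: correct answers
      uncertain = rng.integers(0, 10, n)      # unigram: uncertain turns
      unc_then_corr = rng.integers(0, 8, n)   # bigram: uncertain -> correct

      # Synthetic learning metric, loosely tied to the parameters above.
      learning = 0.5 * correct - 0.8 * uncertain + rng.normal(0, 2, n)

      models = {
          "affect-excluded": np.column_stack([correct]),
          "affect-included": np.column_stack([correct, uncertain, unc_then_corr]),
      }
      train, test = slice(0, 30), slice(30, None)
      for name, X in models.items():
          reg = LinearRegression().fit(X[train], learning[train])
          r2 = r2_score(learning[test], reg.predict(X[test]))
          print(name, "test R^2 =", round(r2, 3))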

  • Comparing Synthesized versus Pre-Recorded Tutor Speech in an Intelligent Tutoring Spoken Dialogue System
    The Florida AI Research Society, 2006
    Co-Authors: Katherine Forbes-Riley, Diane J. Litman, Scott Silliman, Joel Tetreault
    Abstract:

    We evaluate the impact of tutor voice quality in the context of our intelligent tutoring spoken dialogue system. We first describe two versions of our system, which yielded two corpora of human-computer tutoring dialogues: one using a tutor voice pre-recorded by a human, and the other using a low-cost text-to-speech tutor voice. We then discuss the results of two-tailed t-tests comparing student learning gains, system usability, and dialogue efficiency across the two corpora and across corpora subsets. Overall, our results suggest that tutor voice quality may have only a minor impact on these metrics in the context of our tutoring system. We find that tutor voice quality does not impact learning gains, but it may impact usability and efficiency for some corpora subsets.
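
    For concreteness, each comparison reduces to an independent two-sample, two-tailed t-test per metric. The sketch below uses invented learning-gain numbers, not the paper's data.

      # Hedged sketch of the paper's statistical comparison: a two-tailed
      # independent-samples t-test on learning gains. Values are invented.
      from scipy import stats

      prerecorded_gains = [0.42, 0.35, 0.51, 0.28, 0.44, 0.39]  # hypothetical
      synthesized_gains = [0.40, 0.33, 0.47, 0.30, 0.41, 0.38]  # hypothetical

      t, p = stats.ttest_ind(prerecorded_gains, synthesized_gains)  # two-tailed
      print(f"t = {t:.3f}, p = {p:.3f}")  # p > 0.05: no significant difference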

  • Recognizing Student Emotions and Attitudes on the Basis of Utterances in Spoken Tutoring Dialogues with Both Human and Computer Tutors
    Speech Communication, 2006
    Co-Authors: Diane J. Litman, Katherine Forbes-Riley
    Abstract:

    While human tutors respond both to what a student says and to how the student says it, most tutorial dialogue systems cannot detect the student emotions and attitudes underlying an utterance. We present an empirical study investigating the feasibility of recognizing student state in two corpora of spoken tutoring dialogues, one with a human tutor and one with a computer tutor. We first annotate student turns for negative, neutral and positive student states in both corpora. We then automatically extract acoustic-prosodic features from the student speech, and lexical items from the transcribed or recognized speech. We compare the results of machine learning experiments using these features alone, in combination, and with student- and task-dependent features, to predict student states. We also compare our results across human-human and human-computer spoken tutoring dialogues. Our results show significant improvements in prediction accuracy over relevant baselines, and provide a first step towards enhancing our intelligent tutoring spoken dialogue system to automatically recognize and adapt to student states.
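
    A minimal sketch of this kind of experiment, assuming invented acoustic-prosodic and lexical features (not the paper's feature set): train a classifier on the combined features and compare it against a majority-class baseline.

      # Hedged sketch: predict negative/neutral/positive student state from
      # combined acoustic-prosodic and lexical features, versus a majority
      # baseline. Features, labels and data are synthetic placeholders.
      import numpy as np
      from sklearn.dummy import DummyClassifier
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      n = 200
      X = np.column_stack([
          rng.normal(200, 30, n),   # f0 mean (Hz)        - acoustic-prosodic
          rng.normal(60, 10, n),    # RMS energy (dB)     - acoustic-prosodic
          rng.normal(1.5, 0.5, n),  # turn duration (s)   - acoustic-prosodic
          rng.integers(0, 2, n),    # hedge word present? - lexical
      ])
      y = rng.choice(["negative", "neutral", "positive"], n, p=[0.2, 0.6, 0.2])

      base = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y).mean()
      comb = cross_val_score(RandomForestClassifier(random_state=0), X, y).mean()
      print(f"majority baseline: {base:.2f}, combined features: {comb:.2f}")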

  • ITSPOKE: An Intelligent Tutoring Spoken Dialogue System
    Proc. of the Human Language Technology Conference: 4th Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL), 2004
    Co-Authors: Diane J. Litman, Scott Silliman
    Abstract:

    ITSPOKE is a spoken dialogue system that uses the Why2-Atlas text-based tutoring system as its "back-end". A student first types a natural language answer to a qualitative physics problem. ITSPOKE then engages the student in a spoken dialogue to provide feedback and correct misconceptions, and to elicit more complete explanations. We are using ITSPOKE to generate an empirically based understanding of the ramifications of adding spoken language capabilities to text-based dialogue tutors.
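
    The architecture is a speech layer wrapped around a text-based tutor. The following is a schematic sketch of that wiring only; the recognize, text_tutor_reply and synthesize functions are invented stand-ins, not ITSPOKE's components.

      # Schematic sketch of a spoken front end around a text-based tutoring
      # back end (as ITSPOKE wraps Why2-Atlas). All functions are stand-ins.
      def recognize(audio: bytes) -> str:
          """Stand-in ASR: transcribe the student's spoken turn."""
          return "gravity acts on the keys after release"

      def text_tutor_reply(student_text: str) -> str:
          """Stand-in text back end: analyse the answer and return feedback
          or a follow-up question aimed at a misconception."""
          return "Right. Are any other forces acting on the keys?"

      def synthesize(text: str) -> bytes:
          """Stand-in TTS: render the tutor's reply as audio."""
          return text.encode()

      def tutoring_turn(student_audio: bytes) -> bytes:
          # One spoken turn = ASR -> text-based tutor -> TTS.
          return synthesize(text_tutor_reply(recognize(student_audio)))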

  • Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System
    Journal of Artificial Intelligence Research, 2002
    Co-Authors: Satinder Singh, Diane J. Litman, Michael Kearns, Marilyn A Walker
    Abstract:

    Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that by optimizing its performance via reinforcement learning, NJFun measurably improves system performance.
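
    To give the flavour of the approach, here is a toy tabular Q-learning sketch over a made-up slot-filling dialogue: the learner chooses between system and user initiative at each stage, trading recognition risk against turn cost. The states, actions, rewards and simulator are all invented; NJFun's actual state space, action set and learning algorithm differ.

      # Toy Q-learning over a made-up slot-filling dialogue; not NJFun's MDP.
      import random
      from collections import defaultdict

      STATES = ["greet", "ask_activity", "ask_location", "done"]
      ACTIONS = ["system_initiative", "user_initiative"]
      Q = defaultdict(float)
      alpha, gamma, eps = 0.1, 0.95, 0.2

      def step(state, action):
          """User initiative is faster to say but riskier to recognize."""
          ok = random.random() < (0.9 if action == "system_initiative" else 0.7)
          if not ok:
              return state, -0.2            # misrecognition: re-ask this slot
          nxt = STATES[STATES.index(state) + 1]
          return nxt, (1.0 if nxt == "done" else -0.1)  # small per-turn cost

      for _ in range(5000):                 # learn from simulated dialogues
          s = "greet"
          while s != "done":
              a = (random.choice(ACTIONS) if random.random() < eps
                   else max(ACTIONS, key=lambda a: Q[s, a]))
              s2, r = step(s, a)
              Q[s, a] += alpha * (r + gamma * max(Q[s2, b] for b in ACTIONS)
                                  - Q[s, a])
              s = s2

      print({s: max(ACTIONS, key=lambda a: Q[s, a]) for s in STATES[:-1]})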

Tatsuya Kawahara - One of the best experts on this subject based on the ideXlab platform.

  • Spoken Dialogue System for a Human-Like Conversational Robot ERICA
    IWSDS, 2019
    Co-Authors: Tatsuya Kawahara
    Abstract:

    This article gives an overview of our symbiotic human-robot interaction project, which aims at an autonomous android that behaves and interacts just like a human. The conversational android ERICA is designed to take on several social roles centered on spoken dialogue, such as attentive listening (similar to counseling) and job interviewing. We describe the design principles behind these spoken dialogue systems, with a particular focus on the attentive listening system. We also address the generation of backchannels, fillers and laughter to produce human-like conversational behavior.
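
    As one concrete illustration of the backchannel generation mentioned above, a simple prosody-based trigger might look like the sketch below. This is an invented heuristic with made-up thresholds, not ERICA's actual model.

      # Invented heuristic for backchannel timing; thresholds are made up.
      def maybe_backchannel(pause_ms: float, pitch_slope: float) -> str | None:
          """pause_ms: silence since the user stopped speaking;
          pitch_slope: f0 slope over the last 300 ms (negative = falling)."""
          if pause_ms > 400 and pitch_slope < 0:
              return "uh-huh"               # short acknowledgement token
          return None

      print(maybe_backchannel(pause_ms=520, pitch_slope=-0.8))  # -> "uh-huh"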

  • Bayes Risk-Based Dialogue Management for Document Retrieval System with Speech Interface
    Speech Communication, 2010
    Co-Authors: Teruhisa Misu, Tatsuya Kawahara
    Abstract:

    We propose an efficient dialogue management technique for an information navigation system based on a document knowledge base. The system can use ASR N-best hypotheses and contextual information to perform robustly on fragmentary speech input and erroneous output of automatic speech recognition (ASR). It also has several choices in generating responses or confirmations. We formulate the optimization of these choices based on a Bayes risk criterion, which is defined in terms of a reward for correct information presentation and a penalty for redundant turns. The parameters of the proposed dialogue management can be tuned adaptively by online learning. We evaluated this strategy with our spoken dialogue system, the "Dialogue Navigator for Kyoto City", which generates responses based on document retrieval and also has question-answering capability. The effectiveness of the proposed framework was demonstrated by an increased success rate of dialogue and a reduced number of turns for information access in an experiment with a large number of utterances by real users.
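
    The decision rule at the heart of this formulation can be sketched as follows: for each candidate action, compute the expected loss over the ASR N-best list and pick the minimum. The hypotheses, probabilities and loss values below are invented toy numbers, and the real formulation covers more actions and contextual information.

      # Toy Bayes-risk action selection over an ASR N-best list; all
      # hypotheses, probabilities and loss values are invented.
      nbest = {"kinkakuji history": 0.55,
               "kinkakuji access": 0.30,
               "ginkakuji history": 0.15}

      REWARD_CORRECT = -1.0  # correct presentation (a reward = negative loss)
      PENALTY_WRONG = 3.0    # presenting the wrong document
      PENALTY_TURN = 0.5     # one redundant confirmation turn

      def risk_present(hyp: str) -> float:
          """Expected loss of presenting the document for `hyp` directly."""
          p = nbest[hyp]
          return p * REWARD_CORRECT + (1 - p) * PENALTY_WRONG

      def risk_confirm(hyp: str) -> float:
          """Confirming first costs a turn, then presentation is correct."""
          return PENALTY_TURN + REWARD_CORRECT

      top = max(nbest, key=nbest.get)
      action = "present" if risk_present(top) < risk_confirm(top) else "confirm"
      print(action, top)   # here: confirm, since P(top) = 0.55 is too low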

  • Bayes Risk-Based Dialogue Management for Document Retrieval System with Speech Interface
    International Conference on Computational Linguistics, 2008
    Co-Authors: Teruhisa Misu, Tatsuya Kawahara
    Abstract:

    We propose an efficient dialogue management scheme for an information navigation system based on a document knowledge base with a spoken dialogue interface. In order to perform robustly on fragmentary speech input and erroneous output of automatic speech recognition (ASR), the system should selectively use N-best hypotheses of ASR and contextual information. The system also has several choices in generating responses or confirmations. In this work, we formulate the optimization of these choices based on a unified criterion, Bayes risk, which is defined in terms of a reward for correct information presentation and a penalty for redundant turns. We have evaluated this strategy with a spoken dialogue system which also has question-answering capability. The effectiveness of the proposed framework was confirmed in the success rate of retrieval and the average number of turns.

Teruhisa Misu - One of the best experts on this subject based on the ideXlab platform.

  • Bayes Risk-Based Dialogue Management for Document Retrieval System with Speech Interface
    Speech Communication, 2010
    Co-Authors: Teruhisa Misu, Tatsuya Kawahara
    Abstract:

    We propose an efficient dialogue management technique for an information navigation system based on a document knowledge base. The system can use ASR N-best hypotheses and contextual information to perform robustly on fragmentary speech input and erroneous output of automatic speech recognition (ASR). It also has several choices in generating responses or confirmations. We formulate the optimization of these choices based on a Bayes risk criterion, which is defined in terms of a reward for correct information presentation and a penalty for redundant turns. The parameters of the proposed dialogue management can be tuned adaptively by online learning. We evaluated this strategy with our spoken dialogue system, the "Dialogue Navigator for Kyoto City", which generates responses based on document retrieval and also has question-answering capability. The effectiveness of the proposed framework was demonstrated by an increased success rate of dialogue and a reduced number of turns for information access in an experiment with a large number of utterances by real users.

  • Bayes Risk-Based Dialogue Management for Document Retrieval System with Speech Interface
    International Conference on Computational Linguistics, 2008
    Co-Authors: Teruhisa Misu, Tatsuya Kawahara
    Abstract:

    We propose an efficient dialogue management scheme for an information navigation system based on a document knowledge base with a spoken dialogue interface. In order to perform robustly on fragmentary speech input and erroneous output of automatic speech recognition (ASR), the system should selectively use N-best hypotheses of ASR and contextual information. The system also has several choices in generating responses or confirmations. In this work, we formulate the optimization of these choices based on a unified criterion, Bayes risk, which is defined in terms of a reward for correct information presentation and a penalty for redundant turns. We have evaluated this strategy with a spoken dialogue system which also has question-answering capability. The effectiveness of the proposed framework was confirmed in the success rate of retrieval and the average number of turns.

Katherine Forbes-Riley - One of the best experts on this subject based on the ideXlab platform.

  • Comparing Synthesized versus Pre-Recorded Tutor Speech in an Intelligent Tutoring Spoken Dialogue System
    The Florida AI Research Society, 2006
    Co-Authors: Katherine Forbes-Riley, Diane J. Litman, Scott Silliman, Joel Tetreault
    Abstract:

    We evaluate the impact of tutor voice quality in the context of our intelligent tutoring spoken dialogue system. We first describe two versions of our system, which yielded two corpora of human-computer tutoring dialogues: one using a tutor voice pre-recorded by a human, and the other using a low-cost text-to-speech tutor voice. We then discuss the results of two-tailed t-tests comparing student learning gains, system usability, and dialogue efficiency across the two corpora and across corpora subsets. Overall, our results suggest that tutor voice quality may have only a minor impact on these metrics in the context of our tutoring system. We find that tutor voice quality does not impact learning gains, but it may impact usability and efficiency for some corpora subsets.

  • Recognizing Student Emotions and Attitudes on the Basis of Utterances in Spoken Tutoring Dialogues with Both Human and Computer Tutors
    Speech Communication, 2006
    Co-Authors: Diane J. Litman, Katherine Forbes-Riley
    Abstract:

    While human tutors respond both to what a student says and to how the student says it, most tutorial dialogue systems cannot detect the student emotions and attitudes underlying an utterance. We present an empirical study investigating the feasibility of recognizing student state in two corpora of spoken tutoring dialogues, one with a human tutor and one with a computer tutor. We first annotate student turns for negative, neutral and positive student states in both corpora. We then automatically extract acoustic-prosodic features from the student speech, and lexical items from the transcribed or recognized speech. We compare the results of machine learning experiments using these features alone, in combination, and with student- and task-dependent features, to predict student states. We also compare our results across human-human and human-computer spoken tutoring dialogues. Our results show significant improvements in prediction accuracy over relevant baselines, and provide a first step towards enhancing our intelligent tutoring spoken dialogue system to automatically recognize and adapt to student states.

Steve Young - One of the best experts on this subject based on the ideXlab platform.

  • Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding
    arXiv, 2016
    Co-Authors: Lina M. Rojas Barahona, Nikola Mrkšić, Tsung Hsien Wen, Pei-hao Su, Stefan Ultes, Milica Gašić, Steve Young
    Abstract:

    This paper presents a deep learning architecture for the semantic decoder component of a statistical spoken dialogue system. In a slot-filling dialogue, the semantic decoder predicts the dialogue act and a set of slot-value pairs from a set of n-best hypotheses returned by automatic speech recognition. Most current models for spoken language understanding assume (i) word-aligned semantic annotations, as in sequence taggers, and (ii) delexicalisation, a mapping of input words to domain-specific concepts using heuristics that try to capture morphological variation but that scale neither to other domains nor to language variation (e.g., morphology, synonyms, paraphrasing). In this work the semantic decoder is trained using unaligned semantic annotations, and it uses distributed semantic representation learning to overcome the limitations of explicit delexicalisation. The proposed architecture uses a convolutional neural network for the sentence representation and a long short-term memory network for the context representation. Results are presented for the publicly available DSTC2 corpus and an in-car corpus which is similar to DSTC2 but has a significantly higher word error rate (WER).
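
    A schematic sketch of the sentence/context split, assuming PyTorch and invented dimensions: a convolutional encoder for the current utterance, an LSTM over the context, and two output heads for the dialogue act and the slot-value scores. This is not the paper's code.

      # Schematic PyTorch sketch of the described architecture; layer sizes
      # and label inventories are invented placeholders.
      import torch
      import torch.nn as nn

      class SemanticDecoder(nn.Module):
          def __init__(self, vocab=1000, emb=50, hid=64, n_acts=10, n_slots=20):
              super().__init__()
              self.embed = nn.Embedding(vocab, emb)
              self.conv = nn.Conv1d(emb, hid, kernel_size=3, padding=1)
              self.context = nn.LSTM(emb, hid, batch_first=True)
              self.act_out = nn.Linear(2 * hid, n_acts)    # dialogue act
              self.slot_out = nn.Linear(2 * hid, n_slots)  # slot-value scores

          def forward(self, utterance, context):
              # utterance, context: LongTensor token ids, [batch, tokens]
              s = self.conv(self.embed(utterance).transpose(1, 2))
              s = s.max(dim=2).values           # max-pooled sentence vector
              _, (h, _) = self.context(self.embed(context))
              feats = torch.cat([s, h[-1]], dim=1)
              return self.act_out(feats), torch.sigmoid(self.slot_out(feats))

      decoder = SemanticDecoder()
      acts, slots = decoder(torch.randint(0, 1000, (2, 12)),
                            torch.randint(0, 1000, (2, 30)))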

  • Stochastic Language Generation in Dialogue Using Recurrent Neural Networks with Convolutional Sentence Reranking
    arXiv: Computation and Language, 2015
    Co-Authors: Tsung Hsien Wen, Milica Gasic, Dongho Kim, Nikola Mrksic, David Vandyke, Steve Young
    Abstract:

    The natural language generation (NLG) component of a spoken dialogue system (SDS) usually needs a substantial amount of handcrafting or a well-labeled dataset to be trained on. These limitations add significantly to development costs and make cross-domain, multilingual dialogue systems intractable. Moreover, human languages are context-aware: the most natural response should be learned directly from data rather than depend on predefined syntax or rules. This paper presents a statistical language generator based on a joint recurrent and convolutional neural network structure which can be trained on dialogue act-utterance pairs without any semantic alignments or predefined grammar trees. Objective metrics suggest that this new model outperforms previous methods under the same experimental conditions. Results of an evaluation by human judges indicate that it produces not only high-quality but also linguistically varied utterances, which are preferred over n-gram and rule-based systems.
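
    The generate-then-rerank idea can be illustrated with a toy: candidate realisations that would come from the RNN generator are scored against the input dialogue act, penalising candidates that miss slot values. The act, candidates and scorer below are invented; the paper's reranker is a convolutional network, not this string check.

      # Toy generate-then-rerank: penalise candidates missing slot values.
      # Candidates stand in for RNN samples; the real reranker is a CNN.
      dialogue_act = {"act": "inform", "name": "Seven Days", "food": "Chinese"}

      candidates = [
          "seven days serves chinese food",
          "seven days is a nice place",
          "it serves chinese food",
      ]

      def score(candidate: str, da: dict) -> float:
          slot_values = [v.lower() for k, v in da.items() if k != "act"]
          missing = sum(1 for v in slot_values if v not in candidate)
          return -missing          # fewer missing slot values is better

      print(max(candidates, key=lambda c: score(c, dialogue_act)))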

  • Learning from Real Users: Rating Dialogue Success with Neural Networks for Reinforcement Learning in Spoken Dialogue Systems
    Conference of the International Speech Communication Association, 2015
    Co-Authors: David Vandyke, Tsung Hsien Wen, Milica Gasic, Dongho Kim, Nikola Mrksic, Steve Young
    Abstract:

    To train a statistical spoken dialogue system (SDS), it is essential that an accurate method for measuring task success is available. To date, training has relied on presenting a task to either simulated or paid users and inferring the dialogue's success by observing whether the presented task was achieved or not. Our aim, however, is to learn from real users acting under their own volition, in which case it is non-trivial to rate success, since prior knowledge of the task is simply unavailable. User feedback may be utilised but has been found to be inconsistent. Hence, we present two neural network models that evaluate a sequence of turn-level features to rate the success of a dialogue. Importantly, these models make no use of any prior knowledge of the user's task. The models are trained on dialogues generated by a simulated user, and the best model is then used to train a policy on-line, which is shown to perform at least as well as a baseline system using prior knowledge of the user's task. The models should also be of interest for evaluating SDS and for monitoring dialogues in rule-based SDS.
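
    A schematic sketch of such a success rater, assuming PyTorch and an invented feature dimension: a recurrent network reads the per-turn feature vectors and outputs a success probability, which could then be mapped to a reinforcement learning reward.

      # Schematic turn-level success rater; dimensions and reward scaling
      # are invented, and this is not the paper's exact model.
      import torch
      import torch.nn as nn

      class SuccessRater(nn.Module):
          def __init__(self, feat_dim=20, hid=32):
              super().__init__()
              self.rnn = nn.LSTM(feat_dim, hid, batch_first=True)
              self.out = nn.Linear(hid, 1)

          def forward(self, turns):        # turns: [batch, n_turns, feat_dim]
              _, (h, _) = self.rnn(turns)
              return torch.sigmoid(self.out(h[-1])).squeeze(-1)  # P(success)

      p = SuccessRater()(torch.randn(4, 10, 20))  # 4 dialogues, 10 turns each
      reward = (p > 0.5).float() * 20 - 10        # invented reward mapping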

  • Uncertainty management for on-line optimisation of a POMDP-based large-scale Spoken Dialogue System
    Proceedings of the Annual Conference of the International Speech Communication Association INTERSPEECH, 2011
    Co-Authors: Lucie Daubigney, Senthilkumar Chandramohan, Matthieu Geist, Milica Gašić, Olivier Pietquin, Steve Young
    Abstract:

    The optimization of dialogue policies using reinforcement learning (RL) is now an accepted part of the state of the art in spoken dialogue systems (SDS). Yet it is still the case that the commonly used training algorithms for SDS require a large number of dialogues, and hence most systems still rely on artificial data generated by a user simulator. Optimization is therefore performed off-line before releasing the system to real users. Gaussian processes (GP) for RL have recently been applied to dialogue systems. One advantage of GPs is that they compute an explicit measure of uncertainty in the value function estimates computed during learning. In this paper, a class of novel learning strategies is described which uses uncertainty to control exploration on-line. Comparisons between several exploration schemes show that significant improvements in learning speed can be obtained and that rapid and safe online optimisation is possible, even on a complex task.
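
    The exploration idea can be sketched in a few lines: given a GP posterior over returns, prefer actions whose mean-plus-uncertainty is highest. The toy below uses scikit-learn's GP regressor on made-up one-dimensional data as a stand-in for the paper's GP value estimates.

      # Toy uncertainty-driven exploration with a GP posterior; the data and
      # the 1-D action coding are invented stand-ins for GP value estimates.
      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor

      X = np.array([[0.0], [1.0], [2.0]])   # previously tried action codes
      y = np.array([0.2, 0.8, 0.5])         # returns observed for them
      gp = GaussianProcessRegressor().fit(X, y)

      candidates = np.linspace(0.0, 3.0, 7).reshape(-1, 1)
      mean, std = gp.predict(candidates, return_std=True)
      chosen = candidates[np.argmax(mean + 1.0 * std), 0]  # uncertainty bonus
      print(chosen)   # untried regions get picked while their std is high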

  • Bayesian Update of Dialogue State: A POMDP Framework for Spoken Dialogue Systems
    Computer Speech & Language, 2010
    Co-Authors: Blaise Thomson, Steve Young
    Abstract:

    This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on the partially observable Markov decision process (POMDP), which provides a well-founded statistical model of spoken dialogue management. However, exact belief state updates in a POMDP model are computationally intractable, so approximate methods must be used. This paper presents a tractable method based on the loopy belief propagation algorithm. Various simplifications are made which improve the efficiency significantly compared to the original algorithm, as well as compared to other POMDP-based dialogue state updating approaches. A second contribution of this paper is a method for learning in spoken dialogue systems which uses a component-based policy with the episodic Natural Actor Critic algorithm. The framework proposed in this paper was tested both in simulations and in a user trial. Both indicated that using Bayesian updates of the dialogue state significantly outperforms traditional definitions of the dialogue state. Policy learning worked effectively and the learned policy outperformed all others in simulations. In user trials the learned policy was also competitive, although its optimality was less conclusive. Overall, the Bayesian update of dialogue state framework was shown to be a feasible and effective approach to building real-world POMDP-based dialogue systems.
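
    The core of any such framework is a per-turn Bayesian re-weighting of the dialogue state. The sketch below shows the single-slot case, b'(s) proportional to p(o|s) b(s), with invented values; the paper's contribution is making this tractable across many connected slots via loopy belief propagation.

      # Minimal single-slot belief update, b'(s) ~ p(o|s) * b(s); values are
      # invented. The paper factorises this over many slots with loopy BP.
      def update_belief(belief: dict, asr_evidence: dict) -> dict:
          """Re-weight the belief by ASR evidence, then renormalise."""
          posterior = {v: p * asr_evidence.get(v, 0.01)
                       for v, p in belief.items()}
          z = sum(posterior.values())
          return {v: p / z for v, p in posterior.items()}

      belief = {"chinese": 1/3, "french": 1/3, "indian": 1/3}  # uniform prior
      belief = update_belief(belief, {"chinese": 0.7, "indian": 0.2})
      print(belief)   # mass shifts to "chinese" but hedges against ASR error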