Visual Database

The experts below are selected from a list of 1,377 experts worldwide, ranked by the ideXlab platform.

Jean-Philippe Thiran - One of the best experts on this subject based on the ideXlab platform.

  • Class-specific classifiers in audio-visual speech recognition
    2010 18th European Signal Processing Conference, 2010
    Co-Authors: Virginia Estellers, Paul M. Baggenstoss, Jean-Philippe Thiran
    Abstract:

    In this paper, class-specific classifiers for audio, visual, and audio-visual speech recognition systems are developed and compared with traditional Bayes classifiers. We use state-of-the-art feature extraction methods and develop traditional and class-specific classifiers for speech recognition, showing the benefits of the class-specific method on each modality in both speaker-dependent and speaker-independent set-ups. Experiments with a reference audio-visual database show a general increase in system performance when class-specific techniques are introduced on both the visual and audio-visual modalities.
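
    The abstract does not spell out the class-specific method, so the following is only a minimal sketch of the general idea, assuming scikit-learn Gaussian mixtures and hypothetical per-class feature extractors: each class is scored on its own feature set via a likelihood ratio against a common reference model, which keeps scores computed on different feature sets comparable.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      def train_class_specific(train_data, feature_maps, n_components=4):
          # train_data: {class: (N_c, D) raw samples}; feature_maps: {class: callable} (hypothetical).
          models = {}
          all_raw = np.vstack(list(train_data.values()))
          for c, X in train_data.items():
              Z = feature_maps[c](X)                                  # class-specific features T_c(x)
              gmm = GaussianMixture(n_components).fit(Z)              # p(T_c(x) | c)
              ref = GaussianMixture(n_components).fit(feature_maps[c](all_raw))  # reference p(T_c(x) | H0)
              models[c] = (gmm, ref)
          return models

      def classify(x, models, feature_maps):
          # Score each class by log p(T_c(x) | c) - log p(T_c(x) | H0); the shared
          # reference hypothesis H0 makes scores from different feature sets comparable.
          scores = {c: gmm.score(feature_maps[c](x[None, :])) - ref.score(feature_maps[c](x[None, :]))
                    for c, (gmm, ref) in models.items()}
          return max(scores, key=scores.get)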

  • Multimodal speaker localization in a probabilistic framework
    2006 14th European Signal Processing Conference, 2006
    Co-Authors: Mihai Gurban, Jean-Philippe Thiran
    Abstract:

    A multimodal probabilistic framework is proposed for the problem of finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by using the video and audio channels together. We propose a novel visual feature that is well suited to analyzing the movement of the mouth. After estimating the joint probability density of the audio and visual features, we can find the most probable location of the current speaker's mouth in a sequence of images. The proposed method is tested on the CUAVE audio-visual database, yielding improved results compared to other approaches from the literature.
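
    A minimal sketch of the joint-density idea, assuming precomputed frame-level audio features and per-location visual features (e.g., a motion measure around each candidate mouth region); names are hypothetical and this is not the paper's code. A mixture model fit on true-mouth training pairs is evaluated at each candidate location, and the argmax gives the most probable mouth position.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      def fit_joint_av_density(audio_feats, visual_feats, n_components=8):
          # audio_feats: (T, Da); visual_feats: (T, Dv) taken from the true mouth region.
          joint = np.hstack([audio_feats, visual_feats])
          return GaussianMixture(n_components=n_components).fit(joint)

      def localize_speaker(gmm, audio_t, candidate_visual_feats):
          # candidate_visual_feats: (N_locations, Dv); pair each candidate with the
          # current audio feature and pick the location with the highest joint density.
          joint = np.hstack([np.tile(audio_t, (len(candidate_visual_feats), 1)),
                             candidate_visual_feats])
          return int(np.argmax(gmm.score_samples(joint)))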

Hasan M. Jamil - One of the best experts on this subject based on the ideXlab platform.

  • VisFlow: A Visual Database Integration and Workflow Querying System
    2017 IEEE 33rd International Conference on Data Engineering (ICDE), 2017
    Co-Authors: Hasan M. Jamil, Xiaogang Ma
    Abstract:

    The adoption and availability of diverse application design and support platforms are making generic scientific application orchestration increasingly difficult. In such an evolving environment, higher-level abstractions of design primitives are critically important, allowing end users to craft their own applications without a complete technical grasp of the lower-level details. In this research, we introduce VisFlow, a novel scientific workflow design platform that combines, in a single system, high-level tools for data integration, process description, and analytics based on a visual language for naive users, together with advanced options for computing-savvy programmers. We describe its salient features and advantages using a complex scientific application in natural resources and ecology. Video: https://youtu.be/2YSYVyOuuk.
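
    The paper does not expose VisFlow's API, so the sketch below is only a generic illustration of the dataflow abstraction such workflow systems build on: named processing nodes composed into a pipeline, which a visual front end would let users wire together graphically. All names here are hypothetical.

      from typing import Any, Callable

      class Node:
          def __init__(self, name: str, fn: Callable[[Any], Any]):
              self.name, self.fn = name, fn

          def __rshift__(self, other: "Node") -> "Node":
              # Chain two nodes: the output of self feeds other.
              return Node(f"{self.name}>>{other.name}", lambda x: other.fn(self.fn(x)))

          def run(self, x: Any) -> Any:
              return self.fn(x)

      # Hypothetical usage: integrate -> filter -> analyze, as one workflow.
      workflow = (Node("integrate", lambda recs: [r for r in recs if r is not None])
                  >> Node("filter", lambda recs: [r for r in recs if r.get("valid")])
                  >> Node("analyze", len))
      print(workflow.run([{"valid": True}, None, {"valid": False}]))  # -> 1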

Gultekin Ozsoyoglu - One of the best experts on this subject based on the ideXlab platform.

  • Towards a unified visual database access
    International Conference on Management of Data, 1993
    Co-Authors: K. Vadaparty, Y. A. Aslandogan, Gultekin Ozsoyoglu
    Abstract:

    Since the development of QBE, over fifty visual query languages have been proposed to facilitate easy database access. Although these languages have introduced some very useful paradigms, a number of them have severe limitations, such as (a) not extending beyond the relational model, (b) not treating negation and safety formally, and (c) using ad hoc constructs with no analysis of expressivity or complexity. Note that visual database access is an important issue being revisited with the emergence of different flavors of object-oriented databases. We believe there is a need for a unified visual query language. Specifically, our goal is to develop a visual query language with the following properties: (i) it has a few core constructs with which “expert users” can easily define new (derived) constructs; (ii) “normal users” can easily use either the core or the derived constructs for database querying; (iii) it can straightforwardly implement representative constructs of other (textual or visual) query languages; and (iv) it has formal semantics, with its theoretical properties, such as complexity, analyzed. We take a first step towards this goal by introducing a new logical construct called the restricted universal quantifier and combining it with the hierarchical structure of windows to develop a Visual Query Language, called VQL. The core constructs of VQL can easily encode a number of representative constructs of different (about six visual and four non-visual) relational, nested, and object-oriented query languages. We also study theoretical aspects of VQL such as safety and complexity.
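
    As a worked illustration (not from the paper), a restricted universal quantifier requires a condition to hold for every element of a restricted set, as in the classic relational-division query "find employees who work on all projects of a given department". A minimal Python sketch with hypothetical relations:

      # Relations as sets of tuples: (employee, project) and (project, dept).
      works_on = {("alice", "p1"), ("alice", "p2"), ("bob", "p1")}
      projects = {("p1", 5), ("p2", 5), ("p3", 7)}

      dept5 = {p for (p, d) in projects if d == 5}      # the restriction set
      employees = {e for (e, _) in works_on}

      # Restricted universal quantification: forall p in dept5, (e, p) in works_on.
      answer = {e for e in employees if all((e, p) in works_on for p in dept5)}
      print(answer)  # {'alice'}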

Sridha Sridharan - One of the best experts on this subject based on the ideXlab platform.

  • Cross-database audio-visual speech adaptation for phonetic spoken term detection
    Computer Speech & Language, 2017
    Co-Authors: Shahram Kalantari, David Dean, Sridha Sridharan
    Abstract:

    Highlights: the use of visual information helps both phone recognition and spoken term detection accuracy; fused HMM adaptation can be used to benefit from multiple databases when training audio-visual phone models; an additional audio adaptation step improves cross-database training accuracy for phone recognition and spoken term detection; and a post-training step can be used to update all HMM parameters and further improve phone recognition accuracy.

    Spoken term detection (STD), the process of finding all occurrences of a specified search term in a large amount of speech, has many applications in multimedia search and information retrieval. It is known that video information, in the form of lip movements, can improve the performance of STD in the presence of audio noise. However, research in this direction has been hampered by the unavailability of large annotated audio-visual databases for development. We propose a novel approach to audio-visual spoken term detection when only a small (low-resource) audio-visual database is available for development. First, cross-database training is proposed as a novel framework using the fused hidden Markov modeling (HMM) technique: an audio model is trained on large, publicly available audio databases and then adapted to the visual data of the given audio-visual database. This approach is shown to perform better than the standard HMM joint-training method and also improves the performance of spoken term detection when used in the indexing stage. In a second variant, the external audio models are first adapted to the audio data of the given audio-visual database and then to its visual data; this also improves both phone recognition and spoken term detection accuracy. Finally, the cross-database training technique is used as HMM initialization, and an extra parameter re-estimation step is applied to the initialized models using the Baum-Welch technique. The proposed training approaches benefit from both the large out-of-domain audio databases that are available and the small audio-visual database given for development, producing more accurate audio-visual models.
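
    A minimal sketch of the final strategy (initialize from a large-corpus audio model, then run extra Baum-Welch re-estimation on the small in-domain data), assuming the hmmlearn library and feature arrays of matching dimensionality; this is an illustration, not the authors' fused-HMM implementation.

      import numpy as np
      from hmmlearn import hmm

      def adapt_phone_model(audio_model, av_features, lengths, n_iter=5):
          # audio_model: a trained hmm.GaussianHMM (diag covariances) from large audio corpora;
          # av_features: (sum(lengths), D) in-domain features; lengths: per-utterance frame counts.
          adapted = hmm.GaussianHMM(n_components=audio_model.n_components,
                                    covariance_type="diag", n_iter=n_iter,
                                    init_params="")           # keep our initialization below
          adapted.startprob_ = audio_model.startprob_.copy()  # cross-database initialization
          adapted.transmat_ = audio_model.transmat_.copy()
          adapted.means_ = audio_model.means_.copy()
          # hmmlearn's covars_ getter returns full matrices; the diag setter expects diagonals.
          adapted.covars_ = np.array([np.diag(c) for c in audio_model.covars_])
          adapted.fit(av_features, lengths)                   # extra Baum-Welch re-estimation
          return adapted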

J.N. Gowdy - One of the best experts on this subject based on the ideXlab platform.

  • Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus
    EURASIP Journal on Advances in Signal Processing, 2002
    Co-Authors: E.K. Patterson, S. Gurbuz, Z. Tufekci, J.N. Gowdy
    Abstract:

    Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties due to background noise and multiple speakers in an application environment are significantly reduced by the additional information provided by visual features. This paper presents a new audio-visual database, a feature study on moving speakers, and baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison, and efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper introduces a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7,000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper, one of which is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed, comparing stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image-transform method, and a deformable-template scheme are used in this comparison to obtain visual features, and methods for making these techniques more robust to speaker movement are presented. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are given.
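
    The abstract does not detail the feature extractors, but a common image-transform feature of this kind takes a 2-D DCT of a grayscale mouth region and keeps a low-frequency block of coefficients. A small sketch, assuming SciPy and an illustrative crop size:

      import numpy as np
      from scipy.fft import dctn

      def dct_mouth_features(roi, keep=6):
          # roi: 2-D grayscale mouth region; returns keep*keep low-frequency DCT coefficients.
          coeffs = dctn(roi.astype(float), norm="ortho")
          return coeffs[:keep, :keep].ravel()

      # Hypothetical usage on a 32x48 mouth crop:
      roi = np.random.rand(32, 48)
      feat = dct_mouth_features(roi)   # 36-dimensional visual feature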

  • CUAVE: A new audio-visual database for multimodal human-computer interface research
    2002 IEEE International Conference on Acoustics Speech and Signal Processing, 2002
    Co-Authors: E.K. Patterson, S. Gurbuz, Z. Tufekci, J.N. Gowdy
    Abstract:

    Multimodal signal processing has become an important topic of research for overcoming certain problems of audio-only speech processing. Audio-visual speech recognition is one area with great potential. Difficulties due to background noise and multiple speakers are significantly reduced by the additional information provided by visual features. Despite a few efforts to create databases in this area, none has emerged as a standard for comparison, for several possible reasons. This paper introduces a new audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The CUAVE database is a speaker-independent corpus of over 7,000 utterances of both connected and isolated digits. It is designed to meet several goals discussed in this paper, most notably availability of the database, flexibility in the use of the audio-visual data, and realistic considerations in the recordings (such as speaker movement). Another important focus of the database is the inclusion of pairs of simultaneous speakers, the first documented database of this kind. The overall goal of this project is to facilitate more widespread audio-visual research through an easily available database. For information on obtaining CUAVE, please visit our webpage (http://ece.clemson.edu/speech).
