Lexical Information

The Experts below are selected from a list of 4623 Experts worldwide ranked by ideXlab platform

Giuseppe Scanniello - One of the best experts on this subject based on the ideXlab platform.

  • ICSM - Clustering and Lexical Information support for the recovery of design pattern in source code
    2011 27th IEEE International Conference on Software Maintenance (ICSM), 2011
    Co-Authors: Simone Romano, Giuseppe Scanniello, Michele Risi, Carmine Gravino
    Abstract:

    We propose an approach that leverages lexical information and fuzzy clustering to reduce the number of design pattern instances that existing approaches based on structural information (i.e., navigating the dependencies among software elements) erroneously recover in source code. To assess the effectiveness of the techniques, we present the results of a case study conducted on four open source software systems implemented in Java. The data analysis indicates that the use of lexical information and fuzzy clustering improves the correctness of the results achieved by existing design pattern recovery approaches based on structural information, while preserving the number of design pattern instances correctly identified.

  • Investigating the use of Lexical Information for software system clustering
    2011 15th European Conference on Software Maintenance and Reengineering, 2011
    Co-Authors: Anna Corazza, Sergio Di Martino, Valerio Maggio, Giuseppe Scanniello
    Abstract:

    Developers have a lot of freedom in writing comments as well as in choosing identifiers and method names. These are intentional in nature and provide a different relevance of information to understand what a software system implements, and in particular the role of each source file. In this paper we investigate the effectiveness of exploiting lexical information for software system clustering. In particular, we explore the contribution of the combined use of six different dictionaries, corresponding to the six parts of the source code where programmers introduce lexical information, namely: class, attribute, method and parameter names, comments, and source code statements. Their relevance has been weighted by means of a probabilistic model, whose parameters have been estimated by the Expectation-Maximization algorithm. To group source files accordingly we used a hierarchical clustering algorithm. The investigation has been conducted on a dataset of 13 open source Java software systems.

  • A Probabilistic Based Approach towards Software System Clustering
    2010 14th European Conference on Software Maintenance and Reengineering, 2010
    Co-Authors: Anna Corazza, Sergio Di Martino, Giuseppe Scanniello
    Abstract:

    In this paper we present a clustering-based approach to partition software systems into meaningful subsystems. In particular, the approach uses lexical information extracted from four zones in Java classes, which may provide a different contribution towards software systems partitioning. To automatically weigh these zones, we introduced a probabilistic model, and applied the Expectation-Maximization (EM) algorithm. To group classes according to the considered lexical information, we customized the well-known K-Medoids algorithm. To assess the approach and the implemented supporting system, we have conducted a case study on six open source software systems.
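
The last abstract above mentions a customized K-Medoids algorithm without describing the customization. As a rough illustration only, the following is a plain (uncustomized) K-Medoids sketch over a precomputed distance matrix. The function names `assign` and `k_medoids` and the toy data are assumptions made for this example, not the authors' implementation.

```python
import random


def assign(dist, medoids):
    """Assign every item to its nearest medoid (ties go to the first medoid)."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        nearest = min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100, seed=0):
    """Plain K-Medoids on a symmetric distance matrix (assumes distinct items)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to the other members of that cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    clusters = assign(dist, medoids)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: six items on a line, forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(k_medoids(dist, 2))
```

In a setting like the one the abstract describes, `dist` would presumably come from a lexical dissimilarity between classes rather than from numeric coordinates.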

Anna Corazza - One of the best experts on this subject based on the ideXlab platform.

  • Investigating the use of Lexical Information for software system clustering
    2011 15th European Conference on Software Maintenance and Reengineering, 2011
    Co-Authors: Anna Corazza, Sergio Di Martino, Valerio Maggio, Giuseppe Scanniello
    Abstract:

    Developers have a lot of freedom in writing comments as well as in choosing identifiers and method names. These are intentional in nature and provide a different relevance of information to understand what a software system implements, and in particular the role of each source file. In this paper we investigate the effectiveness of exploiting lexical information for software system clustering. In particular, we explore the contribution of the combined use of six different dictionaries, corresponding to the six parts of the source code where programmers introduce lexical information, namely: class, attribute, method and parameter names, comments, and source code statements. Their relevance has been weighted by means of a probabilistic model, whose parameters have been estimated by the Expectation-Maximization algorithm. To group source files accordingly we used a hierarchical clustering algorithm. The investigation has been conducted on a dataset of 13 open source Java software systems.

  • CSMR - A Probabilistic Based Approach towards Software System Clustering
    2010 14th European Conference on Software Maintenance and Reengineering, 2010
    Co-Authors: Anna Corazza, Sergio Di Martino, Giuseppe Scanniello
    Abstract:

    In this paper we present a clustering-based approach to partition software systems into meaningful subsystems. In particular, the approach uses lexical information extracted from four zones in Java classes, which may provide a different contribution towards software systems partitioning. To automatically weigh these zones, we introduced a probabilistic model, and applied the Expectation-Maximization (EM) algorithm. To group classes according to the considered lexical information, we customized the well-known K-Medoids algorithm. To assess the approach and the implemented supporting system, we have conducted a case study on six open source software systems.
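
Both abstracts above weigh lexical information by the part of the source code (zone) it comes from. The sketch below shows one way such a zone-weighted similarity between two classes could look. It is an illustration under stated assumptions: the zone names (`class_name`, `method_names`), the identifier-splitting rule, and the use of cosine similarity are hypothetical choices, and the weights are fixed by hand here, whereas the abstracts estimate them with the Expectation-Maximization algorithm.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split camelCase / snake_case identifiers into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def zone_vectors(zones):
    """Build one term-frequency vector per zone from raw names or text."""
    return {
        zone: Counter(t for text in texts for t in split_identifier(text))
        for zone, texts in zones.items()
    }


def weighted_similarity(zones_a, zones_b, weights):
    """Sum per-zone cosine similarities, each scaled by its zone weight."""
    va, vb = zone_vectors(zones_a), zone_vectors(zones_b)
    return sum(
        w * cosine(va.get(zone, Counter()), vb.get(zone, Counter()))
        for zone, w in weights.items()
    )


# Hypothetical zones and hand-picked weights (not estimated by EM).
reader = {"class_name": ["FileReader"], "method_names": ["readLine", "close"]}
writer = {"class_name": ["FileWriter"], "method_names": ["writeLine", "close"]}
weights = {"class_name": 0.5, "method_names": 0.5}
print(weighted_similarity(reader, writer, weights))
```

One minus this similarity could serve as the dissimilarity fed to a clustering algorithm; how the original work combines the zones is not stated in the abstracts.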

Sergio Di Martino - One of the best experts on this subject based on the ideXlab platform.

  • Investigating the use of Lexical Information for software system clustering
    2011 15th European Conference on Software Maintenance and Reengineering, 2011
    Co-Authors: Anna Corazza, Sergio Di Martino, Valerio Maggio, Giuseppe Scanniello
    Abstract:

    Developers have a lot of freedom in writing comments as well as in choosing identifiers and method names. These are intentional in nature and provide a different relevance of information to understand what a software system implements, and in particular the role of each source file. In this paper we investigate the effectiveness of exploiting lexical information for software system clustering. In particular, we explore the contribution of the combined use of six different dictionaries, corresponding to the six parts of the source code where programmers introduce lexical information, namely: class, attribute, method and parameter names, comments, and source code statements. Their relevance has been weighted by means of a probabilistic model, whose parameters have been estimated by the Expectation-Maximization algorithm. To group source files accordingly we used a hierarchical clustering algorithm. The investigation has been conducted on a dataset of 13 open source Java software systems.

  • CSMR - A Probabilistic Based Approach towards Software System Clustering
    2010 14th European Conference on Software Maintenance and Reengineering, 2010
    Co-Authors: Anna Corazza, Sergio Di Martino, Giuseppe Scanniello
    Abstract:

    In this paper we present a clustering-based approach to partition software systems into meaningful subsystems. In particular, the approach uses lexical information extracted from four zones in Java classes, which may provide a different contribution towards software systems partitioning. To automatically weigh these zones, we introduced a probabilistic model, and applied the Expectation-Maximization (EM) algorithm. To group classes according to the considered lexical information, we customized the well-known K-Medoids algorithm. To assess the approach and the implemented supporting system, we have conducted a case study on six open source software systems.
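
The first entry in this list groups source files with a hierarchical clustering algorithm but does not say which linkage is used. A minimal agglomerative sketch, assuming average linkage and a precomputed distance matrix, could look like the following. The function name `agglomerative` and the toy data are illustrative assumptions.

```python
def agglomerative(dist, num_clusters):
    """Bottom-up clustering with average linkage on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge it.
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Toy data: six items on a line, forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(agglomerative(dist, 2))
```

Stopping at a fixed number of clusters is a simplification; a full hierarchical method would keep the whole merge tree and cut it afterwards.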

Shrikanth S Narayanan - One of the best experts on this subject based on the ideXlab platform.

  • INTERSPEECH - Speaker Diarization with Lexical Information
    Interspeech 2019, 2019
    Co-Authors: Tae Jin Park, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis G Georgiou, Shrikanth S Narayanan
    Abstract:

    This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.
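
The abstract above does not spell out how the lexical and acoustic adjacency matrices are integrated. The sketch below shows one simple possibility, a convex combination of the two matrices followed by the graph Laplacian that a spectral clustering step would then decompose. Every detail here is an assumption for illustration: the mixing weight `alpha`, the rule that links consecutive segments by one minus the turn probability, and all function names.

```python
def lexical_adjacency(turn_probs):
    """Link consecutive segments by 1 - P(speaker turn between them).

    turn_probs[i] is the assumed probability of a speaker change between
    segment i and segment i + 1.
    """
    n = len(turn_probs) + 1
    adj = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(turn_probs):
        adj[i][i + 1] = adj[i + 1][i] = 1.0 - p
    return adj


def integrate_adjacency(acoustic, lexical, alpha=0.5):
    """Convex combination of two symmetric n x n adjacency matrices."""
    n = len(acoustic)
    return [
        [alpha * acoustic[i][j] + (1.0 - alpha) * lexical[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy example with three segments: hypothetical embedding affinities and
# hypothetical word-level turn probabilities.
acoustic = [[0.0, 0.8, 0.2],
            [0.8, 0.0, 0.3],
            [0.2, 0.3, 0.0]]
lexical = lexical_adjacency([0.1, 0.9])
combined = integrate_adjacency(acoustic, lexical, alpha=0.5)
for row in laplacian(combined):
    print(row)
```

The eigenvectors of this Laplacian belonging to its smallest eigenvalues would give the embedding that spectral clustering partitions; that step is omitted here to keep the example dependency-free.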

Elizabeth Shriberg - One of the best experts on this subject based on the ideXlab platform.

  • Using prosodic and Lexical Information for speaker identification
    2002 IEEE International Conference on Acoustics Speech and Signal Processing, 2002
    Co-Authors: Frederick Weber, Linda Manganaro, Barbara Peskin, Elizabeth Shriberg
    Abstract:

    We investigate the incorporation of larger time-scale information, such as prosody, into standard speaker ID systems. Our study is based on the Extended Data Task of the NIST 2001 Speaker ID evaluation, which provides much more test and training data than has traditionally been available to similar speaker ID investigations. In addition, we have had access to a detailed prosodic feature database of Switchboard-I conversations, including data not previously applied to speaker ID. We describe two baseline acoustic systems, an approach using Gaussian Mixture Models, and an LVCSR-based speaker ID system. These results are compared to and combined with two larger time-scale systems: a system based on an “idiolect” language model, and a system making use of the contents of the prosody database. We find that, with sufficient test and training data, suprasegmental information can significantly enhance the performance of traditional speaker ID systems.
