Lexical Information - Explore the Science & Experts

The experts below are selected from a list of 4,623 experts worldwide ranked by the ideXlab platform.

Giuseppe Scanniello - One of the best experts on this subject based on the ideXlab platform.

ICSM - Clustering and Lexical Information support for the recovery of design pattern in source code

2011 27th IEEE International Conference on Software Maintenance (ICSM), 2011

Co-Authors: Simone Romano, Giuseppe Scanniello, Michele Risi, Carmine Gravino

Abstract:

We propose an approach that leverages lexical information and fuzzy clustering to reduce the number of the design pattern instances that existing approaches based on structural information (i.e., navigating the dependencies among software elements) erroneously recover in source code. To assess the effectiveness of the techniques, we present the results of a case study conducted on four open source software systems implemented in Java. The data analysis indicates that the use of lexical information and fuzzy clustering improves the correctness of the results achieved by existing design pattern recovery approaches based on structural information, while preserving the number of design pattern instances correctly identified.

Investigating the use of Lexical Information for software system clustering

2011 15th European Conference on Software Maintenance and Reengineering, 2011

Co-Authors: Anna Corazza, Sergio Di Martino, Valerio Maggio, Giuseppe Scanniello

Abstract:

Developers have a lot of freedom in writing comments as well as in choosing identifiers and method names. These are intentional in nature and provide a different relevance of information to understand what a software system implements, and in particular the role of each source file. In this paper we investigate the effectiveness of exploiting lexical information for software system clustering. In particular we explore the contribution of the combined use of six different dictionaries, corresponding to the six parts of the source code where programmers introduce lexical information, namely: class, attribute, method and parameter names, comments, and source code statements. Their relevance has been weighted by means of a probabilistic model, whose parameters have been estimated by the Expectation-Maximization algorithm. To group source files accordingly we used a hierarchical clustering algorithm. The investigation has been conducted on a dataset of 13 open source Java software systems.
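
The pipeline this abstract outlines (per-zone vocabularies, zone weights, hierarchical grouping) can be illustrated with a minimal Python sketch. It is not the authors' implementation: only three of the six zones are used, the values in ZONE_WEIGHTS are hand-picked placeholders standing in for the EM-estimated weights, and cosine similarity with average linkage and a stopping threshold are assumptions chosen for brevity. The file names and contents in FILES are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical zone weights. The paper estimates such weights with EM;
# that estimation step is not reproduced here.
ZONE_WEIGHTS = {"class": 3.0, "method": 2.0, "comment": 1.0}


def split_identifier(text):
    """Split camelCase / snake_case text into lowercase word tokens."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return [w.lower() for w in words]


def file_vector(zones):
    """Build one weighted term-frequency vector from a {zone: text} mapping."""
    vector = Counter()
    for zone, text in zones.items():
        weight = ZONE_WEIGHTS.get(zone, 1.0)
        for token in split_identifier(text):
            vector[token] += weight
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(files, threshold=0.3):
    """Average-linkage agglomerative clustering; merging stops below `threshold`."""
    vectors = {name: file_vector(zones) for name, zones in files.items()}
    clusters = [[name] for name in files]

    def linkage(c1, c2):
        sims = [cosine(vectors[x], vectors[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        best, i, j = max(pairs)
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical input: lexical content per zone for four source files.
FILES = {
    "OrderService.java": {"class": "OrderService", "method": "placeOrder cancelOrder",
                          "comment": "handles customer orders"},
    "OrderRepository.java": {"class": "OrderRepository", "method": "saveOrder findOrder",
                             "comment": "stores orders"},
    "PdfExporter.java": {"class": "PdfExporter", "method": "exportPdf renderPage",
                         "comment": "writes pdf report"},
    "PdfRenderer.java": {"class": "PdfRenderer", "method": "renderPage drawText",
                         "comment": "renders pdf page"},
}

if __name__ == "__main__":
    for group in cluster(FILES):
        print(group)
```

With these placeholder weights a term in a class name counts three times as much as the same term in a comment, which is the kind of per-zone relevance the probabilistic model is meant to learn from data rather than fix by hand.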

A Probabilistic Based Approach towards Software System Clustering

2010 14th European Conference on Software Maintenance and Reengineering, 2010

Co-Authors: Anna Corazza, Sergio Di Martino, Giuseppe Scanniello

Abstract:

In this paper we present a clustering based approach to partition software systems into meaningful subsystems. In particular, the approach uses lexical information extracted from four zones in Java classes, which may provide a different contribution towards software systems partitioning. To automatically weigh these zones, we introduced a probabilistic model, and applied the Expectation-Maximization (EM) algorithm. To group classes according to the considered lexical information, we customized the well-known K-Medoids algorithm. To assess the approach and the implemented supporting system, we have conducted a case study on six open source software systems.
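
For readers unfamiliar with K-Medoids, the following is a minimal Python sketch of the plain alternating variant over a precomputed distance matrix. It does not reproduce the customization the abstract mentions: the Jaccard distance over token sets, the naive initialisation, and the token lists in TOKENS are all assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def k_medoids(distance, k, max_iter=100):
    """Plain alternating K-Medoids over a symmetric distance matrix.

    Returns (medoids, labels), where labels[i] is an index into medoids.
    """
    n = len(distance)
    medoids = list(range(k))  # naive deterministic initialisation (an assumption)

    def assign(meds):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: distance[i][meds[c]]) for i in range(n)]

    labels = assign(medoids)
    for _ in range(max_iter):
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(distance[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
        labels = assign(medoids)
    return medoids, labels


# Hypothetical lexical tokens extracted from five Java classes.
TOKENS = [
    ["order", "service", "place"],
    ["pdf", "export", "page"],
    ["order", "repository", "save"],
    ["pdf", "render", "page"],
    ["order", "invoice", "save"],
]
DISTANCE = [[jaccard_distance(a, b) for b in TOKENS] for a in TOKENS]

if __name__ == "__main__":
    medoids, labels = k_medoids(DISTANCE, k=2)
    print("medoids:", medoids)
    print("labels:", labels)
```

Unlike K-Means, the cluster representative is always an actual item, so the method only needs pairwise distances and works with any lexical dissimilarity measure.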

Anna Corazza - One of the best experts on this subject based on the ideXlab platform.

Investigating the use of Lexical Information for software system clustering

2011 15th European Conference on Software Maintenance and Reengineering, 2011

Co-Authors: Anna Corazza, Sergio Di Martino, Valerio Maggio, Giuseppe Scanniello

A Probabilistic Based Approach towards Software System Clustering

2010 14th European Conference on Software Maintenance and Reengineering, 2010

Co-Authors: Anna Corazza, Sergio Di Martino, Giuseppe Scanniello

Sergio Di Martino - One of the best experts on this subject based on the ideXlab platform.

Investigating the use of Lexical Information for software system clustering

2011 15th European Conference on Software Maintenance and Reengineering, 2011

Co-Authors: Anna Corazza, Sergio Di Martino, Valerio Maggio, Giuseppe Scanniello

A Probabilistic Based Approach towards Software System Clustering

2010 14th European Conference on Software Maintenance and Reengineering, 2010

Co-Authors: Anna Corazza, Sergio Di Martino, Giuseppe Scanniello

Shrikanth S Narayanan - One of the best experts on this subject based on the ideXlab platform.

Speaker Diarization with Lexical Information

Conference of the International Speech Communication Association, 2019

Co-Authors: Tae Jin Park, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis G Georgiou, Shrikanth S Narayanan

Abstract:

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.
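
The idea of integrating two adjacency matrices before spectral clustering can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's method: the abstract does not give the combination rule, so a weighted average with a hypothetical mixing weight `alpha` is used, and the ACOUSTIC and LEXICAL matrices are made-up values for four segments. The clustering step is the two-speaker simplification of spectral clustering, which splits items by the sign of the second eigenvector of the normalised affinity matrix instead of running k-means on several eigenvectors.

```python
import math


def integrate(acoustic, lexical, alpha=0.5):
    """Blend two symmetric affinity matrices; `alpha` is a hypothetical mixing weight."""
    n = len(acoustic)
    return [[alpha * acoustic[i][j] + (1 - alpha) * lexical[i][j] for j in range(n)]
            for i in range(n)]


def spectral_bipartition(affinity, iterations=500):
    """Two-way spectral split using the sign of the second eigenvector of
    M = D^-1/2 A D^-1/2, found by power iteration with deflation.

    Assumes non-negative affinities and a positive degree for every item.
    """
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    root = [math.sqrt(d) for d in degree]
    # Work on (M + I) / 2 so that all eigenvalues lie in [0, 1].
    shifted = [[0.5 * (affinity[i][j] / (root[i] * root[j]) + (1.0 if i == j else 0.0))
                for j in range(n)] for i in range(n)]
    norm = math.sqrt(sum(degree))
    top = [r / norm for r in root]  # known leading eigenvector of M
    vec = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        vec = [sum(shifted[i][j] * vec[j] for j in range(n)) for i in range(n)]
        proj = sum(v * t for v, t in zip(vec, top))
        vec = [v - proj * t for v, t in zip(vec, top)]  # remove the leading component
        length = math.sqrt(sum(v * v for v in vec))
        vec = [v / length for v in vec]
    # Item 0 always receives label 0; items with the opposite sign get label 1.
    return [0 if v * vec[0] >= 0 else 1 for v in vec]


# Hypothetical affinities for four consecutive segments.
# ACOUSTIC: similarity of speaker embeddings.
ACOUSTIC = [
    [1.0, 0.6, 0.5, 0.2],
    [0.6, 1.0, 0.6, 0.3],
    [0.5, 0.6, 1.0, 0.6],
    [0.2, 0.3, 0.6, 1.0],
]
# LEXICAL: high where a speaker turn between neighbouring segments is unlikely.
LEXICAL = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]

if __name__ == "__main__":
    print(spectral_bipartition(integrate(ACOUSTIC, LEXICAL)))
```

In this toy setup the lexical matrix strengthens the links inside each speaker turn and weakens the link across the suspected turn boundary, so the combined graph has a cleaner two-block structure than the acoustic affinities alone.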

Elizabeth Shriberg - One of the best experts on this subject based on the ideXlab platform.

Using prosodic and Lexical Information for speaker identification

2002 IEEE International Conference on Acoustics Speech and Signal Processing, 2002

Co-Authors: Frederick Weber, Linda Manganaro, Barbara Peskin, Elizabeth Shriberg

Abstract:

We investigate the incorporation of larger time-scale information, such as prosody, into standard speaker ID systems. Our study is based on the Extended Data Task of the NIST 2001 Speaker ID evaluation, which provides much more test and training data than has traditionally been available to similar speaker ID investigations. In addition, we have had access to a detailed prosodic feature database of Switchboard-I conversations, including data not previously applied to speaker ID. We describe two baseline acoustic systems, an approach using Gaussian Mixture Models, and an LVCSR-based speaker ID system. These results are compared to and combined with two larger time-scale systems: a system based on an “idiolect” language model, and a system making use of the contents of the prosody database. We find that, with sufficient test and training data, suprasegmental information can significantly enhance the performance of traditional speaker ID systems.
