Abstract Syntax

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Xin Xia - One of the best experts on this subject based on the ideXlab platform.

  • a differential testing approach for evaluating Abstract Syntax tree mapping algorithms
    International Conference on Software Engineering, 2021
    Co-Authors: Yuanrui Fan, Xin Xia, Ahmed E Hassan, Yuan Wang
    Abstract:

    Abstract Syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite the foundational role of AST mapping algorithms, little effort has been made to evaluate the accuracy of AST mapping algorithms, i.e., the extent to which an algorithm captures the evolution of code. We observe that a program element often has only one best-mapped program element. Based on this observation, we propose a hierarchical approach to automatically compare the similarity of mapped statements and tokens by different algorithms. By performing the comparison, we determine if each of the compared algorithms generates inaccurate mappings for a statement or its tokens. We invite 12 external experts to determine if three commonly used AST mapping algorithms generate accurate mappings for a statement and its tokens for 200 statements. Based on the experts' feedback, we observe that our approach achieves a precision of 0.98–1.00 and a recall of 0.65–0.75. Furthermore, we conduct a large-scale study with a dataset of ten Java projects containing a total of 263,165 file revisions. Our approach determines that GumTree, MTDiff and IJM generate inaccurate mappings for 20%–29%, 25%–36% and 21%–30% of the file revisions, respectively. Our experimental results show that state-of-the-art AST mapping algorithms still need improvements.

  • detecting code clones with graph neural network and flow augmented Abstract Syntax tree
    IEEE International Conference on Software Analysis Evolution and Reengineering, 2020
    Co-Authors: Wenhan Wang, Xin Xia, Zhi Jin
    Abstract:

    Code clones are semantically similar code fragments pairs that are syntactically similar or different. Detection of code clones can help to reduce the cost of software maintenance and prevent bugs. Numerous approaches of detecting code clones have been proposed previously, but most of them focus on detecting syntactic clones and do not work well on semantic clones with different syntactic features. To detect semantic clones, researchers have tried to adopt deep learning for code clone detection to automatically learn latent semantic features from data. Especially, to leverage grammar information, several approaches used Abstract Syntax trees (AST) as input and achieved significant progress on code clone benchmarks in various programming languages. However, these AST-based approaches still can not fully leverage the structural information of code fragments, especially semantic information such as control flow and data flow. To leverage control and data flow information, in this paper, we build a graph representation of programs called flow-augmented Abstract Syntax tree (FA-AST). We construct FA-AST by augmenting original ASTs with explicit control and data flow edges. Then we apply two different types of graph neural networks (GNN) on FA-AST to measure the similarity of code pairs. As far as we have concerned, we are the first to apply graph neural networks on the domain of code clone detection. We apply our FA-AST and graph neural networks on two Java datasets: Google Code Jam and BigCloneBench. Our approach outperforms the state-of-the-art approaches on both Google Code Jam and BigCloneBench tasks.

Yuan Wang - One of the best experts on this subject based on the ideXlab platform.

  • a differential testing approach for evaluating Abstract Syntax tree mapping algorithms
    International Conference on Software Engineering, 2021
    Co-Authors: Yuanrui Fan, Xin Xia, Ahmed E Hassan, Yuan Wang
    Abstract:

    Abstract Syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite the foundational role of AST mapping algorithms, little effort has been made to evaluate the accuracy of AST mapping algorithms, i.e., the extent to which an algorithm captures the evolution of code. We observe that a program element often has only one best-mapped program element. Based on this observation, we propose a hierarchical approach to automatically compare the similarity of mapped statements and tokens by different algorithms. By performing the comparison, we determine if each of the compared algorithms generates inaccurate mappings for a statement or its tokens. We invite 12 external experts to determine if three commonly used AST mapping algorithms generate accurate mappings for a statement and its tokens for 200 statements. Based on the experts' feedback, we observe that our approach achieves a precision of 0.98–1.00 and a recall of 0.65–0.75. Furthermore, we conduct a large-scale study with a dataset of ten Java projects containing a total of 263,165 file revisions. Our approach determines that GumTree, MTDiff and IJM generate inaccurate mappings for 20%–29%, 25%–36% and 21%–30% of the file revisions, respectively. Our experimental results show that state-of-the-art AST mapping algorithms still need improvements.

Zhaoyi Liu - One of the best experts on this subject based on the ideXlab platform.

  • code summarization with Abstract Syntax tree
    International Conference on Neural Information Processing, 2019
    Co-Authors: Qiuyuan Chen, Zhaoyi Liu
    Abstract:

    Code summarization, which provides a high-level description of the function implemented by code, plays a vital role in software maintenance and code retrieval. Traditional approaches focus on retrieving similar code snippets to generate summaries, and recently researchers pay increasing attention to leverage deep learning approaches, especially the encoder-decoder framework. Approaches based on encoder-decoder suffer from two drawbacks: (a) Lack of summarization in functionality level; (b) Code snippets are always too long (more than ten words), regular encoders perform poorly. In this paper, we propose a novel code representation with the help of Abstract Syntax Trees, which could describe the functionality of code snippets and shortens the length of inputs. Based on our proposed code representation, we develop Generative Task, which aims to generate summary sentences of code snippets. Experiments on large-scale real-world industrial Java projects indicate that our approaches are effective and outperform the state-of-the-art approaches in code summarization.

L Bier - One of the best experts on this subject based on the ideXlab platform.

  • clone detection using Abstract Syntax trees
    International Conference on Software Maintenance, 1998
    Co-Authors: Ira D Baxter, A Yahin, Leonardo De Moura, Marcelo Santanna, L Bier
    Abstract:

    Existing research suggests that a considerable fraction (5-10%) of the source code of large scale computer programs is duplicate code ("clones"). Detection and removal of such clones promises decreased software maintenance costs of possibly the same magnitude. Previous work was limited to detection of either near misses differing only in single lexems, or near misses only between complete functions. The paper presents simple and practical methods for detecting exact and near miss clones over arbitrary program fragments in program source code by using Abstract Syntax trees. Previous work also did not suggest practical means for removing detected clones. Since our methods operate in terms of the program structure, clones could be removed by mechanical methods producing in-lined procedures or standard preprocessor macros. A tool using these techniques is applied to a C production software system of some 400 K source lines, and the results confirm detected levels of duplication found by previous work. The tool produces macro bodies needed for clone removal, and macro invocations to replace the clones. The tool uses a variation of the well known compiler method for detecting common sub expressions. This method determines exact tree matches; a number of adjustments are needed to detect equivalent statement sequences, commutative operands, and nearly exact matches. We additionally suggest that clone detection could also be useful in producing more structured code, and in reverse engineering to discover domain concepts and their implementations.

Yuanrui Fan - One of the best experts on this subject based on the ideXlab platform.

  • a differential testing approach for evaluating Abstract Syntax tree mapping algorithms
    International Conference on Software Engineering, 2021
    Co-Authors: Yuanrui Fan, Xin Xia, Ahmed E Hassan, Yuan Wang
    Abstract:

    Abstract Syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite the foundational role of AST mapping algorithms, little effort has been made to evaluate the accuracy of AST mapping algorithms, i.e., the extent to which an algorithm captures the evolution of code. We observe that a program element often has only one best-mapped program element. Based on this observation, we propose a hierarchical approach to automatically compare the similarity of mapped statements and tokens by different algorithms. By performing the comparison, we determine if each of the compared algorithms generates inaccurate mappings for a statement or its tokens. We invite 12 external experts to determine if three commonly used AST mapping algorithms generate accurate mappings for a statement and its tokens for 200 statements. Based on the experts' feedback, we observe that our approach achieves a precision of 0.98–1.00 and a recall of 0.65–0.75. Furthermore, we conduct a large-scale study with a dataset of ten Java projects containing a total of 263,165 file revisions. Our approach determines that GumTree, MTDiff and IJM generate inaccurate mappings for 20%–29%, 25%–36% and 21%–30% of the file revisions, respectively. Our experimental results show that state-of-the-art AST mapping algorithms still need improvements.