Undefined Behavior

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 270 Experts worldwide ranked by ideXlab platform

John Regehr - One of the best experts on this subject based on the ideXlab platform.

  • Random testing for C and C++ compilers with YARPGen
    Proceedings of the ACM on Programming Languages, 2020
    Co-Authors: Vsevolod Livinskii, Babokin Dmitry Yurievich, John Regehr
    Abstract:

    Compilers should not crash and they should not miscompile applications. Random testing is an effective method for finding compiler bugs that have escaped other kinds of testing. This paper presents Yet Another Random Program Generator (YARPGen), a random test-case generator for C and C++ that we used to find and report more than 220 bugs in GCC, LLVM, and the Intel® C++ Compiler. Our research contributions include a method for generating expressive programs that avoid Undefined Behavior without using dynamic checks, and generation policies, a mechanism for increasing diversity of generated code and for triggering more optimizations. Generation policies decrease the testing time to find hard-to-trigger compiler bugs and, for the kinds of scalar optimizations YARPGen was designed to stress-test, increase the number of times these optimizations are applied by the compiler by an average of 20% for LLVM and 40% for GCC. We also created tools for automating most of the common tasks related to compiler fuzzing; these tools are also useful for fuzzers other than ours.

  • Practical verification of peephole optimizations with Alive
    Communications of The ACM, 2018
    Co-Authors: Nuno P. Lopes, David Menendez, Santosh Nagarakatte, John Regehr
    Abstract:

    Compilers should not miscompile. Peephole optimizations, which perform local rewriting of the input program to improve the efficiency of generated code, are a persistent source of compiler bugs. We created Alive, a domain-specific language for writing optimizations and for automatically either proving them correct or else generating counterexamples. Furthermore, Alive can be automatically translated into C++ code that is suitable for inclusion in an LLVM optimization pass. Alive is based on an attempt to balance usability and formal methods; for example, it captures---but largely hides---the detailed semantics of the various kinds of Undefined Behavior. Alive has found numerous bugs in the LLVM compiler and is being used by LLVM developers.

  • Taming Undefined Behavior in LLVM
    ACM SIGPLAN Notices, 2017
    Co-Authors: Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-kil Hur, Sanjoy Das, David Majnemer, John Regehr, Nuno P. Lopes
    Abstract:

    A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make it easy to perform transformations, and should also afford efficient and precise static analysis. In this paper we study an aspect of IR design that has received little attention: the role of Undefined Behavior. The IR for every optimizing compiler we have looked at, including GCC, LLVM, Intel's, and Microsoft's, supports one or more forms of Undefined Behavior (UB), not only to reflect the semantics of UB-heavy programming languages such as C and C++, but also to model inherently unsafe low-level operations such as memory stores and to avoid over-constraining IR semantics to the point that desirable transformations become illegal. The current semantics of LLVM's IR fails to justify some cases of loop unswitching, global value numbering, and other important "textbook" optimizations, causing long-standing bugs. We present solutions to the problems we have identified in LLVM's IR and show that most optimizations currently in LLVM remain sound, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or performance of generated code.

  • PLDI - Taming Undefined Behavior in LLVM
    Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2017
    Co-Authors: Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-kil Hur, Sanjoy Das, David Majnemer, John Regehr, Nuno P. Lopes
    Abstract:

    A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make it easy to perform transformations, and should also afford efficient and precise static analysis. In this paper we study an aspect of IR design that has received little attention: the role of Undefined Behavior. The IR for every optimizing compiler we have looked at, including GCC, LLVM, Intel's, and Microsoft's, supports one or more forms of Undefined Behavior (UB), not only to reflect the semantics of UB-heavy programming languages such as C and C++, but also to model inherently unsafe low-level operations such as memory stores and to avoid over-constraining IR semantics to the point that desirable transformations become illegal. The current semantics of LLVM's IR fails to justify some cases of loop unswitching, global value numbering, and other important "textbook" optimizations, causing long-standing bugs. We present solutions to the problems we have identified in LLVM's IR and show that most optimizations currently in LLVM remain sound, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or performance of generated code.

  • Provably correct peephole optimizations with alive
    ACM SIGPLAN Notices, 2015
    Co-Authors: Nuno P. Lopes, David Menendez, Santosh Nagarakatte, John Regehr
    Abstract:

    Compilers should not miscompile. Our work addresses problems in developing peephole optimizations that perform local rewriting to improve the efficiency of LLVM code. These optimizations are individually difficult to get right, particularly in the presence of Undefined Behavior; taken together they represent a persistent source of bugs. This paper presents Alive, a domain-specific language for writing optimizations and for automatically either proving them correct or else generating counterexamples. Furthermore, Alive can be automatically translated into C++ code that is suitable for inclusion in an LLVM optimization pass. Alive is based on an attempt to balance usability and formal methods; for example, it captures---but largely hides---the detailed semantics of three different kinds of Undefined Behavior in LLVM. We have translated more than 300 LLVM optimizations into Alive and, in the process, found that eight of them were wrong.

Grigore Rosu - One of the best experts on this subject based on the ideXlab platform.

  • Dealing With C's Original Sin
    IEEE Software, 2019
    Co-Authors: Chris Hathhorn, Grigore Rosu
    Abstract:

    In the very early days of C, the compiler written by Dennis Ritchie and supplied with the UNIX operating system entirely defined the language. As the number of users and C implementations grew, however, so too did the need for a language standard-a contract between users and implementers about what should and should not count as C. This effort began in 1983 with the formation of a committee tasked with producing "an unambiguous and machine-independent definition of the language C" and led to the ANSI C Standard in 1989.1 In retrospect, it was not until this date, 17 years after the first compiler, when C's most notorious language feature slithered into the world: Undefined Behavior .

  • RV - Runtime Verification at Work: A Tutorial
    Runtime Verification, 2016
    Co-Authors: Philip Daian, Chris Hathhorn, Dwight Guth, Manasvi Saxena, Edgar Pek, Traian Florin Şerbănuţă, Grigore Rosu
    Abstract:

    We present a suite of runtime verification tools developed by Runtime Verification Inc.: RV-Match, RV-Predict, and RV-Monitor. RV-Match is a tool for checking C programs for Undefined Behavior and other common programmer mistakes. It is extracted from the most complete formal semantics of the C11 language and beats many similar tools in its ability to catch a broad range of undesirable Behaviors. RV-Predict is a dynamic data race detector for Java and C/C++ programs. It is perhaps the only tool that is both sound and maximal: it only reports real races and it can find all races that can be found by any other sound data race detector analyzing the same execution trace. RV-Monitor is a runtime monitoring tool that checks and enforces safety and security properties during program execution. Our tools focus on reporting no false positives and are free for non-commercial use.

  • CAV (1) - RV-match: Practical semantics-based program analysis
    Computer Aided Verification, 2016
    Co-Authors: Dwight Guth, Chris Hathhorn, Manasvi Saxena, Grigore Rosu
    Abstract:

    We present RV-Match, a tool for checking C programs for Undefined Behavior and other common programmer mistakes. Our tool is extracted from the most complete formal semantics of the C11 language. Previous versions of this tool were used primarily for testing the correctness of the semantics, but we have improved it into a tool for doing practical analysis of real C programs. It beats many similar tools in its ability to catch a broad range of undesirable Behaviors. We demonstrate this with comparisons based on a third-party benchmark.

  • PLDI - Defining the Undefinedness of C
    Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015
    Co-Authors: Chris Hathhorn, Chucky Ellison, Grigore Rosu
    Abstract:

    We present a ``negative'' semantics of the C11 language---a semantics that does not just give meaning to correct programs, but also rejects Undefined programs. We investigate Undefined Behavior in C and discuss the techniques and special considerations needed for formally specifying it. We have used these techniques to modify and extend a semantics of C into one that captures Undefined Behavior. The amount of semantic infrastructure and effort required to achieve this was unexpectedly high, in the end nearly doubling the size of the original semantics. From our semantics, we have automatically extracted an Undefinedness checker, which we evaluate against other popular analysis tools, using our own test suite in addition to a third-party test suite. Our checker is capable of detecting examples of all 77 categories of core language Undefinedness appearing in the C11 standard, more than any other tool we considered. Based on this evaluation, we argue that our work is the most comprehensive and complete semantic treatment of Undefined Behavior in C, and thus of the C language itself.

Nuno P. Lopes - One of the best experts on this subject based on the ideXlab platform.

  • Practical verification of peephole optimizations with Alive
    Communications of The ACM, 2018
    Co-Authors: Nuno P. Lopes, David Menendez, Santosh Nagarakatte, John Regehr
    Abstract:

    Compilers should not miscompile. Peephole optimizations, which perform local rewriting of the input program to improve the efficiency of generated code, are a persistent source of compiler bugs. We created Alive, a domain-specific language for writing optimizations and for automatically either proving them correct or else generating counterexamples. Furthermore, Alive can be automatically translated into C++ code that is suitable for inclusion in an LLVM optimization pass. Alive is based on an attempt to balance usability and formal methods; for example, it captures---but largely hides---the detailed semantics of the various kinds of Undefined Behavior. Alive has found numerous bugs in the LLVM compiler and is being used by LLVM developers.

  • Taming Undefined Behavior in LLVM
    ACM SIGPLAN Notices, 2017
    Co-Authors: Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-kil Hur, Sanjoy Das, David Majnemer, John Regehr, Nuno P. Lopes
    Abstract:

    A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make it easy to perform transformations, and should also afford efficient and precise static analysis. In this paper we study an aspect of IR design that has received little attention: the role of Undefined Behavior. The IR for every optimizing compiler we have looked at, including GCC, LLVM, Intel's, and Microsoft's, supports one or more forms of Undefined Behavior (UB), not only to reflect the semantics of UB-heavy programming languages such as C and C++, but also to model inherently unsafe low-level operations such as memory stores and to avoid over-constraining IR semantics to the point that desirable transformations become illegal. The current semantics of LLVM's IR fails to justify some cases of loop unswitching, global value numbering, and other important "textbook" optimizations, causing long-standing bugs. We present solutions to the problems we have identified in LLVM's IR and show that most optimizations currently in LLVM remain sound, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or performance of generated code.

  • PLDI - Taming Undefined Behavior in LLVM
    Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2017
    Co-Authors: Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-kil Hur, Sanjoy Das, David Majnemer, John Regehr, Nuno P. Lopes
    Abstract:

    A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make it easy to perform transformations, and should also afford efficient and precise static analysis. In this paper we study an aspect of IR design that has received little attention: the role of Undefined Behavior. The IR for every optimizing compiler we have looked at, including GCC, LLVM, Intel's, and Microsoft's, supports one or more forms of Undefined Behavior (UB), not only to reflect the semantics of UB-heavy programming languages such as C and C++, but also to model inherently unsafe low-level operations such as memory stores and to avoid over-constraining IR semantics to the point that desirable transformations become illegal. The current semantics of LLVM's IR fails to justify some cases of loop unswitching, global value numbering, and other important "textbook" optimizations, causing long-standing bugs. We present solutions to the problems we have identified in LLVM's IR and show that most optimizations currently in LLVM remain sound, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or performance of generated code.

  • Provably correct peephole optimizations with alive
    ACM SIGPLAN Notices, 2015
    Co-Authors: Nuno P. Lopes, David Menendez, Santosh Nagarakatte, John Regehr
    Abstract:

    Compilers should not miscompile. Our work addresses problems in developing peephole optimizations that perform local rewriting to improve the efficiency of LLVM code. These optimizations are individually difficult to get right, particularly in the presence of Undefined Behavior; taken together they represent a persistent source of bugs. This paper presents Alive, a domain-specific language for writing optimizations and for automatically either proving them correct or else generating counterexamples. Furthermore, Alive can be automatically translated into C++ code that is suitable for inclusion in an LLVM optimization pass. Alive is based on an attempt to balance usability and formal methods; for example, it captures---but largely hides---the detailed semantics of three different kinds of Undefined Behavior in LLVM. We have translated more than 300 LLVM optimizations into Alive and, in the process, found that eight of them were wrong.

  • PLDI - Provably correct peephole optimizations with alive
    Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2015, 2015
    Co-Authors: Nuno P. Lopes, David Menendez, Santosh Nagarakatte, John Regehr
    Abstract:

    Compilers should not miscompile. Our work addresses problems in developing peephole optimizations that perform local rewriting to improve the efficiency of LLVM code. These optimizations are individually difficult to get right, particularly in the presence of Undefined Behavior; taken together they represent a persistent source of bugs. This paper presents Alive, a domain-specific language for writing optimizations and for automatically either proving them correct or else generating counterexamples. Furthermore, Alive can be automatically translated into C++ code that is suitable for inclusion in an LLVM optimization pass. Alive is based on an attempt to balance usability and formal methods; for example, it captures---but largely hides---the detailed semantics of three different kinds of Undefined Behavior in LLVM. We have translated more than 300 LLVM optimizations into Alive and, in the process, found that eight of them were wrong.

Chris Hathhorn - One of the best experts on this subject based on the ideXlab platform.

  • Dealing With C's Original Sin
    IEEE Software, 2019
    Co-Authors: Chris Hathhorn, Grigore Rosu
    Abstract:

    In the very early days of C, the compiler written by Dennis Ritchie and supplied with the UNIX operating system entirely defined the language. As the number of users and C implementations grew, however, so too did the need for a language standard-a contract between users and implementers about what should and should not count as C. This effort began in 1983 with the formation of a committee tasked with producing "an unambiguous and machine-independent definition of the language C" and led to the ANSI C Standard in 1989.1 In retrospect, it was not until this date, 17 years after the first compiler, when C's most notorious language feature slithered into the world: Undefined Behavior .

  • RV - Runtime Verification at Work: A Tutorial
    Runtime Verification, 2016
    Co-Authors: Philip Daian, Chris Hathhorn, Dwight Guth, Manasvi Saxena, Edgar Pek, Traian Florin Şerbănuţă, Grigore Rosu
    Abstract:

    We present a suite of runtime verification tools developed by Runtime Verification Inc.: RV-Match, RV-Predict, and RV-Monitor. RV-Match is a tool for checking C programs for Undefined Behavior and other common programmer mistakes. It is extracted from the most complete formal semantics of the C11 language and beats many similar tools in its ability to catch a broad range of undesirable Behaviors. RV-Predict is a dynamic data race detector for Java and C/C++ programs. It is perhaps the only tool that is both sound and maximal: it only reports real races and it can find all races that can be found by any other sound data race detector analyzing the same execution trace. RV-Monitor is a runtime monitoring tool that checks and enforces safety and security properties during program execution. Our tools focus on reporting no false positives and are free for non-commercial use.

  • CAV (1) - RV-match: Practical semantics-based program analysis
    Computer Aided Verification, 2016
    Co-Authors: Dwight Guth, Chris Hathhorn, Manasvi Saxena, Grigore Rosu
    Abstract:

    We present RV-Match, a tool for checking C programs for Undefined Behavior and other common programmer mistakes. Our tool is extracted from the most complete formal semantics of the C11 language. Previous versions of this tool were used primarily for testing the correctness of the semantics, but we have improved it into a tool for doing practical analysis of real C programs. It beats many similar tools in its ability to catch a broad range of undesirable Behaviors. We demonstrate this with comparisons based on a third-party benchmark.

  • PLDI - Defining the Undefinedness of C
    Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015
    Co-Authors: Chris Hathhorn, Chucky Ellison, Grigore Rosu
    Abstract:

    We present a ``negative'' semantics of the C11 language---a semantics that does not just give meaning to correct programs, but also rejects Undefined programs. We investigate Undefined Behavior in C and discuss the techniques and special considerations needed for formally specifying it. We have used these techniques to modify and extend a semantics of C into one that captures Undefined Behavior. The amount of semantic infrastructure and effort required to achieve this was unexpectedly high, in the end nearly doubling the size of the original semantics. From our semantics, we have automatically extracted an Undefinedness checker, which we evaluate against other popular analysis tools, using our own test suite in addition to a third-party test suite. Our checker is capable of detecting examples of all 77 categories of core language Undefinedness appearing in the C11 standard, more than any other tool we considered. Based on this evaluation, we argue that our work is the most comprehensive and complete semantic treatment of Undefined Behavior in C, and thus of the C language itself.

  • Defining the Undefinedness of C11 : practical semantics-based program analysis
    1
    Co-Authors: Chris Hathhorn
    Abstract:

    [ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] This thesis extends the work of Ellison and Ros,u [13, 12] but focuses on the "negative" semantics of the C11 language--the semantics required to not just give meaning to correct programs, but also to reject Undefined programs. We investigate Undefined Behavior in C and discuss the techniques and special considerations needed to formally specify it. Using these techniques, we have modified and extended a semantics of C into one that captures Undefined Behavior. The amount of semantic infrastructure and effort required to achieve this was unexpectedly high, in the end more than tripling the size of the original semantics. From our semantics, we automatically extract kcc, a tool for checking realworld C programs for Undefined Behavior and other common programmer mistakes. Previous versions of this tool were used primarily for testing the correctness of the semantics, but we have improved it into a tool for doing practical analysis of real C programs. It beats many similar tools in its ability to catch a broad range of undesirable Behaviors. We demonstrate this with comparisons based on our own test suite in addition to third-party benchmarks. Our checker is capable of detecting examples of all 77 categories of core language Undefinedness appearing in the C11 standard, more than any other tool we considered. Based on this evaluation, we argue that our work is the most comprehensive and complete semantic treatment of Undefined Behavior in C, and thus of the C language itself.

Anton A. Kutsenko - One of the best experts on this subject based on the ideXlab platform.

  • Programming Infinite Machines
    Erkenntnis, 2019
    Co-Authors: Anton A. Kutsenko
    Abstract:

    For infinite machines that are free from the classical Thomson’s lamp paradox, we show that they are not free from its inverted-in-time version. We provide a program for infinite machines and an infinite mechanism that demonstrate this paradox. While their finite analogs work predictably, the program and the infinite mechanism demonstrate an Undefined Behavior. As in the case of infinite Davies machines (Davies in Br J Philos Sci 52(4):671–682, 2001), our examples are free from infinite masses, infinite velocities, infinite forces, etc. Only infinite divisibility of space and time is assumed. Thus, the infinite devices considered are possible in a Newtonian Universe and they do not conflict with Newtonian mechanics. Note that the classical Thomson’s lamp paradox leads to infinite velocities which may not be producible in acceptable models of Newtonian mechanics. Finally, it is shown that the “paradox of predictability” is similar to the inverted Thomson’s lamp paradox.

  • Programming infinite machines
    arXiv: Logic in Computer Science, 2018
    Co-Authors: Anton A. Kutsenko
    Abstract:

    For infinite machines which are free from the classical Thompson's lamp paradox we show that they are not free from its inverted version. We provide a program for infinite machines and an infinite mechanism which simulate this paradox. While their finite analogs work predictably, the program and the infinite mechanism demonstrate an Undefined Behavior. As in the case of infinite Davies's machines, our examples are free from infinite masses, infinite velocities, infinite forces, etc. Only infinite divisibility of space and timeis assumed. Thus, the considered infinite devices are possible in a continuous Newtonian Universe and they do not conflict with continuous Newtonian mechanics. Some possible applications to the analysis of the Navier-Stokes equations are discussed.