Numeric Data Type

The Experts below are selected from a list of 57 Experts worldwide ranked by the ideXlab platform

Gustavo Alonso - One of the best experts on this subject based on the ideXlab platform.

  • Reproducible Floating-Point Aggregation in RDBMSs
    arXiv: Databases, 2018
    Co-Authors: Ingo Müller, Andrea Arteaga, Torsten Hoefler, Gustavo Alonso
    Abstract:

    Industry-grade Database systems are expected to produce the same result if the same query is repeatedly run on the same input. However, the numerous sources of non-determinism in modern systems make reproducible results difficult to achieve. This is particularly true if floating-point numbers are involved, where the order of the operations affects the final result. As part of a larger effort to extend Database engines with Data representations more suitable for machine learning and scientific applications, in this paper we explore the problem of making relational GroupBy over floating-point formats bit-reproducible, i.e., ensuring any execution of the operator produces the same result up to every single bit. To that end, we first propose a Numeric Data Type that can be used as a drop-in replacement for other number formats and is, unlike standard floating-point formats, associative. We use this Data Type to make state-of-the-art GroupBy operators reproducible, but this approach incurs a slowdown between 4x and 12x compared to the same operator using conventional Database number formats. We thus explore how to modify existing GroupBy algorithms to make them bit-reproducible and efficient. By using vectorized summation on batches and carefully balancing batch size, cache footprint, and preprocessing costs, we are able to reduce the slowdown due to reproducibility to a factor between 1.9x and 2.4x of aggregation in isolation and to a mere 2.7% of end-to-end query performance even on aggregation-intensive queries in MonetDB. We thereby provide a solid basis for supporting more reproducible operations directly in relational engines. This document is an extended version of an article currently in print for the proceedings of ICDE'18 with the same title and by the same authors. The main additions are more implementation details and experiments.
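The order-dependence of floating-point addition that the abstract refers to is easy to demonstrate. The following sketch (not from the paper) shows two groupings of the same IEEE-754 doubles producing results that differ in the last bit:

```python
# IEEE-754 double addition is commutative but not associative:
# rounding after every intermediate step makes the grouping matter.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds 0.1 + 0.2 first
right = a + (b + c)  # rounds 0.2 + 0.3 first

print(left == right)  # False: the two sums differ by one ulp
print(left.hex())     # exact bit patterns of each result
print(right.hex())
```

This is exactly why a parallel or hash-partitioned GroupBy, which sums group members in a nondeterministic order, can return different bits on different runs.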

  • ICDE - Reproducible Floating-Point Aggregation in RDBMSs
    2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018
    Co-Authors: Ingo Mueller, Andrea Arteaga, Torsten Hoefler, Gustavo Alonso
    Abstract:

    Industry-grade Database systems are expected to produce the same result if the same query is repeatedly run on the same input. However, the numerous sources of non-determinism in modern systems make reproducible results difficult to achieve. This is particularly true if floating-point numbers are involved, where the order of the operations affects the final result. As part of a larger effort to extend Database engines with Data representations more suitable for machine learning and scientific applications, in this paper we explore the problem of making relational GroupBy over floating-point formats bit-reproducible, i.e., ensuring any execution of the operator produces the same result up to every single bit. To that end, we first propose a Numeric Data Type that can be used as a drop-in replacement for other number formats and is, unlike standard floating-point formats, associative. We use this Data Type to make state-of-the-art GroupBy operators reproducible, but this approach incurs a slowdown between 4x and 12x compared to the same operator using conventional Database number formats. We thus explore how to modify existing GroupBy algorithms to make them bit-reproducible and efficient. By using vectorized summation on batches and carefully balancing batch size, cache footprint, and preprocessing costs, we are able to reduce the slowdown due to reproducibility to a factor between 1.9x and 2.4x of aggregation in isolation and to a mere 2.7% of end-to-end query performance even on aggregation-intensive queries in MonetDB. We thereby provide a solid basis for supporting more reproducible operations directly in relational engines.
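One simple way to obtain an order-independent (associative) sum is to accumulate exactly and round to a double only once at the end. The sketch below uses Python's exact rational arithmetic to illustrate the property; it is not the paper's Data Type, which achieves the same associativity with much faster machinery:

```python
from fractions import Fraction

def reproducible_sum(values):
    """Sum IEEE-754 doubles with a bit-reproducible result.

    Every double converts to a Fraction exactly, and rational addition
    is associative, so any summation order yields the same exact total;
    rounding back to a double happens only once, at the end.
    """
    total = sum(Fraction(v) for v in values)
    return float(total)

vals = [0.1, 0.2, 0.3, 1e16, -1e16]
print(sum(vals) == sum(reversed(vals)))                        # False: order-dependent
print(reproducible_sum(vals) == reproducible_sum(vals[::-1]))  # True
```

Exact rationals would be far too slow inside a Database engine, which is why the paper turns to vectorized batch summation; the associativity being exploited, however, is the same.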

Deng Wei - One of the best experts on this subject based on the ideXlab platform.

  • An Algorithm for Clustering Evolving Text Data Stream with Outliers
    Computer Science, 2007
    Co-Authors: Deng Wei
    Abstract:

    As a branch of clustering, Data stream clustering has become a hot spot in Data mining. Although there are many stream clustering algorithms, they are only suitable for the low-dimensional Numeric Data Type, and few of them are designed for high-dimensional text streams. A novel online micro-cluster structure based on the traditional stream clustering framework is proposed that is suitable for clustering text. Dividing the online micro-clusters into potential and outlier micro-clusters also brings an advantage when outliers appear frequently in the stream. Experiments show that these methods bring advancements for processing text streams compared to other approaches.
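The abstract does not specify the micro-cluster structure itself; the sketch below assumes a CluStream-style cluster-feature vector (count, linear sum, squared sum) and a hypothetical count threshold for promoting an outlier micro-cluster to a potential one:

```python
class MicroCluster:
    """CluStream-style cluster feature for one online micro-cluster.

    Maintains (n, LS, SS) per dimension so the centroid can be derived
    incrementally as documents arrive; the promotion threshold below is
    a hypothetical stand-in for the paper's outlier/potential split.
    """

    PROMOTE_AT = 5  # assumed threshold: outlier -> potential micro-cluster

    def __init__(self, dim):
        self.n = 0             # number of absorbed points
        self.ls = [0.0] * dim  # per-dimension linear sum
        self.ss = [0.0] * dim  # per-dimension squared sum

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def centroid(self):
        return [s / self.n for s in self.ls]

    @property
    def is_potential(self):
        # Few absorbed points -> outlier micro-cluster; enough -> potential.
        return self.n >= self.PROMOTE_AT

mc = MicroCluster(dim=2)
for p in [(1.0, 0.0), (0.0, 1.0)]:
    mc.absorb(p)
print(mc.centroid())    # [0.5, 0.5]
print(mc.is_potential)  # False
```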

T. Ajith Bosco Raj - One of the best experts on this subject based on the ideXlab platform.

  • WFCM based big sensor Data error detection and correction in wireless sensor network
    Cluster Computing, 2019
    Co-Authors: R. Sheeba, G. Jiji, T. Ajith Bosco Raj
    Abstract:

    In a WSN, the requested Data is collected from the initial node, i.e., the sender, and the information is uploaded to a cloud platform. Only the Numeric Data Type is considered in this error detection and correction technique. A MapReduce algorithm is applied to the clusters formed from the big Data, and the Weighted Fuzzy C-Means (WFCM) clustering technique is used for clustering. Different operations are performed on the cloud platform, such as error detection, location finding, Data cleansing, and error recovery. During the filtering of big Data sets, whenever abnormal Data is encountered, the detection rule has to perform two tasks. "fd(n/e,t)" is a decision-making function used to determine whether the detected anomalous Data is a true error; in other words, fd(n/e,t) has two outputs, "false negative" for detecting a true error and "false positive" for selecting non-error Data. "fl(n/e,t)" is a function for tracking and returning the original error source.
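The detection function fd(n/e,t) is described only by its two outputs; everything else in the sketch below is an assumption, namely that t is a threshold on the deviation of a reading n from an expected value e:

```python
def fd(n, e, t):
    """Illustrative decision function for an anomalous Numeric reading.

    Assumed semantics (not from the paper): the reading n is a true
    error when it deviates from the expected value e by more than the
    threshold t. Following the abstract's terminology, a true error is
    labelled "false negative" and non-error Data "false positive".
    """
    return "false negative" if abs(n - e) > t else "false positive"

print(fd(100.0, 20.0, 5.0))  # flagged as a true error
print(fd(21.0, 20.0, 5.0))   # selected as non-error Data
```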

Alvin M. Despain - One of the best experts on this subject based on the ideXlab platform.

  • An integrated prolog architecture for symbolic and Numeric executions
    Annals of Mathematics and Artificial Intelligence, 1991
    Co-Authors: Robert Yung, Alvin M. Despain
    Abstract:

    Numerically intensive calculations are not well supported by Prolog, yet there are important applications that require tightly coupled symbolic and Numeric calculations. The Aquarius Numeric Processor (ANP) is an extended Numeric Instruction Set Architecture based on the Berkeley Programmed Logic Machine (PLM) to support integrated symbolic and Numeric calculations. This extension expands the existing Numeric Data Type to include 32- and 64-bit integers, and single- and double-precision floating-point numbers conforming to the IEEE Standard P754. A new class of Data structure, Numeric arrays, is added to represent the matrices and arrays found in most scientific programming languages. Powerful Numeric instructions are included to manipulate the new Data Types. Dynamic Type checking and coercion of operands are performed. The ANP and PLM together provide for the efficient execution of symbolic and Numeric operations written in AI languages such as Prolog and Lisp. Simulated performance results indicate the system will achieve about 10 MFLOPS on the Prolog version of some Whetstone and Linpack benchmarks and close to 20 MFLOPS on some matrix operations (all in double precision).
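The dynamic Type checking and coercion the abstract mentions can be illustrated (in Python, not the ANP's actual hardware behavior) by a hypothetical add that widens both operands along the int32 -> int64 -> float32 -> float64 lattice the ANP's formats suggest:

```python
# Illustrative sketch only: operands are (tag, value) pairs, and
# addition dynamically coerces both to the wider of the two types.
WIDTH = {"int32": 0, "int64": 1, "float32": 2, "float64": 3}

def coerce(tag_a, tag_b):
    # Dynamic Type checking: the result type is the wider operand type.
    return tag_a if WIDTH[tag_a] >= WIDTH[tag_b] else tag_b

def anp_add(a, b):
    tag = coerce(a[0], b[0])
    value = a[1] + b[1]
    # Normalize the Python value to match the result tag's kind.
    value = int(value) if tag.startswith("int") else float(value)
    return (tag, value)

print(anp_add(("int32", 2), ("float64", 0.5)))  # ('float64', 2.5)
print(anp_add(("int32", 2), ("int64", 3)))      # ('int64', 5)
```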

  • Extending a Prolog architecture for high performance Numeric computations
    [1989] Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences. Volume 1: Architecture Track, 1989
    Co-Authors: Robert Yung, Alvin M. Despain, Yale N. Patt
    Abstract:

    The Aquarius Numeric processor (ANP) is an extended Numeric instruction set architecture that is based on the Berkeley programmed logic machine (PLM) and supports integrated symbolic and Numeric calculations. This extension expands the existing Numeric Data Type to include 32- and 64-bit integers and single- and double-precision floating-point numbers conforming to the IEEE Standard P754. A class of Data structure called Numeric arrays has been added to represent the matrices and arrays found in most scientific programming languages. Powerful Numeric instructions are included to manipulate these novel Data Types. The authors describe the programming model and the architecture of the ANP. An experimental ANP is currently under construction using TTL (transistor-transistor logic) and ECL (emitter-coupled logic) parts. Simulated performance results indicate that the system will achieve about 10 MFLOPS (millions of floating-point operations per second) on the Prolog version of some Whetstone and Linpack benchmarks and close to 20 MFLOPS on some matrix operations (all in double precision).