Lossy Compression

The Experts below are selected from a list of 11517 Experts worldwide ranked by ideXlab platform

Tracy Camp - One of the best experts on this subject based on the ideXlab platform.

  • Lossy Compression for wireless seismic data acquisition
    IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2016
    Co-Authors: Marc J Rubin, Michael B Wakin, Tracy Camp
    Abstract:

    In this paper, we rigorously compare compressive sampling (CS) to four state-of-the-art, on-mote, Lossy Compression algorithms [$K$-run-length encoding (KRLE), lightweight temporal Compression (LTC), wavelet quantization thresholding and run-length encoding (WQTR), and a low-pass filtered fast Fourier transform (FFT)]. Specifically, we first simulate Lossy Compression on two real-world seismic data sets, and we then evaluate algorithm performance using implementations on real hardware. In terms of Compression ratios, recovered signal error, power consumption, on-mote execution runtime, and classification accuracy of a seismic event detection task (on decompressed signals), results show that CS performs comparably to (and in many cases better than) the other algorithms evaluated. A main benefit to users is that CS, a lightweight and nonadaptive Compression technique, can guarantee a desired level of Compression performance (and thus, radio usage and power consumption) without sacrificing recovered signal quality. Our contribution is a novel and rigorous comparison of five state-of-the-art, on-mote, Lossy Compression algorithms in simulation on real-world data sets and in implementations on hardware.
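
    A minimal sketch of the on-mote half of compressive sampling as compared above: the sensor node only computes a few random projections of each signal block and transmits those, while the expensive sparse recovery happens off-mote. The block length, measurement count, Gaussian measurement matrix, and recovery basis mentioned below are illustrative assumptions, not the configuration used in the paper.

    ```python
    import numpy as np

    def cs_encode(block, num_measurements, seed=0):
        """On-mote step of compressive sampling: project a length-n block onto
        m < n random vectors and transmit only the m measurements.
        (The Gaussian Phi and the sizes below are illustrative assumptions.)"""
        n = block.shape[0]
        rng = np.random.default_rng(seed)                 # seed shared with the base station
        phi = rng.standard_normal((num_measurements, n)) / np.sqrt(num_measurements)
        return phi @ block                                # y = Phi @ x is the transmitted payload

    # Example: compress a 512-sample seismic block to 128 measurements (4x fewer values).
    # Recovery (not shown) runs off-mote by solving a sparse-approximation problem,
    # e.g. l1 minimization or a greedy pursuit in a wavelet or DCT basis.
    x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 512)) + 0.01 * np.random.randn(512)
    y = cs_encode(x, num_measurements=128)
    print(x.size, "->", y.size)
    ```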

  • A Comparison of On-Mote Lossy Compression Algorithms for Wireless Seismic Data Acquisition
    2014 IEEE International Conference on Distributed Computing in Sensor Systems, 2014
    Co-Authors: Marc J Rubin, Michael B Wakin, Tracy Camp
    Abstract:

    In this article, we rigorously compare compressive sampling (CS) to four state-of-the-art, on-mote, Lossy Compression algorithms (K-run-length encoding (KRLE), lightweight temporal Compression (LTC), wavelet quantization thresholding and run-length encoding (WQTR), and a low-pass filtered fast Fourier transform (FFT)). Specifically, we first simulate Lossy Compression on two real-world seismic data sets, and we then evaluate algorithm performance using implementations on real hardware. In terms of Compression rates, recovered signal error, power consumption, and classification accuracy of a seismic event detection task (on decompressed signals), results show that CS performs comparably to (and in many cases better than) the other algorithms evaluated. The main benefit to users is that CS, a lightweight and non-adaptive Compression technique, can guarantee a desired level of Compression performance (and thus, radio usage and power consumption) without sacrificing recovered signal quality. Our contribution is a novel and rigorous comparison of five state-of-the-art, on-mote, Lossy Compression algorithms in simulation on real-world data sets and in implementations on hardware.
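
    Of the on-mote baselines named above, lightweight temporal Compression (LTC) is the easiest to picture: it replaces a run of samples with a single line segment for as long as every sample stays within a tolerance of that segment. The sketch below is a simplified version of that idea under assumed parameter choices, not the exact implementation evaluated in the paper.

    ```python
    import numpy as np

    def ltc_compress(samples, eps):
        """Simplified lightweight temporal compression: emit (index, value) knots of a
        piecewise-linear curve that stays within +/- eps of every sample.
        A sketch of the general LTC idea only, not the paper's implementation."""
        knots = [(0, float(samples[0]))]
        anchor_i, anchor_v = 0, float(samples[0])
        lo, hi = -np.inf, np.inf                     # admissible slopes ("cone") from the anchor
        for i in range(1, len(samples)):
            dt = i - anchor_i
            new_lo = max(lo, (samples[i] - eps - anchor_v) / dt)
            new_hi = min(hi, (samples[i] + eps - anchor_v) / dt)
            if new_lo > new_hi:                      # cone collapsed: close the segment at i - 1
                slope = (lo + hi) / 2
                anchor_v = anchor_v + slope * (i - 1 - anchor_i)
                anchor_i = i - 1
                knots.append((anchor_i, float(anchor_v)))
                lo = (samples[i] - eps - anchor_v) / (i - anchor_i)
                hi = (samples[i] + eps - anchor_v) / (i - anchor_i)
            else:
                lo, hi = new_lo, new_hi
        slope = 0.0 if not np.isfinite(lo + hi) else (lo + hi) / 2
        knots.append((len(samples) - 1, float(anchor_v + slope * (len(samples) - 1 - anchor_i))))
        return knots                                 # decompress by linear interpolation between knots

    signal = np.cumsum(0.1 * np.random.randn(1000)) # toy random-walk "seismic" trace
    print(len(ltc_compress(signal, eps=0.5)), "knots for", signal.size, "samples")
    ```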

Franck Cappello - One of the best experts on this subject based on the ideXlab platform.

  • Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data
    IEEE Transactions on Parallel and Distributed Systems, 2020
    Co-Authors: Tao Lu, Sheng Di, Xuan Wang, Weizhe Zhang, Haijun Zhang, Franck Cappello
    Abstract:

    Scientific simulations in high-performance computing (HPC) environments generate vast volumes of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for post-analysis. Unlike traditional data reduction schemes such as deduplication or lossless Compression, not only can error-controlled Lossy Compression significantly reduce the data size but it also holds the promise to satisfy user demand on error control. Pointwise relative error bounds (i.e., Compression errors that depend on the data values) are widely used by many scientific applications with Lossy Compression, since the error control adapts automatically to the scale of the values in the dataset. Pointwise relative-error-bounded Compression, however, is complicated and time consuming. In this article, we develop efficient precomputation-based mechanisms based on the SZ Lossy Compression framework. Our mechanisms avoid the costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating relative-error-bounded Compression with excellent Compression ratios. In addition, we reduce traversing operations for Huffman decoding, significantly accelerating the deCompression process in SZ. Experiments with eight well-known real-world scientific simulation datasets show that our solution improves the Compression and deCompression rates (i.e., the speed) by about 40% and 80%, respectively, making our Lossy Compression strategy the best-in-class solution in most cases.
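
    For context, the reduction that these precomputation mechanisms accelerate: a pointwise relative error bound r can be enforced by taking logarithms and then applying an absolute error bound, since |ln(x') - ln(x)| <= ln(1 + r) keeps the reconstructed x' within a factor of (1 + r) of x. The sketch below shows that baseline with a plain uniform quantizer standing in for SZ; it is illustrative only and is not the authors' table-lookup mechanism.

    ```python
    import numpy as np

    def rel_bound_compress(values, rel_bound):
        """Enforce a pointwise relative error bound by log-transforming and uniformly
        quantizing. A baseline sketch only (SZ's precomputation replaces the log calls);
        zero handling and the quantizer are illustrative simplifications."""
        step = np.log1p(rel_bound)                              # absolute bound in the log domain
        signs = np.sign(values)
        logs = np.log(np.abs(values))                           # assumes no exact zeros, for brevity
        codes = np.round(logs / (2 * step)).astype(np.int64)    # bin width 2*step => log error <= step
        return signs, codes

    def rel_bound_decompress(signs, codes, rel_bound):
        step = np.log1p(rel_bound)
        return signs * np.exp(codes * 2 * step)

    data = np.random.lognormal(mean=0.0, sigma=3.0, size=10000) # strictly positive test data
    s, c = rel_bound_compress(data, rel_bound=0.01)
    recon = rel_bound_decompress(s, c, rel_bound=0.01)
    print(np.max(np.abs(recon - data) / np.abs(data)))          # stays below 0.01
    ```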

  • FRaZ: A Generic High-Fidelity Fixed-Ratio Lossy Compression Framework for Scientific Floating-point Data
    arXiv: Distributed Parallel and Cluster Computing, 2020
    Co-Authors: Robert Underwood, Sheng Di, Jon Calhoun, Franck Cappello
    Abstract:

    With ever-increasing volumes of scientific floating-point data being produced by high-performance computing applications, significantly reducing scientific floating-point data size is critical, and error-controlled Lossy compressors have been developed for years. None of the existing scientific floating-point Lossy data compressors, however, support effective fixed-ratio Lossy Compression. Yet fixed-ratio Lossy Compression for scientific floating-point data not only compresses to the requested ratio but also respects a user-specified error bound with higher fidelity. In this paper, we present FRaZ: a generic fixed-ratio Lossy Compression framework respecting user-specified error constraints. The contribution is twofold. (1) We develop an efficient iterative approach to accurately determine the appropriate error settings for different Lossy compressors based on target Compression ratios. (2) We perform a thorough performance and accuracy evaluation for our proposed fixed-ratio Compression framework with multiple state-of-the-art error-controlled Lossy compressors, using several real-world scientific floating-point datasets from different domains. Experiments show that FRaZ effectively identifies the optimum error setting in the entire error setting space of any given Lossy compressor. While fixed-ratio Lossy Compression is slower than fixed-error Compression, it provides an important new Lossy Compression technique for users of very large scientific floating-point datasets.
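
    The core idea described above can be sketched independently of any particular compressor: treat the error-bounded compressor as a black box and iteratively adjust its error bound until the achieved ratio matches the target. The stand-in compressor below (uniform quantization followed by zlib) and the simple bisection loop are illustrative assumptions; FRaZ itself wraps real compressors such as SZ and ZFP and uses its own iterative search.

    ```python
    import zlib
    import numpy as np

    def quantize_compress(data, abs_error):
        """Stand-in error-bounded lossy compressor: uniform quantization + zlib.
        (Illustrative only; FRaZ wraps real compressors such as SZ or ZFP.)"""
        codes = np.round(data / (2.0 * abs_error)).astype(np.int32)
        return zlib.compress(codes.tobytes(), level=6)

    def search_error_for_ratio(data, target_ratio, lo=1e-8, hi=None, iters=30):
        """Bisect on the error bound until the achieved compression ratio is close
        to the requested one (a looser error bound gives a higher ratio)."""
        value_range = float(data.max() - data.min())
        hi = hi if hi is not None else value_range            # coarse upper bound
        for _ in range(iters):
            mid = np.sqrt(lo * hi)                            # geometric midpoint: bounds span decades
            ratio = data.nbytes / len(quantize_compress(data, mid))
            if ratio < target_ratio:
                lo = mid                                      # need a looser bound for more compression
            else:
                hi = mid
        return hi

    data = np.sin(np.linspace(0, 60, 200000)).astype(np.float32) + \
           0.001 * np.random.randn(200000).astype(np.float32)
    eb = search_error_for_ratio(data, target_ratio=8.0)
    achieved = data.nbytes / len(quantize_compress(data, eb))
    print(f"error bound {eb:.2e} gives ratio {achieved:.1f}")
    ```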

  • Accelerating Relative-error Bounded Lossy Compression for HPC datasets with Precomputation-Based Mechanisms
    2019 35th Symposium on Mass Storage Systems and Technologies (MSST), 2019
    Co-Authors: Tao Lu, Sheng Di, Xuan Wang, Weizhe Zhang, Franck Cappello
    Abstract:

    Scientific simulations in high-performance computing (HPC) environments are producing vast volumes of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for post-analysis. Unlike traditional data reduction schemes (such as deduplication or lossless Compression), not only can error-controlled Lossy Compression significantly reduce the data size but it also holds the promise to satisfy user demand on error control. Point-wise relative error bounds (i.e., Compression errors that depend on the data values) are widely used by many scientific applications with Lossy Compression, since the error control adapts automatically to the precision in the dataset. Point-wise relative-error-bounded Compression, however, is complicated and time consuming. In this work, we develop efficient precomputation-based mechanisms in the SZ Lossy Compression framework. Our mechanisms avoid the costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating relative-error-bounded Compression with excellent Compression ratios. In addition, our mechanisms reduce traversing operations for Huffman decoding, and thus significantly accelerate the deCompression process in SZ. Experiments with four well-known real-world scientific simulation datasets show that our solution improves the Compression rate by about 30% and the deCompression rate by about 70% in most cases, making our Lossy Compression strategy the best-in-class choice.
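
    One way to picture the precomputation idea: the expensive part of relative-error binning is a logarithm per value, but a floating-point number already carries its binary exponent, so ln|x| can be assembled from that exponent plus a small precomputed table over mantissa buckets. The sketch below only illustrates that generic trick; it is not SZ's actual lookup scheme.

    ```python
    import numpy as np

    # Precompute ln(m) at the midpoints of mantissa buckets, m in [0.5, 1).
    BUCKETS = 4096
    _MANTISSA_LN = np.log(0.5 + (np.arange(BUCKETS) + 0.5) / (2 * BUCKETS))
    _LN2 = np.log(2.0)

    def fast_ln_abs(values):
        """Approximate ln|x| without per-value log calls: frexp splits |x| = m * 2**e
        with m in [0.5, 1), then ln|x| = e*ln(2) + table[bucket(m)].
        Illustrative of precomputation-based lookup only, not SZ's mechanism."""
        m, e = np.frexp(np.abs(values))
        idx = ((m - 0.5) * (2 * BUCKETS)).astype(np.int64)
        idx = np.clip(idx, 0, BUCKETS - 1)
        return e * _LN2 + _MANTISSA_LN[idx]

    x = np.random.lognormal(sigma=4.0, size=100000)
    err = np.max(np.abs(fast_ln_abs(x) - np.log(x)))
    print(f"max |approx - ln x| = {err:.2e}")   # bounded by the bucket width, ~1/(2*BUCKETS)
    ```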

  • Fixed-PSNR Lossy Compression for Scientific Data
    2018 IEEE International Conference on Cluster Computing (CLUSTER), 2018
    Co-Authors: Sheng Di, Xin Liang, Zizhong Chen, Franck Cappello
    Abstract:

    Error-controlled Lossy Compression has been studied for years because of the extremely large volumes of data being produced by today's scientific simulations. None of the existing Lossy compressors, however, allow users to fix the peak signal-to-noise ratio (PSNR) during Compression, although PSNR is considered one of the most significant indicators for assessing Compression quality. In this paper, we propose a novel technique providing fixed-PSNR Lossy Compression for scientific data sets. We implement our proposed method based on the SZ Lossy Compression framework and release the code as an open-source toolkit. We evaluate our fixed-PSNR compressor on three real-world high-performance computing data sets. Experiments show that our solution has high accuracy in controlling PSNR, with an average deviation of 0.1 to 5.0 dB on the tested data sets.
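
    A target PSNR can be mapped to an absolute error bound for an error-bounded compressor under a simple model: if compression errors are roughly uniform in [-e, e], the mean squared error is about e^2/3, and PSNR = 20*log10(range) - 10*log10(MSE). The sketch below inverts that relation; the uniform-error assumption and the stand-in quantizer are illustrative, not the exact derivation used in the paper.

    ```python
    import numpy as np

    def error_bound_for_psnr(data, target_psnr_db):
        """Map a target PSNR to an absolute error bound, assuming compression errors
        are roughly uniform in [-e, e] so that MSE ~= e**2 / 3 (illustrative model)."""
        value_range = float(data.max() - data.min())
        return value_range * np.sqrt(3.0) * 10.0 ** (-target_psnr_db / 20.0)

    def psnr(original, reconstructed):
        value_range = float(original.max() - original.min())
        mse = float(np.mean((original - reconstructed) ** 2))
        return 20.0 * np.log10(value_range) - 10.0 * np.log10(mse)

    # Quick check with a uniform quantizer standing in for an error-bounded compressor.
    data = np.random.randn(1_000_000)
    e = error_bound_for_psnr(data, target_psnr_db=60.0)
    recon = np.round(data / (2 * e)) * (2 * e)        # absolute error <= e, roughly uniform
    print(f"target 60.0 dB, achieved {psnr(data, recon):.2f} dB")
    ```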

Tsachy Weissman - One of the best experts on this subject based on the ideXlab platform.

  • Effect of Lossy Compression of quality scores on variant calling
    Briefings in Bioinformatics, 2017
    Co-Authors: Idoia Ochoa, Rachel Goldfeder, Mikel Hernaez, Tsachy Weissman, Euan Ashley
    Abstract:

    Recent advancements in sequencing technology have led to a drastic reduction in genome sequencing costs. This development has generated an unprecedented amount of data that must be stored, processed, and communicated. To facilitate this effort, Compression of genomic files has been proposed. Specifically, Lossy Compression of quality scores is emerging as a natural candidate for reducing the growing costs of storage. A main goal of performing DNA sequencing in population studies and clinical settings is to identify genetic variation. Though the field agrees that smaller files are advantageous, the cost of Lossy Compression, in terms of variant discovery, is unclear. Bioinformatic algorithms to identify SNPs and INDELs use base quality score information; here, we evaluate the effect of Lossy Compression of quality scores on SNP and INDEL detection. Specifically, we investigate how the output of the variant caller when using the original data differs from that obtained when quality scores are replaced by those generated by a Lossy compressor. Using gold standard genomic datasets and simulated data, we analyze how accurate the output of the variant calling is, both for the original data and for the data previously lossily compressed. We show that Lossy Compression can significantly alleviate the storage burden while maintaining variant calling performance comparable to that with the original data. Further, in some cases Lossy Compression can lead to variant calling performance that is superior to that using the original file. We envisage our findings and framework serving as a benchmark in future development and analyses of Lossy genomic data compressors.
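
    The kind of Lossy Compression being evaluated here typically coarsens per-base Phred quality scores, for example by mapping them onto a handful of representative bins before lossless encoding. The sketch below shows such a binning step; the bin edges and representative values are illustrative, not those of any compressor studied in the paper.

    ```python
    import numpy as np

    # Hypothetical coarse bins for Phred quality scores (edges and representatives are
    # illustrative only, not the scheme of any specific tool evaluated in the paper).
    BIN_UPPER = np.array([10, 20, 25, 30, 35, 40])      # bins: <10, 10-19, 20-24, 25-29, 30-34, 35-39, >=40
    BIN_REPS  = np.array([ 5, 15, 22, 27, 32, 37, 41])  # representative score for each of the 7 bins

    def bin_quality_string(qual, phred_offset=33):
        """Replace each quality score with its bin representative; the binned string
        has far fewer distinct symbols, so downstream lossless coding shrinks a lot."""
        scores = np.frombuffer(qual.encode("ascii"), dtype=np.uint8) - phred_offset
        idx = np.digitize(scores, BIN_UPPER, right=False)          # which bin each score falls in
        binned = (BIN_REPS[idx] + phred_offset).astype(np.uint8)
        return binned.tobytes().decode("ascii")

    print(bin_quality_string("IIIIFFF###??"))   # 'I'=Q40, 'F'=Q37, '#'=Q2, '?'=Q30
    ```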

  • effect of Lossy Compression of quality scores on variant calling
    bioRxiv, 2015
    Co-Authors: Idoia Ochoa, Rachel Goldfeder, Mikel Hernaez, Tsachy Weissman, Euan A Ashley
    Abstract:

    Recent advancements in sequencing technology have led to a drastic reduction in the cost of genome sequencing. This development has generated an unprecedented amount of genomic data that must be stored, processed, and communicated. To facilitate this effort, Compression of genomic files has been proposed. Specifically, Lossy Compression of quality scores is emerging as a natural candidate for reducing the growing costs of storage. A main goal of performing DNA sequencing in population studies and clinical settings is to identify genetic variation. Though the field agrees that smaller files are advantageous, the cost of Lossy Compression, in terms of variant discovery, is unclear. Bioinformatic algorithms to identify SNPs and INDELs from next-generation DNA sequencing data use base quality score information; here, we evaluate the effect of Lossy Compression of quality scores on SNP and INDEL detection. We analyze several Lossy compressors introduced recently in the literature. Specifically, we investigate how the output of the variant caller when using the original data (uncompressed) differs from that obtained when quality scores are replaced by those generated by a Lossy compressor. Using gold standard genomic datasets such as the GIAB (Genome In A Bottle) consensus sequence for NA12878 and simulated data, we are able to analyze how accurate the output of the variant calling is, both for the original data and for the data previously lossily compressed. We show that Lossy Compression can significantly alleviate the storage burden while maintaining variant calling performance comparable to that with the uncompressed data. Further, in some cases Lossy Compression can lead to variant calling performance that is superior to that using the uncompressed file. We envisage our findings and framework serving as a benchmark in future development and analyses of Lossy genomic data compressors. The Supplementary Data can be found at http://web.stanford.edu/~iochoa/supplementEffectLossy.zip.
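
    The evaluation described above boils down to comparing two variant call sets, one from the original quality scores and one from the lossily compressed scores, against a gold-standard truth set. A minimal sketch of that comparison (treating variants as plain (chrom, pos, ref, alt) tuples and ignoring VCF parsing, filtering, and genotype details) follows.

    ```python
    def compare_to_truth(calls, truth):
        """Precision/recall/F1 of a set of variant calls against a gold-standard set.
        Variants are plain (chrom, pos, ref, alt) tuples; real benchmarks (e.g. against
        GIAB) also normalize representation and restrict to confident regions."""
        calls, truth = set(calls), set(truth)
        tp = len(calls & truth)
        precision = tp / len(calls) if calls else 0.0
        recall = tp / len(truth) if truth else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Toy example: calls made from original vs. lossily compressed quality scores.
    truth = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "GA")}
    calls_original = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 900, "T", "C")}
    calls_lossy    = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "GA")}
    print("original:", compare_to_truth(calls_original, truth))
    print("lossy   :", compare_to_truth(calls_lossy, truth))
    ```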

  • Universality of logarithmic loss in Lossy Compression
    2015 IEEE International Symposium on Information Theory (ISIT), 2015
    Co-Authors: Albert No, Tsachy Weissman
    Abstract:

    We establish two strong senses of universality of logarithmic loss as a distortion criterion in Lossy Compression: For any fixed-length Lossy Compression problem under an arbitrary distortion criterion, we show that there is an equivalent Lossy Compression problem under logarithmic loss. In the successive refinement problem, if the first decoder operates under logarithmic loss, we show that any discrete memoryless source is successively refinable under an arbitrary distortion criterion for the second decoder.
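
    For reference, logarithmic loss is the distortion measure in which the reconstruction is a probability distribution over source symbols rather than a single symbol, and the cost is the log of the inverse probability assigned to the true symbol. A small numeric illustration with an assumed toy source follows.

    ```python
    import numpy as np

    def log_loss(x_index, q):
        """Logarithmic loss d(x, q) = log(1 / q(x)): the reconstruction q is a
        probability distribution over the source alphabet, and the distortion is
        the code length it implicitly assigns to the true symbol x."""
        return -np.log2(q[x_index])

    # Toy source over a 3-symbol alphabet and a "soft" reconstruction distribution.
    p = np.array([0.5, 0.3, 0.2])          # true source distribution (assumed)
    q = np.array([0.6, 0.3, 0.1])          # decoder's soft reconstruction (assumed)
    expected = sum(p[i] * log_loss(i, q) for i in range(3))
    entropy = -np.sum(p * np.log2(p))
    print(f"E[d] = {expected:.3f} bits = H(p) {entropy:.3f} + D(p||q) {expected - entropy:.3f}")
    ```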

  • Achievable complexity-performance tradeoffs in Lossy Compression
    Problems of Information Transmission, 2012
    Co-Authors: Ankit Gupta, Sergio Verdu, Tsachy Weissman
    Abstract:

    We present several results related to the complexity-performance tradeoff in Lossy Compression. The first result shows that for a memoryless source with rate-distortion function R(D) and a bounded distortion measure, the rate-distortion point $(R(D) + \gamma, D + \varepsilon)$ can be achieved with constant deCompression time per (separable) symbol and Compression time per symbol proportional to $(\lambda_1/\varepsilon)^{\lambda_2/\gamma^2}$, where $\lambda_1$ and $\lambda_2$ are source-dependent constants. The second result establishes that the same point can be achieved with constant deCompression time and Compression time per symbol proportional to $(\rho_1/\gamma)^{\rho_2/\varepsilon^2}$. These results imply, for any function g(n) that increases without bound arbitrarily slowly, the existence of a sequence of Lossy Compression schemes of blocklength n with O(ng(n)) Compression complexity and O(n) deCompression complexity that achieve the point (R(D), D) asymptotically with increasing blocklength. We also establish that if the reproduction alphabet is finite, then for any given R there exists a universal Lossy Compression scheme with O(ng(n)) Compression complexity and O(n) deCompression complexity that achieves the point (R, D(R)) asymptotically for any stationary ergodic source with distortion-rate function D(·).

Marc J Rubin - One of the best experts on this subject based on the ideXlab platform.

  • Lossy Compression for wireless seismic data acquisition
    IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2016
    Co-Authors: Marc J Rubin, Michael B Wakin, Tracy Camp
    Abstract:

    In this paper, we rigorously compare compressive sampling (CS) to four state-of-the-art, on-mote, Lossy Compression algorithms [$K$-run-length encoding (KRLE), lightweight temporal Compression (LTC), wavelet quantization thresholding and run-length encoding (WQTR), and a low-pass filtered fast Fourier transform (FFT)]. Specifically, we first simulate Lossy Compression on two real-world seismic data sets, and we then evaluate algorithm performance using implementations on real hardware. In terms of Compression ratios, recovered signal error, power consumption, on-mote execution runtime, and classification accuracy of a seismic event detection task (on decompressed signals), results show that CS performs comparably to (and in many cases better than) the other algorithms evaluated. A main benefit to users is that CS, a lightweight and nonadaptive Compression technique, can guarantee a desired level of Compression performance (and thus, radio usage and power consumption) without sacrificing recovered signal quality. Our contribution is a novel and rigorous comparison of five state-of-the-art, on-mote, Lossy Compression algorithms in simulation on real-world data sets and in implementations on hardware.

  • A Comparison of On-Mote Lossy Compression Algorithms for Wireless Seismic Data Acquisition
    2014 IEEE International Conference on Distributed Computing in Sensor Systems, 2014
    Co-Authors: Marc J Rubin, Michael B Wakin, Tracy Camp
    Abstract:

    In this article, we rigorously compare compressive sampling (CS) to four state-of-the-art, on-mote, Lossy Compression algorithms (K-run-length encoding (KRLE), lightweight temporal Compression (LTC), wavelet quantization thresholding and run-length encoding (WQTR), and a low-pass filtered fast Fourier transform (FFT)). Specifically, we first simulate Lossy Compression on two real-world seismic data sets, and we then evaluate algorithm performance using implementations on real hardware. In terms of Compression rates, recovered signal error, power consumption, and classification accuracy of a seismic event detection task (on decompressed signals), results show that CS performs comparably to (and in many cases better than) the other algorithms evaluated. The main benefit to users is that CS, a lightweight and non-adaptive Compression technique, can guarantee a desired level of Compression performance (and thus, radio usage and power consumption) without sacrificing recovered signal quality. Our contribution is a novel and rigorous comparison of five state-of-the-art, on-mote, Lossy Compression algorithms in simulation on real-world data sets and in implementations on hardware.

Jun Muramatsu - One of the best experts on this subject based on the ideXlab platform.