Exponent Bit

The experts below are selected from a list of 1530 experts worldwide ranked by the ideXlab platform.

Flegar Goran - One of the best experts on this subject based on the ideXlab platform.

FloatX: A C++ Library for Customized Floating-Point Arithmetic

Association for Computing Machinery (ACM), 2019

Co-Authors: Flegar Goran; Scheidegger Florian; Novakovic Vedran; Mariani Giovani; Tomás Domínguez, Andrés Enrique; Malossi Cristiano; Quintana-Ortí, Enrique S.

Abstract:

"© ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Mathematical Software, 45(4), 2019, https://dl.acm.org/doi/10.1145/3368086"

We present FloatX (Float eXtended), a C++ framework to investigate the effect of leveraging customized floating-point formats in numerical applications. FloatX formats are based on binary IEEE 754 with smaller significand and exponent bit counts specified by the user. Among other properties, FloatX facilitates an incremental transformation of the code, relies on hardware-supported floating-point types as back-end to preserve efficiency, and incurs no storage overhead. The article discusses in detail the design principles, programming interface, and datatype casting rules behind FloatX. Furthermore, it demonstrates FloatX's usage and benefits via several case studies from well-known numerical dense linear algebra libraries, such as BLAS and LAPACK; the Ginkgo library for sparse linear systems; and two neural network applications related to image processing and text recognition.

This work was supported by the CICYT projects TIN2014-53495-R and TIN2017-82972-R of the MINECO and FEDER, and the EU H2020 project 732631 "OPRECOMP. Open Transprecision Computing."

Flegar, G.; Scheidegger, F.; Novakovic, V.; Mariani, G.; Tomás Domínguez, AE.; Malossi, C.; Quintana-Ortí, ES. (2019). FloatX: A C++ Library for Customized Floating-Point Arithmetic. ACM Transactions on Mathematical Software. 45(4):1-23. https://doi.org/10.1145/3368086

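To make the idea of a format "with smaller significand and exponent bit counts" concrete, the following is a minimal sketch that emulates such a format on top of a hardware `double`. It is not FloatX's interface or implementation: the function name `round_to_format` and its parameters `exp_bits` and `sig_bits` are hypothetical, chosen only for illustration. The sketch assumes 2 <= `exp_bits` <= 11, 1 <= `sig_bits` <= 52, and the default round-to-nearest-even rounding mode.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not part of FloatX): round x to the nearest value of an
// IEEE-754-like binary format with `exp_bits` exponent bits and `sig_bits`
// explicitly stored significand bits, keeping the result in a double.
// Assumes 2 <= exp_bits <= 11, 1 <= sig_bits <= 52, and the default
// round-to-nearest-even floating-point environment.
double round_to_format(double x, int exp_bits, int sig_bits) {
    if (x == 0.0 || !std::isfinite(x)) return x;  // zeros, infinities, NaNs pass through

    const int bias = (1 << (exp_bits - 1)) - 1;
    const int emax = bias;      // largest unbiased exponent of a finite value
    const int emin = 1 - bias;  // smallest unbiased exponent of a normal value

    int e = 0;
    std::frexp(x, &e);          // |x| = m * 2^e with 0.5 <= m < 1
    int exponent = e - 1;       // |x| = 1.f * 2^exponent
    if (exponent < emin) exponent = emin;  // subnormal range: fixed spacing

    // Spacing between neighbouring representable values near x.
    const double quantum = std::ldexp(1.0, exponent - sig_bits);
    // Scaling by a power of two is exact, so only nearbyint rounds.
    const double rounded = std::nearbyint(x / quantum) * quantum;

    // Anything beyond the largest finite value of the target format overflows.
    const double max_finite = std::ldexp(2.0 - std::ldexp(1.0, -sig_bits), emax);
    if (std::fabs(rounded) > max_finite) {
        return std::copysign(std::numeric_limits<double>::infinity(), x);
    }
    return rounded;
}
```

For example, `round_to_format(0.1, 5, 10)` uses the bit counts of IEEE binary16 and yields 0.0999755859375, while `round_to_format(1e5, 5, 10)` overflows to infinity because the largest finite value in that format is 65504. Storing the result in a `double` keeps the sketch short; it does not address the storage layout that the abstract describes.
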
Scheidegger Florian - One of the best experts on this subject based on the ideXlab platform.

FloatX: A C++ Library for Customized Floating-Point Arithmetic

Novakovic Vedran - One of the best experts on this subject based on the ideXlab platform.

FloatX: A C++ Library for Customized Floating-Point Arithmetic

Mariani Giovani - One of the best experts on this subject based on the ideXlab platform.

FloatX: A C++ Library for Customized Floating-Point Arithmetic

Tomás Domínguez, Andrés Enrique - One of the best experts on this subject based on the ideXlab platform.

FloatX: A C++ Library for Customized Floating-Point Arithmetic
