Serial Version

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 10212 Experts worldwide ranked by ideXlab platform

Elias D Nino - One of the best experts on this subject based on the ideXlab platform.

  • a parallel implementation of the ensemble kalman filter based on modified cholesky decomposition
    Journal of Computational Science, 2019
    Co-Authors: Elias D Nino, Adrian Sandu, Xinwei Deng
    Abstract:

    Abstract This paper discusses an efficient parallel implementation of the ensemble Kalman filter based on the modified Cholesky decomposition. The proposed implementation starts with decomposing the domain into sub-domains. In each sub-domain a sparse estimation of the inverse background error covariance matrix is computed via a modified Cholesky decomposition; the estimates are computed concurrently on separate processors. The sparsity of this estimator is dictated by the conditional independence of model components for some radius of influence. Then, the assimilation step is carried out in parallel without the need of inter-processor communication. Once the local analysis states are computed, the analysis sub-domains are mapped back onto the global domain to obtain the analysis ensemble. Computational experiments are performed using the Atmospheric General Circulation Model (SPEEDY) with the T-63 resolution on the Blueridge cluster at Virginia Tech. The number of processors used in the experiments ranges from 96 to 2048. The proposed implementation outperforms in terms of accuracy the well-known local ensemble transform Kalman filter (LETKF) for all the model variables. The computational time of the proposed implementation is similar to that of the parallel LETKF method (where no covariance estimation is performed). Finally, for the largest number of processors, the proposed parallel implementation is 400 times faster than the Serial Version of the proposed method.

Xinwei Deng - One of the best experts on this subject based on the ideXlab platform.

  • a parallel implementation of the ensemble kalman filter based on modified cholesky decomposition
    Journal of Computational Science, 2019
    Co-Authors: Elias D Nino, Adrian Sandu, Xinwei Deng
    Abstract:

    Abstract This paper discusses an efficient parallel implementation of the ensemble Kalman filter based on the modified Cholesky decomposition. The proposed implementation starts with decomposing the domain into sub-domains. In each sub-domain a sparse estimation of the inverse background error covariance matrix is computed via a modified Cholesky decomposition; the estimates are computed concurrently on separate processors. The sparsity of this estimator is dictated by the conditional independence of model components for some radius of influence. Then, the assimilation step is carried out in parallel without the need of inter-processor communication. Once the local analysis states are computed, the analysis sub-domains are mapped back onto the global domain to obtain the analysis ensemble. Computational experiments are performed using the Atmospheric General Circulation Model (SPEEDY) with the T-63 resolution on the Blueridge cluster at Virginia Tech. The number of processors used in the experiments ranges from 96 to 2048. The proposed implementation outperforms in terms of accuracy the well-known local ensemble transform Kalman filter (LETKF) for all the model variables. The computational time of the proposed implementation is similar to that of the parallel LETKF method (where no covariance estimation is performed). Finally, for the largest number of processors, the proposed parallel implementation is 400 times faster than the Serial Version of the proposed method.

  • a parallel ensemble kalman filter implementation based on modified cholesky decomposition
    Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
    Co-Authors: Elias D Ninoruiz, Adrian Sandu, Xinwei Deng
    Abstract:

    This paper discusses an efficient parallel implementation of the ensemble Kalman filter based on the modified Cholesky decomposition. The proposed implementation starts with decomposing the domain into sub-domains. In each sub-domain a sparse estimation of the inverse background error covariance matrix is computed via a modified Cholesky decomposition; the estimates are computed concurrently on separate processors. The sparsity of this estimator is dictated by the conditional independence of model components for some radius of influence. Then, the assimilation step is carried out in parallel without the need of inter-processor communication. Once the local analysis states are computed, the analysis sub-domains are mapped back onto the global domain to obtain the analysis ensemble. Computational experiments are performed using the Atmospheric General Circulation Model (SPEEDY) with the T-63 resolution on the Blueridge cluster at Virginia Tech. The number of processors used in the experiments ranges from 96 to 2,048. The proposed implementation outperforms in terms of accuracy the well-known local ensemble transform Kalman filter (LETKF) for all the model variables. The computational time of the proposed implementation is similar to that of the parallel LETKF method (where no covariance estimation is performed). Finally, for the largest number of processors, the proposed parallel implementation is 400 times faster than the Serial Version of the proposed method.

Jin Hoon Bang - One of the best experts on this subject based on the ideXlab platform.

  • classification based financial markets prediction using deep neural networks
    Algorithmic Finance, 2017
    Co-Authors: Matthew Dixon, Diego Klabjan, Jin Hoon Bang
    Abstract:

    Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition community (Krizhevsky et al., 2012 ) for their superior predictive properties including robustness to overfitting. However their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor which is 11.4x faster than the Serial Version and a Python strategy backtesting environment both of which are available as open source code written by the authors.

  • classification based financial markets prediction using deep neural networks
    Social Science Research Network, 2016
    Co-Authors: Matthew Dixon, Diego Klabjan, Jin Hoon Bang
    Abstract:

    Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition community for their superior predictive properties including robustness to over fitting. However their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular we describe the configuration and training approach and then demonstrate their application to back testing a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C implementation on the Intel Xeon Phi co-processor which is 11.4x faster than the Serial Version and a Python strategy back testing environment both of which are available as open source code written by the authors.

Michal Brylinski - One of the best experts on this subject based on the ideXlab platform.

  • Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors
    IEEE Transactions on Nanobioscience, 2015
    Co-Authors: Wei P. Feinstein, Juana Moreno, Mark Jarrell, Michal Brylinski
    Abstract:

    Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel Version of ${\mmb e}$ FindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of ${\mmb e}$ FindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a Serial Version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a Serial Version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel Version of ${\mmb e}$ FindSite is freely available to the academic community at www.brylinski.org/efindsite .

Filippo Bistaffa - One of the best experts on this subject based on the ideXlab platform.

  • algorithms for graph constrained coalition formation in the real world
    ACM Transactions on Intelligent Systems and Technology, 2017
    Co-Authors: Filippo Bistaffa, Alessandro Farinelli, Jesus Cerquides, Juan A Rodriguezaguilar, Sarvapali D Ramchurn
    Abstract:

    Coalition formation typically involves the coming together of multiple, heterogeneous, agents to achieve both their individual and collective goals. In this article, we focus on a special case of coalition formation known as Graph-Constrained Coalition Formation (GCCF) whereby a network connecting the agents constrains the formation of coalitions. We focus on this type of problem given that in many real-world applications, agents may be connected by a communication network or only trust certain peers in their social network. We propose a novel representation of this problem based on the concept of edge contraction, which allows us to model the search space induced by the GCCF problem as a rooted tree. Then, we propose an anytime solution algorithm (Coalition Formation for Sparse Synergies (CFSS)), which is particularly efficient when applied to a general class of characteristic functions called m + a functions. Moreover, we show how CFSS can be efficiently parallelised to solve GCCF using a nonredundant partition of the search space. We benchmark CFSS on both synthetic and realistic scenarios, using a real-world dataset consisting of the energy consumption of a large number of households in the UK. Our results show that, in the best case, the Serial Version of CFSS is four orders of magnitude faster than the state of the art, while the parallel Version is 9.44 times faster than the Serial Version on a 12-core machine. Moreover, CFSS is the first approach to provide anytime approximate solutions with quality guarantees for very large systems of agents (i.e., with more than 2,700 agents).

  • algorithms for graph constrained coalition formation in the real world
    arXiv: Multiagent Systems, 2016
    Co-Authors: Filippo Bistaffa, Alessandro Farinelli, Jesus Cerquides, Juan A Rodriguezaguilar, Sarvapali D Ramchurn
    Abstract:

    Coalition formation typically involves the coming together of multiple, heterogeneous, agents to achieve both their individual and collective goals. In this paper, we focus on a special case of coalition formation known as Graph-Constrained Coalition Formation (GCCF) whereby a network connecting the agents constrains the formation of coalitions. We focus on this type of problem given that in many real-world applications, agents may be connected by a communication network or only trust certain peers in their social network. We propose a novel representation of this problem based on the concept of edge contraction, which allows us to model the search space induced by the GCCF problem as a rooted tree. Then, we propose an anytime solution algorithm (CFSS), which is particularly efficient when applied to a general class of characteristic functions called $m+a$ functions. Moreover, we show how CFSS can be efficiently parallelised to solve GCCF using a non-redundant partition of the search space. We benchmark CFSS on both synthetic and realistic scenarios, using a real-world dataset consisting of the energy consumption of a large number of households in the UK. Our results show that, in the best case, the Serial Version of CFSS is 4 orders of magnitude faster than the state of the art, while the parallel Version is 9.44 times faster than the Serial Version on a 12-core machine. Moreover, CFSS is the first approach to provide anytime approximate solutions with quality guarantees for very large systems of agents (i.e., with more than 2700 agents).

  • cube a cuda approach for bucket elimination on gpus
    European Conference on Artificial Intelligence, 2016
    Co-Authors: Filippo Bistaffa, Nicola Bombieri, Alessandro Farinelli
    Abstract:

    We consider Bucket Elimination (BE), a popular algorithmic framework to solve Constraint Optimisation Problems (COPs). We focus on the parallelisation of the most computationally intensive operations of BE, i.e., join sum and maximisation, which are key ingredients in several close variants of the BE framework (including Belief Propagation on Junction Trees and Distributed COP techniques such as ActionGDL and DPOP). In particular, we propose CUBE, a highly-parallel GPU implementation of such operations, which adopts an efficient memory layout allowing all threads to independently locate their input and output addresses in memory, hence achieving a high computational throughput. We compare CUBE with the most recent GPU implementation of BE. Our results show that CUBE achieves significant speed-ups (up to two orders of magnitude) w.r.t. the counterpart approach, showing a dramatic decrease of the runtime w.r.t. the Serial Version (i.e., up to 652× faster). More important, such speed-ups increase when the complexity of the problem grows, showing that CUBE correctly exploits the additional degree of parallelism inherent in the problem.