Hadoop Platform

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The experts below were selected from a list of 3,369 experts worldwide, ranked by the ideXlab platform.

Piyang Chen - One of the best experts on this subject based on the ideXlab platform.

  • Optimizing the Cloud Platform Performance for Supporting Large-Scale Cognitive Radio Networks
    IEEE Wireless Communications and Networking Conference (WCNC), 2012
    Co-Authors: Shieyuan Wang, Pofan Wang, Piyang Chen
    Abstract:

    In this paper, we optimize the performance of a cloud platform to effectively support cooperative spectrum sensing in a cognitive radio (CR) cloud network. This cloud uses the Apache Hadoop platform to run a cooperative spectrum sensing algorithm in parallel over multiple servers. Such an algorithm must process a very large number of spectrum sensing reports per second to quickly update the database that stores the current activities of all primary users of the CR network. Because these database updates must finish as soon as possible for the CR approach to be effective, the cloud platform must run the algorithm in real time with as little overhead as possible. In this work, we first measured the execution time of such an algorithm on our own cloud and on the Amazon EC2 public cloud, using the original Hadoop design and implementation. We found that the original Hadoop platform has too much fixed overhead and adds too much delay to the cooperative spectrum sensing algorithm, making it unable to update the primary-user database within a few seconds. We therefore studied the source code, design, and implementation of the Hadoop platform to improve its performance. Our experimental results show that our improvements significantly reduce the time required by the cooperative spectrum sensing algorithm and make the Hadoop platform more suitable for large-scale CR networks.
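    The abstract does not spell out the fusion rule, but cooperative spectrum sensing maps naturally onto the MapReduce model Hadoop provides: mappers key each sensing report by channel, and a reducer fuses the votes per channel to decide primary-user occupancy. The following pure-Python sketch is an illustration only; the report schema and the majority-vote fusion rule are assumptions, not the paper's actual algorithm.

```python
from collections import defaultdict

# Each sensing report: (node_id, channel, detected_busy).
# Hypothetical schema -- the paper does not specify its report format.
reports = [
    ("n1", 7, True), ("n2", 7, True), ("n3", 7, False),
    ("n1", 11, False), ("n2", 11, False), ("n3", 11, True),
]

def map_phase(reports):
    """Map: key each report by channel; the value is a 0/1 busy vote."""
    for _node, channel, busy in reports:
        yield channel, 1 if busy else 0

def reduce_phase(pairs):
    """Reduce: majority vote per channel -> primary-user occupancy."""
    votes = defaultdict(list)
    for channel, vote in pairs:
        votes[channel].append(vote)
    return {ch: sum(v) * 2 > len(v) for ch, v in votes.items()}

occupancy = reduce_phase(map_phase(reports))
print(occupancy)  # {7: True, 11: False} -- channel 7 busy, channel 11 idle
```

    On Hadoop, `map_phase` and `reduce_phase` would run as Mapper and Reducer tasks over report batches; the paper's point is that the fixed per-job overhead of stock Hadoop dominates such a small, latency-critical job.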

Hsiaoping Tsai - One of the best experts on this subject based on the ideXlab platform.

  • PAKDD Workshops - Mining Uncertain Sequence Data on Hadoop Platform
    Lecture Notes in Computer Science, 2014
    Co-Authors: Ziyun Sun, Mingche Tsai, Hsiaoping Tsai
    Abstract:

    Sequence pattern mining discovers special and representative features hidden in sequence data. It has recently attracted a lot of attention, especially in bioinformatics and spatio-temporal trajectory mining. Observing that much sequence data is born with uncertainties and that huge volumes of sequence data are increasingly generated and accumulated, this paper aims to discover the features hidden in a large amount of uncertain sequence data. Specifically, the Probabilistic Suffix Tree (PST) is an implementation of the Variable-length Markov Chain (VMM) that has been widely applied in sequence data mining. However, the conventional PST construction algorithm neither handles uncertain data nor scales to huge data. Thus, to mine a large amount of sequence data with uncertainties, this paper proposes the uPST\(_{MR}^+\) algorithm on the Hadoop platform to fully utilize the computing power and storage capacity of cloud computing. The proposed uPST\(_{MR}^+\) algorithm constructs a PST in a progressive, multi-layered, and iterative manner, so as to avoid learning excessive patterns and to balance the overhead of distributed computing. In addition, to prevent repeated scans of the entire sequence data from dragging down overall performance, we trade space for time: a NodeArray data structure stores intermediate statistical results to reduce disk I/O. To verify the performance of uPST\(_{MR}^{+}\), we conducted several experiments. The results show that uPST\(_{MR}^{+}\) significantly outperforms the naive approach and exhibits good scalability and stability. Also, although the NodeArray costs a little extra memory, it significantly lowers the execution time.
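    To make the PST idea concrete: each PST node corresponds to a context (a recent suffix of the sequence) and stores how often each symbol follows that context, giving the variable-length conditional distributions of a VMM. The sketch below shows only this counting step on an ordinary (certain) sequence; the paper's uPST\(_{MR}^+\) additionally handles uncertain symbols and distributes the work over MapReduce, neither of which is attempted here.

```python
from collections import Counter, defaultdict

def pst_counts(seq, max_depth=2):
    """For every context s with len(s) <= max_depth, count how often each
    symbol follows s. These counts are the statistics a PST node stores."""
    counts = defaultdict(Counter)   # context -> Counter of next symbols
    for i in range(len(seq)):
        for d in range(0, max_depth + 1):
            if i - d < 0:
                break
            context = seq[i - d:i]  # the d symbols preceding position i
            counts[context][seq[i]] += 1
    return counts

def next_symbol_probs(counts, context):
    """Conditional distribution P(next symbol | context) from the counts."""
    c = counts[context]
    total = sum(c.values())
    return {sym: n / total for sym, n in c.items()}

counts = pst_counts("abababba", max_depth=2)
print(next_symbol_probs(counts, "ab"))  # P('a'|'ab') = 2/3, P('b'|'ab') = 1/3
```

    A full PST construction would additionally prune contexts whose distribution is close to their parent's; the scan-once counting above is what the paper's NodeArray caches to avoid rescanning the data on each layer.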

Shieyuan Wang - One of the best experts on this subject based on the ideXlab platform.

  • Optimizing the Cloud Platform Performance for Supporting Large-Scale Cognitive Radio Networks
    IEEE Wireless Communications and Networking Conference (WCNC), 2012
    Co-Authors: Shieyuan Wang, Pofan Wang, Piyang Chen
    Abstract: see the identical abstract under Piyang Chen above.

Ziyun Sun - One of the best experts on this subject based on the ideXlab platform.

  • PAKDD Workshops - Mining Uncertain Sequence Data on Hadoop Platform
    Lecture Notes in Computer Science, 2014
    Co-Authors: Ziyun Sun, Mingche Tsai, Hsiaoping Tsai
    Abstract: see the identical abstract under Hsiaoping Tsai above.

Pofan Wang - One of the best experts on this subject based on the ideXlab platform.

  • Optimizing the Cloud Platform Performance for Supporting Large-Scale Cognitive Radio Networks
    IEEE Wireless Communications and Networking Conference (WCNC), 2012
    Co-Authors: Shieyuan Wang, Pofan Wang, Piyang Chen
    Abstract: see the identical abstract under Piyang Chen above.