Lower Price Point

The experts below are selected from a list of 102 experts worldwide ranked by the ideXlab platform.

Matei Ripeanu - One of the best experts on this subject based on the ideXlab platform.

  • On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest
    2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013
    Co-Authors: Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto, Matei Ripeanu
    Abstract:

    Graph processing has gained renewed attention. The increasing scale and wealth of connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable information from large-scale graphs. Hybrid systems that host processing units optimized for both fast sequential processing and bulk processing (e.g., GPU-accelerated systems) have the potential to cope with the heterogeneous structure of real graphs and enable high-performance graph processing. Reaching this point, however, poses multiple challenges. The heterogeneity of the processing elements (e.g., GPUs implement a different parallel processing model than CPUs and have much less memory) and the inherent irregularity of graph workloads require careful graph partitioning and load assignment. In particular, the workload generated by a partitioning scheme should match the strength of the processing element the partition is allocated to. This work explores the feasibility and quantifies the performance gains of such low-cost partitioning schemes. We propose to partition the workload between the two types of processing elements based on vertex connectivity. We show that such partitioning schemes offer a simple yet efficient way to boost the overall performance of the hybrid system. Our evaluation illustrates that processing a 4-billion-edge graph on a system with one CPU socket and one GPU, while offloading as little as 25% of the edges to the GPU, achieves a 2x performance improvement over state-of-the-art implementations running on a dual-socket symmetric system. Moreover, for the same graph, a hybrid system with dual sockets and dual GPUs is capable of 1.13 billion breadth-first search traversed edges per second, a performance rate that is competitive with the latest entries in the Graph500 list, yet at a much lower price point.
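
The connectivity-based partitioning mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `partition_by_degree`, the choice to place the lowest-degree vertices on the GPU, and the greedy edge budget are all assumptions made here for illustration. The abstract only states that the split is based on vertex connectivity and that as little as 25% of the edges were offloaded.

```python
from collections import defaultdict


def partition_by_degree(edges, gpu_edge_fraction=0.25):
    """Split vertices into a CPU and a GPU partition by out-degree.

    Assumption (not stated in the abstract): the lowest-degree vertices are
    sent to the GPU until their out-edges fill the requested edge budget;
    every remaining vertex stays on the CPU.
    """
    out_edges = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out_edges[src].append(dst)
        vertices.update((src, dst))

    budget = gpu_edge_fraction * len(edges)  # number of edges the GPU may hold
    cpu, gpu = set(), set()
    assigned = 0

    # Visit vertices from lowest to highest out-degree (ties broken by id).
    for v in sorted(vertices, key=lambda u: (len(out_edges[u]), u)):
        degree = len(out_edges[v])
        if assigned + degree <= budget:
            gpu.add(v)
            assigned += degree
        else:
            cpu.add(v)
    return cpu, gpu, assigned


if __name__ == "__main__":
    # Toy graph: one hub (vertex 0) plus two low-degree vertices.
    toy_edges = [(0, i) for i in range(1, 7)] + [(1, 2), (2, 3)]
    cpu_part, gpu_part, gpu_edges = partition_by_degree(toy_edges)
    print("CPU vertices:", sorted(cpu_part))
    print("GPU vertices:", sorted(gpu_part))
    print("Edges offloaded to GPU:", gpu_edges)
```

In this toy graph the single hub stays on the CPU while the many low-degree vertices go to the GPU. A real system would also have to respect GPU memory limits and handle edges that cross the partition boundary, both of which this sketch ignores.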

Abdullah Gharaibeh - One of the best experts on this subject based on the ideXlab platform.

  • On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest
    2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013
    Co-Authors: Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto, Matei Ripeanu

Mo Li - One of the best experts on this subject based on the ideXlab platform.

  • Localization for industrial warehouse storage rack using passive UHF RFID system
    2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2017
    Co-Authors: Sheng Huang, Sethu Jose, Mo Li
    Abstract:

    Automatically tracking the location of items on warehouse storage racks is required by manufacturing companies to improve operational efficiency, productivity, and resource utilization. Owing to its long reading range and fast data transfer rate, a passive ultra-high frequency (UHF) RFID system is explored in this study to localize items on the rack. A tag is attached to each item on the rack so that it can be localized using radio signals. The received signal strength indication (RSSI) and radio frequency (RF) phase are used for localization. A supervised clustering approach is developed based on measurements of location tags in the different zones of the rack. The localization accuracy is improved by using a kernel as the similarity measure. If the desired granularity of item location is not fine, the lower price point per tag makes passive RFID systems more economical than active RFID systems.
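
As a rough illustration of using a kernel as a similarity measure for zone-level localization, the sketch below assigns a tag reading to the zone whose reference measurements are most similar on average. The Gaussian (RBF) kernel, the `gamma` value, the function names, and the sample readings are all assumptions introduced here; the abstract does not say which kernel or features scaling the authors used, and phase wrap-around is ignored.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: close to 1 for similar vectors, near 0 otherwise."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def classify_zone(sample, reference, gamma=0.5):
    """Return the zone whose reference readings are most similar to `sample`.

    `sample` is an (RSSI in dBm, phase in radians) pair; `reference` maps a
    zone label to a list of such pairs measured from tags at known positions.
    """
    best_zone, best_score = None, -1.0
    for zone, readings in reference.items():
        # Mean kernel similarity between the sample and this zone's readings.
        score = sum(rbf_kernel(sample, r, gamma) for r in readings) / len(readings)
        if score > best_score:
            best_zone, best_score = zone, score
    return best_zone


if __name__ == "__main__":
    # Hypothetical reference measurements, invented for illustration only.
    reference_tags = {
        "zone_A": [(-50.0, 1.0), (-52.0, 1.2)],
        "zone_B": [(-65.0, 2.5), (-63.0, 2.4)],
    }
    print(classify_zone((-51.0, 1.1), reference_tags))
    print(classify_zone((-64.0, 2.45), reference_tags))
```

In practice RSSI and phase live on very different numeric scales, so the features would normally be normalized before the kernel is applied.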

Elizeu Santos-Neto - One of the best experts on this subject based on the ideXlab platform.

  • On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest
    2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013
    Co-Authors: Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto, Matei Ripeanu

Lauro Beltrão Costa - One of the best experts on this subject based on the ideXlab platform.

  • On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest
    2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013
    Co-Authors: Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto, Matei Ripeanu