Text Analytics - Explore the Science & Experts

The experts below are selected from a list of 6453 experts worldwide, ranked by the ideXlab platform.

Laura Chiticariu - One of the best experts on this subject based on the ideXlab platform.

A hardware compilation framework for Text Analytics queries

Journal of Parallel and Distributed Computing, 2018

Co-Authors: Raphael Polig, Kubilay Atasu, Frederick R. Reiss, Laura Chiticariu, Heiner Giefers, Christoph Hagleitner, Peter Hofstee

Abstract:

Unstructured text data is being generated at an unprecedented rate in the form of Twitter feeds, machine logs or medical records. The analysis of this data is an important step to gaining significant insight regarding innovation, security and decision-making. The performance of traditional compute systems struggles to keep up with the rapid data growth and the expected high quality of information extraction. To cope with this situation, a compilation framework is presented that can transform Text Analytics queries into a hardware description. Deployed on an FPGA, the queries can be executed 60 times faster on average compared to a multi-threaded software implementation. The performance has been evaluated on two generations of high-end server systems including two generations of FPGAs, demonstrating the performance gains from advanced technology.

FPL - Compiling Text Analytics queries to FPGAs

2014 24th International Conference on Field Programmable Logic and Applications (FPL), 2014

Co-Authors: Raphael Polig, Kubilay Atasu, Heiner Giefers, Laura Chiticariu

Abstract:

Extracting information from unstructured text data is a compute-intensive task. The performance of general-purpose processors cannot keep up with the rapid growth of textual data. Therefore, we discuss the use of FPGAs to perform large-scale Text Analytics. We present a framework consisting of a compiler and an operator library capable of generating a Verilog processing pipeline from a Text Analytics query specified in the Annotation Query Language (AQL). The operator library comprises a set of configurable modules capable of performing relational and extraction tasks, which the compiler can assemble to represent a full annotation operator graph. Leveraging the nature of text processing, we show that most tasks can be performed in an efficient streaming fashion. We evaluate the performance, power consumption, and hardware utilization of our approach for a set of different queries compiled to a Stratix IV FPGA. Measurements show up to a 79-fold improvement in document throughput over a 64-thread software implementation on a POWER7 server. Moreover, the accelerated system's energy efficiency is up to 85 times better.

Giving Text Analytics a Boost

IEEE Micro, 2014

Co-Authors: Raphael Polig, Kubilay Atasu, Frederick R. Reiss, H. Peter Hofstee, Laura Chiticariu, Christoph Hagleitner, Eva Sitaridi

Abstract:

The amount of textual data has reached a new scale and continues to grow at an unprecedented rate. IBM's SystemT software is a powerful Text Analytics system that offers a query-based interface to reveal the valuable information that lies within these mounds of data. However, traditional server architectures are not capable of analyzing so-called big data efficiently, despite the high memory bandwidth that is available. The authors show that by using a streaming hardware accelerator implemented in reconfigurable logic, the throughput rates of SystemT's information extraction queries can be improved by an order of magnitude. They also show how such a system can be deployed by extending SystemT's existing compilation flow and by using a multithreaded communication interface that can efficiently use the accelerator's bandwidth.

I can do Text Analytics!: designing development tools for novice developers

Human Factors in Computing Systems, 2013

Co-Authors: Huahai Yang, Daina Pupons Wickham, Yunyao Li, Laura Chiticariu, Benjamin Nguyen, Arnaldo Carreno-Fuentes

Abstract:

Text Analytics, an increasingly important application domain, is hampered by a high barrier to entry due to the many conceptual difficulties novice developers encounter. This work addresses the problem by developing a tool that guides novice developers to adopt the best practices employed by expert developers in Text Analytics and to quickly harness the full power of the underlying system. Taking a user-centered, task-analytical approach, the tool development went through multiple design iterations and evaluation cycles. In the latest evaluation, we found that our tool enables novice developers to develop high-quality extractors on par with the state of the art within a few hours and with minimal training. Finally, we discuss our experience and lessons learned in the context of designing user interfaces to reduce the barriers to entry into complex domains of expertise.

Raphael Polig - One of the best experts on this subject based on the ideXlab platform.

A hardware compilation framework for Text Analytics queries

Journal of Parallel and Distributed Computing, 2018

Co-Authors: Raphael Polig, Kubilay Atasu, Frederick R. Reiss, Laura Chiticariu, Heiner Giefers, Christoph Hagleitner, Peter Hofstee

FPL - Compiling Text Analytics queries to FPGAs

2014 24th International Conference on Field Programmable Logic and Applications (FPL), 2014

Co-Authors: Raphael Polig, Kubilay Atasu, Heiner Giefers, Laura Chiticariu

Giving Text Analytics a Boost

IEEE Micro, 2014

Co-Authors: Raphael Polig, Kubilay Atasu, Frederick R. Reiss, H. Peter Hofstee, Laura Chiticariu, Christoph Hagleitner, Eva Sitaridi

FPL - Hardware-accelerated regular expression matching for high-throughput Text Analytics

2013 23rd International Conference on Field Programmable Logic and Applications, 2013

Co-Authors: Kubilay Atasu, Raphael Polig, Christoph Hagleitner, Frederick R. Reiss

Abstract:

Advanced Text Analytics systems combine regular expression (regex) matching, dictionary processing, and relational algebra for efficient information extraction from text documents. Such systems require support for advanced regex matching features, such as start offset reporting and capturing groups. However, existing regex matching architectures based on reconfigurable nondeterministic state machines and programmable deterministic state machines are not designed to support such features. We describe a novel architecture that supports such advanced features using a network of state machines. We also present a compiler that maps the regexs onto such networks that can be efficiently realized on reconfigurable logic. For each regex, our compiler produces a state machine description, statically computes the number of state machines needed, and produces an optimized interconnection network. Experiments on an Altera Stratix IV FPGA, using regexs from a real-life Text Analytics benchmark, show that a throughput rate of 16 Gb/s can be reached.
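
As a rough software illustration of the two features named in the abstract above, start offset reporting and capturing groups, the following Python sketch uses the standard `re` module. It is not the hardware architecture described in the paper; the pattern, the sample text, and the helper `find_spans` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern and sample text, used only for illustration.
PHONE = re.compile(r"(\d{3})-(\d{4})")
TEXT = "Call 555-1234 or 555-9876."


def find_spans(pattern, text):
    """Return (start, end, group_spans) for each non-overlapping match.

    start/end are the offsets of the whole match (start offset reporting);
    group_spans lists the (start, end) offsets of each capturing group.
    """
    results = []
    for m in pattern.finditer(text):
        group_spans = [m.span(i) for i in range(1, pattern.groups + 1)]
        results.append((m.start(), m.end(), group_spans))
    return results


matches = find_spans(PHONE, TEXT)
for start, end, groups in matches:
    print(TEXT[start:end], start, end, groups)
```

Each result carries the position of the whole match as well as the positions of its sub-matches, which is the kind of information an extraction system needs in order to annotate a document rather than merely flag that a match occurred.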

Kubilay Atasu - One of the best experts on this subject based on the ideXlab platform.

A hardware compilation framework for Text Analytics queries

Journal of Parallel and Distributed Computing, 2018

Co-Authors: Raphael Polig, Kubilay Atasu, Frederick R. Reiss, Laura Chiticariu, Heiner Giefers, Christoph Hagleitner, Peter Hofstee

FPL - Compiling Text Analytics queries to FPGAs

2014 24th International Conference on Field Programmable Logic and Applications (FPL), 2014

Co-Authors: Raphael Polig, Kubilay Atasu, Heiner Giefers, Laura Chiticariu

Giving Text Analytics a Boost

IEEE Micro, 2014

Co-Authors: Raphael Polig, Kubilay Atasu, Frederick R. Reiss, H. Peter Hofstee, Laura Chiticariu, Christoph Hagleitner, Eva Sitaridi

Resource-efficient regular expression matching architecture for Text Analytics

2014 IEEE 25th International Conference on Application-Specific Systems Architectures and Processors, 2014

Co-Authors: Kubilay Atasu

Abstract:

Text Analytics systems, such as IBM's SystemT software, rely on regular expressions (regexs) and dictionaries for transforming unstructured data into a structured format. Unlike network intrusion detection systems, Text Analytics systems compute and report precisely where the specific and sensitive information starts and ends in a text document. Therefore, advanced regex matching functions, such as start-offset reporting, capturing groups, and leftmost match computation, are heavily used in Text Analytics systems. We present a novel regex matching architecture that supports such functions in a resource-efficient way. The resource efficiency is achieved by 1) eliminating state replication, 2) avoiding expensive offset comparison operations in leftmost match computation, and 3) minimizing the number of offset registers. Experiments on regex sets from Text Analytics and network intrusion detection domains, using an Altera Stratix IV FPGA, show that the proposed architecture achieves a more than threefold reduction of the logic resources used and a more than 1.25-fold increase of the clock frequency with respect to a recently proposed architecture that supports identical features.

Zeynep Akkalyoncu Yilmaz - One of the best experts on this subject based on the ideXlab platform.

Information Retrieval Meets Scalable Text Analytics: Solr Integration with Spark

International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Co-Authors: Ryan Clancy, Zeynep Akkalyoncu Yilmaz

Abstract:

Despite the broad adoption of both Apache Spark and Apache Solr, there is little integration between these two platforms to support scalable, end-to-end Text Analytics. We believe this is a missed opportunity, as there is substantial synergy in building analytical pipelines where the results of potentially complex faceted queries feed downstream text processing components. This demonstration explores exactly such an integration: we evaluate performance under different analytical scenarios and present three simple case studies that illustrate the range of possible analyses enabled by seamlessly connecting Spark to Solr.

Frederick R. Reiss - One of the best experts on this subject based on the ideXlab platform.

A hardware compilation framework for Text Analytics queries

Journal of Parallel and Distributed Computing, 2018

Co-Authors: Raphael Polig, Kubilay Atasu, Frederick R. Reiss, Laura Chiticariu, Heiner Giefers, Christoph Hagleitner, Peter Hofstee

Giving Text Analytics a Boost

IEEE Micro, 2014

Co-Authors: Raphael Polig, Kubilay Atasu, Frederick R. Reiss, H. Peter Hofstee, Laura Chiticariu, Christoph Hagleitner, Eva Sitaridi

FPL - Hardware-accelerated regular expression matching for high-throughput Text Analytics

2013 23rd International Conference on Field Programmable Logic and Applications, 2013

Co-Authors: Kubilay Atasu, Raphael Polig, Christoph Hagleitner, Frederick R. Reiss

