Unstructured Content

The Experts below are selected from a list of 12054 Experts worldwide ranked by ideXlab platform

Niloy Mukherjee - One of the best experts on this subject based on the ideXlab platform.

  • Oracle Database Filesystem
    International Conference on Management of Data (SIGMOD '11), 2011
    Co-Authors: Krishna Kunchithapadam, Amit Ganesh, Wei Zhang, Niloy Mukherjee
    Abstract:

    Modern enterprise, web, and multimedia applications are generating unstructured content at unforeseen volumes in the form of documents, texts, and media files. Such content is generally associated with relational data such as user names, location tags, and timestamps. Storage of unstructured content in a relational database would guarantee the same robustness, transactional consistency, data integrity, data recoverability, and other data management features consolidated across files and relational content. Although database systems are preferred for relational data management, poor performance of unstructured data storage, limited data transformation functionalities, and lack of interfaces based on filesystem standards may keep more than eighty-five percent of non-relational unstructured content out of databases in the coming decades. We introduce Oracle Database Filesystem (DBFS) as a consolidated solution that unifies state-of-the-art network filesystem features with relational database management ones. DBFS is a novel shared-storage network filesystem developed in the RDBMS kernel that allows content management applications to transparently store and organize files using standard filesystem interfaces, in the same database that stores associated relational content. The server component of DBFS is based on Oracle SecureFiles, a novel unstructured data storage engine within the RDBMS that provides filesystem-like or better storage performance for files within the database while fully leveraging relational data management features such as transaction atomicity, isolation, read consistency, temporality, and information lifecycle management. We present a preliminary performance evaluation of DBFS that demonstrates more than 10 TB/hr throughput of filesystem read and write operations consistently over a period of 12 hours on an Oracle Exadata Database cluster of four server nodes. In terms of file storage, such extreme performance is equivalent to ingestion of more than 2500 million 100 KB document files in a single day. The set of initial results looks very promising for DBFS towards becoming the universal storage solution for both relational and unstructured content.
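
The throughput-to-file-count conversion in the abstract can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: the function name `files_per_day` is hypothetical, and the unit conventions (decimal versus binary TB and KB) are assumptions, since the abstract does not say which it uses.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumption: the abstract does not state whether TB/KB are decimal or binary,
# so both conventions are computed.

def files_per_day(tb_per_hour: int, file_kb: int, binary_units: bool) -> int:
    """Whole files of size `file_kb` ingested in 24 hours at `tb_per_hour`."""
    base = 1024 if binary_units else 1000
    bytes_per_day = tb_per_hour * base**4 * 24   # TB -> bytes, times 24 hours
    bytes_per_file = file_kb * base              # KB -> bytes
    return bytes_per_day // bytes_per_file

decimal_files = files_per_day(10, 100, binary_units=False)
binary_files = files_per_day(10, 100, binary_units=True)

print(f"decimal units: {decimal_files / 1e6:.0f} million files/day")
print(f"binary units:  {binary_files / 1e6:.0f} million files/day")
```

At exactly 10 TB/hr this gives about 2,400 million files per day with decimal units and about 2,577 million with binary units, so the abstract's "more than 2500 million" is consistent with binary units or with a sustained rate somewhat above 10 TB/hr.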

Eugene Agichtein - One of the best experts on this subject based on the ideXlab platform.

  • TREC - Factoid Question Answering over Unstructured and Structured Web Content.
    Text REtrieval Conference, 2005
    Co-Authors: Silviu Cucerzan, Eugene Agichtein
    Abstract:

    We describe our experience with two new, built-from-scratch, web-based question answering systems applied to the TREC 2005 Main Question Answering task, which use complementary models of answering questions over both structured and unstructured content on the Web. Our approaches depart from previous question answering (QA) work in several ways. For unstructured content, we used a web-based system with novel features such as web snippet pattern matching and generic answer type matching using web counts. We also experimented with a new, complementary question answering approach that uses information from the millions of tables and lists that abound on the web. This system attempts to answer factoid questions by guessing relevant rows and fields in matching web tables and integrating the results. We believe a combination of the two approaches holds promise.
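
To make the table-based idea concrete, here is a toy sketch of answering a factoid question by picking rows that mention a question term, reading the field of the wanted type, and voting across tables. It is an illustration only, not the authors' system: the data in `TABLES`, the function `answer`, and the exact-match and majority-vote rules are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy data standing in for web tables; not taken from the paper.
TABLES = [
    {"header": ["country", "capital"],
     "rows": [["France", "Paris"], ["Italy", "Rome"]]},
    {"header": ["nation", "capital", "population"],
     "rows": [["France", "Paris", "68M"], ["Spain", "Madrid", "48M"]]},
    {"header": ["city", "country"],
     "rows": [["Lyon", "France"]]},
]

def answer(question_terms, target_field, tables):
    """Vote over `target_field` cells from rows that mention a question term."""
    votes = Counter()
    terms = {t.lower() for t in question_terms}
    for table in tables:
        if target_field not in table["header"]:
            continue  # this table has no field of the wanted type
        col = table["header"].index(target_field)
        for row in table["rows"]:
            cells = {c.lower() for c in row}
            if terms & cells:          # row mentions at least one question term
                votes[row[col]] += 1   # integrate evidence across tables
    return votes.most_common(1)[0][0] if votes else None

print(answer(["France"], "capital", TABLES))
```

A real system would need fuzzy matching of terms and headers and some weighting of table quality; the exact-match vote here only shows the shape of the row-and-field selection step.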

Krishna Kunchithapadam - One of the best experts on this subject based on the ideXlab platform.

  • Oracle Database Filesystem
    International Conference on Management of Data, 2011
    Co-Authors: Krishna Kunchithapadam, Amit Ganesh, Wei Zhang, Niloy Mukherjee
    Abstract:

    Modern enterprise, web, and multimedia applications are generating unstructured content at unforeseen volumes in the form of documents, texts, and media files. Such content is generally associated with relational data such as user names, location tags, and timestamps. Storage of unstructured content in a relational database would guarantee the same robustness, transactional consistency, data integrity, data recoverability, and other data management features consolidated across files and relational content. Although database systems are preferred for relational data management, poor performance of unstructured data storage, limited data transformation functionalities, and lack of interfaces based on filesystem standards may keep more than eighty-five percent of non-relational unstructured content out of databases in the coming decades. We introduce Oracle Database Filesystem (DBFS) as a consolidated solution that unifies state-of-the-art network filesystem features with relational database management ones. DBFS is a novel shared-storage network filesystem developed in the RDBMS kernel that allows content management applications to transparently store and organize files using standard filesystem interfaces, in the same database that stores associated relational content. The server component of DBFS is based on Oracle SecureFiles, a novel unstructured data storage engine within the RDBMS that provides filesystem-like or better storage performance for files within the database while fully leveraging relational data management features such as transaction atomicity, isolation, read consistency, temporality, and information lifecycle management. We present a preliminary performance evaluation of DBFS that demonstrates more than 10 TB/hr throughput of filesystem read and write operations consistently over a period of 12 hours on an Oracle Exadata Database cluster of four server nodes. In terms of file storage, such extreme performance is equivalent to ingestion of more than 2500 million 100 KB document files in a single day. The set of initial results looks very promising for DBFS towards becoming the universal storage solution for both relational and unstructured content.

Bernhard Mitschang - One of the best experts on this subject based on the ideXlab platform.

  • The Deep Data Warehouse: Link-Based Integration and Enrichment of Warehouse Data and Unstructured Content
    2014 IEEE 18th International Enterprise Distributed Object Computing Conference (EDOC), 2014
    Co-Authors: Christoph Groger, Holger Schwarz, Bernhard Mitschang
    Abstract:

    Data warehouses are at the core of enterprise IT and enable the efficient storage and analysis of structured data. Besides, unstructured content, e.g., emails and documents, constitutes more than half of the entire enterprise data and contains a lot of implicit knowledge about warehouse entities. Thus, holistic analytics require the integration of structured warehouse data and unstructured content to generate novel insights. These insights can also be used to enrich the integrated data and to create a new basis for further analytics. Existing integration approaches only support a limited range of analytical applications and require the costly adaptation of the warehouse schema. In this paper, we present the Deep Data Warehouse (DeepDWH), a novel type of data warehouse based on the flexible integration and enrichment of warehouse data and unstructured content, addressing the variety challenge of Big Data. It relies on information-rich instance-level links between warehouse elements and content items, which are represented in a graph-oriented structure. Neither adaptations of the existing warehouse nor the design of an overall federated schema are required. We design a conceptual linking model and develop a logical schema for links based on a property graph. As a proof of concept, we present a prototypical implementation of the DeepDWH including a link store based on a graph database.
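
The idea of instance-level links held in a property graph can be pictured with a small in-memory sketch. Everything named here (`Node`, `Link`, `LinkStore`, the identifiers, and the `relation` and `confidence` properties) is hypothetical and chosen for illustration; the abstract does not describe the actual DeepDWH schema or link store API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # assumed labels: "warehouse_element" or "content_item"

@dataclass
class Link:
    source: Node
    target: Node
    properties: dict = field(default_factory=dict)  # arbitrary key/value annotations

class LinkStore:
    """Minimal in-memory stand-in for a graph-database-backed link store."""

    def __init__(self):
        self._links = []

    def add_link(self, source, target, **properties):
        link = Link(source, target, dict(properties))
        self._links.append(link)
        return link

    def content_for(self, element_id):
        """IDs of content items linked from the given warehouse element."""
        return [l.target.node_id for l in self._links
                if l.source.node_id == element_id]

store = LinkStore()
customer = Node("customer:42", "warehouse_element")    # a warehouse row, by key
email = Node("email:2014-03-07-001", "content_item")  # an unstructured document
store.add_link(customer, email, relation="mentions", confidence=0.9)

print(store.content_for("customer:42"))
```

Because the links live outside both the warehouse and the content repository, neither side's schema has to change, which matches the abstract's claim that no adaptation of the existing warehouse is needed.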

Silviu Cucerzan - One of the best experts on this subject based on the ideXlab platform.

  • TREC - Factoid Question Answering over Unstructured and Structured Web Content.
    Text REtrieval Conference, 2005
    Co-Authors: Silviu Cucerzan, Eugene Agichtein
    Abstract:

    We describe our experience with two new, built-from-scratch, web-based question answering systems applied to the TREC 2005 Main Question Answering task, which use complementary models of answering questions over both structured and unstructured content on the Web. Our approaches depart from previous question answering (QA) work in several ways. For unstructured content, we used a web-based system with novel features such as web snippet pattern matching and generic answer type matching using web counts. We also experimented with a new, complementary question answering approach that uses information from the millions of tables and lists that abound on the web. This system attempts to answer factoid questions by guessing relevant rows and fields in matching web tables and integrating the results. We believe a combination of the two approaches holds promise.
