Extracted Content

The experts below are selected from a list of 99 experts worldwide, ranked by the ideXlab platform.

Luis Gravano - One of the best experts on this subject based on the ideXlab platform.

  • VLDB - Distributed search over the hidden web: hierarchical database sampling and selection
    VLDB '02: Proceedings of the 28th International Conference on Very Large Databases, 2002
    Co-Authors: Panagiotis G. Ipeirotis, Luis Gravano
    Abstract:

    Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query efficiently and effectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, web-accessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from "uncooperative" databases by using "focused query probes," which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. Our content summaries are the first to include absolute document frequency estimates for the database words. We also present a novel database selection algorithm that exploits both the extracted content summaries and a hierarchical classification of the databases, automatically derived during probing, to compensate for potentially incomplete content summaries. Finally, we evaluate our techniques thoroughly using a variety of databases, including 50 real web-accessible text databases. Our experiments indicate that our new content-summary construction technique is efficient and produces more accurate summaries than those from previously proposed strategies. Also, our hierarchical database selection algorithm exhibits significantly higher precision than its flat counterparts.
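
    A minimal sketch can make the probing idea concrete. The following hypothetical Python illustration (not the paper's algorithm) builds a content summary by sampling documents through query probes and scaling sample document frequencies up to an estimated database size; the in-memory database, the probe queries, and the simple linear scaling are assumptions made for brevity, whereas the paper derives its absolute frequency estimates from the match counts reported by the search interface.

        # Hypothetical sketch of query-based sampling, not the paper's algorithm.
        from collections import Counter

        def probe(database, query_terms):
            # Simulate a search interface: return the documents matching
            # every term of the query.
            return [doc for doc in database
                    if all(t in doc.lower().split() for t in query_terms)]

        def build_content_summary(database, probes, db_size_estimate):
            # Retrieve a document sample via query probes, then estimate
            # absolute document frequencies by scaling sample frequencies
            # up to the (estimated) database size.
            sample = []
            for query in probes:
                for doc in probe(database, query):
                    if doc not in sample:          # keep each document once
                        sample.append(doc)
            sample_df = Counter()
            for doc in sample:
                sample_df.update(set(doc.lower().split()))   # document, not term, frequency
            scale = db_size_estimate / max(len(sample), 1)
            return {word: round(df * scale) for word, df in sample_df.items()}

        # Toy usage: topically focused probes sample a small "hidden" database.
        db = ["heart disease treatment study",
              "cancer therapy clinical trial",
              "basketball playoff results"]
        print(build_content_summary(db, probes=[["treatment"], ["therapy"]],
                                    db_size_estimate=3))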

Panagiotis G. Ipeirotis - One of the best experts on this subject based on the ideXlab platform.

  • VLDB - Distributed search over the hidden web: hierarchical database sampling and selection
    VLDB '02: Proceedings of the 28th International Conference on Very Large Databases, 2002
    Co-Authors: Panagiotis G. Ipeirotis, Luis Gravano
    Abstract: identical to the entry above under Luis Gravano.
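
    The paper's other component, hierarchical database selection, can be sketched in the same spirit. This hypothetical Python illustration is not the published algorithm: the tree encoding, the additive scoring, and the greedy descent to the best-scoring category are simplifying assumptions chosen for brevity.

        # Hypothetical sketch of hierarchical database selection.
        def db_score(summary, query):
            # Score a database by summing the estimated document
            # frequencies of the query words in its content summary.
            return sum(summary.get(w, 0) for w in query)

        def category_score(node, query):
            # Score a category by the total score of all databases beneath it.
            if 'dbs' in node:
                return sum(db_score(s, query) for s in node['dbs'].values())
            return sum(category_score(c, query) for c in node['children'].values())

        def hierarchical_select(node, query, k=1):
            # Descend the topic hierarchy toward the best-scoring category,
            # then rank the databases inside the chosen leaf.
            while 'children' in node:
                node = max(node['children'].values(),
                           key=lambda c: category_score(c, query))
            ranked = sorted(node['dbs'].items(),
                            key=lambda kv: db_score(kv[1], query), reverse=True)
            return [name for name, _ in ranked[:k]]

        # Toy hierarchy: two topic categories with sampled content summaries.
        root = {'children': {
            'Health': {'dbs': {'CancerDB': {'therapy': 200, 'tumor': 150},
                               'HeartDB':  {'therapy': 40, 'cardiac': 90}}},
            'Sports': {'dbs': {'NBA-DB':   {'playoff': 300, 'therapy': 2}}}}}
        print(hierarchical_select(root, ['therapy', 'tumor']))   # ['CancerDB']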

Witold Hołubowicz - One of the best experts on this subject based on the ideXlab platform.

  • CISIM - Cyber Security of the Application Layer of Mission Critical Industrial Systems
    Computer Information Systems and Industrial Management, 2016
    Co-Authors: Rafał Kozik, Michał Choraś, Rafał Renk, Witold Hołubowicz
    Abstract:

    In this paper we focus on proposing effective methods of cyber protection for the application layer. We also discuss how this challenge relates to mission-critical industrial and manufacturing systems. We propose a two-step HTTP request analysis method that combines request segmentation, statistical analysis of the extracted content, and machine learning on imbalanced data. In particular, we address a segmentation technique that allows us to divide a large dataset into smaller subsets and train the classifiers in significantly less time. In our experiments we evaluated several classifiers that are popular in the data mining community. The results of our experiments were obtained on the benchmark CSIC’10 HTTP dataset. The proposed approach further improves application-layer protection in comparison to other benchmark approaches.
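
    As a rough illustration of the pipeline the abstract outlines, the hypothetical sketch below segments an HTTP request line, derives simple statistical features from the segments, and trains a classifier with balanced class weights to counter the rarity of attack traffic. The feature set, the scikit-learn classifier, and the toy labels are assumptions, not the authors' method; the paper's two-step approach and its CSIC’10 evaluation are considerably richer.

        # Hypothetical sketch of request segmentation + statistical features
        # + learning on imbalanced data; not the authors' method.
        import math
        import re
        from sklearn.ensemble import RandomForestClassifier

        def entropy(s):
            # Shannon entropy (bits) of the characters of a string.
            if not s:
                return 0.0
            probs = [s.count(c) / len(s) for c in set(s)]
            return -sum(p * math.log2(p) for p in probs)

        def features(request):
            # Segment the request into path and query parameters, then
            # compute simple statistical features over the segments.
            path, _, query = request.partition('?')
            params = re.split('[&=]', query) if query else []
            joined = path + query
            special = sum(not c.isalnum() and c not in '/?' for c in joined)
            return [len(path),
                    len(params),
                    max((len(p) for p in params), default=0),
                    special / max(len(joined), 1),   # ratio of special characters
                    entropy(query)]

        # Toy labels: 0 = benign, 1 = anomalous (e.g., injection attempts).
        X = [features("/shop/item?id=42"),
             features("/shop/item?id=42' OR '1'='1"),
             features("/login?user=bob&pass=secret"),
             features("/search?q=<script>alert(1)</script>")]
        y = [0, 1, 0, 1]

        # class_weight='balanced' reweights the rare attack class,
        # addressing the imbalanced-data issue the abstract mentions.
        clf = RandomForestClassifier(n_estimators=50, class_weight='balanced',
                                     random_state=0).fit(X, y)
        print(clf.predict([features("/shop/item?id=7;DROP TABLE users")]))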

Rafał Kozik - One of the best experts on this subject based on the ideXlab platform.

  • CISIM - Cyber Security of the Application Layer of Mission Critical Industrial Systems
    Computer Information Systems and Industrial Management, 2016
    Co-Authors: Rafał Kozik, Michał Choraś, Rafał Renk, Witold Hołubowicz
    Abstract: identical to the entry above under Witold Hołubowicz.

Arthur Leclaire - One of the best experts on this subject based on the ideXlab platform.

  • Stochastic Image Models from SIFT-like descriptors
    SIAM Journal on Imaging Sciences, 2018
    Co-Authors: Agnès Desolneux, Arthur Leclaire
    Abstract:

    Extraction of local features constitutes a first step of many algorithms used in computer vision. The choice of keypoints and local features is often driven by the optimization of a performance criterion on a given computer vision task, which sometimes makes the extracted content difficult to interpret. In this paper we propose to examine the content of local image descriptors from a reconstruction perspective. For that, relying on the keypoints and descriptors provided by the scale-invariant feature transform (SIFT), we propose two stochastic models for exploring the set of images that can be obtained from given SIFT descriptors. The two models are both defined as solutions of generalized Poisson problems that combine gradient information at different scales. The first model consists in sampling an orientation field according to a maximum-entropy distribution constrained by local histograms of gradient orientations (at scale 0). The second model consists in simple resampling of the local histograms of gradient orientations at multiple scales. We show that both models admit convolutive expressions, which allow us to compute the model statistics (e.g., the mean and the variance). In the experimental section, we show that these models are able to recover many image structures while not requiring any external database. Finally, we compare several other choices of points of interest in terms of reconstruction quality, which confirms the optimality of the SIFT keypoints over simpler alternatives.
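
    To make the Poisson-problem machinery concrete, the hypothetical sketch below recovers an image from a prescribed gradient field by solving a single-scale Poisson problem in the Fourier domain. This is a simplified illustration under stated assumptions (periodic boundaries, forward finite differences, one scale), not the paper's models, which sample or resample orientation fields constrained by SIFT-like histograms across several scales.

        # Hypothetical single-scale Poisson reconstruction; not the paper's models.
        import numpy as np

        def poisson_reconstruct(gx, gy):
            # Recover an image u (up to its mean) from a gradient field
            # (gx, gy) by solving laplacian(u) = div(g) in the Fourier
            # domain, assuming periodic boundary conditions.
            h, w = gx.shape
            fx = np.fft.fftfreq(w).reshape(1, w)
            fy = np.fft.fftfreq(h).reshape(h, 1)
            dx = np.exp(2j * np.pi * fx) - 1       # symbol of the forward x-difference
            dy = np.exp(2j * np.pi * fy) - 1       # symbol of the forward y-difference
            denom = np.abs(dx) ** 2 + np.abs(dy) ** 2
            denom[0, 0] = 1.0                      # avoid dividing by zero at DC
            u_hat = (np.conj(dx) * np.fft.fft2(gx) +
                     np.conj(dy) * np.fft.fft2(gy)) / denom
            u_hat[0, 0] = 0.0                      # the mean of u is unconstrained
            return np.real(np.fft.ifft2(u_hat))

        # Sanity check: differentiate an image, then reconstruct it exactly.
        rng = np.random.default_rng(0)
        img = rng.standard_normal((64, 64))
        gx = np.roll(img, -1, axis=1) - img        # periodic forward differences
        gy = np.roll(img, -1, axis=0) - img
        rec = poisson_reconstruct(gx, gy)
        print(np.abs((rec - rec.mean()) - (img - img.mean())).max())  # ~1e-13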