Single Database

The experts below are selected from a list of 360 experts worldwide, ranked by the ideXlab platform.

Rafail Ostrovsky - One of the best experts on this subject based on the ideXlab platform.

  • Public Key Encryption That Allows PIR Queries
    International Cryptology Conference, 2007
    Co-Authors: Dan Boneh, Rafail Ostrovsky, Eyal Kushilevitz, William E. Skeith III
    Abstract:

    Consider the following problem: Alice wishes to maintain her email using a storage-provider Bob (such as a Yahoo! or Hotmail email account). This storage-provider should give Alice the ability to collect, retrieve, search and delete emails but, at the same time, should learn neither the content of messages sent from the senders to Alice (with Bob as an intermediary), nor the search criteria used by Alice. A trivial solution is that messages will be sent to Bob in encrypted form and Alice, whenever she wants to search for some message, will ask Bob to send her a copy of the entire database of encrypted emails. This, however, is highly inefficient. We will be interested in solutions that are communication-efficient and, at the same time, respect the privacy of Alice. In this paper, we show how to create a public-key encryption scheme for Alice that allows PIR searching over encrypted documents. Our solution is the first to reveal no partial information regarding the user's search (including the access pattern) in the public-key setting and with nontrivially small communication complexity. This provides a theoretical solution to a problem posed by Boneh, Di Crescenzo, Ostrovsky and Persiano on "Public-Key Encryption with Keyword Search." The main technique of our solution also allows for single-database PIR writing with sublinear communication complexity, which we consider of independent interest.

  • A Survey of Single-Database Private Information Retrieval: Techniques and Applications
    Public Key Cryptography, 2007
    Co-Authors: Rafail Ostrovsky, William E. Skeith III
    Abstract:

    In this paper we survey the notion of Single-Database Private Information Retrieval (PIR). The first Single-Database PIR was constructed in 1997 by Kushilevitz and Ostrovsky and since then Single-Database PIR has emerged as an important cryptographic primitive. For example, Single-Database PIR turned out to be intimately connected to collision-resistant hash functions, oblivious transfer and public-key encryptions with additional properties. In this survey, we give an overview of many of the constructions for Single-Database PIR (including an abstract construction based upon homomorphic encryption) and describe some of the connections of PIR to other primitives.
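
    The abstract construction from additively homomorphic encryption mentioned in this survey can be illustrated with a short sketch. The toy Python code below is an illustration under simplified assumptions, not the survey's exact construction, and its parameters are far too small to be secure. Using textbook Paillier encryption, the client sends an encrypted selection vector over the columns of the database matrix, the server homomorphically combines each row with that vector, and the client decrypts only the row it cares about; for a square matrix this costs on the order of sqrt(n) ciphertexts in each direction.

    import random
    from math import gcd, isqrt

    def is_prime(n):                        # trial division; fine for toy-sized primes
        return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

    def rand_prime(bits=16):
        while True:
            p = random.getrandbits(bits) | (1 << (bits - 1)) | 1
            if is_prime(p):
                return p

    class Paillier:                         # textbook Paillier; additively homomorphic
        def __init__(self, bits=16):
            p = rand_prime(bits)
            q = rand_prime(bits)
            while q == p:
                q = rand_prime(bits)
            self.N, self.N2 = p * q, (p * q) ** 2
            self.lam = (p - 1) * (q - 1)    # phi(N) works in place of lcm for decryption
            self.mu = pow(self.lam, -1, self.N)   # requires Python 3.8+

        def enc(self, m):
            while True:
                r = random.randrange(2, self.N)
                if gcd(r, self.N) == 1:
                    break
            return (1 + m * self.N) * pow(r, self.N, self.N2) % self.N2

        def dec(self, c):
            return (pow(c, self.lam, self.N2) - 1) // self.N * self.mu % self.N

    def pir_query(ph, num_cols, col):
        """Encrypted selection vector: Enc(1) at the wanted column, Enc(0) elsewhere."""
        return [ph.enc(1 if j == col else 0) for j in range(num_cols)]

    def pir_answer(matrix, query, N2):
        """Per row, homomorphically sum the selected entries: the result is Enc(row[col])."""
        answers = []
        for row in matrix:
            c = 1
            for bit, q in zip(row, query):
                if bit:                     # add Enc(selector_j) for every 1-bit in the row
                    c = c * q % N2
            answers.append(c)
        return answers

    # usage: a 4x4 bit matrix; the client privately reads the bit at (row 1, column 2)
    db = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1]]
    ph = Paillier()
    query = pir_query(ph, num_cols=4, col=2)
    reply = pir_answer(db, query, ph.N2)
    assert ph.dec(reply[1]) == db[1][2]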

  • A Survey of Single-Database PIR: Techniques and Applications
    IACR Cryptology ePrint Archive, 2007
    Co-Authors: Rafail Ostrovsky, William E Skeith
    Abstract:

    In this paper we survey the notion of Single-Database Private Information Retrieval (PIR). The first Single-Database PIR was constructed in 1997 by Kushilevitz and Ostrovsky and since then Single-Database PIR has emerged as an important cryptographic primitive. For example, Single-Database PIR turned out to be intimately connected to collision-resistant hash functions, oblivious transfer and public-key encryptions with additional properties. In this survey, we give an overview of many of the constructions for Single-Database PIR (including an abstract construction based upon homomorphic encryption) and describe some of the connections of PIR to other primitives.

  • Single-Database Private Information Retrieval Implies Oblivious Transfer
    Theory and Application of Cryptographic Techniques, 2000
    Co-Authors: Giovanni Di Crescenzo, Tal Malkin, Rafail Ostrovsky
    Abstract:

    A Single-Database Private Information Retrieval (PIR) is a protocol that allows a user to privately retrieve an entry from a database with as small a communication complexity as possible. We call a PIR protocol non-trivial if its total communication is strictly less than the size of the database. Non-trivial PIR is an important cryptographic primitive with many applications. Thus, understanding which assumptions are necessary for implementing such a primitive is an important task, although (so far) not a well-understood one. In this paper we show that any non-trivial PIR implies Oblivious Transfer, a far better understood primitive. Our result not only significantly clarifies our understanding of any non-trivial PIR protocol, but also yields the following consequences: (1) any non-trivial PIR is complete for all two-party and multiparty secure computations; (2) there exists a communication-efficient reduction from any PIR protocol to a 1-out-of-n Oblivious Transfer protocol (also called SPIR); (3) there is strong evidence that the assumption of the existence of a one-way function is necessary but not sufficient for any non-trivial PIR protocol.

  • Replication Is Not Needed: Single Database, Computationally-Private Information Retrieval
    Foundations of Computer Science, 1997
    Co-Authors: Eyal Kushilevitz, Rafail Ostrovsky
    Abstract:

    We establish the following, quite unexpected, result: replication of data for the computational private information retrieval problem is not necessary. More specifically, based on the quadratic residuosity assumption, we present a single-database, computationally private information retrieval scheme with O(n^ε) communication complexity for any ε > 0.
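
    A minimal sketch of the basic one-round scheme from this paper, under the quadratic residuosity assumption, is given below; the parameters are toy-sized and nothing here is secure, and the recursive variant that reaches O(n^ε) communication is omitted. The database is arranged as a bit matrix; the client sends one number per column, where the target column gets a quadratic non-residue and every other column gets a residue, and the server returns one number per row; the value for the target row is a residue exactly when the wanted bit is 0.

    import random
    from math import gcd, isqrt

    def is_prime(n):                        # trial division; fine for toy-sized primes
        return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

    def rand_prime(bits=16):
        while True:
            p = random.getrandbits(bits) | (1 << (bits - 1)) | 1
            if is_prime(p):
                return p

    def legendre(a, p):                     # 1 if a is a QR mod p, p-1 if it is a QNR
        return pow(a, (p - 1) // 2, p)

    class Client:
        def __init__(self, bits=16):
            self.p = rand_prime(bits)
            self.q = rand_prime(bits)
            while self.q == self.p:
                self.q = rand_prime(bits)
            self.N = self.p * self.q

        def query(self, num_cols, col):
            """One value per column: a non-residue at `col`, squares (residues) elsewhere."""
            ys = []
            for j in range(num_cols):
                while True:
                    r = random.randrange(2, self.N)
                    if gcd(r, self.N) != 1:
                        continue
                    if j != col:
                        ys.append(pow(r, 2, self.N))   # squares are always QRs
                        break
                    if legendre(r, self.p) != 1 and legendre(r, self.q) != 1:
                        ys.append(r)                   # QNR mod p and mod q, Jacobi symbol +1
                        break
            return ys

        def decode(self, z):
            """The answer for the target row is a QR mod N iff the database bit is 0."""
            is_qr = legendre(z, self.p) == 1 and legendre(z, self.q) == 1
            return 0 if is_qr else 1

    def server_answer(matrix, ys, N):
        """Per row, multiply y_j for 1-bits and y_j^2 for 0-bits; send one value per row."""
        answers = []
        for row in matrix:
            z = 1
            for bit, y in zip(row, ys):
                z = z * (y if bit else pow(y, 2, N)) % N
            answers.append(z)
        return answers

    # usage: a 4x4 bit database; the client privately reads the bit at (row 2, column 3)
    db = [[random.randrange(2) for _ in range(4)] for _ in range(4)]
    client = Client()
    ys = client.query(num_cols=4, col=3)
    zs = server_answer(db, ys, client.N)
    assert client.decode(zs[2]) == db[2][3]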

Dragomir R Radev - One of the best experts on this subject based on the ideXlab platform.

  • Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
    arXiv: Computation and Language, 2018
    Co-Authors: Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir R Radev
    Abstract:

    We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task where different complex SQL queries and databases appear in train and test sets. In this way, the task requires the model to generalize well to both new SQL queries and new database schemas. Spider is distinct from most of the previous semantic parsing tasks because they all use a single database and the exact same programs in the train set and the test set. We experiment with various state-of-the-art models and the best model achieves only 12.4% exact matching accuracy on a database split setting. This shows that Spider presents a strong challenge for future research. Our dataset and task are publicly available at this https URL

  • Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
    Empirical Methods in Natural Language Processing, 2018
    Co-Authors: Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir R Radev
    Abstract:

    We present Spider, a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task so that different complicated SQL queries and databases appear in train and test sets. In this way, the task requires the model to generalize well to both new SQL queries and new database schemas. Therefore, Spider is distinct from most of the previous semantic parsing tasks because they all use a single database and have the exact same program in the train set and the test set. We experiment with various state-of-the-art models and the best model achieves only 9.7% exact matching accuracy on a database split setting. This shows that Spider presents a strong challenge for future research. Our dataset and task with the most recent updates are publicly available at https://yale-lily.github.io/seq2sql/spider.
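
    The database split described above can be illustrated with a short sketch. The code below is not the official Spider tooling; the record layout (a dict with db_id, question, query) and the example records are only assumptions for illustration. It groups examples by database and splits at the database level, so that no schema seen at training time appears at test time.

    import random
    from collections import defaultdict

    def database_split(examples, test_fraction=0.2, seed=0):
        """Split a list of {'db_id', 'question', 'query'} records so that train and test share no databases."""
        by_db = defaultdict(list)
        for ex in examples:
            by_db[ex["db_id"]].append(ex)
        dbs = sorted(by_db)
        random.Random(seed).shuffle(dbs)
        n_test = max(1, int(len(dbs) * test_fraction))
        test_dbs = dbs[:n_test]                      # whole databases go to the test side
        train = [ex for db in dbs[n_test:] for ex in by_db[db]]
        test = [ex for db in test_dbs for ex in by_db[db]]
        return train, test

    # usage with two illustrative records
    examples = [
        {"db_id": "concert_singer", "question": "How many singers do we have?",
         "query": "SELECT count(*) FROM singer"},
        {"db_id": "pets_1", "question": "How many dogs are there?",
         "query": "SELECT count(*) FROM pets WHERE pet_type = 'dog'"},
    ]
    train, test = database_split(examples)
    assert not {ex["db_id"] for ex in train} & {ex["db_id"] for ex in test}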

Renhong Cheng - One of the best experts on this subject based on the ideXlab platform.

  • Robust detection of median filtering based on combined features of difference image
    Signal Processing: Image Communication, 2019
    Co-Authors: Hang Gao, Tiegang Gao, Renhong Cheng
    Abstract:

    Median filtering is a widely used method for denoising and smoothing regions of an image; it has drawn much attention from researchers in image forensics. A new detection scheme of median filtering based on combined features of the difference image (CFDI) is proposed in this paper. In the proposed scheme, the combined features consist of joint conditional probability density functions (JCPDFs) of the first-order and second-order difference image (DI); principal component analysis (PCA) is used to reduce the dimensionality of the JCPDFs, and thus the final features are obtained for the given threshold. A large number of experiments on a single database and on compound databases show that the proposed scheme achieves superior performance on uncompressed image datasets, and it also performs better than state-of-the-art methods, especially for strong JPEG compression and low-resolution images.
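
    The feature pipeline can be sketched as follows. This is a hedged illustration, not the authors' exact CFDI implementation: it uses a truncated joint histogram of adjacent values in the horizontal first- and second-order difference images as a stand-in for the JCPDFs, followed by PCA. The threshold T, the horizontal-only differences, and the number of PCA components are assumptions.

    import numpy as np
    from sklearn.decomposition import PCA

    def joint_hist_feature(img, order=1, T=3):
        d = np.diff(img.astype(np.int32), n=order, axis=1)        # horizontal difference image
        d = np.clip(d, -T, T)                                      # truncate values to [-T, T]
        x, y = d[:, :-1].ravel(), d[:, 1:].ravel()                 # horizontally adjacent pairs
        hist, _, _ = np.histogram2d(x, y, bins=2 * T + 1,
                                    range=[[-T - 0.5, T + 0.5]] * 2)
        return (hist / hist.sum()).ravel()                         # empirical joint probabilities

    def cfdi_like_features(images, n_components=20):
        feats = np.array([np.concatenate([joint_hist_feature(im, order=1),
                                          joint_hist_feature(im, order=2)])
                          for im in images])
        k = min(n_components, feats.shape[0], feats.shape[1])
        return PCA(n_components=k).fit_transform(feats)

    # usage with random stand-in images (real experiments would contrast original vs. median-filtered images)
    imgs = [np.random.randint(0, 256, (64, 64)) for _ in range(8)]
    print(cfdi_like_features(imgs).shape)                          # (8, 8)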

Peter Christen - One of the best experts on this subject based on the ideXlab platform.

  • a survey of indexing techniques for scalable record linkage and deduplication
    IEEE Transactions on Knowledge and Data Engineering, 2012
    Co-Authors: Peter Christen
    Abstract:

    Record linkage is the process of matching records from several databases that refer to the same entities. When applied to a single database, this process is known as deduplication. Increasingly, matched data are becoming important in many application areas, because they can contain information that is not available otherwise, or that is too costly to acquire. Removing duplicate records in a single database is a crucial step in the data cleaning process, because duplicates can severely influence the outcomes of any subsequent data processing or data mining. With the increasing size of today's databases, the complexity of the matching process becomes one of the major challenges for record linkage and deduplication. In recent years, various indexing techniques have been developed for record linkage and deduplication. They are aimed at reducing the number of record pairs to be compared in the matching process by removing obvious non-matching pairs, while at the same time maintaining high matching quality. This paper presents a survey of 12 variations of 6 indexing techniques. Their complexity is analyzed, and their performance and scalability are evaluated within an experimental framework using both synthetic and real data sets. No such detailed survey has so far been published.
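
    The core indexing idea the survey analyzes can be shown in a few lines. The sketch below implements standard blocking, the simplest member of the surveyed family: records are grouped by a blocking key and only pairs inside the same block become candidates for comparison. The key definition (first four letters of the surname plus postcode) and the record layout are illustrative assumptions, not one of the paper's evaluated configurations.

    from collections import defaultdict
    from itertools import combinations

    def blocking_key(record):
        return (record["surname"][:4].lower(), record["postcode"])

    def candidate_pairs(records):
        blocks = defaultdict(list)
        for rec_id, rec in records.items():
            blocks[blocking_key(rec)].append(rec_id)
        pairs = set()
        for ids in blocks.values():
            pairs.update(combinations(sorted(ids), 2))   # compare only within a block
        return pairs

    # usage: three records, only the two sharing a block are compared
    records = {
        1: {"surname": "Christensen", "postcode": "2600"},
        2: {"surname": "Christenson", "postcode": "2600"},
        3: {"surname": "Smith",       "postcode": "2601"},
    }
    print(candidate_pairs(records))   # {(1, 2)} -- the deduplication candidates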

  • development and user experiences of an open source data cleaning deduplication and record linkage system
    Knowledge Discovery and Data Mining, 2009
    Co-Authors: Peter Christen
    Abstract:

    Record linkage, also known as database matching or entity resolution, is now recognised as a core step in the KDD process. Data mining projects increasingly require that information from several sources is combined before the actual mining can be conducted. Also of increasing interest is the deduplication of a single database. The objectives of record linkage and deduplication are to identify, match and merge all records that relate to the same real-world entities. Because real-world data is commonly 'dirty', data cleaning is an important first step in many deduplication, record linkage, and data mining projects. In this paper, an overview of the Febrl (Freely Extensible Biomedical Record Linkage) system is provided, and the results of a recent survey of Febrl users are discussed. Febrl includes a variety of functionalities required for data cleaning, deduplication and record linkage, and it provides a graphical user interface that facilitates its application by users who do not have programming experience.

Hang Gao - One of the best experts on this subject based on the ideXlab platform.

  • Detection of median filtering based on ARMA model and pixel-pair histogram feature of difference image
    Multimedia Tools and Applications, 2020
    Co-Authors: Hang Gao, Tiegang Gao
    Abstract:

    Median filtering is a widely used method for removing noise and smoothing regions of an image, and the detection of median filtering has drawn much attention from researchers in image forensics. A new robust detection scheme of median filtering based on the pixel-pair histogram (PPH) and the coefficients of an autoregressive moving average (ARMA) model of the difference image is proposed in this paper. In the proposed scheme, the PPH and ARMA features are extracted from the difference image in four directions; the generated PPH-ARMA feature of 396 dimensions can effectively be used to detect median filtering. In order to verify the effectiveness of the proposed scheme, a series of experiments on a single database and on compound databases are conducted, and the experimental results show that the proposed scheme outperforms many existing algorithms. Moreover, the suggested approach achieves the best performance on single and compound datasets compared with state-of-the-art methods, especially for strong JPEG compression and low-resolution images.
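
    A rough sketch of the model-coefficient part of such a feature is given below. It is not the authors' PPH-ARMA pipeline: it fits a low-order autoregressive model (a simplified stand-in for the full ARMA coefficients) to the flattened horizontal difference image using Yule-Walker estimation; the model order and the horizontal-only direction are assumptions.

    import numpy as np

    def ar_coefficients(signal, order=4):
        """Yule-Walker estimate of AR coefficients for a 1-D signal."""
        s = signal - signal.mean()
        acf = np.correlate(s, s, mode="full")[len(s) - 1:][:order + 1]
        acf = acf / acf[0]                                   # normalized autocorrelation at lags 0..order
        R = np.array([[acf[abs(i - j)] for j in range(order)] for i in range(order)])
        return np.linalg.solve(R, acf[1:order + 1])          # solve the Yule-Walker equations

    def arma_like_feature(img, order=4):
        d = np.diff(img.astype(float), axis=1).ravel()       # flattened horizontal difference image
        return ar_coefficients(d, order)

    # usage with a random stand-in image
    print(arma_like_feature(np.random.randint(0, 256, (64, 64))))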

  • Robust detection of median filtering based on combined features of difference image
    Signal Processing: Image Communication, 2019
    Co-Authors: Hang Gao, Tiegang Gao, Renhong Cheng
    Abstract:

    Median filtering is a widely used method for denoising and smoothing regions of an image; it has drawn much attention from researchers in image forensics. A new detection scheme of median filtering based on combined features of the difference image (CFDI) is proposed in this paper. In the proposed scheme, the combined features consist of joint conditional probability density functions (JCPDFs) of the first-order and second-order difference image (DI); principal component analysis (PCA) is used to reduce the dimensionality of the JCPDFs, and thus the final features are obtained for the given threshold. A large number of experiments on a single database and on compound databases show that the proposed scheme achieves superior performance on uncompressed image datasets, and it also performs better than state-of-the-art methods, especially for strong JPEG compression and low-resolution images.