Functional Genomics

The Experts below are selected from a list of 360 Experts worldwide ranked by the ideXlab platform

Arif Harmanci - One of the best experts on this subject based on the ideXlab platform.

  • Analysis of sensitive information leakage in Functional Genomics signal profiles through genomic deletions
    Nature Communications, 2018
    Co-Authors: Arif Harmanci, Mark Gerstein
    Abstract:

    Functional Genomics experiments, such as RNA-seq, provide non-individual specific information about gene expression under different conditions such as disease and normal. There is great desire to share these data. However, privacy concerns often preclude sharing of the raw reads. To enable safe sharing, aggregated summaries such as read-depth signal profiles and levels of gene expression are used. Projects such as GTEx and ENCODE share these because they ostensibly do not leak much identifying information. Here, we attempt to quantify the validity of this statement, measuring the leakage of genomic deletions from signal profiles. We present information theoretic measures for the degree to which one can genotype these deletions. We then develop practical genotyping approaches and demonstrate how to use these to identify an individual within a large cohort in the context of linking attacks. Finally, we present an anonymization method removing much of the leakage from signal profiles.

  • Sensitive information leakage from Functional Genomics data: theoretical quantifications and practical file formats for privacy preservation
    bioRxiv, 2018
    Co-Authors: Gamze Gursoy, Arif Harmanci, Molly E Green, Fabio C P Navarro, Mark Gerstein
    Abstract:

    Functional Genomics experiments provide data on aspects of gene function in a variety of conditions and how they relate to organismal phenotype (e.g. "genes upregulated in AIDS"). These experiments do not necessarily concern findings on identifiable individuals, leading to a neglect of their privacy issues; however, for each experiment, it is possible to create "cryptic quasi-identifiers" statistically linking them back to individuals and thereby leaking sensitive phenotypic information (e.g. "HIV status"). Here, we develop metrics for quantifying this leakage and instantiate them in practical linking attacks. As genotyping noise is a crucial quantity for the feasibility of attacks, we perform them both with highly accurate reference Genomics datasets as well as by generating RNA and DNA data from more realistic environmental samples. Finally, in order to reduce leakage, we develop a data-sanitization protocol for making principled privacy-utility trade-offs, permitting the sharing of Functional Genomics data while minimizing risk of leakage.

  • Sensitive information leakage from Functional Genomics data: theoretical quantifications and practical file formats for privacy preservation
    bioRxiv, 2018
    Co-Authors: Gamze Gursoy, Arif Harmanci, Molly E Green, Fabio C P Navarro, Mark Gerstein
    Abstract:

    Functional Genomics experiments on human subjects present a privacy conundrum. On one hand, many of the conclusions we infer from these experiments are not tied to the identity of individuals but represent universal statements about biology and disease. On the other hand, by virtue of the experimental procedure, the sequencing reads are tagged with small bits of patients' variant information, which presents privacy challenges in terms of data sharing. There is great desire to share data as broadly as possible. Therefore, measuring the amount of variant information leaked in a variety of experiments, particularly in relation to the amount of sequencing, is a key first step in reducing information leakage and determining an appropriate set point for sharing with minimal leakage. To this end, we derived information-theoretic measures for the private information leaked in experiments and developed various file formats to reduce this during sharing. We show that high-depth experiments such as Hi-C provide accurate genotyping that can lead to large privacy leaks. Counterintuitively, low-depth experiments such as ChIP and single-cell RNA sequencing, although not useful for genotyping, can create strong quasi-identifiers for re-identification through linking attacks. We show that partial and incomplete genotypes from many of these experiments can further be combined to construct an individual's complete variant set and identify phenotypes. We provide a proof-of-concept analytic framework, in which the amount of leaked information can be estimated from the depth and breadth of the coverage as well as sequencing biases of a given Functional Genomics experiment. Finally, as a practical instantiation of our framework, we propose file formats that maximize the potential sharing of data while protecting individuals' sensitive information. Depending on the desired sharing set point, our proposed format can achieve differential trade-offs in the privacy-utility balance. At the highest level of privacy, we mask all the variants leaked from reads, but still can create useable signal profiles that give complete recovery of the original gene expression levels.
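
The abstracts above refer to information-theoretic leakage measures and to linking attacks without spelling them out on this page. As a rough illustration only, the sketch below scores how identifying a set of leaked genotypes might be (the sum of -log2 of each genotype's population frequency, assuming independent variants) and links a partial genotype vector to the closest entry in a reference panel by counting mismatches. The function names, the toy panel, and the frequencies are hypothetical, and this is not presented as the authors' method.

```python
import math


def information_bits(genotypes, genotype_freqs):
    """Sum of -log2(frequency) over the observed genotypes.

    `None` marks a variant that was not recovered from the data.
    Assumes variants are independent, which real genomes violate (linkage).
    """
    bits = 0.0
    for genotype, freqs in zip(genotypes, genotype_freqs):
        if genotype is not None:
            bits += -math.log2(freqs[genotype])
    return bits


def link(leaked, panel):
    """Return (sample_id, mismatches) for the panel entry closest to `leaked`.

    Only positions observed in `leaked` (not None) are compared.
    """
    best_id, best_mismatches = None, None
    for sample_id, genotypes in panel.items():
        mismatches = sum(
            1 for a, b in zip(leaked, genotypes) if a is not None and a != b
        )
        if best_mismatches is None or mismatches < best_mismatches:
            best_id, best_mismatches = sample_id, mismatches
    return best_id, best_mismatches


# Toy data, made up for illustration. Genotypes are coded as the number of
# alternate alleles (0, 1 or 2) at five hypothetical variant sites.
panel = {
    "ind_A": [0, 1, 2, 0, 1],
    "ind_B": [0, 0, 1, 2, 1],
    "ind_C": [2, 1, 0, 0, 0],
}
genotype_freqs = [
    {0: 0.5, 1: 0.25, 2: 0.25},
    {0: 0.5, 1: 0.25, 2: 0.25},
    {0: 0.5, 1: 0.25, 2: 0.25},
    {0: 0.75, 1: 0.125, 2: 0.125},
    {0: 0.5, 1: 0.25, 2: 0.25},
]

# Partial genotypes, as might be recovered from a functional genomics assay.
leaked = [0, None, 1, 2, None]

best_match = link(leaked, panel)
leaked_bits = information_bits(leaked, genotype_freqs)
print(best_match, leaked_bits)
```

In this toy case three observed sites are enough to single out one panel entry. The rarer the observed genotypes, the more bits they carry and the fewer sites are needed, which is one way to read the abstract's point that even low-depth experiments can act as quasi-identifiers.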

Mark Gerstein - One of the best experts on this subject based on the ideXlab platform.

  • FANCY: fast estimation of privacy risk in Functional Genomics data
    Bioinformatics, 2021
    Co-Authors: Gamze Gursoy, Fabio C P Navarro, Charlotte M Brannon, Mark Gerstein
    Abstract:

    Motivation: Functional Genomics data is becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. Results: FANCY can predict the cumulative number of leaking SNVs with an average R^2 of 0.95 for all independent test sets. We realize the importance of accurate prediction even when the number of leaked variants is low. Thus, we develop a special version of the model, which can make predictions with higher accuracy for only a few leaking variants. Availability: A Python and MATLAB implementation of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters in the regression model based on their own data. An easy-to-use webserver that takes inputs and displays results can be found at fancy.gersteinlab.org.

  • FANCY: fast estimation of privacy risk in Functional Genomics data
    bioRxiv, 2020
    Co-Authors: Gamze Gursoy, Fabio C P Navarro, Charlotte M Brannon, Mark Gerstein
    Abstract:

    Functional Genomics data is becoming clinically actionable, raising privacy concerns. However, quantifying the privacy leakage by genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. FANCY can predict the cumulative number of leaking SNVs with an average R^2 of 0.95 for all independent test sets. We acknowledged the importance of accurate prediction even when the number of leaked variants is low, so we developed a special version of the model, which can make predictions with higher accuracy for only a few leaking variants. A Python and MATLAB implementation of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters in the regression model based on their own data. An easy-to-use webserver that takes inputs and displays results can be found at fancy.gersteinlab.org.

  • Analysis of sensitive information leakage in Functional Genomics signal profiles through genomic deletions
    Nature Communications, 2018
    Co-Authors: Arif Harmanci, Mark Gerstein
    Abstract:

    Functional Genomics experiments, such as RNA-seq, provide non-individual specific information about gene expression under different conditions such as disease and normal. There is great desire to share these data. However, privacy concerns often preclude sharing of the raw reads. To enable safe sharing, aggregated summaries such as read-depth signal profiles and levels of gene expression are used. Projects such as GTEx and ENCODE share these because they ostensibly do not leak much identifying information. Here, we attempt to quantify the validity of this statement, measuring the leakage of genomic deletions from signal profiles. We present information theoretic measures for the degree to which one can genotype these deletions. We then develop practical genotyping approaches and demonstrate how to use these to identify an individual within a large cohort in the context of linking attacks. Finally, we present an anonymization method removing much of the leakage from signal profiles.

  • Sensitive information leakage from Functional Genomics data: theoretical quantifications and practical file formats for privacy preservation
    bioRxiv, 2018
    Co-Authors: Gamze Gursoy, Arif Harmanci, Molly E Green, Fabio C P Navarro, Mark Gerstein
    Abstract:

    Functional Genomics experiments provide data on aspects of gene function in a variety of conditions and how they relate to organismal phenotype (e.g. "genes upregulated in AIDS"). These experiments do not necessarily concern findings on identifiable individuals, leading to a neglect of their privacy issues; however, for each experiment, it is possible to create "cryptic quasi-identifiers" statistically linking them back to individuals and thereby leaking sensitive phenotypic information (e.g. "HIV status"). Here, we develop metrics for quantifying this leakage and instantiate them in practical linking attacks. As genotyping noise is a crucial quantity for the feasibility of attacks, we perform them both with highly accurate reference Genomics datasets as well as by generating RNA and DNA data from more realistic environmental samples. Finally, in order to reduce leakage, we develop a data-sanitization protocol for making principled privacy-utility trade-offs, permitting the sharing of Functional Genomics data while minimizing risk of leakage.

  • Sensitive information leakage from Functional Genomics data: theoretical quantifications and practical file formats for privacy preservation
    bioRxiv, 2018
    Co-Authors: Gamze Gursoy, Arif Harmanci, Molly E Green, Fabio C P Navarro, Mark Gerstein
    Abstract:

    Functional Genomics experiments on human subjects present a privacy conundrum. On one hand, many of the conclusions we infer from these experiments are not tied to the identity of individuals but represent universal statements about biology and disease. On the other hand, by virtue of the experimental procedure, the sequencing reads are tagged with small bits of patients' variant information, which presents privacy challenges in terms of data sharing. There is great desire to share data as broadly as possible. Therefore, measuring the amount of variant information leaked in a variety of experiments, particularly in relation to the amount of sequencing, is a key first step in reducing information leakage and determining an appropriate set point for sharing with minimal leakage. To this end, we derived information-theoretic measures for the private information leaked in experiments and developed various file formats to reduce this during sharing. We show that high-depth experiments such as Hi-C provide accurate genotyping that can lead to large privacy leaks. Counterintuitively, low-depth experiments such as ChIP and single-cell RNA sequencing, although not useful for genotyping, can create strong quasi-identifiers for re-identification through linking attacks. We show that partial and incomplete genotypes from many of these experiments can further be combined to construct an individual's complete variant set and identify phenotypes. We provide a proof-of-concept analytic framework, in which the amount of leaked information can be estimated from the depth and breadth of the coverage as well as sequencing biases of a given Functional Genomics experiment. Finally, as a practical instantiation of our framework, we propose file formats that maximize the potential sharing of data while protecting individuals' sensitive information. Depending on the desired sharing set point, our proposed format can achieve differential trade-offs in the privacy-utility balance. At the highest level of privacy, we mask all the variants leaked from reads, but still can create useable signal profiles that give complete recovery of the original gene expression levels.
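
The FANCY abstracts in this list describe supervised regression that uses overall sequencing statistics as features to estimate the number of leaking variants, scored by R^2 on independent test sets. The sketch below shows only the general shape of that workflow: it swaps in ordinary least squares on a single hypothetical feature, and every number in it is synthetic. It does not reproduce FANCY's model, features, or results; the actual implementation is in the repository named in the abstract.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic training data (made up): one hypothetical sequencing statistic
# per sample, and a hypothetical leakage score to be predicted from it.
train_feature = [1.0, 2.0, 3.0, 4.0]
train_leakage = [2.1, 3.9, 6.2, 7.8]

# Synthetic held-out samples, standing in for an independent test set.
test_feature = [5.0, 6.0]
test_leakage = [10.1, 11.9]

slope, intercept = fit_line(train_feature, train_leakage)
predicted = [slope * x + intercept for x in test_feature]
test_r2 = r_squared(test_leakage, predicted)
print(slope, intercept, test_r2)
```

Scoring on held-out samples rather than on the training data is the point of the "independent test sets" wording in the abstract: R^2 measured on the fitting data would overstate how well the estimate transfers to a new experiment.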

Madeleine Bouvier Dyvoire - One of the best experts on this subject based on the ideXlab platform.

  • A TILLING platform for Functional Genomics in Brachypodium distachyon
    PLOS ONE, 2013
    Co-Authors: Marion Dalmais, Sébastien Antelme, Severine Hoyuekuang, Yin Wang, Olivier Darracq, Madeleine Bouvier Dyvoire
    Abstract:

    The new model plant for temperate grasses, Brachypodium distachyon, offers great potential as a tool for Functional Genomics. We have established a sodium azide-induced mutant collection and a TILLING platform, called "BRACHYTIL", for the inbred line Bd21-3. The TILLING collection consists of DNA isolated from 5530 different families. Phenotypes were reported and organized in a phenotypic tree that is freely available online. The TILLING platform was validated by the isolation of mutants for seven genes belonging to multigene families of the lignin biosynthesis pathway. In particular, a large allelic series for BdCOMT6, a caffeic acid O-methyl transferase, was identified. Some mutants show lower lignin content when compared to wild-type plants as well as a typical decrease of syringyl units, a hallmark of COMT-deficient plants. The mutation rate was estimated at one mutation per 396 kb, or an average of 680 mutations per line. The collection was also used to assess the Genetically Effective Cell Number, which was shown to be at least 4 cells in Brachypodium distachyon. The mutant population and the TILLING platform should greatly facilitate Functional Genomics approaches in this model organism.
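
The two mutation figures in the abstract above (one mutation per 396 kb, 680 mutations per line) are linked by a genome size that the abstract does not state. The short calculation below recovers the size they imply and, as a further illustration, estimates how many mutations a screen of a 1 kb target might turn up across the collection. The helper name, the 1 kb target, and the assumptions of uniform mutation density and one screened line per family are illustrative, not taken from the paper.

```python
KB_PER_MUTATION = 396       # from the abstract: one mutation per 396 kb
MUTATIONS_PER_LINE = 680    # from the abstract: average mutations per line
FAMILIES = 5530             # from the abstract: families in the collection

# Genome size implied by the two figures (not stated in the abstract).
implied_genome_kb = KB_PER_MUTATION * MUTATIONS_PER_LINE
implied_genome_mb = implied_genome_kb / 1000


def expected_hits(target_kb, lines_screened):
    """Expected mutations in a target region across the screened lines.

    Assumes mutations are spread uniformly along the genome.
    """
    return lines_screened * target_kb / KB_PER_MUTATION


hits_1kb = expected_hits(1.0, FAMILIES)
print(implied_genome_mb, round(hits_1kb, 1))
```

Under these assumptions the figures imply a genome of about 269 Mb, and a 1 kb target screened across all 5530 families would be expected to carry roughly 14 mutations, which is consistent with the abstract's report of a large allelic series for a single gene.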

Severine Hoyuekuang - One of the best experts on this subject based on the ideXlab platform.

  • A TILLING platform for Functional Genomics in Brachypodium distachyon
    PLOS ONE, 2013
    Co-Authors: Marion Dalmais, Sébastien Antelme, Severine Hoyuekuang, Yin Wang, Olivier Darracq, Madeleine Bouvier Dyvoire
    Abstract:

    The new model plant for temperate grasses, Brachypodium distachyon, offers great potential as a tool for Functional Genomics. We have established a sodium azide-induced mutant collection and a TILLING platform, called "BRACHYTIL", for the inbred line Bd21-3. The TILLING collection consists of DNA isolated from 5530 different families. Phenotypes were reported and organized in a phenotypic tree that is freely available online. The TILLING platform was validated by the isolation of mutants for seven genes belonging to multigene families of the lignin biosynthesis pathway. In particular, a large allelic series for BdCOMT6, a caffeic acid O-methyl transferase, was identified. Some mutants show lower lignin content when compared to wild-type plants as well as a typical decrease of syringyl units, a hallmark of COMT-deficient plants. The mutation rate was estimated at one mutation per 396 kb, or an average of 680 mutations per line. The collection was also used to assess the Genetically Effective Cell Number, which was shown to be at least 4 cells in Brachypodium distachyon. The mutant population and the TILLING platform should greatly facilitate Functional Genomics approaches in this model organism.

Sébastien Antelme - One of the best experts on this subject based on the ideXlab platform.

  • A TILLING platform for Functional Genomics in Brachypodium distachyon
    PLOS ONE, 2013
    Co-Authors: Marion Dalmais, Sébastien Antelme, Severine Hoyuekuang, Yin Wang, Olivier Darracq, Madeleine Bouvier Dyvoire
    Abstract:

    The new model plant for temperate grasses, Brachypodium distachyon, offers great potential as a tool for Functional Genomics. We have established a sodium azide-induced mutant collection and a TILLING platform, called "BRACHYTIL", for the inbred line Bd21-3. The TILLING collection consists of DNA isolated from 5530 different families. Phenotypes were reported and organized in a phenotypic tree that is freely available online. The TILLING platform was validated by the isolation of mutants for seven genes belonging to multigene families of the lignin biosynthesis pathway. In particular, a large allelic series for BdCOMT6, a caffeic acid O-methyl transferase, was identified. Some mutants show lower lignin content when compared to wild-type plants as well as a typical decrease of syringyl units, a hallmark of COMT-deficient plants. The mutation rate was estimated at one mutation per 396 kb, or an average of 680 mutations per line. The collection was also used to assess the Genetically Effective Cell Number, which was shown to be at least 4 cells in Brachypodium distachyon. The mutant population and the TILLING platform should greatly facilitate Functional Genomics approaches in this model organism.