Indel - Explore the Science & Experts

The Experts below are selected from a list of 31929 Experts worldwide ranked by ideXlab platform

Liqing Zhang - One of the best experts on this subject based on the ideXlab platform.

Uncovering missed Indels by leveraging unmapped reads.

Scientific Reports, 2019

Co-Authors: Mohammad Shabbir Hasan, Liqing Zhang

Abstract:

In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic variants. Although existing alignment tools have shown great accuracy in mapping short reads to the reference genome, a significant number of short reads still remain unmapped and are often excluded from downstream analyses thereby causing nonnegligible information loss in the subsequent variant calling procedure. This paper describes Genesis-Indel, a computational pipeline that explores the unmapped reads to identify novel Indels that are initially missed in the original procedure. Genesis-Indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-Indel identifies 72,997 novel high-quality Indels previously not found, among which 16,141 have not been annotated in the widely used mutation database. Statistical analysis of these Indels shows significant enrichment of Indels residing in oncogenes and tumour suppressor genes. Functional annotation further reveals that these Indels are strongly correlated with pathways of cancer and can have high to moderate impact on protein functions. Additionally, some of the Indels overlap with the genes that do not have any Indel mutations called from the originally mapped reads but have been shown to contribute to the tumorigenesis in multiple carcinomas, further emphasizing the importance of rescuing Indels hidden in the unmapped reads in cancer and disease studies.
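The starting point described above, collecting the reads an aligner could not place, can be illustrated with a minimal sketch. It is not the Genesis-Indel pipeline itself; it only assumes SAM-formatted alignment records, in which bit 0x4 of the FLAG field marks an unmapped segment. The function name `unmapped_reads` and the example records are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records whose FLAG has bit 0x4 set."""
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is FLAG
        if flag & UNMAPPED_FLAG:
            yield fields[0], fields[9]  # columns 1 and 10: QNAME and SEQ


# Hypothetical records: one mapped read and two unmapped reads.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCATTCA\tIIIIIIII",
]
rescued = list(unmapped_reads(example))
```

In this sketch the rescued reads would be the input to whatever realignment and Indel-calling steps follow; those steps are not shown because the abstract does not specify them.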

Uncovering missed Indels by leveraging unmapped reads

2018

Co-Authors: Mohammad Shabbir Hasan, Liqing Zhang

Abstract:

In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic mutations. While most short reads can be mapped to the reference genome accurately by existing alignment tools, a significant number remain unmapped and excluded from downstream analyses thus potentially discarding important biological information hidden in the unmapped reads. This paper describes Genesis-Indel, a computational pipeline that explores the unmapped reads to identify novel Indels that are initially missed in the alignment procedure. Genesis-Indel is applied to the unmapped reads of 30 Breast Cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-Indel is able to leverage the unmapped reads to identify 72,997 small to large novel high-quality Indels previously not found in the original alignments and among them, 16,141 have not been annotated in the widely used mutation database. Statistical analysis shows that these new Indels mostly altered the oncogenes and tumor suppressor genes. Functional annotation further reveals that these Indels are strongly correlated to pathways of cancer and can have high to moderate impact on protein functions. Additionally, these Indels overlap with the genes that are missed in the Indels from the originally mapped reads and contribute to the tumorigenesis in multiple carcinomas.

UPS-Indel: a Universal Positioning System for Indels

Scientific Reports, 2017

Co-Authors: Mohammad Shabbir Hasan, Layne T. Watson, Liqing Zhang

Abstract:

Storing biologically equivalent Indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent Indels. Moreover, a unified system is also desirable to compare the Indel calling results produced by different tools. This paper describes UPS-Indel, a utility tool that creates a universal positioning system for Indels so that equivalent Indels can be uniquely determined by their coordinates in the new system, which also can be used to compare different Indel calling results. UPS-Indel identifies 15% redundant Indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-Indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-Indel is able to identify 456,352 more redundant Indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding Indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-Indel to state-of-the-art approaches for Indel call set comparison demonstrates its clear superiority in finding common Indels among call sets. UPS-Indel is theoretically proven to find all equivalent Indels, and thus exhaustive.

UPS-Indel: a Universal Positioning System for Indels

bioRxiv, 2017

Co-Authors: Mohammad Shabbir Hasan, Zhiyi Li, Layne T. Watson, Xiaowei Wu, Liqing Zhang

Abstract:

Indels, though differing in allele sequence and position, are biologically equivalent when they lead to the same altered sequences. Storing biologically equivalent Indels as distinct entries in databases causes data redundancy, and may mislead downstream analysis and interpretations. About 10% of the human Indels stored in dbSNP are redundant. It is thus desirable to have a unified system for identifying and representing equivalent Indels in publicly available databases. Moreover, a unified system is also desirable to compare the Indel calling results produced by different tools. This paper describes UPS-Indel, a utility tool that creates a universal positioning system for Indels so that equivalent Indels can be uniquely determined by their coordinates in the new system, which also can be used to compare Indel calling results produced by different tools. UPS-Indel identifies nearly 15% of the Indels in dbSNP (version 142) as redundant across all human chromosomes, higher than previously reported. When applied to COSMIC coding and noncoding Indel datasets, UPS-Indel identifies nearly 29% and 13% of the Indels as redundant, respectively. Comparing the performance of UPS-Indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-Indel is able to identify 456,352 more redundant Indels in dbSNP, 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding Indel datasets, in addition to the ones reported jointly by these tools. Moreover, comparing UPS-Indel to other state-of-the-art approaches for Indel call set comparison demonstrates that UPS-Indel is clearly superior to other approaches in finding Indels in common among call sets. UPS-Indel is theoretically proven to find all equivalent Indels, and is thus exhaustive. UPS-Indel is written in C++ and the command line version is freely available to download at http://ups-Indel.sourceforge.net.
The online version of UPS-Indel is available at http://bench.cs.vt.edu/ups-Indel/.
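The equivalence criterion stated in the abstract (two Indels are equivalent when they lead to the same altered sequence) can be checked directly on a small example. The sketch below is not the UPS-Indel coordinate system or its algorithm; it is a brute-force illustration of the definition only. The helper names, the 0-based `(position, ref_allele, alt_allele)` tuple layout, and the toy reference are assumptions made for this example.

```python
def apply_indel(reference, pos, ref_allele, alt_allele):
    """Return the sequence obtained by replacing ref_allele at 0-based pos with alt_allele."""
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("REF allele does not match the reference at this position")
    return reference[:pos] + alt_allele + reference[pos + len(ref_allele):]


def equivalent(reference, indel_a, indel_b):
    """Two Indels are equivalent if they produce the same altered sequence."""
    return apply_indel(reference, *indel_a) == apply_indel(reference, *indel_b)


# Toy reference containing a short tandem repeat.
reference = "GACACACT"

# Two differently positioned records that both remove two bases of the repeat.
del_left = (0, "GAC", "G")    # left-most representation
del_right = (4, "CAC", "C")   # same deletion reported further right

# A genuinely different variant: deletion of a single C.
del_other = (1, "AC", "A")

same = equivalent(reference, del_left, del_right)
different = equivalent(reference, del_left, del_other)
```

Here `del_left` and `del_right` both yield `GACACT`, so a database storing them as separate entries would hold a redundant pair, whereas `del_other` yields a different sequence. Comparing full altered sequences does not scale to whole chromosomes, which is one reason a dedicated positioning scheme is attractive.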

UPS-Indel: a Universal Positioning System for Indels

2017

Co-Authors: Mohammad Shabbir Hasan, Layne T. Watson, Liqing Zhang

Abstract:

Background: Indels, though differing in allele sequence and position, are biologically equivalent when they lead to the same altered sequences. Storing biologically equivalent Indels as distinct entries in databases causes data redundancy, and may mislead downstream analysis and interpretations. About 10% of the human Indels stored in dbSNP are redundant. It is thus desirable to have a unified system for identifying and representing equivalent Indels in publicly available databases. Moreover, a unified system is also desirable to compare the Indel calling results produced by different tools. This paper describes UPS-Indel, a utility tool that creates a universal positioning system for Indels so that equivalent Indels can be uniquely determined by their coordinates in the new system, which also can be used to compare Indel calling results produced by different tools. Results: UPS-Indel identifies nearly 15% of the Indels in dbSNP (version 142) as redundant across all human chromosomes, higher than previously reported. When applied to COSMIC coding and noncoding Indel datasets, UPS-Indel identifies nearly 29% and 13% of the Indels as redundant, respectively. Comparing the performance of UPS-Indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-Indel is able to identify 456,352 more redundant Indels in dbSNP, 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding Indel datasets, in addition to the ones reported jointly by these tools. Moreover, comparing UPS-Indel to other state-of-the-art approaches for Indel call set comparison demonstrates that UPS-Indel is clearly superior to other approaches in finding Indels in common among call sets. Conclusions: UPS-Indel is theoretically proven to find all equivalent Indels, and is thus exhaustive. UPS-Indel is written in C++ and the command line version is freely available to download at http://ups-Indel.sourceforge.net.
The online version of UPS-Indel is available at http://bench.cs.vt.edu/ups-Indel/.


Mohammad Shabbir Hasan - One of the best experts on this subject based on the ideXlab platform.

Uncovering missed Indels by leveraging unmapped reads.

Scientific Reports, 2019

Co-Authors: Mohammad Shabbir Hasan, Liqing Zhang

Abstract:

In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic variants. Although existing alignment tools have shown great accuracy in mapping short reads to the reference genome, a significant number of short reads still remain unmapped and are often excluded from downstream analyses thereby causing nonnegligible information loss in the subsequent variant calling procedure. This paper describes Genesis-Indel, a computational pipeline that explores the unmapped reads to identify novel Indels that are initially missed in the original procedure. Genesis-Indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-Indel identifies 72,997 novel high-quality Indels previously not found, among which 16,141 have not been annotated in the widely used mutation database. Statistical analysis of these Indels shows significant enrichment of Indels residing in oncogenes and tumour suppressor genes. Functional annotation further reveals that these Indels are strongly correlated with pathways of cancer and can have high to moderate impact on protein functions. Additionally, some of the Indels overlap with the genes that do not have any Indel mutations called from the originally mapped reads but have been shown to contribute to the tumorigenesis in multiple carcinomas, further emphasizing the importance of rescuing Indels hidden in the unmapped reads in cancer and disease studies.

Uncovering missed Indels by leveraging unmapped reads

2018

Co-Authors: Mohammad Shabbir Hasan, Liqing Zhang

Abstract:

In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic mutations. While most short reads can be mapped to the reference genome accurately by existing alignment tools, a significant number remain unmapped and excluded from downstream analyses thus potentially discarding important biological information hidden in the unmapped reads. This paper describes Genesis-Indel, a computational pipeline that explores the unmapped reads to identify novel Indels that are initially missed in the alignment procedure. Genesis-Indel is applied to the unmapped reads of 30 Breast Cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-Indel is able to leverage the unmapped reads to identify 72,997 small to large novel high-quality Indels previously not found in the original alignments and among them, 16,141 have not been annotated in the widely used mutation database. Statistical analysis shows that these new Indels mostly altered the oncogenes and tumor suppressor genes. Functional annotation further reveals that these Indels are strongly correlated to pathways of cancer and can have high to moderate impact on protein functions. Additionally, these Indels overlap with the genes that are missed in the Indels from the originally mapped reads and contribute to the tumorigenesis in multiple carcinomas.

UPS-Indel: a Universal Positioning System for Indels

Scientific Reports, 2017

Co-Authors: Mohammad Shabbir Hasan, Layne T. Watson, Liqing Zhang

Abstract:

Storing biologically equivalent Indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent Indels. Moreover, a unified system is also desirable to compare the Indel calling results produced by different tools. This paper describes UPS-Indel, a utility tool that creates a universal positioning system for Indels so that equivalent Indels can be uniquely determined by their coordinates in the new system, which also can be used to compare different Indel calling results. UPS-Indel identifies 15% redundant Indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-Indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-Indel is able to identify 456,352 more redundant Indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding Indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-Indel to state-of-the-art approaches for Indel call set comparison demonstrates its clear superiority in finding common Indels among call sets. UPS-Indel is theoretically proven to find all equivalent Indels, and thus exhaustive.

UPS-Indel: a Universal Positioning System for Indels

bioRxiv, 2017

Co-Authors: Mohammad Shabbir Hasan, Zhiyi Li, Layne T. Watson, Xiaowei Wu, Liqing Zhang

Abstract:

Indels, though differing in allele sequence and position, are biologically equivalent when they lead to the same altered sequences. Storing biologically equivalent Indels as distinct entries in databases causes data redundancy, and may mislead downstream analysis and interpretations. About 10% of the human Indels stored in dbSNP are redundant. It is thus desirable to have a unified system for identifying and representing equivalent Indels in publicly available databases. Moreover, a unified system is also desirable to compare the Indel calling results produced by different tools. This paper describes UPS-Indel, a utility tool that creates a universal positioning system for Indels so that equivalent Indels can be uniquely determined by their coordinates in the new system, which also can be used to compare Indel calling results produced by different tools. UPS-Indel identifies nearly 15% of the Indels in dbSNP (version 142) as redundant across all human chromosomes, higher than previously reported. When applied to COSMIC coding and noncoding Indel datasets, UPS-Indel identifies nearly 29% and 13% of the Indels as redundant, respectively. Comparing the performance of UPS-Indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-Indel is able to identify 456,352 more redundant Indels in dbSNP, 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding Indel datasets, in addition to the ones reported jointly by these tools. Moreover, comparing UPS-Indel to other state-of-the-art approaches for Indel call set comparison demonstrates that UPS-Indel is clearly superior to other approaches in finding Indels in common among call sets. UPS-Indel is theoretically proven to find all equivalent Indels, and is thus exhaustive. UPS-Indel is written in C++ and the command line version is freely available to download at http://ups-Indel.sourceforge.net.
The online version of UPS-Indel is available at http://bench.cs.vt.edu/ups-Indel/.

UPS-Indel: a Universal Positioning System for Indels

2017

Co-Authors: Mohammad Shabbir Hasan, Layne T. Watson, Liqing Zhang

Abstract:

Background: Indels, though differing in allele sequence and position, are biologically equivalent when they lead to the same altered sequences. Storing biologically equivalent Indels as distinct entries in databases causes data redundancy, and may mislead downstream analysis and interpretations. About 10% of the human Indels stored in dbSNP are redundant. It is thus desirable to have a unified system for identifying and representing equivalent Indels in publicly available databases. Moreover, a unified system is also desirable to compare the Indel calling results produced by different tools. This paper describes UPS-Indel, a utility tool that creates a universal positioning system for Indels so that equivalent Indels can be uniquely determined by their coordinates in the new system, which also can be used to compare Indel calling results produced by different tools. Results: UPS-Indel identifies nearly 15% of the Indels in dbSNP (version 142) as redundant across all human chromosomes, higher than previously reported. When applied to COSMIC coding and noncoding Indel datasets, UPS-Indel identifies nearly 29% and 13% of the Indels as redundant, respectively. Comparing the performance of UPS-Indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-Indel is able to identify 456,352 more redundant Indels in dbSNP, 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding Indel datasets, in addition to the ones reported jointly by these tools. Moreover, comparing UPS-Indel to other state-of-the-art approaches for Indel call set comparison demonstrates that UPS-Indel is clearly superior to other approaches in finding Indels in common among call sets. Conclusions: UPS-Indel is theoretically proven to find all equivalent Indels, and is thus exhaustive. UPS-Indel is written in C++ and the command line version is freely available to download at http://ups-Indel.sourceforge.net.
The online version of UPS-Indel is available at http://bench.cs.vt.edu/ups-Indel/.


Kiyoshi Ezawa - One of the best experts on this subject based on the ideXlab platform.

General continuous-time Markov model of sequence evolution via insertions/deletions: are alignment probabilities factorable?

BMC Bioinformatics, 2016

Co-Authors: Kiyoshi Ezawa

Abstract:

Background: Insertions and deletions (Indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of the sequence evolution through Indel processes. Recent Indel probabilistic models are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the Indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. However, it is not a priori clear how these models are related to any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time-axis. Moreover, currently none of these models can fully accommodate biologically realistic features, such as overlapping Indels, power-law Indel-length distributions, and Indel rate variation across regions. Results: Here, we theoretically dissect the ab initio calculation of the probability of a given sequence alignment under a genuine stochastic evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model is a simple extension of the general “substitution/insertion/deletion (SID) model”. Using the operator representation of Indels and the technique of time-dependent perturbation theory, we express the ab initio probability as a summation over all alignment-consistent Indel histories. Exploiting the equivalence relations between different Indel histories, we find a “sufficient and nearly necessary” set of conditions under which the probability can be factorized into the product of an overall factor and the contributions from regions separated by gapless columns of the alignment, thus providing a sort of generalized HMM.
The conditions distinguish evolutionary models with factorable alignment probabilities from those without. The former category includes the “long Indel” model (a space-homogeneous SID model) and the model used by Dawg, a genuine sequence evolution simulator. Conclusions: With intuitive clarity and mathematical precision, our theoretical formulation will help further advance the ab initio calculation of alignment probabilities under biologically realistic models of sequence evolution via Indels.
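The factorization described in this abstract can be written schematically as follows. The symbols are placeholders chosen for this illustration and are not taken from the paper: \alpha denotes the alignment, P_0 the overall factor, K the number of regions delimited by gapless columns, and \mu_k the contribution of the k-th such region.

```latex
% Schematic form only; all symbols are illustrative assumptions.
P(\alpha) \;=\; P_0 \, \prod_{k=1}^{K} \mu_k
```

When the stated conditions hold, each \mu_k depends only on the Indel histories confined to its own region, which is what makes the result resemble a generalized HMM.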

General continuous-time Markov model of sequence evolution via insertions/deletions: are alignment probabilities factorable?

BMC Bioinformatics, 2016

Co-Authors: Kiyoshi Ezawa

Abstract:

Insertions and deletions (Indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of the sequence evolution through Indel processes. Recently, Indel probabilistic models are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the Indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. However, it is not a priori clear how these models are related with any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time-axis. Moreover, currently none of these models can fully accommodate biologically realistic features, such as overlapping Indels, power-law Indel-length distributions, and Indel rate variation across regions. Here, we theoretically dissect the ab initio calculation of the probability of a given sequence alignment under a genuine stochastic evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model is a simple extension of the general “substitution/insertion/deletion (SID) model”. Using the operator representation of Indels and the technique of time-dependent perturbation theory, we express the ab initio probability as a summation over all alignment-consistent Indel histories. Exploiting the equivalence relations between different Indel histories, we find a “sufficient and nearly necessary” set of conditions under which the probability can be factorized into the product of an overall factor and the contributions from regions separated by gapless columns of the alignment, thus providing a sort of generalized HMM. 
The conditions distinguish evolutionary models with factorable alignment probabilities from those without ones. The former category includes the “long Indel” model (a space-homogeneous SID model) and the model used by Dawg, a genuine sequence evolution simulator. With intuitive clarity and mathematical preciseness, our theoretical formulation will help further advance the ab initio calculation of alignment probabilities under biologically realistic models of sequence evolution via Indels.


Marília D V Braga - One of the best experts on this subject based on the ideXlab platform.

Restricted DCJ-Indel model: sorting linear genomes with DCJ and Indels.

BMC Bioinformatics, 2012

Co-Authors: Poly H Da Silva, Raphael Machado, Simone Dantas, Marília D V Braga

Abstract:

The double-cut-and-join (DCJ) is a model that is able to efficiently sort one genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model many circular chromosomes can coexist in some intermediate step. However, when the compared genomes are linear, it is more plausible to use the so-called restricted DCJ model, in which a circular chromosome is reincorporated immediately after its creation. These two consecutive DCJ operations, which create and reincorporate a circular chromosome, mimic a transposition or a block-interchange. When the compared genomes have the same content, it is known that the genomic distance for the restricted DCJ model is the same as the distance for the general model. If the genomes have unequal contents, in addition to DCJ it is necessary to consider Indels, which are insertions and deletions of DNA segments. Linear-time algorithms have been proposed to compute the distance and to find a sorting scenario in a general, unrestricted DCJ-Indel model that considers DCJ and Indels. In the present work we consider the restricted DCJ-Indel model for sorting linear genomes with unequal contents. We allow DCJ operations and Indels with the following constraint: if a circular chromosome is created by a DCJ, it has to be reincorporated in the next step (no other DCJ or Indel can be applied between the creation and the reincorporation of a circular chromosome). We then develop a sorting algorithm and give a tight upper bound for the restricted DCJ-Indel distance. The question of whether this bound can be reduced so that both the general and the restricted DCJ-Indel distances are equal remains open.
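The statement that two consecutive DCJ operations, one creating a circular chromosome and one reincorporating it, mimic a block-interchange can be traced on a toy genome. The sketch below is a simplified list-level illustration, not the sorting algorithm of the paper: it ignores gene orientation and Indels, and the function names and the five-marker genome are hypothetical.

```python
def excise(linear, start, end):
    """First DCJ: cut the linear chromosome at two points and close
    linear[start:end] into a circular chromosome."""
    circle = linear[start:end]
    remainder = linear[:start] + linear[end:]
    return remainder, circle


def reincorporate(linear, circle, cut, position):
    """Second DCJ: open the circle just before circle[cut] and splice the
    resulting segment into the linear chromosome at index position."""
    opened = circle[cut:] + circle[:cut]
    return linear[:position] + opened + linear[position:]


# Toy linear genome with blocks A, B, C between flanking markers X and Y.
genome = ["X", "A", "B", "C", "Y"]

# Step 1: excise the segment A B as a circular chromosome, leaving X C Y.
remainder, circle = excise(genome, 1, 3)

# Step 2 (immediately after, as the restricted model requires): open the
# circle between A and B and reinsert it between C and Y.
result = reincorporate(remainder, circle, cut=1, position=2)
```

The net effect is X C B A Y, that is, blocks A and C have exchanged places around B. Reinserting the circle with a different `cut` or `position` gives other rearrangements, including a transposition when one of the exchanged blocks is empty.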

Restricted DCJ-Indel model: sorting linear genomes with DCJ and Indels.

BMC Bioinformatics, 2012

Co-Authors: Poly H Da Silva, Simone Dantas, Raphael C. S. Machado, Marília D V Braga

Abstract:

Background: The double-cut-and-join (DCJ) is a model that is able to efficiently sort one genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model many circular chromosomes can coexist in some intermediate step. However, when the compared genomes are linear, it is more plausible to use the so-called restricted DCJ model, in which a circular chromosome is reincorporated immediately after its creation. These two consecutive DCJ operations, which create and reincorporate a circular chromosome, mimic a transposition or a block-interchange. When the compared genomes have the same content, it is known that the genomic distance for the restricted DCJ model is the same as the distance for the general model. If the genomes have unequal contents, in addition to DCJ it is necessary to consider Indels, which are insertions and deletions of DNA segments. Linear-time algorithms have been proposed to compute the distance and to find a sorting scenario in a general, unrestricted DCJ-Indel model that considers DCJ and Indels.

Restricted DCJ-Indel model: sorting linear genomes with DCJ and Indels

BMC Bioinformatics, 2012

Co-Authors: Poly H Da Silva, Raphael Machado, Simone Dantas, Marília D V Braga

Abstract:

Background: The double-cut-and-join (DCJ) is a model that is able to efficiently sort one genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model many circular chromosomes can coexist in some intermediate step. However, when the compared genomes are linear, it is more plausible to use the so-called restricted DCJ model, in which a circular chromosome is reincorporated immediately after its creation. These two consecutive DCJ operations, which create and reincorporate a circular chromosome, mimic a transposition or a block-interchange. When the compared genomes have the same content, it is known that the genomic distance for the restricted DCJ model is the same as the distance for the general model. If the genomes have unequal contents, in addition to DCJ it is necessary to consider Indels, which are insertions and deletions of DNA segments. Linear-time algorithms have been proposed to compute the distance and to find a sorting scenario in a general, unrestricted DCJ-Indel model that considers DCJ and Indels. Results: In the present work we consider the restricted DCJ-Indel model for sorting linear genomes with unequal contents. We allow DCJ operations and Indels with the following constraint: if a circular chromosome is created by a DCJ, it has to be reincorporated in the next step (no other DCJ or Indel can be applied between the creation and the reincorporation of a circular chromosome). We then develop a sorting algorithm and give a tight upper bound for the restricted DCJ-Indel distance. Conclusions: We have given a tight upper bound for the restricted DCJ-Indel distance. The question of whether this bound can be reduced so that both the general and the restricted DCJ-Indel distances are equal remains open.


Poly H Da Silva - One of the best experts on this subject based on the ideXlab platform.

Restricted DCJ-Indel model: sorting linear genomes with DCJ and Indels.

BMC Bioinformatics, 2012

Co-Authors: Poly H Da Silva, Raphael Machado, Simone Dantas, Marília D V Braga

Abstract:

The double-cut-and-join (DCJ) model can efficiently sort one genome into another. It generalizes the typical mutations to which genomes are subject (inversions, fusions, fissions, translocations) while allowing circular chromosomes to exist at intermediate steps. In the general model, many circular chromosomes can coexist in an intermediate step. However, when the compared genomes are linear, it is more plausible to use the so-called restricted DCJ model, in which a circular chromosome is reincorporated immediately after its creation. These two consecutive DCJ operations, which create and reincorporate a circular chromosome, mimic a transposition or a block-interchange. When the compared genomes have the same content, the genomic distance for the restricted DCJ model is known to equal the distance for the general model. If the genomes have unequal contents, Indels (insertions and deletions of DNA segments) must be considered in addition to DCJ. Linear-time algorithms have been proposed to compute the distance and to find a sorting scenario in the general, unrestricted DCJ-Indel model, which considers both DCJ operations and Indels. The present work considers the restricted DCJ-Indel model for sorting linear genomes with unequal contents. DCJ operations and Indels are allowed under the following constraint: if a circular chromosome is created by a DCJ, it has to be reincorporated in the next step (no other DCJ or Indel can be applied between the creation and the reincorporation of a circular chromosome). The authors develop a sorting algorithm and give a tight upper bound for the restricted DCJ-Indel distance. Whether this bound can be reduced so that the general and the restricted DCJ-Indel distances are equal remains an open question.
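For the equal-content case mentioned above, the DCJ distance has a well-known closed form in the genome rearrangement literature. The abstract does not state it, so the formula and the small worked example below are supplied here as background, not as a result of this paper. The symbols N, C, and I, and the two example genomes, are introduced only for this illustration.

```latex
% Standard DCJ distance for two genomes A and B with the same N genes,
% read off their adjacency graph AG(A, B):
%   C = number of cycles, I = number of odd-length paths.
\[
  d_{\mathrm{DCJ}}(A, B) \;=\; N - \Bigl( C + \frac{I}{2} \Bigr)
\]

% Worked example (assumed genomes, both linear):
%   A = (1\ 2\ 3\ 4\ 5), \qquad B = (1\ 4\ 2\ 3\ 5).
% The adjacency graph has two cycles (one through the shared adjacency
% between genes 2 and 3, one through the remaining six adjacencies) and two
% odd paths (the shared telomeres at gene 1 and at gene 5), so
\[
  d_{\mathrm{DCJ}}(A, B) \;=\; 5 - \Bigl( 2 + \frac{2}{2} \Bigr) \;=\; 2 .
\]
```

A distance of 2 is consistent with the restricted model: one DCJ excises the block 2 3 as a circular chromosome and the next DCJ reincorporates it after gene 4.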


