Inclusion Probability

The experts below are selected from a list of 117 experts worldwide, ranked by the ideXlab platform.

Yves Tillé - One of the best experts on this subject based on the ideXlab platform.

  • Size constrained unequal probability sampling with a non-integer sum of inclusion probabilities
    Electronic Journal of Statistics, 2012
    Co-Authors: Anton Grafström, Yves Tillé, Lionel Qualité, Alina Matei
    Abstract:

    More than 50 methods have been developed to draw unequal probability samples of fixed size. All of these methods require the sum of the inclusion probabilities to be an integer. There are cases, however, where the sum of the desired inclusion probabilities is not an integer, and the classical algorithms then cannot be applied directly. We present two methods for selecting a sample with unequal inclusion probabilities when their sum is not an integer and the sample size cannot be fixed. The first consists of splitting the inclusion probability vector; the second extends the population with a phantom unit. Under both methods the sample size is almost fixed: it equals either the integer part of the sum of the inclusion probabilities or that integer plus one.
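
    A minimal sketch of the second (phantom-unit) method, assuming systematic πps sampling as the underlying fixed-size design; the abstract does not prescribe a particular design, and both function names here are invented for illustration:

    ```python
    import numpy as np

    def systematic_pips(pik, rng):
        """Systematic unequal-probability sampling with fixed size.
        Requires sum(pik) to be an integer n; returns n unit indices."""
        cum = np.cumsum(pik)
        n = int(round(cum[-1]))
        u = rng.uniform(0.0, 1.0)
        # unit k is drawn iff one of u, u+1, ..., u+n-1 falls in (cum[k-1], cum[k]]
        return np.searchsorted(cum, u + np.arange(n))

    def phantom_unit_sample(pik, rng):
        """Append a 'phantom' unit whose probability completes the sum to
        the next integer, draw a fixed-size sample from the extended
        population, then drop the phantom if it was selected."""
        s = float(np.sum(pik))
        extended = np.append(pik, np.ceil(s) - s)
        sample = systematic_pips(extended, rng)
        return sample[sample < len(pik)]  # realized size: floor(s) or floor(s) + 1

    rng = np.random.default_rng(42)
    pik = np.array([0.2, 0.5, 0.7, 0.9, 0.3])  # probabilities sum to 2.6
    print(phantom_unit_sample(pik, rng))       # a sample of 2 or 3 units
    ```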

  • Computational aspects of order πps sampling schemes
    Computational Statistics & Data Analysis, 2007
    Co-Authors: Alina Matei, Yves Tillé
    Abstract:

    In order sampling, the units of a finite population of size N are ordered by a ranking variable and a sample of the first n units is drawn. For order πps sampling, the target inclusion probabilities λ = (λ_k), k = 1, …, N, are computed from a measure of size that is correlated with a variable of interest. The quantities λ_k, however, differ from the true inclusion probabilities π_k. Firstly, a new, simple method to compute π_k from λ_k is presented and is used to compute the inclusion probabilities of the uniform, exponential and Pareto order πps sampling schemes. Secondly, given two positively coordinated samples drawn with order πps sampling, the joint inclusion probability of a unit in both samples is approximated. This approximation can be used to derive the expected overlap of the two samples or to construct an estimate of the covariance between estimators based on them. All presented methods use numerical integration.
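
    As an illustration of one of the named schemes, a minimal sketch of Pareto order πps selection using Rosén's ranking variable; it shows only the selection step, not the paper's numerical-integration method for recovering π_k from λ_k:

    ```python
    import numpy as np

    def pareto_order_pips(lam, n, rng):
        """Pareto order pips sampling: unit k receives the ranking variable
        Q_k = U_k (1 - lam_k) / (lam_k (1 - U_k)), U_k ~ Uniform(0, 1),
        and the sample is the n units with the smallest Q_k.  The targets
        lam_k only approximate the true inclusion probabilities pi_k."""
        lam = np.asarray(lam, dtype=float)
        u = rng.uniform(size=lam.size)
        q = u * (1.0 - lam) / (lam * (1.0 - u))
        return np.sort(np.argsort(q)[:n])

    rng = np.random.default_rng(1)
    lam = np.array([0.1, 0.3, 0.5, 0.6, 0.5])  # targets sum to n = 2
    print(pareto_order_pips(lam, 2, rng))
    ```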

  • Unequal probability sampling without replacement through a splitting method
    Biometrika, 1998
    Co-Authors: Jean-claude Deville, Yves Tillé
    Abstract:

    A very general class of sampling methods without replacement and with unequal probabilities is proposed. It consists of splitting the inclusion probability vector into several new inclusion probability vectors, one of which is chosen at random; the initial problem is thereby reduced to another unequal probability sampling problem. The splitting is then repeated on the new vectors of inclusion probabilities, so that at each step the sampling problem is reduced to a simpler one. The simplicity of this technique makes it easy to generate new unequal probability sampling procedures. The splitting method also generalises well-known methods such as the Midzuno method, the elimination procedure and the Chao procedure. Next, a sufficient condition is given for a splitting method to satisfy the Sen–Yates–Grundy condition. Finally, it is shown that the elimination procedure satisfies the Gabler sufficient condition.
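
    As a concrete instance of the splitting idea, below is a sketch of the pivotal rule: at each step, two undecided coordinates of the inclusion probability vector are split so that one of them becomes 0 or 1, and each update leaves the expected value of every coordinate unchanged, so the final 0/1 vector reproduces the prescribed inclusion probabilities. This is a simplified illustration, not the paper's full generality:

    ```python
    import numpy as np

    def pivotal(pik, rng, eps=1e-12):
        """One splitting scheme: repeatedly resolve two undecided units
        until every coordinate of the vector is 0 or 1.  Each update is
        mean-preserving, so unit k is selected with probability pik[k]."""
        p = np.array(pik, dtype=float)
        live = [k for k in range(p.size) if eps < p[k] < 1 - eps]
        while len(live) >= 2:
            i, j = live[0], live[1]
            a, b = p[i], p[j]
            if a + b > 1.0:
                # one of the two units is selected for sure; randomize which
                if rng.uniform() < (1.0 - b) / (2.0 - a - b):
                    p[i], p[j] = 1.0, a + b - 1.0
                else:
                    p[i], p[j] = a + b - 1.0, 1.0
            else:
                # one of the two units is rejected for sure; randomize which
                if rng.uniform() < b / (a + b):
                    p[i], p[j] = 0.0, a + b
                else:
                    p[i], p[j] = a + b, 0.0
            live = [k for k in live if eps < p[k] < 1 - eps]
        return np.flatnonzero(p > 1.0 - eps)

    rng = np.random.default_rng(7)
    print(pivotal([0.2, 0.5, 0.7, 0.6], rng))  # always exactly 2 units
    ```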

Alina Matei - One of the best experts on this subject based on the ideXlab platform.

  • Size constrained unequal probability sampling with a non-integer sum of inclusion probabilities
    Electronic Journal of Statistics, 2012
    Co-Authors: Anton Grafström, Yves Tillé, Lionel Qualité, Alina Matei
    Abstract:

    More than 50 methods have been developed to draw unequal probability samples of fixed size. All of these methods require the sum of the inclusion probabilities to be an integer. There are cases, however, where the sum of the desired inclusion probabilities is not an integer, and the classical algorithms then cannot be applied directly. We present two methods for selecting a sample with unequal inclusion probabilities when their sum is not an integer and the sample size cannot be fixed. The first consists of splitting the inclusion probability vector; the second extends the population with a phantom unit. Under both methods the sample size is almost fixed: it equals either the integer part of the sum of the inclusion probabilities or that integer plus one.

  • Computational aspects of order πps sampling schemes
    Computational Statistics & Data Analysis, 2007
    Co-Authors: Alina Matei, Yves Tillé
    Abstract:

    In order sampling, the units of a finite population of size N are ordered by a ranking variable and a sample of the first n units is drawn. For order πps sampling, the target inclusion probabilities λ = (λ_k), k = 1, …, N, are computed from a measure of size that is correlated with a variable of interest. The quantities λ_k, however, differ from the true inclusion probabilities π_k. Firstly, a new, simple method to compute π_k from λ_k is presented and is used to compute the inclusion probabilities of the uniform, exponential and Pareto order πps sampling schemes. Secondly, given two positively coordinated samples drawn with order πps sampling, the joint inclusion probability of a unit in both samples is approximated. This approximation can be used to derive the expected overlap of the two samples or to construct an estimate of the covariance between estimators based on them. All presented methods use numerical integration.

Didier Boichard - One of the best experts on this subject based on the ideXlab platform.

  • QTL fine mapping with Bayes C(π): a simulation study
    Genetics Selection Evolution, 2013
    Co-Authors: Irene Van Den Berg, Sébastien Fritz, Didier Boichard
    Abstract:

    Background: Accurate QTL mapping is a prerequisite in the search for causative mutations. Bayesian genomic selection models that analyse many markers simultaneously should provide more accurate QTL detection results than single-marker models. Our objectives were to (a) evaluate by simulation the influence of heritability, number of QTL and number of records on the accuracy of QTL mapping with Bayes Cπ and Bayes C; (b) estimate the QTL status (homozygous vs. heterozygous) of the individuals analysed. This study focussed on the ten largest detected QTL, assuming they are candidates for further characterization.
    Methods: Our simulations were based on a true dairy cattle population genotyped for 38 277 phased markers. Some of these markers were considered biallelic QTL and used to generate corresponding phenotypes. Different numbers of records (4387 and 1500), heritability values (0.1, 0.4 and 0.7) and numbers of QTL (10, 100 and 1000) were studied. QTL detection was based on the posterior inclusion probability for individual markers, or on the sum of the posterior inclusion probabilities for consecutive markers, estimated using Bayes C or Bayes Cπ. The QTL status of the individuals was derived from the contrast between the sums of the SNP allelic effects of their chromosomal segments.
    Results: The proportion of markers with null effect (π) frequently did not reach convergence, leading to poor results for Bayes Cπ in QTL detection. Fixing π led to better results. Detection of the largest QTL was most accurate for medium to high heritability, for low to moderate numbers of QTL, and with a large number of records. The QTL status was accurately inferred when the distribution of the contrast between chromosomal segment effects was bimodal.
    Conclusions: QTL detection is feasible with Bayes C. For QTL detection, it is recommended to use a large dataset and to focus on highly heritable traits and on the largest QTL. QTL statuses were inferred based on the distribution of the contrast between chromosomal segment effects.
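
    A minimal sketch of the detection statistic described above: per-marker posterior inclusion probabilities (PIPs) estimated from saved 0/1 marker-indicator draws of a Bayes C-type sampler, together with sums of PIPs over windows of consecutive markers. The sampler itself is assumed given, and the array shapes are hypothetical:

    ```python
    import numpy as np

    def pip_statistics(delta_draws, window=5):
        """delta_draws: (n_iterations, n_markers) array of 0/1 marker
        inclusion indicators from an MCMC run.  Returns the per-marker
        posterior inclusion probabilities and their sums over windows
        of `window` consecutive markers."""
        pip = delta_draws.mean(axis=0)                       # per-marker PIP
        window_sums = np.convolve(pip, np.ones(window), mode="valid")
        return pip, window_sums

    # toy example with simulated indicator draws
    rng = np.random.default_rng(0)
    draws = (rng.uniform(size=(1000, 20)) < 0.1).astype(int)
    pip, wsum = pip_statistics(draws)
    print(wsum.argmax())  # start of the window with the largest summed PIP
    ```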

Hélène Jacqmin-gadda - One of the best experts on this subject based on the ideXlab platform.

  • Health administrative data enrichment using cohort information: Comparative evaluation of methods by simulation and application to real data.
    PLOS ONE, 2019
    Co-Authors: Bernard C. Silenou, Marta Avalos, Catherine Helmer, Claudine Berr, Antoine Pariente, Hélène Jacqmin-gadda
    Abstract:

    Background: Studies using health administrative databases (HAD) may lead to biased results because information on potential confounders is often missing. Methods that integrate confounder data from cohort studies, such as multivariate imputation by chained equations (MICE) and two-stage calibration (TSC), aim to reduce this confounding bias. We provide new insights into their behaviour under different deviations from representativeness of the cohort.
    Methods: We conducted an extensive simulation study to assess the performance of the two methods under different deviations from representativeness of the cohort. We illustrate the approaches by studying the association between benzodiazepine use and fractures in the elderly, using the general sample of French health insurance beneficiaries (EGB) as the main database and two French cohorts (Paquid and 3C) as validation samples.
    Results: When the cohort was representative of the same population as the HAD, both methods were unbiased. TSC was more efficient and faster, but its variance could be slightly underestimated when the confounders were non-Gaussian. When the cohort was a subsample of the HAD (internal validation) and the probability of a subject being included in the cohort depended on both exposure and outcome, MICE was unbiased while TSC was biased. Both methods appeared biased when the inclusion probability in the cohort depended on unobserved confounders.
    Conclusion: When choosing the most appropriate method, epidemiologists should consider the origin of the cohort (internal or external validation) as well as the anticipated or observed selection biases of the validation sample.
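
    A toy sketch of the enrichment idea in the internal-validation case: a confounder observed only in the cohort subsample is imputed into the main database by stochastic regression, a stripped-down stand-in for full MICE (which would draw several imputations and pool them with Rubin's rules). The data-generating model and all variable names are invented for illustration:

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical data: the main database records exposure x and outcome y
    # for everyone, but the confounder c is observed only in the cohort.
    n = 5000
    c = rng.normal(size=n)                                    # confounder
    x = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-c))).astype(float)
    y = 0.5 * x + 0.8 * c + rng.normal(size=n)                # true effect 0.5
    in_cohort = rng.uniform(size=n) < 0.2                     # validation subsample

    # Fit the imputation model for c on the cohort, with x and y as predictors.
    A = np.column_stack([np.ones(n), x, y])
    beta, *_ = np.linalg.lstsq(A[in_cohort], c[in_cohort], rcond=None)
    resid_sd = np.std(c[in_cohort] - A[in_cohort] @ beta)

    # Stochastic imputation: predicted c plus residual noise, as in one MICE cycle.
    c_imp = np.where(in_cohort, c, A @ beta + rng.normal(scale=resid_sd, size=n))

    # Outcome model on the enriched database; the coefficient of x is the
    # confounder-adjusted exposure effect (close to the true 0.5 here).
    B = np.column_stack([np.ones(n), x, c_imp])
    print(np.linalg.lstsq(B, y, rcond=None)[0][1])
    ```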

Irene Van Den Berg - One of the best experts on this subject based on the ideXlab platform.

  • QTL fine mapping with Bayes C(π): a simulation study
    Genetics Selection Evolution, 2013
    Co-Authors: Irene Van Den Berg, Sébastien Fritz, Didier Boichard
    Abstract:

    Background: Accurate QTL mapping is a prerequisite in the search for causative mutations. Bayesian genomic selection models that analyse many markers simultaneously should provide more accurate QTL detection results than single-marker models. Our objectives were to (a) evaluate by simulation the influence of heritability, number of QTL and number of records on the accuracy of QTL mapping with Bayes Cπ and Bayes C; (b) estimate the QTL status (homozygous vs. heterozygous) of the individuals analysed. This study focussed on the ten largest detected QTL, assuming they are candidates for further characterization.
    Methods: Our simulations were based on a true dairy cattle population genotyped for 38 277 phased markers. Some of these markers were considered biallelic QTL and used to generate corresponding phenotypes. Different numbers of records (4387 and 1500), heritability values (0.1, 0.4 and 0.7) and numbers of QTL (10, 100 and 1000) were studied. QTL detection was based on the posterior inclusion probability for individual markers, or on the sum of the posterior inclusion probabilities for consecutive markers, estimated using Bayes C or Bayes Cπ. The QTL status of the individuals was derived from the contrast between the sums of the SNP allelic effects of their chromosomal segments.
    Results: The proportion of markers with null effect (π) frequently did not reach convergence, leading to poor results for Bayes Cπ in QTL detection. Fixing π led to better results. Detection of the largest QTL was most accurate for medium to high heritability, for low to moderate numbers of QTL, and with a large number of records. The QTL status was accurately inferred when the distribution of the contrast between chromosomal segment effects was bimodal.
    Conclusions: QTL detection is feasible with Bayes C. For QTL detection, it is recommended to use a large dataset and to focus on highly heritable traits and on the largest QTL. QTL statuses were inferred based on the distribution of the contrast between chromosomal segment effects.