A statistical model for the analysis of beta values in DNA methylation studies
L Weinhold, S Wahl, S Pechlivanis, P Hoffmann, M Schmid
BMC Bioinformatics, 2016 • Springer
Background
The analysis of DNA methylation is a key component in the development of personalized treatment approaches. A common way to measure DNA methylation is the calculation of beta values, which are bounded variables of the form M/(M+U) that are generated by Illumina’s 450k BeadChip array. The statistical analysis of beta values is considered challenging because traditional methods for the analysis of bounded variables, such as M-value regression and beta regression, rely on regularity assumptions that are often too strong to adequately describe the distribution of beta values.
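As a minimal illustration of the quantities named above, the following sketch computes a beta value from a pair of signal intensities and the corresponding M-value, taken here to be the base-2 logit of the beta value, which is the usual definition. The function names and the example intensities are hypothetical and are not part of the paper.

```python
import math


def beta_value(m: float, u: float) -> float:
    """Beta value M / (M + U) from methylated (M) and unmethylated (U) intensities."""
    if m < 0 or u < 0 or m + u == 0:
        raise ValueError("intensities must be non-negative with a positive sum")
    return m / (m + u)


def m_value(beta: float) -> float:
    """M-value as the base-2 logit of a beta value; undefined at 0 and 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


# Hypothetical intensities, chosen only for illustration.
beta = beta_value(3000.0, 1000.0)  # 0.75
print(beta, m_value(beta))         # the M-value equals log2(M / U) = log2(3)
```

Because the beta value is bounded in [0, 1] while the M-value is unbounded, the two scales call for different regression models, which is the contrast the abstract draws.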
Results
We develop a statistical model for the analysis of beta values that is derived from a bivariate gamma distribution for the signal intensities M and U. By allowing for possible correlations between M and U, the proposed model explicitly takes into account the data-generating process underlying the calculation of beta values. Using simulated data and a real sample of DNA methylation data from the Heinz Nixdorf Recall cohort study, we demonstrate that the proposed model fits our data significantly better than beta regression and M-value regression.
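To make the role of the correlation between M and U concrete, here is a small simulation sketch. It does not reproduce the bivariate gamma distribution used in the paper; as a stand-in, it builds positively correlated gamma intensities from a shared gamma component, which is an assumption made purely for illustration. All function and parameter names are hypothetical. When the shared component is absent, M and U are independent gammas with a common rate, and M/(M+U) then follows a beta distribution, which is the case beta regression assumes.

```python
import random
import statistics


def simulate_beta_values(n, shape_m, shape_u, shape_shared=0.0, rate=1.0, seed=0):
    """Simulate beta values M / (M + U) from positively correlated gamma intensities.

    Assumed construction (not the paper's model): M = S + A and U = S + B, where
    S, A, B are independent gammas with a common rate. A larger shape_shared
    gives a stronger positive correlation between M and U.
    """
    rng = random.Random(seed)
    scale = 1.0 / rate  # random.gammavariate takes (shape, scale)
    betas = []
    for _ in range(n):
        shared = rng.gammavariate(shape_shared, scale) if shape_shared > 0 else 0.0
        m = shared + rng.gammavariate(shape_m, scale)
        u = shared + rng.gammavariate(shape_u, scale)
        betas.append(m / (m + u))
    return betas


independent = simulate_beta_values(20000, shape_m=2.0, shape_u=2.0)
correlated = simulate_beta_values(20000, shape_m=2.0, shape_u=2.0, shape_shared=4.0)

# With symmetric shapes both samples centre on 0.5, but the correlated
# intensities yield beta values that are more tightly concentrated.
print(statistics.mean(independent), statistics.pvariance(independent))
print(statistics.mean(correlated), statistics.pvariance(correlated))
```

Under this construction, ignoring the correlation between M and U would misstate the spread of the beta values, which gives some intuition for why a model that accounts for the data-generating process can fit better.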
Conclusion
The proposed model contributes to an improved identification of associations between beta values and covariates such as clinical variables and lifestyle factors in epigenome-wide association studies. It is as easy to apply to a sample of beta values as beta regression and M-value regression.