Title: | Various Non-Additive Models for Genetic Associations |
---|---|
Description: | The goal of 'gnonadd' is to simplify workflows in the analysis of non-additive effects of sequence variants. This includes variance effects (Ivarsdottir et. al (2017) <doi:10.1038/ng.3928>), correlation effects, interaction effects and dominance effects. The package also includes convenience functions for visualization. |
Authors: | Audunn S. Snaebjarnarson [aut, cre, ctb], Gudmundur Einarsson [aut, ctb], Daniel F. Gudbjartsson [aut, ctb], deCODE Genetics/AMGEN [cph, fnd] |
Maintainer: | Audunn S. Snaebjarnarson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.3 |
Built: | 2025-02-17 02:43:35 UTC |
Source: | https://github.com/decodegenetics/gnonadd |
This function estimates the variance effect of a genetic variant on a quantitative trait. The genotype is coded as 0 (non-carrier), 1 (single copy of effect allele) and 2 (two copies of effect allele). The variance effect (alpha) is modelled to be multiplicitive. We use a likelihood ratio test with 1 degree of freedom, * H0: y~N(mu_g,sigma^2) * H1: y~N(mu_g,sigma^2\*alpha^g) Under the alternative model, the variance of the trait changes multiplicatively with genotype: Var(y|g=j)=alpha^j*sigma^2, j in 0,1,2
alpha.calc(qt, g)
alpha.calc(qt, g)
qt |
A numeric vector. |
g |
An integer vector. |
A list with the values:
* alpha, the estimated variance effect * sigma2_alt, The variance estimated for non-carriers * pval, the p-value of the likelihood ratio test * chi2, the chi squared statistics (with one degree of freedom) corrisponding to the p-value
n_val <- 50000L geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE) qt_vec <- rnorm(n_val) * (1.1^geno_vec) res <- alpha.calc(qt_vec, geno_vec)
n_val <- 50000L geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE) qt_vec <- rnorm(n_val) * (1.1^geno_vec) res <- alpha.calc(qt_vec, geno_vec)
We estimate the variance effect of a primary variant conditioned on one or more secondary variants. We apply a likelyhood ratio test with one degree of freedom
H0: All secondary variants have a variance effect, but not the primary one. H1: All variants have a variance effect, including the primary one
Under both models we assume a full mean effect model. That is, the number of mean-value parameters is 3^(n+1), where n is the number of covariates Thus the null model has 3^(n+1)+n degrees of freedom whereas the alternative model has 3^(n+1)+n+1 Due to the exponential growth of free parameters, the test might have low statistical power if many covariates are used
IMPORTANT NOTE: We use the Gauss-Newton algorithm to estimate many parameters. We do not check if the algorithm converges to the true minimum values, or if it converges at all. Thus, we advise against blindly believing all results
alpha.cond(qt, g, g_covar, iter_num = 50)
alpha.cond(qt, g, g_covar, iter_num = 50)
qt |
A numeric vector. |
g |
An integer vector. |
g_covar |
An integer matrix where each column corresponds to a genetic covariate |
iter_num |
An integer. Represents the number of iterations performed in the Gauss-Newton algorithm |
A list with the values: * alpha, the estimated variance effect, conditioned on the covariates * pval, the p-value corresponding to alpha
n <- 10000 qt <- rnorm(n) g <- rbinom(n, 2, 0.3) qt <- qt * 1.2^g g_covar <- as.data.frame(matrix(0, nrow = n, ncol = 3)) for(i in 1:ncol(g_covar)){ freq <- runif(1, min = 0, max = 1) g_covar[, i] <- rbinom(n, 2, freq) } res <- alpha.cond(qt, g, g_covar)
n <- 10000 qt <- rnorm(n) g <- rbinom(n, 2, 0.3) qt <- qt * 1.2^g g_covar <- as.data.frame(matrix(0, nrow = n, ncol = 3)) for(i in 1:ncol(g_covar)){ freq <- runif(1, min = 0, max = 1) g_covar[, i] <- rbinom(n, 2, freq) } res <- alpha.cond(qt, g, g_covar)
We estimate the variance effect of a variant conditioned on one or more continous variable We apply a likelyhood ratio test with one degree of freedom
H0: All covariates have a variance effect, but not the variant H1: The variant has a variance effect, and the covariates as well
alpha.continuous.cond(qt, g, x, iter_num = 50, eps_param = 1e-10)
alpha.continuous.cond(qt, g, x, iter_num = 50, eps_param = 1e-10)
qt |
A numeric vector. |
g |
An integer vector. |
x |
A numeric matrix, each column represents a covariate. |
iter_num |
An integer. Represents the number of iterations performed in the Gauss-Newton algorithm |
eps_param |
A number. The Gauss-Newton algorithm terminates if the incriment change of all variance estimates is smaller than this number. |
A list with the values: * alpha, the estimated variance effect, conditioned on the covariates * pval, the p-value corresponding to alpha
n_val <- 50000 x <- matrix(0,nrow = n_val, ncol = 4) for(i in 1:4) { x[, i] <- rnorm(n_val) } g_vec <- rbinom(n_val,2,0.3) var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4] + 0.3 * g_vec) qt_vec <- rnorm(n_val, 0, sqrt(var_vec)) res <- alpha.continuous.cond(qt_vec, g_vec, x)
n_val <- 50000 x <- matrix(0,nrow = n_val, ncol = 4) for(i in 1:4) { x[, i] <- rnorm(n_val) } g_vec <- rbinom(n_val,2,0.3) var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4] + 0.3 * g_vec) qt_vec <- rnorm(n_val, 0, sqrt(var_vec)) res <- alpha.continuous.cond(qt_vec, g_vec, x)
This function jointly estimates the variance effect of a set of (continuous) variables on a qt trait. More precisely. It finds the maximum likelyhood estimators.
alpha.multi.est( qt, x, iter_num = 50, eps_param = 1e-10, initial_guess = rep(0, ncol(x)) )
alpha.multi.est( qt, x, iter_num = 50, eps_param = 1e-10, initial_guess = rep(0, ncol(x)) )
qt |
A numeric vector. |
x |
A numeric matrix, each column represents a covariate. |
iter_num |
An integer. Represents the number of iterations performed in the Gauss-Newton algorithm |
eps_param |
A number. The Gauss-Newton algorithm terminates if the incriment change of all variance estimates is smaller than this number. |
initial_guess |
A vector of length ncol(x). Represents the initial guess of parameters for the Gauss-Newton algorithm. |
A vector with a variance estimate for each variable.
n_val <- 50000 x <- matrix(0,nrow = n_val, ncol = 4) for(i in 1:4) { x[, i] <- rnorm(n_val) } var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4]) qt_vec <- rnorm(n_val, 0, sqrt(var_vec)) res <- alpha.multi.est(qt_vec, x)
n_val <- 50000 x <- matrix(0,nrow = n_val, ncol = 4) for(i in 1:4) { x[, i] <- rnorm(n_val) } var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4]) qt_vec <- rnorm(n_val, 0, sqrt(var_vec)) res <- alpha.multi.est(qt_vec, x)
This function finds appropriate scaling_parameters for the kappa_calc function by means of bootstrapping.
corr.calibration(qt1, qt2, reps = 10000)
corr.calibration(qt1, qt2, reps = 10000)
qt1 |
A numeric vector. |
qt2 |
A numeric vector. |
reps |
The number of repeates we want to perform for each sample size |
A list with the values:
* bias_scale, a value to determine the bias for the correlation estimators * weight_scale, a value to determine how much weight we assign to each correlation estimator * safe_weight_scale, same as weight scale, but larger. Using this weight decreases the chance of type 1 error, with the cost of statistical power.
n_val <- 10000 Q <- MASS::mvrnorm(n = n_val, mu = c(0,0), Sigma = matrix(c(1,0.3,0.3,1), nrow = 2, ncol = 2)) qt1_val <- Q[,1] qt2_val <- Q[,2] res <- corr.calibration(qt1_val, qt2_val, 10)
n_val <- 10000 Q <- MASS::mvrnorm(n = n_val, mu = c(0,0), Sigma = matrix(c(1,0.3,0.3,1), nrow = 2, ncol = 2)) qt1_val <- Q[,1] qt2_val <- Q[,2] res <- corr.calibration(qt1_val, qt2_val, 10)
This function estimates the dominance effect of a genetic variant on a case-control variable We apply a logistic regression model to estimate dominance effects. We include a linear term, coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. We also include a dominance term, coded as 1 for homozygous carriers and 0 for others. Effect size and significance is based on the dominance term.
dominance_CC.calc( cc, g, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
dominance_CC.calc( cc, g, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
cc |
A case control vector, containing 0's and 1's |
g |
A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2. |
yob |
A numerical vector containing year of birth. If some are unknown they should be marked as -1 |
sex |
A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate. |
A list with the dominanc effect (on log-scale) and corresponding standard error, z statistic and p-value
g_vec <- rbinom(100000, 2, 0.3) cc_vec <- rbinom(100000, 1, 0.1 * (1.2 ^ (g_vec^2))) res <- dominance_CC.calc(cc_vec, g_vec)
g_vec <- rbinom(100000, 2, 0.3) cc_vec <- rbinom(100000, 1, 0.1 * (1.2 ^ (g_vec^2))) res <- dominance_CC.calc(cc_vec, g_vec)
This function estimates the dominance effect of a genetic variant on a quantitatvie trait Nothing fancy here. We apply a simple linear regression model to estimate dominance effects. We include a linear term, coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. We also include a dominance term, coded as 1 for homozygous carriers and 0 for others. Effect size and significance is based on the dominance term.
dominance.calc( qt, g, round_imputed = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
dominance.calc( qt, g, round_imputed = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
qt |
A numeric vector |
g |
A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2. |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis |
covariates |
A dataframe containing any covariates that should be used; one column per covariate. |
A list with the dominanc effect and corresponding standard error, t statistic and p-value
g_vec <- rbinom(100000, 2, 0.3) qt_vec <- rnorm(100000) + 0.2 * g_vec^2 res <- dominance.calc(qt_vec, g_vec)
g_vec <- rbinom(100000, 2, 0.3) qt_vec <- rnorm(100000) + 0.2 * g_vec^2 res <- dominance.calc(qt_vec, g_vec)
This tool creates a scatter plot along with regression lines. Additionally it finds and plots the best ellipses that fit the data.
ellipse.by.gen( qt1, qt2, g, trait_name1 = "qt trait 1", trait_name2 = "qt trait 2", title = "", sample_size = 500 )
ellipse.by.gen( qt1, qt2, g, trait_name1 = "qt trait 1", trait_name2 = "qt trait 2", title = "", sample_size = 500 )
qt1 |
A numeric vector. |
qt2 |
A numeric vector. |
g |
An integer vector. |
trait_name1 |
A string. |
trait_name2 |
A string. |
title |
A string. |
sample_size |
A positive integer. |
A scatter plot.
n_val <- 10000L geno_vec <- c(rep(0, n_val), rep(1, n_val), rep(2, n_val)) qt_g0 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(0.93, 0.88, 0.88, 0.92), ncol = 2)) qt_g1 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(0.98, 0.88, 0.88, 0.90), ncol = 2)) qt_g2 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(1.57, 0.81, 0.81, 0.59), ncol = 2)) qt_vec <- rbind(qt_g0, qt_g1) qt_vec <- rbind(qt_vec, qt_g2) res <- ellipse.by.gen(qt_vec[, 1], qt_vec[, 2], geno_vec)
n_val <- 10000L geno_vec <- c(rep(0, n_val), rep(1, n_val), rep(2, n_val)) qt_g0 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(0.93, 0.88, 0.88, 0.92), ncol = 2)) qt_g1 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(0.98, 0.88, 0.88, 0.90), ncol = 2)) qt_g2 <- MASS::mvrnorm(n_val, mu = c(0, 0), Sigma = matrix(c(1.57, 0.81, 0.81, 0.59), ncol = 2)) qt_vec <- rbind(qt_g0, qt_g1) qt_vec <- rbind(qt_vec, qt_g2) res <- ellipse.by.gen(qt_vec[, 1], qt_vec[, 2], geno_vec)
This function estimates the interaction effect of a genetic variant with an environmental factor on a case control variable We apply a simple logistic regression model to estimate interaction effects. We include a linear term for the variant and environmental variable seperately. The variant is coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. The environmental variable is rank-normalized automatically as part of the function. The interaction term is defined as the product of the genetic and the (normalized) environmental variable. Effect size and significance is based on the interaction term.
env_interaction_CC.calc( cc, g, env, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, dominance_term = FALSE, square_env = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
env_interaction_CC.calc( cc, g, env, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, dominance_term = FALSE, square_env = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
cc |
A numeric vector |
g |
A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2. |
env |
A numeric vector with an environmental variable |
yob |
A numerical vector containing year of birth. If some are unknown they should be marked as -1 |
sex |
A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis. |
dominance_term |
A boolian variable determining whether a dominance term for the variant should be included as a covariates in the analysis |
square_env |
A boolian variable determining whether the square of the environmental trait should be included as a covariate in the analysis |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate |
A list with the environmental interaction effect and corresponding standard error, t statistic and p-value
g_vec <- rbinom(100000, 2, 0.9) env_vec <- round(runif(100000, min = 0, max = 6)) cc_vec <- rbinom(100000,1,0.1 * (1.05^g_vec) * (1.1^env_vec)* (1.1 ^ (g_vec * env_vec))) res <- env_interaction_CC.calc(cc_vec, g_vec, env_vec)
g_vec <- rbinom(100000, 2, 0.9) env_vec <- round(runif(100000, min = 0, max = 6)) cc_vec <- rbinom(100000,1,0.1 * (1.05^g_vec) * (1.1^env_vec)* (1.1 ^ (g_vec * env_vec))) res <- env_interaction_CC.calc(cc_vec, g_vec, env_vec)
This function estimates the interaction effect of a genetic variant with an environmental factor on a quantitatvie trait We apply a simple linear regression model to estimate interaction effects. We include a linear term for the variant and environmental variable seperately. The variant is coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. The environmental variable is rank-normalized automatically as part of the function. The interaction term is defined as the product of the genetic and the (normalized) environmental variables. Effect size and significance is based on the interaction term.
env_interaction.calc( qt, g, env, round_imputed = FALSE, dominance_term = FALSE, square_env = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
env_interaction.calc( qt, g, env, round_imputed = FALSE, dominance_term = FALSE, square_env = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
qt |
A numeric vector |
g |
A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2. |
env |
A numeric vector with an environmental variable |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis. |
dominance_term |
A boolian variable determining whether a dominance term for the variant should be included as a covariates in the analysis |
square_env |
A boolian variable determining whether the square of the environmental trait should be included as a covariate in the analysis |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate |
A list with the environmental interaction effect and corresponding standard error, t statistic and p-value
g_vec <- rbinom(100000, 2, 0.1) env_vec <- round(runif(100000,min=0,max=6)) qt_vec <- rnorm(100000) + 0.1 * g_vec + 0.05 * env_vec + 0.05 * g_vec * env_vec res <- env_interaction.calc(qt_vec, g_vec, env_vec)
g_vec <- rbinom(100000, 2, 0.1) env_vec <- round(runif(100000,min=0,max=6)) qt_vec <- rnorm(100000) + 0.1 * g_vec + 0.05 * env_vec + 0.05 * g_vec * env_vec res <- env_interaction.calc(qt_vec, g_vec, env_vec)
This function interpolates data from a simple simulation to give an estimate of the variance effect induced by an additive effect. The simulation code is stored under inst/raw/. We assume that the trait has been inverse normal transformed. Under the simulation, there is no variance effect, so the variance effect is fully induced by the inverse normal transform.
expected.variance.effect(maf, beta_add)
expected.variance.effect(maf, beta_add)
maf |
Minor allele frequency of the variant, should be in the range 0 to 0.5. |
beta_add |
Additive effect of the variant, should be in the range 0 to 3.5. This variable can be a vector of values. |
The expected variance effect for the variant from the given maf, beta combination.
maf <- 0.1 beta_val <- 0.3 expected_var <- expected.variance.effect(maf, beta_val) beta_vec <- seq(0.1,2, length.out = 20) expected_var <- expected.variance.effect(maf, beta_vec)
maf <- 0.1 beta_val <- 0.3 expected_var <- expected.variance.effect(maf, beta_val) beta_vec <- seq(0.1,2, length.out = 20) expected_var <- expected.variance.effect(maf, beta_vec)
This tool creates three histogram plots. One per genotype. Additionally, outliers are colored red (by default subjects that are in the top and bottom 2.5 and blue lines are added to indicate the mean and (by default) one standard deviation in each direction.
hist_by_gen( qt, g, bins = 100, trait_name = "qt trait", title = "", outlier_quantiles = c(0.025, 0.975), sd_lines = c(1, 1) )
hist_by_gen( qt, g, bins = 100, trait_name = "qt trait", title = "", outlier_quantiles = c(0.025, 0.975), sd_lines = c(1, 1) )
qt |
A numeric vector. |
g |
An integer vector. |
bins |
An integer. |
trait_name |
A string. |
title |
A string. |
outlier_quantiles |
A vector with length 2. |
sd_lines |
A vector with length 2. |
A histgram plot
n_val <- 50000L geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE) qt_vec <- rnorm(n_val) * (1.3^geno_vec) + 1 * geno_vec hist_by_gen(qt_vec, geno_vec)
n_val <- 50000L geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE) qt_vec <- rnorm(n_val) * (1.3^geno_vec) + 1 * geno_vec hist_by_gen(qt_vec, geno_vec)
This function estimates the interaction effect of a pair of genetic variant on a case-control variable We apply a logistic regression model to estimate interaction effects. We include a linear term for each variant seperately, coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. We also include an interaction term, coded as the product of the two genotype values. Effect size and significance is based on the interaction term.
interaction_CC.calc( cc, g1, g2, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, dominance_terms = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
interaction_CC.calc( cc, g1, g2, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, dominance_terms = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
cc |
A numeric vector |
g1 |
A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2. |
g2 |
A vector with (possibly imputed) genotype values. All entries should be larger than 0 and smaller than 2. |
yob |
A numerical vector containing year of birth. If some are unknown they should be marked as -1 |
sex |
A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis. |
dominance_terms |
A boolian variable determining whether dominance terms for the variants should be included as covariates in the analysis |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate |
A list with the interaction effect (on log-scale) and corresponding standard error, z statistic and p-value
g1_vec <- rbinom(100000, 2, 0.9) g2_vec <- rbinom(100000, 2, 0.1) cc_vec <- rbinom(100000,1,0.1 * (1.05^g1_vec) * (1.05^g2_vec) * (1.3 ^ (g1_vec * g2_vec))) res <- interaction_CC.calc(cc_vec, g1_vec, g2_vec)
g1_vec <- rbinom(100000, 2, 0.9) g2_vec <- rbinom(100000, 2, 0.1) cc_vec <- rbinom(100000,1,0.1 * (1.05^g1_vec) * (1.05^g2_vec) * (1.3 ^ (g1_vec * g2_vec))) res <- interaction_CC.calc(cc_vec, g1_vec, g2_vec)
This function estimates the interaction effect of a pair of genetic variant on a quantitatvie trait We apply a simple linear regression model to estimate interaction effects. We include a linear term for each variant seperately, coded as 0,1 and 2 for non-carriers, heterozygotes and homozygous carriers of the effect allele. We also include an interaction term, coded as the product of the two genotype values. Effect size and significance is based on the interaction term.
interaction.calc( qt, g1, g2, round_imputed = FALSE, dominance_terms = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
interaction.calc( qt, g1, g2, round_imputed = FALSE, dominance_terms = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
qt |
A numeric vector |
g1 |
A vector with (possibly imputed) genotype values. All entries should be larger than or equal to 0 and smaller than or equal to 2. |
g2 |
A vector with (possibly imputed) genotype values. All entries should be larger than or equal to 0 and smaller than or equal to 2. |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis. |
dominance_terms |
A boolian variable determining whether dominance terms for the variants should be included as covariates in the analysis |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate |
A list with the interaction effect and corresponding standard error, t statistic and p-value
g1_vec <- rbinom(100000, 2, 0.9) g2_vec <- rbinom(100000, 2, 0.1) qt_vec <- rnorm(100000) + 0.1 * g1_vec + 0.2 * g2_vec +0.4 * g1_vec * g2_vec res <- interaction.calc(qt_vec, g1_vec, g2_vec)
g1_vec <- rbinom(100000, 2, 0.9) g2_vec <- rbinom(100000, 2, 0.1) qt_vec <- rnorm(100000) + 0.1 * g1_vec + 0.2 * g2_vec +0.4 * g1_vec * g2_vec res <- interaction.calc(qt_vec, g1_vec, g2_vec)
This function estimates the correlation effect of a genetic variant on a pair of quantitative traits. The effect (kappa) is measured on Fisher transformed correlation values. The genotype is coded as 0 (non-carrier), 1 (single copy of effect allele) and 2 (two copies of effect allele). We use a likelihood ratio test with 1 degree of freedom: * H0: z_0 = z_1 = z_2, * H1: z_j = z_null +kappa*j, where z_j is the Fisher-transformed correlation between the traits amongst subjects of genotype j.
If the traits follow a joint normal distribution, then (thoretically) the Fisher-transformed sample correlation (z) is approximately normally distributed with mean 0 and standard error 1/sqrt(n-3). Otherwise, we have to correct for possible bias in the estimators and scale the weights we assign to them appropriately
Note, that even though each trait follows a normal distribution individually, that does not necessarily imply that the pair of them follow a joint normal distirbution.
kappa_calc(qt1, qt2, g, weight_scale = 1, bias_scale = 0)
kappa_calc(qt1, qt2, g, weight_scale = 1, bias_scale = 0)
qt1 |
A numeric vector. |
qt2 |
A numeric vector. |
g |
An integer vector. |
weight_scale |
Used to appropriately scale the weight assigned to the correlation estimators |
bias_scale |
Used to appropriately scale the bias of the correlation estimators |
A list with the values:
* kappa, the estimated correlation effect * pval, the p-value of the likelihood ratio test
Sigma0 <- matrix(c(1,0,0,1),nrow=2,ncol=2) Sigma1 <- matrix(c(1,0.3,0.3,1),nrow=2,ncol=2) Sigma2 <- matrix(c(1,0.6,0.6,1),nrow=2,ncol=2) geno_vec <- c(rep(0,10000),rep(1,1000),rep(2,100)) Q0 <- MASS::mvrnorm(n = 10000, mu = c(0,0), Sigma = Sigma0) Q1 <- MASS::mvrnorm(n = 1000, mu = c(0,0), Sigma = Sigma1) Q2 <- MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = Sigma2) Q <- rbind(Q0,Q1) Q <- rbind(Q,Q2) qt1_val <- Q[,1] qt2_val <- Q[,2] res <- kappa_calc(qt1_val, qt2_val, geno_vec)
Sigma0 <- matrix(c(1,0,0,1),nrow=2,ncol=2) Sigma1 <- matrix(c(1,0.3,0.3,1),nrow=2,ncol=2) Sigma2 <- matrix(c(1,0.6,0.6,1),nrow=2,ncol=2) geno_vec <- c(rep(0,10000),rep(1,1000),rep(2,100)) Q0 <- MASS::mvrnorm(n = 10000, mu = c(0,0), Sigma = Sigma0) Q1 <- MASS::mvrnorm(n = 1000, mu = c(0,0), Sigma = Sigma1) Q2 <- MASS::mvrnorm(n = 100, mu = c(0,0), Sigma = Sigma2) Q <- rbind(Q0,Q1) Q <- rbind(Q,Q2) qt1_val <- Q[,1] qt2_val <- Q[,2] res <- kappa_calc(qt1_val, qt2_val, geno_vec)
Given a set of variants and environmental traits, and a single case control variable, this function calculates the interaction effect of all possible variant-environmental pairs
pairwise_env_int_CC.calc( cc, g, env, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, dominance_term = FALSE, square_env = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)), variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"), env_names = paste(rep("env", ncol(env)), as.character(1:ncol(env)), sep = "_") )
pairwise_env_int_CC.calc( cc, g, env, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, dominance_term = FALSE, square_env = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)), variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"), env_names = paste(rep("env", ncol(env)), as.character(1:ncol(env)), sep = "_") )
cc |
A numeric vector |
g |
A matrix, where each colomn represents a variant |
env |
A matrix, where each row represents an environmental variable |
yob |
A numerical vector containing year of birth. If some are unknown they should be marked as -1 |
sex |
A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis. |
dominance_term |
A boolian variable determining whether a dominance term for the variant should be included as a covariates in the analysis |
square_env |
A boolian variable determining whether the square of the environmental trait should be included as a covariate in the analysis |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate |
variant_names |
A list of the names of the variants |
env_names |
A list of the names of the environmental variables |
A dataframe with all possible variant-environmental pairs and their estimated interaction effect
N_run <- 25000 g_vec <- matrix(0, nrow = N_run, ncol = 3) freqs <- runif(ncol(g_vec), min = 0, max = 1) env_vec <- matrix(0, nrow = N_run, ncol = 3) for(i in 1:ncol(g_vec)){ g_vec[, i] <- rbinom(N_run, 2, freqs[i]) } for( i in 1:ncol(env_vec)){ env_vec[, i] <- round(runif(N_run,min=0,max=6)) } cc_vec <- rbinom(N_run,1,0.1 * (1.05 ^ g_vec[, 1]) * (1.06 ^ env_vec[,1]) * (0.95 ^ g_vec[, 2]) * (1.1^(g_vec[, 1] * env_vec[, 1]))) res <- pairwise_env_int_CC.calc(cc_vec, g_vec, env_vec)
N_run <- 25000 g_vec <- matrix(0, nrow = N_run, ncol = 3) freqs <- runif(ncol(g_vec), min = 0, max = 1) env_vec <- matrix(0, nrow = N_run, ncol = 3) for(i in 1:ncol(g_vec)){ g_vec[, i] <- rbinom(N_run, 2, freqs[i]) } for( i in 1:ncol(env_vec)){ env_vec[, i] <- round(runif(N_run,min=0,max=6)) } cc_vec <- rbinom(N_run,1,0.1 * (1.05 ^ g_vec[, 1]) * (1.06 ^ env_vec[,1]) * (0.95 ^ g_vec[, 2]) * (1.1^(g_vec[, 1] * env_vec[, 1]))) res <- pairwise_env_int_CC.calc(cc_vec, g_vec, env_vec)
Given a set of variants and environmental traits, and a single quantitative trait, this function calculates the interaction effect of all possible variant-environmental pairs
pairwise_env_int.calc( qt, g, env, round_imputed = FALSE, dominance_term = FALSE, square_env = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)), variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"), env_names = paste(rep("env", ncol(env)), as.character(1:ncol(env)), sep = "_") )
pairwise_env_int.calc( qt, g, env, round_imputed = FALSE, dominance_term = FALSE, square_env = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)), variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"), env_names = paste(rep("env", ncol(env)), as.character(1:ncol(env)), sep = "_") )
qt |
A numeric vector |
g |
A matrix, where each colomn represents a variant |
env |
A matrix, where each row represents an environmental variable |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis. |
dominance_term |
A boolian variable determining whether a dominance term for the variant should be included as a covariates in the analysis |
square_env |
A boolian variable determining whether the square of the environmental trait should be included as a covariate in the analysis |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate |
variant_names |
A list of the names of the variants |
env_names |
A list of the names of the environmental variables |
A dataframe with all possible variant-environmental pairs and their estimated interaction effect
g_vec <- matrix(0, nrow = 100000, ncol = 3) freqs <- runif(ncol(g_vec), min = 0, max = 1) env_vec <- matrix(0, nrow = 100000, ncol = 3) for(i in 1:ncol(g_vec)){ g_vec[, i] <- rbinom(100000, 2, freqs[i]) } for( i in 1:ncol(env_vec)){ env_vec[, i] <- round(runif(100000,min=0,max=6)) } qt_vec <- rnorm(100000) + 0.1 * g_vec[, 1] + 0.2 * g_vec[, 2] -0.1 * env_vec[, 3] + 0.1 * env_vec[, 1] + 0.1 * g_vec[, 1] * env_vec[, 1] res <- pairwise_env_int.calc(qt_vec, g_vec, env_vec)
g_vec <- matrix(0, nrow = 100000, ncol = 3) freqs <- runif(ncol(g_vec), min = 0, max = 1) env_vec <- matrix(0, nrow = 100000, ncol = 3) for(i in 1:ncol(g_vec)){ g_vec[, i] <- rbinom(100000, 2, freqs[i]) } for( i in 1:ncol(env_vec)){ env_vec[, i] <- round(runif(100000,min=0,max=6)) } qt_vec <- rnorm(100000) + 0.1 * g_vec[, 1] + 0.2 * g_vec[, 2] -0.1 * env_vec[, 3] + 0.1 * env_vec[, 1] + 0.1 * g_vec[, 1] * env_vec[, 1] res <- pairwise_env_int.calc(qt_vec, g_vec, env_vec)
Given a set of variants and a case control variable, this function calculates the interaction effect of all possible variant-variant pairs
pairwise_int_CC.calc( cc, g, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, dominance_terms = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)), variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_") )
pairwise_int_CC.calc( cc, g, yob = rep(-1, length(cc)), sex = rep(-1, length(cc)), round_imputed = FALSE, dominance_terms = FALSE, covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)), variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_") )
cc |
A numeric vector |
g |
A matrix, where each colomn represents a variant |
yob |
A numerical vector containing year of birth. If some are unknown they should be marked as -1 |
sex |
A numerical vector containing sex, coded 0 for males, 1 for females and -1 for unknown |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis. |
dominance_terms |
A boolian variable determining whether dominance terms for the variants should be included as covariates in the analysis |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate |
variant_names |
A list of the names of the variants |
A dataframe with all possible variant pairs and their estimated interaction effect
N_run <- 25000 g_vec <- matrix(0, nrow = N_run, ncol = 5) freqs <- runif(ncol(g_vec), min = 0,max = 1) for(i in 1:ncol(g_vec)){ g_vec[, i] <- rbinom(N_run, 2, freqs[i]) } cc_vec <- rbinom(N_run,1,0.1 * (1.05 ^ g_vec[, 1]) * (1.06 ^ g_vec[,2]) * (0.95 ^ g_vec[, 3]) * (1.5^(g_vec[,1] * g_vec[,2]))) res <- pairwise_int_CC.calc(cc_vec, g_vec)
N_run <- 25000 g_vec <- matrix(0, nrow = N_run, ncol = 5) freqs <- runif(ncol(g_vec), min = 0,max = 1) for(i in 1:ncol(g_vec)){ g_vec[, i] <- rbinom(N_run, 2, freqs[i]) } cc_vec <- rbinom(N_run,1,0.1 * (1.05 ^ g_vec[, 1]) * (1.06 ^ g_vec[,2]) * (0.95 ^ g_vec[, 3]) * (1.5^(g_vec[,1] * g_vec[,2]))) res <- pairwise_int_CC.calc(cc_vec, g_vec)
Given a set of variants and a quantitative trait, this function calculates the interaction effect of all possible variant-variant pairs
pairwise_int.calc( qt, g, round_imputed = FALSE, dominance_terms = FALSE, variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"), covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
pairwise_int.calc( qt, g, round_imputed = FALSE, dominance_terms = FALSE, variant_names = paste(rep("variant", ncol(g)), as.character(1:ncol(g)), sep = "_"), covariates = as.data.frame(matrix(0, nrow = 0, ncol = 0)) )
qt |
A numeric vector |
g |
A matrix, where each colomn represents a variant |
round_imputed |
A boolian variable determining whether imputed genotype values should be rounded to the nearest integer in the analysis. |
dominance_terms |
A boolian variable determining whether dominance terms for the variants should be included as covariates in the analysis |
variant_names |
A list of the names of the variants |
covariates |
A dataframe containing any other covariates that should be used; one column per covariate |
A dataframe with all possible variant pairs and their estimated interaction effect
g_vec <- matrix(0, nrow = 100000, ncol = 5) freqs <- runif(ncol(g_vec), min = 0, max = 1) for(i in 1:ncol(g_vec)){ g_vec[,i] <- rbinom(100000, 2, freqs[i]) } qt_vec <- rnorm(100000) + 0.1 * g_vec[, 1] + 0.2 * g_vec[, 2] -0.1 * g_vec[, 3] + 0.2 * g_vec[, 1] * g_vec[, 2] res <- pairwise_int.calc(qt_vec, g_vec)
g_vec <- matrix(0, nrow = 100000, ncol = 5) freqs <- runif(ncol(g_vec), min = 0, max = 1) for(i in 1:ncol(g_vec)){ g_vec[,i] <- rbinom(100000, 2, freqs[i]) } qt_vec <- rnorm(100000) + 0.1 * g_vec[, 1] + 0.2 * g_vec[, 2] -0.1 * g_vec[, 3] + 0.2 * g_vec[, 1] * g_vec[, 2] res <- pairwise_int.calc(qt_vec, g_vec)
This function creates genetic risk scores, with the option of including interactions or dominance effects. The score is automatically shifted to have mean 0.
PRS_creator( g, betas, dominance_effects = rep(0, length(betas)), interaction_effects = matrix(0, nrow = 0, ncol = 0), log_scale = FALSE )
PRS_creator( g, betas, dominance_effects = rep(0, length(betas)), interaction_effects = matrix(0, nrow = 0, ncol = 0), log_scale = FALSE )
g |
A matrix, where each colomn represents a variant and each line represents a subject |
betas |
A numeric vector, representing the (additive) effects of the variants |
dominance_effects |
A numeric vector, representing dominance effects of the variants |
interaction_effects |
A matrix with three columns. First two columns are integers that correspond to the variants that are interacting. The third column is the effect size. |
log_scale |
A Boolian variabe. If true all analysis is done on log-transformed effect values and the resulting score is transformed back to an exponential scale in the end. |
A numeric vector with a poligenic risk score for each subject
g_vec <- matrix(0, nrow = 100000, ncol = 5) freqs <- runif(ncol(g_vec), min = 0, max = 1) for(i in 1:ncol(g_vec)){ g_vec[,i] <- rbinom(100000, 2, freqs[i]) } beta_vec <- runif(5, min = -0.5, max = 0.5) dom_vec <- runif(5, min = -0.5, max = 0.5) int_vec <- matrix(0,nrow = 2, ncol = 3) int_vec[, 1] <- c(1, 3) int_vec[, 2] <- c(2, 5) int_vec[, 3] <- runif(2, min = -0.5, max = 0.5) res <- PRS_creator(g_vec, beta_vec, dominance_effects = dom_vec, interaction_effects = int_vec)
g_vec <- matrix(0, nrow = 100000, ncol = 5) freqs <- runif(ncol(g_vec), min = 0, max = 1) for(i in 1:ncol(g_vec)){ g_vec[,i] <- rbinom(100000, 2, freqs[i]) } beta_vec <- runif(5, min = -0.5, max = 0.5) dom_vec <- runif(5, min = -0.5, max = 0.5) int_vec <- matrix(0,nrow = 2, ncol = 3) int_vec[, 1] <- c(1, 3) int_vec[, 2] <- c(2, 5) int_vec[, 3] <- runif(2, min = -0.5, max = 0.5) res <- PRS_creator(g_vec, beta_vec, dominance_effects = dom_vec, interaction_effects = int_vec)
This function trains a poligenic risk score model on a dataset, and then imputes the risk score into another dataset
train_and_impute_PRS( qt_training, g_training, g_impute, dominance_effects = rep(FALSE, ncol(g_training)), interaction_effects = matrix(0, nrow = 0, ncol = 0) )
train_and_impute_PRS( qt_training, g_training, g_impute, dominance_effects = rep(FALSE, ncol(g_training)), interaction_effects = matrix(0, nrow = 0, ncol = 0) )
qt_training |
A numeric vector. Represents the qt values of the data we train the model on. |
g_training |
A matrix, where each colomn represents a variant and each line represents a subject in the training data |
g_impute |
A matrix, where each column represents a variant and each line represents a subject. |
dominance_effects |
A Boolian vector, each term determines whether a dominance term for the corresponding variant is used in the model. |
interaction_effects |
An integer matrix with two columns. Each line represents a pair of interacting variants that should be included in the model. |
Returns a list with the following objects * PRS_imputed, the imputed PRS values * PRS_training, the PRS values for the training data * Residuals_training, the residuals from the model in the training data
g_train_vec <- matrix(0, nrow = 100000, ncol = 5) freqs <- runif(ncol(g_train_vec), min = 0, max = 1) for(i in 1:ncol(g_train_vec)){ g_train_vec[,i] <- rbinom(100000, 2, freqs[i]) } g_impute_vec <- matrix(0, nrow = 50000, ncol = 5) for(i in 1:ncol(g_impute_vec)){ g_impute_vec[,i] <- rbinom(50000, 2, freqs[i]) } dom_vec <- c(TRUE, FALSE, FALSE, TRUE, FALSE) int_vec <- matrix(c(1, 2, 4, 5), nrow = 2 , ncol = 2) qt_vec <- rnorm(100000) + 0.2 * g_train_vec[, 1] + 0.3 * g_train_vec[, 1] * g_train_vec[, 4] res <- train_and_impute_PRS(qt_vec, g_train_vec, g_impute_vec, dominance_effects = dom_vec, interaction_effects = int_vec)
g_train_vec <- matrix(0, nrow = 100000, ncol = 5) freqs <- runif(ncol(g_train_vec), min = 0, max = 1) for(i in 1:ncol(g_train_vec)){ g_train_vec[,i] <- rbinom(100000, 2, freqs[i]) } g_impute_vec <- matrix(0, nrow = 50000, ncol = 5) for(i in 1:ncol(g_impute_vec)){ g_impute_vec[,i] <- rbinom(50000, 2, freqs[i]) } dom_vec <- c(TRUE, FALSE, FALSE, TRUE, FALSE) int_vec <- matrix(c(1, 2, 4, 5), nrow = 2 , ncol = 2) qt_vec <- rnorm(100000) + 0.2 * g_train_vec[, 1] + 0.3 * g_train_vec[, 1] * g_train_vec[, 4] res <- train_and_impute_PRS(qt_vec, g_train_vec, g_impute_vec, dominance_effects = dom_vec, interaction_effects = int_vec)
Given is a set of (continuous) variables and a qt trait. First, this function adjusts the trait for the mean effects of the variables with a linear model. Next, the variance effect of the variables are estimated and the trait is adjusted further by scaling it in accordance with the results.
var.adj(qt, x, iter_num = 50, eps_param = 1e-10)
var.adj(qt, x, iter_num = 50, eps_param = 1e-10)
qt |
A numeric vector. |
x |
A numeric matrix, each column represents a covariate. |
iter_num |
An integer. Represents the number of iterations performed in the Gauss-Newton algorithm |
eps_param |
A number. The Gauss-Newton algorithm terminates if the incriment change of all variance estimates is smaller than this number. |
A vector, representing the adjusted trait.
n_val <- 50000 x <- matrix(0,nrow = n_val, ncol = 4) for(i in 1:4) { x[, i] <- rnorm(n_val) } var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4]) qt_vec <- rnorm(n_val, 0, sqrt(var_vec)) res <- var.adj(qt_vec, x)
n_val <- 50000 x <- matrix(0,nrow = n_val, ncol = 4) for(i in 1:4) { x[, i] <- rnorm(n_val) } var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4]) qt_vec <- rnorm(n_val, 0, sqrt(var_vec)) res <- var.adj(qt_vec, x)
This function finds the association between the predicted uncertanty of some estimates of a trait to the "actual uncertanty" of the estimates. Suppose we have estimates of some trait (this might be a polygenic risk score). Moreover, suppose we have assigned a variance value to each estimate (this might be a variance risk score) to reflect how certain we believe we are about each estimate. Given the true trait values, this function evaluates how well the assigned variance values reflect reality.
We use a likelihood ratio test with 1 degree of freedom * H0: y~N(mu+b*m_score,sigma_sq), * H1: y~N(mu+b*m_score,sigma_sq*(v_score)^a), where y is the trait m_score is the estimate of the trait and v_score is the variance assigned to the estimate. Thus H0 has three degrees of freedom (mu,b,sigma_sq), whereas H1 has four (mu,b,sigma_sq,a)
Var.assoc(qt, m_score, v_score, iter = 50)
Var.assoc(qt, m_score, v_score, iter = 50)
qt |
A numeric vector. |
m_score |
A numeric vector |
v_score |
A numeric vector with positive values |
iter |
A number of iterations for the Gauss-Newton algorithm |
A list with the values:
* a, the association between v_score and the actual variance. * pval, the p-value of the likelihood ratio test
n_val <- 50000L trait_vec <- rnorm(n_val,0,1) var_vec <- exp(rnorm(n_val,0,0.1)) est_vec <- trait_vec+rnorm(n_val,0,var_vec) res <- Var.assoc(trait_vec,est_vec,var_vec, iter = 20)
n_val <- 50000L trait_vec <- rnorm(n_val,0,1) var_vec <- exp(rnorm(n_val,0,0.1)) est_vec <- trait_vec+rnorm(n_val,0,var_vec) res <- Var.assoc(trait_vec,est_vec,var_vec, iter = 20)
Estimates the variance effect of several continuous variables jointly
var.summary(qt, x, iter_num = 50, eps_param = 1e-10)
var.summary(qt, x, iter_num = 50, eps_param = 1e-10)
qt |
A numeric vector. |
x |
A data frame, each column represents a covariate that should be numeric. |
iter_num |
An integer. Represents the number of iterations performed in the Gauss-Newton algorithm |
eps_param |
A number. The Gauss-Newton algorithm terminates if the incriment change of all variance estimates is smaller than this number. |
A list with the following objects: * summary, a dataframe with a variance effect estimate for each variable and summary statistics * chi2, the chi2 statistic obtained by considering all parameteres jointly * df, degrees of freedom for the chi2 statistic * pval, p-value of the model * adjusted_values, a vector with qt values that have been adjusted for both mean and variance effects
n_val <- 50000 x <- as.data.frame(matrix(0,nrow = n_val, ncol = 4)) colnames(x) <- c('A','B','C','D') for(i in 1:4) { x[, i] <- rnorm(n_val) } var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4]) qt_vec <- rnorm(n_val, 0, sqrt(var_vec)) res <- var.summary(qt_vec, x)
n_val <- 50000 x <- as.data.frame(matrix(0,nrow = n_val, ncol = 4)) colnames(x) <- c('A','B','C','D') for(i in 1:4) { x[, i] <- rnorm(n_val) } var_vec <- exp(0.2 * x[, 1] - 0.3 * x[, 4]) qt_vec <- rnorm(n_val, 0, sqrt(var_vec)) res <- var.summary(qt_vec, x)
This tool creates a line plot that compares the predicted variance of data to its actual variance.
VarGS.plot( qt, v_score, bins = 10, xlab = "Predicted variance", ylab = "Variance", title = "" )
VarGS.plot( qt, v_score, bins = 10, xlab = "Predicted variance", ylab = "Variance", title = "" )
qt |
A numeric vector. |
v_score |
A numeric vector. |
bins |
An integer. |
xlab |
A string. |
ylab |
A string. |
title |
A string. |
A plot comparing predicted variance to actual variance.
n_val <- 100000L v_vec <- exp(rnorm(n_val, 0, 0.1)) qt_vec <- stats::rnorm(n_val, 0, sqrt(v_vec)) VarGS.plot(qt_vec, v_vec)
n_val <- 100000L v_vec <- exp(rnorm(n_val, 0, 0.1)) qt_vec <- stats::rnorm(n_val, 0, sqrt(v_vec)) VarGS.plot(qt_vec, v_vec)
This tool creates violin plots corresponding to each genotype.
Viol.by.gen(qt, g, trait_name = "qt trait", title = "")
Viol.by.gen(qt, g, trait_name = "qt trait", title = "")
qt |
A numeric vector. |
g |
An integer vector. |
trait_name |
A string. |
title |
A string. |
A violin plot
n_val <- 50000L geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE) qt_vec <- rnorm(n_val) * (1.3^geno_vec) + 1 * geno_vec Viol.by.gen(qt_vec, geno_vec)
n_val <- 50000L geno_vec <- sample(c(0, 1, 2), size = n_val, replace = TRUE) qt_vec <- rnorm(n_val) * (1.3^geno_vec) + 1 * geno_vec Viol.by.gen(qt_vec, geno_vec)