Title: | (Non)Additive Genetic Relatedness Matrices |
---|---|
Description: | Constructs (non)additive genetic relationship matrices, and their inverses, from a pedigree to be used in linear mixed effect models (A.K.A. the 'animal model'). Also includes other functions to facilitate the use of animal models. Some functions have been created to be used in conjunction with the R package 'asreml' for the 'ASReml' software, which can be obtained upon purchase from 'VSN' international (<https://vsni.co.uk/software/asreml>). |
Authors: | Matthew Wolak [cre, aut] |
Maintainer: | Matthew Wolak <[email protected]> |
License: | GPL (>=2) |
Version: | 2.18.0 |
Built: | 2025-02-22 04:09:21 UTC |
Source: | https://github.com/matthewwolak/nadiv |
Constructs (non)additive genetic relationship matrices, and their inverses, from a pedigree to be used in linear mixed effect models (A.K.A. the 'animal model'). Also includes other functions to facilitate the use of animal models. Some functions have been created to be used in conjunction with the R package for ASReml software, which can be obtained upon purchase from VSN international (<https://www.vsni.co.uk/software/asreml>).
Matthew Wolak [email protected]
Useful links:
Calculates AIC/AICc values, AIC differences, Likelihood of models, and model probabilities.
aic(logLik, fp, n = NULL)
aic(logLik, fp, n = NULL)
logLik |
A vector of model log-Likelihoods |
fp |
A vector containing the numbers of free parameters of each model included in the logLik vector |
n |
An optional vector of sample sizes for each model. Used to calculate AICc (small sample unbiased AIC). |
Calculations and notation follows chapter 2 of Burnham and Anderson (2002).
a list
:
vector containing AIC/AICc (depending on value of n
)
vector containing AIC differences from the minimum AIC(c)
vector containing likelihoods for each model, given the data. Represents the relative strength of evidence for each model.
Akaike weights.
Burnham, K.P. and D.R. Anderson. 2002. Model Selection and Multimodel Inference. A Practical Information-Theoretic Approach, 2nd edn. Springer, New York.
aic(c(-3139.076, -3136.784, -3140.879, -3152.432), c(8, 7, 8, 5))
aic(c(-3139.076, -3136.784, -3140.879, -3152.432), c(8, 7, 8, 5))
Produces the 1-alpha Upper and Lower Confidence Limits for the variance components in an ASReml-R model.
aiCI(asr.model, Dimnames = NULL, alpha = 0.05)
aiCI(asr.model, Dimnames = NULL, alpha = 0.05)
asr.model |
Object from a call to |
Dimnames |
A vector of characters if names are desired for the output.
If not specified, the default labels from the |
alpha |
A numeric value indicating the level of Type I error for constructing the Confidence Intervals. |
Variances from the inverse of the Average Information matrix of an ASReml
model are translated according to the varTrans
function and
used in constructing the 1-alpha Confidence Interval.
A matrix
is returned with a row for each variance component.
The three columns correspond to the Lower Confidence Limit, estimate from
the asreml
model, and Upper Confidence Limit for each variance
component.
The vector of Dimnames
should match the same order of variance
components specified in the model.
## Not run: library(asreml) ginvA <- ainverse(warcolak) ginvD <- makeD(warcolak[, 1:3])$listDinv attr(ginvD, "rowNames") <- as.character(warcolak[, 1]) attr(ginvD, "INVERSE") <- TRUE warcolak$IDD <- warcolak$ID warcolak.mod <- asreml(trait1 ~ sex, random = ~ vm(ID, ginvA) + vm(IDD, ginvD), data = warcolak) summary(warcolak.mod)$varcomp aiCI(warcolak.mod) ## End(Not run)
## Not run: library(asreml) ginvA <- ainverse(warcolak) ginvD <- makeD(warcolak[, 1:3])$listDinv attr(ginvD, "rowNames") <- as.character(warcolak[, 1]) attr(ginvD, "INVERSE") <- TRUE warcolak$IDD <- warcolak$ID warcolak.mod <- asreml(trait1 ~ sex, random = ~ vm(ID, ginvA) + vm(IDD, ginvD), data = warcolak) summary(warcolak.mod)$varcomp aiCI(warcolak.mod) ## End(Not run)
This function returns the sampling (co)variances of the variance components fitted in an mixed model solved using the Average Information algorithm
aiFun(model = NULL, AI.vec = NULL, inverse = TRUE, Dimnames = NULL)
aiFun(model = NULL, AI.vec = NULL, inverse = TRUE, Dimnames = NULL)
model |
A model object returned by a call to the |
AI.vec |
A numeric vector of the Average Information matrix. The order must be the row-wise lower triangle of the matrix (including the diagonal). |
inverse |
A logical indicating whether the elements of the inverse Average Information matrix are being provided. If FALSE, the Average Information matrix (and not its inverse) is being supplied. |
Dimnames |
A vector of characters if names are desired for the output
(co)variance matrix. If not specified, either the default labels from the
|
The inverse of the Average Information matrix provides the sampling
(co)variance of each (co)variance component in the random portion of the
mixed model. If a model from the ASReml-R function is supplied (model
is not NULL), this function extracts the inverse of the AI matrix from an
ASReml-R model and organizes it so that the sampling covariances between
random terms are the off-diagonals and the sampling variances of random
terms are located along the diagonal. The order of the variances along the
diagonal is the same as the order entered in the random section of the
asreml
function. This is also the same order as the rows of a call to
the summary function, summary(model)$varcomp
.
If model
is NULL then AI.vec
should contain the vector of
values from an Average Information matrix. The function will then
reconstruct this matrix, invert it, and supply the sampling (co) variances
for the random terms in the model as described above. Note, either
model
or AI.vec
must be supplied, but not both.
A matrix
of k x k dimensions is returned, if k is the number
of (co)variance components estimated in the model. Sampling covariances are
above and below the diagonal while variances are located along the
diagonal. If Dimnames
is specified, the row and column names are
assigned according the vector of names in this argument.
The vector of Dimnames
should match the same order of variance
components specified in the model.
Gilmour, A.R., Gogel, B.J., Cullis, B.R., & Thompson, R. 2009. ASReml User Guide Release 3.0. VSN International Ltd., Hemel Hempstead, UK.
## Not run: library(asreml) ginvA <- ainverse(warcolak) ginvD <- makeD(warcolak[, 1:3])$listDinv attr(ginvD, "rowNames") <- as.character(warcolak[, 1]) attr(ginvD, "INVERSE") <- TRUE warcolak$IDD <- warcolak$ID warcolak.mod <- asreml(trait1 ~ sex, random = ~ vm(ID, ginvA) + vm(IDD, ginvD), data = warcolak) summary(warcolak.mod)$varcomp aiFun(model = warcolak.mod, Dimnames = c("Va", "Vd", "Ve"), inverse = TRUE) ## End(Not run) output <- c(7.3075921, 7.0635161, 12.3423380, 1.9539486, 2.7586340, 0.6626111) aiFun(AI.vec = output, inverse = FALSE, Dimnames = c("Va", "Vd", "Ve"))
## Not run: library(asreml) ginvA <- ainverse(warcolak) ginvD <- makeD(warcolak[, 1:3])$listDinv attr(ginvD, "rowNames") <- as.character(warcolak[, 1]) attr(ginvD, "INVERSE") <- TRUE warcolak$IDD <- warcolak$ID warcolak.mod <- asreml(trait1 ~ sex, random = ~ vm(ID, ginvA) + vm(IDD, ginvD), data = warcolak) summary(warcolak.mod)$varcomp aiFun(model = warcolak.mod, Dimnames = c("Va", "Vd", "Ve"), inverse = TRUE) ## End(Not run) output <- c(7.3075921, 7.0635161, 12.3423380, 1.9539486, 2.7586340, 0.6626111) aiFun(AI.vec = output, inverse = FALSE, Dimnames = c("Va", "Vd", "Ve"))
Given a model object from asreml
and a range of estimates of the
parameter, the function will supply the likelihood ratio test statistic for
the comparison of the full model to one where the parameter of interest is
constrained.
constrainFun(parameter.val, full, fm2, comp, G, mit = 600)
constrainFun(parameter.val, full, fm2, comp, G, mit = 600)
parameter.val |
a value for which the log-Likelihood of a model is to be calculated |
full |
the full model |
fm2 |
starting values for the full model |
comp |
which variance component to constrain |
G |
logical, indicating if the component is part of the G structure |
mit |
numeric, indicating maximum number of iterations for the constrained asreml model |
A vector
of length 1 returning either a numeric
value
corresponding to the likelihood ratio test statistic or else the missing
value indicator NA
.
See also LRTest
This function simulates effects for random terms in a linear mixed model based on design matrices. The intended purpose is for simulating environmental effects from a pedigree.
drfx(G, fac, dataf, ...)
drfx(G, fac, dataf, ...)
G |
The variance-covariance matrix to model the effects after |
fac |
A character indicating the factor in |
dataf |
A dataframe with |
... |
Arguments to be passed to the internal use of |
If G = x, where 'x' is a single number, then 'x' should still be specified
as a 1-by-1 matrix (e.g., matrix(x)
). Note, the G-matrix should
never have a structure which produces a correlation exactly equal to 1 or
-1. Instead, covariances should be specified so as to create a correlation
of slightly less than (greater than) 1 (-1). For example: 0.9999 or
-0.9999.
fx |
A matrix with 'd' columns of random effects |
Z |
A
design matrix (of the format 'Matrix') from which the random effects in
|
# Create maternal common environment effects for 2 traits # with perfectly correlated effects Gmat <- matrix(c(10, 7.071, 7.071, 5), 2, 2) cfx <- drfx(G = Gmat, fac = "Dam", dataf = warcolak[1:200, ])
# Create maternal common environment effects for 2 traits # with perfectly correlated effects Gmat <- matrix(c(10, 7.071, 7.071, 5), 2, 2) cfx <- drfx(G = Gmat, fac = "Dam", dataf = warcolak[1:200, ])
Pedigree adapted from Fikse 2009 with genetic groups and fuzzy classification
F2009
F2009
A data.frame
with 16 observations on the following 11 variables:
a factor with levels indicating the unique individuals (including phantom parents) and genetic groups
a factor of observed maternal identities
a factor vector of observed paternal identities
a factor of maternal identities with genetic groups
inserted instead of NA
a factor of paternal identities with genetic groups
inserted instead of NA
a factor of maternal identities with phantom parents
inserted instead of NA
a factor of paternal identities with phantom parents
inserted instead of NA
a factor of genetic groups to which each phantom parent belongs
a numeric vector with probabilities of group g1
membership for each phantom parent
a numeric vector with probabilities of group g2
membership for each phantom parent
a numeric vector with probabilities of group g3
membership for each phantom parent
Fikse, F. 2009. Fuzzy classification of phantom parent groups in an animal model. Genetics Selection Evolution 41:42.
data(F2009) str(F2009)
data(F2009) str(F2009)
Pedigree, adapted from Table 1 in Fernando & Grossman (1990)
FG90
FG90
A data.frame
with 8 observations on the following 4 variables:
a factor with levels 1
2
3
4
5
6
7
8
a factor with levels 2
4
6
a factor with levels 1
3
5
a factor with levels 0
1
Fernando, R.L. & M. Grossman. 1990. Genetic evaluation with autosomal and X-chromosomal inheritance. Theoretical and Applied Genetics 80:75-80.
data(FG90) str(FG90)
data(FG90) str(FG90)
Given a pedigree, all pairs of individuals that are double first cousins are returned.
findDFC( pedigree, exact = FALSE, parallel = FALSE, ncores = getOption("mc.cores", 2L) )
findDFC( pedigree, exact = FALSE, parallel = FALSE, ncores = getOption("mc.cores", 2L) )
pedigree |
A pedigree with columns organized: ID, Dam, Sire |
exact |
A logical statement indicating if individuals who are exactly double first cousins are to be identified |
parallel |
A logical statement indicating if parallelization should be attempted. Note, only reliable for Mac and Linux operating systems. |
ncores |
Number of cpus to use, default is maximum available |
When exact = TRUE, only those individuals whose grandparents are completely unrelated will be identified as double first cousins. When exact = FALSE, as long as the parents of individuals i and j are two sets of siblings (i.e., either sires full brothers/dams full sisters or two pairs of opposite sex full sibs) then i and j will be considered double first cousins. In the event where the grandparents of i and j are also related, exact = FALSE will still consider i and j full sibs, even though genetically they will be more related than exact = TRUE double first cousins.
parallel
= TRUE should only be used on Linux or Mac OSes (i.e., not
Windows).
a list
:
gives the list of row numbers for all the pairs of individuals that are related as double first cousins.
gives the list of IDs, as characters, for all the pairs of individuals that are related as double first cousins.
If two individuals, i and j, are double first cousins, then i's siblings will also be double first cousins with j's siblings. Therefore, this is the total number of family pairs where offspring are related as double first cousins.
For every individual in a pedigree, the function identifies either the one female or male ancestor that is a founder (defined here as an individual identity in the pedigree for which both dam and sire information are missing).
founderLine(pedigree, sex)
founderLine(pedigree, sex)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire, Sex |
sex |
Character indicating the column name in pedigree identifying either the dam (for matriline) or sire (for patriline) identities |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
Individuals with a missing parent for the column identified by the 'sex' argument are assigned themselves as their founder line. Thus, the definition of the founder population from a given pedigree is simply all individuals with missing parents (and in this case just a single missing parent classifies an individual as a founder).
A vector of length equal to the number of rows in the pedigree
founderLine(FG90, sex = "dam") # matriline from this example pedigree #Create random pedigree, tracking the matrilines ## Then compare with founderLine() output K <- 8 # No. individuals per generation (KEEP and even number) gen <- 10 # No. of generations datArr <- array(NA, dim = c(K, 5, gen)) dimnames(datArr) <- list(NULL, c("id", "dam", "sire", "sex", "matriline"), NULL) # initialize the data array datArr[, "id", ] <- seq(K*gen) datArr[, "sex", ] <- c(1, 2) femRow <- which(datArr[, "sex", 1] == 2) # assume this is same each generation # (Why K should always be an even number) datArr[femRow, "matriline", 1] <- femRow # males have overlapping generations, BUT females DO NOT for(g in 2:gen){ datArr[, "sire", g] <- sample(c(datArr[femRow-1, "id", 1:(g-1)]), size = K, replace = TRUE) gdams <- sample(femRow, size = K, replace = TRUE) datArr[, c("dam", "matriline"), g] <- datArr[gdams, c("id", "matriline"), g-1] } ped <- data.frame(apply(datArr, MARGIN = 2, FUN = function(x){x})) nrow(ped) #Now run founderLine() and compare ped$line <- founderLine(ped, sex = "dam") stopifnot(identical(ped$matriline, ped$line), sum(ped$matriline-ped$line, na.rm = TRUE) == 0, range(ped$matriline-ped$line, na.rm = TRUE) == 0)
founderLine(FG90, sex = "dam") # matriline from this example pedigree #Create random pedigree, tracking the matrilines ## Then compare with founderLine() output K <- 8 # No. individuals per generation (KEEP and even number) gen <- 10 # No. of generations datArr <- array(NA, dim = c(K, 5, gen)) dimnames(datArr) <- list(NULL, c("id", "dam", "sire", "sex", "matriline"), NULL) # initialize the data array datArr[, "id", ] <- seq(K*gen) datArr[, "sex", ] <- c(1, 2) femRow <- which(datArr[, "sex", 1] == 2) # assume this is same each generation # (Why K should always be an even number) datArr[femRow, "matriline", 1] <- femRow # males have overlapping generations, BUT females DO NOT for(g in 2:gen){ datArr[, "sire", g] <- sample(c(datArr[femRow-1, "id", 1:(g-1)]), size = K, replace = TRUE) gdams <- sample(femRow, size = K, replace = TRUE) datArr[, c("dam", "matriline"), g] <- datArr[gdams, c("id", "matriline"), g-1] } ped <- data.frame(apply(datArr, MARGIN = 2, FUN = function(x){x})) nrow(ped) #Now run founderLine() and compare ped$line <- founderLine(ped, sex = "dam") stopifnot(identical(ped$matriline, ped$line), sum(ped$matriline-ped$line, na.rm = TRUE) == 0, range(ped$matriline-ped$line, na.rm = TRUE) == 0)
Given a pedigree, the function assigns the generation number to which each individual belongs.
genAssign(pedigree, ...) ## Default S3 method: genAssign(pedigree, ...) ## S3 method for class 'numPed' genAssign(pedigree, ...)
genAssign(pedigree, ...) ## Default S3 method: genAssign(pedigree, ...) ## S3 method for class 'numPed' genAssign(pedigree, ...)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
... |
Arguments to be passed to methods |
0 is the base population.
Migrants, or any individuals where both parents are unknown, are assigned to generation zero. If parents of an individual are from two different generations (e.g., dam = 0 and sire = 1), the individual is assigned to the generation following the greater of the two parents (e.g., 2 in this example).
A vector of values is returned. This vector is in the same order as the ID column of the pedigree.
Functions that perform and summarize gene dropping conducted on supplied pedigrees
geneDrop( pedigree, N, parallel = FALSE, ncores = getOption("mc.cores", 2L), ... ) ## Default S3 method: geneDrop( pedigree, N, parallel = FALSE, ncores = getOption("mc.cores", 2L), ... ) ## S3 method for class 'numPed' geneDrop( pedigree, N, parallel = FALSE, ncores = getOption("mc.cores", 2L), ... )
geneDrop( pedigree, N, parallel = FALSE, ncores = getOption("mc.cores", 2L), ... ) ## Default S3 method: geneDrop( pedigree, N, parallel = FALSE, ncores = getOption("mc.cores", 2L), ... ) ## S3 method for class 'numPed' geneDrop( pedigree, N, parallel = FALSE, ncores = getOption("mc.cores", 2L), ... )
pedigree |
A pedigree with columns organized: ID, Dam, Sire. |
N |
The number of times to iteratively trace alleles through the pedigree |
parallel |
A logical indicating whether or not to use parallel processing. Note, this may only be available for Mac and Linux operating systems. |
ncores |
The number of cpus to use when constructing the dominance relatedness matrix. Default is all available. |
... |
Other arguments that can be supplied to alter what summaries are reported. |
Missing parents (e.g., base population) should be denoted by either 'NA', '0' , or '*'.
parallel
= TRUE should only be used on Linux or Mac operating systems
(i.e., not Windows).
Founder allelic values (the alleles assigned to an individual's maternal, paternal, or both haplotypes when the maternal, paternal, or both parents are missing) are equivalent positive and negative integer values corresponding to the maternal and paternal haplotypes, respectively. For example, if the first individual in the pedigree has two unknown parents it will have the following two allelic values: 1=maternal haplotype and -1=paternal haplotype.
a list
:
Original identities in the pedigree
Simulated maternal haplotypes
Simulated paternal haplotypes
Pedigree in class numPed
for convenient
post-processing of haplotypes
geneDrop(Mrode2, N = 10)
geneDrop(Mrode2, N = 10)
Calculates the genomic contribution each genetic group makes to every individual in a pedigree
ggcontrib(pedigree, ggroups = NULL, fuzz = NULL, output = "matrix")
ggcontrib(pedigree, ggroups = NULL, fuzz = NULL, output = "matrix")
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
ggroups |
An optional vector of either: genetic group assignment for
every individual or just the unique genetic groups. |
fuzz |
A matrix containing the fuzzy classification of phantom parents
into genetic groups. |
output |
Format for the output |
The specification of genetic groups is done in one of two approaches, either using fuzzy classification or not.
Fuzzy classification enables phantom parents to be assigned to (potentially)
more than one genetic group (Fikse 2009). This method requires unique
phantom parent identities to be included in the pedigree for all observed
individuals with unknown parents. For 'p' phantom parents, 'p' identities
should be listed as individuals in the first 'p' rows of the pedigree and
these should be the only individuals in the pedigree with missing values in
their Dam and Sire columns (denoted by either 'NA', '0', or '*'). The matrix
supplied to the fuzz
argument should have 'p' rows (one for each
phantom parent) and 'r' columns, where 'r' is the number of genetic groups.
Non-fuzzy classification can handle the specification of genetic groups in three formats:
(1) similar to ASReml's format for specifying genetic groups, the first 'r'
rows of the pedigree (given to the pedigree
argument) contain the
label for each genetic group in the ID column and indicate missing values
for the Dam and Sire columns (denoted by either 'NA', '0', or '*'). No
object is supplied to the ggroups
argument. All individuals in the
pedigree must then have one of the 'r' genetic groups as parent(s) for each
unknown parent. Note, a warning message indicating In
numPed(pedigree): Dams appearing as Sires
is expected, since the dam and
sire can be the same for all individuals in the pedigree composing the base
population of a genetic group.
(2) similar to Jarrod Hadfield's rbv
function arguments in the
MCMCglmm
package, for a pedigree of dimension i x 3 (given to the
pedigree
argument), where 'i' is the total number of individuals in
the pedigree, a similar vector of length 'i' is given to the ggroups
argument. This vector lists either the genetic group to which each
individual's phantom parents belong or NA if the individual is not to be
considered part of one of the base populations (genetic groups). NOTE, this
approach does not allow phantom dams and phantom sires of a given individual
to be from different genetic groups.
(3) similar to DMU's format for specifying genetic groups. For a pedigree of
dimension i x 3 (given to the pedigree
argument), where 'i' is the
total number of individuals in the pedigree, instead of missing values for a
parent, one of the 'r' genetic groups is specified. A character vector of
length 'r' with unique genetic group labels is given to the ggroups
argument. Note, that all individuals with a missing parent should have a
genetic group substituted instead of the missing value symbol (i.e., either
'NA', '0', or '*').
Returns i x r genetic group contributions to all 'i' individuals
from each of the 'r' genetic groups. Default output is an object of class
matrix
(dense), but this format can be changed (e.g., "dgCMatrix"
for a sparse matrix).
Fikse, F. 2009. Fuzzy classification of phantom parent groups in an animal model. Genetics, Selection, Evolution. 41:42.
Quaas, R.L. 1988. Additive genetic model with groups and relationships. Journal of Dairy Science. 71:1338-1345.
# Use the pedigree from Quaas 1988 (See `data(Q1988)`) ########################## # Fuzzy classification ## Fuzzy classification with complete assignment to one group Q1988fuzz <- Q1988[-c(1:2), c("id", "phantomDam", "phantomSire")] Qfnull <- matrix(c(1,0,0,1,0, 0,1,1,0,1), nrow = 5, ncol = 2, dimnames = list(letters[1:5], c("g1", "g2"))) (Qfuzznull <- ggcontrib(Q1988fuzz, fuzz = Qfnull)) ## Should be identical to the non-fuzzy classification output # format (1) from above (Q <- ggcontrib(Q1988[-c(3:7), c(1,4,5)])) stopifnot(Qfuzznull == Q) ## Fuzzy classification with arbitrary assignments Qf <- matrix(c(1,0,0.5,0.5,0.5, 0,1,0.5,0.5,0.5), nrow = 5, ncol = 2, dimnames = list(letters[1:5], c("g1", "g2"))) (Qfuzz <- ggcontrib(Q1988fuzz, fuzz = Qf)) ## Using the pedigree and fuzzy classification in Fikse (2009) F2009fuzz <- data.frame(id = c(letters[1:7], LETTERS[1:6]), dam = c(rep(NA, 7), "a", "c", "e", "A", "C", "D"), sire = c(rep(NA, 7), "b", "d", "f", "B", "g", "E")) Ff <- matrix(c(1,0,1,0,0,0,0.2, 0,1,0,0.6,0,0.3,0.4, 0,0,0,0.4,1,0.7,0.4), nrow = 7, ncol = 3, dimnames = list(letters[1:7], paste0("g", 1:3))) # Actual Q matrix printed in Fikse (2009) Fikse2009Q <- matrix(c(0.5,0.5,0,0.5,0.1,0.3, 0.5,0.3,0.15,0.4,0.275,0.3375, 0,0.2,0.85,0.1,0.625,0.3625), nrow = 6, ncol = 3, dimnames = list(LETTERS[1:6], paste0("g", seq(3)))) Ffuzz <- ggcontrib(F2009fuzz, fuzz = Ff) (diffFfuzz <- Ffuzz - Fikse2009Q) # Encountering some rounding error stopifnot(length((drop0(diffFfuzz, tol = 1e-12))@x) == 0) ########################## # Non-fuzzy classification # format (1) from above Q1 <- Q1988[-c(3:7), c(1,4,5)] (gg1 <- ggcontrib(Q1, ggroups = NULL)) # note the warning message which is typical # format (2) from above Q2 <- Q1988[-c(1:7), 1:3] # arbitrarily assign individuals genetic groups for unknown parents ## Means gg2 is NOT comparable to gg1 or gg3! ggvec.in <- c("g1", "g2", "g1", NA) (gg2 <- ggcontrib(Q2, ggroups = ggvec.in)) # format (3) from above Q3 <- Q1988[-c(1:7), c(1,4,5)] gg3 <- ggcontrib(Q3, ggroups = c("g1", "g2")) stopifnot(gg1 == gg3)
# Use the pedigree from Quaas 1988 (See `data(Q1988)`) ########################## # Fuzzy classification ## Fuzzy classification with complete assignment to one group Q1988fuzz <- Q1988[-c(1:2), c("id", "phantomDam", "phantomSire")] Qfnull <- matrix(c(1,0,0,1,0, 0,1,1,0,1), nrow = 5, ncol = 2, dimnames = list(letters[1:5], c("g1", "g2"))) (Qfuzznull <- ggcontrib(Q1988fuzz, fuzz = Qfnull)) ## Should be identical to the non-fuzzy classification output # format (1) from above (Q <- ggcontrib(Q1988[-c(3:7), c(1,4,5)])) stopifnot(Qfuzznull == Q) ## Fuzzy classification with arbitrary assignments Qf <- matrix(c(1,0,0.5,0.5,0.5, 0,1,0.5,0.5,0.5), nrow = 5, ncol = 2, dimnames = list(letters[1:5], c("g1", "g2"))) (Qfuzz <- ggcontrib(Q1988fuzz, fuzz = Qf)) ## Using the pedigree and fuzzy classification in Fikse (2009) F2009fuzz <- data.frame(id = c(letters[1:7], LETTERS[1:6]), dam = c(rep(NA, 7), "a", "c", "e", "A", "C", "D"), sire = c(rep(NA, 7), "b", "d", "f", "B", "g", "E")) Ff <- matrix(c(1,0,1,0,0,0,0.2, 0,1,0,0.6,0,0.3,0.4, 0,0,0,0.4,1,0.7,0.4), nrow = 7, ncol = 3, dimnames = list(letters[1:7], paste0("g", 1:3))) # Actual Q matrix printed in Fikse (2009) Fikse2009Q <- matrix(c(0.5,0.5,0,0.5,0.1,0.3, 0.5,0.3,0.15,0.4,0.275,0.3375, 0,0.2,0.85,0.1,0.625,0.3625), nrow = 6, ncol = 3, dimnames = list(LETTERS[1:6], paste0("g", seq(3)))) Ffuzz <- ggcontrib(F2009fuzz, fuzz = Ff) (diffFfuzz <- Ffuzz - Fikse2009Q) # Encountering some rounding error stopifnot(length((drop0(diffFfuzz, tol = 1e-12))@x) == 0) ########################## # Non-fuzzy classification # format (1) from above Q1 <- Q1988[-c(3:7), c(1,4,5)] (gg1 <- ggcontrib(Q1, ggroups = NULL)) # note the warning message which is typical # format (2) from above Q2 <- Q1988[-c(1:7), 1:3] # arbitrarily assign individuals genetic groups for unknown parents ## Means gg2 is NOT comparable to gg1 or gg3! ggvec.in <- c("g1", "g2", "g1", NA) (gg2 <- ggcontrib(Q2, ggroups = ggvec.in)) # format (3) from above Q3 <- Q1988[-c(1:7), c(1,4,5)] gg3 <- ggcontrib(Q3, ggroups = c("g1", "g2")) stopifnot(gg1 == gg3)
The dataset was simulated using the simGG
function so that the
pedigree contains a base population comprised of founders and non-founder
immigrants. These data are then used in the main manuscript and tutorials
accompanying Wolak & Reid (2017).
ggTutorial
ggTutorial
A data.frame
with 6000 observations on the following 10
variables:
an integer vector specifying the 6000 unique individual identities
an integer vector specifying the unique dam for each individual
an integer vector specifying the unique sire for each individual
a numeric vector of the average autosomal total additive
genetic effects (u
) of each individual's parents
a numeric vector of the Mendelian sampling deviations
from parAvgU
autosomal total additive genetic effects that is
unique to each individual
a numeric vector of the total autosomal additive genetic
effects underlying p
a numeric vector of the residual (environmental) effects
underlying p
a numeric vector of phenotypic values
an integer vector with 0
for individuals born in the
focal population and 1
for individuals born outside of the
focal population, but immigrated
an integer vector specifying the generation in which each individual was born
The dataset was simulated as described in the ‘examples’ section
using the simGG
function. Full details of the function and
dataset can be found in Wolak & Reid (2017).
The data.frame
contains 6000 individuals across 15 generations. In
each generation, the carrying capacity is limited to 400 individuals, the
number of mating pairs limited to 200 pairs, and 40 immigrants per
generation arrive starting in the second generation.
The breeding values of the founders are drawn from a normal distribution with an expected mean of 0 and a variance of 1. The breeding values of all immigrants are drawn from a normal distribution with an expected mean of 3 and variance of 1. Consequently, the expected difference between mean breeding values in the founders and immigrants is 3. All individuals are assigned a residual (environmental) deviation that is drawn from a normal distribution with an expected mean of 0 and variance of 1.
Wolak, M.E. & J.M. 2017. Accounting for genetic differences among unknown parents in microevolutionary studies: how to include genetic groups in quantitative genetic animal models. Journal of Animal Ecology 86:7-20. doi:10.1111/1365-2656.12597
set.seed(102) #<-- seed value used originally library(nadiv) # create data using `simGG()` ggTutorial <- simGG(K = 400, pairs = 200, noff = 4, g = 15, nimm = 40, nimmG = seq(2, 14, 1), # nimmG default value VAf = 1, VAi = 1, VRf = 1, VRi = 1, # all default values mup = 20, muf = 0, mui = 3, murf = 0, muri = 0, # mup and mui non-default values d_bvf = 0, d_bvi = 0, d_rf = 0, d_ri = 0) # all default values
set.seed(102) #<-- seed value used originally library(nadiv) # create data using `simGG()` ggTutorial <- simGG(K = 400, pairs = 200, noff = 4, g = 15, nimm = 40, nimmG = seq(2, 14, 1), # nimmG default value VAf = 1, VAi = 1, VRf = 1, VRi = 1, # all default values mup = 20, muf = 0, mui = 3, murf = 0, muri = 0, # mup and mui non-default values d_bvf = 0, d_bvi = 0, d_rf = 0, d_ri = 0) # all default values
This function simulates effects for random terms in a linear mixed model based on relatedness matrices. The intended purpose is for simulating genetic and environmental effects from a pedigree.
grfx(n, G, incidence = NULL, output = "matrix", stdnorms = NULL, warn = TRUE)
grfx(n, G, incidence = NULL, output = "matrix", stdnorms = NULL, warn = TRUE)
n |
The number of individuals for which to simulate effects |
G |
The variance-covariance matrix to model the effects after |
incidence |
A matrix of the covariance structure of the 'n' individuals
or the Cholesky factorization of class |
output |
Format for the output |
stdnorms |
Standard normal deviates to use |
warn |
Should a warning message be produced when the function interprets
what to do based on the object class supplied to |
The total number of effects simulated will be n*d, where d is the number of
columns in the 'G' matrix. The standard normal deviates can be supplied
instead of generated within the function when stdnorms != NULL
. The
length of this vector must be n*nrow(G)
.
Supplied incidence matrices should be n-by-n symmetric matrices or cholesky
factorizations that resulted from a call to Matrix::Cholesky()
. For
simulated random effects using design matrices, see drfx
. If
no incidence matrix is supplied, incidence = NULL
, the Identity matrix
is used, which assumes that all 'n' random effects are independently and
identically distributed (default to Identity matrix).
See examples for how to make and use a Cholesky factorized incidence matrix,
for instance in a Monte Carlo simulation. Whether such an approach results
in performance of speed improvements within the Monte Carlo simulation, by
avoiding a Cholesky decomposition of a large matrix at each iteration, has
not been tested. Setting warn = FALSE
will suppress the warnings that
the function is assuming a Cholesky factorization is contained in the object
supplied to the incidence
argument. Currently, Cholesky factorizations
must inherit from the class “CHMfactor”.
If G = x, where 'x' is a single number, then 'x' should still be specified
as a 1-by-1 matrix (e.g., matrix(x)
). Note, the G-matrix should
never have a structure which produces a correlation exactly equal to 1 or
-1. Instead, covariances should be specified so as to create a correlation
of slightly less than (greater than) 1 (-1). For example: 0.9999 or
-0.9999.
The random effects coerced to be in the format specified by output. The default is a "matrix".
MCMCglmm
, drfx
,
makeA
, makeAA
, makeD
,
makeDomEpi
, makeDsim
, makeS
# Create additive genetic breeding values for 2 uncorrelated traits # with different additive genetic variances A <- makeA(warcolak[1:200, 1:3]) Gmat <- matrix(c(20, 0, 0, 10), 2, 2) breedingValues <- grfx(n = 200, G = Gmat, incidence = A) # Now with a user supplied set of standard normal deviates snorms <- rnorm(nrow(warcolak[1:200,]) * ncol(Gmat)) breedingValues2a <- grfx(n = 200, G = Gmat, incidence = A, stdnorms = snorms) breedingValues2b <- grfx(n = 200, G = Gmat, incidence = A, stdnorms = snorms) identical(breedingValues2a, breedingValues2b) #<-- TRUE var(breedingValues2a) var(breedingValues2b) # User supplied Cholesky factorization of the incidence matrix from above cA <- Cholesky(A, LDL = FALSE, super = FALSE) inherits(cA, "CHMfactor") #<-- TRUE breedingValues3 <- grfx(n = 200, G = Gmat, incidence = cA, stdnorms = snorms) all.equal(breedingValues2a, breedingValues3) #<-- TRUE
# Create additive genetic breeding values for 2 uncorrelated traits # with different additive genetic variances A <- makeA(warcolak[1:200, 1:3]) Gmat <- matrix(c(20, 0, 0, 10), 2, 2) breedingValues <- grfx(n = 200, G = Gmat, incidence = A) # Now with a user supplied set of standard normal deviates snorms <- rnorm(nrow(warcolak[1:200,]) * ncol(Gmat)) breedingValues2a <- grfx(n = 200, G = Gmat, incidence = A, stdnorms = snorms) breedingValues2b <- grfx(n = 200, G = Gmat, incidence = A, stdnorms = snorms) identical(breedingValues2a, breedingValues2b) #<-- TRUE var(breedingValues2a) var(breedingValues2b) # User supplied Cholesky factorization of the incidence matrix from above cA <- Cholesky(A, LDL = FALSE, super = FALSE) inherits(cA, "CHMfactor") #<-- TRUE breedingValues3 <- grfx(n = 200, G = Gmat, incidence = cA, stdnorms = snorms) all.equal(breedingValues2a, breedingValues3) #<-- TRUE
Test the null hypothesis that the two models fit the data equally well.
LRTest(full, reduced, df = 1, boundaryCorrection = FALSE)
LRTest(full, reduced, df = 1, boundaryCorrection = FALSE)
full |
A numeric variable indicating the log-likelihood of the full model |
reduced |
A numeric variable indicating the log-likelihood of the reduced model |
df |
The number of degrees of freedom to use, representing the difference between the full and reduced model in the number of parameters estimated |
boundaryCorrection |
A logical argument indicating whether a boundary correction under one degree of freedom should be included. If the parameter that is dropped from the reduced model is estimated at the boundary of its parameter space in the full model, the boundary correction is often required. See Details for more. |
Boundary correction should be applied if the parameter that is dropped from
the full model was on the boundary of its parameter space. In this instance,
the distribution of the log-likelihood ratio test statistic is approximated
by a mix of chi-square distributions (Self and Liang 1987). A TRUE
value will implement the boundary correction for a one degree of freedom
test. This is equivalent to halving the p-value from a test using a
chi-square distribution with one degree of freedom (Dominicus et al. 2006).
Currently, the test assumes that both log-likelihoods are negative or both are positive and will stop if they are of opposite sign. The interpretation is that the model with a greater negative log-likelihood (closer to zero) or greater positive log-likelihood provides a better fit to the data.
a list
:
a numeric log-likelihood ratio test statistic
a numeric p-value given the lambda
tested against a
chi-squared distribution with the number of degrees of freedom as
specified. May have had a boundary correction applied.
a logical indicating if the p-value was derived
using a boundary correction. See Details
Self, S. G., and K. Y. Liang. 1987. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association 82:605-610.
Dominicus, A., A. Skrondal, H. K. Gjessing, N. L. Pedersen, and J. Palmgren. 2006. Likelihood ratio tests in behavioral genetics: problems and solutions. Behavior Genetics 36:331-340.
# No boundary correction (noBC <- LRTest(full = -2254.148, reduced = -2258.210, df = 1, boundaryCorrection = FALSE)) # No boundary correction (withBC <- LRTest(full = -2254.148, reduced = -2258.210, df = 1, boundaryCorrection = TRUE)) stopifnot(noBC$Pval == 2*withBC$Pval)
# No boundary correction (noBC <- LRTest(full = -2254.148, reduced = -2258.210, df = 1, boundaryCorrection = FALSE)) # No boundary correction (withBC <- LRTest(full = -2254.148, reduced = -2258.210, df = 1, boundaryCorrection = TRUE)) stopifnot(noBC$Pval == 2*withBC$Pval)
This returns the additive relationship matrix in sparse matrix format.
makeA(pedigree)
makeA(pedigree)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
Used as a support function to makeD
.
See function makeAinv
for directly obtaining the inverse of
the additive genetic relationship matrix.
Returns A, or the numerator relationship matrix, in sparse matrix form.
makeA(Mrode2)
makeA(Mrode2)
Given a pedigree, the matrix of additive by additive genetic relatedness (AA) among all individuals in the pedigree is returned.
makeAA(pedigree)
makeAA(pedigree)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
The function first estimates the A matrix using makeA
, then it
calculates the Hadamard (element-wise) product of the A matrix with itself
(A # A).
a list
:
the AA matrix in sparse matrix form
the log determinant of the AA matrix
the inverse of the AA matrix in sparse matrix form
the three column form of the non-zero elements for the inverse of the AA matrix
makeAA(Mrode2)
makeAA(Mrode2)
This returns the inverse of the numerator relationship matrix (inverse additive genetic relatedness matrix). It can also be used to obtain coefficients of inbreeding for the pedigreed population.
makeAinv( pedigree, f = NULL, ggroups = NULL, fuzz = NULL, gOnTop = FALSE, det = TRUE, ... ) ## Default S3 method: makeAinv( pedigree, f = NULL, ggroups = NULL, fuzz = NULL, gOnTop = FALSE, det = TRUE, ... ) ## S3 method for class 'fuzzy' makeAinv( pedigree, f = NULL, ggroups = NULL, fuzz, gOnTop = FALSE, det = TRUE, ... ) makeGGAinv(pedigree, f = NULL, ggroups = NULL, det = TRUE, ...)
makeAinv( pedigree, f = NULL, ggroups = NULL, fuzz = NULL, gOnTop = FALSE, det = TRUE, ... ) ## Default S3 method: makeAinv( pedigree, f = NULL, ggroups = NULL, fuzz = NULL, gOnTop = FALSE, det = TRUE, ... ) ## S3 method for class 'fuzzy' makeAinv( pedigree, f = NULL, ggroups = NULL, fuzz, gOnTop = FALSE, det = TRUE, ... ) makeGGAinv(pedigree, f = NULL, ggroups = NULL, det = TRUE, ...)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
f |
A numeric indicating the level of inbreeding. See Details |
ggroups |
Either a vector with the unique name of each genetic group, or a numeric indicating the number of unique genetic groups. See Details for different ways to specify. Note, if NULL then the regular A-inverse will be constructed. Also, must be NULL if fuzz is non-NULL. |
fuzz |
A matrix containing the fuzzy classification of phantom parents into genetic groups. See Details. |
gOnTop |
A logical indicating if (when including genetic groups) the
A-inverse should be constructed with the ‘g’ genetic groups located
in the first ‘g’ rows and columns if |
det |
Logical, indicating if the (log) determinant of the A matrix should be returned |
... |
Arguments to be passed to methods |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
The functions implement an adaptation of the Meuwissen and Luo (1992)
algorithm (particularly, following the description of the algorithm in
Mrode 2005) with some code borrowed from the inverseA
function by
Jarrod Hadfield in the MCMCglmm
package. Further, providing a
non-NULL argument to ggroups
incorporates the Quaas (1988) algorithm
for directly obtaining the augmented A-inverse matrix for genetic groups
into Meuwissen and Luo's (1992) algorithm, thereby, considering inbreeding
during the construction of the A-inverse. Further calculations needed for
the algorithm to incorporate inbreeding and genetic groups follow the theory
presented in VanRaden (1992). Alternatively, group-specific inverse
relatedness matrices can be formed with makeGGAinv
, see below.
At the moment, providing the inbreeding level of individuals or the base population has not been implemented. However, this argument is a placeholder for now.
Genetic groups can be incorporated into a single A-inverse by providing a value
to the ggroups
argument in makeAinv
. The value supplied to
ggroups
can either be (1) a single integer indicating the number of
unique genetic groups or (2) a character vector containing the name for each
genetic group. These are referred to as pedigree types "A" and "D",
respectively, and further details follow below.
(Type="A") the pedigree contains unique IDs for the 'g' genetic groups in the
first 'g' lines of the pedigree. The dam and sire of the genetic group rows
should contain missing values (e.g., NA, "0", or "*"). All individuals in the
pedigree should then have one of the ‘g’ genetic groups instead of an unknown
parent.
(Type="D") the pedigree contains only individuals in the ID column (no
genetic groups have an ID) and there should be no missing values for any dams
or sires. Instead, individuals for whom the dam and/or sire is unknown should
have one of the genetic groups identified in the vector supplied to
ggroups
as the dam or sire.
‘Fuzzy classification’ of genetic groups (Fikse 2009) can be
implemented if a ‘matrix’ (of class matrix
or Matrix
)
is supplied to the fuzzy
argument. The fuzzy classification matrix
must have row names matching all of the phantom parents in the pedigree and
the column names must be present and specify the genetic groups. The fuzzy
classification matrix essentially contains probability of group membership
for each phantom parent. Therefore, each row should sum to 1. The pedigree
must have an identity in a unique row for every phantom parent and cannot
have genetic groups as either identities (in the first column) or as dam or
sire (second and third columns). Further, if fuzzy classification is
desired, the function must specify ggroups = NULL
.
When genetic groups (including the case of fuzzy classification of genetic
groups) are included in the A-inverse matrix, the argument to gOnTop
specifies if the genetic group elements in the A-inverse should occupy the
top-left (gOnTop = TRUE
) or bottom-right (gOnTop = FALSE
) of
the matrix. Depending on how the software implementing an animal model
solves the mixed model equations, the equations for the genetic groups (and
thus the elements in the augmented A-inverse) should be the first or last
set of equations.
a list
:
the inverse of the additive genetic relationship matrix in sparse matrix form
the three column list of the non-zero elements for the
inverse of the additive genetic relationship matrix with attributes
rowNames
and geneticGroups
. attr(*, "rowNames")
links the integer for rows/columns to the ID column from the pedigree.
attr(*, "geneticGroups")
is a two element vector with the first
integer indicating how many genetic groups are included in the
pedigree. This last attribute is necessary for some software programs
to correctly specify the residual degrees of freedom when calculating
the log-likelihood in a model that implicitly fits fixed genetic group
effects.
the individual coefficients of inbreeding for each individual
in the pedigree (matches the order of the first/ID column of the
pedigree). If the pedigree contains ‘g’ genetic groups in the
first ‘g’ rows, then the first ‘g’ elements of f
are assigned 0. If the pedigree contains ‘p’ phantom parents in
the first ‘p’ rows, then the first ‘p’ elements of
f
are assigned 0.
the log determinant of the A matrix
the (non-zero) elements of the diagonal D matrix of the A=TDT'
decomposition. Contains the variance of Mendelian sampling. Matches
the order of the first/ID column of the pedigree. If the pedigree
contains ‘g’ genetic groups in the first ‘g’ rows, then
the first ‘g’ elements of f
are assigned 0. If the
pedigree contains ‘p’ phantom parents in the first ‘p’
rows, then the first ‘p’ elements of f
are assigned 0.
Fikse, F. 2009. Fuzzy classification of phantom parent groups in an animal model. Genetics Selection Evolution 41:42.
Meuwissen, T.H.E & Luo, Z. 1992. Computing inbreeding coefficients in large populations. Genetics, Selection, Evolution. 24:305-313.
Mrode, R.A. 2005. Linear Models for the Prediction of Animal Breeding Values, 2nd ed. Cambridge, MA: CABI Publishing.
Quaas, R.L. 1988. Additive genetic model with groups and relationships. Journal of Dairy Science. 71:1338-1345.
VanRaden, P.M. 1992. Accounting for inbreeding and crossbreeding in genetic evaluation of large populations. Journal of Dairy Science. 75:3136-3144.
## Without genetic groups ## makeAinv(Mrode2) ## With genetic groups ## ## Type A typeAped <- Q1988[-c(3:7), c("id", "damGG", "sireGG")] AstarA <- makeAinv(typeAped, ggroups = 2, gOnTop = FALSE)$Ainv ## Type D typeDped <- Q1988[-c(1:7), c("id", "damGG", "sireGG")] AstarD <- makeAinv(typeDped, ggroups = c("g1", "g2"), gOnTop = FALSE)$Ainv stopifnot(identical(AstarA, AstarD)) # Show that the augmented A-inverse with genetic groups # contains the normal A-inverse (i.e., without genetic groups) ## Augmented A-inverse with genetic groups ggAinv <- makeAinv(Mrode3[-c(1,2), c("calf", "damGG", "sireGG")], ggroups = c("g1", "g2"), gOnTop = FALSE)$Ainv noggAinv <- makeAinv(Mrode3[-c(1,2), c("calf", "dam", "sire")], ggroups = NULL)$Ainv # First 8 rows & columns of ggAinv are same as A-inverse without ## genetic groups ggAinv[1:8, 1:8] noggAinv stopifnot(all.equal(ggAinv[1:8, 1:8], structure(noggAinv, geneticGroups = NULL))) ## With fuzzy classification of genetic groups ## ## example in Fikse (2009) Fped <- F2009[-c(1:3), c("id", "phantomDam", "phantomSire")] Fped$id <- factor(Fped$id, levels = as.character(unique(Fped$id))) Ffuzz <- as.matrix(F2009[4:10, c("g1", "g2", "g3")]) dimnames(Ffuzz)[[1]] <- as.character(F2009[4:10, 1]) AstarF <- makeAinv(Fped, fuzz = Ffuzz, gOnTop = FALSE)$Ainv ## Show that A-inverse with fuzzy classification of genetic groups ### can be the same as genetic group A-inverse without fuzzy classification ### Create a 'null' fuzzy classification matrix for Q1988 pedigree QfuzzNull <- matrix(c(1,0,0,1,0, 0,1,1,0,1), nrow = 5, ncol = 2, dimnames = list(letters[1:5], c("g1", "g2"))) typeFped <- Q1988[-c(1:2), c("id", "phantomDam", "phantomSire")] AstarNullFuzzy <- makeAinv(typeFped, fuzz = QfuzzNull, gOnTop = FALSE)$Ainv # Same as above using either pedigree type 'A' or 'D' stopifnot(identical(AstarNullFuzzy, AstarA), identical(AstarNullFuzzy, AstarD)) ## With genetic groups ## ## Type A typeAped <- Q1988[-c(3:7), c("id", "damGG", "sireGG")] (AinvOutA <- makeGGAinv(typeAped, ggroups = 2)$Ainv) ## Type D typeDped <- Q1988[-c(1:7), c("id", "damGG", "sireGG")] (AinvOutD <- makeGGAinv(typeDped, ggroups = c("g1", "g2"))$Ainv) stopifnot(identical(AinvOutA, AinvOutD))
## Without genetic groups ## makeAinv(Mrode2) ## With genetic groups ## ## Type A typeAped <- Q1988[-c(3:7), c("id", "damGG", "sireGG")] AstarA <- makeAinv(typeAped, ggroups = 2, gOnTop = FALSE)$Ainv ## Type D typeDped <- Q1988[-c(1:7), c("id", "damGG", "sireGG")] AstarD <- makeAinv(typeDped, ggroups = c("g1", "g2"), gOnTop = FALSE)$Ainv stopifnot(identical(AstarA, AstarD)) # Show that the augmented A-inverse with genetic groups # contains the normal A-inverse (i.e., without genetic groups) ## Augmented A-inverse with genetic groups ggAinv <- makeAinv(Mrode3[-c(1,2), c("calf", "damGG", "sireGG")], ggroups = c("g1", "g2"), gOnTop = FALSE)$Ainv noggAinv <- makeAinv(Mrode3[-c(1,2), c("calf", "dam", "sire")], ggroups = NULL)$Ainv # First 8 rows & columns of ggAinv are same as A-inverse without ## genetic groups ggAinv[1:8, 1:8] noggAinv stopifnot(all.equal(ggAinv[1:8, 1:8], structure(noggAinv, geneticGroups = NULL))) ## With fuzzy classification of genetic groups ## ## example in Fikse (2009) Fped <- F2009[-c(1:3), c("id", "phantomDam", "phantomSire")] Fped$id <- factor(Fped$id, levels = as.character(unique(Fped$id))) Ffuzz <- as.matrix(F2009[4:10, c("g1", "g2", "g3")]) dimnames(Ffuzz)[[1]] <- as.character(F2009[4:10, 1]) AstarF <- makeAinv(Fped, fuzz = Ffuzz, gOnTop = FALSE)$Ainv ## Show that A-inverse with fuzzy classification of genetic groups ### can be the same as genetic group A-inverse without fuzzy classification ### Create a 'null' fuzzy classification matrix for Q1988 pedigree QfuzzNull <- matrix(c(1,0,0,1,0, 0,1,1,0,1), nrow = 5, ncol = 2, dimnames = list(letters[1:5], c("g1", "g2"))) typeFped <- Q1988[-c(1:2), c("id", "phantomDam", "phantomSire")] AstarNullFuzzy <- makeAinv(typeFped, fuzz = QfuzzNull, gOnTop = FALSE)$Ainv # Same as above using either pedigree type 'A' or 'D' stopifnot(identical(AstarNullFuzzy, AstarA), identical(AstarNullFuzzy, AstarD)) ## With genetic groups ## ## Type A typeAped <- Q1988[-c(3:7), c("id", "damGG", "sireGG")] (AinvOutA <- makeGGAinv(typeAped, ggroups = 2)$Ainv) ## Type D typeDped <- Q1988[-c(1:7), c("id", "damGG", "sireGG")] (AinvOutD <- makeGGAinv(typeDped, ggroups = c("g1", "g2"))$Ainv) stopifnot(identical(AinvOutA, AinvOutD))
This returns the inverse of the additive genetic relationship matrix with
genetic groups (A*). The matrix is set up through matrix multiplication of
two sub-matrices instead of directly (as makeAinv
does).
makeAstarMult(pedigree, ggroups, fuzz = NULL, gOnTop = FALSE)
makeAstarMult(pedigree, ggroups, fuzz = NULL, gOnTop = FALSE)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
ggroups |
Either a vector with the unique name of each genetic group, or a numeric indicating the number of unique genetic groups. See Details for different ways to specify. Note, cannot be NULL. |
fuzz |
A matrix containing the fuzzy classification of individuals into genetic groups. |
gOnTop |
A logical indicating if the A-inverse should be constructed with the ‘g’ genetic groups located in the first ‘g’ rows and columns if TRUE, else the ‘g’ genetic groups are located in the last ‘g’ rows and columns of A-inverse. |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
The function implements the matrix multiplication, using sub-matrices
Q
and A^-1
, as detailed in Quaas (1988, pp. 1342-1343).
Genetic groups can be incorporated into the A-inverse by providing a value
to the ggroups
argument. The value supplied to ggroups
can
either be (1) a single integer indicating the number of unique genetic
groups or (2) a character vector containing the name for each genetic group.
These are referred to as pedigree types "A" and "D", respectively, and
further details follow below. (Type="A") the pedigree contains unique IDs
for the 'g' genetic groups in the first 'g' lines of the pedigree. The dam
and sire of the genetic group rows should contain missing values (e.g., NA,
"0", or "*"). All individuals in the pedigree should then have one of the
'g' genetic groups instead of an unknown parent. (Type="D") the pedigree
contains only individuals in the ID column (no genetic groups have an ID)
and there should be no missing values for any dams or sires. Instead,
individuals for whom the dam and/or sire is unknown should have one of the
genetic groups identified in the vector supplied to ggroups
as the
dam or sire.
Fuzzy classification of genetic groups is implemented when fuzz
is
non-NULL.
The argument to gOnTop
specifies if the elements in the A-inverse
should come at the beginning (gOnTop = TRUE
) or end (gOnTop =
FALSE
) of the matrix. Depending on how the software implementing an animal
model solves the mixed model equations, the equations for the genetic groups
(and thus the elements in the augmented A-inverse) should be the first or
last set of equations.
See function makeAinv
for directly obtaining the inverse of
the additive genetic relationship matrix with genetic groups.
Returns A*, or the inverse of the numerator relationship with groups, in sparse matrix form.
Quaas, R.L. 1988. Additive genetic model with groups and relationships. Journal of Dairy Science. 71:1338-1345.
# Using the Q1988 dataset in nadiv ## assign a null fuzzy classification matrix QfuzzNull <- matrix(c(1,0,0,1,0, 0,1,1,0,1), nrow = 5, ncol = 2, dimnames = list(letters[1:5], c("g1", "g2"))) # Type A ## no fuzzy classification Astar_A <- makeAstarMult(Q1988[-c(3:7), c(1,4,5)], ggroups = 2) ## with fuzzy classification Astar_Afuzzy <- makeAstarMult(Q1988[, c(1, 6, 7)], ggroups = 2, fuzz = QfuzzNull) # Type D ## no fuzzy classification Astar_D <- makeAstarMult(Q1988[-c(1:7), c(1, 4, 5)], ggroups = c("g1", "g2")) ## with fuzzy classification Astar_Dfuzzy <- makeAstarMult(Q1988[-c(1:2), c(1, 6, 7)], ggroups = c("g1", "g2"), fuzz = QfuzzNull) # Obtain the matrix directly ## no fuzzy classification Astar_direct <- makeAinv(Q1988[-c(3:7), c(1,4,5)], ggroups = 2)$Ainv stopifnot(length(drop0(round(Astar_direct - (Astar_A - Astar_Afuzzy) - (Astar_D - Astar_Dfuzzy) - Astar_direct, 10))@x) == 0) ## with fuzzy classification Astar_directF <- makeAinv(Q1988[-c(1:2), c(1, 6, 7)], fuzz = QfuzzNull)$Ainv stopifnot(length(drop0(round(Astar_directF - (Astar_A - Astar_Afuzzy) - (Astar_D - Astar_Dfuzzy) - Astar_direct, 10))@x) == 0)
# Using the Q1988 dataset in nadiv ## assign a null fuzzy classification matrix QfuzzNull <- matrix(c(1,0,0,1,0, 0,1,1,0,1), nrow = 5, ncol = 2, dimnames = list(letters[1:5], c("g1", "g2"))) # Type A ## no fuzzy classification Astar_A <- makeAstarMult(Q1988[-c(3:7), c(1,4,5)], ggroups = 2) ## with fuzzy classification Astar_Afuzzy <- makeAstarMult(Q1988[, c(1, 6, 7)], ggroups = 2, fuzz = QfuzzNull) # Type D ## no fuzzy classification Astar_D <- makeAstarMult(Q1988[-c(1:7), c(1, 4, 5)], ggroups = c("g1", "g2")) ## with fuzzy classification Astar_Dfuzzy <- makeAstarMult(Q1988[-c(1:2), c(1, 6, 7)], ggroups = c("g1", "g2"), fuzz = QfuzzNull) # Obtain the matrix directly ## no fuzzy classification Astar_direct <- makeAinv(Q1988[-c(3:7), c(1,4,5)], ggroups = 2)$Ainv stopifnot(length(drop0(round(Astar_direct - (Astar_A - Astar_Afuzzy) - (Astar_D - Astar_Dfuzzy) - Astar_direct, 10))@x) == 0) ## with fuzzy classification Astar_directF <- makeAinv(Q1988[-c(1:2), c(1, 6, 7)], fuzz = QfuzzNull)$Ainv stopifnot(length(drop0(round(Astar_directF - (Astar_A - Astar_Afuzzy) - (Astar_D - Astar_Dfuzzy) - Astar_direct, 10))@x) == 0)
Given a pedigree, the matrix of coefficients of fraternity are returned - the D matrix for autosomes and the Sd matrix for sex chromosomes. Note, inbreeding is not directly incorporated into the calculation of the coefficients (see Details). Functions will return the inverses of the D and Sd matrices by default, otherwise this operation can be skipped if desired.
makeD( pedigree, parallel = FALSE, ncores = getOption("mc.cores", 2L), invertD = TRUE, returnA = FALSE, det = TRUE, verbose = TRUE ) makeSd( pedigree, heterogametic, DosageComp = c(NULL, "ngdc", "hori", "hedo", "hoha", "hopi"), parallel = FALSE, ncores = getOption("mc.cores", 2L), invertSd = TRUE, returnS = FALSE, det = TRUE, verbose = TRUE )
makeD( pedigree, parallel = FALSE, ncores = getOption("mc.cores", 2L), invertD = TRUE, returnA = FALSE, det = TRUE, verbose = TRUE ) makeSd( pedigree, heterogametic, DosageComp = c(NULL, "ngdc", "hori", "hedo", "hoha", "hopi"), parallel = FALSE, ncores = getOption("mc.cores", 2L), invertSd = TRUE, returnS = FALSE, det = TRUE, verbose = TRUE )
pedigree |
A pedigree with columns organized: ID, Dam, Sire. For use
with |
parallel |
Logical, indicating whether computation should be run on multiple processors at once. See details for considerations. |
ncores |
Number of cores to use when parallel = TRUE. Default is maximum available. Otherwise, set with an integer. See details for considerations. |
invertD , invertSd
|
A logical indicating whether or not to invert the D or S matrix |
returnA , returnS
|
Logical, indicating if the numerator relationship matrix (A or S) should be stored and returned. |
det |
Logical, indicating if the determinant of the D or Sd matrix should be returned. |
verbose |
Logical, indicating if progress messages should be displayed. |
heterogametic |
Character indicating the label corresponding to the heterogametic sex used in the "Sex" column of the pedigree |
DosageComp |
A character indicating which model of dosage compensation.
If |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
There exists no convenient method of obtaining the inverse of the dominance
genetic relatedness matrix (or the D matrix itself) directly from a pedigree
(such as for the inverse of A, i.e., Quaas (1995)). Therefore, these
functions computes the coefficient of fraternity (Lynch and Walsh, 1998) for
every individual in the pedigree with a non-zero additive genetic
relatedness in the case of autosomes (makeD
) or for the homogametic
sex only in the case of sex chromosomes (makeSd
, because the
heterogametic sex has only one copy of the shared sex chromosome and
therefore cannot express dominance allelic interactions).
The coefficients of fraternity are only approximations that assume no inbreeding. The algorithm used here, however, incorporates inbreeding into the calculation of coefficients of coancestry (using 'makeA()') that are used to calculate coefficients of fraternity. Similarly, the diagonals of the D and Sd matrices are corrected for inbreeding. Meaning, the diagonals of D and Sd are (1-f) so that the overall dominance genetic variance is equal to (1-f)V_D, where f is the coefficient of inbreeding and V_D is dominance genetic variance. This is interpreted as the amount of dominance genetic variance that would be expected if the allele frequencies in the inbred population were representative of a non-inbred, randomly mating population (Shaw et al. 1998; Wolak and Keller 2014). Note, the construction of the D matrix is more computationally demanding (in computing time and space requirements) than is the construction of A. This is possibly also the case for construction of Sd in comparison to the S matrix.
To overcome the computational difficulties, this function can enable
parallel processing (see package parallel
included in the R
distribution) to speed up the execution. Note this is not be possible on
Windows (See parallel
documentation for further information),
therefore parallel
= TRUE should only be used on Linux or Mac
operating systems (i.e., not Windows). The default is to use the maximum
number of cpus available to the machine, but this can be restricted by
indicating the number desired in the argument ncores
. Setting up the
multi-processing takes some overhead, so no real advantage is gained for
small pedigrees. Also, since all processes are sharing a fixed amount of
RAM, very large pedigrees using many processes in parallel may not be
feasible due to RAM restrictions (i.e., if each process needs "n" amount of
RAM to run, then ncores
should be set to = total RAM/n). Otherwise
the machine can become overworked.
Note, for very large pedigrees returnA
or returnS
should be
set to FALSE to avoid drastically increasing the memory requirements while
making D or Sd, respectively. When this occurs, 'NULL' is returned for the
element of 'A' in the output of makeD
or for the element of 'S' in
the output of makeSd
.
a list
:
the A or S matrix in sparse matrix form
the D or Sd matrix in sparse matrix form
the log determinant of the D or Sd matrix
the inverse of the D or inverse of the Sd matrix in sparse matrix form
the three column form of the non-zero elements for the inverse of the D or the inverse of the Sd matrix
Quaas, R.L. 1995. Fx algorithms. An unpublished note.
Lynch M., & Walsh, B. 1998. Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, Massachusetts.
Shaw, R.G., D.L. Byers, and F.H. Shaw. 1998. Genetic components of variation in Nemophila menziesii undergoing inbreeding: Morphology and flowering time. Genetics. 150:1649-1661.
Wolak, M.E. and L.F. Keller. 2014. Dominance genetic variance and inbreeding in natural populations. In Quantitative Genetics in the Wild, A. Charmantier, L.E.B. Kruuk, and D. Garant eds. Oxford University Press, pp. 104-127.
DinvMat <- makeD(Mrode9, parallel = FALSE)$Dinv SdinvMat <- makeSd(FG90, heterogametic = "0", parallel = FALSE)$Sdinv # Check to make sure getting correct elements ## `simPedDFC()` for pedigree with 4 unique sex-linked dominance relatedness values uSdx <- unique(makeSd(simPedDFC(3), heterogametic = "M", returnS = FALSE)$Sd@x) stopifnot(all(uSdx %in% c(1, 0.5, 3/16, 1/16))) #<-- must match one of these 4
DinvMat <- makeD(Mrode9, parallel = FALSE)$Dinv SdinvMat <- makeSd(FG90, heterogametic = "0", parallel = FALSE)$Sdinv # Check to make sure getting correct elements ## `simPedDFC()` for pedigree with 4 unique sex-linked dominance relatedness values uSdx <- unique(makeSd(simPedDFC(3), heterogametic = "M", returnS = FALSE)$Sd@x) stopifnot(all(uSdx %in% c(1, 0.5, 3/16, 1/16))) #<-- must match one of these 4
Given a pedigree, the matrix of additive by dominance (AD) genetic relatedness, dominance by dominance (DD) genetic relatedness, or both are returned.
makeDomEpi( pedigree, output = c("AD", "DD", "both"), parallel = FALSE, invertD = FALSE, det = TRUE )
makeDomEpi( pedigree, output = c("AD", "DD", "both"), parallel = FALSE, invertD = FALSE, det = TRUE )
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
output |
Character(s) denoting which matrix and its inverse is to be constructed. |
parallel |
A logical indicating whether or not to use parallel processing. Note, this may only be available on Mac and Linux operating systems. |
invertD |
A logical indicating whether or not to invert the D matrix |
det |
A logical indicating whether or not to return the determinants for the epistatic relationship matrices |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
Because of the computational demands of constructing the D matrix (see
makeD
), this function allows for the inverses that are derived
from the D matrix (i.e., D-inverse, AD-inverse, and DD-inverse)to be
constructed at the same time. This way, the D matrix will only have to be
constructed once for use in the three separate genetic relatedness inverse
matrices that depend upon it. However, using the output
and
invertD
options in different combinations will ensure that only the
desired matrix inverses are constructed.
parallel
= TRUE should only be used on Linux or Mac OSes (i.e., not
Windows).
Both the AD and DD matrix are computed from the Hadamard product of the
respective matrices (see also, makeAA
).
All of the following will be returned. However, the values of the
output
and invertD
options passed to the function will
determine which of the following are not NULL objects within the list:
the D matrix in sparse matrix form
the log determinant of the D matrix
the AD matrix in sparse matrix form
the log determinant of the AD matrix
the DD matrix in sparse matrix form
the log determinant of the DD matrix
the inverse of the D matrix in sparse matrix form
the inverse of the AD matrix in sparse matrix form
the inverse of the DD matrix in sparse matrix form
the three column form of the non-zero elements for the inverse of the D matrix
the three column form of the non-zero elements for the inverse of the AD matrix
the three column form of the non-zero elements for the inverse of the DD matrix
Boutput <- makeDomEpi(Mrode9, output = "b", parallel = FALSE, invertD = FALSE) str(Boutput) DADoutput <- makeDomEpi(Mrode9, output = "AD", parallel = FALSE, invertD = TRUE) str(DADoutput)
Boutput <- makeDomEpi(Mrode9, output = "b", parallel = FALSE, invertD = FALSE) str(Boutput) DADoutput <- makeDomEpi(Mrode9, output = "AD", parallel = FALSE, invertD = TRUE) str(DADoutput)
Alleles are explicitly traced through a pedigree to obtain coefficients of fraternity between pairs of individuals (the probability of sharing both alleles identical by descent) - for either autosomes or sex chromosomes. This is accomplished in an iterative process to account for the various routes by which an allele will progress through a pedigree due to Mendelian sampling at either autosomes or sex chromosomes. The autosomal case is an implementation of the simulation approach of Ovaskainen et al. (2008).
makeDsim( pedigree, N, parallel = FALSE, ncores = getOption("mc.cores", 2L), invertD = TRUE, calcSE = FALSE, returnA = FALSE, verbose = TRUE ) makeSdsim( pedigree, heterogametic, N, DosageComp = c(NULL, "ngdc", "hori", "hedo", "hoha", "hopi"), parallel = FALSE, ncores = getOption("mc.cores", 2L), invertSd = TRUE, calcSE = FALSE, returnS = FALSE, verbose = TRUE )
makeDsim( pedigree, N, parallel = FALSE, ncores = getOption("mc.cores", 2L), invertD = TRUE, calcSE = FALSE, returnA = FALSE, verbose = TRUE ) makeSdsim( pedigree, heterogametic, N, DosageComp = c(NULL, "ngdc", "hori", "hedo", "hoha", "hopi"), parallel = FALSE, ncores = getOption("mc.cores", 2L), invertSd = TRUE, calcSE = FALSE, returnS = FALSE, verbose = TRUE )
pedigree |
A pedigree with columns organized: ID, Dam, Sire. For use
with |
N |
The number of times to iteratively trace alleles through the pedigree |
parallel |
A logical indicating whether or not to use parallel processing. Note, this may only be available for Mac and Linux operating systems. |
ncores |
The number of cpus to use when constructing the dominance relatedness matrix. Default is all available. |
invertD , invertSd
|
A logical indicating whether or not to invert the D or Sd matrix |
calcSE |
A logical indicating whether or not the standard errors for each coefficient of fraternity should be calculated |
returnA , returnS
|
Logical, indicating if the numerator relationship matrix (A or S) should be stored and returned. |
verbose |
Logical, indicating if progress messages should be displayed. |
heterogametic |
Character indicating the label corresponding to the heterogametic sex used in the "Sex" column of the pedigree |
DosageComp |
A character indicating which model of dosage compensation.
If |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
parallel
= TRUE should only be used on Linux or Mac operating systems
(i.e., not Windows).
Ovaskainen et al. (2008) indicated that the method of calculating the D
matrix (see makeD
) is only an approximation. They proposed a
simulation method that is implemented here. This should be more
appropriate, especially when inbreeding occurs in the pedigree.
The objects listDsim
and listSdsim
will list both the
approximate values (returned from makeD
or
makeSd
) as well as the simulated values. If calcSE
is
TRUE, these values will be listed in listDsim
or listSdsim
.
a list
:
the A or S matrix in sparse matrix form
the approximate D or Sd matrix in sparse matrix form
the log determinant of the D or Sd matrix
the inverse of the approximate D or approximate Sd matrix in sparse matrix form
the three column form of the non-zero elements for the inverse of the approximate D matrix or the inverse of the approximate Sd matrix
the simulated D or Sd matrix in sparse matrix form
the log determinant of the simulated D or simulated Sd matrix
the inverse of the simulated D or simulated Sd matrix in sparse matrix form
the three column form of the non-zero and non-self elements for the simulated D or simulated Sd matrix
the three column form of the non-zero elements for the inverse of the simulated D or the inverse of the simulated Sd matrix
This simulation can take a long time for large pedigrees (a few
thousand and higher) and large values of N
(one thousand and
higher). If unsure, it is advisable to start with a lower N
and
gradually increase to obtain a sense of the time required to execute a
desired N
.
Ovaskainen, O., Cano, J.M., & Merila, J. 2008. A Bayesian framework for comparative quantitative genetics. Proceedings of the Royal Society B 275, 669-678.
simD <- makeDsim(Mrode9, N = 1000, parallel = FALSE, invertD = TRUE, calcSE = TRUE)$listDsim simSd <- makeSdsim(FG90, heterogametic = "0", N = 1000, parallel = FALSE, invertSd = TRUE, calcSE = TRUE)$listSdsim
simD <- makeDsim(Mrode9, N = 1000, parallel = FALSE, invertD = TRUE, calcSE = TRUE)$listDsim simSd <- makeSdsim(FG90, heterogametic = "0", N = 1000, parallel = FALSE, invertSd = TRUE, calcSE = TRUE)$listSdsim
This returns the (additive) mutational effects relationship matrix in sparse matrix format.
makeM(pedigree)
makeM(pedigree)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
See function makeMinv
for directly obtaining the inverse of
the (additive) mutational effects genetic relationship matrix.
Returns M, or the mutational effects relationship matrix, in sparse matrix form.
makeM(Mrode2)
makeM(Mrode2)
Returns the inverse of the (additive) mutational effects relationship matrix. It can also be used to obtain components needed for the calculations in the underlying algorithm.
makeMinv(pedigree, ...) makeMinvML(pedigree, ...)
makeMinv(pedigree, ...) makeMinvML(pedigree, ...)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
... |
Arguments to be passed to methods |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
Note the assumption under the infinitesimal model, that mutation has essentially zero probability of affecting an inbred locus (hence removing inbred identity-by-descent), however, mutations may themselves be subject to inbreeding (Wray 1990).
By default, the algorithm described in Casellas and Medrano (2008) is
implemented here, in which the inverse-M is separate from the typical inverse
relatedness matrix (inverse-A). Casellas and Medrano's algorithm allows
separate partitioning of additive genetic variance attributed to inheritance
of allelic variation present in the base population (inverse-A) from
additive genetic variance arising from mutation and subsequent sharing of
mutant alleles identical-by-descent. Alternatively, Wray (1990) formulates
an algorithm which combines both of these processes (i.e., the A-inverse with
the M-inverse matrices). If the Wray algorithm is desired, this can be
implemented by specifying a numeric value to an argument named theta
.
The value used for theta
should be as described in Wray (1990). See
examples below for use of this argument.
a list
:
the inverse of the (additive) mutational effects relationship matrix in sparse matrix form
the three column list of the non-zero elements for the
inverse of the (additive) mutational effects relationship matrix.
attr(*, "rowNames")
links the integer for rows/columns to the ID
column from the pedigree.
the amount by which segregation variance is reduced by inbreeding. Similar to the individual coefficients of inbreeding (f) derived during the construction of the inverse numerator relatedness matrix. in the pedigree (matches the order of the first/ID column of the pedigree).
the log determinant of the M matrix
the (non-zero) elements of the diagonal D matrix of the M=TDT'
decomposition. Contains the variance of Mendelian sampling. Matches
the order of the first/ID column of the pedigree. Note Wray (1990) and
Casellas and Medrano (2008) algorithms use v=sqrt(dii)
.
Casellas, J. and J.F. Medrano. 2008. Within-generation mutation variance for litter size in inbred mice. Genetics. 179:2147-2155.
Meuwissen, T.H.E & Luo, Z. 1992. Computing inbreeding coefficients in large populations. Genetics, Selection, Evolution. 24:305-313.
Mrode, R.A. 2005. Linear Models for the Prediction of Animal Breeding Values, 2nd ed. Cambridge, MA: CABI Publishing.
Wray, N.A. 1990. Accounting for mutation effects in the additive genetic variance-covariance matrix and its inverse. Biometrics. 46:177-186.
## Example pedigree from Wray 1990 #### Implement Casellas & Medrano (2008) algorithm Mout <- makeMinv(Wray90[, 1:3]) #### Wray (1990) algorithm with extra argument `theta` Mwray <- makeMinv(Wray90[, 1:3], theta = 10.0)$Minv # compare to Wray p.184
## Example pedigree from Wray 1990 #### Implement Casellas & Medrano (2008) algorithm Mout <- makeMinv(Wray90[, 1:3]) #### Wray (1990) algorithm with extra argument `theta` Mwray <- makeMinv(Wray90[, 1:3], theta = 10.0)$Minv # compare to Wray p.184
The function returns the inverse of the additive relationship matrix in sparse matrix format for the sex chromosomes (e.g., either X or Z).
makeS( pedigree, heterogametic, DosageComp = c(NULL, "ngdc", "hori", "hedo", "hoha", "hopi"), returnS = FALSE )
makeS( pedigree, heterogametic, DosageComp = c(NULL, "ngdc", "hori", "hedo", "hoha", "hopi"), returnS = FALSE )
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire, Sex |
heterogametic |
Character indicating the label corresponding to the heterogametic sex used in the “Sex” column of the pedigree |
DosageComp |
A character indicating which model of dosage compensation.
If |
returnS |
Logical statement, indicating if the relationship matrix should be constructed in addition to the inverse |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
The inverse of the sex-chromosome additive genetic relationship matrix
(S-matrix) is constructed implementing the Meuwissen and Luo (1992)
algorithm to directly construct inverse additive relationship matrices
(borrowing code from Jarrod Hadfield's MCMCglmm function, inverseA
)
and using equations presented in Fernando & Grossman (1990; see Wolak et al.
2013). Additionally, the S-matrix itself can be constructed (although this
takes much longer than computing S-inverse directly).
The choices of dosage compensation models are: no global dosage compensation ("ngdc"), random inactivation in the homogametic sex ("hori"), doubling of the single shared sex chromosome in the heterogametic sex ("hedo"), halving expression of both sex chromosomes in the homogametic sex ("hoha"), or inactivation of the paternal sex chromosome in the homogametic sex ("hopi").
a list
:
the model of sex-chromosome dosage compensation assumed.
the sex-chromosome relationship matrix in sparse matrix
form or NULL if returnS
= FALSE
the log determinant of the S matrix
the inverse of the S matrix in sparse matrix form
the three column form of the non-zero elements for the inverse of the S matrix
the sex-linked inbreeding coefficients for all individuals in the pedigree
a vector of the (non-zero) elements of the diagonal V matrix of the S=TVT' decomposition. Contains the variance of Mendelian sampling for a sex-linked locus
Wolak, M.E., D.A. Roff, and D.J. Fairbairn. in prep. The contribution of sex chromosomal additive genetic (co)variation to the phenotypic resemblance between relatives under alternative models of dosage compensation.
Fernando, R.L. & Grossman, M. 1990. Genetic evaluation with autosomal and X-chromosomal inheritance. Theoretical and Applied Genetics, 80:75-80.
Meuwissen, T.H.E. and Z. Luo. 1992. Computing inbreeding coefficients in large populations. Genetics, Selection, Evolution, 24:305-313.
makeS(FG90, heterogametic = "0", returnS = TRUE)
makeS(FG90, heterogametic = "0", returnS = TRUE)
This returns the Cholesky decomposition of the numerator relationship matrix and its inverse. It can also be used to obtain coefficients of inbreeding for the pedigreed population.
makeTinv(pedigree, ...) ## Default S3 method: makeTinv(pedigree, ...) ## S3 method for class 'numPed' makeTinv(pedigree, ...) ## Default S3 method: makeT(pedigree, genCol = NULL, ...) ## Default S3 method: makeDiiF(pedigree, f = NULL, ...) ## S3 method for class 'numPed' makeDiiF(pedigree, f = NULL, ...)
makeTinv(pedigree, ...) ## Default S3 method: makeTinv(pedigree, ...) ## S3 method for class 'numPed' makeTinv(pedigree, ...) ## Default S3 method: makeT(pedigree, genCol = NULL, ...) ## Default S3 method: makeDiiF(pedigree, f = NULL, ...) ## S3 method for class 'numPed' makeDiiF(pedigree, f = NULL, ...)
pedigree |
A pedigree where the columns are ordered ID, Dam, Sire |
... |
Arguments to be passed to methods |
genCol |
An integer value indicating the generation up to which the
|
f |
A numeric vector indicating the level of inbreeding. See Details |
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
The function implements an adaptation of the Meuwissen and Luo (1992)
algorithm (particularly, following the description of the algorithm in
Mrode 2005) with some code borrowed from the inverseA
function by
Jarrod Hadfield in the MCMCglmm
package.
The inbreeding level of individuals can be provided instead of calculated.
f
must be a vector that is the same length as individuals in the
pedigree. Supplied coefficients of inbreeding are used instead of being
calculated until a NA
is encountered in the vector. From this position
on, then coefficients of inbreeding are calculated and replace entries in
f
. This can be used, for example, to calculate coefficients of
inbreeding for later generations when coefficients of inbreeding in the
previous generations have already been calculated. To specify an average
coefficient of inbreeding for the base population, modify the pedigree to
include a single phantom parent and specify this individual's non-zero
coefficient of inbreeding in f
with the rest of the terms as NA.
a list
:
the inverse of the Cholesky decomposition of the additive genetic relationship matrix (Ainv=Tinv' Dinv Tinv) in sparse matrix form
the diagonal D matrix of the A=TDT' Cholesky decomposition. Contains the variance of Mendelian sampling. Matches the order of the first/ID column of the pedigree.
the individual coefficients of inbreeding for each individual in the pedigree (matches the order of the first/ID column of the pedigree).
Meuwissen, T.H.E & Luo, Z. 1992. Computing inbreeding coefficients in large populations. Genetics, Selection, Evolution. 24:305-313.
Mrode, R.A. 2005. Linear Models for the Prediction of Animal Breeding Values, 2nd ed. Cambridge, MA: CABI Publishing.
Tinv <- makeTinv(Mrode2) # Method for a numeric pedigree (of `nadiv` class "numPed") nPed <- numPed(Mrode2) Tinv2 <- makeTinv(nPed) ######## DF <- makeDiiF(Mrode2) # manually construct the inverse of the relatedness matrix `Ainv` Dinv <- DF$D #<-- not the inverse yet, just copying the object Dinv@x <- 1 / DF$D@x #<-- inverse of a diagonal matrix handAinv <- crossprod(Tinv, Dinv) %*% Tinv # make the A-inverse directly Ainv <- makeAinv(Mrode2)$Ainv # Compare handAinv Ainv stopifnot(all(abs((Ainv - handAinv)@x) < 1e-6)) # supply previous generation coefficients of inbreeding (f) ## to keep from re-calculating their f when analyzing subsequent generations DF <- makeDiiF(Mrode2[, 1:3]) Mrode2$gen <- genAssign(Mrode2) Mrode2$f_full <- DF$f Mrode2$f_in <- with(Mrode2, c(f_full[gen <= 1], rep(NA, sum(gen > 1)))) DF2 <- makeDiiF(Mrode2[, 1:3], f = Mrode2$f_in) stopifnot(identical(DF, DF2))
Tinv <- makeTinv(Mrode2) # Method for a numeric pedigree (of `nadiv` class "numPed") nPed <- numPed(Mrode2) Tinv2 <- makeTinv(nPed) ######## DF <- makeDiiF(Mrode2) # manually construct the inverse of the relatedness matrix `Ainv` Dinv <- DF$D #<-- not the inverse yet, just copying the object Dinv@x <- 1 / DF$D@x #<-- inverse of a diagonal matrix handAinv <- crossprod(Tinv, Dinv) %*% Tinv # make the A-inverse directly Ainv <- makeAinv(Mrode2)$Ainv # Compare handAinv Ainv stopifnot(all(abs((Ainv - handAinv)@x) < 1e-6)) # supply previous generation coefficients of inbreeding (f) ## to keep from re-calculating their f when analyzing subsequent generations DF <- makeDiiF(Mrode2[, 1:3]) Mrode2$gen <- genAssign(Mrode2) Mrode2$f_full <- DF$f Mrode2$f_in <- with(Mrode2, c(f_full[gen <= 1], rep(NA, sum(gen > 1)))) DF2 <- makeDiiF(Mrode2[, 1:3], f = Mrode2$f_in) stopifnot(identical(DF, DF2))
Pedigree from Table 2.1 of Mrode (2005)
Mrode2
Mrode2
A data.frame
with 6 observations on the following 3 variables:
a numeric vector
a numeric vector
a numeric vector
Mrode, R.A. 2005. Linear Models for the Prediction of Animal Breeding Values, 2nd ed. Cambridge, MA: CABI Publishing.
str(Mrode2)
str(Mrode2)
Pedigree, from chapter 3 of Mrode (2005) with genetic groups and a trait column
Mrode3
Mrode3
A data.frame
with 10 observations on the following 8 variables:
a factor with levels indicating the unique genetic groups and individuals
a numeric vector of maternal identities
a numeric vector of paternal identities
a factor of maternal identities with genetic groups
inserted instead of NA
a factor of paternal identities with genetic groups
inserted instead of NA
a factor with levels female
male
a numeric vector of pre-weaning weight gain (kg) for five beef calves
Mrode, R.A. 2005. Linear Models for the Prediction of Animal Breeding Values, 2nd ed. Cambridge, MA: CABI Publishing.
data(Mrode3) str(Mrode3)
data(Mrode3) str(Mrode3)
Pedigree, adapted from example 9.1 of Mrode (2005)
Mrode9
Mrode9
A data.frame
with 12 observations on the following 3 variables:
a numeric vector
a numeric vector
a numeric vector
Mrode, R.A. 2005. Linear Models for the Prediction of Animal Breeding Values, 2nd ed. Cambridge, MA: CABI Publishing.
data(Mrode9) str(Mrode9)
data(Mrode9) str(Mrode9)
Conversion, checking, and row re-ordering of a pedigree in integer form of class ‘numPed’.
numPed(pedigree, check = TRUE) ronPed(x, i, ...)
numPed(pedigree, check = TRUE) ronPed(x, i, ...)
pedigree |
A three column pedigree object, where the columns correspond to: ID, Dam, & Sire |
check |
A logical argument indicating if checks on the validity of the pedigree structure should be made, but see Details |
x |
A pedigree of class ‘ |
i , ...
|
Index specifying elements to extract or replace: see
|
Missing parents (e.g., base population) should be denoted by either 'NA', '0', '-998', or '*'.
Individuals must appear in the ID column in rows preceding where they
appear in either the Dam or Sire column. See the
prepPed
function if this is not the case.
If pedigree inherits the class "numPed" (from a previous call to
numPed()
) and check = TRUE
, the checks are skipped. If
check = FALSE
any pedigree will be transformed into a pedigree
consisting of integers and missing values denoted by '-998'.
Based on code from the MCMCglmm
package
An S3 object of class “numPed” representing the pedigree,
where individuals are now numbered from 1 to n
and unknown parents
are assigned a value of ‘-998’.
(nPed <- numPed(Mrode2)) class(nPed) # re-order and retain class 'numPed' ronPed(nPed, order(nPed[, 2], nPed[, 3])) class(nPed)
(nPed <- numPed(Mrode2)) class(nPed) # re-order and retain class 'numPed' ronPed(nPed, order(nPed[, 2], nPed[, 3])) class(nPed)
Mainly checks to ensure the variance components in a REML mixed model do not change between the last two iterations more than what is allowed by the tolerance value. See details for extra check on asreml-R models.
pcc(object, traces = NULL, tol = 0.01, silent = FALSE)
pcc(object, traces = NULL, tol = 0.01, silent = FALSE)
object |
A list with at least one element named: |
traces |
Optionally, a matrix to substitute instead of the monitor
element to |
tol |
The tolerance level for which to check against all of the changes in variance component parameter estimates |
silent |
Optional argument to silence the output of helpful (indicating default underlying behavior) messages |
Object is intended to be an asreml-R model output. NOTE, The first 3 rows are ignored and thus should not be variance components from the model (e.g., they should be the loglikelihood or degrees of freedom, etc.). Also, the last column is ignored and should not be an iteration of the model (e.g., it indicates the constraint).
The function also checks object
to ensure that the output from the
asreml-R model does not contain a log-likelihood value of exactly 0.00. An
ASReml model can sometimes fail while still returning a monitor
object and TRUE
value in the converge
element of the output.
This function will return FALSE
if this is the case.
Returns TRUE
if all variance parameters change less than the
value specified by tol
, otherwise returns FALSE
. Also see the
details
section for other circumstances when FALSE
might be
returned.
# Below is the last 3 iterations from the trace from an animal model of # tait1 of the warcolak dataset. # Re-create the output from a basic, univariate animal model in asreml-R tracein <- matrix(c(0.6387006, 1, 0.6383099, 1, 0.6383294, 1, 0.6383285, 1), nrow = 2, ncol = 4, byrow = FALSE) dimnames(tracein) <- list(c("ped(ID)!ped", "R!variance"), c(6, 7, 8, 9)) pcc(object = NULL, trace = tracein)
# Below is the last 3 iterations from the trace from an animal model of # tait1 of the warcolak dataset. # Re-create the output from a basic, univariate animal model in asreml-R tracein <- matrix(c(0.6387006, 1, 0.6383099, 1, 0.6383294, 1, 0.6383285, 1), nrow = 2, ncol = 4, byrow = FALSE) dimnames(tracein) <- list(c("ped(ID)!ped", "R!variance"), c(6, 7, 8, 9)) pcc(object = NULL, trace = tracein)
This function takes a pedigree, adds missing founders, and then sorts the pedigree.
prepPed(pedigree, gender = NULL, check = TRUE)
prepPed(pedigree, gender = NULL, check = TRUE)
pedigree |
An object, where the first 3 columns correspond to: ID, Dam, & Sire. See details. |
gender |
An optional character for the name of the column in
|
check |
A logical argument indicating if checks on the validity of the pedigree structure should be made |
Many functions (both in nadiv and from other programs) dealing with pedigrees must first sort a pedigree such that individuals appear in the ID column in rows preceding where they appear in either the Dam or Sire column. Further, these functions and programs require that all individuals in the dam and sire columns of a pedigree also have an entry in the ID column. This function easily prepares data sets to accommodate these requirements using a very fast topological sorting algorithm.
NOTE: more columns than just a pedigree can be passed in the pedigree
argument. In the case of missing founders, these columns are given NA
values for all rows where founders have been added to the pedigree. The
entire object supplied to pedigree
is ordered, ensuring that all
information remains connected to the individual
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
When a non-null argument is given to gender
, dams without an entry in
the ID column (that are subsequently added to the pedigree) are given the
gender designated for other dams (and similarly for sires).
The check
argument performs checks on the format of the pedigree
supplied to try and identify any issues regarding the notation of missing
values and validity of the basic pedigree for further processing.
The pedigree object (can have more columns than just ID, Dam, and Sire), where: (1) the ID column contains an ID for all individuals from the original pedigree object's ID, Dam, and Sire columns (i.e., founders are added) and (2) the pedigree is now sorted so that individuals are not in rows preceding either their Dam or Sire.
# First create an unordered pedigree with (4) missing founders warcolak_unsuitable <- warcolak[sample(seq(5, nrow(warcolak), 1), size = (nrow(warcolak) - 4), replace = FALSE), ] nrow(warcolak) nrow(warcolak_unsuitable) # Fix and sort the pedigree ## Automatically assign the correct gender to the added founders ### Also sort the data accompanying each individual warcolak_fixed_ordered <- prepPed(warcolak_unsuitable, gender = "sex") head(warcolak_fixed_ordered)
# First create an unordered pedigree with (4) missing founders warcolak_unsuitable <- warcolak[sample(seq(5, nrow(warcolak), 1), size = (nrow(warcolak) - 4), replace = FALSE), ] nrow(warcolak) nrow(warcolak_unsuitable) # Fix and sort the pedigree ## Automatically assign the correct gender to the added founders ### Also sort the data accompanying each individual warcolak_fixed_ordered <- prepPed(warcolak_unsuitable, gender = "sex") head(warcolak_fixed_ordered)
This function removes individuals who are either not themselves or not ancestors to phenotyped individuals
prunePed(pedigree, phenotyped, ...) ## Default S3 method: prunePed(pedigree, phenotyped, ...) ## S3 method for class 'numPed' prunePed(pedigree, phenotyped, ...)
prunePed(pedigree, phenotyped, ...) ## Default S3 method: prunePed(pedigree, phenotyped, ...) ## S3 method for class 'numPed' prunePed(pedigree, phenotyped, ...)
pedigree |
An object, where the first 3 columns correspond to: ID, Dam, & Sire. See details. |
phenotyped |
A vector indicating which individuals in the pedigree have phenotypic information available. |
... |
Arguments to be passed to methods |
Often mixed effect models run much faster when extraneous information is removed before running the model. This is particularly so when reducing the number of random effects associated with a relationship matrix constructed from a pedigree.
NOTE: more columns than just a pedigree can be passed in the pedigree
argument.
Missing parents (e.g., base population) should be denoted by either 'NA', '0', or '*'.
This function is very similar to (and the code is heavily borrowed from) a
function of the same name in the MCMCglmm
package by Jarrod Hadfield.
The pedigree object (can have more columns than just ID, Dam, and Sire), where the ID column contains an ID for all individuals who are actually phenotyped or are an ancestor to an individual with a phenotype (and are thus informative for estimating parameters in the base population).
# Make a pedigree (with sex) from the warcolak dataset warcolak_ped <- warcolak[, 1:4] # Reduce the number of individuals that have a phenotype for "trait1" in #the warcolak dataset t1phenotyped <- warcolak t1phenotyped[sample(seq.int(nrow(warcolak)), 1500, replace = FALSE), "trait1"] <- NA t1phenotyped <- t1phenotyped[which(!is.na(t1phenotyped$trait1)), ] # The following will give a pedigree with only individuals that have a # phenotype for "trait1" OR are an ancestor to a phenotyped individual. pruned_warcolak_ped <- prunePed(warcolak_ped, phenotyped = t1phenotyped$ID) # Now compare the sizes (note, pruned_warcolak_ped retained its column indicating sex. dim(warcolak_ped) dim(pruned_warcolak_ped) # We could have kept all of the data associated with individuals who had phenotypic # information on "trait1" by instead specifying pruned_fullt1_warcolak_ped <- prunePed(warcolak, phenotyped = t1phenotyped$ID) dim(pruned_fullt1_warcolak_ped) #<-- compare number of columns with above
# Make a pedigree (with sex) from the warcolak dataset warcolak_ped <- warcolak[, 1:4] # Reduce the number of individuals that have a phenotype for "trait1" in #the warcolak dataset t1phenotyped <- warcolak t1phenotyped[sample(seq.int(nrow(warcolak)), 1500, replace = FALSE), "trait1"] <- NA t1phenotyped <- t1phenotyped[which(!is.na(t1phenotyped$trait1)), ] # The following will give a pedigree with only individuals that have a # phenotype for "trait1" OR are an ancestor to a phenotyped individual. pruned_warcolak_ped <- prunePed(warcolak_ped, phenotyped = t1phenotyped$ID) # Now compare the sizes (note, pruned_warcolak_ped retained its column indicating sex. dim(warcolak_ped) dim(pruned_warcolak_ped) # We could have kept all of the data associated with individuals who had phenotypic # information on "trait1" by instead specifying pruned_fullt1_warcolak_ped <- prunePed(warcolak, phenotyped = t1phenotyped$ID) dim(pruned_fullt1_warcolak_ped) #<-- compare number of columns with above
Pedigree with genetic groups adapted from Quaas (1988) equation [5]
Q1988
Q1988
A data.frame
with 11 observations on the following 8 variables:
a factor with levels indicating the unique individuals (including phantom parents) and genetic groups
a factor of observed maternal identities
a factor vector of observed paternal identities
a factor of maternal identities with genetic groups
inserted instead of NA
a factor of paternal identities with genetic groups
inserted instead of NA
a factor of maternal identities with phantom parents
inserted instead of NA
a factor of paternal identities with phantom parents
inserted instead of NA
a factor of genetic groups to which each phantom parent belongs
Quaas, R.L. 1988. Additive genetic model with groups and relationships. Journal of Dairy Science 71:1338-1345.
data(Q1988) str(Q1988)
data(Q1988) str(Q1988)
Simulates a pedigree and phenotype for a focal population receiving immigrants. Genetic and environmental differences can be specified between the focal and immigrant populations. Further, these differences can have temporal trends.
simGG( K, pairs, noff, g, nimm = 2, nimmG = seq(2, g - 1, 1), VAf = 1, VAi = 1, VRf = 1, VRi = 1, mup = 20, muf = 0, mui = 0, murf = 0, muri = 0, d_bvf = 0, d_bvi = 0, d_rf = 0, d_ri = 0 )
simGG( K, pairs, noff, g, nimm = 2, nimmG = seq(2, g - 1, 1), VAf = 1, VAi = 1, VRf = 1, VRi = 1, mup = 20, muf = 0, mui = 0, murf = 0, muri = 0, d_bvf = 0, d_bvi = 0, d_rf = 0, d_ri = 0 )
K |
Integer number of individuals per generation, or the focal population carrying capacity |
pairs |
Integer number of mating pairs created by sampling with replacement from adults of a given generation |
noff |
Integer number of offspring each pair contributes to the next generation |
g |
Integer number of (non-overlapping) generations to simulate |
nimm |
Integer number of immigrants added to the population each generation of migration |
nimmG |
Sequence of integers for the generations in which immigrants arrive in the focal population |
VAf |
Numeric value for the expected additive genetic variance in the first generation of the focal population - the founders |
VAi |
Numeric value for the expected additive genetic variance in each generation of immigrants |
VRf |
Numeric value for the expected residual variance in the focal population |
VRi |
Numeric value for the expected residual variance in each generation of the immigrants |
mup |
Numeric value for the expected mean phenotypic value in the first generation of the focal population - the founders |
muf |
Numeric value for the expected mean breeding value in the first generation of the focal population - the founders |
mui |
Numeric value for the expected mean breeding value for the immigrants |
murf |
Numeric value for the expected mean residual (environmental) deviation in the first generation of the focal population - the founders |
muri |
Numeric value for the expected mean residual (environmental) deviation for the immigrants |
d_bvf |
Numeric value for the expected change between generations in the mean breeding value of the focal population. Sets the rate of genetic selection occurring across generations |
d_bvi |
Numeric value for the expected change between generations in the mean breeding value of the immigrant population each generation |
d_rf |
Numeric value for the expected change between generations in the mean residual (environmental) deviation of the focal population each generation |
d_ri |
Numeric value for the expected change between generations in the mean residual (environmental) deviation of the immigrant population each generation |
Offspring total additive genetic values u
are the average of their
parents u
plus a Mendelian sampling deviation drawn from a normal
distribution with mean of 0 and variance equal to where
is
VAf
and
is the average of the parents' coefficient of inbreeding
f (p. 447 Verrier et al. 1993). Each ‘immigrant’ (individual
with unknown parents in generations >1) is given a total additive genetic
effect that is drawn from a normal distribution with mean of
mui
and
variance equal to VAi
. Residual deviations are sampled for
‘focal’ and ‘immigrant’ populations separately, using normal
distributions with means of murf
and muri
, respectively, and
variances of VRf
and VRi
, respectively. Phenotypes are the sum
of total additive genetic effects and residual deviations plus an overall
mean mup
.
Trends in total additive genetic effects and/or residual deviations can be
specified for both the focal and immigrant populations. Trends in total
additive genetic effects occurring in the immigrants, in the residual
deviations occurring in the focal population, and in the residual deviations
occurring in the immigrants are produced by altering the mean each
generation for the separate distribution from which these effects are each
drawn. The change in mean over a generation is specified in units of
standard deviations of the respective distributions (e.g., square roots of
VAi
, VRf
, and VRi
) and is set with d_bvi
,
d_rf
, or d_ri
, respectively.
Trends in total additive genetic effects for the focal population are produced by selecting individuals to be parents of the next generation according to their predicted total additive genetic effects. Individuals are assigned probabilities of being selected as a parent of the next generation depending on how closely their predicted total additive genetic effect matches an optimum value. Probabilities are assigned:
where x
is the vector of predicted total additive
genetic effects (u
), is the standard
deviation of
x
, and is the optimum value.
Sampling is conducted with replacement based on these probabilities.
The parameter d_bvf
specifies how much the optimal total additive
genetic effect changes per generation. The optimal total additive genetic
effect in a given generation is calculated as: muf + d_bvf
*
sqrt(VAf) * (i-2)
. Individuals with predicted total additive
genetic effects closest to this optimum have a higher probability of being
randomly sampled to be parents of the next generation. This represents
selection directly on predicted total additive genetic effects.
Total additive genetic effects are predicted for the first generation of
focal individuals and all immigrants using equation 1.3 in Mrode (2005,
p.3): . The
heritability is either
VAf
/ (VAf + VRf
) or VAi
/
(VAi + VRi
). Total additive genetic effects are predicted for all
other individuals using equation 1.9 in Mrode (2005, p. 10) - or as the
average of each individual's parents' predicted total additive genetic
effects.
A data.frame
with columns corresponding to:
Integer for each individual's unique identifying code
Integer indicating each individual's dam
Integer indicating each individual's sire
Numeric value for the average of each individual's dam and sire additive genetic effects
Numeric value for each individual's Mendelian sampling deviate from the mid-parental total additive genetic value
Numeric value of each individual's total additive genetic effect
Numeric value of each individual's residual (environmental) deviation
Numeric value of each individual's phenotypic value
Numeric value of each individual's predicted total additive genetic effect
Integer of either 0
if an individual was born in the
focal population or 1
if they were born in an immigrant
population
Integer value of the generation in which each individual was born
Verrier, V., J.J. Colleau, and J.L. Foulley. 1993. Long-term effects of selection based on the animal model BLUP in a finite population. Theoretical and Applied Genetics. 87:446-454.
Mrode, R.A. 2005. Linear Models for the Prediction of Animal Breeding Values, 2nd ed. Cambridge, MA: CABI Publishing.
# The dataset 'ggTutorial' was simulated as: set.seed(102) # seed used to simulate ggTutorial ggTutorial <- simGG(K = 400, pairs = 200, noff = 4, g = 15, nimm = 40, nimmG = seq(2, 14, 1), muf = 0, mui = 3) # Use genetic group methods to approximate the breeding values for ggTutorial ## First, construct a pedigree with genetic groups ggPed <- ggTutorial[, c("id", "dam", "sire", "is", "gen")] naPar <- which(is.na(ggPed[, 2])) ggPed$GG <- rep("NA", nrow(ggPed)) # 'focal' population genetic group = "foc0" and 'immigrant' = "g1" # obtained by pasting "foc" & "g" with immigrant status "0" or "1", respectively ggPed$GG[naPar] <- as.character(ggPed$is[naPar]) ggPed$GG[ggPed$GG == "0"] <- paste0("foc", ggPed$GG[ggPed$GG == "0"]) ggPed$GG[ggPed$GG == "1"] <- paste0("g", ggPed$GG[ggPed$GG == "1"]) ggPed[naPar, 2:3] <- ggPed[naPar, "GG"] ## Now create the Q matrix Q <- ggcontrib(ggPed[, 1:3], ggroups = c("foc0", "g1")) ## obtain the true values of the genetic group means foc0_mean <- mean(ggTutorial$u[which(ggTutorial$gen == 1 & ggTutorial$is == 0)]) g1_mean <- mean(ggTutorial$u[which(ggTutorial$is == 1)]) g_exp <- matrix(c(foc0_mean, g1_mean), ncol = 1) ## breeding values (a) are: ### tot. add. gen. effects (u) minus genetic group effects for each individual (Qg): a <- ggTutorial$u - Q %*% g_exp
# The dataset 'ggTutorial' was simulated as: set.seed(102) # seed used to simulate ggTutorial ggTutorial <- simGG(K = 400, pairs = 200, noff = 4, g = 15, nimm = 40, nimmG = seq(2, 14, 1), muf = 0, mui = 3) # Use genetic group methods to approximate the breeding values for ggTutorial ## First, construct a pedigree with genetic groups ggPed <- ggTutorial[, c("id", "dam", "sire", "is", "gen")] naPar <- which(is.na(ggPed[, 2])) ggPed$GG <- rep("NA", nrow(ggPed)) # 'focal' population genetic group = "foc0" and 'immigrant' = "g1" # obtained by pasting "foc" & "g" with immigrant status "0" or "1", respectively ggPed$GG[naPar] <- as.character(ggPed$is[naPar]) ggPed$GG[ggPed$GG == "0"] <- paste0("foc", ggPed$GG[ggPed$GG == "0"]) ggPed$GG[ggPed$GG == "1"] <- paste0("g", ggPed$GG[ggPed$GG == "1"]) ggPed[naPar, 2:3] <- ggPed[naPar, "GG"] ## Now create the Q matrix Q <- ggcontrib(ggPed[, 1:3], ggroups = c("foc0", "g1")) ## obtain the true values of the genetic group means foc0_mean <- mean(ggTutorial$u[which(ggTutorial$gen == 1 & ggTutorial$is == 0)]) g1_mean <- mean(ggTutorial$u[which(ggTutorial$is == 1)]) g_exp <- matrix(c(foc0_mean, g1_mean), ncol = 1) ## breeding values (a) are: ### tot. add. gen. effects (u) minus genetic group effects for each individual (Qg): a <- ggTutorial$u - Q %*% g_exp
Simulates a pedigree for the “double first cousin” mating design (Fairbairn and Roff 2006).
simPedDFC(U, gpn = 4, fsn = 4, s = 2, fws = 2, prefix = NULL)
simPedDFC(U, gpn = 4, fsn = 4, s = 2, fws = 2, prefix = NULL)
U |
An integer number of units or blocks for the design |
gpn |
Number of grandparent pairs in the generation 0 (GP) (must be >= 2). Equals the number of full-sib families in generation 1 (P). |
fsn |
Number of offspring in each full-sib family of generations 1 and 2 (P and F1 - must be an even number >= 4). |
s |
Number of sires per full-sib family in generation 1 (P - must be >=2) |
fws |
Number of generation 1 (P) families with sires. Together, with
|
prefix |
Optional prefix to add to every identity |
This is an adaption to a half-sib breeding design which also produces first cousins and double first cousins. Double first cousins are produced by mating two brothers to two sisters (the offspring of the resulting two families are double first cousins with one another). This is described in Fairbairn and Roff (2006) as being particularly effective for separating autosomal additive genetic variance from sex chromosomal additive genetic variance. It is also amenable to estimating dominance variance, however, it still has difficulty separating dominance variance from common maternal environmental variance (Meyer 2008).
For a given unit of the design (U
total), 2*gpn
0-generation
(grandparental or GP) individuals are created and paired to make gpn
full-sib families. Then the first fws
families are each allocated
s
males/sires and s*(fws-1)
females/dams in the 1 (parental or P)
generation. The remaining (gpn-fws
) families (only when:
gpn > fws
) are assigned s*fws
females/dams. If
fsn > (s*fws)
, the remaining generation 1 (P) individuals in each
full-sib family (fsn - (s*fws)
) are allocated to each family with
equal numbers of females and males [this allows for more individuals to be
phenotyped in generation 1 (P) than are used to produce generation 2 (F1)].
Generation 2 (F1) is then assigned, based on the mating design in Fairbairn
and Roff (2006) - essentially each sire [of the s
per full-sib family
in generation 1 (P)] is mated to a female from each of the other gpn-1
full-sib families to produce fsn
offspring (with equal numbers of
females and males).
A data.frame
with columns corresponding to: id, dam, sire,
and sex. Sex is M
for males and F
for females.
Fairbairn, D.J. and D.A. Roff. 2006. The quantitative genetics of sexual dimorphism: assessing the importance of sex-linkage. Heredity 97:319-328.
Meyer, K. 2008. Likelihood calculations to evaluate experimental designs to estimate genetic variances. Heredity 101:212-221.
DFC1 <- simPedDFC(U = 1, gpn = 2, fsn = 4, s = 2, fws = 2)
DFC1 <- simPedDFC(U = 1, gpn = 2, fsn = 4, s = 2, fws = 2)
Simulates a pedigree for a half-sib mating design (sometimes also called the North Carolina Design 1).
simPedHS(s, d, n, uniqueDname = TRUE, prefix = NULL)
simPedHS(s, d, n, uniqueDname = TRUE, prefix = NULL)
s |
Number of sires |
d |
Number of dams per sire |
n |
Number of offspring per mating (must be > or = 2) |
uniqueDname |
Logical indicating if dams should have unique names within sire families or throughout the entire pedigree |
prefix |
Optional prefix to add to every identity |
n
must be greater than or equal to 2, because one male and one female
offspring are produced from each mating
Some functions/calculations get bogged down if no two dams have the same ID
in the entire pedigree (e.g., aov
). However, other functions must
have unique identifiers for every individual.
A data.frame
with columns corresponding to: id, dam, sire,
and sex. Sex is "M" for males and "F" for females.
simPedHS(s = 1, d = 3, n = 2)
simPedHS(s = 1, d = 3, n = 2)
Simulates a pedigree for the “middle class neighborhood” mating design (Shabalina, Yampolsky, and Kondrashov 1997).
simPedMCN(pedTemp, g, Nfam = NULL, noff = 2)
simPedMCN(pedTemp, g, Nfam = NULL, noff = 2)
pedTemp |
A |
g |
Integer number of generations to produce from the middle class neighborhood design |
Nfam |
Integer number of families with which to start a new pedigree following the middle class neighborhood design. |
noff |
Integer number of full-sib offspring produced by each family (must be >=2). |
This creates a pedigree following a breeding design which maintains equal contributions to the next generation by each family in the design. It effectively removes the effect of natural selection which makes it amenable to quantify the contribution of mutations to phenotypic variance over the course of the breeding design.
For a starting pedigree template (pedTemp
), the last generation is used
as parents to begin the breeding design for the next g
generations.
The number of families in the last generation of the template pedigree
(pedTemp
) will be the number of families in each generation.
Alternatively, if no template pedigree is provided (pedTemp=NULL
),
Nfam
number of families will be produced in the first generation from
Nfam
unique sire and Nfam
unique dams.
Either pedTemp
or Nfam
must be NULL
, but not both.
A data.frame
with columns corresponding to: id, dam, sire, sex,
and generation. Sex is M
for males and F
for females. The
first generation produced in the middle class neighborhood scheme is assigned
a value of “1”, with their parents being assigned to generation
0
. If pedTemp
was provided, the generations from this pedigree
will be denoted with negative integers.
Shabalina, S.A, L.Y. Yampolsky, and A.S. Kondrashov. 1997. Rapid decline of fitness in panmictic populations of Drosophila melanogaster maintained under relaxed natural selection. Proc. Natl. Acad. Sci. USA. 94:13034-13039.
# No template pedigree provided - start from scrtach mcn1 <- simPedMCN(pedTemp = NULL, g = 3, Nfam = 4, noff = 2) # Provide a template pedigree (half-sib design) hsped <- simPedHS(s = 2, d = 2, n = 4) mcnHS <- simPedMCN(pedTemp = hsped, g = 3)
# No template pedigree provided - start from scrtach mcn1 <- simPedMCN(pedTemp = NULL, g = 3, Nfam = 4, noff = 2) # Provide a template pedigree (half-sib design) hsped <- simPedHS(s = 2, d = 2, n = 4) mcnHS <- simPedMCN(pedTemp = hsped, g = 3)
From a sparse matrix object, the three column, row ordered lower triangle of
non-zero elements is created. Mostly used within other functions (i.e.,
makeD
)
sm2list(A, rownames = NULL, colnames = c("row", "column", "A"))
sm2list(A, rownames = NULL, colnames = c("row", "column", "A"))
A |
a sparse matrix |
rownames |
a list of rownames from the 'A' matrix. |
colnames |
the columns will be labeled however they are entered in this character vector |
The sparse matrix and three column format must fit CERTAIN assumptions about row/column sorting and lower/upper triangle matrix.
Adapted from a function in the MCMCglmm
package
returns the list form of the sparse matrix as a data.frame
The inverse of the Average Information matrix in an ASReml-R object produces the sampling variances of the (co)variance components on the gamma scale. This function scales these variances to the original component scale. This allows for Confidence Intervals to be constructed about the variance component estimates.
varTrans(asr.object)
varTrans(asr.object)
asr.object |
Object from a call to |
Returns a numeric vector of variances for each variance component in an ASReml-R model.
## Not run: library(asreml) ginvA <- ainverse(warcolak) ginvD <- makeD(warcolak[, 1:3])$listDinv attr(ginvD, "rowNames") <- as.character(warcolak[, 1]) attr(ginvD, "INVERSE") <- TRUE warcolak$IDD <- warcolak$ID warcolak.mod <- asreml(trait1 ~ sex, random = ~ vm(ID, ginvA) + vm(IDD, ginvD), data = warcolak) summary(warcolak.mod)$varcomp sqrt(varTrans(warcolak.mod)) # sqrt() so can compare with standard errors from summary ## End(Not run)
## Not run: library(asreml) ginvA <- ainverse(warcolak) ginvD <- makeD(warcolak[, 1:3])$listDinv attr(ginvD, "rowNames") <- as.character(warcolak[, 1]) attr(ginvD, "INVERSE") <- TRUE warcolak$IDD <- warcolak$ID warcolak.mod <- asreml(trait1 ~ sex, random = ~ vm(ID, ginvA) + vm(IDD, ginvD), data = warcolak) summary(warcolak.mod)$varcomp sqrt(varTrans(warcolak.mod)) # sqrt() so can compare with standard errors from summary ## End(Not run)
A two trait example pedigree from the three generation breeding design of Fairbairn & Roff (2006) with two uncorrelated traits.
warcolak
warcolak
A data.frame
with 5400 observations on the following 13 variables:
a factor specifying 5400 unique individual identities
a factor specifying the unique Dam for each individual
a factor specifying the unique Sire for each individual
a factor specifying “M” if the individual is a male and “F” if it is a female
a numeric vector of phenotypic values: see ‘Details’
a numeric vector of phenotypic values: see ‘Details’
a numeric vector of the autosomal additive genetic effects underlying ‘trait1’
a numeric vector of the autosomal additive genetic effects underlying ‘trait2’
a numeric vector of the sex-chromosomal additive genetic effects underlying ‘trait2’
a numeric vector of the autosomal dominance genetic effects underlying ‘trait1’
a numeric vector of the autosomal dominance genetic effects underlying ‘trait2’
a numeric vector of the residual (environmental) effects underlying ‘trait1’
a numeric vector of the residual (environmental) effects underlying ‘trait2’
Unique sets of relatives are specified for a three generation breeding
design (Fairbairn & Roff, 2006). Each set contains 72 individuals. This
pedigree reflects an experiment which produces 75 of these basic sets from
Fairbairn & Roff's design. The pedigree was created using
simPedDFC()
.
The dataset was simulated to have two uncorrelated traits with different
genetic architectures (see examples
below). The trait means are both
equal to 1 for males and 2 for females. The additive genetic, dominance
genetic, and environmental (or residual) variances for both trait1
and trait2
are 0.4, 0.3, & 0.3, respectively. However, the additive
genetic variance for trait2
can be further decomposed to autosomal
additive genetic variance (0.3) and X-linked additive genetic variance (0.1;
assuming the ‘no global dosage compensation’ mechanism).
Females and males have equal variances (except for sex-chromosomal additive genetic variance, where by definition, males have half of the additive genetic variance as females; Wolak 2013) and a between-sex correlation of one for all genetic and residual effects (except the cross-sex residual covariance=0). All random effects were drawn from multivariate random normal distributions [e.g., autosomal additive genetic effects: N ~ (0, kronecker(A, G))] with means of zero and (co)variances equal to the product of the expected sex-specific (co)variances (e.g., G) and the relatedness (or incidence) matrix (e.g., A).
The actual variance in random effects will vary slightly from the amount
specified in the simulation, because of Monte Carlo error. Thus, the random
effects have been included as separate columns in the dataset. See
examples
below for the code that generated the dataset.
Before nadiv version 2.14.0, the warcolak
dataset used a 0/1
coding for ‘sex’ and did not contain the random effects.
Fairbairn, D.J. & Roff, D.A. 2006. The quantitative genetics of sexual dimorphism: assessing the importance of sex-linkage. Heredity 97, 319-328.
Wolak, M.E. 2013. The Quantitative Genetics of Sexual Differences: New Methodologies and an Empirical Investigation of Sex-Linked, Sex-Specific, Non-Additive, and Epigenetic Effects. Ph.D. Dissertation. University of California Riverside.
set.seed(101) library(nadiv) # create pedigree warcolak <- simPedDFC(U = 75, gpn = 4, fsn = 4, s = 2) names(warcolak)[1:3] <- c("ID", "Dam", "Sire") warcolak$trait2 <- warcolak$trait1 <- NA # Define covariance matrices for random effects: ## Autosomal additive genetic (trait1) Ga_t1 <- matrix(c(0.4, rep(0.399999, 2), 0.4), 2, 2) ## Autosomal additive genetic (trait2) Ga_t2 <- matrix(c(0.3, rep(0.299999, 2), 0.3), 2, 2) ## Sex-chromosomal additive genetic (trait2) Gs_t2 <- matrix(c(0.1, rep(0.099999, 2), 0.1), 2, 2) ## Autosomal dominance genetic Gd <- matrix(c(0.3, rep(0.299999, 2), 0.3), 2, 2) ## Environmental (or residual) ### Assumes no correlated environmental effects between sexes R <- diag(c(0.3, 0.3)) ## define variables to be re-used pedn <- nrow(warcolak) # Female (homogametic sex chromosomes) in first column # Male (heterogametic sex chromosomes) in second column sexcol <- as.integer(warcolak$sex) # Create random effects ## Additive genetic ### trait1 autosomal tmp_a <- grfx(pedn, G = Ga_t1, incidence = makeA(warcolak[, 1:3])) var(tmp_a) warcolak$t1_a <- tmp_a[cbind(seq(pedn), sexcol)] ### trait2 autosomal tmp_a <- grfx(pedn, G = Ga_t2, incidence = makeA(warcolak[, 1:3])) var(tmp_a) warcolak$t2_a <- tmp_a[cbind(seq(pedn), sexcol)] ### trait2 sex-chromosomal tmp_s <- grfx(pedn, G = Gs_t2, incidence = makeS(warcolak[, 1:4], heterogametic = "M", DosageComp = "ngdc", returnS = TRUE)$S) matrix(c(var(tmp_s[which(sexcol == 1), 1]), rep(cov(tmp_s)[2, 1], 2), var(tmp_s[which(sexcol == 2), 2])), 2, 2) # NOTE above should be: covar = male var = 0.5* female var warcolak$t2_s <- tmp_s[cbind(seq(pedn), sexcol)] ## Dominance genetic ### trait1 tmp_d <- grfx(pedn, G = Gd, incidence = makeD(warcolak[, 1:3], invertD = FALSE)$D) var(tmp_d) warcolak$t1_d <- tmp_d[cbind(seq(pedn), sexcol)] ### trait2 tmp_d <- grfx(pedn, G = Gd, incidence = makeD(warcolak[, 1:3], invertD = FALSE)$D) var(tmp_d) warcolak$t2_d <- tmp_d[cbind(seq(pedn), sexcol)] ## Residual ### trait1 tmp_r <- grfx(pedn, G = R, incidence = NULL) # warning of identity matrix var(tmp_r) warcolak$t1_r <- tmp_r[cbind(seq(pedn), sexcol)] ### trait2 tmp_r <- grfx(pedn, G = R, incidence = NULL) # warning of identity matrix var(tmp_r) warcolak$t2_r <- tmp_r[cbind(seq(pedn), sexcol)] # Sum random effects and add sex-specific means to get phenotypes ## females have slightly greater mean trait values warcolak$trait1 <- 1 + (-1*sexcol + 2) + rowSums(warcolak[, c("t1_a", "t1_d", "t1_r")]) warcolak$trait2 <- 1 + (-1*sexcol + 2) + rowSums(warcolak[, c("t2_a", "t2_s", "t2_d", "t2_r")])
set.seed(101) library(nadiv) # create pedigree warcolak <- simPedDFC(U = 75, gpn = 4, fsn = 4, s = 2) names(warcolak)[1:3] <- c("ID", "Dam", "Sire") warcolak$trait2 <- warcolak$trait1 <- NA # Define covariance matrices for random effects: ## Autosomal additive genetic (trait1) Ga_t1 <- matrix(c(0.4, rep(0.399999, 2), 0.4), 2, 2) ## Autosomal additive genetic (trait2) Ga_t2 <- matrix(c(0.3, rep(0.299999, 2), 0.3), 2, 2) ## Sex-chromosomal additive genetic (trait2) Gs_t2 <- matrix(c(0.1, rep(0.099999, 2), 0.1), 2, 2) ## Autosomal dominance genetic Gd <- matrix(c(0.3, rep(0.299999, 2), 0.3), 2, 2) ## Environmental (or residual) ### Assumes no correlated environmental effects between sexes R <- diag(c(0.3, 0.3)) ## define variables to be re-used pedn <- nrow(warcolak) # Female (homogametic sex chromosomes) in first column # Male (heterogametic sex chromosomes) in second column sexcol <- as.integer(warcolak$sex) # Create random effects ## Additive genetic ### trait1 autosomal tmp_a <- grfx(pedn, G = Ga_t1, incidence = makeA(warcolak[, 1:3])) var(tmp_a) warcolak$t1_a <- tmp_a[cbind(seq(pedn), sexcol)] ### trait2 autosomal tmp_a <- grfx(pedn, G = Ga_t2, incidence = makeA(warcolak[, 1:3])) var(tmp_a) warcolak$t2_a <- tmp_a[cbind(seq(pedn), sexcol)] ### trait2 sex-chromosomal tmp_s <- grfx(pedn, G = Gs_t2, incidence = makeS(warcolak[, 1:4], heterogametic = "M", DosageComp = "ngdc", returnS = TRUE)$S) matrix(c(var(tmp_s[which(sexcol == 1), 1]), rep(cov(tmp_s)[2, 1], 2), var(tmp_s[which(sexcol == 2), 2])), 2, 2) # NOTE above should be: covar = male var = 0.5* female var warcolak$t2_s <- tmp_s[cbind(seq(pedn), sexcol)] ## Dominance genetic ### trait1 tmp_d <- grfx(pedn, G = Gd, incidence = makeD(warcolak[, 1:3], invertD = FALSE)$D) var(tmp_d) warcolak$t1_d <- tmp_d[cbind(seq(pedn), sexcol)] ### trait2 tmp_d <- grfx(pedn, G = Gd, incidence = makeD(warcolak[, 1:3], invertD = FALSE)$D) var(tmp_d) warcolak$t2_d <- tmp_d[cbind(seq(pedn), sexcol)] ## Residual ### trait1 tmp_r <- grfx(pedn, G = R, incidence = NULL) # warning of identity matrix var(tmp_r) warcolak$t1_r <- tmp_r[cbind(seq(pedn), sexcol)] ### trait2 tmp_r <- grfx(pedn, G = R, incidence = NULL) # warning of identity matrix var(tmp_r) warcolak$t2_r <- tmp_r[cbind(seq(pedn), sexcol)] # Sum random effects and add sex-specific means to get phenotypes ## females have slightly greater mean trait values warcolak$trait1 <- 1 + (-1*sexcol + 2) + rowSums(warcolak[, c("t1_a", "t1_d", "t1_r")]) warcolak$trait2 <- 1 + (-1*sexcol + 2) + rowSums(warcolak[, c("t2_a", "t2_s", "t2_d", "t2_r")])
Pedigree, adapted from Wray (1990)
Wray90
Wray90
A data frame with 8 observations on the following 4 variables:
id
a numeric vector
dam
a numeric vector
sire
a numeric vector
time
a numeric vector
Wray, N.A. 1990. Accounting for mutation effects in the additive genetic variance-covariance matrix and its inverse. Biometrics. 46:177-186.
data(Wray90) str(Wray90)
data(Wray90) str(Wray90)