This vignette introduces you to `sizeMat`

package and provide a way to estimate **Size at Morphometric and Gonad Maturity**.

`install.packages("sizeMat")`

Or the development version from github:

```
# install.packages("devtools")
devtools::install_github("ejosymart/sizeMat")
```

For estimating morphometric maturity use the *crabdata* base. The data set containing allometric measures and other attributes (year, month, sex category) of 223 crabs of the specie *Chionectes tanneri*.

```
data(crabdata)
head(crabdata)
```

```
## year month carapace_width carapace_length chela_height chela_width
## 1 1974 1 106 107 14.0 22
## 2 1974 1 129 129 27.0 44
## 3 1974 1 119 122 14.6 23
## 4 1974 1 115 118 18.6 29
## 5 1974 1 97 97 11.0 17
## 6 1974 1 94 96 10.0 15
## sex_category
## 1 m
## 2 m
## 3 m
## 4 m
## 5 m
## 6 m
```

`names(crabdata)`

```
## [1] "year" "month" "carapace_width" "carapace_length"
## [5] "chela_height" "chela_width" "sex_category"
```

The estimation of morphometric maturity involves two processes:

First the *data classification*, which is referred to the classification of the individuals in two groups. The second is the *estimation*. This process uses the previous classification to estimate the size at morphometric maturity. The details are given below.

The `classify_mature`

function, classify the individuals in two groups (juveniles = 0 and adult = 1). The classification analysis is based on Principal Components Analysis with two allometric variables (x: independent variable, y: dependent variable) in log base, allowing to distinguish two groups that would represent juveniles and adult. The individuals are assigned to each group using a hierarchical classification procedure (hierarchical cluster with agglomeration method: “Ward.D” and the distance measure: “euclidean”). This method is based on establishing a predetermined number of groups (in this case, two) and assigning individuals to one of the groups according to their loads on the two axes of the PCA (Corgos and Freire, 2006).

Using the results of the classification (PCA + cluster), a discriminant analysis (linear or quadratic) is conducted to obtain a discriminating function that permitted any individuals to be classified as a juvenile or an adult on the basis of the X and Y variables.

The `classify_mature`

function requires a data.frame (e.g. crabdata) with allometric variables and sex category. The argument `varNames`

requires the name of two allometric variables only, and `varSex`

requires the name of the variable containing sex information. If the argument `selecSex`

is `NULL`

all the individuals will be used in the classification analysis. Finally the `method`

is focus in the discriminant analysis to be used (“ld”: linear discriminant analysis, “qd”: quadratic discriminant analysis). We recommend begin the analysis with the `method = "ld"`

.

The `classify_mature`

function returns an object of class “classify”, with the allometric variables “x” (independent) - “y”" (dependent), and classification of maturity (juveniles = 0, adult = 1).

```
#For all the individuals
classify_data = classify_mature(crabdata, varNames = c("carapace_width", "chela_height"),
varSex = "sex_category", selectSex = NULL, method = "ld")
```

`## all individuals were used in the analysis`

```
#For males only
classify_data_males = classify_mature(crabdata, varNames = c("carapace_width", "chela_height"),
varSex = "sex_category", selectSex = "m", method = "ld")
```

`## only m-sex were used in the analysis`

Print the results of the “classify_data” object. It shows the number of juveniles and adults after classification and the linear regression analysis for juveniles and adults.

`print(classify_data)`

```
## Number in juvenile group = 83
##
## Number in adult group = 140
##
## --------------------------------------------------------
## 1) Linear regression for juveniles
##
## Call:
## glm(formula = y ~ x, data = juv)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.77010 -0.57399 0.09397 0.56605 1.99008
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.794687 0.497056 -7.634 3.93e-11 ***
## x 0.161327 0.004701 34.314 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.7320842)
##
## Null deviance: 921.306 on 82 degrees of freedom
## Residual deviance: 59.299 on 81 degrees of freedom
## AIC: 213.63
##
## Number of Fisher Scoring iterations: 2
##
## --------------------------------------------------------
## 2) Linear regression for adults
##
## Call:
## glm(formula = y ~ x, data = adt)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3055 -1.0932 -0.0628 1.1178 3.2759
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.246726 1.199496 -9.376 <2e-16 ***
## x 0.273837 0.008648 31.663 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2.265729)
##
## Null deviance: 2584.24 on 139 degrees of freedom
## Residual deviance: 312.67 on 138 degrees of freedom
## AIC: 515.79
##
## Number of Fisher Scoring iterations: 2
##
## --------------------------------------------------------
## 3) Difference between slopes (ANCOVA)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.7946869 0.757105677 -5.012097 1.109526e-06
## x 0.1613275 0.007161179 22.528064 6.035478e-59
## mature -7.4520389 1.285219562 -5.798261 2.320729e-08
## x:mature 0.1125093 0.010361046 10.858878 2.956242e-22
## [1] "slopes are different"
```

The example shows the arguments that can be used in the plot for “classify_data” object:

```
par(mfrow = c(2,2))
plot(classify_data)
plot(classify_data, xlab = "Carapace width (mm.)", ylab = "Chela height (mm)", legendPlot = FALSE)
legend("topleft", "Put your legend here", bty = "n")
plot(classify_data, xlab = "Carapace width (mm.)", ylab = "Chela height (mm)",
col = c(2, 3), pch = c(5, 6), legendPlot = TRUE)
plot(classify_data, xlab = "Carapace width (mm.)", ylab = "Chela height (mm)",
col = c(2, 3), pch = c(5, 6), lty_lines = c(1, 2), lwd_lines = c(1, 3),
cex = c(1, 3), main = "Classification")
```

The `morph_mature`

function use the logit approach (frequentist or bayesian). The size at 50% maturity (\(L_{50}\)) was estimated as the length at which a randomly chosen specimen has a 50% chance of being mature (Somerton 1980, Roa et al. 1999).

In the regression analysis, \(X\) (e.g: carapace width) is considered the explanatory variable and the classification of maturity \(CS\) (juveniles: 0, adults: 1) is considered the response variable (binomial). The variables are fitted to a logit function with the form:

\[P_{CS} = \frac{1}{1+e^{-(\hat{\beta}_{0} + \hat{\beta}_{1}*X)}}\]

where \(P_{CS}\) is the probability of an individual of being mature at a determinate \(X\) length. \(\hat{\beta}_{0}\) (intercept) and \(\hat{\beta}_{1}\) (slope) are parameters estimated. The \(L_{50}\) is calculated as:

\[L_{50} = -\frac{\hat{\beta}_{0}}{\hat{\beta}_{1}}\]

The `morph_mature`

function requires an object of class “classify” with the X, Y (allometric variables) and classification of maturity (juveniles = 0, adults = 1).

The argument `method`

requires a character string indicating which regression will be used for the test. If `method = "fq"`

the logit regression is based on GLM (frequentist) and if `method = "bayes"`

a sample from the posterior distribution of a logit regression model using a random walk Metropolis algorithm is generated (see MCMClogit function).

The argument `niter`

requires a number. For the GLM regression (`method = "fq"`

), a non-parametric bootstrap method consists in generate B bootstrap samples, by resampling with replacement the original data. Then all statistics for each parameter can be calculated from each bootstrap sample (median and confidence intervals). For the `method = "bayes"`

, the argument `niter`

is related to the number of Metropolis iterations for the sampler.

The output is an object of class “morphMat”. This object contains a dataframe with the allometric variables X - Y and the classification of maturity. Also the fitted values for the logit regression and confidence intervals (95%). If you print the object, the median of the size at morphometric maturity estimation (\(L_{50}\)) and parameters are shown.

```
#Frequentist regression
my_ogive_fq = morph_mature(classify_data, method = "fq", niter = 1000)
print(my_ogive_fq)
```

`## formula: Y = 1/1+exp-(A + B*X)`

```
## Original Bootstrap (Median)
## A -20.753 -20.8586
## B 0.1748 0.1757
## L50 118.7237 118.6616
## R2 - 0.7111
```

```
#Bayesian regression
my_ogive_bayes = morph_mature(classify_data, method = "bayes", niter = 1000)
print(my_ogive_bayes)
```

`## formula: Y = 1/1+exp-(A + B*X)`

```
## Bootstrap (Median)
## A -20.6964
## B 0.1744
## L50 118.4079
## R2 0.7111
```

For plotting the maturity ogive, an object of class “morphMat” is required. The function `plot`

generates 4 graphics: 1), 2) and 3) are histograms for the A, B parameters and the size at morphometric maturity (\(L_{50}\)), the last is the maturity ogive.

```
par(mfrow = c(2,2))
plot(my_ogive_fq, xlab = "Carapace width (mm.)", ylab = "Proportion mature", col = c("blue", "red"))
```

```
## Size at morphometric maturity = 118.7
## Confidence intervals = 116.2 - 121.1
## Rsquare = 0.71
```

```
par(mfrow = c(2,2))
plot(my_ogive_bayes, xlab = "Carapace width (mm.)", ylab = "Proportion mature", col = c("blue", "red"))
```

```
## Size at morphometric maturity = 118.4
## Confidence intervals = 115.9 - 120.8
## Rsquare = 0.71
```

If you want the maturity ogive plot only, you have to add the param `onlyOgive = TRUE`

. Besides you can modify the axis (size, rotation, etc) and add the legend.

`plot(my_ogive_fq, xlab = "Carapace width (mm.)", ylab = "Proportion mature", col = c("blue", "red"), onlyOgive = TRUE)`

```
## Size at morphometric maturity = 118.7
## Confidence intervals = 116.2 - 121.1
## Rsquare = 0.71
```

This methodology has been used mainly in the estimation of morphological sexual maturity in crabs, but it can be extended to other taxas as Agostinho (2000) reported.

For estimating gonadal maturity use the *matFish* database. This database contains two variables:

`total_length`

: Total length in cm.`stage_mat`

: The gonadal maturation stages: I, II, III, IV, where I is considered immature.

```
data(matFish)
head(matFish)
```

```
## total_length stage_mat
## 1 12 I
## 2 12 I
## 3 13 I
## 4 14 I
## 5 14 I
## 6 14 I
```

The function to be used to estimate gonadal maturity is `gonad_mature`

. This function use the logistic approach.

The `gonad_mature`

function requires a data.frame with allometric variables (e.g: total length, fork length, carapace width, etc) and a variable containing the stages of sexual maturity (gonadal maturation stages).

The argument `varNames`

requires a character string indicating the name of one allometric and the stage of sexual maturity variable to be used for analysis (e.g `varNames = c("total_length", "stage_mat")`

). So the argument `varNames`

must contain two character strings only, the first is the allometric variable and the second is the stage of sexual maturity.

The arguments `inmName`

and `matName`

require a character string indicating the name of the stages of sexual maturity in the data.frame. The argument could contain one character string or could be a vector (e.g `inmName = "I"`

, `matName = c("II", "III", "IV")`

). The variable *stage_mat* in the *matFish* database, contains the stages of the sexual maturity. In this case, stage I is considered immature and II, III, IV are mature. Then the stages of sexual maturity are transformed in a binomial variable where immature = 0 and mature = 1.

The argument `method`

requires a character string indicating which regression will be used for the test. If `method = "fq"`

the logit regression is based on GLM (frequentist) and if `method = "bayes"`

a sample from the posterior distribution of a logit regression model using a random walk Metropolis algorithm is generated (see MCMClogit function).

The argument `niter`

requires a number. For the GLM regression (`method = "fq"`

), a non-parametric bootstrap method consists in generate B bootstrap samples, by resampling with replacement the original data. Then all statistics for each parameter can be calculated from each bootstrap sample (median and confidence intervals). For the `method = "bayes"`

, the argument `niter`

is related to the number of Metropolis iterations for the sampler.

The output is an object of class “gonadMat”. This object contains a dataframe with the allometric variable “X” and stage of sexual maturity (immature = 0, mature = 1). Also the fitted values for the curve logistic regression and confidence intervals (95%). If you print the object, the median of the size at gonad maturity estimation (\(L_{50}\)) and parameters are shown.

```
#Frequentist regression
my_ogive_fq = gonad_mature(matFish, varNames = c("total_length", "stage_mat"), inmName = "I",
matName = c("II", "III", "IV" ), method = "fq", niter = 999)
print(my_ogive_fq)
```

`## formula: Y = 1/1+exp-(A + B*X)`

```
## Original Bootstrap (Median)
## A -8.6047 -8.6407
## B 0.356 0.3576
## L50 24.1694 24.1714
## R2 0.5595 -
```

```
#Bayesian regression
my_ogive_bayes = gonad_mature(matFish, varNames = c("total_length", "stage_mat"), inmName = "I",
matName = c("II", "III", "IV" ), method = "bayes", niter = 999)
print(my_ogive_bayes)
```

`## formula: Y = 1/1+exp-(A + B*X)`

```
## Bootstrap (Median)
## A -8.4974
## B 0.3522
## L50 24.1295
## R2 0.5595
```

For plotting the maturity ogive the object of class “gonadMat”is required. The function `plot`

generates 4 graphics: 1), 2) and 3) are histograms for the A, B parameters and the size at gonadal maturity (\(L_{50}\)), the last is the maturity ogive.

```
par(mfrow = c(2,2))
plot(my_ogive_fq, xlab = "Total length (cm.)", ylab = "Proportion mature", col = c("blue", "red"))
```

```
## Size at gonad maturity = 24.2
## Confidence intervals = 23.8 - 24.6
## Rsquare = 0.56
```

```
par(mfrow = c(2,2))
plot(my_ogive_bayes, xlab = "Total length (cm.)", ylab = "Proportion mature", col = c("blue", "red"))
```

```
## Size at gonad maturity = 24.1
## Confidence intervals = 23.7 - 24.5
## Rsquare = 0.56
```

If you want the maturity ogive plot only, you have to add the param `onlyOgive = TRUE`

. Besides you can modify the axis (size, rotation, etc) and add the legend.

`plot(my_ogive_fq, xlab = "Total length (cm.)", ylab = "Proportion mature", col = c("blue", "red"), onlyOgive = TRUE)`