We are unable to provide access to the complete raw data sets, but we have uploaded de-identified snippets so that the key processes can be replicated.
Outlier detection
To ensure data quality while preserving natural variability, we implemented a conservative outlier detection approach based on robust scale estimation (the Qn method), rather than relying on the standard deviation, which can be heavily influenced by the extreme values themselves (Rousseeuw & Croux, 1993).
The Qn method calculates pairwise differences between observations and selects a scaled quantile of these differences. It has been shown to be a robust measure of dispersion that remains resistant to contamination even when up to 50% of the data are outliers, making it substantially more reliable than traditional methods in the presence of extreme values (Rousseeuw & Croux, 1993).
For each subject and variable combination, we established outlier boundaries by:
Lower bound = Median − (2.75 × Qn scale)
Upper bound = Median + (2.75 × Qn scale)
The multiplier of 2.75 was deliberately chosen as a conservative threshold that retains most natural variability while excluding only the most extreme observations. All observations flagged as potential outliers were scrutinized by domain experts before removal.
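For illustration, the bound construction can be sketched in base R using an uncorrected version of the Qn estimator (a production analysis would use a finite-sample-corrected implementation such as robustbase::Qn); the data below are simulated, not from the study:

```r
# Sketch of Qn-based outlier bounds. qn_scale() is the uncorrected
# Rousseeuw & Croux (1993) estimator: a scaled order statistic of the
# pairwise absolute differences.
qn_scale <- function(x) {
  n <- length(x)
  d <- sort(abs(outer(x, x, "-")[lower.tri(diag(n))]))  # pairwise |xi - xj|, i > j
  h <- floor(n / 2) + 1
  2.2219 * d[choose(h, 2)]                              # scaled quantile of the diffs
}

set.seed(1)
x <- c(rnorm(95, mean = 50, sd = 5), 120, 130, -40, 150, 140)  # 5 planted extremes

qn <- qn_scale(x)
lower <- median(x) - 2.75 * qn
upper <- median(x) + 2.75 * qn
flagged <- x < lower | x > upper  # candidates for expert review, not automatic removal
```

Because Qn ignores up to half the data being contaminated, the five planted extremes barely move the bounds, whereas a standard-deviation rule would inflate them.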
Rousseeuw, P. J., & Croux, C. (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association, 88(424), 1273–1283. https://doi.org/10.1080/01621459.1993.10476408
Across the variables retained for further analysis, between 7 and 31 observations per test (0.91% to 4.13% of total observations) were flagged and subsequently removed prior to analysis.
1) Constant Velocity identification
The steps below were used to identify constant velocity (CV) during the High-intensity Intermittent-fixed Submaximal Run (HI-IR). Constant velocity was identified consistent with the method outlined by Garret et al.
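As a hedged illustration only (the actual criteria follow Garret et al.; the window length and tolerance below are invented), a CV phase can be flagged as the samples whose rolling-mean velocity stays within a tolerance band of the target speed:

```r
# Illustrative CV-phase flag: TRUE where the centred rolling-mean velocity
# is within +/- tol of the target speed. Thresholds are hypothetical.
roll_mean <- function(x, k) as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))

flag_cv <- function(vel, target, tol = 0.2, k = 25) {
  rm <- roll_mean(vel, k)              # NA at the edges where the window is incomplete
  !is.na(rm) & abs(rm - target) < tol  # within tolerance of the target speed
}

# Synthetic 100 Hz trial: accelerate, hold ~4 m/s, decelerate
set.seed(2)
vel <- c(seq(0, 4, length.out = 50),
         rep(4, 200) + rnorm(200, 0, 0.05),
         seq(4, 0, length.out = 50))
cv_prop <- mean(flag_cv(vel, target = 4))  # proportion of samples flagged as CV
```

Acceleration and deceleration samples fall outside the band, so only the plateau is retained.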
Accelerometer data collected within the CV phase were then used to calculate player load for all three vectors, using the formula outlined in the manuscript. Vertical accelerometer data were additionally used to calculate the number of steps achieved during the constant velocity phase; the step detection method is outlined below.
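The manuscript's exact player load formula is not reproduced here; as a generic sketch of a single-vector accumulated load (the sum of absolute sample-to-sample changes in acceleration), with a synthetic signal:

```r
# Hypothetical single-vector accumulated load: sum of absolute
# sample-to-sample changes in acceleration. This is a common form of
# vector load, not necessarily the manuscript's exact formula.
vector_load <- function(acc) sum(abs(diff(acc)))

# Synthetic vertical acceleration trace standing in for a 100 Hz CV phase
acc_up <- sin(seq(0, 4 * pi, length.out = 400))
load_up <- vector_load(acc_up)
```

Applied separately to the up, forward, and side vectors, this yields the per-vector loads summarised later in the analysis.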
2) Step detection and Normalization of High-intensity Intermittent-fixed Submaximal Run (HI-IR)
Prior to analysis, three trials were excluded from the study due to insufficient data. The exclusion criterion was a minimum of 6 steps identified during the constant velocity phase; this threshold ensured at least three steps per leg were captured, which we deemed the bare minimum for accurate assessment of the collected parameters during constant velocity (44). Step identification was performed using vertical acceleration peaks from the 100 Hz Inertial Measurement Unit (IMU) data. We employed components of a method described by Benson et al. (45), implemented using the pracma package. All identified peaks were visually inspected for each athlete-trial to ensure accuracy.
An example of this process is outlined below.
# load data
SD_url <- "https://raw.githubusercontent.com/R2mu/AdditionalMaterials/main/data_AddMat/exampleStepDetection.csv"
stepDet <- fread(SD_url)

# find peaks algorithm
# settings were chosen based on visual inspection as those that most reliably
# identified peaks in contact across all individuals
find_peaks <- function(x, minPeakHeight = 1.2) {
  peak_indices <- pracma::findpeaks(x, threshold = 0, minpeakdistance = 16,
                                    minpeakheight = minPeakHeight, zero = "+")[, 2]
  peak_indicator <- as.integer(seq_along(x) %in% peak_indices)
  return(peak_indicator)
}

# implementation of find_peaks across three sample players on a smoothed
# vertical acceleration signal; smoothing uses a low-pass Butterworth filter
stepDet <- stepDet[, bw.25_acc.up := round(filtfilt(butter(1, .25, type = "low"), acc.up), 2), by = .(ID)
                 ][, peak := find_peaks(bw.25_acc.up, minPeakHeight = 1.2), by = .(ID)]
Visual representation of step detection. This was performed for each testing occasion to ensure appropriate detection across all players.
The detected steps were then summarised along with other key Player Load-derived measures to confirm that a minimum of 6 steps was available for further analysis. Below is an example of how this was summarised for each player on each testing occasion.
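A minimal sketch of such a summary, using hypothetical column names and made-up peak indicators:

```r
# Hypothetical per-player, per-occasion step summary; column names and
# values are illustrative, not from the study data.
library(data.table)

stepDet <- data.table(
  ID = rep(c("A", "B"), each = 10),
  occasion = 1,
  peak = c(rep(1, 7), 0, 0, 0,      # player A: 7 detected steps
           rep(1, 5), rep(0, 5))    # player B: 5 detected steps
)

step_summary <- stepDet[, .(nsteps = sum(peak)), by = .(ID, occasion)]
step_summary[, keep := nsteps >= 6]  # minimum 6 steps for inclusion
```

Trials failing the 6-step criterion (player B here) are flagged for exclusion before any further Player Load analysis.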
As both the duration and velocity of the HI-IR plateau directly influence accumulated accelerometer-vector loads, and both were observed to vary from week to week, an adjusted estimate was developed. Below is an outline of the steps followed.
Significant improvements from the null model to the fully specified model are noted above, which we also assessed on the hold-out data below.
model_list <- list(
  null_model = adj.VT_null,
  sparse_re  = adj.VT_sml,  # simplified random effects structure
  full_re    = adj.VT_fs    # maximal random effects structure
  # acc_AP = log_adj_AP,    placeholder for acc.fwd
  # acc_ML = log_adj_ML     placeholder for acc.side
)

for (model_name in names(model_list)) {
  current_model <- model_list[[model_name]]
  tst_dat <- test_data[, preds := predict(current_model, .SD)
                     ][!is.na(preds),
                     ][, diff_sq := (logVT - preds)^2]
  rmse <- tst_dat[, sqrt(mean(diff_sq))]
  print(paste0("test data RMSE for ", model_name, ": ", round(rmse, 2)))
}
[1] "test data RMSE for null_model: 0.25"
[1] "test data RMSE for sparse_re: 0.13"
[1] "test data RMSE for full_re: 0.12"
While the full model exhibited a significant dispersion result, potentially indicating some over-specification of the random effects structure, the remaining diagnostic measures were satisfactory. Given that our primary focus was predictive performance, and that we believe the full RE structure is more representative of the data at hand, we retained the full RE model.
ref_data <- copy(HIIR_dat)[, mu.vel_CV_SMF1 := round(mean(mu.vel_CV_SMF1), 2)
                         ][, tm_CV_SMF1 := round(mean(tm_CV_SMF1), 2)
                         ][, GH := round(mean(GH), 2)
                         ][, .(Season, ID, GH, nsteps_CV_SMF1, mu.vel_CV_SMF1, tm_CV_SMF1)
                         # create reference prediction based on average values
                         ][, refPred_VT := round(predict.gam(log_adj_VT_fin, .SD), 2)
                         # some data cleaning for merging later
                         ][, setnames(.SD, "mu.vel_CV_SMF1", "mu_vel")
                         ][, setnames(.SD, "tm_CV_SMF1", "mu_time")
                         ][, !c("GH", "nsteps_CV_SMF1")]
Create a prediction grid based on observed values for that day
ind_pred <- HIIR_dat[, .(Season, ID, GH, nsteps_CV_SMF1,
                         pl.up_CV_SMF1, pl.fwd_CV_SMF1, pl.side_CV_SMF1,
                         mu.vel_CV_SMF1, tm_CV_SMF1, logVT)
                   # individual prediction: this would be extended for AP and ML
                   ][, indPred_VT := round(predict.gam(log_adj_VT_fin), 3)]
toi <- paste0("value_", c("ave_GB", "ConPV_mx", "JHIM.mx", "FTCT.mx", "FZVmx",
                          "EccDur.mn", "TTPP.mn", "tm_CV_SMF1", "mu.vel_CV_SMF1",
                          "logAdj_pl.up", "logAdj_pl.fwd", "logAdj_pl.side",
                          "HRmu_perc", "pl.up_CV_SMR1", "pl.fwd_CV_SMR1",
                          "pl.side_CV_SMR1"))
Using the above data, and to help contextualise the results, we ran a sensitivity analysis to give an indication of the smallest correlation we could observe with 80% power at an alpha of 0.05.
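A minimal sketch of that calculation using the Fisher z approximation (the sample size below is hypothetical, not the study's n):

```r
# Smallest detectable correlation via the Fisher z approximation:
# solve power = 0.80 at alpha = 0.05 (two-sided) for |r|, given n.
min_detectable_r <- function(n, alpha = 0.05, power = 0.80) {
  z <- (qnorm(1 - alpha / 2) + qnorm(power)) / sqrt(n - 3)
  tanh(z)  # back-transform from Fisher z to r
}

min_detectable_r(n = 100)  # ~0.28 for a hypothetical n of 100
```

Larger samples shrink the smallest detectable correlation, so the same calculation across the observed per-variable sample sizes gives the sensitivity bounds reported.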
Between-Athlete Baseline: how much athletes differ from each other in their overall score (e.g., the between-athlete SD for jump height is 4.2 cm).
Between-Athlete Trajectory: how much athletes differ from each other in their rate of change over the season.
Between-Season Baseline (within athlete): how much an individual's average performance varies between seasons.
Between-Season Trajectory (within athlete): how much an individual's rate of change varies between seasons.
Residual: remaining variation within an athlete-season after accounting for all other patterns (within-subject variability).
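As an illustration of how such components translate into proportions of total variance, with made-up SDs (only the 4.2 cm between-athlete SD comes from the text; the rest are invented):

```r
# Hypothetical variance decomposition for jump height (cm). SDs are squared
# to variances, then expressed as a percentage of the total.
sds <- c(between_athlete      = 4.2,  # from the text
         between_athlete_traj = 0.6,  # invented
         between_season       = 1.8,  # invented
         between_season_traj  = 0.4,  # invented
         residual             = 2.1)  # invented

vars <- sds^2
pct  <- round(100 * vars / sum(vars), 1)  # % of total variance per component
```

With these numbers the between-athlete baseline dominates, which is the typical pattern motivating the per-athlete reference predictions used above.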
Running a sensitivity analysis for a GAM is not as straightforward as for a linear model, because the potential functional form is not known a priori. Instead, to provide insight into the stability of our model estimates, we performed a post-hoc sensitivity analysis: we refitted the final model to multiple random subsamples of the data. To help contextualise the model's stability, we plotted the resulting distributions of the p-values and the Effective Degrees of Freedom (EDF) for our key predictors.
pacman::p_load(doSNOW, foreach, doParallel, progress, tidyr, dplyr, mgcv)

# Set up variables
subject_ids <- unique(CVTE_dat$ID)
n_subjects <- length(subject_ids)
vars <- c(2, 10, 12, 13)
subsample_props <- c(0.5, 0.6, 0.7, 0.8, 0.9)
n_iterations <- 100

param_grid <- expand_grid(dependent_var_index = vars,
                          prop = subsample_props,
                          iteration = 1:n_iterations) |>
  mutate(dependent_var_name = DVOI_scaled[dependent_var_index])

## Helper function to run in parallel below
run_single_model <- function(dv_name, proportion, iter_id) {
  # Set a seed for reproducibility
  set.seed(iter_id * 1000 + proportion * 100)
  # Sample subjects
  subsample_ids <- sample(subject_ids, size = floor(n_subjects * proportion), replace = FALSE)
  sub_dat <- CVTE_dat[ID %in% subsample_ids, ]
  # Build and run the model
  formula_str <- paste0(dv_name,
    '~ s(Zscr_WI_TD_acu7, bs = "ts") + s(Zscr_WI_VHSR_25_acu7, bs = "ts") +
       s(Week, by = Season, bs = "cs") + s(ID, bs = "re") + s(Week, ID, bs = "re") +
       s(Week, Season, ID, bs = "re") + s(Season, ID, bs = "re")')
  gam_subsample <- bam(as.formula(formula_str), data = sub_dat,
                      discrete = TRUE, method = "fREML", select = TRUE)
  # Extract results and add metadata
  summary_df <- summary(gam_subsample)$s.table %>%
    as.data.frame() %>%
    tibble::rownames_to_column("var") %>%
    mutate(iteration = iter_id,
           sample_proportion = proportion,
           dependent_var = dv_name)
  return(summary_df)
}

## Set up for parallel processing
NCKF <- 4  # number of cores to keep free (update based on your computer)
cl <- makeCluster(detectCores() - NCKF)
registerDoSNOW(cl)  # needs to be doSNOW for the progress bar

# Create progress bar function
pb <- progress_bar$new(format = "Simulating [:bar] :percent | ETA: :eta",
                       total = nrow(param_grid), width = 80)
progress_opts <- list(progress = function(n) pb$tick())

# Set up and run
final_results_df <- foreach(i = 1:nrow(param_grid),
                            .combine = 'rbind',
                            .packages = c('mgcv', 'dplyr', 'tibble', 'data.table'),
                            .export = c('run_single_model', 'subject_ids', 'n_subjects', 'CVTE_dat'),
                            .options.snow = progress_opts) %dopar% {
  # Get parameters and run the model
  dv_name    <- param_grid$dependent_var_name[i]
  proportion <- param_grid$prop[i]
  iter_id    <- param_grid$iteration[i]
  run_single_model(dv_name, proportion, iter_id)
}