library(tidyr)
library(dplyr)
library(osfr)
library(readr)
library(devtools)
#devtools::install_github("HerveAbdi/DistatisR")
library(DistatisR) # covSTATIS analysis
library(PTCA4CATA) # factor map plotting
library(ExPosition) # makeDataNominal()
library(gplots) # heatmap.2()
library(sp) # convex hulls
library(ggplot2)
library(RColorBrewer)
#devtools::install_github("mychan24/superheat")
library(superheat)
## Theme
scatter.theme <- theme(axis.text = element_text(size = 8, color = "#42376B"),
                       axis.title = element_text(size = 10, color = "#42376B"),
                       title = element_text(size = 10, color = "#42376B"),
                       plot.title = element_text(hjust = 0.5))
Apply regular covSTATIS to task fMRI data
Setup
Data info
All data used in this tutorial are stored here. The "Data" repo contains: 144 x 3 functional connectivity text files (n = 144 participants x 3 task conditions: 0-, 1-, and 2-back), a "participants_ages.csv" file with participant information, and an atlas labels file, "Schaefer_100parcel_17network_labels.csv".
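If the data are hosted on OSF (the osfr package is loaded above for this purpose), they can be pulled programmatically. The project id below is a placeholder, not the real repository id, so these lines are left commented out:
# data_node  <- osf_retrieve_node("xxxxx")   # hypothetical placeholder -- replace with the real OSF project id
# data_files <- osf_ls_files(data_node)      # list the FC .txt files and the .csv files
# osf_download(data_files, path = "../Data") # download into the Data folder used below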
After standard preprocessing (see Rieck et al., 2021 Data in Brief), each individual’s voxel-time series were parcellated, for each task condition, using the standard 100 region-17 network Schaefer atlas (Schaefer et al., 2018 Cerebral Cortex). For each participant and condition, Pearson’s correlations were quantified between region pairs, to ultimately obtain 100x100x3 functional connectivity matrices for each individual.
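As a minimal illustration of that last step (not needed here, since the Data folder already contains the connectivity matrices), a single 100 x 100 matrix is simply the Pearson correlation of a parcellated time series; with simulated data:
set.seed(1)
toy_ts <- matrix(rnorm(200 * 100), nrow = 200, ncol = 100) # simulated parcellated data: 200 time points x 100 ROIs
toy_fc <- cor(toy_ts, method = "pearson")                  # 100 x 100 functional connectivity matrix
dim(toy_fc)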
Read in the data
The first step is to read in our connectivity matrices to form a 3D array (called 'X' below) for analysis with covSTATIS. In this tutorial, the 3D array has dimensions 100 x 100 x 432 (ROI x ROI x table, where the 432 tables are the 144 participants x 3 task conditions stacked along the third dimension).
<- "Schaefer100_17"
atlas_prefix <- read_csv('../Data/Schaefer_100parcel_17network_labels.csv')
atlas_labels $Network <- as.factor(atlas_labels$Network)
atlas_labels<- read_csv('../Data/participant_ages.csv')
demog_in <- demog_in$subjectID
ids
## Read in FC .txt files and reshape into 3D array with size n_roi x n_roi x n_sub*n_conditions (i.e. n_fns)
<-list.files(path = '../Data', pattern = paste0('^', atlas_prefix, '.*txt'), full.names = TRUE)
all_fns<-lapply(X=all_fns, FUN=read_delim, delim="\t", col_names = F,col_type = list(.default = col_double()))
X_list<- array(unlist(X_list), c(dim(X_list[[1]]), length(X_list))) #our 3D array
X rm(X_list)
## Label rows and columns with ROI label
if(length(grep("Schaefer", atlas_prefix)) > 0){
rownames(X)<- atlas_labels$ROI_label
colnames(X)<- atlas_labels$ROI_label
}
## Label 3rd dimension with subject and task information (modified from filenames)
dimnames(X)[[3]]<-all_fns %>% gsub(pattern=".txt", replacement = "") %>%
gsub(pattern=paste0(atlas_prefix,"_"), replacement = "")
Run covSTATIS
Now that we have created our 3D data array (X), we can run covSTATIS. We use the distatis function to do so. Since we are dealing with correlation matrices (not distances), the Distance flag should be set to FALSE. The nfact2keep flag denotes the number of components we want to keep (the larger the value, the slower the computation).
The function
In distatis, there are a few other arguments that the user may want to specify according to their research question and data:
Norm
Norm specifies how each individual data table is normalized before it is entered into distatis. By default, Norm is set to "MFA", short for Multiple Factor Analysis (Pagès). MFA normalizes each table by its first singular value. This type of normalization ensures that data tables with larger variance do not dominate the first component of covSTATIS; in other words, every data table contributes the same amount of variance to the analysis.
Other options are "None" (no normalization step is taken), "SUMPCA" (each table is normalized by its total variance), and "NUCLEAR" (each table is normalized by the sum of its singular values).
Our advice: We recommend using at least MFA or SUMPCA normalization.
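As a rough sketch of what the MFA option does (DistatisR may differ in implementation details), dividing a table by its first singular value makes that singular value equal to 1 for every table:
one_table  <- X[,,1]                          # a single subject/condition table
mfa_normed <- one_table / svd(one_table)$d[1] # divide the table by its first singular value
svd(mfa_normed)$d[1]                          # the first singular value is now 1
# For comparison, "SUMPCA" would instead divide each table by its total variance,
# and "NUCLEAR" by the sum of all its singular values.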
double-centering
double_centering centers the rows and columns of each table after the normalization has taken place, before running distatis. By default, double_centering is set to TRUE (i.e., each row/column will have a mean of 0). This step ensures that the factor scores will have a mean of 0 on each dimension. When set to FALSE, normalized (but not double-centered) data are entered into distatis.
Our advice: We suggest that users carefully consider this argument, as double-centering can artificially alter results when 0s are biologically relevant. Specifically, in the case of functional connectivity, when both positive and negative correlation coefficients are analyzed, the 0s in each matrix, as well as the sign of each correlation value, are meaningful. In this case, double-centering will adjust the values and can induce an artificial sign flip (i.e., a weak negative might become a weak positive, or the other way around). Note: when running covSTATIS on non-centered matrices with both positive and negative values, one should interpret the relationship between ROIs in the compromise space based on the cosine of the angle they form with the origin (0 in the compromise component space). On the other hand, when only positive correlations are analyzed, double-centering does not change the data pattern. In this case, the double-centering procedure can improve the visualization of the compromise space (as the origin is at the center of all the ROIs), and the relationships between ROIs can be interpreted based on their distances.
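For intuition, double-centering a table amounts to subtracting its row and column means and adding back the grand mean, so that every row and column averages to 0 (a sketch of the operation, not necessarily DistatisR's internal code):
double_center <- function(S){
  sweep(sweep(S, 1, rowMeans(S)), 2, colMeans(S)) + mean(S) # S_ij - rowmean_i - colmean_j + grand mean
}
one_dc <- double_center(X[,,1])
round(max(abs(rowMeans(one_dc))), 12) # ~0: every row (and column) now has mean 0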
design
design allows users to specify the group structure of the matrices and ensures equal contribution of all groups when generating the weights used to build the compromise. (Default is NULL.)
Our advice: This is useful for users who run group analyses and may be dealing with imbalanced groups (e.g., clinical vs. healthy controls, with the control group being larger). In this example, specifying the design argument prevents the analysis from being dominated by the larger group (e.g., the healthy controls) and therefore places both groups on the same plane (i.e., contributing equally to the patterns). A hypothetical example is sketched below.
There are several other arguments in this function, and we encourage users to choose the settings that best match their research question and data (see ?distatis for the full list).
covstatis_res <- distatis(X,
                          Norm = "MFA", # default
                          Distance = FALSE,
                          double_centering = TRUE, # default
                          nfact2keep = 10,
                          design = NULL, # default
                          compact = FALSE)
Step 1: \(R_V\) similarity matrix
The first operation covSTATIS computes is the generation of an \(R_V\) similarity matrix. The RV matrix quantifies the pairwise similarity among all data tables (squared Pearson’s correlation), irrespective of their rotation or scaling. The RV matrix then undergoes eigenvalue decomposition (EVD). The resulting first component best represents the common pattern across all tables, and its first eigenvector quantifies how similar each table is to this common pattern.
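For reference, the RV coefficient between two symmetric tables is a matrix analogue of a squared correlation; a small sketch (covSTATIS computes it on the normalized, double-centered tables, so values will differ somewhat from those returned by distatis):
rv_coef <- function(S1, S2){ # RV = trace(S1 S2) / sqrt(trace(S1 S1) * trace(S2 S2)) for symmetric tables
  sum(S1 * S2) / sqrt(sum(S1 * S1) * sum(S2 * S2))
}
rv_coef(X[,,1], X[,,2]) # similarity between the first two tables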
Let’s visualize the results of this EVD on the \(R_V\) matrix. The Scree plot shows how much variance is explained by each component/dimension. Ideally, there will be one strong component (as is the case with these data) indicating a strong coherence in the general network organization across all observations.
PlotScree(ev = covstatis_res$res4Cmat$eigValues,
title = "RV-map: Explained Variance per Dimension")
The factor map of the \(R_V\) space shows how similar the data tables are to each other.
rv.labels <- createxyLabels.gen(x_axis = 1,
                                y_axis = 2,
                                lambda = covstatis_res$res4Cmat$eigValues,
                                tau = covstatis_res$res4Cmat$tau,
                                axisName = "Component ")
## factor map
rv.map <- createFactorMap(covstatis_res$res4Cmat$G,
                          title = "The component space of the RV matrix",
                          col.background = NULL,
                          col.axes = "#42376B",
                          width.axes = 1,
                          alpha.axes = 0.5)

rv.map$zeMap_background + rv.map$zeMap_dots + rv.labels + scatter.theme
Step 2: Compromise space
Next, covSTATIS takes the first eigenvector of this EVD on the RV matrix and rescales it so that its elements sum to 1; these rescaled values are the weights (alpha), one per data table. The compromise matrix is then built as the weighted sum of all (normalized) data matrices and is itself submitted to an EVD. Let's first visualize the compromise space/matrix.
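Before plotting, a quick sanity check on the table weights returned by distatis: per the description above, they should be non-negative and sum to 1.
alpha <- covstatis_res$res4Cmat$alpha # table weights derived from the first eigenvector of the RV matrix
c(sum(alpha), min(alpha))             # the sum should be 1 and the smallest weight should be >= 0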
net.col <- as.data.frame(unique(atlas_labels[,c(3,4)]))
rownames(net.col) <- net.col$Network
col.pal <- brewer.pal(11, "RdBu")

superheat(covstatis_res$res4Splus$Splus,
membership.cols = atlas_labels$Network,
membership.rows = atlas_labels$Network,
clustering.method = NULL,
heat.lim = c(-0.1,0.1),
heat.pal = rev(brewer.pal(11, "RdBu")),
heat.pal.values = c(0,0.45,0.5,0.55,1),
left.label.size = 0.08,
bottom.label.size = 0.05,
y.axis.reverse = TRUE,
left.label.col = net.col[levels(atlas_labels$Network),"Network_color"], # order by community name
bottom.label.col = net.col[levels(atlas_labels$Network), "Network_color"],
left.label.text.size = 3,
bottom.label.text.size = 2,
left.label.text.col = "black",
bottom.label.text.col = "black",
left.label.text.alignment = "left",
title = "The compromise")$plot
Next, the compromise undergoes EVD. Let’s visualize its scree plot:
PlotScree(ev = covstatis_res$res4Splus$eigValues,
title = "Compromise: Explained Variance per Dimension")
Let’s now reconstruct a heatmap to explore the dimensions from the EVD on the compromise. Let’s start by visualizing the first two components (LV1, LV2).
covstatis_corrmat_lv1_lv2 <- tcrossprod(covstatis_res$res4Splus$F[,c(1,2)])

superheat(covstatis_corrmat_lv1_lv2,
membership.cols = atlas_labels$Network,
membership.rows = atlas_labels$Network,
clustering.method = NULL,
heat.lim = c(-0.1,0.1),
heat.pal = rev(brewer.pal(11, "RdBu")),
heat.pal.values = c(0,0.45,0.5,0.55,1),
left.label.size = 0.08,
bottom.label.size = 0.05,
y.axis.reverse = TRUE,
left.label.col = net.col[levels(atlas_labels$Network),"Network_color"], # order by community name
bottom.label.col = net.col[levels(atlas_labels$Network), "Network_color"],
left.label.text.size = 3,
bottom.label.text.size = 2,
left.label.text.col = "black",
bottom.label.text.col = "black",
left.label.text.alignment = "left",
title = "Rebuilt Heatmap for LV1 and LV2")$plot
Step 3: global factor scores from compromise space
The compromise space shows us how regional connectivity values relate to each other across all data tables (individuals and conditions). Each point in the compromise space is a global factor score for one ROI from our functional connectivity matrices, colored by its network membership. The closer two points are, the stronger the functional connectivity between those ROIs across the entire dataset.
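If you want to work with the global factor scores directly (e.g., for custom plots or exporting), they are stored in covstatis_res$res4Splus$F (ROIs x components); for example:
F_global <- as.data.frame(covstatis_res$res4Splus$F[, 1:2]) # first two components of the global factor scores
colnames(F_global) <- c("Component1", "Component2")
F_global$Network <- atlas_labels$Network                    # attach a priori network membership
head(F_global)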
network_labels <- unique(atlas_labels$Network)
network_colors <- unique(atlas_labels$Network_color)

# Get labels for the compromise factor map
compromise_label <- createxyLabels.gen(x_axis = 1,
                                       y_axis = 2,
                                       lambda = covstatis_res$res4Splus$eigValues,
                                       tau = covstatis_res$res4Splus$tau,
                                       axisName = "Component ")
# To get graphs with axes 1 and 2:
h_axis <- 1 # component to plot on x-axis
v_axis <- 2 # component to plot on y-axis

compromise_graph_out <- createFactorMap(covstatis_res$res4Splus$F,
                                        axis1 = h_axis, axis2 = v_axis,
                                        title = 'Compromise ROI Space',
                                        col.points = atlas_labels$Network_color,
                                        alpha.points = .6, cex = 3,
                                        col.background = NULL, col.axes = "#42376B", width.axes = 1, alpha.axes = 0.5)

print(compromise_graph_out$zeMap_background + compromise_graph_out$zeMap_dots + scatter.theme + compromise_label)
We can also compute the average of all the ROIs (i.e., barycenter) for each a priori network to get a coarser network compromise map.
network_mean_F <- t(apply(makeNominalData(as.matrix(atlas_labels$Network)), 2, function(x){x/sum(x)})) %*% covstatis_res$res4Splus$F
row.names(network_mean_F) <- network_labels

compromise_network_graph_out <- createFactorMap(network_mean_F,
                                                axis1 = h_axis, axis2 = v_axis,
                                                title = "Compromise map for network-level barycenters",
                                                col.points = network_colors, col.labels = network_colors,
                                                alpha.points = .8, cex = 5, text.cex = 4, alpha.labels = .8, pch = 18,
                                                col.background = NULL, col.axes = "#42376B", width.axes = 1, alpha.axes = 0.5)

print(compromise_network_graph_out$zeMap + compromise_network_graph_out$zeMap_dots + scatter.theme + compromise_label)
Both the ROI-level and network-level factor maps can be overlaid to show how individual ROIs cluster around their respective network barycenters, for the whole sample.
print(compromise_graph_out$zeMap_background + compromise_graph_out$zeMap_dots +
        compromise_network_graph_out$zeMap_dots + compromise_network_graph_out$zeMap_text + scatter.theme + compromise_label)
Step 4: partial factor scores from compromise space
Using partial projection, we can now project each condition back onto the compromise space (i.e., partial factor scores) to understand how each correlation matrix fits in the compromise (or sample-level) space.
To do that, we should first create a design matrix specifying which slice in our X cube corresponds to which n-back condition:
<- vector("character",dim(X)[3])
condition_design grep("*_nbk_0b",dimnames(X)[[3]])]<-"0b"
condition_design[grep("*_nbk_1b",dimnames(X)[[3]])]<-"1b"
condition_design[grep("*_nbk_2b",dimnames(X)[[3]])]<-"2b" condition_design[
Now we can compute and plot the partial maps for the different n-back conditions, for each ROI. This plot shows how much the connectivity for each ROI is similar/different across task conditions (closer to the ROI center: more similar; farther from ROI center: more different).
# Compute the partial map
F_j <- covstatis_res$res4Splus$PartialF
alpha_j <- covstatis_res$res4Cmat$alpha

# create the groups based on design
code4Groups <- unique(condition_design)
nK <- length(code4Groups)

# initialize F_k and alpha_k
F_k <- array(0, dim = c(dim(F_j)[[1]], dim(F_j)[[2]], nK))
dimnames(F_k) <- list(dimnames(F_j)[[1]],
                      dimnames(F_j)[[2]], code4Groups)
alpha_k <- rep(0, nK)
names(alpha_k) <- code4Groups

Fa_j <- F_j
for (j in 1:dim(F_j)[[3]]){ Fa_j[,,j] <- F_j[,,j] * alpha_j[j] }

for (k in 1:nK){
  lindex <- condition_design == code4Groups[k]
  alpha_k[k] <- sum(alpha_j[lindex])
  F_k[,,k] <- (1/alpha_k[k]) * apply(Fa_j[,,lindex], c(1,2), sum)
}
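An optional sanity check on these quantities: a property of DISTATIS-type analyses is that the compromise (global) factor scores are the alpha-weighted barycenter of the partial factor scores; if DistatisR scales its partial scores differently, this may only hold up to a constant.
F_recovered <- apply(Fa_j, c(1, 2), sum)          # Fa_j already holds alpha_j * F_j for every table
max(abs(F_recovered - covstatis_res$res4Splus$F)) # should be ~0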
compromise_pfs <- createPartialFactorScoresMap(
  factorScores = covstatis_res$res4Splus$F,
  partialFactorScores = F_k,
  axis1 = 1, axis2 = 2,
  colors4Items = as.vector(atlas_labels$Network_color),
  colors4Blocks = c("yellowgreen", "olivedrab", "darkgreen"),
  names4Partial = dimnames(F_k)[[3]],
  shape.points = 20)

print(compromise_graph_out$zeMap_background + compromise_graph_out$zeMap_dots +
        compromise_pfs$mapColByBlocks + ggtitle('Compromise map - Partial factor scores by task design') + scatter.theme + compromise_label)
That’s a very busy plot! Let’s clean it up to make things easier to interpret. We can average the partial factor scores within each a priori network, so that each condition is displayed around the mean network compromise. Similar to our ROI plot, for each network, the more spread out the conditions are from the network mean, the more diverse the functional connectivity is across task conditions. Tip: if you have groups, it is sometimes helpful to separate the partial factor scores per group and plot them individually (e.g., you may not be able to see a difference if everything is plotted together).
F_k_network <- array(NA, dim = c(length(network_labels), dim(F_k)[2], dim(F_k)[3]))
dimnames(F_k_network) <- list(network_labels, dimnames(F_k)[[2]], dimnames(F_k)[[3]])

for(i in 1:dim(F_k_network)[3]){
  F_k_network[,,i] <- t(apply(makeNominalData(as.matrix(atlas_labels$Network)), 2, function(x){x/sum(x)})) %*% F_k[,,i]
}
compromise_pfs_network <- createPartialFactorScoresMap(
  factorScores = network_mean_F,
  partialFactorScores = F_k_network,
  axis1 = 1, axis2 = 2,
  colors4Items = network_colors,
  colors4Blocks = c("yellowgreen", "olivedrab", "darkgreen"),
  names4Partial = dimnames(F_k_network)[[3]],
  size.labels = 3.5, size.points = 2, shape.points = 20)

print(compromise_network_graph_out$zeMap + compromise_pfs_network$mapColByBlocks + ggtitle('Network level compromise map - Partial factor scores by task design') + scatter.theme + compromise_label)
Step 5: area of the convex hull
We can now compute the area of the convex hull around the partial factor scores for each individual at the network level (NB: this can be done at the regional level too). Area-of-the-hull scores can be seen as “task condition differentiation scores”: the larger the area of the hull, the more diverse the network’s functional connectivity is across task conditions (conditions more distant from the network mean); the smaller the area of the hull, the more homogeneous the functional connectivity across 0-, 1-, and 2-back. Note: area-of-the-hull values are arbitrary, so you can go ahead and normalize them. Make sure you have at least 4 points to build the area of the hull (here you’ll see warnings because some networks have fewer than 4 observations/ROIs).
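To make the hull computation below concrete, here is a toy example of the two functions used: chull() returns the indices of the convex hull vertices, and sp::Polygon() gives the enclosed area.
toy_pts  <- cbind(c(0, 1, 1, 0), c(0, 0, 1, 1)) # four points forming a unit square
toy_hull <- chull(toy_pts[, 1], toy_pts[, 2])   # indices of the convex hull vertices
Polygon(toy_pts[toy_hull, ], hole = FALSE)@area # area of the hull = 1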
## Create empty matrix to put results in, should be # participants (rows) by # networks (columns)
task_diff_network_level <- data.frame(matrix(0, length(ids), length(network_labels)))
rownames(task_diff_network_level) <- ids
names(task_diff_network_level) <- network_labels

partialF <- covstatis_res$res4Splus$PartialF

for(i in 1:length(ids)){
  ### Grab all the partialF values associated with this subject for LV1
  this_lv1 <- partialF[, 1, grep(paste0(ids[i], "_*"), dimnames(partialF)[[3]])]
  ## Transform the node-level partialF to network-level partialF for LV1
  this_lv1_network <- t(apply(makeNominalData(as.matrix(atlas_labels$Network)), 2,
                              function(x){x/sum(x)})) %*% as.matrix(this_lv1)

  ### Grab all the partialF values associated with this subject for LV2
  this_lv2 <- partialF[, 2, grep(paste0(ids[i], "_*"), dimnames(partialF)[[3]])]
  ## Transform the node-level partialF to network-level partialF for LV2
  this_lv2_network <- t(apply(makeNominalData(as.matrix(atlas_labels$Network)), 2,
                              function(x){x/sum(x)})) %*% as.matrix(this_lv2)

  ## Loop through each network, get the LV1 and LV2 points, and compute the area of the hull around the points
  for(n in 1:length(network_labels)){
    hull_pts <- chull(this_lv1_network[n,], this_lv2_network[n,]) # all points are included here; alternatively you could keep only a proportion of them (e.g., 95%) to reduce the influence of outliers
    this_hull_coords <- cbind(this_lv1_network[n, hull_pts], this_lv2_network[n, hull_pts])
    this_poly_area <- Polygon(this_hull_coords, hole = F)@area
    task_diff_network_level[i, n] <- this_poly_area
  }
}
#print(task_diff_network_level) #you can go ahead and save this output and relate it to e.g., behavior
Area of the hull scores can be used in secondary analyses, such as brain-behavior correlation. Since our sample is an adult lifespan sample, we can relate area of the hull scores to age. This plot shows how Default A functional connectivity becomes more homogeneous/less differentiated across tasks as a function of age.
secondary_analyses <- cbind(demog_in$Age, task_diff_network_level)
secondary_analyses <- as.data.frame(secondary_analyses)
colnames(secondary_analyses)[1] <- "Age"
colnames(secondary_analyses)[14] <- "DefaultA"

ggplot(secondary_analyses, aes(x = Age, y = DefaultA)) +
  geom_point() +
  geom_smooth(method = lm)
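To quantify the association shown in the plot, one could also run a simple correlation test (or a linear model with covariates of interest), keeping in mind that hull areas are in arbitrary units:
cor.test(secondary_analyses$Age, secondary_analyses$DefaultA)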