library(sbm)
library(ggplot2)
library(knitr)
library(patchwork)
library(mclust)📊 Analysis of a complete dataset with SBM
Fungus-tree network
Requirements
This tutorial follows the one on sbm. As before, we will need the package sbm package, along with some additional packages for data manipulation and visualization:
1. An antagonistic tree/fungus interaction network
We consider the fungus-tree interaction network studied by Vacher, Piou, and Desprez-Loustau (2008), available with the package sbm :
This data set provides information about \(154\) fungi sampled on \(51\) tree species. It is a list with the following entries:
tree_names: list of the tree species namesfungus_names: list of the fungus species namestree_tree: weighted tree-tree interactions: number of common fungal species two tree species host.fungus_tree: binary fungus-tree interactionscovar_tree: covariates associated to pairs of trees (namely genetic, taxonomic and geographic distances)
data("fungusTreeNetwork")
str(fungusTreeNetwork, max.level = 1)
#> List of 5
#> $ tree_names : Factor w/ 51 levels "Abies alba","Abies grandis",..: 1 2 3 14 42 4 5 6 7 8 ...
#> $ fungus_names: Factor w/ 154 levels "Amphiporthe leiphaemia",..: 1 2 3 4 5 6 7 8 9 10 ...
#> $ tree_tree : num [1:51, 1:51] 0 12 9 3 2 2 0 0 2 7 ...
#> $ fungus_tree : int [1:154, 1:51] 0 0 0 0 1 1 1 0 0 0 ...
#> $ covar_tree :List of 3In what follows, we will first consider the simple tree-tree network in its binary and weighted versions. Then we will consider the bipartite network between trees and fungi.
2. Analysis of the binary 🌳–🌲 network.
2.1 The data
We first consider the binary network where an edge is drawn between two trees when they do share a least one common fungi:
tree_tree_binary <- 1 * (fungusTreeNetwork$tree_tree != 0)The function plotMyMatrix can be used to represent simple or bipartite SBM:
Code
plotMyMatrix(tree_tree_binary, dimLabels = list(row = 'tree', col = 'tree'))2.2 The model
We look for some latent organization of the network by adjusting a simple SBM with the function estimateSimpleSBM. We assume the our matrix is the realization of the SBM model.
The SBM model is defined as: \[\begin{align*} (Z_i) \text{ i.i.d.} \qquad & Z_i \sim \mathcal{M}_K(1, \pi) \\ (Y_{ij}) \text{ indep.} \mid (Z_i) \qquad & (Y_{ij} \mid Z_i=k, Z_j = \ell) \sim \mathcal{B}(\alpha_{k\ell}) \end{align*}\]
The function from sbm package (based on the package : blockmodels) infers this model with Variational EM. Note that simpleSBM refers to standard networks (w.r.t. bipartite).
2.3 Fitting the Bernoulli-SBM
We apply the VEM and ICL criterion to infer the structure of the network.
mySimpleSBM <- tree_tree_binary %>%
estimateSimpleSBM("bernoulli", directed= FALSE,
dimLabels ='tree',
estimOptions = list(verbosity = 0, plot=FALSE))We can plot the ICL with respect to \(K\) as follows.
mySimpleSBM$storedModels %>%
ggplot() +
aes(x = nbBlocks, y = ICL) + geom_line() + geom_point(alpha = 0.5) + geom_vline(xintercept = mySimpleSBM$storedModels$nbBlocks[which.max(mySimpleSBM$storedModels$ICL)],color = "red")We obtain 5 blocks. We can now plot the reordered matrix.
p1 <- plot(mySimpleSBM, type = "data", dimLabels = list(row = 'tree', col = 'tree'))
#> Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
#> ℹ Please use tidy evaluation idioms with `aes()`.
#> ℹ See also `vignette("ggplot2-in-packages")` for more information.
#> ℹ The deprecated feature was likely used in the sbm package.
#> Please report the issue at <https://github.com/GrossSBM/sbm/issues>.
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the sbm package.
#> Please report the issue at <https://github.com/GrossSBM/sbm/issues>.
p2 <- plot(mySimpleSBM, type = "expected", dimLabels = list(row = 'tree', col = 'tree'))
p1 | p23. Analysis of the weighted 🌳–🌲 network
Instead of considering the binary network tree-tree we may consider the weighted network where the link between two trees is the number of fungi they share.
We plot the matrix with function plotMyMatrix:
tree_tree <- fungusTreeNetwork$tree_tree
plotMyMatrix(tree_tree, dimLabels = list(row = 'tree', col = 'tree'))Here again, we look for some latent structure of the network by adjusting a simple SBM with the function estimateSimpleSBM, considering a Poisson distribution on the edges.
The Poisson-SBM model is: \[\begin{align*} (Z_i) \text{ i.i.d.} \qquad & Z_i \sim \mathcal{M}_K(1, \pi) \\ (Y_{ij}) \text{ indep.} \mid (Z_i) \qquad & (Y_{ij} \mid Z_i=k, Z_j = \ell) \sim \mathcal{P}(\exp(\alpha_{kl})) = \mathcal{P}(\lambda_{kl}) \end{align*}\]
mySimpleSBMPoisson <- tree_tree %>%
estimateSimpleSBM("poisson",
directed = FALSE,
estimOptions = list(verbosity = 2 , plot = TRUE),
dimLabels = c('tree'))We now plot the matrix reordered according to the memberships estimated in the SBM. Once again, one can also plot the expected network which, in case of the Poisson model, corresponds to the expected weight of connections between any pair of nodes in the network.
plot(mySimpleSBMPoisson, type = "data") | plot(mySimpleSBMPoisson, type = "expected")The composition of the clusters/blocks can be displayed by now be scrutined.
Cluster composition
#-------------------------------------------
print_cluster = function(Z,names_Z){
names(Z) = sub("\\(.*", "", names_Z) # short tree species names
Z <- sort(Z)
K <- max(Z)
clusters <- lapply(1:K,function(k){names(Z)[Z==k]})
max_len <- max(sapply(clusters, length))
df <- sapply(clusters, function(x) c(x, rep('', max_len - length(x)))) %>% as.data.frame()
names(df) <- paste('Cluster',1:K)
return(df)
}
df <- print_cluster(mySimpleSBMPoisson$memberships, fungusTreeNetwork$tree_names)
knitr::kable(df, caption = "Tree Clusters")| Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|
| Abies alba | Large Maples | Abies nordmanniana | Small Maples | Cupressus sempervirens | Betulus spp |
| Abies grandis | Fagus silvatica | Larix kaempferi | Alnus glutinosa | Pinus brutia | Carpinus betulus |
| Cedrus spp | Fraxinus spp | Picea sitchensis | Castanea sativa | Pinus cembra | Populus tremula |
| Larix decidua | Juglans spp | Pinus halepensis | Quercus ilex | Pinus pinea | Quercus pyrenaica |
| Picea excelsa | Cultivated Poplars | Pinus nigra nigra | Quercus pubescens | Pinus radiata | Sorbus aucuparia |
| Pinus nigra laricio | Prunus avium | Pinus strobus | Quercus suber | Pinus taeda | Sorbus domestica |
| Pinus pinaster | Quercus petraea | Pinus uncinata | Sorbus aria | Platanus hybrida | Taxus baccata |
| Pinus sylvestris | Quercus robur | Tilia spp | Tsuga heterophylla | Thuja plicata | |
| Pseudotsuga menziesii | Quercus rubra | Ulmus spp | |||
| Sorbus torminalis |
We are interested in comparing the two clusterings. To do so we use the alluvial flow plots and compute the ARI .
listMemberships <- list(Binary = mySimpleSBM$memberships)
listMemberships$Weighted <- mySimpleSBMPoisson$memberships
ARI <- adjustedRandIndex(mySimpleSBM$memberships, mySimpleSBMPoisson$memberships)
P <- plotAlluvial(listMemberships)4. Poisson SBM with covariates
We have on each pair of trees 3 covariates, namely the genetic distance, the taxonomic distance and the geographic distance. Each covariate has to be introduced as a matrix: \(X^k_{ij}\) corresponds to the value of the \(k\)-th covariate describing the couple \((i,j)\).
We can also use the sbm package to estimate the parameters of the SBM with covariates.
THE SBM with covariates is: \[\begin{align*} (Z_i) \text{ i.i.d.} \qquad & Z_i \sim \mathcal{M}(1, \pi) \\ (Y_{ij}) \text{ indep.} \mid (Z_i) \qquad & (Y_{ij} \mid Z_i=k, Z_j = \ell) \sim \mathcal{P}(\exp(\alpha_{kl} + x_{ij}^\intercal \theta)) = \mathcal{P}(\lambda_{kl}\exp(x_{ij}^\intercal \theta)) \end{align*}\]
mySimpleSBMCov<- estimateSimpleSBM(
netMat = as.matrix(tree_tree),
model = 'poisson',
directed =FALSE,
dimLabels =c('tree'),
covariates = fungusTreeNetwork$covar_tree,
estimOptions = list(verbosity = 2,plot=TRUE))Code
mySimpleSBMCov$storedModels %>%
ggplot() +
aes(x = nbBlocks, y = ICL) + geom_line() + geom_point(alpha = 0.5) + geom_vline(xintercept = mySimpleSBMCov$storedModels$nbBlocks[which.max(mySimpleSBMCov$storedModels$ICL)],color = "red")We now obtain 4 blocks, which is less than before. As a consequence, the covariates do explain a part of the structure. They do not explain all the structure because \(K\) is still greater tant \(1\).
We can now extract the parameters of interest, namely the block proportions and the effect of the covariates.
mySimpleSBMCov$blockProp
#> [1] 0.3715916 0.2159544 0.2354116 0.1770425
mySimpleSBMCov$covarParam
#> [,1]
#> [1,] 0.1975175
#> [2,] -2.0552576
#> [3,] -0.3586006We could perform a selection of the covariates by estimating the model with a subset of the complete covariate set.
mySimpleSBMCov2<- estimateSimpleSBM(
netMat = as.matrix(tree_tree),
model = 'poisson',
directed =FALSE,
dimLabels =c('tree'),
covariates = fungusTreeNetwork$covar_tree[-1],
estimOptions = list(verbosity = 0,plot=FALSE))
mySimpleSBMCov2$ICL - mySimpleSBMCov$ICLIn this case, mySimpleSBMCov2$ICL - mySimpleSBMCov$ICL >0 so the model without the genetic distance is prefered.
Note that the clusters do not play the same role anymore. They do not encode the structure of the network by only the residual structure after the covariates have been taking into account. As a consequence, they are difficult to interpret. Reordering the matrix with respect to this clustering may not highlight structures as clear as before.
Code
plot(mySimpleSBMCov)5. Analysis of the 🌳-🍄 network
Finally, we go back to the original data and analyze the bipartite tree/fungi interactions. The incidence matrix can be plotted with the function
Code
biAdj <- fungusTreeNetwork$fungus_tree
colnames(biAdj) <- sub("\\(.*", "",fungusTreeNetwork$tree_names)
row.names(biAdj) <- sub("\\(.*", "",fungusTreeNetwork$fungus_names)
plotMyMatrix(t(biAdj), dimLabels=list(col = 'fungis',row = 'tree'),plotOptions = list(colNames=TRUE, rowNames=TRUE))+ theme(
axis.text = element_text(size = 3)
)THE SBM adapted to bipartite network is: \[\begin{align*} (Z_i) \text{ i.i.d.} \qquad & Z_i \sim \mathcal{M}_K(1, \pi) \\ (W_j) \text{ i.i.d.} \qquad & W_j \sim \mathcal{M}_L(1, \rho) \\ (Y_{ij}) \text{ indep.} \mid (Z_i, W_j) \qquad & (Y_{ij} \mid Z_i=k, W_j = \ell) \sim \mathcal{B}(\alpha_{k\ell}) \end{align*}\]
myBipartiteSBM <- estimateBipartiteSBM(
netMat = as.matrix(fungusTreeNetwork$fungus_tree),
model = 'bernoulli',
dimLabels=c(row = 'fungis',col = 'tree'),
estimOptions = list(verbosity = 1,plot = TRUE))We can now plot the reorganized matrix.
Code
plot(myBipartiteSBM, type = "data") | plot(myBipartiteSBM, type = "expected")An interesting structure emerges: block 2 of trees interacts primarily with fungal blocks 2 and 4, whereas tree blocks 1 and 3 mainly interact with fungal blocks 1 and 2. Tree block 4 consists of species associated with relatively few fungal partners.
The clusters of the trees are now:
| Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|
| Picea excelsa | Large Maples | Abies alba | Small Maples |
| Pinus nigra laricio | Castanea sativa | Abies grandis | Alnus glutinosa |
| Pinus pinaster | Fagus silvatica | Abies nordmanniana | Betulus spp |
| Pinus sylvestris | Fraxinus spp | Cedrus spp | Carpinus betulus |
| Pseudotsuga menziesii | Juglans spp | Larix decidua | Cupressus sempervirens |
| Cultivated Poplars | Larix kaempferi | Pinus brutia | |
| Prunus avium | Picea sitchensis | Pinus cembra | |
| Quercus petraea | Pinus halepensis | Pinus radiata | |
| Quercus robur | Pinus nigra nigra | Platanus hybrida | |
| Quercus rubra | Pinus pinea | Populus tremula | |
| Pinus strobus | Quercus ilex | ||
| Pinus taeda | Quercus pubescens | ||
| Pinus uncinata | Quercus pyrenaica | ||
| Quercus suber | |||
| Sorbus aria | |||
| Sorbus aucuparia | |||
| Sorbus domestica | |||
| Sorbus torminalis | |||
| Taxus baccata | |||
| Thuja plicata | |||
| Tilia spp | |||
| Tsuga heterophylla | |||
| Ulmus spp |
The clusters of the fungi are now:
| Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|
| Armillaria ostoyae | Armillaria gallica | Armillaria cepistipes | Amphiporthe leiphaemia |
| Heterobasidion annosum | Armillaria mellea | Caliciopsis pinea | Apiognomonia errabunda |
| Sphaeropsis sapinea | Botrytis cinerea | Cenangium ferruginosum | Apiognomonia veneta |
| Sydowia polyspora | Botryosphaeria stevensii | Ceratocystis ips | Asteroma tiliae |
| Fomitopsis pinicola | Ceratocystis minor | Biscogniauxia mediterranea var. mediterranea | |
| Fusarium oxysporum | Chrysomyxa abietis | Bjerkandera adusta | |
| Haematonectria haematococca | Chrysomyxa ledi var. rhododendri | Blumeriella jaapii | |
| Ophiostoma piceae | Coleosporium tussilaginis f.sp. senecionis-silvatici | Botryosphaeria dothidea | |
| Phaeolus schweinitzii | Cronartium flaccidum | Ceratocystis platani | |
| Cronartium ribicola | Chondrostereum purpureum | ||
| Crumenulopsis sororia | Colpoma quercinum | ||
| Cyclaneusma minus | Cristulariella depraedans | ||
| Cyclaneusma niveum | Cronartium quercuum | ||
| Cylindrocarpon didymum | Cryptodiaporthe castanea | ||
| Delphinella abietis | Cryptostroma corticale | ||
| Gremmeniella abietina | Cryptodiaporthe hystrix | ||
| Herpotrichia juniperi | Cryphonectria parasitica | ||
| Lachnellula willkommii | Cryptodiaporthe populea | ||
| Leptographium lundbergii | Daedaleopsis confragosa | ||
| Leptographium procerum | Dicarpella dryina | ||
| Lirula macrospora | Discella castanea | ||
| Lirula nervisequia var. nervisequia | Drepanopeziza punctiformis | ||
| Lophodermium conigenum | Entoleuca mammata | ||
| Lophodermium piceae | Eutypa lata | ||
| Lophodermium pinastri | Fistulina hepatica | ||
| Lophodermium seditiosum | Fomes fomentarius | ||
| Melampsorella caryophyllacearum | Fomitopsis cytisina | ||
| Melampsora populnea | Fusicoccum quercus | ||
| Mycosphaerella dearnessii | Ganoderma adspersum | ||
| Mycosphaerella pini | Ganoderma applanatum | ||
| Nectria fuckeliana | Ganoderma resinaceum | ||
| Ophiostoma serpens | Gnomonia leptostyla | ||
| Pestalotiopsis funerea | Gymnopus fusipes | ||
| Phacidium coniferarum | Hypoxylon fragiforme | ||
| Phaeocryptopus gaeumannii | Laetiporus sulphureus | ||
| Phellinus chrysoloma | Leucostoma persoonii | ||
| Phellinus hartigii | Melampsora allii-populina | ||
| Phellinus pini | Melampsora laricis-populina | ||
| Pucciniastrum epilobii | Melanconis modonia | ||
| Rhabdocline pseudotsugae | Meripilus giganteus | ||
| Rhizosphaera kalkhoffii | Microsphaera alphitoides | ||
| Rhizosphaera macrospora | Microstroma juglandis | ||
| Rhizosphaera oudemansii | Monochaetia monochaeta | ||
| Rhizina undulata | Mycosphaerella castaneicola | ||
| Sarea resinae | Mycosphaerella microsora | ||
| Sirococcus strobilinus | Nectria cinnabarina | ||
| Sparassis crispa | Nectria coccinea | ||
| Stereum sanguinolentum | Nectria ditissima | ||
| Truncatella hartigii | Nectria veuillotiana | ||
| Valsa kunzei | Neonectria galligena | ||
| Zythiostroma pinastri | Ophiostoma ulmi | ||
| Oudemansiella mucida | |||
| Oxyporus populinus | |||
| Pezicula cinnamomea | |||
| Phellinus igniarius | |||
| Phellinus robustus | |||
| Phloeospora aceris | |||
| Phomopsis juniperivora | |||
| Phomopsis quercella | |||
| Phyllosticta aceris | |||
| Phyllactinia guttata | |||
| Pleurotus ostreatus | |||
| Pleuroceras pseudoplatani | |||
| Podosphaera clandestina var. aucupariae | |||
| Pollaccia radiosa | |||
| Polyporus squamosus | |||
| Pseudoinonotus dryadeus | |||
| Rhytisma acerinum | |||
| Rosellinia necatrix | |||
| Sawadaea tulasnei | |||
| Schizophyllum commune | |||
| Seiridium cardinale | |||
| Septotis podophyllina | |||
| Septoria quercicola | |||
| Stegonsporium pyriforme | |||
| Stereum hirsutum | |||
| Stigmina carpophila | |||
| Taphrina betulina | |||
| Taphrina caerulescens | |||
| Taphrina cerasi | |||
| Taphrina populina | |||
| Titaeosporina tremulae | |||
| Trametes gibbosa | |||
| Trametes versicolor | |||
| Valsa ambiens | |||
| Valsa nivea | |||
| Valsa sordida | |||
| Venturia fraxini | |||
| Venturia inaequalis | |||
| Verticillium dahliae |
6. To go further
Consider your preferred foodweb or plant-pollinator network (or the one from Day 2). Apply the sbm inference and compare the clusters to the ones introduced on Day 2.