CIMP analysis
22 Jul 2015
Introduction
This page contains the analysis pipeline for the "CIMP analysis". Source code is available here. The whole analysis can be reproduced by running run_all.R once the data have been processed.
Data processing
Data processing is done by running the run_all.R script located in data/src. The procedure is standard (see report).
Data analysis
Part A: Analysis of average CGI+SS patterns
DiseaseList <- c('BLCA','BRCA','COAD','LUAD','STAD')
1) We first assess CIMP in each tissue with the same methodology: hierarchical clustering of the genome-wide methylation profiles, restricted to the top 5% most variable probes in each disease.
source('fun/analyze_CIMP_all_CGIs.R')
for (DiseaseName in DiseaseList)
{
  print(DiseaseName)
  out <- analyze_CIMP_all_CGIs(DiseaseName = DiseaseName, CIMP.Number = 2, calc.Var = TRUE)
}
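As an illustration of the approach described in step 1, here is a minimal sketch on simulated data. It is not the code of analyze_CIMP_all_CGIs, whose source is not shown here: the toy matrix, the Euclidean distance and the Ward linkage are all assumptions made for the example.

```r
# Toy beta-value matrix (assumption): rows are probes, columns are samples.
set.seed(1)
n.probes <- 1000
n.samples <- 40
beta <- matrix(runif(n.probes * n.samples, 0, 0.3), nrow = n.probes,
               dimnames = list(paste0("probe", seq_len(n.probes)),
                               paste0("sample", seq_len(n.samples))))
# Simulate a CIMP-like group: the first 50 probes are hypermethylated
# in the first 15 samples.
beta[1:50, 1:15] <- beta[1:50, 1:15] + 0.6

# Keep the top 5% most variable probes.
var.thresh <- 5
probe.var <- apply(beta, 1, var)
n.keep <- ceiling(n.probes * var.thresh / 100)
top.probes <- names(sort(probe.var, decreasing = TRUE))[seq_len(n.keep)]

# Hierarchical clustering of the samples on the selected probes,
# cut into 2 groups (distance and linkage are assumptions).
hc <- hclust(dist(t(beta[top.probes, ])), method = "ward.D2")
cluster <- cutree(hc, k = 2)
table(cluster)
```

Here the number of groups passed to cutree plays the role of CIMP.Number in the calls above.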
2) We assess the robustness of the clusters by varying the fraction of CGIs considered from 1% to 10%. At the same time, we also look at the stability of a 3-cluster solution to assess the existence of a CIMP-low phenotype.
var.list <- c(1,2,5,10)
CIMP.list <- c(2,3)
source('fun/analyze_CIMP_all_CGIs.R')
for (CIMP.Number in CIMP.list)
{
  for (var.thresh in var.list)
  {
    for (DiseaseName in DiseaseList)
    {
      print(paste0('Analyzing ', DiseaseName, '...', ' with var=', var.thresh, '% and CIMP.number=', CIMP.Number))
      out <- analyze_CIMP_all_CGIs(DiseaseName = DiseaseName, CIMP.Number = CIMP.Number, calc.Var = FALSE, var.thresh = var.thresh)
    }
  }
}
### 2.A) Look at cluster robustness
source('fun/cluster_analysis.R')
out <- cluster_analysis(DiseaseList=DiseaseList, var.thresh=var.thresh)
### 2.B) Look at cluster robustness given the var.thresh
source('fun/cluster_analysis_var.R')
var.list <- c(1,2,5,10)
for (Disease in DiseaseList)
{
  out <- cluster_analysis_var(DiseaseName = Disease, CIMP.Number = 3, var.list = var.list)
}
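One possible way to quantify the robustness examined in step 2 is to cluster at each variance threshold and compare the resulting partitions pairwise. The sketch below uses the Rand index on simulated data. The helper names cluster_at and rand_index are hypothetical, and nothing here is taken from cluster_analysis or cluster_analysis_var.

```r
# Toy data (assumption): 2000 probes, 30 samples, with a hypermethylated block.
set.seed(2)
beta <- matrix(runif(2000 * 30, 0, 0.3), nrow = 2000)
beta[1:200, 1:10] <- beta[1:200, 1:10] + 0.6

# Hypothetical helper: cluster samples on the top var.thresh% most variable probes.
cluster_at <- function(beta, var.thresh, k) {
  probe.var <- apply(beta, 1, var)
  n.keep <- ceiling(nrow(beta) * var.thresh / 100)
  keep <- order(probe.var, decreasing = TRUE)[seq_len(n.keep)]
  cutree(hclust(dist(t(beta[keep, , drop = FALSE])), method = "ward.D2"), k = k)
}

# Hypothetical helper: Rand index, i.e. the fraction of sample pairs on which
# two partitions agree (both together or both apart).
rand_index <- function(a, b) {
  same.a <- outer(a, a, "==")
  same.b <- outer(b, b, "==")
  pairs <- upper.tri(same.a)
  mean(same.a[pairs] == same.b[pairs])
}

var.list <- c(1, 2, 5, 10)
labels <- lapply(var.list, cluster_at, beta = beta, k = 2)
agreement <- sapply(labels, function(a) sapply(labels, function(b) rand_index(a, b)))
dimnames(agreement) <- list(paste0(var.list, "%"), paste0(var.list, "%"))
agreement
```

Values close to 1 off the diagonal indicate that the partition barely depends on the threshold. Running the same comparison with k = 3 gives a rough view of whether a third (CIMP-low) group is stable.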
3) We fix the CIMP-signature to the top 5% most variable CGIs, rather than another cutoff, as a tradeoff between keeping only relevant probes and having wide enough coverage. We then analyze whether the tissue-specific CIMP-signatures share a common panel of probes.
var.thresh <- 5
source('fun/compare_panel_all_CGIs.R')
DiseaseList <- c('BRCA','BLCA','COAD','LUAD','STAD')
out <- compare_panel_all_CGIs(DiseaseList, var.thresh=var.thresh)
We obtain a subset of 89 CGIs common between all the CIMP-signatures.
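Conceptually, the common panel is the intersection of the tissue-specific signatures. A minimal sketch with made-up CGI identifiers (the names below are placeholders, not real results, and this is not the code of compare_panel_all_CGIs):

```r
# Hypothetical tissue-specific signatures (placeholder CGI names).
signatures <- list(
  BLCA = c("cgi_01", "cgi_02", "cgi_03", "cgi_05"),
  BRCA = c("cgi_02", "cgi_03", "cgi_04"),
  COAD = c("cgi_02", "cgi_03", "cgi_05", "cgi_06"),
  LUAD = c("cgi_01", "cgi_02", "cgi_03"),
  STAD = c("cgi_02", "cgi_03", "cgi_06")
)

# CGIs present in every signature.
common.panel <- Reduce(intersect, signatures)
common.panel
length(common.panel)
```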
4) By combining the samples from the different tissues, we then perform clustering on this common CIMP-signature:
source('fun/analyze_CIMP_all_CGIs_bis.R')
out <- analyze_CIMP_all_CGIs_bis(DiseaseList,CIMP.Number = 2)
4) We then analyze whether the methylation aberrations can be associated with transcriptomic or genetic variations:
4.A) Can we predict CIMP from gene expression variations, i.e., CIMP = f(gene expression)?
We propose to tackle this problem using a sparse logistic regression with different formulations:
i. In the first case we predict the CIMP status for each tissue separately:
source('fun/predict_CIMP_GE_glmnet.R')
for (DiseaseName in DiseaseList)
{
  print(DiseaseName)
  out <- predict_CIMP_GE(DiseaseName, var.thresh = var.thresh, CIMP.Number = 2, centered = TRUE, scaled = TRUE, intercept = TRUE, n.folds = 3, bootstrap = 100, cores = 10, log_exp = TRUE, balanced = TRUE)
}
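For readers unfamiliar with sparse logistic regression, the sketch below fits an L1-penalized (lasso) logistic model with the glmnet package on simulated data, using 3-fold cross-validation to choose the penalty. It only illustrates the technique: the data are simulated, and the bootstrap, class balancing and log-transformation options passed to predict_CIMP_GE above are not reproduced.

```r
library(glmnet)

# Simulated expression matrix (assumption): samples in rows, genes in columns.
set.seed(3)
n <- 120
p <- 200
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
# Simulated CIMP status driven by the first three genes only.
y <- rbinom(n, 1, plogis(2 * x[, 1] - 2 * x[, 2] + 1.5 * x[, 3]))

# Lasso logistic regression (alpha = 1) with 3-fold cross-validation.
fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 3,
                 standardize = TRUE, intercept = TRUE)

# Genes with non-zero coefficients at the more conservative lambda.
coefs <- as.matrix(coef(fit, s = "lambda.1se"))[, 1]
selected <- setdiff(names(coefs)[coefs != 0], "(Intercept)")
selected
```

The L1 penalty sets most coefficients exactly to zero, so the fitted model doubles as a gene selection step.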
ii. In the second case we compute a single classifier for all datasets:
source('fun/predict_CIMP_GE_all.R')
out <- predict_CIMP_GE_all(DiseaseList, var.thresh = var.thresh, CIMP.Number = 2, centered = TRUE, scaled = TRUE, intercept = TRUE, n.folds = 3, bootstrap = 100, cores = 10, log_exp = TRUE, balanced = TRUE)
iii. In the last case we relax the previous constraint (a single classifier): each tissue-specific predictor is forced to use the same set of non-zero coefficients, but the coefficient values are allowed to vary between tissues:
source('fun/predict_CIMP_GE_MT_par.R')
out <- predict_CIMP_GE_MT(DiseaseList, var.thresh = var.thresh, CIMP.Number = 2, centered = TRUE, scaled = TRUE, intercept = TRUE, n.folds = 3, bootstrap = 100, cores = 10, balanced = TRUE)
iv. Summary of the results:
source('fun/analyze_predict_CIMP_GE_MT.R')
4.B) Analysis of the mutations associated with CIMP:
i) We analyzed the association between CIMP and mutations previously reported to be associated with tissue-specific CIMPs (e.g., BRAF, KRAS, IDH1, IDH2, TET2).
source('fun/analyze_mutations.R')
Mutation.List <- c('BRAF','KRAS','IDH1','IDH2','TET2')
analyze_mutations(DiseaseList, Mutation.List=Mutation.List)
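A common way to test such an association is Fisher's exact test on a 2x2 table of CIMP status against mutation status, followed by a multiple-testing correction across genes. The sketch below assumes this approach and uses invented counts. It is not the code of analyze_mutations, and the numbers are not results from this analysis.

```r
# Invented 2x2 contingency tables (CIMP status x mutation status), one per gene.
make_table <- function(counts, gene) {
  matrix(counts, nrow = 2,
         dimnames = list(CIMP = c("CIMP-", "CIMP+"),
                         Mutation = c(paste(gene, "wild-type"), paste(gene, "mutated"))))
}
tables <- list(
  BRAF = make_table(c(63, 12, 7, 18), "BRAF"),   # mutations enriched in CIMP+
  KRAS = make_table(c(35, 15, 35, 15), "KRAS")   # no association
)

# Fisher's exact test per gene, then Benjamini-Hochberg adjustment.
p.values <- sapply(tables, function(tab) fisher.test(tab)$p.value)
odds.ratios <- sapply(tables, function(tab) unname(fisher.test(tab)$estimate))
p.adjusted <- p.adjust(p.values, method = "BH")

data.frame(odds.ratio = odds.ratios, p.value = p.values, p.adjusted = p.adjusted)
```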
ii) We then also searched for other mutations significantly associated with CIMP in all diseases:
source('fun/analyze_mutations.R')
analyze_mutations(DiseaseList, Mutation.List=Mutation.List)
5) Survival analysis
source('fun/compare_clinical.R')
for (DiseaseName in DiseaseList)
{
  print(DiseaseName)
  out <- compare_clinical(DiseaseName = DiseaseName, var.thresh = var.thresh, CIMP.Number = CIMP.Number)
}
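The internals of compare_clinical are not shown here. As a hedged illustration of what a survival comparison between CIMP groups can look like, the sketch below fits Kaplan-Meier curves and runs a log-rank test with the survival package on simulated follow-up data. All variables are made up for the example.

```r
library(survival)

# Simulated clinical data (assumption): CIMP+ patients have a higher hazard.
set.seed(5)
n <- 100
cimp <- factor(rep(c("CIMP-", "CIMP+"), each = n / 2))
time <- rexp(n, rate = ifelse(cimp == "CIMP+", 0.10, 0.05))  # follow-up time
status <- rbinom(n, 1, 0.8)                                   # 1 = event, 0 = censored

# Kaplan-Meier estimate per CIMP group.
fit <- survfit(Surv(time, status) ~ cimp)
summary(fit)$table

# Log-rank test for a difference between the survival curves.
lr <- survdiff(Surv(time, status) ~ cimp)
p.value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
p.value
```

Calling plot(fit) draws the two curves. With more than two CIMP groups (for example CIMP.Number = 3), the same formula applies and the test gains one degree of freedom per additional group.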