I spend one day per week (typically Tuesdays) at the Delft Bioinformatics Group. The rest of the time I hold a position as head of the Bioinformatics and Statistics group at the Netherlands Cancer Institute - Antoni van Leeuwenhoek Hospital. Our group provides leadership on the collection and analysis of data for the research programs of the institute by performing state-of-the-art analyses of a wide array of data types, including laboratory and animal experiments, clinical trials, and epidemiologic studies. The members of the group also conduct research in bioinformatics and statistics, for example on stratifying tumors into groups with distinct and homogeneous outcome and therapy response, on the function of genes and pathways involved in tumorigenesis, and on understanding molecular regulatory mechanisms. A number of exemplary projects are presented below in more detail.
Extracting oncogenes and oncogenic pathways from insertional mutagenesis screens
To find oncogenic lesions that are collaborating events in tumorigenesis, we developed an approach to detect the significantly frequent co-occurrence of independent insertions within one tumor. We have extended this approach to detect combinatorial association logic networks (CALs): simple logic circuits that employ combinations of co-occurring and mutually exclusive insertions to predict the expression pattern of downstream targets. Classical one-dimensional analyses detect direct associations between insertion patterns and transcript levels across tumors. However, when the insertion loci themselves interact, direct associations between the individual loci and transcript levels may become undetectable. Our method therefore detects associations between transcript levels and the outputs of small Boolean logic networks that combine multiple genetic loci. The detection of logic networks requires solving a demanding optimization problem. By reformulating the objective function and applying a customized branch and bound algorithm, we obtain runtimes up to four orders of magnitude shorter than those of exhaustive search. We demonstrated our method on an insertional mutagenesis dataset, combining insertion data with transcriptional information from the same sample, and found known and novel associations between genes involved in Notch signaling.
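To make the idea of associating transcript levels with the output of a small logic network concrete, here is a minimal sketch. It is not the published method: it uses plain exhaustive search rather than branch and bound, it considers only three two-input gates, and it scores a network by the absolute difference in mean expression between tumors where the gate is on and off. The function names, the gate set, the scoring rule, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean

# Two-input gates considered in this sketch (an illustrative choice,
# not the gate set of the published method).
GATES = {
    "AND": lambda a, b: bool(a and b),
    "OR": lambda a, b: bool(a or b),
    "XOR": lambda a, b: bool(a) != bool(b),
}

def score(output, expression):
    """Absolute difference in mean expression between gate-on and gate-off tumors."""
    on = [e for o, e in zip(output, expression) if o]
    off = [e for o, e in zip(output, expression) if not o]
    if not on or not off:
        return 0.0  # a constant gate output carries no information
    return abs(mean(on) - mean(off))

def best_logic_network(insertions, expression):
    """Exhaustively try every locus pair and gate; return (score, gate, locus_a, locus_b)."""
    best = (0.0, None, None, None)
    for a, b in combinations(sorted(insertions), 2):
        for name, gate in GATES.items():
            output = [gate(x, y) for x, y in zip(insertions[a], insertions[b])]
            s = score(output, expression)
            if s > best[0]:
                best = (s, name, a, b)
    return best

# Toy data (hypothetical): 1 = insertion present in that tumor.
insertions = {
    "locusA": [1, 1, 0, 0, 1, 0, 0, 1],
    "locusB": [1, 0, 1, 0, 1, 0, 1, 0],
    "locusC": [0, 0, 0, 1, 0, 1, 0, 0],
}
expression = [0.1, 2.0, 2.1, 0.0, 0.2, 0.1, 1.9, 2.2]

single_a = score(insertions["locusA"], expression)
result = best_logic_network(insertions, expression)
print(single_a, result)
```

In this toy example the target is expressed only when exactly one of locusA and locusB carries an insertion, so each locus on its own shows almost no association while the XOR of the two separates the tumors cleanly. The search space grows quickly with the number of loci and inputs per gate, which is why a faster search strategy matters in practice.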
Identification of networks of co-occurring oncogenic gains and losses
Collaborating oncogenic events can also be induced by copy number alterations. To detect such events in aCGH data, we developed a scoring framework to separate truly co-occurring aberrations from passenger mutations and dominant single signals present in the data. Analysis of high-resolution DNA copy number data from a panel of 95 hematological tumor cell lines correctly identified co-occurring recombinations at the T-cell receptor and immunoglobulin loci in T- and B-cell malignancies, respectively. This demonstrates that we can recover truly co-occurring genomic alterations. In addition, our analysis revealed networks of co-occurring genomic losses and gains that are enriched for cancer genes. The detected co-occurrences are highly enriched for functional relationships. The co-occurring losses we find are independent of the canonical cancer genes within the network. Our findings suggest that large-scale, low-intensity copy number changes may be an important feature of cancer development or maintenance by affecting the gene dosages of a large interconnected network of functionally related genes.
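As a point of reference for what "co-occurring more often than expected" means, the sketch below computes a one-sided Fisher exact p-value for two binary aberration calls across a panel of samples. This is a generic textbook test used here as a stand-in, not the scoring framework described above: it does not account for passenger aberrations or dominant single signals, and it assumes the copy number data have already been discretized into present/absent calls. The function name and the toy calls are hypothetical.

```python
from math import comb

def cooccurrence_pvalue(a, b):
    """One-sided Fisher exact p-value that aberrations a and b overlap
    at least as often as observed, given how often each occurs."""
    n = len(a)
    n_a = sum(a)
    n_b = sum(b)
    both = sum(1 for x, y in zip(a, b) if x and y)
    # Hypergeometric upper tail: P(overlap >= both) with fixed margins.
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(both, upper + 1))
    return tail / comb(n, n_b)

# Toy calls (hypothetical): 1 = aberration present in that cell line.
gain_x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
loss_y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_perfect = cooccurrence_pvalue(gain_x, loss_y)
p_chance = cooccurrence_pvalue([1, 1, 0, 0], [1, 0, 1, 0])
print(p_perfect, p_chance)
```

A small p-value only says that the overlap is unlikely under independence with fixed marginal frequencies. Applied to every pair of genomic regions it would also need a correction for multiple testing.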
Integration of clinical and expression data for breast cancer outcome prediction
Several models exist that can be used to predict disease outcome of breast cancer patients. Only a few studies have created a single prediction model using both expression and clinical data, and these studies often remain inconclusive on whether integration improves performance at all. We rigorously compared three different integration strategies (early, intermediate, and late integration) and no integration (only one data source) using five classifiers of varying complexity. We performed our analysis on a set of 295 breast cancer samples, for which expression data and an extensive set of clinical parameters are available.
A nearest mean classifier employing a logical OR operation on clinical and expression classifier outputs significantly outperforms all other classifiers. Moreover, regardless of the integration strategy, the nearest mean classifier achieves the best performance. All five classifiers achieve their best performance when employing an integration strategy. The late integration strategy performed best for four out of five classifiers, and early integration for the remaining one. A nearest mean classifier that is trained on the originally published clinical variables performs worse than an expression-based nearest mean classifier. However, adding the outputs from clinical prediction models and a set of new pathological variables results in a performance equivalent to that of the expression-based classifier. Thus, there is no longer a significant performance argument for choosing one data source over the other; instead, a late integration strategy based on nearest mean classifiers gives the best results.
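The late integration scheme with a logical OR can be sketched as follows: one nearest mean classifier is trained per data source, and a patient is assigned to the poor-outcome class if either classifier says so. This is a minimal illustration under assumptions, not the evaluated pipeline: the function names, the squared Euclidean distance, the 0/1 label coding (1 = poor outcome), and the toy data are all hypothetical, and feature selection and cross-validation are left out.

```python
from statistics import mean

def fit_nearest_mean(samples, labels):
    """Return the mean feature vector (centroid) of each class, 0 and 1."""
    centroids = {}
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids

def predict_nearest_mean(centroids, sample):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((x - m) ** 2 for x, m in zip(sample, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))

def late_integration_or(clin_model, expr_model, clin_sample, expr_sample):
    """Predict poor outcome (1) if either single-source classifier predicts it."""
    clin_pred = predict_nearest_mean(clin_model, clin_sample)
    expr_pred = predict_nearest_mean(expr_model, expr_sample)
    return int(clin_pred == 1 or expr_pred == 1)

# Toy training data (hypothetical): one clinical feature, two expression features.
labels = [0, 0, 1, 1]
clinical = [[1.0], [2.0], [5.0], [6.0]]
expression = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]]

clin_model = fit_nearest_mean(clinical, labels)
expr_model = fit_nearest_mean(expression, labels)

# Clinically low risk but poor-outcome expression profile: OR flags the patient.
flagged = late_integration_or(clin_model, expr_model, [1.0], [3.0, 3.0])
cleared = late_integration_or(clin_model, expr_model, [1.0], [0.0, 0.0])
print(flagged, cleared)
```

Because the two classifiers are trained separately and only their outputs are combined, each data source keeps its own feature space and scaling, which is the defining property of late integration.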
Dynamics of genome - nuclear lamina interactions
In collaboration with the van Steensel group we study genome – nuclear lamina interactions in various cell types. For this, we use DamID data of the LaminB1 protein, which is one of the components of the nuclear lamina. We are not only interested in how the genome is organized in a cell nucleus, but more specifically how it is reorganized during, for example, differentiation. To this end we employed an in vitro differentiation system in which cultured mouse embryonic stem cells are differentiated into neural precursor cells, which in turn are induced to form astrocytes. For all three stages DamID profiles were collected. We developed a statistical test to discriminate between ‘constitutive’ and more dynamic, or ‘facultative’, genomic regions across these stages. Our data are currently obtained using high-density genome-wide tiling arrays, for which a strong dependency between probes adjacent on the genome is observed. The developed test employs the variance between independent biological replicates and autocorrelation levels present in the tiling array data to collectively estimate levels of technical and non-specific biological variance.
Statistical evaluation of biomarkers predicting treatment response
Biomarkers predicting treatment response are useful for tailoring treatment to host characteristics of individual patients in order to maximize treatment benefit and minimize side effects. Before prospective randomized trials are launched to evaluate a promising biomarker candidate, the first evaluation in humans often takes place in relatively small retrospective patient series or trials. Standard analyses use interaction terms in regression models. However, the impact of introducing a predictive biomarker into clinical practice can also be estimated retrospectively by assigning patients to the marker-based and non-marker-based arms of a hypothetical prospective trial. For example, this has been done in a retrospective analysis of phosphorylation of the estrogen receptor and tamoxifen response in a Swedish trial of premenopausal ER-positive breast cancer, where offering adjuvant tamoxifen treatment to the 52% of patients with phosphorylated tumors (10-year recurrence-free survival of 75%) but not to the remaining 48% (10-year recurrence-free survival of 52%) would result in an estimated 10-year recurrence-free survival of 64% across all patients in the marker-guided arm. This value is equal to the estimated 10-year recurrence-free survival if all patients are treated with adjuvant tamoxifen irrespective of phosphorylation, i.e., phosphorylation-guided treatment may spare about half of the patients unnecessary treatment while maintaining approximately the same 10-year recurrence-free survival. Other examples include homologous recombination deficiency to predict response to high-dose chemotherapy for breast cancer, and EGFR ligands and insulin-like growth factors to predict response to EGFR-inhibitor treatment for lung cancer.
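The 64% figure in the tamoxifen example is consistent with a prevalence-weighted average of the two subgroup survival estimates. The short sketch below reproduces that arithmetic using the percentages quoted above. The function and parameter names are hypothetical, and the calculation assumes that survival in the marker-guided arm is simply the mixture of treated marker-positive and untreated marker-negative patients.

```python
def marker_guided_survival(fraction_positive, survival_treated_positive,
                           survival_untreated_negative):
    """Expected survival when only marker-positive patients are treated."""
    fraction_negative = 1.0 - fraction_positive
    return (fraction_positive * survival_treated_positive
            + fraction_negative * survival_untreated_negative)

# Values quoted in the text: 52% phosphorylated (75% survival when treated),
# 48% non-phosphorylated (52% survival when untreated).
estimate = marker_guided_survival(0.52, 0.75, 0.52)
print(f"{estimate:.2%}")  # roughly 64%
```

The same function can be used to explore how the estimate shifts with marker prevalence, which is useful when the retrospective series is small and the prevalence itself is uncertain.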