Amin Allahyar

Bioinformatics: Collaboration of biology, math and computer science

Short bio

I was born in Shiraz, Iran, in 1987. I am fascinated by computers, and specifically by how they learn. I received my M.Sc. degree (cum laude) in artificial intelligence from Ferdowsi University of Mashhad in 2012. During my master's studies, I explored many different aspects of this exciting field, including online learning and semi-supervised learning.

Current position

To apply this knowledge to real-world problems, I joined the Delft Bioinformatics Lab (DBL) as a PhD student in December 2013. My aim is to employ machine learning methods to extract useful information from biological networks (e.g. STRING, KEGG or HPRD) in order to bridge the gap between a tumor cell’s inner state (measured by gene expression, DNA methylation or copy number variation) and the phenotype under study (e.g. cancer outcome, synthetic lethality or drug response).

My acquaintance with Deep Learning

I recently became interested in Deep Learning (DL). The term covers a family of related neural network architectures (either supervised or unsupervised) that aim to detect low-level elements in data (e.g. edges in a picture) and combine them into high-level concepts (e.g. a face) in a multi-scale manner. I am working on a particular type of DL method called the autoencoder (AE), which is trained to reproduce its input at its output layer by minimizing the reconstruction error.
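To make the idea of an autoencoder concrete, here is a minimal sketch in NumPy. It is only an illustration, not the model used in my projects: the function name `train_autoencoder`, the single tanh hidden layer, the learning rate and the random toy data are all assumptions chosen for brevity.

```python
import numpy as np


def train_autoencoder(X, n_hidden=2, lr=0.05, n_epochs=2000, seed=0):
    """Train a one-hidden-layer autoencoder with plain gradient descent.

    The target of the network is its own input, so the loss is the
    mean squared reconstruction error. Returns the parameters and the
    loss recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))  # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_features))  # decoder weights
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)   # compressed (hidden) representation
        X_hat = H @ W2 + b2        # reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error.
        d_out = 2.0 * err / err.size
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


# Toy data (assumption): 50 samples, 6 features driven by 2 hidden factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

params, losses = train_autoencoder(X)
encoded = np.tanh(X @ params[0] + params[1])  # 2-dimensional code per sample
```

The hidden layer is narrower than the input, so the network cannot simply copy its input and has to learn a compressed representation instead.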

Currently, I am involved in the following projects:

  • Generating new samples using currently existing ones

    Current DL methods exploit massive amounts of data to outperform state-of-the-art techniques. This is an obstacle to applying DL to biological problems, where datasets are often small in terms of sample size. To overcome this difficulty, I intend to generate new samples from existing ones by exploiting biological knowledge. This is in contrast with current methods, where new samples are generated by adding Gaussian noise to existing ones. The idea is inspired by a similar trick in computer vision and image analysis, where new samples are produced by applying relevant variations to existing images (e.g. rotation). This approach can be extended to biological data.
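For reference, the Gaussian-noise baseline mentioned above can be sketched as follows. This shows only the existing baseline, not the knowledge-driven generation I am aiming for; the function name `augment_with_gaussian_noise` and the parameters `n_copies` and `sigma` are hypothetical.

```python
import numpy as np


def augment_with_gaussian_noise(X, y, n_copies=5, sigma=0.1, seed=0):
    """Enlarge a dataset by adding Gaussian noise to each existing sample.

    X: (n_samples, n_features) measurements, y: (n_samples,) labels.
    Every noisy copy keeps the label of the sample it was derived from.
    """
    rng = np.random.default_rng(seed)
    noisy = [X + rng.normal(scale=sigma, size=X.shape) for _ in range(n_copies)]
    X_aug = np.vstack([X] + noisy)          # originals first, then the copies
    y_aug = np.tile(y, n_copies + 1)
    return X_aug, y_aug


# Toy example (assumption): 4 samples with 3 features each.
X = np.arange(12, dtype=float).reshape(4, 3)
y = np.array([0, 1, 0, 1])
X_aug, y_aug = augment_with_gaussian_noise(X, y, n_copies=5, sigma=0.1)
```

The noise here is blind to biology: every feature is perturbed independently and by the same amount, which is exactly the limitation that knowledge-driven sample generation is meant to address.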

  • Non-linear integration in network based outcome prediction

    One of the hallmarks of cancer is that it is caused by the deregulation of several processes or cellular pathways. To model these processes in Network Based Outcome Prediction methods (NOPs), several functionally related genes are commonly aggregated to produce so-called meta-genes. Meta-genes are the key factor in the predictive power of NOPs. However, in nearly all of these methods, the meta-gene is formed by linear integration (typically using the average operator). This limitation can be removed by using the implicit non-linear integration in DL methods. Apart from improving performance, this might offer more insight into the essential aberrant processes of this complex disease.
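The difference between the two kinds of integration can be sketched on toy data. This is an illustration under assumptions, not a published NOP: the function names, the gene sets and the fixed weights are hypothetical, and in a real DL model the weights would be learned rather than set by hand.

```python
import numpy as np


def average_meta_genes(expression, gene_sets):
    """Linear integration: each meta-gene is the mean of its member genes.

    expression: (n_samples, n_genes); gene_sets: list of column-index lists.
    """
    return np.column_stack(
        [expression[:, idx].mean(axis=1) for idx in gene_sets]
    )


def nonlinear_meta_genes(expression, gene_sets, weights, biases):
    """Non-linear integration: one tanh unit per gene set.

    weights[k] holds one weight per gene in gene_sets[k]; in practice
    these would be trained, here they are passed in by the caller.
    """
    return np.column_stack(
        [
            np.tanh(expression[:, idx] @ w + b)
            for idx, w, b in zip(gene_sets, weights, biases)
        ]
    )


# Toy example (assumption): 2 samples, 4 genes, 2 gene sets of 2 genes each.
expression = np.array([[1.0, 3.0, 2.0, 4.0],
                       [0.0, 2.0, 6.0, 8.0]])
gene_sets = [[0, 1], [2, 3]]

linear = average_meta_genes(expression, gene_sets)
nonlinear = nonlinear_meta_genes(
    expression,
    gene_sets,
    weights=[np.array([0.5, -0.5]), np.array([0.1, 0.1])],
    biases=[0.0, -1.0],
)
```

Both functions return one column per gene set, so the non-linear version can replace the average without changing the downstream classifier.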

  • Multi-scale representation of cell processes

    It is known that cellular functions arise at different scales. However, in the outcome prediction problem, the appropriate number of genes that should contribute to the construction of a meta-gene (representing a cellular function) is not known a priori. Many strategies have been proposed to determine the gene set (i.e. the scale of the meta-gene), including clustering and greedy search. Meta-genes constructed with these methods are able to detect abnormalities within groups of interacting genes (i.e. within pathways). Yet irregularities can also occur at other scales (e.g. between pathways). Nearly all DL methods are in essence multi-level neural networks, which have the potential to identify aberrations at multiple scales simultaneously.