# Archive 2011

**December 7, 2011 Frank Redig (TU Delft)**

*Path space large deviations and Gibbs-non-Gibbs transitions.*

Motivated by the phenomenon of dynamical Gibbs-non-Gibbs transitions, we consider the typical behaviour of nearly deterministic processes conditioned on the future. Depending on how far in the future one conditions, there can be unique or non-unique optimal trajectories. Part of the talk will also be devoted to the recently developed path-space large deviation formalism by Feng and Kurtz.

**November 30, 2011 Ellen Saada (Université Paris Descartes and CNRS)**

*A shape theorem for an epidemic model in dimension $d\ge 3$.*

We prove a shape theorem for the set of infected individuals in a spatial epidemic model with 3 states (susceptible-infected-recovered) on $\mathbb{Z}^d$, $d\ge 3$, when there is no extinction of the infection.

For this, we derive percolation estimates (using dynamic renormalization techniques) for a locally dependent random graph in correspondence with the epidemic model.

This is a joint work with E. D. Andjel and N. Chabot.

**November 23, 2011 Marco Loog (TU Delft)**

*Challenging Semi-Supervised Learning*.

Semi-supervised learning aims to learn classification rules from both labeled and, typically more easily obtainable, unlabeled data. Though studied since the late 60s and early 70s, surprisingly little headway has been made with respect to methods that can guarantee, in expectation, to always outperform their supervised counterparts. A principal problem is that current state-of-the-art semi-supervised learning techniques make additional assumptions about the underlying data in an attempt to exploit all unlabeled instances. These assumptions, however, typically do not hold true and, as a result, making them can considerably deteriorate classification performance.

After giving a brief impression of the day-to-day worries of a pattern recognizer, I will present and discuss some of my very preliminary ideas and results concerning the problem of semi-supervision. My basic proposal is to develop semi-supervised learning techniques that do not make assumptions beyond those implicitly or explicitly made by the classification scheme employed. The overarching idea to achieve this is to exploit constraints and prior knowledge intrinsic to the classifiers considered. A simple example using the nearest mean classifier is provided. After my presentation there will hopefully be time for the audience to answer some of the questions I have.

**October 12, 2011 Sandjai Bhulai (VU)**

*Optimal Allocation of Resources in Adaptive Survey Designs.*

Survey nonresponse occurs when members of a sample cannot or will not participate in the survey. It remains a problem despite the development of statistical methods that aim to reduce nonresponse. Instead, we address the problem of resource allocation in survey designs in which the focus is on the quality of the survey results given that there will be nonresponse. Therefore, we propose a novel method in which the optimal allocation of survey resources can be determined.

**October 5, 2011 Tom Kempton (UU)**

*Beta Expansions and Bernoulli Convolutions.*

Almost every number has a unique expansion to base ten, known as the decimal expansion. In contrast, given a non-integer real number beta greater than one, almost every number has uncountably many different expansions to base beta. In this talk we discuss some counting questions relating to beta expansions. We are able to give stronger results in the case that the Bernoulli convolution corresponding to beta is absolutely continuous, and consequently gain a new necessary condition for the absolute continuity of Bernoulli convolutions.
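The greedy algorithm is one standard way to generate a beta expansion digit by digit; the sketch below is only illustrative (the choice of the golden ratio as beta and the digit count are assumptions made here, not part of the talk):

```python
import math

def greedy_beta_expansion(x, beta, n_digits):
    """Greedy beta expansion of x in [0, 1): at each step take the
    largest admissible digit d = floor(beta * x), then keep the
    fractional remainder."""
    digits = []
    for _ in range(n_digits):
        x *= beta
        d = int(x)       # largest digit not exceeding beta * x
        digits.append(d)
        x -= d           # fractional part carries to the next step
    return digits

phi = (1 + math.sqrt(5)) / 2  # golden ratio, a classic choice of beta
print(greedy_beta_expansion(0.5, phi, 6))  # → [0, 1, 0, 0, 1, 0]
```

The uncountably many other expansions of the same number arise by occasionally choosing a smaller digit than the greedy one, whenever the remainder still allows the expansion to be completed.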

**September 14, 2011 Anne Fey (TU Delft)**

*Anisotropic bootstrap percolation in three dimensions.*

**June 1, 2011 Rik Lopuhaä (TU Delft)**

*The limit distribution of the supremum distance for Grenander type estimators.*

**May 25, 2011 Doug Hensley (Texas A&M University, TAMU)**

*Computing key parameters of continued fraction dynamical systems.*

**May 18, 2011 Anca Hanea (TU Delft)**

*Parameter estimation using dynamic non-parametric Bayesian networks*

**May 11, 2011 Karl Petersen (University of North Carolina at Chapel Hill, USA)**

*Invariant measures and combinatorics of some nonstationary adic systems*

We review recent work (much of it joint with Frick or Varchenko) on adic (Bratteli-Vershik) dynamical systems which come from walks or reinforced walks on finite graphs. Identification of the ergodic invariant measures depends on knowing path counts between vertices in the associated diagram, and this leads to interesting combinatorial problems and formulas involving binomial coefficients as well as Eulerian, Stirling, and Delannoy numbers. Among the dynamical properties that can be determined are absence of point spectrum, faithful coding by subshifts, topological weak mixing, the loosely Bernoulli property, and complexity.

**April 27, 2011 Ronald Meester (VU)**

*Long range percolation on the hierarchical lattice*

We study long-range percolation on the hierarchical lattice of order $N$, where any edge of length $k$ is present with probability $p_k = 1 - \exp(-\eta^{-k}\alpha)$, independently of all other edges. For fixed $\eta$, we show that the critical value $\alpha_c(\eta)$ is non-trivial if and only if $N < \eta < N^2$. Furthermore, we show uniqueness of the infinite component and continuity of the percolation probability. The uniqueness problem involves a discussion of the so-called von Neumann-Kakutani transformation; we will explain the connection.
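To make the connection probabilities concrete, a tiny sketch evaluating $p_k$ for a few lengths $k$ (the values of $N$, $\eta$ and $\alpha$ below are arbitrary illustrative choices, not from the talk):

```python
import math

def edge_probability(k, eta, alpha):
    """p_k = 1 - exp(-eta**(-k) * alpha): the probability that a given
    edge of hierarchical length k is present."""
    return 1.0 - math.exp(-(eta ** -k) * alpha)

# Illustrative parameters only: with N = 3, eta must lie in (3, 9)
# for the critical value alpha_c(eta) to be non-trivial.
eta, alpha = 5.0, 1.0
for k in range(1, 5):
    print(k, edge_probability(k, eta, alpha))
```

The probabilities decay geometrically in the hierarchical distance $k$, which is what makes the interplay between $\eta$ and the order $N$ of the lattice decisive.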

**April 20, 2011 Sergey Foss (Heriot-Watt University)**

*Convergence in total variation, directed percolation, last passage percolation, chains with infinite memory, and extended renovation theory.*

I will discuss conditions for convergence in total variation for functionals of a Markov chain (or, more generally, of a stochastic recursion) which may depend on the entire infinite future and/or past. Various examples will be given.

**April 13, 2011 Tina Nane (TUD)**

*Shape constrained nonparametric baseline estimators in the Cox proportional hazards model*

Within survival analysis, the Cox proportional hazards model is one of the most acknowledged approaches to model right-censored time to event data in the presence of covariates. Different functionals of the lifetime distribution are commonly investigated. The hazard function is of particular interest, as it represents an important feature of the time course of a process under study, e.g., death or a certain disease.

Numerous survival studies indicate explicit evidence of monotone baseline hazard functions. The main objective is therefore to derive nonparametric baseline hazard estimators under monotonicity constraints and investigate their asymptotic behavior. Through the classical graphical representation, our first approach starts from the maximum likelihood estimator of the baseline cumulative hazard, namely the Breslow (1972) estimator. For a nondecreasing baseline hazard, we define the least-squares (LS) baseline hazard estimator as the left-hand slope of the greatest convex minorant (GCM) of the Breslow estimator. This estimator can be viewed as a least-squares projection on the space of all distributions with nondecreasing baseline hazards.

Subsequently, a maximum likelihood estimator (MLE) of a nondecreasing baseline hazard has been derived by maximizing the (log-)likelihood function over the set of all distributions with nondecreasing baseline hazards. Similarly, a monotone baseline density estimator has been defined and its strong consistency established.

**March 23, 2011 Tobias Mueller (CWI)**

*Random geometric graphs*

If we pick n points at random from d-dimensional space (i.i.d. according to some probability measure) and fix an r > 0, then we obtain a random geometric graph by joining two points by an edge whenever their distance is at most r.
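The construction above can be sketched in a few lines; here the sampling measure is assumed uniform on the unit square, which is just one illustrative choice:

```python
import math
import random

def random_geometric_graph(n, r, d=2, seed=0):
    """Sample n i.i.d. uniform points in [0, 1]^d and join two points
    by an edge whenever their Euclidean distance is at most r."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) <= r
    ]
    return points, edges

points, edges = random_geometric_graph(100, 0.2)
```

Questions about Hamilton cycles or the chromatic number then concern the typical structure of `edges` as $n \to \infty$ with $r = r(n)$ shrinking at an appropriate rate.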

I will give a brief overview of some of the main results on random geometric graphs and then describe my own work on Hamilton cycles and the chromatic number of random geometric graphs.

**March 16, 2011 Paul Eggermont (University of Delaware)**

*Moment discretization of ill-posed problems and reproducing kernel Hilbert spaces*

**March 2, 2011 Jasper Anderluh (TU Delft and fund manager at HiQ)**

*Real Options in Nuclear Reactor Valuation.*

Real options are opportunities for economic players. Consider a nuclear plant operator who builds a special type of reactor, the so-called Fast Reactor, which will give him in 25 years the opportunity to recycle part of the nuclear waste and feed it back into his reactor. In 2035 he has to decide whether to continue generating waste, or to change his fuel cycle by recycling the waste and feeding it back into the system. As it is his choice to change the fuel cycle, he owns an option to do so, which is called a real option. From an economic point of view, this option should have a value. In this talk we will address the issues of determining the real option value and how it differs from computing the price of a standard financial equity option. The talk will focus mainly on the concepts that are needed and a little less on the detailed mathematics.

Joint work with Ulrike Lauferts.

**February 23, 2011 Erik van Zwet (LUMC)**

*What is Causal Inference?*

Researchers often want to know if one thing causes another. Statisticians tend to respond that they are happy to test for association, but that association does not imply causation. Now "causal inference" aims to address causation itself. With its particular notation and terminology, causal inference seems very different from standard statistical inference. Judea Pearl, who wrote a book on causality, even states: "Almost by definition, causal and statistical concepts do not mix". From Pearl's book, I learned that at the heart of causal inference lies a very neat idea from 1960 due to Robert Strotz and Herman Wold. This idea leads us to interesting parameters to estimate. Estimating these parameters from data is, of course, just standard statistics. I should mention that I am not an expert on causal inference. My goal is just to help bridge the gap between causality and mainstream statistics.

**February 16, 2011 Wessel van Wieringen (VU)**

*A random effects model for regional co-expression associated with DNA copy number aberrations*

We combine coupled DNA copy number and gene expression high-throughput data in order to study regional co-expression, i.e. the phenomenon of neighborhoods of contiguous genes showing similar expression patterns. Such neighborhoods appear throughout the cancer genome and often coincide with DNA copy number aberrations (CNA). We use a random coefficients model to link DNA copy number data of a genomic region to its genes' expression data. The model facilitates a global analysis of regional co-expression at the level of the region (rather than its genes) to assess whether a) there is a shared CNA effect on expression levels of genes within the region, and b) the CNA effect is identical for all genes. To estimate the parameters from high-throughput data, we optimize estimation with respect to computational speed and memory use, while incorporating prior knowledge on the parameters. Two examples illustrate the methodology.

**February 9, 2011 Derong Kong (TU Delft)**

*The Markov binomial distribution and a stochastic reactive transport model*

We study the shape of the probability mass function of the Markov binomial distribution, and give necessary and sufficient conditions for the probability mass function to be unimodal, bimodal or trimodal. Moreover, we give a closed form expression for the variance of the Markov binomial distribution (MBD), and expressions for the mean and the variance conditioned on the state at time n.

In the second part of our talk we introduce a discrete time microscopic single particle model for kinetic transport. The kinetics is modeled by a two-state Markov chain, the transport by deterministic advection plus a random space step. The position of the particle after n time steps is then given by a random sum of space steps, where the size of the sum is given by the Markov binomial distribution. We prove that by letting the length of the time steps and the intensity of the switching between states tend to zero linearly, we obtain a random variable S(t), which is closely connected to a well-known deterministic PDE reactive transport model from the engineering literature. Our model explains (via bimodality of the MBD) the well-known double-peaking behavior of the concentration of solutes in the PDE model. Moreover, we show that (under a restriction on the initial distribution of the Markov chain) the partial densities do exist, and do satisfy the partial differential equations.
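A rough illustration of such a microscopic model (this is not the authors' code; the Gaussian space step and all parameter names and values below are assumptions made here for concreteness):

```python
import random

def simulate_particle(n_steps, p01, p10, v=1.0, sigma=0.1, seed=0):
    """One particle: a two-state Markov chain (0 = immobile, 1 = mobile)
    models the kinetics; in the mobile state the particle advances by a
    deterministic advection v plus a Gaussian random space step."""
    rng = random.Random(seed)
    state, position, mobile_steps = 0, 0.0, 0
    for _ in range(n_steps):
        # switch state according to the transition probabilities
        if state == 0 and rng.random() < p01:
            state = 1
        elif state == 1 and rng.random() < p10:
            state = 0
        if state == 1:
            position += v + rng.gauss(0.0, sigma)
            mobile_steps += 1
    return position, mobile_steps
```

The count `mobile_steps`, the number of time steps the chain spends in the mobile state, is exactly the random index of the sum of space steps, i.e. a Markov binomial variable.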

**February 2, 2011 Rui Castro (Eindhoven University of Technology)**

*Active Learning and sequential experimental design for classification and sparse signal inference*

Many traditional approaches to statistical inference and machine learning are passive, in the sense that all data are collected prior to any analysis. However, in many practical scenarios it is possible to actively use information gleaned from previous observations to sequentially focus the data collection process, closing the loop between data analysis and acquisition. Such scenarios are often denoted as active learning or inference using sequential experimental designs. Despite the potential to dramatically improve inference performance, analysis of such procedures is difficult, due to the complicated data dependencies created by the closed-loop observation process. These difficulties are further exacerbated by the presence of measurement uncertainty or noise. This talk will be divided in two parts. First, I'll summarize some results on minimax performance bounds for active learning in non-parametric classification settings.

Second, I'll present a novel adaptive sensing procedure - Distilled Sensing - which is highly effective for detection and estimation of high-dimensional sparse signals in noise. Large-sample analysis shows that the proposed procedure provably outperforms the best possible detection methods based on non-adaptive sensing, allowing for detection and estimation of extremely weak signals, imperceptible without adaptive sensing. Some extensions of these ideas to the compressed sensing framework will also be discussed.

**January 16, 2011 Ivan Corwin (Courant Institute of Mathematical Sciences, New York)**

*Beyond the Gaussian Universality Class*

The Gaussian central limit theorem says that for a wide class of stochastic systems, the bell curve (Gaussian distribution) describes the statistics for random fluctuations of important observables. In this talk I will look beyond this class of systems to a collection of probabilistic models which include random growth models, polymers, particle systems, matrices and stochastic PDEs, as well as certain asymptotic problems in combinatorics and representation theory. I will explain in what ways these different examples all fall into a single new universality class with a much richer mathematical structure than that of the Gaussian.