# OxWaSP Mini-Symposia 2018-19

#### Term 2

#### Friday March 8th

14:00-15:00: Ali Eslami (DeepMind)

**What do few-shot learning and unsupervised scene understanding have in common? (MB 0.07)**

I will first introduce the Generative Query Network (GQN), a framework within which machines learn to represent scenes using only their own sensors. The GQN takes as input images of a scene taken from different viewpoints, constructs an internal representation, and uses this representation to predict the appearance of that scene from previously unobserved viewpoints. The GQN demonstrates representation learning without human labels or domain knowledge, paving the way toward machines that autonomously learn to understand the world around them. In the second half of the talk I will cover the theory behind GQNs and introduce Neural Processes (NPs), a generalisation of the GQN training regime to a wider range of tasks such as regression and classification. NPs are inspired by the flexibility of stochastic processes such as Gaussian processes, but are structured as neural networks and trained via gradient descent. I show how NPs make accurate predictions after observing only a handful of training data points, and scale to complex functions and large datasets.

15:30 - 16:30: Javier Gonzalez (Amazon)

**Gaussian processes and the common ground of decision making under uncertainty (MB 0.07)**

In this talk we will review the common ground of several methods for decision making under uncertainty (such as bandits, Bayesian optimization, active learning and Bayesian quadrature) and the role that Gaussian processes play as a belief model in these approaches. We will discuss some recent advances in these fields and propose a general recipe to design and implement new methods.

#### Friday February 22nd

13:55-14:55 Jeremie Houssineau (University of Warwick)

**Bayesian estimation with transport maps**

In recent years, optimal transport has received a lot of attention not only from a theoretical perspective but also in more applied areas such as machine learning and statistical inference. In particular, solutions to the Bayesian inference problem using transport maps have been recently proposed, enabling both filtering and smoothing to be performed for non-linear non-Gaussian state space models. In addition to general examples of applications, more specific uses of the transport map methodology will be given for solving inference problems related to stochastic differential equations.

15:30-16:30 Geoff Nicholls (University of Oxford)

**Calibrating approximate Bayes coverage (MB 0.07)**

In a paper from 2004 with the snappy title "Getting it Right", Geweke suggests a test to see if an MCMC sampler has converged. It goes something like this (this is not Geweke's test but the same idea):

simulate a parameter phi from the prior and synthetic data y' from the observation model y'|phi; use your MCMC to simulate theta|y'; now phi and theta should be exchangeable. Do this 100 times and make, say, a rank test for exchangeability of the phi's and theta's.
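The check just described can be sketched in a few lines. This is a minimal illustration, not Geweke's test and not the speaker's method. It assumes a conjugate normal model (prior N(0, 1), unit-variance normal observations) so that exact posterior draws stand in for the MCMC step; the function name `prior_rank_check`, the 200 replications and the five-bin chi-square statistic are all illustrative choices.

```python
import numpy as np

def prior_rank_check(n_reps=200, n_obs=10, n_draws=99, seed=0):
    """Return the rank of each prior draw phi among posterior draws theta."""
    rng = np.random.default_rng(seed)
    ranks = np.empty(n_reps, dtype=int)
    for r in range(n_reps):
        phi = rng.normal(0.0, 1.0)                # phi ~ prior N(0, 1)
        y = rng.normal(phi, 1.0, size=n_obs)      # synthetic data y' | phi
        # Exact conjugate posterior; a real check would run the MCMC sampler here.
        post_var = 1.0 / (1.0 + n_obs)
        post_mean = post_var * y.sum()
        theta = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
        ranks[r] = np.sum(theta < phi)            # rank in 0..n_draws
    return ranks

ranks = prior_rank_check()
# If phi and theta are exchangeable, the ranks are uniform on 0..99.
counts = np.bincount(ranks // 20, minlength=5)    # five bins of 20 ranks each
expected = ranks.size / 5
chi2 = np.sum((counts - expected) ** 2 / expected)
print(counts, chi2)
```

A large chi-square value (relative to a chi-square distribution with 4 degrees of freedom) would suggest that the sampler, or an approximation inside it, is distorting the posterior.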

However, when we use Monte Carlo methods to fit complex models to large data sets we often make additional approximations, so we wouldn't really expect to pass this test. We may approximate the likelihood and mess with the algorithm itself. Last year Yao and co-authors proposed a similar approach to quantify the damage done by these additional approximations - differences between the distributions of the phi's and theta's tell us something about the damage done by our approximation.

We give new methods for quantifying the bias introduced by approximations made when we estimate credible sets. We focus on the Bayesian coverage of the credible set, by which we mean the probability that a credible set for the approximate posterior at the data covers a draw from the exact posterior at the data. This is the coverage we would get if nature really did draw the parameter from the prior and the data from the observation model. We give a generic computational wrapper that calibrates a credible set - without sampling the exact posterior itself.

This is a tough computational problem and there is work to be done finding better estimators within the general framework we provide. However, our simple first-shot estimators work well in at least a few substantial cases.

#### Friday February 8th

14:00-15:00 David Rossell (Dept. of Economics & Business, Universitat Pompeu Fabra)

**High-dimensional Bayesian model selection: ideas, theory and examples (MB 0.07)**

Bayesian model selection (BMS) is a general framework for two fundamental tasks in Statistics: choosing the best model among a set under consideration and quantifying the selection uncertainty. Applications include hypothesis testing, variable selection, dimension reduction and mixture models, to name a few examples. We discuss the state of the art, with an emphasis on high-dimensional settings where the number of models may far exceed the sample size. A natural question is whether the solution provided by BMS is good in a frequentist sense, e.g. what are the guarantees that one selects the optimal model and that posterior probabilities adequately portray uncertainty. We overview current results and discuss limitations, particularly the consequences of unduly enforcing sparsity or model misspecification. We will illustrate the practical implications of the results via a variety of examples, including high-dimensional regression, mixture and factor models.

15:30-16:30 Ritabrata Dutta (University of Warwick)

**Likelihood-free Inference: with a pinch of 'Classification' (MB 0.07)**

To explain the underlying natural phenomena that cause the universe to expand, how M-theory binds the smallest of the small particles, or how social dynamics can lead to revolutions, natural scientists develop complex 'mechanistic models'. As the analytical form of the likelihood function for the parameters of these mechanistic models is not available, we need likelihood-free inferential schemes (e.g. Approximate Bayesian Computation (ABC)) for the calibration or the choice of the best mechanistic model, given the recent observations of the expanding universe, Higgs boson particles or 'online' societal interactions. These likelihood-free methods are based on either (i) directly approximating the likelihood function or, in the case of ABC, (ii) the fundamental idea of accepting a parameter value which can forward simulate data 'similar' to the true observed data. In this talk, we will first illustrate how multiple issues in likelihood-free inference (e.g. choice of 'similarity' measure between two datasets, approximation of the likelihood function and model selection) can be seen as classification problems. Building on this connection, we show how machine learning algorithms for classification (including logistic regression, random forests or deep learning) "have been"/"can be" used for likelihood-free inference, specifically ABC, to improve inferential results. Finally, we will pose some open-ended research questions for further thought.
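The acceptance idea in (ii) can be sketched as plain rejection ABC. This is a minimal illustration under stated assumptions and does not include the classification-based components of the talk: the simulator is a toy normal model with unknown mean, the prior is N(0, 5^2), the summary statistic is the sample mean, and the names `abc_rejection`, `eps` and `n_proposals` are hypothetical.

```python
import numpy as np

def abc_rejection(observed, n_proposals=20000, eps=0.1, seed=0):
    """Keep prior draws whose simulated data look 'similar' to the observed data."""
    rng = np.random.default_rng(seed)
    obs_summary = observed.mean()                      # summary of the observed data
    theta = rng.normal(0.0, 5.0, size=n_proposals)     # draws from the assumed prior
    # Forward-simulate one dataset per proposed parameter value.
    sims = rng.normal(theta[:, None], 1.0, size=(n_proposals, observed.size))
    distance = np.abs(sims.mean(axis=1) - obs_summary) # 'similarity' measure
    return theta[distance < eps]                       # accepted parameter values

rng = np.random.default_rng(1)
observed = rng.normal(2.0, 1.0, size=50)               # toy 'observed' data
accepted = abc_rejection(observed)
print(accepted.size, accepted.mean())
```

The accepted values approximate the posterior of the mean. The choice of summary statistic and distance is exactly the kind of 'similarity' question the abstract proposes to treat as a classification problem.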

#### Friday January 25th

14:00-15:00 Petros Dellaportas (UCL)

**Identifying and predicting jumps in financial time series (MB 0.07)**

We deal with the problem of identifying jumps in multiple financial time series using the stochastic volatility model combined with a jump process. We develop efficient MCMC algorithms to perform Bayesian inference for the parameters and the latent states of the proposed models. In the univariate case we use a homogeneous compound Poisson process to model the jump component. In the multivariate case we adopt an inhomogeneous Poisson process whose intensity is itself a stochastic process varying across time, economic sectors and markets. A Gaussian process is used as the prior distribution for the intensity of the Poisson process. This model is known as a doubly stochastic Poisson process or Gaussian Cox process. The efficiency of the proposed algorithms is compared with that of existing MCMC algorithms. Our methodology is tested through simulation-based experiments and applied to the daily returns of 600 stocks of the Euro STOXX index over a period of 10 years.

15:30-16:30 Dario Spano (Warwick)

**Time-dependent non-parametric models via genealogies and duality (MB 0.07)**

In many statistical problems (including financial econometrics, filtering, genetics, meta-analysis) the parameter of interest depends on a covariate which we can conveniently interpret as time. The observations are taken at distinct time points and thus they are, in general, not exchangeable. From a Bayesian perspective, it is important to be able to model tractable and interpretable prior distributions on time-dependent parameters to capture heterogeneity in the data. The problem becomes complicated when the parameter is, at each time, infinite-dimensional, e.g. a measure. I will illustrate how two families of measure-valued stochastic processes, well-known in the area of population genetics, can be used to generate continuous-time-dependent variants of the popular Dirichlet process and Gamma process nonparametric priors. I will illustrate how genealogical processes embedded in the mentioned population models can be used, in connection with various probabilistic notions of duality, to describe dependence fairly explicitly, and to provide insight into the so-called "borrowing strength" properties of the model.

#### Term 1

#### Friday November 30th

14:00-15:00 Mingli Chen (Warwick)

**Modelling Networks via Sparse Beta Model (in room MB0.07)**

We propose the Sparse Beta Model, a novel network model that interpolates between the celebrated Erdős-Rényi model and the Beta Model, and show that the Sparse Beta Model is a tractable model for modelling sparseness of a network. We apply the proposed model and estimation procedure to the well-known microfinance data in Banerjee et al. (Science, 2013) and find interesting results.

15:30-16:30 Mihai Cucuringu (Oxford University)

**Spectral methods for certain inverse problems on graphs (in room MB0.07)**

We study problems that share an important common feature: they can all be solved by exploiting the spectrum of their corresponding graph Laplacian. We consider the classic problem of establishing a statistical ranking of a set of items given a set of inconsistent and incomplete pairwise comparisons between such items. Instantiations of this problem occur in numerous applications in data analysis (e.g., ranking teams in sports data), computer vision, and machine learning. We formulate the above problem of ranking with incomplete noisy information as an instance of the group synchronization problem over the group SO(2) of planar rotations, whose usefulness has been demonstrated in numerous applications in recent years. Its least squares solution can be approximated by either a spectral or a semidefinite programming relaxation, followed by a rounding procedure. We also present a simple spectral approach to the well-studied constrained clustering problem. It captures constrained clustering as a generalized eigenvalue problem with graph Laplacians. The proposed algorithm works in nearly-linear time, provides guarantees for the quality of the clusters for 2-way partitioning, and consistently outperforms existing spectral approaches in both speed and quality. Building on this work, we recently proposed an algorithm for clustering signed networks (where the edge weights between the nodes of the graph may take either positive or negative values) that compares favourably to state-of-the-art methods. Time permitting, we discuss possible future extensions of the group synchronization framework, applications to extracting leaders and laggers in multivariate time series data, and the phase unwrapping problem.
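The spectral relaxation of ranking-as-synchronization mentioned in this abstract can be sketched as follows. This is an illustrative sketch written in the spirit of the description, not the speaker's algorithm: the function name `sync_rank`, the mapping of rank offsets to angles on a half circle, and the rounding step (trying every circular shift and counting upsets) are assumptions made for the example.

```python
import numpy as np

def sync_rank(C):
    """Rank items from pairwise offsets C[i, j] ~ r_i - r_j (0 means 'not observed')."""
    n = C.shape[0]
    observed = C != 0
    # Map rank offsets to rotation angles in [-pi, pi] and build a Hermitian matrix.
    H = np.where(observed, np.exp(1j * np.pi * C / (n - 1)), 0)
    # Spectral relaxation: the top eigenvector estimates the angles up to a global rotation.
    _, vecs = np.linalg.eigh(H)
    angles = np.mod(np.angle(vecs[:, -1]), 2 * np.pi)
    order = np.argsort(angles)                    # circular order of the items
    best_rank, best_upsets = None, np.inf
    # Rounding: pick the circular shift that disagrees least with the comparisons.
    for shift in range(n):
        rank = np.empty(n, dtype=int)
        rank[np.roll(order, -shift)] = np.arange(n)
        implied = np.sign(rank[:, None] - rank[None, :])
        upsets = np.sum(observed & (implied != np.sign(C)))
        if upsets < best_upsets:
            best_rank, best_upsets = rank, upsets
    return best_rank

rng = np.random.default_rng(0)
true_rank = rng.permutation(8)                    # hypothetical ground-truth ranks 0..7
C = true_rank[:, None] - true_rank[None, :]       # complete, noise-free comparisons
estimated = sync_rank(C)
print(estimated)
```

With complete, noise-free comparisons the sketch recovers the true ranks. With noisy or missing entries the top eigenvector gives only an approximate solution, which is where the semidefinite programming relaxation mentioned in the abstract becomes an alternative.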

#### Friday November 16th

14:00 Chris Wymant (Big Data Institute, University of Oxford)

**Analysis of pathogen genetic sequence data to help prevent the spread of infectious diseases (in room MB0.07)**

Infectious diseases kill millions of people every year. Epidemiological studies of these diseases try to identify patterns that are associated with disease spread, so that we can more effectively intervene and improve public health. Molecular epidemiology uses molecular data for this aim; in particular, the genetic sequence of the pathogen from infected individuals. Sequences accumulate mutations over time, and so such data allow us to make inferences about the pathogen's evolutionary history and perhaps about factors affecting it, i.e. the story of the epidemic from the pathogen's point of view. After a general introduction to this field of work I will explain our molecular epidemiological method 'phyloscanner', some applications of it to large HIV datasets, what we learned, the statistical models involved, and ways in which we would like these models to be better.

15:30 Julia Brettschneider (Department of Statistics, University of Warwick)

**Spatial statistics in scientific research involving image data (in room MB0.07)**

Progress in imaging technologies has opened up new avenues for scientific research. Statistical methodology needs to be adapted and extended to optimally exploit the available data and address questions formulated by scientists. Interdisciplinary dialogue about new types of data can also lead to better models of the measurement process, to improved preprocessing and quality assessment of the data, and to novel methods of knowledge extraction. In X-ray CT, for example, planar point processes are a natural model for dead pixels. Concepts such as complete spatial randomness can be explored with functions capturing between-point interactions. They can be used to make statements and inferences about the state of the detectors. In fluorescent confocal microscopy, a central interest is the imaging of protein concentration. Distances between point clouds can be captured, for example, using the earth mover's distance. In cell biology, this can be used to model the relative abundance of two protein species or to describe and analyse the temporal evolution of a single protein.

#### Friday November 2nd

14:00-15:00 Sarah Penington (Bath University)

**The spreading speed of solutions of the non-local Fisher-KPP equation (in room MB0.07)**

The non-local Fisher-KPP equation is a partial differential equation (PDE) which is used to model non-local interaction and competition in a population, and can be seen as a generalisation of the classical Fisher-KPP equation. In the 80s, Bramson used a Feynman-Kac formula to prove fine asymptotics for the spreading speed of solutions of the Fisher-KPP equation using probabilistic techniques. Bramson's proofs also rely on a maximum principle which does not hold for the non-local form of the PDE. However, it turns out that we can adapt his techniques to prove results on the non-local Fisher-KPP equation using only probabilistic arguments - in particular, probability estimates for Brownian motion.

15:30-16:30 Sigurd Assing (University of Warwick)

**Enlargement of Filtrations (in room MB0.07)**

In a recent paper I was able to apply Enlargement of Filtrations to turn a parabolic Stochastic Partial Differential Equation (SPDE) into an elliptic one, and this was a little bit surprising as this technique usually does not belong to the tool-box used to treat SPDEs. So, I started to think that Enlargement of Filtrations might be of interest to people working in all sorts of fields, and decided to talk about it to a Stats-Community. This talk won't be a lecture on Enlargement of Filtrations. I'll rather discuss examples and try to make connections to related fields. I'll also touch on questions like "What is the essence of Ito's formula?" and "Why is Malliavin calculus useful?".

#### Friday October 19th

14:00 Martyn Plummer (Warwick University)

**Bayesian Analysis of Generalized Linear Mixed Models with MCMC (in room MB0.07)**

BUGS is a language for describing hierarchical Bayesian models which syntactically resembles R. BUGS allows large complex models to be built from smaller components. JAGS is a BUGS interpreter which enables Bayesian inference using Markov Chain Monte Carlo (MCMC).

The efficiency of MCMC depends heavily on the sampling methods used. JAGS is "black box" software that makes decisions about sampling methods without user input. Therefore a key function of the JAGS interpreter is to identify design motifs in a large complex Bayesian model that have well-characterized MCMC solutions and apply the appropriate sampling methods.

Generalized linear models (GLMs) form a recurring design motif in many hierarchical Bayesian models. Several data augmentation schemes have been proposed that reduce a GLM to a linear model and allow joint sampling of the coefficients. I will review the schemes that have been implemented in JAGS and highlight the important relationship between graphical models and sparse matrix algebra.

15:30 Ricardo Silva (University College London)

**Neural Networks and Graphical Models for Constructing and Fitting Cumulative Distribution Functions (in room MB0.07)**

There are several ways of building a likelihood function. Mixture models, hierarchies, re-normalized energy functions and copulas are examples of popular approaches. In this talk, we will explore how multivariate constructions and parameter fitting can be accomplished by parameterizing cumulative distribution functions and handling the computational implications. In our first method, we describe how deep neural networks can encode a very general family of CDFs, and how the standard tools of automatic differentiation can be easily repurposed for maximum likelihood under this parameterization. In our second discussion, we will show factorized representations of CDFs and how the machinery of graphical models immediately transfers into this domain by reinterpreting message-passing.