Venue:  Centre for Mathematical Sciences, Cambridge

Dates: 28–30 September 2015



We are in the midst of a data-driven revolution, with data being collected at an unprecedented rate across the sciences and industry. The scale and complexity of these modern datasets often render classical techniques infeasible, and several new methods have been developed within the fields of Statistics and Computer Science to address the challenges posed by the large-scale (and often non-standard) nature of the data. The approaches taken by researchers in these fields are often rather different, with statisticians more concerned with extracting the greatest amount of information from limited data, and computer scientists instead treating the computational budget as the primary constraint. To develop successful methodology for large-scale data, it is often necessary to draw on ideas from both approaches and to balance statistical efficiency against computational speed.

This workshop will bring together statisticians and computer scientists working on methodology for large-scale data, as well as researchers working on applications in the sciences and industry. The aim is to map out the Big Data landscape: to set out the challenges faced by practitioners and to chart the most promising directions in Statistics and Computer Science. The ultimate goal is to foster collaboration and identify new research directions that require a symbiosis of these fields.

Key scientific questions to be answered:

Given current developments in Statistics and Computer Science, and the data analysis challenges that are being faced and will be faced by science and industry as technology develops, what are the key research areas that may benefit from a collaborative approach between the two fields? More specifically, we expect discussion around the following scientific questions:

  • How can we develop prediction and estimation methods that perform well given constraints on both the amount of data and a computational budget?
  • How can we quantify uncertainty in complex and high-dimensional settings in a way that is computationally efficient, while also respecting the constraint that any output must be interpretable?

Key topics to be addressed:

  • High-dimensional inference
  • Large-scale optimisation
  • Network data
  • Genomics
  • The role of randomisation in large-scale learning
  • Online algorithms
  • Sketching

Key sectors involved and impacted:

Both the mathematical sciences community and the computer science community will be involved in, and impacted by, this workshop. In addition to feeding back into fundamental research on high-dimensional statistical methodology and the theory of algorithms, the workshop will benefit practitioners across the big data sector, helping them to better understand future developments in computational-statistical efficiency trade-offs and in uncertainty quantification for big data analytics.