Report on the experiences of the partnering event between mathematicians in academia and industry.

Partnering event, 23 September 2016 (Scandic Lerkendal)

A special gathering to promote the exchange of views and network building between academic and industrial researchers was organised during the conference Math meets industry, held in Trondheim, Norway, on 22 and 23 September 2016. The conference was organised by NTNU in collaboration with ECMI and Nor-Maths-In (the Norwegian network of mathematics in industry and innovation).

We here briefly report on the experiences collected from the organisation of this partnering event.

The goal was to facilitate networking and inter-sectorial scientific discussions, aiming at joint scientific projects and research proposals as possible outcomes.

Because the participants had such different backgrounds, sharing experiences and gaining familiarity with each other’s work and approaches was an important part of the process.

The partnering event took place just after a session on funding opportunities (funding from the Research Council of Norway, from H2020 and from other national sources). This scheduling was meant to stimulate the participants not only to generate common projects and ideas but also to think about how to fund them.

A number of (research) topics were selected. We asked the participants to suggest one or more such topics in the registration form. Priority was given to the topics proposed by company representatives.

Industry suggestions for the discussion groups

The suggestions gathered via the registration form from the companies participating in the event were:

Big data for production and reservoir optimisation. (Statoil)
Machine learning Cyber Security (Telenor)
Analysis of huge amounts of usage data from Microsoft’s cloud services. (Microsoft, Fast)
Data analysis and challenges related to the media industry (Amedia).
Numerical simulations of flow in porous media, unstable displacement processes (typically for gas injection), improved numerical solution methods for such models/equations, increased computational efficiency and accuracy. (Statoil)
Multiphase flow model numerical solvers, production optimization, data reconciliation (Forsys Subsea).
Mathematics/statistics for analytics + Machine Learning for Language understanding and Personalisation (Telenor).

Discussion groups

In order to harmonise the interests (and expertise) of academic and non-academic participants, mathematicians working in academia were asked to formulate possible discussion themes starting from the list of topics suggested by the industry. The final discussion topics and their moderators were:

Porous media flow and multi-phase flow (Knut-Andreas Lie, SINTEF and Florin Radu, University of Bergen).
Data analytics and Big data (Bo Lindqvist, NTNU).
Health analytics and related challenges, e.g. language recognition (Fred Godtliebsen, University of Tromsø).
Optimisation and inverse problems (Anton Evgrafov, NTNU and Markus Grasmair, NTNU).
Business and industrial statistics (John Tyssedal, NTNU).
Topological data analysis (Nils Baas, NTNU).

All the main Norwegian universities sent representatives to contribute to the discussion. In addition, there were participants from SINTEF (a Norwegian research foundation) and the Simula Center; these centres have extensive experience of collaboration with industry. Besides the companies already mentioned, which sent their representatives to the partnering event, EMGS, Fei, DNV and General Electric participated in the discussions.

Programme

We divided the room designated for the partnering event into areas, one for each group, where the people within that group could interact. We provided flip charts on which remarks and keywords could be noted.

More concretely, the programme for the event was:

Walk around and write on the boards scientific problems, challenges and opportunities related to one or more of the discussion groups (20 min).
See what has been written by the other participants. Use 5 stickers to vote for the topics you think are most promising (20 min).
Finally, a more pointed discussion on selected topics should take place. Write down (at least some of) your thoughts on the flip charts. (40 min).
Closing.

We encouraged the participants to approach this event with an open mind, being willing to contribute with their experience and background, and being open to new scientific acquaintances and partnerships.

As a side effect, the partnering event gave young researchers (PhD students and post-docs) a chance to speak with potential non-academic employers.

In what follows we have gathered the reports from each of the discussion groups.

BUSINESS AND INDUSTRIAL STATISTICS

Moderator: John Sølve Tyssedal

The following topics were written down (with votes in parentheses):

Industrial data (3)

Noise
Correlation
Time lags

Ship sensor systems (7)

Efficiency
Weather effects (waves, wind, currents etc)

Experimental design on web pages (3)

Merge Physical systems

Knowledge and data

Making use of different data sources (7)

Fault detection

High dimensional sensor system (3)

Educating industry employees (6)

Uncertainty in Geological data

Fast methods for calculating harmonic pressure in propagating sound pulse. (200ms calculation time)

Discussion

There was some discussion around educating industry employees. One suggestion was to establish mobility scholarships so that academic employees could spend some time in industry, or even better several periods, to ensure better guidance in the use of statistical methods.

Amedia wanted to use experimental design on web pages or, more precisely, to study how changes to web pages affect its activity, which focuses on being the leader in disseminating local news in Norway. It was agreed to keep in contact and to come up with suitable student projects.

DATA SCIENCE AND BIG DATA

Moderator: Bo Lindqvist

Participants' keywords written on the board (ranked according to number of coloured “dots” obtained):

From data to information
Privacy
How to get rid of bad data
Combine Big Data and Knowledge/Intuition?
Adversarial learning
Outliers, extreme cases
Sparse signals in huge data sets
How to craft good training data when there seemingly isn’t any data fitting your objective
Parallel analysis
Knowing when the data cannot answer the question
How to fit noise?
Variable bias (e.g. time dependent bias)
High dimensional sensor systems
Big MODELS
Are all data “created equal”?
Value of more data
Building data science teams
Software engineering requirements for data science

Some notes from the discussion:

Use domain knowledge to guide research data visualization
How to build a data scientist team - researcher + siv.ing. (graduate engineer) vs. data scientist (multiple competences)
How to know what the right profile for a business is? – Is there a lack of software engineering skills within data sciences in Norway?
How to build secure machine learning?

More detailed comments

Tools for Big Data analysis make high-powered statistical and machine learning methods available to not only professional statisticians and computer scientists, but also to casual users. As with any tools, the results to be expected are proportional to the knowledge and skill of the user, as well as the quality of the data.

Unfortunately, much of the data mining, machine learning, and Big Data literature may give casual users the impression that if one has a powerful enough algorithm and a lot of data, good models and good results are guaranteed at the push of a button.

There is thus a need for building data scientist teams, involving scientists with multiple competences. Here, statisticians can contribute by applying sound principles of statistical data design and inference.

Snee et al. (2014) consider four important principles of what they call statistical engineering, which in their opinion have been either overlooked or underemphasized in the Big Data literature. In some sense these principles also summarize much of the discussion among the participants at the Partnering Event of Math Meets Industry:

Need for a clear strategy to guide the analysis of Big Data sets and the solution of the associated problems of interest.
The importance of using sequential approaches to scientific investigation, as opposed to the "one-shot study" so popular in the algorithms literature.
The need for empirical modeling to be guided by domain knowledge (subject-matter theory), including interpretation of data within the context of the processes and measurement systems that generated it, and
The inaccuracy of the typical unstated assumption that all data are created equal, and therefore that data quantity is more important than data quality.

How can Norwegian academia contribute?

With the advent of Big Data, statisticians have internationally contributed significantly in developing new tools to analyze such data, including classification and regression trees (CART), neural nets, methods based on bootstrapping, such as random forests, and various clustering algorithms.

The mathematics/statistics departments in Norway should play an important role in both research and teaching in “Big Data” at the various universities. One idea might be to introduce a sub-program involving topics from mathematics, statistics and computer science, either as part of the industrial mathematics program or the master program in mathematical sciences. Some universities in the USA have programs for "statistical engineering" where analysis of Big Data is emphasized.

The Department of Mathematical Sciences at NTNU currently offers a PhD course, "General Statistical Methods", whose main topic is statistical learning, with emphasis on techniques that are important for Big Data. This course can be further developed into a master/doctoral course, and will also be open to students from other departments who want an introduction to the statistical underpinning of Big Data methods.

The Partnering Event revealed an interest of industrial partners in research in Big Data at the Universities. This might give rise to future research collaborations. Here is a list of topics received before the seminar:

Big data for production and reservoir optimisation. (Statoil)
Machine learning Cyber Security (Telenor)
Analysis of huge amounts of usage data from Microsoft’s cloud services. (Microsoft, Fast)
Data analysis and challenges related to the media industry (Amedia)
Mathematics/statistics for analytics + Machine Learning for Language understanding and Personalisation (Telenor)

The following abstract of the talk given by Stig Faltinsen, Danske Bank, also indicates how the universities can contribute to the analysis of Big Data. Faltinsen also recommended that IMF (the Department of Mathematical Sciences at NTNU) offer “state of the art” continuing education courses in mathematical disciplines.

(Abstract): In this talk, we show examples of how mathematics is applied in finance, banking and life insurance. Important modelling themes are risk measurement, risk management, pricing and forecasting. Within these industries, a wide range of mathematical disciplines is used, for example statistics, stochastic calculus, Monte Carlo simulations, time series analysis and numerical solutions of SDEs and PDEs, to mention some.

In addition, after the seminar the Department of Mathematical Sciences at NTNU (and the statistics group in particular) received a request from Christian Meland at Sparebank 1/Credit Cards. He would like to cooperate with students and advisors who want to work on various problems involving big data from credit card transactions.

Reference:

Snee, Ronald D., Richard D. De Veaux, and Roger W. Hoerl. "Follow the fundamentals." Quality Progress 47.1 (2014): 24-28.

NTNU, 4 October 2016,

Bo Lindqvist

APPENDIX: Flip-over notes from the Partnering Event (image not available).

TOPOLOGICAL DATA ANALYSIS (TDA)

Moderator: Nils A. Baas

Participants’ keywords written on the board (ranked according to number of coloured “votes” obtained):

Can topology help us understand the human brain?
Statistics and data on manifolds (or with external geometry).
Can some aspects of climate change be studied with this?
Can this be used to classify subsurface data?

TDA is a relatively new field of applied topology. The idea is to take a data set and associate a topological space to it. Often this will depend on a scale or correlation parameter, giving a family of spaces. The method called persistent homology then gives invariants, called persistence diagrams or Betti curves, that measure the topological holes in the spaces. This often gives useful geometric information about the data set. Comparison with "random spaces" gives a useful test of the degree of structure in the data set.
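To make the idea of a scale-dependent family of spaces concrete, the following minimal sketch computes only the simplest Betti curve, the number of connected components (Betti-0), of a point cloud as the scale grows. It is an illustration under stated assumptions, not the method used by the group: the function name betti0_curve, the convention that two points are joined when their distance is at most the scale, and the toy point cloud are all hypothetical choices. Higher-dimensional holes require a full persistent homology computation, which is not attempted here.

```python
from itertools import combinations
from math import dist


def betti0_curve(points, scales):
    """Count connected components of the neighbourhood graph at each scale.

    Two points are joined when their distance is <= the scale (an assumed
    convention). Returns one count per scale, in increasing order of scale.
    """
    n = len(points)
    # All pairwise edges, sorted by length, so they can be added incrementally.
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    curve = []
    components = n
    k = 0  # index of the next edge not yet added
    for eps in sorted(scales):
        while k < len(edges) and edges[k][0] <= eps:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:  # the edge merges two components
                parent[ri] = rj
                components -= 1
            k += 1
        curve.append(components)
    return curve


# Toy data: two well-separated clusters (hypothetical example).
cloud = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
print(betti0_curve(cloud, [0.5, 1.2, 5.0, 20.0]))  # [5, 2, 2, 1]
```

The count stays at 2 over a wide range of scales before dropping to 1; such a long-lived feature is what persistent homology treats as structure rather than noise.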

At the Department of Mathematical Sciences at NTNU we have mostly looked at biological data, genomic and neural: for example, recordings from rat brains in the form of "spike trains", which give rise to spaces and Betti curves. The method can then give information about the influence of various stimuli on the spike trains, especially in the hippocampus and entorhinal cortex.

The method seems to have great potential for a variety of data sets in meteorology, geophysics, etc., all of which remain to be explored!

At the partnering event there were mostly comments and questions regarding applications to neuroscience and statistical aspects of the method.

HEALTH ANALYTICS AND RELATED CHALLENGES

Moderator: Fred Godtliebsen

During the first part of this session, the following tasks were written down:

EHR info extraction
Personalized medicine
Data integration
Regulation and legal aspects
Automatic organ recognition in ultrasound images
Extend to disease recognition (myocardial contraction)
Information privacy
Combining imaging + treatment (e.g. ultrasound and MRI)
Limited resources in health sector => How to optimize ‘public health’
Uncertainty
Epidemics/pandemics

In the second part the following was written down on the board:

Personalized medicine

Algorithms for blood glucose control in T1D patients
Gene information
Organ recognition based on curves or surfaces to improve recognition of objects (Must be fast).
Deep learning based methods for organ recognition

How can academia keep up with industry driven research?
Neural networks are coming back. Links to machine learning

(See paper ‘The two cultures’ by Leo Breiman)

Concluding remarks:

Several good discussions took place in front of the board. The two most important outcomes were:

Potential collaboration between the Machine Learning group at UiT and GE Healthcare on organ detection using machine learning. Kjell Kristoffersen recommended that the UiT group contact Erik Steen at GE Healthcare, and this has already been done. The shape analysis performed at NTNU (by Elena Celledoni and collaborators) may also play an important role here. UiT will have an initial discussion with GE Healthcare during week 42 about potential collaboration.
Potential collaboration between Simula researchers and the Machine Learning group at UiT on algorithms for controlling blood glucose in persons with Type 1 Diabetes. There has been no contact regarding this after the meeting, but this may change when the Machine Learning group in Tromsø has made more progress in this direction. There is also much room for collaboration with the APT (Artificial Pancreas Trondheim) group, in particular for obtaining more realistic simulators.

POROUS MEDIA FLOW

Moderators: Knut-Andreas Lie and Florin A. Radu

In the discussion group “flow in porous media” we agreed that there are still many relevant open research questions, especially in connection with:

New mathematical models
Stochastic simulations
Software development

New models and numerical methods are needed because in many relevant applications (e.g. enhanced oil recovery or water pollution) the classical models are not able to reproduce reality. As an example, saturation overshoot and fingering cannot be modelled by Richards’ equation, which is the classical approach. To capture these effects one has to include either hysteresis or dynamic capillarity, which leads to so-called non-standard models. Another example is unstable displacement and viscous fingering in miscible fluid systems, which are not well resolved by standard simulation methods.
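For orientation, the classical model and one non-standard extension mentioned above can be written as follows. This is a sketch in commonly used notation that is assumed here and was not part of the discussion: \theta is the water content, \psi the pressure head, K the hydraulic conductivity, z the vertical coordinate (pointing upwards), p_n and p_w the non-wetting and wetting phase pressures, S_w the wetting-phase saturation, p_c the equilibrium capillary pressure, and \tau \ge 0 a damping coefficient.

```latex
% Richards' equation (classical model for unsaturated flow)
\[
  \frac{\partial \theta(\psi)}{\partial t}
  = \nabla \cdot \bigl( K(\psi)\, \nabla (\psi + z) \bigr)
\]

% Equilibrium closure used in the classical two-phase models
\[
  p_n - p_w = p_c(S_w)
\]

% One common form of dynamic capillarity: the closure gains a rate term
\[
  p_n - p_w = p_c(S_w) - \tau \, \frac{\partial S_w}{\partial t}
\]
```

With \tau = 0 the last relation reduces to the equilibrium closure; hysteresis would instead make p_c depend on the saturation history, which is not written out here.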

Stochastic simulations are of increasing interest. It is not new that realistic reservoir porous media are heterogeneous, but until now stochastic simulations were simply too expensive. In the meantime, simulation techniques and computer power have improved considerably, and stochastic simulations are now feasible also for large-scale applications.

Last, but not least, there is a significant interest in new applications in which porous media theory is coupled to other effects like mechanics, heating, electric currents, etc. One example here is within metallurgy (submerged arc furnaces, blast furnaces, etc).

OPTIMISATION AND INVERSE PROBLEMS

Moderators: Markus Grasmair and Anton Evgrafov

Most of the popular topics discussed in this group relate to techniques for dealing with large optimization and inverse problems, such as model reduction. Probably owing to a sizeable representation of statisticians in the discussion, keywords such as stochastic inversion and uncertainty modelling were mentioned; the latter (together with uncertainty quantification) is indeed a very active research topic in applied mathematics. Regularization is an integral part of solution procedures for most inverse problems and often appears in methods for dimension reduction of large-scale problems (for example sparse regularization), and the discussion participants acknowledged this. Shape optimization was mentioned both in the context of geological inverse problems with structured priors and in the context of geometric optimization and optimization on manifolds. Machine learning and many other big-data and statistical-inference activities can be naturally expressed in optimization terms.

Since the Norwegian economy is still strongly dependent on the oil industry, questions related to geological inverse problems (resolving geological layers, identifying geological material properties and/or boundaries from indirect measurements) are of great interest to several industrial players. These problems often involve large-scale, partial and noisy measurements, and are extremely ill-posed. At least some of these issues can be mitigated by using prior information, which is usually available, about the geological layout and the nature of the solution; this information is frequently of a statistical nature.
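The effect of ill-posedness and the role of prior information can be illustrated on a deliberately tiny example. The sketch below uses Tikhonov regularization, one standard technique chosen here for illustration and not a method attributed to the discussion group; the matrix, the noise level, the regularization weight and the helper names solve and tikhonov are all assumptions made for the example.

```python
def solve(M, rhs):
    """Solve the square system M x = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]  # augmented matrix [M | rhs]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def tikhonov(A, b, lam, x_prior):
    """Minimise ||A x - b||^2 + lam * ||x - x_prior||^2.

    Solved through the normal equations
    (A^T A + lam I) x = A^T b + lam x_prior.
    """
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    M = [[AtA[i][j] + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    rhs = [Atb[i] + lam * x_prior[i] for i in range(n)]
    return solve(M, rhs)


# Nearly singular forward operator; the exact data for x = (1, 1) is (2, 2.0001).
A = [[1.0, 1.0], [1.0, 1.0001]]
b_noisy = [2.0, 2.0002]  # second measurement perturbed by 1e-4

x_naive = solve(A, b_noisy)                       # direct inversion
x_reg = tikhonov(A, b_noisy, 1e-3, [0.0, 0.0])    # zero prior, assumed weight
print("naive:      ", x_naive)
print("regularised:", x_reg)
```

In this toy problem a perturbation of 1e-4 in the data moves the direct solution from (1, 1) to roughly (0, 2), while the regularised solution stays close to (1, 1). An informative prior, for instance from a geological model, would be passed as x_prior instead of the zero vector.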

Many of the topics outlined on the posters were brought up by fellow mathematicians and statisticians. This indicates many possible research opportunities within the Department of Mathematical Sciences, which we think will be of strong industrial interest in the future; as an example we refer to the modern optimization activities in infinite-dimensional Riemannian geometry and their applications in shape optimization.

We feel that the many generic topics brought up on the board, mostly related to mathematical modelling, reflect the overwhelming interest in and need for inverse problems and optimization in both industry and academia.

The conference is sponsored by the Research Council of Norway and by ECMI.

Contact

Elena Celledoni, NTNU

Alexander Schmeding, NTNU

Fred Godtliebsen and Trygve Johnsen
University of Tromsø

Ingrid Kristine Glad
University of Oslo

Sigmund Selberg and Adrian Florin Radu
University of Bergen

Yuriy Rogovchenko
University of Agder

Tore Selland Kleppe and Bjørn Henrik Auestad
University of Stavanger

Norges Forskningsråd

ECMI

H2020 RISE MSCA project CHiPS
