### Multi-dimensional stochastic dynamical systems and network theory with applications

Main supervisor: Woosok Moon

Scientists in the past struggled obtaining necessary data from observations and experiments to test their hypotheses. However, in the era of big data our focus has transitioned towards how to extract useful information from a plethora of quantities. The most popular modern formalism for doing this is machine learning. In particular, deep learning based on neural networks is being widely used for numerous applications.

Recurrent Neural Networks (RNN) are one of the deep learning architectures that have been applied to prediction schemes in time-series data. This specific deep learning architecture tries to embrace various patterns contained in the nonlinear structure of time-series and to use them for near-future projections. For a successful prediction, the time-series should be sufficiently long to accommodate the patterns required for complete deep learning. Unfortunately, many important time-series, such as stock market indices, oil prices and global averaged temperature, are not long enough to rely only on the RNNs to predict near-future status.

In condensed matter physics and statistical mathematics, the theory of stochastic dynamical systems, deals with a dynamical system subjected to noise forcing. The simplest stochastic dynamical system, called the Langevin equation, was derived to study Brownian Motion and describes the balance between the deterministic stabilizing process of viscous dissipation and the effect of high frequency collisions as “noise”. In this formalism, we accept the limitation of predicting the contribution of the short-time processes.

Although these two formalisms have different advantages and disadvantages in their application to a time-series data, this project focuses on a hybrid methodology to overcome the limitations of finite-size time-series for predictability. To this end, the project will develop mathematical theories of multi-dimensional stochastic dynamical systems and complex networks to analyze and diagnose spatio-temporal data ubiquitous in environmental science. The ultimate goal is to develop a hybrid method using RNNs and stochastic mathematics for realistic forecasting. As a direct application, the model will be tested in a variety of abrupt climate transition scenarios in the observational record.

### Statistical modeling and algorithms for analyzing large biological datasets

Main Supervisor: Kristoffer Sahlin

General description
The group led by Kristoffer Sahlin primarily works on problems in the area of Computational Biology. The area of Computational Biology typically studies and designs algorithms, statistical and mathematical models, or machine learning methods to process, correct, visualize, or summarize large biological datasets. Typical problems in this area are to predict DNA or RNA sequences and their abundance from noisy sequencing data or to estimate phylogeny or viruses, species, or tumors DNA mutation rates.

Project description
In our cells, DNA and RNA play an important role in determining the state and condition of cells and tissue and therefore give key information to the study of diseases, traits, and evolution. We can now access DNA and RNA sequencing data at large scale and low cost. The successful applicant will work with designing algorithms and statistical methods to extract information from large RNA and DNA sequencing datasets, with immediate relevance to biological and biomedical questions. The applicant will have the opportunity to collaborate with groups both in biological and mathematical sciences.

Other information
We are looking for an ambitious and highly motivated Ph.D. student in Computational Biology. It is preferential that successful applicants are familiar with scripting/programming. Bioinformatic/biological familiarity or experience with sequencing data is an advantage but is not required.  The applicant can expect to develop into an interdisciplinary researcher with strong interdisciplinary knowledge in statistical methods and algorithm design for large-scale life science and biology datasets. The position is available at the Department of Mathematics at Stockholm University in the Division of Computational Mathematics. The group of Kristoffer Sahlin is located at the interdisciplinary research facility SciLifeLab in proximity to the campus of Karolinska Institute where the working atmosphere is international with English as the working language.

### Cubical methods for computer formalization of mathematics

Main supervisor: Anders Mörtberg

Dependent type theory is a foundation for constructive mathematics that underlies many modern proof assistants (Agda, Coq, Lean, …). These systems allow for interactive development of proofs from mathematics and computer science so that all of the steps are verified to be logically sound. In recent years new type theories and models for these theories have been developed using ideas from topology and homotopy theory, leading to a new area called Homotopy Type Theory and Univalent Foundations (HoTT/UF). As part of these new developments various Cubical Type Theories (CTT) have been designed, leading to the adaptation of these ideas in modern proof assistants. For instance, the dependently typed programming language Agda now comes with a cubical mode. These new developments enable many interesting possibilities for doing constructive HoTT/UF and the PhD project would involve studying this further. More specific project ideas include:

1. Computing using CTT: of particular interest is the "Brunerie number" (a synthetic definition of $n \in \mathbb{Z}$ such that $\pi_4(\mathbb{S}^3) = \mathbb{Z}/n\mathbb{Z}$). Despite considerable efforts all attempts to compute this number using a computer have so far been unfruitful. Solving this problem is of great interest as it will open up the possibility of doing more synthetic mathematics in type theory using computational methods.

2. Study metatheoretical properties of type theory in general and CTT in particular. For example, analyze the relationships between the constructive cubical set models with the Kan simplicial set model of HoTT/UF.

3. Explore the consequences of constructive HoTT/UF using Cubical Agda by doing various formalization projects in both mathematics and computer science. For instance, using higher inductive types to do constructive synthetic homotopy theory or to represent programming
languages quotiented by beta-eta equality.

4. Develop better support for CTT in modern proof assistants. As part of this one will have to answer many interesting questions related to typechecking, unification, etc., for CTT.

The project will be carried out in the newly formed Computational Mathematics division. It will also be possible to collaborate with other groups in the department, such as the Mathematical Logic group (with experts on constructive mathematics and type theory like Per Martin-Löf and Peter LeFanu Lumsdaine) and the Algebra, Geometry, Topology, and Combinatorics group.

### Computational problems in evolution relating to the role of gene duplications in coevolutionary interactions

We have in past projects developed models and tools for analyzing gene evolution with respect to species evolution, explicitly taking gene duplications, transfers, and losses into account. Using techniques from statistical learning and inference, the PhD candidate will adapt and extend this work for other types of data and new question in evolution. Working with experts on the evolution of Pieridae butterflies and their Brassicales hostplants, we want to address detailed questions regarding the evolutionary history of Pieridae genes, in particular relating to gene duplication and functional innovation. The aim is to better understand the genetic and evolutionary basis of what drives the diversification of much of life on Earth.

The project is carried out in collaboration with Prof Christopher Wheat at the Department of Zoology, who has recently hired a PhD student to work on the biological analysis part of the project. The goal is to have both PhD students work together and share insights between their two fields.