Efficient training of diffusion transformers for the weather
The promise of diffusion transformers for weather is limited by the fact that they typically require far more resources to train than non-generative models. With the push to higher-resolution data and the handling of multiple modalities, such as diverse observations, transformers must process even more data. This increases their computational cost, which typically scales quadratically with the number of data patches processed. A current solution for improving training efficiency is to randomly mask patches during training, reducing the number of patches processed....
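The masking idea mentioned above can be sketched in plain NumPy: if an atmospheric state has already been split into patch embeddings, randomly keeping only a fraction of them shrinks the token count the transformer must attend over. The shapes, keep ratio, and function name here are my own illustrative assumptions, not details from any particular model.

```python
import numpy as np

def random_mask_patches(patches: np.ndarray, keep_ratio: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Keep a random subset of patches so the model processes fewer tokens."""
    n = patches.shape[0]
    n_keep = max(1, int(n * keep_ratio))
    keep_idx = rng.choice(n, size=n_keep, replace=False)
    keep_idx.sort()  # preserve the original patch ordering
    return patches[keep_idx]

rng = np.random.default_rng(0)
patches = rng.normal(size=(256, 64))  # 256 patches, 64-dim embeddings
visible = random_mask_patches(patches, keep_ratio=0.25, rng=rng)
print(visible.shape)  # (64, 64): only a quarter of the patches remain
```

Since attention cost scales roughly quadratically in the number of patches, keeping 25% of them cuts the attention cost by a factor of about sixteen for that training step.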
Score-based generative modelling
This page is about score-based generative modelling: why we would want to do it, and a simple way to do it. This is background to the more sophisticated diffusion models used in practice today. There are many tutorials and blog posts on diffusion already, such as Lilian Weng's. My favourite is Yang Song's, which I found excellent and to which I refer you. I largely follow his structure and refer to the same equations here....
Challenges in using causal ML for Numerical Weather Prediction
I want to describe some of the key challenges to overcome if we are to use causal ML to forecast the weather. These challenges also apply more broadly to the ML forecasting of dynamical systems, but I will focus on the weather, as it is an application area I'm interested in (I work on this) and one where a lot of ML progress is being made at the moment....
Representation learning for weather data
Representation learning has been hugely beneficial to the process of getting machine learning (ML) to do useful things with text. These useful things include getting better search results from a Google search and synthesizing images from text prompts (as done by models like DALL-E). Representation learning is about learning meaningful ways to mathematically represent input data. The power of representation learning is that a single ML model can often extract mathematical representations of an input (e....
The link between causality and invariant predictors
There are a number of reasons we may wish to learn causal mechanisms when modelling a system, forecasting the weather, or classifying an image. If our model captures the underlying causal mechanisms, it should be robust to new scenarios (e.g. the future), and it should still produce sensible results if we alter the input (“make an intervention”). Intervening on a system and seeing how things turn out helps us make decisions. The issue is that the majority of existing ML tools simply learn correlations....
ML for Numerical Weather Prediction
Recently, a great number of ML models have performed Numerical Weather Prediction (NWP) with accuracy similar to that of state-of-the-art physics-based models. Moreover, these ML models are orders of magnitude faster at creating forecasts. My intention here is to highlight the parts of these ML models which I think are particularly noteworthy. I will be using the following groupings: Efficiency - this explores what techniques are used to work with the large data which represents atmospheric states....
The Kelly Criterion and making bets
I offer you a game: I’ll flip a coin, and you will make a bet. If you guess the result correctly, I’ll give you your bet back plus 200% of the bet. If you guess wrong, I’ll keep your money. We will play this game many, many times. I ask you for a coin, and you sneakily pass me a biased coin with an 80% chance of landing on heads. Given this, the game seems favourable to you, and you sense that this is indeed a game worth playing (after all, if you say “heads”, the probability that you’ll be right is 80%)....
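For a game like this one, the standard Kelly criterion gives the fraction of your bankroll to stake each round: f* = (bp − q) / b, where b is the net payout per unit bet (200%, so b = 2), p is the win probability, and q = 1 − p. A quick sketch (the function name is my own):

```python
def kelly_fraction(p: float, b: float) -> float:
    """Kelly bet size: p = win probability, b = net payout per unit staked."""
    q = 1.0 - p
    return (b * p - q) / b

# The game above: 80% chance of winning, a 200% (2-to-1) net payout.
f = kelly_fraction(p=0.8, b=2.0)
print(f)  # ≈ 0.7: stake 70% of your bankroll each round
```

Note that with a fair coin (p = 0.5) and even odds (b = 1), the formula gives f* = 0: there is no edge, so the optimal bet is nothing.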
Fitting Models by Maximizing Likelihood
How should we fit models to data? If you look around, some people minimize mean-squared-error. For others, it is the mean-absolute-error which should be reduced. A few may feel inclined towards the Huber loss. The aim of this article is to convince you that when we want to fit a model to some data, it is sensible to do so by maximizing the likelihood that the model assigns to that data....
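One standard fact connecting these losses: under an i.i.d. Gaussian noise model with fixed variance, the negative log-likelihood is an affine function of the mean-squared-error, so maximizing likelihood and minimizing MSE select the same parameter. A small synthetic check (the data and grid below are my own illustration):

```python
import numpy as np

# Synthetic data: y_i ~ N(mu=3, sigma=1), i.i.d.
rng = np.random.default_rng(42)
y = rng.normal(loc=3.0, scale=1.0, size=1000)

def neg_log_likelihood(mu, y, sigma=1.0):
    """Gaussian NLL: 0.5 * sum((y - mu)^2) / sigma^2 plus a mu-independent constant."""
    const = len(y) * np.log(sigma * np.sqrt(2 * np.pi))
    return 0.5 * np.sum((y - mu) ** 2) / sigma**2 + const

# Evaluate both objectives over a grid of candidate means.
grid = np.linspace(2.0, 4.0, 2001)
nll = np.array([neg_log_likelihood(m, y) for m in grid])
mse = np.array([np.mean((y - m) ** 2) for m in grid])

# Both objectives are minimized at (numerically) the same point: the sample mean.
print(grid[nll.argmin()], grid[mse.argmin()], y.mean())
```

The constant term in the NLL does not depend on mu, which is exactly why the two minimizers coincide; with a Laplace noise model instead, the same argument would recover the mean-absolute-error.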