Efficient training of diffusion transformers for the weather

The promise of diffusion transformers for weather is limited by the fact that they typically require far more resources to train than non-generative models. With the push towards higher-resolution data and multimodal inputs such as diverse observations, transformers must process even more data. This increases their computational cost, which typically scales quadratically with the number of data patches processed. A current solution for improving training efficiency is to randomly mask patches during training, reducing the number of patches processed....

August 30, 2024 · 14 min · 2862 words · Raghul Parthipan

Score-based generative modelling

This page is about score-based generative modelling: why we would want to do it, and a simple way to do it. This is background to the more sophisticated diffusion models used in practice today. There are many tutorials and blog posts on diffusion already, such as Lilian Weng’s. My favourite is Yang Song’s, which I found excellent and refer you to. I largely follow his structure and refer to the same equations here....

August 10, 2024 · 8 min · 1536 words · Raghul Parthipan