Cloud Engineering Meets Cloud Research

The Cloud Case ⛅
Ever explained cloud services (AWS) to a new user? Fun times! We recently helped a research institute take their weather prediction models from "runs on a laptop" to "runs in the cloud like a charm", not only for them, but also for partner institutes. Here's how we turned their Python-powered forecasting models into a proper cloud solution, and what we learned along the way.
Making Heavy Models Become Friends with AWS
Picture this: brilliant researchers with powerful models that crunch massive amounts of weather data to predict droughts worldwide. Cool stuff, right? But this brings a fun data challenge – or rather, several:
Models (yes, plural) that need serious computing power (and we mean serious)
Researchers with deep expertise in climate modelling, who should be able to focus on what they are good at rather than wondering whether Docker is a Pokémon
Code that needed to work both on a personal laptop and in the cloud
Recurring prediction runs with new data that had to happen... well, monthly
Multiple collaborators needing their slice of the data pie
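To make the "laptop and cloud" point a bit more concrete, here is a minimal sketch of one way to keep the same code working in both places: read the data root from the environment and build paths from it. The `DATA_ROOT` variable, the function names and the example bucket are illustrative assumptions, not the project's actual code.

```python
import os
import posixpath
from typing import Mapping, Optional


def resolve_data_uri(relative_path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Join a relative data path onto the configured data root.

    DATA_ROOT is a hypothetical variable: a folder on a laptop,
    or an s3:// prefix when running in the cloud.
    """
    env = os.environ if env is None else env
    root = env.get("DATA_ROOT", "./data")
    # Strip leading slashes so the relative path can never replace the root.
    return posixpath.join(root, relative_path.lstrip("/"))


def is_remote(uri: str) -> bool:
    """True when the URI points at S3 rather than the local disk."""
    return uri.startswith("s3://")


if __name__ == "__main__":
    # Same call, different environment: only DATA_ROOT changes.
    print(resolve_data_uri("inputs/rainfall.nc", {"DATA_ROOT": "./data"}))
    print(resolve_data_uri("inputs/rainfall.nc", {"DATA_ROOT": "s3://example-bucket/drought"}))
```

The model code only ever asks for a relative path; whether that ends up on disk or in a bucket is decided by configuration, which also keeps local testing cheap.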
Sharing Our Recipe
Here's the overview of what we built:
GitHub repo for forecasting code and infrastructure
GitHub Actions workflow for automated Docker builds
ECR for keeping our Docker images neat and tidy
AWS S3 for data storage (because where else would you keep terabytes of data?)
AWS Batch with Fargate for running the heavy lifting
EventBridge for making sure everything runs on schedule
IAM to control access (granting the rights you need, and no more than you need)
The secret ingredient to all this? Infrastructure as Code in Terraform, making sure our infrastructure stays as neat and tidy as our code - version controlled and reproducible.
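As an illustration of the scheduling piece, EventBridge schedules use a six-field cron expression (minutes, hours, day of month, month, day of week, year), where either the day of month or the day of week must be `?`. The small helper below builds such an expression for a monthly run. The helper itself and the chosen defaults (the 1st of the month at 02:00 UTC) are assumptions for the sake of the example; the real schedule lives in the Terraform code.

```python
def monthly_cron(day_of_month: int = 1, hour_utc: int = 2, minute: int = 0) -> str:
    """Build an EventBridge cron expression for one run per month.

    Field order: minutes hours day-of-month month day-of-week year.
    """
    # Days 29-31 do not exist in every month, so keep to 1-28.
    if not 1 <= day_of_month <= 28:
        raise ValueError("day_of_month must be between 1 and 28")
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # '?' in day-of-week: the day-of-month field already pins the day.
    return f"cron({minute} {hour_utc} {day_of_month} * ? *)"


if __name__ == "__main__":
    print(monthly_cron())  # cron(0 2 1 * ? *)
```

EventBridge evaluates these expressions in UTC, so "overnight" needs to be translated from local time when picking the hour.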
Less is More
The initial brilliant and cool idea was to run multiple weather models in parallel to make things faster. Then we reconsidered: predictions run once a month… Keep it simple! Running in series had few drawbacks (a shorter runtime was not a priority, since we schedule the run overnight anyway). In return, it reduced the complexity of the solution, leading to lower (overhead) costs, fewer points of failure and easier maintenance and troubleshooting. Sometimes the best engineering decision is deciding what not to engineer.
What We Learned
1. Good Code = Happy Cloud
Before even thinking about AWS, we rolled up our sleeves and invested time to make the codebase cloud-proof. Clean code is like a good foundation, making the cloud engineering much easier – boring but essential.
2. Simplicity Wins Every Time
Could we have made it more complex? Sure! Would it have helped? Nope. Sometimes the best solution is the one that's easiest to explain to someone else at 3 AM.
3. Terraform is Your Friend
Yes, writing infrastructure as code (IaC) takes time. No, you won't regret it when you need to hand things over or have to change something six months later.
4. Testing Isn't Optional
We made it super easy to test everything and monitor the processes, everywhere. We love an efficient development workflow, and we would rather skip long troubleshooting sessions in production.
5. CI/CD for the Win
Setting up proper CI/CD pipelines from day one meant our code went from commit to cloud without breaking a sweat (automating testing, containerisation and deployment). There's nothing better than knowing exactly which commit predicted today’s weather.
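One possible way to keep track of which commit produced a forecast is to store a small metadata file next to each output. The sketch below assumes the commit hash reaches the container as an environment variable called `GIT_COMMIT`; that name and the metadata fields are hypothetical. In a GitHub Actions workflow, the default `GITHUB_SHA` variable could be passed into the Docker build to fill it.

```python
import json
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_run_metadata(model_name: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect provenance information for a single prediction run."""
    env = os.environ if env is None else env
    return {
        "model": model_name,
        # GIT_COMMIT is a hypothetical variable set at image build time.
        "git_commit": env.get("GIT_COMMIT", "unknown"),
        "created_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def write_run_metadata(path: str, metadata: dict) -> None:
    """Write the metadata as JSON next to the forecast output."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2)


if __name__ == "__main__":
    meta = build_run_metadata("example_model", {"GIT_COMMIT": "abc1234"})
    print(json.dumps(meta, indent=2))
```

With a file like this stored alongside the results, tracing a forecast back to its code is a lookup rather than an investigation.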
The Happy End
The result? A system that:
Runs like clockwork (monthly, to be precise)
Makes researchers happy (they can focus on science!)
Delivers results reliably (no more "did it run?" messages)
Can be fixed easily when things go wrong (because they inevitably will)
What's Next?
We could add fancy dashboards, more automation, or performance tweaks. But first, we'll let the researchers play with their new toy and tell us what they actually need. Because sometimes the best feature is the one you don't build.
TL;DR (Too Long; Didn’t Read)
Start with good code – it makes everything else easier
Keep it simple – complexity is the enemy of reliability
Use infrastructure as code – your future self will thank you
Make it testable – because everyone hates surprises in production
Listen to your users – sometimes they need less than you think
Remember: the best cloud solution isn't always the most sophisticated one – it's the one that gets the job done while letting everyone sleep at night (or day). 🌤