Why move to the cloud
Remove technical barriers
Atmospheric scientists often lose time to non-science tasks: installing software libraries, making models compile and run without bugs, preparing model input data, or even setting up a Linux server.
Those technical tasks are becoming more and more challenging. As atmospheric models evolve to incorporate more scientific understanding and better computational technologies, they also need more complicated software, more computing power, and much more data.
Cloud computing can largely alleviate those problems. The goal of this project is to let researchers focus fully on scientific analysis rather than fighting software and hardware problems.
On the cloud, you can launch a server with everything configured correctly. Once I have built the model and saved it as an Amazon Machine Image (AMI), anyone can replicate exactly the same software environment and start using the model immediately (see Quick start guide for new users). You no longer have to compile the model yourself, so you do not run into compile errors.
This matters even more in the age of High-Performance Computing (HPC). Modern atmospheric models are often built with complicated software frameworks, notably the Earth System Modeling Framework (ESMF). Those frameworks allow model developers to use HPC technologies without writing large amounts of boilerplate MPI code, but they put an extra burden on model users: installing and configuring those frameworks is daunting, if not impossible, for a typical graduate student without a CS background. Fortunately, no matter how difficult those libraries are to install, only one person needs to build them once on the cloud. After that, no one has to repeat the work.
This software dependency hell can also be solved by containers such as Docker and Singularity. But the cloud also solves compute and data problems, as discussed below. You can combine containers and the cloud to get a consistent environment across local machines and cloud platforms. This is introduced in the advanced tutorials.
Local machines need up-front investment and have fixed capacity. Right before AGU, everyone is running models and jobs wait in the queue seemingly forever. During Christmas, no one is working and the machines sit idle but still incur maintenance costs.
Clouds are elastic. You can request an HPC cluster with 1000 cores for just 1 hour, and only pay for exactly that hour. If you have powerful local machines, you can still use the cloud to boost computing power temporarily.
GEOS-Chem currently has 30 TB of GEOS-FP/MERRA2 meteorological input data. At a bandwidth of 1 MB/s, it takes about two weeks to download a 1-TB subset and about a year to download the full 30 TB. To set up a high-resolution nested simulation, one often needs to spend a long time obtaining the corresponding meteorological fields. GCHP can ingest global high-resolution data and will push the data size even higher.
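The download-time estimates above can be checked with a few lines of Python. This is only a rough sketch: the function name `download_days` is hypothetical, and it assumes decimal units (1 TB = 10^6 MB) and a perfectly sustained transfer rate, which real connections rarely achieve.

```python
SECONDS_PER_DAY = 86_400


def download_days(size_tb: float, bandwidth_mb_per_s: float) -> float:
    """Days needed to transfer `size_tb` terabytes at a sustained rate.

    Assumes decimal units: 1 TB = 1,000,000 MB.
    """
    size_mb = size_tb * 1_000_000
    seconds = size_mb / bandwidth_mb_per_s
    return seconds / SECONDS_PER_DAY


# The two cases quoted in the text: a 1-TB subset and the full 30 TB at 1 MB/s.
for size_tb in (1, 30):
    days = download_days(size_tb, 1.0)
    print(f"{size_tb:>2} TB at 1 MB/s: {days:.1f} days")
# Prints about 11.6 days for 1 TB and about 347.2 days for 30 TB.
```

Under these assumptions the 1-TB subset takes a little under two weeks and the full archive takes close to a year, in line with the rounded figures quoted above.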
The new paradigm for solving this big-data challenge is to “move compute to data”, i.e. to perform computing directly in the cloud environment where the data is already available (see also Massive Earth observation data). AWS has agreed to host all GEOS-Chem input data for free under the Public Data Set Program. With all the data already available in the cloud environment, you can perform simulations over any period with any configuration.
Open new research opportunities
The cloud not only makes model simulations much easier, but also opens many new research opportunities in Earth science.
Massive Earth observation data
Massive amounts of satellite and other Earth science data are being moved to the cloud. One success story is the migration of NOAA’s NEXRAD data to AWS (Ansari et al., 2017, BAMS): it is reported that “data access that previously took 3+ years to complete now requires only a few days” (NAS, 2018, Chapter “Data and Computation in the Cloud”). By learning cloud computing, you can get access to massive Earth science datasets on AWS without having to spend a long time downloading them to local machines.
The most exciting project is perhaps the cloud migration of NASA’s Earth Observing System Data and Information System (EOSDIS). It will open new opportunities such as ultra-high-resolution inversion of satellite data, leveraging massive data and computing power available on the cloud. This kind of analysis is hard to imagine on traditional platforms.
Machine learning and deep learning
Cloud platforms are the go-to choice for training machine learning models, especially deep neural networks. The cloud offers massive numbers of GPUs, which can deliver roughly 50x the performance of CPUs for training neural nets. Pre-configured environments on the cloud (e.g. the AWS Deep Learning AMI) allow users to run their programs immediately without wasting time configuring GPU libraries.
Instructions for using the cloud are often included in the official documentation of ML/DL frameworks:
- Keras on AWS GPU. Keras is the most popular high-level deep learning library, built on top of TensorFlow.
- XGBoost on AWS cluster. XGBoost is the most popular library for gradient boosting, and is also the most widely used tool on Kaggle.
… and in deep learning textbooks and course materials:
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition. See Google Cloud Tutorial and AWS Tutorial. (CS231n is probably one of the most popular deep learning courses, with all videos and materials freely available online.)
- Deep Learning with Python, by François Chollet, the author of Keras. See Appendix B. Running Jupyter notebooks on AWS GPU. (This book has a full 5-star rating on Amazon.)
- Deep Learning - The Straight Dope. This is a very nice interactive textbook on deep learning. Its official Chinese version includes instructions for using AWS. See the official AWS docs for the equivalent instructions in English.