This lesson is being piloted (Beta version)

ESMValTool Tutorial

Introduction

Overview

Teaching: 5 min
Exercises: 10 min
Questions
  • What is ESMValTool?

  • Who are the people behind ESMValTool?

Objectives
  • Become familiar with ESMValTool

  • Synchronize expectations

What is ESMValTool?

This tutorial is a first introduction to ESMValTool. Before diving into the technical steps, let’s talk about what ESMValTool is all about.

What is ESMValTool?

What do you already know about, or expect from ESMValTool?

ESMValTool is…

ESMValTool is many things, but in this tutorial we will focus on the following traits:

A tool to analyse climate data

A collection of diagnostics for reproducible climate science

A community effort

A tool to analyse climate data

ESMValTool takes care of finding, opening, checking, fixing, concatenating, and preprocessing CMIP data and several other supported datasets.

The central component of ESMValTool that we will see in this tutorial is the recipe. An ESMValTool recipe is essentially a set of instructions to reproduce a certain result. The basic structure of a recipe consists of four sections: documentation, datasets, preprocessors, and diagnostics.

An example recipe could look like this:

documentation:
  description: Example recipe
  authors:
    - lastname_firstname

datasets:
  - {dataset: HadGEM2-ES, project: CMIP5, exp: historical, mip: Amon, ensemble: r1i1p1, start_year: 1960, end_year: 2005}

preprocessors:
  global_mean:
    area_statistics:
      operator: mean

diagnostics:
  hockeystick_plot:
    description: plot of global mean temperature change
    variables:
      temperature:
        short_name: tas
        preprocessor: global_mean
    scripts: hockeystick.py

Understanding the different sections of the recipe

Try to figure out the meaning of the different dataset keys. Hint: they can be found in the documentation of ESMValTool.

Solution

The keys are explained in the ESMValTool documentation, in the section “The recipe format”, under datasets.

A collection of diagnostics for reproducible climate science

More than a tool, ESMValTool is a collection of publicly available recipes and diagnostic scripts. This makes it possible to easily reproduce important results.

Explore the available recipes

Go to the ESMValTool documentation and explore the available recipes section. Which recipe(s) would you like to try?

A community effort

ESMValTool is built and maintained by an active community of scientists and software engineers. It is an open source project to which anyone can contribute. Many of the interactions take place on GitHub. Here, we briefly introduce you to some of the most important pages.

Meet ESMValGroup

Browse to github.com/ESMValGroup. This is our ‘organization’ GitHub page. Have a look around. How many collaborators are there? Do you know any of them?

Near the top of the page there are 2 pinned repositories: ESMValTool and ESMValCore. Visit each of the repositories. How many people have contributed to each of them? Can you also find out how many people have contributed to this tutorial?

Issues and pull requests

Go back to the repository pages of ESMValTool or ESMValCore. There are tabs for ‘issues’ and ‘pull requests’. You can use the labels to navigate them a bit more. How many open issues are about enhancements of ESMValTool? And how many bugs have been fixed in ESMValCore? There is also an ‘insights’ tab, where you can see a summary of recent activity. How many issues have been opened and closed in the past month?

Conclusion

This concludes the introduction of the tutorial. You now have a basic knowledge of ESMValTool and its community. The following episodes will walk you through installing ESMValTool, configuring it, and running your first recipes.

Key Points

  • ESMValTool provides a reliable interface to analyse and evaluate climate data

  • A large collection of recipes and diagnostic scripts is already available

  • ESMValTool is built and maintained by an active community of scientists and developers


Installation

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What are the prerequisites for installing ESMValTool?

  • How do I confirm that the installation was successful?

Objectives
  • Install ESMValTool

  • Demonstrate that the installation was successful

Overview

These instructions cover the installation of ESMValTool on Linux, MacOSX, and Windows. We use the Conda package manager to install ESMValTool. Other installation methods are also available; they can be found in the documentation. We will first install Conda, and then ESMValTool. We end this chapter by testing that the installation was successful.

Before we begin, here are all the possible ways in which you can use ESMValTool, depending on your level of expertise or involvement with ESMValTool and associated software such as GitHub and Conda.

  1. If you have access to a server where ESMValTool is already installed as a module, e.g. the CEDA JASMIN server, you can simply load the module with the following command:
    module load esmvaltool
    

    After loading esmvaltool, we can start using ESMValTool. Please see the next lesson.

  2. If you would like to install ESMValTool as a conda package, then this lesson will tell you how!
  3. If you would like to start experimenting with existing diagnostics or contributing to ESMValTool, please see the instructions for installation from source in the lesson Development and contribution and in the documentation.

Install ESMValTool on Windows

ESMValTool does not directly support Windows, but successful usage has been reported through the Windows Subsystem for Linux (WSL), available in Windows 10. To install the WSL, please follow the instructions on the Windows Documentation page. After installing the WSL, installation can be done using the same instructions as for Linux/MacOSX.

Install ESMValTool on Linux/MacOSX

Install Conda

ESMValTool is distributed using Conda. Let’s check if we already have Conda installed by running:

conda list

If conda is installed, we will see a list of packages. We recommend updating conda before installing ESMValTool. To do so, run:

conda update -n base conda

If conda is not installed, we can use the Miniconda minimal installer for conda. We recommend a Python 3 based installer. For more information about installing conda, see the conda installation documentation.

To install conda on Linux or MacOSX, follow the instructions below:

  1. Please download Miniconda3 from the miniconda page. If you have problems with the 64-bit version in the next step(s), you can try a 32-bit version instead.

  2. Next, run the installer from the place where you downloaded it:

    On Linux:

    bash Miniconda3-latest-Linux-x86_64.sh
    

    On MacOSX:

    bash Miniconda3-latest-MacOSX-x86_64.sh
    
  3. Follow the instructions in the installer. The defaults should normally suffice.

  4. You will need to restart your terminal for the changes to take effect.

  5. Verify that you have a working conda installation by listing all installed packages:

    conda list
    

Install Julia

Some ESMValTool diagnostics are written in the Julia programming language. If you want a full installation of ESMValTool including Julia diagnostics, you need to make sure Julia is installed before installing ESMValTool.

In this tutorial, we will not use Julia, but for reference, we have listed the steps to install Julia below. Complete instructions for installing Julia can be found on the Julia installation page.

Julia installation instructions

First, open a bash terminal, create a directory to install Julia in, and cd into it:

mkdir ~/julia
cd ~/julia

Next, to download and extract the file julia-1.0.5-linux-x86_64.tar.gz, you can use the following commands:

wget https://julialang-s3.julialang.org/bin/linux/x64/1.0/julia-1.0.5-linux-x86_64.tar.gz
tar -xvzf julia-1.0.5-linux-x86_64.tar.gz

This will extract the files to a directory named ~/julia/julia-1.0.5. To run Julia, you need to add the full path of Julia’s bin folder to the PATH environment variable. To do this, you can edit the ~/.bashrc (or ~/.bash_profile) file. Open the file in a text editor called nano:

nano ~/.bashrc

and add a new line as follows at the bottom of the file:

export PATH="$PATH:$HOME/julia/julia-1.0.5/bin"

Finally, for the settings to take effect, either reload your bash profile

source ~/.bashrc

(or source ~/.bash_profile), or close the bash terminal window and open a new one.

To check that the Julia executable can be found, run

which julia

to display the path to the Julia executable; it should be

~/julia/julia-1.0.5/bin/julia

To test that Julia is installed correctly, run

julia

to start the interactive Julia interpreter. Press Ctrl+D to exit.
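You can also check the installed version without starting the interactive interpreter:

julia --version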

Install the ESMValTool package

The ESMValTool package contains diagnostics scripts in four languages: R, Python, Julia and NCL. This introduces a lot of dependencies, and therefore the installation can take quite long. It is, however, possible to install ‘subpackages’ for each of the languages. The following (sub)packages are available:
  • esmvaltool-julia
  • esmvaltool-ncl
  • esmvaltool-python
  • esmvaltool-r
The complete esmvaltool package includes all of the above.

For this tutorial, we will use only Python diagnostics. Thus, to install the esmvaltool-python package, run

conda create -n esmvaltool -c conda-forge -c esmvalgroup esmvaltool-python

This will create a new conda environment called esmvaltool, with the esmvaltool-python package and all of its dependencies installed in it.
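To check that the environment was created, you can list all conda environments; esmvaltool should appear in the output:

conda env list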

Common issues

  • Installation takes a long time
    • Downloads and compilations can take several minutes to complete, please be patient.
    • You might have bad luck and the dependencies cannot be resolved at the moment; please try again later or raise an issue
    • If Solving environment takes more than 10 minutes, you may need to update conda: conda update -n base conda
    • You can help conda solve the environment by specifying the python version:
      conda create -n esmvaltool -c conda-forge -c esmvalgroup esmvaltool-python python=3.8
      
    • Note that on MacOSX, esmvaltool-python and esmvaltool-ncl only work with Python 3.7. Use python=3.7 instead of python=3.8.
  • If you have an old conda installation, you could get an UnsatisfiableError message. Please install a newer version of conda and try again
  • Downloads fail due to a company proxy; see the conda docs on how to resolve this.

Test that the installation was successful

To test that the installation was successful, run

conda activate esmvaltool

to activate the conda environment called esmvaltool. In the shell prompt, the active conda environment should have changed from (base) to (esmvaltool).

Next, run

esmvaltool --help

to display the command line help.

Version of ESMValTool

Can you figure out which version of ESMValTool has been installed?

Solution

The esmvaltool --help command lists version as a subcommand that prints the version.

In my case, when I run

esmvaltool version

I get that my installed ESMValTool version is

ESMValCore: 2.0.0
ESMValTool: 2.0.0

Key Points

  • All the required packages can be installed using conda.

  • You can find more information about installation in the documentation.


Configuration

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What is the user configuration file and how should I use it?

Objectives
  • Understand the contents of the config-user.yml file

  • Prepare a personalized config-user.yml file

  • Configure ESMValTool to use your settings

The configuration file

The config-user.yml configuration file contains all the global level information needed by ESMValTool to run. This is a YAML file.

You can get the default configuration file by running:

  esmvaltool config get_config_user

It will save the file to ~/.esmvaltool/config-user.yml, where ~ is your home directory. Note that files and directories starting with a period are “hidden”; to see the .esmvaltool directory in the terminal, use ls -la ~.

We run a text editor called nano to have a look inside the configuration file and then modify it if needed:

  nano ~/.esmvaltool/config-user.yml

This file contains the information for:
  • output settings
  • the destination directory
  • the rootpath to input data
  • the directory structure of the input data
  • other settings

Text editor side note

No matter what editor you use, you will need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location. We use nano in examples here because it is one of the least complex text editors. Press ctrl + O to save the file, and then ctrl + X to exit nano.

Output settings

The configuration file starts with output settings that inform ESMValTool about your preferences for output. You can turn each setting on or off with true or false values. Most of these settings are fairly self-explanatory. For example, write_plots: true means that diagnostics create plots.
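For illustration, this part of the file may look something like the sketch below (the exact keys and defaults can differ between ESMValTool versions):

write_plots: true
write_netcdf: true
log_level: info
output_file_type: png
remove_preproc_dir: true
save_intermediary_cubes: false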

Saving preprocessed data

Later in this tutorial, we will want to look at the contents of the preproc folder. This folder contains preprocessed data and is removed by default when ESMValTool is run. In the configuration file, which settings can be modified to prevent this from happening?

Solution

If the option remove_preproc_dir is set to false, the preproc/ directory contains all the pre-processed data and the metadata interface files. If the option save_intermediary_cubes is set to true, data will also be saved after each preprocessor step in the preproc folder. Note that saving all intermediate results to file results in a considerable slowdown and can quickly fill your disk.
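In config-user.yml, this looks like:

remove_preproc_dir: false
save_intermediary_cubes: false  # set to true only if you really need intermediate results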

Destination directory

The destination directory is the root path where ESMValTool will store its output folders, containing e.g. figures, data, and logs. With every run, ESMValTool automatically generates a new output folder whose name consists of the recipe name plus the date and time in the format YYYYMMDD_HHMMSS.

Set the destination directory

Let’s name our destination directory esmvaltool_output, in the working directory. ESMValTool should write its output to this path. How do we modify config-user.yml to achieve this?

Solution

We use the output_dir entry in the config-user.yml file:

output_dir: ./esmvaltool_output

If the esmvaltool_output directory does not exist, ESMValTool will create it for you.

Rootpath to input data

ESMValTool uses several categories (referred to as projects) for input data, based on their source. The current categories in the configuration file are shown in the example below. For instance, CMIP5 is used for datasets from the Coupled Model Intercomparison Project phase 5, whereas OBS is used for observational datasets. You can find more information about the projects in the ESMValTool documentation. The rootpath specifies the directories where ESMValTool will look for input data. For each category, you can define either one path or several paths as a list. For example:

rootpath:
  CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2]
  OBS: ~/obs_inputpath
  RAWOBS: ~/rawobs_inputpath
  default: ~/default_inputpath
  CORDEX: ~/default_inputpath

Site-specific entries for Jasmin and DKRZ are listed at the end of the example configuration file.

Set the correct rootpath

In this tutorial, we will work with data from CMIP5 and CMIP6. How can we modify the rootpath to make sure the data path is set correctly for both CMIP5 and CMIP6?

Note: to get the data, check the instructions in the Setup section.

Solution

  • Are you working on your own local machine? You need to add the root path of the folder where the data is available to the config-user.yml file as:
    rootpath:
    ...
      CMIP5: ~/esmvaltool_tutorial/data
      CMIP6: ~/esmvaltool_tutorial/data
    
  • Are you working on a computer cluster like Jasmin or DKRZ? Site-specific paths to the data are already listed at the end of the config-user.yml file. You need to uncomment the relevant lines. For example, on Jasmin:
  # Site-specific entries: Jasmin
  # Uncomment the lines below to locate data on JASMIN
  rootpath:
    CMIP6: /badc/cmip6/data/CMIP6
    CMIP5: /badc/cmip5/data/cmip5/output1
  #  CMIP3: /badc/cmip3_drs/data/cmip3/output
  #  OBS: /gws/nopw/j04/esmeval/obsdata-v2
  #  OBS6: /gws/nopw/j04/esmeval/obsdata-v2
  #  obs4mips: /gws/nopw/j04/esmeval/obsdata-v2
  #  ana4mips: /gws/nopw/j04/esmeval/obsdata-v2
  #  CORDEX: /badc/cordex/data/CORDEX/output
  • For more information about setting the rootpath, see also the ESMValTool documentation.

Directory structure for the data from different projects

Input data can come from various models, observations, and reanalysis datasets that adhere to the CF/CMOR standard. The drs setting describes the directory structure used to organize these files for several projects (e.g. CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC). For more information about drs, you can visit the ESMValTool documentation on Data Reference Syntax (DRS).

Set the correct drs

In this lesson, we will work with data from CMIP5 and CMIP6. How can we set the correct drs?

Solution

  • Are you working on your own local machine? You need to set the drs of the data in the config-user.yml file as:
    drs:
      CMIP5: default
      CMIP6: default
    
  • Are you working on a computer cluster like Jasmin or DKRZ? Site-specific drs entries are already listed at the end of the config-user.yml file. You need to uncomment the relevant lines. For example, on Jasmin:
    # Site-specific entries: Jasmin
    # Uncomment the lines below to locate data on JASMIN
    drs:
      CMIP6: BADC
      CMIP5: BADC
    #  CMIP3: BADC
    #  CORDEX: BADC
    #  OBS: BADC
    #  OBS6: BADC
    #  obs4mips: BADC
    #  ana4mips: BADC
    

Explain the default drs (if working on local machine)

  1. In the previous exercise, we set the drs of CMIP5 data to default. Can you explain why?
  2. Have a look at the directory structure of the data. There is the folder Tier1. What does it mean?

Solution

  1. drs: default is one way to retrieve data from a ROOT directory that has no DRS-like structure. default indicates that all the files are in a folder without any structure.

  2. Observational data are organized in Tiers depending on their level of public availability. Therefore, the default directory must be structured accordingly, with sub-directories TierX (e.g. Tier1, Tier2 or Tier3), even when drs: default.

Other settings

Auxiliary data directory

The auxiliary_data_dir setting is the path where any required additional auxiliary data files are stored. This location allows us to tell the diagnostic script where to find files that cannot be downloaded at runtime. This option should not be used for model or observational datasets, but for additional data files used in plotting, such as shape files with coastline descriptions.

auxiliary_data_dir: ~/auxiliary_data

See the ESMValTool documentation for more information.

Number of parallel tasks

This option enables parallel processing. You can set the number of parallel tasks to 1/2/3/4/… or to null, which tells ESMValTool to use the maximum number of available CPUs. For the purpose of the tutorial, please configure ESMValTool to use only 1 CPU:

max_parallel_tasks: 1

In general, if you run out of memory, try setting max_parallel_tasks to 1. Then, check the amount of memory you need by inspecting the file run/resource_usage.txt in the output directory. Using the number reported there, you can increase the number of parallel tasks again to a level suitable for the amount of memory available on your system.
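For example, assuming the destination directory we configured earlier, you could inspect the resource usage of a run with:

cat esmvaltool_output/recipe_*/run/resource_usage.txt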

Make your own configuration file

It is possible to have several configuration files for different purposes, for example: config-user_formalised_runs.yml, config-user_debugging.yml. In this case, you have to pass the path of your configuration file as a command-line option when running ESMValTool. We will learn how to do this in the next lesson.
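For reference, passing a configuration file may look like the sketch below (the --config_file option is described in the ESMValTool documentation):

esmvaltool run --config_file ~/config-user_debugging.yml examples/recipe_python.yml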

Key Points

  • The config-user.yml tells ESMValTool where to find input data.

  • output_dir defines the destination directory.

  • rootpath defines the root path of the data.

  • drs defines the directory structure of the data.


Running your first recipe

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How to run a recipe?

  • What happens when I run a recipe?

Objectives
  • Run an existing ESMValTool recipe

  • Examine the log information

  • Navigate the output created by ESMValTool

  • Make small adjustments to an existing recipe

This episode describes how ESMValTool recipes work, how to run a recipe and how to explore the recipe output. By the end of this episode, you should be able to run your first recipe, look at the recipe output, and make small modifications.

Running an existing recipe

The recipe format has briefly been introduced in episode 1. To see all the recipes that are shipped with ESMValTool, type

esmvaltool recipes list

We will start by running examples/recipe_python.yml

esmvaltool run examples/recipe_python.yml

If everything is okay, you should see that ESMValTool is printing a lot of output to the command line. The final message should be “Run was successful”. The exact output varies depending on your machine, but it should look something like the example output below.

Example output

INFO    [29586]
______________________________________________________________________
          _____ ____  __  ____     __    _ _____           _
         | ____/ ___||  \/  \ \   / /_ _| |_   _|__   ___ | |
         |  _| \___ \| |\/| |\ \ / / _` | | | |/ _ \ / _ \| |
         | |___ ___) | |  | | \ V / (_| | | | | (_) | (_) | |
         |_____|____/|_|  |_|  \_/ \__,_|_| |_|\___/ \___/|_|
______________________________________________________________________

ESMValTool - Earth System Model Evaluation Tool.

http://www.esmvaltool.org

CORE DEVELOPMENT TEAM AND CONTACTS:
  Veronika Eyring (PI; DLR, Germany - veronika.eyring@dlr.de)
  Bouwe Andela (NLESC, Netherlands - b.andela@esciencecenter.nl)
  Bjoern Broetz (DLR, Germany - bjoern.broetz@dlr.de)
  Lee de Mora (PML, UK - ledm@pml.ac.uk)
  Niels Drost (NLESC, Netherlands - n.drost@esciencecenter.nl)
  Nikolay Koldunov (AWI, Germany - nikolay.koldunov@awi.de)
  Axel Lauer (DLR, Germany - axel.lauer@dlr.de)
  Benjamin Mueller (LMU, Germany - b.mueller@iggf.geo.uni-muenchen.de)
  Valeriu Predoi (URead, UK - valeriu.predoi@ncas.ac.uk)
  Mattia Righi (DLR, Germany - mattia.righi@dlr.de)
  Manuel Schlund (DLR, Germany - manuel.schlund@dlr.de)
  Javier Vegas-Regidor (BSC, Spain - javier.vegas@bsc.es)
  Klaus Zimmermann (SMHI, Sweden - klaus.zimmermann@smhi.se)

For further help, please read the documentation at
http://docs.esmvaltool.org. Have fun!

INFO    [29586] Using config file esmvaltool_config.yml
INFO    [29586] Writing program log files to:
/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/main_log.txt
/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/main_log_debug.txt
INFO    [29586] Starting the Earth System Model Evaluation Tool v2.0.0 at time: 2020-10-07 14:17:34 UTC
INFO    [29586] ----------------------------------------------------------------------
INFO    [29586] RECIPE   = /home/user/gh/esmvalgroup/ESMValTool/esmvaltool/recipes/examples/recipe_python.yml
INFO    [29586] RUNDIR     = /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run
INFO    [29586] WORKDIR    = /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/work
INFO    [29586] PREPROCDIR = /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/preproc
INFO    [29586] PLOTDIR    = /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/plots
INFO    [29586] ----------------------------------------------------------------------
INFO    [29586] Running tasks using at most 1 processes
INFO    [29586] If your system hangs during execution, it may not have enough memory for keeping this number of tasks in memory.
INFO    [29586] If you experience memory problems, try reducing 'max_parallel_tasks' in your user configuration file.
INFO    [29586] Creating tasks from recipe
INFO    [29586] Creating tasks for diagnostic map
INFO    [29586] Creating preprocessor task map/tas
INFO    [29586] Creating preprocessor 'select_january' task for variable 'tas'
INFO    [29586] Using input files for variable tas of dataset BCC-ESM1:
/home/user/esmvaltool_tutorial/data/cmip6/tas_Amon_BCC-ESM1_historical_r1i1p1f1_gn_185001-201412.nc
INFO    [29586] Using input files for variable tas of dataset CanESM2:
/home/user/esmvaltool_tutorial/data/cmip5/tas_Amon_CanESM2_historical_r1i1p1_185001-200512.nc
INFO    [29586] PreprocessingTask map/tas created. It will create the files:
/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/preproc/map/tas/CMIP5_CanESM2_Amon_historical_r1i1p1_tas_2000-2000.nc
/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/preproc/map/tas/CMIP6_BCC-ESM1_Amon_historical_r1i1p1f1_tas_2000-2000.nc
INFO    [29586] Creating diagnostic task map/script1
INFO    [29586] Creating tasks for diagnostic timeseries
INFO    [29586] Creating preprocessor task timeseries/tas_amsterdam
INFO    [29586] Creating preprocessor 'annual_mean_amsterdam' task for variable 'tas'
INFO    [29586] Using input files for variable tas of dataset BCC-ESM1:
/home/user/esmvaltool_tutorial/data/cmip6/tas_Amon_BCC-ESM1_historical_r1i1p1f1_gn_185001-201412.nc
INFO    [29586] Using input files for variable tas of dataset CanESM2:
/home/user/esmvaltool_tutorial/data/cmip5/tas_Amon_CanESM2_historical_r1i1p1_185001-200512.nc
INFO    [29586] PreprocessingTask timeseries/tas_amsterdam created. It will create the files:
/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/preproc/timeseries/tas_amsterdam/CMIP5_CanESM2_Amon_historical_r1i1p1_tas_1850-2000.nc
/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/preproc/timeseries/tas_amsterdam/CMIP6_BCC-ESM1_Amon_historical_r1i1p1f1_tas_1850-2000.nc
INFO    [29586] Creating preprocessor task timeseries/tas_global
INFO    [29586] Creating preprocessor 'annual_mean_global' task for variable 'tas'
INFO    [29586] Using input files for variable tas of dataset BCC-ESM1:
/home/user/esmvaltool_tutorial/data/cmip6/tas_Amon_BCC-ESM1_historical_r1i1p1f1_gn_185001-201412.nc
INFO    [29586] Using input files for variable tas of dataset CanESM2:
/home/user/esmvaltool_tutorial/data/cmip5/tas_Amon_CanESM2_historical_r1i1p1_185001-200512.nc
INFO    [29586] PreprocessingTask timeseries/tas_global created. It will create the files:
/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/preproc/timeseries/tas_global/CMIP6_BCC-ESM1_Amon_historical_r1i1p1f1_tas_1850-2000.nc
/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/preproc/timeseries/tas_global/CMIP5_CanESM2_Amon_historical_r1i1p1_tas_1850-2000.nc
INFO    [29586] Creating diagnostic task timeseries/script1
INFO    [29586] These tasks will be executed: timeseries/tas_global, timeseries/tas_amsterdam, map/tas, map/script1, timeseries/script1
INFO    [29586] Running 5 tasks sequentially
INFO    [29586] Starting task map/tas in process [29586]
INFO    [29586] Successfully completed task map/tas (priority 0) in 0:00:04.291697
INFO    [29586] Starting task map/script1 in process [29586]
INFO    [29586] Running command ['/home/user/miniconda3/envs/esmvaltool_tutorial/bin/python3.8', '/home/user/gh/esmvalgroup/ESMValTool/esmvaltool/diag_scripts/examples/diagnostic.py', '/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/map/script1/settings.yml']
INFO    [29586] Writing output to /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/work/map/script1
INFO    [29586] Writing plots to /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/plots/map/script1
INFO    [29586] Writing log to /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/map/script1/log.txt
INFO    [29586] To re-run this diagnostic script, run:
cd /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/map/script1; MPLBACKEND="Agg" /home/user/miniconda3/envs/esmvaltool_tutorial/bin/python3.8 /home/user/gh/esmvalgroup/ESMValTool/esmvaltool/diag_scripts/examples/diagnostic.py /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/map/script1/settings.yml
INFO    [29586] Maximum memory used (estimate): 0.3 GB
INFO    [29586] Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur.
INFO    [29586] Successfully completed task map/script1 (priority 1) in 0:00:03.574651
INFO    [29586] Starting task timeseries/tas_amsterdam in process [29586]
INFO    [29586] Generated PreprocessorFile: /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/preproc/timeseries/tas_amsterdam/MultiModelMean_Amon_tas_1850-2000.nc
INFO    [29586] Successfully completed task timeseries/tas_amsterdam (priority 2) in 0:00:09.730443
INFO    [29586] Starting task timeseries/tas_global in process [29586]
WARNING [29586] /home/user/miniconda3/envs/esmvaltool_tutorial/lib/python3.8/site-packages/iris/analysis/cartography.py:394: UserWarning: Using DEFAULT_SPHERICAL_EARTH_RADIUS.
  warnings.warn("Using DEFAULT_SPHERICAL_EARTH_RADIUS.")

INFO    [29586] Calculated grid area shape: (1812, 64, 128)
WARNING [29586] /home/user/miniconda3/envs/esmvaltool_tutorial/lib/python3.8/site-packages/iris/analysis/cartography.py:394: UserWarning: Using DEFAULT_SPHERICAL_EARTH_RADIUS.
  warnings.warn("Using DEFAULT_SPHERICAL_EARTH_RADIUS.")

INFO    [29586] Calculated grid area shape: (1812, 64, 128)
INFO    [29586] Successfully completed task timeseries/tas_global (priority 3) in 0:00:06.073527
INFO    [29586] Starting task timeseries/script1 in process [29586]
INFO    [29586] Running command ['/home/user/miniconda3/envs/esmvaltool_tutorial/bin/python3.8', '/home/user/gh/esmvalgroup/ESMValTool/esmvaltool/diag_scripts/examples/diagnostic.py', '/home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/timeseries/script1/settings.yml']
INFO    [29586] Writing output to /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/work/timeseries/script1
INFO    [29586] Writing plots to /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/plots/timeseries/script1
INFO    [29586] Writing log to /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/timeseries/script1/log.txt
INFO    [29586] To re-run this diagnostic script, run:
cd /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/timeseries/script1; MPLBACKEND="Agg" /home/user/miniconda3/envs/esmvaltool_tutorial/bin/python3.8 /home/user/gh/esmvalgroup/ESMValTool/esmvaltool/diag_scripts/examples/diagnostic.py /home/user/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/timeseries/script1/settings.yml
INFO    [29586] Maximum memory used (estimate): 0.3 GB
INFO    [29586] Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur.
INFO    [29586] Successfully completed task timeseries/script1 (priority 4) in 0:00:03.712112
INFO    [29586] Ending the Earth System Model Evaluation Tool v2.0.0 at time: 2020-10-07 14:18:02 UTC
INFO    [29586] Time for running the recipe was: 0:00:27.483308
INFO    [29586] Maximum memory used (estimate): 0.7 GB
INFO    [29586] Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur.
INFO    [29586] Run was successful

Pro tip: ESMValTool search paths

You might wonder how ESMValTool was able to find the recipe file, even though it’s not in your working directory. All the recipe paths printed by esmvaltool recipes list are relative to ESMValTool’s installation location. This is where ESMValTool will look if it cannot find the file by following the path from your working directory.
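If you are curious where the installed recipes live, one way to find out (assuming ESMValTool is importable in the active environment) is:

python -c "import esmvaltool, os; print(os.path.join(os.path.dirname(esmvaltool.__file__), 'recipes'))"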

Investigating the log messages

Let’s dissect what’s happening here.

Output files and directories

After the banner and general information, the output starts with some important locations.

  1. Did ESMValTool use the right config file?
  2. What is the path to the example recipe?
  3. What is the main output folder generated by ESMValTool?
  4. Can you guess what the different output directories are for?
  5. ESMValTool creates two log files. What is the difference?

Answers

  1. The config file should be the one we edited in the previous episode, something like /home/<user>/.esmvaltool/config-user.yml.
  2. ESMValTool found the recipe in its installation directory, something like /home/<user>/miniconda3/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/recipes/examples/.
  3. ESMValTool creates a time-stamped output directory for every run. In this case, it should be something like recipe_python_YYYYMMDD_HHMMSS. This folder is created inside the output directory specified in the previous episode: /home/<user>/esmvaltool_tutorial/output.
  4. There should be four output folders:
    • plots/: this is where output figures are stored.
    • preproc/: this is where pre-processed data are stored.
    • run/: this is where ESMValTool stores general information about the run, such as log messages and a copy of the recipe file.
    • work/: this is where output files (not figures) are stored.
  5. The log files are:
    • main_log.txt is a copy of the command-line output
    • main_log_debug.txt contains more detailed information that may be useful for debugging.

Debugging: No ‘preproc’ directory?

If you’re missing the preproc directory, then your config-user.yml file has the value remove_preproc_dir set to true (this is used to save disk space). Please set this value to false and run the recipe again.

After the output locations, two main sections can be distinguished in the log messages: a first section in which ESMValTool creates the tasks, and a second section in which it executes them.

Analyse the tasks

List all the tasks that ESMValTool is executing for this recipe. Can you guess what this recipe does?

Answer

Just after ‘creating tasks’ and before ‘executing tasks’, we find the following line in the output:

INFO    [29586] These tasks will be executed: timeseries/tas_global, timeseries/tas_amsterdam, map/tas, map/script1, timeseries/script1

So there are three tasks related to timeseries: global temperature, Amsterdam temperature, and a script (tas: near-surface air temperature). And then there are two tasks related to a map: something with temperature, and again a script.

Examining the recipe file

To get more insight into what is happening, we will have a look at the recipe file itself. Use the following command to copy the recipe to your working directory

esmvaltool recipes get examples/recipe_python.yml

Now you should see the recipe file in your working directory (type ls to verify). Use the nano editor to open this file:

nano recipe_python.yml

For reference, you can also view the recipe by unfolding the box below.

recipe_python.yml

# ESMValTool
# recipe_python.yml
---
documentation:
  description: |
    Example recipe that plots a map and timeseries of temperature.

  authors:
    - andela_bouwe
    - righi_mattia

  maintainer:
    - schlund_manuel

  references:
    - acknow_project

  projects:
    - esmval
    - c3s-magic

datasets:
  - {dataset: BCC-ESM1, project: CMIP6, exp: historical, ensemble: r1i1p1f1, grid: gn}
  - {dataset: CanESM2, project: CMIP5, exp: historical, ensemble: r1i1p1}

preprocessors:

  select_january:
    extract_month:
      month: 1

  annual_mean_amsterdam:
    extract_point:
      latitude: 52.379189
      longitude: 4.899431
      scheme: linear
    annual_statistics:
      operator: mean
    multi_model_statistics:
      statistics:
        - mean
      span: overlap

  annual_mean_global:
    area_statistics:
      operator: mean
    annual_statistics:
      operator: mean

diagnostics:

  map:
    description: Global map of temperature in January 2000.
    themes:
      - phys
    realms:
      - atmos
    variables:
      tas:
        mip: Amon
        preprocessor: select_january
        start_year: 2000
        end_year: 2000
    scripts:
      script1:
        script: examples/diagnostic.py
        write_netcdf: true
        output_file_type: pdf
        quickplot:
          plot_type: pcolormesh
          cmap: Reds

  timeseries:
    description: Annual mean temperature in Amsterdam and global mean since 1850.
    themes:
      - phys
    realms:
      - atmos
    variables:
      tas_amsterdam:
        short_name: tas
        mip: Amon
        preprocessor: annual_mean_amsterdam
        start_year: 1850
        end_year: 2000
      tas_global:
        short_name: tas
        mip: Amon
        preprocessor: annual_mean_global
        start_year: 1850
        end_year: 2000
    scripts:
      script1:
        script: examples/diagnostic.py
        quickplot:
          plot_type: plot

Do you recognize the basic recipe structure that was introduced in episode 1?

Analyse the recipe

Try to answer the following questions:

  1. Who wrote this recipe?
  2. Who should be approached if there is a problem with this recipe?
  3. How many datasets are analyzed?
  4. What does the preprocessor called annual_mean_global do?
  5. Which script is applied for the diagnostic called map?
  6. Can you link specific lines in the recipe to the tasks that we saw before?

Answers

  1. The example recipe was written by Bouwe Andela and Mattia Righi.
  2. Manuel Schlund is listed as the maintainer of this recipe.
  3. Two datasets are analysed:
    • CMIP6 data from the model BCC-ESM1
    • CMIP5 data from the model CanESM2
  4. The preprocessor annual_mean_global computes an area mean as well as annual means.
  5. The diagnostic called map executes a script referred to as script1. This is a Python script named examples/diagnostic.py.
  6. There are two diagnostics: map and timeseries. Under the diagnostic map we find two tasks:
    • a preprocessor task called tas, applying the preprocessor called select_january to the variable tas.
    • a diagnostic task called script1, applying the script examples/diagnostic.py to the preprocessed data (map/tas).

    Under the diagnostic timeseries we find three tasks:

    • a preprocessor task called tas_amsterdam, applying the preprocessor called annual_mean_amsterdam to the variable tas.
    • a preprocessor task called tas_global, applying the preprocessor called annual_mean_global to the variable tas.
    • a diagnostic task called script1, applying the script examples/diagnostic.py to the preprocessed data (timeseries/tas_global and timeseries/tas_amsterdam).

Pro tip: short names and variable groups

The preprocessor tasks in ESMValTool are called ‘variable groups’. For the diagnostic timeseries, we have two variable groups: tas_amsterdam and tas_global. Both of them operate on the variable tas (as indicated by the short_name), but they apply different preprocessors. For the diagnostic map the variable group itself is named tas, and you’ll notice that we do not explicitly provide the short_name. This is a shorthand built into ESMValTool.
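As an illustration based on the recipe above, the two styles look like this:

variables:
  tas:              # group named after the variable; short_name is inferred
    mip: Amon
    preprocessor: select_january
  tas_global:       # custom group name; short_name must be given explicitly
    short_name: tas
    mip: Amon
    preprocessor: annual_mean_global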

Output files

Have another look at the output directory created by the ESMValTool run.

Which files/folders are created by each task?

Answer

  • map/tas: creates /preproc/map/tas, which contains preprocessed data for each of the input datasets, and a file called metadata.yml describing the contents of these datasets.
  • timeseries/tas_global: creates /preproc/timeseries/tas_global, which contains preprocessed data for each of the input datasets, and metadata.yml.
  • timeseries/tas_amsterdam: creates /preproc/timeseries/tas_amsterdam, which contains preprocessed data for each of the input datasets, plus a combined MultiModelMean, and metadata.yml.
  • map/script1: creates /run/map/script1 with general information and a log of the diagnostic script run. It also creates /plots/map/script1 and /work/map/script1, which contain output figures and output datasets, respectively. For each output file, there is also corresponding provenance information in the form of .svg, .xml, .bibtex and .txt files.
  • timeseries/script1: creates /run/timeseries/script1 with general information and a log of the diagnostic script run. It also creates /plots/timeseries/script1 and /work/timeseries/script1, which contain output figures and output datasets, respectively. For each output file, there is also corresponding provenance information in the form of .svg, .xml, .bibtex and .txt files.

Pro tip: diagnostic logs

When you run ESMValTool, any log messages from the diagnostic script are not printed on the terminal. Instead, they are written to a log.txt file in the run directory of the output folder, i.e. run/<diagnostic>/<script>/log.txt.

ESMValTool does print a command that can be used to re-run a diagnostic script. When you use this command, the output of the diagnostic will be printed to the command line.
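For example, using the output folder from the run above, you could inspect a diagnostic log with:

cat ~/esmvaltool_tutorial/output/recipe_python_20201007_141734/run/map/script1/log.txt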

Modifying the example recipe

Let’s make a small modification to the example recipe. Notice that now that you have copied and edited the recipe, you can use

esmvaltool run recipe_python.yml

to refer to your local file rather than the default version shipped with ESMValTool.

Change your location

Modify and run the recipe to analyse the temperature for your own location.

Solution

In principle, you only have to modify the latitude and longitude coordinates in the preprocessor called annual_mean_amsterdam. However, it is good practice to also replace all instances of amsterdam with the correct name of your location. Otherwise the log messages and output will be confusing. You are free to modify the names of preprocessors or diagnostics.

In the diff file below you will see the changes we have made to the file. The top 2 lines are the filenames and the lines like @@ -29,10 +29,10 @@ represent the line numbers in the original and modified file, respectively. For more info on this format, see here.

--- recipe_python.yml
+++ recipe_python_london.yml
@@ -29,10 +29,10 @@
     extract_month:
       month: 1

-  annual_mean_amsterdam:
+  annual_mean_london:
     extract_point:
-      latitude: 52.379189
-      longitude: 4.899431
+      latitude: 51.5074
+      longitude: 0.1278
       scheme: linear
     annual_statistics:
       operator: mean
@@ -71,16 +71,16 @@
           cmap: Reds

   timeseries:
-    description: Annual mean temperature in Amsterdam and global mean since 1850.
+    description: Annual mean temperature in London and global mean since 1850.
     themes:
       - phys
     realms:
       - atmos
     variables:
-      tas_amsterdam:
+      tas_london:
         short_name: tas
         mip: Amon
-        preprocessor: annual_mean_amsterdam
+        preprocessor: annual_mean_london
         start_year: 1850
         end_year: 2000
       tas_global:

Key Points

  • ESMValTool recipes work ‘out of the box’ (if input data is available)

  • There are strong links between the recipe, log file, and output folders

  • Recipes can easily be modified to re-use existing code for your own use case


Conclusion of the basic tutorial

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What do I do now?

  • Where can I get help?

  • What if I find a bug?

  • Where can I find more information about ESMValtool?

  • How can I cite ESMValtool?

Objectives
  • breathe - you’re finished now!

  • Congratulations & Thanks!

  • Find out about the mini-tutorials, and what to do next.

Congratulations!

Congratulations on completing the ESMValTool tutorial! You should now be ready to go out and start using ESMValTool independently.

The rest of this tutorial contains individual mini-tutorials to help work through a specific issue (not developed yet).

What next?

From here, there are lots of ways that you can continue to use ESMValTool.

Exercise: What do you want to do next?

  • Think about what you want to do with ESMValTool.
    • Decide what datasets and variables you want to use.
    • Is any observational data available?
    • How will you preprocess the data?
    • What will your diagnostic script need to do?
    • What will your final figure show?

Where can I get more information on ESMValTool?

Additional resources:

  • The ESMValTool website: http://www.esmvaltool.org
  • The ESMValTool documentation: http://docs.esmvaltool.org
  • The ESMValGroup organization on GitHub: https://github.com/ESMValGroup

Where can I get more help?

There are lots of resources available for helping you use ESMValTool.

The ESMValTool Discussions page is a good place to learn about general issues, or to see if your question has already been addressed. If you have a GitHub account, you can also post your questions there.

If you get stuck, a great starting point is to create a new issue.

There is also an ESMValTool email list. Please see the information on how to subscribe to the user mailing list.

What if I find a bug?

If you find a bug, please report it back to the ESMValTool team. This will help us fix it so that you can continue working, and it also means that ESMValTool will be more stable for everyone else.

To report a bug, please create a new issue using the new issue page.

In your bug report, please describe the problem as clearly and as completely as possible. You may need to include a recipe or the output log as well.

How do I cite the Tutorial?

Please use citation information available at https://doi.org/10.5281/zenodo.3974591.

Key Points

  • Individual mini-tutorials help work through a specific issue (not developed yet).

  • We are constantly improving this tutorial.


Writing your own recipe

Overview

Teaching: 15 min
Exercises: 30 min
Questions
  • How do I create a new recipe?

  • Can I use different preprocessors for different variables?

  • Can I use different datasets for different variables?

  • How can I combine different preprocessor functions?

Objectives
  • Create a recipe with multiple preprocessors

  • Use different preprocessors for different variables

  • Run a recipe with variables from different datasets

Introduction

One of the key strengths of ESMValTool is in making complex analyses reusable and reproducible. But that doesn’t mean everything in ESMValTool needs to be complex. Sometimes, the biggest challenge is in making things simpler. You probably know the ‘warming stripes’ visualization by Professor Ed Hawkins. On the site https://showyourstripes.info you can find the same visualization for many regions in the world.

[Figure: warming stripes. Shared by Ed Hawkins under a Creative Commons Attribution 4.0 International licence. Source: https://showyourstripes.info]

In this episode, we will reproduce and extend this functionality with ESMValTool. We have prepared a small Python script that takes a NetCDF file with timeseries data, and visualizes it in the form of our desired warming stripes figure.

You can find the diagnostic script that we will use here (warming_stripes.py).

Download the file and store it in your working directory. If you want, you may also have a look at the contents, but it is not necessary to follow along.
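For a flavour of how such a diagnostic works, here is a heavily simplified, hypothetical sketch (this is not the actual contents of warming_stripes.py). It assumes the standard diagnostic interface, where ESMValTool passes the path to a settings.yml file whose input_files entry lists metadata.yml files describing the preprocessed data:

import sys

import matplotlib
matplotlib.use("Agg")  # write image files, no display needed
import matplotlib.pyplot as plt
import netCDF4
import numpy as np
import yaml

# ESMValTool passes the path to a settings.yml file as the first argument
with open(sys.argv[1]) as file:
    settings = yaml.safe_load(file)

# Assumption: settings["input_files"] lists metadata.yml files that describe
# the preprocessed NetCDF files produced by the preprocessor task
for metadata_file in settings["input_files"]:
    with open(metadata_file) as file:
        metadata = yaml.safe_load(file)
    for filename, attributes in metadata.items():
        tas = netCDF4.Dataset(filename).variables["tas"][:].squeeze()
        # Colour one vertical stripe per time step by its temperature anomaly
        normalized = (tas - tas.min()) / (tas.max() - tas.min())
        colors = plt.get_cmap(settings.get("colormap", "bwr"))(normalized)
        fig, ax = plt.subplots()
        ax.bar(np.arange(len(tas)), np.ones(len(tas)), width=1.0, color=colors)
        ax.axis("off")
        fig.savefig(f"{settings['plot_dir']}/{attributes['dataset']}.png")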

We will write an ESMValTool recipe that takes some data, performs the necessary preprocessing, and then runs our Python script.

Drawing up a plan

Previously, we have seen that ESMValTool executes a number of tasks. Write down which tasks we will need in this episode, and what each of these tasks does.

Answer

In this episode, we will need two tasks:

  • A preprocessing task that converts the gridded temperature data to a timeseries of global temperature anomalies
  • A diagnostic task that calls our Python script, taking our preprocessed timeseries data as input.

Building a recipe from scratch

The easiest way to make a new recipe is to start from an existing one, and modify it until it does exactly what you need. However, in this episode we will start from scratch. This forces us to think about all the steps. We will deal with common errors as they occur throughout the development.

Remember the basic structure of a recipe (documentation, datasets, preprocessors, and diagnostics), and notice that each of these sections is extensively described in the documentation under the header “The recipe format”.

This is the first place to look for help if you get stuck.

Open a new file called recipe_warming_stripes.yml:

nano recipe_warming_stripes.yml

Let’s add the standard header comments (these do not do anything), and a first description.

# ESMValTool
# recipe_warming_stripes.yml
---
documentation:
  description: Reproducing Ed Hawkins' warming stripes visualization

Notice that YAML always requires 2 spaces of indentation between the different levels. Pressing ctrl+o will save the file; verify the filename at the bottom and press enter. Then use ctrl+x to exit the editor.

We will try to run the recipe after every modification we make, to see if it (still) works.

esmvaltool run recipe_warming_stripes.yml

In this case, it gives an error. Below you see the last few lines of the error message.

...
Error validating data /home/user/esmvaltool_tutorial/recipe_warming_stripes.yml with schema /home/user/miniconda3/envs/esmvaltool_tutorial/lib/python3.8/site-packages/esmvalcore/recipe_schema.yml
	documentation.authors: Required field missing
2020-10-08 15:23:11,162 UTC [19451] INFO    If you suspect this is a bug or need help, please open an issue on https://github.com/ESMValGroup/ESMValTool/issues and attach the run/recipe_*.yml and run/main_log_debug.txt files from the output directory.

Here, ESMValTool is telling us that it is missing a required field, namely the authors. It is good to know that ESMValTool always tries to validate the recipe at an early stage. This initial check doesn’t catch everything though, so we should always stay alert.

Let’s add some additional information to the recipe. Open the recipe file again, and add an authors section below the description. ESMValTool expects the authors as a list, like so:

authors:
  - lastname_firstname

To bypass a number of similar error messages, add a minimal diagnostics section below the documentation. The file should now look like:

# ESMValTool
# recipe_warming_stripes.yml
---
documentation:
  description: Reproducing Ed Hawkins' warming stripes visualization
  authors:
    - doe_john
diagnostics:
  dummy_diagnostic_1:
    scripts: null

This is the minimal recipe layout that is required by ESMValTool. If we now run the recipe again, you will probably see the following error:

ValueError: Tag 'doe_john' does not exist in section 'authors' of /home/user/miniconda3/envs/esmvaltool_tutorial/lib/python3.8/site-packages/esmvaltool/config-references.yml

Pro tip: config-references.yml

The error message above points to a file named config-references.yml. This is where ESMValTool stores all its citation information. To add yourself as an author, add your name in the form lastname_firstname in alphabetical order following the existing entries, under the # Development team comment. See the List of authors section in the ESMValTool documentation for more information.
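For illustration, an entry in config-references.yml looks roughly like the sketch below (doe_john and the details shown are placeholders, not a real entry):

authors:
  doe_john:
    name: Doe, John
    institute: Some Institute, Some Country
    orcid: https://orcid.org/0000-0000-0000-0000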

For now, let’s just use one of the existing references. Change the author field to righi_mattia, who cannot receive enough credit for all the effort he put into ESMValTool. If you now run the recipe again, you should see the final message

INFO    Run was successful

Adding a dataset entry

Let’s add a datasets section. We will reuse the same datasets that we used in previous episodes. The data files are stored in ~/esmvaltool_tutorial/data.

Filling in the dataset keys

Explore the data directory, and look at the explanation of the dataset entry in the ESMValTool documentation. For both datasets, write down the following properties:

  • project
  • variable (short name)
  • CMIP table
  • dataset (model name or obs/reanalysis dataset)
  • experiment
  • ensemble member
  • grid
  • start year
  • end year

Answers

key         | file 1           | file 2
------------|------------------|-----------
project     | CMIP6            | CMIP5
short name  | tas              | tas
CMIP table  | Amon             | Amon
dataset     | BCC-ESM1         | CanESM2
experiment  | historical       | historical
ensemble    | r1i1p1f1         | r1i1p1
grid        | gn (native grid) | N/A
start year  | 1850             | 1850
end year    | 2014             | 2005

Note that the grid key is only required for CMIP6 data, and that the extent of the historical period has changed between CMIP5 and CMIP6.

We will start with the BCC-ESM1 dataset. Add a datasets section to the recipe, listing a single dataset, like so:

datasets:
  - {dataset: BCC-ESM1, project: CMIP6, mip: Amon, exp: historical, ensemble: r1i1p1f1, grid: gn, start_year: 1850, end_year: 2014}

Verify that the recipe still runs. Note that we have not included the short name of the variable in this dataset section. This allows us to reuse this dataset entry with different variable names later on. This is not really necessary for our simple use case, but it is common practice in ESMValTool.

Adding the preprocessor section

Above, we already described the preprocessing task that needs to convert the standard, gridded temperature data to a timeseries of temperature anomalies.

Defining the preprocessor

Have a look at the available preprocessors in the documentation. Write down

  • Which preprocessor functions do you think we should use?
  • What are the parameters that we can pass to these functions?
  • What do you think should be the order of the preprocessors?
  • A suitable name for the overall preprocessor

Solution

We need to calculate anomalies and global means. There is an anomalies preprocessor which needs a granularity, a reference period, and whether or not to standardize the data. The global means can be calculated with the area_statistics preprocessor, which takes an operator as argument (in our case we want to compute the mean).

The default order in which these preprocessors are applied can be seen here: area_statistics comes before anomalies. If you want to change this, you can use the custom_order preprocessor setting. We will keep the default order.
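For reference, a sketch of how the order could be overridden (we will not use this here; see the ESMValCore documentation for the exact semantics):

preprocessors:
  my_custom_ordered:
    custom_order: true  # apply the functions in the order listed below
    anomalies:
      period: month
    area_statistics:
      operator: mean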

Let’s name our preprocessor global_anomalies.

Add the following block to your recipe file:

preprocessors:
  global_anomalies:
    area_statistics:
      operator: mean
    anomalies:
      period: month
      reference:
        start_year: 1981
        start_month: 1
        start_day: 1
        end_year: 2010
        end_month: 12
        end_day: 31
      standardize: false

and verify that the recipe still runs.

Completing the diagnostics section

Now we are ready to finish our diagnostics section. Remember that we want to make two tasks: a preprocessor task, and a diagnostic task. To illustrate that we can also pass settings to the diagnostic script, we add the option to specify a custom colormap.

Fill in the blanks

Extend the diagnostics section in your recipe by filling in the blanks in the following template:

diagnostics:
  <... (suitable name for our diagnostic)>:
    description: <...>
    variables:
      <... (suitable name for the preprocessed variable)>:
        short_name: <...>
        preprocessor: <...>
    scripts:
      <... (suitable name for our python script)>:
        script: <full path to python script>
        colormap: <... choose from matplotlib colormaps>

Solution

diagnostics:
  diagnostic_warming_stripes:
    description: visualize global temperature anomalies as warming stripes
    variables:
      global_temperature_anomalies:
        short_name: tas
        preprocessor: global_anomalies
    scripts:
      warming_stripes_script:
        script: ~/esmvaltool_tutorial/warming_stripes.py
        colormap: 'bwr'

Now you should be able to run the recipe to get your own warming stripes.

Note: for the purpose of simplicity in this episode, we have not added logging or provenance tracking in the diagnostic script. Once you start to develop your own diagnostic scripts and want to add them to the ESMValTool repositories, this will be required. However, writing your own diagnostic script is beyond the scope of the basic tutorial.

Bonus exercises

Below are a couple of exercises to practice modifying the recipe. For your reference, here’s a copy of the recipe at this point. This will be the point of departure for each of the modifications we’ll make below.

Specific location

On showyourstripes.info, you can download stripes for specific locations. We will reproduce this functionality. Look at the available preprocessors in the documentation, and replace the global mean with a suitable alternative.

Solution

You could have used extract_point or extract_region. We used extract_point. Here’s a copy of the recipe at this point and this is the difference with the previous recipe:

--- recipe_warming_stripes.yml
+++ recipe_warming_stripes_local.yml
@@ -10,9 +10,11 @@
   - {dataset: BCC-ESM1, project: CMIP6, mip: Amon, exp: historical, ensemble: r1i1p1f1, grid: gn, start_year: 1850, end_year: 2014}

 preprocessors:
-  global_anomalies:
-    area_statistics:
-      operator: mean
+  anomalies_amsterdam:
+    extract_point:
+      latitude: 52.379189
+      longitude: 4.899431
+      scheme: linear
     anomalies:
       period: month
       reference:
@@ -27,9 +29,9 @@
 diagnostics:
   diagnostic_warming_stripes:
     variables:
-      global_temperature_anomalies:
+      temperature_anomalies_amsterdam:
         short_name: tas
-        preprocessor: global_anomalies
+        preprocessor: anomalies_amsterdam
     scripts:
       warming_stripes_script:
         script: ~/esmvaltool_tutorial/warming_stripes.py

Different periods

Split the diagnostic in two: the second one should use a different period. You’re free to choose the periods yourself. For example, one could be ‘recent’, the other ‘20th_century’. For this, you’ll have to add a new variable group.

Solution

Here’s a copy of the recipe at this point and this is the difference from the previous recipe:

--- recipe_warming_stripes_local.yml
+++ recipe_warming_stripes_periods.yml
@@ -7,7 +7,7 @@
     - righi_mattia

 datasets:
-  - {dataset: BCC-ESM1, project: CMIP6, mip: Amon, exp: historical, ensemble: r1i1p1f1, grid: gn, start_year: 1850, end_year: 2014}
+  - {dataset: BCC-ESM1, project: CMIP6, mip: Amon, exp: historical, ensemble: r1i1p1f1, grid: gn}

 preprocessors:
   anomalies_amsterdam:
@@ -29,9 +29,16 @@
 diagnostics:
   diagnostic_warming_stripes:
     variables:
-      temperature_anomalies_amsterdam:
+      temperature_anomalies_recent:
         short_name: tas
         preprocessor: anomalies_amsterdam
+        start_year: 1950
+        end_year: 2014
+      temperature_anomalies_20th_century:
+        short_name: tas
+        preprocessor: anomalies_amsterdam
+        start_year: 1900
+        end_year: 1999
     scripts:
       warming_stripes_script:
         script: ~/esmvaltool_tutorial/warming_stripes.py

Different preprocessors

Now that we have different variable groups, we can also use different preprocessors. Add a second preprocessor to add another location of your choosing.

Pro-tip: if you want to avoid repetition, you can use YAML anchors.
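For reference, here is a generic sketch of how YAML anchors work (the key names are made up for illustration): & defines an anchor on a value and * reuses it elsewhere.

defaults: &my_defaults
  option_a: 1
  option_b: 2

first_group: *my_defaults   # identical to the anchored value
second_group: *my_defaults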

Solution

Here’s a copy of the recipe at this point and this is the difference from the previous recipe:

--- recipe_warming_stripes_periods.yml
+++ recipe_warming_stripes_multiple_locations.yml
@@ -15,7 +15,7 @@
       latitude: 52.379189
       longitude: 4.899431
       scheme: linear
-    anomalies:
+    anomalies: &anomalies
       period: month
       reference:
         start_year: 1981
@@ -25,18 +25,24 @@
         end_month: 12
         end_day: 31
       standardize: false
+  anomalies_london:
+    extract_point:
+      latitude: 51.5074
+      longitude: 0.1278
+      scheme: linear
+    anomalies: *anomalies

 diagnostics:
   diagnostic_warming_stripes:
     variables:
-      temperature_anomalies_recent:
+      temperature_anomalies_recent_amsterdam:
         short_name: tas
         preprocessor: anomalies_amsterdam
         start_year: 1950
         end_year: 2014
-      temperature_anomalies_20th_century:
+      temperature_anomalies_20th_century_london:
         short_name: tas
-        preprocessor: anomalies_amsterdam
+        preprocessor: anomalies_london
         start_year: 1900
         end_year: 1999
     scripts:

Additional datasets

So far we have defined the datasets in the datasets section of the recipe. However, it’s also possible to add specific datasets only for specific variable groups. Look at the documentation to learn about the ‘additional_datasets’ keyword, and add a second dataset only for one of the variable groups.

Solution

Here’s a copy of the recipe at this point and this is the difference from the previous recipe:

--- recipe_warming_stripes_multiple_locations.yml
+++ recipe_warming_stripes_additional_datasets.yml
@@ -45,6 +45,8 @@
         preprocessor: anomalies_london
         start_year: 1900
         end_year: 1999
+        additional_datasets:
+          - {dataset: CanESM2, project: CMIP5, mip: Amon, exp: historical, ensemble: r1i1p1}
     scripts:
       warming_stripes_script:
         script: ~/esmvaltool_tutorial/warming_stripes.py

Key Points

  • A recipe can work with different preprocessors at the same time.

  • The setting additional_datasets can be used to add a different dataset.

  • Variable groups are useful for defining different settings for different variables.


Development and contribution

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • What is a development installation?

  • How can I test new or improved code?

  • How can I incorporate my contributions into ESMValTool?

Objectives
  • Execute a successful ESMValTool installation from the source code.

  • Contribute to ESMValTool development.

We now know how ESMValTool works, but how do we develop it? ESMValTool is an open-source project in ESMValGroup. We can contribute to its development, for example, by adding new or improved recipes and diagnostic scripts, by reporting and fixing bugs, and by reviewing the contributions of others.

In this lesson, we first show how to set up a development installation of ESMValTool so you can make changes or additions. We then explain how you can contribute these changes to the community.

Git knowledge

For this episode, you need some knowledge of Git. You can refresh your knowledge in the corresponding Git carpentries course.

Development installation

We’ll explore how ESMValTool can be installed in development mode. Even if you aren’t collaborating with the community, this installation is needed to run your own code with ESMValTool. Let’s get started.

1 Source code

The ESMValTool source code is available on a public GitHub repository: https://github.com/ESMValGroup/ESMValTool. To obtain the code, there are two options:

  1. download the code from the repository as a ZIP file called ESMValTool-master.zip. To continue the installation, unzip the file, move into the ESMValTool-master directory, and then follow the steps starting from 2 ESMValTool dependencies.
  2. clone the repository if you want to contribute to the ESMValTool development:
git clone https://github.com/ESMValGroup/ESMValTool.git

This command will ask for your GitHub username and a personal access token as the password. Please follow the instructions on GitHub token authentication requirements to create a personal access token. After authentication, the output might look like:

Cloning into 'ESMValTool'...
remote: Enumerating objects: 163, done.
remote: Counting objects: 100% (163/163), done.
remote: Compressing objects: 100% (125/125), done.
remote: Total 95049 (delta 84), reused 76 (delta 30), pack-reused 94886
Receiving objects: 100% (95049/95049), 175.16 MiB | 5.48 MiB/s, done.
Resolving deltas: 100% (68808/68808), done.

Now, a folder called ESMValTool has been created in your working directory. This folder contains the source code of the tool. To continue the installation, we move into the ESMValTool directory:

cd ESMValTool

Note that the master branch is checked out by default. We can see this if we run:

git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

2 ESMValTool dependencies

It is recommended to use conda to manage ESMValTool dependencies. For a minimal conda installation, see section Install Conda in lesson Installation. To simplify the installation process, an environment file environment.yml is provided in the ESMValTool directory. We create an environment by running:

conda env create --file environment.yml

The environment is called esmvaltool by default. If an esmvaltool environment was already created in the lesson Installation, we should choose another name for the new environment in this lesson:

conda env create -n a_new_name --file environment.yml

For more information see conda managing environments. Now, we should activate the environment:

conda activate esmvaltool

3 ESMValTool installation

ESMValTool can be installed in development (editable) mode by running:

pip install --editable '.[develop]'

This will add the esmvaltool directory to the Python path in editable mode and install the development dependencies. We should check that the installation works properly. To do this, run the tool with:

esmvaltool --help

If the installation is successful, ESMValTool prints a help message to the console.

Checking the development installation

We can use the command conda list to list the installed packages in the esmvaltool environment. Use this command to check that ESMValTool is installed in development mode.

Tip: see the documentation on conda list.

Solution

Run:

conda list esmvaltool
# Name                    Version                   Build  Channel
esmvaltool                2.1.1                     dev_0    <develop>

As can be seen, <develop> is listed under Channel.

4 Updating ESMValTool

The master branch has the latest features of ESMValTool. Please make sure that the source code on your machine is up to date. If you obtained the source code using git clone as explained in step 1 Source code, you can run git pull to update the source code. The ESMValTool installation will then be updated with the changes from the master branch.
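For example, assuming you are on the master branch and have no local changes, updating might look like this:

cd ESMValTool
git checkout master
git pull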

Contribution

We have seen how to install ESMValTool in development mode. Now we will try to contribute to its development. Let’s see how we can achieve this.

Review process

We first discuss our ideas in an issue in the ESMValTool repository. This can avoid disappointment at a later stage, for example if several people turn out to be working on the same thing. It also gives other people an early opportunity to provide input and suggestions, which results in more valuable contributions.

Then, we create a new branch locally and start developing our new code. Once our development is finished, we can initiate a pull request. For a full description of the GitHub workflow, please see the ESMValTool documentation on GitHub Workflow.

The pull request will be tested, discussed and merged. This is called the review process. The process takes some effort and time to learn. However, a few (command-line) tools can get you a long way, and we’ll cover those essentials in the next sections.

Tip: we encourage you to keep pull requests small. Reviewing small, incremental changes is more efficient.

Background

We saw ‘warming stripes’ in the lesson Writing your own recipe. Imagine the following task: you want to contribute the warming stripes recipe and diagnostic to ESMValTool. You have to add the diagnostic warming_stripes.py and the recipe recipe_warming_stripes.yml to their locations in the ESMValTool directory. After these changes, you should also check that everything works fine. This is where we take advantage of the tools that are introduced later.

Let’s get started.

Check code quality

We aim to adhere to best practices and coding standards. There are several tools that check our code against those standards.

The good news is that pre-commit was already installed when we chose the development installation. pre-commit is a command-line tool that runs all of those checks, and it can automatically fix some of the errors it finds. To explore other tools, have a look at the ESMValTool documentation on Code style.
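For instance, to run all the pre-commit checks on every file in the repository rather than only on staged files (this can take a while), you can use:

pre-commit run --all-files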

Using pre-commit

Let’s check out our local branch and add the script warming_stripes.py to the esmvaltool/diag_scripts directory.

cd ESMValTool
git checkout your_branch_name
cp path_of_warming_stripes.py esmvaltool/diag_scripts/

By default, pre-commit only runs on the files that have been staged in git:

git status
git add esmvaltool/diag_scripts/warming_stripes.py
pre-commit run --files esmvaltool/diag_scripts/warming_stripes.py

Inspect the output of pre-commit and fix the remaining errors.

Solution

The output of pre-commit:

Check for added large files..............................................Passed
Check python ast.........................................................Passed
Check for case conflicts.................................................Passed
Check for merge conflicts................................................Passed
Debug Statements (Python)................................................Passed
Fix End of Files.........................................................Passed
Trim Trailing Whitespace.................................................Passed
yamllint.............................................(no files to check)Skipped
nclcodestyle.........................................(no files to check)Skipped
style-files..........................................(no files to check)Skipped
lintr................................................(no files to check)Skipped
codespell................................................................Passed
isort....................................................................Passed
yapf.....................................................................Passed
docformatter.............................................................Failed
- hook id: docformatter
- files were modified by this hook
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

esmvaltool/diag_scripts/warming_stripes.py:20:5:
F841 local variable 'nx' is assigned to but never used

As can be seen above, there are two failed checks:

  1. docformatter: it is mentioned that “files were modified by this hook”. We run git diff to see the modifications: the closing """ of the docstring was moved by one line.
  2. flake8: the error message is about an unused local variable nx. We should check our code for uses of nx. For now, let’s assume that it was added by mistake and remove it.

Run unit tests

The previous section introduced some tools to check code style and quality. What they don’t tell us is whether our code gives the right answer. To check that, we need to write and run tests for widely used functions. ESMValTool comes with a lot of tests, located in the folder tests.

To run tests, first we make sure that the working directory is ESMValTool and our local branch is checked out. Then, we can run tests using pytest locally:

pytest

Tests will also be run automatically by CircleCI, when you submit a pull request.
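Running the complete test suite can take a while. If you only want to run the tests related to our new recipe, you can use pytest’s -k option to select tests by name, for example:

pytest -k warming_stripes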

Running tests

Let’s check out our local branch and add the recipe recipe_warming_stripes.yml to the esmvaltool/recipes directory:

cp path_of_recipe_warming_stripes.yml esmvaltool/recipes/

Run pytest and inspect the results. If a test fails, try to fix it.

Solution

Run:

pytest

When the pytest run is complete, you can inspect the test report that is printed to the console. Have a look at the FAILURES section of the report:

================================ FAILURES ==========================================
______________ test_recipe_valid[recipe_warming_stripes.yml] ______________

The test message shows that the recipe recipe_warming_stripes.yml is not a valid recipe. Look for a line that starts with an E in the rest of the message:


E           esmvalcore._task.DiagnosticError: Cannot execute script
'~/esmvaltool_tutorial/warming_stripes.py' (~/esmvaltool_tutorial/warming_stripes.py):
file does not exist.

To fix the recipe, we need to change the path of the diagnostic script to just warming_stripes.py:

   scripts:
     warming_stripes_script:
       script: warming_stripes.py

For details, see lesson Writing your own diagnostic script.

Build documentation

When we add or update code, we should also update the corresponding documentation. The ESMValTool documentation is available on docs.esmvaltool.org. The source files are located in ESMValTool/doc/sphinx/source/.

To build documentation locally, first we make sure that the working directory is ESMValTool and our local branch is checked out. Then, we run:

python setup.py build_sphinx -Ea

Similar to code, documentation should be well written and adhere to standards. If the documentation is built properly, the previous command prints a message to the console:

build succeeded.

The HTML pages are in doc/sphinx/build/html.

The main page of the documentation is built into index.html in the doc/sphinx/build/html directory. To preview this page locally, we open the file in a web browser:

xdg-open doc/sphinx/build/html/index.html

Creating a documentation page

In previous exercises, we added the recipe recipe_warming_stripes.yml to ESMValTool. Now, we create a documentation file recipe_warming_stripes.rst for this recipe:

nano doc/sphinx/source/recipes/recipe_warming_stripes.rst

Add a reference label, i.e. .. _recipe_warming_stripes:, a section title, and some text about the recipe, like:

.. _recipe_warming_stripes:

Reproducing Ed Hawkins' warming stripes visualization
======================================================

This recipe produces warming stripes plots.

Save and close the file. We can think of this file as one page of a book. Then, we need to decide where this page should be located inside the book. The table of contents is defined by index.rst. Let’s have a look at the content:

nano doc/sphinx/source/recipes/index.rst

Add the recipe name i.e. recipe_warming_stripes to the section Other in this file and preview the recipe documentation page locally.

Solution

First, we add the recipe name recipe_warming_stripes to the section Other:

Other
^^^^^
.. toctree::
  :maxdepth: 1
  ...
  ...
  recipe_warming_stripes

Then, we build and preview the documentation page:

python setup.py build_sphinx -Ea
xdg-open doc/sphinx/build/html/recipes/recipe_warming_stripes.html

Congratulations! You are now ready to make a pull request.

Key Points

  • A development installation is needed if you want to incorporate your code into ESMValTool.

  • Contributions include adding a new or improved script or helping with a review process.

  • There are several tools to help improve the quality of your code.

  • It is possible to run tests on your machine.

  • You can preview documentation pages locally.


Writing your own diagnostic script

Overview

Teaching: 20 min
Exercises: 30 min
Questions
  • How do I write a new diagnostic in ESMValTool?

  • How do I use the preprocessor output in a Python diagnostic?

Objectives
  • Write a new Python diagnostic script.

  • Explain how a diagnostic script reads the preprocessor output.

Introduction

The diagnostic script is an important component of ESMValTool and it is where the scientific analysis or performance metric is implemented. With ESMValTool, you can adapt an existing diagnostic or write a new script from scratch. Diagnostics can be written in a number of open source languages such as Python, R, Julia and NCL but we will focus on understanding and writing Python diagnostics in this lesson.

In this lesson, we will explain how to find an existing diagnostic and run it using ESMValTool installed in editable/development mode. For a development installation, see the instructions in the lesson Development and contribution. Also, we will work with the recipe recipe_python.yml and the diagnostic script diagnostic.py called by this recipe that we have seen in the lesson Running your first recipe.

Let’s get started!

Understanding an existing Python diagnostic

After a development mode installation, a folder called ESMValTool is created in your working directory. This folder contains the source code of the tool. We can find the recipe recipe_python.yml and the Python script diagnostic.py in the directories esmvaltool/recipes/examples/ and esmvaltool/diag_scripts/examples/, respectively.

Let’s have a look at the code in diagnostic.py. For reference, we show the diagnostic code in the dropdown box below. There are four main parts in the script: a function that creates a provenance record, a function that computes the diagnostic, a function that plots the results, and a main function that ties everything together.

diagnostic.py

 1:  """Python example diagnostic."""
 2:  import logging
 3:  from pathlib import Path
 4:  from pprint import pformat
 5:
 6:  import iris
 7:
 8:  from esmvaltool.diag_scripts.shared import (
 9:      group_metadata,
10:      run_diagnostic,
11:      save_data,
12:      save_figure,
13:      select_metadata,
14:      sorted_metadata,
15:  )
16:  from esmvaltool.diag_scripts.shared.plot import quickplot
17:
18:  logger = logging.getLogger(Path(__file__).stem)
19:
20:
21:  def get_provenance_record(attributes, ancestor_files):
22:      """Create a provenance record describing the diagnostic data and plot."""
23:      caption = ("Average {long_name} between {start_year} and {end_year} "
24:                 "according to {dataset}.".format(**attributes))
25:
26:      record = {
27:          'caption': caption,
28:          'statistics': ['mean'],
29:          'domains': ['global'],
30:          'plot_types': ['zonal'],
31:          'authors': [
32:              'andela_bouwe',
33:              'righi_mattia',
34:          ],
35:          'references': [
36:              'acknow_project',
37:          ],
38:          'ancestors': ancestor_files,
39:      }
40:      return record
41:
42:
43:  def compute_diagnostic(filename):
44:      """Compute an example diagnostic."""
45:      logger.debug("Loading %s", filename)
46:      cube = iris.load_cube(filename)
47:
48:      logger.debug("Running example computation")
49:      cube = iris.util.squeeze(cube)
50:      return cube
51:
52:
53:  def plot_diagnostic(cube, basename, provenance_record, cfg):
54:      """Create diagnostic data and plot it."""
55:
56:      # Save the data used for the plot
57:      save_data(basename, provenance_record, cfg, cube)
58:
59:      if cfg.get('quickplot'):
60:          # Create the plot
61:          quickplot(cube, **cfg['quickplot'])
62:          # And save the plot
63:          save_figure(basename, provenance_record, cfg)
64:
65:
66:  def main(cfg):
67:      """Compute the time average for each input dataset."""
68:      # Get a description of the preprocessed data that we will use as input.
69:      input_data = cfg['input_data'].values()
70:
71:      # Demonstrate use of metadata access convenience functions.
72:      selection = select_metadata(input_data, short_name='tas', project='CMIP5')
73:      logger.info("Example of how to select only CMIP5 temperature data:\n%s",
74:                  pformat(selection))
75:
76:      selection = sorted_metadata(selection, sort='dataset')
77:      logger.info("Example of how to sort this selection by dataset:\n%s",
78:                  pformat(selection))
79:
80:      grouped_input_data = group_metadata(input_data,
81:                                          'variable_group',
82:                                          sort='dataset')
83:      logger.info(
84:          "Example of how to group and sort input data by variable groups from "
85:          "the recipe:\n%s", pformat(grouped_input_data))
86:
87:      # Example of how to loop over variables/datasets in alphabetical order
88:      groups = group_metadata(input_data, 'variable_group', sort='dataset')
89:      for group_name in groups:
90:          logger.info("Processing variable %s", group_name)
91:          for attributes in groups[group_name]:
92:              logger.info("Processing dataset %s", attributes['dataset'])
93:              input_file = attributes['filename']
94:              cube = compute_diagnostic(input_file)
95:
96:              output_basename = Path(input_file).stem
97:              if group_name != attributes['short_name']:
98:                  output_basename = group_name + '_' + output_basename
99:              provenance_record = get_provenance_record(
100:                  attributes, ancestor_files=[input_file])
101:              plot_diagnostic(cube, output_basename, provenance_record, cfg)
102:
103:
104:  if __name__ == '__main__':
105:
106:      with run_diagnostic() as config:
107:          main(config)
108:

What is the starting point of a diagnostic?

  1. Can you spot a function called main in the code above?
  2. What are its input arguments?
  3. How many times is this function mentioned?

Answer

  1. The main function is defined in line 66 as main(cfg).
  2. The input argument to this function is the variable cfg, a Python dictionary that holds all the necessary information needed to run the diagnostic script such as the location of input data and various settings. We will next parse this cfg variable in the main function and extract information as needed to do our analyses (e.g. in line 69).
  3. The main function is called near the very end, on line 107. So it is mentioned twice in our code: once where it is defined (line 66) and once where it is called (line 107).

The function run_diagnostic

The function run_diagnostic (line 106) is a context manager provided by ESMValTool and is the main entry point for most Python diagnostics.

Preprocessor-diagnostic interface

In the previous exercise, we have seen that the variable cfg is the input argument of the main function. The only thing passed to the diagnostic script is the path to a file called settings.yml; the cfg dictionary is loaded from this file by run_diagnostic. The ESMValTool documentation page provides an overview of what is in this file, see Diagnostic script interfaces.
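For illustration, a settings.yml file contains the options from the script section of the recipe together with paths generated by ESMValTool. It might look roughly like this (paths shortened; the exact keys can differ between versions):

input_files:
  - ~/recipe_output/preproc/map/tas/metadata.yml
plot_dir: ~/recipe_output/plots/map/script1
run_dir: ~/recipe_output/run/map/script1
work_dir: ~/recipe_output/work/map/script1
quickplot:
  plot_type: pcolormesh
  cmap: Reds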

What information do I need when writing a diagnostic script?

From the lesson Configuration, we saw how to change the configuration settings before running a recipe. First we set the option remove_preproc_dir to false in the configuration file, then run the recipe recipe_python.yml:

esmvaltool run recipe_python.yml
  1. Find one example of the file settings.yml in the run directory.
  2. Take a look at the input_files list. It contains paths to some metadata.yml files. What information do you think is saved in those files?

Answer

  1. One example of settings.yml can be found in the directory: path_to_recipe_output/run/map/script1/settings.yml
  2. The metadata.yml files hold information about the preprocessed data. There is one file for each variable, containing detailed information about your data, including the project (e.g. CMIP6, CMIP5), dataset names (e.g. BCC-ESM1, CanESM2), variable attributes (e.g. standard_name, units), the preprocessor applied, and the time range of the data. You can use all of this information in your own diagnostic.

Diagnostic shared functions

Looking at the code in diagnostic.py, we see that input_data is read from the cfg dictionary (line 69). Now we can group the input_data according to some criteria such as the model or experiment. To do so, ESMValTool provides many functions such as select_metadata (line 72), sorted_metadata (line 76), and group_metadata (line 80). As you can see in line 8, these functions are imported from esmvaltool.diag_scripts.shared, which means they are shared across several diagnostic scripts. A list of available functions and their descriptions can be found in The ESMValTool Diagnostic API reference.

Extracting information needed for analyses

We have seen the functions used for selecting, sorting and grouping data in the script. What do these functions do?

Answer

There is a statement after each use of select_metadata, sorted_metadata and group_metadata that starts with logger.info (lines 73, 77 and 83). These lines print output to the log files. In the previous exercise, we ran the recipe recipe_python.yml. If you look at the log file path_to_recipe_output/run/map/script1/log.txt, you can see the output from each of these functions, for example:

2021-03-05 13:19:38,184 [34706] INFO     diagnostic,83  Example of how to group and
sort input data by variable groups from the recipe:
{'tas': [{'activity': 'CMIP',
        'alias': 'CMIP6',
        'dataset': 'BCC-ESM1',
        'diagnostic': 'map',
        'end_year': 2000,
        'ensemble': 'r1i1p1f1',
        'exp': 'historical',
        'filename': '~/recipe_python_20210305_131929/preproc/map/tas/CMIP6_BCC-ESM1_Amon_historical_r1i1p1f1_tas_2000-2000.nc',
        'frequency': 'mon',
        'grid': 'gn',
        'institute': ['BCC'],
        'long_name': 'Near-Surface Air Temperature',
        'mip': 'Amon',
        'modeling_realm': ['atmos'],
        'preprocessor': 'select_january',
        'project': 'CMIP6',
        'recipe_dataset_index': 0,
        'short_name': 'tas',
        'standard_name': 'air_temperature',
        'start_year': 2000,
        'units': 'K',
        'variable_group': 'tas'},
       {'alias': 'CMIP5',
        'dataset': 'CanESM2',
        'diagnostic': 'map',
        'end_year': 2000,
        'ensemble': 'r1i1p1',
        'exp': 'historical',
        'filename': '~/recipe_python_20210305_131929/preproc/map/tas/CMIP5_CanESM2_Amon_historical_r1i1p1_tas_2000-2000.nc',
        'frequency': 'mon',
        'institute': ['CCCma'],
        'long_name': 'Near-Surface Air Temperature',
        'mip': 'Amon',
        'modeling_realm': ['atmos'],
        'preprocessor': 'select_january',
        'project': 'CMIP5',
        'recipe_dataset_index': 1,
        'short_name': 'tas',
        'standard_name': 'air_temperature',
        'start_year': 2000,
        'units': 'K',
        'variable_group': 'tas'}]}

This is how we can access preprocessed data within our diagnostic.

Diagnostic computation

After grouping and selecting data, we can read individual attributes (such as filename) of each item. Here we have grouped the input data by variable, so we loop over the variables (lines 89-93). Following this is a call to the function compute_diagnostic (line 94). Let’s have a look at the definition of this function in line 43, where the actual analysis of the data is done.

Note that the output from the ESMValCore preprocessor is in the form of NetCDF files. Here, compute_diagnostic uses Iris to read data from a NetCDF file and performs a squeeze operation to remove any dimensions of length one. We can adapt this function to add our own analysis. As an example, here we subtract the mean of the data to calculate a bias, working directly on the Iris cube:

def compute_diagnostic(filename):
    """Compute an example diagnostic."""
    logger.debug("Loading %s", filename)
    cube = iris.load_cube(filename)

    logger.debug("Running example computation")
    cube = iris.util.squeeze(cube)

    # Calculate a bias using the average of data
    cube.data = cube.core_data() - cube.data.mean()
    return cube

iris cubes

Iris reads data from NetCDF files into data structures called cubes. The data in these cubes can be modified, combined with other cubes’ data or plotted.
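As a small illustration of working with cubes, the sketch below loads a file and computes a time average (the filename is hypothetical):

import iris

cube = iris.load_cube('tas.nc')  # load a NetCDF file into a cube
print(cube.shape, cube.units)    # inspect the data
# Collapse the time dimension to obtain a time average:
mean_cube = cube.collapsed('time', iris.analysis.MEAN)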

Reading data using xarray

Alternatively, you can use xarray to read the data instead of Iris.

Answer

First, import the xarray package at the top of the script:

import xarray as xr

Then, change compute_diagnostic as follows:

def compute_diagnostic(filename):
    """Compute an example diagnostic."""
    logger.debug("Loading %s", filename)
    dataset = xr.open_dataset(filename)

    # do your analyses on the data here

    return dataset

Reading data using the netCDF4 package

Yet another option to read the NetCDF file data is to use the netCDF-4 Python interface to the netCDF C library.

Answer

First, import the netCDF4 package at the top of the script as:

import netCDF4

Then, change compute_diagnostic as follows:

def compute_diagnostic(filename):
    """Compute an example diagnostic."""
    logger.debug("Loading %s", filename)
    nc_data = netCDF4.Dataset(filename, 'r')

    # do your analyses on the data here

    return nc_data

Diagnostic output

Plotting the output

Often, the end product of a diagnostic script is a plot or figure. The Iris cube returned from the compute_diagnostic function (line 94) is passed to the plot_diagnostic function (line 101). Let’s have a look at the definition of this function in line 53. This is where we would plug in our plotting routine in the diagnostic script.

More specifically, the quickplot function (line 61) can be replaced with a function of our choice. As can be seen, this function uses **cfg['quickplot'] as an input argument. If you look at the diagnostic section in the recipe recipe_python.yml, you see that quickplot is a key there:

    script1:
      script: examples/diagnostic.py
      quickplot:
        plot_type: pcolormesh
        cmap: Reds

This way, we can pass arguments, such as the plot type pcolormesh and the colormap cmap: Reds, from the recipe to the quickplot function in the diagnostic.
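The same mechanism works for any custom setting, such as the colormap option we used in the warming stripes recipe earlier: every key in the script section of the recipe shows up as an entry in the cfg dictionary. A minimal sketch:

def main(cfg):
    # Options from the recipe's script section are available in cfg;
    # fall back to a default if the recipe does not set the option.
    colormap = cfg.get('colormap', 'viridis')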

Passing arguments from the recipe to the diagnostic

Change the type of the plot and its colormap and inspect the output figure.

Answer

In the recipe recipe_python.yml, you could change plot_type and cmap. As an example, we choose plot_type: pcolor and cmap: BuGn:

    script1:
      script: examples/diagnostic.py
      quickplot:
        plot_type: pcolor
        cmap: BuGn

The plot can be found at path_to_recipe_output/plots/map/script1/png.

ESMValTool makes it possible to produce a wide array of plots and figures as seen in the gallery.

Saving the output

In our example, the function save_data in line 57 is used to save the Iris cube. The saved files can be found under the work directory in .nc format. There is also the function save_figure in line 63 to save the plots under the plot directory in .png format (or another format specified in your configuration settings). Again, you may choose your own method of saving the output.

Recording the provenance

When developing a diagnostic script, it is good practice to record provenance. To do so, we use the function get_provenance_record (line 99). Let us have a look at the definition of this function in line 21 where we describe the diagnostic data and plot. Using the dictionary record, it is possible to add custom provenance to our diagnostics output. Provenance is stored in the W3C PROV XML format and also in an SVG file under the work and plot directory. For more information, see recording provenance.

Congratulations!

You now know the basic diagnostic script structure and some available tools for putting together your own diagnostics. Have a look at existing recipes and diagnostics in the repository for more examples of functions you can use in your diagnostics!

Key Points

  • ESMValTool provides helper functions to interface a Python diagnostic script with preprocessor output.

  • Existing diagnostics can be used as templates and modified to write new diagnostics.

  • Helper functions can be imported from esmvaltool.diag_scripts.shared and used in your own diagnostic script.


CMORization: adding new datasets to ESMValTool

Overview

Teaching: 15 min
Exercises: 45 min
Questions
  • CMORization: what is it and why do we need it?

  • How to use the existing CMORizer scripts shipped with ESMValTool?

  • How to add support for new (observational) datasets?

Objectives
  • Understand what CMORization is and why it is necessary.

  • Use existing scripts to CMORize your data.

  • Write a new CMORizer script to support additional data.

Data flow with ESMValTool

Introduction

This episode deals with “CMORization”. ESMValTool is designed to work with data that follow the CMOR standards. Unfortunately, not all datasets follow these standards. In order to use such datasets in ESMValTool we first need to reformat the data. This process is called “CMORization”.

What are the CMOR standards?

The name “CMOR” originates from a tool: the Climate Model Output Rewriter. This tool is used to create “CF-Compliant netCDF files for use in the CMIP projects”. So CMOR extends the CF-standard with additional requirements for the Coupled Model Intercomparison Projects (see e.g. here).

Concretely, the CMOR standards dictate e.g. the variable names and units, coordinate information, how the data should be structured (e.g. 1 variable per file), additional metadata requirements, but also file naming conventions a.k.a. the data reference syntax (DRS). All this information is stored in so-called CMOR tables. As an example, the CMOR tables for the CMIP6 project can be found here.

ESMValTool offers two ways to CMORize data:

  1. A reformatting script can be used to create a CMOR-compliant copy. CMORizer scripts for several popular datasets are included in ESMValTool, and ESMValTool also provides a convenient way to execute them.
  2. ESMValCore can execute CMOR fixes ‘on the fly’. The advantage is that you don’t need to store an additional, reformatted copy of the data. The disadvantage is that these fixes should be implemented inside ESMValCore, which is beyond the scope of this tutorial.

In this lesson, we will re-implement a CMORizer script for the FLUXCOM dataset that contains observations of the Gross Primary Production (GPP), a variable that is important for calculating components of the global carbon cycle.

We will assume that you are using a development installation of ESMValTool as explained in the Development and Contribution episode.

Obtaining the data

The data for this episode is available via the FluxCom Data Portal. First you’ll need to register. After registration, in the dropdown boxes, select FLUXCOM as the data choice and click download. Three files will be displayed. Click the download button on the “FLUXCOM (RS+METEO) Global Land Carbon Fluxes using CRUNCEP climate data”. You’ll receive an email with the FTP address to access the server. Connect to the server, follow the path in your email, and look for the file raw/monthly/GPP.ANN.CRUNCEPv6.monthly.2000.nc. Download that file and save it in a folder called /RAWOBS/Tier3/FLUXCOM.

Note: you’ll need a user-friendly ftp client. On Linux, ncftp works okay.

What is the deal with those “tiers”?

Many datasets come with access restrictions. In this way the data providers can keep track of how their data is used. In many cases “restricted access” just means that one has to register with an email address and accept the terms of use, which typically ask that you acknowledge the data providers.

There are also datasets available that do not need a registration. The “obs4MIPs” or “ana4MIPs” datasets, for example, are specifically produced to facilitate comparisons with model simulations.

To reflect these different levels of access restriction, the ESMValTool team has created a tier system. The definitions of the different tiers are as follows:

  • Tier1: obs4MIPs and ana4MIPs datasets (can be used directly with ESMValTool)
  • Tier2: other freely available datasets (most of them will need some kind of CMORization)
  • Tier3: datasets with access restrictions (most of these datasets will also need some kind of CMORization)

These access restrictions are also the reason why the ESMValTool developers cannot distribute copies or automate downloading of all observations and reanalysis data used in the recipes. As a compromise we provide the CMORization scripts so that each user can CMORize their own copy of the access restricted datasets if they need them.

Run the existing CMORizer script

Before we develop our own CMORizer script, let’s first see what happens when we run the existing one. There is a specific command available in the ESMValTool to run the CMORizer scripts:

cmorize_obs -c <config-user.yml> -o <dataset-name>

The config-user.yml is the file in which we define the different data paths, e.g. where ESMValTool should find the “RAWOBS” folder. The dataset-name needs to be identical to the folder name that was created to store the raw observation data files; in our case this would be “FLUXCOM”.

If everything is okay, the output should look something like this:

...
... Starting the CMORization Tool at time: 2021-02-26 14:02:16 UTC
... ----------------------------------------------------------------------
... input_dir  = /home/peter/data/RAWOBS
... output_dir = /home/peter/esmvaltool_output/cmorize_obs_20210226_140216
... ----------------------------------------------------------------------
... Running the CMORization scripts.
... Using cmorizer scripts repository: /home/peter/miniconda3/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/cmorizers/obs
... Processing datasets {'Tier3': ['FLUXCOM']}
... Input data from: /home/peter/data/RAWOBS/Tier3/FLUXCOM
... Output will be written to: /home/peter/esmvaltool_output/cmorize_obs_20210226_140216/Tier3/FLUXCOM
... Reformat script: /home/peter/miniconda3/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom
... CMORizing dataset FLUXCOM using Python script /home/peter/miniconda3/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py
... Found input file '/home/peter/data/RAWOBS/Tier3/FLUXCOM/GPP.ANN.CRUNCEPv6.monthly.*.nc'
... CMORizing variable 'gpp'
... Lmon
... Var is gpp
... ... UserWarning: Ignoring netCDF variable 'GPP' invalid units 'gC m-2 day-1'
  warnings.warn(msg)
... Fixing time...
... Fixing latitude...
... Fixing longitude...
... Flipping dimensional coordinate latitude...
... Saving file
... Converting data type of data from 'float64' to 'float32'
... Saving: /home/peter/esmvaltool_output/cmorize_obs_20210226_140216/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
... Cube has lazy data [lazy is preferred]
... Ending the CMORization Tool at time: 2021-02-26 14:02:16 UTC
... Time for running the CMORization scripts was: 0:00:00.605970

So you can see that several fixes are applied, and the CMORized file is written to the ESMValTool output directory. In order to use it, we’ll have to copy it from the output directory to a folder called <path_to_your_data>/OBS/Tier3/FLUXCOM and make sure the path to OBS is set correctly in our config-user file.
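For example, using the paths from the example output above (your output directory will differ):

mkdir -p <path_to_your_data>/OBS/Tier3/FLUXCOM
cp ~/esmvaltool_output/cmorize_obs_20210226_140216/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc <path_to_your_data>/OBS/Tier3/FLUXCOM/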

You can also see the path where ESMValTool stores the reformatting script: <path to esmvaltool>/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py. You may have a look at this file if you want. The script also uses a configuration file: <path to esmvaltool>/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml.

Make a test recipe

To verify that the data is correctly CMORized, we will make a simple test recipe. As illustrated in the figure at the top of this episode, one of the steps that ESMValTool executes is a CMOR-check. If the data is not correctly CMORized, ESMValTool will give a warning or error.

Create a test recipe

Create a simple recipe called recipe_check_fluxcom.yml that loads the FLUXCOM data. It should include a datasets section with a single entry for the “FLUXCOM” dataset with the correct dataset keys, and a diagnostics section with a single variable: gpp. We don’t need any preprocessors or scripts (set scripts: null), but we do have to add a documentation section with a description, authors and maintainer, otherwise the recipe will fail.

Use the following dataset keys:

  • project: OBS
  • dataset: FLUXCOM
  • type: reanaly
  • version: ANN-v1
  • mip: Lmon
  • start_year: 2000
  • end_year: 2000
  • tier: 3

Some of these dataset keys are further explained in the callout boxes in this episode.

Answer

Here’s an example recipe

documentation:

  description: Test recipe for FLUXCOM data

  authors:
    - kalverla_peter

  maintainer:
    - kalverla_peter

datasets:
  - {project: OBS, dataset: FLUXCOM, mip: Lmon, tier: 3, start_year: 2000, end_year: 2000, type: reanaly, version: ANN-v1}

diagnostics:
  check_fluxcom:
    description: Check that ESMValTool can load the CMORized FLUXCOM data without errors.
    variables:
      gpp:
    scripts: null

To learn more about writing a recipe, please refer to Writing your own recipe.

Try to run the example recipe with

esmvaltool run recipe_check_fluxcom.yml --log_level debug

If everything is okay, the recipe should run without problems.

Starting from scratch

Now that you’ve seen how to use an existing CMORizer script, let’s think about adding a new one. We will remove the existing CMORizer script, and re-implement it from scratch. This exercise allows us to point out all the details of what’s going on. We’ll also remove the CMORized data that we’ve just created, so our test recipe will not be able to use it anymore.

rm <path_to_your_data>/OBS/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
rm <path_to_esmvaltool>/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py
rm <path_to_esmvaltool>/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml

If you now run the test recipe again it should fail, and somewhere in the output you should find something like:

No input files found for ...
Looking for files matching ['OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc'] in ['/home/peter/data/OBS/Tier3/FLUXCOM']

From this we can see that the first thing our CMORizer should do is to rename the file so that it follows the CMOR filename conventions.
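Comparing the pattern in the error message with the dataset keys in our test recipe, the expected filename is built up roughly as follows (illustrated with the file we produced earlier):

OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
<project>_<dataset>_<type>_<version>_<mip>_<short_name>_<start>-<end>.nc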

Create a new CMORizer script and a corresponding config file

The first step now is to create a new file in the right folder that will contain our new CMORizer instructions. Create a file called cmorize_obs_fluxcom.py

nano <path_to_esmvaltool>/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py

and fill it with the following boilerplate code:

"""ESMValTool CMORizer for FLUXCOM GPP data.

<We will add some useful info here later>
"""
import logging
from . import utilities as utils

logger = logging.getLogger(__name__)

def cmorization(in_dir, out_dir, cfg, _):
    """Cmorize the dataset."""

    # This is where you'll add the cmorization code
    # 1. find the input data
    # 2. apply the necessary fixes
    # 3. store the data with the correct filename

Here, in_dir corresponds to the input directory of the raw files, out_dir to the output directory for the final reformatted dataset, and cfg to a configuration dictionary read from a configuration file that we will get to shortly. When you type the command cmorize_obs in the terminal, ESMValTool will call this function with the settings found in your configuration files.

The ESMValTool CMORizer also needs a dataset configuration file. Create a file called <path_to_esmvaltool>/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml and fill it with the following boilerplate:

---
# filename: ???

attributes:
  project_id: OBS6
#   dataset_id: ???
#   version: ???
#   tier: ???
#   modeling_realm: ???
#   source: ???
#   reference: ???
#   comment: ???

# variables:
#   ???:
#     mip: ???

Note: the name of this file must be identical to the dataset name.

As you can see, the configuration file contains information about the original filename of the dataset, and some additional metadata that you might recognize from the CMOR filename structure. It also contains a list of the variables that are available for this dataset. We’ll add this information step by step in the following sections.

RAWOBS, OBS, OBS6!?

In the configuration above we’ve already filled in the project_id. ESMValTool uses these project IDs to find the data on your hard drive, and also to find more information about the data. The RAWOBS and OBS projects refer to external data before and after CMORization, respectively. Historically, most external data were observations, hence the naming.

In going from CMIP5 to CMIP6, the CMOR standards changed a bit. For example, some variables were renamed, which posed a dilemma: should CMORization reformat to the CMIP5 or CMIP6 definition? To solve this, the OBS6 project was created. So OBS6 data follow the CMIP6 standards, and that’s what we’ll use for the new CMORizer.

You can try running the CMORizer at this point, and it should work without errors. However, it doesn’t produce any output yet:

cmorize_obs -c <config-user.yml> -o FLUXCOM

1. Find the input data

First we’ll get the CMORizer script to locate our FLUXCOM data. We can use the information from the in_dir and cfg variables. Add the following snippet to your CMORizer script:

# 1. find the input data
logger.info("in_dir: '%s'", in_dir)
logger.info("cfg: '%s'", cfg)

If you run the CMORizer again, it will print out the content of these variables.

Load the data

Try to locate the input data inside the CMORizer script and load it (we’ll use iris because ESMValTool includes helper utilities for iris cubes). Confirm that you’ve loaded the data by logging the correct path and (part of the) file content.

Solution

There are many ways to do it. In any case, you should have added the original filename to the configuration file (and uncommented this line): filename: 'GPP.ANN.CRUNCEPv6.monthly.*.nc'. Note the *: this is a useful shorthand to find multiple files for different years. In a similar way we can also look for multiple variables, etc.

Here’s an example solution (inserted directly under the original comment):

# 1. find the input data
filename_pattern = cfg['filename']
matches = Path(in_dir).glob(filename_pattern)

for match in matches:
    input_file = str(match)
    logger.info("found: %s", input_file)
    cube = iris.load_cube(input_file)
    logger.info("content: %s", cube)

To make this work we’ve added import iris and from pathlib import Path at the top of the file. Note that we’ve started a loop, since we may find multiple files if there’s more than one year of data available.

2. Save the data with the correct filename

Before we start adding fixes, we’ll first make sure that our CMORizer can also write output files with the correct name. This will enable us to use the test recipe for the CMOR compatibility check.

We can use the save function from the utils that we imported at the top. The call signature looks like this: utils.save_variable(cube, var, outdir, attrs, **kwargs).

We already have the cube and the outdir. The variable short name (var) and attributes (attrs) are set through the configuration file. So we need to find out what the correct short name and attributes are.

The standard attributes for CMIP variables are defined in the CMIP tables. These tables are differentiated according to the “MIP” they belong to. The tables are a copy of the PCMDI guidelines.

Find the variable “gpp” in a CMOR table

Check the available CMOR tables to find the variable “gpp” with the following characteristics:

  • standard_name: gross_primary_productivity_of_biomass_expressed_as_carbon
  • frequency: mon
  • modeling_realm: land

Answers

The variable “gpp” belongs to the land variables. The temporal resolution that we are looking for is “monthly”. This information points to the “Lmon” CMIP table. And indeed, the variable “gpp” can be found in the file here.

If the variable you are interested in is not available in the standard CMOR tables, you could write a custom CMOR table entry for the variable. This, however, is beyond the scope of this tutorial.

Fill the configuration file

Uncomment the following entries in your configuration file and fill them with appropriate values:

  • dataset_id
  • version
  • tier
  • modeling_realm
  • short_name (the ??? immediately under variables)
  • mip

Answers

The configuration file now looks something like this:

---
filename: 'GPP.ANN.CRUNCEPv6.monthly.*.nc'

attributes:
  project_id: OBS6
  dataset_id: FLUXCOM
  version: 'ANN-v1'
  tier: 3
  modeling_realm: reanaly
  source: ''
  reference: ''
  comment: ''

variables:
  gpp:
    mip: Lmon

Now that we have set this information correctly in the config file, we can call the save function. Add the following Python code to your CMORizer script:

# 3. store the data with the correct filename
attributes = cfg['attributes']
variables = cfg['variables']

for short_name, variable_info in variables.items():
    all_attributes = {**attributes, **variable_info}  # add the mip to the other attributes
    utils.save_variable(cube=cube, var=short_name, outdir=out_dir, attrs=all_attributes)

Since we only have one variable (gpp), the loop is not strictly necessary. However, this makes it possible to add more variables later on.

Was the CMORization successful so far?

If you run the CMORizer again, you should see that it creates an output file named OBS6_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_xxxx01-yyyy12.nc, where “xxxx” and “yyyy” represent the start and end year of the data.

Great! So we have produced a NetCDF file with the CMORizer that follows the naming convention for ESMValTool datasets. Let’s have a look at the NetCDF file as it was written with the very basic CMORizer from above.

ncdump -h OBS6_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
netcdf OBS6_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012 {
dimensions:
        time = 12 ;
        lat = 360 ;
        lon = 720 ;
variables:
        float GPP(time, lat, lon) ;
                GPP:_FillValue = 1.e+20f ;
                GPP:long_name = "GPP" ;
        double time(time) ;
                time:axis = "T" ;
                time:units = "days since 1582-10-15 00:00:00" ;
                time:standard_name = "time" ;
                time:calendar = "gregorian" ;
        double lat(lat) ;
        double lon(lon) ;

// global attributes:
                :_NCProperties = "version=2,netcdf=4.7.4,hdf5=1.10.6" ;
                :created_by = "Fabian Gans [fgans@bgc-jena.mpg.de], Ulrich Weber [uweber@bgc-jena.mpg.de]" ;
                :flux = "GPP" ;
                :forcing = "CRUNCEPv6" ;
                :institution = "MPI-BGC-BGI" ;
                :invalid_units = "gC m-2 day-1" ;
                :method = "Artificial Neural Networks" ;
                :provided_by = "Martin Jung [mjung@bgc-jena.mpg.de] on behalf of FLUXCOM team" ;
                :reference = "Jung et al. 2016, Nature; Tramontana et al. 2016, Biogeosciences" ;
                :temporal_resolution = "monthly" ;
                :title = "GPP based on FLUXCOM RS+METEO with CRUNCEPv6 climate " ;
                :version = "v1" ;
                :Conventions = "CF-1.7" ;
}

The file contains a variable named “GPP” with three dimensions: “time”, “lat”, “lon”. Notice the strange time units, and the invalid_units entry in the global attributes section. Also, it seems that no information is available about the lat and lon coordinates. These are just some of the things we’ll address in the next section.

3. Implementing additional fixes

Copy the output of the CMORizer to the folder <path_to_your_data>/OBS6/Tier3/FLUXCOM and change the test recipe to look for OBS6 data instead of OBS (note: we’re upgrading the CMORizer to newer standards here!).

If we now run the test recipe on our newly ‘CMORized’ data,

esmvaltool run recipe_check_fluxcom.yml --log_level debug

it should be able to find the correct file, but it does not succeed yet. The first thing that the ESMValTool CMOR checker brings up is:

iris.exceptions.UnitConversionError: Cannot convert from unknown units. The
"units" attribute may be set directly.

If you look closely at the error messages, you can see that this error concerns the units of the coordinates. ESMValTool tries to fix them automatically, but since no units are defined on the coordinates, this fails.

The CMORizer utilities also include a function called fix_coords, but before we can use it, we’ll also need to make sure the coordinates have the correct standard names. Add the following code to your CMORizer:

# Fix/add coordinate information and metadata
cube.coord('lat').standard_name = 'latitude'
cube.coord('lon').standard_name = 'longitude'
utils.fix_coords(cube)

With some additional refactoring, our cmorization function might then look something like this:

def cmorization(in_dir, out_dir, cfg, _):
    """Cmorize the dataset."""

    # Get general information from the config file
    attributes = cfg['attributes']
    variables = cfg['variables']

    for short_name, variable_info in variables.items():
        logger.info("CMORizing variable: %s", short_name)

        # 1a. Find the input data (one file for each year)
        filename_pattern = cfg['filename']
        matches = Path(in_dir).glob(filename_pattern)

        for match in matches:
            # 1b. Load the input data
            input_file = str(match)
            logger.info("found: %s", input_file)
            cube = iris.load_cube(input_file)

            # 2. Apply the necessary fixes
            # 2a. Fix/add coordinate information and metadata
            cube.coord('lat').standard_name = 'latitude'
            cube.coord('lon').standard_name = 'longitude'
            utils.fix_coords(cube)

            # 3. Save the CMORized data
            all_attributes = {**attributes, **variable_info}
            utils.save_variable(cube=cube, var=short_name, outdir=out_dir, attrs=all_attributes)

Have a look at the netCDF file, and confirm that the coordinates now have much more metadata added to them. Then, run the test recipe again with the latest CMORizer output. The next error is:

esmvalcore.cmor.check.CMORCheckError: There were errors in variable GPP:
Variable GPP units unknown can not be converted to kg m-2 s-1 in cube:

Okay, so let’s fix the units of the “GPP” variable in the CMORizer. Remember that you can find the correct units in the CMOR table. Add the following lines to our CMORizer:

# 2b. Fix gpp units
logger.info("Changing units for gpp from gc/m2/day to kg/m2/s")
cube.data = cube.core_data() / (1000 * 86400)
cube.units = 'kg m-2 s-1'

If everything is okay, the test recipe should now pass. We’re getting there. Looking through the output though, there’s still a warning.

WARNING There were warnings in variable GPP:
Standard name for GPP changed from None to gross_primary_productivity_of_biomass_expressed_as_carbon
Long name for GPP changed from GPP to Carbon Mass Flux out of Atmosphere Due to Gross Primary Production on Land [kgC m-2 s-1]

ESMValTool is able to apply automatic fixes here, but if we are running a CMORizer script anyway, we might as well fix it immediately.

Add the following snippet:

# 2c. Fix metadata
cmor_table = cfg['cmor_table']
cmor_info = cmor_table.get_variable(variable_info['mip'], short_name)
utils.fix_var_metadata(cube, cmor_info)

You can see that we’re using the CMOR table here. It was passed on by ESMValTool as part of the cfg input variable. So here we’re making sure that the cube’s metadata is updated to conform to the CMOR table.

Finally, the test recipe should run without errors or warnings.

4. Finalizing the CMORizer

Once everything works as expected, there are a couple of things that we can still do.

Fill out the header for the “FLUXCOM” dataset

Fill out the header of the new CMORizer. The different parts that need to be present in the header are the following:

  • Caption: the first line of the docstring should summarize what the script does.
  • Tier
  • Source
  • Last access
  • Download and processing instructions

Solution

The header for the “FLUXCOM” dataset could look something like this:

"""ESMValTool CMORizer for FLUXCOM GPP data.

Tier
    Tier 3: restricted dataset.

Source
    http://www.bgc-jena.mpg.de/geodb/BGI/Home

Last access
    20190727

Download and processing instructions
    From the website, select FLUXCOM as the data choice and click download.
    Two files will be displayed. One for Land Carbon Fluxes and one for
    Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
    CRUNCEP data file has several data files for different variables.
    The data for GPP generated using the
    Artificial Neural Network Method will be in files with name:
    GPP.ANN.CRUNCEPv6.monthly.*.nc
    A registration is required for downloading the data.
    Users in the UK with a CEDA-JASMIN account may request access to the jules
    workspace and access the data.
    Note: This data may require rechunking of the netCDF files.
    This constraint will no longer exist once iris is updated to
    version 2.3.0 (Aug 2019).
"""
One more fix we can add to the cmorization function is to copy the general attributes from the config file onto the cube, so that they end up as global attributes in the output files:

# 2d. Update the cube's metadata with all info from the config file
utils.set_global_atts(cube, attributes)

Some final comments

Congratulations! You have just added support for a new dataset to ESMValTool! Adding a new CMORizer is already an advanced task when working with ESMValTool. You need a basic understanding of how ESMValTool works and what its internal structure looks like. In addition, you need a basic understanding of netCDF files and of a programming language. In our example we used Python for the CMORizing script, since we advocate focusing code development on only a few programming languages. This helps to maintain the code and to ensure its compatibility with possible fundamental changes to the structure of ESMValTool and ESMValCore.

More information about adding observations to the ESMValTool can be found in the documentation.

Key Points

  • CMORizers are dataset-specific scripts that can be run once to generate CMOR-compliant data.

  • ESMValTool comes with a set of CMORizers readily available, but you can also add your own.


Debugging

Overview

Teaching: 30 min
Exercises: 15 min
Questions
  • How can I handle errors/warnings?

Objectives
  • Fix a broken recipe

Every user encounters errors. Once you know why you get certain types of errors, they become much easier to fix. The good news is that ESMValTool creates a record of the output messages and stores them in log files, which can be used for debugging or for monitoring the process. This lesson helps you understand what the different types of errors are and when you are likely to encounter them.

Log files

Each time we run ESMValTool, it produces a new output directory. This directory contains the run folder, which is automatically generated by ESMValTool. To examine this, we run the file recipe_example.yml that can be found in Setup. Let’s download it to our working directory esmvaltool_tutorial, which was created during Configuration.

In a new terminal, go to our working directory esmvaltool_tutorial where the file recipe_example.yml is located and run the recipe:

  cd esmvaltool_tutorial
  esmvaltool run recipe_example.yml
esmvaltool: command not found

This error occurs because the conda environment esmvaltool has not been activated. To fix the error, activate the environment before running the recipe:

conda activate esmvaltool

conda environment

More information about the conda environment can be found at Installation.

Let’s change the working directory to the folder run and list its files:

  cd esmvaltool_output/recipe_example_#_#/run
  ls
diag_timeseries_temperature  main_log_debug.txt   main_log.txt  recipe_example.yml   resource_usage.txt

In main_log_debug.txt and main_log.txt, ESMValTool writes the output messages, warnings and possible errors that might occur during pre-processing. To inspect them, we can look inside the files. For example:

  cat main_log.txt

Now, let’s have a look inside the folder diag_timeseries_temperature:

  cd diag_timeseries_temperature/timeseries_diag
  ls
log.txt  resource_usage.txt  settings.yml

In the log.txt, ESMValTool writes the output messages, warnings and possible errors that are related to the diagnostic script.

If you encounter an error and don’t know what it means, it is important to read the log information. Sometimes knowing where the error occurred is enough to fix it, even if you don’t entirely understand the message. However, you may not always be able to find or fix the error yourself. In that case, the ESMValTool community can help you figure out what went wrong.

Different log files

In the run directory, there are two log files, main_log_debug.txt and main_log.txt. What is the difference between them?

Solution

The main_log_debug.txt file contains the debug-level output messages, including detailed information from the pre-processor, whereas main_log.txt shows the general messages, warnings and errors that occur when running the recipe and the diagnostic script.

Let’s change some settings in the recipe to run a regional pre-processor. We use a text editor called nano to open the recipe file:

  cd ~/esmvaltool_tutorial
  nano recipe_example.yml

Text editor side note

No matter what editor you use, you will need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location. We use nano in examples here because it is one of the least complex text editors. Press ctrl + O to save the file, and then ctrl + X to exit nano.

See the recipe_example.yml

01    # ESMValTool
02    # recipe_example.yml
03    ---
04    documentation:
05      description: Demonstrate basic ESMValTool example
06
07      authors:
08        - demora_lee
09        - mueller_benjamin
10        - swaminathan_ranjini
11
12      maintainer:
13        - demora_lee
14
15      references:
16        - demora2018gmd
17        # Some plots also appear in ESMValTool paper 2.
18
19      projects:
20        - ukesm
21
22    datasets:
23      - {dataset: HadGEM2-ES, project: CMIP5, exp: historical, mip: Omon, ensemble: r1i1p1, start_year: 1859, end_year: 2005}
24
25    preprocessors:
26      prep_timeseries:  # For 0D fields
27        annual_statistics:
28          operator: mean
29
30    diagnostics:
31      # --------------------------------------------------
32      # Time series diagnostics
33      # --------------------------------------------------
34      diag_timeseries_temperature:
35        description: simple_time_series
36        variables:
37          timeseries_variable:
38            short_name: thetaoga
39            preprocessor: prep_timeseries
40        scripts:
41          timeseries_diag:
42            script: ocean/diagnostic_timeseries.py

Keys and values in recipe settings

The ESMValTool pre-processors cover a broad range of operations on the input data, like time manipulation, area manipulation, land-sea masking, variable derivation, etc. Let’s add the preprocessor extract_region to the section prep_timeseries:

25    preprocessors:
26      prep_timeseries:  # For 0D fields
27        annual_statistics:
28          operator: mean
29        extract_region:
30          start_longitude: -10
31          end_longitude: 40
32          start_latitude: 27
33          end_latitude: 70
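For reference, the preprocessor step we just added corresponds to the extract_region function in ESMValCore. Applied by hand to an iris cube, it would look roughly like this (a sketch for illustration, not something you need to run):

# Roughly what the extract_region preprocessor step does to each dataset
from esmvalcore.preprocessor import extract_region

cube = extract_region(
    cube,
    start_longitude=-10,
    end_longitude=40,
    start_latitude=27,
    end_latitude=70,
)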

Also, we change the projects value from ukesm to tutorial:

19      projects:
20        - tutorial

Then, we save the file and run the recipe:

  esmvaltool run recipe_example.yml
ValueError: Tag 'tutorial' does not exist in section 'projects' of esmvaltool/config-references.yml
2020-06-29 18:09:56,641 UTC [46055] INFO    If you suspect this is a bug or need help,
please open an issue on https://github.com/ESMValGroup/ESMValTool/issues and
attach the run/recipe_*.yml and run/main_log_debug.txt files from the output directory.

The values for the keys authors, maintainer, projects and references in the recipe should be known to ESMValTool: they are defined in the file esmvaltool/config-references.yml that is mentioned in the error message.

ESMValTool can’t locate the data

You are assisting a colleague with ESMValTool. The colleague changes the entry project: CMIP5 to project: CMIP6 in the datasets section and runs the recipe. However, ESMValTool encounters an error like:

esmvalcore._recipe_checks.RecipeError: Missing data
2020-06-29 17:26:41,303 UTC [43830] INFO    If you suspect this is a bug or need help,
please open an issue on https://github.com/ESMValGroup/ESMValTool/issues and
attach the run/recipe_*.yml and run/main_log_debug.txt files from the output directory.

What suggestions would you give the researcher for fixing the error?

Solution

  1. Inspect main_log.txt
  2. Check config-user.yml to see if the correct directory for input data is set
  3. Check the available data, regarding exp, mip, ensemble, start_year, and end_year
  4. Check the variable name in the diag_timeseries_temperature section in the recipe

Check pre-processed data

The setting save_intermediary_cubes in the configuration file can be used to save the data from the intermediate steps of the pre-processor. More information about this setting can be found at Configuration.

save_intermediary_cubes

Note that this setting should only be used for debugging, as it significantly slows down the recipe and increases disk usage because many output files need to be stored.

Check diagnostic script path

The result of the pre-processor is passed to the diagnostic_timeseries.py script, which is introduced in the recipe as:

40        scripts:
41          timeseries_diag:
42            script: ocean/diagnostic_timeseries.py

The diagnostic scripts are located in the folder diag_scripts in the ESMValTool installation directory <path_to_esmvaltool>. To find where ESMValTool is located on your system, see Installation.

Let’s see what happens if we change the script path to:

40        scripts:
41          timeseries_diag:
42            script: diag_scripts/ocean/diagnostic_timeseries.py
  esmvaltool run recipe_example.yml
esmvalcore._task.DiagnosticError: Cannot execute script 'diag_scripts/ocean/diagnostic_timeseries.py' (esmvaltool/diag_scripts/diag_scripts/ocean/diagnostic_timeseries.py): file does not exist.
2020-06-29 20:39:31,669 UTC [53008] INFO    If you suspect this is a bug or need help, please open an issue on https://github.com/ESMValGroup/ESMValTool/issues and attach the run/recipe_*.yml and run/main_log_debug.txt files from the output directory.

The script path should be relative to the diag_scripts directory. This means that the script diagnostic_timeseries.py is located in <path_to_esmvaltool>/diag_scripts/ocean/. Alternatively, the script path can be an absolute path. To try this out, we can download the script from the ESMValTool repository (using the raw file URL, so that wget fetches the script itself rather than the GitHub web page):

wget https://raw.githubusercontent.com/ESMValGroup/ESMValTool/master/esmvaltool/diag_scripts/ocean/diagnostic_timeseries.py

One way to get the absolute path is to run:

readlink -f diagnostic_timeseries.py
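If you prefer Python over shell tools, the same absolute path can be obtained with the standard library (a small equivalent of readlink -f):

# Equivalent of `readlink -f diagnostic_timeseries.py` in Python
from pathlib import Path

print(Path('diagnostic_timeseries.py').resolve())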

Then we can update the script path and run the recipe:

40        scripts:
41          timeseries_diag:
42            script: <path_to_script>/diagnostic_timeseries.py
  esmvaltool run recipe_example.yml

Now examine ./esmvaltool_output/recipe_example_#_#/run/diag_timeseries_temperature/timeseries_diag/ to see if it worked!

Available recipe and diagnostic scripts

ESMValTool provides a broad suite of recipes and diagnostic scripts for different disciplines, such as atmosphere, climate metrics, future projections, IPCC, land, and ocean.

Re-running a diagnostic

Look at the main_log.txt file and answer the following question: how can you re-run the diagnostic script?

Solution

The main_log.txt file contains information on how to re-run the diagnostic script without re-running the pre-processors:

2020-06-29 20:36:32,844 UTC [52810] INFO    To re-run this diagnostic script, run:

Running the command shown on the next line of the log file re-runs the diagnostic without repeating the pre-processing.

Memory issues

If you run out of memory, try setting max_parallel_tasks to 1 in the configuration file. Then, check how much memory you need by inspecting the file run/resource_usage.txt in the output directory. Using that number, you can increase max_parallel_tasks again to a value that is reasonable for the amount of memory available on your system.

Key Points

  • There are three different kinds of log files: main_log.txt, main_log_debug.txt, and log.txt.