Running simple experiments

Note

Running even simple experiments with EC-Earth 4 is a complex task, mainly because model and experiment configurations can vary widely. Dependencies might differ from case to case (depending on the user and computational environments), and the available configuration parameters vary with the experiment setup. Hence, the following part of the documentation will probably need more adaptation to your needs than the previously explained steps to build the model.

Furthermore, a number of choices or features may be hard-coded in the scripts, or not yet supported at all. This will change as development of EC-Earth 4 and this documentation progresses.

Caution

Make sure that the EC-Earth 4 environment is correctly created and activated. This also includes setting OASIS_BUILD_PATH and adding ${OASIS_BUILD_PATH}/lib to LD_LIBRARY_PATH, as described in Completing the environment.

To prepare for a simple test experiment, we start from the ScriptEngine example scripts provided in scripts/runtime and subdirectories:

ecearth4> cd scripts/runtime
se> ls -1
scriptlib/
templates/
user-config-example.yml
experiment-config-example.yml

The ScriptEngine runtime environment (SE RTE) is split into separate YAML scripts, partly with respect to the model component they are dealing with, and partly with respect to the runtime stage they belong to. This is done in order to provide a modular approach for different configurations and avoid overly complex scripts and duplication. Most of the YAML scripts are provided in the scriptlib subdirectory.

However, this splitting is not “built into” ScriptEngine or the SE RTE. It is entirely up to the user to adapt the scripts to their needs, possibly splitting them up in vastly different ways.

Main structure of the run scripts

The main run script logic is coded in scriptlib/main.yml, which calls separate scripts for the stages of one leg of the experiment (such as config, setup, pre, run, post, and resubmit), taking into account all model components needed for the model and experiment setup.

However, scriptlib/main.yml and the scripts called therein rely on a correct and consistent set of configuration parameters covering the platform, user, model and experiment configuration. Hence, you have to provide configuration scripts together with scriptlib/main.yml. A typical command to start an EC-Earth 4 experiment might look like:

se> se my-user-config.yml my-platform-config.yml my-experiment-config.yml scriptlib/main.yml

where the my-*-config.yml scripts contain all needed configuration parameters. The names of the configuration scripts and how the parameters are split across them are not hardcoded anywhere in ScriptEngine or the runtime environment. Thus, you are free to adapt the configuration scripts to your needs.

Caution

While you are free to adapt the configuration to your needs, you still need to make sure that the changes result in valid ScriptEngine scripts. For example, the order of scripts is important because some scripts may define context variables that other scripts refer to.

In order to make it easier to get started, examples are provided. To start with, the same platform configuration file that was used to build the model should be used for the runtime environment. Thus, the model can be started with (still assuming that the current working directory is ecearth4/scripts/runtime):

se> se \
  my-user-config.yml \
  ../platforms/my-platform-config.yml \
  my-experiment-config.yml \
  scriptlib/main.yml

As for the experiment (including the model configuration) and user configuration, example scripts are provided.

The experiment configuration contains, for example,

base.context:
  experiment:
    id: TEST
    description: A new ECE4 experiment

and, as part of the model configuration:

base.context:
  model_config:
    components: [oifs, nemo, rnfm, xios, oasis, lpjg]

which configures the model in GCM configuration (with atmosphere, ocean, coupler, and I/O server).

Assuming that all configuration parameters are set in the platform, experiment (and model), and user configuration scripts, the main run script scriptlib/main.yml proceeds with the following steps:

# Submit job to batch system
# ...

# Configure 'main' and all components
- base.include:
    src: "scriptlib/config-{{component}}.yml"
    ignore_not_found: yes
  loop:
    with: component
    in: "{{['main'] + main.components}}"

# On first leg: setup 'main' and all components
# ...

# Pre step for 'main' and all components
# ...

# Start model run for leg
# ...

# Run post step for all components
# ...

# Monitoring
# ...

# Re-submit
# ...

Basically, the run script defines the following stages:

  1. Configure the batch system and submit the job

  2. config-*, which sets configuration parameters for each component.

  3. setup-*, which runs, for each component, once at the beginning of the experiment.

  4. pre-*, which runs, for each component, at each leg before the executables.

  5. run, which starts the actual model run (i.e. the executables).

  6. post-*, which is run, for each component, at each leg after the model run has completed.

  7. resubmit, which submits the model for the following leg.

  8. monitor, which prepares data for online monitoring.

Not every stage has to be present in each model run, and not all stages have to be present for all components. For all stages and components that are present, there is a corresponding scriptlib/<stage>-<component>.yml script, which is included (via the base.include ScriptEngine task). Hence, the main implementation logic of scriptlib/main.yml is to go through all stages and execute all component scripts for that stage, if they exist.

Note that there is an artificial model component, called main, which is executed first in all stages. The corresponding scriptlib/<stage>-main.yml files include tasks that are general and not associated with a particular component of the model.
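
The lookup logic described above can be sketched in plain Python. This is only an illustration of the order in which scripts are tried, not the ScriptEngine implementation. The function name scripts_for_stage and the set available (standing in for the files present in scriptlib) are assumptions made for this example.

```python
def scripts_for_stage(stage, components, available):
    """Return the scripts that would be included for one stage.

    'main' is always tried first, followed by the model components in the
    order given. Scripts that do not exist are skipped silently, which
    mirrors the effect of ignore_not_found in the base.include task.
    """
    names = ["main"] + list(components)
    candidates = [f"scriptlib/{stage}-{name}.yml" for name in names]
    return [path for path in candidates if path in available]


# Hypothetical set of files present in scriptlib, for illustration only
available = {
    "scriptlib/config-main.yml",
    "scriptlib/config-oifs.yml",
    "scriptlib/config-nemo.yml",
    "scriptlib/pre-main.yml",
    "scriptlib/pre-oifs.yml",
}

config_scripts = scripts_for_stage("config", ["oifs", "nemo", "xios"], available)
pre_scripts = scripts_for_stage("pre", ["oifs", "nemo", "xios"], available)
```

In this sketch, xios has no config or pre script, so it simply does not appear in either result.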

Available grid configurations

Configuration    ECE4-VLR-PALEO   ECE4-VLR      ECE4-LR     ECE4-SR      ECE4-HR
Purpose          Deeptime paleo   Sensitivity   Paleo       CMIP         OptimESM
Atm. grid        TL63.L31         TL63.L31      TL159.L91   TL255.L91    TCO399.L91
Ocean grid       PALEORCA2.L31    ORCA2.L31     ORCA1.L75   eORCA1.L75   eORCA025.L75
Ice-shelf cav.   No               No            No          Yes          No
LPJG             No               No            No          Yes          No

EC-Earth4 is supported for use in five different grid configurations. Please keep in mind that the level of support differs between configurations: some are more thoroughly tested and have more features available than others.

The standard resolution, ECE4-SR, has an atmosphere and ocean resolution of approximately 80 and 100 km, respectively. The lower resolutions are intended for simulating paleoclimates, and the high-resolution configuration is intended for use in some specific projects. Note that LPJ-GUESS can currently only be run with the TL255 atmosphere grid, i.e. the ECE4-SR configuration. Also, volcanic aerosols are only available on the L91 vertical grid.

Note

While it is technically possible to combine any atmosphere grid with any ocean grid, only the configurations listed above are supported.

The grid configuration is controlled in the experiment-config.yml file:

model_config:
   oifs:
      grid: TL255L91
   ...
   nemo:
      grid: eORCA1L75_ISO

The _ISO suffix for the NEMO grid sets whether the NEMO grid includes ice-shelf cavities or not. Currently this is only available for the eORCA1.L75 grid.
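
As a rough aid, the supported combinations can be written down as a small lookup. The sketch below is not part of the runtime environment. The function name find_configuration is hypothetical, and the grid names are written without the dot (TL255.L91 becomes TL255L91), which is an assumption for every name that does not appear in a configuration example in this documentation.

```python
# Supported combinations, transcribed from the table of grid configurations.
# Names without the dot are an assumption, except for those that appear in
# configuration examples elsewhere in this documentation.
SUPPORTED = {
    "ECE4-VLR-PALEO": ("TL63L31", "PALEORCA2L31"),
    "ECE4-VLR": ("TL63L31", "ORCA2L31"),
    "ECE4-LR": ("TL159L91", "ORCA1L75"),
    "ECE4-SR": ("TL255L91", "eORCA1L75"),
    "ECE4-HR": ("TCO399L91", "eORCA025L75"),
}


def find_configuration(oifs_grid, nemo_grid):
    """Return (name, has_cavities) for a supported grid pair, or None."""
    has_cavities = nemo_grid.endswith("_ISO")
    base_grid = nemo_grid[: -len("_ISO")] if has_cavities else nemo_grid
    # Ice-shelf cavities are currently only available for eORCA1L75
    if has_cavities and base_grid != "eORCA1L75":
        return None
    for name, grids in SUPPORTED.items():
        if grids == (oifs_grid, base_grid):
            return name, has_cavities
    return None
```

For example, find_configuration("TL255L91", "eORCA1L75_ISO") identifies ECE4-SR with cavities, while an unlisted pair such as TL63L31 with ORCA1L75 yields None.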

Resolution-dependent model parameters such as model time step, coupling time step, mixing parameters, etc. are set in the config-*.yml files and the namelist templates.

The experiment schedule

Simulation length

ScriptEngine supports recurrence rules (rrules, RFC 5545) via the rrule module of the Python dateutil package in order to define schedules with recurring events.

This is used in the SE RTE to specify the experiment schedule, with start date, leg restart dates, and end date. This allows a great deal of flexibility when defining the experiment, allowing for irregular legs with restarts at almost any combination of dates.

Warning

Even though rrules provide a lot of flexibility for the experiment schedule, it is not certain that all parts of the SE RTE and the model code can deal with arbitrary start/restart dates. This feature is provided in order to not limit the definition of a schedule at a technical level in the RTE.

A simple schedule with yearly restarts could look like:

base.context:
  schedule:
    all: !rrule >
      DTSTART:19900101
      RRULE:FREQ=YEARLY;UNTIL=20000101

which would define the start date of the experiment as 1990-01-01 00:00 and yearly restart on the 1st of January until the end date 2000-01-01 00:00 is reached, i.e. 10 legs.

As another example, two-year legs from 1850 until 1950 would be defined as:

base.context:
  schedule:
    all: !rrule >
      DTSTART:18500101
      RRULE:FREQ=YEARLY;INTERVAL=2;UNTIL=19500101
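
To see how schedule dates translate into legs, the two examples above can be reproduced with the Python standard library. This is a sketch for yearly rules only and does not parse rrule text; the helper names yearly_schedule and legs are made up for this illustration. With python-dateutil installed, dateutil.rrule.rrulestr should be able to parse the same rule text directly.

```python
from datetime import datetime


def yearly_schedule(start_year, until_year, interval=1):
    """Dates of a FREQ=YEARLY rule starting on 1 January of start_year.

    UNTIL is inclusive in RFC 5545, so the end date itself is part of the
    schedule whenever the recurrence lands on it.
    """
    years = range(start_year, until_year + 1, interval)
    return [datetime(year, 1, 1) for year in years]


def legs(dates):
    """Pair consecutive schedule dates into (leg start, leg end) tuples."""
    return list(zip(dates[:-1], dates[1:]))


yearly = legs(yearly_schedule(1990, 2000))
two_yearly = legs(yearly_schedule(1850, 1950, interval=2))
print(len(yearly), len(two_yearly))
```

The first schedule yields ten one-year legs and the second fifty two-year legs, because N schedule dates always delimit N-1 legs.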

Initial condition and restarts

NEMO supports initialization from different initial conditions with the start_from parameter in the model_config.nemo section.

nemo:
  start_from:
    ts_state:
      file:
      weight_file:
    restart:
      dir:

If left empty, NEMO will perform a cold start, i.e. with homogeneous temperature and salinity fields over all ocean grid points. Filling ts_state.file with the path to a NetCDF file that contains the 3D temperature (thetao) and salinity (so) fields will activate initialization of temperature and salinity. If the ts_state.file needs to be interpolated, ts_state.weight_file should be filled with the path to the corresponding interpolation weights file. Finally, setting restart.dir to the directory that contains the restart.nc (oce) and restart_ice.nc (ice) files will activate initialization from global restart files.

Similarly, LPJG can be initialized from different initial conditions with the start_from parameter in the model_config.lpjg section.

lpjg:
  start_from:
     lpjg_state: lpjg_state_1850

Multi-year runs via job.resubmit

Long runs are typically chained as one-year sbatch jobs linked by SLURM afterok. Each sbatch covers one leg, and post-main submits the next:

base.context:
  experiment:
    run_from_scratch: false
    schedule:
      nlegs: 1
      all: !rrule >
        DTSTART:18500101
        RRULE:FREQ=YEARLY;UNTIL=18530101
  job:
    resubmit: true

run_from_scratch: true is incompatible with job.resubmit: true, because config-main.yml removes the run directory and disables resubmit whenever run_from_scratch is true. The validated pattern is to set run_from_scratch: false from the first submission. The setup stage in main.yml is gated by not exists(run_dir), so it runs on the first submission (when the run directory is absent) and is skipped for each afterok-chained job that follows. A preflight check in a wrapper script, asserting that the run directory is absent before the first submission, is recommended to guard against stale state from earlier failed attempts.

The chain self-terminates when leg.end reaches schedule.end; no further sbatch is queued. The line Submitting job for next leg... in log/<leg.num>/<id>.log is present on every non-final leg and absent on the final.
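
The termination rule can be illustrated with a few lines of Python. This sketch only models the bookkeeping for a chain with one leg per job; it does not call SLURM, and the function name plan_chain is hypothetical.

```python
def plan_chain(leg_boundaries):
    """List (leg number, start, end, resubmit) for a one-leg-per-job chain.

    resubmit is False only for the job whose leg ends at the schedule end,
    so no further job is queued after it.
    """
    schedule_end = leg_boundaries[-1]
    pairs = zip(leg_boundaries[:-1], leg_boundaries[1:])
    return [
        (num, start, end, end < schedule_end)
        for num, (start, end) in enumerate(pairs, start=1)
    ]


# Leg boundaries for DTSTART:18500101 with FREQ=YEARLY;UNTIL=18530101.
# ISO date strings compare correctly as plain strings.
chain = plan_chain(["1850-01-01", "1851-01-01", "1852-01-01", "1853-01-01"])
```

For the schedule shown above this gives three jobs, of which only the first two submit a successor.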

CMIP GHG forcing

The GHG forcing configuration is set under experiment.forcing.cmip.

This section defines the CMIP forcing for the atmosphere and ocean. The parameter version selects the CMIP version, which can be either CMIP6 or CMIP7. Most importantly, the experiment.forcing.cmip.experiment_id parameter selects the specific CMIP experiment to run, e.g. “historical”, “ssp245”, “ssp370”, “ssp585”, “piControl”, “control-2012”, etc.

Note

CMIP7 ScenarioMIP is still under development. EC-Earth4 only supports “h” and “vl” scenarios so far.

The available options are defined in scripts/runtime/scriptlib/config-cmip-experiment.yml, which can be extended for additional kinds of experiment. This file controls the OpenIFS namelist settings for abruptCO2 simulations (via experiment.forcing.cmip.abruptCO2 and experiment.forcing.cmip.NxCO2) and 1pctCO2 simulations (via experiment.forcing.cmip.pctCO2), as well as runs with a fixed-year forcing (experiment.forcing.cmip.fixyear).

OpenIFS features

Wave model

The OpenIFS atmosphere model includes the wave model ECWAM (https://confluence.ecmwf.int/display/OIFS/3.4+OpenIFS%3A+Ocean+waves). It can be activated by setting

model_config:
  oifs:
    wave_model: True

It has been turned on by default since v4.1.6. Note that the wave model cannot run experiments longer than 67 years unless rollback: true is set (which is the default).

Numerical precision

It is possible to compile and run OpenIFS in both double and single precision. By default, EC-Earth will compile and run OpenIFS in double precision. To compile single precision, set the following in scripts/build/user-settings.yml:

build:
  oifs_exe: ['SP-GCM']

This will build the single precision model. You can build both double and single precision simultaneously by setting ['DP-GCM', 'SP-GCM'].

To run in single precision, set

model_config:
  oifs:
    precision: SP

If precision is empty, it will default to DP.

Warning

Single precision is to be considered experimental.

Aerosols

By default, without any interactive aerosols, OpenIFS relies on a climatology to describe the aerosols. However, when using CMIP forcing, the MACv2-SP plume parameterization is activated by default. It includes both a direct and indirect aerosol effect. You can control it with:

experiment:
  forcing:
    oifs:
      macv2sp: 0

where 0 will turn off MACv2-SP and 1 will have it on but only use the direct aerosol effect. The default value is 2, i.e. both direct and indirect effect.

Furthermore, the HAM-M7 interactive tropospheric aerosols model has been implemented. Currently it works only for the TL255L91 resolution with CMIP7 emissions. It is switched on by setting activate to true in:

model_config:
  oifs:
    compo:
      activate: true
      aerosols: 'hamm7'
      chemistry: 'SimChem'

When M7 is activated, MACv2-SP is automatically switched off.

Rollback

The date and time in OpenIFS is generally set by reading the dataDate in the ICMGGECE4INIT file (usually 1990-01-01) and then adding seconds_since_origin = NSTEP * TSTEP. After 100 years of simulation, OpenIFS will thus take 1990-01-01 and add 3,155,760,000 seconds or 1753200 time steps (assuming 1800s time step). This presents two problems:

  1. 3,155,760,000 exceeds the maximum value of a signed 32-bit integer (2,147,483,647), which causes some parts of OpenIFS, e.g. WAM, to crash.

  2. OpenIFS carries some arrays of size 0:NSTEP. As a result, OpenIFS will run slower each year and also use more memory.
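
The numbers quoted above can be checked with a few lines of Python. The sketch uses the average year length of 365.25 days, which reproduces the figures in the text; the variable names are chosen for this illustration only.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

SECONDS_PER_YEAR = 365.25 * 86400  # average year length, as used in the text
TSTEP = 1800                       # model time step in seconds

seconds_after_100_years = int(100 * SECONDS_PER_YEAR)
steps_after_100_years = seconds_after_100_years // TSTEP
overflows = seconds_after_100_years > INT32_MAX
```

Here seconds_after_100_years is 3,155,760,000 and steps_after_100_years is 1,753,200, so the seconds counter no longer fits into a signed 32-bit integer.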

The solution is to “roll back” the time steps in OpenIFS at each leg of the simulation by moving the dataDate in ICMGGECE4INIT forward and NSTEP back. For example:

  • Leg 1: Start 1990-01-01 and NSTEP = 0.

  • Leg 2: Start 1990-01-01 and NSTEP = 17520.

  • Leg 3: Start 1991-01-01 and NSTEP = 17520.

  • Leg 4: Start 1992-01-01 and NSTEP = 17568 (account for leap year).

In other words, OpenIFS is “tricked” into thinking each leg is the 2nd leg.
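
The bookkeeping behind the example legs can be sketched as follows. This is an illustration of the arithmetic only, not the code used in the runtime environment; the function name rollback_state is hypothetical and a time step of 1800 s is assumed.

```python
from datetime import datetime


def rollback_state(leg_boundaries, leg_num, tstep=1800):
    """Return (dataDate, NSTEP) at the start of leg leg_num (1-based)."""
    if leg_num == 1:
        return leg_boundaries[0], 0
    # dataDate is the start of the previous leg, and NSTEP counts only
    # the time steps of that previous leg
    previous_start = leg_boundaries[leg_num - 2]
    current_start = leg_boundaries[leg_num - 1]
    nstep = int((current_start - previous_start).total_seconds()) // tstep
    return previous_start, nstep


boundaries = [datetime(year, 1, 1) for year in range(1990, 1995)]
states = [rollback_state(boundaries, num) for num in range(1, 5)]
```

For yearly legs starting in 1990 this reproduces the values listed above, including NSTEP = 17568 for leg 4, whose previous leg covers the leap year 1992.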

rollback is turned on/off in the runscript and is on by default.

model_config:
  oifs:
    rollback: true

Caution

rollback cannot be changed during a simulation. It must be true or false at all times.

NEMO Features

PISCES and Passive Tracers

To enable PISCES (the biogeochemistry model in NEMO) and/or inert tracers (water age, CFCs, radiocarbon), you need to compile the TOP (Tracers in the Ocean Paradigm) module. Edit user-settings.yml to include top_active: true and compile NEMO.

PISCES and any of the passive tracers can be activated independently by setting the model_config.nemo.pisces and model_config.nemo.inerttrc parameters, respectively. If either of these parameters is set, the NEMO executable compiled with TOP will be linked into the runtime directory, together with the corresponding namelist and XML templates, inidata files and tracer restarts.

  • To activate PISCES, set:

    model_config:
      nemo:
        pisces: true
    
  • To activate passive tracers, add them to the inerttrc list. For example, to enable all five available inert tracers:

    model_config:
      nemo:
        inerttrc: [age, cfc11, cfc12, sf6, c14]
    

Inidata for PISCES is the same as in ECE3. Several text files, which provide surface boundary conditions for passive tracers, are distributed within the NEMO sources and have been copied to the inidata directory: splco2.dat, atmc14.dat, CFCs_CDIAC.dat, CFCs_CMIP6.dat.

PISCES can be coupled with the atmosphere; it will receive CO2 concentrations from OIFS and send back CO2 fluxes. See the CO2 Coupling section below for details.

Surface restoring and nudging

Surface restoring of temperature and salinity is activated when the name of the target observational dataset, experiment.forcing.nemo.ssr.data, is not empty (this applies to both coupled and ocean-only runs).

To apply surface restoring:

  • Set the data parameter to point to your surface restoring dataset:

    experiment:
      forcing:
        nemo:
          ssr:
            data: s5         # use ORAS5 reanalysis data available on HPC2020 and MN5
            climatology:      # false for inter-annual data (default), true to use a climatology
    
  • Specify the restoring strength coefficients:

    experiment:
      forcing:
        nemo:
          ssr:
            sstr_coeff: 0          # temperature is not relaxed
            sssr_coeff: -166.67    # standard salinity restoring strength for the forced configuration.
    
  • The user may specify their own conventions for target observational data location, file names and variables. If empty, default (BSC) conventions are used:

    experiment:
      forcing:
        nemo:
          ssr:
            conventions: # custom conventions can be specified here
    

The conventions for file location, names and variables within them are predefined in scripts/runtime/scriptlib/config-nemo-nudging.yml. The namelist template is provided in scripts/runtime/templates/nemo/nemo-ssr_default.namelist.j2.

3D nudging/damping

Nudging can be applied to temperature and salinity in the ocean interior. This feature is useful for producing ocean reconstructions and ocean restart files for initialized predictions.

To activate 3D nudging:

  • Set the data parameter to point to your target dataset:

    experiment:
      forcing:
        nemo:
          dmp:
            data: en4-v4.2.2  # e.g. en4-v4.2.2 to use EN4 reanalysis data available on HPC2020 and MN5
            climatology:      # false for inter-annual data (default), true to use a climatology
            resto:            # name of the resto.nc. If empty, default one will be used.
    
  • Relaxation timescales for 3D nudging are defined in the resto.nc file. If resto: field is left empty, the default one provided in the {{experiment.repo_dir}}/nudging/ocean/RESTO_DEFAULT directory will be used. You can generate your own resto.nc file using NEMO’s DMP_TOOLS. See scripts/runtime/scriptlib/config-nemo-nudging.yml for details.

  • Optionally specify custom conventions for target data location, names and variables:

    experiment:
      forcing:
        nemo:
          dmp:
            conventions: # custom conventions can be specified here
    

The conventions for file location, names and variables within them are predefined in scripts/runtime/scriptlib/config-nemo-nudging.yml.

Tracer surface restoring

PISCES tracer surface restoring is activated when experiment.forcing.nemo.trcssr.data is not empty. This applies a relaxation for selected tracers at the ocean surface, restoring them towards observational data.

To activate tracer surface restoring:

  • Set the data parameter to point to your tracer restoring dataset:

    experiment:
      forcing:
        nemo:
          trcssr:
            data: ESA_CCI_v5  # e.g. ESA_CCI_v5, ESA_CCI_v5-clim, or ESA_CCI_v6
    

These datasets are available from BSC upon request. Please contact Raffaele Bernardello/Valentina Sicardi/Vladimir Lapin for more information.

  • Specify conventions for file location, names and variables. If empty, default (BSC) conventions are used:

    experiment:
      forcing:
        nemo:
          trcssr:
            conventions: # custom conventions can be specified here
    

The conventions for file location, names and variables within them are predefined in scripts/runtime/scriptlib/config-nemo-nudging.yml. A matching namelist template is provided in scripts/runtime/templates/nemo/nemo-trcssr_default.namelist.j2.

Ice shelf and iceberg melt

Ice shelf cavities can be either closed or open (for open cavities, model_config.nemo.grid must have the _ISO suffix, as explained in Available grid configurations). The two cases correspond to two different domain_cfg files, without or with cavities. In the case of open cavities, ocean cells lie below the ice shelf and follow its boundary, so top_level>1.

Prescribed maps of ice shelf and iceberg melt are in {ini_dir}/nemo/climatology:

  • isfmlt_cav_{grid}.nc if open cavities (computed from a coupled run with eORCA1 and thermodynamic ice shelf melt on)

  • isfmlt_par_{grid}.nc if closed cavities

  • icbmlt_{grid}.nc for icebergs (computed from a coupled run with eORCA1 and the Lagrangian icebergs on given a prescribed calving rate. A calving-rate climatology of 1265 Gt/year is available in {ini_dir}/nemo/climatology, regridded from Abello et al. (2015) using data from Rignot et al. (2013).)

The Antarctica ids are excluded from the runoff mapper calving by default. They are given in model_config.rnfm.default_ant_ids and predefined in scripts/runtime/scriptlib/config-rnfm.yml.

Different melt options for ice shelf and icebergs are available and can be activated in model_config.nemo.isf_fwf and model_config.nemo.icb_fwf:

  • No melt

    model_config:
      nemo:
        isf_fwf:
        icb_fwf:
    
  • Specified melt (default in NEMO-only configuration) from prescribed map described above

    model_config:
      nemo:
        isf_fwf: spe
        icb_fwf: spe
    
  • Global melt from OIFS excess snow from Antarctica ids redistributed with weight maps (default in coupled configuration) through runoff mapper to NEMO (default: 50% in ice shelf melt and 50% in iceberg melt)

    model_config:
      nemo:
        isf_fwf: oasis
        icb_fwf: oasis
    
  • Thermodynamic ice shelf melt (3 equation melt param)

    model_config:
      nemo:
        isf_fwf: 3eq
    
  • Lagrangian iceberg melt

    model_config:
      nemo:
        icb_fwf: lagrangian
    

Running NEMO Standalone

To run an ocean-only ECE4 experiment, you need to compile XIOS and NEMO without OASIS. It is recommended to edit user-settings.yml, setting components: [xios, nemo], and to build using compile-components.yml (see Building the EC-Earth 4 components).

Now, let’s go through the required changes in experiment-config-example.yml. Firstly, the run scripts rely on the model_config to recognize the experiment as a standalone NEMO experiment, i.e. without OASIS and OIFS.

base.context:
  model_config:
    components: [xios, nemo]

Next comes the section that defines the atmospheric forcing for NEMO standalone simulations:

base.context:
  forcing:
    nemo_only:
      atmospheric: !noparse "{{model_config.nemo.all_forcings.CoreII_interannual}}"
      fixed_year: true  # true for a climatology, false for yearly varying or integer year for fixed-year forcing (e.g. 2000)

Here, the user must select a forcing set from the list predefined in scripts/runtime/scriptlib/config-nemo-only.yml. Currently available sets are: CoreII_interannual (provided in the official inidata); ERA5_HRES (available on MN5 and HPC2020); and JRA55_1.5 (available on MN5). Each set defines the conventions for file location, file names and the variables within them, together with a matching NEMO namelist template, e.g. scripts/runtime/templates/nemo/nemo-forcing_CoreII_interannual.namelist.j2. Advanced users are encouraged to contribute to this list by adding their own forcing sets. The fixed_year switch can be set to true for a climatology, false for yearly varying forcing, or an integer year for fixed-year forcing (e.g. 2000).
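
Because fixed_year accepts either a boolean or an integer, the order of the type checks matters when such a value is interpreted in Python. The following sketch shows one way to distinguish the three cases; the function name forcing_mode and the returned labels are illustrative assumptions, not names used in the run scripts.

```python
def forcing_mode(fixed_year):
    """Interpret the fixed_year switch of the NEMO-only forcing."""
    # bool must be tested first: in Python, True and False are also ints
    if isinstance(fixed_year, bool):
        return "climatology" if fixed_year else "yearly varying"
    if isinstance(fixed_year, int):
        return f"fixed year {fixed_year}"
    raise ValueError(f"unsupported fixed_year value: {fixed_year!r}")
```

Testing for int before bool would wrongly treat true as the year 1, which is why the boolean case comes first.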

Finally, the launch options must reflect the NEMO standalone configuration. Only the slurm-wrapper-taskset option has been tested successfully so far. The following example assumes a platform with 128 cores per compute node (e.g. ECMWF HPC2020).

base.context:
  job:
    launch:
      method: slurm-wrapper-taskset
    groups:
      - {nodes: 1, xios: 1, nemo: 127}

LPJG (LPJ-GUESS)

LPJG (LPJ-GUESS) is the dynamic global vegetation model used in EC-Earth4 for simulating vegetation dynamics and the land carbon cycle.

To include LPJG in your experiment, add ‘lpjg’ to the components list in experiment-config-example.yml:

base.context:
  model_config:
    components: [oifs, nemo, rnfm, xios, oasis, lpjg]

LPJG is coupled with the atmosphere (OIFS): it receives surface atmosphere fields and sends vegetation fields. It can also be coupled for CO2 exchanges.

Land ice and ice-sheet coupling

EC-Earth4 has two ways to represent land ice. The standalone oifs.landice path adds an OIFS-side land-ice surface scheme. The interactive ismm path adds PISM as a coupled component with feedback to OIFS through per-region forcing files. The two can be active together: ismm provides the region prefixes and OIFS layers the land-ice surface scheme on top.

Activation keys on whether ismm is in model_config.components and on the value of model_config.oifs.landice. The OIFS NAMECECFG block in templates/oifs/namelist.oifs.j2 emits ECE_CPL_ISMM when ismm is present, ECE_LANDICE when oifs.landice is true, and the NISMRGNS / ISMRGNPFX pair whenever either path is active. With a single-region ismm_regions, ISMRGNPFX carries trailing character-array padding after the region name; assert by content, not literal Fortran quoting.
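
The activation rules can be summarised in a small Python function. This is a sketch of the decision logic as described above, not a rendering of the Jinja2 template; the function name landice_namelist_flags and the dictionary layout are assumptions made for this illustration.

```python
def landice_namelist_flags(components, oifs_landice, region_names):
    """Return the NAMECECFG switches implied by the land-ice settings."""
    ismm_active = "ismm" in components
    flags = {
        "ECE_CPL_ISMM": ismm_active,
        "ECE_LANDICE": bool(oifs_landice),
    }
    # Region count and prefixes are needed whenever either path is active
    if ismm_active or oifs_landice:
        flags["NISMRGNS"] = len(region_names)
        flags["ISMRGNPFX"] = list(region_names)
    return flags
```

With neither path active, only the two switches are returned (both False) and no region information is included.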

Standalone OIFS land ice

Activate with:

base.context:
  model_config:
    oifs:
      landice: true
      landice_thresh: 0.5

landice_thresh defaults to 0.5. setup-landice.yml copies the per-region forcing file (initial_files.ece_forcing from experiment.ismm_regions) to {run_dir}/{name}_pism2ece.nc once at first leg; OIFS reads it each step. Override experiment.ismm_regions with a single-element list to run Greenland-only or Antarctica-only.

Interactive ice-sheet coupling (ismm)

Activate by adding ismm to the components list:

base.context:
  model_config:
    components: [oifs, nemo, rnfm, xios, oasis, lpjg, ismm]

The ism-mapper executable runs inside the main ECE allocation alongside OIFS / NEMO / LPJG, so the ismm slot must appear in job.groups. The canonical layout puts it on the first node together with rnfm and lpjg:

base.context:
  job:
    groups:
      - {nodes: 1, rnfm: 1, ismm: 1, lpjg: 10}
      # ... remaining nodes for oifs / xios / nemo

PISM itself runs in a nested sbatch with its own resources (hardcoded in pism_driver.py at 8 MPI tasks, 16 GB memory, 2-hour walltime); only the ism-mapper needs a slot in the main allocation.

Per leg, after the model launch and before the post-* stage, scripts/utils/coupling_ece4_pism_v2.py is called once per region. It builds atmosphere, elevation, ocean, and frontal-melt forcing for PISM, submits the nested PISM job (sbatch --wait), and produces the next-leg OIFS feedback. post-ismm.yml then moves the PISM restart and refreshes run-dir symlinks.

Each region in experiment.ismm_regions is a mapping:

base.context:
  experiment:
    ismm_regions:
      - name: grtes
        ref_grid_file: greenland_4km_ref_retreat.nc
        grid:
          nx: 421
          ny: 721
        initial_files:
          ece_forcing: grtes_forcing_4km.nc
          ism_forcing: grtes_pism_4km.nc
        cpl_dir: "{{ experiment.run_dir }}/ismm-coupling/grtes/"

name becomes the region prefix in OIFS and the subdirectory under {ini_dir}/pism/{aux,initial}/. The defaults in config-ismm.yml are Greenland (4 km) and Antarctica (16 km).

Set model_config.ismm.static_ice: true to run PISM with frozen geometry (forwards -no_mass -max_dt 1 to the nested job). Use it during spin-up and tuning; switch off for runs that should evolve ice.

Under static_ice: true PISM produces zero basal-melt and frontal-melt fluxes; the v2 driver and post-ismm.yml propagate those zeros through the OASIS coupling to the runoff mapper, so the ocean component receives zero ice-sheet freshwater. This is consistent with the frozen-geometry posture but is not equivalent to running dynamic-ice PISM with a real freshwater feedback into NEMO.

The ismm path requires three files per region under {ini_dir}/pism/aux/{name}/: the file named by the region’s ref_grid_file (default greenland_4km_ref_retreat.nc for grtes, antarctica_16km_ref_retreat.nc for antar), G128.nc, and pism2ifs_con_weights_{name}.nc. Up to four bias-correction files per region are read from the same directory if present, with names hardcoded in the RegionConfig for that region in coupling_ece4_pism_v2.py (defaults follow atm_{long_name}_{offsets,factor}.nc and ocean_{long_name}_{offsets,factor}.nc where long_name is greenland or antarctica); apply_bias() guards each branch with os.path.exists() and skips absent files.
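
The optional nature of the bias-correction files can be illustrated as follows. The sketch only builds the default file names from the pattern given above and filters them by existence; it is not the code from coupling_ece4_pism_v2.py, and the function name bias_files is hypothetical.

```python
import os


def bias_files(aux_dir, long_name):
    """Return the bias-correction files that exist for one region."""
    candidates = [
        f"{domain}_{long_name}_{kind}.nc"
        for domain in ("atm", "ocean")
        for kind in ("offsets", "factor")
    ]
    # Absent files are skipped rather than treated as an error
    return [
        name for name in candidates
        if os.path.exists(os.path.join(aux_dir, name))
    ]
```

Calling bias_files on a directory without any of the four files returns an empty list, which corresponds to running without bias correction.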

Use the resubmit pattern documented under Multi-year runs via job.resubmit to chain multiple sim-years.

AIME (Antarctic Ice Melt Emulator)

AIME represents freshwater fluxes from Antarctic land ice melt in response to ocean thermal forcing. It computes meltwater fluxes (ice shelf basal melt and iceberg melt) by combining temperatures inside ice shelf cavities with input data, i.e. basal melt sensitivities and linear response functions from standalone ice sheet models. A description of the previous (EC-Earth3) version of AIME is available in the ECE3-ESM manuscript. The sources/aime/src folder contains the main AIME Python script (ThetaoDrivenFreshwaterForcingAnomalies_ece4.py), which is run as a post-processing job at the end of an EC-Earth4 leg.

When aime_fwf is undefined (commented out), AIME is enabled by default for the eORCA1L75_ISO grid. For other grids, it will be turned off unless the user activates it by setting aime_fwf: true (requires calving inidata and, possibly, changes in the python scripts). For the eORCA1L75_ISO grid activation of AIME implies setting both isf_fwf and icb_fwf to spe, i.e. using the AIME-computed melt maps for both ice shelf and iceberg melt:

nemo:
  grid: eORCA1L75_ISO
  # Decide which type of freshwater flux treatment (fwf) you want to use for ice shelves (isf_fwf) and icebergs (icb_fwf)
  # default oasis if oifs component else spe
  isf_fwf: spe     # spe (melt from file), 3eq (thermodynamic melt), oasis (rnfm excess snow)
  icb_fwf: spe     # spe (melt from file), lagrangian (interactive icebergs), oasis (rnfm excess snow)
  aime_fwf: true

Note that AIME only works for simulations with one-year legs.
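
The default behaviour of aime_fwf can be summarised in a short Python sketch. It follows the rules described in this section and is not taken from the run scripts; the function name resolve_aime is hypothetical, and None stands for an undefined (commented out) parameter.

```python
def resolve_aime(grid, aime_fwf=None, isf_fwf=None, icb_fwf=None):
    """Return (aime_active, isf_fwf, icb_fwf) after applying AIME defaults."""
    if aime_fwf is None:
        # Undefined: enabled by default only for the eORCA1L75_ISO grid
        aime_active = grid == "eORCA1L75_ISO"
    else:
        aime_active = bool(aime_fwf)
    if aime_active and grid == "eORCA1L75_ISO":
        # AIME-computed melt maps are used for ice shelves and icebergs
        isf_fwf = icb_fwf = "spe"
    return aime_active, isf_fwf, icb_fwf
```

For grids other than eORCA1L75_ISO, the sketch leaves isf_fwf and icb_fwf untouched, since the text only states the implied settings for that grid.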

CO2 Coupling

EC-Earth4 supports an interactive CO2 tracer in the atmosphere, with optional coupling to the land (LPJG) and ocean (PISCES) components for full carbon cycle simulations.

The CO2 configuration is set under model_config.oifs.co2. All options are false by default.

base.context:
  model_config:
    oifs:
      co2:
        tracer: true          # Activate CO2 tracer (true/false)
        cpl_lpjg: true        # Couple with LPJ-GUESS (true/false); OIFS sends CO2 ppm, LPJG sends CO2 fluxes
        cpl_pisces: true      # Couple with PISCES (true/false); OIFS sends CO2 ppm, PISCES sends CO2 fluxes
        init_val: 280         # Initial CO2 value in ppm used to scale icmgginit; if not provided, uses input4MIPS for experiment start year
        debug: false          # Enable debug output in namcouple and extra output in OIFS

To enable the CO2 tracer, set tracer to true. Then, set cpl_lpjg to true to couple with LPJG for land carbon fluxes, and/or cpl_pisces to true for ocean carbon fluxes via PISCES. The debug flag is intended for development and enables diagnostics for global carbon conservation.

Exotic experiment setup

There might be situations where you need to override specific configuration settings for peculiar experiments, e.g. to change the timestep for a specific resolution or the coupling frequency. This can be done by hacking the scriptlib scripts for your experiment or, in a standardized way, with a control flag in the exotic_experiment section of the experiment configuration, which is empty by default.

All the changes related to exotic experiments are implemented in the scriptlib/config-exotic-experiment.yml file. Further configuration can be designed and activated in a similar way for other specific experiments.

Available exotic experiments

eocene

A specific configuration for Paleoclimate Eocene runs has been implemented.

To activate the Eocene configuration, set the eocene flag to true in the experiment configuration:

base.context:
  experiment:
    exotic_experiment:
      eocene: true

When activated, the following changes are applied to the model configuration:

  • OIFS timestep: Set to 2700 seconds (reduced from default for stability with paleoclimate conditions)

  • NEMO timestep: Set to 2700 seconds (as for OIFS)

  • Eocene warm fix: Enabled (warm_fix: true) to avoid numerical instabilities in the convection scheme due to warm climate conditions

Warning

The Eocene configuration will abort with an error if you attempt to use it with grid resolutions other than TL63L31 for OIFS and PALEORCA2L31 for NEMO, as it has only been tested with these grids.
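The grid restriction can be pictured as a simple guard. The following Python sketch only illustrates the rule stated in the warning. The actual check lives in the YAML run scripts, and the function name check_eocene_grids is hypothetical.

```python
# The only grid combination the Eocene configuration has been tested with
EOCENE_OIFS_GRID = "TL63L31"
EOCENE_NEMO_GRID = "PALEORCA2L31"


def check_eocene_grids(oifs_grid: str, nemo_grid: str) -> None:
    """Raise an error for untested grids (hypothetical helper)."""
    if oifs_grid != EOCENE_OIFS_GRID or nemo_grid != EOCENE_NEMO_GRID:
        raise ValueError(
            "Eocene configuration requires OIFS grid "
            f"{EOCENE_OIFS_GRID} and NEMO grid {EOCENE_NEMO_GRID}, "
            f"got {oifs_grid} and {nemo_grid}"
        )


# Passes silently for the supported combination
check_eocene_grids("TL63L31", "PALEORCA2L31")
```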

Initial data location

The directory with initial data for EC-Earth 4 is configured by the parameter experiment.ini_dir:

base.context:
  experiment:
    ini_dir: /path/to/inidata

As this is usually provided once for all users on a certain HPC system, it is configured in the platform configuration file. It is, however, entirely possible to put this parameter in another file.

ECMWF HPC2020

While the platform file points to the dataset for the latest release, it is possible to use the dataset attached to an older release by setting in your experiment YAML file:

experiment:
  ini_dir: "/hpcperm/gdjk/ece-4-inidata_4.x.y"

where 4.x.y refers to the release version. This works from 4.1.5 onward.
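As a small illustration of the naming scheme, the following Python sketch builds the directory name for a given release and rejects releases older than 4.1.5. The helper inidata_dir is hypothetical and not part of EC-Earth 4. Whether a dataset for a particular release actually exists on the system still has to be checked by the user.

```python
def inidata_dir(release: str, base: str = "/hpcperm/gdjk") -> str:
    """Return the inidata path for a release such as '4.1.5'.

    Hypothetical helper; only mirrors the naming pattern shown above.
    """
    version = tuple(int(part) for part in release.split("."))
    if version < (4, 1, 5):
        # Release-specific datasets are only available from 4.1.5 onward
        raise ValueError(f"no release-specific inidata for {release}")
    return f"{base}/ece-4-inidata_{release}"


print(inidata_dir("4.1.5"))
```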

Data repository

The data repository for various EC-Earth 4 input files used by optional features (e.g. nudging) is set by experiment.repo_dir in the platform configuration file:

base.context:
  experiment:
    repo_dir: /path/to/repository

Running batch jobs from ScriptEngine

ScriptEngine can send jobs to the SLURM batch system when the scriptengine-tasks-hpc package is installed, which is done automatically if the environment.yml file has been used to create the Python virtual environment, as described in Creating the Python virtual environment. Here is an example of using the hpc.slurm.sbatch task:

# Submit batch job
hpc.slurm.sbatch:
  account: my_slurm_account
  nodes: 14
  time: !noparse 0:30:00
  job-name: "ece4-{{experiment.id}}"
  output: ece4.out
  error: ece4.out

This task runs the entire ScriptEngine command, including all scripts given to se on the command line, as a batch job with the given arguments (e.g. account, number of nodes, and so on).

As a simplified example, a ScriptEngine script such as:

- hpc.slurm.sbatch:
    account: my_slurm_account
    nodes: 1
    time: 5
- base.echo:
    msg: Hello from batch job!

would in the first place submit a batch job and then stop. When the batch job executes, the first task (hpc.slurm.sbatch) would execute again, but do nothing because it already runs in a batch job. Then, the next task (base.echo) would be executed, writing the message to standard output in the batch job.
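The two passes through the script can be mimicked with a toy simulation in Python. This is not how scriptengine-tasks-hpc is implemented. The function simulate is hypothetical, and whether the script already runs inside a batch job is passed in explicitly instead of being detected.

```python
def simulate(tasks, in_batch_job):
    """Return a log of what happens when the task list is processed."""
    log = []
    for task in tasks:
        if task == "hpc.slurm.sbatch":
            if not in_batch_job:
                # First pass: submit the whole command and stop here
                log.append("submitted batch job, stopping")
                break
            # Second pass: already inside the job, nothing to do
            continue
        log.append(f"ran {task}")
    return log


script = ["hpc.slurm.sbatch", "base.echo"]
print(simulate(script, in_batch_job=False))  # on the login node
print(simulate(script, in_batch_job=True))   # inside the batch job
```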

Note that in the default runscript examples, submitting the job to SLURM is done behind the scenes in scriptlib/submit.yml. The actual configuration for the batch job, such as account, allocated resources, etc., is set according to the chosen launch option, as described below.

Launch options

The ScriptEngine runtime environment supports different ways to start the actual model run once the job is executed by the batch system:

  • SLURM heterogeneous jobs (slurm-hetjob)

  • SLURM multiple program configuration and taskset process/thread pinning (slurm-mp-taskset)

  • SLURM wrapper with taskset and node groups (slurm-wrapper-taskset)

  • SLURM job with generic shell script template (slurm-shell)

Each option has advantages and disadvantages, and each comes with its own configuration parameters. The choice of an option might affect the performance and efficiency of the model run on a given HPC system. Moreover, not all options might be supported on all systems.

SLURM heterogeneous jobs

This launch option uses the SLURM heterogeneous job support to start the EC-Earth 4 experiment. Compute nodes will not be shared between different model components. This option will therefore often lead to some idle cores, limiting the efficiency particularly for systems with many cores per node. It is, on the other hand, rather easy to configure and fairly portable across systems and therefore a good choice to start with.

Here is a complete configuration example for the slurm-hetjob launch option using SLURM heterogeneous jobs:

job:
  launch:
    method: slurm-hetjob
  oifs:
    ntasks: 288  # number of OIFS processes (MPI tasks)
    ntasks_per_node: 16  # number of tasks per node for OIFS
    omp_num_threads:  1 # number of OpenMP threads per OIFS process
  nemo:
    ntasks: 96  # number of NEMO processes (MPI tasks)
    ntasks_per_node: 16  # number of tasks per node for NEMO
  xios:
    ntasks: 1  # number of XIOS processes (MPI tasks)
    ntasks_per_node: 1  # number of tasks per node for XIOS
  slurm:
    sbatch:
      opts:
        # Options to be used for the sbatch command
        account: your_slurm_account
        time: !noparse 01:30:00  # one hour, thirty minutes
        output: !noparse "{{experiment.id}}.log"
        job-name: !noparse "ECE4_{{experiment.id}}"
    srun:
      # Arguments for the srun command (a list!)
      args: [
        --label,
        --kill-on-bad-exit,
      ]
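To get a feeling for the node usage of this example, the following Python sketch counts the nodes implied by ntasks and ntasks_per_node for the three components configured above. It assumes 16 cores per node, one core per MPI task, and that no other components need nodes. None of this is stated for the example, so treat the numbers as an illustration only.

```python
import math

CORES_PER_NODE = 16  # assumption, not part of the configuration above

# (ntasks, ntasks_per_node) as given in the slurm-hetjob example
components = {"oifs": (288, 16), "nemo": (96, 16), "xios": (1, 1)}


def nodes_needed(ntasks: int, ntasks_per_node: int) -> int:
    """Nodes occupied by one component (nodes are never shared)."""
    return math.ceil(ntasks / ntasks_per_node)


nodes = {name: nodes_needed(*cfg) for name, cfg in components.items()}
total_nodes = sum(nodes.values())
used_cores = sum(ntasks for ntasks, _ in components.values())
idle_cores = total_nodes * CORES_PER_NODE - used_cores

print(nodes, total_nodes, idle_cores)
```

Under these assumptions, the idle cores all sit on the node that runs the single XIOS process.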

SLURM multiprog and taskset

This launch option uses the SLURM srun command together with

  • a HOSTFILE created on-the-fly

  • a multi-prog configuration file, which uses

  • the taskset command to set the CPU affinity for MPI processes and OpenMP threads

The slurm-mp-taskset option is configured very similarly to slurm-hetjob. The following example configures the option to use 4 OpenMP threads for OpenIFS, assuming 16 cores per node:

job:
  launch:
    method: slurm-mp-taskset
  oifs:
    ntasks: 288  # number of OIFS processes (MPI tasks)
    ntasks_per_node: 4  # number of tasks per node for OIFS
    omp_num_threads:  4 # number of OpenMP threads per OIFS process

  # remaining configuration same as for slurm-hetjob

This launch option will share the first node between XIOS and either the AMIP Forcing-reader (for atmosphere-only) or the Runoff-mapper (for GCM). This is an improvement over slurm-hetjob but will still lead to idle cores in many cases, because the remaining nodes are used exclusively for one component each.

SLURM wrapper and taskset

This launch option uses the SLURM srun command together with

  • a HOSTFILE created on-the-fly

  • a wrapper created on-the-fly, which uses

  • the taskset command to set the CPU affinity for MPI processes, OpenMP threads and hyperthreads

The slurm-wrapper-taskset option is configured per node. Instead of choosing the total number of tasks or nodes dedicated to each component, you specify the number of MPI processes for each component that will execute on each computing node. To avoid repeating the same node configuration over and over again, the configuration is structured in groups, each representing a set of nodes with the same configuration.

Warning

This launch method will only work if the computing nodes are allocated so that all the CPUs are available for the execution of the job (typically using the --exclusive SLURM option). Tasks are bound to specific CPUs in the computing nodes, using a heuristic that assumes that all the CPUs are eligible for that.

The following simple example assumes a computer platform that has 128 cores per compute node, such as, for example, the ECMWF HPC2020 system. Three nodes are allocated to run a model configuration with four components: XIOS (1 process), OpenIFS (250 processes), NEMO (132) and the Runoff-mapper (1 process):

platform:
  cpus_per_node: 128
job:
  launch:
    method: slurm-wrapper-taskset
  groups:
    - {nodes: 1, xios: 1, oifs: 126, rnfm: 1}
    - {nodes: 2, oifs: 62, nemo: 66}

Two groups are defined in this example: the first comprising one node (running XIOS, OpenIFS and the Runoff-mapper), and the second group with two nodes running OpenIFS and NEMO.
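The process counts quoted for this example follow directly from the group definitions: each per-node count is multiplied by the number of nodes in the group. The short Python sketch below reproduces that bookkeeping. The function ranks_per_component is purely illustrative and not part of the run scripts.

```python
from collections import Counter

# Groups copied from the example configuration above
groups = [
    {"nodes": 1, "xios": 1, "oifs": 126, "rnfm": 1},
    {"nodes": 2, "oifs": 62, "nemo": 66},
]


def ranks_per_component(groups):
    """Sum the MPI processes of each component over all node groups."""
    totals = Counter()
    for group in groups:
        for component, per_node in group.items():
            if component != "nodes":
                totals[component] += group["nodes"] * per_node
    return dict(totals)


totals = ranks_per_component(groups)
print(totals)  # oifs: 250, nemo: 132, xios: 1, rnfm: 1

# Every node is fully occupied: 128 processes on 128 cores
per_node = [sum(v for k, v in g.items() if k != "nodes") for g in groups]
print(per_node)
```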

Note

The platform.cpus_per_node parameter and the job.* parameters do not have to be defined in the same file, as suggested in the simple example. In fact, the platform.* parameters are usually defined in the platform configuration file, while job.* is usually found in the experiment configuration.

A second example illustrates the use of hybrid parallelization (MPI+OpenMP) for OpenIFS. The number of MPI tasks per node reflects that each process will be using more than one core:

platform:
  cpus_per_node: 128
job:
  launch:
    method: slurm-wrapper-taskset
  oifs:
    omp_num_threads: 2
    omp_stacksize: "64M"
  groups:
    - {nodes: 1, xios: 3, lpjg: 10, rnfm: 1}
    - {nodes: 2, oifs: 64}
    - {nodes: 1, oifs: 31, nemo: 66}

Note the configuration of job.oifs.omp_num_threads and job.oifs.omp_stacksize, which set the OpenMP environment for OpenIFS. The example utilises 3 MPI ranks for XIOS, 66 for NEMO, 10 for LPJ-GUESS, 1 for the Runoff-mapper, and 159 for OpenIFS. However, each OpenIFS MPI rank now has two OpenMP threads, which results in 318 cores being used for the atmosphere.
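When OpenMP threads are involved, the number of cores occupied on a node is no longer the plain sum of the MPI tasks. The Python sketch below checks the three per-node layouts of the example, assuming one core per thread and one core per MPI task for components without OpenMP. The function cores_used is hypothetical and only illustrates the arithmetic.

```python
CPUS_PER_NODE = 128
threads = {"oifs": 2}  # omp_num_threads per component, default is 1

# MPI tasks per node for the three node layouts of the example
layouts = [
    {"xios": 3, "lpjg": 10, "rnfm": 1},
    {"oifs": 64},
    {"oifs": 31, "nemo": 66},
]


def cores_used(layout, threads):
    """Cores occupied on one node: tasks times threads, summed up."""
    return sum(n * threads.get(comp, 1) for comp, n in layout.items())


usage = [cores_used(layout, threads) for layout in layouts]
print(usage)

# No layout may ask for more cores than a node provides
assert all(u <= CPUS_PER_NODE for u in usage)
```

The two layouts that contain OpenIFS fill their nodes completely, while the first layout leaves most of its node free.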

Caution

The omp_stacksize parameter is needed on some platforms in order to avoid errors when there is too little stack memory for OpenMP threads (see OpenMP documentation). However, the example (and in particular the value of 64MB) should not be seen as a general recommendation for all platforms.

Overall, the slurm-wrapper-taskset launch method allows compute nodes to be shared flexibly and in a controlled way between EC-Earth 4 components, which is useful to avoid idle cores. It can also help to decrease the computational costs of configurations involving components with high memory requirements, by allowing them to share nodes with components that need less memory.

Optional configuration

Some special configuration parameters may be required for the slurm-wrapper-taskset launcher on some machines.

Hint

Do not use these special parameters, unless you need to!

The first special parameter is platform.mpi_rank_env_var:

platform:
  mpi_rank_env_var: SLURM_PROCID

This is the name of an environment variable that must contain the MPI rank for each task at runtime. The default value is SLURM_PROCID, which should work for SLURM when using the srun command. Other possible choices that work for some platforms are PMI_RANK or PMIX_RANK.
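What reading the rank from such a variable amounts to can be shown with a few lines of Python. The real wrapper is a shell script generated by the run scripts, so the function mpi_rank below is only a hypothetical illustration of the lookup.

```python
import os


def mpi_rank(env=None, var="SLURM_PROCID"):
    """Return the MPI rank stored in the given environment variable."""
    env = os.environ if env is None else env
    if var not in env:
        raise RuntimeError(f"environment variable {var} is not set")
    return int(env[var])


# Example with explicit environments instead of the real one
print(mpi_rank({"SLURM_PROCID": "3"}))
print(mpi_rank({"PMIX_RANK": "7"}, var="PMIX_RANK"))
```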

Another special parameter is platform.shell:

platform:
  shell: "/usr/bin/env bash"

It is used for the wrapper script to determine the appropriate shell. It must be configured if the given default value is not valid for your platform.

Implementation of Hyper-threading

The implementation of Hyper-threading in this launch method is restricted to OpenMP programs (only available for OpenIFS for now). It assumes that CPUs number i and i + platform.cpus_per_node correspond to the same physical core. By enabling the job.oifs.use_hyperthreads option, both CPUs i and i + platform.cpus_per_node are bound for the execution of that component. In this case, the number of OpenMP threads executing that component is twice the value given in job.oifs.omp_num_threads. The following example would configure OpenIFS to execute using 4 threads in the [0..127] range:

platform:
  cpus_per_node: 128
job:
  oifs:
    omp_num_threads: 4
    omp_stacksize: "64M"
    use_hyperthreads: false

while the following example would result in 8 OpenIFS threads, with 4 of them in the [0..127] range, and the others in [128..255]:

platform:
  cpus_per_node: 128
job:
  oifs:
    omp_num_threads: 4
    omp_stacksize: "64M"
    use_hyperthreads: true
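The CPU numbering behind these two examples can be sketched in Python. The pairing of CPU i with CPU i + cpus_per_node is taken from the description above. That a process occupies a contiguous block of CPUs starting at first_cpu is an assumption made for illustration, and the function cpus_for_process is hypothetical.

```python
def cpus_for_process(first_cpu, omp_num_threads, cpus_per_node,
                     use_hyperthreads):
    """Return the logical CPUs one OpenIFS process would be bound to."""
    # One CPU per OpenMP thread (contiguous block is an assumption)
    cpus = list(range(first_cpu, first_cpu + omp_num_threads))
    if use_hyperthreads:
        # Add the hyperthread sibling of each CPU: i + cpus_per_node
        cpus += [cpu + cpus_per_node for cpu in cpus]
    return cpus


print(cpus_for_process(0, 4, 128, use_hyperthreads=False))
print(cpus_for_process(0, 4, 128, use_hyperthreads=True))
```

With use_hyperthreads enabled, the list is twice as long, matching the doubled number of OpenMP threads.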

It is also possible to use all 256 logical CPUs in the node to run more MPI tasks, as in the following example. In this case, the job.oifs.use_hyperthreads option must be disabled for every component (it is disabled by default):

platform:
  cpus_per_node: 256
job:
  oifs:
    use_hyperthreads: false

SLURM shell template

Caution

The slurm-shell launch option does not work on all HPC systems. It is not supported, and users are advised to use another launcher if possible.

This launch option uses SLURM and a user-defined shell script template, which the user needs to specify using the configuration parameter job.launch.shell.script. The shell script template that the parameter refers to must exist in the scripts/runtime/templates/launch folder.

The slurm-shell launch option allows the user to create specific launch scripts for HPC platforms where other options do not work.

Currently available script templates:

  • run-srun-multiprog.sh: uses the srun command and compute nodes can be shared between different model components, recommended for systems with large nodes

  • run-gcc+ompi.sh: uses the mpirun command and compute nodes will not be shared between different model components

The following example uses the run-srun-multiprog.sh shell script template on the ecmwf-hpc2020 platform. The first node will be shared between XIOS and NEMO and the second node will be shared between OpenIFS and the Runoff-mapper.

job:
  launch:
    method: slurm-shell
    shell:
      script: run-srun-multiprog.sh
  oifs:
    ntasks: 127
    ntasks_per_node: 127
    omp_num_threads: 1
    omp_stacksize: "64M"
  nemo:
    ntasks: 127
    ntasks_per_node: 127
  xios:
    ntasks: 1
    ntasks_per_node: 1
  slurm:
    sbatch:
      opts:
        hint: nomultithread

  # remaining configuration same as for slurm-hetjob