Running simple experiments
==========================

.. note::
    Running even simple experiments with |ece4| is a complex task, mainly
    because model and experiment configurations can vary widely. Dependencies
    might differ from case to case (related to the user and computational
    environments), and different configuration parameters will be available
    depending on the experiment setup. Hence, the following part of the
    documentation will probably need more adaptation to your needs than the
    previously explained steps to build the model. Furthermore, a number of
    choices or features may be hard-coded in the scripts, or not yet
    supported at all. This will change as the development of |ece4| and this
    documentation progresses.

.. caution::
    Make sure that the |ece4| environment is correctly created and activated.
    This also includes setting ``OASIS_BUILD_PATH`` and adding
    ``${OASIS_BUILD_PATH}/lib`` to ``LD_LIBRARY_PATH``, as described in
    :ref:`completing-the-environment`.

To prepare for a simple test experiment, we start from the ScriptEngine
example scripts provided in ``runtime/se`` and its subdirectories:

.. code-block:: shell

    ecearth4> cd runtime/se
    se> ls -1
    scriptlib/
    templates/
    user-config-example.yml
    experiment-config-example.yml

The ScriptEngine runtime environment (SE RTE) is split into separate YAML
scripts, partly with respect to the model component they deal with, and
partly with respect to the runtime stage they belong to. This is done in
order to provide a modular approach for different configurations and to avoid
overly complex scripts and duplication. Most of the YAML scripts are provided
in the ``scriptlib`` subdirectory. However, this splitting is not “built
into” ScriptEngine or the SE RTE; it is entirely up to the users to adapt the
scripts to their needs, possibly splitting them up in vastly different ways.

Main structure of the run scripts
---------------------------------

The main run script logic is coded in ``scriptlib/main.yml``, which calls
separate scripts for one leg of the experiment (such as config, setup, pre,
run, post, and resubmit), taking into account all model components needed for
the model and experiment setup. However, ``scriptlib/main.yml`` and the
scripts called therein rely on a correct and consistent set of configuration
parameters covering the platform, user, model, and experiment configuration.
Hence, you have to provide configuration scripts together with
``scriptlib/main.yml``. A typical command to start an |ece4| experiment might
look like:

.. code-block:: shell

    se> se my-user-config.yml my-platform-config.yml my-experiment-config.yml scriptlib/main.yml

where the ``my-*-config.yml`` scripts contain all needed configuration
parameters. The names of the configuration scripts, and how the parameters
are split across them, are not hardcoded anywhere in ScriptEngine or the
runtime environment. Thus, you are free to adapt the configuration scripts to
your needs.

.. caution::
    While you are free to adapt the configuration to your needs, you still
    need to make sure that the changes result in valid ScriptEngine scripts.
    For example, the order of scripts is important, because some scripts may
    define context variables that other scripts refer to.
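As a minimal illustration of such a dependency (the values are placeholders
taken from the examples further below), an early configuration script might
set a context variable:

.. code-block:: yaml+jinja

    base.context:
        experiment:
            id: TEST

and any script processed later can then refer to it with a Jinja expression
such as ``{{experiment.id}}``, for example when constructing file or job
names.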
In order to make it easier to get started, examples are provided. To start
with, the same platform configuration file that was used to build the model
should be used for the runtime environment. Thus, the model can be started
with (still assuming that the current working directory is
``ecearth4/runtime/se``):

.. code-block:: shell

    se> se \
        my-user-config.yml \
        ../../sources/se/platforms/my-platform-config.yml \
        my-experiment-config.yml \
        scriptlib/main.yml

As for the experiment (including the model configuration) and user
configuration, example scripts are provided. The experiment configuration
contains, for example:

.. code-block:: yaml+jinja

    base.context:
        experiment:
            id: TEST
            description: A new ECE4 experiment

and, as part of the model configuration:

.. code-block:: yaml+jinja

    base.context:
        model_config:
            components: [oifs, nemo, rnfm, xios, oasis]

which configures the model in the GCM configuration (i.e. with atmosphere,
ocean, runoff mapper, I/O server, and coupler).

Assuming that all configuration parameters are set in the platform,
experiment (and model), and user configuration scripts, the main run script
``scriptlib/main.yml`` proceeds with the following steps:

.. code-block:: yaml+jinja

    # Submit job to batch system
    # ...

    # Configure 'main' and all components
    - base.include:
        src: "scriptlib/config-{{component}}.yml"
        ignore_not_found: yes
        loop:
            with: component
            in: "{{['main'] + main.components}}"

    # On first leg: setup 'main' and all components
    # ...

    # Pre step for 'main' and all components
    # ...

    # Start model run for leg
    # ...

    # Run post step for all components
    # ...

    # Monitoring
    # ...

    # Re-submit
    # ...

Basically, the run script defines the following stages:

0. Configure the batch system and submit the job.
1. ``config-*``, which sets configuration parameters for each component.
2. ``setup-*``, which runs, for each component, once at the beginning of the
   experiment.
3. ``pre-*``, which runs, for each component, at each leg before the
   executables.
4. ``run``, which starts the actual model run (i.e. the executables).
5. ``post-*``, which runs, for each component, at each leg after the model
   run has completed.
6. ``resubmit``, which submits the model for the following leg.
7. ``monitor``, which prepares data for online monitoring.

Not every stage has to be present in each model run, and not all stages have
to be present for all components. For all stages and components that are
present, there is a corresponding ``scriptlib/<stage>-<component>.yml``
script, which is included (via the ``base.include`` ScriptEngine task).
Hence, the main implementation logic of ``scriptlib/main.yml`` is to go
through all stages and execute all component scripts for each stage, if they
exist. Note that there is an artificial model component, called ``main``,
which is executed first in all stages. The corresponding
``scriptlib/<stage>-main.yml`` files include tasks that are general and not
associated with a particular component of the model.
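For illustration, the other stages in ``scriptlib/main.yml`` follow the same
include pattern as the configure step shown above. A sketch of the pre step,
assuming the naming convention just described (the actual script may differ
in details), could look like:

.. code-block:: yaml+jinja

    # Pre step for 'main' and all components (same pattern as the config step)
    - base.include:
        src: "scriptlib/pre-{{component}}.yml"
        ignore_not_found: yes
        loop:
            with: component
            in: "{{['main'] + main.components}}"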
Running batch jobs from ScriptEngine
------------------------------------

ScriptEngine can send jobs to the SLURM batch system when the
``scriptengine-tasks-hpc`` package is installed, which is done automatically
if the ``environment.yml`` file has been used to create the Python virtual
environment, as described in :ref:`creating_virtual_environment`. Here is an
example of using the ``hpc.slurm.sbatch`` task:

.. code-block:: yaml+jinja

    # Submit batch job
    hpc.slurm.sbatch:
        account: my_slurm_account
        nodes: 14
        time: !noparse 0:30:00
        job-name: "ece4-{{experiment.id}}"
        output: ece4.out
        error: ece4.out

What this task does is to run the entire ScriptEngine command, including all
scripts given to ``se`` at the command line, as a batch job with the given
arguments (e.g. account, number of nodes, and so on). As a simplified
example, a ScriptEngine script such as:

.. code-block:: yaml+jinja

    - hpc.slurm.sbatch:
        account: my_slurm_account
        nodes: 1
        time: 5

    - base.echo:
        msg: Hello from batch job!

would, in the first place, submit a batch job and then stop. When the batch
job executes, the first task (``hpc.slurm.sbatch``) runs again, but does
nothing because it is already running in a batch job. Then, the next task
(``base.echo``) is executed, writing the message to standard output in the
batch job.

Note that in the default run script examples, submitting the job to SLURM is
done behind the scenes in ``scriptlib/submit.yml``. The actual configuration
for the batch job, such as account, allocated resources, etc., is set
according to the chosen launch option, as described below.

Launch options
--------------

The ScriptEngine runtime environment supports different ways to start the
actual model run once the job is executed by the batch system:

* SLURM heterogeneous jobs (``slurm-hetjob``)
* SLURM multiple program configuration and ``taskset`` process/thread
  pinning (``slurm-mp-taskset``)
* SLURM wrapper with taskset and node groups (``slurm-wrapper-taskset``)
* SLURM job with generic shell script template (``slurm-shell``)

Each option has advantages and disadvantages, and they also come with
different configuration parameters. The choice of an option might affect the
performance and efficiency of the model run on a given HPC system. Moreover,
not all options might be supported on all systems.

SLURM heterogeneous jobs
~~~~~~~~~~~~~~~~~~~~~~~~

This launch option uses the `SLURM heterogeneous job `_ support to start the
|ece4| experiment. Compute nodes will not be shared between different model
components. This option will therefore often lead to some idle cores,
limiting the efficiency particularly on systems with many cores per node. It
is, on the other hand, rather easy to configure and fairly portable across
systems, and therefore a good choice to start with.

Here is a complete configuration example for the ``slurm-hetjob`` launch
option using SLURM heterogeneous jobs:

.. code-block:: yaml

    job:
        launch:
            method: slurm-hetjob
        oifs:
            ntasks: 288           # number of OIFS processes (MPI tasks)
            ntasks_per_node: 16   # number of tasks per node for OIFS
            omp_num_threads: 1    # number of OpenMP threads per OIFS process
        nemo:
            ntasks: 96            # number of NEMO processes (MPI tasks)
            ntasks_per_node: 16   # number of tasks per node for NEMO
        xios:
            ntasks: 1             # number of XIOS processes (MPI tasks)
            ntasks_per_node: 1    # number of tasks per node for XIOS
        slurm:
            sbatch:
                opts:  # Options to be used for the sbatch command
                    account: your_slurm_account
                    time: !noparse 01:30:00  # one hour, thirty minutes
                    output: !noparse "{{experiment.id}}.log"
                    job-name: !noparse "ECE4_{{experiment.id}}"
            srun:  # Arguments for the srun command (a list!)
                args: [
                    --label,
                    --kill-on-bad-exit,
                ]

SLURM multiprog and taskset
~~~~~~~~~~~~~~~~~~~~~~~~~~~

This launch option uses the SLURM ``srun`` command together with

* a HOSTFILE created on-the-fly and
* a multi-prog configuration file, which uses
* the ``taskset`` command to set the CPU affinity for MPI processes and
  OpenMP threads.

The ``slurm-mp-taskset`` option is configured very similarly to
``slurm-hetjob``. The following example configures the option to use 4
OpenMP threads for OpenIFS, assuming 16 cores per node:

.. code-block:: yaml

    job:
        launch:
            method: slurm-mp-taskset
        oifs:
            ntasks: 288           # number of OIFS processes (MPI tasks)
            ntasks_per_node: 4    # number of tasks per node for OIFS
            omp_num_threads: 4    # number of OpenMP threads per OIFS process
        # remaining configuration same as for slurm-hetjob
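As a quick consistency check of this example: 288 OpenIFS tasks at 4 tasks
per node occupy 72 nodes, and 4 tasks with 4 OpenMP threads each account for
all 16 cores of such a node.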
This launch option will share the first node between XIOS and either the
AMIP Forcing-reader (for atmosphere-only) or the Runoff-mapper (for GCM).
This is an improvement over ``slurm-hetjob``, but it will still lead to idle
cores in many cases, because the remaining nodes are used exclusively for one
component each.

SLURM wrapper and taskset
~~~~~~~~~~~~~~~~~~~~~~~~~

This launch option uses the SLURM ``srun`` command together with

* a HOSTFILE created on-the-fly and
* a wrapper script created on-the-fly, which uses
* the ``taskset`` command to set the CPU affinity for MPI processes, OpenMP
  threads and hyperthreads.

The ``slurm-wrapper-taskset`` option is configured per node. Instead of
choosing the total number of tasks or nodes dedicated to each component, you
specify the number of MPI processes of each component that will execute on
each compute node. To avoid repeating the same node configuration over and
over again, the configuration is structured in groups, each representing a
set of nodes with the same configuration.

The following simple example assumes a computer platform with 128 cores per
compute node, such as, for example, the ECMWF HPC2020 system. Three nodes are
allocated to run a model configuration with four components: XIOS (1
process), OpenIFS (250 processes), NEMO (132 processes) and the Runoff-mapper
(1 process):

.. code-block:: yaml

    platform:
        cpus_per_node: 128

    job:
        launch:
            method: slurm-wrapper-taskset
        groups:
            - {nodes: 1, xios: 1, oifs: 126, rnfm: 1}
            - {nodes: 2, oifs: 62, nemo: 66}

Two groups are defined in this example: the first comprising **one** node
(running XIOS, OpenIFS and the Runoff-mapper), and the second group with
**two** nodes running OpenIFS and NEMO.

.. note::
    The ``platform.cpus_per_node`` parameter and the ``job.*`` parameters do
    not have to be defined in the same file, as the simple example above may
    suggest. In fact, the ``platform.*`` parameters are usually defined in
    the platform configuration file, while ``job.*`` is usually found in the
    experiment configuration.

A second example illustrates the use of hybrid parallelization (MPI+OpenMP)
for OpenIFS. The number of MPI tasks per node reflects that each process will
be using more than one core:

.. code-block:: yaml

    platform:
        cpus_per_node: 128

    job:
        launch:
            method: slurm-wrapper-taskset
        oifs:
            omp_num_threads: 2
            omp_stacksize: "64M"
        groups:
            - {nodes: 1, xios: 1, oifs: 63, rnfm: 1}
            - {nodes: 2, oifs: 64}
            - {nodes: 2, oifs: 31, nemo: 66}

Note the configuration of ``job.oifs.omp_num_threads`` and
``job.oifs.omp_stacksize``, which set the OpenMP environment for OpenIFS. The
example utilises the same number of MPI ranks for XIOS, NEMO and the
Runoff-mapper as before, and 253 MPI ranks for OpenIFS. However, each OpenIFS
MPI rank now has two OpenMP threads, which results in 506 cores being used
for the atmosphere.
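As a consistency check: the groups provide 63 + 2 × 64 + 2 × 31 = 253 OpenIFS
ranks, and with two OpenMP threads per rank the core count of each node adds
up to the available 128 (for instance, 1 + 2 × 63 + 1 = 128 in the first
group).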
.. caution::
    The ``omp_stacksize`` parameter is needed on some platforms in order to
    avoid errors when there is too little stack memory for the OpenMP threads
    (see the `OpenMP documentation `_). However, the example (and in
    particular the value of 64MB) should not be seen as a general
    recommendation for all platforms.

Overall, the ``slurm-wrapper-taskset`` launch method allows the compute nodes
to be shared flexibly and in a controlled way between |ece4| components,
which is useful to avoid idle cores. It can also help to decrease the
computational costs of configurations involving components with high memory
requirements, by allowing them to share nodes with components that need less
memory.

Optional configuration
......................

Some special configuration parameters may be required for the
``slurm-wrapper-taskset`` launcher on some machines.

.. hint::
    Do not use these special parameters unless you need to!

The first special parameter is ``platform.mpi_rank_env_var``:

.. code-block:: yaml

    platform:
        mpi_rank_env_var: SLURM_PROCID

This is the name of an environment variable that must contain the MPI rank
for each task at runtime. The default value is ``SLURM_PROCID``, which should
work for SLURM when using the ``srun`` command. Other possible choices that
work on some platforms are ``PMI_RANK`` or ``PMIX_RANK``.

Another special parameter is ``platform.shell``:

.. code-block:: yaml

    platform:
        shell: "/usr/bin/env bash"

It is used by the wrapper script to determine the appropriate shell. It must
be configured if the given default value is not valid for your platform.

Implementation of Hyper-threading
.................................

The implementation of Hyper-threading in this launch method is restricted to
OpenMP programs (only available for OpenIFS for now). It assumes that the
CPUs numbered ``i`` and ``i + platform.cpus_per_node`` correspond to the same
physical core. By enabling the ``job.oifs.use_hyperthreads`` option, both
CPUs ``i`` and ``i + platform.cpus_per_node`` are bound for the execution of
that component. In this case, the number of OpenMP threads executing that
component is twice the value given in ``job.oifs.omp_num_threads``.

The following example would configure OpenIFS to execute using 4 threads in
the [0..127] CPU range:

.. code-block:: yaml

    platform:
        cpus_per_node: 128

    job:
        oifs:
            omp_num_threads: 4
            omp_stacksize: "64M"
            use_hyperthreads: false

while the following example would result in 8 OpenIFS threads, with 4 of them
in the [0..127] range and the others in [128..255]:

.. code-block:: yaml

    platform:
        cpus_per_node: 128

    job:
        oifs:
            omp_num_threads: 4
            omp_stacksize: "64M"
            use_hyperthreads: true

There is also the possibility of using all 256 logical CPUs of a node to run
more MPI tasks, as in the following example. In this case, the
``use_hyperthreads`` option must be disabled for every component (it is
disabled by default):

.. code-block:: yaml

    platform:
        cpus_per_node: 256

    job:
        oifs:
            use_hyperthreads: false

SLURM shell template
~~~~~~~~~~~~~~~~~~~~

This launch option uses SLURM and a user-defined shell script template, which
the user needs to specify with the configuration parameter
``job.launch.shell.script``. The shell script template that the parameter
refers to must exist in the ``runtime/se/templates/launch`` folder. The
``slurm-shell`` launch option allows the user to create specific launch
scripts for HPC platforms where the other options do not work.

Currently available script templates:

* ``run-srun-multiprog.sh``: uses the ``srun`` command and compute nodes can
  be shared between different model components; recommended for systems with
  large nodes
* ``run-gcc+ompi.sh``: uses the ``mpirun`` command and compute nodes *will
  not* be shared between different model components
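For illustration, selecting the ``mpirun``-based template would presumably
only require pointing ``job.launch.shell.script`` to the other template name
(a sketch; the remaining ``job.*`` settings are configured as for the other
launch options):

.. code-block:: yaml

    job:
        launch:
            method: slurm-shell
            shell:
                script: run-gcc+ompi.sh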
The following example uses the ``run-srun-multiprog.sh`` shell script
template on the ecmwf-hpc2020 platform. The first node will be shared between
XIOS and NEMO, and the second node will be shared between OpenIFS and the
Runoff-mapper:

.. code-block:: yaml

    job:
        launch:
            method: slurm-shell
            shell:
                script: run-srun-multiprog.sh
        oifs:
            ntasks: 127
            ntasks_per_node: 127
            omp_num_threads: 1
            omp_stacksize: "64M"
        nemo:
            ntasks: 127
            ntasks_per_node: 127
        xios:
            ntasks: 1
            ntasks_per_node: 1
        slurm:
            sbatch:
                opts:
                    hint: nomultithread
        # remaining configuration same as for slurm-hetjob

The experiment schedule
-----------------------

ScriptEngine supports recurrence rules (rrules, `RFC 5545 `_) via the Python
`rrule module `_ in order to define schedules with recurring events. This is
used in the SE RTE to specify the experiment schedule, with start date, leg
restart dates, and end date. This allows a great deal of flexibility when
defining the experiment, allowing for irregular legs with restarts at almost
any combination of dates.

.. warning::
    Even though `rrules` provide a lot of flexibility for the experiment
    schedule, it is not certain that all parts of the SE RTE and the model
    code can deal with arbitrary start/restart dates. This feature is
    provided in order to not limit the definition of a schedule at a
    technical level in the RTE.

A simple schedule with yearly restarts could look like:

.. code-block:: yaml+jinja

    base.context:
        schedule:
            all: !rrule >
                DTSTART:19900101
                RRULE:FREQ=YEARLY;UNTIL=20000101

which would define the start date of the experiment as 1990-01-01 00:00 and
yearly restarts on the 1st of January until the end date 2000-01-01 00:00 is
reached, i.e. 10 legs. As another example, two-year legs from 1850 until 1950
(i.e. 50 legs) would be defined as:

.. code-block:: yaml+jinja

    base.context:
        schedule:
            all: !rrule >
                DTSTART:18500101
                RRULE:FREQ=YEARLY;INTERVAL=2;UNTIL=19500101

Initial data
------------

The directory with initial data for |ece4| is configured by the parameter
``experiment.ini_dir``:

.. code-block:: yaml

    base.context:
        experiment:
            ini_dir: /path/to/inidata

As this is usually provided once for all users on a certain HPC system, it is
configured in the platform configuration file. It is, however, entirely
possible to put this parameter in another file.

For now, the set of initial data can be downloaded from the SMHI Publisher at
NSC; the link is given on the |ece4| `Tutorial Wiki page `_ at the EC-Earth
Development Portal.

.. note::
    An account is needed to access the EC-Earth Development Portal, because
    the information is restricted to EC-Earth consortium member institutes.