Running COBALT

This document provides a comprehensive guide to running COBALT observations. It covers the preparation steps, workflow details, and instructions for both single-node and multi-node setups.

Note

If you encounter any problems during the preparation process, refer to the Common Issues section for troubleshooting tips.

Contents:

  1. Preparation: Learn how to set up a COBALT environment and load the required modules. See Preparation.

  2. Workflow: Understand the sequence of scripts and processes involved in running a COBALT observation. See Workflow.

  3. Running an Observation (Single Node): Step-by-step instructions for running an observation on a single node. See Running an Observation.

  4. Running an Observation (Multi-Node): Instructions for running an observation across multiple nodes. See Running a multi-node Observation.

  5. Common Issues: Troubleshooting tips for common problems encountered while running COBALT. See Common Issues.

Preparation

For convenience, a script to load all required modules is provided in the COBALT repository. By including this bash script in the .bashrc file, the modules are loaded automatically when a new shell is opened.

To use this script:

git clone https://git.astron.nl/cobalt/cobalt-installation

vi ~/.bashrc
# Now, add the following line to the end of the file:
source /data/cobalt/spack/activate_cobalt.sh
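
If the activation script can be absent on some machines (for example, a host where the COBALT installation is not mounted), the line added to ~/.bashrc can be guarded so that new shells still start cleanly. The following is a minimal sketch, assuming the script path shown above; the variable name COBALT_ENV is only illustrative and is not defined by COBALT:

```shell
# Guarded variant of the line added to ~/.bashrc.
# COBALT_ENV is an illustrative variable name, not something COBALT defines.
COBALT_ENV=/data/cobalt/spack/activate_cobalt.sh

if [ -f "$COBALT_ENV" ]; then
    # Load the COBALT modules into the current shell.
    . "$COBALT_ENV"
else
    # Warn, but do not abort the shell start-up.
    echo "COBALT environment script not found: $COBALT_ENV" >&2
fi
```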

Workflow

Running a COBALT observation involves several scripts that interact with each other. The GPUProc/{src,test} directories contain several scripts to start observations.

The call chain is as follows:

digraph observation_flow {
    rankdir=LR;

    // Primary Nodes (Main Steps)
    node [shape=box, style=filled, fillcolor="#438dd5", fontcolor=white, fontsize=14];
    "OnlineControl (MAC)";
    "startBGL.sh";
    "runObservation.sh";
    "mpirun.sh";
    "[mpirun] rtcp";

    // Intermediate Nodes
    node [fillcolor="#5a9bd4", shape=ellipse, fontsize=14];
    "stopBGL.sh";
    "tMACfeedback.sh";
    "tProductionParsets.sh";
    "testParset.sh";

    // Connections
    "runObservation.sh" -> "OnlineControl (MAC)";
    "startBGL.sh" -> "runObservation.sh";
    "stopBGL.sh" -> "runObservation.sh";
    "runObservation.sh" -> "mpirun.sh";
    "mpirun.sh" -> "[mpirun] rtcp";

    "OnlineControl (MAC)" -> "stopBGL.sh";
    "tMACfeedback.sh" -> "runObservation.sh";
    "testParset.sh" -> "runObservation.sh";
    "tProductionParsets.sh" -> "runObservation.sh";
}

The central thread is the main call chain, with the following roles and responsibilities:

  1. startBGL.sh

    Syntax:

    ./startBGL.sh 1 2 3 $PARSET $OBSID
    

    Description:

    • Starts the observation in the background by calling runObservation.sh.

    • Augments the parset with keys from $LOFARROOT/etc/parset-additions.d/*.

    • Redirects output to $LOFARROOT/var/log/rtcp-$OBSID.log.

    • Tracks the PID for stopBGL.sh.

    • Informs OnlineControl when the observation starts.

  2. runObservation.sh

    Syntax:

    ./runObservation.sh $PARSET
    

    Description:

    • Runs the observation in the foreground and calls mpirun.sh.

    • Optionally augments the parset with COBALT-specific settings.

    • Can be forced to run on localhost with a given number of MPI processes, e.g., -l 4 for four processes.

    • Copies Observation${OBSID}_feedback to OnlineControl (ccu001).

    • Reports ABORT or FINISHED to OnlineControl (can be suppressed with -F).

    • Creates a PID file for stopBGL.sh.

  3. mpirun.sh

    Syntax:

    mpirun.sh -x LOFARROOT=$LOFARROOT \
    -H `mpi_node_list -n $PARSET` `which rtcp` \
    $PARSET
    

    Description:

    • Acts as mpirun, but wraps the currently selected MPI library (OpenMPI, MVAPICH2, or no MPI).

    • Starts the COBALT software by launching the rtcp executable with the parset.

  4. stopBGL.sh

    Syntax:

    ./stopBGL.sh 1 $OBSID
    

    Description:

    • Stops the corresponding observation using the saved PID.

    • Signals a running runObservation.sh to finish or abort.

  5. rtcp

    Description:

    • The actual program that performs the correlation/beamforming.
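
The start/stop mechanism described for startBGL.sh and stopBGL.sh (run in the background, redirect output to a log, save the PID, later signal that PID) can be illustrated with a stand-in process. This is only a sketch of the general pattern, not the code of the actual scripts: sleep takes the place of runObservation.sh, a temporary directory takes the place of $LOFARROOT/var, and the PID file name is hypothetical.

```shell
# Illustration of the start/stop pattern only; this is NOT the actual
# startBGL.sh/stopBGL.sh code. "sleep 60" stands in for runObservation.sh.
OBSID=123882                         # example observation ID
rundir=$(mktemp -d)                  # stand-in for $LOFARROOT/var/{log,run}
logfile="$rundir/rtcp-$OBSID.log"
pidfile="$rundir/rtcp-$OBSID.pid"    # hypothetical PID file name

# Start: run in the background, redirect all output to the log, save the PID.
sleep 60 >"$logfile" 2>&1 &
echo "$!" >"$pidfile"

# Stop: read the saved PID and ask the process to terminate.
pid=$(cat "$pidfile")
kill -TERM "$pid"

# Collect the exit status; a process ended by a signal reports
# 128 + the signal number.
stop_status=0
wait "$pid" 2>/dev/null || stop_status=$?
echo "process $pid stopped with status $stop_status"

# Clean up the temporary log and PID file.
rm -rf "$rundir"
```

Keeping the PID in a file is what lets a separate script, started later and without any shared shell state, find and signal the right process.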

Running an Observation

To run an observation in COBALT, a so-called parset file must be provided. The parset is either generated by the Radio Observatory software (the Scheduler) or hand-crafted.

After a parset file has been selected, a new observation run can be initiated. The following steps are required:

Note

For simplicity, the following example assumes that COBALT is run on a single node, i.e., localhost. For a multi-node setup, please refer to the Running an Observation (Multi-Node) section.

  1. Set the $PARSET variable so that it points to the correct parset file, e.g.:

export PARSET=/path/to/parset

  2. Set the $LOFARROOT environment variable to the root directory of the COBALT installation. For instance, if COBALT is installed in the cobalt directory in your home directory, run:

export LOFARROOT=/home/[user]/cobalt

For simplicity, the bin directory can be added to the $PATH as well:

export PATH=$LOFARROOT/build/gnucxx11_CEP4_optarch/bin:$PATH

  3. Prepare the LOFARROOT subdirectories:

mkdir -p $LOFARROOT/nfs/parset/
mkdir -p $LOFARROOT/var/log/
mkdir -p $LOFARROOT/var/run/
mkdir -p $LOFARROOT/nfs/feedback/

  4. To start an observation, use the following command format:

pkill outputProc # Ensure that no stale outputProc processes are running

runObservation.sh -l -P [PID_file] -c [PIPE_file] [PARSET] [OBSERVATION_ID]

Note that the -l and -P flags are optional. The -l flag is used to run the observation on localhost, while the -P flag is used to create a PID file.
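
The directory preparation and a basic parset check from the steps above can be wrapped in two small shell functions, so that a missing directory or parset is caught before the observation starts. This is a sketch only; prepare_lofarroot and check_parset are hypothetical helper names and are not part of COBALT.

```shell
# Hypothetical helper (not part of COBALT): create the subdirectories
# listed above under the given root; stop at the first failure.
prepare_lofarroot() {
    root=$1
    for dir in nfs/parset var/log var/run nfs/feedback; do
        mkdir -p "$root/$dir" || return 1
    done
}

# Hypothetical helper: fail early if the parset is missing or unreadable.
check_parset() {
    if [ ! -r "$1" ]; then
        echo "parset not readable: $1" >&2
        return 1
    fi
}
```

With $LOFARROOT and $PARSET exported as described, calling prepare_lofarroot "$LOFARROOT" followed by check_parset "$PARSET" before runObservation.sh covers the directory-related errors listed under Common Issues.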

Example:

The following command starts an observation using the specified parset file and observation ID, stores the process ID in the given PID file, and uses the specified pipe for communication. The observation ID must match the one in the parset file.

runObservation.sh -l -P cobalt/var/run/rtcp-5.pid -c cobalt/var/run/rtcp-5.pipe ../../../CobaltTest/test/tManyPartTABOutput.parset 123882

Arguments:

  • -l: Run solely on localhost using a specified number of MPI processes. This is useful for isolated testing.

  • -P cobalt/var/run/rtcp-5.pid: Specifies the path to the PID file where the process ID of the observation will be stored.

  • -c cobalt/var/run/rtcp-5.pipe: Specifies the path to the named pipe used for communication.

  • ../../../CobaltTest/test/tManyPartTABOutput.parset: The parset file that defines the observation parameters.

  • 123882: The observation ID, which uniquely identifies the observation.
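
Because the observation ID on the command line has to match the one in the parset, a quick check before starting can prevent a mismatched run. The sketch below assumes that the parset stores the ID on a line of the form "Observation.ObsID = <number>"; that key name is an assumption about the parset layout, and parset_obsid and check_obsid are hypothetical helpers, not COBALT tools.

```shell
# Hypothetical helper: print the observation ID found in a parset.
# Assumes a line of the form "Observation.ObsID = <number>" (an assumption).
parset_obsid() {
    sed -n 's/^[[:space:]]*Observation\.ObsID[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed only if the ID in the parset ($1)
# equals the ID that will be passed on the command line ($2).
check_obsid() {
    [ "$(parset_obsid "$1")" = "$2" ]
}
```

For the example above, check_obsid ../../../CobaltTest/test/tManyPartTABOutput.parset 123882 would succeed only if the parset carries the same ID under the assumed key.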

All available command line options for runObservation.sh are listed below:

runObservation.sh Command Line Options

Option  Description
-A      Do NOT augment parset.
-B      Do NOT add broken antenna information.
-C      Run with check tool specified in environment variable LOFAR_CHECKTOOL.
-F      Do NOT send data points to a PVSS gateway.
-P      Create PID file.
-d      Dummy run: don’t execute anything.
-l      Run solely on localhost using nprocs MPI processes (isolated test).
-p      Enable profiling. This enforces:
          • Sequential kernel execution.
          • One SubbandProc/GPU (no overlap between kernels and GPU transfers).
          • Performance statistics collection.
-o      Add option KEY=VALUE to the parset.
-x      Propagate environment variable KEY=VALUE.

Running a multi-node Observation

WIP

Common Issues

  • Problem: mpirun hangs

    Cause: The likely cause is that the key fingerprint for the host has changed or has not been added to the known hosts file before.

    Solution: Try to log in to the SSH server manually and accept the key fingerprint. Then try to run the job again.

  • Problem: No such file or directory or related errors

    Cause: Currently, the runObservation.sh script is not able to create all required directories in the LOFARROOT directory.

    Solution: Create the required directories manually (also see Running an Observation).

  • Problem: Cannot open IERSeop97 table

    Cause: This error is related to casacore.

    Solution: Create a file ~/.casarc with the following content: measures.directory: /var/software/casa-measures

  • Problem: The MPI_Alloc_mem() function was called before MPI_INIT was invoked.

    Cause: MPI is not initialized.

    Solution: Ensure that COBALT was built with MPI support, i.e. USE_MPI is set to ON in the CMake configuration.
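
The casacore fix above can be applied without overwriting an existing ~/.casarc. The following sketch appends the entry only when no measures.directory line is present yet; ensure_casarc is a hypothetical helper name, and the measures path is the one given above, which may differ on other systems.

```shell
# Hypothetical helper: add the measures.directory entry to a casarc file
# unless an entry for that key is already present.
# $1: casarc file to update (defaults to ~/.casarc)
ensure_casarc() {
    casarc=${1:-$HOME/.casarc}
    if [ -f "$casarc" ] && grep -q '^measures\.directory:' "$casarc"; then
        return 0   # already configured; leave the file untouched
    fi
    echo 'measures.directory: /var/software/casa-measures' >>"$casarc"
}
```

Calling ensure_casarc with no argument updates ~/.casarc; running it a second time leaves the file unchanged.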