Running COBALT
This document provides a comprehensive guide to running COBALT observations. It covers the preparation steps, workflow details, and instructions for both single-node and multi-node setups.
Note
If you encounter any problems during the preparation process, refer to the Common Issues section for troubleshooting tips.
Contents:
Preparation: Learn how to set up a COBALT environment and load the required modules. See Preparation.
Workflow: Understand the sequence of scripts and processes involved in running a COBALT observation. See Workflow.
Running an Observation (Single Node): Step-by-step instructions for running an observation on a single node. See Running an Observation.
Running an Observation (Multi-Node): Instructions for running an observation across multiple nodes. See Running a multi-node Observation.
Common Issues: Troubleshooting tips for common problems encountered while running COBALT. See Common Issues.
Preparation
For convenience, a script to load all required modules is provided in the COBALT repository. By including this bash script in the .bashrc file, the modules are loaded automatically when a new shell is opened.
To use this script:
git clone https://git.astron.nl/cobalt/cobalt-installation
vi ~/.bashrc
# Now, add the following line to the end of the file:
source /data/cobalt/spack/activate_cobalt.sh
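As an alternative to editing the file by hand, the line can be appended from the command line. The following is a minimal sketch: `add_line_once` is a hypothetical helper (not part of COBALT), and the activation script path is the one shown above, which may differ on other systems.

```shell
#!/usr/bin/env bash
# Append a line to a file only if it is not already present, so that
# running the helper twice does not duplicate the entry.
# add_line_once is a hypothetical helper, not part of COBALT.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (uncomment to modify the real ~/.bashrc):
# add_line_once 'source /data/cobalt/spack/activate_cobalt.sh' "$HOME/.bashrc"
```

Open a new shell afterwards (or source ~/.bashrc) for the change to take effect.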
Workflow
Running a COBALT observation involves several scripts that interact with each other. The directories GPUProc/{src,test} contain the scripts used to start observations.
The call chain is as follows:
![digraph observation_flow {
rankdir=LR;
// Primary Nodes (Main Steps)
node [shape=box, style=filled, fillcolor="#438dd5", fontcolor=white, fontsize=14];
"OnlineControl (MAC)";
"startBGL.sh";
"runObservation.sh";
"mpirun.sh";
"[mpirun] rtcp";
// Intermediate Nodes
node [fillcolor="#5a9bd4", shape=ellipse, fontsize=14];
"stopBGL.sh";
"tMACfeedback.sh";
"tProductionParsets.sh";
"testParset.sh";
// Connections
"runObservation.sh" -> "OnlineControl (MAC)";
"startBGL.sh" -> "runObservation.sh";
"stopBGL.sh" -> "runObservation.sh";
"runObservation.sh" -> "mpirun.sh";
"mpirun.sh" -> "[mpirun] rtcp";
"OnlineControl (MAC)" -> "stopBGL.sh";
"tMACfeedback.sh" -> "runObservation.sh";
"testParset.sh" -> "runObservation.sh";
"tProductionParsets.sh" -> "runObservation.sh";
}](_images/graphviz-546d0bcb991edba4446acf66d17e8512fb6dbafd.png)
The central thread is the main call chain, with the following roles and responsibilities:
startBGL.sh
Syntax:
./startBGL.sh 1 2 3 $PARSET $OBSID
Description:
Starts the observation in the background by calling runObservation.sh.
Augments the parset with keys from $LOFARROOT/etc/parset-additions.d/*.
Redirects output to $LOFARROOT/var/log/rtcp-$OBSID.log.
Tracks the PID for stopBGL.sh.
Informs OnlineControl when the observation starts.
runObservation.sh
Syntax:
./runObservation.sh $PARSET
Description:
Runs the observation in the foreground and calls mpirun.sh.
Optionally augments the parset with COBALT-specific settings.
Localhost-only execution with a given number of MPI processes can be forced with -l (e.g., -l 4 for four processes).
Copies Observation$OBSID_feedback to OnlineControl (ccu001).
Reports ABORT or FINISHED to OnlineControl (can be suppressed with -F).
Creates a PID file for stopBGL.sh.
mpirun.sh
Syntax:
mpirun.sh -x LOFARROOT=$LOFARROOT \
    -H `mpi_node_list -n $PARSET` `which rtcp` \
    $PARSET
Description:
Acts as mpirun, but wraps the currently selected MPI library (OpenMPI, MVAPICH2, or no MPI).
Starts the COBALT software by launching the rtcp executable with the parset.
stopBGL.sh
Syntax:
./stopBGL.sh 1 $OBSID
Description:
Stops the corresponding observation using the saved PID.
Signals a running runObservation.sh to finish or abort.
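The PID-file handshake between the start and stop scripts can be illustrated with a generic sketch. This is not the code of startBGL.sh or stopBGL.sh: the function names `start_bg` and `stop_bg` are hypothetical, and the use of SIGTERM is an assumption chosen for illustration.

```shell
#!/usr/bin/env bash
# Generic PID-file pattern (illustrative only; not the actual COBALT scripts).

# Start a command in the background and record its PID in a file.
start_bg() {
    local pidfile="$1"; shift
    "$@" &
    echo $! > "$pidfile"
}

# Signal the process recorded in the PID file, then remove the file.
stop_bg() {
    local pidfile="$1" pid
    [ -f "$pidfile" ] || { echo "no PID file: $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists
    if kill -0 "$pid" 2>/dev/null; then
        kill -TERM "$pid"
    fi
    rm -f "$pidfile"
}

# Usage:
#   start_bg /tmp/demo.pid sleep 60
#   stop_bg  /tmp/demo.pid
```

Keeping the PID in a file lets the stop step run from a different shell or at a later time than the start step.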
rtcp
Description:
The actual program that performs the correlation/beamforming.
Running an Observation
To run an observation in COBALT, a so-called parset file must be provided. The parset is either generated by the Radio Observatory software (the Scheduler) or hand-crafted.
After a parset file has been selected, a new observation run can be initiated. The following steps are required:
Note
For simplicity, the following example assumes that COBALT is run on a single node, i.e., localhost. For a multi-node setup, please refer to the Running an Observation (Multi-Node) section.
Set the $PARSET variable so that it points to the correct file, e.g.:
export PARSET=/path/to/parset
Next, set the $LOFARROOT environment variable to the root directory of the COBALT installation. For instance, if COBALT is installed in cobalt, it can be set by running:
export LOFARROOT=/home/[user]/cobalt
For simplicity, the bin directory can be added to the $PATH as well:
export PATH=$LOFARROOT/build/gnucxx11_CEP4_optarch/bin:$PATH
Prepare the LOFARROOT subdirectories:
mkdir -p $LOFARROOT/nfs/parset/
mkdir -p $LOFARROOT/var/log/
mkdir -p $LOFARROOT/var/run/
mkdir -p $LOFARROOT/nfs/feedback/
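The directory preparation above can be wrapped in a small function so that it is easy to repeat for a fresh installation. This is a minimal sketch: `prepare_lofarroot` is a hypothetical helper (not part of COBALT), and the list of subdirectories is taken from the commands above.

```shell
#!/usr/bin/env bash
# Create the LOFARROOT subdirectories listed above.
# prepare_lofarroot is a hypothetical helper, not part of COBALT.
prepare_lofarroot() {
    local root="$1" sub
    if [ -z "$root" ]; then
        echo "usage: prepare_lofarroot <LOFARROOT>" >&2
        return 1
    fi
    for sub in nfs/parset var/log var/run nfs/feedback; do
        # mkdir -p succeeds if the directory already exists
        mkdir -p "$root/$sub" || return 1
    done
}

# Usage:
#   prepare_lofarroot "$LOFARROOT"
```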
To start an observation, use the following command format:
pkill outputProc # Ensure that no stale outputProc processes are running
runObservation.sh -l -P [PID_file] -c [PIPE_file] [PARSET] [OBSERVATION_ID]
Note that the -l and -P flags are optional. The -l flag is used to run the observation on localhost, while the -P flag is used to create a PID file.
Example:
This command starts an observation using the specified parset file and observation ID, while storing the process ID in the given PID file and using the specified pipe for communication. The observation ID shall match the one in the parset file.
runObservation.sh -l -P cobalt/var/run/rtcp-5.pid -c cobalt/var/run/rtcp-5.pipe ../../../CobaltTest/test/tManyPartTABOutput.parset 123882
Arguments:
-l: Run solely on localhost using a specified number of MPI processes. This is useful for isolated testing.
-P cobalt/var/run/rtcp-5.pid: Specifies the path to the PID file where the process ID of the observation will be stored.
-c cobalt/var/run/rtcp-5.pipe: Specifies the path to the named pipe used for communication.
../../../CobaltTest/test/tManyPartTABOutput.parset: The parset file that defines the observation parameters.
123882: The observation ID, which uniquely identifies the observation.
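Because the observation ID on the command line has to match the one in the parset, it can help to compare the two before starting a run. The sketch below is hedged: it assumes that the parset stores the ID on a line of the form `Observation.ObsID = <number>`, which should be verified against the parset in use. The function names `parset_obsid` and `check_obsid` are hypothetical.

```shell
#!/usr/bin/env bash
# Compare the observation ID in a parset with an expected value.
# ASSUMPTION: the ID is stored as "Observation.ObsID = <number>".

# Print the value of the assumed key (last occurrence, whitespace stripped).
parset_obsid() {
    sed -n 's/^[[:space:]]*Observation\.ObsID[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | tail -n 1
}

# Return 0 if the parset ID equals the expected ID, 1 otherwise.
check_obsid() {
    local parset="$1" expected="$2" found
    found=$(parset_obsid "$parset")
    if [ "$found" != "$expected" ]; then
        echo "observation ID mismatch: parset has '$found', expected '$expected'" >&2
        return 1
    fi
}

# Usage:
#   check_obsid "$PARSET" 123882 && echo "IDs match"
```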
All available command line options for runObservation.sh are listed below:
| Option | Description |
|---|---|
| | Do NOT augment parset. |
| | Do NOT add broken antenna information. |
| | Run with check tool specified in environment variable |
| | Do NOT send data points to a PVSS gateway. |
| -P | Create PID file. |
| | Dummy run: don’t execute anything. |
| -l | Run solely on localhost using a specified number of MPI processes. |
| | Enable profiling. This enforces: |
| | Add option |
| | Propagate environment variable |
Running a multi-node Observation
WIP
Common Issues
| Problem | Cause | Solution |
|---|---|---|
| | The likely cause is that the key fingerprint for the host has changed or has not been added to the known hosts file before | Try to log in to the SSH server manually and accept the key fingerprint. Then try to run the job again. |
| No such file or directory or related errors | Currently, the runObservation.sh script is not able to create all required directories in the LOFARROOT directory. | Create the required directories manually (also see Running an Observation). |
| Cannot open IERSeop97 table | This error is related to casacore. | Create a file |
| The MPI_Alloc_mem() function was called before MPI_INIT was invoked. | MPI is not initialized. | Ensure that COBALT was built with MPI support, i.e. USE_MPI is set to ON in the CMake configuration. |
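Whether USE_MPI was enabled at configure time can be checked by inspecting the CMake cache of the build. The following is a minimal sketch under stated assumptions: the build directory contains a CMakeCache.txt, the option is cached under the name USE_MPI, and only the values ON, TRUE, YES, and 1 are treated as enabled. The function name `mpi_enabled` is hypothetical.

```shell
#!/usr/bin/env bash
# Report whether USE_MPI is enabled in a CMake build directory.
# ASSUMPTION: <build-dir>/CMakeCache.txt exists and holds a USE_MPI entry.
# Returns 0 if enabled, 1 if disabled or absent, 2 if the cache is missing.
mpi_enabled() {
    local cache="$1/CMakeCache.txt"
    if [ ! -f "$cache" ]; then
        echo "no CMakeCache.txt in $1" >&2
        return 2
    fi
    # Cache entries have the form NAME:TYPE=VALUE
    grep -Eiq '^USE_MPI(:[A-Z]+)?=(ON|TRUE|YES|1)$' "$cache"
}

# Usage (the build directory path is an assumption based on the PATH example):
#   mpi_enabled "$LOFARROOT/build/gnucxx11_CEP4_optarch" && echo "MPI enabled"
```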