How-to guides
This section walks the user through specific steps for solving a real-world problem.
Table of contents
Generate code skeletons for projects, plugins, components, etc
Work with factory metadata for collecting statistics, etc
Create a service which can be shared between different plugins
Handle both real and simulated data
Handle EPICS data
Use JANA with ROOT
Persist the entire DST using ROOT
Checkpoint the entire DST using ROOT
Build and filter events (“L1 and L2 triggers”)
Process subevents
Migrate from JANA1 to JANA2
Building JANA
First, set your $JANA_HOME environment variable. This is where the executables, libraries, headers, and plugins get installed. (It is also where we will clone the source.) CMake will install to $JANA_HOME if it is set (it will install to ${CMAKE_BINARY_DIR}/install if not). Be aware that although CMake usually defaults CMAKE_INSTALL_PREFIX to /usr/local, we have disabled this because we rarely want this in practice, and we don’t want the build system picking up outdated headers and libraries we installed to /usr/local by accident. If you want to set JANA_HOME=/usr/local, you are free to do so, but you must do so deliberately.
Next, set your build directory. This is where CMake’s caches, logs, intermediate build artifacts, etc. go. The convention is to name it build and put it in the project’s root directory. If you are using CLion, it will automatically create a cmake-build-debug directory, which works just fine.
Finally, you can cd into your build directory and build and install everything the usual CMake way.
export JANA_VERSION=v2.0.5 # Convenient to set this once for specific release
export JANA_HOME=${PWD}/JANA${JANA_VERSION} # Set full path to install dir
git clone https://github.com/JeffersonLab/JANA2 --branch ${JANA_VERSION} ${JANA_HOME} # Get JANA2
mkdir build # Set build dir
cd build
cmake3 ${JANA_HOME} # Generate makefiles
make -j8 install # Build (using 8 threads) and install
source ${JANA_HOME}/bin/jana-this.sh # Set PATH (and other envars)
jana -Pplugins=JTest # Run JTest plugin to verify successful install
Note: If you want to use a compiler other than the default one on your system, it is not enough to modify your $PATH, as CMake ignores this by design. You need to set either the CXX environment variable or the CMAKE_CXX_COMPILER CMake variable.
By default, JANA will look for plugins under $JANA_HOME/plugins. For your plugins to propagate here, you have to install them. If you don’t want to do that, you can also set the environment variable $JANA_PLUGIN_PATH to point to the build directory of your project. If you set the JANA config jana:debug_plugin_loading=1, JANA will report exactly where it went looking for your plugins and what happened when it tried to load them.
jana -Pplugins=JTest -Pjana:debug_plugin_loading=1
Using JANA in a CMake project
To use JANA in a CMake project, simply add $JANA_HOME/lib/cmake/JANA to your CMAKE_PREFIX_PATH, or alternatively, set the CMake variable JANA_DIR=$JANA_HOME/lib/cmake/JANA.
Using JANA in a non-CMake project
To use JANA in a non-CMake project:
Source $JANA_HOME/bin/jana-this.sh to set the environment variables needed for JANA’s dependencies
Use $JANA_HOME/bin/jana-config --cflags to obtain JANA’s compiler flags
Use $JANA_HOME/bin/jana-config --libs to obtain JANA’s linker flags
How to benchmark JANA
JANA includes a built-in facility for benchmarking programs and plugins. It produces a scalability curve by repeatedly pausing execution, adding additional worker threads, resuming execution, and measuring the resulting throughput over fixed time intervals. There is an additional option to measure the scalability curves for a matrix of different affinity and locality strategies. This is useful when your hardware architecture has nonuniform memory access.
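The measurement loop described above can be sketched in standalone C++. This is only an illustration of the idea, not JANA's benchmarker: the names process_one_event, measure_rate, and scaling_sweep are hypothetical, the workload is a dummy arithmetic loop, and for simplicity the worker threads are restarted for every sample instead of being paused and resumed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical stand-in for "process one event": a fixed amount of arithmetic.
static double process_one_event(double seed) {
    double acc = seed;
    for (int i = 0; i < 1000; ++i) acc = acc * 1.0000001 + 0.5;
    return acc;
}

// Run `nthreads` workers for roughly `interval` and return events per second.
static double measure_rate(int nthreads, std::chrono::milliseconds interval) {
    assert(nthreads > 0);
    std::atomic<bool> stop{false};
    std::atomic<long> events{0};
    std::vector<std::thread> workers;
    const auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&stop, &events, t] {
            volatile double sink = 0;  // keeps the dummy work from being optimized away
            do {
                sink = process_one_event(t + 1.0);
                events.fetch_add(1, std::memory_order_relaxed);
            } while (!stop.load(std::memory_order_relaxed));
        });
    }
    std::this_thread::sleep_for(interval);
    stop.store(true);
    for (auto& w : workers) w.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return static_cast<double>(events.load()) / elapsed.count();
}

struct RatePoint {
    int nthreads;   // thread count for this point on the curve
    double avg_hz;  // mean throughput over all samples at that thread count
};

// Sweep the thread count and average `nsamples` measurements per step.
static std::vector<RatePoint> scaling_sweep(int minthreads, int maxthreads,
                                            int threadstep, int nsamples,
                                            std::chrono::milliseconds interval) {
    assert(threadstep > 0 && nsamples > 0);
    std::vector<RatePoint> curve;
    for (int n = minthreads; n <= maxthreads; n += threadstep) {
        double sum = 0;
        for (int s = 0; s < nsamples; ++s) sum += measure_rate(n, interval);
        curve.push_back({n, sum / nsamples});
    }
    return curve;
}
```

The arguments of scaling_sweep are named after the benchmark:minthreads, benchmark:maxthreads, benchmark:threadstep, and benchmark:nsamples parameters so the correspondence is easy to see; the sampling interval is an extra assumption of this sketch.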
In case you don’t have JANA code ready to benchmark yet, JANA provides a plugin called JTest which can simulate different workloads. JTest runs a dummy algorithm on randomly generated data, using a user-specified event size and number of FLOPs (floating point operations) per event. This gives a rough estimate of your code’s performance. If you don’t know the number of FLOPs per event, you can still compare the performance of JANA on different hardware architectures just by using the default settings.
Here is how you do benchmarking with JTest:
# Obtain and build JANA, if you haven't already
git clone http://github.com/JeffersonLab/JANA2
cd JANA2
mkdir build
mkdir install
export JANA_HOME=`pwd`/install
cmake -S . -B build
cmake --build build -j 10 --target install
cd install/bin
# Run the benchmarking
./jana -b -Pplugins=JTest
# -b enables benchmarking
# -Pplugins=JTest pulls in the JTest plugin
# Additional configuration options are listed below
# Benchmarking may take a while. You can terminate at any time without
# losing data by pressing Ctrl-C _once or twice_. If you press it three
# times or more, it will hard-exit and won't write the results file.
cd JANA_Test_Results
# Raw data CSV files are in `samples.dat`
# Average and RMS rates are in `rates.dat`
# Show the scalability curve in a matplotlib window
./jana-plot-scaletest.py
If you already have a JANA project you would like to benchmark, all you have to do is build and install it the way you usually would, and then run
jana -b -Pplugins=$MY_PLUGIN
# Or
my_jana_app -b
cd JANA_Test_Results
# Raw data CSV files are in `samples.dat`
# Average and RMS rates are in `rates.dat`
# Show the scalability curve in a matplotlib window
./jana-plot-scaletest.py
These are the relevant configuration parameters for JTest:
Name | Type | Default | Description |
---|---|---|---|
benchmark:nsamples | int | 15 | Number of measurements made for each thread count |
benchmark:minthreads | int | 1 | Minimum thread count |
benchmark:maxthreads | int | ncores | Maximum thread count |
benchmark:threadstep | int | 1 | Thread count increment |
benchmark:resultsdir | string | JANA_Test_Results | Directory name for benchmark test results |
Detect when a group of events has finished
Sometimes it is necessary to organize events into groups, process the events the usual way, but then notify some component whenever a group has completely finished. The original motivating example for this was EPICS data, which was maintained as a bundle of shared state. Whenever updates arrived, JANA1 would emit a ‘barrier event’ which would stop the data flow until all in-flight events completed, so that preceding events could only read the old state and subsequent events could only read the new state. We now recommend EPICS data be handled differently. Nevertheless this pattern still occasionally comes into play.
One example is a JEventProcessor which writes statistics for the previous run every time the run number changes. This is trickier than it first appears because events may arrive out of order. The JEventProcessor can easily maintain a set of run numbers it has already seen, but it won’t know when it has seen all of the events for a given run number. For that it needs an additional piece of information: the number of events emitted with that run number. Complicating things further, this information needs to be read and modified by both the JEventSource and the JEventProcessor.
Our current recommendation is a JService called JEventGroupManager. This is designed to be used as follows:
A JEventSource should keep a pointer to the current JEventGroup, which it obtains through the JEventGroupManager. Groups are given a unique id, which
Whenever the JEventSource emits a new event, it should insert the JEventGroup into the JEvent. The event is now tagged as belonging to that group.
When the JEventSource moves on to the next group, e.g. if the run number changed, it should close out the old group by calling JEventGroup::CloseGroup(). The group needs to be closed before it will report itself as finished, even if there are no events still in-flight.
A JEventProcessor should retrieve the JEventGroup object by calling JEvent::Get. It should report that an event is finished by calling JEventGroup::FinishEvent. Please only call this once; although we could make JEventGroup robust against repeated calls, it would add some overhead.
A JEventSource or JEventProcessor (or technically anything whose lifespan is enclosed by the lifespan of JServices) may then test whether this is the last event in its group by calling JEventGroup::IsGroupFinished(). A blocking version, JEventGroup::WaitUntilGroupFinished(), is also provided. This mechanism allows relatively arbitrary hooks into the event stream.
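The bookkeeping behind these steps can be illustrated with a small standalone sketch. EventGroupSketch below is a hypothetical stand-in written for this guide, not JANA's JEventGroup; it assumes a StartEvent call when the source tags an event, and it only shows why a group needs both an in-flight counter and a "closed" flag before it can report itself as finished.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the group bookkeeping; not JANA's JEventGroup.
class EventGroupSketch {
    std::atomic<int> m_events_in_flight{0};
    std::atomic<bool> m_closed{false};

public:
    // Source side: called each time a new event is tagged with this group.
    void StartEvent() { m_events_in_flight.fetch_add(1); }

    // Source side: called once no further events will join the group.
    void CloseGroup() { m_closed.store(true); }

    // Processor side: must be called exactly once per tagged event.
    void FinishEvent() {
        const int previous = m_events_in_flight.fetch_sub(1);
        assert(previous > 0);  // a repeated call would drive the count negative
        (void)previous;
    }

    // Finished means: closed by the source AND no events still in flight.
    bool IsGroupFinished() const {
        return m_closed.load() && m_events_in_flight.load() == 0;
    }
};

// Usage: two events are tagged and finished before the source closes the group.
inline bool demo_group_lifecycle() {
    EventGroupSketch run_group;
    run_group.StartEvent();
    run_group.StartEvent();
    run_group.FinishEvent();
    run_group.FinishEvent();
    const bool finished_before_close = run_group.IsGroupFinished();  // still open
    run_group.CloseGroup();
    return !finished_before_close && run_group.IsGroupFinished();
}
```

Because the counter alone reaches zero whenever the processor momentarily catches up with the source, the closed flag is what distinguishes "no events in flight right now" from "no more events will ever arrive for this group".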
Stream data to and from JANA
The first question to ask is: What is the relationship between messages and events? Remember, a message is just a packet of data sent over the wire, whereas an event is JANA’s main unit of independent computation, corresponding to all data associated with one physics interaction. The answer will depend on:
What systems already exist upstream, and how difficult they are to change
The expected size of each event
Whether event building is handled upstream or within JANA
If events are large enough (>0.5MB), the cleanest thing to do is to establish a one-to-one relationship between messages and events. JANA provides JStreamingEventSource to make this convenient.
If events are very small, you probably want many events in one message. A corresponding helper class does not exist yet, but would be a straightforward adaptation of the above.
If upstream doesn’t do any event building (e.g. it is reading out ADC samples over a fixed time window) you probably want to have JANA determine physically meaningful event boundaries, maybe even incorporating a software L2 trigger. This is considerably more complicated, and is discussed in the event building how-to instead.
For the remainder of this how-to we assume that messages and events are one-to-one.
The second question to ask is: What transport should be used?
JANA makes it so that the message format and transport can be varied independently. The transport wrapper need only implement the JTransport interface, which is essentially just:
The key detail is that both send and receive should block until data has finished transferring to/from the JMessage buffer so that the buffer may be accessed by the caller with no additional synchronization. If there are no pending messages, receive should return TRYAGAIN immediately so as not to block the event source. In contrast, send must block until it succeeds, as otherwise there will be data loss.
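To make the send/receive contract concrete, here is a minimal in-memory sketch. All names in it (MessageSketch, TransportResult, TransportSketch, LoopbackTransport) are hypothetical and chosen for illustration; it does not reproduce JANA's JTransport or JMessage headers, and a real transport would move data over a socket instead of a queue.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical message type: a plain byte buffer.
struct MessageSketch {
    std::string payload;
};

// Only the two outcomes discussed in the text are modeled here.
enum class TransportResult { SUCCESS, TRYAGAIN };

// Hypothetical interface with the blocking/non-blocking contract from the text.
struct TransportSketch {
    virtual ~TransportSketch() = default;
    // Returns only after the payload has been fully handed over.
    virtual TransportResult send(const MessageSketch& src) = 0;
    // Returns TRYAGAIN immediately when nothing is pending.
    virtual TransportResult receive(MessageSketch& dest) = 0;
};

// In-memory loopback: messages that are sent can be received again later.
class LoopbackTransport : public TransportSketch {
    std::mutex m_mutex;
    std::deque<std::string> m_pending;

public:
    TransportResult send(const MessageSketch& src) override {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push_back(src.payload);  // copy completes before returning
        return TransportResult::SUCCESS;
    }

    TransportResult receive(MessageSketch& dest) override {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_pending.empty()) return TransportResult::TRYAGAIN;  // never waits
        dest.payload = std::move(m_pending.front());
        m_pending.pop_front();
        return TransportResult::SUCCESS;  // dest is now safe to read
    }
};
```

An event source polling such a transport would simply skip the current cycle on TRYAGAIN and build an event from the buffer on SUCCESS, with no extra locking around the buffer itself.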
An implementation already exists for ZeroMQ. See examples/JExample7/ZmqTransport.h
The final and most important question to ask is: What is the message format?
Message formats each get their own class, which must inherit from the JMessage and JEventMessage interfaces.
Using the JANA CLI
JANA is typically run like this:
$JANA_HOME/bin/jana -Pplugins=JTest -Pnthreads=8 ~/data/inputfile.txt
Note that the JANA executable won’t do anything until you provide plugins. A simple plugin is provided called JTest, which verifies that everything is working and optionally does a quick performance benchmark. Additional simple plugins are provided in src/examples. Instructions on how to write your own are given in the Tutorial section.
Along with specifying plugins, you need to specify the input files containing the events you wish to process. Note that JTest ignores these and crunches randomly generated data instead.
The command-line flags are:
Short | Long | Meaning |
---|---|---|
-h | --help | Display help message |
-v | --version | Display version information |
-c | --configs | Display configuration parameters |
-l | --loadconfigs | Load configuration parameters from file |
-d | --dumpconfigs | Dump configuration parameters to file |
-b | --benchmark | Run JANA in benchmark mode |
-P | | Specify a configuration parameter (see below) |
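The shape of a -Pname=value argument, as used in the command line above, can be illustrated with a small standalone parser. This is a sketch for explanation only and is not how the jana executable parses its arguments; ParsedArgs and parse_args are hypothetical names, and the sketch assumes that anything not starting with a dash is an input file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type: parameters from -Pname=value plus input files.
struct ParsedArgs {
    std::map<std::string, std::string> params;
    std::vector<std::string> inputs;
};

// Split arguments into configuration parameters and input file names.
inline ParsedArgs parse_args(const std::vector<std::string>& args) {
    ParsedArgs parsed;
    for (const std::string& arg : args) {
        if (arg.compare(0, 2, "-P") == 0) {
            const std::size_t eq = arg.find('=');
            if (eq == std::string::npos) continue;  // no value given: ignored here
            // The name sits between "-P" and '=', the value after '='.
            parsed.params[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
        } else if (!arg.empty() && arg[0] != '-') {
            parsed.inputs.push_back(arg);  // assumed to be an input file
        }
        // Other flags such as -b are outside the scope of this sketch.
    }
    return parsed;
}
```

With the arguments from the example command, this yields the parameters plugins=JTest and nthreads=8 and a single input file. Parameter names containing a colon, such as jana:debug_plugin_loading, need no special handling because only the first '=' separates name from value.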
Configuring JANA
JANA provides a parameter manager so that configuration options may be controlled via code, command-line args, and config files in a consistent and self-documenting way. Plugins are free to request any existing parameters or register their own.
The following configuration options are used most commonly:
Name | Type | Description |
---|---|---|
nthreads | int | Size of thread team (Defaults to the number of cores on your machine) |
plugins | string | Comma-separated list of plugin filenames. JANA will look for these on the |
plugins_to_ignore | string | This removes plugins which had been specified in |
event_source_type | string | Manually override JANA’s decision about which JEventSource to use |
jana:nevents | int | Limit the number of events each source may emit |
jana:nskip | int | Skip processing the first n events from each event source |
jana:extended_report | bool | The amount of status information to show while running |
jana:status_fname | string | Named pipe for retrieving status information remotely |
JANA has its own logger. You can control the verbosity of different components using the parameters log:off, log:fatal, log:error, log:warn, log:info, log:debug, and log:trace. The following example shows how you would increase the verbosity of JPluginLoader and JComponentManager:
jana -Pplugins=JTest -Plog:debug=JPluginLoader,JComponentManager
The following parameters are used for benchmarking:
Name | Type | Default | Description |
---|---|---|---|
benchmark:nsamples | int | 15 | Number of measurements made for each thread count |
benchmark:minthreads | int | 1 | Minimum thread count |
benchmark:maxthreads | int | ncores | Maximum thread count |
benchmark:threadstep | int | 1 | Thread count increment |
benchmark:resultsdir | string | JANA_Test_Results | Directory name for benchmark test results |
The following parameters may come in handy when doing performance tuning:
Name | Type | Default | Description |
---|---|---|---|
jana:engine | int | 0 | Which parallelism engine to use. 0: JArrowProcessingController. 1: JDebugProcessingController. |
jana:event_pool_size | int | nthreads | The number of events which may be in-flight at once |
jana:limit_total_events_in_flight | bool | 1 | Whether the number of in-flight events should be limited |
jana:affinity | int | 0 | Thread pinning strategy. 0: None. 1: Minimize number of memory localities. 2: Minimize number of hyperthreads. |
jana:locality | int | 0 | Memory locality strategy. 0: Global. 1: Socket-local. 2: Numa-domain-local. 3: Core-local. 4: Cpu-local. |
jana:enable_stealing | bool | 0 | Allow threads to pick up work from a different memory location if their local mailbox is empty. |
jana:event_queue_threshold | int | 80 | Mailbox buffer size |
jana:event_source_chunksize | int | 40 | Reduce mailbox contention by chunking work assignments |
jana:event_processor_chunksize | int | 1 | Reduce mailbox contention by chunking work assignments |
Creating code skeletons
JANA provides a script, $JANA_HOME/bin/jana-generate.py, which generates code skeletons for different kinds of JANA components as well as entire project structures. These are intended to compile and run with zero or minimal modification, to provide all of the boilerplate needed, and to include comments explaining what each piece of boilerplate does and what the user is expected to add. The aim is to demonstrate idiomatic usage of the JANA framework and reduce the learning curve as much as possible.
Complete projects
The ‘project’ skeleton lays out the recommended structure for a complex experiment with multiple plugins, a domain model which is shared between plugins, and a custom executable. In general, each experiment is expected to have one project.
jana-generate.py project ProjectName
Project plugins
Project plugins are used to modularize some functionality within the context of an existing project. Not only does this help separate concerns, so that many members of a collaboration can work together without interfering with one another, but it also helps manage the complexity arising from build dependencies. Some scientific software stubbornly refuses to build on certain platforms, and plugins are a much cleaner solution than the traditional mix of environment variables, build system variables, and preprocessor macros. Project plugins include one JEventProcessor by default.
jana-generate.py ProjectPlugin PluginNameInCamelCase
Mini plugins
Mini plugins are project plugins which have been stripped down to a single cc file. They are useful when someone wants to do a quick analysis and doesn’t need or want the additional boilerplate. They include one JEventProcessor with support for ROOT histograms. There are two options:
jana-generate.py MiniStandalonePlugin PluginNameInCamelCase
jana-generate.py MiniProjectPlugin PluginNameInCamelCase
Standalone plugins
Standalone plugins are useful for getting started quickly. They are also effective when someone wishes to integrate with an existing project but wants their analyses to live in a separate repository.
jana-generate.py StandalonePlugin PluginNameInCamelCase
Executables
Executables are useful when using the provided $JANA_HOME/bin/jana is inconvenient. This may be because the project is sufficiently simple that multiple plugins aren’t even needed, or because the project is sufficiently complex that specialized configuration is needed before loading any other plugins.
jana-generate.py Executable ExecutableNameInCamelCase
JEventSources
jana-generate.py JEventSource NameInCamelCase
JEventProcessors
jana-generate.py JEventProcessor NameInCamelCase
JEventProcessors which output to ROOT
This JEventProcessor includes the boilerplate for creating a ROOT histogram in a specific virtual subdirectory of a TFile. If this TFile is shared among different JEventProcessors, it should be encapsulated in a JService. Otherwise, it can be specified as a simple parameter. We recommend naming the subdirectory after the plugin name. E.g. a trk_eff plugin contains a TrackingEfficiencyProcessor which writes all of its results to the trk_eff subdirectory of the TFile.
jana-generate.py RootEventProcessor ProcessorNameInCamelCase directory_name_in_snake_case
Note that this script, like the others, does not update your CMakeLists.txt. Not only will you need to add the file to PluginName_PLUGIN_SOURCES, but you may need to add ROOT as a dependency if your project hasn’t yet:
find_package(ROOT)
include_directories(${ROOT_INCLUDE_DIRS})
link_directories(${ROOT_LIBRARY_DIR})
target_link_libraries(${PLUGIN_NAME} ${ROOT_LIBRARIES})
JFactories
Because JFactories are templates parameterized by the type of JObjects they produce, we need two arguments to generate them. The naming convention is left up to the user, but the following is recommended. If the JObject name is ‘RecoTrack’, and the factory uses Genfit under the hood, the factory name should be ‘RecoTrackFactory_Genfit’.
jana-generate.py JFactory JFactoryNameInCamelCase JObjectNameInCamelCase
Run the Status Control Debugger GUI
The JANA Status/Control/Debugger GUI can be a useful tool for probing a running process. Details can be found on the dedicated page for the GUI.
Using factory metadata
The JFactoryT<T> interface abstracts the creation logic for a vector of n objects of type T. However, often we also care about single pieces of data associated with the same computation. For instance, a track fitting factory might want to return statistics about how many fits succeeded and failed.
A naive solution is to put member variables on the factory and then access them from a JEventProcessor by obtaining the JFactoryT<T> via GetFactory<> and performing a dynamic cast to the underlying factory type. Although this works, it means that the factory can no longer be swapped with an alternate version without modifying the calling code. This degrades the whole project’s ability to take advantage of the plugin architecture and hurts its overall code quality.
Instead, we recommend using the JMetadata template trait. Each JFactoryT<T> not only produces a vector of T, but also a singular JMetadata<T> struct whose contents can be completely arbitrary, but cannot be redefined for a particular T. Every JFactoryT<T> for a given T uses the same JMetadata<T>.
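The trait pattern itself can be shown in a few lines of standalone C++. The sketch below uses hypothetical names (MetadataSketch, FactorySketch, SimpleTrackFactory, Track) and does not reproduce JANA's JMetadata or JFactoryT headers; it only illustrates how specializing a template per object type gives every factory for that type the same metadata struct, so callers never need a dynamic cast.

```cpp
#include <cassert>
#include <vector>

// Primary template: by default an object type carries no metadata.
template <typename T>
struct MetadataSketch {};

// Hypothetical data object.
struct Track {
    double chi2;
};

// One specialization per T; every factory that produces Track shares it.
template <>
struct MetadataSketch<Track> {
    int fits_succeeded = 0;
    int fits_failed = 0;
};

// Hypothetical factory base: produces a vector of T plus one metadata struct.
template <typename T>
class FactorySketch {
protected:
    std::vector<T> m_data;
    MetadataSketch<T> m_metadata;

public:
    virtual ~FactorySketch() = default;
    virtual void Process() = 0;
    const std::vector<T>& GetData() const { return m_data; }
    const MetadataSketch<T>& GetMetadata() const { return m_metadata; }
};

// One possible implementation; a different fitter could replace it freely.
class SimpleTrackFactory : public FactorySketch<Track> {
public:
    void Process() override {
        const double candidate_chi2[] = {1.2, 57.0, 0.8};  // made-up inputs
        for (double chi2 : candidate_chi2) {
            if (chi2 < 10.0) {  // arbitrary quality cut for the sketch
                m_data.push_back(Track{chi2});
                ++m_metadata.fits_succeeded;
            } else {
                ++m_metadata.fits_failed;
            }
        }
    }
};

// The caller depends only on FactorySketch<Track>, not on the concrete factory.
inline int count_failed_fits(FactorySketch<Track>& factory) {
    factory.Process();
    return factory.GetMetadata().fits_failed;
}
```

Since count_failed_fits only sees the base interface, swapping SimpleTrackFactory for another Track factory requires no change to the calling code, which is the property the dynamic-cast approach gives up.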
An example project demonstrating usage of JMetadata can be found under examples/MetadataExample.