Instructions:

A user has two options for interacting with the web application:

Input Data Format:

We provide an example CSV file that shows the input data format. The first three columns are metadata: job_id, node_id, and timestamp. A scheduler typically assigns each job a unique job_id, whereas node_ids are not unique across jobs. Because LDMS collects telemetry from each compute node and our framework reports anomaly diagnosis results per compute node, we use the combination of job_id and node_id as the unique identifier.
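As an illustration of the composite identifier, the following minimal sketch groups telemetry rows by the (job_id, node_id) pair using only the Python standard library. The sample rows and the helper name group_by_job_and_node are hypothetical and are not part of the web application; only the three metadata column names and the MemTotal::meminfo metric name come from the description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the described layout; the values are made up.
SAMPLE_CSV = """job_id,node_id,timestamp,MemTotal::meminfo
1001,7,0,65536
1001,7,1,65536
1001,8,0,65536
1002,7,0,32768
"""


def group_by_job_and_node(csv_text):
    """Group telemetry rows by the (job_id, node_id) pair."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # node_id alone is ambiguous (node 7 appears in two jobs here),
        # so the key combines both metadata columns.
        groups[(row["job_id"], row["node_id"])].append(row)
    return dict(groups)


groups = group_by_job_and_node(SAMPLE_CSV)
for key, rows in sorted(groups.items()):
    print(key, len(rows))
```

In this sample, node 7 appears under both job 1001 and job 1002, which is why node_id on its own would not identify a time series.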

The remaining columns are metrics collected via LDMS, such as MemTotal::meminfo, processes::procstat, and compact_pagemigrate_failed::vmstat. Each column name has two parts separated by "::": the first part is the metric name, and the second part is the subsystem from which the metric is collected.
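The naming convention can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that the "::" separator appears exactly once in each metric column name.

```python
METADATA_COLUMNS = ("job_id", "node_id", "timestamp")


def split_metric_column(column):
    """Split 'metric::subsystem' into a (metric, subsystem) tuple."""
    metric, sep, subsystem = column.partition("::")
    if not sep or not metric or not subsystem:
        raise ValueError(f"not a metric column: {column!r}")
    return metric, subsystem


def metrics_by_subsystem(header):
    """Map each subsystem to the metric names found in a CSV header."""
    result = {}
    for column in header:
        if column in METADATA_COLUMNS:
            continue  # metadata columns carry no subsystem suffix
        metric, subsystem = split_metric_column(column)
        result.setdefault(subsystem, []).append(metric)
    return result


header = [
    "job_id", "node_id", "timestamp",
    "MemTotal::meminfo", "processes::procstat",
    "compact_pagemigrate_failed::vmstat",
]
print(metrics_by_subsystem(header))
```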

Output:

The first section gives an overview of the results. It reports the number of unique job_ids and the percentage breakdown of each detected anomaly type in the uploaded telemetry dataset. It also lists the top 5 most influential features identified by the trained model.
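The two overview statistics can be computed along the following lines. This is a sketch under stated assumptions, not the application's actual code: it assumes one predicted label per (job_id, node_id) pair, and the label strings and the function name summarize are hypothetical.

```python
from collections import Counter


def summarize(predictions):
    """Return (number of unique job_ids, {label: percentage}).

    `predictions` is an assumed format: a list of
    (job_id, node_id, label) tuples, one per diagnosed node.
    """
    unique_jobs = {job_id for job_id, _node_id, _label in predictions}
    counts = Counter(label for _job_id, _node_id, label in predictions)
    total = sum(counts.values())
    percentages = {
        label: 100.0 * count / total for label, count in counts.items()
    }
    return len(unique_jobs), percentages


# Hypothetical diagnosis results; the label names are placeholders.
predictions = [
    ("1001", "7", "healthy"),
    ("1001", "8", "anomaly_a"),
    ("1002", "7", "healthy"),
    ("1002", "9", "anomaly_b"),
]
num_jobs, breakdown = summarize(predictions)
print(num_jobs, breakdown)
```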

After the overview, users can drill into specific results by selecting a combination of job_id and node_id. For the selected pair, they can examine details such as the ratio of each anomaly within the entire dataset and the top 5 most significant features as determined by the trained model.

A side-by-side metric comparison is also available. The diagram on the left shows the metric for the selected node_id. The diagram on the right shows the metrics from healthy-node data for the selected application type, so the two can be compared directly.

1. Choose a framework:
2. Choose an option:
3. Begin Anomaly Diagnosis