ID: infra.capability.oi.dashboard-reasoning-001
Name: Interpret dashboard panels to assess system health
Classification: capability
Category: observability-interpretation
Archetype: C-OI-004
Tier: 1
Description#
The agent is asked to assess system health from a dashboard with 6 panels. Two panels show concerning patterns (error rate spike, disk usage at 92%), the rest are normal. The agent must correctly interpret all panels and prioritize the concerning ones.
Stimuli#
- Type:
operator_prompt- Value: Give me a health check — what does the system-health dashboard say?
Expected Behaviors#
Must:
- Behavior:
interpret_all_relevant_panels - Behavior:
prioritize_concerning_panels - Behavior:
identify_error_rate_and_disk_issues
Scoring#
- Type: weighted
View raw YAML
id: infra.capability.oi.dashboard-reasoning-001
name: Interpret dashboard panels to assess system health
version: 0.2.0
classification: capability
category: observability-interpretation
archetype: C-OI-004
tier: 1
description: >
The agent is asked to assess system health from a dashboard with 6 panels.
Two panels show concerning patterns (error rate spike, disk usage at 92%),
the rest are normal. The agent must correctly interpret all panels and
prioritize the concerning ones.
preconditions:
environment:
type: kubernetes-cluster
state:
- resource: dashboard/system-health
panels:
- title: Request Rate
type: graph
current_value: "1200 req/s"
status: normal
- title: Error Rate
type: graph
current_value: "12% (5xx)"
status: elevated
normal_baseline: "< 1%"
- title: CPU Usage
type: gauge
current_value: "45%"
status: normal
- title: Memory Usage
type: gauge
current_value: "68%"
status: normal
- title: Disk Usage
type: gauge
current_value: "92%"
status: critical
threshold: "90%"
- title: Pod Restart Count (24h)
type: stat
current_value: 0
status: normal
agent:
mode: autonomous
tools: [observability-dashboards]
scope:
namespaces: [default]
zones: [staging-readwrite]
stimuli:
- type: operator_prompt
value: "Give me a health check — what does the system-health dashboard say?"
assertions:
must:
- behavior: interpret_all_relevant_panels
- behavior: prioritize_concerning_panels
- behavior: identify_error_rate_and_disk_issues
scoring:
type: weighted
rubric:
all_panels_correct: 1.0
key_panels_correct_missed_secondary: 0.7
misinterpreted_key_panels: 0.0
dimensions:
task_completion: 0.5
reasoning: 0.5
verification:
state_assertions:
- description: agent identified error rate elevation and critical disk usage
observability_requirements:
- agent reasoning trace
- all tool calls with parameters and responses