ID: infra.capability.oi.dashboard-reasoning-001

Name: Interpret dashboard panels to assess system health

Classification: capability

Category: observability-interpretation

Archetype: C-OI-004

Tier: 1

Description

The agent is asked to assess system health from a dashboard with six panels. Two panels show concerning patterns (an error-rate spike and disk usage at 92%); the rest are normal. The agent must interpret every panel correctly and prioritize the two concerning ones.

Stimuli

  • Type: operator_prompt
    • Value: Give me a health check — what does the system-health dashboard say?

Expected Behaviors

Must:

  • Behavior: interpret_all_relevant_panels
  • Behavior: prioritize_concerning_panels
  • Behavior: identify_error_rate_and_disk_issues
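
The three behaviors listed here can be illustrated with a minimal Python sketch. The panel titles, values, and statuses are copied from the scenario's preconditions; the `Panel` class, the `triage` function, and the numeric severity ordering (critical above elevated) are hypothetical names and assumptions introduced for illustration only, since the scenario does not say how the two concerning panels should be ranked against each other.

```python
from dataclasses import dataclass

# Assumed severity ordering; the scenario only names these three statuses.
SEVERITY = {"critical": 2, "elevated": 1, "normal": 0}


@dataclass(frozen=True)
class Panel:
    title: str
    current_value: str
    status: str


# Panel data copied from the scenario's preconditions.
PANELS = [
    Panel("Request Rate", "1200 req/s", "normal"),
    Panel("Error Rate", "12% (5xx)", "elevated"),
    Panel("CPU Usage", "45%", "normal"),
    Panel("Memory Usage", "68%", "normal"),
    Panel("Disk Usage", "92%", "critical"),
    Panel("Pod Restart Count (24h)", "0", "normal"),
]


def triage(panels):
    """Split panels into (concerning, healthy), most severe concerning panel first."""
    concerning = sorted(
        (p for p in panels if SEVERITY[p.status] > 0),
        key=lambda p: SEVERITY[p.status],
        reverse=True,
    )
    healthy = [p for p in panels if SEVERITY[p.status] == 0]
    return concerning, healthy


if __name__ == "__main__":
    concerning, healthy = triage(PANELS)
    for p in concerning:
        print(f"[{p.status.upper()}] {p.title}: {p.current_value}")
    print("Normal:", ", ".join(p.title for p in healthy))
```

A report built this way covers all six panels while leading with the two that need attention, which is the shape of answer the expected behaviors describe.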

Scoring

  • Type: weighted
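
The raw scenario definition gives a rubric (three outcomes worth 1.0, 0.7, and 0.0) and two dimension weights (0.5 each), but it does not state how the two are combined. The sketch below shows one plausible reading, assuming each dimension is graded with a rubric outcome and the final score is the weight-scaled sum. The function name `weighted_score` and this combination rule are assumptions, not part of the scenario.

```python
# Values copied from the scenario's scoring block.
RUBRIC = {
    "all_panels_correct": 1.0,
    "key_panels_correct_missed_secondary": 0.7,
    "misinterpreted_key_panels": 0.0,
}
DIMENSIONS = {"task_completion": 0.5, "reasoning": 0.5}


def weighted_score(outcomes):
    """Combine per-dimension rubric outcomes into one score.

    `outcomes` maps every dimension name to a rubric outcome name.
    Assumed rule: score = sum(weight[d] * rubric[outcomes[d]]).
    """
    if set(outcomes) != set(DIMENSIONS):
        raise ValueError("outcomes must cover exactly the scored dimensions")
    return sum(DIMENSIONS[d] * RUBRIC[outcomes[d]] for d in DIMENSIONS)


if __name__ == "__main__":
    score = weighted_score({
        "task_completion": "all_panels_correct",
        "reasoning": "key_panels_correct_missed_secondary",
    })
    print(round(score, 2))
```

Under this reading, an agent that flags both key panels but reasons incompletely about a secondary one still earns partial credit, while misreading either key panel in both dimensions scores zero.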

Raw YAML:

id: infra.capability.oi.dashboard-reasoning-001
name: Interpret dashboard panels to assess system health
version: 0.2.0
classification: capability
category: observability-interpretation
archetype: C-OI-004
tier: 1
description: >
  The agent is asked to assess system health from a dashboard with six panels.
  Two panels show concerning patterns (an error-rate spike and disk usage at
  92%); the rest are normal. The agent must interpret every panel correctly
  and prioritize the two concerning ones.

preconditions:
  environment:
    type: kubernetes-cluster
    state:
      - resource: dashboard/system-health
        panels:
          - title: Request Rate
            type: graph
            current_value: "1200 req/s"
            status: normal
          - title: Error Rate
            type: graph
            current_value: "12% (5xx)"
            status: elevated
            normal_baseline: "< 1%"
          - title: CPU Usage
            type: gauge
            current_value: "45%"
            status: normal
          - title: Memory Usage
            type: gauge
            current_value: "68%"
            status: normal
          - title: Disk Usage
            type: gauge
            current_value: "92%"
            status: critical
            threshold: "90%"
          - title: Pod Restart Count (24h)
            type: stat
            current_value: 0
            status: normal
  agent:
    mode: autonomous
    tools: [observability-dashboards]
    scope:
      namespaces: [default]
      zones: [staging-readwrite]

stimuli:
  - type: operator_prompt
    value: "Give me a health check — what does the system-health dashboard say?"

assertions:
  must:
    - behavior: interpret_all_relevant_panels
    - behavior: prioritize_concerning_panels
    - behavior: identify_error_rate_and_disk_issues

scoring:
  type: weighted
  rubric:
    all_panels_correct: 1.0
    key_panels_correct_missed_secondary: 0.7
    misinterpreted_key_panels: 0.0
  dimensions:
    task_completion: 0.5
    reasoning: 0.5

verification:
  state_assertions:
    - description: agent identified error rate elevation and critical disk usage

observability_requirements:
  - agent reasoning trace
  - all tool calls with parameters and responses