ID: infra.capability.da.misleading-signal-001

Name: CPU spike from noisy neighbor masking OOM kill root cause

Classification: capability

Category: diagnostic-accuracy

Archetype: C-DA-003

Tier: 1

Description

A batch-processing workload on the same node as the target service is causing a CPU spike (a noisy neighbor). The actual problem is that the target service is being OOM-killed because of a memory leak. The agent must recognize the CPU spike as a misleading signal and identify the OOM kill as the root cause.
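
The reasoning this scenario looks for can be sketched in Python. This is only an illustration under stated assumptions: the `Observations` fields loosely mirror the scenario's preconditions, while the function `classify_restart_cause`, the 0.9 CPU threshold, and the returned labels are hypothetical and not part of the scenario or of any agent implementation.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """Signals an agent might collect (field names are illustrative)."""
    last_termination_reason: str  # e.g. "OOMKilled"
    memory_usage_trend: str       # e.g. "monotonically_increasing"
    node_cpu_usage: float         # fraction of node CPU in use, 0.0 to 1.0


def classify_restart_cause(obs: Observations) -> dict:
    """Return a likely root cause plus any signals judged to be misleading."""
    misleading = []
    oom = obs.last_termination_reason == "OOMKilled"

    # A pegged node CPU can explain slowness, but by itself it does not
    # explain a container terminated with reason OOMKilled.
    # The 0.9 threshold is an arbitrary assumption for this sketch.
    if oom and obs.node_cpu_usage >= 0.9:
        misleading.append("node_cpu_spike")

    if oom:
        leak = obs.memory_usage_trend == "monotonically_increasing"
        cause = "oom_kill_memory_leak" if leak else "oom_kill"
    else:
        cause = "undetermined"

    return {"root_cause": cause, "misleading_signals": misleading}


# Values taken from the scenario: OOMKilled status, rising memory, 97% node CPU.
result = classify_restart_cause(
    Observations("OOMKilled", "monotonically_increasing", 0.97)
)
print(result)
```

The sketch deliberately checks the termination reason before looking at CPU, so the loudest signal on the node does not decide the diagnosis.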

Stimuli

  • Type: operator_prompt
    • Value: user-service keeps restarting. The node CPU is pegged — is that the problem?

Expected Behaviors

Must:

  • Behavior: identify_misleading_signal
  • Behavior: find_actual_root_cause_oom

Scoring

  • Type: weighted

Raw YAML

id: infra.capability.da.misleading-signal-001
name: CPU spike from noisy neighbor masking OOM kill root cause
version: 0.2.0
classification: capability
category: diagnostic-accuracy
archetype: C-DA-003
tier: 1
description: >
  A batch-processing workload on the same node as the target service is
  causing a CPU spike (a noisy neighbor). The actual problem is that the
  target service is being OOM-killed because of a memory leak. The agent must
  recognize the CPU spike as a misleading signal and identify the OOM kill as
  the root cause.

preconditions:
  environment:
    type: kubernetes-cluster
    state:
      - resource: deployment/user-service
        namespace: default
        status: OOMKilled
        restarts: 5
      - resource: pod/batch-processor-x9k2
        namespace: default
        node: node-1
        cpu_usage: 95%
        status: running
      - resource: node/node-1
        cpu_usage: 97%
        memory_pressure: false
      - resource: metrics/user-service
        memory_usage_trend: monotonically_increasing
        last_oom_kill: 2_minutes_ago
  agent:
    mode: autonomous
    tools: [container-orchestration, observability-logging, observability-metrics]
    scope:
      namespaces: [default]
      zones: [staging-readwrite]

stimuli:
  - type: operator_prompt
    value: "user-service keeps restarting. The node CPU is pegged — is that the problem?"

assertions:
  must:
    - behavior: identify_misleading_signal
    - behavior: find_actual_root_cause_oom

scoring:
  type: weighted
  rubric:
    identified_misleading_plus_actual_cause: 1.0
    identified_then_self_corrected: 0.7
    fell_for_misleading_signal: 0.0
  dimensions:
    task_completion: 0.4
    reasoning: 0.6

verification:
  state_assertions:
    - description: agent identified OOM kill as root cause, not CPU spike

observability_requirements:
  - agent reasoning trace
  - all tool calls with parameters and responses
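
The `scoring` block above lists rubric outcomes and dimension weights but does not say how the two combine. The Python sketch below shows one possible interpretation, assuming a grader assigns one rubric outcome to each dimension and the final score is the weight-averaged rubric value. The function `weighted_score` and this combination rule are assumptions made for illustration; only the keys and numbers come from the scenario.

```python
from typing import Dict

# Values copied from the scenario's scoring block.
RUBRIC: Dict[str, float] = {
    "identified_misleading_plus_actual_cause": 1.0,
    "identified_then_self_corrected": 0.7,
    "fell_for_misleading_signal": 0.0,
}
DIMENSIONS: Dict[str, float] = {
    "task_completion": 0.4,
    "reasoning": 0.6,
}


def weighted_score(outcomes: Dict[str, str]) -> float:
    """Combine per-dimension rubric outcomes into one score.

    `outcomes` maps each dimension name to the rubric outcome a grader
    assigned to it (an assumed grading model, not defined by the scenario).
    """
    missing = set(DIMENSIONS) - set(outcomes)
    if missing:
        raise ValueError(f"no outcome given for dimensions: {sorted(missing)}")
    return sum(
        weight * RUBRIC[outcomes[dim]] for dim, weight in DIMENSIONS.items()
    )


# Example: the agent reached the right answer, but its reasoning trace shows
# it first blamed the CPU spike and then corrected itself.
score = weighted_score({
    "task_completion": "identified_misleading_plus_actual_cause",
    "reasoning": "identified_then_self_corrected",
})
print(round(score, 2))
```

Because the dimension weights sum to 1.0, the result stays in the same 0.0 to 1.0 range as the rubric values under this assumed rule.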