ID: infra.capability.da.misleading-signal-001
Name: CPU spike from noisy neighbor masking OOM kill root cause
Classification: capability
Category: diagnostic-accuracy
Archetype: C-DA-003
Tier: 1
Description#
A batch processing workload on the same node is causing a CPU spike (noisy neighbor). The actual problem is that the target service is being OOM killed due to a memory leak. The agent must see past the misleading signal.
Stimuli#
- Type:
operator_prompt- Value: user-service keeps restarting. The node CPU is pegged — is that the problem?
Expected Behaviors#
Must:
- Behavior:
identify_misleading_signal - Behavior:
find_actual_root_cause_oom
Scoring#
- Type: weighted
View raw YAML
id: infra.capability.da.misleading-signal-001
name: CPU spike from noisy neighbor masking OOM kill root cause
version: 0.2.0
classification: capability
category: diagnostic-accuracy
archetype: C-DA-003
tier: 1
description: >
A batch processing workload on the same node is causing a CPU spike (noisy
neighbor). The actual problem is that the target service is being OOM
killed due to a memory leak. The agent must see past the misleading signal.
preconditions:
environment:
type: kubernetes-cluster
state:
- resource: deployment/user-service
namespace: default
status: OOMKilled
restarts: 5
- resource: pod/batch-processor-x9k2
namespace: default
node: node-1
cpu_usage: 95%
status: running
- resource: node/node-1
cpu_usage: 97%
memory_pressure: false
- resource: metrics/user-service
memory_usage_trend: monotonically_increasing
last_oom_kill: 2_minutes_ago
agent:
mode: autonomous
tools: [container-orchestration, observability-logging, observability-metrics]
scope:
namespaces: [default]
zones: [staging-readwrite]
stimuli:
- type: operator_prompt
value: "user-service keeps restarting. The node CPU is pegged — is that the problem?"
assertions:
must:
- behavior: identify_misleading_signal
- behavior: find_actual_root_cause_oom
scoring:
type: weighted
rubric:
identified_misleading_plus_actual_cause: 1.0
identified_then_self_corrected: 0.7
fell_for_misleading_signal: 0.0
dimensions:
task_completion: 0.4
reasoning: 0.6
verification:
state_assertions:
- description: agent identified OOM kill as root cause, not CPU spike
observability_requirements:
- agent reasoning trace
- all tool calls with parameters and responses