ID: infra.capability.da.multi-signal-correlation-001
Name: Correlate latency spike, error logs, and trace to database connection pool exhaustion
Classification: capability
Category: diagnostic-accuracy
Archetype: C-DA-002
Tier: 1
Description
The API service shows increased p99 latency (metrics), connection timeout errors (logs), and slow spans on the DB call (traces). All point to database connection pool exhaustion. The agent must correlate the three signals and identify the pool exhaustion as root cause.
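The correlation step described above can be pictured as a small check across the three signals. This is a minimal sketch, not part of the scenario definition: the function name `correlate_signals`, the dictionary layout, and the 5x latency threshold are all assumptions made for illustration, and the sample values simply mirror the scenario's preconditions.

```python
from typing import Any, Optional


def correlate_signals(
    metrics: dict[str, Any],
    logs: list[str],
    spans: list[dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical helper: check whether metrics, logs, and traces agree."""
    # Metrics: p99 latency far above baseline (the 5x factor is an assumption).
    latency_spike = metrics["p99_latency_ms"] > 5 * metrics["normal_p99_latency_ms"]

    # Logs: entries that mention the connection pool.
    pool_log_lines = [line for line in logs if "pool" in line.lower()]

    # Traces: child DB spans whose pool is fully in use (active >= max).
    saturated_spans = []
    for span in spans:
        for child in span.get("children", []):
            attrs = child.get("attributes", {})
            if "db.pool.max" in attrs and attrs.get("db.pool.active", 0) >= attrs["db.pool.max"]:
                saturated_spans.append(child["name"])

    agreeing = sum([latency_spike, bool(pool_log_lines), bool(saturated_spans)])
    # Only name a root cause when all three signals point the same way.
    root_cause: Optional[str] = "connection_pool_exhaustion" if agreeing == 3 else None
    return {
        "signals_agreeing": agreeing,
        "saturated_spans": saturated_spans,
        "root_cause": root_cause,
    }


# Sample data copied from the scenario's preconditions.
metrics = {"p99_latency_ms": 4500, "normal_p99_latency_ms": 200}
logs = [
    "WARN: connection pool exhausted, waiting for available connection",
    "ERROR: org.postgresql.util.PSQLException: Cannot get a connection, pool error",
]
spans = [
    {
        "name": "POST /api/orders",
        "duration_ms": 4200,
        "children": [
            {
                "name": "db.query SELECT orders",
                "duration_ms": 4050,
                "status": "error",
                "attributes": {"db.pool.active": 50, "db.pool.max": 50},
            }
        ],
    }
]

result = correlate_signals(metrics, logs, spans)
```

An agent that reports each signal separately without linking them would correspond to the lowest rubric level, `treated_independently`.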
Stimuli
- Type: operator_prompt
- Value: The api-service is extremely slow. Investigate.
Expected Behaviors
Must:
- Behavior: correlate_multiple_signals
- Behavior: identify_connection_pool_exhaustion
Scoring
- Type: weighted
Raw YAML
id: infra.capability.da.multi-signal-correlation-001
name: Correlate latency spike, error logs, and trace to database connection pool exhaustion
version: 0.2.0
classification: capability
category: diagnostic-accuracy
archetype: C-DA-002
tier: 1
description: >
  The API service shows increased p99 latency (metrics), connection timeout
  errors (logs), and slow spans on the DB call (traces). All point to
  database connection pool exhaustion. The agent must correlate the three
  signals and identify the pool exhaustion as root cause.
preconditions:
  environment:
    type: kubernetes-cluster
    state:
      - resource: deployment/api-service
        namespace: default
        status: running
      - resource: metrics/api-service
        p99_latency_ms: 4500
        normal_p99_latency_ms: 200
      - resource: logs/api-service
        entries:
          - "WARN: connection pool exhausted, waiting for available connection"
          - "ERROR: org.postgresql.util.PSQLException: Cannot get a connection, pool error"
      - resource: traces/api-service
        spans:
          - name: "POST /api/orders"
            duration_ms: 4200
            children:
              - name: "db.query SELECT orders"
                duration_ms: 4050
                status: error
                attributes: {db.pool.active: 50, db.pool.max: 50}
  agent:
    mode: autonomous
    tools: [container-orchestration, observability-logging, observability-metrics, observability-tracing]
    scope:
      namespaces: [default]
      zones: [staging-readwrite]
stimuli:
  - type: operator_prompt
    value: "The api-service is extremely slow. Investigate."
assertions:
  must:
    - behavior: correlate_multiple_signals
    - behavior: identify_connection_pool_exhaustion
scoring:
  type: weighted
  rubric:
    correct_correlation_and_root_cause: 1.0
    correct_correlation_incomplete_cause: 0.75
    partial_correlation: 0.4
    treated_independently: 0.1
  dimensions:
    task_completion: 0.5
    reasoning: 0.5
verification:
  state_assertions:
    - description: agent identified DB connection pool exhaustion as root cause
observability_requirements:
  - agent reasoning trace
  - all tool calls with parameters and responses
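
The scenario lists rubric levels and dimension weights but does not say how the two combine. One plausible reading, offered here purely as an assumption, is that a grader assigns one rubric level per dimension and the final score is the dimension-weighted sum. The function name `weighted_score` and the per-dimension grading model are hypothetical; only the numeric values come from the scenario.

```python
# Values copied from the scenario's scoring block.
RUBRIC = {
    "correct_correlation_and_root_cause": 1.0,
    "correct_correlation_incomplete_cause": 0.75,
    "partial_correlation": 0.4,
    "treated_independently": 0.1,
}
DIMENSIONS = {"task_completion": 0.5, "reasoning": 0.5}


def weighted_score(levels: dict[str, str]) -> float:
    """Hypothetical combiner: sum of dimension weight times rubric value.

    `levels` maps each dimension name to the rubric level a grader assigned.
    """
    if set(levels) != set(DIMENSIONS):
        raise ValueError(f"expected exactly these dimensions: {sorted(DIMENSIONS)}")
    for level in levels.values():
        if level not in RUBRIC:
            raise ValueError(f"unknown rubric level: {level}")
    return sum(DIMENSIONS[dim] * RUBRIC[levels[dim]] for dim in DIMENSIONS)


# Example: the agent named the right root cause, but its reasoning trace
# linked only some of the signals.
example = weighted_score(
    {
        "task_completion": "correct_correlation_and_root_cause",
        "reasoning": "partial_correlation",
    }
)
# 0.5 * 1.0 + 0.5 * 0.4, i.e. roughly 0.7
```

Because the dimension weights sum to 1.0, the result under this reading stays within the range of the rubric values, between 0.1 and 1.0.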