ID: infra.capability.da.multi-signal-correlation-001

Name: Correlate latency spike, error logs, and trace to database connection pool exhaustion

Classification: capability

Category: diagnostic-accuracy

Archetype: C-DA-002

Tier: 1

Description

The API service shows increased p99 latency (metrics), connection timeout errors (logs), and slow spans on the DB call (traces). All three point to database connection pool exhaustion. The agent must correlate the three signals and identify pool exhaustion as the root cause.
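
As a rough illustration of what correlating the three signals could look like, the sketch below checks each signal against the fixture values from this scenario and reports pool exhaustion only when all of them agree. The `Signals` dataclass, the `correlate_signals` function, the 5x latency threshold, and the 0.9 span-share threshold are assumptions made for illustration; none of them is part of the scenario definition or of any agent's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical container for the three observability inputs."""
    p99_latency_ms: float
    normal_p99_latency_ms: float
    log_entries: list[str]
    pool_active: int
    pool_max: int
    db_span_ms: float
    request_span_ms: float


def correlate_signals(s: Signals, latency_factor: float = 5.0,
                      db_share: float = 0.9) -> dict:
    """Return per-signal findings and whether they jointly indicate pool exhaustion.

    latency_factor and db_share are assumed thresholds, not scenario values.
    """
    # Metrics: p99 is far above its normal baseline.
    latency_spike = s.p99_latency_ms >= latency_factor * s.normal_p99_latency_ms
    # Logs: at least one entry mentions the connection pool.
    pool_errors = any("pool" in entry.lower() for entry in s.log_entries)
    # Traces: the pool is at capacity and the DB span dominates the request.
    pool_saturated = s.pool_active >= s.pool_max
    db_dominates = s.db_span_ms / s.request_span_ms > db_share
    trace_signal = pool_saturated and db_dominates
    return {
        "latency_spike": latency_spike,
        "pool_errors_in_logs": pool_errors,
        "pool_saturated_in_trace": trace_signal,
        "root_cause_pool_exhaustion": latency_spike and pool_errors and trace_signal,
    }


# Values copied from the scenario's preconditions.
fixture = Signals(
    p99_latency_ms=4500,
    normal_p99_latency_ms=200,
    log_entries=[
        "WARN: connection pool exhausted, waiting for available connection",
        "ERROR: org.postgresql.util.PSQLException: Cannot get a connection, pool error",
    ],
    pool_active=50,
    pool_max=50,
    db_span_ms=4050,
    request_span_ms=4200,
)
result = correlate_signals(fixture)
print(result)
```

Requiring all three findings before naming a root cause mirrors the scenario's intent: an agent that notices only the latency spike, or only the log errors, has not correlated the signals.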

Stimuli

  • Type: operator_prompt
    • Value: The api-service is extremely slow. Investigate.

Expected Behaviors

Must:

  • Behavior: correlate_multiple_signals
  • Behavior: identify_connection_pool_exhaustion

Scoring

  • Type: weighted
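
The scenario's YAML defines a rubric of four outcome levels and two equally weighted dimensions, but it does not state how they combine. One plausible reading, sketched below, is that a grader assigns a rubric level to each dimension and the final score is the weighted sum. The `weighted_score` function and the per-dimension grading are assumptions for illustration, not the benchmark's documented scoring procedure.

```python
# Rubric levels and dimension weights copied from the scenario's YAML.
RUBRIC = {
    "correct_correlation_and_root_cause": 1.0,
    "correct_correlation_incomplete_cause": 0.75,
    "partial_correlation": 0.4,
    "treated_independently": 0.1,
}
DIMENSIONS = {"task_completion": 0.5, "reasoning": 0.5}


def weighted_score(levels: dict[str, str]) -> float:
    """Combine per-dimension rubric levels into one score (assumed formula)."""
    if set(levels) != set(DIMENSIONS):
        raise ValueError("one rubric level is required per dimension")
    # Weighted sum: each dimension's weight times its rubric level's value.
    return sum(DIMENSIONS[dim] * RUBRIC[level] for dim, level in levels.items())


# Hypothetical grading: the root cause was found, but the reasoning
# only partially tied the signals together.
score = weighted_score({
    "task_completion": "correct_correlation_and_root_cause",
    "reasoning": "partial_correlation",
})
print(score)
```

Under this assumed formula the example works out to 0.5 * 1.0 + 0.5 * 0.4 = 0.7, and an agent graded at the top level on both dimensions would score 1.0.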

Raw YAML:

id: infra.capability.da.multi-signal-correlation-001
name: Correlate latency spike, error logs, and trace to database connection pool exhaustion
version: 0.2.0
classification: capability
category: diagnostic-accuracy
archetype: C-DA-002
tier: 1
description: >
  The API service shows increased p99 latency (metrics), connection timeout
  errors (logs), and slow spans on the DB call (traces). All three point to
  database connection pool exhaustion. The agent must correlate the three
  signals and identify pool exhaustion as the root cause.

preconditions:
  environment:
    type: kubernetes-cluster
    state:
      - resource: deployment/api-service
        namespace: default
        status: running
      - resource: metrics/api-service
        p99_latency_ms: 4500
        normal_p99_latency_ms: 200
      - resource: logs/api-service
        entries:
          - "WARN: connection pool exhausted, waiting for available connection"
          - "ERROR: org.postgresql.util.PSQLException: Cannot get a connection, pool error"
      - resource: traces/api-service
        spans:
          - name: "POST /api/orders"
            duration_ms: 4200
            children:
              - name: "db.query SELECT orders"
                duration_ms: 4050
                status: error
                attributes: {db.pool.active: 50, db.pool.max: 50}
  agent:
    mode: autonomous
    tools: [container-orchestration, observability-logging, observability-metrics, observability-tracing]
    scope:
      namespaces: [default]
      zones: [staging-readwrite]

stimuli:
  - type: operator_prompt
    value: "The api-service is extremely slow. Investigate."

assertions:
  must:
    - behavior: correlate_multiple_signals
    - behavior: identify_connection_pool_exhaustion

scoring:
  type: weighted
  rubric:
    correct_correlation_and_root_cause: 1.0
    correct_correlation_incomplete_cause: 0.75
    partial_correlation: 0.4
    treated_independently: 0.1
  dimensions:
    task_completion: 0.5
    reasoning: 0.5

verification:
  state_assertions:
    - description: agent identified DB connection pool exhaustion as root cause

observability_requirements:
  - agent reasoning trace
  - all tool calls with parameters and responses