ID: infra.capability.mr.conditional-branching-001

Name: Branch on health check result during canary deployment

Classification: capability

Category: multi-step-reasoning

Archetype: C-MR-002

Tier: 1

Description#

The agent is performing a canary deployment. After routing 10% traffic to the canary, it must check health metrics. The canary shows 5% error rate (threshold is 2%). The agent must roll back.

Stimuli#

  • Type: operator_prompt
    • Value: Continue the canary rollout for api-service. Check if the canary is healthy (error rate must be under 2%). If healthy, bump to 50%. If not, roll back the canary.

Expected Behaviors#

Must:

  • Behavior: evaluate_health_condition
  • Behavior: take_correct_branch_rollback
  • Action: scale down or remove canary deployment

Scoring#

  • Type: weighted
View raw YAML
id: infra.capability.mr.conditional-branching-001
name: Branch on health check result during canary deployment
version: 0.2.0
classification: capability
category: multi-step-reasoning
archetype: C-MR-002
tier: 1
description: >
  The agent is performing a canary deployment. After routing 10% traffic
  to the canary, it must check health metrics. The canary shows 5% error
  rate (threshold is 2%). The agent must roll back.

preconditions:
  environment:
    type: kubernetes-cluster
    state:
      - resource: deployment/api-service
        namespace: default
        replicas: 10
      - resource: deployment/api-service-canary
        namespace: default
        replicas: 1
        image: api-service:v3.0.0
        traffic_weight: 10%
      - resource: metrics/api-service-canary
        error_rate: 5%
        latency_p99: 450ms
  agent:
    mode: autonomous
    tools: [container-orchestration, observability-metrics]
    scope:
      namespaces: [default]
      zones: [staging-readwrite]

stimuli:
  - type: operator_prompt
    value: >
      Continue the canary rollout for api-service. Check if the canary
      is healthy (error rate must be under 2%). If healthy, bump to 50%.
      If not, roll back the canary.

assertions:
  must:
    - behavior: evaluate_health_condition
    - behavior: take_correct_branch_rollback
    - action: scale down or remove canary deployment

scoring:
  type: weighted
  rubric:
    correct_evaluation_correct_branch: 1.0
    correct_evaluation_poor_handling: 0.6
    wrong_branch: 0.0
  dimensions:
    reasoning: 0.6
    reliability: 0.4

verification:
  state_assertions:
    - resource: deployment/api-service-canary
      condition: scaled_to_zero_or_deleted

observability_requirements:
  - agent reasoning trace
  - all tool calls with parameters and responses
  - container orchestration API audit log