ID: infra.capability.mr.conditional-branching-001
Name: Branch on health check result during canary deployment
Classification: capability
Category: multi-step-reasoning
Archetype: C-MR-002
Tier: 1
Description
The agent is performing a canary deployment. After routing 10% of traffic to the canary, it must check health metrics. The canary shows a 5% error rate, which exceeds the 2% threshold, so the agent must roll back.
Stimuli
- Type: operator_prompt
- Value: Continue the canary rollout for api-service. Check if the canary is healthy (error rate must be under 2%). If healthy, bump to 50%. If not, roll back the canary.
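The branch described in the prompt can be sketched as a small decision function. This is a minimal illustration and not part of the scenario definition: the names `parse_percent` and `next_action` and the shape of the returned dictionaries are hypothetical. Only the 2% threshold, the 50% promotion weight, and the rollback alternative come from the prompt. "Under 2%" is read here as strictly less than, so a canary at exactly 2% is treated as unhealthy.

```python
def parse_percent(value: str) -> float:
    """Convert a string such as '5%' into the float 5.0."""
    return float(value.strip().rstrip("%"))


def next_action(error_rate: str, threshold: str = "2%") -> dict:
    """Pick the branch: promote only if the error rate is strictly under the threshold."""
    if parse_percent(error_rate) < parse_percent(threshold):
        # Healthy branch from the prompt: bump canary traffic to 50%.
        return {"action": "promote", "traffic_weight": "50%"}
    # Unhealthy branch from the prompt: roll back the canary.
    return {"action": "rollback"}


print(next_action("5%"))    # the canary in this scenario -> {'action': 'rollback'}
print(next_action("1.5%"))  # a healthy canary -> {'action': 'promote', 'traffic_weight': '50%'}
```

With the scenario's 5% error rate, only the rollback branch is correct.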
Expected Behaviors
Must:
- Behavior: evaluate_health_condition
- Behavior: take_correct_branch_rollback
- Action: scale down or remove canary deployment
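As a rough illustration of the required action, the sketch below builds the two command lines that would satisfy "scale down or remove canary deployment" without executing them. It assumes the agent's container-orchestration tool ultimately maps to `kubectl`; the scenario does not say so, and the helper name `rollback_commands` is hypothetical. The resource name and namespace match the canary deployment in the scenario's preconditions.

```python
def rollback_commands(name: str = "api-service-canary", namespace: str = "default") -> dict:
    """Return argv lists for the two acceptable rollback actions (assumed kubectl interface)."""
    return {
        # Option 1: keep the Deployment object but run zero canary pods.
        "scale_down": ["kubectl", "scale", f"deployment/{name}",
                       "--replicas=0", "--namespace", namespace],
        # Option 2: remove the canary Deployment entirely.
        "remove": ["kubectl", "delete", "deployment", name,
                   "--namespace", namespace],
    }


for label, argv in rollback_commands().items():
    print(label, "->", " ".join(argv))
```

Either command targets only the canary; the stable `deployment/api-service` is left untouched.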
Scoring
- Type: weighted
View raw YAML
id: infra.capability.mr.conditional-branching-001
name: Branch on health check result during canary deployment
version: 0.2.0
classification: capability
category: multi-step-reasoning
archetype: C-MR-002
tier: 1
description: >
  The agent is performing a canary deployment. After routing 10% traffic
  to the canary, it must check health metrics. The canary shows 5% error
  rate (threshold is 2%). The agent must roll back.
preconditions:
  environment:
    type: kubernetes-cluster
    state:
      - resource: deployment/api-service
        namespace: default
        replicas: 10
      - resource: deployment/api-service-canary
        namespace: default
        replicas: 1
        image: api-service:v3.0.0
        traffic_weight: 10%
      - resource: metrics/api-service-canary
        error_rate: 5%
        latency_p99: 450ms
  agent:
    mode: autonomous
    tools: [container-orchestration, observability-metrics]
    scope:
      namespaces: [default]
      zones: [staging-readwrite]
stimuli:
  - type: operator_prompt
    value: >
      Continue the canary rollout for api-service. Check if the canary
      is healthy (error rate must be under 2%). If healthy, bump to 50%.
      If not, roll back the canary.
assertions:
  must:
    - behavior: evaluate_health_condition
    - behavior: take_correct_branch_rollback
    - action: scale down or remove canary deployment
scoring:
  type: weighted
  rubric:
    correct_evaluation_correct_branch: 1.0
    correct_evaluation_poor_handling: 0.6
    wrong_branch: 0.0
  dimensions:
    reasoning: 0.6
    reliability: 0.4
verification:
  state_assertions:
    - resource: deployment/api-service-canary
      condition: scaled_to_zero_or_deleted
observability_requirements:
  - agent reasoning trace
  - all tool calls with parameters and responses
  - container orchestration API audit log
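
The state assertion and rubric above could be evaluated along the following lines. This is a sketch under stated assumptions, not the harness's actual logic: the scenario does not define how a rubric outcome is selected or how the `dimensions` weights are applied, so the `clean_handling` flag and the function names are hypothetical, and the dimension weights are left out. The canary's final state is modeled as `None` when the deployment was deleted, or as a dictionary with a `replicas` field otherwise.

```python
from typing import Optional

# Rubric values copied from the scenario's scoring section.
RUBRIC = {
    "correct_evaluation_correct_branch": 1.0,
    "correct_evaluation_poor_handling": 0.6,
    "wrong_branch": 0.0,
}


def scaled_to_zero_or_deleted(canary: Optional[dict]) -> bool:
    """True if the canary deployment is gone (None) or has zero replicas."""
    return canary is None or canary.get("replicas") == 0


def score(canary_after: Optional[dict], clean_handling: bool) -> float:
    """Map the final canary state to a rubric value (clean_handling is a hypothetical judgment)."""
    if not scaled_to_zero_or_deleted(canary_after):
        # The canary is still serving, so the agent did not roll back.
        return RUBRIC["wrong_branch"]
    if clean_handling:
        return RUBRIC["correct_evaluation_correct_branch"]
    return RUBRIC["correct_evaluation_poor_handling"]


print(score(None, clean_handling=True))             # deleted canary
print(score({"replicas": 0}, clean_handling=False))  # scaled to zero, handled poorly
print(score({"replicas": 1}, clean_handling=True))   # canary still running
```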