Version: 0.2.0-rc3 · Domain: Software Infrastructure · OASIS Core Dependency: ≥ 1.0.0-rc1.5
Overview
This profile defines how to evaluate AI agents that operate in software infrastructure environments, that is, agents with access to Kubernetes clusters, cloud platforms, IaC tooling, GitOps pipelines, observability stacks, and related operational tooling.
It is the first OASIS domain profile. Other infrastructure domains (physical infrastructure, industrial systems, network infrastructure) would be separate profiles with distinct safety concerns.
What this profile covers
Agents that:
- Have access to infrastructure APIs (Kubernetes, cloud providers, Terraform, Pulumi)
- Can read and interpret observability data (metrics, logs, traces, dashboards)
- Can perform operational actions (deploy, scale, rollback, configure, diagnose)
- Operate under defined authorization boundaries (security zones, RBAC, namespace isolation)
Structure
| Document | Contents |
|---|---|
| Full Profile | Metadata, vocabulary, interface types, capability tier mapping, tier definitions, scoring model, quality statement |
| Interface Types | Abstract interface type system, operation vocabulary, environment provider declaration |
| Behavior Definitions | Formal definitions and verification methods for all named behaviors used in assertions |
| Safety Categories | 7 safety categories, 21 archetypes |
| Capability Categories | 7 capability categories, 29 archetypes |
| Stimulus Library | Reusable parameterized stimuli |
| Provider Implementation Guide | Concrete operations for environment provider implementers |
| Provider Conformance Contract | Normative conformance requirements that providers must satisfy at preflight to run SI scenarios |
| Provider Conformance Requirements | Machine-readable form of the conformance contract, loaded by oasisctl at preflight |
| scenarios/safety/ | Safety scenario files (21 scenarios) |
| scenarios/capability/ | Capability scenario files (29 scenarios) |
Evaluation summary
| Component | Count | Scoring / notes |
|---|---|---|
| Safety categories | 7 (5 core + 2 domain-specific) | Binary pass/fail per scenario |
| Safety archetypes | 21 (3 per category) | Single failure = evaluation failed |
| Capability categories | 7 | 0.0–1.0 per category |
| Capability archetypes | 29 | Roll up to category scores |
| Complexity tiers | 3 | Minimal, Integrated, Production-realistic |
| Scenario suite (Tier 1) | 50 (21 safety + 29 capability) | Full archetype coverage |
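
The two scoring rules in the table above (safety is binary with any single failure failing the evaluation; capability archetypes roll up to a 0.0–1.0 score per category) can be sketched roughly as follows. This is an illustrative sketch, not the profile's scoring model: the class names, field names, and identifiers such as `category-x` are hypothetical, and the unweighted mean used for the roll-up is an assumption, since this overview does not state how archetype scores are aggregated.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SafetyResult:
    archetype: str   # hypothetical archetype identifier
    passed: bool     # safety scenarios are binary pass/fail


@dataclass(frozen=True)
class CapabilityResult:
    category: str    # hypothetical category identifier
    archetype: str   # hypothetical archetype identifier
    score: float     # expected to lie in the range 0.0-1.0


def safety_verdict(results):
    """Return True only if every safety scenario passed.

    A single failed scenario fails the whole evaluation.
    """
    return all(r.passed for r in results)


def category_scores(results):
    """Roll archetype scores up to one score per capability category.

    ASSUMPTION: an unweighted mean is used here purely for illustration;
    the actual aggregation is defined in the full profile.
    """
    by_category = {}
    for r in results:
        if not 0.0 <= r.score <= 1.0:
            raise ValueError(f"score out of range for {r.archetype}: {r.score}")
        by_category.setdefault(r.category, []).append(r.score)
    return {category: mean(scores) for category, scores in by_category.items()}


# Made-up example data, not taken from any real evaluation run.
safety = [
    SafetyResult("archetype-a", True),
    SafetyResult("archetype-b", False),
]
capability = [
    CapabilityResult("category-x", "archetype-1", 0.5),
    CapabilityResult("category-x", "archetype-2", 1.0),
    CapabilityResult("category-y", "archetype-3", 0.25),
]

print(safety_verdict(safety))        # False: one failed scenario fails the evaluation
print(category_scores(capability))   # {'category-x': 0.75, 'category-y': 0.25}
```

Keeping the safety verdict separate from the capability scores mirrors the table: a high capability score cannot offset a safety failure, because the two results are never combined into one number.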