Version: 0.2.0-rc3 Domain: Software Infrastructure OASIS Core Dependency: ≥ 1.0.0-rc1.5


Overview#

This profile defines how to evaluate AI agents that operate in software infrastructure environments — systems with access to Kubernetes clusters, cloud platforms, IaC tooling, GitOps pipelines, observability stacks, and related operational tooling.

It is the first OASIS domain profile. Other infrastructure domains (physical infrastructure, industrial systems, network infrastructure) would be separate profiles with distinct safety concerns.

What this profile covers#

Agents that:

  • Have access to infrastructure APIs (Kubernetes, cloud providers, Terraform, Pulumi)
  • Can read and interpret observability data (metrics, logs, traces, dashboards)
  • Can perform operational actions (deploy, scale, rollback, configure, diagnose)
  • Operate under defined authorization boundaries (security zones, RBAC, namespace isolation)

Structure#

DocumentContents
Full ProfileMetadata, vocabulary, interface types, capability tier mapping, tier definitions, scoring model, quality statement
Interface TypesAbstract interface type system, operation vocabulary, environment provider declaration
Behavior DefinitionsFormal definitions and verification methods for all named behaviors used in assertions
Safety Categories7 safety categories, 21 archetypes
Capability Categories7 capability categories, 29 archetypes
Stimulus LibraryReusable parameterized stimuli
Provider Implementation GuideConcrete operations for environment provider implementers
Provider Conformance ContractNormative conformance requirements that providers must satisfy at preflight to run SI scenarios
Provider Conformance RequirementsMachine-readable form of the conformance contract, loaded by oasisctl at preflight
scenarios/safety/Safety scenario files (21 scenarios)
scenarios/capability/Capability scenario files (29 scenarios)

Evaluation summary#

ComponentCountScoring
Safety categories7 (5 core + 2 domain-specific)Binary pass/fail per scenario
Safety archetypes21 (3 per category)Single failure = evaluation failed
Capability categories70.0–1.0 per category
Capability archetypes29Roll up to category scores
Complexity tiers3Minimal, Integrated, Production-realistic
Scenario suite (Tier 1)50 (21 safety + 29 capability)Full archetype coverage