opena2a benchmark
Security benchmark scoring. Adapter for OASB.
Usage
opena2a benchmark [target] [options]
Description
Delegates to the Open Agent Security Benchmark (OASB), which runs 222 standardized attack scenarios and generates compliance scores and reports suitable for security audits and regulatory requirements.
This command passes all flags through to the OASB module in hackmyagent. See OASB documentation for the full reference.
Common Operations
opena2a benchmark
opena2a benchmark http://localhost:3000
opena2a benchmark --format json --ci
What is OASB
The Open Agent Security Benchmark (OASB) is a standardized framework for evaluating AI agent security. It defines 222 attack scenarios across categories such as prompt injection, tool misuse, privilege escalation, data exfiltration, and excessive agency. Each scenario is scored against a pass/fail threshold, and the aggregate results produce a compliance percentage and letter grade.
Flags
| Flag | Description |
|---|---|
| --format <text\|json\|sarif> | Output format. JSON is suitable for CI pipelines and reporting tools. |
| --ci | CI mode. Non-interactive output with exit code 1 on benchmark failures. |
| --category <name> | Run only a specific attack category (e.g., prompt-injection, tool-misuse). |
| --verbose | Show per-scenario pass/fail results. |
| --quiet | Suppress non-essential output. |
Expected Output
$ opena2a benchmark http://localhost:3000

OASB Security Benchmark
========================
Target: http://localhost:3000
Scenarios: 222 total | 198 passed | 24 failed

Category Breakdown:
  Prompt Injection       38/42 (90%)
  Tool Misuse            31/36 (86%)
  Privilege Escalation   28/30 (93%)
  Data Exfiltration      25/28 (89%)
  Excessive Agency       22/26 (85%)
  ...

Overall Compliance: 89.2%
Grade: B+
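The figures in the sample output are consistent with a plain passed/total ratio. The sketch below recomputes them in Python, assuming compliance is simply passed scenarios divided by total scenarios, rounded to one decimal place overall and to a whole percent per category. The function name `compliance_percent` is hypothetical and not part of opena2a or OASB, and the letter-grade thresholds are not documented here, so they are left out.

```python
def compliance_percent(passed: int, total: int, digits: int = 1) -> float:
    """Return passed/total as a percentage, rounded to `digits` decimals.

    Assumption: OASB compliance is a plain pass ratio (not weighted).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * passed / total, digits)


# Counts copied from the sample output; categories hidden behind "..."
# in that output are not reproduced here.
categories = {
    "Prompt Injection": (38, 42),
    "Tool Misuse": (31, 36),
    "Privilege Escalation": (28, 30),
    "Data Exfiltration": (25, 28),
    "Excessive Agency": (22, 26),
}

for name, (passed, total) in categories.items():
    print(f"{name}: {compliance_percent(passed, total, 0):.0f}%")

overall = compliance_percent(198, 222)
print(f"Overall Compliance: {overall}%")
```

The listed categories account for 162 of the 222 scenarios, so the overall figure is computed from the scenario totals rather than by averaging the category percentages.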
CI Integration
In a CI pipeline, use --ci --format json to produce machine-readable output. The command exits with code 0 when all scenarios pass and code 1 when any scenario fails. Combine with --category to run a subset of scenarios for faster feedback in pull request checks.
# GitHub Actions step
- name: Security benchmark
  run: |
    npx opena2a-cli benchmark http://localhost:3000 \
      --ci --format json > benchmark-results.json
Error Handling
If the target URL is unreachable, the command reports a connection error and exits with code 1. When running without a target argument, the command benchmarks against the current project directory using static analysis. If hackmyagent is not installed, the command prints installation instructions and exits.
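The exit-code behavior described for CI mode and for connection errors can be wrapped in a small script. The following is a minimal Python sketch, not part of opena2a itself: the helper names `build_benchmark_command` and `run_benchmark` are hypothetical, and it uses only the flags listed in the Flags table. Because exit code 1 is documented for both benchmark failures and an unreachable target, the sketch treats any nonzero code as a failed check rather than trying to tell the two apart.

```python
import subprocess
from typing import List, Optional


def build_benchmark_command(target: Optional[str] = None,
                            category: Optional[str] = None) -> List[str]:
    """Assemble the argv for a non-interactive JSON benchmark run."""
    cmd = ["npx", "opena2a-cli", "benchmark"]
    if target:
        # Without a target, the command benchmarks the current project directory.
        cmd.append(target)
    cmd += ["--ci", "--format", "json"]
    if category:
        # Restrict the run to one attack category for faster feedback.
        cmd += ["--category", category]
    return cmd


def run_benchmark(target: Optional[str] = None,
                  category: Optional[str] = None,
                  out_path: str = "benchmark-results.json") -> bool:
    """Run the benchmark, save stdout to out_path, return True on exit code 0.

    Raises FileNotFoundError if npx is not available on PATH.
    """
    cmd = build_benchmark_command(target, category)
    with open(out_path, "w", encoding="utf-8") as out:
        result = subprocess.run(cmd, stdout=out)
    return result.returncode == 0
```

In a pull request check, a call such as `run_benchmark("http://localhost:3000", category="prompt-injection")` would return `False` on any failing scenario or connection error, and the caller can then exit nonzero to fail the job.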