How AI Agents Can Run Penetration Tests With RedVeil

AI coding agents are shipping code faster than ever. They review pull requests, refactor modules, resolve tickets, and generate entire features from a prompt. But until now, there has been no standard way for an agent to answer a simple question: is this application actually secure?

RedVeil changes that. With the pentest-agent CLI, any AI agent that can execute shell commands can run a complete penetration test — from project creation to finding triage to compliance-ready reporting — without a human ever opening a browser.

This guide walks through how it works, which agent platforms support it today, and what the practical workflows look like.

The Problem: Security Is Still a Human Bottleneck

Modern development teams are automating nearly everything. Code generation, testing, deployment, monitoring — agents and pipelines handle the bulk of it. But security testing remains stubbornly manual. Someone has to log into a dashboard, configure a scan, wait for results, triage findings, and generate a report.

That loop takes hours at best and weeks at worst. It erodes the speed that agent-driven development enables, and it means security validation often happens long after the code has shipped.

Traditional pentesting is even worse. Teams schedule an engagement weeks in advance, hand off credentials and scope documents, and then wait — sometimes for a month or more — with almost no visibility into what's actually happening. There's no progress bar, no interim findings, no way to know whether the tester is actively working or the report is sitting in a review queue. The final deliverable arrives as a static PDF long after the code it assessed has already been changed, redeployed, and changed again.

The result is a gap between how fast teams build and how fast they verify what they've built.

What RedVeil Brings to the Table

RedVeil is an AI-powered penetration testing platform that delivers human-level depth at machine speed. Unlike traditional scanners that check for known patterns, RedVeil's AI agents reason through multi-step attack chains, maintain authentication state, and validate every finding through controlled exploitation. Each result comes with proof-of-concept evidence, reproduction steps, and clear remediation guidance.

Tests run on demand in hours, not weeks. Reports are audit-ready for SOC 2, ISO 27001, HIPAA, PCI, and more. And because every finding is validated before it's reported, the noise level is dramatically lower than what teams are used to from conventional tools.

That's the engine. The missing piece was a way for other AI agents to use it.

Enter pentest-agent

pentest-agent is a CLI built specifically for headless, agent-driven environments. Every command emits structured JSON, follows predictable conventions, and works without any interactive prompts. It's the bridge between your coding agent and RedVeil's penetration testing platform.

An agent with access to pentest-agent can execute the full penetration test lifecycle:

Authenticate — set a token via environment variable or use the device auth flow.
Create a project — define the target: web application, API, external network, or cloud account.
Launch and monitor scans — start, pause, resume, cancel, or schedule scans and poll for status.
Triage findings — list vulnerabilities, inspect evidence, mark false positives with justification, or trigger retests.
Generate reports — produce executive, technical, or compliance PDFs programmatically.
Track usage — estimate cost before scanning and verify budget availability.
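For headless runs, the authentication step is worth guarding so an agent never stalls on a login prompt. The following is a minimal sketch in shell. It assumes `pentest-agent auth login` (the command used in Get Started) is the interactive device flow. It uses `REDVEIL_TOKEN` when that variable is set and tries `auth login` only when a terminal is attached. Otherwise it fails fast. `ensure_auth` and `PENTEST_AGENT_CMD` are hypothetical names, not part of the CLI.

```shell
# ensure_auth (hypothetical helper): choose an auth path that never blocks a headless run.
# Assumes `auth login` is the interactive device flow, so it is only attempted on a terminal.
# PENTEST_AGENT_CMD (hypothetical) lets callers substitute e.g. "npx pentest-agent".
ensure_auth() {
  if [ -n "${REDVEIL_TOKEN:-}" ]; then
    export REDVEIL_TOKEN   # make sure child processes such as the CLI can see it
    return 0
  fi
  if [ -t 0 ]; then
    # A human is at the keyboard and can complete the device flow.
    ${PENTEST_AGENT_CMD:-pentest-agent} auth login
    return
  fi
  # No token and no terminal: fail immediately instead of waiting on a prompt.
  echo "REDVEIL_TOKEN is not set and no terminal is attached for 'auth login'" >&2
  return 1
}
```

An agent can call `ensure_auth || exit 1` before anything else. In CI and sandboxes this turns a missing secret into an immediate, readable failure.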

Here's what a full autonomous workflow looks like in practice:

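# Provide the API token non-interactively (REDVEIL_SECRET stands in for your secret store)
export REDVEIL_TOKEN="$REDVEIL_SECRET"

# Create the project and capture its ID from the JSON response
PROJECT=$(pentest-agent project create webapp \
  --name "Acme App" \
  --target https://app.acme.com \
  --auth-type bearer \
  --bearer-token @/secrets/bearer.txt \
  --json)
PROJECT_ID=$(echo "$PROJECT" | jq -r '._id')

# Start the scan, then poll until it is no longer running
pentest-agent scan start "$PROJECT_ID"
while [ "$(pentest-agent scan status "$PROJECT_ID" --json | jq -r '.scanStatus')" = "running" ]; do
  sleep 30
done

# Once the scan has finished, pull findings and an executive report
pentest-agent finding list "$PROJECT_ID" --json
pentest-agent report generate --project "$PROJECT_ID" --type executive_pdf --json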

Every response is parseable. The agent reads each result, decides what to do next, and keeps going.
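Polling is the step an agent or script repeats most, so it helps to wrap it once with a timeout. The following is a minimal sketch. It assumes `scan status --json` returns a `scanStatus` field that reads `running` while the scan is in progress, which is the convention the CI example in this guide relies on. The helper name and the `PENTEST_AGENT_CMD`, `POLL_SECONDS`, and `MAX_WAIT_SECONDS` variables are hypothetical. It requires `jq`.

```shell
# wait_for_scan (hypothetical helper): poll `scan status` until the scan is no longer "running".
# Assumes the JSON has a `scanStatus` field whose in-progress value is "running".
# Prints the final status; returns 1 on timeout and 2 if the status could not be read.
wait_for_scan() {
  local project_id="$1"
  local cli="${PENTEST_AGENT_CMD:-pentest-agent}"   # e.g. "npx pentest-agent"
  local interval="${POLL_SECONDS:-30}"
  local max_wait="${MAX_WAIT_SECONDS:-21600}"        # default cap: 6 hours
  local waited=0 status
  while :; do
    status=$($cli scan status "$project_id" --json | jq -r '.scanStatus')
    case "$status" in
      ""|null)
        # Empty or missing field: the CLI errored or the response changed shape.
        echo "could not read scanStatus for $project_id" >&2
        return 2 ;;
      running) ;;
      *)
        # Any other value means the scan left the running state; hand it to the caller.
        echo "$status"
        return 0 ;;
    esac
    if [ "$waited" -ge "$max_wait" ]; then
      echo "timed out after ${waited}s; last status: $status" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `STATUS=$(wait_for_scan "$PROJECT_ID") && pentest-agent finding list "$PROJECT_ID" --json` lists findings only after a successful wait. The distinct exit codes let an agent tell a timeout from an unreadable response.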

Which Agent Platforms Work With It

Any agent that can run shell commands can use pentest-agent. Three platforms stand out as particularly well suited.

OpenClaw

OpenClaw is an open-source agent framework built for autonomous software engineering. Agents in OpenClaw operate through a tool-calling loop — they observe their environment, decide on an action, and execute it.

pentest-agent commands slot directly into this model. Register them as available tools, and the agent can orchestrate a full security test lifecycle as part of its broader workflow. After shipping a feature, the same agent that wrote the code can create a RedVeil project, launch a scan, and triage the results.

Claude Code

Claude Code is Anthropic's agentic coding tool that operates directly in the terminal. It reads files, runs commands, and iterates on code autonomously.

Give Claude Code access to pentest-agent and ask it to run a security test. It installs the CLI, authenticates, creates a project for the target application, launches the scan, waits for completion, reviews findings, and generates a report. The entire workflow runs end to end without manual intervention.

Because Claude Code already understands code context, it can also cross-reference findings with the actual source — identifying which file and function introduced a vulnerability and proposing a fix in the same session.

Codex

Codex is OpenAI's cloud-based coding agent that executes tasks in parallel sandboxed environments. Each task gets its own isolated runtime with full shell access.

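Add pentest-agent to the environment setup and Codex can create projects, launch scans, analyze findings, and generate reports autonomously. Codex turns off internet access during the agent phase by default, so the environment needs outbound access to RedVeil enabled for these commands to reach the platform. Because Codex runs tasks in parallel, it can test multiple applications simultaneously. This is useful for teams managing a portfolio of services that all need regular security validation.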
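The same portfolio-wide fan-out also works from a single shell, outside Codex. The following is a minimal sketch that uses only the `scan start` command shown earlier. It launches one background job per project ID, then waits on each so that a failed launch is not silently lost. The function name and the `PENTEST_AGENT_CMD` and `LOG_DIR` variables are hypothetical.

```shell
# start_scans_in_parallel (hypothetical helper): run `scan start` for several project IDs at
# once, wait for every launch, and fail if any launch failed. Per-project logs go to LOG_DIR.
start_scans_in_parallel() {
  local cli="${PENTEST_AGENT_CMD:-pentest-agent}"
  local log_dir="${LOG_DIR:-.}"
  local launched="" failed=0 project_id entry pid
  for project_id in "$@"; do
    $cli scan start "$project_id" >"$log_dir/scan-start-$project_id.log" 2>&1 &
    launched="$launched $!:$project_id"   # remember which PID belongs to which project
  done
  for entry in $launched; do
    pid="${entry%%:*}"
    project_id="${entry#*:}"
    # `wait PID` returns that background job's exit status.
    if ! wait "$pid"; then
      echo "scan start failed for $project_id (see $log_dir/scan-start-$project_id.log)" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Example: `start_scans_in_parallel "$API_ID" "$WEB_ID" "$ADMIN_ID"`. The ID variables are placeholders.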

Any Agent That Can Run Commands

OpenClaw, Claude Code, and Codex are highlighted here because they're popular, but they aren't special cases. pentest-agent has no opinion about what's calling it. If your agent framework can execute a shell command and read the output, it can drive RedVeil. That includes Devin, Cline, Aider, Sweep, SWE-agent, Cursor's agent mode, custom LangChain or CrewAI pipelines, or a simple bash script wired to an LLM. The interface is the same: command-line arguments in, structured JSON out, no interactive prompts, no browser required.
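Registering pentest-agent as an agent tool usually comes down to one wrapper. The following is a minimal sketch of one, and all of its names (`run_tool`, `ALLOWED_GROUPS`, `PENTEST_AGENT_CMD`) are hypothetical. It does three things:

- restricts which command groups the agent may call
- closes stdin so nothing can wait on a prompt
- guarantees the agent gets JSON back even when a command prints plain text

It requires `jq`.

```shell
# run_tool (hypothetical wrapper): expose pentest-agent to an agent framework as one tool.
#  - only allowlisted command groups run (`auth` is left out so the agent cannot start an
#    interactive login; ALLOWED_GROUPS is an illustrative setting)
#  - stdin is closed, so an unexpected prompt fails instead of hanging
#  - stdout must be valid JSON; anything else is wrapped in a JSON error object
run_tool() {
  local cli="${PENTEST_AGENT_CMD:-pentest-agent}"
  local allowed=" ${ALLOWED_GROUPS:-project scan finding report usage} "
  local out rc
  case "$allowed" in
    *" $1 "*) ;;
    *) jq -n --arg group "$1" '{error: ("command group not allowed: " + $group)}'
       return 1 ;;
  esac
  out=$($cli "$@" </dev/null)
  rc=$?
  if [ -n "$out" ] && printf '%s' "$out" | jq empty >/dev/null 2>&1; then
    printf '%s\n' "$out"
  else
    jq -n --arg output "$out" --argjson rc "$rc" \
      '{error: "command did not return JSON", exitCode: $rc, output: $output}'
  fi
  return "$rc"
}
```

The agent framework then exposes a single tool whose argument list is passed to `run_tool`. The allowlist is the natural place to encode a policy such as "agents may read findings but not cancel scans."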

Practical Workflows

Connecting an agent to pentest-agent unlocks workflows that simply weren't possible when security testing required a human at the keyboard.

Post-Merge Security Gates

A coding agent merges a feature branch, then immediately runs a penetration test against the staging environment. If critical findings appear, the agent blocks the deploy and opens fix PRs with remediation guidance from the report. If the scan is clean, the deploy proceeds automatically.

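- name: Security scan
  # Fail the step rather than poll forever (value is an example)
  timeout-minutes: 360
  env:
    REDVEIL_TOKEN: ${{ secrets.REDVEIL_TOKEN }}
    # Project ID kept in a repository variable (variable name is illustrative)
    PROJECT_ID: ${{ vars.REDVEIL_PROJECT_ID }}
  run: |
    set -euo pipefail
    npx pentest-agent scan start "$PROJECT_ID"
    STATUS=$(npx pentest-agent scan status "$PROJECT_ID" --json | jq -r '.scanStatus')
    while [ "$STATUS" = "running" ]; do
      sleep 30
      STATUS=$(npx pentest-agent scan status "$PROJECT_ID" --json | jq -r '.scanStatus')
    done
    # Count findings whose numeric severity is 9 or higher (critical)
    CRITICAL=$(npx pentest-agent finding list "$PROJECT_ID" --json | jq '[.[] | select(.severity >= 9)] | length')
    if [ "$CRITICAL" -gt 0 ]; then
      echo "::error::$CRITICAL critical findings detected"
      exit 1
    fi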

This works in GitHub Actions, GitLab CI, Jenkins, CircleCI, or any CI system that supports shell steps.
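Outside GitHub Actions, the same gate can live in a plain shell function that any CI step can call. The following is a minimal sketch that reuses the jq filter from the workflow above, with the threshold as a parameter. Like that filter, it assumes `finding list --json` returns a JSON array whose items have a numeric `severity`. If the CLI reports severity as a label instead, the comparison needs changing, because jq orders every string above every number. The function names are hypothetical.

```shell
# count_findings_at_or_above (hypothetical): read `finding list --json` output on stdin and
# count findings whose numeric `severity` is >= the threshold given as $1.
count_findings_at_or_above() {
  jq --argjson min "$1" '[.[] | select(.severity >= $min)] | length'
}

# security_gate (hypothetical): return 1 if any finding meets the threshold (default 9),
# 2 if the input was empty or not valid JSON, and 0 otherwise.
security_gate() {
  local threshold="${1:-9}" count
  count=$(count_findings_at_or_above "$threshold") || return 2
  [ -n "$count" ] || return 2   # empty input must not pass the gate
  if [ "$count" -gt 0 ]; then
    echo "$count finding(s) at severity >= $threshold" >&2
    return 1
  fi
  return 0
}
```

In a GitLab CI or Jenkins shell step, for example: `pentest-agent finding list "$PROJECT_ID" --json | security_gate 9`.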

Continuous Retesting

After an agent remediates a vulnerability — applying a patch, updating a dependency, rotating a credential — it calls pentest-agent finding retest to verify the fix. No waiting for a human to log in, find the finding, and click a button. The feedback loop closes in minutes.

Scheduled Compliance Runs

For teams that need regular penetration testing for SOC 2, PCI, or ISO 27001, a cron job can trigger scans on a cadence. The agent generates attestation reports and pushes them to a shared drive or uploads them to a compliance platform. Fully unattended.

Cost-Aware Scanning

Before starting any scan, an agent can call pentest-agent usage estimate-project to preview the Agent Ops cost and pentest-agent usage check to verify budget availability. This prevents surprise overages and lets teams enforce spending policies programmatically.
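A spending policy can be enforced without depending on the exact shape of the estimate output. The following is a minimal sketch. It assumes you have already extracted a numeric cost from `pentest-agent usage estimate-project`. This guide does not show that command's arguments or JSON fields, so the extraction is left out rather than guessed. `within_budget`, `start_if_affordable`, and `PENTEST_AGENT_CMD` are hypothetical names. Non-numeric input is treated as over budget, so a malformed estimate never starts a scan.

```shell
# within_budget (hypothetical helper): succeed only if an estimated cost is at or below a cap.
# Both arguments must be plain non-negative numbers in the same unit; decimals are fine.
within_budget() {
  awk -v est="$1" -v cap="$2" 'BEGIN {
    num = "^[0-9]+([.][0-9]+)?$"
    if (est !~ num || cap !~ num) exit 1   # refuse to guess on non-numeric input
    exit !(est + 0 <= cap + 0)
  }'
}

# start_if_affordable (hypothetical): gate `scan start` on a cost figure the caller has
# already pulled out of `pentest-agent usage estimate-project` output.
start_if_affordable() {
  local project_id="$1" estimate="$2" cap="$3"
  local cli="${PENTEST_AGENT_CMD:-pentest-agent}"
  if ! within_budget "$estimate" "$cap"; then
    echo "estimated cost $estimate exceeds cap $cap (or is not a number); scan not started" >&2
    return 1
  fi
  $cli scan start "$project_id"
}
```

Keeping the cap check separate from the CLI call means the policy can be unit-tested and reused for any cost figure, whatever field the estimate ends up in.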

Why This Matters Now

The security industry has a structural problem. Software ships faster than it can be tested. AI-generated code is accelerating that gap further. And traditional penetration testing — expensive, slow, scheduled weeks in advance — was already struggling to keep pace before agents entered the picture.

Agent-driven security testing isn't a nice-to-have. It's the logical next step in a world where agents are already writing, reviewing, and deploying code. If an agent can ship a feature, it should be able to verify that the feature didn't introduce a vulnerability.

RedVeil provides the testing engine. pentest-agent provides the interface. Together, they give AI agents the ability to run professional-grade penetration tests — validated findings, compliance-ready reports, and real exploitation evidence — at the speed of the development cycle that produced the code in the first place.

Get Started

Install the CLI globally:

npm install -g pentest-agent

Or run without installing:

npx pentest-agent --help

Authenticate and run your first scan:

pentest-agent auth login
PROJECT_ID=$(pentest-agent project create webapp \
  --name "My App" \
  --target https://myapp.com \
  --auth-type none \
  --json | jq -r '._id')
pentest-agent scan start "$PROJECT_ID"

Full CLI documentation is available at redveil.ai/docs/cli/overview. To see how RedVeil works with your agent platform, visit the Agents page or book a demo to talk through your setup.