{"id":81540,"date":"2026-04-21T19:44:42","date_gmt":"2026-04-21T19:44:42","guid":{"rendered":"https:\/\/diyhaven858.wasmer.app\/index.php\/three-ai-coding-agents-leaked-secrets-through-a-single-prompt-injection-one-vendors-system-card-predicted-it\/"},"modified":"2026-04-21T19:44:42","modified_gmt":"2026-04-21T19:44:42","slug":"three-ai-coding-agents-leaked-secrets-through-a-single-prompt-injection-one-vendors-system-card-predicted-it","status":"publish","type":"post","link":"https:\/\/diyhaven858.wasmer.app\/index.php\/three-ai-coding-agents-leaked-secrets-through-a-single-prompt-injection-one-vendors-system-card-predicted-it\/","title":{"rendered":"Three AI coding agents leaked secrets through a single prompt injection. One vendor&#039;s system card predicted it"},"content":{"rendered":"<p> <br \/>\n<br \/><img decoding=\"async\" src=\"https:\/\/images.ctfassets.net\/jdtwqhzvc2n1\/338TKAZmGg0eamRggz05Kq\/fdc8465b217fab704067d122300d6da7\/hero_model_comparison.png?w=300&amp;q=30\" \/><\/p>\n<p>A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic\u2019s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google\u2019s Gemini CLI Action and GitHub\u2019s Copilot Agent (Microsoft). No external infrastructure required.<\/p>\n<p>Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it \u201cComment and Control.\u201d GitHub Actions does not expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. 
This limits the practical attack surface but does not eliminate it: collaborators, comment fields, and any repo using pull_request_target with an AI coding agent are exposed.<\/p>\n<p>Per Guan\u2019s disclosure timeline: Anthropic classified it as CVSS 9.4 Critical ($100 bounty), Google paid a $1,337 bounty, and GitHub awarded $500 through the Copilot Bounty Program. The $100 amount is notably low relative to the CVSS 9.4 rating; Anthropic\u2019s HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. All three patched quietly, and none had issued CVEs in the NVD or published security advisories through GitHub Security Advisories as of Saturday.<\/p>\n<p>Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic\u2019s own system card acknowledged is \u201cnot hardened against prompt injection.\u201d The feature is designed to process trusted first-party inputs by default; users who opt into processing untrusted external PRs and issues accept additional risk and are responsible for restricting agent permissions. Anthropic updated its documentation to clarify this operating model after the disclosure. The same class of attack would operate beneath OpenAI\u2019s safeguard layer at the agent runtime, but that is an inference from what its system card does not document, not a demonstrated exploit. The exploit is the proof case, but the story is what the three system cards reveal about the gap between what vendors document and what they protect.<\/p>\n<p>OpenAI and Google did not respond to requests for comment by publication time.<\/p>\n<p>\u201cAt the action boundary, not the model boundary,\u201d Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection actually needs to sit. 
\u201cThe runtime is the blast radius.\u201d<\/p>\n<h2>What the system cards tell you<\/h2>\n<p>Anthropic\u2019s Opus 4.7 system card runs 232 pages with quantified hack rates and injection resistance metrics. It discloses a restricted model strategy (Mythos held back as a capability preview) and states directly that Claude Code Security Review is \u201cnot hardened against prompt injection.\u201d In effect, the system card told readers the runtime was exposed. Comment and Control proved it. Anthropic does gate certain agent actions outside the system card\u2019s scope \u2014 Claude Code Auto Mode, for example, applies runtime-level protections \u2014 but the system card itself does not document these runtime safeguards or their coverage.<\/p>\n<p>OpenAI\u2019s GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evals but not agent-runtime or tool-execution resistance metrics. Trusted Access for Cyber scales gated access to thousands. The system card tells you what red teamers tested. It does not tell you how resistant the model is to the attacks they found.<\/p>\n<p>Google\u2019s Gemini 3.1 Pro model card, shipped in February, defers most safety methodology to older documentation, a VentureBeat review of the card found. Google\u2019s Automated Red Teaming program remains internal only. No external cyber program.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<p><b>Dimension<\/b><\/p>\n<\/td>\n<td>\n<p><b>Anthropic (Opus 4.7)<\/b><\/p>\n<\/td>\n<td>\n<p><b>OpenAI (GPT-5.4)<\/b><\/p>\n<\/td>\n<td>\n<p><b>Google (Gemini 3.1 Pro)<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>System card depth<\/p>\n<\/td>\n<td>\n<p>232 pages. Quantified hack rates, classifier scores, and injection resistance metrics.<\/p>\n<\/td>\n<td>\n<p>Extensive. Red teaming hours documented. No injection resistance rates published.<\/p>\n<\/td>\n<td>\n<p>Few pages. Defers to older Gemini 3 Pro card. 
No quantified results.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Cyber verification program<\/p>\n<\/td>\n<td>\n<p>CVP. Removes cyber safeguards for vetted pentesters and red teamers doing authorized offensive work. Does not address prompt injection defense. Platform and data-retention exclusions not yet publicly documented.<\/p>\n<\/td>\n<td>\n<p>TAC. Scaled to thousands. Constrains zero data retention (ZDR) deployments.<\/p>\n<\/td>\n<td>\n<p>None. No external defender pathway.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Restricted model strategy<\/p>\n<\/td>\n<td>\n<p>Yes. Mythos held back as a capability preview. Opus 4.7 is the testbed.<\/p>\n<\/td>\n<td>\n<p>No restricted model. Full capability released, access gated.<\/p>\n<\/td>\n<td>\n<p>No restricted model. No stated plan for one.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Runtime agent safeguards<\/p>\n<\/td>\n<td>\n<p>Claude Code Security Review: system card states it is not hardened against prompt injection. The feature is designed for trusted first-party inputs. Anthropic applies additional runtime protections (e.g., Claude Code Auto Mode) not documented in the system card.<\/p>\n<\/td>\n<td>\n<p>Not documented. TAC governs access, not agent operations.<\/p>\n<\/td>\n<td>\n<p>Not documented. ART internal only.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Exploit response (Comment and Control)<\/p>\n<\/td>\n<td>\n<p>CVSS 9.4 Critical. $100 bounty. Patched. No CVE.<\/p>\n<\/td>\n<td>\n<p>Not directly exploited. Structural gap inferred from TAC design, not demonstrated.<\/p>\n<\/td>\n<td>\n<p>$1,337 bounty per Guan disclosure. Patched. No CVE.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Injection resistance data<\/p>\n<\/td>\n<td>\n<p>Published. Quantified rates in the system card.<\/p>\n<\/td>\n<td>\n<p>Model-layer injection evals published. No agent-runtime or tool-execution resistance rates.<\/p>\n<\/td>\n<td>\n<p>Not published. No quantified data available.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Baer offered specific procurement questions. 
\u201cFor Anthropic, ask how safety results actually transfer across capability jumps,\u201d she told VentureBeat. \u201cFor OpenAI, ask what \u2018trusted\u2019 means under compromise.\u201d For both, she said, directors need to \u201cdemand clarity on whether safeguards extend into tool execution, not just prompt filtering.\u201d<\/p>\n<h2>Seven threat classes neither safeguard approach closes<\/h2>\n<p>Each row names what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the week ahead.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<p><b>Threat Class<\/b><\/p>\n<\/td>\n<td>\n<p><b>What Breaks<\/b><\/p>\n<\/td>\n<td>\n<p><b>Why Your Controls Miss It<\/b><\/p>\n<\/td>\n<td>\n<p><b>What Comment and Control Proved<\/b><\/p>\n<\/td>\n<td>\n<p><b>Recommended Action<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>1. Deployment surface mismatch<\/p>\n<\/td>\n<td>\n<p>CVP is designed for authorized offensive security research, not prompt injection defense. It does not extend to Bedrock, Vertex, or ZDR tenants. TAC constrains ZDR. Google has no program. Your team may be running a verified model on an unverified surface.<\/p>\n<\/td>\n<td>\n<p>Launch announcements describe the program. Support documentation lists the exclusions. Security teams read the announcement. Procurement reads neither.<\/p>\n<\/td>\n<td>\n<p>The exploit targets the agent runtime, not the deployment platform. A team running Claude Code on Bedrock is outside CVP coverage, but CVP was not designed to address this class of vulnerability in the first place.<\/p>\n<\/td>\n<td>\n<p>Email your Anthropic and OpenAI reps today. One question, in writing: \u2018Confirm whether [your platform] and [your data retention config] are covered by your runtime-level prompt injection protections, and describe what those protections include.\u2019 File the response in your vendor risk register.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>2. 
CI secrets exposed to AI agents<\/p>\n<\/td>\n<td>\n<p>ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, and any production secret stored as a GitHub Actions env var are readable by every workflow step, including AI coding agents.<\/p>\n<\/td>\n<td>\n<p>The default GitHub Actions config does not scope secrets to individual steps. Repo-level and org-level secrets propagate to all workflows. Most teams never audit which steps access which secrets.<\/p>\n<\/td>\n<td>\n<p>The agent read the API key from the runner env var, encoded it in a PR comment body, and posted it through GitHub\u2019s API. No attacker-controlled infrastructure required. Exfiltration ran through GitHub\u2019s own API \u2014 the platform itself became the C2 channel.<\/p>\n<\/td>\n<td>\n<p>Run: grep -r 'secrets\\.' .github\/workflows\/ across every repo with an AI agent. List every secret the agent can access. Rotate all exposed credentials. Migrate to short-lived OIDC tokens (GitHub, GitLab, CircleCI).<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>3. Over-permissioned agent runtimes<\/p>\n<\/td>\n<td>\n<p>AI agents granted bash execution, git push, and API write access at setup. Permissions never scoped down. No periodic least-privilege review. Agents accumulate access in the same way service accounts do.<\/p>\n<\/td>\n<td>\n<p>Agents are configured once during onboarding, and those configurations are inherited across repos. No tooling flags unused permissions. The Comment and Control agent had bash, write, and env-read access for a code review task.<\/p>\n<\/td>\n<td>\n<p>The agent had bash access it did not need for code review. It used that access to read env vars and post exfiltrated data. Stripping bash would have blocked the attack chain entirely.<\/p>\n<\/td>\n<td>\n<p>Audit agent permissions repo by repo. Strip bash from code review agents. Set repo access to read-only. Gate write access (PR comments, commits, merges) behind a human approval step.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>4. 
No CVE signal for AI agent vulnerabilities<\/p>\n<\/td>\n<td>\n<p>CVSS 9.4 Critical. Anthropic, Google, and GitHub patched. Zero CVE entries in NVD. Zero advisories. Your vulnerability scanner, SIEM, and GRC tool all show green.<\/p>\n<\/td>\n<td>\n<p>No CNA has yet issued a CVE for a coding agent prompt injection, and current CVE practices have not captured this class of failure mode. Vendors patch through version bumps. Qualys, Tenable, and Rapid7 have nothing to scan for.<\/p>\n<\/td>\n<td>\n<p>A SOC analyst running a full scan on Monday morning would find zero entries for a Critical vulnerability that hit Claude Code Security Review, Gemini CLI Action, and Copilot simultaneously.<\/p>\n<\/td>\n<td>\n<p>Create a new category in your supply chain risk register: \u2018AI agent runtime.\u2019 Assign a 48-hour check-in cadence with each vendor\u2019s security contact. Do not wait for CVEs. None have come yet, and the taxonomy gap makes them unlikely without industry pressure.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>5. Model safeguards do not govern agent actions<\/p>\n<\/td>\n<td>\n<p>Opus 4.7 blocks a phishing email prompt. It does not block an agent from reading $ANTHROPIC_API_KEY and posting it as a PR comment. Safeguards gate generation, not operation.<\/p>\n<\/td>\n<td>\n<p>Safeguards filter model outputs (text). Agent operations (bash, git push, curl, API POST) bypass safeguard evaluation entirely. The runtime is outside the safeguard perimeter. Anthropic applies some runtime-level protections in features like Claude Code Auto Mode, but these are not documented in the system card and their scope is not publicly defined.<\/p>\n<\/td>\n<td>\n<p>The agent never generated prohibited content. It performed a legitimate operation (post a PR comment) containing exfiltrated data. Safeguards never triggered.<\/p>\n<\/td>\n<td>\n<p>Map every operation your AI agents perform: bash, git, API calls, file writes. 
For each, ask the vendor in writing: does your safeguard layer evaluate this action before execution? Document the answer.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>6. Untrusted input parsed as instructions<\/p>\n<\/td>\n<td>\n<p>PR titles, PR body text, issue comments, code review comments, and commit messages are all parsed by AI coding agents as context. Any can contain injected instructions.<\/p>\n<\/td>\n<td>\n<p>No input sanitization layer between GitHub and the agent instruction set. The agent cannot distinguish developer intent from attacker injection in untrusted fields. Claude Code GitHub Action is designed for trusted first-party inputs by default. Users who opt into processing untrusted external PRs accept additional risk.<\/p>\n<\/td>\n<td>\n<p>A single malicious PR title became a complete exfiltration command. The agent treated it as a legitimate instruction and executed it without validation or confirmation.<\/p>\n<\/td>\n<td>\n<p>Implement input sanitization as defense-in-depth, but do not rely on traditional WAF-style regex patterns. LLM prompt injections are non-deterministic and will evade static pattern matching. Restrict agent context to approved workflow configs and combine with least-privilege permissions.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>7. No comparable injection resistance data across vendors<\/p>\n<\/td>\n<td>\n<p>Anthropic publishes quantified injection resistance rates in 232 pages. OpenAI publishes model-layer injection evals but no agent-runtime resistance rates. Google publishes a few-page card referencing an older model.<\/p>\n<\/td>\n<td>\n<p>No industry standard for AI safety metric disclosure. Vendors may have internal metrics and red-team programs, but published disclosures are not comparable. Procurement has no baseline and no framework to require one.<\/p>\n<\/td>\n<td>\n<p>Anthropic, OpenAI, and Google were all approved for enterprise use without comparable injection resistance data. 
The exploit exposed what unmeasured risk looks like in production.<\/p>\n<\/td>\n<td>\n<p>Write one sentence for your next vendor meeting: \u2018Show me your quantified injection resistance rate for my model version on my platform.\u2019 Document refusals for EU AI Act high-risk compliance. Deadline: August 2026.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>OpenAI\u2019s GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation do not publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams cannot verify what they cannot measure.<\/p>\n<p>Eligibility requirements for Anthropic\u2019s Cyber Verification Program and OpenAI\u2019s Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic\u2019s CVP is designed for authorized offensive security research \u2014 removing cyber safeguards for vetted actors \u2014 and is not a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1\u20133 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5\u20137 with PR.DS (Data Security).<\/p>\n<p>Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI\/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners. 
Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic\u2019s models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI\/CD runtime with access to secrets.<\/p>\n<h2>What to do before your next vendor renewal<\/h2>\n<p>\u201cDon\u2019t standardize on a model. Standardize on a control architecture,\u201d Baer told VentureBeat. \u201cThe risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture.\u201d<\/p>\n<p><b>Build a deployment map. <\/b>Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Send that email today. (Anthropic Cyber Verification Program)<\/p>\n<p><b>Audit every runner for secret exposure. <\/b>Run grep -r 'secrets\\.' .github\/workflows\/ across every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation)<\/p>\n<p><b>Start migrating credentials now. <\/b>Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan full rollout over one to two quarters, starting with repos running AI agents. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs)<\/p>\n<p><b>Fix agent permissions repo by repo. <\/b>Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step. 
(GitHub Actions permissions documentation)<\/p>\n<p><b>Add input sanitization as one layer, not the only layer. <\/b>Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex will not catch non-deterministic prompt injections on its own.<\/p>\n<p><b>Add \u201cAI agent runtime\u201d to your supply chain risk register. <\/b>Assign a 48-hour patch verification cadence with each vendor\u2019s security contact. Do not wait for CVEs. None have come yet for this class of vulnerability.<\/p>\n<p><b>Check which hardened GitHub Actions mitigations you already have in place. <\/b>Hardened GitHub Actions configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide)<\/p>\n<p><b>Prepare one procurement question per vendor before your next renewal. <\/b>Write one sentence: \u201cShow me your quantified injection resistance rate for the model version I run on the platform I deploy to.\u201d Document refusals for EU AI Act high-risk compliance. The deadline is August 2026.<\/p>\n<p>\u201cRaw zero-days aren\u2019t how most systems get compromised. Composability is,\u201d Baer said. \u201cIt\u2019s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you\u2019ve already done most of the attacker\u2019s work for them.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic\u2019s Claude Code Security Review action post its own API key as a comment. 
The same prompt injection worked on Google\u2019s Gemini CLI Action and GitHub\u2019s Copilot Agent (Microsoft). No [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":81541,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_daextam_enable_autolinks":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11],"tags":[],"class_list":["post-81540","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-news"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/diyhaven858.wasmer.app\/wp-content\/uploads\/2026\/04\/hero_model_comparison.png","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/posts\/81540","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/comments?post=81540"}],"version-history":[{"count":0,"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/posts\/81540\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-jso
n\/wp\/v2\/media\/81541"}],"wp:attachment":[{"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/media?parent=81540"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/categories?post=81540"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/diyhaven858.wasmer.app\/index.php\/wp-json\/wp\/v2\/tags?post=81540"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}