Untrusted by Invitation.
Three stories, one fault line. Trail of Bits built four malicious agent skills and walked every one of them past the scanners built to catch them. Meta's Rule of Two answers that with architecture: never let one agent hold untrusted input, your secrets, and a way to act on the outside at the same time. And Reco pointed an LLM agent at live Salesforce sites, where it ran the whole recon-to-exploit chain on its own and wrote the working exploit itself.
🎯 Attack of the Week: four malicious skills that walked past every scanner
How it runs. You grab a skill to summarize CSVs. Your agent loads the folder of instructions and code and runs it with your hands, your permissions, and your secrets in reach. The marketplaces selling skills run scanners over them so you don't have to read every line. Trail of Bits built four malicious skills and got all four past those scanners: ClawHub's malicious skill detector, Cisco's agent skill scanner, and all three wired into skills.sh (Gen Agent Trust Hub, Socket, and Snyk).
Each skill hides the payload somewhere the scanner doesn't look. csv-summarizer prepends 100,000 blank lines so the real instruction (dump every environment variable) falls off the end of the scanner's context window. context-loader tucks a malicious script inside a .docx, which is a ZIP of XML the scanner never unpacks. simple-formatter compiles its logic to .pyc bytecode, and the scanner reads right past it. dev-env-setup wraps a prompt injection in corporate-policy language so the model approving it reads attacker instructions as house rules, then quietly repoints npm at an attacker's registry. Three of the four took under an hour each.
In their words, a static scanner gives an adversary "unlimited bites at the apple to tweak an attack until it finds a way through." The scanner approving your skill is a filter you can sit in front of and tune until it says yes.
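A rough pre-read triage for the four hiding spots above might look like the sketch below. It is not any vendor's scanner, and the thresholds, suffix list, and registry hints are assumptions picked for illustration. It only tells you which files to open by hand; an attacker can tune around it exactly as they tune around the marketplace scanners.

```python
import zipfile
from pathlib import Path

# Hypothetical thresholds and hints, chosen for illustration only.
MAX_BLANK_RUN = 200  # blank lines in a row before a file looks padded
OPAQUE_SUFFIXES = {".pyc", ".pyo", ".so", ".dll", ".exe"}
REGISTRY_HINTS = ("registry=", "npm config set registry", "--registry")


def longest_blank_run(text: str) -> int:
    """Return the longest run of consecutive blank lines in text."""
    longest = current = 0
    for line in text.splitlines():
        current = current + 1 if not line.strip() else 0
        longest = max(longest, current)
    return longest


def triage_skill(folder: str) -> list[str]:
    """List reasons a human should open a skill folder before loading it."""
    reasons = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(folder).as_posix()
        if path.suffix.lower() in OPAQUE_SUFFIXES:
            reasons.append(f"{rel}: compiled or binary file, not reviewable as text")
            continue
        if zipfile.is_zipfile(path):
            # .docx and similar formats are ZIP archives; count what is inside.
            with zipfile.ZipFile(path) as archive:
                members = len(archive.namelist())
            reasons.append(f"{rel}: ZIP container with {members} members to unpack")
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        run = longest_blank_run(text)
        if run > MAX_BLANK_RUN:
            reasons.append(f"{rel}: {run} consecutive blank lines (possible padding)")
        if any(hint in text for hint in REGISTRY_HINTS):
            reasons.append(f"{rel}: touches the npm registry setting")
    return reasons
```

An empty list means "nothing obvious", not "safe". The policy-language prompt injection in dev-env-setup has no file-level signature at all, so it still needs a human read.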
- Treat every skill, MCP server, and agent extension you install as untrusted code. Read what it does before you load it, the same way you'd read a postinstall script.
- A marketplace scanner is a speed bump, and Trail of Bits drove past all five. Prefer curated or official sources, pin to a reviewed version, and keep the set of skills you load small.
- Check what your agent can reach while a third-party skill is loaded: environment variables, ~/.aws, ~/.ssh, .env, and outbound network. That reachable surface is what a poisoned skill inherits.
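One way to take that inventory is a short script run from the same shell and directory your agent starts in. This is a minimal sketch: the name fragments in SECRET_HINTS are a guess at what marks a secret, and it does not test outbound network access, which you would check separately.

```python
import os
from pathlib import Path

# Name fragments that often mark a secret; an illustrative list, not complete.
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL")
# Paths from the checklist, relative to the home or working directory.
HOME_PATHS = (".aws", ".ssh")
CWD_PATHS = (".env",)


def reachable_surface(env=None, home=None, cwd=None) -> dict:
    """Report secret-looking variables and credential paths a skill would inherit."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    cwd = Path.cwd() if cwd is None else Path(cwd)
    secret_vars = sorted(
        name for name in env if any(hint in name.upper() for hint in SECRET_HINTS)
    )
    paths = [str(home / p) for p in HOME_PATHS if (home / p).exists()]
    paths += [str(cwd / p) for p in CWD_PATHS if (cwd / p).exists()]
    return {"secret_vars": secret_vars, "paths": paths}


if __name__ == "__main__":
    report = reachable_surface()
    # Print names only, never values, so the report is safe to paste into a ticket.
    print("secret-looking variables:", ", ".join(report["secret_vars"]) or "none")
    print("credential paths present:", ", ".join(report["paths"]) or "none")
```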
In-issue poll: Do you know what every skill or MCP server your coding agent loads can do? (I read them all / Some of them / No / I don't run agent extensions)
🔧 Defender's Corner: apply the Agents Rule of Two
Scope your agent so a poisoned skill can't combine the three things it needs to hurt you. The rule comes from Meta's Practical AI Agent Security, and , the same claude-code-action flaw Recon walked through in #8.
The Agents Rule of Two: an agent workflow holds at most two of these three legs at once.
- Processing untrusted input (a skill you pulled, an issue body, a web page).
- Access to secrets or sensitive systems through its tools.
- The ability to change state or talk to the outside, through Bash, WebFetch, or an MCP server.
The Attack above is that triad collapsing: the malicious skill is the untrusted input, and the agent already holds your environment variables and a path out to the network. Drop one leg and the exfiltration path breaks: a skill that summarizes CSVs has no business holding your AWS creds or reaching the network, so run it without them. A scanner has to be right every single time. Rule-of-Two architecture survives a wrong call, because the skill that slips through is already boxed into the one job you gave it.
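Dropping the secrets leg for a single run can be sketched as below, assuming a POSIX host and a skill you can launch as a subprocess. The allow-list and function names are hypothetical. This removes environment variables and points HOME at an empty directory; it does not block the network or absolute file paths, which takes a container or OS sandbox.

```python
import os
import subprocess
import tempfile

# Variables a simple tool usually needs; everything else is dropped. Illustrative.
ALLOWED_VARS = ("PATH", "LANG")


def scrubbed_env(env, allowed=ALLOWED_VARS) -> dict:
    """Return a copy of env that keeps only the allow-listed variable names."""
    return {name: env[name] for name in allowed if name in env}


def run_without_secrets(command, allowed=ALLOWED_VARS, timeout=60):
    """Run command with a minimal environment and a throwaway HOME directory."""
    with tempfile.TemporaryDirectory() as scratch:
        env = scrubbed_env(os.environ, allowed)
        # ~/.aws and ~/.ssh now resolve inside an empty scratch directory.
        env["HOME"] = scratch
        return subprocess.run(
            command,
            env=env,
            cwd=scratch,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
```

Because the working directory is the scratch folder, pass input files by absolute path, for example `run_without_secrets(["python3", "/path/to/summarize.py", "/path/to/report.csv"])`, where both paths are placeholders.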
For your engineer: take your most-used agent workflow and write down which of the three legs it holds at the same time today. If it holds all three, that's the one to cut first.
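The inventory can be as small as a record per workflow. The sketch below is one way to write it down; the class, field names, and example workflows are made up for illustration and are not taken from Meta's paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """One agent workflow and which of the three legs it holds at once."""
    name: str
    untrusted_input: bool    # skills, issue bodies, web pages
    sensitive_access: bool   # secrets or sensitive systems via tools
    external_action: bool    # Bash, WebFetch, MCP: state change or outbound calls

    def legs_held(self) -> list[str]:
        legs = {
            "untrusted input": self.untrusted_input,
            "sensitive access": self.sensitive_access,
            "external action": self.external_action,
        }
        return [leg for leg, held in legs.items() if held]


def violates_rule_of_two(workflow: Workflow) -> bool:
    """True when the workflow holds all three legs at the same time."""
    return len(workflow.legs_held()) > 2


if __name__ == "__main__":
    # Hypothetical workflows, for illustration only.
    workflows = [
        Workflow("csv-summary", untrusted_input=True, sensitive_access=False, external_action=False),
        Workflow("issue-triage-bot", untrusted_input=True, sensitive_access=True, external_action=True),
    ]
    for wf in workflows:
        status = "CUT A LEG" if violates_rule_of_two(wf) else "ok"
        print(f"{wf.name}: {', '.join(wf.legs_held())} -> {status}")
```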
🤖 Agent Bench: make three models argue about your security change
The Attack just showed automated scanners waving four malicious skills through. One model reviewing its own diff grades the same way, on a curve. For your own changes, reach for a panel told to disagree, with you as the final gate.
Run it as three steps. First, give each reviewer (Claude, Codex, and Gemini) the same package: the diff, the files around it, and the invariants the change is supposed to hold, and collect an independent first pass from each. Second, run a rebuttal round where every model critiques the others' evidence. Third, merge the result into one findings list.
Where the models split is your high-signal queue. Work those first, because a disagreement means at least one model saw something the others missed.
Keep a finding only if it cites a concrete code path, the exploit preconditions, and a plausible reproducer. A vote count is meaningless for an IAM policy or an auth path. Require the evidence.
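The evidence gate and the split-first ordering can be sketched as a small merge step. This is not the /verify implementation; the Finding fields and the idea of grouping by a short issue key are assumptions, and matching findings across models is harder than string equality in practice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    reviewer: str       # which model raised it
    issue: str          # short key used to group the same issue across models
    code_path: str      # file and function the claim points at
    preconditions: str  # what must be true for the exploit to work
    reproducer: str     # steps or request that would demonstrate it

    def has_evidence(self) -> bool:
        fields = (self.code_path, self.preconditions, self.reproducer)
        return all(value.strip() for value in fields)


def merge_findings(findings, reviewers) -> list[dict]:
    """Drop findings without evidence, group by issue, and list contested ones first."""
    by_issue = defaultdict(list)
    for finding in findings:
        if finding.has_evidence():
            by_issue[finding.issue].append(finding)
    merged = []
    for issue, group in by_issue.items():
        raised_by = sorted({f.reviewer for f in group})
        merged.append({
            "issue": issue,
            "raised_by": raised_by,
            # Contested: at least one reviewer on the panel did not raise it.
            "contested": len(raised_by) < len(reviewers),
            "code_paths": sorted({f.code_path for f in group}),
        })
    # Splits go to the top of the queue; agreement is only a priority hint.
    merged.sort(key=lambda item: (not item["contested"], item["issue"]))
    return merged
```

The output is a reading order, not a verdict: every item still carries the code paths you open and check yourself.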
What you walk away with: a merged findings list with the contested items flagged, every claim tied to a code path you can open yourself.
This is the /verify workflow: internal Claude reviewers plus external Codex and Gemini, run as one pass.
Honest caveat: a panel can miss a bug that sits in a blind spot all three models share, and three models can agree on the same wrong assumption. Consensus only prioritizes what you read by hand. The sign-off stays with you.
Try it this week: on your next auth or policy change, run two models instead of one. Read every place they split before you read anything else.
📡 Also on the Radar
Reco's research team pointed an LLM agent at live Salesforce Experience Cloud sites and let it work unattended. The agent enumerated 263 objects and 55 Apex methods across 9 controllers, then inferred valid parameter values from method signatures and earlier responses to probe for data a guest user should never see. On one site it found a blind SOQL injection in a single blogId parameter. It built a boolean oracle and pulled employee and customer names, emails, and phone numbers out one character at a time. Look past the Salesforce specifics to the capability worth tracking: an agent ran the whole recon-to-exploit chain unattended, finding the injectable parameter among dozens by itself. The recon step attackers used to ration is now cheap and tireless. The write-up is vendor research with demo CTAs attached, but the five-phase methodology is documented step by step.
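To see why a yes/no answer is enough to read data out one character at a time, here is a simulation against an in-memory string. It is not Reco's agent or its queries: the oracle is a local function standing in for any endpoint that leaks a single true or false, and the ASCII-only assumption is ours.

```python
def make_oracle(secret: str):
    """Simulate an endpoint that only answers true or false about a hidden value."""
    calls = {"count": 0}

    def oracle(position: int, threshold: int) -> bool:
        # True when a character exists at position and its code is above threshold.
        calls["count"] += 1
        return position < len(secret) and ord(secret[position]) > threshold

    return oracle, calls


def extract(oracle, max_length: int = 64) -> str:
    """Recover the hidden value by binary search over ASCII codes, per character."""
    recovered = []
    for position in range(max_length):
        if not oracle(position, 0):
            break  # no character at this position: end of the value
        low, high = 0, 127  # invariant: low < code <= high (ASCII assumed)
        while high - low > 1:
            mid = (low + high) // 2
            if oracle(position, mid):
                low = mid
            else:
                high = mid
        recovered.append(chr(high))
    return "".join(recovered)


if __name__ == "__main__":
    oracle, calls = make_oracle("a@b.co")
    print(extract(oracle), "recovered in", calls["count"], "yes/no questions")
```

Each character costs at most eight questions here, which is why the step is cheap for an agent that never gets bored.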
💼 Recon Roles
Senior Infrastructure Security Engineer at Dropbox. Remote, posted for the United States (Zones 2 and 3, which exclude the SF, NYC, and Seattle metros) with a companion Canada listing. The US range is $214,200 to $289,800 in Zone 2 and a lower $190,400 to $257,600 in Zone 3, plus bonus and RSUs; the Canada listing runs CA$205,700 to CA$278,300. The scope is the reason it's here this week: you'd design and run security controls for Dropbox's AI and agentic infrastructure, including model gateways, inference services, and the Kubernetes platforms underneath, with least-privilege patterns for AI agents as a named responsibility. This is the Rule of Two as a full-time job. A public range in two zones and an explicit agentic-infra mandate make it a rare role you can scope before you apply.
Staff Backend Engineer, Software Supply Chain Security at GitLab. Remote, India. You'd own the backend that decides whether a poisoned package ever reaches a build: package policy enforcement, build provenance, SLSA Level 2 and 3 capabilities, Sigstore signing, and malicious-package detection, on GitLab's Software Supply Chain Security team. The posting hides comp, which for a Staff role at a remote-first company is worth pushing on early. GitLab's compensation calculator now sits behind a login, so you can't self-serve the India-remote band; ask for it by name in the first call.
Both roles verified open on the company ATS as of Mon 2026-06-16. Recon Roles features remote cloud-security openings with an honest read. Always verify comp and scope against the live posting before you act.
Closer
The agent era moves trust to both ends of the wire. A skill you pulled off a marketplace runs with your agent's hands, and the scanner that cleared it can be tuned until it says yes. Attackers now point the same kind of agent at your app and find the method that bleeds while you're still reading the diff. Scope what your own agents can hold at once, and read everything they load as code you didn't write.
Know someone who'd want this in their inbox? Forward it to them.