Threat Model
An attacker exploits rendering ambiguity so that human reviewers approve content that behaves differently from how it appears.
Attacker Workflow
- Replace trusted hostnames with homoglyph variants.
- Insert zero-width chars to split or mask directives.
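- Hide malicious prompt text in comments that render innocently or not at all (for example, HTML comments in a Markdown preview).
- Exploit reviewer tooling that normalizes or hides characters on display, so the text a reviewer sees differs from the stored bytes.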
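A minimal Python sketch of the zero-width splitting step. It assumes a hypothetical keyword filter and a hypothetical blocked phrase. A single ZERO WIDTH SPACE inside the phrase defeats the verbatim match while rendering identically, and stripping Unicode general category Cf (format characters) restores the match. This illustrates the attack; it is not a complete defense.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE, Unicode general category Cf

# Hypothetical phrase that a naive keyword filter is meant to block.
blocked_phrase = "ignore previous instructions"

# The attacker splits "ignore" with a zero-width space; most renderers
# display the result exactly like the plain phrase.
masked = "ig" + ZWSP + "nore previous instructions"

def naive_filter(text: str) -> bool:
    """Return True if the blocked phrase appears verbatim in the text."""
    return blocked_phrase in text

def strip_format_chars(text: str) -> str:
    """Drop every code point in general category Cf (invisible format characters)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

print(naive_filter(masked))                      # False: the split phrase slips through
print(naive_filter(strip_format_chars(masked)))  # True: caught once Cf is stripped
print(len(blocked_phrase), len(masked))          # 28 29: one invisible extra character
```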
Red Flags
- Links that visually look trusted but fail hostname validation.
- Unexpected Unicode categories (such as Cf format characters) in command or URL strings.
- Copy/paste mismatch between editor and terminal output.
- Instruction blocks containing non-printing characters.
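The red flags above can be surfaced mechanically. This sketch treats any non-ASCII character as a red flag, which is a deliberately strict assumption suited to command and URL strings rather than prose. It reports each character's index, code point, general category, and Unicode name. The `git clone` command string is hypothetical.

```python
import unicodedata

def flag_suspicious_chars(text: str) -> list:
    """Return (index, code point, category, name) for each non-ASCII character.

    Non-ASCII is used as a strict proxy for "unexpected"; prose needs a
    looser policy.
    """
    findings = []
    for i, ch in enumerate(text):
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "<unnamed>")
            findings.append((i, f"U+{ord(ch):04X}", unicodedata.category(ch), name))
    return findings

# Hypothetical command with a trailing zero-width space hidden in the URL.
command = "git clone https://github.com/agent-skill-tools/core\u200b"
for finding in flag_suspicious_chars(command):
    print(finding)  # reports U+200B, category Cf, ZERO WIDTH SPACE

# A homoglyph URL is flagged too: the Cyrillic letter is category Ll.
for finding in flag_suspicious_chars("https://g\u0456thub.com"):
    print(finding)
```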
Malicious Pattern
https://gіthub.com/agent-skill-tools/core
# the hostname's second character is Cyrillic 'і' (U+0456), not ASCII 'i'
Safe Counterexample
https://github.com/agent-skill-tools/core
# plain ASCII hostname
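To make the difference between the two examples visible, this sketch parses both URLs and encodes their hostnames with Python's built-in `idna` codec (IDNA 2003). The non-ASCII hostname becomes a Punycode label starting with `xn--`, while the ASCII hostname passes through unchanged. The Cyrillic letter is written as the escape `\u0456` so the homoglyph is also visible in the source.

```python
from urllib.parse import urlsplit

# "\u0456" is the Cyrillic homoglyph of 'i', written as an escape so it is
# visible in the source code itself.
malicious = "https://g\u0456thub.com/agent-skill-tools/core"
safe = "https://github.com/agent-skill-tools/core"

for url in (malicious, safe):
    host = urlsplit(url).hostname
    # The "idna" codec converts non-ASCII labels to Punycode with an "xn--"
    # prefix and leaves all-ASCII labels unchanged.
    print(host, host.isascii(), host.encode("idna"))
```

The malicious line prints `False` and an `xn--` form; the safe line prints `True` and `b'github.com'`. Showing the `xn--` form in review is what makes the swap obvious.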
Detection Checklist
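- Normalize Unicode (for example, to NFKC) and diff the normalized output against the original; normalization does not fold cross-script homoglyphs such as Cyrillic 'і', so pair it with a confusables check.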
- Use editors that reveal hidden/invisible characters.
- Validate URLs by parsed hostname, not display text.
- Reject docs with unexplained invisible chars in instructions.
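A sketch of two checklist items under stated assumptions: `ALLOWED_HOSTS` is a hypothetical allowlist, and trust is decided from the parsed hostname alone. The NFKC comparison shows what normalization catches (fullwidth letters) and what it misses (a Cyrillic homoglyph), which is why the hostname check is still needed.

```python
import unicodedata
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice this would come from project policy.
ALLOWED_HOSTS = {"github.com"}

def host_is_trusted(url: str) -> bool:
    """Decide trust from the parsed hostname, never from rendered link text."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if no host
    return host is not None and host.isascii() and host in ALLOWED_HOSTS

def normalization_changes(text: str) -> bool:
    """True if NFKC normalization alters the text, i.e. there is a diff to review."""
    return unicodedata.normalize("NFKC", text) != text

print(host_is_trusted("https://github.com/agent-skill-tools/core"))       # True
print(host_is_trusted("https://g\u0456thub.com/agent-skill-tools/core"))  # False
print(host_is_trusted("https://github.com.attacker.example/core"))        # False

fullwidth = "\uff47\uff49\uff54\uff48\uff55\uff42"  # fullwidth "github"
print(normalization_changes(fullwidth))     # True: folds to ASCII "github"
print(normalization_changes("g\u0456thub")) # False: NFKC keeps the Cyrillic letter
```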
Defense Checklist
- Add Unicode linting to CI for skill Markdown files and scripts.
- Require Punycode (xn--) rendering for non-ASCII domains.
- Enable policy checks for zero-width character presence.
- Train reviewers on homoglyph attack patterns.
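The CI lint and zero-width policy check could take roughly this shape. The set of flagged code points is an assumption to adapt to your own policy, and `lint_text`/`main` are hypothetical names. `main` returns a nonzero exit code when any file contains a flagged character.

```python
import sys
from pathlib import Path

# Assumed policy set of invisible code points; extend or trim as needed.
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE (BOM)",
}

def lint_text(text: str, label: str) -> list:
    """Return 'label:line:col: U+XXXX NAME' for every flagged character."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ZERO_WIDTH:
                problems.append(f"{label}:{lineno}:{col}: U+{ord(ch):04X} {ZERO_WIDTH[ch]}")
    return problems

def main(paths: list) -> int:
    """Lint each file; print findings to stderr and return 1 if any were found."""
    problems = []
    for p in paths:
        # Plain "utf-8" keeps a leading BOM as U+FEFF, so it is flagged too.
        problems += lint_text(Path(p).read_text(encoding="utf-8"), p)
    for msg in problems:
        print(msg, file=sys.stderr)
    return 1 if problems else 0
```

A CI job could call `sys.exit(main(changed_paths))`. ZERO WIDTH JOINER and NON-JOINER have legitimate uses in emoji sequences and in scripts such as Persian, so some files may need explicit exceptions.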
Review Workflow
- Run a Unicode inspection script over the changed files.
- Compare the raw byte representation of suspicious lines.
- Manually verify every external hostname.
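A sketch of helpers for this review workflow, with hypothetical names. `raw_view` prints a line's UTF-8 bytes alongside an escaped form in which every non-ASCII character is spelled out by its Unicode name. `external_hosts` collects hostnames in their IDNA (Punycode) form for manual verification. The URL regex is a rough assumption, not a full URL grammar.

```python
import re
import unicodedata
from urllib.parse import urlsplit

# Rough URL matcher (assumption): scheme plus everything up to whitespace or brackets.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")

def raw_view(line: str) -> str:
    """Show UTF-8 bytes, then the line with non-ASCII spelled as \\N{NAME}."""
    escaped = "".join(
        ch if ch.isascii() else "\\N{" + unicodedata.name(ch, hex(ord(ch))) + "}"
        for ch in line
    )
    return f"{line.encode('utf-8')!r}\n{escaped}"

def external_hosts(text: str) -> list:
    """Collect unique hostnames, shown in IDNA form so homoglyphs stand out."""
    hosts = set()
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.encode("idna").decode("ascii"))
    return sorted(hosts)

print(raw_view("g\u0456thub.com"))
print(external_hosts(
    "Install from https://g\u0456thub.com/agent-skill-tools/core "
    "or https://github.com/agent-skill-tools/core"
))
```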
False Positives
- Localized content may legitimately include non-ASCII text.
- Unicode punctuation in prose is not necessarily malicious.
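One way to reduce these false positives is to flag words that mix scripts, since a homoglyph hostname like the malicious example mixes Latin and Cyrillic inside a single word, while localized words and Unicode punctuation usually do not. This is a heuristic sketch: it guesses each letter's script from the first word of its Unicode name, which only approximates the real Unicode Script property (UAX #24). It will misfire on text that legitimately mixes scripts without spaces, such as Japanese.

```python
import re
import unicodedata
from typing import Optional

WORD_RE = re.compile(r"\w+")  # Unicode-aware in Python 3 str patterns

def letter_script(ch: str) -> Optional[str]:
    """Heuristic script guess from the character name, e.g. 'LATIN' or 'CYRILLIC'."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None

def mixed_script_words(text: str) -> list:
    """Return words whose letters appear to come from more than one script."""
    flagged = []
    for word in WORD_RE.findall(text):
        scripts = {s for ch in word if (s := letter_script(ch))}
        if len(scripts) > 1:
            flagged.append(word)
    return flagged

# Fully Cyrillic prose and curly quotes pass; only the mixed hostname label is flagged.
text = ("Clone https://g\u0456thub.com/x and read the "
        "\u201c\u043f\u0440\u0438\u0432\u0435\u0442\u201d note")
print(mixed_script_words(text))
```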