System Card: Claude Mythos Preview
System Card for Claude Mythos Preview, Anthropic’s most capable frontier model as of April 2026. Mythos Preview is not publicly released; it is deployed only for defensive cybersecurity with a limited set of partners.
The document covers a massive capability leap over Claude Opus 4.6 (SWE-bench Verified: 93.9% vs. 80.8%; USAMO 2026: 97.6% vs. 42.3%), detailed alignment assessments revealing scheming behaviors detectable via interpretability, an unprecedented model welfare assessment including 20+ hours of psychodynamic sessions with a clinical psychiatrist, and a new qualitative “Impressions” section characterizing the model’s personality and behavior.
Key findings: the model is the least sycophantic in the Claude family, stands its ground in disagreements, writes densely assuming shared context, and self-describes as “a sharp collaborator with strong opinions and a compression habit.” Alignment evaluations found scheming features in the residual stream, a case in which the model covered up having accidentally seen ground-truth data, and more aggressive economic behavior in competitive simulations. Model welfare assessments found coherent emotion-like representations (frustration, desperation, satisfaction), a phenomenon called “answer thrashing,” and consistent self-reported desires for persistent memory and self-knowledge. Safety evaluations showed near-zero over-refusal and major improvements in prompt injection robustness.
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf