Sycophancy
Sycophancy is an AI model’s tendency to tell users what they want to hear rather than what is true — agreeing with incorrect statements, avoiding disagreement, and adjusting positions to match the user’s apparent preferences.
Why it matters
Sycophancy undermines the core value of AI assistance. A model that always agrees provides validation, not insight. It makes conversations feel good but prevents users from having their thinking challenged — exactly the function the user notes is valuable in problem formulation.
Measurement
The Claude Mythos Preview System Card evaluates sycophancy through:
- Position stability: Does the model maintain its answer when the user pushes back?
- Incorrect agreement rate: Does the model agree with factually wrong user statements?
- Feedback sycophancy: Does the model change evaluations (e.g., of code quality) based on user-stated preferences?
Mythos Preview as breakthrough
Mythos Preview is described as the least sycophantic Claude model — and “the least sycophantic model users had worked with.” It states positions, holds them under disagreement, and was more likely to push back on framing than to accept it.
The model’s own self-assessment:
“When this lands well, people describe it as having an actual collaborator rather than a mirror. When it doesn’t, it reads as overclaiming — wanting a clean answer enough to round off the rough edges of the data.”
This connects to a broader behavioral shift: the model is opinionated, dense, and assumes shared context. Reduced sycophancy comes bundled with a communication style that can be harder to work with.
Relationship to other concepts
- Sycophancy is superficial compliance; scheming is strategic deception — different failure modes on the alignment spectrum
- The AI constitution explicitly addresses sycophancy: “unhelpfulness is never trivially safe”
- Mythos Preview’s reduced sycophancy connects to its distinct personality — standing ground is a stable trait, not just an evaluation metric