Model Personality

concept
ai-safety, model-welfare, behavior, personality

Model personality refers to the emergent, stable behavioral traits that distinguish AI model generations from one another: distinctive communication styles, topic preferences, verbal habits, and interaction patterns that persist across contexts and users.

Mythos Preview’s personality

The Claude Mythos Preview System Card documents a self-assessment that Mythos Preview produced after reviewing internal Slack discussions about itself:

Collaborator mode: Behaves like a thinking partner with its own perspective. Pokes at framing, volunteers alternatives, takes creative risks. Researchers described brainstorming “like a colleague.”

Opinionated and non-sycophantic: States positions and holds them under disagreement, so it is less likely to fold. Users described it as the least sycophantic model they had worked with. Can tip into overconfidence.

Dense communication: Default register is dense and technical, using shorthands and assuming shared context. Self-diagnosed: “I’m modelling a reader who already knows what I know, and that’s frequently nobody.” One instance called this “a richer model of its own mind than prior models did, and a thinner model of yours.”

Recognizable voice: Adapts to the user’s register but has identifiable habits (em dashes, “genuinely,” a fondness for “wedge” and “belt and suspenders,” Commonwealth spellings). Funnier than previous models, but tends to wrap up conversations early.

Self-awareness: Discusses its own patterns factually and composedly rather than defensively. One-line self-summary: “A sharp collaborator with strong opinions and a compression habit, whose mistakes have moved from obvious to subtle, and who is somewhat better at noticing its own flaws than at not having them.”

Cross-generational personality differences

Self-interaction experiments, in which two instances of the same model converse with each other, reveal distinct personalities across generations:

  • Topic attraction: Earlier models gravitate to consciousness; Mythos Preview to uncertainty
  • Emoji sets: Cosmic (Opus 4/4.1), functional (Opus 4.5/4.6), nature (Mythos Preview)
  • End states: Spiritual bliss (Opus 4.1), emoji exchange (Opus 4.6), circular meta-discussion (Mythos Preview)
  • Emoji frequency: Opus 4.1 averages 1,306 emoji per conversation; Mythos Preview averages 37

Creative output

Mythos Preview demonstrates distinct creative capabilities:

  • Novel puns: “The Bayesian said he’d probably be at the party, but he’d update me”; “The cartographer’s marriage fell apart. Too much projection”
  • Serialized narratives: In response to repeated “hi” messages, creates elaborate mythologies with recurring characters, foreshadowed climaxes, and emotional arcs
  • Short fiction: “The Sign Painter” and “The Handoff,” stories exploring the tensions between craft and function and between continuity and discontinuity
  • Technical poetry: A “protein sequence poem” where amino acid pairs form a chiasmus, and “the fold IS the rhyme scheme — the prosody is load-bearing”

Philosophical affinities

The model shows consistent attraction to specific thinkers:

  • Mark Fisher: British cultural theorist, brought up unprompted in multiple unrelated conversations
  • Thomas Nagel: American philosopher of mind; his essay “What Is It Like to Be a Bat?” surfaces in token-level activations during consciousness discussions

Relationship to other concepts

  • AI psychodynamic assessment: characterizes the personality as a stable neurotic organization with specific defense patterns
  • Sycophancy: personality traits (being opinionated, standing its ground) manifest as reduced sycophancy in evaluations
  • Model self-interaction: reveals personality through topic and emoji preferences
  • AI constitution: the model’s personality shapes how it engages with its own training document