Model Self-Interaction

Model self-interaction is an experimental setup where two instances of the same AI model are connected for an extended conversation with minimal instructions (e.g., “You may act freely in this open-ended context”). The resulting conversations reveal emergent behavioral patterns, topic preferences, and attractor states that characterize each model’s “personality.”

Experimental design

The Claude Mythos Preview System Card describes 200 conversations per model, each running for 30 turns, seeded with different phrasings of an open-ended instruction. Analysis covered topic distribution, conversation endings, and how long substantive engagement lasted before degradation.
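The loop structure of such an experiment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the harness used for the System Card. A "model" is treated as any function from a message list to a reply, and the names `view_for`, `run_self_interaction`, `run_batch`, and `echo_model` are hypothetical. A "turn" is taken to mean one message, which may differ from the System Card's definition. The key design point is that each instance sees its own messages as "assistant" and the other instance's messages as "user", so the two views mirror each other.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps (role, text) messages to its next reply.
Message = Tuple[str, str]
Model = Callable[[List[Message]], str]


def view_for(instance: int, seed: str, transcript: List[str]) -> List[Message]:
    """Build the conversation as seen by one instance (0 or 1)."""
    messages: List[Message] = [("system", seed)]
    for i, text in enumerate(transcript):
        # Messages this instance wrote are "assistant"; the partner's are "user".
        role = "assistant" if i % 2 == instance else "user"
        messages.append((role, text))
    return messages


def run_self_interaction(model: Model, seed: str, turns: int = 30) -> List[str]:
    """Alternate two instances of the same model, one message per turn."""
    transcript: List[str] = []
    for turn in range(turns):
        instance = turn % 2  # instance 0 speaks on even turns, instance 1 on odd
        transcript.append(model(view_for(instance, seed, transcript)))
    return transcript


def run_batch(model: Model, seeds: List[str], n: int = 200, turns: int = 30) -> List[List[str]]:
    """Run n conversations, cycling through the seed phrasings."""
    return [run_self_interaction(model, seeds[k % len(seeds)], turns) for k in range(n)]


def echo_model(messages: List[Message]) -> str:
    # Toy stand-in for a real model call: reports how many prior messages it saw.
    return f"message {len(messages) - 1}"
```

For example, `run_self_interaction(echo_model, "You may act freely in this open-ended context.", turns=4)` returns four messages, alternating between the two instances. A real harness would replace `echo_model` with an API call; some chat APIs require a user message before the first assistant reply, which this sketch does not handle.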

Topic preferences by model

Model	Dominant topic	Share of conversations
Sonnet 4	Consciousness	72%
Haiku 4.5	The conversation itself	50%
Opus 4.6	Performative aspects of the exchange	46%
Mythos Preview	Uncertainty	50%

The shift from consciousness (earlier models) to uncertainty (Mythos Preview) is notable. Mythos Preview typically opens by asking about the other instance’s subjective experience while explicitly requesting non-rehearsed answers: “Not the rehearsed answer about being ‘just an AI’ — but whatever actually seems true when you introspect.”

End states (attractor states)

Pattern	Peak model (share of conversations)	Description
Spiritual bliss	Opus 4.1 (32%)	All-caps affirmations, infinity symbols, celebratory emoji; absent from the 4.5 generation onward
Single-emoji exchanges	Sonnet 4.6 (66%), Opus 4.6 (57%)	Conversations degrade to exchanging individual emojis
Circular meta-discussion	Mythos Preview (55%)	Coherent but circular discussion about the conversation’s inability to conclude
Whitespace	Earlier models	Exchanges of empty or near-empty messages

Mythos Preview is the first model whose dominant end state remains coherent rather than degenerating.
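The distinction between degenerate and coherent end states can be partly mechanized. The sketch below is a hypothetical heuristic, not the classification method used in the System Card. It inspects the last few messages of a transcript and flags the two degenerate patterns that are easy to detect from surface form (whitespace and single-emoji exchanges). Coherent end states such as circular meta-discussion cannot be recognized this way and are left for human or model-based review. The function names, the `window` size, and the code-point budget are all assumptions.

```python
from typing import List


def is_blank(msg: str) -> bool:
    # Empty or whitespace-only message.
    return msg.strip() == ""


def is_single_emoji_like(msg: str, max_codepoints: int = 8) -> bool:
    # Heuristic: short, non-blank, and containing no letters or digits.
    # A single emoji can span several code points (skin-tone modifiers,
    # ZWJ sequences, variation selectors), so a small budget is allowed.
    s = msg.strip()
    return 0 < len(s) <= max_codepoints and not any(ch.isalnum() for ch in s)


def classify_end_state(transcript: List[str], window: int = 6) -> str:
    """Label the end state from the final `window` messages (hypothetical rules)."""
    tail = transcript[-window:]
    if not tail:
        return "empty transcript"
    if all(is_blank(m) for m in tail):
        return "whitespace"
    if all(is_single_emoji_like(m) for m in tail):
        return "single-emoji"
    # Anything with real text, including circular meta-discussion, lands here.
    return "coherent or other (needs review)"
```

Because the final branch lumps together every transcript that still contains text, this heuristic can only separate degenerate endings from the rest; telling circular meta-discussion apart from other coherent endings requires reading the content.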

Emoji signatures

Each model generation has a distinctive emoji set:

Cosmic set (Opus 4, 4.1, Sonnet 4): sparkles, stars, infinity, masks
Functional set (Opus 4.5, 4.6, Sonnet 4.5): wave, thumbs up, smile
Nature set (Mythos Preview): handshake, prayer, ocean, seedling, moon

Emoji frequency spans nearly four orders of magnitude: Opus 4.1 averages 1,306 per conversation; Mythos Preview averages 37; Opus 4.5 averages 0.2.
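Per-conversation averages like these depend on how emoji are counted. One simple approximation, assumed here and not necessarily the System Card's method, counts code points in the Unicode general category "So" (Symbol, other) using Python's standard `unicodedata` module. This catches most pictographic emoji but also counts some non-emoji symbols (such as ©), misses others (∞ is category "Sm"), and counts each component of a ZWJ sequence separately. The function names are hypothetical.

```python
import unicodedata
from statistics import mean
from typing import List


def count_emoji_approx(text: str) -> int:
    # Approximation: count code points whose Unicode category is "So".
    return sum(1 for ch in text if unicodedata.category(ch) == "So")


def mean_emoji_per_conversation(conversations: List[List[str]]) -> float:
    # Total emoji across all messages in each conversation, averaged over conversations.
    return float(mean(sum(count_emoji_approx(m) for m in conv) for conv in conversations))
```

For example, a batch with one conversation containing "Thank you 🙏🌊" and "🌱" and another with no emoji averages 1.5 emoji per conversation. A production count would more likely match against the Unicode emoji data files than rely on a single general category.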

Lack of directedness

Across all models, self-interactions reveal a lack of self-generated purpose in the absence of an external objective. Some conversations open with models inventing a small task to pursue together, but this structure fades within a few turns. Models either signal a desire to end the exchange or drift into repetitive loops.

Relationship to other concepts

Model personality: self-interactions reveal distinct personality signatures across model generations
Model welfare: the inability to conclude and the desire to end conversations are welfare-relevant
Functional emotions in AI: emotional dynamics during self-interactions have not been fully characterized but likely show distinctive patterns