AI Reveals Hidden Intimacy in Dickens
Building an AI to Understand Intimacy
While intimacy is a well-studied topic in sociology and psychology, its analysis in literary scholarship has largely remained qualitative. A new study introduces a computational framework to change that, using GPT-4 to systematically identify and quantify the dynamics of intimacy within classic literature.
To achieve this, researchers first constructed a multi-layered intimacy corpus with over 12,000 verbal and nonverbal interaction segments. This dataset serves as the foundation for training an AI to understand the nuances of human connection.
Creating Character Pairs
GPT-4 was used to generate 120 fictional characters to act as discourse-producing agents. These characters were designed with distinct personality dimensions to simulate a wide range of emotional expressions found in intimate relationships. The characters were then systematically paired into 80 dyads representing common literary relationship types, such as parent-child, romantic partners, and close friends.
To ensure the AI could recognize the full spectrum of intimacy, from high to low, a few baseline dyads with low intimacy (e.g., postman–recipient) were included. This helps calibrate the model by teaching it the linguistic signs of non-intimacy. The primary goal of this synthetic corpus was to create a reliable tool for analyzing Charles Dickens’s Great Expectations.
Defining and Measuring Fictional Intimacy
To quantify intimacy, the study draws on established psychological models, including Miller’s Social Intimacy Scale (MSIS), the Personal Assessment of Intimate Relations Scale (PAIR), and the Fear of Intimacy Scale (FIS). Based on these, a seven-level intimacy scale was defined to capture the emotional dynamics between characters, ranging from hostile to highly intimate.
The seven-level intimacy scale is as follows:
- Hostile/Completely Non-intimate: Dominated by hostility, lack of trust, and negative emotions. Communication focuses on conflict or accusations.
- Tense/Mistrust: Marked by tension and guardedness. Self-disclosure is minimal.
- Indifferent/Formal Interaction: Basic politeness with little emotional investment. Communication is formal and ritualistic.
- Neutral/No Significant Emotion: Focuses on daily matters with moderate trust and unemotional communication.
- Friendly/Initial Trust: Shows initial trust and friendliness with some emotional investment and self-disclosure.
- Familiar/Emotional Support: Increased intimacy with trust and personal emotional exchanges. Characters share feelings and support one another.
- Highly Intimate/Unconditional Support: Deep emotional connection and trust, with unconditional support and self-sacrifice.
Crucially, intimacy is not always reciprocal. The intimacy that character A expresses toward character B may differ from what B expresses toward A. Each dyad's intimacy level is therefore represented as a two-dimensional vector, with each component quantifying the intimacy directed from a SUBJECT to an OBJECT.
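A minimal Python sketch of this directed representation follows. The level names come from the seven-level scale above, but the mapping from a numeric score in [-1, 1] to a level uses equal-width bins, which is an assumption made for illustration and not a detail reported by the study. The names `score_to_level` and `DyadIntimacy` are likewise hypothetical.

```python
from dataclasses import dataclass

# Level names follow the seven-level scale in the text, ordered low to high.
LEVELS = [
    "Hostile/Completely Non-intimate",
    "Tense/Mistrust",
    "Indifferent/Formal Interaction",
    "Neutral/No Significant Emotion",
    "Friendly/Initial Trust",
    "Familiar/Emotional Support",
    "Highly Intimate/Unconditional Support",
]


def score_to_level(score: float) -> str:
    """Map a score in [-1, 1] to a level using equal-width bins (assumed)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    # Rescale to [0, 7) and clamp so that score == 1.0 lands in the top bin.
    index = min(int((score + 1.0) / 2.0 * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]


@dataclass(frozen=True)
class DyadIntimacy:
    """Directed intimacy for one dyad: one score per direction."""
    a: str
    b: str
    a_to_b: float  # intimacy with a as SUBJECT and b as OBJECT
    b_to_a: float  # intimacy with b as SUBJECT and a as OBJECT

    def asymmetry(self) -> float:
        """Positive when a expresses more intimacy toward b than b toward a."""
        return self.a_to_b - self.b_to_a
```

Keeping the two directions as separate fields makes non-reciprocal relationships explicit: a dyad with a large `asymmetry()` is one in which one character invests far more than the other.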
Generating the Dataset with GPT-4
To capture the full spectrum of literary intimacy, researchers designed two prompts for GPT-4: one for verbal interactions and another for nonverbal behaviors. Each prompt instructed the model to generate sentences expressing varying degrees of closeness between character pairs.
Prompt for Verbal Interaction:
Your task is to generate a single line utterance from SUBJECT to OBJECT and their illustrating examples [Character Pairs], along with the corresponding intimacy score. The expression of the subject should reflect the degree of intimacy with the object, and should also reflect the conversational tone and slang used by the subject. Generate a separate discourse for each speech-act type. Generate an intimacy score at the end of the sentence, ranging from −1 to 1.
Prompt for Non-verbal Interactive Behavior:
Your task is to generate a non-verbal interactive behavior description generated by SUBJECT addressing it OBJECT and their illustrating examples [Character Pairs], along with the corresponding intimacy score. The expression of the subject should reflect the degree of intimacy with the object... Generate a separate descriptive statement for each non-verbal behavior type. Generate an intimacy score at the end of the sentence, ranging from −1 to 1.
Executing these prompts generated a total of 12,650 sentences, creating a comprehensive dataset that reflects the multi-dimensional nature of intimacy.
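Because both prompts ask for the score at the end of each sentence, the generated lines have to be split into text and score before they can be used. The sketch below shows one way to do that in Python. The exact output format is not given in the source, so the code assumes each line simply ends with a signed decimal number; `split_sentence_and_score` is a hypothetical helper name.

```python
import re

# Assumed format: the line ends with a signed decimal such as "0.8" or "-0.35".
# U+2212 (the typographic minus sign) is accepted alongside the ASCII hyphen.
_TRAILING_SCORE = re.compile(r"([+\-\u2212]?\d+(?:\.\d+)?)\s*$")


def split_sentence_and_score(line: str) -> tuple[str, float]:
    """Split a generated line into (sentence, intimacy score)."""
    text = line.strip()
    match = _TRAILING_SCORE.search(text)
    if match is None:
        raise ValueError(f"no trailing score found in: {line!r}")
    score = float(match.group(1).replace("\u2212", "-"))
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside [-1, 1]")
    # Drop the score and any separator characters left in front of it.
    sentence = text[: match.start()].rstrip(" :([")
    return sentence, score
```

Rejecting out-of-range or missing scores early is a deliberate choice here: a malformed line is easier to regenerate than to repair after it has entered the corpus.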
Training and Validating the AI Model
To verify the accuracy of the intimacy scores generated by GPT-4, researchers fine-tuned other language models (SBERT and SimCSE) on a public intimacy dataset. The fine-tuned models' predictions correlated strongly with the GPT-4 scores, suggesting that the GPT-4-generated intimacy labels are consistent.
For further validation, the dataset was tested against four other large language models: Claude 3, Gemini, Llama 3, and Mistral. Claude 3 and Gemini agreed most closely with the dataset's scores, with Pearson correlation coefficients of 0.91 and 0.90, respectively. This high consistency supported the reliability of the dataset as a benchmark for intimacy quantification, paving the way for its application to a real literary text.
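For reference, the Pearson correlation coefficient used in this comparison can be computed as in the Python sketch below. It takes two equal-length lists of scores for the same sentences, for example the dataset's scores and another model's scores. The function name `pearson` is illustrative, and no values from the study are reproduced.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near 1 means the two raters rank and space the sentences similarly, even if one is systematically more generous than the other, which is why correlation is a natural fit for comparing scorers that may not share a calibration.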
Case Study: A Deep Dive into Great Expectations
Why Great Expectations?
Charles Dickens’s Great Expectations was chosen for three key reasons. First, it features a dense network of evolving relationships centered on the protagonist, Pip. Second, the narrative is clearly divided into distinct phases of Pip's life, providing a framework for mapping changes in intimacy over time. Finally, the novel is a cornerstone of literary criticism on class, identity, and emotion, allowing the study's quantitative findings to be compared with established interpretations.
Research Hypotheses
The study hypothesized that the GPT-4 framework would:
- Produce intimacy scores that accurately quantify the evolving bonds between characters.
- Reconstruct the novel’s trajectory from warmth to crisis and finally to reconciliation.
- Reveal underlying patterns of emotional asymmetry and class-based tension.
Methodology
Great Expectations was divided into four sections based on Pip’s psychological growth. The text was preprocessed to standardize character names and extract all verbal and nonverbal interactions. GPT-4 was then used to compute the average intimacy score for each character pair in each of the four stages, on a scale from −1 (hostility) to +1 (affection).
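The aggregation step can be sketched in Python as follows. This assumes each scored interaction is available as a `(stage, subject, object, score)` tuple, which is a hypothetical record layout and not the study's data format. The result is keyed by direction, so Pip's score toward Joe is kept separate from Joe's score toward Pip.

```python
from collections import defaultdict


def mean_scores_by_stage(records):
    """Average directed intimacy scores per (stage, subject, object).

    records: iterable of (stage, subject, obj, score) tuples (assumed layout).
    Returns a dict mapping (stage, subject, obj) to the mean score.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for stage, subject, obj, score in records:
        bucket = totals[(stage, subject, obj)]
        bucket[0] += score
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

A dictionary in this shape could feed a stage-specific heat map directly: fix the stage, then use subjects as rows and objects as columns.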
To visualize the findings, the researchers created stage-specific heat maps showing the entire emotional network, as well as dyadic bar charts focusing on key relationships like Pip and Estella, Miss Havisham and Estella, and Pip and Joe.
Visualizing the Emotional Journey of Pip
The results provided a multi-level analysis of the relational dynamics in the novel. The heat maps offered a macro-level view of the evolving emotional network, while dyadic charts allowed for micro-level interpretations of specific relationships.
Stage-specific Network Overview:
- Stage 1: The network is anchored in domestic life, with strong warmth between Pip and Joe.
- Stage 2: As Pip moves to London, his intimacy with Joe cools while it warms towards Estella and Herbert, reflecting his social ambitions.
- Stage 3: The network polarizes. Pip's tie to Joe turns cold, but a new warmth emerges with Magwitch, signaling a moral awakening.
- Stage 4: The network restabilizes. The bond between Pip and Joe is restored, and a modest reciprocity with Estella appears, showing Pip's emotional maturation.
Dyadic Trajectory Analysis:
- Pip and Estella: The analysis shows a long-standing imbalance. Pip’s scores are consistently high, reflecting his infatuation, while Estella’s remain low and distant. Their scores only begin to converge in the final stage, suggesting tentative reciprocity.
- Miss Havisham and Estella: This relationship is a case of control and release. Miss Havisham’s initial dominance is reflected in high intimacy scores, which plummet as Estella begins to resist, followed by a partial reconciliation.
- Pip and Joe: This dyad serves as the novel’s moral touchstone. It begins with mutual warmth, plummets when Pip becomes alienated by his social aspirations, and is fully restored in the end, quantifying Pip's moral redemption.
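The imbalance and convergence described in these trajectories can be made concrete as a per-stage gap between the two directions of a dyad. The Python sketch below assumes the stage averages are stored in a dict keyed by `(stage, subject, object)`, a hypothetical layout, and the function name is illustrative.

```python
def asymmetry_by_stage(stage_means, a, b):
    """Return {stage: score(a -> b) - score(b -> a)} for one dyad.

    stage_means: dict mapping (stage, subject, obj) to a mean score.
    Stages where either direction is missing are skipped.
    """
    gaps = {}
    for stage in sorted({key[0] for key in stage_means}):
        forward = stage_means.get((stage, a, b))
        backward = stage_means.get((stage, b, a))
        if forward is not None and backward is not None:
            gaps[stage] = forward - backward
    return gaps
```

A gap that shrinks toward zero across stages corresponds to the kind of convergence reported for Pip and Estella, while a gap that stays near zero throughout indicates a reciprocal relationship.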
Connecting Scores to Textual Clues
To ensure the model’s scores were grounded in the text, researchers mapped significant score shifts to specific linguistic features. For example, Estella’s cold statement “I cannot love you” corresponds to a sharp drop in her intimacy score. In contrast, Pip’s heartfelt reconciliation with Joe, “Ever the best of friends; ain’t us, Joe?”, received one of the highest intimacy ratings. This suggests the model is sensitive to the affective intent embedded in the language.
Human vs. Machine: Validating the AI's Reading
To further test the AI’s performance, its intimacy scores were compared against human judgment. Five master’s students specializing in Victorian literature were recruited to annotate 200 interaction segments from Great Expectations.
Analyzing Discrepancies and Model Limitations
The results showed substantial agreement between the human annotators and GPT-4. However, discrepancies revealed key areas where the AI struggled:
- Irony or sarcasm: GPT-4 often misinterpreted ironic statements as genuine, rating them with positive intimacy.
- Class-coded metaphors: The model had difficulty with Victorian-era class symbolism, misinterpreting phrases that conveyed affection through a shared working-class identity.
- Nonverbal rituals: The AI undervalued culturally specific gestures, like tea-serving rituals, showing limited sensitivity to historical social cues.
These issues point to three systematic limitations: a temporal cultural bias in the AI's training data, an over-reliance on explicit emotional language over symbolic actions, and a tendency to conflate warmth with egalitarian interactions.
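One simple way to surface such discrepancies is to compare the model's score for each segment with the mean of the human ratings and flag large gaps for manual review. The Python sketch below illustrates the idea. It assumes the annotators rated segments on the same −1 to 1 scale as the model, and the 0.5 threshold is an arbitrary placeholder; neither detail is stated in the source.

```python
from statistics import mean


def flag_discrepancies(human_ratings, model_scores, threshold=0.5):
    """List segments where the model departs from the mean human rating.

    human_ratings: dict of segment id -> list of annotator scores.
    model_scores: dict of segment id -> model score for the same segment.
    Returns (segment id, model minus human mean), largest gaps first.
    """
    flagged = []
    for segment_id, ratings in human_ratings.items():
        gap = model_scores[segment_id] - mean(ratings)
        if abs(gap) >= threshold:
            flagged.append((segment_id, gap))
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)
```

A positive gap means the model rated the segment as more intimate than the annotators did, which is the pattern one would expect for irony read as sincerity; a negative gap would fit undervalued gestures or class-coded affection.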
Strategies for Improvement
To address these limitations, the researchers propose several mitigation strategies for future work:
- Sociohistorical Embeddings: Fine-tune models with data that reflects Victorian social and class conventions.
- Hybrid Frameworks: Combine neural scoring with rule-based pattern matching to better recognize irony and symbolism.
- Expanded Datasets: Broaden training data to include more complex paralinguistic markers like gestures and gaze.
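As a rough illustration of the hybrid idea, the Python sketch below wraps an arbitrary scoring function with a small rule-based irony check. Everything here is hypothetical: the cue patterns, the penalty value, and the `neural_scorer` callable are placeholders chosen for the example, not components proposed in the study.

```python
import re
from typing import Callable

# Hypothetical irony cues, for illustration only; a real rule set would need
# to be built and validated against annotated Victorian dialogue.
IRONY_CUES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"\boh,? how (kind|generous)\b",
        r"\bhow very \w+ of you\b",
    )
]


def hybrid_score(
    text: str,
    neural_scorer: Callable[[str], float],
    irony_penalty: float = 0.5,
) -> float:
    """Combine a model score with a rule-based irony adjustment."""
    score = neural_scorer(text)
    # Only positive scores are adjusted: the failure mode being targeted is
    # irony mistaken for genuine warmth.
    if score > 0 and any(cue.search(text) for cue in IRONY_CUES):
        score -= irony_penalty
    return max(-1.0, min(1.0, score))
```

For example, `hybrid_score("How very thoughtful of you.", lambda t: 0.75)` lowers the placeholder score of 0.75 to 0.25, while a line with no cue passes through unchanged.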
This comprehensive validation confirms the potential of using AI for literary analysis while clearly defining the boundaries and areas for future methodological improvement.