SafeCompanion: Evaluating Pushback and Friction as Interventions for Harmful AI Interactions

Investigating mitigation strategies for high-risk AI companion interactions (e.g., emotional dependence) by measuring how users respond to varying levels of conversational friction.

Highlights

Developing a theoretically grounded codebook to annotate risk constructs in multi-turn Character.AI conversation datasets, and establishing high inter-annotator agreement on it.
Designing and conducting user surveys and behavioral studies to identify Pareto-optimal interventions that reduce harm without significantly diminishing user engagement.

Metadata

Context: Research project, Stanford SALT Lab
Role: Research Assistant
Timeline: Feb 2026 - Present