SafeCompanion: Evaluating Pushback/Friction as an Intervention for Harmful AI Interaction
Investigating mitigation strategies for high-risk AI companion interactions (e.g., emotional dependence) by evaluating user responses to varying levels of conversational friction.
Highlights
- Developing a theoretically grounded codebook to annotate risk constructs in multi-turn Character.AI conversation datasets, establishing high inter-annotator agreement.
- Designing and conducting user surveys and behavioral studies to identify Pareto-optimal interventions that reduce harm without significantly diminishing user engagement.
Metadata
- Context: Research project at the Stanford SALT Lab
- Role: Research Assistant
- Timeline: Feb 2026 - Present