SafeCompanion: Evaluating Pushback/Friction as an Intervention for Harmful AI Interaction

Investigating mitigation strategies for high-risk AI companion interactions (e.g., emotional dependence) by evaluating user responses to varying levels of conversational friction.

Highlights

  • Developing a theoretically grounded codebook for annotating risk constructs in multi-turn Character.AI conversation datasets, and establishing high inter-annotator agreement on it.
  • Designing and conducting user surveys and behavioral studies to identify Pareto-optimal interventions that reduce harm without significantly diminishing user engagement.
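
A minimal sketch of what "Pareto-optimal" means for intervention selection. It assumes each friction condition is summarized by two scores where higher is better: a harm-reduction score and an engagement score. The class and field names are hypothetical and do not come from the project. Under that assumption, a condition is kept only if no other condition is at least as good on both scores and strictly better on at least one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterventionResult:
    """Hypothetical summary of one friction condition (names are illustrative)."""
    name: str
    harm_reduction: float  # assumed metric, higher is better
    engagement: float      # assumed metric, higher is better


def dominates(a: InterventionResult, b: InterventionResult) -> bool:
    """True if a is at least as good as b on both scores and strictly better on one."""
    return (
        a.harm_reduction >= b.harm_reduction
        and a.engagement >= b.engagement
        and (a.harm_reduction > b.harm_reduction or a.engagement > b.engagement)
    )


def pareto_front(results: list[InterventionResult]) -> list[InterventionResult]:
    """Return the non-dominated conditions, preserving input order."""
    return [
        r for r in results
        if not any(dominates(other, r) for other in results if other is not r)
    ]
```

The front usually contains several conditions rather than one "best" intervention. Choosing among them (for example, the strongest friction whose engagement stays above some threshold) is a separate judgment that a sketch like this does not make.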

Metadata

  • Context: Research project, Stanford SALT Lab
  • Role: Research Assistant
  • Timeline: Feb 2026 - Present