Job Details

Job Overview

Transform cutting-edge research into reliable, user-visible AI features in a mental health context. This role focuses on turning prompts into production-ready end-to-end LLM features for voice and text therapy sessions, prioritizing safety and evaluation to ensure quality outcomes. Key Responsibilities: Design and deploy LLM features for therapy, manage offline and online evaluation loops, implement safety systems, and drive a weekly prototype-to-release cadence. Required Technical Skills:...

Responsibilities

You’ll be the force multiplier who turns research and product ideas into reliable, safe, user-visible features, fast. Day to day you’ll design prompts/agents, wire in tools, build evals and safety guardrails, and ship to production with the founders, always tying quality to real outcomes. New grads are welcome.

  • Ship end-to-end LLM features for voice + text therapy sessions: prompt/agent design, tool use/function calling, latency & turn-taking handling, and production deployment.
  • Build offline and online eval loops (unit tests, regression suites, shadow/prod checks) that track reliability and user outcomes (e.g., anxiety-score trends like GAD-7), and use them to decide what ships.
  • Implement and iterate safety systems: risky-response detection, fallback/deferral strategies, human-in-the-loop escalation, and post-incident reviews; treat safety as a first-class product feature.
  • Own Python services and lightweight product surfaces (internal tools, small UX hooks) that speed up experimentation and founder feedback loops.
  • Partner with Design (voice/chat UX) and Clinical Research to translate findings into product improvements and safeguards; instrument what you ship so we can learn quickly in production.
  • Drive a weekly shipping cadence: prototype → evaluate → harden → release; document decisions and metrics so the team can build on them.

Qualifications

Sonia optimizes for builders who have actually shipped LLM systems and love the loop of design → measurement → iteration, in a small, in-person SF team.

  • Strong Python plus hands-on prompt/LLM engineering (tool use, function calling, retrieval or memory patterns, evals); you’ve shipped something real users touched.
  • Product sense and speed: you can simplify ambiguous problems, choose pragmatic baselines, and deliver value in days—not quarters.
  • Track record of safety-critical thinking (red-flag detection, guardrails, fallback paths) and comfort being accountable for quality in production.
  • Evidence-driven mindset: you instrument features and are comfortable tying quality to measurable outcomes (not just engagement).
  • Collaboration in a tiny, high-trust team: you like building in person with founders and cross-functional partners (design/research). Working in person in San Francisco is required, as is US work authorization with no sponsorship needs.

Ideal Candidate

We’re looking for a design–engineering hybrid who owns discovery → UX/UI → implementation for our iOS app, iterates quickly with the founders in person in San Francisco, and cares deeply about measurable outcomes and safety.

What makes you a strong fit

• You’ve shipped mobile product end-to-end (portfolio shows discovery, design, and hands-on build), ideally in Swift/SwiftUI.

• You can design stateful conversational experiences (voice + chat) and instrument what you ship to learn quickly.

• You use research and data to decide—e.g., you can explain how you’d evaluate changes with validated measures like GAD-7, not just engagement metrics.

• You thrive on small-team pace and high ownership, collaborating daily with founders in person in SF.

• You treat safety as a product feature and can describe guardrails you’ve designed for sensitive contexts.

Benchmark profile (from founders)

• Design: young, hungry, ideally with some engineering skills or interest; new-grad OK.

• Benchmark: https://x.com/floguo (design engineer; strong product taste + build skills).

Signals we’re likely to pass on

• Only visual polish with no shipped, user-validated work.

• Can’t work in person in San Francisco.

• No experience designing or building for voice/chat or other safety-critical UX.

Why this is exciting

• Mission with evidence: Sonia is building a safe AI therapist (voice + text) and publicly emphasizes outcomes; you’ll shape how the app looks, feels, and measures progress from day one.

Must-Have Requirements

  • In-person, San Francisco (team works on-site). (From founder email + YC page)
  • US work authorization (YC lists “US citizen/visa only”).
  • Strong Python and Swift mobile development skills.
  • Evidence of shipped LLM/agent features (code or live demo).
  • Safety + eval mindset (guardrails, pre-delivery checks) given the mental-health context.

Screening Questions

1. (Optional Video). This step is completely optional. If you’d like, record a short 2–3 minute video introducing yourself and your experience — or share a recording of your interview with the recruiter if that’s easier. You can upload the link via Loom or Google Drive. This just helps us get to know you better, but there’s no pressure if you’d prefer to skip it.
2. (Optional Portfolio / GitHub) If available, please share a link to your GitHub, portfolio, or any recent projects you’ve worked on. This is entirely optional but helps provide more context about your work.
3. Why Sonia? What about our mission (building a safe AI therapist, voice + text) and our in-person SF culture resonates with you?
