Job Details

Job Overview

As a Founding Infrastructure / Platform Engineer, you will own the cloud, data, and deployment foundations for a voice-AI product in San Francisco. The role centers on operating production AWS infrastructure with operational excellence, managing Infrastructure-as-Code with Terraform, and maintaining CI/CD pipelines in GitHub Actions. AWS Infrastructure Operation: manage production environments, leveraging Terraform for secure networking, sane defaults, and automated rollbacks. Data Pipeline Management: construct high-throughput systems for ingestion, transformation, training data prep, and reporting.

Responsibilities

Known is building a voice-AI product that powers curated introductions, agentic scheduling, and post-date feedback. You will own the cloud, data, and deployment foundation that makes this work reliably at scale.

Our stack: AWS-first with Terraform for IaC, containers on Kubernetes or ECS, CI/CD via GitHub Actions, services in Python and TypeScript, Postgres (including pgvector) plus a warehouse for analytics, and full observability across services, data jobs, and model endpoints.

Key responsibilities

  • Design and operate production AWS infrastructure using Terraform with secure networking, sane defaults, and automated rollbacks.
  • Build and maintain high-throughput data pipelines for ingestion, transformation, training data prep, and reporting.
  • Partner with AI/ML to ship model inference and evaluation in prod; version, deploy, and monitor LLM and matching services.
  • Own PostgreSQL performance and reliability, including schema design, indexing, connection pooling, and pgvector usage.
  • Establish CI/CD, release workflows, and environment hygiene to enable fast, safe iteration.
  • Implement observability across services and pipelines: logging, metrics, tracing, alerting, SLOs, and incident response.
  • Drive cost awareness and reliability across web, mobile, and agentic systems, balancing latency with scale.
  • Collaborate with product, backend, and ML to align infra decisions with user outcomes and roadmap priorities.
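Several of the responsibilities above reference pgvector, which at its core is nearest-neighbor search over embedding columns in Postgres. As a rough sketch only (pure Python, not Known's actual schema or data), pgvector's cosine-distance operator `<=>` combined with `ORDER BY ... LIMIT k` behaves like this:

```python
import math

def cosine_distance(a, b):
    """Cosine distance as computed by pgvector's <=> operator: 1 - cos(theta)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query, rows, k=1):
    """Return the k rows closest to `query`,
    like: SELECT ... ORDER BY embedding <=> query LIMIT k."""
    return sorted(rows, key=lambda r: cosine_distance(r["embedding"], query))[:k]

# Hypothetical rows standing in for a table with a vector column.
profiles = [
    {"id": 1, "embedding": [1.0, 0.0]},
    {"id": 2, "embedding": [0.0, 1.0]},
    {"id": 3, "embedding": [0.9, 0.1]},
]
best = nearest([1.0, 0.05], profiles, k=1)[0]  # closest match is id 1
```

In Postgres itself the equivalent query would be `SELECT id FROM profiles ORDER BY embedding <=> $1 LIMIT 1;`, typically backed by an HNSW or IVFFlat index rather than a full scan.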

Qualifications

We are hiring a founding-caliber Infrastructure / Platform Engineer who has owned production cloud environments and data platforms in high-growth settings. You will set the golden paths for services, data, and model delivery, and you are comfortable working on-site in San Francisco five days a week.

  • 4 to 10+ years in infrastructure, platform, or data engineering with real ownership of uptime, performance, and security.
  • Expert with AWS and Infrastructure-as-Code (Terraform, Pulumi, or CloudFormation).
  • Strong proficiency in Python or TypeScript, plus tooling/scripting (Bash/YAML).
  • Containers and orchestration experience (Docker, Kubernetes or ECS) and CI/CD pipelines you designed and ran.
  • Proven ability to design and operate data pipelines and distributed systems for both batch and low-latency use cases.
  • PostgreSQL at scale, ideally with pgvector/embeddings exposure for ML-adjacent workloads.
  • Strong observability practices: metrics, tracing, alerting, incident management, and SLOs.
  • Excellent collaboration with AI/ML and product teams; clear communication of tradeoffs and risk.
  • Work authorization in the U.S. and willingness to be on-site five days a week in San Francisco.
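SLOs come up in several of the bullets above; as a generic refresher (not specific to this role), an availability target implies a fixed downtime budget per window:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    return (1.0 - slo) * window_days * 24 * 60

# 99.9% over a 30-day window leaves about 43.2 minutes of downtime budget.
budget = error_budget_minutes(0.999)
```

This arithmetic is what makes "an SLO with an error budget" actionable: burn rate against the budget, not raw uptime, is what pages on-call.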

Nice to have

  • Experience supporting model training and inference pipelines, feature stores, or evaluation loops.
  • Prior work with streaming voice, low-latency systems, or recommendation/retrieval stacks.

Examples of prior experience we value

  • Early infra/platform owner at a seed–Series B startup, scaling AWS with Terraform and CI/CD
  • Built real-time and batch data pipelines that powered matching, voice, or recommendations
  • Ran Postgres at scale (schema design, indexing, pooling), with pgvector or embeddings in prod
  • Set up observability and on-call (metrics, tracing, alerting) that improved SLOs
  • Partnered with ML to deploy and monitor model inference with clear latency and cost targets

Ideal Candidate

You are a founding-caliber platform engineer who owns production cloud and data systems end to end. You move quickly while keeping reliability high, partner closely with AI/ML and backend, and build the golden paths that let the team ship with confidence. You care about clean Terraform, clear SLOs, low latency, and pipelines that turn raw data into model-ready tables. On-site in San Francisco.

What great looks like

  • 4 to 8+ years building and running AWS infrastructure with Terraform, CI/CD, and secure networking
  • Proven experience with containers and orchestration using Kubernetes or ECS, plus GitHub Actions or similar
  • Strong Python or TypeScript for services, jobs, and tooling
  • PostgreSQL at scale, including schema design, indexing, pooling, and exposure to embeddings or pgvector
  • Observability-first mindset with metrics, tracing, alerting, and effective incident response
  • Comfortable partnering with ML to deploy, monitor, and evaluate inference services

Examples of strong backgrounds

  • Early infra or platform owner at a seed to Series B consumer startup that scaled to meaningful usage
  • Platform or SRE lead who created templates and self-serve tooling that let multiple teams ship safely
  • Data platform engineer who built ingestion, transformation, and reporting that supported model training and evaluation


Must-Have Requirements

  • Must be authorized to work in the U.S. without future visa sponsorship.
  • Able to work onsite in San Francisco, CA five days per week.
  • 4+ years in infrastructure/platform or SRE with real production ownership.
  • Strong AWS + Infrastructure-as-Code (Terraform or similar).
  • Containers and orchestration (Docker with Kubernetes or ECS) and CI/CD experience.
  • Proficient in Python or TypeScript for tooling and services.
  • PostgreSQL at scale; familiarity with performance tuning and pgvector is a plus.
  • Solid observability and on-call practices (metrics, tracing, alerting, incident response).
  • Experience building and operating data pipelines (batch and/or streaming).

Screening Questions

1. (Optional Video) This step is completely optional. If you’d like, record a short 2–3 minute video introducing yourself and your experience, or share a recording of your interview with the recruiter if that’s easier. You can share the link via Loom or Google Drive. This just helps us get to know you better, but there’s no pressure if you’d prefer to skip it.
2. (Optional Portfolio / GitHub) If available, please share a link to your GitHub, portfolio, or any recent projects you’ve worked on. This is entirely optional but helps provide more context about your work.
3. What excites you most about Known and why are you leaving your current opportunity for this one?
4. Which platform areas are your strongest? Pick up to three — AWS+Terraform, Kubernetes/ECS, CI/CD, data pipelines, Postgres/pgvector, observability/on-call, ML inference infra — and give 2–3 sentences on a recent project for each, including scale (e.g., RPS or GB/day) and your role.
5. Describe one production pipeline or service you owned end-to-end. Share the architecture, throughput/latency targets, tools used (e.g., Terraform, Kafka/Airflow/dbt/ECS), how you monitored it (metrics/alerts), and one incident you detected and resolved (before/after impact).
