Job Details

Job Overview

As a Founding Infrastructure / Platform Engineer, you will own the cloud, data, and deployment foundations for a voice-AI product in San Francisco. The role centers on operating production AWS infrastructure with operational excellence, managing Infrastructure-as-Code with Terraform, and maintaining CI/CD pipelines in GitHub Actions. AWS Infrastructure Operation: manage production environments, leveraging Terraform for secure networking, sane defaults, and automated rollbacks. Data Pipeline Management: construct high-throughput systems for ingestion, transformation, training data prep, and reporting.

Responsibilities

Known is building a voice-AI product that powers curated introductions, agentic scheduling, and post-date feedback. You will own the cloud, data, and deployment foundation that makes this work reliably at scale.

Our stack: AWS-first with Terraform for IaC, containers on Kubernetes or ECS, CI/CD via GitHub Actions, services in Python and TypeScript, Postgres (including pgvector) plus a warehouse for analytics, and full observability across services, data jobs, and model endpoints.

Key responsibilities

  • Design and operate production AWS infrastructure using Terraform with secure networking, sane defaults, and automated rollbacks.
  • Build and maintain high-throughput data pipelines for ingestion, transformation, training data prep, and reporting.
  • Partner with AI/ML to ship model inference and evaluation in prod; version, deploy, and monitor LLM and matching services.
  • Own PostgreSQL performance and reliability, including schema design, indexing, connection pooling, and pgvector usage.
  • Establish CI/CD, release workflows, and environment hygiene to enable fast, safe iteration.
  • Implement observability across services and pipelines: logging, metrics, tracing, alerting, SLOs, and incident response.
  • Drive cost awareness and reliability across web, mobile, and agentic systems, balancing latency with scale.
  • Collaborate with product, backend, and ML to align infra decisions with user outcomes and roadmap priorities.
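Several of the responsibilities above reference pgvector, which at its core is nearest-neighbor search over embedding columns in Postgres. As a rough sketch only (pure Python, not Known's actual schema or data), pgvector's cosine-distance operator `<=>` combined with `ORDER BY ... LIMIT k` behaves like this:

```python
import math

def cosine_distance(a, b):
    """Cosine distance as computed by pgvector's <=> operator: 1 - cos(theta)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query, rows, k=1):
    """Return the k rows closest to `query`,
    like: SELECT ... ORDER BY embedding <=> query LIMIT k."""
    return sorted(rows, key=lambda r: cosine_distance(r["embedding"], query))[:k]

# Hypothetical rows standing in for a table with a vector column.
profiles = [
    {"id": 1, "embedding": [1.0, 0.0]},
    {"id": 2, "embedding": [0.0, 1.0]},
    {"id": 3, "embedding": [0.9, 0.1]},
]
best = nearest([1.0, 0.05], profiles, k=1)[0]  # closest match is id 1
```

In Postgres itself the equivalent query would be `SELECT id FROM profiles ORDER BY embedding <=> $1 LIMIT 1;`, typically backed by an HNSW or IVFFlat index rather than a full scan.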

Qualifications

We are hiring a founding-caliber Infrastructure / Platform Engineer who has owned production cloud environments and data platforms in high-growth settings. You will set the golden paths for services, data, and model delivery, and you are comfortable working on-site in San Francisco five days a week.

  • 4 to 10+ years in infrastructure, platform, or data engineering with real ownership of uptime, performance, and security.
  • Expert with AWS and Infrastructure-as-Code (Terraform, Pulumi, or CloudFormation).
  • Strong proficiency in Python or TypeScript, plus tooling/scripting (Bash/YAML).
  • Containers and orchestration experience (Docker, Kubernetes or ECS) and CI/CD pipelines you designed and ran.
  • Proven ability to design and operate data pipelines and distributed systems for both batch and low-latency use cases.
  • PostgreSQL at scale, ideally with pgvector/embeddings exposure for ML-adjacent workloads.
  • Strong observability practices: metrics, tracing, alerting, incident management, and SLOs.
  • Excellent collaboration with AI/ML and product teams; clear communication of tradeoffs and risk.
  • Work authorization in the U.S. and willingness to be on-site five days a week in San Francisco.
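SLOs come up in several of the bullets above; as a generic refresher (not specific to this role), an availability target implies a fixed downtime budget per window:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    return (1.0 - slo) * window_days * 24 * 60

# 99.9% over a 30-day window leaves about 43.2 minutes of downtime budget.
budget = error_budget_minutes(0.999)
```

This arithmetic is what makes "an SLO with an error budget" actionable: burn rate against the budget, not raw uptime, is what pages on-call.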

Nice to have

  • Experience supporting model training and inference pipelines, feature stores, or evaluation loops.
  • Prior work with streaming voice, low-latency systems, or recommendation/retrieval stacks.

Examples of prior experience we value

  • Early infra/platform owner at a seed–Series B startup, scaling AWS with Terraform and CI/CD
  • Built real-time and batch data pipelines that powered matching, voice, or recommendations
  • Ran Postgres at scale (schema design, indexing, pooling), with pgvector or embeddings in prod
  • Set up observability and on-call (metrics, tracing, alerting) that improved SLOs
  • Partnered with ML to deploy and monitor model inference with clear latency and cost targets

Ideal Candidate

You are a founding-caliber platform engineer who owns production cloud and data systems end to end. You move quickly while keeping reliability high, partner closely with AI/ML and backend, and build the golden paths that let the team ship with confidence. You care about clean Terraform, clear SLOs, low latency, and pipelines that turn raw data into model-ready tables. On-site in San Francisco.

What great looks like

  • 4 to 8+ years building and running AWS infrastructure with Terraform, CI/CD, and secure networking
  • Proven experience with containers and orchestration using Kubernetes or ECS, plus GitHub Actions or similar
  • Strong Python or TypeScript for services, jobs, and tooling
  • PostgreSQL at scale, including schema design, indexing, pooling, and exposure to embeddings or pgvector
  • Observability-first mindset with metrics, tracing, alerting, and effective incident response
  • Comfortable partnering with ML to deploy, monitor, and evaluate inference services

Examples of strong backgrounds

  • Early infra or platform owner at a seed to Series B consumer startup that scaled to meaningful usage
  • Platform or SRE lead who created templates and self-serve tooling that let multiple teams ship safely
  • Data platform engineer who built ingestion, transformation, and reporting that supported model training and evaluation


Must-Have Requirements

  • Must be authorized to work in the U.S. without future visa sponsorship.
  • Able to work onsite in San Francisco, CA five days per week.
  • 4+ years in infrastructure/platform or SRE with real production ownership.
  • Strong AWS + Infrastructure-as-Code (Terraform or similar).
  • Containers and orchestration (Docker with Kubernetes or ECS) and CI/CD experience.
  • Proficient in Python or TypeScript for tooling and services.
  • PostgreSQL at scale; familiarity with performance tuning and pgvector is a plus.
  • Solid observability and on-call practices (metrics, tracing, alerting, incident response).
  • Experience building and operating data pipelines (batch and/or streaming).

Screening Questions

1. (Optional Video) This step is completely optional. If you’d like, record a short 2–3 minute video introducing yourself and your experience, or share a recording of your interview with the recruiter if that’s easier. You can share the link via Loom or Google Drive. This just helps us get to know you better, but there’s no pressure if you’d prefer to skip it.
2. (Optional Portfolio / GitHub) If available, please share a link to your GitHub, portfolio, or any recent projects you’ve worked on. This is entirely optional but helps provide more context about your work.
3. What excites you most about Known and why are you leaving your current opportunity for this one?
4. Which platform areas are your strongest? Pick up to three — AWS+Terraform, Kubernetes/ECS, CI/CD, data pipelines, Postgres/pgvector, observability/on-call, ML inference infra — and give 2–3 sentences on a recent project for each, including scale (e.g., RPS or GB/day) and your role.
5. Describe one production pipeline or service you owned end-to-end. Share the architecture, throughput/latency targets, tools used (e.g., Terraform, Kafka/Airflow/dbt/ECS), how you monitored it (metrics/alerts), and one incident you detected and resolved (before/after impact).
