Own the infrastructure powering AI and ML models across critical business surfaces: supporting growth, recommendations, trust and safety, fraud, seller tooling, and more.
Guide the prototyping, deployment, and productionization of novel ML architectures that directly shape user experience and marketplace dynamics.
Help design and scale inference infrastructure capable of serving large models with low latency and high throughput.
Oversee and evolve real-time feature pipelines that feed both our online and offline stores, ensuring that behavioral signals are reflected within a second, with high reliability and model-training fidelity.
Drive feature platform improvements and expand scope to cover non-ML use cases, such as fraud rules, where point-in-time backtesting is also critical.
Lead the development of distributed training and inference pipelines leveraging GPUs and both model and data parallelism.
Optimize system performance by managing resource utilization and developing intelligent feature caching strategies.
Empower scientists to iterate faster by building abstractions, APIs, and developer tools that simplify the development of near-real-time features and model iteration.
Roll out ever-better ergonomics around model training and deployment.
Stretch beyond your comfort zone to take on new technical challenges as we scale AI across Whatnot’s ecosystem.
Requirements
1+ years of tech lead/manager (TLM) experience developing production machine learning systems at consumer-scale loads.
Bachelor’s degree in Computer Science, Statistics, Applied Mathematics or a related technical field, or equivalent work experience.
5+ years of hands-on software engineering experience building and maintaining production systems for consumer-scale loads.
1+ years of professional experience developing software in Python.
Ability to work autonomously, drive initiatives across multiple product areas, and communicate findings to leadership and product teams.
Experience with operational, search, and key-value databases such as PostgreSQL, DynamoDB, Elasticsearch, Redis.
Experience with ML-specific tools and frameworks such as MLflow, LitServe, TorchServe, and Triton.
Firm grasp of visualization tools for monitoring and logging, e.g. Datadog and Grafana.
Familiarity with cloud computing platforms and managed services such as AWS SageMaker, Lambda, Kinesis, S3, EC2, and EKS/ECS, and with streaming frameworks such as Apache Kafka and Flink.
Professionalism in remote collaboration, with well-tested, reproducible work.
Exceptional documentation and communication skills.
Tech Stack
Apache
AWS
Cloud
DynamoDB
EC2
Elasticsearch
Grafana
Kafka
Postgres
Python
Redis
Benefits
Generous Holiday and Time Off Policy
Health Insurance options including Medical, Dental, Vision
Work From Home Support
Home office setup allowance
Monthly allowance for cell phone and internet
Care benefits
Monthly allowance for wellness
Annual allowance towards Childcare
Lifetime benefit for family planning, such as adoption or fertility expenses
Retirement
401(k) offering for Traditional and Roth accounts in the US (employer match up to 4% of base salary) and pension plans internationally
Monthly allowance to dogfood the app
All Whatnauts are expected to develop a deep understanding of our product. We're passionate about building the best user experience, and all employees are expected to use Whatnot as both a buyer and a seller as part of their job (our dogfooding budget makes this fun and easy!).
Parental Leave
16 weeks of paid parental leave plus a one-month gradual return to work. *Company leave allowances run concurrently with country leave requirements, which take precedence.