Member of Engineering – Pre-training, Data Engineering

Work from home Full-time role Hiring

Job Description:

  • Build and maintain high-performance pipelines for trillions of tokens.
  • Deliver diverse, high-quality datasets for pre-training foundation models.
  • Work closely with other teams, such as Pretraining, Posttraining, Evals, and Product, to ensure alignment on the quality of the models delivered.

Requirements:

  • Strong background in building production-grade, distributed data systems for machine learning, with experience in:
      • Orchestration: Slurm, Airflow, or Dagster
      • Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
      • Infra: Git, Docker, k8s, cloud managed services
      • Batched inference (e.g., vLLM)
  • Performance obsession, especially with large-scale GPU clusters and distributed pipelines
  • Expert-level Python knowledge and the ability to write clean, maintainable code
  • Strong algorithmic foundations
  • Proficiency with libraries like Polars, Dask, or PySpark
  • Nice to have:
      • Experience in building trillion-scale SOTA pretraining datasets
      • Experience translating research to production at scale
      • Experience with OCR, web crawling, or evals
      • Prior experience pre-training LLMs

Benefits:

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you and dependents
  • Company-provided equipment
  • Wellbeing, always-be-learning and home office allowances
  • Frequent team get-togethers
  • Great, diverse & inclusive, people-first culture
