
LLM Data Engineer | United States | Fully Remote

Work from home | Full-time role | Hiring

We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform. The ideal candidate will be well-versed in the latest Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques. This role sits in the AI COE within DX Tech & Digital. As an AI/LLM Data Engineer, you will report to the Director, AI Solutions & Development, who oversees the AI COE. You will work on highly visible strategic projects, collaborating with cross-functional teams to define requirements and deliver high-quality AI solutions. The ideal candidate will have a passion for Generative AI and LLMs, with a proven track record of delivering innovative AI applications.

Responsibilities

  • Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including Supervised Fine Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes
  • Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform
  • Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization for both text and non-text data
  • Benchmark and implement various vector stores, embedding techniques, and retrieval methods
  • Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search)
  • Implement and maintain auto-tagging systems and data preparation processes for LLMs
  • Develop tools for text and image data crawling, cleaning, and refinement
  • Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models
  • Work with data lakehouse architectures to optimize data storage and processing
  • Integrate and optimize workflows using Snowflake and various vector store technologies

Requirements

  • Master's degree in Computer Science, Data Science, or a related field
  • 3-5 years of work experience in data engineering, preferably in AI/ML contexts
  • Proficiency in Python, JSON, HTTP, and related tools
  • Strong understanding of LLM architectures, training processes, and data requirements
  • Experience with RAG systems, knowledge base construction, and vector databases
  • Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts
  • Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)
  • Knowledge of data crawling techniques and associated ethical considerations
  • Strong problem-solving skills and ability to work in a fast-paced, innovative environment
  • Familiarity with Snowflake and its integration in AI/ML pipelines
  • Experience with various vector store technologies and their applications in AI
  • Understanding of data lakehouse concepts and architectures
  • Excellent communication, collaboration, and problem-solving skills
  • Ability to translate business needs into technical solutions
  • Passion for innovation and a commitment to ethical AI development
  • Experience building LLM pipelines using frameworks such as LangChain, LlamaIndex, Semantic Kernel, or OpenAI functions
  • Familiarity with LLM sampling parameters such as temperature, top-k, and repeat penalty, and with metrics and methodologies for evaluating LLM output

Preferred Skills

  • Experience with popular LLM/RAG frameworks
  • Familiarity with distributed computing platforms (e.g., Apache Spark, Dask)
  • Knowledge of data versioning and experiment tracking tools
  • Experience with cloud platforms (AWS, GCP, or Azure) for large-scale data processing
  • Understanding of data privacy and security best practices
  • Practical experience implementing data lakehouse solutions
  • Proficiency in optimizing queries and data processes in Snowflake or Databricks
  • Hands-on experience with different vector store technologies

Benefits

US employee benefits package.
