[Remote] AI Platform Lead Engineer @ Remote
Note: This is a remote position open to candidates in the USA. Momento USA is a global technology consulting firm seeking an AI Platform Lead Engineer. The role involves leading the technical strategy for AI/ML platforms, architecting robust AI infrastructure, and mentoring engineering teams to deliver innovative AI solutions that drive business transformation.
Responsibilities
- Implement the AWS and Snowflake environments per the approved architecture and security design
- Build and maintain landing zones, accounts, networking, IAM, and Snowflake account/role structures
- Operate as the on-the-ground infrastructure lead, coordinating closely with customer security and cloud teams
- Enable AI Engineers with reusable platform components (Bedrock setup, Cortex enablement, secure data zones)
- Drive infrastructure code reviews, environment promotions, and platform governance
- Design and implement scalable AI/ML platforms supporting model training, deployment, monitoring, and lifecycle management
- Drive adoption of cloud-native and hybrid architectures for AI workloads
- Ensure platform reliability, performance, and security at scale
- Lead cross-functional engineering teams, providing technical guidance and mentorship
- Collaborate with product, data science, and business stakeholders to align AI platform capabilities with organizational goals
- Define long-term AI infrastructure strategy and roadmap
- Establish best practices for MLOps, CI/CD pipelines, and automated workflows
- Implement monitoring, observability, and governance frameworks for AI models
- Optimize cost efficiency and resource utilization across cloud and on-prem environments
- Stay ahead of emerging AI/ML technologies, frameworks, and tools
- Evaluate and integrate new solutions to enhance platform capabilities
- Champion innovation by fostering a culture of experimentation and continuous improvement
Skills
- 14+ years of experience in software engineering, with at least 5 years focused on AI/ML platforms
- Proven expertise in cloud platforms (Azure, AWS, Google Cloud Platform) and container orchestration (Kubernetes, Docker)
- Strong knowledge of MLOps frameworks (Kubeflow, MLflow, SageMaker, Vertex AI)
- Hands-on experience with distributed systems, data pipelines, and model deployment at scale
- Proficiency in programming languages such as Python, Java, or Go
- Deep understanding of AI/ML lifecycle management, including monitoring, retraining, and governance
- Excellent leadership, communication, and stakeholder management skills
- Experience building enterprise AI platforms for large-scale organizations
- Familiarity with data engineering tools (Spark, Kafka, Databricks)
- Exposure to generative AI and LLM deployment frameworks
- Track record of leading global teams and delivering complex AI initiatives