Senior Site Reliability (SRE) Engineer

Work from home, Full-time role, Hiring

ADT is transitioning to a hybrid in-office work model that blends the benefits of in-person collaboration with remote flexibility. New team members will begin working remotely and should plan to shift to a hybrid schedule at a later date if hired within one of our three talent hubs: Boca Raton, FL, Irving, TX, or Blue Bell, PA. We’ll keep you informed and supported throughout this transition.

While we are open to considering fully remote candidates based in the U.S., our preference is for team members to be located in one of our talent hubs to participate in the hybrid model once it is in place.

Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time.

ADT's Site Reliability Engineering (SRE) team is seeking talented individuals who want their code to positively influence our customers, the bottom line, and the industry. Our team of engineers works tirelessly to keep the ADT platform running smoothly and our customers protected 24/7.

What You'll Do:

As a member of ADT's SRE team, you will play a critical role in ensuring the reliability, scalability, and performance of our large-scale distributed systems. You will drive operational excellence by proactively identifying and solving problems, improving system performance, and ensuring our production environments remain resilient and efficient. Your expertise in orchestrating and automating complex systems, combined with a focus on improving software release processes and managing large cloud environments, will be key to our ongoing success.

Key Responsibilities:

Ensure the reliability, availability, and scalability of large-scale distributed systems and applications.
Provide engineering and operational support for multiple production environments, maximizing uptime.
Identify performance bottlenecks, reliability issues, and areas for improvement, and implement solutions proactively.
Develop and manage infrastructure as code using tools like Terraform and Ansible to automate cloud resource provisioning and configuration management.
Manage cloud environments (AWS, GCP) and work with Kubernetes-based infrastructure.
Implement and manage observability and monitoring solutions (e.g., Dynatrace, Prometheus) to provide real-time insights and identify issues.
Contribute to and improve software release processes, ensuring smooth deployments and minimal disruption to production systems.
Collaborate with cross-functional teams to reduce MTTR and enhance the operational health of our systems through proactive monitoring, automation, and SRE best practices.
Mentor junior SREs and help develop best practices.
Participate in an on-call rotation, responding to incidents and ensuring high availability and reliability of services.

What You'll Need:

5+ years in Site Reliability Engineering, DevOps, or related roles.
Strong focus on tactical operations and experience managing large-scale distributed software applications.
Solid experience with infrastructure as code (Terraform, Ansible).
Proven experience with cloud environments such as GCP and AWS.
Expertise in managing and optimizing Kubernetes clusters for large-scale deployments.
Proficiency in one or more programming languages, such as Python, Java, C/C++, Ruby, or JavaScript.
Strong understanding of software development and change management processes.
Experience with monitoring and observability platforms like Dynatrace, Prometheus, or similar tools.
In-depth experience managing dynamic, scalable cloud infrastructure and distributed systems.
Ability to diagnose and resolve complex system issues with a focus on operational excellence.
Strong communication skills with the ability to collaborate across teams and mentor junior engineers.
Comfortable with ambiguity and complex systems, with the ability to handle challenges with confidence.
Experience with CI/CD pipelines and automation tools.
Familiarity with incident response processes and post-mortem analysis.

Submitting Your Application:

To be considered, please submit a resume, no more than three pages, highlighting the technologies you’ve worked with directly. We’re particularly interested in how you’ve applied relevant technical skills and tools in past roles or projects, along with any impactful accomplishments.

Compensation & Benefits:

The salary range for this role is $104,000.00 - $156,000.00 and is based on experience and qualifications.

Certain roles are eligible for an annual bonus and may include equity. These awards are allocated based on company and individual performance.

We offer employees access to healthcare benefits, a 401(k) plan and company match, short-term and long-term disability coverage, life insurance, wellbeing benefits, and paid time off, among others. Employees accrue up to 120 hours of paid time off in their first year, and the accrual rate increases after the first year. We also offer 6 paid holidays.

The anticipated application end date is 7/11/2025.

Originally posted on Himalayas
