Climate & Environment Full-time

Principal, Customer Reliability Engineer

Crusoe

Posted

Feb 28, 2026

Location

Remote

Type

Full-time

Mission

What you will drive

Crusoe's mission is to accelerate the abundance of energy and intelligence. We're crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure.

About the Role:

As a Principal Customer Reliability Engineer, you define and elevate the technical reliability strategy of Crusoe Cloud at the company level.

You are an organization-wide authority in distributed systems, AI/ML infrastructure, networking, storage, compute, Kubernetes, and cloud operations. Your impact extends beyond CX: you shape how Crusoe designs, deploys, and scales high-performance GPU infrastructure.

This is not an escalation engineer role. This is a systems architect and reliability strategist role with direct impact on enterprise readiness and revenue protection.

What You'll Be Working On:

Org-Level Reliability Strategy

  • Define the technical vision for AI/ML workload reliability.

  • Architect guardrails across compute, storage, networking, and orchestration.

  • Partner with Product & Engineering to influence roadmap decisions impacting scalability and resilience.

Incident & Risk Governance

  • Lead post-incident structural reforms for major outages.

  • Define enterprise-grade incident management standards.

  • Establish reliability metrics that align with protecting and expanding ARR (annual recurring revenue).

Advanced Systems Architecture

  • Evaluate and improve:

      • Kubernetes multi-cluster design

      • Software-defined networking

      • InfiniBand (IB) fabric architecture

      • GPU lifecycle management

      • Observability frameworks

  • Drive automation-first operational maturity.

Executive & External Credibility

  • Serve as technical spokesperson during high-severity events.

  • Build enterprise confidence in Crusoe's technical depth.

  • Contribute to technical thought leadership (blogs, architecture reviews, customer briefings).

Talent Multiplier

  • Mentor Sr. Staff engineers.

  • Raise the hiring bar for advanced infrastructure roles.

  • Create technical learning frameworks for HPC & AI operations.

  • Build tooling and automation for the CX team.

  • Engage with customers during their onboarding phase.

  • Handle executive-level escalations and high-priority incidents.

What You'll Bring to the Team:

  • 12+ years of experience in distributed systems, SRE, DevOps, or HPC engineering.

  • Deep expertise in:

      • Linux internals

      • Kubernetes at scale

      • InfiniBand / RDMA

      • GPU cluster performance engineering

      • Large-scale AI/ML workloads

  • Demonstrated ability to architect reliability systems, not just troubleshoot them.

  • Experience leading large-scale incident reform or platform redesign.

  • Exceptional cross-functional influence.

  • Strong executive communication skills.

Benefits:

  • Industry-competitive pay

  • Restricted Stock Units in a fast-growing, well-funded technology company

  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

  • Employer contributions to HSAs

  • Paid parental leave

  • Paid life insurance, short-term and long-term disability

  • Teladoc

  • 401(k) with a 100% match up to 4% of salary

  • Generous paid time off and holiday schedule

  • Cell phone reimbursement

  • Tuition reimbursement

  • Subscription to the Calm app

  • MetLife Legal

  • Company-paid commuter benefit ($300/month)

Compensation Range

Compensation will be paid in the range of $230,000 to $280,000, plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

About

Inside Crusoe

Transforming stranded energy into eco-friendly power for data centers, reducing environmental impact significantly.