Search for Senior Site Reliability Engineer roles focused on Kubernetes platform management in Redmond, WA, with full-time schedules, requiring 5+ years of experience, and proficiency in Python and Terraform.

Okay, I've searched for Senior Site Reliability Engineer roles in Redmond, WA, specializing in Kubernetes platform management, full-time positions, requiring 5+ years of experience, and skills in Python and Terraform. I have found 47 positions for you.

Can you help me to analyze the job description and company culture?

Sure. Here's the analysis:

Job Analysis:

This Sr. Kubernetes Platform Site Reliability Engineer role at SpaceX’s Starlink project is fundamentally about ensuring the reliability, scalability, and operational excellence of the largest satellite internet infrastructure in the world. The core purpose is to design, deploy, and maintain highly available Kubernetes-based platforms and supporting infrastructure that power a global network serving millions of users daily. Beyond just technical know-how in Kubernetes, Linux, scripting, and infrastructure automation tools like Terraform and Ansible, the role demands deep operational insight to tackle challenges such as large-scale distributed system management, network reliability, and performance optimization. The engineer will frequently engage with cross-functional teams—software developers, network engineers, and operations—to ensure seamless deployment and lifecycle management of critical services, making collaboration and communication essential. The complexity of running the world's largest satellite constellation means this individual will regularly confront ambiguities in system behavior, pressure on uptime and latency, and evolving infrastructure needs. Success in this role hinges on both technical mastery and a proactive mindset to innovate solutions that anticipate and prevent failures, while maintaining system health. Within 6-12 months, a successful candidate will have solidified automated Kubernetes management, established robust monitoring/alerting frameworks, streamlined cross-team operations, and contributed to scaling Starlink’s infrastructure reliability in line with rapid growth.

Company Analysis:

SpaceX occupies a unique and pioneering space at the confluence of aerospace innovation and cutting-edge technology, with Starlink representing a bold expansion into global internet delivery via satellite constellations. As an industry leader and disruptor, SpaceX’s culture emphasizes rapid iteration, extreme technical ambition, and mission-driven work aimed at enabling human life beyond Earth. The company thrives in a fast-paced, high-stakes environment where technical excellence and resilience under pressure are critical. For a Sr. Site Reliability Engineer, this means being comfortable with continuous innovation while balancing operational stability. SpaceX’s values likely include rigorous problem solving, a hands-on approach, and ownership mentality, which manifest in long working hours and a high bar for collaboration and communication. This role, situated within Starlink but drawing on company-wide engineering excellence, likely enjoys considerable visibility due to the criticality of reliable internet infrastructure and the growth-centric nature of the business unit. Strategically, the position supports SpaceX’s broader goal of scaling Starlink globally—making reliability and scalability not just technical targets but business imperatives—and requires alignment with SpaceX’s mission for transformational impact and fearless pursuit of breakthroughs.

Mock up some interview questions based on the job description.

Absolutely. Here are some mock interview questions that could come up:

Behavioral: Tell me about a time when you had to troubleshoot a complex issue in a distributed system. How did you approach diagnosing and resolving it?
Behavioral: Describe an experience where you collaborated across multiple engineering teams to deploy or maintain a complex software infrastructure. What challenges did you face and how did you overcome them?
Behavioral: Starlink’s infrastructure operates at a massive scale. How have you handled pressure and ambiguity when system failures could impact many users simultaneously?
Behavioral: Give an example of when you identified a performance bottleneck in a platform or service. What steps did you take to analyze and improve it?
Technical: How would you design an automated deployment system for managing hundreds of on-premises Kubernetes clusters at scale?
Technical: Explain your experience with infrastructure as code tools like Terraform or Ansible. How do you ensure configuration consistency and avoid drift in such environments?
Technical: What strategies and tools do you use for monitoring, alerting, and ensuring high availability in large-scale Kubernetes-based infrastructures?
Technical: Walk me through your approach to debugging a complex networking issue involving TCP/IP at scale in a hybrid cloud and on-prem environment.
Situational: Imagine a critical Starlink service running in Kubernetes suddenly experiences intermittent failures impacting thousands of users. How would you triage and address this issue?
Situational: Suppose a newly deployed automation script accidentally causes configuration drift on hundreds of servers. How would you detect, contain, and remediate the situation?
Situational: You discover a bottleneck in the deployment pipeline delaying frequent releases across multiple teams. How would you investigate and collaborate to improve this?
Company Fit: What excites you most about working on Starlink and SpaceX’s mission to make life multiplanetary?
Company Fit: SpaceX values fast iteration and high ownership. How do you ensure quality and reliability while working in a high-velocity environment?
Company Fit: How do your personal values and work style align with SpaceX’s culture of innovation, collaboration, and mission-driven impact?
Questions to ask: Can you describe the structure of the Site Reliability Engineering team within Starlink and how it interacts with other engineering groups?
Questions to ask: What are some of the immediate challenges the Starlink platform is facing that this role will help address?
Questions to ask: How does SpaceX support continuous learning and professional growth, particularly for engineers working at the intersection of cloud-native infrastructure and satellite communications?

Job Details

Senior Kubernetes Platform Site Reliability Engineer

$160k/y-$220k/y

SpaceX

18 days ago

Highlight

Onsite | Full-time | 5+ years | Senior

Summary

SpaceX is seeking a Sr. Kubernetes Platform Site Reliability Engineer (Starlink) to join their team in Redmond, WA. The role involves designing, operating, and scaling the infrastructure for Starlink, the world's largest satellite constellation.

About the Role

As a Sr. Kubernetes Platform Site Reliability Engineer, you will develop automation to deploy and manage on-premise Kubernetes clusters, manage core infrastructure, collaborate with software engineers, and engage in the entire lifecycle of services. You will also monitor and troubleshoot the Starlink stack and identify areas for improvement.

About You

Required:

Bachelor’s degree in computer science, information systems/IT, or an engineering discipline and 5+ years of professional experience in Site Reliability Engineering or DevOps; OR 7+ years of professional experience in Site Reliability Engineering or DevOps in lieu of a degree.
2+ years of professional experience with Linux operating systems.
Experience with Terraform, Ansible, or other infrastructure tools.
Experience with containerization technologies (e.g., OCI containers, Kubernetes).
Experience scripting in Bash, Python, or other similar languages.
Development experience in Python, C++, or Go.

Preferred:

1+ years of experience with Python and Python-based development frameworks.
Experience managing Kubernetes clusters, not just using them.
Knowledge of Linux boot process and systems configuration.
Deep understanding of testing, continuous integration, build, deployment & continuous monitoring.
Understanding of relevant build technologies, such as Bazel and Makefiles.
Focus on performance bottlenecks and performance improvement techniques.
Understanding of distributed databases and data modeling.
Experience with automatically managing dozens, hundreds, or thousands of servers (e.g., Terraform or Ansible).
Strong networking knowledge of TCP/IP.
Excellent communication skills, with the ability to communicate with customers, peers, management, etc., in both formal and informal situations.

Benefits

Base salary range: $160k–$220k per year.
Eligibility for long-term incentives, such as company stock, stock options, or long-term cash awards.
Potential discretionary bonuses.
Ability to purchase additional stock at a discount through an Employee Stock Purchase Plan.
Comprehensive medical, vision, and dental coverage.
Access to a 401(k) retirement plan.
Short- and long-term disability insurance.
Life insurance.
Paid parental leave.
Various other discounts and perks.
3 weeks of paid vacation.
10 or more paid holidays per year.
5 days of sick leave per year for exempt employees.
Company shuttles for round trip travel from select Seattle locations to the SpaceX Redmond office Monday to Friday.

SpaceX

SpaceX designs, manufactures and launches the world’s most advanced rockets and spacecraft. The company was founded in 2002 by Elon Musk to revolutionize space transportation, with the ultimate goal of making life multiplanetary.

SpaceX has gained worldwide attention for a series of historic milestones. It is the only private company ever to return a spacecraft from low-Earth orbit, which it first accomplished in December 2010. The company made history again in May 2012 when its Dragon spacecraft attached to the International Space Station, exchanged cargo payloads, and returned safely to Earth — a technically challenging feat previously accomplished only by governments. Since then Dragon has delivered cargo to and from the space station multiple times, providing regular cargo resupply missions for NASA.

For more information, visit www.spacex.com.

Redmond, WA