Senior Site Reliability Engineer

Pleasanton, CA, USA

95 Days ago

Job Description


XOPS is a fast-growing startup building the future of observability and automation for IT operations. Our platform unifies complex system data to deliver visibility, control, and intelligent workflows across the enterprise, empowering IT teams to manage the entire employee technology lifecycle with precision. As industries embrace AI to automate cars, rockets, and even farming, IT operations remain stuck in the past, reliant on spreadsheets and manual processes. We believe it is time for a change.

At XOPS, we are pioneering autonomous IT operations, freeing teams from tedious tasks and elevating them into strategic leadership roles. Our mission is to drive operational excellence, financial stewardship, and security across the enterprise, while transforming the employee experience. We are just getting started, and we are looking for exceptional teammates to help shape the future.

The Senior Site Reliability Engineer (SRE) plays a vital role in ensuring the reliability, scalability, and performance of our enterprise software platform. This is a senior-level position that requires deep technical expertise, strong problem-solving skills, and the ability to collaborate effectively in a fast-paced, demanding environment. Our customers, the largest enterprises in the world, expect 24/7 platform availability and top-tier performance.

The ideal candidate has strong expertise in AWS cloud technologies, a deep understanding of serverless architectures (AWS Lambda), and a passion for building resilient systems to enhance the customer experience.

Platform Reliability:
  • Design, implement, and manage highly available and scalable systems to meet customer expectations for 24/7 uptime.
  • Monitor, troubleshoot, and resolve platform incidents using tools such as Sentry, New Relic, and custom monitoring frameworks.
  • Lead post-incident reviews to ensure root cause analysis and preventative measures are in place.
Automation and Optimization:
  • Develop and maintain automation for infrastructure management, monitoring, and incident response.
  • Optimize platform performance and scalability, proactively identifying and addressing bottlenecks.
  • Contribute to the development of CI/CD pipelines to improve deployment reliability and speed.
Collaboration:
  • Partner with L2 engineers to resolve complex customer issues, providing guidance and technical expertise as needed.
  • Work closely with product engineering to ensure platform improvements align with customer needs.
  • Actively contribute to the documentation and sharing of best practices to improve team performance and customer outcomes.
Leadership:
  • Mentor junior engineers and provide technical leadership in reliability engineering.
  • Drive cross-functional initiatives to improve platform stability and customer satisfaction.

Requirements

  • Bachelor's degree in Computer Science or related discipline.
  • 8+ years in a Site Reliability Engineering or DevOps role, with experience supporting enterprise-grade software platforms.
  • 3+ years of experience in cloud services, in particular AWS.
  • Experience building observability systems on New Relic, CloudWatch, or similar.
  • Experience implementing rate-limiting, API gateways, and load balancing for highly available systems.
  • Exposure to security best practices and compliance frameworks (e.g., SOC 2, ISO 27001).
  • Proficient in infrastructure as code (IaC) using tools such as Terraform or CloudFormation.
  • Hands-on experience with scripting and programming languages like Python, Go, or Bash.
  • Strong troubleshooting and debugging skills.
  • Excellent communication and collaboration skills.
  • Experience with incident management and post-mortem practices.
Soft Skills:
  • Exceptional problem-solving and critical thinking abilities.
  • Strong verbal and written communication skills, with the ability to navigate ambiguity and provide clarity.
  • Ability to work collaboratively in cross-functional teams under pressure.
Key Attributes:
  • Reliability-Driven: Strong commitment to platform reliability and performance.
  • Leadership and Mentorship: Willingness to guide and mentor less experienced team members.
  • Customer-Focused: Dedication to meeting and exceeding customer expectations in a high-pressure environment.
Expectations:
  • Availability to participate in a 24/7 on-call rotation.
  • Ability to work in a fast-paced, ambiguous environment with rapidly changing priorities.
  • Proactive approach to identifying and mitigating risks before they impact customers.
  • Strong sense of accountability and ownership for platform stability and customer satisfaction.

For this role, the estimated base salary range is $166,000 to $203,000 USD. The actual base salary will vary based on factors including market and individual qualifications objectively assessed during the interview process. The listed range is a guideline, and the base salary range for this role may be modified.

Benefits

  • Competitive Compensation: Salary, Equity, and 401(k)
  • Comprehensive Vision, Dental, and Healthcare plans
  • Discretionary Time Off Policy (If you need time off, take time off!)
  • 11 Company-paid Holidays
  • Hybrid Work Policy - 3 days in office/2 days remote
  • A chance to be part of a rapidly growing startup and make a real impact!

Qualification

Bachelor's Degree

Key Skills Required

Python, AWS, Automation, API, CloudWatch, CI/CD, Accountability, Ambiguity, Analysis, AWS Lambda, Bash, Collaboration, Commitment, Communication, Compliance, Comprehensive, Computer Science, Critical Thinking, Customer Experience, Customer Needs, Customer Satisfaction, Dedication, Design, Development, Discipline, Documentation, Enterprise Software, Exceeding Customer Expectations, Focused, Guidance, Healthcare, Incident Management, Incident Response, Infrastructure, Infrastructure as Code, Infrastructure Management, IT Operations, Leadership, Load Balancing, Management, Optimization, Ownership, Proactive, Product Engineering, Reliability Engineering, Root Cause Analysis, Scalability, Science, Security, Site Reliability Engineering, Soft Skills, Strategic Leadership, Technical Leadership, Terraform, Troubleshooting, Written Communication

Job Overview


Job Function: IT/Computers - Software & Software Services

Job Type: Full Time

Workplace Type: Not Specified

Experience Level: Mid-Senior level

Salary: $166,000 - $203,000 / Annual Salary

Experience: 8 - 9 yrs

Contact Information


Company Name: XperiencOps Inc

Recruiting People: HR Department

Website: http://xops.io
