Systems Engineering Principal


About the Role

The Customer Centric Reliability Engineering (CCRE) organization at Salesforce is focused on ensuring our products deliver seamless reliability to exceed customer expectations. The Problem Management team drives post-incident analysis, enabling teams across Salesforce to learn, remediate, and improve the reliability of our platforms.

This Principal Engineer role is critical to our success: you will lead the charge in uncovering technical themes, recommending improvements, and pushing for systemic changes across our clouds. You will be hands-on, reading code and working directly with other engineers on suggested solutions.

What You’ll Do

  • Identify and analyze recurring technical themes across incidents, releases, and problem management data.
  • Recommend improvements across multiple clouds and advocate for systemic changes.
  • Drive engineering teams to integrate recommended improvements into their roadmaps and deliver sustainable, impactful solutions.
  • Define and prioritize paved path development functions for the Site Reliability Engineering (SRE) organization.
  • Conduct reviews of high-impact incidents and problems to ensure appropriate levels of remediation across platforms and services.
  • Collaborate with service owners to drive root cause mitigation, corrective actions, and incident detection improvements.
  • Foster an environment of proactive reliability and resilience, ensuring all platforms meet high standards of technical excellence.
  • Communicate effectively across technical and executive audiences, advocating for necessary changes and championing cross-cloud collaboration.

What We’re Looking For

Minimum Qualifications:

  • 10+ years of engineering experience, with a focus on reliability engineering and post-incident analysis.
  • Proven experience driving systemic technical improvements across platforms and teams in large-scale, distributed systems.
  • Experience with various architectures and platforms, including proficiency in both Windows and Linux/Unix.
  • Ability to read and debug stack traces and to reason about architectural patterns and reliability concerns.
  • Strong communication and leadership skills, with a track record of influencing and driving change across engineering and business organizations.
  • Experience with incident analysis, root cause identification, and defining technical remediation strategies.
  • Extensive knowledge of service reliability, observability practices, and availability metrics.
  • Familiarity with development in object-oriented programming languages (e.g., Python, Java) and experience with cloud-based architecture.
  • Bachelor’s degree or equivalent in a technical field.

Preferred Qualifications:

  • Experience leading cross-functional initiatives to implement technical improvements.
  • Expertise in incident management processes and operational excellence practices.
  • Hands-on experience with data analysis and visualization tools to drive technical insights, including SQL, big data platforms, NoSQL stores, and in-memory stores (e.g., memcache).

Benefits & Perks

Check out our benefits site to explore our various benefits, including:

  • Wellbeing reimbursement
  • Generous parental leave
  • Adoption assistance
  • Fertility benefits
  • And more!

Learn More About Salesforce

Check out our Salesforce Engineering Site to learn more about our technical teams and projects.

Fair Hiring Practices

For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.
