Course Category: Agile Methods
Course Duration: 2 Days
Hours: 14 Contact Hours
Course Overview
The Site Reliability Engineering Foundation course introduces a range of practices for improving service reliability through a mixture of automation, working methods and organisational re-alignment. The course is tailored for those focused on large-scale service availability.
This course is an introduction to the principles and practices that enable an organisation to reliably and economically scale critical services. Introducing a site-reliability dimension requires organisational re-alignment, a new focus on engineering and automation, and the adoption of a range of new working paradigms.
The course traces the evolution of SRE and its future direction, and equips participants with practices, methods, and tools for engaging people across the organisation who are involved in reliability and stability. These are illustrated with real-life scenarios and case stories. On completing the course, participants will have tangible takeaways to apply back in the office, such as understanding, setting, and tracking Service Level Objectives (SLOs).
This course prepares learners to complete the SRE Foundation certification exam successfully.
Learning Objectives
Participants will develop a practical understanding of:
- The history of SRE and its emergence at Google
- The inter-relationship of SRE with DevOps and other popular frameworks
- The underlying principles behind SRE
- Service Level Objectives (SLOs) and their user focus
- Service Level Indicators (SLIs) and the modern monitoring landscape
- Error budgets and the associated error budget policies
- Toil and its effect on an organisation’s productivity
- Some practical steps that can help to eliminate toil
- Observability as an indicator of the health of a service
- SRE tools, automation techniques and the importance of security
- Anti-fragility, our approach to failure and failure testing
- The organisational impact that introducing SRE brings
Who is the SRE Foundation Course for?
- Anyone starting or leading a move towards increased reliability
- Anyone interested in modern IT leadership and organisational change approaches
- Business Managers
- Business Stakeholders
- Change Agents
- Consultants
- DevOps Practitioners
- IT Directors
- IT Managers
- IT Team Leaders
- Product Owners
- Scrum Masters
- Software Engineers
- Site Reliability Engineers
- System Integrators
About DevOps Institute (DOI)
The DevOps Institute (DOI) is the global learning community around emerging DevOps practices and brings enterprise-level DevOps training and certification to the IT market. Working with recognized thought leaders, a strategic examination partner, and DevOps Express, DOI has set the quality standard for relevant, current and sustainable DevOps course content and certification.
Pre-requisites
Knowledge of common DevOps terminology and concepts, together with related work experience, is recommended.
SRE Foundation Certification Exam
The SRE Foundation certification exam is a computer-based exam taken with an online proctor at a date and time convenient to the learner.
The examination lasts 60 minutes and consists of 40 multiple-choice questions. Passing it (pass mark 65%) leads to the candidate’s designation as SRE Foundation certified. The certification is governed and maintained by the DevOps Institute.
Course outline
SRE Principles and Practices
- What is Site Reliability Engineering?
- SRE and DevOps: What is the Difference?
- SRE Principles and Practices
Service Level Objectives and Error Budgets
- Service Level Objectives (SLOs)
- Error Budgets
- Error Budget Policies
Reducing Toil
- What is Toil?
- Why is Toil Bad?
- Doing Something About Toil
Monitoring and Service Level Indicators
- Service Level Indicators (SLIs)
- Monitoring
- Observability
SRE Tools and Automation
- Automation Defined
- Automation Focus
- Hierarchy of Automation Types
- Secure Automation
- Automation Tools
Anti-Fragility and Learning from Failure
- Why Learn from Failure
- Benefits of Anti-Fragility
- Shifting the Organisational Balance
Organisational Impact of SRE
- Why Organisations Embrace SRE
- Patterns for SRE Adoption
- On-Call Necessities
- Blameless Post-Mortems
- SRE and Scale
SRE, Other Frameworks, The Future
- SRE and Other Frameworks
- The Future
- Additional Sources of Information