We are looking for a Site Reliability Engineer to work on a project for Calendly, one of the few unicorn startups in the USA. Calendly is one of the leading companies in the US, helping its users stay organized and schedule their meetings effortlessly!
You will help engineering teams improve the reliability, performance, resilience, and security of the services they own. Working with a well-defined continuous delivery process and a reasonably instrumented production environment, the successful candidate will be able to define SLOs and measure SLIs with an eye toward continuous improvement and evolution at scale. The ideal candidate demonstrates exceptional leadership in communicating patterns and improvements that automate tasks, improve stability, secure systems, and increase performance.
Institute resilient infrastructure through source-code-based configuration (infrastructure as code)
Demonstrate skills in evaluating, measuring, and improving rapidly evolving systems
Collaborate with engineering teams to understand and improve their systems
Organize a holistic ecosystem of infrastructure, tools, and capabilities that effectively provides visibility into the health of each component
Operate CI/CD pipelines to provision, track, validate, sign, and securely deploy software
Grow expertise in cloud concepts, especially IaaS/PaaS with exposure to virtualization technology in support of building our enterprise container infrastructure
Implement high-availability systems with automated failover across multiple availability zones
Lead postmortems of unexpected incidents to prevent recurrence
Participate in an on-call rotation to support critical Calendly infrastructure
Foster an environment of learning and knowledge dissemination
Prototype new solutions and take them into greenfield implementation
Define standard practices and tooling around new services, changes, incidents, postmortems, and work and capacity, and work with engineering teams to adopt those practices
3+ years of engineering experience supporting high-availability systems in production
Experience solving infrastructure problems with software
Excellent verbal and written English
Strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices
Experience working in a Linux environment
Experience with GCP and/or AWS
RDBMS experience
Software development experience
Experience deploying containerized services (Docker experience preferred)
Experience running and securing Kubernetes in production environments
Understanding of CI/CD pipelines and application delivery via GitOps
Experience with a variety of software monitoring tools
6-hour net workday
Remote work
Flexible working hours
20 vacation days
Private health insurance with full pregnancy/maternity coverage and family members included
10 days of paid paternity leave
New 16” MacBook Pro
Pet-friendly office
Net salary starting at 2000 EUR
Monthly home expenses budget of 50 EUR
A dynamic and friendly atmosphere where you can further develop your skills while having fun along the way!
Short general questionnaire ~ 5 min
Intro call with the team ~ 30 min
Client-side screening ~ 30 min
Client-side Tech Interview ~ 1h
HR & Interview with our Chief Cat ~ 1.5h
If this position sounds appealing, send your resume to careers@fatcatcareers.com with the subject line “Site Reliability Engineer - Debele Macke”.
Fill in the form and apply today. We’ll reach out shortly.