Experience leading and managing teams in a Site Reliability Engineering or related role.
Minimum of 4 years of professional software development experience instrumenting complex observability stacks, preferably in Go.
Minimum of 2 years of professional experience with containers, preferably Docker.
Strong understanding of microservices architecture and its associated challenges.
Proficiency in AWS container management, orchestration, and observability features (ECS, Fargate, Aurora, AppConfig, CloudWatch, etc.).
Professional experience with Terraform and/or CloudFormation.
Solid understanding of observability stack management (OpenTelemetry, tracing, monitoring, alerting, structured logging, APM, etc.).
Strong leadership and communication skills: able to lead and mentor other engineers, clearly detail designs and implementations, and communicate effectively with cross-functional teams.
Demonstrated experience driving and leading incident response, incident management, and post-incident review processes.

Lead and manage the day-to-day operations of a team of 3-5 SREs, including road-mapping, task assignments, and performance evaluations.
Mentor and train your team in observability best practices and foster a culture of continuous learning and improvement.
Lead incident response efforts and troubleshoot critical issues to minimize downtime and maintain high availability of systems.
Design and implement solutions for monitoring, alerting, and incident response to proactively identify and resolve issues.
Be a trusted advocate for reliability engineering throughout the team, with an eagerness to mentor.
Work with technical leadership to help define and oversee short- and mid-term project roadmaps.
Participate in after-hours on-call support rotations.