Site Reliability Engineer (SRE) – 100% Remote

The Judge Group
Published: August 27, 2021
Location: Atlanta, GA

Description

Location: REMOTE
Description: Our client is currently seeking a Site Reliability Engineer (SRE) for a fully remote position.

Job Description:

Responsibilities:

  • Lead the newly established SRE team supporting Discover.com and Discover’s mobile application. Discover.com gets 3.5 million and Discover Mobile gets 2.5 million logins daily!
  • Champion a culture of learning, continuous improvement, and blameless retrospection within your team.
  • Mentor and grow your junior engineers, and empower and unblock your senior ones.
  • Partner with our Talent Acquisition team as we recruit, interview and hire the best engineering talent to join Discover’s growing SRE practice.
  • Partner with Product teams and Solution Architects to help design solutions that achieve the required reliability outcomes for their services.
  • Be a leader in the SRE community of practice and evolve the SRE practice for the entire organization.
  • Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run platforms.
  • Partner with security engineers to develop plans and automation that respond aggressively and safely to new risks and vulnerabilities.

Necessary experience: 

  • Well versed in the entire software development lifecycle, DevOps, and SRE practices
  • Expertise and operational experience at scale: designing and operating highly available, scalable, and fault-tolerant systems using container platforms
  • Experience with operational monitoring tools (AppDynamics, New Relic, Instana, Catchpoint) with a mindset towards predictive analysis
  • Experience with Splunk or ELK Stack, Grafana, Datadog, or Sysdig
  • Working knowledge of automation tools such as Ansible, Terraform, or Chef
  • Experience with Pivotal Cloud Foundry (PCF), OpenShift (OCP), Amazon Web Services (AWS), and Google Cloud Platform (GCP)
  • Good understanding of networking, including L2 and L3 concepts, firewalls, load balancing, routing, and switching
  • Working knowledge of Linux-based systems and virtual machine (VM) technology
  • Strong scripting skills, including the ability to write scripts from scratch in Python and/or Bash
  • Basic knowledge and understanding of security (CIA model and PCI compliance) is a plus
  • Experience with Continuous Integration and Continuous Delivery models, including Blue/Green and Canary release models, is a plus

Minimum Qualifications:

  • You have 5+ years of SRE experience in a highly customer-focused environment.
  • You have 3+ years of experience successfully managing a team of engineers on large-scale projects that included technical deep-dives and production troubleshooting in the areas of distributed systems, programming, configuration management, networking, storage, and operating systems.
  • You possess strong leadership skills and the ability to motivate teams.
  • You bring a strong perspective and a collaborative partnership that drives change and motivates engineers to develop simple solutions to complex operational or reliability challenges.
  • You have experience formulating a team's technical strategy and roadmap, and you've collaborated and partnered effectively with several other teams.
  • You are capable of leading a discussion with upper management and are able to tailor the level of technical detail to suit your audience.

B.S. in Computer Science or equivalent experience


This job and many more are available through The Judge Group. Find us on the web at www.judge.com