Site Reliability Engineer

November 23, 2021
Atlanta, GA

100% Remote / Site Reliability Engineer / Opportunity to improve lives for our Veterans / Cutting edge Application Monitoring and Performance Engineering

This Jobot Job is hosted by: Blake Williams
Are you a fit? Easy Apply now by clicking the "Apply Now" button and sending us your resume.
Salary: $110,000 - $160,000 per year

A bit about us:

Support our Veterans! We are on a mission to transform government IT, driving efficiency and taxpayer value to improve the lives of our Veterans. We know how to drive IT transformation so federal agencies can work faster and more easily, enabling them to focus on their important roles in serving the needs of our citizens. Our team supports application and performance monitoring for over 800 applications used by the VA and Veterans Hospitals across the USA.

Why join us?

There's one thing that unites us, no matter where in the world we live: we are transformers. Transformer is more than just a catchy title; it’s the core of who we are as a company. We wholeheartedly embrace this title, both in how we approach our customer work and in the workplace culture we create. We offer tons of benefits for our employees.

  • 401(k) & Roth retirement plans w/ company match up to 4%
  • Generous paid time off
  • 10 paid holidays
  • Medical, dental, & vision insurance
  • Flexible Spending Accounts (FSA, DCA, transit, & parking)
  • Health Savings Accounts (HSA) with employer contribution
  • Life and AD&D insurance: 1x salary, up to $300K
  • Short and long-term disability
  • Voluntary life/AD&D (employee, spouse, & child)
  • Legal plan
  • Pet insurance
  • Critical illness, accident, & hospital insurance
  • Employee assistance program
  • Employee referral program
  • Bereavement/jury duty leave
  • Spot bonus program

Job Details

As a Site Reliability Engineer, you will focus on establishing and improving monitoring that measures the end-to-end performance and end-user availability of systems, using a suite of common monitoring tools. You will work with teams within the Enterprise Command Center (ECC) to help develop and implement monitoring that meets business requirements, including KPIs, service mapping, dependency mapping, and alerting thresholds. You will work alongside other site reliability engineers and dedicated monitoring engineers to support this initiative.

Job Responsibilities:

  • Work with application owners, both business owners and technical operations teams, to establish business and technical monitoring strategies, including instrumentation of the systems, collection of metrics, development of KPIs, and configuration of alerting with static and dynamic thresholds based on statistical analysis and machine learning
  • Support triage during incidents, including major incidents, by deconstructing application performance, interoperability, instrumentation, and human factors to facilitate resolution and the development of resilient solutions

Ideal Background:

  • 5+ years of DevOps / Site Reliability Engineer experience
  • Experience with modern application performance monitoring and diagnostics with Dynatrace, AppDynamics, Splunk, ITSI, and Wireshark
  • Experience with incident management and triage across the layers of the OSI model
  • Experience developing KPIs for application monitoring
  • Experience with DevOps tools such as Jira, ServiceNow, and MS Project
  • Ability to diagnose complex issues across many technologies and apply this knowledge to effective monitoring of applications
  • Bachelor's degree (or 10 years of professional experience in lieu of a degree)
  • Ability to pass a background check including fingerprinting and a Public Trust Clearance

Interested in hearing more? Easy Apply now by clicking the "Apply Now" button.
