SRE Architect

NeerInfo Solutions | Dallas, TX, 75215, US

Posted 10 days ago

Description

Responsibilities

  • Provide SRE and production support with an emphasis on observability to proactively identify issues and drive incident response.
  • Act as incident commander to diagnose complex issues and actively drive incident calls with technical teams, product SMEs, and Tier 2 SREs.

Qualifications

  • Bachelor’s degree or foreign equivalent required from an accredited institution. Will also consider three years of progressive experience in the specialty in lieu of every year of education.
  • At least 10 years of Information Technology experience.
  • SRE mindset in production support with proactive issue identification using observability tools.
  • Skilled in using monitoring and observability tools to track system performance.
  • Experience with Splunk (including Splunk APM and Splunk O11y) and AppDynamics; with databases, networking, Linux/Unix, and Kubernetes; and with issue analysis using APM tools, NMON, and Wireshark.
  • Experience in production support activities including proactive issue identification leveraging observability tools and correlating inputs from dashboards and tools to drive resolution.
  • Able to identify probable failure points by analyzing logs, observability dashboards, recent application changes, and infrastructure and network changes.
  • Basic troubleshooting across the stack (Application, Database, Infra including container platforms, and Network).
  • Experience in setting up observability dashboards based on Splunk logs.

Preferred Qualifications

  • Production support expertise with SRE observability experience, including proactive issue identification using observability tools and tracking system performance.
  • Experience in production support activities involving correlating inputs from dashboards and tools to drive resolution.
  • Ability to swiftly identify probable failure points through analysis of multiple inputs (logs, observability dashboards, recent application changes, and infrastructure and network changes).
  • Strong troubleshooting across all layers of the tech stack (Application, Database, Infra including container platforms, and Network).
  • Experience in setting up observability dashboards based on Splunk logs.

Communication

  • Excellent communicator, capable of leading the triage of proactively identified issues and incidents, including on calls where senior leadership may be present.
  • Leadership in triage calls to direct actions for the team.
  • Automation: experience identifying toil and automating it.

Technical expertise

  • Analysis of issues via Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RedMetrics, and ThousandEyes.
  • Debugging issues in VMs, load balancers, firewalls, API gateways, DB, network, Linux/Unix.
  • Debugging in containerization (Docker, Kubernetes), AWS, PCF, Azure.
  • Analysis of issues using APM tools, NMON, and Wireshark.
  • Database performance monitoring and analysis.
  • Experience in UEM and synthetic monitoring setup.
  • Experience in heap dump analysis, memory leak analysis, and resource optimization.