Senior Platform Reliability Engineer

G-Research | Dallas, TX, US

Salary Range: $108,000 – $147,000 (estimated by Zippia)

Posted 8 hours ago

Description

Do you want to tackle the biggest questions in finance with near-infinite compute power at your fingertips?

G-Research is a leading quantitative research and technology firm, with offices in London and Dallas. We are proud to employ some of the best people in their field and to nurture their talent in a dynamic, flexible and highly stimulating culture where world-beating ideas are cultivated and rewarded.

This is a hybrid role based in our new Dallas infrastructure hub where we work on the latest technologies in a cutting-edge environment.

The role

The Reliability Engineering team, part of our Platform as a Service (PaaS) function, works with a variety of technologies, including multiple Kubernetes clusters, multiple database technologies, low-latency networks and big data warehouses across multiple regions around the globe.

We are actively seeking an experienced Site Reliability Engineer (SRE) with a proven track record in building up and bootstrapping SRE functions across multiple teams.

We want an individual who excels in ensuring the robustness, scalability, and fault tolerance of large-scale infrastructure. The ideal candidate will have a comprehensive understanding of the intricacies involved in architecting, deploying, and maintaining high-performance solutions, coupled with a track record of implementing and enhancing reliability measures across all infrastructure ecosystems.

This role demands hands-on experience in orchestrating resilient systems, fine-tuning performance, and implementing proactive strategies to mitigate potential downtime or disruption. The successful candidate will play a pivotal role in driving the reliability, efficiency, and scalability of our infrastructure platform through innovative solutions and best-in-class practices.

In return, you will gain exposure to the latest hardware and software technologies in a forward-thinking company that values innovation, personal development and training.

Key responsibilities of the role include:

Leading efforts to enhance existing practices across teams, fostering collaboration and synchronization to optimize system reliability and scalability
Driving strategies for enhancing systems performance, leveraging innovative approaches to improve efficiency and streamline processes
Implementing best practices for system reliability, fault tolerance, and scalability, ensuring alignment with evolving industry standards
Cultivating a culture of continuous improvement, encouraging regular reviews and iterative enhancements to tools, methodologies, and processes
Enhancing incident response processes by conducting comprehensive reviews, implementing improvements, and integrating learned lessons into future strategies
Leading efforts to optimize capacity planning strategies, ensuring systems are prepared for future scaling while maximizing resource utilization
Collaborating with security teams to fortify and enhance security measures within systems, ensuring compliance with evolving policies and standards
Collaborating effectively with other SREs within PaaS and with colleagues in different time zones (Dallas and London)

Who are we looking for?

The successful candidate will be an experienced Platform Reliability Engineer who is enthusiastic about contributing to an automated, scalable, reliable and high-performing Infrastructure and Platform as a Service:

A strong desire to continually learn about new technologies, approaches, and systems, along with the agility to work across multiple teams
A strong communicator with excellent written communication skills for technical and non-technical audiences
A self-starter with excellent problem-solving skills
Proficient in Go or another programming language, such as Python, Rust or Java, for automation and development tasks
Extensive Linux, Networking and Infrastructure knowledge
Experience with CI/CD (preferably Jenkins and ArgoCD) and Configuration Management tools, such as Ansible and Terraform
Experience deploying and running applications on Docker and Kubernetes, including the creation of Helm charts
Familiarity with monitoring tools like Prometheus, Grafana, OpenTelemetry and the ELK stack (Elasticsearch, Logstash, Kibana), or similar
Understanding of core SRE concepts and their implementation in platform engineering

Beneficial experience would include:

Experience building and bootstrapping an SRE organization across multiple teams
Experience working on large-scale infrastructure to improve performance, stability and efficiency

Why should you apply?

Market-leading compensation plus annual discretionary bonus
Informal dress code and excellent work/life balance
Excellent paid time off allowance of 25 days
Sick days, military leave, and family and medical leave
Generous 401(k) plan
16 weeks’ fully paid parental leave
Medical and Prescription, Dental, and Vision insurance
Life and Accidental Death & Dismemberment (AD&D) insurance
Employee Assistance and Wellness programs
Generous relocation allowance and support
Great selection of office snacks, and hot and cold drinks
On-site gym and car parking
