About Global Tech:
Imagine working in an environment where one line of code can make life easier for hundreds of millions of people and put a smile on their face. That's what we do at Walmart Global Tech. We're a team of 15,000+ software engineers, data scientists and service professionals within Walmart, the world's largest retailer, delivering innovations that improve how our customers shop and empower our 2.3 million associates. To others, innovation looks like an app, service or some code, but Walmart has always been about people. People are why we innovate, and people power our innovations. Being human-led is our true disruption.
Team and Position Summary:
We are looking for passionate Software Engineers to join our Site Reliability and Engineering (SRE) team and help us build our next-generation site reliability applications and tools for the Fulfillment pillar. The SRE team is responsible for deployment and platform architecture, platform modernization, infrastructure design, setup and support, site reliability and availability, performance testing, deployments, automation, cloud capacity optimization, tooling, dashboarding, development and adoption, holiday readiness and support, and 24/7 support for deployment and infrastructure issues.
As part of the SRE team @Walmart Labs, you'll have the opportunity to work with some of the smartest engineers and architects and to build applications and tools that make our services resilient.
- Develop software to improve reliability of services
- Develop automation to improve developer productivity
- Research, learn & adapt new reliability technologies to solve problems & improve existing solutions
- Participate in events to build innovative solutions
- Adhere to company policies, procedures, mission, values, and standards of ethics and integrity.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related discipline
- 4+ years of hands-on DevOps/SRE experience
- Strong experience in at least one of the following programming languages: Python or Go
- Experience with monitoring and logging solutions such as Prometheus, Grafana, and ELK/Splunk
- Experience working with large-scale distributed systems, including scalability, disaster recovery, and fault tolerance
- Expertise in tools such as Terraform, Jenkins, and Ansible/Chef/Puppet
- Strong knowledge of CI/CD pipelines and Git
- Strong working experience with container and orchestration technologies such as Docker and Kubernetes
- Experience with a service mesh such as Istio
- Experience developing full-stack applications using Java
- Demonstrated knowledge of configuration management and deployment automation tools
- Strong experience with networking concepts and protocols (HTTP, HTTPS, Telnet, SSH, firewalls, VPN, routing, and load balancing)
- Strong experience with Linux and open-source software
- Experience with load/performance testing and other non-functional testing
- Experience with chaos engineering
- Experience with code-quality and automated testing (Sonar, JUnit, Selenium or similar)
- Experience with API standards and data formats (REST, SOAP, JSON, XML)
- Experience with SQL and NoSQL databases
- Experience working with large-scale systems