Junior Site Reliability Engineer (SRE)

ZineOne, Inc. | Mumbai, MH, IN

Posted a month ago

Site Reliability Engineer

Job Description

This is a hands-on technical position as a member of the Site Reliability Engineering Group. The primary mission of our SREs is to ensure that ZineOne Cloud runs smoothly and efficiently and scales reliably. The ideal candidate has 5+ years of experience managing a cloud-based Big Data stack and strives to solve operations problems through automation and software tools. You must have a high standard of excellence, solid written and verbal communication skills, a strong customer focus, and technical depth in operating systems, application performance, databases, load balancers, networks, and storage systems.

Key Responsibilities: As a ZineOne SRE, you will:

  • Build solutions to problems that impact availability, performance, and stability in our systems, services, and products
  • Develop and maintain cloud infrastructure as code, and provision AWS environments for our customers, automating as much ops-related work as possible (build, deploy, install, start/stop, restart, and monitoring) across multiple machines on cloud providers such as AWS or on premises
  • Develop and deploy new automated solutions using frameworks that will enable the core platform and features to automatically scale in the cloud
  • Work closely with other members of the group to enable DevOps automation, continuous integration, test automation execution and continuous delivery of the ZineOne platform & its new features
  • Perform data plumbing/engineering tasks to ensure clean and correct data is ingested into our platform or sent to receiving systems
  • Be proficient in software engineering, shell scripting, Python, Java, and other programming languages, and be willing to research and learn new technologies and frameworks to use when creating automation solutions
  • Develop and implement instrumentation for monitoring the health and availability of services including fault detection, alerting, and recovery
  • Be accountable for backup and business continuity / disaster recovery procedures
  • Develop and maintain documentation for operational practices and procedures as well as help drive operational cost reduction

Skills in the following areas and/or similar cloud platforms will be a plus:

  • Working experience with AWS using both the AWS Management Console and the AWS Command Line Interface (AWS CLI)
  • Strong experience building and maintaining production systems on AWS using EC2, S3, ELB, CloudFront, Elastic Beanstalk, etc.
  • Experience performing technical and system administration to support development, deployment, and delivery, with deep experience administering Linux systems and working with Node.js, Python, Flask, and Bash shell script modules
  • Experience with real-time big data platforms, including HDFS/HBase architecture, ZooKeeper, and Kafka clusters