Junior Site Reliability Engineer (SRE)

ZineOne, Inc. | Mumbai, MH, IN

Posted a month ago


Site Reliability Engineer

Job Description

This is a hands-on technical position as a member of the Site Reliability Engineering Group. The primary mission of our SREs is to ensure that ZineOne Cloud runs smoothly and efficiently and scales reliably. The ideal candidate has 5+ years of experience managing a cloud-based Big Data stack and strives to solve operations problems through automation and software tools. You must have a high standard of excellence, solid written and verbal communication skills, a strong customer focus, and technical depth in operating systems, application performance, databases, load balancers, networks, and storage systems.

Key Responsibilities: As a ZineOne SRE, you will:
Build solutions to problems that impact availability, performance, and stability in our systems, services, and products
Develop and maintain cloud infrastructure as code and provision AWS environments for our customers, automating as much ops-related work as possible (build, deploy, install, start/stop, restart, and monitoring across multiple machines) on cloud providers such as AWS or on premises
Develop and deploy new automated solutions using frameworks that will enable the core platform and features to automatically scale in the cloud
Work closely with other members of the group to enable DevOps automation, continuous integration, automated test execution, and continuous delivery of the ZineOne platform and its new features
Perform data plumbing/engineering tasks to ensure clean and correct data is ingested into our platform or sent to receiving systems
Apply proficiency in software engineering, shell scripting, Python, Java, and other programming languages, and be willing to research and learn new technologies and frameworks for building automation solutions
Develop and implement instrumentation for monitoring the health and availability of services including fault detection, alerting, and recovery
Be accountable for backup and business continuity / disaster recovery procedures
Develop and maintain documentation for operational practices and procedures, and help drive operational cost reduction

Skills in the following areas and/or similar cloud platforms will be a plus:
Working experience with AWS using both the AWS Management Console and the AWS Command Line Interface (AWS CLI)
Strong experience building and maintaining production systems on AWS using EC2, S3, ELB, CloudFront, Elastic Beanstalk, etc.
Experience performing technical and system administration to support development, deployment, and delivery, with deep experience in administering Linux systems, Node.js, Python, Flask, and Bash shell script modules
Experience with real-time big data platforms, including HDFS/HBase architecture, ZooKeeper, and Kafka clusters