Strong knowledge of various GCP components such as BigQuery, Dataflow, Cloud SQL, Bigtable, and Pub/Sub.
Experience in developing reusable frameworks using PySpark.
Worked in Risk Compliance in the Payments domain on fraud-detection use cases.
Worked on a PySpark-based framework that automates the data ingestion flow.
Developed Hive tables with transformations to feed dashboards.
Experience with real-time streaming use cases using Kafka and Spark Streaming.
Expertise with scheduling tools such as UC4 and Airflow.
Data ingestion from various sources such as RDBMS and EC2 servers, handling different file formats (XML, JSON, Parquet, CSV, TXT, and delimited), along with cleaning and pre-processing of data ingested into HDFS using Spark and Hive.
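To illustrate the multi-format handling described above, here is a minimal plain-Python sketch (stdlib only, for portability; the actual ingestion used Spark readers on HDFS, and the record schema here is made up):

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_records(fmt: str, raw: str) -> list[dict]:
    """Normalize JSON, CSV, or XML input into a list of row dicts.
    Illustrative only; production ingestion used Spark on HDFS."""
    if fmt == "json":
        return json.loads(raw)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    if fmt == "xml":
        # Assumes a flat <records><record><field>...</field></record>... layout.
        root = ET.fromstring(raw)
        return [{child.tag: child.text for child in rec} for rec in root]
    raise ValueError(f"unsupported format: {fmt}")


def clean(rows: list[dict]) -> list[dict]:
    """Basic pre-processing: strip whitespace and drop empty string values."""
    return [
        {k: v.strip() for k, v in row.items() if v and v.strip()}
        for row in rows
    ]
```

The same normalize-then-clean shape carries over to Spark, where each branch would be a `spark.read.format(...)` call instead of a stdlib parser.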
Developed Hive queries to apply business logic per requirements, fetched the transformed data, and delivered it to downstream systems.
Created shell scripts to automate jobs and scheduled them using UC4.
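A minimal sketch of the shell job-wrapper pattern (the job name, log location, and the UC4 hookup are hypothetical; UC4 would invoke such a wrapper on its schedule and read the exit code):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative wrapper; real job names, paths, and the spark-submit
# command it would run are hypothetical.
run_job() {
    local job_name="$1"
    local log_dir="${LOG_DIR:-/tmp/job-logs}"
    mkdir -p "$log_dir"
    echo "$(date '+%F %T') starting $job_name" >> "$log_dir/$job_name.log"
    # Placeholder for the real work, e.g. spark-submit of the ingestion job.
    echo "$(date '+%F %T') $job_name finished" >> "$log_dir/$job_name.log"
}
```

With `set -e`, any failing step makes the wrapper exit non-zero, which is what the scheduler keys off for retries and alerting.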
Experience with GitHub and Jira tools.
Used SQL functions effectively in queries to perform various data transformation activities.
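The kind of SQL-function transformation meant here can be shown with a small self-contained example (using Python's built-in `sqlite3` purely for portability; the `payments` table and its columns are made up):

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, customer TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(1, "  alice ", "10.50"), (2, "BOB", "3.25")],
)

# Typical transformation functions: TRIM/UPPER for string cleanup,
# CAST for converting text amounts to numbers.
rows = conn.execute(
    """
    SELECT id,
           UPPER(TRIM(customer))   AS customer,
           CAST(amount AS REAL)    AS amount_num
    FROM payments
    ORDER BY id
    """
).fetchall()
```

In Hive or Spark SQL the same functions (`trim`, `upper`, `cast`) apply with near-identical syntax, which is why this pattern transfers directly to the warehouse queries described above.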
Role & Responsibilities:
Take complete responsibility for the sprint stories' execution.
Be accountable for the delivery of the tasks in the defined timelines with good quality.
Follow the processes for project execution and delivery.
Follow the agile methodology.
Set clear team goals and KPIs.
Delegate tasks and set project deadlines.
Oversee day-to-day team operations and performance.
Conduct regular performance evaluations.
Create a healthy and motivating work environment and atmosphere.
Work with the team members closely and contribute to the smooth delivery of the project.
Understand and define the architecture, and discuss its pros and cons with the team.
Participate in brainstorming sessions and suggest improvements to the architecture and design.
Work with the project's clients and counterparts (in the US).
Location: Pune, Mumbai, Chennai, Kolkata, Delhi, and Bangalore
Data Engineer (PySpark)
Skills: PySpark, Hadoop, Spark, Scala, RDBMS, EC2, XML, JSON, Airflow
Experience: 5-8 years