Lead Data Engineer

New York Life Insurance Company (“New York Life” or “the company”) is the largest mutual life insurance company in the United States*. Founded in 1845, New York Life is headquartered in New York City, maintains offices in all fifty states, and owns Seguros Monterrey New York Life in Mexico.


New York Life is one of the most financially strong and highly capitalized insurers in the business. The company reported 2016 operating earnings of $1.954 billion. Total assets under management at year-end 2016, with affiliates, totaled $538 billion. As of year-end 2016, New York Life’s surplus was $23.336 billion**. New York Life holds the highest possible financial strength ratings currently awarded to any life insurer from all four of the major ratings agencies: A.M. Best, A++; Fitch, AAA; Moody’s, Aaa; Standard & Poor’s, AA+. (Source: Individual Third Party Ratings Report as of 8/17/16).


Financial strength, integrity and humanity—the values upon which New York Life was founded—have guided the company’s decisions and actions for over 170 years.



Ranked number 61 on the 2016 Fortune 500 list, New York Life Insurance Company (“the Company”) is the oldest and the largest mutual life insurance company in the United States and one of the largest life insurers in the world. Founded in 1845, New York Life is headquartered in New York City and maintains operations in all fifty states as well as Mexico, through a network of more than 17,000 agents and more than 10,000 employees. New York Life is consistently recognized as a great place to work and is often named on lists of "Best Companies."




Title: Lead Data Engineer
Location: CNJ/JCNJ/HO

Job Level: MG1


The Information Delivery team provides transformative data capabilities and solutions that serve as the core data, reporting, and analytics platform for the Insurance & Agency Group at New York Life (NYL). The Information Delivery team is looking for a Lead Data Engineer with the skills listed below.




Experience working with polyglot data storage technologies in the cloud: Data Lake (HDFS, Blob storage), RDBMS (Oracle, SQL, SQL DW, Redshift), and NoSQL solutions (Key Value, Document, Column, and Graph). Experience with the installation, configuration, administration, and governance of these environments on-premises and in the cloud.


MS/BS in Computer Science, Information Systems, or a related field preferred, or equivalent experience.




Data Ingestion

Experience building optimized, scalable data processing operations (ETL / ELT), encapsulated in workflows, that transform source data, move data between multiple sources, and load the processed data into an analytical data store using orchestration technology such as Azure Data Factory, Apache Oozie, Sqoop, AWS Data Pipeline, AWS Glue, or Informatica.


Batch Processing

Experience in the Hadoop ecosystem: reading source files, processing them, and writing the output to new files or data stores using Hive, Pig, or custom MapReduce jobs in a Hadoop cluster, or using Java, Scala, or Python programs in a Hadoop Spark cluster.


Stream Processing and Real-Time Message Ingestion

Experience with real-time data sources and message ingestion, including filtering, aggregating, and preparing the data for analysis, using technologies such as Spark Streaming, Kafka, AWS Kinesis, Azure IoT Hub, or Azure Event Hubs. Good understanding of Lambda Architecture. Experience building and productionizing microservices and APIs.


ML and AI

Identify and formulate various challenges in user support domains as Machine Learning (ML) problems.

Design, develop, validate and deploy proposed ML solutions.


Analysis and Reporting and Analytical Data Stores

Knowledge of solutions (complex reports, dashboards, and scorecards) that provide insights into the data through analysis and reporting. Experience with Enterprise Data Lake initiatives, including data model development and semantic layer development, is a plus.


Knowledge of solutions for shared data usage using cloud-based data store technologies such as AWS (Redshift) or Azure (SQL Data Warehouse), SSAS, or NoSQL technologies such as HBase or Hive databases in a distributed data store is nice to have.


Working knowledge of at least one ETL tool is a plus.


Knowledge of reporting tools such as Power BI, Tableau, Qlik Sense, or QlikView is a plus.


Other Responsibilities:


  • Ability to manage a team in an on/offshore model
  • Ability to interact with multiple technical teams and the data governance team to come up with a data governance framework.










If you have difficulty using or interacting with any portions of this Web site due to incompatibility with an Assistive Technology, if you need the information in an alternative format, or if you have suggestions on how we can make this site more accessible, please contact us at: (212) 576-5811.


*Based on revenue as reported by “Fortune 500, ranked within Industries, Insurance: Life, Health (Mutual),” Fortune Magazine, June 17, 2016.  See http://fortune.com/fortune500/  for methodology.

**Total surplus, which includes the Asset Valuation Reserve, is one of the key indicators of the company’s long-term financial strength and stability and is presented on a consolidated basis of the company.


1. Operating earnings is the key measure used by management to track the Company’s profitability from ongoing operations and the underlying profitability of the business. This indicator is based on generally accepted accounting principles in the US (GAAP), with certain adjustments the Company believes to be appropriate as a measurement approach (non-GAAP), primarily the removal of gains or losses on investments and related adjustments.


2. Assets under management represent consolidated domestic and international insurance company statutory assets (cash and invested assets and separate account assets) and third-party assets principally managed by New York Life Investment Management Holdings LLC, a wholly owned subsidiary of New York Life Insurance Company.