Neel Hitesh
Spark/Hadoop Developer
sreekar@hstechnologiesllc.com
Phone: (425) 818-5334
PROFESSIONAL SUMMARY:
 9+ years of experience in the design and development of Big Data applications using Hadoop, Spark, Scala and Java.
 Worked across several domains, including Banking, Insurance and Mortgage.
 Extensive experience with Hadoop ecosystem technologies: HDFS, MapReduce, YARN, Sqoop, Pig, Hive, Impala, Kafka, HBase and Spark.
 In-depth knowledge of Hadoop architecture and its components, such as HDFS, YARN, ResourceManager, NodeManager, Job History Server, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
 Experience performing ETL with Spark 2.0, Spark Core and Spark SQL, and real-time data processing with Spark Streaming.
 Extensive experience with AWS (Amazon Web Services) services such as S3, Elastic Compute Cloud (EC2) and Elastic MapReduce (EMR).
 Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
 Extensive experience working on projects that follow methodologies such as Agile, Scrum, TDD (Test-Driven Development), iterative and Waterfall.
 Strong experience with Scala, Python, Java/J2EE, Spring, HTML5, CSS3, JavaScript and jQuery.
 Experience designing and developing with relational and non-relational databases, including Oracle 9i/10g/11g, MySQL, DB2, MongoDB, HBase and Cassandra.
 Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch following this summary).
 Experience with the Cloudera and Hortonworks Hadoop distributions.
 Used Docker to package application JARs into images and deploy them to virtual machines.
 Experience with job workflow scheduling tools such as Oozie.
 Good experience performing data analytics on distributed computing clusters such as Hadoop using Apache Spark, Impala and Scala.
 Involved in incident management and change management processes; heavily involved in fully automated CI/CD pipelines built on GitHub, Jenkins and Puppet.
 Experience in automated testing using JUnit, Mockito and Cucumber.
 Hands-on experience with build and deployment tools: Maven, Ant, Gradle and Jenkins.
 Extensive experience with Unix, Linux and Windows operating systems.
 Set up Apache NiFi and performed a POC orchestrating a data pipeline with it.
 Experience preparing documentation, including requirement specifications, analysis and design documents, test cases, user training documents and technical help documents.
 Excellent verbal and written communication, analytical and problem-solving skills; a self-starter able to work independently or within a team.
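A minimal Scala sketch of the Hive/SQL-to-DataFrame conversion pattern called out above; the database, table and column names are hypothetical placeholders, not taken from an actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()            // read Hive tables through the metastore
      .getOrCreate()

    // Original HiveQL, run as-is through Spark SQL
    val viaSql = spark.sql(
      """SELECT region, SUM(amount) AS total
        |FROM sales.txns
        |WHERE txn_date >= '2020-01-01'
        |GROUP BY region""".stripMargin)

    // The same query expressed as DataFrame transformations
    val viaDataFrame = spark.table("sales.txns")
      .filter(col("txn_date") >= lit("2020-01-01"))
      .groupBy("region")
      .agg(sum("amount").as("total"))

    viaDataFrame.show()
    spark.stop()
  }
}
```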
Education:
Bachelor's in Computer Science – 2012
Technical Skills:
Hadoop Technologies: HDFS, YARN, Spark, MapReduce, HBase, Hive, Impala, Kafka, Pig, NiFi, Sqoop, HUE UI, Cloudera, Kerberos
Programming Languages: Java, Spring, Scala, Python, C, PL/SQL, XML
Web Technologies: JSP, JavaScript, HTML/HTML5, CSS/CSS3, jQuery, Bootstrap, Ajax
Application/Web Servers: JBoss 5/6, Tomcat, IBM WebSphere 6/6.1/7, Oracle WebLogic 8.1/9.1
Analysis/Design Tools: Informatica ETL, Data Modelling, Design Patterns, UML, Axure, Photoshop
Cloud Tools: AWS, S3, EMR, EC2
Testing/Logging Tools: JUnit, Mockito, Jasmine, Log4j, Karma, Jenkins
Build/Deploy Tools: Ant, Maven, Gradle, TeamCity, Jenkins, uDeploy, Docker
Database Technologies: Oracle 9i/10g/11g, DB2, MySQL, MongoDB, Cassandra
Web Services: REST, SOAP, JAX-WS, JAX-RPC, Axis 2, WSDL, SoapUI
Version Control: Git, SVN, CVS
Platforms: Windows, Mac OS X, Linux
Scheduler Tools: Oozie
PROFESSIONAL EXPERIENCE
Client: Morgan Stanley, New York, NY (June 2020 – Present)
Role: Sr. Data Engineer
Responsibilities:
 Worked on real-time and batch data processing with Spark/Storm and Kafka using Scala.
 Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources and CSV/XML files.
 Handled data cleansing and transformation tasks with Spark using Java.
 Implemented data consolidation using Hive and delivered the results to two targets, Dataiku and Teradata.
 Dataiku was used primarily for data analytics, with dashboard creation as a secondary use case.
 Implemented data ingestion using custom shell scripts that load data from Hive into Teradata.
 Teradata served as the data source for Tableau dashboards and for queries executed in the backend by Informatica workflows.
 Used the Informatica ETL tool to analyze the logic for each table and created jobs in our ETL application to ingest real-time data.
 Created workflows and mappings in Informatica to produce daily ad-hoc reports.
 Designed and developed a system to collect data from multiple sources using Kafka and process it with Spark.
 Created Sqoop and Spark jobs to ingest data from relational databases for comparison with historical data.
 Loaded all datasets into Hive from source CSV, XML and JSON files using Spark (see the sketch at the end of this section).
 Used the HUE UI to execute SQL queries to analyze the data.
 Wrote complex SQL queries to be executed in the Informatica backend and created views on top of those queries for use in Tableau dashboards.
 Developed Spark Streaming jobs in Java to build a stream data platform integrated with Kafka.
 Processed streaming data from Kafka topics using Java and ingested it into DB2.
 Wrote Java programs using Spark/Spark SQL to perform aggregations.
 Used the Tivoli scheduler to schedule jobs for data ingestion from multiple sources into Hive and from Hive to Teradata.
 Provided support for job failures on the scheduler.
 Provided support by reviewing Hadoop/application log files to handle production issues and releases.
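A minimal Scala sketch of the Spark file-to-Hive load pattern referenced above; paths, schemas and table names are hypothetical, and XML sources would additionally require the third-party spark-xml reader, which is not shown.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object LoadSourcesIntoHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadSourcesIntoHive")
      .enableHiveSupport()
      .getOrCreate()

    // CSV source with a header row; schema is inferred for brevity
    val csvDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/trades/*.csv")

    // Line-delimited JSON source
    val jsonDf = spark.read.json("hdfs:///landing/positions/*.json")

    // Persist both feeds as Hive tables for downstream consolidation
    csvDf.write.mode(SaveMode.Overwrite).saveAsTable("staging.trades")
    jsonDf.write.mode(SaveMode.Append).saveAsTable("staging.positions")

    spark.stop()
  }
}
```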
Environment:
Hadoop, HDFS, YARN, MapReduce, Hive, Impala, Spark, Spark-Streaming, Spark-SQL, Cloudera, Kafka, Sqoop,
Tivoli, Dataiku, Kerberos, AWS, Java, ETL, Oracle, Informatica, Teradata, Mockito, JUnit, Gradle, Maven, Git.
Client: ECA, Thousand Oaks, CA (Jan 2019 – June 2020)
Role: Sr. Spark/Hadoop Developer
Responsibilities:
 Worked on real-time and batch data processing with Spark/Storm and Kafka using Scala.
 Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources and CSV/XML files.
 Handled data cleansing and transformation tasks with Spark using Scala and Hive.
 Implemented data consolidation using Spark and Hive to generate data in the required formats, applying ETL tasks such as data repair, massaging data to identify the source for audit purposes and data filtering, then storing the results back to HDFS.
 Designed and developed a system to collect data from multiple portals using Kafka and process it with Spark.
 Wrote a MapReduce program to export historical data from HDFS to HBase and Impala.
 Worked with Kerberos authentication for HBase/Hive to secure the data.
 Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases for comparison with historical data.
 Loaded all datasets into Hive and Cassandra from source CSV files using Spark.
 Used the HUE UI to execute SQL queries to analyze the data.
 Submitted and tracked MapReduce jobs using the JobTracker.
 Wrote complex SQL queries to measure latency between source and target tables and improved performance through parallel ingestion across multiple instances.
 Developed Spark Streaming jobs in Scala to build a stream data platform integrated with Kafka.
 Developed multiple MapReduce jobs for data cleaning and preprocessing.
 Wrote Hive and Pig scripts to analyze customer satisfaction index, sales patterns, etc.
 Orchestrated Sqoop scripts, Pig scripts and Hive queries using Oozie workflows.
 Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on the AWS cluster.
 Created MapReduce jobs using Hive/Pig queries.
 Used Apache NiFi to copy data from the local file system to HDFS.
 Processed streaming data from Kafka topics using Scala and ingested it into Cassandra (see the sketch at the end of this section).
 Wrote Scala programs using Spark/Spark SQL to perform aggregations.
 Used the Oozie scheduler to schedule MapReduce jobs and jobs that produce per-table data reports.
 Used the Informatica ETL tool to analyze the logic for each table and created jobs in our ETL application to ingest real-time data.
 Migrated an existing on-premises application to AWS, used AWS services such as EC2 and S3 for large-scale data processing and storage, worked with Elastic MapReduce and set up a Hadoop environment on AWS EC2 instances.
 Provided support by reviewing Hadoop/application log files to handle production issues and releases.
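A minimal Scala sketch of the Kafka-to-Cassandra streaming pattern referenced above, using the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries; broker addresses, topic, keyspace, table and message layout are all hypothetical.

```scala
import com.datastax.spark.connector._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",                   // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "ingest-group",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Parse "id,amount" messages and write each micro-batch to a
    // pre-created Cassandra table ingest.events(id text, amount double)
    stream.map(_.value.split(","))
      .flatMap(p => if (p.length == 2) List((p(0), p(1).toDouble)) else Nil)
      .foreachRDD(_.saveToCassandra("ingest", "events", SomeColumns("id", "amount")))

    ssc.start()
    ssc.awaitTermination()
  }
}
```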
Environment:
Hadoop, HDFS, YARN, MapReduce, Hive, NiFi, Pig, Impala, HBase, Spark, Spark-Streaming, Spark-SQL, HUE,
Cloudera, Kafka, Sqoop, Oozie, Kerberos, AWS, Java, ETL, Oracle, Informatica, Mockito, JUnit, Gradle,
Maven, Git.
Client: BMO Harris Bank, Naperville, IL (Nov 2017 – Dec 2018)
Role: Hadoop Developer
Responsibilities:
 Performed various POCs in data ingestion, data analysis and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume and Elasticsearch.
 Installed and configured Hadoop.
 Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
 Implemented Spark jobs using Python and Spark SQL for faster testing and processing of the data.
 Handled data cleansing and transformation tasks with Spark using Scala and Hive (see the sketch at the end of this section).
 Developed Spark scripts in the Python shell (PySpark) as per requirements.
 Used Amazon EC2 for deploying and testing lower environments such as Dev, INT and Test.
 Used the Amazon S3 object storage service to store and retrieve media files such as images.
 Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
 Installed and configured Pig and wrote Pig Latin scripts to convert unstructured data into a structured format.
 Imported/exported data using Sqoop to load data from Teradata to HDFS/Hive on a regular basis.
 Wrote Hive queries for ad-hoc reporting to the business.
 Worked on data ingestion into Kafka, then processed and stored the data using Spark Streaming.
 Defined job flows using Oozie.
 Developed Spark Streaming jobs in Scala to build a stream data platform integrated with Kafka.
 Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
 Set up an HBase column-based storage repository for archival and retro data.
 Processed streaming data from Kafka topics using Scala and ingested it into Cassandra.
 Set up and benchmarked Hadoop clusters for internal use.
 Involved in managing and reviewing Hadoop log files.
 Developed NiFi workflows to automate data movement between different Hadoop systems.
 Analyzed the Hadoop cluster and different big data analytics tools, including Pig, the HBase NoSQL database and Sqoop.
 Installed, configured, supported and managed Hadoop clusters using the Apache and Cloudera (CDH4) distributions, including on Amazon Web Services (AWS).
 Wrote Scala programs using Spark/Spark SQL to perform aggregations.
 Responsible for migrating the code base from the Hortonworks platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift.
 Set up an Amazon Web Services (AWS) EC2 instance for the Cloudera Manager server.
 Wrote Scala programs using Spark on YARN to analyze data.
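A minimal Scala sketch of the Spark data-cleansing pattern referenced above; the staging and curated table names, columns and cleansing rules are illustrative assumptions only.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._

object CleanseCustomerFeed {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CleanseCustomerFeed")
      .enableHiveSupport()
      .getOrCreate()

    // Raw feed previously landed in a Hive staging table
    val raw = spark.table("staging.customer_feed")

    val cleansed = raw
      .dropDuplicates("customer_id")                  // drop duplicate records
      .filter(col("customer_id").isNotNull)           // drop rows missing the key
      .withColumn("email", lower(trim(col("email")))) // normalize text fields
      .withColumn("signup_date", to_date(col("signup_date"))) // fix the date type
      .na.fill(Map("country" -> "UNKNOWN"))           // default missing values

    cleansed.write.mode(SaveMode.Overwrite).saveAsTable("curated.customer")
    spark.stop()
  }
}
```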
Environment:
Scala, Hadoop, MapReduce, Spark, YARN, Hive, Pig, NiFi, Sqoop, Flume, AWS, S3, EC2, IAM, HBase, Elastic
Search, Hortonworks, Java, J2EE, Web services, Hibernate, Struts, JSP, JDBC, XML, WebLogic Workshop,
Jenkins, Maven.
Client: Toyota Insurance Management Solutions, Plano, TX (Oct 2016 – Oct 2017)
Role: Data Engineer
Responsibilities:
 Developed Spark applications in Scala using DataFrames and the Spark SQL API for faster data processing.
 Built a real-time data pipeline to store data for real-time analysis and batch processing.
 Developed Spark jobs to summarize and transform data in Hive.
 Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to Hive.
 Ran trials connecting Kafka to storage layers such as HBase, MongoDB and HDFS/Hive and to other analytics tools.
 Developed Spark Streaming jobs in Scala to build a stream data platform integrated with Kafka.
 Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
 Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
 Developed custom UDFs in Java to extend Hive and Pig functionality (see the sketch at the end of this section).
 Created Hive tables and loaded and analyzed data using Hive queries.
 Set up a scalable analytics pipeline with raw unstructured data as input and valuable extracted data as output.
 Involved in developing a MapReduce framework that filters bad and unnecessary records.
 Developed Hive queries on different tables to find insights and automated the building of data pipelines for data scientists' predictive, classification, descriptive and prescriptive analytics.
 Developed Spark code using Scala and Spark SQL for faster processing and testing.
 Built a data warehouse on Hadoop/Hive from different RDBMS systems, using the Apache NiFi data flow engine to replicate the whole database.
 Developed NiFi workflows to pick up data from the data lake and from server logs and send it to the Kafka broker.
 Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
 Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
 Developed web services in the Play framework using Scala as part of building the stream data platform.
 Led the offshore team in automating NiFi workflows using the NiFi REST API.
 Designed and implemented complex workflows in the Oozie scheduler.
 Developed Python scripts for data validation.
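The UDFs above were written in Java; the following is a Scala sketch of the same old-style Hive UDF pattern (it needs hive-exec and hadoop-common on the classpath), with a hypothetical account-masking rule used purely for illustration.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Old-style Hive UDF: Hive locates the evaluate() method by reflection.
class MaskAccountUDF extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    // Keep the last four characters and mask the rest (illustrative rule)
    new Text(("*" * math.max(0, s.length - 4)) + s.takeRight(4))
  }
}

// Registering and calling it from HiveQL (jar path and table are placeholders):
//   ADD JAR hdfs:///udfs/mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountUDF';
//   SELECT mask_account(account_no) FROM policies;
```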
Environment:
Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hadoop, HDFS, Hive, Oozie, MapReduce, Pig, Sqoop,
HDInsight, Shell Scripting, HBase, Apache NiFi, Tableau, Oracle, MySQL, Teradata and DB.
Client: Fidelity Investments/Global Logic, India (Jul 2015 – Sep 2016)
Role: Hadoop Engineer
Responsibilities:
 Participated in daily Scrum meetings and followed the project's Agile methodology and Jira process.
 Built an ETL framework in Java for real-time and batch ingestion.
 Created real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
 Developed using Java 1.7 features such as annotations, generics, the enhanced for loop and enums.
 Designed the user interfaces using JSPs, AJAX and Struts tags.
 Involved in unit testing, troubleshooting and debugging; modified existing programs with enhancements.
 Performed cleaning and filtering on imported data using Hive and MapReduce.
 Regularly tuned Hive and Pig queries to improve data processing and retrieval performance.
 Extracted data from Hive and loaded it into an RDBMS using Sqoop.
 Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch at the end of this section).
 Handled Hive queries using Spark SQL integrated with the Spark environment.
 Worked with the MongoDB NoSQL platform for development and database communication.
 Worked with MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
 Created Kafka applications that monitor consumer lag within Apache Kafka clusters.
 Troubleshot and resolved data quality issues and maintained a high level of accuracy in the data being reported.
 Developed Spark scripts using Scala shell commands as per requirements.
 Wrote MapReduce programs to export data from HDFS to the local file system and import data from local to HDFS.
 Created Hive external and internal tables to ingest data using the Java ETL framework.
 Managed and reviewed Hadoop log files.
 Processed ingested raw data using Apache Pig.
 Automated data loading and extraction; used UNIX shell scripting to generate reports.
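A minimal Scala sketch of the cached-RDD, in-memory computation pattern referenced above; the HDFS path and record layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddComputation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddComputation"))

    // Pipe-delimited records loaded from HDFS
    val lines = sc.textFile("hdfs:///data/transactions/*.txt")

    // Cache the parsed RDD so repeated computations stay in memory
    val parsed = lines.map(_.split('|'))
      .filter(_.length >= 3)
      .map(f => (f(1), f(2).toDouble))   // (accountId, amount)
      .cache()

    // In-memory aggregations over the cached RDD
    val totalsPerAccount = parsed.reduceByKey(_ + _)
    val grandTotal = parsed.values.sum()

    totalsPerAccount.take(10).foreach(println)
    println(s"Grand total: $grandTotal")
    sc.stop()
  }
}
```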
Environment:
Java, Scala, Spring, HDFS, MapReduce, Kafka, Hive, Spark, Pig, HBase, JSP, Struts, Ajax, Unix Shell
Scripting, Mockito, SVN, MongoDB, Scrum, Jira, SOAP, WSDL, XML, Unit Testing, Debugging,
Troubleshooting, Firebug, Putty
Client: Zensar, India (Jan 2014 – Jun 2015)
Role: Software Developer
Responsibilities:
 Involved in various phases of Agile Software Development Life Cycle (SDLC) of the application like
Requirement gathering, Design, Analysis and Code development.
 Used JBoss as the application server and Gradle for building and deploying the application.
 Used the Apache Cassandra database to design and manage the application's data.
 Managed build results in Jenkins and deployed using workflows.
 Implemented error checking/validation on the Java Server Pages using JavaScript.
 Developed critical components of the application, including Spring forms, Spring controllers, JSP views, and business and data logic components such as Hibernate entities, Spring-Hibernate DAOs and Spring models, following MVC architecture.
 Developed, modified and maintained hand-coded CSS and HTML that was W3C standards-compliant, accessible, semantic and cross-browser compatible.
 Integrated the front end with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
 Developed application service components and configured beans using Spring IoC; implemented the persistence layer and configured EhCache to load static tables into a secondary storage area.
 Participated in the product development life cycle via rapid prototyping with wireframes and mockups.
 Developed web services (SOAP/HTTP, SOAP/JMS, MDBs, SMTP) using SOA technologies such as SOAP, WSDL and UDDI.
 Utilized DOM and SAX parser technologies to implement an XML parsing framework.
 Developed JUnit unit test cases for DAO and Service Layer methods.
 Deployed and tested the web application on WebSphere application server.
Environment:
Java, JSP, JDBC, Spring IOC, Spring, Agile, Cassandra, JavaScript, HTML, CSS, JUnit, XML, JBOSS, SOAP,
WSDL, UDDI, wireframing, Jenkins, Gradle, WebSphere, Hibernate, IntelliJ
Client: CSC, Hyderabad (May 2012 – Dec 2013)
Role: Jr Programmer
Responsibilities:
 Designed and integrated the full-scale Struts/Hibernate/Spring/EJB persistence solutions with the
application architecture
 Responsible for architecture and implementation of new Stateless Session Beans (EJB) with annotation for
the entity manager lookup module
 Implemented Object/relational persistence (Hibernate) for the domain model
 Designed and implemented the Hibernate domain model for the services
 Implemented the Web services and associated business modules integration
 Developed and implemented the MVC Architectural Pattern using Struts Framework including JSP,
servlets and action classes.
 Developed the UI with Struts, jQuery plugins and AJAX functionality
 Implemented Struts action classes using the Struts controller component
 Developed SOAP web services to interact with other components; parsed and processed ANSI 835 files and generated claims-crossover XML files for the trading partners using Java and DB2
 Wrote programs to parse and transform the XML files using XSLT (see the sketch at the end of this section)
 Wrote secure FTP programs to send crossover files to trading partners
 Created reports for provider search using JSP
 Designed the XML schema to validate XML
 Refactored the Java threads (Multi-threading) to enhance the performance of the business process
 Wrote the PL/SQL Stored Procedures to handle the business logic related to DB
 Worked on creating Views, Indexes and Stored Procedures using AQT
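The XSLT work above was done in Java 1.6; the following is a Scala sketch of the same standard JAXP transformation pattern, with placeholder file names.

```scala
import java.io.File
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

object XsltTransform {
  def main(args: Array[String]): Unit = {
    // Compile the stylesheet once, then apply it to an input document
    val factory     = TransformerFactory.newInstance()
    val transformer = factory.newTransformer(new StreamSource(new File("claims-crossover.xsl")))

    transformer.transform(
      new StreamSource(new File("input-claims.xml")),
      new StreamResult(new File("crossover-output.xml")))
  }
}
```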
Environment:
Java 1.6, JSP 2.2, Java EE 1.5, Servlets 3.0, Struts 2.0 MVC Framework, Hibernate 3, Ant, JDBC, Web
Services, Axis, Eclipse, UNIX, WebLogic 10.3.2, Oracle 11g, Spring Framework 3.1, jQuery 1.4, EJB 3.0, JPA
2.0, JMS, Eclipse Helios 3.6, SVN, JAX-RPC