Neel Hitesh
Spark/Hadoop Developer
sreekar@hstechnologiesllc.com
Phone: (425) 818-5334
PROFESSIONAL SUMMARY:
 9+ years of experience in the design and development of Big Data applications using Hadoop, Spark, Scala and Java.
 Worked across several domains, including Banking, Insurance and Mortgage.
 Extensive experience with Hadoop ecosystem technologies: HDFS, MapReduce, YARN, Sqoop, Pig, Hive, Impala, Kafka, HBase and Spark.
 In-depth knowledge of Hadoop architecture and its components, such as HDFS, YARN, ResourceManager, NodeManager, Job History Server, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
 Experience performing ETL with Spark 2.0, Spark Core and Spark SQL, and real-time data processing with Spark Streaming.
 Extensive experience with AWS (Amazon Web Services) services such as S3, Elastic Compute Cloud (EC2) and Elastic MapReduce (EMR).
 Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
 Extensive experience working on projects that follow methodologies such as Agile, Scrum, TDD (Test-Driven Development), iterative and Waterfall.
 Strong experience with Scala, Python, Java/J2EE, Spring, HTML5, CSS3, JavaScript and jQuery.
 Experience designing and developing with relational and non-relational databases, including Oracle 9i/10g/11g, MySQL, DB2, MongoDB, HBase and Cassandra.
 Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch following this summary).
 Experience with the Cloudera and Hortonworks Hadoop distributions.
 Used Docker to package application JARs into images and deploy them to virtual machines.
 Experience with job workflow scheduling tools such as Oozie.
 Good experience performing data analytics on distributed computing clusters such as Hadoop using Apache Spark, Impala and Scala.
 Involved in incident management and change management processes; heavily involved in fully automated CI/CD pipelines built on GitHub, Jenkins and Puppet.
 Experience in automated testing using JUnit, Mockito and Cucumber.
 Hands-on experience with build and deployment tools: Maven, Ant, Gradle and Jenkins.
 Extensive experience with Unix, Linux and Windows operating systems.
 Set up Apache NiFi and performed a POC orchestrating a data pipeline with it.
 Experience preparing documentation, including requirement specifications, analysis and design documents, test cases, user training documents and technical help documents.
 Excellent verbal and written communication, analytical and problem-solving skills; a self-starter able to work independently or within a team.
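A minimal Scala sketch of the Hive/SQL-to-DataFrame conversion pattern called out above; the database, table and column names are hypothetical placeholders, not taken from an actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()            // read Hive tables through the metastore
      .getOrCreate()

    // Original HiveQL, run as-is through Spark SQL
    val viaSql = spark.sql(
      """SELECT region, SUM(amount) AS total
        |FROM sales.txns
        |WHERE txn_date >= '2020-01-01'
        |GROUP BY region""".stripMargin)

    // The same query expressed as DataFrame transformations
    val viaDataFrame = spark.table("sales.txns")
      .filter(col("txn_date") >= lit("2020-01-01"))
      .groupBy("region")
      .agg(sum("amount").as("total"))

    viaDataFrame.show()
    spark.stop()
  }
}
```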
Education:
Bachelor's in Computer Science – 2012
Technical Skills:
Hadoop Technologies: HDFS, YARN, Spark, MapReduce, HBase, Hive, Impala, Kafka, Pig, NiFi, Sqoop, HUE UI, Cloudera, Kerberos
Programming Languages: Java, Spring, Scala, Python, C, PL/SQL, XML
Web Technologies: JSP, JavaScript, HTML/HTML5, CSS/CSS3, jQuery, Bootstrap, Ajax
Application/Web Servers: JBoss 5/6, Tomcat, IBM WebSphere 6/6.1/7, Oracle WebLogic 8.1/9.1
Analysis/Design Tools: Informatica ETL, Data Modelling, Design Patterns, UML, Axure, Photoshop
Cloud Tools: AWS, S3, EMR, EC2
Testing/Logging Tools: JUnit, Mockito, Jasmine, Log4j, Karma, Jenkins
Build/Deploy Tools: Ant, Maven, Gradle, TeamCity, Jenkins, uDeploy, Docker
Database Technologies: Oracle 9i/10g/11g, DB2, MySQL, MongoDB, Cassandra
Web Services: REST, SOAP, JAX-WS, JAX-RPC, Axis 2, WSDL, SoapUI
Version Control: Git, SVN, CVS
Platforms: Windows, Mac OS X, Linux
Scheduler Tools: Oozie
PROFESSIONAL EXPERIENCE
Client: Morgan Stanley, New York, NY (June 2020 – Present)
Role: Sr. Data Engineer
Responsibilities:
 Worked on real-time and batch data processing with Spark/Storm and Kafka using Scala.
 Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources and CSV/XML files.
 Handled data cleansing and transformation tasks with Spark using Java.
 Implemented data consolidation using Hive and delivered the results to two targets, Dataiku and Teradata.
 Dataiku was used primarily for data analytics, with dashboard creation as a secondary use case.
 Implemented data ingestion using custom shell scripts that load data from Hive into Teradata.
 Teradata served as the data source for Tableau dashboards and for queries executed in the backend by Informatica workflows.
 Used the Informatica ETL tool to analyze the logic for each table and created jobs in our ETL application to ingest real-time data.
 Created workflows and mappings in Informatica to produce daily ad-hoc reports.
 Designed and developed a system to collect data from multiple sources using Kafka and process it with Spark.
 Created Sqoop and Spark jobs to ingest data from relational databases for comparison with historical data.
 Loaded all datasets into Hive from source CSV, XML and JSON files using Spark (see the sketch at the end of this section).
 Used the HUE UI to execute SQL queries to analyze the data.
 Wrote complex SQL queries to be executed in the Informatica backend and created views on top of those queries for use in Tableau dashboards.
 Developed Spark Streaming jobs in Java to build a stream data platform integrated with Kafka.
 Processed streaming data from Kafka topics using Java and ingested it into DB2.
 Wrote Java programs using Spark/Spark SQL to perform aggregations.
 Used the Tivoli scheduler to schedule jobs for data ingestion from multiple sources into Hive and from Hive to Teradata.
 Provided support for job failures on the scheduler.
 Provided support by reviewing Hadoop/application log files to handle production issues and releases.
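A minimal Scala sketch of the Spark file-to-Hive load pattern referenced above; paths, schemas and table names are hypothetical, and XML sources would additionally require the third-party spark-xml reader, which is not shown.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object LoadSourcesIntoHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadSourcesIntoHive")
      .enableHiveSupport()
      .getOrCreate()

    // CSV source with a header row; schema is inferred for brevity
    val csvDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/trades/*.csv")

    // Line-delimited JSON source
    val jsonDf = spark.read.json("hdfs:///landing/positions/*.json")

    // Persist both feeds as Hive tables for downstream consolidation
    csvDf.write.mode(SaveMode.Overwrite).saveAsTable("staging.trades")
    jsonDf.write.mode(SaveMode.Append).saveAsTable("staging.positions")

    spark.stop()
  }
}
```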
Environment:
Hadoop, HDFS, YARN, MapReduce, Hive, Impala, Spark, Spark-Streaming, Spark-SQL, Cloudera, Kafka, Sqoop,
Tivoli, Dataiku, Kerberos, AWS, Java, ETL, Oracle, Informatica, Teradata, Mockito, JUnit, Gradle, Maven, Git.
Client: ECA, Thousand Oaks, CA (Jan 2019 – June 2020)
Role: Sr. Spark/Hadoop Developer
Responsibilities:
 Worked on real-time and batch data processing with Spark/Storm and Kafka using Scala.
 Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources and CSV/XML files.
 Handled data cleansing and transformation tasks with Spark using Scala and Hive.
 Implemented data consolidation using Spark and Hive to generate data in the required formats, applying ETL tasks such as data repair, massaging data to identify the source for audit purposes and data filtering, then storing the results back to HDFS.
 Designed and developed a system to collect data from multiple portals using Kafka and process it with Spark.
 Wrote a MapReduce program to export historical data from HDFS to HBase and Impala.
 Worked with Kerberos authentication for HBase/Hive to secure the data.
 Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases for comparison with historical data.
 Loaded all datasets into Hive and Cassandra from source CSV files using Spark.
 Used the HUE UI to execute SQL queries to analyze the data.
 Submitted and tracked MapReduce jobs using the JobTracker.
 Wrote complex SQL queries to measure latency between source and target tables and improved performance through parallel ingestion across multiple instances.
 Developed Spark Streaming jobs in Scala to build a stream data platform integrated with Kafka.
 Developed multiple MapReduce jobs for data cleaning and preprocessing.
 Wrote Hive and Pig scripts to analyze customer satisfaction index, sales patterns, etc.
 Orchestrated Sqoop scripts, Pig scripts and Hive queries using Oozie workflows.
 Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on the AWS cluster.
 Created MapReduce jobs using Hive/Pig queries.
 Used Apache NiFi to copy data from the local file system to HDFS.
 Processed streaming data from Kafka topics using Scala and ingested it into Cassandra (see the sketch at the end of this section).
 Wrote Scala programs using Spark/Spark SQL to perform aggregations.
 Used the Oozie scheduler to schedule MapReduce jobs and jobs that produce per-table data reports.
 Used the Informatica ETL tool to analyze the logic for each table and created jobs in our ETL application to ingest real-time data.
 Migrated an existing on-premises application to AWS, used AWS services such as EC2 and S3 for large-scale data processing and storage, worked with Elastic MapReduce and set up a Hadoop environment on AWS EC2 instances.
 Provided support by reviewing Hadoop/application log files to handle production issues and releases.
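A minimal Scala sketch of the Kafka-to-Cassandra streaming pattern referenced above, using the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries; broker addresses, topic, keyspace, table and message layout are all hypothetical.

```scala
import com.datastax.spark.connector._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",                   // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "ingest-group",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Parse "id,amount" messages and write each micro-batch to a
    // pre-created Cassandra table ingest.events(id text, amount double)
    stream.map(_.value.split(","))
      .flatMap(p => if (p.length == 2) List((p(0), p(1).toDouble)) else Nil)
      .foreachRDD(_.saveToCassandra("ingest", "events", SomeColumns("id", "amount")))

    ssc.start()
    ssc.awaitTermination()
  }
}
```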
Environment:
Hadoop, HDFS, YARN, MapReduce, Hive, NiFi, Pig, Impala, HBase, Spark, Spark-Streaming, Spark-SQL, HUE,
Cloudera, Kafka, Sqoop, Oozie, Kerberos, AWS, Java, ETL, Oracle, Informatica, Mockito, JUnit, Gradle,
Maven, Git.
Client: BMO Harris Bank, Naperville, IL (Nov 2017 – Dec 2018)
Role: Hadoop Developer
Responsibilities:
 Performed various POCs in data ingestion, data analysis and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume and Elasticsearch.
 Installed and configured Hadoop.
 Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
 Implemented Spark jobs using Python and Spark SQL for faster testing and processing of the data.
 Handled data cleansing and transformation tasks with Spark using Scala and Hive (see the sketch at the end of this section).
 Developed Spark scripts in the Python shell (PySpark) as per requirements.
 Used Amazon EC2 for deploying and testing lower environments such as Dev, INT and Test.
 Used the Amazon S3 object storage service to store and retrieve media files such as images.
 Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
 Installed and configured Pig and wrote Pig Latin scripts to convert unstructured data into a structured format.
 Imported/exported data using Sqoop to load data from Teradata to HDFS/Hive on a regular basis.
 Wrote Hive queries for ad-hoc reporting to the business.
 Worked on data ingestion into Kafka, then processed and stored the data using Spark Streaming.
 Defined job flows using Oozie.
 Developed Spark Streaming jobs in Scala to build a stream data platform integrated with Kafka.
 Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
 Set up an HBase column-based storage repository for archival and retro data.
 Processed streaming data from Kafka topics using Scala and ingested it into Cassandra.
 Set up and benchmarked Hadoop clusters for internal use.
 Involved in managing and reviewing Hadoop log files.
 Developed NiFi workflows to automate data movement between different Hadoop systems.
 Analyzed the Hadoop cluster and different big data analytics tools, including Pig, the HBase NoSQL database and Sqoop.
 Installed, configured, supported and managed Hadoop clusters using the Apache and Cloudera (CDH4) distributions, including on Amazon Web Services (AWS).
 Wrote Scala programs using Spark/Spark SQL to perform aggregations.
 Responsible for migrating the code base from the Hortonworks platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift.
 Set up an Amazon Web Services (AWS) EC2 instance for the Cloudera Manager server.
 Wrote Scala programs using Spark on YARN to analyze data.
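A minimal Scala sketch of the Spark data-cleansing pattern referenced above; the staging and curated table names, columns and cleansing rules are illustrative assumptions only.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._

object CleanseCustomerFeed {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CleanseCustomerFeed")
      .enableHiveSupport()
      .getOrCreate()

    // Raw feed previously landed in a Hive staging table
    val raw = spark.table("staging.customer_feed")

    val cleansed = raw
      .dropDuplicates("customer_id")                  // drop duplicate records
      .filter(col("customer_id").isNotNull)           // drop rows missing the key
      .withColumn("email", lower(trim(col("email")))) // normalize text fields
      .withColumn("signup_date", to_date(col("signup_date"))) // fix the date type
      .na.fill(Map("country" -> "UNKNOWN"))           // default missing values

    cleansed.write.mode(SaveMode.Overwrite).saveAsTable("curated.customer")
    spark.stop()
  }
}
```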
Environment:
Scala, Hadoop, MapReduce, Spark, YARN, Hive, Pig, NiFi, Sqoop, Flume, AWS, S3, EC2, IAM, HBase, Elastic
Search, Hortonworks, Java, J2EE, Web services, Hibernate, Struts, JSP, JDBC, XML, WebLogic Workshop,
Jenkins, Maven.
Client: Toyota Insurance Management Solutions, Plano, TX (Oct 2016 – Oct 2017)
Role: Data Engineer
Responsibilities:
 Developed Spark applications in Scala using DataFrames and the Spark SQL API for faster data processing.
 Built a real-time data pipeline to store data for real-time analysis and batch processing.
 Developed Spark jobs to summarize and transform data in Hive.
 Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to Hive.
 Ran trials connecting Kafka to storage layers such as HBase, MongoDB and HDFS/Hive and to other analytics tools.
 Developed Spark Streaming jobs in Scala to build a stream data platform integrated with Kafka.
 Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
 Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
 Developed custom UDFs in Java to extend Hive and Pig functionality (see the sketch at the end of this section).
 Created Hive tables and loaded and analyzed data using Hive queries.
 Set up a scalable analytics pipeline with raw unstructured data as input and valuable extracted data as output.
 Involved in developing a MapReduce framework that filters bad and unnecessary records.
 Developed Hive queries on different tables to find insights and automated the building of data pipelines for data scientists' predictive, classification, descriptive and prescriptive analytics.
 Developed Spark code using Scala and Spark SQL for faster processing and testing.
 Built a data warehouse on Hadoop/Hive from different RDBMS systems, using the Apache NiFi data flow engine to replicate the whole database.
 Developed NiFi workflows to pick up data from the data lake and from server logs and send it to the Kafka broker.
 Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
 Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
 Developed web services in the Play framework using Scala as part of building the stream data platform.
 Led the offshore team in automating NiFi workflows using the NiFi REST API.
 Designed and implemented complex workflows in the Oozie scheduler.
 Developed Python scripts for data validation.
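The UDFs above were written in Java; the following is a Scala sketch of the same old-style Hive UDF pattern (it needs hive-exec and hadoop-common on the classpath), with a hypothetical account-masking rule used purely for illustration.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Old-style Hive UDF: Hive locates the evaluate() method by reflection.
class MaskAccountUDF extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    // Keep the last four characters and mask the rest (illustrative rule)
    new Text(("*" * math.max(0, s.length - 4)) + s.takeRight(4))
  }
}

// Registering and calling it from HiveQL (jar path and table are placeholders):
//   ADD JAR hdfs:///udfs/mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountUDF';
//   SELECT mask_account(account_no) FROM policies;
```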
Environment:
Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hadoop, HDFS, Hive, Oozie, MapReduce, Pig, Sqoop,
HDInsight, Shell Scripting, HBase, Apache NiFi, Tableau, Oracle, MySQL, Teradata and DB.
Client: Fidelity Investments/Global Logic, India (Jul 2015 – Sep 2016)
Role: Hadoop Engineer
Responsibilities:
 Participated in daily Scrum meetings and followed the project's Agile methodology and Jira process.
 Built an ETL framework in Java for real-time and batch ingestion.
 Created real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
 Developed using Java 1.7 features such as annotations, generics, the enhanced for loop and enums.
 Designed the user interfaces using JSPs, AJAX and Struts tags.
 Involved in unit testing, troubleshooting and debugging; modified existing programs with enhancements.
 Performed cleaning and filtering on imported data using Hive and MapReduce.
 Regularly tuned Hive and Pig queries to improve data processing and retrieval performance.
 Extracted data from Hive and loaded it into an RDBMS using Sqoop.
 Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch at the end of this section).
 Handled Hive queries using Spark SQL integrated with the Spark environment.
 Worked with the MongoDB NoSQL platform for development and database communication.
 Worked with MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
 Created Kafka applications that monitor consumer lag within Apache Kafka clusters.
 Troubleshot and resolved data quality issues and maintained a high level of accuracy in the data being reported.
 Developed Spark scripts using Scala shell commands as per requirements.
 Wrote MapReduce programs to export data from HDFS to the local file system and import data from local to HDFS.
 Created Hive external and internal tables to ingest data using the Java ETL framework.
 Managed and reviewed Hadoop log files.
 Processed ingested raw data using Apache Pig.
 Automated data loading and extraction; used UNIX shell scripting to generate reports.
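A minimal Scala sketch of the cached-RDD, in-memory computation pattern referenced above; the HDFS path and record layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddComputation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddComputation"))

    // Pipe-delimited records loaded from HDFS
    val lines = sc.textFile("hdfs:///data/transactions/*.txt")

    // Cache the parsed RDD so repeated computations stay in memory
    val parsed = lines.map(_.split('|'))
      .filter(_.length >= 3)
      .map(f => (f(1), f(2).toDouble))   // (accountId, amount)
      .cache()

    // In-memory aggregations over the cached RDD
    val totalsPerAccount = parsed.reduceByKey(_ + _)
    val grandTotal = parsed.values.sum()

    totalsPerAccount.take(10).foreach(println)
    println(s"Grand total: $grandTotal")
    sc.stop()
  }
}
```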
Environment:
Java, Scala, Spring, HDFS, MapReduce, Kafka, Hive, Spark, Pig, HBase, JSP, Struts, Ajax, Unix Shell
Scripting, Mockito, SVN, MongoDB, Scrum, Jira, SOAP, WSDL, XML, Unit Testing, Debugging,
Troubleshooting, Firebug, Putty
Client: Zensar, India (Jan 2014 – Jun 2015)
Role: Software Developer
Responsibilities:
 Involved in various phases of Agile Software Development Life Cycle (SDLC) of the application like
Requirement gathering, Design, Analysis and Code development.
 Used JBoss as the application server and Gradle for building and deploying the application.
 Used the Apache Cassandra database to design and manage the application's data.
 Managed build results in Jenkins and deployed using workflows.
 Implemented error checking/validation on the Java Server Pages using JavaScript.
 Developed critical components of the application, including Spring forms, Spring controllers, JSP views, and business and data logic components such as Hibernate entities, Spring-Hibernate DAOs and Spring models, following MVC architecture.
 Developed, modified and maintained hand-coded CSS and HTML that was W3C standards-compliant, accessible, semantic and cross-browser compatible.
 Integrated the front end with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
 Developed application service components and configured beans using Spring IoC; implemented the persistence layer and configured EhCache to load static tables into a secondary storage area.
 Participated in the product development life cycle via rapid prototyping with wireframes and mockups.
 Developed web services (SOAP/HTTP, SOAP/JMS, MDBs, SMTP) using SOA technologies such as SOAP, WSDL and UDDI.
 Utilized DOM and SAX parser technologies to implement an XML parsing framework.
 Developed JUnit unit test cases for DAO and Service Layer methods.
 Deployed and tested the web application on WebSphere application server.
Environment:
Java, JSP, JDBC, Spring IOC, Spring, Agile, Cassandra, JavaScript, HTML, CSS, JUnit, XML, JBOSS, SOAP,
WSDL, UDDI, wireframing, Jenkins, Gradle, WebSphere, Hibernate, IntelliJ
Client: CSC, Hyderabad (May 2012 – Dec 2013)
Role: Jr Programmer
Responsibilities:
 Designed and integrated the full-scale Struts/Hibernate/Spring/EJB persistence solutions with the
application architecture
 Responsible for architecture and implementation of new Stateless Session Beans (EJB) with annotation for
the entity manager lookup module
 Implemented Object/relational persistence (Hibernate) for the domain model
 Designed and implemented the Hibernate domain model for the services
 Implemented the Web services and associated business modules integration
 Developed and implemented the MVC Architectural Pattern using Struts Framework including JSP,
servlets and action classes.
 Developed the UI with Struts, jQuery plugins and AJAX functionality
 Implemented Struts action classes using the Struts controller component
 Developed SOAP web services to interact with other components; parsed and processed ANSI 835 files and generated claims-crossover XML files for the trading partners using Java and DB2
 Wrote programs to parse and transform the XML files using XSLT (see the sketch at the end of this section)
 Wrote secure FTP programs to send crossover files to trading partners
 Created reports for provider search using JSP
 Designed the XML schema to validate XML
 Refactored the Java threads (Multi-threading) to enhance the performance of the business process
 Wrote the PL/SQL Stored Procedures to handle the business logic related to DB
 Worked on creating Views, Indexes and Stored Procedures using AQT
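The XSLT work above was done in Java 1.6; the following is a Scala sketch of the same standard JAXP transformation pattern, with placeholder file names.

```scala
import java.io.File
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

object XsltTransform {
  def main(args: Array[String]): Unit = {
    // Compile the stylesheet once, then apply it to an input document
    val factory     = TransformerFactory.newInstance()
    val transformer = factory.newTransformer(new StreamSource(new File("claims-crossover.xsl")))

    transformer.transform(
      new StreamSource(new File("input-claims.xml")),
      new StreamResult(new File("crossover-output.xml")))
  }
}
```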
Environment:
Java 1.6, JSP 2.2, Java EE 1.5, Servlets 3.0, Struts 2.0 MVC Framework, Hibernate 3, Ant, JDBC, Web
Services, Axis, Eclipse, UNIX, WebLogic 10.3.2, Oracle 11g, Spring Framework 3.1, jQuery 1.4, EJB 3.0, JPA
2.0, JMS, Eclipse Helios 3.6, SVN, JAX-RPC