Resume Data Engineer
Sr. Data Engineer
http://linkedin.com/in/nagasrihk
[email protected] +1 720-806-9236
Professional Summary:
I am an accomplished Data Engineer with over 10 years of IT experience, specializing in the design,
development, and implementation of large-scale data engineering projects. My professional expertise
encompasses cloud data platforms (AWS, Azure, and Google Cloud), the big data ecosystem, ETL development,
and data warehousing.
I bring a comprehensive understanding of modern data engineering practices and technologies, combined with
a strong ability to translate complex data into valuable business insights. My hands-on approach ensures
alignment with organizational goals, enhances efficiency, and delivers high-impact solutions.
Technical Skills:
Cloud Computing Platforms: AWS (Redshift, RDS, S3, EC2, Glue, Lambda, Step Functions, CloudWatch, SNS, DynamoDB, SQS, EMR), Azure (Data Lake, Data Factory, Stream Analytics, SQL DW, HDInsight/Databricks), Google Cloud Platform (BigQuery, Cloud Dataproc, Google Cloud Storage, Composer).
Big Data Ecosystem: HDFS, MapReduce, YARN/MRv2, Pig, Hive, HBase, Sqoop, Kafka, Flume, Oozie, Avro, Spark (Spark Core, Spark SQL, Spark MLlib, Spark GraphX, Spark Streaming), Cassandra, ZooKeeper.
Database Systems: MongoDB, Cassandra, MySQL, Oracle, MS SQL Server, Azure SQL, NoSQL databases.
Database Query Languages: SQL (MySQL, PostgreSQL, Redshift, SQL Server, and Oracle dialects).
Data Warehousing Solutions: Snowflake schemas, data marts, OLAP, dimensional data modelling with the Ralph Kimball methodology (star schema and snowflake modelling for fact and dimension tables), Azure Analysis Services.
Software Development Methodologies: Test-Driven Development (TDD), Behaviour-Driven Development (BDD), Acceptance Test-Driven Development (ATDD).
Deployment (CI/CD)
Software Development Life Cycle (SDLC): Database architecture, logical and physical modelling, data warehouse/ETL development using MS SQL Server and Oracle, ETL solutions/analytics applications development.
Business Intelligence Solutions: MS SQL Server Data Tools, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS).
Other Tools/Technologies: Spring Boot, Solr, AWS ALB, ECS, Informatica, MapR.
Certifications:
Microsoft Certified Azure Data Engineer
AWS Certified Solutions Architect
Professional Experience:
Responsibilities:
Developed and maintained complex SQL stored procedures to optimize data retrieval, reducing query
execution time by 30%.
Designed and implemented ETL workflows using SSIS, facilitating the integration of data from multiple
sources into a centralized data warehouse.
Utilized Talend for robust ETL pipeline design in complex data-intensive environments, and maintained
financial data ETL pipelines for budgeting and forecasting.
Automation & Scripting:
Created and maintained Python scripts for automation of routine tasks, data extraction, and
integration with various APIs and web services.
Implemented Python-based solutions for real-time monitoring, alerting, and logging, enhancing system
performance and reliability.
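Illustrative sketch of the kind of automation script described above, assuming a hypothetical paginated REST endpoint; the URL, token, and field names are placeholders rather than details from any actual project.

    import csv
    import requests

    API_URL = "https://api.example.com/v1/records"   # hypothetical endpoint
    API_TOKEN = "set-via-environment-or-secrets"     # never hard-code real credentials

    def fetch_records(url, token):
        """Page through the API and return all records as a list of dicts."""
        records, page = [], 1
        while True:
            resp = requests.get(
                url,
                params={"page": page},
                headers={"Authorization": f"Bearer {token}"},
                timeout=30,
            )
            resp.raise_for_status()
            batch = resp.json().get("results", [])
            if not batch:
                break
            records.extend(batch)
            page += 1
        return records

    def write_csv(records, path):
        """Write the records to CSV, one column per key of the first record."""
        if not records:
            return
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=records[0].keys())
            writer.writeheader()
            writer.writerows(records)

    if __name__ == "__main__":
        write_csv(fetch_records(API_URL, API_TOKEN), "records_extract.csv")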
AWS Cloud & Serverless Architecture:
Designed CloudFormation templates for deploying web applications and databases, and optimized
AWS service performance.
Implemented serverless architecture using API Gateway, Lambda, DynamoDB; deployed AWS Lambda
code from Amazon S3 buckets.
Developed ETL processes in AWS Glue to migrate campaign data into Redshift, and automated dashboards
with Terraform.
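A minimal sketch of a Glue job in the spirit of the campaign-data ETL above; the catalog database, table, Redshift connection, and S3 bucket names are placeholders, not the original project's.

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Source: campaign data registered in the Glue Data Catalog (placeholder names).
    campaigns = glue_context.create_dynamic_frame.from_catalog(
        database="marketing_raw", table_name="campaigns")

    # Light transformation: rename and cast columns before loading.
    mapped = ApplyMapping.apply(
        frame=campaigns,
        mappings=[
            ("campaign_id", "string", "campaign_id", "string"),
            ("spend", "double", "spend_usd", "double"),
            ("event_ts", "string", "event_ts", "timestamp"),
        ])

    # Sink: Redshift through a catalog connection; Glue stages the data in S3 first.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift-connection",
        connection_options={"dbtable": "analytics.campaigns", "database": "dw"},
        redshift_tmp_dir="s3://example-bucket/glue-temp/")

    job.commit()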
Business Intelligence & Reporting:
Utilized tools like Power BI for data visualization and reporting.
Conducted data analysis and compliance reviews to secure sensitive customer data for risk, AML, and
marketing teams.
Big Data & Hadoop Ecosystem:
Worked with Hortonworks Apache Falcon for data management and utilized AWS EMR for map-reduce
jobs.
Developed Hive queries for structured and semi-structured data transformation and used the ELK stack
(Elasticsearch, Logstash, Kibana) for log management.
Other Responsibilities:
Implemented data interfaces using REST API and processed data using MapReduce 2.0, stored in HDFS
(Hortonworks).
Performed data extraction and aggregation within AWS Glue using PySpark and tested jobs locally
using Jenkins.
Played a key role in implementing and maintaining data pipelines using Actimize on Cloud, ensuring
efficient data ingestion, transformation, and loading (ETL) processes.
Environment: Python, PySpark, AWS CloudFormation, AWS Lambda, AWS Glue, AWS Redshift, Datadog,
Terraform, AWS API Gateway, DynamoDB, AWS S3, Tableau, Spark, Scala, Spark SQL, Kafka, Snowflake,
Golang, MapReduce 2.0, HDFS (Hortonworks), ELK Stack (Elasticsearch, Logstash, Kibana), AWS EMR, Amazon
EC2, Hive, Talend, Linux, Jenkins, Git.
Responsibilities:
Development & Programming:
Utilized Golang to build RESTful APIs and developed MapReduce programs in Java for raw data parsing.
Employed Python to automate routine tasks, extract data, and interact with various APIs and web
services.
Utilized Git in conjunction with Docker and Kubernetes for version control, testing, and deployment within
the CI/CD pipeline.
Data Pipeline & ETL Management:
Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines, and designed
DAGs for ETL pipelines (see the DAG sketch below).
Utilized Azure Data Factory, T-SQL, Spark SQL, and U-SQL for data extraction, transformation, loading,
and integration across various Azure services.
Used Python for designing and implementing configurable data delivery pipelines for scheduled
updates to customer-facing data stores.
Led ETL processes using SSIS, extracting data from varied sources, transforming as per business logic,
and loading into data warehouses.
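A minimal Airflow 2 sketch of the DAG pattern described above; the DAG name, schedule, and task bodies are illustrative placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        """Pull source data (placeholder for the real extraction logic)."""

    def transform(**context):
        """Apply business transformations (placeholder)."""

    def load(**context):
        """Load the transformed data into the warehouse (placeholder)."""

    with DAG(
        dag_id="daily_sales_etl",          # illustrative DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Simple linear dependency: extract, then transform, then load.
        extract_task >> transform_task >> load_task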
Cloud & Infrastructure Management:
Designed and built Azure Cloud environment infrastructure, integrating Azure SQL Database, Azure
Analysis Services, and Azure Data Factory.
Implemented a Continuous Delivery pipeline with Docker and GitHub, and managed Azure Cloud relational
servers and databases.
Business Intelligence & Visualization:
Managed the development of Power BI reports and dashboards, and developed Tableau reports
integrated with Hive for data-driven decision-making.
Created Databricks job workflows to extract data using PySpark and Python, and worked with BigQuery
and Spark DataFrames.
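A minimal PySpark sketch of the kind of Databricks job workflow described above; the storage path, columns, and table names are placeholders.

    from pyspark.sql import SparkSession, functions as F

    # On Databricks a SparkSession is already provided; getOrCreate reuses it.
    spark = SparkSession.builder.appName("daily_events_rollup").getOrCreate()

    # Read raw events from cloud storage (placeholder mount path).
    events = spark.read.parquet("dbfs:/mnt/raw/events/")

    # Aggregate to a daily, per-channel summary.
    daily = (events
             .withColumn("event_date", F.to_date("event_ts"))
             .groupBy("event_date", "channel")
             .agg(F.count("*").alias("event_count"),
                  F.sum("revenue").alias("revenue")))

    # Persist as a reporting table (placeholder database and table name).
    daily.write.mode("overwrite").saveAsTable("reporting.daily_events")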
Other Responsibilities:
Worked on Confluence and Jira for project management, and used Jenkins for continuous integration.
Utilized Linux for system administration and performed troubleshooting in a Linux environment.
Designed and implemented Salesforce data models tailored to specific business requirements,
ensuring data accuracy and consistency.
Environment: Python, PySpark, Azure Data Factory, T-SQL, Spark SQL, Azure Data Lake, Azure Storage, Azure
SQL, Azure Databricks, Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, Trillium
Quality, Azure DevOps, PowerShell, MongoDB, MS SQL Server, Golang, RESTful APIs, Power BI, Confluence,
Jira, Flume, HBase, Pig Latin, HiveQL, Jenkins, Tableau, MapReduce, Apache Tez, Docker, GitHub, Databricks,
BigQuery, Git, Kubernetes.
Responsibilities:
Developed automated scripts in Python for data cleaning, filtering, and analysis with tools such as SQL,
Hive, and Pig.
Hadoop Cluster Management:
Managed Hadoop clusters, ranging from 4-8 nodes during pre-production to 24 nodes during
production, and transitioned Hadoop jobs to HBase.
API Development:
Built APIs to allow customer service representatives access to data, and developed RESTful APIs using
Golang for data processing functionalities.
Data Warehousing:
Improved Business Data Warehouse (BDW) performance, established self-service reporting in Cognos,
and developed database management systems for data access.
Utilized views to simplify data access for reporting purposes, reducing the need for redundant query
creation.
Image Processing:
Processed image data through Hadoop using MapReduce and stored the results in HDFS.
Data Visualization:
Designed and documented dashboards with Tableau, including charts, summaries, graphs, and
geographical maps, and utilized the Show Me functionality for various visualizations.
Golang-based Pipelines:
Developed and maintained data processing pipelines for handling large volumes of data, including data
ingestion, transformation, and loading.
Statistical Analysis & Data Processing:
Performed analysis using Python, R, and Excel, including extensive work with Excel VBA Macros and
Microsoft Access Forms.
Azure Databricks & ETL Workflows:
Designed data pipelines using Azure Databricks for real-time processing and developed ETL workflows
to transform and load data into target systems.
Environment: Python, Hadoop, API development, HBase, Cassandra, Oracle, JSON, Azure SQL DW,
HDInsight/Databricks, Data Lakes, Stackdriver Monitoring, Jenkins, Hive, Java, MapReduce, HDFS, Talend,
Tableau, Waterfall methodology, Git, Golang, RESTful APIs, R, SQL, SAS, Azure Databricks, ETL
workflows.
Responsibilities:
Python Development & Testing:
Built data validation programs using Python and Apache Beam, executed in cloud Dataflow, and
integrated BigQuery tables. Utilized PyTest for unit and integration testing to ensure the proper
functioning of data pipelines and applications.
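A minimal sketch of the kind of Beam validation pipeline described above; the project, bucket, table names, columns, and the validation rule itself are assumptions for illustration.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def is_valid(row):
        """Example rule: a record must carry a non-negative amount."""
        return row.get("amount") is not None and row["amount"] >= 0

    def run():
        options = PipelineOptions(
            runner="DataflowRunner",            # use DirectRunner to test locally
            project="example-project",
            region="us-central1",
            temp_location="gs://example-bucket/tmp/",
        )
        with beam.Pipeline(options=options) as p:
            (p
             | "ReadSource" >> beam.io.ReadFromBigQuery(
                   query="SELECT * FROM `example-project.raw.transactions`",
                   use_standard_sql=True)
             | "KeepValidRows" >> beam.Filter(is_valid)
             | "ProjectFields" >> beam.Map(lambda r: {
                   "transaction_id": r.get("transaction_id"),
                   "amount": r["amount"]})
             | "WriteResults" >> beam.io.WriteToBigQuery(
                   "example-project:curated.valid_transactions",
                   schema="transaction_id:STRING,amount:FLOAT",
                   write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
                   create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))

    if __name__ == "__main__":
        run()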
SQL Operations & Optimization:
Utilized SQL across various dialects (PostgreSQL, Redshift, SQL Server, and Oracle) for advanced data
manipulation, reporting, and performance optimization. Successfully migrated data between RDBMS,
NoSQL databases, and HDFS using Sqoop.
Big Data Analytics & Data Science:
Applied Big Data analytics using Azure Databricks, Hive, Hadoop, Python, PySpark, Spark SQL, and
MapReduce on petabytes of data. Implemented advanced data analysis techniques, including
regression analysis and data cleaning, and used visualization tools such as Excel VLOOKUP, histograms,
and the TOAD client to provide insights for investors.
Hadoop Ecosystem Design & Development:
Leveraged the Hadoop ecosystem, utilizing technologies such as MapReduce, Spark, Hive, Pig, Sqoop,
HBase, Oozie, and Impala. Designed and implemented Oozie pipelines to perform tasks like data
extraction from Teradata and SQL, loading into Hive, and executing business-required aggregations.
Optimization & Parallel Processing:
Worked with Apache Hadoop, CDH, and MapR distributions, optimizing data latency by leveraging parallel
processing wherever possible.
Data Visualization & Machine Learning:
Integrated Power BI with Python to enhance visualization and employed Python for implementing
machine learning algorithms on different data formats like JSON and XML.
ETL & Data Processing Automation:
Created automated ETL processes, including Spark jars for business analytics, and developed JSON
scripts for SQL Activity-based data processing. Converted Pig scripts into JAR files and parameterized
them within Oozie for HDFS data handling.
Version Control & Development Workflow:
Utilized Git for version control, including pulling, adding, committing, and pushing code, paired with
screwdriver.yaml for build and release management. Employed Git tagging for efficient and traceable
release management.
Cloud & Distributed Computing:
Worked on cloud-based data processing and deployed outcomes using Spark and Scala code in the
Hadoop cluster.
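A minimal sketch of the kind of Hive-backed aggregation deployed on the cluster, written in PySpark rather than Scala for consistency with the rest of this resume; the table and column names are placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("hive_sales_aggregation")
             .enableHiveSupport()          # read and write the cluster's Hive metastore
             .getOrCreate())

    # Source: a Hive table populated by the ingestion pipeline (placeholder name).
    orders = spark.table("staging.orders")

    # Business aggregation: customers and revenue per region and day.
    summary = (orders
               .groupBy("region", "order_date")
               .agg(F.countDistinct("customer_id").alias("customers"),
                    F.sum("order_total").alias("total_sales")))

    # Publish the result back to Hive for downstream reporting (placeholder name).
    summary.write.mode("overwrite").saveAsTable("mart.daily_sales_by_region")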
Responsibilities:
● Built and maintained server-side logic for web applications, often using frameworks like Django and Flask.
● Worked with a team of developers to build data-driven applications to provide analytical insights and decision
support tools for executives. Used Python libraries like pandas, NumPy and SciPy.
● Developed advanced data access routines using Python and libraries such as SQLAlchemy to extract data from
source systems, replacing tasks previously done using VBA, SQL Server SSIS, SAS and SQL.
● Utilized Python libraries like Dash and Plotly, in conjunction with Tableau and R, to develop data visualizations
and dashboards for large datasets.
● Identified and implemented process improvements using Python to automate repetitive tasks and improve
workflow efficiencies.
● Developed and executed sophisticated data integration strategies using Python scripts, harnessing the power
of industry-leading libraries such as Apache Beam and pandas.
● Wrote and executed tests for the code developed, ensuring that it functions as expected and is robust against
possible edge cases. This involved unit and integration tests using Python's testing libraries unittest and
pytest (see the example test below).
● Used version control systems such as Git to manage code, track changes and collaborate with other
developers.
● Wrote clean, maintainable, and efficient Python code for developing applications, and debugged code to
identify and fix issues as they arose.
Environment: Python, Django, Flask, pandas, NumPy, SciPy, Tableau, pytest, Git.
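A minimal pytest sketch of the testing approach described above; the cleaning helper and its rules are hypothetical.

    import pandas as pd
    import pytest

    def clean_orders(df):
        """Hypothetical helper: drop rows without an order id, default totals to 0."""
        out = df.dropna(subset=["order_id"]).copy()
        out["total"] = out["total"].fillna(0).astype(float)
        return out

    def test_rows_without_order_id_are_dropped():
        df = pd.DataFrame({"order_id": ["A1", None], "total": [10.0, 5.0]})
        assert list(clean_orders(df)["order_id"]) == ["A1"]

    def test_missing_totals_default_to_zero():
        df = pd.DataFrame({"order_id": ["A1"], "total": [None]})
        assert clean_orders(df)["total"].iloc[0] == pytest.approx(0.0)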