
Nagaraju Bachu

[email protected]
615-576-0066
https://www.linkedin.com/in/naga-raj-21a1b01ab/
Dallas, TX - 75252

Summary:
● Around 9 years of experience in Data Engineering, including designing algorithms, building models, and developing data mining, data acquisition, data preparation, data manipulation, feature engineering, machine learning, validation, visualization, and reporting solutions that scale across massive volumes of structured and unstructured data.
● Strong hands-on experience in Azure Databricks, ADLS, Spark, and Python.
● Extensively worked on Databricks to load data into Snowflake for data profiling.
● Experience with Apache Hadoop ecosystem components such as Spark.
● Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instances.
● Hands-on experience working with Amazon Web Services (AWS) using Elastic Map Reduce (EMR), Redshift,
and EC2 for data processing.
● Experienced with query optimization and performance tuning of SQL stored procedures, functions, SSIS
packages, SSRS reports, and so on.
● Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other services of the AWS family.
● Proficient with Shell, Python, PowerShell, and Groovy scripting, as well as JSON and YAML.
● Expertise in writing automated shell scripts in Linux/Unix environments using Bash.
● Executed Python scripts using AWS Lambda.
● Perform data engineering responsibilities using agile software engineering practices. Migrate Matillion pipelines
and Looker reports from Amazon Redshift.
● Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases like MongoDB, HBase, and Cassandra.
● Experience setting up GCP firewall rules to allow or deny traffic to and from VM instances based on specified configurations, and using GCP Cloud CDN to deliver content from GCP.
● Worked on GCP services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
● Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
● Experience in using Microsoft Azure SQL database, Azure ML and Azure Data Factory.
● Hands-on experience writing AWS templates to create VPCs, subnets, EC2 instances, etc.
● Worked with both Scala and Python; created frameworks for processing data pipelines through Spark.
● Experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
● Experienced in writing JSON/YAML scripts for CloudFormation.
● Experience with Git, Git Bash, and Bitbucket.
● Responsible for building scalable distributed data solutions using sh.
● Experienced in both Waterfall and Agile (Scrum) development methodologies.
● Strong database background, including Oracle and MS SQL Server.
● Good experience in data modeling using star and snowflake schemas, and well versed in UNIX shell wrappers and Oracle PL/SQL programming.
● Expertise in writing PySpark scripts for daily workloads based on business requirements (see the sketch after this list).
● Maintained current knowledge of emerging cloud computing, data engineering, and RESTful API development technologies, tools, and techniques, and evaluated and recommended new tools and technologies as needed.
● Extensive experience in writing SQL to validate the database systems and for backend database testing.
● Gained expertise across the entire CI/CD pipeline of an analytics project, from data ingestion and exploratory analysis to model development, visualization, and solution deployment.
● Created new Ansible YAML playbooks, roles, and Bash shell scripts for application deployments.
● Set up Jenkins server and build jobs to provide continuous automated builds based on polling the Git source
control system to support development needs using Jenkins, Gradle, Git, and Maven.
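
Illustrative sketch for the PySpark bullet above: a minimal daily-workload job of the kind described. The input path, column names, and aggregation logic are assumed placeholders, not details from an actual client engagement.

from pyspark.sql import SparkSession, functions as F

# Start a Spark session for the daily batch run
spark = SparkSession.builder.appName("daily_workload").getOrCreate()

# Read the previous day's raw CSV drop (path and schema are placeholders)
raw = spark.read.option("header", True).csv("s3://raw-bucket/sales/2023-01-01/")

# Basic cleansing plus a simple business aggregation
daily = (
    raw.withColumn("sale_amount", F.col("sale_amount").cast("double"))
       .filter(F.col("sale_amount") > 0)
       .groupBy("store_id")
       .agg(F.sum("sale_amount").alias("total_sales"))
)

# Write the curated output for downstream reporting (location is a placeholder)
daily.write.mode("overwrite").parquet("s3://curated-bucket/sales_daily/")

spark.stop()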

Technical Skills:
Technologies Used Python, Java, JavaScript, Spark, Linux/Bash, Kubernetes, Databricks
AWS Services AWS EMR, Glue, Glue Crawler, Athena, Redshift, EC2, S3, IAM, QuickSight, SNS, SQS, EventBridge, Lambda, CloudFormation
Databases Elasticsearch, Oracle, SQL, Postgres, Snowflake, DynamoDB
Azure Services Azure Data Lake, Azure Data Factory (ADF), Azure Blob Storage, Azure SQL Analytics, Azure network components (Virtual Network, Network Security Group, Gateway, Load Balancer, etc.), Virtual Machines, ExpressRoute, Traffic Manager, VPN, Load Balancing, Auto Scaling
Build technologies Docker, Jenkins
Version control GitHub
Methodologies Agile, Waterfall
Agile Tools Rally, Jira, Confluence
Visualization Power BI, Tableau

Education Details:
Bachelor’s Degree - Mahatma Gandhi Institute of Technology, Hyderabad, India 2014

Work Experience:
Client: CVS Health Group-TX
Role: Data Engineer Jan 2021- Present
Responsibilities:
 Designed and developed ETL integration patterns using Python on Spark. Participated in normalization/denormalization, normal form analysis, and database design methodology. Used data modeling tools such as MS Visio and Erwin for the logical and physical design of databases.
 Optimized PySpark jobs to run on secured clusters for faster data processing.
 Used Python for SQL/CRUD operations in the database and for file extraction, transformation, and generation.
 Developed Spark applications in Python (PySpark) in a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables.
 Designed and developed Apache NiFi jobs to bring files from transaction systems into the data lake raw zone.
 Developed a framework for converting existing PowerCenter mappings to PySpark, Python, and Spark jobs.
 Developed and implemented Apache Flink applications for real-time data processing, streaming analytics, and
batch processing.
 Designed and optimized Flink data pipelines to ensure efficient data ingestion, transformation, and output.
 Monitored and maintained Terraform infrastructure, ensuring resource optimization, cost efficiency, and
adherence to best practices for security and compliance.
 Stayed updated with Terraform releases and new features, evaluating and implementing improvements to
infrastructure provisioning workflows and practices.
 Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.
 Extensive experience in developing data-driven web applications using the Angular framework.
 Proficient in building interactive user interfaces and implementing responsive designs using Angular components,
directives, and services.
 Strong knowledge of TypeScript and JavaScript, enabling seamless integration of backend data processing with
Angular frontend.
 Designed Star and Snowflake Data Models for Enterprise Data Warehouse using ERWIN.
 Created Spark clusters and configured high concurrency clusters using Azure Databricks to speed up the
preparation of high-quality data.
 Responsible for ingesting data from various source systems (RDBMS, Flat files, Big Data) into Azure (Blob
Storage) using the framework model.
 Built Azure Web Job for Product Management teams to connect to different APIs and sources to extract the data
and load it into Azure Data Warehouse using Azure Web Job and Functions.
 Developed MapReduce programs and Pig scripts to aggregate daily eligible and qualified transaction details and store them in HDFS and Hive.
 Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
 Optimized MongoDB queries and indexes to enhance performance and ensure efficient data retrieval and
aggregation for analytical purposes.
 Implemented data security measures in MongoDB by enforcing access controls, encryption, and data masking
techniques to protect sensitive information.
 Performed data engineering responsibilities using agile software engineering practices. Migrated Matillion pipelines and Looker reports from Amazon Redshift.
 Converted HQL queries to Spark transformations using Spark RDDs with Python and Scala.
 Built ETL data pipelines on Hadoop/Teradata using Pig, Hive, and UDFs.
 Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
 Developed AWS CloudFormation scripts for hosting software.
 Developed Apache PIG scripts to process the HDFS data on Azure. Created Hive tables to store the processed
results in a tabular format.
 Designed a serverless architecture using DynamoDB and AWS Lambda, with Lambda code deployed from S3 buckets.
 Proficient in designing, developing, and maintaining microservices architecture to enable scalable and distributed
data processing using technologies such as Docker and Kubernetes.
 Conducted performance tuning and optimization of MongoDB databases and microservices, identifying and
resolving bottlenecks to enhance overall system efficiency.
 Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
 Developed and maintained Airflow DAGs (Directed Acyclic Graphs) for data processing and ETL (Extract, Transform, Load) workflows, ensuring that data is processed efficiently and accurately (see the DAG sketch after this job's environment list).
 Monitored Airflow workflows to ensure that they are running smoothly and troubleshoot any issues that arise,
ensuring that data is delivered on time and in the correct format.
 Deployed a Kubernetes cluster on AWS with a master/minion architecture and wrote YAML manifests to create resources such as pods, deployments, autoscaling, load balancers, labels, health checks, namespaces, and ConfigMaps.
 Created data ingestion framework in Snowflake from different file formats using Snowflake Stage and Snowflake
Data Pipe.
 Implemented Custom Azure Data Factory (ADF) pipeline Activities and SCOPE scripts.
 Set up Jenkins server and build jobs to provide continuous automated builds based on polling the Git source
control system to support development needs using Jenkins, Gradle, Git, and Maven.
Environment: Hadoop, Pig, Spark, Airflow, Spark SQL, Python, PySpark, Hive, HBase, ADF, Azure Databricks, Azure SQL, Scala, AWS, EC2, EBS, S3, VPC, Redshift, Oozie, Linux, Maven, Apache NiFi, Oracle, MySQL, Snowflake, HDFS, Jenkins, Unix shell scripting, CI/CD pipeline
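
Illustrative sketch for the Airflow bullet above: a minimal daily ETL DAG wiring extract, transform, and load tasks in sequence. The dag_id, task names, and callables are hypothetical placeholders, not the actual CVS pipelines.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull the day's records from the source system
    pass


def transform():
    # Placeholder: apply cleansing and business rules
    pass


def load():
    # Placeholder: write the curated data to the warehouse
    pass


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load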

Client: BCBS - Kansas City
Role: Data Engineer Jan 2020 - Dec 2020
Responsibilities:
 Successfully created a plan aligned with the technology roadmap to deliver efficiency through data tools.
 Provided feedback during design sessions for future services and application integrations onto the CBB environment.
 Completed a PySpark framework with all the runtime critical parameters for execution on the CBB environment.
 Successfully provided primary operational support for data science users, covering Jupyter Notebook, Python, Spark, and Hive, along with connection best practices for Teradata, Greenplum, and Oracle databases.
 Mounted Azure Data Lake Storage (authenticated using a service principal) for Databricks.
 Loaded data into Snowflake tables from an internal stage using SnowSQL (see the sketch after this job's environment list).
 Scheduled AWS lambda functions to trigger the AWS resources using the Event bridge.
 Implemented the integration runtime for Azure Data Factory with linked services and datasets, built pipelines and activities, and created and scheduled triggers (with the Python SDK for ADF).
 Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
 Completed user onboarding and governance processes for private cloud onboarding. Partnered with Global Technology Infrastructure teams to obtain storage and compute capacity based on the use cases.
 Developed and maintained data pipelines using both Java and Node.js technologies.
 Implemented ETL processes for data processing, transformation, and loading using Java and Node.js frameworks.
 Utilized Node.js for real-time data streaming and processing solutions.
 Provided primary integration support documents for onboarding of new clients to the CBB environment.
 Monitored and troubleshot Flink jobs to identify and resolve issues related to data consistency, data loss, or system failures.
 Worked with big data technologies such as Apache Kafka, Apache Hadoop, and Apache Spark to integrate Flink
applications into the existing data ecosystem.
 Skilled in handling data visualization and analytics by leveraging Angular libraries and frameworks.
 Designed and implemented data pipelines using Google BigQuery to automate the processing and storage of data.
 Optimized BigQuery queries for performance, scalability, and cost, ensuring that queries execute efficiently and cost-effectively.
 Designed and developed data-driven applications compliant with the Google Cloud Platform architecture and
infrastructure.
 Extensive experience using SQL Server Management Studio and SQL Server Business Intelligence solutions such as SSRS and SSIS.
 Used SSMS to manage SQL Server databases, including creating, modifying, and deleting database objects such
as tables, views, stored procedures, and indexes.
 Successfully Onboarded Jupyter notebook/Hub on CBB environment.
 Successfully tested and provided a distribution of various APIs and components that CBB provides and supports.
 Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of
data sources like Teradata and Oracle using Spark, Python, Hive, Kafka.
 Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, HBase database, and
Sqoop.
 Imported data from SQL Server into Hive and HDFS using Sqoop for both one-time and daily loads.
 Developed Apache PIG scripts to process the HDFS data on Azure. Created Hive tables to store the processed
results in a tabular format.
 Optimized Hive queries using best practices and the right parameters, along with technologies such as YARN, Python, and PySpark.
 Mentored and trained team members on Terraform usage and best practices, promoting knowledge sharing and
ensuring effective utilization of infrastructure-as-code principles.
 Designed and implemented ETL workflows using Informatica PowerCenter to extract, transform, and load data
from various sources into the data warehouse.
 Created, deployed, and maintained RESTful APIs for data exchange between systems.
 Used Python for SQL/CRUD operations in DB, file extraction/transformation/generation.
 Optimized the PySpark jobs to run on CBB for faster data processing and developed RESTful APIs for capturing
logs.
Environment: Azure, ADF, Azure Databricks, clusters, containers, Azure SQL, Spark, Java, Node.js, SSIS, SSRS, Hadoop, PySpark, Snowflake, BigQuery, Teradata, Hive, Pig, HDFS, Python, Kafka, RESTful APIs.
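
Illustrative sketch for the Snowflake bullet above: loading files into a table from its internal stage. The original work used the SnowSQL CLI; this sketch issues the equivalent PUT and COPY INTO statements through the Snowflake Python connector, with placeholder credentials, paths, and table names.

import snowflake.connector

# Placeholder connection details
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

cur = conn.cursor()
try:
    # Upload local files into the table's internal stage
    cur.execute("PUT file:///data/exports/orders_*.csv @%ORDERS AUTO_COMPRESS=TRUE")

    # Copy the staged files into the target table
    cur.execute("COPY INTO ORDERS FROM @%ORDERS FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
finally:
    cur.close()
    conn.close()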

Client: Honeywell – India Mar 2018-Aug 2019
Role: Data Engineer
Responsibilities:
 Designed data pipelines and optimized data for parallel processing.
 Integrated disparate project datasets using Spark SQL.
 Designed a data lake architecture as a centralized data hub to deliver data on demand to downstream applications.
 Using Apache Sqoop, extracted data from various database sources such as Oracle, Teradata, DB2, and Informix into HDFS.
 Working experience with EBS, S3, VPC, and CloudWatch for AWS instances.
 Automated cloud deployments using Puppet, Python (boto and Fabric), and AWS CloudFormation templates.
 Leveraged Terraform modules to promote reusability and standardization across different projects, reducing
deployment time and enhancing consistency in infrastructure setups.
 Implemented data integration pipelines and APIs to fetch and manipulate data from various sources using Angular's HttpClient module.
 Designed and implemented Data pipelines to move and transform data from various sources into Cloudera.
 Implemented real-time data streaming and processing solutions using Java frameworks.
 Managed and monitored data quality, ensuring accuracy and completeness of data.
 Utilized big data technologies like Hadoop, Spark, and Kafka for data processing and analysis.
 Developed end-to-end scalable distributed data pipelines that receive data from the Kafka distributed messaging system and persist it into HDFS with Apache Spark using Scala (see the streaming sketch after this job's environment list).
 ETL (SSIS) jobs were created to extract, clean, transform, and load data into target tables.
 Performed end-to-end Architecture & implementation assessment of various AWS services like Amazon
Redshift, and S3.
 Worked on data archival by transferring data across platforms, validating data while transferring, and
archiving data files for various DBMS by creating dynamic SSIS packages.
 Submitted Talend jobs for scheduling using the Talend scheduler.
 Created Talend development standards describing the general guidelines for Talend software applications.
 Performed advanced operations like text analytics and processing, using in-memory computing capabilities of
Spark using Scala.
Environment: Python, Spark, AWS, EC2, EBS, S3, VPC, HDFS, Redshift, CloudFormation, Java, SSIS, SSRS, SQL, Spark SQL, Sqoop, Apache Kafka, PySpark, Cassandra, Scala, Oozie, Eclipse, Qlik Sense, Cloudera.
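
Illustrative sketch for the Kafka/Spark pipeline bullet above. The original implementation was in Scala; this sketch uses PySpark Structured Streaming to keep all examples in Python. The broker address, topic, and HDFS paths are placeholders, and the spark-sql-kafka connector package is assumed to be available on the cluster.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

# Subscribe to a Kafka topic (broker and topic names are placeholders)
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "sensor-events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; cast key and value to strings before persisting
payload = events.selectExpr(
    "CAST(key AS STRING) AS event_key",
    "CAST(value AS STRING) AS event_body",
    "timestamp",
)

# Persist the stream into HDFS as Parquet, with a checkpoint for fault tolerance
query = (
    payload.writeStream.format("parquet")
    .option("path", "hdfs:///data/raw/sensor_events")
    .option("checkpointLocation", "hdfs:///checkpoints/sensor_events")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()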

Client: Novartis – India Jan 2017 – Feb 2018
Role: Data Engineer
Responsibilities:
 Identified process data inefficiencies, implemented solutions, and tracked progress continuously to improve data quality in Python using NumPy, pandas, and matplotlib (see the sketch after this job's environment list).
 Provided Tableau reports/dashboards coupled with suggested actions to leaders whose teams provide master
data inputs, to reduce data quality issues, non-value record maintenance, and process deviation.
 Developed, maintained, and troubleshot complex scripts and reports using SQL, Microsoft Excel, and other analytics tools for the healthcare industry.
 Developed technical and functional specifications/metrics based on business requirements and analysis of
relevant workflows.
 Used BTEQ for SQL scripts and batch scripts, and created batch programs using shell scripts.
 Understood and communicated data integration specifications and plans for implementation with
customer/vendors for those interfaces that deviated from standard in the healthcare industry.
 Identified issues in data quality that would impact reporting (e.g., duplicate records, missing data).
 Developed and implemented a periodic review, inclusive of robust reporting, of all master data to ensure data
is current and accurate across business functions, conforms to business rules, and meets data quality standards.
 Assisted in the research of data mining, analytics, and data quality assurance tools.
 Worked with cross-functional teams to clarify business processes and data needs on process-related IT projects, as well as supporting and executing end-to-end UAT.
 Responded to ad hoc data analysis requests and provided technical assistance.
 Complied with all data best practices, rules of engagement and data protocols.
Environment: SQL, MapReduce, Python, Kafka, Flume, Linux, Unix, Azure Virtual Machines, Azure Blob Storage, Azure SQL, Tableau.
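
Illustrative sketch for the data quality bullet above: a minimal pandas/NumPy pass that flags duplicate records and missing values in a master data extract. The file name, key column, and output format are placeholders.

import numpy as np
import pandas as pd

# Load a master data extract (file and column names are placeholders)
df = pd.read_csv("master_data_extract.csv")

# Duplicate records on the business key
dupes = df[df.duplicated(subset=["material_id"], keep=False)]

# Missing values per column, as a percentage
missing_pct = df.isna().mean().mul(100).round(2)

# A single completeness figure that can be trended over time
completeness = np.round(100 - missing_pct.mean(), 2)

print(f"Duplicate rows: {len(dupes)}")
print("Missing % by column (worst 10):")
print(missing_pct.sort_values(ascending=False).head(10))
print(f"Overall completeness: {completeness}%")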

Client: Dhruv soft Services Pvt Ltd – Hyderabad, India Feb 2014 - Dec 2016
Python Developer
Responsibilities:
 Extensively worked in SproutCore, managing the client side, with the backend in Python.
 Expertise in Python scripting.
 Designed the database architecture of NorthStar.
 Migrated all database objects from SQL to Oracle.
 Worked on the Oracle database to analyze data.
 Implemented various performance techniques (partitioning, bucketing) in Oracle to get better performance.
 Participated in project initiatives (PI planning) to plan and assess the technical work.
 Used Jira to keep track of sprint stories, tasks, and defects.
 Supported many production release activities and actively interacted with business clients to resolve production issues quickly.
 Performed data profiling and system analysis (see the sketch after this job's environment list).
 Utilized agile software development practices; coding, data, and testing standards; secure coding practices; code reviews; source code management; continuous delivery; and software architecture.
Environment: Python, Oracle, SQL, Jupyter Notebook, SproutCore, UNIX, Jira, S3 buckets, SQL Server, MySQL, Git.
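
Illustrative sketch for the data profiling bullet above: a minimal profiling pass over an Oracle table using cx_Oracle and pandas, reporting data types, null counts, and distinct counts per column. The connection details and table name are placeholders, not details from the original project.

import cx_Oracle
import pandas as pd

# Placeholder connection details for the Oracle database
conn = cx_Oracle.connect(user="app_user", password="***", dsn="dbhost:1521/ORCL")

# Pull a bounded sample of the table to profile (table name is a placeholder)
cur = conn.cursor()
cur.execute("SELECT * FROM customers WHERE ROWNUM <= 10000")
columns = [d[0] for d in cur.description]
sample = pd.DataFrame(cur.fetchall(), columns=columns)

# Per-column profile: data type, null count, and distinct count
profile = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "nulls": sample.isna().sum(),
    "distinct": sample.nunique(),
})

print(profile)

cur.close()
conn.close()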
