
Vishnu Vardhan

[email protected] | (972)-945-5081
LinkedIn: linkedin.com/in/rahul-g-sarsai

Professional Summary:
 Around 9 years of professional experience in the IT industry, specializing in Data Warehousing and Decision Support Systems, with extensive experience implementing full-lifecycle Data Engineering projects and Hadoop/Big Data technologies for storing, querying, processing, and analyzing data.
 Software development on cloud computing platforms such as Amazon Web Services (AWS) and Microsoft Azure.
 Hands-on experience with AWS services such as S3 for storage, EMR for running Spark jobs and Hive queries, Glue for ETL pipelines, and Athena for creating external tables.
 Skilled in data cleansing and preprocessing using Python, creating data workflows with SQL queries using Alteryx, and preparing Tableau Data Extracts (TDE).
 Designed and implemented solutions on Azure by building pipelines with Azure Data Factory (ADF), linked services, datasets, Azure Blob Storage, Azure Synapse, and Azure Databricks.
 Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
 Strong experience in migrating other databases to Snowflake.
 In-depth knowledge of Snowflake Database, Schema and Table structures.
 Used Azure Synapse to manage processing workloads and served data for BI and prediction needs.
 Hands-on experience with dimensional modeling using star and snowflake schemas.
 Strong knowledge of various data warehousing methodologies and data modeling concepts.
 Experience in installing, configuring, and using Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase,
Oozie, Hive, Sqoop, Zookeeper and Flume.
 Experience in analyzing data using Spark, Python, SQL.
 Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems like Teradata, Oracle,
SQL Server and vice-versa.
 Developed Apache Spark jobs using Python in a test environment for faster data processing and used Spark SQL for querying; see the PySpark sketch at the end of this summary.
 Experience implementing CI/CD pipelines using Azure DevOps in both cloud and on-premises environments with Git, Docker, and Maven, along with Jenkins plugins.
 Developed solutions using Alteryx to provide dashboard data in formats including JSON, CSV, and Excel.
 Managed application code using Azure Git with the required security standards for .NET and Java applications.
 Experienced in Spark Core, Spark RDD, Pair RDD, Spark Deployment Architectures.
 Experienced with performing real-time analytics on NoSQL databases like HBase and Cassandra.
 Worked on AWS EC2, EMR, and S3 to create clusters and manage data using S3.
 Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
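
Illustrative PySpark / Spark SQL sketch of the kind of analysis summarized above (an assumption-based example, not taken from any project; the input path, view name, and columns such as order_amount and region are hypothetical):

from pyspark.sql import SparkSession

# Start (or reuse) a Spark session
spark = SparkSession.builder.appName("spark_sql_example").getOrCreate()

# Load raw data into a DataFrame (path and schema inference are illustrative)
orders = spark.read.csv("s3://example-bucket/raw/orders.csv", header=True, inferSchema=True)

# Register the DataFrame as a temporary view so it can be queried with Spark SQL
orders.createOrReplaceTempView("orders")

# Aggregate with Spark SQL
totals = spark.sql(
    "SELECT region, SUM(order_amount) AS total_amount FROM orders GROUP BY region"
)
totals.show()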

Technical Skills:

Big Data: Apache Spark, Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, YARN, Cassandra, Phoenix, Airflow, Qlik.
Frameworks: Hibernate, Spring, Cloudera CDH, Hortonworks HDP, MapR.
Programming & Scripting Languages: Java, Python, R, C, C++, HTML, JavaScript, XML, Git, PySpark, Scala.
Databases: Oracle 10g/11g, PostgreSQL, DB2, MySQL, Redshift, MS SQL Server, T-SQL.
NoSQL Databases: HBase, Cassandra, MongoDB.
IDEs: Eclipse, NetBeans, Maven, STS (Spring Tool Suite), Jupyter Notebook.
ETL Tools: Pentaho, Informatica, Talend, SSIS.
Reporting Tools: Tableau, Power BI, SQL Server Reporting Services (SSRS).
Operating Systems: Windows, UNIX, Linux, Sun Solaris.
Testing Tools: JUnit, MRUnit, SoapUI.
AWS: EMR, Glue, Athena, DynamoDB, Redshift, RDS, Data Pipeline, Lake Formation, S3, SQS, SNS, IAM, CloudFormation, EC2, ELB/CLB.
Azure: Data Lake, Data Factory, SQL Data Warehouse, Data Lake Analytics, Databricks, and other Azure services.

Technical Experience:

Client: Toyota Motors, Tx


Data Engineer March 2021 – Present
Responsibilities:
 Involved in loading and transforming structured, semi-structured, and unstructured data sets and analyzing them by running Hive queries and Spark SQL.
 Involved in migrating SQL databases to Azure Data Lake, Data Lake Analytics, Databricks, and Azure SQL Data Warehouse.
 Responsible for the design, implementation, and architecture of very large-scale data intelligence solutions around Snowflake
Data Warehouse.
 Hands-on experience with Azure cloud services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitor, Key Vault, and Azure Data Lake.
 Creating Spark clusters and configuring high concurrency clusters using Azure Databricks to speed up the preparation of high-
quality data.
 Used different databases, such as SQL, Netezza, Oracle, and SAP, to extract, transform, and load data.
 Controlled and granted database access and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
 Migrated data from an on-premises SQL database to Azure Synapse Analytics using Azure Data Factory and designed an optimized database architecture.
 Proficient in using Azure Data Factory to perform incremental loads from Azure SQL DB to Azure Synapse.
 Involved in the development of real-time streaming applications using PySpark, Apache Flink, and Hadoop clusters.
 Ingested data into one or more Azure services, processed it in Azure Databricks, and wrote the output as text and Parquet files.
 Created RDDs and DataFrames for the required input data and performed the data transformations using PySpark; an illustrative sketch follows at the end of this section.
 Developed a common Flink module for serializing and deserializing Avro data by applying a schema.
 Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
 Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
 Involved in supporting a cloud-based data warehouse environment such as Snowflake.
 Involved in requirement analysis, design, coding and implementation.
 Used linked services to connect to SQL Server and Teradata and bring the data into ADLS and Blob storage.
 Created and loaded dimensional tables and views in databases linked to the SAP systems, making it easy for Informatica to read and write data to the SAP databases via relational databases.
 Coordinated with the business on anomalies/outliers observed on a day-to-day basis and identified solutions to fix them.
 Extensively worked on CI/CD pipeline for code deployment by engaging different tools (Git, Jenkins) in the process right from
developer code check-in to Production deployment. Engaged in version control using Git, facilitating collaborative development
and ensuring codebase integrity.
 Operated within a Linux environment, executing scripts, managing data processes, and troubleshooting technical issues.
 Practiced Agile methodologies and utilized JIRA to manage and track progress, ensuring alignment with project goals and
timelines.
 In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames.
 Demonstrated a full understanding of the Fact/Dimension data warehouse design model, including star and snowflake design
methods.
 Extensively worked on the user interface for a few modules using JSPs, JavaScript, and Ajax.
 Experience managing Azure Data Lake Storage (ADLS) and Azure Data Lake Analytics, and an understanding of how to integrate them with other Azure services.
 Knowledge of U-SQL and how it can be used for data transformation as part of a cloud data integration strategy.
 Worked on Informatica Power Center tools- Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
 Developed Spark applications using Python (PySpark) to transform data according to business rules.
 Involved in creating Hive scripts for performing ad hoc data analysis required by the business teams.
 Used GitHub for branching, tagging, and merging, Confluence for documentation.
Environment: Azure Data Factory (ADF), Azure Data Bricks, Alteryx, Azure Data Lake Storage (ADLS), Snowflake, Blob storage,
Java, Delta Lake, Druid, Python, SSIS, Flink 1.14, PySpark.
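
Illustrative sketch of the PySpark DataFrame work referenced in this section (an assumption-based example, not project code; the storage account, container, and column names are hypothetical placeholders, and it assumes the cluster already has access to the storage account):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls_transform_example").getOrCreate()

# Read ingested CSV data from ADLS Gen2 (abfss path is a placeholder)
raw = spark.read.csv(
    "abfss://raw@examplestorage.dfs.core.windows.net/sales/",
    header=True,
    inferSchema=True,
)

# Business-rule style transformations: filter bad rows, derive a load date, drop duplicates
curated = (
    raw.filter(F.col("amount") > 0)
       .withColumn("load_date", F.current_date())
       .dropDuplicates(["order_id"])
)

# Write the curated output back to ADLS as Parquet
curated.write.mode("overwrite").parquet(
    "abfss://curated@examplestorage.dfs.core.windows.net/sales/"
)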

Client: Fidelity Investments, Tx


Data Engineer February 2019 – March 2021
Responsibilities:
 Design, develop, and implement a robust ETL pipeline using Informatica, handling large volumes of data from various sources
and transforming it for storage and analysis.
 Orchestrated seamless data movement between on-premises systems and AWS cloud services, such as S3, Redshift, EMR, Glue,
and RDS, ensuring efficient and scalable data processing.
 Leveraged Athena to create optimized, serverless querying mechanisms for ad-hoc analysis, enhancing data accessibility and
decision-making.
 Engineered Lambda functions to automate data ingestion, processing, and loading tasks, enhancing system reliability and reducing manual intervention; an illustrative sketch follows at the end of this section.
 Architected and optimized Cloudera distribution on Hadoop clusters, implementing Spark, Hive, and PySpark jobs for high-
performance data transformations and analytics.
 Collaborated closely with stakeholders to define data requirements and used Pig, Presto, and Sqoop to extract, transform, and
load data from various sources into the data lake.
 Implemented real-time data streaming using Apache Kafka, ensuring the timely delivery of events to downstream applications.
 Managed and optimized complex data pipelines using MySQL and DynamoDB to ensure efficient and reliable data storage,
retrieval, and processing.
 Containerized ETL workflows using Docker and managed container orchestration with Kubernetes for improved scalability and
resource management.
 Designed and developed REST APIs to expose data services, enabling seamless integration with other systems and enhancing
data accessibility.
 Spearheaded the adoption of Git-based version control through Bitbucket, enabling efficient collaboration, code management,
and continuous integration.
 Utilized AWS CloudFormation templates to automate the provisioning and management of cloud resources, improving
infrastructure as code practices.
 Integrated the ELK stack (Elasticsearch, Logstash, Kibana) to centralize and visualize logs, enabling efficient monitoring,
troubleshooting, and data-driven insights.
 Worked within an Agile environment, utilizing JIRA to manage tasks, track progress, and facilitate sprint planning, ensuring
efficient project execution.
 Led a cross-functional team in applying Scrum methodologies, driving iterative development, and delivering high-quality data
solutions within tight timelines.
 Created interactive data visualizations and dashboards using Tableau, enabling stakeholders to gain actionable insights from the
processed data.
 Collaborated with data scientists to operationalize machine learning models, ensuring seamless integration of predictive analytics
into the data pipeline.
 Designed and implemented data governance and security measures, ensuring compliance with regulatory requirements and
safeguarding sensitive data.
 Conducted performance tuning and optimization of data processing workflows, significantly reducing processing time and
resource utilization.
 Maintained comprehensive system architecture, design, and process documentation, facilitating knowledge transfer and ensuring
long-term system maintainability.

Environment: Python, ETL, Informatica, AWS, Cloudera, Hadoop, Spark, Hive, PySpark, Pig, Presto, Sqoop, Flume, Flink,
Apache Kafka, Docker, Kubernetes, REST API, JSON, XML, Bitbucket (Git), AWS CloudFormation, ELK,
JIRA, Agile, Scrum, Tableau.
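
Illustrative sketch of an S3-triggered Lambda ingestion handler of the kind referenced in this section (an assumption-based example, not project code; bucket names and prefixes are hypothetical):

import urllib.parse
import boto3

s3 = boto3.client("s3")
TARGET_BUCKET = "example-processed-bucket"  # hypothetical destination bucket

def lambda_handler(event, context):
    """Copy each newly arrived S3 object into a processed prefix (S3-trigger event assumed)."""
    for record in event.get("Records", []):
        src_bucket = record["s3"]["bucket"]["name"]
        src_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Copy the raw object into the processed bucket under an "ingested/" prefix
        s3.copy_object(
            Bucket=TARGET_BUCKET,
            Key="ingested/" + src_key,
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )
    return {"status": "ok", "records": len(event.get("Records", []))}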

Client: Northern Trust, IL


Big Data Engineer January 2017 – February 2019
Responsibilities:
 Azure PaaS Solution Design: Analyzed, designed, and built modern data solutions using Azure PaaS services to support data
visualization, ensuring alignment with business needs and objectives.
 ETL Process Development with Azure Data Services: Extracted, transformed, and loaded data from source systems to Azure
Data Storage services using Azure Data Factory, T-SQL, Spark SQL, and U-SQL Azure Data Lake Analytics, enabling
efficient data ingestion and processing.
 Azure Data Factory Pipeline Creation: Created pipelines in Azure Data Factory to extract, transform, and load data from
various sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, facilitating seamless data integration and
processing.
 Spark Application Development and Optimization: Developed Spark applications using PySpark and Spark-SQL for data
extraction, transformation, and aggregation from multiple file formats, optimizing performance and efficiency through fine-
tuning and memory utilization.
 Cluster Monitoring and Performance Tuning: Estimated cluster size, monitored, and troubleshooted Spark Databricks cluster,
ensuring optimal performance and resource utilization.
 UDF Development and Pipeline Deployment: Wrote user-defined functions (UDFs) in Scala and PySpark to meet specific business requirements and developed JSON scripts for deploying pipelines in Azure Data Factory, enhancing automation and efficiency; a PySpark UDF sketch follows at the end of this section.
 SQL Script Development and Build Management: Developed SQL scripts for automation purposes and created build and
release pipelines for multiple projects in the production environment using Visual Studio Team Services (VSTS), ensuring
smooth deployment and delivery.
 Optimization of Data Pipelines and Workflows: Optimized performance and efficiency of data pipelines and processing
workflows by fine-tuning Spark jobs, SQL queries, and resource utilization, improving overall data processing capabilities.
 Cloud Platform Utilization and Documentation: Leveraged cloud platforms such as AWS, Azure, or Google Cloud to deploy
and manage data infrastructure, including storage, compute, and orchestration services. Documented technical designs, data
flows, and architecture diagrams for data solutions, facilitating knowledge sharing and future maintenance.
 Azure Data Lake and Databricks Expertise: Utilized Azure Data Lake Storage Gen2 for storing various data formats and
leveraged Azure Databricks for ETL tasks, including workspace management, Hive operations, and data loading procedures.

Environment: Cosmos, Azure, Kusto, Scope Script, AWS, Databricks, Data Warehouse, Apache Hadoop, Hive, Spark,
Python, Azure Databricks, Azure Data Factory, Delta Lake, HDFS.
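
Illustrative PySpark UDF sketch of the kind referenced in this section (an assumption-based example; the masking rule and column names are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_example").getOrCreate()

def mask_account(acct):
    """Hypothetical business rule: keep only the last four characters of an account number."""
    if acct is None:
        return None
    return "*" * max(len(acct) - 4, 0) + acct[-4:]

mask_account_udf = udf(mask_account, StringType())

# Sample data with hypothetical column names
df = spark.createDataFrame(
    [("1234567890", 120.0), ("9876543210", 75.5)],
    ["account_number", "amount"],
)

# Apply the UDF as a column expression
df.withColumn("masked_account", mask_account_udf(col("account_number"))).show()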

HCL, India
ETL and SQL Developer September 2015 – January 2017
Responsibilities:
 Informatica and ETL Process Management: Installed, configured, supported, and managed Informatica and DAC, ensuring
smooth operation and performance of ETL processes.
 ETL Process Design and Development: Designed and developed ETL processes using Informatica, Talend, and SSIS to
extract, transform, and load data from various sources into target databases, aligning with business requirements.
 Complex Mapping Design in Informatica: Designed and developed complex mappings in Informatica to move data from
multiple sources into common target areas such as Data Marts and Data Warehouses, implementing diverse transformations
and ensuring data integrity.
 Data Quality and Error Handling: Implemented data quality checks and error handling mechanisms within ETL processes to
ensure data integrity, reliability, and error-free data processing.
 Optimization and Reusability: Optimized mappings by creating reusable transformations and mapplets, improving
performance and reducing duplication of efforts in Informatica workflows.
 Dashboard Development with Tableau: Leveraged Tableau to create interactive visual dashboards, increasing data accessibility by 50% and enabling informed decision-making.
 Insightful Reporting with PowerBI: Designed and developed PowerBI reports, providing valuable insights and improving overall decision-making by 25% through clear data visualization.
 SQL Query Analysis and Optimization: Analyzed complex SQL queries to address concerns regarding error values/data
mismatches between source and sink, ensuring data accuracy and report validation by users. Conducted performance tuning
and optimization of SQL queries and database structures to enhance overall system performance.

Environment: Informatica, Azure Data Factory, Azure SQL Database, Azure Synapse Analytics, and Azure Databricks,
PowerBI, Tableau.
