Vishnu DE
[email protected] | (972)-945-5081
LinkedIn: linkedin.com/in/rahul-g-sarsai
Professional Summary:
Around 9 years of professional experience in the IT industry, specializing in Data Warehousing and Decision Support
Systems, with extensive experience implementing full-lifecycle Data Engineering projects and Hadoop/Big Data
technologies for storing, querying, processing, and analyzing data.
Software development involving cloud computing platforms such as Amazon Web Services (AWS) and Microsoft Azure.
Hands-on experience with AWS services such as S3 for storage, EMR for running Spark jobs and Hive queries, Glue for ETL
pipelines, and Athena for creating external tables.
Skilled in data cleansing and preprocessing using Python, creating data workflows with SQL queries in Alteryx, and
preparing Tableau Data Extracts (TDEs).
Design & implemented solutions on Azure cloud by creating pipelines using ADF – Azure data factory, linked services, data
sets, Azure Blob Storage, Azure Synapse, Azure Databricks.
Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, and
DataNode.
Strong experience in migrating databases from other platforms to Snowflake.
In-depth knowledge of Snowflake database, schema, and table structures.
Used Azure Synapse to manage processing workloads and served data for BI and prediction needs.
Hands-on experience with dimensional modeling using star schema and snowflake models.
Strong knowledge of various data warehousing methodologies and data modeling concepts.
Experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase,
Oozie, Hive, Sqoop, ZooKeeper, and Flume.
Experience in analyzing data using Spark, Python, SQL.
Experience in importing and exporting data using Sqoop between HDFS and relational database systems such as Teradata,
Oracle, and SQL Server.
Developed Apache Spark jobs using Python in a test environment for faster data processing and used Spark SQL for
querying.
Experienced in implementing CI/CD pipelines using Azure DevOps in both cloud and on-premises environments with Git,
Docker, and Maven, along with Jenkins plugins.
Developed solutions using the Alteryx tool to provide data for dashboards in formats including JSON, CSV, and Excel.
Managed application code using Azure Git repositories with the required security standards for .NET and Java applications.
Experienced in Spark Core, Spark RDDs, Pair RDDs, and Spark deployment architectures.
Experienced with performing real-time analytics on NoSQL databases like HBase and Cassandra.
Worked on AWS EC2, EMR, and S3 to create clusters and manage data using S3.
Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data
warehouses.
Technical Skills:
Big Data: Apache Spark, Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, YARN, Cassandra, Phoenix, Airflow, Qlik.
Frameworks: Hibernate, Spring, Cloudera CDH, Hortonworks HDP, MapR.
Programming & Scripting Languages: Java, Python, R, C, C++, HTML, JavaScript, XML, Git, PySpark, Scala.
Databases: Oracle 10g/11g, PostgreSQL, DB2, MySQL, Redshift, MS SQL Server, T-SQL.
IDEs: Eclipse, NetBeans, Maven, STS (Spring Tool Suite), Jupyter Notebook.
ETL Tools: Pentaho, Informatica, Talend, SSIS.
Reporting Tools: Tableau, Power BI, SQL Server Reporting Services (SSRS).
Operating Systems: Windows, UNIX, Linux, Sun Solaris.
Testing Tools: JUnit, MRUnit, SoapUI.
AWS: EMR, Glue, Athena, DynamoDB, Redshift, RDS, Data Pipelines, Lake Formation, S3, SQS, SNS, IAM, CloudFormation, EC2, ELB/CLB.
Azure: Data Lakes, Data Factory, SQL Data Warehouse, Data Lake Analytics, Databricks, and other Azure services.
Technical Experience:
Environment: Python, ETL, Informatica, AWS, Cloudera, Hadoop, Spark, Hive, PySpark, Pig, Presto, Sqoop, Flume, Flink,
Apache Kafka, Docker, Kubernetes, REST API, JSON, XML, Bitbucket (Git), AWS CloudFormation, ELK, JIRA, Agile, Scrum,
Tableau.
Environment: Cosmos, Azure, Kusto, Scope Script, AWS, Azure Databricks, Data Warehouse, Apache Hadoop, Hive, Spark,
Python, Azure Data Factory, Delta Lake, HDFS.
HCL, India
ETL and SQL Developer September 2015 – January 2017
Responsibilities:
Informatica and ETL Process Management: Installed, configured, supported, and managed Informatica and DAC, ensuring
smooth operation and performance of ETL processes.
ETL Process Design and Development: Designed and developed ETL processes using Informatica, Talend, and SSIS to
extract, transform, and load data from various sources into target databases, aligning with business requirements.
Complex Mapping Design in Informatica: Designed and developed complex mappings in Informatica to move data from
multiple sources into common target areas such as Data Marts and Data Warehouses, implementing diverse transformations
and ensuring data integrity.
Data Quality and Error Handling: Implemented data quality checks and error handling mechanisms within ETL processes to
ensure data integrity, reliability, and error-free data processing.
Optimization and Reusability: Optimized mappings by creating reusable transformations and mapplets, improving
performance and reducing duplication of efforts in Informatica workflows.
Dashboard Development with Tableau: Leveraged Tableau to create interactive visual dashboards, increasing data
accessibility by 50% and enabling informed decision-making.
Insightful Reporting with Power BI: Designed and developed Power BI reports that provided valuable insights and improved
overall decision-making by 25% through clear data visualization.
SQL Query Analysis and Optimization: Analyzed complex SQL queries to resolve error values and data mismatches between
source and sink, ensuring data accuracy and enabling report validation by users. Conducted performance tuning and
optimization of SQL queries and database structures to enhance overall system performance.
Environment: Informatica, Azure Data Factory, Azure SQL Database, Azure Synapse Analytics, Azure Databricks, Power BI,
Tableau.