ANSAR HAYAT Big Data Architect
Summary
A Microsoft and Databricks Certified Data Engineer Professional with 7+ years of experience in the
creation, orchestration, and deployment of end-to-end batch and streaming data pipelines and managed big data
platforms in cloud-native environments such as AWS, Azure, and Databricks. Experienced in processing large
datasets of different forms, including structured, semi-structured, and unstructured data. Hands-on experience with
AWS big data services (EMR, EMR Studio, Glue, Glue Studio, Glue Catalog, Athena, S3, RDS, Databricks, Lakehouse)
as well as Azure data services (Synapse Analytics, Azure Data Factory, ADLS Gen2, Azure Databricks,
Lakehouse).
Professional History
• Designed and developed a data lake architecture that enables efficient storage and retrieval of large
datasets.
• Loaded and transformed large sets of structured and semi-structured data from different sources.
• Created and monitored end-to-end data pipelines using PySpark and AWS Glue code.
• Saved data to S3 in Parquet format, segregated into bronze, silver, and gold layers (see the PySpark sketch after this list).
• Built a data catalog to store metadata and enable data governance.
• Performed querying and data verification with Athena.
• Moved the final aggregated datasets into the data warehouse to build fact and dimension tables.
• Connected the data warehouse to Power BI for different dashboards.
• Automated data ingestion processes to reduce manual effort.
• Implemented a data security framework to ensure compliance with data privacy regulations.
• At Systems, worked with different clients to provide big data solutions using AWS data platform
services.
• Ingested data into the data lake from SharePoint, SFTP, and cloud storage using Apache NiFi and
PySpark jobs.
• Performed post-ingestion processing with PySpark scripts, deployed the scripts via Jenkins, scheduled
DAGs on Airflow, and ran the jobs on an EMR cluster.
• Worked with the Databricks team on the implementation of AWS Databricks E2.
• Created the S3 root bucket for audit logs and Terraform state files via Terraform scripts.
• Created CI/CD pipelines and integrated them with Bitbucket.
• Created new Databricks workspaces and policies.
• Configured Unravel with Databricks for Hive and S3 role-based access control management.
• Worked on Informatica Cloud, moving data into Redshift.
• Implemented Kafka producer and consumer applications on a Kafka cluster set up with ZooKeeper
(a minimal producer/consumer sketch follows this list).
• Used Kafka APIs to process messages smoothly on the Kafka cluster.
• Used Kafka Connect source and sink connectors for MySQL, PostgreSQL, and MongoDB.
• Used Alteryx for ETL pipelines on different projects; created functional diagrams and documented the data flow
process on Confluence.
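Below is a minimal PySpark sketch of the bronze/silver/gold layering mentioned above. It is illustrative only: bucket names, paths, and column names are assumptions, not the actual project code.

```python
# Minimal PySpark sketch of a bronze/silver/gold layout on S3.
# Bucket names, paths, and column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land raw JSON as-is in Parquet.
raw = spark.read.json("s3://example-raw-bucket/events/")
raw.write.mode("append").parquet("s3://example-lake/bronze/events/")

# Silver: deduplicate and add a partition column.
silver = (
    spark.read.parquet("s3://example-lake/bronze/events/")
    .dropDuplicates(["event_id"])
    .withColumn("event_date", F.to_date("event_ts"))
)
silver.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-lake/silver/events/"
)

# Gold: aggregate for the warehouse / Athena queries.
gold = silver.groupBy("event_date", "event_type").agg(
    F.count("*").alias("event_count")
)
gold.write.mode("overwrite").parquet("s3://example-lake/gold/daily_event_counts/")
```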
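The Kafka producer/consumer work above could look roughly like the following Python sketch using the confluent-kafka client; the broker address, topic name, and consumer group are assumed placeholders, not the production setup.

```python
# Minimal Kafka producer/consumer sketch; broker, topic, and group id are assumptions.
from confluent_kafka import Producer, Consumer

BROKER = "localhost:9092"   # assumed broker address
TOPIC = "orders"            # assumed topic name

# Producer: send one message and wait for delivery.
producer = Producer({"bootstrap.servers": BROKER})
producer.produce(TOPIC, key="order-1", value='{"amount": 42}')
producer.flush()

# Consumer: read messages from the beginning of the topic.
consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "orders-reader",      # assumed consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        print(msg.key(), msg.value())
finally:
    consumer.close()
```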
July 2019 – October 2021 (2 years)
Worked at The Entertainer Pakistan, Lahore, in different capacities since July 2019 as a Data Engineer
(https://www.theentertainerme.com/)
Work Summary:
• Implemented big data architecture and solutions using Databricks services (Data Lake, Delta Lake) and
Synapse Analytics, with data quality controls.
• Built Spark batch jobs using PySpark and Spark SQL to move data from MongoDB into the data
lake.
• Created, scheduled, and monitored Azure Synapse Analytics pipelines using Azure Data
Factory.
• Implemented Spark real-time streaming from Azure Event Hubs into Delta Lake (see the streaming sketch after this list).
• Used Tableau Desktop and Server for data visualization and for scheduling different reports.
• Extensively used advanced MongoDB features to produce analytics-related ad hoc
reports.
• Performed GDPR data removal from different data sources such as MongoDB, MySQL, and the data warehouse.
• Controlled B2C and B2B app analytics flows and investigated data analytics flow issues.
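A rough sketch of the Event Hubs-to-Delta streaming flow is shown below, reading through Event Hubs' Kafka-compatible endpoint. This is one possible approach, not necessarily the connector used on the project; the namespace, connection string, event hub name, and paths are assumptions.

```python
# Spark Structured Streaming sketch: Azure Event Hubs (Kafka endpoint) -> Delta Lake.
# Namespace, connection string, event hub name, and output paths are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("eventhub-to-delta").getOrCreate()

EH_NAMESPACE = "example-namespace"                 # assumed Event Hubs namespace
EH_CONN_STR = "<event-hubs-connection-string>"     # assumed secret, supplied at runtime

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
    .option("subscribe", "app-events")             # assumed event hub name
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{EH_CONN_STR}";',
    )
    .load()
)

# Keep the raw payload as a string column and append it to a Delta table.
query = (
    stream.selectExpr("CAST(value AS STRING) AS body", "timestamp")
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/app_events")  # assumed path
    .start("/mnt/lake/bronze/app_events")                               # assumed path
)
query.awaitTermination()
```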
Work Summary:
• Implemented finance and leasing business logic using stored procedures, functions, views, cursors, and
CTEs in MS SQL Server 2014 and Oracle.
• Database modeling and modification for different business domains.
• Conducted in-depth business analysis of the client's actual business requirements, current
practices, and work procedures, focusing on prioritized risk issues.
• Implemented ETL for data migration using Visual Studio data tools for SQL.
• Developed Telerik reports to show client data for different dates.
• Carried out account data development using different business logic.
• Participated in gap analysis and requirement gathering with different clients.
• Analyzed clients' business operational procedures and methods and mapped them to current business
processes and practices.
• Suggested business-process recommendations for proposed solutions.
Development Skills
References
https://www.linkedin.com/in/ansarhayat/