
🟢 HR Round 1 – Behavioral & Fitment Questions

Set 1 - General Questions
1️⃣ Tell me about yourself.
2️⃣ Why are you interested in this role?
3️⃣ What do you know about our company?
4️⃣ What are your strengths and weaknesses?
5️⃣ Where do you see yourself in the next 3-5 years?

Set 2 - Work Experience & Role-Specific Questions
6️⃣ Can you walk me through your experience with Snowflake & AWS Glue?
7️⃣ What’s the most challenging ETL pipeline migration you’ve worked on?
8️⃣ How do you handle failures in AWS Glue & Snowflake pipelines?
9️⃣ Can you explain a situation where you improved pipeline performance?
🔟 How do you ensure data accuracy and consistency during migration?

💡 Pro Tip: Use the STAR Method (Situation, Task, Action, Result) to answer behavioral questions effectively.

🟠 Technical Round 2 – Data Engineering Concepts (ETL, Snowflake, SQL, PySpark, AWS Glue)

Set 1 - ETL & Pipeline Migration Questions
1️⃣ How do you approach migrating ETL pipelines from Oracle/MS SQL to Snowflake?
2️⃣ Explain the best practices for designing ETL workflows in AWS Glue.
3️⃣ What are the differences between Snowflake vs Redshift vs BigQuery for warehousing?
4️⃣ How do you handle incremental vs full data loads in Snowflake?
5️⃣ Explain how you optimize ETL jobs in AWS Glue & PySpark.

Set 2 - Snowflake Performance & Optimization
6️⃣ How do you optimize Snowflake queries for faster performance?
7️⃣ What’s the role of clustering & partitioning in Snowflake?
8️⃣ How do you handle large-scale data ingestion into Snowflake?
9️⃣ What are Transient vs Permanent Tables in Snowflake, and when to use them?
🔟 Explain Time Travel and Zero-Copy Cloning in Snowflake.

💡 Pro Tip: Be ready to write SQL queries & PySpark transformations on a whiteboard or shared screen.

🔵 Technical Round 3 – System Design & Cloud Infrastructure (AWS, APIs, Data Modeling)

Set 1 - Data Modeling & Warehouse Design
1️⃣ How do you design a Snowflake schema for an analytics use case?
2️⃣ What’s the difference between Star Schema & Snowflake Schema?
3️⃣ How do you model Salesforce & NetSuite data for analytics?
4️⃣ Explain fact vs dimension tables in a Snowflake data warehouse.
5️⃣ How do you handle slowly changing dimensions (SCD) in Snowflake?

Set 2 - Cloud Infrastructure & API Integration
6️⃣ How does AWS Glue integrate with Snowflake for ETL?
7️⃣ How do you replicate Salesforce/Workday data into Snowflake?
8️⃣ Explain AWS Lambda vs AWS Glue vs Airflow for data orchestration.
9️⃣ How do you handle API rate limits & failures in data ingestion?
🔟 What security best practices do you follow for AWS & Snowflake?

💡 Pro Tip: Expect system design whiteboarding & architecture discussions.

🟣 HR Round 4 – Final Discussion & Salary Negotiation

Set 1 - Company & Culture Fit Questions
1️⃣ What motivates you as a Data Engineer?
2️⃣ How do you handle tight deadlines & production failures?
3️⃣ Have you worked in cross-functional teams?
4️⃣ How do you keep up with new data engineering trends?
5️⃣ Why should we hire you?

Set 2 - Salary & Expectations
6️⃣ What are your salary expectations?
7️⃣ Are you open to contract vs full-time roles?
8️⃣ Are you comfortable working in different time zones?
9️⃣ What’s your preferred work model (Remote/Hybrid/Onsite)?
🔟 Do you have any questions for us?

📌 Bonus: Hands-On Coding & Whiteboarding Practice

✅ SQL Questions
🔹 Write a query to find the top 3 highest-selling products per month.
🔹 How do you implement window functions in Snowflake?
🔹 Write a query to merge new records into an existing Snowflake table.

✅ PySpark Questions
🔹 Convert a JSON file to Parquet using PySpark.
🔹 Write a PySpark script to remove duplicate records from a DataFrame.
🔹 Explain how broadcast joins improve performance in PySpark.

✅ AWS Glue Questions
🔹 How do you create a Glue job for processing S3 data?
🔹 How do you handle schema evolution in AWS Glue?
🔹 What’s the difference between Glue DynamicFrame & DataFrame?

🟢 HR Round 1 – Behavioral & Fitment Questions

Set 1 - General Questions

1️⃣ Tell me about yourself.


"I'm a Data Engineer with over two years of experience designing and optimizing data
pipelines. I specialize in ETL development, big data processing with PySpark, and cloud-
based solutions using AWS and Snowflake. Currently, I work at a news agency, where I build
scalable data pipelines for analytics and reporting. I enjoy working on performance
optimization, automating workflows, and ensuring data integrity. Beyond work, I stay
updated with the latest data engineering trends and enjoy contributing to cross-functional
projects."

2️⃣ Why are you interested in this role?


"I'm excited about this role because it aligns with my experience in Snowflake, AWS Glue,
and ETL workflows. I see it as an opportunity to work on large-scale data challenges,
optimize complex pipelines, and collaborate with teams that prioritize innovation in data
engineering."

3️⃣ What do you know about our company?


"I've researched your company and found that you focus on [mention specific domain, e.g.,
e-commerce, finance, healthcare] and handle large volumes of data. Your emphasis on
cloud-based data solutions and analytics-driven decision-making aligns with my expertise,
and I believe I can contribute effectively."

4️⃣ What are your strengths and weaknesses?

 Strengths:

o Proficiency in building scalable ETL pipelines

o Strong SQL and PySpark skills for data transformation & optimization

o Experience in cloud platforms like AWS & Snowflake

 Weakness:

o "I tend to focus too much on details when optimizing queries. However, I’m
learning to balance performance with project timelines by prioritizing
optimizations that have the most impact."

5️⃣ Where do you see yourself in the next 3-5 years?


"I see myself evolving into a Senior Data Engineer or a Data Architect, leading the design of
efficient data systems, mentoring junior engineers, and working on cutting-edge
technologies in big data and AI-driven analytics."

Set 2 - Work Experience & Role-Specific Questions

6️⃣ Can you walk me through your experience with Snowflake & AWS Glue?
"I've designed and optimized ETL pipelines using AWS Glue for transforming and loading
data into Snowflake. I use PySpark within Glue for data transformations, schema evolution,
and incremental loading. In Snowflake, I optimize queries using clustering, partitioning, and
caching techniques."

7️⃣ What’s the most challenging ETL pipeline migration you’ve worked on?
(STAR Method Example)

 Situation: Migrating an on-premise SQL Server ETL pipeline to Snowflake.

 Task: Improve performance and reduce maintenance overhead.

 Action: Used AWS Glue for ETL processing, optimized data partitioning, and
implemented incremental loading using Snowflake Streams & Tasks.

 Result: Reduced processing time by 60% and improved query performance for
analytics.

8️⃣ How do you handle failures in AWS Glue & Snowflake pipelines?

 AWS Glue: Implement checkpointing & retry logic in PySpark jobs.


 Snowflake: Use error-handling SQL scripts, monitoring with Snowflake Query
History, and retry failed tasks using Streams & Tasks.

9️⃣ Can you explain a situation where you improved pipeline performance?
"I optimized an ETL pipeline by using PySpark broadcast joins to speed up small-to-large
table joins, reducing runtime from 3 hours to 45 minutes."

🔟 How do you ensure data accuracy and consistency during migration?

 Schema validation before migration

 Row count & checksum validation

 Using Snowflake Streams & Tasks for CDC (Change Data Capture)
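
A minimal validation sketch (hypothetical table and column names) comparing row counts and checksums between the staged source data and the migrated target:

-- Row count comparison
SELECT
    (SELECT COUNT(*) FROM source_stage_orders) AS source_rows,
    (SELECT COUNT(*) FROM target_orders)       AS target_rows;

-- Checksum comparison over key columns using Snowflake's HASH_AGG
SELECT
    (SELECT HASH_AGG(order_id, customer_id, amount) FROM source_stage_orders) AS source_checksum,
    (SELECT HASH_AGG(order_id, customer_id, amount) FROM target_orders)       AS target_checksum;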

🟠 Technical Round 2 – Data Engineering Concepts

Set 1 - ETL & Pipeline Migration

1️⃣ How do you approach migrating ETL pipelines from Oracle/MS SQL to Snowflake?

 Assess source schema & ETL logic

 Extract data using AWS DMS

 Use AWS Glue/Snowflake Staging Tables for transformations

 Optimize Snowflake warehouse sizing & clustering (Snowflake has no traditional indexes)

2️⃣ Best practices for designing ETL workflows in AWS Glue?

 Use DynamicFrames for schema flexibility

 Optimize memory with Spark partitions

 Use S3 as an intermediate storage layer

3️⃣ Snowflake vs Redshift vs BigQuery?

 Snowflake: Best for on-demand compute scaling & semi-structured data.

 Redshift: Good for batch processing, but less flexible.

 BigQuery: Serverless with automatic scaling, ideal for Google Cloud users.

4️⃣ Handling incremental vs full data loads in Snowflake?

 Full Load: Truncate and reload entire data.

 Incremental Load: Use Streams & Tasks to track changes.
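
A minimal Streams & Tasks sketch for the incremental path (hypothetical object, warehouse, and column names):

-- Stream captures changes (CDC) on the staging table
CREATE OR REPLACE STREAM orders_stream ON TABLE staging_orders;

-- Task periodically merges the captured changes into the target table
CREATE OR REPLACE TASK load_orders_task
    WAREHOUSE = etl_wh
    SCHEDULE  = '15 MINUTE'
AS
MERGE INTO orders t
USING orders_stream s
    ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);

ALTER TASK load_orders_task RESUME;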

5️⃣ Optimizing ETL jobs in AWS Glue & PySpark?


 Use Glue Job Bookmarks for incremental loads

 Optimize partitions & avoid shuffling in PySpark

Set 2 - Snowflake Performance & Optimization

6️⃣ How do you optimize Snowflake queries?

 Use clustering keys, result caching, and materialized views

 Minimize SELECT * queries and optimize joins
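
Illustrative examples of these techniques (hypothetical table/view names; materialized views require Snowflake Enterprise edition or higher):

-- Clustering key on a frequently filtered column
ALTER TABLE sales_data CLUSTER BY (order_date);

-- Materialized view for a commonly repeated aggregation
CREATE OR REPLACE MATERIALIZED VIEW daily_sales_mv AS
SELECT order_date, SUM(sales) AS total_sales
FROM sales_data
GROUP BY order_date;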

7️⃣ Role of clustering & partitioning in Snowflake?

 Clustering improves query pruning

 Partitioning (via file structure) reduces unnecessary scans
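
To see how well a table is clustered on a candidate key (hypothetical table name):

-- Returns clustering depth and overlap statistics as JSON
SELECT SYSTEM$CLUSTERING_INFORMATION('sales_data', '(order_date)');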

8️⃣ Handling large-scale data ingestion into Snowflake?

 Parallel COPY commands from S3

 Auto-ingest using Snowpipe
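
A minimal ingestion sketch (hypothetical stage, pipe, and table names; raw_events has a single VARIANT column for the JSON payload):

-- Bulk load in parallel from an external S3 stage
COPY INTO raw_events
FROM @s3_raw_stage/events/
FILE_FORMAT = (TYPE = 'JSON');

-- Continuous auto-ingest via Snowpipe, triggered by S3 event notifications
CREATE OR REPLACE PIPE raw_events_pipe AUTO_INGEST = TRUE AS
COPY INTO raw_events
FROM @s3_raw_stage/events/
FILE_FORMAT = (TYPE = 'JSON');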

9️⃣ Transient vs Permanent Tables in Snowflake?

 Transient: No Fail-safe, used for staging.

 Permanent: Retains history for compliance.
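
For example (hypothetical names; 90-day retention requires Enterprise edition or higher):

-- Transient staging table: no Fail-safe, lower storage cost
CREATE OR REPLACE TRANSIENT TABLE stg_orders (order_id INT, amount NUMBER);

-- Permanent table with extended Time Travel retention for compliance
CREATE OR REPLACE TABLE orders (order_id INT, amount NUMBER)
    DATA_RETENTION_TIME_IN_DAYS = 90;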

🔟 Time Travel & Zero-Copy Cloning?

 Time Travel: Restore data from past states.

 Zero-Copy Cloning: Clone tables instantly without duplication.
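
Illustrative syntax (hypothetical table names and offsets):

-- Time Travel: query the table as it was one hour ago
SELECT * FROM orders AT(OFFSET => -3600);

-- Recover a dropped table within the retention window
UNDROP TABLE orders;

-- Zero-Copy Cloning: instant copy that shares underlying micro-partitions
CREATE TABLE orders_dev CLONE orders;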


🔵 Technical Round 3 – System Design & Cloud Infrastructure

Set 1 - Data Modeling & Warehouse Design

1️⃣ How do you design a Snowflake schema for an analytics use case?

 Understand business requirements (KPIs, dimensions, fact tables).

 Choose schema type (Star Schema or Snowflake Schema).

 Optimize data storage with clustering and partitioning.

 Use materialized views for frequently used aggregations.

 Leverage Snowflake features like micro-partitioning and result caching.


2️⃣ What’s the difference between Star Schema & Snowflake Schema?

Feature     | Star Schema            | Snowflake Schema
Structure   | Denormalized           | Normalized
Performance | Faster queries         | Slower joins
Storage     | More redundant data    | Less redundancy
Joins       | Fewer joins needed     | Multiple joins required
Use case    | Fast query performance | Optimized storage

3️⃣ How do you model Salesforce & NetSuite data for analytics?

 Extract data using AWS DMS, Fivetran, or Stitch.

 Stage raw data in Snowflake using a schema similar to Salesforce/NetSuite.

 Transform data to match analytical needs (flatten JSON structures, join related
tables).

 Create fact and dimension tables (e.g., Sales as fact, Customers as dimension).

 Optimize with clustering on frequently queried fields (e.g., Date, Customer ID).

4️⃣ Explain fact vs dimension tables in a Snowflake data warehouse.

 Fact tables store transactional data (e.g., Sales, Orders).

 Dimension tables provide context (e.g., Customers, Products).

 Fact tables have high cardinality and numeric values.

 Dimension tables contain descriptive attributes for slicing and dicing data.

Example:

 Fact Table: sales (sale_id, customer_id, product_id, amount, date_id)

 Dimension Table: customers (customer_id, name, region, created_at)
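
A minimal DDL sketch for this example (data types are assumptions):

CREATE TABLE customers (
    customer_id INT,
    name        STRING,
    region      STRING,
    created_at  TIMESTAMP_NTZ
);

CREATE TABLE sales (
    sale_id     INT,
    customer_id INT,          -- joins to customers (dimension)
    product_id  INT,          -- joins to a products dimension
    amount      NUMBER(12,2),
    date_id     INT           -- joins to a date dimension
);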

5️⃣ How do you handle slowly changing dimensions (SCD) in Snowflake?

 SCD Type 1 (Overwrite): Update records directly.


 SCD Type 2 (Versioned History): Maintain historical records with valid_from and
valid_to timestamps.

 SCD Type 3 (Limited History): Store only the previous value in a separate column.

 Use Streams & Tasks to track changes efficiently.

Example SQL for SCD Type 2:

INSERT INTO customers_scd2 (customer_id, name, region, valid_from, valid_to)
SELECT customer_id, name, region, CURRENT_TIMESTAMP, NULL
FROM staging_customers
WHERE NOT EXISTS (
    SELECT 1
    FROM customers_scd2
    WHERE customers_scd2.customer_id = staging_customers.customer_id
);
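
The insert above only adds brand-new customers; when an existing customer's attributes change, the current version also needs to be closed out before the new row is inserted. A minimal sketch of that step (the change-detection condition is an assumption):

-- Close the open version for customers whose attributes changed
UPDATE customers_scd2
SET valid_to = CURRENT_TIMESTAMP
FROM staging_customers
WHERE customers_scd2.customer_id = staging_customers.customer_id
  AND customers_scd2.valid_to IS NULL
  AND (customers_scd2.name <> staging_customers.name
       OR customers_scd2.region <> staging_customers.region);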

Set 2 - Cloud Infrastructure & API Integration

6️⃣ How does AWS Glue integrate with Snowflake for ETL?

 AWS Glue extracts raw data from sources (S3, RDS).

 Processes data using PySpark or Glue DynamicFrames.

 Writes transformed data to Snowflake using a JDBC connection (or the Snowflake Spark connector).

 AWS Glue Catalog can be used for metadata management.

7️⃣ How do you replicate Salesforce/Workday data into Snowflake?

 Use Fivetran/Stitch/AWS DMS for real-time or batch replication.

 Store data in a Snowflake staging area before transformations.

 Use Streams & Tasks to track changes and implement incremental loads.

 Partition & cluster data for optimal query performance.

8️⃣ Explain AWS Lambda vs AWS Glue vs Airflow for data orchestration.

Service        | Use Case                      | Pros                            | Cons
AWS Lambda     | Event-driven ETL (small data) | Serverless, cost-effective      | Limited memory & execution time
AWS Glue       | Serverless ETL for big data   | Scalable, supports PySpark      | Higher cost for frequent jobs
Apache Airflow | Workflow orchestration        | Flexible, DAG-based scheduling  | Requires infrastructure management

9️⃣ How do you handle API rate limits & failures in data ingestion?

 Implement exponential backoff for retries.

 Use caching mechanisms for frequently requested data.

 Batch API requests instead of making multiple small ones.

 Use AWS Step Functions for handling failures in a workflow.

Example:

import time

import requests

def call_api_with_retry(url, max_retries=5):
    retries = 0
    while retries < max_retries:
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:   # Too many requests: back off and retry
            time.sleep(2 ** retries)        # Exponential backoff
            retries += 1
        else:
            retries += 1                    # Count other failures too, to avoid an infinite loop
    return None

🔟 What security best practices do you follow for AWS & Snowflake?

 AWS:

o Use IAM roles & least privilege principle.

o Enable VPC, private endpoints & encryption (KMS, SSE-S3).


o Monitor access with CloudTrail & GuardDuty.

 Snowflake:

o Enable role-based access control (RBAC).

o Use network policies to restrict access.

o Implement column-level security & masking for PII data.
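
A minimal masking-policy sketch for Snowflake column-level security (hypothetical names; dynamic data masking requires Enterprise edition or higher):

-- Mask email addresses for all but privileged roles
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
        ELSE '***MASKED***'
    END;

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;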

🟣 HR Round 4 – Final Discussion & Salary Negotiation

Set 1 - Company & Culture Fit

1️⃣ What motivates you as a Data Engineer?


"I enjoy working with large-scale data and optimizing high-performance pipelines. Building
scalable and efficient data solutions excites me."

2️⃣ How do you handle tight deadlines & production failures?

 Prioritize critical fixes while keeping stakeholders informed.

 Use monitoring & alerting to detect issues early.

 Follow post-mortem analysis to prevent future failures.

3️⃣ Have you worked in cross-functional teams?


"Yes, I collaborate with data analysts, scientists, and product managers to ensure data
solutions meet business needs."

4️⃣ How do you keep up with new data engineering trends?

 Follow AWS & Snowflake blogs.

 Participate in LinkedIn & Medium communities.

 Take online courses and attend tech meetups.

5️⃣ Why should we hire you?


"I bring strong ETL, cloud, and big data expertise and can optimize large-scale data
pipelines while ensuring data quality & reliability."
Set 2 - Salary & Expectations

6️⃣ What are your salary expectations?


"I’m looking for a competitive offer based on my experience, but I’m open to discussion."

7️⃣ Are you open to contract vs full-time roles?


"I prefer a full-time role but open to contract if it aligns with my career goals."

8️⃣ Are you comfortable working in different time zones?


"Yes, as long as there's some flexibility in overlapping working hours."

9️⃣ What’s your preferred work model (Remote/Hybrid/Onsite)?


"I prefer hybrid but open to onsite depending on the role."

🔟 Do you have any questions for us?

 "Can you tell me about the team structure?"

 "What are the key challenges you’re currently facing?"

📌 Bonus: Hands-On Coding & Whiteboarding Practice

✅ SQL - Top 3 highest-selling products per month

SELECT month, product_id, total_sales
FROM (
    SELECT month, product_id, SUM(sales) AS total_sales,
           RANK() OVER (PARTITION BY month ORDER BY SUM(sales) DESC) AS rnk
    FROM sales_data
    GROUP BY month, product_id
)
WHERE rnk <= 3;

✅ PySpark - Convert JSON to Parquet

df = spark.read.json("s3://input-bucket/data.json")

df.write.parquet("s3://output-bucket/data.parquet")
✅ AWS Glue - Schema Evolution

 Enable schema detection in AWS Glue DynamicFrames.

 Use AWS Lake Formation for schema validation.


Here’s the full set of SQL, PySpark, and AWS Glue code examples for these questions:

✅ SQL Questions

1️⃣ Query to Find the Top 3 Highest-Selling Products Per Month


SELECT month, product_id, total_sales
FROM (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        product_id,
        SUM(sales) AS total_sales,
        RANK() OVER (PARTITION BY DATE_TRUNC('month', order_date) ORDER BY SUM(sales) DESC) AS rnk
    FROM sales_data
    GROUP BY month, product_id
)
WHERE rnk <= 3;

2️⃣ Implementing Window Functions in Snowflake


SELECT
    customer_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
    LAG(total_amount, 1, 0) OVER (PARTITION BY customer_id ORDER BY order_date) AS previous_order_amount,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank
FROM orders;

 SUM() → Calculates running totals.

 LAG() → Fetches the previous order amount.

 ROW_NUMBER() → Assigns a sequential rank per customer.

3️⃣ Merge New Records into an Existing Snowflake Table


MERGE INTO customers target

USING new_customers source

ON target.customer_id = source.customer_id

WHEN MATCHED THEN

UPDATE SET target.name = source.name, target.city = source.city

WHEN NOT MATCHED THEN

INSERT (customer_id, name, city) VALUES (source.customer_id, source.name, source.city);

 Updates existing records.

 Inserts new records if no match is found.

✅ PySpark Questions

4️⃣ Convert a JSON File to Parquet using PySpark


from pyspark.sql import SparkSession


spark = SparkSession.builder.appName("JSONtoParquet").getOrCreate()

df = spark.read.json("s3://input-bucket/data.json") # Load JSON

df.write.parquet("s3://output-bucket/data.parquet") # Save as Parquet

spark.stop()

5️⃣ Remove Duplicate Records from a PySpark DataFrame


from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RemoveDuplicates").getOrCreate()

df = spark.read.csv("s3://input-bucket/data.csv", header=True)

df_no_duplicates = df.dropDuplicates() # Removes all duplicate rows

df_no_duplicates.show()

spark.stop()

6️⃣ How Broadcast Joins Improve Performance in PySpark


from pyspark.sql import SparkSession

from pyspark.sql.functions import broadcast


spark = SparkSession.builder.appName("BroadcastJoinExample").getOrCreate()

large_df = spark.read.parquet("s3://large-dataset.parquet")

small_df = spark.read.parquet("s3://small-dataset.parquet")

optimized_df = large_df.join(broadcast(small_df), "common_key")

optimized_df.show()

spark.stop()

 Broadcasting smaller tables avoids costly shuffles, improving performance significantly.

✅ AWS Glue Questions

7️⃣ Create an AWS Glue Job for Processing S3 Data


import sys

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read JSON files from S3 into a DynamicFrame
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://input-bucket/data/"]},
    format="json"
)

# Transformation: keep only the required columns
transformed_df = datasource0.toDF().select("id", "name", "age")

# Convert back to a DynamicFrame before writing with Glue
transformed_dyf = DynamicFrame.fromDF(transformed_df, glueContext, "transformed_dyf")

# Write the result as Parquet to the output bucket
glueContext.write_dynamic_frame.from_options(
    frame=transformed_dyf,
    connection_type="s3",
    connection_options={"path": "s3://output-bucket/processed-data/"},
    format="parquet"
)

job.commit()

 Reads JSON files from S3.

 Transforms data.

 Writes output as Parquet to another S3 bucket.

8️⃣ Handle Schema Evolution in AWS Glue


from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read Parquet data from S3 into a DynamicFrame
dynamic_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://data-bucket/"]},
    format="parquet"
)

# Resolve a newly added / ambiguous column by casting it to int
dynamic_frame = dynamic_frame.resolveChoice(
    specs=[("new_column", "cast:int")]
)

# Write the result back to S3 as Parquet
glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": "s3://output-bucket/processed-data/"},
    format="parquet"
)

 Uses resolveChoice() to handle schema evolution.

 Casts columns to appropriate types dynamically.

9️⃣ Difference Between Glue DynamicFrame & DataFrame

Feature         | DynamicFrame                                | DataFrame
Schema Handling | Flexible schema (supports evolving schemas) | Fixed schema (Spark-based)
Transformations | Glue-specific transformations available     | Standard Spark transformations
Performance     | Slightly slower due to metadata handling    | Faster for large-scale operations
Usage           | Preferred for AWS Glue ETL jobs             | Preferred for optimized Spark workloads

Example: Converting a DynamicFrame to a DataFrame (and back)


from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read JSON data from S3 into a DynamicFrame
dynamic_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://data-bucket/"]},
    format="json"
)

# Convert to a Spark DataFrame for standard Spark transformations
dataframe = dynamic_frame.toDF()

# Convert back to a DynamicFrame for Glue writers/transforms
dynamic_frame_new = DynamicFrame.fromDF(dataframe, glueContext, "dynamic_frame_new")
