
100 Days of Data Engineering - Make A Copy and Use As You Need - Sheet1

This document outlines a 100-day plan to become a data engineer. It includes daily tasks focused on skills like SQL, programming, cloud technologies, data modeling, and data warehousing. The plan emphasizes practicing these skills over a 100-day period to build habits, though it notes that 3 months alone may not be enough to fully become a data engineer. Learners are encouraged to continue practicing key topics beyond the 100 days.

Uploaded by

Spencer Brian
Copyright
© All Rights Reserved

100 Days Note

100 days is just a little over 3 months, and I don't believe 3 months is truly sufficient to "become a data engineer"; at the very least, it feels a little fast. There is no need to
rush. The real purpose of these 100 days is to get you into the habit of practicing. If afterwards you want to dig into specific subjects, do that! Don't let these 100 days limit you.

Day Task Notes Category


For day one, I recommend taking the time to answer some questions and to write
out your plan to commit to the next 100 days on social media or somewhere
people can help keep you accountable (a Discord group, Slack, etc.).
1. What do you hope to accomplish by the end of the 100 days?
2. Are there any topics you'd like to learn that aren't covered?
Day 1 Take a moment to write your goals
1. Downloading SQL Server And Creating Tables
2. Joins
Day 2 3. Case Statements SQL
1. SQL Interview Tips
Day 3 2. Solving More Problems With SQL SQL
1. Partition By
2. CTE (Common Table Expression)
Day 4 3. Stored Procedures SQL
1. Loops, Strings, And Tuples
2. Functions
3. Mutability
Day 5 4. Error Handling Programming
Notes: This video is quite long, so I've spread it over the next three days.
That way you can watch a little over 1.5 hours a day.
Day 6 Basic Linux Commands 1/3 Linux
Day 7 Basic Linux Commands 2/3 Linux
Day 8 Basic Linux Commands 3/3 Linux
1. Data Modeling Basics
Day 9 2. Normalization Vs Denormalization Data Model
Day 10 Read Chapters 1, 2, and 3 in Kimball's The Data Warehouse Toolkit Data Model
Day 11 What is Data Pipeline | How to design Data Pipeline ? - ETL vs Data pipeline (2023) Data Pipeline
Day 12 SQL Project Example SQL Deeper Dive
1. 262. Trips and Users
2. Popularity of Hack
3. Average Salaries
Day 13 4. 626. Exchange Seats SQL Deeper Dive
Use the `bigquery-public-data.stackoverflow.*` data set to answer some of the
following questions, and come up with some of your own:
1. What percentage of Stack Overflow questions that ended with a "?" had
accepted answers?
2. Are certain programming languages more likely to have accepted answers?
3. Do certain programming languages have questions that get answered more
quickly?
4. Do certain programming languages get more answers on average than others?
Notes:
1. What questions can you answer using this data set?
2. Are there places you can join the data set?
3. Write out 10 questions you think you can answer.
Day 14 SQL Deeper Dive
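If you want to warm up before touching BigQuery, here is a minimal sketch of the first question (questions ending in "?" that have an accepted answer) against a tiny local SQLite table. The table name `posts_questions`, the columns `title` and `accepted_answer_id`, and the sample rows are assumptions made for illustration, so check the real schema before adapting the query.

```python
import sqlite3


def pct_with_accepted_answer(conn):
    """Percentage of questions whose title ends in '?' that have an accepted answer."""
    row = conn.execute(
        """
        SELECT 100.0 * SUM(accepted_answer_id IS NOT NULL) / COUNT(*)
        FROM posts_questions
        WHERE title LIKE '%?'
        """
    ).fetchone()
    return row[0]


# Hypothetical sample rows standing in for the public data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts_questions "
    "(id INTEGER PRIMARY KEY, title TEXT, accepted_answer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO posts_questions VALUES (?, ?, ?)",
    [
        (1, "How do I join two tables?", 101),
        (2, "Why is my query slow?", None),
        (3, "Window function example", 102),  # no trailing "?", so it is filtered out
        (4, "What is a CTE?", 103),
    ],
)

pct = pct_with_accepted_answer(conn)
print(round(pct, 1))
```

Summing a boolean expression works in SQLite; in BigQuery you would likely reach for `COUNTIF` instead, so treat the SQL as a starting point rather than something to paste as-is.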
Day 15 Continue from yesterday with new questions. Come up with some of your own. SQL Deeper Dive
Notes: This is a 10-hour video, so I'd recommend breaking it down into 2-hour
segments over the next few days. You should also take notes and share them.
Another benefit: if you feel confident, you might consider taking a cert once
you're done with this set of videos and some of the projects.
Day 16 AWS Certificate Prep Cloud
Day 17 AWS Certificate Prep Cloud
Day 18 AWS Certificate Prep Cloud
Day 19 AWS Certificate Prep Cloud
Day 20 AWS Certificate Prep Cloud
1. GCP Intro
2. GCP and VPC
Day 21 3. GCP IAM Cloud
1. GCP BigQuery
Day 22 2. GCP Cloud Composer Cloud
1. Azure Vocab
2. Azure Opex Vs Capex
3. Azure Geographics And Regions
Day 23 4. Azure Basic Compute Services Cloud
1. Azure Private Networks And VPCs
2. Azure Storage
3. Azure Big Data Services
Day 24 4. Azure Serverless Computing Cloud
1. Data Structures And Algorithms Review Chapters 1-5
2. Introduction to Linked Lists (Data Structures & Algorithms #5)
3. Introduction to Recursion (Data Structures & Algorithms #6)
Day 25 Programming
1. Data Structures And Algorithms Review Chapters 8-11
2. Big O Notation
Day 26 Programming
1. Web Scraping
2. Reading CSVs, JSON And APIs
Notes: Go through this article and then, if you have time, see if you can
start a project.
Day 27 Programming
Day 28 Keeping time, scheduling tasks, and launching programs Programming
Programming Your Own Thing
Using the prior few days' readings, try coming up with some small mini projects.
Perhaps you can automate a task such as scraping a website or hitting an API.
Day 29 But take your time and enjoy some free time just trying things out for yourself. Programming
1. Learn Database Normalization - 1NF, 2NF, 3NF, 4NF, 5NF
Day 30 2. Logical Data Model Data Model
1. Database Denormalization
Day 31 2. Article TBD (I'll be writing one shortly) Data Model
Day 32 Read Chapters 4, 5, and 6 in Kimball's The Data Warehouse Toolkit Data Warehousing
Day 33 Agile Data Warehouse Chapters 1, 2 (and 3 if time allows) Data Warehousing
1. What Is A Data Pipeline
Day 34 2. ETLs, Data Pipelines, Etc Data pipelines
Day 35 Basic Data Pipeline Project Data pipelines
Notes: I'll be running a Q&A on the 36th day (or so), which should be the 7th
of February. We can use it as a time for people to ask questions, and I'll
attach a link to the live session in the future.
Day 36 Live QA And Pipeline Sign Up Progress Review And QA
At this point you may need some time to catch up. If that's the case, then the next
three days can be used for that. But if you have the time, here are some articles
and videos
1. What Is Query Driven Modeling
2. What Is Change Data Capture
Day 37 3. Stateful Streaming Catch Up
1. Airflow Is Not An ETL Tool
2. Databricks Vs Snowflake
Day 38 3. Data Engineering Vocab Catch Up
1. Why Is Data Engineering Important
Day 39 2. MongoDb Is Not For Analytics Catch Up
At the end of day 40, you should take a moment to review what you have
learned overall (otherwise you'll forget all of your hard work).
Day 40 Write A Review
Day 41 Read How To Start Your Next Data Engineering Project Mini Project
1. Pick a data source (you can also find some more here and here)
2. Write out 10-15 questions you'd like to answer
3. Select 3 or 4 of those questions as the ones you'll focus on
4. Design a basic dashboard you can build in 2-3 days based on the questions
(pick a solution like Tableau, Power BI, or another easy-to-work-with
dashboarding solution)
5. Pick a data storage solution to use, like Snowflake, Postgres, etc.
Day 42 6. Kick off your project Mini Project
1. Load your data into your data storage system
2. Perform a general EDA to understand what your data looks like, either with
SQL or Python
3. Answer your questions from day 1
4. Write up your current progress and note down which code or SQL is actually
Day 43 going to be used Mini Project
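As a rough illustration of the load-then-explore steps above, the sketch below loads a small CSV into SQLite and prints a first-pass profile of each column. The sample data, the `trips` table, and the helper names `load_csv` and `profile` are all hypothetical; swap in whatever source and storage system you picked.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the source you chose.
RAW_CSV = """trip_id,city,fare
1,Seattle,12.5
2,Seattle,
3,Portland,8.0
"""


def load_csv(conn, table, text):
    """Create `table` from the CSV header and insert every row (empty strings become NULL)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    rows = [[v if v != "" else None for v in row] for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    return header


def profile(conn, table, columns):
    """Return {column: (non_null_count, distinct_count)} as a first look at the data."""
    stats = {}
    for c in columns:
        stats[c] = conn.execute(
            f'SELECT COUNT("{c}"), COUNT(DISTINCT "{c}") FROM "{table}"'
        ).fetchone()
    return stats


conn = sqlite3.connect(":memory:")
columns = load_csv(conn, "trips", RAW_CSV)
stats = profile(conn, "trips", columns)
for name, (non_null, distinct) in stats.items():
    print(name, "non-null:", non_null, "distinct:", distinct)
```

Counting non-null and distinct values per column is only one way to start an EDA, but it quickly shows which columns have gaps and which ones look like keys.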
1. You should hopefully have an idea of the data properties so you can create a
basic data model and the queries required to create it
2. Create a process that automates those queries, either using cron or some other
form of scheduler
Day 44 3. Create a layer that can be used for the analytics (aggregate tables, views, etc.) Mini Project
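One way to approach the analytics layer is a small script that rebuilds an aggregate table and can be rerun safely by a scheduler. The sketch below assumes a hypothetical `raw_trips` table in SQLite and a made-up aggregate named `agg_fares_by_city`; neither comes from the plan itself.

```python
import sqlite3


def refresh_analytics_layer(conn):
    """Rebuild the aggregate table from the raw table; safe to rerun on a schedule."""
    with conn:  # commits when the block finishes without an exception
        conn.execute("DROP TABLE IF EXISTS agg_fares_by_city")
        conn.execute(
            """
            CREATE TABLE agg_fares_by_city AS
            SELECT city, COUNT(*) AS trips, AVG(fare) AS avg_fare
            FROM raw_trips
            GROUP BY city
            """
        )


# Hypothetical raw data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_trips (trip_id INTEGER, city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO raw_trips VALUES (?, ?, ?)",
    [(1, "Seattle", 10.0), (2, "Seattle", 14.0), (3, "Portland", 8.0)],
)

refresh_analytics_layer(conn)
refresh_analytics_layer(conn)  # a second run replaces the table instead of failing

result = conn.execute(
    "SELECT city, trips, avg_fare FROM agg_fares_by_city ORDER BY city"
).fetchall()
print(result)
```

If you save something like this as a script, a crontab entry along the lines of `0 6 * * * python3 /path/to/refresh.py` (path hypothetical) would run it every morning at 06:00. Dropping and recreating the table keeps the job idempotent, which matters once a scheduler starts retrying it.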
Day 45 Continue with any uncompleted tasks from the past few days Mini Project
Day 46 Run some basic data quality checks to ensure your data is accurate Mini Project
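To make "basic data quality checks" a bit more concrete, here is a minimal sketch of three common ones: duplicate keys, missing required values, and negative numbers where none are expected. The function `run_quality_checks`, the column names, and the sample rows are assumptions for illustration, not a prescribed set of checks.

```python
def run_quality_checks(rows, key, required, non_negative):
    """Return a dict mapping each check name to the number of failures found."""
    seen = set()
    duplicates = 0
    for r in rows:
        if r[key] in seen:
            duplicates += 1
        seen.add(r[key])

    # A row fails if any required column is missing or None.
    missing = sum(1 for r in rows if any(r.get(c) is None for c in required))

    # Each negative value in a column that should never be negative is one failure.
    negative = sum(
        1
        for r in rows
        for c in non_negative
        if r.get(c) is not None and r[c] < 0
    )
    return {
        "duplicate_keys": duplicates,
        "missing_required": missing,
        "negative_values": negative,
    }


# Hypothetical rows with one problem of each kind.
rows = [
    {"trip_id": 1, "city": "Seattle", "fare": 12.5},
    {"trip_id": 2, "city": None, "fare": 8.0},
    {"trip_id": 2, "city": "Portland", "fare": -3.0},
]

report = run_quality_checks(
    rows, key="trip_id", required=["city"], non_negative=["fare"]
)
print(report)
```

In a real project you would probably run equivalent checks in SQL or a dedicated tool, but writing them by hand once makes it clearer what those tools are doing.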
Day 47 Start to create your dashboard and populate it Mini Project
Day 48 Finish Dashboard Mini Project
Run some final QA and decide how you'd like to display this project (also general
Day 49 catch-up) Mini Project
Day 50 Write a blog post or create a GitHub repo to share your project Mini Project
Day 51 Video To Be Filmed By Seattle Data Guy Tool Intro
1. What Is Apache Spark
2. Downloading And Working With Spark
Day 52 3. Quickstart Spark Spark
1. RDD Programming
Day 53 2. PySpark Tutorial Spark
Day 54 Long PySpark Tutorial Spark
1. Docker Intro And Setting Up Airflow
Day 55 2. Docker In An Hour Docker
1. Airflow Intro
Day 56 2. Airflow Tutorial 2 hour walk through Airflow
1. Set up Airflow yourself on an EC2 instance
Day 57 2. Set up a basic DAG that pulls data from one of these data sources (TODO) Airflow
1. Challenges You Will Face With Airflow
Day 58 2. Common Mistakes You'll Make Setting Up Airflow Airflow
Take some time to review what you've learned thus far or take some time off!
Here are some other things you could do.
1. Write about what you've learned, and what you still don't understand
2. Find a friend who you can teach some of the concepts you've learned (teaching
Day 59 is a great way to learn) Catch Up and Review
Day 60 Same as the prior day Catch Up and Review
1. Intro To Databricks
2. Setting Up Databricks
Day 61 3. Load Data Into Databricks Databricks
1. Databricks Delta Table
Day 62 2. Databricks Delta Table Video Databricks
1. What Is Trino
Day 63 2. Setting Up Trino Trino/Presto
Day 64 Continue setting up Trino and working with it
Day 65 Data Governance Book Data Governance
Day 66 Data Governance Live - Sign Up Data Governance
1. Creating A Data Governance Framework
2. Data Governance for Modern Organizations, Part 1
Day 67 Data Governance
1. What Is A Data Catalog
2. Data Catalog Case Study
Day 68 3. Datahub Purpose And Architecture Data Catalogs And Lineage
1. 6 Pillars Of Data Quality
2. How And Why We Need To Implement Data Quality Now!
Day 69 3. Data Quality And Examples Data Quality
1. Data Quality Examples With SQL
Day 70 2. Data Quality With dbt Data Quality
Start your own project
1. Pick a data set you can pull
2. Plan out what questions you'd like to answer (list out 10-15 questions you'd like
to answer)
3. Pick 4-5 of those questions to focus on
4. Start to plan out how you'll serve up the data/insights (dashboard, ML,
application, etc.)
Day 71 5. Decide on some tools you'd like to use Project Planning
1. Set up your infrastructure, cloud components, Airflow, etc.
2. Set up any database/storage system you will use
Day 72
From here, you'll likely need to take this project on yourself. Create a project plan
for the next 30 or so days. It doesn't have to take all of the next few days, but
really look at this as a time when you can learn and try out lots of ideas. For the
most part you'll take a similar approach: set up your infrastructure, load your
Day 73 data, analyze it, figure out what you'd like to display, etc.
Day 74 to Day 99 Run the project as you've planned it out
Day 100 Write a blog post or create a GitHub repo to share your project
