NoteGPT_Data Analyst Bootcamp for Beginners (SQL, Tableau, Power BI, Python, Excel, Pandas, Projects, More)
you are in the right place to learn to become a data analyst in this massive boot
camp Alex the analyst will cover all the core topics that data analysts need to
know and along the way you'll build plenty of projects to gain hands-on experience
hello everybody my name is Alex Freberg better known as Alex the Analyst on
YouTube and in this video you're going to be taking my entire data analyst boot
camp this boot camp is comprised of videos that I've made over the past 3 years and
they
00:00:26
cover a lot of different topics like SQL Excel Power BI Tableau and Python
throughout the boot camp there are a lot of Hands-On guided projects that will
really help you learn these skills well and speaking of projects there's an entire
Part near the end where you can build a free portfolio website where you can put
all of your projects on so that hiring managers and recruiters can go and look at
all these projects that you've built if you wanted to go even more in depth into
the skills that we
00:00:47
learn in this boot camp I have a data analytics learning platform called Analyst
Builder Analyst Builder was designed specifically for data analysts so all of the
courses and all the content are just for you and it has a coding section where you
can learn and practice for technical interviews and lastly before we jump into the
boot camp I want to give a huge shout out to freeCodeCamp for putting this all
together I personally learned a ton from freeCodeCamp and so I'm really honored
that my boot camp is going to be here for you
00:01:09
guys to learn and I really hope you enjoy it what's going on everybody it is 2023
and in this video I'm going to help you become a data analyst we're going
to start at the very beginning assuming you haven't started this process at all of
becoming a data analyst if you already have you can kind of identify where
you are in this process and then go from there now before we dive into everything I
want to warn you I will be mentioning my own channel a lot in this video I have
00:01:39
videos and playlists on just about every single topic that we're going to be
talking about today I'll have all the links to those videos in the description so
you can dive into those topics more in depth so I hope that's okay and it's all
completely free I've been building this out for the past 3 years and honestly you
can probably get 90% of the way to learning everything you need for data analytics
just on my channel so now that I've warned you let's jump into step number one and that
is learn the data
00:02:01
analyst skills now there are literally a hundred different things that you can
learn for data analytics you can learn things like Alteryx or a cloud platform or
different programming languages but there are some core skills that I recommend you
start out with before kind of branching into some of those other skills the number
one skill that I always recommend people start with is SQL SQL is just one of those
fundamental skills I think everybody should learn even if you don't use SQL you'll
use
00:02:23
some variation of SQL if your company has a large enough data set SQL is used to
actually query and retrieve data from a database so if your company collects data
which every company does they're going to put it somewhere to store it's usually
stored in a database and SQL is how you get that data from the database I think SQL
is also fairly easy to learn which makes it really good when you're just starting
out I have several playlists dedicated to SQL starting from beginner all the way to
Advanced and you
00:02:48
can learn all of that for free one other reason why I think you should learn SQL
first is that a lot of companies interview or have a technical interview during the
interview process on SQL that's something that really caught me off guard when I
was first starting out because I thought it was going to be more behavioral I
didn't even know what a technical interview was so knowing SQL actually became a
really important part of interviewing and getting a job as a data analyst the
second skill that I
00:03:10
would learn is a business intelligence tool like Tableau or Power BI now there are a
ton of different BI tools I can literally name 10 off the top of my head that I've
used throughout my career but what I will say is that learning something like
Tableau or Power BI is pretty transferable to almost all those other BI tools
they're all fairly similar in how they do things and how they display the
data you most likely won't have a technical interview asking you about Tableau or
Power BI like
00:03:35
to build something for them that usually does not happen but the combination of SQL
where you can query your data and then taking that data to build something that is
a really really great combination to learn right away I have entire series on both
Tableau and Power BI with projects on my channel the third skill that I would learn
is Excel now most people have used Excel they know what Excel is and how it's used
but it can be used a little bit differently for a data analyst for example
00:04:00
in Excel a lot of people haven't cleaned data in Excel or built charts and graphs
using Excel and those are things that data analysts would probably do excel is also
just a fundamental skill that every company is going to expect you to know so I
have an entire playlist dedicated to excel to actually walk you through how to use
it for data analysis the fourth skill that I recommend you learn is python now a
lot of people will have python higher up on their list they only use Python they
don't use SQL or a bi
00:04:26
tool they just do everything in Python now python is a fantastic tool you can use
it to manipulate your data to create data visualizations and a ton more like web
scraping and regular expressions and a hundred other things but it can be
kind of hard to learn it took me a long time to really learn the basics very well
that's really the only reason why it is farther back I feel like SQL and a bi tool
are really easy to learn and really pack a big punch whereas python can be quite
tough to learn in my
00:04:53
experience and you may not use it as often as you would something like SQL or a bi
tool if you're interested in learning Python I have an entire series dedicated
to python as well as projects that you can build again I warned you there's going
to be a lot of self-promotion in this video I have videos on just about every
single one of these topics the fifth and last skill that I recommend you
learn and this is the only one that I don't have a series on yet though I will make
one is
00:05:17
learning a cloud platform like AWS Google Cloud platform or Azure there's no
denying that these platforms have had a huge impact on how we use data as a
whole in the data analyst industry they can be kind of tough to learn though if you
aren't using it Hands-On in an actual job I think that learning a cloud platform is
already something that most people should start working towards because in the
future it's only going to become more prevalent now where can you go and actually
learn all of these
00:05:42
skills that you need to become a data analyst well the number one place I'd
recommend of course is my channel I have free tutorials on all these skills and a
lot of other topics and I think it's just a really great place to start the next
place that I recommend you look at is Udemy I recommend Udemy especially if
you're just starting out because it's pretty cheap you can buy an entire
SQL course for $10 or $15 and they have courses on every single one
of these skills and I just
00:06:07
recently made a video called DIY data analyst curriculum using Udemy for under $75
so you can create an entire curriculum to learn all of these skills for under $75
which is just amazing the next place I'm going to recommend you look is Coursera now
Udemy is fantastic they have really good instructors and good courses but as a
whole I find that sometimes Coursera just has more professional or better content
Coursera is a bit more expensive though you're looking at $59 per month for all of
00:06:37
their courses or you can pay upfront an annual fee of $399 so again it's just a lot
more expensive I moved to Coursera once I started having a data analyst job and had
a bit more money but when I was first starting out I just couldn't afford it so I
went to Udemy and it was a really great place to start there are also places like
DataCamp and Dataquest that kind of gamify learning and they're more text based
so all these other platforms Udemy Coursera and my channel they're all video based but if
you like
00:07:03
reading DataCamp and Dataquest are a lot more text based where you can learn by
reading and doing after you learn all of these skills the next thing that I
recommend you do is actually build projects with those skills now what does building
a project actually mean it means taking a skill and then building something out of
it that you can then show a potential employer for example if you went through and
learned Tableau you could go and take a data set and build a visualization and a
dashboard in Tableau and that would be a
00:07:30
project with these projects you can build something called a portfolio and I
usually call it a portfolio website a portfolio website is a website that you
create where you store all of your projects and then you can share that with
recruiters and hiring managers so that they can see all of your work now do you
absolutely need a portfolio to show employers no you don't but it does help in two
different ways the first thing that it may do is actually help you land the
interview if you have a link on your resume and they click on it
00:07:57
they may see your skills and see your projects and be like man this person really
knows what they're doing this is exactly what we need the second reason that I
recommend building projects is because most likely during your interview you're
going to get asked questions like how have you used SQL how have you used Tableau
and if you don't have any experience in that you're just going to say well you know
I've taken courses to learn it but with a project you can be a lot more specific
you'll be
00:08:22
able to say well I actually just built out this project in Tableau I took the data
and cleaned it in Excel and then I put it in Tableau and built out this dashboard
and here are the insights that I found from this data set it's just a much better
answer and as a hiring manager myself I can tell you that it is definitely
beneficial to build out these projects the next step that I recommend you take in
becoming a data analyst is building a data analyst resume the resume to say the
least is extremely important it's what's going to
00:08:49
actually allow you to land an interview to potentially get a job now if you were
like me when I was first starting out I had a resume it just had nothing to do with
data analytics so how do you make a data analyst resume if you don't have any
experience as a data analyst well you are asking the perfect questions because the
very first things that we talked about are what are going to go on your resume
those skills and those projects if you have no experience or degree like myself who
has a recreational therapy degree if you have
00:09:18
no background in this it can be really daunting to kind of display that you know
what you're doing and that a company should hire you so what I usually recommend is
right beneath your contact at the top you put your skills and your projects that
you built out on your resume things like work experience and education should go on
your resume as well but just a little bit lower you want them to see those things
before they see that your last work experience was at Domino's and you have a
degree in
00:09:43
Marine Biology it's just not relevant to data analysis and if you put those things
at the top they're probably going to rule you out right away the fourth step to
become a data analyst is actually applying you have the skills you have the
projects you have the resume now you're ready to start applying for those data
analyst jobs now there are a lot of different opinions on how you need to go
about applying for data analyst jobs but I'll give you my take on it and this has
been
00:10:05
the most successful for me in my career the first thing I want to mention is
actually what I would not do which is just blindly apply on Glassdoor Monster
ZipRecruiter and all these other platforms to just any data analyst job that you can
find now I'm not against this I think you should do that but I don't think that's
the only thing that you should do because the chances of you getting a call back or
actually hearing something back are extremely low to really increase your chance of
becoming
00:10:30
a data analyst I highly highly highly recommend working with a recruiter a
recruiter is literally someone who is there to help you find a job now when I first
started out I didn't understand what a technical recruiter was at all I was kind of
nervous or scared to work with them but it's actually pretty simple a company has a
position that they want to fill and they don't want to spend hours and hours and
hours to find someone to fill that position so they hire a recruiter a recruiter is
going to
00:10:54
go out and try to find someone to fill that position AKA you and so if you go into
talk to that recruiter and they have a position that opens up they will help you
get that interview and then if you get a job let's say for $50,000 the company is
going to pay that recruiter let's say 10% of your salary so they'll give them
$5,000 so you don't actually lose or have anything to lose using a recruiter you
can reach out to recruiters in several ways and I've done every variation but I'll
tell you my
00:11:21
most successful way which was using LinkedIn there are tens of thousands of
recruiters on LinkedIn I made an entire video on how you can reach out to
recruiters and what to say to recruiters on LinkedIn to help you land a job so be sure
to check out that video when you actually get to that point but you can also just
cold email and cold call these recruiting companies but to me it's just not as
effective as reaching out directly on LinkedIn and this is just a bonus one the
last thing that you need
00:11:46
to do is accept a job offer so on step number four after you apply to those jobs
you do actually have to go in interview and then get a job offer which you will
accept I just thought I'd mention that just in case that was not super clear now
that was a lot of stuff let's talk about time frames to actually complete all of
these things now doing all of these things from scratch is going to take a while
but let's break it down by each step and see how long I generally think it's going
to take let's
00:12:11
start with step number one which is actually learning the skills now just to be up
front this one probably is going to take the longest for most people for most
people to learn all of these skills it's going to take around 3 to four months now
if you don't learn a cloud platform and python which are the last ones that I
recommend and you just focus on SQL a BI tool and Excel I think you can do that in under
3 months that is very dependent though on how much time you have to study that time
frame is more
00:12:37
for someone who has several hours per day maybe 3 hours at the end of the night after
you go to work that is someone who has quite a bit of time to dedicate to learning
during their week of course that time frame is going to take longer if you don't
have as much time to dedicate to learning now let's look at number two which was
creating projects and a portfolio of projects from my experience when you're first
starting out it takes a lot longer to actually create these projects it can take
00:12:59
one or two weeks per project I usually recommend people doing three to five
projects in their portfolio before they start applying and since they can take
anywhere from 1 to 2 weeks each you're looking at anywhere from 3 to 6 weeks the next
Step was to create a data analyst resume now in my opinion this one should take the
shortest out of every single step here because you're really just kind of
reformatting a resume or creating a resume you're just adding skills you're adding
your
00:13:22
projects and then kind of reformatting it to make it look nice this should
hopefully take under a week but if you use something like a professional service so
they help you build a resume it could take one to two weeks the two last steps
which kind of go hand in hand are step four and five which is actually applying for
jobs and then landing a job now this process can take as little as a month or it
can take as long as 6 months or a year it really depends on how you're applying
where you're applying
00:13:48
and just the kind of luck that you're having with actually Landing interviews I've
seen people who have never had any experience land a job within a month of starting
to apply and it's incredible it's amazing but it doesn't happen too often you're
usually looking at around 2 to 4 months on average to land your first data analyst
job if you put all of those together and kind of average everything out you're
looking at around 6 months total for the entire process now I don't want that to
discourage you
00:14:14
okay 2023 is a long year you have a lot of time and it doesn't have to take 6
months you could do it faster you could do it in three months and just prove me
wrong but if you are really focused and you are really driven to become a data
analyst this year I know that you can do it now to maybe boost your spirits and
make you feel a little bit better I didn't know any of these things when I first
started out I didn't have anyone telling me kind of a plan on what to do I had to
go out and figure all these
00:14:37
things out by myself and it took me almost a year to land my first real data
analyst job so with all that being said I hope that this video is helpful I hope
you now have a path on how to become a data analyst this year and that my channel
can be a big part of that so thank you guys so much for watching I really
appreciate it if you like this video be sure to like and subscribe below and I'll
see you in the next video what's going on everybody my name is Alex
Freberg and in today's video we're
00:15:12
going to be starting our basics of SQL series now in this series we're going to be
going over everything you need just to get started and then in future videos we're
going to be going over some intermediate Concepts and some more advanced concepts
and then in the final series we're going to be going over some portfolio projects
in this video in particular we're going to be downloading SQL Server Management Studio we're
going to be creating our tables inserting data into our tables and in future videos
we're
00:15:34
going to actually learn how to query those tables if you already have SQL Server
management Studio downloaded you can skip ahead to where we actually create the
tables and insert the data into the tables if you don't care about that at all and
you're just looking to query I would skip to the next video where we actually
start querying the data that we inserted into those tables so to download SQL Server
management Studio we actually have to download two things and I have both links
right here I'm going
00:15:55
to leave those in the description so that you guys have those but this one is to
actually download SQL Server management studio so let's go down here I actually
deleted it off my computer so I can walk through this with you guys so we're going
to download that let's also go over here this is actually a server so we have to
download a SQL server and if you go down right here there's a free version now I
don't need the developer version I'm just going to download the express version
it's actually smaller so
00:16:20
let's download that as well now once this is done running we're going to open it up
and I'll show you what to do next so it just finished running let's click on it all
right so we need to install it we're going to click yes and this is going to take a
little while so this popped up I clicked install and it's been running for the past
couple minutes apparently I was not recording so I apologize for that but that's
all I did so now it's been installed I'm actually
00:16:50
going to pull it up right here and let's open it up now when it pulls up it's going
to ask you to connect to a server and that's why we downloaded the SQL Express
server so let's connect to that and there you go it's as easy as that so now we
have SQL Server management Studio set up and we are good to go so the first thing
that we need to do is actually create a database so let's go over here to databases
and let's click new database and let's just do SQL tutorial keep it simple and if
we click
00:17:31
that it's going to create our database for us now when you open up the database
there's going to be a lot of stuff you really do not need to know all this really
what we're going to be sticking to is this tables right here uh as of right now we
do not have any tables so we need to create tables now there's two ways that you
can do that you can click right here and you can go to new and create table we're
not actually going to do that we're going to create it using a T-SQL script so
we're going to go
00:17:58
over here and do new query and we will get started on actually creating uh the two
tables that we're going to be using for all the stuff going forward all right so
let's get rid of me CU you really don't need to be seeing me anymore let's get
started by doing our very first table which is going to be our employee
demographics table so let's start off by saying create table and we have to name it
so let's do employee demographics and enter down we want to do an open parenthesis
now we
00:18:27
need to specify what our column names are going to be and what the data type is for
each column so let's start off with employee ID and we want that to be an integer
so that'll be like 1 2 3 4 uh anything numeric now we want to do first name and
let's make that varchar(50) if you don't know what these data types are that's okay uh
that will probably be covered in a different video that's not really necessary for
this video uh let's do last name we'll also make that varchar(50) let's do age make
00:19:04
that an integer and very last let's do gender and we will make that varchar(50) as
well so now we have our very first table let's run that and we'll see if it works
we'll go over here we'll refresh our tables and there you go so we have our very
first table let's go up here let's get rid of this one and now let's create our
second table so we're going to do basically the exact same thing but we're going to
have a little bit different information in it this is going to be
00:19:39
our employee salary table so let's do create table and again we need to name it and
enter and open parenthesis so now we're going to do the same thing we're going to
do employee ID let's make that an integer now we want the job title because we want
to know what they do and this one is going to be varchar(50) because we keep it pretty
simple whoops and then for our very last one we're going to do salary and that will
be integer as well and I'll just close the parenthesis here so let's create this
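A minimal sketch of the two create table statements described above. It assumes Python's built-in sqlite3 module with an in-memory database standing in for SQL Server, and the table and column spellings (EmployeeDemographics, EmployeeSalary, EmployeeID and so on) are guesses, since the walkthrough only speaks the names aloud.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the SQL Server
# database created in the walkthrough. Names are guessed spellings.
conn = sqlite3.connect(":memory:")

# First table: one column name plus one data type per line.
conn.execute("""
    CREATE TABLE EmployeeDemographics (
        EmployeeID INT,
        FirstName  VARCHAR(50),
        LastName   VARCHAR(50),
        Age        INT,
        Gender     VARCHAR(50)
    )
""")

# Second table: same pattern with different columns.
conn.execute("""
    CREATE TABLE EmployeeSalary (
        EmployeeID INT,
        JobTitle   VARCHAR(50),
        Salary     INT
    )
""")

# List the tables that now exist, similar to refreshing the Tables folder.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)
```

The CREATE TABLE text itself should read much the same in T-SQL; only the surrounding Python and the sqlite_master lookup are specific to this sketch.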
00:20:25
table let's see if it is there and there we go so let's open up one of these tables
really quick see what's in there see what it looks like as you can see we do not
have any information in there uh when you create a new table sometimes when you
open it up you're going to see this if you want to get rid of that you just need to
do I think it's called a hard refresh or something like that but you can do
Ctrl+Shift+R let's see if it works for me I just did it all right it
00:20:55
goes away so now it recognizes it as a table so we're good there let's go back here
and let's get rid of all this we've already created our tables now we want to
insert the data into our tables so let's see what that looks like let's do insert
into and now we need to specify what table we're inserting our data into so let's
start off with employee demographics let's do values so now we have to select what
values we're going to put into um into this table so now we're going to have to do
the
00:21:32
employee ID so let's do 101 then we'll do first name so let's do Jim last name
Halpert and then his age let's say he's 30 and he is a male now just for fun let's
execute that let's go back to this table right here and execute and as you can see
all of our information actually went in there so now we have his employee ID his
first name his last name age and gender now we need a lot more information uh for
this table in order to actually learn a lot of the concepts of querying the table so
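A rough sketch of the insert into statement just described, again assuming SQLite through Python's sqlite3 module rather than SQL Server. Only the Jim Halpert row comes from the walkthrough; the other two rows and the column spellings are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeDemographics (
        EmployeeID INT, FirstName VARCHAR(50), LastName VARCHAR(50),
        Age INT, Gender VARCHAR(50)
    )
""")

# The one row spelled out in the walkthrough. Values are listed in the same
# order as the columns were declared.
conn.execute(
    "INSERT INTO EmployeeDemographics VALUES (101, 'Jim', 'Halpert', 30, 'Male')"
)

# Several rows can go in with a single statement by separating the value
# groups with commas. These two rows are hypothetical placeholders.
conn.execute("""
    INSERT INTO EmployeeDemographics VALUES
        (102, 'Ann', 'Example', 41, 'Female'),
        (103, 'Bob', 'Sample', 27, 'Male')
""")

rows = conn.execute(
    "SELECT * FROM EmployeeDemographics ORDER BY EmployeeID"
).fetchall()
for row in rows:
    print(row)
```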
00:22:15
I'm actually going to go through and add a ton more information I'm not going to
bore you through that but I will show you the final product before I actually hit
execute so stick with me I'm actually just going to cut to the end where I insert
all my stuff down on here and then if you want that I'll probably leave it in the
description or maybe put in my GitHub or something so you can easily just go copy
and paste that if that's what you want to do so I'll see you in a few
00:22:38
seconds all right so I have all my values right here I'm actually going to take this
one out because I already did that one but this is our additional information let's
insert that into our table real quick and go back here and take a look at it and
there you go this is going to be our core information that we are querying off of
uh in future videos so that table is completely finished let's go back here we're
going to get rid of this because now we want to insert our information to our other
table so let's do insert into
00:23:11
and let's do employee and now we're going to do salary so let's do values to
specify that we're inserting values into there and in this one we have employee ID
so again let's do the same employee ID as before that's Jim his job title is salesman and let's say his
salary is $45,000 and let's execute that and you can't see it but down here it says
it's done let's go to that table and as you can see that is inserted I'm going to
do the exact same thing as I did before I am going to fill
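The salary insert can be sketched the same way. This version assumes SQLite via Python's sqlite3 module and uses executemany with placeholders, which is a Python convenience and not something shown in the walkthrough. Only the first row (salesman, 45,000) is from the source; the rest are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle VARCHAR(50), Salary INT)"
)

salary_rows = [
    (101, "Salesman", 45000),      # the row described in the walkthrough
    (102, "Accountant", 52000),    # hypothetical placeholder
    (103, "Receptionist", 36000),  # hypothetical placeholder
]

# One INSERT statement is run per tuple, with ? standing in for each value.
conn.executemany("INSERT INTO EmployeeSalary VALUES (?, ?, ?)", salary_rows)
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM EmployeeSalary").fetchone()[0]
print(inserted)
```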
00:23:52
out all these and in a second it will be done uh on your side and then again I will
leave it in the description or I'm going to put it on my GitHub and you guys can
just copy and paste that if that's what you want to do or you can write it out
whatever you want to do all right just like before I'm going to get rid of this
first one that is Jim he is already done now let's insert this information and it is
finished let's go back here and there we go now we have both of our tables and we
are good to go for
00:24:21
future videos so thank you so much for sticking all the way through this one in the
next video we're going to actually begin uh querying the table and learning the
select the from the where the group by and the order by statements everything is in
these upcoming videos so stick around and we will learn all of that together thank
you so much for joining me if you like this type of content be sure to subscribe
below and I'll see you in the next video what is going on everybody my name is Alex
freeberg and
00:24:47
in today's video we're going to be going over the select and the from statement so
if you joined us for our last video we went over creating our tables and inserting
data into those tables and so we have this employee demographics table and we also
have this employee salary table and today we're going to be walking through the
select statement and the from statement on these tables so here are some of the
concepts that we're going to be going over today let's just get it started by doing
select
00:25:14
everything and let's do this from the employee demographics table so let's execute
this if we wanted to only show the first names we can just do first name and run
that and if we want first name and last name we can just separate that by using a
comma and it will return those well if we want to return all columns and all rows
then all we have to do is use this star so that's what the star does now we have
nine rows of data here and if we only wanted to return let's say the top five we
can easily do that and we can
00:25:53
just say top five of everything now the reason this could be useful is say you have
a table that has millions of rows in it and you only want a small sample you can
say select top 1,000 and it will only return the first 1,000 rows here top five only returns the first five rows now
let's get everything back in here really quick because we're going to move on to
this distinct feature so when we use distinct we're actually saying that we want
the unique values in a specific column so if we say distinct and then let's do
employee ID
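A small sketch of the select variations covered so far: everything, one column, two columns, and a limited sample. It assumes SQLite through Python's sqlite3 module, where the T-SQL form SELECT TOP 2 * is written with LIMIT 2 at the end instead. The table contents besides Jim Halpert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeDemographics (
        EmployeeID INT, FirstName VARCHAR(50), LastName VARCHAR(50),
        Age INT, Gender VARCHAR(50)
    )
""")
conn.executemany("INSERT INTO EmployeeDemographics VALUES (?, ?, ?, ?, ?)", [
    (101, "Jim", "Halpert", 30, "Male"),         # from the walkthrough
    (102, "Ann", "Example", 41, "Female"),       # hypothetical
    (103, "Bob", "Sample", 27, "Male"),          # hypothetical
    (104, "Cara", "Placeholder", 35, "Female"),  # hypothetical
])

# The star returns every column of every row.
all_rows = conn.execute("SELECT * FROM EmployeeDemographics").fetchall()

# Naming columns returns only those columns, separated by commas.
first_names = conn.execute(
    "SELECT FirstName FROM EmployeeDemographics").fetchall()
full_names = conn.execute(
    "SELECT FirstName, LastName FROM EmployeeDemographics").fetchall()

# T-SQL: SELECT TOP 2 * FROM EmployeeDemographics
# SQLite has no TOP keyword, so LIMIT plays the same role here.
top_two = conn.execute(
    "SELECT * FROM EmployeeDemographics LIMIT 2").fetchall()

print(len(all_rows), len(top_two))
```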
00:26:29
everything should be returned so all nine rows should be returned and that's
because every single one of these are unique now let's try gender so there's only
going to be two results the male and the female and that's because there's only two
distinct values in that column now let's look at all of our data again so now we
want to look at count now count is very simple all it's going to do is show
us all the non null values in a column so let's look at last name for example if we
do count of last
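To illustrate distinct, here is a short sketch under the same assumptions as before (SQLite via Python's sqlite3, guessed column names, placeholder rows apart from Jim Halpert). Distinct on a column where every value is unique returns every row, while distinct on gender collapses to the two values present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeDemographics (
        EmployeeID INT, FirstName VARCHAR(50), LastName VARCHAR(50),
        Age INT, Gender VARCHAR(50)
    )
""")
conn.executemany("INSERT INTO EmployeeDemographics VALUES (?, ?, ?, ?, ?)", [
    (101, "Jim", "Halpert", 30, "Male"),         # from the walkthrough
    (102, "Ann", "Example", 41, "Female"),       # hypothetical
    (103, "Bob", "Sample", 27, "Male"),          # hypothetical
    (104, "Cara", "Placeholder", 35, "Female"),  # hypothetical
])

# Every EmployeeID is unique, so DISTINCT returns as many rows as the table has.
distinct_ids = conn.execute(
    "SELECT DISTINCT EmployeeID FROM EmployeeDemographics").fetchall()

# Gender only has two different values, so only two rows come back.
distinct_genders = conn.execute(
    "SELECT DISTINCT Gender FROM EmployeeDemographics ORDER BY Gender"
).fetchall()

print(len(distinct_ids), distinct_genders)
```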
00:27:07
name all that's going to give us is a count of nine because we have nine last names
if for whatever reason somebody's last name was left out and that was null then it
would have returned maybe eight or seven depending on how many were actually in
there so if an entire column was null it would return zero and if you
notice we are not given a column name that's because this is derived information
based off the last name so if we want to actually give this a name so that that
column does not say
00:27:37
no column name we can use this AS right here so once you put AS you can actually
name it so since this is the count of the last name we'll write last name count
keep it simple and if we execute that as you can see we have last name count right
there so that's how you use AS let's look at all of our data again we want to
look at some maxes mins and averages right now and the only column here where it
would be useful to do it on is age but let's actually go over and let's look at our
salary table
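A sketch of count and AS, assuming SQLite via Python's sqlite3 and placeholder data. One last name is deliberately left null to show that count of a column skips nulls. Note that an unaliased expression is labeled differently by SQLite than by SQL Server Management Studio, so only the aliased name is checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeDemographics (
        EmployeeID INT, FirstName VARCHAR(50), LastName VARCHAR(50),
        Age INT, Gender VARCHAR(50)
    )
""")
conn.executemany("INSERT INTO EmployeeDemographics VALUES (?, ?, ?, ?, ?)", [
    (101, "Jim", "Halpert", 30, "Male"),    # from the walkthrough
    (102, "Ann", "Example", 41, "Female"),  # hypothetical
    (103, "Bob", None, 27, "Male"),         # hypothetical, last name left NULL
])

# COUNT(column) counts only the non-null values in that column,
# and AS gives the derived column a readable name.
cur = conn.execute(
    "SELECT COUNT(LastName) AS LastNameCount FROM EmployeeDemographics"
)
column_name = cur.description[0][0]
last_name_count = cur.fetchone()[0]

# COUNT(*) counts rows regardless of nulls, for comparison.
row_count = conn.execute(
    "SELECT COUNT(*) FROM EmployeeDemographics").fetchone()[0]

print(column_name, last_name_count, row_count)
```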
00:28:15
and at our salary table we have some really interesting salaries that I think would
be a little bit more useful for this information so let's go over to employee
salary all right and let's look at this table really quick so we have our
salary now we want to look at the maximum salary that is in uh that column and that
is going to be $65,000 now let's say we wanted to know what the minimum salary was
let's execute this and the person who makes the least money is making $36,000 now
what's the average what is
00:28:54
the average salary for all employees that's going to be roughly $48,500 so super easy to
use all of these things they're extremely useful I use them every single day so I
know that each of these are very very useful and are definitely among the basics
that you have to know let's look at everything really quick so we just
learned the select statement but learning this from statement is also
important up here this actually shows us that we're already hitting off the SQL
tutorial database
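The max, min and average functions can be sketched as follows, assuming SQLite via Python's sqlite3. The three salaries are illustrative (45,000 is Jim's from the walkthrough, and 65,000 and 36,000 match the max and min mentioned), so the average here will not match the nine-row table in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle VARCHAR(50), Salary INT)"
)
conn.executemany("INSERT INTO EmployeeSalary VALUES (?, ?, ?)", [
    (101, "Salesman", 45000),      # from the walkthrough
    (102, "Manager", 65000),       # hypothetical row using the stated maximum
    (103, "Receptionist", 36000),  # hypothetical row using the stated minimum
])

# Each aggregate collapses the whole Salary column into a single value.
max_salary, min_salary, avg_salary = conn.execute(
    "SELECT MAX(Salary), MIN(Salary), AVG(Salary) FROM EmployeeSalary"
).fetchone()

print(max_salary, min_salary, round(avg_salary, 2))
```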
00:29:28
but let's say we change it to master when we try to run this it's going to give us
an error and that's because now we're hitting off this database and this database
does not have this table in it so in order to still hit off
that table while up here we're actually hitting off a different database we can change
this information so in the FROM statement you have to specify three separate things
the first thing that you need to specify is the database so let's
00:29:54
say we want to hit off the SQL tutorial database now we want to select what table
we're going to do this is actually a dbo so let's put dbo there's a lot
that can go into that it's not worth getting into now but dbo dot and let's do
employee salary when we execute this our information comes up even though up here
we're still hitting off the master database when we specify it right here then we
actually are choosing what database and what table to hit off of and
00:30:29
so it does not matter what it is up here so that's how you use the from statement
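The MAX, MIN and AVG queries and the fully qualified FROM described above can be sketched like this. It is a hedged sketch rather than the exact setup from the video: it uses Python's sqlite3 module, ATTACH DATABASE is SQLite's rough analogue of naming the database in the FROM clause, and the nine salary rows are assumed sample data picked to agree with the figures quoted in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite analogue of naming the database: attach a second database under a name.
conn.execute("ATTACH DATABASE ':memory:' AS SQLTutorial")
conn.execute(
    "CREATE TABLE SQLTutorial.EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT)"
)
# Assumed sample rows, chosen to match the max, min and average mentioned above.
conn.executemany(
    "INSERT INTO SQLTutorial.EmployeeSalary VALUES (?, ?, ?)",
    [
        (1001, "Salesman", 45000), (1002, "Receptionist", 36000),
        (1003, "Salesman", 63000), (1004, "Accountant", 47000),
        (1005, "HR", 50000), (1006, "Regional Manager", 65000),
        (1007, "Supplier Relations", 41000), (1008, "Salesman", 48000),
        (1009, "Accountant", 42000),
    ],
)

# The table is addressed as database.table, whatever the default database is.
max_salary, min_salary, avg_salary = conn.execute(
    "SELECT MAX(Salary), MIN(Salary), AVG(Salary) FROM SQLTutorial.EmployeeSalary"
).fetchone()
print(max_salary, min_salary, avg_salary)
```

In SQL Server the same idea is written with three parts, as in the video: SQLTutorial.dbo.EmployeeSalary. SQLite only has the two-part database.table form.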
in the next video we're going to be going over the WHERE statement and then after
that the GROUP BY and ORDER BY statements and that will be the complete basics of
SQL and then we'll start getting into a little bit more fun stuff some
more advanced concepts which I think will be really really exciting for everybody to
learn thank you guys so much for joining me I really appreciate it I hope this has
been helpful if you like
00:30:53
this type of content subscribe below and I'll see you in the next video thanks and
goodbye what's going on everybody my name is Alex Freberg and in this video we're
going to be going over the WHERE statement in SQL in the very first video we created
our table inserted data into our table in the second video we went over the SELECT
and the FROM statement and now we are on to the WHERE statement now what does the
WHERE statement do it helps limit the amount of data and specify what data you want
returned we have
00:31:19
quite a few Concepts that we're going to be covering today let's just start out
with something really easy let's do where first name equals Jim really simple so
we're selecting everything where our first name equals Jim and this is our output
so really really simple now let's try where it does not equal this right here says
does not equal Jim and let's execute that and as you can see we have everybody
except Jim Halpert in there so now let's look at the greater than or
00:31:51
less than so in this table I think the one that we're going to look at is age so
let's look at age and let's do where it's greater than 30 and when we execute that
we're going to get everyone who is over the age of 30 now as you can see we're not
including people who are 30 years old if we want to include people who actually are
30 years old we're going to add the equal sign right there so we should be seeing
people who are now 30 so before Pam and Jim were not in there and now
00:32:21
they are if we do the exact same thing let's do less than 32 here's everyone that's
going to be included but if we want to include the people who are 32 years old then we
are just going to add that equal sign and now the people who are 32 years old like
Toby and Meredith are now included if we want to go even further we want people who
are less than or equal to 32 and who are male we can add AND gender equals male
so now we have two things that we are specifying that we need we need
00:33:00
somebody whose age is less than or equal to 32 and we need their gender to be male so let's
execute that and we have four people who meet that criteria so that's what the AND
statement does if we write OR then only one of these criteria has to be correct in
order for it to be met so if we hit execute now we're saying anybody whose age is under
or equal to 32 or whose gender equals male so if we look down here Michael
Scott is actually 35 years old so he's over 32 but since he is male he is now
included let's get rid
00:33:36
of everything really quick I want to look at LIKE really quick so let's
execute just that and if you do that you highlight just that hit execute then it
will only run what you have highlighted so now let's look at this whole table
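The comparison filters and the AND and OR combinations described above can be sketched as follows. This is a minimal sketch using Python's sqlite3 module in place of SQL Server; the nine rows and the first_names helper are assumptions for illustration, with the rows chosen to agree with the names and ages mentioned in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeDemographics "
    "(EmployeeID INT, FirstName TEXT, LastName TEXT, Age INT, Gender TEXT)"
)
# Assumed sample rows.
conn.executemany(
    "INSERT INTO EmployeeDemographics VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "Jim", "Halpert", 30, "Male"), (1002, "Pam", "Beasley", 30, "Female"),
        (1003, "Dwight", "Schrute", 29, "Male"), (1004, "Angela", "Martin", 31, "Female"),
        (1005, "Toby", "Flenderson", 32, "Male"), (1006, "Michael", "Scott", 35, "Male"),
        (1007, "Meredith", "Palmer", 32, "Female"), (1008, "Stanley", "Hudson", 38, "Male"),
        (1009, "Kevin", "Malone", 31, "Male"),
    ],
)

def first_names(condition):
    """Return first names of the rows matching a WHERE condition (hypothetical helper)."""
    # Pasting a condition into SQL is fine for fixed demo strings, never for user input.
    sql = f"SELECT FirstName FROM EmployeeDemographics WHERE {condition} ORDER BY EmployeeID"
    return [row[0] for row in conn.execute(sql)]

only_jim = first_names("FirstName = 'Jim'")
not_jim = first_names("FirstName <> 'Jim'")
over_30 = first_names("Age > 30")        # 30-year-olds are left out
at_least_30 = first_names("Age >= 30")   # the equal sign brings them back
both = first_names("Age <= 32 AND Gender = 'Male'")   # both conditions must hold
either = first_names("Age <= 32 OR Gender = 'Male'")  # one condition is enough
print(both)
print(either)
```

With these rows Michael only shows up in the OR result: he is over 32, but he is male.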
when you're using LIKE you typically are doing this for sometimes numerical but
most of the time you're using it for text information so if we're looking at this
right here if I'm looking at last names and let's say I want everybody whose
00:34:06
last name starts with s you can't really do that with anything else so I'm going to
say where it's LIKE and then I'm going to say S and after that I'm going to put a
percent sign that's actually called a wild card and if I close that off what this
is saying is I want every last name where
it starts with an S so let's run this really quick now we have two people
whose last names start with s now if I put a wild card at the beginning we
00:34:39
are now saying where there's an S anywhere in anybody's name so let's execute this
and see what we get so now even if the S is like Flenderson towards the end it
still counts so you can specify multiple things in here as well so let's say I want
it to start with S that would return Schrute and Scott but now I want something that
also has an o in it so it has an S at the beginning and then somewhere in there
there's an O now let's execute that and there's only one person that meets that
criteria so
00:35:11
you can do that for multiple things you can even say o t t and let's execute that
and he's still going to be returned and if we put c at the back it's not going to
be returned because it follows it in order so it isn't S o t t c the c would actually
need to go over here so now we have S c o t t and although there's a bunch of wild
cards in here it is going to return Scott so that is a little hint at
how you can use like there is a little bit more that goes into it you
00:35:49
can use it for numerics there's a lot of things that you can use this for but
this is just the basics of how you can use it today and how you get started on using
LIKE in a nutshell that is how you use LIKE and as I said before you can use LIKE with
numerical data as well but for demonstration purposes I wanted to use text data
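The LIKE patterns walked through above can be sketched like this. It is a minimal sketch using Python's sqlite3 module rather than SQL Server; the last names and the last_names helper are assumptions for illustration. Note that LIKE is case-insensitive for ASCII letters in SQLite by default, which is also the usual behaviour under SQL Server's default collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeDemographics (FirstName TEXT, LastName TEXT)")
# Assumed sample rows.
conn.executemany(
    "INSERT INTO EmployeeDemographics VALUES (?, ?)",
    [
        ("Jim", "Halpert"), ("Pam", "Beasley"), ("Dwight", "Schrute"),
        ("Angela", "Martin"), ("Toby", "Flenderson"), ("Michael", "Scott"),
        ("Meredith", "Palmer"), ("Stanley", "Hudson"), ("Kevin", "Malone"),
    ],
)

def last_names(pattern):
    """Return last names matching a LIKE pattern; % stands for any run of characters."""
    sql = "SELECT LastName FROM EmployeeDemographics WHERE LastName LIKE ? ORDER BY LastName"
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_s = last_names("S%")        # S at the start
contains_s = last_names("%S%")          # an s anywhere
s_then_o = last_names("S%o%")           # starts with S, has an o later on
s_then_ott = last_names("S%ott%")       # starts with S, has "ott" later on
c_too_late = last_names("S%ott%c%")     # the c would have to come after "ott"
c_in_order = last_names("S%c%ott%")     # S, then c, then "ott", in that order
print(starts_with_s, s_then_o, c_too_late, c_in_order)
```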
let's get rid of this really quick let's look at our entire table and I wanted
to show you how to use NULL and NOT NULL I can't really show you how to use NULL
because I do
00:36:21
not have any null fields I could easily update this table and make some null but that's in
a future video where it's a little bit more advanced where you can start altering
your data but just for purposes of showing you what NULL and NOT NULL is let's do
where first name is null and if we run that it is not going to return anything but if
we say is not null it's going to return everything because nothing in here is null
nothing in this first name column is null so that's how you use it um there are a
lot
00:36:58
of use cases where you actually will use null and not null that will be in future
videos probably in the project section or the portfolio section we weren't able to
show really how to use this super well but just as a demonstration that's really
all it does it looks at the whole column and whether it is null or not null that's
really all it's used for this is actually super useful and you can use it in a ton
of situations but again for demonstration purposes that's really all it does so
let's get rid of
00:37:25
this let's look at IN really quick so IN is kind of like the equal statement but
it's multiple equal statements so let's say we want to say where first name equals Jim
and then we were like wait we also want to include Michael Scott so then we would
have to write or first name equals and then we would do Michael and then etc
etc for anybody that we wanted to include but if we said IN we could do an open
parentheses and then we can say Jim we can say Michael and we can say as many people
as
00:38:08
we want going down the road just separating it by commas and if we hit execute
everything would be returned so it really is just a condensed way to say equal for
multiple things so that is the WHERE statement I think the WHERE statement can get
extremely complex but this really is highlighting the basics so if you can learn
all of these concepts you will absolutely have the basics down and will be set to
go over some more intermediate and more advanced things with the WHERE statement later
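The IN shorthand described above can be sketched as follows. This is a minimal sketch using Python's sqlite3 module in place of SQL Server, and the four sample rows are assumptions; it simply checks that IN returns the same rows as the chain of equals joined with OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT)")
# Assumed sample rows.
conn.executemany(
    "INSERT INTO EmployeeDemographics VALUES (?, ?)",
    [(1001, "Jim"), (1002, "Pam"), (1003, "Dwight"), (1006, "Michael")],
)

# Long form: one equals per name, joined with OR.
with_or = conn.execute(
    "SELECT FirstName FROM EmployeeDemographics "
    "WHERE FirstName = 'Jim' OR FirstName = 'Michael' ORDER BY EmployeeID"
).fetchall()

# Condensed form: a comma-separated list inside IN (...).
with_in = conn.execute(
    "SELECT FirstName FROM EmployeeDemographics "
    "WHERE FirstName IN ('Jim', 'Michael') ORDER BY EmployeeID"
).fetchall()
print(with_in)
```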
on in the next video
00:38:37
we're going to be going over the GROUP BY and the ORDER BY and then we are done
with the SQL Basics and then you can practice and work your way up into my
intermediate level videos which are going to be coming out very shortly after these
videos thank you guys so much for joining me if you like this tutorial series be
sure to subscribe below and I'll see you in the next video what's going on everybody my
name is Alex Freberg and in today's video we're going to be going over the group
by and
00:39:00
the order by statements in previous videos we created tables we went over the select
the from and the where and now we are at the very end of our SQL basic series if
you stayed with us for the whole time hopefully you have learned a lot and learned
the basics of SQL in future videos we're going to be going over intermediate and
even more advanced concepts and even going through portfolio projects that you can
use to put on your resume if you like this type of content be sure to subscribe
below
00:39:27
but let's get into it for today the group by statement is similar to distinct in
the select statement in that it's going to show the unique values in a column the
difference is if we say distinct gender what's going to be returned is the very
first unique value of female and the very first unique value of male but if we say
gender and we say Group by gender it's only going to return two values but in these
two values we actually have all the males rolled up into this one row and all the
females
00:40:07
rolled up into this one row now let me further show you what that means if I say
count of gender now you can see that this whole time there were six males in this
one row and there were three females in this one row so with a distinct it really
is only showing us what value is in there that's unique but with the group by it's
showing us what the unique value is but it's also rolling them all up into one
row so that we can use it for other things now real quick I want to be able to see
both of these at the same time so
00:40:42
let's just put this up here and let's run this so we can actually see both now
let's add age to this statement down here or this query and let's only run this one
and I want to show you what happens and why it happens we're now looking at gender
age and then the count of gender so if we look down here we only have one male who
is 29 we have one female that's age 30 and so on and so forth so none
of these people are both the same gender and the same age if for example
00:41:22
we had two or three people who were male and who were 30 years old then we would
have a two or a three over here so this count is actually being counted at each row
that's being returned so for our data that we have today this isn't a fantastic
example because it really split it out and there aren't any that were the same but as you can see
you can put multiple columns as long as you put multiple down here now why did we
not have to put this count gender down here in this group by that's because this
count gender is
00:41:54
actually a derived field or derived column it's derived based off the gender column
so it's technically not a real column that's in the table it's one that we're
creating that's fictional per se so the age and the gender are actual fields or
actual columns that are in our table they have to be down here and like I said
before it's the comparison to that distinct in the select statement because we're
looking at the distinct of gender and age so we're saying distinct
00:42:25
across multiple columns both gender and age now as we had it before we were only
looking at gender it's going to roll all of those up into just male and female but
if we want to add more we can easily add more in this group by statement we can
still do things like where age is greater than 31 we can still do those things so
let's execute this and our numbers are going to change now we're doing it based off
gender and we're looking at the count of people whose age is greater than 31
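The GROUP BY queries described above can be sketched like this. It is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server; the nine rows are assumed sample data, chosen to agree with the six males and three females mentioned in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT, Age INT, Gender TEXT)"
)
# Assumed sample rows.
conn.executemany(
    "INSERT INTO EmployeeDemographics VALUES (?, ?, ?, ?)",
    [
        (1001, "Jim", 30, "Male"), (1002, "Pam", 30, "Female"),
        (1003, "Dwight", 29, "Male"), (1004, "Angela", 31, "Female"),
        (1005, "Toby", 32, "Male"), (1006, "Michael", 35, "Male"),
        (1007, "Meredith", 32, "Female"), (1008, "Stanley", 38, "Male"),
        (1009, "Kevin", 31, "Male"),
    ],
)

# One row per gender, with every matching row rolled up into the count.
by_gender = dict(conn.execute(
    "SELECT Gender, COUNT(Gender) FROM EmployeeDemographics GROUP BY Gender"
))

# Grouping by two columns counts each distinct (gender, age) pair.
by_gender_and_age = conn.execute(
    "SELECT Gender, Age, COUNT(Gender) FROM EmployeeDemographics GROUP BY Gender, Age"
).fetchall()

# WHERE filters the rows before they are grouped.
over_31 = dict(conn.execute(
    "SELECT Gender, COUNT(Gender) FROM EmployeeDemographics WHERE Age > 31 GROUP BY Gender"
))
print(by_gender, over_31)
```

The derived COUNT(Gender) column is not listed in GROUP BY; only the real columns in the select list, Gender and Age, have to be.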
00:42:59
which is smaller than before now let's look at ORDER BY I'll do it down here really
quick for demonstration but I am eventually going to come up here and use it
because I think it'll be a little bit better to completely round out this query
down here let me give this a name let's do count of gender and then let's come down
here and let's order by count gender and when we run that it's
going to do 1 then 3 and that's because as a default SQL has an ascending feature
00:43:34
which is going to be smallest to largest going down if we want to change that we
can change it to descending that's going to be largest to smallest so now we have
3 then 1 and if we want to do it based off gender and we do it descending now we have Z
to A and so that's going to be male female and if we get rid of that it's going to
do the default ascending and let's see what that brings female male now for
what we're trying to do let's look at this large table so I think it's going to be
a little bit more
00:44:08
descriptive or a little bit better visually let's do order by and let's do age
let's run this and it's going to order smallest to largest if we do descending it's
going to do largest to smallest now you don't only have to do just one thing you
can do multiple columns so if I wanted to do age and then gender I can do that as
well so let's do gender and let's run that so now we have the age but under the age
we also have it ordered by female and that's an ascending order so A B C D E F so
females
00:44:52
first so it's going to be female first and then it's going to be male and again
female and male now we don't have to just let it be ascending for each one if I
wanted to do it reverse in this column I can do descending now let's run that and
when we have 30 now male is first and female second and if I wanted to do that over
here I can do descending and now we have them both descending so it's going to go
top to bottom and we have 32 it's going to be male 32 female so you
00:45:26
can specify lots of different things in here and we don't actually have to use
column names we could just use numbers so if I wanted to do 1 2 3 4 5 I could but
let's try to replicate the exact same thing as before this would be column 1 2 3
4 so let's do order by 4 descending and then let's do 5 descending and if we
execute that it's going to give us the exact same result as if we'd actually put in
the column name and I do use this a lot oftentimes I don't use the column name I
00:46:00
just if it's a small table I'll just use the number so in my actual queries I do
this a lot where I just use the number instead of the column name so that is the
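The ORDER BY variations described above can be sketched as follows. This is a minimal sketch using Python's sqlite3 module in place of SQL Server, with assumed sample rows; it orders by column names, by column positions, and by an aliased count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeDemographics "
    "(EmployeeID INT, FirstName TEXT, LastName TEXT, Age INT, Gender TEXT)"
)
# Assumed sample rows.
conn.executemany(
    "INSERT INTO EmployeeDemographics VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "Jim", "Halpert", 30, "Male"), (1002, "Pam", "Beasley", 30, "Female"),
        (1003, "Dwight", "Schrute", 29, "Male"), (1004, "Angela", "Martin", 31, "Female"),
        (1005, "Toby", "Flenderson", 32, "Male"), (1006, "Michael", "Scott", 35, "Male"),
        (1007, "Meredith", "Palmer", 32, "Female"), (1008, "Stanley", "Hudson", 38, "Male"),
        (1009, "Kevin", "Malone", 31, "Male"),
    ],
)

# Order by column names: oldest first, and within one age Male before Female.
by_name = conn.execute(
    "SELECT * FROM EmployeeDemographics ORDER BY Age DESC, Gender DESC"
).fetchall()

# Same ordering by position: Age is column 4 and Gender is column 5.
by_position = conn.execute(
    "SELECT * FROM EmployeeDemographics ORDER BY 4 DESC, 5 DESC"
).fetchall()

# Ordering by an aliased count; ascending is the default.
counts_ascending = conn.execute(
    "SELECT Gender, COUNT(Gender) AS CountGender FROM EmployeeDemographics "
    "WHERE Age > 31 GROUP BY Gender ORDER BY CountGender"
).fetchall()

genders_at_32 = [row[4] for row in by_name if row[3] == 32]
print(by_name[0], genders_at_32, counts_ascending)
```

Column positions are quick to type, but they silently change meaning if the select list changes, so column names are the safer choice in queries that will be kept around.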
GROUP BY and the ORDER BY statement and if you have walked through my previous
videos you should be completely done with the basics of SQL so congratulations the
next thing to do is really just practice the basics because the basics are what
you're going to be using day in day out and so what I would recommend is create a
few more tables
00:46:24
query those tables try to think of use cases and what you would actually want to
know from that information after that I would move on to my intermediate videos if
those are already out and then I would move on to my Advanced videos those are
going to go over some more challenging topics but things that would be very useful
for anybody to know in my next video I'm going to be going over intermediate SQL
topics things like joins and subqueries and a ton more so if I already have posted
those be sure
00:46:51
to go check those out on my page and if I haven't I hope to have those up soon
thank you thank you guys so much for watching I really appreciate it if you learned
anything in this basics of SQL series be sure to subscribe below and I'll see
you in the next video what's going on everybody my name is Alex Freberg and today
we're going to be starting our intermediate SQL series if you joined us for our
last series we walked through the basics of SQL which is everything you needed just
to get
00:47:13
started and in this series we're going to be walking through some intermediate
Concepts to really take your skills up to the next level now today we're going to
be walking through joins but let me show you what you can expect from the entire
series for this intermediate course so we're walking through joins today and
then in future videos we're walking through unions case statements updating and
deleting data Partition by data types aliasing creating views having versus the
group by statement the
00:47:42
GETDATE function primary key versus foreign key and then we're going to have an
advanced course and this is not set in stone yet but these are some of the things
that I think I will be going through or walking through we're going through CTEs sys
tables or system tables subqueries temp tables string functions regular expressions
stored procedures and then importing and exporting data so with all that being said
let's get into it all right now let's get rid of me because we do not need to be
seeing me
00:48:11
for the rest of the series at the very top here are some of the things that we're
going to be going through today which are inner joins and then outer joins and in
the outer joins we have a few different styles or a few different types of outer
joins now a join is a way to combine multiple tables into a single output for now
we're going to be using the employee demographics and the employee salary table so
let's get a look at both of these tables and see what's in them in our employee
00:48:39
demographics table we have employee ID first name last name age and gender and then
down here in our employee salary table we have employee ID job title and salary if
you notice they have a similar column and that's going to be the employee ID now
when you're doing a join you have to do this based off a similar column and
typically you want it to be a unique field so we're going to be using the employee
ID from both tables to join these tables together to create one output so let's get
rid of this real
00:49:12
quick and let's start building our query to join these two tables together so the
first thing we're going to do is an inner join so let's do select everything and
let's do it from SQLTutorial.dbo.EmployeeDemographics and let's do join we can
also say inner join but join by default is going to be inner and we're going to do
SQLTutorial.dbo.EmployeeSalary now we have to join them together which is what
we talked about earlier and we're going to be doing that based
00:49:54
off the employee ID so for that we have to say on and then we're going to say
employee demographics dot employee ID is equal to employee salary dot employee ID
so let's run this real quick and take a look at the output and let me pull this up
real quick so what we are looking at is actually both tables combined we have the
employee ID first name last name age gender and then here's the salary employee ID
job title salary now an inner join is really only going to show everything that is the
same so in both
00:50:41
tables there are employee IDs of 1001 all the way down to 1009 but if you notice
there is data that is missing real quick let's go down to this graphic and let's
look at this inner join an inner join is going to show everything that is common or
overlapping between table a and table B so what we are looking at here is exactly
that we're only looking at the things that are similar based off this employee ID
in both tables now let's change this join to a full outer join and let's run this
and see what we
00:51:23
get now if you notice the output is very different so let's take a look at it and
see why it's so different if you notice everything down till here is the exact same
so employees 1001 down to 1009 are exactly the same but once we get down to row 10
it starts to get very different now we are joining these tables based off the
employee ID so for example right here Ryan Howard has an employee ID of 1011 but as
you can see in this table for salaries there is no 1011 employee ID so it has
nothing to link it to so because
00:52:04
of that it fills in everything as null because it has nothing to match on this
table and vice versa in the employee salary table there's a person in here that's a
Salesman and there's no employee ID at all which means all this information is
going to be null and we can see that in this diagram right here so this is the full
outer join right here and what it is saying is we are going to show everything from
table a and table B regardless of if it has a match based on what we were joining
them
00:52:34
on so even if table a has an employee ID but there's no employee ID in table B
we're still going to show it and vice versa so now let's look at a left outer join
a left outer join is going to take the left table and say we want everything from
the left table and everything that's overlapping but if it's only in the right
table we do not want it now what is the left and the right table the left table is
going to be our first table that we use our right table is going to be the second
table
00:53:06
that we use so we're going to look at everything in the employee demographics table
regardless of whether or not it has a match on the employee ID in the employee
salary table so this is what that looks like so as you can see this is our entire
table for employee demographics and down here we have three that have information
in the employee demographics table but have absolutely no information in any of the
employee salary table because there's nothing to match it on so this 1011 is not in
this
00:53:37
table this 1013 is not in this table and this one does not even have an employee ID
so we're not going to have a match at all and if we change that to the right you'll
see the exact opposite it's going to show us everything in the employee salary
table so now we have all of our information right here from the employee salary
table and if it doesn't match in this table it's just going to give nulls so down
here we have 1010 and obviously there's not going to be anything associated with
that because
00:54:07
there's no 1010 in the employee demographics table and for this one we have a
Salesman with no employee ID and since there's no employee ID to tie it to this
demographics table we're going to have nothing and we can see that in the diagram
right here so for the left outer join we're looking at everything in table a which
is our demographics table and in our right outer join we're looking at everything in
table B which is our salary table now let's pull this down a little bit so so far
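The four join types described above can be sketched as follows. This is a hedged sketch using Python's sqlite3 module, not the SQL Server setup from the video. Older SQLite builds have no RIGHT or FULL OUTER JOIN, so the right join is written here as a LEFT JOIN with the tables swapped and the full outer join as a LEFT JOIN plus the unmatched rows from the other side; in SQL Server you would write RIGHT OUTER JOIN and FULL OUTER JOIN directly. The sample rows, including the names of the three unmatched people, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT)")
conn.execute("CREATE TABLE EmployeeSalary (EmployeeID INT, Salary INT)")

names = ["Jim", "Pam", "Dwight", "Angela", "Toby", "Michael", "Meredith", "Stanley", "Kevin"]
salaries = [45000, 36000, 63000, 47000, 50000, 65000, 41000, 48000, 42000]
# IDs 1001 to 1009 exist in both tables; the extra rows have no partner (assumed data).
demo_rows = [(1001 + i, n) for i, n in enumerate(names)]
demo_rows += [(1011, "Ryan"), (None, "Holly"), (1013, "Darryl")]
salary_rows = [(1001 + i, s) for i, s in enumerate(salaries)]
salary_rows += [(1010, 47000), (None, 43000)]
conn.executemany("INSERT INTO EmployeeDemographics VALUES (?, ?)", demo_rows)
conn.executemany("INSERT INTO EmployeeSalary VALUES (?, ?)", salary_rows)

cols = "SELECT EmployeeDemographics.FirstName, EmployeeSalary.Salary "
on = " ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID"

inner_sql = cols + "FROM EmployeeDemographics JOIN EmployeeSalary" + on
left_sql = cols + "FROM EmployeeDemographics LEFT JOIN EmployeeSalary" + on
# Right join equivalent: keep every salary row by putting EmployeeSalary on the left.
right_sql = cols + "FROM EmployeeSalary LEFT JOIN EmployeeDemographics" + on
# Full outer join equivalent: the left join plus salary rows that found no match.
full_sql = (left_sql + " UNION ALL " + right_sql
            + " WHERE EmployeeDemographics.EmployeeID IS NULL")

inner_rows = conn.execute(inner_sql).fetchall()
left_rows = conn.execute(left_sql).fetchall()
right_rows = conn.execute(right_sql).fetchall()
full_rows = conn.execute(full_sql).fetchall()
print(len(inner_rows), len(left_rows), len(right_rows), len(full_rows))
```

Rows whose EmployeeID is NULL never match anything, because NULL is not equal to NULL, which is why the person with no ID and the salesman with no ID only appear in the outer joins.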
we've only
00:54:36
been using the select star so we've been selecting everything and I only did that
just for demonstration purposes but you most likely would not be doing this when
you actually use these joins what you're probably going to want to do is Select
exactly what columns you want in your output so for example let's do employee ID
let's do first name last name and let's do job title and let's do salary and let's
try to run that really quick and as you can see it is not going to work now why is
that not working it's
00:55:19
not working because we have two Fields one in each of these tables and we have to
specify what employee ID we want because that is going to drastically change what
our output is so we have an employee ID in this table and in this table which one
do we want to use so for this demonstration let's use EmployeeDemographics.EmployeeID
and let's actually just do an inner join because it's easier for the
output now let's run this and see what we get so as you can see we now have the
00:55:54
employee ID first name last name job title and salary now we're doing this with an
inner join based off the employee ID from the employee demographics table but if we use
the employee salary table it should give us the exact same output and that's because
we're using an inner join and an inner join is only going to show us everything that
overlaps between both tables but now let's try a right outer join and let's run
this now we're using this employee ID from our employee salary table and since
we're doing a
00:56:27
right outer join we're going to get all the information from our employee salary
table and it does not have to be in our left table which is our employee
demographics table so if you look at the information down here this 1010 is in the
employee salary table but it's in this position because that's what we're looking
at in our select statement and then over here we have our salary and since we have
information right here which is in our employee salary table but there is no
employee ID our
00:56:56
employee ID is null now let's change this to look at the employee demographics
employee ID and execute it as you can see that 1010 is gone now we just have this
information right down here and we didn't have the employee ID for either of these
so it's going to show it regardless and that's again because we have a right outer
join and that's why we have no employee ID down here now let's do a left join and
it's basically going to do the opposite of what we just looked at now
00:57:30
we're looking at everything from our left table regardless of if it's in our right
table and so our left table is our employee demographics table and we are looking
at our employee demographics ID so with the employee demographics ID it's going to
show us the first name and the last name which is everything in our left table our
employee demographics table and for these IDs or lack of IDs it's just going
to give us nulls in all of these places if I change it right up here to the employee
salary employee
00:58:02
ID and I execute it because we're showing everything from our left table which is
our employee demographics table we are still going to see our names but since we're
using the employee ID from our right table now we're just going to have blanks in
this information and this information now let's look at a use case for these joins
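The point about having to say which EmployeeID you mean can be sketched like this. It is a minimal sketch using Python's sqlite3 module in place of SQL Server; the rows are assumed sample data, the exact error text differs between database engines, and the right join is written as a LEFT JOIN with the tables swapped because older SQLite builds have no RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT)")
conn.execute("CREATE TABLE EmployeeSalary (EmployeeID INT, Salary INT)")
# Assumed rows; 1010 exists only in the salary table.
conn.executemany("INSERT INTO EmployeeDemographics VALUES (?, ?)",
                 [(1001, "Jim"), (1002, "Pam")])
conn.executemany("INSERT INTO EmployeeSalary VALUES (?, ?)",
                 [(1001, 45000), (1002, 36000), (1010, 47000)])

on = " ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID"
inner = "FROM EmployeeDemographics JOIN EmployeeSalary" + on

# An unqualified EmployeeID exists in both tables, so the query is rejected.
try:
    conn.execute("SELECT EmployeeID, FirstName, Salary " + inner)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Saying which table's EmployeeID we mean fixes it.
rows = conn.execute(
    "SELECT EmployeeDemographics.EmployeeID, FirstName, Salary " + inner
    + " ORDER BY EmployeeDemographics.EmployeeID"
).fetchall()

# In an outer join the choice matters: for the unmatched salary row the
# demographics ID is NULL while the salary ID is 1010.
unmatched = conn.execute(
    "SELECT EmployeeDemographics.EmployeeID, EmployeeSalary.EmployeeID "
    "FROM EmployeeSalary LEFT JOIN EmployeeDemographics" + on
    + " WHERE EmployeeSalary.EmployeeID = 1010"
).fetchone()
print(error_message, rows, unmatched)
```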
let's say Robert California is pressuring Michael Scott to meet his quarterly quota
and Michael Scott is almost there he needs like a thousand more dollars and he
comes up
00:58:33
with the genius idea to deduct pay from the highest paid employee at his Branch
besides himself so how does he go about doing this and identifying the person that
makes the most money well of course he's going to come to SQL first so we actually
want to look at a full outer join real quick and let's just look at everything so
here's what we have we have the employee ID first name last name age gender
employee ID job title and salary now what information do we need to know to get the
information that
00:59:13
Michael Scott needs well we need the employee ID we want the first name and last
name so let's write all that real quick so employee ID we need first name name we
need last name and then we're also going to need the salary cuz we need to know who
is the highest paid employee so now let's do an inner join because we really only want
to look at the employee IDs where we know what their name is and their salary is
and let's do this based off the employee demographics table really doesn't matter
00:59:49
for an inner join but let's do that real quick so let's look at this so we have our
employee ID we have our first name our last name and our salary and we want to do
it where it's not Michael Scott and that's because Michael doesn't want to take
away his own money he wants to take away his employees money so let's do where
first name does not equal Michael and he knows that he's the only one that is
named Michael so now we have our list and let's do ORDER BY and let's do
01:00:25
salary and let's execute this and let's do descending so that we can get at the
very top and this is tough tough news for Dwight Schrute because it looks like he is
the highest paid employee besides Michael and so it looks like he is going to get a
cut in his pay this quarter so that Michael can meet his quota so that's just one
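The query for this first use case can be sketched as follows. This is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server; the rows are assumed sample data, chosen so that the result agrees with the one described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT, LastName TEXT)"
)
conn.execute("CREATE TABLE EmployeeSalary (EmployeeID INT, Salary INT)")

ids = range(1001, 1010)
# Assumed sample data.
people = [("Jim", "Halpert"), ("Pam", "Beasley"), ("Dwight", "Schrute"),
          ("Angela", "Martin"), ("Toby", "Flenderson"), ("Michael", "Scott"),
          ("Meredith", "Palmer"), ("Stanley", "Hudson"), ("Kevin", "Malone")]
salaries = [45000, 36000, 63000, 47000, 50000, 65000, 41000, 48000, 42000]
conn.executemany("INSERT INTO EmployeeDemographics VALUES (?, ?, ?)",
                 [(i, first, last) for i, (first, last) in zip(ids, people)])
conn.executemany("INSERT INTO EmployeeSalary VALUES (?, ?)", list(zip(ids, salaries)))

# Inner join, leave Michael out, then put the highest salary on top.
ranked = conn.execute(
    "SELECT EmployeeDemographics.EmployeeID, FirstName, LastName, Salary "
    "FROM EmployeeDemographics "
    "JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID "
    "WHERE FirstName <> 'Michael' "
    "ORDER BY Salary DESC"
).fetchall()
highest_paid = ranked[0]
print(highest_paid)
```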
use case let's look at one more use case let's start out by getting rid of this and
looking at everything again so for our next use case Kevin Malone who is an
accountant thinks that
01:01:06
he may have made a mistake when looking at the average salary for our salesman now
Angela Martin is very good at SQL and so what she is going to do is she wants to go
in and calculate the average salary for our salesman so let's try to get that
information so all we're going to need is the job title and the salary so let's
come up here and let's get job title and let's get salary and let's look at this
and now we only want to look at where the job title is equal to salesman now the
very last thing we want
01:01:48
to do is we want to say we want the average of salary now since we're going to need
to do a group buy we're going to have to get rid of this salary and just take job
title write down here and do group by job title so we're going to have job title
and then the average salary and there you go we have the salesman and the average
salary is 52,000 so Angela now knows to go back and fix what Kevin made a mistake
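The average salary query for the salesmen can be sketched like this. It is a minimal sketch using Python's sqlite3 module in place of SQL Server; the rows are assumed sample data chosen to agree with the 52,000 figure above, including one hypothetical salesman with no employee ID, who drops out of the inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT)")
conn.execute("CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT)")
# Assumed sample data.
conn.executemany(
    "INSERT INTO EmployeeDemographics VALUES (?, ?)",
    [(1001, "Jim"), (1002, "Pam"), (1003, "Dwight"), (1004, "Angela"), (1008, "Stanley")],
)
conn.executemany(
    "INSERT INTO EmployeeSalary VALUES (?, ?, ?)",
    [
        (1001, "Salesman", 45000), (1002, "Receptionist", 36000),
        (1003, "Salesman", 63000), (1004, "Accountant", 47000),
        (1008, "Salesman", 48000),
        (None, "Salesman", 43000),  # no employee ID, so the inner join skips it
    ],
)

# Join, keep only the salesmen, then average their salaries per job title.
job_title, average_salary = conn.execute(
    "SELECT JobTitle, AVG(Salary) "
    "FROM EmployeeDemographics "
    "JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID "
    "WHERE JobTitle = 'Salesman' "
    "GROUP BY JobTitle"
).fetchone()
print(job_title, average_salary)
```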
on so that's how you use joins I will include this image in the description
so
01:02:27
you can go and look that up yourself if you are curious and want to look at that
that really helped me out when I was first getting started to kind of conceptualize
and understand what kind of data I was pulling based on what join I was using so I
hope that was useful to you as well in the very next video we're going to be
looking at the union so if that is posted be sure to check that out next thank you
guys so much for joining me I really appreciate it if you like this type of content
or got anything out
01:02:51
of it today be sure to smash the like button smash the Subscribe button and I'll
see you in the next video what's going on everybody my name is Alex Freberg and in today's
video we're going to be looking at unions now in the very last video we walked
through joins and I thought it was appropriate to look at unions next because
unions and joins are somewhat similar or closely related and that's because in both
instances they're combining two tables to create one output now what's the
difference the
01:03:18
difference is that a join combines both tables based off a common column and in
last video that was the employee ID so in both tables we had an employee ID and
when you're selecting your data you have to choose either to only select one
employee ID or you can choose both employee IDs but they're in separate columns and
with a union you're actually able to select all the data from both tables and put
it into one output where all the data is in each column and not separate it out and
you don't have to
01:03:51
choose which table you're choosing it from now that may not have made 100% sense
but let's look at it real quick in stages so let's go down here and let's actually
join this table together and see what we get now the two tables that we're looking
at is employee demographics and warehouse employee demographics so over here we
have our employee demographics information and then over here or actually down here
we have our warehouse employee demographics now right now I'm doing a full outer
01:04:22
join so we're looking at all the data and if we were to pull this in to an Excel
spreadsheet we could just copy this and paste it over here and we would be good to
go and that's because we have all the same columns first name last name age gender
first name last name age gender but if we tried to combine this in a query where we
have this information right here it wouldn't work we cannot get it in the same
column and that's where a union comes into play so let's go back up here and let's
actually
01:04:53
run both of these now as you can see they have the exact same columns and that
makes it super easy for what we're about to do all we're going to do is between
these two queries which are completely separate right now all we're going to do is
write Union so let's run just this now because of the Union you can look down here
and the information that used to be in the other table which was in separate
columns is now added down below in the exact same order now Darryl Philbin was
actually in both tables and
01:05:34
the reason he isn't showing up multiple times is because this Union is actually
taking out and removing the duplicates kind of like a distinct statement
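A minimal sketch of the duplicate handling described here, with union all shown alongside for contrast. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server; the table and column spellings (EmployeeDemographics, WareHouseEmployeeDemographics, EmployeeID and so on) and all sample rows are illustrative assumptions, not the video's actual data.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server tables in the video.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (
    EmployeeID INT, FirstName TEXT, LastName TEXT, Age INT, Gender TEXT);
CREATE TABLE WareHouseEmployeeDemographics (
    EmployeeID INT, FirstName TEXT, LastName TEXT, Age INT, Gender TEXT);
-- Illustrative rows; Darryl is deliberately present in both tables.
INSERT INTO EmployeeDemographics VALUES
    (1001, 'Jim', 'Halpert', 30, 'Male'),
    (1013, 'Darryl', 'Philbin', 35, 'Male');
INSERT INTO WareHouseEmployeeDemographics VALUES
    (1013, 'Darryl', 'Philbin', 35, 'Male'),
    (1050, 'Roy', 'Anderson', 31, 'Male');
""")


def combine(operator):
    """Run the same two SELECTs combined with UNION or UNION ALL."""
    return conn.execute(f"""
        SELECT * FROM EmployeeDemographics
        {operator}
        SELECT * FROM WareHouseEmployeeDemographics
        ORDER BY EmployeeID
    """).fetchall()


union_rows = combine("UNION")          # exact duplicate rows are removed
union_all_rows = combine("UNION ALL")  # every row is kept as is

print(len(union_rows), len(union_all_rows))  # 3 4
```

Both queries select the same columns in the same order, which is why the rows stack cleanly into one output.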
there's actually another thing called Union all and if we do Union all it is going
to show us all of the information regardless if it is a duplicate or not so let's
run that real quick and they are both there but let's order by and let's do
employee ID so now let's run it and as you can see right here these are exact
01:06:08
duplicates and so the union got rid of it because they were the exact same but the
union all kept it in because it is showing just the data as is now let's get rid of
this Union all because the only reason why it works so well is because those two
tables were the exact same they were employee ID first name last name age gender so
they're basically the same tables just with different information so it made it
really easy but we have another table employee uh salary and let's look at these
two
01:06:42
tables so these two tables are obviously very different they hold different
information now we would still be able to combine them so let's do employee ID
first name and let's do age now down here on the employee salary table we will do
employee ID job title and salary now let's use a union really quick and run this
one and it is still going to work now why does this work well first off the
reason it's working is because these data types are the exact same or at least
similar so text and text age which
01:07:32
is an integer salary which is an integer it has the same number of columns so three
and three so we have employee ID first name and age and it's taking that from the
first select statement and it's still using a union to take the data from the
second select statement so it's still inserting this information now this is not
what you want to do because right here we have first name and it's salesman
salesman and then our age we have 30 45,000 and 45,000 is obviously not an age so
you want to be careful
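A small sketch of the pitfall being described here, assuming Python's sqlite3 module in place of SQL Server and hypothetical table layouts with one illustrative row each. Note that SQLite does not enforce column types the way SQL Server does, so the sketch only shows that a matching column count is enough for the query to run while the result is still meaningless.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT, Age INT);
CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT);
-- One illustrative row per table.
INSERT INTO EmployeeDemographics VALUES (1001, 'Jim', 30);
INSERT INTO EmployeeSalary VALUES (1001, 'Salesman', 45000);
""")

# Three columns on each side, so the union runs without complaint.
rows = conn.execute("""
    SELECT EmployeeID, FirstName, Age FROM EmployeeDemographics
    UNION
    SELECT EmployeeID, JobTitle, Salary FROM EmployeeSalary
    ORDER BY EmployeeID, Age
""").fetchall()

for row in rows:
    # The second row carries a job title where a first name belongs
    # and a salary where an age belongs.
    print(row)
```

The database only checks the shape of the two select lists; whether the stacked values mean the same thing is left to whoever writes the query.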
01:08:04
when you're using a union to combine two separate tables and make sure that the
data you're selecting is the same in the very next video we're going to be walking
through case statements thank you guys so much for joining me I really appreciate
it if you like this type of content be sure to subscribe below and I'll see you in
the next video what is going on everybody my name is Alex freeberg and today we're
going to be walking through case statements in SQL a case statement allows you to
specify a
01:08:28
condition and then it also allows you to specify what you want returned when that
condition is met so we're going to be using this employee demographics table that
we're looking at right here we're going to walk through the syntax of how to create
a case statement and then we're going to actually go into some use cases at the end
so let's start off by specifying what columns we want let's say we want the first
name we want the last name and we want the age now let's just get that
01:08:58
information now for our case statement we're going to be using this age column so
we actually want the age to be in there so let's specify where age is not null and
run that so now we have a pretty good look at it and let's just order by age just to
clean it up a little bit so now let's start building our case statement so we're
going to say case and then we want to say when now we need to specify what
condition we want to look for so let's do when age is greater than 30 then then
what do we want to be
01:09:36
returned so we want to return that they are old else so that means anything that is
not over the age of 30 we want to return young and then you need to specify that
you're done with the case statement and so you will write end at the very bottom
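A minimal sketch of the case syntax assembled so far (case, when, then, else, end), assuming Python's sqlite3 module as a stand-in for SQL Server. The rows and the AgeLabel column name are illustrative assumptions. The last two queries also show a rule worth knowing early: when branches are checked top to bottom and the first match wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (FirstName TEXT, LastName TEXT, Age INT);
-- Illustrative rows; the NULL age is filtered out by the WHERE clause.
INSERT INTO EmployeeDemographics VALUES
    ('Ryan', 'Howard', 26),
    ('Jim', 'Halpert', 30),
    ('Stanley', 'Hudson', 38),
    ('Holly', 'Flax', NULL);
""")


def label_ages(case_expression):
    """Return (FirstName, Age, label) rows for the given CASE expression."""
    return conn.execute(f"""
        SELECT FirstName, Age, {case_expression} AS AgeLabel
        FROM EmployeeDemographics
        WHERE Age IS NOT NULL
        ORDER BY Age
    """).fetchall()


basic = label_ages("CASE WHEN Age > 30 THEN 'Old' ELSE 'Young' END")

# The Age = 38 branch is only reachable when it is listed before Age > 30.
shadowed = label_ages(
    "CASE WHEN Age > 30 THEN 'Old' WHEN Age = 38 THEN 'Stanley' ELSE 'Young' END")
reordered = label_ages(
    "CASE WHEN Age = 38 THEN 'Stanley' WHEN Age > 30 THEN 'Old' ELSE 'Young' END")

print(basic)
print(shadowed[-1], reordered[-1])
```

An age of exactly 30 falls into the else branch because the condition is strictly greater than 30.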
this is our first case statement let's run it and see what we get so as you can see
a new column was created and if the person is over the age of 30 so 31 and up they
are given old and if they're not over the age of 30 they are given young now we can
do as many when and
01:10:13
then statements as we want so if we want to we can also do when the age is between
27 and 30 then we want to return young and anyone else we're going to call a baby
so now we have Ryan Howard as the baby anyone between 27 and 30 they're considered
young and anyone over the age of 30 is old now something to note is that the very
first condition that is met is going to be returned so if there are multiple
conditions that meet the criteria only the very first one is going to be
returned and let's
01:10:55
demonstrate that real quick so if the age equals 38 then return Stanley because
that is Stanley uh and let's execute this real quick so right here I'm specifying
that if it's 38 it should return Stanley but he is right here and it still says old
and that's because this condition was already met now if we were to put this right
here it should work correctly and let's try it out so now because this condition is
met first it is going to return Stanley down here so now let's get into
01:11:33
our first use case let's start off by copying this and then commenting it out I
only did that because I don't want to rewrite it because I'm lazy uh let's get rid
of that and let's look at this real quick we are going to join on another table
that we have really fast um that's going to be SQL tutorial if you watched my other
videos then you know this table and we're going to do that on employee
demographics. employee ID is equal to employee salary. employee ID okay so let's
just
01:12:15
look at everything in these tables really quick now we are going to be focusing on
the job title and the salary columns but we want their first name and last name as
well so let's start building that out let's do first name last name job title and
salary and let's look at this really quick so now we have our employees and here is
the situation we had a fantastic year this year selling paper and corporate has
allowed Michael Scott to give out a yearly raise to every single employee but not
every
01:12:45
employee is going to get the same raise because our salesmen are genuinely the
people who made us our money and they're going to get the biggest raises while other
people really aren't going to get that big of a raise so now let's go through and
create a case statement to calculate what their salary will be after they get their
raise so let's start off by saying case and when and we want it to say when job
title is equal to salesman so when they are a Salesman what do we want to happen so
this is
01:13:17
where the calculation occurs so we're going to take their salary and then we're
going to add their salary times how much their raise is going to be so the salesman
did really really well and we want to give them a 10% raise this year now when
their job title is equal to accountant then and we'll take their salary we will
give them let's give them a 5% raise still very generous there we go and when
the job title is equal to HR then it's going to be the salary plus the salary times
and then we're going to do
01:14:14
.000001 all right and else we are just going to do salary plus salary oops let's do
parentheses times and let's just give everyone else a 3% raise and then we'll write
end now let's take a look at our results so here's what we have so far we have our
first name our last name our job title and our salary that is our current salary
and then we're going to have our salary after we get our raise so I'm going to
actually write that up here so let's do as salary after raise and let's execute
01:14:57
that so let's look at these raises really quick so we have 45,000 and since he is a
Salesman he gets a 10% raise which is a raise of $4,500 so 45,000 plus $4,500 is
$49,500 and as you can see down here we have HR who is making $50,000 and now he is
making $50,000.05 so everybody got a raise so that is our case statement I hope that
was helpful I find myself using the case statement a lot when I'm wanting to
categorize things or label things and that's kind of what we did in the first
example and you can even do calculations
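A sketch of the raise calculation from this use case, assuming Python's sqlite3 module in place of SQL Server. The 10%, 5% and 3% rates and the 45,000 and 50,000 salaries come from the walkthrough; the HR rate of .000001 is an assumption inferred from the $50,000.05 result, and the remaining names, IDs and salaries are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT, LastName TEXT);
CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT);
INSERT INTO EmployeeDemographics VALUES
    (1001, 'Jim', 'Halpert'), (1002, 'Pam', 'Beasley'),
    (1004, 'Angela', 'Martin'), (1005, 'Toby', 'Flenderson');
INSERT INTO EmployeeSalary VALUES
    (1001, 'Salesman', 45000), (1002, 'Receptionist', 36000),
    (1004, 'Accountant', 47000), (1005, 'HR', 50000);
""")

raises = conn.execute("""
    SELECT FirstName, LastName, JobTitle, Salary,
           CASE
               WHEN JobTitle = 'Salesman'   THEN Salary + (Salary * .10)
               WHEN JobTitle = 'Accountant' THEN Salary + (Salary * .05)
               WHEN JobTitle = 'HR'         THEN Salary + (Salary * .000001)
               ELSE Salary + (Salary * .03)
           END AS SalaryAfterRaise
    FROM EmployeeDemographics
    JOIN EmployeeSalary
        ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID
    ORDER BY EmployeeDemographics.EmployeeID
""").fetchall()

for first, last, title, salary, after in raises:
    # Format to two decimals because the raise is computed in floating point.
    print(f"{first} {last} ({title}): {salary} -> {after:.2f}")
```

The else branch catches every job title that has no when branch of its own, so nobody is left without a value in the new column.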
01:15:33
like we did in this use case so I hope that was helpful thank you guys so much for
watching I really appreciate it if you learned anything from this video be sure to
like And subscribe below and I'll see you in the next video what is going on
everybody my name is Alex freeberg and today we're going to be looking at the having
Clause now the having Clause I feel is a little bit unappreciated in the SQL
Community I feel like it doesn't get a lot of love and so today I want to describe
how to use it and what it's
01:15:59
used for so before we use the having Clause I want to set up our query here uh we
want to use an aggregate function in the group by statement and then I will show
you how to use this having Clause so let's look at the job title and let's look at
the count of job titles and then down here we need to do group by job title and
let's execute this and here is our job titles and here's the count of how many
people have those job titles so now let's say we want to look at all the jobs that
have
01:16:34
more than one person in that specific job so let's do where uh the count of job
title is greater oops is greater than one and let's run that and as you can see
we're going to get this message right here now let's read it an aggregate may
not appear in the where clause unless it is in a subquery contained in a having
clause or a select list and the column being aggregated is an outer reference what
that is basically saying is we cannot use this aggregate function in the where
statement we need
01:17:13
to use a having Clause so let's get rid of this and let's say having the count of
job title greater than one I did the same thing again and let's execute this and
we're still going to get an error now why are we getting that error the reason is
because this having statement is completely dependent on the group by statement
because we are performing this after it has been aggregated so this having
statement actually needs to go after the group by statement because we can't look
at the aggregated information
01:17:48
before it's actually aggregated in that group by statement so now let's run this
and it worked perfectly so now we only have the jobs that have more than one
employee for that job title so now let's look at one more example let's do the
average let's say salary and let's get rid of this having Clause real quick and
just to look at this information uh let's do order by and we'll do average salary
so let's look at this and we have 36,000 to 65,000 so in the middle we got
01:18:29
44,500 so let's use this having statement and let's say the average of salary where
it is greater than 45,000 and we actually need to put this right here right after
the group by and before the order by so let's run this and see what we get and it
worked perfectly so now we're looking at the job titles that have an average salary
of over $45,000 so there you go that is the having Clause definitely one that is
good to know and is very useful in specific situations thank you guys so
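To recap the having clause in runnable form, here is a sketch that assumes Python's sqlite3 module in place of SQL Server and an illustrative EmployeeSalary table. SQLite words its error differently from the SQL Server message read out earlier, but it rejects an aggregate in where in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT);
-- Illustrative rows.
INSERT INTO EmployeeSalary VALUES
    (1001, 'Salesman', 45000), (1002, 'Receptionist', 36000),
    (1003, 'Salesman', 63000), (1004, 'Accountant', 47000),
    (1005, 'HR', 50000), (1006, 'Regional Manager', 65000),
    (1009, 'Accountant', 42000);
""")

# An aggregate in WHERE is rejected before any rows are read.
try:
    conn.execute("""
        SELECT JobTitle, COUNT(JobTitle) FROM EmployeeSalary
        WHERE COUNT(JobTitle) > 1
        GROUP BY JobTitle
    """)
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

# HAVING filters the groups, so it goes after GROUP BY.
multi_person_jobs = conn.execute("""
    SELECT JobTitle, COUNT(JobTitle)
    FROM EmployeeSalary
    GROUP BY JobTitle
    HAVING COUNT(JobTitle) > 1
    ORDER BY JobTitle
""").fetchall()

# HAVING sits between GROUP BY and ORDER BY.
high_average = conn.execute("""
    SELECT JobTitle, AVG(Salary)
    FROM EmployeeSalary
    GROUP BY JobTitle
    HAVING AVG(Salary) > 45000
    ORDER BY AVG(Salary)
""").fetchall()

print(where_rejected)
print(multi_person_jobs)
print(high_average)
```

With these sample rows the accountants average 44,500, so they fall just under the 45,000 cutoff and drop out of the last result.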
01:19:08
much for watching I really appreciate it if you like this video or learned anything
today be sure to subscribe below and I'll see you in the next video what is going
on everybody my name is Alex freeberg and today we're going to be looking at
updating and deleting data in a table now what's the difference between inserting
data into a table and updating data insert into is going to create a new row in
your table while updating is going to alter a pre-existing row while deleting is
going
01:19:36
to specify what rows you want to remove from your table so let's get going with the
updating so down here Holly flax does not have an employee ID age or gender now we
want to update this table to give her that information so let's do update now we
need to specify what table we are going to be hitting off of so let's do
SQLTutorial.dbo.EmployeeDemographics so now we're going to use something called set
and set is going to specify what column and what value you actually want to insert
into that cell
01:20:09
so let's set her employee ID equal to and it's going to be 1012 and we have to
specify which one to do this to because if we ran just this it is going to set every
single employee ID to 1012 because we haven't specified that we only want Holly
flax's row to be updated so now we have to specify where first name is equal to
Holly and last name is equal to flax so now let's run this and see what we get so
one row has been affected let's see what we got and there we go as you can see the
employee ID was updated
01:20:59
exactly how we specified it right here so we also want to update age and gender and
let's do that in the same query so let's set the age equal to 31 and instead of
using and we actually need to use a comma so let's say age equal to 31 comma gender
is going to be equal to female and let's write this and see what we get there you
go now let's look at our table and as you can see it was updated to 31 and female
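A minimal sketch of the update just performed, assuming Python's sqlite3 module in place of SQL Server. The table layout and the second row are illustrative, and the employee ID value 1012 is an assumption about the number used in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (
    EmployeeID INT, FirstName TEXT, LastName TEXT, Age INT, Gender TEXT);
-- Holly starts out with no ID, age or gender.
INSERT INTO EmployeeDemographics VALUES
    (1001, 'Jim', 'Halpert', 30, 'Male'),
    (NULL, 'Holly', 'Flax', NULL, NULL);
""")

# Several columns are set in one statement, separated by commas (not AND).
# The WHERE clause keeps the change to Holly's row only.
cursor = conn.execute("""
    UPDATE EmployeeDemographics
    SET EmployeeID = 1012, Age = 31, Gender = 'Female'
    WHERE FirstName = 'Holly' AND LastName = 'Flax'
""")
rows_affected = cursor.rowcount

table_after = conn.execute(
    "SELECT * FROM EmployeeDemographics ORDER BY EmployeeID").fetchall()

print(rows_affected)  # 1
print(table_after)
```

Without the where clause the same statement would overwrite those three columns on every row in the table.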
so very easy very easy to specify what you want often times uh tables like
01:21:39
this will have a unique key like employee ID is our unique key in this table so I
could easily just say uh where the employee ID is equal to and then you know 1012 so
it's an easy way to specify what employee you're trying to update so now let's
look at the delete statement the delete statement is going to remove an entire row
from our table so let's do delete and we actually need to say from and we have to
specify what table we want to be removing this information from so let's do
SQLTutorial.dbo.
01:22:14
EmployeeDemographics and now we need to specify what row we want to remove so
let's do where employee ID is equal to and let's choose a completely random
employee ID 1005 so let's run this and see what happens so one row is affected let's
look at our table and as you can see 1005 is now gone now you have to be very
careful when you use the delete statement because once you run it you cannot get
that data back there's no way to reverse a delete statement so if I had gotten rid
of this where statement
01:22:52
and I ran this it would delete everything from the entire table and you could not
get that data back so a little trick that I use before I actually run a delete
statement is I make it a select statement because you're going to select everything
where the employee ID is equal to let's just do 1004 and now when you run this you
are going to see exactly what you will be deleting and now we know that Angela
Martin that entire row is going to be gone if I hadn't done that and I just went
like this and I wrote delete and I
01:23:24
only had this running I would not know that this information is going to be the
only one that's gone maybe I made a mistake down here maybe I accidentally put
something in there that wasn't supposed to be in there and now I'm deleting much
more than I thought I was actually going to delete so using the select statement
can be a very good Safeguard against accidentally deleting data that you do not
want to delete so that is update and delete thank you guys so much for watching I
really appreciate it if you
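The select-before-delete safeguard from this video can be sketched as follows, assuming Python's sqlite3 module in place of SQL Server. The two rows are illustrative, and the shared where_clause string is a hypothetical helper used so the preview and the delete are guaranteed to use the same condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (
    EmployeeID INT, FirstName TEXT, LastName TEXT, Age INT, Gender TEXT);
-- Illustrative rows.
INSERT INTO EmployeeDemographics VALUES
    (1004, 'Angela', 'Martin', 31, 'Female'),
    (1005, 'Toby', 'Flenderson', 32, 'Male');
""")

# Fixed text written by us, not user input, so interpolating it is safe here.
where_clause = "WHERE EmployeeID = 1004"

# Step 1: run the condition as a SELECT to see exactly what would go.
preview = conn.execute(
    f"SELECT * FROM EmployeeDemographics {where_clause}").fetchall()
print(preview)

# Step 2: only after checking the preview, run the DELETE with the same condition.
deleted = conn.execute(
    f"DELETE FROM EmployeeDemographics {where_clause}").rowcount

remaining = conn.execute(
    "SELECT EmployeeID FROM EmployeeDemographics").fetchall()
print(deleted, remaining)
```

If the preview returns more rows than expected, the condition can be fixed before anything is removed.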
01:23:49
like this video be sure to subscribe below and I'll see you in the next video
What's going on everybody my name is Alex freeberg and today we're going to be
talking about aliasing now all aliasing really is is temporarily changing the
column name or the table name in your script and it's not really going to impact
your output at all aliasing is really used for the readability of your script so
that if you hand this off to somebody or somebody comes behind you and starts
working on this they can more
01:24:12
easily understand it and it may not sound super useful especially for small scripts
like what we have on the screen but when you start getting to larger scripts where
you have six seven or eight joins and you're selecting 10 different column names it
actually is very useful and very important so let's get into how that actually
works and then I'll have an example later of how we can use aliasing with a little bit
of a larger query so in this table let's select first name and execute what we want
to do is just write
01:24:42
as and let's do Fname and all that's going to do is it's going to rename this
column from first name which it was originally named to Fname now you can use
as but you can also just get rid of that and do it exactly how I have it and it's
still going to work perfectly you can either use the as or you can not use it I
typically don't I just put a space in between the actual column and the Alias now
let's look at an example of how this might actually be useful so we have a first
name and a last name in
01:25:10
this column so what we're going to do is actually combine those so let's do plus
and let's add a space in there and let's do a plus and let's do last name so this
is going to take the first name add a space and then do the last name and we're
going to do that as and let's do full name and let's execute this so now we have a
column called full name which is our Alias so we've combined the first name and the
last name column into one single column and we've renamed it full name if we had
not
01:25:40
used this Alias at all it would have just said this which is no column name at all
we don't typically want that when we have an output we want to give this column a
name so that somebody who's actually looking at the script or who's looking at the
output of the script actually understands what is contained within this column
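A short sketch of column aliasing, assuming Python's sqlite3 module in place of SQL Server. One difference to flag: SQLite joins strings with || where the video uses + in SQL Server. The rows are illustrative, and FName and FullName are the aliases chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT, LastName TEXT);
-- Illustrative rows.
INSERT INTO EmployeeDemographics VALUES
    (1001, 'Jim', 'Halpert'), (1002, 'Pam', 'Beasley');
""")

# An alias works without AS: a space between column and alias is enough.
cursor = conn.execute("SELECT FirstName FName FROM EmployeeDemographics")
plain_alias = [column[0] for column in cursor.description]

# Combining two columns; the alias gives the computed column a readable name.
cursor = conn.execute("""
    SELECT FirstName || ' ' || LastName AS FullName
    FROM EmployeeDemographics
    ORDER BY EmployeeID
""")
full_name_alias = cursor.description[0][0]
full_names = [row[0] for row in cursor.fetchall()]

print(plain_alias, full_name_alias)
print(full_names)
```

The alias only changes the name shown in the output; the underlying table columns keep their original names.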
for that we're just going to keep it as full name now another time that you're
often going to use aliasing in the select statement is when you're using aggregate
functions so
01:26:05
in this table we have age so let's pull that up really quick so we have age right
here and let's actually just do the average age and when we execute this we're
going to get no column name and 31 so what we want to do is give it average age and when
we do that we now have a column name and again you want to have a column name in
case someone comes up behind you and is reading the script so that they understand
what this column is being used for now that we've looked at aliasing column names
let's look at
01:26:38
aliasing table names it basically is the exact same thing uh we're just going to
write as and let's do demo for demographics and let's do demo Dot and it's going to
give us all of our options and we'll do employee ID so when you alias in a table
name when you are selecting in the select statement you actually need to preface
your column name with a table name or the table Alias Dot and then employee ID and
this is extremely important to do especially when you have a lot of joins
01:27:11
that you're doing or you're selecting a lot of columns when you have several joins
because it can get very very messy quick so let's actually join this to employees
salary and let's do that on demo. employee ID is equal to s. employee ID so now
let's do demo. employee ID comma s do and let's do salary so looking at the script
now is very clean it is very easy to understand and that is what's so important
with aliasing if for example we took this off every time we wanted to reference
01:27:56
this table we would have to put the entire table name and putting the entire table
name is correct it just is very cumbersome and does not look clean at all
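To make the comparison concrete, this sketch runs the same join with and without table aliases, assuming Python's sqlite3 module in place of SQL Server. The rows are illustrative, and Sal is this sketch's own choice of alias for the salary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT);
CREATE TABLE EmployeeSalary (EmployeeID INT, Salary INT);
-- Illustrative rows.
INSERT INTO EmployeeDemographics VALUES (1001, 'Jim'), (1002, 'Pam');
INSERT INTO EmployeeSalary VALUES (1001, 45000), (1002, 36000);
""")

# With aliases: each column is prefixed by a short, meaningful table alias.
aliased = conn.execute("""
    SELECT Demo.EmployeeID, Sal.Salary
    FROM EmployeeDemographics AS Demo
    JOIN EmployeeSalary AS Sal
        ON Demo.EmployeeID = Sal.EmployeeID
    ORDER BY Demo.EmployeeID
""").fetchall()

# Without aliases: correct, but every reference repeats the full table name.
unaliased = conn.execute("""
    SELECT EmployeeDemographics.EmployeeID, EmployeeSalary.Salary
    FROM EmployeeDemographics
    JOIN EmployeeSalary
        ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID
    ORDER BY EmployeeDemographics.EmployeeID
""").fetchall()

print(aliased == unaliased)  # True
print(aliased)
```

Both queries return identical rows, so the alias is purely a readability choice.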
using something like demo as an alias makes it a lot more easily readable and a lot
more manageable when you're looking at it when you have a very long script let's
look at this query where we're joining together three Separate Tables and after
each table we have an alias for employee demographics we have a employee salary we
have B and
01:28:22
warehouse employee demographics we have C now unfortunately I have seen a lot of
scripts that look exactly like this and this is what you do not want to do you do
not want to use your aliasing to just write an a a b or a c that is very frowned
upon when writing queries because it really doesn't give any context to what the
table that you're referencing is and it gets really confusing as this query
continues to grow and as you add more columns to your select statement it makes it
more difficult to understand where those
01:28:47
columns are coming from and so when I'm reading that I say select a. employee ID
okay what's a a is employee demographics so you really do not want to do that now
let's look at an example of what it should look like so for employee demographics
instead of having an alias of a I used demo for demographics for employee salary
I used s and for warehouse employee demographics I used ware now this is not
perfect by any means but in the select statement if you're just glancing at it you
can
01:29:15
easily understand which columns are coming from which tables so when I look at
employee ID I know that's coming from employee demographics CU I have demo as the
Alias so it's a lot easier to understand and when you hand this query off to
somebody it is going to be a lot easier for them to read through it and understand
where those columns and those table names are coming from and so they will
appreciate that in the long run so that is all I got that is aliasing again not a
super tough subject but a really
01:29:40
important one to understand especially as you start working in teams and as you
start creating more and more complex queries you want to have it more organized and
more easily readable and so it may not come into play with those really simple
queries but again as you build out those more complex queries this becomes very
useful I really hope you enjoyed this video if you did be sure to comment and
subscribe below thank you so much for watching and I'll see you in the next video
what's going
01:30:04
on everybody welcome back to another intermediate SQL tutorial today we're going to
be covering Partition by now Partition by is often compared to the group by
statement the group by statement is a little bit different the group by statement
is going to reduce the number of rows in our output by actually rolling them up and
then calculating the sums or averages for each group whereas Partition by actually
divides the result set into partitions and changes how the window function is
calculated and so the Partition by
01:30:29
doesn't actually reduce the number of rows returned in our output let's get started
to look at the actual syntax of how to use Partition by and then we'll compare it
to the group by statement later just to see the differences between the two we're
going to be using these two tables on our left over here so I'm going to pull those
up really quick so let's run this and let's look at these two tables side
by side well one underneath the other really quick so what we're going to be using to
01:30:51
demonstrate this partition by is this gender column as well as this salary
column and so we just need to join these two tables together on the employee ID and
then we'll go from there now I'm not going to bore you with that I'm going to skip
ahead and we'll actually look at how to use this partition by so I've joined these
two tables together and this is our output but we don't want every single column
I'm going to start selecting some of these columns and then we'll start using this
partition by and
01:31:13
see what the output looks like after that all right so let's go right up here let's
choose the first name let's do the last name we'll do gender and let's do salary
and now we want to identify how many male and female employees we actually have and
so we're going to say count of gender and this is going to be over and now we're going
to do our Partition by and we're also going to partition that by the gender as
total gender now I'm going to come back to why we did each part but I
01:31:50
want to see the output first and then we come back to why we wrote it this way so
let's just do this really quick so it's going to be a little bit different than
what you typically would expect in a group by statement the group by is going to
roll everything up and you typically wouldn't have like a first name last name in a
group by statement because it would be very hard to roll all those things up into
those individual columns and to reduce the number of columns that are in your
output and so in our output we can see
01:32:18
Pam Beasley she's a female she makes $36,000 as a salary and there are three total
women that work alongside her in this employee demographics table and so in our
total gender column over here this is where we use the partition by and if we used
a group by statement to get this kind of information all we would be able to do to
get this information in a group by statement is say select gender count of gender
and then Group by the gender down below underneath the join so because we're using
the partition by we're able to
01:32:48
isolate just one column that we want to perform our aggregate function on and so
we're able to add things like the first name and last name columns even though we
aren't trying to include that in any partition or group by statement yet we're
still able to add the aggregate function to each individual row while still
maintaining those other columns let's take this entire query and let's basically
just transform it into a group by statement and we'll see kind of what that looks
like and what the difference
01:33:11
is so all I'm going to do is get rid of all this I'm going to copy all of this and
I'm going to say Group by and I'm going to do that because we have to use all these
columns in our group by statement so let's execute this and as you can tell we are
not able to see the output for the aggregate function that we were hoping for if we
wanted to get the same output that we had before where we're showing three for
females and six for males what we'd have to do is get rid of this first and last
01:33:45
name and the salary and do the same thing in the group by statement and so let me
get rid of these really quick and run this and so what the Partition by is doing is
basically taking this query right here and sticking it on one line in the select
statement and so I hope now you can see how valuable the partition by can be if
used correctly thank you guys so much for watching I really appreciate it if you
like this video be sure to like And subscribe below and I'll see you in the next
video what's going on everybody
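As a recap of partition by versus group by, here is a sketch assuming Python's sqlite3 module in place of SQL Server. Window functions need SQLite 3.25 or newer, which recent Python builds normally bundle. The four rows are illustrative, and TotalGender follows the alias used in the walkthrough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (
    EmployeeID INT, FirstName TEXT, LastName TEXT, Gender TEXT);
CREATE TABLE EmployeeSalary (EmployeeID INT, Salary INT);
-- Illustrative rows: two male and two female employees.
INSERT INTO EmployeeDemographics VALUES
    (1001, 'Jim', 'Halpert', 'Male'), (1002, 'Pam', 'Beasley', 'Female'),
    (1003, 'Dwight', 'Schrute', 'Male'), (1004, 'Angela', 'Martin', 'Female');
INSERT INTO EmployeeSalary VALUES
    (1001, 45000), (1002, 36000), (1003, 63000), (1004, 47000);
""")

# PARTITION BY: every employee row survives, each carrying its group's count.
windowed = conn.execute("""
    SELECT FirstName, LastName, Gender, Salary,
           COUNT(Gender) OVER (PARTITION BY Gender) AS TotalGender
    FROM EmployeeDemographics AS Demo
    JOIN EmployeeSalary AS Sal ON Demo.EmployeeID = Sal.EmployeeID
    ORDER BY Demo.EmployeeID
""").fetchall()

# GROUP BY: rows are rolled up, leaving one row per gender.
grouped = conn.execute("""
    SELECT Gender, COUNT(Gender)
    FROM EmployeeDemographics AS Demo
    JOIN EmployeeSalary AS Sal ON Demo.EmployeeID = Sal.EmployeeID
    GROUP BY Gender
    ORDER BY Gender
""").fetchall()

print(len(windowed), len(grouped))  # 4 2
print(windowed[1])
print(grouped)
```

The counts are the same in both results; the difference is whether the individual employee rows are kept next to them.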
01:34:18
welcome back to another SQL tutorial today we're going to be talking about CTEs a
CTE is a common table expression and it's a named temporary result set which is
used to manipulate the complex subqueries data now this only exists within the
scope of the statement that we are about to write once we cancel out of this query
it's like it never existed a CTE is also only created in memory rather than a
tempdb file like a temp table would be but in general a CTE acts very much like a
subquery and so if
01:34:46
you know how to do subqueries you should be able to pick up on CTE fairly easily so
let's get started writing our very first CTE and we're going to come down here
and we're going to say with and we're going to write CTE employee and we're going
to say as and this is where everything's going to start now CTE are sometimes
called with queries I've never personally used that but I've seen it called that
online but that's because it uses this with statement right at the very beginning
so
01:35:11
now we have with CTE employee as then we have an open parenthesis and now we have
to construct our select statement and this is kind of where we build out our quote
unquote subquery and so I'm going to take in a select statement that I actually
used in a previous video where we were using the partition by and so I'm going to put
that in there and I'm going to kind of walk us through what that does and how we're going to
use this so I'm going to paste this down right here and I'm actually going to go
like this just to
01:35:37
make it look a little nicer and then I'm going to close the parentheses at the end
so now we have our CTE in place and as you can see it is basically just a select
statement within the with CTE employee as and what this is going to do is it's going to
take the first name last name gender and salary and then it's going to take
this aggregate function with the partition by and this aggregate function with the
partition by and it's going to place it to where we can now query off of this data
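A sketch of the CTE built up to this point, followed by a select that reads from it, assuming Python's sqlite3 module in place of SQL Server (SQLite 3.25 or newer for the window functions). The rows are illustrative, and the names CTE_Employee, TotalGender and AvgSalary are assumed spellings. The final check shows that the CTE exists only for the statement that defines it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (
    EmployeeID INT, FirstName TEXT, LastName TEXT, Gender TEXT);
CREATE TABLE EmployeeSalary (EmployeeID INT, Salary INT);
-- Illustrative rows.
INSERT INTO EmployeeDemographics VALUES
    (1001, 'Jim', 'Halpert', 'Male'), (1002, 'Pam', 'Beasley', 'Female'),
    (1003, 'Dwight', 'Schrute', 'Male'), (1004, 'Angela', 'Martin', 'Female');
INSERT INTO EmployeeSalary VALUES
    (1001, 45000), (1002, 36000), (1003, 63000), (1004, 47000);
""")

# The WITH block and the SELECT that uses it form one single statement.
cte_rows = conn.execute("""
    WITH CTE_Employee AS (
        SELECT FirstName, LastName, Gender, Salary,
               COUNT(Gender) OVER (PARTITION BY Gender) AS TotalGender,
               AVG(Salary)   OVER (PARTITION BY Gender) AS AvgSalary
        FROM EmployeeDemographics AS Demo
        JOIN EmployeeSalary AS Sal ON Demo.EmployeeID = Sal.EmployeeID
    )
    SELECT FirstName, AvgSalary
    FROM CTE_Employee
    ORDER BY FirstName
""").fetchall()

# Querying the CTE name in a separate statement fails: nothing was stored.
try:
    conn.execute("SELECT * FROM CTE_Employee")
    cte_persisted = True
except sqlite3.OperationalError:
    cte_persisted = False

print(cte_rows)
print(cte_persisted)  # False
```

The outer select can pick any subset of the CTE's columns without repeating the window functions.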
01:36:02
it's putting it basically in a temporary place where we can then go and grab that
data so all we're going to do at the very bottom is we're going to say select
everything and we can do that from CTE employee so let's run this entire thing and
see what we get so as you can see this select everything from CTE employee we are
selecting everything from this select statement and so this feels a lot like a temp
table we're actually querying off of a temp table but it actually acts a lot
01:36:34
more like a subquery now we don't have to select everything we can just do
first name and let's do average salary and when we run this we'll just get those
two columns and we don't have to go through and actually write this out each time
it's just in this CTE for us so it does all the heavy lifting within the CTE and then
we can just query off of what we want now something to note is that the CTE is not
stored anywhere and so it's not stored in some temp database somewhere if I try to
run just this by
01:37:02
itself it is not going to work so let's try that out really quick and we should get
an error and that's because each time we run this query it is actually creating the
CTE again and so it's not being saved anywhere and so each time we run it we have
to run it with the entire CTE another thing to note is you actually have to put the
select statement right after the CTE if I try to go down here and say select
everything from uh let's do CTE employees it doesn't actually work it's not going
to come up at all and
01:37:29
that's because it only is going to work with the select statement directly after
the actual CTE that you've created I hope this was helpful and I hope that you
understand how to use a CTE a little bit better again you don't have to go super
complicated with the select statement within your CTE it can be very very simple I
just wanted to demonstrate that you can use aggregate functions within your CTE and
then just query off of those without having to do the aggregate function again
which I find is
01:37:53
very very useful again thank you for watching if you like this video be sure to
like And subscribe below and I'll see you in the next video what's going on
everybody welcome back to another SQL tutorial today we are looking at temp tables
and if you can guess it based off of the name they're kind of like temporary tables
and we create them very much the same way we're going to do create table um it's
just a little bit different and you can hit off of this temp table multiple times
which you
01:38:19
cannot do with something like a CTE or a subquery where you can only use it one
time or with a subquery you need to write it multiple times within a query and so
these temp tables are extremely useful I'm going to kind of talk about how you can
use them as we're going uh throughout this video but let's get started right away
with actually creating one looking at it inserting some data and kind of
showing you how temp tables work and what we can do with them so uh we're going to
start
01:38:50
off with create table much like a regular table is created the only difference
is we're going to do this pound sign (#) and then we're going to do temp_employee
so literally the only difference between a regular table and a temp table is
this right here at the very beginning this pound sign so let's just start
by doing employee ID we make that an integer we'll do job title and we'll make that
a varchar(100) and then we'll do salary and let's make that an
01:39:32
integer and so now we have our temp table uh let's go ahead and create it so now we
have our temp table created and so we can look at it really quick so let's select
everything from and we'll do temp employee so let's take a look it's completely
empty um and we can insert data very much the same way we'd insert data into a
regular table so let's start doing that let's do insert into and we'll do temp
employee and we'll do values and let's just do something
01:40:16
really quick because I'm going to get to a little bit more interesting stuff in a
second oops so we'll make this person HR that's their job title then for salary
we'll give them 45,000 and close it off so let's run this and let's select
everything again and see what's in there perfect so we were able to insert data
into this temp table and again we don't have to create this every single time
or rather we don't have to rerun it every single time we need to hit off of it
01:40:54
like we did with a CTE if you watched my previous video with this one we can just run it
and it sits there and so again it feels very much like a real table and I'm going
to get to a little bit of the nuances and the differences between a
regular table and a temp table in a second but really quickly we want more
data in there and you don't have to just do it value by value we can also
select all of the data from a specific table and insert that into a
temp table and that is really
01:41:31
quickly how I do it most of the time most of the time I'm not inserting
values I am taking a large table and taking a subset of that and then
sticking it into a temp table so let's look at this really quick and run that
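The steps so far (create the temp table, insert one row by hand, then bulk-load from an existing table) can be sketched as follows. This is a hedged illustration using Python's sqlite3 module, where the temp table is written CREATE TEMP TABLE rather than SQL Server's #temp_employee; the identifier spellings and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO EmployeeSalary VALUES (?, ?, ?)",
    [(1002, "Receptionist", 36000), (1003, "Salesman", 63000)],
)

# SQL Server form described in the video: CREATE TABLE #temp_employee (...)
# SQLite spells the same idea CREATE TEMP TABLE.
conn.execute(
    "CREATE TEMP TABLE temp_employee "
    "(EmployeeID INT, JobTitle VARCHAR(100), Salary INT)"
)

# Insert a single row by hand...
conn.execute("INSERT INTO temp_employee VALUES (1001, 'HR', 45000)")

# ...or copy rows in bulk from an existing table.
conn.execute("INSERT INTO temp_employee SELECT * FROM EmployeeSalary")

# Unlike a CTE, the temp table can be queried again and again in this session.
row_count = conn.execute("SELECT COUNT(*) FROM temp_employee").fetchone()[0]
rows = conn.execute("SELECT * FROM temp_employee ORDER BY EmployeeID").fetchall()
```

In practice the SELECT feeding the insert would usually carry a WHERE clause so that only the needed subset of a large table is copied.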
so now we took all of the data from employee salary and then we just stuck it into
this table and really quickly this is one of the big uses of a temp table
let's say for example that this employee salary table had a billion rows
or just an
01:42:13
extremely large number and we were trying to hit a somewhat complex
query off of it where we're using joins and we're using maybe some window
functions or different things it would take a very long time to hit off of
this but what we can do is we could insert that data into this temp table and then
we can hit off the temp table and it already has that subsection of
data that we're wanting to use for all of our later queries so really quickly
that's
01:42:46
kind of a use case for that so let's go down here we're going to
create another one and this one's going to be a little bit more advanced a little
bit more like how I would actually use a temp table above was just showing the
basic syntax how you put data into it and how it's used now
I'm going to show you how I would actually use it so let's do create table
and let's do temp employee
01:43:20
2 and then let's do open parentheses and we'll do job title and we'll make that a
varchar(50) and then we can do employees per job we'll make that an integer now we
need average age make that an integer and the very last one will be average salary
I'll make that an integer as well and let's run this oops so we have our second
table now we want to insert data into this one so we're just going to do insert
into and we'll do temp employee 2 and for this one I'm going to take a query
01:44:09
that we used in a previous video and so I'm just going to copy and paste that to
save time uh and then we'll keep on moving from there all right so I'm just going
to paste that in we will run this and really all it's doing is from these
tables it's taking the job title we're getting a count on the job title average age
average salary and that is it um so let's see if that worked which it looks like it
did but you know let's actually take a look at the
01:44:41
data and so now we have this subsection of data from this join above and what this
is going to do is whenever we want to run this we don't have to run it on these
two tables and create the join and then do the calculations which takes time what
it's going to do is take these exact values and place them into
this temporary table and if we want to run further calculations on these values we
can easily do that in a fraction of the time instead of having to run this
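A rough sketch of this second temp table: the join and the aggregation run once, and later queries read only the small summary. It again uses Python's sqlite3 module as a stand-in for SQL Server. The pasted query is not shown in the transcript, so the column names, the aliases dem and sal, and the sample rows here are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT, Age INT);
CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT);
INSERT INTO EmployeeDemographics VALUES (1, 'Jim', 30), (2, 'Dwight', 28), (3, 'Angela', 31);
INSERT INTO EmployeeSalary VALUES
    (1, 'Salesman', 45000), (2, 'Salesman', 63000), (3, 'Accountant', 47000);

-- SQL Server form: CREATE TABLE #temp_employee2 (...)
CREATE TEMP TABLE temp_employee2 (
    JobTitle VARCHAR(50), EmployeesPerJob INT, AvgAge INT, AvgSalary INT
);

-- Run the join and the aggregation once, and keep only the result.
INSERT INTO temp_employee2
SELECT sal.JobTitle, COUNT(sal.JobTitle), AVG(dem.Age), AVG(sal.Salary)
FROM EmployeeDemographics AS dem
JOIN EmployeeSalary AS sal ON dem.EmployeeID = sal.EmployeeID
GROUP BY sal.JobTitle;
""")

# Later queries read the small summary table instead of redoing the join.
summary = conn.execute(
    "SELECT JobTitle, EmployeesPerJob, AvgAge, AvgSalary "
    "FROM temp_employee2 ORDER BY JobTitle"
).fetchall()
```

The sample values are chosen so the averages come out as whole numbers; with real data the two engines can round differently when an average is stored in an integer column.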
01:45:14
every single time which would take up so much processing power and it will
reduce your runtime dramatically when you're placing this data in this temp table
and hitting off of that instead of all these joins and everything above a lot
of times these temp tables are used in stored procedures now if you haven't learned
about stored procedures or used stored procedures at all that's okay I still
want to show you something that might be useful although this is used a ton in
stored
01:45:43
procedures so for example let's say we have a stored procedure set up we run the
stored procedure and we get an output and for whatever reason we want to
run it again and when we run it again we get this error and this temp
table lives somewhere it doesn't live in the actual database but
it lives somewhere and so when we run it again we get an error because there's
already a temp table created one trick or one little tip that I would give is doing
something like this saying
01:46:16
drop table if exists and
we'll do temp employee 2 just like that now what this is going to do is when you're
running that stored procedure over and over again or
for whatever reason you need to run it multiple times every time that you
run it it's going to encounter this and so if that table already exists it is going to
delete that table and then allow you to create it again and this is just a
01:46:54
really good thing to do so now if you see down below I can run this time and time
and time again and it is going to work every single time because it is checking to
see if that exists and if it does it deletes it and then I can create again and so
that is just a helpful tip if you're going to try to use this I highly recommend
adding that to your query just to make sure things run smoothly I know there is a
lot more that can go into temp tables a lot more of the technical aspects or the
DBA stuff
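The drop-then-create tip can be sketched like this, again with Python's sqlite3 module standing in for SQL Server. The helper name build_temp_table and the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def build_temp_table(connection):
    """Recreate the temp table from scratch; safe to call repeatedly."""
    # SQL Server form: DROP TABLE IF EXISTS #temp_employee2 (SQL Server 2016 and later)
    connection.execute("DROP TABLE IF EXISTS temp_employee2")
    connection.execute("CREATE TEMP TABLE temp_employee2 (JobTitle TEXT, AvgSalary INT)")
    connection.execute("INSERT INTO temp_employee2 VALUES ('Salesman', 54000)")

# Because of the DROP, calling this twice works and leaves exactly one row.
build_temp_table(conn)
build_temp_table(conn)
row_count = conn.execute("SELECT COUNT(*) FROM temp_employee2").fetchone()[0]

# A bare CREATE without the DROP fails, because the table already exists.
try:
    conn.execute("CREATE TEMP TABLE temp_employee2 (JobTitle TEXT, AvgSalary INT)")
    bare_create_failed = False
except sqlite3.OperationalError:
    bare_create_failed = True
```

The same pattern is what makes a script or procedure that builds a temp table rerunnable in one session.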
01:47:25
um obviously I just want to teach you how to use it and what you might use it for
and how to actually write it out but you know there are a lot more things that you
can do research on about processing speed and storage but unless you are something
like a DBA you probably don't need to worry about those things and so if you are a
DBA I do recommend looking into those things making sure you understand how that
works how this data is stored uh so that when people use them or you are using them
you know what's going on in the
01:47:54
background but for getting up and running with temp tables I hope that this was
helpful thank you guys so much for watching I really appreciate it if you like this
video be sure to like and subscribe below and I'll see you in the next
video what's going on everybody welcome back to another SQL tutorial today we're
going to be looking at string functions some of the things that we're going to be
looking at are things like trim replace substring and upper and lower uh we're
going to create a new table insert
01:48:32
a little bit of bad data into it and then we're going to be using that to work on
our string functions today so I already have this set up right here um I'm going to
put this on GitHub so that you can just download it and you don't have to
type this out manually so go look in the description if you just want to
get that off GitHub and copy and paste it to save you a little
bit of time but let's go ahead and run this really quick and as you can see in this
table
01:49:00
we have our data right here so in this employee errors table
basically what we have is
in this first one we have some blank spaces on the
right side in the second one some blank spaces on the left side we also have Jimbo
which is an error because his name is Jim and Halbert because his name is
actually Halpert and then for Toby for whatever reason that o is capitalized and
then Michael got in here and
01:49:38
added this extra part so we're going to have to figure out a way to take that out
when we're doing our query and that'll come in a little bit later I think in the
substring section so let's get into it right away let's start using uh our left
trim and right trim we're going to go through each one pretty quickly
hopefully I'm not trying to make this a super long video because we got a lot
of things to get through in this one video uh so I'm going to go through the
01:50:05
trim right trim and left trim let's look at uh the employee ID because that's the
one where we have some blank spaces on the right and the left side the left side
you're obviously going to see much easier but let's
start walking through this so let's do select employee ID and before we get any
further let me just get the employee errors on here so we can um so that we can see
everything as it comes up so we're just going to do trim and then type in the
column that we want
01:50:41
to take these blank spaces out of and that's what trim does trim gets rid
of blank spaces on the front and the back or the left and the right side so
on both sides that's what trim does and we'll say as ID trim so let's run this one
really quick and as you can see this is our regular employee ID and so you know you
can't visually see it as easily on this first one but there are blank spaces after
this 101 and we got rid of those and then there were blank spaces before the 102
and we got rid of
01:51:15
those now I'm just going to copy this uh two times because it's basically the exact
same thing but uh I'm going to show you them all at the same time so it's the exact
same thing except LTRIM and RTRIM and let's take a look at all these at the
same time and let me pull it up let me see if I can get these all in here
okay in the trim it got rid of both the left and the right side so all of these
were fixed in the employee ID for the left trim we're only going to be
01:51:50
getting rid of this one this one still has um blank spaces on it and when we do the
right trim we're only going to get rid of the stuff on the right side so this one
doesn't change because this is on the left hand side where the blank spaces are so
this one was fixed again not super visual so you can't really see it but that one
is fixed uh let's move on to the next part uh which is using replace so for this
one we're going to be looking at the last name so let's go back up really quick to
the employee
01:52:25
errors as you can tell in the last name the biggest thing we want to
take out is that dash fired because we don't want that still in there
we're going to replace that and so let's look at how to do that um let me just copy
this real quick and get rid of this top part um so we're going to do the last name
so let's just start off with our last name um and then just as a baseline so we can
see what it looks like before and then we'll do replace and all we're going to
specify is the
01:53:00
column that we want to do the replacing in we're going to specify the value that
we want to replace so in this case it's going to be dash fired
and we're going to indicate what we want to
replace it with now I'm just going to replace it with blank and we can say as
last name fixed so let's see what this looks like really quick and it looks like it
worked so in this last name it originally had Flenderson - Fired and when we
replaced
01:53:36
it and we took that Dash fired and replaced it with basically nothing uh it then
fixed it and so now it looks correct all right let's move on to the next one I
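A small sketch of TRIM, LTRIM, RTRIM and REPLACE as described above, run through Python's sqlite3 module (an assumption; these four functions take the same arguments here as in the SQL Server queries from the video). The EmployeeErrors rows are illustrative, including the exact " - Fired" suffix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeErrors (EmployeeID TEXT, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO EmployeeErrors VALUES (?, ?, ?)",
    [
        ("1001  ", "Jimbo", "Halbert"),          # trailing spaces in the ID
        ("  1002", "Pamela", "Beasely"),         # leading spaces in the ID
        ("1005", "TOby", "Flenderson - Fired"),  # unwanted suffix in the last name
    ],
)

# TRIM strips both sides, LTRIM only the left, RTRIM only the right.
ids = conn.execute(
    "SELECT TRIM(EmployeeID), LTRIM(EmployeeID), RTRIM(EmployeeID) "
    "FROM EmployeeErrors ORDER BY rowid"
).fetchall()

# REPLACE(column, text_to_find, replacement): here the suffix becomes an empty string.
last_name_fixed = conn.execute(
    "SELECT REPLACE(LastName, ' - Fired', '') "
    "FROM EmployeeErrors WHERE FirstName = 'TOby'"
).fetchone()[0]
```

Printing the ids list with repr makes the leftover spaces visible, which is hard to see in a results grid.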
think this one might be the longest one to write but that is the substring
and let me take this real quick trying to save us some time so substring is
very unique for either a number or a string you can
specify the place that you want to start and then you can also specify how many
characters you want to
01:54:16
go out and it pulls that in so just as a really quick example and
then I'm going to show you a use case for this one that I think is pretty
cool that maybe you'd find useful so I'm
going to do first name and then I'm just going to do one comma three so it's going
to take the first name it's going to start at the very first letter
or number and it's going to go forward three spaces or three spots
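As a quick sketch of "start at position one and go forward three", using Python's sqlite3 module as a stand-in. SQLite spells the function SUBSTR, while the SQL Server function in the video is SUBSTRING; the argument order (text, start, length) is the same. The sample names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arguments: the text, the 1-based start position, the number of characters.
names = ["Jimbo", "Pamela", "TOby"]
prefixes = [
    conn.execute("SELECT SUBSTR(?, 1, 3)", (name,)).fetchone()[0]
    for name in names
]
```

Positions count from 1, not 0, so a start of 1 means the very first character.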
01:54:50
so let's just take a look at what that looks like so for our table it's going
to take Jim Pam and Tob for Toby and so it's only going to take
the first three because you're starting at number one now what if we started at
three so we do three comma three it's going to go to the third digit or third
letter and then it's going to go forward three so you kind of get a sense of how
this works now I'm going to show you something that I think is very
01:55:25
interesting that I think you guys will also find interesting let me fix that because
I just messed it up so you may have heard of something called fuzzy matching
if you don't know what fuzzy matching is I'll give you an example let's say in one
table my name is Alex and in another table my name is Alexander if we tried to join
those two together based off of my name they will not join because one is Alex and
one is Alexander they're not an exact match but if I take the
01:55:53
substring and start position one and move forward four characters it's going to
take Alex from both and then it will match them together uh and say that they are
the same so that you know it may not be perfect that's why it's called a fuzzy
match because it can work for a large majority of the time but it's not going to
work every single time and so I want to show you how we can use this here um really
quick I need to join this to um the demographics table so I'm going to do that
really
01:56:27
quick bear with me for just one second let's try to make this at least look
somewhat good so what I'm going to do is I'm going to start off by saying let's
tie it to the first name let's do err.FirstName is equal to
the demographics table first name okay so I want to see and I'm just going to do
first name for err and let's do dem.FirstName so let's see what comes up when we
do it like this so the only one that is going to work is Toby and that's because
even
01:57:09
though it has a capital O it's still going to take it um so you know we want to get
all of them to match and we can do that but it's going to be um a little bit of a
different way than maybe is perfect but that's why they call it fuzzy matching so
we're going to use substring on this so I'm going to say substring
and we're going to go one three so starting at
the first position and going forward three and we're going to do the exact
same
01:57:38
thing on the other substring
we're going to do the exact same thing so one and three so we are actually going to take
this up here and we're just
going to go like that okay so let's run this
really quickly and as you can see it is now going to match all of them and you can
do this on a lot of different things typically when I'm doing a fuzzy match
01:58:17
like this I'm not just going to do it on a first name right because
there can be a ton of people with the same first name and real quick
let me actually show you what the originals looked like just to make sure I get
the point across and that is going to be first name all right so
real quick let's actually look at this so it originally was Jimbo Pamela and Toby
and in this one it was Jim Pam and Toby and so when we just took the first three
because it was Jimbo it then becomes Jim
01:59:01
it was Pamela it becomes Pam now it matches and so that's kind of the
example that we're going for like I was saying I typically will not just filter on
a first name because there's going to be a ton of people named Alex or Jim or
Henry or whatever you're going to do this on many different things so
I would be doing it on things like if I'm trying to do a fuzzy match on a person I
do it on their gender to make sure that their gender is the same and I
01:59:27
wouldn't probably need to use a substring for that but just to give you a
little bit more information I'd do it on the last name so I'd need to use
that substring again and I would probably do it on the age
and the date of birth okay so with all of those things if you
fuzzy match on the first name and the last name and then the gender the age and the
date of birth are all the same then you can typically get a very high accuracy in
matching people across
02:00:04
tables this is an example for when you don't have something like
an employee ID which is what we do have but take for example we were not given that
this is a way to match them using substrings let's move on to upper and lower
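The substring-based fuzzy match can be sketched as a join on the first three characters of each name. This uses Python's sqlite3 module as a stand-in for SQL Server; the aliases err and dem, the column names, and the rows are assumptions. One difference is worth flagging: the video's "TOby" matches "Toby" exactly because SQL Server comparisons there ignore case, whereas SQLite's default comparison is case-sensitive, so this sketch stores "Toby" in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeErrors (FirstName TEXT);
CREATE TABLE EmployeeDemographics (EmployeeID INT, FirstName TEXT);
INSERT INTO EmployeeErrors VALUES ('Jimbo'), ('Pamela'), ('Toby');
INSERT INTO EmployeeDemographics VALUES (1001, 'Jim'), (1002, 'Pam'), (1005, 'Toby');
""")

# Exact join: only names that are identical in both tables pair up.
exact = conn.execute("""
    SELECT err.FirstName, dem.FirstName
    FROM EmployeeErrors AS err
    JOIN EmployeeDemographics AS dem ON err.FirstName = dem.FirstName
""").fetchall()

# Fuzzy join: compare only the first three characters of each name.
fuzzy = conn.execute("""
    SELECT err.FirstName, dem.FirstName
    FROM EmployeeErrors AS err
    JOIN EmployeeDemographics AS dem
      ON SUBSTR(err.FirstName, 1, 3) = SUBSTR(dem.FirstName, 1, 3)
    ORDER BY dem.EmployeeID
""").fetchall()
```

With more rows, a three-character prefix alone would produce false matches (Jim and Jimmy, Pam and Pamela Smith), which is why the extra conditions on last name, gender, age and date of birth mentioned above would be added to the ON clause.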
all upper and lower is going to do is basically take all the characters in the
text and make them either upper or make them lower so it's very self-explanatory uh
let me copy this up here and we will get going on this one uh let's just look at
the first name
02:00:44
um specifically we're going to be looking at Toby right here so let's do first name
let's do uh lower and all we have to do is put in the column that we want to do so
this is our original first name and lower then takes every single
character in here and makes it lowercase that's all it
does and it is the exact opposite when we do upper so we can now take a
look at this one and now everything's going to be capitalized so there is a lot
that you can do with
02:01:25
these string functions and this is not all the string functions that there are
there are a lot more but I would say that these are the more popular more useful
ones that I typically use on a regular basis and so I hope that this has been
helpful I hope that you learned something from this if you did be sure to like And
subscribe below I have a lot more videos coming out with tutorials on everything
from SQL python Tableau and Excel thank you so much for joining me I appreciate it
and I will see you in the
02:01:52
next video what's going on everybody welcome back to another SQL tutorial
today we are talking about stored procedures now what is a stored procedure a stored
procedure is a group of SQL statements that has been created and then stored in
that database a stored procedure can accept input parameters and we will be looking
at that today but that means that a single stored procedure can be used over the
network by several different users and we can all be using different input data
a stored
02:02:31
procedure can also reduce network traffic and increase performance and lastly
if we modify that stored procedure everyone who uses that stored procedure in the
future will also get that update let's start writing out the stored procedure so we
can look at the syntax we'll start off very simple and then in the next one we'll
get a little bit more complicated so the very first thing that you need to write is
create and then procedure and after that you're going to name it so let's just call
this one
02:02:56
test and all you're going to say is as and then you're going to write your query
and so let's just do select everything from employee demographics and that is it we
have created our very first stored procedure of course this is super super simple
but let's execute this really quick and take a look at it so it says that the
commands completed successfully let's go over to our SQL tutorial database we're going to go
over to programmability stored procedures and it is not showing up there what we
need to
02:03:29
do is we need to refresh our stored procedures we're just going to go right here
we're going to click refresh and then there is our stored procedure now how do you
actually use the stored procedure that we just created so let's go right down here
and let's say exec which means execute and then all we're going to say is test
and we're going to run this and there we go so all we put in this stored procedure
was a select statement and so when we actually ran the stored procedure it
returned
02:04:00
our select statement now let's go down here and we're going to make it a little bit
more complicated we're going to do the exact same thing create procedure
make sure I spelled that right and let's call this temp_employee so if you
remember from a previous video we worked on temp tables and we created our temp
tables then inserted data into that we are going to add that to this stored procedure
so we can see the difference between a simple query versus a little bit more
complicated query so I'm going to say as
02:04:31
and then I'm going to insert that in here now what this is doing is I'm creating a
table and then right down here I'm inserting into that table now if I create this stored
procedure and then execute it nothing is actually going to be returned it will
insert the data into that temp table but since I don't have a select statement in
this procedure nothing will be returned so let's write select everything and
we'll just do from and this is temp employee and right here and so now let's
02:05:02
create our stored procedure so that created successfully let's refresh over here and
let's execute this so let's just go down right here and say execute and it's going
to be temp employee and now we will execute this and there is our output now really
quick let's go into temp employee and we actually want to change this stored
procedure so we're going to go over to modify so when we modify it a few things are
going to show up on your screen the first thing that you're going to see is
02:05:37
it says use SQL tutorial so it's just specifying the database the next two things
you may not be as familiar with it's SET ANSI_NULLS and then SET QUOTED_IDENTIFIER
if you don't know what these are it's not super important the first one just controls
how comparisons to nulls are handled when you're using the where statement and then the
quoted identifier setting controls how double quotes are treated in the actual query itself
again not super important but they have those automatically turned on let's go down
a
02:06:03
little bit further and we're going to look at the alter procedure so we created our
stored procedure but now we want to alter it so this is the alter procedure and we
are going to add a parameter to this so what the parameter is going to allow us to
do is when we're actually executing the stored procedure we can specify an input
into that stored procedure so that we get a specific result back and I'm going to
show you what I mean by that in just a second but let's actually add our input and
we're
02:06:31
going to say at sign and then job title (@JobTitle) and we need to specify the data type
that that is going to be so let's just say nvarchar(100) I know below it says varchar(100)
but that's not extremely important so this is going to be our input so we need
to go down here and say where job title is equal to @JobTitle so when we
actually are executing this and we say the job title is equal to let's say
accountant this is going to become accountant and it's going to give us our results
based off of it being an
02:07:08
accountant so let's go over here and we are going to click this execute temp
employee which we just modified and when we run it we're going to get an error
because it is now expecting us to include our parameter of job title so what we
need to do is we need to say @JobTitle and let's say it's equal to salesman
now let's try running this one and see what we get and so there is our output if we
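SQLite has no stored procedures, so the following is only a rough analogy, not the T-SQL from the video: a Python function wrapping the same steps (drop and create the temp table, insert rows filtered by an input, select the result), with a query placeholder playing the role of the @JobTitle parameter. The function name temp_employee, the columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO EmployeeSalary VALUES (?, ?, ?)",
    [(1, "Salesman", 45000), (2, "Salesman", 63000), (3, "Accountant", 47000)],
)

def temp_employee(connection, job_title):
    """Hypothetical stand-in for: EXEC temp_employee @JobTitle = 'Salesman'."""
    connection.execute("DROP TABLE IF EXISTS temp_employee")
    connection.execute("CREATE TEMP TABLE temp_employee (JobTitle TEXT, AvgSalary INT)")
    # The ? placeholder plays the role of @JobTitle in the WHERE clause.
    connection.execute(
        "INSERT INTO temp_employee "
        "SELECT JobTitle, AVG(Salary) FROM EmployeeSalary "
        "WHERE JobTitle = ? GROUP BY JobTitle",
        (job_title,),
    )
    return connection.execute("SELECT * FROM temp_employee").fetchall()

# The same stored logic returns different results for different inputs.
salesmen = temp_employee(conn, "Salesman")
accountants = temp_employee(conn, "Accountant")
```

As with the procedure in the video, calling the function without the job title is an error, because the input is required.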
go back here I just wanted to show you really quick we do not have to put this job
title right here you can
02:07:42
put this anywhere in the query and use it however you want that's how parameters
work and that's why parameters are so useful and you can use multiple parameters
for one stored procedure so you don't have to just limit yourself to one or none you
can put as many as you really like so I hope that this video is helpful and that
you understand stored procedures just a little bit better thank you guys so much for
watching I really appreciate it if you like this video be sure to like And
subscribe below and I'll see you in the
02:08:08
next video what's going on everybody welcome back to another SQL tutorial
today we are going to be talking about subqueries now subqueries are often called
inner queries or nested queries and they're basically a query within a query a
subquery is used to return data that will be used in the main query or the outer
query as a condition to specify the data that we want retrieved you can use
subqueries almost anywhere you can use it in the select part of a query the from
the where you can also use it in
02:08:48
insert update and delete statements but in today's tutorial we're only going to be
looking at the select the from and the where statements and you should get a pretty
good idea of how to use it in those other statements all right now I'm going to
paste on screen basically what we're going to be walking through today but really
quick let's just take a look at the table that we're actually going to be working in and
that is going to be from the employee salary table and I just want to show you the
data that we're
02:09:11
going to be working with before we actually get into it so we have an employee ID
we have a job title and then we have a salary so really quick I'm going to show you
what it looks like to have a subquery in the select statement so let's go down here
really quick and what we're going to try to do is do something like a
window function but without actually having to use the window function and so
we're going to do this with a subquery so I'm going to select and really quick
02:09:40
actually let me copy this so we're going to do employee ID there we go we're going
to do salary and now we can start building our subquery so we need to do an open
parenthesis and I'm just going to copy this really quick because we're going to be
doing it off of that table so we're going to say select and then I'll paste that
and close it as well but what we want to do is we want to say average and salary
now what this is going to do is it is literally going to run this and
02:10:10
let's run this really quick it is going to run this and is going to show that the
average salary for all the employees is 40 $ 7,99 so we are looking at the average
salary for every employee so when we run this it is going to give us the employee
ID the salary and then in the very last one is going to show the average salary for
every employee now it doesn't have a column header so or or a column name so let's
give it um let's say as all average salary and we'll run that one
02:10:46
more time just to make it look a little prettier you can also do this with
partition by I'm going to really quickly write this out
it should take no time at all and then I'm going to show you
why a group by doesn't work for this so really
quickly let me copy this I'm going to put it right down here and we're going to say
average salary whoops and we can get rid of all this and we can say over and we're
not going to partition it by
02:11:18
anything but let's run both these at the same time and you'll see that they're the
exact same outputs and so it's just a different way of doing it in this example but
it really is just to show a comparison of how you might be able to use a subquery
in the select statement now you might be wondering why group by does not work for
this uh really quickly I'm going to write this out and let's get rid of that and
we'll say Group by whoops let me at least try to write it correctly Group by and
we'll do employee
02:11:50
ID and we also have to do salary and then we'll say order by one two so let's run
this and as you can see since we have to use the group by it groups by both the
employee ID and the salary and so we're not going to be able to get that all average
salary that we're looking for that we can get with the partition by and also the
subquery in the select statement now I'm going to show you the subquery in the from
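The three variants compared above can be put side by side in a short sketch. It uses Python's sqlite3 module as a stand-in for SQL Server and assumes a bundled SQLite of version 3.25 or newer, which is needed for OVER (). The rows are illustrative, chosen so the overall average is a round 50000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeSalary (EmployeeID INT, JobTitle TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO EmployeeSalary VALUES (?, ?, ?)",
    [(1, "Salesman", 40000), (2, "Salesman", 50000), (3, "Accountant", 60000)],
)

# Scalar subquery in the SELECT list: one overall average repeated on every row.
with_subquery = conn.execute("""
    SELECT EmployeeID, Salary,
           (SELECT AVG(Salary) FROM EmployeeSalary) AS AllAvgSalary
    FROM EmployeeSalary ORDER BY EmployeeID
""").fetchall()

# Window function with an empty OVER (): same result without a subquery.
with_window = conn.execute("""
    SELECT EmployeeID, Salary, AVG(Salary) OVER () AS AllAvgSalary
    FROM EmployeeSalary ORDER BY EmployeeID
""").fetchall()

# GROUP BY makes one group per (EmployeeID, Salary) pair, so each "average"
# is just that single row's own salary.
with_group_by = conn.execute("""
    SELECT EmployeeID, Salary, AVG(Salary) AS AllAvgSalary
    FROM EmployeeSalary GROUP BY EmployeeID, Salary ORDER BY 1, 2
""").fetchall()
```

The first two queries return the same rows; the third shows why GROUP BY cannot produce an all-employee average next to per-employee columns.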
statement so let's just get rid of that really quick and let's say select
everything let's say
02:12:25
from and we're going to do an open parentheses here and here is where we're going
to write our subquery so if you have watched previous videos where I've done uh
tutorials on the CTE or tutorial on the temp tables this is one that is very much
like those except I think a little bit less efficient when I'm doing something
where I'm creating a table and then querying off of it which is what we're about
to do I much prefer a CTE or a temp table subqueries tend to be a little bit slow
compared to a temp table
02:12:58
or a CTE I tend to use temp tables a lot more because you can reuse them over and
over whereas a subquery you cannot you have to write it out each time so really
quickly I'm going to show you how it's done although I don't really recommend using
this method really quickly let's go up here and let's steal this partition by
really quick this will be our subquery and let's paste this in here I'm going to
make this look a little nicer just so you can visualize it a little
02:13:26
bit easier um so really quick what this is going to do is it is first going to run
this and create this table again much like a temp table or a CTE so let's execute
this really quick it's going to create this table and then it's going to allow us
to query off of it so I can actually say and let me give an
alias to this a.EmployeeID and then let's say all average salary so now I can
take um columns from this inner query if I want to and just select those or I can
select everything and
02:14:04
return that entire table. Again, I much prefer a temp table or a CTE for this type of
situation, but as an example I just wanted to show you how it works. Now let's go
down to the subquery in the WHERE statement, but really quick I'll just steal this query
so I don't have to rewrite everything. Let's get rid of this really quick and
add back the job title. All right, so let's look at this really quick. We have our
table that we've been using: our employee ID, job title, salary. So for this example we
only
02:14:39
want to return employees if they're over the age of 30 and as you can see in this
table there is no age column that is in the employee demographics table now if we
wanted we could join to that table and get that information or we could use a
subquery and so for this example we are going to be using a subquery so let's go
right down here and say where employee ID is in and we'll do an open parentheses
and now this is where we are going to build out the subquery so just for visual
purposes I'm going to go
02:15:07
right here I'm going to say select everything and we'll do from employee
demographics and close the parenthesis so we're going to try to select something in
this subquery that will then identify the employee IDs that are over the age of 30
so really quickly let's take a look at this table so right now we have the entire
table selected so we have the employee ID first name last name age and gender so in
this subquery the only thing that should be returned is the employee ID and in fact
in your
02:15:39
subquery you can only have one column selected, so I can't select everything; I have
to specify one column. That's a little bit different than how we did it in
the FROM statement, where we were basically able to select the entire table and
then in the SELECT statement specify what columns we wanted. In the WHERE statement
we can't do that, so we want to return the employee ID and we also want to say where
the age is greater than 30. So let's run this really quick and see if it works. As
you can see
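
A minimal sketch of the subquery in the WHERE statement, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server. The table and column names follow the ones spoken in the walkthrough; the rows themselves are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (EmployeeID INTEGER, FirstName TEXT, Age INTEGER);
CREATE TABLE EmployeeSalary (EmployeeID INTEGER, JobTitle TEXT, Salary INTEGER);
-- Made-up rows for illustration only.
INSERT INTO EmployeeDemographics VALUES (1001, 'Ann', 29), (1002, 'Bob', 34), (1003, 'Cy', 41);
INSERT INTO EmployeeSalary VALUES
    (1001, 'Salesman', 45000), (1002, 'Receptionist', 36000), (1003, 'Accountant', 42000);
""")

# The subquery returns exactly one column (EmployeeID); the outer query keeps
# only the salary rows whose EmployeeID appears in that list.
where_subquery = """
SELECT EmployeeID, JobTitle, Salary
FROM EmployeeSalary
WHERE EmployeeID IN (
    SELECT EmployeeID
    FROM EmployeeDemographics
    WHERE Age > 30
)
ORDER BY EmployeeID
"""
rows = conn.execute(where_subquery).fetchall()
for row in rows:
    print(row)
```

Age never appears in the output; showing it would need a join to EmployeeDemographics, as the walkthrough goes on to say.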
02:16:11
in the results these are the employees who are over the age of 30 now if you wanted
to display the age as a column in this output you would have to join to that table
and then put that column or that field in the select statement but in a lot of
situations you won't actually want or need to do that and so a subquery can be a
really good option in these scenarios. With that being said, this is the last video
in the advanced SQL tutorials. I hope that this series has been helpful and that
you learned
02:16:37
something along the way thank you so much for joining me I really appreciate it if
you like this video be sure to like And subscribe below and I'll see you in the
next video [Music] what is going on everybody welcome back to another video today
we are starting our data analyst portfolio project [Music] series now before we
jump into our first project I wanted to talk with you for just a second so that
we're all on the same page first thing is that there are going to be four projects
the first one
02:17:17
is going to be SQL, and we'll be doing a lot of data exploration and setting up a
lot of our data to visualize it in Tableau. Tableau is going to be our second
project. In our third project we're going back to SQL, but we're going to be
doing a lot more of the ETL process, so a lot more of the data cleaning. I did that
one as the third project because I think it's going to be a little bit more
advanced than this first project. I tried to make it as beginner friendly as
possible, so even if
02:17:44
you are a complete beginner as long as you've walked through uh you know the
tutorial that I have made on my channel you should be pretty good and then the
fourth and the final project will be with python we'll be using a lot of pandas
doing a little bit of data cleaning and then doing visualizations as well as I said
just a second ago I'm trying to make this as beginner friendly as I possibly can
the whole point of the series is that if you are trying to apply for a data analyst
job by the end
02:18:08
of the series you should have an entire portfolio or at least a a really good start
at a portfolio to show a potential employer I give you full permission to copy
every script every query line for line if that is what you want to do and create
your own portfolio I am totally fine with that but I will encourage you and I'm
sure I'll say this throughout the video I encourage you to try to think of your own
queries try to think of your own insights and your own things that you can do to
make this portfolio
02:18:32
project unique with that being said I'm super excited to get started on this with
you guys so let's jump over to my screen and get started on our very first project
All right, so now that we are on my screen, we are going to get started on this
project. We're going to download the data set, we are going to format it just a
little bit in Excel, and then we're going to get into SQL where we will start
querying it. I will say that I think this is going to be a very long video. I'm
hoping to keep it
02:18:55
under an hour and a half I may separate this into two videos depending on how long
it runs um but you know I I will do my best to keep it short but we have a lot to
get through I'm going to basically do no Cuts I'm I'm that's my goal is to do no
cuts um in this because I want to walk you through each step of the process so that
you understand everything that's going on and I I you don't get lost at some point
um but I think this is probably the best way to do it we'll see uh the very first
thing
02:19:27
we're going to do is download our data set so you know as we're looking at this
there's an option right here to download the data set I don't recommend that one um
you can it just won't give you all the information that I personally want which is
go back to like the very beginning um if you go down right here to the very first
graph um you can actually push this back and then download it and what this will do
is it will go back to I think January 1st of 2020 so let's open this one
02:19:57
up, and when we get in here we're going to reformat it just a little bit. It's
nothing too complicated, I hope. I'm just going to double click here; actually, let
me go up here and filter, just in case we want to filter anything. So what
we have here is a ton of information on COVID, I mean just a ton, and it goes back to
early 2020; I believe it does go back to the first of 2020. So really quick, a
brief introduction of what kind of data is in here: we have total cases, new cases,
total deaths, new deaths. We use
02:20:33
those quite a bit in the queries that are coming up. If we go way over here
we have total vaccinations, people vaccinated, and then over here a little
bit farther we have population. That's the main stuff we're going to be working with
today. As you can see there are so many other things in here; I mean, you can use this
if you want to go back and do more stuff on this, and I highly recommend it. There's
such unique data in here about smokers and diabetes and like
all
02:21:05
this random stuff that I did not do a deep dive in. I mean, I could spend, you
know, a month just looking at this data set and getting really interesting
stuff from it, but I'm not going to do that; I wanted to do this faster than
two months to complete. What we're going to do is we're going to go back over
here we're going to take this population and we're going to click on this as and
we're going to click contrl X and that's going to cut it we're going
02:21:36
to go back to the very beginning and we're going to place it right here and we're
going to right click and say insert cut cells now why are we doing this because
I've already done this entire project um and if you don't do this you're going to
do a join with every single query you do which if you want to do that keep it there
and then just you know change your query for for that I did it like this because I
wanted to show joins later on I wanted to keep it kind of simple at the beginning
um
02:22:05
and then work my way to a little bit more advanced things which you will see um it
gets you know semi Advanced but not too much I promise um just stick with me let's
go back over here. We're going to go to, actually, column AA, and then we're going to
press Ctrl+Shift+Right Arrow. That's going to select everything over here, and we're
going to literally delete it. Okay, this is going to be our first table over here, so
everything you see over here is our first table, and we're going to save
02:22:33
that so let's save as I'm just going to keep it in my downloads as and let's do
covid deaths so that has our death information the next one is going to include our
um vaccination information which is what we're going to join on and then um we're
going to do that later so let's let's hit contrl Z that's going to bring it back
now let's select on Z and go all the way to e and we're going to do the same thing
we're going to delete this looks like there's no data but I
02:23:02
promise there is later on. The vaccinations, like total vaccinations, if we go down,
you can see that that starts at the very end of February in 2021;
that's because vaccinations didn't come out till recently. Now let's
save this file, and we're going to save as; instead of covid deaths we'll do covid
vaccinations. All right, now let's save that. So now we have our two Excel files that we
want, and we need to get them into SQL. We're going to go over to SQL and we're
02:23:39
going to create a portfolio project database I've already done this all you have to
do though is rightclick click new database type in portfolio project and then click
okay and it will create your database for you um if you open up the tables it
should be empty and that's where we're going to put these two Excel files now uh I
had a ton of trouble actually importing these excels um I mean I tried everything
and I eventually just went down a rabbit hole of how to get these in I don't know
02:24:11
if it's me or or what but I could not figure out how to do it if you go to
portfolio project you hit tasks and you hit import data that may do it for you and
it may work um it did not work for me uh it just it kept giving me errors so what I
would recommend you do right off the bat just to make sure that we're doing the
same thing um and you can do it that way if you want I went over here to start um
again I'm on a Windows and I went down to Microsoft SQL Server 2019 and clicked
Import and
02:24:46
Export. It looks the same, but for whatever reason, from all the research I did, it has
to do with the 32-bit versus the 64-bit. When you do it this way it goes to the 64-bit
version and it is able to import the data; if you do it the other way it was using
the 32-bit version and gives you an error. I don't understand it, don't ask me.
I mean, I went down a huge rabbit hole, but this one works. So
let's go over here, and this is going to be our data source: where is the data coming
from? It's an
02:25:16
Excel file, so let's do that. Let's browse and go over to my downloads. I
thought I saved it in downloads; maybe because it's an Excel workbook? What was I
saving before? Oh, that's a CSV. Okay, something important to note is we're doing an
Excel file and not a CSV, or you're going to get the same error. I'm just doing it live and
I'm making myself look stupid. So we're going to save it, but instead of a CSV
we're going to save it as an Excel workbook. So let's
02:25:52
save that um now we have to go back to how it was right here um the same way and
we're going to file save as and let's do this is now covid deaths and save it as a
workbook now we have them now let's go back um now we have our covid deaths and our
covid vaccinations let's do our deaths first um let me get back right here so it
looks kind of more normal um so we have our Excel file we have our covid deaths
let's go next and now we have to say where we're going to place it where's our
destination so
02:26:31
we're going to click over here and go down to SQL Server Native Client 11.0. I want
to say this is something that I messed up, and it took me like 45 minutes to figure
out; it was the stupidest mistake. It's going to auto-populate a server name, and I
never checked to confirm that this was my server name, so I couldn't figure out
why I wasn't able to insert this into my portfolio project database. That's
because mine is 01; I created two different servers intentionally, and for
whatever reason I
02:27:06
forgot that, and so all I have to do is add 01 over here. So just make sure yours
matches your own server name. Click portfolio project, click next, yes we want to copy the
data. It should auto-populate; if it doesn't, if it gives you multiple, you can
always check mark the one that you think is the right one; it should be the
first one. We'll click next; I'm sure it says run immediately;
we'll click finish and finish. Now while this is running, there should be around
02:27:35
89,000 rows; that's how it was like a week ago when I started it, maybe a little more now
because there are extra days. With that being said, there's going to be a
good-sized amount of data. We're about to do a lot of different things. We're going
to start at the very basics of just querying the table, like super simple,
and then we're going to go into things like joins, CTEs, temp tables, creating
views. The whole purpose of what we're about to do is not
02:28:06
to keep it too simple. I want to showcase to a potential employer
that you can do more advanced things. I
mean, because I have already done this entire project
individually, we've probably got like 15 to 20 queries here. You don't have to
do all of them. I'm going to walk through all of them and you can choose which
ones you want, but you don't have to do all of them; it is quite a few, so
02:28:36
just know that. So there's 85,000 right here; that's fantastic. It won't show up
immediately, you need to refresh it, and there we go. So that's our covid
vaccinations; let's get rid of this so we just have covid vaccinations. I thought
that was our covid deaths one, but maybe I'm wrong. Let's do the exact same
thing down here: we will import and say next, we're going to go down to Excel and
browse, and now we want to do the covid deaths. Apparently last time we
02:29:15
did the vaccinations. Actually, you know what, I bet what it did
was it took this right here as covid vaccinations, but that was the deaths
one as it saved, so forget that. Let's go right here and do the covid vaccinations;
it just has the same sheet name, so sorry for the confusion. Destination is going
to be the exact same place: it's going to be SQL Server Native Client; let's add that
01 and let's click refresh, portfolio project, next, next. Like I said before,
02:29:57
if it does this just click the first one. It's going to be covid vaccinations; it did
that for the covid deaths because I made the mistake earlier. I
hope when you're watching this you aren't super confused. The whole point: make
two tables, or make two Excel files, one should be covid deaths, one should be covid
vaccinations; upload them and then rename them, in a nutshell. So we have the same
amount; let's refresh this. This one is actually the covid vaccinations, this one is
covid
02:30:31
deaths. I'm telling you, this stuff confuses me sometimes, to be honest. But
we're going to query this really quick to make sure we are actually doing
what we're supposed to be doing. So let's do select everything from, and let's do
PortfolioProject, and you can do .dbo. or you can do dot dot (..); I tend to just do that
because it's easier. Let's look at this one to make sure it's the right table. So we
have total cases, new cases, perfect. And let's order on, let's do
02:31:09
3, 4, or ORDER BY 3, 4 of course, just to make sure that we
have everything that we're looking for. So this looks right; this looks like our
Excel file. Let's copy this, go down here, and do covid vaccinations, and
let's run this one to make sure the second one came in correctly as well. So perfect,
we have our two tables. This is fantastic news, and now we can get going. We can
keep this one; I'm going to comment it out in case we want to come back to it.
I'm going
02:31:51
to really quick again right here I have another laptop I have already done this
whole project so I'm just using it as a guideline to know kind of what I'm doing
next so that I don't waste everyone's time um so really quickly let's just let's
select the data that we are going to be using you don't have to use these comments
I will say that I'm going to specify I'm going to say hey this comment is something
I would keep in your portfolio project I'm going to add
02:32:21
a bunch of extra stuff that is not needed, just for your purposes, but when you are
creating your portfolio project you shouldn't be adding some of the things that I'm
going to be commenting on. So, really
quick, let's copy this so that it kind of knows what we're doing. Let's select the
location, the date, the total cases, the new cases, the total deaths, and
then population. Now, where we're at, I'm going to turn off my camera because it's
02:33:04
going to get it's going to start getting in the way to be honest I don't want it to
interfere with your ability to see what we're doing on screen so it's been great
seeing you guys I'm going to turn this off and we will continue from here all right
that should be turned off, so let's keep running. So this is what we're doing.
Actually, let's keep this going, because I don't like things not being organized.
So we have our location; oh no, we want to do ORDER BY 1, 2. We want to do
02:33:36
it based off the location and the date; it makes everything easier, I promise you.
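
As a rough illustration of ordering by column position, here is a sketch that selects the six working columns and sorts by the first two (location, then date). Assumptions: SQLite through Python's sqlite3 instead of SQL Server, a cut-down CovidDeaths table, and made-up rows with placeholder dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidDeaths (location TEXT, date TEXT, total_cases REAL, "
    "new_cases REAL, total_deaths TEXT, population REAL)"
)
# Rows are deliberately inserted out of order; values are made up.
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("United States", "2020-01-23", 1, 0, None, 331000000),
        ("Afghanistan", "2020-02-25", 1, 0, None, 38900000),
        ("Afghanistan", "2020-02-24", 1, 1, None, 38900000),
        ("United States", "2020-01-22", 1, 1, None, 331000000),
    ],
)

# ORDER BY 1, 2 sorts by the first and second columns of the SELECT list.
rows = conn.execute("""
SELECT location, date, total_cases, new_cases, total_deaths, population
FROM CovidDeaths
ORDER BY 1, 2
""").fetchall()
ordered_keys = [(r[0], r[1]) for r in rows]
print(ordered_keys)
```

Positional ordering is quick for exploration, but it silently changes meaning if the SELECT list is reordered, so naming the columns is the safer choice in a saved script.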
so we're going to be the first one's obviously Afghanistan here's our date we have
our total cases are new cases total deaths and population so really quick I'm just
going to scroll down just a second um they started having you know the the total
deaths it's um it started about a month after they got their first case it looks
like so and then it just like ramps up a lot um and we're going to be
02:34:10
diving into all these numbers what they mean how to you can do some really simple
calculations on them um but really quickly we're just going to do again a super
simple calculation um and one that we do multiple times for different things um so
let's go right down here and let's say uh we're going to be looking at the total
cases versus total deaths so how many cases are there in this country and then how
many deaths do they have per um uh you know how many deaths they have for their
02:34:48
entire cases. So let's say they have a thousand people who've been diagnosed and
they had 10 people who died: what's the percentage of people who died who
had it? So let's go right down here, and I'm just going to copy this
really quick. This is just going to make our life easier; I think you should do the same
as well. So we have location, date, total cases, and we're going to get rid of
our new cases; we don't need that one in this query right here, nor do you need
this population. So let's work on our calculation really quick; it should be super
easy. Let me make sure I'm still recording. Perfect. Oh man, we're almost 25
minutes in, or more because I have the intro. So now we want
to know the percentage of people who are dying who actually get infected, or
who report being infected. So we're going to do total_deaths,
we'll go right down here, and we're going to divide that by the total
02:35:55
cases, total_cases. If we do this really quick, what it's going to have,
well, let's go down to where there are actually numbers. So we have 34, we have one,
and it's showing 0.029. If you ever try to get a percentage of something you have
to multiply by 100, so let's do that really quick. All we have to add is the,
what's that, the asterisk sign, times 100. And while we're here let's just add
the, what's it called, alias. Let's call this death percentage. I
02:36:35
don't know, that works for me. Let's take a look at this; it'll be a little
bit more accurate. So when there were 34 cases there was one death, and that gives
us a 2.94% death rate. We can go down even further, and this is still all
Afghanistan. Let's go down to the very bottom.
So as of yesterday there were 59,745 total cases in Afghanistan and
there were 2,625 deaths, which is about 4%. So you have a 4% chance basically right
02:37:14
now of dying, I mean, if you want to look at it like that: a 4% chance of dying if
you get it and you live in Afghanistan. You don't have to, but
really quick, just to look at it further, let's look at WHERE the location, I think
let's say LIKE real quick, because I'm not 100% sure how it's written, is like "states";
I think it's United States. Yeah. So I mean, I live in the United States; if you
don't, you can look at your country. But, you know, this is
genuine, real
02:37:50
reported data, so it's really interesting. Right at the beginning, I
don't know if it was the way we were reporting or what, but we had really high
percentage rates. As we go down we're looking at 5%, 6%; I mean, this was the peak
of it. This got really bad in the US; I hope it gets better. How many
are we at? I'm going to go to the end of this year: we're sitting at around 2 to
3%. Yeah, it goes down to under 2%. So at the end of the year
02:38:22
we were looking at over 2 million people, no wait, 20 million
people who have been infected. That's a lot:
20 million people who have had it, and 352,000 deaths by the end of the
year. Let's keep going. At the very end we had over 32 million cases,
and there are a lot of deaths, 576,000. I verified this number; I Googled it,
Google knows all, and it's pretty accurate,
02:39:06
and it's really sad; that's a lot of lives. That's 1.78%, so as of right
now, if you were to get it today, an estimate is around a one and three-fourths to
2% chance that you could die from it. So, really interesting numbers.
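
The death-percentage calculation and the country filter can be sketched like this. Assumptions: SQLite via Python's sqlite3 stands in for SQL Server, both count columns are stored as REAL so the division is not truncated, the Afghanistan row uses the 34 cases and 1 death mentioned in the walkthrough with a placeholder date, and the United States row is a made-up round example. The helper name death_percentage is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidDeaths "
    "(location TEXT, date TEXT, total_cases REAL, total_deaths REAL)"
)
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?, ?)",
    [
        ("Afghanistan", "2020-03-24", 34, 1),       # 34 cases, 1 death; date is a placeholder
        ("United States", "2020-03-01", 1000, 10),  # made-up: 10 deaths per 1,000 cases
    ],
)

def death_percentage(pattern):
    """Return rows for locations matching a LIKE pattern, with deaths as a percent of cases."""
    query = """
    SELECT location, date, total_cases, total_deaths,
           (total_deaths / total_cases) * 100 AS DeathPercentage
    FROM CovidDeaths
    WHERE location LIKE ?
    ORDER BY 1, 2
    """
    return conn.execute(query, (pattern,)).fetchall()

afghanistan = death_percentage("Afghanistan")
united_states = death_percentage("%states%")  # wildcard match on part of the name
print(round(afghanistan[0][4], 2), round(united_states[0][4], 2))
```

If the columns were integers, some engines would truncate the division to zero before the multiplication, so multiplying by 100.0 first is a common safeguard.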
this is the kind of exploratory stuff that that you know we're going to be doing
we're going to get a lot more advanced as we go on but this shows you know the
likelihood um and we can I'm going to write that shows the likely I hope I'm
spelling
02:39:39
this right. I'm not spelling this right: likelihood. I hope that's right; if it's not, I
apologize. Likelihood of dying if you contract covid in your country. Again,
rough estimates, but just glancing at the data that's kind of what we're
looking at. Now let's go down here and look at
the total cases versus the population. Again, we're going to do a lot of
this percentage stuff; it's pretty simple. That will only last for
02:40:27
so long I promise you but it'll be really I'm going to keep it on the states just
because um I'm going to be looking at that one the most because obviously it's
pretty relevant to me um so if you're in another country filter by your country
you'll be really interested in the stats I I know I was really really really um
shocked by a lot of the things that we're going to find today so we're going to
keep the location we're going to we're going to keep the date keep the
02:40:57
total cases, but let's change this to population, and then instead of the total
cases being here we're going to put the total cases there and then change this to
population. So what is this going to do for us? This is going to show us what
percentage of the population has gotten covid. So: shows what percentage of
population, oops, got covid. Some of these things, again, they're good to
know um the one that I upload to GitHub will have the notes that I recommend
keeping um again not
02:41:41
everything in here is um not everything in here is what you know you need to have
in there this is mostly just you know what I think you guys need to see while we're
actually typing this out all right so let's take a look at this um actually I want
to change this I want to put this right here just as easier for me visually um just
for because the total cases right here so our our population in the US is around
331 million um so at the beginning when we had one case I mean it's like nothing
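
A small sketch of the cases-versus-population calculation, under these assumptions: SQLite via Python's sqlite3 instead of SQL Server, a population rounded to 331 million, and two made-up case totals chosen so the percentages land on 1% and 10%. The dates are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidDeaths "
    "(location TEXT, date TEXT, total_cases REAL, population REAL)"
)
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?, ?)",
    [
        ("United States", "2020-08-01", 3310000, 331000000),   # made-up: 1% of population
        ("United States", "2021-04-30", 33100000, 331000000),  # made-up: 10% of population
    ],
)

# Same shape as the death-percentage query, with population as the denominator.
rows = conn.execute("""
SELECT location, date, population, total_cases,
       (total_cases / population) * 100 AS PercentPopulationInfected
FROM CovidDeaths
WHERE location LIKE '%states%'
ORDER BY 1, 2
""").fetchall()
percents = [round(r[4], 2) for r in rows]
print(percents)
```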
let's keep
02:42:22
scrolling and see where we get to 1%. So 1%, that's roughly 3.3 million people, and that
happened in, what is that, August of last year. So 1% of the population. Let's
keep going all the way down; again, we're just kind of glancing at this. We're at about
10%; again, we're at that 32 million, so 10% of the population has gotten
it, gotten a test, and it's been confirmed. So really interesting; you know, we'll
come back to that one, I'm sure, in the future. We might use
this one
this one
02:42:59
as like um a visualization again uh I'm only looking at the states or United States
right now but you know think about it in terms of how we're going to visualize this
in the future cuz a lot of what we're doing we're going to visualize in the future
um in Tableau I have Tableau even open right here you can see I have a map um this
is just a super I threw this together in like two seconds um we have the uh we have
the location and so you know this is like our future this is what you need to be
envisioning when
02:43:34
you're looking at this data. So we have Afghanistan, and let's just scroll
through: Belarus and Bolivia and Bulgaria and Cambodia, every single country
that is reporting. We're just looking at the States, but remember all of
these are going to be used, so just something to remember. I want to know, and I'm
really curious, what countries have the highest infection rates compared to
the population. We were just looking at our population up here. How are we
02:44:10
going to do this? Actually, let me write it out really quick:
looking at countries with highest infection rate compared to
population. That's what this script is going to do, or this query is going to do.
I'm going to copy this. We're going to keep the location; we are not going to
keep the date, since this is not going to be date specific, it's just going to be overall;
and then we're going to look at the max of the total cases. We only want to look
at the highest, so when
02:44:47
we were looking at the US we had 32 million. We don't want to look at every single
value of the total cases; we only want to look at the very highest one, so we'll look at
the MAX of total_cases. And right here we'll give it an alias,
something to recognize it by: I guess we can say infection count, so
we'll say highest infection count. That's the highest infection count per country,
so per location. And then we also want to change this, because since we
02:45:23
don't have MAX of total_cases here, if we just kept total_cases here it'll give us the
same one that we were looking at in the query above. What we need to do is
look at the max of this, so we're going to put MAX and just add
parentheses there. And this isn't the death percentage anymore; I
forgot to change it in this last one. What is this? It's percent of
population infected. So let's change that for both of these, because I don't want you to
get
02:45:56
confused when you're looking at the column headers later. So we'll look at the
percent of population infected. Let's run this and see what we get: "list is not
contained in either the aggregate", oh, I need to add a GROUP BY, of course. So let's
add GROUP BY, and we need to group by both the population and the location.
Let's try that really quick and see if this works. Awesome. Well, we ordered on
location and population, but I really want to look at the highest, so
let's just see
02:46:37
really quick, looking at some of these numbers: we've got like 1%, 4%, 10%. Okay, so
what we want to do is order on this percent population infected, so let's go
ahead and do that, and let's do that descending, so that descending gets the
highest number first um my goodness 177% so what percentage of your population has
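
The query built up over the last few steps (MAX, GROUP BY on location and population, then a descending sort on the alias) can be sketched as below. Assumptions: SQLite via Python's sqlite3 rather than SQL Server, and two fictional locations, Smallland and Bigland, with invented numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidDeaths "
    "(location TEXT, date TEXT, total_cases REAL, population REAL)"
)
# Fictional locations and counts, two dates each.
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?, ?)",
    [
        ("Smallland", "2021-01-01", 5000, 100000),
        ("Smallland", "2021-04-01", 17000, 100000),
        ("Bigland", "2021-01-01", 50000, 1000000),
        ("Bigland", "2021-04-01", 100000, 1000000),
    ],
)

# Every non-aggregated column in the SELECT list must appear in GROUP BY,
# which is why both location and population are listed there.
rows = conn.execute("""
SELECT location, population,
       MAX(total_cases) AS HighestInfectionCount,
       MAX((total_cases / population)) * 100 AS PercentPopulationInfected
FROM CovidDeaths
GROUP BY location, population
ORDER BY PercentPopulationInfected DESC
""").fetchall()
for row in rows:
    print(row)
```

One row per location comes back, with the smaller population ranked first because a larger share of it was infected.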
gotten covid it's been reported and and and um we can see that now so the very
first one small population so it doesn't surprise me but if you look right down
here here so that's that 32 million that
02:47:17
we were talking about; that's that MAX of total_cases, which is the highest
number of our infection count. So we have 33, so we're, I mean, we're right up
there on the list. Let's look for other large countries. I mean, it's us, you know,
there's Israel, there's Belgium, Portugal, France, so we're up to almost
10% in a lot of these countries. So some of us, including the United States
(we are in there as well), have really high
02:47:48
percentage rates we just did not keep it under control um and you know a large
amount of the population has gotten it that's what this one shows um now let's look
kind of at the sad side of things. We were just looking at how many people were
infected; let's look at how many people actually died. So let's comment,
and we'll say this is showing the countries with the
highest, am I spelling that right? Yeah, highest death count per
population. Now how are we going to do this? Let's copy this off the bat, but I
don't know if we're going to do it the exact same way, because we just need location
and not much else, honestly. So let's get rid of all this stuff. We're
looking at the highest death count, so like we did up here with the MAX of
total_cases, we're going to do MAX and then total_deaths, I hope it's like this,
total_deaths, and then we'll do AS total, oops, total death
02:49:04
count, and we'll order that by the total death count. See, I don't need this, I
think. Yeah, I need to GROUP BY because there's an aggregate function. Let's try
this really quick. Okay, so if you're getting this, there's a simple, slash confusing,
explanation. Total deaths right now, let's go into our covid deaths
columns. Okay, let's show the total deaths, which is right here: it's an nvarchar(255).
It's an issue with the data type. Oh wait, total deaths, no, total deaths right here.
02:49:47
It's an issue with the data type. It just has to do with how the data type is
read when you use this aggregate function. We need to convert it, or cast it, is
what we'll actually do. We need to cast this as an integer so that it's read as a
numeric. Why, I cannot 100% give you a perfect explanation for it, but this happens
all the time. You just need to look at the data and realize, oh, it's probably because
of this data type, let's try something else, and then it'll work. So let's cast
this,
02:50:17
and casting it, I find, is just easier: just AS int. Boom, there you go.
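
One plausible reading of the data-type problem is that MAX on a text column compares values as strings, so "9" sorts above "100". The sketch below shows that effect and the CAST fix. Assumptions: SQLite via Python's sqlite3 instead of SQL Server (where the column is nvarchar(255)), a fictional location called Exampleland, and made-up death counts stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# total_deaths is declared TEXT to mimic a numeric column imported as text.
conn.execute("CREATE TABLE CovidDeaths (location TEXT, date TEXT, total_deaths TEXT)")
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?)",
    [
        ("Exampleland", "2021-01-01", "9"),
        ("Exampleland", "2021-01-02", "10"),
        ("Exampleland", "2021-01-03", "100"),
    ],
)

# MAX on the raw text compares character by character; MAX on the cast
# value compares numbers.
row = conn.execute("""
SELECT location,
       MAX(total_deaths) AS MaxAsText,
       MAX(CAST(total_deaths AS INT)) AS TotalDeathCount
FROM CovidDeaths
GROUP BY location
""").fetchone()
print(row)
```

The text maximum is "9" while the numeric maximum is 100, which is why the cast matters before aggregating.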
Now we're taking this nvarchar(255) over here and then we are converting it to an
integer. Now let's run this, and let's get rid of this just for visual
purposes. Now we are much more accurate, but we're
now seeing a slight issue in our data: in the location section we have
a few that really shouldn't be there, ones like World or Africa or South
America. These are
02:51:04
grouping entire continents. So let's go back up here and,
actually, let's pull it up really quick, because this is just part of
exploring the data and figuring it out. If we scroll down,
we're going to see one like, where is it, right here: this location is all
of Asia, whereas in other ones the continent is Asia. If I can pull one up real quick,
like right here, the continent is Asia, whereas before the location is Asia. But
02:51:39
if you also notice, the continent is NULL here. So what we need to do is say WHERE
continent IS NOT NULL, because when it is NULL, that means this location is actually an
entire continent, and we don't want that. That may be helpful for us later on, but it is
not helpful now. So this right here will get rid of that, and now that we've figured
that out we can add it to every script. You don't have to do this; I'm just doing it for
visual purposes. I'm
02:52:18
not going to do that for every one. So let's say WHERE continent IS NOT NULL, and now
let's look at this. Now you can see that the United States is number one, which is not
the best thing to be number one in, with a death count of 576,000. I googled this
earlier and these numbers are pretty accurate; some of them are a day or two behind
(give me a second, I'm going to take a sip of water), so the real number is actually
higher, and
02:52:53
as we continue to have more people die, unfortunately, that number just continues to go
up. So the data that you download may be a lot higher.
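Pulled together, the query described here might look like the sketch below. It assumes
the table is reachable as PortfolioProject..CovidDeaths and that the columns are named
location, continent and total_deaths; those spellings are assumptions, so adjust them to
your own import. The cast matters because MAX on an nvarchar column compares values as
text, so a value like '99' sorts above '576232'.

```sql
-- Countries with the highest death count (a sketch, not the author's exact script).
-- Assumed names: PortfolioProject..CovidDeaths, location, continent, total_deaths.
SELECT location,
       MAX(CAST(total_deaths AS int)) AS TotalDeathCount  -- nvarchar(255) -> int
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL        -- drops rollup rows such as World, Africa, Asia
GROUP BY location                  -- required because of the aggregate
ORDER BY TotalDeathCount DESC;
```

If the cast fails on your copy of the data, check whether the column contains
non-numeric text before aggregating.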
As of right now we've been breaking everything out by location, right? Really quickly,
let's do this by something we saw earlier. I'm just going to add a comment here, in caps
lock: let's break things down by continent. How do you spell continent? Jeez, is that
even how you
spell
02:53:33
it? I don't even know; let's keep going. Now we can put continent right here, and we'll
just copy and paste that. Let's get that back up here, and now we have WHERE continent
IS NOT NULL; let's see if that works. Yeah, okay, so now it's breaking it out by
continent, with North America, South America, Asia, Europe, Africa, and Oceania. Is this
perfect? No, it's not perfect: North America looks like it's only including the numbers
from the United States and not Canada. So we have
some small
02:54:19
issues in here. But for the purposes of what we're trying to do (I don't think anyone's
going to come in here and fact-check us or check the data; they may, and then, I don't
know, you might be screwed), for the purposes of hierarchy and that drill-down effect in
Tableau, which is something we are going to do, we want to start including continent in
our queries so that we can drill down further into these things. We can also do WHERE...
just wait, I'm
02:54:55
going to do WHERE continent IS NULL. Actually, let me see. Before, we were doing WHERE
continent IS NOT NULL, but now let's select location. I'm doing this on the fly; I
haven't done this before. These actually are the correct numbers, and I don't know why I
didn't do this before when I was creating this project, but this is a wonderful,
beautiful thing. I believe these are the correct numbers. I could verify, but I don't
want to do that live because I
02:55:26
might look stupid, but I think this is accurate. Remember, before we were looking at
location, and it was the countries themselves, and we did WHERE continent IS NOT NULL to
get rid of the ones like World and all those other things. Well, now I'm filtering on
those rows instead of removing them: before we were looking at everything but these, and
now we're only looking at these, and these numbers look a lot more accurate. So with
that being
02:55:54
said, I'm going to use this going forward in my script, so I'm going to change things up
from what I originally had. Let me see, though, because if that is the case it may screw
up our drill-down effect, which is highly unfortunate. I honestly might just revert back
for the pure fact that we want the visualizations to look correct. Just know that this
is the right way, and if you want to go back and do that, I highly encourage it. I
didn't figure that
out
02:56:24
my first time around, but I'm willing to admit when I'm wrong. Let me do a time check:
we've run about 50 minutes or so. I think we're just going to keep going all the way
through; I don't think we're going to stop in this project. The above queries were kind
of what we were going for, nothing crazy difficult, nothing crazy hard, and now we want
to start
02:56:55
breaking this out by continent as well. I'm going to go back... is this correct? Let me
look. No, so IS NOT NULL. So we want to start doing some of the above queries but adding
that continent in there. You can even go back and add that to the earlier ones as well
if you want to; that's totally fine. I'm going to do some more queries down here
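The two continent breakdowns tried above can be sketched side by side. Table and column
names are the same assumptions as before (PortfolioProject..CovidDeaths, continent,
location, total_deaths). The first variant returns, for each continent, the largest
total of any single country in it, which would explain why North America matched the
United States figure; the second reads the dataset's own rollup rows.

```sql
-- Variant 1: group country rows by continent.
-- MAX picks the single largest country total per continent, not the continent total.
SELECT continent,
       MAX(CAST(total_deaths AS int)) AS TotalDeathCount
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY continent
ORDER BY TotalDeathCount DESC;

-- Variant 2: keep only the rollup rows, where continent is NULL and the
-- continent name is stored in location. This may also return a World row.
SELECT location,
       MAX(CAST(total_deaths AS int)) AS TotalDeathCount
FROM PortfolioProject..CovidDeaths
WHERE continent IS NULL
GROUP BY location
ORDER BY TotalDeathCount DESC;
```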
or at least one or two more, and then we're going to start getting into some slightly
more advanced things. We're going to start getting into temp tables and
02:57:34
stuff like that, because we're eventually going to set these up in views so that we have
them to use for Tableau later. And again, it shows you know how to create a view, so
that's important. So we've done this first one. This next one (let me go down one more)
is showing the continents with the highest death count, so almost the exact same as we
did before, but now we're looking at the continents. We can even go up and look at...
just wait, we literally just
02:58:16
did that, so that's what this one is. I was looking at my notes wrong, idiot. Okay,
perfect. Now we want to start looking at this from the viewpoint of "I'm going to
visualize this", so how do we do that, and what do we want to look at? Let's look at
some global numbers. You can do as many of these as you want: for anything up here, just
add continent to it, and anywhere there's a GROUP BY, just replace the column with
continent and you've got it. So I don't want to go through and do
every
02:58:54
single one of those, but that is the gist of what you might want to do, especially if
you want that drill-down effect. If you don't know what that is, it's like clicking on
North America, and then it shows all the countries in North America, so Canada and the
United States. You click on Africa and then there are all the African countries. That's
what drilling down does, and that's what you can do when you have
02:59:20
those layers: you have the continent, then you have the location. We'll look at that
when we actually get to Tableau; I don't want to spend all the time writing that out.
What we now want to do is calculate everything across the entire world. So let's add a
comment, and let's just say "global numbers". All right, let me really quick find the
03:00:08
one... I think it's probably the first one, the death percentage. Let me see if this is
the one that we want. Okay, all right, so let's take this one. I'm sorry that took me a
while to find. Again, I'm not cutting any of this stuff out; you've just got to stick
with me. If you're sticking with me this long, I know you care, and I know you're not
clicking away because I'm trying to figure things out on my side. So let me get rid of
this. This is the exact
03:00:43
same script. Well, let's add a WHERE, just so we can get the right numbers. We are now
going to look at the global numbers, so we're not going to include any location, any
continent, or anything like that, but we do want to make sure that we're only looking at
the countries and not at the world numbers plus all the countries, because then the
numbers would get astronomical. So let's try running this
03:01:19
really quick. So now we really can't do this, because it's breaking everything out by
the date, and the total cases numbers are different on each row. So really quick, let's
GROUP BY date and see what it looks like. It's going to give us an error, obviously, and
that's because we're selecting multiple things
03:02:00
and we can't group by just the date. If we want to group by something, which we need to
do, we then need to start using aggregate functions on everything else. So really
quickly, let's do some aggregate functions. I'm looking at my notes for just a second to
see what I did. Basically, what I think will make things easier... I mean, I could try
to do the SUM of MAX total_cases, but I don't think that's possible. Let me comment this
out really
03:02:39
quick. Yeah, it's because there's an aggregate function within an aggregate function,
and we can't really do that. If we go back to the data, and we kind of looked at this
earlier, there's a column called new_cases. Let's use this, because instead of doing MAX
we can just do a SUM on it, and that's going to give us the sum of all the new cases,
which adds up to the total cases. So if we do this, it will give us, on each day, the
total across the world, because
03:03:18
we're filtering out the rows for the world and the continents, and we're not breaking it
out by location or continent or anything; it's just by date. So we're looking at the sum
of the new cases. Now let's do the SUM of new_deaths, and we can run that one. "Operand
data type nvarchar is invalid for sum operator." So, going back, and this is something I
encountered a lot when I was doing this
03:03:58
project: new_cases is a float, which is why it works in the SUM, but new_deaths is an
nvarchar. So what we need to do, again, is cast that as an integer; it's just the
easiest thing to do. And now that one should work. So let's get rid of everything down
to here. We're about to do another one, and that's going to be our death percentage
globally, across, I guess, the world. So we need to do the SUM of, I think it's, new
03:04:39
deaths, all right, divided by the SUM of new_cases, times 100. Let's see what this gets
us. Okay, of course we're getting the same error. Let me put the cast right here and see
if this works. Invalid data... oh, that's because this was new_cases; the new_deaths one
is right here. Let's run this, and now we are looking good. As you can see, the death
percentage is right here, we have 91. And let me give these names; let me go back real
quick and just
03:05:38
say AS total_cases and AS total_deaths, and let's run that again. Okay, so across the
world these are our numbers. On that very first day that cases were starting to be
reported, there were 98 total cases and one total death, which gives us a death
percentage of about 1% across the world. As we scroll down it gets lower and lower, and
that's because we have a lot more people who have gotten infected, the total cases. And
again, that's per
day, right? So if we
03:06:28
remove the date altogether, which we can do right now, this will give us the total
cases, which is, oh gosh, let me read this through, about 150 million, versus roughly
3.18 million deaths. So overall, across the world, we are looking at a death percentage
of a little over 2%. Interesting numbers. You can keep both of those queries separate if
you'd like; they might come in handy later.
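Both global queries can be sketched as follows. Nesting aggregates, as in SUM(MAX(...)),
is rejected by SQL Server, which is why the daily new_cases and new_deaths columns are
summed instead. Table and column names are assumptions carried over from earlier, and
the NULLIF guard is an addition that is not in the walkthrough: it only prevents a
divide-by-zero error on days with no reported cases.

```sql
-- Global numbers per day (sketch; names are assumed, NULLIF is an added safeguard).
SELECT date,
       SUM(new_cases)               AS total_cases,
       SUM(CAST(new_deaths AS int)) AS total_deaths,   -- new_deaths is nvarchar
       SUM(CAST(new_deaths AS int)) / NULLIF(SUM(new_cases), 0) * 100
                                    AS DeathPercentage
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL         -- countries only, so nothing is double counted
GROUP BY date
ORDER BY date;

-- Global numbers overall: the same aggregates with the date column
-- and the GROUP BY removed, which returns a single row.
SELECT SUM(new_cases)               AS total_cases,
       SUM(CAST(new_deaths AS int)) AS total_deaths,
       SUM(CAST(new_deaths AS int)) / NULLIF(SUM(new_cases), 0) * 100
                                    AS DeathPercentage
FROM PortfolioProject..CovidDeaths
WHERE continent IS NOT NULL;
```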
But let's do this. Give me one second to check my notes again, because I just want to
make sure
03:07:23
I'm not doing something stupid. All right, so again, we have a whole other table that we
haven't used yet: it's this CovidVaccinations. Just to refresh your memory, let's select
everything from PortfolioProject..CovidVaccinations and jog our memory on what we've got
here. So we have these tests, and we have vaccinations over here, which is what we're
actually going to be using. Excuse me. That's what we are going to be
03:08:17
using, so let's join these two tables together. Let's do this whole thing FROM
CovidDeaths, and here's how we're going to join it: we're going to say JOIN (oops, wait,
that is wrong), and we're going to say ON. So what are we going to join them on? We're
going to join them on two things. We're going to join them on location, because that's
much more specific than the continent,
03:08:55
and we're going to join them on date. Let's call this one dea and this one vac, a little
alias for each so that we don't have to type out the entire table name each time. So
let's do dea.location is equal to vac.location, and dea.date is equal to vac.date, and
let's just see what we get really quick. So we'll have all of these things.
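Written out, the join described here might look like this sketch. The aliases dea and
vac come from the walkthrough; the fully qualified table names are assumptions.

```sql
-- Join deaths and vaccinations on the pair (location, date), which together
-- identify one row per country per day in each table.
SELECT *
FROM PortfolioProject..CovidDeaths AS dea
JOIN PortfolioProject..CovidVaccinations AS vac
    ON  dea.location = vac.location
    AND dea.date     = vac.date;
```

Spot-checking one country and date on both sides of the result, as done next, is a quick
way to confirm the join keys line up.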
and let's look at Grenada, 0717. Let's go all the way over
03:09:39
here, and it should have Grenada, 0717, so just making sure that they were joined
correctly. For this query, what we're going to do is look at the total population versus
vaccinations, so let's write that in a comment right here: "looking at total population
versus vaccinations". What is the total amount of people in the world that have been
vaccinated? That is what we're going to do in this query right here. So let's do
dea.continent, dea.location, dea.date. Again, these are going to be the same in either
table, but
03:10:26
we have to specify. For example, if we do population... oh, actually that's a terrible
example, because population is only in one table. Let me go back real quick. Let's say I
only write date: that's going to give me an error because there's a date column in both
of them. In fact, we joined on it, so we know it's in both. We just have to specify
which table we want to pull it from, so we're going to do dea, and dea.population, just
to keep
03:10:56
it consistent. Now we're going to add the next one, vac., and let's do new_vaccinations.
Really quick, let's just look at this, and let me get my ORDER BY, because I want it to
be organized. Let's do 1, 2, 3; I don't like it when it's not organized, it bothers me.
Oh no, I also need to add WHERE continent IS NOT NULL. There we go, with dea. in front.
Perfect. Now let's run this; this should look much better. There we go. All right, in
fact, if we want
to look
03:11:48
at Afghanistan, like we have normally been doing in previous ones, we order by 2, 3. So
there's our population, and here's our new vaccinations. Now let's go down and see: they
have vaccinations starting on 2/18. If we go even further down, let's just go to... who's
this? Canada. Oh yeah, Canada would be a good one to look at. They started doing
vaccinations right here, on 12/15. I mean, they started very early, and their numbers
only increased. And
this is
03:12:34
per day, right? So this is 288,000 in one day, so those are really high numbers. But
this is the number of new vaccinations. There is a column called total_vaccinations in
this table, but we're going to do something more involved, just to display it. Again,
this whole portfolio project is to show potential employers that you know how to do
certain things, so I want to set up opportunities to do that. We're not going to use
total_vaccinations; we're going to use new_vaccinations,
which is new vaccinations
03:13:04
per day. So we want to do kind of a rolling count out here: as this number (let me go
back to the beginning) goes 718, 2300, 4179, we want it to add up over here. It's a
pretty cool thing. Once you see it you'll be like, oh, that's pretty easy, but we're
going to be using PARTITION BY, so this is a window function, and it's really good to
showcase, I think. So we're going to do
03:13:45
the SUM, because we're going to be adding these together. So we do the SUM of
new_vaccinations, then OVER, and we're going to say PARTITION BY. We need to partition
by the location first and foremost, because if we do it by continent the numbers are
going to be completely off. We need to do it by location, and also partly the date, but
you'll see that
in just a
03:14:32
second. We need to partition by location, and why is that? Because every time it gets to
a new location we want the count to start over. We don't want this aggregate function to
just keep running and running; it'll ruin all of our numbers. We only want to partition
on the location so that it runs only through Canada, and then when it gets to the next
country it doesn't keep going. And if we only did that, by the way, let's look at what
this
03:15:01
looks like. Okay, real quick, I need to cast this as an integer like we've been doing in
the past. I also want to show you another one real quick: CONVERT. I think it's the
column, comma, int... or is it int, comma, the column? Let me try int first; I think
it's that way, actually. And you can do it this way as well; that is up to you. Either
one is totally fine. If you want to use both, that's even better, because then it shows
you can do both, but they
basically do the exact same
03:15:41
thing. So let's go down and see what's happening here. It goes down to Albania, and
since we're partitioning on location, Albania's total amount of vaccinations is 347,000.
I know that going into it because it has it on every single stinking row. Down here the
new vaccinations start to come in, and we wanted them to add up, but we didn't do that:
we only partitioned on location, so it did the sum of all the new vaccinations for that
location. So what we need to do is go over here and say
ORDER BY, and we
03:16:20
need to order it by both the location (oops, dea.location) and the date. That is very
important: the date is what's going to separate it out, and you'll see in just a second
what I mean. So now let's run this and go back down to Albania. Let's go to our first
one, and here's what we have: we have 60 and it gives us 60; then we add 78, so 60 + 78
= 138; then 60 + 78 + 42 = 180; then 60 + 78 + 42 + 61 = 241. So you get the point: it
adds up every single
03:17:07
consecutive one, and when there are nulls or zeros it doesn't add anything; it just
keeps the total going. You can see it's a rolling count, so let's name this AS
RollingPeopleVaccinated. I think that's good. Now what we want to do is actually look at
the total population versus the vaccinations, and really what we want to do is use this
RollingPeopleVaccinated. We want to use the max number,
03:17:52
because at the very bottom is our max number: this is how many people in Albania have
been vaccinated.
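The rolling count built up over the last few minutes can be sketched as a single query.
Names are assumptions as before; CONVERT(int, ...) and CAST(... AS int) are
interchangeable here. Without the ORDER BY inside OVER, every row of a country would
show that country's grand total; with it, the sum accumulates row by row in date order
and restarts at each new location.

```sql
-- Total population vs vaccinations, with a running total per country (sketch).
SELECT dea.continent,
       dea.location,
       dea.date,
       dea.population,
       vac.new_vaccinations,
       SUM(CONVERT(int, vac.new_vaccinations))
           OVER (PARTITION BY dea.location            -- restart for each country
                 ORDER BY dea.location, dea.date)     -- accumulate in date order
           AS RollingPeopleVaccinated
FROM PortfolioProject..CovidDeaths AS dea
JOIN PortfolioProject..CovidVaccinations AS vac
    ON  dea.location = vac.location
    AND dea.date     = vac.date
WHERE dea.continent IS NOT NULL
ORDER BY dea.location, dea.date;
```

On a larger or more recent extract the int running total could overflow; converting to
bigint instead is one way around that.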
We want to use that number and divide it by the population to know what share of people
in that country are vaccinated. So we'll do RollingPeopleVaccinated divided by
population, times 100, and as you can see, we're getting an error: you can't use a
column alias that you just created in the next expression of the same SELECT. So what we
need to do is create either a CTE or a temp table. This
is the time of
03:18:32
the show, of this tutorial, whatever you want to call it, where I'm going to give you
some options. You can do one, you can do both; there's no preference to me. At least for
this first one we're going to use a CTE. So we're going to say (excuse me) WITH, and
let's call it PopvsVac, population versus vaccination, and then all we need to do is
specify, basically, the columns
03:19:16
that we're going to input. So let's put AS, and let's insert the query down here. For
the column list we say continent (oh gosh, I'm so bad at spelling continent), location,
date, population, and then we'll have this RollingPeopleVaccinated. That should be it.
We just need to close this parenthesis, so this is our CTE and it should be working.
Actually, that's not true: I need an open parenthesis here, and that's why it's giving
me that
03:20:05
error. Let's see, I'm still getting an error, so let me see if I'm doing something
wrong. Do I have this in parentheses, there and there? I say WITH PopvsVac there,
continent, location, date, population... ah, I believe that is the issue: we just need
to add that last column, new_vaccinations. If the number of columns in the CTE's column
list is different from the number of columns in the SELECT, it's going to give you an
error, so you've got to make sure they match. And then let's just say, for right
03:20:53
now, SELECT everything FROM PopvsVac; it'll come up right away. So really quickly, let's
run this and see what happens. The ORDER BY clause can't be in there. I knew that, but
whoops. Let's comment that out, get back up here, and run this. So now that query we
were looking at before is in here, but now we can actually use it to perform further
calculations. So we'll do everything, comma, and then we'll do rolling people
03:21:31
vaccinated divided by population, times 100. I'm pretty sure this is incorrect, give me
a second. "Invalid object"... oh, it's because I have to run it together with the CTE,
my bad. So let's look at this percentage really quick. It's not wrong, and it's actually
going to give us a rolling number, which may be what we want. Basically it's taking this
column versus this column, and so this percentage should only increase,
because
03:22:17
as the rolling count increases the percentage increases, since the population stays
constant. Again, I'm kind of looking at this as we go. So right now 12% of the
population in Albania is vaccinated, and that's all we know.
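A sketch of the CTE version follows. The name PopvsVac and the column list come from the
walkthrough; PercentVaccinated is a hypothetical alias added here, and multiplying by
100.0 before dividing is a small change from the spoken formula that avoids integer
division if population happens to be stored as an integer. The column list must match
the inner SELECT one for one, the inner query has no ORDER BY, and the WITH block has to
be run together with the SELECT that uses it.

```sql
-- CTE holding the rolling count so it can be reused in a further calculation.
WITH PopvsVac (continent, location, date, population,
               new_vaccinations, RollingPeopleVaccinated)
AS
(
    SELECT dea.continent, dea.location, dea.date, dea.population,
           vac.new_vaccinations,
           SUM(CONVERT(int, vac.new_vaccinations))
               OVER (PARTITION BY dea.location
                     ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
    FROM PortfolioProject..CovidDeaths AS dea
    JOIN PortfolioProject..CovidVaccinations AS vac
        ON  dea.location = vac.location
        AND dea.date     = vac.date
    WHERE dea.continent IS NOT NULL
    -- no ORDER BY here: SQL Server rejects it inside a CTE without TOP or OFFSET
)
SELECT *,
       RollingPeopleVaccinated * 100.0 / population AS PercentVaccinated
FROM PopvsVac;
```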
I don't think we need to go any further than that. If you want to, you can look at the
max one, but you'll have to get rid of date and just keep the location, population, and
so on, because the date is going to throw everything off. So if that's something you
want to
03:22:56
do, absolutely do that. You can also use a temp table here, and we can look at how to do
that really quickly so that you know how. Again, I recommend throwing in one or two of
these; even up here you can do different counts and do one for each. So let's do the
temp table. It's going to be a lot of the same stuff: we're going to keep this query,
and it's going to be what we insert. So let's say INSERT INTO, and we need to write
where we're inserting it into, but
03:23:42
first, again, it's going to have basically the same effect, just with a temp table. So
we're going to create the temp table, and let's call it PercentPopulationVaccinated, and
we need to specify our columns. So let's go down here (excuse me) and do basically the
exact same thing. So continent... I think I spelled that right. No, I didn't spell that
right. I
03:24:18
almost did; I got really confident. And just so you know, for these we have to specify
the data type as well, because we're basically creating a genuine table, just a
temporary one. So let's do nvarchar(255). For location we'll do the same thing,
nvarchar(255). We need date, and we'll do that as datetime. We'll do population, and
there are lots of different types we could use, but we'll do numeric for this example.
Then there's new_
03:24:59
vaccinations, and let's do that one as numeric; again, you can use different types. Then
we'll do RollingPeopleVaccinated, and this can be numeric as well. And then we need to
insert the query into here. Okay, so we're inserting the data, and then down here we can
actually select it. Let's take this SELECT and put it right here, except we're going to
be selecting from this table right here. It hasn't been created yet, but it will be in
just a second. Okay, so let me see if
03:25:40
yeah, so these were the rows that were affected, and then we got our actual output from
this SELECT right here. Now let's say you wanted to change something in here. You're
like, oh, I don't want to do it with this; let me comment that out, and then let me
create that table again. Oh no, we got an error. How can we get around this? Very
simple. You can do DROP TABLE IF EXISTS and then the table name right here, and when we
run
this it
03:26:18
should give us our output. I highly recommend just adding this, especially if you plan
on making any alterations, so that when you run it multiple times you don't have to go
and delete the view or drop the temp table yourself. It's built in, it's at the top,
it's easy to maintain, and it looks good. It's something that a lot of people do, and so
if you have that at the top of your query and somebody who wants to hire you
looks at
03:26:49
this, they're like, oh, okay, that makes sense, I'm glad they included that, they know
what they're doing, this guy's smart, I should hire them.
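The temp table version might be sketched like this. The leading # is SQL Server's marker
for a session-scoped temp table and is an assumption about what was typed on screen; the
column types follow the ones named in the walkthrough, and PercentVaccinated is again a
hypothetical alias. DROP TABLE IF EXISTS needs SQL Server 2016 or later.

```sql
-- Drop first so the script can be re-run after edits without a
-- "there is already an object named ..." error.
DROP TABLE IF EXISTS #PercentPopulationVaccinated;

CREATE TABLE #PercentPopulationVaccinated
(
    continent               nvarchar(255),
    location                nvarchar(255),
    date                    datetime,
    population              numeric,
    new_vaccinations        numeric,
    RollingPeopleVaccinated numeric
);

INSERT INTO #PercentPopulationVaccinated
SELECT dea.continent, dea.location, dea.date, dea.population,
       vac.new_vaccinations,
       SUM(CONVERT(int, vac.new_vaccinations))
           OVER (PARTITION BY dea.location
                 ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM PortfolioProject..CovidDeaths AS dea
JOIN PortfolioProject..CovidVaccinations AS vac
    ON  dea.location = vac.location
    AND dea.date     = vac.date
WHERE dea.continent IS NOT NULL;

SELECT *,
       RollingPeopleVaccinated * 100.0 / population AS PercentVaccinated
FROM #PercentPopulationVaccinated;
```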
I feel like I've shown you as much as I can with the limited data that we've looked at.
Again, I could have done this for six hours straight if I had used all the data; there's
just so much of it. But let's create a view. I'm only going to show you how to create
one view, but I want you to go
back and
03:27:18
create multiple views. If this is one that you want to look at, these global numbers
(let's look at this one really quick), if you want to look at this number right here,
toss it in a view. I mean, that one doesn't make sense to toss in a view, but this one
does; toss these numbers in a view. We're going to look at it in Tableau later, but for
right now let's just create our view. So let's add a comment: "creating view to store
data for later visualizations". All right, so let's say
03:27:57
CREATE VIEW, and I'm just going to keep the same name, like that. For views it's so
easy. I think I can even keep the ORDER BY; we'll see if I'm correct. Actually, let's
get rid of both of these things. So it says CREATE VIEW PercentPopulationVaccinated, and
let's see, am I doing anything wrong here? Let me see... the ORDER BY clause. I was
completely wrong. I was wondering
03:28:45
why I was getting that. Now let's try running it. Okay, so it ran successfully. Let's
look at our views. It's not going to be in there yet, so let's refresh it. Hey, look,
we've got our very first view, and we can open that up like a table if we want to. I
mean, it's gorgeous. If you want to get rid of that, it's Ctrl+Shift+R; that's a
refresh, and now it basically recognizes it. But let's go back here for a second. We can
now query
03:29:19
off of that. It's a view now, so it's permanent: you have to go in and actually delete
it. It's not like a temp table. This is now permanent, and it could be something that we
use for a visualization later. So do some of these: look at some of the queries that
we've looked at and create a few of these views, and we will use them later.
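A sketch of the view, under the same naming assumptions as the earlier queries. CREATE
VIEW has to be the first statement in its batch, hence the GO separators (a client-side
batch separator used by SSMS and sqlcmd), and the definition carries no ORDER BY, which
is the clause that caused the error above.

```sql
-- Creating a view to store data for later visualizations (sketch).
-- The view is created in whichever database is currently selected.
GO
CREATE VIEW PercentPopulationVaccinated AS
SELECT dea.continent, dea.location, dea.date, dea.population,
       vac.new_vaccinations,
       SUM(CONVERT(int, vac.new_vaccinations))
           OVER (PARTITION BY dea.location
                 ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM PortfolioProject..CovidDeaths AS dea
JOIN PortfolioProject..CovidVaccinations AS vac
    ON  dea.location = vac.location
    AND dea.date     = vac.date
WHERE dea.continent IS NOT NULL;
GO

-- Unlike the temp table, the view persists and can be queried like a table.
SELECT *
FROM PercentPopulationVaccinated;
```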
In a normal setting, if I was actually working, I would put some of these in, like, I
would
03:29:51
call them a work view or a work table, something set aside so that I can use them
consistently, and also so that I could connect Tableau to that view. Now, we're going to
be using something called Tableau Public; that'll be in the very next tutorial.
Unfortunately, Tableau Public does not connect to SQL databases, and that's because it's
free, and I totally get it. You have to pay for the upgraded version,
but I am
03:30:24
not a billionaire, okay? I cannot afford the real version of Tableau. I'm also not a
student or something where I can get it cheap, so I'm not paying for that. So we're
going to use Tableau Public, and I recommend this anyway because anybody can access it;
it's free for anybody. We're going to be using Tableau in the next one to actually
visualize a lot of these things. I want to get at least five visualizations, and we're
going to create a dashboard. It's going to be a
beautiful,
03:30:50
beautiful thing. All right, so the very last thing that we are going to do is save this
and then put it into GitHub, and I just want to show you how to do that. That's where
we're going to be storing our code, at least for now. So let's go up here, click File,
click Save As. I already have multiple versions of this, so let's just put B2. We're
going to save that, so we have this saved. Now I'm going to go over here to my GitHub.
Now, if you
03:31:21
don't have an account, I highly recommend getting one so you can start putting your
portfolio projects in here. Of course, we're not going to put our Tableau one in here,
but our SQL ones and our Python ones you can put in here. Again, I'll talk a lot more
about how we actually want to display this in GitHub or other places, but what we're
going to do for this is create a new repository. Let's call this one portfolio projects,
make it public, and we'll create the
03:31:54
repository. We'll do all that extra stuff later. What we now want to do is upload an
existing file, so we'll click right there, go to Choose Files, click the latest one that
we saved, and open it. We can always change the name of it later on, and you can add
notes if you'd like, but we'll commit that change, so we'll actually upload this file.
But let's look at it really quick, and I'm going to go back and use the real one, which
has the
03:32:27
formatting and the notes that I wanted to add in there. But as you can see, you can see
all of the queries that we wrote, and this is fantastic. So if somebody comes in here,
we'll have more notes and better comments on what the queries do and what the takeaway
is for a hiring manager when they actually look at this. So this is a really, really
good place to start. Again, this may not be your optimal place to put this; I'll give
you a
few different
03:32:58
options in a later video about how we can potentially improve upon this. I'm really
looking forward to getting more portfolio projects done so we can actually start
building a complete portfolio. If you've stuck around all this way, I just want to say
congratulations. I know this was a long video and that it took a long time, but you
stuck with me, you put in the hard work, and that is fantastic. I really hope that it
pays off, and I hope that this has
been
03:33:29
helpful. Thank you for watching. We'll have a lot more videos in the future on these
portfolio projects, and I'm just really, really looking forward to doing them, to be
honest. So thank you for sticking with me, thank you for watching, I really
appreciate it. If you liked this video, be sure to like and subscribe below, and I
will see you in the next [Music] video.
What's going on, everybody? Welcome back to another video. Today we will be heading
back into SQL for our third portfolio
03:34:10
project. Now, I am extremely excited for this project in particular for a few
reasons. One, we're getting back into SQL, and I really like SQL, and two, we are
finally focusing on data cleaning. I have talked so much about why data cleaning is
important, that you really need to learn how to clean data, and that it's a big part
of what a data analyst does, but I haven't actually shown you how to do it yet. So
that is what this whole project is going to be, and then at the end you'll get to
add it to your
03:34:39
portfolio, so it's really a win-win. Now, before we start, I just want to say that I
think it's going to be a little bit more advanced than our very first video in SQL,
where we walked through data exploration. If you see something that you have never
seen before, I will do my best to explain it while we're walking through it, but if
you get confused or it seems a little complicated, please pause it, Google it, do a
little bit of research, and then come back. I think that will be very helpful. With
that being said,
03:35:02
let's jump over to my screen and we'll get started on the project. We're going to
start over here on GitHub, and this is where I've put the data set that we are going
to be using. I will put this link in the description. We're going to go right over
here to the Nashville Housing Data for Data Cleaning. All you have to do is click
Download and it's going to download it, and you can open it up if you want to. We're
not going to do anything to this data at all,
03:35:26
but really quick, I'm just going to show you what it looks like, and we'll of course
look at this in SQL in just a little bit. We have a unique ID, a parcel ID, this
address, a sale date, the price of the home (this is housing data, if you didn't
pick up on that already), who actually owns the home, the owner address, and then
some information about land value, bedrooms, bathrooms, things like that. Again, not
super important, because we're going to be doing all of
03:35:55
this in SQL. So let's actually get this data into SQL. We're going to import it the
exact same way that we did in the very first video. We're going to come right over
here and go all the way down to Microsoft SQL Server 2019 Import and Export. We'll
click Next. Our data source is, like last time, Microsoft Excel. Let's take a look,
and we'll take that first one. This is the most recent one I've downloaded; I just
wanted to make sure, so I downloaded it a few times. For the
03:36:32
destination, we're going to click SQL Server Native Client 11.0, and this is my
server right here. I'm going to go down here, and I want to put it in this portfolio
project database, so just configure this to what your server is. Again, if you
haven't done this before and you've never set up SQL Server, I will leave a link,
hopefully right here and also in the description, like I did for the first project,
so be sure to go through that
03:37:04
video so that you know how to download this and have everything. We're going to copy
the data, and we're going to take Sheet1. We could have renamed Sheet1 to something
else, but we didn't. Then we're going to finish this, and finish, and it should run
successfully, hopefully. It's looking good. Perfect, so we have 56,477. So let's
head over to SQL. All right, let's go to our database, portfolio project, and here
is our Sheet1. Now I'm going to rename this. Let's rename it, what is it,
03:37:42
Nashville. Let's just do NashvilleHousing; that's what I'm going to rename it to, so
when I post these queries to GitHub and you see them, this is what they will use. If
you want to have them exactly the same, or be able to copy and paste them, you
should do that as well. So let's take a look really quick. Let's select the top
1,000, but there are about 56,000 rows. There's a lot of data in here and a lot of
things. I'm about to open up a saved file,
03:38:13
and we'll walk through the exact things that we're going to be working on in just a
little bit, but yeah, this is what the data looks like in here. There are lots of
columns, lots of data, so I'm really excited about this. Let me pull this open
really fast; it's going to be this project walkthrough. Here are the things that
we're going to be walking through, and I'm going to show you this really quickly.
We're going to standardize the date format. We're going to populate the
03:38:38
property address data. That's referring to this right here: if you notice, there's
the address and there's also the city that it's in, so we want to be able to
separate that out, and that is actually right over here. We're going to be doing the
same thing to the owner address, except that has an address, a city, and the state,
which makes it a little bit more complicated, so that one should be really, really
cool to show you. Oh, whoops, I messed up. That's what this one is: breaking out
03:39:09
into individual columns; that's what we're going to do for that. For this one,
populating the property address, if you notice (and we'll go into this a little
bit), there are actually some values in the property address that are blank, but I'm
going to show you how you can actually populate them. It's just a cool trick that
I've used a few times, and it does work. I think you'll find that one interesting.
In the SoldAsVacant field, we're going to
03:39:34
be doing some CASE statements (if/then). Then we're going to be removing duplicates
and then deleting unused columns. So we have a lot to get through. This could
potentially be the longest video, and I'm okay with that, because I love SQL. Down
here, I will say that in the very first video I said it was going to be an ETL
video, and I fully intended on doing that, but I ran into, not issues on my side,
but issues in the fact that the vast majority of people who are
03:40:06
going to be watching this are not going to be able to do what I did to configure my
server. But I left it in here anyway. When I think ETL, it's an automated process:
we extract the data from somewhere, we transform it, and then we put it somewhere.
This was going to be the extraction method, and I was going to put it in a stored
procedure so that you could run the stored procedure, run the job, and import the
data. It was going to be really cool. But I know that if I was
03:40:33
having trouble with it, me trying to explain it to you and you being able to figure
it out on your side was going to be very tough. I left this in anyway because I was
able to get it to work on my computer, but it is tough and it took a lot of
research. I did this for a previous server a year or two ago, and I remember it
being crazy hard, but I was able to figure it out on my computer. So if you want to
try it out, try it out and look into the stuff. I'm going to leave this here; this
is just
03:41:01
for if you want to try it. It's a little more advanced, so you don't have to; just
import it, and this will be a data cleaning project instead of an ETL project. But
data cleaning is what 90% of it was going to be anyway. Anyway, let's go back up to
the very top really quickly. I have a whole other laptop right here, as I did in the
first video. I didn't show it to you last time, but I have all of my queries written
out over here. I'm going to try to do this as quickly as possible;
03:41:31
we have a lot to get through. Now, before we start writing our queries, I am going
to turn off my camera so I do not get in the way. All right, you should still be
hearing my voice. Let's get started. Let's just start with SELECT everything, and
we'll do FROM, and it is PortfolioProject.dbo.NashvilleHousing. So let's just get
this pulled up on screen. Awesome, so this is exactly what we were looking at
before, and the very first thing that we're going to be looking at is this SaleDate.
Now, I wrote standardize
03:42:07
sale date, but I'm really just going to change the sale date. So let's copy this
really quick and look at just SaleDate. It has this time on the end, and it serves
absolutely no purpose; it just annoys me, and I want to take that off. Right now
it's a datetime format, but we're going to use CONVERT: we're going to do date, and
we're going to take SaleDate, and we're going to go like that. Let's run this really
quick, and this is what we want it to
03:42:47
look like. All right, so let's say UPDATE, and we have PortfolioProject specified up
here, so we can just say NashvilleHousing, and we are going to SET SaleDate equal
to, and we're just going to copy this. Now, I will say, before we do this, I had
some issues when I was initially doing it with whether or not it made the update,
and I'm not sure why it wasn't doing it. So yeah, it's not doing it right now. Try
it out on yours; it may or may not work. I'm not exactly sure why that is,
03:43:22
because I would say 80% of the time it's doing it and 10 to 20% it's not. I don't
know why; there's no logical explanation for that. But most of the time when I did
it, they would then be the same column. Something we can do, I just thought of: we
can do ALTER (can't even say that word) ALTER TABLE, and we can say, I think it's
NEW or it's ADD. Give me one second. Yeah, so ADD, and we'll just do
SaleDateConverted, and let's make that a date format, just like this. And then
03:44:04
we can say it like this, and say SaleDateConverted. Let's try this and see what
happens. So I'm going to add this column, and then I'm going to update this, and it
says rows are affected. Let's see what happened. Let's write SaleDateConverted and
see if it actually worked... and it worked. Okay, so we now have a column, and maybe
at the end we'll remove that SaleDate column so that we just have SaleDateConverted.
But we know what that is. You
03:44:47
don't have to name it that; you can name it SaleDate2 or something like that.
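A minimal T-SQL sketch of the date step just described. The temp table #Housing and its two rows are made up so the snippet runs on its own; in the walkthrough the table is NashvilleHousing in the PortfolioProject database, and the spellings SaleDate and SaleDateConverted are assumed from the spoken names. One likely reason the in-place UPDATE appears to do nothing (an inference, not something the video states) is that an UPDATE never changes a column's declared type, so a datetime column keeps showing a time part.

```sql
-- Throwaway demo table; both rows are invented for illustration.
DROP TABLE IF EXISTS #Housing;
CREATE TABLE #Housing (UniqueID int, SaleDate datetime);
INSERT INTO #Housing (UniqueID, SaleDate)
VALUES (1, '2013-04-09T00:00:00'),
       (2, '2014-06-10T00:00:00');
GO  -- GO is a batch separator for SSMS/sqlcmd, not a T-SQL statement

-- Preview: the original datetime next to its date-only form.
SELECT SaleDate, CONVERT(date, SaleDate) AS SaleDateOnly
FROM #Housing;

-- This runs, but SaleDate is still declared datetime, so the value is
-- stored back as a datetime and the time part is still displayed.
UPDATE #Housing
SET SaleDate = CONVERT(date, SaleDate);

-- The approach from the walkthrough: add a new column of type date...
ALTER TABLE #Housing ADD SaleDateConverted date;
GO  -- new batch, so the next statement can see the new column

-- ...and fill it from the original column.
UPDATE #Housing
SET SaleDateConverted = CONVERT(date, SaleDate);

SELECT SaleDate, SaleDateConverted
FROM #Housing;
```

To apply this to the real table, swap #Housing for the NashvilleHousing table. If the extra column is unwanted, ALTER TABLE ... ALTER COLUMN SaleDate date is another way to change the type in place; the walkthrough does not use it.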
Cool. Well, let's go down to the property address and get just a really quick look
at it. Let's copy this up here; I hate rewriting this stuff, so I'm always copying
and pasting. But we're going to be working with the PropertyAddress. There we go. So
let's take a look at this really quick. Sorry, I was looking at my notes. We need to
look at where the property address is NULL. So what you'll
03:45:29
see really quick when we run this is that there are null values. Why there are null
values, I really don't know, but let's look at everything where it's null. So we
have this property address, we have a sale date, a price, a legal reference, there's
this parcel ID, and there's this unique ID. So we have a lot of information. And
when you have something like an address, the address isn't going to
change.
03:46:08
The address is the address. The owner's address might change, but the property
itself, the address, 99.9% of the time is not going to change. So you can say with
almost certainty that this property address could be populated if we had a reference
point to base that off of. So really quickly, let's look at just everything, and
we'll just ORDER BY, not PropertyAddress, let's do ParcelID, and let's take a look
at this. So we have to do a little bit of
03:46:50
research on this, but I'm going to show you something really quick. Let's see if I
can find an example in not too long. Okay, so here's an example: here's the same ID,
so 015..., and that's the exact same address. We'll find this a lot of times. I
looked through the data and it is pretty much accurate: when it does have it, it is
the exact same address. So rows with the same parcel ID are going to have the same
property address. So something that we can do is basically say: if this parcel ID
has an
03:47:33
address and this parcel ID does not have an address, let's populate it with this
address that's already populated, because we know these are going to be the same.
That is basically what we are about to do, and it's not super complicated, but let's
get started writing it. Let's copy that down there. One thing we are going to have
to do with this is a self-join, so we have to join the table to itself to look at:
if this is equal to this, then this needs to be equal to this, that kind of
03:48:11
thing. So real quick, let's just write that join part out and we'll go from there.
(I don't know why I sounded Canadian right there.) So we'll join on this and we'll
say ON a... oh wait, let's label them. I'm going to do this in a really lazy way:
I'm just going to do a and b. a.ParcelID is equal to b.ParcelID, and let's see
really quick. We need to find a way to distinguish these. The sale date could be the
same. One thing: this unique ID
03:48:51
is unique, so we need these to be different. So let's use this and say AND
a.UniqueID is not equal to b.UniqueID. So all we have done here is joined the exact
same table to itself, and we said where the parcel ID is the same but it's not the
same row, because this is a unique ID; that means these will never repeat
themselves, so we'll never get the same one. So if this is equal to this but these
are different, we want to then
03:49:29
populate the other one. So let's do a.ParcelID, and we'll say a.PropertyAddress,
b.ParcelID, b.PropertyAddress, and let's take a look at this really quick. Let me
see if this works: WHERE a.PropertyAddress IS NULL, and let's see what comes up
here. Okay, so this is perfect; this is exactly what I wanted to see. We have this
parcel ID, we have this parcel ID, and here is our address, and it's blank in all 35
of these. So we have an address for all of these, but we're not
03:50:16
populating it. So what we want to do is use this thing called ISNULL. With ISNULL,
the first thing is: what do we want to check to see if it's null? We want to check
a.PropertyAddress. Now, if it is null, what do we want to populate? We want to put
in there b.PropertyAddress, because we want to take that property address and stick
it in there. So let's run this really quick. This row is what is eventually going to
be stuck into this
03:50:56
row, so this is perfect. It's literally saying: when it's null, take this and put it
there, and that's what this part is doing. So let's go in here and write our update.
We want to UPDATE, and let's take this whole thing from here up, and this will be
the SET. So we're going to set... okay, we need to specify. And just so you know,
when you're doing joins in an UPDATE statement, you're not going to say
NashvilleHousing, okay? That's
03:51:37
going to give you an error. You need to refer to it by its alias, so let's put a. So
now we're going to say PropertyAddress is going to be equal to, and now we're just
going to copy this ISNULL and put it right here. Let's see if it does take this. I
think this should be correct; let's test it out really quick, and we're going to run
the above query and see if it made that update. All right, so there you go. As you
can see, there are now none that have
03:52:10
null in there; otherwise it'd be giving us an output right now. So that one is fixed.
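The self-join and ISNULL update described above can be sketched as follows. This is a hedged illustration, not the exact query from the video: the temp table #Housing, the parcel IDs, and the addresses are invented so the snippet runs on its own, and the column names ParcelID, UniqueID, and PropertyAddress are assumed from the spoken names.

```sql
-- Throwaway demo table; rows are invented. Rows 1 and 2 share a parcel,
-- and row 2 is missing its address.
DROP TABLE IF EXISTS #Housing;
CREATE TABLE #Housing (
    UniqueID        int,
    ParcelID        varchar(20),
    PropertyAddress nvarchar(255)
);
INSERT INTO #Housing (UniqueID, ParcelID, PropertyAddress)
VALUES (1, 'A-001', N'100 EXAMPLE ST, NASHVILLE'),
       (2, 'A-001', NULL),
       (3, 'B-002', N'200 SAMPLE AVE, GOODLETTSVILLE');
GO

-- Preview: pair each row missing an address (a) with another row (b)
-- that has the same parcel but a different UniqueID.
SELECT a.ParcelID, a.PropertyAddress,
       b.ParcelID, b.PropertyAddress,
       ISNULL(a.PropertyAddress, b.PropertyAddress) AS FilledAddress
FROM #Housing AS a
JOIN #Housing AS b
    ON  a.ParcelID = b.ParcelID
    AND a.UniqueID <> b.UniqueID
WHERE a.PropertyAddress IS NULL;

-- Update through the alias: with a join in the FROM clause,
-- the UPDATE target is the alias (a), not the table name.
UPDATE a
SET PropertyAddress = ISNULL(a.PropertyAddress, b.PropertyAddress)
FROM #Housing AS a
JOIN #Housing AS b
    ON  a.ParcelID = b.ParcelID
    AND a.UniqueID <> b.UniqueID
WHERE a.PropertyAddress IS NULL;

-- Check: no rows should come back once every gap has been filled.
SELECT * FROM #Housing WHERE PropertyAddress IS NULL;
```

The second argument of ISNULL can also be a literal such as N'No Address' if a placeholder is preferred over a borrowed value.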
We can go back and check it if you want to; please go back and double-check that.
But that is what we did, and it worked perfectly. So that's what ISNULL does: it
checks to see if this is null, and if it is null, it can populate it with a value.
You can also use a string; I mean, you can write "No Address" if you wanted to do
something like that. We don't want to do that; we're going to keep it how it
03:52:41
is. Let's keep moving on. We do not have unlimited time here. I'm going to try to
keep this one under two hours, stretching the rules for my love of SQL, and that is
the only reason. And this, I think, is going to take a little longer. So let's copy
this real quick and take a look at, what are we doing, the property address, and we
can get rid of this as well. So if you notice, we have two things here: we have both
the address and then
03:53:22
there's this comma after all of them, and there is the city. Now, you don't know
that, or maybe you haven't looked into this, but I have, and there are no other
commas anywhere except in between these things as a separator, as a delimiter. If
you've never heard that term, a delimiter is something that separates different
columns or different values. So for us the delimiter is a comma. And for this first
one, because we're going to be separating
03:53:56
this one out and then we're going to be doing the owner address, for this one we're
going to be using something called a substring, and we're also going to be using
something called a character index, or CHARINDEX. So let's start writing that out.
Let's do SELECT and say SUBSTRING. Now, the substring that we want to take, we of
course want to be looking at... oops, let me put this down here so it helps us out a
little bit, like that. So SUBSTRING, and of course we're looking at
03:54:32
PropertyAddress, and we want to start at position one. Now, this next part is
something that you may have never seen before, and if you haven't, that's totally
okay. The character index is basically going to be searching for a specific value,
okay? That's all it's doing, and you can look into this a little bit more if you
want. So it's going to be CHAR
03:55:04
INDEX, that's how it's spelled (CHARINDEX), and then an open parenthesis, and we
want to specify what we're looking for. It can be anything: you can even look for
things like "Tom", or if you're looking for a specific word like "John", you can
search for that. That's what this is for. But we're going to do a comma. Where are
we looking? That's what this next one is, so we're looking in PropertyAddress, and
then we're going to close
03:55:36
the parenthesis, and we'll also close it again to complete that SUBSTRING, and we'll
say AS Address. Let's just take a look really quick at this. Right now it's looking
at PropertyAddress, it's starting at the first value, and then it's going until the
comma. Now, the unfortunate thing is we're actually getting this comma in this
output, and we don't want that. You don't want a comma at the end of every address.
We can
03:56:12
change that. This is specifying a position: if we just look at this CHARINDEX, which
we can do really quick, it is going to give us a number. It is saying that position
19 is where the comma is, right? So it's not a string, it's a number, and we can say
minus one. If we do that and now we run it, that comma is gone, because we're going
to the comma
03:56:49
and then going back one, to one behind the comma. So that's how you get rid of that
comma right there. The next one's a little bit more tricky; well, it's not super
tricky, but we're not starting at that first position anymore. So let's put a comma,
then we have our SUBSTRING. Now, where we want to start is at where the comma is, so
instead of position one, we want it to be where that character index... I don't want
it to look like this this whole time. Is it like this...
03:57:18
What am I doing? It doesn't matter; let's just get rid of this and see if that fixes
it. What am I doing here? Oh, it's just because this is wrong, and we'll just do
comma, parenthesis; that might fix it. Ah, doesn't matter. Okay, I'm wasting time;
I'm going to keep going. We want to start in this position, okay? But we actually
don't want to start at minus one; we need to start at plus one, because we want to
go to the actual comma itself, and then once we get to the comma we want to
03:57:55
add one. If we just left it the same, again it would include the comma at the
beginning. Then we need to specify where it needs to go to: where does it need to
finish? Now, every single address has a different length, but we can use that to our
advantage in this one, and we can literally say the length, LEN, of PropertyAddress,
you guessed it, right? And then we can close this off. Let's see if that works.
Okay, what's messing up? So we have
03:58:27
SUBSTRING, PropertyAddress, comma, CHARINDEX, and then we're specifying the comma in
it, we have the PropertyAddress, plus one... okay, we can't have that right there; I
don't know why I had that. Finally figured it out at the end. So let's see what
we're doing here, let's see if it worked. It works. Perfect.
And again, this was one that I'm guessing a lot of people haven't used before, so I
was trying to explain it a little bit more than other ones. But if we take
03:59:01
that out, if we take out that plus one, you're going to see the comma at the
beginning right here. So that's what that is. So plus one, and that's what we're
going to keep. Now, we can't separate two values from one column without creating
two other columns. So just like we added this column up here, I'm just going to copy
this down here really quick. We're going to create two new columns and add that
value in. So we're
03:59:35
going to add that. Because it's the property address, let's call this
PropertySplitAddress, and then this next one is going to be PropertySplitCity. And
this isn't going to be a date, of course; let's do nvarchar, and let's make it 255,
just in case it is a large string. So then we can say UPDATE
04:00:15
that, and now we need to insert what we did for it. So this first one is the
address, so we're going to say that equals the address, and we're going to take this
whole SUBSTRING and copy that, and that's going to equal this. And then at the end
we'll look at it really quick. So first let's add this column. I'm going to do this
one at a time really quick so you can see it. So it adds the column, now it adds the
results, and again, it adds the column for city and sets that
City to that substring and now let's take um let's take this and just do select
everything from this and you should see at the very end because when you add it it
goes to the end we should have two new values and here we are so property split
address and property split city um it's much more usable than this I mean this
would be a nightmare not a nightmare it just be annoying to use this column I mean
now that it's separated on the address and the city it's so much more usable of
data it
04:01:27
really, really is. The next thing we're going to be looking at is this owner address.
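Before moving on to the owner address, the property address split just described can be sketched as follows. This is an illustration under assumptions: #Housing and its rows are invented so the snippet runs on its own, and the column names PropertySplitAddress and PropertySplitCity are spelled as they sound in the video.

```sql
-- Throwaway demo table; rows are invented for illustration.
DROP TABLE IF EXISTS #Housing;
CREATE TABLE #Housing (UniqueID int, PropertyAddress nvarchar(255));
INSERT INTO #Housing (UniqueID, PropertyAddress)
VALUES (1, N'100 EXAMPLE ST, NASHVILLE'),
       (2, N'200 SAMPLE AVE, GOODLETTSVILLE');
GO

-- One value cannot become two without somewhere to put them,
-- so add two new columns (they land at the end of the table).
ALTER TABLE #Housing
ADD PropertySplitAddress nvarchar(255),
    PropertySplitCity    nvarchar(255);
GO  -- new batch, so the UPDATE below can see the new columns

-- Fill both columns with the SUBSTRING/CHARINDEX expressions.
UPDATE #Housing
SET PropertySplitAddress = SUBSTRING(PropertyAddress, 1,
                               CHARINDEX(',', PropertyAddress) - 1),
    PropertySplitCity    = SUBSTRING(PropertyAddress,
                               CHARINDEX(',', PropertyAddress) + 1,
                               LEN(PropertyAddress));

SELECT PropertyAddress, PropertySplitAddress, PropertySplitCity
FROM #Housing;
```

The video adds and fills the columns one at a time so each step is visible; adding both columns in one ALTER TABLE and filling them in one UPDATE, as here, gives the same result with fewer statements.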
Now, it was tough enough to do this, but I want to show you maybe an even simpler
way to do it, even though this one is more complicated. So let's go down here and
get rid of this. Let's get this and just say, oops, no, we're doing owner,
OwnerAddress. Here we go. Let's just take a look at this; let's see what we got. So
again, what
04:02:08
we have in here is the address, the city, and the state, so what we need to do is
split all of those out. And again, I don't want to use substrings again; that was a
pain. I want to use something a little different, something again that you may have
never seen. It's called PARSENAME, and PARSENAME is super useful, especially for
delimited stuff, stuff that's delimited by a specific value. So let me just show you
what it is and then we'll go from there. So we can say PARSENAME, and we're going to
04:02:47
be doing this on the OwnerAddress. Okay, let me see... yeah, it's because I don't
have this; of course, I do that all the time, so annoying. So on the OwnerAddress,
and then let's do 1, and let's just see what happens. Nothing changed, of course,
because PARSENAME is only useful with periods; that's what PARSENAME looks for, and
these are commas. So something we can do is replace those commas: instead
04:03:27
of a comma, we replace it with a period. Super easy: we're just going to do REPLACE
on OwnerAddress, comma, and we'll look for the comma in there, then we need to
specify what we need to change it to; we'll change it to a period. Let's close that,
and now let's run it, and it's taking Tennessee. So something odd, at least to me,
about PARSENAME is that it does things backwards from what you would expect. Really
quick, let's add the other things. You'll
04:04:04
get a kick out of this; well, you won't get a kick out of this as much as I do.
Here's one, two, three. Let's execute this, and it separates everything for us, but
it's backwards. With 1, 2, 3 you would imagine it'd be address, city, state, but no,
it's state, city, address. So all we need to do is go 3, 2, 1 and run this, and
there we go. So now we have it broken out: this is now our address, this is our
city, and this is our state.
So, what I would consider super easy, a lot easier than the substring, but I
04:04:42
didn't want to show you the easy one first and then give you the hard one. So now we
just need to add those columns and then add the values. Let's make some room, and I
need to get rid of one of these, I think. Oh, did I do that right? What did I do? I
have my ALTER TABLE, UPDATE, ALTER TABLE, UPDATE... what is this doing here? I don't
even know what this is. We'll just go like that, so now we have three. Perfect. So
from NashvilleHousing, we're going to say, we're going
04:05:19
to say this is the owner, OwnerSplitAddress. Actually, let me just copy the owner to
make it easier. So we have OwnerSplitAddress, OwnerSplitCity, and OwnerSplitState.
I'm setting all of these equal to what we're about to add. So now this first one,
this 3, is the address; we'll paste it there. The second one is the city, so we'll
put that... oh, I see what happened here. That's what
04:06:10
happened; got to get rid of that. I set OwnerSplitCity equal to that middle one, and
then of course the third one is the state. So let's go do that, and that should be
done. Let's do it two at a time. Oops, OwnerSplitAddress, what's wrong with that?
Oh, I probably just have to run this first. Let's try that; I tried to get good too
quick. You can do this in a much more efficient way; I'm just doing this for visual
purposes. I would add all the columns first and then
04:06:51
do all the updating at the end; that's normally how I do it. But again, for visual
purposes, this is what we're doing. So let's get this and bring it down here. Don't
keep this in your final queries; it's a lot of extra selecting everything, and you
don't need to do that. So here we go: OwnerSplitAddress, OwnerSplitCity,
OwnerSplitState. Again, so much more usable than when it's all in one column. I
mean, it is 10, 100 times more useful data now. You know, that
04:07:27
one to me gets used a lot. Let's keep it going. I feel like we're making fantastic
time. I don't even know; I'm not even keeping track of time. Time is not even
relative anymore; it could be three hours and I wouldn't care. Let's keep going.
Take a look at this column right here, SoldAsVacant. Right now it has No, but let's
do SELECT DISTINCT... oh gosh, I hate when I do this. I do this all the time. Am I
the only one? I don't think I'm the only one. And we'll
04:08:01
do, what is it, SoldAsVacant. Let's do a DISTINCT on these. So right now we have
Yes, No, N, and Y, which I'm guessing are No and Yes. Just because I'm curious,
let's look at a count. Let me just do SoldAsVacant, let me do a COUNT of this, and
we'll GROUP BY SoldAsVacant. Okay, let's run this and see what we get. Oh gosh, let
me ORDER BY. Okay, here we go, now we're
04:08:46
moving... that's not what I wanted at all. ORDER BY 2; here's what I wanted. Okay,
so at No we have 51,000, Yes 4,000, almost 5,000, and then the others just a few. So
let's change them to Yes and No, because these are obviously the vastly more
populated ones, and we're just going to do this through a CASE statement. So we're
going to say... oh yeah, let me get this ready before we start. Oh yeah, I'm ahead
of the game now. Let's do SELECT, and we'll do SoldAsVacant, and then we'll start
our CASE
04:09:19
statement. Let's do it right here. So we'll do CASE WHEN SoldAsVacant is equal to Y,
then all we want to do is make it No... oh wait, make it Yes. What am I doing? Geez,
I'm losing it. WHEN, oops, ignore that, pretend that didn't happen, WHEN
SoldAsVacant is equal to N, THEN No, and then ELSE: if it's not one of those values,
it means it's already a Yes or No, so we're just going to say just keep it as
04:10:08
SoldAsVacant, and then we'll END it. So let's take a look. Okay, let's scroll
through here and see if we get any that we can see. Oh, I just went by some, didn't
I? I know I did. Let's see... okay, here we go. So here's an N, and it's now a No.
So this is the SoldAsVacant column, and the CASE statement right here is changing
it, so the N is No. So this should work, and this will be an UPDATE statement, and I
hope it works, unlike the first UPDATE statement
04:10:46
that we did; that was a travesty. Let's do UPDATE NashvilleHousing, and we'll say
SET (sorry, I'm talking faster than I'm typing) SET SoldAsVacant equal to, and we
can just literally put in this CASE statement. Let's try it. Okay, now let's go look
at this again and see if it made the update. There we go, the UPDATE statement
worked. Oh, fantastic, it's a beautiful thing. Okay, great.
Glad that one worked; I was worried for a second that
04:11:24
uh my update had broken in um in SQL Server now now we're going to do something um
these next two things is we're going to remove the duplicates and then we're going
to get rid of unused columns um this removing duplicate I got to be honest I don't
do it a ton in SQL but I have done it um especially for like queries you know when
I'm looking at full tables I I will write some sort of temp table and like put the
remove duplicates in there I normally don't delete actual data we are we're going
to
04:11:57
do that um but it's not a standard practice to delete data that's in um that's in
your database so just for future purposes don't blame me if you delete all the all
the duplicates by accident in your uh table at work so you can do this a few
different ways but the way I'm going to show you is we're going to write a CTE and
we're going to do some window functions to find where there are duplicate values
okay so excuse me so let's start writing out our CTE and or
04:12:31
you know even we can write out the query first then put it into a CTE that might be
a little bit better so let's do select everything and oh my gosh I was about to do
it somebody's out there just like waiting for me to make that mistake again so we
want to partition our data um when you're doing removing duplicates we're going to
have duplicate rows and we need to be able to have a way to identify those rows
right so you can use things like rank order rank um row number there are a few
different
04:13:10
options we're going to be using row number um and you know if you want to look into
how Rank and rank uh like dense Rank and all those ones work please do that so you
know why we're doing it um but we're using row number because it's the I think the
simplest um and it's going to do what we need exactly so I'm going to get this over
here we'll say select everything because we're selecting everything then we're
going to add this row number on here so row number and we're going to do these
04:13:37
parenthesis right here we're going to say over and an open parentheses now we need
to write our partition because we're going to partition this data so we're going to
say um Partition by cool um now really quickly while we're here we need to actually
know what we're partitioning on that's helpful so let me write this so while we're
writing it we can see what we're doing we need to partition it on things that
should be unique um basically to each row um I
04:14:16
guess for the sake of what we're doing we're we're going to pretend this unique ID
isn't here um although you know you could say I'm cheating it doesn't matter but
I'm going to say you know if things like the parcel ID are the same if the sale
date is the same um the property address is the same the sales price is the same
This legal reference which I'm guessing is some type of legal document saying it's
like somebody's uh property if all of those are the exact same then
04:14:45
to me that is the same data it's it's unusable just for example I mean this may I
don't I mean this data is just some random data set I found online right so that's
what we're going to be going with that's what we're going to be running with and
pretend that lie that I just told you is completely true so what we want to
Partition by let's start with the parcel um can I is this not right here why is it
saying this why is it not giving me okay doesn't even matter I'm just
04:15:13
going to say parcel ID um we can say property we'll do a property address stick
with me we're getting somewhere we'll do sale price um what do we say sale date I
mean there shouldn't be two of this they didn't sell twice on the same day come on
and then legal reference and oh I know why it's not working or my autocomplete
isn't working which I love um it's because we're creating our own partition so it's
its own column of course I don't know why I'm it's late as
04:15:56
you can see down here it's 11:15 it's getting late for me but hey I I this is an
adrenaline rush for me um now we need to order it now we want to order it on
something that should be um not necessar I guess unique so we're going to order it
on this unique ID we'll see if that actually does what we want it to do um oops
what am I doing order by come on and we'll say uh unique oops unique ID perfect and
we should be able to close that off and we're going to call this row num I mean
04:16:34
that's just that just makes sense so now we have this and let's run this really
quick and see what happens so um and maybe we should order this as well but we'll
maybe we'll do that later yeah let's order this on parcel ID um order by parcel ID
let's just see what happens because this I think that should be pretty accurate um
let's scroll down and see if we get any this is all ones maybe should be doing it
on unique ID I don't know let's see if we get any hits okay there's a two in
04:17:16
there let's let's look at this really quick because I want to see it maybe I did
something wrong I don't know it is absolutely possible somebody play some Jeopardy
music for me real quick yeah I don't know I don't know why it's um okay so let's
see let's let's look at these two um and let's see if I did something wrong oops
don't need to pull that up I was doing some research when I when that convert by
wasn't working um okay so this one and this one it's giving
04:17:57
different row numbers so let's look at the actual data ignore the unique ID but the
data itself so the the sale date is the same the sale price is the same the legal
reference is the same the owner is the same this is the same I mean literally every
single thing in here is the same so this is a good example so we're going to in
this query that we're about to write that that will be that second one will be
deleted because we don't need it now there there's only one so it looks like this
04:18:32
is working as intended um I can also do um let's do where row num is greater
than one let's see if that I don't think it will work actually yeah that's because
uh it is that is in a window function of course we can't do that what am I
thinking that's why we need to put it into a CTE oh of course it all comes back so
let's call this all comes back to the CTE those things are amazing um let's call
this um row num CTE and we'll say as and then open parentheses and I don't
think we can
04:19:16
have an order by in here let's do it like this and let's just do select everything
from row num CTE so again if you haven't watched my CTE video or you've
never used a CTE before um this is now basically almost like a temp table so we're
going to be able to this query down here is querying off of this table that we
quote unquote created so um it looks like it's working so all we're going to do is
select um everything from that and we want to say where row num because that's now
a row
04:19:56
is greater than one and let's order that by I don't know property address let's see
if that works and let's see what happens okay so all of these are duplicates we
have 104 of them it looks like so there's not many but it there's twos any threes
no no threes so there are multiple of these rows that are basically
duplicates um and we want to delete them so all we're going to say is we're going
to select instead of saying select everything from row we're just
04:20:35
going to say delete and uh yeah I got to get rid of that order by that doesn't work
and let's do this there's 104 let's see if it worked um so now let's do let's go
back and we'll say select everything and let's see if there's any more duplicates
in there there are none that is fantastic every I'm like biting my nails now to see
if each one of these Works um because I that first one didn't work um so yeah so it
worked we got rid of the duplicates that is fantastic um and now
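The duplicate-removal steps above can be sketched as follows. This is an illustration under assumptions, not the exact query from the video: the identifiers (NashvilleHousing, UniqueID, ParcelID, PropertyAddress, SalePrice, SaleDate, LegalReference, RowNumCTE, row_num) are assumed spellings of the spoken names, the rows are made up, and it runs on SQLite (3.25 or newer, for window functions) through Python's sqlite3 module. SQL Server lets you delete straight from the CTE as shown in the video; SQLite does not, so this sketch collects the duplicate IDs first and deletes them by ID. The CTE is needed for the same reason as in the video: the row_num alias cannot be filtered in the WHERE clause of the query that computes it.

```python
import sqlite3

# Columns that, taken together, should identify a single sale.
PARTITION_COLUMNS = "ParcelID, PropertyAddress, SalePrice, SaleDate, LegalReference"

DUPLICATES_QUERY = f"""
WITH RowNumCTE AS (
    SELECT UniqueID,
           ROW_NUMBER() OVER (
               PARTITION BY {PARTITION_COLUMNS}
               ORDER BY UniqueID
           ) AS row_num
    FROM NashvilleHousing
)
SELECT UniqueID FROM RowNumCTE WHERE row_num > 1 ORDER BY UniqueID
"""


def find_duplicate_ids(conn):
    """Return the UniqueID of every row that repeats an earlier row."""
    return [row[0] for row in conn.execute(DUPLICATES_QUERY)]


def delete_duplicates(conn):
    """Delete the repeated rows and return how many were removed."""
    duplicate_ids = find_duplicate_ids(conn)
    conn.executemany(
        "DELETE FROM NashvilleHousing WHERE UniqueID = ?",
        [(unique_id,) for unique_id in duplicate_ids],
    )
    conn.commit()
    return len(duplicate_ids)


# Made-up sample data: row 2 repeats row 1, rows 3 and 4 differ in price.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NashvilleHousing (UniqueID INTEGER, ParcelID TEXT, "
    "PropertyAddress TEXT, SalePrice INTEGER, SaleDate TEXT, LegalReference TEXT)"
)
conn.executemany(
    "INSERT INTO NashvilleHousing VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "P1", "1 Main St", 100000, "2016-01-01", "L1"),
        (2, "P1", "1 Main St", 100000, "2016-01-01", "L1"),
        (3, "P2", "2 Oak Ave", 200000, "2016-02-01", "L2"),
        (4, "P2", "2 Oak Ave", 250000, "2016-02-01", "L2"),
    ],
)

duplicates_before = find_duplicate_ids(conn)
deleted = delete_duplicates(conn)
remaining_ids = [
    row[0]
    for row in conn.execute("SELECT UniqueID FROM NashvilleHousing ORDER BY UniqueID")
]
```

Running the select first and only then the delete mirrors the order in the video, and it gives you a chance to inspect what would be removed before touching the data.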
04:21:08
it's smooth sailing from here because we're just going to delete some um unused
columns that we don't care about this doesn't happen often um this I would say
actually happens more in like views when I'm creating views I have a view and I'm
like oh I didn't mean to add that column let me just remove it because it's a I
don't need it you don't do this to um like the raw data that you import usually
this is I mean again best practices please don't do this to your
04:21:36
raw data that comes into your database um talk to somebody before you do this
that's just my my legal advice for the day I'm not legally bound or legally held
responsible for any mistakes you make so let's keep going um we're literally just
going to delete some columns it could be any columns that we want um but for
example we now have these property split address and owner split address um and city
and state columns and these are perfect and much more useful than these owner um
these owner
04:22:07
address because this is really unusable to be honest so we're going to delete those
um and maybe we'll also get rid of like I don't know maybe the land that land use
might be useful this tax tax District who cares about that um so it's going to be
super easy we're just going to write alter table alter table did I say that right
geez um and we're going to say alter this table and we're going to drop a column
and you can do as many as you want so we're going to say
04:22:40
owner um address we're going to do tax district and let's also do the property
address all right and let's try this and let's see if it works I'm nervous all
right so as you can see that the property address is gone the owner address is gone
the tax what was it tax district is gone and now we are left with this um now
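Since the advice above is to avoid dropping columns from raw data and to do this kind of trimming in views instead, here is a small sketch of that alternative. It is not what the video runs (the video uses ALTER TABLE with DROP COLUMN on the table itself). The view name NashvilleHousingClean is hypothetical, the column spellings are assumed from the spoken names, and the example uses SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw table, including the columns the video considers unusable.
conn.execute(
    "CREATE TABLE NashvilleHousing (UniqueID INTEGER, PropertyAddress TEXT, "
    "OwnerAddress TEXT, TaxDistrict TEXT, PropertySplitAddress TEXT, "
    "PropertySplitCity TEXT)"
)

# Hypothetical view that exposes only the cleaned-up columns.
# The raw table keeps every column, so nothing is lost.
conn.execute(
    """
    CREATE VIEW NashvilleHousingClean AS
    SELECT UniqueID, PropertySplitAddress, PropertySplitCity
    FROM NashvilleHousing
    """
)


def column_names(conn, relation):
    """Return the column names of a table or view."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]


raw_columns = column_names(conn, "NashvilleHousing")
view_columns = column_names(conn, "NashvilleHousingClean")
```

Anyone querying the view sees the tidy set of columns, while the original addresses stay available in the table if they are ever needed again.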
remember the whole point of everything we were doing was to clean up the data right
we wanted to clean the data and actually now well now that we're here we have this
sale date
04:23:28
as well uh and we have the sale date converted over here let's get rid I forgot
let's get rid of this oh that was my dog Max excuse him let's get rid of oops
let's get rid of that sale price that that or the um sale date that made me look
like an idiot this is Sweet Revenge sale date Sweet Sweet Revenge all right and it
is gone so it's as easy as that now remember like I was saying before the whole
point of this project is to clean the data and make it more usable um and it may
not have felt
04:24:03
like that as we were going through cuz I wasn't you know really looking at the
clean cleaning data uh uh we were cleaning it but you know what was the purpose of
it I may not have highlighted that too much all these other columns that we created
um are just it's much more usable much more friendly um this is standardized now
and you know we we did that through quite a few various methods um so let's go back
up the top we're going to recap what we did really quick so using this convert we
tried to
04:24:38
standardize the date format or change the date format may or may not have worked
for you didn't work for me we populated this property address um which we did that
before we broke this out because if we reversed it if we broke these addresses out
into individual columns and then we populated the this thing um we would have
because then we went and deleted uh we went and deleted this column oops sorry we
went and deleted uh this property address so we wouldn't have actually gotten any
of that data so
04:25:11
there was a reason it was in that order uh don't mess that up that's happened um so
we broke it out we did that using um SUBSTRING CHARINDEX as well as PARSENAME
and REPLACE then we went through and we changed Y's and N's to
yeses and nos using case statements um then we removed duplicates using
ROW_NUMBER a CTE and a window function with PARTITION BY and then at the end we
deleted a few useless columns that we no longer want to see because um they are
horrible and
04:25:46
terrible and um you know we don't want to see them anymore that is the entire
project that was everything and you did it and I'm honestly super proud of you for
sticking around this long it this this was not necessarily an easy project we used
quite a few new things that I may have not talked about or showed you before this
to me is just the beginning right this is just a a glimpse into all the things that
you need to do you need to look for um in order to clean data so you know I really
do think this is a
04:26:19
good portfolio project because it will show that you understand and know how to
clean the data although this is not an end-to-end project right that could that
would take a long time and a lot more exploratory analysis looking into the data to
to figure out what we need to change but for all intents and purposes I mean this
is a a pretty good project for cleaning data and I hope that you learned something
I also hope that you worked on this hard um if you want to make any improvements
please do that
04:26:49
this is not perfect by any means there's other things that you could change um you
could you know I don't even know I'm not even going to try to guess you could do
other things to this data though um and and create your own queries create your own
um data cleaning uh part of this and so um you do that if you were able to get this
um the ETL part of it done do that I think it'd be really really cool um again I
was able to get it to work but I don't think 90% of people out there would be able
to get it
04:27:19
to work um it's just every computer is different every server is configured
differently um and so it would just be a huge pain so I decided to cut that out and
I'm sorry um but hopefully this will suffice um with that being said this is it you
made it all the way to the end again I'm super proud you guys are doing fantastic
you guys are the ones putting in the hard work to build the portfolio for your
future job I mean it's not easy but you're putting in the work and so and so kudos
to you um in our next video
04:27:52
we're going to be going into python for the very first time really excited about
that one because um I think the only python video that I have up right now is on
one where I was scraping data from Twitter so um you know this will be a nice
change of pace or a little bit different content than I normally put out and so I'm
really excited about it and I hope you are as well with that being said I am done
with the video I'm going to be stopping it soon thank you for joining me if you
like this video be
04:28:20
sure to subscribe be sure to like this video leave a comment below um telling me
how it changed your life uh and I will see you in the next video [Music] goodbye
[Music] what's going on everybody today we are starting our Excel tutorial [Music]
series now there are so many things that you can do in Excel so I don't know how
long this series is going to be it could be 15 or even 20 videos but what I do know
is that I'm going to be covering just about every single thing that I've used since
I became a data analyst and I
04:29:04
want to show you how to do it uh so won't just be the more concrete things um you
know like pivot tables charts V lookups things like that it'll also be some of the
more nuanced things like how to deal with missing data or how to deal with dirty
data and how to clean that up within Excel and so those are things that you may not
be able to do you know if somebody wasn't showing you how to do it and so that's
what I'm going to try to help you because I know that that is something that you
will need to do or
04:29:28
learn how to do in Excel now before we get into it I want to give a huge shout out
to the sponsor of this Excel series and that is Udemy I took so many Excel
courses on Udemy when I was first starting out as a data analyst and there was
this one course that I kept going back to over and over again because as I got into
it in my job I realized that there were so many things that were in that course
that I really needed to know but I didn't realize I needed to know it and so I'm
going to put the links to
04:29:52
those courses in the description in case you want to take those again huge shout
out to Udemy without further ado let's jump on my screen and get started with
our very first Excel tutorial all right so I'm going to go ahead and get rid of
myself we are going to be looking at something absolutely pivotal in your data
analytics career and that is Pivot tables uh and I think that's really appropriate
it is probably one of the most commonly used things I think that data analysts use
to convey information
04:30:15
in Excel it's super easy to group things together to display information in a very
easily understandable way especially for people who are not data analysts right I
use this a lot for other managers or for higher-ups um who don't want to get into
SQL or you know aren't super tech savvy in like python or Tableau they just want
it in Excel and so I use it all the time for that reason and so we're
going to be using this data set right here bike store sales in Europe I will
04:30:43
include this link in the description um we're not going to look at the columns just
yet we're going to download it um I've already downloaded it a few times but we are
going to go to um our downloads we're going to open it up and we're going to open
up this sales right here and give it a second all right perfect and so here's what
it looks like uh at least on my screen I'm going to uh spread it out just a little
bit um and really quickly let's take a very quick glance at this so we have a
04:31:15
date a day a month a year so some um some date information um then we have some
customer age information so how old was the customer again this is bike sales so
what did um you know what did they buy and they have some demographic information
so this is their age group we have uh the gender country State the product category
the subcategory the actual product that was purchased and then we have things like
um you know how much these things cost the quantity that was that was ordered so we
have order
04:31:51
Quant quantity unit cost unit price then we have the profit cost and revenue all
things that we almost everything in here we can in some way put into a pivot table
now I'm not going to go through every single variation of that but we are going to
be um looking at a lot of this um Revenue over here because I think it's it's
pretty easy to show the value of a pivot table with especially with um you know
currency or money so what we're going to do to get started is we're going to go up
to insert and we're
04:32:23
going to click on insert and then we are going to click on pivot table now really
quick there is a recommended pivot tables and if you click on that what will come
up is some recommendations that Excel gives based on the data that you have um and
it can kind of give you some ideas of of what you can do with pivot tables it's
going to generate it for you we're not going to do that we're going to build our
own uh but let's click on pivot table and it's going to Auto Select basically
04:32:52
everything and that's fantastic um but what if it doesn't come like that I I just
erase that if it doesn't come like that you can click right here you can click excuse
me you can press Ctrl+Shift and then the right arrow and then the down arrow and
it's going to select all of our data um and you have right here a new worksheet or an
existing worksheet we're going to create a new worksheet just tends to get too
clogged up if we put it on the same worksheet that already has a lot of data in it
so
04:33:20
right over here are pivot table fields and these are all of our columns that we
just looked at and we're going to be able to select those and kind of drag and drop
now if you just took the Tableau um tutorial series that I just finished doing last
week then this is going to be pretty pretty familiar um you're going to start
seeing a little bit of um hopefully some patterns about how the data is kind of
displayed and so we have our filters down here we have columns rows values all
these things uh we will be
04:33:51
using I'll show you how to use today as well as some additional things um one thing
that we want to start with uh for this demonstration is we're going to be looking
at kind of the um these bottom ones right here profit cost and Revenue and we're
going to be doing that per country uh per country and state and we'll kind of do
some drill Downs um and I'll show you how those work so for just to start out we're
going to take the country right here and you'll see it populate right over here in
fact um let
04:34:20
me zoom in maybe once uh yeah that should be fine I don't know if I want I might
zoom in it again in just a little bit um so we have our country and and it's just
like this very very simple oops um now I'm going to include the state now I'm going
to drag this um all the way and I'm going to put it under you can put it above or
you can put it below I'm going to put it below uh it definitely makes the most
sense there now when you do that it it um kind of populates it in an expanded
04:34:51
way but you can collapse this very easily we're going to go right here we're going
to right click we're going to go go down to expand and collapse and we're going to
collapse the entire field and so now here are all of our um all of our countries as
they were before now each of them has this plus sign to the left and if you click
on it now we can go and we see this state that we that we added to these rows and
what this is going to do is it kind of is like a rollup or it's like a grouping um
and so
04:35:18
if you you know have taken the SQL um tutorial series and you've done things with
Group by this is very similar to that um and if you've done the Tableau tutorial
series it's kind of like a drill down it's very very similar so you can drill into
the information so we um can put some values in here uh and what we're what that's
going to do is that's going to kind of create some some context to what this what
we're grouping by so just for um visual purposes let's
04:35:51
add this Revenue so this is the revenue that is bike uh bike sales revenue right
that's what we're looking at so this is the sum of the revenue for these bike sales
per country now if we drop down right here we can see that in Australia uh New
South Wales had uh 92 was that 9,234 N5 Queensland had 5 million you know etc etc
so now we can break it down we can't it's we don't just have to look at Australia
we can now drill down even further to the actual state is what they're calling it
um the actual state
04:36:32
within Australia and so it's super super useful and you can do that for every
single one and so we can look at Canada we can look at France and we can really
drill down into uh the revenue for each of these countries as well as the states
within them now over here this is not the most uh pretty um it just says sum of
Revenue and then it has some numbers not not the most pretty thing I've ever seen
um really quick we can go like we can um kind of highlight over these and we can go
back to home you can do it in
04:37:03
a couple different ways we can go to home and we'll pick currency now it has these
two zeros at the end you can get rid of those really easily by going like that um
already this looks quite a bit better just visually um especially if you're looking
at it in uh you know dollars you can change the currency um to different currencies
if you want to do that now we don't just have to do uh the sum of Revenue we can do
a lot of different things so let's go to the value field settings so we can
customize
04:37:35
this name so we can do um Revenue oops good if I could spell Revenue per country
that's fine that you know it's just a placeholder trying to show you but we don't
have to just do that um you know we could do the count the average the max the Min
we can do just about anything we want um but let's keep it the sum right now um and
if we want to we can show this value as different things so we have the
percentage of column total percentage of row total let's do really quick just for
04:38:13
demonstration purposes the percentage of grand total so when we do that we can see
that the United States in the Revenue per country column has 32% just
between these um you know these countries and Australia has the next one so you
know it might be kind of hard to glance at this really quickly to know who has the
highest um but what we can do is we can go right here and we can go to sort and we
can do largest to smallest and there we have the United States on top now when you
do it right
04:38:47
here it's not sorted largest uh to smallest you'd have to go in again click sort
and do largest to smallest and so now we can see that California has the has the um
you know biggest percentage they're pulling in 20% of that 32% of Revenue so I'm
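For readers who think in code, the grouping and the percentage of grand total shown above can be sketched in plain Python. The rows and revenue figures below are made up for illustration and are not taken from the bike sales data set; the function names are hypothetical.

```python
from collections import defaultdict

# Made-up rows: (country, state, revenue).
sales = [
    ("United States", "California", 600),
    ("United States", "Washington", 200),
    ("Australia", "New South Wales", 150),
    ("Australia", "Queensland", 50),
]


def pivot_revenue(rows):
    """Sum revenue per country and per (country, state), like the Rows area."""
    by_country = defaultdict(float)
    by_state = defaultdict(float)
    for country, state, revenue in rows:
        by_country[country] += revenue
        by_state[(country, state)] += revenue
    return dict(by_country), dict(by_state)


def percent_of_grand_total(totals):
    """Express each total as a share of the grand total, largest first."""
    grand_total = sum(totals.values())
    shares = {key: 100 * value / grand_total for key, value in totals.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


by_country, by_state = pivot_revenue(sales)
country_share = percent_of_grand_total(by_country)
state_share = percent_of_grand_total(by_state)
```

Note that the state shares are computed against the overall grand total, not against their own country, which matches the percentage of grand total option rather than a percentage of parent total.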
just going to press Ctrl+Z a few times and get us back to where we just were
um and what I want to do is I want to show you a few different things uh pretty
quickly so we want to pull in this profit and this cost uh and so I'm going to pull
in this cost next
04:39:21
and then I'm going to pull in this profit again uh I'm going to change the currency
on this and I'm not going to change the names um right now but you know you
absolutely can do that now the revenue is the how much is actually being sold so
you know for the United States it was 27 million now the cost is how much did it
cost to manufacture or store um or distribute all of these products so that was
16 million and the profit is actually how much money is being made at the end of
the day after um you know all
04:40:00
their costs after all their employee costs after everything they're still making
the United States is still making $11 million now you might look at this and you
might say well you know I can kind of glance at it and say know that this profit is
correct based off these two numbers um but we can do a calculated field um if you
remember what calculated fields are that's something from Tableau very uh basically
the exact same thing and so we can create an additional column right here that is a
calculated
04:40:30
field that can add and subtract these things to make sure that our numbers are
adding up correctly so let's do that really quickly U let's go to pivot table
analyze we're going to go over to Fields items and sets and go to calculated field
now we can name this anything um and I'm just going to for demo purposes I'm going
to say um oops calculated field demo uh I'm sure yours will be different now um if
you want to you can go in here and this is the formula it's almost like um you know
we
04:41:05
haven't looked at formulas yet this is our first tutorial but you know when we look
at formulas it's basically the same thing as writing it inside of a cell but
here it gives us kind of this um open text to do how we uh do what we want with it
now what we're going to do is we're going to do Revenue I'm going to insert that
I'm going to get rid of this I'm going to do revenue and so that's the the the very
large number and then we're going to subtract and we're going to subtract
04:41:35
our cost we're going to insert that and let's do this and click okay so this is our
calculated field demo column that we just created and as you can see it matches our
uh sum of profit column exactly and that's exactly what we want to see we want to
kind of check to make sure that this revenue and cost uh fields are generating the
correct profit and sometimes those are off and so it's really good to kind of check
those and have that additional column um You probably wouldn't have this if you
were
04:42:07
um you know going to submit this to somebody uh just so you know now that this is
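The calculated-field check above boils down to comparing revenue minus cost against the stored profit. A minimal sketch in Python, with made-up figures and hypothetical function names:

```python
def calculated_profit(row):
    """Same formula as the calculated field: revenue minus cost."""
    return row["revenue"] - row["cost"]


def find_mismatches(rows):
    """Return the countries whose stored profit differs from revenue - cost."""
    return [row["country"] for row in rows if calculated_profit(row) != row["profit"]]


# Made-up totals per country; the first two are internally consistent.
consistent_rows = [
    {"country": "United States", "revenue": 2700, "cost": 1600, "profit": 1100},
    {"country": "Australia", "revenue": 2100, "cost": 1400, "profit": 700},
]

# A deliberately wrong row to show what a failed check looks like.
inconsistent_rows = consistent_rows + [
    {"country": "Canada", "revenue": 800, "cost": 500, "profit": 250},
]

no_problems = find_mismatches(consistent_rows)
problems = find_mismatches(inconsistent_rows)
```

An empty result means the profit column agrees with the other two, which is the outcome seen in the pivot table; any country listed would be worth investigating before sharing the numbers.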
an actual column you can't go here and do something like cut and paste it over
here you know that's not it won't let you do that what it is is is now an actual um
column and so we can go and remove that and we can add it back at any moment so if
we want to go back and add that um oops add that down here we can do that because
we've created that column it's now permanently there unless we go and delete all of
04:42:37
that data uh and so we can just click this check mark and it will get rid of it for
us all right now the last thing that we have not used down here is the filters now
the filters is exactly what it sounds like it's going to allow you to filter on
certain things um but probably not things that you already have included in your
pivot table so if you add something like the country down here um it's going to
kind of expand everything and then if you then go and filter on it it kind of
breaks it down
04:43:08
that's really not what the filter is kind of used for or meant for um for example
right up here we have uh customer gender okay so let's take the customer gender and
we'll put it in this filters now we can see all of the revenue all of the cost all
the profit and we can do that based off of the gender so we can filter by a gender
not really having to change anything about our pivot table and so at a super Quick
Glance we can see that uh the profit from the males is 16.48 million so at a
super uh basic level at a
04:43:50
really quick glance we can see that the men or the males are you know spending a
little bit more than the females by about about $700,000 now let's go ahead and
create one more pivot table uh we are going to create a pivot table right over here
let's go back to the sales right here again control shift right down it's going to
select all of our data and we're click okay so one thing that we're going to look
at is we're going to use some of this date information right here so let's select
04:44:23
our country just like we did before um and what we want to do is see you know what
year were we performing our best when were we doing our absolute best uh with oops
let me go back uh with our sales so I'm going to select the year and put that in
our columns and so now we have 2011 through 2016 and we want to look at our Revenue
let's put our Revenue right down here and now we have all of our Revenue now let's
again make this into a currency just like that and super quickly now we can get a
really quick
04:45:03
glance at at how Australia was doing each year and we can see that there was a huge
uptick in 2013 and a huge uptick in 2015 it didn't happen for every single country
uh it did go up uh for most countries very slightly for some but we can see on a
large scale from um year to year what that's like and So within just a few minutes
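The second pivot table, with countries as rows and years as columns, can be sketched the same way. The figures are made up and the function names are hypothetical; the point is only to show the shape of the result.

```python
from collections import defaultdict

# Made-up rows: (country, year, revenue).
sales = [
    ("Australia", 2013, 100),
    ("Australia", 2013, 50),
    ("Australia", 2014, 120),
    ("Canada", 2013, 40),
    ("Canada", 2014, 45),
]


def pivot_by_year(rows):
    """Build {country: {year: total revenue}}, i.e. rows by columns."""
    table = defaultdict(lambda: defaultdict(int))
    for country, year, revenue in rows:
        table[country][year] += revenue
    return {country: dict(years) for country, years in table.items()}


def best_year(table, country):
    """Return the year with the highest revenue for one country."""
    years = table[country]
    return max(years, key=years.get)


revenue_table = pivot_by_year(sales)
australia_best = best_year(revenue_table, "Australia")
canada_best = best_year(revenue_table, "Canada")
```

Reading across one country's entry gives the year-to-year trend that the pivot table shows at a glance.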
we're able to create some really useful pivot tables that anybody could look at and
understand and that's really the biggest use of these pivot tables is that
04:45:35
you can kind of group these things together show some uh information and data at at
kind of a broad larger scale and make it to where anybody who's looking at it can
understand it that is why pivot tables are so useful and so I hope that this video
was helpful I hope that I was able to walk through it and help you better
understand how pivot tables work and how you can use them when you are working
within Excel thank you guys so much for watching I really appreciate it if you like
this video be
04:46:00
sure to like and subscribe below and I'll see you in the next video [Music] what's
going on everybody today we're going to be looking at formulas in [Music] Excel now
I know what you're thinking there's absolutely no way that you're going to be able
to show us every single formula in Excel and you're absolutely right but I am going
to show you some of my favorites and the ones that I found the most useful and then
you can go ahead and practice those and try those out and if there are ones that
you
04:46:40
really want me to do and you think that I missed put it in the comments below and I
will see those and I'll try to make a list of those and make another video on
formulas and include all of those as well and now before we jump into the actual
tutorial I want to give a huge shout out to the sponsor of the series and that is
udemy you guys already know if you have watched any of my videos that I absolutely
love udemy I mean honestly they were the ones who got me started and were able
to give
04:47:04
me affordable courses for me to get started as a data analyst I learned SQL and
Excel and python all through udemy courses and so if you are looking for a platform
to take a course I absolutely recommend you look at udemy they have fantastic sales
going on right now especially during the holiday season in this new year and so if
you're looking to take a full-fledged Excel course I have some of my favorites in
the description below and now without further Ado let's jump onto my screen and get
started with the tutorial all
04:47:30
right now before we start I want to say that this is not like every other tutorial
that I have created this one is very streamlined okay so I already know
exactly what I'm going to do there's not going to be much messing around I left
little notes here and there um and I'm going to try to get through it because
there's a lot of them to get through um so all these ones at the bottom now these
are ones that I use a lot that I think are useful again if you know other ones that
you use a lot
04:47:55
that you think I should be using which I know there are ones that I left out of
here you know put it in the comments um I'll see the ones that people are liking
and I will create more videos on them because I know there are so many I also
will save this um Excel file on the GitHub so you can go and download it it will be
exactly what you're looking at right now I highly recommend trying these formulas
out for yourself so you can get a feel for how they work and how they're actually
used and you can mess
04:48:20
around with it yourself so um as you can see at the bottom we're going to start
with Max Min and then we're going to go on to some more I think a little bit more
uh difficult things um and all these things are super useful I'll try to talk about
how you can actually use it as we go through it some are super self-explanatory but
some may not be so this one I think is super self-explanatory but again one that
you're going to use all the time um and so what we can do is we can say equal
04:48:49
and that's how you kind of start off saying this is going to be a formula in this
cell equal means uh I am now creating a formula and we're going to say MAX and I'll
hit Tab and so it'll kind of populate it and right here if you've never seen a
formula before it'll give you what the inputs need to be so it's going to say
Max of number one number two etc etc what we're going to do is we're going to give
a range so we're going to go from here down to here
04:49:15
you don't have to close the parenthesis but you can I'm going to and then you hit
enter and so for this date it's going to give us the max date now these are um the
start dates for these people right here and so if we just kind of glance through
here we can see that 2013 was the last year and this one is actually the latest in
that year and so it gave us the correct one the Min is going to do the exact
opposite it's going to give us uh the smallest and so we'll give it the same range
we'll close
04:49:45
the parenthesis and it's going to say December 7th of 1995 and we can see that that
is correct so Michael Scott started in 1995 the earliest of all the employees um
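If you'd rather see the same idea in code, here is a minimal Python sketch of what MAX and MIN do on a column of dates. The rows are hypothetical sample data: only Michael Scott's December 7th 1995 start date comes from the video, and the other dates and all the variable names are assumptions for illustration.

```python
from datetime import date

# Hypothetical start dates; only Michael Scott's is quoted in the video.
start_dates = {
    "Jim Halpert": date(2001, 11, 2),
    "Michael Scott": date(1995, 12, 7),
    "Kevin Malone": date(2013, 9, 5),
}

# Like =MAX(range): the latest date in the column.
latest_start = max(start_dates.values())

# Like =MIN(range): the earliest date in the column.
earliest_start = min(start_dates.values())

# Unlike the Excel formulas, Python can also report who the value belongs to.
earliest_person = min(start_dates, key=start_dates.get)

print(latest_start, earliest_start, earliest_person)
```

The same two calls work on any list of numbers, so a salary column would be handled the same way.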
and you can do the exact same thing for really any of these columns we can see who
is making the most money or at least what the highest salary is uh so we'll do
Max and then we'll do the salary range and so this is this one again uh whoops what
did I do oh I did the wrong range didn't I no I didn't do the wrong range it's
04:50:20
just there it goes uh this column was a date range or a date column for whatever
reason let me get rid of that uh and then we can do equals Min and we'll do again
we'll do the salary and at a quick glance we can see that Pam Beasley is making the
least and 65,000 is Michael Scott who's making uh that so super simple it shows the
max it shows the Min you can select a range there you go let's move on to if and
ifs now if is um I think pretty straightforward so all you're going to do is you're
going to
04:50:57
say if this then that um ifs is a little bit different so ifs is you can you can
put multiple conditions and as we're writing it I'll show you kind of what it the
conditions that need to be met all right so we're going to click right here we're
going to say equal we're going to do if hit Tab and we need a logical test uh and
so we're going to give it a range or something we're going to say if it's
equal to or greater than um something like that then we're going to say if the
04:51:24
value is true what's the what is going to be the output or if the value is false
what's going to be the output so let's do this right here we'll do this age range
and so if they are greater than let's say let's do 30 if they're greater than 30
we're going to do a comma and so if the value is true what what should be the
output uh if they're greater than 30 we're going to call them old and then if it is
false so if they're younger than 30 what should it say and we're going to say
04:51:59
young and we'll close the parenthesis and there you go so if they're over 30 then
they are going to have old or if they're younger than 30 they're going to have
young now this is something where you need to specify if you want 30 and over or
over 30 we chose over 30 so 30 is not included in that um so they're going to be
young now uh let's get we don't actually need two of these that's pretty self-
explanatory the ifs is a little bit different right you can have multiple
04:52:31
conditions so let's open that up real quick so ifs and now we have a logical test
value if uh that's true then you can do logical test two value if that's true um so
you can have multiple multiple multiple things now this one is a little bit
different in this one oops let me get out of this in this one you had a value of
true a value of false ifs does not have that ifs is going to give you um different
ranges in different specific conditions and you can't say if this one's false
04:53:06
you're just going to have multiple conditions so let's do equals and ifs Tab and
we'll do our first logical test so let's do um if the salesman or if that equals to
salesman we're going to say we're going to respond with sales so that's if the
value is true that's what we want the output to be now we're going to go on to our
logical test two so you're going to see this pattern right if this is our
conditional or logical test so if this is true this is what's going to be returned
so you'll
04:53:47
notice that's just a a pretty simple pattern we can just do random things so if
it's equal to sales um and we'll just do the same one if that is equal to say HR we
can say fire immediately and now we're going to say if it's equal to regional
manager and we say give Christmas bonus and we'll close the parenthesis and let's
see what we get so as you can see there's no default value for true or false like
like this one there was a logical test and if it was true there was a value and if
it was
04:54:39
false there was a value so for every single one you'll get a value for this one
that's not exactly going to happen as you can see there are these #N/As now when that
happens it just means nothing met that condition so we never said anything about
supplier relations we never said anything about accountants but if it was part of
that ifs statement then it got something um and so that is how the ifs works now
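Here is a rough Python sketch of the difference between IF and IFS as described above. The function names are made up for illustration, and returning the text "#N/A" is just a stand-in for the error Excel shows when no IFS condition is met.

```python
def age_label(age):
    # Like =IF(age > 30, "Old", "Young").
    # The test is strictly greater than, so exactly 30 counts as "Young".
    return "Old" if age > 30 else "Young"


def job_action(title):
    # Like =IFS(title="Salesman", "Sales", title="HR", "Fire Immediately", ...).
    # IFS is a list of (test, value) pairs with no value-if-false branch.
    rules = [
        ("Salesman", "Sales"),
        ("HR", "Fire Immediately"),
        ("Regional Manager", "Give Christmas Bonus"),
    ]
    for wanted_title, result in rules:
        if title == wanted_title:
            return result  # the first matching test wins
    return "#N/A"  # stand-in for Excel's error when nothing matched


for title in ["Salesman", "HR", "Accountant"]:
    print(title, "->", job_action(title))
```

If you want a catch-all in Excel itself, a common trick is to add a final pair whose test is TRUE, which plays the role of the last return line here.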
let's move on to length uh this is exactly what we're going to do but you know some
of the
04:55:08
uses for this uh for the length I've used it for a lot of different things um one
thing that I've used it for in the past and you know Max and ifs you know you can
use it for almost anything length is there's a lot of different use cases one I
used to work with a lot of um customer data or patient data they had like Social
Security numbers and if you know there was bad Social Security numbers we didn't
want to include that and so we do like the length of that and if a social security
number was let's say 10 numbers
04:55:37
or 11 numbers where it should only be nine or or you know however many they are I
think it's nine then we know that that social security number is incorrect and then
we can get rid of that or discard it from our results that's just an example right
um so for this oops what did I do there I did control Z to undo that if you didn't know
how to do that um so we're going to do equals Len which is length um and again if
you didn't see that it Returns the number of characters in a text string so let's
go right here
04:56:09
and let's go to uh let's go to their last name and we'll give it a range so it's
going to tell us how many characters are in that string so for Halpert it's seven
characters for Flenderson it's 10 characters and we're able to see a length and so
again there are a lot of different use cases for this uh the social security number
was one another one is phone numbers right if you look at the length of the phone
numbers and there's ones that are like 12 numbers long you know those might not
04:56:38
be ones that are accurate and you need to go look at them and see if you want to
include them in your results or your output so that is how length is done let's
move right over to the left and right um I I might be going a little fast but uh
you know I'm keeping it I'm keeping it live I'm keeping this on our feet uh so
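As a quick sketch of the LEN idea in Python: the first part reproduces the two last-name lengths from the video, and the second part is a hypothetical version of the Social Security number check mentioned earlier. The helper name and the sample values are assumptions, not anything from the workbook.

```python
last_names = ["Halpert", "Flenderson"]

# Like =LEN(cell) filled down a column: number of characters in each string.
name_lengths = [len(name) for name in last_names]
print(name_lengths)


def looks_like_ssn(value):
    # A US Social Security number has nine digits, so anything with
    # 10 or 11 digits (or with letters in it) gets flagged as bad.
    digits = value.replace("-", "")
    return digits.isdigit() and len(digits) == 9


# Hypothetical values for illustration only.
for candidate in ["123-45-6789", "1234567890"]:
    print(candidate, looks_like_ssn(candidate))
```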
let's keep going left and right um are kind of like substrings if you've taken the
SQL um tutorial series that I've done uh substrings are where you can choose
a certain part of
04:57:09
the text string and you can extract data from that um and usually have to reference
a certain number so a certain amount of characters that's the exact same thing
except uh unfortunately there's no substring there's substitute but there's no
substring left and right is really the closest thing that we have so let's kind of
take a look real quick and see what we can do so we're going to do left and it's
going to say Returns the specified number of characters from the start of a text
string so we're
04:57:37
starting from the very far left and we need to choose our text and then choose the
number of characters that we're going to be looking over so let's go over here and
let's just choose you know start simple uh we'll get a little bit more advanced so
we have um this is our text range so these are the ones that we want to
look at and then how many characters do we want to look forward and we'll just
choose three as an example and so you can see that it takes the first three
characters from
04:58:06
every single thing now you can also do this with numbers it doesn't just have to be
um you know names with actual words or letters you can do the exact same thing
so you can say RIGHT um and we're going to choose our our string uh and let's do
this one so you know all of them start with 100 um and we'll just say we want to
take the last one so this one is going to start from the very far right and go over
one character so right here you can see this is our range and I just chose one so
starting
04:58:37
from the very far right we go over one character and that's what we take and so
that can definitely be useful another one that you can do and this one is one that
I have used so many times I mean honestly countless times in in actually using this
in my job uh so we're going to go from the right and we're going to look at a date
so you know sometimes you have these date structures month month day day year year
year or year um you know day month year all these different and sometimes you just
want to extract
04:59:06
either the month or the year or or something like that the day and so we want to
come in here we're just going to extract the oops I wanted to make that arrange we
want to extract the year of the start dates so we're going to do that and then
we're going to go over four so we want to take the first four characters from the
right to give us the entire year let's do that and now we can see exactly the year
and this can be just super super useful this is again one that I've used a lot
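LEFT and RIGHT map pretty directly onto string slicing. Here is a small Python sketch of the three examples above; the function names mirror the Excel ones, and the sample values (an ID starting with 100 and a date stored as text) are assumptions for illustration.

```python
def left(text, num_chars):
    # Like =LEFT(text, num_chars): characters counted from the start.
    return text[:num_chars]


def right(text, num_chars):
    # Like =RIGHT(text, num_chars): characters counted from the end.
    # The guard matters because text[-0:] would return the whole string.
    return text[-num_chars:] if num_chars > 0 else ""


print(left("Halpert", 3))       # first three characters of a last name
print(right("1001", 1))         # last character of an ID
print(right("11/2/2001", 4))    # year from a date that is stored as text
```

Note that the last call only works because the date is a plain string here, which is the same condition the Excel version needs.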
and so
04:59:34
that is one that you might want to remember in case you're ever doing analysis on
you know start and end dates or or anything with um date data uh again one that I
highly recommend remembering let's go over to date to text I actually probably
should have included that um before because I actually used it in this one um if
you notice right here this is a text so in in this one we just did that was a text
you can't do this right on um start and end dates when it's a date uh format and
05:00:05
let me show you so this is a date now if I do equals and you know we just did this
uh let's do on the end date and I'll do the whole range give me a second and we'll
do four it's giving us completely random numbers why is that because underneath the
date range there are um numbers right so if I go right here and I make this a
general it's going to have the numbers and look these are the first four characters
from the right and so it's doing what it's supposed to do but
05:00:39
uh it's not doing what we actually want and that's the issue so how can we convert
this now there are a ton of different ways um but the quickest probably the easiest
besides actually writing it out like this like 11-2-2001 which then
converts it to a date format um but what you can do you know just so you know you
can create it as a text you can do 11-2-2001 and now it will stay a text string and
as you can tell these are a little bit different because this one is uh formatted
or situated on the right and
05:01:17
this one's on the left that's how you can tell the difference now if you don't want
to do it by hand uh completely manually and waste hours of your time you can do it
in a very simple way so we're going to do uh TEXT so this is the exact um formula
that we're going to use so let's get rid of that one there we go so we're going to
do equals we're going to do uh oops text it says converts a value to text in a
specific number format so for a date format we can choose a date format and then
it'll
05:01:52
convert it to a text for us which saves so much time I promise you uh let's do all
of these just like we did and then we need to tell it what the format is if we
don't if we tell it something incorrect it's going to give us a completely terrible
output or just give us an error altogether so this is a day day month month year
year year year format and that is what we're going to do so we're going to do dd/mm/yyyy
and close that up and there you go and now we well because it's in a
05:02:26
formula what we need to do is copy this and paste it right over here and now
you can see that is a general this is something that we can use as a string and
let's just check it just to make sure we're going to do right we're going to do
this one let's do all of them and we'll do four and there you go so now it works
that is what we are looking for um and you can do that imagine doing that with
millions of rows or you know let's say 10,000 rows it's going to be a breeze
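To see why RIGHT on a real date gives strange numbers and why converting to text first fixes it, here is a small Python sketch. It assumes Excel's default 1900 date system, where a date is stored as a day count, and it assumes the format string was "dd/mm/yyyy". The end date is a made-up value, not one from the workbook.

```python
from datetime import date

# Offset that reproduces Excel's 1900-system serial numbers for modern dates.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d):
    # The number Excel shows when a date cell is switched to General format.
    return (d - EXCEL_EPOCH).days


end_date = date(2015, 9, 6)  # hypothetical end date

# RIGHT(date_cell, 4) effectively slices the serial number, not the date text.
serial_text = str(excel_serial(end_date))
wrong_year = serial_text[-4:]

# Like =TEXT(cell, "dd/mm/yyyy"): turn the date into a real string first.
as_text = end_date.strftime("%d/%m/%Y")
year = as_text[-4:]

print(serial_text, wrong_year, as_text, year)
```

In Excel the extra copy and paste-as-values step is what turns the formula result into a plain string; in Python the result of strftime is already a plain string.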
05:03:01
right it's going to take you two minutes or a minute to do everything that you want
to do instead of having to just do a bunch of mess to convert it to a string which
I promise you I've done it it just takes forever it's it's terrible so that is uh
date to text super helpful formula let's go over to trim now I I purposefully
messed up this column now why did I mess it up like this because when you're
working with real data you're going to get data like this it it's messy it's
05:03:31
dirty it just has random spaces at the end for no reason um because sometimes
you're going to be working with um data that is inputted by a user it's not like a
drop- down option so imagine somebody's typing this in they accidentally put a
space so they actually put an enter or something and then they submit it and this
is how it's going to look in the database um and if you're a data engineer or you
know you're working with the raw data if they don't clean that up then you're going
to
05:03:59
be working with that that dirty data and I I guarantee you if you're working as a
data analyst you're going to see stuff like this not with maybe a last name but all
sorts of data so we're going to go right here we're going to say equals trim do
open parenthesis actually this says removes all spaces from a text string except
for a single space between words so like you know if it said uh Jim
space Halpert it won't take the space in between there because it it kind of
understands that
05:04:28
in normal language the space is supposed to be there so it won't do that um but
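Here is a minimal Python sketch of the TRIM behaviour just described: spaces at the start and end are dropped and runs of spaces between words collapse to one. The function name is made up, and like Excel's TRIM this version only deals with the ordinary space character, not tabs or line breaks.

```python
def excel_trim(text):
    # Split on single spaces, drop the empty pieces that extra spaces leave
    # behind, then join what remains with exactly one space.
    return " ".join(piece for piece in text.split(" ") if piece)


# Hypothetical messy values like the ones in the deliberately dirtied column.
messy_names = ["Halpert   ", "  Flenderson", "Jim    Halpert "]
clean_names = [excel_trim(name) for name in messy_names]
print(clean_names)
```

Python's built-in strip would only handle the two ends, which is why the sketch rebuilds the string instead.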
we'll take that we'll give it this Range close that up and there you go now it is
nice and clean much more usable now let's look at concatenate one that I have used
just way way way too many times um and something that I've used concatenate for and
you'll see this one in a lot of demonstrations for a good reason is because a lot
of people use it for this um so what you can do is you can say equals um and well
let me tell
05:05:06
you what concatenate does real quick so what concatenate does oops I'm totally
messing up here um but it joins two or more text strings into one string it
basically joins things together and adds them together so let's do concatenate and
we're going to add this first and last name again one that gets used all the time
but that's because um it really is useful so you can do this and you can say now
I want to include this so concatenating this and this and let's take a look so
it says
05:05:38
Jim Halpert uh but it's all connected and that's typically not how people write
their names so what we can do is we can go back in here and we can do what my
demonstration up here already tells us to do which is we're just going to add
another thing in here and if we add two quotation marks we can include anything in here
we can include a dash we can include an exclamation point or we can just include a
space so let's just include a space really quick and just like that it works
perfectly and so now
05:06:08
we have the full name now something that you could use it for is something like
generating uh an email this is something that you absolutely could do um and it's
you know pretty simple so I'm going to do it like this I'm going to say oops what did I
do I'm going to say um dot and then at the end I'm going to say at oops comma quotation
gmail.com and now I've created emails for all of these people so just something
that you can do with this um and something that it it absolutely is
05:06:49
used for and you'll see that demonstration almost everywhere because honestly it
gets used a lot um by data analysts and so uh you know just a good one to know
understanding how that concatenation works um let's go over to the next one so
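The three CONCATENATE steps above look like this as a Python sketch. The cell references in the comments (A2, B2) are hypothetical, since the actual column letters are not stated in the video.

```python
first_name = "Jim"
last_name = "Halpert"

# Like =CONCATENATE(A2, B2): the pieces are glued together with nothing between.
joined = first_name + last_name

# Like =CONCATENATE(A2, " ", B2): a quoted space is just one more piece.
full_name = first_name + " " + last_name

# Like =CONCATENATE(A2, ".", B2, "@gmail.com"): building an email address.
email = first_name + "." + last_name + "@gmail.com"

print(joined, full_name, email)
```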
we are going to do substitute now substitute's really interesting um there are
different ways you can do it I'm going to show it to you on these dates real quick
uh that's what we're going to look at so changing a date format changing how what
it's supposed to look
05:07:19
like is absolutely something that happens all the time and um you know sometimes
you'll even get it like this where it'll look like it'll be messy it'll be
different a different um I guess format so this one has all the other ones have
slashes where these ones have dashes and you know what you can do is if you want to
well let me actually go with the no instances real quick because this one is uh
actually makes the most sense um so we'll do equals and we're going to say
substitute and oops and let me say
05:07:57
substitute replaces existing text with new text in a text string so if we do an
open parenthesis it says we take the text have the old text we have the new text
and then we have how what instance or how many times uh or or or what instance are
we looking at it and I'll explain that in a little bit so the text that we're going
to be looking at is this one right here so let's take this range and the old is
we're going to take this Dash and so let's take the dash and then what do we want
to replace
05:08:32
replace it with we want to replace it with this slash right here I think it's a
forward slash isn't that what it's called it's called a forward slash am I crazy um
and we're not going to put an instance notice that that's in a bracket that means
it's optional we're going to do none of that um and what it's going to do is it's
going to fix this so this one is now in the correct format that we want uh and
that's fantastic that's you know that's what we tried to accomplish
05:08:57
given what we had now let's fix that if we want to do the exact same thing uh we
can say uh what are we doing substitute we can do substitute we can do open
parentheses we'll give the range and now let's say we want to change all of them to
a different format so instead of the um forward slash I'm going to keep calling it
that if that's correct we want to give it a dash and so then we close that and now
all of them are in this new format so it it's able to substitute a specific value
for a new value and if
05:09:30
you don't include an instance then it'll do it to every single one in there so
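Here is a rough Python sketch of SUBSTITUTE, including the optional instance argument from the signature read out earlier. The function is a stand-in written for this example, not Excel's implementation, and the sample dates are assumptions.

```python
def substitute(text, old, new, instance=None):
    # With no instance, every occurrence is replaced, like SUBSTITUTE
    # without its optional last argument.
    if old == "":
        return text
    if instance is None:
        return text.replace(old, new)
    if instance < 1:
        raise ValueError("instance must be 1 or greater")  # sketch choice
    # Walk forward to the requested occurrence and replace only that one.
    position = 0
    for _ in range(instance):
        found = text.find(old, position)
        if found == -1:
            return text  # fewer occurrences than requested: leave unchanged
        position = found + len(old)
    return text[:found] + new + text[position:]


print(substitute("11-2-2001", "-", "/"))      # all dashes become slashes
print(substitute("11/2/2001", "/", "-", 1))   # only the first slash changes
print(substitute("11/2/2001", "/", "-", 2))   # only the second slash changes
```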
let's go over here and we're going to actually use the um the instance num
and I'll show you what that does uh and so really quick we'll do the exact same
thing that we just did we'll do the forward slash and we want to replace it with
this one again this Dash but we only want to do it on the first instance of that
forward slash and so as you can see all the ones that um all the ones that were
replaced are the very
05:10:09
first instance whereas the second instance which is the second time it appears in
this string does not get touched so if we take this and we put it right over here
and we move it to two it's kind of the opposite so the first one wasn't touched the
second one was so we're choosing which instance or which time it shows up in that
string and then it replaces it if you do not choose an instance it chooses all of
them so this can be super useful if you want to do like a bulk replace um but
05:10:43
you only want to do it on a specific column um and you just want to use a formula
really quick right um and so you can use this in a lot of different ways so that's
how you're able to actually do it with the first instance the second instance and
if you don't include an instance at all let's go over to the sum uh this is one I
think everyone knows how to use but I want to show you two other ones um as well so
let's go to the sum and we're just going to do equals the sum and I hope you know
what this is
05:11:11
well not hope if you don't know what this is it just adds up all the numbers in a
range so we're going to add sum means add so we're going to take this and it's
going to give us the uh what all these salaries are together so super super simple
SUM is probably one of the most basic formulas that you can do um SUMIF is a
little bit different you can add an if statement which we learned right back here
you can add an if statement and then add it if it meets a certain criteria all
right so we're going to do
05:11:44
equals SUMIF and then you're going to need to give a range and criteria and you
can include a sum range if you would like so we're going to do the salary again we're
going to do a comma and now here's our criteria let's do if they have greater than
50,000 for their salary and close our parenthesis so now it's only going to add up
if their salary is greater than 50,000 now his is 50,000 exactly so that won't
count but we have 63 and 65,000 which does equal 128,000 so it it just gives a
specific
05:12:23
criteria or an if statement then it does the addition uh so super useful on that
one so that is how you do a SUMIF and SUMIFS is kind of the same thing as we did
back here there's the if and the ifs so the ifs is going to be if it meets
multiple conditions so let's take a look at that one so let's do um equals SUMIFS
now uh oops now the Syntax for this one is going to be a little bit different
you'll see that in just a second this adds the cells specified by a given set of
conditions or criteria so
05:12:58
let's do an open open parentheses we give the sum range so let's do um the same one
as before then we have our criteria range so what are we looking at What's um this
is the area that's going to be added after all these if statements are done right
so we have to initially set that now we're going to say okay what criteria are we
basing this off of so let's put a comma and we're going to base it off of let's do
this one we'll say um if the uh gender so we'll do comma if that's female oops
05:13:35
if that's female and then we'll give another one we can say if they're female and
let's say they are greater than oops greater than 30 and we'll close that up and
it's going to give us 88,000 so female female there's one two right here so it's
going to be this one and this one that equals 88,000 so that's how that works
you're able to incorporate several different conditions into uh the sum formula so
again I know this one's super simple but you you can use it in a much more
05:14:12
complex way if you use the SUMIF and the SUMIFS um almost the exact same thing
for this count I'm not going to go super in depth into this one um I'll just kind
of show you because count is um count and sum are kind of on the same level of
difficulty they're both pretty beginner this is just going to give you a count of
how many cells um are there so let's give this range um and so it's not going to
add it it's just going to give us a count so if we do right here and scroll over
them like highlight them
05:14:47
this count down here oops this count down here is nine and so it's going to give us
that count but we can do a count with conditions exactly how we did it in the sum
so if we do COUNTIF oops I did not spell that right if we do COUNTIF we're going
to give a range and a criteria exact same as we did before so let's do this I mean
you can do this on basically any of these it doesn't really for this demonstration
it doesn't really matter um but we'll say if their salary is greater than 45,000 so
how many people
05:15:21
this is going to give us how many people have a salary over 45,000 and that's five
so before in the sum if if we did that um we did 50,000 it adds everything together
the count is just going to count the amount of cells that meet that criteria and
again COUNTIFS uh we're going to have a criteria range and then we will specify
what if statements we want to be uh to occur in order to count those cells so let's
do we want you know we want to count it can be any range or it can be any of these
05:15:57
we'll do the ID this time and now we can say you know what we want our criteria
one to be we can say we want their ID to be greater than 1005
and let's say we want them to be male so they have an ID over a certain um a
certain range and then they are a male so there's only three people that meet that
criteria and so it'll be Michael Stanley and Kevin those are our three people and
so it gives us a count very useful to give quick numbers like this something I I
genuinely use a lot
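The counting versions follow the same pattern. This Python sketch uses an assumed reconstruction of the sample table (the IDs and salaries are not shown in the transcript and were chosen to reproduce the counts of nine, five and three mentioned above).

```python
# Assumed rows: (employee_id, name, gender, salary). Illustrative values only.
employees = [
    (1001, "Jim", "Male", 45000),
    (1002, "Pam", "Female", 36000),
    (1003, "Dwight", "Male", 63000),
    (1004, "Angela", "Female", 47000),
    (1005, "Toby", "Male", 50000),
    (1006, "Michael", "Male", 65000),
    (1007, "Meredith", "Female", 41000),
    (1008, "Stanley", "Male", 48000),
    (1009, "Kevin", "Male", 42000),
]

# Like =COUNT(range): how many cells hold a number.
row_count = len(employees)

# Like =COUNTIF(salary_range, ">45000"): count rows meeting one condition.
paid_over_45k = sum(1 for _, _, _, salary in employees if salary > 45000)

# Like =COUNTIFS(id_range, ">1005", gender_range, "Male"): all conditions must hold.
matches = [
    name
    for employee_id, name, gender, _ in employees
    if employee_id > 1005 and gender == "Male"
]

print(row_count, paid_over_45k, matches)
```

Keeping the matching names in a list, rather than only a number, makes it easy to check who was counted, which the Excel formula on its own does not show.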
05:16:41
and I know I've said that a lot during this tutorial but that's because everything
I'm showing you are things that I've used a lot so I don't feel like um you know
I'm speaking out of turn here let's look at this one this one is very um has some
specific use cases um notice that this is a text right now um if you do it when it
is uh in a date format it actually will not work I mean I can you can test it out
yourself you just got to trust me it's not going to work so what this does is
05:17:13
it's going to give you the range from this day to this day that's what it's going
to do so let's do uh oops DAYS we want to choose our end date so this is
our end date it's kind of backward from what you think end date to start date you
think start date to end date so you have to start with this one and then we're
going to choose the start date and now it's going to tell us how many um how many
uh days was it from here to here and this one it's 5,56 so NETWORKDAYS is
extremely
05:17:47
similar except it takes out holidays and it takes out weekends and you can see how
many working days has this person um how many working days or network days has this
person worked not including you know weekends and holidays have they actually
worked between their start date and their end date so let's do NETWORKDAYS and we
need our start date our end date and you can specify holidays if you'd like
and it takes those out along with the weekends um so you
know if you want to
05:18:20
do that you can so we're going to do the start date again this one's different this
one says start date end date and then we're going to give the end date and if you
notice they are going to be different numbers this one is dramatically lower because it's
taking out weekends and holidays so this is how many days uh calendar days they've
worked and this is how many days they've actually been in the office and worked and
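Here is a small Python sketch of the two date formulas, keeping the argument orders described above. As far as I know, NETWORKDAYS only skips weekends plus whatever dates you pass in its optional holidays argument, so the sketch takes holidays explicitly. The function names, the sample week and the holiday are all assumptions, and the sketch only handles a start date on or before the end date.

```python
from datetime import date, timedelta


def days(end_date, start_date):
    # Same argument order as =DAYS(end_date, start_date).
    return (end_date - start_date).days


def networkdays(start_date, end_date, holidays=()):
    # Same argument order as =NETWORKDAYS(start_date, end_date, [holidays]).
    # Counts Monday to Friday, both ends included, skipping listed holidays.
    skipped = set(holidays)
    count = 0
    current = start_date
    while current <= end_date:
        if current.weekday() < 5 and current not in skipped:
            count += 1
        current += timedelta(days=1)
    return count


week_start = date(2024, 1, 1)  # a Monday
week_end = date(2024, 1, 7)    # the following Sunday

print(days(week_end, week_start))
print(networkdays(week_start, week_end))
print(networkdays(week_start, week_end, holidays=[date(2024, 1, 1)]))
```

Walking day by day is slow for very long ranges but keeps the logic easy to read, which seemed like the right trade-off for an illustration.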
that is it um again there are so many formulas I mean literally
05:18:49
hundreds of formulas that you can utilize and use and are out there for you to try
out yourself if there are specific ones that I did not cover in this video please
please put it in the comments below so that I can you know show you how to do these
things I I I will say I've probably used a majority of the ones that you're going
to put in the comments already and if I haven't used it I'll take a look at it and
see if it's really useful and I'll show you that so thank you guys so much for
05:19:18
watching I hope that this has been helpful I I feel like a lot of these things are
not things that I learned before I started almost all these are ones that I learned
while I was on the job and so I'm hoping that you can get ahead of the curve and
you can learn these things before you actually start so that when you get in
there you're just like killing it with the formulas and people are like whoa this
guy is like this guy knows what he's doing in Excel give him all the Excel
05:19:42
work and then you become like you know just the Excel guy um and everyone you know
loves you for it so with that being said thank you so much for watching I really do
hope this helped if you like this video be sure to like And subscribe below I'll
see you in the next [Music] video [Music] what's going on everybody welcome back to
another video in this Excel tutorial we'll be looking at [Music] XLOOKUP now if you
don't already know what XLOOKUP is it is a new feature in Excel to kind of replace
VLOOKUP or to be a
05:20:25
much better option at least in my mind it is a much better option than VLOOKUP and so
if you're someone who's either used VLOOKUP a lot and you're trying to you know
learn this new Option or if you've never used it before this video will be super
helpful because I'll walk you through kind of the options and what XLOOKUP can do
as well as the difference between XLOOKUP and VLOOKUP but before we get into the
tutorial I want to give a huge shout out to today's sponsor and that is udemy udemy
is the
05:20:47
go-to place if you want a full-fledged course in Excel I have three options of
courses that I have taken on udemy so I'd highly recommend checking those out they
are having a huge sale on all their courses during this time and so if you are in
the market for a course I highly recommend checking out udemy and getting one
there now without further Ado let's jump on my screen and start the tutorial all
right so let's get me off the screen because we all know why we're here so I didn't
include this in the
05:21:11
formulas video last week because I knew this was going to be a large one and a lot
of people are going to want to know how to do this what the difference between
VLOOKUP and XLOOKUP is so it has its own dedicated video to it so let's get started
it is a formula so we're going to come in here in this cell we're going to hit
equal and then we're going to start typing XLOOKUP now I'm going to hit tab in just a
second but let's read what this says it says searches a range or an array for a
match and
05:21:37
returns the corresponding item from a second range or array by default an exact
match is used so really useful to know um we'll talk a little bit more about that
in just a second let's hit Tab and it's going to complete it and it's going to
start giving us or it's going to tell us what our input values need to be we're
going to have our lookup value we're going to have our lookup array our return
array and then some options things like if not found so if your option isn't found
you know what
05:22:06
will be the output that it gives us then a match mode and a search
mode and I'm going to show you um kind of how to use every single one of these
things as you can see at the very bottom I've kind of already set up all of the
instructional content for this video and so we'll kind of get
through all these different scenarios so let's just start really quickly with um
how to use it very simply with the lookup value lookup array and return array so we're
going to come in
05:22:36
here and we're going to give it our lookup value now Toby Flenderson right over here
in A3 is going to be our lookup value so that's who we're going to be searching for
now we're going to hit comma and now we're going to be needing to input our lookup
array now an array is just uh you know a range basically so
this is where it's going to be searching for that value this is
where it searches for A3 so here's Toby Flenderson here's Toby Flenderson so
05:23:05
it will find it in this array right here then we're going to hit comma and now we
need to give it the return array what it's going to return on that row when it
finds it so we're going to return his email keep it really simple so what it should
do and let's close parentheses what it should do is it should take Toby Flenderson
it's going to search in this column or in this array and then it's going to return
the email when it finds Toby Flenderson so Toby Flenderson
05:23:34
is on row six so it's going to find Toby Flenderson it's going to come over here
and it's going to return Toby flenderson dundermifflin corporate.com that's what it
should do let's see what it actually does hit enter and it returns it now if we
drag it down like this it'll apply it to all of these names right here and it works
exactly how it's supposed to um again if you have never used VLOOKUP you don't know
how good you have it okay VLOOKUP um was extremely useful but just
05:24:05
uh a bit complicated and I'll talk about that near the end of the video when we
compare V lookup to xlookup but just know that if you're using X lookup for the
first time and you're just getting into using Excel you guys have it good okay so
just know that um now let's go over here to X lookup multiple rows because you can
return more than one output with um with X lookup so let's go right in here and
we're going to basically write the exact same thing as we did before so let's write
X lookup
05:24:39
we're going to do Toby Flenderson as our value we're going to search here and we're
going to do something a little bit different this time we want to include our end
date and the email so what we're going to do is we're going to start here we're
going to go down all the way to the bottom of end date and then we're also going to
include the email and when we do that it will uh in the output give us a
column for end date and a column for email so an output for both
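As a rough sketch of what the two lookups above are doing, here is the same idea in plain Python rather than Excel. The `xlookup` helper and the sample values are made up for illustration (only the names come from the video). It finds the first exact match in the lookup list and returns whatever sits at the same position in the return list, which can be a single value or several adjacent columns. One simplification to be aware of: Excel ignores case when matching, this sketch does not.

```python
def xlookup(lookup_value, lookup_array, return_array):
    """Return the item in return_array at the position of the first exact match."""
    for position, candidate in enumerate(lookup_array):
        if candidate == lookup_value:
            return return_array[position]
    # Excel would show #N/A here; this sketch raises an error instead
    raise LookupError(f"{lookup_value!r} not found")


# Hypothetical sample data (emails and dates are invented for this sketch)
full_names = ["Jim Halpert", "Pam Beasley", "Toby Flenderson"]
emails = ["jim@example.com", "pam@example.com", "toby@example.com"]
end_dates = ["2015-09-06", "2015-10-10", "2017-09-11"]

# One return column, similar in spirit to =XLOOKUP(A3, <names>, <emails>)
toby_email = xlookup("Toby Flenderson", full_names, emails)

# Several adjacent return columns: pair them up row by row first
end_date_and_email = list(zip(end_dates, emails))
toby_row = xlookup("Toby Flenderson", full_names, end_date_and_email)

print(toby_email)
print(toby_row)
```

The second call returns both values for the matching row at once, which mirrors how the spreadsheet spills one output per selected column.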
05:25:07
so let's hit enter and now we can see that we have the end date here and the email
here now one of the downsides or something that I'm not a huge fan of is
well first off I love that you can do this that's fantastic um but they have to be
right next to each other so you're only going to get that output exactly how it is
in the columns so if I went and did this range um I would include all of that um so
uh you know let's just for example let's pull that down here so let's take
05:25:42
this and put it right here if I did instead of O2 to P10 if I included
age to email this whole range and I hit enter it's all going to be included so you
know that's one of the small downsides of that functionality of when you can use
multiple rows is that it's going to use the rows exactly as they are you can't
really customize it within the formula you can move around um these columns to how
you want it um so that is something to note and again you can pull this down and
it'll
05:26:21
be applied to all of those names let's go over to X lookup exact match so let's
open this up we're going to do equals XLOOKUP as we've been doing and we're actually
going to be looking at the if not found and the match mode uh both you know on this
tab right here so let's do what we've been doing before we take our value that
we're looking up we take the array that we're looking in and we're going to do the
email and you know as you can see this says Toby Flender and not Toby
05:26:54
Flenderson so what we are going to do is we're going to hit comma and if it's not
found you can return um a value or a string that you want to return now for simple
purposes or for simple instructional purposes we're going to do not found and then
we're going to close that off so let's do this and Toby Flender was not found and
so it returned not found if Toby Flender was actually in this full name then it
would have returned the email and then if along the way you know one of these was
not part
05:27:29
of it then you know we would have had the not found all right so
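The if not found option can be sketched the same way in Python. This is only an illustration with a made-up `xlookup` helper and invented email addresses: when no exact match exists, the fallback value is returned instead of an error.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found=None):
    """Exact-match lookup that returns if_not_found when nothing matches."""
    for position, candidate in enumerate(lookup_array):
        if candidate == lookup_value:
            return return_array[position]
    return if_not_found


# Hypothetical sample data for this sketch
full_names = ["Jim Halpert", "Pam Beasley", "Toby Flenderson"]
emails = ["jim@example.com", "pam@example.com", "toby@example.com"]

# "Toby Flender" is not an exact match for "Toby Flenderson"
missing = xlookup("Toby Flender", full_names, emails, "Not Found")
found = xlookup("Toby Flenderson", full_names, emails, "Not Found")

print(missing)
print(found)
```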
let's go right up here we're actually just going to copy this uh because I want to
reuse it um and then we're going to go right here and we hit a comma now this is
our match mode option and so we have four different options that we can choose from
a zero is an exact match and that is by default that is what we have or what we use
then there's a minus one that's an exact match or next smaller item then there's a
one which is an
05:28:00
exact match or next larger item and then there's a two which is a wild card
character match now we're going to do that and we are going to um you know try this
out and it's not going to work and not just because I forgot to put A4 um it's
doing it because it's searching for Beasley but if there's not a wild card option
already put in here um it doesn't recognize it so we need to indicate where that
wild card needs to be so we're going to do a double apostrophe or
05:28:29
quotation marks we're going to put an asterisk right here and then do another
one and we're going to hit an ampersand so we're going to have an ampersand right
here and what that's going to say is anything that comes before A4 anything that
comes before Beasley is okay doesn't matter what it is as long as it has Beasley at
the end that is going to be okay so we're going to have Pam that comes before
Beasley and that's going to tell it and it's going to say okay I know that anything
05:28:56
that comes before Beasley is all right and so when we hit enter is now going to
return the output that we are looking for and we can include that on these as well
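Here is a small Python sketch of the wildcard idea, meant as an illustration rather than a copy of how Excel does it. The helper name `xlookup_wildcard` and the sample emails are assumptions. It uses the standard library's `fnmatchcase`, whose patterns are close to but not identical to Excel's wildcards, and it lowercases both sides so case is ignored.

```python
from fnmatch import fnmatchcase


def xlookup_wildcard(pattern, lookup_array, return_array, if_not_found=None):
    """Return the item for the first entry matching a pattern where * means any text."""
    for position, candidate in enumerate(lookup_array):
        if fnmatchcase(candidate.lower(), pattern.lower()):
            return return_array[position]
    return if_not_found


# Hypothetical sample data for this sketch
full_names = ["Pam Beasley", "Meredith Palmer", "Kevin Malone"]
emails = ["pam@example.com", "meredith@example.com", "kevin@example.com"]

# Without an asterisk the partial text is treated as the whole value, so no match
no_wildcard = xlookup_wildcard("Beasley", full_names, emails, "Not Found")

# "*" & A4 in the spreadsheet: anything may come before the text
ends_with = xlookup_wildcard("*" + "Beasley", full_names, emails, "Not Found")

# A4 & "*" in the spreadsheet: anything may come after the text
starts_with = xlookup_wildcard("Meredith" + "*", full_names, emails, "Not Found")

print(no_wildcard, ends_with, starts_with)
```

Putting the asterisk on the side where the missing text belongs is the whole trick: in front for a last name, at the end for a first name or a truncated name.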
now this one is Meredith um and so Meredith is at the beginning so we have Meredith
Palmer so we can actually take this and we're going to put this at the end put the
ampersand right here and now it'll work and the exact same thing for Kevin Malo
right here Kevin Malone so it just didn't include uh the ne at the end and so it's
still going to work
05:29:33
if we include that asterisk at the end now I know I said we were looking at search
order but I'm actually going to kind of give you an exact match uh first and then
search order but it's just kind of easier to show it over here so I'm going to do X
look up I'm going to look up this value do a comma here's the range this is our
start date that it's going to be looking for and I want to return the full name
now no value in here has 1/1/2000 but what we can do is we can do comma and
then a comma for
05:30:05
the match mode and do an exact match or next larger and I know this is in the exact
match part but it you know kind of refers to search order a little bit um where it
searches for the next largest value that's what that number one represents
the next larger value so we have 1/1/2000 and if we look right here the next value
above 1/1/2000 is 1/5/2000 and so it should return Angela Martin let's see if
that works and there it is now let's look up the actual search order um so let's do
05:30:39
equals x lookup this is the value that we want to be searching for and we're going
to be looking in this start date and comma and we want to return the name now let's
get over to search mode now the search mode performs a search starting at the first
item so at the very top going down so by default it searches from first to last but
you can reverse that and do search from last to first or you can do a binary search
which relies on the data being sorted in ascending order or sorted in descending order um and
that's with the actual
05:31:14
value and so we won't be able to show this binary search or on ascending or
descending because our values are the same but if we had different values and we
were looking up um using this um next largest we we would be able to show that but
I'm going to show you the search from first to last and last to first so let's put
in by default and this is what it would be search From First to Last what the
default would be so it starts at the very top it goes down and finds the first
5/6/2001 and returns Toby
05:31:46
Flenderson now if we go in here and we hit minus one that is going to search from
last to first so it's going to start at the bottom and go to the top and the first
one that it finds is Michael Scott so that's that first one starting from the
bottom and then the Michael Scott right there so these two the exact match and the
search order can kind of be combined into um this one right here we're using this
one um which is you know exact match or next larger and you can include that in
05:32:15
this binary search in this one as well all right now let's head over to the X
lookup horizontal I think we only have a few left yep XLOOKUP horizontal
then we'll do X lookup with sum and then I'm going to show you the V lookup at the
end so let's go right here let's say equals X lookup the value that we want to be
searching for is February that's what we're looking for hit comma and where do we
want to search to find February we want to search in uh these
05:32:40
calendar months and then we hit another comma and now we're going to be searching
for paper so let's do paper and we'll hit enter and it found February and it returned
paper right here and we can do that for paper printer and manila folders and so
it's going to give us the 310 the 40 and the 118 from February now let's go right
over here to XLOOKUP with SUM um actually it's basically a carbon copy of this uh
let's take this over here real quick and place it right there because
05:33:15
it's the exact same thing except at the end we're going to use I'm going to show
you how to use sum with the X lookup at the same time now um we're going to be
using the formula sum and so we're going to do sum and then within the sum our
first number is going to be an X lookup and then our next value is also going to be
an X lookup so let's do X lookup and now we're going to search for our very first
value our very first lookup value so we're going to go to
05:33:52
I1 and then we're going to search this again and we want whatever value goes
into that so let's close that parenthesis and now we're going to do a colon and
another X lookup and now let's do March so now we're going to search for March
we're going to do our search range where we're searching for that March and we want
the paper as well and let's close that and then we also need to close that
parentheses so now we are basically adding this February and this March so it's
05:34:31
going to be 310 plus 150 it's adding those um two values and it should be uh what
460 so let's see if that is our output and it is so you can do this with a lot of
things not just SUM but you're able to use XLOOKUP within different formulas if
you're searching for a specific value and a specific value um in another um cell
you can add those together using XLOOKUP which is honestly it's pretty great so
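The horizontal lookup and the SUM example can be sketched in Python like this. The February and March figures come from the walkthrough; the January figures and the `xlookup` helper are placeholders added so the sketch is runnable. The only difference from the earlier lookups is that the lookup array is a header row of months and the return array is a row of numbers.

```python
def xlookup(lookup_value, lookup_array, return_array):
    """Exact-match lookup; works the same for a column or a header row."""
    for position, candidate in enumerate(lookup_array):
        if candidate == lookup_value:
            return return_array[position]
    raise LookupError(f"{lookup_value!r} not found")


# Header row of months and one row per product
months = ["January", "February", "March"]
paper = [0, 310, 150]    # January value is a placeholder
printer = [0, 40, 0]     # only February comes from the walkthrough
manila_folders = [0, 118, 0]

# Horizontal lookup: find February in the header, return that product's value
feb_paper = xlookup("February", months, paper)
feb_printer = xlookup("February", months, printer)
feb_folders = xlookup("February", months, manila_folders)

# Using lookups inside another function: add February and March paper sales
feb_plus_mar_paper = sum([
    xlookup("February", months, paper),
    xlookup("March", months, paper),
])

print(feb_paper, feb_printer, feb_folders, feb_plus_mar_paper)
```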
let's go over to VLOOKUP so I wanted to show you this because I wanted to show you
where
05:35:03
it came from and what we used to do um unless you are continuing to use V lookup
and what we can do now so X lookup I just showed you kind of everything um but
super quickly I'm going to show you how vlookup used to work um in a super short
way so that you can understand how it used to be used and how it is used uh how X
lookup is used now so let's go in here and we're going to say equals and we're
going to do a VLOOKUP and so we have a lookup value and so we're going to click
05:35:32
this we're going to hit Comma just like we did before and now we're going to do a
table array and the table array is a little different in that you're searching an
entire area so let's do uh H2 all the way through O10 so that's
what our table array is going to be then we're going to do a comma and now we have
to do a column index number which number um are we going to be um searching for
which um value are we going to be searching for in here and so we want to search
for
05:36:10
eight because this is 1 2 3 4 five 6 7 eight we want to return that email and we're
searching for the name right here in this very first column so we have that comma
and we're going to do eight and then in the range lookup you can do true which is
an approximate match or false which is an exact match and we'll do false I don't
know why it's not Auto auto doing it but there we go and now we will do it and it's
going to return it just as we had it um a lot of people uh I guess not everybody
but some people
05:36:45
didn't like it and the reason why they created XLOOKUP is you had to do those ranges and
if you ever went in here and let's say we um added another column which
happens to data now it gives completely different data so let's say
for whatever reason we added uh address so now we have these people address well
now it's going to give us a different um value it's going to have this end date
because if we go in here now it doesn't um now the eighth is this end date and the
ninth is this
05:37:19
email so if you have a vlookup that you use for um you know a calculation or a
table that you've created or different things in Excel you then have to go through
here and manually change this and so a lot of people didn't like that because if you
needed to change data or you needed to change something or add an additional
column you'd have to go back and fix all of your VLOOKUPs they wouldn't just
automatically uh move with it which is what happens with XLOOKUP and just to prove
this uh let's go back
05:37:48
to the very first one which is the XLOOKUP and right now the email is looking at
O2 through O10 um we're just going to insert right here and that would be our
new column we'll do address and notice that it hasn't changed and why
is that because it auto changed for us to P2 to P10 understanding that it wanted
to stick with when something was inserted here it wanted to stick with the original
data the original array that was selected and so XLOOKUP does that work for you and it
05:38:21
makes it a little bit easier to automate things and create these processes in Excel
without having to go fix it later which you had to do with VLOOKUP so that is it for
today I hope that you know how to use XLOOKUP a little bit better now that you
have watched this uh if you enjoyed this video be sure to like and subscribe below
and I will see you in the next [Music] video what's going on everybody welcome back
to another Excel tutorial today we'll be looking at conditional formatting [Music]
05:39:04
now if you've never heard of conditional formatting before that's okay I had
never heard of it before I became a data analyst and so now that I've been using
Excel a lot of course I use it quite a bit and so I want to show you how to use it
conditional formatting is basically just a way to see patterns and trends in data
and that's a super simple way of putting it um but it's very easy to use and so
hopefully I can show you how to use it uh really easily in a lot of the things that
I use the most and some
05:39:29
of the things that I use it for so that you can also know how to use conditional
formatting now before we jump into the tutorial I want to give a huge shout out to
the sponsor of this Excel series and that is udemy you guys know by now that I
absolutely love udemy I've been using them for years and I've taken literally
hundreds of courses on udemy and I've learned so so much especially when I was
first starting out as a data analyst uh I learned a lot through their Excel courses
on udemy and so I have actually
05:39:53
put the ones that I really like and I have taken and enjoyed and think you would as
well in the description so if you want to take those be sure to check those out again
huge shout out to Udemy for sponsoring the series now without further ado let's
jump onto my screen and get started with the tutorial all right so let's jump right
into it on this Home tab right here if we go all the way over to the right there is
conditional formatting and the description that it gives us is easily spot Trends
and patterns in your data
05:40:17
using bars colors and icons to visually highlight important values and that is
exactly how I would have defined it a really good job Microsoft exactly how I would
have done it so what you'll see right away is there's nothing too complex so we
have some highlight cell rules um we have some top bottom rules data bars color
scales icon sets and then at the bottom we can create a rule we can clear the rule
and we can manage our rule so if you create a rule then you can manage it so we're
going to
05:40:45
start with these icon sets and I'm going to show you how to use those and we'll
work our way to the top and then I'll show you how to create some rules yourself
and how that all works so let's start off with the icon sets I'm going to go over
here to sales um and for this data we kind of have this um you know Trend or or
pattern that you can kind of see over time so over the months um so if we go right
here and let's use that conditional formatting let's use that icon sets and right here
we can use these
05:41:18
directional so you know we have this kind of Time series each month that shows us
how much paper they're selling and if we do this right here it's going to show us
if it's kind of average or if it's below average or if it's above average or if
it's going up so at a really quick glance you can kind of see the pattern of this
data set it's kind of going mostly yellow and red there's only two months where
it's going up significantly now we don't have to only
05:41:45
do that for one row or one column you can apply to all of them but as you can see
all of these are red now why are they all red it's because they're using numbers
for everything so they're comparing these 24s these 50s and 65s against these 450s
and 750s and so they're all going to be red but if we do it individually if we do
it each row if we take it just like this and then we go to Icon sets and do it it's
going to be much more representative of the actual printers not of all the numbers
as a
05:42:19
whole and you can do other things uh the arrows are ones that you'll probably see
the most often that's the one I've used if I ever do use them um but you can you
know do ones like this where they have you know kind of a trend upward or a trend
downward um and so there's just several more arrows this one only gives you three
as you can see this one gives you five um and you can do you know colors or shapes
or or different indicators and all these different things um and honestly it's kind
of
05:42:47
whatever you want to use whatever makes sense for your data but you know I've
really only ever seen like these colors being used I've never really seen these
flags or anything like that but again it just depends on what industry you work in
you might you might see that let's go right over here to the demographics um and
let's look at our color scales now color scales and data bars are going to be
probably the most obvious things in
here um if you go
05:43:14
right here and and you look at this color scale if it's high if it's among the top
ones it's green the lowest it's red and you can change that um to really any colors
you want any colors that they offer you um and it it does exactly what it does it's
a color scale a gradient of the colors from high to low or low to high and so any
color that you do you'll be able to kind of see um you know what's good and what's
not good that really is um color scales in a nutshell
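To make the gradient idea concrete, here is a minimal Python sketch of a two-color scale. It is an assumption about the general technique, not Excel's exact algorithm (the built-in scales may also use a midpoint color), and the salary values and the pure red and green endpoints are invented. Each value is placed between the column's minimum and maximum, and its color is blended accordingly.

```python
def color_scale(values, low_rgb=(255, 0, 0), high_rgb=(0, 255, 0)):
    """Blend each value's color linearly between low_rgb (min) and high_rgb (max)."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    colors = []
    for value in values:
        # Position of the value between the minimum (0.0) and maximum (1.0)
        fraction = 0.0 if span == 0 else (value - lowest) / span
        colors.append(tuple(
            round(low + fraction * (high - low))
            for low, high in zip(low_rgb, high_rgb)
        ))
    return colors


# Hypothetical salary column
salaries = [36000, 45000, 65000]
for salary, rgb in zip(salaries, color_scale(salaries)):
    print(salary, rgb)
```

The lowest value comes out fully red and the highest fully green, with everything else somewhere in between.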
05:43:48
data bars are again super super straightforward it's going to be either a gradient
fill or a solid fill so let's look at the gradient fill if we do a blue gradient
fill I'll actually let's get rid of our um let's go over here let's go to clear
rules from selected cells we haven't looked at that yet but that's how you clear it
let's go to data bars and we'll use this blue gradient so with this blue gradient
you know this one is or sorry this one is the highest
05:44:16
one so it's going to be completely filled and this one is 36,000 almost half of
this or pretty close and so it's almost half um this one again you know it's not
used very often you don't see these a lot to be honest you just don't um but if
you do see it that's how you use it that's how it can be done again pretty easy uh
as I just showed a second ago if you want to clear the rules you can clear from the
selected cells that's what we're doing so I have column G selected and I'm
05:44:46
going to I'm going to clear that if you want to clear the rules for the entire
sheet you can do that as well so it would affect every single column and row we'll
just do this for now so now let's go look at the top bottom rules so so this is the
top 10 items top 10% bottom 10 items bottom 10% above average and below average and
they're going to do exactly what you think they are going to do if you select above
average it is going to select or highlight the cells that are above the average in
column G
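The above average rule is simple enough to sketch in a few lines of Python. The salary figures below are assumed sample values for illustration. The rule computes the average of the selected cells and flags every cell that is strictly greater than it; below average is the same comparison flipped.

```python
from statistics import mean

# Assumed sample salaries for illustration
salaries = {
    "Jim Halpert": 45000,
    "Pam Beasley": 36000,
    "Dwight Schrute": 63000,
    "Angela Martin": 47000,
    "Toby Flenderson": 50000,
    "Michael Scott": 65000,
    "Meredith Palmer": 41000,
    "Stanley Hudson": 48000,
    "Kevin Malone": 42000,
}

average = mean(salaries.values())

# Cells the rule would highlight in each case
above_average = [name for name, pay in salaries.items() if pay > average]
below_average = [name for name, pay in salaries.items() if pay < average]

print(round(average))
print(above_average)
```

A few high salaries pull the average up, so most of the list lands in the below average group.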
05:45:16
so let's look at the salaries that are above average all right and so uh the ones
that are at the very top are Michael Scott Toby Flenderson and Dwight Schrute uh no
shock there um I believe the average is somewhere around like 48,500 or something
so I think this one is just below it and so all these other ones are below
average and that's just because you know Michael Scott and Dwight Schrute and Toby
are kind of bringing up that average quite a bit so everyone else is going to fall
beneath
05:45:47
that so at a super quick glance you're able to just highlight the cells and you're
able to see who is above average and you know you can do this in a lot of different
ways in Excel but this is just a really simple fast way to do that um let's get rid
of that real quick and let's go back up here and now we can oops let's go to top
bottom rules and now we can see the below average and it's going to highlight all
the other ones and so it works exactly how you think it is going to work and this
is
05:46:16
the default way that it highlights these cells so it highlights them this kind of
um see-through red and then it highlights the actual text or the um characters
in there red as well now I'm not going to go through and show you every single one
of these top bottom rules I think they're pretty self-explanatory I just kind of
wanted to show you what happens when you do use one of them it's going to highlight
that cell so let's go up here to the Highlight cells rules and honestly these
05:46:43
are the ones that I use by far the most uh all these other ones combined I do not
use more than this highlight cells rules um and the one in here that I use more
than any other conditional formatting rule is this duplicate values so I'll start
with that really quick and I'll kind of show you a few few of these other ones but
this duplicate values to me is one of the most useful ones um and so let's kind of
show you how that works if we go to the start date you can see that we have a
duplicate value right
05:47:13
here and if we go over here to conditional formatting highlight cells rules and
duplicate values it is going to highlight um the uh duplicate and that says
duplicate right here now we can go through here and click on unique um and then it
would highlight all the ones that are not duplicates um so you can use it you know
kind of in a similar inverse way uh it's just different but I use the
duplicate almost always um another thing that you can do is go over here and you
can change the
05:47:44
color um or you can even do a custom um which I never do that it's not um something
I spend a lot of time doing I typically just stick with this one so you can do that
and it's going to highlight um you know something that has a duplicate value in
there now why do I use this so much well I work with a lot of different types of
data sets but one thing that you'll find in almost all of them is they have some
type of ID and they're going to have some type of um personal information whether
that's a
05:48:17
social security number or an address or um you know or a cell phone number or
something like that there is going to be data that is going to to identify that
person now I work a lot with pharmaceutical data a lot with Pharmacy data um as
well as Healthcare data so like names Social Security numbers addresses phone
numbers all those things all that customer or or client information and oftentimes
when I get a new data set and I have it in Excel or I convert it to excel I will
start using these duplicates to try to
05:48:51
find issues with the data and I find them all the time either there's an employee
ID or some type of customer ID or client ID that has a duplicate in there that
should not be in there or there's multiple Social Security numbers or there's an
issue in some other way and I'm able to find those things and spot those patterns
using this duplicates and I promise you I use this one almost every single time I
open a new data set or I work with a new client working with their data um and
05:49:17
so I wanted to show you this one I wanted to really press upon you that this one is
a really really really good one to know and learn how to use it's not complicated
it's not hard it just shows you you know you know if there's a duplicate value but
I wanted you to know how I use it and how often I use it so that you can you know
pick that up and put that in your tool kit in your back pocket so that you can use
that later on if you have uh if you have a similar need or if you're trying
05:49:42
to do something similar to what I was just talking about so that is how duplicates
work again super great it's obviously not super useful when you're only using um 10
rows but when you have you know 50,000 100,000 and there should be zero duplicates
in there and you highlight it and then uh you come right here use the filter and
we're going to filter and we're going to sort by the color and it allows you to
sort by the color and you have duplicates in there then that's a problem and you
identified a problem
05:50:14
super quickly uh and you know some of those things they slip by because nobody
checks it and so that's something that I I often check and if you go here and you
sort by color and there isn't an option to do um this pink red color then
that means there aren't any duplicates and that's a really good thing most of the time
that's a really good thing so let's go ahead and we're going to clear that as well
as get rid of our conditional formatting rules now another one that I use a lot
05:50:43
is this one right here which is the text that contains honestly this one comes in
handy a lot especially when you're looking for like a specific keyword in my uh
case a lot of times I was using this when I was going through drug names I am not a
doctor I do not pretend to be a doctor and so when I was looking for lorazepam or
something like that um I would just search for like loraz or something and not
Lorax but loraz you know I would just search for it and then all the ones that
contain that
05:51:17
would pop up I can bring them to the top and I can see them and to me that's super
super useful and I would do that all the time and so in this case we're looking at
emails and let's say we only wanted to pull all the ones that are Gmail and so
now we can go through and we can you know click okay and that's going to pop up or
we want all the ones that have Dunder Mifflin and if we click on that
all the ones that are Dunder Mifflin come up or have Dunder Mifflin in it and
again we
05:51:46
can sort by right here and we can bring all
those to the top and so super super useful um and another use for it that you may
not think of is something like if there's some incorrect data in
there this happens often with phone numbers addresses um start dates or dates
in general date formats where you can go in here and you can say text that contains
and if you know you put in a dash and it has it in there then you know that
that is
05:52:23
wrong now that is really all I wanted to show you in the Highlight cells rules uh
the duplicate values and the text contains are by far the ones that I use the most
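The two highlight cells rules singled out here, duplicate values and text that contains, can be sketched in Python as simple checks that flag each cell. This is an illustration only: the helper names and sample values are made up, and the text check assumes case should be ignored.

```python
from collections import Counter


def flag_duplicates(values):
    """True for every cell whose value appears more than once in the column."""
    counts = Counter(values)
    return [counts[value] > 1 for value in values]


def flag_text_contains(values, fragment):
    """True for every cell whose text contains the fragment, ignoring case."""
    return [fragment.lower() in str(value).lower() for value in values]


# Hypothetical ID column where every value is supposed to be unique
employee_ids = [1001, 1002, 1003, 1002, 1004]
duplicate_flags = flag_duplicates(employee_ids)

# Hypothetical email column, flagged by keyword
emails = ["jim@gmail.com", "pam@dundermifflin.example", "toby@gmail.com"]
gmail_flags = flag_text_contains(emails, "gmail")

# Any True in duplicate_flags means the data set has a problem worth checking
print(any(duplicate_flags), duplicate_flags)
print(gmail_flags)
```

On a large data set the useful question is usually just whether any flag is set at all, which is what sorting by the highlight color answers in the spreadsheet.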
all the other ones I have used um these ones not so much but in these highlight
cells rules I use you know these two all the time um sometimes I use this between I
don't really use these other ones as much although I have used them and so if you got
nothing else from this video I just wanted you to know that these two are super
useful and
05:52:51
if you haven't used them before to maybe try them out and see if you can apply them
to your own data sets now we've looked at all of these preset ones in conditional
formatting but you can also do a new rule and so if we click on new rule right here
and we go down to use a formula to determine which cells to format we can add our
own formula in here that will then highlight exactly what we want and so if there
isn't a preset rule that you like and it doesn't have the option that you want you
can do
05:53:20
almost any formula that you want in our formulas video that we did a few weeks ago
and you can put it in here and then you can format uh what you want the cell to
look like if it meets that criteria so let's take this right over here um and
before we start this formula I just want you to note that you know I have H11
highlighted that's going to come into play in just a little bit but I want you to
be aware that H11 is the cell that we have highlighted so what we're going to do is
we are going to
05:53:48
create our formula now if you've never created a formula I highly recommend uh
watching my formulas tutorial because that is going to show you how to do this um
but all we're going to do is we're going to do equals that's how you start
a formula and we're going to give it this range
right here and so it's going to take everything from G2 to G10 now these dollar
signs are super important if you don't know how to use them or you don't
05:54:14
know what they do um you're going to mess up this formula a lot uh and so what this
dollar sign basically does is hardcode the reference in there so it is only
going to look at G2 through G10 because of that
colon and this can come into play because if you have something selected like
H11 it's going to mess it up because now if you have H11 selected like we do you'll
see this in a second it's not going to be applied to this range and again I'll show
you that in
05:54:46
just a minute but we don't want this hardcoded in there okay but we do have to
select the proper range in a second um so we're going to get rid of this we're
going to get rid of the dollar signs because we want it to be pretty fluid and able to
be applied basically anywhere we want let's go into this formula and if
it meets our criteria let's give it um let's give it a border and we'll give it um
we'll give it some color we're going to say if this is greater than
05:55:17
50,000 so let's hit okay and nothing happened so let's go back and see why so if we
go to our manage rules you can see that it says G2 to G10 is greater than
50,000 but it is only being applied to this H11 cell which really makes no sense
so if we had wanted to get it done the first time we needed to have basically
selected that G2 to G10 right away um but we can do that now so let's get rid of
this and we're going to say G2 to G10 and that is hardcoded in there that should
be fine still um but let's
05:55:55
see what it does and so now every single thing is highlighted and why is that
uh that's because when we changed it it also changed the format of it because we
changed the cell that we were looking at so we need to come back here and that's
why again you want to do this the right way the first time we're going to come back
here we're going to give it this range and we're going to get rid of these dollar
signs and now we're going to hit okay and so now it's being applied to G2 to
G10
05:56:28
and G2 to G10 and we'll keep it like that and we'll apply it and now it works
properly so now everything that's above 50,000 is being highlighted again if that
was confusing it is confusing it genuinely is and so if you wanted to do this
right the first time without having to make a bunch of changes you'd want to
highlight these before you start and then you want to go in and create the rule
we'll do this really quick just to kind of show you what I'm talking about we'll
say equals we'll give it
05:56:56
this range get rid of these real quick because again I don't want this hardcoded in
there it will ruin our formula and then we'll say greater than 30 um and we'll give
this nice green uh and so now if they're over the age of 30 it will be highlighted
and we didn't have to go back and change anything we didn't have to go back and fix
anything like we did in the first one um that was all for demonstration purposes
but again you need to really be aware of that that is something that I think
almost
05:57:29
everybody's going to mess up at some point if you don't already know about it then
you definitely are going to make that mistake now if we come over here in this area
uh we go to our manage rules and not just the current selection but this whole
worksheet then you can see that we have these two formulas now you can go in and
edit any of these by double clicking or clicking on it and then hitting edit rule
you can also delete these rules or duplicate these rules um I just wanted to show
you what
05:57:53
you are able to do with them but if we uh go ahead and we get rid of this um so
let's say we delete that rule and we hit apply uh you know the rule is going to go
away that's that I mean it's as simple as that so that is how you can create your
own rule I want to be again very specific in the fact that that is a confusing
piece and if you mess that up you're going to be you know fixing a bunch of
different stuff and not understanding why your rule is not working properly it's
just because it's
05:58:23
confusing those dollar signs are really important to watch out for and that is
all there is to it with conditional formatting again conditional formatting is
um you know it's not anything super confusing we've looked at more complicated
things but it's a really really useful tool to use to look at these patterns and
Trends super quickly and to find um these outliers or these specific values that
you're looking for very quickly and if you're looking at just thousands and tens of
05:58:50
thousands or hundreds of thousands of rows this is one of the fastest ways to find
these things without having to kind of wait and filter and use these
filters right here because again this can just take forever and so if you
haven't or if you've never worked with a ton of data and tried to use this before
it can take honestly like 10 minutes for something simple that you could do with
conditional formatting in like 10 seconds so definitely something to mess with and
use when you are working with your own
05:59:17
data sets uh I hope this was helpful I mean honestly I use this all the time so you
know I hope that somebody out there can use this for their own work that
they're currently doing thank you guys so much for watching I really appreciate it
again huge shout out to Udemy for sponsoring this Excel series if you like this
video be sure to like and subscribe below I'll see you in the next video
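As a recap of the two preset rules called out in this video as the most used, Duplicate Values and Text that Contains, here is a small sketch of the same two checks in Python. The phone numbers are made up, and the assumption is that the clean format has no dashes, so any value containing a dash gets flagged.

```python
from collections import Counter

# Hypothetical phone column; the values and formats are invented for illustration.
phones = ["5551234567", "555-987-6543", "5551234567", "5550001111"]

# Like the Duplicate Values rule: flag every value that appears more than once.
counts = Counter(phones)
duplicate_flags = [counts[phone] > 1 for phone in phones]

# Like Text that Contains with "-": flag values that contain a dash.
dash_flags = ["-" in phone for phone in phones]

for phone, is_dup, has_dash in zip(phones, duplicate_flags, dash_flags):
    print(phone, "duplicate" if is_dup else "", "contains dash" if has_dash else "")
```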
what's going on everybody welcome back to another Excel tutorial today we will be
looking at
05:59:55
charts now if you have data in Excel and you want to visually show that
with bars or graphs or anything like that you can do that really simply and I'm
going to show you how to do that today and a lot of people are a little bit
intimidated because they think it's a little bit complicated but I promise you by
the end of this video you will know how to do it like a pro it's not that difficult
it's just you need to know where to look where to click and how to actually filter
through things to make sure that
06:00:24
you're visually showing the things that you want to show but before we actually
jump into the tutorial I want to give a huge shout out to the sponsor of this
Excel series and that is Udemy you may not know this but I probably get at least
15 to 50 companies every single month reaching out to me wanting to sponsor the
channel and promote their product and I turn down almost every single one because I
either don't know their product or I don't believe in their product and so I'm not
going to
06:00:46
you know go and promote that on my channel but Udemy is one that I have
consistently promoted over the past year and that's because I truly believe in
their product I've been taking courses off their platform for years and I've
honestly learned so much and I cannot recommend them enough so if you want to take
a full-fledged Excel course I have my recommendations in the description if you
want to check those out thank you again to Udemy for sponsoring this Excel series
so without further Ado let's jump
06:01:10
onto my screen and get started with the tutorial all right so let's jump right into
it right here we have the Dunder Mifflin sales report and over here we have all the
products that they were selling along with the months that they were sold in and so
in January they sold 450 reams of paper down here we have the total items per
month and so in January they sold 898 units of products or things that they
sold at the very end we have the year end total so this is the total amount of
paper
06:01:38
that they sold throughout the year now we're going to use this data right here for
all of our charts now you may not have data exactly like this it can come in lots
of different flavors but you're going to get the basic gist of how to use charts
how to edit it how to customize it to fit what you need and then we're going to
kind of put it right over here and kind of create its own sheet where we can kind
of visualize all the things that we want to show so let's jump right back over here
06:02:07
into sales and first thing we need to do is kind of highlight the data that we're
going to be working with now I'm going to start with everything but um you know
I'll show you along the way we don't actually want everything but we can filter
that stuff out as we go so let's go right here and we're going to insert and we're
going to go over to charts now this is the chart section there's lots of different
types of charts um but the first thing that we're going to be
06:02:32
looking at is right here this is a 2D column or kind of like a bar chart and we're
just going to click right here and we're going to pull this down so now that we
have this down here there are a few things that I want to show you before we
actually really get into it I kind of want to show you the options that you have so
if you go up here we have different uh chart Styles and so if I hover over them you
can see that each one kind of looks a little bit different and it really doesn't
matter
06:03:03
uh it doesn't really change the data in any way just how you visualize it and so if
that is important if that is something that you um you want to stick with a certain
theme or a certain look then go for that uh the other thing that's really nice to
have over here is this switch row and column so right down here you can see this
purple and you can see this red those are our rows and columns and we can switch
that right here so if we go like this now instead of the months being right here
the
06:03:32
months are the colors and the actual product is right here let's click it again and
it'll go back and so now we have this kind of time series now we have January
through the year-end total now this one is one that I think is super helpful you
know it you can do it down here as well if you go to this filter um but both of
these are super helpful because you sometimes just want to select all the data and
then kind of get in there and mess with it something that we want to get rid
of is
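The Switch Row/Column button described above is essentially a transpose of the chart's data. Here is a minimal sketch of that idea, assuming the table is held as a nested dictionary. Only the 450 reams of paper in January comes from the sales report; the other numbers and the function name are made up for illustration.

```python
# Hypothetical slice of the sales table: product -> units sold per month.
by_product = {
    "Paper": {"Jan": 450, "Feb": 420},
    "Pens": {"Jan": 120, "Feb": 130},
}


def switch_row_column(table):
    """Swap the outer and inner keys, like transposing the chart's data."""
    switched = {}
    for outer, inner in table.items():
        for key, value in inner.items():
            switched.setdefault(key, {})[outer] = value
    return switched


by_month = switch_row_column(by_product)
print(by_month)
```

Switching twice gets you back to the original layout, which matches clicking the button a second time.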
06:04:00
this total items per month so we want to remove that and then we also want to
remove this year-end total because both of those are kind of the end result
they're not the actual data per month or per product so we're going to get rid
of those and we're going to apply that and as you can see just right off the bat
our chart has changed dramatically and that's because we aren't including these
large numbers that were kind of throwing off the visualization for
us so this one right
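To see why removing the totals changes the chart so much, here is a small sketch that filters the summary row and summary column out before plotting and compares the largest value on the axis before and after. The row and column labels and all of the numbers except the 450 are assumptions made up for this example.

```python
# Hypothetical sales rows, including the summary row and summary column.
rows = [
    {"Product": "Paper", "Jan": 450, "Feb": 420, "Year End Total": 870},
    {"Product": "Pens", "Jan": 120, "Feb": 130, "Year End Total": 250},
    {"Product": "Total Items Per Month", "Jan": 570, "Feb": 550, "Year End Total": 1120},
]

SUMMARY_ROW = "Total Items Per Month"   # assumed label for the totals row
SUMMARY_COLUMN = "Year End Total"       # assumed label for the totals column

# Keep only the per-product, per-month values for the chart.
chart_rows = [
    {key: value for key, value in row.items() if key != SUMMARY_COLUMN}
    for row in rows
    if row["Product"] != SUMMARY_ROW
]


def largest_value(table):
    """Largest numeric value in the table, ignoring the Product label."""
    return max(v for row in table for k, v in row.items() if k != "Product")


largest_before = largest_value(rows)
largest_after = largest_value(chart_rows)
print(largest_before, largest_after)
```

With the totals left in, the axis has to stretch to fit the biggest total, which squashes the real monthly values.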
06:04:30
here as is is already pretty good um what we can do right here is we can change
this and we're just going to say products sold per month now what we can do if we
want to move it to another sheet is we can actually move the chart
and we can select where we want to move it we can move it to the chart sheet and we can
do that or something that I do almost 99% of the time is I just copy and I come
over here and I'm going to paste it and so now we have this um this chart right
over here as well as back here and so I
06:05:11
typically tend to do that because now we can still go over here and change this one
as much as we want so if we want to go in here we can alter this one and it won't
affect the other one so we just have basically two copies so we're going to keep
this one right here this is going to be our first visualization and as I
said it's fairly straightforward if you've ever done any types of charts or
graphs before um right here it's January February March April May and if you
06:05:36
hover over these you can see that that's the paper and if we just glance you
know the paper is their biggest product by far and so that blue um which is their
paper is going to be the biggest every single month so that makes perfect sense now
what if we want to change up the
kind of visualization that it offers us well we have a lot of different options
let's go right over here to change chart type now this is going to offer you just
about everything
06:06:08
you could possibly imagine or want and even things that you absolutely would never
ever want ever um and so I'm going to show you some of the good ones and I'm going
to show you some just absolutely insane ones that Excel came up with where
I could not imagine a scenario in which they are ever used but within these
columns you can do what are called clustered columns or these stacked columns which would
look just like this those are often used as well and then we have ones that
06:06:40
they're just not used often let's take a look at this one right here I
mean it's tough it's tough to look at um but let's let's put it right here this is
basically the same thing that we just had except visualized in a different um we'll
call it more unique way uh and let's for the sake of it let's put it over here um
these two things show the same information they show the same data just one is
shown well and one is not shown well um I'm not a fan of
06:07:11
these 3D types of visualizations I just don't like them but maybe you do and
you want to use that that's fantastic let's go back um something else that you'll
probably use a lot are things like these um these line graphs okay so these are
line graphs and there are different types so there are these stacked 100% stacked
and lines with markers different flavors for this type of line graph and so
you can go in here and take a look again um not my favorite but they
06:07:47
have it as an option if you so choose to do this but I'm kind of a
simple guy but I'm going to go in here and it's pretty cluttered I want to kind
of take the ones that have the highest sales or the highest total amount sold so
that would be paper manila folders and three ring binders so let's go in here we
want to keep paper we want to keep uh manila folders and we want to keep three ring
binders and let's apply that and so now it's a lot cleaner and we're just going
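Picking the top sellers by eye works on a small table, but the same selection can be sketched in code. The products match the ones kept in the chart above (paper, manila folders, three ring binders), but the totals here are invented, so treat this as a sketch of the idea rather than the actual report.

```python
# Hypothetical year-end totals per product; the numbers are made up.
year_end_totals = {
    "Paper": 5400,
    "Manila Folders": 1900,
    "Three Ring Binders": 1500,
    "Pens": 900,
    "Staples": 700,
}

# Sort product names by their total, largest first, and keep the first three.
top_three = sorted(year_end_totals, key=year_end_totals.get, reverse=True)[:3]
print(top_three)
```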
06:08:25
to copy this and we're going to put it over here and I'm just putting these all
over here for you because we'll look at this at the end and just kind of see
different options and ways to do things as we have gone through this tutorial
so let's go back here now something else that we haven't looked at is the actual
colors and color schemes that you can do so let's go right here to these chart
Styles and we can go to color now color is um something that probably is quite
overlooked um in
06:08:55
actual charts and graphs there are some terrible colors like this or this where they're
really close together especially when you have a lot of them um for example let's
just pretend we put all of them back really quickly it is near impossible to
distinguish these colors we wouldn't want that let's go back to this
color you know when you have it in some of these colors at least it
distinguishes them so you can kind of see what you're working with
but when you have
06:09:28
it in these monochromatic options sometimes they're just impossible to distinguish
so be sure to choose the right colors that you're using so that if somebody who's
never seen this data before looks at it they can easily distinguish uh the product
and the month that you are looking at but let's go just back up here we'll choose
this default option um well let's choose this one right here this one's nice
although there's lots of yellows and oranges let's see this one this one's not bad
06:09:57
greens blues uh and like yellows so that's nice um other things that we want to
look at and there are these chart elements right here other things that we can add
are things like data labels um and right here it's super messy um but if we went
back and we got rid of some of these things like the printer Staples highlighters
pens and total we apply that it's a little bit easier to distinguish um and that's
you know something that you may be interested in doing you can also add this data
table at the bottom which
06:10:34
is the actual columns and rows that you have for this visualization right here now
let's expand this quite a bit I'm going to make this extremely large if you have
something like this it actually can be pretty nice um you know maybe we get rid of
these data labels but it can be easy because you're putting it all in one place you
can also make this two separate visualizations so you can have one visualization
just like this and right underneath it you can have the actual rows and columns but
this option
06:11:01
allows you to put it all in one so let's put this back down because that is way too
big and uh wait let's expand it a little bit now if you notice right here we have
our Legend up top um it is possible to actually change that you can go right here
and you can move this um kind of wherever you want um but it's not exactly easy to
put based off how we have it right here if we go into this chart elements we go
down to Legend and we hit this little arrow right here we can select it on the
right the top the
06:11:36
left and the bottom or we can just go to more options uh which allows us to push it
anywhere but let's say I want to do it just like this I'm going to put it on the
right and I actually want to bring it down right here and you know that's just an
option if you want to kind of customize it a little further makes it a little cleaner
uh you can do that with almost any of these things so if you click on this oops if
you click on this you can move this anywhere as well so if you want to move this
over here on top
06:12:03
of it you can and make it look terrible or you can move it uh right back over here
you know this is something that you can move around uh you just kind of want to
make sure you're doing it the right way so let's get this back where it was there we
go now before we go any further let's copy that and put it right over here with our
other charts and graphs and if you see over here on this side we have this
format chart area notice I haven't shown you this at all yet that is because I
genuinely just don't
06:12:32
use this almost at all there is some good stuff in here and I'm sure that
you know if you were someone who really wants to go in there and super customize it
you can do that um but I honestly I just never get in here and I never you know
change the glow or the shadows just not something I use and some of these are
only for the 3D formatting which I never use and so I'm not going to show
you and walk through these things again I really don't use it and so if you want
to go in there and
06:13:02
mess with it uh you know by all means go for it it's just not something that I want
to take the time to show you and with that being said let's go back over to this
chart sheet that we have and it was super super easy to get these charts and
graphs and whatnot there are lots of different options again if we go back
here and we go up here to chart design and go to the change chart type and again
there are a ton of different options like a pie chart um like this it's it's you
know
06:13:34
you can try to figure this out and use these um but you know I wanted to show you
the ones that you'll probably use the most which are these columns and line charts
and they all kind of are similar in their own way this bar chart is basically you
know this column chart just on its side and so they all have their different flavor
they all have their different way of visualizing the data but in essence
they're using the data in a similar way to visualize it and represent the data
itself especially things like these box
06:14:03
and whisker plots or these waterfall charts uh you know these are things that
usually require specific data to kind of use and so I'm just using data that
you'll probably see the most of like this sales data so I hope that this has
given you a pretty good quick understanding of how to use these how to
customize them how to copy and paste them over to a different sheet to create
some type of little chart and visualization sheet that you can use to show your
employers and visualize
06:14:35
the data that you are working with thank you guys so much for watching I really
appreciate it again huge shout out to Udemy for sponsoring this Excel series if
you like this video be sure to like and subscribe below and I'll see you in the
next video what's going on everybody welcome back to the Excel tutorial
series today we'll be looking at how to clean data in Excel now knowing how
to clean data in Excel is actually extremely useful and there are a ton of
techniques to do this
06:15:16
I'm going to be showing you the ones that I probably use the most I feel like are
the most helpful to kind of do the bulk or the majority of the data cleaning that
you're going to do in Excel like I said there's so many different ways and very
specific things that you can do but I'm going to highlight some of the bigger ones
that I find the most useful and some of you may be thinking well I'll just do my
data cleaning in SQL or python or when I get it ready to put it in Tableau um but
06:15:39
honestly a lot of the data cleaning at least a lot of the big stuff I tend to do in
Excel IF the data set is small enough to fit in Excel and so I think it's actually
really really useful to know how to do this because you'll most likely be doing it
more than you think now before we jump into the tutorial I want to give a shout out
to the sponsor of this video and it is a brand new sponsor it is Unlocked by Z by HP
Unlocked is a movie that's actually broken up into four parts and each of them has
a
06:16:04
unique data science challenge associated with it now I'm going to read this next
part because it's extremely interesting each challenge represents a different topic
so there's data visualization text analysis audio signal processing and computer
vision and you can submit your answers and your work on their website for a chance
to win one of 10 ZBook Studio laptops or a free trip to the Kaggle World
Championships so I'll leave a link in the description where you can go watch the
movie and then do the
06:16:27
challenges and then submit your answers for a chance to win you should also go
check out their hackathon where you can do these projects with other people just
like you who are trying to figure out these answers and submit them to win as well
so go check that out thank you again to the sponsor of this video unlocked by Z by
HP now without further Ado let's jump onto my screen and get started with the
tutorial all right so let's jump right into it I have this US president data set I
got the base data
06:16:50
set from Kaggle but I added some of my own data and then I messed some stuff up
as well just to kind of demonstrate some of these things that we're going to be
looking at today this is not a full project so we're not actually going to be
using this to create any visualizations or anything like that so you know all this
is just for demonstration purposes but we will be doing a full project in about two
or three videos uh in this Excel Series where we're going to be doing from start
06:17:17
to finish with a real data set so you know if that's something that you're
wanting then we will absolutely be doing that now something that you may be
wondering is how do you actually identify what you need to clean in the data
how do you know what to look for well some of the obvious things are things like
formatting and standardization so things like you know this James Monroe is in all
caps that happens all the time within real data um and and so you know you want to
standardize that or this all lowercase
06:17:43
you want to standardize that you want that all to be the same there's also things
like right here where we have this Whig and this Whig with a bunch of random stuff
after it this happens all the time where it's not completely standardized and
you may even notice there are some spelling errors in here and
we'll kind of look through that in a little bit and then you know there are things
like additional spaces where there shouldn't be spaces there are things like
06:18:11
currencies that you need to be aware of if you are going to
be importing this into a SQL database things like currency symbols can be just a
problem or be really unnecessary and may actually cause more issues in the long
run so you may just want to take that to the base numeric value and then dates are
always an issue always always always um so always look at your dates make sure
they're formatted correctly make sure they're all the same these are the
types of
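Two of the checks mentioned above, taking a currency down to its base value and making sure dates all share one format, can be sketched like this. The function names, the dollar format, and the month/day/year date format are all assumptions for illustration, since the transcript does not say which formats the data set uses.

```python
from datetime import datetime


def currency_to_number(text):
    """Strip a dollar sign and thousands separators, e.g. '$25,000' -> 25000.0."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)


def matches_date_format(text, fmt="%m/%d/%Y"):
    """True if the text parses with the one date format we expect everywhere."""
    try:
        datetime.strptime(text.strip(), fmt)
        return True
    except ValueError:
        return False


print(currency_to_number("$25,000"))
print(matches_date_format("04/30/1789"))
print(matches_date_format("April 30, 1789"))
```

Anything that fails the date check is a row to look at by hand before it goes into a database.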
06:18:40
things that right when I glance at this data set these are things that I'm looking
for um one other thing that is actually the first thing that we're going to start
out with is you want to make sure that your data is not duplicated because if your
data has duplicate rows in it and they're not supposed to be there
although there are some specific use cases where duplicated data is okay you
want to get rid of them and it's very easy to do in Excel the first thing
06:19:08
we're going to do we're going to go uh to this data tab we're going to go right
over here and we're going to see if there are any duplicates in our data so
we're just going to go up to remove duplicates it's going to automatically choose
all of your columns to check against so from column A all the way
through column I it's going to see if the exact same data is in all these rows and if it is
it's going to get rid of it um and so we're going to click okay and
06:19:32
it did find one duplicate and I'll show you that one real quick um because you know
it was right here so Barack Obama was here twice and then I'm going to
hit Ctrl+Z to go back I'm going to hit Ctrl+Y to go forward and it removed
that uh that row completely now in this example you may be able to spot that with
your eye but in a real data set where you have 10,000 100,000 rows there's
absolutely no way you're going to see that or very very unlikely that you are going
to see that there's
06:20:01
duplicated data in there so just running a quick removal of
duplicates is really important to make sure that you have gotten rid of
those things so that's one of the first things that I do um we're going to go into
a lot of these different uh columns and I'm going to kind of show you different
techniques or things that I do when I look at actual data so I'm going to come
right over here I'm going to insert and this is what I actually do I I usually
create a separate column
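The Remove Duplicates step above compares every selected column and drops rows that are exact repeats. Here is a rough sketch of that logic, assuming each row is a tuple and that the first copy of a repeated row is the one kept. The sample rows are a tiny stand-in for the real data set, and the function name is hypothetical.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each fully identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # every column takes part in the comparison
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Small illustrative sample with one repeated row.
presidents = [
    ("George W. Bush", "Republican"),
    ("Barack Obama", "Democratic"),
    ("Barack Obama", "Democratic"),
]

unique_rows = remove_duplicates(presidents)
removed = len(presidents) - len(unique_rows)
print(unique_rows, removed)
```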
06:20:30
especially when I'm working with this because I don't want to change this one um I
don't want to go in here and you know say equals UPPER equals PROPER etc there are
a few different ways that you can change names or at least a few main ones
and all of them are completely okay so for example I'm
going to type equals UPPER and I'm going to go like this and close my
parentheses so I selected this cell I closed my parentheses and hit enter
06:20:58
there it is and in the bottom right I'm going to double click this
and it's going to apply to all of them it is completely okay to have your data like
this if you want it to be like that um if you want it to be all lower you can do
that if you want it to be in proper case you can do that there
are different uses for all of them and honestly as long as it's all the same
typically it's okay but if um you know for example if you're selling
06:21:25
this to like a third party company or something like that they may have um what
they want for their ingestion process when they take your file in if you send you
know a weekly file or a monthly file they may want it exactly how they want it and
you can change that to to what they want um but as long as it's standardized for
you it's all the same for you that is a good thing so now we have all of these um
in the proper case that's typically what I I do or I use upper those are the ones I
use the
06:21:55
most I don't usually use LOWER and if you go in here and you type in LOWER you
know it changes it to all lowercase I don't typically do that and I'm going to
name this column President-Fixed and so now all of these names all of
these different uppercase and lowercase issues are fixed and it just
makes it so much easier to read and you don't have different uppercase and
lowercase issues it's all the same so I'm going to keep that right
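For comparison, the three case functions used above have close counterparts in Python: `str.upper()` for UPPER, `str.lower()` for LOWER, and `str.title()` as a rough stand-in for PROPER. The names below are examples of the kind of mixed casing described, and like the separate President-Fixed column, the cleaned values go into a new list rather than overwriting the originals.

```python
# Example names with inconsistent casing, similar to the issues described above.
presidents = ["JAMES MONROE", "john quincy adams", "Andrew Jackson"]

# New lists, so the original column is left untouched.
upper_names = [name.upper() for name in presidents]    # like UPPER
lower_names = [name.lower() for name in presidents]    # like LOWER
proper_names = [name.title() for name in presidents]   # roughly like PROPER

print(proper_names)
```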
06:22:27
there uh if we move a little bit to the right if you look at this prior now this
prior is a mess it has stuff all over and to be honest this is not really
something that I would probably be using um like in a real data set I would look at
this column and I would say this is pretty useless um if I had a very specific use
case for the data in this column I might try to you know parse it out and do
something but I don't this is a completely useless column to me so I'm
actually going
06:23:00
to skip this one I'm going to go to this party one and this party one to me it
looks pretty important because this is something that I know I can group by and
I can create visualizations with and and kind of break that out and if you look
right here we're going to add um we're going to add a filter so now let's open up
party and take a look so if we look right here we have Democratic
Democratic-Republican Federalist Nonpartisan Republican Republicans Whig and Whig with a
date and some
06:23:30
information in the back of it and then some blanks um and it's really important
when we're looking at these columns that we think we might group by that
we have the values properly grouped so Republican and Republicans to me right off the
bat looks like a spelling error and so I'm just going to deselect all I'm going to
go to Republican and Republicans and it's literally Republican all the way down except
for this last one and to me that's just something that I would update so I
would
06:24:00
just go right here and do that if I didn't do that and then I tried to create let's say
a pivot table on here it would have its own group of Republicans and it wouldn't be
added to Republican and maybe that's on purpose but let's just presume that we know
this data extremely well and that's not supposed to be like that again that
just comes back to knowing your data really well understanding
what it should look like and we know that it should not be like that so we're
going
06:24:26
to fix that and as you can see it
got rid of it the next thing we're going to fix is this Whig that's just like an
error that's some issue on the data side and we're just going to fix
that by updating it and that's it I would always be keeping a copy of this
with the raw data somewhere else because this is presumably like a working
document you aren't saving over your original file let's
just say that
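The reason the Republicans and Whig fixes matter for grouping can be shown with a quick count before and after. The messy Whig value below is a made-up placeholder, since the exact trailing text in the data set is not given, and the corrections mapping is only one assumed way to apply the fixes. The raw list is left alone, in the same spirit as keeping a copy of the raw data.

```python
from collections import Counter

# Hypothetical party column; "Whig 1841 extra text" stands in for the messy value.
raw_parties = ["Republican", "Republicans", "Whig", "Whig 1841 extra text", "Democratic"]

# Known bad values mapped to the value they should have been.
corrections = {
    "Republicans": "Republican",
    "Whig 1841 extra text": "Whig",
}

# Build a cleaned copy; raw_parties itself is not modified.
cleaned_parties = [corrections.get(party, party) for party in raw_parties]

groups_before = Counter(raw_parties)
groups_after = Counter(cleaned_parties)
print(len(groups_before), len(groups_after))
```

Before the fix a pivot table would show five groups; after it there are three, with the stray rows counted where they belong.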
06:25:02
and then let's take a look at these blanks real quick um okay so there are these
rows right here that have nothing I think we're okay but if we see anything
different 47 48 okay so yeah it's just these ones right here that have no data in
it anyways it's just seeing it in the filter so not an issue at all so okay we're
looking good we've gone all the way over we we fixed this President we skipped this
one um we we cleaned up this party and I kept this one in here because I'm not
exactly sure
06:25:33
if that's a Democratic or republican so I'm going to keep it its own thing um I'm
not a huge uh history buff in that aspect the next one right here is um the next
one right here is really easy uh this is something that happens all the time
especially on actually most often it happens on numerical data so like uh you
know there'll be a number of 1,1 and then there'll be a space after it for
absolutely no reason uh and it happens all the time it does happen like this as
well um where you'll see this
06:26:07
and all you got to do is do trim and select the the cell we're going to close that
parenthesis and we're going to apply that all the way down what is so fantastic
about the trim is that it's really intuitive and it knows basically everything it
needs to do for example um it gets gets rid of the um spaces before it gets rid of
extra spaces in the middle and um it'll get rid of extra spaces at the end um which
you wouldn't be able to see but they are there and they they absolutely can cause
issues if
06:26:41
you have spaces at the end that you cannot see um let's take this one for example
like if I had spaces at the end that can cause issues when you insert or or or put
that into a database um that happens a lot with numbers um you know when you're
putting that into SQL that can cause issues and so you really it is important to
actually do that trim um and you can do that on all of your columns or just ones
that you know you're having issues with but once you import that data into SQL you
will know
06:27:07
if there's an issue or not when you actually try to start using it so we're going
to say Vice and we're going to say fixed oops there we go uh this next one is one
that you'll run into a lot when you're working with numerical data you will
encounter so many different issues um one that I run into a lot is I I've worked
with a lot of cost data or pricing data and when it's in Excel it sometimes
comes in with um these currencies like a dollar sign a pound sign things like that
and when you put
06:27:43
that into SQL it just is a nuisance right you're not going to be able to run um
it's going to go in as a text or it's going to be like a string right because it
has that special character and you don't want that you don't want to have to then
go in and then change things around you just want to be able to start um you know
doing calculations on those numbers so what you can do is sometimes it'll come in
as a text sometimes it'll come in as um currency which I think this
06:28:12
one's a currency we are just going to change that to be a number and then we're
going to get rid of these oops and get rid of those that it doesn't look as pretty
but that is much more useful than actually having the currency on there with the
decimals this actually is so much easier when you when you want to use it for
almost anything because you're able to add and uh do things properly in other
systems in Excel I think it does understand it um but you know that can cause
issues so
06:28:45
there is how you do that the next thing that we're going to look at is these dates
and just notoriously whenever I see a date field I know there's going to be an
issue with it it's very rare that I get a date field that is perfect uh it
genuinely is a novelty when that happens and most of the time it has to do
with um let's say a date comes into Excel and it's in a text format or date comes
into Excel and they're not the same in this example they are not the same um and we
just
06:29:17
want them to all be similar they say date on if you look right here it says date it
says date it looks like it should be the the same um but if we go like this it all
looks the same right there's no issues at all if we were to um try to use that it
may or may not be an issue but we don't want to leave that to chance later on if
you're using this with python or something like that it can cause issues uh maybe
not in SQL because it may um see the underlying um what's in the underlying cell
not just
06:29:51
what we see but some systems won't and so you want to make sure that they're all
the same and so you know what we were doing back here with um oops with the party
and we were looking at this uh this filter and identifying the issues I usually do
that on date fields as well and and oftentimes um I know just for just for
demonstration purposes oftentimes I will get something like that and then I'll come
up here and I'll notice that there's this one random number that happens all the
time all the time um and
06:30:23
so you know you want to make sure that you um that you look at these things and
just just do at least a quick glance if not kind of doing a kind of a deep dive
into it but all we're going to do is we're going to do both of these and we're
going to do a short date and let's take a look and see if that fixed it and so now
they are all the same format and that is fantastic that is exactly what we want
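If the same column is later read with Python, the mixed date cells can be normalized in a similar spirit. The sketch below is an assumption-heavy illustration: `normalize_date` and `TEXT_FORMATS` are hypothetical names, the listed text formats are guesses at what a messy column might contain, and the serial-number branch assumes Excel's default 1900 date system, where a bare number such as the "one random number" mentioned above is a day count.

```python
from datetime import date, datetime, timedelta

# In Excel's 1900 date system, serial numbers (for dates after February 1900)
# count days from this base date.
EXCEL_BASE = date(1899, 12, 30)

# Assumed text formats; adjust to whatever actually shows up in the column.
TEXT_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y")


def normalize_date(value):
    """Convert a date, a serial number, or a date-like string to a date."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, (int, float)):
        return EXCEL_BASE + timedelta(days=int(value))
    text = str(value).strip()
    for fmt in TEXT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date value: {value!r}")
```

Raising on anything unrecognized, rather than guessing, makes the odd cells surface early instead of later in the analysis.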
we're going to go back through here we're going to get rid of these um again this
is a
06:30:52
working um this is a working document oops uh we need to we're I'm going to do um
control shift down oops let me go back up do control shift down and copy and what
I'm going to do right now is I'm actually going to copy let me do it right here
I'll show you sometimes I do this does just depends I'm going to go right here I'm
going to hit right-click and I'm going to paste as a value which means it's not
going to take the calculation or the formula that I just did
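For comparison, here is a rough Python equivalent of the TRIM step followed by keeping only the resulting values. It is a sketch under assumptions: `excel_like_trim` is a hypothetical helper, the sample cells are made up, and it only handles ordinary space characters, which is how TRIM is described above (spaces before, extra spaces in the middle, spaces at the end).

```python
import re


def excel_like_trim(text):
    """Strip leading/trailing spaces and collapse inner runs of spaces to one."""
    return re.sub(r" {2,}", " ", text.strip(" "))


# Assumed sample cells with invisible padding.
column = ["  John Adams ", "Thomas  Jefferson", "1100 "]

# Overwrite the column with the cleaned values; nothing here depends on a
# formula afterwards, which is the effect of "paste as value" in the sheet.
column = [excel_like_trim(cell) for cell in column]
```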
06:31:26
uh it's going to actually paste it as that value so we just replaced it um right
here you can see up here it says equals trim of G2 this now now that I copied and
pasted it over as a value um it got rid of that um calculation and now it is
actually a string so we don't need this anymore and I'll do the same thing over
here as well I'm going to control shift down copy and I just hit the right key uh
or the left key sorry now I'm going to right click and I'm going to do paste as
06:32:03
a value and again it has this proper and now it doesn't have the proper it's
actually the value that was here so that's really important to note uh and we're
going to get rid of that one and so now what we have is is already looking much
better now one of the last things I we're going to look at is deleting columns that
we are not going to use and this is why it's so important to keep a backup or or or
the raw data not in this file because if you start saving over this file and this
is your
06:32:30
raw file uh that can mess up a lot of things and that has happened to me before and it's
terrible and then you have to request another file or you have to go back and find
it or something like that it's terrible um so so this is our working document so we
can mess with this and do whatever we want for our purposes now for us um I can
already tell you that this prior is a bunch of nonsense and we do not need it we're
not going to use it for anything and it and if we have um this is a small very
small
06:32:58
data set this only has like um let's say you know one two three four five six seven
eight we have like eight columns that we're you know kind of using that has data
eight or nine now that's a small data set I've had ones with literally like hundreds um
and and it has so many columns uh so much data and sometimes it's good to just trim
it back to the things you know you're going to use this to me is absolutely useless
um we're going to delete that and then right over here it's pretty
06:33:27
redundant um it's just one number off but if we scroll down just a little bit um it
goes it's basically just counts it's a you could even call it a unique um
identifier if you want sure why not but we don't need both um so we're going to get
rid of this first one and now we have more of the useful and relevant data rather
than the stuff that we absolutely know that we are not going to use um these date
updated and date created we may never use them but we might um so it doesn't hurt
to keep it
06:33:56
on hand those other ones are ones that we are almost certain we will never use
again keep a backup just in case you need it you can always go back and get it so
you know if you go back to what we started with and you look at what we have now it
is much cleaner it's much more usable and these are small subtle changes um
especially with this very small data set of only like 50 rows or or 46 rows but
you're going to be working with data sets that are thousands tens of thousands
hundreds of
06:34:23
thousands of rows and you need to know how to kind of look at this data standardize
it um format it properly for what you're going to be using it for if you're keeping
it in Excel there are different things that you may do than if you're putting it
into a database or going to be using it in you know um using python to to access it
so you need to kind of know your use case but these are some things that I do all
the time to kind of clean up the data before I use it for something whether I'm
06:34:52
creating pivot tables or I'm inserting it into or I'm putting it into SQL these are
things I do all the time and so hopefully that helps give you kind of an idea of
some of the things that you should be looking for when you're actually cleaning
data and it's really important to understand why you're actually making these
changes and the reason you're making these changes because some of the things that
I did today may not be things you want to do on a different data set that has
06:35:15
different uses and different um purposes for so you know take everything that I've
said and and apply it um with a little grain of salt to your data set because your
specific needs may be different than what I wanted when I was cleaning my
data set so I hope this was helpful I hope this gave you a small glimpse of
some of the things that I'm looking for when I clean a data set or I get a new data
set in and I'm kind of you know analyzing it figuring out what I need to fix in it
I hope this has
06:35:41
been helpful uh with that being said thank you so much for watching I really
appreciate it if you like this video be sure to like And subscribe below and I'll
see you in the next [Music] video [Music] what's going on everybody welcome back to
the Excel tutorial Series today we're going to create an entire project in [Music]
Excel now if you've never done a complete project in Excel where you take the data
you clean it then you create an actual dashboard where people can click on things
and filter things this is
06:36:23
going to be a really great learning opportunity as well as potentially you know a
simple project that you can use for your portfolio or you can spice things up and
go a little farther than what we're going to be doing in today's video I will walk
you through every single step of the way and hopefully we learn something together
and without further Ado let's jump right into it let's jump onto my screen and get
started with the project all right so this is the data set that we're going to
06:36:44
be working with I will leave a link in the description to my GitHub where you can
go and download it so you can be working with the exact same data set that I am
using now before we actually get into this data and start looking at it I'm going
to show you what the final dashboard is going to look like um we're going to create
a few different types of visualizations nothing too crazy um and then we'll create
some filters as well so we can kind of you know create some interactive filters
with our data so
06:37:07
let's go right on over to our data set now I'm going to hide this because we are
not going to use that but what I am going to do before we do anything is I'm going
to create a dashboard and I'm going to create a pivot table oops and I'm going to
create a working sheet so um all these things have different uses and I'll explain
that as we go along so this is our data set um I'm going to copy this over to our
working sheet when I go into you know an Excel and I'm working on something I don't
06:37:46
like to you know use just the one that I was using in case I mess something up and
it saves over or there's some issue I like to create a working sheet and keep the raw
data right over here it just makes my life easier I don't have to save it and then
you know open up a different Excel to compare them so we have our bike buyers this
is our working sheets this is our raw data this is the one we're actually be
working on today so let's um let's start looking at it really quick and just kind
of glance and
06:38:10
see what data we're working with and then we'll start cleaning it up making it more
useful for what we are going to be using it for and then we'll start building out
the dashboard so right here we have an ID that should be a unique ID to each
person uh this is their marital status so married or single this is their gender
male female we have their income children their education their occupation do they
own a home how many cars they own how long their commute is the region where they
live
06:38:44
their age and if they purchased a bike and this column right here is extremely
important this is going to tell us whether they did or did not buy a bike so we got
their information they're looking for a bike but they either decided not to buy a
bike or they did buy a bike and we're going to be using that one a lot in in this
video and so um you know this is basically the data set that we're working with um
some of the demographics and and information behind the person so what we want to
do
06:39:11
when we are cleaning the data before we do anything uh I like to see if there are
any duplicates in here um what we're going to do is come right up here we can go to
uh where is it right here we got remove duplicates so we're going to click on that
it selects every single one we just want to see if there's any useless duplicated
data that we do not need uh and the data has a header so we're going to click okay
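What the Remove Duplicates button does here can be sketched in a few lines of Python. This is only an illustration: `remove_duplicates` is a hypothetical helper, the sample rows are invented, and it assumes (as in the dialog above) that every column is selected and the first row is a header.

```python
def remove_duplicates(rows):
    """Drop rows that exactly repeat an earlier row, keeping the first copy."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # compare on every column
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Assumed sample data: a header row followed by records.
header = ["ID", "Marital Status", "Gender"]
data = [
    [12496, "M", "F"],
    [24107, "M", "M"],
    [12496, "M", "F"],  # exact duplicate of the first record
]

cleaned = remove_duplicates(data)
removed = len(data) - len(cleaned)
```

Reporting `removed` gives the same kind of feedback as the message Excel shows after the button is clicked.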
all right so we had a ton of duplicates in there uh for whatever reason so yeah we
do have duplicates in
06:39:43
there so I'm glad we did that otherwise we would have uh you know not good data and
we don't want that let's start right over here um the ID of course we're not going
to change the marital status and gender are M's s's fs and M's um this isn't
inherently a bad thing to have it like this but you know we have to think about it
from the perspective of someone who's going to be using this dashboard do they know
what M and S is do they know what M uh and F is and if they don't
06:40:13
it's better to just spell it out for the most part um so let's just do that so
we're going to click on the column B we're going to hit control H that's going to
bring up our find and replace now there's an m in both of these columns and there's
different things one is married and one means male so we're going to do is we're
going to search by columns um and we'll have match case I don't think that's going
to change anything but that just means an exact
06:40:39
match uh and we're going to do m equals and we're going to replace it with married
and we'll replace all awesome and then we do s is single this one is super easy
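Because the same letter means different things in different columns, the replacement has to be scoped per column. A minimal Python sketch of that idea follows; `REPLACEMENTS` and `expand_codes` are hypothetical names, and the column headers are assumed from the way they are read out in the video.

```python
# One mapping per column, so "M" can mean Married in one and Male in the other.
REPLACEMENTS = {
    "Marital Status": {"M": "Married", "S": "Single"},
    "Gender": {"M": "Male", "F": "Female"},
}


def expand_codes(row, replacements=REPLACEMENTS):
    """Return a copy of the row with single-letter codes spelled out."""
    out = dict(row)
    for column, mapping in replacements.items():
        if column in out:
            # Whole-cell lookup: a value that is already "Married" is left alone.
            out[column] = mapping.get(out[column], out[column])
    return out


row = {"ID": 12496, "Marital Status": "M", "Gender": "M"}
expanded = expand_codes(row)
```

Looking up the whole cell value plays the role of an exact match, so running the function twice does not mangle values that were already replaced.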
we're going to do the exact same thing right here so column C hit control H it'll
still have by columns so we'll do M is male we'll replace all of those and F is
female and replace all those that's great uh you know the next column right here is
income and in a previous video I talked about how I don't
06:41:20
typically like it in this format and that's true um if you're doing calculations
on it or or any other thing it can mess it up sometimes having the dollar sign or
it being a currency we're not really going to mess with it too much right now um
what we can do is just kind of make sure all of it's currency um we'll just go like
that to make it a little simpler but we're not going to change it to like a numeric
um we will use this in the visualization we'll see how it looks and
06:41:49
if we need to we'll come back and change it if not we'll keep it how it is um so so
that's all we're going to do to that one uh the children those look good we have
education partial College partial High School this looks fine to me um if there's
any spelling errors or anything like that of course we need to clean that up it
doesn't look like there is occupation skilled manual manual okay those should be
separate are they a homeowner should just be yes or no all right we have Cars 1 2 3
4 good night
06:42:24
who owns four cars um and then we have the commute distance uh and you know there's
nothing terrible about this it's giving you ranges um which can be a good thing I
say let's keep it for now but I have a feeling when we get further and we start
using in the visualization we may want to change this so let's just hold off for
now um but if needed we will come back to this and we'll change this um and then we
have our region and that looks totally fine and we have our age now when you're
using ages typically
06:42:56
you have some type of like age bracket or or age range and you do that because
there are so many ages in here right it's 25 all the way down to 89 and if you're
using that in some type of visualization it could just get really messy and so
you'll create kind of you know just brackets around these so that you can kind of
condense it and make it a little bit easier to understand so let's do that and just
create a new column and then then we can use that for our dashboard so let's go
right up here
06:43:27
we're just going to create a new column uh we'll call this age brackets and what we
can do is we can use an if statement to kind of say if it's older than or less than
and and and kind of give them these ranges um that's one way to do it and that's
the way we're going to do it right now so let's go up here and what we want to do
is we want to say is going to we're going to say equals and we're going to do if
and we're going to close that parenthesis now what we're going to say is if
06:44:01
this we'll go right back up here if this is less than so we're going do this 31 and
we're going to say comma so if they are less than 31 what do we want to call them
what do we want their their you know name to be we'll call them adolescent oops
that's not how you spell adolescent adolescent um and then if they're not what
we're going to do is we're going to say it's invalid okay and let's just see if
this one works first all right it's not working at all
06:44:39
um okay so basically what we did was um incorrect we did it backward uh we want to
do I said uh L2 is greater than 31 no we want to do like this so let's do that now
all right and it should pull up where if they're under the age of 31 so if they're
30 or below is basically what it's saying so if they're 31 they'll be invalid but
if they're 30 or below it's adolescent so it is working properly um and let's see
what it see what it says perfect so this one is working and and
06:45:13
now what we want to do is we actually want to build on this and make it uh kind of
like a nested if statement if you've ever heard of that or done that before so this
is our first first if statement and this is going to be this is invalid this is our
value if false statement this whole statement is going to become our value if false
for a different if statement um so let let me write it out and hopefully that'll
make sense but we're going to say if do open parentheses and we're going to do it
06:45:44
like this and let's just get rid of this for a second all right uh what did I do
and let me do oops give me a second okay we have our if let me just write that out
again we have our if there we go so now what we're going to do is we're going to
write basically the next part of it so we're going to say if that L2 is and we're
going to do this time we're going to do greater than or equal to 31 so now it's
going to include that 31 so right here we did anything less than 31 so it's 30 and
below this
06:46:23
one is going to be 31 and above so we're going to say these people are middle age
and if not then it's going to go to this if statement and then we need to close it
I believe so now let's try this all right fantastic now if um everybody should be
in one of these areas right everyone should either be an adolescent or middle age
because basically all we're saying is is if they're 31 or older or 30 or below
that's all these two statements do so we have um you know our next group now we can
add and go even
06:47:00
further into this and now we can use this entire thing as the um what was it called
the value if false section so that's what we're going to do we're going to do one
more so we're have three different categories so we're going to say if and do uh an
open parenthesis and we're going to say if oh actually Let's Do It um let's not do
it to this one let's do to this top one just easier uh so we're going to say if
open parenthesis we're going to say L2 and
06:47:33
this time we're going to say anybody over the age of 50 uh or we can do 55 let's do
55 so we'll do 55 and we're going to call them old and we'll do a comma and this is
the value if false statement and we need to close our parenthesis so let's try this
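The nested IF built up to this point checks the highest bracket first and falls through to the next test each time the condition is false. The same logic can be sketched as a Python function; `age_bracket` and its `old_over` parameter are hypothetical names, and the labels are the ones used in the video.

```python
def age_bracket(age, old_over=55):
    """Mirror the nested IF: each check is the value-if-false of the one above."""
    if age > old_over:
        return "Old"
    if age >= 31:
        return "Middle Age"
    if age < 31:
        return "Adolescent"
    # Only reached when none of the comparisons hold (for example, a NaN age).
    return "Invalid"


brackets = [age_bracket(a) for a in (25, 31, 55, 89)]
```

Keeping the cutoff as a parameter makes it easy to experiment with where the oldest bracket should start without rewriting the chain.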
anybody over the age of 55 should have old um you know maybe we'll do 54 so anybody
who is 55 is considered old I think that's fair I think that's fair guys oops I
should have done I should have done that to this one
06:48:08
let me get out of this and we'll do 54 my dad is 55 that's why I'm doing it like
this this is for you dad 'cause he should be in this old category to be fair so now we
have adolescent adolescent middle-age and old these are three categories so we can
now have these buckets these different groups of Ages and it's much more usable
than these individual ages um and so we will be using this in our in our dashboard
for sure now our next one is the purchased bike uh and we're not going to do
anything with that so you
06:48:43
know that is that is that one and you know there wasn't a ton to clean up here we
removed some duplicates um I don't know why it says that what did I do married
married what does this even mean did I write that did I mess this up guys oh
when I did the m and the S uh replacement in there it replaced it with married and
single it's supposed to say marital status oops thanks for catching that guys
thanks for catching that I hope that's how you spell marital uh we'll see so
06:49:24
uh we are going to keep it just like this now what we are going to now now what we
are going to do is build pivot tables with this data so we had our raw data we have
our working sheet and now we want to create pivot tables and pivot tables is how
you actually help build your dashboards or help build your visualizations so we're
going to go right here we're going to hit whoops get rid of that we're going to go
right here we're going to insert and we're going to say pivot table and it's going
to ask us
06:49:56
what range so we're going to go back to the working sheet and we'll just click here
and hit control a this is going to select all of our data for us so it's really
easy and we're going to hit okay and so now we have all of our pivot I don't need I
don't need to pull it out that far that was way too far and now we have all of our
pivot table information over here and so that should make it really easy to you
know actually build out so what we're going to do is start selecting what columns
06:50:27
and what data we actually want to work with so the first one that we're going to
build out is a dashboard that is basically looking at the average income of
somebody who either bought or did not buy a bike so we need in this one we're going
to need their income that's definitely going to be a value right here um but we
want to break it out by male and female so let's look at their gender we going to
pull that down into the rows so um this is basically a sum and no let's look
06:50:57
at let's make this an average so I just went to the um I clicked right here I went
to the value field settings and we're just going to do an average all right and
then we are going to make these um and as you can see there's four decimal points
um we'll keep it as is right now but we may need to go back and change something
then we're going to look at if they purchased a bike or not and we're going to put
that right here so so we can see that uh right here for the people who did not
06:51:27
buy a bike the females their their average salary was 53,000 the average salary for
the average salary for males was 56,000 for yes the ones who did buy a bike the
average salary was 55 for female and 60 for male so the people who had a little bit
more money are buying bikes and you can also see that uh the men are making more
money in this data set just overall in general um so let's make the visualization
really quick but you know I don't know I'm not a huge fan of these decimal points
and maybe we can
06:51:59
just change that in the visualization we'll see um oops that's not what I meant to
do um let's do that so what we are going to do is we're going to click into here
we're going to click insert and we're going to go to these recommended charts and
it's going to bring up basically every single type that we would want um and we can
just click in here and see which one looks good uh oh yeah I love those 3D ones
those are my favorite you guys know that uh let's let's use this
06:52:29
one right here pretty simple um whoops let's pull this right over here and as is it
looks pretty good um you know it shows male female we have the average or the
incomes right here whether they did or did not purchase it um and so at a glance
it's pretty easy to see let's see if there's anything um you know if you want to
change up style-wise go for it I'm just going to keep it as is um but let's see if
there's anything we need to add right do we want to add these axis
06:53:02
titles uh for the most part I I tend to do that um it makes it pretty easy to see
so we can go in here and we can just click it like this and we'll say income and
we'll say oops and we'll do gender so that's what that is and and let's go back in
here do we want to add a chart title we definitely want to add a chart title uh for
most of these we'll add a chart title for sure so we'll say average income per
purchase um I don't know if that's 100% right but we'll we'll we'll use it uh if
06:53:39
we need to change it to be you know by gender or something we can but um for now
let's see do we want to add data labels uh definitely not uh a data table um we can
do this it may make it a little easier to read I will say that again these numbers
are just these decimal points are really throwing me off let's go see if um we can
change it in here let's go to see if we can just make these numbers okay and um we
can keep it like that or we can even do something like this add commas yeah I'm
going to keep it just
06:54:15
like this I I think this just looks the best um again I'm adding commas
here I'm changing the um decimal place right here it just makes it look a little
nicer a little cleaner um so let's keep this exactly how it is um we can always
change things if we want to uh if we want to come back to it so we created our
pivot table and then we created our visualization basically exactly what we're
going to do for all of these because again all of these need um you know all of
these need pivot
06:54:47
tables in order to create the visualization so let's um get out of here we're going
to scroll down and we're going to create our next pivot table and once we get done
with all of the pivot tables that we need all the visualizations that we need then
we will um we will start so we're going to do control a we're going do okay and
basically do the exact same thing that we did um this time we're going to look at
the distance so for this one I wanted to see you know I try to you know I
06:55:15
created this already I've already done this entire project through but I haven't
really talked about why or what we're going to look at for this one you know know
we're looking at is their income does it change whether they bought or didn't buy
one um so if they said yes you know is there a reason are they making more money is
you know are price points are the customers do they make more money so you we cater
to them or not uh that's a good question uh another thing is you know we're we sell
06:55:43
bikes or this person sells bikes so commuting distance definitely makes a
difference you know does the person who is buying a bike live one mile away from
where they work or 20 miles away uh this will help us determine this next
visualization will help us determine you know who who is doing that or who's buying
it so what we are going to do is we are going to look at the um that one that we
were looking at earlier the commute distance so we're going to bring that right
over here so we have these
06:56:14
you know 0-1 miles, 10+ miles, 1-2 miles, etc. now we are going to uh again we're going to look
at if they purchased a bike that's really important and let's make that the column
as well so now what we have is a count of these Nos and yeses whether they did or
did not buy a bike um one of the issues I already see and we'll I'm going to
visualize it and then I'll show you that this 10 miles you know it's right next to
the 0-1 so it's not in order um and that could be that could be an issue um so we
may have to
06:56:47
revise that somehow to put it at the very bottom because we can either do ascending
or descending uh either one I don't think is going to work so we may have to work
through that in just a second um I don't know if I did that my I plan for that um
yeah so it has this big dip um yeah so let's let's create it um that's okay we're
going to figure this one out together because I honestly um I didn't plan for this
one so okay we have 0-1 miles that's exactly where it needs
06:57:19
to be the one the two the five that's exactly where it needs to be this 10 miles is
not and let's see if I change that 10 10 plus miles to 10 miles plus let's see if
that'll put it down here because I I don't know if it's looking at I don't know if
it's reading it weird um but let's go into this working sheet and let's go right
here and we're going to do controll H and we'll do oops not this one um 10 miles
plus let's get that in there and we're going to do
06:57:55
10 uh miles plus I I don't know if that's actually going to work um we will see so
let's go back to the pivot table let's re go to the data let's refresh uh no it
didn't it didn't change it um okay so let's think about this maybe if we change it
to like a letter it might change down here so start it with uh miles that could
work um let's try it it okay it's already selected let's do the 10 plus miles okay
so let's do um M uh more than 10 miles and we'll replace all let's get
06:58:43
rid of this let's go to the pivot and refresh all right okay so it's not perfect
but it works um and for what we're doing I think we'll keep it how it is so we have
our second one uh and you know there are different ways you can kind of change this
one um you know on the last one we did a ton of different stuff we can do just do
commute distance and we can say what do we want to say on this one what is this oh
this is the count um do we have to do we have to keep this one um no there we go
I'm just going to
06:59:30
do um just one and say commute distance and let's add a title chart title we can
make this one um let's say distance per customer uh that's not 100% true because
it's no or yes um that's that's the important part of this it's distance um average
distance uh let's see we'll just say customer commute all right and we'll keep it
just like that all right perfect I don't think um let me see I don't think there's
anything else we need to add on that one all right now let's go right
07:00:20
down here we're going to create our very last one uh we only had three so you know
sometimes you'll have a ton sometimes you'll have like one on each sheet and you'll
create multiple sheets but um do Ctrl+A um now we have our thing now this one
we're going to be looking at these age brackets that we were looking at that we
created um something that I do honestly a lot is kind of bracket things into
groups like this and you know for this I just kind of made them up but you know
it's
07:00:53
good to know how to do this because I I promise you this one happens a lot or I use
this one a ton and then we just want to look at who purchased a bike uh so the same
thing as we did before so like purchase a bike count of the purchase um you know
pretty easy so we just have to count of either no or yes for these age ranges um
and let's go to the insert we'll go to recommendation um I personally like a good
line for this one um so let's this is already interesting we could do something
like
07:01:29
this that's nice see this one versus this it just adds a dot it looks nice we'll
keep that one um so just really quick at a glance really interesting people under
the age of 30 are not buying that many bikes um age 30 to 54 uh 31 to 54 buying a
ton of bikes uh they buy more bikes or look at bikes more than anybody really
interesting um but yeah we'll make the dashboard in a little bit um let's make
these chart titles we'll do vert oops the horizontal we just call this age bracket
um and then we'll add a chart
07:02:14
title um again you can add some extra stuff if you want to um but you don't need to
uh none of this other stuff we really need I'm just kind of looking at the stuff we
do need or do want uh so what do we want to call this one let's call it customer
age brackets um and it's not perfect but we'll keep it as is for comparison um let
me see if I can copy um or or use this um real quick instead of the age brackets
I'm going to get rid of this and use the age and then let's
07:02:55
use um let's insert recommendation we use a line and we'll use this so This
compared to this just think of it like if a customer or consumer or or not a
customer if somebody you're working with is trying to use this dashboard to
understand this dashboard this is going to be just it's going to I don't know it
might melt their brain just makes no sense it makes sense it's just all over the
place it's really hard to make sense of this it really is I mean you can kind of
see a pattern going
07:03:29
up around like the mid-30s and then it Trends downward but it's hard to see um it
really is so doing these um these brackets really helps and you can even add you
know adolescent um you know 0 to 30 underneath it and in fact we may want to do
that um why not why not let's do that oh whoops um so why don't why don't we do
that why don't we go back I'm just going to I'm doing this on the Fly why don't we
go back uh what am I doing whoops and this is all calculated but
07:04:04
let's do adolescent 0 to 30 let's do middle age 31 through 54 and then old 55 plus
let's see if this breaks anything I hope it doesn't um and we'll go back to our
pivot table let's refresh the data uh okay it did mess with stuff okay never mind
guys that was a terrible idea don't do that um perfect uh let's get rid of that
that was a terrible idea don't do that I'm glad we tested it out though I like I
like to see if it was going to work no it messed with the um the order of things um
I I
07:04:53
intentionally named them adolescent middle age and old because it it makes
sense for the visualization um but you know if if I change something and it messes
with it I'm not going to mess with it it was just an idea on the Fly guys come on
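The bracketing described above can be sketched outside Excel as well. This is a minimal Python illustration, assuming the cutoffs mentioned in the video (up to 30, 31 through 54, 55 and over). The sample rows are made up and `age_bracket` is a hypothetical helper name.

```python
from collections import Counter

def age_bracket(age):
    """Map an age to the three brackets used in the video."""
    if age < 31:
        return "Adolescent"
    if age <= 54:
        return "Middle Age"
    return "Old"

# Made-up (age, purchased bike) rows for illustration only.
rows = [(25, "No"), (30, "No"), (31, "Yes"), (42, "Yes"),
        (54, "No"), (55, "Yes"), (67, "No")]

# Count Yes/No per bracket, like the pivot table behind the line chart.
counts = Counter((age_bracket(age), bought) for age, bought in rows)

# Alphabetical order of these three labels happens to match the age order.
for bracket in sorted({name for name, _ in counts}):
    print(bracket, "Yes:", counts[(bracket, "Yes")], "No:", counts[(bracket, "No")])
```

Because "Adolescent", "Middle Age" and "Old" already sort alphabetically in age order, the chart axis comes out in the right sequence without any extra sorting step.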
all right so let's start building out our dashboard now um when we're building our
dashboard what I personally like to do is to have this pivot table sheet and then I
will copy them over and later we'll hide these other sheets um and I'll
explain
07:05:24
that a little bit but I like to have this this one for us so we're going to copy
this so I just click on it hit Ctrl+C we're going to paste it right over here uh
let's just make them small for now that's oh gosh no let's not do that oh these
look terrible okay anyways um let's copy this one over oops okay what did I just do
oh I didn't copy this one whoops it's not copying okay we're going to go copy hit
paste fantastic oops guys look away this is this is tough to watch this is tough
for
07:06:09
me to watch I'm the one doing it it is tough for me to watch all right let's go to
this last one I'm I'm gonna try it again all right it worked this time so now we
have um our our three visualizations this is perfect but now we actually want to
create a dashboard now how do you do that how do you make it look nice um and then
we're going to add some you know filters and stuff like that how do we make it look
nice um what happened here what changed what did we do oh my goodness gracious all
right
07:06:40
let's copy this let's paste this let's get rid of this I don't even know how that
happened I've never seen that before that was wild uh Excel is trying to destroy my
whole video I mean I'm doing this for you Excel good night okay no problem at all
what we're going to do and how you make this at least look nice um first off we can
get rid of these grid lines pretty easily and I recommend when you do that when you
make a dashboard just makes it look cleaner makes it look like
07:07:10
an actual dashboard um let's go to view and grid lines so we can get rid of these
grid lines it just makes it look nicer um we're going to make you know we can
choose any color here I'm just going to choose a color I like this and
let's we're basically creating like a header right if you're using like
Tableau or something um we're going to merge and center so it takes every single
cell that we have highlighted creates it into one let's call this um bike sales uh
I
07:07:41
have I think I called it bike sales dashboard let's just call it that um you know
see what happens let's get that let's make it white and and make it much larger
than it is okay okay um sure let's do that doesn't look bad um what is it doing
there we go uh let's center that perfect um it's not perfect but we're going to
use it all right so now we kind of want to organize these and you know everybody
has their different way of doing it uh I'm just going to start building it out
myself
07:08:22
and just see how it looks uh and then we'll go from there I like this one
there um we can put this one this one's kind of a longer one so I'll probably
put it at the bottom let's see how it looks um but we'll put this one right here
try to line it up geez let's let's zoom in a little bit let's try to line this up
see what it looks like let's extend it to the end that doesn't look too bad uh
needs to move up just a hair and I'll show you how to kind of align these in a
second
07:09:00
but um that looks not bad and we'll kind of try to align these as well let me zoom
out and extend this the length of this just to make it look nice um you know now
what you can do and you know this is something that's pretty simple is you can get
both of these and we're going to go to shape format and we can just align these
it's really nice to align especially if like the top and maybe like the left to
right but like we're going to align these to the top and they just kind of align
themselves
07:09:33
on the very top now these look much better this one is a larger dashboard or a
larger visualization so I'm going to keep it how it is um and I'm going to keep
this one how it is so it is going to be a little bit smaller as you can tell and
then we'll have this one um and I'm going to do that um I this is going to bother
me if I don't align these so let me do this I'm shape format align to the right and
it's not exactly what I wanted to happen because oh jeez what am I doing that's
07:10:12
not exactly what I wanted to happen I actually wanted this one to align uh this one
to align with this one it did the opposite um so let me just scoot this back all
right visually looks fine but that's how you do it if you want to do it um I I I if
you have multiple of them like this it you can make it look bad so we have our
dashboards this is already looking really good I I like how this looks colors are
coordinated it we have a kind of a theme throughout um and it looks nice I actually
I actually kind
07:10:40
of want to change this one um to um let's see maybe if I did like that it look
nicer than all of them yeah this does look nicer um it doesn't change much either
guys I'm should I do it all right we're going for it we're changing the design on
the Fly should I do it for all of them let's see it doesn't fit doesn't fit um all
right guys just ignore what I'm doing uh don't do any of this I'm just messing
around at this point so this is really great to have it really is and what we
07:11:21
want to do is there are other elements there are other things that people would
like to filter by and be able to look at but it's not in this
visualization um to be more specific one field that could be really interesting
is married versus single are single people buying more or um married people buying
more you know it it'd be nice to filter on it so we're going to click on uh any of
these actually and we're going to go up to Pivot chart analyze and we'll click
insert slicer now we can
07:11:50
choose which ones we want to be able to filter on all at the same time or one at a
time I'm just going to do the first one by itself and then I'll show you how to do
other ones um but this one is the marital status so this is the married single the
one we were just looking at and we can drag this right over here bring it in a
little bit all right and we don't need all that space so we're going to boop boop
boop boop all the way up now while we're doing this um it only because we selected
this uh this
07:12:22
visualization it only is working on that one right now we of course want it to apply
to all of them which is not hard to do all we're going to do is we're going to click on
we're going to make sure we're clicking on this we're going to go up to slicer
we're going to hit report connections um and if you remember we have this um this
pivot table that we're working with um and this is where all of our pivots are
coming from so we're going to actually apply it to all of them this is our sheet um
and this is the
07:12:50
name of the pivot table now again we created that fourth one we're not using it but
we're going to apply it to all of them so now when we click on it it's going to
apply to all of them so at a quick glance let's see what single people are doing um
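Connecting one slicer to every pivot table amounts to filtering the underlying rows once and recomputing each summary from the filtered rows. The sketch below is plain Python with made-up rows. The field names and the helpers `apply_slicers`, `average_income` and `purchase_counts` are hypothetical and do not describe how Excel implements slicers internally.

```python
# Made-up rows; field names and values are assumptions for illustration.
customers = [
    {"Marital Status": "Married", "Region": "Europe", "Income": 80000, "Purchased Bike": "Yes"},
    {"Marital Status": "Single", "Region": "Europe", "Income": 60000, "Purchased Bike": "No"},
    {"Marital Status": "Single", "Region": "Pacific", "Income": 70000, "Purchased Bike": "Yes"},
    {"Marital Status": "Married", "Region": "Pacific", "Income": 90000, "Purchased Bike": "No"},
    {"Marital Status": "Single", "Region": "Europe", "Income": 50000, "Purchased Bike": "Yes"},
]

def apply_slicers(rows, selections):
    """Keep only rows that match every selected field value."""
    return [row for row in rows
            if all(row[field] == value for field, value in selections.items())]

def average_income(rows):
    """Average income over the given rows."""
    return sum(row["Income"] for row in rows) / len(rows)

def purchase_counts(rows):
    """Count of Yes/No bike purchases over the given rows."""
    counts = {"Yes": 0, "No": 0}
    for row in rows:
        counts[row["Purchased Bike"]] += 1
    return counts

# One selection feeds every summary, which is what connecting the slicer
# to all of the pivot tables achieves.
singles = apply_slicers(customers, {"Marital Status": "Single"})
print(average_income(singles), purchase_counts(singles))
```

Adding more keys to the selection, for example a region, narrows the same rows further, much like stacking several slicers.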
interesting interesting um you know when I'm looking at the just these numbers
right here married people these individuals are making a lot more like eight um
sometimes eight to like 10,000 more on average than their single counterpart um you
know again that's a
07:13:25
rough estimate but it's it's interesting so now what we can do is we're going to
create more of these so we're going to go to uh pivot chart analyze we're going to
go to slicer now we already did marital status but what if we want to look at
things like uh region and maybe something like their education so let's bring up
both of those and look now two of them come up so let's add the region right here
we'll bring that in just a little bit see if we can match it nailed it all
07:13:57
right now we're going to put that up we'll bring this one down just like this bring
it over see if I can match it again come on almost nailed it I don't know if I
nailed it but it's close all right kind of bring this up a little bit bring this up
and we have to do the exact same thing that we did with this one because right now
again it only applies to that one um chart so what we want to do is we want to go
to slicer report connections add it to all of them okay do the same thing with
education report connections bada
07:14:34
bing bada boom We are looking good and now uh let's get rid of all of them it's
just going to be everybody so now we can kind of slice and dice and choose what we
want we want to look at people who have a bachelor's degree who live in Europe and
are single and this is the information that we have on those people so now we can
narrow it down by certain demographics even further and look at this key
information so we may not you know look at counts and averages of these things but
we're
07:15:04
able to filter on them uh and that's really great to know so bachelor's degrees on
average are making 60 to 70,000 um let's look at um let's look at graduate degrees
okay a little more um but you know again I'm just looking at random stuff um but
you can mess around with this take a look at some stuff um this to me I want to
make this color darker I feel like it'd look nicer darker there we go oh yeah that's
way better this to me is it's a good dashboard right you have key information
07:15:42
that you're looking at nice visualizations it's color coordinated you have these
slicers on the side um to me this is a fantastic just simple dashboard and
there are so many other things that you can do with this data and you can make it
unique and you can add your own spin on it and I highly recommend that you do that
push yourself go past what we just did today and add your own stuff and and use
this and then you can add this to your portfolio website and show this off and show
07:16:10
people that you know how to use Excel which is a fantastic thing to know how to use
and show off so with that being said I hope that this project was helpful I hope
that you learned something along the way I know I did um I was learning things as
we were going and I hope that you didn't mind that I took some detours along the
way um for your amusement as well as my learning uh so with that being said thank
you so much for joining me I really appreciate it I hope you have a good day and
[Music]
07:16:49
goodbye what's going on everybody welcome back to another video today we are
starting our Tableau tutorial [Music] series now this series is for absolute
beginners so if you have never used Tableau before you are in the perfect place
I'm going to take you all the way from the very beginning of installing it and just
understanding what Tableau is and how you can use it all the way to creating
dashboards and sharing it now personally I hate those videos that are like 3 hours
long and they just expect
07:17:18
you to go through it uh I like to break my videos up in chunks so if you have ever
done my SQL tutorials you'll know that I like to break things up so it gives you
time to try them out and do them yourself and then you can move on to the next
video so I'm going to be breaking this up into five separate videos but in this
video I'm going to show you how to install Tableau for free I'm going to show you
the user interface we're going to download a data set that you can find on Kaggle
and then we will
07:17:42
build our first visualization together with that being said let's jump over my
screen and we'll get started all right so the very first thing that we need to do
is you need to actually download Tableau so we're not going to be using Tableau
we're going to be using a free version called Tableau Public it has a lot of the
same features except of course it's not uh every single feature that regular
Tableau has but it is absolutely perfect for learning it and for using it and and
you can even build
07:18:07
um you know dashboards and share those for your portfolio um I'm going to put this
link in the description so you can just go and click on that and and all you have
to do is input your email right here we're going click download the app um and then
it should start to download and then you can save that and then you're going to
open this up now I'm going to open it up I don't know what it's going to do I
already have it downloaded um but it should open up and look hopefully
07:18:33
like what you're seeing on my screen in just a second let's see what it does um I
hope you can see this but it says Tableau Public um it says I already have it set
up but you're going to click install and go through all that um all that setup
stuff uh so I'm going to exit out of here but I'm going to go over here and type in
Tableau Public uh and it's 2021.3 that's the current version that they have out
if you're doing this in the future they may have you know different versions um so
you should be
07:19:04
able to pull this up right here now um I'm going to go and get our data set that
we're going to be using and I'm going to show you how to get that as well and then
we will actually jump into Tableau and start uh using it so let's go over here I'm
going to get a data set from Kaggle I wanted something pretty generic uh to show you
in future videos I'm going to show you some special or not special but just
different visualizations that you might use um and we'll get different data sets
for those
07:19:32
because of course not one data set covers all these other types of visualizations
so um we're starting off pretty simple right here we're going to be getting one
called video game sales um and we can take a really quick look at it um here are
some of the fields that you're going to be having uh like rank name platform the
year genre and then some sales data and this is what it actually looks like it's
called VG sales so video game sales it's then a CSV and um you know here are the
fields
07:20:00
and we have our data and all we are going to do is we're going to download that and
I will save it now when you download it it's going to be saved into a zip file so
we need to go to our downloads uh let's refresh this here's our archive we need to
go in here you can just copy it and paste it right back into here um and just
so you know that is a uh a CSV so be aware of that so what we want to do is we want
to come in here now since it is a CSV this is not we're not going to be using
07:20:33
Microsoft Excel we're going to be using the text file so we'll come in here we'll
take VG sales now uh one thing I want to do before I do that is I'm going to rename
mine uh VG sales_1 um I've already prepared for this and so I already have that in
there um but so I want to make a distinct one for myself you do not have to do that
so we'll come back here um and then we're going to do text file and VG sales we're
going to open that up and when it pulls up right here um you
07:21:09
can bring in other tables and then you can start to join them together and create
those relationships we are not going to be doing that in this video we'll do that
in a separate one um as for you know just getting started you know we're not going
to be using that but you can see some of these things or some of these fields and
if you notice they they um they're either ABC or they're a number so it starts to
categorize what this field type is so is it a string is it numeric it starts to
07:21:43
automatically do that and that's all done within Tableau and so it just kind of
reads it and that's what it does um what we're going to do is I'm going to click right
down here it's called go to worksheet um the worksheets are where you're going to
actually start being able to build your visualizations your charts your graphs all
these things um and so you know we have this in here now and so we're just going to
click right here on go to worksheet as you can see here is VG sales_1 you will not
have the underscore
07:22:12
one if you did not add that like I did uh but right down here you can see all the
fields that we just imported from that data set and they even created one right
here for us uh they just generated that field um based on the file so it's a count
of all the rows really so what I'm going to do is I'm just going to walk you
through uh basically what we're looking at some of the things that we're going to
be using today there will be things that I don't talk about but I'm going to
highlight those in future
07:22:41
videos when we start using those or going over them um and so let's just start with
the most obvious one it's way over here I'm sure you saw it when we uh this first
came up on the screen because it has all these different charts and visualizations
and graphs and uh these will become available as you start dragging and dropping
our data into this sheet and so if I go right here it says for Scatter Plots try
zero or more Dimensions two to four measures so what our dimensions are are right
here what
07:23:13
our measures are are right down here and so typically uh things like say
genre or names or strings like that are going to be these uh dimensions and then
a lot of times the numerical fields are going to be measures next
what I want to show you is right here so you can take something like Global sales
and you can drag it right here into your rows and then it takes your rows and so it
automatically created a sum of global sales now if we take that away and let's say
we drag it
07:23:48
right here it's going to give us a column now you can also do it right up here you
don't have to um drag it on screen you can also just add it to the column or the
row that's typically what I do I it's just more intuitive to me um or you can drop
it in this section right here and it does its best to assign it some type of um
some type of visualization and so that's what it always is trying to do it is
trying to say okay this is what you're trying to do let me try to to get
07:24:21
the best visualization for the data that you're giving me now while we are here um
it went down here into marks and marks is a very important area it's where you can
add color size text detail and tooltip and I'm not going to go into what all those
are cuz I'm just going to show you so let's start pulling some fields in here and
creating a visualization and then I'm going to show you how all of that works
including filters as well so the first thing that we are going to look at is global
sales
07:24:51
and let's put that in the rows and then I'm going to take year and I'm going to
make that the column and this is basically exactly what uh I wanted to do now as of
right now it has only the year and it's looking at Global sales for everything but
we want to break that out a little bit better I Want to Break It Out by let's do
genre so different genres of games now if I add that right here to this column it
is going to break it up by year and genre if I add it right here is going to break
it out by the year of
07:25:30
course but then in each individual row has the different genre that's not what we
want we want to keep this type of line graph uh and what we're going to do is we're
going to add it to Marks and you can't really see it based off of these colors but
they're all different so we have the action genre we have the sports genre racing uh
role playing all these different genres within it now we can get rid of that cuz we
don't need it anymore uh and this is where these um these marks really come in handy
because
07:26:04
you can start basically doing what you want with them so for the genre I want to be
able to see all these different genres with different colors to me that just makes
the most sense so I'm going to put color right here and automatically it assigns
every single genre its own color and gives us this legend right over here and
so it's really easy to see well when you have smaller numbers it's much easier but I
know that red is sports and I can go right here and find red and that is sports so
it makes it a lot easier than
07:26:36
when it is all the same color blue so what you can do after that is you can also
add things like uh a label to it so if we take label and we or we take genre put
label you can click right here and you can get rid of the labels that you have and
you can see them right down here or you can also change uh the font so if you want
to make it orange or or whatever color you can do all those same things and you can
also do things like changing where you see these things so for Action you're going
to see it a ton
07:27:10
because for each year action is on the higher end and so you're seeing
those in those mins and maxes you can also do it for a selected area so if I come
in here and I select it it's then going to show me what those are so label is
really really uh useful really helpful let me get rid of that really quick uh you
can also do it where the lines end so line ends is at the beginning and the end and
you can also take that away or put that back on so labels are really important
labels
07:27:40
aren't very helpful when you're doing at least I don't find that it's super helpful
when you're doing things like genre so when you're doing your Dimensions so I'm
going to get rid of that and I'm actually going to bring our Global sales over here
and let's label that and right now I think it's labeling the uh line ends we want
to do the Min and Max now if we do Min and Max on the table it's just going to give
us the max and the min which is zero and then
07:28:09
139.0 it's a little bit more useful if we do it for each line uh this at least
gives us some context I probably wouldn't do this in an actual visualization
but to give you some um understanding just how it works so now I know that um right
over here the min and the max or the min sorry the max for these for action and for
sports is right around 138 139 so it's pretty easy to see um and you can again go
in here and you can remove the max or remove the mins whichever one you feel is
best uh
07:28:44
you'll probably keep the maximums in there for each category and so this is
really quickly becoming uh a pretty usable visualization and that's not the only
label that you can add we still are using year over here so we can always drop year
in there as well we'll create a label and so now we have let's see for this one is
a puzzle genre so we also have the year that it had the maximum uh sales and so you
know just some things that you can do you don't have to add that now let's go up
here and we're
07:29:18
going to take a look at filters because filters are really important you know if
you are making this for a client or you're making this for somebody you want them to
be able to filter down uh to very specific information that they want to see so
let's take uh the platform lots of different platforms um as you can see you know
PS4 Xbox um if you're familiar with these we'll click all of these um and we'll
click okay so now this is an option as a filter and all we're going to do is
07:29:50
we're going to click on this Arrow right here and we're going to say show filter
now right now all of them are selected so every single one is being taken into
account for this visualization but let's say we come down here and we say okay I
don't want to see sales for any of these PS the original PlayStation 2 three or
four so I'm going to get rid of this one this one this one and this one and you
could immediately see the the changes that were happening so now none of the
numbers none of those
07:30:21
sales are being accounted for and and being added to the sum of global sales right
here at all so that is just how a filter uh can work and you can also do that and
you can get rid of all of them and you can go in and actually just pick very
specific sales so if you only want to see the PlayStation sales you can go in there
and do that as well so really really handy filter are things that you at least want
to have as an option for most of your your visualizations at least that's what I
found especially
07:30:57
when you're doing client facing work they like to uh get in there and mess around
and look at different look at it in different ways and so that's one that I I think
is is really useful to to have the very last thing that we want to do is we want to
actually add this to a dashboard now let's say we come right down here and
we add a new worksheet and actually we might change one more thing on that last one
but we'll just make a really simple one um we'll just give it genre and we'll give
07:31:29
it Global sales as the rows um and this Nifty button right up here which is a
sorting button so I'm going to sort like that I'm going to add the genre in just as
we did I'll give it different colors perfect now we have two really quick different
visualizations right what I want to do is just show you how to combine those
because what you are going to do is you're going to actually come in here and
you're going to do new dashboard that's what this button is right here now when we
come in
07:32:01
here the size is extremely small it's very easy to fix that all we're going to do
is Click right here we're going to go to this range or this dropdown and we're
going to click automatic so now it is a much larger size for us to actually drop
our visualizations into uh and let's put sheet one and we'll put uh let's put
it up top so now it looks a little bit like this uh not perfect but again if I
wanted to make this look a lot better I definitely would and then you can go over
here and
07:32:33
you can rename these things you can also do that back when we were in our actual
worksheets but you can also do it here as well and then start um you know
customizing it and building it out that's not what this video is for that is the
last video we're going to build an entire dashboard it'll be kind of like a small
project you put that in your portfolio um if you have gotten this far and you want
to jump straight into it and you don't want to wait for these other videos to come
out or you
07:32:59
don't you just want to jump straight into creating an entire portfolio project I
have an entire portfolio project series that covers SQL Python and Tableau and so
go check out that series I have one video dedicated to Tableau it's like 45 minutes
or an hour long and it covers a lot of the things that we're going to cover in here
as well as a few other things but I appreciate you checking out this video in
future videos we're going to be going over things like creating bins calculated fields
07:33:29
doing joins and then creating a final project and putting it all together so thank
you so much for joining me I really appreciate it if you like this video be sure to
like and subscribe below and I will see you in the next [Music] video [Music]
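The steps from this first Tableau video (sum of global sales broken out by year and genre, a platform filter, and a sorted genre total for the second sheet) can be sketched in plain Python. The rows below are made up, the field names are assumed to resemble the dataset's columns, and `total_sales` is a hypothetical helper. This only illustrates the aggregation and does not describe Tableau's internals.

```python
from collections import defaultdict

# Tiny made-up stand-in for the video game sales file; field names are assumed.
games = [
    {"Platform": "PS4", "Year": 2014, "Genre": "Action", "Global_Sales": 5.0},
    {"Platform": "XOne", "Year": 2014, "Genre": "Sports", "Global_Sales": 3.0},
    {"Platform": "PS2", "Year": 2004, "Genre": "Action", "Global_Sales": 4.0},
    {"Platform": "Wii", "Year": 2006, "Genre": "Sports", "Global_Sales": 8.0},
    {"Platform": "PS4", "Year": 2015, "Genre": "Action", "Global_Sales": 1.0},
]

def total_sales(rows, *fields):
    """Sum Global_Sales for each combination of the given fields."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[field] for field in fields)] += row["Global_Sales"]
    return dict(totals)

# Sheet 1: year on columns, sum of global sales on rows, genre on color.
by_year_genre = total_sales(games, "Year", "Genre")

# Filter: uncheck the PlayStation platforms, then recompute the same sums.
unchecked = {"PS", "PS2", "PS3", "PS4"}
kept = [game for game in games if game["Platform"] not in unchecked]
filtered_by_year_genre = total_sales(kept, "Year", "Genre")

# Sheet 2: total sales per genre, sorted from highest to lowest.
by_genre = sorted(total_sales(games, "Genre").items(),
                  key=lambda item: item[1], reverse=True)

print(by_year_genre)
print(filtered_by_year_genre)
print(by_genre)
```

Putting genre on color corresponds to adding a second grouping field, and unchecking a platform removes its rows before the sums are taken.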
what's going on everybody welcome back to the Tableau tutorial Series in this video
we're going to be going over bins and calculated [Music] Fields all right so let's
jump right into it the first thing that we're going to look at are bins and bins
are
07:34:11
basically just groupings or ranges of numerical values so we cannot create bins uh
for genre name platform or anything like that we have to do something with this
sign right here which means that it is a numeric so year or all this sales data or
this ranking data and we're going to use what we worked on in our very first
tutorial and so what we're going to be using to kind of demonstrate how bins work
is this year right down here so right now we have a range of 1993 all the way up to
2018 and we're going to create some bins
07:34:42
to group and create ranges for these years and it's pretty simple all we're going
to do is I'm going to come right over here to year and this little drop down on the
side and we're going to go down to create and go down to bins now it's going to say
the size of Bin and it's going to give you a recommendation based off of the
information that is already provided the Min and the max the ranges of these values
you know you don't have to do this but usually um it it does give some
07:35:12
good estimation on what you might be considering if you were thinking hey maybe do
a bin of like 20 and they're recommending two think about why they might be doing
that we're going to change ours to five and you can always change what this field
is going to be I'm just going to give it an old exclamation point just to um really
spice things up here so we're going to click okay and as you can see it adds it
right up here is no longer um it is no longer a numeric now it is a categorical
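To make the binning step concrete, here is a minimal sketch in plain Python of grouping years into bins of size five and summing sales per bin. It assumes bin edges fall on multiples of the bin size; the function names and the sample rows are invented for illustration and are not the video's data or Tableau's internals.

```python
from collections import defaultdict


def bin_label(year, size):
    """Return a label such as '1990-1994' for the bin that contains year."""
    start = (year // size) * size          # lower edge of the bin
    return f"{start}-{start + size - 1}"   # inclusive upper edge


def total_by_bin(rows, size):
    """Sum sales per year bin; rows are (year, sales) pairs."""
    totals = defaultdict(float)
    for year, sales in rows:
        totals[bin_label(year, size)] += sales
    return dict(totals)


# Invented sample rows (year, global sales in millions), not the video's data.
sample = [(1993, 1.5), (1994, 2.5), (1995, 3.0), (2003, 4.0), (2018, 0.5)]
bin_totals = total_by_bin(sample, 5)
for label, total in sorted(bin_totals.items()):
    print(label, total)
```

Once binned, the year behaves like a category: each row belongs to exactly one labelled range rather than to an individual number.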
07:35:42
so it now it's this is no longer just uh 1 2 3 4 five its ranges its groups and
we're going to get rid of this year really quick actually let's keep it up there
for a second uh see what happens but we're going to bring this up and we'll get rid
of this year and this is is what kind of it spits out for us now I did look at the
data um when I was prepping for this there are some nulls in the Years um and so
all we're going to do for this is we're just going to go like this and we're going
to exclude the
07:36:12
nulls uh probably not something you should be doing uh if you're doing this for
work but this is for demonstration purposes so we can do whatever we want but as you
can see we now have these ranges so this range starts at 1990 and it includes 1990
all the way up to 1994 and then it's 1995 to 1999 and so just really quickly we can
tell that the years 2000 to 2004 were a huge huge huge uh season or group of of
years for game sales so these are the global sales for for these video games and so
it is really helpful it's very
07:36:53
useful um you can do this on a lot of different information we could do this on the
sales data you can do this on age you can do it on years like we did and it can be
very very useful and so uh really quickly that is how bins work I would say it's
pretty straightforward now this is a perfect time to segue into the next part of
the video which is calculated Fields uh right over here on this left hand side we
see that the global sales which are in millions goes all the way up to 900 million
and
07:37:22
we created these beautiful bins right down here but let's look within these from
1999 to 2015 let's see which of these has the highest percentage of course it's
going to be this one but we can do something called a quick table calculation uh
we'll create our own calculation later I'll show you how to do that but we're
going to do a quick table calculation and we're going to do the percent of total
and so now we have these bins and instead of just seeing the total amount of sales
that they had
07:37:52
we see the actual percentages based off these year ranges which is really useful
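The percent of total idea can be sketched outside Tableau in a few lines of plain Python: divide each bin's total by the grand total. The function name and the per-bin figures here are invented for illustration and do not come from the video.

```python
def percent_of_total(totals):
    """Convert per-bin totals into each bin's share of the grand total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid dividing by zero when there are no sales at all.
        return {label: 0.0 for label in totals}
    return {label: value * 100 / grand_total for label, value in totals.items()}


# Invented per-bin totals (millions), not the video's figures.
sales_by_bin = {"1995-1999": 100.0, "2000-2004": 250.0, "2005-2009": 150.0}
shares = percent_of_total(sales_by_bin)
for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```

The shares always add up to 100, which makes it easy to compare bins whose raw totals are on very different scales.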
something that you could absolutely put uh in some real work that you do for a
client now really quick just to show you something that you can do if you click
control and you drag this over here you can actually save that calculation so we
can say percentage of global sales and that actually saves it as uh you know a
measure for us so that was a quick calculation but let's look how to actually
create a calculated field so if
07:38:25
we do this right here what is going to come up is just the global sales and you can
do a lot of what you would basically do in Excel multiplication division
subtraction a few other things but we're going to keep it super super simple today
all I'm going to do is I'm going to take Global sales and I'm going to subtract I'm
going to do an open bracket and I'm going to say EU sales and it auto completes for
me I'm going to click okay and it created calculation 2 I'm going
07:38:51
to come in here and I'm just going to say Global sales minus EU sales and let's
drag this over these are different um one's percentage one is in terms of sum and
so I'm just going to bring this in right here and so now we are comparing against
the same thing and if we look at the global sales we have probably right around
950 million-ish in this 2000 to 2004 bin and for Global sales minus the EU sales
we're looking at you know 650 million so there is a noticeable difference and this
is just
07:39:32
one of the ways that you can use calculated fields to actually just show the
difference between two numbers or you can do more advanced calculations depending
on the data that you actually have so that's it for this video I hope you learned a
little bit more about bins and calculated fields in the next video we're going to be
looking at a ton of different visualizations and graphs and charts and just
exploring what options really are out there for visualizing our data thank you guys
so much for joining
07:39:59
me I really appreciate it if you like this video be sure to like And subscribe
below and I will see you in the next video what's going on everybody
welcome back to the Tableau tutorial series in this video we're going to be looking
at lots of different visualizations including the scatter plot and density
maps now before we jump into the tutorial I have some very exciting news in just
two days on October 7th I'm going to be partnering with Alteryx to host a webinar
this webinar is completely for
07:40:42
people who are wanting to change careers to become a data analyst now you
did hear that right I will be the host of the event but we will be bringing on
guests as well who are industry experts who actually changed careers to become data
analysts much like myself they'll be sharing their stories of how they actually
transition careers along with the tools that they found extremely useful and
helpful to make that switch and they'll be giving lots of advice along the way so
if you are somebody who
07:41:06
is wanting to change careers to become a data analyst or just wanting to learn
about data analytics this is an absolutely fantastic place to learn a lot more about
that I will leave a link in the description so be sure to go and sign up for that
again I'm going to be there so it should be really fun without further ado let's
jump onto my screen and start the tutorial now we are about to look at a ton of
different visualizations uh over here you can see just an array of them but not all
of
07:41:31
them are ones that I actually think are useful or ones that I would actually
recommend using and so I'm going to take you through some of the ones that I
absolutely think are worth learning and using and trying out uh and I'm just going
to kind of just show you how I might use them how they might look how you can
navigate them a little bit now before we do that we do need to go download one data
set it's this Starbucks location worldwide yes we're going to do a little bit of
longitude
07:41:59
latitude here and all we have to do is click this downloads button and it will
download we're going to do that into downloads we'll save that uh yeah I've already
done that but you know I'm doing this with you guys I'm doing it for you so let's
go to our downloads now we have it here we want to come in here we're going to
copy it or um you can cut it and then we're going to paste it here yeah replace it
perfect and now we have it ready to go we'll come in here let's do a new sheet
07:42:35
and I already have it in there but uh I'm just going to show you what I would do do
new data source we'll do text file we'll do directory and we will open it and let's
see what data we have in here before we actually begin uh just super quickly we
have the brand so um whatever company has it and then a bunch of um location
information street address City the state this is all in the United States so
that's basically it and what we are going to do is we're going to go over to this
sheet
07:43:11
three and we have this directory 2 that's the one I just pulled in exact same thing
as directory but so the first visualization that we are going to look at is a
bar and line graph so what we're going to take is the year right here take these
Global sales and these na sales and we're going to be doing this one right here so
this has a combination of two separate uh types of visualizations so sometimes you
just have lines sometimes you just have these uh these bar graphs or the bar charts
07:43:43
and we're combining the two and it's very nice I like how this looks now if you
notice if I put this na sales behind it now it kind of cuts off so now this Global
sales is in front we're going to you know put that back I just wanted to show you
that uh right here there's all sum of global sales sum of NA sales so if we go
into this all we click this drop down we can change it to a line um we can change
it to basically whatever we want I just hit Ctrl Z to reverse that but what we can
do is we can go in here
07:44:16
and we can change this color and let's see if we can just make it red is that
possible see what I did I made it orange that works for me um just
something to stick out a little bit more choose whatever color you want and this is
a really nice visualization this is one that I have used in the past we're looking
at Global sales versus the na sales and so it's very easy to see the distinction
between the two and how one was doing a specific year versus how the other one was
doing in that same year so
07:44:47
I really like this if you want to do something uh like keeping it consistent you
can do two bars I don't really like this one as much um and you can again you can
really change it up um there's lots of different ones that you can do again I
prefer the line but you know do whatever you think is best I'm going to change it
back because this is not how I want to keep it but there you go so that is the
first one that we are going to look at let's move on to the second one and we
actually will be using our our
07:45:17
Starbucks data here now when you bring in data that has um any type of map or or um
address or postal code or things like that or or country it's typically going to
create this latitude and longitude it's going to generate that now what we want to
do is bring this longitude right up here and this latitude right there and if you
do the show me right now it's giving us this but what we want to do is add what
we're looking for so what will we actually be trying to search for on this map you
can do
07:45:54
anything from like a postal code um and it will drag us right here let's come over
to this this allows us to kind of scroll around a little bit um we're going to mess
around with this one for just a little bit and let me see if I can that's nice that
might be too big let me back up one so at least in the Continental us a little bit
down here this these are the postal codes so right now we're looking at post codes
uh and there are a lot that you can do with this um really color will make almost
no
07:46:31
difference it just becomes this mess so you don't typically want to do something
like that at least not for this let's go to size and if we make it really small you
can kind of see these groupings these pairings um typically of like larger cities
or major major metropolitan areas and so you can do this and it's and it's really
really easy I don't recommend uh labeling this I don't even know if it'll do it um
it would be an absolute mess to try to label all these postcodes well let's bring
this out and
07:47:05
let's bring these State and provinces in now right now we have these little tiny
tiny uh dots on here and I think what we want to do is not increase the size size
but over here we want to actually do this and make it a map so now it's going to
fill in all the states we can you know why not we'll add some color here um but we
can it has them numbered I didn't think they were numbered um oh that's interesting I
haven't seen that I didn't look at that before I just found that interesting
but now we
07:47:42
can see what uh what states Starbucks is in and as you can see they're in all 50
states but it's something interesting to um look at to think about now if we go
right up here we can again choose a different type and we're going to go to the
density now right now it's just doing a density on the uh the state we're going to get
rid of that we're going to bring back postal code I'm just switching it up on you a
little bit and you can do it as small or as big as you'd like um you know I like to
do
07:48:12
somewhere in the middle um probably right right about there is fine um I don't
think it's going to make sense to really add any color here again all these postal
codes are different so it's just going to be a complete mishmash but this is kind of
how you can use a density map and you can do this with uh countries you can do this
with postal codes you can do this with any type of kind of like address or location
based data so that is how you can use a map again there's lots of different ways to
07:48:42
use a map and so I'm not going to show you every single way but in a really brief
way this is how you can use a map to actually visualize your data that does have
location uh based information in it so let's go over to sheet three uh and this
data that we have over here it just allows for a lot of different types of
visualizations so we're going to use this one um and there are lots of other ones
that you might see out there like this one right here uh we obviously wouldn't be
using this we might do
07:49:11
something like this change the label um and maybe add why have both of these in
here um let's get rid of this oops that's not what I meant let's actually add that
let's do the sum of global sales and we'll just make that into a label as well so
what you can do with these and and how you're able to use them and visualize them
again these are not you'll see these often but these are not often ones that I
would recommend you use that's very similar to these packed bubbles um you can add
these Global sales
07:49:49
in here again add the label it just uh it sometimes is not as straightforward the
information that it's trying to tell you right you kind of have to search for it a
little bit you kind of have to look around um but you can find some good
visualizations in here for very specific types of data and so these are just ones
to consider uh one that you'll see all the time is uh this guy right here and uh
let me see if I can expand this a little bit because this is very small um let's
see we have the I
07:50:28
just want Global sales and let's label that the size I how do I expand this haven't
done this in a while let me just expand this I don't use pie charts what is
happening this is an incredibly large pie chart oh my gosh I am making this um this
is becoming a problem there we go uh and what I actually wanted to do was label the
uh genre as well as I've been doing in all the other ones and we'll label this now
look whether you are a fan of pie charts or not you have to understand that people
07:51:11
use them uh some people just like how they look and for certain data it can do well
for things that have a lot of different um groupings or categories it usually isn't
super great but it does give you some type of order of things give you a quick
glance and people use them right so let's not pretend like it's the
hideous stepchild all right people use it people have it in their dashboards and
their visualizations all over so it's best to just know what they look like know
how
07:51:45
to do them know um how to use them best again I'm not a super huge huge fan of it
myself I've used it once or twice but another one to look out for that again you can come
over here and use is called a box-and-whisker plot um it's good for these
large um distributions you know this is like the median upper upper lower lower I
don't use these a lot but I know a lot of people who love them something to just
look at and or mess around with it a little bit it's pretty I think straightforward
and it does give you
07:52:23
some good insight into your data if you know how to use it now there is one last
one that I want to show you I'm just going to create it on a new sheet make it easy
uh we'll do year here we'll do sum of let's do NA sales why not and we are going
to make this like this now it's very similar to a line chart but when we break it
out by the genre and we add some color you know it's just a different way to
visualize this information you can uh you know potentially add some stuff in here
like
07:53:00
some labels if you uh want to depending on how it looks for you but this is just
another way to visualize the data so wanting to give you guys some options wanting
to give you some things that you might want to look at if you haven't already used
these before every single one that I've showed you are ones
that I've at least used once um this one I maybe have literally only used once but
the first ones that I showed you the ones I pointed out as the ones that I really
07:53:31
wanted you to know are great visualizations to learn how to use and learn how to
make useful for the data that you have with that being said that is all that we are
looking at in this video again I tried to keep it super easy just wanted to show
you some different visualizations the data that you can use to get those
visualizations and just some other options in case you wanted to get a little bit
uh spontaneous a little bit out there a little bit funky uh to show your boss or
something like that thank you guys so
07:53:59
much for watching I really appreciate it if you like this video be sure to like And
subscribe below and I will see you in the next video what's going
on everybody welcome back to another video today we're looking at joins in
Tableau now before we get into the tutorial I want to give a huge shout out to
today's sponsor and that is Udemy they are having a massive Black Friday sale
and so everything is about 85% off so if you've been looking at a course now is the
time to buy it if you are
07:54:44
looking at learning and taking an actual full Tableau course there are fantastic
ones on Udemy that I have taken myself so be sure to go and check out Udemy while
they're having this huge sale I will include a link in the description if you want
to check them out now let's get into the tutorial all right let's get started and
first we're going to start off in Excel I'm going to kind of walk you through the
data that we're working with and then we're going to put it into Tableau and I'm
going to show
07:55:06
you how to do all those joins in Tableau so the first table that we have is this
demographics table we have employee ID name of employee employee age and employee
gender now look right here because this will be important uh going forward in the
demographics table we have 10 uh individuals and they each have an employee ID now
when we go to the job title we have our employee ID employee name and the job title
but this one is missing Ryan Howard is missing his employee ID and then the very
last one there are only seven employee IDs
07:55:41
and no names um and so we're going to use all of that and I'm going to show you how
to actually do the joins into Tableau Tableau does a really fantastic job of
visualizing for you so it takes a lot of the guesswork out um I am going to include
a link to my joins video in SQL because these two are very closely connected and
and if you understand how the joins work in in SQL you'll understand how the joins
work in Tableau it's almost the exact same thing so with that being said let's jump
over to
07:56:11
Tableau so I'm going to pull this up I'm going to go right over here and now we have uh
where we can connect to our data and so we're going to click Microsoft Excel
I'm going to scroll down here to Tableau joins file I'm going to open this up and I
have it open so I can't use it so let me get rid of that and let's open it again
perfect so now what we're going to do and I'm going to show you how to actually
open up the joins um in a second but what you need to understand is when you first
come
07:56:40
here Tableau doesn't automatically allow you to to use the joins they use something
called relationships and there are joins on the back end but they call it
relationships because they are inferring all of these things they're trying to go
in and make that inference for you so it takes a lot of the work off of you and
most of the time that works and and you know you just plug these two things in here
like a demographics and the job title and it is going to you know help you build
those what they call relationships and you can
07:57:11
click on this and learn how the relationships differ from joins again there's not a
huge difference but it's not as customizable and you can't as easily do left
joins or full joins or all these things that we're about to look at so uh I'm going
to take this one off and what we're going to do to actually be able to look at the
joins and and choose what joins we want to use is we're going to do this dropdown
we're going to click open and so now we are in a place where we can actually create
the
07:57:40
joins uh and again it's just much more customizable and so um back when I was using it
regularly I would use the relationships when it was pretty simple and
straightforward because they almost always got it right but uh you know the
joins it it just makes more sense in the way it visualizes it for me so most of the
time I'd be using the joins so let's pull over this job title right here and it's
going to make this connection now before if you remember just about you know 30
seconds ago when it connected
07:58:14
them it was just a line and and so it gave us the this option down here to kind of
edit the relationship but now it's giving us this visualization and so let's click
on it really quick and what is going to come up is the different types of joins
that you can do you can do an inner join a left join a right join and a full outer
join and then you can actually choose the different uh data sources and how you're
connecting them so again um I'm going to walk through a little bit of this but I
think
07:58:43
the SQL video that I did on this shows it so well um I would highly recommend
using that um and I recommend learning SQL too so you know two birds one stone so
I'm going to get into each of the joins how they work what data is going to be
displayed um and these visualizations are really going to be helpful and I think
that it's it's just nice that they have it because it's a little reminder okay um
you know this is what this join is or this is what that join is so super super
simple so right
07:59:13
now we have the demographics table and we have the job title table and so what it's
doing right now and let's get rid of this what it's doing right now is it's doing
an inner join and so it's pulling everything that overlaps if it matches on the
employee ID and the employee ID and so right now you only see 1001 through 1009 but if
you remember in the demographics table we had uh 1001 all the way through 1010 so
where's that 10th one well the 10th one is not there and that is because in this
job title
07:59:46
employee ID it only went up to 1009 and then Ryan Howard just didn't have an employee
ID in there for whatever reason so that data is going to be missing now when you
are using actual data sets very large data sets which we will use in the next video
when we walk through an entire project um when you use large data sets this can be
the difference between clean data and very wrong data and and visualizing it
correctly and showing completely wrong numbers and so you really need to be sure
you understand
08:00:19
how your data works together when you're doing these joins so how can we fix this
how can we um make it to where we can see all of the data well right now we're only
making it to where if the employee ID is equal to the employee ID so we only are
going to see through 1009 and through 1009 we're never going to see Ryan so there are
two different types of joins that we could do to make it see it and then there's
something else that we can join on to where we can see that data the first that we
can look at is
08:00:48
the right uh join and what this does is it's going to take everything that is the
same but also everything from this job title table regardless of if it has a match
in the demographics table so it's pretty you know this visualization does it all
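The difference between the inner join and the right join on employee ID can be sketched in plain Python. This is an illustration under stated assumptions, not how Tableau implements joins: the function name is hypothetical, the sample rows and job titles are invented and much smaller than the video's tables, and a missing ID is modelled as None that never matches, as a NULL key behaves in SQL.

```python
def join_on_id(demographics, job_titles, how="inner"):
    """Join on employee_id; how is 'inner' or 'right' (job_titles is the right table).

    Returns (demographics_row_or_None, job_title_row) pairs.
    Assumes employee_id is unique within demographics.
    """
    if how not in ("inner", "right"):
        raise ValueError("how must be 'inner' or 'right'")
    by_id = {row["employee_id"]: row for row in demographics}
    pairs = []
    for job in job_titles:
        key = job["employee_id"]
        # A missing (None) ID never matches, as with NULL keys in SQL.
        match = by_id.get(key) if key is not None else None
        if match is not None:
            pairs.append((match, job))
        elif how == "right":
            pairs.append((None, job))  # keep the unmatched right-table row
    return pairs


# Invented sample rows; only the Ryan Howard case mirrors the video.
demographics = [
    {"employee_id": 1001, "name_of_employee": "Employee A"},
    {"employee_id": 1002, "name_of_employee": "Employee B"},
    {"employee_id": 1010, "name_of_employee": "Ryan Howard"},
]
job_titles = [
    {"employee_id": 1001, "employee_name": "Employee A", "job_title": "Title A"},
    {"employee_id": 1002, "employee_name": "Employee B", "job_title": "Title B"},
    {"employee_id": None, "employee_name": "Ryan Howard", "job_title": "Title C"},
]

inner_rows = join_on_id(demographics, job_titles, how="inner")
right_rows = join_on_id(demographics, job_titles, how="right")
print(len(inner_rows), "rows from the inner join")
print(len(right_rows), "rows from the right join")
```

With this sample the inner join drops the Ryan Howard job title row because its ID is missing, while the right join keeps it and leaves the demographics side empty.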
it's going to show everything in the right table regardless and it's only going to
show things from this table if there's a match so let's try this one and we should
see Ryan Howard in the job title table so let's click on it and if we scroll down
there
08:01:17
going to be null null null null null until we get to over here where we now have the data that we
had in that actual table but again this wasn't a match and so we weren't able to
see that data so this gives us a way to where we can see all of it um all
everything from that right table this job title table and now we're going to click
on the full outer now the full outer is going to take everything from both
regardless of if there is a match at all and so right here you're going to see Ryan
Howard and Ryan Howard
08:01:50
now why are there two different rows for it well because in the demographics table
there was an employee ID so we're seeing the employee ID Ryan Howard his age and
his gender and over here there was no match right but in the job title table again
this one didn't have an employee ID and so we we are going to be able to see this
data but over here it has no match and so that's why showing us two different rows
is because there was no connection there was no match there that's what a full
outer join is
08:02:23
going to do now just for uh the purposes of seeing what this one does as well we
have the left join um and now we are able to see the 1010 that we
didn't see before um and it's putting in nulls over here because there's no match
so that is um what we have so far now like I said just a second ago
there is a way that we can do this without using the employee IDs we're allowed to
use a different join Clause now there is the name of the employee in both of them
this one is
08:02:58
called name of employee and in the job title it's called employee name they don't
have to have the same column name in order to join it you can do whatever you want
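Joining on columns with different names can be sketched in plain Python by passing the key column for each table separately. The column names name_of_employee and employee_name follow the video; the function name, the left and right prefixes on the output columns, and the sample rows are invented for illustration.

```python
def inner_join(left, right, left_key, right_key):
    """Inner join two lists of dicts whose key columns have different names."""
    index = {}
    for row in right:
        if row[right_key] is not None:  # missing keys never match
            index.setdefault(row[right_key], []).append(row)
    joined = []
    for l_row in left:
        if l_row[left_key] is None:
            continue
        for r_row in index.get(l_row[left_key], []):
            combined = {f"left.{k}": v for k, v in l_row.items()}
            combined.update({f"right.{k}": v for k, v in r_row.items()})
            joined.append(combined)
    return joined


# Invented sample rows; only the Ryan Howard case mirrors the video.
demographics = [
    {"employee_id": 1001, "name_of_employee": "Employee A"},
    {"employee_id": 1002, "name_of_employee": "Employee B"},
    {"employee_id": 1010, "name_of_employee": "Ryan Howard"},
]
job_titles = [
    {"employee_id": 1001, "employee_name": "Employee A", "job_title": "Title A"},
    {"employee_id": 1002, "employee_name": "Employee B", "job_title": "Title B"},
    {"employee_id": None, "employee_name": "Ryan Howard", "job_title": "Title C"},
]

by_name = inner_join(demographics, job_titles, "name_of_employee", "employee_name")
for row in by_name:
    print(row["left.name_of_employee"], row["left.employee_id"], row["right.employee_id"])
```

Joining on names only works cleanly when names are unique and spelled identically in both tables, which is why an ID is normally the preferred key.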
so I'm going to get rid of this one and now we are only tying it on the employee
name and let's do an inner join and it should be basically everything um except the
only piece of data that wasn't filled in which is that 1010 over on the job title
table and so this way was a slightly different maybe uh less thought of way because
normally
08:03:34
you do it if there's an ID you go on the IDS but because we had a lack of data for
in in one of the tables in the job title table we decided to use a different column
to to join on and now we're able to look at all the data together so super quickly
that is an inner join a left join a right join and a full outer join and it's
pretty easily visualized here and you're able to uh change what you're joining on
right here but you're also you can do multiple so if we want to do the employee ID
and the
08:04:09
employee ID you can do that as well and you can keep going as as many as you'd like
um and right here or you can change some of these things uh I don't there aren't a
lot of use cases for this um but you know you can absolutely do this um and mess
around with this as seen I'm not going to go through it in the tutorial because
again 95 plus percent of the joins you're doing you're going to want to do it to where
this equals this um and if you want to get into where it doesn't
08:04:40
equal or or all these other things which is more complicated I think it's much
better to learn that in SQL uh that's my personal preference and so um again all in
the SQL tutorial if you want to check that one out so you're able to join on
multiple things now let's get rid of that one because we can actually bring in this
salary one as well and what you'll see right down here is that we have our employee
ID and this is all coming from the demographics so employee ID name of employee
employee
08:05:10
age employee gender then right over here we have the job title table so employee ID
job title employee name job title and then right over here was or is our salary
table and so we have employee ID salary and employee salary so again this is a way
that you can put all of this data into one place and and just a second we'll go
into the worksheet right down here I'm going to show you kind of how it looks
because it looks a little bit different um than previous tutorials and so I want to
show
08:05:42
you how that actually all works together um but again you can create these joins um
as well and do the exact same thing that we just looked at and customize the joins
customize what you're what you're um uh joining on and then you have your finished
product and so right now we have our demographics plus Tableau joins file and we
can rename that if we want I'm going to call this um demographics plus joins demo
and click enter and so now that is saved so so now let's go down to the go
08:06:18
to worksheet we're going to click on that and so up here on our left side this may
look a little bit different than it normally does um because it's broken out um on
the measure names and the measure values it's broken out by the tables that they
were joined on so we can pull in the employee gender now and we can pull in the
employee name now um and we can pull in the employee ID again if we want to from
the job title table and we can pull in the employee ID from the salary table we
could do that
08:06:47
if we wanted to it makes no sense uh uh for actually creating any visualizations
but you know you can do that and so you probably you wouldn't be able to do that if
you hadn't joined these together and so down here in the measure values the values
that we have are from the demographics table and the salary table all of the um all
of the stuff from the employee title none of those things were um values and so we
can't use there are going to be no values down here and so really quick let's take
the name of the
08:07:17
employee let's take their salary sure why not um let's order that let's take the
employee salary we'll do color and uh expand this out a little bit maybe one more
time oops just like that and there you go so that is how you do joins in Tableau
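As a recap of the full outer join from earlier in this video, here is a minimal plain-Python sketch of why Ryan Howard showed up on two separate rows. The function name and sample rows are invented for illustration, and a missing ID is modelled as None that never matches.

```python
def full_outer_join(left, right, key):
    """Full outer join on key; rows with a missing (None) key never match.

    Returns (left_row_or_None, right_row_or_None) pairs.
    """
    right_by_key = {}
    for row in right:
        if row[key] is not None:
            right_by_key.setdefault(row[key], []).append(row)
    matched_right = set()  # identities of right rows that found a match
    pairs = []
    for l_row in left:
        matches = right_by_key.get(l_row[key], []) if l_row[key] is not None else []
        if matches:
            for r_row in matches:
                pairs.append((l_row, r_row))
                matched_right.add(id(r_row))
        else:
            pairs.append((l_row, None))   # left row with no partner
    for r_row in right:
        if id(r_row) not in matched_right:
            pairs.append((None, r_row))   # right row with no partner
    return pairs


# Invented sample rows; only the Ryan Howard case mirrors the video.
demographics = [
    {"employee_id": 1001, "name": "Employee A"},
    {"employee_id": 1010, "name": "Ryan Howard"},
]
job_titles = [
    {"employee_id": 1001, "name": "Employee A", "job_title": "Title A"},
    {"employee_id": None, "name": "Ryan Howard", "job_title": "Title C"},
]

outer_rows = full_outer_join(demographics, job_titles, "employee_id")
for l_row, r_row in outer_rows:
    print(l_row["name"] if l_row else None, "|", r_row["name"] if r_row else None)
```

Both Ryan Howard rows survive, one from each table, because nothing links them on employee ID.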
and I think Tableau does a really fantastic job of making it pretty simple they
have the different types of joins when you click on that that join button and it
shows you the inner and the left and the right and the full outer and they make it
pretty
08:07:54
simple um and and and it's just really useful to be able to see that while you're
creating it and see the output below like we just did a second ago it it just makes
it so simple to create those joins and then just keep going because you already
know what your output is going to be and you can kind of mess around with it and
make sure you're getting the data that you need in the very next video we're going
to be doing an entire project in Tableau we're going to be using a lot more data and
08:08:19
it's going to be a a complete project that you can add to your portfolio and it's
going to be a really good time so I hope that you join me for that one I
appreciate your time I hope that this was helpful thank you guys so much for
watching I really appreciate it if you like this video be sure to like And
subscribe below and I'll see you in the next video what's going on
everybody welcome back to the Tableau tutorial Series this is our very last video
in the series and
08:08:53
today we'll be doing an entire project now if you're watching this video I
hope that you watched the other four videos in this series just so you can get the
basics down and you kind of know what you're doing uh this won't be a crazy hard
project this is a beginner tutorial series so I'm trying to make this super easy so
you can follow along nothing super complicated I promise and if you were
wanting to go above and beyond and just make a lot of different dashboards or try a
lot of different
08:09:22
things there's a ton of data in here and so I'll show you some of the things that I
would do you know as we go through it of the things that I would be looking at and
some of the different visualizations that I might do as well but again in this
video we're going to be sticking to a lot of the basics but I'll switch over my
screen in just a second I will show you the final product and then we will actually
walk through step by step of how to do the entire dashboard and at the end you
should have a completed
08:09:44
project that you can add to your portfolio or you know just share on LinkedIn if
you want to do that as well with that being said let's jump over to my screen and
let's get started all right so let's get me off screen and show you what we're
going to be working on today this is the final dashboard that we're actually going
to be building and so it's nothing crazy right I'm sure you have seen all of these
things before um and I'm just going to help you kind of build it out show you what
to do the
08:10:07
buttons to click um and it's really going to be a simple walk through by the end of
this you should be able to do all these things very easily and I highly encourage
looking at the data and looking at these visualizations and seeing what else you
can do with it there's a lot of different colors a lot of different visualizations
um that you can do with this data I'm just showing you this today and so the more
you go out there and the more you do this on your own and you mess around with
stuff
08:10:33
and and choose different things and see how it all works the better you're going to
get and so I highly highly encourage doing that uh so what we are going to be
working with today is an Airbnb data set I'm going to show you that in just a
second and I'm going to show you the data and we're going to just jump right into
it all right so this is the data set that we are going to be using this is the
Seattle Airbnb open data set and let's scroll down really quick um there's three
different csvs in here and
08:11:01
so this is some of the data that we're going to be working with um some dates for
listings and some pricing and then there's the actual listings file that shows um the
actual street address the location the price the bedrooms all of that good
stuff and then there's a reviews file um and it has you know some comments and you know
talks about some of the reviews so this is what we're going to be working with but
you don't have to go in here and download it I have already combined all these csvs
08:11:33
into one I've put it on the GitHub so I'll have a link below so you can just click
on that and you don't have to do all the stuff that I did to get this set up um
just so you know this is from 2016 so this data set is a little bit old if you want
to you can come right here and I will leave this link as well and you can get the
data set from you know what is this a couple weeks ago uh this is they they are
continuing to update this this is always updated and so you can go ahead and
download these but some of
08:12:00
these are .csv.gz files um so you may need to like convert them I don't want to go
through that process um you know in the video and so I am just going to go with
what is literally in Kaggle um and use that but if you want to have an updated
one for your project I just advise you to go in here and grab it yourself and that
should be perfectly good so go ahead and download the data set from the GitHub and
we should be good to go so this is the Excel that I was just talking about this has
all of
08:12:31
our csvs in one place this is you know an Excel workbook so in this reviews
actually let's start with the listings because that's kind of where it all stems
from uh we have our listing and the data in here is um you know
really extensive there's a lot of data in here so let's go over it really quick um
the listing refers to the actual home that they're renting out the Airbnb so it
shows their location um and there's a lot more location information over here I'm
08:13:02
getting into it in just a second so there's the neighborhood the city state um
zip code all stuff that you know may be useful there's a latitude and longitude it
shows what type of property it is so that's really really good um right over here
it has you know how many bathrooms bedrooms and beds um you know sometimes if it's
a five bedroom house it has seven beds so that's why there's those two different um
Fields I don't know if you're familiar with Airbnb and
08:13:31
and you know what they have on there but just something to note uh they have the
price this is the price per day this is a weekly price a monthly price and if
there's a deposit needed uh and then a cleaning fee as well so a bunch of financial
data that's you know super useful we go into it a little bit but there's so much
you can do with that um you know if you want to dig into that and that's kind of it
the rest of it's pretty uh pretty useless um and there's a lot so there's so much
data in here
08:14:00
almost you know more than half by far is nothing you would put in any type of
visualization um and this is pretty common uh you're not going to get data where every
column is one you're going to be able to use a lot of times it's just a lot of
useless junk and so you have to know what you're looking for and know uh you know
what's actually useful so that's the listing then we have reviews now what's really
a little bit confusing in here and something that you just need to kind of
understand about the data um
08:14:28
and something that if you get a data analyst job you need to understand
your data because it's very easy to come in here and say okay there's an ID
field and here's an ID field so that means that those are the same well not in this
case um this ID field is actually the review's ID not the reviewer ID that
refers to like the person this is the review's ID this listing ID is the actual ID
from the listings table right there so really important to note um and so then they just
have their comment there what they
08:15:03
left as a review and then on the calendar um I don't know why I'm scrolled down uh
we have this listing ID again so again that listing ID is equal to the ID in this
listing table and we have a date and a price so this refers to a specific location
and on this day they got $85 for it somebody rented it out um and so then there's
these like T's and Fs um let's try to find a blank one really quick here's a blank
one so there's these T's and Fs uh the t means that it was taken um the f
08:15:35
means that it's vacant I don't know exactly what the t and f mean but
we can deduce that much from this and so you can see when and how much this
person was making or this home made uh in that time so really really good data in
here there's a lot to work with um and so I'll
give you a little bit of a use case for it in a second and then we're going to
start building out some of the visualizations
for that use case uh
08:16:07
again you could have 20 different use cases for this data or more um honestly for
this data where you can build out different dashboards and different reports
literally with just this data but you know we're doing a pretty General broad
project and so it's hard to answer all of them so let's jump over to Tableau we're
going to get started on this and we are going to build out everything all right so
let's come right here uh this is a Microsoft Excel file we'll open that up do this one
we will open
08:16:41
it and give it just a second says it's executing the query it's pulling the data in
all right so we have our calendar our listing and our reviews those are the
different tabs at the bottom we're going to start with the listing this is
kind of the main one um you know I didn't show you but there's
about 3,600 locations that they had in there uh let's just have it update
automatically I don't know why we need to click on that but um so we have this
08:17:15
listings we have our calendar and our reviews what we're going to do is we're going
to come in here and we're going to open it as we did in our very last video uh for
the joins so now that we've opened it we can kind of go in here and we can do the
joins as um as needed and so let's go over here and we're going to uh let's start
with calendar put it right there that was super slow I apologize all right let's
wait for it to get the data start setting everything up did not think it would take
this long
08:17:57
I apologize no take your time so let's click on here and right now it has the uh
the join based on the price which obviously is not going to work um and if you
remember there is no ID in this calendar it's just the listing ID um we can
actually look right here there's just the listing ID so we're actually going to put
listing ID is equal to ID and right down here we can see that we have a lot of
well you can't see it um but it shows that there is a lot of data um and so we know
that that is
08:18:37
correct we know that that is now pulling in data correctly because it's showing up
down here so that's a good thing now in this listings table there are
about 3,600 listings and so all the data that's in listings is going to be in
there but on the calendar because we converted from a CSV to an Excel workbook it
isn't able to store as much information so some of the rows in calendar may have
gotten cut off so we can just keep this as an inner join because we know that if it's in
listings
08:19:09
it is going to be in calendar um there may be some in calendar
that aren't in listings so if we really um you know if we really really wanted
to we could do a full outer or something like that I haven't really thought
through this as I'm talking through it in my head but we know that uh everything
that's in listing is going to be in calendar and so you know we don't really need
to do anything other than an inner join and we can also pull in these reviews and
it's going to do the same
08:19:43
thing as before where it's just kind of pulling in the data and it defaults to ID equals
ID now we know that that is not correct um because the ID in here is referring to
the review ID we need to go to the listing ID so we need the listings ID to you
know be tied to that listing ID if we do the ID it goes down to 2,555 rows
and that's just you know random luck there happen
to be some numbers that are in both fields um that tie together if we do the
correct one
08:20:18
where we hit the listing ID it bumps it up to I think 2,373,000 oh maybe more than
that uh 23 million rows right a lot more and so it's super important to get
these joins right to tie them together on the right Fields if you just do it based
off what Tableau tells you because it has that automated um you know it goes into
these fields and says okay these are the same exact column name so they're most
likely going to be what you're looking for well it was incorrect in this point so
it's really
08:20:49
important to check those things and make sure you're pulling in the right data
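A minimal sketch of why the join field matters, written in plain Python rather than Tableau. The tiny tables and the helper name inner_join are made up for illustration; only the column names id and listing_id come from the walkthrough. Joining on the two same-named id columns produces a match purely by coincidence, while joining the listings id to the reviews listing_id pairs each review with its own listing.

```python
# Hypothetical miniature versions of the listings and reviews sheets.
listings = [
    {"id": 1, "zipcode": "98134"},
    {"id": 2, "zipcode": "98101"},
]
reviews = [
    {"id": 2, "listing_id": 1, "comments": "Great stay"},
    {"id": 7, "listing_id": 1, "comments": "Very clean"},
    {"id": 9, "listing_id": 2, "comments": "Nice view"},
]


def inner_join(left, right, left_key, right_key):
    """Return (left_row, right_row) pairs whose key values are equal."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        (left_row, right_row)
        for left_row in left
        for right_row in index.get(left_row[left_key], [])
    ]


# Same-named columns (listings.id = reviews.id): listing 2 is paired with a
# review that merely happens to carry the number 2 as its own review id.
wrong = inner_join(listings, reviews, "id", "id")

# Intended relationship (listings.id = reviews.listing_id).
right = inner_join(listings, reviews, "id", "listing_id")

print(len(wrong), len(right))
```

Comparing the row counts of the two results is the same sanity check the walkthrough does by watching the row count change in the data source pane.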
again we're going to keep it as that inner join um you know if you wanted to you could
try to see if there's any other data that ties together we're keeping it simple today
but sometimes you need to join on multiple fields uh so just you know a tip so
let's get out of here um and we are good to go so this is our listings plus Tableau
full project that's what we'll be
08:21:15
working with um and we we were able to tie all three of these um you know as you
call them tables or sheets or whatever you want to call them we were able to tie
them together so let's go over here to our first worksheet uh let's see all right
so this says Tableau Public only works with less than 15 million rows of data we
have 23 million rows of data that is uh that's a problem um and when I did this
before it didn't do that so I you know we're going to work through this together so
this is
08:21:46
date reviews I believe this is date for um this is date for the calendar which is
going to be a lot of rows of data and so I'm sure that's part of it let's see let's
do years we only want 2016 oops we only want 2016 let's do okay let's see what that
does let's see if that gets us under what we need um we only want 2016 data anyways
so if it's in 2017 we were going to take it out anyway so we'll see if that gets
us underneath if this ends up taking like
08:22:33
20 minutes I will just cut it and you know you won't have to wait as long as I'm
waiting so let's see how long it takes all right so it took about 20 minutes and it
did absolutely nothing um one thing I do know is that we don't actually use this
reviews table at all um it was just for demonstration purposes so we're going to remove
that and let's see if that helps us in any way if it does we're just going to keep
it as is um you know the reviews table is really just for demonstrating how to
08:23:14
do the join but we weren't actually using any of the data for any of the
visualizations although you could again I'm going to see how long this takes uh and
I'll cut ahead all right so that worked uh perfectly it apparently took out all the
rows that we needed to get under that limit again I was
just doing that to show you with those joins how you needed to change the
columns to make sure that it joined properly we don't actually use it for any of the
visualizations so the end
08:23:46
product is going to be totally fine I don't know why uh this didn't happen to me
when I when I created this whole thing already um so just going to move forward
because uh I make mistakes so uh let's keep moving the first one that we are going
to make is that uh is that colorful one I'll probably pop it up on screen so you
can see it uh well if I remember I'm going to pop it up on screen um it's the
colorful one it's the price by ZIP code so we're going to be looking at these zip
codes and kind of
08:24:14
see um you know how expensive is each zip code um and before we actually start I
just remembered I want to talk to you about the use case for this data I want
you to imagine that you're working for somebody and they're like hey you
know I want to start an Airbnb business I want to know where I should go where
should I buy a home put it up on Airbnb and start renting it out where's the
best place you know what are some of the factors that I should be looking at uh
and so
08:24:47
that's kind of what our use case is so some of the things that he
cares about are things like bedrooms um location which is really important and what
price he's actually going to get how much money can he charge and so he's
trying to optimize that to make sure that whatever rental he gets he can make the
most profit from it instead of choosing something that you know he thinks would work
but you know in the end he's actually not making that much money so those things
are important so
08:25:15
that's our use case we're trying to help this guy out help him find a really good
Airbnb um so let's take a look at these zip codes real quick we have uh quite a few
of them and there's one that's null uh we'll exclude that if it doesn't have
a zip code we'll just exclude those because they're not going to show up on
these visualizations anyways um and so we want to look at the price so we just want
to find uh the price which should actually be down
08:25:42
here and not the sum uh no we want to look at the average price and let's order
that this is great um so this is the most expensive one uh ZIP code 98134 at $206 uh
for the average price uh but let's give that some color really quick let's uh
where's the ZIP code it's up here so let's take that zip code we're going to put it
right over here we're going to do color and it's going to give it some uh assorted
colors now these colors are going to um when we do the map in just a
08:26:20
little bit these colors will um match what we're doing in there and so you know I
like to try to color coordinate things um we're not going too crazy with the
colors today so this is our very first visualization congratulations it is uh it is
complete so uh we can label this one and we can just do price by zip code and I'll
make that bold I don't know I usually like it bold we'll apply we'll do like that
and boom first one is done uh and this is our starting place to say uh Hey person
08:26:59
who's looking to buy this Airbnb here are the zip codes where they are able to
charge the most um for their Airbnb so let's go over to the second sheet and we
are going to be doing the map and so um the map is pretty easy
once you actually get the data that you need although there's a lot of different
data that you can use for the actual map right here you need something that shows
um the location and there's a lot of things that show location in here in
08:27:32
fact they already um provide a latitude and longitude and then at the bottom they
generated a latitude and longitude from some different um fields and then
there's just a bunch of different um there's states there's zip codes
there are uh I think another one yeah like country there's a lot of location data
in here so which one do we want to use we want to stay consistent we don't want to
deviate from that and start using different um longitude and latitude uh
coordinates
08:28:06
because that could throw off our our results completely we want to stay consistent
with what we're using so we actually want to use this ZIP code but when we pull it
up here it's going to give us uh basically the same um you know it's going to show
these zip codes but right over here we're going to click on this
one and now it's going to separate them out so now we have all of these um you know
kind of separated out what you might get when you first do this um is it might look
08:28:31
like this you may have to zoom in um I know that that happened to me the other time
excuse me go to here that's what happened to me uh just when I first did it so uh
know that that may happen and we want to change the colors the exact same way that
we did them before so we're just going over here we're doing color and these colors
um they should match up with the other ones let me um exclude
this let me see if it does 98134 that's the blue and right over here 98134 that's a
08:29:13
blue I believe they are going to be the same yep and so just scrolling
back if you look at the ZIP code on the far right uh they are the same so if you're
looking at like this section right over here I'm just wanting to make sure I'm not
going crazy uh before I get into this and realize I'm not correct at all so uh now
what we want is you know this doesn't really give us any information if I was just
to glance at this map I would have no idea what you're trying to show me um any
08:29:42
information off this so we want to show some actual information so first thing that
we're going to do is we're going to actually add the label to this so that you can
see it you know when you're going over here and you see okay here's this um zip
code um in the dashboard when we create it you can click on this but if you just
want to do it visually without having to click anywhere you'll be able to see okay
98134 that's right here so this location right here is you know able to
08:30:10
charge a lot of money it's probably a really nice neighborhood so um and we can
back that up by putting the average price so these two visualizations
really go hand in hand we're going to add oops not the sum this one
needs to be the average so you go to this measure the sum go to average and there
you go and these should match so this should be 206.125 206.000 so this all matches
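A rough sketch of the sum versus average distinction for price by zip code, in plain Python. The rows and prices are invented for illustration and are not the real figures from the workbook; the null zip code is skipped the same way the walkthrough excludes it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical listings; real data has thousands of rows and many zip codes.
listings = [
    {"zipcode": "98134", "price": 250.0},
    {"zipcode": "98134", "price": 150.0},
    {"zipcode": "98101", "price": 120.0},
    {"zipcode": "98101", "price": 100.0},
    {"zipcode": None, "price": 90.0},
]

prices_by_zip = defaultdict(list)
for row in listings:
    if row["zipcode"] is None:
        continue  # exclude listings with no zip code
    prices_by_zip[row["zipcode"]].append(row["price"])

# SUM grows with the number of listings in a zip code; AVG does not.
total_price = {z: sum(p) for z, p in prices_by_zip.items()}
avg_price = {z: mean(p) for z, p in prices_by_zip.items()}

# Order from most to least expensive, like the sorted bar chart.
ranked = sorted(avg_price.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Because the bar chart and the map label are both built from the same average, the two numbers for a given zip code should agree, which is the check being made in the walkthrough.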
um and we can uh we can actually change that size a little bit if you want to
actually get
08:30:50
it in um get it within each of these things you know adjust it as you see fit I
think that's fine right there um no need to mess with it anymore all right so let
me see I think that is everything for this one I don't know if I want to add
anything else uh no I'm going to keep it how it is so that is our second
visualization again these ones are directly uh correlated and you know
there's just different ways to visualize it this one you can see actually on the
map where it
08:31:24
is and the average price this one you can see from highest to lowest so again you
know sometimes when you're doing these visualizations you're going to have these
accompanying um uh these accompanying visualizations in your dashboard that's very
normal so let's move over to the third one and for this third one um you know
something that our guy was looking at is he's like okay well you know I'm thinking
about listing it on Airbnb but I also want to live in it so I want to know the best
times to
08:31:57
actually um you know put it on the market for people to be able to use and so I was
like okay man no problem uh let's take a look at when people are
spending the most money in Airbnbs and we actually had that calendar um if you
remember let's look let's see this calendar so we have this available the date the
listing all of that stuff um and let's look at the date in here uh and we obviously
don't want it like this we want it to be more uh more of a time series and we're
going to
08:32:34
be doing that based off of uh the price for the calendar so let's go see if we can
find that really quick okay here's the price where is that calendar one let me see
okay there's the calendar oh here I totally forgot where that was supposed to be o
that looks terrible okay um let's see let's let's start working on this because
this needs some work obviously uh this is the worst visualization I have ever seen
um so we need to work on this a little bit what we need to do is we need to change
oh
08:33:17
whoops we need to change the way that these dates are shown so right here
these are two separate things so if I go right here and I do it by quarter it's
just going to change the quarters here right that isn't really helpful we
actually want to keep the year here what we want is to
separate it by year um but within the year let's just do I don't know let's
try week and see what it looks like okay this is great this is what we're
looking at
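A small sketch of the same idea outside Tableau: keep only 2016 rows from the calendar and add up price by week. The rows are made up, the prices are assumed to be already parsed into numbers, and ISO week numbers are used here as an assumption; Tableau's own week numbering may differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical calendar rows (listing_id, date, price).
calendar = [
    {"listing_id": 1, "date": date(2016, 1, 4), "price": 85.0},
    {"listing_id": 1, "date": date(2016, 1, 5), "price": 85.0},
    {"listing_id": 2, "date": date(2016, 1, 12), "price": 120.0},
    {"listing_id": 2, "date": date(2016, 7, 4), "price": 200.0},
    {"listing_id": 1, "date": date(2017, 1, 2), "price": 85.0},
]

weekly_total = defaultdict(float)
for row in calendar:
    if row["date"].year != 2016:
        continue  # drop the stray 2017 rows so the last week is not cut short
    iso = row["date"].isocalendar()
    # Key on (ISO year, ISO week) so weeks near New Year are not mixed up.
    weekly_total[(iso[0], iso[1])] += row["price"]

for week, total in sorted(weekly_total.items()):
    print(week, total)
```

A week that only has a day or two of data will always look like a sharp drop in a sum, so filtering out partial periods before plotting is a reasonable habit.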
08:33:45
again um if we went back and changed this to quarter and
then changed it to week it would show the quarters but it wouldn't show everything
right this isn't all the data that we need and so you know you really need to make
sure that you're doing this correctly by default it's almost always year but if
you're looking at it via quarter so like let's say somebody comes in and says hey
I want to break these out by quarters um and not year-over-year
08:34:18
that's how you would do this but in the year we want to break it out by uh the week
and you see this huge drop off um at the end well that is actually because the data
doesn't go past that um there's just like one day of data or one um week of
data in here with um January of 2017 data so it just drops off because
this is the sum so it only adds up to like um 591,000 compared to like
the 2 million so we want to get rid of that um and how do we do that uh let's see I
think it's
08:34:58
filter how's it format no it's not format what am I thinking bear with me uh it's
a filter well I was looking for it I just couldn't find it uh let's bring it back
to December 31st let's see if that fixes what we need perfect uh that's all you had to
do um and often times you'd have several years
worth of data in here um and then you could even do something
like this um like this one where it has multiple lines the reason that this is
helpful is
08:35:34
because if I'm telling my friend I mean I'm just going to say it's a friend
or business partner whatever you want to use this use case for I'm gonna
tell him hey the beginning of January all the way until like you know even February
it's like really low it's half so there's not a lot of people traveling because
everyone travels at the end of the year so in November December for the
holidays to visit family um and then in the summer for vacations I would tell him
just based
08:36:05
off this one thing I would say hey over the summer and then at the end of the year
and during the holidays that's when I would be renting out your Airbnb okay so
just this one very simple visualization can help him understand the best times um
to do that that may be intuitive you may have already known that but you can
prove it with the data which is always really helpful um and let's see is there
anything else that we need to do with this uh I'm just going to label it and I'm
going to say
08:36:38
um revenue for year let's do bold do apply there we go did I label this last one I
didn't let's label that last one and we'll do price per zip code price per
zip code we'll just keep it at that keep it simple um and let's do that all right I
believe we have two more so we have done um we've done three of them um we got the
zip codes we've got the um you know the time of the year now something else that he
was wanting to know is um you know just how things affect it and
08:37:27
something that's going to affect the price of the actual Airbnb is going to be the
number of bedrooms so the larger the house the more bedrooms the more it's
going to cost typically so we can take a look at that let's pull in these bedrooms
um and that will be our columns uh no it won't what we need to do um and so I
knew this was going to happen I just forgot it until right now
this right now is actually a um it's a value right so it's a number and that's
totally um
08:38:05
reasonable because if we go right here and we do count distinct there's
only a few values right there's zero bedrooms 1 2 3 4 5 6 7 all the way
up to seven bedrooms right now it has it as a numerical value we want to um change
that to a dimension not a measure so we're going to um we're
going to remove this we're going to go right down here we're going to click this drop
down and we're going to say convert to dimension and so now we're going to add
08:38:38
it as a dimension so there that looks um much more normal really quick I'm going
to keep these in here for a second but we're going to get rid of these
nulls and zeros because if a home has zero bedrooms that's a problem um and so we
want to look at the price again let's go down here in the listings it should be the
price now this is the price for the location per day um if you want to look at
monthly or or you know stuff like that they have that data um but we're just going
to do the price
08:39:10
the average price not the sum um although this is helpful so just really quick
before we change it this is going to show you which ones are
bringing in the most money it also may show you which ones are the most common um
those are all different visualizations that we can do but the one that brings in
the most money uh that has $63 million worth of
listings when they're all added up those one bedrooms are doing phenomenal half of that
are two bedrooms at 30 million
08:39:44
three bedrooms at 18 million and so on and so forth so there's a ton of one-bedroom
ones we could even keep that in there um you know if we wanted to
um and we do something similar later but you can keep something like this in
there what we will do really quick though is we're going to do the same thing that
we've been doing which is keeping the average um and we are going to get rid of this cuz if
it doesn't have the bedrooms you know that's not helpful to us and if it has zero
bedrooms that's
08:40:16
that's genuinely a problem I will not be renting an Airbnb with my family uh that
has zero bedrooms in it so now we have this and it would be really helpful to be able
to see that in the visualization I mean it's just kind of hard to see it as is I
mean it just does not hurt to add that right here do a label um why is it angled
like that maybe I just need to move it out more that looks much better um that's
the average price that cannot be right that's the sum that's why so let's go
08:40:55
over here let's make that average as well much better because uh if the price was
$3 million for a three-bedroom I would not be going there so this is really really
useful information for our friend right if um he wants to you know get into
that one bedroom area you know he's not going to be making a lot of
money it may be low cost up front but he's not going to be making a lot of money
it significantly goes up when you reach these five and six bedroom homes which
makes sense I mean if it has five
08:41:27
or six bedrooms in it it's probably a really large really nice home and you can
charge a lot more money and our friend is uh extremely wealthy he can buy whatever
he wants and so he may be looking at these um larger ones seeing that there's a much
higher return um on his investment the more bedrooms he goes for so
we're going to keep it just as it is um and let me see is there's anything else
that we want to do with this no we're going to keep it just like this uh and the
last one is by far the easiest
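A short sketch of the bedroom breakdown in plain Python, with invented rows. It treats the bedroom count as a grouping key (the role a dimension plays in Tableau), drops listings with a null or zero bedroom count, and shows how the sum and the average can tell different stories for the same group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical listings; prices are per night.
listings = [
    {"id": 1, "bedrooms": 1, "price": 80.0},
    {"id": 2, "bedrooms": 1, "price": 100.0},
    {"id": 3, "bedrooms": 1, "price": 120.0},
    {"id": 4, "bedrooms": 3, "price": 250.0},
    {"id": 5, "bedrooms": 0, "price": 70.0},
    {"id": 6, "bedrooms": None, "price": 95.0},
]

prices_by_bedrooms = defaultdict(list)
for row in listings:
    if not row["bedrooms"]:
        continue  # skips both None and 0 bedrooms
    prices_by_bedrooms[row["bedrooms"]].append(row["price"])

# The sum is dominated by how many listings a group has.
sum_by_bedrooms = {b: sum(p) for b, p in prices_by_bedrooms.items()}
# The average describes what a typical listing in the group charges.
avg_by_bedrooms = {b: mean(p) for b, p in prices_by_bedrooms.items()}

print(sum_by_bedrooms)
print(avg_by_bedrooms)
```

In this toy data the one-bedroom group has the larger total only because it has more listings, while the three-bedroom listing has the higher average price.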
08:41:57
and we actually just discussed it a little bit we want to know you know what does his
competition look like um for the bedrooms specifically so let's go
back up to the bedrooms we want that one to be right here in our rows so we show um
these and then we just want a count of um how many listings there are so we can do
that via the listings ID so here's our listings each ID represents one location or
one home so we're going to do that right here uh that looks absolutely
08:42:36
terrible that looks terrible what am I doing wrong here um let me see uh one thing
we need to do is we want to get rid of these nulls and zeros do that really quick
um and then we don't want to do just the ID because I I'm realizing now uh what I'm
doing I need to convert this to a numeric so we can do a count on it so let's um
oops let me see what what is happening this is terrible all right let's put this
back let's make let me see if I can just um do an attribute let's
08:43:18
do the count and let's do text um no it needs to be a distinct count
because that's basically like um a count of the unique IDs not each
individual row okay it took some figuring out I'm going to keep that in there because you
guys need to see uh a lot of you guys like seeing when I make mistakes so you know
makes it feel like when you make mistakes it's okay um and I'm all about that so
I'm leaving that in there you guys can see me fail a little bit um I just forgot
how to do
08:43:57
that for a second and this is exactly what we're looking for right
it showed us this in that visualization that we were looking at earlier before we um
switched it to the average price this is showing us that for one bedrooms
there's 1,800 listings then 483 two bedrooms 206 three bedrooms 55 four bedrooms only 20
five bedrooms and five six bedrooms so the more you go up the less and less it is
or the less and less competition there's going to be now is there a lot of demand
for four-bedroom
08:44:27
five-bedroom six-bedroom uh that's for our friend to figure out um well maybe we'll
help him out with that later with the data you know we could look at the
reviews that we had um there's so much data in here and we could absolutely figure
that out but for what it's worth we're giving him this initial stuff and he'll have
follow-up questions for us later that's how it always works I promise
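For anyone who wants to check the distinct count outside of Tableau, here is a minimal pandas sketch of the same idea. The sample rows and the column names `id` and `bedrooms` are assumptions made up for illustration and are not the actual Airbnb file.

```python
import pandas as pd

# Hypothetical sample of a listings table; "id" and "bedrooms" are assumed
# column names, and listing 105 is repeated on purpose.
listings = pd.DataFrame({
    "id":       [101, 102, 103, 104, 105, 105, 106, 107],
    "bedrooms": [1,   1,   2,   2,   3,   3,   None, 0],
})

# Drop the nulls and zeros, similar to the filter applied in the worksheet.
# A missing value compares as False here, so it is removed as well.
valid = listings[listings["bedrooms"] > 0].astype({"bedrooms": int})

# Distinct count of listing IDs per bedroom count: each ID counts once,
# no matter how many rows it appears on.
listings_per_bedroom = (
    valid.groupby("bedrooms")["id"]
         .nunique()
         .rename("distinct_listings")
)

print(listings_per_bedroom)
```

A plain `count()` would count listing 105 twice in this sample, which is the difference between a count and a distinct count.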
we're good with this one let's label this one did I label the
08:44:53
last one I will go back and look um distinct I'm going to butcher this one I'm
going to do a distinct count of bedroom listings that may not make sense at
all but we're keeping it so we're going to do bedroom apply okay let me see if I
added the label on this one I didn't let me do that real quick we do average price
per bedroom again I'm oops you didn't see that I'm just going with whatever is
coming to my head this probably wouldn't be what I would keep
08:45:36
if this were like an actual project but it works for now so we have our five
visualizations 1 2 three four and five and let's create our dashboard that's going
to be this button right here so we're going to click that we are going to uh go
right here and we're going to say automatic because we want to use this entire area
and so now we're just going to start um you know pulling them over and I'm just
going to start from the very first one and go to the very last one keep it really
simple so this
08:46:09
very first one we'll pull it over it you know it's going to take up the entire
space until you start adding all the other ones we'll include this one right here
um and well let's leave it as it is you know we'll adjust it once it gets to its
final place now we have number three We'll add this one on this side it looks
terrible right now but give it a second uh then we have number four we're going to
add that across the top okay it's already starting to look a little
08:46:39
better and um maybe I I you don't have to keep this in here um but you definitely
can uh let's start to adjust things a little bit oops okay let's see if I can zoom
in one more nope I'm going to do it just like that actually let me see if I
can make it even just a little bit closer perfect uh that's the best you're
going to get um if you didn't see I used this um magnifying glass and then I could click
on the area that I wanted to see so we're going to keep that just like
08:47:21
that we're going to move this over because that is um definitely not as important
um and then we're going to move this way over as well so keep it just like that
again this is something where if you want to you can click on this um it didn't I
don't know why uh I can't remember how to get those connected but it's you
definitely can um but okay I was just clicking on the wrong one that's why that is
why but you can click over here and you you know it'll filter um based on so if I
go to this one oops
08:47:55
dang oh jeez what am I doing oh this is a travesty okay let's try to get
this back all right I'm not touching it guys you get the gist you can mess around
with it yourself I'm not messing this up okay so the next thing we need to add is
the very last one that's going to go right up here and then we're just going to
kind of move it off to the side and let's see going add yeah have this caption um
if you've never seen something like this before um and I actually want to make
08:48:36
this bigger as well oh jeez give me a second it's kind of lagging a little
bit and make this a little bit taller maybe I don't want it as wide but I
definitely want it a little taller give it a second yeah let me scooch this
back just like that that's fine uh we can keep it like that in my original
one I didn't have this um um you can get rid of this if you want you know you can
um you know just exit out right here if you want to do that but there you have it
uh
08:49:22
this is the entire thing so we started from the very start um we started with this
one then this one uh did some um and this is you know all the zip all of our ZIP
code work then we took a look at the calendar where we looked at the price and did
some time series visualization and then we're looking at the bedrooms and and the
count of bedrooms and so this should be really helpful for our friend it should be an
initial dashboard to get him going and once he sees this he's going to have a million
other questions and he's going
08:49:52
to want another dashboard for different data that's in there he's going to ask
about okay well what if I want to do it weekly or you know I want to rent it out
for the month or you know how many five star reviews are
people giving on you know one bedroom two bedroom three bedroom these are all things
that you know he may ask and then we'd have to build out in the real world this is
what happens all the time you know they make a request and then they're like oh
this is great but I also
08:50:18
want this so um you know your friend is is going to be right in line with just
about everyone else um that has ever gotten a dashboard uh for work or for personal
use with that being said this is it um we have done the entire thing now if you
want to share this it is super super easy to share um and I'm going to try to
remember how to share it uh so we're going to do save to Tableau Public as and we're
going to do this and we're going to make it um let's do Air BnB is it like is it a
capital B is it like
08:50:52
that no that doesn't look right Airbnb uh we'll do full project and we'll save and
that is being created right now um and I will save this so if you guys want to go
look at this you can um and I'll provide a link in the description as well for that
and see if yours looks um similar to mine or better than mine give it a second because
it's thinking all right so here it is so here's our final our final project um and
if you followed step by step then you should get this exact or very very
08:51:31
similar to this one again I encourage you to if you want to have the up-to-date data
to go to that um Link in the description that has um the the most recent data and
they update that I believe monthly so you can go there get the most recent data and
then you can do stuff and you can create a beautiful project just like this um but
with the you know the most recent data again I use the kaggle data just so you guys
can remember and I encourage you to look at the different data points that are in
the Excel there is so much in there and
08:52:01
you can use uh honestly like there's probably 30 or 40 other fields that you could
be using in there that we never even touched um but for this project we're keeping
it pretty simple and so so go do that make completely unique dashboards and and
visualizations and create projects and add it to your portfolios so that you can
create uh a fantastic portfolio website and get a job and that's what this is all
about um it's about upskilling and and getting these skills that you can you know
get a
08:52:30
job or or do better in your job so I hope this has been helpful I really appreciate
you guys joining me and and doing this entire project with me I have no idea how
long this is this probably this could be like an hour for all I know um so thank
you so much for sticking with me this entire time if you like this video be sure to
like and subscribe below and I will see you in the next video what's going
on everybody welcome back to another video today we're going to be starting our
powerbi tutorial
08:53:11
series now I am super excited to start this series with you guys we are going to be
breaking this up in about six or seven videos I don't really like those super long
videos where it's like four hours long I like breaking mine up into chunks so
that's what we're going to do this is the beginner series and so we're going to
start with the very Basics and we're just going to work our way up and I'm going to
walk you through every single step of the way it'll be very easy to
08:53:33
follow everything will be provided for you so that all you have to do is really
follow along and by the end of it you should know powerbi a lot better you should
be a lot more comfortable using it now before we actually jump onto my screen I want to
give a huge shout out to the sponsor of this video and that is udemy you guys know
that I absolutely love udemy I've been using them for years and that is no
exception when it comes to powerbi I have taken some of the best powerbi courses
ever on udemy so I
08:53:58
highly recommend you checking out the ones that I have in the description these are
ones that I actually took and I loved the most so if you're looking for a full
powerbi course I highly recommend checking out Udemy thank you so much again to
our sponsor and now without further Ado let's jump onto my screen and get started
with a tutorial all right so the first thing I'm going to do is download powerbi
desktop I will leave this link in the description so you can just click on it go to
it and
08:54:19
download it we're going to click this download free button and once we click it you
can go to the Microsoft store and I already have it downloaded so when you see it
uh it'll already say downloaded but um for you you can go in here you can click
download and it will download it for you I'm on Microsoft uh but it may look a
little bit different for you if you're on a different system but once that is done
we are going to open up powerbi so let's go right down here to our search let's go
to
08:54:50
powerbi and it is going to open up for us all right so right away this is what it's
going to look like when you open it and we're going to go right over here to get
data and let's click on that it's going to open up this window and it's going to
give us a lot of different options for where we can get data from now some of these
are free and some you need to upgrade for but just taking a quick glance
through here you have a ton of options there's databases there's
08:55:17
um you know blob storage there's PostgreSQL or different SQL databases um
there's Google analytics there's a lot of places and you can go through the process
to connect to that data and you can pull that data in from those data sources now
for what we are doing we're just going to be using an Excel I'm going to leave the
Excel that I'm going to be using in the description you can go and download it and
walk through this with me so what we're going to do is click on Excel workbook and
we're going
08:55:42
to click connect so we're going to go right here in our powerbi tutorials folder
and we're going to click on apocalypse food prep so let's click on that and it is
going to connect and pull that data in now right here we have our Navigator and so
if you had a lot of different sheets you can click on that and choose which ones to
pull in I just clicked on it right over here and we're able to preview the data but
I can't load or transform it yet I need to select which sheets I'm bringing in so
08:56:11
we only have one so that's the only one we're going to bring in so you can go ahead
and load the data or you can click on transform data it's going to take us to
powerbi power query which is going to allow us to transform our data so I'm going
to have an entire video on how to transform the data but I'm going to give you a
really quick glance at it to kind of show you what it is so right up here it says
our power query editor this is the window to basically transform your data and
get it ready for your
08:56:38
visualizations now you can do this in Excel if you want to and do that
beforehand or you can do it here and there are lots of things that we can do in here as
you can see at the top again I'll have an entire video dedicated to just power
query but let's take a quick look at the data and see if there's anything we want
to transform quickly before we actually go and start building our visualizations so
over here we have the store where we purchased it we have the product that we
purchased the price that
08:57:05
we paid and the date that we bought it now the first thing that jumps out to me is
that this just says date on it um we might want to say date_purchased and
we're going to hit enter and if you noticed right over here on these applied steps
it says renamed columns everything that you do every single step that you apply to
transform this data is going to be right over here and if I want to if I go back
and I say you know I really didn't want to rename that column I can just click X
and it is
08:57:35
going to get rid of that and take it back to its original state so again I'm just
going to say purchase and we're going to enter that now this is our apocalypse food
prep so this is food that we are buying for the apocalypse um for this example and
if we look at our products we have bottled water canned vegetables dried beans milk
and rice and all of that stuff makes sense except for the milk uh milk will not stay
or last long in the apocalypse so I think what we're going to do is we're going to
filter that out really
08:58:06
quickly and we're going to click OK and right over here again it says filtered rows
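The rename and filter steps above map closely onto two pandas operations. This is only a rough sketch with made-up rows; the column names, prices, and the new name `Date_Purchased` are assumptions for illustration, not the contents of the real workbook.

```python
import pandas as pd

# Hypothetical rows standing in for the apocalypse food prep sheet.
purchases = pd.DataFrame({
    "Store":   ["Costco", "Target", "Walmart", "Costco"],
    "Product": ["Rice", "Milk", "Dried Beans", "Milk"],
    "Price":   [10.0, 3.5, 8.0, 3.0],
    "Date":    pd.to_datetime(
        ["2022-01-01", "2022-01-01", "2022-02-01", "2022-02-01"]
    ),
})

# Each step returns a new DataFrame, loosely like one entry in Applied Steps,
# so the original table stays available if a step needs to be undone.
renamed = purchases.rename(columns={"Date": "Date_Purchased"})  # renamed columns
filtered = renamed[renamed["Product"] != "Milk"]                # filtered rows

print(filtered)
```

Keeping each step in its own variable is a design choice that mirrors the way power query lets you click back to an earlier step.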
so now if we scroll down there's no milk so what we are going to do is we are going
to go over here to close and apply and it is going to actually load the data into
powerbi desktop so on this left-hand side it immediately takes us to the report
Tab and what we want to do is go right here to the data Tab and take a look at our
data so again there's our date purchased and as you can see the milk is not in
there another
08:58:43
tab that we're going to take a look at um and again in this report tab this is
where we actually build our visualizations the data is where we can see the data
and and change it up a little bit and change some small things about it like
sorting the columns or even creating a new column and over here we have this other
tab and it is called model and this is especially useful when you have multiple tables
or multiple excels and you need to join them to kind of connect them together we
don't have
08:59:08
that but in a future video I'm going to walk through how to use this entire
tab so now let's go back to the data Tab and I want to just look at the data really
quickly before we go over to the report Tab and we start building our first
visualization as you can see I've been buying these different products in different
months so this rice I've been purchasing in January February March and April and
I've been buying it from three different locations because I wanted to see if I was
spending less money at one
08:59:33
location on all of the products so then I would just shop there in the future and
save a lot of money or if there were specific products that were really cheap at
one location but others they were cheaper at a different location so I should just
buy like the dried beans at Costco but everything else I should be buying at
Walmart and so that's what we're going to look at in just a little bit so let's go
over to the report tab right up here at the top there's this data section so you
can kind of choose
08:59:57
if you want to add any more data now that we are here we can also write queries or
transform the data like we were looking at in the power query editor window over
here in the insert we can add a new visualization or a text box and then in the
calculation section we can create a new measure or a quick measure and then over
here we have share where you can actually publish your report or your dashboard
online now over on the visualization section on this far right this is a very
important area this is where a lot of the actual
09:00:25
creating of the dashboards happen so let's take a look really quick and we'll get
into a lot of these things as we're actually building our dashboard so we're not
just sitting here looking and talking we're going to be actually building and doing
all right so we're going to click right here on this drop down on sheet one it's
going to show us all of our columns now two of the things that we wanted to look at
were where are we spending the least amount of money buying the exact same product
that'll
09:00:49
help us determine where we want to shop and the second thing was should I be buying
all my products at the same place or are there certain products that they're going
to be cheaper at a specific store and I should buy it there so let's start out with
the first one which we're just going to see uh with the store and the price uh
where we're spending the least amount of money and just at a quick glance we can
see we're spending the least amount of money at Costco at $210 versus Target 219
and Walmart at 225
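The two questions posed above (where is the total spend lowest, and is one store cheapest for every product) can also be answered with a couple of pandas aggregations. This is a sketch on invented numbers; the prices below are assumptions and do not reproduce the 210, 219 and 225 totals from the workbook.

```python
import pandas as pd

# Hypothetical purchases; values are made up for illustration.
purchases = pd.DataFrame({
    "Store":   ["Costco", "Target", "Walmart"] * 2,
    "Product": ["Rice"] * 3 + ["Dried Beans"] * 3,
    "Price":   [12.0, 10.0, 14.0, 6.0, 8.5, 9.0],
})

# Question 1: total spend per store, lowest first.
total_by_store = purchases.groupby("Store")["Price"].sum().sort_values()

# Question 2: spend per product and store, then the cheapest store per product.
by_product = purchases.pivot_table(
    index="Product", columns="Store", values="Price", aggfunc="sum"
)
cheapest_store = by_product.idxmin(axis=1)

print(total_by_store)
print(cheapest_store)
```

In this made-up sample the store with the lowest total is not the cheapest for every product, which is the same kind of split the walkthrough is looking for.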
09:01:18
that really answers our question but we want to visualize it better be able to see
it in an easier way so we're going to go right over here and we can click on a lot
of these but the one that probably makes the most sense is the stacked column chart
and it's going to show Walmart Target and Costco now they're all the same color
let's add a legend so we're just going to drag store over here down to this Legend
and let's make this larger while we're working on it so now
09:01:45
we can see we're spending the most amount of money at Walmart right in between at
Target and then at Costco is the lowest and so right there we know that Costco is
the place to go for our apocalypse food prep but is it going to be that way for
every product I don't know let's take a look let's put this up in this corner and
let's start a new one we're going to need to select the product for sure and the
price and probably Additionally the store as well and let's click
09:02:16
on let's not do this one we need a clustered column chart that's what we need let's
bring this over here let's expand this quite a bit and so really at a glance this
is giving us everything that we need we can see each product right here and we can
see how much we're paying per store and so for Rice we're paying it looks like a
lot more for our rice at Walmart while at Target is actually where we are paying
the least now if we look at all of these it looks like for Costco the only one that
we're
09:02:49
really paying a lot more on is on our rice but for our dried beans our bottled
water we're paying quite a bit less and really it's pretty negligible for these
canned vegetables we're paying maybe what 60 cents 50 60 cents more per can so
that's pretty negligible but for the big ticket items um we're really spending a
lot less at Costco if we wanted to save just a little bit more money we could
go to Target for our rice now if I want to make this more like a dashboard and
we're only keeping
09:03:19
these two things I'm going to kind of size them kind of like this whoops going to
show you that in a little bit I'm going to size them a little bit like this so now
that we have that looking good we want to change the title of both of these so what
we're going to do is go over here in our visualizations and format your visual uh
and we are going to go to this General go to title and now we can name it anything
we really want for this we're going to say best store for product and while we're
in here one
09:03:54
other thing that I wanted to do is I want to go to this visual go right down here
to these data labels now we haven't added any data labels so I'm going to click on
and you'll see exactly what it does uh it just puts the labels and the numbers
above it so you don't have to actually like hover over it and see what it is now it
is actually rounding these numbers so what we're going to do is go down here we're
going to go down to values and we'll go down to display units and it's on auto so
it's Auto
09:04:22
rounding those numbers and we're just going to say none so we can see the actual
value of these numbers and we can do the exact same thing over here it probably is
a good thing to do um and it just is going to visualize it a little bit differently
in here but you can always change that if you want to go over here to title and
we're going to say total by store and now we're going to take a look and so in a
matter of minutes we were able to take our data from an Excel put it into powerbi
transform it a little
09:04:57
bit then we're able to create these visualizations that gave us concrete answers to
some very important topics we now know that Costco is the place to go for basically
every single product except if we're buying rice and if we want to save just a few
dollars we're going to head over to Target and that's genuinely going to change my
shopping habits for the next several years until the apocalypse happens so in
future videos we're going to dive into a lot of the things that we looked at today
but
09:05:22
just in more detail and then at the very end of the series we're going to have an
entire project where we really use every single part of powerbi and create a
beautiful dashboard and so that's all we have for our very first video in our
powerbi series I hope it was helpful if you like this video be sure to like And
subscribe below and I'll see you in the next video what's going on
everybody today we're continuing our powerbi tutorial series and in this video
we're going to be
09:05:56
looking at power query now power query is really great because it allows
you to actually transform the data before you actually get it into powerbi so if
you want to make any changes like adding or deleting a column or changing the data
type or a ton of other things you can do all of that in power query now without
further Ado let's jump on my screen and get started with the tutorial all right so
before we jump over to powerbi and start using power query I wanted to take a look
at the data and this is the Excel
09:06:26
from our last video called apocalypse food prep and in that video we went through
and we bought some rice some beans water vegetables and milk all for the apocalypse
getting prepared for that now we decided to buy some additional things like rope
some flashlights duct tape and a water filter several water filters and after we
purchased those uh our boss or whoever we're working with or somebody decided to go
and make a pivot table now in this pivot table they kind of broke it out by Costco
Target
09:06:56
and Walmart and had all the items had some subtotals as well as some Grand totals
right here and then they decided to kind of copy and paste that into this and
you'll see this a lot when you're working with uh people who use Excel they like to
kind of make things like this maybe make it into like a table or or format a little
bit differently but you'll see stuff like this a lot so this is what we're going to
actually pull into Power query and work with now we're going to imagine that this
is all we
09:07:25
have this is the only thing we were working with and I'll kind of reference this
pivot table a little bit but we're going to pretend this is all we have and we want
to transform it to make it a lot more usable to where we can make visualizations
with it so let's hop over to powerbi and pull this excel in so what we're going to
do is click import data from Excel we're going to click apocalypse food prep and
click open and then it's going to bring up this window right here now this is where
we can
09:07:48
choose what data to bring in so we can take a preview and just click on it real
quick and this is the pivot table that we were looking at so it does have that
pivot table so we are able to pull in just a pivot table and then we have the
purchase overview where it's kind of that formatted um thing that we're just
looking at with all the colors we're going to pull both of those in so we're going
to pull in the pivot table and the purchase overview now we could just load it or
we could transform it and we're
09:08:15
going to click transform and that's going to bring us to power query so let's click
on transform data so now really quick before we actually jump into working through
this and transforming it I want to show you what the power query editor looks like
so if we go right over here we have our queries and these are the tables that we
actually pulled in and we can click on those and kind of go back and forth between
them now up top we have our ribbon and the ribbon offers a lot of functionality we
have things like remove
09:08:40
columns keep rows remove rows split columns these are all things that we're likely
to use when using this power query editor there's also another tab called transform
where there's a lot of functionality here as well things like unpivoting a column
or transposing columns and rows and using a first row as a header some of the
things that we'll be looking at today there's also another tab called add a column
and this one's pretty self-explanatory where you can add additional columns like
duplicating
09:09:08
a column creating an index column or a conditional column those are the three main
ones there's also view tools and help but we're not going to really be looking at
those today and then on the far right side we have our query settings you can do
things like change the name so we call it pivot table 2022 and it'll update right
over here on our query side and we have our applied steps now our applied steps are
extremely important and very very useful anytime we make any change to transform
09:09:38
this data it's going to be documented right here and then we can go back and look
at it or we could even delete that change in the future if we want to and go back
to a previous version of what we just did so when we loaded the data into powerbi
it did a few things for us it shows the source the navigation and it promoted the
headers and then it also changed the data type so if we want to check we can
actually see those things or change those things like this Source right here we can
click on this little
09:10:04
icon and it's going to bring up the actual path where we got this file so if we
wanted to change that or it changes in the future we can come here and we
can change this file path but we're not going to do that right now so let's click
on cancel and let's go back down to change type so it promoted these headers and
obviously these headers are not correct we're looking at this pivot table and not
the purchase overview but it changed these column headers and so in the future if
we
09:10:29
wanted to we could easily change those but it did that for us and it changed the
type as well so if you look right here it says abc123 all the way over here it's
where it just says ABC ABC means it's only going to be text where abc123 means it
could be basically anything uh text or it could be numeric so now let's go over to
purchase overview and this is the one that we're actually going to be working on
the most but we might be looking at pivot table just a little bit to kind of
09:10:56
reference it and see some of the differences so before we do anything let's just
take a look at how powerbi decided to take this data in so it chose this apocalypse
food prep overview as kind of the First Column and that was kind of our header or
the title of what we were looking at before and then all these other columns are
basically column one two three four five so that's something that we're going to want to
change in just a little bit there's also all these blank uh columns right here at
the top and
09:11:21
kind of these null values as we go along and we'll take a look at those and
we're going to want to get rid of some of this and just clean this up to make it
more usable for our powerbi visualizations this may be perfectly fine and
acceptable in an Excel but when you're pulling it into powerbi the real reason
you're pulling it in is to create visualizations not just for it to look good in an
Excel so we're going to need to clean this up quite a bit so let's go right up top
the first thing that I want
09:11:48
to do is I want to get rid of these top rows so we're going to go to this top
ribbon and we're going to click remove rows and we're going to select remove top
rows and we're going to select two because we have one two rows of all nulls and
those are completely useless we just want to get rid of them right away so let's
click OK and it removed those the next thing that we want to do is these this
location product and all these dates these are actually the column headers that we
wanted so what we
09:12:17
need to do now is we want to go over to transform and we want to say use first row
as headers and just like that we have location products and these dates as our
headers exactly how we wanted them now let's say for whatever reason you know we
made a mistake and we needed to go back we would just select the removed top rows step and
that would be perfectly fine now you can see over here it promoted the headers but
it's also changed the data type so if we went to before we promoted the
headers these were all
09:12:50
abc123 abc123 because it had a lot of different data types in there so it just kind
of made a generic data type but when we promoted these headers the first thing that
it decided to do was also change this data type for us giving us its best guess as
to what this data type is and it decided to do this decimal so this 1.2 is a
decimal but we're actually going to change that and all you have to do is click on
This 1.2 uh or or the data type that it has right here for you and we're going to
click on
09:13:19
fixed decimal number and let's do replace current and now it's just a little bit
better so now it's 2.70 2.5 and that's normally how we would read uh values like
this because this is money so we would normally read it to the second decimal just
like that and if we have it on the second decimal for some we should probably have
it on the second decimal for all of them so really quickly I'm going to go
through and I'm just going to change that and it should be pretty quick so hang
with me for just a
09:13:49
second all right that is perfect now for the purposes of what we're about to do we
don't actually need these subtotals or this Costco total Target total and Walmart
total as well as the grand total really we want to get rid of those and so what
we're going to do is we're going to go right over here we're going to click on this
drop down and we're going to try to filter this data before we actually load it
into powerbi so we're going to filter and we're going to say
09:14:14
remove empty and let's remove those and it's going to take out all of those nulls
if we wanted to try to filter this out by saying something like Costco total or
Target total we could do that by going right here clicking this drop down on
products going to text filters and saying does not contain and let's do insert and
we're going to say does not contain and we want to say total and let's click okay
okay and again it filtered out all of those things so there's a few different
options that you
09:14:45
can do if you want to filter out rows that contain either null values or specific
values now the next thing that we're going to do is actually get rid of a column
this grand total column and so what we're going to do is we're going to click on
the very top part where it says grand total we're going to go back over here to
home and we're going to click on remove columns and it says insert that's because
we're on this filtered rows one right here um but what we're going to do
09:15:08
is just insert that and it'll insert right there that's totally fine we can just
move it to the bottom now we got rid of this column entirely now this looks really
good visually I like how this looks I like how everything is set up the biggest
thing about this is that when you're actually wanting to use this for
visualizations having these dates as columns doesn't really work too well and so what
we're going to want to do is we're going to want to transpose this or pivot this to
where these dates are
09:15:37
actually rows so what we're going to do is select the first date which is January
1st all the way through April 1st and we're going to hit shift and click on that
April 1st right there to select all of them at the same time and then we're going
to go over here to the transform Tab and we're going to click unpivot columns and
let's see what this does and so now what we've done is we've basically recreated
our original Excel that we had so let's go back and take a look really quickly at
that so this
09:16:03
looks almost identical to what we have in powerbi right now and this is extremely
usable and very good for visualization and is much much better than this but again
we were pretending that this is what we were given at the beginning so you have to
imagine you know somebody just handing you this and you need to make it much more
usable for visualizations in the future which happens a lot and we actually wanted
to create this we just weren't given this now a few last things that we might want
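As a rough sketch of what unpivot columns does to the shape of the data, here is the same reshaping in plain Python. The sample rows and the `unpivot` helper are hypothetical. The default names Attribute and Value match what Power Query calls the two new columns before you rename them.

```python
# Hypothetical wide table: one column per date, values are made up.
wide = [
    {"Product": "Rice", "Location": "Costco", "1/1/2022": 2.70, "2/1/2022": 2.50},
    {"Product": "Beans", "Location": "Target", "1/1/2022": 1.20, "2/1/2022": 1.30},
]


def unpivot(rows, value_columns, attribute_name="Attribute", value_name="Value"):
    """Turn each selected column into its own row (wide to long)."""
    long_rows = []
    for row in rows:
        # Columns that were not selected are repeated on every output row.
        keep = {k: v for k, v in row.items() if k not in value_columns}
        for col in value_columns:
            long_rows.append({**keep, attribute_name: col, value_name: row[col]})
    return long_rows


long_rows = unpivot(wide, ["1/1/2022", "2/1/2022"])
for r in long_rows:
    print(r)
```

Two wide rows with two date columns become four long rows, one per product and date, which is the layout the visualizations want.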
09:16:29
to do is we want to clean this up just a little bit we're going to select the data
type and change this to date and then we're going to select the value and I double
clicked on the value and I actually want to call this cost or product cost
and then for the location I actually want this to be called store so
now this looks really good but I want to show you one thing really quickly on this
pivot table 2022 so let's go back here this looks very similar to how we had it
when it first
09:17:03
started one thing I wanted to show you uh really quickly and I want to click on
this first one we're going to make this our column header and then we're going
to try to Pivot or unpivot this January February March April so really quickly
let's do that so we're going to transform use first row as headers so now we have
this January February March April now if you notice these are not dates these are
actually text it says January February March and April so if we go to do this and
we
09:17:36
click unpivot and here's the columns that are created when we unpivot it it is
January February March and April these are not dates so we cannot go and change
this to a date because that would error out because it's actually text so it's
something that you want to look out for it's something that you need to be aware of
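The same text versus date problem shows up outside of Power Query too. This is a small Python analogy, not Power Query's own conversion logic: the `to_date` helper and the `m/d/yyyy` format are assumptions made for this sketch.

```python
from datetime import date, datetime


def to_date(text, fmt="%m/%d/%Y"):
    """Return a date when `text` matches `fmt`, otherwise None.

    None stands in for the error you would see in the column.
    """
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


print(to_date("1/1/2022"))  # a header that really is a date
print(to_date("January"))   # a header that is only a month name
```

A header like 1/1/2022 converts cleanly, while a bare month name carries no day or year, so there is nothing to convert it to.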
and you can change that in the pivot table so you want to be aware of how it
actually sits and looks in the Excel or whatever data source you're pulling from
09:18:03
before you actually pull it into Power query to transform and now the very last
thing that we need to do to finalize all of this is go over here to close and apply
and once we click that everything that we've worked on is going to be applied to
the actual data and it's going to load into powerbi to create our visualizations so
let's go ahead and click on that and so now the data has been pulled into powerbi
let's go right down here to data and we can see the data right here if we need to
transform
09:18:28
this data again we can bring it back into the power query editor window by just
clicking the transform data button and it's going to bring us right back so I hope
that this was helpful thank you so much for watching if you like this video like
And subscribe below and check out all my other videos and everything data analyst
related I'll see you in the next [Music] video what's going on everybody welcome
back to the powerbi tutorial Series today we're going to be taking a look at
09:19:03
building [Music] relationships now when you import multiple tables from either the
same data source or multiple data sources you want to tie them together so that
when you're creating your visualizations everything is connected so in this
tutorial we'll be walking through how to create those relationships to make sure
that all of your tables are connected properly and without further Ado let's jump
onto my screen and get started with the tutorial all right so before we jump over
to powerbi and start creating our
09:19:31
relationships and our model I want to take a look at the data in Excel we realized
we were buying so many products for the apocalypse that we decided to start our own
store and we have several customers and some client information down here and so I
wanted to take a look at some of the columns and these tables that we're going to
be looking at first thing we have is the apocalypse store these are the things that
we are selling I know it's a very limited inventory but these are the really high
sellers these
09:19:57
are the ones that I wanted to sell so we have this product ID our product name
price and production cost then we have this apocalypse sales this is how many sales
we've actually made to our customers so we have this customer ID our customer name
product ID order ID unit sold and the date it was purchased and then we have our
customer information right here here are all of our clients so we have this
customer ID customer address city state and zip code so now that we've taken a look
at our
09:20:28
data let's go and load it into powerbi so we're going to say import data from Excel
we're going to choose this model right here we're going to click open and we are
going to want all three of these so I'm going to click on all of them and we're
just going to load it we're not going to transform the data at all so now the data
has been loaded let's go right over here on the left hand side to our model Tab and
let's scoot this over just a little bit and move back and we're going to move these
09:20:56
tables up to where it's a little bit easier to see so right off the bat you can
already see that there are these lines between these tables so there are already
relationships that powerbi has automatically detected and created from my
experience powerbi actually does a really good job at creating these relationships
automatically but we're going to go in and take a look at these and kind of see
what everything means and then we're going to go back and create these
relationships from scratch
09:21:23
just to make sure that we know how to do every single part so to get it started
let's double click on this line connecting the customer information table to the
apocalypse sales table and it's going to bring up this edit relationship page right
here so this line right here connecting these two tables actually gives us quite a
bit of information without actually having to click into this edit relationship
page what this is showing is that we have a one to many relationship and there's
only one or a single crossfilter
09:21:50
direction and you can find both of those things right down here and I'm going to
walk through what those mean in just a little bit on this page you can also see the
columns that powerbi decided to choose in order to tie these two tables together
now for our example they decided to use the customer and customer right here from
the customer information table as well as the apocalypse sales but I don't really want
to use those specifically because on this apocalypse sales table I might remove
this customer
09:22:17
information and just keep the customer ID it may have chosen these customer columns
because they have the exact same name and really the same information but I want to
use this customer ID anyways so what I'm going to do is I'm going to click on that
column and click on this column and then I'm going to click okay and if we go back
into it by double clicking again we're going to see that it now saved that and if
we did what we just did before which is kind of hover over it it's going to show us
what those
09:22:43
two tables are joined on so opening this back up let's go down here to this
cardinality and cross filter Direction cardinality has several different options
that you can choose from you have many to one one to one one to many and many to
many now for this example we're looking at apocalypse sales and we're going
apocalypse sales down to customer information now there are a lot of rows in the
apocalypse sales but there's very few in this customer information and there's only
one
09:23:10
customer per row whereas in the apocalypse sales up here the customer can have
several rows for several different orders so that's why the cardinality is many to
one now if we flip this and we say we want the customer information here and we
want the apocalypse sales down here we tie that together now it's going to flip and
it's going to say one to many now let's look at the cross filter Direction and
there's only two options here it's either single or both and if we choose
09:23:36
both and we click okay this now goes from a single arrow pointing in one direction
to two arrows pointing in both directions but what does this really mean so in
order to demonstrate this I'm going to put this back to a single Direction and what
we're going to try to do is connect the data over here or the columns over here to
the columns in this apocalypse store so let's go over here to build a visualization
and what we're going to do is we're going to take this customer information and
let's just say
09:24:02
we want to look at state so I'm going to click on state right here and I'm just
going to make this into a table and the customer information table is only tied
right now to the sales table so we're actually going to go over to the apocalypse
store and we want to see how many product IDs are being bought in these different
states so really quickly we're going to come up here and create a new measure and
all we're going to say is this measure is the count of Apocalypse store product ID
and we're
09:24:32
going to create that and now we're going to select it so it's added to that table
so now what this is showing is that there are 10 products which there are 10
products for each of these states but that's not actually technically correct
because not every state purchased these 10 different items if we go back to our
model and we change both of these to a both Direction and then we're going to go
back and see what changed in our numbers so now let's go back to our visualization
and now we can see that
09:25:05
Minnesota actually only ordered seven different product IDs Miss Miss 8 New York 9
and Texas 10 this is actually much more accurate than before when you use the both
option it takes these tables and treats them as if they are a single table but the
single option is not going to do that and so for our example if we're trying to
connect this table to this table and one of the last things that I want to show you
is this option right down here which says make this relationship active now if we
don't
09:25:33
click this and there are other options in here that connect these things like the
customer to the customer then that may be the active relationship but if I select
this is the active relationship that means this is going to become the default
relationship between these two tables so now let's come out of here we're going to
click cancel we're going to zoom in just a little bit and bring these tables a
little bit closer so we can zoom in just a little bit more now we are going to go
ahead and delete
09:25:59
these so we're going to say delete yes and delete yes so just for demonstration
purposes we're going to build these relationships from scratch so we're going to
come over to the customer information table and we're going to drag it all the way
over here and put it on top of this cust ID or the customer ID in Apocalypse sales
and it's going to automatically create that relationship and we can open this up
and as you can see it created the relationship between this customer ID in the
apocalypse sales and the
09:26:29
customer ID in the customer information it also defaulted the cardinality to many
to one and the cross filter direction to single so we're going to go ahead and
change that to both and click okay and then we're going to come over here to the
product ID in Apocalypse store and drag this over the product ID in the apocalypse
sales and again if we open it up it created that relationship for us it created the
cardinality automatically and we're going to change this cross filter direction to
both and click okay
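To make cardinality and cross filter direction a little more concrete, here is a small Python sketch of the idea. The three miniature tables, the column names, and both helper functions are hypothetical, and this is only a simplified picture of how the filter travels, not how powerbi is implemented.

```python
# Hypothetical miniature versions of the three tables; all values are made up.
customers = [
    {"customer_id": 1, "state": "Texas"},
    {"customer_id": 2, "state": "Minnesota"},
]
store = [{"product_id": p} for p in (101, 102, 103)]
sales = [
    {"customer_id": 1, "product_id": 101},
    {"customer_id": 1, "product_id": 102},
    {"customer_id": 1, "product_id": 103},
    {"customer_id": 2, "product_id": 101},
]


def side(rows, key):
    """'one' if every key value is unique in the table, otherwise 'many'."""
    values = [r[key] for r in rows]
    return "one" if len(values) == len(set(values)) else "many"


def cardinality(from_rows, to_rows, key):
    """Describe the relationship read from the first table to the second."""
    return f"{side(from_rows, key)} to {side(to_rows, key)}"


def products_per_state(state, both_directions):
    """Count store products that survive a filter on customer state."""
    if not both_directions:
        # Single direction: the state filter reaches sales but never flows
        # from sales back up to the store table, so every product is counted.
        return len(store)
    ids = {c["customer_id"] for c in customers if c["state"] == state}
    bought = {s["product_id"] for s in sales if s["customer_id"] in ids}
    return len([p for p in store if p["product_id"] in bought])


print(cardinality(sales, customers, "customer_id"))
print(cardinality(customers, sales, "customer_id"))
print(products_per_state("Minnesota", both_directions=False))
print(products_per_state("Minnesota", both_directions=True))
```

With single direction every state shows the full product count, and with both the count drops to what that state actually bought, which mirrors the change seen in the table visual.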
09:26:57
and so on a really small scale that is how it works of course it becomes a little
bit more complex the more tables that you add and the more relationships that are
created but this is how you're going to actually create the relationships in the
model tab within powerbi I hope that this tutorial has helped you understand this
concept a little bit better thank you guys so much for watching I really appreciate
it if you like this video be sure to like And subscribe below and I'll see you in
the
09:27:20
next [Music] video what's going on everybody welcome back to the powerbi tutorial
Series today we're going to be taking a look at Dax [Music] now Dax stands for data
analysis expressions and it's basically a library of functions and operators that
help you build formulas you can use Dax to create measures and calculated columns
within powerbi which can really give you a lot of insight into your data honestly
it is not super complicated and hopefully by the end of this video you'll have a
lot
09:28:03
more confidence actually using Dax in powerbi so without further Ado let's jump
onto my screen and get started with the tutorial all right so let's take a look at
our tables and data before we get started so we have two tables the apocalypse
sales the apocalypse store for this apocalypse sales table we have the customer
product ID order ID unit sold and the date it was purchased and then for the
apocalypse store we have product ID product name price and production cost now
these are joined
09:28:31
together or they do have a relationship together via the product ID so what we're
going to be using are these new measures and new columns to create our Dax
functions so really quickly let's go over to this report Tab and let's drop down
our Fields over here so we can see everything and so to get us started we're going
to go right up here to apocalypse sales we're going to rightclick and click new
measure and it's going to open up this right here which is basically our bar where
we can
09:28:59
create our functions and so right here it's automatically given us the name measure
but we can change that and we're going to say count of sales so now we can start
writing our Dax function that's just going to be the name of it and what's going to
show up right over here once we click enter so let's go over here and we're going
to say count and as we're typing it's automatically giving us options it has
something called intellisense if you've ever used other Microsoft products
intellisense is
09:29:27
their kind of autocompletion that helps you look at other options very quickly
and so we're just going to click on this count and it's prompting us to put in a
column name and so we can come down here and we can select one or we can type it
out and it'll try to predict and help us choose which column to select so for us
we're going to use this order ID but let's just start typing it out we'll say order
ID and then we can click on it and we're going to close this parenthesis and click
enter or you
09:29:55
can go over here and click this check mark but we're just going to click enter and
so over on this right side it finalized that and save that and we can actually look
at that by clicking on this box next to it and we want to look at this in a
table so now we can see that there are 74 sales now for this we want to see who's
buying our products we want to see what our client name is so we're going
to go over here we're going to choose customer and we're going to put customer on
top of sales and we're
09:30:26
just going to take a look at it like this so now we can see that our number one
customer is Uncle Joe's Prep shop he has 22 orders now they have the most orders
with us but it doesn't necessarily mean that they're spending the most money with
us but we can take a look at that later the next thing that I want to take a look
at is how many products we're actually selling what are our big products that we're
selling we have 10 different items but I don't know exactly which one is selling
the best if
09:30:52
one is doing really poorly and getting no orders this is something that I want
to look into so all we're going to do is go right back up here to apocalypse sales
again right click and select new measure and for this one we're going to call it
the sum of products sold and all we're going to start out with is by doing sum and
if this seems familiar to something like Excel you're 100% correct it is very
similar and remember these are both Microsoft products so there's going to be
similar
09:31:23
functionality in both of them and so this Dax is going to have a lot of
similarities to how it works in Excel so we're going to do an open parenthesis
and now what we're going to choose is this units sold we want to sum up all of
these units sold and see how many we're actually selling so we're going to say units
sold I'm going to hit tab it's going to autocomplete that I'm going to close my
parenthesis and I'm going to come over here and click this check mark so now it's
created that
09:31:52
measure and we're already selected in this table so all we have to do is click the
check mark and it's going to show us that we have 3,000 total products sold and we
can go through here and see what the big sellers are and probably the biggest one
that I see right off the bat is this multi- Tool Survival Knife so these Dax
functions that you can write can be very simple and lead to really good insights
that you can use for the visualizations later on now I want to take a look at the
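One way to picture the two measures so far is that each one is a small function that gets re-run on whatever rows the visual has filtered down to. This Python sketch uses made-up rows and hypothetical names; the only Dax behavior it leans on is that COUNT skips blank values.

```python
# Hypothetical sales rows; customers and numbers are made up.
sales = [
    {"customer": "Shop A", "order_id": 1, "units_sold": 10},
    {"customer": "Shop A", "order_id": 2, "units_sold": 5},
    {"customer": "Shop B", "order_id": 3, "units_sold": 30},
]


def count_of_sales(rows):
    """Like a COUNT over order ID: counts rows where the value is not blank."""
    return sum(1 for r in rows if r["order_id"] is not None)


def sum_of_products_sold(rows):
    """Like a SUM over units sold."""
    return sum(r["units_sold"] for r in rows)


def by_customer(measure):
    """Re-evaluate a measure once per customer, like adding customer to a table."""
    groups = {}
    for r in sales:
        groups.setdefault(r["customer"], []).append(r)
    return {customer: measure(rows) for customer, rows in groups.items()}


print(count_of_sales(sales))
print(by_customer(count_of_sales))
print(by_customer(sum_of_products_sold))
```

The measure itself never changes; only the set of rows it is handed changes, which is why the same measure gives a grand total on its own and a breakdown once customer is added.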
difference between
09:32:19
something like sum which is an aggregator function and something like sum X which
is an iterator function because if you add X to some of these aggregator functions
you can create them or make them into an iterator function so you can have sum
and sum X or average and average X adding X onto the end of them can make them into
an iterator function so let's take a look and see how that actually works I'm going
to show you the difference and then I'm going to talk through the difference at the
end so really quickly
09:32:47
let's go back to our data and let's go to the apocalypse store now what we have
right here is we have the price and we have the production cost and we want to see
how much profit we're getting from each of these as well as we can take a look at
the unit sold and see how much money we are actually making so what we're going to
do is we're going to come back over here we're going to go to apocalypse store
we're going to right click and create a measure and in just a little bit we're
going to be creating a
09:33:12
new column and that'll kind of show the difference really well so we're going to
create this new measure and we're going to name it profit and we're going to come
over here and what we're going to do is we're going to take the sum oops we're
going to start with our sums we're going to take the sum of the price and then
we're going to close that parenthesis and we're going to subtract the sum of the
production cost so all that does is it says if something cost
09:33:40
$20 if we sold it for $20 and it only costs us $10 that's $10 in profit for that
item and then what we're going to want to do is we're going to actually want to
encapsulate that really quickly because we're about to use multiply and then we're
going to sum and now we're going to take the units sold so how many units were
actually sold at that profit that we just made so let's see if that works and let's
click the check right here and so we have the profit so let's
09:34:08
click on the profit oops that's not what I wanted to do let's use a new one or
let's create a new uh table we're going to click profit let's make it a table and
I'm going to pull this right over here now we have our profit but I really want to
know is which customer is spending the most money at my store so we're going to
come right over here we're going to click on customer and I'm going to put customer at the
top and just at a glance we can see that Uncle Joe's Prep
09:34:35
shop is spending the most money at the store now what I want to show you is the
difference between sum and sum X so what I'm going to do is I'm going to go back to
this profit and I'm going to copy this entire thing and we're going to go back
here to this table now we just created a measure and we were able to break it down
by each customer so let's go back over here now let's go up here to home and we're
going to create a new column and we're going to call this
09:35:08
profit profit underscore column and we're going to literally paste the exact same
thing into here and we're going to hit enter and each row is the exact same thing
so what it's doing is it is going through the price and it's adding all of it up
and calculating it at the bottom it's adding the production cost it's going all the
way down and calculating it at the bottom and then it's going over and looking at
how many units it sold and then it's performing this calculation up here and then
it gives us
09:35:42
the total and it's doing it for every single row but that's not really what we
wanted to show what we wanted to show is the profit for each row what we wanted to
say is here's the price for the Rope the production cost for the rope and then how
many units we actually sold and then it'll calculate that and give us the actual
profit for just that row but we cannot do it by just using this sum what we need to
do is use something called sum X so let's add another column let's go back to home
say new
09:36:15
column and now we're going to say profit underscore oops underscore column
underscore sum X and now we're going to use sum X and hit Tab and we need to choose
the table that we want to put this in so we're going to say apocalypse sales
because that's the table that we're looking at right here we're going to say comma
and now we need to input an expression which it says it Returns the sum of an
expression evaluated for each row in a table before when you're just using sum
09:36:49
it's looking at all of these combined now it's taking it row by row so what we're
going to do is basically input the same thing as we did before I'm going to copy
I'm going to paste that it's not going to be correct I need to get rid of these
sums but it's basically the exact same equation give me just a second and let's get
rid of this sum and let's see if this works so let's click the check button and
now this looks a lot better so what this is now showing us is at a
09:37:20
row level this nylon rope made us 51,000 almost $52,000 the waterproof matches made
us $115,000 and we can go down and look at each item and see how much that actually
made us versus this profit column and so that is the biggest difference between sum
and sum X hopefully that made sense I know that sum and sum X and the
difference between an aggregator function and iterator function can be a little bit
confusing especially if you've never done it before but hopefully that was a good
example for
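The arithmetic behind the aggregator versus iterator difference can be sketched in a few lines of Python. The two rows are made up, and putting price, cost, and units on the same row is a simplification for this sketch (in the tutorial they live in two related tables).

```python
# Hypothetical rows; prices, costs, and units are made up.
rows = [
    {"product": "Rope", "price": 20.0, "cost": 10.0, "units": 3},
    {"product": "Matches", "price": 5.0, "cost": 2.0, "units": 10},
]


def profit_with_sum(rows):
    """Aggregator style: total each column first, then do the math once."""
    total_price = sum(r["price"] for r in rows)
    total_cost = sum(r["cost"] for r in rows)
    total_units = sum(r["units"] for r in rows)
    return (total_price - total_cost) * total_units


def profit_with_sumx(rows):
    """Iterator style: do the math on every row, then add up the results."""
    return sum((r["price"] - r["cost"]) * r["units"] for r in rows)


print(profit_with_sum(rows))   # (25 - 12) * 13
print(profit_with_sumx(rows))  # (10 * 3) + (3 * 10)
```

The two styles give different numbers on the same rows because the first multiplies totals together while the second multiplies within each row before adding.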
09:37:51
you to understand that concept now let's go back over here to apocalypse sales
right here we have a date purchased now in Dax we have some ways that
we can interact with dates and so I want to take a look at those really quickly so
we're going to go right up here and click on new column and we're just going to
leave that as column but what we're going to say is day so there's a few different
ones we have Day dates YTD next day previous day and weekday and they all are
pretty
09:38:22
self-explanatory if you click on it let's click on weekday it says it's going to
return a number from 1 to 7 identifying the day of the week of a date so let's use
this really quickly and so we're going to say date purchased and click tab hit
comma and it's going to give us three different options basically it's a one a
two and a three um right here if you hit this button read more you can read more on
it this is going to say Sunday is equal to one Saturday is equal to seven
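Here is a small Python sketch of how the three numbering options line up. The `weekday` helper is a hypothetical stand-in written for this note, built on Python's `isoweekday`, and the mapping follows the documented WEEKDAY return types (1 is Sunday through Saturday as 1 to 7, 2 is Monday through Sunday as 1 to 7, 3 is Monday through Sunday as 0 to 6).

```python
from datetime import date


def weekday(d, return_type=1):
    """Mimic the three WEEKDAY numbering options for a Python date."""
    iso = d.isoweekday()  # Monday=1 ... Sunday=7
    if return_type == 1:
        return iso % 7 + 1  # Sunday=1 ... Saturday=7
    if return_type == 2:
        return iso          # Monday=1 ... Sunday=7
    if return_type == 3:
        return iso - 1      # Monday=0 ... Sunday=6
    raise ValueError("return_type must be 1, 2, or 3")


saturday = date(2022, 1, 1)  # January 1st 2022 fell on a Saturday
monday = date(2022, 1, 3)
print(weekday(saturday, 1), weekday(saturday, 2), weekday(saturday, 3))
print(weekday(monday, 1), weekday(monday, 2), weekday(monday, 3))
```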
09:38:53
I like this one personally which is Monday equals one in my brain it just makes
more sense so I'm going to click on two I'm going to close that parentheses and
we're going to I guess I'll say uh let's say day of week for the column let's click
that checkbox and now Saturdays are equal to sixes Mondays are equal to one this
allows us to see which day of the week people are buying the most products on or
which day of the week is somebody submitting their orders on and so let's
09:39:26
go over to our report let's get rid of this we just going to move this oh jeez I
hate moving stuff sometimes all right really quickly I want to show you the
difference between what we just did and what we already have so we have this um
date purchased and let's make that into a bar graph and what we're going to be
taking a look at is actually the units sold so right here we have this and
obviously we don't want 2022 we're going to get rid of the year we only have
one quarter right here we can see January
09:40:02
February March so we can tell that January has the most sales or the most units
sold in that month if we get rid of that we go down to day we do have some
information but we don't know what day of the week it is it could change from month
to month and it's really hard to tell exactly what if there's any pattern there at
all that's where what we just created comes in handy so let's recreate this exact
same thing but instead we're going to use day of week so we're going to select day
of week and
09:40:30
unit sold let's drag that down and move this over right here and this day of the
week should be on the xaxis and it's really easy now to see if there's a pattern
here there's really not at least not for this fake data that we have but I just
want these data labels on really quickly it's not easy to see if there's
any pattern again Monday has the most so I mean it goes down a
little bit and then it picks back up so maybe the middle of the week is our least
09:41:03
uh sales day our Wednesdays and Thursdays are a little bit lower than the rest and
the beginning and the end of the week tend to be the highest again not a huge
pattern but you know it's much easier to see if there is a pattern from week to
week or what day of the week now that we use this weekday function and so this can
be really really useful let's go back here to our data now we're going to look at
our last Dax function for this video let's go up here and create a new column and
we're
09:41:30
going to be looking at something called the if statement now if you've ever used
Excel I'm sure you have heard of this and you can do the exact same thing here in
powerbi and so we're going to name this one order size order underscore size and so all
we're going to say is if we're going to click on this one right here we need to
perform our logical test and then we want to say if it's true what's our value and
if it's false what is our value so what we're going to be looking
09:41:58
at is units sold so we're looking at order size so we're going to say if unit sold
is greater than 25 what's going to happen if it is true if the order is larger than
25 we want to say it's a big order and if it's not we want to say it's a small
order super simple we'll close that parenthesis we'll click okay and now really
quickly we're able to see if this is a big order or a small order and so that is
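The logical test, value if true, value if false pattern reads the same way in most languages. A minimal Python sketch, where the `order_size` function name and the exact label text are assumptions made for illustration:

```python
def order_size(units_sold, threshold=25):
    """Label an order using the greater-than-25 rule from the walkthrough."""
    return "Big Order" if units_sold > threshold else "Small Order"


for units in (10, 25, 26):
    print(units, order_size(units))
```

Because the test is strictly greater than, an order of exactly 25 units still lands in the small bucket.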
all I have for you today there are a lot of other Dax functions but the
09:42:33
ones that we looked at today are ones that are very common ones that you'll see the
most and there can be a lot of really complex and intricate Dax functions that you
can create and in our project at the end of this series I will be sure to include
some more complex Dax functions but hopefully this gave you a good introduction
into Dax so you know how to use it a little bit better thank you guys so much for
watching I really appreciate it if you like this video be sure to like And
subscribe and check out
09:42:59
all of my other videos on everything data analyst related I will see you in the
next video [Music] what's going on everybody welcome back to the powerbi tutorial
Series today we're going to be looking at how to drill down in [Music]
visualizations so when I say drill down I mean you're basically adding another
layer beneath the top layer of the visualization and when somebody clicks or drills
down in that data they can see more insights and more information on the top level
of data when you drill
09:43:43
down you can also drill up and I will show you how to do that in this tutorial so
without further Ado let's jump on my screen and get started with the tutorial all
right so before we get started I wanted to remind you that you can find the data
that we're going to be working with in this tutorial in the description you can go
and download it from my GitHub now the two tables I'm going to be looking at are
apocalypse sales and purchase tracker and if you've ever created any visualizations
you've
09:44:05
probably seen something like this where you'll have the store and the price and
these are the things that we actually bought so this is the total amount of
Apocalypse prepping uh equipment that we bought and we'll put the store in this
Legend right here and you've probably seen something like this and if you're
anything like me you're going to be in a meeting and you're going to be presenting
this and some higher up is going to be like hey Alex that looks great but I want to
you know see what
09:44:29
things we actually bought in Target and how much this cost can you create a
visualization for that and you're going to be like well I could or I could use
drill down so you could have done this in the first place uh which you should have
so all we're going to do is we're
going to say the product right here and these are going to be the actual things and
we're going to put it right under store now you can't see these things right but
there is a
09:44:52
hierarchy here so once we added this these options became available let's take it
out and all those just disappeared and then if we add it back right here they came
back and so what you can do right here is click to turn on drill down you can
go to the next level in the hierarchy or you can even expand all down one level in
the hierarchy so let's look at each of those really quickly so let's click on this
one it's just going to turn on drill down mode so now if I go and I click on
09:45:22
target it's going to drill down into these and if we want to I can then put product
under this Legend and we can see all of those things but of course if we go back up
it's going to be all broken up into this clustered column chart which is more like
um this which isn't exactly what we were going for but it works now uh let me get
rid of this I actually want store in the legend now if we turn that off and we
click it doesn't do that anymore so what it does now is it just highlights
09:45:52
Walmart it highlights Costco it highlights Target so we're going to keep that on uh
but we can also do something called going down in the next level of hierarchy so
let's click on that and so now this is going to go down to the next level down to
this product level because that is the next level and now it's going to show us
each of those things but it's going to have it broken out by the store and so it's
a completely different visualization but all within the same Realm of the data that
we're
09:46:19
looking at and what we actually care about so let's go back up in the hierarchy and
then let's use this one right here which is expand all down one level in the
hierarchy and so this one is again extremely similar except it just visualizes it
differently and now what it's doing is Walmart rice Target dried beans Costco rice
so instead of having an all uh like this one where it's stacked on top of each
other it's breaking it down individually so this one column would become three
separate
09:46:47
columns now I'm going to minimize this right here uh I'm actually going to go back
up in the hierarchy just for visual purposes now I'm going to show you one more
example we're going to use this apocalypse sales up here and this is one that I
actually use all the time so the one you've seen you know you'll get stuff like
that especially if you're working with like sales and stuff but I work in
operations right so I have a lot of order IDs product IDs stuff like that now this
one this one genuinely I use
09:47:16
quite often I'll have a customer uh let's make it we'll just go like this we have a
customer and we have unit sold and let's use the customer as the legend so let's
make this one quite a bit larger and I'll have something like this and they'll say
okay well we want to see the order IDs that go with it because we want to know
what orders are actually happening for each of these people obviously I'm not using
this exact data but very very very similar and all you have to do is take these
order IDs and
09:47:50
slide it right under here under customer and this visualization right here is
something I've done a thousand times because what happens is someone some
stakeholder in our company is saying hey Alex we want this and we want to know we
want to drill down on this IP address we want to drill down on this certain
database we want to drill down on something and we want to see the order IDs within
them so then all you do is you turn on drill mode or drill down mode you'll click
on it and you can see
09:48:17
every single order ID that's in there and then they can go and look those up in
their system and resolve them or whatever they're trying to do with it and it helps
a ton and it's very very useful this one is extremely applicable and that's really
all drill down is again you have these different hierarchies as well um but for
different things it's not as useful as you can see we also have this hierarchy
which again is not as useful so it just depends on the data that you're using and
how you
09:48:43
want to use this drill down effect but I promise you that drill down is used all
the time especially when you're giving presentations where people want to know more
information than just the visualization that you're presenting so I hope that
this has been helpful I hope that you understand drill down a little bit better if
you like this video be sure to like And subscribe and check out all my other videos
on powerbi thank you and I'll see you in the next video [Music] what's going on
everybody welcome back
09:49:20
to the powerbi tutorial Series today we're going to be taking a look at conditional
[Music] formatting now conditional formatting may sound familiar because we looked
at it in the Excel series and it's very similar how you use it in Excel versus how
you use it in powerbi conditional formatting allows you to take a table or a matrix
within powerbi and use those cells to color code them and create gradients and
different visualizations within the actual table or Matrix I'm excited to start
this one so let's jump
09:49:50
over to my screen and get started with the tutorial all right so before we get started
if you want to use the data that we're using in this video you can find it in the
description on my GitHub now conditional formatting is super simple and you've most
likely used it in Excel before but you can also use it in powerbi and let me show
you how to do that so the first thing we're going to do is come over to our
apocalypse store and we're going to pull up our product name as well as the price
and
09:50:15
what we can do is come over here and we're going to go to price and it has to be
under the columns so you can't come over here and do this we're going to come right
over here to price and we're going to right click and let's go to conditional
formatting and we have background color font color icons and web URL let's take a
look at background color first this is most likely the one that we'll look at the
most so we're going to get this pop up and I'm going to slide this over now there's
a lot of
09:50:40
different things we can customize in here and the first thing I want to take a look
at is format style we have the gradient and what it's going to say is the lowest
value will be this color highest value will be this color it'll give us this
gradient color scale and so we'll use that in just a little bit but we can also
create rules kind of like an if statement and if it is between this range and this
range we give it a color and if it's between a different range and a different
range we'll give it a
09:51:05
different color so we'll also try that one and then we have this field value uh and
this one is one that uh honestly I don't use that much I've used it maybe once and
what you can do is select a text field like customer and you can do some
summarizations on the first and last and that is it so what we're going to do is
we're going to look at gradient specifically for not the customer but we're going
to go back to the apocalypse store and we're going to do it on the price now what
I'm going to do is keep
09:51:34
it as the count because this is what the default is and we're going to go back and
fix it later but what we want our lowest value to be is this bright green showing
that it's a cheap product it's easy to purchase the high value ones are going
to be just the shade of red more expensive and we'll do it on the count now
remember the count is on each of these and we're not doing a count of how many are
sold we're doing a count of each product so it's just one per row so it all should
be the same
09:52:01
color let's take a look so it is all the same color but what we really want to show
is the actual price not just the count of the price so let's go back to conditional
formatting we're going to click the background color again and this time we're
going to change the summarization now you can do sum you can do average minimum
maximum it really doesn't matter for this example the number is the same regardless
of really which one we choose so we can just choose the minimum and it's going to
09:52:29
choose the minimum of each row which is the price so we're just going to select
minimum for this example we'll select okay and it should correct it accordingly
which means the bright green is the lowest and it goes all the way up to the
highest which is the red now let's go over here to apocalypse sales we'll add in
the units sold and let's move that out a little bit and I'm doing that on purpose
because we're about to look at something within the conditional formatting so
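The gradient that was just applied to price can be sketched in plain Python as a linear blend between a lowest-value color and a highest-value color. This is only an illustration of the idea, not Power BI's internal implementation; the RGB values, the sample prices, and the function names `lerp_color` and `gradient_colors` are assumptions.

```python
def lerp_color(low_rgb, high_rgb, t):
    """Blend two RGB colors; t=0 returns low_rgb and t=1 returns high_rgb."""
    return tuple(round(lo + (hi - lo) * t) for lo, hi in zip(low_rgb, high_rgb))


def gradient_colors(values, low_rgb=(0, 255, 0), high_rgb=(255, 0, 0)):
    """Give each value a color based on where it sits between min and max."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    colors = []
    for value in values:
        # When every value is identical there is nothing to scale, so every
        # row gets the same color (this sketch arbitrarily uses the low color).
        t = 0.0 if span == 0 else (value - lowest) / span
        colors.append(lerp_color(low_rgb, high_rgb, t))
    return colors


prices = [5.0, 20.0, 35.0]                   # hypothetical prices
price_colors = gradient_colors(prices)       # green for cheapest, red for priciest
count_colors = gradient_colors([1, 1, 1])    # a count of 1 per row
print(price_colors)
print(count_colors)
```

The second call mirrors the first attempt in the walkthrough: formatting on the count gives every row the same value, so every cell ends up the same color, while formatting on the minimum (or sum, or average) of price gives a real range to spread the colors over.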
09:52:56
let's go to unit sold and we'll look at the conditional formatting for this one now
if you noticed we now have a new one on here called data bars now we're able to see
data bars on unit sold and not price because unit sold is something like a sum
an average something that's aggregated but let's take a look at data bars because
I want to show you how to use this and then we'll go back to the background color
so for data bars we are going to be taking a look at the lowest to the highest value
again we're
09:53:25
going to go from bright green all the way to this exact red it's going to be from
left to right and what it's going to show you is if it is a positive number which
all of these are is going to be a green bar basically representing the number that
you see in here along this line so let's click okay and we're going to be able to
see the highest numbers and let's scooch this over quite a bit so you can kind of
get a better understanding and we're going to do it from highest to lowest so
09:53:55
we sold the most multi-tool survival knives at 477 and so this entire bar this row
is entirely filled up or almost all the way filled up while as it gets lower and as
we sell only 182 solar battery flashlights the bar is going to represent that and
show that now I'm about to completely mess up this visualization on purpose because
it's about to get very messy to show you that you can do a little bit too much uh
it is possible what we're going to do is we're going to go right over here to
09:54:25
this background color unit sold and instead of gradient let's look at rules now
with the price we just did a gradient scale but we can do basically groups of these
and say if a number is greater than or equal to this number then it's going to be a
certain color and then if it's in a different range we can give it a different
color so we're going to say if it's greater than or equal to zero and we're going
to say number not percent and if it's less than 266 because we have 265 right here
let's
09:54:56
make it a nice uh like gold a beautiful lovely mustard gold just just great now
we're going to say if it's greater than or equal to we'll do 266 because this
is less than 266 so it should be greater than or equal to 266 number and if it is
less than we'll say 500 now we want to do this one and we'll give it uh let's do
like a peach and we'll click okay and now we have another conditional formatting on
top of that that can give us more information now again you should not do this it's
just
09:55:32
too many now let's go one step further and make it even more ridiculous and show
you one more thing before I show you how you may actually want to use this uh let's
go back to unit sold we're going to rightclick go to conditional formatting and you
can do something called icons um font color is the exact same thing as background
color except it changes the font and so I'm not really going to look into that
one icons are very simple extremely similar to Excel and how you've seen them and
the
09:55:58
rules that you can apply to them are basically the same as if you're doing like a
gradient and it's these if statements that we saw before now it Auto gives us this
right here which basically says 0 to 33% 33 to 67 67 to 100 if it's in the bottom
33% it gives us this red the middle is yellow and the top is green so we can go
through and change all of this but honestly this looks pretty good so let's click
on it and so the ones that are our least sellers are these red ones right here
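The rules and icon thresholds described above behave like chained if statements. Here is a minimal Python sketch of both, using the two ranges from the walkthrough (0 up to 266, then 266 up to 500) and the default icon splits at 33% and 67%. It assumes the percentages are measured as a position between the lowest and highest value in the column; the function names and color labels are made up for illustration.

```python
def background_rule(units_sold):
    """Rules-style formatting: the first matching range decides the color."""
    if 0 <= units_sold < 266:
        return "gold"
    if 266 <= units_sold < 500:
        return "peach"
    return None  # no rule matched, so no background color


def icon_for(value, lowest, highest):
    """Icon-style formatting: bottom third red, middle yellow, top green.

    Assumes percent means position between the column's lowest and highest
    values, which is how this sketch interprets the default icon rules.
    """
    if highest == lowest:
        pct = 0.0
    else:
        pct = (value - lowest) / (highest - lowest) * 100
    if pct < 33:
        return "red"
    if pct < 67:
        return "yellow"
    return "green"


# 477 and 182 are the unit counts mentioned in the video; treating them as
# the column's highest and lowest values is an assumption.
for units in (182, 265, 266, 477):
    print(units, background_rule(units), icon_for(units, 182, 477))
```

Because each range is closed on the low end and open on the high end, a value such as 266 lands in exactly one rule, which is why the second rule starts at 266 rather than 265.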
09:56:29
and the top sellers are up here now this is just based on unit sold and this looks
absolutely terrible so let's kind of take this exact information but make it a
little bit better so we're going to create a new visualization or at least a new
table so let's click on product name and we'll take the price unit sold and revenue
and what I think makes the most sense for looking at revenue is these data bars
right here but there's only one problem I can't do that because it's
09:56:59
not summarized like unit sold was but what I can do to get those data bars
is I can come right down here instead of saying don't summarize I can summarize it
and I can just click the sum so now it is summarized it's the exact same number
but if I right click on here as sum of Revenue I go to conditional formatting I can
now use those data bars and so we're going to use those data bars and we're going
to say for the lowest value and the highest value and let's just make it a
09:57:29
nice maybe a darker green I don't want it to well that's that's hideous let's make
it this color right here a nice dark green and there's no negative so it doesn't
really matter we're going to go left to right and you can show the bar only but
we're going to keep it because I want to see it and we're going to go just like
this we're going to order and this is pretty telling um honestly I did not think
the weatherproof jackets were performing so well but I mean they are by far a
number
09:57:56
one seller so you know our weatherproof jackets multitool survival knives and the
nylon rope are outperforming all of our other products so those might be
the ones that I focus on the most while duct tape the n95 masks and waterproof
matches I mean those are those are garbage so I might be looking to replace those
in the near future with some other items that might sell a little bit better so
that's how you use conditional formatting and it's actually pretty useful there are
a lot of times
09:58:22
where I've done something like this in an actual visualization for work and it
looks something like this it just depends on what you're visualizing but this is
very much a simple thing that you can do to just add a little bit more information
and an actual visual to this little chart or table that you're going to create
sometimes it's just better to have these simple visualizations on this table rather
than just having the numbers themselves makes it a little bit easier to read and
09:58:48
understand so again I hope that this was helpful thank you guys so much for
watching I really appreciate it if you like this video be sure to like And
subscribe and check out all my other videos on powerbi and I'll see you in the next
[Music] video [Music] what's going on everybody welcome back to the powerbi
tutorial Series today we're going to be taking a look at bins and [Music] lists now
bins and list are really useful because they allow you to group things together to
analyze and visualize
09:59:28
them easier so in this tutorial I'll show you how to create your bins and lists and
then we'll create some visualizations to show you how it can be helpful so without
further Ado let's jump on my screen start with a tutorial all right so before we
get started I wanted to let you know you can go and download the data that we're
going to be using in this tutorial in the description below it's on my GitHub so we
are going to be looking at bins and lists today um and for this we're going
09:59:51
to be going over here to this apocalypse sales uh and let's open up our data right
over here and we want to look at apocalypse sales really quickly I feel like more
people would know what a bin is so we'll kind of start with a list just go a little
bit backwards than we normally would uh I'm going to use this customer or we're
going to use this customer column right here for a list really quickly and you can
do that in two ways you can come up here and you can right click on the customer
and go
10:00:17
to new group or you can come over here under this uh the Field section on the far
right and go to customer rightclick and click new group so let's click on that now
and right now it's only giving us the list type it's not giving us bins because bins
have to be numeric so we really can't do that at the moment um so we're going to
call this just customer groups or we'll actually call it list just so it's
easier to recognize when we create it and so all we're going
10:00:46
to do is we're going to basically group these but it's going to be called a list
and so what we're going to do is we're going to select and we're going to select
and we're going to say group and click on this group button and then it creates
this Alex the analyst apocalypse Preppers and uh this prep for anything prepping
store so that it kind of named it for us but if we double click on it then we can
rename this and we can call this the best prepping stores and then we have these
last two
10:01:19
and we can click on one and then hold control and click on the other one so
we get both of them and then we can click group and we can call this and we'll
double click and we'll call this the worst prepping stores um and then that's it
and that's all we have to do and what we're then going to do and if you want to
undo this and you want to switch it up and do whatever you can click on group but
we're not going to do that we're going to click okay and here is the column that it
10:01:51
created and it basically tells us what list we put it in if it's Uncle Joe's Prep
shop that's in the worst prepping stores list and if it's the Alex the analyst
apocalypse Preppers that is in the best prepping stores so it's kind of like an if
statement you could even create a calculated column do it on this customer create
an if statement this is just a lot faster and a lot easier than doing that but it
basically would do the exact same thing now you can use lists as well on things
like numeric so let's
10:02:20
say we have order ID and we'll go to new group and it's going to Auto go to bin
because typically that's what you'll use but you can do list as well and let's say
you know we want to say we want to call these like we'll group these and call these
the first um we'll call this the first customers or the first orders because we're
looking at order IDs look at the first orders and then we will go back here we're
going on the left side we're going to click oops we're going to go
10:02:51
back to the top we're going to hit shift group all of these and we'll say the
latest orders and you absolutely can do this um again this is kind of like an if
statement right so you're saying if it falls between this range and this range then
it's called the first orders and if it's between this range and this other range
it's the latest orders um again it's just a much simpler version of an if statement
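Written out by hand, the if statement that a list replaces might look like the Python sketch below. The customer names are transcribed from the audio, so the spelling and capitalization are approximate, and the fourth store is left out because its name is not audible in the transcript. The order ID cutoff is a hypothetical number, and returning an ungrouped value unchanged is an assumption about how ungrouped items behave.

```python
# Lookup table standing in for the customer list built in the dialog.
CUSTOMER_GROUPS = {
    "Alex the Analyst Apocalypse Preppers": "Best Prepping Stores",
    "Prep for Anything Prepping Store": "Best Prepping Stores",
    "Uncle Joe's Prep Shop": "Worst Prepping Stores",
}


def customer_list(customer):
    """Return the list a customer was placed in, or the name itself if ungrouped."""
    return CUSTOMER_GROUPS.get(customer, customer)


def order_list(order_id, cutoff):
    """Split numeric order IDs into two lists around a hypothetical cutoff."""
    if order_id < cutoff:
        return "First Orders"
    return "Latest Orders"


print(customer_list("Uncle Joe's Prep Shop"))
print(customer_list("Alex the Analyst Apocalypse Preppers"))
print(order_list(1005, cutoff=1050))   # IDs and cutoff are invented
print(order_list(1070, cutoff=1050))
```

The grouping dialog produces the same kind of extra column, just without anyone having to type the conditions.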
and so you don't have to write it all out you can just have this
10:03:20
user interface kind of do it for you uh and it's really really useful so now
let's talk about bins and by far the easiest way to demonstrate this and I'll show
you one other way uh but by far the easiest way to show this is by using age and
so uh for absolutely no reason whatsoever these customer IDs uh who are right here
in this customer information they decided to give us some of their buyer
information who are actually buying their products on their website or in their
store they just decided to
10:03:47
give it to us as well as some uh simple demographic information I don't know why
but what we're going to use bins for is grouping these age brackets so you know you
might be interested in saying well I want to know if my core population who are buying
my products are within a certain range and you don't want to look at every
single age because then it just you know in your visualizations it's not going to
look right you want to kind of group them make it easier to visualize so what we're
going to do is
10:04:15
we're going to go through here and we're going to basically go by tens so 10 20 30
40 50 60 and see what age bracket these people fall in so we're going to go to age
we're going to right click and we're going to say new group and we're going to go
to bin and we'll leave it as a default age bins um and you can do two things you
can do the size of the bins which splits it by this number
right here or you can go based on the number of bins so if you
10:04:42
only want to do five different bins it'll calculate that for you and it'll say okay
if you only want five bins you're going to have to do it at 12.2 if you want 10
bins it can be 6.1 but it is completely up to you on how you want to do that um you
can do the size and we'll just say every 10 which is what we're going to do or you
can go through and then you can create you know however many bins you actually
want so let's go ahead and click okay and it's going to create those bins for us so
if
10:05:13
somebody is 78 they're going to be in the 70s bin if somebody's 41 they'll be in
the 40 bin if somebody is 29 they'll be in the 20 bin and so on and so forth so
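The arithmetic behind both bin options is small enough to sketch in Python. A bin of size 10 rounds each age down to the nearest multiple of 10, and choosing a number of bins instead divides the age range by that count. The function names are made up, and the 18-to-79 range is a hypothetical one picked only because a 61-year spread reproduces the 12.2 and 6.1 sizes quoted above; the real minimum and maximum are not stated in the video.

```python
import math


def bin_floor(value, size):
    """Round a value down to the start of its bin, e.g. 78 with size 10 is 70."""
    return math.floor(value / size) * size


def size_for_bin_count(lowest, highest, count):
    """Bin size needed to cover the range lowest..highest with `count` bins."""
    return (highest - lowest) / count


for age in (78, 41, 29):
    print(age, "->", bin_floor(age, 10))

# Hypothetical age range spanning 61 years.
print(size_for_bin_count(18, 79, 5))
print(size_for_bin_count(18, 79, 10))
```

Rounding down is what collapses 71, 72, 73, and 74 into a single 70 bar on the chart.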
when we go to visualize this we don't have you know 71 72 73 74 have a lot more
things on our visualization it'll just be the 70 or it'll just be the 20 now we can
also use bins on dates as well so let's go back to apocalypse sales we have this
date purchased so we can create a bin for this as well so let's go to date purchased
let's go new
10:05:45
group now you can also create a list and that's totally fine if you would like to
do that um and it would look kind of like this where you can go through and you can
select it and you can say okay this group all these dates you can group those and
say this is going to be January uh and you can do that and that's totally okay um
but for this one we're going to do bins I think it's a little bit easier to do bins
because what we can do is go right here and we can specify if we want seconds
minutes
10:06:15
hours days months or years and so um for the data that we have it goes January
February and March so we're going to do months and we're going to say the bin size
is going to be one month so each month should have its own bin so it'll be three
bins total so we're going to select okay and as you can see on this right side we
have January of 2022 and that correlates to the January over here then it goes down
to February and then it goes down to March and then when we visualize this uh we
don't have to do
10:06:46
this hierarchy stuff that we do in here where we filter it down to months
we can just use this right here and that will be our months column so now let's go
over to our visualizations and we'll see how this looks really quickly we're not
going to look at all of them but we will take a look at few of them so the first
one that we can look at is age so let's look at the buyer ID and then we'll do age
as well and so let's spread this out and we can see our distribution of
10:07:14
our buyers so it looks like we have very few uh who are in the 10 range thank
goodness and we can even put the age right under here under the age bins and we
have this now we kind of have this drill down and so if we go right here and we
drill down right there this will actually give us the breakdown so this is what it
would have kind of looked like our visualization would have looked like if we had
just kept it the age cuz now we're drilling down into the age and so it looks like
we have one 18-year-old
10:07:41
and maybe a 20-year-old as well um let's go back up yeah so it looks like we only
have one buyer ID yes so there's only one 18-year-old so of legal age to start
buying you know all this prepping equipment and probably uh buying online and
stuff like that which makes sense right so uh this gives you kind of a quick
breakdown in the bins rather than um doing it the alternative way so now let's take
a look at the customer list as well as the unit sold and it looks like the best
prepping store uh is
10:08:12
actually performing much worse surprisingly uh than the worst prepping store and so
I hope this gave you a really good idea of how to use bins and lists within powerbi
thank you so much for watching if you like this video be sure to like And subscribe
and check out all my other videos on powerbi I'll see you in the next [Music] video
[Music] what's going on everybody welcome back to the powerbi tutorial Series today
we're going to be taking a look at all types of [Music] visualizations now when
you're working
10:08:54
in powerbi there are a lot of different options to create visualizations and you
may not always be sure which one to use and so that's what this video is for I'm
going to walk you through a lot of the visualizations that I like and I use a lot
as well as kind of point out some of the ones that I don't like as much so that you
get kind of a feel for the ones that I think are really popular and that are used
the most so without further Ado let's jump into powerbi and start taking a look all
right before we jump into it
10:09:18
there is a link in the description where you can get the data that we're going to
be using for these visualizations if you want to practice them yourself before we
actually get into it we do need to combine this and if you download that Excel and
you see this you'll have to do the same thing all we have to say is that this
product ID is the same as this product ID purchased and now we are good to go do
one to many and it's okay if it's one way so right over here under this
visualizations tab there are lots
10:09:47
of different options and it can be a little bit overwhelming you don't really know
which one to choose there are some in here that I have almost never used for my job
ever so I'll Point those out as we go through but the main focus is going to be
focusing on the ones that I do use that I have used and showing you how to actually
create that visualization Maybe spice it up just a little bit but we have a lot of
them to go through so let's jump right into it and the very first one that we're
going
10:10:12
to start with probably the easiest one and the one that you'll recognize the most
is a stacked bar chart and what we're going to do is go ahead right over here to the
product name and we want this unit sold as well so we're going to click product
name and it's going to go straight into the y-axis for us and then we're going to
click unit sold and that will go into the x-axis automatically it just kind of
intuitively knows but sometimes it will make a mistake and then you can just fix it
or flip it and
10:10:40
we do want this uh let me make this much larger we do want this to be a little bit
more color-coded that is what this Legend is down here so what we're going to do is
drag this product name down to the legend and now we have each product as its own
color and in previous videos we have gone through and looked at some of these
Visual and general options that you have when you're actually creating these
visualizations but we're going to do some of them while we're in here as well so
we're just going to go down here
10:11:08
we're going to choose data labels and we're going to shrink that and if you go
higher the higher you go the less you see so if you want all of them all the way
down to the green we're going to go right about there and we're going to make it
smaller so now we can go ahead and click anywhere outside of that visualization and
now we can create a new one if we had just kept it like this where we were still
interacting with this visualization and we clicked on a different one it would have
then changed
10:11:34
our visualization completely which we don't want so let's hit contrl Z click out of
it and now we can create a new one let's go right over here to this 100% stacked
column chart I'm going to click on it drag it over here and make it much larger and
we're going to come right over here to this customer information and we're going to
click on customer and then we're going to go up to unit sold and click on unit sold
and we want to break these out and so basically what this is doing is it's
10:12:03
breaking it out by each of these shops and we can see the total of what they're
buying the units sold but we want to see exactly what products make up this
percentage of this 100% so we're going to go right over here to product name we're
going to drag that down to the legend and as you can see now we have each of these
products and each of the products is up here so this backpack we can see the
backpack right here backpack right here and right here and we can see which
customer is buying what percentage
10:12:32
of their purchases so for this prep for anything prep store they have a very large
percentage 40% is duct tape so they're buying a lot of duct tape so really quickly
we're able to see what clients are purchasing or which clients are purchasing what
products the most so just like this Alex analyst apocalypse Preppers they're buying
a lot of water purifiers we like drinking clean water um you know that's just what
my audience likes and so you know we can easily get a quick glance of that again
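What the 100% stacked column chart computes can be sketched in Python: for each customer, every product's units are divided by that customer's total, so each column adds up to 100%. The unit counts below are invented so that duct tape comes out at the 40% mentioned above, and `percent_shares` is a made-up helper name rather than anything from Power BI.

```python
from collections import defaultdict

# Hypothetical rows: (customer, product, units sold). All counts are invented.
SALES = [
    ("Prep for Anything Prepping Store", "Duct Tape", 40),
    ("Prep for Anything Prepping Store", "Backpack", 35),
    ("Prep for Anything Prepping Store", "Water Purifier", 25),
    ("Alex the Analyst Apocalypse Preppers", "Water Purifier", 30),
    ("Alex the Analyst Apocalypse Preppers", "Backpack", 20),
]


def percent_shares(rows):
    """For each customer, return each product's share of that customer's units."""
    units = defaultdict(lambda: defaultdict(int))
    for customer, product, sold in rows:
        units[customer][product] += sold
    shares = {}
    for customer, by_product in units.items():
        total = sum(by_product.values())
        shares[customer] = {p: 100 * u / total for p, u in by_product.items()}
    return shares


shares = percent_shares(SALES)
for customer, by_product in shares.items():
    print(customer, by_product)
```

Because every column is rescaled to 100%, this view compares the mix of products between customers and hides how much each customer bought overall.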
we're going
10:12:59
to go in here I tend to like putting these data labels on here that's just my
preference so you know something like this it looks nice it looks clean um we can
always go back and change these names which we'll do for this one so we're going to
go over here go to title we'll go down to the text and we'll do customer oops
customer purchase oh jeez breakdown pretend I'm really good at spelling and we're
going to do it just like that we'll get out of there so now
10:13:34
we have customer purchase breakdown and that looks really nice it's a good uh a
good visualization and we're going to bring that right over here we're going to
have a lot on the screen so I may have to uh make them smaller or larger to fit
everything all right so let's go on to our next one another really common
visualization is this one right here which is the line chart and the line chart is
great especially when you're using things like dates I have found this one to be
the best and a lot
10:14:05
of people use this as well so we're going to go right over here and click on date
purchased and then units sold and on the x-axis you can see it's broken up by year
quarter month and day so we don't want to do it that high level we only have three
months of data in here so we're going to get rid of the year we're going to get rid
of the quarter and then we at least have this and let's break it out because right
now we're looking at all of the units sold so we're going to drag the product name
10:14:31
right down here to the legend and now it breaks it out by the actual product and
for each month in January February or March you can follow these products and see
how they did in each of those months and if we wanted to we can come right over
here to the filter on the product name and we could filter it by maybe the top
three so let's do multi-tool survival knife the nylon rope and the duct tape and we
can have it just like this and you know you can do those for any product that you
want but again we
10:15:00
just want to do it for those three just for an example and that really doesn't give
us a ton of information we could even go down to the day and you know it might give
us a little bit more information and so we'll keep it like that and we can go over
here change the name as well we're not going to do this for all of them again we're
just looking at the different types of visualizations I think are really good to
know but we'll change this one as well to products purchased by
10:15:28
date we'll keep it just like that again nothing fancy we're just trying to look at
a bunch of different stuff so let's put this over here down here now let's click
out of there and there are other ones in here um that are definitely useful and you
absolutely can use um like this one is a stacked bar chart this one is a stacked
column chart it's basically the same thing just a different orientation like we
went to here it's just a different orientation it's the same thing um just like
this
10:15:57
clustered bar chart clustered column chart it's just its orientation either horizontal
or vertical then we have things like an area chart uh stacked area chart not really
things that I've used too much in previous positions one that I have used though is
a line and clustered column chart so it kind of combines a few of these with you
know you have these bar charts as well as line charts into one visualization so
let's look at this one because this is one that I have used several times in my
actual job so for
10:16:29
our x axis we'll use the product name then we'll look at something like the price
and so let's make this a lot larger so you can actually see it so now we have the
price and now we can look at something like the production cost and that can be our
line y-axis so now we're looking at the price of it how much someone is actually
paying for it and then we're looking at how much it's costing us to actually
produce that product and so really quickly at a glance you can kind of see that
it's around the halfway to
10:17:01
2/3 point on most of these you can see that the production cost is always lower
than the actual price because of course we're out here to make a profit on these
products so let's minimize this one we're going to put this one right down here
let's make it even smaller let's click out of that and the next one that we're
going to take a look at is a scatter chart so let's click on that and make it much
larger oops there we go so let's use the price and the production cost again and so
our
10:17:32
x axis is the price our y axis is the production cost but now we need to fill in
this values right here so let's go over here and click on the product name and drag
that into values and so now we have our values we just don't know what they are but
we can see it so let's drag this down to Legend as well and it breaks it out and we
kind of have this scatter plot and you know for this fake data that we're using it
doesn't really show a lot uh but if you're using real data you can definitely find
outliers
10:18:01
and Trends and patterns using this type of visualization let's go ahead and make
that one small as well drag it right down into the corner now let's go right over
here and we have the dreaded pie chart um and donut chart now look I think it's
kind of a joke in the data analyst Community about pie charts and doughnut charts
but at the same time people use them and they request them and so sometimes you're
going to use it whether you like it or not so let's click on the donut
10:18:27
chart and let's make this one a lot larger and let's go over here and let's click
on State and we're also going to click on total purchased and that's really all you
have to do these ones are pretty straightforward you can change a few different
things like where these labels are if you want them inside you can also do that and
that would look totally fine um again I'm just not a super huge fan but you will
get this one requested people like this and want to see it and the reason a lot of
analysts don't like
10:19:00
using this is because when you start glancing at these it's really hard to tell the
difference between these sizes if you look at something like this you can easily
see that this is larger like if you're looking at this one the multi-tool survival
knife is obviously the longest and it gets shorter shorter shorter shorter but when
you start getting in here it's really hard to approximate the size I would not be
able to tell the difference between this 5.63 5.78 two uh 7.72 I would not be able
to
10:19:27
tell really the difference between these or kind of the difference between
them very easily that's why a lot of people don't want to use them in general so
again I want to show you this one because I think it's worth noting and worth
knowing how to use but I don't really push people towards this because I don't
think it's the best visualization available most of the time all right the next two
are super easy but are used all the time uh maybe more than some of these even but
they're just
10:19:56
so easy to use so I kind of saved them for last this one is the card and all the
card does is display one number or multiple numbers if you want to use a multi-row
card but we'll just look at the card for now all we're going to look at is the
total purchased and it's just going to display it just like this and you can make
it as large or as small as you'd like and normally it goes on like the top and
you'll put card here a card here um just for example I'll kind of show you how this
might look so it'll look
10:20:24
something like this right and at the top it'll have different usually High
overarching information and this is super common to see and I'm sure if you've
looked at other people's visualization you'll see something like this this is
usually totals or averages or something like that in here where it's super easy to
look at so like right here this is total purchased and we can go in and look at the
minimum and then we can go over here and this one can be a count and so it gives us
a lot of
10:20:52
information just at a really quick glance and then we have all of our more in-depth
colorful visualizations that kind of have more information than just a single piece
like the card does and then the very last one that I'm going to show you is this
one right here which is the table and this one is obviously extremely popular it's
like a little Excel table and we can go in here and we can get the customer
wherever that is and then we'll also get the unit sold and this is what it looks
like and it's
10:21:20
super easy and oftentimes you'll have it like on the side as well uh and all the
other visualizations over here and so you know if we're going to take all these
visualizations and pretend they were like a real thing you know there's a lot in
here but we'll just kind of really quickly do this um you know we might have
something like this and we'll make this larger and make this wider and you know we
have a lot of information just in here and this is not a project so don't go put
this on your
10:21:49
portfolio I just threw a ton of random visualizations on you know this dashboard
but you can already see a lot of these you most likely have seen in other people's
work in other people's visualizations on LinkedIn or on YouTube these are very
common very very popular and again we did not go through all of the ones over here
there are maps that you can use but I haven't used Maps ever in my job there are
things like gauges and decomposition trees and waterfall charts and uh tree maps
and all these
10:22:21
different things but I really have never used those in my actual job and I don't
see them a lot in other people's work either otherwise I would be telling you to
learn these and use these but again try them out see which ones you like if you
like this video be sure to like And subscribe below and go check out all the other
powerbi tutorial videos that I have on my channel and I will see you in the [Music]
next what's going on everybody welcome back to the powerbi tutorial Series today we
are going to be working on our
10:22:59
final project now this is our final project of the powerbi tutorial Series so if
you have not watched all of those videos leading up to this I recommend going and
watching those videos so you can make sure that you know all the things that we're
going to be looking at in today's project I am really excited to work on this
project with you because I think it is a really good one and it uses real data that
we collected about a month ago where I took a survey of data professionals and this
is the raw data
10:23:29
that we're going to be looking at and so I think it's just really interesting that
we collected our own data and now we're using it for a project we're going to
transform the data using power query and then we'll actually create the
visualizations and finalize the dashboards as well as create a theme and a
different color scheme to kind of make it a little bit more unique without further
Ado let's jump onto my screen and get started with the project all right so before
we jump into it I wanted
10:23:49
to let you know that you can get the data below it is on my GitHub you can go and
download this exact file that we're going to be looking at now in the past several
projects we have been using this fake apocalypse data set you know it was fun it
was you know whatever this data set is real this is a real data set it was a
survey that I took from data professionals I posted on LinkedIn and Twitter and all
these other places and we had about 600 700 people who responded to the questions
so before we
10:24:16
actually get into it and start cleaning the data and doing all this stuff in
powerbi I just wanted to show you the data all right so this is the CSV that I
downloaded from the survey website that I used and this is completely raw data I
haven't done anything to it at all let's go through the data really quickly and
we'll kind of see what we have and we are not going to make any changes at all in
Excel we're going to do all of our Transformations or at least a few
transformations in powerbi because again
10:24:45
this is a powerbi tutorial and project so I want you to kind of learn how to use
that and not use Excel because you can go through my Excel tutorial if you want to
do that so let's just look at it in Excel and then we'll move it over to powerbi
and actually start transforming the data so we have this unique ID these are all
the people that actually took it oops don't want to do that we have an email which
this was completely Anonymous I didn't collect any data or user data on this then
we have the date
10:25:11
Taken um and let's get into the actual good information then we have all of these
questions so we have question one which title fits you best and they can choose
things now uh let's add a filter really quickly so that we can look at this now you
had the pre-selected ones which were like data analyst architect engineer but then
there was an option where you could say other and you could specify what that
was so if you look in here we're going to have all these different other please
specify with
10:25:41
different titles right and there were a lot of them now typically what you want to
do is really clean this up and we're not going to be doing a ton ton ton of data
cleaning but we are going to do some in powerbi but none in here but typically with
this amount of data and the way that it's formatted we would do so much data
cleaning um with this one I mean there is a lot of work to be done um like
this current year salary this is one that I would absolutely be cleaning up because
it's ranges and it
10:26:13
has a dash and a k and all these numbers this is something that I would be
cleaning up and using but we're not going to be cleaning this up right now so
anyways let's just get into it let's see what questions we asked uh we have the
yearly salary what industry do you work in favorite programming language then there
were a lot of different options this is like one question where they picked
multiple options so it's how happy are you in your current position with the
following you have your salary work life
10:26:43
balance um then we have co-workers management upward Mobility learning new things
um and they could rank it from zero to 10 so some people ranked upward Mobility a
10 some ranked it a zero or a one um and again they can answer however they want
how difficult was it to break into Data very very difficult very easy um if you're
looking for a new job we have you know what would you be looking for remote work
better salary Etc we have male female which country you're from and then this is more
like demographics
10:27:17
so if you're a male how old you are and this was in a Range so this is like a
sliding bar so you could slide it to the exact age you had there's some people who
are apparently 92 um which if that's true I mean good for you man or woman actually
really quickly I'm going to see just just while we're here I'm going to see if this
is a male or a female oh it's a female from India very cool um so we have all
this information and it is a lot of information when you
10:27:47
have something like this I mean there is so much data cleaning that can be done I
mean I already see like 20 plus different things that I would need to do to make
this a lot better um and we also have date Taken and the time taken as well as
how long they took on it like the time spent really just really interesting data
but again this is a beginner tutorial Series this is the beginner project so we're
not going to do anything too crazy I will be using this exact data set in a
future
10:28:21
video doing a lot more data cleaning and creating a much more advanced
visualization with what we have and what we're looking at right here but for this
video we're just going to be doing a pretty simple visualization and dashboard
that you can use uh to practice with or put on your portfolio if you know that's
where you're at right now so let's get out of here and let's put this into powerbi
so let's exit out and let's come right over here to import data from Excel we'll
click on powerbi
10:28:48
final project and open give that a second doing this all in real time we only have
the one so we won't be practicing any joins or anything but we're not
going to load it we're going to transform this data so let's put it into power
query editor and now we have all of our data in here and it should look extremely
familiar now when I'm looking at this when I start looking at this information I
kind of need to know beforehand what I want to get out of this do I need to
10:29:22
clean every single column do I just need to clean a few of them do I need to get
rid of columns that's kind of where my head's at and so right off the bat I can
already tell you that there are columns that we can just delete to get out of our
way so we're going to do that at the beginning so that we don't have to do that
later on or they're just in our way so I'm going to click on Browser and then I'm
going to hit shift and I'm going to go over here to Referrer and I'm just going to go
up here
10:29:47
to remove columns and everything that we do is going to go over here to this
applied steps if you've been following this series um you know we can remove things
add things but anything we do will show up right over here so we can track it and
go back if we need to now one column that I know for sure that I'm going to be
using quite a bit is this which title fits you best in your current role because I
specifically wanted to do a breakdown of different people's roles and how much
they make
10:30:13
and different stuff like that so I know that I want to use this but as we saw
before there's kind of the issue that it's not very clean right it has data
analyst data architect engineer scientist database developer and then like a
hundred different options and then a student or or none of these right um and so
for the purpose of this video right here we are not going to take every single one
of these options because this involves a lot more data cleaning let me give you an
example this
10:30:47
says software engineer this also says software engineer and with AI these two would
typically be combined or standardized to software engineer but it's not very easy
to do that in powerbi we could do that in Excel but not really in powerbi or even
SQL if we pull this from a SQL database um and you can find lots of different you
know options of that we have data manager and data manager if we separated these
out these would be different options when we created our visualizations and we
don't
10:31:20
want that so what we are going to do uh and this is going to be kind of a an easy
way out to just make sure that this is pretty clean and we don't have a
thousand different options we're going to convert these to other so we're going to simplify
this a lot and then we're going to use this so we'll have maybe six or seven
options instead of the you know let's say 50 that we would have if we actually did
the harder work which is to just break it out standardize it and clean it up that way so
what we're going
10:31:50
to do is we're going to click on this right here and we're going to go up here to
split column in this ribbon up top we'll go to split column and we want to do it by
a delimiter and if you notice let me see if I can move this over if you notice we
have other and then we have this parenthesis and in no other option is there a
parenthesis so what we're going to do is we're going to use a custom delimiter and we'll use
this open parenthesis what that's going to do is it's going to separate it by this
10:32:18
parenthesis it's going to leave the other it's going to create separate columns um
just one separate column for each of these and we can do that at each occurrence or
we can do the leftmost and really we only need it for the leftmost because
there's only one of these opening parentheses uh or whatever this is called
and then let's go and click okay and it should create
another column so it's going to have .1 and .2 and now we have if we click on
this now we
10:32:50
only have these options we have analyst architect engineer data scientist database
developer other and student looking or none that is what we want it makes it so
much simpler and it's not perfect but again I'm trying to show you what we are able
to do in powerbi so now we're just going to remove that column and we're going to
go and do the exact same thing to this one as well because I know that we want to
use this and I really wanted to use this one as well but if we look at this one
also um
10:33:21
there's a lot so I said what is your favorite programming language and people there
were pre-selected answers like JavaScript Java C++ python R things like that and
then there was an other option and in this other option I mean it was free text so
they can fill it in as they want I mean there's four five six different ways that
people put SQL that is something I would standardize and you know that would be the
way I cleaned it but that's not how we did it in here so we're going to do the same
thing we're
10:33:49
going to keep that other so we're going to split this column again we'll use a
delimiter and for this delimiter though we're going to use a colon so we're going
to say we're going to do a colon right there we'll just do the leftmost we'll click
okay and then we have our options and it's much simpler now I really would have
rather kept all these because SQL's in there quite a bit but you know a lot of
people don't think SQL is even a programming language so uh
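As a rough sketch of what these two splits accomplish, the snippet below keeps only the text to the left of the first delimiter, which is what a leftmost split followed by removing the second column amounts to. This is plain Python for illustration, not what Power Query runs internally, and the helper name `keep_left_of` and the sample answers are hypothetical.

```python
def keep_left_of(value: str, delimiter: str) -> str:
    """Return the text before the first (leftmost) delimiter, trimmed."""
    left, _, _ = value.partition(delimiter)
    return left.strip()


# Hypothetical sample answers shaped like the ones described in the survey.
titles = [
    "Data Analyst",
    "Other (Please Specify):Software Engineer",
    "Other (Please Specify):Data Manager",
]
languages = ["Python", "Other:SQL", "Other:sql"]

# Split the job title on the open parenthesis and the language on the colon.
clean_titles = [keep_left_of(title, "(") for title in titles]
clean_languages = [keep_left_of(language, ":") for language in languages]

print(clean_titles)
print(clean_languages)
```

Values that never contain the delimiter pass through unchanged, so the pre-selected answers stay as they are while every free-text answer collapses to "Other".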
10:34:18
we're going to delete that column now one that I just skipped and I kind of wanted
to go back to is this current yearly salary I really want to use this let's see if
we can use it here's what I want to do with it and this is not perfect um for
this video I want to try it what I want to do is break up these numbers 106 125 and
then take the average of those numbers so then we'll use some DAX in there so
we'll take 106 125 create that into two separate columns then we'll create a third
column
10:34:49
that will give us the average of those two numbers so we'll do 106 plus 125 divided
by two and then we'll have the average of that now that is not perfect but it's
going to give us at least you know an average of kind of roundabout number because
they gave us this range they said my salary is between 106 and 125,000 so if we say
that their salary was 112,000 at least gives us it makes it usable it's a numeric
value instead of being this which is text which we really we could use and and I'll
show you how
10:35:19
to do that because we're going to keep this column I'll create a copy of this and
I'll show you the difference between this and using the average but for
this data cleaning portion let's just try it let's see what we can do and see if we
can make it work so first let's create a duplicate so we're going to uh duplicate
the column so now we have this copy at the very very end and we can use this one
instead of having to use the original way way way back here so we're going to leave
that one how it
10:35:52
is and we're going to use this one so let's go ahead and split this one up we're
going to click on the column header then we're going to click on split column and
we'll do it by digit to non-digit and if you look at it right here it's broken it
out kind of um in the fact that now in this one we just have numeric values and in
this one we have k- numeric or just Dash numeric and now this can be easily cleaned
whereas this one we can just completely get rid of because it's only K so we'll
just
10:36:28
remove that column and then in this one we're going to rightclick we're going to
click on replace values and so if it just has a k we'll replace it with
nothing we'll do okay and then for the last one we'll go to replace values and
we'll do the dash or the minus sign and we'll replace that with nothing and so now we
have our values as well oh we also have a plus let me get rid of that because
that's when some people had 250 or 225,000 plus so for that one the
10:37:00
average is just going to be 225 we'll have to specify that in our DAX I forgot but
actually if somebody has 225 let me find this plus really quick uh let me filter by
it because that's a lot faster what we actually want to do for the purpose of this
one is we want to put 225 here so that when we do 225 plus 225 divide by two it
comes out to 225 that's just what we're going to put it as and there's only two
people so uh I'm actually going to replace this I'm going to do replace values I'm
going to say
10:37:32
Plus with 225 and we'll click okay awesome we can unfilter these select all so
we're going to go right up here to add column we're going to say custom column and
we're going to go right over here actually let's make it uh average salary let's
make it average salary so we're going to insert this I'm going to say parentheses
and we're going to say plus this insert and close the parenthesis divided by two
and it says no syntax errors have been detected let's click on okay and
10:38:17
it's giving us an error so it's saying we cannot apply operator plus to types text
and text which makes perfect sense these aren't uh numbers so let's make it a whole
number and let's make it a whole number and then let's see if this will actually
work no or maybe we just need to try a whole other one so let's try transform or
add column custom column let's try this all again see if uh I can make it work
insert do this one plus this one and we'll do divide by two and let's
10:38:59
try this one and there we go so now let's get rid of this column and we can
actually remove these ones as well because now we have this um average salary
column which when we look at this or when we use this uh we can let me see if I can
just move this way way way over all right I might cut because this is taking
forever so if you take the average of these two numbers you'll get 53 if you take
the average of 0 and 40 you'll get 20 so now we have this average salary and again
when we get to
10:39:36
the actual visualization part I'll show you why this isn't as useful as having this
average salary and just a reminder this is not perfect uh I wouldn't typically do
this especially if I had it in Excel or if I was you know creating this survey in a
different way I would probably have a very specific value where they could do it on
a slider but this is how it is so we've at least made it usable or more usable in
my mind and we have a few other things that we can change like what industry do you
work in
10:40:05
where we can break this one out so I'm going to go ahead and break this one out as
well as this one right here which country do you live in I'm going to break
both of those out to where it's the country or other I'm not going to have these
other values although there are a lot of them because there's a lot of people who
live in these different countries but we can't really do that super well in here
because again the same issue kept happening Argentina Argentina Argentina
10:40:31
Australia so we can't normalize those values unless we spend just a copious
amount of time doing that so I'm going to go ahead and do these I'm going to
speed this up so it goes a lot faster so I'm just going to go silent
and let this happen really quick and then we'll get to the end and we'll actually
start building our visualizations all right so we've split them up and as you can
see we have all these options as well as other and I think you know there is
let me tell you
10:41:08
there is so much more that we could do with this I mean just so many other things
but this is like the bare minimum of what we need for this project so let's go
ahead and close and apply this and if we need to come back at any point and
actually fix anything or change anything we can so it's not like that's permanent
um so as you can see we have everything over here we have all our data as it is
transformed in here as well and now we can start building out our visualization
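For reference, the salary steps described earlier (strip the k, dash, and plus, split the range into two numbers, convert them from text to numbers, then average them) can be sketched in a few lines. This is a minimal illustration in plain Python rather than the Power Query steps themselves, and the function name `average_of_range` and the sample strings are assumptions.

```python
import re


def average_of_range(salary_range: str) -> float:
    """Turn a range such as '106k-125k' into the midpoint of its two bounds."""
    # Pull out the digit runs and convert them from text to integers first;
    # the values have to be numeric before they can be added and divided.
    numbers = [int(match) for match in re.findall(r"\d+", salary_range)]
    if len(numbers) == 1:
        # Open-ended top bracket such as '225k+': reuse the one number for
        # both bounds so the average comes out to that same number.
        numbers = numbers * 2
    if len(numbers) != 2:
        raise ValueError(f"expected one or two numbers in {salary_range!r}")
    low, high = numbers
    return (low + high) / 2


# Hypothetical sample values shaped like the salary ranges in the survey.
for raw in ["0-40k", "41k-65k", "106k-125k", "225k+"]:
    print(raw, average_of_range(raw))
```

The midpoint is only an approximation of what each person earns, but it gives a numeric column that can be averaged in a chart.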
let's go back to our
10:41:42
report and let's start building something out all right so let's add a title to our
dashboard we want to make this right at the top we'll call this the data professional
survey breakdown and let's make that quite a bit larger make it bold why not
and we'll put that in the center and now let's um let's add some effects let's
change that background to something like it's too dark something like this and I do
not like that bold let's take that off there we go so something like this
10:42:26
just as a quick title to what we're about to do what we are about to build so we're
going to start off with the most simple visualizations that we're going to do and
we'll kind of work our way towards kind of the harder ones so the first one that
we're going to start off with is a card and the cards are obviously like just super
super easy they usually just display one piece of information so we're going to go
right over here to the very bottom at the unique ID and we're going to select it
10:42:54
and we're going to say a count distinct or a count it doesn't matter um it
says 630 count of unique ID now we're not going to keep that as is we're actually
going to go right over here we're going to say rename for this Visual and it says
count of unique ID but we're going to say count of survey takers and you can say
whatever you want here but in general that is what it is we're counting
how many people um you know took this survey and that's just a kind of a total
10:43:25
maybe I should say total amount of survey takers but you can say count of survey
takers how many people took this survey so let's click out of there let's click on
card let's make it about the same size we're going to drag it up here and try to
make them about the same we will in a little bit we'll make them the same size um
but for this one we're going to look at age so we're going to look at current age
so I'm going to click on that and we'll say we want the average
10:43:54
age so our average survey taker is almost 30 years old so let's go right over here
we're going to say rename for this visual we'll say average age of survey oop
this might be too long average age of survey taker again name it whatever you'd
like so again these are meant to be highlevel numbers so when somebody's looking at
your dashboard they can just really quickly glance at this and know exactly what it
is instead of like some of these other visualizations that we're about to
10:44:25
create they don't really have to dig into it look at the x- axis the y axis the the
different uh Legend colors and whatnot they can just see these high-level numbers and get
a really quick glance of the data now let's create our first visualization and what
we're going to do for that one is a clustered bar chart so let's go ahead and click
on the clustered bar chart we can create as small or as large as we'd like and for
this one we're going to be looking at the job titles now remember we kind of
10:44:54
changed the job titles or you know uh transformed those if you want to say that so
we're going to look at Job titles and then we're going to look at their average
salary and if you remember we transformed that one as well we have an average salary
now this one is it looks like a text right now so it may not work properly and what
we're actually going to do is go over here I want to see the average salary so
let's click on average salary and see if we can change this data type from a text
to a decimal number let's
10:45:27
click yes I forgot to do that when we were transforming it and there we go this is
perfect um so now we can go back and we can select our average salary and as you
can see it has this um this function symbol and so now we can click on it and it'll
look a lot better and although this says average salary as the title it's actually
doing a count or the sum so we can click average right here and what we want to do
is actually break this down by the job title and so now we can see data scientists
are
10:46:00
making the most by far they're making an average of 93,000 at least from the
survey takers that took it then we have our data Engineers making 65,000 data
Architects are making 63 and then where are the data analysts data analysts are right
here making 55 so again we had 630 people take this survey and so the vast majority
of them were data analysts so this one's probably the most accurate out of all of
them and I actually don't like how this looks as the clustered bar chart let's try
the
10:46:32
stacked bar chart and put this as the legend that's more what I was going for I
don't know I didn't want it as skinny because when you're doing this one typically
they have multiple options per um uh x axis and so I think that's why it was that
little skinny line but this one is more what I was looking for but let's make that
smaller and let's definitely change that title because good night um this is like
incredibly long let's go over here to this format visual we'll go to the
general the
10:47:05
title and we're just going to say average salary by job title just like that and
this looks a lot better now we're not going to kind of format our whole
dashboard yet we're going to create our visualizations and then we're going to kind
of organize everything and kind of play Tetris with it to make it look the best so
we're just going to minimize this and put it right up here for now um but we will
go back and kind of make everything look better at the end and actually while we're
here I also
10:47:40
want to change this as well so rename for this we're going to say job title Oops
why did I do that job title and for this one we're just going to say average
salary there we go looks much better much cleaner uh took away a lot of the anxiety
that I was feeling about two minutes ago when we first put that up there so let's
go on to our second visualization the next one that I'm interested in is actually
what programming language people were using the most so we have salary there's a
10:48:20
thousand different things we can look at in here but I want to know you know what
is people's favorite programming language so let's take a look at that so we have
favorite programming language let's find that so we have our favorite programming
language and we also have how many people actually took it or the unique people so
right now this is columns we don't want that let's um let's do a clustered column
chart click on this right here and it looks like here we go that is kind of what
10:48:51
we're looking for and instead of count of unique ID we'll say count of let's do
count of Voters and for favorite program language we'll say favorite oops favorite
programming language and get rid of that as well and then we're going to go into
here also and change the title and say favorite programming languages or favorite
programming language just like this now let's make this a lot bigger so you can
see it but really quickly at a glance you can see python is by far the most popular
R
10:49:32
other C++ JavaScript Java now all we're seeing is the count so it's all the same
it's just blue we can see how many people voted for each one but if we wanted to
break it out similar to how we did with the job titles we could still do that so
all we'd have to do is break it out uh or bring this job title down to the legend
and now breaks out like this and that's not exactly what I was going for I was
going more for something like this where we can see the still the whole count but
now we can see who is
10:50:01
actually voting for these things so I'm just not a huge fan of the colors that
are pre-selected here and kind of the whole theme of this dashboard at the very end
we're going to completely revamp this change a bunch of colors the background and
make this look a lot nicer rather than just the white background like we have it um
and so for now let's just make this a lot smaller and put it into this corner these
will not be staying there but we need to we need room to create our next
visualizations
10:50:31
and just just a cleaner space to do things now the next thing that I really want to
include is a way to break down where they're from their country because especially
something like salary is very dependent on your country whereas the average salary
in the United States for a data analyst may be like 60,000 in another country it
could be 20,000 that could bring down the average quite a bit so we need a way to
be able to break that down now we can do something like a filled map and there's no
problem with
10:51:00
that at all um but you know for what we're building what we're creating it's not
probably going to work out the best I mean this looks okay we could stick it in the
corner or something um and you can do that and that's perfectly fine I think what
I'm going to do is something like a tree map which I don't use a lot but I want
something where they can just click on it they can look at the values distinct they
can look at the values and just click on it and it'll be right
10:51:30
there for them so they don't have to filter it out on their own or know geography and
look at this map they can just read Canada other United Kingdom India United States
and click on that and so for example let's click over here on United States the
numbers change quite a bit now the average salary for a data scientist is 139,000
for data analyst it's 80 and if we look at India you know the average salary for a
data scientist is 68 the average salary is 26 for a data analyst that doesn't mean
that they make less
10:51:59
money in India that just means that the cost of living is probably lower in India
therefore they don't need the higher US Dollars salary because again this was all
done in US dollars so just something to think about uh let's click out of that so
we'll keep that one as well so now let's create our next visualization and this is
one that I do not get to use enough in my actual job so we're going to use it in
this project um and it's going to be this gauge right here so let's add that one
put it right
10:52:25
over here we're going to add two of those let's just go ahead and add another one
while we're at it because we're going to have them kind of like right here right
next to each other the first one and these ones are really good for kind of looking
at these kind of surveys and I don't get to work with surveys enough but we can see
you know how happy are they in terms of work life balance so we can add that we're
going to add work life balance um and right now it's doing a count and we don't
have
10:52:51
minimum or maximum values in there yet so it's going to look kind of weird but
we're going to look at the average rate or the the average score of these then
we're going to pull this over to the minimum value and we want to put that at the
minimum and pull this over and add the maximum value so now it actually has zero to
10 and it shows that the average person is happy with which one was this their
average person is happy with their work life balance uh they rate about a 5.74
overall now let's really quickly
10:53:23
change the title of this because this is ridiculous I want to say happy with work
life balance so this is their rating uh you know change it to whatever title you
want that's what I'm going to do and we'll also do happy with their salary let's
click on salary We'll add that to minimum and we'll add the maximum value as well
to make sure that we know how to use that and then we'll take the average so not
many people are happy with their salary I'm just finding out I mean this
10:53:55
is a real survey this is real data so I mean it's pretty interesting let's go to
the title let's go to happy with or maybe it's happiness happiness with salary
maybe that's what we should make it and I'm going to change that over here as well
I think it sounds better some of this I've already planned out some I haven't this
is not something I've planned out so uh so we're going to say happiness with work
life balance happiness with salary really interesting
10:54:23
um we may go back and tweak these just a little bit in the future but the very last
visualization that we're going to do is male versus female kind of got to have that
in there um I don't typically like pie charts and donut charts but uh you know I'm
feeling I'm just feeling it so let's try it um and we will do let's see let's make
this larger so we have male female and what do we want to look at like what do we
want to measure so we have male versus female we can measure anything um but maybe
what we'll do is
10:54:56
the average salary again I mean we've kind of only looked at salary once in this
one right here um and a little bit of like how happy they are but we'll look at the
average salary between males and females and then we'll look at not the current age
Oops I meant average salary and then we'll look at the average and it looks like
the average salary is actually really close versus males versus females 55 for
female versus 53 for male so actually the females are a little bit higher
congratulations so they're just a little
10:55:34
bit higher in terms of pay so now we need to start organizing all of this cleaning
it up making it look a lot better than it does right now it looks great uh you know
but we can do a lot more with this so I'm gonna we're we're going to keep these or
all these kind of over on this left hand side I'm gonna put this I want this up here
we also need to change that title I want this up here um and again we're going to
kind of change the theme as we go I I just want to format it right we'll have it
just like this let's
10:56:08
change the title of this let's go to title and we're going to say country of survey
takers uh I'm not the the survey takers I'm not really stuck on that if you find
something better you think of something better I would go with that but um you know
it definitely doesn't look bad and where did this where did my other visualization
go there goes um I think this one I want to make kind of more tall um so I might
move it this way jeez this is such a I hate I hate having a lot of visualizations
on here it just
10:56:41
really is annoying to me so what we're going to do I think we're gonna step this to
the side put this to the side as well I want to make it to where it's just okay I
didn't want it to cut off we'll do that might make these um make these a little
bigger actually so I want it to kind of match the size like right there I'll match
this perfect this one I kind of want to bring over here and bring it down a little
bit maybe something like this maybe I'm not sure I'm not I'm not
10:57:25
sold on that um I added a few different visualizations that I didn't have in my
original so now I'm kind of having to do this on the fly so I might fast forward
some of the parts where I'm like really thinking about it or taking too much time
on it but I'm going to bring this down a little bit actually because I don't like
how close that is to um the the text above it but one thing we do need to do I'm
going to put this up kind of like this I think that looks fine I think I'm
10:57:56
going to put this at the very bottom so let's make some room for it all right just
like that stretch it to the side and we'll lower it and I think we'll keep that as
is kind of like this um okay there's a lot going on in here and there are some
things I'm just noticing as we're walking through this that I kind of missed um
like I need to change some titles and stuff like that so let me go ahead and change
some of those things so we're going to do title do average salary by gender or by
10:58:37
sex do like that average salary by sex I also don't like that it's in the middle um
I don't like that it's on the outside I want them on the inside for this so let's
go to the details let's go to inside and see if that looks any better oh that looks
terrible um let me see if I can change that maybe I don't no I definitely want it
um I guess we'll do outside I you can't even see the information oh the decimal is
crazy long um let me go and see if I can change that decimal to just like a
10:59:12
whole number or like 1.1 uh because that's a problem so maybe I need to go over
here to the value all right so I think I want to change this one it's just not
working out exactly how I wanted and you guys know if I make mistakes I'm going to
keep it in here so you guys can see it I I hoped that this was going to turn out
better but it didn't um one that I do want to add because this is kind of a a
breakdown and a nice visualization I want to add this difficulty piece so I want to
add this how difficult was it
10:59:43
for you to break into data science let's get rid of these and I want to click on
this really quickly see what it gives us um values okay so now this shows us
percentages um of how easy it was again it's neither easy nor difficult difficult
easy very difficult very easy these numbers make absolutely no sense we need to
kind of order them a little better so I'm going to come over here to slices we have
our colors over here we want very difficult to be like the most difficult um so
we're going to make that
11:00:20
red and then we want difficult to be maybe like an orange let see if we can find an
orange there we have an orange this does not look red enough there we go oh no no
no very difficult is red difficult is orange we have neither easy nor difficult and
that's kind of a neutral um let's see if we have something neutral in here kind of
like this yellow I don't know let's try it out then we have easy and very easy and
these will be like our Blues so I'm going to keep that um I'm
11:00:55
going to keep that kind of like a dark blueish and then our blue for super easy is
just going to be like really blue uh and that doesn't look bad I mean look I'm
I'm not a color person I I'm not great with colors and we're going to kind of
organize this in just a little bit but this looks better to me um but we need to
change up some stuff as well like the title need to do difficulty to break into
Data there we go and we're also going to change this title right here we're just
11:01:36
say difficulty difficulty difficulty this looks better to me um again not perfect
and there's a thousand different things you could have done but that's just what
we're going to do I need to go through here and see what I need to change so right
off the bat I can see I need to change this um to let's see right here I'm going to
rename this job title just like we did in this one right here uh count of Voters
that's fine programming language breaking into difficulty happiness happiness average
count okay okay so
11:02:15
what we have here is very close to a finished product now it's not 100% complete I
mean I I do want to make it look a little nicer rather than just the typical white
so what we're gonna do we're gonna go up here we'll go to uh what is it View and we
have all these different themes and we're just going to play around with it see if
we can find something that we like um this doesn't look too bad it's not really my
style we can do this one Frontier this is pretty neat I kind of am digging this
11:02:51
we might come back to it I like the natural tones I don't know why I said tones
like that but I did um this one's not bad but I don't I don't it's not that's not
my I don't like how dark that is um and so maybe it's like you know we change like
the background color of all of these as well as match it with um match it with
something else whatever you want genuinely you customize this however you want I
kind of like this one it's kind of groovy man and um it's not
11:03:24
perfect by any means but what we can do and we can customize this current theme we
can come in here customize this theme however we'd like I personally don't want
color five which is the data analyst color I don't like it to I don't want to go go
and change it because I don't like it but I don't really like that color per se you
know I might want to choose a different color um but it has to be like this muted
like that it has a style to it so you can come in here and you can customize this
and make
11:03:56
it however you'd like and and really mess around with it play play around with it
for me uh I'm just going to keep it how it is because I don't really want to mess
with it and break it or anything like that so uh let me just put that up just a tiny
bit so this is it this is the project I hope that it was helpful um I am not joking
when I say that I'm because I'm gonna do a different project I'm gonna go really in
depth in another project it's probably gonna be like a two-hour project it's going
to be crazy
11:04:25
long um well for a YouTube video but I can see doing a thousand different things with
this data creating a really great dashboard really cleaning the data which is a
large part of of actually doing this and we didn't do much data cleaning at all
there's just so much you can do with this and so really dig into this see what you
like see what you don't like see what you want to clean what you don't want to
clean you could put it in SQL you could put it in um Excel and just and just
standardize the data to
11:04:55
make it a lot more usable do whatever you want with it I mean I I took this survey
for you guys so that we could use it so go out and use it and make the best dashboard
that you can possibly do so I hope that this was helpful I hope that you enjoyed
this thank you so much for watching if you like this video be sure to like and
subscribe below and I'll
see you in the next [Music] video what's going on everybody welcome back to another
video today we're going to be
11:05:34
starting our Python tutorial [Music] series now I am extremely excited for this
series we're going to be walking through all the things that you need to know to
get started in Python we'll be looking at variables data types for loops while loops
operators and a ton more after this beginner series we're going to be going into
another set of series where we look at pandas Matplotlib Seaborn web scraping and
more now in this video we're just going to be setting up our environment to where
we
11:06:04
can learn python in future videos in this series we're going to be using Jupyter
notebooks for all of our tutorials because I feel like it's a really great place to
learn the basics but then in future videos I'll show you different IDEs that you
can use for your python code I genuinely cannot wait to get started on this series
I absolutely love python so without further Ado let's jump on my screen I'm going
to show you how to install jupyter notebooks all right so let's get started
11:06:24
by downloading Anaconda Anaconda is an open-source distribution of Python and R
products so within Anaconda is our Jupyter notebooks as well as a lot of other
things but we're going to be using it for our Jupyter notebooks so let's go right
down here and if I hit download it's going to download for me because I'm on
Windows but if you want additional installers if you're running on Mac or Linux
then you can get those all right here now if you are running on Windows just make
sure to check your
11:06:50
system to see if it's 32-bit or 64-bit you can go into your about in your system
settings to find that information I'm going to click on this 64 bit it's going to
pop up on my screen right here and I'm going to click save now it's going to start
downloading it it says it could take a little while but honestly it's going to take
probably about 2 to three minutes and then we'll get going now that it's done I'm
just going to click on it and it's going to pull up this window right here we are
11:07:17
just going to click next because we want to install it this is our license
agreement you can read through this if you would like I will not I'm just going to
click I agree now we can select our installation type and you can either select it
for just me or if you have multiple admin or users on one laptop you can do that as
well for me it's just me so I'm going to use this one as it recommends now it's
going to show you where it's installing it on your computer this is the actual file
path
11:07:44
it's going to take about 3.5 gigs of space I have plenty of space but make sure you
have enough space and then once you do you can come right over here to next and now
we can do some Advanced options we can add Anaconda3 to my path environment
variable and when you're using python you typically have a default path with
whatever python IDE or notebook that you're using I use a lot of Visual Studio code
so if I do this I'm worried it might mess something up so I am not going to do this
it also says it doesn't
11:08:15
recommend it again messing with these paths is kind of something that you might
want to do once you know more about python so I don't really recommend you having
this checked we can also register Anaconda3 as my default python 3.9 you can do this
one and I'm going to keep it this way just so I have the exact same settings as you do so
let's go ahead and click install and now it is going to actually install this on
your computer now once that's complete we can hit next and now we're going to hit
next
11:08:43
again and finally we're going to hit finish but if you want to you can have this
tutorial and this getting started with Anaconda I don't want either of them because
I don't need them but if you would like to have those keep those checked and you
can get those let's click finish now let's go down and we're going to search
for Anaconda and it'll say Anaconda Navigator and we're going to click on that and
it should open up for us so this is what you should be seeing on your screen this
is
11:09:11
the Anaconda Navigator and this is where that distribution of python and R is going
to be so we have a lot of different options in here and some of them may look
familiar we have things like Visual Studio Code Spyder RStudio and then right up
here we have our Jupyter notebooks and this is what we're going to be using
throughout our tutorials so let's go ahead and click on launch and this is what
should kind of pop up on your screen now I've been using this a lot um so I have a
ton
11:09:39
of notebooks and files in here but if you are just now seeing this it might be
completely blank or just have some you know default folders in here but this is
where we're going to open up a new Jupyter notebook where we can write code and all
the things that we're going to be learning in future tutorials and you can use this
area to save things and create folders and organize everything if you already have
some notebooks from previous projects or something you can upload them here but
what we're going to
11:10:06
do is go right to this new we're going to click on the drop down and we're going to
open up a Python 3 kernel and so we're going to open this up right here now right
here is where we're going to be spending 99% of our time in future videos this is
where we're going to write all of our code so right here is a cell and this is
where we can type things so I can say print I can do the famous hello world and
then I'll run that by clicking shift enter and this is where all of our code
11:10:34
is going to go these are called cells so each one of these are a cell and we have a
ton of stuff up here and I'm going to get to that in just a second one thing I
wanted to show you is that you don't only have to write code here you can also do
something called markdown and so markdown is its own kind of you could say language
but um it's just a different way of writing especially within a notebook so all
we're going to do is do this little hashtag and actually I think it's a pound sign
but
11:10:59
I'm going to call it hashtag we're going to do that and we're going to say first
notebook and then if I run that we have our first notebook and we can make little
comments and little notes like that that don't actually run any code they just kind
of organize things for us and I'm going to do that in a lot of our future videos so
just want to show you how to do that now let's look right up here a lot of these
things are pretty important uh one of the first things that's really important is
actually
11:11:21
saving this so let's say we wanted to change the title to I'm going to do AAA
because I want it to be at the beginning um so I can show you this I'm going to do AAA new
notebook and I'm going to rename it and then I'm going to save that so if I go
right back over here you can see AAA new notebook that green means that it's
currently running and when I say running I mean right up here and if we wanted to
we go ahead and shut that down which means it wouldn't run the code anymore
11:11:50
and then we'd have to start up a new kernel uh so let's go ahead and do that I
didn't plan on doing that but let's do it so we have no notebooks running and right
here it says we have a dead kernel so this was our Python 3 kernel and now since I
stopped it it's no longer processing anything so let's go ahead and say try
restarting now and it says kernel is ready so it's back up and running and we're
good to go the next thing is this button right here now this is an insert cell
below so if I
11:12:18
have a lot of code I know I'm going to be writing I can click a lot of that and I
often do that because I just don't like having to do that all the time so I make a
bunch of cells just so I can use them you can delete cells so say we have some code
here we'll say here and we have code here and then we have this empty cell right
here we can just get rid of that by doing this cut selected cells we can also copy
selected cells so if I hit copy selected cells and I can go right here and say
paste selected
11:12:48
cells and as you can see it pasted that exact same cell you can also move this up
and down so I can actually take this one and say I wanted it in this location I can
take this cell and move it up or I can move it down and that's just an easy way to
kind of organize it instead of having to like copy this and moving it right down
here and pasting it you can just take this cell and move it up which is really nice
now earlier when I ran this code right here I hit shift enter you can also hit run and
it'll run the cell and select the one
11:13:17
below so you can hit run and it works properly if you're running a script and it's
taking forever and it's not working properly at least it's you don't think it's
working properly you can stop that by doing this interrupt the kernel right here
and anything you're trying to do within this kernel if it's just not working
properly it'll stop it you can restart it then you can try fixing your code you can
also hit this button if you want to restart your kernel and this
11:13:40
button if you want to restart the kernel and then rerun the entire notebook as we
talked about just a second ago we have our code and our markdown cells we're not
going to talk about either of these other options because we're not going to use them throughout
the entire series the next thing I want to show you is right up here if you open
this file we can create a new notebook we can open an existing notebook we can copy
it save it rename it all that good stuff we can also edit it so a lot of these
things that we were
11:14:06
talking about you can cut the cells and copy the cells using these shortcuts if you
would like to we also go to view and you can toggle a lot of these things if you
would like to which just means it'll show it or not show it depending on what you
want so if we toggle this toolbar it'll take away the toolbar for us or if we go
back and we toggle the toolbar we can bring it back we can also insert a few
different things like inserting a cell above or a cell below so instead of using
this plus button you can just press
11:14:31
A or B to add a cell above or below we also have the cell menu in which we can run our cells or
run all of them or all above or all below and then we have our kernels right here
which we were talking about earlier where we can interrupt it and restart those
there are widgets we're not going to be looking at any widgets in this series but
if it's something you're interested in you can definitely do that and then we have
help so if you are looking for some help on any of these things especially some of
these
11:14:57
references which are really nice you can use those and you can also edit your own
keyboard shortcuts and now that we walked through all of that you now have Anaconda
and jupyter notebooks installed on your computer in future videos this is where
we're going to be writing all of our python code so be sure to check those out so
we can learn python together thank you guys so much for watching I hope you were
able to get everything installed correctly I am super excited for this series ahead
of us if you like this video be sure to
11:15:19
like And subscribe below and I will see you in the next [Music] video [Music] hello
everybody today we're going to be learning about variables in Python a variable is
basically just a container for storing data values so you'll take a value like a
number or a string you can assign it to a variable and then the variable will carry
and contain whatever you put into it so for example let's go right over here we're
going to say x and this is going to be our variable we're going to say is equal to
now we can
11:15:58
assign the value to it so let's say I want to put 22 x is now equal to 22 so we
won't have to write out the number 22 in later scripts that we write we can just
say x because X is equal to 22 it now contains that number so now we can hit enter
and say print we do an open parentheses and we'll say x now I'm going to hit shift
enter and now it prints out that 22 because we are printing x and x is equal to 22
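As a rough sketch of what was just typed into the notebook cell (the comments are added here for explanation and were not part of the original cell):

```python
# Assign the value 22 to the variable x.
x = 22

# Print whatever x currently contains.
print(x)
```

Running the cell with shift enter shows the value stored in x below the cell.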
this is our value and this is our variable one really great thing about variables
is that it assigns its own
11:16:36
data type it's going to automatically do this so we didn't have to go and tell X
that it's an integer it just automatically knew that 22 is a number so we can check
that by saying type and then open parenthesis and writing X and we'll do shift
enter again and this says that X is an integer type now we only assigned an integer
to X let's try assigning a string value or some text to a variable so we'll say Y
is equal to uh let's say mint chocolate chip I'm feeling some ice cream today so
we'll
11:17:10
say mint chocolate chip now if we print that again we'll do print open parenthesis
Y and do shift enter it'll print mint chocolate chip and if we look at the type we
can see that the type is a string this time and not an integer now again we did not
tell it that X was an integer and Y was a string it just automatically knew this
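A minimal sketch of the type check described above. The capitalization of the ice cream string and the helper names x_type and y_type are assumptions made for illustration, since the transcript does not show the cell itself:

```python
x = 22                       # a whole number
y = "Mint Chocolate Chip"    # text in quotes

# type() reports the data type Python inferred from each value.
x_type = type(x)
y_type = type(y)

print(x_type)
print(y_type)
```

Nothing in the code declares a type. Python works it out from the value on the right-hand side of the equals sign.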
let's go up here really quickly we're going to add several rows in here because
we're about to write a lot of different variables and really learn in- depth how to
use
11:17:43
variables the next thing to know about variables is that you can overwrite previous
variables right now we have mint chocolate chip and that is assigned to the
variable y so if I go down here I say print y I hit shift enter it's going to print
out mint chocolate chip but if I go right above it I say Y is equal to and let's
say chocolate if I print that out it's now going to say chocolate whereas up here
I'm reassigning it to Y it's still going to say mint chocolate chip so if I come
right down here and I
11:18:18
copy this and I'm going to paste this right here initially it is going to assign y
to Chocolate but then right here it will automatically overwrite y as mint
chocolate chip and when we hit shift enter it's going to show mint chocolate chip
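The overwrite behavior can be sketched like this, assuming both assignments sit in the same cell in the order described (the exact strings are illustrative):

```python
# First assignment: y holds "Chocolate".
y = "Chocolate"
print(y)

# Second assignment replaces the previous value of y.
y = "Mint Chocolate Chip"
print(y)
```

Because the code runs from top to bottom, y ends up holding whichever value was assigned last.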
variables are also case sensitive so if I come up here and I say a capital Y this
is a lowercase Y and this is a capital Y it is going to print out the correct one
instead of mint chocolate chip and then if I go down here to the print and I type
the capital
11:18:49
Y it will give us the mint chocolate chip up till now we've only assigned one value
to one variable but we can actually assign multiple values to multiple variables so
let's do X comma y comma Z is equal to and now we can assign multiple values to all
of those so we can say chocolate and then we'll do a comma oops a comma then we can
say vanilla and then we'll do another comma and we'll say rocky road now this is
going to assign chocolate to X vanilla to Y and Rocky Road to Z so what
11:19:33
we can do is we'll say print and we'll go print print print and we'll say X Y and Z
so it prints out chocolate vanilla and rocky road and these are our three different
values we can also assign multiple variables to one value and we can do this by
saying X is equal to Y is equal to Z is equal to and we can put whatever we would
like let's do root beer float then we'll come back up here we'll copy this and
let's print off our X our Y and Z and they are all the exact same now so far we've
11:20:13
really only looked at integers and strings but you can assign things like lists
dictionaries tuples and sets all to variables as well so let's go right down here so
let's create our very first list I'm going to say ice_cream is equal to and that
is our variable right there the ice cream is our variable so now we're going to do
an Open Bracket like this and we're going to come up here and copy all of these
values and we're going to stick it within our list so now within ice cream we have
three string values
11:20:45
chocolate vanilla and rocky road all within this list so what we can do is we can
say x comma y comma Z is equal to ice_cream so now these three values
chocolate vanilla and rocky road will be assigned to these three variables X Y and
Z and we can copy this print up here and we'll hit shift enter and now the X Y and
Z all were assigned these values of chocolate vanilla and rocky road now something
that we just did which is really important or something that you really need to
consider is how
11:21:22
you name your variables so right here we have ice cream now this to me is exactly
how I usually write my variables but there are many different ways that you can
write your variables so let's take a look at that really quickly and let's add just
a few more because I have a feeling we're going to go a little bit longer than what
we have so there are a few best practices for naming variables first I'm going to
show you kind of what a lot of people will do I'll show you some good practices and
I'm going to
11:21:49
show you some bad practices as well that you should avoid doing the first thing
that we're going to look at is something called camel case and let's say we want to
name it test variable case now if we have a test variable case the
camel case is going to look like this we'll have lowercase test and then we'll have
uppercase Variable and uppercase Case so testVariableCase is equal to this is what this variable is
going to look like and we can assign it vanilla swirl and this is what your camel
case
11:22:25
will look like it's going to be lowercase and then all the rest of those uh
compound words or however you want to say that these letters are going to be
capitalized to kind of separate where the words end and begin let's go right down
here we're going to copy this the next one is called Pascal case so Pascal case is
going to look just a little bit different instead of the lowercase t in test it's
going to be a capital T in Test so TestVariableCase again this is a very similar
way of writing it very
11:22:54
similar to camel case but just with a capital at the beginning now let's look at the
last one and this one is my personal favorite this one is going to be the snake
case now this one is quite a bit different in the fact that you don't use any
capital letters and you separate everything using underscores so we're going to
write test_variable_case now let me have them all in there
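As a quick sketch, here are the three naming styles side by side. The value is the one assigned in the walkthrough; nothing here is more than plain Python assignment.

```python
# The same multi-word name written in three common styles.
testVariableCase = "vanilla swirl"    # camel case: first word lowercase, later words capitalized
TestVariableCase = "vanilla swirl"    # Pascal case: every word capitalized
test_variable_case = "vanilla swirl"  # snake case: all lowercase, words joined by underscores

# Names are case-sensitive, so these are three separate variables.
print(testVariableCase, TestVariableCase, test_variable_case)
```

For what it's worth, Python's own style guide (PEP 8) also recommends lowercase words separated by underscores for variable names.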
typically these are the best practices these are what you typically want to do but
probably the best one to use is
11:23:29
this snake case right here what a lot of people say is that it improves readability
if you take a look at either the camel case or the Pascal case which you will see
people do it's not as easy to distinguish exactly what it says and the name of a
variable is important because you can gain information from it if people name them
appropriately so when I'm naming variables I usually write it in snake case because
I just find it a lot easier to read because each word is broken up by this
underscore so now let's look at
11:24:00
some good variable names these are all ones that you can use or could use let's do
something like testvar so testvar is completely appropriate we can also do
something like test_var we could do _test_var
you'll see that often as well where people will start it with an underscore you
can do capital T capital V in TestVar or you could even do
something like testvar2 now adding a number to your variable name is not inherently
a Bad Thing usually it's
11:24:44
semi-frowned upon but there are definitely some use cases where you can use it but
one thing that you cannot do is do something like putting the two at the front if
you put the two at the front it no longer works it won't run properly at all so
we're going to take that out so we can't do that so I'm going to use this as an
example of what you should not do you also can't use a dash so something like
test-var2 that doesn't work either and you also can't use something like a
11:25:17
space or a comma or really any kind of symbol like a period or a backslash or equal
sign none of those things will work within your variable name now another thing that you
can do when assigning a variable is use the plus sign so let's assign this we'll say x
is equal to and we'll do a string we'll say ice cream is my favorite and then we'll
do a plus sign and we'll say period now what this will do is it will literally add
these two strings together so let's do print and we'll do X so now it says ice
cream is
11:25:57
my favorite one thing that we cannot do in a variable is we cannot add a string and
a number or an integer so we can't do ice cream is my favorite plus 2 if we try to do
that it will give us this error right here so in this error it's saying you can
only concatenate a string not an integer to a string so only a string plus a string
for this example you can also do and we'll say x is equal to or we'll say y we'll
say Y is equal to 3 + 2 and it should output five because you can also do an
integer and
11:26:34
an integer now so far we've only been outputting one variable in the print
statement but you can actually add multiple variables within a print statement so
let's go right down here we're going to say let's give it some more right there so
we'll say x is equal to ice cream and we'll say Y is equal to is and then the last
one Z is equal to my favorite and we'll do a period at the end now we can go to the
bottom and we can say print x + y + z and
when we run
11:27:17
that we get ice cream is my favorite run together without spaces now we can add a space before is and a
space before my and when we hit shift enter it says ice cream is my favorite you
can also do this exact same thing with numbers as well so we'll say x is equal to 1 y is equal to 2 and
z is equal to 3 so this should equal six now one thing that we tried to do
was assign to one variable a string plus an integer and that did not work but what
you can do is you can take something like this and you can say ice cream and we'll
get rid of this one and
11:27:53
we'll get rid of the Z now saying plus is actually not going to work let's try
running this so again we can't concatenate these but what we can do in the print
statement is we can separate it by a comma so when we add this comma it should work
properly let's hit enter and it says ice cream 2 again this makes no sense but you
are able to combine a string and an integer separating by a comma now this is the
meat and potatoes of variables there are some other things as well but some of
11:28:21
those things are a little bit more advanced and not something I wanted to cover in
this tutorial although we may be looking at some of those things in future
tutorials but this is definitely the basics what you really really need to know
about variables I hope that this video was helpful if it was be sure to like And
subscribe below and I will see you in the next [Music] video hello everybody today
we're going to be talking about data types in Python data types are the
classification of the
11:28:57
data that you are storing these classifications tell you what operations can be
performed on your data we're going to be looking at the main data types within
python including numeric sequence type set Boolean and dictionary so let's get
started actually writing some of this out and first let's look at numeric there are
three different types of numeric data types we have integers float and complex
numbers let's take a look at integers an integer is basically just a whole number
whether it's
11:29:23
positive or negative so an integer could be a 12 and we can check that by saying
type we'll do an open parenthesis and a close parenthesis and if we say the type of
12 it's going to give us an integer or if we say a -2 that is also an integer we
can also perform basic calculations like -2 + 100 and that'll tell us it is also an
integer so whether it's just a static value or you're performing an operation on it
it's still going to be that data type if those numbers are whole numbers whether
11:29:53
negative or positive now let's take this exact one and let's say 12 and we'll do plus
10.25 when we run this it's no longer going to be a whole number it'll now be a
float so let's check this and now this is a float type because it is no longer a whole
number it's now a decimal number and the last data type within the numeric data
type is called complex let's copy this right down here now personally this is not
one that I've used almost ever but it is one just worth noting so you can do 12
plus and
11:30:25
let's say 3j and if we do this it's going to give us a complex the complex data
type is used for numbers with an imaginary part for me it's not often used but if you do use it
j is used as that imaginary unit if you use something like c or any other letter
it's going to give you an error j is the only one that will work with it now let's
take a look at Boolean values so we'll say Boolean the Boolean data type only has
two built-in values either true or false so let's go right down here and
11:30:59
say type True and when we run this it'll say bool which stands for Boolean we can do
the exact same thing with false that is also Boolean and this can be used with
something like a comparison operator so let's say 1 is greater than 5 and let's
check this this is giving us a Boolean because it's telling us whether one is
greater than five let's bring that right down here this will give us a false so
it's telling us that one is not greater than five and just as we got a false we
11:31:31
can say 1 is equal to 1 with a double equal sign and this should give us a true so now let's take a look
at our sequence type data types and that includes strings lists and tuples let's
start off by looking at strings in Python strings are sequences of
Unicode characters when you're using strings you put them either in single quotes
double quotes or triple quotes I call them apostrophes it's just what I was
raised to call them but most people who use Python call them quotes so right here
we
11:31:59
have a single quote and that works well we can do a double quote and that works
also and as you can see they are the exact same output and then we have a triple
quote just like this and this is called a multi-line so we can write on multiple
lines here so let's write a nice little poem so we'll say the ice cream vanquished
my longing for sweets upon this diet I look away it no longer exists on this day
and then if we run that it's going to look a little bit weird it's basically giving
us the raw
11:32:40
text which is completely fine but let's assign this to a variable and we're going to
call this variable multiline and we're going to come down here and say print and
before I run this I have to make sure that this cell has been run so now let's print out our
multi-line and now we have our nice little poem right down here now something to
know about these single and double quotes is how they're actually used so if we use
a single quote and we say I've always wanted to eat a gallon of ice cream and then
we do an
11:33:16
apostrophe at the end obviously something went wrong here what went wrong is when
you use a single quote and then within your text within your sentence you have
another apostrophe it's going to give you an error so what we want to do is
whenever we have a quote within it we need to use a double quote these double
quotes will negate any single quotes that you have within your statement they won't
however negate another double quote so you need to make sure you aren't using
double quotes
11:33:45
within your sentence if you want to do something like that you need to use the
triple quotes like we did above so we can do triple double quotes and then let's paste
this within it and anything you do Within These triple quotes will be completely
fine as long as you don't do triple quotes within your triple quotes we'll say this
is wrong so even though it's between these two triple quotes it doesn't work
exactly again you just have to understand how that works you have to use the proper
apostrophes or quotes
11:34:16
within your string and just to check this we can always say here's our multi-line
we can always say type of multi-line and that is still a string one really
important thing to know about strings is that they can be indexed indexing means
that you can search within it and that index starts at zero so let's go ahead and
create a variable and we'll just say a is equal to and let's do the all popular
hello world let's run this and now when we print the string we can say a and we're
going to
11:34:51
do a bracket and now we can search throughout our string using the index so all you
have to do is do a colon and we can say five what this is going to do is it's going
to say position zero all the way up to but not including five which should give us the whole
hello I believe let's run this and it's giving us the first five positions of this
string we can also get rid of the colon and just say something like five and then
when we run this it's actually going to give us position five so this is 0 1 2 3 4
and then five is
11:35:25
the space let's do six so we can see the actual letter and that is our w we can
also use a negative when we're indexing through our string so we could say -3 and
it'll give us the L because it's -1 -2 and -3 we can also specify a range if
we don't want to use the default of zero so before we did 0 to five and it started
at zero because that was our default but we could also do two to five let's run
this and now we skip position 0 and 1 and then we start at 2 which gives us l l o now we can also
multiply strings
11:35:58
and we have this a hello world so we can do a * 3 and if we run this it'll give us
hello world three times and we can also do a plus a and that is Hello World hello
world now let's go down here and take a look at lists lists are really fantastic
because they store multiple values the string was stored as one value multiple
characters but a list can store multiple separate values so let's create our very
first list we'll say list really quickly and then we'll put a bracket and a
bracket means this
11:36:34
is going to be a list there are other ones like a squiggly bracket and a
parenthesis these denote that they are different types of data types the bracket is
what makes a list a list so to keep it super simple we'll say one two three and we'll
run this and now we have a list that has three separate values in it the comma in
our list denotes that they are separate values and a list is indexed just like a
string is indexed so position zero is this one position one is the two and position
two is the three
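A minimal sketch of the list indexing just described. The name numbers is made up for illustration, since the video does not give this first list a name.

```python
# A list of three separate values; the commas separate the items.
numbers = [1, 2, 3]

first = numbers[0]    # position zero holds the 1
second = numbers[1]   # position one holds the 2
third = numbers[2]    # position two holds the 3

# Slicing works the same way as it does on strings.
first_two = numbers[0:2]

print(first, second, third, first_two)
```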
11:37:04
now when we made this list we didn't have to use any quotes because these are
numbers but if we wanted to create a list and we wanted to add string values we
have to do it with our quotes so we'll say quote cookie dough then we'll do a comma
to separate the value and then we'll say strawberry and then we'll do one more and
this will just be chocolate and when we run this we have all three of these values
stored in our list now one of the best things about list is you can have
11:37:33
any data type within them they don't just have to be numbers or strings you can
basically put anything you want in there so let's create a new list and let's say
vanilla and then we'll do three and then we'll add a list within a list and we'll
say Scoops comma spoon and then we'll get out of that list and then we'll add
another value of true for Boolean and now we can hit shift enter and we just
created a list with several different data types within one list now let's
11:38:09
take this one list right here with all of our different ice cream flavors we'll say
ice_cream is equal to this list now one thing that's really great about lists is
that they are changeable that means we can change the data in here we can also add
and remove items from the list after we've already created it so let's go and take
ice_cream and we'll say ice_cream.append and this is going to append it to the
very end of the list we do an open parenthesis and let's say salted caramel now
when we run this and
11:38:42
we call it just like this it's going to take this list add salted caramel to the
end and we'll print it off and as you can see it was added to the list and just
like I said before let me go down here we can also change things from this list so
let's say ice_cream and then we need to look at the indexed position so we're going
to say zero and that's going to be this cookie dough right here we can say that is
equal to so we can now change that value so let's call that butter pecan and now
when we call
11:39:16
it we can now see that the cookie dough was changed to butter pecan another thing
that you saw just a little bit ago is something called a list within a list
basically a nested list so we had scoops spoon true let's give this a name and we'll say
nested_list is equal to this now when we run this we now have this nested list so
if we look at the index and we say zero we'll get vanilla if we say two we'll get
Scoops and spoons now since we have a list within a list we can also look at the
index of that nested list so
11:39:52
let's now say one and that should give us just spoon and you can go on and on and
on with this you can do lists within lists within lists and all of them will have
indexing that you can call now let's go down here and start taking a look at tuples
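Before moving on, here is a small sketch of the list operations covered above: appending, changing an item by index, and indexing into a nested list. The names and values follow the walkthrough.

```python
ice_cream = ["cookie dough", "strawberry", "chocolate"]

# Lists are changeable: add an item to the end...
ice_cream.append("salted caramel")
# ...or overwrite an existing item by its index.
ice_cream[0] = "butter pecan"
print(ice_cream)

# A list can hold mixed data types, including another list.
nested_list = ["vanilla", 3, ["scoops", "spoon"], True]

inner = nested_list[2]      # the inner list
spoon = nested_list[2][1]   # index the inner list a second time
print(inner, spoon)
```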
so a list and a tuple are actually quite similar but the biggest difference between
a list and a tuple is that a tuple is something called immutable it means it cannot
be modified or changed after it's created let's go right up here we're going to say
11:40:23
tuple and let's write our very first tuple so we'll say tuple_scoops is equal
to and then we'll do an open parentheses now these open parentheses you've seen if
you do like a print statement but that's different because that's executing a
function this is actually creating a tuple which is going to store data for us so
we'll say one 2 3 two and one let's go ahead and create that tuple and we can just
check the data type really quickly and it's a tuple and just like we saw before a
11:40:56
tuple is also indexed so if we go at the very first position which is a one we
will get the output of a one but we can't do something like .append and then add a
value like three if we do that it's going to say tuple object has no attribute
append it's just because you cannot change or add anything to a tuple just like we
were talking about before typically people will use tuples for when data is never
going to change an example for this might be something like a city name a country a
location
11:41:27
something that won't change they definitely have their use cases but I don't think
they're as popular as just using a list so now let's scroll down and start taking
look at sets but really quickly let me add a few more cells for us and let's say
sets now a set is somewhat similar to a list and a tuple but they are a little bit
different in the fact that they don't have any duplicate elements another big
difference is that the values within a set cannot be accessed using an index
because it doesn't have
11:42:00
an index because it's actually unordered we can still Loop through the items in a
set with something like a for Loop but we can't access it using the bracket and
then accessing its index point so let's go ahead and create our very first set so
we're going to say daily_pints then we're going to say equal to and to
create a set we're going to use these squiggly brackets I don't know if there's an
actual name for those if I'm being honest I call them squiggly
11:42:25
brackets and that's what we're going to go with we're going to put in a one a two and a
three so let's go ahead and run this and let's look at the type and as you can see
it is a set now when we print this out it's going to show us one a two and a three
and those are all the values within our set but if we copy this and we'll say
daily_pints_log this is going to be every single day where maybe I had different values now when
we run this and we do the exact same thing now when we print
11:42:58
this it's going to have just the unique values within that set now a use case for
set and this is something that I've done in the past is comparing two separate sets
maybe you have a list or a tuple and you convert that into a set and that will
narrow it down to its unique values then you can compare the unique values of one
set to the unique values in another set and then we can see what's the same and
what's different so let's go down here and let's say wifes_daily_pints_log just copy
this right here
11:43:28
we'll say is equal to let's do our squiggly brackets let's do one two let's do just
random numbers so now this is my daily log and this is my wife's daily log and now
we can compare these values so let's go right down here let's say print we'll do
daily_pints_log and then we'll do this bar right here and this is going to show us the
combined unique values it's basically like putting them all in one set and
then trimming it down to just the unique values so we'll take
11:44:00
wifes_daily_pints_log and when we run this we actually need to run this first
when we run this we should see all the unique values between these two sets and so
as you can see 0 1 2 3 4 5 6 7 24 31 so these are all the unique values between
these two sets we can also do another one and instead of this bar we're going to do
this symbol right here which I believe is called an ampersand don't quote me on
that but when we run this it's going to show what matches that means which ones
show up in both
11:44:34
sets so the only ones that show up in both sets are 1 2 3 and five we can also do
the opposite of that by doing a minus sign and this is going to show us what
doesn't match and so we have four 6 and 31 now where is our 24 that was in my
wife's daily pints log it's in this one but we're subtracting the values on this
one so let's reverse this and we'll say daily pints log and let's run it
now those are our other values so we're taking the values of this and then we're
subtracting all
11:45:07
the ones that are the same and getting the remaining values and then for our last
one we can get rid of this and we'll do this symbol right here the caret and this is going to
show if a value is either in one or the other but not in both so let's run this so
these values are completely unique only to each of those sets now the very last
one that we're going to look at in this video is dictionaries so let's go right
down here let's add a few cells and let's say dictionaries now I saved dictionary
for
11:45:41
last because this one is probably the most different out of all the previous data
types that we've looked at within a dictionary we have something called a key value
pair that means when we use a dictionary it's not like a list where you just have a
value comma value comma value we have a key that indicates what that value is
attributed to so let's write out a dictionary to see how this looks so we're going
to say dictionary_cream and just like a set we use squiggly brackets but the thing
that
11:46:14
differentiates it is that in a dictionary we'll have that key value pair whereas in
a set each value is just separated by a comma so let's write name and this is our
key and then we do a colon and this is then where we input our value so we're going
to say Alex freeberg and then we separate that key value Pair by a comma and now we
can do another key value pair so we'll say weekly intake and a colon and we'll say
five pints of ice cream do a comma and then we'll do favorite ice creams and
11:46:51
now what we're going to do is we're going to put in here a list so within this
dictionary we can also add a list we'll do MCC for mint chocolate chip and then
we'll add chocolate another one of my favorites so now we have our very first
dictionary let's copy this and run it and let's just look at the type and as you
can see it says that this is a dictionary let's also print it out now if we want to
we can take our dictionary_cream and say dot values with an open parenthesis and
when we execute
11:47:25
this we'll see all of the values within this dictionary so here's our values of
Alex freeberg five mint chocolate chip and chocolate we can also say keys and when
we run this we get all of the keys the name weekly intake and favorite ice creams and we
can also say items so this key value pair is one item and this key value pair is
another item now one difference between something like a list and a dictionary is
how you access the values you can't call it by doing something like this
where you just do a bracket
11:47:59
and say zero so this would in theory take this very first one right our very first
key value pair that's going to give us an error how you call a dictionary is
actually by the key so it doesn't technically have an index but you can specify
what you want to call and take it out so we're going to say name and this is going
to call that key right here and when we run this we'll get the value which is Alex
freeberg one other thing that you can do is you can also update information in a
dictionary
11:48:30
which we can't with some other data types so for this for the name it was Alex
freeberg now let's say Christine freeberg and when we update that I'm also going to print
the dictionary get rid of this so it's going to update Christine freeberg in that
value of the name so let's go ahead and run this and now it changed the name from
Alex freeberg to Christine freeberg we can also update all of these values at one
time so let's copy this and I'm going to put it right down here I'm going to say
dictionary_cream.update
11:49:07
then we're going to put a bracket or not a bracket but parentheses
around these so now what we're going to do is update this entire thing let me take
this say print this dictionary now we can update this to anything we want so
instead of here I can say I'll say weight and because of all that ice cream I now
weigh 300 lb so let's run this and as you can see it did not delete our key value
pair right here instead it just added to it when you're using the update
11:49:45
we can't actually delete that's the delete statement and I'll show you that in just
a second but all we did was added this new value it also is going to check and see
if you changed anything with your key value pair so we can go in here here and
change this value and we'll say 10 so now when we run this the value of this key
value pair was changed but let's say we do want to delete it we'll say del that
stands for delete and then dictionary_cream and now let's specify the key which
will also
11:50:13
delete the value with it so let's specify the key that we want to get rid of and
let's say weight and then let's print that again and as you can see the weight was
deleted from that dictionary so that is all we're going to cover in this data types
video thank you guys so much for watching I really appreciate it if you like this
video be sure to like And subscribe below and I'll see you in the next [Music]
video hello everybody today we're going to be taking a look at comparison
11:50:52
logical and membership operators in Python operators are used to perform operations
on variables and values for example you're often going to want to compare two
separate values to see if they are the same or if they're different within Python
and that's where the comparison operator comes in right here you can see our
operators you can also see what they do so this equal sign equal sign stands for
equal we have the does not equal the greater than less than greater than or equal
to and less
11:51:18
than or equal to and honestly I use these almost every single time I use Python so
these are very important to know and know how to use so let's get rid of that
really quickly and actually start writing it out and see how these comparison
operators work in Python the very first one that we're going to look at is equal to
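The on-screen operator table is not captured in this transcript. As a rough sketch, these are the six comparison operators it refers to, using the same numbers as the walkthrough that follows. The variable names are made up for illustration.

```python
# Each comparison evaluates to either True or False.
equal = 10 == 10              # equal to
not_equal = 10 != 50          # does not equal
less_than = 10 < 50           # less than
greater_than = 10 > 50        # greater than
less_or_equal = 10 <= 10      # less than or equal to
greater_or_equal = 50 >= 10   # greater than or equal to

print(equal, not_equal, less_than, greater_than, less_or_equal, greater_or_equal)

# A single equal sign is assignment, not comparison, so writing
# 10 = 10 fails with a syntax error instead of comparing the values.
```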
now you can't just say 10 is equal to 10 let's try running that really quickly by
clicking shift enter it's going to say cannot assign to literal that's because this
is like
11:51:42
assigning a variable we're trying to say 10 is equal to 10 and then we can call
that 10 later but that's not how this actually works what we're trying to do is to
determine whether 10 is equal to 10 so we're going to say equal sign equal sign and
then if we run that by clicking shift enter again it's going to say true now if we
put something else like 50 in there and we try to run this it's going to say false
so really what you're going to get when you use these comparison operators is
either a true or
11:52:08
a false if we take this right down here we can also say does not equal and we're
going to use an exclamation point equal sign and that says 10 is not equal to 50
and that should be true you can also compare strings and variables so let's go
right down here and we're going to say vanilla equal sign equal sign chocolate and since those differ when
we run this it'll say false now if it was the same just like when we did our
numbers it should say true and we can also compare variables so we'll say x is
equal to
11:52:40
vanilla and Y is equal to chocolate and then when we come down here we can say x is
equal to Y and it'll give us a false and we say X is not equal to Y and it'll give
us a true the next one that we're going to take a look at is the less than so
let's copy this one right up here let's scroll down and let's say 10 is less than
50 now this will come out as true now let's say we put a 10 in here before 10 was
of course less than 50 but is 10 less than 10 no that's false because they are the
11:53:17
same so if we want an output that is true all we would have to add is an equal sign
right here and this would say 10 is less than or it is equal to 10 and now it's
true of course we can say the exact same thing by saying greater than so 10 is
equal or greater than 10 that'll be true because 10 is equal to 10 we can also say
50 is greater or equal to 10 because 50 is obviously greater than 10 now let's look
at logical operators that are often combined with comparison operators so our
operators are and or and not so if
11:53:51
you have an and that returns true if both statements are true if it's or only one
of the statements has to be true and the not basically reverses the result so if it
was going to return true it would return false I don't use this not one a lot but I
will show you how it works so let's actually test that out so before we were saying
10 is greater than 50 and of course this returned false so now let's add a
parentheses around this 10 is greater than 50 and we're going to say and we'll do
an open parenthesis 50
11:54:23
is greater than 10 now this statement right here is true 50 is greater than 10 so
we have a true statement and a false statement but this and is going to look at
both of them and it's going to say they both need to be true in order to return a
true so let's try running this and we still have a false if we want it to return
true we're going to have to change this to make it a true statement so 70 is
greater than 50 and 50 is greater than 10 when we run this it should return true
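A small sketch of the and operator using the same comparisons as above. The variable names are invented for this example.

```python
# and only returns True when both sides are True.
false_and_true = (10 > 50) and (50 > 10)   # False and True
true_and_true = (70 > 50) and (50 > 10)    # True and True

print(false_and_true)
print(true_and_true)
```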
now let's look at the
11:54:51
or so let's copy this and we'll say 10 is greater than 50 or 50 is greater than 10
now this is a false statement and this is a true statement so if even one of them
is a true statement the output should be true and again we can do this even with
strings so we can do vanilla and chocolate there we go and vanilla is actually
greater than chocolate because V comes later in alphabetical order so V
is like 20 something whereas the C in chocolate is three right so it actually looks at the
spelling for this so if we
11:55:28
say or here it will come out true and if we say and here it should also be true
because V is greater than C and 50 is greater than 10 so this should also be true
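Here is a sketch of or, and of the string comparison just described, assuming lowercase strings. The variable names are made up. Python compares strings character by character using each character's Unicode code point, which for lowercase letters matches alphabetical order.

```python
# or returns True when at least one side is True.
either = (10 > 50) or (50 > 10)

# "v" comes after "c", so "vanilla" is greater than "chocolate".
string_compare = "vanilla" > "chocolate"
both = ("vanilla" > "chocolate") and (50 > 10)

print(either, string_compare, both)
print(ord("v"), ord("c"))   # the code points being compared

# Capitalization matters: uppercase letters have lower code points
# than lowercase ones, so this comparison goes the other way.
mixed_case = "Vanilla" > "chocolate"
print(mixed_case)
```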
now let's copy this right here and we're going to say not so what we had before is
50 is greater than 10 that returned true but now all we're doing is putting not in
front of it so instead of returning true it's going to return false so now let's
take a look at membership operators and we use this to check if something whether
it's a value
11:55:59
or a string or something like that is within another value or string or sequence
our operators are in and not in so it's pretty simple if it's in it's going to
return true if the specified value is present in the object just
like we were talking about and for not in it's basically the exact same thing if
it's not in that object so let's start out by taking a look at a string we're going
to say ice_cream is equal to I love chocolate ice cream and then we're going to
say love
11:56:30
in ice_cream and that will return true so all we're doing is searching if the
word love or that string is in this larger string we could also just do that by
literally copying this and putting this where this is so we can check is this
string part of this string and it'll say true we can also make a list so we'll say
scoops is equal to and then we'll do a bracket and we'll say 1 2 3 4 5 and
then we'll say two in Scoops so all we're doing is searching to see if two is
within this list and that
11:57:04
should return true now if we put a six here and we said not in it will also return
true because six is not in Scoops and that is true and just like we did we could
also say wanted underscore Scoops and we'll say eight so I wanted eight Scoops so
we can say wanted_scoops not in scoops and this should return true because there's not
an eight within the scoops list and if we said in and asked whether the eight we wanted
is within the list that we created that's going to return false
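The membership checks from this walkthrough can be sketched as follows. The variable names follow the narration; the names used to store each result are assumptions added so the outcomes are easy to inspect.

```python
ice_cream = 'I love chocolate ice cream'
scoops = [1, 2, 3, 4, 5]
wanted_scoops = 8

love_found = 'love' in ice_cream               # substring search in a string
two_found = 2 in scoops                        # element search in a list
six_missing = 6 not in scoops                  # True because 6 is absent
wanted_missing = wanted_scoops not in scoops   # True because 8 is absent
wanted_found = wanted_scoops in scoops         # False for the same reason

print(love_found, two_found, six_missing, wanted_missing, wanted_found)
```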
that
11:57:39
is a quick breakdown of comparison logical and membership operators I hope that
this was helpful thank you guys so much for watching if you like this video be sure
to like And subscribe and I will see you in the next [Music] video [Music] hello
everybody today we're going to be taking a look at the if statement within python
now it's actually the if elif else statement but that's a mouthful so I'm just going
to call it the if else statement now we have this flowchart and I apologize for
being blurry but this is
11:58:16
the absolute best one that I could find right up top we have our if condition now
if this if condition is true we're going to run a body of code but if that
condition is false we're going to go over here and go to the elif condition the elif
condition or statement is basically saying if the first if statement doesn't work
let's try this if statement if this elif statement is true it goes to this body of
code if it's false it'll come over here to the else and the else is basically if
all these things don't work
11:58:43
then run this body of code now you can have as many elif statements as you want
but you can only have one if statement and one else statement so let's write out
some code and see how this actually looks let's first start off by writing if
that is our if statement and now we have to write our condition which is about to
be either met or not met so we'll say if 25 is greater than 10 which is true we'll
say colon and then we're going to hit enter and it's going to automatically indent
11:59:10
that line of code for us and this is our body of code so if 25 is greater than 10
our body of code will execute so for us we're just going to write print and we'll
say it worked now if we run this it's going to check is 25 greater than 10 if that
is true print this so let's hit shift enter and it worked now let's take this
exact code we'll paste it right down here and we'll say is less than and right now
this if statement is not true so it's not actually going to
11:59:41
work as you can see there's no output there's nothing that happened really but it
did check to see if 25 was less than 10 but it just wasn't true now we can use our
else statement so we're going to come right down here and we're going to say else
and we'll do a colon and we'll hit enter again automatically indenting and we're
going to say print and we're going to say it did not work dot dot dot so what it's
going to do is it's going to come up here and check is 25 less
12:00:08
than 10 no it's not so this body of code is not going to be executed it's going to
go right down to this else statement now this else statement is going to be printed
there's no condition on this so the if statement has a condition 25 is less than 10
this has no condition so if this doesn't work if this is false it's going to come
down here and it will run this body of code let's run this by clicking shift enter
and as you can see our output is it did not work now let's
12:00:34
go back up here and put greater than because this is now true it's going to say if
25 is greater than 10 print it worked and then it's going to stop it's not going to
go to this else statement at all so let's run this and our output is it worked
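A small sketch of the if / else pair just described, using the false condition from the walkthrough. Storing the text in a `message` variable instead of printing inside each branch is a change made here so the result can be checked afterwards.

```python
if 25 < 10:
    message = 'it worked'
else:
    # No condition on else: this body runs whenever the if condition is False.
    message = 'it did not work...'

print(message)
```

Changing `<` to `>` makes the first branch run and the else body is skipped entirely.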
what if we have a lot of different conditions that we want to try let's come right
down here this is where the elif comes in so really quickly let's change this to a
not true a false statement we're going to go down and say
12:01:00
elif and we're going to say 25 is less than 30 and we'll print elif worked so now it's
going to check is 25 less than 10 no it's not let's look at the next condition is
25 less than 30 and if it is we'll print elif worked so let's try running this and
elif worked now we can do as many of these elif statements as we want let's
just try a few of them right here so we'll say elif 25 is less than 20 then less than
21 and let's do 40 and let's do 50 so
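The chain being built here might look like the sketch below. The thresholds (20, 21, 40, 50) come from the narration; the label strings and the `result` variable are assumptions for illustration.

```python
if 25 < 10:
    result = 'it worked'
elif 25 < 20:
    result = 'elif 1'
elif 25 < 21:
    result = 'elif 2'
elif 25 < 40:
    result = 'elif 3'   # first true condition, so this body runs
elif 25 < 50:
    result = 'elif 4'   # also true, but never reached
else:
    result = 'it did not work...'

print(result)
```

Python stops at the first condition that is true, which is why the later branch is ignored even though 25 < 50 also holds.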
12:01:43
we'll say elif elif 2 elif 3 and elif 4 now if you look at this the first one that is actually
going to work is this 25 less than 40 right here once this one is checked and it comes out
as true none of the other elif or else statements will run so let's try this one it
should be elif 3 and this one ran properly now within our condition so far we've only
used a comparison operator we can also use a logical operator like and or or so we
can say if 25 is less than 10 which it's not and let's say or actually and we'll
12:02:19
say or 1 is less than three which is true if we run this now it will actually work
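As a quick sketch of combining conditions inside an if statement: the first block uses the `or` condition from the narration, and the second swaps in `and` for contrast. The variable names are assumptions.

```python
# or: one true side (1 < 3) is enough for the body to run.
if (25 < 10) or (1 < 3):
    or_outcome = 'it worked'
else:
    or_outcome = 'it did not work...'

# and: both sides must be true, and 25 < 10 is not.
if (25 < 10) and (1 < 3):
    and_outcome = 'it worked'
else:
    and_outcome = 'it did not work...'

print(or_outcome)
print(and_outcome)
```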
so we can use several different types of operators within our if statement to see
if a condition is true or not or several conditions are true there's also a way to
write an if else statement in one line if you want to do that so we can write print
we'll say it worked and then we'll come over here and say if 10 is greater than 30
and then we'll write else print and we'll say it did not work just like we had
before
12:02:55
except now it's all occurring on one line so let's just try this and see if it
works so it's saying print it worked if 10 is greater than 30 which it wasn't so it
went to the else statement and then it printed out our body right here although we
didn't have any indentation or multiple lines it was all done in one line now
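The one-line form can be sketched like this. The first statement mirrors the narration; the second, which picks a value and prints once, is an alternative added here and not something the video shows.

```python
# Form from the walkthrough: the condition decides which print call runs.
print('it worked') if 10 > 30 else print('it did not work...')

# Alternative (an assumption, not from the video): choose the value, then print.
message = 'it worked' if 10 > 30 else 'it did not work...'
print(message)
```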
there's one other thing that we haven't looked at yet uh and I'm going to show it
to you really quickly and that's a nested if statement so when we
12:03:20
run this it's going to say it worked it works because it says 25 is less than 10 or
one is less than three since this is true it's going to print out it worked but we
can also do a nested if statement so we can do multiple if statements as well so
we're going to hit enter and we'll say if and we'll do a true statement here so
we'll say if 10 is greater than five let's do a colon hit enter then we'll say
print and then we'll type A String saying this nested if
12:03:50
statement oops worked now let's try this out and see what we get so it went
through the first if statement it said it was true and it prints out it worked this
is still the body of code so it goes down to this next if statement and it says if
10 is greater than five we're going to print this out and you could do this on and
on and on it can basically go on forever and you can create a really in-depth logic
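A minimal sketch of the nested if statement described above. The two flag variables are assumptions added only to record which bodies ran.

```python
outer_ran = False
inner_ran = False

if (25 < 10) or (1 < 3):
    print('it worked')
    outer_ran = True
    # Still inside the outer body, so this check only happens
    # when the outer condition was True.
    if 10 > 5:
        print('this nested if statement worked')
        inner_ran = True
```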
and that actually happens a lot when you start writing more advanced code so I hope
that this was
12:04:19
helpful I hope that you understand the if else statement better I hope that you
understand how nested if statements work as well thank you guys so much for
watching if you like this video be sure to like And subscribe below and I'll see
you in the next [Music] video hello everybody today we're going to be learning
about for Loops in Python the for Loop is used to iterate over a sequence which
could be a list a tuple an array a string or even a dictionary here's the list that
we'll be working
12:04:55
with throughout this video and I have this little diagram right here which kind of
explains how a for Loop works the for Loop is going to start by looking at the very
first item in our sequence or our list and that's going to be our one right here
it's going to ask is this the last element in our list and it is not so it's going
to go down to this body of the for Loop now we can have a thousand different things
that can happen in the body of the for loop as we're about to look at in just a
12:05:21
second then it's going to go up to the next element and ask is this the last
element reached so it'll be no again because we'll be going to the two and then the
three and then the four and the five once it reaches the five it'll go to the body
the for Loop and then when it asks if that's the last element the answer would be
yes because it's iterated through all the items within the list and then we would
exit the loop and the for Loop would be over now that may not have made perfect
sense but
12:05:48
let's actually start writing out the syntax of a for Loop so we can understand this
better to start our for loop we're going to say for and then we're going to
give it a temporary variable for this for Loop so it's a variable as it iterates
through these numbers it's going to assign the variable to that number so for this
one we're just going to say number because it's pretty appropriate because these
are all numbers and then we're going to say in integers now right here you can
12:06:14
put just about anything this could be the list this could be a tuple this could be
a string even but that is what we're going to iterate through so we're saying for
the variables each of these numbers within this list of integers and then we're
going to write a colon this is the body of code that's going to actually be
executed when we run through and iterate through our list so for our first example
we're going to start off super simple and all we're going to do is say print open
parentheses and say
12:06:42
number as it iterates through the 1 2 3 4 and five number becomes our variable that
is going to be printed so during that first loop our one will be printed because
that will be assigned right here then through the next iteration the two will be
assigned and it'll be put right here in each loop until the very end so let's hit
shift enter and as you can see it did exactly that now in this body and I'll copy
and paste this down here in this body we really can do just about anything we want
we don't even have to use this
12:07:15
variable number right here we can just print yep if we wanted to and what it's
going to do is for each iteration all five of those every time it Loops through
it's going to print off yep so let's hit shift enter and it printed it off for us
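The two loops walked through so far can be sketched as follows. The list name `integers` and the loop variable `number` follow the narration; `yep_count` is an assumption added to show how many times the body ran.

```python
integers = [1, 2, 3, 4, 5]

# Each pass assigns the next list item to number.
for number in integers:
    print(number)

# The loop variable does not have to be used in the body.
yep_count = 0
for number in integers:
    print('yep')
    yep_count = yep_count + 1
```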
so really we weren't even using the numbers within the list we were really just
using it as almost a counter now let's copy this integers once again let's go right
up here and let's go copy this for Loop that we wrote now we do not have to call
this
12:07:49
number this can be anything you want any variable name that you'd like to name it
we could call it jelly and we can do jelly plus jelly I think you're getting the
picture right when it Loops through that one it's doing 1 plus one when it Loops
through the two it's doing two plus two that is basically how a for loop works now
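A short sketch of the renamed loop variable. The `doubled` list is an assumption added to collect the results alongside the printing.

```python
integers = [1, 2, 3, 4, 5]
doubled = []

# The loop variable can have any valid name; jelly works like number did.
for jelly in integers:
    print(jelly + jelly)
    doubled.append(jelly + jelly)
```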
for a dictionary it's going to handle it a little bit differently so let's create a
dictionary really quickly so we'll say ice cream dictionary is equal to we're going
12:08:22
to do a squiggly brackets so we're going to say name and we're going to say colon
we need to assign our value for that item so we're going to say Alex freeberg we'll
do our next one separated by a comma and we'll say weekly intake and I'll say five
Scoops per week the next one we will do is favorite ice creams and for this one
we're going to do something a little bit different for this we're going to have a
list within this dictionary so we'll say within our list of my favorite ice creams
we'll say
12:08:56
mint chocolate chip and I'll just do MCC for that and we'll separate that out by a
comma and we'll say chocolate so now we have this dictionary ice_cream_dict and
within it we have my name my weekly intake and my favorite ice creams with a list
in there as well let's hit shift enter and now we're going to start writing our for
Loop now the for Loop is going to look very similar but to call a dictionary it's
just a little bit different so we're going to say for cream in
12:09:29
ice_cream_dict.values and then we're going to do parentheses and then a colon
now we're going to print the cream so in order to indicate what we actually want to
pull we have to specify within the dictionary what we want are we pulling the item
are we pulling the value we need to specify this so that's why we have this dot
values right here so let's run this and see what we get so as you can see we are
pulling in the values right here that's why we're pulling in Alex freeberg 5 and
mint chocolate chip
12:10:01
SL chocolate now we are able to call both of those both the key and the value so
let's go right down here and we can do both the key and the value so we can pull
two things at one time and we're going to do this by saying do items so we could
also do .keys if we just wanted to do a key but we want to do items so we're going to
do both of them so we're going to go right down here and say for key and value in
ice_cream_dict.items print and let's write key and then we'll do a comma and
then
12:10:35
let's give it a little arrow or something like that uh something like this and then
we'll do a comma and we'll say value and let's print this off and see what we get
so it's looping through and for each key and value it's saying here is the key so
that's the name then we have weekly intake then we have favorite ice creams it's
giving us a little arrow and then we're also printing off the value so we have name
Alex freeberg weekly intake five favorite ice creams mint chocolate chip
12:11:03
and chocolate so now let's talk about nested for Loops we've looked at for Loops we
understand how they work and why they do what they do but what about a nested for
Loop a for Loop within a for Loop for this example let's create two separate lists
let's create flavors and let's make that a list by making it a bracket we'll do
vanilla the classic chocolate and then cookie dough all great flavors so that's our
first list and then we're going to say toppings and we'll do a bracket for that as
well and
12:11:41
we'll say hot fudge and then we'll do Oreos and then we'll do marshmallows is how
you spell marshmallows I think it's an e that looks wrong I might be spelling it
wrong but that's okay so let's save this by clicking shift enter and now we have
our flavors and our toppings so now let's write our first for Loops we're going to
say for one as in our number one for loop we're going to say in flavors and we'll do a
colon we'll click enter now we can write our second for Loop so we're going to say
for
12:12:20
two in toppings and then we'll do a colon and enter and then we're going to say
print and we'll do an open parenthesis and then we're going to say one so we're
printing the one in flavors and then we're going to say one comma I'm going to say
topped with comma 2 so what this is essentially going to do is we're going to say
for one we're going to take the very first one in flavors and then we're going to
Loop through all of two as well so we're going to Loop
12:12:52
through hot fudge Oreo and marshmallows and once we print that off then we will
Loop all the way back to Flavors and look at the next iteration or the next
sequence within the first for Loop so let's run this really quickly and see what we
get so as you can see it goes vanilla vanilla vanilla and vanilla is topped with
the hot fudge the Oreos and the marshmallows and then we start iterating through
our second one in our first for loop so there's that hierarchy so we're iterating
completely through this one
12:13:24
before we actually go to the very first for Loop and start iterating through that
one again now that is essentially how a nested for Loop works these nested for
Loops can get very complicated in fact for Loops in general can get very
complicated the more you add to it and the more you're wanting to do with it but
that is basically how a for Loop and a nested for Loop works thank you guys so much
for watching be sure to like And subscribe below and I'll see you in the next
[Music] video [Music]
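The nested for loop from the walkthrough above can be sketched as follows. The loop variables `one` and `two` follow the narration; the exact capitalization of the list items and the `combos` list are assumptions.

```python
flavors = ['Vanilla', 'Chocolate', 'Cookie Dough']
toppings = ['Hot Fudge', 'Oreos', 'Marshmallows']
combos = []

for one in flavors:
    # The inner loop runs completely for each single flavor.
    for two in toppings:
        print(one, 'topped with', two)
        combos.append((one, two))
```

Three flavors and three toppings give nine printed lines, with every topping listed before the outer loop moves to the next flavor.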
12:13:59
hello everybody today we're going to be taking a look at while Loops in Python the
while loop in Python is used to iterate over a block of code as long as the test
condition is true now the difference between a for Loop and a while loop is that a
for Loop is going to iterate over the entire sequence regardless of a condition but
the while loop is only going to iterate over that sequence as long as a specific
condition is met once that condition is not met the code is going to stop and it's
not
12:14:23
going to iterate through the rest of the sequence so if we take a look at this
flowchart right here we're going to enter this while loop and we have a test
condition right here the first time that this test condition comes back false it's
going to exit the while loop so let's start actually writing out the code and see
how this while loop works so let's create a variable we're just going to say number
is equal to one and then we'll say while and now we need to write our condition
that needs to be met
12:14:46
in order for our block of code beneath this to run so we're going to say while
number is less than five and then we'll do colon enter and now this is our block of
code we're going to say print and then we'll say number now what we need to do is
basically create a counter we're going to say number equals number + 1 if you've
never done something like this it's kind of like a counter most people start it at
zero in fact let's start it at zero and then each time it runs through this while
loop it's going
12:15:13
to add one to this number up here and then it's going to become a one a two a three
each time it iterates through this while loop now once this number is no longer
less than five it'll break out of the while loop and it will no longer run so let's
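A minimal sketch of the counter-style while loop described here. The `printed` list is an assumption added to record what the loop outputs.

```python
number = 0
printed = []

while number < 5:
    print(number)
    printed.append(number)
    number = number + 1   # without this line the condition never becomes False
```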
run this really quick by hitting shift enter so it starts at zero and it's going to
say while the number is less than five print number so the first time that it runs
through it is zero and so it prints zero and then it adds one to number and then
it
12:15:42
continues that while loop right here and it keeps looping through this portion it never
goes back up here to this line of code this is just our variable that we start with
and then once this condition is no longer met once it is false then it's going
to break out of that code now that we basically know how a while loop works let's look
at something called a break statement so let's copy this right down here and what
we're going to say is if number is equal to three we're going to break now with the
12:16:10
break statement we can basically Stop the Loop even if the while condition is true
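The break version might be sketched like this, assuming the check sits after the print and before the increment as the narration suggests. The `printed` list is again an assumption for inspection.

```python
number = 0
printed = []

while number < 5:
    print(number)
    printed.append(number)
    if number == 3:
        break             # leave the loop even though number < 5 is still True
    number = number + 1
```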
so while this number is less than five it's going to continue to Loop through but
now we have this break statement so it's going to say if the number equals three
we're going to break out of this while loop but if this is false we're going to
continue adding to that number just like normal so let's execute this so as you can
see it only went to three instead of four like before because each time it was
running
12:16:36
through this while loop it was checking if the number was equal to three and once
it got to three this became true and then we broke out of this while loop the next
thing that I want to look at and we'll copy this right down here is an else
statement much like an if statement but we can use the else statement with a while
loop which runs the block of code and when that condition is no longer true
then it activates the else statement so we'll go right down here and we'll say else
and
12:17:00
we'll do a colon and enter and then we'll say print and we'll say no longer less
than five now because this if statement is still in there it will break so let's
say six and then we'll run this and so it's going to iterate through this block of
code and once this statement is no longer true once we break out of it we're going
to go to our else statement now as long as this statement is true it's going to
continue to iterate through but once this condition is not met then it will go to
12:17:30
our else statement and we'll run that line of code now the else statement is only going
to trigger if the while loop condition is no longer true if we have something like this if
statement that causes it to break out of the while loop the else statement will no
longer work so let's say if the number is three and we run this the else statement is
no longer going to trigger so this body of code will not be run now the next thing
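Both while / else cases from this passage can be sketched side by side. The two flag variables are assumptions added to record whether the else body ran.

```python
# Case 1: the break condition (number == 6) is never met, so the loop
# ends because number < 5 becomes False and the else body runs.
number = 0
else_ran_without_break = False
while number < 5:
    print(number)
    if number == 6:
        break
    number = number + 1
else:
    print('no longer less than 5')
    else_ran_without_break = True

# Case 2: break fires at 3, so the else body is skipped.
number = 0
else_ran_with_break = False
while number < 5:
    print(number)
    if number == 3:
        break
    number = number + 1
else:
    print('no longer less than 5')
    else_ran_with_break = True
```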
that I want to look at is the continue statement if the continue statement is
triggered it basically
12:17:55
rejects all remaining statements in the current iteration of the loop and then
we'll go to the next iteration now to demonstrate this I'm going to change this
break into a continue so before when we had the break if the number was equal to
three it would stop all the code completely but when we change this to continue
which we'll do right now what it's going to do is it's no longer going to run
through any of the subsequent code in this block of code it's just going to go
straight up to the
12:18:21
beginning and restart our while loop so what's going to happen when we run this is
it's going to come to three it's going to become three it's going to continue back
into the while loop but it's never going to have that number changed to be added to
one to continue with the while loop this will basically create an infinite Loop
let's try this really quickly and as you can see it's going to stay three forever
eventually this would time out but I'm just going to stop the code really quick so
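A sketch of `continue` that avoids the infinite loop just described. It assumes the increment is moved above the check, which is the reordering the walkthrough turns to next; the `printed` list is an assumption for inspection.

```python
number = 0
printed = []

while number < 5:
    number = number + 1   # increment first so the loop always advances
    if number == 3:
        continue          # skip the rest of this iteration, go back to the test
    print(number)
    printed.append(number)
```

With this ordering the 3 is skipped and the loop still finishes.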
if we
12:18:46
just change up the order of which we're doing things we're going to say there and
we're going to put this down here so what it's going to do now instead of printing
the number immediately and then adding the number later we're going to add the
number right away and then we're going to say if it is three we're going to
continue and it's going to print the number so let's try executing this and see
what happens so as you can see we no longer have the three in our output what
12:19:11
it did was when we got to the number three it continued and didn't execute this
right here which prints off that number so that really is the basics of the while
loop I hope that this was helpful I hope that you learned something in this video
If you did be sure to like And subscribe below and I'll see you in the next [Music]
video hello everybody today we're going to be taking a look at functions in Python
a function is a block of code which is only run when you call it so right here
we're defining our function
12:19:49
and then this is our body of code that when we actually call it is going to be run
so right here we have our function call and all we're doing is putting the function
with the parenthesis that is basically us calling that function and then we have
our output throughout this video I'm going to show you how to write a function as
well as pass arguments to that function and then a few other things like arbitrary
arguments keyword arguments and arbitrary keyword arguments all of these things are
really
12:20:13
important to know when you are using functions so let's get started by writing our
very first function together we're going to start off by saying def that is the
keyword for defining a function then we can actually name our function and for this
one we're just going to do first underscore function and then we do an open
parenthesis and then we'll put a colon we'll hit enter and it'll automatically
indent for us and this is where our body of code is going to go now within our body
of code
12:20:38
we can write just about anything and in this video I'm not going to get super
Advanced we're just going to walk through the basics to make sure that you
understand how to use functions so for right now all we're going to say is print
we'll do an open parenthesis we'll do an apostrophe and we'll say we did it and now
we're going to hit shift enter and this is not going to do anything at least you
won't see any output from this if we want to see the output or we actually want to
run that function and
12:21:02
some functions don't have outputs but if we want to run that function what we have
to do is just copy this and put it right down here and now we're going to actually
call our function so let's go ahead and click shift enter and now we've
successfully called our first function this function is about as simple as it could
possibly be but now let's take it up a notch and start looking at arguments so
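The first function from this walkthrough can be sketched as:

```python
def first_function():
    print('we did it')

# Defining the function produces no output; calling it runs the body.
first_function()
```

A function that only prints, like this one, returns `None` to the caller.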
let's go right down here and we're going to say def number underscore squared
we'll do a
12:21:30
parenthesis and our colon as well now really quickly when you're naming your
function it's kind of like naming a variable you can use something like X or Y but
I tend to like to be a little bit more descriptive but now let's take a look at
passing an argument into a function the argument is going to be passed right here
in the parentheses so for us I'm just going to call it a number and then we're
going to hit enter and now we'll write our body of code and all we're going to do
for this is type
12:21:55
print and open parenthesis and we'll say number and we'll do two stars at least
that's what I call it a star and a two and what this is going to do is it's going
to take the number that we pass into our function it's going to put it right here
in our body of code and then for what we're doing it's going to put it to the power
of two and so when the user or you run this and call this function this number is
something that you can specify it's an argument that you can input that will then
be run in
12:22:22
this body of code so let's copy this right here and then we'll put it right down
here into this next cell and we'll say five and so this five is going to be passed
through into this function and be called right here for this print statement let's
run it and it should come out as I believe 25 that is my fault I forgot to actually
run this block of code so I'm going to hit shift enter so now we've defined our
function up here and now we can actually call it so now we'll hit shift enter and
we got
12:22:51
our output of 25 now in this function we only passed one argument but you can
basically call as many arguments as you want you just have to separate them by
commas so let's copy this and we'll put it right down here now we'll say number
squared underscore custom and then we'll do number and then we'll do power so now we
can specify our number as well as the power that we want to raise it to so instead
of having two which is what you call hardcoded we can now customize that and we'll
have power
12:23:23
power and now when we call this function we can specify the number and the power
and both of those will go into this body of code and be run and we can customize
those numbers so let's copy this and we'll say 5 to the power of three and let's
make sure I ran this so let's do shift enter and now we will call our function and
let's hit shift enter and we got 5 to the power of 3 which is 125 and just one last
thing to mention is if you have two arguments within your function and you
12:23:57
are calling it right here you have to pass in two arguments you can't just have one
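The two functions built so far might look like the sketch below. The try / except block is not part of the video; it is added here only so the missing-argument error can be shown without stopping the script.

```python
def number_squared(number):
    print(number ** 2)

def number_squared_custom(number, power):
    print(number ** power)

number_squared(5)             # 25
number_squared_custom(5, 3)   # 125

# Passing one argument to a two-parameter function raises a TypeError.
try:
    number_squared_custom(5)
except TypeError as error:
    missing_argument_error = str(error)
    print(missing_argument_error)
```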
so if we have a five right here it's going to error out we have to specify both
Arguments for it to work now let's take a look at arbitrary arguments now arbitrary
arguments are really interesting because if you don't know how many arguments you
want to pass through if you don't know if it's a one a two or a three you can
specify that later when you're calling the function so you don't have to do it
upfront and
12:24:24
know that information ahead of time so let's define our function so we're going to
say def and then we're going to say number underscore args and we'll do an open
parenthesis and a colon now within our argument right here typically we would just
specify here's what our argument will be it will be number or it will be a word
right but what we're going to do is something called an arbitrary argument so it's
unknown so we're going to put star and then we'll say args now you will see
something
12:24:52
exactly like this typically if you're looking at tutorials that'll have star args
in there or if you're looking at just a generic piece of code this is what it will
look like but for us we're going to actually put number so again we have the star
and then we have our arbitrary argument right here and then we'll hit enter and
we're going to say print open parentheses and this is where it's going to get a
little bit different so we're going to say number and then we're going to do an
open bracket and
12:25:18
let's say zero and then we'll do that times and then we'll say number again with a
bracket of one so in a little bit once we run this and then we call this number
args function right here we're going to need to specify the number zero and the
number one that's going to be called so let's go ahead and run this and then we are
going to call it and let's say 5 comma 6 comma 1 2 8 so right up here we did not
know how many arguments we were going to pass through it could be five it could be
a thousand
12:25:53
we could also call in a tuple and that's what this is right here we're calling in a
tuple so what it's going to do now is when it calls this number it's going to call
the very first within that tuple which will be that five and then it'll also call
in this number which will be the first position which is the six so let's hit shift
enter and it's going to multiply these numbers together so 5 * 6 is equal to 30 now
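A sketch of the arbitrary-argument function. The `return` line is an assumption added so the result can be checked; the video version only prints. The tuple name `args_tuple` is inferred from the narration that follows.

```python
def number_args(*number):
    # number is a tuple holding every positional argument passed in.
    print(number[0] * number[1])
    return number[0] * number[1]

number_args(5, 6, 1, 2, 8)    # 30

# A ready-made tuple needs a star at the call site to be unpacked into
# separate arguments. Without it, the whole tuple becomes number[0] and
# number[1] raises "tuple index out of range".
args_tuple = (5, 6, 1, 2, 8)
number_args(*args_tuple)      # 30
```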
like I just said this is a tuple so we don't actually have to
12:26:18
write out these numbers like we just did we can pass through a tuple when we are
actually calling this function let's do that right up here let's just create um
let's call it args underscore tuple and we'll do open parentheses and we'll do the same
numbers let's just copy it make it easier and now we've created this tuple right
here which we can then pass in and this is a lot more handy a lot more specific and
this is most likely how someone would do something like this but let's now create
this and now we can
12:26:52
copy args Tuple and pass it through now really quickly this is going to fail and
I'm doing that on purpose but I want to show you what you need to do in order to
pass through this tupal so right now it's going to say Tuple index is out of range
all you have to do in order to use this is you have to specify a star before it
just like you did when you're creating your argument up here you have to put a star
in front of our Tuple that we just passed through and now let's try running
12:27:19
this and now it works properly now the last two things that we're going to look at
are keyword arguments and arbitrary keyword arguments there are more things that
you can learn and do within functions but again I'm just trying to teach you the
basics to make sure that you understand how they work so let's go right up here and
a keyword argument is kind of similar to this right here and let's actually copy
this and put it right down here now a keyword argument is very similar in that
you're going to
12:27:47
specify your arguments right here but what we did up here let me bring this down
when we actually called the function what we did was we just put in a five and a
three and when we did that it automatically assigned number to five and power to
three and that's totally fine and you can do that but if you want a little bit more
control you can use a keyword argument so right here we could say power is equal to
five and number is equal to three so I just switched it around right number was
assigned to five
12:28:22
and Power was assigned to three but I just switched it to show you how this might
work so let's run both of these and now it's 3 to the power of 5 which is 243 so that
essentially is a keyword argument again it just gives you a little bit more control
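Positional versus keyword calls can be sketched like this. The `return` line is an assumption added for checking; the video version only prints.

```python
def number_squared_custom(number, power):
    print(number ** power)
    return number ** power

# Positional: values are matched to parameters by order.
positional = number_squared_custom(5, 3)             # 125

# Keyword: values are matched by name, so the order no longer matters.
keyword = number_squared_custom(power=5, number=3)   # 243
```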
you don't have to put them in specific positions like if you're just calling
multiple arguments now let's come right down here we're going to create basically
another custom function uh so for this one we're going to write def number
underscore
12:28:51
kwarg and then we'll do an open parenthesis a colon and enter and what this one is is
this one is a keyword argument or an arbitrary keyword argument now to specify an
arbitrary argument all we did was a star and then we input number but if we're
doing a keyword argument we actually have to have two stars right here so let's
start taking a look and again if you're doing arbitrary it means we don't really
know how many keyword arguments we want to pass into our function so we're just
12:29:21
going to put star star number and then later within our body of code and when we're
calling it we'll be able to specify it and just like the arbitrary argument before
the arbitrary keyword argument means we really just don't know how many keyword
arguments we're going to need to pass into our function so to demonstrate this
let's write print do an open parenthesis and we'll say my oops need to do an
apostrophe my number is we'll do just like that little space and we'll say
12:29:51
plus and this is kind of where it gets a little interesting or a little bit more
tricky so we're going to say is number so This Is Us calling our number and then
we're going to do a bracket and then I'm actually going to go to calling the
function it's a little bit backward or a little bit different than what you might
think but when we're calling it what I'm going to do is I'm going to say integer is
equal to let's just do some random number now when we're calling
12:30:17
that keyword within our body of code what we're going to do is we're going to
actually type out integer just like this and this looks a little bit different but
what this allows us to do is we can put as many keyword arguments in here as we
want later and I'll show you in just a second but for us we're just creating this
key and this value when we are calling it within the function so now when we create
this and we run this oh whoops I forgot this has to be a string um so let's run
this
12:30:47
again now we will say my number is 2309 then we're we're going to add we'll say
plus and this isn't going to look great but we'll say my other number because this
will all be in the same line that's okay my other number and then we'll say number
and we can specify again what we want in there so now we can go down here to where
we're calling it we'll put a comma and we'll say integer oops integer 2 is equal to
we'll do a random number and then we'll put in two right
12:31:23
here and then we'll add plus right here so we don't error out we'll create this
we'll run this and as you can see both numbers were passed through again the syntax
is terrible but now you can see that you have this arbitrary keyword argument right
here and all we have to do is put number number and we can pass through as many of
these arbitrary keyword arguments as we want as long as we just specify within our
function when we're calling it so that's all we're going to look at in today's
video on
12:31:52
functions there are of course other things that you can do within functions and it
can get a little bit more advanced but I wanted to show you the basics the meat and
potatoes of things I definitely think you should know in order to get started using
functions I hope that you were able to understand functions better because of this
video if you did be sure to like And subscribe below and I will see you in the next
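A short sketch of the arbitrary keyword argument from the end of the functions video. The function name `number_kwarg` and the second value are assumptions (the walkthrough only says "some random number"), and the sketch returns the string instead of printing inside the function so the result is easy to inspect.

```python
# The double star collects every keyword argument into a dictionary
# called `number`, so each value is looked up by the keyword that was
# used in the call.
def number_kwarg(**number):
    # str() is needed because a string cannot be concatenated with an int.
    return ('My number is: ' + str(number['integer'])
            + ' My other number: ' + str(number['integer2']))

# 4356 is an arbitrary stand-in for the second random number.
message = number_kwarg(integer=2309, integer2=4356)
print(message)
```

Any number of keyword arguments can be passed this way, as long as the body looks them up by the same keys used in the call.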
[Music] video hello everybody today we're going to be talking about converting
12:32:25
data types in Python in this video I'm going to show you how to convert several
different data types including strings numbers sets tuples and even dictionaries so
let's start off by creating a variable we'll say num underscore int is equal to 7 and we can
check that data type by saying type and then inserting our variable num underscore
int and that will tell us that our data type for this variable is an integer let's
go ahead and create another one we're going to say num underscore string is equal
to
12:32:55
and for this one we'll also do a seven but let's check the type and we'll do an
open parentheses and we'll say the type of num string and that one is a string now
let's say we wanted to add those we'll say num underscore sum so the sum of num underscore int
plus num underscore string now when we're adding these two values it is not going to work
it's going to give us an error and it's going to say unsupported operand types for int
and string so it cannot add both an integer and a string what we
12:33:27
need to do in order to add these two numbers is to convert that string into an
integer so let's go right up here let's add another cell and let's say num underscore string
underscore converted is equal to and we want to convert it into an integer so all we
have to do to convert it into an integer is type int and then we're going to say
num underscore string and that is as easy as it's going to get all we have to do is
say integer with our num string inside of it and then it's going to convert it and
we can even
12:34:03
check it right after by saying type num string converted and let's run this and now
we can see that it was converted into an integer so now let's add that num string
converted right here let's copy and replace that string with the string converted
and let's actually print out that num underscore sum and it worked properly now we did not
specify what type of value this Num Sum was going to be but because it was two
integers in here it's going to automatically apply that data type of integer to
that Num Sum let's go
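The steps so far can be sketched as follows, using the variable names spoken in the video. The `try`/`except` is only there so the failed addition does not stop the example.

```python
num_int = 7
num_string = '7'

# Adding an int and a str raises a TypeError.
try:
    num_sum = num_int + num_string
except TypeError as error:
    print(error)  # unsupported operand type(s) for +: 'int' and 'str'

# int() converts the string, and the addition then works.
num_string_converted = int(num_string)
num_sum = num_int + num_string_converted

print(type(num_string_converted))  # <class 'int'>
print(num_sum)                     # 14
```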
12:34:41
right down here and now let's look at how we can convert lists sets and tuples so
now let's say we have a list underscore type and that's equal to 1 2 3 and we can check it
again by saying type and that is a list let's say we want to convert it to a tuple
it's fairly easy all we're going to do is write tuple say list underscore type that list
underscore type is now going to be a tuple and we can check that by saying type and
wrapping it around this tuple and it shows us that it is converting that list
12:35:19
into a tuple now we can also convert a list into a set but it may change the actual
values within it let's check that out really quickly so let's say we have this list
and let's add a few more values to this just like that now let's say we want to
convert it to a set so we're going to run this and we'll say set of list underscore type and
let's try running this and see what the output is so this is something that you
really need to be aware of when you are converting data types because set does not
act the same
12:35:54
as a list a set is basically going to take the unique values in the list and
convert it to a set and it fundamentally changes the data that was in that original
list and just to check the data type we can say type I'm just doing this for all of
them and as you can see that is now a set now let's go down here and take a look at
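A small sketch of the list conversions. The transcript does not say which extra values were added to the list, so the duplicates below are an assumption chosen to show how `set` drops repeats.

```python
# The repeated values are illustrative; the video only says
# "a few more values" were added.
list_type = [1, 2, 3, 3, 2, 1]

tuple_type = tuple(list_type)  # keeps every value and the order
set_type = set(list_type)      # keeps only the unique values

print(type(tuple_type))                # <class 'tuple'>
print(tuple_type)                      # (1, 2, 3, 3, 2, 1)
print(sorted(set_type))                # [1, 2, 3]
print(len(list_type), len(set_type))   # 6 3
```

Converting to a tuple is lossless, while converting to a set discards duplicates and does not guarantee order.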
dictionaries now let's say we have a dictionary called dictionary type and we'll do
a squiggly bracket and we'll say name and we'll do a colon and
12:36:24
we'll say Alex then we'll do age and a colon and we'll say 28 and then we'll do
hair colon and N/A so really quickly let's take that dictionary type and just confirm
that it is a dictionary and it is and now what we're going to do is take a look at
all of the items within that dictionary so we're going to do dictionary type dot items
open parenthesis and this is going to show us all the items within it now we can
also take this and look at something like the values and when we run that these are
12:37:07
our values So within our dictionary we have items and that's what this is right
here this is one item and then within that we have our values which are right here
so Alex 28 and N/A and then we have something called a key and this is the key the
name age and hair are all keys and we can look at that by saying dot keys so let's
say we want to take all of the keys and put that into a list what we're going to do
is we're going to take this right here say list we'll do an open parenthesis we'll
12:37:39
type that in right there so it says a list and we're converting these Keys into a
list and let's run that and now this is a list and let's just check the type as
well just to confirm and as you can see it was converted properly into a list and
we can do the exact same thing with values and the values can also be converted
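A sketch of the dictionary example. The variable name `dictionary_type` follows what is spoken, and the `'N/A'` value for hair is read from the values listed in the walkthrough.

```python
dictionary_type = {'name': 'Alex', 'age': 28, 'hair': 'N/A'}

print(type(dictionary_type))    # <class 'dict'>
print(dictionary_type.items())  # the key and value pairs
print(dictionary_type.values()) # only the values
print(dictionary_type.keys())   # only the keys

# keys() and values() return view objects; list() turns them into lists.
keys_list = list(dictionary_type.keys())
values_list = list(dictionary_type.values())

print(keys_list)        # ['name', 'age', 'hair']
print(values_list)      # ['Alex', 28, 'N/A']
print(type(keys_list))  # <class 'list'>
```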
into a list now we can also convert longer strings that aren't just numbers like we
did above in our very first example so let's do long underscore string and we'll say I
like to party now
12:38:17
we're going to take this string and we're going to say list long string so we're
going to convert this string into a list and let's see what happens so it took
every single character in that string and put it into a list and we could also do a
set as well that one's a lot shorter because it's only looking at unique values so
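The string conversion can be sketched like this, assuming the variable is called `long_string`.

```python
long_string = 'I like to party'

# list() splits the string into individual characters, spaces included.
characters = list(long_string)

# set() keeps each character once, so repeats like 't' and ' ' collapse.
unique_characters = set(long_string)

print(characters)
print(len(characters), len(unique_characters))  # 15 12
```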
that is how you convert data types in Python thank you guys so much for watching I
really appreciate it if you like this video be sure to like And subscribe below and
I'll see you in
12:38:45
the next [Music] video [Music] hello everybody today we're going to be working on
building a BMI calculator in Python now before we get started I want to show you
this BMI calculator that I found online and it shows you the basic calculation that
they use and that's the one we're going to use in this video and they also have
this calculator right down here and some ranges that we can use for our calculator
as well so for reference I weigh about 170 I'm about 5 9 let's calculate this
12:39:24
so I'm about a 25.1 BMI which falls into the overweight category that's unfortunate
but we can see exactly how this works and how ours should work when we actually
build it so we're going to kind of reference this throughout the video so let's go
right over here to our BMI calculator we need to calculate weight and height and
then run this calculation right here so let's go ahead and copy this and we're
going to put it right down here here and so now we have our calculation
12:39:57
so what we need is we need input from a user and there is an input function within
python that we're going to be using so let's actually give me a few more cells so
the first thing that we need to calculate is their weight let's type out weight
right here we'll say weight is equal to and this is where we'll use our input
function so we'll say input and when we actually run this it's just going to give
us this blank square or a user can input something we'll say Alex so this is our
output is
12:40:24
what the actual user input and it does save it to this variable so if we say print
weight it will still print out Alex now this is where we want the user to just like
we did before where they'll input their weight so we want to kind of give them a
prompt for this we'll put a string in here so I'll do a double quote and then I'll
say enter your weight in and we're using pounds say pounds colon space so now when
we do this it'll say enter your weight in pounds I'll say 170 and then when we run
12:41:01
this it does store that now let's do print I should have saved it wait again oops
now it's only storing the value of 170 it's not actually storing this string right
here so that's really important for when we do our calculations later um I'm going
to I'm going to save this right down here because I'm sure I'm going to use that
later um so we have that it's working now we need to also do our height so let's
copy this and we'll put it right here and we'll do
12:41:32
height and enter your height in inches so now for this one if we hit enter it's
actually running let's stop it really quick and interrupt it let's try running this
so it's going to say enter your weight and pounds that's the first input say 170
and then when I hit enter it's going to prompt me for that second input and so in
inches 5 foot 9 is 69 inches and then I can hit enter again and now we have both of our
inputs now we need this calculation right down here and just like that so
12:42:11
now we have weight in pounds times 703 divided by height in inches times height in
inches so we actually have weight and it's already written in there but I'm just
going to do like this we'll do weight times 703 so that's pounds there our weight in
pounds times 703 divided by now we have our height in inches times the height in inches
so this is our calculation right here so let's do this exact same thing let's run
this and this times of course is not going to work whoops we need to do our
12:42:45
star for both of these all right now this is our calculation so let's run this so
we have 170 and that's pounds and inches was 69 hit enter and it says can't
multiply sequence by non-int of type str ah that's because these are
being stored as strings if right down here I do type of height and we
run that this is actually a string so we want to change that because we don't need
that anymore so we don't want it to be a string we need those to be integers
or Floats
12:43:24
or really anything besides a string it just needs to be numerical uh so integer
float really so let's do integer and we'll wrap that input in it and we'll do the
same thing for this one now we have an integer for our weight an integer for our
height so now when we're running this calculation it should work properly let's run
this again our pounds are 170 our height is 69 in and it's not giving us our output
because we're not printing anything okay so I just need to do
12:44:00
print BMI so let's try this again 170 69 and there is our BMI 25.1 so it worked the
exact same as this one so they input well we input our height we inputted our or we
inputted our weight we inputted our height and then it calculated our BMI the next thing
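A minimal sketch of the calculation and the type fix. `input()` always returns a string, so the typed values are simulated with string variables here to keep the example runnable without a prompt. The function name `calculate_bmi` is an assumption, not something from the video.

```python
def calculate_bmi(weight_lb, height_in):
    """BMI from pounds and inches: weight * 703 / (height * height)."""
    return (weight_lb * 703) / (height_in * height_in)

# What input() would hand back if the user typed 170 and 69.
weight_text = '170'
height_text = '69'

# Passing the raw strings fails, because a string cannot be multiplied
# by another string.
try:
    calculate_bmi(weight_text, height_text)
except TypeError as error:
    print(error)  # can't multiply sequence by non-int of type 'str'

# Wrapping the values in int() (or float()) makes the math work. In the
# notebook this is int(input('Enter your weight in pounds: ')).
bmi = calculate_bmi(int(weight_text), int(height_text))
print(round(bmi, 1))  # 25.1
```

Using `float()` instead of `int()` would also accept values such as 170.5.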
that we need to do is we need to kind of give the user some context is that good is
their BMI within a good range a bad range we don't know uh so let's go ahead and
I'm going to see if I can copy this know if this will work or not let's go ahead
and copy this right
12:44:37
down here perfect so what we now need to do is we need to say okay if the user has
given us this input we want to give them or tell them if they are a normal weight
overweight obese severely obese anything like that and we have these ranges so that
should help us out quite a bit so let's just write our if statement and then we'll
include it up here but let's go down here and we'll say if and then we'll do BMI
and let's just say BMI is greater than zero so if it's greater than zero if they
had any
12:45:13
input where the BMI was not zero which should be every time if they do it properly
and they don't you know put a string in there or something or type out 40 which
maybe we should make a prompt for that if that happens then we can say if we'll do
BMI and now we need to give that first range so this range right here so if it's
under 18.5 so we need to do a less than so if it's less than 18.5 and it just says
under it doesn't say under or equal to so I'll keep it at 18.5 so if it's under
12:45:47
18.5 then let's give kind of the output we'll say print and the output or the
basically the prompt is underweight so we'll just say you are underweight
and just like that um then we're going to pass several elif statements
through here well let's just say else so I guess this would be like if they are if
they don't input something properly if something messes up maybe I we could write
something like um print oops I'm thinking all this through we can write print
12:46:30
enter valid inputs or something like this or we can always change that but let's
really quickly let's run this okay so I'm not in that range uh let's make the next
one so then I can be within a certain range oops and we need we should need one one
more a minimum so we'll say elif and elif these next two are this 24.9 so it's going to
check this one first so if it's 18.5 or below 18.5 it's automatically going to
print this one so this next one we don't have to do like a range or
12:47:11
anything we can just say if it's below if it's between 25 and 29.9 so this one
actually should be less than or equal to um this one is normal oh whoops 24.9 so
this one is 24.9 this one is going to say you are normal weight so let's run this
now let's see BMI was 25.1 oh guys I'm just messing up here I apologize all right
this is the one that I was part of so now it's going to be I'm part of the
overweight crowd now now let's run this and now our prompt is you are overweight
cuz remember the BMI was
12:47:55
saved right here as 25.1 down here if we run through this it's saying no you're not
in oops get rid of that no you're not in under 18.5 you're not under 24.9 if you
under 29.9 you are overweight so that did work properly so that's really good and I
don't think I want this to be our output for person because we're going to add this
up here it's just going to give us the BMI and then the output is going to say you
are overweight uh let's make it a little bit more customized um I'm
12:48:31
going to say name is equal to input and then we'll say enter your name um so it'll
be enter your name we'll do Alex 170 69 there's our BMI now it's going to run
through this logic or it will run through this logic and just just a second when we
actually finish this so then we have 34.9 and let's do one more oops and then this
one's going to be for 39.9 so this one was overweight this one is obese severely
obese so we'll say severely that you spell it really obese and then anything that's
over that 40
12:49:23
and over so if it's not this one anything else should be morbidly obese so
actually this else statement right here should say uh you are you are severely obese
this is going to say morbidly morbidly obese now I added that name up here because
I wanted to add that down below actually so we're we're going to say uh name plus
and then we'll do like comma you are underweight so it'll be a little bit more
personalized uh I think it'll I think it'll be a nice touch I really do we'll do it
like this and
12:50:09
we'll say you and let's go back and do that to all of them and let me see how
quickly I can do this oh whoops what I do get rid of that name plus you like that
geez you guys are seeing me mess up a h name plus you and then name plus you so now
let's run this and now it's a little more personalized it says Alex you are
overweight so this is all really good now this is an if statement um what we had
done before I think is actually what we should put right down here so we'll say
else and
12:50:52
then if that doesn't work we'll say what do we say enter valid input we'll just put
that um and let let me see if I can test this out don't I don't know if this will
error out or if this will even work let me just see if I can mess with it and see
if I can get it to work actually let's copy this we're going to copy this whole
thing we're going to include it right here and now we have basically our entire
calculator so um let's run this enter your name we'll say Alex enter
12:51:28
your pounds 170 enter your inches 69 and then it's going to say 25.1 Alex you are
overweight and that's perfect we could even go as far as adding like some feedback
we say you are overweight and then it would be a period and we could say um you
need to exercise more stop sitting and writing so many python tutorials so now if
we run this we'll do Alex 170 69 it says Alex you are overweight you need to
exercise more and stop sitting and writing so many python tutorials period and
that's it this is
12:52:13
the entire project um you can go a ton farther you can include much more complex
logic you could even build out a UI to create your own you know app just like this
where it has this input and this UI you can build that out with in jupyter
notebooks with python um but that's not really what this tutorial is for this is
just to kind of help you um think through some of the logic of creating something
like this so you know I hope that this was helpful I hope that this was fun I like
creating stuff like
12:52:44
this we have two other projects that we're going to do and maybe I'll include more
but we have two right now that I have planned um and I hope those those are helpful
this is probably our easiest one and they'll get a little bit more difficult in the
next projects so I hope that this was fun I hope that this was helpful and that you
can now kind of utilize those python skills that you've been working on if you like
this video be sure to like And subscribe below and I'll see you in the next
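The finished range logic from the BMI project can be sketched as one function. The name `bmi_message` and the exact message wording are assumptions. The cutoffs are the ones read out in the video, and the sketch returns the text instead of printing it.

```python
def bmi_message(name, bmi):
    """Return a personalized message for a BMI value."""
    if bmi > 0:
        # Each branch only runs if the ones above it were false, so only
        # the upper bound of each range needs to be checked.
        if bmi < 18.5:
            return name + ', you are underweight.'
        elif bmi <= 24.9:
            return name + ', you are normal weight.'
        elif bmi <= 29.9:
            return name + ', you are overweight.'
        elif bmi <= 34.9:
            return name + ', you are obese.'
        elif bmi <= 39.9:
            return name + ', you are severely obese.'
        else:
            return name + ', you are morbidly obese.'
    else:
        return 'Enter valid input.'

print(bmi_message('Alex', 25.1))  # Alex, you are overweight.
```

With these cutoffs a value such as 24.95 lands in the overweight branch, so comparing against 25, 30, 35 and 40 with a strict less-than is an alternative worth considering.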
12:53:09
[Music] video hello everybody today we're going to be creating an automatic file
sorter for your files in file explorer now out of all the projects that we've done
in this series so far I think this one might be the most difficult but I also think
this one is the most cool because it has some real life applications so without
further Ado let's take a look at some files that we have right down here in my file
explorer so I have this beautiful picture of Rosie uh right here this is a PNG file
I have a CSV file and
12:53:45
a text file and I want to sort all of them into their own folders depending on what
kind of file it is so if I go right in here and I click on this one I go to
properties I can see that this is a PNG file um if I go into this one I don't need
to but if I go into this one it's a CSV file and of course this one is a text file
so I want three separate folders in here and I want them to automatically go into
those folders without me having to drag and drop and going and clicking now we only
have four
12:54:18
files here but imagine if we have thousands of files how much time that could save
us so let's get out of here and let's start writing our code so we're going to say
import os comma and then we're going to say shutil now os obviously stands for
operating system shutil uh I don't know what it's actually supposed to stand for but
what it will allow us to do is do some highlevel operations on our files in file
explorer so we're going to go ahead and import those and now that we
12:54:50
have those imported uh something that's going to be very important for us to have
throughout this whole thing and this is anytime I'm working with like directories
or something like this we want to get this path down so I'm going to go ahead and
copy this path and we're just going to say path is equal to and we'll do this right
here so let's run this and I need to put an R right here to make this a raw text um
so when you don't have the r uh it's going to read in these you know these
12:55:18
backslashes and these colons and different stuff if we do R it's just going to read
it in as the raw string and that's what we want so here's what we need to do there
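A quick illustration of why the `r` prefix matters for Windows paths. The path below is made up for the example and is not the one used in the video.

```python
# Hypothetical path, chosen because it contains \n and \t.
plain = 'C:\new_files\table.csv'
raw = r'C:\new_files\table.csv'

# Without the r, Python turns \n into a newline and \t into a tab.
print('\n' in plain)  # True
print('\t' in plain)  # True

# With the r, every backslash is kept as a literal character.
print('\\' in raw)    # True
print(raw)            # C:\new_files\table.csv
```

Forward slashes are another way to sidestep the problem, since they need no escaping.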
there's a few different things that have to happen when we are writing this out one
thing is is we need to go in here and we need to see this path and we need to see
are there folders in here already um if not we need to create a folder so that's
one of the first things that we need to do the next thing that we need is it needs
12:55:43
to check each of these files individually identify what kind of file it is and then
put it into the correct folder so we have to create the folder then check these and
then place it into the correct folder so let's go right out of here so what we're
going to start doing is we're going to start working with these paths and these
directories and some of these things you may never have seen before but that's okay
I'll try to explain it as I go through so the first thing that we're going to write
is
12:56:11
os.listdir uh and what this is actually going to do is show us all the
files in there we're going to say path so it should show us all the files within
path and so here are our results so we have the data professional results fake text
file our image and our other image so this is actually showing us what files are in
that path and that's super important because we're probably going to have to Loop
through this in some way later um I wrote this all out before so I kind of remember
but I'm
12:56:41
doing this all off the top of my head so I guarantee you throughout this I'll make
some mistakes but what we now need to do is we need to create folders or check if
there's a folder and create it if it isn't there that's um The Next Step that we
need to take so let's go right down here and we want to check if this path exists
already so if that folder already exists so we're going to say os.path.exists so
this is going to check does this path just like this path up here does it already
exist and then
12:57:11
we're going to do an open parenthesis we'll say path so that's our path now we need
to add a folder name to this um we could hardcode it so we could do plus we could
say CSV files and that could work so it would say does this path already exist and
we can try running this and it's going to say false so this doesn't already exist
but the thing is is we need to create three separate path so we could do this by
just hardcoding it in by saying CSV files image files um and text files or we can
just put this all
12:57:46
in a list and loop through it I think it's just going to be easier to do that or I
don't know visually it's going to be easier so we'll do uh folder underscore names and
we'll say is equal to and we'll create a list so I think I want to call it CSV
files comma um image files or PNG files whatever you want to write and then we'll
do text files do text files and then we can go right down here um a little for Loop
uh I think what we'll do actually let's write folder underscore names um then we
can
12:58:26
put something like uh let's write Loop why not um so a little trick for the for
Loop is you're going to say for and we'll say Loop in and we'll just do a range
because we want it to basically go through here we don't want it to actually give
us these file names we just want it to count zero one and two so if we do range from
zero to two zero uh zero one two that should work if we do um this then when it Loops
through it's going to call folder name and say zero which would be CSV files image
files and text
12:59:00
files um so let's uh yeah I need a colon let's run through this really quickly uh
shouldn't do anything but what we can do now is we can say okay if this does not
exist what we can do is actually create it so we'll say if not so if this does not
exist then what we're going to do is take this and we'll say os dot make directory and
then we'll do just like that um I think it's makedirs I can't I think
that's correct um so let's test this out really
12:59:45
quickly let's see if this works and invalid syntax I I need a colon okay so I just
ran this let's see if it did actually make those folders let's refresh it and it
didn't so let's just print this off um so if not let's just print let's see does
this actually work let's do if okay ah okay so I think I know what might be
happening I think it's giving us it actually be let let's check this really quick
go to python tutorials oh no I think it's creating yeah it's creating these Python
13:00:30
tutorial images right here whoops okay so I just figured it out um let's go back
into python tutorials don't take a look at any of those notebooks those are secret
um we were creating them in the wrong place um and that's because of this right
here we need a backslash so we need to actually include a backslash right here here
in this path we didn't have that um EOL while scanning string literal okay so this back
slash could cause an issue let's see if I can do forward slashes on all these just
stick
13:01:04
with me guys I might cut this out I might not we'll see if this is important just
going to keep talking while we're doing it um let's run this okay so now that we're
doing these forward slashes we're still checking let's make sure we can still check
those files good now when we Loop through this I'm not going to well yeah I can
print it off doesn't matter I'm going to print it and we'll see if that name works
and then we're also going to um uh I said if so if it exists then
13:01:37
make it no no no so if not I think the not did make sense we just weren't sure we
had to do some um checking so if it exists then we're going to create it and we'll
keep the print in there because it doesn't really matter so it's going to create
the CSV an image but didn't create the text let's see okay let's uh I don't know
why this would work but let's run it okay so I think I just had the wrong range so
now we have our images all through or we have our
13:02:05
folders all three folders now we need to write a script that will read in these and
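The folder-creation step can be sketched as below. A temporary directory stands in for the real folder so the example runs anywhere, and `os.path.join` is swapped in for the manual slash handling from the video because it adds the separator for you.

```python
import os
import tempfile

# Stand-in for the real path used in the video.
path = tempfile.mkdtemp()

folder_names = ['csv files', 'image files', 'text files']

# range(0, 3) yields 0, 1 and 2. range(0, 2) stops at 1, which would
# skip 'text files', matching the missing third folder seen above.
for loop in range(0, 3):
    folder_path = os.path.join(path, folder_names[loop])
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

print(sorted(os.listdir(path)))  # ['csv files', 'image files', 'text files']
```

Looping with `for name in folder_names` would avoid the range entirely and keep working if more folders are added later.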
check and see what kind of file it is and place it into the correct folder so let's
come right down here and let's see what we need to do so now I think we need to use
this right here um I think we need to Loop through this to be able to check each
one so we need to name this so we'll just do um file name is equal to run that so
now we have this file name um and what we can do is Loop through this so let's say
let's say for
13:02:45
file in file name so we're going to Loop through this now when it goes through it
needs to check the it's going to check the file path and in the file path it'll
say. txt CSV so let's say um if I think it should be CSV Let's test it on this one
but if CSV is in file name or actually it's file so if if it's in file and not in
and oh not not in if it's also not in this I believe because we're going to check
we're going to check each of those folders so we're
13:03:27
going to Loop through and it's going to check and see if the CSV so if that string
is in the file then what we want to do is check that it's also not in here that's
actually just the folder we also need um also we're not doing that for Loop anymore
um um okay I'm sorry I'm talking this through I'm figuring it out as I go because I
may have forgotten some of this so we're going to say this that's the CSV files so
we need to check this one um let's do it like this oops okay
13:04:09
so it's going to check to see if CSV files and I think it needs that in between it
so it's going to say the path so there's our path plus slash C SV files um actually
no it needs to be like this CU we're going to check that then I got it all right I
figured it out now then we're going to check if this file is in there yeah so
that's right so it says if the CSV is in the file um which is right where am I
looking oh file name so if it's in that list of the actual files which is all of
13:04:46
these if we find CSV in any of these files and it's not already in here so it's
going to say path plus CSV files did I say files yeah CSV files plus file okay that
all looks correct so if it's not in there we're going to use shutil.move now this
is how we actually move the file it gives us the ability to move what we want then
we'll say move we need to take it from our initial path to our new path so we're
going to specify we'll separate by comma we need to specify
13:05:20
its original path which it should just be this without this I think it should be
file path because this is where it is now it's in the FI this path with that file
name then we need to say we want to move it to here that is what we want to do um
yeah so let's check it with just this one and see if it works okay it ran through
it let's go check aha now that CSV file is gone perfect that is exactly what we
want it to happen now we can just recreate this for um for both our PNG files our
image
13:06:03
files and our text files so we'll say elif and elif and let's do PNG then we'll do
image files and image files because again we're just doing the exact same thing I
can do text files the next one's going to be text files text files so this one's
going to check for txt now do we need anything else um we'll just say else and
we'll print off print this file type is not included or or if there's multiple
files we'll say there are files in this path that were not
13:06:51
moved okay so if we run through this it's going to catch our CSV catch our PNG
catch our text and if not it'll say there are files in this path that were not
moved exclamation point all right now let's run through this uh uh that's because
if LF LF L if and then it's going to this lse statement uh I don't know let's let's
Circle back around to that in a second all of them were moved properly that's
really good really quickly I I'll I'll check and see I just don't I'm G to take
that
13:07:32
out for now so I'm just going to run it um I'm we may or may not go back to that
but let's check and see if everything worked properly so let's go into the CSV file
and we have our CSV file let's go into our image files and we have our images and
let's go into our text file and there are our text files now is there anything else
that we need to do I don't believe so but what I can do is I can take all this I
can include it in here and I'm going to basically restart it just to see if it
works properly from
13:08:16
scratch right I just want to make sure that I didn't miss anything and we'll
delete these so we have our I'm just going to rerun everything we imported we
created our path these are our file names and then when we run this it should take
our folder names check through them if they aren't already created it's going to
create it don't need it to print so let's get rid of that then for the file within
our file names it checks each one we check if there's a CSV and if
it's
13:08:49
already in that folder I mean if it's in that
folder then it doesn't do anything but if it isn't so and not it's not in there it
is going to move it to that location so it's going to check CSV PNG and text I
think everything should work properly let's run this and it looks like it's working
good good good and perfect it worked exactly how I had hoped um that's great so
this is the automatic file sorter in file explorer project uh you can go even
13:09:26
a step further so I had to come in here and manually run this you can go a step
further and put a timer on this where it automatically does this maybe every hour
every day every 30 minutes you can run this in your background especially if you
create um like an executable for this you can run this in your background um if
you are curious on how to do that I think I did something something similar to that
in my web scraping project um my Amazon web scraping project if you want to go
check that one
13:09:56
out but we're not going to do it in this project this is all I wanted to show you
how to do so I hope that this was helpful I hope that this project was you know
interesting and that you liked it I hope that you learned something and so if you
did be sure to like And subscribe below and I will see you in the next video what's
going on everybody welcome back to another video today we're going to be starting
our python web scraping tutorial series now this is more of a continuation of the
Python tutorial
13:10:19
series but because we're going to be focusing on web scraping for three or
four videos I wanted to just make it its own little miniseries in this series I'm
going to show you the basics of web scraping how to actually look at HTML how to
inspect a web page how to pull that data in and then even put it into a CSV file so
you can save it and use it now in this series we're just covering the basics which
is a fantastic place to start but in future series I'll be going into some of the
more advanced web
13:10:42
scraping topics as well so without further ado let's jump on my screen and get
started with web scraping now the first thing that we need to learn is HTML HTML
stands for hypertext markup language and it's used to describe all of the elements
on a web page now when we actually go to a website and start pulling data and
information we need to know HTML so we can specify exactly what we want to take off
of that website so that's where HTML comes in and we're going to look at the basics
13:11:08
understanding just the basic structure of HTML then we'll go look at a real website
and you'll kind of see that's a little bit more difficult than what we just have
right here but this is the basic building blocks to get to what the HTML actually
looks like on a website now this is basically what HTML looks like we have these
angled brackets with things like html head title body and then you'll notice that
we'll have a <body> and then we'll have a </body> at the bottom this forward
slash body
13:11:38
denotes that this is the end of the body section in HTML so everything inside of
this is within this body so there is this hierarchy within HTML we have <html> and
</html> at the bottom which encapsulates all the HTML on the website then we have
things like <head> and </head> <body> and </body> now within these sections we usually have
things like classes tags attributes text and all these other things that
we'll get to in different lessons but one of the easiest ones to notice and look at
are tags
13:12:10
things like a P tag or a title tag now Within These tags because this is a super
simple example we have these strings here my first web page and this is what's
called a navigable string and this is actual text that we could take out of this web
page now that you understand the super basics of HTML let's actually go to our
website and I'm going to have a link down below but it's going to be this one right
here this is basically just a website that you can you know practice web scraping
on it's
13:12:37
called scrapethissite.com and what we're going to do is look at the HTML behind
this web page and you can do this on any website that you go on so we're going to
right click we're going to go down to inspect now right off the bat this looks a
lot more complicated and a lot more complex than the very simple illustration that
we were looking at but let's kind of roll this up just a little bit you'll notice
we have HTML and HTML at the bottom we have a head and there is the end of the head
and then a body and the
13:13:08
end of the body so in a super simple sense it is similar but just the information
that's within it is a lot more difficult now if we look at this title right here
this is our title tag if we click this little arrow this is our dropdown you'll
notice that here we have the string Hockey Teams: Forms, Searching and Pagination now
let's say we didn't know we didn't want to click on that and go find it there's
something that's super helpful within this inspection page that you can click on
13:13:37
right here it says select an element in the page to inspect it so we're going to
click on that and as we go through our page and let's click on this title it's
going to take us to exactly where this is in our HTML this is extremely helpful
extremely useful for example let's say the data I want is down here I want to take
in the Boston Bruins I can click on it and it's going to take me to where that is
exactly in the HTML this is where we can start writing our web scraping script to
specify okay I'm
13:14:06
looking for a TR tag I'm looking for a TD tag I'm looking for the class called team
this is all information and things that we can use to specify exactly what we want
to pull out of our web page now there are other things that we didn't really look at
as well in just our simple illustration let's come right over here there's things
like hrefs now these are hyperlinks so if we went and then clicked on this this is
just regular text but inside of it is this hyperlink where if we clicked on it it
13:14:35
would take us to another website and typically that's denoted by this href right
here then you'll typically see things like a P tag which usually stands for a
paragraph now the last thing that I want to show you while we're here and we're
going to learn a lot more in the next several lessons but if we come right down
here there is this actual entire table here and let's try to find this table and
I'm having trouble selecting the entire thing but let's select this team name and
if we
13:14:59
look at this team name you can see that this is encapsulating the table this table
tag now these are super helpful because it takes in the entire table now if we wrap
this up and we look just at this it says class table and then we have the end of
this table tag now when we open it it's going to have all of this information so as
you can see as I'm highlighting over it we have these th tags and we have these td
tags and even these tr tags which are the individual rows of data and this is something
13:15:29
that we'll look at when we're actually scraping all of the data from this table in
a future lesson so this is how we can use HTML how we can inspect the web page and
see exactly what's going on kind of under the hood and then in future lessons we'll
see how we can use this HTML to specify exactly what data we want to pull out thank
you guys so much for watching if you like this video be sure to like And
subscribe below I will see you in the next [Music] lesson hello everybody in this
lesson
13:16:06
we're going to be taking a look at beautiful soup and requests now these packages
in Python are really useful these are the two main ones that I used when I was first
starting out with web scraping they can get a lot of what you want done in order to
get that information out now of course there are other packages that you can use
that may be a little bit more advanced but again this is just the beginner Series
in a future series we'll look at other packages as well that have some more
advanced functionality so what we're
13:16:30
going to be doing is we're going to import these packages and then we're going to
get all of the HTML from our website and make sure that it's in a usable State and
then in the next lesson we're going to kind of query around in the HTML kind of
pick and choose exactly what we want we'll look at things like tags navigable strings
classes attributes and more so let's get started by importing our packages what
we're going to say is from bs4 this is the module that we're taking it from we're
going to say import
13:16:58
and then we'll do BeautifulSoup then we're going to come down and we're going to
say import requests now let's go ahead and run this I'm going to hit shift enter and
it works well for me now if this does not work for you you may potentially need to
actually install bs4 so you may have to go to your terminal window and say pip
install bs4 I'll just let you Google how to do that if you need to do that cuz
it's pretty easy but if you're using Jupyter notebooks through Anaconda like
13:17:25
how we set it up at the beginning of this python series then you should be totally
fine it should be there for you the next thing that we need to do is specify where
we're taking this HTML from so what we need to actually do is come right over here
to our web page and we need to get the URL so we're going to go here we're going to
copy this URL and I'm just going to put it right here for a second and what we're
going to do is we're going to be using this URL quite a bit so we just want to
assign it to a
13:17:50
variable so just say URL is equal to and then we'll put it right in here now we can
get rid of that so now this is our URL going forward this is where we're going to
be pulling data from let's go ahead and run this now we're going to use requests
and what we're going to do is we're going to say requests.get and then we're going
to put in url now this get function is going to use the requests library it's going
to send a get request to that URL and it's going to return a response object let's
13:18:19
go ahead and run this as you can see here I got a response of 200 if you got
something like a 204 or a 400 or 401 or 404 all these things are potentially bad
something like a 204 would mean there was no content in the actual web page 400
means a bad request so it was invalid and the server couldn't process it
if you got a 404 that might be one that you're familiar with
that's an error that means the page cannot be found on the server the next thing that we're
going to do is take the
13:18:48
HTML now if you remember we come right back here and we inspect this we have all
this HTML right here now on this web page specifically right now it's completely
static it's not a bunch of moving stuff or anything like that usually when you're
looking at HTML if you're looking at something like Amazon and those web pages can
update but when you actually pull that into python you're basically getting a
snapshot of the HTML at that time so what we're going to do is bring in all of this
HTML
13:19:15
which is our snapshot of our website and then we can take a look at it so we're
going to come right down here and now we're going to say BeautifulSoup so now
we'll use the Beautiful Soup package or library so we need to say BeautifulSoup
and we're going to do an open parenthesis we're going to do two things there's two
parameters that we need to put in here first we need to put in this get request we
actually need to name this and we'll call this page we'll say page is equal
13:19:40
to and let's run this and now we're going to put that page in here and what we're
going to say is .text so the page is the response that came back from that request and then the
text is what's retrieving the actual raw HTML that we're going to be using then
we're going to put a comma here and what we need to specify is how we're going to
parse this information now this is HTML so what we're going to do is 'html' just
like this this is a standard this is already built into
13:20:06
this Library so we don't need to go any further but it's basically going to parse
the information in an HTML format let's go ahead and run this let's see what we get
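The request-and-parse steps above can be sketched roughly as follows. This is a sketch under assumptions, not the exact notebook code: fetch_soup is a hypothetical helper name, the timeout value is arbitrary, and the parser is spelled "html.parser" (Python's built-in parser) where the lesson passes 'html'. The helper is defined but not called so the example runs without a network connection, and the demonstration parses a small hand-written page instead.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url):
    """Send a GET request and parse the returned HTML (hypothetical helper)."""
    page = requests.get(url, timeout=10)
    # Raises requests.HTTPError for 4xx/5xx responses such as 400 or 404
    page.raise_for_status()
    # page.text is the raw HTML; the second argument names the parser
    return BeautifulSoup(page.text, "html.parser")


# Offline demonstration: parse a small hand-written page the same way
sample_html = (
    "<html><head><title>My First Web Page</title></head>"
    "<body><p>Hello</p></body></html>"
)
soup = BeautifulSoup(sample_html, "html.parser")
title_text = soup.title.text
print(title_text)
```

Checking the status before parsing is a design choice added here; the lesson simply looks at the printed response code.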
and as you can see we have a lot of information and as we scroll down I'll try to
point out some things that we've already looked at in previous lessons um something
like this th tag that should be very familiar that's the title then we have these td
tags and then of course if we scroll down even further we'll have things like a tr
tag so these
13:20:36
are all things that we looked at in that first lesson when learning about HTML now
again we want to assign this to a variable so we're going to say soup is going
to be equal to this information right here now I'm not going to go
into all the history behind beautiful soup what I will say is the guy who created
this beautiful soup Library uh what he said was that it takes this really messy
HTML or XML which you can also use it for and makes it into this kind of beautiful
soup so I
13:21:04
just thought that was kind of funny uh but that's why we're calling it soup right
here and we're going to go ahead and run this and we'll come right down here and
we'll say print soup and let's run it and now we have everything in here so we have
our html our head we have some hrefs and some links in here let's scroll down a
little bit more and then we have our body right there and of course we have a bunch
of information in here now in the next lesson what we're going to be doing is
learning how to
13:21:32
kind of query all of this to take specific information out and basically understand
a lot of what's going on in this HTML to make sure we can actually get what we need
now if this looks really kind of messy to you and it just doesn't make a lot of
sense there is one more thing that I'm going to show you and we'll come right down
here so we'll say soup.prettify() and if you've ever used a different type of programming
language uh prettify is very common in a lot of them where it'll just make it a little
bit
13:21:59
easier to visualize and see uh you'll notice that it kind of has this hierarchy
built in whereas if we scroll up there's no hierarchy built in it's all just down
this left hand side so if you kind of want to view it and just kind of visually see
the differences this does help a lot but it doesn't actually help a lot when you're
you know querying it or using you know find and find all which is what we're going
to look at in the next lesson so that is our lesson on beautiful soup and
13:22:25
requests in the next two lessons we're going to be looking at find and find all as
well as really diving into things like navigable strings and tags and classes and
all those things and then in the last lesson we're going to do kind of this mini
project where we try to get all the data from this web page that we've been using
from that table and put it into a pandas data frame so thank you guys so much for
watching I really appreciate it if you like this video be sure to like And
subscribe below and I
13:22:48
will see you in the next [Music] lesson hello everybody in this lesson we're going
to be taking a look at find and find all really we're going to be looking at a ton
of different things in this lesson this is where we really start digging in seeing
how we can extract specific information from our web page but in order to do that
let's set everything up where we actually bring in the HTML like we did in the last
lesson and we're just going to write all this out one more time just
13:23:24
for practice if nothing else and then we'll get into actually getting that
information from the HTML so we're going to start by saying from bs4 import
BeautifulSoup there we go and import requests we'll go ahead and run this then
we're going to come up here grab our HTML or sorry our URL so we'll say URL is
equal to and we'll have that right here now we need to say page is equal to and
then we'll do requests.get and then we'll put in our URL right here and we're going
to come
13:24:01
over here and run this and lastly we need to say soup so we'll say soup is equal to
BeautifulSoup there we go and then within our parentheses we need to specify the
page.text because we need that and our parser which is 'html' and there we go and
let's go ahead and run this let's print it out make sure it's working and there we
go so we have our soup right here all this should look really similar to uh our
last lesson and so now we've brought in our HTML from our page we have a lot a lot
a lot of
13:24:37
information in here now really quickly let's come over and let's inspect our web
page now in here we have a ton of information right we have bunch of different tags
and classes and all these other things but how do we actually use these well that's
where the find and find all is going to come into play and they're pretty similar
and you'll see that in just a little bit but let's say we want to take uh one of
these tags and let's come down let's say we just want to take this div tag now
there's going
13:25:10
to be a lot of different div tags in our HTML but let's just come right here let's
go down and let's say we're going to call soup we're going to say soup that's all
of our information we're going to say .find now within our parentheses we can
specify a lot of different things but we're going to keep it really simple right
now we're just going to say div let's go ahead and run this what this is going to
bring up is the very first div tag in our HTML and that's going to
13:25:37
be this information right here now let's copy this and we're going to do the exact
same thing except we're going to say find underscore all now let's run this now
we're going to have a ton more information really all find and find_all do is that
they find the information now find is only going to find the first result in our
HTML that's the div class container let's go back up to the top that's our div
class container but find all is going to find all of them so
13:26:11
it'll put it in this list for you so it's going to have this first one and it goes
down to uh this </div> which should be right here and then we have a comma which
separates our next div tag so that is how we can use it now what if we want to
specify one of these div tags we pulled in a ton of them but we want to just look
for one of them well this is something where the class comes in handy because right
now we have class is equal to container class is equal to col-md-12 I don't know what
these are
13:26:41
off the top of my head but um usually they'll be somewhat unique and we can use
these to help us specify what we're looking for for example just kind of glancing
at this we could also use this a tag if we wanted to look at this so we could say
oh we're looking for uh these hrefs so we have an href here and this right down here
we have this href as well which again uh if you remember from the previous lesson that
stands for a hyperlink now something like the class or the href um or these IDs
these are
13:27:11
all attributes so we can specify or kind of filter down based off of these
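To make the difference between find and find_all concrete, here is a small sketch on hand-written HTML. The markup is invented for illustration and only loosely modeled on the kind of page described above; "html.parser" is Python's built-in parser.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (not the real page)
html = """
<div class="container">
  <a class="nav-link" href="/pages/">Sandbox</a>
  <div class="col-md-12"><p class="lead">First block</p></div>
  <div class="col-md-12"><p>Second block</p></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_div = soup.find("div")      # only the first matching tag
all_divs = soup.find_all("div")   # list-like result with every match

print(first_div["class"])  # class can hold several values, so it is a list
print(len(all_divs))       # the outer div plus the two nested ones

link = soup.find("a")
print(link.attrs)          # every attribute of the tag as a dictionary
print(link["href"])        # a single attribute, looked up like a dict key
```

Attributes such as class, href and id all live in that attrs dictionary, which is what makes them usable as filters.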
let's try it so what we can do is we can do class first and this is kind of the
default uh within something like find_all or you can even do class underscore we
can come right back up we have this div and then here's our class so again we have
to have the div and the class if we took this a tag this is an a tag which would go
right here with the class of something like nav-link or something like nav-link again
down here we need to
13:27:41
specify that more but we have our div so we'll say class underscore equals col-md-12 right here and
let's go ahead and run this and now it's going to pull in just that information
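Filtering by class can be sketched like this, again on invented sample HTML rather than the real page. The keyword is spelled class_ because class is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (not the real page)
html = """
<div class="container">
  <div class="col-md-12"><p class="lead">Intro paragraph</p></div>
  <div class="col-md-12"><p>Data via some source</p></div>
  <div class="row"><p>Other</p></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only the div tags whose class is col-md-12
matches = soup.find_all("div", class_="col-md-12")
print(len(matches))

# A string passed as the second positional argument is also treated as a class
same = soup.find_all("div", "col-md-12")
print(matches == same)
```

The result is still a list even when the filter narrows things down, because more than one tag can share the same class.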
we're still getting a list because we have multiple of these so this div class uh
col-md-12 doesn't just happen once if we scroll down we'll see it multiple times
something like right here uh or actually let me see right here so here's this comma
then here's our next one so we have two of these uh div tags with a
13:28:13
class of col-md-12 and in each of these we have different information this looks
like a paragraph with this P tag right here and let's scroll back up uh so I also
think we should try out doing something like this P tag typically these P tags
stand for paragraphs or they have text information in them let's try the P tag
really quickly and let's just see what we get and let's run this and it looks like
we get multiple P tags now if we come back here you can see that there's this
information and it's
13:28:46
this information that we're pulling in and I'm just you know noticing that from
right here and then we have this information right here and it looks like there's
one more which is this href which looks like this open source so data via and then
that uh hyperlink or that link right there so we have three different P tags now
just to verify and make sure that that's correct what we could do is come over here
we're going to click on this paragraph it's going to take us to that P tag where
the class is
13:29:15
equal to lead let's come over here and look at this paragraph now we have another P
tag right over here with the class is equal to glyphicon glyphicon education I have
no idea what that means um and then we'll go to our last one which is right here
where the P tag has uh an a tag with an href and class uh and a bunch of other
information so let's say we just wanted to pull in this paragraph right here let's
go here and see how we can specify this information so it looks like P or the class
is equal
13:29:49
to lead that looks like it's going to be unique to just that one so if we come down
here we're going to say comma and it was class so you can do uh class underscore is
equal to and then we're going to say lead let's try running this and we're just
pulling in that information now let's say we actually want to pull in this
paragraph We actually want this text right here and this is a very real use case
you know let's say I'm trying to pull in some information or or a paragraph of text
13:30:21
well let's copy this and what we're going to then do is say .text and let's run
this now we're going to get an error right here and this is a very common error
because we're trying to use find_all unfortunately the list that find_all returns does not have a text
attribute we actually need to change this to find typically when I'm working with
these find and find_alls I'm using find_all most of the time until I want to start
extracting text then when I specify it I'll change this back to find just like this
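The error described above can be reproduced on a small sample. This is an illustrative sketch: the paragraph text is invented, and the variable names are not from the lesson.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (not the real page)
html = '<p class="lead">\n    Some introductory text.\n</p><p>Other</p>'
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like result, which has no .text of its own
results = soup.find_all("p", class_="lead")
try:
    results.text
    error_raised = False
except AttributeError:
    error_raised = True
print(error_raised)

# find returns a single tag, and a tag does have .text
first = soup.find("p", class_="lead")
raw_text = first.text
print(repr(raw_text))  # repr shows the surrounding whitespace and newlines
```

Switching from find_all to find works here because only one paragraph has the class lead.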
now let's try
13:30:55
this and now we're getting in parentheses this information now this is all wonky it
needs to definitely be cleaned up a little bit but if we look back up it's no
longer in a list and we no longer have things like these P tags in here or this
class attribute so we're really just trying to pull out this information now again
this does not look perfect we could even try to do something like .strip() looks
like there's some white space uh that cleans it up a little bit this definitely
looks
13:31:26
a little better um and we could definitely go in here and clean this up more but
just for you know an example this is how we can then extract that information
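A short sketch of cleaning the extracted text with strip, using invented sample HTML. The list comprehension at the end is an addition not shown in the lesson: it is one way to get text out of a find_all result by handling each tag individually.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (not the real page)
html = """
<p class="lead">
        Some introductory text.
</p>
<p>   Second paragraph   </p>
"""
soup = BeautifulSoup(html, "html.parser")

# .text gives the raw string; .strip() removes leading and trailing whitespace
lead_text = soup.find("p", class_="lead").text.strip()
print(lead_text)

# With find_all, take the text from each tag in the list one at a time
all_text = [p.text.strip() for p in soup.find_all("p")]
print(all_text)
```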
let's look at one more example this is some information and this is what we're
going to do kind of our little mini project in the next lesson on let's say we
wanted to take all this information what if we wanted to pull in something like the
team name that's going to be in right here in this TR tag and each of these TR tags
have th
13:31:53
tags underneath them so if we scroll down you'll notice that each row is this TR
tag so let's go ahead and search for let's do th let's just search for that first
so let's come right back up here let's use this find all and we'll get rid of this
text for right now and let's just say we want to look for the TR is that what we
said we were looking for no th so let's say we're looking for th let's go ahead and
run this so we're going to have underneath this th we have team name
13:32:29
year wins losses and notice these are all the titles so these titles are the only
ones with these th tags if we go down you'll notice that the data is actually TD
tags so now let's go back and look for TD we'll say td and this is going to be a lot
longer we have a lot of information but these are all the rows of data let's see if
we can just get one piece of this data we're going to get back we want just this
team name that's all we're trying to pull in for now um and then we'll try
13:33:02
to get this row and then in the next lesson we're going to try to get all of this
information make it look really nice and then we'll put it into a pandas data
frame so let's just get this team name right now let's go ahead we're going to say
th let's run this and we have this th and now that we know we're getting this
information in we can do find let's run this so there's our team name we're just
going to say .text and again we can do .strip() just like
13:33:37
that and Bam we have our team name so you can kind of start getting the idea of how
we're pulling this information out we're really just specifying exactly what we're
seeing in this HTML and what's really really helpful and you know something that I
do all the time is I'm inspecting it I'm just kind of searching like how what do I
want what piece of information do I want then I go ahead and click on it and then
I'm looking you know where is this sitting in the hierarchy it's within the body
13:34:04
it's within this table with the class of table then it's down here in this TR
tag and then this TD tag so I'm looking kind of at the hierarchy and I'm specifying
exactly what I'm looking for so that is what we looked at in today's
lesson that's how we can use find and find_all we were able to look at classes and
tags and attributes and navigable strings which is this right here getting that text
uh and we looked at find and find_all and how it's pulling
that
13:34:32
information in and how we can specify exactly what we're looking for now in the
next lesson which is definitely going to be the most exciting one we're going to
try to pull in all of this information so every single thing because we'll be able
to put all this information into a data frame which then we can use pandas to
really search and manipulate that data within that data frame so with that being
said that is the end of this lesson if you like this video be sure to like And
subscribe I
13:34:57
will see you in the next [Music] lesson hello everybody in this lesson we are going
to be scraping data from a real website and putting it into a pandas data frame
and maybe even exporting it to CSV if we're feeling a bit spicy now in the last
several lessons we've been looking at this page right here and I even promised that
we were going to be pulling this data but as I was building out the project I just
I honestly thought it was a little bit too easy since in the last lesson we
13:35:35
kind of already pulled out some information from this table and I want to kind of
throw you guys off so we're going to be pulling from a different table we're going
to be going on to Wikipedia and looking at the list of the largest companies in the
United States by Revenue and we're going to be pulling all of this information so
if you thought this was going to be easy in a little mini project uh it's now a
full project because why not so let's get started uh what we're going to do is
13:35:58
we're going to import beautiful soup and requests we're going to get this
information and we're going to see how we can do this and it's going to get a
little bit more complicated and a little bit more tricky we're going to have to you
know format things properly to get it into our pandas data frame to make it look
good and make it more usable so let's go ahead and get rid of this easy table
we don't want that one uh and we're going to come in here and we're just going to
start off this
13:36:21
should look uh really familiar by now we're going to say from bs4 import
BeautifulSoup I don't know if you've noticed but I've messed up spelling beautiful soup in
every single uh video I've noticed uh let's run this and now we need to go ahead
and get our URL so let's come up here let's get our URL say URL is equal to and
we'll just keep it all in the same thing really quickly because we know this by
heart by now right uh we'll say requests.get and then url to make sure that we're
getting
13:36:57
that information it gives us a response object um hopefully it'll be 200 that'll
mean a good response and then we'll say soup is equal to and then we'll say
BeautifulSoup and we'll do our page.text now we're pulling in the information
from this URL and then we use our parser which will be oops 'html' and let's go ahead
and run this looks like everything went well let's print our soup now this is
completely new to you it's completely new to me I don't know what I'm doing uh but
it looks like
13:37:28
we're pulling in the information am I right so we got a lot of things going for us
uh the uh stuff was imported properly we got our URL we got our soup which is uh
not beautiful in my opinion but let's keep on rolling let's come right down here
now what we need to do is we we need to specify what data we're looking for so
let's come and let's inspect this web page now the only information that we're
going to want is right in here we're going to want these uh titles or these headers
whoops so
13:38:00
we're going to want rank name industry Etc and then we are for sure going to want
all of this information let's just scroll down see if there's anything tricky in
here all right that looks pretty good and there is another table so there's not
just one table in here there are two tables in this page so that might change
things for us but let's come right back and let's inspect our page by using this
little button right here and let's specify in let's see if I can highlight
13:38:31
just this table oh it's not going oh let's do that right there so now we have this
uh wikitable sortable now I'm going to actually come right here I'm going to copy
and I'm just going to say copy the outer HTML just going to paste in here real
quick and that's a ton of information I didn't think it was going to copy all of it
and we're just going to delete that I just wanted to keep that class uh because I
wanted to then come right down here at the bottom and just see what this table uh
looks like I
13:39:03
don't know if it's part of it or if it's its own table um I can't tell
let's look at this Rank and let's come up so it says uh it's under this table and
it looks like it's its own table but it says wikitable sortable jquery-tablesorter
wikitable sortable jquery-tablesorter so it looks like there are two tables
with the same class which shouldn't be a problem if we're using find to get our
text because we should be taking the first one which
13:39:36
will be this table and this is the table we want um and if we wanted this one we
could just use find_all and since it returns a list we could use indexing to pull this
table right um but I think we're going to be okay with just pulling in this one so
let's go ahead and let's do our find so we'll do soup.find and we could find_all
or we could just do find table let's just try this and see what we get and if it
pulls in the right one that we're looking for that'd be great now this does
13:40:11
not look correct at all um I don't know what table it's pulling in oh maybe it's
this right here this might be a table yeah it is so we have this uh box more
citations so actually we are going to have to do exactly like what I was talking
about uh let's pull this and well we could do comma class uh right here and
let's do both you know what this is a learning opportunity let's do both so let me
go back up to the top because I need these um and what we're going to do let's come
right down
13:40:47
here I want to add in uh another thing actually I'll just push this one up there we
go so we're going to say find_all let's run this so now we have multiple and
again we got that weird one first but if we scroll down here's our comma and then
here's our wikitable sortable and then we have rank name industry all the ones
that we were hoping to see and I guarantee you if you scroll all the way to the
bottom um we're going to see potentially Wells Fargo Goldman Sachs I'm pretty sure
those are
13:41:25
um let's see yeah here we go like Ford motor Wells Fargo Goldman Sachs that's this
table right here so now we're looking at the third table but again this is a list
so we can use indexing on this and we'll just choose not position zero because
that's this one right here which we did not like well now we'll take position one
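The find versus find_all behavior described above can be sketched like this. The three tables are a simplified, made-up stand-in for the Wikipedia page (the class name on the first one is only a guess at the notice box), so treat the markup as an assumption.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page: a notice box first, then two data tables.
html_text = """
<table class="box-More_citations_needed"><tr><td>notice</td></tr></table>
<table class="wikitable sortable"><tr><th>Rank</th><th>Name</th></tr></table>
<table class="wikitable sortable"><tr><th>Rank</th><th>Profits</th></tr></table>
"""
soup = BeautifulSoup(html_text, "html.parser")

# find returns only the first matching tag, which here is the notice box.
first = soup.find("table")

# find_all returns every match in a list-like object, so indexing works.
tables = soup.find_all("table")
table = tables[1]  # position 1 is the second table on the page

print(first["class"])
print([th.text for th in table.find_all("th")])
```

Indexing is simple, but it depends on the order of tables on the page, so it can break if the page layout changes.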
let's run this let's go back up to the top and this is our table right here rank
name industry this is the information that we were actually wanting just to confirm
rank name
13:41:58
industry Etc so this is the information we're wanting and we're able to specify
that with our find_all and this is the information we want so we now want to make
this the only information that we're looking at so I'm just going to copy this we
didn't need to use our class for this one you probably could have um but we
could so let's actually um put this right down here this will be our table we'll
say equal to but then I'll come right here and I'm going to say soup.find this is
just for
13:42:27
demonstration purposes we do table comma class_ is equal to and then
we'll look at this right here whoops let me do this and let's see if we get the correct
output and let's run this and looks like we're getting a NoneType object uh if I
remember looks like the actual class is this right here so let's run this
instead and I got to get rid of the index there we go okay so we were able to pull
it in just using the find so the find table class and it says wikitable
13:43:02
sortable at least that's the HTML that we're pulling in right here let me go back
because I don't know if that's what I was seeing earlier let's just get this
rank let's go back up where's the rank here we go rank there we go so here's our Rank
and let's go up to the table and there's our class yeah and that's just uh to
me that's a little bit odd so it says wikitable sortable jquery-tablesorter
right here but in our actual um in our actual python script
13:43:39
that we're running it was only pulling in the wikitable sortable so it wasn't
pulling in the jquery-tablesorter why uh I'm not 100% sure but all things that we're
working through and we were able to uh we were able to figure out so we're going to
make this our table we're going to say table equal to uh soup.find_all and let's
run this and if we print out our table we have this table now this is our only data
that we are looking at now the first thing that I want to get is I want to get
these
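A small sketch of the class lookup discussed above, using made-up markup rather than the real page. One likely explanation for the mismatch, offered here as an assumption and not something the lesson confirms: jquery-tablesorter is added by JavaScript once the page loads in a browser, while requests only receives the raw HTML, where the class attribute is just "wikitable sortable".

```python
from bs4 import BeautifulSoup

# Raw HTML as a script would receive it (simplified, hypothetical markup).
raw_html = """
<table class="box-More_citations_needed"><tr><td>notice</td></tr></table>
<table class="wikitable sortable"><tr><th>Rank</th><th>Name</th></tr></table>
"""
soup = BeautifulSoup(raw_html, "html.parser")

# The class string seen in the browser inspector is not in the raw HTML,
# so find has nothing to match and returns None.
browser_class = soup.find("table", class_="wikitable sortable jquery-tablesorter")

# The class string that is actually in the raw HTML matches.
raw_class = soup.find("table", class_="wikitable sortable")

# Same table, reached by position instead of by class.
by_index = soup.find_all("table")[1]

print(browser_class)
print(raw_class["class"])
```

Matching on class is usually more robust than a position, as long as the class string is checked against what the script actually downloads.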
13:44:15
titles or these headers right here that's what we're going to get first so let's
go in here we can just look in this information you can see that these are with
these th tags and we can pull out those th tags really easily let's come right down
here we're just going to say th and we can get rid of this let's run this now these
are our only th tags because everything else is a TR tag for these rows of data so
these th tags are pretty unique which makes it really easy which is really really
great because
13:44:46
then we can just do world_titles is equal to so now we have these titles but uh
they're not perfect but what we're going to do is we're going to Loop through it so
I'm going to say world_titles and I'll kind of walk through what I'm talking
about this is in a list and each one is within these th tags so th and then there's our
um string that we're trying to get so we can easily take this list and use list
comprehension and we can do that right down here so I'm going to keep this
13:45:17
where we can see it um we'll do world_table_titles that's equal to now we'll do
our list comprehension should be super easy uh we'll just say for title in
world_titles and then what do we want we want title.text that's it um because
we're just taking the text from each of these we're just looping through and we're
getting rank then we're looping through getting name looping through getting
industry that's it so let's go and print our world table
13:45:50
titles and see if it worked and it did uh this looks like it needs to be cleaned up
just a little bit so let's go ahead and do that while we're here before we actually
put it into the uh pandas data frame oops I just wanted uh I just wanted this actually
so what we're going to do is try to get rid of those backslash n's if we do .strip
that may actually not work yeah uh because this is a list what we need to
do is we can actually do .text.strip right here let's try to do it in there
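As a sketch of that cleanup step: a list has no strip method, so the strip has to happen on each string inside the list comprehension. The small table below is invented for illustration, and the variable names follow the ones spoken in the lesson.

```python
from bs4 import BeautifulSoup

# Header cells with trailing newlines, similar to what the scraped page returns.
html_text = "<table><tr><th>Rank\n</th><th>Name\n</th><th>Industry\n</th></tr></table>"
table = BeautifulSoup(html_text, "html.parser").find("table")

world_titles = table.find_all("th")

# .text alone keeps the trailing "\n" on every title.
raw_titles = [title.text for title in world_titles]

# .text.strip() removes leading and trailing whitespace from each title.
world_table_titles = [title.text.strip() for title in world_titles]

print(raw_titles)
print(world_table_titles)
```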
13:46:24
there we go so now we have uh this and now this world table titles is good to go now I'm
actually noticing one thing that may be odd yeah so we have rank name industry goes
to headquarters but then in here we're getting rank name industry and then the
profits which is from this table right here which we don't want uh let's scroll
back up let's kind of backtrack this and see where this happened we did find_all
table we're looking at the first one right and then we're doing [Music]
13:47:05
headquarters uh so we're doing print table ah okay I think I found the issue here
and let's backtrack again this is we're working through this together we're going
to make mistakes uh the table is what we actually wanted to use we just did
soup.find_all th which is going to pull in that secondary table um jeez we were not
thinking here um so now we need to do find_all on the table not the soup because
now we were looking at all of them oh what a rookie mistake okay uh let's go back
now let's look at
13:47:35
this now it's just down to headquarters okay okay let's go ahead and run this let's
run this now we just have headquarters now let's run this now we are sitting pretty
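The mistake and the fix can be reproduced in a few lines. This is a sketch on invented markup with two tables, not the real page: searching the whole soup picks up headers from both tables, while searching the selected table keeps only its own.

```python
from bs4 import BeautifulSoup

# Two tables that share some header names (made-up sample markup).
html_text = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Name</th><th>Headquarters</th></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Name</th><th>Profits</th></tr>
</table>
"""
soup = BeautifulSoup(html_text, "html.parser")
table = soup.find_all("table")[0]

# Searching the soup walks the entire document, so both tables contribute.
from_soup = [th.text.strip() for th in soup.find_all("th")]

# Searching the table tag only looks inside that one table.
from_table = [th.text.strip() for th in table.find_all("th")]

print(from_soup)
print(from_table)
```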
okay excuse my mistakes Hey listen you know if it happens to me it happens to you I
promise you this is you know this is a project this is a little uh little project we're
creating here so we're going to run into issues and that's okay we're figuring out
as we go now what I want to do before we start pulling in all the
13:48:02
data is I want to put this into our pandas data frame we'll have the uh you know
headers there for us to go so we won't have to get that later and it just makes it
easier uh in general trust me so we're going to import pandas as pd let's go ahead
and run this and now we're going to create our data frame so we'll say pd dot now
we have these world uh table titles so what we're going to do is pd.DataFrame and
then in here for our columns we'll say that's equal
13:48:31
to the world table titles and let's just go ahead and say that's our data frame and
call our data frame right here let's run it there we go so we were able to pull out
and extract those headers and those titles of these columns we're able to put it
into our data frame so we're set up and we're ready to go we're rocking and rolling
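A minimal sketch of creating the empty data frame from the scraped headers. The title list is typed out by hand here, based on the column names mentioned in the lesson, so the exact spelling of each title is an assumption.

```python
import pandas as pd

# Stand-in for the cleaned header list produced by the list comprehension.
world_table_titles = [
    "Rank", "Name", "Industry", "Revenue",
    "Revenue growth", "Employees", "Headquarters",
]

# Passing only columns= creates a data frame with headers and zero rows.
df = pd.DataFrame(columns=world_table_titles)

print(df)
print(df.shape)
```

Starting with an empty frame that already has the right columns means each scraped row can be added later without naming the columns again.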
the next thing we need let's go back up next thing we need is to start pulling in
this data right here so we have to see how we can pull
13:48:58
this data in now if you remember that we had those th tags those were our titles as
you can see I'm highlighting over it but down here now we have these td tags and
those are all encapsulated within a tr tag so these tr represent the row right then
the d represents the data within those rows so r for rows d for data so let's see
how we can use that in order to get the information that we want so let's go back
up here just going to take this because again we're only pulling from table not
soup not soup
13:49:35
what were we thinking um and let's go ahead and let's look at tr let's run this now
when we're doing this tr these do come in with the headers so we're going to have to
later on we're going to have to get rid of these we don't want to pull those in um
and have that as part of our data but if we scroll down there's our Walmart um we
have the location these are all with these td tags and then of course it's
separated by a comma then we have our second tr so above we had our first tr so
13:50:09
row one row two row three all the way down now we will easily be able to use this
right because this is our column data and we can even call it that column_data
is equal to we'll run that um and what we're going to do is we're
going to Loop through that because it was all in a list so we're going to Loop
through that information but instead of looking at the tr tag we're going to look
at the td tag so let's come right down here we'll say for row in column_data
13:50:38
and we'll do a colon now we need to Loop through this we'll do something like
row.find_all and then what are we looking for we're not looking for the tr
we're looking for the td and just for now let's print this off see what this looks like
apparently I didn't run this uh column data that's why and let's run this and what
we actually need to do is something almost exactly like this and I'm going to put
it right below it um instead of printing this off because again this is all in a
list
13:51:19
we're using find_all so we're printing off another list which isn't actually
super helpful um for each of or all these data that we're pulling in what we can do
is we can call this uh the row_data and then we'll put the row data in here so
we'll say for and we'll say in row_data so we'll just say for data in row_data
and we'll take the data we'll exchange that and now instead of uh world table
titles we can change this into uh individual_row_data right and now let's
13:51:56
print off the individual row data so it's the exact same process that we were doing
up here and that's how we cleaned it up and got this and we may not need to strip
but let's just run this and see what we get there we go um and strip I'm sure was
helpful let's actually get rid of this yeah strip was helpful is the exact same
thing that happened on the last one so let's keep that actually let's run this and
now let's just kind of glance at this information let's look through
13:52:25
it this looks exactly like the information that's in the table let's just confirm
with this first one uh 25 uh two what am I saying 572 754 2.4 2300 57275 2.4 2200
so this looks exactly correct now we have to figure out a way to get this into our
table because again these are all individual lists it's not like we're just you
know putting all of this in at one time we can't just take the entire table and
plop it into um into the data frame we need a way to kind of put this in one at
13:53:02
a time now if you're just here for web scraping and you haven't taken like my pandas
series that's totally fine that's not what we're here for anyways um but what we
can do we'll have our individual row data and we're going to put it in kind of one
at a time now the reason we have to do that is because when we had it like
this and let's go back when we had it like this it's printing out all of it but
what it's really doing and let's get rid of it um
13:53:26
what it's really doing is it's kind of doing it like this it's printing it off one
at a time and it's only going to save that current row of data this last one it's
only going to save that as it's looping through so what we actually want to do is
every time it Loops through we append this information onto the data frame so
as it goes through and eventually it's going to end up with this one but as it goes
through let's run this as it goes through it puts this
13:53:53
one in and then the next time it Loops through it puts this one in and the next
time it Loops through Etc all the way down um so let's see how we can do this so we
have our data frame right here let's get rid of this let's bring our data frame in
now again like I just mentioned if you don't know pandas and you haven't learned
that uh you know go take my uh series on that it's really good and we do something
very similar to this in that Series so I'm not going to kind of walk through the
entire logic um
13:54:21
but there is something called loc which stands for location when you're looking at
the index on a data frame and we're going to use that to our advantage so we're
going to say the length of the data frame so we're looking at how many rows are in
this data frame and then we're going to say that's our length then we're going to
take that length and use it when we're actually putting in this new information
pretty um pretty cool so we're going to say df.loc then a
13:54:49
bracket and we're putting in that length so we're checking the length of our data
frame each time it's looping through and then we're going to put the information in
the next position that's exactly what we're doing so let's go ahead and put in the
individual row data um so let's just recap we're looping through this tr this is
our column data so these tr that's our row of data then as we're looping
through it we're doing find_all and looking for td tags that's
13:55:19
our individual data so that's our row data then we're taking that data each piece
of data and we're getting out the text and we're stripping it to kind of clean it
and now it's in a list for each individual row then we're looking at our current
data frame which has nothing in it right now we're looking at the length of it and
we're appending each row of this information into the next position so let's go
ahead and run this it's working it's thinking and it looks
13:55:48
like we got an issue cannot set a row with mismatched columns now we're encountering
an issue not one that I got earlier but we're going to cancel this out we're going
to figure this out together so let's print off our individual row data let's look
at this this one is empty uh this is I'm almost certain is probably the issue um I
didn't encounter this issue when I wrote these uh when I wrote this lesson um but
I'm almost certain that this is the issue right here so let's do the column
13:56:18
data but let's start at position um let's try one and not parentheses I need
brackets because this is a list right so it should work and there we go so now that
first one's gone so now we just have the information I didn't even think about that
um just a second ago but I'm glad we're running into it in case you ran into that
uh issue let's go ahead and try this again and it looked like it worked so let's
pull our data frame down I could have just wrote DF let's pull our data
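Putting the loop together, here is a sketch of the row-by-row append with df.loc, including the fix of skipping the first tr. The table and its values are made-up sample data, and the variable names mirror the ones spoken in the lesson.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Tiny stand-in table: one header row (th cells) and two data rows (td cells).
html_text = """
<table>
  <tr><th>Rank</th><th>Name</th><th>Industry</th></tr>
  <tr><td>1</td><td>Walmart
</td><td>Retail</td></tr>
  <tr><td>2</td><td>Amazon</td><td>Retail and cloud computing</td></tr>
</table>
"""
table = BeautifulSoup(html_text, "html.parser").find("table")

titles = [th.text.strip() for th in table.find_all("th")]
df = pd.DataFrame(columns=titles)

column_data = table.find_all("tr")

# The header row has th cells but no td cells, so it yields an empty list.
# Assigning that empty list to a row is what raises the mismatched columns error.
header_cells = column_data[0].find_all("td")

for row in column_data[1:]:  # skip the header row
    row_data = row.find_all("td")
    individual_row_data = [data.text.strip() for data in row_data]

    length = len(df)                      # number of rows so far
    df.loc[length] = individual_row_data  # write into the next free position

print(df)
```

Appending one row at a time with loc is fine for a small table like this; for very large tables, collecting the rows in a list and building the data frame once is usually faster.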
13:56:49
frame down and now this is looking fantastic now um these three dots just mean
there's information in there just doesn't want to display it but it looks like we
have our rank we have our name have the industry revenue revenue growth employees
and headquarters for every single one so this is perfect now this is exactly what I
was hoping to get now you can go in and use pandas and manipulate this and change
it and you know dive into all the information in there but we can also export this
into a
13:57:20
CSV if that's what you're wanting so we could easily do that by saying we'll do
df.to_csv and then within here we're just going to do r and specify our file path so
let's come down here to our file path then we'll go to our folder for our output so
we're just going to take this path and let me do it like that so I have this path
in my OneDrive documents Python web scraping folder for output so you know I
already made this um and I'm just going to put this right down
13:57:49
here now I do have to specify what we're going to call this um we'll just call this
companies and then we have to say .csv that is very important now if we run this I
already know just because uh we have this Rank and this index here we're going to
keep this index in the output not great uh but let's run it let's look at our
output there's our companies and when we pull this up as you can see this is not
what we want because we have this extra thing right here now if we're automating
13:58:20
this this would get super annoying so what we're going to do is go back and just
say index equals False let's go out of here and now we're just going to come right
down here we're going to say comma index equals False and so it's going to take
this index and it's not going to import or actually export it into the CSV now
let's go ahead and run this let's pull up our folder one more time and let's
refresh just to make sure should be good and now this looks a lot
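The effect of index=False can be checked without writing a file, because to_csv returns the CSV text when no path is given. This is a sketch with a made-up two-row frame, and the path in the comment is only a placeholder.

```python
import pandas as pd

df = pd.DataFrame({"Rank": ["1", "2"], "Name": ["Walmart", "Amazon"]})

# Default behavior: the 0, 1, ... index is written as an unnamed first column.
with_index = df.to_csv()

# index=False leaves the index out, so only the real columns are exported.
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)

# Writing to disk works the same way; the path below is a placeholder:
# df.to_csv(r"C:\path\to\output\Companies.csv", index=False)
```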
13:58:50
better so we're able to take all of that information and put it into a CSV and it's
all there so this is the whole project so if we scroll all the way back up let's
just kind of glance at what we did here scroll down we brought in our libraries and
packages we specified our URL we brought in our soup um and then we tried to find
our table now that took a little bit of uh testing out but we knew that the table
was the second one so in position one so we took that table we were also able to
specify it using
13:59:23
find but then we used the class and of course we just wanted to work with that
table that's all the data we wanted so we specified this is our table and we worked
with just our table going forward of course uh we encountered some small issues
user errors on my end but we were able to get our world titles and we put those
into our data frame right here using pandas then next we went back and we got all
the row data and the individual data from those rows and we put it into our pandas
data frame then we
13:59:55
came below and we exported this into an actual CSV file so that is how we can use
web scraping to get data from something like a table and put it into a pandas data
frame I hope that this lesson was helpful I know we encountered some issues that's
on my end and I apologize but if you run into those same issues hopefully that
helped uh but I hope this was helpful and if you like this be sure to like And
subscribe below I appreciate you I love you and I will see you in the next [Music]
lesson so the first thing that we need
14:00:35
to do is import our pandas library so we're going to say import we're going to say
pandas now this will import the pandas library but it's pretty commonplace to give
it an alias and as a standard when using pandas people will say as pd so this is
just a quick alias that you can use uh that's what I always use and I've always
used it because that's how I learned it and I want to teach it to you the right way
so that's how we're going to do it in this video so let's hit shift enter now that
that
14:01:02
is imported we can start reading in our files now right down here I'm going to open
up my file explorer and we have several different types of files in here we have
CSV files text files Json files and an Excel worksheet which is a little bit
different than a CSV so we're going to import all of those I'm going to show you
how to import it as well as some of the different things that you need to be aware
of when you're importing so we're going to import some of those different
14:01:29
file types and I'll show you how to do that within pandas so the first thing that
we need to say is pd dot and let's read in a CSV because that's a pretty common one
we'll say read_csv and this is literally all you have to write in order to
call it in now it's not going to call it in as a string like it would in one of our
previous videos if you're just using the regular operating system of python when
you're using pandas it calls it in as a data frame and I'll talk about some of the
14:01:59
nuances of that so let's go down to our file explorer we have this countries of the
world CSV you just need to click on it and right-click and copy as path and that's
literally going to copy that file path for us you don't have to type it out
manually you can if you'd like and we're just going to paste it in between these
parentheses now if we run it right now it will not work I'll do that for you it's
saying we have this Unicode error uh basically what's happening is
14:02:26
it's reading in these backslashes and this colon and all those backslashes in
there and this period at the end what we need to do is read this in as a raw string
so we're just going to say r and now it's going to read this as a literal string or
a literal value and not as you know with all these back slashes which does make a
big difference when we run this it's going to populate our very first data frame so
let's go ahead and run it and now we have this CSV in here with our country and our
14:02:55
region now if we go and pull up this file and let's do that really quickly let's
bring up this countries of the world it automatically populated those headers for
us in the data frame but we don't have any column for those 0 1 2 3 so if we go
back as you can see right here there's this index and that's really important in a
data frame it really makes a data frame a data frame and we use index a lot in
pandas we're able to filter on the index search on the index and a lot of other
things
14:03:22
which I'll show you in future videos but this is basically how you read in a file
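Here is a short sketch of the two ideas above: the r prefix on a Windows path, and read_csv returning a data frame with an automatic index. The path is hypothetical and is never opened; the CSV content is a made-up sample read from memory so the snippet runs anywhere.

```python
import io
import pandas as pd

# Hypothetical Windows path. The r prefix makes this a raw string, so each
# backslash is kept literally. Without it, "\U" starts a Unicode escape and
# Python refuses to compile the line, which is the Unicode error mentioned above.
path = r"C:\Users\example\Documents\countries of the world.csv"

# A few sample rows standing in for the real file.
csv_text = "Country,Region\nAfghanistan,Asia\nAlbania,Eastern Europe\n"

# read_csv accepts a path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(list(df.index))  # the 0, 1, ... index pandas adds automatically
```

With a real file, the call would simply be pd.read_csv(path).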
now if we go right up here in between these parentheses and we hit shift tab this
is going to come up for us let's hit this plus button and what this is is these are
all the arguments or all the things that we can specify when we're reading in a
file and there are a lot of different options so let's go ahead and take a
look really quickly
14:03:50
the first thing is obviously the
file path we can specify a separator which shows no default here but when we're
pulling in this CSV when
14:04:15
we're reading in the CSV it's automatically going to assume it's a comma cause it's a
comma separated uh file you can choose delimiters headers names index columns and a
lot of other things as you can see right here now I will say that I don't use
almost any of these uh the few that I'm going to show you really quickly in just a
second are up at the very top but you can do a ton of different things and I'm just
going to slowly go through them so that's what those are you can also go down here
this
14:04:44
is our docstring and you can see exactly how these parameters work it'll show you
and give you a text and walk you through how to do this again most of these you'll
probably never use but things like a separator could actually be useful and things
like a header could be useful because it is possible that you want to either rename
your headers or you don't have a header in your CSV and you don't want it to
auto-populate that header so that is something that you can specify so for example this
14:05:13
header one I'll show you how to do this uh the default behavior is to infer that
there are column names if no names are passed this behavior is identical to header
equals zero so it's saying that first row or that first index which is like right
here that zero is going to be read in as a header but we can come right over here
and we'll do comma header is equal to and we can say None and as you can see there
are no headers now instead it's another index so we have indexes on both the
x-axis
14:05:45
and the y-axis and so right now we have this zero and one index indicating the first
column and the second column if we want to specify those names we can say the
header equals None then we can say names is equal to and we'll give it a list and
so the first one was country and what's that second one oh region so right here
that's the first um the first row but we'll rename it and we'll just say country
region and when we run that we've now populated the country and the region uh we're
just pretending that our
14:06:19
CSV does not have these values in it and we have to name it ourselves that's how
you do it but let's get rid of all that because we actually do want those in there
so we're just going to get rid of those and read it in as normal and there we go
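The header and names arguments can be compared side by side in a quick sketch. The CSV text is a made-up sample read from memory, and the replacement names in the last call ("Nation", "Area") are hypothetical.

```python
import io
import pandas as pd

csv_text = "Country,Region\nAfghanistan,Asia\nAlbania,Eastern Europe\n"

# Default (same as header=0): the first line becomes the column names.
default_df = pd.read_csv(io.StringIO(csv_text))

# header=None: no line is treated as a header, so columns are numbered 0, 1
# and the original header line shows up as the first row of data.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# header=None plus names: supply the column names yourself. This is meant for
# files that have no header line; here the old header line stays in as row 0.
named = pd.read_csv(io.StringIO(csv_text), header=None, names=["Country", "Region"])

# header=0 plus names: replace the existing header line with new names.
renamed = pd.read_csv(io.StringIO(csv_text), header=0, names=["Nation", "Area"])

print(no_header)
print(named)
print(renamed)
```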
now typically when you're reading in a file what you need to do is you want to
assign that to a variable almost always when you see any tutorial or anybody online
or even when you're actually working people will say df is equal to df stands for
data frame again
14:06:50
this is a data frame in the next video in this series I'm going to walk through
what a series is as well as what a data frame is because that's pretty important to
know when you're working with these data frames but we'll assign it to this value
and then we'll call it by saying df and we'll run it and that's typically
how you'll do things because you want to save this data frame so later on you can
do things like df dot and you can uh you know call
14:07:14
different methods on it but you can't really do that it's not as easy to do it if
you're calling this entire CSV and importing it every time so let's copy this
because now we're going to import a different type of file so now we've been doing
read_csv but we can also import text files now you can do that with the read_csv we
can import text files let's look at this one we have the same one it's countries of
the world except now it's a text file because I just converted it for this video
I'll copy
14:07:43
that as a path and so now when we do this oops let me get those quotes in there
it'll say world.txt it will still run but as you can see this did not import properly
um we have this country \t region and then all of our values are the exact
same with this \t that's because we need to use a separator and I'll show you
in just a little bit how we can do this in a different way but with that read_csv
this is how we can do it we'll just say sep is equal to we need to do \t now
let's try running this and
14:08:18
as you can see it now has it broken out into country and region we could also do it
the more proper way and this is the way you should do it and I'll get rid of these
really quickly but just want to keep them there in case you want to see that but
you can also do read_table and let's get rid of this separator and now we have no
separators just reading it in as a table let's run this and it reads it in properly
the first time this read_table can be used for tons of different data types but
14:08:48
typically I've been using it for like text files um we can also read in that CSV so
let's change this right here to CSV we can read it in as a CSV but just like we did
in the last one when we read in the text file using read_csv this read_table you're
going to need to specify the separator so I'll just copy this and we'll say comma
and now it reads it in properly again you can use that for a ton of different file
types but you just need to specify a few more things if you don't want to use the
more
14:09:18
specific read_ function when you're using pandas now let's copy this again
we're going to go right down here and now let's do JSON files JSON files usually
hold semi-structured data um which is definitely different than very structured
data like a CSV which has columns and rows so let's go to our file explorer we have
this JSON sample we will copy this in as the path let's paste it right here and
we'll do read_json again these different functions were built out specifically
14:09:53
for these file types that's why you know each one has a different name so now we're
reading this in as the JSON let's read it in and it read it in properly now let's
go ahead and copy this and take a look at Excel files because Excel files are a
little bit different than other ones that we've looked at um so let's just do
read_excel and let's go down to our file explorer and let's actually open up this
workbook as you can see we have sheet one right here but we also have this
14:10:27
world population which has a lot more data let's say we just wanted to read in
sheet one we can do that or by default it's going to read in this world population
because it's the first sheet in the Excel file well let's go ahead and take a look
at that let's get out of here and let's say oops I forgot to copy the file path
let's go ahead and copy as path and we'll put it right here and let's just read it
in with no arguments or anything in there or no parameters when we read it in it's
14:11:01
reading in that very first sheet so this is the one that has all of the data now
let's say we wanted to read in that extra sheet name or the second sheet name we'll
just go comma sheet_name say is equal to and then we can specify sheet was it
sheet one like this yes it was so we just had to specify the sheet name right here
and then it brought in that sheet instead of the default which is the very first
sheet in that Excel now that definitely covers a lot of how you read in those files
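The text-file and JSON readers covered above can be compared in one sketch. All of the data here is a made-up sample held in memory, so no files are needed; with real files each call would take a path instead of io.StringIO.

```python
import io
import pandas as pd

tab_text = "Country\tRegion\nAfghanistan\tAsia\nAlbania\tEastern Europe\n"
csv_text = "Country,Region\nAfghanistan,Asia\nAlbania,Eastern Europe\n"
json_text = (
    '[{"Country": "Afghanistan", "Region": "Asia"},'
    ' {"Country": "Albania", "Region": "Eastern Europe"}]'
)

# read_csv splits on commas by default, so tab-separated text lands in one column.
one_column = pd.read_csv(io.StringIO(tab_text))

# Telling read_csv the separator is a tab fixes that.
with_sep = pd.read_csv(io.StringIO(tab_text), sep="\t")

# read_table splits on tabs by default, so no sep is needed here.
as_table = pd.read_table(io.StringIO(tab_text))

# read_table can also read comma-separated data if the separator is given.
csv_as_table = pd.read_table(io.StringIO(csv_text), sep=",")

# read_json parses JSON records into rows and columns.
from_json = pd.read_json(io.StringIO(json_text))

print(one_column.shape, with_sep.shape)
print(from_json)
```

Excel follows the same pattern with pd.read_excel(path, sheet_name="Sheet1"), but it needs an Excel engine such as openpyxl installed, so it is left out of the runnable part here.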
again
14:11:31
you can come in here and hit shift Tab and this plus sign and take a look at all
the documentation and you can specify a lot of different things that I
didn't think were very important for you guys to know especially if you're just
starting out the ones that we looked at today are what I would say are like the
ones that I use almost all the time so I wanted to show you those but if you're
interested in any of these other ones or you have very unique data and you need to
do that um you know it's
14:11:55
worth really getting in here and figuring things out a few other things that I
wanted to show you just in this kind of first video or this intro video on how to
read in files um one thing that you may have noticed especially in this file right
here is we're only looking at the first five and then the last five so if we wanted
to see all the data all the data is in these like little three dots right here
right we want to be able to see that data but right now we can't and that's because
of
14:12:24
some settings that are already within pandas and all we need to do is change that
so this one has 234 rows and four columns so obviously we can see all the columns
well let's just change the rows all we'll say is pd.set_option now what we
need to do is we're going to change the rows we're not going to change the columns
at least not on this one so we'll say quote display.max_rows now if we just run
this for whatever data we bring in it's going to be able to show the max rows
14:12:59
and then we'll say 235 although there's 234 rows I'm just going to be safe let's
run this and now it has changed it so let's read in this file again and you'll see
how it's changed now we have all the numbers and we have this little bar on the
right that allows us to go down all the way to the bottom and all the way to the
top so now we can actually look and kind of skim and see our values I like that
better than just having that you know shorter version um we can do the exact
14:13:29
same thing on columns as well so if we look at this one this is our JSON file it has
the same thing right here we have what was it 38 columns but we can only see I
think it's maybe it's 20 or something like that I can't remember um but we have 38
we can only see like let's say 15 of them or 20 of them we'll do the exact same
thing and we'll just say pd.set_option display.max_columns and we'll set that to 40 for
that one when we run this oops let's get over here when we run this one again we
14:14:07
can now scroll over and see every single one of our columns now that one is in my
opinion a lot more useful I like being able to see every single column so
definitely something that you should be using especially when you have these really
large files you want to be able to see a lot of the data and a lot of the columns
so when you're slicing and dicing and doing all the things that we're about to
learn in this pandas series you know what you're looking at I also want to
show you just
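A minimal sketch of the display settings described above, assuming a stand-in data frame rather than the real world population file. It uses the canonical option names `display.max_rows` and `display.max_columns`; the 235 and 40 limits are the ones mentioned in the video.

```python
import pandas as pd

# Hypothetical stand-in for the 234-row file used in the video.
df = pd.DataFrame({"Rank": range(1, 235)})

# Raise the display limits slightly above the size of the data so that
# nothing is collapsed into "..." when the frame is printed.
pd.set_option("display.max_rows", 235)
pd.set_option("display.max_columns", 40)

# Read the settings back to confirm they were applied.
max_rows = pd.get_option("display.max_rows")
max_columns = pd.get_option("display.max_columns")
print(max_rows, max_columns)
```

If the full view is no longer wanted, `pd.reset_option("display.max_rows")` should restore the default.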
14:14:31
how to kind of look at your data in these data frames as well that's also pretty
important so let's go right down here and the very last one that we imported was
this one right here this read_excel so this data frame is the only one that's
going to read in let's run it um this is the last one to be run so this variable
right here DF uh it won't be applied to all these other ones which we can always go
back and change those typically you'll do something like data frame two you want
14:14:57
to do something like that um so let's keep data Frame 2 oops so what we're going to
do is we're going to bring data Frame 2 right down here and we want to take a look
at some of this data we want to know a little bit more about it something that you
can do is dataframe2.info and we'll do an open parenthesis and when we run this
it's going to give us a really quick breakdown of a little bit of our data so we
have our columns right here rank CCA 3 country and capital it's saying we have
14:15:25
234 values in those columns because there's 234 scroll up here because there's 234
uh rows that tells me that there's no missing data in here at least not you know
completely missing like null values there is something in each of those
rows the count tells me it's non-null so there's no null values and it tells me
the data type so it's reading in as an integer an object an object and an object
and it also tells us how much memory it's using which is also pretty
14:15:56
neat because when you get really really large data sets memory usage and
knowing how to work around that stuff does become more important than when you're
working at these really small you know sample sizes that we're looking at we can
also do oops let me get rid of that can also do dataframe2 and we'll do shape
and for this one we do not need the parentheses and all this is going to tell us is
we have 234 rows and four columns we're also able to look at uh the first few
values or rows in each of
14:16:29
these data frames so we can just say dataframe2.head and if we do that it's
going to give us the first five rows but we can specify how many we want we can
say head(10) it'll give us the first 10 rows right here we can do the exact same
thing and let's go right down here and we'll say tail so that'll give us the last
10 rows within our data frame now let's copy this and let's say we don't want to
actually look at all of these values or all these columns we can
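The inspection calls walked through above (`info`, `shape`, `head`, `tail`, and selecting one column) can be sketched as follows. The six-row frame and its rank values are placeholders made up for illustration; only the column names follow the video.

```python
import pandas as pd

# Tiny illustrative frame; the ranks are placeholders, not real data.
df2 = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5, 6],
    "CCA3": ["AFG", "ALB", "DZA", "ASM", "AND", "AGO"],
    "Country": ["Afghanistan", "Albania", "Algeria",
                "American Samoa", "Andorra", "Angola"],
    "Capital": ["Kabul", "Tirana", "Algiers",
                "Pago Pago", "Andorra la Vella", "Luanda"],
})

df2.info()                 # column names, non-null counts, dtypes, memory usage
shape = df2.shape          # attribute, not a method: (rows, columns)
first_three = df2.head(3)  # first n rows (default is 5)
last_two = df2.tail(2)     # last n rows (default is 5)
rank_only = df2["Rank"]    # a single column comes back as a Series
```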
14:16:58
specify that by saying df2 and oops let's get rid of all of this and we'll say with
a quote we'll say Rank and now we can take just a look at the rank data now we
can't do that by doing the index or at least not like this if we want to use this
index that is right here we can but there are two very special indexers called loc and
iloc for that and I'm going to have an entire video on this because it does get a
little bit more complex but there's df2 and there's loc and iloc which stand for
14:17:31
location and integer location that's only for the indexes whether it's the x-axis or the
y-axis those are the indexes and for loc it's looking for the actual text the
actual string of the index so if we come up here that data Frame 2 we can specify
224 and it'll give us this information right here in a little different format so
let's go bracket and we'll say 224 and when we run this it gives us our rank CCA
country capital with our values over here kind of like a dictionary
14:18:05
almost now let's copy this and we'll say df2.iloc and right now these look the
exact same but we haven't really talked a lot about changing the index and you can
change the index to a string or a different column or something like that and we'll
look at that in future videos the iloc looks at the integer location so even if
these um let's go right up here even if this index had changed to let's say this
rank or this CCA3 or country or whatever you make this index the iloc will still
look at the integer
14:18:37
location so that 224 would still be 224 even if it was Uzbekistan so then when we
look at this it's going to be the exact same but if we had changed that index this
loc is the one that we could search on and we could search Uzbekistan is that how you
spell Uzbekistan hey I nailed it so that is how you use loc and iloc again I just
wanted to show you a little bit about how you can look at your data frame or search
within your data frame now in future videos I'm going to dive a lot deeper into a
lot of
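To make the loc versus iloc distinction concrete, here is a small sketch under assumed data: a four-row frame with placeholder ranks stands in for the real file. With the default integer index the two indexers return the same row; once the index is changed to the country name, `loc` takes the label while `iloc` still takes the position.

```python
import pandas as pd

# Illustrative rows only; the rank values are placeholders.
df2 = pd.DataFrame({
    "Rank": [10, 20, 30, 40],
    "Country": ["United States", "Uruguay", "Uzbekistan", "Vanuatu"],
})

# Default index: label 2 and position 2 are the same row.
by_label = df2.loc[2]
by_position = df2.iloc[2]

# After changing the index, loc needs the label, iloc keeps using positions.
indexed = df2.set_index("Country")
uz_by_label = indexed.loc["Uzbekistan"]
uz_by_position = indexed.iloc[2]
```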
14:19:10
the concepts that we just looked at because I just kind of touched on them I wanted
you to have a brief introduction to them so that in future videos I'm not just
dropping everything on you all at once so hopefully this was a good quick
introduction to those topics uh you should be able to read in a file now see your
data frame and kind of look at it in a few different ways that we just looked at
and I hope that that was helpful and if it was be sure to check out all my other
videos on Python and
14:19:33
pandas and if you like this video be sure to like and subscribe below and I will
see you in the next [Music] video [Music] hello everybody today we're going to be
looking at filtering and ordering data frames in pandas there are a lot of
different ways you can filter and order your data in pandas and I'm going to try to
show you all of the main ways that you can do that so let's kick it off by
importing our data set so we're going to say data frame is equal to and we'll say
pandas and I need to import my pandas so
14:20:10
we'll say import pandas as pd that's pretty important I think um so pd.read_csv and
we'll do r and then we'll say the world population CSV so let's run this all our
data frame right here and this is the data frame that we're going to be filtering
through and ordering in pandas so let's kick it off the first thing that we can do
is filter based off of the columns so the data within our columns so Asia Europe
Africa or whatever data we may have in that column let's go right down here we're
14:20:46
going to say DF and then within it we're going to specify what column we're going
to be filtering on so we're going to say DF with another bracket and we'll say rank
so we're going to be looking at this rank column right here and we'll say in that
rank column we want to do greater than 10 and that's actually going to be a lot of
them let's do less than so when we run this it's only going to return these values
that are less than 10 we can also do less than equal
14:21:13
to you know all of these um comparison operators so less than or equal to so now we
have all of the ranks 1 through 10 now if we look at these countries we can specify
by specific values almost exactly like we did here but instead of doing a
comparison operator like we did right here and including those names let's say
Bangladesh and Brazil we can use the isin function almost like the IN operator in
SQL if you know SQL so let's go right down here and we're going to say
specific_countries so
14:21:45
right now we're just going to make a list of the countries that we want and then
we'll say Bangladesh and Brazil so let's go right down here and we'll say okay for
these specific countries from the data frame let's do our bracket we'll say in this
country column so we'll do data frame and then another bracket for country so in
this country column we can do isin and then an open parenthesis and then look
for our specific countries so we're looking at just this column and we're
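The two column filters described so far, a comparison on the rank column and `isin` on the country column, might look like this. The frame is a small illustrative sample, and the column names `Rank` and `Country` are assumed to match the file.

```python
import pandas as pd

# Small illustrative sample; not the full world population file.
df = pd.DataFrame({
    "Rank": [8, 7, 14, 3, 12],
    "Country": ["Bangladesh", "Brazil", "Egypt", "United States", "Ethiopia"],
})

# Comparison filter: the inner expression builds a True/False mask,
# the outer brackets keep only the rows where the mask is True.
top_ten = df[df["Rank"] <= 10]

# isin filter: keep rows whose Country appears in the list,
# similar in spirit to IN in SQL.
specific_countries = ["Bangladesh", "Brazil"]
picked = df[df["Country"].isin(specific_countries)]
```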
14:22:24
saying is in so we're looking at are these values within this column and we're
getting this error and this looks very very odd let me um this doesn't look right
there we go I just had some syntax errors I apologize made it way more complicated
than it needs to be but here's how you use this isin function so we're looking at
Bangladesh and Brazil and we return those rows with Bangladesh and Brazil really
quickly I wanted to give a huge shout out to the sponsor of this entire pandas
series and
14:22:55
that is Udemy Udemy has some of the best courses at the best prices and it is no
exception when it comes to pandas courses if you want to master pandas this is the
course that I would recommend it's going to teach you just about everything you
need to know about pandas so huge shout out to Udemy for sponsoring this pandas
series and let's get back to the video we can also do a contains function kind of
similar to isin except it's more like LIKE in SQL as well I'm comparing a lot
of this to
14:23:19
SQL because when you're filtering things my brain always goes to SQL but in
pandas it's called contains so let's actually copy this because I
don't want to make the same mistake again let's do that and we'll do the bracket
but instead of .isin we're going to do .str.contains and then an open
parenthesis so we're going to be looking for a string if it
contains let's do United almost like United States or any other United so let's
run this and
14:23:54
as you can see we have United Arab Emirates United Kingdom United States United
States Virgin Islands so we can kind of search for a specific string or a number or
a value within our data or within that column of country now so far we've only been
looking at how you can filter on these columns we can also filter based off of
the index as well and there's two different ways you can do it or two of the main
ways there's filter and then there's loc and iloc loc stands for location and iloc stands
for
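The substring search described above can be sketched with `Series.str.contains`. The country list here is a short made-up sample chosen to include the four "United" names that the video reports.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": [
        "Ukraine",
        "United Arab Emirates",
        "United Kingdom",
        "United States",
        "United States Virgin Islands",
        "Uruguay",
    ],
})

# str.contains returns a True/False mask for rows whose text includes
# the substring, roughly comparable to LIKE '%United%' in SQL.
united = df[df["Country"].str.contains("United")]
print(united)
```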
14:24:23
integer location and if you've seen other previous videos I've kind of mentioned
those so we can take a quick look at all of those so really quickly we need to set
an index because the index right now is uh not the best we'll set our index to
country so let's say df2 is equal to df.set_index and we'll say country I'm
just doing df2 because later on I want to use that data frame again so I'm just
going to assign it to another data frame so that we can just easily switch back and
forth so now we
14:24:59
have this index as the country and what we can do is use the filter function so
let's go down here we'll say df2.filter and we'll do an open parenthesis and now we
can specify our items so these are actually going to be specifying which columns we
want to keep so we're going to say items is equal to then we'll make a list we'll
say continent hope that's how we spell continent I'm always messing up with my uh
my stuff here my spelling then we'll do CCA 3 because why not you can specify
14:25:33
whichever ones you want when we run this it's going to only bring in those two
columns now by default it's choosing the axis for us but we can also specify
which axis we want to search on so if we say axis is equal to zero it's actually
going to search this axis this is the zero axis this is the one axis so where our
columns are is one so if we go back and do one we're searching on that one axis or
those header axes again and this is the default but you can specify that so if
you just want to search on uh
14:26:06
you know filtering right here you can do that and let's actually copy this and do
that right down here just you can see what it looks like but let's search for
Zimbabwe and we'll do Zimbabwe and we'll be looking at the zero axis which is the
up and down on the left hand side and when we filter on that we can filter by
Zimbabwe by looking just at the country index we can also use the like just like we
did before and I'll show you the exact same demonstration that we did which you can
say like is equal to and
14:26:39
instead of having to put in a concrete um text you can just say United just
like we did before and we're searching where the axis is equal to zero which again
is this left-hand axis so now we're looking for United and it's going to give
us all of the countries or all the indexed values that have United in it like we
were talking about before we also have loc and iloc so we can say df2.loc now
this is a specific value so we'll do United States so loc is just looking at
the
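A short sketch of `DataFrame.filter` as used above, assuming a four-row sample with the country set as the index. It shows the three variants from the walkthrough: `items` on the column axis (the default for a data frame), `items` on the row axis, and `like` on the row axis. The rank values are placeholders.

```python
import pandas as pd

# Illustrative sample; rank values are placeholders.
df = pd.DataFrame({
    "Country": ["United Kingdom", "United States", "Zambia", "Zimbabwe"],
    "Continent": ["Europe", "North America", "Africa", "Africa"],
    "CCA3": ["GBR", "USA", "ZMB", "ZWE"],
    "Rank": [1, 2, 3, 4],
})
df2 = df.set_index("Country")

# items on the column axis (axis=1, the default): keep only these columns.
two_columns = df2.filter(items=["Continent", "CCA3"])

# items on the row axis (axis=0): keep only these index labels.
zimbabwe = df2.filter(items=["Zimbabwe"], axis=0)

# like on the row axis: keep index labels containing the substring.
united = df2.filter(like="United", axis=0)
```

Note that `filter` selects by label only; it never looks at the values inside the columns.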
14:27:13
actual name or the value of it not its position so if we search for United States
it's going to give us this right here where it gives us all of the columns for
United States and then all of the uh values for United States or we can do iloc
which is the integer location which is not the exact same because we're looking at
the string for loc we're looking at this string but underneath it there still is
a position that's that integer location let's do a completely random one let's just
say
14:27:46
three if we look at the third position it's going to give us ASM which I'm not
exactly sure what it is but it still gives us basically the same kind of output
which is the columns and the values so that's another way that you can search
within your index when you're actually trying to filter down that data now let's go
look at the order by and let's start with the very first one that we looked at
let's do data frame that's why I kept it because I wanted to use it
14:28:11
later now we can sort and order these values instead of it just being kind of a
jumbled mess in here we can sort these columns however we would like ascending
descending multiple columns single columns and let's look at how to do that so
we'll say data frame and then we'll do data frame look at rank again just like we
were doing above and let's do data frame where it's less than 10 I should have just
gone and copied this I apologize so now we have this data frame that is less
than 10 now we can do
14:28:44
sort_values and this is the function that's going to allow us to sort everything
that we want to sort so we can do by is equal to and we'll just order it by the
exact same thing that we were doing or calling it on we'll do rank so now what this
is going to do it's going to order our rank column and as you can see it did that
one 2 3 4 5 we can also do it with ascending or descending so if you want to you
can look here and see what you can do so we'll do ascending we'll say that's equal
to
14:29:18
true and so that's the automatic default so that didn't change anything but if we
say false it's going to be descending from highest to lowest so now we have it in
the opposite direction now we don't have to just order or sort this on one single
column we can do multiple columns and we can do that by making a list right here
whoops make a list just like that and we'll input different ones as well so now
let's input our country and when we run this it will give us rank of
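The single-column sort described above can be sketched like this: filter to ranks below 10, then call `sort_values` with `by` and `ascending`. The five rows are an illustrative sample, not the real file.

```python
import pandas as pd

# Illustrative sample rows.
df = pd.DataFrame({
    "Rank": [3, 9, 5, 12, 1],
    "Country": ["United States", "Russia", "Pakistan", "Ethiopia", "China"],
})

small = df[df["Rank"] < 10]

# ascending=True is the default, so passing it explicitly changes nothing.
lowest_first = small.sort_values(by="Rank", ascending=True)

# ascending=False sorts from highest to lowest.
highest_first = small.sort_values(by="Rank", ascending=False)
```

`sort_values` returns a new data frame, so `small` itself keeps its original order.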
14:29:51
9 8 7 6 as well as the country of Russia Bangladesh Brazil now if you noticed the
country really didn't change because the rank stayed the exact same that's because
there's an order of importance here and it starts with the very first one if we
change this around and we look at this one and put a comma right here now the country
is going to be sorted descending and the rank would come second so the rank
isn't going to really have any effect here so now we have the country United States
Russia Pakistan
14:30:24
and the rank really didn't get ordered at all now if we want to see how that can
actually work let's do continent right here and actually put it right here and do
country here so if we run this it's first going to come and it's going to organize
or sort the continent then it's going to come back and go to the country and
then it's going to sort the country so keep your eye right here in this
Asia area because we're going to sort this differently than ascending so we have
ascending
14:30:54
false and that applies to both of these it's false and false but we can specify
which one we want to do we can do a false here and a true here so we'll do false
comma true and what this is going to do is it's going to say false for the
continent so the continent right here is going to stay the exact same and so that
is a lot of how you can filter and order your data within pandas I hope that this
was helpful I hope that you enjoyed this video if you liked it be sure to like and
subscribe below check out all my
14:31:22
other videos on Python and pandas and I will see you in the next [Music] video
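As a recap of the multi-column sort from this video, here is a hedged sketch of passing a list to both `by` and `ascending`. The first name in `by` has priority and the second only breaks ties within it. The five rows are an illustrative sample.

```python
import pandas as pd

df = pd.DataFrame({
    "Continent": ["Asia", "Europe", "Asia", "Africa", "Europe"],
    "Country": ["Pakistan", "Russia", "Bangladesh", "Nigeria", "Germany"],
})

# Continent is sorted descending (Z to A); within each continent the
# countries are sorted ascending (A to Z).
ordered = df.sort_values(by=["Continent", "Country"], ascending=[False, True])
print(ordered)
```

The `ascending` list is matched position by position with the `by` list, so both lists need the same length.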
hello everybody today we're going to be looking at indexing in pandas if you
remember from previous videos the index is an object that stores the axis labels
for all pandas objects the index in a data frame is extremely useful because it's
customizable and you can also search and filter based off of that index in this
video we're going to talk all about indexing how you can change the index and
customize that as
14:31:59
well as how you can search and filter on that index and then we're also going to be
looking at something a little bit more advanced called multi-indexing and you
won't always use it but it's really good to know in case you come across a data
frame that has that so let's get started by importing pandas import pandas as pd
now we'll get our first data frame we say df is equal to pd.read_csv and I've
already copied this but we're going to do r and we're going to put this file path
so I have
14:32:30
this world population CSV I will have that in the description just like I do in all
of my other videos let's run DF and let's take a look at this data frame so we have
a lot of information here we have rank country continent population as well as the
default index from zero all the way up to 233 now if you haven't watched any of my
previous videos on pandas the index is pretty important and it's basically just a
number or a label for each row it doesn't even necessarily have to be a unique
number um you can
14:33:02
create or add an index yourself if you want to and it doesn't have to be unique but
it really should be unique uh especially if you want to use it appropriately for
what we're doing the country is actually going to be a pretty great index because
the country you know is going to be all unique because we're looking at every
single row as a different um country as well as the population so let's go ahead
and create this country or add this country as our index now we can do this in a
lot of
14:33:28
different ways but the first way that you can do this if you already know what you
are going to create that index on is we can just go right in here when we're
reading in this file and we'll say comma index underscore oops I spelled that
completely wrong index_col and we'll say that is equal to and then we're
going to say quote country so we're taking this country and we're going to assign
it as the index now let's read this in and as you can see this is our index now it
looks a little
14:33:59
bit different we didn't have this country header right here which is specifying
that this is still the country but you can tell that this is the index based off
the um bold letters as well as it being on the far left and all the regular columns
for the data is over here while the country header is right here and it's lower
than all the others just a quick way that you can see that that is the index now
before we move on I want to show you some other ways that you can do this as well
but
14:34:24
I'm going to show you how to reverse this index before we move on and we'll say
data frame so we had our data frame right here so we have df dot we'll say
reset_index and then we'll say inplace is equal to True which means we don't have
to assign this to another variable and all that stuff it'll just modify df so now
when we run that data frame again the index was reset to the default numbers so now
let's go down here I'll show you how to do this in a different way you can do df dot
we'll say
14:34:56
set_index and then we'll just say country so very similar to when we were reading
in that file and we said set the index or that index column we said index_col
equals country if we do this and we run it it works but if we say data frame
right down here it's not going to save that if we want to save it just like we did
above we're going to say inplace is equal to True that is going to save it to
where we don't have to assign it another variable so now when we run this the data
frame right here which is
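A small sketch of the behavior just described: `reset_index` and `set_index` return a new data frame unless `inplace=True` is passed, in which case the existing one is modified. The two-row frame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "Continent": ["Asia", "Europe"],
}).set_index("Country")

# Undo the custom index and go back to the default 0, 1, 2, ... numbers.
df.reset_index(inplace=True)
default_index = list(df.index)

# Without inplace=True the result is returned and df itself is unchanged.
returned = df.set_index("Country")
still_default = list(df.index)

# With inplace=True df is modified directly and the call returns None.
df.set_index("Country", inplace=True)
```

Reassigning, as in `df = df.set_index("Country")`, is an equivalent alternative to `inplace=True`.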
14:35:28
going to populate this the data frame is going to say in place is equal to true so
that country will now be our index again let's run this and there we go really
quickly I wanted to give a huge shout out to the sponsor of this entire pandas
series and that is Udemy Udemy has some of the best courses at the best prices and
it is no exception when it comes to pandas courses if you want to master pandas
this is the course that I would recommend it's going to teach you just about
everything you need to know
14:35:53
about pandas so huge shout out to Udemy for sponsoring this pandas series and
let's get back to the video now what's really great about this index is we're able
to search based off just this index and so we can filter on it and basically look
through our data with it and there are two different ways that you can do that at
least this is a very common way that people who use pandas will kind of
search through that index the first one is called loc and there's loc and iloc
that stands for location
14:36:18
or integer location let's look at loc first let's say df.loc and then we'll do a
bracket now we're able to specify the actual string the label so let's go right up
here and let's say Albania so we'll say Albania so again this is just looking at
the location let's run this now it's going to bring up all the Albania data just
like here where it kind of looks like a column in a column and we can get this
exact same data but using iloc right here and when we ran loc we were searching
based
14:36:53
off Albania which is in the 1 position so if we actually pull the 1 position for
that integer with iloc we can look at the 1 position and this should give us the
exact same data now let's take a look at multi-indexing and we'll come back to a
little bit of this in a second so multi-indexing is creating multiple indexes
we're not just going to create the country as the index now we're going to add an
additional index on top of that so let's pull up our data frame right now we have
the country but let's do do
14:37:28
reset_index and we'll say inplace equals True oops let's run it so now we have our
data frame now let's set our index but this time when we set our index we're going
to add the country as the index as well as the continent as an index so we'll say
df.set_index then we'll do a parenthesis and instead of just doing
country like we did before we're going to create a list oops and we'll do it like
that and then we'll say oops continent and separate it by a comma so
14:38:08
we have continent and country let's just say inplace is equal to True now when we
run this we're going to have two indexes and let's see what this looks like and
let's run this so now we have country as well as continent as our index now you may
notice that these indexes are repeating themselves on this continent index we have
Europe right here and Europe right here as well as Asia and Asia and it looks a
little bit funky but we are able to sort these values and make them look a lot
better
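Creating the two-level index described above amounts to passing a list to `set_index`. This sketch assumes a five-row sample with placeholder ranks; the order of the list decides which level comes first.

```python
import pandas as pd

# Illustrative sample; rank values are placeholders.
df = pd.DataFrame({
    "Continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Rank": [1, 2, 3, 4, 5],
})

# A list of column names produces a MultiIndex: Continent is the outer
# level and Country is the inner level.
df.set_index(["Continent", "Country"], inplace=True)
print(df)
```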
14:38:46
so let's go ahead and try this we'll do df.sort_index and when we run this it
should sort our index alphabetically and we can also look in here and see what kind
of things we can you know specify we can specify the axis but it's automatically
going to be looking at the zero this is zero and this is one so we have two axes
within our data frame you can choose the level whether it's ascending or not
ascending in place kind string sort remaining all of these different things the
only one that I
14:39:17
really you know think is worth looking at is the ascending we already know some of
these other ones but if we look at ascending let's run it now it's sorted these and
so now it's kind of grouped together so we have Africa and all the African ones as
well as South America and all the South American ones let's really quickly say
pd.set_option and we'll say display.max_columns and just like this let's run it
and I need to specify whoops specify right here let's see how many
14:39:55
rows we have 235 so let's do 235 let's run this and now when we run this you can
see that Africa is all grouped together and all the countries are in alphabetical
order under it and then we go all the way down to Asia and again just all in
alphabetical order if we wanted to we could say ascending equals true and then when
we run this oh I meant to say false and then when we run this it's the exact
opposite so it starts with South America the last one and then goes in reverse
alphabetical
14:40:30
order we could also say false make it a list and do comma true and just like this
and then it would sort this First Column as false and this next column as true so
you can really customize it but you know for what we're doing we don't need any of
that we just need to be able to see this right here so now when we try to search by
our index like we did before we did df.loc now when we did that and we
said you know let's say Angola when we specified Angola it's not going to work
properly because it's searching
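Sorting the two-level index, including the per-level `ascending` list mentioned above, can be sketched as follows on a made-up five-row sample with placeholder ranks.

```python
import pandas as pd

# Illustrative sample; rank values are placeholders.
df = pd.DataFrame({
    "Continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Rank": [1, 2, 3, 4, 5],
}).set_index(["Continent", "Country"])

# Default: every index level sorted A to Z, so continents group together.
sorted_default = df.sort_index()

# One flag per level: continents Z to A, countries A to Z within each.
sorted_mixed = df.sort_index(ascending=[False, True])
```

`sort_index` returns a new data frame here; `df` keeps its original row order unless the result is reassigned or `inplace=True` is used.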
14:41:05
in this first index for the first string that we have we can search Africa let's
search for Africa and now we have all of the African countries and if we want to
specify to Angola we can also go down another level oops by doing Angola and
now we have what we were looking at before where we're calling all the data within
those but we couldn't do it just based off Africa because we had an additional
Index right here so once we called both indexes now we get this view but let's look
at that
14:41:41
iloc really quick when we run this let's just say one because right up here oh we
have Angola zero and then one so you think it may pull up Angola let's go ahead and
run this and it's still pulling up Albania let's go right up here if you remember
when we didn't have the multiple indexes it was pulling up Albania the difference
when you're doing these multi-indexes is that loc is able to specify this
whereas iloc does not go based off that multi-indexing it's going to go based
off the
14:42:19
initial index or the integer based index so that's a lot about indexing in pandas
we'll cover even a few more things in future videos as we get more and more into
pandas but this is a lot of what indexing looks like within pandas and again super
important to learn how to do and know how to do because it's a pretty important
building block as we go through this pandas series so I hope you enjoyed this video
on indexing if you did be sure to like and subscribe below and I will see you in
the next
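A sketch of the multi-index lookups from this video, on an assumed five-row sample with placeholder ranks. Unlike the video, the frame here is sorted and reassigned first, so the positions used by `iloc` follow the sorted order.

```python
import pandas as pd

# Illustrative sample; rank values are placeholders.
df = pd.DataFrame({
    "Continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "CCA3": ["AFG", "ALB", "DZA", "AND", "AGO"],
    "Rank": [1, 2, 3, 4, 5],
}).set_index(["Continent", "Country"]).sort_index()

# loc with the outer label returns every row under that continent.
africa = df.loc["Africa"]

# loc with a tuple goes one level deeper and returns a single row.
angola = df.loc[("Africa", "Angola")]

# iloc ignores the labels entirely and counts rows in their current order.
second_row = df.iloc[1]
```

Searching `df.loc["Angola"]` directly would raise a `KeyError`, because `loc` looks for the label in the outer level first.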
14:42:49
[Music] video hello everybody today we're going to be taking a look at the groupby
function and aggregating within pandas groupby is going to group together the values
in a column and display them all on the same row and this allows you to perform
aggregate functions on those groupings so let's start reading in our data and take
a look so we're going to do import pandas as pd and then we're going to say our
data frame is equal to and we'll say pd.read_csv we'll do an open parenthesis r
and
14:43:32
our file path and we're going to be looking at the flavors CSV right here so right
here we have our flavor of ice cream we have our base flavor whether it was
vanilla or chocolate whether I liked it or not the flavor rating texture rating and
its overall or its total rating now these are all my own personal scores so you
know I've spent years researching this so these are all very accurate but this
should be a low stress environment to learn Group by and the aggregate functions so
the
14:43:59
first thing that we can do is look at our group by now you can't Group by well you
can you can Group by flavor but as you can see these are all unique values what we
need is something that has duplicate values or or similar values on different rows
that'll group together so this base flavor is actually a perfect one to group it on
and we'll do that by saying df.groupby do an open parenthesis and we'll just
specify base flavor and this will then group together those values and I need to
make sure I
14:44:33
can spell properly this will group those flavors together so let's run this and as
you can see it actually is its own object so it is a DataFrameGroupBy
object so now that we've grouped them let's give it a variable so we'll say
group_by_frame let's say that's equal to let's copy this we'll run it and
now what we need to do is run our aggregations in order to get an output so we're
going to say mean and that's all we're going to put
14:45:07
just for now just to get an output that we can take a look at and then we'll build
from there so let's go ahead and run this and right here we have our base flavor
which is now saying is the index of chocolate or vanilla and then it's taking the
mean or the average of all the columns that have integers notice that it did not
take the liked column and it did not take the flavor column because those are
strings and they cannot aggregate those and we'll take a look at that later but it
took all the
14:45:35
values that have integers and then it gave us the average of those ratings really
quickly I wanted to give a huge shout out to the sponsor of this entire pandas
series and that is Udemy Udemy has some of the best courses at the best prices and
it is no exception when it comes to pandas courses if you want to master pandas
this is the course that I would recommend it's going to teach you just about
everything you need to know about pandas so huge shout out to Udemy for sponsoring
this pandas series and
14:45:57
let's get back to the video so right off the bat on average the ones with chocolate have
a much higher rating overall than the ones with vanilla bases now we can actually
combine all of this together into one line and we can do something like this so
we'll say df.groupby and then we'll say .mean just like this and this will actually
run it before we didn't have any aggregating function on there so it didn't run but
now that we combine it all into one it will run properly now there are a lot of
different aggregate
14:46:30
functions but I'm going to show you some of the most popular ones or the most
common ones that you will see so let's copy this right here so we can do dot count
and when we run this we can look at the count and this will show us the actual
count of the rows that were aggregated so for chocolate we have three so there
going to be three all the way across and for vanilla we had six so we're looking at
a higher count of vanilla which if you're comparing it to this mean up here that
could be a big
14:46:59
skew towards the chocolate because if you have one or two good chocolates it could
really pull the numbers up whereas if you had two good vanillas but all the other
ones were bad it pulls that average down so knowing the count of something
is really good let's take a look at the next one and we can do min and
Max and I'll just run these really quickly we can do Min and when we run this the
first thing that you should notice is that it now has a flavor and a liked column
and that's because Min and Max will actually
14:47:27
look at the first letter in the string or the first set of letters if there are um
you know chocolate something it'll look at the first and then it'll actually
populate it so chocolate with the CH chocolate is the very first or the minimum
value for that string and for a cake batter that is the minimum value in vanilla as
well now with the liked it's interesting because apparently I liked all the
chocolate ones I'm going to go take a look so chocolate I liked chocolate I liked
chocolate I liked so there is no "No" option
14:47:58
in this liked column so yes was the only option and now let's look at Max whoops
and it should do the exact opposite which is going to take the highest value even
if it's a string so Rocky Road the letter r comes later in the alphabet so that's
what it's looking at and so does vanilla and then we have yes as well and then of
course right here it's taking the max value so before when we were looking at Min I
just focused on those but it still does the exact same thing to these integer um
14:48:26
columns as well so for the max value for vanilla it was mint chocolate chip that
was our base so I had a rating of 10 for this vanilla row or grouping and then we
can also look at the sum and there are all the sums for these and again it only
does integer because we can't add the strings here are the sum or the total values
for all of them and for the total values since we had you know six rows that were
grouping into this vanilla we now have a lot of a much higher score for vanilla now
that's a
14:48:58
really simple way to do your aggregations but there is actually an aggregation
function and let's take a look at this because this is a little bit more complex
although when I write it out and show you I hope it makes a lot of sense we can do .agg
so this is our aggregate function and what we need to pass into our aggregate
function is actually a dictionary so let's do an open parenthesis and we're going
to do a squiggly bracket and then we need to specify what we're going to be
aggregating on or what column so let's
14:49:29
do this flavor rating let's copy this we'll do flavor rating and I need to put that
as a string and then we'll do a colon and now we can specify what what aggregate
functions we want so we've done sum count mean Min and Max all of those and we can
actually put all of those into here and perform all of those aggregations on just
one column so let's make a list and then let's say mean Max count and uh what's
another one sum so let's do all four of those only on this flavor rating
14:50:08
column and when we run this we have our base flavor right here chocolate and
vanilla but now we don't have multiple columns we have one column with multiple
sub-columns for our aggregations and it is possible to pass in multiple columns like
that so we'll do texture rating and we'll just come right here and do a comma then
we'll say uh uh texture rating and then a colon I don't know why I spelled it out
when I copied it but I did and then we'll do the exact same ones and now when we
run it we're
14:50:43
getting the exact same columns mean Max count and sum for flavor rating then mean
Max count and sum for our texture rating now so far we've only grouped on one
column but we can actually group on multiple columns let's go back up here to our
data and I should have just copied this down here let's go back down and just look
at this so really we only grouped it on this base flavor but you can do multiple
groupings or group by multiple columns so let's do our base flavor which we did
already as well as
14:51:16
the liked column so we're going to say df.groupby then we'll do an open
parenthesis and then instead of just passing through one string we're going to do a
list and we'll say base flavor comma and then we'll do liked so now when it
groups this it should give us two groupings and let's run this and just see oops I
need an aggregation so let's just do mean so now we have our chocolate and a vanilla and
remember chocolate only had yes so that's the only one that it's
14:51:54
going to group on but vanilla had a no and a yes so if we look at the vanilla we
have our base flavor vanilla and then within liked we have no and a yes which can
show us that within our vanilla when we group on these our NOS were really low but
our yeses were really high we actually had a pretty similar rating or very close to
the same rating as the ones we really liked in chocolate and just like we did above
we can take this .agg and I'm going to copy this and it'll perform it on each of
those rows
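The groupby steps walked through above can be sketched in one place as follows. This is a minimal sketch, not the notebook from the video: the column names (Flavor, Base Flavor, Liked, Flavor Rating, Texture Rating) are assumed from how they are read out, and every rating value is invented so that chocolate has three rows and vanilla has six.

```python
import pandas as pd

# Hypothetical stand-in for the ice cream data; every rating here is invented.
df = pd.DataFrame({
    "Flavor": ["Mint Chocolate Chip", "Chocolate", "Vanilla", "Cookie Dough",
               "Rocky Road", "Pistachio", "Cake Batter", "Neapolitan",
               "Chocolate Fudge Brownie"],
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Vanilla", "Chocolate",
                    "Vanilla", "Vanilla", "Vanilla", "Chocolate"],
    "Liked": ["Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Flavor Rating": [10, 9, 5, 9, 8, 3, 8, 2, 10],
    "Texture Rating": [8, 8, 7, 9, 10, 5, 6, 4, 9],
})

# Mean of the numeric columns per base flavor. numeric_only=True skips the
# string columns explicitly; pandas 2.x raises an error on them otherwise.
means = df.groupby("Base Flavor").mean(numeric_only=True)

# Number of rows that went into each group.
counts = df.groupby("Base Flavor").count()

# min also works on strings, where it picks the alphabetically first value.
mins = df.groupby("Base Flavor").min()

# Several aggregations on chosen columns, passed to .agg as a dictionary.
stats = df.groupby("Base Flavor").agg({
    "Flavor Rating": ["mean", "max", "count", "sum"],
    "Texture Rating": ["mean", "max", "count", "sum"],
})

# Grouping on two columns gives a two-level index (Base Flavor, Liked).
by_liked = df.groupby(["Base Flavor", "Liked"]).agg(
    {"Flavor Rating": ["mean", "max", "count", "sum"]}
)

print(means)
print(stats)
print(by_liked)
```

With this made-up data there is no (Chocolate, No) group at all, which mirrors the observation that every chocolate flavor was liked.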
14:52:26
let me close that and what did I do wrong oh I need the squiggly bracket and it'll
show us each of those so the mean Max count and sum for all of the chocolate and
vanilla as well as the groupings of liked yes and no now after we've looked at all
that and that's how I usually do it there is one uh shortcut function that can give
you some of these things just really quickly and so let's go back up here and take
this it's just called describe um and if you've ever done it it's just going to
give you some
14:52:58
high level overview of some of those different aggregations so let's run this and
it's going to give us our chocolate and vanilla and within each column it's going
to give us our count our mean our standard deviation I believe is what that is our
minimum 25% 50% 75% and our max then the same again for the next column so a lot of
those aggregate functions but the describe is you know a very generalized um
function we can't get as specific as we were with the previous ones that we were
looking at
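A short sketch of describe on a grouped frame, assuming the same invented ratings and assumed column names as a stand-in for the real file. Selecting the numeric columns first is a choice made here to keep the output predictable; it is not something the video does.

```python
import pandas as pd

# Hypothetical ratings; only the 3-chocolate / 6-vanilla split follows the video.
df = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Vanilla", "Chocolate",
                    "Vanilla", "Vanilla", "Vanilla", "Chocolate"],
    "Flavor Rating": [10, 9, 5, 9, 8, 3, 8, 2, 10],
    "Texture Rating": [8, 8, 7, 9, 10, 5, 6, 4, 9],
})

# describe() returns count, mean, std, min, quartiles and max per group,
# repeated for every selected column.
summary = df.groupby("Base Flavor")[["Flavor Rating", "Texture Rating"]].describe()

# The columns form two levels: (original column, statistic).
print(summary["Flavor Rating"])
```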
14:53:29
but I just wanted to throw this out there in case this is something that you'd be
interested in because it you know technically is showing a lot of those aggregate
functions just you know all at one time so that is our groupby and aggregate
functions within pandas I hope that that was helpful I hope that you understood you
know everything that we were working on if you like this video be sure to like And
subscribe and check out all my other videos on python as well as pandas and I will
see you in
14:53:51
the next [Music] video hello everybody today we're going to be talking about
merging joining and concatenating data frames in pandas this whole video is basically
around being able to combine two separate data frames together into one data frame
these join diagrams are really important to understand when we're actually using the merge and
the join right here we have what's called an inner join and the Shaded part is
what's going to be returned it's only the things that are in both the left and the
14:54:29
right data frames then we have an outer join or a full outer join and this will
take all the data from the left data frame and the right data frame and everything
that is similar so basically just takes everything we also have a left join which
is going to take everything from the left and then if there's anything that's
similar it'll also include that and then the exact opposite of that is the right
join which is going to give us everything from the right data frame and it's going
to give
14:54:54
us everything that is similar but it's not going to give us anything that is just
unique to the left data frame so this is just for reference because in a little bit
when we start merging these these become very important so I just wanted to kind of
show you how that works visually so let's get started by pulling in our files so
first we're going to say import pandas as pd we'll run this and then we'll say data
frame one and we'll also have a data frame two and these are the different data
frames
14:55:20
the left and the right data frame that we'll be using to join merge and concatenate
so we'll say data frame 1 is equal to pd.csv_read and we'll do r and here is our
file path so we have this LOTR.csv that's our Lord of the Rings CSV and let's call
that really quickly so we can see what's in there and I'm having a dyslexic moment
because it's supposed to be read_csv I apologize for that but this is our
data frame this is our data frame one we have three columns
14:55:54
it's their Fellowship ID 1001 1002 1003 and 1004 their first name Frodo Samwise Gandalf
and Pippin and their skills hiding gardening spells and fireworks so this is our
very first data frame that we're going to be working with let's go down a little
bit let's pull this down here and we're just going to say data Frame 2 Data Frame 2
and this is the Lord of the Rings 2 so let's pull this one in now as you can see
it's very similar we have Fellowship ID 1001 1002 1006 1007 1008 so we have three different IDs
here we
14:56:26
don't have 1006 1007 and 1008 in this upper this first data frame we also have the
first name so Frodo and Sam or Samwise are in the very first and the second data
frame but now we have three new people Boromir Elrond and Legolas and now we have this
age column which again is unique to just this second data frame really quickly I
want to give a huge shout out to the sponsor of this video and that is Zendesk I've
been using Zendesk for my company's customer analytics and it has been absolutely
14:56:53
phenomenal they're going to be hosting a conference called Zendesk Relate on May 10th
and they're going to talk all about customer analytics chat Bots and AI in this
space you can attend in person in San Francisco or you can attend virtually but
space is limited so be sure to apply if you want to attend so if you are a business
leader and you want to make most out of your customer data or you want to learn
customer data analytics I will leave links in the description again huge shout out
to
14:57:16
zendesk for sponsoring this video now the first one that I want to look at is merge
and I want to look at merge first because I think this one is the most important I
use this one more than any of the other ones that we're going to talk about today the merge
is just like the joins that we were just looking at the outer the inner the left
and the right and there's also one called cross and I'll show you that one although
if I'm being honest I don't really use that one that much but It's Worth showing
just in case
14:57:40
you come into a scenario where you do want to do that so let's go right down here
and I want to be able to see these while we do it so we're going to say data frame
one and when we specify data frame one as the very first data frame we say
df1.merge this is automatically going to be our left data frame then if we do
our parentheses right here and we say data Frame 2 this is our right data frame and
let's see what happens when we do this so what it's going to do and this we
14:58:12
didn't specify this it's just a default it's going to do an inner join so it's only
going to give us an output where specific values or the keys are the same now you
can't see this but what is happening is it's taking this Fellowship ID and
saying I have 1001 here and 1002 here this is the exact same as up here with this
Fellowship ID of 1001 and 1002 but when we look at 1003 and 1004 those
aren't in this right data frame and 1006 1007 and 1008 are not in this left data frame so the
only ones
14:58:44
that match are this 1001 and 1002 and that's why they get pulled in down here but
because we didn't explicitly say here's what I want to join or merge between these
two data frames it actually is looking at the fellowship ID and the first name so
it's taking in these unique values of Frodo and Samwise which are the same in both
which is why it pulled them over but really quickly let's just check and make sure
that we did it on the inner join because again we didn't specify anything that was
just
14:59:15
the default so we're going to say how is equal to and then we'll say inner and if we
run this it's going to be the exact same because again the inner is the default but
now just to show you how it's kind of joining these two data frames together I'm
going to say on is equal to and then I'm only going to put Fellowship ID so let's
run this now the first thing that you may have noticed is this first name
_x and this first name _y what the merge does as kind of a default is
when you
14:59:47
were only joining on a fellowship ID we have this right data frame with Fellowship
ID the left data frame with the fellowship ID if you're just joining on these and
you're not joining on the first name and the first name then it's going to separate
those into an underscore X and an underscore Y and even though they have the exact
same values since we are not merging on that column it automatically separates that
into two separate columns so we can see the values within each of those columns
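The default merge and the effect of `on` can be sketched as below. The two frames are built inline instead of being read from the CSV files; the column names (FellowshipID, FirstName, Skills, Age) are assumptions based on how they are read out, and the ages are placeholder values because the video does not state them.

```python
import pandas as pd

# Stand-ins for the two Lord of the Rings files (ages are invented).
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Boromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# No arguments: inner join on every column the two frames share,
# here FellowshipID and FirstName.
default = df1.merge(df2)

# Joining on the ID only: FirstName exists on both sides but is not a key,
# so pandas keeps both copies and appends _x (left) and _y (right).
on_id = df1.merge(df2, how="inner", on="FellowshipID")

# Listing both shared columns reproduces the default result.
on_both = df1.merge(df2, how="inner", on=["FellowshipID", "FirstName"])

print(default)
print(on_id.columns.tolist())
```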
15:00:15
if we went into this on and we make a list and let's do it like that and we say
comma and then we write first name and then we run this it's going
to look exactly like it did before again it automatically pulled in both of these
columns when it was merging the first time even though we didn't write anything
but if we actually write this it's doing exactly what it was doing when we
just had df2 we're just now writing it out now there are other arguments that we
can pass into
15:00:46
this merge function let's hit shift Tab and let's scroll down here so within this
merge function we have a lot of different arguments that you can pass into it first
we have this right which is the right data frame which is this data frame two then
we have the how and the on which we've already shown how to do there's a left_on
right_on left_index right_index not something you'll probably use that much but you
definitely can if you want to look into that and there's all these doc strings
15:01:13
which show you exactly how to use all of these so if you're interested in looking at
the left and the right and the left index it's all in here the one that is really
good is the sort and you can sort it saying either it's false or true then we have
these suffixes now if you remember when we took these out what it automatically did
was it put in these _x and _y you can customize that and you
can put in whatever you'd like instead of the _x and _y you can
put in
15:01:42
some custom string for that we also have an indicator and a validate again all
things you can go in here and look at I'm just going to show you the stuff that I
use the most so these things right here are things that I definitely use the most
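A few of the merge arguments mentioned above, sketched on stand-in frames. The column names and ages are assumptions, and how="outer" is used only so that the indicator column has unmatched rows to label.

```python
import pandas as pd

# Stand-ins for the two files (column names assumed, ages invented).
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Boromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

merged = df1.merge(
    df2,
    on="FellowshipID",
    how="outer",
    sort=True,                      # order the result by the join key
    suffixes=("_left", "_right"),   # replaces the default _x / _y
    indicator=True,                 # adds a _merge column: left_only, right_only, both
    validate="one_to_one",          # raises if the key repeats on either side
)

print(merged[["FellowshipID", "FirstName_left", "FirstName_right", "_merge"]])
```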
so now that we've looked at the inner join let's copy this right down here and
let's look at the outer join and these get a little bit more tricky I think the
inner join is probably the easiest one to understand we'll look at the outer it's
spelled o u t
15:02:11
e r I don't know why I always want to say o t t r but let's run this and see what
we get so now this looks quite different the inner join only gave us the values
that are the exact same this one is going to give us all of the values regardless
of if they are the same so we have 1001 1002 1003 1004 1006 1007 and 1008 so let's scroll back
up here so we have 1001 1002 1003 1004 then 1001 1002 and 1006 1007 and 1008 so we don't have a 1005 and then if
you notice in this data frame right here if we can't
join
15:02:49
on the fellowship ID or the first name like Legolas wasn't one that we joined on or
that has a similar value in the left data frame it just gives us a NaN which is not
a number and it's going to do that for any value where it couldn't find that join
or it couldn't match something within that either ID or first name so in age we
also have that for the ones that weren't in the right data frame we only had 1001
and 1002 so we'll have the age for both Frodo and Sam but for Gandalf and Pippin we
don't have
15:03:21
their corresponding IDs and so it's just going to be blank for Gandalf and Pippin
and you can see that right here so again outer joins are kind of the opposite of
inner joins they're going to return everything from both if there is overlapping
data it won't be duplicated now let's go on to the left join and I'm going to pull
this down right here and now we're just going to say how is equal to left and let's
run this so what this is going to do is it's going to take
15:03:50
everything from the left table or the left data frame right here so everything from
data frame one then if there is any overlap it'll also pull the overlapped or the
you know whatever we're able to merge on from data Frame 2 so let's go back up to
our data frame 1 and two so it's going to pull everything from this left data frame
cuz we're specifying we're doing a left join so everything from the left data frame
will be in there we're also going to try to bring in everything from the right but
only if
15:04:19
it matches or is able to merge so just this information right here will come
over we weren't able to join on 1006 1007 or 1008 so really none of that information
is going to come over so let's go down and check on this so again we have 1001 1002 1003 1004
all of the data with this first name and skills everything is in here but then we
are trying to bring over the age but we only have matches with 1001 and 1002 so only
these two values will come in let's look at the right join because it's basically
the
15:04:53
exact opposite let's look at the right and this is basically the exact opposite of
the left in the fact that now we're keeping everything from the right hand side and then if
there's something that matches in data frame one then we will pull that in so
this is basically just looking like data Frame 2 except we're pulling in that
skills column and since only 1001 and 1002 are the same that's why the skills values
are here now those are the main types of merges that I will use when I'm using a
data
15:05:24
frame or when I'm trying to merge a data frame but there also is one called a cross
or a cross join uh and let's look at this one and this one is quite a bit different
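The join types can be compared side by side by counting rows. This is a sketch on stand-in frames with assumed column names and invented ages; how="cross" needs pandas 1.2 or newer.

```python
import pandas as pd

# Stand-ins for the two files (column names assumed, ages invented).
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Boromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# One merge per join type, all on the shared columns (or no key for cross).
results = {how: df1.merge(df2, how=how)
           for how in ["inner", "outer", "left", "right", "cross"]}

for how, frame in results.items():
    print(how, len(frame))

# Unmatched cells are filled with NaN, e.g. Skills for right-only rows.
outer = results["outer"]
print(outer[outer["FirstName"] == "Legolas"])
```

The cross join ignores keys entirely and pairs every left row with every right row, which is why its row count is the product of the two frame lengths.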
here we go let's run this so this one is different in that it takes each row from
the left data frame and pairs it with each row in the right data frame so for
Frodo in this left data frame it looks at the Frodo in the right data frame Samwise
in the right data frame Legolas Elrond and Boromir all on the right data frame
then it goes to
15:05:56
the next value Samwise and does the exact same thing Frodo Samwise Legolas Elrond
Boromir and it does that for every single value so let's go right back up here so
it's taking this 1001 and pairing it with 1 2 3 4 5 then it's taking
Samwise and pairing it with 1 2 3 4 5 Gandalf 1 2 3 4 5 Pippin and then you kind
of see that pattern and that's what a cross join is there are very few in my
opinion reasons for a cross join although you'll if you ever do like
15:06:28
an interview where you're being interviewed on python you will sometimes be asked
on Cross joins but there aren't a lot of instances in actual work where you really
need a cross join now let's take a look at joins and joins are pretty similar
to the merge function and it can do a lot of the same thing except in my opinion
the join function isn't as easily understood as the merge function it's a little
bit more complicated um but let's take a look and see how we can join together
these data frames using
15:07:00
the join function so let's go right up here we're going to say data frame one do
join and then we'll do data frame two very similar to how we did it before and
let's try running this and it's not going to work um when we did the merge function
it had a lot of defaults for us let's go down and see what this error is it says
the columns overlap but no suffix was specified so it's telling us that it's trying
to use the fellowship ID and the first name just like the merge
15:07:28
did except it's not able to distinguish which is which and so we need to go in
there and kind of help it out a little bit again a little bit more Hands-On than
the merge but let's see what we can do to make this work let's do comma and we'll
say on and let's really quickly let's open this up and kind of see what we have so
this one has less options than the merge does we have other and that's our other
data frame we can do on and we're going to specify you know what
15:07:56
column do we want to join on and then we can look at how do we want it to be a left
an inner an outer the same kind of types of joins as the merge then we have that
lsuffix and rsuffix and that right here is kind of part of the issue that we
were just facing which is that those columns are the same but if we say lsuffix it'll
add whatever string we want to specify for columns that are
both in the left and the right we can give it a unique name so we'll no longer have
that issue
15:08:25
and then we can also sort it like we did on the other one but anyways let's go back
to our on we'll say on is equal to and then we'll say Fellowship ID let's try
running this and we're still getting an error it's just not as simple as the merge
so let's keep going so now let's specify the type so we'll say how is equal to and
we'll do an outer and if we run this it still doesn't work we're still getting the
exact same issue as the left suffix and the right suffix so now let's finally
15:08:55
resolve it I just wanted to show you how a little bit more frustrating it was but
now let's say lsuffix is equal to and when we did the merge it automatically
did an _x but we can do let's do _left and then we can
do a comma we'll do rsuffix and we'll say that's equal to and we'll do _right
now when we run this it should work properly let's run this so this is our
output and obviously looks quite a bit different over here we have this Fellowship
ID then we also
15:09:31
have Fellowship ID_left first name_left Fellowship ID_right and first name_right so
it just doesn't look right now something I didn't specify when I first
started this cuz I kind of wanted to show you is that the join usually is better
for when you're working with indexes before when we were using the merge we were
using the column names and that worked really well and it was pretty easy to do but
as you can see right here when we're trying to use these column names it's not
working
15:09:59
exceptionally well let's go ahead and create our index and then I can show you how
this actually works and how it works a little bit better when we're working with
just the index although you can get to work just the same as the merge it's just a
lot more work so let's go right down here and let's go and say df4 so we'll create
a new data frame we'll say df1.set_index and we'll do an open parenthesis and
we'll say we want to do this index on the fellowship ID and then we're going to do
15:10:31
the join so now we're going to say .join so we're setting
that index on the fellowship ID now we're going to join it on df2.set_index
and then we're also going to do that on the fellowship ID and I'll just copy
this oh geez I hate it when I do that okay now we also want to specify the
left and the right suffix so I'll just copy this as we do need to specify this now
let's try running the data frame 4 so really quick just to recap we
15:11:09
were setting the indexes we were doing the same thing above right we have this join
we were joining data frame one with data Frame 2 now we're joining data frame 1
with data frame two except in both instances we're setting the index as Fellowship
ID so we're joining now on that index so now let's run this and this should look a
lot more similar to the merge than the join that we did above except now the
fellowship ID right here is actually an index so it's just a little bit different
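The index-based join described above can be sketched as follows, again on stand-in frames with assumed column names and invented ages. The name df4 follows the video; the other variable names are chosen here for illustration.

```python
import pandas as pd

# Stand-ins for the two files (column names assumed, ages invented).
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Boromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# A bare join fails because both frames share column names.
try:
    df1.join(df2)
except ValueError as err:
    overlap_error = str(err)
    print(overlap_error)

# join aligns on the index, so making FellowshipID the index on both sides
# gives a result close to the merge. FirstName still overlaps, hence the
# suffixes. The default for join is a left join.
df4 = df1.set_index("FellowshipID").join(
    df2.set_index("FellowshipID"), lsuffix="_left", rsuffix="_right"
)

# Other join types are selected with how=, as with merge.
df4_outer = df1.set_index("FellowshipID").join(
    df2.set_index("FellowshipID"), lsuffix="_left", rsuffix="_right", how="outer"
)

print(df4)
```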
but we can still go
15:11:39
in here and do how is equal to outer so we can still specify
our different types of joins or the different way that we can merge or join these
data frames together we can still specify that again it's just a little bit
different and that's why for most instances I'm using that merge function because
it's just a little bit more seamless little bit more intuitive the join function
can still get the job done but as you can see it takes a little bit more work now
let's
15:12:07
look at concatenate concatenating data frames can be really useful and the
distinction between a merge and join versus the concatenate is that the concatenate
is kind of like putting one data frame on top of the other rather than putting one
data frame next to one another which is like the merge and the join so
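The difference between stacking and placing frames side by side can be sketched with pd.concat, which takes the frames as a list. The frames below are stand-ins with assumed column names and invented ages.

```python
import pandas as pd

# Stand-ins for the two files (column names assumed, ages invented).
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Boromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

# Default: stack df2 under df1. Columns missing on one side become NaN and
# rows are never matched up, so 1001 and 1002 appear twice.
stacked = pd.concat([df1, df2])

# join="inner" keeps only the columns present in both frames.
shared_cols = pd.concat([df1, df2], join="inner")

# axis=1 places the frames side by side, aligned on the row index (0, 1, 2, ...).
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```

Passing ignore_index=True to the stacked version would renumber the rows instead of repeating each frame's original index.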
concatenating them is just a little bit different in how it'll operate but let's
actually write this out and see how this looks let's go up here and we'll say pd.
concat we'll do
15:12:35
an open parenthesis and then inside a list we're going to concatenate data frame 1 comma data
Frame 2 that's all we have to write and let's run this and so just like I said it
literally took the first data frame 1001 to 1004 and put it on top of the second data
frame 1001 1002 1006 1007 1008 so that is our left data frame this is our right data frame and
they're literally just sitting one on top of the other but just like when we merged
either with a left or a right when you have these skills and there aren't any
values that populate for them
15:13:06
it is going to say not a number and since we're not actually joining we're not
joining on 1001 and 1002 even though this one and this one are the same rows it's not
populating that value because again we're not joining these together we're just
concatenating and putting one on top of the other now if we go into this concat we
hit shift tab there are a lot of different things that we can do like the axis and if you
remember axis zero is the left-hand index and axis one is the top index
which is the columns so
15:13:36
you can specify that and we can also do joins and this is the one that I'm going
to take a look at but there are other ones that you can um look into as well let's
look at join let's do comma and we'll say join is equal to and let's do an inner
join so let's see what happens with this as you can see it is only taking the
columns that are the same that's what this inner is doing it's joining these columns
together and the ones that were different they didn't take because again we weren't
able to
15:14:05
combine them they aren't similar between both frames Let's do an outer and now it's
going to take all of them and like I said that's doing this on these columns right
here but we can also do it on this axis as well so let's go ahead and say axis is
equal to one and when we run this now it's joining on this index right here of 0
1 2 3 4 so now these ones are being joined together and it's putting it side by
side much like a merge would so that's how concatenate works and I'm going to show
you one more
15:14:37
thing and again it's not up here in this you know title because it's not one that I
recommend but there is one called append the append function is used to append rows from
one data frame to the end of another data frame and then we can return that new
data frame and so let's do df1.append we'll do an open parenthesis and
we'll say data Frame 2 very similar to how we've been doing other things and let's
run this and as you can see this is almost exactly like how the concatenate did
when we first
15:15:04
did it but if we read this warning it's saying the frame.append method is
deprecated and will be removed from pandas in a future version use pandas.concat
instead so it's literally warning us you know append is on its way out if
you want to do exactly what you're doing right here go and try concat or
concatenate because that'll do the exact same thing so I'm not really going to show
you any other variations of append because there's no reason it's going to be on
its way out in the next
15:15:32
version so that is our video on merge join and concatenate and append as well in
pandas and I hope that that was helpful I hope that you learned something I
mean this stuff is really important because often times you're not just working
with one CSV or one Json or one text file you're working with multiple of them and
you need to combine them all into one data frame and so this is a really really
important concept and thing to understand with that being said be sure to like And
subscribe check out
15:15:57
all my other videos on Python and pandas and I will see you in the next [Music]
video [Music] hello everybody today we're going to be building visualizations in
pandas in this video we'll look at how we can build visualizations like line plots
Scatter Plots bar charts histograms and more I'll also show you some of the ways
that you can customize these visualizations to make them just a little bit better
with that being said let's go right over here start importing our libraries and
we'll start with
15:16:33
importing pandas as pd and this one is really all you need to actually create the
visualizations in pandas but we may get a little bit crazy and so we're going to
do a few different ones as well like import numpy as np and then we're going to do
import matplotlib.pyplot as plt now I may or may not use this I just you know
when I get into visualizations I may want to change some different things so we're
going to at least have them here in case we do want to use them let's go ahead and
run this
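The imports just described, plus the first plot the video builds toward, can be sketched as below. The data is a randomly generated stand-in with values between 0 and 1; the column names and dates are assumptions, not the contents of the real CSV.

```python
import matplotlib
matplotlib.use("Agg")  # optional: off-screen backend so the sketch runs without a display

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical ice cream ratings: ten days, three random rating columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-01", periods=10, freq="D"),
    "Flavor Rating": rng.random(10),
    "Texture Rating": rng.random(10),
    "Overall Rating": rng.random(10),
})

# Use the date column as the index so it becomes the x-axis.
df = df.set_index("Date")

# kind="line" is the default; every numeric column gets its own line and a legend entry.
ax = df.plot(kind="line")
plt.close("all")
```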
15:17:08
so now let's get our data set that we're going to be using so let's say data frame
is equal to pd.read_csv and let's get this in right here now we're going to be
doing these ice cream ratings let's take a look at this really quickly now these
values are completely randomly generated they are not real in any way um but that's
what we're going to be using cuz I just wanted something kind of generic something
that wouldn't be too crazy confusing just something
15:17:36
that we could use and you guys can understand that they're just numerical values
but let's also set that index really quick so we'll say df.set_index and
then we'll say date and then we'll assign that back to the data frame and we have
this date column right here as our index so we have uh January 1st 2nd 3rd 4th and
then we have our ratings right here and again these are all just integers and
they're pretty easy or are really easy to demonstrate how you can visualize these
so that's
15:18:05
why we're using it today so the way that we visualize something in pandas is we use
something called plot so let's just take our data frame we'll do df.plot
and we'll do our parentheses now let's go in here really quickly let's hit Shift
Tab and this is going to come up and this is pretty important because this kind of
is going to tell us what we can do within this plot and unfortunately there isn't
like a quick overview we just have this docstring but we have our parameters right
here
15:18:33
these are what we can pass in to kind of customize our visualization so the data is
going to be our data frame then we have our x and y labels we can specify the kind
and this one's important because you can specify what kind of visualization you
want we can do a line plot a vertical or horizontal bar plot histogram box plot and
then a few others including area pie density all these other things we can also
specify if we want subplots and a lot of these things that I'm specifying
you
15:19:04
know I'm going to show you how to do you can use a different index you can add
titles add grids legends styles all these different things I mean you can go
through here because there are a lot but you can specify and you know customize all
of these things we won't be going into all of them but I will show you some of the
ones that I probably use the most and that I think are the most useful to know
right away so let's get out of here and we're just going to do df.plot and when
we run this we'll
15:19:31
get this right here and that was super super easy we created a line plot by literally
doing just about nothing but by default it's going to give us a line
plot so if we come up here and we say kind and let me get that out of the way is equal
to line and we run this we get the same plot so by default without us actually having to input anything
it's giving us that line plot so we can specify it's a line plot but we don't have to as
you can see we already have all of our data right here we didn't have to specify
anything
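As a rough sketch of the steps so far, the snippet below builds a small stand-in for the ice cream ratings file, sets the date as the index, and draws the default plot. The numbers, dates, and column names are made up for illustration, since the real CSV is not reproduced here, and it assumes pandas and matplotlib are installed.

```python
import pandas as pd
import matplotlib.pyplot as plt  # pandas draws its plots through matplotlib

# Made-up ratings between 0 and 1; the column names are assumptions.
df = pd.DataFrame(
    {
        "Date": pd.date_range("2022-01-01", periods=5),
        "Flavor Rating": [0.2, 0.5, 0.9, 0.4, 0.7],
        "Texture Rating": [0.7, 0.3, 0.5, 0.8, 0.2],
        "Overall Rating": [0.4, 0.4, 0.7, 0.6, 0.5],
    }
)
df = df.set_index("Date")        # the date column becomes the index (x axis)

ax_default = df.plot()           # no arguments: one line per numeric column
ax_line = df.plot(kind="line")   # the same chart with the kind spelled out
```

Both calls should draw the same chart, because `kind` defaults to `"line"`.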
15:20:02
it kind of automatically took it in it is visualizing all three of these columns
and it has this little legend right here and we can specify where we want that
there is an argument to be able to do that it also gave us these tick marks of 0.2
0.4 0.6 0.8 1.0 again it read the data in and saw it's only going from 0.0 to 1.0 that is kind of
the peak and so it kind of automatically gave us these ticks again that's
another thing that you can specify we can make it go up to 2 5 10 1,000 whatever you
want it to be and
15:20:36
then we're doing this based off of this date value right here really quickly I
wanted to give a huge shout out to the sponsor of this entire pandas series and that
is Udemy Udemy has some of the best courses at the best prices and it is no exception
when it comes to pandas courses if you want to master pandas this is the course
that I would recommend it's going to teach you just about everything you need to
know about pandas so huge shout out to Udemy for sponsoring this pandas series and
let's
15:20:58
get back to the video if we wanted to break these out by the actual column we could
go in here and say subplot is equal to True and it's actually subplots whoops and
now we can run that and then we can see each of those columns being broken out by
themselves instead of them all being in one visualization it's now three
separate visualizations now let's go right over here we're going to get rid of the
subplots I want to show you just some of the different arguments that you can use
to make this look nice
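A minimal sketch of the subplots option, again on made-up data with assumed column names: with `subplots=True` pandas returns one axes object per column instead of a single shared chart.

```python
import pandas as pd

# Made-up ratings; column names are assumptions.
df = pd.DataFrame(
    {
        "Flavor Rating": [0.2, 0.5, 0.9, 0.4, 0.7],
        "Texture Rating": [0.7, 0.3, 0.5, 0.8, 0.2],
        "Overall Rating": [0.4, 0.4, 0.7, 0.6, 0.5],
    },
    index=pd.date_range("2022-01-01", periods=5, name="Date"),
)

# One separate line chart per column, stacked vertically in the same figure.
axes = df.plot(subplots=True)
```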
15:21:28
uh because I don't want to do this on every single visualization I just want to
show you what you can do so we have this one right here we can add a title notice
there's no title or anything really telling us what that is so we can say comma
title and we'll say ice cream ratings if we run this we now have this nice title
right here now we can also customize the labels or the titles for the x and y axis
it automatically took this date which is right here this is our date index it
automatically took
15:22:00
that for us but we can customize that if we'd like to all we have to do is comma
and then we'll say xlabel is equal to and so our x is this date one right here and
we can say daily ratings and then we can do the y label we'll say ylabel is equal to
and for this one we can say scores hope you cannot hear my dog in the background because
they're being insane but let's go ahead and run this and now we have these daily
ratings on the x axis and on the y axis we have scores now let's go right down here
and start
15:22:33
taking a look at our next kind of visualization which is going to be a bar plot so
we'll do df.plot we'll do kind is equal to and for this one we're going to say
bar now this is what your typical bar plot will look like and a lot of the
arguments that we just did on the line plot you can also apply to this bar plot
something that's unique to the bar plot is that you can also make it a stacked bar
plot all we have to do is go in here we'll say comma and we'll say stacked is equal
to True so now this is
15:23:04
going to make it a stacked bar chart instead of just you know your regular bar chart
let's go ahead and run this and as you can see this is now stacked on top of one
another with each of these columns all representing the values that they have now
we don't always have to do every single column we can also specify the column that
we want so let's take the flavor rating for example we could select the Flavor
Rating column and then it's only going to take in that flavor
rating column and
15:23:35
if you notice we don't have a legend that's only when you have multiple values
which we are only looking at this one column so all the values are right here now
in this bar chart it automatically defaults to a vertical bar chart but you can
change it to a horizontal bar chart let's go ahead and take a look at how to do
that bring back all of them we'll do df.plot. and then we'll say barh and I
don't know if I can keep that kind equals bar let me run this yeah I need to get
rid of that because
15:24:04
barh is its own function so now I'm going to run this it
should just have a stacked bar chart except now it should be horizontal so now you
can see this worked properly it's basically the exact same thing as a vertical bar
chart just now horizontal which may look better especially depending on if you have
values like this or you know something else that just looks better being horizontal
now the next one that we're going to take a look at is the scatter plot so we're
15:24:34
going to say df.plot.scatter and if we run this we're going to get an
error what we need in order to run this properly is we need to specify the x and
the y axis in order for this scatter plot to work so let's go here and we'll say x
is equal to and we can take any of our columns that we have up here so we'll say x
is equal to Texture Rating and then y is equal to and we'll do Overall Rating now
when we run this it should work properly let's go ahead and take a look
15:25:11
now if we go in here and we do shift tab we can also see some other things that we
can specify so let's go right down here so we have our X and we have our Y and
those are the ones that we just did we can also pass through an S which is going to
tell us or or change the size of the actual dots right here in our scatter plot
then we can also do a c which is the color of each point let's start with the S
let's say s is equal to let's just do 100 let's see what that looks like so we
15:25:42
have a much larger number let's do 500 and see what that looks like so we can make
these much larger on our visualization depending on what you're looking for we can
also look at the color let's put comma c so for color we can say color is equal to
and let's do yellow let's see if this works so now we've changed it to yellow
that looks absolutely terrible but it does work now let's move on to the histogram
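A hedged sketch of the scatter plot steps, on made-up data with assumed column names: calling `scatter` with no arguments fails because `x` and `y` are required, and `s` and `c` then control the marker size and color.

```python
import pandas as pd

# Made-up ratings; column names are assumptions.
df = pd.DataFrame(
    {
        "Flavor Rating": [0.2, 0.5, 0.9, 0.4, 0.7],
        "Texture Rating": [0.7, 0.3, 0.5, 0.8, 0.2],
        "Overall Rating": [0.4, 0.4, 0.7, 0.6, 0.5],
    }
)

# Without x and y the call raises, which is the error seen in the video.
try:
    df.plot.scatter()
except TypeError as err:
    missing_xy_error = err

# x and y name the columns to compare; s is the dot size, c the dot color.
ax = df.plot.scatter(x="Texture Rating", y="Overall Rating", s=100, c="yellow")
```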
histogram is always a good one it's very similar to something like a bar chart but
what's
15:26:14
great about a histogram is you can specify the bins so let's go ahead and say
df.plot.hist then we'll do an open parenthesis and let's go ahead and hit Shift Tab
in here take a look at this one as well so some of our parameters are the actual
columns of the data frame that we want to pull in and you can choose the bins
and they have a default of 10 in here and so let's take a look at how this works so
we'll just run this as it is so this is by default what this histogram is going to
look
15:26:48
like let's go ahead and specify our bins we'll just say it was 10 by default let's
just do 20 see what that looks like so there are smaller columns right off the bat
and remember histograms are really good for showing distribution of variables you
know that's really what a histogram is for but of course since these are completely
random numbers this histogram isn't going to make any sense at all but you can at
least kind of see visually how it works and if I didn't mention it before which I
should have
15:27:16
the bins represent how many bars or intervals are down here so if we just do one
there's only going to be one very large you know bar we could even go further down
from 10 and do five so now there's only one two three four five so the distribution gets
coarser and things get more compact and if you spread it out again like if we did 100
it's going to spread it out a lot and this is what it shows you know it's
showing the distribution of the values across however many bins you want so the 10 by
default you know it usually is pretty
15:27:54
good for a lot of different things now let's go down here and look at the box plot
and the box plot is a pretty interesting one let's go ahead and visualize it really
quickly and then I'll kind of explain how this one works let's do df.boxplot let's
run this and really what we're looking at is some different markers within our data
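A sketch of the box plot call next to the numbers it summarizes, using made-up data and assumed column names. One caveat: with matplotlib's default settings the whiskers reach the most extreme points within 1.5 times the interquartile range, so they match the true minimum and maximum only when a column has no outliers.

```python
import pandas as pd

# Made-up ratings; column names are assumptions.
df = pd.DataFrame(
    {
        "Flavor Rating": [0.2, 0.5, 0.9, 0.4, 0.7],
        "Texture Rating": [0.7, 0.3, 0.5, 0.8, 0.2],
        "Overall Rating": [0.4, 0.4, 0.7, 0.6, 0.5],
    }
)

ax = df.boxplot()   # one box per column; df.plot.box() draws the same kind of chart

# The box edges and middle line correspond to these percentiles of each column.
quartiles = df.quantile([0.25, 0.5, 0.75])
extremes = df.agg(["min", "max"])
```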
this line right here is the minimum value within that column we also have the
bottom of the box which is the 25th percentile of all the values within just
15:28:22
this column this is the 50th percentile then we have the 75th and then up here we have our maximum value
so I can take a glance at this and see that we have a low minimum a high maximum
and it definitely skews towards the lower range whereas if I look over here we have
a lower minimum and a higher maximum and you can see that this median point is at
0.6 versus 0.4 over here so the skew is a lot higher now let's go down here and take a
look at an area plot we'll do df.plot.area and let's just run this this is what
we're
15:28:56
going to get by default now something I wanted to show you earlier I just haven't
gotten around to I want to show you something called figure size or figsize so
for this you know it just looks small looks a little bit cramped let's say
we want to increase the size of this and we'll say figsize is equal
to and let's just do a parentheses and say 10 comma 5 that should be pretty large
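A minimal sketch of the area plot with a larger figure, on made-up data with assumed column names. `figsize` takes a (width, height) tuple in inches and works on the other plot kinds as well.

```python
import pandas as pd

# Made-up ratings; column names are assumptions.
df = pd.DataFrame(
    {
        "Flavor Rating": [0.2, 0.5, 0.9, 0.4, 0.7],
        "Texture Rating": [0.7, 0.3, 0.5, 0.8, 0.2],
        "Overall Rating": [0.4, 0.4, 0.7, 0.6, 0.5],
    },
    index=pd.date_range("2022-01-01", periods=5, name="Date"),
)

# Area chart drawn on a 10 by 5 inch figure instead of the default size.
ax = df.plot.area(figsize=(10, 5))
```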
this is going to make it a lot larger just something I wanted to throw in there I
look at these
15:29:24
area charts as pretty similar to like a line chart if we went and compared those they'd be
pretty similar but they're different visually and you know you absolutely can
use these for different types of visualizations but I don't use this one a lot if
I'm being honest that's why it's kind of towards the end of the video but you
definitely can do it let's go on to our very last one of the video that's going to
be the beautiful pie chart let's say df.plot.pie do an open parenthesis and let's
run it
15:29:53
we're going to get this error that's because we need to specify what column we're
working with here so let's just say the y and that's what we need let me open this
up for us right here we have our y and this is our label or column that we're
going to plot that's really all we need so we can just say y is equal to Flavor
Rating let's run this and now we get this visualization right
here let's make this one a little bit bigger figsize is equal to 10 comma 6
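The pie chart steps can be sketched like this, with made-up data and assumed column names. On a data frame, `pie` needs a `y` column (or `subplots=True`), which is why the bare call fails.

```python
import pandas as pd

# Made-up ratings; column names are assumptions.
df = pd.DataFrame(
    {
        "Flavor Rating": [0.2, 0.5, 0.9, 0.4, 0.7],
        "Texture Rating": [0.7, 0.3, 0.5, 0.8, 0.2],
    },
    index=pd.date_range("2022-01-01", periods=5, name="Date"),
)

# No column chosen: pandas raises instead of guessing which one to plot.
try:
    df.plot.pie()
except ValueError as err:
    no_column_error = err

# One wedge per row of the chosen column, on a larger figure.
ax = df.plot.pie(y="Flavor Rating", figsize=(10, 6))
```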
15:30:31
now it's a little bit bigger it definitely depends so this legend is going to
auto-populate you know you can make this as big as you want and obviously it's going to
look a little bit better if you do it larger and these colors auto-populate now
you can customize these colors although I found that when you have
a lot of them it's harder to customize them as easily but you know definitely look
into it almost everything in here is something that you can
customize in some way
15:30:56
although it does get a little bit tricky you definitely have to do some research
and some Googling around just to kind of figure out how to do those things now one
last thing that I wanted to show and something you know I could have probably done
at the beginning is you can actually change what style these visuals use and we can do
that pretty easily within matplotlib there are different styles and so let's go
right here let's add a new row a new cell and we'll say print and we'll do plt so
that's that matplotlib right
15:31:27
here we'll do plt.style.available and what this is going to do whoops what this
is going to do is show us all these different types of stylings that you
can use to kind of change up this visualization then once we find the one that we
like we'll just do plt.style.use and then in the parentheses we'll just specify
which one we want now there's all these seaborn ones and seaborn is a really great
library let's try seaborn-deep I haven't tried this one at all
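A cautious sketch of switching styles. The seaborn style names depend on the matplotlib version (older releases list `seaborn-deep`, while releases from 3.6 onward list `seaborn-v0_8-deep`), so this version looks the name up in the available list rather than hard-coding it. The fallback to `"default"` is my own choice for the sketch, not something from the video.

```python
import matplotlib.pyplot as plt

print(plt.style.available)   # every style name this installation knows about

# Find the seaborn "deep" style under whichever name this version uses.
deep_styles = [
    name for name in plt.style.available
    if name.startswith("seaborn") and name.endswith("deep")
]
chosen = deep_styles[0] if deep_styles else "default"

plt.style.use(chosen)        # applies to every plot drawn after this call
```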
15:32:02
let's go ahead and try this and it just changes some of the colors some of the visuals
we can try something like fivethirtyeight let's try this that looks quite a bit different and
let's try something like classic I don't know what this one looks like let's
just try it so you can try out all these different styles find one that you like
find one that you think looks really nice and you can run with it through all your
visualizations so this has been our video on visualizing data in pandas I
15:32:33
think it's a really good introduction on how you can visualize data within
python and in future videos we'll look at matplotlib and seaborn which are some
really great libraries for visualizing data which I use a lot so I hope that you
enjoyed this video if you did be sure to check out all my other videos on Python
and pandas and I will see you in the next [Music] video hello everybody today we're
going to be cleaning data using pandas now there are literally hundreds of ways
that you can clean data within pandas
15:33:10
but I'm going to show you some of the ones that I use a lot and ones that I think
are really good to know when you are cleaning your data sets so we're going to
start by saying import pandas as pd and we're going to run that and now we're going
to import our file so we're going to say df is equal to pd that's pandas dot
read underscore and we actually have this in an Excel file so we'll say
read_excel do an open parenthesis and we'll do r and
15:33:38
then we'll paste the path right here and now we're just going to call that variable
so we'll call data frame and we'll actually read it in and look at the data so
let's scroll down here and let's take a look at this data frame or this Excel file
that we're reading in so right off the bat we have this customer ID that goes from
1001 all the way down to 1020 we have this first name and everything looks pretty
good here except in this last name column looks like we have some errors we have
some forward
15:34:07
slashes some dots some null values so definitely going to have to clean that up
because we don't want that in the data we have a phone number and it looks like we
have a lot of different formats as well as NaNs not a number just lots of
different stuff so we're going to need to standardize that so clean it up and then
standardize it to where it all looks the same we also have address and it looks
like on some of these we just have a street address but on some of the other ones
we have like a
15:34:39
street address and another location as well as a zip code in some of them so we'll
probably want to split those out we have a paying customer which is yeses and nos
and some of those are not the same so we have to standardize that we have a do not
contact kind of the same thing as the paying customer and we have this not useful
column which we'll probably just want to get rid of okay so the scenario is that
we got handed this list of names and we need to clean it up and hand it off to the
people who
15:35:07
are actually going to make these calls to this customer list so they want all the
data in here standardized and cleaned so that the people who are making those calls
can just make those calls as quickly as possible but they also don't want columns
and rows that aren't useful to them so things like this not useful column we're
probably going to get rid of and then ones that say do not contact if it says yes
we should not contact them we probably will want to get rid of those somehow so
15:35:34
that's a lot of what we're going to be doing to clean this data set normally the
very first thing that I do when I'm working with a data set most of the time except
very rare cases when you're actually supposed to have duplicates is I actually go
and drop the duplicates from the data set completely all you have to do for that is
say df.drop_duplicates so they make it super easy for you let's just run it
and up here is our original data set we have this 19 and 20 and those are obviously
15:36:04
duplicates they have the exact same data it's just a duplicate row that we need to
get rid of if we look right down here we no longer have that 20 we now just have
one row of Anakin Skywalker and of course we want to save that so we're just going
to say df is equal to df.drop_duplicates so now it's going to save that to the df
variable again and now when we run this our data frame now does not have any
duplicates that's definitely one of the easier steps that we're going to look at uh
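The duplicate removal can be sketched on a few made-up rows. The column names and values here are assumptions standing in for the Excel file, which is not reproduced in this transcript.

```python
import pandas as pd

# Made-up tail of the call list; the last two rows are exact duplicates.
df = pd.DataFrame(
    {
        "CustomerID": [1018, 1019, 1020, 1020],
        "First_Name": ["Creed", "Toby", "Anakin", "Anakin"],
        "Last_Name": ["Bratton", "Flenderson_", "Skywalker", "Skywalker"],
    }
)

preview = df.drop_duplicates()   # returns a new frame; df itself is unchanged
rows_before = len(df)

df = df.drop_duplicates()        # assign back so the result is kept
```

Calling `drop_duplicates` without assigning the result only shows the cleaned frame; the assignment is what saves it.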
things are going to get
15:36:35
quite a bit more complicated as we go but I'm starting out you know kind of simple
so that we can kind of get a feel for it and then we'll start getting into the
really tough stuff so the next thing that I want to do is remove any columns that
we don't need I don't want to clean data that we're not going to use so if we're
just looking through here you know they may need you know first name last name
phone number for sure address might give them some information of where
15:36:59
they're calling to or time zone so we want that this not useful column looks like a
pretty good candidate to delete and it's very easy to do that we're going to go
right down here and we're going to say df.drop and we'll do an open parenthesis
drop just means we are dropping that column and we can specify that by saying
columns is equal to and then we'll paste in that column that we want to delete so
let's run this and see what it looks like and it literally just drops that column
exactly like we were
15:37:31
talking about it no longer has that column again we want to save that we can always
do inplace equals True if you follow this tutorial series you know you can always do
inplace equals True and that'll save it as well but just for our workflow most of
the time I'm going to assign it back to that variable just for keeping it the
same
15:37:57
now let's kind of go column by column and
see what we need to fix and we'll start on this left-hand side this customer ID to
me looks perfectly fine I'm not going to mess with it at all the first name at a
15:38:21
glance also looks perfectly fine I don't see anything wrong with it visually which
is a good thing um although sometimes that can be deceiving and that can cause
errors down the line but we're not going to uh assume that there are errors in here
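Going back to the column drop from a moment ago, here is a small sketch of both ways of saving it. The column name `Not_Useful_Column` and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; the column names are assumptions.
df = pd.DataFrame(
    {
        "CustomerID": [1001, 1002],
        "Last_Name": ["Castle", "/White"],
        "Not_Useful_Column": [True, False],
    }
)
other = df.copy()

# Option 1: assign the result back to the variable.
df = df.drop(columns="Not_Useful_Column")

# Option 2: modify the frame in place; the call itself then returns None.
returned = other.drop(columns="Not_Useful_Column", inplace=True)
```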
now let's look at this last name now in the last name I'm seeing some
obvious things things that we talked about when we were first looking at this data
set we have this forward slash which we definitely need to get rid of we have null
values
15:38:47
so not a number right here we have some periods as well as an underscore right here
so all those things I think we should clean up and get rid of so that when the
person is making these calls you know it's all cleaned up for them so how are we
going to do that we can actually do this in several different ways but let's just
copy this last name the first one I'm going to show you is strip and we'll write it
kind of like this we'll say data frame and then we'll specify the column that
15:39:13
we're working with because we don't want to make these changes or strip all of
these values from everywhere we only want to do it on just this column if we do
this and we don't specify the column name it will apply to everywhere so if we're
trying to strip let's say these underscores maybe that would mess with
something else in another column and we don't want that so we just want to specify
just this last name so let's go last name .str.strip now what strip does and
15:39:44
let's see if we can open this up really quickly no we can't but what strip does
and I was hitting Shift Tab in here to see if it could bring up you know
some of the notes on it but what strip does is it takes either the left side or the
right side well lstrip takes from the left side rstrip takes from the right
side and strip takes from both but you can strip values off the left and the right
hand side and we can specify those values now for what we're doing in this column
we can just use
15:40:12
strip because as you can see this forward slash these dots as well as this
underscore are all on the far sides if there was a value like Swan_son the
strip wouldn't work at all because it's not on the outside of the value or the word
so we can use strip I'll also show you how to use replace and replace is another
really good option for things like this but let's start with strip and just see
what it looks like and see if we can get what we need done so let's just run this
for now see what happens
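A sketch of how the three strip methods behave, on made-up last names. As the pandas documentation describes it, the argument is treated as a set of characters to remove from the ends of each value, not as a pattern, and characters in the middle of a value are left alone.

```python
import pandas as pd

# Made-up last names with the kinds of stray characters described above.
last_name = pd.Series(
    ["/White", "...Potter", "Weasley_", "Swan_son", " Flenderson ", None]
)

only_spaces = last_name.str.strip()        # default: removes whitespace only
left_clean = last_name.str.lstrip("./")    # dots and slashes from the left side
right_clean = last_name.str.rstrip("_")    # underscores from the right side
both_clean = last_name.str.strip("._/")    # any of these characters, both sides
```

The underscore inside `Swan_son` survives every call because it is not at either end of the value.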
15:40:46
so it looks like nothing has changed because again we're not specifying any
specific value just by default it's only taking out white space so like spaces that
shouldn't be there that's what it does by default now we can specify within this
exactly what values we want to take out so let's go ahead and do that let's say
lstrip and let's try to take out these dots real quick so we're just going to
do a parenthesis dot dot dot now let's run this and see what
15:41:13
it looks like for this one Potter they are now gone so those three dots were there
before let's just show it so they were there and then when I ran it like this now
they're gone that's what lstrip does it takes it only off the left hand side
now we can also do a forward slash so we'll do something like this and it'll get
rid of the slash on White but as you can see now we aren't taking out these three dots so
they're still there now is it possible to do something like this where we put these
values inside of a
15:41:47
list let's try it so we'll say just like this one two three let's run it and no it
doesn't this lstrip actually sits within the realm of regular expressions so
if you've ever worked with regular expressions you know it gets very complicated
very complex so you want to keep it kind of simple especially with these values
where we're just taking a few out so what we're going to do is we're going to do
dot dot dot and we're going to take them out one by one now in order to
15:42:15
save this because we want to save this we want to take out that value we don't just
want to say df equals because that would be very bad what this would say
is now this data frame is only equal to these values that we're seeing right here
we want to only apply it to this column so we're going to go like this so now when
we do it and then we call the entire data frame it's only applying this to this one
column the last name column so let's run it and now when we go down to Potter
15:42:48
right here it's cleaned up so we're going to do the same thing but for those other
values and we'll do it just like this we'll do a forward slash and it's an
lstrip and then I'll do the lstrip on this underscore too just to show
you that it won't work and then we will go on from there so it's not pulling it
because we're looking at the left hand side only we need to use rstrip so now
let's use rstrip and now that looks perfect it has no underscore so that's how you
can use
15:43:21
strip for either the left side the right side or just strip by itself which covers
both sides now I showed you all of that because I am going to show you a different
way to do it um and I apologize because I somewhat lied to you earlier um let's run
this right here actually we're just going to pull it in like this we're going to
remove the duplicates again bear with me we're going to drop that column and then
now we're sitting with that data frame again with those exact same mistakes I just
15:43:49
wanted to reset it for a second there is a way uh that you can do this and I just
wanted to you know kind of show you how you can do it you can do this right here
and we'll say so again we're just looking at this column just this column
and we're using strip and let's get rid of the r because we want to apply it to
both sides you can input all of those values in together and it will clean it up so
let's say we want to get rid of numbers we'll do one two three then we can do the
dot so that's going
15:44:20
to be for a period or for a dot dot dot Potter we could also do the underscore and
we can do the forward slash so we put it all in one string right here now let's
take a look at this we'll get rid of this really quickly now let's take a look and
all of them were removed I showed you how to do it before because that's at least
how my mind would think about it I'd think oh I can put it in a list and run it
through this lstrip or this rstrip and it would work but that's not how
strip works you have
15:44:49
to kind of combine it all into one value so yes I deceived you I apologize but
now when we call df and we assign it to that column so the last name column
we're assigning what we just did to this last name column everything should look
perfect and it does so our customer ID first name last name are all cleaned up now
we're going to come to a much more difficult one this is probably if I'm being
honest the hardest one I said we were going to work up but this is probably the
hardest one of the whole
15:45:18
video working with phone numbers and look at all these different types of
formats I mean it's not going to be fun and imagine you know there's
20,000 of these you can't just go and manually clean those up you need something to
kind of automate that so that is what we're going to do so let's go right down here
we'll copy the data frame and I'm going to pull it right here so now we need to
clean up this phone number what we want is it all to look exactly the same unless
it's
15:45:51
blank and we'll keep it blank we don't want to populate that data but we want all
of them to look exactly like this one and what we're going to do is right off the
bat we're going to take all of the non-numeric values and just completely
get rid of them strip it down to just the numbers so this 123-643 or forward
slash will just be the numbers same with these bars and these slashes and
everything all of these will just be numeric then we'll go back and reformat it how
we want to format it which will
15:46:24
look exactly like this one um but we just want to do it for the entire column so
let's go right up here and we're going to try replace for the first time so let's
do phone number oops that's not what I wanted so we're going to do df and a bracket
say phone number .str.replace just like we did before now we're going to use
some regular expressions in here and I'll kind of do a really high level overview although
I'm not going to dive super deep into regular expressions then we're going to
15:46:56
do a parenthesis and within there we're going to do a bracket and then I can't remember
what this is called is it called a caret I think it's called a caret I'm just
going to call it that it may not be correct but I think of it as an upper arrow so
it's a caret then a-z A-Z and then 0-9 now at a super high level
what that first character is doing is it's saying we're going to match any
character except and then we specify anything a to z A to Z upper or
15:47:28
lowercase and then actually I think this should be like this a to z and then 0
to 9 so any value like a b c 1 2 3 those are not going to be matched it's going to
match all of them except these values and then we're going to replace them by
saying comma and we're going to replace them with nothing so this is just an empty
string so literally we're taking everything that is not an a b c or a 1 2 3 so a
letter or a number we're replacing all of that and we're replacing it with
nothing so let's run
15:47:58
this and see what it looks like and it looks like that worked properly now we do
have this Na because we had an N/a for I don't remember maybe that was Creed Bratton
but it worked for basically everything else we're going to go through the entire
process and then at the end we'll remove any of those values we want them to just be
completely null we don't want them to even see NaN and wonder what that is we
just want it to be blank and we'll do that at the very end so now that we know that
that worked
15:48:27
let's assign it we'll do df phone number is equal to and then we'll call df and
this looks a lot more standardized than it did before already but now what we want
to do is try to format this and I've done this many many times I always use a
lambda you can definitely use a for loop I just don't do it that way myself so
I'm going to show you how to do it using a lambda let's get rid of this and we're
going to say df phone number we've already done that I'm just
15:48:59
going to get rid of it now we're going to say df phone number then we're going to
say .apply we'll do an open parenthesis and then this is where we're going to
build out our lambda so we'll say lambda x colon now this is where we're going to
kind of format it so what I want to do is I want to take the first three characters
one two three then I want to add a dash and then the next three characters add a
dash and then the rest and then that'll be the value that's returned so it's not super
difficult we're just
15:49:27
going to do x then a bracket let me get rid of that an x and then a bracket and
then we want the 0 to 3 so it goes 0 1 2 it doesn't include the 3 it
goes up to 3 so 0 1 2 that's our first three values then we'll do
plus and do a quote and do a dash so this is our first kind of sequence and I'm
just going to copy this we'll do plus and instead of 0 we are going to start
at 3 because the start is inclusive so we're going to go from 3 and we're going
to go all the way
15:50:03
up to 6 so it should be 3 4 5 our next three values then we have a dash and
we'll copy this and we'll say plus and now we go from 6 all the way to 10 now
let's try running this and as you can see we get an error now I already know what
the error is float object is not subscriptable which means we're trying to
basically index into it like a string but right now it's not a string it's actually a
number so let me get rid of this for just a second I'm going to show you what it's talking
about so
15:50:41
right now we have values that are floats and values that are strings, or rather, we have
values that are strings or not a number. So if we want to actually
index into them like that, they all have to
be strings. We need to change this entire column into strings before we can apply
this formatting. Now, if I'm being honest, my first
thought when I was creating this was to do it like this: str of DF phone number.
Let's just run that.
15:51:14
This is what the values look like. I don't remember why it was doing
this, but I looked into it quite a bit and I was like, oh, I
need to apply this string conversion on each value, not the entire
column. How we can do that is actually fairly easy, because
we've already done a lot of the heavy lifting. We're just going to copy this, and
we're going to say str of x. And again, a lambda is like a little anonymous
function, so you
15:51:50
could do this by saying for x in this column. We could do a for loop and then
say for every x, it equals str of x, and then it changes it to a string, but a
lambda just does it a lot quicker. So let's do that really
quickly, and all of our values look exactly the same, and that's how we want it. So
we're just going to copy this, apply it, good, and now we're going to take this and
we're going to run this again. Just ignore all my commented-out stuff,
15:52:25
pretend I don't have that. So now when we run this it should work. There we go. Now
if we look at these numbers, 123-545-5421, it does that for every single
one where there are values. Even when there's NaN or Na, it's still adding those dashes,
but we expected that. So let's apply it, say is equal to, and then we'll look at the
data frame, and this looks almost exactly like what we're hoping for. We just need to get
rid of these, so this nan-- and this Na--, we need to get rid of those, and that
is
15:53:05
super easy to do. So now that we've done it, we'll
comment that out. We'll say DF, and let's copy this. Ignore the messiness, I do
apologize for that, it's very messy, but if you're following along with me you get
what we're doing. So DF phone number, so only on the phone number, say .str.replace,
open parenthesis. Now we can specify this value, so we want to take this exact
value and replace it with nothing, and let's just see if that does work. It does.
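The phone-number steps described here can be sketched roughly as follows. This is a minimal illustration rather than the exact notebook from the video: the column name `Phone_Number` and the sample values are assumptions, and the digits are assumed to have already had other characters stripped out.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: digit strings, a real NaN, and a literal "Na" placeholder.
df = pd.DataFrame({"Phone_Number": ["1235455421", np.nan, "Na", "8766783469"]})

# Slicing a float fails, which is the error described in the walkthrough.
try:
    np.nan[0:3]
except TypeError as err:
    print(err)

# Convert each value to a string, one value at a time.
df["Phone_Number"] = df["Phone_Number"].apply(lambda x: str(x))

# Rebuild each value as 123-456-7890: characters 0-2, then 3-5, then 6-9.
df["Phone_Number"] = df["Phone_Number"].apply(
    lambda x: x[0:3] + "-" + x[3:6] + "-" + x[6:10]
)

# "nan" and "Na" pick up dashes too, so blank out those leftovers.
df["Phone_Number"] = df["Phone_Number"].str.replace("nan--", "", regex=False)
df["Phone_Number"] = df["Phone_Number"].str.replace("Na--", "", regex=False)

print(df)
```

Note that calling `str()` on the whole column, instead of on each value inside `apply`, would produce one long string describing the Series, which is why the conversion is done per value.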
15:53:43
Now we have these Na-- values, so I'll paste that right down here. We're
going to do this is equal to, and then we're just going to take this entire string,
put it right here, and put this value as what we're looking for and
replacing. Then when we call that data frame it should work properly, and it is
perfectly cleaned. So we have every single value all the exact same: they don't have
different characters or different formatting, and we got rid of all the
ones that we don't
15:54:20
have or don't need, all the ones that were just random values. So this column is
now completely cleaned up. Again, definitely one of the more difficult ones, and one
that I've done a thousand times; I've had to work with a lot of phone numbers and
stuff like that. This one does get very tricky, especially if you have like a
plus one, which is a country code; that can get tricky as well. But on a
high level, this is how you can do that, and it's pretty neat how you can
actually
15:54:46
you know clean up and standardize those phone numbers so let's go right down here
uh let's run it the next thing that we're going to look at is this address now
let's just pretend that the people who are on the call center want all these
separated into three different columns they can read it easier see what the ZIP
code is where they live uh you know whatever they want it for let's just say we
want to do that and this is you know again for this use case it may not make sense
but you have to do
15:55:12
this, I do this all the time: you need to split those columns. Now luckily all of
these things are separated by a comma, so we can specify that we're going to split
on this comma, and then we'll be able to create three separate columns based off of
this one column, which is exactly what we want. Then we can name them as well, and we
can do that very easily by using split. So we're going to say DF, and we want to
specify (oh jeez, not again), so we want to specify that we're looking at the
15:55:44
address, then we're going to say .str.split. We'll do an open parenthesis. Now the
very first value that we need to specify is what we're splitting on, so we want to
split on the comma, so we'll specify that. Then we need to specify how many
splits from left to right it should make. We'll just start with one and then
we'll go from there. Let's just see what this looks like. So it doesn't really look
like it did anything. Let's do two. Well, let's go back
15:56:18
to one, and then let's say expand equals True. When we expand it, it's actually going
to separate it into columns, I believe. Okay, so we're expanding. Now we're only doing this
with one comma, so we're only looking at the very first comma and splitting on it, but
in some of these, well, just in one, there is an additional comma, so we should do it
up to two. Let's do this. Okay, so now we have three columns. If we just save it like
this, it's going to give us these 0, 1, 2, basically these indexed values
15:56:50
for these columns, and we don't want that. We want to specify what these actually are,
and we can do that by saying DF, and let me just do is equal to. We'll do a bracket, and
then within there we're going to specify our list, so there are three of them
that we have. The first one is the street address, so
we'll say street address. The next one is, well, this one is not a state, but these
all are states, so I'm just going to say State. And then for the very last one,
15:57:24
that looks like a zip code, so we'll say Zip, and we'll do Code. In fact, I also want
to do Street_Address. So what this is now going to do is these three
columns are going to be applied to these three names, and they'll basically be
appended. It doesn't replace the address; we're not saying DF address equals DF
address. We're not replacing it, we're creating different columns. So let's run it,
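The split described here can be sketched as follows. The addresses are made up for illustration and the column names (`Address`, `Street_Address`, `State`, `Zip_Code`) are assumptions based on how they are spoken in the walkthrough. The split count is passed as the keyword `n`, which is accepted by both older and newer pandas versions.

```python
import pandas as pd

# Hypothetical addresses: one with two commas, one with one, one with none.
df = pd.DataFrame({
    "Address": ["12 Oak Street, Ohio, 12345", "98 Pine Road, Texas", "5 Elm Court"]
})

# Split on the comma at most twice and expand the pieces into separate columns.
# Assigning to a list of names adds three new columns and keeps "Address" as it was.
df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(
    ",", n=2, expand=True
)

print(df)
```

Rows with fewer commas get missing values in the trailing columns, and pieces after a comma keep their leading space, so a follow-up `.str.strip()` on the new columns may be worth adding.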
and then let's also call it so they're right over here on this right hand side I
15:57:54
couldn't see them at first but it did exactly what we needed it to do so now if we
wanted to at the very end if we want to we're not going to we could just delete
this address and keep the street address the state and the zip code another really
common thing that you can do this happens often again with like first name last
name well you'll have Alex freeberg but it's Alex comma freeberg or Alex space
freeberg and you can separate those out into different columns now the next one
that we want to
15:58:23
look at is this paying customer and the paying customer and do not contact are very
similar in the fact that it's Yes, No, N, Y, Yes, No, N, Y, and so let's go right on
down here and we're going to say DF Dot and we want to just replace these values as
all yeses or all NOS but just with the same formatting um just to keep it
consistent so let's make anything that's an N into a no anything that's a a y into
a yes I like it spelled out so let's change anything that's a yes into a y anything
15:59:00
that's a No into an N. That's usually how I do it; it just saves on data because
it's fewer characters, although the saving can often be very minimal. But let's specify the
paying customer. We'll say DF, bracket, paying customer, then we'll do .str.replace. So
now we're just going to look for those specific values, so if it's a Y (oops, a
capital Y), then we'll say Yes. Now let's run it, and now we have no more Ys; we now
just have Yeses, although now these are "Yeses". Okay, we don't
15:59:42
want to do that. It's literally
looking up here and saying, okay, here's a Y, let's change
that Y into a Yes, so now it's doing "Yeses". We don't want that, so let's look for the Yes
and change it into a Y. Now when we run this, that looks a lot better. So we'll do
DF paying customer is equal to, and then we'll copy this. We'll do the exact same thing with
No and N, then let's call it, and now that entire
16:00:21
column looks really good, except for that value right there, but I'm going to leave
that, because I'm just going to apply it to the entire thing all at once to get rid
of those at the end instead of just going column by column. And then it's
literally going to be the exact same thing, so I'm not even going to scroll down
(whoops), I'm just going to put it right up here, because this is the exact same thing.
I'm going to save us all some time, and when we run this, this looks exactly like what
we're looking for.
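The "Yeses" surprise and its fix can be reproduced on a small made-up Series. The name `Paying Customer` and the values are assumptions for illustration; `regex=False` is passed so the replacement is a plain substring replace on any pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing long and short forms, plus a missing value.
s = pd.Series(["Yes", "No", "N", "Y", np.nan], name="Paying Customer")

# Pitfall: replacing every "Y" also hits the "Y" inside "Yes", giving "Yeses".
doubled = s.str.replace("Y", "Yes", regex=False)

# Going from the long form to the short form avoids the overlap.
clean = s.str.replace("Yes", "Y", regex=False).str.replace("No", "N", regex=False)

print(doubled.tolist())
print(clean.tolist())
```

Because this is substring replacement, it would also touch any longer value that merely contains "Yes" or "No"; an exact-match alternative is `s.replace({"Yes": "Y", "No": "N"})`.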
16:00:51
again some not-a-number values, but we can get rid of those in just a second by doing
a replace over the entire data frame. And that is basically the end of cleaning up
individual columns. Now let's go right down here. We're going to say DF .str.
replace, and then we'll first do these values (oops, let me do that,
there we go) and replace that with nothing, and let's just see what it looks like.
Oops: 'DataFrame' object has no attribute 'str'. Well, that's because we were looking at
16:01:25
columns before. Yeah, I think I just need to get rid of this str; we're not looking
at one column, we're really doing it across the entire data frame. Now let's try that. Okay,
that worked appropriately, and we'll just say data frame is equal to, and then we'll
copy this and we'll do the NaN as well. And now when we do this,
it is not going to replace these, because these aren't actually a value, and
we're looking for that string. We actually need to use, and I completely forgot
this,
16:02:00
I'm not going to lie to you. Let's get rid of this. To get rid of those values,
because it's literally not a number there, it is technically empty, I forgot we
can do, or we could not even specify it, we'll do df.fillna. So we're going to
fill these values: if there's nothing in them, we're going to fill it, and we're going
to say blank. When we run that, every value that doesn't have something in it is
going to show up blank, even over here where we only had a few; all of them
16:02:32
throughout the data frame, if it doesn't have a value, it is now blank.
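The difference between replacing text and filling missing values can be sketched like this. The frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one real missing value in each column.
df = pd.DataFrame({
    "Phone_Number": ["123-545-5421", np.nan],
    "Do_Not_Contact": [np.nan, "Y"],
})

# The .str accessor exists on a Series (one column), not on a whole DataFrame,
# which is where the "no attribute 'str'" error comes from.
has_str = hasattr(df, "str")

# Replacing the text "NaN" changes nothing: a real missing value is not that string.
unchanged = df.replace("NaN", "")

# fillna targets the missing values themselves.
df = df.fillna("")

print(df)
```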
apply that, and we'll run this. Now all of our cleaning of
the individual columns is completely done: we've removed columns, we've split columns,
we've formatted and cleaned up phone numbers, we've also taken values off of
this last name column, and then we
standardized paying customer and do not contact. Now they also asked us to only
16:03:08
give them a list of phone numbers that they can call. So if we take a look, some of
these do not contacts are Y, which means we cannot contact them, and then there are
some that don't even have phone numbers. We don't want to give the
call center people who can't be contacted or who don't have numbers, so we want to remove
those. Now there's a few different ways that we can do this, but let's start with
this do not contact; it seems like the most obvious one. Now if
it's blank
16:03:41
we want to give them a call; we only want to not call them if they've specifically
said we cannot call them. So if it's Y, we're not going to call them. So what we need
to do, it's not anything like this, we probably need to loop through this column and
then look at each row that has this value and drop that entire row. And we
probably will need to do that based off this index instead of doing it based off
just this column. That may not make sense, but let's actually start
writing it.
16:04:15
So we'll do for x in, and we need to look at our index, so we're just going to
do df.index, and we'll do a colon, enter. Then we want to look at these indexes.
How do we look at these indexes? We use loc. That's going to be df.loc, and then we
need to look at the value, which is this x right here. So each time it looks at the
index, it's looking at the value, but we want to look at the value of this column, do
not contact. I don't know if I copied this before, let me copy it. We only want to
look at the
16:04:51
value in this one column; if we didn't, it would look at a different value, so we
don't want that. So we're looking at just that value, and whether it's equal to Y. If this
value is equal to Y, then we want to drop it, so we actually need to say if. So if
this value x in this column is equal to Y, then we want to do df.drop, and then
we'll say x. And I think we have to say inplace equals True here, otherwise it
won't take effect; otherwise you have to say like df is equal to df. Yeah, I don't
want to
16:05:30
start messing with that. Let's just do inplace equals True, and let's see if that
works. I can't remember if this is going to work or not. Invalid syntax; okay, I need a colon.
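The loop described here looks roughly like the sketch below. The column name `Do_Not_Contact` and the values are assumptions. The boolean-mask version at the end is an alternative that is not from the video; it produces the same rows without an explicit loop.

```python
import pandas as pd

# Hypothetical column: "Y" means the person asked not to be contacted.
df = pd.DataFrame({"Do_Not_Contact": ["N", "Y", "", "Y", "N"]})
original = df.copy()

# Walk the index labels, look up this one column with .loc,
# and drop the whole row when the value is "Y".
for x in df.index:
    if df.loc[x, "Do_Not_Contact"] == "Y":
        df.drop(x, inplace=True)

# Alternative (not the speaker's approach): keep only rows that are not "Y".
filtered = original[original["Do_Not_Contact"] != "Y"]

print(df.index.tolist())
```

Either way the surviving rows keep their original index labels, so gaps appear where rows were removed.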
and now let's try to run this. Okay, yeah, if we look at our index we can already
tell that there are ones missing: the one is missing, the three is missing,
let's see, and the 18 is missing. So we already got rid of those values, and you
can see that there's no Ys in here anymore, which is really good. We can, if we
want
16:06:04
to, and we probably should, populate that really quickly.
Let me just go up here really quick, I'll copy this. We probably should populate that, and
I didn't plan on doing this, so if it's blank (oops), if it's blank give it an N, and we
want to assign it to do not contact (whoops). Let's see if that
works. We probably need to do .str. Let's just see if it works. So if it's
blank... okay, I don't know why it's giving us a triple
16:06:52
N. Maybe I need to strip this or something. Okay, never mind, let's
not do that. But now we basically need to do the exact same thing for this phone number,
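One likely explanation for the triple N, assuming the attempt was a string replace with an empty pattern (the exact code is not shown in the transcript): an empty pattern matches at every position in a string, so an existing "N" becomes "NNN". The sketch below uses made-up values and an explicit object dtype so it runs on plain Python strings.

```python
import pandas as pd

# Hypothetical column: one value already "N", one blank.
s = pd.Series(["N", ""], dtype=object, name="Do_Not_Contact")

# An empty pattern matches before and after every character,
# so "N" turns into "NNN" while the blank turns into "N".
tripled = s.str.replace("", "N", regex=False)

# Possible fix (an assumption, not from the video): Series.replace matches
# whole values, so only cells that are exactly "" are changed.
fixed = s.replace("", "N")

print(tripled.tolist())
print(fixed.tolist())
```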
um because if it's blank we don't want them calling it um so we can copy this
entire thing go right down here and but now we're looking at phone number so now
we're looking just at the values within phone number and we only want to look at if
it's blank so if it literally has no value we want to get rid of it let's run this
and see if it
16:07:29
works. Again, it should. Good, and now our list is getting much smaller, so you can see
in our index a lot of those rows were removed. And okay, good, actually this worked
itself out, because these all have Ns. So right now we're sitting really good:
everything looks really standardized and cleaned, everything looks great. I might drop
this address; if you want to, you can drop this address. But besides that this is all
looking really good. This paying customer, the Yes and Nos, aren't really
anything.
16:08:03
Now we could, and we probably should, before we hand this off to the client or the
customer call list, reset this index, because they might be
confused as to why there are numbers missing, or they might use this index to
show how many people they've called, or I don't know, something like that. So let's go
right down here. We're going to say df dot, and then we'll do reset_index, and let's
just see what this looks like. It does work, but as you can tell
16:08:33
it didn't get rid of that index completely; it actually took the index and saved
that original one. We do not need to save that (whoops, let's put it right in here). Now
we're just going to do drop equals True, and when we do that it just completely
resets it: it drops the original index and gives us a new index, and that is what we
want. Let's do df equals, and this is our final product. Now one thing that you
definitely could have done here, and I made this probably a little more
complicated than it
16:09:03
needed to be, that was just how my brain was working at the time when I was
typing this out: we could have done df.dropna, which is literally going
to look at these null values. Before, we couldn't do that with this one, because
we're not looking at NaN there, we're looking at Ys, so we couldn't do that.
But because we're looking at null values, we could have also done dropna,
done subset is equal to, and then done it just on this phone number,
16:09:36
and then done it like this with inplace equals True. So we could have also done
this, and then said df equals. I mean, I can run it; it's just not going to
do anything. I can run it on a different column, but that'll mess everything up.
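The two calls mentioned here, `dropna` with `subset` and `reset_index` with `drop=True`, can be sketched on a made-up frame. The column name and index labels are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index already has gaps from earlier row drops.
df = pd.DataFrame(
    {"Phone_Number": ["123-545-5421", np.nan, "876-678-3469"]},
    index=[0, 4, 7],
)

# Drop rows where this one column is null. This only sees real nulls:
# values already filled with "" are not null, so they would be left alone.
df = df.dropna(subset=["Phone_Number"])

# Renumber from 0; drop=True discards the old index instead of
# saving it as a new column.
df = df.reset_index(drop=True)

print(df)
```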
But this is another way you can do it, and I'll just save it in case you want to.
I'll say "another way to drop null values". There you go, and that'll just be a note
for us in the future. But this is our final product; it looks a lot
16:10:10
different than when we first started. I mean, we had mistakes here, completely
different formatting in the phone number, different address, everything that we just
talked about, and this looks just a lot better. And you can tell why it's
really important to do this process, because again we're working on a very small
data set. I purposely created this data set with these mistakes, because
when you're looking at data that has tens of thousands, hundreds of thousands, or a
million rows,
16:10:39
these are all things that are going to apply at a much larger scale, and you
won't be able to see them as easily. You'll have to do some exploratory data
analysis to find these mistakes, and then you're going to need to clean the data, or
do it at the same time when you're exploring the data, so you'll clean it up
as you go. But these are a lot of the ways that I clean data, a lot of the things
that you can do to make your data a lot more standardized and a lot more
visually appealing, and then it
16:11:06
really helps later on with visualizations and your actual data analysis. So
I hope that was helpful. I know that this was a long video, I'm sure it was,
but I hope that you got something out of this and learned some of the techniques on
how to actually clean data in pandas. If you like this video, be sure to like and
subscribe, check out all my other videos on pandas as well as Python, and I will see
you in the next video. [Music] Hello everybody, today we're going to be
looking at exploratory data analysis
16:11:44
using pandas. Exploratory data analysis, or EDA for short, is basically just the first
look at your data. During this process we'll look at identifying patterns within the
data, understanding the relationships between the features, and looking at outliers
that may exist within your data set. During this process you are looking for
patterns and all these things, but you're also looking for mistakes and missing
values that you need to clean up during your cleaning process in the future. Now
there are
16:12:09
hundreds of ways to perform EDA on your data set, but we can't possibly look at
every single thing, so I'm just going to show you what I think are some of the most
popular and the best things that you can do when you're first looking at a data set.
The first thing that we're going to do is import our libraries, so we'll do import
pandas as pd. We're also going to import seaborn and matplotlib. Now during this
exploratory data analysis process I often like to visualize things as I go,
16:12:38
because sometimes you just can't fully comprehend it unless you visualize it,
and it gives you a broader glimpse of everything. So we're going to import,
and let's do seaborn (oops) as sns, and then we'll import matplotlib.pyplot as plt.
Let's run this; this should work. Okay, perfect. Now we need to bring in our data set.
We've worked with that world population data set; that is the exact one that
we're going to use now. So we'll say df equals
16:13:20
pd.read_csv, do r, and we'll paste in our CSV path. This is what it should look like, although
your path may be different; be sure that you have the correct file path.
Then we'll read it in. Now this data set should look extremely familiar if you've
done some of my previous pandas tutorials, but I did make some alterations to this
one: took out a little bit of data, put in a little bit of data here and there, to
change things up, because if it was just exactly how I pulled it, which I got this
data
16:13:49
set from Kaggle, if it was exactly how we pulled it, like we've looked at in the
previous videos, it's too simple; we wouldn't actually be able to do some of
the things that I would like to show you. So be sure to actually download this exact
data set for this video, because it is a little bit different. But what we're going
to do now is just try to get some high-level information from this. Now if yours
looks just a little bit different, like your values are in scientific notation, I
16:14:16
have applied this so many times that I think it's still applied to this. You
can do something, and we'll write it right down here: we're going to do pd.set_option,
and we'll do an open parenthesis and we'll say display.float_format. So
we're going to change that float format by just saying lambda x, colon, and then
we're going to change how many decimal points we're looking at. So
let's do a quote, percent sign, 2f, so we're formatting it,
16:14:53
whoops, .2f, so we're going to format it, and we'll do percent x. This is going to
format it appropriately. I can run it, and actually it will change it; this was
at .1, I believe, the last time I did it. So let's run this, and then let's run this
again; it'll change it to .2, so that's two decimals. I like it at .1, we don't really
need any more. Well, let's keep it at .2, why not. We're going to keep it at
.2. That's how you change that, and I like looking at it like this a lot
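The display option described here can be sketched as below. To keep the sketch self-contained, a tiny made-up table is read from an in-memory string; in the walkthrough the data comes from `pd.read_csv` with a raw-string path to the downloaded CSV. The country names and the column name `2022 Population` are placeholders.

```python
import io

import pandas as pd

# Show floats with two decimals instead of scientific notation.
pd.set_option("display.float_format", lambda x: "%.2f" % x)

# Stand-in for pd.read_csv(r"<path to your CSV>"): a tiny made-up table.
csv_text = "Country,2022 Population\nExampleland,1420000000.0\nSampleville,510.0\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```

Changing `%.2f` to `%.1f` gives one decimal place instead; the option only affects how values are displayed, not the stored numbers.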
16:15:21
better than scientific notation, so just something to point out. Let's go down
here and let's just pull up the data frame, so we have this data. One of the first things
that I like to do when I get a data set is to just look at the info, so we're going
to do df.info, and this gives us just some really high-level information: this is how
many columns we have, here are the column names, here are how many values we have.
And if you notice, we have 234 in each of these, so
in each
16:15:53
of these columns we have 234 until we get to this 2022 population. Once we get there
we start losing some values, and then at the world population percentage we have all
of our values, all 234 of them. The count tells us that it's non-null, so it does have
values in it. Then we also have the data types, and these come in handy later;
these are really great to know, and we'll be able to use those in a few
different ways later on in this tutorial. Really quickly, I wanted to give a huge
16:16:23
shout out to the sponsor of this entire pandas series, and that is Udemy. Udemy has
some of the best courses at the best prices, and it is no exception when it comes to
pandas courses. If you want to master pandas, this is the course that I would
recommend; it's going to teach you just about everything you need to know about
pandas. So huge shout out to Udemy for sponsoring this pandas series, and let's
get back to the video. The next thing that I really like to do is df.
describe. This allows you to get really a
16:16:49
high-level overview of all of your columns very quickly. You can get the count, the
mean, the standard deviation, the minimum value, and the maximum value, as well as your
25th, 50th, and 75th percentiles of your values. So just at a super quick glance, there is a
row somewhere in here, and for this country the population is 510 for 2022, and
in fact if you go back to 1970 it was higher, it was at 752. That's just interesting.
Then if we look at the max population, one has 1.42 billion, I believe that's
China, and
16:17:27
then over here in 1970 we have 822 million; again, I still believe that's China. But
this gives you just a really nice high-level view of all of these values, or all these
different calculations that you can run on it. We can run all these individually
on even specific columns, but it's just a nice high-level overview. One thing
that we just talked about was the null values that we're seeing in here. I'd like
to see how many values we're actually missing, because that is a problem. We
16:17:55
don't want to have too many missing values, or it could really obscure or change the
data set entirely, and we don't want that. So we'll say df.isnull, and then
we'll do a parenthesis and we'll say .sum, and when we do this (whoops, .sum,
there we go), when we do this it's going to give us all the columns and how many
values we're actually missing. Now we have 234 rows of data, so we have 4, 1, 4, 7, 7, 5, 5, 4, 2, 4,
so we definitely have data missing. What we choose to do with it in
16:18:33
the data cleaning process, maybe we want to populate it with a median value, maybe we
just want to delete those countries entirely if the data is missing (I
don't think you're going to do that), but these are things that you need to think
about when you're actually finding these missing values. This is what the EDA
process is all about: we want to find outliers, missing values,
things that are wrong with the data, or we can find insights into it while we're
doing this
16:19:01
as well. So this is definitely something that I would consider when I'm
actually going through that data cleaning process; really, really important
information to know. Now let's go right down here, go to our next cell, and say df.
unique, and this is going to show us how many unique values... and it's actually
nunique. This is going to show us how many unique values are actually in each of
these columns, and this one makes the most sense for continents, because I
think there's only seven continents,
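The four overview calls covered so far (`info`, `describe`, `isnull().sum()`, and `nunique`) can be tried on a tiny made-up frame. The column names echo the ones mentioned in the walkthrough, but the rows are placeholders, not real population data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the data set, with one missing population value.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Continent": ["Asia", "Asia", "Europe", "Europe"],
    "2022 Population": [1000.0, np.nan, 510.0, 250.0],
})

df.info()                    # column names, non-null counts, and dtypes
summary = df.describe()      # count, mean, std, min, quartiles, max (numeric columns)
missing = df.isnull().sum()  # number of missing values per column
distinct = df.nunique()      # number of unique values per column

print(summary)
print(missing)
print(distinct)
```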
16:19:32
right? But we have six right here. And for all of these, each of these ranks,
countries, and capitals should all be unique, that makes perfect sense, as well as these
populations; they're such specific numbers and such large numbers, I would
be shocked if any of these were the same. And then for these world population
percentages it's much lower, and again that makes a lot of sense, because when we're
looking at, and we'll pull it up right here, when we're looking at these world
population
16:20:00
percentages, a lot of them are really low: 0.00, 0.01 like this one, 0.2. There
are a lot of really low values for those small countries, and so those all share
one unique value. Now let's say we just have this data right here and we
want to take a look at some of the largest countries. We can easily do that; we
could even say max and take a look at the largest country, but I want to be
a little bit more strategic. I want to be able to look at some of the top range of
countries, and we can do
16:20:31
that based off this 2022 population, so we'll say df.sort_values. This is how
we sort, not filter, but order our data. So we'll do sort_values, and then
we'll do by is equal to, and then we'll specify that we want this 2022 population,
and then we're going to say comma... actually, let's just run this as is,
but we'll do head, because we just want to look at the top values. So now we're
just looking at the very top rows, so what we're looking at is actually this
16:21:04
2022 population; that's what we're filtering on, or rather sorting on, basically, and we're
looking at the very bottom values, because it's sorting ascending, so from lowest to
highest. So this Vatican City in Europe is 510; that's the value that we
were looking at earlier. Now we can do comma, ascending equals False, because it was
True by default. We can do False (whoops), and then it'll give us the
very largest ones. So if we just take a look at the top five largest by population,
we're looking
16:21:37
at China, India, United States, Indonesia, and Pakistan, and we can even specify that we
want the top 10 in this head. We can bring in the top 10: we also have Nigeria, Brazil,
Bangladesh, Russia, and Mexico. And you can do this for literally any of these columns.
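Sorting and taking the top rows can be sketched as follows, using placeholder countries and values rather than the real data set.

```python
import pandas as pd

# Hypothetical rows; the column name mirrors the one used in the walkthrough.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "2022 Population": [300.0, 510.0, 1200.0, 75.0],
})

# Default order is ascending, so head() shows the smallest values first.
smallest = df.sort_values(by="2022 Population").head()

# ascending=False flips the order; head(2) keeps the two largest.
largest = df.sort_values(by="2022 Population", ascending=False).head(2)

print(smallest)
print(largest)
```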
whether you want to look at continent capital country um you can sort on these and
look at them and you can even look at you know things like growth rate world
percentage this one seems really interesting let's just look at this one really
quickly before
16:22:07
we move on to the next thing. If we look at this world percentage, just China
alone, I believe, yeah, just China alone is 17.88% of the world, so 17.88% world
population percentage. Again, just getting in here and looking around, that's all we're
really doing. Now I want to look at something, and I have always liked doing this,
which is looking at correlations, so correlation between usually only numeric
values. We can do that by saying df.corr and a parenthesis, and we'll run this, and
what this is is it is comparing
16:22:55
every column to every other column and looking at how closely correlated they are.
So this 2022 population, if we look across the board, it's very highly... I mean, this is
a one-to-one, these are highly correlated to each other, and that holds for almost all of these
populations; they're very closely tied to each other, which makes perfect sense,
because for most countries they're going to be steadily increasing, and so they're
probably almost exactly correlated. But we can look at these populations, and if
16:23:26
you look at the area, it's only somewhat correlated, and that's because in some
countries they have a very high population but a small area, or vice versa,
a large area and a very small population, so there isn't a one-to-one correlation
there. But it's hard to really just glance at this and understand everything
that's there. We could just visualize it and it would be a lot easier, so let's go
ahead and do that. Let's go down here; we're just going to visualize this using a
heat map,
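The correlation table can be reproduced on a toy frame like the one below (made-up values, placeholder column names). One caveat worth flagging: on recent pandas versions, calling `corr()` on a frame that still contains text columns raises an error, so this sketch restricts the frame to numeric columns first. That step is an addition here, not something shown in the video.

```python
import pandas as pd

# Hypothetical data: the two population columns move together, area does not.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "2022 Population": [100.0, 200.0, 300.0, 400.0],
    "1970 Population": [50.0, 100.0, 150.0, 200.0],
    "Area": [40.0, 10.0, 30.0, 20.0],
})

# Keep only numeric columns, then compare every column with every other one.
corr = df.select_dtypes(include="number").corr()

print(corr)
```

Values near 1 mean two columns rise and fall together, values near 0 mean little linear relationship, and negative values mean one tends to fall as the other rises.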
16:23:55
basically. So we're going to say sns.heatmap and an open parenthesis, and the data
that we're going to be looking at is df.corr, the correlation. Then we also want
to say annot equals True; I'll show you what that looks like in just a
little bit. Then let's do plt.show, and this will be our first look (and I need to
say show, not shot). We can get a little glimpse of what it looks like, but this
looks absolutely terrible. Let's change the figure size really quick, so I want to
make this much
16:24:33
larger than it already is. We'll do plt.rcParams (oops, right there), do an
open parenthesis, and then right here we're going to do, in quotes, figure.
figsize. This actually needs to be in brackets, I believe, just like this, not parentheses.
We'll say figsize is equal to, and now we can specify the value that we want. Let's
do 10 comma 7 and see if this looks any better. No, that doesn't look good. Do
20. Okay, that looks a lot better, and this is just a quick way
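Put together, the heat map steps look roughly like this sketch. The data is made up, and the `matplotlib.use("Agg")` line is an assumption added so the example runs without a display; in a notebook it can be dropped. As in the earlier correlation step, the frame is limited to numeric columns before calling `corr()`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit this in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Make every figure wider: width 20, height 7 (in inches).
plt.rcParams["figure.figsize"] = (20, 7)

# Hypothetical numeric data standing in for the population columns.
df = pd.DataFrame({
    "2022 Population": [100.0, 200.0, 300.0, 400.0],
    "1970 Population": [50.0, 100.0, 150.0, 200.0],
    "Area": [40.0, 10.0, 30.0, 20.0],
})
corr = df.select_dtypes(include="number").corr()

# annot=True writes each correlation value inside its cell.
ax = sns.heatmap(corr, annot=True)
plt.show()
```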
16:25:15
because it gives you basically a color-coded system: highly correlated is this tan,
all the way down to basically no correlation, or negative correlation even, which is
black. So when we're looking at these 2022 populations, and these are populations
right down here on this axis, we can see that all of these are extremely highly
correlated very, very quickly, whereas the rank really has nothing to do with it; it's
negatively correlated. Then for the
population
16:25:45
and the world population percentage it again is quite correlated except for the
area density and growth rate so I find that really interesting that you know the
density the growth rate in the area aren't really all that Associated or correlated
with the population numbers that is I kind of would have assumed that on some level they
went hand in hand the area does um would you know again make sense you know larger
area larger population that kind of thing but even density um I guess I guess
density and growth rate um
16:26:21
growth rate I can see because that's a percentile thing that could be definitely
not correlated I thought the density would be more correlated than it is all that
to say is this is one way that you can kind of look at your data see how correlated
it is to one another that can definitely um help you know what to analyze and look
at later when you're actually doing your data analysis let's go right down here um
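The heat map steps described above can be sketched roughly like this. The data frame here is a small made-up stand-in for the world population data, and numeric_only=True is an assumption for newer pandas versions, where df.corr() no longer skips text columns on its own. The (20, 7) figure size is the one that looked better in the walkthrough.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the world population data frame (values are made up).
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Rank": [1, 2, 3, 4],
    "2022 Population": [1400.0, 330.0, 60.0, 5.0],
    "1970 Population": [800.0, 200.0, 50.0, 3.0],
    "Growth Rate": [1.00, 1.01, 0.99, 1.02],
})

# Enlarge every figure drawn after this point (width, height in inches).
plt.rcParams["figure.figsize"] = (20, 7)

# numeric_only=True leaves out text columns such as Country.
corr = df.corr(numeric_only=True)

# annot=True writes each correlation value inside its cell.
ax = sns.heatmap(corr, annot=True)
plt.show()
```

Note that rcParams is indexed with square brackets like a dictionary, which is the bracket-versus-parenthesis fix made in the walkthrough.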
something that I do almost all the time when I'm doing any type of uh exploratory
data
16:26:48
analysis like this I'm going to group together columns start looking at the data a
little bit closer um so let's go ahead and group on the continent so let's look at
it right here let's group on this continent because sometimes when you're doing
this EDA you already know kind of what the end goal of this data set is you know
kind of what you're looking for what you're going to visualize at the end that
really comes in handy when doing this but sometimes you don't sometimes you're just going
16:27:13
in blind and so far we've really just been going in blind we're just throwing
things at the wind kind of seeing some overviews um looking at correlation that's
all we've done now I kind of want to get more specific I want to have like a use
case something that I'm kind of looking for not doing full data analysis not diving
into the depths but something we can kind of aim for so the use case or the
question for us is are there certain continents that have grown faster than others
and in which ways so
16:27:42
we want to focus on these continents we know that that's the most important column
for this use case this very fake use case um so we can group on this continent and
we can look at these populations right here because we can't really see growth you
can see a growth rate but the density per uh kilometer we don't have multiple
values for that it's just a static one single value same for growth rate same for
world population percentage but we have this over a long span many many years um
you know 50
16:28:13
years of data here so this we can see which countries have really done well or
which continents have really done well so without you know talking about it even
more let's do df.groupby and then we'll say Continent oops let me just copy this
I'm I'm not good at spelling we're going to say df.groupby and then we'll do .mean()
and we can just do it just like this and now we have Africa Asia Europe North
America Oceana and South America okay so if I'm being completely
16:28:48
honest I knew most of these all right I'm no geography expert but I I knew
most of these I don't know what this ocean is um this that I don't I genuinely
don't know what that is um so let's just search for that value and see we'll come
back up here in just a second but I want to I want to kind of understand um what
this is so we're going to do df um and we'll say Continent let me sound that out for
you guys um then we'll do .str.contains oops contains good night and then I want
16:29:26
to look for Oceana uh and let's let's run this oh I need to do it like this now
let's run this so now we're looking at our data frame we're seeing when the values
have this continent as Oceana um okay so these look like Islands I'm guessing so we
have Fiji Guam um New Zealand Papua New Guinea yeah these look like all I'm I'm
guessing based off the continent Oceana um Oceania o Ocea Oceania guys this is
tough for me okay I'm doing my best I you know this is part of the Eda process
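The lookup just described, filtering the rows whose Continent text contains a given value, can be sketched like this. The rows are a tiny hypothetical sample, and the column names are assumed to match the data set used in the walkthrough.

```python
import pandas as pd

# Hypothetical sample rows; the real data set has one row per country.
df = pd.DataFrame({
    "Country": ["Fiji", "Guam", "New Zealand", "Japan", "Kenya"],
    "Continent": ["Oceania", "Oceania", "Oceania", "Asia", "Africa"],
})

# Boolean mask: True where the Continent text contains "Oceania".
mask = df["Continent"].str.contains("Oceania")

# Indexing the frame with the mask keeps only the matching rows.
oceania = df[mask]
print(oceania)
```

The mask has to go back inside df[...] to get rows instead of a column of True and False values, which is the "I need to do it like this" correction.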
16:30:11
I don't know what that means I don't know what ocean ocean ocean Oceania geez I'm
just going to call it Oceana that's so wrong but I'm just gonna it's so easy for me
to say you know I I now am seeing this and it looks like Islands um which would
make sense because for their average they have the highest average rank um and I'm
guessing that's because they're just mostly small countries so let's let's order
this really quickly we're going to do dot
16:30:43
sort_values do an open parenthesis and I want to sort on the population we're
just doing the average population um we'll do by um equal so on the average
population and we'll do ascending equals False so we're looking at this average or
the mean population Asia has the highest population on average and we have South
America Africa Europe North America and then Oceana at the very bottom which makes
perfect sense again small Islands um world population percentage so each of the
16:31:22
countries each of those countries in Asia makes up about 1% on average really
interesting um to know and just kind of look at this and and the density in Asia is
far higher than double almost double every single other continent um really really
interesting actually now that I'm looking at this but you know that's something
that I would actually look into and I would be like what is this Oceana or Oceania
what does that mean and you know let me look into that let me explore that more
because I want to
16:31:55
know this data set I'm trying to really understand this data set well but what I
want to do now is I want to visualize this um because I just feel like looking at
it I don't it's hard to visualize and again the use case that we're saying is is
which continent has grown the fastest like it could be percentage wise it could be
um you know as just a whole on average let's take a look so we're going to take
this and let's copy it like this let's bring this right down here so
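The grouping and sorting steps above can be sketched like this. The numbers are illustrative only, not real figures, and numeric_only=True is an assumption for newer pandas versions, where taking the mean of a text column such as Country raises an error.

```python
import pandas as pd

# Illustrative values only, not real population figures.
df = pd.DataFrame({
    "Country": ["China", "India", "Brazil", "Peru", "Fiji", "Guam"],
    "Continent": ["Asia", "Asia", "South America", "South America",
                  "Oceania", "Oceania"],
    "2022 Population": [1400, 1400, 210, 34, 1, 1],
    "Rank": [1, 2, 7, 44, 162, 192],
})

# One row per continent, holding the mean of every numeric column,
# ordered from the largest average population to the smallest.
continent_means = (
    df.groupby("Continent")
      .mean(numeric_only=True)
      .sort_values(by="2022 Population", ascending=False)
)
print(continent_means)
```

After the groupby, Continent is no longer a column; it becomes the index of the result, which matters for the plotting step that follows in the walkthrough.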
16:32:23
let's look at this so if I try to visualize this and let's do that let's do df2 is
equal to because I'm I already know it's not going to look good just based off how
the data is sitting um we do df2 oops what am I doing I don't need to do that but I
will okay df2 and we'll do df2.plot() we'll run it just like this um as you
can see Asia South America Africa Europe North America Oceana we can kind of
understand what's happening but these are the actual um values that are being
16:33:04
visualized not the continents which is what I wanted um in order to switch it and
it's actually pretty easy and this is something that um you know is good to know we
can actually transpose it to where these these continents become the columns and
the columns become the index and all we have to do is say df2.transpose and
we'll do these parentheses right here and let's just look at it and then we'll save
it so now all these columns are right here and all of the indexes are the columns
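Here is a minimal sketch of that transpose, using a made-up two-continent frame in place of the real grouped data. The names df2 and df3 follow the walkthrough; the values are hypothetical.

```python
import pandas as pd

# Hypothetical per-continent means: continents are the index, years the columns.
df2 = pd.DataFrame(
    {"1970 Population": [10.0, 30.0], "2022 Population": [20.0, 90.0]},
    index=["Africa", "Asia"],
)

# transpose() swaps rows and columns: years become the index,
# continents become the columns.
df3 = df2.transpose()
print(df3)
```

This matters for plotting because DataFrame.plot draws one line per column and puts the index on the x-axis, so after the transpose each continent gets its own line across the years.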
so let's say df3 is equal to and
16:33:43
I'm just doing that so I don't you know write over the DF or my earlier data frames
so now we have this data frame three so now let's do df3.plot() and it
should look quite a bit different uh whoops I didn't run this let's run this and
run this and as you can see this does not look right at all and the reason is is
because we're not only looking at uh the correct columns we have this density in
here world population percentage rank we don't need any of those the only ones
16:34:15
that we want to keep are these ones right here this population now we can do that
and we can just go right up here this is where we created that data frame two that
we transposed we can go right up here and we can specify within this we actually
only want specific values now we can go through and hand-write all of
these and by all means go for it but I am going to go down here I'm going to say
df.columns and I'm going to run this it's going to give us this list of all of our
columns and I'm just going to
16:34:47
you can just copy this and you can put it right in here think I need a list with I
think it needs to be like this if I'm let me try running this okay so this worked
properly you can do it just like this or a little shortcut if you want to do it
like that if you want to do a shortcut like um I I would hope you would you would
just do df.columns just like how we looked at down here except since this is
an index we can search through it so we can just say 0 1 2 okay so we can
do five up to 13 because
16:35:22
I think it's seven and we'll just let's see if this works uh it may not I may
actually need to go like this let's see there we go so you can just use you know
the indexing to save you some visual space gives you the exact same output so now
we have this this is our df2 now let's go down and transpose it so now we just have
these populations and we have our continents right here and then now we're going to
plot it and this looks good although it's backward um okay it's
16:35:57
backward so what I actually want to do is not this uh that is a quick way to do it
although not the best way to do it um so I'm actually going to copy all of these
and although I said it would save us time it did not at all so I'm going to put a
bracket right here I'm going to paste this in here and I'm literally going to
change these up I might speed this up or I might just have you sit through this
because you know this is an interesting part of the process and I want you
know you to get
16:36:32
the full experience you know what now that I'm talking about it that is what we're
going to do you guys can hang out with me this is a good time we have 2010 2015
2020 and 2022 now let's run it what did I do oh too many brackets there we go so
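The column selection just worked through can be sketched as follows, on a small hypothetical frame with three year columns instead of the full set. The hand-typed list mirrors what was done on screen; the reversed slice on the last line is an extra shortcut that was not used in the walkthrough and is shown here only as an assumption-labelled alternative.

```python
import pandas as pd

# Hypothetical frame; the year columns are stored newest first, as in the data.
df = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Africa", "Africa"],
    "2022 Population": [90.0, 110.0, 20.0, 30.0],
    "2000 Population": [60.0, 80.0, 12.0, 18.0],
    "1970 Population": [30.0, 50.0, 5.0, 9.0],
    "Rank": [1, 2, 3, 4],
})

# Hand-written list, oldest year first, so the plot reads left to right.
year_columns = ["1970 Population", "2000 Population", "2022 Population"]

# Keep only the population columns when averaging by continent.
df2 = df.groupby("Continent")[year_columns].mean()

# Years down the index, continents across the columns, one line per continent.
df3 = df2.transpose()
ax = df3.plot()

# Alternative (not from the walkthrough): slice by position, then reverse.
same_columns = list(df.columns[1:4][::-1])
```

Selecting with a list of names inside the brackets is what the extra pair of brackets is for; one pair indexes, the inner pair is the list itself.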
now it's ordered appropriately we have 1970 all the way up to 2022 this is how we
want it let's transpose it appropriate let's run it and now we basically have the
inverted uh image of this now just at a glance and we haven't done anything
16:37:09
to this except for literally what we are looking at at a glance we can see that
from 1970 China here you know Asia and China are already in the lead by quite a bit
and it continues to drastically go up especially in the 2000s like right here it
explodes like just straight up then kind of starts going up and just leveling off
every other continent especially oce Oceana is just really low it it never has done
a bunch let's see look at green green has gone up um from you know Point let's say
16:37:45
0.1 up to about 0.2 so they've almost doubled um in the last 50 years and again
you can just get an overview a high level overview of each of these you know
continents over the span of this time so this is kind of one way that we can you
know look at that use case we're not going to harp on that too long I just want to
give you an example like you know when you're looking at this sometimes you'll have
something in mind of what you're looking for and you go exploring and just kind of
find what's
16:38:15
out there and find what you see um the next thing I want to look at is a box plot
now I personally I love box plots you know they're really good for finding outliers
and there's a lot of outliers I already know this because the average the
25th and 50th percentile are very low and then there's some really just big outliers but
for your data set it may not be that way and those outliers may be something that
you really need to look into and box plots have been something that I've used a lot
where I
16:38:45
found those outliers that way and started to dig into the data to find those
outliers and you know came across some stuff that I'm like oh I have to clean this
up I have to go back to the source really um really really powerful and useful to
be able to find these so all you have to do is df.boxplot() and let's take a look
at it and this already looks good as is maybe I'll make it a little bit wider um
let's do figsize oops sorry figsize is equal to let's try 20 by 10 um okay that
didn't help at all I
16:39:21
apologize thought I would but let's keep going what this is showing us is that
these little boxes down here which are actually usually much much larger because
you have a more equal distribution of of um numbers or values in the small value
this is where our averages lie this number right here is the upper range and then
all these values all these Open Circles those actually stand for outliers so we're
looking at the 2022 population there's a lot of outliers now for our data set
knowing our data set is
16:39:55
really important outliers are to be expected especially when most countries or
continents are small so we're looking at you know all of these little dots are
outlier countries um or outlier values which each value corresponds to a country so
if this was a different data set I would be you know searching on these and trying
to find these so that I can see what's wrong with them if anything or if they are
real um numbers like if this was Revenue everyone's revenue is way down here and
then
16:40:23
there's one company that's making like 10 trillion dollars that'd be an outlier up
here and it would definitely be something that you want to look into for our
data set knowing that you know we're looking at population this is more than
acceptable you know oddly enough but that's what box plots are really good for
showing you some of those quartiles the upper and the lower um as well as denoting
these points that fall outside of those normal ranges for you to look into so
really really useful so
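A rough sketch of the box plot, plus one way to pull out the rows behind those open circles. The data is made up, and the 1.5 times IQR rule shown is matplotlib's default for deciding which points get drawn as outliers; listing them in a table like this is an addition for illustration, not something done in the walkthrough.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values: mostly small numbers plus one very large one.
df = pd.DataFrame({"2022 Population": [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 100.0]})

# One box per numeric column; points beyond the whiskers show as circles.
ax = df.boxplot()
plt.show()

# Recreate the default outlier rule by hand: anything further than
# 1.5 * IQR below the first quartile or above the third quartile.
col = df["2022 Population"]
q1 = col.quantile(0.25)
q3 = col.quantile(0.75)
iqr = q3 - q1
outliers = df[(col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)]
print(outliers)
```

Having the outliers as rows makes it easier to look each one up and decide whether it is a real value or something to clean.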
16:40:50
now let's go down here pull up our data frame again and we've kind of just zoomed
into the whole Eda process there was one last thing that I wanted to show you and
this is the very last thing that we're going to look at we're ending on really a
low point if I'm being honest because the last kind of stuff was more much more
exciting but there is something df.dtypes oops let's do df.dtypes and we'll
run this now just like info it gave us these values but we're
16:41:19
actually able to search on these values now so these um object float and integer we
can search on those which is really great because we can do include equals and we
can do something like number and none of these are numbers right or none of them
explicitly say number but when we run it I'm getting an error Series object not oh
that's because I'm doing um dtypes gives us a series we need to do select underscore
dtypes now let's run this now it's only returning um the columns in this data
frame where the
16:41:55
data types are included in this number so you won't see any you know country or any
of those text or the strings if we want to do that we go in here and say object and
run that and this is another really quick way where we can just filter those
columns to look for specific whether it's numeric um we could even do float in here
and so now it's not including that rank which was an integer so we can specify the
type of data type and it'll filter all of the columns based off of that which you
know
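The data type filtering described here can be sketched like this, on a small hypothetical frame. Note that object is the dtype pandas has traditionally used for text columns, so which columns the object filter returns can depend on the pandas version.

```python
import pandas as pd

# Hypothetical frame with one text, one integer and one float column.
df = pd.DataFrame({
    "Country": ["Fiji", "Japan"],
    "Rank": [1, 2],
    "Growth Rate": [1.01, 0.99],
})

# dtypes is an attribute that lists each column's type; it cannot be called.
print(df.dtypes)

# select_dtypes is the method that filters columns by type.
numeric = df.select_dtypes(include="number")   # integers and floats
floats = df.select_dtypes(include="float")     # floats only, so Rank drops out
text = df.select_dtypes(include="object")      # traditional dtype for text
```

The error in the walkthrough comes from writing df.dtypes(include=...): df.dtypes hands back a Series, and a Series is not a function.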
16:42:28
when you're doing stuff like this you it is good to know what kind of data types
you're working with and look at just those types of data types because there might
be some type of analysis you want to perform on just that whether it's numeric or
just the string or integer columns within your data set so again ending on a low
note I apologize um you know everything else that we looked at all those other
things that we looked at are all things that I typically do in some way or another
when I'm looking at
16:42:55
a data set exploratory data analysis is really just the first look you're looking
at it you're going to be cleaning it up doing the data cleaning process and then
you're going to be doing your actual data analysis actually finding those Trends
and patterns and then visualizing it um in some way to find some kind of meaning or
Insight or value from that data and again there's a thousand different ways you can
go about this it it does typically um you know depend on the data set but these are
a
16:43:24
lot of the ways that you'll clean a lot of different data sets and so you know
that's why I went into the things that we looked at in this video so I hope
that you guys liked it I hope that you enjoyed something in this tutorial if you
like this video be sure to like And subscribe as well as check out all my other
videos on pandas and Python and I will see you in the next [Music] video what's
going on everybody welcome back to another video today we are back with another
data analyst portfolio
16:43:58
project where we will be scraping data from Amazon using [Music] python now you may
be asking do I need to know web scraping to become a data analyst and the answer is
no you absolutely don't need to know it but it is a very cool skill to learn and in
fact I have used it in my job in the past and so it is useful but you really don't
need to know it something that it is used for is kind of creating your own data
sets um and we're going to be looking at one where you can create your own data set
today but there are a lot
16:44:30
of other uses for web scraping and I'm sure I'll talk a little bit more about that
while we're actually walking through the project one last thing I want to say
before we get started is that this is most likely an intermediate project so if you
are just now learning the basics of python this might be a little bit challenging
for you but I still recommend going through it because I will do my best to walk
through everything every single step of the way and and kind of explain all the
concepts
16:44:52
and so you can still learn something even if you aren't super good at python right
now with that being said let's jump over to my screen and get started on the
project all right so we are going to get started and if you didn't watch the last
project I had people download Anaconda uh we use Jupyter notebooks um and I'll show
you how to get to that in just a second but I'll I'll leave this link in the
description if you haven't done that already and you are just doing this project um
but you'll go you'll
16:45:15
download Anaconda you know download super easy um and you're going to open up
Jupyter notebooks I'll launch it right now I already have it open uh but I'll open
up another one just for you know the purposes of demonstration what we are going to
do today and what we um what people voted on I mean there's like there was like
8,000 people that voted um in the poll that I made of what data you wanted me to
scrape there was like Amazon cryptocurrency weather um something else I don't
remember
16:45:44
overwhelmingly I mean like 70% of people maybe even 80% I you don't don't fact
check me on that voted for Amazon um and so I'm going to do it now there are many
things that you can scrape um off of Amazon just a ton of stuff um and I'm going to
show you how to do it I'm going to show you how to make it useful how to make a
data set um and it's going to be really interesting but there are lots of other
ways to do this and so I think um and I have already kind of created it
16:46:17
I'm going to show you how to do it off of this page um when you're actually in an
item and you can scrape you know basically anything in here um and I'll show you
how to do that another thing that is a little bit more advanced and that's why this
first video is starting off I think on the more easy side it's not easy but it's
easier the next thing the next video that I'm going to make is how to actually do
um basically do multiple items right so this item this item this item this item and
then
16:46:48
traverse through the different pages so there are 20 pages um you want all of that data
how do you get all of that that'll be the next project um I don't know when I plan
on doing that I have it like 90% of the way done um but I had this one completed and so
I wanted to get that out to you guys now but that will probably be the next project
I think that is much more difficult um and so if you can understand this one and
you get it and and you understand it then the next project you should be able to
16:47:15
understand too it's just a little bit more complicated so with that being said um we
are going to actually get into the project I'm going to delete one of these um all
we're going to do is go to New do Python 3 it'll open up a new one we'll call this um
Amazon web scraper um project that's what we'll call it did I spell it right perfect um
the first thing that we need to do uh or that we should do is upload um or or or
import our libraries so I'm going to say um import oops what am I doing it's off to
a
16:47:56
terrible start there we go import libraries now I'm not going to write out all the
libraries um I have some things that I'm going to be copying and pasting throughout
this I won't there's only a few things that I'm copying and pasting you can take a
quick glance um some of the things that I just don't want to waste time on um
because this could be a long video I don't know I don't want to waste time on stuff
like this um and so you know I'm just going to copy and
16:48:20
paste it you guys are going to I'm going there will be a link below if you haven't
clicked it already that will go to the GitHub page where you can literally have all
of this code already written I do recommend writing it all yourself because you
will learn it much better I promise CU then you'll make mistakes and you'll figure
it out and all that all that good stuff but you will have that code available so
just go copy and paste it um that's what I would do but what we are we are going to
be
16:48:43
using today is uh something called Beautiful Soup and requests um then we're going to
be using time and datetime and a potential one if you want to get and I'm going to
show you this at the end this is not really part of the project it goes above and
beyond but this Library here is for sending emails to yourself um and I'll show you
how uh you can use it if you want to I already have the whole code written out um
you can just steal it and try it out yourself and see if you can get it to work but
16:49:12
this one is not um as important I'll put it down here so um let's move on now one
thing I want to say before we get too into it is that well give me a second is that
right here in front of me is a different laptop now it took me a solid I would say
you know 10 hours or so to write all of this it took over the course of like two
weeks in my free time I'd pick it up it took me a solid you know two weeks on and
off an hour here an hour there to finish this project um and I made a ton of
mistakes and messed
16:49:50
a bunch of things up and I finally got it to work um you know after a bunch of
revisions that's typically how things go when I do projects and so uh I'm about to
give you a streamlined version of this because I have all the code right down here
and so I'm going to be glancing at this a lot um just so I don't make this video 20
hours of trying to remember all the code off the top of my head I have it written
out already I already did the project it works it's beautiful it's a good project
so um I
16:50:17
don't want to waste your time and I just want you to know that you know you you
nobody should be able to do this off the top of their head in an hour most people won't um
it takes time you make mistakes um but uh let's get started on the project now in
this uh in this what we're going to have to do is we're going to have to tell
beautiful soup and requests where we are actually getting this data from what
website um what is our computer you know some information from our computer I'm
going to again there's going to be a
16:50:53
little copying and pasting in here because you don't ever you will never ever ever
need to know this um but right here we're going to basically connect to the
website so I'm just going to say connect to website and we're going to say URL is
equal to and let's go get our URL so we have this right here so literally just go
up here do you know uh Ctrl+A copy that oops that's the actual project get rid
of that uh paste it in here and that is our URL we will use that in just a second
uh
16:51:28
what am I doing let me just get some room here and then what we're going to need is
something called headers now again you will never ever ever need to know this so
I'm just going to say headers um what I'm going to do is I'm going to copy this I'm
going to show you how to get this really quick um but is something called headers
so uh let me show you how to use how to get this and why you don't need to know any
of this so what this headers is is this something called a user agent you need
16:52:03
to do this for your computer um and you can do that by going to this link right
here so I'm going to put this link in the description so that you can go and get
that and there's something right here called the user agent so all you have to do
is copy this just like this do copy I'm going to go back here and I'll show you
that it's I'm going to copy it in um it'll be the exact same so there you go it's
the exact same um all of this extra stuff Accept-Encoding Accept um this
16:52:38
HTML stuff Connection close all the you don't need to know any of it I promise
it'll never come in handy ever in life actually there will be one person that
comes in handy for and then they'll message me um but we are now connecting um
using our computer using this URL and then what we want to write is we want to write
page we're going to say equals and this is where we start using uh these libraries
so we're going to use requests.get and we are going to pull in that URL and we're
just going to say
16:53:12
headers is equal to our headers right here so uh we have this and this is where
we're going to actually start getting the data bringing in the data um and it's not
going to look like that at first but I'll try to print some stuff out out as we go
along the way so that you can kind of see what it looks like and how we're going to
kind of make it more useful because it comes in very dirty uh when we first get it
and some of the things I'm going to show you will just help clean that up um and
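The connection step described above can be sketched like this. The URL and every header value are placeholders (you would paste in your own product page address and the User-Agent string from your browser), and the fetch_page helper, the timeout and the status check are additions for illustration that were not part of the walkthrough.

```python
import requests

# Placeholder address: swap in the product page copied from the browser.
URL = "https://www.example.com/some-product-page"

# Illustrative header values; copy your own User-Agent string from the browser.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (replace with your own user agent string)",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "close",
}


def fetch_page(url: str, headers: dict) -> requests.Response:
    """Download the page, sending browser-like headers with the request."""
    page = requests.get(url, headers=headers, timeout=30)
    page.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return page


# Usage (needs network access):
# page = fetch_page(URL, HEADERS)
# print(page.status_code)
```

The part that matches the walkthrough is the single requests.get call with the URL and headers=headers; everything around it is just packaging.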
before we
16:53:43
actually go any any further I don't want my head to be here for the entire time I'm
going to get rid of myself so you can just see the page uh I just it's less
distracting uh I hate when I feel like people are always watching me so I want
people to just focus on the code uh so I will see in a little bit let's get back
into it all right so what we are going to do is we are actually going to start
using the Beautiful Soup library all right so we are going to say soup1 is equal
to and this is where we
16:54:11
actually start bringing in Beautiful Soup and you guessed it you're going to say
BeautifulSoup and then in parentheses we're going to do page.content um and again
these aren't really things that you need to remember or need to memorize we're just
pulling in the content from the page that's really all we're doing right now and
and it comes in as HTML so we're going to do html.parser uh and let's see if I can
print out uh actually let me just do soup1 I don't like I don't like doing upper
16:54:40
caps on stuff let's see if anything prints out real quick so we are literally
pulling in all of the HTML um and let me go show you really quick because we're
going to get to this in a second anyways um if you come here this is this is a
static page basically written in HTML um if you have never seen HTML before um you
know actually a lot of this is you know just stuff that most people will never use
uh it's just good to know some of the stuff is good to know so as you see I'm
scrolling on this right side by the
16:55:18
way I did right-click and Inspect or Ctrl+Shift+I whichever one works better for
you but as I'm scrolling over this you should see it kind of highlighting different
areas um it's hard to kind of get what you want let's say we want this title um
what I can do is I can click select element go right here um and then we can select
like a TI the the the header or the title of the the page now I just want to show
you though what we're pulling in so we're pulling in this DOCTYPE html all of
16:55:49
this is coming in so that's what this is right here this DOCTYPE html and we're
pulling every single thing in that is what we're doing right now uh so let's get or
let's go down a little bit let's do soup2 we're just going to do a very uh you
know an upgrade to soup1 basically we'll do BeautifulSoup again and then we're
going to do uh soup1 so we're pulling in that content again so that's soup1 and
we're going to do .prettify() if you don't know
16:56:25
what that is it is common in a lot of different languages and a lot of different
stuff um it just makes things look better it that's really all it is uh I don't
know why I'm using double quotes I don't know why I can you can do single ones if
you want um and now let's do beautiful soup to and it should just be a it should be
better formatted um and let's see if that's true and it is so before if you did if
you could tell it was didn't have basically any formatting it has a little bit of
16:56:56
formatting now um it'll help in a second um and you'll see that but now what we
want to do is go back and we want to actually get the data that we want now you can
get any data you want I'm going to show you simple things really really easy um in
my in in in my opinion it gets more difficult the more complicated stuff you start
pulling um and and you'll understand that as we go into it so what I'm going to do
is I'm going to select this and I'm going to select this um the title I want that
and so if you
16:57:31
do span id it's equal to productTitle so we need to remember that um class we
don't need to know class I believe uh we're going to be using that id this um id
equals productTitle so that's what we're going to be using um class will come in
in the next video when we start looking at these uh but not in this one so let's
remember id equals productTitle so let's go back over here so we have this soup2
it's basically all of that HTML in it right down here that that is what we're
16:58:05
pulling in so we need to kind of specify what we actually want so let's say title
that's what we're going to be getting um and we're going to do soup2 so using
taking all that content um we'll do .find and we're going to do open parenthesis and
we're going to say we want to find that id where it's equal to productTitle and
then we're going to do .get_text and then we're going to do open parentheses so
now let's um let's print the title and see what we get all right so
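The parsing steps so far can be sketched without touching the network by feeding Beautiful Soup a small inline snippet. The HTML below is a hypothetical stand-in for page.content, and the product name in it is made up; the soup1, soup2 and title names follow the walkthrough.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content; a real product page is far larger.
html = b"""<!DOCTYPE html>
<html><body>
<span id="productTitle">
    Example Data T-Shirt
</span>
</body></html>"""

# First pass: parse the raw bytes with Python's built-in HTML parser.
soup1 = BeautifulSoup(html, "html.parser")

# Second pass: re-parse the indented (prettified) version of the same markup.
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")

# Find the element whose id is productTitle and keep only its text.
title = soup2.find(id="productTitle").get_text()
print(repr(title))
```

Printing with repr makes the surrounding newlines and spaces visible, which is the white space that gets cleaned up later with strip.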
16:58:43
that is exactly what we're looking for it's Funny Got Data MIS um T-Shirt that that
is what we're trying to pull in so that's perfect that's exactly what we want we
don't uh let me let me just do this save me some time later on we don't only want
the title we are also going to be pulling in the price so if you can guess uh we'll
be doing some uh a data set on the actual pricing um and so let's go back here
we're going to again use this right here and we're going to go to this
16:59:19
price and it says again we're going to look at this id the id equals priceblock_ourprice
so fairly easy you can copy this I'm just going to write it out um we're
going to say price is equal to soup2.find and then it's going to be again id is
equal to and it's going to be priceblock underscore ourprice did I spell that right
oops excuse me there we go and the exact same thing .get_text parentheses uh and
there's a get_text there's a get all or get all text um so you know that get_text is
a specific
17:00:04
thing that we are using you we might use a different one later on um but that that
is what we have so now let's let's print the title and print when I why do I have
all this too much uh too much space so let's print the title and print the price
let's see what we get okay so we have our title and we have our price I mean you
know I don't know what all this white space is over here um but it looks like
there's a lot of white space over here we'll have to get rid of that uh in
17:00:39
a little bit as we clean it up a little bit you can if you want do things like um
you can get and this is up to you I'm not going to do this right now but I'm just
going to show you how to do it you can get this where you're pulling in the ratings
um which is you know if you want to look at like how the ratings over time or or
what ratings are for specific products that could be really useful um you can pull
basically anything you can go down the product details and look at Dimensions uh
anything you want on this
17:01:11
page it is static so you can go in here and pull anything it's it you just have to
pull it from the HTML know where you're looking pull it in um and now when we go
back here excuse me I'm going to show you now kind of how to use this right because
we have this but how are we going to use it um that's kind of the important part I
think first thing we need to do is clean this up a little bit because it just is
you know if we try to use this it wouldn't be super useful because it'd be just a
17:01:43
little bit dirty it's not super clean um so what we want to do is let's start with
the price why not we're going to say price.strip and that's just going to
take basically the junk the white space off of either side so let's run that real quick
so this is what we have but what we can also do is I don't want that dollar sign I
just want the numeric value um later on we are going to be putting this and we're
going to be um creating a process to put this into an Excel file again we're trying
to create
17:02:18
a data set I don't want you to have to copy and paste stuff it's all going to be
automated basically to input this data into an Excel file for you or a CSV file for
you so um you know think about making it useful in a CSV or in an Excel later on so
what we can do is do a bracket and we're going to do one and a colon which is everything
after that so basically it's just going to take everything after the first character
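A short sketch of that clean-up step, using a made-up raw value in place of the scraped text. The sample string is an assumption; the `strip()` call and the `[1:]` slice are the two operations described in the walkthrough.

```python
# Hypothetical raw value as scraped: padded with whitespace, leading dollar sign.
raw_price = "\n        $16.99\n      "

stripped = raw_price.strip()   # whitespace removed from both ends
price = stripped[1:]           # everything from index 1 onward, dropping the "$"

# The slice is not safe to repeat: applying it again eats into the number.
sliced_twice = price[1:]

# Variant (not from the video): lstrip("$") only removes leading dollar signs,
# so running it twice gives the same result as running it once.
safe_price = raw_price.strip().lstrip("$")
safe_again = safe_price.strip().lstrip("$")

print(stripped, price, sliced_twice, safe_again)
```

Index 1 is the second character, because Python counts positions from zero. The title needs only `title.strip()`, since it has no leading symbol to drop.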
so let's run that and there we go so let's just say price is equal to
price.strip
17:02:52
and pull just everything after that first um that first not value what am I
saying what's the word for that I can't remember the word the first space that's
not the right word but all right let's do the title this is basically going to
be the exact same thing super easy so we're just going to do title.strip and
open parentheses um and we can you know if you want to do this exact same thing so
now we have it it's a little bit cleaner so this is what it originally looked
17:03:23
like and now this is what it looks like so you know nothing super crazy but you
know something interesting to know now we are about to in the very next part what
we are going to do and let me just add a few of these because makes me feel better
um what we are about to do is we're going to create our CSV to insert this data
into the CSV and then later on what I'm going to do is show you kind of how to um
automate this process to pull this data um to create a data set right just pulling
this one time and putting into a
17:03:58
CSV really doesn't do anything you could just copy and paste that and save yourself a
lot of time what I'm going to show you is basically doing it over
time and just having it automated in the background that is what I'm going to show
you um I guess a spoiler but what we need to do is we need to create uh create the
CSV insert it into the CSV and then create a process to append more data into that
CSV um I'm doing a lot of talking let's do some writing so what we need to do is
we're
17:04:33
going to use I should have done this at the top maybe I'll go back and add that
later on we're going to do import csv now in a CSV what you want is you want
headers and then you want the data right so for our headers and we're going to call
it header we're going to do um we're going to do a bracket and let's make the first
one a title because that's going to be uh we can call it title you can call it
product um whatever you want I'm just going to call it because I've been using
title I'm
17:05:01
going to call it title um and then we'll also have price now we need our data so
I'm going to say data is equal to now this is important um right now how our data
is and I can do this right here we're going to do type of title or no let's do type of price
so these are strings and that's important to know um again I don't want to get too
much into you know dictionaries and arrays and lists and and strings and all these
things but this is a string and you can't put that right now it's not super usable
what
17:05:39
we're going to do is make this a list and so I'm doing an open bracket and I'm
going to say our data is title comma price now if I do type of
data and just run that it's a list now and this is important because you can
run into a lot of issues with this stuff it's really important to remember what
type your data is is it a list is it an array is
it a dictionary you know what is it these things are important they do have a
big
17:06:22
impact especially with this type of stuff so just wanted to show you that really
quick but what we are now going to do is create a CSV um you're going to create an
Excel I I call an Excel CSV you know whatever you want to call it so what we are
going to do is we are going to say with and we're going to say open and now we're
going to name our file you can name this whatever you want I'm going to call it uh
Amazon web scraper data set that's real long .csv and then we're going to do
17:07:01
underscore w and that means write oh whoops that's not right I was
wondering why that was in black so we're going to do 'w' which means write
and then we're going to do newline and if you don't know what newline is all
that does is when we insert the data it doesn't leave a blank row in between each row
and then we are going to do encoding is equal to oops is equal to UTF8 and
that is it and we'll just say as let's do f so some of that stuff
17:07:42
you don't need to know some of it's useful this 'w' you definitely need to know this
newline is good to know and I might take it out just to show you
what it actually does because it's annoying if you don't have it I promise but
that newline is important this encoding is good to know I think on a lot of systems
that's the default but it depends on your setup anyways what we're going to do now is
we're going to uh it's something within the CSV within the CSV um Library so we're
17:08:12
going to do something called csv writer oops csv.writer and we're going to
do open parenthesis and that is that and we'll just call that writer and then
this is where we need to actually create the header so we're going to
do writer dot sorry writer.writerow and this is just for the initial
insertion of the data into the
CSV this is what's important the next one that we're going to write is
17:08:55
for when we're actually appending the data which is going to be a little bit
different but anyways we're going to do writerow open parenthesis and this is
where that header is going to go so these headers are going to
be the title and the price and then for our last one we're going to actually write
the data which is this data right here and we're going to say writer.writerow and
we're going to do data so this one we are creating the CSV and then we are
inserting the header
17:09:27
and inserting the data so super easy um yeah I think that's fairly straightforward
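Put together, the header, the data list and the CSV write might look like the sketch below. The sample title and price, the header spelling, the file name spelling and the temporary folder are assumptions made so the example runs on its own without touching your working directory.

```python
import csv
import os
import tempfile

title = "Funny Got Data T-Shirt"   # hypothetical cleaned values
price = "16.99"

header = ["Title", "Price"]
data = [title, price]              # a list: one item per column

print(type(price))   # a single value is a string
print(type(data))    # the row handed to the writer is a list

# Written to a temp folder here; in the video the file sits next to the notebook.
csv_path = os.path.join(tempfile.mkdtemp(), "AmazonWebScraperDataset.csv")

# 'w' creates the file, or overwrites it if it already exists.
# newline='' lets the csv module control line endings (avoids blank rows on Windows).
with open(csv_path, "w", newline="", encoding="UTF8") as f:
    writer = csv.writer(f)
    writer.writerow(header)   # first row: column names
    writer.writerow(data)     # second row: the scraped values

# Read the file back to confirm what was written.
with open(csv_path, newline="", encoding="UTF8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Passing `encoding` explicitly is a small safety measure, because the default text encoding of `open` can depend on the operating system and locale.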
right now let's do this and let's see what happens so I just ran it um let's go
over here in here somewhere Amazon web scraper data set let's open that up and
there we go oh jeez this isn't good can't verify my uh my subscription uh why does
it say $699 I'm going to go back and look but I think I know the issue um but this
is exactly what we want now of course we want more data and
17:10:12
maybe a little bit more useful data um and I'll show you how to get that in just a
second but we just created that out of thin air uh that was not I didn't have that
saved before so we have this data set and the issue was that I ran the price clean-up
multiple times and each run cuts off another character so now it's 6.99 if I do it again it's 99 and if I did it again it
gets rid of everything so I'm just going to run this again run this again now
everything's back to normal okay so now if we run this it's going to
17:10:46
overwrite this Amazon webscraper data set. CSV and it will put the data in properly
so there we go oh jeez guys this is embarrassing I'm embarrassed no I don't want
this okay perfect um guys I if you can't tell I'm in need of some um I'm in need
I'm in need of some help here but I'm just kidding I'm I'm doing fine uh I just I
don't know why that uh why I don't have my uh subscription activated it's not going
to matter for this video I guess but that's really random um so we got
17:11:25
what we need that's perfect now what we want to do after this um I I guess actually
what is important is some more useful data something that I like to do a lot when I
do this type of stuff is I like to have some type of date stamp or
some type of timestamp to know when I collected this data it usually comes in handy
later on I have never regretted putting it in there I'll show you really quick
how you can do it you're going to do import datetime geez I hate having to format
stuff like
17:12:00
that and what you can do is you can do datetime and you do .date.today
open parentheses and that is going to give us this right here and so we're
just going to do today that's what we'll call it is equal to this and we'll say
print today and there we go so that is today's date the 20th of August 2021 so
today is now um is now this so actually I'm going to get rid of that I'm going to
put it back up here I'm going to put it right there I'm going to
17:12:41
run it again let's add this right here we'll do um we'll do we'll call it date and
then we'll add today and we'll just run this again and what we can do just to check
the data without having to open up the data every single time which is super
annoying is we're going to use pandas again I should have imported this at the top
I'm just kind of I'm not doing this off the top of my head but I didn't have
it 100% planned so import pandas as pd and we're just going to say pd.
17:13:18
read_csv and then we'll read it in what you can do or what I often do is I go to
properties and I go right here and we'll say boom boom back slash this right here
this I am doing off the top of my head I don't do this often I think I have this
memorized by now uh I I I hope and then we'll do print oh no we don't have to do
print we'll just do this uh what do I do R let's actually call this um data frame
and we'll do print let's see what happens perfect okay so what we have now is the
new our
17:14:07
new header our new data that we added in there so we have our title we have our
price and we have our date now again you can customize this whatever you want to
add go back here um you know find what you want you know do you want it to make
sure it has a men's option or different colors or you want to pull in this
information whatever you want it it really does not matter um just matters that you
know you get what you need for whatever purpose whatever you're making this for
this is more of an introductory
17:14:39
video on how to scrape data from Amazon the next video will probably be a little
bit more difficult and in-depth but this one is to get you guys started
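The date stamp and the pandas check described above can be sketched together like this. The sample row, the header spelling and the temporary file location are assumptions; `datetime.date.today()` and `pd.read_csv` are the calls named in the walkthrough.

```python
import csv
import datetime
import os
import tempfile

import pandas as pd

today = datetime.date.today()   # a date object, e.g. 2021-08-20 on the day of recording

header = ["Title", "Price", "Date"]
data = ["Funny Got Data T-Shirt", "16.99", today]   # hypothetical sample row

csv_path = os.path.join(tempfile.mkdtemp(), "AmazonWebScraperDataset.csv")
with open(csv_path, "w", newline="", encoding="UTF8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)   # csv.writer converts the date to text with str()

# Read the file back into a data frame instead of opening it by hand.
df = pd.read_csv(csv_path)
print(df)
```

By default `read_csv` leaves the `Date` column as text and parses `Price` as a number. On Windows a raw string such as `r'C:\some\folder\file.csv'` (placeholder path) keeps the backslashes in a file path from being read as escape characters.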
um we now have this and this is beautiful now something that you want to do when
you're scraping data and you're getting um I guess data over time and that's kind
of what we're doing is going to be almost like um a price tracker over time is you
want to then append data to this so we can't only create it and that's what this
does because if I
17:15:16
run this 100 times it'll only give me this first row we need to now append data to
this so let's pull this down here again I haven't added a
bunch of notes I'm going to say now we are appending data to the CSV I haven't
added a ton of notes I'll try to go back maybe afterwards and add some notes for
people who like to read notes so what we are now going to do is we're going to
change this 'w' to an 'a+' now this is going to be how we append
17:15:50
the data and we no longer need the header so we aren't going to write the
header anymore and there we go so now instead of
creating that header again and creating that first row of data again we are leaving
the existing data alone and going to the next free row and appending data which
means adding data on to the end and so if I run this which I'm not going to right
now I mean why not I can run it and then we can read this in so now
there's our data
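The difference between creating the file once and appending to it afterwards can be sketched as below. The helper name `append_row`, the sample row and the temporary file location are hypothetical; the `'w'` and `'a+'` modes are the ones used in the walkthrough.

```python
import csv
import datetime
import os
import tempfile

csv_path = os.path.join(tempfile.mkdtemp(), "AmazonWebScraperDataset.csv")
header = ["Title", "Price", "Date"]
data = ["Funny Got Data T-Shirt", "16.99", datetime.date.today()]  # sample row

# One-time creation: 'w' starts a fresh file, so the header is written here only.
with open(csv_path, "w", newline="", encoding="UTF8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

def append_row(path, row):
    """Add one row to the end of an existing CSV without rewriting the header."""
    # 'a+' opens for appending (and reading); every write goes to the end of the file.
    with open(path, "a+", newline="", encoding="UTF8") as f:
        csv.writer(f).writerow(row)

# Simulate running the append cell three more times.
for _ in range(3):
    append_row(csv_path, data)

with open(csv_path, newline="", encoding="UTF8") as f:
    rows = list(csv.reader(f))
print(len(rows))
```

Plain `'a'` would work the same way here, since the file is only written to. Re-running the `'w'` block is what resets the file back to a single data row.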
17:16:26
I'll run it a few more times I ran it like three or four more times I read
that in and there we go now it's all the exact same data super boring but
very good to have now we don't want to have to come in here and run
this every day let's say we're going to do this daily we don't want to have to
come and run this every single day right we want a way where it does it while
we sleep it does it in the background of our laptop um and is easy to do right I
don't want to
17:16:57
come in here every single morning with a set an alarm on my phone every single
morning come in here I want to automate this so uh how are we going to do that give
me one second uh if you didn't know I have three kids and one of them is waking up
I'll be right back all right I think he is asleep um at least let's hope he's
asleep so now what we're going to do is we're going to put this all into this
check_price now you may never have used oh geez what are these things called
oh my
17:17:38
gosh super used all the time you'll know what I what it is uh not a function I
don't even remember what it's called maybe this's a function um I can't think I'm
having like a writer's block or whatever that is we're going to put it all in here
and then we're going to be able to use this price check later um because we want to
be able to automate this so let's go back all the way up here we are going to use
this so let's copy all of that in and oh jeez I hate
17:18:17
this all right everything just like that um so this pulls in our data pulls in uh
or or yeah pulls in all of our data down to the title and the price we want to make
it look right so we're going to put it right here so now we have it formatted
properly um we want to add our date time do it just like that I don't know if
there's a better I'm sure there's a better way to do this um then we need need this
right here and just like that like that so now we have our header and our data and
then
17:19:09
we want to pull this in right here boom boom boom okay so everything that we just
wrote out we are now putting into this check price now you can call it whatever you
want doesn't matter but let's run that see if we get any errors we don't so this is
now good to go basically um what we are going to use this for um and what this is
going to do is we are going to put this on a timer um you know have you ever wanted
to like check something once a day once every 10 seconds once a minute whatever you
want
17:19:52
and you don't want to have to actually pull up your phone and look at it this is
how we are going to do that so we had something called let's see time this
library time right here that's what we're going to use right now so we're going to
say while oops while True and do a colon we're going to say
check_price that's what we just wrote out and we're going to do time.sleep now this
is completely up to you how how much time you want to put in here for
17:20:28
the purposes of demonstration I'm going to put 5 Seconds which means every 5
Seconds it is going to run through this entire process and so let's run this really
quick and I'm going to run it for let's say 30 seconds and then I'm going to pull
this in right here so we just looked at it earlier we had four um well five rows of
data right what we are going to do is in just a second I'm going to stop this you
know maybe after 30 seconds or so we're going to see how much data is in
17:21:05
there and let's stop it right now it's been going long enough and let's run
it so now we have five six seven eight so I guess it ran for 20 seconds that
was for demonstration purposes I would never do anything every 5
seconds unless it was like Black Friday on Amazon we can put this as long or as
short as you want you can run it every second if you want um that doesn't make
sense to me but you can what we can do is do a little bit of math uh and I don't
know this off the
17:21:40
top of my head so I'm going to uh do the math with you live pretty exciting stuff
got the calculator out so there are 60 seconds in a minute and this goes by seconds
by the way and you could put some expression up here to
calculate this but I'm just going to put in the number because it's easier
maybe not easier I'm just going to do it there's 60 seconds in a minute there
are 60 minutes in an hour so that's one hour and we can do 24
hours in a day so
17:22:18
that's 86,400 I believe did I read that right oops did I read that right
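The timer and the seconds-per-day arithmetic can be sketched as follows. To keep the example finite, the `while True` loop from the walkthrough is shown in a comment and replaced by a bounded helper; `run_repeatedly`, the `calls` list and the stand-in body of `check_price` are hypothetical names and placeholders, not the video's code.

```python
import time

# 60 seconds * 60 minutes * 24 hours = seconds in one day.
SECONDS_PER_DAY = 60 * 60 * 24

calls = []

def check_price():
    """Placeholder for the scrape, clean and append steps; it only records the call."""
    calls.append(time.time())

def run_repeatedly(task, interval_seconds, max_runs):
    """Bounded version of: while True: task(); time.sleep(interval_seconds)."""
    for _ in range(max_runs):
        task()
        time.sleep(interval_seconds)

# Demo: three quick runs. For a daily check the open-ended form would be
#   while True:
#       check_price()
#       time.sleep(SECONDS_PER_DAY)
run_repeatedly(check_price, interval_seconds=0.01, max_runs=3)

print(SECONDS_PER_DAY, len(calls))
```

Writing the interval as `60 * 60 * 24` instead of a typed-in number keeps the calculation visible and avoids misreading the calculator.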
yes so this now if I ran this and I'm going to this is going to check the price
every single day and this is the entire point of this um of of this project not the
entire point but this is a big part of this project is we want to create our own
data set now something that I personally really love is a data set that
I can do some type of time series with now this is not exciting it's
probably not super
17:23:00
exciting for this right but you get the idea that if this price were to change we
would then see that reflected in the data at some point you can do this on any item
you could ever imagine on Amazon it's the exact same process and some items change
often this t-shirt will most likely never change um and so you know again this is
for for demonstration purposes the code itself will be nice to put in a project
although the data set that you get from this probably won't be the best I would
imagine but notice that this is running
17:23:38
um I can then minimize this and this can run on my computer basically as long as my
computer is working one thing I will say before I go on to some more stuff
is that I personally when I
created this I did something similar and I put this in Visual Studio Code and I
didn't put it in Jupyter notebooks that's a personal preference I would look into
that if that is something that you want I think Visual Studio Code is a little
bit easier for automating these
17:24:17
types of tasks um but for illustrative purposes and for demonstration purposes you
cannot beat jupyter notebooks that's why I did it so with all that being said that
is basically the end of the project now um I'm not going to stop this and read it
again but you get the point um we now have um a data set that oh jeez all this
again that now has um data I'm getting out of here oh geez it's hounding me let me
get out of here oh no all this is embarrassing guys I'm embarrassed we now have a
CSV file with
17:24:55
data in now you can run this in the background of your computer I have
done it I've run it for weeks I have run it for months if you restart your
computer just come back in here and restart running this process um it's the same
for any automated process unless you start using some online um automation service
which will run it regardless of your computer they do it you know either in the
cloud or on some um server so you know that this is a really good option again if
if you restart your
17:25:28
computer or something happens and you lose connection just come in here and run
through this script again except for the cell that opens the CSV with 'w' which wipes
your data don't run that one again only run that one time and then in fact what I
would do is then um I would just comment this out right I'd come in here and I
would just comment this out so that anytime I come back in here I would never
accidentally delete all my data but that is what this project does now something
really interesting something
17:26:01
that I have done in the past that I thought was really cool really useful I
actually did it for um I actually did it for some watches that I was watching
especially on Black Friday that's when I used it I was interested in a price drop or
specific price change and what I
basically did was I said if the price is lower than let's say we
wanted it to drop below $14 it would then send an email and I'm going to show you
the script that I
17:26:46
used it still works um and if this is something that you are interested in this
could be a completely different project I just think it's interesting and I wanted
to show it to you although I wouldn't say this this is part of the um final project
let me just come in here and we are going to create this super simple um not super
simple we're sending a mail we're connecting to a server we we're using Gmail we're
logging into our account that is my email you will not get my password we're
17:27:19
creating the subject the body we configure or just kind of create this
message and then we send the mail so then I have this def this send_mail I
am blanking on what this is called I'm going to call it a function which is what
def creates so if that price drops below a certain point it'll send me an
email um I have used this and I used it and was able to buy a watch that was like
you know let's say 140 bucks for like 90 bucks um on Black Friday sale I was really
really happy about
17:27:51
that so this can be used in that way as well not something you have to write into
your project just something I'm going to include down here if you want to try it I
think it's super interesting something really fun to mess around with
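A rough sketch of the price-alert idea described above, built on Python's standard `smtplib` and `email` modules. This is not the script from the video: the function names other than `send_mail`, the addresses, the message text and the threshold value are assumptions, and `send_mail` is defined but never called because it needs real credentials.

```python
import smtplib
from email.message import EmailMessage

PRICE_THRESHOLD = 14.00   # hypothetical alert level, as in the "below $14" example

def should_alert(price_text, threshold=PRICE_THRESHOLD):
    """Return True when the cleaned price string is below the threshold."""
    return float(price_text) < threshold

def build_message(sender, recipient, title, price_text):
    """Create the email with a subject and a short body."""
    msg = EmailMessage()
    msg["Subject"] = f"Price drop: {title} is now ${price_text}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{title} has dropped to ${price_text}. Now is your chance to buy.")
    return msg

def send_mail(msg, username, password):
    """Log in to Gmail's SMTP server over SSL and send the message."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, password)
        server.send_message(msg)

# Placeholder addresses; nothing is sent in this sketch.
message = build_message("me@example.com", "me@example.com",
                        "Funny Got Data T-Shirt", "13.50")
if should_alert("13.50"):
    print(message["Subject"])
```

Inside `check_price` the call would sit behind the same `should_alert` test. Gmail accounts generally need an app password rather than the normal account password for this kind of login, so treat the credentials step as something to verify against your own account settings.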
I enjoyed this so with that being said uh this is this is the project um I in the
next one and I promise you this one is probably going to get a lot more difficult
if you thought this one was easy which I hope maybe I hope you do then that means
you're you know
17:28:25
pretty good at python you know in the next the next um web scraping project and I
hope to do many of these I might do um even all the ones that I put in that poll
but I started with the one that was the most popular um you know if you were able
to get through this I think that that is fantastic I think this is a solid project
to create um a data set and so use this how you will you can copy my code exactly I
don't have a problem with that again I don't think this is beginner there are some
a little bit
17:28:57
more advanced things and I not even Advanced just like intermediate level things um
that you kind of learn as you get into it and so um I hope that this was
instructional I hope I explained it you know well um and I hope that this is useful
again you know when you actually use this you'll have 22 23 24 25 you know you'll
see a price change a price change a price change a price change go use a a product
or go to something that you were interested in or that you know fluctuates often um
and there are plenty
17:29:32
of those on Amazon I promise you there some that literally change almost every
other day like down a dollar up a dollar um and then Black Friday just goes crazy
um with these price changes so use this as you will I hope that this was
instructional I hope that it's useful I think I said that before is you know I'm
doing this because I think it's really interesting it's really useful um um this to
me again was a good introduction a really good introduction to web scraping because
in this next one
17:30:04
it gets quite a bit more difficult um I would say on a scale of like difficulty
this is like maybe a four and it'll probably jump up to like a seven on this next
one um just just much more um technical or or coding heavy so um you know look
forward to that if that's something that you look forward to with that being said
I'm going to go back over here for my send off with that being said I hope this was
helpful I hope that you learned something um don't get mad at me if it was too easy
don't
17:30:38
get mad at me if it was too hard I'm doing my best over here so I
appreciate your patience thank you so much for watching I really appreciate it if
you like this video be sure to like And subscribe below and I will see you in the
next [Music] video [Music] what's going on everybody welcome back to another video
today we're going to be creating a script to automatically take data from a crypto
[Music] API now this project stems from an earlier video that I did where I walked
17:31:19
through what an API was and how you can use it and in that video I showed you how
to use CoinMarketCap's API so you could start pulling in their crypto data and in
this video we're going to take it one step further and automate that process now
we're going to do a little bit of transformation with the data I'm going to show
you some cool stuff of how you can use it and maybe we'll do a little bit of
visualization at the end but that is not the main point of this video it's mostly
around the automation
17:31:41
piece and a little bit of the data cleaning piece as well now fair warning this is
not a beginners level project it's probably more like an intermediate project and
it's not even a complete project per se because we're not doing all the data
cleaning we're not doing all the visualizations but but if you follow along we're
going to cover a lot of different things and you're really going to set yourself up
to be able to do just about anything you want with this data or different apis that
you
17:32:04
pull from so with that being said let's jump onto my screen and get started with
the project all right so this is where we stopped in our last video so if you
haven't watched it now is the time to go back and do that I'll have a link in the
description also all the code that we're going to be looking at today and working
through is going to be in a GitHub repo below so you can go and get all the code
and have it completely finished and just follow along or you can code it from
scratch along with me I do recommend
17:32:30
writing it from scratch if you can because I think you'll learn more and you'll
make mistakes and you'll learn from that as we go through it but it is up to you so
let's get started and as you can see uh we have the script right here and I'm
starting basically from scratch I have a completed one up here I'm actually going
to get rid of those um and what we're going to do is we're going to start from
exactly where we left off in our last one I'm going to run the script this is
going to pull from
17:32:55
our API and we're going to look at the dictionary set our options and do our
json_normalize so this is where we literally left off from the last video so we
have all of this data and what we want to do with it is we want to kind of automate
that process right because we don't want to have to come in here run this and you
know put into a CSV manually or something like that we want to automate this data
collection process so that we can just have the data ready for us to use um and
17:33:30
it all be ready to go so we're going to be using this script um but you know we we
might want to add a little bit more to it before we do that uh the first thing that
I want to do before um before anything is something that I like to do when I'm
creating these automation scripts I like to add a timestamp and the reason
for that is because I want to know when each of those loops you
can say runs through and does those automated runs right so if I do it every day
I want to know
17:34:04
what time of day I ran it making sure each run ran successfully and so all I'm
going to do is I'm going to add a new column at the end and just call it timestamp
so let's go right up here and we're going to say pd dot and there's something
called to_datetime so we're going to do to_datetime and then we're going to
do 'now' and what this is literally going to do is take the date the
timestamp of right now when it's running and it's going to show that now we need to of
17:34:42
course add a new uh a new column for that so all we're going to do is we're going
to say data frame and let me see real quick we just
have the data we need to create this data frame right here so data
frame equals and then this json_normalize and we're going to say data frame and
then we're going to do a bracket and we're going to say timestamp and
are all these lowercase we're going to keep with the lower case we're going
17:35:12
to say timestamp and we close that bracket and we'll say equals so what this is going to
do is first off it's going to assign this df as our data
frame and then we're going to add this timestamp as a new column
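The two steps just described, flattening the response with `json_normalize` and adding a `timestamp` column, can be sketched like this. The `data` dictionary is a hypothetical, heavily trimmed stand-in for the API response, and its field names and prices are made up for illustration.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for the parsed JSON that the API call returns.
data = {
    "data": [
        {"name": "Bitcoin", "symbol": "BTC", "quote": {"USD": {"price": 47000.0}}},
        {"name": "Ethereum", "symbol": "ETH", "quote": {"USD": {"price": 3200.0}}},
    ]
}

# json_normalize flattens nested dictionaries into dotted column names,
# for example "quote.USD.price".
df = pd.json_normalize(data["data"])

# A single value assigned to a new column is repeated on every row,
# so each row of this pull carries the same run time.
df["timestamp"] = pd.to_datetime("now")

print(df)
```

Because the value is computed once per run, every row from the same pull shares one timestamp, which is what makes it useful for telling runs apart later.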
let's run this really quickly and let's go all the way to the right and this is our
timestamp and this is the time uh that it is right now this is the day that I'm
running it this is the time that I'm running it and so this
17:35:41
is working properly now if you look really quickly there is a last updated in here
and this is very close to this timestamp but it is not the same thing if you
look through this data and you really dig into it a little bit this last
updated is coming from CoinMarketCap's API and this is when the actual
cryptocurrency was updated in their system and so it is going to be really close
but it's not going to be exact and so I don't like to rely on built-in ones that
you know are coming
17:36:14
from an API or something I want to make one myself that's running on the system
where I'm creating the automated process just like just something I do um so now we
have this original data frame created right we now have what we need but what
we want to do is to keep adding data to this um we don't want it to just go to um
you know create these 5,000 rows we want it to create 5,000 5,000 5,000 over time
whether it's a day an hour a week um whatever you want to run it so um what I'm
actually going to do
17:36:50
is I'm going to limit this a lot I just want to look at the top let's say 15 so
we're going to do that that we're going to run through all this again so now I just
have top 15 it's going to be um easier to to see and it won't take as much time to
run our scripts again you can keep as many as you'd like if you want a 100 200 all
5,000 you do whatever you'd like but what we are now going to do is we're going to
create a function using this original script so we again
17:37:19
we have this data frame and we are going to create an automated process
a script that is going to append data to this data
frame right here so that's kind of you know the big thing that we're trying to
accomplish in this project um so let's go up here and we're going to we'll just
take from here all the way to here we just going to copy this and going to paste it
down here now what we need to do is we need to create a function so we're going to
say
17:37:52
def and we're going to call this api_runner because this is going to run our
API whenever we need it to run now when you are formatting something for a
function it needs to be indented properly and so what we need to do is
go over here hit tab we're going to do this all the way down I'm just going to skip
forward when it's all the way done all right so now we have this URL and what we
want to add because this is again this is going to run through kind of
17:38:22
this automated process, we're going to run this function there, what we
want is to also add this right here. So we need to take this and
add it; we'll just put it down here. Okay, let's do that. So what we have
so far is really close to what we want our function to be. We have this function
that we're going to be running through; it's going
to call the API, we're going to use our key, we are going to, um,
17:39:00
you know, test it, load it, and format it right here. Then we're going to add
this timestamp, and then we will have this. Now, right now it's just going
to print this data frame basically, but that's not what we want. What we
want is to actually append this data. So when it gets to here, when it gets to this
data that's going to be right here, what we want to do now, since we already
have the original data frame set up up top, is say that this is going
to be data
17:39:29
frame two, and we're going to say it's going to append it to data frame two. So the
original data frame, we're going to say df2.append, and we're going to say
df2. All this does is say that this new data that's going to be coming in every time,
let's say it's a loop and it's just looping through, pulling the data, pulling the
data, pulling the data, we're going to create this data frame, we're going to add
this timestamp like we want, and then we're going to append that
17:39:59
to this original data frame. So as of right now this looks good. We'll run it
in a second; I'll create it. So I just created it. Now we need to actually create
our script to automatically run this, so we're going to do something called import
os. Let me tell you, there's a thousand different ways to do this, and there are
better ways to do this, but they are much more complicated, and
some cost money. I'm going to show you different options on how to
do this in future
17:40:33
videos on how to automate your Python scripts, but this one is one I've used a
lot, many times, for different projects, and it works. So I'm not going to show
you the most complicated thing in the world; I'm going to show you something that
I've just used a lot. So we're going to say from time import time, from time
import sleep, that one's important, and now we're going to create our loop. So what
the time and the sleep and the os, your operating system, what
these are
17:41:07
going to do is give us the ability to track the time, and we're
going to be able to run through and call this function at whatever intervals
we want. So let's create our for loop. We're going to say for i in; now, you can create this
specific part in different ways, but what I'm going to do is say range
of one, uh, let's say 333. I say 333 because, if you remember from the first video on
the API, you only have 333 runs per day, so if I ran this 333 times today,
that would be
17:41:49
our max, and that's why I'm using 333, just for reference. So now we're going
to do api_runner, so in this loop we're going to call this function up here. Then
I want to have an output to show that this is
running through successfully, and you can write anything here;
we're just going to say API Runner completed, uh, completed successfully.
How do you spell successfully? That doesn't look right. I'm just going to say
completed.
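As a rough sketch of the loop being built here (not the exact notebook code), the idea is: call the function, print a confirmation, pause, repeat. The helper name run_on_schedule, its parameters, and the stub api_runner below are assumptions made for illustration; the pause uses sleep, which counts in seconds, so 60 gives the one-minute interval used in the video.

```python
from time import sleep


def api_runner():
    # Hypothetical stand-in for the real function that calls the API
    print("pretend API call")


def run_on_schedule(job, runs=333, pause_seconds=60, wait=sleep):
    """Call job `runs` times, pausing after each call; return how many calls ran."""
    completed = 0
    for i in range(runs):
        job()
        completed += 1
        print("API Runner completed")
        wait(pause_seconds)  # sleep() takes seconds, so 60 means one minute
    return completed
```

Calling run_on_schedule(api_runner) would run for several hours with these defaults, so for a quick trial something like run_on_schedule(api_runner, runs=3, pause_seconds=1) is easier to watch.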
17:42:29
All right, forget that, I don't remember how to spell successfully. If
it's spelled right, you guys spell it that way, but I can't remember. Now we're
going to use this sleep right here. This counts in seconds; you can change it
to minutes, hours, whatever. We're going to have it run every minute, which is every 60
seconds, so I'm just going to say it's going to sleep for one
minute, and then we're going to say exit. So all this is going to do, and this
17:43:02
is again fairly simple, it's just a simple for loop, is that it's going
to call this API, it's going to tell us that it ran successfully, and then it's going
to wait for 60 seconds and run again. That's it. So let's run this and
see what happens, see if what we did works. So it ran the first time. Now, I'm
not going to bore you; because I'm doing this live, exactly what we're about to
get is what we're going to use. I didn't run it
17:43:30
overnight or for a week so that we have a bunch of data. What you are going
to work with, I'm going to work with as well. So I'm going to wait a few minutes, I'm
going to let this run, and I want you to do the same thing. I'm going to let this run for
maybe five minutes or so, and we'll work with what we have and keep going
with the project, because again, the point of this project is not to create
the final product or create all the visualizations; that will most likely be in
17:43:56
another video where we're taking all this data and doing all these things with it
the point of this video is to automate it clean it up to where we have it to where
we can really use it and then I'm going to let you guys loose and you guys can do
whatever you want with it and I think it's really setting you up for a lot of
successful projects in the future that you can do all by yourself without me having
to walk you through it so as you can see it's already ran through twice I'm going
to
17:44:20
pause for a second I'm going to let that run through uh just a few more times and
then we will continue with the project all right we are back and of course it's
only ran what five times um it has not reached the limit of 333 so we are perfectly
fine what I'm going to do is I'm just going to stop this by clicking this uh square
up here and it's going to give us some error and then we're going to check it and
we will see what we have I don't know why it's taking so long if
17:44:46
I'm being honest. All right, so I interrupted it. Let's run this and see what we
got. I hope we have more than 15, because if not I'm going to be very upset. Okay.
Well, I made a mistake. I was supposed to put df right here and I had
df2, so change your script; do not do what I just did. It's
supposed to be df.append, and we're supposed to
be appending this data frame two to the original data frame. So,
I messed up
17:45:26
on that one. Let's rerun that. Let's see: local variable 'df'
referenced before assignment. Okay, this is perfect, because this happened to me before.
We're running into all sorts of good stuff. I like to keep this stuff in my
videos. I laugh because I hate running into mistakes, but everybody says they're
happy that I do this, so I'm going to keep doing it; I'm not going to cut this out,
I promise. But what we actually need to do is go back up to this
function,
17:45:58
because what happened is we assigned this data frame, and because
it's in a function, it's what they would call a local variable. What we need to do
is state that this is a global; it's just called a global, that's
all it is. So what we're going to do is hit Tab and say global
df, and what this should do is declare it as a global variable and
let this run properly. Let's hope it does. All right, it's
17:46:35
running. Again, I run into mistakes. Let me tell you something while we're here
for just a second: on this project I ran into probably a hundred mistakes or
errors, issues that I had to research for hours and hours. I'm legitimately on
Stack Overflow and just Googling and figuring these things out. There were a lot
of new things that I had never run into before, just on this project. So
everything that you're seeing is from after I went through all of those things, or
after I
17:47:06
fixed all of those things and had to really work through them it was it was very um
it was frustrating at times I just I couldn't figure it out and so what you're
looking at is kind of the polished version of that now that I have everything laid
out because I I can't spend 10 hours on a project nobody would watch it so just
know that if you are running into some of these mistakes or you run into mistakes
later on when you're expanding this project that's completely normal so what we're
going to
17:47:31
do is let this run for a little bit, and then after maybe three or
four minutes we'll come back and keep going with the project.
Let's run this and check whether we have the data that we're looking for,
and it looks like we do. Let's actually go back up here really quick. We want to
set display.max_rows, because I want to be able to see all the rows and not
just a few of them. That just gives us this scrolling
instead of that
17:48:07
dot dot dot that shows us just a few so there's our original 15 and then we have
the next um the next Loop and then we have the next Loop and let me scroll over to
the timestamps and I'll show you what I mean um so was ran on 52651 let's go down
526 at 150 2905 I say 1501 2905 and then the next one you can see was ran at 36 31
these are all the ones one minute after each other my original one was from earlier
32 33 yeah so you can see 32 31 3030 or um 3029 and this one was about 15 minutes
ago when I first
17:48:54
um ran the original data frame right all right guys this is Alex from the future
I've actually completed this entire project uh in the video and you're about to see
all that after this but I wanted to show you one more thing that you can do in this
function up here that I didn't show you uh originally that I'm coming back to show
you and that's how to actually put it into a CSV now all we've done in this one is
we we've kept it all enclosed in a data frame and that's it and that may be great
but a
17:49:24
lot of you guys are going to want to automate this and put it into a CSV, and I want
to show you how to do that. All right, so what I'm going to show you really quickly
is right here; in this folder I have all these different API
3s and 4s. These were tests that I did before. But what you can do, instead of
just putting it into a data frame, is actually append the data to a CSV and
have that CSV sitting out there for you instead of just keeping it all in the data
frame.
17:49:51
There are a lot of different uses for that. You may want to have that file
separate from here just in case something times out or something breaks, which is
a legitimate concern, or your computer shuts off or something like that.
So what we're going to do is say if not, and
this is basically an if statement, we're going to say os.path.isfile. What
this is going to do is check if there's already a file under this name. And we're
going to do r
17:50:25
the letter r. If you've never done CSV stuff before, it's
really important that you put that r, or you're going to get an error every time. So
we're going to take this right here, copy it, and
put it right here, and then we're also going to do a slash and then
name the file. Let's name this API, because I don't think I have that one in
there; I think I deleted it. Yeah, so I don't have API, so
17:50:54
I'm just going to keep it API.csv, and then I'm going to close that parenthesis and
add a colon right here. We're going to say, if that does not
exist, we are going to write this to it and create it. So we're going to say the data
frame, that's this data frame right here, dot to_csv, and
we're going to do that r and then copy this path, so let's just
replace it like that. And then we're going to say
17:51:35
comma header, oops, header is equal to column_names. So what this is going to
do, if we run through this, and we would have to, um, I'll talk about
this in a little bit, we'll have to change this up a little bit, but what this is
going to do is check whether this file right here exists. If it does not,
it is going to create it and create the column headers based off this data
frame. That is what that does. Now what we want to do is say else, and this next part
that we're
17:52:15
going to write is saying, if the API file is already there, we want to append
the data; we don't want to overwrite it or anything like that.
So we're basically going to copy this, maybe not the
whole thing, but I already did it, so we're going to copy that and
say mode, oops, mode equals 'a', and a stands for append. Then we're going to say
header, oops, keep messing up, header, and we'll say False, oops, we're
17:52:48
going to say False, which means when it appends the data it's not going to write
the column headers every time. You don't want that, because if you added the
headers every time you append, then every 15 rows you're going to have another
header row that you're going to have to go out into that CSV and filter out
and get rid of. So we're going to say header equals False. Now, just a second ago
I said you would need to mess with this just a little bit, and you would, because
every
17:53:15
time you'd be writing out this data frame, which is already being appended to,
so you'd be creating a lot of duplicates if you kept it
exactly as is. What you need to do is basically take it back to its
bare bones, so you need to kind of keep it like this. Then you can
just run this and it would work perfectly. Let's test it really quick
to see if it works, because I'm promising you something and I want to make sure it
actually works.
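The create-or-append logic described above can be sketched roughly like this. It writes only the newest batch on each call (to avoid the duplicates just mentioned), creates the file with headers if it does not exist yet, and otherwise appends without headers. The function name append_to_csv, the temporary folder, the sample rows, header=True, and index=False are all assumptions for the sake of a self-contained example; the video writes to a fixed path with an r-prefixed string and keeps the index.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(batch, path):
    """Create the CSV with headers on the first call, append rows afterwards."""
    if not os.path.isfile(path):
        # File does not exist yet: write it, including the column headers
        batch.to_csv(path, header=True, index=False)
    else:
        # File exists: mode="a" appends, header=False avoids repeating headers
        batch.to_csv(path, mode="a", header=False, index=False)


# Assumed location: a throwaway temp folder instead of a real Documents path
csv_path = os.path.join(tempfile.mkdtemp(), "API.csv")
batch = pd.DataFrame({"name": ["Bitcoin", "Ethereum"], "price": [29000.0, 1800.0]})

append_to_csv(batch, csv_path)  # first call creates the file
append_to_csv(batch, csv_path)  # second call appends two more rows
print(pd.read_csv(csv_path))
```

Reading the file back with pd.read_csv is just a check that the rows from both calls landed under a single header row.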
17:53:47
let's run it this time okay so it just ran for the first time so it should have
created this file let's go see if that works properly so now it just created that
file and now we're going to see if it actually appends the data so let's wait just
one time um and then I'm going to stop it I'm going to see if it works again I'm
just verifying to make sure that what I'm telling you is actually working uh
because if it doesn't I would feel terrible we don't want that and while
17:54:16
that's running, I'm actually going to add this, because now I want to show you how to
call it. Super easy: we're just going to do pd.read_csv, we're going to
point it at the file just like that, and then we're going to name the data frame. We're just
going to do 72, something random, because I've already done this whole project and I
don't want to mess anything up. So we're going to say data frame 72. So now let's stop
this, and once that stops we're going to run this and
17:54:58
see if it actually worked and make sure that this actually pulled the data
in. All right, so we interrupted it; the file is ready to be read in, so let's read it
in. There's our file. Let's see, did I mess anything up? Ah, I
didn't mess anything up. This is the index from the file, and we already had it in
here; we'd probably be able to get rid of it. But if you look, we have zero, one, two, three,
four, five, six, seven, eight, nine, up to 14, then we have zero, one, two, three
again. And if we look at the timestamp, it should be one
17:55:33
minute apart so it's 11 1945 it said 12045 so this worked exactly as planned um
again you have two different options you can just keep it how it was before and
I'll leave both of those options you know in the in the script so that you can kind
of choose which one you want but um that's how you do that so then right here
you're appending it to a CSV file and then if you just keep this and you get rid of
all this you're just appending it to a data frame now please continue with the
17:56:04
rest of the video that I already have done um but again I'm future Alex so uh
please continue with the rest of the video okay so we have all this data we have we
have so many columns we can do now you know if you want to completely just go and
do your own thing you absolutely can do that I'm going to mess around with a few
things um kind of show you something that I did that I thought was really
interesting um in order to visualize this data a little bit and transform it a
little bit to make it
17:56:36
more usable, but we're not doing a full data cleaning; that's not what this
project is. A full data cleaning of this data would be a
very large undertaking, because honestly this needs a lot of work. One thing that I
do want to clean up really quick is this right here. The math will be
fine; it's just that the way it's shown here is in scientific notation,
and I don't like it. So what I'm going to do really quickly is just get rid of
that. So
17:57:06
we're going to say pd.set_option, and then
open parenthesis. I'm going to say display, this is just how this is
formatted, so it's display.float_format, and we're going to say
comma, and now we're going to use this lambda: lambda x, colon, and then
'%.5f', and then percent x. Now, if you don't
know what lambdas are, I highly recommend looking those up.
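A minimal sketch of that display setting, using made-up sample numbers: the lambda receives each float and returns it as a string with five decimal places, which is what replaces the scientific notation when the data frame is shown. Only the display changes; the stored values are untouched.

```python
import pandas as pd

# Show floats with 5 decimal places instead of scientific notation
pd.set_option("display.float_format", lambda x: "%.5f" % x)

# Made-up values: one very small number, one large number
demo = pd.DataFrame({"price": [0.00001234, 29567.123456789]})
text = demo.to_string()
print(text)
```

Note the option key is display.float_format; a misspelled key raises an error saying no such key exists.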
17:57:56
Again, this is not a beginner tutorial. Whoops: no such keys, display floor format. That
makes sense; this is float. Yeah guys, this is not a beginner's level. All right,
you can't use floor format, this is float_format. All right, so now let's take
a look at this df, this data frame that we have. So we're just going to type df,
hit Enter, and now our numbers are a little bit more easily readable. I prefer it
this way; you do not have to do this. I'm doing this just because this is what I
17:58:27
prefer so let's jump right into it um something that when I saw this data I was
like something that I really thought was interesting is this percent change of one
hour percent change 24 hours 7 days 30 days 60 days 90 days if you're not in crypto
or you don't do investing or anything like that what this is going to show us is
how I mean it's pretty obvious how much the price of this coin has changed over the
last hour 24 hours seven days so as you can see it's it's barely fluctuated over
the past 24 hours
17:59:01
a little bit over the past um seven days a lot over the last 30 days 60 days and 90
days 20 minus 26% minus 33% we're in may we just had a kind of a crash in crypto a
couple weeks ago so I mean this tracks right but I want to visualize this see this
and kind of see um you know how this is going to look and how if I can gain any
insight from that information and just having it all displayed for me but in its
current state um you know we really cannot do that um now another issue not an
issue
17:59:41
but another thing that we have to take into consideration is we have Bitcoin
right here and we have Bitcoin right here, after different pulls. Now, we just did them a
minute after each other, but for your project you may do a run each day, a run every
hour, or something like that, right? And if you did that, your data could be very
different, so you may just want to take this first one. But what I'm going to do
for the sake of this project is group them. So let's go down here and
we're going to say df dot group
18:00:20
by, and if you've ever done something like SQL, this is how you group by in
pandas. Basically we're going to group by the name, so on Bitcoin, Ethereum, Tether,
so we're going to do that on name. And I'm going to say sort
is equal to False, oops; I'm not going to sort it. You could say True there, but
we're not going to, and I guess you'll see why later. We're going to do an open
bracket, and now we need to choose what we're going to group by, or rather what we're
18:00:56
going to, what columns we're going to have. So I'm going to do another open bracket
and just copy and paste these. I'm going to start right here at
the quote percent change one hour, so boom, and then go over one and
take 24 hours, paste that, comma. We have the 7-day and 30-day, so we're going to do it
like that; I'm just going to do comma, paste the same one, and
manually change it to 30-day. Get rid of that at the end, I don't
18:01:38
know what that is. Then we're going to do 60 days, comma, and
our last one, which is 90 days. Let's see what that gives us. It doesn't give us
anything. Okay, I know what's wrong here. We forgot to add an aggregation:
we're grouping by something, so we need to have an average, a mean, a mode,
or something like that, right? So all we have to do is go to the end right here and
do an average. So we're taking this
18:02:22
number let's say this is for Bitcoin so we're going to take this number in this one
hour for every time it's Bitcoin it's going to group them all together um and then
it's going to average them so in the past five minutes where it's been running
we're going to take the average or the mean of that so let's run this again and so
now this is our output let's take a look Oops I meant down here let's run this now
now what we have is all of these um cryptos these are all 15 that we have
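Here is a small sketch of that group-and-average step on made-up data. It groups the rows by name, keeps only the percent-change columns, and takes the mean across pulls. The column names are assumed to follow the quote.USD.percent_change pattern, only two of the six change columns are included to keep the example short, and df3 is used as the result name to match the data frame three referred to later.

```python
import pandas as pd

# Made-up data: two pulls, each containing the same two coins
df = pd.DataFrame({
    "name": ["Bitcoin", "Ethereum", "Bitcoin", "Ethereum"],
    "quote.USD.percent_change_1h": [0.10, -0.20, 0.30, -0.40],
    "quote.USD.percent_change_24h": [1.0, 2.0, 3.0, 4.0],
})

change_cols = [
    "quote.USD.percent_change_1h",
    "quote.USD.percent_change_24h",
]

# sort=False keeps the coins in the order they first appear.
# Without an aggregation such as .mean(), groupby only returns a
# GroupBy object and shows no table.
df3 = df.groupby("name", sort=False)[change_cols].mean()
print(df3)
```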
18:02:57
and this is the average for this 1 hour, 24 hours, 7 days, 30 days, 60 days, and 90 days. So
now we have all of our cryptocurrencies over here, we have our percent changes up
top, and then our averages here as well. So now, you
know, if you try to visualize this as is, it doesn't really work, because these percent
changes are up here as columns, and we don't really want them as columns because
that just doesn't work for actually creating the visualizations. We
18:03:30
really need these to be rows. So my initial thought when I was doing this was,
of course, I need to pivot. You know, if you've ever used pivot in Excel or
Power BI or something like that, that was my first thought. I tried everything and
I could not get it to work, and I almost gave up until I ran across
something called stacking. This was not something that,
I think I have used it before, but to be completely frank
I couldn't
18:04:00
remember how to do this. So once I saw what it was, I did stack. Let's
make that data frame four. You don't have to do this; you can keep this all in the original
data frame. I just like it for visual purposes, so you can see the progression
that we're making. I like to create a new data frame, and I can
always go back and look at this data frame three as we go. But you
don't have to do that; that's just what I'm doing. So now let's
18:04:27
take a look at this. Up here we had Bitcoin and we had all these columns, and
we had these numbers as rows, but now we have all of these as rows as well.
This is much, much more usable. And if you've ever done something like
pivot or stacking before, you'll know that you kind of have to do it if you
really want to visualize this well. But because we just stacked it, it kind of
changed it. So let's look at the type of data
frame three; this is
18:05:05
before we stacked it, and this was a data frame. But now let's go and look
at data frame four: this is a Series, this is no longer a data frame. We have to
remember that, that's really important, because we can no longer treat it as a
data frame. So we want to get it back to a data frame; we don't
want it to be like that, because you can't really use it as a Series. So what we're
going to do, and let me just create a few of these so you can see up here better, so
now what we're
18:05:37
going to do is say data frame four dot something called to_frame,
so we're going to make this into a frame. And now we're going to specify the name,
and I don't mean the name column like we have right here; I actually mean the name of
these values right here. This is part of the stacking process with
these two columns. So let's go right here and call it, let's just say,
values, and let's make this data frame five and see the output, whoops, for
18:06:16
data frame five. So there's that values column, and this already looks a lot
better, right? This is a data frame now;
let's look at the type of data frame five. So now it's a data frame, but
the issue is that this name is kind of acting like an index, which we don't
want, because we want to be able to use it. So it doesn't really have an index at
the moment, so we need to give it an index, but typically when you give
18:06:51
an index you'll do something like df5.set_index,
and then you'll pass something like name. So let's just do data frame six is equal
to, and we'll see what happens here; it's going to give us an error. Oops, what I
meant is we're going to do data frame five, bracket, name, and that's a column,
right? We're going to do that, and it's basically going to say that that's not going
to work. What we need to do, or at least what I want to do
18:07:26
and what we're going to do in this video, is create numbers. I really
would just want it to be numbered one, two, three, four, five. That's what I want, but
we don't have that right now, and I can't just will it into existence. So now what we're
going to do is kind of create an index out of thin air. We're going
to do pd.Index, and we're going to say, you know, we basically want however many
rows are in here; that's what we want our index to be. We want it to count
how many
18:07:59
are in here. Now, you can make this dynamic, and it probably wouldn't be that hard,
but I'm going to take the super lazy route. Let's do
df5.count, and there are 90 values in here, so what I'm going to do is
a range of 90. I would definitely make this
dynamic, but again, I'm just being a little bit lazy. We'll call this index is
equal to, and I'm going to put this index right here. So now this is
18:08:39
a number, so now it's going to literally index this for us. Now, I've run into this
issue many times, so what I actually need to do is reset this index and
do it properly the first time. So let's get rid of this and reset
the index, and it actually fixed itself. What was happening is we were
indexing something that was already indexed, and that was causing issues, in a nutshell. So
we reset the index, and now this is what it looks like, and this is exactly what we
want. This is
18:09:16
really how we wanted it formatted for our visualizations. We have
multiple rows for Bitcoin, and each of these columns is now a row with the
value attached to it, exactly what we wanted. Really quick, for whatever
reason it names that column level_1. I don't know why, but we're just going to
rename that column really quickly. So we're going to do data frame six dot rename, and
then an open parenthesis, say columns equal to, and we're going to do
one of these bad
18:09:52
boys, oops, one of these bad boys, this type of bracket, and we're going to say
level_1, and then a colon, and then what we want to
change it to, and I'm just going to call this percent_change. So let's
call this data frame seven. Again, you don't have to do that; I'm just doing it.
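The whole reshaping sequence from the last few minutes can be sketched end to end on a tiny made-up table: stack turns the columns into rows (and returns a Series), to_frame turns it back into a data frame with a named value column, reset_index gives a plain 0, 1, 2... index, and rename fixes the auto-generated level_1 column name. The sample numbers and the quote.USD column names are assumptions, and this sketch goes straight to reset_index without the detour through a hand-built index.

```python
import pandas as pd

# Made-up grouped averages, indexed by coin name (like data frame three)
df3 = pd.DataFrame(
    {
        "quote.USD.percent_change_1h": [0.2, -0.3],
        "quote.USD.percent_change_24h": [2.0, 3.0],
    },
    index=pd.Index(["Bitcoin", "Ethereum"], name="name"),
)

df4 = df3.stack()                  # columns become rows; result is a Series
df5 = df4.to_frame(name="values")  # back to a DataFrame with one value column
df6 = df5.reset_index()            # name and level_1 become ordinary columns
df7 = df6.rename(columns={"level_1": "percent_change"})

print(type(df4).__name__)
print(df7)
```

The level_1 name appears because the stacked column labels had no name of their own, so reset_index falls back to a positional label.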
so now this looks much much better now let's try to visualize this one um because
we haven't done any visualizations yet we've just been messing with the data a
little bit I I
18:10:29
you know, I kind of want to see how we can use this. It's something that I personally
am interested in, so I wanted to visualize how these changed over
these time periods. But we need to import some stuff in order to be able to
visualize this, so we're going to import seaborn as sns, and if we need to, we're
going to import matplotlib as well. I don't know if we'll use it right now or at
all, but we're going to add it in here either
18:11:01
way. So now those are added, and what we're going to do is come right here and
do sns.catplot, and we're going to, oops, we're going to say the x-axis is
equal to, and we want this to be the percent change, percent_change. Then we
have the y-axis; we want the y-axis to be these values right here, so comma, y is
equal to, and we're going to say values, oops. Then we're going to say comma, and
we want to basically create a legend, I guess you could
18:11:45
call it; we're going to say hue is equal to name. I'll show you what it looks like
without it, and then you can see that we need that. We're going to say the
data is equal to this data frame seven, and then we are going to
say the kind is equal to... now let's run this and see what we get. And super quickly,
with just, you know, limited inputs, here's what we have. Now this looks really good.
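As a hedged sketch of that plotting call on made-up data: x is the percent_change label, y is the averaged value, and hue colors one line per coin and produces the legend. The transcript does not capture which kind was passed, so kind="point" below is an assumption, as are the sample values and the Agg backend line (there only so the sketch runs without a display; drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs anywhere

import pandas as pd
import seaborn as sns

# Made-up long-format data, shaped like data frame seven
df7 = pd.DataFrame({
    "name": ["Bitcoin"] * 3 + ["Ethereum"] * 3,
    "percent_change": [
        "quote.USD.percent_change_1h",
        "quote.USD.percent_change_24h",
        "quote.USD.percent_change_7d",
    ] * 2,
    "values": [0.2, 2.0, -5.0, -0.3, 3.0, -8.0],
})

# kind="point" is an assumed choice; hue="name" gives one series per coin
grid = sns.catplot(x="percent_change", y="values", hue="name",
                   data=df7, kind="point")
```

With long labels like these on the x-axis, the tick text overlaps, which is the readability problem addressed next in the video.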
we can narrow this down if we wanted to to a few less because there's a lot here
and there's a
18:12:29
lot of colors but again that's just because we have a lot of different stuff but
there's a few that are doing really well I think this is Tron um and then we have a
few that are not doing so well but it's really hard to see if you look down here
it's really hard to see this um and that's just because of the the column name and
so I actually want to change these column names or these values so that when we
visualize it right down here it it doesn't look like that I kind of want
18:13:02
this to be, you know, at least one good visualization you can take out of here. This
is definitely not perfect or complete by any means, but you can take
that away from here. So, I did Alt+Enter, which adds another cell; I could
have just pushed plus, that was kind of the lazy way. What I'm going to do is
change these values in here. How I'm going to do that is
I'm going to take data frame seven, and we only want to look at this one
18:13:32
column, so we'll do that right there, and we want to say dot replace, and then
an open parenthesis and then a bracket. Now, just to
show you one of them, I'm going to say this one-hour one, do that, oops, and then
what I need is a comma, another bracket, and this is what it's going to change
to. I'm just going to say one hour, oops, one hour. We'll do this one really
quick, and then, I don't want you to have to watch me type all this out, but
18:14:09
I'm going to go through and basically do all of this for the others. But let's
see this really quick. So now, as you can see, where it originally said
quote.USD percent change 1 hour, it now only says 1 hour. Now, this didn't actually
change anything yet; we need to apply it to this right here, so I'm going to say data frame seven is
equal to, and then we'll run data frame seven again. So now that has actually changed
that value. Now I'm going to go through and update that for every
18:14:41
single one. All right, so I put the other values we wanted to change in here,
separated by commas: the 24 hours, then the 7
days, 30 days, 60 days, and 90 days, and then this second bracket over here, which tells
it what to change them to: 24 hours, 7 days, 30 days, 60 days, and 90 days. Let's run
this; I haven't even tried it yet. It looks like it worked properly, so now let's go back
down here and run the plot again. Look at that: it looks so much cleaner,
so much nicer.
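
The relabeling step described above can be sketched as follows. This is a minimal sketch, not the notebook from the video: the frame contents, the column name `percent_change`, and the short labels `1h` and `24h` are assumptions made for illustration. The point it shows is that `replace` returns a new Series, so the result has to be assigned back to the column.

```python
import pandas as pd

# Hypothetical long-format frame shaped like the one being plotted:
# one row per (coin, time window). Names and numbers are made up.
df7 = pd.DataFrame({
    "name": ["Bitcoin", "Bitcoin", "Ethereum", "Ethereum"],
    "percent_change": [
        "quote.USD.percent_change_1h", "quote.USD.percent_change_24h",
        "quote.USD.percent_change_1h", "quote.USD.percent_change_24h",
    ],
    "values": [0.1, -1.2, 0.3, -2.5],
})

# First list: values to look for. Second list: what each one becomes.
old_labels = ["quote.USD.percent_change_1h", "quote.USD.percent_change_24h"]
new_labels = ["1h", "24h"]

# replace() returns a new Series; nothing changes until it is assigned back.
df7["percent_change"] = df7["percent_change"].replace(old_labels, new_labels)
print(df7)
```

The two lists are matched by position, so they need to be the same length and in the same order.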
18:15:20
I mean, all of them show very little change over 1 hour, and then you can
look back: within 90 days a lot of these have gone down,
which, if you're following crypto, you know is because there was a big crash recently.
All these altcoins that you're seeing right here especially
went down a ton. I think this is Avalanche or Dai or whatever these ones are;
they went down dramatically, whereas there's one up here, this lone wolf,
that just
18:15:50
did really well for whatever reason. So it's really interesting to
see. Now, this is a pretty specific visualization that I personally wanted to see
and thought was interesting. You can do absolutely whatever you want with
this data; there's so much here, and you can do a lot with it,
especially depending on how long you track it. I only did this over the course
of about five minutes, but you can set this up and track it over
18:16:20
a longer time. Now let's say you wanted to do something much simpler: you just
wanted to look at Bitcoin over the time that you took the data
in. That's going to be a lot simpler than what we just did, and I'll show you how to
do it really quickly. We're going to look at the data frame and
take specific columns; we just want a few columns
to keep or pull from.
18:16:52
We're going to take the name column; it might be easier if I copied
them, but I'm just going to write them out. Then quote.USD.price, which is the price of
the actual cryptocurrency, and then timestamp. Let's assign this to a data
frame, and we're just going to call it data frame 10 for absolutely no reason.
Now we just have these columns. The reason I
want to
18:17:33
show you this is that you can query this really quickly and just take the rows
that you want. Let's say we just wanted to look at Bitcoin. We're going to say
data frame 10.query, open parenthesis, and then name == Bitcoin.
When you're doing a comparison like this you need the double equals sign, not a
single one. We'll write it just
like that and assign it back to data frame 10. Let's try running that; I
18:18:10
think something's wrong with it. Let's try it like this. All right, let's try that.
There we go: the issue was the quotation marks; I needed a double quotation instead of
a single quotation there. So now we have Bitcoin, the price, and
these timestamps, which are the actual times when we ran it. This first one is the
original data frame, then in this project it took me about 15 more minutes to
get this next one, and then we had it running properly for the next five minutes.
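
The column selection and the Bitcoin filter can be sketched like this. The sample rows are invented for illustration, and the extra `quote.USD.volume_24h` column is only there to show a column being dropped. A likely reading of the quotation-mark problem mentioned above is that the string inside the query needs a different kind of quote from the one wrapping the whole expression.

```python
import pandas as pd

# Hypothetical stand-in for the collected API data: two pulls, two coins.
df = pd.DataFrame({
    "name": ["Bitcoin", "Ethereum", "Bitcoin", "Ethereum"],
    "quote.USD.price": [27000.0, 1800.0, 27050.0, 1795.0],
    "quote.USD.volume_24h": [1.0e10, 5.0e9, 1.1e10, 5.1e9],
    "timestamp": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-01 10:00",
        "2023-01-01 10:01", "2023-01-01 10:01",
    ]),
})

# Double brackets select a list of columns and return a DataFrame.
df10 = df[["name", "quote.USD.price", "timestamp"]]

# query() takes the condition as one string: == compares, and the inner
# string literal uses double quotes because the outer string uses single.
df10 = df10.query('name == "Bitcoin"')
print(df10)
```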
That's
18:18:38
actually what we have. Now, if we want to visualize this really simply, what we
can do is use sns.lineplot, which is going to
be a little line chart or line graph, whatever you want to call it.
We'll say x is equal to timestamp, because we want the
timestamp on the x-axis, and then y is equal to
quote.USD.price. Let's see if that works. It says it could not interpret timestamp for
18:19:23
the parameter. That's because it doesn't know that the data is data
frame 10, so we add data equals data frame 10. Now let's try this. All right, this looks
terrible, so let me say sns.set_theme, open parenthesis, style is equal to
darkgrid. This looks a little better. Again, we are looking at a very, very short
time series, but we can look at just Bitcoin or at multiple coins, and
this line is showing us the trajectory over time.
18:20:15
You can get really creative with this. You can run this for a long time; you can show
Bitcoin over days, weeks, or months, however long you run it. That's really all
I've got. Like I said, I wouldn't say this is a complete,
full project, but I'm showing you how to do something so you can
run with it and do basically whatever you want. You can
pull data from a different API, or you can use this exact API and data,
18:20:44
but I wanted to show you just a few things that I initially saw that I might do
with the data, and you have so much. Let me go back to this original data frame;
look at all this data.
Actually, let's go to this one; this one's better.
You have so much data, so many numbers here, so many columns that we didn't even
look at that you can use. There's a lot that you can use here, and
I'm really
18:21:18
trying to set you up so that you can run with it and do whatever you want. I
could have done a thousand different things here, but I tried to show
you two things you can do with the data that I thought were pretty interesting
or simple to do, and I want you guys to go out and do something way
better than what I did. I hope that this was helpful and that it showed you
how to automate that process so you don't have to sit there and click it and append
it and do
18:21:44
all these different things by hand. Hopefully that will be helpful in your future
projects. With that
being said, thank you so much for watching. If you made it all the way to the end, you
guys are fantastic. If you liked this video, be sure to like and subscribe below. I'll
see you in the next video. [Music] What's going on, everybody? Welcome back to another
video. Today I'm going to be walking you through how to create your very own
portfolio
18:22:20
website. [Music] Now, we just completed our data analyst portfolio project series,
where we walked through four projects in SQL, Tableau, and Python. If you have
completed those projects, you now want to share them with potential employers, and I
think the best way to do that is to create your own website. In just a little bit
I'm going to show you two options for creating your own website:
the first is a website builder like wix.com, and the second is hosting your
own website through
18:22:50
something called GitHub Pages. If you have never created your own website before,
it can sound a little bit daunting, but don't worry: I'm going to walk you through
every single step, from the very start to the very end, and once you reach
the end you'll have a complete data analyst portfolio website. So without further
ado, let's jump onto my screen and get started. All right, so the website that
you're looking at right now is the actual website that we are going to build in
18:23:11
this video. It is being hosted right now on GitHub Pages, or github.io, so
if you type this address in (I'll leave a link in the
description), you will get this page and can check it
out for yourself instead of just watching me look at it. It
has this little header, where you can write a little bit about yourself, and then these
are our actual projects. This is our data cleaning in SQL project, and then
there's the
18:23:39
COVID data exploration, the Tableau dashboards, and the movie correlation with Python.
This one is a future video; I plan on doing a few more of these projects because I just
really enjoy them. Then there's this contact information at the
bottom. So it's a really simple website, and it gets the point across. I have
something similar for my own personal site, though I use a different variation.
This all comes from this website, HTML5 UP. There are lots of templates, lots of
options
18:24:13
that you can use. Again, the one we're going to be working with is this one, but I
use a different one for mine. They are really good, I mean super easy to build and
customize yourself. And I will say again, I have no experience doing this; I just
watched a YouTube video that showed me how, and now I am creating my own
YouTube video to show you, so it's coming pretty much full circle.
Like I said, there's no real narrative to it; it just clicks through to your project.
If you click on this and
18:24:45
open it in a new tab, it'll take you right to the GitHub project, and
then whoever is checking this out, like an employer or a recruiter, can
see your code. Super simple. Another way you can do this is to create
your own website from a template, almost in a
blog style. I imagine it being something very similar to this, where there's
an introduction and you can talk about where you got the data set, how
you
18:25:13
got the data, and then take a more narrative approach with
screenshots and some code as well. This person included screenshots,
and there's code right here that I can actually copy and paste,
and it walks through the logic of how the project was done. There's a story
to it, really, so that might be something you're interested in. I have
done something like this in the past, and I used Wix. You can do this
completely for
18:25:44
free. The one we're doing today is completely free as well, but if you
want a customized URL on Wix you do have to pay for it; you
can still get a free Wix website with Wix in the URL. So try this out:
these are super easy, and you can find thousands of templates and a million tutorials on
how to use them. That's not the one we're going to be working on today, though. With
that being said, the very first thing that we need to do before we do
anything is download Visual
18:26:18
Studio Code. This is where we're going to open the HTML we download and
work with it. I don't know if I said this before, but it seems a
little bit intimidating at first; once we actually start looking at it, it's a
lot easier than it looks, I promise you. If you are like me and have a Windows
computer, you'll just go right here and install it. It's super easy to install, so I'm
not going to walk you through that. Of course, I already have it up and
running down
18:26:46
here. Once you have that installed, you're going to come to this
website (a link should be in the description) and download this template. All you
have to click is the free download. It's going to pop up; I'm going to put it in my
Downloads and click Save. Fantastic. Let's go to Downloads, and it
should be right here. If we open this up, it has a few different things in it.
I'm using the Brave browser, so
that's the Brave
18:27:18
symbol; if you're using Google Chrome, you should see the Chrome symbol there
instead. This is everything that you should be seeing. What we want to do is
take it out of this zip folder. There are ways to read into a zip
from Visual Studio Code, but I want to make this as user friendly as I
possibly can, so we're going to create a new folder.
I could call it massively, or you can call it Port website, whatever
you
18:27:48
want to call it. I'm just going to use Port website. I'm
going to copy the files in rather than cut them, just in case I make a mistake, so
I'll put all of those things in here. Now we're going
to go to Visual Studio Code, where you should be
greeted with this screen right here. We're just going to click Open Folder,
go to Port website, and click Select
18:28:19
Folder, and say yes, I trust this one. Right over here are all
of the files that we were just looking at. We'll work a little bit in the images
folder, because I'll show you how to add your own images, but really the only file
we're going to be working in is this index file. Again, if you've
never looked at HTML before, it does look a little bit complicated, but HTML
18:28:50
to me is one of the more easily understood languages. Once you start
getting into it, which we're about to (we're going to walk through the entire process),
it actually makes a lot of sense and is pretty simple. Something that you're
going to want is an extension called Live Server. If I click
right here and click Open with Live Server (you don't have it yet, I'm guessing,
unless you've done this before), it's going to open up this website, and this is
what we're looking
18:29:18
at right now. It has a bunch of gibberish, or some language that I do not know,
and we can view this live. In just a second I'm going to take myself off
screen, but before I do that, let's search for that
extension. I think it's called Live Share... no, Live Server. Let me see what this is
called. Yeah, Live Server. Come right here, search for Live Server, and there it is;
that's the one. You just need to click Install. It takes
about 5 seconds and it
18:29:57
should be completely installed. What this does is host a local website.
It's not something that anybody else can access, but it connects to your code, and when
we make updates you can see those updates live.
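
For readers who want to see what "hosting a local website" amounts to, here is a rough stand-in using only Python's standard library. This is not how the Live Server extension is implemented, and it has no automatic reload; it is a sketch, under those assumptions, of serving a folder on your own machine only. The function name `serve_folder` is made up for this example.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_folder(folder, port=0):
    """Serve `folder` on localhost in a background thread.

    port=0 asks the operating system for any free port. Binding to
    127.0.0.1 keeps the site reachable from this machine only.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=folder)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage sketch (the folder name is whatever you called yours):
# server = serve_folder("Port website")
# print(f"http://127.0.0.1:{server.server_address[1]}/index.html")
# ... edit files, refresh the browser by hand ...
# server.shutdown()
```

With this approach you refresh the browser yourself after each save, which is the step the extension automates.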
I'll show you all that in a second; just be sure to install that
extension. With that being said, let's get out of this and go
back right here. I am going to take myself off screen so
that you
18:30:25
can see everything that I am seeing. It's been really great seeing you. I
have lots of different videos coming up and lots of new projects; I really
enjoyed this project series, so I think I'm going to do more of them. All
right, I'm going to get myself off screen. Let's look at what we actually need to do.
Let me see. We were already connected to the live server, but
actually I got rid of it. Whoops. Let's pull this over, and let's pull
18:31:02
that, and we're going to open it in Live Server. If we look right over here (I
know this is going to be a little bit squished, and I'm sorry about that),
it says This is Massively. You can change that; it's
this text right here. We're going to say Alex the analyst
portfolio and get rid of Massively. I'm going to save; you can
go up here and click Save, but I'm going to hit Ctrl+S, and
just
18:31:39
like that it updates on the website. Again, this is just local, so it's nothing
that anybody can see; don't worry. I'm going to walk
you through the entire process of creating this, and then at the end I will show you
how to host it on GitHub. Honestly, it's a fairly easy process; it
just takes a little bit of time to customize it all. So let's get into it. We have
this title; you may not be able to see it, so let me pull this up. It says
18:32:06
Massively by HTML5 UP. We're going to customize that as well. Whoops, I
don't want to do that every single time; I'm going to try not to go full screen and
back again. So we're just going to say Alex the analyst portfolio,
Ctrl+S, and right up here that changed it. (Yeah, don't
ask me that again, thank you.) Right up here, which you probably can't see at the moment
(we'll see it later), it customizes this tab,
18:32:35
which is really cool. Let's go right down here. This is where it says a free,
fully responsive HTML5 template. We can customize that, and I highly encourage
you to do so. They actually included their Twitter handle right
here, and you can do the same. If you look at this one right here, I included my Alex
the analyst handle, which goes to my YouTube channel, and you can do the exact
same thing: include your LinkedIn or your GitHub profile or whatever you want to
include in there,
18:33:11
so be aware that you can do that. Oops, I need to click
back in here. We're going to say data analyst skilled in..., and again, don't just
write what I'm writing; I'm going to make it really simple, but
this part is meant to be a little bit about you and who you are. I'm
going to say data analyst skilled in SQL, Tableau, and Python, and then I'm
going to get rid of all of this, everything from here
18:33:54
over, and Ctrl+S. Super simple. Actually, let me see, where was that?
Here it is. We don't need that; actually, we don't need anything from here over,
probably to here. Let's see what that looks like. And again, you can
use any website right here that you want, and you can customize the link text,
so I'm going to say Alex the analyst, and then whatever URL you want to include
is what you need to put in there. Now if I hit Ctrl+S, it
says Alex the
18:34:33
analyst. Pretty easy. Now we're going to go down. You can use this however
you want; you could even make this one of
your READMEs, something about you, and put the link for that. On this one
I decided to include the project we've
done that I thought was the most impressive, or the coolest one. I don't
know if you consider data cleaning in SQL cool, but I do; I think it's cool, so
18:35:09
I included that one as my very first one, and that's what we're going to do right
here. We're going to go down, and it says This is
Massively; that's not it. Oh, okay, I know what that is. We'll
come back to this up here in just a little bit: I'm going to go full screen,
show you what it is, and then we'll come back to it. If we go right down here,
this is what they're calling a featured post,
18:35:39
and the ones below it are posts. In our featured post, I'm going to get
rid of the date; I don't want them to know that I just created it. Oops,
I keep hitting Ctrl+A and selecting everything. Whoops. We're going to
say Data Cleaning in SQL, get rid of this, and Ctrl+S. Again, I'm just
saving a lot so that you can see what I'm doing and where it's going. We're
going to get rid of basically all of this, go back, and we're just going to
18:36:17
say: In this project we clean housing data in
SQL Server. Ctrl+S. Super easy. Again, give a little bit more description; I
did in my other one, and you can see that website, so go
check it out. Then we'll have an image. At the end
we're going to go back and redo all the images, but I'm not going to do that at this
very moment. Now, you can keep this Full Story button text; I chose to use View
18:36:56
Project, and when I hit Ctrl+S it says View Project. I think that just looks better,
especially if you're displaying a project. Now we go into all
the individual posts. Actually, no, wait: what I want to show you
really quickly is how you link it to the project. Let's go right over here. That's
our COVID one; here's the data cleaning project. All you have to do
is take this website's URL and put it right
here. Now,
18:37:29
there are three different places. These hrefs are places where you can put a
link to a website. On the page, they correspond to these: visitors can
click on the Data Cleaning in SQL title, they can click on the image, because
this href is right next to the image, and they can click on the View Project
button. You can put the URL in all three. You just
stick the URL right where that hashtag or pound sign is, and then we
save that.
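
Since each post has three link slots, it is easy to miss one. As an optional check, a short script can list any links still set to the `#` placeholder. This is a sketch using Python's standard-library HTML parser; the sample markup and the `https://github.com/example/repo` URL are illustrative, not copied from the template.

```python
from html.parser import HTMLParser


class PlaceholderLinkFinder(HTMLParser):
    """Collects the (line, column) position of every <a href="#"> tag."""

    def __init__(self):
        super().__init__()
        self.placeholders = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a" and dict(attrs).get("href") == "#":
            self.placeholders.append(self.getpos())


def find_placeholder_links(html_text):
    finder = PlaceholderLinkFinder()
    finder.feed(html_text)
    return finder.placeholders


# Illustrative post with one link filled in and two still on the placeholder.
sample = """<article>
<h2><a href="#">Data Cleaning in SQL</a></h2>
<a href="#" class="image main"><img src="images/pic01.jpg" alt="" /></a>
<a href="https://github.com/example/repo" class="button">View Project</a>
</article>"""

print(find_placeholder_links(sample))
```

To check a real page, read the file's text (for example with `open("index.html").read()`) and pass it to `find_placeholder_links`.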
18:38:05
Oops. This is embarrassing; I am not a web developer, as you
can see. But then if I go in here, right-click, and choose Open Link, it is
going to take me to that project. Super simple, and we're going to do
basically that for all of these. I'm only going to show you three, and then you
can do the rest, but I also want to show you how to add the Tableau link. It's
the exact same thing, but it's a different site, so I wanted to show it to you. The
next one
18:38:35
that we're going to do is down in the posts. Again, I'm going to get rid of this
date; you can keep it in there if you want, and that's totally fine, just
update the date. This is that Sed magna text again; I think this might be
some language that I just don't know. The next one is Data Exploration in SQL.
I'm going to get rid of this and save that. Perfect. And we'll do View
Project. Cool. Now we need to customize this summary, so I'm just
18:39:19
going to say something really simple: Data exploration of the COVID-19 data set in
SQL Server. There we go. Let's save that. We have View Project; now let's go get our
project link. This is the data exploration project, so we're going to copy its URL
and put it right in here and right in here as well, and if
you want to, you can also include it right up here, so we have it in all three places.
Again, once you click on these, they will come up. Let's go to the next
18:40:04
one. We're going to get rid of this; this one is going to be our Tableau projects.
Actually, let me just copy that while we're here. If you have
one specific project that you want to include, what you need
to do is go in here, click View, and grab that URL. What I am doing is just
sharing my Tableau Public page. So if you have tons of projects in here and
you want people to be able to see all of them and
18:40:37
pick and choose what they want to look at, then just use the URL
that we're using right here. In the HTML, I'm
going to put Tableau Projects as the title, paste the link like this, and get rid of
that hashtag or pound sign, whatever you want to call it. We'll hit Ctrl+S, and oh,
we've got to do the summary as well. This is going to be a terrible one, so don't
use this:
18:41:25
This holds all of my Tableau dashboards. Please don't do this; I am doing
it because I don't want to take forever in a video to make it perfect. Then
you're going to do the exact same thing for the rest. In this one right here I
included four, but I'm just going to do
these three; I'm not going to take up more of our time. I'm just
going to keep these three in for visual purposes. Once you get down here,
what
18:42:05
we're going to do is delete some of this. This is our data exploration post,
and this is our Tableau projects post right here. The posts are
separated by these article tags, so we're going to select from right here
all the way down to right here. This is going to get rid of all
the other articles, or what they're calling posts. We're
going to delete those and hit save, and now, as you can see, we
have our
18:42:36
header, our first project, and our second and third. I would
include the other projects that we've done in here so that it looks good. This is
the footer right here. We don't need that because we don't have anything
else in there, so we're going to get rid of that as well, and now we just have this
information. I don't have anything set up for the name, email, and message
form; you can keep that in there if you'd like, but I am going to get rid of it. So
we're
18:43:06
going to go right here. That's the section, so don't delete the section; we want that.
I'm going to delete this part of what they're calling the footer section, and now we
have this address, phone, email, and social. I'm going to get to the social part in
just a second; it's again super easy. For the address, I just put a location. I don't
want to give somebody my address or put it on a website anywhere,
so I'm going to put
18:43:34
Dallas, Texas, and we can keep it like that. We'll hit save, and
it'll have Dallas, Texas. I hate the look of the zeros in the phone number, so
we're going to change that to something like 123-456-7890. Then email: we'll put Alex
the analyst [email protected]. If you have issues with this, you can email me, and I
will try to respond to all your emails. I get a lot, so I will do my best,
but that is my actual email, if you are curious. Now that we have this, we also
18:44:23
have the social media links. I want to display my LinkedIn and my
GitHub, so I'm going to go over here
and open LinkedIn. Perfect. I'm going to take my LinkedIn URL, and I
am going to get rid of these first two entries, because I'm only going to include two.
For this one I'm going to type linkedin, and then right here I'm
going to replace the label with LinkedIn as well, and what you're going to do is put
18:45:09
this link right here. Then we're going to go get the GitHub one, so let's open
GitHub. Oh, who is this? Sign up? What is going on? Let's just go back
here; that was something I was viewing a while back. We're
going to take the GitHub URL and put it right here. It already has
GitHub in there. Is this supposed to be lowercase? I think it is; let me see if this
one is lowercase as well. Yeah, so do it like that, in lowercase. I
18:45:48
forgot that that was how they did it. Oh, that's the label; that doesn't matter
as much. This right here, the class, is the important part, because
when we go back here there is no LinkedIn icon, but when we save it,
it has the LinkedIn icon, because that class was already defined
in this HTML template. So we have that. Let me bring this full screen
really quickly, because there are a few things that we couldn't see in the split
screen. These
18:46:20
right here are things that we could not see before, and these as well. What we
can do is go down here, copy these social links, and
replace the ones right here with them, and then get
rid of these two right here. This says This is Massively, and we're going to
change that as well. Let's make this full screen for the first time. Feels good; I
hate doing split screen, but I do it for you guys. So this is
18:46:51
Massively, and we're just going to get rid of these two. This
is the nav, the different tabs. We're going to get rid of
those two tabs, and for this one I'm just going to call it Projects.
Once we go back and update all this, you'll see those changes.
So we made those changes. Here are the social media
links; we're going to copy these
18:47:21
two and replace all of these with them. Let's save that and
go back. Now, as you can see, those two tabs are gone, this says Projects, and there
are only two icons right here. If you click on this one, it goes to my LinkedIn (or your
LinkedIn when you do it), and this one will take you to GitHub. So it is all
working as intended. This is great. When you scroll down it says Massively; we
can change that as well, and we should, so let's do that really quickly. We'll just
18:47:59
say Alex the analyst and update that, and there we go. In a nutshell, this is
a lot of it. We still need images, and I don't think I set those up for this video, so
I'm going to cut away for about 2 seconds and go pull those
images in, because it could take a few minutes and I don't want to waste your
time. I'll see you in two seconds. All right, so I just
pulled over the images that we are going to use. Let's go to Downloads;
they're
18:48:35
right here: the housing, Tableau, and COVID images. If I open up this COVID one, this is
what the image looks like, and it's what we're going to use for that COVID project.
I'm going to copy these, go into the Port website folder that we just made,
go to images, and paste them in here. Now that we have
those images in here, let's go back and see what we've got. We just put these
images in this folder; you'll have this folder right here, and you can
18:49:05
open it up and see all of the images that we have. All we're going to do is
replace the temporary images that the template had for us,
and we should be golden. Then we're going to upload it to GitHub and
create our website for free. Let's go right down here. This is our very first
one, our data cleaning in SQL, which is the one with the housing data. This image
right over here says images/pic01.jpg. So this is
18:49:41
the housing one, so what we're going to do right here is type housing, and it'll
autocomplete for us, so that housing image should be in there now. The next one is the
data exploration in SQL; that was with the COVID data, so we're going to get rid of this
and type COVID, because that is the image that I have right over here. Then
the last one is Tableau, so let's go right over here and type
Tableau. Oh, I've got to save that. Ctrl+S. Perfect. Now let's look
18:50:15
at it. There you go. Oh, this one still says Full Story; I'm
going to go change that, because it just doesn't feel right. View Project. Oh, that's not
how you spell it. Okay, Ctrl+S. Perfect. So now this looks a lot better, and
when we host it through GitHub Pages, or github.io, this is what it's going to
look like. You can add a lot more to it, you can take away from it,
and you can add as many projects as you want: you can copy those
articles, or posts, and you
18:50:55
can just keep adding them um so this is kind of what it's going to look like and it
was not that hard I don't think I hope this was not too difficult I really don't
think it is um it's really just using a template and kind of understanding a little
basics of HTML so um we are going to take this and we we have this saved already we
have this all saved what we are going to do now is upload this to GitHub so let's
go right over here and let's go to repositories and
where's
18:51:34
the new one oh I need to sign in okay I'm going to get rid of this part so you
can't see it so we are going to say a new repository we're going to call it
AlexTheAnalyst2.github.io so we're going to write it just like that you know if
your name's um Alex Jimmy I don't know why I said Jimmy then alexjimmy.github.io
you can always go back after the fact and change this so it's not a big
deal whether you change it or not and we're going to create this repository we're
going to say upload an
18:52:14
existing file and instead of choosing them what we're going to do is just go right
over here go to this and we're just going to copy this in or not copy it in but
drag it in okay so we're going to take this drag it in right here and
it'll take a little bit has a 75 but it shouldn't take that long and let's just
wait for it I'm taking a sip of water I apologize but it is literally uploading just
everything that we had in there so all the updates and all the changes and
18:52:46
all the stuff that we um had and it looks like it's done so let's just write
initial commit commit changes it is processing it all right and it should be done
very very soon as long as I have a good internet connection we shall see stick with
me it's taking its time um while it's loading let's go over to oh there it
is so perfect so here's everything that we have it has this README that it generated
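For reference, the repository name used in the upload step follows GitHub's user-site convention: a public repository named <username>.github.io is served at https://<username>.github.io/. A minimal Python sketch of that naming rule, where the helper names and the username alexjimmy are hypothetical and only there for illustration:

```python
def pages_repo_name(username):
    """Repository name GitHub expects for a user site (assumed convention)."""
    return f"{username}.github.io"


def pages_site_url(username):
    """Address where that user site is served once Pages is enabled."""
    # Hostnames are case-insensitive, so the username is lowercased here.
    return f"https://{username.lower()}.github.io/"


# Hypothetical username, used only as an example.
print(pages_repo_name("alexjimmy"))
print(pages_site_url("alexjimmy"))
```

If the repository name does not match your GitHub username, the site may end up at a different address than you expect, so it is worth checking the name before sharing the link.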
let's go over to settings and we have this github.io and if we go right down here
18:53:35
to GitHub Pages the Pages settings now has its own dedicated tab let's check it out
here so it's currently disabled but we're going to say we want it to pull
from the main branch um I think it's the docs folder we'll see I'm going to save this your site is
ready to be published let's open this up okay site not found maybe it's from the
root save um your site is having a build problem let me see if I can actually
change the name I already have an AlexTheAnalyst but I'm going to see it's already
18:54:13
taken um I'm just going to try this one more time oh and now it's working uh I
have no idea why it uh didn't work before but this is fantastic it was giving me
all this maybe I was just reading too much into that I had never tried
to create another github.io or GitHub Pages site on this so anyways thanks for sticking
with me through all that um stuff so now we have our actual website um it doesn't
look the same up here because of that thing that we were just looking at it should
18:54:49
just be this part right here but um this is an actual website now it's being hosted
through GitHub and it's completely free if you want to pay you can hide this from
your GitHub um your repository has to be public uh something I didn't mention when
you're doing this your repository has to be public um if I change the visibility to
private um you will not be able to see it anymore you'll have to then pay if you
want to make this repository private you have to then pay I think it's like $4 a
month or
18:55:23
something like that so worth looking into um if you don't want to display that on
your GitHub worth looking into but this is our final product I mean it looks pretty
fantastic and you can use any of these templates right there are lots of different
templates that are fantastic I mean they look amazing they look professional um
it's really up to your style like this one looks kind of cool a little bit um edgy
for my taste but uh this one looks really good too might be able to add
some more
18:55:55
narrative to that one so again go through it make a good choice and
then update it how we updated it uh I will include
everything that's in here and I'll keep this on this GitHub so that you can go
in there and if you want to download these images you can download the images that
I used um or you can go find your own just um you know try to get
HD images on Google just type in Google Images and search for whatever image you
want to
18:56:27
search try to get an HD image with that being said that is the entire project
I hope this didn't go too long um this may have gone
like 30 to 45 minutes but at the end which is where we are now
we have an entire website it was completely free and I hope that you can host the
projects and you can create more projects I will be coming out with more
projects myself that hopefully will be interesting to you in the future so with
that being said thank you guys for
18:56:58
joining me for those of you who stuck it out to the very end you are fantastic you know
post your website on LinkedIn and tag me in it because I love seeing um you
guys do these projects and this stuff so I'm super excited to see all of these um
that you guys tag me on LinkedIn and whatnot so with that being said this is it
I hope you learned something I hope that it worked for you and I appreciate you
watching be sure to like And subscribe below and I will see you in the next video
[Music]
18:57:39
goodbye what's going on everybody welcome back to another video today I'm going to
help you create a data analyst resume [Music] now when I say data analyst resume it's
not that much different than a regular resume except that it's going to be catered
for a data analyst job in just a second we're going to take a look on my screen at
a sample resume I'll have the template in the description so you can just go and
download it and fill in your information but it's a fantastic
18:58:07
starting place to actually creating your resume when we're looking at this resume
we'll take a look at each section kind of dissect each part of it and then at the
very end I'll give some extra tips on what you should include and how to actually
write your resume as well so without further ado let's jump onto my screen take a
look at the resume and see how you can create your own data analyst resume so here's
our sample resume I'm just going to walk through the entire thing super quick and
then we'll break
18:58:29
down each section individually I'll give my thoughts and some tips on each section
and remember you can download this exact thing in the description below I'll have a
link I'll probably put it on my GitHub or somewhere else but it'll be free to
download uh so you can go ahead and do that but let's zoom in just a little bit so
at the very top we have our header we have some just basic uh contact information
then we have skills then we have projects and notice the projects are up here at
the top and
18:58:56
we'll get to that later about the order of where you should be putting your things
then we have work experience and then we have education so really quickly I'm going
to zoom out and I hope you can still see it the order is actually quite important
now there is one piece that is not in here right now and that is a summary section
I don't have a summary section on my real resume I just I don't think it's useful
or helpful I don't have one you can include one and it would be right up here at
the very top
18:59:25
now why do we have the skills and projects at the top well it's because most
people who are trying to break into data analytics don't have any experience in
data analytics if I am reading this resume as a hiring manager and the first thing
that I look up here and I see is experience and it's not analyst it's a teacher or
a nurse or something I'm going to be like this person doesn't have any experience I
don't want to hire them the first thing that you want to have in your
18:59:52
resume is something that is good for the hiring manager to see the first several
things you should put all your best stuff at the top that's what I believe so
I think that these skills are really strong a lot of great skills and then these
projects are all really good projects now this is just a sample these aren't all
real projects um or they are real projects they're just not you know ones that
I built myself it's just a sample so uh then right here we have our work experience
now if you're like I
19:00:21
said a nurse or a teacher or a lawyer or something that's not relevant to data
analytics you want that at the bottom um and then you're going to want to tie in uh
some things in these descriptions and then the education at the bottom my education
was terrible okay I had a bachelor's in recreational therapy which had nothing to
do with data analytics so for a tech job it was not good I always had mine at the
bottom so let's start at the very top and walk through each section so at the very
top you want to have maybe a
19:00:50
title but for sure your full name you definitely want to include your phone number
if you're okay with them calling you but definitely an email for sure include
things like a LinkedIn profile or a GitHub profile you can also put your portfolio
in fact I highly recommend putting your portfolio because it just looks good or if
they check it out that's a really good thing and then your location cuz sometimes
your job is going to be location based whether you're in Dallas or another
Metropolitan
19:01:16
city it's just nice to have that on there this should be the simplest one to
fill out unless you haven't built out something like a portfolio you just don't
include it um but this one should be the simplest one right you're just putting
contact information maybe a link to a website next we have the skill section and
this one on my own personal resume I have at the very top I typically recommend
anyone who does not have experience who is trying to break in to data analytics to
put this at the
19:01:42
top as well and have these skills and know these skills that's important um but
when the hiring manager first initially sees this there's just going to be a mental
check okay they have the skills that we're looking for let's move on to the rest of
the resume um but you want as many mental checks for what they're looking for at
the beginning just going to I'm going to keep repeating that um this is how I
personally write my skills so I write something like SQL and then I'll say SQL
19:02:10
Server MySQL PostgreSQL now I have used all these different types of SQL in my
actual job if you haven't done that and you're just starting out
maybe you put something like um you know subqueries stored procedures joins whatever
the actual things within SQL I don't recommend that as much
because typically people know what SQL is like if they use SQL they know what SQL
is so they're just going to expect that you know those things now for something
like python
19:02:39
it's different because there are packages and
libraries within them so you can specify I have worked with pandas in my actual job
and I look for people who know pandas as well because you know we use it so
actually specifying these packages or libraries is really helpful so this is how I
would put these things on a resume now this is another resume this is our sample
two I'm going to maybe include this one down below although I don't like this
format
19:03:06
as much but if you like it you can but here's another way that you can um show
these skills just a different way to do it I want to show you both ways um we have
like Python and the libraries underneath it I've even seen it to where people will
write out almost like um let me go down here they'll write out like a narrative um
they'll do Python and then they'll have like a colon and then they'll say used to um
manipulate data and I'm not spelling that right in pandas dot dot dot and
19:03:37
they'll write it out you can do that as well again I like bullet points because
it's to the point it's exactly what you need let's get rid of this one real quick
so this is the one uh that I like so that's the skill section let's move down to
the projects now the project section is almost primarily for people who are just
starting out once you get experience typically you maybe have one project on there
or no projects at all but the project section is used as kind of um
19:04:08
in lieu of actual experience right I've always said that you need to build projects not
just for your resume but also for the interviews so then when you get into an
interview you can point to these projects and say yes I've used SQL I did it in
this project and they may have seen it and you can walk them through how you
actually used it it gives you more credibility than just saying you know how to use
SQL So within the project section we're going to have a project like this one says
data
19:04:36
science job market exploratory data analysis so this is a personal project and then
within it they did some really great stuff here's usually what I recommend and this
is in here which is you specify what you did you say I used Python and what did you
do to analyze this and gain insights in the job market then you walk through some
of the things that you actually did things like regex techniques you used pandas
matplotlib you built a word cloud these are keywords that somebody will look for
and they
19:05:06
even highlighted them which I personally like and do myself they highlighted
these things so that the viewer or the um hiring manager is actually seeing them
making sure that they're bold so that they are catching their eye so I personally
do this and I recommend this that's all it needs to be it just needs to be I built
a Tableau dashboard doing this from this data set I cleaned it in SQL and you show
those skills something that's important in both the skill section and the project
section is using
19:05:37
and highlighting your skills as much as possible especially if you don't have any
experience if you've never had a job before once you have a job and you come down
to like the work experience then it kind of speaks for you but if you don't you
want the projects and the skills to speak towards your skills and credibility so we
have this right here now one thing that's not in here that I actually do recommend
is a hyperlink maybe right here or actually this being a hyperlink to the project
because they
19:06:07
might read this and be like we work with you know data science job market data I
don't know and then they'll click on this link and they can see your work that is
the one thing that I would change in this other than that this is exactly
how I would have it very very very similar to my own um and a lot of this that I
did I actually took from other resumés and formatted how I prefer and like it um so
again some of this is personal preference and you can change it however you want
that's just how I
19:06:33
like it so that is the project section now we're going to go down to the work
experience section now this person does have a little bit of analyst uh experience
so you know if you don't that's okay but you put your previous experience now
here's what I recommend if you've been a teacher for 15 years you've been a nurse
for 10 years you've had 10 different jobs don't put all your experience on here um
maybe put your last two jobs going back maybe three years I don't
19:07:03
recommend you filling it up because it's not going to be super relevant unless
you're applying for a healthcare data analyst position and you have a nursing
degree then it's relevant and that experience is super helpful because it's domain
experience right then you may go back five years just you know use your discretion
but what you need to include of course your title where you worked your location
and the times that's standard for almost any resume but within here uh what you
really want to
19:07:29
do is highlight again the skills if you can if you can't that'll change but in here
he says implemented a new reporting using Excel pivot and VBA which reduced
processing time by 50% these types of um quantitative information I reduced time
I saved the company money I did something quantitative putting that in here is
always helpful always highly recommended although it can be tough to measure these
things right typically what I recommend especially if you're first starting out is
to highlight
19:07:59
skills if you're a teacher you've probably used Excel and you've probably used
Excel for closer to data analytics than you think just in a teacher way and not a
data analytics way but you can reword these things and make them sound good if you
are a nurse like I was saying you've used Excel you've used a health
information system you've used uh some type of database talk to that include that
in here um and it can be hard to write these out and I'm going to show you a way in
just a little bit about
19:08:31
how you can write these out and think about these things or have a way to help you
write them or give you ideas we'll get to that in a second lastly we have the
education piece this is again really simple at the very bottom education what your
degree was where you went um and if you have you know some helpful things to
include you can do that and then when you actually went now you can include other
things in here as well like boot camps if you went to a boot camp or you could also
include things like a GPA
19:08:58
although I don't personally recommend it GPA has never been anything that I've ever
cared about or I've seen anyone care about ever um so you don't normally have to
include it one other thing that you can include at the very bottom is something
like certifications uh I personally don't put a lot of stock in certifications
unless it is one that I have recommended in a previous video like the Tableau
certification or Tableau desktop certification if you're applying to a job that
uses Tableau that actually could
19:09:26
be really good so definitely include that but ones on Udemy ones on Coursera or
like my Alex the analyst boot camp that I have on my channel I wouldn't really
include that in your resume it's mostly for learning if you get something like the
Tableau one or the AWS uh Cloud one or the um Azure Cloud one those are all actual
certifications that can help you and give you credibility towards a certain skill
now really quickly let's just take a glance at the other resume this is resume 2 so
we have the
19:09:55
education at the top doesn't have to be at the top unless it's relevant which you
could put at the top we have a skill section then again these are the same
projects and then work experience this is just a little bit different um order so
you can do it like this as well it's a different way you can write the skills and you
can also include a summary section as well so that's the meat and potatoes of how I
would create a data analyst resume now writing it is actually a different
Beast right
19:10:20
you have to actually write it out get something on the resume and then apply using
that resume but it can be hard to come up with these ideas so uh I just want to
show you something that a lot of people have been using I personally haven't
written a resume in a little while so I don't use it for my own resume or haven't
used it but I will um and that's using ChatGPT or some variation whether it's on
Bing or you know you get some different version or some new product that's out
there at the
19:10:46
moment I'm just going to show you how to do it in ChatGPT some of the things that
you can prompt it to do and that'll be it I'm just going to show you kind of some
ideas that it can generate for you to help you write these things all right so here
on my screen we're on ChatGPT if you haven't used it I'll leave a link in the
description I also have a whole video on how to use ChatGPT for data analysis um
so I like ChatGPT now I've already written out these questions
19:11:09
because I don't want to wait for the responses but here's what I asked it to do and
you can do some variation of this whether you're a nurse or a lawyer or a
teacher or whatever I said I'm a math High School teacher trying to become a data
analyst how can I use my experience on my resume to help me get a job this is just
to help provoke some ideas and it says you know you most likely have some skills
emphasize your quantitative skills so those are some of the things you can focus on
showcase
19:11:34
your ability to communicate complex concepts which is really important in data
analytics being able to present information which teachers have highlight your
experience with technology hopefully you're using some type of uh you know database
for students or you know Excel or something like that you can highlight that and
showcase your ability to solve problems now the next thing that I asked it was I
built a COVID Tableau dashboard using Tableau how can I add this to my resume and
then it's going to tell you exactly
19:12:03
how you can do that it's going to say include the link to your dashboard which I
also recommend provide a brief description highlight your data visualization skills
include screenshots or images which that's what I would be putting in the project
itself not on your resume then provide context for the data all really good stuff
really great now the last thing is kind of what I'm trying to get at as a whole it
can help you write things so I said
19:12:29
write two sentences highlighting my COVID Tableau dashboard to add to my resume and
it's going to say developed a COVID Tableau dashboard to visualize pandemic trends
using real-time data sources demonstrating strong data visualization and Analysis
skills so this can help you generate those descriptions in your work experience it
can help you generate the descriptions in your projects and this can be really
helpful to just generate some ideas cuz I personally really struggle with like
highlighting my skills and descriptions
19:12:58
within those things this can be a way to kind of help you do that so don't you know
just copy and paste but let it prompt you let it give you ideas now the last thing
that I want to mention is just your overall resume as a whole the template that I
use the template that I recommend is very very friendly to these automated
systems that check your resume if you did not know most companies especially big
companies use these automated systems that scan your resume to see if it has what
they're looking
19:13:27
for and then that resume if it gets through that system gets passed on to a recruiter
or hiring manager typically most companies don't go straight to the hiring manager
so you need a resume that can pass through those initial systems and pass those
tests the resumes that I've shown you today will do that they have bullet points they
have the keywords they have everything you need that's why I recommend or partially
why I recommend this type of resume other ones that have images and different fonts
and different
19:13:55
stylings can cause issues with these automated systems where it just doesn't read
it properly or you know it doesn't read the right words that you want it to read so
just know that these types of résumés have different uses right you're not just
handing it off to somebody to where they can read it and it needs to be visually
stimulating really what you need is for it to get through those initial systems
which these resumés uh if you write them well and you have good you know skills and the
right things on your
19:14:23
resume they will pass through that first layer to get to those hiring managers so
again be sure to download those those are completely free I just I highly recommend
using them I think they're really good so be sure to download those use those just
put in your own information be sure to build out your own projects don't just keep
the ones that are on there because you'll need to be able to speak to them
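The automated screening mentioned a moment ago can be illustrated with a toy keyword check. This is only a sketch: real applicant tracking systems are proprietary and work differently, and the function name, sample resume text, and keyword list below are made up for illustration.

```python
import re


def keyword_hits(resume_text, keywords):
    """Return the keywords that appear in the resume text (case-insensitive)."""
    text = resume_text.lower()
    found = []
    for keyword in keywords:
        # Match the keyword as a whole term, not as part of a longer word
        # (so "SQL" is not counted when the text only says "MySQL").
        pattern = r"(?<![a-z0-9])" + re.escape(keyword.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, text):
            found.append(keyword)
    return found


# Hypothetical resume line and hypothetical keywords from a job posting.
resume = "Built a Tableau dashboard from data cleaned in SQL; analysis done with Python and pandas."
wanted = ["SQL", "Python", "Tableau", "Excel"]

hits = keyword_hits(resume, wanted)
print("matched:", hits)
print("missing:", [k for k in wanted if k not in hits])
```

A check like this only sees plain text, which is one reason a simple bullet-point layout with the skill names written out is easier for software to read than a resume built around images and unusual styling.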
sometimes recruiters or hiring managers are going to ask you about them how you
built it
19:14:46
what you did and you can also point to those projects in your actual interview so I
hope that this was helpful I hope that your resume is ready to go I hope that you
are ready to start applying for those data analyst jobs thank you guys so much for
watching I really appreciate it if you like this video be sure to like And
subscribe below and I'll see you in the next [Music] video what's going on
everybody my name is Alex freeberg and today we're going to be walking through my
top three tips
19:15:21
on how to use LinkedIn to land a job LinkedIn is a fantastic place to look for a
job it's its own little ecosystem where career-driven people can connect and talk
with one another and help each other find jobs I personally have landed jobs
through LinkedIn and so I know how effective it can be let's jump over to my screen
and I'm going to show you my top three strategies that I have found to be the most
successful to actually finding a job so I'm logged into my completely Anonymous
account here and
19:15:44
I'm going to show you the very first tip which is you shouldn't be just applying to
a position you should be actually reaching out to the recruiter and I'm going to
show you exactly how to do that so the first thing that we have to do is actually
find a job that we want to apply to so let's go to the job section right over here
and let's search for data analyst and let's do that in let's do Chicago because why
not uh so it's going to search for data analyst positions in
19:16:14
Chicago we have one right here let's see what it looks like cuz you know I don't
want to apply to jobs that I'm not extremely qualified for so this is a job that I
want to apply for and before I actually go and apply to the job I want to see if
I can reach out to a recruiter and talk to them beforehand so let me show you how
to do that so what we're going to do is actually click on the company right here
it's going to take us to basically their LinkedIn profile page for their entire
company
19:16:39
and we're going to scroll down we're going to go over to people and then we're
going to search for recruiter so if we scroll down all the way to the bottom we can
see that there are recruiters that actually work in-house for this company and so
now would be a time where I actually reach out to some of these recruiters and I
say hey I see a job that I really like I think I'm really qualified for and I would
love to talk more about it with you you can ask them things about the job to make
sure
19:17:04
that it is a good fit for you and then I highly recommend you asking them what they
think is the best way to apply for this job to make sure that your resume gets
noticed and you get an interview since they are a recruiter who works at this
company they may be the one who's actually going to be looking at these resumés
and so they may give you a tip on the best way to actually apply they may also just
ask you to send them your resume directly so that they can look at it or maybe later on
down the line this
19:17:26
actually is a person who is reviewing resumés and so if they come across your
resume they may be able to put a face to the name and that may give you bonus
points I'm going to leave a template script in the description in case you don't
know exactly what you want to say to this recruiter and it'll give you just a
baseline of some of the things that you might want to say number two is to actually
ask for a referral now if you don't know what a referral is it is where somebody
who already works at
19:17:47
the company can refer you to a specific job and then might get you a little bit
higher on the list for interviews so I highly recommend reaching out to somebody
who already works at that company and ask if they're willing to be a referral for
you I get people reaching out to me all the time asking to be a referral for them
for my company and nine times out of 10 I say yes I always ask to see their resume
first just to make sure that their resume aligns with the position at least a
little bit but
19:18:11
there's basically no harm in me being a referral for somebody in fact I may
actually get a bonus if that person ends up getting hired and so for the most part
there's almost no risk for the employee in actually being a referral and so a lot
of times they will say yes now let me show you how to do that and it is very
similar to finding a recruiter so we're going to stay on this people section but
instead of searching for a recruiter we're going to search for a job title that is
similar to yours
19:18:34
so let's actually see if they do already have any data analysts and if they do that
is the person that we're going to reach out to because that is the person we'll
probably have the best connection with so it looks like we have six employees and
let's scroll down and so it looks like all these people have data related jobs and
so I would reach out to these people and say I saw an open data analyst position at
your company I would love to know more about your company as a whole and then you
can talk to them a
19:18:57
little bit and then in the end your goal is to ask them for a referral and if that
happens that is fantastic and then you can go ahead and apply for the job and mark
them as a referral for you now my third tip on how to get a job through LinkedIn is
to actually have recruiters reach out to you so let me show you how to do that the
first thing we're going to do is actually go over to my profile here and we'll
click view profile now there's a few things that we want to make sure that we have
on here
19:19:23
so that recruiters can reach out to us the first thing that I want to do is to
actually come to this section right here which is show recruiters you're open to
work and when I click on this I can actually choose some job titles and some
locations where I actually want to apply and have recruiters reach out to me and so
right now I have data analyst I have in the DFW area which is where I live I can
also add titles like business analyst um and then maybe Junior data analyst entry-
level data analyst or
19:19:49
things like that that could potentially have recruiters reach out to me for
positions that I'm interested in and then you can say that you're immediately and
actively applying and you can also say that you're only looking for full-time
positions or contract positions and then you can actually add this to your profile
and I only want recruiters to see that because I do currently have a job at
McDonald's and so I don't want McDonald's firing me because I'm looking for
employment
19:20:10
elsewhere so let's save that and it looks like it was updated and so now when
recruiters are searching for candidates for a specific position you will be on that
list so that they can find you and reach out to you something else I should mention
is on your profile page I would try to have some type of professional photo so that
you look really good I would also try to include data analyst somewhere in your
title if you already have a data analyst job and you're looking for another one you
can
19:20:34
just have your previous company but if you're looking for a data analyst job you
can always put seeking data analyst position or something like that another thing I
think is really important is having really good descriptions for your previous work
I don't currently have this but I would go a little bit into the work that I
actually do make sure that the experience matches kind of what you're looking for
if you do have previous experience if not that's totally fine the next section on
your
19:20:57
profile page that I would recommend looking at and updating is your skill section
and so you want to go in there and make sure you have all of your relevant really
data analyst heavy skills on there specifically hard skills because soft skills
aren't going to translate too much into this section I would definitely stick to
things like SQL python Tableau Excel things that data analysts are going to use
because this is where they're going to actually look and see if you have the skills
that
19:21:20
they are looking for for that position when I was applying to jobs and only applying
to job postings and not using any of these strategies my success rate was 0.4%
which means out of 1,000 applications that I filled out and sent my resume to I
only heard back from four of them to actually get an interview but with these
strategies I was able to get that up to 10% and at my best I was able to get that
up to 15% but that's because I was applying to a lot fewer positions and I was
targeting jobs that I really
19:21:46
wanted to work for and so I put in more effort in order to contact people and work
with Recruiters in order to get that job I genuinely hope that these strategies can
be helpful for you especially if you're trying to apply for jobs right now thank
you guys so much for watching I really appreciate it if you liked this video and
got anything out of it at all be sure to like and subscribe below and I'll see you
in the next video hello everybody congratulations if you are watching this that
means that you completed the data
19:22:09
analyst boot camp if you haven't don't keep watching this is only for people who
have completed the data analyst boot camp playlist on my YouTube channel woo all
right now that we filtered those people out I'm going to show you how you can
download your certificate and your certification now that you've completed the data
analyst boot camp I will leave a link in the description but let's go on to my
screen I'm going to show you how to actually access this and download your
certification all right guys don't
19:22:33
go around telling people this or sharing this uh but this is our data analytics
boot camp on the Alex the analyst GitHub right up here I will have this link in the
description what you can go ahead and do is you can come right here you can
download this you'll just right click or click download and you just do something
like save image as um or you can come to this one this is the one that I think is
the real money maker here uh this is the certificate of completion for the data
analytics boot
19:23:00
camp I have my not signature but my name as well as uh my position with a blank
space right here to fill in your name feel free to put this on LinkedIn or Twitter
or Instagram and tag me in that because I would love to just say congratulations
because honestly it's a lot of work to go through all those videos and learn
all of those skills so congratulations I hope that you learned something along this
journey a new skill a new thought a new idea and I'm proud of you I'm proud of you
for putting in
19:23:25
the work it's not easy but you did it and I hope that you came out on the other
side better for it so congrats I'll see you in the next [Music] video