Youtube Analysis
Uploaded by deepak Rulez
Copyright © All Rights Reserved
8/13/23, 12:08 AM    Untitled22-Copy4 - Jupyter Notebook

YouTube Channels Analysis Using Python

The YouTube dataset contains information about channels. The data comes from Social Blade, a third-party YouTube statistics site, and is freely available on Kaggle.

Import Libraries

In [1]: import pandas as pd

In [2]: import pandas as pd
        import numpy as np
        import seaborn as sns
        import matplotlib.pyplot as plt

C:\Users\Syed Arif\anaconda3\lib\site-packages\scipy\__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.1
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"

Loading the Excel File

In [3]: df = pd.read_excel(r"\Users\Syed Arif\Downloads\Y-tube-Channels.xlsx")

Data Preprocessing

.head()
Shows the first rows of the dataset (5 by default).

In [4]: df.head()
Out[4]:
   Rank  Grade  Channel name                Video Uploads  Subscribers  Video views
0  1st   A++    Zee TV                      82757          18752951     20869786591
1  2nd   A++    T-Series                    12661          61196302     47548839843
2  3rd   A++    Cocomelon - Nursery Rhymes  373            19238251     9793305082
3  4th   A++    SET India                   27323          31180569     22675948293
4  5th   A++    WWE                         36756          32852346     26273688433

.tail()
Shows the last rows of the dataset (5 by default), in their original order.

In [5]: df.tail()
Out[5]:
      Rank     Grade  Channel name       Video Uploads  Subscribers  Video views
4995  4,996th  B+     Uras Benlioğlu     706            2072942      441202795
4996  4,997th  B+     HI-TECH MUSIC LTD  797            1055091      377331722
4997  4,998th  B+     Mastersaint        110            3265735      311758426
4998  4,999th  B+     Bruce McIntosh     3475           2990         14563764
4999  5,000th  B+     SehatAQUA          254            21172        73312511

.shape
Shows the total number of rows and columns in the dataset.

In [6]: df.shape
Out[6]: (5000, 6)

.columns
Shows the name of each column.

In [7]: df.columns
Out[7]: Index(['Rank', 'Grade', 'Channel name', 'Video Uploads', 'Subscribers', 'Video views'], dtype='object')
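You can try the inspection calls above without the Excel file. The sketch below builds a small stand-in DataFrame by hand from a few rows copied from the head() and tail() outputs. The variable name `raw` and the reduced row set are assumptions for illustration only and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: three top rows and two bottom rows
# copied from the outputs above. Counts are kept as text, as in the raw file.
raw = pd.DataFrame({
    "Rank": ["1st", "2nd", "3rd", "4,998th", "5,000th"],
    "Grade": ["A++ ", "A++ ", "A++ ", "B+ ", "B+ "],
    "Channel name": ["Zee TV", "T-Series", "Cocomelon - Nursery Rhymes",
                     "Mastersaint", "SehatAQUA"],
    "Video Uploads": ["82757", "12661", "373", "110", "254"],
    "Subscribers": ["18752951", "61196302", "19238251", "3265735", "21172"],
    "Video views": [20869786591, 47548839843, 9793305082, 311758426, 73312511],
})

print(raw.head(2))         # first 2 rows; head() with no argument returns 5
print(raw.tail(2))         # last 2 rows, still in their original order
print(raw.shape)           # (number of rows, number of columns)
print(list(raw.columns))   # the column labels as a plain list
```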
.dtypes
This attribute shows the data type of each column.

In [8]: df.dtypes
Out[8]:
Rank             object
Grade            object
Channel name     object
Video Uploads    object
Subscribers      object
Video views       int64
dtype: object

.unique()
Returns the unique values of a single column.

In [9]: df["Channel name"].unique()
Out[9]: array(['Zee TV', 'T-Series', 'Cocomelon - Nursery Rhymes', ..., 'Mastersaint', 'Bruce McIntosh', 'SehatAQUA'], dtype=object)

.nunique()
Counts the distinct values in every column of the DataFrame.

In [10]: df.nunique()
Out[10]:
Rank             5000
Grade               6
Channel name     4993
Video Uploads    2286
Subscribers      4612
Video views      5000
dtype: int64

.describe()
Shows summary statistics: count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum. Only Video views appears because it is the only numeric column at this point. The other columns are stored as text (object).

In [11]: df.describe()
Out[11]:
       Video views
count  5.000000e+03
mean   1.071449e+09
std    2.003844e+09
min    7.500000e+01
25%    1.862328e+08
50%    4.820548e+08
75%    1.124368e+09
max    4.754884e+10

.value_counts()
Shows every unique value together with the number of times it occurs.

In [12]: df["Channel name"].value_counts()
Out[12]:
Thơ Nguyễn
Various Artists - Topic
Learn Colors For Kids
Super Kids
Funny Vines
                 ..
MeLlananFredy    1
Soosloli PoP     1
sBs A            1
FRRERISL         1
SehatAQUA        1
Name: Channel name, Length: 4993, dtype: int64

Get Overall Statistics about the DataFrame

In [13]: df.describe()
Out[13]:
       Video views
count  5.000000e+03
mean   1.071449e+09
std    2.003844e+09
min    7.500000e+01
25%    1.862328e+08
50%    4.820548e+08
75%    1.124368e+09
max    4.754884e+10

Convert Scientific Notation (e+03, e+09, etc.) into Plain Decimals

In [14]: pd.options.display.float_format = "{:.2f}".format

In [15]: df.describe()
Out[15]:
        Video views
count       5000.00
mean  1071449400.15
std   2003843972.12
min           75.00
25%    186232846.75
50%    482054780.00
75%   1124367826.75
max  47548839843.00

Replace "--" with NaN
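The sketch below shows why the first describe() output lists only Video views. describe() summarises numeric columns by default, and at this stage the count columns are still text. The DataFrame is illustrative, and its repeated Zee TV row was added only so that value_counts() has something to count. The sketch also uses pd.option_context, an alternative to the global setting used above, which keeps the plain-decimal format limited to one block.

```python
import pandas as pd

# Hypothetical sample; "Video Uploads" is text, as in the raw sheet.
df = pd.DataFrame({
    "Channel name": ["Zee TV", "T-Series", "Cocomelon - Nursery Rhymes", "Zee TV"],
    "Video Uploads": ["82757", "12661", "373", "82757"],
    "Video views": [20869786591, 47548839843, 9793305082, 20869786591],
})

summary = df.describe()                   # numeric columns only, by default
print(summary.columns.tolist())           # ['Video views']

print(df.nunique())                       # distinct values per column
print(df["Channel name"].value_counts())  # each name with its frequency

# Plain decimals for this block only; the global display option is untouched.
with pd.option_context("display.float_format", "{:.2f}".format):
    print(summary)
```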
In [16]: df.head(20)
Out[16]:
    Rank  Grade  Channel name                Video Uploads  Subscribers  Video views
0   1st   A++    Zee TV                      82757          18752951     20869786591
1   2nd   A++    T-Series                    12661          61196302     47548839843
2   3rd   A++    Cocomelon - Nursery Rhymes  373            19238251     9793305082
3   4th   A++    SET India                   27323          31180569     22675948293
4   5th   A++    WWE                         36756          32852346     26273688433
5   6th   A++    Movieclips                  30243          17149705     16618094724
6   7th   A++    netd müzik                  8500           11373567     23898730764
7   8th   A++    ABS-CBN Entertainment       100147         12149206     17202609850
8   9th   A++    Ryan ToysReview             1140           16082927     24518098041
9   10th  A++    Zee Marathi                 74807          2841811      2591830307
10  11th  A+     5-Minute Crafts             2085           33492951     8587520379
11  12th  A+     Canal KondZilla             822            39409726     19291034467
12  13th  A+     Like Nastya Vlog            150            7662886      2540099931
13  14th  A+     Ozuna                       50             18824912     8727783225
14  15th  A+     Wave Music                  16119          15899764     10989179147
15  16th  A+     Ch3Thailand                 49239          11569723     9388600275
16  17th  A+     WORLDSTARHIPHOP             4778           15830098     11102158475
17  18th  A+     Vlad and Nikita             53             --           1428274554
18  19th  A+     Badabun                     3060           23603062     5860444053
19  20th  A+     Workpoint Official          24287          17687229     14022189658

In [17]: df = df.replace("--", np.nan, regex=True)

In [18]: df.head(20)
Out[18]:
    Rank  Grade  Channel name                Video Uploads  Subscribers  Video views
0   1st   A++    Zee TV                      82757.00       18752951.00  20869786591
1   2nd   A++    T-Series                    12661.00       61196302.00  47548839843
2   3rd   A++    Cocomelon - Nursery Rhymes  373.00         19238251.00  9793305082
3   4th   A++    SET India                   27323.00       31180569.00  22675948293
4   5th   A++    WWE                         36756.00       32852346.00  26273688433
5   6th   A++    Movieclips                  30243.00       17149705.00  16618094724
6   7th   A++    netd müzik                  8500.00        11373567.00  23898730764
7   8th   A++    ABS-CBN Entertainment       100147.00      12149206.00  17202609850
8   9th   A++    Ryan ToysReview             1140.00        16082927.00  24518098041
9   10th  A++    Zee Marathi                 74807.00       2841811.00   2591830307
10  11th  A+     5-Minute Crafts             2085.00        33492951.00  8587520379
11  12th  A+     Canal KondZilla             822.00         39409726.00  19291034467
12  13th  A+     Like Nastya Vlog            150.00         7662886.00   2540099931
13  14th  A+     Ozuna                       50.00          18824912.00  8727783225
14  15th  A+     Wave Music                  16119.00       15899764.00  10989179147
15  16th  A+     Ch3Thailand                 49239.00       11569723.00  9388600275
16  17th  A+     WORLDSTARHIPHOP             4778.00        15830098.00  11102158475
17  18th  A+     Vlad and Nikita             53.00          NaN          1428274554
18  19th  A+     Badabun                     3060.00        23603062.00  5860444053
19  20th  A+     Workpoint Official          24287.00       17687229.00  14022189658

Check the Missing Values in Our Dataset

In [19]: df.isnull().sum()
Out[19]:
Rank
Grade
Channel name
Video Uploads
Subscribers
Video views
dtype: int64

In [20]: sns.heatmap(df.isnull())
Out[20]:
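The replace-then-check steps can be sketched on two rows taken from the head(20) output. The pd.to_numeric conversion is an addition that does not appear in the notebook. In the notebook, the two count columns already show up as floats after the replace. The sketch also shows errors="coerce" as a one-step alternative that turns any non-numeric text into NaN.

```python
import numpy as np
import pandas as pd

# Two rows from the head(20) output; "--" is the placeholder in the raw sheet.
df = pd.DataFrame({
    "Channel name": ["Vlad and Nikita", "Badabun"],
    "Video Uploads": ["53", "3060"],
    "Subscribers": ["--", "23603062"],
})

# Option 1: swap the placeholder for NaN, then parse the text as numbers.
cleaned = df.replace("--", np.nan)
for col in ["Video Uploads", "Subscribers"]:
    cleaned[col] = pd.to_numeric(cleaned[col])

# Option 2: let pd.to_numeric turn anything unparseable into NaN directly.
coerced = pd.to_numeric(df["Subscribers"], errors="coerce")

print(cleaned.isnull().sum())   # missing values per column
print(cleaned.dtypes)
```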
[Figure: heatmap of df.isnull(), one column per DataFrame column]

In [21]: df.dropna(axis=0, inplace=True)

In [22]: sns.heatmap(df.isnull())
Out[22]:
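dropna(axis=0) removes every row that contains at least one NaN. In the notebook, this step reduces the frame from 5000 rows to the 4610 reported after cleaning. The minimal sketch below uses made-up rows with hypothetical channel names A to D to show the effect. It also shows the subset= variant, which only checks the columns you name. Assigning the result instead of passing inplace=True is a style choice, not a requirement.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: B is missing an upload count, C a subscriber count.
df = pd.DataFrame({
    "Channel name": ["A", "B", "C", "D"],
    "Video Uploads": [10.0, np.nan, 30.0, 40.0],
    "Subscribers": [100.0, 200.0, np.nan, 400.0],
})

before = len(df)
only_subs = df.dropna(subset=["Subscribers"])   # checks one column only
df = df.dropna(axis=0)                          # drops rows with any NaN
print(before - len(df), "rows dropped")         # 2 rows dropped
```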
[Figure: heatmap of df.isnull() after dropping rows with missing values]

Remove the Suffixes (st, nd, rd, th) from the Rank Column

In [23]: df
Out[23]:
      Rank     Grade  Channel name                Video Uploads  Subscribers  Video views
0     1st      A++    Zee TV                      82757.00       18752951.00  20869786591
1     2nd      A++    T-Series                    12661.00       61196302.00  47548839843
2     3rd      A++    Cocomelon - Nursery Rhymes  373.00         19238251.00  9793305082
3     4th      A++    SET India                   27323.00       31180569.00  22675948293
4     5th      A++    WWE                         36756.00       32852346.00  26273688433
...   ...      ...    ...                         ...            ...          ...
4995  4,996th  B+     Uras Benlioğlu              706.00         2072942.00   441202795
4996  4,997th  B+     HI-TECH MUSIC LTD           797.00         1055091.00   377331722
4997  4,998th  B+     Mastersaint                 110.00         3265735.00   311758426
4998  4,999th  B+     Bruce McIntosh              3475.00        2990.00      14563764
4999  5,000th  B+     SehatAQUA                   254.00         21172.00     73312511

4610 rows × 6 columns

In [24]: df["Rank"] = df["Rank"].str[0:-2]

In [25]: df.tail()
Out[25]:
      Rank   Grade  Channel name       Video Uploads  Subscribers  Video views
4995  4,996  B+     Uras Benlioğlu     706.00         2072942.00   441202795
4996  4,997  B+     HI-TECH MUSIC LTD  797.00         1055091.00   377331722
4997  4,998  B+     Mastersaint        110.00         3265735.00   311758426
4998  4,999  B+     Bruce McIntosh     3475.00        2990.00      14563764
4999  5,000  B+     SehatAQUA          254.00         21172.00     73312511

Remove the Commas from the Rank Column

In [26]: df["Rank"] = df["Rank"].str.replace(",", "")

In [27]: df.tail()
Out[27]:
      Rank  Grade  Channel name       Video Uploads  Subscribers  Video views
4995  4996  B+     Uras Benlioğlu     706.00         2072942.00   441202795
4996  4997  B+     HI-TECH MUSIC LTD  797.00         1055091.00   377331722
4997  4998  B+     Mastersaint        110.00         3265735.00   311758426
4998  4999  B+     Bruce McIntosh     3475.00        2990.00      14563764
4999  5000  B+     SehatAQUA          254.00         21172.00     73312511

In [28]: df.dtypes
Out[28]:
Rank              object
Grade             object
Channel name      object
Video Uploads    float64
Subscribers      float64
Video views        int64
dtype: object

In [31]: df["Rank"] = df["Rank"].astype("int")

In [33]: df["Subscribers"] = df["Subscribers"].astype("int")

In [34]: df.dtypes
Out[34]:
Rank               int32
Grade             object
Channel name      object
Video Uploads    float64
Subscribers        int32
Video views        int64
dtype: object

Data Cleaning: "Grade" Column

In [35]: df["Grade"].unique()
Out[35]: array(['A++ ', …, 'B+ '], dtype=object)

In [37]: df["Grade"] = df["Grade"].map({'A++ ': 5, …})

In [38]: df.head()
Out[38]:
   Rank  Grade  Channel name                Video Uploads  Subscribers  Video views
0  1     5.00   Zee TV                      82757.00       18752951     20869786591
1  2     5.00   T-Series                    12661.00       61196302     47548839843
2  3     5.00   Cocomelon - Nursery Rhymes  373.00         19238251     9793305082
3  4     5.00   SET India                   27323.00       31180569     22675948293
4  5     5.00   WWE                         36756.00       32852346     26273688433

Find the Channels with the Most Video Uploads

In [39]: df.columns
Out[39]: Index(['Rank', 'Grade', 'Channel name', 'Video Uploads', 'Subscribers', 'Video views'], dtype='object')

In [43]: df.sort_values(by="Video Uploads", ascending=False).head(5)
Out[43]:
      Rank  Grade  Channel name  Video Uploads  Subscribers  Video views
…     …     NaN    AP Archive    422526.00      746325       548610560
1149  1150  1.00   YTN NEWS      355996.00      820108       1640347646
2223  2224  NaN    SBS Drama     335521.00      1418619      1565758044
323   324   3.00   GMA News      269085.00      2599175      2786949164
2956  2957  NaN    MB            267649.00      1434208      1328208302

Find the Correlation

In [44]: df.corr()
Out[44]:
               Rank   Grade  Video Uploads  Subscribers  Video views
Rank           1.00   -0.88  -0.07          0.38         -0.40
Grade          -0.88  1.00   0.08           0.31         0.38
Video Uploads  -0.07  0.08   1.00           0.01         0.09
Subscribers    0.38   0.31   0.01           1.00         0.79
Video views    -0.40  0.38   0.09           0.79         1.00

Which Grade Has the Highest Average Number of Video Uploads?

In [45]: df.columns
Out[45]: Index(['Rank', 'Grade', 'Channel name', 'Video Uploads', 'Subscribers', 'Video views'], dtype='object')

In [49]: sns.barplot(x="Grade", y="Video Uploads", data=df)
Out[49]:
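The Rank, Grade, sorting and correlation steps can be condensed into one sketch. It uses a few rows copied from the outputs above. The `grade_scale` dictionary is hypothetical, because the notebook's own mapping is cut off after 'A++ ': 5. Series.map turns any label missing from the dictionary into NaN, which is one way NaN grades like those in the sort output can arise.

The sketch also makes a few choices that differ from the notebook:
- A regex that deletes every non-digit is shown as a one-step alternative to str[0:-2] followed by removing commas.
- int64 is requested explicitly because astype("int") can give int32 on Windows, as in the dtypes output above.
- numeric_only=True is passed to corr() because pandas 2.0 and later raise an error on text columns such as Channel name instead of skipping them.

```python
import pandas as pd

df = pd.DataFrame({
    "Rank": ["1st", "2nd", "3rd", "4,998th", "5,000th"],
    "Grade": ["A++ ", "A++ ", "A++ ", "B+ ", "B+ "],
    "Channel name": ["Zee TV", "T-Series", "Cocomelon - Nursery Rhymes",
                     "Mastersaint", "SehatAQUA"],
    "Video Uploads": [82757.0, 12661.0, 373.0, 110.0, 254.0],
    "Subscribers": [18752951.0, 61196302.0, 19238251.0, 3265735.0, 21172.0],
    "Video views": [20869786591, 47548839843, 9793305082, 311758426, 73312511],
})

# Notebook approach: cut the two-letter suffix, then strip the comma.
rank_a = df["Rank"].str[0:-2].str.replace(",", "").astype("int64")
# One-step alternative: delete every character that is not a digit.
rank_b = df["Rank"].str.replace(r"\D", "", regex=True).astype("int64")
df["Rank"] = rank_b

# Hypothetical grade scale (the notebook's own dictionary is cut off).
grade_scale = {"A++ ": 5, "A+ ": 4, "A ": 3, "A- ": 2, "B+ ": 1}
df["Grade"] = df["Grade"].map(grade_scale)   # unknown labels would become NaN
df["Subscribers"] = df["Subscribers"].astype("int64")

top_uploads = df.sort_values(by="Video Uploads", ascending=False).head(3)
corr = df.corr(numeric_only=True)   # the text column "Channel name" is left out
print(top_uploads[["Rank", "Channel name", "Video Uploads"]])
print(corr.round(2))
```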
[Figure: bar plot of Video Uploads by Grade]

Which Grade Has the Highest Average Number of Views?

In [50]: sns.barplot(x="Grade", y="Video views", data=df)
Out[50]:
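By default, seaborn's barplot draws the mean of y for each x category, plus an error bar. The tallest bar therefore answers "highest average", not "highest total". The sketch below computes both with groupby on hypothetical numbers chosen so that the two readings disagree. To plot totals instead, barplot also accepts an estimator argument, for example estimator=np.sum.

```python
import pandas as pd

# Hypothetical data: one big grade-5 channel and four smaller grade-1 channels.
df = pd.DataFrame({
    "Grade": [5, 1, 1, 1, 1],
    "Video Uploads": [300, 100, 100, 100, 100],
    "Video views": [9000, 1000, 1000, 1000, 1000],
})

# What sns.barplot(x="Grade", y="Video Uploads", data=df) plots by default:
mean_uploads = df.groupby("Grade")["Video Uploads"].mean()
# The total per grade, which can rank the grades differently:
total_uploads = df.groupby("Grade")["Video Uploads"].sum()
mean_views = df.groupby("Grade")["Video views"].mean()

print(mean_uploads.idxmax())    # 5
print(total_uploads.idxmax())   # 1
print(mean_views.idxmax())      # 5
```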