915728, 1125 PM
Untitled22-Copy3 - Jupyter Notebook
Instagram Analysis Using Python
The Instagram accounts dataset contains information about each account's followers, profession, and country.
The dataset is sourced from Flexible, a third-party Instagram accounts engine, and is available for free as a Kaggle dataset.
Import Library
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
C:\Users\Syed Arif\anaconda3\lib\site-packages\scipy\__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.1
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Loading the CSV File
df = pd.read_csv(r"C:\Users\Syed Arif\Desktop\instagram.csv")
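The path above is specific to the author's machine. As a minimal sketch, assuming the CSV has the six columns that appear later in the notebook, the same `pd.read_csv` call can be tried on an in-memory sample. The name `sample_csv` is hypothetical, and the three rows are copied from the notebook's own output rather than from the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for instagram.csv: three rows copied from the notebook output.
sample_csv = io.StringIO(
    "Rank,Username,Owner,Followers(millions),Profession/Activity,Country\n"
    "2,@cristiano,Cristiano Ronaldo,594.0,Footballer,Portugal\n"
    "3,@leomessi,Lionel Messi,476.0,Footballer,Argentina\n"
    "50,@gigihadid,Gigi Hadid,78.5,Model,United States\n"
)

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(sample_csv)
print(df.shape)
```

Reading from a buffer like this is only a convenience for experimenting; with the real file, a path relative to the notebook is usually easier to share than an absolute one.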
Data Preprocessing
.head()
head() shows the first rows of the dataset (5 rows by default).
In [4]: df.head()
Out[4]:
   Rank      Username              Owner  Followers(millions)                       Profession/Activity        Country
0     1    @instagram          Instagram                645.0                     Social media platform  United States
1     2    @cristiano  Cristiano Ronaldo                594.0                                Footballer       Portugal
2     3     @leomessi       Lionel Messi                476.0                                Footballer      Argentina
3     4  @selenagomez       Selena Gomez                423.0      Musician, actress, and businesswoman  United States
4     5  @kyliejenner       Kylie Jenner                395.0  Television personality and businesswoman  United States
.tail()
tail() shows the last rows of the dataset (5 rows by default), in their original order.
In [5]: df.tail()
Out[5]:
    Rank         Username          Owner  Followers(millions)                                      Profession/Activity         Country
45    46       @snoopdogg     Snoop Dogg                 80.0                                                 Musician   United States
46    47    @davidbeckham  David Beckham                 79.3  Former footballer, president of MLS club Inter Miami CF  United Kingdom
47    48  @jennierubyjane         Jennie                 79.4                                                 Musician     South Korea
48    49         @khaby00     Khaby Lame                 79.1                                 Social media personality   Italy Senegal
49    50       @gigihadid     Gigi Hadid                 78.5                                                    Model   United States
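Both methods take an optional row count. The sketch below uses a hypothetical ten-row frame (not the Instagram data) to show that `tail(n)` returns the last `n` rows without reversing them.

```python
import pandas as pd

# Hypothetical ten-row frame; only the Rank column matters here.
ranks = pd.DataFrame({"Rank": range(1, 11)})

first_three = ranks.head(3)   # rows with Rank 1, 2, 3
last_three = ranks.tail(3)    # rows with Rank 8, 9, 10, still in ascending order
print(first_three["Rank"].tolist(), last_three["Rank"].tolist())
```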
.shape
It shows the total number of rows and columns in the dataset.
In [6]: df.shape
Out[6]: (50, 6)
.columns
It shows the name of each column.
In [7]: df.columns
Out[7]: Index(['Rank', 'Username', 'Owner', 'Followers(millions)',
       'Profession/Activity', 'Country'],
      dtype='object')
.dtypes
This attribute shows the data type of each column.
In [8]: df.dtypes
Out[8]:
Rank                     int64
Username                object
Owner                   object
Followers(millions)    float64
Profession/Activity     object
Country                 object
dtype: object
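The three attributes above can be combined when inspecting a new frame. A minimal sketch, using a hypothetical two-row sample with three of the notebook's columns: `shape` unpacks into row and column counts, and `select_dtypes` picks out the numeric columns that `dtypes` reports as `int64` or `float64`.

```python
import pandas as pd

# Hypothetical two-row sample with three of the notebook's columns.
sample = pd.DataFrame({
    "Rank": [2, 3],
    "Owner": ["Cristiano Ronaldo", "Lionel Messi"],
    "Followers(millions)": [594.0, 476.0],
})

n_rows, n_cols = sample.shape   # shape is a (rows, columns) tuple
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print(n_rows, n_cols, numeric_cols)
```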
.unique()
It shows the unique values of a specific column.
In [9]: df["Country"].unique()
Out[9]: array(['United States', 'Portugal', 'Argentina', 'Canada', 'India',
       'Trinidad and Tobago\xa0United States', 'Brazil', 'Barbados',
       'Spain', 'Europe', 'France', 'Israel', 'Thailand',
       'United Kingdom\xa0Albania', 'Colombia', 'United States\xa0Canada',
       'United Kingdom', 'South Korea', 'Italy\xa0Senegal'], dtype=object)
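Several of these values join two countries with a non-breaking space (`\xa0`), so they are counted as separate categories. The notebook does not clean them; the following is only a sketch of one possible approach, assuming each `\xa0` separates two country names. It uses three values taken from the output above.

```python
import pandas as pd

# Three Country values taken from the unique() output above.
country = pd.Series(["United States", "United States\xa0Canada", "Italy\xa0Senegal"])

# Split on the non-breaking space, then give each country its own row.
per_country = country.str.split("\xa0").explode()
counts = per_country.value_counts()
print(counts)
```

Whether a dual-country account should count once for each country is a modelling choice, not something the dataset settles.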
.nunique()
It shows the number of unique values in each column of the DataFrame.
In [10]: df.nunique()
Out[10]:
Rank                   50
Username               50
Owner                  50
Followers(millions)    47
Profession/Activity    30
Country                19
dtype: int64
.describe()
It shows the count, mean, median (the 50% row), etc. for each numeric column.
In [11]: df.describe()
Out[11]:
           Rank  Followers(millions)
count  50.00000            50.000000
mean   25.50000           208.976000
std    14.57738           136.262832
min     1.00000            78.500000
25%    13.25000            96.850000
50%    25.50000           161.000000
75%    37.75000           288.000000
max    50.00000           645.000000
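The median does not get its own label in this output; it is the `50%` row. A small sketch on hypothetical follower counts (not the real dataset) shows the correspondence.

```python
import pandas as pd

# Hypothetical follower counts (in millions), not the real dataset.
followers = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0], name="Followers(millions)")

summary = followers.describe()
# The "50%" entry is the median; "25%" and "75%" are the other quartiles.
print(summary["50%"], followers.median())
```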
.value_counts()
It shows all the unique values with their counts.
In [12]: df["Country"].value_counts()
Out[12]: United States 2
India
Canada
Spain
Israel
South Korea
United Kingdom
United States Canada
Colombia
United Kingdom Albania
Thailand
Europe
France
Portugal
Barbados
Brazil
Trinidad and Tobago United States
Argentina
Italy Senegal
Name: Country, dtype: int64
.isnull()
It marks each value as null (True) or not null (False).
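Because `isnull()` returns one Boolean per cell, a per-column total is often quicker to read than the full mask. This is a sketch on a hypothetical three-row sample with one missing value, not a step taken in the notebook.

```python
import pandas as pd

# Hypothetical sample with one missing Country value.
sample = pd.DataFrame({
    "Owner": ["A", "B", "C"],
    "Country": ["Portugal", None, "United States"],
})

null_counts = sample.isnull().sum()        # True counts as 1, so this is nulls per column
any_missing = sample.isnull().any().any()  # True if any cell in the frame is null
print(null_counts.to_dict(), any_missing)
```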
In [13]: df.isnull()
Out[13]:
     Rank  Username  Owner  Followers(millions)  Profession/Activity  Country
0   False     False  False                False                False    False
1   False     False  False                False                False    False
2   False     False  False                False                False    False
3   False     False  False                False                False    False
4   False     False  False                False                False    False
5   False     False  False                False                False    False
6   False     False  False                False                False    False
7   False     False  False                False                False    False
8   False     False  False                False                False    False
9   False     False  False                False                False    False
10  False     False  False                False                False    False
11  False     False  False                False                False    False
12  False     False  False                False                False    False
13  False     False  False                False                False    False
14  False     False  False                False                False    False
15  False     False  False                False                False    False
16  False     False  False                False                False    False
17  False     False  False                False                False    False
18  False     False  False                False                False    False
19  False     False  False                False                False    False
20  False     False  False                False                False    False
21  False     False  False                False                False    False
22  False     False  False                False                False    False
23  False     False  False                False                False    False
24  False     False  False                False                False    False
25  False     False  False                False                False    False
26  False     False  False                False                False    False
27  False     False  False                False                False    False
28  False     False  False                False                False    False
29  False     False  False                False                False    False
30  False     False  False                False                False    False
31  False     False  False                False                False    False
32  False     False  False                False                False    False
33  False     False  False                False                False    False
34  False     False  False                False                False    False
35  False     False  False                False                False    False
36  False     False  False                False                False    False
37  False     False  False                False                False    False
38  False     False  False                False                False    False
39  False     False  False                False                False    False
40  False     False  False                False                False    False
41  False     False  False                False                False    False
42  False     False  False                False                False    False
43  False     False  False                False                False    False
44  False     False  False                False                False    False
45  False     False  False                False                False    False
46  False     False  False                False                False    False
47  False     False  False                False                False    False
48  False     False  False                False                False    False
49  False     False  False                False                False    False
In [14]: sns.heatmap(df.isnull())
Out[14]:
[Figure: heatmap of df.isnull(); color bar ticks run from -0.100 to 0.100]
Top 10 Performers with the Highest Number of Followers
In [15]: top_10_followers = df.sort_values(by="Followers(millions)", ascending=False).head(10)
         top_10_followers
Out[15]:
   Rank      Username              Owner  Followers(millions)                       Profession/Activity        Country
0     1    @instagram          Instagram                645.0                     Social media platform  United States
1     2    @cristiano  Cristiano Ronaldo                594.0                                Footballer       Portugal
2     3     @leomessi       Lionel Messi                476.0                                Footballer      Argentina
3     4  @selenagomez       Selena Gomez                423.0      Musician, actress, and businesswoman  United States
4     5  @kyliejenner       Kylie Jenner                395.0  Television personality and businesswoman  United States
5 5 anewes Bin sis Meorantoceat at
67 rnense Set sso Meccan, Sate
. 8 beyonce Beyoncé s129 businesswoman States
9 10 @ktloekardashion apg KOS 009 Tdvisionprsonaliy and Unted
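A top-N selection like this one can also be written with `DataFrame.nlargest`. The sketch below, on hypothetical accounts and follower counts, compares it with sorting in descending order and taking the first rows; it is an alternative spelling, not the notebook's own code.

```python
import pandas as pd

# Hypothetical accounts and follower counts, not the real dataset.
accounts = pd.DataFrame({
    "Username": ["@a", "@b", "@c", "@d"],
    "Followers(millions)": [120.0, 300.0, 80.0, 210.0],
})

# Two ways to get the two most-followed accounts.
top_sorted = accounts.sort_values(by="Followers(millions)", ascending=False).head(2)
top_nlargest = accounts.nlargest(2, "Followers(millions)")
print(top_nlargest["Username"].tolist())
```

With tied follower counts the two approaches may order the tied rows differently, so the comparison here uses distinct values.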
Correlation between Followers and Rank
In [16]: correlation = df['Followers(millions)'].corr(df['Rank'])
         correlation
Out[16]: -0.9121488888123581
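`Series.corr` computes the Pearson coefficient by default, and the value is negative here because a larger rank number means fewer followers. As a sketch on hypothetical numbers (not the real dataset), the same coefficient can be written out by hand and compared with the pandas result.

```python
import numpy as np
import pandas as pd

# Hypothetical ranks and follower counts, not the real dataset.
rank = pd.Series([1, 2, 3, 4, 5])
followers = pd.Series([600.0, 450.0, 400.0, 250.0, 200.0])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy the deviations from the mean.
dx = rank - rank.mean()
dy = followers - followers.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = followers.corr(rank)   # Series.corr defaults to Pearson
print(round(r_manual, 4), round(r_pandas, 4))
```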
Average Number of Followers
In [17]: average_followers = df['Followers(millions)'].mean()
         average_followers
Out[17]: 208.976
Countries with the Most Performers
In [18]: country_counts = df['Country'].value_counts()
country_counts
Out[18]: United States 2
India
Canada
Spain
Israel
South Korea
United Kingdom
United States Canada
Colombia
United Kingdom Albania
Thailand
Europe
France
Portugal
Barbados
Brazil
Trinidad and Tobago United States
Argentina
Italy Senegal
Name: Country, dtype: int64
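Raw counts can be turned into shares with the `normalize` parameter of `value_counts`. A minimal sketch on a hypothetical Country column (not the real dataset):

```python
import pandas as pd

# Hypothetical Country column, not the real dataset.
country = pd.Series(["United States", "India", "United States", "Spain"])

shares = country.value_counts(normalize=True)   # fractions that sum to 1
print(shares)
```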
Most Common Professions
In [19]: common_professions = df['Profession/Activity'].value_counts()
         common_professions
Out[19]:
Musician
Musician and actress
Footballer
Television personality and model
Actress
Musician and businesswoman
Football club
Club football competition
Footballer at Paris Saint-Germain
Social media platform
Actor
Space agency
Actress and musician
Professional basketball league
Former footballer, president of MLS club Inter Miami CF
Social media personality
Comedian and television personality
Actress and singer
Basketball player
Comedian and actor
Cricketer
Magazine
Model and television personality
Sportswear multinational
Television personality, model and businesswoman
Musician, actress and businesswoman
Actor and professional wrestler
Television personality and businesswoman
Musician, actress, and businesswoman
Model
Name: Profession/Activity, dtype: int64
Relationship Between Rank and Follower Count
In [20]: import matplotlib.pyplot as plt
         plt.scatter(df['Rank'], df['Followers(millions)'])
         plt.xlabel('Rank')
         plt.ylabel('Followers (millions)')
         plt.title('Rank vs. Follower Count')
         plt.show()
[Figure: scatter plot titled "Rank vs. Follower Count"; x-axis: Rank, y-axis: Followers (millions)]
In [21]: df.Country.value_counts().plot(kind="bar")
Out[21]:
[Figure: bar chart of the number of accounts per country]
Show all the records where Owner == Cristiano Ronaldo
In [27]: df[df['Owner'] == "Cristiano Ronaldo"]
Out[27]:
   Rank    Username              Owner  Followers(millions)  Profession/Activity   Country
1     2  @cristiano  Cristiano Ronaldo                594.0           Footballer  Portugal
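The same Boolean-mask pattern extends to other columns and to combined conditions. The sketch below uses three rows whose values appear in the notebook output; the 500-million threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Three rows whose values appear in the notebook output above.
sample = pd.DataFrame({
    "Owner": ["Cristiano Ronaldo", "Lionel Messi", "Gigi Hadid"],
    "Profession/Activity": ["Footballer", "Footballer", "Model"],
    "Followers(millions)": [594.0, 476.0, 78.5],
})

footballers = sample[sample["Profession/Activity"] == "Footballer"]
# Combine conditions with & and wrap each one in parentheses.
big_footballers = sample[(sample["Profession/Activity"] == "Footballer")
                         & (sample["Followers(millions)"] > 500)]
print(big_footballers["Owner"].tolist())
```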