AS Quiz 3 PCA Solution

This document shows code for performing principal component analysis (PCA) on a Facebook post engagement dataset. It loads and preprocesses the data, applies PCA to extract six components, and answers questions about the analysis results, including eigenvectors and eigenvalues.


In [1]: import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [5]: df = pd.read_csv("FB-1 (1).csv")
df.head()

Out[5]:                           status_id  num_reactions  num_comments  num_shares  num_likes  num_loves  num_wows  num_hahas  num_sads
0          246675545449582_1649696485147474            529           512         262        432         92         3          1         1
1          246675545449582_1649426988507757            150             0           0        150          0         0          0         0
2          246675545449582_1648730588577397            227           236          57        204         21         1          1         0
3          246675545449582_1648576705259452            111             0           0        111          0         0          0         0
4          246675545449582_1645700502213739            213             0           0        204          9         0          0         0

Q5) Which of the variables in the dataset is not significant for doing Principal Component Analysis?

Let us drop the column 'status_id': it is only a post identifier, so it carries no information useful for Principal Component Analysis.
In [6]: df_new = df.drop(['status_id'],axis = 1)
df_new.head()

Out[6]:    num_reactions  num_comments  num_shares  num_likes  num_loves  num_wows  num_hahas  num_sads  num_angrys  status_link  ...
0                    529           512         262        432         92         3          1         1           0            0  ...
1                    150             0           0        150          0         0          0         0           0            0  ...
2                    227           236          57        204         21         1          1         0           0            0  ...
3                    111             0           0        111          0         0          0         0           0            0  ...
4                    213             0           0        204          9         0          0         0           0            0  ...

In [ ]:

Q6) After z-score scaling the dataset, what is the value of the 2nd observation of the variable 'num_hahas'?
In [7]: from scipy.stats import zscore
df_new=df_new.apply(zscore)
df_new.head()

Out[7]:    num_reactions  num_comments  num_shares  num_likes  num_loves  num_wows  num_hahas  num_sads  num_angrys  status_link  status_phot...
0               0.646104      0.323350    1.686879   0.482727   1.983266  0.196196   0.076713  0.473570   -0.155748    -0.094957     -1.24599...
1              -0.173192     -0.252206   -0.304144  -0.144720  -0.318454 -0.147879  -0.176010 -0.152587   -0.155748    -0.094957      0.80257...
2              -0.006738      0.013089    0.129017  -0.024571   0.206938 -0.033187   0.076713 -0.152587   -0.155748    -0.094957     -1.24599...
3              -0.257499     -0.252206   -0.304144  -0.231495  -0.318454 -0.147879  -0.176010 -0.152587   -0.155748    -0.094957      0.80257...
4              -0.037003     -0.252206   -0.304144  -0.024571  -0.093286 -0.147879  -0.176010 -0.152587   -0.155748    -0.094957      0.80257...

ANS - The value of the 2nd observation of the variable 'num_hahas' is -0.176010.
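For intuition, the same scaling can be reproduced by hand with the z-score formula z = (x - mean) / std. The cell below is an illustrative sketch, not part of the original solution; it rebuilds the unscaled frame from `df` and relies on the fact that scipy.stats.zscore uses the population standard deviation (ddof=0).

In [ ]: # Illustrative cross-check of the z-score scaling (not part of the original solution)
# z = (x - mean) / std; scipy.stats.zscore defaults to the population std (ddof=0)
raw = df.drop(['status_id'], axis=1)
manual_z = (raw - raw.mean()) / raw.std(ddof=0)
manual_z['num_hahas'].iloc[1]   # 2nd observation (index 1); should equal -0.176010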

In [ ]:

Q7) Apply PCA using all features, extract 6 components, and find the eigenvector of the 5th component.
In [9]: #Apply PCA taking all features
from sklearn.decomposition import PCA
pca = PCA(n_components=6, random_state=123)
pca_transformed = pca.fit_transform(df_new)

In [10]: #Extract eigen vectors
pca.components_

Out[10]: array([[ 0.29363054,  0.34749787,  0.44325444,  0.2517696 ,  0.46125508,
                  0.29634039,  0.30885435,  0.16313058,  0.23724676, -0.00138341,
                 -0.23261371,  0.01379735],
                [ 0.60664114, -0.230746  , -0.20491048,  0.6406539 , -0.16591724,
                  0.01626203, -0.13903343, -0.11041549, -0.12687418,  0.06418546,
                  0.03655064,  0.21318874],
                [ 0.11200241, -0.087548  , -0.00392859,  0.10570202,  0.05181555,
                  0.21154873,  0.101801  , -0.04987934,  0.08923166, -0.23521304,
                  0.64341911, -0.65653464],
                [ 0.00104601, -0.01595734,  0.03483879, -0.00173808,  0.03336338,
                  0.03375172,  0.01780145, -0.25206584, -0.042459  ,  0.89259956,
                 -0.07188694, -0.35877103],
                [ 0.08189114,  0.1862877 , -0.06986598,  0.1020903 , -0.13942737,
                 -0.37729947, -0.13429183,  0.81640504,  0.12355741,  0.20996813,
                  0.11148599, -0.15861432],
                [-0.08520722, -0.43754044, -0.19674073, -0.09669555, -0.00487   ,
                  0.37224941,  0.05770028,  0.17312055,  0.66984295,  0.19021436,
                  0.12087046,  0.28612412]])
ANS - The eigenvector of the 5th component is [ 0.08189114, 0.1862877, -0.06986598, 0.1020903, -0.13942737, -0.37729947, -0.13429183, 0.81640504, 0.12355741, 0.20996813, 0.11148599, -0.15861432].
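Because `pca.components_` stores one eigenvector per row, ordered PC1, PC2, ..., the 5th component's eigenvector can also be read off by row index. A small sketch using the `pca` object fitted in In [9]:

In [ ]: # The 5th principal component is row index 4 of pca.components_
pca.components_[4]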

In [ ]:

Q8) What is the eigenvector associated with the Second variable?


ANS - [ 0.60664114, -0.230746, -0.20491048, 0.6406539, -0.16591724, 0.01626203, -0.13903343, -0.11041549, -0.12687418, 0.06418546, 0.03655064, 0.21318874]
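The array reported here is the second row of `pca.components_`, i.e. the eigenvector (loadings) of the 2nd principal component. A small sketch:

In [ ]: # The 2nd principal component is row index 1 of pca.components_
pca.components_[1]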

In [ ]:

Q9) Using the scaled dataset, find the eigenvalues.


In [11]: #Check the eigen values
#Note: This is always returned in descending order
pca.explained_variance_

Out[11]: array([3.596288  , 1.78479109, 1.2511225 , 1.02089676, 0.95528279,
                0.84959164])
ANS - The eigenvalues are [3.596288, 1.78479109, 1.2511225, 1.02089676, 0.95528279, 0.84959164].
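As an optional cross-check (not part of the original solution), the same eigenvalues can be obtained from the sample covariance matrix of the scaled data, since `explained_variance_` is computed with ddof=1 internally:

In [ ]: # Cross-check: eigenvalues of the sample covariance matrix of the scaled data
cov = np.cov(df_new.values, rowvar=False)    # np.cov uses ddof=1 by default
np.linalg.eigvalsh(cov)[::-1][:6]            # sorted descending; compare with pca.explained_variance_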

In [ ]:

Q10) Using the given dataset, what are the explained variances?


In [12]: #Check the explained variance for each PC
#Note: Explained variance = (eigen value of each PC)/(sum of eigen values of all PCs)
pca.explained_variance_ratio_

Out[12]: array([0.29964816, 0.14871149, 0.10424542, 0.08506266, 0.07959561,
                0.07078926])
ANS - The explained variance ratios are [0.29964816, 0.14871149, 0.10424542, 0.08506266, 0.07959561, 0.07078926].
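Following the formula in the comment above, the ratios can also be reproduced by dividing each eigenvalue by the total variance of the scaled data, i.e. the sum of the eigenvalues of all 12 PCs, not just the 6 that were kept. A small sketch (an illustration, not part of the original solution):

In [ ]: # Explained variance ratio = eigenvalue of each PC / total variance of all features
total_var = df_new.var(ddof=1).sum()     # equals the sum of the eigenvalues over all PCs
pca.explained_variance_ / total_var      # should match pca.explained_variance_ratio_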

In [ ]:
