
task2_random-data

February 24, 2023

1 Task 2: Random Data?


1.1 Question
I ran the following code for a binary classification task with an SVM in both R (first
example) and Python (second example).
Given randomly generated data (X) and responses (Y), this code performs leave-group-out
cross-validation 1000 times. Each entry of the resulting prediction vector is therefore
the mean prediction for that row across the CV iterations in which it was held out.
Computing the area under the ROC curve should give ~0.5, since X and Y are completely
random. However, this is not what we see: the area under the curve is frequently
significantly higher than 0.5. The number of rows of X is very small, which can obviously
cause problems.
Any idea what could be happening here? I know that I can either increase the number
of rows of X or decrease the number of columns to mitigate the problem, but I am
looking for other issues.
Y=as.factor(rep(c(1,2), times=14))
X=matrix(runif(length(Y)*100), nrow=length(Y))

library(e1071)
library(pROC)

colnames(X)=1:ncol(X)
iter=1000
ansMat=matrix(NA,length(Y),iter)
for(i in seq(iter)){
    # get train
    train=sample(seq(length(Y)),0.5*length(Y))
    if(min(table(Y[train]))==0)
        next

    # test from train
    test=seq(length(Y))[-train]

    # train model
    XX=X[train,]
    YY=Y[train]
    mod=svm(XX,YY,probability=FALSE)
    XXX=X[test,]
    predVec=predict(mod,XXX)
    RFans=attr(predVec,'decision.values')
    ansMat[test,i]=as.numeric(predVec)
}

ans=rowMeans(ansMat,na.rm=TRUE)

r=roc(Y,ans)$auc
print(r)
When I implement the same thing in Python, I get similar results.

[10]: import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

Y = np.array([1, 2]*14)
X = np.random.uniform(size=[len(Y), 100])
n_iter = 1000
ansMat = np.full((len(Y), n_iter), np.nan)
for i in range(n_iter):
    # Get train/test index
    train = np.random.choice(range(len(Y)), size=int(0.5*len(Y)), replace=False, p=None)
    # skip iterations where the training set contains only one class
    # (checks Y[train], not Y, matching the R version)
    if len(np.unique(Y[train])) == 1:
        continue
    test = np.array([j for j in range(len(Y)) if j not in train])
    # train model
    mod = SVC(probability=False)
    mod.fit(X=X[train, :], y=Y[train])
    # predict and collect answer
    ansMat[test, i] = mod.predict(X[test, :])
ans = np.nanmean(ansMat, axis=1)
fpr, tpr, thresholds = roc_curve(Y, ans, pos_label=1)
print(auc(fpr, tpr))

0.8367346938775511
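A natural follow-up check (a minimal sketch, not from the original question; the helper
name run_experiment and the repetition counts are my own) is to wrap the whole Python
pipeline in a function and rerun it on fresh random data many times, to see how wide
the null distribution of this averaged-prediction AUC actually is:

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

def run_experiment(n_iter=1000, n_rows=28, n_cols=100, seed=None):
    """One full run of the pipeline above on fresh random data.

    Returns the single AUC computed from row-wise averaged predictions.
    """
    rng = np.random.default_rng(seed)
    Y = np.array([1, 2] * (n_rows // 2))
    X = rng.uniform(size=(len(Y), n_cols))
    ansMat = np.full((len(Y), n_iter), np.nan)
    for i in range(n_iter):
        train = rng.choice(len(Y), size=len(Y) // 2, replace=False)
        if len(np.unique(Y[train])) == 1:
            continue
        test = np.setdiff1d(np.arange(len(Y)), train)
        mod = SVC(probability=False)
        mod.fit(X[train, :], Y[train])
        ansMat[test, i] = mod.predict(X[test, :])
    ans = np.nanmean(ansMat, axis=1)
    fpr, tpr, _ = roc_curve(Y, ans, pos_label=1)
    return auc(fpr, tpr)

# X and Y are independent by construction, so an unbiased procedure
# would concentrate these AUC values near 0.5; inspect the spread.
aucs = [run_experiment(n_iter=200, seed=s) for s in range(30)]
print(np.mean(aucs), np.percentile(aucs, [5, 95]))

For reference, even a plain empirical AUC on 14-vs-14 independent scores has a null
standard deviation of roughly sqrt((n1+n2+1)/(12*n1*n2)) ≈ 0.11, so single runs well
away from 0.5 are expected to be common at this sample size.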

1.2 Your answer


[ ]:

[ ]:

1.3 Feedback
Was this exercise difficult or not? In either case, briefly describe why.

[ ]:

[ ]:
