Alam-Proj2
Tamim Alam_80764318
2025-02-28
Loading Necessary Libraries
library(dplyr)
library(ggplot2)
library(caret)
Ques 1
# Read the dataset (assuming the file is named "Shill Bidding Dataset.csv")
df <- read.csv("Shill Bidding Dataset.csv")
Ques 2a
# Compute the number of unique values for each column
distinct_counts <- sapply(df, function(x) length(unique(x)))
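The specific counts cited in the comment below can be inspected directly, for example:
# Display the unique-value counts in ascending order
sort(distinct_counts)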
Comment:
1. Binary variable: Class (2 unique values) is the target variable, with -1 (normal behavior) and 1 (otherwise). This is a binary classification problem.
2. Variables with very few unique values (likely categorical or discrete):
- Successive_Outbidding (3 unique values): describes whether a bidder successively outbids themselves, so it is likely ordinal (e.g., low, medium, high frequency).
- Auction_Duration (5 unique values): auction durations are limited to a few possible values, so this is likely categorical or discrete numeric.
3. Variables with a moderate number of unique values (likely discrete numeric):
- Auction_Bids (49 unique values): auctions with shill bidding tend to have higher bid counts; since bid counts are whole numbers, this is discrete numeric.
- Starting_Price_Average (22 unique values): the number of distinct starting prices is relatively small, meaning auctions often begin at specific price points rather than arbitrary values; discrete numeric with limited granularity.
4. Variables with many unique values (likely continuous):
- Bidder_Tendency (489 unique values): measures how concentrated a bidder's activity is within a small group of sellers; the many distinct values suggest it is continuous.
- Bidding_Ratio (400 unique values): indicates how frequently a bidder participates; its high variation suggests it is continuous.
- Last_Bidding (5807 unique values): measures inactivity toward the end of the auction; the large number of values suggests a continuous variable.
- Early_Bidding (5690 unique values): indicates whether a bidder places bids early in an auction; another continuous variable.
- Winning_Ratio (72 unique values): represents how often a bidder wins; continuous but with a somewhat limited range.
Ques 2b
# Check for missing values in the dataset
missing_values <- colSums(is.na(df))
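The parallel boxplot referenced in the comment below is produced by code elided here; a minimal sketch, assuming base R boxplot() over the raw predictors:
# Parallel boxplots of the raw numeric predictors (Class excluded)
boxplot(df %>% select(-Class), las = 2, cex.axis = 0.7,
        main = "Parallel Boxplot of Raw Features")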
Comment: The parallel boxplot clearly shows that Auction_Duration has a much larger range than the other variables. Most other features are tightly packed near zero, meaning they contribute much less in their raw form. Some predictors, like Successive_Outbidding, have a small range with a few outliers; others, like Auction_Bids and Winning_Ratio, have a moderate range.
Kernel-based methods like SVM are sensitive to feature scales. Without scaling, features with larger magnitudes (e.g., Auction_Duration) dominate distance and kernel computations and therefore influence the model more than features with smaller values (e.g., Bidding_Ratio). Scaling ensures all features contribute comparably. Since we plan to use SVM (which relies on the kernel trick), scaling is mandatory to avoid feature dominance and to improve model performance.
Ques 2c
# Identify numeric columns (excluding Class)
numeric_features <- df %>% select(-Class)
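The df_scaled object used in later chunks is constructed in code elided here; a minimal sketch, assuming base R scale() for standardization (the ".V1" column suffixes in the summary discussed below suggest the author's exact call passed through an intermediate matrix conversion, so the column naming may differ slightly):
# Standardize each predictor to mean 0 and sd 1; re-attach Class unscaled
df_scaled <- as.data.frame(scale(numeric_features))
df_scaled$Class <- df$Class
summary(df_scaled)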
Comment: The mean of every scaled variable is approximately 0, the standard deviation of every feature is 1, and the values are now centered around 0, confirming that standardization was applied correctly.
Most features now fall within the range -1 to 1, though some values extend beyond it due to outliers. For example, Bidder_Tendency.V1 has min = -0.72 and max = 4.35, while Winning_Ratio.V1 has min = -0.84 and max = 1.44. Some features therefore contain outliers, but the transformation brings them all onto a comparable scale.
The Class variable still has only two unique values (-1 and 1), which confirms that the target variable was not mistakenly scaled. Successive_Outbidding and Auction_Duration were also standardized: even though these features have only a few unique values, scaling them is fine for SVM or logistic regression, as it prevents numerical instability.
Ques 2d
# Create a bar plot for Class distribution
ggplot(df_scaled, aes(x = as.factor(Class))) +
  geom_bar(fill = "steelblue") +
  labs(title = "Class Distribution", x = "Class", y = "Count") +
  theme_minimal()
Ques 3
# First, split into training (50%) and temp (50%) (valid + test)
train_ratio <- 0.5
train_index <- createDataPartition(df_scaled$Class, p = train_ratio, list = FALSE)
train_data <- df_scaled[train_index, ]
temp_data <- df_scaled[-train_index, ]
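The valid_data and test_data sets are used below but their construction is elided; a minimal sketch, splitting temp_data evenly with the same createDataPartition() approach:
# Split the held-out half evenly into validation (25%) and test (25%)
valid_index <- createDataPartition(temp_data$Class, p = 0.5, list = FALSE)
valid_data <- temp_data[valid_index, ]
test_data <- temp_data[-valid_index, ]
dim(train_data); dim(valid_data); dim(test_data)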
## [1] 3161 10
## [1] 1580 10
## [1] 1580 10
Ques 4a
# Pool the training and validation datasets
D_prime <- rbind(train_data, valid_data)
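The tabulation below uses beta_hat, se, z_scores, and p_values from a manual fit whose code is elided; a minimal sketch, assuming optim() minimizes the binomial negative log-likelihood and standard errors come from the inverse Hessian (Class assumed coded 0/1 at this stage):
# Design matrix with intercept; response assumed coded 0/1
X <- model.matrix(Class ~ ., data = D_prime)
y <- D_prime$Class

# Negative log-likelihood of the logistic regression model
neg_loglik <- function(beta) {
  eta <- X %*% beta
  sum(log(1 + exp(eta)) - y * eta)
}

fit <- optim(rep(0, ncol(X)), neg_loglik, method = "BFGS", hessian = TRUE)
beta_hat <- fit$par
se <- sqrt(diag(solve(fit$hessian)))  # SEs from the inverse Hessian of the NLL
z_scores <- beta_hat / se
p_values <- 2 * pnorm(-abs(z_scores))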
# Tabulate results
results <- data.frame(
  Coefficient = beta_hat,
  Std_Error = se,
  Z_Score = z_scores,
  P_Value = p_values
)
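Ques 4b
The summary output below comes from R's built-in IRLS fitter; presumably a call along these lines:
# Fit the same logistic regression with glm() on the pooled data
glm_fit <- glm(Class ~ ., family = binomial, data = D_prime)
summary(glm_fit)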
##
## Call:
## glm(formula = Class ~ ., family = binomial, data = D_prime)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.841774 0.493919 -13.852 < 2e-16 ***
## Bidder_Tendency 0.067916 0.115924 0.586 0.558
## Bidding_Ratio 0.004853 0.137284 0.035 0.972
## Successive_Outbidding 3.103406 0.199467 15.558 < 2e-16 ***
## Last_Bidding 0.466863 0.323492 1.443 0.149
## Auction_Bids 0.134472 0.198016 0.679 0.497
## Starting_Price_Average 0.048412 0.167136 0.290 0.772
## Early_Bidding -0.413480 0.321724 -1.285 0.199
## Winning_Ratio 2.485606 0.315598 7.876 3.38e-15 ***
## Auction_Duration 0.138105 0.130763 1.056 0.291
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3220.28 on 4740 degrees of freedom
## Residual deviance: 422.68 on 4731 degrees of freedom
## AIC: 442.68
##
## Number of Fisher Scoring iterations: 9
Comment: The intercept is highly negative, meaning the baseline probability of shill bidding (when all predictors are 0) is very low. Successive_Outbidding has a very large positive coefficient (3.1034, p < 2e-16), making it the strongest predictor of shill bidding. Winning_Ratio also has a large positive coefficient (2.4856, p = 3.38e-15), meaning bidders with a high winning ratio are more likely to be shill bidders.
optim() found only one significant predictor, while glm() found two (Successive_Outbidding and Winning_Ratio) and produced more stable standard errors.
Ques 4c
# Convert Class from {0,1} to {-1,1}
D_prime$Class <- ifelse(D_prime$Class == 0, -1, 1)
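The y_pred and y_actual objects below come from elided prediction code; a minimal sketch, assuming the glm_fit model from above and a 0.5 probability threshold mapped into the {-1, 1} coding:
# Predicted probabilities on the test set, thresholded at 0.5
p_hat <- predict(glm_fit, newdata = test_data, type = "response")
y_pred <- ifelse(p_hat > 0.5, 1, -1)
y_actual <- test_data$Class  # assumed to use the same {-1, 1} coding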
# Compute accuracy
accuracy <- mean(y_pred == y_actual)
# Print accuracy
print(paste("Test Accuracy:", round(accuracy * 100, 2), "%"))
Comment: The logistic regression model performs very poorly on the test set: 9.68% accuracy is much worse than random guessing (~50% for a balanced dataset). Accuracy this far below chance usually points to a label-coding mismatch between y_pred and y_actual (e.g., {0, 1} versus {-1, 1}) rather than a genuinely uninformative model, so the coding is worth rechecking before drawing conclusions.
Ques 5a
# Extract predictor matrices for training (X1), validation (X2), and test (X3)
X1 <- as.matrix(train_data %>% select(-Class))
X2 <- as.matrix(valid_data %>% select(-Class))
X3 <- as.matrix(test_data %>% select(-Class))
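The "Primitive LDA" fit itself is elided; a minimal sketch of an equal-priors LDA rule (class means plus a pooled within-class covariance, classifying by Mahalanobis distance), assuming Class is coded {-1, 1}:
# Class means on the training data
y1 <- train_data$Class
mu_pos <- colMeans(X1[y1 == 1, ])
mu_neg <- colMeans(X1[y1 == -1, ])

# Pooled within-class covariance and its inverse
n_pos <- sum(y1 == 1); n_neg <- sum(y1 == -1)
S_pooled <- ((n_pos - 1) * cov(X1[y1 == 1, ]) +
             (n_neg - 1) * cov(X1[y1 == -1, ])) / (n_pos + n_neg - 2)
S_inv <- solve(S_pooled)

# Assign each row to the class whose mean is closer in Mahalanobis distance
lda_predict <- function(X) {
  d_pos <- mahalanobis(X, mu_pos, S_inv, inverted = TRUE)
  d_neg <- mahalanobis(X, mu_neg, S_inv, inverted = TRUE)
  ifelse(d_pos < d_neg, 1, -1)
}
mean(lda_predict(X3) == test_data$Class)  # test accuracy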
Comment: The Primitive LDA classifier achieved 97.72% accuracy, which is significantly
better than our earlier logistic regression model (~9.68% accuracy). This suggests that
LDA’s linear decision boundary works well for this dataset.
Ques 5b
# Define Polynomial Kernel Function
poly_kernel <- function(X, Y, degree = 2, coef0 = 1) {
  return((X %*% t(Y) + coef0)^degree)  # Polynomial transformation
}
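The kernelized LDA classifier itself is elided; for reference, the Gram matrices it needs can be built with poly_kernel(), and y3 denotes the test labels used in the accuracy check below (y_pred comes from the elided kernel-LDA classifier):
# Gram matrix among training points, and test-vs-train cross-kernel
K_train <- poly_kernel(X1, X1)  # n_train x n_train
K_test <- poly_kernel(X3, X1)   # n_test x n_train
y3 <- test_data$Class           # test labels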
# Compute accuracy
test_accuracy <- mean(y_pred == y3)
Comparison with Previous Models:
- Logistic regression (glm()): ~9.68% test accuracy (poor).
- Primitive LDA (without kernel): ~97.72% (worked well, but fully linear).
- Kernelized LDA (polynomial kernel, degree = 2): validation accuracy 96.5%, test accuracy 95.63% (best-performing model so far).