
Concrete Strength.ipynb - Colab

The document outlines a machine learning workflow for predicting concrete strength using a dataset that includes various components of concrete. It details steps for data preprocessing, including handling missing values and outliers, normalizing data, and splitting the dataset into training and testing sets. The document also describes the training and evaluation of different regression models, including Decision Tree, Random Forest, and XGBoost, along with their performance metrics.


# Import necessary libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, r2_score, mean_squared_error
from sklearn.preprocessing import StandardScaler
from scipy.stats import zscore

# Load the dataset

data = pd.read_csv('/content/ConcreteStrengthData.csv')

# Display the first few rows of the data

print("Initial Data:\n", data.head())

Initial Data:
CementComponent BlastFurnaceSlag FlyAshComponent WaterComponent \
0 540.0 0.0 0.0 162.0
1 540.0 0.0 0.0 162.0
2 332.5 142.5 0.0 228.0
3 332.5 142.5 0.0 228.0
4 198.6 132.4 0.0 192.0

SuperplasticizerComponent CoarseAggregateComponent \
0 2.5 1040.0
1 2.5 1055.0
2 0.0 932.0
3 0.0 932.0
4 0.0 978.4

FineAggregateComponent AgeInDays Strength
0 676.0 28 79.99
1 676.0 28 61.89
2 594.0 270 40.27
3 594.0 365 41.05
4 825.5 360 44.30

# Step 1: Handle Missing Values

missing_value_strategy = input("Enter 'remove' to drop missing values or 'median' to fill with median: ")
if missing_value_strategy == 'remove':
    data = data.dropna()
    print("Missing values removed.")
elif missing_value_strategy == 'median':
    data = data.fillna(data.median())
    print("Missing values filled with median.")

Enter 'remove' to drop missing values or 'median' to fill with median: median
Missing values filled with median.


# Display data after handling missing values

print("\nData after handling missing values:\n", data.head())

Data after handling missing values:
CementComponent BlastFurnaceSlag FlyAshComponent WaterComponent \
0 540.0 0.0 0.0 162.0
1 540.0 0.0 0.0 162.0
2 332.5 142.5 0.0 228.0
3 332.5 142.5 0.0 228.0
4 198.6 132.4 0.0 192.0

SuperplasticizerComponent CoarseAggregateComponent \
0 2.5 1040.0
1 2.5 1055.0
2 0.0 932.0
3 0.0 932.0
4 0.0 978.4

FineAggregateComponent AgeInDays Strength
0 676.0 28 79.99
1 676.0 28 61.89
2 594.0 270 40.27
3 594.0 365 41.05
4 825.5 360 44.30

# Step 2: Handle Outliers using IQR and Z-score

def remove_outliers_iqr(df):
    Q1 = df.quantile(0.25)
    Q3 = df.quantile(0.75)
    IQR = Q3 - Q1
    return df[~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]

def remove_outliers_zscore(df, threshold=3):
    z_scores = np.abs(zscore(df))
    return df[(z_scores < threshold).all(axis=1)]

outlier_method = input("Enter 'iqr' for IQR method or 'zscore' for Z-score method to handle outliers: ")
if outlier_method == 'iqr':
    data = remove_outliers_iqr(data)
    print("Outliers handled using IQR.")
elif outlier_method == 'zscore':
    data = remove_outliers_zscore(data)
    print("Outliers handled using Z-score.")

Enter 'iqr' for IQR method or 'zscore' for Z-score method to handle outliers: zscore
Outliers handled using Z-score.

# Display data after handling outliers

print("\nData after handling outliers:\n", data.head())


Data after handling outliers:
CementComponent BlastFurnaceSlag FlyAshComponent WaterComponent \
0 540.0 0.0 0.0 162.0
1 540.0 0.0 0.0 162.0
5 266.0 114.0 0.0 228.0
7 380.0 95.0 0.0 228.0
8 266.0 114.0 0.0 228.0

SuperplasticizerComponent CoarseAggregateComponent \
0 2.5 1040.0
1 2.5 1055.0
5 0.0 932.0
7 0.0 932.0
8 0.0 932.0

FineAggregateComponent AgeInDays Strength
0 676.0 28 79.99
1 676.0 28 61.89
5 670.0 90 47.03
7 594.0 28 36.45
8 670.0 28 45.85

# Step 3: Normalize Data

scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
data = pd.DataFrame(scaled_data, columns=data.columns)
print("\nData after normalization:\n", data.head())

Data after normalization:
CementComponent BlastFurnaceSlag FlyAshComponent WaterComponent \
0 2.560014 -0.858514 -0.88112 -0.931973
1 2.560014 -0.858514 -0.88112 -0.931973
2 -0.112045 0.480231 -0.88112 2.346817
3 0.999687 0.257107 -0.88112 2.346817
4 -0.112045 0.480231 -0.88112 2.346817

SuperplasticizerComponent CoarseAggregateComponent \
0 -0.673726 0.839761
1 -0.673726 1.032749
2 -1.129625 -0.549747
3 -1.129625 -0.549747
4 -1.129625 -0.549747

FineAggregateComponent AgeInDays Strength
0 -1.288508 -0.229254 2.672454
1 -1.288508 -0.229254 1.590217
2 -1.365815 1.453139 0.701707
3 -2.345042 -0.229254 0.069106
4 -1.365815 -0.229254 0.631152
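
Note that the scaler above was fit on the entire dataset, including the Strength target, so all error metrics reported later are in standardized units rather than MPa. A minimal alternative sketch, reusing the imports from the first cell and applied in place of the cell above (before data is overwritten with scaled values), that scales only the input features and fits the scaler on the training split alone to avoid test-set leakage:

# Keep the target in its original units and fit the scaler on training data only,
# so no statistics from the test set leak into the transformation.
X = data.drop(columns=['Strength'])
y = data['Strength']  # target stays in MPa

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from the training split
X_test = scaler.transform(X_test)        # apply the same statistics to the test split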

# Splitting the data into features and target

X = data.drop(columns=['Strength'])  # 'Strength' is the target column
y = data['Strength']


# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Verify the shapes of the split datasets

print("Training feature set shape:", X_train.shape)
print("Testing feature set shape:", X_test.shape)
print("Training target set shape:", y_train.shape)
print("Testing target set shape:", y_test.shape)

Training feature set shape: (784, 8)
Testing feature set shape: (197, 8)
Training target set shape: (784,)
Testing target set shape: (197,)

# Define Models
models = {
"Decision Tree": DecisionTreeRegressor(random_state=42),
"Random Forest": RandomForestRegressor(random_state=42),
"XGBoost": XGBRegressor(random_state=42, objective='reg:squarederror')
}

# Hyperparameter tuning for Random Forest and XGBoost using GridSearchCV

rf_param_grid = {
'n_estimators': [50, 100, 200],
'max_depth': [10, 20, None],
'min_samples_split': [2, 5, 10]
}

xgb_param_grid = {
'n_estimators': [50, 100, 200],
'max_depth': [3, 5, 10],
'learning_rate': [0.01, 0.1, 0.3]
}

# Tune and fit models

for model_name, model in models.items():
    if model_name == "Random Forest":
        grid_search = GridSearchCV(estimator=model, param_grid=rf_param_grid, cv=3,
                                   scoring='neg_mean_squared_error')  # scoring metric assumed
        grid_search.fit(X_train, y_train)
        models[model_name] = grid_search.best_estimator_
        print(f"\n{model_name} Best Parameters: {grid_search.best_params_}")
    elif model_name == "XGBoost":
        grid_search = GridSearchCV(estimator=model, param_grid=xgb_param_grid, cv=3,
                                   scoring='neg_mean_squared_error')  # scoring metric assumed
        grid_search.fit(X_train, y_train)
        models[model_name] = grid_search.best_estimator_
        print(f"\n{model_name} Best Parameters: {grid_search.best_params_}")
    else:
        model.fit(X_train, y_train)

Random Forest Best Parameters: {'max_depth': None, 'min_samples_split': 2, 'n_estimat

XGBoost Best Parameters: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 200}
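
GridSearchCV also stores the cross-validated score of the winning configuration; a small sketch of how it could be printed inside the loop above, right after each fit (with the assumed negative-MSE scoring, values closer to zero are better):

# best_score_ is the mean cross-validated score achieved by best_params_
print(f"{model_name} best CV score: {grid_search.best_score_:.4f}")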

# Model Evaluation Function

def evaluate_model(model, X_test, y_test):
    predictions = model.predict(X_test)
    mae = mean_absolute_error(y_test, predictions)
    r2 = r2_score(y_test, predictions)
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    return mae, r2, rmse
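
As an aside, recent scikit-learn releases (1.4 and later) expose RMSE directly, so the np.sqrt wrapper is optional; a one-line equivalent, assuming such a version is installed:

from sklearn.metrics import root_mean_squared_error  # requires scikit-learn >= 1.4
rmse = root_mean_squared_error(y_test, predictions)  # same value as np.sqrt(MSE)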

# Evaluate each model

for model_name, model in models.items():
    mae, r2, rmse = evaluate_model(model, X_test, y_test)
    print(f"\n{model_name} Evaluation Metrics:")
    print(f" MAE: {mae}\n R²: {r2}\n RMSE: {rmse}")

Decision Tree Evaluation Metrics:
 MAE: 0.3109036376665524
 R²: 0.7531568581871906
 RMSE: 0.4824773835905145

Random Forest Evaluation Metrics:
 MAE: 0.23158315377649288
 R²: 0.8737109631559321
 RMSE: 0.3451034120134297

XGBoost Evaluation Metrics:
 MAE: 0.1958459295404801
 R²: 0.8916473524633806
 RMSE: 0.3196584515582839

# Delta Buckets Evaluation for Each Model

def delta_buckets(y_true, y_pred, delta=10):
    differences = np.abs(y_true - y_pred)
    within_delta = np.sum(differences <= delta) / len(differences)
    return within_delta * 100  # Percentage of predictions within delta

# Display Delta Bucket Evaluation

for model_name, model in models.items():
    y_pred = model.predict(X_test)
    delta_10 = delta_buckets(y_test, y_pred, delta=10)
    print(f"\n{model_name} - Percentage of predictions within ±10 units of true value: {delta_10}%")

Decision Tree - Percentage of predictions within ±10 units of true value: 100.0%

Random Forest - Percentage of predictions within ±10 units of true value: 100.0%

XGBoost - Percentage of predictions within ±10 units of true value: 100.0%
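
The 100% figures are expected rather than impressive: Strength was standardized to unit variance in Step 3, so a ±10 band spans roughly ±10 standard deviations and covers the entire target range. A sketch of a more informative check, mapping predictions back to MPa with the statistics stored by the fitted scaler (assuming the scaler and data objects from Step 3):

# Recover the original mean and standard deviation of the Strength column.
strength_idx = list(data.columns).index('Strength')
mean_s, std_s = scaler.mean_[strength_idx], scaler.scale_[strength_idx]

for model_name, model in models.items():
    # Convert standardized predictions and targets back to MPa before bucketing.
    y_pred_mpa = model.predict(X_test) * std_s + mean_s
    y_test_mpa = y_test * std_s + mean_s
    delta_10 = delta_buckets(y_test_mpa, y_pred_mpa, delta=10)
    print(f"{model_name} - within ±10 MPa of true value: {delta_10:.1f}%")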

# Function to evaluate model performance

def evaluate_model(model, X_train, X_test, y_train, y_test):
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    mae = mean_absolute_error(y_test, predictions)
    mse = mean_squared_error(y_test, predictions)
    r2 = r2_score(y_test, predictions)
    return mae, mse, r2

# Initialize models
models = {
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "XGBoost": XGBRegressor(n_estimators=100, random_state=42, objective='reg:squarederror')
}

# Dictionary to store model performance

performance = {}

# Evaluate each model

for model_name, model in models.items():
    mae, mse, r2 = evaluate_model(model, X_train, X_test, y_train, y_test)
    performance[model_name] = {
        "Mean Absolute Error": mae,
        "Mean Squared Error": mse,
        "R^2 Score": r2
    }
    print(f"{model_name} Performance:")
    print(f"Mean Absolute Error: {mae:.2f}")
    print(f"Mean Squared Error: {mse:.2f}")
    print(f"R^2 Score: {r2:.2f}")
    print("-" * 30)

# Display performance comparison

import pandas as pd
performance_df = pd.DataFrame(performance).T
print("\nPerformance Comparison:")
print(performance_df)

Decision Tree Performance:
Mean Absolute Error: 0.31
Mean Squared Error: 0.23
R^2 Score: 0.75
------------------------------
Random Forest Performance:
Mean Absolute Error: 0.23
Mean Squared Error: 0.12
R^2 Score: 0.87
------------------------------
XGBoost Performance:
Mean Absolute Error: 0.20
Mean Squared Error: 0.11
R^2 Score: 0.89

------------------------------

Performance Comparison:
Mean Absolute Error Mean Squared Error R^2 Score
Decision Tree 0.310904 0.232784 0.753157
Random Forest 0.230687 0.120433 0.872293
XGBoost 0.199964 0.107200 0.886325
