KNIME - Project Report
Manual
CSL 232
Semester: 5th
Group: DS B
Table of Contents
1. Project Description
2. Problem Statement
3. Analysis
4. Design
6. Output (Screenshots)
7. Conclusion
1. Project Description:
The Environmental Performance Index (EPI) ranks 180 countries on 32 performance indicators
in the following 11 issue categories: air quality, sanitation and drinking water, heavy metals,
waste management, biodiversity and habitat, ecosystem services, fisheries, climate change,
pollution emissions, agriculture, and water resources. These categories track performance and
progress on two broad policy objectives: environmental health and ecosystem vitality.
EPI measures help to identify issues, define goals, follow trends, understand outcomes, and
identify effective policy methods.
EPI statistics show that financial resources, good governance, human development, and
regulatory quality all play a role in improving a country’s sustainability. The EPI helps
decision-makers identify the factors that contribute to top-tier performance.
About Dataset
The dataset contains 181 rows and 1352 columns. The columns include country
name, code, region, eu27, g20, environmental performance index, air quality,
environmental health, household solid fuels, ozone exposure, sanitation and
drinking water, ecosystem vitality, biodiversity and habitat, ecosystem services, and
many others.
BIDV Lab Manual (CSL 232) | 3
2022-23
2. Problem Statement:
The goal is to predict the Environmental Performance Index score from the given
measures using regression analysis.
3. Analysis
4. Design
The following steps were taken to get the best model accuracy:
a) Excel Reader
Reads the Excel file.
b) Column Filter
Removes unnecessary columns.
c) Normalizer
Normalizes the data using min-max normalization.
d) Partitioning
Divides the dataset into two parts: 80% training data and 20% test data.
g) Numeric Scorer
Measures the accuracy of the model's predictions.
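The workflow itself is built from KNIME nodes, but the normalization, partitioning, and scoring steps above can be illustrated outside KNIME. The following is a minimal Python sketch, not the workflow used in this project. The function names (min_max_normalize, partition, numeric_scores) and the fixed random seed are assumptions made for illustration, and the numbers in the usage lines are made up rather than taken from the EPI dataset. The scores computed here (R², mean absolute error, root mean squared error) are common regression measures and are assumed to be comparable to what a numeric scorer reports. The learner and predictor steps are not covered.

```python
import math
import random


def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def numeric_scores(actual, predicted):
    """Return R^2, mean absolute error, and root mean squared error.

    Assumes the actual values are not all identical (otherwise R^2
    is undefined because the total sum of squares is zero).
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Illustrative usage with made-up values (not EPI data).
scaled = min_max_normalize([10.0, 20.0, 30.0])
train, test = partition(range(10))
scores = numeric_scores([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

With an 80/20 split, 10 rows give 8 training rows and 2 test rows; the same function applied to the 181-row dataset would give roughly 145 and 36.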
6. Output (Screenshots)
File Table
Filtered table
Normalized table
Partitioning
- train data
- test data
Statistics:
Statistics:
7. Conclusion
First, we applied both techniques (Random Forest and Simple Regression Tree
Learner) to our dataset without normalization. The accuracy was:
Random Forest: 94.8%
Simple Regression Tree Learner: 91.4%
We therefore normalized the data in the final workflow. The accuracy scores above
show that Random Forest outperforms the Simple Regression Tree Learner.