Knime - Project Report

The document describes a project to predict a country's Environmental Performance Index score using regression analysis on various environmental measures. The analysis required importing an Excel dataset, removing unnecessary columns and duplicate rows, normalizing the data, and splitting it into training and test sets. A random forest regression model was trained on the training set and achieved 97.4% accuracy on the test set, demonstrating it can effectively predict a country's EPI score based on environmental indicators. Normalization of the data was found to improve model accuracy. The project aims to help identify factors contributing to high environmental performance.

Uploaded by

Ansh Rohatgi

Business Intelligence and Data Visualization Lab Manual

CSL 232

Knime Project Report

Faculty name: Dr. Poonam Chaudhary

Student name: Harshita Bhatia and Ankit Jhangu

Roll No.: 20csu305 & 20csu365

Semester: 5th

Group: DS B

Department of Computer Science and Engineering


The NorthCap University, Gurugram- 122001, India
Session 2022-23
BIDV Lab Manual (CSL 232) | 1
2022-23

Table of Contents
1. Project Description
2. Problem Statement
3. Analysis
   3.1 Hardware Requirements
   3.2 Software Requirements
4. Design
5. Implementation and Testing (stage/module wise)
6. Output (Screenshots)
7. Conclusion and Future Scope



1. Project Description
The Environmental Performance Index is a global rating system that ranks nations based on their
environmental health. It provides a data-driven evaluation of the global level of sustainability.

The Environmental Performance Index (EPI) ranks 180 countries on 32 performance indicators
in the following 11 issue categories: air quality, sanitation and drinking water, heavy metals,
waste management, biodiversity and habitat, ecosystem services, fisheries, climate change,
pollution emissions, agriculture, and water resources. These categories track performance and
progress on two broad policy objectives: environmental health and ecosystem vitality.

EPI measures help to identify issues, define goals, follow trends, understand outcomes, and
identify effective policy methods.

The Environmental Performance Index (EPI) statistics show that financial resources, excellent
governance, human development, and regulatory quality all play a role in boosting a country’s
sustainability. EPI helps decision-makers to identify all these factors that contribute to top-tier
performance.

By emphasizing these connections, the EPI contributes to the promotion of sustainable
development in support of a more ecologically secure and equitable future.

About Dataset

The dataset contains 181 rows and 1352 columns, including country name, code, region,
eu27, g20, environmental performance index, air quality, environmental health,
household solid fuels, ozone exposure, sanitation and drinking water, ecosystem
vitality, biodiversity and habitat, ecosystem services, and many others.
2. Problem Statement

The goal is to predict a country's Environmental Performance Index (EPI) score from
the given environmental measures using regression analysis.

3. Analysis

3.1. Hardware Requirements

A 64-bit operating system with at least 32 GB of RAM and 8 CPU cores.

3.2. Software Requirements

KNIME Analytics Platform

4. Design
The following steps were taken to get the best model accuracy:

• Importing the Excel dataset
• Removing unnecessary columns
• Removing duplicate rows
• Normalizing the dataset
• Splitting the data into training and test data
• Training the model with a learner node
• Predicting with the trained model
• Checking model accuracy
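
The two cleaning steps in the list above (removing unnecessary columns and removing duplicate rows) can be sketched outside KNIME in plain Python. This is only an illustration under stated assumptions: the rows, column names, and values below are hypothetical, and the helper functions `filter_columns` and `drop_duplicates` are invented for this sketch rather than taken from the workflow.

```python
# Hypothetical miniature of the EPI table; the real column names and
# values are not shown in the report.
rows = [
    {"country": "A", "code": "AAA", "EPI": 70.0, "air_quality": 65.0},
    {"country": "B", "code": "BBB", "EPI": 55.0, "air_quality": 40.0},
    {"country": "B", "code": "BBB", "EPI": 55.0, "air_quality": 40.0},  # duplicate row
]


def filter_columns(rows, keep):
    """Keep only the listed columns in every row."""
    return [{name: row[name] for name in keep} for row in rows]


def drop_duplicates(rows):
    """Remove exact duplicate rows, preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Drop the (assumed unnecessary) "code" column, then remove duplicates.
cleaned = drop_duplicates(filter_columns(rows, ["country", "EPI", "air_quality"]))
```

Filtering columns before removing duplicates is a choice made for this sketch; the report does not say in which order the workflow applies the two steps.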
5. Implementation and Testing (stage/module wise)

a) Excel Reader
Reads the Excel dataset into the workflow.

b) Column Filter
Removes unnecessary columns.

c) Normalizer
Normalizes the data using min-max normalization.

d) Partitioning
Splits the dataset into two parts: 80% training data and 20% test data.

e) Random Forest Learner (Regression)
Trains a random forest model on the training data, with the EPI score as the target
variable.

f) Random Forest Predictor (Regression)
Applies the trained model to the test data.

g) Numeric Scorer
Computes the accuracy of the model's predictions on the test data.
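
For readers who want to see what the Normalizer, Partitioning, and Numeric Scorer stages compute, here is a minimal plain-Python sketch. It is not the KNIME implementation: the function names, the random seed, and the sample values are assumptions made for illustration, and the random forest learner itself is not reproduced. The report does not say which statistic its "accuracy" refers to; the sketch assumes it is R², and also computes the mean absolute error and root mean squared error.

```python
import math
import random


def min_max_normalize(values):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def regression_scores(actual, predicted):
    """Return R^2, mean absolute error, and root mean squared error.

    Assumes the actual values are not all identical (otherwise R^2 is undefined).
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Made-up values for illustration only.
scaled = min_max_normalize([10.0, 20.0, 40.0])
train, test = partition(list(range(10)))
scores = regression_scores([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

With ten rows and a training fraction of 0.8, `partition` returns eight training rows and two test rows, mirroring the 80/20 split used in the workflow.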
6. Output (Screenshots)
File table (screenshot)

Filtered table (screenshot)

Normalized table (screenshot)

Partitioning: train data (screenshot)

Partitioning: test data (screenshot)

Simple Regression Tree Learner, with statistics (screenshot)

Random Forest Learner, with statistics (screenshot)

7. Conclusion

First, we applied both techniques (Random Forest and Simple Regression Tree Learner) to
our dataset without normalization. The accuracy was:
Random Forest: 94.8%
Simple Regression Tree Learner: 91.4%

After normalization, the accuracy changed to:
Random Forest: 97.4%
Simple Regression Tree Learner: 95%

Normalization improved the accuracy of both models, so the data should be normalized
before training. The scores also show that Random Forest outperforms the Simple
Regression Tree Learner both with and without normalization.
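
The size of these differences can be worked out directly from the reported figures. The short sketch below uses only the four accuracy values stated above; the variable names are chosen for illustration.

```python
# Accuracy figures (percent) as reported in the conclusion.
before = {"Random Forest": 94.8, "Simple Regression Tree": 91.4}
after = {"Random Forest": 97.4, "Simple Regression Tree": 95.0}

# Gain in percentage points from normalization, per model.
gains = {name: round(after[name] - before[name], 1) for name in before}

# Lead of Random Forest over the regression tree after normalization.
lead = round(after["Random Forest"] - after["Simple Regression Tree"], 1)

print(gains)  # {'Random Forest': 2.6, 'Simple Regression Tree': 3.6}
print(lead)   # 2.4
```

In other words, normalization added 2.6 percentage points for Random Forest and 3.6 for the regression tree, and Random Forest remained 2.4 points ahead afterwards.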
