Insurance Premium Prediction
In this project, we aim to predict insurance premiums for individuals by analyzing their health
data. To accomplish this, we trained several models (a regression model, a tree algorithm, and
bagging and boosting algorithms) and compared their performance.
Each model was fitted on the training dataset, and its predictions were compared with the
actual values to test and verify its accuracy. After evaluating the accuracy of all the models,
we determined that the GradientBoostingRegressor and RandomForestRegressor algorithms
performed better than the remaining models.
Ultimately, we found that the GradientBoostingRegressor algorithm was best suited for this
task, as it achieved the best evaluation score of all the models.
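The comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual training code: the data is synthetic, the choice of LinearRegression and DecisionTreeRegressor for the "regression model" and "tree algorithm" is an assumption, and R^2 is assumed as the evaluation score because the document does not name the metric.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption, not the project's dataset):
# three features (age, bmi, smoker flag) and an artificial premium target.
rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 6, size=n)
smoker = rng.integers(0, 2, size=n)
X = np.column_stack([age, bmi, smoker])
y = 250 * age + 300 * bmi + 20000 * smoker + rng.normal(0, 2000, size=n)

# Hold out part of the data so predictions can be compared with actual values.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models: regression, tree, bagging-style and boosting-style ensembles.
models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Print the models from best to worst score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")
```

On synthetic data the ranking need not match the one reported for the real dataset; the sketch only shows the shape of the comparison.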
Introduction
1.1 Why this High-Level Design Document?
The purpose of this High-Level Design (HLD) Document is to add the necessary
detail to the current project description to represent a suitable model for
coding. This document is also intended to help detect contradictions prior to
coding, and can be used as a reference manual for how the modules interact at
a high level.
1.2 Scope
The HLD documentation presents the structure of the system, such as the
database architecture, application architecture (layers), application flow
(Navigation), and technology architecture. The HLD uses non-technical to
mildly-technical terms which should be understandable to the administrators
of the system.
1.3 Definitions
Term        Description
UGV         Unmanned Ground Vehicle
Database    Collection of all the information monitored by this system
IDE         Integrated Development Environment
AWS         Amazon Web Services
For training the model, the following system requirements are preferred:
• 4 GB RAM or more.
The data requirements for this project will depend on the specific problem
statement. A CSV file will be used as the input file, and the feature/field names
and sequence should be followed as decided. It's important to have a clear
understanding of the problem statement and the data that is required to solve
it, to design a suitable data pipeline, and to train the model effectively.
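As a rough sketch of how the agreed field names and sequence could be enforced when the CSV input is read, the snippet below checks the header before any further processing. The column names in EXPECTED_COLUMNS and the helper load_input are hypothetical, chosen from the fields mentioned elsewhere in this document; the real schema is whatever the project decided on.

```python
import io

import pandas as pd

# Hypothetical schema (an assumption): the agreed field names, in the agreed order.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "smoker"]


def load_input(csv_source):
    """Read a CSV and verify that its columns match the agreed names and order."""
    df = pd.read_csv(csv_source)
    if list(df.columns) != EXPECTED_COLUMNS:
        raise ValueError(
            f"expected columns {EXPECTED_COLUMNS}, got {list(df.columns)}"
        )
    return df


# Usage with an in-memory CSV; a file path works the same way.
sample = io.StringIO("age,sex,bmi,smoker\n19,female,27.9,yes\n33,male,22.7,no\n")
df = load_input(sample)
print(df.shape)
```

Rejecting a mismatched header early keeps a reordered or renamed column from silently feeding the wrong feature into the model.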
Pandas is an open-source Python package that is widely used for data analysis
and machine-learning tasks.
NumPy is the most commonly used package for scientific computing in
Python.
Plotly is an open-source data visualization library used to create interactive,
high-quality charts and graphs.
Scikit-learn is used for machine learning.
Flask is used to build the API.
VS Code is used as the IDE (Integrated Development Environment).
GitHub is used as a version control system.
Front-end development is done using HTML and CSS.
Railway is used for the deployment of the model.
2.7 Constraints
The system predicts insurance premiums for users based on the details they
provide, e.g., BMI, sex, smoker (yes/no), age, etc.
2.8 Assumptions
The main objective of the project is to develop an API that predicts the premium
for people based on their health information. A machine learning-based regression
model is used to make these predictions from the input data.
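A prediction API of this kind might look like the Flask sketch below. The /predict route, the field names, and the predict_premium helper are all hypothetical; in particular, predict_premium is a placeholder formula with arbitrary coefficients standing in for the trained regression model, which is not reproduced here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input fields (an assumption), based on the details named in this document.
REQUIRED_FIELDS = ("age", "bmi", "smoker")


def predict_premium(age, bmi, smoker):
    """Placeholder for the trained model: arbitrary coefficients, NOT real results."""
    return 250.0 * age + 300.0 * bmi + (20000.0 if smoker else 0.0)


@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; treat a missing or invalid body as an empty payload.
    payload = request.get_json(silent=True) or {}
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    premium = predict_premium(
        float(payload["age"]),
        float(payload["bmi"]),
        payload["smoker"] == "yes",
    )
    return jsonify({"premium": premium})
```

In a real deployment the placeholder would be replaced by a call to the fitted model's predict method, and the app would be started with app.run() or a WSGI server.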
3 Design Details
3.1 Process Flow
The system should log every event so that the user will know what
process is running internally.
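One possible way to meet this logging requirement is Python's standard logging module, as sketched below. The logger name, the log file location, and the get_logger helper are assumptions made for illustration.

```python
import logging
import os
import tempfile


def get_logger(name, log_file):
    """Return a logger that appends timestamped events to log_file."""
    logger = logging.getLogger(name)
    # Attach the handler only once so repeated calls do not duplicate lines.
    if not logger.handlers:
        logger.setLevel(logging.INFO)
        handler = logging.FileHandler(log_file)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Hypothetical log location (an assumption): a file in the system temp directory.
log_path = os.path.join(tempfile.gettempdir(), "premium_app.log")
logger = get_logger("premium_app", log_path)

# Each pipeline step records an event so the internal process can be followed.
logger.info("Loading input CSV")
logger.info("Training model")
```

Each module can call get_logger with the same name to write to the same file, which keeps the event history in one place.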
4 Performance
4.1 Reusability
The entire solution will be built in a modular fashion and will be API-oriented,
so the components remain fully reusable if the application needs to be scaled.
The interaction with the application is done through the designed user interface,
which the end user can access through any web browser.
4.3 Deployment
5 Dashboards
A dashboard is a data visualization and analysis tool that displays on one screen the
status of key performance indicators (KPIs) and other important business metrics.
6 Conclusion
This system demonstrates different techniques for estimating the premium
required based on an individual's health situation. The analysis shows how
smoking status affects the estimated amount, and it also shows a significant
difference between male and female expenses. Accuracy plays a key role in
prediction-based systems. From the results, we could see that Gradient Boosting
turned out to be the best-performing model for this problem in terms of
accuracy. Our predictions help users know how much premium they need based on
their current health situation.