Sales Analytics
This project covers sales and marketing analytics for a pharmaceutical company. We
analyze a dataset, derive insights through a set of use cases, and build an end-to-end
pipeline.
Requirements :
Docker Desktop: 4.14.0 (91374)
PostgreSQL: 15.1
Getting Started
Setting Up Docker
Download Docker Desktop for Mac or Windows. Docker Compose will be
automatically installed.
Setting Up Airbyte
Run the following commands in a terminal or Command Prompt:
cd airbyte
docker-compose up
Once you see an Airbyte banner, the UI is ready to go at http://localhost:8000. You will
be asked for a username and password. By default, the username is airbyte and the
password is password. Once you deploy Airbyte to your servers, be sure to change these in
your .env file.
Setting Up Postgres Container in Docker
Prerequisites
To use the Postgres destination, we'll need:
Install PostgreSQL
Download and install PostgreSQL from here.
Install pgAdmin
Download and install pgAdmin from here.
Install Tableau
Download Tableau Desktop for Mac or Windows.
Install Power BI
Download Power BI Desktop (available for Windows only).
Airbyte Setup
After setting up Airbyte and Postgres, configure a source of type File in Airbyte that
points to your dataset, and configure the destination as Postgres, i.e. the Postgres
container set up earlier in Docker. If in doubt while setting up Postgres, follow the
instructions given here, starting from Step 2.
pgAdmin Setup
Create a database on your preferred server.
It will appear under Servers > servername > Databases; refer here in case of doubt.
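Once a sync has run, it can help to confirm from Python that the data actually landed in
the database. The sketch below is only an illustration: the helper names postgres_url and
preview_table, the database name sales, and the table name sales_data are hypothetical,
and the preview step assumes pandas, SQLAlchemy, and a PostgreSQL driver such as psycopg2
are installed.

```python
from urllib.parse import quote


def postgres_url(user, password, host="localhost", port=5432, database="postgres"):
    """Build a PostgreSQL connection URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def preview_table(url, table, limit=5):
    """Return the first rows of a table as a DataFrame (hypothetical helper)."""
    # Only allow plain identifiers, since the name is interpolated into SQL.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")

    # Imported here so the URL helper works even without these packages.
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    query = text(f'SELECT * FROM "{table}" LIMIT {int(limit)}')
    with engine.connect() as conn:
        return pd.read_sql(query, conn)


# Assumed credentials and names; replace them with your own values.
url = postgres_url("postgres", "p@ss/word", database="sales")
# df = preview_table(url, "sales_data")  # requires a running Postgres instance
```

Percent-encoding the password keeps characters such as @ or / from being misread as part
of the URL structure.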
I performed data cleaning, data analysis, and data visualization in Python and derived
insights from the data. Python provides several libraries for these tasks, each with
different features and support for different types of graphs. The libraries I used are
as follows.
● NumPy
● Pandas
● Matplotlib
● Seaborn
NumPy
NumPy is a general-purpose array-processing package. It provides a high-
performance multidimensional array object, and tools for working with these arrays. It is
the fundamental package for scientific computing with Python. For reference click here.
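As a small illustration of the array operations described above, the following sketch
computes revenue totals and month-over-month growth. All numbers and variable names are
made up for the example and do not come from the project dataset.

```python
import numpy as np

# Hypothetical monthly unit sales: three products (rows) over four months (columns).
units = np.array([
    [120, 135, 150, 160],
    [80, 75, 90, 95],
    [200, 210, 190, 220],
])
unit_price = np.array([2.5, 10.0, 1.2])  # hypothetical price per unit, one per product

# Broadcasting multiplies every row of units by that product's price.
revenue = units * unit_price[:, np.newaxis]

total_per_product = revenue.sum(axis=1)  # sum across months
total_per_month = revenue.sum(axis=0)    # sum across products

# Month-over-month growth in units, as a fraction of the previous month.
growth = np.diff(units, axis=1) / units[:, :-1]

print(total_per_product)
print(growth.round(3))
```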
Pandas
Pandas is an open-source library for Python. It provides ready-to-use, high-
performance data structures and data analysis tools. Pandas is built on top of
NumPy and is widely used for data science and data analytics. To see the user guide,
click here.
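A minimal sketch of the kind of cleaning and aggregation Pandas supports is shown below.
The column names and values are invented for illustration and are not taken from the
project dataset.

```python
import pandas as pd

# Hypothetical sales records; one row has a missing quantity.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "East"],
    "product": ["A", "A", "B", "B", "A", "A"],
    "quantity": [10, 4, None, 7, 6, 3],
    "unit_price": [2.5, 2.5, 10.0, 10.0, 2.5, 2.5],
})

# Cleaning: drop rows where the quantity is missing.
clean = sales.dropna(subset=["quantity"]).copy()

# Analysis: compute revenue per row, then total revenue per region.
clean["revenue"] = clean["quantity"] * clean["unit_price"]
revenue_by_region = (
    clean.groupby("region")["revenue"].sum().sort_values(ascending=False)
)

print(revenue_by_region)
```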
Matplotlib
Matplotlib is an easy-to-use, low-level data visualization library that is built on
NumPy arrays. It supports various plots such as scatter plots, line plots, and histograms.
Matplotlib provides a lot of flexibility. Click here to find documentation.
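For example, a basic line plot could be produced as follows. This is a sketch with
made-up values; the output file name monthly_units.png is an arbitrary choice.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
units_sold = [120, 135, 150, 160]  # hypothetical values

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, units_sold, marker="o")
ax.set_title("Monthly units sold (sample data)")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
fig.tight_layout()
fig.savefig("monthly_units.png")
```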
Seaborn
Seaborn is a library that uses Matplotlib underneath to plot graphs. It is used here
to visualize distributions in the data. To see the user guide, click here.
Churn Prevention :
Customer churn refers to the percentage of customers who have stopped buying
and using the product during a specific period. Machine learning algorithms applied to
customer relationship management (CRM) data identify trends and features in the
behavior, communication, and ordering of customers who have stopped purchasing. To
prevent or minimize churn, focus on your best customers, provide feedback and
communicate promptly, offer bonuses, and ask your customers for their opinions.
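The churn percentage defined above can be computed with a simple rule before any machine
learning is involved. The sketch below treats a customer as churned after 90 days without
a purchase; the threshold, the function names, and the sample data are all assumptions
made for illustration.

```python
from datetime import date, timedelta


def churned_customers(last_purchase, as_of, inactive_days=90):
    """Return IDs of customers with no purchase in the last inactive_days days."""
    cutoff = as_of - timedelta(days=inactive_days)
    return sorted(cid for cid, day in last_purchase.items() if day < cutoff)


def churn_rate(last_purchase, as_of, inactive_days=90):
    """Fraction of all customers that count as churned on the as_of date."""
    if not last_purchase:
        return 0.0
    churned = churned_customers(last_purchase, as_of, inactive_days)
    return len(churned) / len(last_purchase)


# Hypothetical customers and the date of their most recent purchase.
last_purchase = {
    "C001": date(2023, 1, 10),
    "C002": date(2023, 5, 2),
    "C003": date(2022, 11, 25),
    "C004": date(2023, 4, 18),
}
as_of = date(2023, 6, 1)

print(churned_customers(last_purchase, as_of))
print(churn_rate(last_purchase, as_of))
```

A rule like this gives a baseline label that a machine learning model could later be
trained to predict ahead of time.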
Inventory management :
Inventory refers to products held in stock for later use, for example in times of
crisis. Inventory management is therefore vital for enterprises that want to optimize
resources and increase sales. Machine learning algorithms analyze the data in depth
and identify buying patterns and correlations. The analyst then evaluates these results
and proposes a strategy for increasing revenue, delivering on time, and managing inventory.
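As a simple, non-ML starting point, a reorder threshold can be estimated from recent
sales. The sketch below uses a common rule of thumb (average daily demand multiplied by
the supplier lead time, plus a safety stock); this rule is an assumption swapped in for
illustration, not the method used in this project, and all names and numbers are made up.

```python
from statistics import mean


def reorder_point(daily_units_sold, lead_time_days, safety_stock=0):
    """Stock level at which a new order should be placed."""
    average_daily_demand = mean(daily_units_sold)
    return average_daily_demand * lead_time_days + safety_stock


def needs_reorder(stock_on_hand, daily_units_sold, lead_time_days, safety_stock=0):
    """True when current stock is at or below the reorder point."""
    threshold = reorder_point(daily_units_sold, lead_time_days, safety_stock)
    return stock_on_hand <= threshold


# Hypothetical sales for one product over the last five days.
daily_units_sold = [20, 25, 15, 30, 10]

print(reorder_point(daily_units_sold, lead_time_days=7, safety_stock=30))
print(needs_reorder(150, daily_units_sold, lead_time_days=7, safety_stock=30))
```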
Target specific patient populations more effectively :
With information from genomic sequencing, medical sensor data (from a device that
can, for instance, be worn to track physical changes in an individual during treatment),
and electronic medical records more readily available than ever before, pharmaceutical
companies are able to dig into the root causes of specific pathologies and are realizing
that one size truly does not fit all. Within any disease or condition, different patients
respond differently to treatments, for a host of reasons. Combining the data from
these different sources allows drug companies to spot trends and patterns that help
them develop more targeted medications for patients who share common features.