AL-801 Business Intelligence Assignment
Session 2023-2024

Submitted To: Dr. Kamlesh Namdev
Branch: CSE AIML

Submitted By:
Devendra Kelwa (Roll No. 17)
Kautuk Astu (Roll No. 26)
Ayush Jain (Roll No. 69)
Rahul Soni (Roll No. 72)
Assignment – I
Business intelligence (BI) tools access and analyze data sets and present analytical findings in reports, summaries, dashboards, graphs, charts, and maps to give users detailed intelligence about the state of the business. The main purpose of BI is to make large volumes of data easy to interpret, so that organizations can identify new opportunities and implement effective strategies based on insight, gaining a competitive market advantage and long-term stability.
The concept of business intelligence dates back to the 1960s but has evolved significantly
with the advent of modern computing technology. Today, BI encompasses a wide range of
tools, applications, and methodologies that enable organizations to collect data from internal
systems and external sources, prepare it for analysis, develop and run queries against that
data, and create reports, dashboards, and data visualizations to make the analytical results
available to corporate decision-makers and operational workers.
Resource Optimization: Every organization has limited resources, including time, money, and
manpower. Effective decision-making ensures that these resources are allocated optimally to
pursue the right opportunities and projects with the highest return on investment. Timely
decisions prevent the wastage of resources on less profitable or inefficient operations.
Employee Morale and Engagement: The decision-making process also affects the internal
environment of an organization. Decisions that are made efficiently and communicated
clearly can improve employee morale and engagement. Employees feel more valued and
motivated when they see that their input leads to decisive action and positive change.
Long-term Sustainability: Ultimately, the capacity for effective and timely decision-making
contributes to the long-term sustainability of an organization. It allows businesses to navigate
challenges, seize opportunities, and evolve strategies that ensure their continued relevance
and success in the market.
In summary, effective and timely decision-making is a critical management skill that impacts
every aspect of an organization's operations. It requires a balance between speed and
accuracy, leveraging data and insights for informed choices that propel the organization
forward. Organizations that master this balance can navigate the complexities of the business
world more successfully, achieving their goals and setting new standards of excellence.
1. Data Sources: The foundation of any BI system. Data sources can include internal
systems like ERP (Enterprise Resource Planning), CRM (Customer Relationship
Management), and financial software, as well as external data from market research,
social media, and more. These diverse data sources provide the raw material for insights
and analysis.
2. Data Integration Tools: These tools, such as ETL (Extract, Transform, Load) processes, are used to consolidate, clean, and standardize data from various sources. Data integration is crucial for ensuring that the data fed into the BI system is consistent, accurate, and ready for analysis.
3. Data Warehouse: The central repository in which the integrated data is stored and organized for analysis. Smaller, subject-specific subsets, called data marts, may also be used to serve individual departments or business units.
4. Analytics and Reporting Tools: These tools are used to analyze the data stored in the
data warehouse. They include ad hoc reporting tools, statistical analysis, predictive
analytics, and data mining tools. These technologies enable users to create reports,
dashboards, and visualizations that make the data understandable and actionable for
business users.
5. Business Analytics Layer: This layer includes advanced analytics tools, such as predictive
analytics, machine learning models, and data mining techniques. These tools are used
for more sophisticated analysis to predict future trends, identify patterns, and provide
deeper insights into the business operations.
7. Administration and Security Layer: Ensures that data and analytics are securely
managed. This includes user authentication, data encryption, access controls, and audit
logs to protect sensitive information and comply with data governance and privacy
regulations.
How It Works:
1. Data Collection: Data is collected from various sources, including internal databases and
external data feeds.
2. Data Processing: Data is cleaned, transformed, and integrated to ensure consistency and
accuracy.
3. Data Storage: Processed data is stored in a data warehouse or data mart, where it is
organized for easy access.
4. Data Analysis: Business analysts, data scientists, and end-users use analytics and
reporting tools to analyze the data, generate reports, and uncover insights.
5. Insight Delivery: Insights are delivered to business users through dashboards, reports,
and visualizations, enabling informed decision-making.
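As a minimal sketch of these five steps (using pandas and SQLite purely for illustration; the file, table, and column names are hypothetical):

```python
# Toy BI pipeline: collect -> process -> store -> analyze -> deliver.
# File, table, and column names are hypothetical.
import sqlite3
import pandas as pd

# 1. Data Collection: read raw records from a source system.
raw = pd.read_csv("sales_raw.csv")  # assumed columns: date, region, amount

# 2. Data Processing: clean and standardize the data.
raw["date"] = pd.to_datetime(raw["date"], errors="coerce")
clean = raw.dropna(subset=["date", "amount"]).copy()
clean["region"] = clean["region"].str.strip().str.title()

# 3. Data Storage: load the processed data into a warehouse-like store.
conn = sqlite3.connect("warehouse.db")
clean.to_sql("fact_sales", conn, if_exists="replace", index=False)

# 4. Data Analysis: query the stored data.
monthly = pd.read_sql_query(
    """SELECT region, strftime('%Y-%m', date) AS month,
              SUM(amount) AS revenue
       FROM fact_sales
       GROUP BY region, month
       ORDER BY month""",
    conn,
)

# 5. Insight Delivery: hand the result to a report or dashboard.
print(monthly.head())
```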
Figure: BI Architecture
Assignment – II
Data:
Data represents raw facts and figures that are unprocessed and unorganized. It can be
quantitative or qualitative and comes from various sources like transactions, surveys,
sensors, and observations. Data in isolation lacks context and meaning; it's simply the raw
input that needs to be processed and analyzed. For instance, sales figures, temperature
readings, and customer feedback comments are all examples of data.
Information:
Information is data that has been processed, organized, or structured in a way that adds
context and makes it meaningful. Processing data involves sorting, aggregating, and analyzing
it to reveal patterns or relationships. Information answers questions like "who," "what,"
"where," and "when," providing clarity and understanding that data alone cannot. For
example, a report showing monthly sales figures by region turns data (individual sales
transactions) into information by organizing it in a way that reveals trends and performance
across different areas.
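To make the data-to-information step concrete, here is a small pandas sketch (with made-up transactions) that organizes raw records into the monthly sales-by-region view described above:

```python
# Raw data: individual sales transactions (values are made up).
import pandas as pd

data = pd.DataFrame({
    "date":   ["2024-01-05", "2024-01-20", "2024-02-11", "2024-02-28"],
    "region": ["North", "South", "North", "South"],
    "amount": [1200.0, 950.0, 1430.0, 870.0],
})

# Information: the same data organized by region and month,
# revealing trends that the raw rows alone do not show.
data["month"] = pd.to_datetime(data["date"]).dt.to_period("M")
information = data.groupby(["region", "month"])["amount"].sum().reset_index()
print(information)
```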
Knowledge:
Knowledge takes information a step further by applying experience, context, interpretation,
and reflection to derive insights and understanding. It represents the synthesis of multiple
pieces of information over time, incorporating learning and expertise to make informed
judgments or decisions. Knowledge answers "how" and "why" questions, enabling action and
decision-making. For example, understanding why sales peak in certain regions during specific months, and knowing how to capitalize on this trend to boost future sales, is knowledge.
Parameterized Reports:
Parameterized reports are dynamic reporting tools that allow users to customize the output
of a report based on specific parameters or criteria. These parameters can include dates,
geographic regions, product lines, or any variable relevant to the report's data. By entering
different parameters, users can tailor the report to display exactly the information they need,
without having to generate a new report each time.
• Functionality: The key functionality of parameterized reports lies in their ability to filter
and sort data dynamically according to the user's input. This is achieved through
predefined queries in the report design that adjust based on the parameters selected by
the user.
• Benefits: The primary benefit of parameterized reports is their flexibility and efficiency.
They can serve a wide range of users and purposes without the need for creating
numerous individual reports. Users can explore data in a more interactive way, drilling
down into specifics as needed.
• Use Cases: Parameterized reports are particularly useful in situations where the
reporting needs of an organization are varied and complex, such as financial reporting,
inventory management, and performance tracking across different business units.
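A minimal sketch of the predefined-query idea behind parameterized reports, using SQLite placeholders (the table, columns, and parameter values are hypothetical):

```python
# One report definition, many outputs: the query is fixed at design time
# and filtered at run time by user-supplied parameters. Names are hypothetical.
import sqlite3

def sales_report(conn, start_date, end_date, region):
    query = """
        SELECT product, SUM(amount) AS total_sales
        FROM fact_sales
        WHERE date BETWEEN ? AND ? AND region = ?
        GROUP BY product
        ORDER BY total_sales DESC
    """
    return conn.execute(query, (start_date, end_date, region)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (date TEXT, region TEXT, product TEXT, amount REAL)")
conn.execute("INSERT INTO fact_sales VALUES ('2024-03-01', 'North', 'Widget', 500.0)")

# The same report serves different needs by changing parameters only.
print(sales_report(conn, "2024-01-01", "2024-12-31", "North"))
```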
Self-Service Reporting:
Self-service reporting, on the other hand, empowers non-technical users to create their own
reports and analyses without reliance on IT or data teams. This approach leverages user-
friendly tools and interfaces to enable direct access to data, offering a high degree of
customization and flexibility.
While both parameterized and self-service reporting aim to make BI more accessible and
relevant to users, they cater to slightly different needs. Parameterized reports offer a more
controlled environment with predefined options, suitable for standard reporting needs
across an organization. Self-service reporting, conversely, provides individual users with the
tools to explore data on their own terms, fostering a more exploratory and personalized
approach to data analysis.
In the landscape of modern business intelligence, the integration of both parameterized and
self-service reporting can offer organizations the flexibility and depth needed to harness their
data effectively, driving better business outcomes through informed decision-making.
Understanding the Audience:
• The first step in optimizing presentation is understanding who your audience is. Different stakeholders may have varying levels of technical expertise, interests, and decision-making power. For instance, executives might prefer high-level overviews and dashboards, while analysts may require detailed reports and raw data for deeper investigation.
Crafting a Narrative:
• Data storytelling involves weaving data into a narrative that guides the audience through
the findings in a structured and engaging manner. This approach helps to contextualize
the data, making it more relatable and easier to understand. The narrative should align
with the objectives of the report and lead the audience towards the intended insights or
actions.
• Simplifying the presentation to focus on key insights is crucial. This means avoiding
unnecessary complexity and clutter that can distract or confuse the audience. Use clear
labels, avoid jargon, and ensure that each element on the page serves a purpose.
Interactivity and Exploration:
• Offering interactivity, such as drill-downs, filters, and sliders, allows users to explore the
data at their own pace and according to their interests. This can be particularly effective
in self-service BI tools, where users might want to investigate specific aspects of the data
further.
• Consistency in design and layout across reports and dashboards aids in comprehension
and user experience. Additionally, ensuring accessibility for all users, including those with
disabilities, is important for inclusive data communication.
Feedback Loop:
• Implementing a feedback loop where users can provide insights on the usefulness and
clarity of the reports can help in continuously improving the presentation. This iterative
process ensures that the BI reporting evolves to meet the changing needs of its audience.
Conclusion:
Optimizing the presentation for the right message in BI is about more than just making data
look attractive; it's about enhancing the understandability, engagement, and actionability of
the data presented. By carefully considering the audience, choosing appropriate
visualizations, crafting a compelling narrative, and focusing on clarity, BI practitioners can
drive better decision-making and outcomes from their data insights.
Interactive Analysis:
Interactive analysis refers to the process of exploring data through an intuitive, user-friendly interface that allows for real-time data manipulation and visualization. This method enables users to dig deeper into their data in a dynamic and immediate way, through capabilities such as interactive filtering, sorting, and drilling down into specific subsets of the data.
The primary advantage of interactive analysis is its ability to cater to the curiosity and
investigatory instincts of the user, allowing for a more natural exploration of data that can
lead to unexpected insights and a deeper understanding of underlying trends and patterns.
Ad Hoc Querying:
Ad hoc querying, on the other hand, is the capability to create and run queries against a
database spontaneously, without needing to write extensive code or rely on prebuilt reports.
This functionality is essential for users who need to answer specific business questions that
standard reports do not cover. Ad hoc querying tools typically provide:
• A user-friendly query interface that does not require in-depth SQL knowledge or
programming skills.
• The flexibility to select specific data fields and apply filters and aggregations as needed.
• Rapid execution of queries to retrieve information in real-time or near-real-time.
• Export options to share findings with others or perform further analysis in other tools.
Ad hoc querying empowers business users, analysts, and managers to seek answers to
specific questions quickly, fostering a data-driven culture by enabling immediate access to
relevant data insights.
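As a small sketch of an ad hoc query (here with pandas rather than SQL; the data set and the business question are made up):

```python
# Ad hoc querying: answering a one-off business question without a
# prebuilt report. The data and the question are made up.
import pandas as pd

orders = pd.DataFrame({
    "region":  ["North", "South", "North", "West"],
    "product": ["Widget", "Widget", "Gadget", "Widget"],
    "amount":  [500.0, 300.0, 700.0, 450.0],
})

# Spontaneous question: Widget revenue outside the South, by region.
answer = (
    orders.query("product == 'Widget' and region != 'South'")
          .groupby("region")["amount"].sum()
)
print(answer)
answer.to_csv("widget_revenue.csv")  # export option for sharing findings
```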
Together, interactive analysis and ad hoc querying provide a comprehensive toolkit for data
exploration and analysis within a BI environment. While interactive analysis offers a broad,
exploratory view of data, ad hoc querying allows for digging into specifics with precise,
custom queries. This combination ensures that users can start with high-level insights and
drill down to the exact details they need, all within a seamless, user-centric process.
Conclusion:
Interactive analysis and ad hoc querying are indispensable for modern businesses that
prioritize data-driven decision-making. By offering dynamic, user-driven ways to explore and
analyze data, these methodologies enable organizations to harness the full potential of their
data, uncover deep insights, and respond more effectively to changing business
environments and opportunities.
Assignment – III
Data Envelopment Analysis (DEA) is a linear-programming technique for measuring the relative efficiency of decision-making units (DMUs) that convert multiple inputs into multiple outputs. Efficiency scores range from 0 to 1 (or 0% to 100%), where a score of 1 indicates that the DMU is on the efficiency frontier and is considered to be operating efficiently. Scores below 1 indicate inefficiency, with the magnitude of the deviation from 1 reflecting the degree of inefficiency.
Applications of DEA:
DEA is widely used in various fields for efficiency analysis, including:
• Public Sector: Evaluating the efficiency of hospitals, schools, and other public services.
• Banking: Assessing the efficiency of branches or services within banks.
• Manufacturing: Measuring the efficiency of production units or processes.
• Energy: Analyzing the efficiency of power plants and renewable energy sources.
Advantages of DEA:
• Flexibility: DEA can handle multiple inputs and outputs without needing to specify a
functional form for the production process.
• Benchmarking: Provides benchmarks against which inefficient units can compare
themselves.
• Target-setting: Offers insights into potential improvements for inefficient DMUs.
Limitations of DEA:
• Sensitivity to Extreme Values: DEA can be sensitive to outliers or extreme values in the
data.
• Static Analysis: DEA typically provides a static efficiency picture, not accounting for
changes over time unless applied in a panel data context.
• Subjectivity in Weighting: The choice of inputs and outputs and their respective weights
can influence the efficiency scores.
In summary, Data Envelopment Analysis is a powerful tool for measuring the efficiency of
entities with multiple inputs and outputs, offering valuable insights for performance
improvement and strategic planning.
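As a rough sketch of how such scores can be computed, the code below solves the input-oriented CCR envelopment problem for each DMU with scipy.optimize.linprog; the input and output figures are invented:

```python
# Input-oriented CCR DEA for DMU o, solved as a linear program:
#   min theta  s.t.  sum_j lam_j * x_j <= theta * x_o,
#                    sum_j lam_j * y_j >= y_o,   lam >= 0.
# Rows are DMUs; the figures below are invented.
import numpy as np
from scipy.optimize import linprog

X = np.array([[20.0, 300.0], [30.0, 200.0], [40.0, 100.0], [20.0, 200.0]])  # inputs
Y = np.array([[100.0], [80.0], [60.0], [90.0]])                             # outputs

def ccr_efficiency(o):
    n, m = X.shape          # number of DMUs, number of inputs
    s = Y.shape[1]          # number of outputs
    c = np.zeros(1 + n)     # decision variables: [theta, lam_1, ..., lam_n]
    c[0] = 1.0              # minimize theta
    A_ub, b_ub = [], []
    for i in range(m):      # inputs: sum_j lam_j x_ij - theta * x_io <= 0
        A_ub.append(np.concatenate(([-X[o, i]], X[:, i])))
        b_ub.append(0.0)
    for r in range(s):      # outputs: -sum_j lam_j y_rj <= -y_ro
        A_ub.append(np.concatenate(([0.0], -Y[:, r])))
        b_ub.append(-Y[o, r])
    bounds = [(None, None)] + [(0.0, None)] * n
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
    return res.fun          # theta: efficiency score in (0, 1]

for o in range(X.shape[0]):
    print(f"DMU {o}: efficiency = {ccr_efficiency(o):.3f}")
```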
A well-known limitation of standard DEA is that each DMU selects the weights most favorable to itself, which often rates many DMUs as fully efficient and makes it hard to discriminate between them. Cross-efficiency analysis addresses this by using the DEA results but then extending the evaluation in the following way:
1. Peer Evaluations: Each DMU’s input and output weights, obtained from its own DEA
optimization, are used to calculate the efficiency scores of all other DMUs. This means
that each DMU is evaluated not only under its own most favorable conditions but also
under the conditions favorable to others.
2. Cross-efficiency Score: The efficiency scores a DMU receives from all evaluations
(including its own and those by its peers) are then averaged to produce a cross-efficiency
score. This score reflects not just how efficiently a DMU uses its resources but also how
its efficiency is perceived by its peers under their optimal conditions.
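In symbols (a standard textbook formulation; u_k and v_k denote the optimal output and input weights chosen by DMU k in its own DEA run):

```latex
% Efficiency of DMU j as appraised with DMU k's weights, and its average:
E_{kj} = \frac{\sum_{r} u_{rk}\, y_{rj}}{\sum_{i} v_{ik}\, x_{ij}},
\qquad
\bar{E}_{j} = \frac{1}{n} \sum_{k=1}^{n} E_{kj}
```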
Limitations:
While cross-efficiency analysis offers several advantages, it also comes with limitations:
• Subjectivity in Weight Selection: The choice of weights remains subjective and can
influence the outcomes.
• Complexity: The process is more complex than standard DEA, requiring multiple rounds
of efficiency calculations.
• Sensitivity: Results may be sensitive to the choice of inputs and outputs, as well as the
presence of outliers.
1. Model Definition: The CCR model, introduced by Charnes, Cooper, and Rhodes in 1978, is the original DEA model. It evaluates the relative efficiency of DMUs that use multiple inputs to produce multiple outputs.
2. Assumption of Constant Returns to Scale (CRS): A critical assumption of the CCR model is that DMUs operate under constant returns to scale, meaning that changes in input levels lead to proportional changes in output levels. This makes the model particularly suitable for evaluating DMUs that are operating at an optimal scale.
3. Efficiency Score: The CCR model calculates an efficiency score for each DMU, which
ranges from 0 to 1 (or 0% to 100%). An efficiency score of 1 indicates that the DMU is
operating on the efficiency frontier and is considered fully efficient relative to its peers.
Scores below 1 indicate inefficiency, with the magnitude of the score reflecting the
degree of inefficiency.
4. Slack Variables: The model incorporates slack variables to account for any excess inputs
or shortage of outputs in inefficient DMUs. These slacks are used to adjust the input and
output levels of DMUs to project them onto the efficiency frontier, providing specific
targets for improvement.
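The efficiency score in point 3 is the optimum of a linear program; in the standard input-oriented multiplier form (with u_r and v_i the output and input weights for the DMU o under evaluation):

```latex
% Input-oriented CCR model, multiplier form, for DMU o:
\max_{u,\,v} \; \theta_o = \sum_{r} u_r\, y_{ro}
\quad \text{s.t.} \quad
\sum_{i} v_i\, x_{io} = 1, \qquad
\sum_{r} u_r\, y_{rj} - \sum_{i} v_i\, x_{ij} \le 0 \;\; \forall j, \qquad
u_r,\, v_i \ge 0
```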
b. Cluster analysis
Cluster analysis is a statistical method used to group objects (such as individuals, things, or
events) into clusters or segments based on their characteristics, such that objects within the
same cluster are more similar to each other than to those in other clusters. This method is
widely used in various fields, including marketing, biology, medicine, and social sciences, for
exploratory data analysis, pattern recognition, and classification purposes.
1. Similarity Measures: Clustering relies on a measure of similarity or distance between objects, such as Euclidean or Manhattan distance for numerical data, which determines how alike two objects are considered to be.
2. Clustering Algorithms: Several algorithms can be employed in cluster analysis, each with its strengths and weaknesses. Some of the most common include:
• K-means Clustering: Partitions the data into K clusters by minimizing the variance
within each cluster. It's suitable for large datasets and numerical data but requires
specifying the number of clusters in advance.
• Hierarchical Clustering: Builds a hierarchy of clusters using a bottom-up approach
(agglomerative) or a top-down approach (divisive). This method is useful for
understanding the data structure but can be computationally intensive for large
datasets.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Identifies
clusters based on the density of data points, allowing for clusters of arbitrary shape
and the identification of outliers. It does not require specifying the number of
clusters.
3. Cluster Validation: Assessing the quality of the clustering results is critical. Techniques
such as the silhouette coefficient, Dunn index, or comparing within-cluster and
between-cluster distances are used to evaluate the effectiveness and coherence of the
clusters formed.
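A minimal scikit-learn sketch tying the algorithm and validation steps together (synthetic data; the choice of k = 3 is an assumption):

```python
# K-means clustering with silhouette-based validation on synthetic data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic numerical data with three latent groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# K-means requires the number of clusters to be specified in advance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Cluster validation: silhouette coefficient in [-1, 1]; higher means
# tighter, better-separated clusters.
print("silhouette:", silhouette_score(X, labels))
```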
Understanding Outliers:
• Definition: An outlier is an observation that lies an abnormal distance from other values
in a random sample from a population. In a sense, this means that outliers are
observations that appear to deviate markedly from other members of the sample in
which they occur.
• Types of Outliers:
• Point Outliers: Individual observations that are far removed from the general data
distribution.
• Contextual Outliers: Observations considered outliers within a specific context or
condition.
• Collective Outliers: A collection of observations that deviate significantly from the
overall data pattern when considered together, even though the individual data
points may not be outliers.
Detecting Outliers:
1. Statistical Tests: The standard deviation method, the Z-score method, and the Interquartile Range (IQR) method are commonly used statistical techniques for identifying outliers.
2. Visualization Tools: Box plots, scatter plots, and histograms can visually highlight the
presence of outliers.
3. Machine Learning Algorithms: Clustering algorithms (such as K-means, DBSCAN) and
anomaly detection algorithms (such as Isolation Forest, Local Outlier Factor) are used for
more sophisticated outlier detection, especially in large datasets or datasets with
complex structures.
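As a small sketch of the IQR method from point 1 (the 1.5 x IQR fences are the usual convention; the sample values are made up):

```python
# Outlier detection with the Interquartile Range (IQR) method.
import numpy as np

values = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12, 14, 17])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # conventional 1.5x fences

outliers = values[(values < lower) | (values > upper)]
print(outliers)  # 102 lies far outside the fences -> a point outlier
```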
Assignment – IV
Logistic Regression Models:
• Key Features:
• Predicts the probability of occurrence of an event by fitting data to a logistic
function.
• Outputs a value between 0 and 1, which can be interpreted as the likelihood of the
event (e.g., purchase decision).
• Useful in scenarios with a dichotomous outcome and where the relationship
between the independent variables and the dependent variable is not linear.
• Case Study Application:
• Customer Purchase Prediction: A retail company uses logistic regression to analyze
customer demographics, previous purchase history, and marketing engagement
data to predict the likelihood of customers purchasing a newly launched product.
The model helps tailor marketing campaigns to target customers with higher
buying probabilities, optimizing marketing spend and improving conversion rates.
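A minimal scikit-learn sketch of this purchase-prediction idea (the features and training data are invented):

```python
# Logistic regression for purchase prediction; all figures are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per customer: [age, past_purchases, marketing_emails_opened]
X = np.array([[25, 0, 1], [34, 3, 5], [45, 1, 0], [29, 4, 7],
              [52, 0, 0], [38, 5, 6], [23, 1, 2], [41, 2, 4]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = purchased

model = LogisticRegression().fit(X, y)

# Output between 0 and 1: the likelihood that a new customer will buy,
# which marketing can use to target high-probability customers.
new_customer = np.array([[30, 2, 3]])
print(model.predict_proba(new_customer)[0, 1])
```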
Production Models:
• Key Features:
• Includes models like the Economic Order Quantity (EOQ) model, the Economic
Production Quantity (EPQ) model, and Just-In-Time (JIT) production.
• Aims to minimize production and inventory costs while meeting customer demand.
• Helps in decision-making regarding production scheduling, inventory levels, and
supply chain logistics.
• Case Study Application:
• Optimizing Production for Seasonal Demand: A beverage company employs
production modeling to manage the production of seasonal flavors. By analyzing
historical sales data, seasonal trends, and production costs, the company optimizes
its production schedule and inventory levels to meet anticipated demand spikes
during specific seasons, thereby reducing overproduction and minimizing storage
costs.
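As one concrete instance of these models, the classic EOQ formula balances ordering and holding costs; the demand and cost figures below are made up:

```python
# Economic Order Quantity: Q* = sqrt(2 * D * S / H), where
#   D = annual demand (units), S = cost per order,
#   H = holding cost per unit per year.
import math

D, S, H = 12_000, 50.0, 2.4  # made-up figures

eoq = math.sqrt(2 * D * S / H)
orders_per_year = D / eoq
print(f"EOQ = {eoq:.0f} units, about {orders_per_year:.1f} orders per year")
```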
In summary, logistic and production models offer powerful tools for predicting customer
behavior and optimizing production strategies. By applying these models in a coordinated
manner, businesses can enhance their marketing effectiveness, improve operational
efficiency, and achieve a competitive edge in the market.
Assignment – V
a. Emerging Technologies:
The integration of emerging technologies into BI platforms is set to redefine how businesses
access, analyze, and leverage data. Technologies such as Artificial Intelligence (AI) and
Machine Learning (ML), the Internet of Things (IoT), blockchain, and augmented reality (AR)
are at the forefront.
• AI and ML are making BI tools more intelligent and autonomous, enabling predictive
analytics, natural language processing, and personalized insights at scale. These
advancements allow for more proactive and prescriptive analytics, where BI tools not
only report on what has happened but also predict future trends and recommend
actions.
• IoT devices generate a massive influx of real-time data, offering businesses
unprecedented visibility into operations, customer behavior, and market conditions. BI
platforms that can integrate and analyze IoT data can provide real-time insights for
immediate decision-making.
• Blockchain technology offers a secure and transparent way to store and share data,
potentially revolutionizing how data is managed and trusted in BI processes.
• Augmented Reality (AR) can transform data visualization, making complex data more
accessible and understandable through immersive experiences.
b. Machine Learning:
Machine Learning within BI is evolving from a complementary technology to a core
component of BI strategies. ML algorithms can uncover patterns and insights from vast
datasets much more efficiently than traditional methods. Future BI tools will likely offer more
advanced ML capabilities, such as:
• Automated anomaly detection to identify and alert on unusual data patterns (a brief sketch follows this list).
• Enhanced forecasting models that adapt over time for more accurate predictions.
• ML-driven data preparation and cleaning to reduce the time and effort required to make
data analysis-ready.
• Natural language processing (NLP) for querying BI systems using everyday language,
making data analysis more accessible to non-technical users.
• Advanced text analytics to extract insights from textual data such as social media posts,
customer reviews, and open-ended survey responses, integrating these insights into
broader data analysis efforts.
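As a brief sketch of the automated anomaly detection capability above, using scikit-learn's IsolationForest on invented data:

```python
# Unsupervised anomaly detection with an Isolation Forest; data invented.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=100.0, scale=5.0, size=(200, 1))   # typical daily values
anomalies = np.array([[160.0], [35.0]])                    # unusual spikes/drops
data = np.vstack([normal, anomalies])

model = IsolationForest(contamination=0.01, random_state=42).fit(data)
flags = model.predict(data)   # -1 = anomaly, 1 = normal

print(data[flags == -1].ravel())  # points a BI tool could alert on
```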
Conclusion:
The future of Business Intelligence is marked by the deeper integration of advanced
technologies, making BI tools more powerful, intuitive, and essential for business operations.
As these trends continue to evolve, businesses that adapt and incorporate these
advancements into their BI strategies will gain a significant edge in leveraging data for
strategic decision-making, operational efficiency, and innovation.