Business Analytics - Intro
Business analytics is the science of posing and answering data questions related to business.
Its tools are statistics, data management, data visualization, and machine learning, together with big data handling, which
assimilates the advances made in the data sciences. Modeling methods connect these tools.
Its applications lie in finance, marketing, and operations. The major software used is R, Python, MS Excel, and MySQL.
The approach starts from business problems, which are transformed into technological problems.
A methodology is developed to solve the technological problems.
Data analysis is done using suitable software, and the output and results are clearly explained at each stage of development.
Finally, the technological solution is transformed back into a business solution.
An example illustrates how the retailing industry might use various sources of data to better serve its customers and understand
their preferences.
Data Management—Relational Database Management Systems: This chapter covers data management and storage. It focuses on
relational database management systems (RDBMS), using MySQL, an open-source RDBMS that is queried with SQL (Structured Query
Language). It shows how to examine the data present in tables using the SELECT command.
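The SELECT command can be tried without installing a MySQL server. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL, and the `customers` table and its columns are hypothetical. The SELECT statements themselves are standard SQL that MySQL also accepts.

```python
import sqlite3

def query_customers():
    # In-memory SQLite database standing in for a MySQL server (illustrative assumption).
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical table of retail customers.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, spend REAL)")
    cur.executemany(
        "INSERT INTO customers (id, name, city, spend) VALUES (?, ?, ?, ?)",
        [(1, "Asha", "Mumbai", 120.0), (2, "Ben", "Delhi", 80.5), (3, "Chen", "Mumbai", 200.0)],
    )
    conn.commit()
    # SELECT every column of every row, in id order.
    all_rows = cur.execute("SELECT * FROM customers ORDER BY id").fetchall()
    # SELECT only some columns, filtered with WHERE and sorted with ORDER BY.
    mumbai = cur.execute(
        "SELECT name, spend FROM customers WHERE city = ? ORDER BY spend DESC", ("Mumbai",)
    ).fetchall()
    conn.close()
    return all_rows, mumbai

all_rows, mumbai = query_customers()
print(all_rows)
print(mumbai)  # [('Chen', 200.0), ('Asha', 120.0)]
```

With a MySQL client library the query text stays the same, although the parameter placeholder syntax may differ by driver (sqlite3 uses `?`).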
Data Management—Big Data: This chapter covers big data tools such as Hadoop, Spark, and the surrounding ecosystem, along with
distributed and parallel computing and the big data cloud. It explains the architecture of the Hadoop runtime environment. It starts
by describing the cluster, which is the set of host machines, or nodes, that facilitates data access. It then moves on to the YARN
infrastructure, which is responsible for providing computational resources to the application, and describes its two main
elements: the Resource Manager and the Node Manager. It then details HDFS Federation, which provides storage, and also discusses
other storage solutions. Lastly, it discusses the MapReduce framework, which is the software layer, and explains its functions in
detail. MapReduce divides tasks into subtasks, which it runs in parallel to increase efficiency. The chapter walks through the steps
MapReduce takes to produce the output and describes how Python can be used to create a MapReduce process for a word count program.
It also covers Spark, with an example application, and cloud storage. A distributable Cloudera virtual machine (VM) is used to
demonstrate different hands-on exercises.
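The word count example can be sketched in plain Python. This is a single-process simulation of the map, shuffle/sort, and reduce phases, not an actual Hadoop job; on a cluster the mapper and reducer would typically be separate scripts (for example under Hadoop Streaming, which passes tab-separated key/value lines through stdin and stdout). The function names `mapper`, `reducer`, and `word_count` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum all the counts emitted for one word.
    return word, sum(counts)

def word_count(lines):
    # Map each line independently (on a cluster, mappers run in parallel on different splits).
    mapped = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring identical keys together, as the framework does between map and reduce.
    mapped.sort(key=itemgetter(0))
    # Reduce each group of identical keys to a single (word, total) pair.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    )

counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
```

Because each mapper sees only its own line and each reducer sees only one key, both phases can be split across nodes, which is where the parallel speed-up comes from.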
Sridhar Seshadri
Data Visualization
This chapter explains how data is visualized and how visualization can be used to aid analysis. It argues that humans use visuals
to understand information, and that using visualizations incorrectly can lead to mistaken conclusions. It presents visualization as
a cognitive aid, explains the importance of working memory in the brain, and emphasizes the role of data visualization in reducing
the load on the reader.
Six meta-rules of data visualization:
1. Use the most appropriate chart.
2. Directly represent relationships between data.
3. Refrain from asking the viewer to compare differences in area.
4. Never use color on top of color.
5. Keep within the primal perceptions of the viewer.
6. Chart with integrity.
The chapter also discusses the advantages and disadvantages of 3D visualization and best practices for color schemes.
Statistical Methods—Basic Inferences: This chapter introduces the fundamental concepts of statistical inference, such as
population and sample parameters, hypothesis testing, and analysis of variance. It explains the differences between population and
sample means and variances and the methods to calculate them. It then covers the central limit theorem and its use in estimating
the mean of a population.
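The difference between population and sample variance, and the central limit theorem, can be checked numerically with Python's standard library. `statistics.pvariance` divides by n (population variance), while `statistics.variance` divides by n - 1 (sample variance). The simulation is a sketch under an arbitrary assumption: it draws repeated samples of size 50 from an exponential population (mean 1, standard deviation 1), chosen only because it is clearly non-normal, and looks at the distribution of the sample means.

```python
import math
import random
import statistics

# Population vs sample variance on the same small data set.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
pop_var = statistics.pvariance(data)  # sum of squared deviations / n
samp_var = statistics.variance(data)  # sum of squared deviations / (n - 1)

def sample_means(n, trials, rate=1.0, seed=0):
    # Each trial: the mean of n draws from an exponential population with mean 1 / rate.
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n)) for _ in range(trials)]

n = 50
means = sample_means(n, trials=2000)
print(pop_var, samp_var)
print(statistics.fmean(means))  # should be close to 1.0, the population mean
print(statistics.stdev(means))  # should be close to 1 / sqrt(50), the standard error of the mean
```

Even though the population is skewed, the sample means cluster around the population mean with spread near sigma / sqrt(n), which is what the central limit theorem predicts.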
Confidence intervals are explained for samples in which the variance is both known and unknown. The concepts of standard errors and
of the t- and chi-squared distributions are introduced. The chapter then covers hypothesis testing and the use of statistical
parameters to reject or fail to reject hypotheses. Type I and type II errors are discussed, and methods to compare two different
samples are explained. Analysis of variance between two samples and within samples is also covered, along with the use of the
F-distribution in analyzing variance. The chapter concludes with a discussion of what to do when we need to compare the means of a
number of populations, showing how to use a technique called “Analysis of Variance (ANOVA)” instead of carrying out pairwise
comparisons.
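The one-way ANOVA F statistic compares the variation between group means with the variation within groups. A minimal pure-Python sketch, using made-up data for three hypothetical groups; the p-value would then come from the F-distribution with (k - 1, N - k) degrees of freedom, for example via `scipy.stats.f.sf` (or `scipy.stats.f_oneway` directly) if SciPy is available.

```python
def one_way_anova(groups):
    k = len(groups)                        # number of groups (populations)
    n_total = sum(len(g) for g in groups)  # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    # F = mean square between / mean square within.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df1, df2 = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df1, df2)  # 27.0 2 6
```

A single F test over all groups replaces the k(k - 1)/2 pairwise comparisons, which would otherwise inflate the overall chance of a type I error.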
Linear Regression Analysis: This chapter uses examples, such as predicting newspaper circulation, to discuss the methods by which
linear regression obtains results. It presents linear regression as a functional form that can be used to understand relationships
between outcomes and input variables and to perform statistical inference. It discusses the importance of linear regression and its
popularity, and explains the basic assumptions underlying linear regression. The modeling section begins by discussing a model in
which there is only a single regressor. It explains why a scatter plot can be useful in understanding single-regressor models, and
the importance of visual representation in statistical inference. It then describes the ordinary least squares method of estimating
parameters and the use of the sum of squares of residuals as a measure of the fit of a model, followed by the use of confidence
intervals and hypothesis testing in a linear regression.
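The single-regressor case can be worked through by hand. The sketch below computes the ordinary least squares slope and intercept, the residual sum of squares, R-squared, and the standard error and t statistic for the slope. The data are made-up numbers for illustration, not the newspaper circulation data, and the function name `simple_ols` is hypothetical.

```python
import math

def simple_ols(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # OLS estimates: the line that minimises the sum of squared residuals.
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ssr = sum(r ** 2 for r in residuals)      # residual sum of squares (measure of fit)
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    r_squared = 1 - ssr / sst
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    se_slope = math.sqrt(ssr / (n - 2) / s_xx)
    t_slope = slope / se_slope  # t statistic for H0: slope = 0
    return {"slope": slope, "intercept": intercept, "ssr": ssr,
            "r_squared": r_squared, "se_slope": se_slope, "t_slope": t_slope}

fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)
```

A 95% confidence interval for the slope is slope ± t* × se_slope, where t* is the 0.975 quantile of the t-distribution with n - 2 degrees of freedom (about 3.182 for n = 5); the hypothesis that the slope is zero is rejected at the 5% level when |t_slope| exceeds t*.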