Article in International Journal of Computer Applications Technology and Research · December 2015
DOI: 10.7753/IJCATR0412.1014

International Journal of Computer Applications Technology and Research
Volume 4– Issue 12, 952 - 955, 2015, ISSN: 2319–8656

Significant Role of Statistics in Computational Sciences

Rakesh Kumar Singh, Scientist-D, G.B. Pant Institute of Himalayan Environment & Development, Kosi-Katarmal, Almora, Uttarakhand, India
Neeraj Tiwari, Professor & Head, Department of Statistics, Kumaun University, SSJ Campus, Almora, Uttarakhand, India
R.C. Prasad, Scientist-F, G.B. Pant Institute of Himalayan Environment & Development, Kosi-Katarmal, Almora, Uttarakhand, India

Abstract: This paper focuses on issues related to optimizing statistical approaches in the emerging fields of Computer Science and Information Technology, with particular emphasis on the role of statistical techniques in modern data mining. Statistics is the science of learning from data and of measuring, controlling, and communicating uncertainty. Statistical approaches can make significant contributions to software engineering, neural networks, data mining, bioinformatics and other allied fields. Statistical techniques not only help build scientific models but also quantify the reliability, reproducibility and general uncertainty associated with these models. Today, large amounts of data are recorded automatically by computers and managed with database management systems (DBMS) for storage and fast retrieval. The practice of examining large pre-existing databases in order to generate new information is known as data mining. Data mining, which involves the application of a variety of statistical techniques, currently attracts substantial attention in both research and commercial arenas. Twenty years ago most data were collected manually and data sets were simple, but the nature of data has since changed considerably. Statistical techniques and computer applications can be used to obtain the maximum information from the fewest possible measurements, reducing the cost of data collection.

Keywords: Statistics, Data Mining, Software Engineering, DBMS, Neural Networks

1. INTRODUCTION

Statistics is a scientific discipline with sophisticated methods for statistical inference, prediction, quantification of uncertainty and experimental design. Statistics has long been fundamental to advances in computer science, and it encompasses a wide range of research areas. The future of the World Wide Web (WWW) will depend on the development of many new statistical ideas and algorithms. The two most productive approaches to statistics are the computational and the mathematical. Modern statistics encompasses the collection, presentation and characterization of information to assist in both data analysis and decision making. Statistical advances made in collaboration with other sciences can address various challenges in science and technology. Computer science uses statistics in many ways to help ensure that products available on the market are accurate, reliable, and helpful[1][2].

• Statistical Computing: The term “statistical computing” refers to the computational methods that enable statistical methods. Statistical computing includes numerical analysis, database methodology, computer graphics, software engineering and the computer-human interface[1].
• Computational Statistics: The term “computational statistics” is used somewhat more broadly, to include not only the methods of statistical computing but also modern statistical methods that are computationally intensive. Thus, to some extent, “computational statistics” refers to a large class of modern statistical methods. Computational statistics is grounded in mathematical statistics, statistical computing and applied statistics, and it is concerned with advancing statistical theory and methods through the use of computational methods. Computation in statistics is based on algorithms that originate in numerical mathematics or in computer science. The algorithms from computer science that are most relevant to computational statistics come from machine learning, artificial intelligence (AI), and knowledge discovery in databases, or data mining. These developments have given rise to a new research area on the borderline between statistics and computer science[1].
• Computer Science vs. Statistics: Statistics and computer science are both about data. Massive amounts of data are present in today's world. Statistics, with the help of computer science, lets us summarize and understand these data. Statistics also lets data do our work for us[2].

Fig.1. Relation between statistics, computer science, statistical computing and computational statistics.

2. STATISTICAL APPROACHES IN COMPUTATIONAL SCIENCES

Statistics is essential to computer science for ensuring effectiveness, efficiency, reliability, and high-quality
products for the public. Statistical thinking not only helps make scientific discoveries but also quantifies the reliability, reproducibility and general uncertainty associated with these discoveries. The following is a brief list of areas in computer science that use statistics to varying degrees[6][7][8]:
• Data Mining: Data mining is the analysis of information in a database using tools that look for trends or irregularities in large data sets. In other words, it means "finding useful information in the available data sets using statistical techniques".
• Data Compression: Data compression is the encoding of data by algorithms and utilities that produce a compact representation, saving storage space or transmission time.
• Speech Recognition: Speech recognition is the identification of spoken words by a machine. The spoken words are turned into a sequence of numbers and matched against coded dictionaries.
• Vision and Image Analysis: Vision and image analysis use statistics to solve contemporary and practical problems in computer vision, image processing, and artificial intelligence.
• Human/Computer Interaction: Human/computer interaction uses statistics to design, implement, and evaluate new technologies that are usable, useful, and appealing to a broad cross-section of people.
• Network/Traffic Modeling: Network/traffic modeling uses statistics to avoid network congestion while fully exploiting the available bandwidth.
• Stochastic Optimization: Stochastic optimization uses chance and probability models to develop efficient procedures for finding the solution to a problem.
• Stochastic Algorithms: Stochastic algorithms follow a detailed sequence of actions to perform or accomplish a task in the face of uncertainty.
• Artificial Intelligence: Artificial intelligence is concerned with modelling aspects of human thought on computers.
• Machine Learning: Machine learning is the ability of a machine or system to improve its performance based on previous results.
• Capacity Planning: Capacity planning determines what equipment and software will be sufficient while providing the most power for the least cost.
• Storage and Retrieval: Storage and retrieval techniques rely on statistics to ensure that computerized data are kept and recovered efficiently and reliably.
• Quality Management: Quality management uses statistics to analyze the condition of manufactured parts (hardware, software, etc.), using tools and sampling to ensure a minimum level of defects.
• Software Engineering: Software engineering is a systematic approach to the analysis, design, implementation, and maintenance of computer programs.
• Performance Evaluation: Performance evaluation is the process of examining a system or system component to determine the extent to which specified properties are present.
• Hardware Manufacturing: Hardware manufacturing is the creation of the physical parts of a system, such as the monitor or disk drive.

3. STATISTICS IN SOFTWARE ENGINEERING

Software engineering aims to develop methodologies and procedures to control the whole software development process. Researchers now attempt to bridge the islands of knowledge and experience between statistics and software engineering by establishing a new interdisciplinary field: statistical software engineering. Design of Experiments (DOE) uses statistical techniques to test and construct models of engineering components and systems. Quality control and process control use statistics as a tool to manage the conformance of manufacturing processes and their products to specifications. Time and methods engineering uses statistics to study repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures. Reliability engineering uses statistics to measure the ability of a system to perform its intended function for its intended time, and it provides tools for improving performance. Probabilistic design applies probability to product and system design. Essential to statistical software engineering is the role of data: wherever data are used or can be generated in the software life cycle, statistical methods can be brought to bear for description, estimation, and prediction. A combined department of software engineering and statistics can train multiskilled engineers to process information, in both its statistical and its computational forms, for use in various business professions.

4. STATISTICS IN HARDWARE MANUFACTURING

Hardware manufacturing companies apply statistical approaches to create plans of action that work more efficiently and to forecast the future productivity of the enterprise[8]. Statistical approaches are adopted to:
• forecast production under both stable and uncertain demand;
• pinpoint when, and which, inputs of a specific model cause uncertainty;
• calculate summary statistics from sample data;
• carry out market analysis and process optimization;
• track and predict quality statistically for quality improvement.

5. STATISTICS IN DATABASE MANAGEMENT

Database management systems are software packages designed to create, edit, manipulate and analyze data. To be suitable for a database, the data must consist of records that provide information on individual cases, people, places, features, etc. Optimizer statistics are a collection of data that describe the database and the objects in it in more detail. They are stored in the data dictionary and can be viewed through data dictionary views. Because the objects in a database can change constantly, the statistics must be updated regularly so that they describe these objects accurately. The query optimizer uses these statistics to choose the best execution plan for each SQL statement[5]. Optimizer statistics include the following:
• Table Statistics
  • Number of rows
  • Number of blocks
  • Average row length
• Column Statistics
  • Number of distinct values (NDV) in column
  • Number of nulls in column
  • Data distribution (histogram)
• Index Statistics
  • Number of leaf blocks
  • Levels
  • Clustering factor
• System Statistics
  • I/O performance and utilization
  • CPU performance and utilization

Statistical packages used with databases include SAS, SPSS and R, which are available for a wide range of operating systems. Numerous other packages have been developed specifically for the PC DOS environment. S is a commonly available statistical package for UNIX.

6. STATISTICS IN ARTIFICIAL INTELLIGENCE

Artificial intelligence (AI) is the intelligence exhibited by machines or software. Popular AI approaches include statistical methods, computational intelligence, machine learning and traditional symbolic AI. The goals of AI include reasoning, knowledge, planning, learning, natural language processing, perception and the ability to move and manipulate objects. A large number of tools are used in AI, including versions of search and mathematical optimization, logic, methods based on probability and economics, and many others[4]. The simplest AI applications can be divided into two types:
• Classifiers: Classifiers are functions that use pattern matching to determine the closest match. A classifier can be trained in various ways, and there are many statistical and machine learning approaches. The most widely used classifier is the neural network.
• Controllers: Controllers also classify conditions before inferring actions, so classification forms a central part of many AI systems.

Fig.2. Graphical approach of Artificial Intelligence (diagram elements: Artificial Intelligence Applications; Classifiers and Controllers; Pattern Matching; Statistical Implications; Neural Network, Gaussian mixture model, Naive Bayes classifier, etc.).

7. STATISTICS IN NEURAL NETWORKS

The term neural network was originally used to refer to a network of biological neurons, while artificial neural network refers to a network of artificial neurons or nodes. Biological neural networks are made up of real biological neurons that are connected or functionally related in the peripheral or the central nervous system. Artificial neural networks are made up of interconnected artificial neurons (programming constructs that mimic the properties of biological neurons). Artificial neural networks may be used either to gain an understanding of biological neural networks or to solve artificial intelligence problems without necessarily creating a model of a real biological system. The perceptron classifies an input by the sign of a weighted sum, that is, an inner product with a weight vector plus a bias. Because the inner product is a linear operator in the input space, the perceptron can perfectly classify only data whose classes are linearly separable in the input space, and it often fails completely on non-separable data. Although the algorithm initially generated some enthusiasm, partly because of its apparent relation to biological mechanisms, the later discovery of this inadequacy caused such models to be abandoned until non-linear models were introduced into the field[4].

8. STATISTICS IN BIOINFORMATICS

Bioinformatics is the application of “computational biology” to the management and analysis of biological data. Concepts from computer science, discrete mathematics and statistics are increasingly used to study and describe biological systems. Bioinformatics would not be possible without advances in computer hardware and software, including the analysis of algorithms, data structures and software engineering. The ability to run elaborate algorithms on computers has increased awareness of more recent statistical methods. Statistical analysis of differentially expressed genes is best carried out with hypothesis tests. More complex data may require analysis with ANOVA or general linear models[8].

Fig.3. Taxonomy of Bioinformatics (diagram elements: Bioinformatics; Computational Statistics and Computational Biology; Statistics, Computer Science and Biology).

9. STATISTICS IN DATA MINING

Data mining is the process of discovering previously unknown and potentially useful hidden patterns in data. Advances in information technology have resulted in a much more data-based society. Data touch almost every aspect of our lives, including commerce on the web, the measurement of our fitness and safety, how doctors treat our illnesses, and economic decisions that affect entire nations. Alone, data are not useful for knowledge
discovery. Data mining marks the transition from data-poor to data-rich analysis, using methods such as data exploration, statistical inference, and the understanding of variability and uncertainty[5].

Statistical Elements Present in Data Mining
• Contrived serendipity: creating the conditions for fortuitous discovery.
• Exploratory data analysis with large data sets, in which the data are, as far as possible, allowed to speak for themselves, independently of subject-area assumptions and of models that might explain their pattern. There is a particular focus on the search for unusual or interesting features.
• Specialised problems, such as fraud detection.
• The search for specific known patterns.
• Standard statistical analysis problems with large data sets.

Data Mining from a Statistical Perspective
• For data sets that are relatively large and homogeneous, it may be reasonable to use mainstream statistical techniques on the whole data set or on a very large subset of it.
• All analyses done with mainstream statistics share an intended outcome: reducing a set of data to a small amount of readily assimilated information.
• The outcome may include graphs, summary statistics, equations that can be used for prediction, or a decision tree.
• If a large volume of data can be reduced to a much smaller summary form without loss of information, this can enormously aid the subsequent analysis.
• It then becomes much easier to make graphical and other checks that assure the analyst that predictive models or other analysis outcomes are meaningful and valid.

Statistics vs. Data Mining

Feature | Statistics | Data Mining
Type of problem | Well structured | Unstructured / semi-structured
Role of inference | Explicit inference plays a great role in any analysis | No explicit inference
Objective of the analysis and data collection | The objective is formulated first, and then data are collected | Data are rarely collected for the objective of the analysis/modeling
Size of data set | Small and hopefully homogeneous | Large and heterogeneous
Paradigm/approach | Theory-based (deductive) | Synergy of theory-based and heuristic-based approaches (inductive)
Type of analysis | Confirmative | Explorative
Number of variables | Small | Large
Methods/techniques | Dependence methods: discriminant analysis, logistic regression; interdependence methods: correlation analysis, correspondence analysis, cluster analysis | Predictive data mining: classification, regression; discovery data mining: association analysis, sequence analysis, clustering

10. PROPERTIES OF STATISTICAL PACKAGES

Statistical packages offer a range of types of statistical analysis[3]. Their properties include:
• Database functions, such as editing and printing reports.
• Capabilities for graphical output, particularly graphs, although many also produce maps.
• Common packages are SAS, SPSS, R, etc.
• They are available over a wide range of operating systems.
• Some have been "ported" to (rewritten for) the IBM PC.
• Numerous other packages have been developed specifically for the PC DOS environment.
• S is a commonly available statistical package for UNIX.

11. CONCLUSION

This paper has described many areas of computer science in which statistics plays a vital role in data and information management. Statistical thinking fuels the cross-fertilization of ideas between scientific fields (biological, physical, and social sciences), industry, and government. Statistical and algorithmic issues are both important in data mining. Statistics is an essential and valuable component of any data mining exercise. The future success of data mining will depend critically on our ability to integrate techniques for modeling and inference from statistics into the mainstream of data mining practice.

12. REFERENCES

[1] Lauro, C. (1996). Computational Statistics or Statistical Computing, is that the question? Computational Statistics and Data Analysis, Vol. 23, pp. 191–193.
[2] Billard, L. and Gentle, J.E. (1993). The middle years of the Interface. Computing Science and Statistics, Vol. 25, pp. 19–26.
[3] Yates, F. (1966). Computers: the second revolution in statistics. Biometrics, Vol. 22.
[4] Cheng, B. and Titterington, D.M. (1994). Neural networks: a review from a statistical perspective. Statistical Science, Vol. 9, pp. 2–54.
[5] Elder, J.F. and Pregibon, D. (1996). A statistical perspective on knowledge discovery in databases. Advances in Knowledge Discovery and Data Mining, MIT Press, pp. 83–115.
[6] Gentle, J.E. (2004). Courses in statistical computing and computational statistics. The American Statistician, Vol. 58, pp. 2–5.
[7] Grier, D.A. (1991). Statistics and the introduction of digital computers. Chance, Vol. 4(3), pp. 30–36.
[8] Friedman, J.H. and Fisher, N.I. (1999). Bump hunting in high-dimensional data. Statistics and Computing, Vol. 9, pp. 123–143.
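Section 2 describes data compression as the compact coding of data to save storage space or transmission time. Many compressors rest on a statistical idea: symbols that occur more often should receive shorter codes. The Python sketch below illustrates that idea with Huffman coding. Huffman coding is chosen here as one standard frequency-based scheme, since the paper does not name a specific algorithm, and the function name `huffman_code` is hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code(text: str) -> Dict[str, str]:
    """Build a Huffman prefix code for the symbols in `text`.

    Frequent symbols end up near the root of the code tree and therefore
    receive short codewords; rare symbols receive long ones.
    """
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial codeword}).
    # The unique tie_breaker keeps heapq from ever comparing the dicts.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For the string "aaaabbc", the frequent symbol a receives a 1-bit codeword and b and c receive 2-bit codewords. The string therefore takes 10 bits instead of the 14 bits a fixed 2-bit code would need.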
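Section 3 notes that quality control and process control use statistics to manage conformance to specifications. The following is a minimal sketch of one classic tool, a Shewhart-style control chart. Limits are set at the baseline mean plus or minus three standard deviations, and later measurements outside those limits are flagged. The sketch assumes the baseline sample is in control. As a simplification, it estimates sigma with the ordinary sample standard deviation rather than the moving-range estimate often used for individuals charts. The function names `control_limits` and `out_of_control` are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def control_limits(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float, float]:
    """Return (lower limit, centre line, upper limit) at mean +/- k standard deviations.

    Simplification: sigma is the ordinary sample standard deviation of the
    in-control baseline, not a moving-range estimate.
    """
    centre = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    return centre - k * sigma, centre, centre + k * sigma


def out_of_control(samples: Sequence[float], limits: Tuple[float, float, float]) -> List[int]:
    """Return the indices of samples that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]
```

Take a hypothetical baseline of 10.0, 10.2, 9.8, 10.1 and 9.9. The limits come out at roughly 9.53 and 10.47, so later readings of 11.0 and 9.0 would be flagged and a reading of 10.3 would not.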
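Section 4 lists calculating summary statistics from sample data, and Section 9 stresses reducing a large volume of data to a much smaller summary form. The sketch below reduces a sample to a handful of numbers using only Python's standard `statistics` module (Python 3.8 or later for `fmean` and `quantiles`). The choice of statistics and the function name `summarize` are illustrative assumptions, not a method taken from the paper.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Reduce a sample to a few readily assimilated numbers.

    Needs at least two values (for the sample standard deviation and the
    quartiles). Quartiles use the module's default 'exclusive' method.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }
```

For the values 1 to 9, the summary gives a mean and median of 5, quartiles of 2.5 and 7.5, and a sample standard deviation of about 2.74.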
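The column statistics listed in Section 5 (number of distinct values, number of nulls and data distribution) can be illustrated with a small Python sketch. The sketch assumes a column held in memory as a list, with None standing for NULL, and builds an equi-width histogram. A real DBMS gathers such statistics itself, often from a sample, stores them in the data dictionary and may use other histogram types. The function `column_statistics` and its bucket scheme are therefore hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Union


def column_statistics(
    values: Sequence[Optional[float]], buckets: int = 4
) -> Dict[str, Union[int, List[int]]]:
    """Compute optimizer-style statistics for one column.

    None stands for NULL. The histogram is equi-width over the non-null
    values: bucket i counts values in [lo + i*w, lo + (i+1)*w), and the
    maximum value is placed in the last bucket.
    """
    non_null = [v for v in values if v is not None]
    histogram = [0] * buckets
    if non_null:
        lo, hi = min(non_null), max(non_null)
        width = (hi - lo) / buckets or 1.0  # all-equal values: avoid dividing by zero
        for v in non_null:
            histogram[min(int((v - lo) / width), buckets - 1)] += 1
    return {
        "num_rows": len(values),
        "num_nulls": len(values) - len(non_null),
        "ndv": len(set(non_null)),  # number of distinct non-null values
        "histogram": histogram,
    }
```

A simple textbook estimate, assuming values are spread uniformly, of the fraction of rows matching an equality predicate on the column is (num_rows - num_nulls) / num_rows × 1 / ndv. A histogram lets an optimizer refine such estimates when the values are skewed.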
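Section 7 claims that the perceptron can perfectly classify only linearly separable data, and this can be checked directly. The Python sketch below implements the classic perceptron learning rule: predict with the sign of w · x + b, and on each mistake add y·x to w and y to b. It trains the rule on the logical AND function, which is linearly separable, and on XOR, which is not. The function name `train_perceptron`, the epoch limit and the ±1 label encoding are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# A sample is ((x1, x2), label) with label -1 or +1.
Sample = Tuple[Tuple[float, float], int]


def train_perceptron(data: Sequence[Sample], epochs: int = 100) -> Tuple[List[float], float, int]:
    """Train with the classic perceptron rule and report the last epoch's mistakes.

    Prediction is sign(w . x + b). On a mistake (including a point exactly on
    the boundary) the rule adds y * x to w and y to b.
    """
    w = [0.0, 0.0]
    b = 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                errors += 1
        if errors == 0:  # one full pass with no mistakes: classes separated
            break
    return w, b, errors


# Logical AND is linearly separable; XOR is not.
AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
```

On AND, the rule reaches a full pass with no mistakes after a few epochs, as the perceptron convergence theorem guarantees for separable data. On XOR, every pass contains at least one mistake however many epochs are allowed, because no straight line separates the two classes.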
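Section 8 states that differentially expressed genes are best analysed with hypothesis tests. As a dependency-free illustration, the sketch below swaps in a two-sample permutation test on the difference in mean expression between two conditions. This is not necessarily the test the authors have in mind; a t-test is the more conventional choice. The function name `permutation_test`, the number of permutations and the fixed seed are assumptions, and any expression values used with it here are invented for illustration.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float], n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means between two groups.

    Returns an estimated p-value: the share of random relabellings of the
    pooled values whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    # statistics.mean sums exactly, so the result does not depend on value order.
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (extreme + 1) / (n_perm + 1)
```

Consider two hypothetical groups of five values, such as 4.8 to 5.2 and 7.8 to 8.2, that do not overlap. Of the 252 ways of splitting the ten pooled values into two groups of five, only 2 are as extreme as the observed split, so the estimate comes out close to 2/252, about 0.008.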
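The comparison table in Section 9 lists association analysis among the discovery data mining methods. Below is a minimal brute-force sketch of its two basic statistics. The support of an itemset is the fraction of transactions containing it. The confidence of a rule X → Y is the support of X and Y together divided by the support of X. The sketch only considers rules between single items and rescans the data for every count, unlike practical algorithms such as Apriori. The function names `support` and `single_item_rules` and the shopping-basket data are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def support(transactions: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def single_item_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> List[Tuple[str, str, float, float]]:
    """Brute-force rules X -> Y between single items.

    Returns (X, Y, support of {X, Y}, confidence) for every rule whose pair
    support and confidence reach the given thresholds.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        pair_support = support(transactions, frozenset({x, y}))
        if pair_support < min_support:
            continue
        # Try the rule in both directions; confidence differs by antecedent.
        for lhs, rhs in ((x, y), (y, x)):
            confidence = pair_support / support(transactions, frozenset({lhs}))
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules
```

Take the four baskets {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk}, with a minimum support of 0.5 and a minimum confidence of 0.9. The only rule returned is butter → bread (support 0.5, confidence 1.0), because both baskets that contain butter also contain bread.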
