Significant Role of Statistics in Computational Sciences
3 authors.
All content following this page was uploaded by Neeraj Tiwari on 23 February 2017.
Abstract: This paper focuses on issues related to optimizing statistical approaches in the emerging fields of Computer Science
and Information Technology, with particular emphasis on the role of statistical techniques in modern data mining. Statistics is
the science of learning from data and of measuring, controlling, and communicating uncertainty. Statistical approaches can make
significant contributions to software engineering, neural networks, data mining, bioinformatics, and other allied fields.
Statistical techniques not only help build scientific models but also quantify the reliability, reproducibility, and general
uncertainty associated with these models. Today, large amounts of data are recorded automatically by computers and managed with
database management systems (DBMS) for storage and fast retrieval. The practice of examining large pre-existing databases in
order to generate new information is known as data mining. Data mining has attracted substantial attention in both research and
commercial arenas, and it involves the application of a variety of statistical techniques. Twenty years ago, most data were
collected manually and data sets were simple in form; since then, the nature of data has changed considerably. Statistical
techniques and computer applications can be used to obtain maximum information from the fewest possible measurements, reducing
the cost of data collection.
Keywords: Statistics, Data Mining, Software Engineering, DBMS, Neural Networks
www.ijcat.com 952
International Journal of Computer Applications Technology and Research
Volume 4– Issue 12, 952 - 955, 2015, ISSN: 2319–8656
products for the public. Statistical thinking not only helps make scientific discoveries but also quantifies the reliability,
reproducibility, and general uncertainty associated with these discoveries. The following terms are a brief listing of areas in
computer science that use statistics to varying degrees at various times[6][7][8]:
- Data Mining: Data mining is the analysis of information in a database, using tools that look for trends or irregularities in
  large data sets. In other words, it is "finding useful information from the available data sets using statistical techniques".
- Data Compression: Data compression is the coding of data using compact formulas, called algorithms, and utilities to save
  storage space or transmission time.
- Speech Recognition: Speech recognition is the identification of spoken words by a machine. The spoken words are turned into a
  sequence of numbers and matched against coded dictionaries.
- Vision and Image Analyses: Vision and image analyses use statistics to solve contemporary and practical problems in computer
  vision, image processing, and artificial intelligence.
- Human/Computer Interaction: Human/computer interaction uses statistics to design, implement, and evaluate new technologies that
  are usable, useful, and appealing to a broad cross-section of people.
- Network/Traffic Modeling: Network/traffic modeling uses statistics to avoid network congestion while fully exploiting the
  available bandwidth.
- Stochastic Optimization: Stochastic optimization uses chance and probability models to develop the most efficient code for
  finding the solution to a problem.
- Stochastic Algorithms: Stochastic algorithms follow a detailed sequence of actions to perform or accomplish a task in the face
  of uncertainty.
- Artificial Intelligence: Artificial intelligence is concerned with modelling aspects of human thought on computers.
- Machine Learning: Machine learning is the ability of a machine or system to improve its performance based on previous results.
- Capacity Planning: Capacity planning determines what equipment and software will be sufficient while providing the most power
  for the least cost.
- Storage and Retrieval: Storage and retrieval techniques rely on statistics to ensure that computerized data are kept and
  recovered efficiently and reliably.
- Quality Management: Quality management uses statistics to analyze the condition of manufactured parts (hardware, software,
  etc.), using tools and sampling to ensure a minimum level of defects.
- Software Engineering: Software engineering is a systematic approach to the analysis, design, implementation, and maintenance of
  computer programs.
- Performance Evaluation: Performance evaluation is the process of examining a system or system component to determine the extent
  to which specified properties are present.
- Hardware Manufacturing: Hardware manufacturing is the creation of the physical parts of a system, such as the monitor or disk
  drive.

3. STATISTICS IN SOFTWARE ENGINEERING
Software engineering aims to develop methodologies and procedures to control the whole software development process. Researchers
now attempt to bridge the islands of knowledge and experience between statistics and software engineering by articulating a new
interdisciplinary field: statistical software engineering. Design of Experiments (DOE) uses statistical techniques to test and
construct models of engineering components and systems. Quality control and process control use statistics as a tool to manage
conformance to specifications of manufacturing processes and their products. Time and methods engineering uses statistics to study
repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures.
Reliability engineering uses statistics to measure the ability of a system to perform its intended function (for its intended
time) and provides tools for improving performance. Probabilistic design applies probability and statistics to product and system
design. Essential to statistical software engineering is the role of data: wherever data are used or can be generated in the
software life cycle, statistical methods can be brought to bear for description, estimation, and prediction. Departments of
software engineering and statistics train multiskilled engineers in the processing of information, in both its statistical and
computational forms, for use in various business professions.

4. STATISTICS IN HARDWARE MANUFACTURING
Hardware manufacturing companies are applying statistical approaches to create plans of action that forecast the future
productivity of the hardware enterprise more efficiently[8]. Statistical approaches are adopted for:
- Forecasting production under both stable and uncertain demand.
- Pinpointing when, and which, inputs of a specific model will cause uncertainty.
- Calculating summary statistics for sample data.
- Market analysis and process optimization.
- Statistical tracking and prediction for quality improvement.

5. STATISTICS IN DATABASE MANAGEMENT
Databases are packages designed to create, edit, manipulate, and analyze data. To be suitable for a database, the data must consist
of records which provide information on individual cases, people, places, features, etc. Optimizer statistics are a collection of
data that describe the database and the objects in it in more detail. The optimizer statistics are stored in the data dictionary
and can be viewed using data dictionary views. Because the objects in a database can change constantly, statistics must be updated
regularly so that they accurately describe these objects. The query optimizer uses these statistics to choose the best execution
plan for each SQL statement[5]. Optimizer statistics include the following:
- Table statistics:
  - Number of rows
  - Number of blocks
  - Average row length
- Column statistics
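The way table statistics feed plan selection can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular DBMS gathers or uses optimizer statistics: the 8192-byte block size, the approximation of row length as the UTF-8 byte length of each value's text form, the contiguous packing of rows into blocks, and the cost model (a full scan reads every block; an index lookup costs one block read per matching row) are all hypothetical. The hypothetical function `gather_table_stats` computes the three table statistics listed above, and `choose_plan` uses them to pick a plan for a predicate of a given selectivity.

```python
import math
from dataclasses import dataclass

# Hypothetical block size in bytes; real systems make this configurable.
BLOCK_SIZE = 8192


@dataclass
class TableStats:
    num_rows: int
    num_blocks: int
    avg_row_len: float


def row_length(row):
    """Approximate a row's stored size as the UTF-8 byte length of its values as text."""
    return sum(len(str(value).encode("utf-8")) for value in row)


def gather_table_stats(rows, block_size=BLOCK_SIZE):
    """Compute number of rows, number of blocks, and average row length."""
    num_rows = len(rows)
    total_bytes = sum(row_length(row) for row in rows)
    avg_row_len = total_bytes / num_rows if num_rows else 0.0
    # Toy assumption: rows are packed back to back into fixed-size blocks.
    num_blocks = math.ceil(total_bytes / block_size)
    return TableStats(num_rows, num_blocks, avg_row_len)


def choose_plan(stats, selectivity):
    """Pick the cheaper plan under a hypothetical cost model.

    A full table scan reads every block; an index lookup is charged one
    block read per matching row, estimated as num_rows * selectivity.
    """
    full_scan_cost = stats.num_blocks
    index_cost = stats.num_rows * selectivity
    return "index lookup" if index_cost < full_scan_cost else "full table scan"


if __name__ == "__main__":
    rows = [(i, f"customer-{i:05d}", "IN") for i in range(10_000)]
    stats = gather_table_stats(rows)
    print(stats)
    print(choose_plan(stats, selectivity=0.001))  # few matching rows
    print(choose_plan(stats, selectivity=0.2))    # many matching rows

    # The table changes, but `stats` still describes the old contents
    # until the statistics are gathered again.
    rows.extend((i, f"customer-{i:05d}", "IN") for i in range(10_000, 20_000))
    print("stale row count:", stats.num_rows)
    print("fresh row count:", gather_table_stats(rows).num_rows)
```

Because `stats` is a snapshot, appending rows leaves it describing the old table until it is gathered again, which is why statistics must be updated regularly. In a real optimizer the selectivity would itself be estimated, typically from column statistics, rather than passed in by hand.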
[Figure: Artificial Intelligence; Applications; Classifiers and Controllers]
discovery. Data mining is transitioning from data-poor to data-rich settings by using methods such as data exploration,
statistical inference, and the understanding of variability and uncertainty[5].

Statistical Elements Present in Data Mining
- Contrived serendipity: creating the conditions for fortuitous discovery.
- Exploratory data analysis with large data sets, in which the data are as far as possible allowed to speak for themselves,
  independently of subject-area assumptions and of models which might explain their pattern. There is a particular focus on the
  search for unusual or interesting features.
- Specialised problems, such as fraud detection.
- The search for specific known patterns.
- Standard statistical analysis problems with large data sets.

Data Mining from a Statistical Perspective
- For data sets which are relatively large and homogeneous, it might be reasonable to use mainstream statistical techniques on
  the whole data set or on a very large subset of it.
- All analyses done by mainstream statistics have the same intended outcome: reducing a set of data to a small amount of readily
  assimilated information.
- The outcome may include graphs, summary statistics, equations that can be used for prediction, or a decision tree.
- If a large volume of data can be reduced to a much smaller summary form without loss of information, this can enormously aid
  the subsequent analysis task.
- It then becomes much easier to make graphical and other checks that give the analyst assurance that predictive models or other
  analysis outcomes are meaningful and valid.

Statistics vs. Data Mining
Feature | Statistics | Data Mining
Type of problem | Well structured | Unstructured / semi-structured
Inference role | Explicit inference plays a great role in any analysis | No explicit inference
Objective of the analysis and data collection | First objective formulation, then data collection | Data rarely collected for the objective of the analysis/modeling
Size of data set | Data set is small and hopefully homogeneous | Data set is large and heterogeneous
Paradigm/approach | Theory-based (deductive) | Synergy of theory-based and heuristic-based approaches (inductive)
Type of analysis | Confirmative | Explorative
Number of variables | Small | Large
Methods/techniques | Dependence methods: discriminant analysis, logistic regression; interdependence methods: correlation analysis, correspondence analysis, cluster analysis | Predictive data mining: classification, regression; discovery data mining: association analysis, sequence analysis, clustering

10. PROPERTIES OF STATISTICAL PACKAGES
Statistical packages offer a range of types of statistical analysis[3]. Their properties include:
- Database functions, such as editing and printing reports.
- Capabilities for graphic output, particularly graphs, although many also produce maps.
- Common packages are SAS, SPSS, R, etc.
- Availability over a wide range of operating systems. Some have been "ported" to (rewritten for) the IBM PC.
- Numerous other packages have been developed specifically for the PC DOS environment.
- S is a commonly available statistical package for UNIX.

11. CONCLUSION
In this paper, many areas of computer science have been described in which statistics plays a vital role in data and information
management. Statistical thinking fuels the cross-fertilization of ideas between scientific fields (biological, physical, and social
sciences), industry, and government. Statistical and algorithmic issues are both important in the context of data mining.
Statistics is an essential and valuable component of any data mining exercise. The future success of data mining will depend
critically on our ability to integrate techniques for modeling and inference from statistics into the mainstream of data mining
practice.

12. REFERENCES
[1] Lauro, C. (1996). Computational Statistics or Statistical Computing, is that the question? Computational Statistics and Data
Analysis, Vol. 23, pp. 191–193.
[2] Billard, L. and Gentle, J. E. (1993). The middle years of the Interface. Computing Science and Statistics, Vol. 25, pp. 19–26.
[3] Yates, F. (1966). Computers: the second revolution in statistics. Biometrics, Vol. 22.
[4] Cheng, B. and Titterington, D. M. (1994). Neural networks: a review from a statistical perspective. Statistical Science, Vol. 9,
pp. 2–54.
[5] Elder, J. F. and Pregibon, D. (1996). A statistical perspective on knowledge discovery in databases. Advances in Knowledge
Discovery and Data Mining, MIT Press, pp. 83–115.
[6] Gentle, J. E. (2004). Courses in statistical computing and computational statistics. The American Statistician, Vol. 58,
pp. 2–5.
[7] Grier, D. A. (1991). Statistics and the introduction of digital computers. Chance, Vol. 4(3), pp. 30–36.
[8] Friedman, J. H. and Fisher, N. I. (1999). Bump hunting in high-dimensional data. Statistics and Computing, Vol. 9, pp. 123–143.