RESEARCH METHODS
OF
COMPUTER SCIENCE
By
EHTIRAM RAZA KHAN
Sr. Lecturer
Dept. of Computer Science
Jamia Hamdard University
New Delhi

HUMA ANWAR
Assistant Director
IPM Ghaziabad
Uttar Pradesh
© by Laxmi Publications (P) Ltd.
All rights reserved including those of translation into other languages. In accordance with the Copyright (Amendment) Act, 2012, no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise. Any such act of scanning, uploading, and/or electronic sharing of any part of this book without the permission of the publisher constitutes unlawful piracy and theft of the copyright holder's intellectual property. If you would like to use material from the book (other than for review purposes), prior written permission must be obtained from the publishers.
Printed and bound in India
Typeset at Goswami Associates, Delhi
First Edition: 2016
ISBN 978-93-83828-24-1
PREFACE
The market is flooded with books on computer science, but there has been a vacuum of
books on research methods in computer science. Everybody wants to build a competitive
professional career, and study material plays a significant role in building one. Since it is
difficult to select study material for a professional career, a judicious choice of book is
significant for your prospective career. To fill this vacuum and to provide you with suitable
study material, we have written the book "Research Methods of Computer Science".
This book, written in simple and lucid language, deals with all aspects of research
methods in computer science, viz., the objectives and dimensions of research, research problems,
research methodology and the research proposal. No other book combines these theories with
adequate examples. The basic concepts of these theories are illustrated in detail in this
book.
The key feature of this book that sets it apart from other books is the provision of
detailed theory and self-evaluation exercises at the end of each chapter. These give students
an opportunity to test whether they have fully grasped the fundamental concepts.
The book fulfils the curriculum needs of undergraduate, postgraduate and research
students of computer science in engineering and MCA courses. The judicious choice of
topics also makes it a useful guide for computing professionals engaged in computer
research methodology.
Special thanks to all those who have helped in bringing out this book in its present form.
Finally, suggestions, comments and reports of any errors that have escaped our notice
are cordially welcomed for the improvement of this book.
Author
Chapter 1
OBJECTIVES AND DIMENSIONS OF RESEARCH
Learning Objectives:
After going through this chapter, you should appreciate the following:
• The Objectives of Research
• The Dimensions of Research
• Tools of Research
Computer Science is one of the most rapidly evolving fields of science. Wherever we go, we find computers
and their applications. Computer science can be defined as:
1. Computer Science is the study of phenomena related to computers.
2. Computer Science is the study of information structures.
3. Computer Science is the study and management of complexity.
4. Computer Science is the mechanization of abstraction.
5. Computer Science is a field of study that is concerned with theoretical and applied
disciplines in the development and use of computers for information storage and
processing, mathematics, logic, science, and many other areas.
Basically, we find the characteristic features of classical scientific method in CS as well. What is
specific to CS is that its objects of investigation are artifacts (computer-related phenomena)
that change concurrently with the development of the theories describing them and simultaneously
with the growing practical experience of their usage.
A computer from the 1940s is not the same as a computer from the 1970s, which in its
turn is different from a computer in 2002. Even the task of defining what a computer is in the
year 2002 is far from trivial.
Computer science can be divided into: Theoretical, Experimental and Simulation CS,
which are three methodologically distinct areas. One method, however, is common to all three
of them, and that is modeling.
MODELING
Modeling is a process that always occurs in science, in the sense that the phenomenon of interest
must be simplified in order to be studied. That is the first step of abstraction. A model has to
take into account the relevant features of a phenomenon. It obviously means that we are
supposed to know which features are relevant. That is possible because there is always some
theoretical ground that we start from when doing science.
A simplified model of a phenomenon means that we have a sort of description in some
symbolic language, which enables us to predict observable/measurable consequences of given
changes in a system. Theory, experiment and simulation are all about (more or less detailed)
models of phenomena.
Concerning Theoretical Computer Science, which adheres to the traditions of logic and
mathematics, we can conclude that it follows the very classical methodology of building theories
as logical systems with stringent definitions of objects (axioms) and operations (rules) for
deriving/proving theorems.
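As a small illustration of this style of work (our own hedged sketch, not an example from the book), the following fragment in the Lean proof assistant states a trivial theorem and derives it from a hypothesis using only the introduction and elimination rules for conjunction:

```lean
-- Illustrative sketch: a theorem derived from a hypothesis using the
-- basic rules for conjunction (∧). Nothing here is specific to the book.
theorem and_swap (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.right, h.left⟩   -- ∧-introduction applied to the two ∧-eliminations
```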
The key recurring ideas fundamental for computing are:
• Conceptual and formal models (including data models, algorithms and complexity)
• Different levels of abstraction
• Efficiency
Data models are used to formulate different mathematical concepts. In CS a data model
has two aspects:
• The values that data objects can assume, and
• The operations on the data.
Here are some typical data models:
• The tree data model (the abstraction that models hierarchical data structures)
• The list data model (can be viewed as a special case of the tree, but with some additional operations such as push and pop; character strings are an important kind of list)
• The set data model (the most fundamental data model of mathematics; every concept in mathematics, from trees to real numbers, can be expressed as a special kind of set)
• The relational data model (the organization of data into collections of two-dimensional tables)
• The graph data model (a generalization of the tree data model: directed, undirected, and labelled)
• Patterns, automata and regular expressions. A pattern is a set of objects with some recognizable property. An automaton is a graph-based way of specifying patterns. A regular expression is an algebra for describing the same kinds of patterns that can be described by automata.
Theory creates methodologies, logics and various semantic models to help design programs, to reason about programs, to prove their correctness, and to guide the design of new programming languages.
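The "values plus operations" view of a data model listed above can be made concrete with a minimal sketch. The example below is our own illustration in Python (the names Node, insert and in_order are inventions for this sketch): the values are the nodes a binary search tree can contain, and the operations are insertion and in-order traversal.

```python
# A minimal sketch of a data model: a binary search tree.
# The "values" aspect is the set of nodes a tree can contain;
# the "operations" aspect is insertion and in-order traversal.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert a key, returning the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def in_order(root):
    """Yield keys in sorted order -- a consequence of the model's structure."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)

if __name__ == "__main__":
    root = None
    for k in [5, 2, 8, 1, 3]:
        root = insert(root, k)
    print(list(in_order(root)))   # [1, 2, 3, 5, 8]
```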
However, CS theories do not compete with each other as to which better explains the
fundamental nature of information. Nor are new theories developed to reconcile theory with
experimental results that reveal unexplained anomalies or new, unexpected phenomena, as in
physics. In computer science there is no history of critical experiments that decide between
the validity of various theories, as there is in the physical sciences. The basic, underlying
mathematical model of digital computing is not seriously challenged by theory or experiments.
In computer science, results of theory are judged by the insights they reveal about the
mathematical nature of various models of computing and/or by their utility to the practice of
computing and their ease of application: do the models conceptualize and capture the aspects
computer scientists are interested in, do they yield insights into design problems, and do they
aid reasoning and communication about relevant problems?
The design and analysis of algorithms is a central topic in theoretical computer science.
Methods are developed for algorithm design, measures are defined for various computational
resources, tradeoffs between different resources are explored, and upper and lower-resource
bounds are proved for the solutions of various problems. In the design and analysis of
algorithms, measures of performance are well-defined, and results can be compared quite easily
in some of these measures (which may or may not fully reflect their performance on typical
problems). Experiments with algorithms are used to test implementations and compare their
“practical” performance on the subsets of problems considered important.
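The following is a minimal benchmarking sketch of this kind of experiment. The choice of algorithms, input sizes and timing scheme is ours, purely for illustration; it is not taken from the text.

```python
# Minimal sketch: comparing the "practical" performance of two sort
# implementations on random inputs. All sizes and repeat counts are
# illustrative choices.
import random
import time

def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def timed(sort_fn, data, repeats=3):
    """Return the best-of-`repeats` wall-clock time for sort_fn(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        sort_fn(data)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    random.seed(0)
    for n in (1_000, 2_000, 4_000):
        data = [random.random() for _ in range(n)]
        t_ins = timed(insertion_sort, data)
        t_builtin = timed(sorted, data)      # Timsort, the Python built-in
        print(f"n={n:5d}  insertion={t_ins:.4f}s  builtin={t_builtin:.4f}s")
```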
The subject of inquiry in the field of computer science is information rather than energy or
matter. However, this makes no difference to the applicability of the traditional scientific method.
To understand the nature of information processes, computer scientists must observe
phenomena, formulate explanations and theories, and test them.
Experiments are used both for theory testing and for exploration. Experiments test
theoretical predictions against reality. A scientific community gradually accepts a theory if all
known facts within its domain can be deduced from the theory, if it has withstood experimental
tests, and if it correctly predicts new phenomena. Repeatability ensures that results can be
checked independently and thus raises confidence in the results.
Nevertheless, there is always an element of uncertainty in experiments and tests as
well: To paraphrase Edsger Dijkstra, an experiment can only show the presence of bugs (flaws) in a
theory, not their absence. Scientists are keenly aware of this uncertainty and are therefore ready
to disqualify a theory if contradicting evidence shows up.
A good example of theory falsification in computer science is the famous Knight and
Leveson experiment, which analyzed the failure probabilities of multiversion programs.
Conventional theory predicted that the failure probability of a multiversion program was the
product of the failure probabilities of the individual versions. However, John Knight and Nancy
Leveson observed that real multiversion programs had significantly higher failure probabilities.
In fact, the experiment falsified the basic assumption of the conventional theory, namely that
faults in different program versions are statistically independent.
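The independence assumption can be made concrete with a small simulation. The sketch below is our own illustration, not the actual Knight and Leveson experiment, and all probabilities in it are invented: it compares the joint failure rate of two versions whose faults are independent against two versions that both tend to fail on the same "hard" inputs.

```python
# Sketch: why the "product of failure probabilities" prediction breaks down
# when faults in different versions are not statistically independent.
# All numbers here are illustrative, not data from Knight and Leveson.
import random

random.seed(42)
TRIALS = 1_000_000
P_FAIL = 0.01          # individual failure probability of each version
P_HARD = 0.05          # fraction of "hard" inputs both versions tend to miss

def independent_joint_failures():
    both = 0
    for _ in range(TRIALS):
        f1 = random.random() < P_FAIL
        f2 = random.random() < P_FAIL
        both += f1 and f2
    return both / TRIALS

def correlated_joint_failures():
    both = 0
    for _ in range(TRIALS):
        hard = random.random() < P_HARD
        # On hard inputs both versions are much more likely to fail; the
        # per-input probabilities are chosen so each version still fails
        # about 1% of the time overall.
        p = 0.15 if hard else 0.0026
        f1 = random.random() < p
        f2 = random.random() < p
        both += f1 and f2
    return both / TRIALS

if __name__ == "__main__":
    print("theory (independence):  ", P_FAIL * P_FAIL)   # 0.0001
    print("simulated, independent: ", independent_joint_failures())
    print("simulated, correlated:  ", correlated_joint_failures())
```

Under independence the simulated joint rate matches the product of the individual rates; with correlated faults it comes out roughly an order of magnitude higher, which is the qualitative effect the experiment revealed.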
Experiments are also used in areas to which theory and deductive analysis do not reach.
Experiments probe the influence of assumptions, eliminate alternative explanations of
phenomena, and unearth new phenomena in need of explanation. In this mode, experiments
help with induction: deriving theories from observation.
Artificial Neural Networks (ANNs) are a good example of the explorative mode of
experimentation. After ANNs had been discarded on theoretical grounds, experiments
demonstrated properties better than those theoretically predicted. Researchers are now
developing better theories of ANNs in order to account for these observed properties.
Experiments are made in many different fields of CS such as search, automatic theorem
proving, planning, NP-complete problems, natural language, vision, games, neural nets/
connectionism, and machine learning. Furthermore, analyzing performance behavior on
networked environments in the presence of resource contention from many users is a new
and complex field of experimental computer science. In this context it is important to mention
the Internet. Yet, there are plenty of computer science theories that haven’t been tested. For instance,
functional programming, object-oriented programming, and formal methods are all thought
to improve programmer productivity, program quality, or both. Yet, none of these obviously
important claims have ever been tested systematically, even though they are all 30 years old
and a lot of effort has gone into developing programming languages and formal techniques.
Some fields of computing, such as Human–Computer Interaction and parts of Software
Engineering, must also take humans (users, programmers) into consideration in their models
of the investigated phenomena. This results in a “soft” empirical approach more
characteristic of the Humanities and Social Sciences, with methodological tools such as interviews
and case studies.
COMPUTER SIMULATION
In recent years computation, which comprises computer-based modeling and simulation, has
become the third research methodology within CS, complementing theory and experiment.
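As a minimal, hedged illustration of computation as a research method (our own example; the book does not prescribe it), the following sketch uses Monte Carlo simulation, i.e., a simple computational model of random points in the unit square, to estimate π:

```python
# Illustrative sketch of simulation as a method: estimate pi by modelling
# random points in the unit square and counting how many fall inside the
# quarter circle of radius 1. The sample sizes are arbitrary choices.
import random

def estimate_pi(samples: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9} samples -> pi is approximately {estimate_pi(n):.5f}")
```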
Research in any discipline is a hard task, but when it comes to computers and IT it becomes
an even more daunting one. There are still very few researchers in computer science, and this is
the reason why a PhD in computer science is so important and crucial for a successful career.
Research lets you learn a set of work skills that you cannot get from classes. These include:
• Significant writing tasks
• Independent/unstructured work tasks
• Doing something real (experimental support to prove a concept)
Research helps you become a true expert in your field of computing. Research in computer
science not only helps academia but also helps industry.
The expanding scope of ‘computing science’ makes it difficult to sustain traditional scientific
and engineering models of research. In particular, recent work in formal methods has
abandoned the traditional empirical methods. Similarly, research in requirements engineering
and human computer interaction has challenged the proponents of formal methods. These
tensions stem from the fact that ‘Computing Science’ is a misnomer. Topics that are currently
considered part of the discipline of computing science are technology rather than theory driven.
This creates problems if academic departments are to impose scientific criteria during the
assessment of PhDs. It is, therefore, important that people ask themselves ‘What is Research
in Computing Science?’ before starting on a higher degree.
Good research practice suggests that we should begin by defining our terms. The Oxford
Concise dictionary defines research as:
• Research. 1. (a) the systematic investigation into and study of materials, sources,
etc., in order to establish facts and reach new conclusions. (b) an endeavor to discover
new or collate old facts etc., by the scientific study of a subject or by a course of
critical investigation.
This definition is useful because it immediately focuses upon the systematic nature of research.
In other words, the very meaning of the term implies a research method. These methods or
systems essentially provide a model or structure for logical argument.
The highest level of logical argument can be seen in the structure of debate within a particular
field. Each contribution to that debate falls into one of three categories:
• Thesis
This presents the original statement of an idea. However, very few research
contributions can claim total originality. Most borrow ideas from previous work,
even if that research has been conducted in another discipline.
• Antithesis
This presents an argument to challenge a previous thesis. Typically, this argument
draws upon new sources of evidence, and such challenges are typical of progress within a field.
• Synthesis
This seeks to form a new argument from existing sources. Typically, a synthesis might
resolve the apparent contradiction between a thesis and an antithesis.
A good example of this form of dialectic is provided by the debate over prototyping. For
example, some authors have argued that prototypes provide a useful means of generating
and evaluating new designs early in the development process (thesis), (Fuchs, 1992). Others
have presented evidence against this hypothesis by suggesting that clients often choose features
of the prototyping environment without considering possible alternatives (antithesis) (Hayes
and Jones, 1989). A third group of researchers have, therefore, developed techniques that are
intended to reduce bias towards features of prototyping environments (synthesis) (Gravell
and Henderson, 1996). Research in a field progresses through the application of methods to
prove, refute and reassess arguments in this manner.
MODELS OF ARGUMENT
A more detailed level of logical argument can be seen in the structures of discourse that are
used to support individual works of thesis, antithesis or synthesis.
PROOF BY DEMONSTRATION
Perhaps the most intuitively persuasive model for research is to build something and then let
that artifact stand as an example for a more general class of solutions. There are numerous
examples of this approach being taken within the field of computer science. It is possible to
argue that the problems of implementing multi-user operating systems were solved more
through the implementation and growth of UNIX than through a more measured process of
scientific enquiry.
However, there are many reasons why this approach is an unsatisfactory model for
research. The main objection is that it carries high risks. For example, the artifact may fail long
before we learn anything about the conclusion that we are seeking to support. Indeed, it is
often the case that this approach ignores the formation of any clear hypothesis or conclusion
until after the artefact is built. This may lead the artifact to become more important to the
researcher than the ideas that it is intended to establish.
The lack of a clear hypothesis need not be the barrier that it might seem. The proof by
demonstration approach has much in common with current engineering practice. Iterative
refinement can be used to move an implementation gradually towards some desired solution.
The evidence elicited during previous failed attempts can be used to better define the goal of
the research as the work progresses. The key problem here is that the iterative development of
an artefact, in turn, requires a method or structure. Engineers need to carefully plan ways in
which the faults found in one iteration can be fed back into subsequent development. This is,
typically, done through testing techniques that are based upon other models of scientific
argument. This close relationship between engineering and scientific method should not be
surprising:
• engineering n. an application of science to the design, building and use of machines,
construction etc. (The Oxford Concise Dictionary).
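Returning to the feedback loop described above, the sketch below shows one minimal way (our own illustration; the function under development and the fault it once had are invented) in which a fault found in one iteration can be captured as a regression test that guards subsequent iterations:

```python
# Illustrative sketch: faults found in one iteration are fed back into the
# next as regression tests.

def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

def test_regular_sentence():
    assert word_count("research methods of computer science") == 5

def test_empty_string():
    # Fault found in an earlier iteration: the old implementation returned 1
    # for the empty string. This test keeps the fix from regressing.
    assert word_count("") == 0

if __name__ == "__main__":
    test_regular_sentence()
    test_empty_string()
    print("all regression tests passed")
```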
EMPIRICISM
The Western empirical tradition can be seen as an attempt to avoid the undirected interpretation
of artifacts. It has produced the most dominant research model since the seventeenth century.
It can be summarized by the following stages:
• Hypothesis Generation
This explicitly identifies the ideas that are to be tested by the research.
• Method Identification
This explicitly identifies the techniques that will be used in order to establish the
hypothesis. This is critical because it must be possible for one’s peers to review and
criticize the appropriateness of the methods that you have chosen. The ability to
repeat an experiment is a key feature of strong empirical research.
• Result Compilation
This presents and compiles the results that have been gathered from following the
method. An important concept here is that of statistical significance; whether or not
the observed results could be due to chance rather than an observable effect.
• Conclusion
Finally, the conclusions are stated either as supporting the hypothesis or rejecting it.
In the case that results do not support a hypothesis, it is important always to
remember that this may be due to a weakness in the method. Conversely, successful
results might be based upon incorrect assumptions. Hence, it is vital that all details
of a method are made available to peer review.
This approach has been used to support many different aspects of research within
Computing Science. For example, Boehm, Gray and Seewaldt (1984) used it to compare the
effectiveness of specification and prototyping techniques for software engineering. Others
have used it to compare the efficiency of searching and sorting algorithms. Researchers in
Information Retrieval have even developed standard methods which include well known test
sets to establish performance gains from new search engines.
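To show how the four stages above might look in practice, here is a hedged sketch of a comparison between two implementations. The timing data are simulated, the 5% threshold is a conventional choice, and the use of SciPy's Welch t-test is our own assumption rather than anything prescribed in the text.

```python
# Illustrative sketch of the empirical stages applied to a computing question.
# Hypothesis: implementation B is faster than implementation A.
# The "measurements" are simulated; requires SciPy for the t-test.
import random
from scipy import stats

random.seed(1)

# Method: collect repeated running-time measurements for each implementation.
times_a = [1.00 + random.gauss(0, 0.05) for _ in range(30)]
times_b = [0.95 + random.gauss(0, 0.05) for _ in range(30)]

# Result compilation: could the observed difference be due to chance?
result = stats.ttest_ind(times_a, times_b, equal_var=False)  # Welch's t-test
mean_a = sum(times_a) / len(times_a)
mean_b = sum(times_b) / len(times_b)
print(f"mean A = {mean_a:.3f}s, mean B = {mean_b:.3f}s, p = {result.pvalue:.4f}")

# Conclusion: reject "no difference" only if p is below the chosen threshold.
if result.pvalue < 0.05:
    print("difference unlikely to be due to chance (at the 5% level)")
else:
    print("no statistically significant difference detected")
```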
There are many problems with the standard approach to scientific empiricism when
applied to computing science. The principal objection is that many aspects of computing defy
the use of probabilistic measures when analyzing the results of empirical tests. For example,
many statistical measures rely upon independence between each test of a hypothesis. Such
techniques clearly cannot be used when attempting to measure the performance of any system
that attempts to optimise its performance over time; this rules out load balancing algorithms
etc. Secondly, it can be difficult to impose standard experimental conditions upon the products
of computer science. For example, if a program behaves in one way under one set of operating
conditions then there is no guarantee that it will behave in the same way under another set of
conditions. These conditions might go down to the level of alpha particles hitting memory
chips. Thirdly, it can be difficult to generalise the results of tightly controlled empirical
experiments. For example, just because a user finds a system easy to use in a lab-based
evaluation, there is no guarantee that another user will be able to use that product amidst the
distractions of their everyday working environment. Finally, it is difficult to determine when
a sufficient number of trials have been conducted to support many hypotheses. For example,
any attempt to prove that a program always satisfies some property will almost certainly be
doomed to failure using standard experimental techniques. The number of potential execution
paths through even simple code makes it impossible to test properties against every possible
execution path.
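A rough back-of-the-envelope calculation (our own, with an optimistic assumed testing rate) illustrates the scale of the problem:

```python
# Illustrative sketch: why exhaustive path testing is infeasible. A program
# with one independent two-way branch per loop iteration has 2**n paths.
# Both the branch count and the testing rate are assumptions for illustration.
n = 60                       # a modest number of independent branches
paths = 2 ** n
tests_per_second = 10 ** 9   # an optimistic testing rate
years = paths / tests_per_second / (3600 * 24 * 365)
print(f"{paths:.2e} execution paths -> roughly {years:.0f} years to test them all")
```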
MATHEMATICAL PROOF
The dissatisfaction with empirical testing techniques has led many in the computing science
research community to investigate other means of structuring arguments in support of
particular conclusions. In the United Kingdom, much of this work has focused upon
argumentation techniques that were originally developed to model human discourse and
thought within the field of philosophy. For example, Burrows, Abadi and Needham (1990)
adopted this approach to reason about the correctness of network authentication protocols.