
Data-Mining (Set 1)

The document provides a series of multiple choice questions about data mining and machine learning concepts. It covers topics like adaptive system management, Bayesian classifiers, algorithms, bias, background knowledge, case-based learning, classification, binary attributes, classification accuracy, clusters, black boxes, definitions of concepts, data selection, DNA, hybrid approaches, discovery, Euclidean distance, hidden knowledge, enrichment, heterogeneous databases, enumeration, heuristics, and Kohonen self-organizing maps.

Data Mining

1 of 4 sets

1. Adaptive system management is


A. the use of machine-learning techniques so that programs can learn from past experience and adapt themselves to new situations.
B. a computational procedure that takes some value as input and produces some value as output.
C. the science of making machines perform tasks that would require intelligence when performed by humans.
D. none of these
Answer:A

2. Bayesian classifiers are
A. a class of learning algorithms that try to find an optimum classification of a set of examples using probability theory.
B. any mechanism employed by a learning system to constrain the search space of a hypothesis.
C. an approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. none of these
Answer:A

3. An algorithm is
A. the use of machine-learning techniques so that programs can learn from past experience and adapt themselves to new situations.
B. a computational procedure that takes some value as input and produces some value as output.
C. the science of making machines perform tasks that would require intelligence when performed by humans.
D. none of these
Answer:B

4. Bias is
A. a class of learning algorithms that try to find an optimum classification of a set of examples using probability theory.
B. any mechanism employed by a learning system to constrain the search space of a hypothesis.
C. an approach to the design of learning algorithms that is inspired by the fact that when people
encounter new situations, they often explain them by reference to familiar experiences, adapting
the explanations to fit the new situation.
D. none of these
Answer:B

5. Background knowledge refers to
A. additional knowledge used by a learning algorithm to facilitate the learning process.
B. a neural network that makes use of a hidden layer.
C. a form of automatic learning.
D. none of these
Answer:A

6. Case-based learning is
A. a class of learning algorithms that try to find an optimum classification of a set of examples using probability theory.
B. any mechanism employed by a learning system to constrain the search space of a hypothesis.
C. an approach to the design of learning algorithms that is inspired by the fact that when people
encounter new situations, they often explain them by reference to familiar experiences, adapting
the explanations to fit the new situation.
D. none of these
Answer:C

7. Classification is
A. a subdivision of a set of examples into a number of classes.
B. a measure of the accuracy of the classification of a concept that is given by a certain theory.
C. the task of assigning a classification to a set of examples
D. none of these
Answer:A

8. Binary attributes are

View all MCQ's at McqMate.com


A. attributes that take only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. the natural environment of a certain species.
C. systems that can be used without knowledge of internal operations.
D. none of these
Answer:A

9. Classification accuracy is
A. a subdivision of a set of examples into a number of classes.
B. a measure of the accuracy of the classification of a concept that is given by a certain theory.
C. the task of assigning a classification to a set of examples.
D. none of these
Answer:B
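
As a rough illustration of question 9, classification accuracy is often computed as the fraction of examples whose predicted class matches the known class. The sketch below assumes that reading; the function name and the sample labels are hypothetical and not part of the question set.

```python
def classification_accuracy(predicted, actual):
    """Return the fraction of examples whose predicted class equals the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one example is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels: three of the four predictions match.
accuracy = classification_accuracy(
    ["spam", "ham", "spam", "ham"],
    ["spam", "ham", "ham", "ham"],
)
```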

10. A biotope is
A. an attribute that takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. the natural environment of a certain species.
C. a system that can be used without knowledge of internal operations.
D. none of these
Answer:B

11. Cluster is
A. a group of similar objects that differ significantly from other objects.
B. operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm.
C. a symbolic representation of facts or ideas from which information can potentially be extracted.
D. none of these
Answer:A

12. Black boxes are
A. attributes that take only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. the natural environment of a certain species.
C. systems that can be used without knowledge of internal operations.
D. none of these
Answer:C

13. A definition of a concept is ________ if it recognizes all the instances of that concept.
A. complete
B. consistent
C. constant
D. none of these
Answer:A

14. A definition of a concept is ________ if it does not classify any negative examples as falling within the concept.
A. complete
B. consistent
C. constant
D. none of these
Answer:B
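
Questions 13 and 14 can be illustrated with a small sketch. It assumes the usual textbook reading: a definition is complete when it accepts every positive example, and consistent when it accepts no negative example. The helper names and the sample concept "even number" are illustrative assumptions.

```python
def is_complete(definition, positive_examples):
    """True if the definition recognizes every instance of the concept."""
    return all(definition(x) for x in positive_examples)


def is_consistent(definition, negative_examples):
    """True if the definition accepts no example that lies outside the concept."""
    return not any(definition(x) for x in negative_examples)


# Hypothetical concept: "even number".
positives = [2, 4, 6]
negatives = [1, 3, 5]


def divisible_by_four(n):
    # Candidate definition that is too narrow: it misses 2 and 6.
    return n % 4 == 0


def positive_number(n):
    # Candidate definition that is too broad: it also accepts odd numbers.
    return n > 0


too_narrow = (is_complete(divisible_by_four, positives),
              is_consistent(divisible_by_four, negatives))
too_broad = (is_complete(positive_number, positives),
             is_consistent(positive_number, negatives))
```

Under these assumptions the narrow definition is consistent but not complete, and the broad one is complete but not consistent.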

15. Data selection is
A. the actual discovery phase of a knowledge discovery process.
B. the stage of selecting the right data for a KDD process.
C. a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management.
D. none of these
Answer:B

16. DNA (deoxyribonucleic acid) is
A. information that is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
B. the process of extracting implicit, previously unknown, and potentially useful information from data.
C. an extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
D. none of these
Answer:C

17. Hybrid is
A. combining different types of method or information
B. approach to the design of learning algorithms that is structured along the lines of the theory of
evolution.
C. decision support systems that contain an information base filled with the knowledge of an
expert formulated in terms of if-then rules.
D. none of these
Answer:A

18. Discovery is
A. information that is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
B. the process of extracting implicit, previously unknown, and potentially useful information from data.
C. an extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
D. none of these
Answer:B

19. Euclidean distance measure is
A. a stage of the KDD process in which new data is added to the existing selection.
B. the process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them.
C. the distance between two points as calculated using the Pythagorean theorem.
D. none of these
Answer:C
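
The definition in question 19 can be written out directly: the Euclidean distance is the square root of the sum of squared coordinate differences. The sketch below is a minimal illustration; the function name and the sample points are assumptions chosen for the example.

```python
import math


def euclidean_distance(p, q):
    """Distance between two points of equal dimension, via the Pythagorean theorem."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A 3-4-5 right triangle: the two points are 3 apart on x and 4 apart on y.
distance = euclidean_distance((0, 0), (3, 4))
```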

20. Hidden knowledge refers to
A. a set of databases from different vendors, possibly using different database paradigms.
B. an approach to a problem that is not guaranteed to work but performs well in most cases.
C. information that is hidden in a database and that cannot be recovered by a simple SQL query.
D. none of these
Answer:C

21. Enrichment is
A. a stage of the KDD process in which new data is added to the existing selection.
B. the process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them.
C. the distance between two points as calculated using the Pythagorean theorem.
D. none of these
Answer:A

22. Heterogeneous databases refers to
A. a set of databases from different vendors, possibly using different database paradigms.
B. an approach to a problem that is not guaranteed to work but performs well in most cases.
C. information that is hidden in a database and that cannot be recovered by a simple SQL query.
D. none of these
Answer:A

23. Enumeration refers to
A. a stage of the KDD process in which new data is added to the existing selection.
B. the process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them.
C. the distance between two points as calculated using the Pythagorean theorem.
D. none of these
Answer:B
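
Enumeration as described in question 23 can be sketched as a brute-force search: generate every candidate in a fixed order and test each one. The subset-sum problem, the function name, and the sample numbers below are illustrative assumptions, not part of the question set.

```python
from itertools import combinations


def find_subset_by_enumeration(numbers, target):
    """Enumerate subsets by increasing size and return the first whose sum equals target."""
    for size in range(len(numbers) + 1):
        # combinations() yields candidates in a fixed, pre-defined order.
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return subset
    return None  # every candidate was tested and none matched


solution = find_subset_by_enumeration([8, 3, 5, 2], 10)
```

The approach always finds a solution if one exists, but the number of candidates grows exponentially with the input size, which is why heuristics (question 24) are often preferred.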

24. Heuristic is
A. a set of databases from different vendors, possibly using different database paradigms
B. an approach to a problem that is not guaranteed to work but performs well in most cases
C. information that is hidden in a database and that cannot be recovered by a simple SQL query.
D. none of these
Answer:B

25. Hybrid learning is
A. machine learning involving different techniques.
B. learning in which the algorithm analyzes the examples on a systematic basis and makes incremental adjustments to the theory that is learned.
C. learning by generalizing from examples.
D. none of these
Answer:A

26. Kohonen self-organizing map refers to
A. the process of finding the right formal representation of a certain body of knowledge in order to represent it in a knowledge-based system.
B. a technique that automatically maps an external signal space into a system's internal representational space. Such maps are useful in the performance of classification tasks.
C. a process where an individual learns how to carry out a certain task when making a transition from a situation in which the task cannot be carried out to a situation in which the same task under the same circumstances can be carried out.
D. none of these
Answer:B

27. Incremental learning refers to
A. machine learning involving different techniques.
B. learning in which the algorithm analyzes the examples on a systematic basis and makes incremental adjustments to the theory that is learned.
C. learning by generalizing from examples
D. none of these
Answer:B

28. Knowledge engineering is
A. the process of finding the right formal representation of a certain body of knowledge in order to represent it in a knowledge-based system.
B. a technique that automatically maps an external signal space into a system's internal representational space. Such maps are useful in the performance of classification tasks.
C. a process where an individual learns how to carry out a certain task when making a transition from a situation in which the task cannot be carried out to a situation in which the same task under the same circumstances can be carried out.
D. none of these
Answer:A

29. Information content is
A. the amount of information within data as opposed to the amount of redundancy or noise.
B. one of the defining aspects of a data warehouse.
C. a restriction that requires data in one column of a database table to be a subset of another column.
D. none of these
Answer:A

30. Inductive learning is
A. machine learning involving different techniques.
B. learning in which the algorithm analyzes the examples on a systematic basis and makes incremental adjustments to the theory that is learned.
C. learning by generalizing from examples
D. none of these
Answer:C

31. An inclusion dependency is
A. the amount of information within data as opposed to the amount of redundancy or noise.
B. one of the defining aspects of a data warehouse.
C. a restriction that requires data in one column of a database table to be a subset of another column.
D. none of these
Answer:C
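
The restriction in question 31 can be checked mechanically: every value in the dependent column must also occur in the referenced column. The sketch below represents tables as lists of dictionaries; the table names, column names, and rows are hypothetical.

```python
def satisfies_inclusion_dependency(child_rows, child_column, parent_rows, parent_column):
    """True if every value in child_column also appears in parent_column."""
    parent_values = {row[parent_column] for row in parent_rows}
    return all(row[child_column] in parent_values for row in child_rows)


# Hypothetical tables: each order should reference an existing customer.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
]

holds = satisfies_inclusion_dependency(orders, "customer_id", customers, "customer_id")

# An order that points at a missing customer violates the dependency.
broken_orders = orders + [{"order_id": 12, "customer_id": 3}]
violated = not satisfies_inclusion_dependency(
    broken_orders, "customer_id", customers, "customer_id"
)
```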

32. KDD (Knowledge Discovery in Databases) refers to
A. the non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
B. a set of columns in a database table that can be used to identify each record within this table uniquely.
C. collection of interesting and useful patterns in a database
D. none of these
Answer:A

33. Learning is
A. the process of finding the right formal representation of a certain body of knowledge in order to represent it in a knowledge-based system.
B. a technique that automatically maps an external signal space into a system's internal representational space. Such maps are useful in the performance of classification tasks.
C. a process where an individual learns how to carry out a certain task when making a transition from a situation in which the task cannot be carried out to a situation in which the same task under the same circumstances can be carried out.
D. none of these
Answer:C

34. Naive prediction is
A. a class of learning algorithms that try to derive a Prolog program from examples.
B. a table with n independent attributes can be seen as an n-dimensional space.
C. a prediction made using an extremely simple method, such as always predicting the same output.
D. none of these
Answer:C
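
One common form of the naive prediction in question 34 is a baseline that always outputs the most frequent class seen in training. The sketch below assumes that form; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def naive_predictor(training_labels):
    """Build a predictor that ignores its input and always returns the majority label."""
    most_common_label, _count = Counter(training_labels).most_common(1)[0]
    return lambda example: most_common_label


# Hypothetical training labels: "no" is the majority class.
predict = naive_predictor(["no", "no", "yes", "no"])
prediction = predict({"age": 42})
```

A baseline like this is useful mainly as a yardstick: a learning algorithm that cannot beat it has not learned anything useful from the attributes.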

35. Learning algorithm refers to
A. an algorithm that can learn
B. a sub-discipline of computer science that deals with the design and implementation of learning
algorithms.
C. a machine-learning approach that abstracts from the actual strategy of an individual algorithm
and can therefore be applied to any other form of machine learning.
D. none of these
Answer:A

36. Knowledge refers to
A. the non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
B. a set of columns in a database table that can be used to identify each record within this table uniquely.
C. collection of interesting and useful patterns in a database
D. none of these
Answer:C

37. Node is
A. a component of a network.
B. in the context of KDD and data mining, random errors in a database table.
C. one of the defining aspects of a data warehouse.
D. none of these
Answer:A

38. Machine learning is


A. an algorithm that can learn
B. a sub-discipline of computer science that deals with the design and implementation of learning
algorithms
C. an approach that abstracts from the actual strategy of an individual algorithm and can
therefore be applied to any other form of machine learning.
D. none of these
Answer:B

39. Projection pursuit is
A. the result of the application of a theory or a rule in a specific case.
B. one of several possible keys within a database table that is chosen by the designer as the primary means of accessing the data in the table.
C. a discipline in statistics that studies ways to find the most interesting projections of multi-dimensional spaces.
D. none of these
Answer:C

40. Inductive logic programming is


A. a class of learning algorithms that try to derive a Prolog program from examples
B. a table with n independent attributes can be seen as an n-dimensional space
C. a prediction made using an extremely simple method, such as always predicting the same
output
D. none of these
Answer:A

41. Statistical significance is


A. the science of collecting, organizing, and applying numerical facts
B. measure of the probability that a certain hypothesis is incorrect given certain observations.
C. one of the defining aspects of a data warehouse, which is specially built around all the existing
applications of the operational data
D. none of these
Answer:B

42. Multi-dimensional knowledge is


A. a class of learning algorithms that try to derive a Prolog program from examples
B. a table with n independent attributes can be seen as an n-dimensional space
C. a prediction made using an extremely simple method, such as always predicting the same
output.
D. none of these
Answer:B

43. Prediction is
A. the result of the application of a theory or a rule in a specific case
B. one of several possible keys within a database table that is chosen by the designer as the primary means of accessing the data in the table.
C. a discipline in statistics that studies ways to find the most interesting projections of multi-dimensional spaces.
D. none of these
Answer:A

44. Query tools are


A. a reference to the speed of an algorithm, which is quadratically dependent on the size of the
data
B. attributes of a database table that can take only numerical values.
C. tools designed to query a database.
D. none of these
Answer:C

45. Operational database is


A. a measure of the desired maximal complexity of data mining algorithms
B. a database containing volatile data used for the daily operation of an organization
C. relational database management system
D. none of these
Answer:B

46. ...................... is an essential process where intelligent methods are applied to extract data patterns.
A. data warehousing
B. data mining
C. text mining
D. data selection
Answer:B

47. Which of the following is not a data mining functionality?


A. characterization and discrimination
B. classification and regression
C. selection and interpretation
D. clustering and analysis
Answer:C

48. ............................. is a summarization of the general characteristics or features of a target class of data.
A. data characterization
B. data classification
C. data discrimination
D. data selection
Answer:A

49. ............................. is a comparison of the general features of the target class data
objects against the general features of objects from one or multiple contrasting
classes.
A. data characterization
B. data classification
C. data discrimination
D. data selection
Answer:C

50. Strategic value of data mining is ......................


A. cost-sensitive
B. work-sensitive
C. time-sensitive
D. technical-sensitive
Answer:C

51. ............................. is the process of finding a model that describes and
distinguishes data classes or concepts.
A. data characterization
B. data classification
C. data discrimination
D. data selection
Answer:B

52. The full form of KDD is ..................


A. knowledge database
B. knowledge discovery in databases
C. knowledge data house
D. knowledge data definition
Answer:B

53. The output of KDD is .............


A. data
B. information
C. query
D. useful information
Answer:D

54. The full form of OLAP is


A. online analytical processing
B. online advanced processing
C. online advanced preparation
D. online analytical performance
Answer:A

55. ......................... is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.
A. data mining
B. data warehousing
C. document mining
D. text mining
Answer:B

56. The data is stored, retrieved and updated in ....................


A. olap
B. oltp
C. smtp
D. ftp
Answer:B

57. An .................. system is market-oriented and is used for data analysis by


knowledge workers, including managers, executives, and analysts.
A. olap
B. oltp
C. both of the above
D. none of the above
Answer:A

58. ........................ is a good alternative to the star schema.


A. star schema
B. snowflake schema
C. fact constellation
D. star-snowflake schema
Answer:C

59. The ............................ exposes the information being captured, stored, and
managed by operational systems.
A. top-down view
B. data warehouse view
C. data source view
D. business query view
Answer:C

60. The type of relationship in star schema is ...............


A. many to many
B. one to one
C. one to many
D. many to one
Answer:C

61. The .................. allows the selection of the relevant information necessary for
the data warehouse.
A. top-down view
B. data warehouse view
C. data source view
D. business query view
Answer:A

62. Which of the following is not a component of a data warehouse?


A. metadata
B. current detail data
C. lightly summarized data
D. component key
Answer:D

63. Which of the following is not a kind of data warehouse application?


A. information processing
B. analytical processing
C. data mining
D. transaction processing
Answer:D

64. Data warehouse architecture is based on .......................


A. dbms
B. rdbms
C. sybase
D. sql server
Answer:B

65. .......................... supports basic OLAP operations, including slice and dice, drill-
down, roll-up and pivoting.
A. information processing
B. analytical processing
C. data mining
D. transaction processing
Answer:B
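
Two of the OLAP operations named in question 65 can be sketched on a tiny fact table: roll-up aggregates a measure along one dimension, and slice fixes one dimension to a single value. The fact rows, dimension layout, and function names below are hypothetical; a real OLAP engine would work on a data cube rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales).
facts = [
    (2023, "north", "widget", 100),
    (2023, "south", "widget", 150),
    (2023, "north", "gadget", 80),
    (2024, "north", "widget", 120),
]
SALES = 3  # index of the measure column


def roll_up(rows, dimension_index):
    """Aggregate sales over all rows that share a value in the chosen dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension_index]] += row[SALES]
    return dict(totals)


def slice_rows(rows, dimension_index, value):
    """Keep only the rows where the chosen dimension equals the given value."""
    return [row for row in rows if row[dimension_index] == value]


sales_by_year = roll_up(facts, 0)
north_by_product = roll_up(slice_rows(facts, 1, "north"), 2)
```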

66. The core of the multidimensional model is the ....................... , which consists of
a large set of facts and a number of dimensions.
A. multidimensional cube
B. dimensions cube
C. data cube
D. data model
Answer:C

67. The data from the operational environment enter ........................ of data
warehouse.
A. current detail data
B. older detail data
C. lightly summarized data
D. highly summarized data
Answer:A

68. A data warehouse is ......................


A. updated by end users
B. a store that contains numerous naming conventions and formats
C. organized around important subject areas
D. a store that contains only current data
Answer:C

69. Business intelligence and data warehousing are used for ..............


A. forecasting
B. data mining
C. analysis of large volumes of product sales data
D. all of the above
Answer:D

70. A data warehouse contains ................ data that is never found in the operational environment.
A. normalized
B. informational
C. summary
D. denormalized
Answer:C

71. ................... are responsible for running queries and reports against data
warehouse tables.
A. hardware
B. software
C. end users
D. middle ware
Answer:C

72. The biggest drawback of the level indicator in the classic star schema is that it limits ............
A. flexibility
B. quantify
C. qualify
D. ability
Answer:A

73. ............................. are designed to overcome any limitations placed on the warehouse by the nature of the relational data model.
A. operational database
B. relational database
C. multidimensional database
D. data repository
Answer:C

74. KDD describes the _________.


A. whole process of extraction of knowledge from data
B. extraction of data
C. extraction of information
D. extraction of rules
Answer:A

75. SQL helps to find _______.
A. the interesting data
B. hidden information
C. intermediate data
D. data under constraints that are already known
Answer:D
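
Question 75 makes the point that SQL retrieves data under constraints the user already knows how to state. The sketch below shows such a query using Python's built-in sqlite3 module; the table, columns, rows, and the threshold of 100 are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ann", 120), ("bob", 40), ("cy", 300)],
)

# The constraint (amount > 100) is known in advance and stated explicitly.
rows = conn.execute(
    "SELECT customer FROM sales WHERE amount > ? ORDER BY customer",
    (100,),
).fetchall()
conn.close()
```

The query returns exactly what the stated condition describes; it does not reveal patterns nobody thought to ask about, which is where data mining comes in.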

76. Translation of a problem to a learning technique is called _______.
A. reengineering.
B. translational engineering.
C. representational engineering.
D. learning algorithm.
Answer:C

77. Which one of the following is not a part of the empirical cycle in scientific research?
A. Observation
B. Theory.
C. Self learning.
D. Prediction.
Answer:C

78. ________ and __________ are the important qualities of a good learning algorithm.
A. Consistent, Complete.
B. Information content, Complex.
C. Complete, Complex.
D. Transparent, Complex.
Answer:A

79. Redundancy refers to the elements of a message that can be derived from other parts of _________.
A. different message.
B. irrelevant message.
C. same message.
D. complete message.
Answer:C

80. Metadata describes __________.


A. contents of database.
B. structure of contents of database.
C. structure of database.
D. database itself.
Answer:B

81. The partition of overall data warehouse is _______.


A. database.
B. data cube.
C. data mart.
D. operational data.
Answer:C

82. __________ is used to load the information from operational database.


A. Replication technique.
B. Reengineering technique.
C. Engineering technique.
D. Transformation engineering.
Answer:A

83. ___________ multiprocessing machines share the same hard disk and internal memory.
A. Massively parallel.
B. Symmetric.
C. Parallel.
D. Asymmetric.
Answer:B

84. A trivial result that is obtained by an extremely simple method is called _______.
A. naive prediction.
B. accurate prediction.
C. correct prediction.
D. wrong prediction.
Answer:A

85. The information on two attributes is displayed in ____________ in a scatter diagram.
A. visualization space.
B. scatter space.
C. cartesian space.
D. interactive space.
Answer:C

86. OLAP stands for ________.


A. Online Analytical Processing.
B. Online Linear Analytical Processing.
C. Online Animated Process.
D. Online Analytical Problem.
Answer:A

87. K-nearest neighbor is one of the _______.


A. learning technique.
B. OLAP tool.
C. purest search technique.
D. data warehousing tool.
Answer:C

88. The intermediate unit in a perceptron is ________.
A. photoreceptors.
B. associators.
C. responders.
D. receptors.
Answer:B

89. OLAP is used to explore the ___________ knowledge.


A. shallow.
B. deep.
C. multidimensional.
D. hidden.
Answer:C

90. A natural way to visualize the process of training a self-organizing map is called __________.
A. Kohonen movie.
B. Kohonen map.
C. frame.
D. scatter diagram.
Answer:A

91. Hidden knowledge can be found by using ________.


A. searching algorithm.
B. pattern recognition algorithm.
C. searching algorithm.
D. clues.
Answer:B

92. Deep knowledge can be found only by using ________.


A. clues.
B. OLAP.
C. SQL.
D. algorithm
Answer:A

93. The stage after data selection in the KDD process is ______.
A. enrichment.
B. coding.
C. cleaning.
D. reporting.
Answer:C

94. Enrichment means ____.


A. adding external data.
B. deleting data.
C. cleaning data.
D. selecting the data.
Answer:A

95. The decision support system is used only for _______.


A. cleaning.
B. coding.
C. selecting.
D. queries.
Answer:D

96. In the _________ approach, the data warehouse is built first and all the information needed is selected.
A. top-down.
B. client/server.
C. bottom-up.
D. DSS.
Answer:A

97. The DB vendor who is able to operate massively parallel computers is ________.
A. TCS.
B. IBM.
C. CTS.
D. Wipro.
Answer:B

98. Which of the following is closely related to statistical significance and transparency?
A. Classification Accuracy.
B. Transparency.
C. Statistical significance.
D. Search Complexity.
Answer:B

99. ________ is a creative activity that has to be performed repeatedly in order to get the best results.
A. Cleaning
B. Reporting
C. Coding.
D. Selection.
Answer:C

100. _________ is an example of case-based learning.
A. Decision trees.
B. Neural networks.
C. Genetic algorithm.
D. K-nearest neighbor.
Answer:D
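
K-nearest neighbor fits the case-based description because it keeps the stored examples and classifies a new case by looking at the most similar old ones. The sketch below is a minimal version assuming Euclidean distance and a simple majority vote; the function name, the sample points, and the labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, examples, k=3):
    """Label a query point by majority vote among its k nearest stored examples."""
    # Sort the stored cases by Euclidean distance to the query and keep the k closest.
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _point, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored cases: (point, label).
examples = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
]

label = knn_classify((1.1, 1.0), examples, k=3)
```

There is no training phase here: all the work happens at query time, which is also why question 87 describes the method as a search technique. Ties in the vote are resolved by whichever label was counted first, which is an arbitrary choice in this sketch.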
