Data Mining
1. Which is the term used to define the task of inferring a model from labelled training
data? Supervised Learning.
2. Under which category of learning do self-organizing maps fall? Unsupervised learning
3. Is discrimination between spam and ham emails a classification task? True
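4. Is data transformation part of the data mining step itself? No; in the knowledge discovery process it is a separate preprocessing step that precedes data mining.
5. State whether True/False: A data warehouse is generally updated in real time. False.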
6. What does OLTP stand for? Online transaction processing
7. What is a subset of a data warehouse called? Data mart
8. Where is data warehousing used? Decision support system.
9. Can a data warehouse include database tables, online data, and flat files? Yes
10. ETL stands for ____________. Extract, transform and load.
11. For what are data warehousing systems mostly used? Reporting and data analysis.
12. The small logical units in which a data warehouse stores its large amounts of data are known as _____.
Data marts.
13. What is the main characteristic of OLTP? It processes a large number of short online transactions (inserts, updates, deletes).
14. Is volatility a property of the data warehouse? No
15. On what data model is a data warehouse based? Multidimensional model.
16. DSS in data warehouse stands for _____________. Decision support system.
17. What is the time horizon in the data warehouse? 5-10 years.
18. From where are classification rules extracted? decision tree
19. What is the full form of KDD? Knowledge Discovery in Databases.
20. ___________ is data about data. Metadata
21. Are "descriptive" and "classification and prediction" the two categories of functions involved in
data mining? Yes
22. What are Bayesian classifiers? A class of learning algorithms that try to
find an optimal classification of a set of examples using probability theory.
23. What is an algorithm? A computational procedure that takes some value as input and produces
some value as output
24. What is bias? Any mechanism employed by a learning system to constrain the search space
of a hypothesis
25. Which is the process of subdivision of a set of examples into a number of classes?
Classification
26. State the properties of a binary attribute. It takes only two values. In general, these
values are 0 and 1, and they can be coded as one bit.
27. What is classification accuracy? A measure of the accuracy of the classification of a concept that
is given by a certain theory
28. What is a group of similar objects that differ significantly from other objects called? A cluster
29. A definition of a concept is _____ if it recognizes all the instances of that concept. Complete
30. The actual discovery phase of a knowledge discovery process is ___.
Data mining
31. A definition of a concept is _____ if it does not classify any negative examples as coming within the concept.
Consistent
32. Name the stage in which the right data is selected. Data selection
33. What is the classification task? The task of assigning a classification to a set of examples
34. How do we measure Euclidean distance? As the straight-line distance between two points, calculated using
the Pythagorean theorem
35. Which type of database is "a set of databases from different vendors, possibly using
different database paradigms"? Heterogeneous databases
36. What is meant by enumeration? The process of finding a solution to a problem simply by
enumerating all possible solutions in some predefined order and then testing them
37. An approach to a problem that is not guaranteed to work but performs well in most cases
is called ____. Heuristic
38. Does hybrid learning mean machine learning that involves different techniques? Yes
39. Which type of engineering is the process of finding the right formal representation of a
certain body of knowledge in order to represent it in a knowledge-based system? Knowledge
engineering
40. The amount of information within data, as opposed to the amount of redundancy or noise,
is ______. Information content
41. Is a prediction made using an extremely simple method, such as always predicting the same
output, called a naive prediction? Yes
42. A measure of the probability that a certain hypothesis is incorrect given certain observations is
called ______. Statistical significance
43. What is Noise? In the context of data mining, this refers to random errors in a database table.
44. Linear regression is a supervised machine learning model in which the model finds the
best-fit ___ between the independent and dependent variables. Straight line
45. _____ is the mathematical likelihood that something will occur. Probability.
46. Which class of learning algorithms tries to find an optimal classification of a set of
examples using probability theory? Bayesian classifiers
47. In k-nearest neighbours, choosing _______ values for k makes the result noisier, because each individual neighbour has a higher influence on it.
Smaller
48. Maximizing the distance between the hyperplane and the nearest data points helps us
decide on the right hyperplane; this distance is called the ____. Margin
49. The Apriori algorithm is a ___________. Bottom-up search.
50. After the pruning step of the Apriori algorithm, _______ will remain. No candidate set.
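
Worked example for questions 34 and 47. The short Python sketch below is illustrative only. It computes Euclidean distance with the Pythagorean theorem and uses it in a bare-bones k-nearest-neighbour vote. The function names, the toy points, and the "ham"/"spam" labels are all assumptions made up for this example. One deliberately mislabelled point stands in for noise, so a small k can be seen following the noise while a larger k outvotes it.

```python
import math
from collections import Counter


def euclidean_distance(p, q):
    """Straight-line distance between two equal-length points (Pythagorean theorem)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: euclidean_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: a "ham" cluster near the origin, a "spam" cluster
# far away, and one mislabelled (noisy) point sitting inside the ham cluster.
train = [
    ((0.0, 0.0), "ham"),
    ((0.0, 1.0), "ham"),
    ((1.0, 0.0), "ham"),
    ((1.0, 1.0), "ham"),
    ((0.5, 0.5), "spam"),  # noise: labelled spam but located among ham points
    ((5.0, 5.0), "spam"),
    ((5.0, 6.0), "spam"),
    ((6.0, 5.0), "spam"),
]

query = (0.6, 0.6)

print(euclidean_distance((0, 0), (3, 4)))  # 3-4-5 right triangle
print(knn_predict(train, query, k=1))      # only the single nearest point votes
print(knn_predict(train, query, k=5))      # five nearest points vote
```

With k = 1 the only voter is the noisy point, so the query is labelled "spam"; with k = 5 the four surrounding "ham" points outvote it. This is the sense in which smaller values of k are more sensitive to noise.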
Long answer type questions