Statistical Analysis With Software Application

- Data mining is a method for discovering patterns in large data sets. It includes identifying groups of data records through cluster analysis.
- Text analytics processes unstructured text to derive useful information through techniques like text mining.
- Business intelligence transforms raw data into understandable business information through data mining and analysis to support decision making.

Uploaded by

Mathew Estrada

Which of the following data mining techniques is predictive?

- classification

It is a powerful tool that shows the network of data.

- Knime

It makes complex data more understandable and usable.

- data visualization

What is the process of deriving useful information from text?

- Text Analytics 

It is used in an organization’s strategic and tactical business decision making.

- business intelligence

It is a method for discovering patterns in large data sets.

- Data Mining  

It includes identifying groups of data records.

- cluster analysis

Which of the following is NOT a goal in data mining?

- collecting data

Which of the following is NOT a method used in data analysis?

- Statistics Analytics

Which of the following types of text is processed in text analytics?

- unstructured   

It has the goal of discovering useful information to support decision making.

- data analysis

It extracts meaningful numerical indices from information and makes them available to statistical and machine learning algorithms.

- Text analytics

_____________ includes identifying groups of data records.

- Cluster analysis 

The following are artifacts used in data analysis EXCEPT:

- ANOVA

___________ uses artifacts to present data visually.

- data visualization 

It transforms data into actionable intelligence for business purposes.

- Business Intelligence 

The following processes are used in data analysis EXCEPT:

- collecting

It is a free software programming language.

- R-programming

What programming language does Orange use?

- python

The goal is to transform raw data into understandable business information.

- Data mining

Which of the following types of text is processed in text analytics?

- unstructured

A matrix that has the same number of rows and columns is called

- square 

A bell shaped curve that is symmetric about a vertical line.

- normal distribution

The product of a 2x5 matrix and a 5x3 matrix is a ______ matrix.

- 2x3
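
The shape rule behind this answer can be sketched in a few lines. The helper name `product_shape` is hypothetical and exists only for illustration; it assumes shapes are given as (rows, columns) tuples.

```python
def product_shape(left, right):
    """Return the (rows, cols) shape of the product left x right."""
    left_rows, left_cols = left
    right_rows, right_cols = right
    # A product is defined only when the inner dimensions agree.
    if left_cols != right_rows:
        raise ValueError("inner dimensions must match")
    return (left_rows, right_cols)


print(product_shape((2, 5), (5, 3)))  # (2, 3)
print(product_shape((5, 6), (6, 8)))  # (5, 8)
```

The result always takes its row count from the left matrix and its column count from the right one.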

What is the value of the mean in a normal probability density function?

- 50

A special type of function where the domain is a  set of consecutive integers.

- sequence

Another term for text analytics.

- text mining

The proportion of positive events that are correctly classified.

- sensitivity

A graph that is used to indicate frequency distribution.

- histogram

It is used to enable an entity to determine consequences by thinking rather than acting.

- Knowledge Representation

Null strings are indicated by

- λ 

It offers a way to examine trends from collected data and derive insights from it.

- Business Intelligence

Refers to using tools of statistics to present data visually.

- data visualization

Earlier name for data science.

- datalogy

What type of text are processed in Text analytics?

- unstructured

Which is NOT interaction data?

- data base

The proportion of positive events that are correctly classified.

- sensitivity

It is a collection of machine learning algorithms for data mining tasks.

- WEKA 

Which of the following is NOT a module in RapidMiner?

- loop     

Which of the following pertains to predictive data mining technique?

- Regression     

_____________ is rated as the number one business analytics software.

- RapidMiner

Primarily used for data pre-processing.

- Knime

It is a perfect software tool written in the Python programming language.

- Orange     

Which of the following is NOT a data mining tool?

- Python

It is a module in RapidMiner that considers the workflow.

- studio       

The following are  data mining techniques EXCEPT:

- Collection   

Which is primarily written in C and in Fortran?

- R-programming

It sees a set of prototypes in particular prototypical diseases to be matched against the case at hand.

- INTERNIST

It is a numerical description of the outcome of a statistical experiment.

- random variable

The creation of data from varied sources and its quantification into information.

- datafication

If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, then its domain is

- {3,5,6}

The sets A = {x | x is a distinct letter in the word "MATHEMATICS"} and B = {x | x is a distinct letter in the word "STATISTICS"} are

- joint

Given α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings?

- 18
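
A quick check of that count, assuming a^6 means six copies of "a" and b^5 means five copies of "b":

```python
alpha = "babaa"
# a^6 b^5 bb: six a's, five b's, then two more b's.
beta = "a" * 6 + "b" * 5 + "bb"

combined = alpha + beta
print(len(alpha), len(beta), len(combined))  # 5 13 18
```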

What does GLM mean?

- Generalized Linear model

The process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information.

- data analysis

By the empirical rule for a normal distribution, ______% of the data lie within 1 standard deviation below and above the mean.

- 68

Another term for an empty set.

- null

Which is NOT a basic representation technology?

- graph

If A = {x | x is a distinct letter in the word "MATHEMATICS"} and B = {x | x is a distinct letter in the word "STATISTICS"}, then their intersection is

- {A,C,I,S,T}
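
The intersection can be verified with Python sets, which keep only the distinct letters of each word. This is a small illustrative check; the variable names are arbitrary.

```python
set_a = set("MATHEMATICS")  # distinct letters: A C E H I M S T
set_b = set("STATISTICS")   # distinct letters: A C I S T

common = set_a & set_b
print(sorted(common))            # ['A', 'C', 'I', 'S', 'T']
print(set_a.isdisjoint(set_b))   # False: the sets share elements (they are joint)
```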

The range of the binary relation R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is

- {3,5,6,10,12}
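
As a sketch, the domain and range of this relation can be read off as the sets of first and second components. The last pair is taken to be (6, 12), which is an assumption about the intended notation.

```python
relation = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}

# Domain: every first component; range: every second component.
domain = {x for x, _ in relation}
value_range = {y for _, y in relation}

print(sorted(domain))       # [3, 5, 6]
print(sorted(value_range))  # [3, 5, 6, 10, 12]
```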

The proportion of well defined negative events is called ________________.

- specificity
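
Sensitivity and specificity are both proportions taken from a confusion matrix. The sketch below uses hypothetical counts chosen only to illustrate the two formulas.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positives that were classified as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual negatives that were classified as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 50 actual positives and 50 actual negatives.
print(sensitivity(true_pos=40, false_neg=10))  # 0.8
print(specificity(true_neg=45, false_pos=5))   # 0.9
```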

The symbol used to indicate strings with no elements.

- λ

What programming language is used in RapidMiner?

- Java

It is a method for discovering patterns in large data sets.

- Data Mining  

_______________ is a data structure in which every component has a unique predecessor and successor.

- linear   

What is an organized collection of information together with a set of operations used to manage that information?

- ADT

Which of the following is the transpose of B?

- -8 7 1 0

What is the correct meaning of ADT?

- Abstract Data Type     

Which of the following is TRUE?

- A + B = B+ A

If A = {2,3} and B = {4,5}, which of the following is the Cartesian product of the two sets?

- {(2,4), (2,5), (3,4), (3,5)}
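
One way to enumerate the Cartesian product A × B is `itertools.product` from the Python standard library; the lists below simply restate the two sets from the question.

```python
from itertools import product

set_a = [2, 3]
set_b = [4, 5]

# Every ordered pair (a, b) with a from A and b from B.
pairs = list(product(set_a, set_b))
print(pairs)  # [(2, 4), (2, 5), (3, 4), (3, 5)]
```

The product has len(A) × len(B) = 4 pairs, and order within each pair matters.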

Which of the matrices is singular?

- A

What is the size of the product of a 5x6 matrix and a 6x8 matrix?

- 5x 8

Matrix B is

- invertible

Addition and subtraction of matrices are possible only if the two or more matrices

- have the same size.

An array is a good example of _________data structure.

- static   

It refers to a data structure that grows and shrinks at execution time.

- dynamic

ML means:

- Machine Learning

The intersection of the two sets A={ 2,3} B={4,5} is a

- null set

What is a data structure that has a fixed size?

- static

The two sets A = {2,3} and B = {4,5} are said to be

- disjoint

What is the earlier name for data science?


- datalogy

3A + B = 

- -14 -2 13 18

What is the focus of data science?

- manipulate data efficiently and effectively       

Which is NOT a characteristic feature of data structure?

- It contains a fixed structure.

The method that does NOT require the assumption that the parameters are normally distributed.

- profile likelihood

He coined the term  "data scientist"

- DJ Patil

LR means ________________________.

- Logistic Regression

What range of values lies within 3 SD below and above the mean in a normal distribution if the mean is 10 and the standard deviation is 2?

- 4-16
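
The interval is simply mean ± k·SD. A minimal sketch, with `sd_interval` as a hypothetical helper name:

```python
def sd_interval(mean, sd, k):
    """Return (low, high) for k standard deviations around the mean."""
    return (mean - k * sd, mean + k * sd)


print(sd_interval(10, 2, 3))  # (4, 16)
print(sd_interval(80, 3, 3))  # (71, 89)
```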

The following are the 3V's of big data EXCEPT

- veracity

According to Hilary Mason which is NOT a skill that a good data scientist must cultivate.

- critical thinking 

Data is NOT information unless we add_________.


- analytics

The expected value or mean of a random variable in discrete case.

- probability mass distribution

A graph used to indicate intervals in a frequency distribution is referred to as a ______________.

- histogram

It expands available data enormously.

- text mining 

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh between 0.31 lb and 0.91 lb in a shipment of 4500 tomatoes?

- 4275
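
The interval 0.31 to 0.91 lb is two standard deviations either side of the mean, so the keyed answer follows from the empirical rule's 95%. The sketch below compares that rule of thumb with the value from Python's `statistics.NormalDist`; treating the weights as exactly normal is the problem's own assumption.

```python
from statistics import NormalDist

weights = NormalDist(mu=0.61, sigma=0.15)
shipment = 4500

# Empirical rule: about 95% of values lie within 2 SD of the mean.
rule_count = round(0.95 * shipment)

# Normal CDF: the share within 2 SD is slightly above 95%.
exact_share = weights.cdf(0.91) - weights.cdf(0.31)
exact_count = exact_share * shipment

print(rule_count)          # 4275
print(round(exact_count))  # a little higher than the empirical-rule count
```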

The quantification of data into information.

- datafication

The major outcome of correlation.

- prediction

Which belong to the GLM  family?

- logistic and linear

Which is NOT a correct correlation Coefficient?

- 1.2

KR means __________________________.

- Knowledge Representation

He is someone who asks interesting questions on formal and informal theory.


- data scientist

He pointed out that until 2003, all of mankind had generated just 5 exabytes of data.

- Eric Schmidt

The creation of data from varied sources and its quantification into information.

- Datafication

PAW means____________.

- Predictive Analytics World

Exabyte means ________bytes

- billion billion

It refers to well based theories  and sound business judgement.

- Data Science

IOT means

- Internet of things

He said that “In mathematics the art of proposing a question must be held of higher value than solving it”.

- Georg Cantor

These are the data skills that a good data scientist need to cultivate EXCEPT

- speaking

The person who said that “The future is not google-able”.

- William Gibson

How many bytes of data are generated every two days in today's world?

- 5 exabytes

“All models are wrong but some are useful”

- George  E. P. Box

The explosion of _______data is the main reason why every 2 days 5 exabytes of data are generated.

- interaction

A new phenomenon for the explosion of _________ data

- interaction

The developer of FarmVille, a famous game on the internet.

- Zynga Incorporated

The creation of data from varied sources and its quantification into information.

- datafication

It shows a high correlation between the incidence of flu and searches about flu on Google.

- Google Flu Trends

It expands available data enormously since there is so much more text being generated than numbers.

- Text mining

What is a great example of a data product?

- Google Maps

A distribution in which large data sets are displayed.

- Grouped frequency distribution

What increases data volume?

- velocity

It is often used as a model of the number of arrivals at a facility in a given period of time.

- Poisson probability distribution
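
As an illustration of how the Poisson model assigns probabilities to arrival counts, here is a minimal sketch of its probability mass function. The arrival rate of 3 per period is a hypothetical value, not one taken from the source.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k arrivals when the mean count is `rate`."""
    return rate ** k * exp(-rate) / factorial(k)


rate = 3.0  # hypothetical average number of arrivals per period
for k in range(6):
    print(k, round(poisson_pmf(k, rate), 4))
```

Summed over all k, these probabilities add up to 1, and the mean of the distribution equals the rate.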

It views the world in thinking of prototypical objects.

- frame

As of 2014, there are _______ million tweets a day.

- 500

The proportion of a well-classified negative event.

- specificity

The following are elements in an analytic plan EXCEPT

- graphs 

It allows you to see which value of the explanatory variable corresponds to a given probability of success.

- probability analysis table

A positive z-score means that the score  is

- Higher than the mean

If there are 101 scores the median is equal to the _____ranked score.

- 51st
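
For an odd number of ranked scores, the median sits at position (n + 1) / 2. A small sketch, with `median_rank` as a hypothetical helper name:

```python
def median_rank(n):
    """Rank of the median in a sorted list of n scores (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("an even count has two middle scores")
    return (n + 1) // 2


print(median_rank(101))  # 51
print(median_rank(103))  # 52
```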

The score easily affected by extreme values is the _________.

- mean

On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is the percentile for Jef’s score?

- 48th

If all scores in a distribution are distinct, then _____________.

- it has no mode.

Which of the following statements is TRUE?

- Q2 = Median

The most frequent score.

- mode

If the standard deviation of a distribution is 3, the variance is

- 9

The distribution 2,4,4,4,5,5,6,8,9 is said to be

- unimodal

The standard deviation for the data in 2,4,4,4,5,5,6,8,9

- 2.15

Which is NOT a measure of variability?

- range

Which is not a measure of central tendency?

- standard deviation

A score of 3 in 2,4,4,4,5,5,6,8,9 is

- 1.18 below the mean

A distribution with 4 modes is said to be a _________ distribution.

- multimodal

What is the value of quartile 3 in 2,4,4,4,5,5,6,8,9?

- 6

In 2,4,4,4,5,5,6,8,9 the range is

- 7

Which is NOT a measure of central tendency?

- quartile

The number that occurs most frequently is called________.

- Mode
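
As a sketch, Python's standard `statistics` module can report the mode alongside the median. The nine scores are the data set used in the surrounding questions.

```python
from statistics import median, multimode

scores = [2, 4, 4, 4, 5, 5, 6, 8, 9]

# multimode returns every value tied for the highest frequency.
print(multimode(scores))  # [4]
print(median(scores))     # 5
```

A single value in the `multimode` result means the data have one mode; two or more tied values would indicate a bimodal or multimodal set.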

Another term for variability.

- dispersion

The score NOT  easily affected by extreme  values.

- Median

It is a perfect software for machine learning.

- orange 

The following are large inputs EXCEPT

- Big beta notation

It relates the length of an algorithm’s input to the number of steps it takes.

- time complexity

The sets A = {x | x is a distinct letter in the word "MATHEMATICS"} and B = {x | x is a distinct letter in the word "STATISTICS"} are

- joint

Which of the following is a predictive data mining technique?

- regression

Algorithm analysis is an important part of a broader _____________.

- computational complexity theory

He coined the term “analysis of algorithms”.

- Donald Knuth

It is a process  of finding the computational complexity of algorithms.

- analysis of algorithms

If A = {x | x is a distinct letter in the word "MATHEMATICS"} and B = {x | x is a distinct letter in the word "STATISTICS"}, then their intersection is

- {A,C,I,S,T}

The following are software used in data mining EXCEPT

- SPSS

It relates the length of an algorithm’s input to the number of storage locations it uses.

- space complexity

It is used to discover patterns in large data sets

- Data mining

An example of an abstract computer.

- Turing machine

It is  popular among financial data analysts.

- Knime

A special type of function where the domain is a  set of consecutive integers.

- Sequence

It is used for prototyping in RapidMiner.

- studio

The function describing the performance of an algorithm is usually an upper bound determined from
______inputs.

- worst case

It is a process  of finding the computational complexity of algorithms.

- analysis of algorithms

Different implementations of the same algorithm are related by a constant multiplicative factor called a _______ constant.

- hidden

Given α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings?

- 18

How many data mining techniques are there?

- 7

It is a theoretical classification that estimates and anticipates the increase in running time of an algorithm as its input size increases.

- run time analysis

Which of the following is TRUE when a distribution is normal?

- Mean = Median = Mode

It partitions ranked data into four equal groups.

- quartile

If there are 103 scores the median is equal to the _____ranked score.

- 52nd

The creation of a data product contains 3 components  EXCEPT

- time

A data set in which every score occurs the same number of times is said to have

- no mode

A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43?

- 95

It refers to the degree of relationship between two variables.

- Correlation

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What percent of the tomatoes weigh less than 0.71 lb?

- about 75 (z ≈ 0.67)

A perfect positive correlation coefficient is equal to

- 1

It lists the percent of data in a distribution.

- relative frequency distribution

What percent of data will lie within 2 standard deviation of the mean?

- 95

If the standard deviation of a distribution is 3.5, the variance is

- 12.25
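
The variance is the square of the standard deviation, and the standard deviation is the square root of the variance. A minimal check of both directions, using hypothetical helper names:

```python
from math import sqrt


def variance_from_sd(sd):
    """Variance is the standard deviation squared."""
    return sd ** 2


def sd_from_variance(variance):
    """Standard deviation is the square root of the variance."""
    return sqrt(variance)


print(variance_from_sd(3.5))    # 12.25
print(sd_from_variance(12.25))  # 3.5
```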

On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students
who took the exam. What is the percentile for Jef’s score?

- 48th
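
The percentile here is the share of scores below the given score, expressed out of 100. A sketch with a hypothetical helper name:

```python
def percentile_rank(scores_below, total):
    """Percent of all scores that fall below the score of interest."""
    return 100 * scores_below / total


print(percentile_rank(480, 1000))  # 48.0
```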

The equation of the _______line predicts the value of Y given X.

- Regression

A bell-shaped distribution that is symmetric about a vertical line?

- Normal

A positive z-score means that the score is

- Higher than the mean

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh between 0.31 lb and 0.91 lb in a shipment of 4500 tomatoes?

- 4275

Example of a data product.

- Google Maps

A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39?

- 84

Which is NOT a value of r ?

- -0.05 0.98

In the equation of the regression line represented by Y= 1.24 X + 6.9 if X=2 then Y =?

- 9.38
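
The regression line Y = 1.24X + 6.9 can be evaluated in either direction: plug in X to predict Y, or rearrange to recover X from a given Y. The function names below are hypothetical.

```python
SLOPE = 1.24
INTERCEPT = 6.9


def predict_y(x):
    """Predicted Y for a given X on the line Y = 1.24X + 6.9."""
    return SLOPE * x + INTERCEPT


def solve_x(y):
    """X that yields the given Y, from X = (Y - 6.9) / 1.24."""
    return (y - INTERCEPT) / SLOPE


print(round(predict_y(2), 2))   # 9.38
print(round(solve_x(13.1), 2))  # 5.0
```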

The score NOT easily affected by extreme values.

- Median

The value of X in the regression equation Y = 1.24X + 6.9 if Y = 13.1 is

- 5

A bell-shaped distribution that is symmetric about a vertical line.

- Normal

What is the value of the mean if a score of 110 is 3 standard deviations above the mean?

- 95

The area of the standard normal curve to the right of z=0.82 is _______.

- 0.206
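
The right-tail area is 1 minus the cumulative probability up to z. A short check with the standard library's `statistics.NormalDist`, whose defaults are mean 0 and standard deviation 1:

```python
from statistics import NormalDist

z = 0.82
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the right of z is the complement of the CDF at z.
right_tail = 1 - standard_normal.cdf(z)
print(round(right_tail, 3))  # 0.206
```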

The method of correlation used for ranked score is ________.

- Spearman rho

What range of values lie between 3 standard deviations above and below the mean if the mean is 80
and the standard deviation is 3?

- 71-89

Data involving two variables.

- bivariate

The normal distribution with a mean of 0 and standard deviation of 1.

- Standard

Who said that "The future is not google-able"?

- William Gibson

The difference between the highest and lowest value.

- range

A negative correlation exists when ___________.

- x increases as y decreases

Which of the following is used as a method for Correlation?


- Pearson r
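
A minimal sketch of Pearson's r, written out by hand so each step is visible. The data points are hypothetical and chosen to show the two extreme values of the coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and of squared deviations about each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # y increases with x
falling = [10, 8, 6, 4, 2]  # y decreases as x increases

print(round(pearson_r(xs, rising), 2))   # 1.0
print(round(pearson_r(xs, falling), 2))  # -1.0
```

Because r always lies between -1 and 1, a value such as 1.2 cannot be a correlation coefficient.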

A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation?

- 10

The middle-most value in a ranked list of numbers.

- median

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh more than 0.31 lb in a shipment of 6000 tomatoes?

- about 5850 (97.5% of 6000 by the empirical rule)
