2mark Question

Data science is the combination of mathematics, statistics, machine learning, and computer science used to analyze data, discover patterns and insights, and make informed decisions. Exploratory data analysis involves analyzing and visualizing data to summarize its characteristics and identify trends or patterns. Cross-validation generally gives a more reliable estimate than holdout validation because it repeats the train/test split over several folds and averages the results instead of relying on a single random split. Data validation checks the accuracy and quality of source data before it is used, imported, or processed; it is a form of data cleansing. Iteration is the continuous process of improving a concept or design by testing prototypes and refining them based on results.

Uploaded by reshmibiotech

2mark

1. Define Data Science.


Data Science is a combination of mathematics, statistics, machine learning, and computer
science. It involves collecting, analyzing, and interpreting data to gain insights that help
decision-makers make informed decisions. Data Science is used in almost every industry today
to predict customer behavior and trends and to identify new opportunities.

2. What is exploratory data analysis?


Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets
and summarize their main characteristics, often employing data visualization methods. It
helps to discover trends and patterns and to check assumptions about the data with the help of
statistical summaries and graphical representations.
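
As a rough illustration of the "statistical summaries and graphical representations" mentioned above, here is a minimal sketch using only the Python standard library. The `sales` figures and the `summarize` helper are made up for this example, and the text histogram is only a stand-in for a real plot.

```python
import statistics

# Hypothetical sample data, invented for illustration only.
sales = [12, 15, 11, 14, 90, 13, 16, 12]

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),
    }

summary = summarize(sales)
for name, value in summary.items():
    print(f"{name}: {value}")

# A crude text histogram: one '#' per 5 units, standing in for a chart.
for v in sales:
    print(f"{v:>3} " + "#" * (v // 5))
```

Here the mean (22.875) sits well above the median (13.5), which hints at the outlier value 90. Spotting that kind of pattern is what EDA is for.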

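3. Differentiate between holdout and cross-validation


Holdout validation | Cross-validation
The data is split once, at random, into a training set and a test set (the two sets need not be equal in size) | The data is split into several folds, and each fold is used as the test set once while the remaining folds are used for training
The estimate depends on a single split and can therefore be misleading | Averaging the results over all folds gives a more reliable estimate; several forms of cross-validation exist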
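
The difference between the two splitting schemes can be sketched in plain Python. The function names `holdout_split` and `k_fold_splits`, the 25% test ratio, and `k=4` are choices made for this illustration and do not come from the notes. No model is trained here; the sketch shows only how the data is divided.

```python
import random

def holdout_split(data, test_ratio=0.25, seed=0):
    """Shuffle once and cut the data into one training set and one test set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]

def k_fold_splits(data, k=4, seed=0):
    """Yield k (train, test) pairs; each item lands in a test set exactly once."""
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(12))

train, test = holdout_split(data)
print("holdout:", len(train), "train /", len(test), "test")

for i, (train_i, test_i) in enumerate(k_fold_splits(data), start=1):
    print(f"fold {i}:", len(train_i), "train /", len(test_i), "test")
```

With holdout, a quarter of the data is tested once. With 4-fold cross-validation, every item is tested once across the four rounds, so the averaged score depends less on one lucky or unlucky split.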

4. What is data validation?


Data validation means checking the accuracy and quality of source data before using,
importing, or processing it. It is a form of data cleansing. Data validation is a general term
and can be performed on any type of data, including data within a single application (such
as Microsoft Excel) or data being merged within a single data store.
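
A minimal sketch of what such a check might look like before data is imported. The field names (`name`, `age`, `email`), the rules, and the sample records are all hypothetical; real validation rules depend on the data source.

```python
def validate_record(record):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    if not record.get("name", "").strip():
        problems.append("name is missing")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age must be an integer between 0 and 120")
    if "@" not in record.get("email", ""):
        problems.append("email looks malformed")
    return problems

# Hypothetical source data: one clean record and one with three problems.
records = [
    {"name": "Asha", "age": 31, "email": "asha@example.com"},
    {"name": "", "age": 200, "email": "no-at-sign"},
]

valid = [r for r in records if not validate_record(r)]
rejected = [(r, validate_record(r)) for r in records if validate_record(r)]

print(len(valid), "valid record(s)")
for record, problems in rejected:
    print("rejected:", record, "->", problems)
```

Only the records in `valid` would go on to be imported or processed. The rejected ones are reported with their reasons so they can be corrected at the source.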

5. What is iteration?

The iterative process is an approach to continuously improving a concept, design, or product.
Creators produce a prototype, test it, tweak it, and repeat the cycle with the goal of getting
closer to the solution.

6. Define Data mart


A data mart is a subset of a data warehouse focused on a particular line of business,
department, or subject area. Data marts make specific data available to a defined group of
users, which allows those users to quickly access critical insights without wasting time
searching through an entire data warehouse.
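
The "subset of a data warehouse" idea can be sketched with an in-memory SQLite database. The table `warehouse_sales`, the view `marketing_mart`, and the rows are invented for this example. A real data mart is often a separate physical store rather than a view, so treat the view only as a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny stand-in for the full data warehouse (hypothetical schema and rows).
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("North", "Marketing", 120.0),
        ("South", "Finance", 300.0),
        ("North", "Marketing", 80.0),
        ("East", "HR", 50.0),
    ],
)

# The "data mart": only the rows one department cares about.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'Marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY amount"
).fetchall()
print(mart_rows)
```

Marketing users query `marketing_mart` directly and never have to filter the whole warehouse table themselves.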

7. List out the benefits of business intelligence


Relevant and accurate reporting, business insight, customer satisfaction, improved data
quality, competitive analysis, increased revenue, increased productivity, and improved sales and
marketing.

8. Expand and explain ETL


ETL, which stands for extract, transform and load, is a data integration process that combines
data from multiple data sources into a single, consistent data store that is loaded into a data
warehouse or other target system.
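
The three ETL stages can be sketched as three small functions. Everything below is an assumption made for illustration: the two CSV strings stand in for "multiple data sources", the in-memory SQLite table `orders` stands in for the warehouse, and the function names `extract`, `transform`, and `load` are simply descriptive.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with inconsistent formatting.
SOURCE_A = "id,name,amount\n1,alice,10.5\n2,bob,20\n"
SOURCE_B = "id,name,amount\n3,CAROL,7.25\n"

def extract(text):
    """Extract: read raw rows from one source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and normalise names into one consistent form."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")

for source in (SOURCE_A, SOURCE_B):
    load(conn, transform(extract(source)))

loaded = conn.execute("SELECT id, name, amount FROM orders ORDER BY id").fetchall()
print(loaded)
```

After the run, rows from both sources sit in one table with consistent types and name casing, which is the "single, consistent data store" described above.
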

9. List out any two knowledge management tools


Document360 is a tool that enables organizations to create public and private self-service
knowledge bases for both users and customers.
Helpjuice lets organizations create and manage knowledge bases both internally and
externally; it enhances their collaborative culture and customer support.

10. Differentiate between OLAP and OLTP in business intelligence


OLAP | OLTP
Online Analytical Processing | Online Transaction Processing
Manages large volumes of data for analysis | Manages a large number of small transactions with real-time updates
Uses a data warehouse | Uses a traditional database
Its data comes from various OLTP sources | OLTP is the source of data for OLAP
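
The two workloads can be contrasted in a small sketch. For brevity both run against the same in-memory SQLite table, which is an assumption of this example; in practice the analytical query would normally run on a separate data warehouse. The `orders` table and the `place_order` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)"
)

def place_order(product, qty):
    """OLTP-style work: one small write, committed immediately."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(
            "INSERT INTO orders (product, qty) VALUES (?, ?)", (product, qty)
        )

for product, qty in [("pen", 2), ("book", 1), ("pen", 5), ("book", 3)]:
    place_order(product, qty)

# OLAP-style work: one read-only query that aggregates over many rows.
totals = conn.execute(
    "SELECT product, SUM(qty) FROM orders GROUP BY product ORDER BY product"
).fetchall()
print(totals)
```

Each `place_order` call touches a single row, while the final query scans and summarises all of them. That contrast in access pattern is the core of the OLTP versus OLAP distinction.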
