What Is a Data Warehouse?
In statistics and data analysis, the data scale refers to the level of measurement
used to quantify data points. Essentially, it tells us which comparisons and
calculations are meaningful for the data's values. There are four main
types of data scales, each with its own characteristics and limitations:
2. Ordinal Scale: Characteristics: Data points are ranked or ordered, but the
intervals between ranks are not necessarily equal. Think of movie ratings (1-5
stars). While we know 4 stars is "better" than 2 stars, the difference in quality
might not be the same between all levels. Examples: Customer satisfaction
ratings (poor, average, good, excellent), socioeconomic status (low, middle, high),
degree of injury (minor, moderate, severe). Operations allowed: Ranking,
identifying median and mode, comparing relative order.
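As a minimal sketch of the operations allowed on an ordinal scale, the snippet below finds the median and mode of satisfaction ratings. The category order comes from the example above; the names LEVELS, RANK, and ordinal_median, and the sample responses, are hypothetical and chosen only for illustration. It uses median_low so that the result is always one of the actual categories rather than an average of two ranks, which would assume equal intervals.

```python
from statistics import median_low, mode

# Ordered categories, lowest to highest (order taken from the example above).
LEVELS = ["poor", "average", "good", "excellent"]
RANK = {label: i for i, label in enumerate(LEVELS)}


def ordinal_median(responses):
    """Return the median category of a list of ordinal responses."""
    ranks = [RANK[r] for r in responses]
    # median_low never averages two ranks, so it stays valid for ordinal data.
    return LEVELS[median_low(ranks)]


# Hypothetical survey responses.
responses = ["good", "poor", "excellent", "good", "average", "good"]

print(ordinal_median(responses))  # median category
print(mode(responses))            # most frequent category
```

Note that no mean is computed: averaging the ranks would treat the gap between "poor" and "average" as equal to the gap between "good" and "excellent", which an ordinal scale does not guarantee.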
3. Interval Scale: Characteristics: Data points are ordered with equal intervals
between them, but there is no true zero point. Consider temperature in Celsius.
The difference between 20°C and 30°C is the same as that between 0°C and 10°C,
but a temperature of 0°C doesn't mean "no heat" at all, so ratios such as "twice
as hot" are not meaningful. Examples: Temperature (Celsius, Fahrenheit), IQ
scores, calendar years. Operations allowed: All operations of ordinal scales plus
calculations like addition, subtraction, finding mean and standard deviation.
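The Celsius example can be checked with a few lines of arithmetic. This is a small sketch, not a formal proof: the helper celsius_to_kelvin and the variable names are hypothetical. It shows that differences in Celsius are meaningful, while a ratio of two Celsius values changes once the same temperatures are expressed in Kelvin, a scale that does have a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, which has a true zero point."""
    return celsius + 273.15


a_c, b_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
diff = b_c - a_c

# The Celsius ratio suggests "twice as hot", but that depends on where
# the arbitrary zero point sits, so it carries no physical meaning.
naive_ratio = b_c / a_c

# On the Kelvin scale the same two temperatures differ by only about 3.5%.
kelvin_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(diff, naive_ratio, round(kelvin_ratio, 4))
```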
4. Ratio Scale: Characteristics: Data points are ordered with equal intervals and
have a true zero point, meaning the absence of the measured quantity. Imagine
money. A balance of $0 truly means no money, and the difference between $10
and $20 is the same as that between $20 and $30. Examples: Age, time, distance, salary,
What Are the Different Data Collection Methods?
Primary and secondary methods of data collection are two approaches used to gather
information for research or analysis purposes. Let's explore each data collection method
in detail:
Primary data collection involves the collection of original data directly from the source or
through direct interaction with the respondents.
Secondary data collection involves using existing data that someone else collected,
usually for a purpose different from the current research.
e. Past Research Studies: Previous research studies and their findings can serve as
valuable secondary data sources.
2. Data Preparation: Here, you make the raw data usable for analysis. This often
involves:
Cleaning: Removing errors and inconsistencies, and handling missing values.
Transformation: Formatting data into a consistent structure, converting units, and
handling outliers.
Integration: Combining data from multiple sources if needed.
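The three preparation steps can be sketched in plain Python. Everything here is an illustrative assumption: the records in raw_sales and customers, the field names, and the functions clean, transform, and integrate are hypothetical, and dropping rows with missing values is only one of several possible cleaning strategies.

```python
# Hypothetical raw records from one source.
raw_sales = [
    {"id": 1, "amount": "10.50", "unit": "USD"},
    {"id": 2, "amount": None, "unit": "USD"},      # missing value
    {"id": 3, "amount": "1200", "unit": "cents"},  # different unit
    {"id": 3, "amount": "1200", "unit": "cents"},  # duplicate row
]
# Hypothetical second source, keyed by the same id.
customers = {1: "Ana", 3: "Ben"}


def clean(rows):
    """Cleaning: drop rows with a missing amount and repeated ids."""
    seen, out = set(), []
    for row in rows:
        if row["amount"] is None or row["id"] in seen:
            continue
        seen.add(row["id"])
        out.append(row)
    return out


def transform(rows):
    """Transformation: parse amounts and convert everything to USD."""
    out = []
    for row in rows:
        amount = float(row["amount"])
        if row["unit"] == "cents":
            amount /= 100
        out.append({"id": row["id"], "amount_usd": amount})
    return out


def integrate(rows, names):
    """Integration: attach the customer name from the second source."""
    return [{**row, "customer": names.get(row["id"])} for row in rows]


prepared = integrate(transform(clean(raw_sales)), customers)
for row in prepared:
    print(row)
```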
Flexibility: You can store any type of data in a data lake, regardless of its
structure or format. This makes data lakes ideal for organizations that deal with
a lot of diverse data.
Data quality: Because data lakes store everything, it's easy for low-quality or
irrelevant data to creep in.
Security: Ensuring the security of all the data in a data lake is crucial.
Explain KDD (Knowledge Discovery in Databases) in detail, with advantages,
disadvantages, a diagram, and an example.
Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant
data from the collection. It covers: 1. Cleaning in case of missing values.
2. Cleaning noisy data, where noise is a random error or variance in a measured
value. 3. Cleaning with data discrepancy detection and data transformation tools.
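The first two cleaning cases can be illustrated with a short sketch. The techniques used here are assumptions chosen for the example, not the only options: missing values are filled with the mean of the known values, and noise is reduced by smoothing by bin means (sort the values, split them into equal-size bins, replace each value with its bin's mean). The function names and the prices list are hypothetical.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort values, group them into bins, and replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        smoothed.extend([mean(current_bin)] * len(current_bin))
    return smoothed


# Hypothetical price data with one missing entry.
prices = [4, 8, 15, 21, None, 25, 26, 28, 33]

filled = fill_missing(prices)
smoothed = smooth_by_bin_means(filled, bin_size=3)

print(filled)
print(smoothed)
```

Smoothing by bin means returns the values in sorted order, so it suits cases where the original position of each value does not matter.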
Data Selection: Data selection is defined as the process in which data relevant
to the analysis is identified and retrieved from the data collection. For this we
can use neural networks, decision trees, Naive Bayes, clustering, and regression
methods.
Data Mining: Data mining is defined as the application of techniques to extract
potentially useful patterns. It transforms task-relevant data into patterns and
decides the purpose of the model using classification or characterization.
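To make "extracting patterns" concrete, here is a minimal sketch of one simple pattern-mining technique, counting frequently co-occurring item pairs. This technique is substituted in purely for illustration; the text above names classification and characterization, which are different tasks. The function frequent_pairs, the min_support threshold, and the transactions list are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical task-relevant data: one list of items per purchase.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]

patterns = frequent_pairs(transactions, min_support=3)
print(patterns)
```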
Advantages of KDD
1. Improves decision-making
2. Increased efficiency
3. Better customer service
4. Fraud detection
5. Predictive modeling
Disadvantages of KDD