Data Integration
Data Types
Structured Data
Data in an organized form that conforms to a data model
Unstructured Data
Data that is not in an organized form and does not conform to a data model
Digital Data
Digital data by type: structured 10%, semi-structured 10%, unstructured 80%
Structured Data
Conforms to a data model
Similar entities are grouped
Attributes in a group are the same
Retrieving Information
Indexing and searching
Mining data
BI operations
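A minimal sketch of structured data, assuming Python's built-in `sqlite3` module as the store: the table has a fixed schema, so every row carries the same attributes, an index supports searching, and a GROUP BY query stands in for a simple BI operation. The table name, columns and rows are hypothetical.

```python
import sqlite3

# Hypothetical "customers" table: a fixed schema means every row has the same attributes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Chennai"), (3, "Meera", "Pune")],
)

# Indexing: lets the database find rows by city without scanning the whole table.
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")


def customers_in(city):
    """Searching: names of customers in a given city, in id order."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", (city,)
    ).fetchall()
    return [name for (name,) in rows]


def count_by_city():
    """A simple BI-style aggregation: number of customers per city."""
    rows = conn.execute(
        "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
    ).fetchall()
    return dict(rows)


print(customers_in("Pune"))  # ['Asha', 'Meera']
print(count_by_city())  # {'Chennai': 1, 'Pune': 2}
```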
Semi-structured Data
Does not conform to a formal data model, but contains tags and elements
Similar entities are grouped
Cannot be stored directly in rows and columns
Attributes in a group may not be the same
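A minimal sketch, assuming JSON as the semi-structured format and hypothetical records: the keys act as tags, but each record carries a different set of attributes, so forcing them into fixed columns leaves gaps and nested values.

```python
import json

# Hypothetical semi-structured records: keys act as tags, but each record
# has a different set of attributes (one even nests another object).
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["98400", "98401"]},
  {"id": 3, "name": "Meera", "address": {"city": "Pune"}}
]
"""
records = json.loads(raw)


def all_attributes(recs):
    """Union of attribute names across all records, sorted."""
    names = set()
    for rec in recs:
        names.update(rec.keys())
    return sorted(names)


def to_rows(recs, columns):
    """Force records into fixed columns; absent attributes become None."""
    return [[rec.get(col) for col in columns] for rec in recs]


columns = all_attributes(records)
print(columns)  # ['address', 'email', 'id', 'name', 'phones']
for row in to_rows(records, columns):
    print(row)
```

Most cells in the resulting rows are empty and one holds a nested object, which is why such data does not fit neatly into rows and columns.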
Semi-structured Data: Challenges and Possible Solutions
Challenges: storage cost; RDBMS; irregular and partial structure; evolving schemas; schemas and data
Possible solutions: XML; RDBMS; special-purpose RDBMS; OEM (Object Exchange Model)
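XML is listed as one possible solution. A minimal sketch using Python's standard `xml.etree.ElementTree` module on a hypothetical document: tags describe each element, and each parsed record keeps only the child tags it actually has.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the two <book> elements do not share the same child elements.
doc = """
<library>
  <book id="b1"><title>Title A</title><author>Author A</author></book>
  <book id="b2"><title>Title B</title><year>2015</year></book>
</library>
"""


def books(xml_text):
    """Parse each <book> into a dict holding whatever child tags it has."""
    root = ET.fromstring(xml_text)
    result = []
    for book in root.findall("book"):
        record = {"id": book.get("id")}
        for child in book:
            record[child.tag] = child.text
        result.append(record)
    return result


for record in books(doc):
    print(record)
```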
Unstructured Data
Does not conform to any data model
Has no easily identifiable structure
Does not follow any rules or semantics
Not in any particular format or sequence
Cannot be stored in rows and columns
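Unstructured text has to be processed before it can be analysed in rows and columns. One common first step, shown here as a minimal sketch (the input text is hypothetical, and simple regex tokenizing is an assumption rather than a prescribed method), is to derive a term-frequency table.

```python
import re
from collections import Counter

# Hypothetical free text (e.g., a customer complaint): no schema, no fixed fields.
text = """Order arrived late. The box was damaged, and the order
was missing one item. Please refund the missing item."""


def term_counts(text):
    """Lower-case the text, split on non-letters and count each term."""
    tokens = [t for t in re.split(r"[^a-z]+", text.lower()) if t]
    return Counter(tokens)


counts = term_counts(text)
# A derived, structured view of the text: (term, count) rows.
for term, n in sorted(counts.items()):
    print(term, n)
```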
Measurement Properties
Distinctness
Different objects receive different scores (= or ≠)
Order
Ordering of the numbers reflects ordering of the variable (< or >)
Addition & Subtraction
A given difference between numbers represents the same amount of the variable anywhere on the scale (+ or −)
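The three properties can be checked on a small example. A minimal sketch with hypothetical marks and the class ranks derived from them: both keep distinctness and order, but equal steps in rank hide very unequal gaps in marks, so differences are meaningful for the marks and not for the ranks.

```python
# Hypothetical marks of three students and the class ranks derived from them (1 = top).
marks = {"Asha": 95, "Ravi": 90, "Meera": 60}
order = sorted(marks, key=marks.get, reverse=True)  # highest marks first
ranks = {name: pos for pos, name in enumerate(order, start=1)}


def is_distinct(scores):
    """Distinctness: different objects receive different scores."""
    return len(set(scores.values())) == len(scores)


def gaps(scores, order):
    """Differences between consecutive objects in the given order."""
    return [abs(scores[b] - scores[a]) for a, b in zip(order, order[1:])]


print(is_distinct(marks), is_distinct(ranks))  # True True
print(order == sorted(ranks, key=ranks.get))  # True: ranks preserve the order
print(gaps(ranks, order), gaps(marks, order))  # [1, 1] [5, 30]
```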
Nominal Scale
Numbers are assigned as labels or tags to identify and classify objects
Numbers do not reflect the amount of the characteristic possessed by the objects
Counting is the permissible operation on a nominal scale, since the numbers are arbitrary
Only limited statistical measures can be used to summarize the data
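A minimal sketch of counting on a nominal scale, assuming hypothetical numeric codes for gender: a frequency table and the mode are meaningful, while arithmetic on the codes is not.

```python
from collections import Counter

# Hypothetical nominal codes: 1 = female, 2 = male, 3 = other. The numbers are only labels.
responses = [1, 2, 2, 1, 2, 3, 2]


def frequency_table(values):
    """Counting is the permissible operation on a nominal scale."""
    return Counter(values)


def mode(values):
    """The most frequent category."""
    return frequency_table(values).most_common(1)[0][0]


print(frequency_table(responses))  # Counter({2: 4, 1: 2, 3: 1})
print(mode(responses))  # 2
# statistics.mean(responses) would still return a number,
# but an "average gender code" describes no real category.
```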
Ordinal Scale
Numbers are assigned to objects to indicate their relative position
These numbers also indicate whether an object has more or less of the characteristic than some other object
Categories are ordered logically
Only a few statistical measures can be used to summarize these numbers
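The median is one measure commonly used for ordinal data, because it needs only the order of the categories. A minimal sketch with hypothetical letter grades, where a grade's position in a list stands for its rank:

```python
# Hypothetical ordinal categories, listed from lowest to highest.
GRADE_ORDER = ["F", "D", "C", "B", "A"]


def rank(grade):
    """Position of a grade in the order; only comparisons of ranks are meaningful."""
    return GRADE_ORDER.index(grade)


def median_grade(grades):
    """Median category: sort by rank, take the middle value (lower middle if even)."""
    ranked = sorted(grades, key=rank)
    return ranked[(len(ranked) - 1) // 2]


grades = ["B", "A", "C", "B", "D"]
print(rank("A") > rank("C"))  # True: A is higher than C
print(median_grade(grades))  # B
```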
Interval scale
Has equal intervals on the scale, but no absolute zero point
Ratio Scale
Possesses all the properties of the nominal, ordinal and interval scales
Has an absolute zero point, indicating the absence of the variable
Categories have equal intervals
Almost all descriptive measures can be used to summarize and analyze the data
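The difference between interval and ratio scales shows up with temperature. A minimal sketch with hypothetical readings: differences agree in Celsius (interval) and Kelvin (ratio), but only the Kelvin ratio is meaningful, because only Kelvin has an absolute zero.

```python
import math


def celsius_to_kelvin(c):
    """Kelvin has an absolute zero, so it is a ratio scale; Celsius is interval."""
    return c + 273.15


t1, t2 = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences agree on both scales (the interval property).
diff_c = t2 - t1
diff_k = celsius_to_kelvin(t2) - celsius_to_kelvin(t1)

# Ratios: 20 C / 10 C = 2.0, but that does not mean "twice as hot";
# the Kelvin ratio (about 1.035) is the physically meaningful one.
naive_ratio = t2 / t1
kelvin_ratio = celsius_to_kelvin(t2) / celsius_to_kelvin(t1)

print(math.isclose(diff_c, diff_k))  # True
print(naive_ratio, round(kelvin_ratio, 3))  # 2.0 1.035
```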
Scales of Measurement
Nominal: examples: ethnicity, religion, ZIP code, gender, ID; properties: distinctness; mathematical operations: none; statistical measures: mode, contingency table, chi-square
Ordinal: examples: class rank, letter grade, mineral grading; properties: distinctness and order; mathematical operations: rank order (<, >)
Interval: examples: temperature in Celsius or Fahrenheit, marks; properties: distinctness, order and difference; mathematical operations: addition and subtraction
Ratio: examples: weight, age, height, time, temperature in Kelvin; properties: distinctness, order, difference and multiplication; mathematical operations: all arithmetic operations
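The scale summary above can be encoded as a lookup, since each scale adds one property to the scale before it. A minimal sketch; the variable names and their classification are hypothetical, chosen from the slide's examples.

```python
# Each scale adds one property to the one before it.
PROPERTIES = ["distinctness", "order", "difference", "multiplication"]
SCALE_LEVEL = {"nominal": 1, "ordinal": 2, "interval": 3, "ratio": 4}

# Hypothetical variables, classified as in the examples on the slide.
VARIABLE_SCALE = {
    "gender": "nominal",
    "class_rank": "ordinal",
    "temp_celsius": "interval",
    "weight": "ratio",
}


def properties(scale):
    """Properties that hold on a given scale of measurement."""
    return PROPERTIES[: SCALE_LEVEL[scale]]


def supports(variable, prop):
    """Whether a property (and the operations relying on it) applies to a variable."""
    return prop in properties(VARIABLE_SCALE[variable])


print(properties("ordinal"))  # ['distinctness', 'order']
print(supports("temp_celsius", "multiplication"))  # False
print(supports("weight", "multiplication"))  # True
```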
Describe Data
Properties of the data: amount of data, data types, etc.
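A minimal sketch of describing a dataset, assuming it arrives as a list of Python dicts (the records are hypothetical): it reports the amount of data and the data types seen in each variable, which can expose mixed types early.

```python
def describe(records):
    """Number of rows, column names and the Python types seen in each column."""
    columns = sorted({key for rec in records for key in rec})
    types = {
        col: sorted({type(rec[col]).__name__ for rec in records if rec.get(col) is not None})
        for col in columns
    }
    return {"rows": len(records), "columns": columns, "types": types}


# Hypothetical records; "age" mixes int, float and a missing value.
data = [
    {"id": 1, "age": 23, "city": "Pune"},
    {"id": 2, "age": None, "city": "Chennai"},
    {"id": 3, "age": 31.5, "city": "Pune"},
]
print(describe(data))
```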
Explore Data
Cross tabulations, Associations & Descriptive Statistics
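A cross-tabulation counts how often each combination of two categorical variables occurs. A minimal sketch with hypothetical survey rows, using only `collections.Counter`:

```python
from collections import Counter

# Hypothetical survey rows: (gender, preferred purchase channel).
rows = [("F", "online"), ("M", "store"), ("F", "online"), ("M", "online"), ("F", "store")]


def crosstab(pairs):
    """Count each (row category, column category) combination."""
    counts = Counter(pairs)
    row_keys = sorted({r for r, _ in pairs})
    col_keys = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in col_keys] for r in row_keys]
    return row_keys, col_keys, table


row_keys, col_keys, table = crosstab(rows)
print("", *col_keys, sep="\t")
for key, cells in zip(row_keys, table):
    print(key, *cells, sep="\t")
```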
Data Preparation
Select Data
Selection of variables and data
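A minimal sketch of selecting data, assuming records held as Python dicts (hypothetical): choose the variables (columns) to keep and the records (rows) that pass a condition.

```python
# Hypothetical customer records.
customers = [
    {"id": 1, "name": "Asha", "age": 23, "city": "Pune", "notes": "prefers email"},
    {"id": 2, "name": "Ravi", "age": 41, "city": "Chennai", "notes": ""},
    {"id": 3, "name": "Meera", "age": 35, "city": "Pune", "notes": "new"},
]


def select(records, columns, keep):
    """Keep only the chosen columns for the rows where keep(row) is true."""
    return [{col: rec[col] for col in columns} for rec in records if keep(rec)]


subset = select(customers, ["id", "age"], lambda rec: rec["city"] == "Pune")
print(subset)  # [{'id': 1, 'age': 23}, {'id': 3, 'age': 35}]
```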
Clean Data
Handle Missing values
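Two common ways to handle missing values are dropping them and imputing a replacement. A minimal sketch of both, assuming `None` marks a missing value and using the mean of the observed values for imputation (one option among several):

```python
from statistics import mean

ages = [23, None, 31, 27, None, 35]  # hypothetical; None marks a missing value


def drop_missing(values):
    """Option 1: discard missing values."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Option 2: replace missing values with the mean of the observed values."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]


print(drop_missing(ages))  # [23, 31, 27, 35]
print(impute_mean(ages))
```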
Construct Data
Transformation of Variables
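A minimal sketch of constructing data through variable transformations, on a hypothetical record: a derived variable (body-mass index computed from weight and height) and a binned variable (age grouped into bands). The band boundaries are arbitrary choices for illustration.

```python
def bmi(weight_kg, height_m):
    """Derived variable: body-mass index from two existing variables."""
    return weight_kg / height_m ** 2


def age_band(age):
    """Transformed variable: a continuous age binned into categories."""
    if age < 25:
        return "under 25"
    if age < 40:
        return "25-39"
    return "40 and over"


person = {"weight_kg": 70.0, "height_m": 1.75, "age": 31}  # hypothetical record
person["bmi"] = round(bmi(person["weight_kg"], person["height_m"]), 1)
person["age_band"] = age_band(person["age"])
print(person)
```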
Integrate Data
Merging Data from different sources
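A minimal sketch of merging two sources on a shared key, assuming hypothetical CRM and sales records and left-join semantics: every CRM record is kept, and matching sales records are merged into it.

```python
# Hypothetical records from two sources sharing the key "customer_id".
crm = [{"customer_id": 1, "name": "Asha"}, {"customer_id": 2, "name": "Ravi"}]
sales = [
    {"customer_id": 1, "amount": 500},
    {"customer_id": 1, "amount": 250},
    {"customer_id": 3, "amount": 100},
]


def left_merge(left, right, key):
    """Left join: keep every left record; merge in each matching right record."""
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    merged = []
    for rec in left:
        matches = index.get(rec[key], [])
        if not matches:
            merged.append(dict(rec))  # no match: keep the left record as-is
        for match in matches:
            merged.append({**rec, **match})
    return merged


for row in left_merge(crm, sales, "customer_id"):
    print(row)
```

The sale for customer 3 has no CRM record and is dropped by a left join; whether to keep such records is a design choice for the integration step.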
Format Data
Structure of Variables
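Formatting changes how the data is laid out without changing its meaning. A minimal sketch, assuming a hypothetical downstream tool that expects CSV with a fixed column order: Python's `csv.DictWriter` reorders fields, fills missing values with empty cells, and here ignores fields outside the target layout (dropping them is a choice, not a requirement).

```python
import csv
import io

# Hypothetical target layout expected by a downstream analysis tool.
COLUMNS = ["customer_id", "name", "amount"]


def to_csv(records):
    """Write records with a fixed column order; missing values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=COLUMNS, restval="", extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


rows = [
    {"name": "Asha", "amount": 500, "customer_id": 1},
    {"customer_id": 2, "name": "Ravi", "segment": "retail"},
]
print(to_csv(rows))
```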
Explore
The Data Warehouse (digital chapter), HBSP
SPSS Introduction, spss.co.in
Review of BRM/MR course