Chapter 2a Non Structured Data Rozianiwati
Non-Structured Data
ISP610 BUSINESS DATA ANALYTICS
April 17 2
NoSQL databases
• Stands for "non-SQL", "non-relational" or "not only SQL".
• Stores and retrieves data that is not modelled in rows and columns.
• "Not only SQL": may support SQL-like query languages.
Applications of NoSQL databases
• The NoSQL distributed database infrastructure has been the solution
for handling some of the biggest data warehouses on the planet, e.g.
those of Google, Amazon and the CIA.
• Airbus
http://medianetwork.oracle.com/video/player/4662924811001
NoSQL vs SQL
1. NoSQL: Non-relational model.
   SQL: Relational model.
2. NoSQL: Key/value, graphs, columns.
3. NoSQL: New properties can be added on the fly.
   SQL: Adding a new property may require altering schemas.
4. NoSQL: Good for semi-structured, complex or nested data.
   SQL: Good for structured data.
5. NoSQL: Relationships are captured by denormalising data and presenting all data for an object in a single record.
   SQL: Relationships are captured in a normalised model, using joins to resolve references across tables.
6. NoSQL: Dynamic/flexible schema.
   SQL: Strict schema.
Case study: Building a social media website
• Users can post articles with related media such as pictures, videos or
even music.
• Users can comment on posts and give points for ratings.
• Users can see a feed of posts.
• Users can interact with the main website.
Relational model
NoSQL model
In general:
• One query.
• No JOINS.
• No schema is maintained.
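The three points above can be illustrated with a small sketch for the social media case study. It uses a plain Python dictionary in place of a real NoSQL database, and the field names (author, media, comments, points) are assumptions made up for this example.

```python
# Hypothetical post document: everything needed to show the post is nested
# inside one record, so no joins are needed to assemble it.
posts = {
    "post:1": {
        "author": "mary",
        "title": "My first post",
        "media": [{"type": "picture", "url": "sunset.jpg"}],
        "comments": [
            {"user": "george", "text": "Nice shot!", "points": 5},
            {"user": "ali", "text": "Where is this?", "points": 3},
        ],
    }
}


def load_post(post_id):
    """Fetch the whole post, including media and comments, with one lookup."""
    return posts[post_id]


post = load_post("post:1")
total_points = sum(comment["points"] for comment in post["comments"])
print(post["title"], len(post["comments"]), total_points)
```

In a relational design the same page would typically read from separate posts, media and comments tables and join them.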
Types of NoSQL databases
• Key-value
• Column / BigTable
• Document
• Graph
Key-value database
• The most basic type of NoSQL database and the backbone of many NoSQL implementations.
• The underlying structure is a hash table in which each unique key points
to a specific item of data.
• Works by matching keys with values, like a dictionary.
• Give a key (e.g. the_answer_to_life) and receive the matching value
(e.g. 42).
• The database is a global collection of key-value pairs.
• As the volume of data increases, keeping keys unique may become
more difficult.
• Examples: Riak, Amazon S3 (Dynamo), Oracle NoSQL.
KEY
The key in a key-value pair must (or at least, should) be unique. This is the
unique identifier that allows you to access the value associated with that key.
In theory, the key could be anything. But this may depend on the
DBMS. One DBMS may impose limitations while another may impose none.
In Redis for example, the maximum allowed key size is 512 MB. You can use any
binary sequence as a key, from a short string of text, to the contents of an
image file. Even the empty string is a valid key.
However, for performance reasons, you should avoid keys that are too
long. Keys that are too short can hurt readability. In any case, keys should
follow an agreed convention in order to keep things consistent.
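One possible convention is sketched below. The "entity:id:field" pattern and the make_key helper are assumptions chosen for illustration and are not prescribed by any particular DBMS. A plain dictionary stands in for the key-value store.

```python
def make_key(entity, entity_id, field):
    """Build a key such as 'user:1001:email' from its three parts."""
    parts = [str(entity), str(entity_id), str(field)]
    # Reject empty parts and parts containing the separator, so every key
    # can be split back into exactly three pieces.
    if any(not part or ":" in part for part in parts):
        raise ValueError("key parts must be non-empty and must not contain ':'")
    return ":".join(parts)


store = {}  # stand-in for a key-value store
store[make_key("user", 1001, "email")] = "mary@example.com"
store[make_key("user", 1001, "name")] = "Mary"
print(sorted(store))
```

Keys built this way stay short, readable and consistent across the application.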
THE VALUE
The value in a key-value store can be anything, such as text (long or short), a number, markup code such as HTML,
programming code such as PHP, an image, etc.
The value could also be a list, or even another key-value pair encapsulated in an object.
Some key-value DBMSs allow you to specify a data type for the value. For example, you could specify that the
value should be an integer. Other DBMSs don't provide this functionality, so the value can be of any
type.
As an example, the Redis DBMS allows you to specify the following data types:
• Binary-safe strings.
• Lists: collections of string elements sorted according to the order of insertion.
• Sets: collections of unique, unsorted string elements.
• Sorted sets: similar to sets, but every string element is associated with a floating-point value
called the score. This allows you to do things like select the top 10 or the bottom 10.
• Hashes: maps composed of fields associated with values. Both the field and the value are strings.
• Bit arrays (or simply bitmaps).
• HyperLogLogs: a probabilistic data structure used to estimate the cardinality of a set.
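To make the sorted set idea concrete, here is a minimal sketch in plain Python. It does not use Redis or any Redis client. The function names and the sample members are assumptions for illustration only.

```python
# member -> score; each member appears once, as in a sorted set.
scores = {}


def add_member(member, score):
    """Insert a member or update its score."""
    scores[member] = float(score)


def top(n):
    """Return the n (member, score) pairs with the highest scores."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:n]


def bottom(n):
    """Return the n (member, score) pairs with the lowest scores."""
    return sorted(scores.items(), key=lambda pair: pair[1])[:n]


add_member("alice", 72.5)
add_member("bob", 40)
add_member("carol", 91)
add_member("dave", 65)
add_member("alice", 72.5)  # adding the same member again does not duplicate it

print(top(2))
print(bottom(1))
```

A real sorted set keeps its members ordered as they are inserted, whereas this sketch simply sorts on every query.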
Example data
Key: 345678901
Value: JAZZ, Buy, 235, 145.06
The value is a list that contains the stock ticker, whether it's a "buy" or "sell" order, and the remaining order details.
Basic reading and writing
• Get(key): returns the value associated with the provided key.
• Put(key, value): associates the value with the key.
• Multi-get(key1, key2, ..., keyN): returns the list of values associated
with the list of keys.
• Delete(key): removes the entry for the key from the data store.
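The four operations can be sketched with an in-memory dictionary. The KeyValueStore class below is a teaching sketch, not the API of any real product. Returning None for a missing key is an assumption made here.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a Python dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Associate the value with the key, replacing any previous value."""
        self._data[key] = value

    def get(self, key):
        """Return the value for the key, or None if the key is absent."""
        return self._data.get(key)

    def multi_get(self, *keys):
        """Return the values for several keys, in the order requested."""
        return [self._data.get(key) for key in keys]

    def delete(self, key):
        """Remove the entry for the key; do nothing if it is absent."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put("345678901", ["JAZZ", "Buy", 235, 145.06])
store.put("the_answer_to_life", 42)
print(store.get("345678901"))
print(store.multi_get("the_answer_to_life", "missing"))
store.delete("the_answer_to_life")
print(store.get("the_answer_to_life"))
```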
Column/BigTable
• Extends the simple key/value model.
• Does not require a predefined table structure to work with the data.
• Works by creating collections of one or more key/value pairs.
• Can be viewed as two-dimensional arrays in which each key has one or more
key/value pairs attached to it.
• Two groups: column-store and column-family store.
• Column-family stores: Bigtable, HBase, Hypertable and Cassandra.
• Column-stores: Sybase IQ, C-Store, Vertica, VectorWise, MonetDB,
ParAccel and Infobright.
Some key benefits of columnar databases include:
• Compression. Column stores are very efficient at data compression
and/or partitioning.
• Aggregation queries. Due to their structure, columnar databases
perform particularly well with aggregation queries (such as SUM,
COUNT and AVG).
• Scalability. Columnar databases are very scalable. They are well
suited to massively parallel processing (MPP), which involves having
data spread across a large cluster of machines, often thousands of
machines.
• Fast to load and query. Columnar stores can be loaded extremely
fast. A billion-row table could be loaded within a few seconds. You can
start querying and analysing almost immediately.
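The aggregation benefit can be seen in a small sketch that lays the same data out by row and by column. The table and its column names are made up for illustration, and plain Python lists stand in for the storage engine.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "product": "tea", "qty": 3, "price": 4.0},
    {"id": 2, "product": "coffee", "qty": 1, "price": 6.5},
    {"id": 3, "product": "tea", "qty": 2, "price": 4.0},
]

# Column-oriented layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate only has to scan the one column it needs.
total_qty = sum(columns["qty"])
avg_price = sum(columns["price"]) / len(columns["price"])

print(columns["product"])
print(total_qty, round(avg_price, 2))
```

Because values of the same type sit next to each other, repeated values such as "tea" are also easier to compress.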
Column-store, position-based [figure]
Column-store, rowid-based [figure: KEY and VALUE columns]
Column-family [figure]
• The outermost keys 3PillarNoida, 3PillarCluj, 3PillarTimisoara and
3PillarFairfax are analogous to rows.
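A column family can be pictured as nested key/value pairs. The sketch below reuses the four row keys from the slide. The "address" family and its column names are assumptions, since the original figure is not available here.

```python
# row key -> column family -> column -> value
offices = {
    "3PillarNoida": {"address": {"city": "Noida", "country": "India"}},
    "3PillarCluj": {"address": {"city": "Cluj", "country": "Romania"}},
    "3PillarTimisoara": {"address": {"city": "Timisoara", "country": "Romania"}},
    # Rows do not need the same columns: only this one has a "state" column.
    "3PillarFairfax": {
        "address": {"city": "Fairfax", "state": "Virginia", "country": "USA"}
    },
}


def get_column(row_key, family, column):
    """Return one cell, or None if the row, family or column is absent."""
    return offices.get(row_key, {}).get(family, {}).get(column)


print(get_column("3PillarFairfax", "address", "state"))
print(get_column("3PillarNoida", "address", "state"))
```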
Document database
Example (Document 1):
{
  _id : 978,
  "Title" : "The Linux Command Line",
  "Author" : "William Shotts"
}
The _id field is the KEY. The remaining entries are NAME-VALUES pairs.
Documents are gathered together in collections within the database.
Collections should group documents in a way that makes sense, e.g. books, webstore, retail store, fruits.
Hence, a document database is unstructured and schemaless.
Since we are so used to relational db…
Relational Databases    Document Databases
Databases               Databases or Buckets
Tables                  Collections or Type Signifiers
Rows                    Documents
Columns                 Attributes/Names
Index                   Index
Document database
• We can store documents with different schemas in the same collection.
Example (one collection, two documents):
Document 1:
{
  _id : 1,
  "ISBN" : "978",
  "Title" : "The Linux Command Line"
}
Document 2:
{
  _id : 2,
  "ASIN" : "B00J",
  "Item" : "Cherry Barbeque Sauce"
}
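A small sketch of the same idea, with a Python list standing in for the collection. The find helper is a hypothetical name used only for this example and does not belong to any specific document database.

```python
# One collection holding two documents with different fields.
collection = [
    {"_id": 1, "ISBN": "978", "Title": "The Linux Command Line"},
    {"_id": 2, "ASIN": "B00J", "Item": "Cherry Barbeque Sauce"},
]


def find(documents, **criteria):
    """Return the documents whose fields match every given name=value pair."""
    return [
        doc
        for doc in documents
        if all(name in doc and doc[name] == value for name, value in criteria.items())
    ]


print(find(collection, ISBN="978"))
print(find(collection, Item="Cherry Barbeque Sauce"))
print(find(collection, ASIN="978"))  # no document has this ASIN
```

Documents that lack a queried field are simply skipped, so no shared schema is needed.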
Document database
{
  _id : "978",
  "Title" : "Data Science",
  "Author" : ["William Jackson", "Ben Ten"]
}
The "Author" attribute holds a list of values.
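A list-valued attribute can be queried element by element. The sketch below builds a simple author-to-titles lookup. The second document is hypothetical and is added only to make the lookup non-trivial.

```python
from collections import defaultdict

books = [
    {"_id": "978", "Title": "Data Science", "Author": ["William Jackson", "Ben Ten"]},
    # Hypothetical second document, not taken from the slides.
    {"_id": "979", "Title": "Data Engineering", "Author": ["Ben Ten"]},
]

# Map each author to the titles they appear on.
titles_by_author = defaultdict(list)
for book in books:
    for author in book["Author"]:
        titles_by_author[author].append(book["Title"])

print(titles_by_author["Ben Ten"])
print(titles_by_author["William Jackson"])
```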
Graph database
• Uses graph structures with nodes, edges and properties.
• Nodes are organised based on their relationships with one
another.
• These relationships are represented by edges between the nodes.
• Relationships define social connectivity.
• Both nodes and relationships have defined properties.
• Example: Neo4j.
Here is a comparison between the classic relational model and the graph model:
Relational model    Graph model
Rows                Vertices
Joins               Edges
Use case
• People who like this product usually like that product.
• Mary is friends with George. George likes pizza. George has visited
Japan. Thus, we can ask: which friends of Mary's friends like the food
that Mary's friend likes but have not visited the place that Mary's
friend has visited?
• You are more likely to become friends with Abu because you know Ali and
Abu is Ali's friend.
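The Mary and George question can be sketched as a traversal over a list of edges. Only the first three edges come from the slide. The edges involving Ali and Abu, the relationship labels and the function names are assumptions added so that the query has something to return.

```python
# Each edge is (source node, relationship, target node).
edges = [
    ("Mary", "FRIEND", "George"),
    ("George", "LIKES", "pizza"),
    ("George", "VISITED", "Japan"),
    # Hypothetical edges, not from the slides:
    ("George", "FRIEND", "Ali"),
    ("George", "FRIEND", "Abu"),
    ("Ali", "LIKES", "pizza"),
    ("Abu", "LIKES", "pizza"),
    ("Abu", "VISITED", "Japan"),
]


def neighbours(node, relation):
    """Return the nodes linked to `node` by `relation`."""
    found = set()
    for source, rel, target in edges:
        if rel != relation:
            continue
        if source == node:
            found.add(target)
        elif relation == "FRIEND" and target == node:
            found.add(source)  # friendship is treated as mutual
    return found


def suggestions(person):
    """Friends of friends who share a liked food but not a visited place."""
    result = set()
    for friend in neighbours(person, "FRIEND"):
        foods = neighbours(friend, "LIKES")
        places = neighbours(friend, "VISITED")
        for candidate in neighbours(friend, "FRIEND") - {person}:
            shares_food = bool(neighbours(candidate, "LIKES") & foods)
            shares_place = bool(neighbours(candidate, "VISITED") & places)
            if shares_food and not shares_place:
                result.add(candidate)
    return result


print(suggestions("Mary"))
```

The query walks edges directly from node to node, where a relational design would need several joins.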
Graph database
SOCIAL NETWORK
QUESTIONS:
• Convert the XML script into a key-value database, a column-family
database and a document database.
• Convert the relational database into a document database.