Subject: Big-Data Analytics (CSE-420) Class: B.Tech (CSE) Semester: 6 Lecture No. 13

The document discusses concepts related to big data analytics including Aerospike, a new generation key-value store, and AsterixDB, a database management system for semi-structured data. It provides details on the architecture of Aerospike, including its use of primary and secondary indexes, and transactions. It also shows how AsterixDB can model semi-structured data like JSON documents using types and handles nested elements.

Uploaded by

Mdim


Subject: Big-Data Analytics (CSE-420)

Class: B.Tech (CSE)

Semester: 6th
Lecture No. 13
At the end of this lecture, students will be able to understand the concepts of:

• Aerospike: (A New Generation KV Store)

• Architecture of Aerospike

• Transaction in Aerospike

• AsterixDB
• Aerospike: (A New Generation KV Store):--

• Large amounts of data should be accessible at any point in time.

• Aerospike can interoperate with Hadoop-based systems, with Spark, or with a real-time data source.

• It can exchange large volumes of data with any such source, and it serves fast lookups and queries to the application server.
• Aerospike: (High Level Architecture Diagram):--
• The "fast path" essentially refers to the left side of the architecture diagram.

1. The client system processes transactions; the data is primarily managed in a primary index as a key-value store.

2. This index stays in memory for operational purposes. However, the system also interacts with the storage layer for persistence.

3. The storage layer uses three kinds of storage: in-memory DRAM, regular spinning disk, and flash/SSD for fast loading of data when needed.
• Aerospike: (High Level Architecture Diagram):--

4. Aerospike builds secondary indexes on non-primary keys. (A non-primary key is a key attribute that makes a tuple unique but has not been chosen as the primary key.)

5. In Aerospike, secondary indexes are stored in main memory; they are built on every node in the cluster and co-located with the primary index.
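The index layout described in points 1 to 5 can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Aerospike's implementation: the class `TinyKVNode`, its methods, and the append-only list standing in for the storage layer are all hypothetical names introduced for illustration.

```python
from collections import defaultdict


class TinyKVNode:
    """Toy single node: in-memory indexes over an append-only storage list."""

    def __init__(self, indexed_bins=()):
        self.indexed_bins = tuple(indexed_bins)   # bins that get a secondary index
        self.storage = []                         # stand-in for the SSD/disk layer
        self.primary_index = {}                   # key -> position in storage
        self.secondary_index = defaultdict(set)   # (bin, value) -> set of keys

    def get(self, key):
        """Look up a record through the in-memory primary index."""
        pos = self.primary_index.get(key)
        return None if pos is None else self.storage[pos]

    def put(self, key, record):
        """Persist the record, then update both in-memory indexes."""
        old = self.get(key)
        if old is not None:
            # Remove stale secondary-index entries left by the previous version.
            for bin_name in self.indexed_bins:
                if bin_name in old:
                    self.secondary_index[(bin_name, old[bin_name])].discard(key)
        self.storage.append(dict(record))
        self.primary_index[key] = len(self.storage) - 1
        for bin_name in self.indexed_bins:
            if bin_name in record:
                self.secondary_index[(bin_name, record[bin_name])].add(key)

    def query(self, bin_name, value):
        """Return the keys whose indexed bin equals the given value."""
        return sorted(self.secondary_index.get((bin_name, value), set()))


node = TinyKVNode(indexed_bins=["lang"])
node.put("u1", {"name": "Gnip, Inc.", "lang": "en"})
node.put("u2", {"name": "Other", "lang": "fr"})
print(node.get("u1"), node.query("lang", "en"))
```

Because both indexes live in the same object, a secondary-index hit resolves to a record without leaving the node, which is the point of co-locating the secondary index with the primary index.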
• Querying Aerospike:--
• Aerospike uses standard data types such as:

– Standard scalars, lists, maps, geospatial data, and large objects.

o The map type is similar to the hash type in Redis and contains attribute-value pairs.

o Since it focuses on real-time web applications, it supports geospatial data (such as latitude and longitude).

• Aerospike also provides a more declarative language, AQL, which looks very similar to SQL.
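To make the data types concrete, here is a minimal Python sketch of records that mix scalars, a list, a map, and a geospatial point, together with the kind of "within a radius" filter a geospatial query expresses. It does not use the Aerospike client or AQL syntax; the record layout, the field names, and the helper functions are assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in km."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical records: scalar (id, name), list (tags), map (attrs), geospatial (loc).
records = [
    {"id": 1, "name": "cafe", "tags": ["coffee", "wifi"], "attrs": {"seats": 20}, "loc": (37.7749, -122.4194)},
    {"id": 2, "name": "bakery", "tags": ["bread"], "attrs": {"seats": 8}, "loc": (37.8044, -122.2712)},
    {"id": 3, "name": "diner", "tags": ["burgers"], "attrs": {"seats": 40}, "loc": (40.7128, -74.0060)},
]


def within_radius(rows, center, radius_km):
    """Return the ids of rows whose location lies within radius_km of center."""
    lat, lon = center
    return [row["id"] for row in rows if haversine_km(lat, lon, *row["loc"]) <= radius_km]


print(within_radius(records, (37.7749, -122.4194), 20))
```

A declarative language lets the application state such a predicate and leave the evaluation strategy, for example use of a geospatial index, to the database.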
• Transactions in Aerospike:--
• Aerospike ensures ACID properties.

o Consistency: this means two different things. The first is ensuring that all constraints, such as domain constraints, are satisfied.

• The second applies to distributed systems: ensuring that all copies of a data item within a cluster are in sync. (Aerospike uses synchronous writes to replicas.)

o Durability: this is achieved by storing data on flash/SSD on every node and performing direct reads from flash memory.

• Durability is also maintained through replication, because multiple copies of the data exist.
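The idea of a synchronous write to replicas can be sketched as follows. This is a simplified illustration, not Aerospike's protocol: the `Replica` and `Master` classes and the `available` flag are hypothetical, and real systems handle partial failures with far more machinery.

```python
class Replica:
    """A replica holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True   # assumption: a flag that simulates reachability

    def apply(self, key, value):
        self.data[key] = value


class Master:
    """Acknowledges a write only after every replica has applied it."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = list(replicas)

    def write(self, key, value):
        # Reject the write up front if any replica cannot take part,
        # so that no copy ever diverges from the others.
        if not all(replica.available for replica in self.replicas):
            return False
        for replica in self.replicas:
            replica.apply(key, value)   # synchronous: finish before acknowledging
        self.data[key] = value
        return True


r1, r2 = Replica("r1"), Replica("r2")
master = Master([r1, r2])
print(master.write("tweet:1", "hello"), r1.data == r2.data == master.data)
```

The trade-off is latency: the client waits for the slowest replica, but after an acknowledged write every copy in the cluster holds the same value.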
• AsterixDB: (A DBMS for semi-structured data):--
• A newer approach, AsterixDB, has been incubated by Apache.

AsterixDB was originally conceived at the University of California. Since it is a full-fledged BDMS (Big Data Management System), it provides ACID guarantees.

• To understand the basic design of AsterixDB, let's consider the following incomplete JSON, taken from an actual tweet.
An abbreviated Tweet

{
  "created_at": "Thu Oct 21 16:02:46 +0000 2010",
  "entities": {
    "user_mentions": [
      {
        "name": "Gnip, Inc.",
        "screen_name": "gnip"
      }
    ]
  },
  "text": "what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories",
  "id": 28039652140,
  "geo": null,
  "retweet_count": null,
  "in_reply_to_user_id": null,
  "user": {
    "name": "Gnip, Inc.",
    "lang": "en",
    "followers_count": 260,
    "friends_count": 71,
    "statuses_count": 302,
    "screen_name": "gnip"
  }
}
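As a quick illustration of what "nested" means in practice, the sketch below parses a trimmed copy of the tweet with Python's standard `json` module and reaches into the embedded `entities` and `user` objects. The variable names are chosen here for illustration only.

```python
import json

# A trimmed, valid copy of the abbreviated tweet.
tweet_json = """
{
  "created_at": "Thu Oct 21 16:02:46 +0000 2010",
  "entities": {"user_mentions": [{"name": "Gnip, Inc.", "screen_name": "gnip"}]},
  "text": "what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories",
  "id": 28039652140,
  "geo": null,
  "retweet_count": null,
  "in_reply_to_user_id": null,
  "user": {"name": "Gnip, Inc.", "lang": "en", "followers_count": 260,
           "friends_count": 71, "statuses_count": 302, "screen_name": "gnip"}
}
"""

tweet = json.loads(tweet_json)

# Nested list of objects inside "entities".
mentions = [m["screen_name"] for m in tweet["entities"]["user_mentions"]]

# Nested object "user".
followers = tweet["user"]["followers_count"]

# Attributes present in the document but carrying no value (JSON null).
null_fields = sorted(name for name, value in tweet.items() if value is None)

print(mentions, followers, null_fields)
```

The null-valued attributes are the reason a schema for such data needs a way to mark attributes as optional.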
• AsterixDB: (A DBMS for semi-structured data):--
• In the tweet on the previous slide, two parts, entities and user (shown in blue on the slide), are nested; that is, they are embedded within the structure of the tweet.

• If we represent part of this schema in AsterixDB, it would look like the definitions on the next slide.

create dataverse LittleTwitterDemo;
use dataverse LittleTwitterDemo;

/* Open type: instances may carry attributes beyond those listed. */
create type TwitterUserType as open {
    screen-name: string,
    lang: string,
    friends_count: int32,
    statuses_count: int32,
    id: int32,
    followers_count: int32
}

/* Closed type: instances may contain only the attributes listed. */
create type TweetMessageType as closed {
    tweetid: string,
    user: TwitterUserType,
    geo: point?,
    created_at: datetime,
    referred-topics: {{ string }},
    text: string
}

create dataset TweetMessages(TweetMessageType)
primary key tweetid;
• AsterixDB: (A DBMS for semi-structured data):--
• Top part:

This looks like a standard database table and represents the user portion of the JSON object that we highlighted.

Open: an instance may contain additional attributes beyond those declared in the type.

• Bottom part:

This represents the message. Instead of nesting the user as JSON does, the user attribute (highlighted in blue on the slide) is declared with a type: TwitterUserType.

Closed: a data instance must have only the attributes declared in the schema.

geo: point? : the ? marks the attribute as optional, meaning that not every instance needs to have it.
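The difference between open, closed, and optional can be sketched as a small validator. This is an illustrative model only, not how AsterixDB checks types: the `validate` function, the field tables, and the use of Python types in place of AsterixDB types are all assumptions.

```python
def validate(record, fields, is_open):
    """Check a record against a type description.

    fields maps an attribute name to (python_type, optional_flag).
    Returns a list of error messages; an empty list means the record conforms.
    """
    errors = []
    for name, (expected_type, optional) in fields.items():
        if name not in record or record[name] is None:
            if not optional:
                errors.append(f"missing required field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    if not is_open:
        # A closed type rejects any attribute that was not declared.
        for name in record:
            if name not in fields:
                errors.append(f"undeclared field in closed type: {name}")
    return errors


# Hypothetical, shortened versions of the two types from the slide.
USER_FIELDS = {"screen-name": (str, False), "lang": (str, False), "followers_count": (int, False)}
TWEET_FIELDS = {"tweetid": (str, False), "text": (str, False), "geo": (tuple, True)}

# Open type: the extra "verified" attribute is accepted.
print(validate({"screen-name": "gnip", "lang": "en", "followers_count": 260, "verified": True},
               USER_FIELDS, is_open=True))

# Closed type: the optional geo may be absent, but an extra attribute is rejected.
print(validate({"tweetid": "1", "text": "hi", "extra": 1}, TWEET_FIELDS, is_open=False))
```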
• for $user in dataset TwitterUsers order by $user.followers_count desc, $user.lang asc return $user
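For readers more familiar with general-purpose languages, the ordering this query asks for (followers descending, then language ascending) can be reproduced in Python. The sample users below are made up for illustration.

```python
# Hypothetical sample of the TwitterUsers dataset.
users = [
    {"screen_name": "a", "lang": "fr", "followers_count": 260},
    {"screen_name": "b", "lang": "en", "followers_count": 260},
    {"screen_name": "c", "lang": "en", "followers_count": 900},
    {"screen_name": "d", "lang": "de", "followers_count": 10},
]

# Negating the count sorts it descending; lang breaks ties in ascending order.
ordered = sorted(users, key=lambda u: (-u["followers_count"], u["lang"]))
names = [u["screen_name"] for u in ordered]
print(names)
```

Users "a" and "b" tie on followers_count, so the secondary key decides their order: "en" sorts before "fr".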

• SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key = b.key)
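The semantics of this left outer join can be sketched in Python as a simple hash join: every row of a appears in the result, paired with each matching row of b, or with None when there is no match. The function name and the sample rows are illustrative assumptions.

```python
from collections import defaultdict


def left_outer_join(a_rows, b_rows):
    """Return (a.val, b.val) pairs; b.val is None when a.key has no match in b."""
    b_vals_by_key = defaultdict(list)
    for row in b_rows:
        b_vals_by_key[row["key"]].append(row["val"])

    result = []
    for row in a_rows:
        matches = b_vals_by_key.get(row["key"])
        if matches:
            for b_val in matches:
                result.append((row["val"], b_val))
        else:
            result.append((row["val"], None))   # unmatched left row is kept
    return result


a = [{"key": 1, "val": "a1"}, {"key": 2, "val": "a2"}, {"key": 3, "val": "a3"}]
b = [{"key": 1, "val": "b1"}, {"key": 1, "val": "b1x"}, {"key": 3, "val": "b3"}, {"key": 4, "val": "b4"}]
print(left_outer_join(a, b))
```

Rows of b whose key never occurs in a (key 4 here) do not appear in the output, which is what distinguishes a left outer join from a full outer join.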

• Hyracks Job Management
• From files in a directory path
create dataset Tweets (Tweet)
primary key id;

create feed TestFileFeed using localfs
(("path"="127.0.0.1:///Users/adc/text/"),
("format"="adm"), ("type-name"="Tweet"),
("expression"=".*\\.adm"));

connect feed TestFileFeed to dataset Tweets;

• From an external API
use dataverse feeds;
create dataset Tweets (Tweet)
primary key id;
create feed TwitterFeed if not exists using "push_twitter"
(("type-name"="Tweet"),
("consumer.key"="some-key"),
("consumer.secret"="some-secret"),
("access.token"="some-token"),
("access.token.secret"="some-token-secret"));
connect feed TwitterFeed to dataset Tweets;
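Conceptually, the file-based feed scans a directory, keeps the files whose names match the expression, and inserts each record into the dataset under its primary key. The Python sketch below mimics that flow under loud assumptions: it is not AsterixDB's feed adapter, the function `ingest_directory` is hypothetical, and one JSON object per line stands in for the ADM format.

```python
import json
import re
import tempfile
from pathlib import Path


def ingest_directory(path, expression, dataset):
    """Load records from files whose names fully match the regular expression.

    dataset is a dict keyed by the primary key "id". Returns the record count.
    Assumption: one JSON object per line, used here in place of ADM.
    """
    pattern = re.compile(expression)
    loaded = 0
    for file in sorted(Path(path).iterdir()):
        if not file.is_file() or not pattern.fullmatch(file.name):
            continue
        for line in file.read_text(encoding="utf-8").splitlines():
            if line.strip():
                record = json.loads(line)
                dataset[record["id"]] = record   # primary key id
                loaded += 1
    return loaded


# Self-contained demo using a temporary directory instead of a real path.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "part1.adm").write_text(
        '{"id": 1, "text": "hello"}\n{"id": 2, "text": "world"}\n', encoding="utf-8")
    Path(tmp, "notes.txt").write_text('{"id": 99, "text": "ignored"}\n', encoding="utf-8")
    tweets = {}
    count = ingest_directory(tmp, r".*\.adm", tweets)

print(count, sorted(tweets))
```

A real feed keeps running and picks up new data as it arrives; this sketch performs a single pass only.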
