Teradata Architecture
LEVEL – LEARNER
Module 1: Teradata basics
Objectives:
After completing this chapter you will be able to answer the following questions:
• What is Teradata?
• What are the unique features of Teradata?
• What are the Teradata components and their functions?
• What is Teradata Architecture?
Introduction to Teradata Database
Teradata is a relational database management system that drives a company’s data warehouse
Compatible with industry standards (ANSI compliant)
The architecture supports both single-node Symmetric Multiprocessing (SMP) systems and
multinode Massively Parallel Processing (MPP) systems
It uses parallelism to manage terabytes of data
It is built on a parallel Architecture
Its scalability ranges from 10GB to 100+TB of data
Teradata runs on UNIX MP-RAS and Windows 2000 Server platforms
It is capable of supporting many concurrent users from various platforms
over TCP/IP or IBM channel connections
Unique Features of Teradata
• Parallel processing
– Each AMP holds a portion of the data, and all AMPs process their portions in parallel
• Linear Scalability
– Double the AMPs and double the speed
• Mature Optimizer
– The PE contains Teradata's mature, cost-based Optimizer
• Automatic Data distribution
– Each table's Primary Index value is hashed, and the rows are distributed to the AMPs
automatically
• Shared Nothing Architecture
– Each AMP has its own memory, CPU, and disk, hence the name Shared Nothing
Architecture
• Single Data Store
– Teradata's scalability allows all data to reside on one system; this is called a Single Data Store
Teradata – Parallel Processing
• The rows of a Teradata table are spread across the AMPs, so each AMP can
process its own rows in parallel when a user queries the table.
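The idea can be sketched in Python as a minimal illustration, assuming a hypothetical 4-AMP system with rows dealt out round-robin (real Teradata places rows by hashing the Primary Index). Each "AMP" aggregates only its own slice, and the partial results are then combined. Python threads are used only to show the partitioning, not to measure speed.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 4  # hypothetical system size

# Hypothetical salary column of one table, dealt out round-robin to the AMPs
# (a stand-in for hash distribution by Primary Index).
salaries = [52000, 61000, 48000, 75000, 58000, 67000, 49000, 71000, 55000, 63000]
amp_slices = [salaries[amp::NUM_AMPS] for amp in range(NUM_AMPS)]

def amp_partial_aggregate(rows):
    """One AMP works only on the rows it owns: returns (sum, count)."""
    return sum(rows), len(rows)

def parallel_average(slices):
    # Every AMP aggregates its slice at the same time; the partials are then combined.
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(amp_partial_aggregate, slices))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(parallel_average(amp_slices))  # 59900.0
```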
[Figure: Parsing Engine (PE) and BYNET]
Teradata – Linear Scalability
Teradata Components
• Parsing engine (PE)
• BYNET (BanYan NETwork)
• AMP (Access Module Processor)
• Disk
What is a Node?
• Two SMP nodes connected via the BYNETs are now one Massively Parallel
Processing (MPP) system.
Teradata Functional Overview
• When a user logs into Teradata, a PE logs them in and is responsible
for their entire session.
• The PE checks the SQL syntax.
• The PE checks security, creates the EXPLAIN plan, and builds a plan for the
AMPs to follow. Hence the PE is also known as the ‘Optimizer’.
• The PE converts EBCDIC (from mainframe queries) to ASCII on the way
in, and the AMPs convert ASCII back to EBCDIC on
the way out.
• The PE always delivers the final answer set to the user.
• The PE uses the collected statistics to build the best (least-cost)
plan.
• The BYNET connects the PEs and AMPs, passing instructions and the corresponding outputs between them.
• A Teradata system has two BYNETs, ‘BYNET 0’ and ‘BYNET 1’, so that if one BYNET fails, the other
carries the instructions. Using both also speeds up communication and hence enhances query performance.
• Symmetric Multiprocessing (SMP) node – uses a software-only ("boardless") BYNET; there is no physical BYNET.
• Massively Parallel Processing (MPP) system – nodes are connected by the two physical
BYNETs.
• The BYNET is responsible for broadcast, multicast, and point-to-point communications between
nodes and virtual processors.
AMP
• AMPs are responsible for storing and retrieving rows from their assigned virtual disk (Vdisk).
• AMPs lock the tables and rows.
• AMPs sort rows and do all aggregation.
• AMPs handle all space management and space accounting.
• AMPs convert ASCII to EBCDIC when returning answer sets to the mainframe.
• In Teradata 13, the number of AMP Worker Tasks (AWTs) per AMP was increased for better performance.
All Teradata tables are spread across ALL AMPs
Disk Array
• Each AMP vproc is assigned its own disk space (its Vdisk)
• For example, a Vdisk might hold 119 GB of disk space
Test Your Understanding
Questions:
Summary
Module 2: RDBMS Overview
Objectives:
After completing this chapter you will be able to answer the following questions:
• What is an RDBMS?
• What is logical/relational modeling?
• What is the relationship between primary and foreign keys?
• What are the advantages of relational modeling?
Introduction to RDBMS
Flexibility: Data held in different tables can easily be linked and extracted using relational operators such as
project and join, giving the information in the desired form.
Security: Security control and authorization can be implemented more easily by moving sensitive attributes
of a table into a separate relation with its own authorization controls. If authorization requirements permit,
that attribute can be joined back with the others to enable full information retrieval.
Data Independence: Data independence is achieved more easily with the normalized structure used in a relational
database than with the more complicated tree or network structures.
Data Manipulation Language: In the relational approach, queries can easily be answered with a language based on
relational algebra and relational calculus, e.g., SQL. For data organized in other structures, the query language
becomes either complex or extremely limited in its capabilities.
Cater for future requirements: Because data is held in separate tables, it is simple to add records that are not yet
needed but may be in the future. For example, the city table could be expanded to include every city and town in
the country, even though no other records use them all yet. A flat-file database cannot do this.
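A minimal sketch of these ideas, using Python's built-in sqlite3 module rather than Teradata: a city table referenced by a foreign key, a join that projects two columns, and a rejected row whose foreign key has no matching primary key. The customer table and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT NOT NULL)")
conn.execute("""CREATE TABLE customer (
                    cust_id   INTEGER PRIMARY KEY,
                    cust_name TEXT NOT NULL,
                    city_id   INTEGER REFERENCES city(city_id))""")

# The city table can already hold cities that no customer uses yet.
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [(1, "Chennai"), (2, "Pune"), (3, "Kolkata")])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(101, "Asha", 1), (102, "Ravi", 2)])

# Join on foreign key = primary key, then project two columns.
rows = conn.execute("""SELECT c.cust_name, ci.city_name
                       FROM customer AS c JOIN city AS ci ON c.city_id = ci.city_id
                       ORDER BY c.cust_id""").fetchall()
print(rows)  # [('Asha', 'Chennai'), ('Ravi', 'Pune')]

# A foreign key value with no matching primary key is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (103, 'Meena', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("orphan row rejected:", fk_rejected)
```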
Module 3: Teradata Index
Objectives:
After completing this chapter you will be able to answer the following questions:
• What is Primary Index?
• What is Secondary Index?
• How are data rows stored and retrieved?
Indexing
The index types covered here are:
• Unique Primary Index (UPI)
• Non-Unique Primary Index (NUPI)
• Unique Secondary Index (USI)
• Non-Unique Secondary Index (NUSI)
Primary Keys vs. Primary Indexes
When the Primary Index is not specified (and no primary key or unique constraint is defined), Teradata
defaults to the first column in the table and defines it as a Non-Unique Primary Index.
Unique Primary Index (UPI)
• When the Primary Index column is used in the SQL WHERE clause, only one AMP
retrieves the row
• A UPI access is a one-AMP operation and returns at most one row
Non-Unique Primary Index (NUPI)
• A Non-Unique Primary Index (NUPI) will have duplicate values grouped together on the same
AMP, so the data will always be somewhat skewed (uneven). A moderate amount of such skew is reasonable.
• When the Primary Index column is used in the SQL WHERE clause, only one AMP
retrieves the rows.
• A NUPI access is a one-AMP operation and can return multiple rows.
Multi-Column Primary Index
A table can have only one Primary Index, but you can combine up to 64
columns to form one Multi-Column Primary Index.
• When all of the Primary Index columns are used with equality conditions in the SQL WHERE clause, only
one AMP retrieves the rows
NO Primary Index
• Every Teradata system has one Hash Map with about a million buckets. Each bucket contains an AMP
number.
Placing Rows on AMPs
• In the example, Emp_No 1001 (the Primary Index value) is hashed and the output is a Row
Hash of 13. Teradata counts over to bucket 13 in the Hash Map, which contains the number one
(1). This means that the row goes to AMP 1.
• Emp_No 1002 (the Primary Index value) is hashed and the output is a Row Hash of 5. Teradata
counts over to bucket 5 in the Hash Map, which contains the number two (2). This
means that the row goes to AMP 2.
• There is one Hashing Formula in Teradata, and it is consistent.
• Hash the Primary Index Value for a row with the Hash Formula.
• The output of the Hash Formula is a 32-bit Row Hash.
• Take the Row Hash and find its corresponding bucket in the Hash Map.
• Send the row and its Row Hash to the AMP listed in the Hash Map Bucket.
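The steps above can be sketched in Python as follows. This is only a sketch: Teradata's real Hash Formula is internal, so zlib.crc32 stands in for it, and the 4-AMP system, the round-robin Hash Map, and the use of the high-order 20 bits as the bucket number are assumptions made for illustration.

```python
import zlib

NUM_BUCKETS = 2 ** 20  # "about a million" buckets (1,048,576), assumed here
NUM_AMPS = 4           # hypothetical system size

# Hypothetical Hash Map: bucket number -> AMP number (filled round-robin here).
hash_map = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]

def row_hash(pi_value) -> int:
    """Hash the Primary Index value into a 32-bit Row Hash (CRC-32 stand-in)."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF

def bucket_for(rh: int) -> int:
    """Assumed: the bucket number is the high-order 20 bits of the Row Hash."""
    return rh >> 12

def amp_for(pi_value) -> int:
    """The AMP number stored in that Hash Map bucket receives the row."""
    return hash_map[bucket_for(row_hash(pi_value))]

for emp_no in (1001, 1002, 1003, 1004):
    rh = row_hash(emp_no)
    print(emp_no, f"row hash={rh:#010x}", f"bucket={bucket_for(rh)}", f"AMP {amp_for(emp_no)}")
```

Because the same input always produces the same Row Hash, the same Primary Index value always lands on the same AMP.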
Skew Factor
• Skew refers to how rows are distributed across the AMPs. If the data is highly skewed,
some AMPs hold many more rows than others, i.e., the data is not
evenly distributed. This in turn results in poor performance, because an all-AMP operation
finishes only when its busiest AMP finishes. Indexes should therefore be chosen with care to avoid skew.
• NULL values in the Primary Index are a common cause of skew. A table with a
Unique Primary Index can have only one NULL value in the Primary Index, but a NUPI table can have
many NULL values, and all of them hash to the same AMP.
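The effect of NULLs on distribution can be sketched as follows. The skew_factor function uses a commonly quoted rule of thumb, (1 - average / maximum) × 100, applied here to row counts per AMP; the CRC-32 stand-in hash and the 4-AMP system are assumptions, not Teradata internals.

```python
from collections import Counter
import zlib

NUM_AMPS = 4  # hypothetical system size

def amp_for(pi_value) -> int:
    # Stand-in for hashing + Hash Map lookup; every None (NULL) maps to the same AMP.
    return zlib.crc32(repr(pi_value).encode("utf-8")) % NUM_AMPS

def distribution(pi_values):
    """Row count per AMP for a list of Primary Index values."""
    counts = Counter(amp_for(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]

def skew_factor(rows_per_amp):
    """Rule of thumb: (1 - average / maximum) * 100; 0 means perfectly even."""
    avg = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - avg / max(rows_per_amp)) * 100

upi_values = list(range(1, 1001))                  # 1,000 distinct values
nupi_values = list(range(1, 601)) + [None] * 400   # 400 rows with a NULL PI

for label, values in (("UPI", upi_values), ("NUPI with NULLs", nupi_values)):
    dist = distribution(values)
    print(label, dist, round(skew_factor(dist), 1))
```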
Uniqueness Value
• Each AMP places a Uniqueness Value after the Row Hash to track
duplicate values.
• The Hash Formula is consistent, so every Smith has the same Row Hash,
and the same goes for each Jones and each Patel. Therefore, duplicate
values land on the same AMP.
• The Row-ID consists of the Row Hash of the Primary Index value plus the
Uniqueness Value.
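How an AMP could assign Uniqueness Values can be sketched as below, assuming a hypothetical Amp class, a CRC-32 stand-in for the Hash Formula, and (for brevity) a single AMP that receives every row.

```python
from collections import defaultdict
import zlib

def row_hash(pi_value) -> int:
    # Stand-in 32-bit hash (CRC-32); Teradata's real Hash Formula is internal.
    return zlib.crc32(str(pi_value).encode("utf-8"))

class Amp:
    """Hypothetical AMP that numbers duplicate Row Hashes with a Uniqueness Value."""

    def __init__(self):
        self.last_uniq = defaultdict(int)  # highest Uniqueness Value used per Row Hash
        self.rows = {}                     # Row-ID -> row

    def insert(self, pi_value, row):
        rh = row_hash(pi_value)
        self.last_uniq[rh] += 1
        row_id = (rh, self.last_uniq[rh])  # Row-ID = (Row Hash, Uniqueness Value)
        self.rows[row_id] = row
        return row_id

# Simplification: every row goes to this one AMP; the Hash Map step is skipped.
amp = Amp()
row_ids = [(name, amp.insert(name, {"last_name": name}))
           for name in ["Smith", "Jones", "Smith", "Patel", "Smith"]]
for name, row_id in row_ids:
    print(name, row_id)
```

All three Smith rows share one Row Hash and differ only in their Uniqueness Values (1, 2, 3).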
Row ID
AMPs keep rows sorted by Row-ID so that like data is grouped together and rows can be
found with binary searches.
Example
Plan:
1. The PE sees that the last name column is the Primary Index.
2. It hashes ‘Smith’ to get the Row Hash.
3. The Row Hash is 7.
4. It counts over to bucket 7 in the Hash Map, which says
AMP 1.
5. It passes a message through the BYNET to AMP 1 to
retrieve the rows with Row Hash 7.
6. AMP 1 brings back all columns for the rows with Row Hash 7 (‘Smith’).
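Steps 4 to 6 rely on the AMP keeping its rows sorted by Row-ID. A minimal sketch with Python's bisect module, assuming made-up Row Hash values (7 for every Smith, as in the plan above) and Uniqueness Values that start at 1:

```python
import bisect

# Rows on one hypothetical AMP, kept sorted by Row-ID = (Row Hash, Uniqueness Value).
# The Row Hash values are made up; every Smith shares Row Hash 7.
amp_rows = sorted([
    ((7, 1), "Smith, John"),
    ((3, 1), "Jones, Ann"),
    ((7, 2), "Smith, Mary"),
    ((12, 1), "Patel, Raj"),
    ((7, 3), "Smith, Lee"),
])
row_ids = [row_id for row_id, _ in amp_rows]

def rows_for_hash(target_hash):
    """Binary-search the sorted Row-IDs for all rows with one Row Hash."""
    # Uniqueness Values start at 1, so (hash, 0) sorts before every row with that hash.
    lo = bisect.bisect_left(row_ids, (target_hash, 0))
    hi = bisect.bisect_left(row_ids, (target_hash + 1, 0))
    return [row for _, row in amp_rows[lo:hi]]

print(rows_for_hash(7))  # ['Smith, John', 'Smith, Mary', 'Smith, Lee']
```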
Binary Search - Example
Emp_no is a USI.
The PE hashes 1004 to see which AMP holds the row in the USI subtable (AMP 3).
The PE has the BYNET contact AMP 3, which retrieves the subtable row for 1004 (a single-AMP operation).
AMP 3 passes the Row-ID of the base table row (1,4) back up to the PE.
The PE uses that Row-ID to find the base table row with another single-AMP retrieve, so a USI access is a two-AMP operation.
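The two-AMP USI path can be sketched as follows. Everything here is hypothetical: a 4-AMP layout, a CRC-32 stand-in for the Hash Formula, a modulo stand-in for the Hash Map, a NUPI on last name for the base table, and a messages list that records which AMPs are contacted.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size

def row_hash(value) -> int:
    # Stand-in 32-bit hash (CRC-32), not Teradata's real Hash Formula.
    return zlib.crc32(str(value).encode("utf-8"))

def amp_for_hash(rh: int) -> int:
    # Stand-in for the Hash Map lookup.
    return rh % NUM_AMPS

amps = [{"base": {}, "usi": {}} for _ in range(NUM_AMPS)]
messages = []  # AMPs contacted through the (simulated) BYNET

def insert(emp_no, last_name):
    rh = row_hash(last_name)                      # base table PI: last name (NUPI)
    base = amps[amp_for_hash(rh)]["base"]
    uniq = 1 + sum(1 for h, _ in base if h == rh)
    row_id = (rh, uniq)
    base[row_id] = {"emp_no": emp_no, "last_name": last_name}
    # The USI subtable row is placed by hashing the USI value, often on another AMP.
    amps[amp_for_hash(row_hash(emp_no))]["usi"][emp_no] = row_id

def usi_lookup(emp_no):
    sub_amp = amp_for_hash(row_hash(emp_no))      # first AMP: search the USI subtable
    messages.append(sub_amp)
    row_id = amps[sub_amp]["usi"].get(emp_no)
    if row_id is None:
        return None
    base_amp = amp_for_hash(row_id[0])            # second AMP: Row-ID leads to the base row
    messages.append(base_amp)
    return amps[base_amp]["base"][row_id]

for emp_no, name in [(1001, "Smith"), (1002, "Jones"), (1003, "Smith"), (1004, "Patel")]:
    insert(emp_no, name)

messages.clear()
result = usi_lookup(1004)
print(result, "AMPs contacted:", messages)
```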
• NUSI subtable rows point to base rows on the same AMP, hence a NUSI is
AMP-local.
• A NUSI query always searches all AMPs, but the intent is to avoid a Full
Table Scan. If there are 50 AMPs, then a minimum of 50 binary searches
are done.
How the Parsing Engine Uses the NUSI Subtable
First_name is a NUSI.
The PE orders each AMP to check whether it has ‘Kyle’ in its NUSI subtable.
Each AMP simultaneously performs a binary search on its NUSI subtable.
Each AMP that finds ‘Kyle’ then retrieves the matching base rows, as ordered by the PE.
If there are 50 AMPs, then all 50 AMPs perform a binary search simultaneously, and those that find ‘Kyle’
perform another binary search on the base table.
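A sketch of the NUSI access path, assuming a hypothetical 4-AMP system where row placement is hard-coded. Each AMP's NUSI subtable only points at base rows on that same AMP; a dictionary lookup stands in for the binary search, and the AMPs are visited in a loop rather than truly in parallel.

```python
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size

# Each AMP holds base rows plus its own NUSI subtable on first name.
amps = [{"base": {}, "nusi": defaultdict(list)} for _ in range(NUM_AMPS)]

def insert(amp_no, row_id, first_name, last_name):
    """Row placement is hard-coded (amp_no) instead of hashing, for brevity."""
    amp = amps[amp_no]
    amp["base"][row_id] = (first_name, last_name)
    amp["nusi"][first_name].append(row_id)  # subtable entry stays on the same AMP

def nusi_lookup(first_name):
    results, amps_searched = [], 0
    for amp in amps:  # all-AMP operation (a loop here, parallel in Teradata)
        amps_searched += 1
        for row_id in amp["nusi"].get(first_name, []):  # stand-in for the binary search
            results.append(amp["base"][row_id])          # AMP-local base row fetch
    return results, amps_searched

insert(0, (10, 1), "Kyle", "Smith")
insert(1, (15, 1), "Mary", "Patel")
insert(2, (22, 1), "Kyle", "Jones")

rows, searched = nusi_lookup("Kyle")
print(rows, searched)  # [('Kyle', 'Smith'), ('Kyle', 'Jones')] 4
```

Even when only two AMPs hold a ‘Kyle’, every AMP is asked to search, which is why a NUSI access is an all-AMP operation.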
Summary
Objectives:
After completing this chapter, you will be able to answer the following
questions:
• What are a Teradata database and a Teradata user?
• How is space allocated to Teradata objects?
• What is the hierarchy of objects in a Teradata system?
Space
Syntax:
CREATE DATABASE new_db FROM existing_db
AS
PERMANENT = 20000000
,SPOOL = 50000000
,TEMPORARY = 20000000;
Syntax:
CREATE USER new_user FROM existing_user
AS
PERMANENT = 10000000
,PASSWORD = Acdmy
,SPOOL = 50000000
,TEMPORARY = 20000000;
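The FROM clause makes the new database or user a child of an existing owner. A minimal Python sketch of that space accounting, assuming that PERM given to the child is deducted from the parent's available PERM, while SPOOL and TEMPORARY are limits that may not exceed the parent's own limits. The SpaceOwner class and the parent's sizes are hypothetical.

```python
class SpaceOwner:
    """Hypothetical model of a Teradata database or user, for space accounting only."""

    def __init__(self, name, perm, spool, temp):
        self.name = name
        self.perm = perm      # available PERM bytes
        self.spool = spool    # SPOOL limit in bytes
        self.temp = temp      # TEMPORARY limit in bytes
        self.children = []

    def create_child(self, name, perm, spool, temp):
        if perm > self.perm:
            raise ValueError(f"{self.name} has only {self.perm} bytes of PERM available")
        if spool > self.spool or temp > self.temp:
            raise ValueError(f"SPOOL/TEMPORARY limits cannot exceed those of {self.name}")
        self.perm -= perm  # assumed: PERM moves from the parent to the child
        child = SpaceOwner(name, perm, spool, temp)
        self.children.append(child)
        return child

# Hypothetical parent sizes; the child sizes match the CREATE DATABASE example.
existing_db = SpaceOwner("existing_db", perm=100_000_000, spool=200_000_000, temp=100_000_000)
new_db = existing_db.create_child("new_db", perm=20_000_000, spool=50_000_000, temp=20_000_000)
print(existing_db.perm, new_db.perm)  # 80000000 20000000

try:
    new_db.create_child("big_db", perm=30_000_000, spool=1, temp=1)
    too_big_rejected = False
except ValueError:
    too_big_rejected = True
```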
Objectives:
After completing this module you will be able to answer the following questions:
• How do locks prevent loss of data integrity?
• What are the types of locking provided by Teradata?
• What are FALLBACK tables?
Locks
Assume that four SQL statements are run against Employee_Table: the first two are SELECTs, the third is an
INSERT, and the fourth is a SELECT.
Compatibility:
• A Read lock is compatible with other Read locks and with Access locks
• A Write lock is compatible only with Access locks
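The compatibility rules can be written as a small lookup table and applied to the four Employee_Table statements. This sketch assumes that each SELECT requests a Read lock, the INSERT requests a Write lock on the table, and waiting requests are served strictly first-in, first-out, so a request never jumps ahead of one already waiting. The Exclusive row is included for completeness.

```python
# Which already-granted lock types a new request can coexist with.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def schedule(requests):
    """Grant or queue (statement, lock) requests in arrival order (no locks released)."""
    granted, waiting = [], []
    for stmt, lock in requests:
        # Assumption: strict FIFO, so nothing is granted once a request is waiting.
        if not waiting and all(held in COMPATIBLE[lock] for _, held in granted):
            granted.append((stmt, lock))
        else:
            waiting.append((stmt, lock))
    return granted, waiting

requests = [("SELECT #1", "READ"), ("SELECT #2", "READ"),
            ("INSERT", "WRITE"), ("SELECT #3", "READ")]
granted, waiting = schedule(requests)
print([s for s, _ in granted])  # ['SELECT #1', 'SELECT #2']
print([s for s, _ in waiting])  # ['INSERT', 'SELECT #3']
```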
Cliques
Fallback tables
• One AMP down
– Data fully available
• Two or more AMPs down
– In different clusters: data fully available
– In the same cluster: system halts
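The fallback rules above can be sketched as a function over a hypothetical cluster layout (two clusters of four AMPs). It reports a halt when any cluster has two or more AMPs down, matching the cases listed.

```python
from collections import Counter

def fallback_status(cluster_of, down_amps):
    """cluster_of maps AMP number -> cluster name; down_amps is a set of failed AMPs."""
    down_per_cluster = Counter(cluster_of[amp] for amp in down_amps)
    if any(n >= 2 for n in down_per_cluster.values()):
        return "system halts"
    return "data fully available"

# Hypothetical layout: 8 AMPs in two clusters of four.
cluster_of = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "B"}

print(fallback_status(cluster_of, {2}))     # data fully available
print(fallback_status(cluster_of, {1, 6}))  # data fully available (different clusters)
print(fallback_status(cluster_of, {1, 3}))  # system halts (same cluster)
```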
RAID
RAID 1 (Mirroring):
• RAID 1 provides each AMP two disks for storing data and two disks for
mirroring.
• A data disk and its mirror disk are called a mirrored pair.
• RAID 1 costs 50% of the disk space, but it ensures 99% uptime for
customers.
• If a single disk goes down, it is easily replaced and Teradata isn't even
affected.
RAID 5 (Parity):
• For every 3 blocks of data, there is a parity block on a 4th disk.
• If a disk fails, any missing block may be reconstructed using the other three
disks.
• Reconstruction of a failed disk by the array controller takes longer than with RAID 1.
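Parity reconstruction is a byte-wise XOR. A minimal sketch, assuming three data blocks and a parity block on a fourth disk as described above (real RAID 5 arrays typically rotate the parity block across disks); the block contents are made up.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks on disks 1-3 and their parity block on disk 4.
d1, d2, d3 = b"EMP1", b"EMP2", b"EMP3"
parity = xor_blocks(d1, d2, d3)

# Disk 2 fails: its block is rebuilt from the other three disks.
rebuilt_d2 = xor_blocks(d1, d3, parity)
print(rebuilt_d2)  # b'EMP2'
```

Rebuilding every block of a failed disk this way requires reading all the surviving disks, which is why RAID 5 recovery is slower than simply copying a RAID 1 mirror.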
Summary:
• RAID 1: Good Performance with disk failures. Higher cost in terms of disk
space
• RAID 5: Reduced Performance with disk failures. Lower cost in terms of
disk space
Questions
Test Your Understanding
Disclaimer: Parts of the content of this course are based on the materials available from the websites and books
listed above. The materials that can be accessed from the linked sites are not maintained by Cognizant Academy,
and we are not responsible for their contents. All trademarks, service marks, and trade names in this course
are the marks of their respective owner(s).
Change Log
Introduction to Teradata