db v2.0
The way we store and manage data has evolved significantly over time, from
manual systems to advanced digital tools. This evolution reflects the growing
need for efficiency, speed, and accessibility in handling data.
Data models have evolved over time to address the increasing complexity of
data management and processing. Each stage reflects advancements in
technology and the growing need for efficient and flexible data
representation.
With the growth of large-scale data, Big Data and NoSQL models were
developed to handle massive, diverse, and unstructured datasets. Big Data
tools process distributed data efficiently, while NoSQL databases (key-value,
document, column-family, and graph stores) prioritize scalability,
flexibility, and high performance.
4. Keys
Types of Keys
1. Primary Key
o It cannot contain NULL values and must have unique values for
every row.
o A table can have multiple candidate keys, but only one is chosen
as the primary key.
3. Composite Key
4. Foreign Key
5. Alternate Key
6. Super Key
7. Unique Key
o A unique key ensures all values in a column are unique.
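The key types above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the Department and Employee tables, their columns, and the sample values are assumptions made for illustration only. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and other database systems may behave differently in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Department (
    DepartmentID INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL UNIQUE
);
CREATE TABLE Employee (
    EmployeeID   INTEGER PRIMARY KEY,                          -- primary key
    Email        TEXT UNIQUE,                                  -- unique key
    DepartmentID INTEGER REFERENCES Department(DepartmentID)   -- foreign key
);
INSERT INTO Department VALUES (1, 'HR');
INSERT INTO Employee VALUES (100, 'alice@example.com', 1);
""")

def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_primary_key = rejected("INSERT INTO Employee VALUES (100, 'bob@example.com', 1)")
duplicate_unique_key = rejected("INSERT INTO Employee VALUES (101, 'alice@example.com', 1)")
dangling_foreign_key = rejected("INSERT INTO Employee VALUES (102, 'carol@example.com', 99)")
valid_row_rejected = rejected("INSERT INTO Employee VALUES (103, 'dave@example.com', 1)")
```

All three invalid inserts are refused, while the last row, which respects every key, is accepted.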
Dependencies in Databases
1. Functional Dependency
2. Transitive Dependency
3. Partial Dependency
4. Multivalued Dependency
1. One-to-One (1:1)
Use Case: Used when each entity has a unique counterpart in another
entity.
Implementation:
Example:
o Tables:
2. One-to-Many (1:M)
Definition: A relationship where one record in Table A can be
associated with multiple records in Table B, but each record in Table B
is related to only one record in Table A.
Implementation:
Example:
o Tables:
3. Many-to-Many (M:N)
Implementation:
Example:
o Tables:
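A many-to-many relationship is commonly implemented with a third (junction) table that holds one row per related pair. The following is a minimal sketch using Python's sqlite3 module; the Student, Course, and Enrollment tables and their data are hypothetical and not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT);
-- Junction table: one row per (student, course) pair.
CREATE TABLE Enrollment (
    StudentID INTEGER REFERENCES Student(StudentID),
    CourseID  INTEGER REFERENCES Course(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Student VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO Course  VALUES (10, 'Math'), (20, 'Science');
INSERT INTO Enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student maps to many courses...
courses_for_alice = [row[0] for row in conn.execute("""
    SELECT c.Title FROM Course c
    JOIN Enrollment e ON e.CourseID = c.CourseID
    WHERE e.StudentID = 1 ORDER BY c.Title
""")]
# ...and one course maps to many students.
students_in_math = [row[0] for row in conn.execute("""
    SELECT s.Name FROM Student s
    JOIN Enrollment e ON e.StudentID = s.StudentID
    WHERE e.CourseID = 10 ORDER BY s.Name
""")]
```

The composite primary key on the junction table prevents the same pair from being recorded twice.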
A table can have multiple indexes, including composite indexes that use
multiple attributes. However, each index is associated with only one table.
Indexes play a critical role in optimizing performance and implementing
database constraints like primary keys.
1. Entities: These are objects or concepts in the real world that can be
distinctly identified and stored in a database. For example, "Student" or
"Course" are entities. Entities are represented as rectangles in an ER
diagram.
2. Attributes
Types of Attributes:
1. Simple Attributes:
2. Composite Attributes:
3. Derived Attributes:
4. Multivalued Attributes:
3. Relationships
Types of Relationships:
1. One-to-One (1:1):
2. One-to-Many (1:M):
3. Many-to-Many (M:N):
Entity Supertype
Entity Subtype
2. Specialization Hierarchy
Definition
Types of Specialization
1. Total Specialization:
2. Partial Specialization:
Example:
Before 1NF:
StudentID  Name   Courses
1          Alice  Math, Science
2          Bob    English
After 1NF:
StudentID  Name   Course
1          Alice  Math
1          Alice  Science
2          Bob    English
2. All non-key attributes are fully dependent on the primary key (no
partial dependency).
Example:
Before 2NF:
2  English  Dr. Brown  Room 102
After 2NF:
o Student Table:
StudentID  Course
1          Math
2          English
o Course Table:
English  Dr. Brown  Room 102
1. It is in 2NF.
2. There are no transitive dependencies (non-key attributes depend
only on the primary key).
Example:
Before 3NF:
English  Dr. Brown  Room 102  Arts
After 3NF:
o Course Table:
Course   Instructor
English  Dr. Brown
o Instructor Table:
Dr. Brown  Room 102  Arts
Beyond the first three normal forms (1NF, 2NF, and 3NF), higher-level normal
forms address more complex types of data dependencies. These include
Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), Fifth
Normal Form (5NF).
1. Boyce-Codd Normal Form (BCNF)
Definition
1. It is in 3NF.
Key Concept
Example:
Before BCNF:
English  Dr. Brown  Arts
After BCNF:
o Course Table:
Course   Instructor
English  Dr. Brown
o Instructor Table:
Instructor  Department
Dr. Brown   Arts
Definition
1. It is in BCNF.
Key Concept
Example:
Before 4NF:
StudentID  Course   Hobby
1          Math     Chess
1          Science  Chess
1          Math     Painting
Here, StudentID determines both Course and Hobby, but Course and Hobby
are independent of each other.
After 4NF:
o Student-Course Table:
StudentID  Course
1          Math
1          Science
o Student-Hobby Table:
StudentID  Hobby
1          Chess
1          Painting
Definition
1. It is in 4NF.
Key Concept
Example:
S1 P1 J1
S1 P2 J1
S2 P1 J2
Here, Supplier, Part, and Project are interrelated, and the table can be
decomposed into:
Supplier-Part Table:
Supplier  Part
S1        P1
S1        P2
S2        P1
Part-Project Table:
Part  Project
P1    J1
P2    J1
P1    J2
Supplier-Project Table:
Supplier  Project
S1        J1
S2        J2
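One way to check this decomposition is to join the three smaller tables back together and compare the result with the original rows. The sketch below does that with Python's sqlite3 module; the table names (Supply, SupplierPart, PartProject, SupplierProject) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supply (Supplier TEXT, Part TEXT, Project TEXT);
INSERT INTO Supply VALUES ('S1','P1','J1'), ('S1','P2','J1'), ('S2','P1','J2');
-- The three projections listed in the example.
CREATE TABLE SupplierPart    AS SELECT DISTINCT Supplier, Part    FROM Supply;
CREATE TABLE PartProject     AS SELECT DISTINCT Part, Project     FROM Supply;
CREATE TABLE SupplierProject AS SELECT DISTINCT Supplier, Project FROM Supply;
""")

original = sorted(conn.execute("SELECT Supplier, Part, Project FROM Supply"))

# Joining only two projections produces extra (spurious) combinations.
two_way_rows = conn.execute("""
    SELECT COUNT(*) FROM SupplierPart sp
    JOIN PartProject pj ON pj.Part = sp.Part
""").fetchone()[0]

# Joining all three projections filters the spurious rows out again.
rejoined = sorted(conn.execute("""
    SELECT DISTINCT sp.Supplier, sp.Part, pj.Project
    FROM SupplierPart sp
    JOIN PartProject pj     ON pj.Part = sp.Part
    JOIN SupplierProject sj ON sj.Supplier = sp.Supplier
                           AND sj.Project  = pj.Project
"""))
```

For this sample data the two-table join yields five rows, while the three-table join returns exactly the original three, so no information is lost by the decomposition.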
Higher-level normal forms further reduce redundancy and strengthen data
integrity. Although they are rarely required in practical databases,
understanding them helps database designers handle complex scenarios and
maintain a robust schema.
11. Denormalization
1. Combining Tables
Merge two or more related tables into a single table to avoid joins.
Example:
Example:
3. Precomputing Aggregates
Example:
Example:
Example of Denormalization
Normalized Schema:
1. Customer Table:
CustomerID  Name
1           Alice
2           Bob
2. Order Table:
OrderID  CustomerID  OrderDate
101      1           2025-01-01
102      2           2025-01-02
Denormalized Schema:
CustomerOrder Table:
OrderID  CustomerID  Name   OrderDate
101      1           Alice  2025-01-01
102      2           Bob    2025-01-02
The SELECT statement is the most fundamental SQL query used to retrieve
data from a database. Below are the basic structures and examples of
SELECT queries.
Syntax
SELECT *
FROM table_name;
Example
Here, the asterisk (*) is used to select all columns from the table.
Syntax
FROM table_name;
Example
The WHERE clause is used to filter records that meet specific conditions.
Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example
You can use the AND and OR operators to combine multiple conditions in the
WHERE clause.
Syntax
FROM table_name
Example
To select employees who are either older than 30 or have the job title
'Manager':
Syntax
FROM table_name
Example
Syntax
FROM table_name;
Example:
FROM Employees;
Syntax
FROM table_name;
Suppose we have a Sales table with Quantity and UnitPrice columns, and we
want to compute the total sales for each record.
FROM Sales;
In this example:
You can also use SQL functions in computed columns. For example, if you
want to calculate the full name of employees by combining FirstName and
LastName:
FROM Employees;
This query retrieves data only from the PRODUCT table. Columns not included
in the specified table are unavailable unless additional tables are included in
the FROM clause.
For queries requiring data from multiple tables, the FROM clause must
combine tables using JOIN operations to prevent a Cartesian product,
which yields incorrect results. Proper joins (e.g., INNER JOIN, LEFT JOIN)
ensure accurate relationships between tables and retrieve the desired data.
In summary, the FROM clause is essential for defining the data source, and
its design determines the query's structure and accuracy.
The ORDER BY clause in SQL is used to sort the results of a SELECT query in
ascending or descending order. By default, the results are sorted in
ascending order, but you can specify DESC for descending order. For
example:
You can also sort on several columns in sequence (a cascading order) by
listing them in the ORDER BY clause. For instance, to order by last name,
then first name, and then middle initial:
This sorts the data in a multi-level order, similar to a phone directory. The
ORDER BY clause is also useful with derived attributes and for sorting data
in specific business scenarios like listing recent invoices or the largest
budget items first.
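A multi-level sort of this kind might look like the sketch below, run through Python's sqlite3 module. The Employees columns (LastName, FirstName, Initial, Salary) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (LastName TEXT, FirstName TEXT, Initial TEXT, Salary INTEGER);
INSERT INTO Employees VALUES
    ('Smith', 'John', 'B', 50000),
    ('Adams', 'Zoe',  'A', 70000),
    ('Smith', 'Anna', 'C', 60000),
    ('Smith', 'John', 'A', 40000);
""")

# Ties on LastName are broken by FirstName, then by Initial.
directory_order = conn.execute("""
    SELECT LastName, FirstName, Initial FROM Employees
    ORDER BY LastName, FirstName, Initial
""").fetchall()

# DESC reverses the default ascending order.
highest_paid_first = [row[0] for row in conn.execute(
    "SELECT Salary FROM Employees ORDER BY Salary DESC")]
```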
The WHERE clause in SQL is used to filter records and extract only those that
fulfill a specified condition. It can be used with comparison operators, logical
operators, and special operators to create complex filtering conditions.
Below, we’ll cover the key options available within the WHERE clause,
including conditional restrictions, comparison operators for different data
types, and logical and special operators.
The most common use of the WHERE clause is to filter records based on
conditions. These conditions can involve column values, expressions, or
functions. You can specify conditions using operators like =, >, <, >=, <=,
<>, and more.
Syntax
FROM table_name
WHERE condition;
Example
FROM Employees
=: Equal to.
FROM Employees
This query selects employees whose first name starts with the letter "A". The
% is a wildcard that matches any sequence of characters.
FROM Employees
This query selects employees whose last name does not start with the letter
"S".
When working with date attributes, comparison operators like =, <>, >, <,
>=, and <= are used to filter records based on specific date values or
ranges.
Syntax
FROM table_name
WHERE date_column comparison_operator 'date_value';
Example
FROM Employees
This query selects employees who were hired after January 1, 2020.
FROM Employees
Date Format:
AND Operator
Syntax
FROM table_name
Example
This query selects employees who have a salary greater than 50,000 and
work in the "HR" department.
OR Operator
The OR operator is used when you want to include records that meet at least
one of the conditions.
Syntax
FROM table_name
Example
FROM Employees
This query selects employees who either have a salary greater than 50,000
or work in the "HR" department.
NOT Operator
The NOT operator is used to negate a condition. It selects records where the
condition is not true.
Syntax
FROM table_name
Example
FROM Employees
Special Operators
Special operators allow you to perform more advanced filtering in the WHERE
clause. These operators include IN, BETWEEN, LIKE, IS NULL, and EXISTS.
1. IN Operator
The IN operator allows you to filter records where a column's value matches
any value in a list.
Syntax
FROM table_name
Example
FROM Employees
This query selects employees who work in either the "HR", "Finance", or "IT"
departments.
2. BETWEEN Operator
Syntax
FROM table_name
Example
FROM Employees
WHERE Salary BETWEEN 40000 AND 70000;
This query selects employees whose salary is between 40,000 and 70,000,
inclusive of both bounds.
3. LIKE Operator
The LIKE operator is used for pattern matching with wildcards (% for any
sequence of characters and _ for a single character).
Example
FROM Employees
This query selects employees whose first name starts with the letter "J".
4. IS NULL Operator
The IS NULL operator is used to filter records where a column contains NULL
values.
Syntax
FROM table_name
Example
FROM Employees
This query selects employees who do not have a manager (i.e., their
ManagerID is NULL).
5. EXISTS Operator
Syntax
SELECT column1, column2
FROM table_name
Example
FROM Employees
This query selects employees who are assigned to at least one project (i.e.,
the subquery returns results).
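The special operators can be compared side by side on a small data set. This is a sketch using Python's sqlite3 module; the Employees and Projects tables, their columns, and the rows are hypothetical, and the helper function names() exists only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT,
                        Department TEXT, Salary INTEGER, ManagerID INTEGER);
CREATE TABLE Projects (ProjectID INTEGER PRIMARY KEY, EmployeeID INTEGER);
INSERT INTO Employees VALUES
    (1, 'Jane',  'HR',      45000, NULL),
    (2, 'John',  'IT',      72000, 1),
    (3, 'Maria', 'Sales',   38000, 1),
    (4, 'Jim',   'Finance', 70000, 2);
INSERT INTO Projects VALUES (500, 2), (501, 4);
""")

def names(where):
    """Return first names matching a WHERE condition, ordered by EmployeeID."""
    sql = "SELECT FirstName FROM Employees e WHERE " + where + " ORDER BY EmployeeID"
    return [row[0] for row in conn.execute(sql)]

in_list = names("Department IN ('HR', 'Finance', 'IT')")
in_range = names("Salary BETWEEN 40000 AND 70000")  # both bounds are included
starts_with_j = names("FirstName LIKE 'J%'")
no_manager = names("ManagerID IS NULL")
on_project = names(
    "EXISTS (SELECT 1 FROM Projects p WHERE p.EmployeeID = e.EmployeeID)")
```

The WHERE text is concatenated here only because the conditions are fixed literals in the example; values that come from users should be passed as query parameters instead.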
17. JOIN Operations
In SQL, the JOIN operation is used to combine rows from two or more tables
based on a related column between them. There are variations of how to
perform joins, including Natural Join, JOIN USING, and JOIN ON. Below are
explanations and examples for each of these join operations:
Natural Join
A Natural Join automatically joins tables based on columns with the same
name and compatible data types in both tables. It eliminates duplicate
columns in the result set, only returning one instance of each matching
column.
Syntax
FROM table1
How it works:
Example
FROM Employees
Important Notes:
It’s important to ensure that the columns with the same name in both
tables are actually intended to be joined together, as the join happens
implicitly.
If the tables share no column names, a natural join has nothing to
match on; most systems then return a Cartesian product of the two
tables rather than raising an error.
The JOIN USING syntax lets you name the columns used for the join condition.
It applies when the join columns have the same name in both tables and,
unlike a natural join, makes the choice of columns explicit in the query.
Syntax
FROM table1
JOIN table2
USING (column_name);
How it works:
It will join the tables based on the specified column(s) and eliminate
duplicates of those columns in the result set.
Example
FROM Employees
JOIN Departments
USING (DepartmentID);
In this query, the Employees and Departments tables are joined based on the
DepartmentID column, and the result will not contain two separate
DepartmentID columns.
Important Notes:
The column names in the USING clause must be the same in both
tables.
You can use multiple columns in the USING clause by listing them in
parentheses, separated by commas.
FROM Employees
JOIN Departments
JOIN ON Syntax
The JOIN ON syntax is the most flexible way to perform joins in SQL. It
allows you to specify a custom join condition, even when the columns in the
tables have different names or data types.
Syntax
FROM table1
JOIN table2
ON table1.column = table2.column;
How it works:
The ON clause allows you to define a condition for how the tables
should be joined.
This condition can involve columns with different names, different data
types, or more complex expressions.
Example
FROM Employees
JOIN Departments
ON Employees.DepartmentID = Departments.ID;
In this example, the Employees table is joined with the Departments table
where DepartmentID in the Employees table matches ID in the Departments
table. The columns have different names, which is why we use ON instead of
USING.
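The three join syntaxes can be compared directly when the join column has the same name in both tables. The sketch below uses Python's sqlite3 module and assumes both tables call the column DepartmentID (unlike the ON example above, where the Departments column is named ID); the EmpName and DeptName columns are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmpName TEXT,
                        DepartmentID INTEGER);
INSERT INTO Departments VALUES (1, 'HR'), (2, 'IT');
INSERT INTO Employees VALUES (10, 'Alice', 1), (11, 'Bob', 2), (12, 'Carol', 2);
""")

query = "SELECT EmpName, DeptName FROM Employees {} ORDER BY EmpName"
natural = conn.execute(query.format("NATURAL JOIN Departments")).fetchall()
using = conn.execute(query.format("JOIN Departments USING (DepartmentID)")).fetchall()
on = conn.execute(query.format(
    "JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID")).fetchall()

# With SELECT *, USING keeps a single DepartmentID column; ON keeps both copies.
using_columns = [d[0] for d in conn.execute(
    "SELECT * FROM Employees JOIN Departments USING (DepartmentID)").description]
on_columns = [d[0] for d in conn.execute(
    "SELECT * FROM Employees JOIN Departments "
    "ON Employees.DepartmentID = Departments.DepartmentID").description]
```

All three forms return the same rows here; they differ in how explicit the join condition is and in how many copies of the join column appear in the result.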
An outer join returns rows that match the join condition, as well as
unmatched rows from one or both tables. There are three types of outer
joins:
1. Left Outer Join (LEFT JOIN): Includes all rows from the left table and
matching rows from the right table. If no match is found, NULL values
are returned for columns from the right table.
Example:
SELECT P_CODE, VENDOR.V_CODE, V_NAME
FROM VENDOR LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE;
2. Right Outer Join (RIGHT JOIN): Includes all rows from the right table
and matching rows from the left table. If no match is found, NULL
values are returned for columns from the left table.
Example:
3. Full Outer Join (FULL JOIN): Combines the results of both left and
right outer joins. It returns all rows from both tables, with NULLs where
there are no matches.
Example:
Outer joins are useful for finding unmatched rows, especially when dealing
with relationships like one-to-many (1:M), where you might need to find
vendors with no products or products with no vendors.
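Finding vendors with no products can be sketched with a left outer join plus an IS NULL test on the right-hand table. The example runs through Python's sqlite3 module and reuses the VENDOR and PRODUCT column names from the query above; the sample rows are invented for illustration. Only LEFT JOIN is shown because older SQLite versions do not support RIGHT or FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR  (V_CODE INTEGER PRIMARY KEY, V_NAME TEXT);
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, V_CODE INTEGER);
INSERT INTO VENDOR  VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO PRODUCT VALUES ('P-10', 1), ('P-11', 1), ('P-20', 3), ('P-99', NULL);
""")

# Every vendor appears at least once; unmatched vendors get NULL for P_CODE.
left_join = conn.execute("""
    SELECT P_CODE, VENDOR.V_CODE, V_NAME
    FROM VENDOR LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE
    ORDER BY VENDOR.V_CODE, P_CODE
""").fetchall()

# Keeping only the NULL-padded rows isolates vendors without products.
vendors_without_products = [row[0] for row in conn.execute("""
    SELECT V_NAME
    FROM VENDOR LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE
    WHERE PRODUCT.P_CODE IS NULL
""")]
```

Product P-99, which has no vendor, does not appear because VENDOR is the left table; swapping the table order would surface it instead.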
Cross Join
A cross join combines all rows from two tables, creating every possible pair
of rows between them.
Syntax:
SELECT column-list
FROM table1 CROSS JOIN table2;
Example:
If the INVOICE table has 8 rows and the LINE table has 18 rows, this query
returns 8 × 18 = 144 rows:
SELECT *
FROM INVOICE CROSS JOIN LINE;
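The size of a cross join is the product of the two row counts, which can be confirmed with a quick sketch in Python's sqlite3 module. The INV_NUMBER and LINE_NUMBER columns, and the way line items are spread across invoices, are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE INVOICE (INV_NUMBER INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE LINE (INV_NUMBER INTEGER, LINE_NUMBER INTEGER)")
conn.executemany("INSERT INTO INVOICE VALUES (?)", [(n,) for n in range(1, 9)])
# 18 line items spread over the 8 invoices (the distribution is arbitrary).
conn.executemany("INSERT INTO LINE VALUES (?, ?)",
                 [(i % 8 + 1, i) for i in range(18)])

# Every invoice row is paired with every line row.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM INVOICE CROSS JOIN LINE").fetchone()[0]
# A join condition keeps only the pairs that actually belong together.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM INVOICE JOIN LINE "
    "ON INVOICE.INV_NUMBER = LINE.INV_NUMBER").fetchone()[0]
```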
18. Grouping Data
The GROUP BY clause is used to group rows that have the same values in
specified columns into summary rows. It’s often used with aggregate
functions to calculate results like totals or averages for each group.
Syntax
FROM table_name
GROUP BY column1;
Example
FROM Employees
GROUP BY DepartmentID;
This query counts the number of employees in each department. The result
will show a list of department IDs with the corresponding employee count.
19. HAVING Clause
The HAVING clause in SQL is used to filter the results of a query after the
GROUP BY operation has been applied. It is similar to the WHERE clause, but
while WHERE filters rows before grouping, HAVING filters groups after the
aggregation is done.
You use the HAVING clause when you need to apply conditions on aggregated
data (like sums, counts, averages, etc.) after the GROUP BY operation. You
cannot use WHERE for filtering aggregated data because WHERE operates
before grouping.
FROM table_name
GROUP BY column1
HAVING condition;
Let's say you want to find departments with more than 10 employees:
FROM Employees
GROUP BY DepartmentID
Explanation:
FROM Employees
GROUP BY DepartmentID
Explanation:
WHERE:
HAVING:
Example:
FROM Employees
GROUP BY DepartmentID
HAVING COUNT(EmployeeID) > 5; -- This filters groups after counting
In this example:
o The WHERE clause filters out employees with a salary less than
30,000 before the grouping occurs.
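The order in which WHERE and HAVING are applied can be seen in a small sketch using Python's sqlite3 module. The Employees rows are invented so that the two filters give visibly different results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, "
             "DepartmentID INTEGER, Salary INTEGER)")
# Department 1: 6 high + 2 low salaries; department 2: 3 high; department 3: 7 low.
rows = ([(1, 45000)] * 6) + ([(1, 20000)] * 2) + ([(2, 50000)] * 3) + ([(3, 25000)] * 7)
conn.executemany("INSERT INTO Employees (DepartmentID, Salary) VALUES (?, ?)", rows)

headcount = conn.execute("""
    SELECT DepartmentID, COUNT(EmployeeID) FROM Employees
    GROUP BY DepartmentID ORDER BY DepartmentID
""").fetchall()

filtered = conn.execute("""
    SELECT DepartmentID, COUNT(EmployeeID)
    FROM Employees
    WHERE Salary >= 30000            -- row filter, applied before grouping
    GROUP BY DepartmentID
    HAVING COUNT(EmployeeID) > 5     -- group filter, applied after counting
""").fetchall()
```

Department 3 has seven employees but none pass the WHERE filter, so it never reaches the HAVING test; department 1 is counted as six, not eight.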
For example, to find vendors who do not provide products, you can use a
subquery:
Example uses:
SELECT P_CODE
FROM LINE
GROUP BY P_CODE
HAVING SUM(LINE_UNITS) > (SELECT AVG(LINE_UNITS) FROM LINE);
23. SQL Functions
SQL functions are essential for manipulating and transforming data within a
database. They are commonly used to process data elements that aren't
directly stored in the database but need to be derived. There are several
types of SQL functions, including arithmetic, string, and date functions.
String Functions:
o SELECT UPPER('hello');
o SELECT LOWER('HELLO');
o SELECT LENGTH('Hello');
o SUBSTRING(): Extracts a part of a string.
Date Functions:
o SELECT NOW();
o SELECT CURDATE();
Numeric:
Relational set operators in SQL are used to combine the results of two or
more SQL queries. These operators treat the result sets of the queries as
sets and perform operations on them, similar to how sets are manipulated in
mathematics. The result of a set operation is typically a combination of the
rows returned by multiple queries, based on specific conditions.
1. UNION
2. UNION ALL
3. INTERSECT
4. EXCEPT
1. UNION
The UNION operator is used to combine the results of two or more queries
and remove any duplicate rows from the final result set. It only returns
distinct rows.
Syntax:
FROM table1
UNION
FROM table2;
Requirements:
Example:
SELECT FirstName, LastName FROM Employees
UNION
SELECT FirstName, LastName FROM Customers;
This query returns all unique FirstName and LastName values from
both the Employees and Customers tables.
2. UNION ALL
The UNION ALL operator is similar to UNION, but it does not remove
duplicates. It combines the results of two or more queries and includes all
rows, even if they are duplicates.
Syntax:
FROM table1
UNION ALL
FROM table2;
Requirements:
o Like UNION, the queries combined by UNION ALL must have the
same number of columns and compatible data types.
Example:
SELECT FirstName, LastName FROM Employees
UNION ALL
SELECT FirstName, LastName FROM Customers;
This query returns all FirstName and LastName values from both
Employees and Customers, including any duplicates.
3. INTERSECT
The INTERSECT operator returns only the rows that appear in both result
sets. It returns the common rows between the two queries.
Syntax:
FROM table1
INTERSECT
FROM table2;
Requirements:
o Only the rows that are present in both result sets are returned.
Example:
SELECT FirstName, LastName FROM Employees
INTERSECT
SELECT FirstName, LastName FROM Customers;
This query returns only the FirstName and LastName values that
appear in both the Employees and Customers tables.
4. EXCEPT
The EXCEPT operator (MINUS in some databases, such as Oracle) returns the
rows that are in the result of the first query but not in the result of the
second. It subtracts the second result set from the first.
Syntax:
FROM table1
EXCEPT
Requirements:
o EXCEPT returns the rows from the first query that are not
present in the second query.
Example:
SELECT FirstName, LastName FROM Employees
EXCEPT
SELECT FirstName, LastName FROM Customers;
This query returns the FirstName and LastName values that are in the
Employees table but not in the Customers table.
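The four set operators can be compared on the same pair of queries. The sketch below uses Python's sqlite3 module; the names stored in Employees and Customers are invented, and the helper combine() exists only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (FirstName TEXT, LastName TEXT);
CREATE TABLE Customers (FirstName TEXT, LastName TEXT);
INSERT INTO Employees VALUES ('Alice', 'Ng'), ('Bob', 'Lee');
INSERT INTO Customers VALUES ('Bob', 'Lee'), ('Carol', 'Diaz');
""")

def combine(operator):
    """Run the two name queries joined by the given set operator."""
    sql = ("SELECT FirstName, LastName FROM Employees "
           + operator +
           " SELECT FirstName, LastName FROM Customers")
    return sorted(conn.execute(sql))

union = combine("UNION")          # distinct rows from either table
union_all = combine("UNION ALL")  # every row, duplicates kept
intersect = combine("INTERSECT")  # rows present in both tables
except_ = combine("EXCEPT")       # rows in Employees but not in Customers
```

Bob Lee appears in both tables, so he is listed once by UNION, twice by UNION ALL, alone by INTERSECT, and not at all by EXCEPT.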
Before you begin creating a database, it's important to define the structure
of the data. The database model outlines the organization of data, the
relationships between different entities, and how data will be stored and
retrieved.
Once you have a clear understanding of the database model, you can create
the actual database using the CREATE DATABASE command. This command
creates a new, empty database; its tables and other objects are defined
afterward.
Syntax:
Example:
After creating the database, you can select it for use with the USE command
(in some database systems like MySQL):
USE CompanyDB;
The schema of a database refers to its structure, including the tables, views,
indexes, and relationships. It defines how data is organized within the
database. A schema can be thought of as a blueprint for the database,
providing the organization of data, including the tables and their columns.
You can create a schema within a database using the CREATE SCHEMA
command (if needed):
Syntax:
CREATE SCHEMA schema_name;
Example:
4. Data Types
When defining the structure of tables, it's important to specify the data
types for each column. The data type determines the kind of data that can
be stored in a column (e.g., integer, string, date).
Example:
CREATE TABLE Employees (
FirstName VARCHAR(50),
LastName VARCHAR(50),
BirthDate DATE,
Salary DECIMAL(10, 2)
);
Basic Syntax:
...
);
Example:
BirthDate DATE,
Salary DECIMAL(10, 2)
);
2. SQL Constraints
SupplierID INT,
);
You can also create a table by using the CREATE TABLE AS SELECT
statement. This method allows you to create a new table based on the result
of a SELECT query. This is useful for copying data from one table to another
or creating a table that stores the result of a complex query.
Syntax:
FROM existing_table
WHERE condition;
Example:
FROM Employees
4. SQL Indexes
Types of Indexes:
Example:
This creates a unique index on the Email column, ensuring that no two
employees can have the same email address.
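A unique index of this kind can be sketched with Python's sqlite3 module as follows. The index name idx_employees_email is hypothetical, and EXPLAIN QUERY PLAN is SQLite-specific; other systems expose query plans through their own commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_employees_email ON Employees (Email)")
conn.execute("INSERT INTO Employees VALUES (1, 'alice@example.com')")

# A second row with the same email violates the unique index.
try:
    conn.execute("INSERT INTO Employees VALUES (2, 'alice@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# EXPLAIN QUERY PLAN reports whether a lookup can use the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT EmployeeID FROM Employees WHERE Email = ?",
    ("alice@example.com",)).fetchall()
uses_index = any("idx_employees_email" in row[-1] for row in plan)

row_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
```

The same index therefore serves two purposes: it enforces the uniqueness rule and it speeds up lookups by email.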
Syntax:
Example:
This command changes the Salary column to store decimal values with
up to 12 digits, including 2 digits after the decimal point.
You may also want to change the characteristics of a column, such as its NOT
NULL constraint or default value.
Syntax:
Example:
3. Adding a Column
You can add a new column to an existing table. When adding a new column,
you can specify the data type and any constraints (such as NOT NULL).
Syntax:
Example:
This command adds a new Email column to the Employees table and
ensures that the email addresses are unique.
You can also add constraints to a table using the ALTER TABLE command. For
example, you can add a PRIMARY KEY, FOREIGN KEY, or CHECK constraint.
REFERENCES referenced_table(referenced_column);
Examples:
REFERENCES Departments(DepartmentID);
5. Dropping a Column
You can remove a column from an existing table using the ALTER TABLE
command with the DROP COLUMN clause.
Syntax:
Example:
This command removes the Email column from the Employees table.
To delete a table from the database, including all of its data, you can use the
DROP TABLE command. This is a permanent action, so use it with caution.
Syntax:
Example:
1. INSERT Command
Syntax:
table_name: The name of the table where you want to insert data.
column1, column2, ...: The names of the columns to insert data into.
Example:
This command inserts a new row into the Employees table with the
specified EmployeeID, FirstName, LastName, and Salary.
2. UPDATE Command
Syntax:
UPDATE table_name
WHERE condition;
table_name: The name of the table where you want to update data.
Example:
UPDATE Employees
3. DELETE Command
The DELETE command is used to remove rows of data from a table. You can
delete all rows or only those that meet a specific condition.
Syntax:
WHERE condition;
table_name: The name of the table from which you want to delete
data.
Example:
This command deletes the row from the Employees table where the
EmployeeID is 101.
4. SELECT Command
The SELECT command is used to retrieve data from one or more tables. It
allows you to specify which columns to retrieve, filter data with conditions,
and even perform operations on the data.
Basic Syntax:
SELECT column1, column2, ...
FROM table_name;
Example:
FROM Employees;
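The four data manipulation commands can be run in sequence as a quick check of their effects. This sketch uses Python's sqlite3 module; the employee names, salaries, and the 10% raise are invented values, and the ? placeholders are the sqlite3 way of passing parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Salary REAL)""")

# INSERT: add two rows, once with and once without an explicit column list.
conn.execute("INSERT INTO Employees (EmployeeID, FirstName, LastName, Salary) "
             "VALUES (?, ?, ?, ?)", (101, "John", "Doe", 50000))
conn.execute("INSERT INTO Employees VALUES (?, ?, ?, ?)", (102, "Jane", "Roe", 60000))
after_insert = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]

# Without a WHERE clause, UPDATE and DELETE affect every row in the table.
conn.execute("UPDATE Employees SET Salary = Salary * 1.10 WHERE EmployeeID = ?", (102,))
conn.execute("DELETE FROM Employees WHERE EmployeeID = ?", (101,))

# SELECT: read back what is left.
remaining = conn.execute(
    "SELECT EmployeeID, FirstName, ROUND(Salary, 2) FROM Employees").fetchall()
```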
29. Procedural SQL
1. Stored Procedures
Syntax:
BEGIN
-- SQL statements
END;
Example:
BEGIN
END;
Example:
3. Conditional Execution
Syntax:
IF condition THEN
-- SQL statements
ELSE
-- SQL statements
END IF;
Example:
ELSE
END IF;
This example checks the Salary and updates the Status column based
on whether the salary is greater than 50,000.
4. Iteration or Looping
LOOP
-- SQL statements
END LOOP;
Example:
LOOP
END LOOP;
This example uses a loop to increment the salary of the first 10
employees by 500.
Syntax:
BEGIN
END;
Example:
BEGIN
END;
6. Triggers
A Trigger is a set of SQL statements that are automatically executed (or
triggered) in response to certain events on a table or view. Triggers are
commonly used for enforcing data integrity or auditing purposes.
Syntax:
ON table_name
BEGIN
END;
Example:
BEGIN
END;
This trigger is fired after an update to the Employees table and inserts
the old and new salary values into the SalaryHistory table.
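A salary-audit trigger along these lines can be sketched in SQLite through Python's sqlite3 module. Trigger syntax varies between database systems; the trigger name log_salary_change and the SalaryHistory column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Salary INTEGER);
CREATE TABLE SalaryHistory (EmployeeID INTEGER, OldSalary INTEGER, NewSalary INTEGER);

-- Fires once for every row whose Salary column is updated.
CREATE TRIGGER log_salary_change
AFTER UPDATE OF Salary ON Employees
FOR EACH ROW
BEGIN
    INSERT INTO SalaryHistory VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary);
END;

INSERT INTO Employees VALUES (1, 50000), (2, 40000);
UPDATE Employees SET Salary = 55000 WHERE EmployeeID = 1;
""")

# The application never wrote to SalaryHistory directly; the trigger did.
history = conn.execute("SELECT * FROM SalaryHistory").fetchall()
```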
Syntax:
BEGIN
-- SQL statements
RETURN result;
END;
Example:
RETURNS DECIMAL
BEGIN
END;
1. Planning
The planning stage is the foundation of the SDLC. It involves identifying the
scope, purpose, and feasibility of the project.
Key Activities:
2. System Analysis
Key Activities:
3. System Design
The design phase translates the requirements into a blueprint for the
system, outlining how it will function and look.
Key Activities:
4. Implementation
Key Activities:
o Write and compile the code for the system.
5. Maintenance
Key Activities:
The initial study is the first phase in the database life cycle, where the need
for a new database or modifications to an existing one is identified. The main
goal is to understand the requirements and feasibility of the project.
Key Activities:
2. Database Design
Key Activities:
Key Activities:
o Database Creation: Implement the schema by creating tables,
relationships, indexes, and constraints.
Testing and evaluation ensure that the database works as intended, meets
performance requirements, and satisfies the needs of users.
Key Activities:
5. Operation
Once the database is deployed, it enters the operational phase, where it is
actively used by the organization. The goal of this phase is to ensure that the
database operates smoothly and efficiently.
Key Activities:
The maintenance and evolution phase ensures that the database remains
up-to-date, efficient, and adaptable to changing business needs.
Key Activities:
The evaluation of transaction results ensures that only valid and consistent
data is stored in the database.
2. Consistency ensures that the database moves from one valid state to
another.
SQL also allows setting different isolation levels to control how much of
each transaction's work is visible to other concurrent transactions (e.g.,
READ COMMITTED, REPEATABLE READ, SERIALIZABLE).
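The idea that a transaction either completes fully or leaves the database in its previous valid state can be sketched with Python's sqlite3 module. The Account table, its CHECK constraint, and the transfer amount are assumptions for illustration; the example does not demonstrate isolation levels, which SQLite handles differently from most server databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (AccountID INTEGER PRIMARY KEY, "
             "Balance INTEGER CHECK (Balance >= 0))")
conn.execute("INSERT INTO Account VALUES (1, 100), (2, 0)")
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE Account SET Balance = Balance + 150 WHERE AccountID = 2")
        # This debit would make the balance negative, so the CHECK constraint fails.
        conn.execute("UPDATE Account SET Balance = Balance - 150 WHERE AccountID = 1")
    transfer_committed = True
except sqlite3.IntegrityError:
    transfer_committed = False

balances = conn.execute("SELECT Balance FROM Account ORDER BY AccountID").fetchall()
```

Because the second update fails, the first one is rolled back as well, and both balances keep their original values.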
33. Concurrency Control
Lost updates occur when two or more transactions update the same data
item simultaneously, and one of the updates is overwritten by another,
resulting in the loss of the first update.
Example:
o Transaction 2 also reads the same value of $100 from the same
account.
o The final balance is $130, but the $120 update from Transaction
1 is lost.
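The lost update can be reproduced as a plain Python simulation of the interleaved reads and writes. The deposit amounts (+20 and +30) are inferred from the $120 and $130 figures in the example and are otherwise an assumption; no real database or locking is involved.

```python
# Interleaved schedule: both transactions read before either one writes.
balance = 100

t1_read = balance          # Transaction 1 reads $100
t2_read = balance          # Transaction 2 reads the same $100
balance = t1_read + 20     # Transaction 1 writes $120
balance = t2_read + 30     # Transaction 2 writes $130, overwriting $120
interleaved_balance = balance

# Serial schedule: Transaction 2 starts only after Transaction 1 finishes.
balance = 100
balance = balance + 20     # Transaction 1: 100 -> 120
balance = balance + 30     # Transaction 2: 120 -> 150
serial_balance = balance
```

The $20 difference between the two results is exactly the update that was lost, which is what concurrency control mechanisms are meant to prevent.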
Example:
Example:
Types of Schedulers:
1. Transaction Log:
2. Transaction States:
o Active: A transaction is in progress.
3. Types of Failures:
Globalization and the need for quick data access from different
regions.
Mobile and web-based services creating demand for rapid,
location-independent data.
Data convergence requiring management of diverse data types (text,
video, etc.).
Advantages:
Disadvantages:
Companies such as Google and Amazon run distributed databases at scale, and
the technology continues to evolve alongside approaches such as NoSQL.
1. Data Sources:
1. Improved Decision-Making:
2. Enhanced Efficiency:
3. Cost Savings:
1. Data Sources:
4. BI Tools:
1. Improved Decision-Making:
2. Historical Analysis:
While both data warehouses and data lakes are used for storing large
volumes of data, there are key differences:
1. Fact Table:
o The fact table is the central table in the star schema. It stores
quantitative data or facts (e.g., sales, revenue, quantity sold).
Fact tables typically contain foreign keys that reference
dimension tables, along with numerical values (measures) that
can be aggregated, such as sales amounts or transaction counts.
2. Dimension Tables:
3. Foreign Keys:
1. Simplified Querying:
o The star schema is easy to understand and query because of its
simple structure. Users can easily perform queries on the fact
table and filter or group data by dimensions.
2. Performance Optimization:
4. Easy to Maintain:
1. Redundancy:
2. Limited Flexibility:
o The simplicity of the star schema can limit its ability to represent
complex relationships, especially when handling many-to-many
relationships or complex hierarchies.
2. Diagnostic Analytics:
3. Predictive Analytics:
4. Prescriptive Analytics:
1. Statistical Analysis:
2. Data Mining:
3. Machine Learning:
5. Data Visualization:
o Data visualization tools, such as Tableau, Power BI, and Qlik, help
transform raw data into graphical representations, such as
charts, graphs, and dashboards. This makes it easier for users to
interpret and communicate findings.
6. Natural Language Processing (NLP):
1. Improves Understanding:
2. Enhances Decision-Making:
1. Clarity:
2. Accuracy:
3. Simplicity:
4. Interactivity:
5. Context:
14-1a Volume
Volume refers to the sheer amount of data being generated. With the
advent of digital technologies, organizations now deal with petabytes
and exabytes of data, far surpassing the capacity of traditional data
storage systems.
14-1b Velocity
14-1c Variety
Variety refers to the different types of data that come from multiple
sources. This data can be structured (e.g., tables in databases),
semi-structured (e.g., JSON or XML), or unstructured (e.g., text, images,
video).
1. Veracity:
2. Value:
o Refers to the usefulness of the data. While Big Data is abundant,
not all of it is valuable. Extracting meaningful insights from large
datasets is the key to leveraging Big Data.
3. Complexity:
42. Hadoop
2. MapReduce:
4. Hadoop Common:
Hadoop Ecosystem
1. Hive:
2. Pig:
3. HBase:
4. Spark:
1. Authentication:
2. Authorization:
3. Encryption:
6. Data Masking:
1. SQL Injection:
2. Privilege Escalation:
4. Insider Threats:
5. Data Breaches: