Database & Big Data

The document discusses the importance of databases and big data in organizations, highlighting how they facilitate data management and decision-making. It covers various database management systems (DBMS) like MySQL, PostgreSQL, MariaDB, SQL Lite, and CouchDB, detailing their advantages and disadvantages. Additionally, it addresses the significance of data cleansing and data warehouses in ensuring data accuracy and improving organizational efficiency.

Uploaded by

Mark Ah Chee
Copyright
© All Rights Reserved

DATABASE & BIG DATA

• Understanding their principles in the digital age.
• How they make an organization successful.
• How they work.
Why learn about Database & Big Data?
To learn:
• Where all this data comes from.
• How to extract and analyze it.
• Where it goes.
• How it is safeguarded.
• How to use it to your advantage and the organization’s advantage.
• About the tools and processes that enable users to manage all this data.
DATABASE
• A database is a well-designed, organized, and carefully managed
collection of data.
• Like other components of an information system, a database should
help an organization achieve its goal.
How does it help an organization achieve its goals?
• By providing managers and decision makers with timely, accurate, and
relevant information built on data.
• It also helps companies analyze information to reduce costs, increase
profits, add new customers, track past business activities, and open
new market opportunities.
Database Management System (DBMS)
• A DBMS consists of a group of programs used to access and manage a
database, as well as to provide an interface between the database and its
users and other application programs.
• A DBMS is important because most organizations have many
databases. Without good data management, it is nearly impossible for
anyone to find the right, related information for accurate,
business-critical decision making.
MySQL PostgreSQL MariaDB SQLite CouchDB
• MySQL is an open source RDBMS used to create and
manage databases.
• As a relational database, MySQL stores data in tables
(data bins or storage containers) of rows and columns
organized into schemas.
• A schema defines how data is organized and stored
and describes the relationships among various tables.
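
The idea of tables, rows, columns, and relationships between tables can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

# In-memory database; sqlite3 is a stand-in here, not MySQL itself.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: two tables related through customer_id.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL NOT NULL
);
""")

# Each tuple is one row; each position is one column.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)])

# The relationship lets us join the tables and summarize per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.order_id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.name
""").fetchall()

print(rows)
```

The schema is what tells the database that every order belongs to a customer; the join only works because that relationship is declared.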

Advantages
• Ease of use. Developers can install MySQL in
minutes, and the database is easy to manage.
• Reliability. MySQL is one of the most mature and
widely used databases.

Disadvantages
• MySQL can be inefficient for workloads that need to
store very large volumes of data.
• MySQL does not have good developing and
PostgreSQL

• PostgreSQL is a free and open source relational database
management system emphasizing extensibility and SQL
compliance.
• It is one of the oldest yet most advanced open source DBMSs.
• It can safely store and scale the most complicated data workloads.
• It aims to help developers build applications, help administrators
protect data integrity and build fault-tolerant environments, and
help you manage your data no matter how big or small the
dataset.

Advantage
• Reduced costs. As a true open source product, PostgreSQL does
not cost anything: there are no license fees.

Disadvantage
• Slower performance.
MariaDB

• MariaDB is very similar to MySQL.
• It maintains high compatibility with MySQL.
• MariaDB supports external plugins, which means you
can extend the database and apply it in more use
cases, such as e-commerce, data warehousing, and
logging applications.

Advantage
• It is more scalable and offers a higher query speed.

Disadvantage
• It cannot support complex data types.
SQLite

• SQLite is the most used database engine in the
world.
• It is a C-language library that implements a small, fast,
self-contained, high-reliability, full-featured SQL
database engine.

Advantage
• SQLite source code is in the public domain and is free
for everyone to use for any purpose.

Disadvantage
• It is not designed for high-concurrency or large scale
applications. It lacks advanced security features.
CouchDB

• CouchDB is an open source NoSQL document database that collects
and stores data in JSON-based document formats.
• It uses a schema-free data model, which simplifies record
management across various computing devices, mobile
phones, and web browsers.
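
A small sketch may help show what "schema-free JSON documents" means in practice. It does not talk to a CouchDB server; it only builds a few documents as Python dictionaries and serializes them with the standard json module. The `_id` field follows CouchDB's document convention, while every other field name and value is a made-up example.

```python
import json

# Hypothetical documents. Note that they do not share a fixed set of fields:
# one user has an email, another has a list of phones, and an order has items.
documents = [
    {"_id": "user:1", "type": "user", "name": "Ana", "email": "ana@example.com"},
    {"_id": "user:2", "type": "user", "name": "Ben", "phones": ["555-0100", "555-0101"]},
    {"_id": "order:10", "type": "order", "user": "user:1",
     "items": [{"sku": "A1", "qty": 2}]},
]

# Each document is stored and exchanged as JSON text.
serialized = [json.dumps(doc, sort_keys=True) for doc in documents]

# Reading the JSON back gives the same structures, nested lists included.
restored = [json.loads(text) for text in serialized]

# Show which fields each document carries; no table definition was needed.
fields_by_id = {doc["_id"]: sorted(doc.keys()) for doc in restored}
for doc_id, fields in fields_by_id.items():
    print(doc_id, fields)
```

In a relational table, the second user's list of phones would need either extra columns or a separate table; in a document model it is simply part of that one document.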

Advantage
• Scalability. The architectural design of CouchDB makes it
extremely adaptable when partitioning databases and scaling
data onto multiple nodes.

Disadvantages
• CouchDB uses a large amount of space for overhead, which is a major
disadvantage compared to other databases.
• Temporary views on huge datasets are very slow.
Big Data
• Big data is the term used to describe data collections that are so
enormous (terabytes or more) and complex (from sensor data to
social media data) that traditional data management software,
hardware, and analysis processes are incapable of dealing with them.
• Organizations collect and use data from a variety of sources, including
business applications, social media, sensors and controllers that are
part of the manufacturing process, systems that manage the physical
environments and many more.
Sources of an organization’s useful data
• An organization has many sources of
useful data.
• Much of this data is unstructured and does
not fit neatly into traditional relational
database management systems.
Characteristics of Big Data
• Volume refers to the amount
of data.
• Velocity is the speed at which
data is generated, collected,
and analyzed.
• Variety refers to the types and
forms of data collected.
Big Data Uses
Here are some examples of how organizations are employing big data to improve their day-to-day
operations, planning, and decision making.
• Retail organizations monitor social networks such as Facebook, Google, LinkedIn, Twitter, and
Yahoo to engage brand advocates, identify brand adversaries (and attempt to reverse their
negative opinions), and even enable passionate customers to sell their products.
• Advertising and marketing agencies track comments on social media to understand consumers’
responsiveness to ads, campaigns, and promotions.
• Hospitals analyze medical data and patient records to try to identify patients likely to need
readmission within a few months of discharge, with the goal of engaging with those patients in
the hope of preventing another expensive hospital stay.
Data Cleansing
• Data cleansing is the process of detecting and then
correcting or deleting incomplete, incorrect, inaccurate, or
irrelevant records that reside in a database.
• The goal of data cleansing is to improve the quality of the
data used in decision making.
Specific steps to perform data cleansing
There are specific steps an organization might take to perform data
cleansing before adding this data to a data warehouse.
Data cleansing is a key part of the overall data management process
and one of the core components of data preparation work.
Steps:
• Identify and correct data by cross-checking it against a
validated data set.
• Clear formatting.
• Remove irrelevant data.
• Remove duplicates.
• Filter missing values.
• Delete outliers.
• Convert data types.
• Validate data.
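
Several of these steps can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the record layout, the sample values, and the rule that ages above 120 count as outliers are all assumptions made for the example.

```python
# Hypothetical raw records, all values arriving as text.
raw_records = [
    {"id": "1", "name": "  Ana ", "age": "34"},
    {"id": "2", "name": "Ben", "age": ""},      # missing value
    {"id": "1", "name": "Ana", "age": "34"},    # duplicate of id 1
    {"id": "3", "name": "CARL", "age": "29"},
    {"id": "4", "name": "Dee", "age": "410"},   # implausible outlier
    {"id": "5", "name": "Eve", "age": "31"},
]


def clean(records, max_age=120):
    """Apply a few cleansing steps and return the surviving records."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Clear formatting: trim whitespace and normalize capitalization.
        name = rec["name"].strip().title()
        age_text = rec["age"].strip()

        # Filter missing values.
        if not name or not age_text:
            continue

        # Convert data types: ids and ages become integers.
        try:
            rec_id = int(rec["id"])
            age = int(age_text)
        except ValueError:
            continue

        # Remove duplicates, keyed on the record id.
        if rec_id in seen_ids:
            continue

        # Delete outliers, using a simple assumed range rule.
        if not 0 <= age <= max_age:
            continue

        seen_ids.add(rec_id)
        cleaned.append({"id": rec_id, "name": name, "age": age})
    return cleaned


cleaned = clean(raw_records)

# Validate data: every surviving record must satisfy the rules above.
assert all(isinstance(r["age"], int) and r["name"] for r in cleaned)
print(cleaned)
```

Cross-checking against a validated data set is left out here because it depends on having such a reference set available.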
Data cleansing in an organization
• Inspection and profiling.
• Cleaning.
• Verification.
• Reporting.
Accuracy of Data
• The cost of performing data cleansing to achieve 100% database
accuracy can be prohibitively expensive.
• Accuracy can be measured with percent error, which gives the
percentage of error between the sample’s measured observation and
the true measure of the population. If the measurement is far from
the true value of the population, the percent error is high and the
accuracy is low, and vice versa.
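
As a rough illustration, percent error is commonly computed as the absolute difference between the measured and true values, divided by the true value, times 100. The function name and the sample numbers below are chosen only for this example.

```python
def percent_error(measured, true_value):
    """Return the percent error of a measurement against a known true value."""
    if true_value == 0:
        # The ratio is undefined when the true value is zero.
        raise ValueError("true_value must be non-zero")
    return abs(measured - true_value) * 100 / abs(true_value)


# A measurement close to the true value gives a low percent error ...
close = percent_error(95, 100)
# ... and one far from it gives a high percent error (low accuracy).
far = percent_error(50, 100)

print(close, far)
```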
Concerns raised by performing data cleansing and how to address them
• One of the primary challenges you’ll encounter in a data cleansing
effort is resistance to change. Many organizations are accustomed to their
existing data processes and may be hesitant to disrupt them.
Employees might be resistant to adopting new tools and procedures,
fearing they will add complexity to their workflow. To overcome this
concern or challenge, it’s crucial to emphasize the benefits of data
cleansing and communicate the positive impact it will have on daily
operations. Provide training and support to help your team transition
smoothly and highlight how clean data can make their jobs more
efficient and effective.
• Data security is a paramount concern for any organization, and data
cleansing can raise legitimate security worries. Sharing and processing
data, even for the purpose of cleaning, can be perceived as a risk. To
address this challenge, implement robust data security measures.
Ensure the data is encrypted during transmission and storage, and
limit access to those who need it. Collaborate closely with your IT and
cybersecurity teams to establish a secure data cleaning process that
complies with relevant regulations and safeguards sensitive
information.
Once data cleansing is complete, the cleaned data is
added to the data warehouse.

What is a data warehouse?
• A data warehouse is a database that holds
business information from many sources in the
enterprise, covering all aspects of the
company’s processes, products, and customers.
• A data warehouse allows managers to “drill
down” to get greater detail or “roll up” to
generate aggregate or summary reports.
• The primary purpose is to relate information in
innovative ways and help managers and
executives make better decisions.
• A data warehouse stores historical data that
has been extracted from operational systems
and external data sources.
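
The "roll up" and "drill down" operations mentioned above can be sketched with a few rows of made-up sales data. This is only an illustration in plain Python, not how a real data warehouse product works internally; the regions, products, amounts, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical historical sales rows: (region, product, amount).
sales = [
    ("North", "Widgets", 120),
    ("North", "Gadgets", 80),
    ("South", "Widgets", 200),
    ("South", "Gadgets", 50),
    ("North", "Widgets", 30),
]


def roll_up(rows):
    """Summarize: total sales per region, hiding product detail."""
    totals = defaultdict(int)
    for region, _product, amount in rows:
        totals[region] += amount
    return dict(totals)


def drill_down(rows, region):
    """Add detail: break one region's total down by product."""
    totals = defaultdict(int)
    for row_region, product, amount in rows:
        if row_region == region:
            totals[product] += amount
    return dict(totals)


summary = roll_up(sales)
north_detail = drill_down(sales, "North")

print(summary)
print(north_detail)
```

The same rows answer both questions; what changes is the level of detail at which they are grouped.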
REFERENCES
• Stair, R. & Reynolds, G. (2018). Principles of Information Systems 163 (13th ed.). Cengage Learning (pp. 192-227)
• Thryv Data Team. (January 15, 2025). The Dirty Secret of Data: Why your Business Needs Data Cleansing Now More
Than Ever. Overcoming Data Cleansing Challenges. Retrieved from https://www.sensisdata.com.au
• Stephen, J. & Craig, S. (2005). What is data cleansing? How to conduct data cleansing at your organization.
Techtarget. Retrieved from https://www.techtarget.com
• Christine, H. & Elizabeth, F. (November 21, 2023). Accuracy & Precision in Data. Measuring Accuracy. Retrieved
from https://study.com
• Atlan. (October 25, 2024). Data Accuracy in 2024: A Complete Guide to Reliable Data Quality. What is data
accuracy and what are its types? Retrieved from https://atlan.com
• Repustate. (December 15, 2022). Top 10 data cleansing techniques for better results. What are the top 10 data
cleansing techniques? Retrieved from https://www.repustate.com
• IBM. (October 6, 2021). What is CouchDB? Retrieved from https://www.ibm.com
• SQLite. (February 18, 2025). What is SQLite? Retrieved from http://www.sqlite.org
• MariaDB. (2025). MariaDB Enterprise Platform. New to MariaDB Server? Retrieved from https://mariadb.org
• Stuti, D. (2019). PostgreSQL Advantages and Disadvantages. Aalpha. Retrieved from https://www.aalpha.net
• PostgreSQL. (February 20, 2025). PostgreSQL: The world’s most advanced open source relational database. New
to PostgreSQL? Retrieved from https://www.postgresql.org