Data Management & Data Architecture
Programming Languages and Frameworks:
Spark: Apache Spark is a big data framework known for its speed and versatility. It is used for data processing, machine learning, and graph analytics.
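For illustration, here is a minimal sketch of a Spark job in Python (a sketch only, assuming a local pyspark installation; the column names and values are made up):

    from pyspark.sql import SparkSession

    # Start a local Spark session
    spark = SparkSession.builder.appName("demo").getOrCreate()

    # Build a small DataFrame and run a simple aggregation
    df = spark.createDataFrame(
        [("alice", 34), ("bob", 45), ("alice", 29)],
        ["name", "age"],
    )
    df.groupBy("name").avg("age").show()

    spark.stop()

The same DataFrame API scales from a single laptop to a cluster, which is a large part of Spark's appeal.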
TensorFlow and PyTorch: These libraries are popular for deep learning
and neural network development.
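As a small illustration, the following sketch (PyTorch, with random stand-in data) defines a tiny feed-forward network and runs one training step:

    import torch
    import torch.nn as nn

    # A tiny network: 4 inputs -> 8 hidden units -> 1 output
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One training step on random data (illustrative only)
    x, y = torch.randn(16, 4), torch.randn(16, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.4f}")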
Note: Data wrangling is the process of converting raw data into a usable form.
Emergence of Computers:
Early 2000s: Big Data Emergence: With the proliferation of the internet,
social media, and sensors, the volume of data exploded. The term "Big
Data" emerged, emphasizing the challenges and opportunities posed by
massive datasets.
Current Trends:
Data Cleaning: Ensuring that the data is accurate, complete, and free from
inconsistencies. Data cleaning involves removing duplicates, handling missing
values, correcting errors, and standardizing data formats to improve the quality
of data before analysis.
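As a hedged sketch of these steps in Python with pandas (the table and its values are hypothetical):

    import pandas as pd

    # Raw data with a duplicate row, a missing value, and inconsistent date strings
    raw = pd.DataFrame({
        "customer": ["Ann", "Ann", "Bob", "Cara"],
        "amount": [100.0, 100.0, None, 250.0],
        "order_date": ["2024-01-05", "2024-01-05", "2024/02/10", "2024-02-10"],
    })

    clean = (
        raw.drop_duplicates()  # remove duplicate rows
           .assign(
               amount=lambda d: d["amount"].fillna(d["amount"].median()),      # handle missing values
               order_date=lambda d: pd.to_datetime(d["order_date"],
                                                   format="mixed"),            # standardize date formats (pandas >= 2.0)
           )
    )
    print(clean)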
Data Accessibility: Ensuring that the right people have access to the data they
need, when they need it, without compromising security. This can include
using tools like dashboards, APIs, or data lakes that allow for easy access and
sharing of data across different teams.
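For instance, a minimal sketch of exposing a dataset through an API might look like the following (assumes the Flask package; the /sales endpoint and SALES data are hypothetical, and a real deployment would sit behind authentication):

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Hypothetical in-memory dataset standing in for a governed data store
    SALES = [
        {"region": "north", "revenue": 1200},
        {"region": "south", "revenue": 950},
    ]

    @app.route("/sales")
    def sales():
        # In production this endpoint would enforce access controls
        return jsonify(SALES)

    if __name__ == "__main__":
        app.run(port=5000)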
Data Analysis and Reporting: Once the data is organized, cleaned, and made
accessible, it’s ready for analysis. Effective data management helps ensure that
data analysts and data scientists can extract meaningful insights, generate
reports, and build predictive models without running into data quality or
accessibility issues.
Data Sources:
Data Management:
Data management is the process of handling tasks such as extracting, storing, transferring, processing, and securing data at the lowest possible cost.
The main motive of data management is to manage and safeguard personal and organizational data in an optimal way, so that users can easily create, access, update, and delete it.
Data management is an essential process for the growth of every enterprise; without it, the policies and decisions needed for business advancement cannot be made. The better the data management, the better the productivity of the business.
Large volumes of data, such as big data, are hard to manage with traditional methods, so optimal technologies and tools such as Hadoop, Scala, Tableau, and AWS must be used. These can then be applied to big data analysis to discover patterns and drive improvements.
Data management is achieved by training employees appropriately, together with ongoing maintenance by DBAs, data analysts, and data architects.
Data Collection:
Data collection is the process of acquiring, extracting, and storing a voluminous amount of data, which may be in structured or unstructured form such as text, video, audio, XML files, records, or image files, for use in later stages of data analysis.
In the process of data analysis, data collection is the initial step, carried out before any patterns or useful information in the data can be analyzed. The data to be analyzed must be collected from valid sources.
The data collected is known as raw data, which is not useful on its own; once the impurities are cleaned out and the data is used for further analysis, it becomes information, and the insight obtained from that information is known as knowledge.
The main goal of data collection is to collect information-rich data.
Data collection starts with asking questions such as: what type of data is to be collected, and from what source?
Various sources of Data:
The data sources are divided mainly into two types known as:
1. Primary data
2. Secondary data
1. Primary data:
Data that is raw, original, and extracted directly from official sources is known as primary data. This type of data is collected directly by applying techniques such as questionnaires, interviews, and surveys. The data collected must match the demands and requirements of the target audience on which the analysis is performed; otherwise, it becomes a burden during data processing.
A few methods of collecting primary data:
1. Interview method:
In this method, data is collected by interviewing the target audience; the person conducting the interview is called the interviewer, and the person answering is known as the interviewee. Basic business- or product-related questions are asked, recorded in the form of notes, audio, or video, and the data is stored for processing. Interviews can be both structured and unstructured, such as personal interviews or formal interviews conducted by telephone, face to face, or by email.
2. Survey method:
The survey method is a research process in which a list of relevant questions is asked and the answers are recorded in the form of text, audio, or video. Surveys can be conducted in both online and offline modes, for example through website forms or email, and the survey answers are then stored for data analysis. Examples include online surveys and polls on social media.
3. Observation method:
The observation method is a method of data collection in which the researcher keenly observes the behavior and practices of the target audience using a data collection tool and stores the observed data in the form of text, audio, video, or other raw formats. In this method, data is collected directly by posing a few questions to the participants. For example, a researcher might observe a group of customers and their behavior towards certain products. The data obtained is then sent for processing.
4. Experimental method:
The experimental method is the process of collecting data by performing experiments, research, and investigation. The most frequently used experimental designs are CRD, RBD, LSD, and FD, described below; a short code sketch of randomized assignment and a Latin square follows this list.
CRD - Completely Randomized Design is a simple experimental design used in data analytics which is based on randomization and replication. It is mostly used for comparing experimental treatments.
RBD - Randomized Block Design is an experimental design in which the experiment is divided into small units called blocks. Random experiments are performed on each of the blocks, and results are drawn using a technique known as analysis of variance (ANOVA). RBD originated in the agricultural sector.
LSD - Latin Square Design is an experimental design similar to CRD and RBD but arranged in rows and columns. It is an N x N arrangement with an equal number of rows and columns, in which each letter occurs exactly once in every row and every column. Hence, differences can be found with fewer errors in the experiment. A solved Sudoku grid is an example of a Latin square.
FD - Factorial Design is an experimental design in which each experiment involves two or more factors, each with several possible values (levels), and trials are performed over combinations of these factor levels.
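To make these designs concrete, here is a minimal Python sketch (the unit and treatment names are made up for illustration): it randomly assigns treatments to experimental units, as in a CRD, and builds a cyclic Latin square of the kind used in an LSD.

    import random

    def crd_assignment(units, treatments, seed=0):
        # Completely Randomized Design: each unit gets a random treatment
        rng = random.Random(seed)
        return {u: rng.choice(treatments) for u in units}

    def latin_square(n):
        # Cyclic construction: symbol (i + j) % n + 1 appears exactly once
        # in every row and every column
        return [[(i + j) % n + 1 for j in range(n)] for i in range(n)]

    print(crd_assignment(["plot1", "plot2", "plot3", "plot4"], ["A", "B"]))
    for row in latin_square(4):
        print(row)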
2. Secondary data:
Secondary data is data that has already been collected and is reused for some valid purpose. This type of data was previously derived from primary data, and it comes from two types of sources, named internal sources and external sources.
Internal source:
This type of data can easily be found within the organization, such as market records, sales records, transactions, customer data, and accounting resources. The cost and time consumed in obtaining data from internal sources are low.
External source:
Data that cannot be found within the organization and must be obtained through external third-party resources is external source data. The cost and time consumption are greater because such sources contain a huge amount of data. Examples of external sources are government publications, news publications, the Registrar General of India, the Planning Commission, the International Labour Bureau, syndicate services, and other non-governmental publications.
Other sources:
Sensor data: With the advancement of IoT devices, the sensors on these devices collect data that can be used for sensor data analytics to track the performance and usage of products; a small sketch appears after this list.
Satellite data: Satellites collect terabytes of images and data daily through their onboard cameras and sensors, which can be mined for useful information.
Web traffic: Thanks to fast and cheap internet access, data of many formats uploaded by users on different platforms can be collected, with their permission, for data analysis. Search engines also provide data on the keywords and queries searched most often.
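As a small illustration of sensor data analytics (simulated readings from a hypothetical temperature sensor; the 3-sigma threshold is an assumption):

    import random
    import statistics

    # Simulate 50 temperature readings around 20 degrees, then inject a spike
    random.seed(1)
    readings = [20 + random.gauss(0, 0.5) for _ in range(50)]
    readings[30] = 35.0

    mean = statistics.mean(readings)
    stdev = statistics.stdev(readings)

    # Flag readings more than 3 standard deviations from the mean
    anomalies = [(i, round(r, 2)) for i, r in enumerate(readings)
                 if abs(r - mean) > 3 * stdev]
    print(anomalies)  # the injected spike at index 30 is flagged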
Data Architecture:
Data architecture is commonly described at three levels of abstraction: conceptual, logical, and physical.
Conceptual model:
It is a business-level model that uses the Entity Relationship (ER) model to describe the entities, their attributes, and the relationships between them.
Logical model:
It is a model in which the data is represented in logical form, such as rows and columns of tables, classes, XML tags, and other DBMS constructs.
Physical model:
The physical model holds the actual database design, such as which type of database technology will be suitable for the architecture and how the data is physically stored; a small sketch contrasting the logical and physical levels appears below.
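A brief sketch of the logical and physical levels side by side (Python standard library only; the Customer entity, its attributes, and the SQLite choice are hypothetical):

    import sqlite3
    from dataclasses import dataclass

    # Logical model: the Customer entity from the conceptual ER diagram,
    # expressed as a class with typed attributes (rows and columns)
    @dataclass
    class Customer:
        customer_id: int
        name: str
        email: str

    # Physical model: a concrete technology choice (SQLite) and storage layout
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    c = Customer(1, "Ann", "ann@example.com")
    conn.execute("INSERT INTO customer VALUES (?, ?, ?)",
                 (c.customer_id, c.name, c.email))
    print(conn.execute("SELECT * FROM customer").fetchall())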
Data Architect:
A data architect is responsible for the design, creation, management, and deployment of the data architecture, and defines how data is to be stored and retrieved; other decisions are made by internal bodies.
Factors that influence Data Architecture:
A few influences that can affect data architecture are business requirements, business policies, the technology in use, business economics, and data processing needs.
Business requirements:
These include factors such as the expansion of the business, the performance of system access, data management, transaction management, making use of raw data by converting it into image files and records, and then storing it in data warehouses. Data warehouses are a key component for storing business transactions.
Business policies:
Policies are rules that describe how data is to be processed. These policies are made by internal organizational bodies and other government agencies.
Technology in use:
This includes drawing on examples of previously completed data architecture designs, as well as existing licensed software purchases and database technology already in use.
Business economics:
Economic factors such as business growth and loss, interest rates, loans, market conditions, and overall cost will also have an effect on the architecture design.
Data processing needs:
These include factors such as data mining, large volumes of continuous transactions, database management, and other data preprocessing needs.