
Architectures of Big Data

The document discusses architectures for big data ecosystems. It describes the key elements of a big data ecosystem as infrastructure, analytics, and applications. Infrastructure provides storage and processing, analytics platforms enable accessing and analyzing the data, and applications utilize the data. The document also discusses strategies for integrating master data management with big data to improve analytics, governance, and insights.

Uploaded by

Palak Garhwani

Tushar B. Kute,
http://tusharkute.com
Big Data Ecosystem

• A data ecosystem is a collection of infrastructure,
analytics, and applications used to capture and
analyze data.
• Data ecosystems provide companies with data
that they rely on to understand their customers
and to make better pricing, operations, and
marketing decisions.
• The term ecosystem is used rather than
‘environment’ because, like real ecosystems, data
ecosystems are intended to evolve over time.
Why ecosystem?

• Data ecosystems are for capturing data to produce
useful insights. As customers use products, especially
digital ones, they leave data trails.
• Companies can create a data ecosystem to capture and
analyze data trails so product teams can determine
what their users like, don’t like, and respond well to.
• Product teams can use these insights to tweak features and
improve the product.
• Data ecosystems were originally referred to as information
technology environments, which were designed to be relatively
centralized and static.
Why ecosystem?

• Increasing user engagement
• Using machine learning to identify hidden relationships in
the data
• Sending alerts to notify teams of changes
• Messaging users directly
• Analyzing individual users or cohorts
• Tracking conversions and marketing funnels
• Increasing user retention
• A/B testing feature changes
• Integrating with other applications in the data ecosystem
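
Several of the uses listed above, such as tracking conversions through a marketing funnel, come down to counting over event data. The sketch below is only an illustration: the funnel steps, the event log, and the function name `funnel_counts` are all hypothetical, and event order is ignored for simplicity.

```python
from collections import defaultdict

# Hypothetical funnel steps and event log; all names are illustrative only.
FUNNEL = ["visit", "signup", "purchase"]

events = [
    ("u1", "visit"), ("u1", "signup"), ("u1", "purchase"),
    ("u2", "visit"), ("u2", "signup"),
    ("u3", "visit"),
    ("u4", "signup"),  # never recorded a "visit", so not counted in the funnel
]

def funnel_counts(events, steps):
    """Count users who completed each step and every step before it.

    Simplification: only membership is checked, not the order of events.
    """
    seen = defaultdict(set)
    for user, step in events:
        seen[user].add(step)
    counts = []
    for i in range(len(steps)):
        required = set(steps[: i + 1])
        counts.append(sum(1 for done in seen.values() if required <= done))
    return counts

counts = funnel_counts(events, FUNNEL)
for step, n in zip(FUNNEL, counts):
    print(f"{step}: {n} users ({n / counts[0]:.0%} of entrants)")
```

A production analytics platform would also respect event timestamps and time windows; this version only shows the shape of the calculation.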
Elements of Ecosystem

• Infrastructure
• Analytics
• Applications
Infrastructure

• If a data ecosystem is a house, the infrastructure
is the foundation.
• It’s the hardware and software services that
capture, collect, and organize data.
• The infrastructure includes servers for storage,
query languages like SQL, and hosting
platforms.
• Infrastructure can be used to capture and store
three types of data: structured, unstructured,
and multi-structured.
Analytics

• Analytics serve as the front door through which
teams access their data ecosystem house.
• Analytics platforms search and summarize the
data stored within the infrastructure and tie
pieces of the infrastructure together so all data
is available in one place.
• While infrastructure systems provide their own
basic analytics, these tools are rarely sufficient.
Applications

• Applications are the walls and roof of the data
ecosystem house: they are services and systems that
act upon the data and make it usable.
• For example, a product team might decide to port
its analytics data into its marketing, sales, and
operations platforms.
• This would allow the marketing team to score leads
based on activity, the sales team to get alerts when
ideal prospects engage, and operations teams to
automatically charge customers based on product
usage.
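
The three applications in the example above (lead scoring, sales alerts, usage-based charging) can be sketched in a few lines. Everything here is an assumption made for illustration: the activity weights, the alert threshold, the unit price, and the function names are hypothetical and do not come from any particular platform.

```python
# Hypothetical activity weights, alert threshold and unit price.
ACTIVITY_POINTS = {"login": 1, "report_created": 5, "teammate_invited": 10}
IDEAL_PROSPECT_SCORE = 15
PRICE_PER_API_CALL = 0.002  # currency units per call

def score_lead(activity):
    """Marketing: score a lead from its list of product activity events."""
    return sum(ACTIVITY_POINTS.get(event, 0) for event in activity)

def sales_alert(lead_name, activity):
    """Sales: return an alert message once a lead looks like an ideal prospect."""
    score = score_lead(activity)
    if score >= IDEAL_PROSPECT_SCORE:
        return f"Prospect {lead_name} is engaged (score {score})"
    return None

def usage_charge(api_calls):
    """Operations: compute a charge from metered product usage."""
    return round(api_calls * PRICE_PER_API_CALL, 2)

activity = ["login", "report_created", "teammate_invited"]
print(score_lead(activity))
print(sales_alert("Acme Corp", activity))
print(usage_charge(12500))
```

All three functions read the same analytics data, which is the point of the slide: applications act on data the ecosystem has already captured.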
Enterprise Big Data Ecosystem
Infrastructure

• Infrastructural technologies are the core of the Big Data
ecosystem. They process, store and often also analyse data. For
decades, enterprises relied on relational databases (typically
collections of tables made up of rows and columns) for processing
structured data.
• However, the volume, velocity and variety of data mean that
relational databases often cannot deliver the performance and
latency required to handle large, complex data. The rise of
unstructured data in particular meant that data capture had to
move beyond rows and columns.
• Thus new infrastructural technologies emerged, capable of
wrangling a vast variety of data, and making it possible to run
applications on systems with thousands of nodes, potentially
involving thousands of terabytes of data.
Infrastructure - Examples

• Hadoop: A whole ecosystem of technologies designed for storing,
processing and analysing data. The core Hadoop technologies work
on the principle of breaking up and distributing data into parts and
analysing those parts concurrently, rather than tackling one monolithic
block of data all in one go.
• NoSQL: Stands for Not Only SQL; also used for processing large
volumes of multi-structured data. Most NoSQL databases are
adept at handling discrete data stored among multi-structured data.
Some NoSQL databases, like HBase, can work alongside Hadoop.
• Massively Parallel Processing (MPP) Databases: MPP databases work by
segmenting data across multiple nodes and processing these segments
in parallel, and they use SQL. Whereas Hadoop is usually run on
cheaper clusters of commodity servers, most MPP databases run on
expensive specialised hardware.
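
The principle shared by Hadoop and MPP databases (split the data into parts, analyse the parts concurrently, then combine the partial results) can be shown with a small word count. This is a single-machine sketch, not Hadoop's API: a thread pool stands in for the cluster nodes, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_parts(lines, n_parts):
    """Break the data set into n_parts roughly equal parts."""
    return [lines[i::n_parts] for i in range(n_parts)]

def count_words(part):
    """Analyse one part on its own, independently of the others."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts

def word_count(lines, n_parts=3):
    """Split, process the parts concurrently, then combine the results."""
    parts = split_into_parts(lines, n_parts)
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_words, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # combine step: add up the partial counts
    return total

lines = ["big data", "big ideas", "data data"]
print(word_count(lines))
```

Because each part is counted independently, the combined result is the same as counting the whole data set in one pass; a real cluster applies the same idea with the parts spread across many machines.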
Analytics

• Although infrastructural technologies incorporate data
analysis, some technologies are designed specifically
with analytical capabilities in mind. Sub-categories
of analytics on the big data map include:
• Analytics Platforms: Integrate and analyse data to uncover new
insights, and help companies make better-informed decisions.
There is a particular focus in this field on latency, and on
delivering insights to end users in the most timely manner
possible.
• Visualization Platforms: Designed, as the name
suggests, for visualizing data: taking the raw data and
presenting it in complex, multi-dimensional visual formats to
illuminate the information.
Analytics

• Business Intelligence (BI) Platforms: Used for
integrating and analysing data specifically for
businesses. BI platforms analyse data from multiple
sources to deliver services such as business
intelligence reports, dashboards and visualizations.
• Machine Learning: Also falls under this category, but
differs from the others. Whereas analytics
platforms take processed data as input and output analytics,
dashboards or visualisations for end users, the input in
machine learning is data the algorithm ‘learns from’,
and the output depends on the use case.
Big data vs. Master Data

• Today, enterprises are becoming more customer-centric and
are trying to learn more about their customers’
preferences by collecting all kinds of data from available
sources.
• This data, termed ‘big data’, typically
encompasses large volumes of text and other forms of
unstructured behavioral data from a variety of sources.
• Master data management (MDM) primarily revolves around
the creation of a trusted source of highly structured data
throughout an enterprise. Data management analysts see a
better future for enterprises through better insight into their
customers, when they use MDM and big data in tandem.
Big data challenges

• Big data tools enable the analysis of massive
volumes of data from various sources. However, the
real challenge is:
– Being really able to relate this data with
customers
– Actually knowing who the customers are
– Knowing your best customers
– Knowing how your trusted customers are
reacting to your product or service
Current Scenario

• Enterprises are starting to see the benefits big data and
MDM together could bring.
• According to a survey by The Information
Difference Ltd., an MDM consulting and research
company, 67% of respondents saw MDM driving
big data (including the ability to use master data to
automatically detect customer names in sets of big data),
rather than the other way around, with just
17% seeing big data producing new master data.
• The most popular choice was for existing MDM data to
help drive big data searches.
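
Using master data to detect customer names in sets of big data can be sketched as a lookup of known names in unstructured text. This is a minimal illustration under stated assumptions: the master records, the sample text, and the function names are hypothetical, and real matching would need to handle aliases and misspellings.

```python
import re

# Hypothetical master records: customer ID -> canonical customer name.
MASTER_CUSTOMERS = {"C001": "Acme Corp", "C002": "Globex", "C003": "Initech"}

def build_matcher(master):
    """Compile one case-insensitive pattern that matches any master name."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    names = sorted(master.values(), key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    return re.compile(pattern, re.IGNORECASE)

def detect_customers(text, master=MASTER_CUSTOMERS):
    """Return the sorted IDs of master customers mentioned in the text."""
    by_name = {name.lower(): cid for cid, name in master.items()}
    matcher = build_matcher(master)
    return sorted({by_name[m.group(1).lower()] for m in matcher.finditer(text)})

print(detect_customers("Loved the demo from acme corp, unlike Globex."))
```

The master data supplies the vocabulary, so the search over big data is driven by names the enterprise already trusts.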
Master Data Management
Strategies of MDM

• More effective analytics
• Enforced governance
• Expanding the truth
More effective analytics

Linking an MDM system to Big Data can provide
the basic framework for performing analytics,
as the former offers essential information
about customers or products that reduces the
scope of analytics and yields greater insight.
Otherwise, users run the risk of randomly
querying Big Data in search of the proverbial
needle in the haystack.
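
One way to picture how master data reduces the scope of analytics is to restrict a raw event stream to customers the MDM system knows about before running any query. The records, field names, and the function `scoped_events` below are hypothetical and serve only as a sketch.

```python
# Hypothetical master data: trusted customer IDs with a segment attribute.
master = {
    "C001": {"segment": "enterprise"},
    "C002": {"segment": "smb"},
}

# Hypothetical raw event stream, including a record for an unknown ID.
events = [
    {"customer_id": "C001", "action": "export"},
    {"customer_id": "X999", "action": "export"},
    {"customer_id": "C002", "action": "login"},
    {"customer_id": "C001", "action": "login"},
]

def scoped_events(events, master, segment):
    """Keep only events from master customers in the given segment."""
    ids = {cid for cid, record in master.items() if record["segment"] == segment}
    return [event for event in events if event["customer_id"] in ids]

for event in scoped_events(events, master, "enterprise"):
    print(event)
```

The analysis then runs over a targeted subset rather than the whole haystack.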
Enforced Governance

• The metadata, cleansing, and data quality
standards of MDM hubs (which often house the
enterprise’s most reliable and important data)
are ideal for identifying which Big Data is
appropriate and which should be discarded.
• Better governance considerably helps in the
“taming” of unstructured or semi-structured
data from lesser-known public sources.
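
Applying cleansing and data quality standards to incoming Big Data might look like the following sketch. The two rules, the ID format `C` plus three digits, and the function names are assumptions made for illustration; an actual MDM hub would define its own standards.

```python
import re

# Hypothetical data quality rules of the kind an MDM hub might enforce.
RULES = {
    "customer_id": lambda v: isinstance(v, str) and re.fullmatch(r"C\d{3}", v) is not None,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def cleanse(record):
    """Trim whitespace from string fields and lowercase the e-mail address."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if isinstance(cleaned.get("email"), str):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned

def triage(records):
    """Split cleansed records into those that pass every rule and the rest."""
    kept, discarded = [], []
    for raw in records:
        record = cleanse(raw)
        passes = all(rule(record.get(field)) for field, rule in RULES.items())
        (kept if passes else discarded).append(record)
    return kept, discarded

records = [
    {"customer_id": " C001 ", "email": "A@Example.com"},
    {"customer_id": "unknown", "email": "b@example.com"},
    {"customer_id": "C002"},  # missing e-mail
]
kept, discarded = triage(records)
print(len(kept), "kept;", len(discarded), "discarded")
```

Cleansing runs before validation, so a record that only needs tidying is kept rather than discarded.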
Expanding the truth

• Master Data is generally considered the closest that
organizations come to the elusive “single version of the
truth”. Supplementing this typically relational data with
unstructured data provides a more comprehensive overview
of key facets of customer (and product) behavior, which
directly assists in generating future business value.
Driving BDA

• Organizations now have both Big Data and Master Data, but
what is the best way to use the two together?
• It appears more viable for organizations to use Master Data
to inform Big Data analytics than to use Big Data expressly
to add to a Master Data repository, although the first
option will inevitably enhance an enterprise’s Master Data
assets as well.
• Basing queries on Master Data tailors results so that they
are aligned with business objectives.
• In this way MDM functions as a driver for Big Data: its
specific knowledge of customers and products gives Big Data
analysis a precise target.
Integrating BI

• Business Intelligence (BI) tools can parse Big Data
according to information found in MDM hubs to identify
points of relevance.
• BI search capabilities can be significantly enhanced
by automating the application of metadata related to
the relevant Master Data, which can take place either at the
time a query is issued or at the point that data is stored.
• Both methods enable users to expand the utility of
Master Data by aiding in the analytics of Big Data,
expediting time to insight, and subsequently enriching
MDM hubs.
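
The two points at which metadata can be applied, when data is stored or when a query is issued, can be compared in a short sketch. The keyword-to-product mapping and all function names are hypothetical; the keyword match is a deliberately naive stand-in for real metadata tagging.

```python
# Hypothetical metadata derived from Master Data: product keyword -> product ID.
PRODUCT_TAGS = {"router": "P-100", "modem": "P-200"}

def tags_for(text):
    """Return the product IDs whose keyword appears as a word in the text."""
    words = set(text.lower().split())
    return sorted(pid for keyword, pid in PRODUCT_TAGS.items() if keyword in words)

# Option 1: apply the metadata at the point that data is stored.
def store(docs):
    return [{"text": doc, "tags": tags_for(doc)} for doc in docs]

def query_stored(stored, product_id):
    return [doc["text"] for doc in stored if product_id in doc["tags"]]

# Option 2: apply the metadata at the time a query is issued.
def query_raw(docs, product_id):
    return [doc for doc in docs if product_id in tags_for(doc)]

docs = ["my router keeps dropping", "new modem works", "great service"]
print(query_stored(store(docs), "P-100"))
print(query_raw(docs, "P-100"))
```

Both options return the same documents. Tagging at store time does the work once per document, while tagging at query time always uses the current Master Data.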
Integration options

• Data Virtualization: Virtualization layers are well suited to
abstracting data between sources without actually moving the
physical location of the data.
• Contemporary MDM Platforms: A recent Gartner report
states, “By 2017, 35% of MDM software sales prospects will
purchase based on candidate vendors’ solutions for linking
structured Master Data to Big Data sources.”
• Hadoop: Hadoop-based systems and certain NoSQL
offerings for Big Data allow SQL access so that users can
query and interact with data in a language more native to that
of most Master Data systems, which enables greater control
and conformity of Big Data from a governance perspective.
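
Once Big Data is reachable through SQL, it can be joined to Master Data in a single query. The sketch below uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for a SQL layer over Hadoop or a NoSQL store; the table names, columns, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a SQL interface over a Big Data store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_customer (customer_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE clickstream (customer_id TEXT, page TEXT);
INSERT INTO master_customer VALUES ('C001', 'Acme Corp'), ('C002', 'Globex');
INSERT INTO clickstream VALUES
  ('C001', '/pricing'), ('C001', '/docs'),
  ('C002', '/pricing'), ('X999', '/pricing');
""")

# The join keeps only clickstream rows that match a governed master record.
rows = conn.execute("""
SELECT m.name, COUNT(*) AS page_views
FROM clickstream AS c
JOIN master_customer AS m ON m.customer_id = c.customer_id
GROUP BY m.name
ORDER BY m.name
""").fetchall()

for name, page_views in rows:
    print(name, page_views)
```

The row for the unknown ID `X999` drops out of the result, which illustrates the governance point: the Master Data decides which Big Data records count.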
Thank you
This presentation is created using LibreOffice Impress 5.1.6.2, can be used freely as per GNU General Public License
