data engineering design patterns
… knowing that there is your software running it?
Standardize and increase the descriptive power of engineering processes by applying patterns
Or in other words
Source: https://www.health.harvard.edu/blog/right-brainleft-brain-right-2017082512222
About me
● IT Architect at Cognizant
● Data Engineering, Data Science, Cloud Computing, Agile teams
● Financial, Manufacturing, Logistics, Retail industries
● Organizer of Vilnius Microsoft Data Platform Meetup & Hack4Vilnius Hackathon
● Blogging on www.valdas.blog
Maslow’s hierarchy of needs
Self-actualization
Personal growth and fulfillment
Esteem needs
Unique individual, self-respect, etc.
Safety needs
Security, employment, protection against hunger and violence
Enterprise architecture
Buy vs build, cloud readiness
Business drivers
Business goals and objectives
Culture
Core values, way of working
Maslow’s hierarchy of needs for data projects: simplified view for today’s presentation
Data architecture
Ingestion, storage, consumption: how data is collected, stored, transformed, distributed, and consumed
Culture
Core values, way of working
Culture, way of working, values
DevOps culture
1. Foster a Collaborative Environment
2. Impose End-to-End Responsibility: you build it, you ship it
3. Encourage Continuous Improvement
4. Automate (Almost) Everything
5. Focus on the Customer’s Needs
6. Embrace Failure, and Learn From it
7. Unite Teams — and Expertise
Source: https://www.cmswire.com/information-management/7-key-principles-for-a-successful-devops-culture/
Data architecture
If you are building a data platform in the cloud, remember that ...
Source: https://azure.microsoft.com/en-in/solutions/architecture/modern-data-warehouse/
Architecture example
[Diagram labels: Digital portals, LOB, Cloud Reporting]
Data ingestion
Application integration approaches
File Transfer
Have each application produce files of shared data for others to consume, and consume files that others have produced.
Shared Database
Have the applications store the data they wish to share in a common database.
Messaging
Have each application connect to a common messaging system, and exchange data and invoke behavior using messages.
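The messaging approach can be illustrated with a minimal sketch. It assumes an in-process `queue.Queue` as a stand-in for a real messaging system; the names `channel`, `publish_order`, and `consume_all` and the `order_created` event are hypothetical and do not come from the slides.

```python
import json
import queue

# Hypothetical in-process stand-in for a shared messaging system.
channel = queue.Queue()


def publish_order(order_id, amount):
    """Producer application: serialize an event and put it on the channel."""
    event = {"type": "order_created", "order_id": order_id, "amount": amount}
    channel.put(json.dumps(event))


def consume_all():
    """Consumer application: drain the channel and decode each message."""
    messages = []
    while not channel.empty():
        messages.append(json.loads(channel.get()))
    return messages


# The producer and the consumer only share the channel and the message format.
publish_order(1, 9.99)
publish_order(2, 25.0)
received = consume_all()
```

The two applications never call each other directly, which is the property that distinguishes messaging from file transfer and a shared database.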
Ingestion challenges
● Loading and prioritizing multiple data sources -> push vs pull strategy
● Data validation and cleansing -> separate business from processing logic
● Data transformation and compression -> different compression and file types
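One possible reading of "separate business from processing logic" is sketched below: validation rules are plain functions that know nothing about the pipeline, and the pipeline knows nothing about what the rules check. The rule names, the record fields, and the `ingest` function are assumptions made for illustration.

```python
# Business logic (hypothetical rules): pure functions with no I/O.
def has_customer_id(record):
    return bool(record.get("customer_id"))


def amount_is_positive(record):
    amount = record.get("amount")
    return isinstance(amount, (int, float)) and amount > 0


RULES = [has_customer_id, amount_is_positive]


# Processing logic: applies whatever rules it is given and routes records.
def ingest(records, rules):
    """Return accepted records and rejected records with the failed rule names."""
    accepted, rejected = [], []
    for record in records:
        failed = [rule.__name__ for rule in rules if not rule(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


batch = [
    {"customer_id": "c-1", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "c-3", "amount": -1},
]
accepted, rejected = ingest(batch, RULES)
```

With this split, a new business rule is added to `RULES` without touching `ingest`, and the routing can change without touching the rules.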
Choose privacy protection patterns
Privacy protection at the ingress
Privacy protection at the egress
Source: https://www.valdas.blog/2019/08/06/privacy-gdpr-implementation-in-azure/
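The difference between the two patterns can be sketched as follows. The sketch assumes keyed hashing (HMAC-SHA256) as the protection technique; the slides do not name a technique, and the key, the field list, and the function names are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key
PII_FIELDS = {"email", "name"}               # hypothetical list of sensitive fields


def pseudonymize(value):
    """Replace a value with a keyed SHA-256 digest (an assumed technique)."""
    return hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


def protect(record):
    """Return a copy of the record with sensitive fields pseudonymized."""
    return {k: pseudonymize(v) if k in PII_FIELDS else v for k, v in record.items()}


def ingest_with_ingress_protection(record, store):
    """Ingress pattern: sensitive values are protected before they are stored."""
    store.append(protect(record))


def read_with_egress_protection(store):
    """Egress pattern: raw values are stored and protected when served."""
    return [protect(record) for record in store]


raw = {"email": "a@example.com", "name": "Ann", "country": "LT"}

ingress_store = []
ingest_with_ingress_protection(raw, ingress_store)

egress_store = [raw]  # raw data stays in storage under this pattern
served = read_with_egress_protection(egress_store)
```

Consumers see the same protected output in both cases. The trade-off is where the raw values live: with ingress protection they never reach storage, with egress protection they do and storage access must be restricted accordingly.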
Data storage
Use cloud storage offerings instead of Hadoop
Data Warehouse vs Data Lake
                    Data Warehouse                          Data Lake
Requirements        Relational requirements                 Diverse data, scalability, low cost
Data Value          Data of recognised high value           Candidate data of potential value
Data Processing     Mostly refined calculated data          Mostly detailed source data
Business Entities   Known entities, tracked over time       Raw material for discovering entities and facts
Data Standards      Data conforms to enterprise standards   Fidelity to original format and condition
Source: Microsoft
Data preparation & training
Offer self-service tools
[Diagram: Collect raw data -> Curate data -> Train & Score -> Take Insights Into Actions]
[Diagram labels: Make hypothesis, Identify variables, Split data, Build model, Validate model, SQL, Automated pipeline, Self service exploration]
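The split, build, and validate steps named in the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not the workflow from the slides: the data is synthetic, the "model" is a baseline that always predicts the training mean, and all function names are hypothetical.

```python
import random
import statistics


def split_data(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def build_model(train):
    """Baseline model: always predict the mean target seen in training."""
    mean_target = statistics.mean(y for _, y in train)
    return lambda x: mean_target


def validate_model(model, test):
    """Mean absolute error of the model on held-out rows."""
    return statistics.mean(abs(model(x) - y) for x, y in test)


# Synthetic (feature, target) pairs used only for the demonstration.
rows = [(x, 2.0 * x + 1.0) for x in range(20)]
train, test = split_data(rows)
model = build_model(train)
mae = validate_model(model, test)
```

A baseline like this gives a self-service user an error figure that any real model has to beat before it is worth putting into an automated pipeline.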
Use on-demand resources
Serve results to end consumers
Apply domain and product thinking