Data Engineering for Model-Based Systems Engineering:
Powering the Future of Complex System Design
Fundamentals, Applications, and Modern Technologies
Enabling Technologies:
• Storage, Processing, Analytics
Challenges & Solutions:
• Current Hurdles and Innovations
Case Study Snippets:
• DE & MBSE in Action
Identify key types of data generated and consumed in MBSE.
Describe fundamental DE concepts applied to MBSE data.
Definition (INCOSE):
• "Systems Engineering is an interdisciplinary approach and means to enable the realization of successful systems. It focuses on defining customer needs and required functionality early in the development cycle, documenting requirements, then proceeding with design synthesis and system validation while considering the complete problem..."
Common Languages/Notations:
• SysML (Systems Modeling Language), UML (Unified Modeling Language), UPDM, Arcadia/Capella.
INTRODUCING DATA ENGINEERING (DE): THE ENABLER
Definition:
• Data Engineering is the discipline focused on the practical application of data collection, storage, processing, and analysis. It involves designing, building, and maintaining the infrastructure and systems that allow organizations to handle and utilize large volumes of data efficiently and reliably.
Simulation & Analysis Results: • Performance metrics, Failure modes, Safety analyses, Trade-off studies. (Numerical, Tabular, Time-series)
V&V Data: • Test cases, Test results, Coverage analysis, Issue tracking. (Structured, Textual)
... generated and consumed by MBSE processes at scale.
Creates a holistic view.
3. Analytics: Enable advanced analytics on system models and lifecycle data (e.g., impact analysis, pattern detection, predictive maintenance insights).
4. Data Quality & Governance: Ensure consistency, reliability, and traceability of MBSE data across tools and teams.
5. Collaboration: Provide a robust data backbone supporting collaborative modeling and analysis.
6. Automation: Automate data flows for reporting, V&V, and model updates.
... data as first-class data assets requiring robust engineering practices.
SCOPE OF DE ACTIVITIES IN THE MBSE CONTEXT
WHAT DOES A DATA ENGINEER DO HERE? → SPECIFIC TASK:
A data-driven architecture linking information generated throughout the product lifecycle, connecting processes and enabling a holistic view of the asset's data (from concept to operation and disposal).
MBSE models form a critical part of the Digital Thread, representing the authoritative source of truth for system design and requirements.
DE provides the infrastructure and integration mechanisms to build and maintain the Digital Thread.
• Connecting disparate data silos (CAD, CAE, PLM, MBSE, ERP, MRO).
• Ensuring data traceability and consistency across the thread.
• Making Digital Thread data accessible for analysis and decision-making.
CORE CONCEPT 2: UNDERSTANDING MBSE DATA
-- WHAT KIND OF DATA ARE WE HANDLING?
DE Implications:
• Storage choice needs to handle relationships well (Graph DBs are often suitable).
• ETL/ELT needs to parse complex formats (e.g., XMI) and maintain semantic integrity.
• Data models need to accommodate heterogeneity and versioning.
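The parsing implication can be illustrated with a minimal sketch. It assumes a heavily simplified XMI-like document: real exports from MBSE tools are far richer and tool-specific, and the `SAMPLE_XMI` content, the `client`/`supplier` attribute layout, the type names, and the `parse_model` helper are all hypothetical. The namespace URI shown is the one used by XMI 2.5.1; other versions differ. "Semantic integrity" is reduced here to one check: every relationship must point at elements that exist.

```python
import xml.etree.ElementTree as ET

# Assumption: XMI 2.5.1 namespace; other XMI versions use a different URI.
XMI_NS = "http://www.omg.org/spec/XMI/20131001"

# Hypothetical, simplified export. Real tool output is much more complex.
SAMPLE_XMI = f"""<xmi:XMI xmlns:xmi="{XMI_NS}">
  <packagedElement xmi:id="R1" xmi:type="Requirement" name="Max braking distance"/>
  <packagedElement xmi:id="B1" xmi:type="Block" name="BrakeController"/>
  <packagedElement xmi:id="S1" xmi:type="Satisfy" client="B1" supplier="R1"/>
</xmi:XMI>"""


def parse_model(xmi_text):
    """Split a simplified XMI document into elements and relationships."""
    root = ET.fromstring(xmi_text)
    id_attr = f"{{{XMI_NS}}}id"      # namespaced attribute name for xmi:id
    type_attr = f"{{{XMI_NS}}}type"  # namespaced attribute name for xmi:type
    elements, relationships = {}, []
    for node in root.iter("packagedElement"):
        kind = node.get(type_attr)
        if node.get("client") and node.get("supplier"):
            # Treat anything with both endpoints as a relationship.
            relationships.append((node.get("client"), kind, node.get("supplier")))
        else:
            elements[node.get(id_attr)] = {"type": kind, "name": node.get("name")}
    # Integrity check: relationships whose endpoints do not resolve.
    dangling = [r for r in relationships
                if r[0] not in elements or r[2] not in elements]
    return elements, relationships, dangling


elements, relationships, dangling = parse_model(SAMPLE_XMI)
```

Keeping elements and relationships separate maps directly onto the nodes and edges of a graph store, and a non-empty `dangling` list is a signal to stop the load rather than persist a broken model.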
CORE CONCEPT 3: DATA PIPELINES FOR MBSE
-- MOVING AND TRANSFORMING MBSE DATA
Purpose:
• Automate the flow of data from MBSE tools and related sources into centralized storage and processing systems.
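As a rough illustration of such a flow, the sketch below chains extract, transform, and load steps into an in-memory SQLite database. It assumes the tool export arrives as JSON lines; the `RAW_EXPORT` records, the `model_element` table, and all function names are hypothetical, and a production pipeline would typically run under an orchestrator rather than as plain function calls.

```python
import json
import sqlite3

# Hypothetical tool export: one JSON record per line.
RAW_EXPORT = [
    '{"id": "R1", "type": "Requirement", "name": " Max braking distance "}',
    '{"id": "B1", "type": "Block", "name": "BrakeController"}',
    '{"id": "", "type": "block", "name": "unnamed"}',
]


def extract(lines):
    """Parse each exported line into a dict."""
    for line in lines:
        yield json.loads(line)


def transform(records):
    """Drop records without an id; normalize type case and name whitespace."""
    for rec in records:
        if not rec["id"]:
            continue
        yield (rec["id"], rec["type"].lower(), rec["name"].strip())


def load(rows, conn):
    """Write rows to central storage; re-running replaces existing ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_element "
        "(id TEXT PRIMARY KEY, type TEXT, name TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO model_element VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(lines, conn):
    load(transform(extract(lines)), conn)
    return conn.execute(
        "SELECT id, type, name FROM model_element ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
rows = run_pipeline(RAW_EXPORT, conn)
```

Using `INSERT OR REPLACE` keyed on the element id makes the load step idempotent, so the same export can be re-ingested after a failure without creating duplicates.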
Data Warehouses:
• Concept: Store structured, processed data optimized for reporting and BI. Schema-on-write.
• Use for MBSE: Storing aggregated metrics, KPIs, historical trends derived from models and tests.
• Technologies: Snowflake, BigQuery, Redshift, Synapse Analytics.
Graph Databases:
• Concept: Optimized for storing and querying highly connected data (nodes, edges, properties).
• Use for MBSE: Excellent fit for representing model structure, relationships, and traceability. Enables powerful pathfinding and impact analysis queries.
• Technologies: Neo4j, Amazon Neptune, Azure Cosmos DB (Graph API), TigerGraph.
Hybrid Approaches:
• Often a combination is best (e.g., Lake for raw data, Graph DB for model structure, Warehouse for reporting).
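To show why graph-shaped storage suits traceability, here is a small sketch of an impact-analysis traversal. It uses a plain in-memory adjacency map as a stand-in for a graph database, so it illustrates the query pattern rather than any product's API; the `TRACE_EDGES` data and function names are hypothetical.

```python
from collections import deque

# Hypothetical trace links: (source, target) means "target depends on source".
TRACE_EDGES = [
    ("REQ-1", "FUNC-1"),
    ("FUNC-1", "BLOCK-1"),
    ("BLOCK-1", "PART-1"),
    ("PART-1", "TEST-1"),
    ("REQ-2", "FUNC-2"),
]


def build_adjacency(edges):
    """Index edges by source node for fast neighbor lookup."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    return adjacency


def impacted_by(start, edges):
    """Breadth-first walk returning every element downstream of `start`."""
    adjacency = build_adjacency(edges)
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:  # guard against cycles and repeated paths
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


affected = impacted_by("REQ-1", TRACE_EDGES)
```

In a relational store the same question needs one self-join per hop (or a recursive query), which is the practical reason variable-depth traceability questions tend to be pushed to a graph engine.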
DATA GOVERNANCE & QUALITY IN MBSE
-- Ensuring Trustworthy Engineering Data
Why Critical?
• Decisions based on models (design choices, safety V&V) have high stakes. Errors in data can lead to system failures.
Key Aspects:
• Data Quality: Accuracy, Completeness, Consistency (across tools/models), Timeliness, Validity (conformance to standards/rules).
  • DE Role: Implement automated quality checks in pipelines, data profiling.
• Metadata Management: Documenting data sources, definitions, lineage (how data was transformed).
  • DE Role: Implement metadata catalogs (e.g., Apache Atlas, Alation, Collibra).
• Data Lineage: Tracking data from origin through transformations to consumption. Crucial for impact analysis and debugging.
  • DE Role: Tools and pipeline design should capture lineage.
• Master Data Management: Defining and managing authoritative sources for key entities (e.g., standard components, requirements).
• Access Control & Security: Protecting sensitive intellectual property and ensuring compliance.
Challenge:
• Applying governance across heterogeneous tools and processes.
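The "automated quality checks" and "capture lineage" roles can be sketched together. This is a minimal illustration, not a substitute for a metadata catalog: the `Batch` container, the two rules (required fields present, ids unique), and the lineage log format are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Records plus an ordered log of the steps that produced them."""
    records: list
    lineage: list = field(default_factory=list)


def normalize_names(batch):
    """Transformation step: trim names and record the step in the lineage."""
    cleaned = [{**rec, "name": (rec.get("name") or "").strip()}
               for rec in batch.records]
    return Batch(cleaned, batch.lineage + ["normalize_names"])


def check_quality(batch, required=("id", "name")):
    """Completeness and uniqueness checks; returns (record index, message)."""
    issues, seen_ids = [], set()
    for i, rec in enumerate(batch.records):
        for key in required:
            if not rec.get(key):
                issues.append((i, f"missing {key}"))
        rec_id = rec.get("id")
        if rec_id:
            if rec_id in seen_ids:
                issues.append((i, f"duplicate id {rec_id}"))
            seen_ids.add(rec_id)
    batch.lineage.append(f"check_quality: {len(issues)} issue(s)")
    return issues


# Hypothetical batch exported from one tool.
batch = Batch(
    [{"id": "R1", "name": " Brake "}, {"id": "R1", "name": "Dup"}, {"id": "R2", "name": ""}],
    ["export:tool_a"],
)
batch = normalize_names(batch)
issues = check_quality(batch)
```

Carrying the lineage log alongside the data means a failed check can be traced back to the source and the transformations applied, which is the debugging use the slide describes.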
Analytics On MBSE Data: Unlocking Insights
-- Beyond Modeling: Analyzing The System Data
Goal:
• Leverage the integrated MBSE data for deeper understanding and better decision-making.
Example Analytics Use Cases:
• Impact Analysis: "If I change this requirement or component, what other parts of the system (design, tests, documentation) are affected?" (Graph queries are powerful here).
• Model Completeness & Consistency Checks: Automated checks beyond basic tool validation (e.g., identifying orphaned elements, inconsistent naming).
• Requirements Coverage Analysis: Automatically verifying which requirements are covered by design elements and test cases.
• System Optimization: Analyzing simulation results across many runs to find optimal design parameters.
• Defect Prediction: Using historical data (model changes, test results, issues) to predict areas prone to defects.
• Design Pattern Recognition: Identifying recurring architectural patterns across projects.
Technologies:
• SQL, Cypher (for Graph DBs), Spark SQL/MLlib, Python (Pandas, Scikit-learn), BI tools (Tableau, Power BI).
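Two of these use cases, requirements coverage and orphaned-element detection, reduce to simple set operations once trace links are available as pairs. The sketch below assumes that form; the link data and function names are hypothetical, and "covered" is taken here to mean both satisfied by a design element and verified by a test case.

```python
def coverage_report(requirements, satisfies, verifies):
    """Per-requirement coverage flags.

    satisfies: (design_element, requirement) pairs
    verifies:  (test_case, requirement) pairs
    """
    satisfied = {req for _, req in satisfies}
    verified = {req for _, req in verifies}
    return {
        req: {"designed": req in satisfied, "tested": req in verified}
        for req in requirements
    }


def uncovered(report):
    """Requirements lacking a design element, a test case, or both."""
    return sorted(req for req, status in report.items()
                  if not (status["designed"] and status["tested"]))


def orphaned_elements(elements, links):
    """Elements that appear in no trace link at all."""
    linked = {end for link in links for end in link}
    return sorted(set(elements) - linked)


# Hypothetical model extract.
satisfy_links = [("B1", "R1"), ("B2", "R2")]
verify_links = [("T1", "R1")]
report = coverage_report(["R1", "R2", "R3"], satisfy_links, verify_links)
gaps = uncovered(report)
orphans = orphaned_elements(["B1", "B2", "B3"], satisfy_links)
```

The same logic translates to SQL anti-joins or a graph pattern match at scale; the set version is mainly useful for prototyping the rule before committing it to a pipeline.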
Current Challenges & Problems: The Hurdles We Face
Tool Interoperability:
• Different MBSE/Engineering tools use proprietary formats; getting data out and integrating it is hard. Standards like OSLC (Open Services for Lifecycle Collaboration) help but aren't universally adopted or sufficient.
Semantic Heterogeneity:
• Tools may represent similar concepts differently. Mapping them requires domain expertise and robust semantic mediation.
Scalability of Graph Queries:
• Analyzing very large system models (millions of elements/relationships) can challenge even specialized graph databases.
Data Quality Assurance:
• Ensuring consistency and accuracy across a fragmented toolchain is difficult. Lack of standard validation rules.
Versioning & Configuration Management:
• Handling evolving models and their associated data consistently across the DE platform.
Bridging Cultures:
• Systems Engineers and Data Engineers often have different backgrounds and terminology. Effective collaboration is key.
Security:
• Protecting valuable IP contained within models when data is centralized and integrated.
Cloud Platforms (AWS, Azure, GCP):
• Provide scalable storage (Data Lakes), processing (Spark, Serverless Functions), databases (Managed SQL, NoSQL, Graph), AI/ML services, and orchestration tools. Form the backbone of modern DE solutions.
Graph Databases:
• Maturing rapidly, offering better performance and features specifically for connected data relevant to MBSE (e.g., Neo4j, Neptune).
Knowledge Graphs:
• Combining graph databases with ontologies (semantic models) to handle heterogeneity and enable more intelligent querying. (Related to Semantic Web tech: RDF, OWL, SPARQL).
AI/ML Integration:
• Applying ML for predictive analytics, anomaly detection in simulation/test data, Natural Language Processing (NLP) on requirements documents.
Data Mesh Concepts:
• Decentralized approach to data ownership and pipelines, potentially aligning well with distributed engineering teams owning their domain data.
Low-Code/No-Code Platforms:
• Emerging platforms aim to simplify pipeline creation and data integration for domain experts.
Improved APIs & Standards:
• Tool vendors are slowly improving APIs; ongoing work in standards bodies (OMG, INCOSE, ISO STEP AP233/242).
DE Role: Ingested SysML models (e.g., from Cameo) into a Neo4j graph database. Developed Cypher queries to traverse the model graph (requirements -> functions -> logical blocks -> physical components -> V&V cases).
Outcome: Rapid identification of all affected elements, associated tests, and documentation, reducing analysis time from days to minutes.

DE Role: Built data pipelines using Kafka and Spark to ingest, clean, and correlate time-series sensor data, simulation logs, and test results. Stored in a Data Lake. Linked results back to requirements traced in the MBSE model (e.g., Rhapsody).
Outcome: Enabled comprehensive coverage analysis, automated report generation for safety standards (ISO 26262), and faster debugging of failed test scenarios.
Summary:
• MBSE is transforming systems engineering by using ...
• MBSE generates and consumes vast amounts of complex, connected data.
Call to Action:
Key Resources:
• INCOSE (International Council on Systems Engineering): www.incose.org (MBSE initiatives, SE ...
• Object Management Group (OMG): www.omg.org (SysML Specification)
Contact Info:
• [email protected]
• H205 – FATETA Building