Semantic Web 2022
a. Discuss the advantages and disadvantages of the Model-First Load-Later and Load-First Model-Later approaches.
1. Model-First Load-Later:
Advantages:
1. Better Control Over Data Structure: In the model-first approach, the data structure is
designed first, allowing you to define clear relationships and formats. This can help
maintain consistency and integrity across the system.
2. Optimized Performance: Since the model is already established, you can optimize data
loading for specific models. It helps in tailoring the load strategy for each model, thus
improving performance.
3. Data Integrity and Quality: This approach can enforce strong validation rules before
data is loaded, leading to better data quality and consistency.
4. Easier Integration with Existing Systems: If the model is already defined, integrating
with other systems can be more straightforward, as the data structure is well-
understood from the outset.
Disadvantages:
1. Potential for Delayed Data Availability: Since data is loaded after the model is defined,
there can be delays in data availability, which may affect the user experience or real-time
decision-making.
2. Difficult to Modify the Data Model: Any changes to the data model after implementation
can be difficult and costly, particularly when data has already been loaded into
the system.
3. Initial Complexity: The approach requires upfront work on the model before data can
be loaded, making the initial setup more time-consuming.
4. Requires Strong Planning: Since the model is created first, careful planning is needed to
ensure the model can accommodate future data requirements.
2. Load-First Model-Later:
Advantages:
1. Faster Data Availability: The load-first approach enables data to be available quickly,
without waiting for the model to be defined. This can be important in real-time or big
data applications where immediate access to data is necessary.
2. Flexible Data Modeling: Since data is loaded first, it gives you flexibility to adjust the
model based on the nature of the incoming data. This allows for dynamic schema
design.
3. Easy to Scale: Data can be loaded without waiting for the model, allowing for faster
scaling of systems as more data comes in.
Disadvantages:
1. Inconsistent Data Structure: Loading data before defining a model can result in
inconsistent or poorly structured data, which may require significant effort to clean and
standardize later.
2. Data Redundancy: In the absence of a defined model, data may be loaded redundantly
or inefficiently, leading to storage overhead and possible data integrity issues.
3. Complicated Maintenance: Modifying or updating the model after loading data can be
challenging and may involve significant restructuring, which can affect system stability
and performance.
In summary, Model-First Load-Later offers better structure and data integrity but comes at
the cost of potentially slower data availability and more upfront work. Load-First Model-
Later provides faster data access and flexibility but can lead to inconsistencies and
management difficulties in the long run.
b. Business Intelligence (BI) also deals with collecting,
integrating, and analyzing large volumes of data. How, then, is
Big Data different from BI?
While both Business Intelligence (BI) and Big Data deal with large volumes of data, they are
distinct in their focus, technologies, methods, and objectives.
1. Scope and Data Types:
Business Intelligence (BI):
Scope: BI typically focuses on structured data and historical data analysis. It is often
used to summarize and analyze past performance to support decision-making.
Data Types: BI primarily works with structured data (e.g., data from transactional
systems, databases, spreadsheets).
Big Data:
Scope: Big Data refers to massive datasets, often encompassing a wide variety of
data types, including structured, semi-structured, and unstructured data. It is not
just focused on historical data but also on real-time or near-real-time data
processing.
Focus: Big Data focuses on handling, processing, and analyzing large, complex
datasets, often using advanced analytics techniques like machine learning,
predictive analytics, and artificial intelligence.
Data Types: Big Data includes structured data, unstructured data (e.g., social media
posts, sensor data, images), and semi-structured data (e.g., logs, JSON, XML).
2. Tools and Technologies:
Business Intelligence (BI):
Tools: BI tools include dashboards, reporting software (e.g., Tableau, Power BI, Qlik),
OLAP (Online Analytical Processing) cubes, and data warehousing solutions.
Big Data:
Tools: Big Data tools include Hadoop, Apache Spark, NoSQL databases (e.g.,
MongoDB, Cassandra), and data lakes for storing and processing large datasets.
3. Analysis Type:
Business Intelligence (BI):
Analysis Type: BI relies mainly on descriptive querying, reporting, and OLAP-style
analysis of historical data.
Big Data:
Analysis Type: Big Data involves more advanced analytics, such as predictive
analytics, machine learning, and data mining, to uncover patterns, correlations, and
insights that aren't apparent through traditional methods.
4. Objective and Use Cases:
Business Intelligence (BI):
Objective: BI aims to summarize past performance and support operational and
strategic decision-making.
Big Data:
Objective: Big Data aims to manage and analyze vast amounts of data to discover
insights, identify trends, and make predictions. It helps businesses handle
complexity and leverage diverse data types.
Use Cases: Examples include real-time fraud detection, personalized
recommendations (e.g., Netflix), predictive maintenance in manufacturing, and
sentiment analysis on social media.
5. Data Volume:
Business Intelligence (BI):
Volume: BI typically works with comparatively smaller volumes of structured data held
in databases and data warehouses.
Big Data:
Volume: Big Data deals with extremely large datasets, often in the petabyte or
exabyte range, and scales to handle massive amounts of diverse data from multiple
sources.
6. Complexity:
Business Intelligence (BI):
Complexity: BI is less complex, focusing mainly on querying, reporting, and dashboards
over structured data.
Big Data:
Complexity: Big Data is more complex as it deals with diverse, high-volume datasets
and often requires advanced algorithms, distributed computing, and machine
learning for analysis.
Summary of Differences:
Aspect | Business Intelligence (BI) | Big Data
Focus | Historical data and performance analysis | Real-time, large, diverse datasets
Technology | OLAP, data warehousing, BI tools (e.g., Tableau, Power BI) | Hadoop, Spark, NoSQL, data lakes
Complexity | Less complex, focus on querying and reporting | Highly complex, involving machine learning and AI
In essence, Business Intelligence is more about analyzing historical, structured data for
reporting and decision-making, while Big Data is about handling vast, varied, and real-time
datasets for advanced analytics and predictions.
The V's of Big Data represent different characteristics that are essential to understand when
dealing with large, complex datasets in real-world applications. Here's a breakdown of
Veracity, Variability, Visibility, and Viability, with a focus on their importance in real-world
use cases:
1. Veracity:
Definition: Veracity refers to the trustworthiness or quality of the data. It highlights the
accuracy, reliability, and consistency of data, which can vary due to noise, errors, or
inconsistencies during collection or processing.
Data Quality Assurance: High veracity is essential for making reliable decisions
based on data. For example, in healthcare, accurate patient data is critical for
diagnosis and treatment.
Data Cleaning: In Big Data environments, ensuring data veracity often involves
cleaning, normalizing, and validating data to avoid misleading conclusions or
decisions.
Example: In an autonomous vehicle system, sensor data must have high veracity to
ensure that the vehicle's navigation and decision-making processes are based on correct
and reliable inputs.
2. Variability:
Definition: Variability refers to how data changes over time or across different contexts,
and the challenge it poses in terms of consistency. It encompasses fluctuations in data
patterns or values due to multiple factors such as seasonality, user behavior, or external
influences.
3. Visibility:
Definition: Visibility refers to the accessibility and transparency of data. It involves how
easily the data can be understood, tracked, and analyzed by decision-makers or systems.
Visibility helps ensure that data is available in a usable and actionable form.
Real-Time Monitoring: In industries like healthcare and finance, visibility allows for
the real-time tracking of operations or conditions. For example, hospital staff must
have visibility into patient vitals to act quickly in emergencies.
Data Governance: Visibility also plays a role in data governance, ensuring that
stakeholders can track how data is used, who has access to it, and how it’s
protected.
Example: In a logistics company, visibility into supply chain data (e.g., location, delivery
status) enables managers to optimize routes, manage inventory, and provide updates to
customers in real-time.
4. Viability:
Definition: Viability refers to the usefulness and value of data in relation to specific
business goals or objectives. It answers the question of whether the data can be
effectively utilized to support decision-making, innovation, or operational efficiency.
Business Alignment: Data must be viable to meet business objectives. For example,
customer data in a CRM system needs to be actionable for marketing strategies and
sales efforts.
Example: In the agricultural sector, sensor data about soil moisture and weather
patterns needs to be viable for farmers to make actionable decisions about irrigation
and crop management.
Summary Table:
V | Definition | Importance in Real-World Use Cases
Veracity | The trustworthiness and quality of data | Essential for reliable decisions in domains such as healthcare and autonomous systems, where actions depend on accurate inputs.
Variability | How data fluctuates over time or across different contexts | Important for dynamic industries like e-commerce, finance, and manufacturing where data patterns change and need to be adapted to.
Visibility | The transparency and accessibility of data for analysis and action | Crucial for real-time monitoring in sectors like healthcare, logistics, and finance to track and act on data effectively.
Viability | The usefulness and value of data in relation to business objectives | Ensures that data contributes to business goals, making it critical for decision-making, cost-effectiveness, and innovation.
In conclusion, these V's are fundamental for managing Big Data effectively in real-world
applications. Ensuring data veracity, managing variability, providing data visibility, and
assessing data viability are crucial for driving valuable insights and making informed
decisions across various industries.
Use case: a logistics company wants to improve its delivery operations using analytics.
1. Descriptive Analytics:
Example: The company analyzes past delivery data to evaluate delivery times, fuel
consumption, traffic patterns, and driver performance. Descriptive analytics will summarize
the following:
Outcome: The company can identify trends such as routes that consistently take longer,
frequent delays in specific areas, or vehicles that consume more fuel than others.
Tools: Reports, dashboards, data visualization (e.g., bar charts, line graphs).
2. Predictive Analytics:
Example: Using the historical data analyzed in descriptive analytics, the company builds
predictive models to forecast:
Expected delivery time based on factors like weather, traffic, and day of the week.
Fuel consumption estimates for different routes, considering traffic congestion and
vehicle efficiency.
Outcome: Predictive models may indicate that on certain days, traffic patterns in a particular
area could cause significant delays, and deliveries are likely to take 10-15% longer.
Tools: Machine learning models (e.g., regression analysis, time series forecasting), historical
data.
3. Prescriptive Analytics:
Example: Using the predictive insights, the company uses prescriptive analytics to optimize
routes and delivery schedules:
Suggest alternative routes that avoid traffic congestion or road closures, saving fuel and
time.
Recalculate optimal delivery times for drivers, avoiding peak traffic hours.
Recommend the best vehicles for each route based on fuel efficiency and predicted
delivery time.
Outcome: The company can implement optimized delivery schedules that reduce delivery
time by 20%, fuel consumption by 10%, and increase the overall on-time delivery rate.
Tools: Optimization algorithms, decision models, simulations.
Analytics Type | Purpose | Example in the Use Case
Descriptive Analytics | Analyzing past data to understand what happened | Analyzing past delivery times, fuel consumption, and delays
Predictive Analytics | Forecasting what is likely to happen | Predicting delivery times and fuel consumption from traffic, weather, and day of week
Prescriptive Analytics | Recommending what should be done | Optimizing routes, schedules, and vehicle assignments
Visualization:
Imagine this in a flowchart-like diagram:
1. Descriptive Analytics – Look at past delivery data (e.g., "Last week's deliveries: Route X
took 45 minutes, 10% delayed").
2. Predictive Analytics – Predict future delivery times (e.g., "Traffic patterns tomorrow will
cause a 10% delay on Route X").
3. Prescriptive Analytics – Recommend actions (e.g., "Re-route tomorrow's Route X deliveries
to avoid the predicted congestion").
This use case shows how all three analytics types—descriptive, predictive, and prescriptive
—work together to improve decision-making in logistics, leading to optimized and efficient
operations.
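A minimal Python sketch of the descriptive and predictive steps described above; the CSV file and column names are assumptions for illustration, not real company data, and the prescriptive step (e.g., route optimization) would sit on top of such a model.
python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical columns: route, day_of_week, traffic_index, minutes
df = pd.read_csv("deliveries.csv")

# Descriptive analytics: summarize what happened per route
print(df.groupby("route")["minutes"].agg(["mean", "max", "count"]))

# Predictive analytics: estimate delivery time from traffic and day of week
X = pd.get_dummies(df[["day_of_week", "traffic_index"]], columns=["day_of_week"])
y = df["minutes"]
model = LinearRegression().fit(X, y)
print("Predicted minutes for the first record:", model.predict(X.iloc[[0]])[0])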
A Linked Data Browser is a tool or application that allows users to view, navigate, and
interact with linked data on the web. Linked data refers to a method of structuring,
connecting, and querying data in a way that it can be easily interlinked across different data
sources, typically using standard web protocols and formats such as HTTP, URIs (Uniform
Resource Identifiers), RDF (Resource Description Framework), and SPARQL.
1. Navigation Across Linked Datasets:
The browser allows users to browse data that is linked across different datasets.
Linked data enables the connection between resources through relationships
expressed in the form of triples (subject-predicate-object) based on RDF.
For example, a linked data browser could allow a user to navigate from one entity
(e.g., a book) to related entities (e.g., the author, publisher, or genre) by following
links between them.
2. Visualization of Data:
Many browsers render retrieved resources and their relationships as tables, graphs, or
other visualizations, helping users see how entities are connected.
3. SPARQL Queries:
Many linked data browsers include a SPARQL query interface, which allows users to
directly query the underlying datasets. SPARQL is a query language designed for
querying RDF data, making it possible to retrieve specific pieces of information
across multiple linked data sources.
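For instance, a browser's SPARQL interface could issue a query like the following (illustrative; it lists the outgoing links of a single DBpedia resource):
sparql
SELECT ?property ?value
WHERE {
  <http://dbpedia.org/resource/Tim_Berners-Lee> ?property ?value .
}
LIMIT 20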
4. Exploration of Open Data:
Linked data browsers are often used to explore open data sources available on the
Web, such as government data, cultural heritage collections, scientific datasets, and
more. These datasets are often published in linked open data (LOD) formats,
allowing users to explore and combine data from various domains.
For example, starting from a book, a user can follow links to the author of the book,
related books, or the geographical locations associated with the book.
Why is it Important?
Interoperability: Linked data browsers enable the integration of data from different
domains, enhancing the possibility of drawing connections and insights across different
types of information.
Data Discovery: They support discovering new and relevant information from various
datasets in a way that is transparent and navigable.
Better Data Access: These browsers simplify the access to complex datasets and allow
non-technical users to navigate and explore them easily.
In summary, a Linked Data Browser is a crucial tool for exploring, visualizing, and querying
data that is linked across different sources, providing an interactive way to navigate the
interconnected web of data.
Linked Data Access Mechanisms are the methods and technologies used to retrieve and
interact with linked data that is distributed across the web. Linked data is typically structured
using standard web protocols such as HTTP, and is often represented in formats like RDF
(Resource Description Framework). The main goal of linked data access mechanisms is to
allow seamless and efficient retrieval of data from multiple, interrelated datasets.
1. HTTP URIs (Dereferenceable URIs)
Usage: Each resource, such as an entity or concept (e.g., a book or a person), is assigned
a URI, and data about the resource can be retrieved by sending an HTTP request to that
URI.
Example: http://dbpedia.org/resource/Albert_Einstein
Importance: HTTP URIs are fundamental for accessing linked data because they provide
a consistent and standardized way to identify and retrieve data from various datasets
across the web. They allow resources to be accessible and shareable in a distributed
manner.
2. RDF (Resource Description Framework)
Usage: RDF provides a way to model data in the form of triples, making it possible to link
data in a machine-readable way.
Example: <http://dbpedia.org/resource/Albert_Einstein>
<http://www.w3.org/2000/01/rdf-schema#label> "Albert Einstein"@en
3. SPARQL Queries and Endpoints
Usage: SPARQL queries are used to interact with linked data by querying specific
resources, filtering data based on certain conditions, and retrieving connected data
points.
Example (one possible form):
sparql
SELECT ?name WHERE {
  ?book <http://purl.org/dc/terms/creator> ?author .
  ?author <http://www.w3.org/2000/01/rdf-schema#label> ?name .
}
Importance: SPARQL enables powerful querying of linked data across multiple datasets.
It allows data to be retrieved in a structured manner, making it possible to perform
complex data analytics, transformations, and integrations.
4. Linked Data Platform (LDP)
Usage: LDP provides standardized methods to access linked data using HTTP methods
(GET, POST, PUT, DELETE) to interact with data resources.
Example: An API endpoint following LDP could allow retrieving data about a
resource, adding new data, or updating existing data.
Importance: LDP promotes the use of linked data through RESTful services, enabling
easy integration and access to data in distributed systems. It is especially useful for
building web applications that need to work with linked data.
5. Data Dumps
Usage: Data dumps are typically used when real-time access to linked data is not
necessary, and large volumes of data need to be analyzed or processed offline.
Example: A linked data source might offer a downloadable RDF file containing a
comprehensive dataset about cultural heritage.
Importance: Data dumps provide an easy way to access large datasets in bulk. However,
they do not support real-time querying or dynamic data updates, so they are not
suitable for applications requiring frequent data changes.
6. Content Negotiation
Usage: The client indicates its preferred serialization via the HTTP Accept header, and the
server returns the data in the requested format, allowing the client to receive linked data
in a format that is easiest to process.
Example request header:
Accept: application/ld+json
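A minimal sketch of content negotiation from code, assuming the Python requests package and network access; the resource URI is reused from the earlier DBpedia example:
python
import requests

# Ask DBpedia for Turtle instead of the default HTML page.
response = requests.get(
    "http://dbpedia.org/resource/Albert_Einstein",
    headers={"Accept": "text/turtle"},
    allow_redirects=True,
)

print(response.headers.get("Content-Type"))
print(response.text[:300])  # beginning of the Turtle serialization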
7. Linked Data Libraries and Tools
Usage: These tools abstract much of the complexity involved in querying and processing
linked data, making it easier for developers to integrate linked data into their
applications.
Example: Libraries like Apache Jena, RDFLib (Python), or rdflib.js (JavaScript) provide
APIs to query and manage RDF data.
Importance: These libraries hide much of the low-level parsing and query handling,
reducing the amount of custom data-handling code developers must write.
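A minimal sketch with RDFLib (one of the libraries mentioned above): dereference a linked data URI, load the returned triples into a local graph, and query them with SPARQL. Assumes the rdflib package and network access.
python
from rdflib import Graph

g = Graph()
# rdflib negotiates an RDF serialization and parses the response into triples.
g.parse("http://dbpedia.org/resource/Albert_Einstein")

query = """
SELECT ?label WHERE {
    <http://dbpedia.org/resource/Albert_Einstein>
        <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    FILTER (lang(?label) = "en")
}
"""
for row in g.query(query):
    print(row.label)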
8. Caching Mechanisms
Definition: Caching involves storing previously retrieved linked data locally or in
intermediary servers to improve access speed and reduce server load.
Usage: Data is cached in response to frequent queries, which ensures faster access
times for commonly requested data.
Example: A linked data browser might cache results from SPARQL queries, reducing
the need to repeatedly fetch the same data from a remote server.
Importance: Caching improves the performance and scalability of applications that rely
on linked data, especially when accessing large datasets or making frequent queries.
Summary of Access Mechanisms:
Mechanism | Description | Example Use Case
HTTP URIs | Dereferenceable identifiers for resources | Looking up http://dbpedia.org/resource/Albert_Einstein
RDF | Triple-based, machine-readable data model | Linking entities across datasets
SPARQL | Query language and protocol for RDF | Filtering and joining data from multiple sources
LDP | RESTful read/write access over HTTP | Web applications that create and update resources
Data dumps | Bulk download of whole datasets | Offline analysis of a cultural heritage dataset
Content negotiation | Requesting a preferred serialization | Accept: application/ld+json
Libraries and tools | Toolkits such as Apache Jena or RDFLib | Embedding linked data handling in applications
Caching | Local storage of frequently used results | Faster repeated SPARQL queries
Conclusion:
The access mechanisms for linked data are diverse and tailored to different use cases, from
querying real-time data using SPARQL to downloading data dumps for offline use. These
mechanisms enable the integration and use of linked data across various domains,
supporting applications in areas like research, government data, and knowledge
management.
Building a linked data application involves several layers: data access, storage, querying,
and presentation. Below is an outline of a typical framework for building linked data
applications, showing the key components involved.
1. Data Sources and Access
Linked Data Repositories: Linked data applications interact with various data sources or
repositories that publish data in RDF format or other linked data-compatible formats.
Data Integration: Linked data applications often need to integrate multiple data
sources. This is achieved by utilizing shared identifiers (URIs) and relationships defined in
RDF data.
SPARQL Endpoints: Many linked data sources expose a SPARQL endpoint that allows
developers to query the data programmatically.
2. Data Querying
SPARQL Queries: The SPARQL Protocol and RDF Query Language is used to query linked
data. This is the primary method for retrieving specific information from linked data
sources.
Example: A SPARQL query might fetch information about a specific resource, such as
a book or author.
sparql
SELECT ?book ?title WHERE {
  ?book <http://purl.org/dc/terms/title> ?title .
}
Querying Steps:
3. Data Representation
Data Formats: Linked data is typically represented in formats like RDF/XML, Turtle, or
JSON-LD.
Data Transformation: In some cases, linked data must be transformed or converted into
another format (e.g., from RDF to a relational database or a JSON object).
4. Data Storage
Triple Stores: A dedicated database for storing RDF data, called a triple store, allows for
efficient querying of large amounts of linked data.
NoSQL Databases: Sometimes, linked data applications may store data in NoSQL
databases such as MongoDB or GraphDB that support RDF or graph data models.
Storage Considerations:
Choose an appropriate database or storage mechanism based on the data's scale and
query requirements.
5. Data Presentation
Linked Data Browsers: A tool or user interface that allows users to explore linked data
interactively. These browsers help users visualize relationships between data resources
and navigate across linked datasets.
Technologies: HTML, CSS, JavaScript, frameworks like React or Angular, and libraries
such as D3.js for data visualization.
Presentation Flow:
Allow users to navigate between related entities in the linked data (e.g., clicking on an
author's name to see their works).
6. Data Interaction (APIs)
RESTful APIs: Many linked data applications expose RESTful APIs for programmatic
access to data, allowing other systems or applications to retrieve linked data.
SPARQL Endpoint APIs: Some applications expose a SPARQL endpoint that allows
remote querying of their linked data.
Interaction Process:
Expose API endpoints that return linked data in various formats (e.g., RDF, JSON-LD).
Allow users or other applications to interact with the data by sending queries or
requests.
Ensure proper handling of linked data relationships (e.g., book -> author -> publisher).
Example: Building a Linked Data Application for Authors and Books
1. Data Sources: You might pull data from DBpedia and Wikidata to get information about
authors, books, and publishers.
2. Querying: Use SPARQL to retrieve information about a particular book and its author.
SPARQL Query (illustrative, using DBpedia's dbo:author property):
sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?book ?title ?authorName WHERE {
  ?book dbo:author ?author ;
        rdfs:label ?title .
  ?author rdfs:label ?authorName .
  FILTER (lang(?title) = "en" && lang(?authorName) = "en")
}
LIMIT 10
3. Data Representation: After retrieving the data in RDF format, the application processes
and converts it into a more user-friendly format such as JSON-LD or directly into HTML
for the front-end display.
4. Storage: The application can store the data in a triple store (like Virtuoso) for easy
querying and efficient management of the data.
5. Presentation: On the front-end, users can view a list of books along with their authors
and other related information. Links to related data (e.g., other books by the author) are
provided for easy navigation.
6. Interactivity: Users can click on any book or author to fetch more details about them,
and SPARQL queries are sent in the background to update the data in real-time.
Technologies Used:
JSON-LD: For serializing linked data in JSON format.
Triple Stores: For storing and querying large amounts of RDF data.
Web Frameworks: For building the user interface (e.g., React, Angular).
DBpedia
Description: DBpedia is one of the most well-known linked data projects, extracting
structured data from Wikipedia and publishing it as Linked Data. It covers a wide
range of domains, including people, places, organizations, events, and more.
Link: http://dbpedia.org
Wikidata
Link: https://www.wikidata.org
GeoNames
Link: http://www.geonames.org
Europeana
Link: https://www.europeana.eu
OpenCyc
Description: OpenCyc is a free and open knowledge base of general knowledge that
provides a formal representation of common-sense knowledge. It is widely used for
integrating diverse datasets, particularly for natural language understanding and
artificial intelligence applications.
Link: http://www.opencyc.org
Linked Open Data (LOD) Cloud
Description: The LOD Cloud brings together openly licensed, interlinked datasets from many
domains, such as geographic data, social sciences, biology, and more.
The Linked Open Data Cloud visualizes the interlinking of datasets. It includes
datasets from DBpedia, Wikidata, OpenCyc, and many others, all of which are
interconnected to form a global data network.
The diagram is accessible online, showing the relationships between different data
sets.
Data.gov (US):
Description: Data.gov is a U.S. government portal that offers a wide array of open
datasets for public use. It supports linked data principles, providing datasets in RDF
and other accessible formats.
Link: https://www.data.gov
Link: https://data.gov.uk
Description: The European Data Portal offers access to datasets from the EU
institutions and member states, providing a wealth of linked data resources.
Link: https://www.europeandataportal.eu
4. Research and Scientific Datasets
Various scientific communities and organizations publish their datasets as linked data to
facilitate collaboration and discovery across disciplines.
Bio2RDF
Link: http://bio2rdf.org
Link: https://www.pangaea.de
Link: http://www.linkedopendata.org/drugdata
Link: http://www.od4d.net
Description: This foundation promotes open data standards and offers numerous
linked data resources, primarily for social, economic, and policy data.
Link: https://okfn.org
Description: LOSS is a project that offers linked data resources for the social
sciences, providing access to demographic, economic, and other social data.
Link: https://www.socialscience.soton.ac.uk
Description: Yelp offers a rich set of data regarding businesses, reviews, and
locations. The Yelp Data API allows developers to access linked data for building
location-based applications.
Link: https://www.yelp.com/developers
Such business and commercial datasets are also used by Google and other companies
for semantic search and knowledge graph development.
Key Takeaways
Linked data resources on the web represent diverse datasets from various domains, all
interconnected using standard web technologies such as RDF, URIs, and SPARQL. These
resources enable a vast array of applications in areas such as cultural heritage, government,
science, and business intelligence. By leveraging linked data, developers can build rich
applications that integrate and navigate across different data sources, offering new insights
and connections. The availability of these open resources is a key factor in the growth and
adoption of linked data technologies on the web.
1. Linked Data
Linked Data is based on simple principles, including:
Using HTTP to access resources so that they can be queried and retrieved on the web.
This makes data more accessible and enables users to discover relationships and gain
insights from different sources.
2. The Semantic Web
The Semantic Web aims to:
Provide a framework for data that is interoperable across different systems and
platforms.
Use technologies like RDF, OWL (Web Ontology Language), SPARQL, and other
standards to represent and query data in a more meaningful way.
The goal is to make data not just available but also understandable by machines, enabling
more intelligent search, reasoning, and automated decision-making.
3. Linked Data vs. the Semantic Web
Linked Data focuses on making data available and interlinked on the web using simple
principles like URIs and RDF. It makes it easier for machines and humans to access and
navigate interconnected data across the web.
The Semantic Web, on the other hand, goes beyond just linking data. It focuses on
adding meaning (semantics) to the data and enabling advanced reasoning and
interpretation of that data. The Semantic Web relies on more advanced technologies,
such as ontologies (using OWL) and formalized vocabularies, to provide the foundation
for automated data processing.
4. Challenges for Linked Data to Become the Semantic Web
While Linked Data is a critical step toward realizing the Semantic Web, several challenges
must be overcome for it to fully transform into the Semantic Web:
Scalability: While Linked Data allows data to be interconnected, the web contains vast
amounts of data. Managing, querying, and reasoning over massive datasets in real-time
is still a challenge.
Data Quality and Semantics: Linked Data focuses on making data accessible, but
ensuring that data is accurately semantically enriched and not just linked is important
for achieving the Semantic Web's true potential. Data sources need to be more than just
interlinked; they need to provide rich semantics that machines can use to understand
context and meaning.
Standardization: Despite the availability of standards like RDF, OWL, and SPARQL, the
adoption of these standards is still limited. There are multiple competing standards and
approaches, and not all data is structured in a way that supports Linked Data principles.
Achieving widespread interoperability across various data sources and platforms
remains a challenge.
Data Privacy and Security: As more data becomes interconnected, issues of privacy,
security, and control become more significant. Ensuring that data sharing and
interlinking respect privacy laws and user consent is critical for the Semantic Web to be
widely adopted.
Integration with Legacy Systems: Many existing systems and datasets are not built with
Linked Data principles in mind. Integrating them into a global, interoperable framework
can be difficult and may require significant transformation.
5. The Future of Linked Data and the Semantic Web
Richer semantics and reasoning will enable more intelligent
applications, such as intelligent search engines, advanced recommendation systems,
and data-driven decision-making.
Advancements in AI and Machine Learning: Linked Data can serve as a critical resource
for AI and machine learning systems, providing rich, structured datasets for training
models. These systems can use semantic reasoning to understand context,
relationships, and meaning in data.
Smart Cities and IoT: Linked Data can play a pivotal role in the development of smart
cities and IoT applications, where devices, sensors, and systems need to share and
interpret data across diverse domains. By adopting semantic technologies, these
systems can understand and react to data in a more intelligent, automated way.
Web of Data: The future of the web may be centered around data that is not just linked,
but also understood by machines, enabling a new wave of innovation in fields like
personalized medicine, autonomous vehicles, and personalized content delivery.
The evolution from Linked Data to the Semantic Web will depend on:
Ensuring that data is not only accessible but also semantically meaningful.
Thus, while Linked Data is a critical step in the journey toward the Semantic Web, further
advancements in semantics, reasoning, and interoperability are needed for Linked Data to
fully become the Semantic Web as envisioned by Tim Berners-Lee.
Semantic HTML refers to the use of HTML tags that convey meaning about the content they
contain, rather than just providing structure. It involves using HTML elements that describe
the type of content they hold, making it more accessible, understandable, and manageable
both for humans and machines (like search engines and assistive technologies).
1. Improved Accessibility
Screen readers: Semantic HTML helps screen readers to understand the structure and
purpose of a webpage. For example, using <article> , <header> , <footer> , and
<nav> tags makes it easier for screen readers to convey the content to visually impaired
users.
Keyboard navigation: It makes keyboard navigation more intuitive since elements like
<button> and <a> are recognized as interactive components.
2. Improved Search Engine Optimization (SEO)
Content clarity: Semantic elements help search engines understand the structure and
hierarchy of a webpage. This allows search engines to more accurately index content,
improving search rankings.
Structured data: Search engines use semantic HTML to interpret the context of the
content (such as distinguishing between an article, a heading, or a navigation menu).
This enhances the visibility and relevance of content in search results.
3. Cleaner and More Maintainable Code
Readability: Semantic HTML provides a cleaner, more readable code structure. For
developers, it’s easier to maintain and modify code when the content is well-structured
and clearly defined using semantic tags.
Easier debugging: It is easier to debug semantic HTML since the tags convey the
meaning of the content. It also allows developers to quickly identify parts of the
webpage that perform specific functions, such as <header> , <footer> , <section> , or
<main> .
4. Enhanced Usability
Consistency: Semantic HTML creates a consistent structure, which makes it easier for
users to navigate different websites. Consistent use of semantic tags allows browsers to
render content in a standard and predictable manner.
Mobile optimization: It improves the website’s adaptability to different screen sizes and
devices, enhancing the user experience, especially in responsive web design.
5. Machine Readability
Data extraction: Semantic HTML helps bots and search engines extract information in a
meaningful way. For example, structured data like <article> tags or <h1> tags helps
web crawlers understand the main topics of the page and their hierarchy.
Linked Data: It aids in linking data across the web, enhancing the potential for
integrating structured information from multiple sources.
Common Semantic HTML Elements:
<header> : Defines a header for a section or page, often containing navigational links,
branding, and introductory information.
<nav> : Denotes a navigation section containing links to other pages or sections of the
same page.
<main> : Specifies the main content of a document, excluding header, footer, and sidebar
elements.
<aside> : Represents content that is tangentially related to the main content, such as
sidebars or pull quotes.
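A minimal sketch of a page built with these elements (the content is placeholder text):
html
<body>
  <header>
    <h1>City News</h1>
    <nav>
      <a href="/">Home</a>
      <a href="/archive">Archive</a>
    </nav>
  </header>
  <main>
    <article>
      <h2>Library Reopens Downtown</h2>
      <p>The renovated library opened its doors this weekend.</p>
    </article>
    <aside>
      <p>Related: children's reading programmes start next month.</p>
    </aside>
  </main>
  <footer>
    <p>&copy; 2022 City News</p>
  </footer>
</body>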
Conclusion:
The function of Semantic HTML is to improve accessibility, search engine optimization, code
structure, usability, and machine readability by using HTML elements that accurately
describe the content they contain. By making webpages more understandable for both users
and systems, semantic HTML enhances the overall web experience and supports a more
organized and interoperable web.
A knowledge graph organizes information as a network of entities and relationships. Its key
components are:
1. Entities (Nodes): These are the "things" or objects in the knowledge base. Entities could
represent people, places, organizations, products, or abstract concepts. For example, in
a product knowledge graph, entities might include items like "laptop," "smartphone,"
"company," or "customer."
2. Relationships (Edges): These define how entities are related to each other. A relationship
links two entities and describes how they interact or connect. For example, a relationship
might connect "laptop" and "company" with a relationship labeled "manufactured by."
3. Attributes: These are the properties or characteristics of entities. For example, an entity
"laptop" may have attributes such as "brand," "processor type," "price," and "release
year."
4. Context: Knowledge graphs incorporate the context in which entities and relationships
exist. This context helps derive meaning from the data. For instance, a relationship
between "laptop" and "company" might be different based on the country of operation,
time period, or market segment.
5. Metadata: Additional information about the data itself (e.g., source, credibility,
timestamp) that helps to interpret the meaning of the entities and relationships.
Example Use Case: Product Recommendations in an E-Commerce Platform
Database-Centric Approach:
Table Structure: A relational database may store product data in tables such as
Products , Categories , Customers , and Orders . The relationships between them are
defined using foreign keys.
Knowledge Graph-Centric Approach:
Entities: Products, customers, orders, categories, brands, and features are entities in the
knowledge graph.
Relationships: The graph establishes relationships like "purchased by," "related to,"
"reviewed by," "belongs to category," and "has feature."
Attributes: Each product entity could have attributes like "price," "color," "brand,"
"rating," and "availability."
Context and Inference: The knowledge graph allows for rich recommendations by
inferring relationships. For instance, if a user buys a "smartphone," the graph might
recommend accessories (like cases or headphones) that are "related to" the smartphone,
and it might suggest a specific brand based on the user's past behavior (e.g., preference
for "Apple" products).
Advantages of the Knowledge Graph-Centric Approach:
1. Natural Representation of Relationships:
A knowledge graph connects data in ways that are more natural to human
understanding. It reflects how entities are related in the real world, such as how
"laptops" are connected to "manufacturers" or how "customers" are linked to
"purchase history."
2. Context-Aware Insights:
The ability to interpret data in its full context allows knowledge graphs to provide
more meaningful insights. For example, in a knowledge graph, "smartphone" might
have different meanings in various contexts (e.g., by brand, by feature, or by market
segment), and it can evolve as new relationships or facts are added.
3. Complex Querying:
Knowledge graphs support more complex queries, like finding entities that are
indirectly related. For example, "Find customers who bought a 'laptop' and also
purchased 'headphones' in the last 6 months" can be done more intuitively with
knowledge graphs.
4. Evolving Data:
As new information is added, knowledge graphs can continuously evolve, learn, and
infer new relationships. For example, if new data on "customer preferences"
becomes available, a knowledge graph can infer new recommendations or business
insights.
5. Semantic Understanding:
Because entities and relationships carry explicit semantics (often defined in
ontologies), machines can interpret the meaning of the data and perform automated
reasoning over it.
Conclusion
The shift from database-centric models to knowledge base/knowledge graph-centric models
reflects a transition from simple data storage to more complex, interconnected, and
semantically rich systems. Knowledge graphs offer a more holistic approach to
understanding and utilizing data, providing better support for complex queries, context-
aware insights, and automated reasoning.
In the use case of an e-commerce platform, a knowledge graph enables more accurate
product recommendations, personalized experiences, and deeper insights into user
behavior, providing a richer and more scalable solution than traditional databases. This shift
towards knowledge graph-based systems is becoming increasingly common in areas like e-
commerce, social media, healthcare, and more, as organizations seek to harness the power
of interconnected data to drive innovation and provide meaningful, context-aware services.
iii) Suppose it is asserted that both Bob and Peter are the father of
Mary. Obviously there is a semantic error here. How should
the semantic model make this error impossible?
iv) What relationship exists between "is child of" and "is
parent of"?
i.) Relationship Between Males and Females
OWL Constructs:
Object Property: In OWL, you would use an object property to define relationships
between two individuals (males and females in this case). You can define an object
property like hasPartner or isMarriedTo to represent the relationship.
Domain and Range: You can specify that the domain of the property is the class of
males and the range is the class of females, or vice versa, depending on the
direction of the relationship. This would look like:
xml
<owl:ObjectProperty rdf:about="#hasPartner">
  <rdfs:domain rdf:resource="#Male"/>
  <rdfs:range rdf:resource="#Female"/>
</owl:ObjectProperty>
RDFS Constructs:
rdfs:domain and rdfs:range can be used in RDFS to define which classes the
property applies to.
For example:
xml
<rdf:Property rdf:about="#hasPartner">
  <rdfs:domain rdf:resource="#Male"/>
  <rdfs:range rdf:resource="#Female"/>
</rdf:Property>
ii.) Relationship Between Persons, Males, and Females
Relationship: "is a type of" (Subclass relationship)
Description: The relationship between males, females, and persons is that both
males and females are subclasses of persons. This means that every male or female
is also a person, but not every person is necessarily a male or female (there could be
other subclasses of persons, such as non-binary).
RDFS Constructs:
xml
<rdfs:Class rdf:about="#Male">
  <rdfs:subClassOf rdf:resource="#Person"/>
</rdfs:Class>
<rdfs:Class rdf:about="#Female">
  <rdfs:subClassOf rdf:resource="#Person"/>
</rdfs:Class>
iii.) Semantic Error: Bob and Peter Cannot Both Be the Father of Mary
Problem: In the real world, Bob and Peter cannot both be the father of Mary
simultaneously unless there is a specific family structure that permits it (e.g., in a
polyamorous relationship with legal recognition of multiple fathers). To model this in an
ontology, we should ensure that each person has only one father.
Solution:
OWL Object Property: You would use an object property to represent the
relationship:
xml
<owl:ObjectProperty rdf:about="#hasFather">
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
  <rdfs:domain rdf:resource="#Person"/>
  <rdfs:range rdf:resource="#Male"/>
</owl:ObjectProperty>
Declaring hasFather as a functional property means each person can have at most
one father. If Bob and Peter are both asserted as Mary's father and are declared to
be different individuals, a reasoner detects an inconsistency, thus making the error
impossible.
iv.) Relationship Between "is child of" and "is parent of"
Relationship: Inverse Relationship
Description: The relationship between "is child of" and "is parent of" is one of
inverse properties. If someone is the child of another person, that person is their
parent, and vice versa.
OWL Constructs:
Object Properties Inverse: In OWL, you can define inverse relationships using the
inverseOf property. This ensures that if isChildOf exists between two individuals,
then isParentOf automatically holds as its inverse. For example:
xml
<owl:ObjectProperty rdf:about="#isChildOf">
  <rdfs:domain rdf:resource="#Person"/>
  <rdfs:range rdf:resource="#Person"/>
  <owl:inverseOf rdf:resource="#isParentOf"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="#isParentOf">
  <rdfs:domain rdf:resource="#Person"/>
  <rdfs:range rdf:resource="#Person"/>
</owl:ObjectProperty>
This defines isChildOf as the inverse of isParentOf , so whenever one
relationship holds, the inverse relationship holds as well.
Here are some of the key inductive techniques typically applied to knowledge graphs:
1. Graph Embedding
Overview: Graph embedding techniques transform graph data (nodes, edges, and
relationships) into low-dimensional vector representations that capture the graph's
structural and semantic properties.
Techniques:
Node Embedding: Algorithms like node2vec and DeepWalk aim to represent nodes
in a graph as vectors while preserving their local neighborhood structure. This
allows the system to infer relationships or similarities between entities based on
their proximity in the graph.
Graph Embedding Models: Approaches like TransE, DistMult, and ComplEx model
relationships between entities in a knowledge graph and create embeddings that
capture both entities and their relationships in continuous vector spaces. These
embeddings can be used for tasks such as link prediction, classification, and
clustering.
Applications:
Link Prediction: Inferring missing links (relationships) in the graph by identifying
similar nodes or patterns.
2. Path-based Learning
Overview: This approach focuses on learning from paths that connect nodes in the
graph. A path represents a sequence of edges and nodes between two entities, and
these paths can carry rich semantic meaning.
Techniques:
Path Ranking Algorithms: Algorithms like PATHNER and TransPath aim to rank or
score paths in a graph to predict possible relationships between entities. These
techniques consider the semantic information conveyed by the entire path, not just
the individual edges.
Random Walks: Random walk-based models use the concept of walking through the
graph from one node to another to capture relationships and infer possible
connections.
Applications:
Link Prediction: Using the paths between nodes to predict new relationships
between them.
Knowledge Transfer: Extracting new facts by learning from paths that may involve
multiple entities or relationships.
Techniques:
Graph Convolutional Networks (GCNs): A type of GNN that propagates information
across nodes by considering the node’s neighbors, making it well-suited for graph-
based prediction tasks.
Applications:
Link Prediction: GNNs can predict missing edges by learning the graph's topology
and node features.
Node Classification: Classifying entities in the graph based on their features and
relations.
4. Rule Induction
Overview: Rule induction involves learning patterns or rules from the data, typically
expressed as logical implications or relational patterns in the graph.
Techniques:
Inductive Logic Programming (ILP): Learns logical rules from positive and negative
examples expressed over the graph's entities and relations.
Relational Inductive Logic: This extends ILP to handle relational data, including
nodes and edges, where the relationships themselves can be generalized into rules.
Applications:
5. Clustering and Community Detection
Overview: These techniques are used to group entities in a graph that are similar or
closely connected, which can help discover hidden patterns and relationships.
Techniques: Community detection algorithms (e.g., Louvain, label propagation) and
clustering over node embeddings group densely connected or similar nodes.
Applications:
Entity Matching: Identifying similar or equivalent entities across different
knowledge graphs or datasets.
Conclusion
Inductive techniques in knowledge graphs enable systems to go beyond explicitly stated
facts and make inferences, predictions, and generalizations based on the data structure and
relationships within the graph. These techniques, ranging from graph embeddings to
community detection and rule induction, are essential for extracting knowledge, enhancing
recommendation systems, discovering hidden patterns, and improving the scalability and
accuracy of knowledge graph-based applications.
Knowledge Graph Embeddings and TransE
By transforming the graph data into vectors, knowledge graph embedding (KGE) models
capture the relationships between entities, allowing for operations like measuring the
similarity between entities or predicting missing relationships.
In TransE, an entity embedding is a vector that represents an entity, and a relation
embedding is a vector that represents the relation between two entities.
The key idea behind TransE is that the relation between two entities can be viewed as a
translation from one entity to another. Specifically, for a triple in the form of (h, r, t),
where:
The main objective is to learn embeddings for h, r , and t such that the following equation
holds:
h+r ≈t
where:
Entity Embeddings
Entity embeddings in TransE are represented as vectors that encode the semantic meaning
of an entity in the knowledge graph. Each entity ei in the graph is assigned an embedding vi ,
a continuous vector in a low-dimensional space. These entity embeddings are learned during
the training process.
For example:
If we have entities like "Paris" and "France" and a relation "capital_of", the embeddings for
"Paris" and "France" will be vectors in the low-dimensional embedding space, and the
embedding for "capital_of" will also be a vector encoding the relationship between them.
Relation Embeddings
Relation embeddings are also represented as vectors in the same embedding space. Each
relation r in the knowledge graph is assigned a vector vr , which encodes the semantics of
that relation.
For example:
The relation "capital_of" would have a vector that reflects how the entities "Paris" and
"France" are related (in this case, Paris is the capital of France).
1. Initialization:
Each entity ei and relation rj is initialized with a random vector in the embedding
space.
2. Objective Function:
TransE defines an objective function based on the distance between the entities and
the relation. The objective is to minimize the distance between the sum of the head
entity h and the relation r , and the tail entity t.
L = Σ_(h,r,t) ‖h + r − t‖_p
where p can be 1 (L1-norm) or 2 (L2-norm), and the summation is over all observed
triples in the training set.
3. Training:
During training, the embedding vectors for entities and relations are adjusted to
minimize the objective function. In practice, negative sampling is often used to
generate "negative" triples (i.e., triples that do not hold in the graph) to help the
model distinguish valid triples from invalid ones.
For each valid triple (h, r, t), the model tries to make the distance h + r − t small,
while for a corrupted (negative) triple, it tries to make the distance large.
4. Learned Embeddings:
After training, the learned embeddings for entities and relations capture the
semantic properties of the entities and the nature of their relationships. For
example, the embedding for "capital_of" will have a vector that, when added to the
embedding of "France," brings it close to the embedding of "Paris."
Summary of TransE Model for Entity and Relation Embeddings
Entity Embeddings: The embeddings for entities are vectors that represent their
meaning in the knowledge graph. They are learned by the model during training.
Relation Embeddings: The embeddings for relations are vectors that capture the
semantic meaning of the relationships between entities. They define how one entity
(head) is related to another entity (tail).
Objective: The primary objective of TransE is to learn embeddings such that the head
entity plus the relation approximately equals the tail entity in the embedding space, i.e.,
h + r ≈ t.
Training: The embeddings are optimized by minimizing the distance between h + r and
t for positive triples and maximizing the distance for negative triples.
TransE's simplicity and effectiveness have made it a popular choice for knowledge graph
embedding tasks, particularly for link prediction and graph completion.
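A small NumPy sketch of the TransE scoring idea and the margin-based ranking loss; the vectors below are random stand-ins rather than trained embeddings:
python
import numpy as np

dim = 50
rng = np.random.default_rng(0)

# Stand-ins for learned embeddings: h = "France", r = "capital_of", t = "Paris"
h = rng.normal(size=dim)
r = rng.normal(size=dim)
t = rng.normal(size=dim)
t_neg = rng.normal(size=dim)  # tail of a corrupted (negative) triple

def score(h, r, t, p=2):
    """Distance ||h + r - t||_p: small for valid triples, large for invalid ones."""
    return np.linalg.norm(h + r - t, ord=p)

margin = 1.0
loss = max(0.0, margin + score(h, r, t) - score(h, r, t_neg))
print(score(h, r, t), score(h, r, t_neg), loss)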
Here is the R2RML mapping document for the given database schema and its RDF
representation:
turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://data.example.com/> .

<TriplesMap_EMP>
    rr:logicalTable [ rr:tableName "EMP" ] ;
    rr:subjectMap [ rr:template "http://data.example.com/employee/{EMPNO}" ; rr:class ex:Employee ] ;
    rr:predicateObjectMap [ rr:predicate ex:name ; rr:objectMap [ rr:column "ENAME" ] ] ;
    rr:predicateObjectMap [
        rr:predicate ex:department ;
        rr:objectMap [ rr:template "http://data.example.com/department/{DEPTNO}" ; rr:termType rr:IRI ]
    ] .

<TriplesMap_DEPT>
    rr:logicalTable [ rr:tableName "DEPT" ] ;
    rr:subjectMap [ rr:template "http://data.example.com/department/{DEPTNO}" ; rr:class ex:Department ] ;
    rr:predicateObjectMap [ rr:predicate ex:name ; rr:objectMap [ rr:column "DNAME" ] ] ;
    rr:predicateObjectMap [ rr:predicate ex:location ; rr:objectMap [ rr:column "LOC" ] ] ;
    rr:predicateObjectMap [
        rr:predicate ex:staff ;
        rr:objectMap [
            rr:parentTriplesMap <TriplesMap_EMP> ;
            rr:joinCondition [ rr:child "DEPTNO" ; rr:parent "DEPTNO" ]
        ]
    ] .
Explanation:
1. Prefixes:
rr: is the R2RML vocabulary; ex: is the base namespace for the generated resources.
2. TriplesMap_EMP:
Maps each row of the EMP table (column names EMPNO, ENAME, and DEPTNO are assumed
here, following Figure 1) to an ex:Employee resource, with ex:name taken from ENAME and
ex:department linking to the corresponding department resource.
3. TriplesMap_DEPT:
Maps each row of the DEPT table to an ex:Department resource identified by DEPTNO, with
ex:name, ex:location, and an ex:staff link derived from the join on DEPTNO.
This R2RML document ensures the conversion of the database into the desired RDF triples
shown in Figure 2.
b. Draw the working flows of direct and R2RML mappings.
Write the advantages of R2RML mapping over direct
mapping.
Direct mapping is an automatic process that converts relational database tables directly into
RDF without additional customization.
Steps:
1. Extract Schema: The database schema is read (tables, columns, and relationships).
2. Generate URIs and Map Columns:
Columns are mapped to RDF properties, and the primary key is used as part of the
URI.
3. Assign Classes:
Each table is treated as a class, and rows are instances of that class.
4. Output RDF Graph: The resulting RDF graph is generated without customization.
Diagram:
plaintext
Database Tables → Automatic Schema Extraction → Generic RDF Generation → RDF Triples
(Direct Mapping)
R2RML mapping is a user-defined process that provides greater control over the RDF
generation process.
Steps:
1. Define Mapping Rules:
Define how tables, columns, and relationships map to RDF resources, classes, and
predicates.
2. Extract Data:
Rows of the source tables are read according to the logical table definitions.
3. Generate RDF:
RDF resources, predicates, and objects are created as per the R2RML rules.
4. Output RDF Graph: The resulting RDF graph is customized based on user-defined
mapping rules.
Diagram:
plaintext
Database Tables → R2RML Mapping Document → Custom RDF Generation → RDF Triples
(R2RML Mapping)
1. Customization and Flexibility:
R2RML allows full control over how relational data is mapped to RDF. You can specify
templates, predicates, and custom relationships, whereas direct mapping follows a
fixed pattern.
2. Complex Relationships:
R2RML supports advanced features like join conditions and hierarchical mappings,
which are not possible with direct mapping.
3. Custom URIs and Vocabulary Reuse:
R2RML enables custom URI patterns and namespaces, making RDF data more
meaningful and compatible with linked data principles.
4. Class Assignments:
R2RML allows mapping rows to specific RDF classes, whereas direct mapping uses a
generic mapping where every table becomes a class.
5. Scalability for Complex Schemas:
R2RML is suitable for complex database schemas with many relationships, providing
a structured and meaningful RDF representation.
Summary:
Direct Mapping is faster and easier for small-scale databases or quick conversions but
lacks flexibility.
R2RML Mapping provides granular control, making it ideal for large, complex databases
where semantic meaning and ontology reuse are important.
1. RDF/XML
RDF/XML is the W3C-recommended XML-based syntax for RDF. It is suitable for systems that
rely on XML-based data processing.
Example:
xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:ex="http://data.example.com/">
<rdf:Description rdf:about="http://data.example.com/employee/7369">
<rdf:type rdf:resource="http://data.example.com/Employee"/>
<ex:name>SMITH</ex:name>
<ex:department rdf:resource="http://data.example.com/department/10"/>
</rdf:Description>
<rdf:Description rdf:about="http://data.example.com/department/10">
<rdf:type rdf:resource="http://data.example.com/Department"/>
<ex:name>APPSERVER</ex:name>
<ex:location>NEW YORK</ex:location>
<ex:staff>1</ex:staff>
</rdf:Description>
</rdf:RDF>
2. Turtle
Turtle (Terse RDF Triple Language) is a human-readable format for RDF that uses prefixes
and compact syntax.
Example:
turtle
@prefix ex: <http://data.example.com/> .

<http://data.example.com/employee/7369> a ex:Employee ;
    ex:name "SMITH" ;
    ex:department <http://data.example.com/department/10> .

<http://data.example.com/department/10> a ex:Department ;
    ex:name "APPSERVER" ;
    ex:location "NEW YORK" ;
    ex:staff 1 .
3. JSON-LD
JSON-LD is a JSON-based serialization format for RDF, suitable for web-based applications
due to its lightweight and linked data-friendly structure.
Example:
json
{
"@context": {
"ex": "http://data.example.com/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
},
"@graph": [
{
"@id": "ex:employee/7369",
"@type": "ex:Employee",
"ex:name": "SMITH",
"ex:department": {"@id": "ex:department/10"}
},
{
"@id": "ex:department/10",
"@type": "ex:Department",
"ex:name": "APPSERVER",
"ex:location": "NEW YORK",
"ex:staff": 1
}
]
}
4. N-Triples
N-Triples is a line-based plain text format where each triple is written as a single line. It is
suitable for processing large datasets due to its simplicity.
Example:
plaintext
<http://data.example.com/employee/7369> <http://www.w3.org/1999/02/22-rdf-syntax-
ns#type> <http://data.example.com/Employee> .
<http://data.example.com/employee/7369> <http://data.example.com/name> "SMITH" .
<http://data.example.com/employee/7369> <http://data.example.com/department>
<http://data.example.com/department/10> .
<http://data.example.com/department/10> <http://www.w3.org/1999/02/22-rdf-syntax-
ns#type> <http://data.example.com/Department> .
<http://data.example.com/department/10> <http://data.example.com/name> "APPSERVER" .
<http://data.example.com/department/10> <http://data.example.com/location> "NEW
YORK" .
<http://data.example.com/department/10> <http://data.example.com/staff> "1" .
JSON-LD: Web-friendly, integrates easily with JSON APIs, and suits modern web-based applications.
By selecting a suitable format based on the use case, RDF data can be effectively shared and
utilized in semantic web applications.
1. Web-Friendliness
JSON-LD: Based on JSON, a widely used and well-supported data format in web
technologies, making it easy to integrate with APIs, front-end frameworks, and web
services.
Turtle: Designed specifically for RDF and is not inherently compatible with web
technologies like JSON.
JSON-LD: Supported by a wide range of tools and programming languages due to its
foundation in JSON, making it easier to parse, manipulate, and serialize.
Turtle: Requires specialized libraries for parsing and handling, which are less commonly
available compared to JSON parsers.
3. Ease of Adoption
JSON-LD: Developers familiar with JSON can quickly adopt and understand JSON-LD,
even without prior knowledge of RDF.
Turtle: Requires familiarity with RDF concepts and Turtle syntax, which has a steeper
learning curve for non-RDF experts.
JSON-LD: Used extensively for embedding structured data into web pages, especially for
search engine optimization (SEO). Google and other search engines explicitly
recommend JSON-LD for schema markup.
Turtle: Not intended for embedding structured data into HTML documents, limiting its
use in SEO.
JSON-LD: Combines data and context in a single compact representation, reducing the
need for external ontologies or definitions.
Turtle: Requires prefixes for namespaces, which can sometimes make it less intuitive
and verbose.
6. Ease of Embedding
JSON-LD: Can be embedded directly into HTML documents inside a
<script type="application/ld+json"> block.
Turtle: Cannot be directly embedded into HTML documents; it is primarily a standalone
serialization format.
7. Linking to the Wider Web of Data
JSON-LD: Easily integrates with the broader web of data, enabling context definitions to
be linked via URLs.
Turtle: Focuses on RDF triples without built-in mechanisms for linking context
definitions.
JSON-LD’s ability to integrate with modern web technologies and its alignment with JSON
ecosystems make it more versatile for many contemporary use cases.
Use Case
Reification is used to make statements about statements, for example to record who
asserted a fact (provenance, trust, or attribution).
Example
"Alice says that Bob is the author of the book 'Semantic Web'."
In RDF:
Subject: ex:Bob
Predicate: ex:isAuthorOf
Object: ex:SemanticWebBook
turtle
ex:Bob ex:isAuthorOf ex:SemanticWebBook .
Reification
To make a statement about the triple itself (e.g., that Alice says this), we reify the statement.
This involves creating a new resource that represents the triple and then making statements
about this resource.
1. Describe the Statement as a Resource
The reified statement is described using the following properties:
rdf:subject
rdf:predicate
rdf:object
turtle
ex:Statement1 rdf:type rdf:Statement .
ex:Statement1 rdf:subject ex:Bob .
ex:Statement1 rdf:predicate ex:isAuthorOf .
ex:Statement1 rdf:object ex:SemanticWebBook .
2. Add Metadata
turtle
ex:Statement1 ex:saidBy ex:Alice .
(Here ex:saidBy is an illustrative property used to attach the provenance information to the
reified statement.)
Explanation
ex:Statement1 represents the RDF triple: "Bob is the author of Semantic Web."
Advantages
1. Tracks provenance and source of information.
Limitations
1. Verbosity: Reification increases the number of triples and can make data more verbose.
2. Performance: Querying reified data can be slower due to the additional triples.
3. Complexity: Reified data is harder to interpret and manage compared to simple triples.
In modern RDF systems, alternatives like named graphs or property graphs are often used to
address these limitations.
turtle
@prefix ex: <http://www.example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Defining Classes
ex:Person rdf:type rdfs:Class .
ex:Actor rdf:type rdfs:Class .
ex:Director rdf:type rdfs:Class .
ex:Movie rdf:type rdfs:Class .
# Subclass Relationships
ex:Actor rdfs:subClassOf ex:Person .
ex:Director rdfs:subClassOf ex:Person .
# Defining Properties
ex:directs rdf:type rdf:Property .
ex:stars rdf:type rdf:Property .
ex:name rdf:type rdf:Property .
ex:title rdf:type rdf:Property .
ex:year rdf:type rdf:Property .
turtle
# Director Instance
ex:reitman rdf:type ex:Director ;
ex:name "Jason Reitman" .
# Movie Instance
ex:juno rdf:type ex:Movie ;
ex:title "Juno" ;
ex:year "2008"^^xsd:gYear .
# Actor Instances
ex:page rdf:type ex:Actor ;
    ex:name "Ellen Page" .
ex:cera rdf:type ex:Actor ;
    ex:name "Michael Cera" .
# Relationships
ex:reitman ex:directs ex:juno .
ex:juno ex:stars ex:page, ex:cera .
Explanation
1. Terminological Triples:
Define the ontology, such as classes (e.g., Person , Actor ) and properties
( directs , stars ).
2. Asserted Facts:
Represent specific instances (e.g., reitman is a Director with the name "Jason
Reitman").
This Turtle model fully represents the RDF graph shown in Figure 3.
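A short sketch showing how the RDFS entailments discussed next can be materialized programmatically, assuming the rdflib and owlrl Python packages and a file movies.ttl containing the Turtle above:
python
from rdflib import Graph
from owlrl import DeductiveClosure, RDFS_Semantics

g = Graph()
g.parse("movies.ttl", format="turtle")  # hypothetical file with the triples above

before = len(g)
DeductiveClosure(RDFS_Semantics).expand(g)  # adds rdfs:subClassOf / rdf:type inferences
print(len(g) - before, "inferred triples added")

# e.g. ex:page should now also be typed as ex:Person
print(g.query("""
    PREFIX ex: <http://www.example.org/>
    ASK { ex:page a ex:Person }
""").askAnswer)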
1. Inferred Triples
Below are the triples inferred from the RDFS semantics, along with justifications for each.
1. Triple:
turtle
2. Triple:
turtle
1.2 Type Inheritance for Instances (RDFS Rule: rdf:type and rdfs:subClassOf ) 3. Triple:
turtle
ex:page rdf:type ex:Person .
Justification:
ex:Actor rdfs:subClassOf ex:Person means that all ex:Actor instances are also
ex:Person .
4. Triple:
turtle
ex:cera rdf:type ex:Person .
Justification:
ex:cera rdf:type ex:Actor together with ex:Actor rdfs:subClassOf ex:Person entails this
triple. This is a new triple.
5. Triple:
turtle
ex:reitman rdf:type ex:Person .
Justification:
ex:reitman rdf:type ex:Director together with ex:Director rdfs:subClassOf ex:Person
entails this triple. This is a new triple.
1.3 Property Domain and Range (RDFS Rule: rdfs:domain and rdfs:range ) 6. Triple:
turtle
Justification: Inferred based on the usage of ex:directs in the graph, linking a Director
( ex:reitman ) to a Movie ( ex:juno ). This is a new triple.
7. Triple:
turtle
Justification:
ex:Actor rdfs:subClassOf ex:Person infers that ex:stars 's domain is
ex:Person .
8. Triple:
turtle
9. Triple:
turtle
Justification: Reflexivity of rdfs:subClassOf . Explicitly true but not explicitly stated. This is a
new triple.
10. Triple:
turtle
Justification: Reflexivity of rdfs:subClassOf . Explicitly true but not explicitly stated. This
is a new triple.
ex:reitman rdf:type ex:Person
Summary Table
Triple | Reason for Inference | New Triple?
C. Write an SPARQL query to retrieve the following query:
"Retrieve the number of movies starred by Ellen Page per
year" The query must compile and include a proper
definition of the prefixes used.
sparql
PREFIX ex: <http://www.example.org/>

SELECT ?year (COUNT(?movie) AS ?movieCount)
WHERE {
  ?movie ex:stars ex:page .
  ?movie ex:year ?year .
}
GROUP BY ?year
Explanation:
1. Prefixes:
ex: is the namespace for your custom data model, as shown in the graph
( http://www.example.org/ ).
2. Query Logic:
?movie ex:stars ex:page filters the movies where Ellen Page starred.
?movie ex:year ?year retrieves the year of the movie.
3. Result:
The query will return a list of years and the count of movies Ellen Page starred in for
each year.