A lot is being published about Linked Data. I just saw that the Spring 2009 PricewaterhouseCoopers technology forecast is full of data Web and Semantic Web coolness. But before I jump into the forecast, I would like to give some background on the Linked Data work that is happening in the industry today.
Linking Open Data (LOD) is a W3C project. According to their web site, "The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources. RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications. ... Collectively, the data sets consist of over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links (May 2009)."
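To make the idea of "RDF links between data items from different data sources" a bit more concrete, here is a minimal sketch in plain Python. It is not how any LOD tool is implemented. The two data sources, their hosts (`encyclopedia.example`, `geo.example`), and the `countryCode` property are hypothetical names made up for illustration; only the `owl:sameAs` and `rdfs:label` URIs are real vocabulary terms.

```python
from urllib.parse import urlparse

# Real vocabulary terms commonly used in Linked Data.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Two hypothetical data sources, keyed by the host that publishes them.
# Each triple is (subject, predicate, object); all example URIs are made up.
SOURCES = {
    "encyclopedia.example": [
        ("http://encyclopedia.example/resource/Berlin", RDFS_LABEL, "Berlin"),
        # The RDF link: this item is the same thing as an item in another source.
        ("http://encyclopedia.example/resource/Berlin", OWL_SAME_AS,
         "http://geo.example/place/berlin"),
    ],
    "geo.example": [
        ("http://geo.example/place/berlin",
         "http://geo.example/ontology#countryCode", "DE"),
    ],
}


def describe(uri):
    """Return the triples about `uri` from the source that hosts it."""
    host = urlparse(uri).netloc
    return [t for t in SOURCES.get(host, []) if t[0] == uri]


def follow_links(uri, seen=None):
    """Collect triples about `uri`, following owl:sameAs links across sources."""
    seen = set() if seen is None else seen
    if uri in seen:  # guard against link cycles
        return []
    seen.add(uri)
    triples = describe(uri)
    collected = list(triples)
    for _, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            collected.extend(follow_links(obj, seen))
    return collected


if __name__ == "__main__":
    for triple in follow_links("http://encyclopedia.example/resource/Berlin"):
        print(triple)
```

Starting from one item, the crawl picks up the country code that only the second source knows about, which is the kind of navigation a Semantic Web browser or crawler performs over real, dereferenceable URIs.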
Here is the LOD figure showing what is linked today (actually March 2009):

To get a feel for what is included, note that DBpedia (the bigger circle in the left center of the image) provides structured access to Wikipedia's human-oriented data through a SPARQL interface. According to DBpedia's web site, "The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). It features labels and short abstracts for these things in 30 different languages; 609,000 links to images and 3,150,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories. The DBpedia knowledge base has several advantages over existing knowledge bases: it covers many domains; it represents real community agreement; it automatically evolve as Wikipedia changes, and it is truly multilingual. The DBpedia knowledge base allows you to ask quite surprising queries against Wikipedia, for instance “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century”. Altogether, the use cases of the DBpedia knowledge base are widespread and range from enterprise knowledge management, over Web search to revolutionizing Wikipedia search."
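As a rough sketch of what the "cities in New Jersey with more than 10,000 inhabitants" question could look like as a SPARQL request, here is a small Python example using only the standard library. The class and property names in the query (`dbo:City`, `dbo:isPartOf`, `dbo:populationTotal`) are my assumptions about the DBpedia vocabulary, which has changed over time, so the query may need adjusting. The function names `build_request` and `run_query` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# DBpedia's public SPARQL endpoint.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:City, dbo:isPartOf and dbo:populationTotal may not
# match the properties DBpedia actually uses for every city.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:isPartOf dbr:New_Jersey ;
        dbo:populationTotal ?population .
  FILTER (?population > 10000)
}
ORDER BY DESC(?population)
"""


def build_request(query, endpoint=DBPEDIA_ENDPOINT):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, endpoint=DBPEDIA_ENDPOINT):
    """Send the query and flatten each result row to {variable: value}."""
    with urlopen(build_request(query, endpoint), timeout=30) as response:
        results = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Requires network access; the rows returned depend on the live data.
    for row in run_query(QUERY):
        print(row["city"], row["population"])
```

The point is that the answer comes back as structured rows rather than as a list of HTML pages, so another application can use it directly.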
Going back to Tim Berners-Lee's request for us to imagine what it would be like to have people load and connect knowledge, let's imagine what all this data can do for a business and its decision-making processes ...
Web 3.0 is coming up (a lot) in posts on Read-Write Web and in other places. One Read-Write Web posting (The Web of Data, written by Alexander Korth in April of this year) discussed the three aspects of the next web (Web 3.0): "In the coming years, we will see a revolution in the ability of machines to access, process, and apply information. This revolution will emerge from three distinct areas of activity connected to the Semantic Web: the Web of Data, the Web of Services, and the Web of Identity providers. These webs aim to make semantic knowledge of data accessible, semantic services available and connectable, and semantic knowledge of individuals processable ...".
Tim Berners-Lee focused on the Web of Data in his TED talk on the next Web (recorded in Feb 2009). The talk is only a little longer than 15 minutes, and I highly recommend it. The key point is that we are now moving from a document-centric approach to storing information to making raw data available and processable. That raw data is "linked data": data about things (identified by URIs), including other interesting information (as RDF triples) and highlighting the relationships between the things. It is important to note that this is not about making data available through specific APIs or anticipated/pre-programmed queries on a "pretty" web site, but about making the "unadulterated data" available for machine understanding and new uses. It is about sharing and adding to data, making connections and relationships in novel ways, and bridging disciplines.
If you think about business and an enterprise, think about how powerful this would be: to capture knowledge, share it via social networking technologies, allow updates and additions to the knowledge within the enterprise (again using the social networking tools of today), and bridge disciplines and knowledge using Semantic Web mining and matching technologies.
Overall, we improve the ability of the enterprise to capture and access its knowledge, and increase the captured knowledge. In the talk, Tim Berners-Lee asks people to imagine the "incredible resource" of "people doing their bit to produce a little bit, and it all connecting."
Just imagine ...