- Category: Blog
- Published on Monday, 16 May 2011 22:04
- Written by Pantelis Sopasakis
The use of RDF in OpenTox - An overview of the experience gained regarding RDF technologies and RDF parsing/serialization in Java. Evaluation of the choice of RDF as the common exchange file format among OpenTox web services.
In this article we introduce the reader to the notions of the Semantic Web (or Web of Data), the Resource Description Framework (RDF) and the Web Ontology Language (OWL). We explain how these technologies are incorporated in the OpenTox framework for the needs of predictive toxicology, and we share our experience with RDF parsing and serialization. We present various benchmarking results on RDF processing and evaluate the use of RDF in OpenTox, outlining its pros and cons.
Let us first describe the web as we already know it, in order to identify certain inadequacies. The World Wide Web holds such a plethora of information that you can find practically everything: news, the weather, tutorials, music, videos and much more. This information is currently distributed so that different servers own different data and, so to speak, keep it to themselves. Data are formatted in HTML (or other loosely structured formats); they are rarely linked to data provided by other servers or applications, and they do not allow a machine to carry out any kind of reasoning. The knowledge found in a web page is tucked away: no software can understand it, it cannot be processed, and no inference can be applied to it.
There is huge potential in a web of data, otherwise called the semantic web. According to w3.org, the semantic web provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries. A web of data requires an infrastructure that links data to each other and allows their integration.
Bob has a short paragraph on his web page, formatted in HTML, where he talks about himself. This is what it looks like:
<h3>Bob Smith - Personal Info</h3>
<p>Hi Folks, my name is Bob Smith and I am currently working for <em>XYZ international</em> - check out the <a href=...>company site</a>.</p>
<p>For more information you can mail me at bob_smith[at]yahoo.it or call me in the office. The number is +34567891011. I was born in Nicago in 1985</p>
This is fine when it is to be presented to a human. But what about a machine? What would a software application understand from the above text, and what deductions could it make? More questions arise: for example, what if one wants to find on the web a person whose first name is Bob and who is 25 to 27 years old? For that to be feasible, we need an infrastructure that governs how data are formatted so that computers are able to understand them. As Ivan Herman has put it, "Imagine of a web where documents are available for download on the Internet but there would be no hyperlinks among them".
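To make the difficulty concrete, here is a minimal Python sketch of what a program might try to do with Bob's page. It is only an illustration: the helper names `page_text` and `guess_birth_year` are made up for this example, and the regular expression is a guess about how people phrase a birth year, not a general solution.

```python
import re
from html.parser import HTMLParser

# Bob's page, copied from the example above (the link target is elided there too).
PAGE = """
<h3>Bob Smith - Personal Info</h3>
<p>Hi Folks, my name is Bob Smith and I am currently working for <em>XYZ international</em> - check out the <a href=...>company site</a>.</p>
<p>For more information you can mail me at bob_smith[at]yahoo.it or call me in the office. The number is +34567891011. I was born in Nicago in 1985</p>
"""


class TextExtractor(HTMLParser):
    """Collects the visible text of a page and ignores the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_text(html):
    """Return the page as one flat string with normalized whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def guess_birth_year(text):
    """Heuristic (an assumption of this sketch): take the first
    four-digit number that follows the words "born in"."""
    match = re.search(r"born in\D*(\d{4})", text)
    return int(match.group(1)) if match else None
```

All the program gets from the HTML is a flat string, so every fact has to be guessed from the wording. `guess_birth_year(page_text(PAGE))` happens to find 1985 here, but the same function returns `None` for a sentence such as "I came into the world in 1985", and nothing in the markup tells the program which words are a first name, an employer or a place.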
Let us now see what Bob Smith did to allow clients to parse this information. He introduced his own XML schema and published the following document...
Here the information is much more structured than before. However, a machine must be aware of Bob Smith's XML schema in order to parse the document and "understand" it. And since anyone can invent an XML schema to describe what they have in mind, this does not help much towards a web of data.
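The limitation of a private schema can be sketched in a few lines of Python. Bob's actual document is not reproduced here, so the XML below is invented for illustration: the element and attribute names (`person`, `name`, `birth` and so on) are assumptions, and the second document, with its author "Alice", is entirely hypothetical. Only Bob's personal details are taken from his HTML page above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in Bob's home-made schema (tag names are invented).
BOB_XML = """
<person>
  <name first="Bob" last="Smith"/>
  <employer>XYZ international</employer>
  <email>bob_smith[at]yahoo.it</email>
  <phone>+34567891011</phone>
  <birth place="Nicago" year="1985"/>
</person>
"""

# The same kind of facts, published by a hypothetical second author
# who invented a different schema.
ALICE_XML = """
<individual>
  <firstName>Alice</firstName>
  <bornIn>1984</bornIn>
</individual>
"""


def first_name_and_birth_year(xml_text):
    """Reader hard-wired to Bob's schema: it only knows his tag names."""
    root = ET.fromstring(xml_text)
    name = root.find("name")
    birth = root.find("birth")
    if name is None or birth is None:
        # The document does not follow Bob's schema, so nothing is understood.
        return None
    return name.get("first"), int(birth.get("year"))
```

With Bob's document the function returns `("Bob", 1985)`, but with the second document it returns `None`, even though that document carries the same kind of information. A client would need one such reader per schema, which is exactly what a shared, self-describing data model is meant to avoid.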
Admittedly, the ground has shifted today, and the whole WWW is moving towards a semantic web where data are self-described and elaborate queries will be feasible.
Semantic Web References
- Semantic Web on Wikipedia is a good article to introduce the reader to the basics.
- The home page of the W3C semantic web activity.
- What is the semantic web? An article by purl.org.
- An article about the semantic web by Sean B. Palmer: The semantic web, takes form.