ToxOtis is a Java interface to the predictive toxicology services of OpenTox. It is being developed to help both those who need a painless way to consume OpenTox web services and ambitious service providers who do not want to spend half of their time on RDF parsing and creation, database management and security measures.
You can use ToxOtis to search databases of chemical compounds, download a compound in any supported MIME type (e.g. SDF, SMILES, MOL), find a property of a compound (e.g. its LD50 lethal dose concentration), publish your chemicals in an online database, train QSAR models (regression, classification, clustering etc.) and much more. Incorporating ToxOtis into your services will relieve you of the labour of creating RDF documents.
ToxOtis is a Java API for accessing the OpenTox network of web services. Check out our blog for news on ToxOtis. Javadoc for version 0.7.1 is now available from here.
Documentation, including many examples, can also be found online.
OpenTox components are the core elements in ToxOtis. These are all the entities manipulated in OpenTox, each of which has a corresponding representation in RDF, i.e. a standard representation of a data model that describes it. Examples of such components are Algorithms, Models, Tasks and Datasets. As far as their ontological nature and the corresponding RESTful API are concerned, you can find detailed documentation at the OpenTox site. From a programmatic point of view, all classes in org.opentox.toxotis.core and in org.opentox.toxotis.core.component subclass OTComponent and implement IOTComponent. OTComponent is an abstract class holding a URI and a MetaInfo field for all its subclasses. It includes a very useful abstract method, public abstract Individual asIndividual(OntModel model), which is implemented by all subclasses of OTComponent and allows users to get an RDF representation straight from the component (we will provide some explanatory code snippets in the sequel). Other intermediate levels of abstraction are available, such as OTOnlineResource and OTPublishable. All components are characterized by their meta information, which consists of a subset of the Dublin Core properties, some RDFS and OWL properties and a couple of OpenTox-specific properties (like ot:hasSource).
The following figure gives an overview of all OpenTox components and their interconnections. At the OpenTox web site one can find more information about these components.
An algorithm (source code) (javadoc) is characterized by a set of Ontological Classes that classify it according to the OpenTox Algorithms Ontology. As algorithms can be used for a wide variety of purposes (e.g. model building, feature calculation, feature selection, similarity calculation, substructure matching), required and optional input parameters and algorithm results (e.g. model or dataset URIs, literal values) have to be specified in the algorithm representation together with a definition of the algorithm. A set of parameters along with their scope (optional/mandatory) and default values is also available for every algorithm. JAQPOT3 already implements the OPTIONS method, which provides clients with templated, machine-readable documentation on how to consume the service.
BibTeX (source code) (javadoc) is a bibliographic reference characterized by the following attributes:
The above attributes are compliant with the Knouf ontology (see also this summary).
A Conformer is an identifier of a unique chemical substance up to its 3D characteristics. The class Conformer does not yet play an important role in ToxOtis. It is an extension of Compound that will be used in special cases, for example to access 3D-sensitive descriptor calculation services or, in general, services and resources that take into account the 3D conformation of the chemical substance they deal with.
A Compound is a wrapper for a set of conformers and also (when used to identify a chemical substance) acts as a proxy for a conformer. Various methods have been implemented to provide access to available properties about a compound.
A Feature is an object representing any kind of property assigned to a Compound. The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoint ontologies). OpenTox has established an ontology for biological/toxicological and chemical features that is available online.
A Dataset consists of a List of Data Entries. Each Data Entry contains a Compound and a Feature-Value Pair. A Dataset can be converted into a weka.core.Instances object.
A Model provides different representations for QSAR/toxicology models. Models are the output/result of learning algorithms and cannot be modified. To make use of a model for prediction, it is necessary to have a dataset with compatible descriptors/features. If the dataset_service parameter is POSTed via HTTP, a new dataset will be created. If, on the other hand, the result_dataset parameter is stated, the stated dataset will be updated with the predicted feature values; in other words, a new "column" for the predictions is added to the input dataset. If neither of the two parameters is given, a default dataset service is used and a new dataset is created.
A Model is characterized by its parameters, the training dataset (the dataset used to produce the model), a set of independent variables and its predicted and dependent variables.
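The choice between the two prediction parameters described above can be sketched as a small helper that builds the form-encoded body of the request. This is only an illustration: the class and method names are hypothetical, the example URIs are placeholders, and giving dataset_service precedence when both values are supplied is an assumption, since the text does not say what a service does in that case. A real prediction request will carry further parameters that are not shown here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PredictionParamsSketch {

    /**
     * Builds the form-encoded parameter that tells the model service where
     * the predictions should go. Returns an empty string when neither URI is
     * given, in which case the service uses its default dataset service.
     * Assumption: dataset_service takes precedence if both are supplied.
     */
    static String buildBody(String datasetService, String resultDataset) {
        if (datasetService != null) {
            return "dataset_service=" + encode(datasetService);
        }
        if (resultDataset != null) {
            return "result_dataset=" + encode(resultDataset);
        }
        return "";
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildBody("http://example.org/dataset", null));
        System.out.println(buildBody(null, "http://example.org/dataset/7"));
    }
}
```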
In OpenTox various computational procedures such as model training, data filtering or even data upload operations can be time consuming. While such operations take place, the socket that binds the client to the server has to be released. Such a connection is prone to network errors and additionally it would be quite burdensome for the server to keep lots of connections open until each operation completes. In order to tackle this problem three structures have to be adopted by the services: background jobs, queues and execution pools.
A client request initializes a background job, which is put in an execution queue (meanwhile, other jobs may run in the background). A task is then created and returned to the user in a supported MIME type such as application/rdf+xml. A task is a resource that contains information about the progress and the status of a background job. While the background job is running, an estimate of how far its execution has progressed (percentage completed) is also included in the task. Clients can access the task URI to inspect the progress of their task and be informed about its (successful or unsuccessful) completion. We could say that a task is the loading bar of RESTful web services.
Tasks are characterized by their percentage of completion, their status (running, completed, cancelled, error), the result URI (appears upon completion) and an error report in case of an exceptional event. In particular a Task holds the following attributes:
Tasks in OpenTox accept six status values: running, queued, completed, error, rejected and canceled. A task is tagged as running when the background job is in the execution pool. Details about the progress of the execution may be revealed to the user as meta data (comment, description) in the task representation. If a background job was interrupted by some exceptional event, either because of a bad request or some internal server error, the related task must include an error report for this event (see API documentation for Error Reports). Completed tasks bear a link to the generated resource. Moreover, clients can cancel a task by applying the DELETE method on the task URI. It is up to the service provider to automatically cancel tasks that do not complete in a reasonable time period.
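From the client's side, following a task amounts to polling its status until it leaves the running/queued states. The sketch below is a self-contained simulation, not ToxOtis code: the enum, the class and the method names are hypothetical, and the status supplier stands in for whatever call re-fetches the task representation from its URI.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

class TaskPollingSketch {

    /** The status values listed in the text. */
    enum Status { RUNNING, QUEUED, COMPLETED, ERROR, REJECTED, CANCELED }

    /** A status is terminal once the background job can make no further progress. */
    static boolean isTerminal(Status status) {
        return status != Status.RUNNING && status != Status.QUEUED;
    }

    /**
     * Polls the supplier until a terminal status is observed or maxPolls
     * lookups have been made, sleeping between lookups. Returns the last
     * status seen, which may still be RUNNING or QUEUED if the limit was hit.
     */
    static Status waitFor(Supplier<Status> fetchStatus, int maxPolls, long sleepMillis) {
        Status last = fetchStatus.get();
        for (int i = 1; i < maxPolls && !isTerminal(last); i++) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                return last;
            }
            last = fetchStatus.get();
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated sequence of statuses a server might report for one task.
        Iterator<Status> simulated = Arrays.asList(
                Status.QUEUED, Status.RUNNING, Status.RUNNING, Status.COMPLETED).iterator();
        System.out.println(waitFor(simulated::next, 10, 50L));
    }
}
```

Bounding the number of lookups keeps the client from waiting forever on a task that the service never completes or cancels.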
Error Reports have been part of the OpenTox API since version 1.1. They define a formal way to handle exceptional situations while invoking a service or during inter-service communication, thus facilitating debugging. They are documented online at http://opentox.org/dev/apis/api-1.1/Error%20Reports. An Error Report is characterized by the actor of the exception (URI of the server or client that initiated the event), a brief explanatory message about the event, a detailed message that can also contain technical details, and the HTTP status code that accompanies the report. It may also carry another error report, if the server reporting the exception acts as an error-proxy for an exception that happened while it was acting as a client or proxy to some other server.
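The nesting of error reports described above forms a simple chain that can be followed back to the service where the problem originated. The following sketch models that chain with a hypothetical immutable class; the field names, the example URIs and the messages are illustrative and do not reproduce the ToxOtis ErrorReport class.

```java
class ErrorReportChainSketch {

    /** Minimal stand-in for an error report with an optional nested report. */
    static final class Report {
        final String actor;      // URI of the server or client that initiated the event
        final String message;    // brief explanatory message
        final String details;    // detailed message, may hold technical details
        final int httpStatus;    // HTTP status code accompanying the report
        final Report cause;      // nested report when acting as an error-proxy, else null

        Report(String actor, String message, String details, int httpStatus, Report cause) {
            this.actor = actor;
            this.message = message;
            this.details = details;
            this.httpStatus = httpStatus;
            this.cause = cause;
        }
    }

    /** Follows the chain of nested reports down to the original one. */
    static Report rootCause(Report report) {
        Report current = report;
        while (current.cause != null) {
            current = current.cause;
        }
        return current;
    }

    /** Number of reports in the chain, including the outermost one. */
    static int depth(Report report) {
        int n = 0;
        for (Report r = report; r != null; r = r.cause) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Report origin = new Report("http://server-b.example.org/dataset/1",
                "Dataset not found", "No dataset with that id", 404, null);
        Report proxied = new Report("http://server-a.example.org/algorithm/x",
                "Remote service failed", "Error while fetching the training dataset", 502, origin);
        Report root = rootCause(proxied);
        System.out.println(root.actor + " -> " + root.httpStatus + " (chain length " + depth(proxied) + ")");
    }
}
```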
Features of Error Reports:
Users are entities that might optionally be exposed as resources by an OpenTox web service. Web services can use User resources to cater for accounting and implementation-specific access rights (e.g. various administrative tasks on the server can be exposed through an HTTP/REST interface). JAQPOT3 is so far the only OpenTox web service to have Users as resources and introduce User Quota. Models, Tasks, Compounds and other resources that can be created by a user, link back to it using the property ot:createdBy.
In OpenTox, access is controlled by an SSO (Single Sign-On) server (based on Sun’s openSSO). Single sign-on can control access to systems based on any distributed architecture. Different services contact an SSO server to authenticate a client and ask permission for a given request. More on single sign-on can be found on Wikipedia. Read also our blog posts on Access Control in OpenTox and A&A for opentox.ntua.gr.
The OpenTox A&A API is documented at http://opentox.org/dev/apis/api-1.1/AA. If you don’t have an account on OpenTox, you should head over to the registration form.
Once you provide your credentials to the SSO server, you acquire an authentication token. This will be used to authenticate yourself against any web service in OpenTox (if needed) and get permission to perform an operation. This is easily accomplished in ToxOtis. You simply provide your credentials to the AuthenticationToken constructor (javadoc):
AuthenticationToken at = new AuthenticationToken("JohnSmith","mysecretPass111");
If you have a password file (read next section), you can use it to acquire an authentication token:
    File passwordFile = new File("/path/to/my_secret.key");
    AuthenticationToken at = new AuthenticationToken(passwordFile);
For security reasons, every token has a certain lifetime after which it is invalid, so even if someone maliciously obtains your token, they will not have access to any OpenTox web service after a certain time. For the same reason, it is considered good practice to invalidate your tokens (i.e. log out) if you do not intend to use them any more. It is advisable to add a shutdown hook (javadoc) to your application which will invalidate all tokens before the application exits. According to the OpenTox specifications, it is up to the client to monitor and manage tokens according to their lifetime. A collection of methods is available in AuthenticationToken that return the creation timestamp of the token as well as its status. A token is characterized as ACTIVE, INACTIVE or DEAD. A token should be used only if it is ACTIVE. A token is INACTIVE if it has either expired or been invalidated, and DEAD if it has not yet been initialized (the user was not authenticated). So the output of the following code:
    AuthenticationToken token = new AuthenticationToken("JohnSmith", "mysecretPass111");
    TokenStatus statusBefore = token.getStatus();
    token.invalidate();
    TokenStatus statusAfter = token.getStatus();
    System.out.println(statusBefore + ", " + statusAfter);
will be:
ACTIVE, INACTIVE
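The shutdown hook recommended above can be sketched with the standard Runtime.addShutdownHook method. The registry class and the Invalidatable interface below are hypothetical and only illustrate the pattern; a real AuthenticationToken would have to be wrapped in a small adapter that calls its invalidate() method and handles whatever exceptions that call declares.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class TokenRegistrySketch {

    /** Hypothetical stand-in for anything that can be logged out. */
    interface Invalidatable {
        void invalidate();
    }

    private final List<Invalidatable> issued = new CopyOnWriteArrayList<>();

    /** Remembers a token so that it can be invalidated later. */
    void track(Invalidatable token) {
        issued.add(token);
    }

    /** Invalidates every tracked token and returns how many succeeded. */
    int invalidateAll() {
        int count = 0;
        for (Invalidatable token : issued) {
            try {
                token.invalidate();
                count++;
            } catch (RuntimeException e) {
                // Keep going: one failed logout must not block the others.
                System.err.println("Could not invalidate a token: " + e.getMessage());
            }
        }
        issued.clear();
        return count;
    }

    /** Registers a JVM shutdown hook that logs out all tracked tokens. */
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> invalidateAll(), "token-cleanup"));
    }

    public static void main(String[] args) {
        TokenRegistrySketch registry = new TokenRegistrySketch();
        registry.installShutdownHook();
        registry.track(() -> System.out.println("token invalidated"));
        System.out.println("application exiting");
    }
}
```

Shutdown hooks run on normal exit and on most termination signals, but not when the JVM is killed forcibly, so token lifetimes remain the last line of defence.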
In general, it is not good practice to store unencrypted passwords in your program or in your database. However, if you need to have your username and password stored so that you can easily use them, ToxOtis offers an encryption utility: PasswordFileManager. First of all, you have to create a (private) master key and store it in a file. You should make this file hidden and modify its permissions so that only your application has access to it. The file should look like this:
    --- START MASTER KEY ---
    fEFWQ1FRUVdSXXxhOVBnazQyKy8vUzRcPWFfM2tmKjE
    ajoTmn7ieV1qfb3645fFqa2MowkmMmP3Xg0A1gCRjTp
    D96r3MEhKC89EAfpNG3hIKVxi4JBtyBxWySJIiidJX3
    De6mx2tYqTJgyC8g83141qf27p59z5P51lw7VQ8E55n
    wFr3T53y4WMW1nW5CN77C6oP832C2EtjUwR381ms6T3
    P96y1NGm7I78k3sb4efDT462xVVUA8OU461u22T2v78
    x3Mt6591855xKP65vQWn730jY889w47h9Fm0h6zYS04
    --- END MASTER KEY ---
You should use the password generator of ToxOtis to create a good and valid master key. Here is an example of using PasswordFileManager for this purpose:
    Thread createPasswordFile = new Thread() {
        @Override
        public void run() {
            try {
                PasswordFileManager.CRYPTO.createMasterPasswordFile(
                        "/dev/random", "/home/user/toxotisKeys/master.key", 500);
            } catch (IOException ex) {
                // Handle the exception properly!
            }
        }
    };
    Executors.newFixedThreadPool(1).submit(createPasswordFile);
Here is the output of this method:
    ----- ToxOtis Pasword Generator -----
    Random number generator : /dev/random
    Password file : /home/chung/toxotisKeys/master.key
    Password Stength : EXCELENT
The class PasswordFileManager is Observable (source code) (javadoc), so you can monitor the progress of the password generation (that is why we wrap the creation of the master password file in a Thread). This process might take a long time, especially if you choose a good random number generator such as /dev/random on Linux. If you do not provide a random number source (null), java.security.SecureRandom (source code) (javadoc), the RNG implementation that ships with Java, will be used instead. In that case the method will print:
    ----- ToxOtis Pasword Generator -----
    Random number generator : Secure RNG (java.security.SecureRandom)
    Password file : /home/chung/Desktop/alt.key
    Password Stength : EXCELECT (100)
While the generation is running, you can monitor the progress of the key creation:
    while (true) {
        if (CRYPTO.hasChanged()) {
            System.out.println(CRYPTO.getPasswordGenerationProgress());
        }
        if (CRYPTO.getPasswordGenerationProgress() == 100) {
            break;
        }
    }
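Because PasswordFileManager is Observable, progress can in principle also be received through a callback instead of a polling loop. The sketch below shows the java.util.Observer pattern on a hypothetical stand-in class; whether PasswordFileManager actually passes the progress value to its observers as the notification argument is an assumption that should be checked against its javadoc. Note that java.util.Observable has been deprecated since Java 9, hence the warning suppression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Observable;

@SuppressWarnings("deprecation")
class ProgressObserverSketch {

    /** Hypothetical long-running job that reports its progress in percent. */
    static class Generator extends Observable {
        void run(int steps) {
            for (int i = 1; i <= steps; i++) {
                int percent = i * 100 / steps;
                setChanged();             // mark the state as changed...
                notifyObservers(percent); // ...and push the new value to observers
            }
        }
    }

    /** Runs the job and returns every progress value an observer received. */
    static List<Integer> collectProgress(int steps) {
        List<Integer> seen = new ArrayList<>();
        Generator generator = new Generator();
        generator.addObserver((source, arg) -> seen.add((Integer) arg));
        generator.run(steps);
        return seen;
    }

    public static void main(String[] args) {
        for (int percent : collectProgress(4)) {
            System.out.println(percent + "%");
        }
    }
}
```

Compared with the loop above, a callback does not keep a CPU core busy while the key is being generated.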
Suppose now that your username is JohnSmith and your password is s3cret. Then you can use the above master password file to create an encrypted file for your credentials:
    PasswordFileManager.CRYPTO.setMasterPasswordFile("/home/user/toxotisKeys/master.key");
    PasswordFileManager.CRYPTO.createPasswordFile("JohnSmith", "s3cret", "/home/john/.hidpass/.my.key");
This will create a file with your credentials at the specified destination, that is /home/john/.hidpass/.my.key (we suggest that this file be hidden). Your private key will look like the following:
    --- START PRIVATE KEY ---
    /EXEudbuXSmvp2SrNI6iewwq==
    2SSiPLZuCMLlz81=
    --- END PRIVATE KEY ---
Now you can delete the line above that contains your credentials, and any other line like it, and use the generated encrypted file to authenticate yourself. Here is an example:
    File passwordFile = new File("/home/john/.hidpass/.my.key");
    AuthenticationToken at = new AuthenticationToken(passwordFile);
or alternatively:
AuthenticationToken at = PasswordFileManager.CRYPTO.authFromFile("/home/john/.hidpass/.my.key");
Authentication tokens are of high importance in ToxOtis, as they are necessary for most server-client data transactions.
The local status of a token can be retrieved using the method: AuthenticationToken#getStatus(). This checks whether the token has timed out. It is however more reliable to validate your token against an SSO server. For this purpose you should use the method AuthenticationToken#validate() : boolean. The method will return true if the token is valid and false otherwise.
    AuthenticationToken at = PasswordFileManager.CRYPTO.authFromFile("/home/john/.hidpass/.my.key");
    boolean isValid = at.validate();
If you need to discard your token so that it will not be active any more, you can invalidate it using the method AuthenticationToken#invalidate().
    AuthenticationToken at = PasswordFileManager.CRYPTO.authFromFile("/home/john/.hidpass/.my.key");
    // ... use your token ...
    at.invalidate(); // Log out
    boolean isValid = at.validate(); // isValid is false
You can use a token to obtain information about the user who created it by providing a username and password. This information is returned as an instance of User (javadoc). Here is an example:
    AuthenticationToken at = PasswordFileManager.CRYPTO.authFromFile("/home/john/.hidpass/.my.key");
    User user = at.getUser();
    System.out.println(user);
This will print:
UID : john
Name : John Smith
Mail : john@smith.org
Pass : {SSHA}FZLdpBMyrOO8SCYU7TeQY1JWAleotAVi7482
Users are advised to invalidate their tokens if they do not need them any more in other A&A sessions. What is more, if you need to create a new token, make sure you have invalidated your old one.
Obtaining a new token each time authentication/authorization is required is not good practice both in terms of performance and security. ToxOtis comes with a token management utility (javadoc) that allows for multiple login of different users but restricts a single user from obtaining multiple tokens. Once a user logs in, its token is stored in the pool. In case he/she attempts to login again and the stored token has not expired, then no new token is obtained but the existing one is returned from the method. Here is an example:
    TokenPool tokenPool = TokenPool.getInstance();
    for (int i = 0; i < 10; i++) {
        tokenPool.login("/path/to/my.key");
    }
    System.out.println(tokenPool.size());
The method will output 1 and not 10!
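The behaviour described above, one stored token per user that is reused until it expires, can be illustrated with a small map-based pool. This is a sketch of the idea under stated assumptions, not the implementation of the ToxOtis TokenPool: the Token class, the issuer function that stands in for a real login, and the use of explicit timestamps are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class TokenPoolSketch {

    /** Hypothetical token with an owner and an expiry time. */
    static final class Token {
        final String user;
        final long expiresAtMillis;

        Token(String user, long expiresAtMillis) {
            this.user = user;
            this.expiresAtMillis = expiresAtMillis;
        }

        boolean isActive(long nowMillis) {
            return nowMillis < expiresAtMillis;
        }
    }

    private final Map<String, Token> tokens = new ConcurrentHashMap<>();
    private final Function<String, Token> issuer; // stands in for a real login against the SSO server

    TokenPoolSketch(Function<String, Token> issuer) {
        this.issuer = issuer;
    }

    /** Returns the pooled token if it is still active, otherwise logs in again. */
    Token login(String user, long nowMillis) {
        return tokens.compute(user, (u, existing) ->
                existing != null && existing.isActive(nowMillis) ? existing : issuer.apply(u));
    }

    int size() {
        return tokens.size();
    }

    public static void main(String[] args) {
        int[] logins = {0};
        TokenPoolSketch pool = new TokenPoolSketch(u -> {
            logins[0]++;
            return new Token(u, Long.MAX_VALUE);
        });
        for (int i = 0; i < 10; i++) {
            pool.login("JohnSmith", System.currentTimeMillis());
        }
        System.out.println(pool.size() + " token(s) pooled, " + logins[0] + " login(s) performed");
    }
}
```

Using compute on a ConcurrentHashMap makes the check-then-login step atomic per user, so two threads logging in the same user at once do not obtain two tokens.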
The following results are based on 30 successive measurements. The measurements were carried out on a Linux machine (2.6.31-22-generic kernel, x86_64 GNU/Linux) with 3.8GB of RAM and an Intel Core 2 Duo CPU P8700 @2.53GHz. The SDK ToxOtis was used to perform the measurements (version 0.4.2.23) which includes Weka version 3.6.2 (latest stable version) and Jena version 2.6.2. These libraries run on a Sun™ JVM, version 1.6.0.20 with Java™ SE Runtime Environment (build 1.6.0.20-b02). All measurements are in milliseconds (ms).
| Operation | Average time (ms) |
|---|---|
| Average ping time for opensso.in-silico.ch | 57.9 (0% packet loss) |
| Authentication using file | 131.96 |
| Average invocation time for the method validate() in AuthenticationToken | 79.0 |
| Average invocation time for the method invalidate() in AuthenticationToken | 72.1 |
| Average invocation time for the method getUser() in AuthenticationToken | 154.4 |
| Authorization | 184.1 |
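Averages such as those in the table above can be obtained by timing a number of successive invocations and dividing the total by the number of runs. The helper below is a generic sketch of that procedure and is not the harness that produced the published figures; the class name is hypothetical and the timed operation is a placeholder.

```java
class InvocationTimerSketch {

    /** Runs the operation the given number of times and returns the mean duration in milliseconds. */
    static double averageMillis(Runnable operation, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0L;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Placeholder workload; in practice this would be e.g. a call that validates a token.
        Runnable workload = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.printf("average: %.1f ms over 30 runs%n", averageMillis(workload, 30));
    }
}
```

For network-bound calls like these, the result includes the round trip to the server, so it should be read together with the ping time.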
Using ToxOtis, one can parse remote OpenTox entities by providing their URI, or even OpenTox resources that are stored in some local file. Behind the scenes, ToxOtis downloads an RDF representation of the resource and parses it into some instance of OTOnlineResource. For this purpose, the user is given two tools: the abstract method loadFromRemote defined in OTOnlineResource (doc) and a set of spiders (doc), which are more powerful tools but also require a higher level of acquaintance with RDF and Jena (a library for parsing and editing RDF documents in Java). The ToxOtis API for downloading and parsing OpenTox resources is intertwined with the OpenTox A&A API, so in many cases users will need to provide their authentication token.
All subclasses of OTOnlineResource in ToxOtis, such as Compounds, Features, Algorithms and Models, can be downloaded from a remote location into some local resource such as a file or a variable (e.g. a String), or in general be directed to some output stream or written to some generic destination using a Writer. The prototype methods are:
    void download(String destination, Media media, AuthenticationToken token) throws ToxOtisException;
    void download(OutputStream destination, Media media, AuthenticationToken token) throws ToxOtisException;
    void download(File destination, Media media, AuthenticationToken token) throws ToxOtisException;
    void download(Writer destination, Media media, AuthenticationToken token) throws ToxOtisException;
This way, one can download the MOL representation of a compound and write it into a file. Here is an example of use:
    Compound comp = new Compound(new VRI(Services.IDEACONSULT).augment("compound","10"));
    File destination = new File("/path/to/file.mol");
    comp.download(destination, Media.CHEMICAL_MDLMOL, (AuthenticationToken) null);
Before proceeding to the next sections, users are advised to take a look at the documentation about the implementation of OpenTox components in ToxOtis.
A predefined collection of OpenTox algorithms is available within the class OpenToxAlgorithms (doc). You can load the algorithm data from the remote location using the method loadFromRemote defined in Algorithm (doc). Here is an example:
    Algorithm myAlg = new Algorithm(OpenToxAlgorithms.TUM_KNN_CLASSIFICATION.getServiceVri());
    // This will load into your object all information found at the remote location:
    myAlg.loadFromRemote();
    System.out.println(myAlg.getMeta());
The above source code will print the following to the System standard output:
identifier : http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification^^string
title : kNNclassification^^string
description : OpenTox REST interface to the WEKA k-Nearest Neighbor learning algorithm.
Can select appropriate value of K based on cross-validation. Can also do distance weighting.^^string
date : Mon Sep 13 20:19:24 EEST 2010^^dateTime
creator : tobias.girschick@in.tum.de^^string
If the algorithm is a protected resource, you will have to authenticate yourself against that algorithm service by providing an authentication token (doc). Here is an example:
    Algorithm myAlg = new Algorithm(OpenToxAlgorithms.NTUA_MLR.getServiceVri());
    AuthenticationToken at = PasswordFileManager.CRYPTO.authFromFile("./.secret/.my_secret.key");
    // This will load into your object all information found at the remote location:
    myAlg.loadFromRemote(at);
The following example illustrates how to use a Dataset Spider (doc) to download and parse a dataset from a remote server:
    VRI vri = new VRI(Services.IDEACONSULT.augment("dataset","5"));
    // Require that the dataset will contain no more than 10 compounds
    final int size = 10;
    vri.addUrlParameter("max", size);
    DatasetSpider spider = new DatasetSpider(vri);
    Dataset ds = spider.parse();
Now we can use this Dataset object to inspect its data entries and values:
    DataEntry de = ds.getDataEntries().get(2);
    FeatureValue fv = de.getFeatureValue(0);
    System.out.println(de.getConformer().getUri());
    System.out.println(fv.getFeature().getUri() + " = " + fv.getValue());
The above code will print the following message to the System’s standard output:
    http://apps.ideaconsult.net:8080/ambit2/compound/2554/conformer/327497
    http://apps.ideaconsult.net:8080/ambit2/feature/20083 = 100-01-6^^string
Alternatively you can of course use the implementation of the method loadFromRemote() in Dataset (doc). Here is an example:
    VRI vri = new VRI(Services.AMBIT_UNI_PLOVDIV.augment("dataset","9"));
    Dataset ds = new Dataset(vri);
    ds.loadFromRemote();
This will parse into the object ds the data downloaded from the URI: ambit.uni-plovdiv.bg:8080/ambit2/dataset/9.
Error Reports (code) (doc) have been part of the OpenTox API since version 1.1. They define a formal way to handle exceptional situations while invoking a service or during inter-service communication, thus facilitating debugging. They are documented online at opentox.org/dev/apis/api-1.1/Error Reports. Error Reports are parsed in much the same way as the entities mentioned above. The only difference is that the URL that hosts the error report differs from the IRI that describes the report in the returned RDF graph. So, if you choose to use a spider to parse an Error Report, you have to be careful with the initialization: the standard spider constructor ErrorReportSpider(Resource resource, Model model) will probably throw an error if you provide the wrong resource. You should therefore prefer the constructor ErrorReportSpider(URI actorUri, Model ontModel) (doc), where you provide the URI of the actor of the exception rather than the RDF node directly. Here is an example to avoid any misunderstanding:
    VRI uri = new VRI(Services.NTUA.augment("algorithm", "mlr"));
    GetClient client = new GetClient();
    client.setUri(uri);
    OntModel model = client.getResponseOntModel();
    ErrorReportSpider spider = new ErrorReportSpider(uri, model);
    ErrorReport er = spider.parse();
Error reports also appear in ToxOtisException (doc). When a ToxOtis Exception is thrown due to some exception thrown by a remote service, the Error Report from that service is incorporated into the exception. Here is an example:
    VRI uri = new VRI(Services.NTUA.augment("algorithm", "mlr"));
    try {
        new AlgorithmSpider(uri);
    } catch (ToxOtisException tox) {
        System.out.println(tox.getRemoteErrorReport());
    }
This will print to the System’s output the following text:
    URI : http://opentox.ntua.gr:3000/errorReport/#2390078396
    Actor : http://opentox.ntua.gr:3000/algorithm/mlr
    Code : AuthenticationFailed
    Status : 403
This is an example of how a user can download and parse an OpenTox Model (code) (doc) from a remote location:
    VRI vri = new VRI(Services.NTUA.augment("model","f9a97443-6baf-4361-a55c-b08cf12c3e39"));
    ModelSpider mSpider = new ModelSpider(vri);
    Model m = mSpider.parse();
The above code downloads the model from opentox.ntua.gr and creates the object m: Model. The same can be accomplished using a Model object exclusively. Here is an alternative way:
    VRI vri = new VRI(Services.tumDev().augment("model","TUMOpenToxModel_j48_7"));
    Model m = new Model(vri);
    m.loadFromRemote();
A Task is parsed as simply as any other OpenTox component. You simply have to provide its URI and invoke the method loadFromRemote() or, in case authentication is needed, loadFromRemote(AuthenticationToken). Here is an example of use:
    VRI vri = new VRI("http://opentox.ntua.gr:3000/task/0fc060a0-f69b-4a81-bb2e-b9b32c8a04b3");
    Task t = new Task(vri).loadFromRemote();
Compounds, as these are represented in OpenTox, do not provide much information that could be parsed from their RDF representation, so the API for Compounds is formulated in a way that meaningful information would be returned to the user. First of all, users can obtain the Set of conformers (if any) that it groups and delegates. In cases where 3D characteristics of the compound are not taken into account, conformers do not play a particular role, otherwise the exact conformer has to be determined. The set of these conformers is available using the method Set<Conformer> listConformers(AuthenticationToken token) throws ToxOtisException. What is more, one can download the compound and store it in some supported chemical media type like sdf or mol as it was explained in the previous section. Here is an example of downloading the SD file of a given compound:
    Compound c = new Compound(Services.IDEACONSULT.augment("compound","100"));
    c.download(new File("/path/to/file.sdf"), Media.CHEMICAL_MDLSDF, null);
The compound and conformer APIs in ToxOtis are taking just their first steps so there is not much functionality in there yet.
ToxOtis can be used in combination with Weka (stable version 3.6.2), a well-known open source machine learning package for Java. Using ToxOtis you can convert your datasets into instances of weka.core.Instances, which in turn can be used in some filtering, training or other data processing procedure. Here is an example of downloading a dataset and creating a corresponding weka.core.Instances object.
    VRI vri = new VRI(Services.IDEACONSULT.augment("dataset","9"));
    Dataset ds = new Dataset(vri);
    ds.loadFromRemote();
    weka.core.Instances data = ds.getInstances();
    System.out.println(data);
The above code will print output like the following to the System output:
@relation http://apps.ideaconsult.net:8080/ambit2/dataset/54
@attribute compound_uri string
@attribute http://apps.ideaconsult.net:8080/ambit2/feature/22202 numeric
@attribute http://apps.ideaconsult.net:8080/ambit2/feature/22197 string
@attribute http://apps.ideaconsult.net:8080/ambit2/feature/22201 numeric
@attribute http://apps.ideaconsult.net:8080/ambit2/feature/22196 string
@attribute http://apps.ideaconsult.net:8080/ambit2/feature/22200 numeric
@attribute http://apps.ideaconsult.net:8080/ambit2/feature/22198 numeric
@attribute http://apps.ideaconsult.net:8080/ambit2/feature/22199 numeric
@data
http://apps.ideaconsult.net:8080/ambit2/compound/261/conformer/419588,...
113.730003,chloramphenicol,3.7508,Molecule-1,-4.69,0.2812,1.14
http://apps.ideaconsult.net:8080/ambit2/compound/116508/conformer/419581,...
54.27,artemisinin,2.746,Molecule-1,-4.52,0.0667,2.22
...
As you can see, the Instances object has a structure that retains the links (URIs) to the dataset from which it was created and the feature URIs. Unfortunately, Instances objects are just data wrappers and were not designed to serve as data models, so the meta information about the dataset and its contained features and compounds will not be found in this object.
The following tables summarize the computational times needed to convert a Dataset object into an instance of weka.core.Instances. These results are based on 10 successive measurements. The measurements were carried out on a Linux machine (2.6.31-22-generic kernel, x86_64 GNU/Linux) with 3.8GB of RAM and an Intel Core 2 Duo CPU P8700 @2.53GHz. The SDK ToxOtis was used to perform the measurements (version 0.1.1.13) which includes Weka version 3.6.2 (latest stable version) and Jena version 2.6.2. These libraries run on a Sun™ JVM, version 1.6.0.20 with Java™ SE Runtime Environment (build 1.6.0.20-b02).
Table 1. Measurements on fragments of the dataset created from http://apps.ideaconsult.net:8080/ambit2/dataset/9 with 21 features and up to 1000 chemical compounds.
| No. Compounds | Average time (ms) |
|---|---|
| 100 | 2670 |
| 200 | 4896 |
| 500 | 10959 |
| 800 | 18661 |
| 1000 | 21132 |
Table 2. Measurements on fragments of the dataset created from http://apps.ideaconsult.net:8080/ambit2/dataset/10 with 60 features and up to 1000 chemical compounds.
| No. Compounds | Average time (ms) |
|---|---|
| 100 | 5607 |
| 200 | 8622 |
| 500 | 19714 |
| 800 | 31511 |
| 1000 | 41513 |
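Dividing each average by the number of compounds gives a per-compound cost, which makes the scaling easier to read: in both tables the cost per compound stays within a fairly narrow band (about 21 to 27 ms for 21 features and about 39 to 56 ms for 60 features), so conversion time grows roughly in proportion to dataset size. The sketch below reproduces this arithmetic from the values in Tables 1 and 2; the class and method names are illustrative.

```java
class ConversionCostSketch {

    /** Average conversion time per compound, in milliseconds. */
    static double millisPerCompound(int compounds, long totalMillis) {
        if (compounds <= 0) {
            throw new IllegalArgumentException("compounds must be positive");
        }
        return (double) totalMillis / compounds;
    }

    public static void main(String[] args) {
        int[] compounds = {100, 200, 500, 800, 1000};
        long[] table1 = {2670, 4896, 10959, 18661, 21132};  // 21 features
        long[] table2 = {5607, 8622, 19714, 31511, 41513};  // 60 features
        for (int i = 0; i < compounds.length; i++) {
            System.out.printf("%4d compounds: %.2f ms (21 features), %.2f ms (60 features)%n",
                    compounds[i],
                    millisPerCompound(compounds[i], table1[i]),
                    millisPerCompound(compounds[i], table2[i]));
        }
    }
}
```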
The reverse process of converting a Weka entity (either an ARFF file or an Instances object) into a ToxOtis Dataset component is accomplished using the static methods of the class DatasetFactory. It takes just one line of source code; here is an example:
    Instances myInstances = ...; // This is your object
    Dataset myDataset = DatasetFactory.createFromArff(myInstances);
You can also use DatasetFactory (http://github.com/alphaville/ToxOtis/blob/master/src/org/opentox/toxotis/factory/DatasetFactory.java) to construct a Dataset object from an ARFF file or some ARFF InputStream (which might also be an input stream from a remote location). Here is an example:
    File myFile = new File("/path/to/my.arff");
    Dataset ds = DatasetFactory.createFromArff(myFile);
References related to ToxOtis and Weka
A client can publish an OpenTox component, that is, obtain a URI for its resource at some publicly available location, using the HTTP POST method. According to the OpenTox REST API, clients create new resources and acquire a URI for them by POSTing an RDF representation of these resources to an appropriate service. There are a couple of notes we need to make here. First, the entity is always POSTed as an RDF document containing all the information necessary to describe the OpenTox component in a formal way according to the specifications of the OpenTox ontology. Second, as far as the service response is concerned, the following status codes are possible:
Publishable components in ToxOtis subclass OTPublishable (doc) which defines two abstract methods: Task publishOnline(VRI,AuthenticationToken) and Task publishOnline(AuthenticationToken). We copy here the documentation for the first of these two that allows users to POST their components to a specified server:
/**
 * Publish the component to a proper server identified by the uri of the
 * publishing service provided in this method. The resource will be posted to the
 * server in RDF format (Mediatype: application/rdf+xml).
 * @param token
 *      Provide an authentication token. If you think that the service does not
 *      require authentication/authorization, you can leave this field <code>null</code> or
 *      you can provide an empty authentication token. If the provided URI
 *      already contains an authentication token (as the URL parameter <code>
 *      tokenid</code>) it will be replaced by the new token provided to
 *      this method.
 * @return
 *      A Task for monitoring the progress of your request. If the service
 *      returns the URI of the resource right away and does not return a task,
 *      then the object you will receive from this method will now have an identifier,
 *      its status will be set to {@link Task.Status#COMPLETED }, its progress
 *      will be set to <code>100%</code> and the URI of the created resource will
 *      be available applying the method {@link Task#getResultUri() } on the returned
 *      task. In any case, the service's response will be wrapped in a {@link Task }
 *      object.
 * @throws ToxOtisException
 *      In case of invalid credentials, if the POSTed resource is not acceptable
 *      by the remote service (returns a status code 400), communication error
 *      occur with the remote server or other connection problems or the access
 *      to the service was denied (401 or 403).
 */
public abstract Task publishOnline(VRI vri, AuthenticationToken token) throws ToxOtisException;
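The status codes named in this section can be grouped as in the sketch below. It is not part of ToxOtis: the class and enum names are hypothetical, and mapping an immediately returned URI to status 200 is an assumption, since the text only states 202 for returned tasks, 400 for unacceptable resources and 401 or 403 for denied access.

```java
/** Hypothetical helper that groups the HTTP status codes mentioned in this section. */
class PublishResponse {

    enum Outcome { CREATED_IMMEDIATELY, TASK_ACCEPTED, BAD_REQUEST, ACCESS_DENIED, OTHER }

    /** Maps an HTTP status code to the outcome a publishing client would act on. */
    static Outcome classify(int httpStatus) {
        switch (httpStatus) {
            case 200: // Assumption: the service returned the resource URI right away.
                return Outcome.CREATED_IMMEDIATELY;
            case 202: // A task URI was returned for monitoring the request.
                return Outcome.TASK_ACCEPTED;
            case 400: // The POSTed resource was not acceptable.
                return Outcome.BAD_REQUEST;
            case 401:
            case 403: // Access to the service was denied.
                return Outcome.ACCESS_DENIED;
            default:  // Anything else is left to the caller.
                return Outcome.OTHER;
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 202, 400, 401, 403, 500}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```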
Using the ToxOtis API, one can create a new feature and publish it to some feature server. Here is an example:
Feature f = new Feature();
f.setUnits("m^4*mA*s^2*kg^-2");
f.getMeta().setTitle("Toxicity of my city");
f.getMeta().setHasSource("http://otherserver.net:8283/opentox/model/15451");
f.getMeta().setSameAs("http://www.youtube.com/watch?v=WMKmQmkJ9gg");
Task t = f.publishOnline(Services.AMBIT_UNI_PLOVDIV.augment("feature"), null);
System.out.println(t.getResultUri());
This will print the URI of the newly created feature to standard output.
We can publish a bibliographic reference as we did with features. Here is an example where a BibTeX object (doc) is created and published online:
BibTeX bib = new BibTeX(); // ...Create anonymous bibtex
bib.setAuthor("Chung W.");
bib.setTitle("The truth about UFOs");
bib.setVolume(100);
bib.setJournal("International Journal of Conspiracy Theory");
bib.setCrossref("http://localhost:3000/bibtex/549a9f40-9758-44b3-90fe-db31fe1a1a01");
bib.setBibType(BibTeX.BIB_TYPE.Article);
Task t = bib.publishOnline(Services.NTUA.augment("bibtex"), null);
POSTing a dataset always creates a new resource. A task URI is usually returned to the client (with HTTP status 202) for monitoring the progress of the upload. In the following example a dataset is downloaded from a remote server and POSTed to another dataset server. In particular, only the first 5 compounds of the dataset are requested using the URL query ?max=N.
VRI vri = new VRI(Services.IDEACONSULT.augment("dataset", "54").addUrlParameter("max", "5"));
Dataset ds = new Dataset(vri).loadFromRemote();
Task t = ds.publishOnline(Services.AMBIT_UNI_PLOVDIV.augment("dataset"), null);
System.out.println(t.getHasStatus());
while (t.getHasStatus().equals(Task.Status.RUNNING)) {
    t.loadFromRemote();
    Thread.sleep(100);
}
System.out.println(t.getResultUri());
The above example will POST the dataset as application/rdf+xml to the dataset server at ambit.uni-plovdiv.bg and monitor the returned task. The dataset http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/64 has been created by running the above example code.
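The polling loop in the example above runs for as long as the task reports RUNNING. A bounded variant might look like the following sketch, which is written against the standard library only: the Status enum and the Supplier stand in for Task.Status and for reloading the task from the server, and all names here are hypothetical rather than part of ToxOtis.

```java
import java.util.function.Supplier;

/** Polls a status source until it stops reporting RUNNING or a timeout expires. */
class TaskPolling {

    // Stand-in for the task status; only RUNNING matters for the loop.
    enum Status { RUNNING, COMPLETED, ERROR }

    /**
     * Returns the last status seen. If the timeout expires or the thread is
     * interrupted, the returned status may still be RUNNING.
     */
    static Status waitWhileRunning(Supplier<Status> statusSource, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Status status = statusSource.get();
        while (status == Status.RUNNING && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // Preserve the interrupt for the caller.
                return status;
            }
            status = statusSource.get();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated task that reports RUNNING three times and then COMPLETED.
        int[] checks = {0};
        Status last = waitWhileRunning(
                () -> ++checks[0] <= 3 ? Status.RUNNING : Status.COMPLETED, 10, 1_000);
        System.out.println(last + " after " + checks[0] + " status checks");
    }
}
```

Returning the last observed status instead of throwing lets the caller decide whether a task that is still RUNNING after the timeout is an error.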
In some cases it might be more convenient to get the background job as a Future<VRI> instead of a Task, or even to assign the background job to a certain Java ExecutorService. For this purpose, ToxOtis provides the method publish(VRI, AuthenticationToken): Future<VRI>, which returns a Future<VRI> for a Callable that runs on a single-thread executor. Here is an example:
VRI vri = new VRI(Services.ideaconsult().augment("dataset", "54").addUrlParameter("max", "5"));
Dataset ds = new Dataset(vri).loadFromRemote();
Future<VRI> t = ds.publish(Services.ambitUniPlovdiv().augment("dataset"), (AuthenticationToken) null);
System.out.println(t.get());
As already mentioned, users can assign the task to a certain ExecutorService (doc):
ExecutorService myExecutor = Executors.newFixedThreadPool(10);
VRI vri = new VRI(Services.ideaconsult().augment("dataset", "54").addUrlParameter("max", "5"));
Dataset ds = new Dataset(vri).loadFromRemote();
Future<VRI> t = ds.publish(Services.ambitUniPlovdiv().augment("dataset"), (AuthenticationToken) null, myExecutor);
while (!t.isDone()) {
    // Do something while waiting for the result
}
VRI result = t.get();
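When nothing useful can be done while waiting, a bounded Future.get with a timeout avoids spinning on isDone(), and the executor should be shut down once it is no longer needed. The sketch below shows the pattern with the standard library only; the Callable<String> stands in for the publishing job, and the class name, method name and example URI are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Runs a job on its own executor and waits for the result for a bounded time. */
class FutureWithTimeout {

    /** Returns the job's result, or null if it did not finish within the timeout. */
    static String awaitResult(Callable<String> job, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(job);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // Interrupt the job; the result is no longer wanted.
                return null;
            } catch (ExecutionException e) {
                throw new IllegalStateException("Job failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        } finally {
            executor.shutdownNow(); // Release the worker thread in every case.
        }
    }

    public static void main(String[] args) {
        // Placeholder job that returns a made-up resource URI.
        String uri = awaitResult(() -> "http://example.org/dataset/1", 1_000);
        System.out.println(uri);
    }
}
```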
Publishing compounds works exactly as it does for datasets and BibTeX entries. In the case of compounds, however, users can additionally create new compounds from SD files, MOL files, SMILES and other chemical formats. Such files are well known in chemoinformatics and bioinformatics, and many scientists maintain databases of them. The related method is provided by CompoundFactory rather than by Compound, since it is not an operation applied to a specific Compound instance but creates a new compound on an online service. The output of this method is a Task that allows us to monitor the progress of the upload. Here is an example where we create a new compound from an SD file:
File myFile = new File("/path/to/opentoxin.sdf");
CompoundFactory factory = CompoundFactory.getInstance();
Task task = factory.publishFromFile(myFile, Media.CHEMICAL_MDLSDF.getMime(), (AuthenticationToken) null);
In this section we present the functionality provided by the factory classes of ToxOtis, which reside in the package org.opentox.toxotis.factory. They contain static methods that either create OTComponent objects or (for the sake of simplicity and performance) just return a URI or a collection of URIs. Note that, for the same reason, these classes don’t follow the Factory design pattern to the letter: in some cases they return just pointers to the objects they create (in our case, their URLs/URIs). In the next sections we will go through each factory, providing examples of use:
The package org.opentox.toxotis.factory includes 3 factory classes:
The method #listAllFeatures in FeatureFactory returns the URIs of all features stored in a specified remote feature service. Users can subsequently use these URIs to download and parse some of the features if necessary. The maximum number of returned URIs can be specified in advance to avoid huge lists of URIs (see also). Here is a simple example:
Set<VRI> featureUris = FeatureFactory.listAllFeatures(Services.ambitUniPlovdiv().augment("feature"), 10, null);
The above call returns at most 10 URIs. If all features are needed, it suffices to set the max parameter to -1, that is:
Set<VRI> allFeatureUris = FeatureFactory.listAllFeatures(Services.ambitUniPlovdiv().augment("feature"), -1, null);
If paging is supported by the remote service, then you can specify the page length and page index while getting the list of features:
Set<VRI> featureUris = FeatureFactory.listAllFeatures(Services.ambitUniPlovdiv().augment("feature"), 3, 10, null);
The above code requests the 3rd page with a page length of 10. Here is a possible list of features:
http://apps.ideaconsult.net:8080/ambit2/feature/20089
http://apps.ideaconsult.net:8080/ambit2/feature/20088
http://apps.ideaconsult.net:8080/ambit2/feature/20087
http://apps.ideaconsult.net:8080/ambit2/feature/20086
http://apps.ideaconsult.net:8080/ambit2/feature/20085
http://apps.ideaconsult.net:8080/ambit2/feature/20084
http://apps.ideaconsult.net:8080/ambit2/feature/20091
http://apps.ideaconsult.net:8080/ambit2/feature/20090
http://apps.ideaconsult.net:8080/ambit2/feature/20093
http://apps.ideaconsult.net:8080/ambit2/feature/20092
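What a page number and a page length select can be illustrated on a local list. The sketch below assumes, as the description above reads, that pages are counted from 1; a remote service may count from 0 instead, so this is an illustration of the idea rather than of the service's behaviour. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Local illustration of paging: which items a page number and page length select. */
class Paging {

    /** Returns the items of the given page; page numbers start at 1 (assumption). */
    static <T> List<T> page(List<T> all, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be positive");
        }
        long from = (long) (pageNumber - 1) * pageSize;
        if (from >= all.size()) {
            return new ArrayList<>(); // Past the last page: nothing to return.
        }
        int to = (int) Math.min(all.size(), from + pageSize);
        return new ArrayList<>(all.subList((int) from, to));
    }

    public static void main(String[] args) {
        // 25 made-up feature identifiers, numbered 1 to 25.
        List<Integer> featureIds = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            featureIds.add(i);
        }
        // The 3rd page of length 10 holds whatever is left after the first 20 items.
        System.out.println(page(featureIds, 3, 10));
    }
}
```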
You can perform a database lookup on a remote feature service providing your search criteria in a very convenient way. For example, say you need to get a list of all features that are owl:sameAs the dissociation constant pKa (otee:Dissociation_constant_pKa). Then using the method lookupSameAs, one has:
Set<VRI> features = FeatureFactory.lookupSameAs(OTEchaEndpoints.DissociationConstantPKa(), null);
for (VRI f : features) {
    System.out.println(f.toString());
}
A list of all ECHA endpoints is provided by OTEchaEndpoints (javadoc), and a collection of some common features is available through OTFeatures (javadoc).
This factory also allows new features to be created and POSTed to a feature service for publication in a single line of code. Proper authentication and authorization are required most of the time. The corresponding method is especially useful when developing model training web services, where a prediction feature needs to be created for the model. Here is an example:
Model m = ...;
Feature predictedFeature = FeatureFactory.createAndPublishFeature(
        "Feature created as prediction feature for the RBF NN model " + m.getUri(),
        new ResourceValue(m.getUri(), OTClasses.Model()),
        featureService, token);
DatasetFactory (doc) is a class with static methods that facilitate dataset creation and conversion from ARFF files and weka.core.Instances objects into Datasets. A new Dataset can be created from a File, an InputStream, a Reader or an Instances object. In addition, there is a method that creates a single DataEntry instance out of a weka.core.Instance object. Here we provide two examples of cases that users might come across during development. First, the ordinary case of reading from a file:
String filePath = "/path/to/your_file.arff"; // << Your path here!
java.io.File file = new java.io.File(filePath);
Dataset myDataset = DatasetFactory.createFromArff(file);
The above source code will generate a Dataset out of the given ARFF file, or will throw a ToxOtisException if the ARFF file you provided is not compliant with the ToxOtis requirements. For more details, please read the ToxOtis documentation about Weka.
The second use case concerns the creation of a Dataset object out of an online resource where the ARFF file is available with content negotiation when the client specifies the Header ‘Accept: text/x-arff’. Here is an example:
Dataset myDataset = null;
IGetClient client = ClientFactory.createGetClient(null);
client.setMediaType(Media.WEKA_ARFF);
try {
    int code = client.getResponseCode();
    if (code == 200) {
        InputStream stream = client.getRemoteStream();
        myDataset = DatasetFactory.createFromArff(stream);
    } else {
        // Handle Exceptional Event
    }
} catch (IOException ex) {
    // Handle Exceptional Event
} finally {
    try {
        client.close();
    } catch (IOException ex) {
        // Cannot close client...
    }
}
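For readers who want to see the content negotiation itself, the request can also be expressed with the JDK's own HTTP API (Java 11 or later). This is a sketch and not how ToxOtis issues the request internally: the class name and the dataset URI are hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds a GET request that asks for the ARFF representation of a resource. */
class ArffRequest {

    /** The Accept header is what selects ARFF among the available representations. */
    static HttpRequest build(String datasetUri) {
        return HttpRequest.newBuilder(URI.create(datasetUri))
                .header("Accept", "text/x-arff")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URI; replace it with the URI of a real dataset.
        HttpRequest request = build("http://example.org/dataset/1");
        System.out.println(request.method() + " " + request.uri()
                + " Accept=" + request.headers().firstValue("Accept").orElse(""));
    }
}
```

Sending such a request with HttpClient and HttpResponse.BodyHandlers.ofInputStream() yields an InputStream, which could then be handed to DatasetFactory.createFromArff(InputStream) in the same way as in the example above.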