A Java-based Life Sciences Identifier authority consolidates
biological data resources
Stefan Atev, Programmer, IBM
Ben Szekely (bhszekel@us.ibm.com), Software Engineer, IBM
27 May 2003 Updated 03 March 2004
We take you through a step-by-step
approach to building a Java®-based Life Sciences Identifier (LSID)
authority from scratch. We demonstrate how to build this on a minimal
data set and on data downloaded from the protein sequence database
Swiss-Prot, all on the Linux platform.
The amount of biological data being created today is mind-boggling. As
a biologist or bioinformaticist, you probably know of places around the
network that provide very useful resources for your task at hand -- but
remembering the different ways to access this information is often a
productivity drain. Maybe you write a few Perl scripts or know someone who
will provide you with some code for this or a procedure for that. At this
point, you may be thinking that coming up with a common way of naming and
finding this data is the only way you will be able to remain a biologist
and not a programmer. Of course, the value of having a common way to
identify data extends beyond bioinformatics, but for this article we will
stay within the life sciences.
The Life Sciences Identifier
(LSID) is an I3C Uniform Resource Name (URN) specification in
progress. You can read more about the specification at the I3C (see Resources for a link).
Conceptually, LSID is a straightforward approach to naming and identifying
data resources stored in multiple, distributed data stores in a manner
that overcomes the limitations of the naming schemes that are in use
today.
An LSID resolver is a
software system that implements an agreed-upon LSID resolution protocol to
allow higher-level software to locate and access the data uniquely named
by any LSID URN. The "server" side of this resolver solution is called an
LSID authority. The
client stacks and an example client, the LSID LaunchPad, are provided by
the LSID Resolution Protocol Project.
In this article, you'll see how to create your own LSID Authority using
the LSID resolver stack for the Java language.
Getting started
This article assumes that you have the necessary administrative
privileges on the system that will house the authority (most likely you
will need root access for some of the steps).
All the steps in this article were tested on Red Hat Linux 7.1 and Red
Hat Linux 8. Java JDK versions 1.3.1 and 1.4.0 were tested. Jakarta Tomcat
4.1.18 was used. The sample code works with the IBM® WebSphere® software
platform as well.
Prerequisites
First, you need access to a system capable of running the Jakarta Tomcat 4
Web server, Java 2 (JDK 1.3.1 or later recommended), and a database engine
such as MySQL 3.23.x.
Required Java packages
The Java LSID client/server stacks need several Java packages to be
installed first:
Copy the .jar files to your Jakarta Tomcat shared/lib directory, or
alternatively, make sure they are available to your Java runtime engine
through the system class path.
If you opt to set up the sample authority using the Swiss-Prot data set
for your own testing purposes, you will also need the file
mysql-connector-java-x.x.x-bin.jar from the MySQL Connector/J distribution
available from MySQL AB (see Resources for a link).
You do not need the latest version of the JDBC drivers; the LGPL-licensed
version 2.0.14 will do. This module is used by the sample authority
server to access a MySQL database containing the Swiss-Prot data and also
needs to go into your Jakarta Tomcat shared/lib directory (or be in the
system class path).
Installing the LSID package
Once you have downloaded the prerequisites, get the latest version of the
LSID Java Client/Server stack (1.0.1 at the time of this writing). Obtain
the binary LSID server distribution and the binary LSID client
distribution, and copy the files lsid-client.jar and lsid-server.jar
into your Jakarta Tomcat shared/lib directory.
The Java LSID server package provides a set of servlets and a
simplified interface for quickly creating LSID authorities, as well as
fully featured LSID resolution services.
Hello World, your first LSID authority
Before we go any further, let's implement an authority that only knows
about one LSID: urn:lsid:ibm.com:hello:world. The parts of this particular
LSID are:
ibm.com -- the domain of the issuing authority
hello -- the namespace of the LSID
world -- the object id of the LSID
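As a rough illustration of this structure, the following sketch splits an LSID string into its parts. LsidParts is a hypothetical helper written for this example; the toolkit supplies its own com.ibm.lsid.LSID class, whose behavior may differ. The optional sixth field is assumed to be a revision (version) string.

```java
// Hypothetical helper for illustration only; the toolkit's own class is com.ibm.lsid.LSID.
class LsidParts {
    final String authority;
    final String namespace;
    final String objectId;
    final String revision; // null when the LSID carries no revision field

    private LsidParts(String authority, String namespace, String objectId, String revision) {
        this.authority = authority;
        this.namespace = namespace;
        this.objectId = objectId;
        this.revision = revision;
    }

    /** Splits "urn:lsid:authority:namespace:object[:revision]" on colons. */
    static LsidParts parse(String lsid) {
        String[] p = lsid.split(":", -1); // -1 keeps trailing empty fields
        boolean shapeOk = (p.length == 5 || p.length == 6)
                && p[0].equalsIgnoreCase("urn")
                && p[1].equalsIgnoreCase("lsid");
        if (!shapeOk) {
            throw new IllegalArgumentException("Not an LSID: " + lsid);
        }
        for (int i = 2; i < 5; i++) {
            if (p[i].isEmpty()) {
                throw new IllegalArgumentException("Empty field in LSID: " + lsid);
            }
        }
        String revision = (p.length == 6 && !p[5].isEmpty()) ? p[5] : null;
        return new LsidParts(p[2], p[3], p[4], revision);
    }

    public static void main(String[] args) {
        LsidParts parts = parse("urn:lsid:ibm.com:hello:world");
        System.out.println(parts.authority + " | " + parts.namespace + " | " + parts.objectId);
    }
}
```

Calling LsidParts.parse("urn:lsid:ibm.com:hello:world") yields the authority ibm.com, the namespace hello, and the object id world.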
The easiest way to implement the authority is to extend the
com.ibm.lsid.server.impl.SimpleAuthority class, which is used by the
standard authority servlet,
com.ibm.lsid.server.servlet.AuthorityServlet. The methods we need to
provide or override are:
initService
getDataLocations
getMetaDataLocations
The authority will not provide data or metadata services, but will
simply describe the locations where data about
urn:lsid:ibm.com:hello:world can be retrieved.
You can get the code by downloading lsid-java-samples.tar.gz
and extracting HelloWorldAuthority.java or the WAR file
helloworld.war.
Listing 1. Hello, code
01 package lsidsamples;
02
03 import com.ibm.lsid.LSID;
04 import com.ibm.lsid.MalformedLSIDException;
05 import com.ibm.lsid.ExpiringResponse;
06 import com.ibm.lsid.wsdl.LSIDDataPort;
07 import com.ibm.lsid.wsdl.LSIDMetadataPort;
08 import com.ibm.lsid.server.LSIDServiceConfig;
09 import com.ibm.lsid.server.LSIDServerException;
10 import com.ibm.lsid.server.impl.SimpleAuthority;
11 import com.ibm.lsid.wsdl.HTTPLocation;
12 import com.ibm.lsid.wsdl.FTPLocation;
13
14 public class HelloWorldAuthority extends SimpleAuthority {
15
16     public void initService(LSIDServiceConfig config) throws LSIDServerException {
17     }
18
19     public LSIDMetadataPort[] getMetaDataLocations(LSID lsid, String url) {
20         return new LSIDMetadataPort[0];
21     }
22
23     public LSIDDataPort[] getDataLocations(LSID lsid, String url) {
24         return new LSIDDataPort[] {
25             new HTTPLocation(
26                 "www.ibm.com", 80, "/lsid/hello_world"
27             ),
28             new FTPLocation(
29                 "ftp.ibm.com", "/lsid/hello_world.txt"
30             )
31         };
32     }
33 }
Dissecting "Hello World"
Line 01 specifies that our authority's implementation will be a part of
the lsidsamples package.
Lines 03 - 12 import the classes and interfaces we need to
implement the authority. We will use the class
com.ibm.lsid.server.impl.SimpleAuthority as the base for our
lsidsamples.HelloWorldAuthority implementation (line 14).
Lines 16 - 17 implement the initService
method, which will be called upon authority startup. Since we do not need
to save any configuration options (accessible through
LSIDServiceConfig), we can choose to do nothing.
The function getMetaDataLocations (lines 19 - 21) takes an LSID object
as a parameter and returns an array of
locations where metadata services about that LSID are available. Since we
implement no metadata service in this example, the method returns an array
with length 0 (returning null would have indicated an
error).
The function getDataLocations (lines 23 - 32) is very similar to
getMetaDataLocations, but this time we return an array
providing two possible locations for the data: the hypothetical URLs
http://www.ibm.com:80/lsid/hello_world and
ftp://ftp.ibm.com/lsid/hello_world.txt.
Configuring the authority
To configure the authority, we must provide a
deployment descriptor that gives the servlet a mapping from LSID to
service implementation. The XML in Listing 2 defines a mapping called
hello-world that applies to all LSIDs with authority ibm.com and namespace
hello. The services section of the XML binds this mapping to our authority
implementation.
You can find the deployment descriptor in the file
webapps/helloworld/services/hello-world.xml:
Listing 2. Service configuration
<?xml version="1.0" encoding="UTF-8"?>
<deployment-descriptor xmlns="http://www.ibm.com/LSID/Standard/rsdl">
    <maps>
        <map name="hello-world">
            <pattern auth="ibm.com" ns="hello" />
        </map>
    </maps>
    <services>
        <service name="aLSID">
            <components>
                <auth map="hello-world" type="class">lsidsamples.HelloWorldAuthority</auth>
            </components>
        </service>
    </services>
</deployment-descriptor>
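To make the effect of the pattern element concrete, here is a small sketch of the rule it describes: an LSID belongs to the hello-world map when its authority is ibm.com and its namespace is hello. MapPattern is a hypothetical class, and exact, case-insensitive comparison is an assumption; the toolkit's actual matching code is not shown in this article and may differ.

```java
// Hypothetical illustration of an auth/ns pattern; not the toolkit's own matching code.
class MapPattern {
    private final String auth;
    private final String ns;

    MapPattern(String auth, String ns) {
        this.auth = auth;
        this.ns = ns;
    }

    /** True when the LSID's authority and namespace equal the pattern's, ignoring case. */
    boolean matches(String lsid) {
        String[] p = lsid.split(":");
        return p.length >= 5
                && p[0].equalsIgnoreCase("urn")
                && p[1].equalsIgnoreCase("lsid")
                && p[2].equalsIgnoreCase(auth)
                && p[3].equalsIgnoreCase(ns);
    }

    public static void main(String[] args) {
        MapPattern helloWorld = new MapPattern("ibm.com", "hello");
        System.out.println(helloWorld.matches("urn:lsid:ibm.com:hello:world"));
        System.out.println(helloWorld.matches("urn:lsid:ibm.com:other:world"));
    }
}
```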
Running and testing the authority
To test the authority, you must first deploy it.
Copy the file helloworld.war into your Jakarta Tomcat webapps directory.
The file will be extracted into webapps/helloworld upon Tomcat startup,
and your Hello World authority will be available at
http://localhost:8080/helloworld/.
You can find the description of the authority servlets in the file
webapps/helloworld/WEB-INF/web.xml:
Listing 3. Servlet configuration
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app id="WebApp">
    <display-name>Hello World LSID Authority</display-name>
    <servlet>
        <servlet-name>AuthorityService</servlet-name>
        <display-name>Hello World Authority Servlet</display-name>
        <servlet-class>com.ibm.lsid.server.servlet.AuthorityServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>AuthorityService</servlet-name>
        <url-pattern>/</url-pattern>
    </servlet-mapping>
</web-app>
We did not have to write the Java servlet ourselves, since the standard
com.ibm.lsid.server.servlet.AuthorityServlet does all we
need. All authority services must implement the
com.ibm.lsid.server.LSIDAuthorityService interface. Our
sample authority, lsidsamples.HelloWorldAuthority, implements
this interface by virtue of extending
com.ibm.lsid.server.impl.SimpleAuthority. When
AuthorityServlet is loaded, it will instantiate
HelloWorldAuthority and will subsequently use the
getMetaDataLocations and getDataLocations calls
to get the information necessary to build the WSDL response for the
standard LSID authority method getAvailableServices.
To test the authority, use the TestClient.java sample client program.
The compiled class file for it is in the extracted samples directory, in
the file test-client.jar. Enter the following command:
java TestClient urn:lsid:ibm.com:hello:world \
http://localhost:8080/helloworld/
You will need the .jar files from your Tomcat shared/lib directory in
the class path, together with samples/test-client.jar. The first parameter
to TestClient is the LSID to test with, and the second
temporarily maps the authority service for ibm.com to
http://localhost:8080/helloworld/, where the Hello World authority is
running. The expected output is:
Data is available at:
(ftp) ftp://ftp.ibm.com/lsid/hello_world.txt
(http) http://www.ibm.com:80/lsid/hello_world
Using the Swiss-Prot data set
Swiss-Prot records contain an ID and an AC field,
corresponding to a human-readable identifier of the record and its
accession number. The sample Swiss-Prot authority that we'll implement
will understand LSIDs of these various forms:
Table 1. Supported LSIDs
Sample LSID | Description
urn:lsid:example.org:swiss-id:hv20_mouse | An abstract LSID containing no data that represents the Swiss-Prot record with ID HV20_MOUSE. The LSID is related to the concrete representations of this Swiss-Prot record in various formats.
urn:lsid:example.org:swiss-id:hv20_mouse-sprot | A concrete LSID naming the data about the Swiss-Prot record with ID HV20_MOUSE in Swiss-Prot format.
urn:lsid:example.org:swiss-id:hv20_mouse-fasta | A concrete LSID naming the data about the Swiss-Prot record with ID HV20_MOUSE in FASTA format. The actual conversion to FASTA will be done on the fly and is left as an exercise for the reader.
urn:lsid:example.org:swiss-ac:p01879 | An abstract LSID containing no data that represents the Swiss-Prot record with primary or secondary accession number P01879. The LSID is related to the concrete representations of this Swiss-Prot record in various formats.
urn:lsid:example.org:swiss-ac:p01879-sprot | A concrete LSID naming the data about the Swiss-Prot record with accession number P01879 in Swiss-Prot format.
urn:lsid:example.org:swiss-ac:p01879-fasta | A concrete LSID naming the data about the Swiss-Prot record with accession number P01879 in FASTA format. The actual conversion to FASTA will be done on the fly.
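The naming convention in Table 1 can be checked mechanically. The sketch below classifies an LSID by namespace and suffix alone; it is a hypothetical helper, not the SampleLSIDDataLookup class used later, and it does not consult the database to confirm that the record exists. The numeric values of the constants are assumptions chosen for this example.

```java
import java.util.Locale;

// Hypothetical, syntax-only classifier for the LSID forms in Table 1.
class SwissProtLsidForm {
    static final int UNKNOWN = 0;
    static final int ABSTRACT = 1;
    static final int CONCRETE = 2;

    /** Returns the lower-cased fields, or null if this is not a swiss-id/swiss-ac LSID. */
    private static String[] parts(String lsid) {
        String[] p = lsid.toLowerCase(Locale.ROOT).split(":");
        boolean ok = p.length >= 5
                && p[0].equals("urn")
                && p[1].equals("lsid")
                && (p[3].equals("swiss-id") || p[3].equals("swiss-ac"));
        return ok ? p : null;
    }

    /** "sprot" or "fasta" for concrete LSIDs, null otherwise. */
    static String formatOf(String lsid) {
        String[] p = parts(lsid);
        if (p == null) return null;
        if (p[4].endsWith("-sprot")) return "sprot";
        if (p[4].endsWith("-fasta")) return "fasta";
        return null;
    }

    static int kindOf(String lsid) {
        if (parts(lsid) == null) return UNKNOWN;
        return formatOf(lsid) == null ? ABSTRACT : CONCRETE;
    }

    /** The ID or accession number without any format suffix, in upper case. */
    static String recordKey(String lsid) {
        String[] p = parts(lsid);
        if (p == null) return null;
        String obj = p[4];
        if (formatOf(lsid) != null) {
            obj = obj.substring(0, obj.length() - 6); // "-sprot" and "-fasta" are 6 characters
        }
        return obj.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String lsid = "urn:lsid:example.org:swiss-id:hv20_mouse-fasta";
        System.out.println(kindOf(lsid) + " " + formatOf(lsid) + " " + recordKey(lsid));
    }
}
```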
Obtaining the data set
You can download the Swiss-Prot database as a compressed
file from expasy.org via FTP (see Resources for a link).
Note that the download is about 63 MB. Save the
sprot40.dat.gz file in a convenient location (~/lsid). You will then need
to extract it using the gunzip program:
cd ~/lsid
gunzip -d sprot40.dat.gz
Once you have done this, you should have a file called sprot40.dat in
your lsid directory. The file format of the database is described in the
Swiss-Prot user manual (again, please see Resources for a
link).
Importing the data into a MySQL database
The first task ahead of us now is to create a MySQL
user account to be used by the LSID authority and to create the necessary
data tables. If the MySQL daemon is not running, start it up
(/etc/init.d/mysqld start as root on Red Hat Linux 7 and 8) and start the
MySQL client as the root user by typing mysql -u root -p.
Enter the appropriate password for the root MySQL user and enter the
following:
Listing 4. Creating a user account
create database sprot4;
grant all on sprot4.* to lsiduser@localhost identified by 'none';
grant all on sprot4.* to lsiduser@localhost.localdomain identified by 'none';
grant all on sprot4.* to lsiduser@'%' identified by 'none';
use sprot4;
create table byid (
    id varchar(40) unique,
    version varchar(40),
    rootac varchar(40) unique,
    index(version)
);
create table byac (
    ac varchar(40) unique,
    rootac varchar(40),
    index(rootac)
);
create table acdata (
    rootac varchar(40) unique,
    data blob
);
If you want to save yourself some typing, get the mysql.batch1 file
from the samples directory and run the command
mysql -f -u root -p < mysql.batch1. A user account for "lsiduser" with
password "none" will be created with access to the database sprot4. The
three tables that we will create are byid, byac, and acdata:
Table 2. Table byid
Field (column name) | Description
id | Unique identifier for LSIDs with namespace swiss-id. Up to 40 characters long. For example, id will contain the string HV20_MOUSE for the LSID urn:lsid:example.com:SWISS-ID:HV20_MOUSE.
version | An optional (may be NULL) version string of up to 40 characters. For the LSID urn:lsid:example.com:SWISS-ID:HV20_MOUSE:version2, this field will contain the value version2. We will not use this field for our example.
rootac | The primary Swiss-Prot accession number for the corresponding LSID. For the LSID urn:lsid:example.com:SWISS-ID:HV20_MOUSE, this field will contain the value P01789. This is the primary field by which we will access the data about this LSID.
Table 3. Table byac
Field (column name) | Description
ac | A Swiss-Prot accession number (up to 40 characters). Secondary accession numbers (such as P01234) can appear here and will correspond to object IDs in the swiss-ac namespace.
rootac | The primary Swiss-Prot accession number for the corresponding accession number. For the LSID urn:lsid:example.com:SWISS-AC:P01234, this field will contain the value P08751. This is the primary field by which we will access the data about this LSID.
Table 4. Table acdata
Field (column name) | Description
rootac | A primary Swiss-Prot accession number used to identify a Swiss-Prot record.
data | A binary data object containing the Swiss-Prot record corresponding to rootac in Swiss-Prot format. This is the actual data that will be returned for the LSID in question.
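Given these three tables, looking up a record is a two-step walk: from an ID or accession number to its rootac, then from rootac to the stored record. The sketch below shows one way to express that with plain JDBC. SwissProtQueries, its method names, and the SQL text are assumptions made for illustration; they are not taken from the sample code that ships with the toolkit. The sketch uses try-with-resources and therefore needs Java 7 or later.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical lookup helper built on the byid, byac, and acdata tables.
class SwissProtQueries {
    // Resolve an entry ID to its primary accession number, then to the record.
    static final String BY_ID =
        "select d.data from byid i inner join acdata d on d.rootac = i.rootac where i.id = ?";
    // Resolve any (primary or secondary) accession number to the record.
    static final String BY_AC =
        "select d.data from byac a inner join acdata d on d.rootac = a.rootac where a.ac = ?";

    /** Picks the query for an LSID namespace (swiss-id or swiss-ac). */
    static String sqlFor(String namespace) {
        if ("swiss-id".equalsIgnoreCase(namespace)) return BY_ID;
        if ("swiss-ac".equalsIgnoreCase(namespace)) return BY_AC;
        throw new IllegalArgumentException("Unsupported namespace: " + namespace);
    }

    /** Returns the stored Swiss-Prot record, or null when no row matches. */
    static byte[] fetchRecord(Connection con, String namespace, String key) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(sqlFor(namespace))) {
            ps.setString(1, key); // bound parameter, so the key is never spliced into the SQL
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBytes(1) : null;
            }
        }
    }
}
```

A caller would obtain the Connection from java.sql.DriverManager with the lsiduser account created above and pass the namespace and object ID taken from the LSID.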
Loading the data into MySQL
Before we import the data into the newly created
tables, we must extract it from the flat file sprot40.dat. You can use the
Perl script extract.pl from the samples directory downloaded earlier to do
that. The commands are:
cd ~/lsid
perl extract.pl sprot40.dat byid.txt byac.txt acdata.txt
This will certainly take some time as the data set is fairly large. You
must also have sufficient disk space to hold the data throughout the
import process. Once extract.pl has finished its job, you can start the
MySQL client as user "lsiduser" by typing mysql -u lsiduser -pnone
and enter the following commands:
use sprot4;
load data local infile 'byid.txt' into table byid;
load data local infile 'byac.txt' into table byac;
load data local infile 'acdata.txt' into table acdata;
This process will also take some time. Once you are done, you can
delete the files byid.txt, byac.txt, and acdata.txt. If you want to save
yourself the typing, get mysql.batch2 from the samples directory and run
mysql -f -u lsiduser -pnone < mysql.batch2.
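The article does not list extract.pl, so the following is only a sketch of the kind of parsing such an extraction step involves, not a port of the script. It relies on a few conventions of the Swiss-Prot flat-file format: each line starts with a two-letter line code, the first token on the ID line is the entry name, AC lines hold semicolon-separated accession numbers with the primary one first, and a line starting with // ends the record. SwissProtEntryKeys is a hypothetical class, and the record in main is an abridged, illustrative sample.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader that pulls the ID and AC fields out of Swiss-Prot flat-file text.
class SwissProtEntryKeys {
    String id;
    final List<String> accessions = new ArrayList<String>();

    /** The first accession number listed is the primary one. */
    String primaryAccession() {
        return accessions.isEmpty() ? null : accessions.get(0);
    }

    /** Parses one or more records; a production import would stream the file instead. */
    static List<SwissProtEntryKeys> parse(String flatFileText) {
        List<SwissProtEntryKeys> entries = new ArrayList<SwissProtEntryKeys>();
        SwissProtEntryKeys current = new SwissProtEntryKeys();
        for (String line : flatFileText.split("\\r?\\n")) {
            if (line.startsWith("//")) {            // end of record
                if (current.id != null) entries.add(current);
                current = new SwissProtEntryKeys();
            } else if (line.startsWith("ID ")) {    // entry name is the first token
                current.id = line.substring(2).trim().split("\\s+")[0];
            } else if (line.startsWith("AC ")) {    // may repeat over several lines
                for (String ac : line.substring(2).split(";")) {
                    if (!ac.trim().isEmpty()) current.accessions.add(ac.trim());
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String sample =
            "ID   HV20_MOUSE     STANDARD;      PRT;   122 AA.\n"
          + "AC   P01789;\n"
          + "DE   Ig heavy chain V region M603.\n"
          + "//\n";
        for (SwissProtEntryKeys e : parse(sample)) {
            System.out.println(e.id + "\t" + e.primaryAccession());
        }
    }
}
```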
The Java code
We can now take a look at a less trivial LSID authority. We will provide an
authority service for resolving LSIDs, a data service for both Swiss-Prot
and FASTA formatted data records, as well as a metadata service with
support for a limited amount of metadata. Before we visit the core
authority code, let's take a cursory look at the support routines in
SampleLSIDDataLookup.java, located in the samples archive.
Listing 5. Looking up data with Java
package com.ibm.lsid.samples;

import java.io.InputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import com.ibm.lsid.LSID;
import com.ibm.lsid.MalformedLSIDException;
import com.ibm.lsid.server.LSIDServerException;

public class SampleLSIDDataLookup {
    ...
    public SampleLSIDDataLookup() throws LSIDServerException {
        ...
    }
    public int lsidType(LSID lsid) throws LSIDServerException {
        ...
    }
    public InputStream lsidData(LSID lsid) throws LSIDServerException {
        ...
    }
}
The implementations of lsidType and lsidData are not important here:
lsidType returns the type of an LSID (UNKNOWN, ABSTRACT, or CONCRETE),
and lsidData returns the data associated with it as an
InputStream object. Both throw an LSIDServerException if an error is
detected.
The core authority functionality is implemented by the class
SampleLSIDAuthorityMain:
Listing 6. The SampleLSIDAuthorityMain class
package com.ibm.lsid.samples;

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import com.ibm.lsid.LSID;
import com.ibm.lsid.LSIDException;
import com.ibm.lsid.ExpiringResponse;
import com.ibm.lsid.wsdl.LSIDDataPort;
import com.ibm.lsid.wsdl.LSIDMetadataPort;
import com.ibm.lsid.wsdl.LSIDWSDLWrapper;
import com.ibm.lsid.server.LSIDServerException;
import com.ibm.lsid.server.LSIDServiceConfig;
import com.ibm.lsid.wsdl.HTTPLocation;
import com.ibm.lsid.wsdl.SOAPLocation;
import com.ibm.lsid.server.impl.SimpleAuthority;

public class SampleLSIDAuthorityMain extends SimpleAuthority {
    private SampleLSIDDataLookup lookup = null;

    public void initService(LSIDServiceConfig cf) throws LSIDServerException {
        lookup = new SampleLSIDDataLookup();
    }

    public LSIDMetadataPort[] getMetaDataLocations(LSID lsid, String url) {
        if (lookup == null)
            return null;
        int lsType;
        try {
            lsType = lookup.lsidType(lsid);
        }
        catch (LSIDServerException ex) {
            ex.printStackTrace();
            lsType = SampleLSIDDataLookup.UNKNOWN;
        }
        if (lsType == SampleLSIDDataLookup.UNKNOWN)
            return null;
        HostDescriptor hd = new HostDescriptor(url);
        return new LSIDMetadataPort[] {
            new SOAPLocation(
                hd.baseURL + "metadata"
            )
        };
    }

    public LSIDDataPort[] getDataLocations(LSID lsid, String url) {
        if (lookup == null)
            return null;
        int lsType;
        try {
            lsType = lookup.lsidType(lsid);
        }
        catch (LSIDServerException ex) {
            ex.printStackTrace();
            lsType = SampleLSIDDataLookup.UNKNOWN;
        }
        if (lsType == SampleLSIDDataLookup.UNKNOWN)
            return null;
        // Abstract LSIDs exist but have no data, so they get no data locations
        if (lsType == SampleLSIDDataLookup.ABSTRACT)
            return new LSIDDataPort[0];
        HostDescriptor hd = new HostDescriptor(url);
        return new LSIDDataPort[] {
            new SOAPLocation(
                hd.baseURL + "data"
            ),
            new HTTPLocation(
                hd.host, hd.port,
                hd.pathPrefix + "/authority/data"
            )
        };
    }

    private static final Pattern HOST_PTN = Pattern.compile(
        "https?://([^/:]+)(?::(\\d+))?(.*)/authority(.*)"
    );

    /* Q&D implementation */
    private class HostDescriptor {
        public String host;
        public int port;
        public String pathPrefix;
        public String baseURL;

        public HostDescriptor(String url) {
            host = "localhost";
            port = -1;
            pathPrefix = "";
            // Both conditions must hold; "||" would dereference a null url
            if (url != null && url.length() > 0) {
                Matcher m = HOST_PTN.matcher(url);
                if (m.lookingAt()) {
                    host = m.group(1);
                    // group(2) is null when the URL carries no explicit port
                    if (m.group(2) != null && m.group(2).length() > 0)
                        port = Integer.parseInt(m.group(2));
                    pathPrefix = m.group(3);
                }
            }
            if (port > 0)
                baseURL = "http://" + host + ":" + port +
                    pathPrefix + "/authority/";
            else
                baseURL = "http://" + host + pathPrefix + "/authority/";
        }
    }
}
All we do in the initService function is prepare a
SampleLSIDDataLookup object, which we will use to verify the
existence of LSIDs we are asked to resolve.
The first crucial method is getMetaDataLocations. It will
be called by the authority servlet when SOAP requests for the
getAvailableServices service method are handled. After
verifying the existence of the given LSID, we return an array containing a
single location: the endpoint of our metadata service. The following
expression needs some elaboration:
new SOAPLocation(
    hd.baseURL + "metadata"
)
The SOAPLocation class is a concrete implementation of the
LSIDMetadataPort interface specialized for SOAP endpoints.
The argument to the constructor is the fully qualified URL of the metadata
service being exposed, which we construct using the fields of the private
helper class HostDescriptor.
The getDataLocations method is very similar in appearance.
Instead of specifying locations for metadata, it specifies locations where
data associated with an LSID can be obtained. Both
SOAPLocation and HTTPLocation are concrete
implementations of the LSIDDataPort interface.
SOAPLocation takes only a fully qualified URL as its
argument, while HTTPLocation expects a host name, a data
port, and a path to the data.
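The listing labels HostDescriptor as a quick and dirty implementation. As an alternative sketch, java.net.URI can take over the host and port parsing. EndpointBase is a hypothetical class name; like the listing, it assumes that the servlets are mounted under a path segment named /authority and that the service is reached over plain HTTP.

```java
import java.net.URI;

// Hypothetical alternative to HostDescriptor that uses java.net.URI instead of a regex.
class EndpointBase {
    final String host;
    final int port;          // -1 when the URL has no explicit port
    final String pathPrefix; // the part of the path before "/authority"

    EndpointBase(String url) {
        URI uri = URI.create(url);
        host = uri.getHost() != null ? uri.getHost() : "localhost";
        port = uri.getPort();
        String path = uri.getPath() != null ? uri.getPath() : "";
        // Use the last occurrence, mirroring the greedy match in the regex version
        int cut = path.lastIndexOf("/authority");
        pathPrefix = cut >= 0 ? path.substring(0, cut) : "";
    }

    /** Base URL under which the data and metadata endpoints are published. */
    String baseUrl() {
        String portPart = port > 0 ? ":" + port : "";
        return "http://" + host + portPart + pathPrefix + "/authority/";
    }

    public static void main(String[] args) {
        EndpointBase base = new EndpointBase("http://localhost:8080/swiss-prot/authority/");
        System.out.println(base.baseUrl() + "metadata");
    }
}
```

URI.create throws IllegalArgumentException for a malformed URL, so a servlet using this approach would need to decide whether to catch that and fall back to defaults, as the regex version silently does.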
The next piece of our LSID resolution server is the data service. We
implement the LSIDDataService interface and pass it as a
parameter to the DataServlet servlet class provided by the
LSID package.
Listing 7. The data service
package com.ibm.lsid.samples;

import java.io.InputStream;
import com.ibm.lsid.LSID;
import com.ibm.lsid.server.LSIDDataService;
import com.ibm.lsid.server.LSIDRequestContext;
import com.ibm.lsid.server.LSIDServerException;
import com.ibm.lsid.server.LSIDServiceConfig;

public class SampleLSIDAuthorityData implements LSIDDataService {
    private SampleLSIDDataLookup lookup = null;

    public InputStream getData(LSID lsid) throws LSIDServerException {
        // lookup stays null if initService failed or was never called
        if (lookup == null)
            throw new LSIDServerException(500, "Cannot query database");
        return lookup.lsidData(lsid);
    }

    public InputStream getDataByRange(LSIDRequestContext ctx, int start, int length)
            throws LSIDServerException {
        throw new LSIDServerException
            (LSIDServerException.METHOD_NOT_IMPLEMENTED,
            "getDataByRange not implemented");
    }

    public void initService(LSIDServiceConfig cf) throws LSIDServerException {
        lookup = new SampleLSIDDataLookup();
    }
}
The data service implementation is trivial since the heavy lifting is
done in the supporting class SampleLSIDDataLookup. Perhaps
now is the time to point out that FASTA formatted data is being generated
on the fly from the Swiss-Prot records using the small utility class
SwissToFastaConverter available in the samples package. The
method getDataByRange was introduced in the latest version of
the LSID specification. Most implementations will choose not to
implement this method, deferring chunking functionality to the underlying
protocol.
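SwissToFastaConverter itself is not listed in this article. The sketch below shows what such a conversion can look like, under stated assumptions: the header layout (entry name, accession number in parentheses, description) is copied from the sample output shown later in this article, the sequence is taken from the lines between the SQ line and the // terminator, and the line width is a parameter. SwissToFastaSketch is a hypothetical class, not the converter from the samples package.

```java
// Hypothetical Swiss-Prot to FASTA conversion; not the SwissToFastaConverter from the samples.
class SwissToFastaSketch {
    static String toFasta(String record, int width) {
        String id = "";
        String ac = "";
        StringBuilder description = new StringBuilder();
        StringBuilder sequence = new StringBuilder();
        boolean inSequence = false;
        for (String line : record.split("\\r?\\n")) {
            if (line.startsWith("//")) {
                break;                                   // end of record
            }
            if (inSequence) {
                sequence.append(line.replaceAll("\\s+", "")); // drop spacing between residue groups
            } else if (line.startsWith("ID ")) {
                id = line.substring(2).trim().split("\\s+")[0];
            } else if (line.startsWith("AC ") && ac.isEmpty()) {
                ac = line.substring(2).trim().split(";")[0].trim(); // primary accession number
            } else if (line.startsWith("DE ")) {
                if (description.length() > 0) description.append(' ');
                description.append(line.substring(2).trim());
            } else if (line.startsWith("SQ ")) {
                inSequence = true;                       // sequence lines follow
            }
        }
        StringBuilder out = new StringBuilder();
        out.append('>').append(id).append(" (").append(ac).append(") ")
           .append(description).append('\n');
        for (int i = 0; i < sequence.length(); i += width) {
            out.append(sequence, i, Math.min(i + width, sequence.length())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Abridged, illustrative record: the sequence is truncated to 12 residues
        String record =
            "ID   HV20_MOUSE     STANDARD;      PRT;    12 AA.\n"
          + "AC   P01789;\n"
          + "DE   Ig heavy chain V region M603.\n"
          + "SQ   SEQUENCE   12 AA;\n"
          + "     EVKLVESGGG LV\n"
          + "//\n";
        System.out.print(toFasta(record, 78));
    }
}
```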
To complete the package, we must provide an implementation of the
metadata service, which consists of the getMetadata and initService
methods. Nothing unusual happens at initialization time, and getMetadata
must simply generate a correct RDF description of the recognized LSIDs.
The second argument to getMetadata is an array of metadata formats that
the client understands. In this example, we ignore these formats. However,
we do return the proper metadata format application/xml+rdf
in the MetadataResponse.
Listing 8. Getting metadata
package com.ibm.lsid.samples;

import java.io.InputStream;
import java.io.ByteArrayInputStream;
import com.ibm.lsid.LSID;
import com.ibm.lsid.MetadataResponse;
import com.ibm.lsid.MalformedLSIDException;
import com.ibm.lsid.server.LSIDMetadataService;
import com.ibm.lsid.server.LSIDServerException;
import com.ibm.lsid.server.LSIDServiceConfig;
import com.ibm.lsid.server.LSIDRequestContext;

public class SampleLSIDAuthorityMetadata implements LSIDMetadataService {
    private SampleLSIDDataLookup lookup = null;

    public void initService(LSIDServiceConfig cf) throws LSIDServerException {
        lookup = new SampleLSIDDataLookup();
    }

    private static final String RDF_NS =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    private static final String DC_NS =
        "http://purl.org/dc/elements/1.1/";
    private static final String I3CP_NS =
        "urn:lsid:i3c.org:predicates:";
    private static final String I3C_CONTENT =
        "urn:lsid:i3c.org:types:content";
    private static final String I3C_SPROT =
        "urn:lsid:i3c.org:formats:sprot";
    private static final String I3C_FASTA =
        "urn:lsid:i3c.org:formats:fasta";

    // Appends one RDF statement (subject, predicate, resource object) as XML
    private void appendTripleResource(
        StringBuffer src,
        String subj, String pred, String obj
    ) {
        src.append("<rdf:Description rdf:about=\"");
        src.append(subj);
        src.append("\"><");   // close the rdf:about attribute and the start tag
        src.append(pred);
        src.append(" rdf:resource=\"");
        src.append(obj);
        src.append("\"/></rdf:Description>");
    }

    public MetadataResponse getMetadata(LSIDRequestContext ctx, String[] formats)
            throws LSIDServerException {
        // should check formats[] for RDF format, but will assume client can accept RDF
        LSID lsid = ctx.getLsid();
        int lsType;
        try {
            lsType = lookup.lsidType(lsid);
        }
        catch (LSIDServerException ex) {
            ex.printStackTrace();
            lsType = SampleLSIDDataLookup.UNKNOWN;
        }
        if (lsType == SampleLSIDDataLookup.UNKNOWN)
            throw new LSIDServerException(201, "Unknown LSID");
        StringBuffer result = new StringBuffer();
        result.append("<?xml version=\"1.0\"?><rdf:RDF");
        result.append(" xmlns:rdf=\"");
        result.append(RDF_NS);
        result.append("\" xmlns:dc=\"");
        result.append(DC_NS);
        result.append("\" xmlns:i3cp=\"");
        result.append(I3CP_NS);
        result.append("\">");   // close the i3cp namespace attribute and the start tag
        String baseLSID = lsid.toString();
        // Strip the 6-character format suffix to get the abstract LSID;
        // substring returns a new string, so the result must be assigned
        if (baseLSID.endsWith("-fasta") || baseLSID.endsWith("-sprot"))
            baseLSID = baseLSID.substring(0, baseLSID.length() - 6);
        if (lsType == SampleLSIDDataLookup.ABSTRACT) {
            // An abstract LSID points at both concrete representations
            appendTripleResource(
                result,
                baseLSID, "i3cp:storedas", baseLSID + "-fasta"
            );
            appendTripleResource(
                result,
                baseLSID, "i3cp:storedas", baseLSID + "-sprot"
            );
            appendTripleResource(
                result,
                baseLSID + "-fasta", "rdf:type", I3C_CONTENT
            );
            appendTripleResource(
                result,
                baseLSID + "-sprot", "rdf:type", I3C_CONTENT
            );
            appendTripleResource(
                result,
                baseLSID + "-fasta", "dc:format", I3C_FASTA
            );
            appendTripleResource(
                result,
                baseLSID + "-sprot", "dc:format", I3C_SPROT
            );
        }
        else {
            String format = I3C_SPROT;
            if (lsid.getObject().endsWith("-fasta")) {
                format = I3C_FASTA;
            }
            // Link the abstract LSID to the concrete LSID that was requested
            // (the original always appended "-fasta" here, even for -sprot LSIDs)
            appendTripleResource(
                result,
                baseLSID, "i3cp:storedas", lsid.toString()
            );
            appendTripleResource(
                result,
                lsid.toString(), "rdf:type", I3C_CONTENT
            );
            appendTripleResource(
                result,
                lsid.toString(), "dc:format", format
            );
        }
        result.append("</rdf:RDF>");
        return new MetadataResponse(
            new ByteArrayInputStream(
                result.toString().getBytes()
            ),
            null,
            MetadataResponse.RDF_FORMAT
        );
    }
}
Running and testing the authority
To test the Swiss-Prot authority, copy the file
swiss-prot.war to your Jakarta Tomcat webapps directory. Upon Tomcat
startup, this file will expand to webapps/swiss-prot, and the authority
service will be available at http://localhost:8080/swiss-prot/authority/.
You can test the authority the same way you tested the Hello World LSID
authority, by running TestClient:
java TestClient urn:lsid:ibm.com:swiss-id:hv20_mouse \
http://localhost:8080/swiss-prot/authority/
The expected output is:
Meta is available at:
(soap) http://localhost:8080/swiss-prot/authority/metadata
-- META DATA --
<?xml version="1.0"?><rdf:RDF xmlns:rdf="http://www.w3.org/199....
----
If you decide to test with an LSID that has associated data, you can
try the command:
java TestClient urn:lsid:ibm.com:swiss-id:hv20_mouse-fasta \
http://localhost:8080/swiss-prot/authority/
The expected output in this case is:
Data is available at:
(soap) http://localhost:8080/swiss-prot/authority/data
(http) http://localhost:8080/swiss-prot/authority/data?
urn:lsid:ibm.com:swiss-id:hv20_mouse-fasta
Meta is available at:
(soap) http://localhost:8080/swiss-prot/authority/metadata
-- DATA --
>HV20_MOUSE (P01789) Ig heavy chain V region M603.
EVKLVESGGGLVQPGGSLRLSCATSGFTFSDFYMEWVRQPPGKRLEWIAASRNKGNKYTTEYSASVKGRFIVSRDTSQ
SILYLQMNALRAEDTAIYYCARNYYGSTWYFDVWGAGTTVTVSS
----
-- METADATA --
<?xml version="1.0"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/......
----
Metadata
Metadata about an LSID can be described using RDF (see the RDF Primer,
listed in Resources). The minimum
useful information about an LSID is whether it has data associated with
it, what format the data is in, or in the case of LSIDs with no data
associated, where to go to get a specific rendition of the concept in a
particular format.
About RDF
RDF documents are made up of simple statements that consist of three parts
each: the subject, the predicate, and the object (value). "Spot chases
rabbits" is the English translation of an RDF triple, where "Spot" is the
subject, "chases" is the predicate, and "rabbits" is the object. RDF is
simply a formal way of encoding such information. Metadata about a
particular LSID consists of a collection of RDF statements. Predicates
(also known as properties) can themselves be thought of as a particular
kind of subject (resource).
The RDF Schema Specification (see Resources) specifies how
to describe the relationship between predicates using RDF statements. The
subject and predicate are always named by a URI, and since an LSID is a
URN, which is a kind of URI, LSIDs can be used as either RDF subjects or
predicates. The objects in RDF are either URIs (in which case you can use
LSIDs) or so-called "literal" values that may or may not be typed.
An example
Suppose that the LSID urn:lsid:pets.org:cats:Tom names a cat. As
such, this LSID represents the abstract concept of Tom the cat, not any
particular and unchanging collection of bits describing him. However,
there are things related to Tom, such as a picture of him as a kitten,
that have concrete digital representations. Say that
urn:lsid:pets.org:cats:Tom-photos:Nov-22-1998 represents a
particular photo of Tom. We can "attach" this photo to
urn:lsid:pets.org:cats:Tom using the RDF property
urn:lsid:i3c.org:predicates:storedAs, like this:
<rdf:Description rdf:about="urn:lsid:pets.org:cats:tom">
    <i3c:storedas rdf:resource="urn:lsid:pets.org:cats:tom-photos:nov-22-1998"/>
</rdf:Description>
Note that URIs in RDF are treated as case sensitive, while LSIDs are
case insensitive. To avoid any potential for error, you should always
represent LSIDs in RDF using their canonical form: lower case. The
peculiar-looking XML tag i3c:storedas is the name of the
property. Assuming that the namespace prefix i3c stands for
urn:lsid:i3c.org:predicates:, the fully-qualified property
name is urn:lsid:i3c.org:predicates:storedas (the
concatenation of the prefix and the tag name).
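A metadata generator can apply this rule in one place before it writes any LSID into RDF. The following minimal sketch assumes, as stated above, that the whole LSID is case insensitive and that lower case is the canonical form; LsidCanonicalizer is a hypothetical helper name.

```java
import java.util.Locale;

// Hypothetical helper that puts LSIDs into lower-case canonical form for use in RDF.
class LsidCanonicalizer {
    /** Locale.ROOT keeps the result independent of the server's default locale. */
    static String canonical(String lsid) {
        return lsid.trim().toLowerCase(Locale.ROOT);
    }

    /** Two spellings name the same LSID when their canonical forms are equal. */
    static boolean sameLsid(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) {
        System.out.println(canonical("urn:lsid:pets.org:cats:Tom-photos:Nov-22-1998"));
    }
}
```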
Since Tom's photo has data associated with it, we must describe that
fact in our metadata. The class
urn:lsid:i3c.org:types:content encompasses all things that
have data associated with them, so Tom's photo belongs to that class. We
describe this fact with an RDF statement:
<rdf:Description rdf:about="urn:lsid:pets.org:cats:tom-photos:nov-22-1998">
    <rdf:type rdf:resource="urn:lsid:i3c.org:types:content"/>
</rdf:Description>
The rdf:type property is used to denote class membership.
The namespace prefix rdf is conventionally used to represent
the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, defined in the
RDF specification.
Lastly, Tom's photo is stored in some format, like JPEG. We can use a
part of the Dublin Core RDF vocabulary (see Resources) to denote
that fact. The LSID urn:lsid:i3c.org:formats:jpg represents
the concept of the JPEG data format. The RDF statement that records the
photo's format is:
<rdf:Description rdf:about="urn:lsid:pets.org:cats:tom-photos:nov-22-1998">
<dc:format rdf:resource="urn:lsid:i3c.org:formats:jpg"/>
</rdf:Description>
|
The dc:format property is used to describe the format of a
resource. The namespace prefix dc is conventionally used to
represent the namespace http://purl.org/dc/elements/1.1/.
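To see how the three statements fit together, the following sketch wraps them in a single rdf:RDF root element with the namespace declarations discussed above, then reads one property back using the namespace-aware XML parser that ships with the JDK. The class and method names are hypothetical, and a plain XML parser is used here only for illustration; an RDF library would normally be used to interpret the statements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

/** Hypothetical example class: the metadata about Tom as one RDF/XML document. */
class TomMetadata {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String I3C_NS = "urn:lsid:i3c.org:predicates:";

    /** The three statements from the text, wrapped in one rdf:RDF root element. */
    static final String RDF_XML = String.join("\n",
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\"",
        "         xmlns:dc=\"" + DC_NS + "\"",
        "         xmlns:i3c=\"" + I3C_NS + "\">",
        "  <rdf:Description rdf:about=\"urn:lsid:pets.org:cats:tom\">",
        "    <i3c:storedas rdf:resource=\"urn:lsid:pets.org:cats:tom-photos:nov-22-1998\"/>",
        "  </rdf:Description>",
        "  <rdf:Description rdf:about=\"urn:lsid:pets.org:cats:tom-photos:nov-22-1998\">",
        "    <rdf:type rdf:resource=\"urn:lsid:i3c.org:types:content\"/>",
        "  </rdf:Description>",
        "  <rdf:Description rdf:about=\"urn:lsid:pets.org:cats:tom-photos:nov-22-1998\">",
        "    <dc:format rdf:resource=\"urn:lsid:i3c.org:formats:jpg\"/>",
        "  </rdf:Description>",
        "</rdf:RDF>");

    /** Parses the document with namespace support; fails if it is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Returns the rdf:resource value of the first property element with the given name. */
    static String objectOf(Document doc, String namespace, String localName) {
        Element property = (Element) doc.getElementsByTagNameNS(namespace, localName).item(0);
        return property == null ? null : property.getAttributeNS(RDF_NS, "resource");
    }

    public static void main(String[] args) {
        Document doc = parse(RDF_XML);
        System.out.println("stored as: " + objectOf(doc, I3C_NS, "storedas"));
        System.out.println("format:    " + objectOf(doc, DC_NS, "format"));
    }
}
```

Looking elements up by namespace URI and local name, rather than by the literal tag i3c:storedas, means the code keeps working if a document chooses a different prefix for the same namespace.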
It is common to display RDF documents as an interlinked graph. For our
example, the schematic looks like this:
Figure 1. Graphical representation of relations
A note about metadata's persistence
Unlike the data associated with an LSID,
metadata can expire. That means that the metadata about an LSID is the
perfect location for storing transient information about the object in
question. For example, a link to Tom the cat's home page could appear in
the metadata about Tom and can be modified at any time.
Adding value to your metadata
Whenever possible, standard property names should
be used to describe a particular collection of resources. Using standard
vocabularies greatly enhances the potential for interoperability between
providers and consumers of metadata. However, pre-existing properties
sometimes do not adequately describe a given relationship. In that case,
you can come up with a new property that serves your purpose. If an RDF
Schema or a Web Ontology Language description of the property is provided,
RDF Schema-enabled clients will still be able to understand the meaning of
your metadata even if they have no specific knowledge of the predicates
you use.
Location independence and the guarantee of immutability of LSIDs make
them perfect candidates for database cross-references. If you know about a
meaningful relation between two LSIDs, you should describe it in the
metadata, regardless of who issued the LSIDs. Any client that is able to
resolve LSIDs will be able to do so regardless of their origin.
Making the authority publicly available
To make an authority publicly available and
conformant to the LSID Resolution Proposal, you need to provide a way for
people to resolve LSIDs handled by your authority without knowing the
exact location of your service beforehand. That is, clients of your
authority should not need to edit their authorities, or do
anything similar.
The first step is to set up a DNS service (SRV) record for your
authority. Taking the authority for pdb.org as an example, a client
should be able to determine the host name and port number where the LSID
service resides by entering the following command:
host -t srv _lsid._tcp.pdb.org
|
You are asking DNS for the lsid service record for pdb.org with TCP as
the network protocol. The response should look like this:
_lsid._tcp.pdb.org SRV 1 0 8080 lsidauthority.pdb.org.
|
This tells us that the service for the pdb.org authority is running on
the host named lsidauthority.pdb.org and is listening for connections on
TCP port 8080. This information alone is not sufficient to determine the
endpoint of the pdb.org authority service, because it lacks a path. That
is why the LSID Resolution Proposal mandates that the service be available
at the path /authority/ on that host. In the case of pdb.org, the fully
qualified URL of the authority service should therefore be:
http://lsidauthority.pdb.org:8080/authority/.
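A client can follow the same steps programmatically. The sketch below is one possible approach, not the LSID client stack's own implementation: the class and method names are hypothetical, and the lookup relies on the DNS provider bundled with the JDK (com.sun.jndi.dns.DnsContextFactory), which returns SRV data as a string of the form "priority weight port target". The main method only parses the sample response shown above, so it runs without network access.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

/** Hypothetical helper that turns an authority name into an authority service URL. */
class LsidAuthorityLocator {

    /** DNS name to query for the SRV record of the given authority. */
    static String srvQueryName(String authority) {
        return "_lsid._tcp." + authority;
    }

    /** Builds the endpoint URL from SRV record data: "priority weight port target". */
    static String endpointFromSrv(String srvData) {
        String[] fields = srvData.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Unexpected SRV data: " + srvData);
        }
        int port = Integer.parseInt(fields[2]);
        String host = fields[3];
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1); // drop the trailing root dot
        }
        // The path /authority/ is the one the text says the proposal mandates.
        return "http://" + host + ":" + port + "/authority/";
    }

    /** Queries DNS through the JDK's JNDI DNS provider (requires network access). */
    static String lookupSrv(String authority) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attribute srv = ctx.getAttributes(srvQueryName(authority), new String[] {"SRV"})
                    .get("SRV");
            if (srv == null) {
                throw new NamingException("No SRV record for " + authority);
            }
            return srv.get().toString();
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) {
        // Sample SRV data taken from the response shown in the text.
        System.out.println(srvQueryName("pdb.org"));
        System.out.println(endpointFromSrv("1 0 8080 lsidauthority.pdb.org."));
    }
}
```

To resolve a live authority, pass the result of lookupSrv to endpointFromSrv. This sketch takes only the first SRV record and ignores priority and weight, which a production client would need to honor when several records are returned.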
Setting up DNS
All that you -- or your system administrator -- must do is to add a service
record for the machine that will run the authority. Suppose the machine is
authority.company.net and that it will serve as the authority named
company.net. Further suppose that the service will be on port 8080. The
record that must be added should go into the master zone file for
company.net's DNS server (perhaps a file named /var/named/company.net.zone
on company.net):
_lsid._tcp IN SRV 1 0 8080 authority.company.net.
|
If the authority name is supposed to be authority.company.net rather
than company.net, the record in company.net's zone file should look like
this:
_lsid._tcp.authority IN SRV 1 0 8080 authority.company.net.
|
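The only difference between the two records is the owner name, which is written relative to the zone. As a rough illustration, the following sketch derives that name from the authority name and the zone and prints both records shown above. The class and method names are hypothetical, and the priority and weight values (1 and 0) are simply copied from the examples in this article.

```java
import java.util.Locale;

/** Hypothetical helper that formats the SRV line for a zone file. */
class SrvZoneRecord {

    /** Owner name of the SRV record, relative to the zone the record is placed in. */
    static String ownerName(String authority, String zone) {
        String a = authority.toLowerCase(Locale.ROOT);
        String z = zone.toLowerCase(Locale.ROOT);
        if (a.equals(z)) {
            return "_lsid._tcp";
        }
        if (a.endsWith("." + z)) {
            // Keep only the labels in front of the zone name.
            return "_lsid._tcp." + a.substring(0, a.length() - z.length() - 1);
        }
        throw new IllegalArgumentException(authority + " is not inside zone " + zone);
    }

    /** Formats the full record; the trailing dot makes the target host absolute. */
    static String record(String authority, String zone, int port, String host) {
        return ownerName(authority, zone) + " IN SRV 1 0 " + port + " " + host + ".";
    }

    public static void main(String[] args) {
        System.out.println(record("company.net", "company.net", 8080, "authority.company.net"));
        System.out.println(record("authority.company.net", "company.net", 8080,
                "authority.company.net"));
    }
}
```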
Conclusion
We hope
that the step-by-step approach of this article -- and the extensive
samples directory included as the zip file in Resources -- will get
you up and running quickly. In the same spirit of saving you time and
energy, we also include a copy of a memo you might e-mail to your DNS
administrator to request that an SRV record be implemented:
To: <put DNS administrator name here>
cc: <put your department manager name here>
Subject: DNS record to allow resolution to my LSID Authority
Please could you add the following SRV record to the appropriate zone
file:
_lsid._tcp IN SRV 1 0 <put port here> <put Authority host name here>.
If you are running BIND v4 or above, or an equivalent Domain Name
Service, then your system will support SRV records (RFC 2782).
The reason for this SRV record is to allow clients running the LSID
protocol (explained at http://www.i3c.org/wgr/ta/resources/lsid/docs/)
to resolve to an LSID Authority server I am running behind port
<put port number here> on <put Authority host name here>.
For more information about the LSID protocol, resolution model, and
Authority server, refer to
http://www-124.ibm.com/developerworks/oss/lsid/.
Please let me know when this will start to become active.
Thank you.
Kind regards,
Resources
- You can obtain all code samples and batch files used throughout this
article in a single zip archive (all the files will be in the samples
sub-directory of the extracted archive): lsid-java-samples.tar.gz
- Download the Java packages necessary to work through the examples in
this article:
- The LSID Resolution Protocol
Project provides LSID client and server stacks, along
with the LSID LaunchPad, an
example client.
- Download the Swiss-Prot database as
a compressed file via FTP.
- If you opt to set up the sample authority using the Swiss-Prot data
set for your own testing purposes, you will also need the file
mysql-connector-java-x.x.x-bin.jar from the MySQL Connector/J
distribution available from MySQL AB.
- The file format of the sprot40.dat database is described in the Swiss-Prot user
manual.
- If Perl is more your thing, Stefan has written a similar article, Setting up your own LSID
Authority using Perl.
- The Life Sciences Identifier (LSID) is an I3C Uniform Resource Name
(URN) specification in progress. You can read more about the
specification at the I3C site.
- See the RDF Primer to learn
how metadata about an LSID can be described using RDF(s).
- The RDF Schema
Specification specifies how to describe the relationship between
predicates using RDF statements.
- For this article, we use part of the Dublin Core RDF
vocabulary to denote which file format is used for Tom's
photo.
- Visit the IBM WebSphere download
center for product downloads, useful resources, advanced
information, and certification.
- The Web Services for Life
Sciences package that queries the National Center for Biotechnology
Information (NCBI) Web site is available for free download at IBM's
alphaWorks.
- IBM Life Sciences
addresses IT needs specific to biotechnology, pharmaceuticals, genomics,
proteomics, and healthcare.
- IBM researchers are involved in a number of scientific and
technological disciplines, including chemistry, computer science, electrical
engineering, materials science, math, and physics. Learn more
about it at the IBM Research
site.
- You might also be interested in the role IBM middleware played in
these Linux case studies, Structural Bioinformatics
supports drug discovery with IBM and Linux, and IBM and MDS Proteomics
alliance aims to speed drug development.
- Find more resources on open source projects,
Linux, Java technology, and
Web services on developerWorks.
- You'll find a wide selection of books on Linux at the Linux section of the
Developer Bookstore.
About the authors
Stefan Atev is an Extreme Blue 2001 alumnus who has
returned to IBM over several summers. He has worked on the SashXB
Web services client and is presently involved with the Life Sciences
Identifier project, implementing Life Sciences data stores and
applying semantic Web technologies to improve the usability of
existing Life Sciences databases. You can reach Stefan at satev@us.ibm.com.
Ben Szekely is an Extreme Blue 2000 alumnus
who has returned to IBM full time. Ben is the lead developer for
LSID Java software and has been instrumental in developing the
specification that is under review at the Object Management Group.
You can reach Ben at bhszekel@us.ibm.com.