Dec 6, 2010

Xerces and offline validation with DTD

This was an issue that took me some time:

I needed to validate a document using the Xerces parser. The document references an .xsd schema, which in turn references a DTD. However, the document would not validate when there was no Internet connection.


URL mySchemaURL = getClass().getResource("/com/mycompany/mySchema.xsd");
documentBuilderFactory.setAttribute("", " " + mySchemaURL.toString());

With the code above, I managed to make the parser read the XML schema from the JAR file. However, this did not work for the DTD referenced from mySchema.xsd.

The error message was rather unhelpful:

org.xml.sax.SAXParseException: schema_reference.4: 
Failed to read schema document 'myschema.xsd', because 
1) could not find the document; 
2) the document could not be read; 
3) the root element of the document is not <xsd:schema>.


It turns out the parser was trying to retrieve the DTD from the Internet. The solution with external-schemaLocation did not work for DTDs.

I then found an article about caching DTDs, which led me to the following solution:

import java.io.IOException;
import java.io.InputStream;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DTDResponseCache extends ResponseCache {
    /** Maps the URI of a DTD to the classpath resource holding its local copy */
    Map<URI, String> savedDTD;
    /** Original cache used for requests other than specified DTDs */
    ResponseCache originalCache;

    public DTDResponseCache(ResponseCache originalCache) {
        this.originalCache = originalCache;
        savedDTD = new HashMap<URI, String>();
        //Add your DTDs here
    }

    public CacheResponse get(final URI uri, String rqstMethod, Map<String, List<String>> rqstHeaders) throws IOException {
        if (savedDTD.containsKey(uri)) {
            // A saved DTD: serve the local copy instead of going to the network
            return new CacheResponse() {
                public Map<String, List<String>> getHeaders() throws IOException {
                    Map<String, List<String>> headers = new HashMap<String, List<String>>();
                    return headers;
                }
                public InputStream getBody() throws IOException {
                    return getClass().getResourceAsStream(savedDTD.get(uri));
                }
            };
        } else {
            if (originalCache != null) {
                return originalCache.get(uri, rqstMethod, rqstHeaders);
            } else {
                return null;
            }
        }
    }

    public CacheRequest put(URI uri, URLConnection conn) throws IOException {
        if (originalCache != null) {
            return originalCache.put(uri, conn);
        } else {
            return null;
        }
    }
}
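The "Add your DTDs here" step is just a matter of putting entries into the map. Below is a minimal sketch of what a filled-in cache could look like. The class name LocalDtdCache, the helper localPathFor, the URL http://www.example.org/dtd/my.dtd and the resource path /com/mycompany/my.dtd are all placeholders made up for illustration, not the values from my project. The sketch also leaves out the delegation to the original cache to stay short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical variant of the cache with the DTD map filled in.
class LocalDtdCache extends ResponseCache {
    // Key: the URI the parser requests. Value: absolute classpath path of the local copy.
    private final Map<URI, String> savedDTD = new HashMap<URI, String>();

    LocalDtdCache() {
        // Placeholder URL and resource path; replace with your own DTD locations.
        savedDTD.put(URI.create("http://www.example.org/dtd/my.dtd"), "/com/mycompany/my.dtd");
    }

    // Returns the classpath location registered for a URI, or null if there is none.
    String localPathFor(URI uri) {
        return savedDTD.get(uri);
    }

    @Override
    public CacheResponse get(URI uri, String method, Map<String, List<String>> headers) throws IOException {
        final String path = savedDTD.get(uri);
        if (path == null) {
            return null; // not a saved DTD: report a cache miss
        }
        return new CacheResponse() {
            @Override
            public Map<String, List<String>> getHeaders() {
                return Collections.<String, List<String>>emptyMap();
            }

            @Override
            public InputStream getBody() {
                // The leading slash makes the lookup independent of this class's package.
                return LocalDtdCache.class.getResourceAsStream(path);
            }
        };
    }

    @Override
    public CacheRequest put(URI uri, URLConnection conn) {
        return null; // this sketch never stores responses
    }
}
```

Note that the key has to match the URI the parser actually requests, so a DTD reachable under two different paths needs two entries.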

Now you only need to save the DTDs somewhere accessible to your classloader and then, on application startup, install the cache:


ResponseCache.setDefault(new DTDResponseCache(ResponseCache.getDefault()));

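This preserves your previous caching settings (if any), because requests for anything other than the saved DTDs are passed on to the cache that was installed before.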
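To see that wrapping the previous default really keeps it in play, here is a small self-contained sketch. RecordingCache, ForwardingCache and ChainDemo are names invented for this illustration: the first stands in for whatever cache the application had installed earlier, the second is a stripped-down wrapper that only does the delegation part.

```java
import java.io.IOException;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a cache installed earlier by the application; it only records lookups.
class RecordingCache extends ResponseCache {
    final List<URI> seen = new ArrayList<URI>();

    @Override
    public CacheResponse get(URI uri, String method, Map<String, List<String>> headers) {
        seen.add(uri);
        return null; // always a miss
    }

    @Override
    public CacheRequest put(URI uri, URLConnection conn) {
        return null;
    }
}

// Bare-bones wrapper: forwards every call to the cache it replaced, if there was one.
class ForwardingCache extends ResponseCache {
    private final ResponseCache previous;

    ForwardingCache(ResponseCache previous) {
        this.previous = previous;
    }

    @Override
    public CacheResponse get(URI uri, String method, Map<String, List<String>> headers) throws IOException {
        return previous != null ? previous.get(uri, method, headers) : null;
    }

    @Override
    public CacheRequest put(URI uri, URLConnection conn) throws IOException {
        return previous != null ? previous.put(uri, conn) : null;
    }
}

class ChainDemo {
    // Installs a RecordingCache, wraps it, and reports whether a lookup still reaches it.
    static boolean previousCacheStillConsulted() {
        RecordingCache earlier = new RecordingCache();
        ResponseCache.setDefault(earlier);
        // Same pattern as the startup line: wrap whatever the current default is.
        ResponseCache.setDefault(new ForwardingCache(ResponseCache.getDefault()));
        try {
            ResponseCache.getDefault().get(URI.create("http://www.example.org/page"),
                    "GET", new HashMap<String, List<String>>());
        } catch (IOException e) {
            return false;
        } finally {
            ResponseCache.setDefault(null); // restore the "no cache" state
        }
        return earlier.seen.size() == 1;
    }
}
```

If no cache was installed before, ResponseCache.getDefault() returns null, which is why the wrapper checks for null before delegating.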
