Dec 6, 2010

Xerces and offline validation with DTD

This issue took me some time to solve:

I needed to validate a document using the Xerces parser. The document references an .xsd schema, which in turn references a DTD (http://www.w3.org/2001/XMLSchema.dtd). However, the document would not validate when there was no Internet connection.

Using 

URL mySchemaURL = getClass().getResource("/com/mycompany/mySchema.xsd");
documentBuilderFactory.setAttribute("http://apache.org/xml/properties/schema/external-schemaLocation", "http://server.com/myschema " + mySchemaURL.toString());

I managed to make the parser read the XML schema from the JAR file. However, this did not work for the DTD referenced from mySchema.xsd.
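For context, here is a minimal sketch of how the factory setup around that line might look. The class and method names (OfflineSchemaSetup, newValidatingBuilder, schemaLocationPair) are made up for illustration, and the namespace-awareness, validation, and schemaLanguage settings are assumptions about the rest of the configuration; only the external-schemaLocation attribute comes from the snippet above.

```java
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class OfflineSchemaSetup {
    // Standard JAXP property that selects W3C XML Schema as the validation language
    static final String SCHEMA_LANGUAGE =
            "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    // Xerces property taking "namespace location" pairs, like xsi:schemaLocation
    static final String EXTERNAL_SCHEMA_LOCATION =
            "http://apache.org/xml/properties/schema/external-schemaLocation";

    /** Builds the "namespace location" pair expected by the Xerces property. */
    static String schemaLocationPair(String namespace, URL schemaUrl) {
        return namespace + " " + schemaUrl.toString();
    }

    /** Creates a validating builder that resolves the namespace to a local schema copy. */
    static DocumentBuilder newValidatingBuilder(String namespace, URL schemaUrl)
            throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation requires namespaces
        factory.setValidating(true);
        factory.setAttribute(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(EXTERNAL_SCHEMA_LOCATION,
                schemaLocationPair(namespace, schemaUrl));
        return factory.newDocumentBuilder();
    }
}
```

With the resource from above, this would be called as newValidatingBuilder("http://server.com/myschema", mySchemaURL). Note that the schema itself is only loaded when a document is parsed, so a wrong location does not show up at this point.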

The error message was rather unhelpful:

org.xml.sax.SAXParseException: schema_reference.4: 
Failed to read schema document 'myschema.xsd', because 
1) could not find the document; 
2) the document could not be read; 
3) the root element of the document is not <xsd:schema>.

It turned out that the parser was trying to retrieve http://www.w3.org/2001/XMLSchema.dtd from the Internet. The external-schemaLocation property only maps schema namespaces to schema locations, so it has no effect on DTDs.


I then found an article (http://tynne.de/xerces-w3c) about caching DTDs, which led me to the following solution:

import java.io.IOException;
import java.io.InputStream;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DTDResponseCache extends ResponseCache {
    /** Maps each DTD URI to the classpath location of its local copy. */
    private final Map<URI, String> savedDTD = new HashMap<URI, String>();
    /**
     * Original cache used for requests other than the specified DTDs
     */
    private final ResponseCache originalCache;

    public DTDResponseCache(ResponseCache originalCache) {
        this.originalCache = originalCache;
        // Add your DTDs here
        savedDTD.put(URI.create("http://www.w3.org/2001/XMLSchema.dtd"), "/com/mycompany/dtds/XMLSchema.dtd");
        savedDTD.put(URI.create("http://www.w3.org/2001/datatypes.dtd"), "/com/mycompany/dtds/datatypes.dtd");
    }

    @Override
    public CacheResponse get(final URI uri, String rqstMethod, Map<String, List<String>> rqstHeaders) throws IOException {
        if (savedDTD.containsKey(uri)) {
            // Serve the local copy instead of going to the network
            return new CacheResponse() {
                @Override
                public Map<String, List<String>> getHeaders() throws IOException {
                    // No response headers are needed for a DTD
                    return new HashMap<String, List<String>>();
                }

                @Override
                public InputStream getBody() throws IOException {
                    return DTDResponseCache.class.getResourceAsStream(savedDTD.get(uri));
                }
            };
        }
        // Everything else is delegated to the previous cache, if there was one
        return originalCache != null ? originalCache.get(uri, rqstMethod, rqstHeaders) : null;
    }

    @Override
    public CacheRequest put(URI uri, URLConnection conn) throws IOException {
        // Nothing is stored here; just delegate to the previous cache
        return originalCache != null ? originalCache.put(uri, conn) : null;
    }
}


Now you only need to save the DTDs somewhere accessible to your classloader and then, on application startup, call:

ResponseCache.setDefault(new DTDResponseCache(ResponseCache.getDefault()));

This preserves your previous caching settings (if any).
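To check that requests really are answered from the cache, something like the sketch below can help. It is only an illustration under some assumptions: InMemoryCache is a hypothetical stand-in for DTDResponseCache that serves a string from memory rather than a classpath resource, CacheCheck and fetch are made-up names, and the example.invalid host is chosen because it can never resolve, so a successful read can only have come from the cache.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheCheck {
    /** Hypothetical in-memory stand-in for DTDResponseCache. */
    static class InMemoryCache extends ResponseCache {
        private final Map<URI, byte[]> bodies = new HashMap<URI, byte[]>();

        void add(String uri, String body) {
            bodies.put(URI.create(uri), body.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public CacheResponse get(URI uri, String method, Map<String, List<String>> headers) {
            final byte[] body = bodies.get(uri);
            if (body == null) {
                return null; // unknown URI: let the connection go to the network
            }
            return new CacheResponse() {
                @Override
                public Map<String, List<String>> getHeaders() {
                    return new HashMap<String, List<String>>();
                }

                @Override
                public InputStream getBody() {
                    return new ByteArrayInputStream(body);
                }
            };
        }

        @Override
        public CacheRequest put(URI uri, URLConnection conn) {
            return null; // this sketch never stores network responses
        }
    }

    /** Reads the whole response for the given URL, the same way a parser would open it. */
    static String fetch(String url) throws IOException {
        InputStream in = URI.create(url).toURL().openStream();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            in.close();
        }
    }
}
```

After registering an entry with add(...) and installing the cache via ResponseCache.setDefault(...), calling fetch(...) on the same URL should return the registered text even with the network unplugged. The same check can be run against DTDResponseCache with http://www.w3.org/2001/XMLSchema.dtd.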
