Friday, 21 November 2008

Invalid XML Characters

Consuming third-party web services is usually a straightforward task, and most of the time they work without a problem. However, every now and again one comes along that is quite happy to return data containing invalid XML characters, which causes Axis to throw an exception and cease processing in complete disgust. The ideal solution is to get the third party to fix their web service, but sometimes this is not possible.

I had one such problem the other week when consuming a web service provided by Microsoft SharePoint. It was not possible to alter the web service or fix the underlying data that was causing the problem, so I had to deal with it on the client side. To do this I took advantage of Axis' custom handlers. With handlers you can insert code into the Axis chain and obtain a handle to the input stream, allowing you to alter the contents before passing it back into the chain, where processing continues as normal.

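The easiest way to create a new handler is to extend BasicHandler and implement the invoke method. The following example does just that and replaces every decimal character reference of the form &#nnn; in the XML with a single space. Note that this is a blunt instrument: valid references such as &#233; are replaced as well, and hexadecimal references such as &#x1F; are left untouched.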

package org.bensjavaspot.handlers;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.soap.SOAPException;
import javax.xml.transform.stream.StreamSource;
import org.apache.axis.AxisFault;
import org.apache.axis.Message;
import org.apache.axis.MessageContext;
import org.apache.axis.SOAPPart;
import org.apache.axis.handlers.BasicHandler;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 * Custom Axis handler that allows us to intercept the incoming input stream and
 * remove any bad characters before passing it back into the chain.
 *
 * @author Ben Hills
 */
public class SOAPCleanerHandler extends BasicHandler
{

    private static final Log log = LogFactory.getLog(SOAPCleanerHandler.class);

    /* Matches decimal character references such as &#11; (compiled once and reused) */
    private static final Pattern CHAR_REF = Pattern.compile("&#[0-9]*;");

    public void invoke(MessageContext mtx) throws AxisFault {

        /* Get the SOAP response message */
        Message message = mtx.getResponseMessage();

        if (message != null) {
            try {
                /* Message normally consists of a single part and optionally attachments. We just need the main part */
                SOAPPart part = (SOAPPart) message.getSOAPPart();

                /* Get the source */
                StreamSource source = (StreamSource) part.getContent();

                /* Get the input XML as a string */
                String messageAsString = message.getSOAPPartAsString();

                /* appendReplacement and appendTail write into a StringBuffer */
                Matcher matcher = CHAR_REF.matcher(messageAsString);
                StringBuffer sb = new StringBuffer();

                /* Replace each match found with a single space */
                while (matcher.find()) {
                    matcher.appendReplacement(sb, " ");
                }

                matcher.appendTail(sb);

                /* Convert back to byte array using the correct encoding */
                ByteArrayInputStream b = new ByteArrayInputStream(sb.toString().getBytes(part.getEncoding()));

                /* Pass input stream back in to chain */
                source.setInputStream(b);

                /* Set it again */
                part.setContent(source);

            }
            catch (SOAPException ex) {
                log.error("Failed to get and set source", ex);
            }
            catch (IOException ex) {
                log.error("Error reading input stream", ex);
            }
        }
    }
}

You can download the source code here
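The handler above replaces every &#nnn; reference, including perfectly legal ones. If you would rather keep the valid references, the replacement step could be narrowed along the following lines. This is only a sketch, not part of the original handler: the class name CharRefCleaner and its methods are hypothetical, and it assumes the "bad" data consists of decimal or hexadecimal references to characters outside the XML 1.0 Char range.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/* Hypothetical helper, not part of Axis or of the handler above */
public class CharRefCleaner {

    /* Matches decimal (&#11;) and hexadecimal (&#x0B;) character references */
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    /* True if the code point is allowed by the XML 1.0 Char production */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /* Replaces references to invalid characters with a space, keeps the rest */
    public static String stripInvalidCharRefs(String xml) {
        Matcher matcher = CHAR_REF.matcher(xml);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            boolean valid;
            try {
                int cp = matcher.group(1) != null
                        ? Integer.parseInt(matcher.group(1))
                        : Integer.parseInt(matcher.group(2), 16);
                valid = isValidXmlChar(cp);
            } catch (NumberFormatException ex) {
                /* Too many digits to be a code point, so treat as invalid */
                valid = false;
            }
            String replacement = valid ? matcher.group() : " ";
            /* quoteReplacement stops '$' and '\' being treated specially */
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        /* &#11; and &#x1F; are invalid in XML 1.0, &#233; and &amp; are fine */
        System.out.println(stripInvalidCharRefs("a&#11;b caf&#233; x&#x1F;y &amp; z"));
    }
}
```

Inside the handler, the while loop would then be replaced by a single call such as stripInvalidCharRefs(messageAsString). The trade-off is a slightly more complicated pattern in exchange for leaving legitimate data intact.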

Once we have created our handler, we need to instruct Axis to insert it into the response chain. This is achieved by adding an entry to the Axis client-config.wsdd file. The handler type must be the fully qualified class name, prefixed with java:

<globalConfiguration>
    <responseFlow>
        <handler type="java:org.bensjavaspot.handlers.SOAPCleanerHandler" />
    </responseFlow>
</globalConfiguration>


And that's it!