Argh. I just got bitten again by byte order marks (BOMs) that get sent out over HTTP when using ASP.NET. Basically the problem is that .NET's XML output defaults to UTF-8, and the default UTF-8 encoding generates a BOM as part of the output. If that output gets sent to the Response output, the BOM gets sent along with it, which is definitely not what you want.
In case you don't know what a BOM is: it's the preamble that gets attached to the beginning of a document to identify its encoding. There are different preambles for different encodings; for UTF-8 it is the three bytes EF BB BF. It's meant for files stored on disk, so that when a file is opened there's a quick way for an editor to detect the file's encoding. If you look at a UTF-8 document with a hex editor (or any editor that doesn't decode the bytes as UTF-8) you'd see:
ï»¿<?xml version="1.0" encoding="utf-8"?>
If you use the stock UTF-8 encoding in .NET (Encoding.UTF8), it generates this BOM prefix by default. Most XML objects in .NET use an XmlTextWriter for streaming output, and they too default to that UTF-8 encoding. So if you stream XML to disk you get a BOM in front of the document, which is what you want for a file.
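A quick way to see the difference is to ask an encoding for its preamble directly. This is only a minimal sketch: the class name BomPreambleDemo and the ToHex helper are made up for illustration, while Encoding.UTF8, UTF8Encoding and GetPreamble() are the standard framework members.

```csharp
using System;
using System.Text;

public static class BomPreambleDemo
{
    // Hypothetical helper: formats bytes as space-separated hex, e.g. "EF BB BF".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        // Encoding.UTF8 is a UTF8Encoding instance configured to emit the BOM.
        Console.WriteLine("Encoding.UTF8:           [" + ToHex(Encoding.UTF8.GetPreamble()) + "]");

        // The same thing spelled out explicitly.
        Console.WriteLine("new UTF8Encoding(true):  [" + ToHex(new UTF8Encoding(true).GetPreamble()) + "]");

        // Passing false turns the BOM off, so the preamble is empty.
        Console.WriteLine("new UTF8Encoding(false): [" + ToHex(new UTF8Encoding(false).GetPreamble()) + "]");
    }
}
```

The first two lines should show EF BB BF between the brackets and the last one nothing at all. Any writer that is handed one of these encodings writes exactly that preamble at the start of the stream.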
However, if you're streaming the XML to some other direct output target such as HTTP from ASP.NET, you definitely don't want the BOM in front. For many receivers it makes the XML document invalid. I just had some code that posts the content of an XmlDocument to an HTTP server. The following code streams the document incorrectly and ends up sending a BOM to the server:
public string PostDocument(string url, XmlDocument doc)
{
    // *** Output from Http request will be stored here and returned
    string result = null;

    MemoryStream ms = new MemoryStream(1024);

    // *** Problem: Save(Stream) uses the default writer, which emits the BOM
    doc.Save(ms);

    // *** Rewind the stream
    ms.Position = 0;

    // *** Http Wrapper Component
    wwHttp http = new wwHttp();

    // *** Pre-create request object so we can set properties on it
    http.CreateWebRequestObject(url);

    // *** Content will be XML
    http.ContentType = "text/xml; charset=utf-8";

    // *** Pick up the client Http headers that we need to add to the request
    Dictionary<string, string> headers = GetCreditDecisionHttpHeaders(doc);

    // *** Add all Http headers to the request
    foreach (KeyValuePair<string, string> kv in headers)
    {
        http.WebRequest.Headers.Add(kv.Key, kv.Value);
    }

    // *** apply the POST data from stream
    http.SetPostStream(ms);

    // *** And send the request
    result = http.GetUrl(url);

    // *** Return the string result (should be OK)
    return result;
}
In this code I take a previously created XML document and write it into a stream. The stream is then used as the POST data for an HTTP request that posts the XML document to a service. The http object here is a small wrapper around HttpWebRequest; essentially it copies the stream into the request stream, which then posts the data to the server.
The code as shown above generates a BOM, so the posted document is effectively invalid. The reason is that doc.Save(ms) streams the output using the default XmlTextWriter that XmlDocument creates internally, and so it includes the BOM based on the default UTF-8 encoding.
The way to fix this is to use an explicit XmlTextWriter as an intermediary for streaming:
public string PostCreditDecision(string url, XmlDocument doc)
{
    // *** Output from Http request will be stored here and returned
    string result = null;

    MemoryStream ms = new MemoryStream(1024);
    using (XmlTextWriter writer = new XmlTextWriter(ms, new UTF8Encoding(false)))
    {
        doc.Save(writer);

        // *** Flush rather than Close(): closing the writer would also close the stream
        writer.Flush();

        // *** Rewind the stream
        ms.Position = 0;

        // *** Http Wrapper Component
        wwHttp http = new wwHttp();

        // *** Pre-create request object so we can set properties on it
        http.CreateWebRequestObject(url);

        // *** Content will be XML
        http.ContentType = "text/xml; charset=utf-8";

        // *** Pick up the client Http headers that we need to add to the request
        Dictionary<string, string> headers = GetCreditDecisionHttpHeaders(doc);

        // *** Add all Http headers to the request
        foreach (KeyValuePair<string, string> kv in headers)
        {
            http.WebRequest.Headers.Add(kv.Key, kv.Value);
        }

        // *** apply the POST data from stream
        http.SetPostStream(ms);

        // *** And send the request
        result = http.GetUrl(url);
    }

    // *** Return the string result (should be OK)
    return result;
}
Here the XmlTextWriter is created explicitly with an explicit UTF8Encoding. The key is:
new XmlTextWriter(ms, new UTF8Encoding(false))
Note the false parameter on UTF8Encoding which turns off the BOM generation.
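To check the effect of that flag in isolation, without the HTTP plumbing, something like the following sketch can help. The class XmlBomDemo, its Serialize and StartsWithUtf8Bom helpers, and the sample order document are all made up for illustration; only XmlDocument, XmlTextWriter and the encoding classes are real framework types.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlBomDemo
{
    // Hypothetical helper: writes the document through an XmlTextWriter
    // that uses the given encoding and returns the raw bytes produced.
    public static byte[] Serialize(XmlDocument doc, Encoding encoding)
    {
        MemoryStream ms = new MemoryStream();
        using (XmlTextWriter writer = new XmlTextWriter(ms, encoding))
        {
            doc.Save(writer);
            writer.Flush();          // push buffered output into the stream
            return ms.ToArray();     // copy the bytes before the writer closes the stream
        }
    }

    // Hypothetical helper: true if the data begins with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        return data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"utf-8\"?><order id=\"1\" />");

        // Stock UTF-8 encoding: the BOM is written in front of the XML.
        Console.WriteLine(StartsWithUtf8Bom(Serialize(doc, Encoding.UTF8)));            // True

        // BOM generation turned off: the output starts directly with '<'.
        Console.WriteLine(StartsWithUtf8Bom(Serialize(doc, new UTF8Encoding(false))));  // False
    }
}
```

Both calls produce the same XML; the only difference is the three extra bytes in front of the first one.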
This is a subtle issue and it's easy to miss or forget about, because BOM generation is the default and just about all the XML writers in .NET use it. It's also not easy to see that the BOM is there, because if you render the output in a browser it gets stripped out. If you save the output to a file you're not likely to see the BOM either, even though it's there, because when you open the file in a text editor the editor uses the BOM to pick the encoding and strips it out. Even .NET's XML objects skip over the BOM when reading XML. But if you're interfacing with other platforms like Java there can be problems. You'll only see the BOM in a hex editor or in an HTTP trace.
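If no hex editor is at hand, a few lines of code can stand in for one. The sketch below is an assumption-laden illustration, not part of the original code: BomInspector, DetectBom and the label strings are invented names, and it only checks the preambles of the UTF-8, UTF-16 and little-endian UTF-32 encodings that ship with .NET.

```csharp
using System;
using System.Text;

public static class BomInspector
{
    // Encodings to test, longest preamble first so that UTF-32 LE (FF FE 00 00)
    // is not mistaken for UTF-16 LE (FF FE). All are created with the BOM enabled.
    private static readonly Encoding[] Candidates =
    {
        new UTF32Encoding(false, true),    // little-endian UTF-32
        new UTF8Encoding(true),
        new UnicodeEncoding(false, true),  // little-endian UTF-16
        new UnicodeEncoding(true, true)    // big-endian UTF-16
    };

    // Hypothetical labels, parallel to Candidates.
    private static readonly string[] Labels =
    {
        "UTF-32 LE", "UTF-8", "UTF-16 LE", "UTF-16 BE"
    };

    // Returns a label for the BOM found at the start of data, or null if none matches.
    public static string DetectBom(byte[] data)
    {
        for (int c = 0; c < Candidates.Length; c++)
        {
            byte[] preamble = Candidates[c].GetPreamble();
            if (data.Length < preamble.Length)
                continue;

            bool match = true;
            for (int i = 0; i < preamble.Length; i++)
            {
                if (data[i] != preamble[i]) { match = false; break; }
            }
            if (match)
                return Labels[c];
        }
        return null;
    }

    public static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x3C, 0x3F, 0x78 };   // BOM followed by "<?x"
        byte[] clean = Encoding.ASCII.GetBytes("<?xml");

        // Show the leading bytes the way a hex editor would, then the verdict.
        Console.WriteLine(BitConverter.ToString(withBom) + " : " + (DetectBom(withBom) ?? "no BOM"));
        Console.WriteLine(BitConverter.ToString(clean) + " : " + (DetectBom(clean) ?? "no BOM"));
    }
}
```

Running a check like this against the bytes you are about to post (or against a captured request body) makes the BOM visible without relying on an editor that silently hides it.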