Rick Strahl's Weblog  


Byte Order Marks and XmlDocument Streaming to HTTP



Argh. I just got bit again by Byte Order Marks (BOMs) that get sent out over HTTP when using ASP.NET. Basically the problem is that .NET's XML writers default to UTF-8, and the default UTF-8 Encoding generates a BOM as part of the output. If that output gets sent to the Response output, the BOM gets sent along with it, which is definitely not what you want.

In case you don't know what a BOM is - it's the preamble that gets attached to the beginning of a document to identify its Encoding. There are different preambles for different Encodings; for UTF-8 it is the three bytes EF BB BF. It's meant for files stored on disk, so that when a file is opened there's a quick way for an editor to detect the file's encoding. If you look at a UTF-8 document with a hex editor (or any editor that doesn't apply the encoding) you'd see something like:

ï»¿<?xml version="1.0" encoding="utf-8"?>

If you use .NET's stock UTF-8 encoding (Encoding.UTF8), it generates this BOM prefix by default. Most XML objects in .NET use XmlTextWriter for streaming output, and they too end up with a BOM-generating UTF-8 encoding. So if you stream XML to disk the file gets its BOM, which is what you want.
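To make the preamble a little more concrete, here's a minimal sketch that prints what a few UTF-8 encoding instances report from GetPreamble(). The PreambleDemo class and Describe helper are just names I made up for illustration.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Formats an encoding's preamble as hex, or "(none)" when it emits no BOM.
    public static string Describe(Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();
        return preamble.Length == 0 ? "(none)" : BitConverter.ToString(preamble);
    }

    public static void Main()
    {
        // Encoding.UTF8 is the BOM-emitting instance most APIs hand you
        Console.WriteLine("Encoding.UTF8:           " + Describe(Encoding.UTF8));
        Console.WriteLine("new UTF8Encoding(true):  " + Describe(new UTF8Encoding(true)));
        Console.WriteLine("new UTF8Encoding(false): " + Describe(new UTF8Encoding(false)));
    }
}
```

The first two report EF-BB-BF, while the instance created with false reports no preamble at all.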

However, if you're streaming the XML to some other direct output target such as Http from ASP.NET, you definitely don't want the BOM in front - it in fact makes the XML document invalid. I just had some code that posts the content of an XmlDocument to an Http server. The following code streams the document incorrectly and ends up sending a BOM to the server:

public string PostDocument(string url, XmlDocument doc)
{
    // *** Output from Http request will be stored here and returned
    string result = null;

    MemoryStream ms = new MemoryStream(1024);

    // *** Saving straight to the stream uses XmlDocument's default writer,
    // *** which is where the BOM sneaks in
    doc.Save(ms);

    // *** Rewind the stream
    ms.Position = 0;

    // *** Http Wrapper Component
    wwHttp http = new wwHttp();

    // *** Pre-create request object so we can set properties on it
    http.CreateWebRequestObject(url);

    // *** Content will be XML
    http.ContentType = "text/xml; charset=utf-8";

    // *** Pick up the client Http headers that we need to add to the request
    Dictionary<string, string> headers = GetCreditDecisionHttpHeaders(doc);

    // *** Add all Http headers to the request
    foreach (KeyValuePair<string, string> kv in headers)
    {
        http.WebRequest.Headers.Add(kv.Key, kv.Value);
    }

    // *** apply the POST data from stream
    http.SetPostStream(ms);

    // *** And send the request
    result = http.GetUrl(url);

    // *** Return the string result (should be OK)
    return result;
}

In this code I have a previously created XML document and write it out to a MemoryStream. The stream is then used as input POST data for an HTTP request that posts the XML document to a service. The http object here is a small wrapper around HttpWebRequest; essentially it copies the stream into the RequestStream, which then posts the data to the server.

This code as shown above will generate a BOM, so the posted XML is effectively invalid. The reason is that doc.Save(ms) streams out using the default XmlTextWriter that XmlDocument uses, and so it includes the BOM from the default UTF-8 encoding.
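If you want to check this on your own documents, a rough sketch like the following inspects the first bytes that doc.Save(ms) produces. BomCheck, HasUtf8Bom and LeadingHex are hypothetical helper names, and the sample document is made up; what you see may depend on whether the document carries an XML declaration with an encoding attribute.

```csharp
using System;
using System.IO;
using System.Xml;

public static class BomCheck
{
    // True when the buffer begins with the UTF-8 BOM bytes EF BB BF.
    public static bool HasUtf8Bom(byte[] data)
    {
        return data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }

    // Hex view of the first few bytes, similar to what a hex editor shows.
    public static string LeadingHex(byte[] data, int count)
    {
        return BitConverter.ToString(data, 0, Math.Min(count, data.Length));
    }

    public static void Main()
    {
        // Made-up sample document that declares utf-8 in its XML declaration
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"utf-8\"?><doc>test</doc>");

        using (MemoryStream ms = new MemoryStream())
        {
            // Same call as in the listing above
            doc.Save(ms);

            byte[] bytes = ms.ToArray();
            Console.WriteLine("Leading bytes: " + LeadingHex(bytes, 8));
            Console.WriteLine("Has BOM:       " + HasUtf8Bom(bytes));
        }
    }
}
```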

The way to fix this is to use an explicit XmlTextWriter as an intermediary for streaming:

public string PostCreditDecision(string url, XmlDocument doc)
 {
 
    // *** Output from Http request will be stored here and returned
    string result = null;
 
    MemoryStream ms = new MemoryStream(1024);
 
    using (XmlTextWriter writer = new XmlTextWriter(ms, new UTF8Encoding(false)))
    {
        doc.Save(writer);
 
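        // *** Don't Close() the writer here - that also closes the MemoryStream
        // *** before it can be posted; the using block cleans up afterwards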
 
        // *** Rewind the stream
        ms.Position = 0;
 
        // *** Http Wrapper Component            
        wwHttp http = new wwHttp();
 
 
        // *** Pre-create request object so we can set properties on it
        http.CreateWebRequestObject(url);
 
        // *** Content will be XML
        http.ContentType = "text/xml; charset=utf-8";
 
 
        // *** Pick up the client Http headers that we need to add to the request
        Dictionary<string, string> headers = GetCreditDecisionHttpHeaders(doc);
 
        // *** Add all Http headers to the request
        foreach (KeyValuePair<string, string> kv in headers)
        {
            http.WebRequest.Headers.Add(kv.Key, kv.Value);
        }
 
        // *** apply the POST data from stream
        http.SetPostStream(ms);
 
        // *** And send the request
        result = http.GetUrl(url);
 
    }
    // *** Return the string result (should be OK)
    return result;
 }

Here the XmlTextWriter is created with an explicit UTF8Encoding. The key is:

new XmlTextWriter(ms, new UTF8Encoding(false))

Note the false parameter on UTF8Encoding which turns off the BOM generation.
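If you need this in more than one place, the same idea can be wrapped in a small helper. This is just a sketch: XmlBytes and ToUtf8WithoutBom are hypothetical names, and it returns a byte array rather than the stream used in the listing above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlBytes
{
    // Serializes the document as UTF-8 with BOM generation switched off.
    public static byte[] ToUtf8WithoutBom(XmlDocument doc)
    {
        using (MemoryStream ms = new MemoryStream())
        using (XmlTextWriter writer = new XmlTextWriter(ms, new UTF8Encoding(false)))
        {
            doc.Save(writer);

            // Make sure everything buffered in the writer reaches the stream
            writer.Flush();

            return ms.ToArray();
        }
    }

    public static void Main()
    {
        // Made-up sample document
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"utf-8\"?><doc>test</doc>");

        byte[] bytes = ToUtf8WithoutBom(doc);

        // The output should start directly with the '<' of the XML declaration
        Console.WriteLine(BitConverter.ToString(bytes, 0, Math.Min(5, bytes.Length)));
    }
}
```

The resulting bytes can then be handed to whatever Http client you're using as the POST body.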

This is a subtle issue and it's easy to miss or forget about, because BOM generation is on by default and just about all the XML writers in .NET use it. It's also not easy to see that the BOM is there, because if you render the output in a browser it gets stripped out. If you save the output to a file you're not likely to see the BOM either, even though it's there, because when you open the file in a text editor the editor applies the BOM and strips it out. Even .NET's XML objects will skip over the BOM when reading XML. But if you're interfacing with other solutions like Java there will be problems. You'll only see the BOM show up in a hex editor or if you do an Http trace.

Posted in .NET  XML  

The Voices of Reason


 

[mRg]
April 21, 2008

# re: Byte Order Marks and XmlDocument Streaming to HTTP

Thanks Rick .. I was having the same problem the other day I was curious where i was getting weird symbols at the beginning of my XML from ! :)

Marie
October 31, 2009

# re: Byte Order Marks and XmlDocument Streaming to HTTP

Hi Rick, thanks for a great post. Though, I can "understand" the garbage MS make me go through every minute of my life (BOM is just a tiny of it) I am not familiar with this http post/get stuff. So I was not sure how to apply your solution. For example, there is no overload for a XmlTextWriter(httpcontext.response.ouput, encoding) to ship back a local file on my server.
I have the same problem with the following code done from silverlight/browser side
failing on the .parse because of BOM. Do you have a solution? I do not want to save all my files in UTF8 filetype using notepad :). I already have them in unicode and they are already working/read from Flash so I don't want to break that part.
private void LoadXMLFile()
{
    WebClient xmlClient = new WebClient();
    xmlClient.DownloadStringCompleted += new DownloadStringCompletedEventHandler(XMLFileLoaded);
    xmlClient.DownloadStringAsync(new Uri("sampleXML.xml", UriKind.Relative));
}

void XMLFileLoaded(object sender, DownloadStringCompletedEventArgs e)
{
    if (e.Error == null)
    {
        string xmlData = e.Result;
        XDocument xdoc = XDocument.Parse(xmlData, LoadOptions.None);
    }
}
.
MS make me dummier by the minute and with every release,
Thanks for all the help.

West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2024