Charset Encoding in ASP.NET Response

December 23, 2007 • from Maui, Hawaii • 5 comments

I was reviewing the AJAX callback handler code in my wwHoverPanel control. A number of routines there generate JavaScript output, ranging from JSON to returned resources, and I noticed that the content type headers they send often vary slightly. When returning a content type header you typically want to do something like this:

protected override void OnLoad(EventArgs e)
{
    Response.ContentType = "text/xml";

    // ... go on with your bad self
}

"text/html" is the default, so for typical HTML output you don't need to specify a content type. If you return anything else (text/xml or application/x-javascript, for example), you have to set the content type explicitly. Depending on your encoding settings in web.config, this results in the following HTTP header:

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Sun, 23 Dec 2007 00:18:24 GMT
Content-Length: 13412

Notice that the charset is automatically appended to the content type. Where does it come from? ASP.NET appends the charset when the page is eventually written into the output stream, taking it from Response.Charset if you set that explicitly, or otherwise from the active Response.ContentEncoding. So the most reliable way to change the charset for a page is to change Response.ContentEncoding, which changes both the actual output encoding used for Response.Write() and the charset in the header:

Response.ContentType = "text/html";
Response.Write("HaÑú«");
Response.ContentEncoding = Encoding.ASCII;

Because ASCII can't represent the extended characters, each of them is output as a ? after the "Ha". The Content-Type header changes to text/html; charset=us-ascii. Note that Response.Write() output is buffered, so ContentEncoding can still be applied after some output has already been written.
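The replacement behavior isn't specific to ASP.NET; it comes from the .NET Encoding classes. Here's a minimal console sketch (assuming .NET 5 or later for top-level statements; RoundTrip is a hypothetical helper, not an ASP.NET API) that pushes the same string through ASCII and UTF-8. It also prints each encoding's WebName, which appears to be where the charset=us-ascii and charset=utf-8 labels come from when Response.Charset isn't set explicitly.

```csharp
using System;
using System.Text;

// Hypothetical helper: encode to bytes and decode back. Characters the encoding
// can't represent are replaced by its fallback, which is '?' for Encoding.ASCII.
string RoundTrip(string text, Encoding encoding) =>
    encoding.GetString(encoding.GetBytes(text));

// Same text as the ASP.NET example, written with escapes: "HaÑú«"
string input = "Ha\u00D1\u00FA\u00AB";
string asAscii = RoundTrip(input, Encoding.ASCII);
string asUtf8 = RoundTrip(input, Encoding.UTF8);

Console.WriteLine(asAscii);                // Ha???
Console.WriteLine(asUtf8 == input);        // True: UTF-8 round-trips everything
Console.WriteLine(Encoding.ASCII.WebName); // us-ascii
Console.WriteLine(Encoding.UTF8.WebName);  // utf-8
```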

The encoding defaults to the value of the responseEncoding attribute of the <globalization> element under <system.web> in web.config, and that encoding is applied to all output sent to the client unless you explicitly override Response.ContentEncoding. UTF-8 is usually the right choice because it can represent Unicode text from all over the world, so it's rare that you'd want to change this.
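For reference, a minimal web.config fragment that sets this default might look like the following. The UTF-8 values are just the common choice; requestEncoding is included only for symmetry and governs how incoming requests are decoded.

```xml
<configuration>
  <system.web>
    <!-- responseEncoding becomes the default Response.ContentEncoding -->
    <globalization requestEncoding="utf-8" responseEncoding="utf-8" />
  </system.web>
</configuration>
```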

Most of the time this is what you want, especially for HTML content: ASP.NET takes your generated output, which is Unicode, encodes it into the specified encoding and adds the matching charset. That works fine for HTML page output and for XML output.

If you want to suppress the character set, you can explicitly set it to null:

Response.ContentType = "text/html";
Response.Charset = null;

This encoding happens as part of the ASP.NET pipeline and Response processing. I have a custom handler implementation, an AJAX CallbackHandler class, that returns JSON, and it too generates charset=utf-8. Some content doesn't strictly need UTF-8 encoding, though. JSON, for example, can represent extended characters with its own Unicode escape sequences (\uXXXX), so when those are used the output is plain ASCII and UTF-8 encoding isn't strictly necessary. Leaving UTF-8 enabled doesn't hurt either, beyond a tiny bit of overhead while the encoder looks at the output and basically leaves it alone <s>.
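To make the JSON point concrete, here's a sketch of escaping non-ASCII characters as \uXXXX sequences. EscapeNonAscii is a hypothetical helper for illustration only (real JSON serializers have their own escaping options, and this one ignores quotes, backslashes and control characters); it assumes a .NET 5+ console app with top-level statements.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: escape every non-ASCII UTF-16 char as \uXXXX.
// NOTE: quotes, backslashes and control chars are NOT handled here.
string EscapeNonAscii(string value)
{
    var sb = new StringBuilder(value.Length);
    foreach (char c in value)
    {
        if (c < 128)
            sb.Append(c);                                         // plain ASCII passes through
        else
            sb.Append("\\u").Append(((int)c).ToString("x4"));     // e.g. \u00d1
    }
    return sb.ToString();
}

string json = "{\"name\":\"" + EscapeNonAscii("Ha\u00D1\u00FA") + "\"}";
Console.WriteLine(json); // {"name":"Ha\u00d1\u00fa"}

// The escaped payload is pure ASCII, so UTF-8 and ASCII yield identical bytes.
bool sameBytes = Encoding.UTF8.GetBytes(json).SequenceEqual(Encoding.ASCII.GetBytes(json));
Console.WriteLine(sameBytes); // True
```

Characters outside the Basic Multilingual Plane are stored as two UTF-16 surrogate chars, so they come out as two \uXXXX escapes, which is also valid JSON.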

Things are a little different, of course, if you send binary content. Binary content is not encoded, so the automatic encoding isn't applied. This is true when you use Response.BinaryWrite() or write directly to Response.OutputStream in some fashion. Fairly obvious. But it can get weird when you end up writing text out this way, as I had to do today.

I noticed something odd while looking at a module that provides script compression. The module intercepts requests to a certain URL pattern and generates its output based on the request: in this case compressed JavaScript from embedded resources (similar to what the MS AJAX ScriptHandler does). The output is either the raw JavaScript text (retrieved from a resource or a file) or the GZipped content, either of which is created on the fly or served from cache. Either way, the data is written out as raw bytes rather than text.

The output generated by this module, however, lacked the charset, which in this case is actually fine. But why? The module handles the request and eventually generates its output with Response.BinaryWrite() followed by Response.End():

private void SendOutput(byte[] Output, bool UseGZip)
{
    HttpResponse Response = HttpContext.Current.Response;            
    Response.ContentType = "application/x-javascript";
    Response.Charset = "utf-8";
 
    if (UseGZip) 
        Response.AppendHeader("Content-Encoding", "gzip");
 
    if (!HttpContext.Current.IsDebuggingEnabled)
    {
        Response.ExpiresAbsolute = DateTime.UtcNow.AddYears(1);
        Response.Cache.SetLastModified(DateTime.UtcNow);
        Response.Cache.SetCacheability(HttpCacheability.Public);
    }
 
    Response.BinaryWrite(Output);
    Response.End();
}

Notice that the code calls BinaryWrite() rather than Write(), and because the output is binary, ASP.NET doesn't do any encoding or content type fixup on my behalf. Duh! Encoding is applied only to text content written out with Response.Write(). So I have to explicitly make sure the Charset is set, and my code is now also responsible for properly encoding the content as UTF-8:

if (useGZip && script.Length > 8092)
    output = GZipMemory(script);  // also encodes to UTF8 before compressing
else
{
    output = Encoding.UTF8.GetBytes(script);
    useGZip = false;
}
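The post doesn't show GZipMemory() itself. A plausible shape for such a helper, assuming it simply encodes the string to UTF-8 and then runs the bytes through System.IO.Compression.GZipStream, is sketched below. GZipMemory and GUnzipToString here are hypothetical stand-ins shown in a .NET 5+ console app, not the module's actual code.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical GZipMemory(): encode to UTF-8 first (GetBytes emits no BOM),
// then gzip the resulting bytes.
byte[] GZipMemory(string text)
{
    byte[] raw = Encoding.UTF8.GetBytes(text);
    using var output = new MemoryStream();
    // Close the GZipStream before reading the buffer so the gzip footer is written.
    using (var gzip = new GZipStream(output, CompressionMode.Compress))
    {
        gzip.Write(raw, 0, raw.Length);
    }
    return output.ToArray();
}

// Reverse the process, as a browser does with Content-Encoding: gzip and charset=utf-8.
string GUnzipToString(byte[] compressed)
{
    using var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress);
    using var reader = new StreamReader(input, Encoding.UTF8);
    return reader.ReadToEnd();
}

string script = "var msg = 'Ha\u00D1\u00FA';";
byte[] gz = GZipMemory(script);
Console.WriteLine(GUnzipToString(gz) == script); // True
```

The order matters: encode first, compress second, and the charset in the header has to name the same encoding that was used for GetBytes().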

Just one of those things that's easy to miss when you deal with binary content that also happens to be text.

More Encoding Issues: Resource Encoding

Incidentally, I ran into another weird encoding issue during all of this. Initially I noticed that the data I sent back out of the module was double UTF-8 encoded, and because of the issues above I figured I had messed something up with the output encoding. After stepping through the code I found that the problem wasn't in the output but in the resource retrieval using GetManifestResourceStream(), or more precisely in the encoding of the original resource file on disk before it gets embedded. Encoding is tricky in .NET because most of the text-based readers use UTF-8 by default, which means that MOST of the time reading works correctly even if the content isn't actually UTF-8 encoded.

I used this simple code to load the script:

// *** Load the script file as a string from Resources
// *** (StreamReader without an explicit encoding defaults to UTF-8)
string script = "";
using (Stream st = resourceAssembly.GetManifestResourceStream(resource))
using (StreamReader sr = new StreamReader(st))
{
    script = sr.ReadToEnd();
}

This works fine as long as there are no extended characters, because the first 128 characters (0-127) of UTF-8 are identical to ASCII and to the lower half of the common ANSI code pages. With characters above 127, however, the input just turns into garbage (several square characters), because the default StreamReader encoding is UTF-8 and those ANSI bytes aren't valid UTF-8.
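Here's a small console sketch (assuming .NET 5+ with top-level statements) of what that garbage looks like: bytes as an ANSI code page would save them, read back with StreamReader's UTF-8 default. The byte array is a hand-made example, not one of the actual resource files.

```csharp
using System;
using System.IO;
using System.Text;

// "Hallé!" as saved in Windows-1252 / Latin-1: 'é' is the single byte 0xE9,
// which is not a valid UTF-8 sequence on its own.
byte[] ansiBytes = { 0x48, 0x61, 0x6C, 0x6C, 0xE9, 0x21 };

// StreamReader defaults to UTF-8, so the invalid byte decodes to U+FFFD,
// the replacement character that many fonts render as a box or diamond.
string text;
using (var reader = new StreamReader(new MemoryStream(ansiBytes)))
{
    text = reader.ReadToEnd();
}
Console.WriteLine(text == "Hall\uFFFD!"); // True
```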

But the content of resource files on disk may or may not be UTF-8 encoded, and in fact most of my resource files use the default encoding (Windows-1252), which is Visual Studio's default for code files including .js files. You can change a file's encoding with File | Save As, using the drop-down on the Save button (Save with Encoding...), which lets you pick the encoding to save with.

When you import resources, they are embedded as binary resources, straight from the file as is. As it is, some of my resource files are UTF-8 encoded and most use the default encoding. The problem for a generic routine is: how do you sniff the content? A few weeks back I wrote a couple of routines that sniff byte order marks for files on disk, but those don't really help with resources.
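One way around the files-only limitation might be to sniff the BOM from the stream itself, since GetManifestResourceStream() returns an ordinary Stream. SniffBom below is a hypothetical helper sketched under that assumption (console app, .NET 5+). It only recognizes the UTF-8 and UTF-16 BOMs, and a file without a BOM still gives you no answer, so you need a fallback encoding regardless.

```csharp
#nullable enable
using System;
using System.IO;
using System.Text;

// Hypothetical helper: inspect the first bytes of any stream (a file, or a
// manifest resource stream) for a byte order mark. Returns null if none found.
// Caveat: a UTF-32 LE BOM (FF FE 00 00) would be reported as UTF-16 LE here.
Encoding? SniffBom(Stream stream)
{
    var bom = new byte[3];
    int read = stream.Read(bom, 0, 3);        // memory-backed streams fill the buffer in one call
    if (stream.CanSeek) stream.Position = 0;  // rewind so the caller reads from the start

    if (read >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
        return Encoding.UTF8;
    if (read >= 2 && bom[0] == 0xFF && bom[1] == 0xFE)
        return Encoding.Unicode;              // UTF-16 little endian
    if (read >= 2 && bom[0] == 0xFE && bom[1] == 0xFF)
        return Encoding.BigEndianUnicode;     // UTF-16 big endian
    return null;
}

var resourceLike = new MemoryStream(new byte[] { 0xEF, 0xBB, 0xBF, 0x41 });
Encoding? detected = SniffBom(resourceLike);
Console.WriteLine(detected?.WebName ?? "no BOM"); // utf-8
```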

After some experimenting I found that the following actually works both with default encoding and UTF8 encoding on files:

StreamReader sr = new StreamReader(st,Encoding.Default);

In theory I wouldn't expect this to work: it should only work for resources embedded with Windows-1252, but sure enough even the UTF-8 resources work. <shrug> I guess I shouldn't complain, eh? This solves my problem nicely.
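As jeremy simmons points out in the comments, the likely explanation is that the StreamReader(Stream, Encoding) overload still detects byte order marks, so a resource that starts with a UTF-8 BOM is read as UTF-8 whatever encoding you pass in. That presumes the UTF-8 resources were saved with a BOM (Visual Studio's "UTF-8 with signature"). The sketch below checks the theory in a console app (.NET 5+ assumed), using ISO-8859-1 as a stand-in for Windows-1252, because code page 1252 isn't available on .NET Core without registering CodePagesEncodingProvider.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for Encoding.Default on .NET Framework (the ANSI code page, typically
// 1252 on Western Windows). ISO-8859-1 matches 1252 for 'é' (0xE9) and ships with every runtime.
Encoding fallback = Encoding.GetEncoding("iso-8859-1");

string Read(byte[] bytes, Encoding encoding)
{
    // This overload passes detectEncodingFromByteOrderMarks = true internally,
    // so a BOM, if present, overrides the encoding argument.
    using var reader = new StreamReader(new MemoryStream(bytes), encoding);
    return reader.ReadToEnd();
}

byte[] ansi      = { 0x48, 0x61, 0x6C, 0x6C, 0xE9 };                         // "Hallé", Latin-1
byte[] utf8Bom   = { 0xEF, 0xBB, 0xBF, 0x48, 0x61, 0x6C, 0x6C, 0xC3, 0xA9 }; // "Hallé", UTF-8 + BOM
byte[] utf8NoBom = { 0x48, 0x61, 0x6C, 0x6C, 0xC3, 0xA9 };                   // "Hallé", UTF-8, no BOM

string a = Read(ansi, fallback);      // "Hallé": no BOM, so the fallback encoding applies
string b = Read(utf8Bom, fallback);   // "Hallé": BOM detected, decoded as UTF-8
string c = Read(utf8NoBom, fallback); // "HallÃ©": no BOM, UTF-8 bytes misread as Latin-1
Console.WriteLine(a == b);            // True
```

The last case is the catch: a UTF-8 file saved without a BOM would still be misread with Encoding.Default, and "HallÃ©" is exactly the double-encoded look described above.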

The Voices of Reason

syuko
December 24, 2007

# re: Charset Encoding in ASP.NET Response

Can you write a post about AJAX with JSON? Using the jQuery framework would be better.

Brian
September 17, 2010

# re: Charset Encoding in ASP.NET Response

Hello Rick, you've contributed to my knowledge several times.

Here's my conundrum...

If I don't have this line on the top of a very busy page with several datalists I get compile errors that characters are not understood.

I'm trying to add the facebook button via XFBML. Asynchronous will alleviate the loading problems I'm having waiting for FB to load.

I have another page that uses this: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> for the FB plugin and it works great.

But put this on my main page and it will not compile.

Brian
September 17, 2010

# re: Charset Encoding in ASP.NET Response

Sorry somehow my comment posted without finishing...

So here's the JavaScript to make this work:

<div id="fb-root"></div>
<script>
  window.fbAsyncInit = function() {
    FB.init({appId: 'your app id', status: true, cookie: true,
             xfbml: true});
  };
  (function() {
    var e = document.createElement('script'); e.async = true;
    e.src = document.location.protocol +
      '//connect.facebook.net/en_US/all.js';
    document.getElementById('fb-root').appendChild(e);
  }());
</script>

Is there a way to embed the utf-8 into this snippet so the rest of the page doesn't crash? Thanks Rick.

Robby Hawaii
May 31, 2012

# re: Charset Encoding in ASP.NET Response

Rick you saved my ass you son of a bastard! Thank you.

I was pulling my hair out over an issue with BinaryWrite sending back garbage. Seems there was a Telerik RadCompression module messing with the content type (setting it to Zip).

Regardless, I was in pain for a long time, and when I found this article it gave me the information about what was going on that allowed me to debug it.

Thank you for publishing this. Also did you play the role of two face in the last batman movie?

jeremy simmons
January 28, 2022

# re: Charset Encoding in ASP.NET Response

new StreamReader(st, Encoding.Default); is defined as

public StreamReader(Stream stream, Encoding encoding) : this(stream, encoding, true, DefaultBufferSize, false)

which calls

public StreamReader(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks, int bufferSize, bool leaveOpen)

Even though you specified an encoding, detectEncodingFromByteOrderMarks is passed as true internally, so you get the right behavior.

Rick Strahl's Weblog