protected override void OnLoad(EventArgs e)
{
    base.OnLoad(e);
    Response.ContentType = "text/xml";
    // ... go on with your bad self
}
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Date: Sun, 23 Dec 2007 00:18:24 GMT
Notice that the charset is automatically appended to the content type. Where does this come from? ASP.NET appends the charset when the page is eventually written into the output stream, reading the charset information from the active Response.ContentEncoding, or from Response.Charset if you explicitly specified one. So the most reliable way to change the charset for the page is to change Response.ContentEncoding, which changes the actual output encoding for Response.Write() as well as the header.
Response.ContentType = "text/html";
Response.ContentEncoding = Encoding.ASCII;
This results in Ha?? output for the two extended characters because ASCII can't represent them. The ContentType header is changed to text/html; charset=us-ascii. Note that Response.Write() output is buffered, so ContentEncoding can be applied even after some output has already been written.
The encoding typically defaults to what's specified on the &lt;globalization&gt; element (the responseEncoding attribute) in web.config, and that encoding is applied to the output sent to the client. This is always true unless you override Response.ContentEncoding explicitly. Typically UTF-8 is the right choice because it works reliably with Unicode text from all over the world. It's rare that you'd want to change this.
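For reference, the setting lives on the standard &lt;globalization&gt; element in web.config, shown here with common values (requestEncoding covers inbound form and query string data):

```xml
<configuration>
  <system.web>
    <!-- responseEncoding feeds Response.ContentEncoding and thus the charset header -->
    <globalization requestEncoding="utf-8" responseEncoding="utf-8" />
  </system.web>
</configuration>
```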
Most of the time this is what you want, especially with HTML content. ASP.NET takes your generated output, which is Unicode, encodes it into the specified Encoding and adds the charset. No problem for HTML page output, or even XML output.
If you want to suppress the character set you can explicitly set it to null:
Response.ContentType = "text/html";
Response.Charset = null;
This encoding occurs as part of the ASP.NET pipeline and Response processing. So, I have a custom handler implementation - an Ajax CallbackHandler class - that returns JSON, and it too generates charset=utf-8. Now some content shouldn't need to be UTF-8 encoded. JSON, for example, can use its own internal encoding for extended characters via Unicode escape sequences, so UTF-8 encoding strictly shouldn't be necessary. However, leaving UTF-8 enabled isn't going to hurt either, except for a tiny bit of overhead of looking at the output and basically leaving it alone &lt;s&gt;.
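To make the JSON point concrete, here's a minimal sketch (not the handler's actual code; the string value is made up) of how extended characters can travel as a pure 7-bit ASCII JSON literal via \u escape sequences:

```csharp
string raw = "Grüße";
StringBuilder sb = new StringBuilder("\"");
foreach (char c in raw)
{
    if (c > 127)
        sb.AppendFormat("\\u{0:x4}", (int)c);  // ü becomes \u00fc, ß becomes \u00df
    else
        sb.Append(c);
}
sb.Append('"');
// The resulting literal contains only ASCII characters,
// so it survives any single-byte charset unharmed
```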
Things are a little different if you send binary content, of course. Binary content is not encoded, so the automatic encoding isn't applied. This is true when you use BinaryWrite() or write directly to Response.OutputStream in some fashion. Fairly obvious. But it can get weird when you end up writing text out this way, as I had to do today.
Now the output generated from this module, however, is lacking the charset - which in effect is fine. But why? The code basically handles this event and then eventually generates its output with a Response.BinaryWrite() followed by a Response.End():
private void SendOutput(byte[] output, bool useGZip)
{
    HttpResponse Response = HttpContext.Current.Response;
    Response.Charset = "utf-8";
    Response.ExpiresAbsolute = DateTime.UtcNow.AddYears(1);
    ...
    Response.BinaryWrite(output);
    Response.End();
}
Notice that in the code I call BinaryWrite() rather than Write(), and because of this binary output ASP.NET doesn't do any encoding or content type fixup on my behalf. Duh! Encoding is applied only to text content written out with Response.Write(). So I have to explicitly ensure that the Charset is set. Additionally, my code is now responsible for properly encoding the content as UTF-8.
if (useGZip && script.Length > 8092)
    output = GZipMemory(script); // also encodes to UTF-8 before compressing
else
{
    output = Encoding.UTF8.GetBytes(script);
    useGZip = false;
}
Just one of those things that are possibly easy to miss if you deal with binary content that also happens to be text.
More Encoding Issues: Resource Encoding
Incidentally, I ran into another weird encoding issue during all of this - initially I noticed that the data I sent back out of the module was double UTF-8 encoded, and because of the issues above I figured I had something messed up with the output encoding. After stepping through the code I found that the problem wasn't in the output though, but rather in the resource retrieval using GetManifestResourceStream(). The problem was the encoding of the original resource file on disk before it gets embedded. Encoding is tricky in .NET because most of the text-based readers use UTF-8 by default, which means MOST of the time they work correctly with most content, even if the content read is in fact not UTF-8 encoded.
I used this simple code to load the script:
// *** Load the script file as a string from Resources
string script = "";
using (Stream st = resourceAssembly.GetManifestResourceStream(resource))
{
    StreamReader sr = new StreamReader(st);
    script = sr.ReadToEnd();
}
This works fine most of the time as long as there are no extended characters, because UTF-8's first 128 characters match the ASCII set. However, with characters above the ASCII range the input just becomes garbage (several square characters), since the default StreamReader encoding is UTF-8.
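You can reproduce the failure mode in isolation - a small sketch (the string and variable names are made up):

```csharp
// Windows-1252 bytes for "Häagen" - the ä is the single byte 0xE4
byte[] bytes = Encoding.GetEncoding(1252).GetBytes("Häagen");

using (MemoryStream ms = new MemoryStream(bytes))
using (StreamReader reader = new StreamReader(ms))  // no encoding passed: defaults to UTF-8
{
    // 0xE4 isn't a valid UTF-8 sequence in this position, so the
    // decoder substitutes the replacement character instead of ä
    string text = reader.ReadToEnd();
}
```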
But the file content of resources loaded from disk may or may not be UTF-8 encoded, and in fact all my resource files are using the default encoding (Windows-1252), which is Visual Studio's file default for code files including JS files. You can change that via Save As, using the special options on the Save button dropdown, which allows applying an encoding:
When you import resources, they are imported as binary resources directly from the file as is. As it happens, I have some of these resource files that are UTF-8 encoded and most that are using the default encoding (with the latter being the more common). The problem for a generic routine is: how do you sniff the content? A few weeks back I wrote a couple of routines that help with byte order mark sniffing for files on disk, but this doesn't really help with resources.
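For what it's worth, a BOM check along those lines is straightforward (a sketch; the routines mentioned above aren't reproduced here):

```csharp
// Picks an encoding from a leading byte order mark, falling back to a default
static Encoding DetectEncoding(byte[] buffer, Encoding fallback)
{
    if (buffer.Length >= 3 && buffer[0] == 0xEF && buffer[1] == 0xBB && buffer[2] == 0xBF)
        return Encoding.UTF8;              // UTF-8 BOM
    if (buffer.Length >= 2 && buffer[0] == 0xFF && buffer[1] == 0xFE)
        return Encoding.Unicode;           // UTF-16 little endian BOM
    if (buffer.Length >= 2 && buffer[0] == 0xFE && buffer[1] == 0xFF)
        return Encoding.BigEndianUnicode;  // UTF-16 big endian BOM
    return fallback;                       // no BOM - use the caller's best guess
}
```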
After some experimenting I found that the following actually works both with default encoding and UTF8 encoding on files:
StreamReader sr = new StreamReader(st,Encoding.Default);
In theory I wouldn't expect this to work - it should only work on those resources embedded with Windows 1252, but sure enough even the UTF-8 resources work. <shrug> I guess I shouldn't complain, eh? This solves my problem nicely.
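The likely explanation: the StreamReader(Stream, Encoding) constructor enables byte order mark detection by default, and Visual Studio saves UTF-8 files with a BOM. When a BOM is present it overrides the Encoding.Default you passed in; when it isn't, Encoding.Default applies. A quick sketch to verify:

```csharp
// UTF-8 bytes prefixed with a BOM, as Visual Studio's UTF-8 save option writes them
byte[] utf8WithBom = Encoding.UTF8.GetPreamble()
                             .Concat(Encoding.UTF8.GetBytes("Grüße"))
                             .ToArray();

using (MemoryStream ms = new MemoryStream(utf8WithBom))
using (StreamReader sr = new StreamReader(ms, Encoding.Default))
{
    // BOM detection kicks in and the text decodes as UTF-8,
    // even though Encoding.Default (Windows-1252) was specified
    string text = sr.ReadToEnd();  // "Grüße"
}
```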