Rick Strahl's Weblog  


HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file



Ok, I feel like an idiot, but I’ve been experimenting with this for an hour now and I cannot for the life of me figure out how to get the encoding right when running output from an ASP.NET page into a file. Well, no, I can get it to work by explicitly setting the encoding to Windows-1252, but this is not really what I want…

Here’s the setup. I’m using the ASP.NET runtime inside of a desktop app. The ProcessRequest method needs to pass a TextWriter to the ASP.NET runtime, which then renders its output into it. This is the same TextWriter that Page.Render writes into, for example.

All works well, except I can’t get the output written to file to look correct. What I want is to get the output written in UTF-8. So I thought I could use:

 

TextWriter Output;

try
{
    // *** Note you have to write the right 'codepage'. If you use the default UTF-8
    // *** everything will be double encoded.
    Output = new StreamWriter(this.OutputFile, false, Encoding.UTF8);
}
catch (Exception ex)
{
    this.Error = true;
    this.ErrorMessage = ex.Message;
    return false;
}

// *** Reset the Response settings
this.ResponseHeaders = null;
this.Cookies = null;
this.ResponseStatusCode = 200;

wwWorkerRequest Request = new wwWorkerRequest(Page, QueryString, Output);
if (this.Context != null)
    Request.Context = this.Context;

Request.PostData = this.PostData;
Request.PostContentType = this.PostContentType;
Request.RequestHeaders = this.RequestHeaders;
Request.PhysicalPath = this.PhysicalDirectory;

try
{
    HttpRuntime.ProcessRequest(Request);
}
catch (Exception ex)
{
    Output.Close();
    this.ResponseStatusCode = 500;
    this.ErrorMessage = ex.Message;
    this.Error = true;
    return false;
}

Output.Close();

this.ResponseHeaders = Request.ResponseHeaders;
this.ResponseStatusCode = Request.ResponseStatusCode;

// *** Capture the Cookies that were set by the server
this.Cookies = Request.Cookies;

if (Request.Context != null)
    this.Context = Request.Context;

return true;

 

The ASP.NET application is set up to encode to UTF-8 in Web.config:

 <globalization requestEncoding="utf-8" responseEncoding="utf-8" />

So, what happens? Output gets generated, but it actually gets double encoded. I have a string like this embedded in the HTML the ASPX page renders:

 

¢ª

 

After running the code above I get (raw output):

 

¢ª

 

Which is some funky double encoded wanna-be UTF-8 output of the above characters.

 

Next, I thought: Ok, so we’re double encoding, let’s try Encoding.ASCII on the stream. But that gives me invalid characters (??????), so that’s no good either. Using Encoding.Default produces yet another result:

 

¢ª

 

which is just plain garbage.

 

I did manage to get this to work by using Encoding.Default (Windows-1252, basically) and then also setting web.config to use Windows-1252 for its encoding, but this is not really what I want. Using a specific encoding gets me through, but it's not a good generic solution. Certainly UTF-8 would be a better choice.

I don’t really understand what I should be passing in for a TextWriter here when I need to dump to file. Why is this double encoding occurring when I use Encoding.UTF8 on the stream? It seems what I need is a raw binary stream into which the encoding TextWriter writes. But then I’m still missing the byte order mark too…

What am I missing here? How do I set up my stream and TextWriter to get ASP.NET to write my output to file as properly encoded UTF-8, including the UTF-8 preamble and properly encoded extended characters?
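One way this kind of double encoding can arise, sketched here as an assumption rather than a diagnosis of the hosting code: the response is already UTF-8 bytes, something decodes those bytes with a single-byte code page, and the StreamWriter then encodes the resulting text as UTF-8 a second time. The class and method names below are hypothetical, and ISO-8859-1 stands in for Windows-1252 so the sketch runs without extra code page providers.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class DoubleEncodingDemo
{
    // Encodes text as UTF-8, misreads those bytes as a single-byte
    // code page, then encodes the misread text as UTF-8 again.
    public static byte[] DoubleEncode(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

        // Assumption: ISO-8859-1 stands in for Windows-1252. The two
        // agree on the byte values involved for ¢ and ª.
        Encoding singleByte = Encoding.GetEncoding("iso-8859-1");
        string misread = singleByte.GetString(utf8Bytes);

        return Encoding.UTF8.GetBytes(misread);
    }

    // Formats bytes as hex pairs separated by dashes, e.g. "C2-A2".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }
}
```

Calling `DoubleEncodingDemo.ToHex(DoubleEncodingDemo.DoubleEncode("\u00A2\u00AA"))` yields `C3-82-C2-A2-C3-82-C2-AA`, eight bytes where correct UTF-8 for ¢ª would be the four bytes `C2-A2-C2-AA`.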


The Voices of Reason


 

Diego Mijelshon
July 26, 2005

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

I had a similar issue writing a Unicode CSV file on the fly from an ASP.NET app.
This is how I fixed it:

Response.ContentEncoding = Encoding.Unicode;
Response.BinaryWrite(Encoding.Unicode.GetPreamble());

And after that, just regular Response.Write for the content.
I guess it will work exactly the same using Encoding.UTF8.
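For the file case, a small sketch of how the preamble behaves with a StreamWriter may help. It writes into a MemoryStream instead of a file so the bytes can be inspected; `PreambleDemo` and `WriteToBytes` are hypothetical names, and nothing here touches the ASP.NET Response object.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class PreambleDemo
{
    // Writes text through a StreamWriter into memory and returns the raw
    // bytes, including any preamble (byte order mark) the encoding emits.
    public static byte[] WriteToBytes(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            }
            // ToArray still works after the stream has been closed.
            return stream.ToArray();
        }
    }
}
```

With `Encoding.UTF8` the result for "A" starts with the bytes EF BB BF followed by 41; with `new UTF8Encoding(false)` it is just 41; with `Encoding.Unicode` it is FF FE 41 00. So a StreamWriter over a fresh stream emits the preamble on its own, while `Response.BinaryWrite(encoding.GetPreamble())` is the manual equivalent for a response.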

Rick Strahl
July 26, 2005

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

What about the file stream used to write out to file? What encoding is used on that?

I have no control over the code inside of the ASP.NET application since this is the hosting wrapper around the ASP.NET runtime.

I played around with using Unicode for the encoding in both the file stream and web.config, but this produced even more funky results with the extended characters.

A further oddity: if I create the stream and write out a preamble, ASP.NET kills any of the existing stream output and rewrites the stream, even if I explicitly flushed the output to disk.

WTF are they doing with the stream? This should be trivial.


dominic turner
November 16, 2005

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Found this, which told me a bit more about code pages:

http://www.informit.com/guides/content.asp?g=dotnet&seqNum=163&rl=1

When I used System.Text.Encoding.Default, it wrote out characters with a code above 127 the way I expected (in fact this is what you wrote at the beginning of your article; the default must be the code page your system is set to, i.e. 1252).

Is it the case that you have written this file out in UTF-8, but whatever is reading it thinks it is ASCII and therefore shows the weird characters? Doesn't UTF-8 encode a special character as 2 bytes, which would look like 2 chars in an ASCII file?

My thinking is that the "preamble" tells whatever is reading the file that what follows is UTF-8, and therefore the 2-char thing is one special char. I am writing out HTML pages; maybe there is a setting I can put in the header to say it is UTF-8 and it would then be fine. There definitely is one for XML.

To be honest I was very frustrated by this weird char thing and have stopped now that I can at least get what I want using the "default" encoding value. Is this to do with code pages having been superseded by Unicode, meaning we get weird problems when we try to swap between the two?

dominic turner
November 16, 2005

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Yeah - I tried this mate - I think the simple fact is that you ARE writing out UTF-8

It's just that you are reading it with Notepad, so it looks horrible.

I was writing out web pages using the streamwriter (that I had read in using HttpWebRequest)

Surprise surprise I got these garbage characters.

If I added the following meta tag to the html

<meta http-equiv="content-type" content="text/html; charset=utf-8">

then when i opened them in a browser they looked fine. In notepad the pesky chars were still there.

The problem seems to me that Microsoft has not been very consistent - since I was writing out ASP.NET pages that I had written. The default should be for ASP.NET pages to be in UTF-8 with this meta tag, then the default for streamwriter being UTF-8 would make sense.

Anyhow you probably are not worried about this anymore but i bet many people are stumbling over the exact same problem (my google searches tend to confirm this).

Mr. Magoo
November 25, 2005

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Here is the thing!(Next 2 posts)

Mr. Magoo
November 25, 2005

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

public bool SaveHtmlToFile(string strURL, string strFilename)
{
try
{
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(strURL);
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
Stream receiveStream = myHttpWebResponse.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
StreamReader readStream = new StreamReader(receiveStream, encode);
System.IO.FileStream fileoutput;
Char[] read = new Char[256];
Byte[] readb = new Byte[256];
int count = readStream.Read(read, 0, 256);

fileoutput = System.IO.File.Create(strFilename);

while (count > 0)
{
int l = 0;
for (int i = 0; i < count; i++)
{
if (read[i] <= (int)Byte.MaxValue && read[i] >= (int)Byte.MinValue)
{
readb[l] = Convert.ToByte(read[i]);
l++;
}

Mr. Magoo
November 25, 2005

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

else
{
string struni;
Char[] readuni;
Byte[] readbuni;

fileoutput.Write(readb, 0, l);

// Cast to int: Convert.ToInt16 throws OverflowException for chars above 0x7FFF.
struni = "&#" + ((int)read[i]).ToString() + ";";
readuni = new Char[struni.Length];
readbuni = new Byte[struni.Length];
readuni = struni.ToCharArray();
for (int p = 0; p < struni.Length; p++)
readbuni[p] = Convert.ToByte(readuni[p]);
fileoutput.Write(readbuni, 0, struni.Length);
l = 0;
}
}
fileoutput.Write(readb, 0, l);

count = readStream.Read(read, 0, 256);
}
fileoutput.Close();
myHttpWebResponse.Close();
readStream.Close();
return (true);
}
catch(Exception ex)
{
return (false);
}
}/*SaveHtmlToFile*/
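The same idea, replacing characters outside a safe range with numeric character references so the encoding of the file no longer matters, can be sketched more compactly on a string. This is a variation and not the code above: the names are hypothetical, the cutoff is ASCII (127) instead of 255, and surrogate pairs are combined into a single code point.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper, for illustration only.
public static class HtmlEntityEncoder
{
    // Returns a pure-ASCII string in which every non-ASCII character is
    // replaced by a decimal numeric character reference such as &#162;.
    public static string ToAsciiWithEntities(string text)
    {
        var result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint;
            if (char.IsSurrogatePair(text, i))
            {
                // Combine the two UTF-16 code units into one code point
                // and skip the low surrogate.
                codePoint = char.ConvertToUtf32(text, i);
                i++;
            }
            else
            {
                codePoint = text[i];
            }

            if (codePoint < 128)
            {
                result.Append((char)codePoint);
            }
            else
            {
                result.Append("&#");
                result.Append(codePoint.ToString(CultureInfo.InvariantCulture));
                result.Append(';');
            }
        }
        return result.ToString();
    }
}
```

For example, `HtmlEntityEncoder.ToAsciiWithEntities("a\u00A2")` returns `a&#162;`. The output can be written with any ASCII-compatible encoding, at the cost of larger and less readable HTML source.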

Utunga
August 16, 2006

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Thanks very much Diego.. your comment is what fixed this problem for me.. thought I'd mention that.. I now have..


Response.ContentType = "text/xml";

// since the XML is based on a XmlStringWriter its based on a string == unicode == UTF-16

Response.ContentEncoding = Encoding.Unicode;


// for some reason, the following is also needed - you would think it was set by default.. but it ain't
Response.BinaryWrite(Encoding.Unicode.GetPreamble());

// --- elsewhere I have.. ----

StringBuilder output = new StringBuilder();
XmlTextWriter writer = new XmlTextWriter(new StringWriter(output));

writer.WriteStartDocument();
writer.WriteStartElement("ajax-response");

.. etc..

TsS
June 05, 2007

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Thank you a lot Rick for the great posts.
How did you solve this issue with encoding?

wloescher
June 29, 2007

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Yup...it's that extra "Encoding.Unicode.GetPreamble()" bit that fixed it for me. The Response encoding defaults to UTF-8, but StringBuilder and XmlWriter default to UTF-16 (i.e. Unicode).

Here's my code snippet on the page called by my AJAX code:

protected void Page_Load(object sender, EventArgs e)
{
            Response.ContentType = "text/xml";
            Response.ContentEncoding = Encoding.Unicode;
            Response.BinaryWrite(Encoding.Unicode.GetPreamble()); // <-- this is the key!!
            Response.Write(GetXmlResults(items));
}

private static String GetXmlResults(Object[] items)
{
            // Create the XML settings object
            XmlWriterSettings xmlSettings = new XmlWriterSettings();
            xmlSettings.OmitXmlDeclaration = false;
            xmlSettings.Indent = true;

            // Create an XmlWriter that writes into a StringBuilder (UTF-16 text).
            StringBuilder sb = new StringBuilder();
            using (XmlWriter xw = XmlWriter.Create(sb, xmlSettings))
            {
                // Write the XML declaration node.
                xw.WriteStartDocument(true);

                // NOTE: Object has no Name/Value members; substitute your own item type.
                // NOTE: a document allows only one root element, so wrap the items
                // in a root element if there can be more than one.
                foreach (Object item in items)
                {
                     WriteXmlItems(xw, item.Name, item.Value);
                }

                // End the document; the using block flushes and closes the writer.
                xw.WriteEndDocument();
            }

            return sb.ToString();
}

// The writer is passed in explicitly; it is not in scope here otherwise.
private static void WriteXmlItems(XmlWriter xw, String name, String value)
{
            xw.WriteStartElement("item");
            xw.WriteElementString("name", name);
            xw.WriteElementString("value", value);
            xw.WriteEndElement(); // item
}
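The UTF-16 default mentioned above shows up in the XML declaration itself, which is why the response encoding has to match. A minimal sketch, with hypothetical class names, that makes this visible and shows one common alternative: a StringWriter subclass that reports UTF-8, so the declaration matches a UTF-8 response instead.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter always reports UTF-16,
// this subclass reports UTF-8 instead.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

// Hypothetical helper, for illustration only.
public static class XmlDeclarationDemo
{
    // Writes a tiny document into the given TextWriter and returns the text.
    // The encoding named in the XML declaration comes from target.Encoding.
    public static string BuildWith(TextWriter target)
    {
        using (XmlWriter xw = XmlWriter.Create(target))
        {
            xw.WriteStartDocument();
            xw.WriteStartElement("item");
            xw.WriteEndElement();
            xw.WriteEndDocument();
        }
        return target.ToString();
    }
}
```

`XmlDeclarationDemo.BuildWith(new StringWriter())` produces a declaration with `encoding="utf-16"`, while `XmlDeclarationDemo.BuildWith(new Utf8StringWriter())` produces `encoding="utf-8"`. Either approach can work; what matters is that the declaration, the response encoding, and any preamble all agree.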

Simon McEnlly
August 20, 2008

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Hi Rick,

I'm not sure if you still have this problem. I ran into it today as well. Slightly different solution to others mentioned in response.

I assume that your wwWorkerRequest class inherits from System.Web.Hosting.SimpleWorkerRequest?

If so, you'll need to override the SendResponseFromMemory() method and write to your StreamWriter in UTF8.

e.g.

public override void SendResponseFromMemory(byte[] data, int length)
{
    // Decode only the first 'length' bytes; the buffer may be larger than the data.
    _output.Write(System.Text.Encoding.UTF8.GetString(data, 0, length));
}

Matt Burnell
August 20, 2008

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Simon forgot to mention (or deemed it too obvious to post) that of course SimpleWorkerRequest._output is private, so you’ll need to declare your own TextWriter and assign to it the TextWriter passed in in the constructor(s). Simon obviously used _output as the name for this TextWriter, too.

This solution stopped HttpRuntime.ProcessRequest() mangling Japanese characters in my page.
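One caveat with decoding each buffer on its own: if the runtime ever hands over the response in several chunks (an assumption; nothing in this thread says how the bytes are split), a multi-byte UTF-8 sequence could straddle two calls and decode to replacement characters. A stateful Decoder avoids that. The sketch below leaves out System.Web so it can run anywhere; `Utf8ChunkWriter` is a hypothetical name, and its Write method is what an overridden SendResponseFromMemory would delegate to.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public sealed class Utf8ChunkWriter
{
    private readonly TextWriter _output;

    // The decoder remembers incomplete byte sequences between calls.
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();

    public Utf8ChunkWriter(TextWriter output)
    {
        _output = output;
    }

    // Decodes the first 'length' bytes of 'data' and writes the resulting
    // characters; trailing partial sequences are held for the next call.
    public void Write(byte[] data, int length)
    {
        char[] chars = new char[_decoder.GetCharCount(data, 0, length)];
        int written = _decoder.GetChars(data, 0, length, chars, 0);
        _output.Write(chars, 0, written);
    }
}
```

Feeding the bytes 41 C2 and then A2 42 through one `Utf8ChunkWriter` writes "A¢B", whereas calling `Encoding.UTF8.GetString` on each chunk separately would emit a replacement character for each half of the split ¢.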

rgvlee
April 08, 2010

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

I had an issue today with special characters popping up in an HTML stream sent to Word. Someone had entered a body of text into our web app. This text was drafted in Word or an equivalent, copied, and then pasted into a text field in the web app. It was then saved into the DB.

When the record was printed, the Word special characters appeared as garbage. The 2 lines mentioned above:

Response.ContentEncoding = Encoding.Unicode;
Response.BinaryWrite(Encoding.Unicode.GetPreamble());

fixed this issue. Thank you very much!

Lee

Dmitri
August 19, 2010

# re: HtmlTextWriters, Encoding, ASP.NET and writing Encoded output to file

Got this issue trying to save a grid to excel using htmlwriter (using german letters and euro signs).

Response.ContentEncoding = Encoding.Unicode; 
Response.BinaryWrite(Encoding.Unicode.GetPreamble()); 


Saved my day!
Thanks!

West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2025