Rick Strahl's Weblog  


Help with working with multiple Unicode Locales in a VFP ANSI application



I’ve been struggling with a pretty nasty problem in Visual FoxPro: a Web application that must support multiple double-byte/high Unicode languages from a single application. Visual FoxPro, of course, is not Unicode aware and must deal with all strings as ANSI. Worse, VFP really has no conversion format to hold Unicode strings without potentially losing the encoding information.

 

I can make my app work with one Language at a time, but not mixing languages (Chinese, Korean, Russian and a few others).

 

Here’s the scenario:

 

Data is stored in SQL Server, and if I run some simple tests in ASP.NET to enter and store Unicode text in all of the languages, I can easily store the data in an NTEXT field in SQL Server. To make this easy I created 4 records with a few languages. With ASP.NET all of this is completely automatic: you simply do a Request.Form, read the data and insert it into SQL Server using standard ADO.NET SQL syntax or DataAdapter.Update commands.

 

Now in VFP I pull this data back out using SQLExec. If I run VFP in its default English mode doing:

 

loSQL = CREATEOBJECT("wwSQL")
loSQL.Connect("driver={sql server};database=westwindadmin;server=(local)")
loSQL.Execute([select nShapeValueId,sLongDescription  from tbShapeValues],"TShapes")

 

I get the Unicode data back merely as a bunch of ‘???? ???’ strings.

 

Now I can mess around with Windows’ language settings to get back at least one of these. Basically you can configure exactly one Unicode to double-byte ANSI translation for the machine. This can be done in Regional Settings:

 

 

 

When I do this I can get Korean data translated properly into VFP. It also causes some of the Chinese and the Russian text to be translated, but everything except the Korean text has problems.

 

 

 

The last two entries are actual content captured from VFP and then stored into the database.

 

So, the short point here is that I can easily capture a single language, but how do I capture multiple languages in this scenario? I can’t figure out how to do this…

 

The issue (I think) is that VFP cannot properly translate strings if the above regional setting is not in place. This causes problems all the way through this application. Not only can you not display the data, but the data permanently loses its encoding information. Once the data comes down to VFP into a string that has ‘?’ in place of a character, that character is lost for good.

 

While you can use STRCONV() to handle conversions, STRCONV() relies on the fact that the string returned is valid in the first place. So you can’t pick up one of the ???? field values and run it through STRCONV() to get a valid value. It would have been useful if you could get a byte representation of these characters, but even using ASC() to return the values only returns 63 (‘?’) for the ‘missing’ characters.
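
This loss is not specific to VFP: any conversion from Unicode into a single ANSI code page behaves this way. As a rough illustration in Python (not VFP), assuming Python's `cp1252` and `cp949` codecs as stand-ins for the Windows Western and Korean ANSI code pages, and using a made-up sample string:

```python
# Sketch: converting Unicode text to one ANSI code page is lossy.
# cp1252 and cp949 stand in for the Windows Western and Korean code pages.
korean = "한국어"  # hypothetical sample value, not from the original data

# With a Western code page every Korean character is replaced by '?'.
as_western = korean.encode("cp1252", errors="replace")
print(as_western)        # the bytes are literal question marks
print(list(as_western))  # every byte is 63, the value ASC() reports for '?'

# Decoding again cannot bring the characters back: the information is gone.
print(as_western.decode("cp1252") == korean)  # False

# With the matching (Korean) code page the round trip is lossless.
as_korean = korean.encode("cp949")
print(as_korean.decode("cp949") == korean)    # True
```

The point of the sketch is that the `?` bytes carry no trace of the original characters, so no later conversion call can repair the string.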

 

It gets worse when you’re starting out with strings in the first place. The above data is to be read and written out to a Web application. To make this work, the output needs to be encoded in UTF-8 or another encoding that a Web browser can understand.

 

So with the result above I can write out the data in West Wind Web Connection with something like this:

 

loSQL = CREATEOBJECT("wwSQL")
loSQL.Connect("driver={sql server};database=westwindadmin;server=(local)")
loSQL.Execute([select nShapeValueId,CAST( sLongDescription as nVarChar(174) ) as sLongDescription from tbShapeValues],"TShapes")

LOCAL loSC as wwShowCursor
loSC = CREATEOBJECT("wwShowCursor")
loSC.ShowCursor()
pcCursorText = loSC.GetOutput()

*** Run ExpandTemplate to a String so we can encode it
lcResult = Response.ExpandTemplate(Request.GetPhysicalPath(),"NONE",,.t.)

*** UTF-8 Encode
lcEncodedResult = STRCONV(lcResult,9)

*** Create a custom header
LOCAL loHeader as wwHttpHeader
loHeader = CREATEOBJECT("wwHTTPHeader")
loHeader.SetProtocol()
loHeader.Setcontenttype("text/html; charset=utf-8")
loHeader.AddHeader("Content-Length",TRANSFORM(LEN(lcEncodedResult)))

Response.Write( loHeader.Getoutput() )

*** Write it out
Response.Write(lcEncodedResult)

 

This works to generate:

 

 

The UTF-8 encoding of the text makes it possible for the foreign text to display correctly and the browser to automatically show in generic Unicode (UTF-8) display mode.
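
One detail in the code above is worth spelling out: Content-Length has to be the size of the encoded output in bytes, not the number of characters. A minimal Python sketch of the same response layout, where `build_response` and the sample page are hypothetical names used only for illustration:

```python
def build_response(html: str) -> bytes:
    """Return a complete HTTP response with a UTF-8 encoded body."""
    body = html.encode("utf-8")
    header = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"  # length in bytes, not characters
        "\r\n"
    ).encode("ascii")
    return header + body


page = "<p>한국어 Русский</p>"  # hypothetical mixed-language content
response = build_response(page)

# The character count and the byte count differ once non-ASCII text is present.
print(len(page), len(page.encode("utf-8")))
```

Because the text stays Unicode until the single `encode("utf-8")` call, Korean and Russian can share one page here; the VFP version can only do that for characters that survive the ANSI string it starts from.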

 

But obviously there’s still the problem of the other languages not displaying.

 

The other end of this is data entry. Given that the page was generated with UTF-8 encoding, the result of a POST operation (the textbox filled in and the Save button clicked) is also UTF-8 and URL-encoded. To get the data out you can use:

 

IF Request.IsPostBack()
   *** Read the raw input data
   pcSavedDescription = Request.Form("txtDescription")

   *** Convert the data from UTF-8 into ANSI string
   pcSavedDescription = STRCONV(pcSavedDescription,11)

   *** Insert Captured value back to SQL Server
   *** (the values for nLanguageId, nCorpId and nShapeId and the
   ***  closing parenthesis are not shown in this snippet)
   loSql.Execute(;
      [insert into tbShapeValues (sLongDescription,nLanguageId,] +;
      [nCorpId,nShapeId) values ] +;
      [(cast( N'] + STRTRAN(pcSavedDescription,"'","''") + [' as nText)] )
ENDIF

 

 

This too works for Korean, but only with the installed Unicode->ANSI locale (Korea). For the other locales some things work and others don’t. For some reason the Russian actually works, while the Chinese has a number of characters that work but not others (there's apparently some overlap). Polish and Spanish miss a few accented characters. For Polish and other European charsets, overriding the Content Type for the page with a specific character set (Windows 1252 for example) will make it work, but I suspect this can cause problems with others that don't use the same encoding.
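
The overlap is probably down to what the Korean code page happens to contain: besides Hangul it carries Cyrillic letters and a set of Chinese characters (Hanja), but few accented Latin letters. A small Python sketch of this, assuming Python's `cp949` codec as a stand-in for the Windows Korean ANSI code page (Windows' own conversion may apply best-fit substitutions that differ) and using made-up sample strings:

```python
def survives(text: str, codepage: str) -> str:
    """Convert to the code page and back, as an ANSI-only system would."""
    return text.encode(codepage, errors="replace").decode(codepage)


# Hypothetical samples, one per language mentioned above.
samples = {
    "Korean": "한국어",
    "Russian": "Привет",
    "Chinese character also used as Korean Hanja": "中",
    "Chinese character that is simplified-only": "这",
    "Polish": "ą",
    "Spanish": "ñ",
}

for label, text in samples.items():
    print(f"{label}: {text!r} -> {survives(text, 'cp949')!r}")
```

Characters that come back as `?` are the ones the Korean code page cannot represent, which matches the pattern of Russian working, Chinese partly working and some accented characters dropping out.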

 

What sucks about this is that STRCONV() supports LOCALEID or CODEPAGE parameters, but they have no real effect on the data coming back when converting from UTF-8. The reason for this, I think, is that once you do a conversion from UTF-8 into ANSI or Unicode, VFP stores the result to a string and immediately loses any locale-specific encoding other than the one that is configured. At that point your text is lost and you get the ???? data that shows up in the browser and browse window screen shots above.

 

I’ve been messing with this stuff for a couple of days now and I cannot find a way to do this, so if anybody has any ideas on how to deal with multiple ANSI/Unicode locales in a single application, heck, even on a single machine, I would love some feedback.


The Voices of Reason


 

Kevin Pirkl
December 09, 2004

# Yeah I think I might be able to help you but it's a lot of muck...

In parts because the post is 2 big.

I'm working at Intel contracting and building localization support for SQL, ASP.NET, multi-language web page displays and exports to Excel.

Now the next stuff I'm going to tell you is not what I did to get around the problem but it might help with the font display in VFP that you are having. Later I give some better information for MultiFont web page display.

From working with Intel's Localization team: they always gave us back Excel spreadsheets with multi-language sets. I did not even think about how they splatted multiple languages into Excel cells till I exported one to an HTML page to look at its display and found out that they were using the fonts below for language display, and then I started to see some light.

Language -------------- Font-Family
English - Arial
Chinese: Simplified - SimSun
Chinese: Traditional - PMingLiU
Czech - Arial
French - Arial
German - Arial
Hungarian - Arial
Italian - Arial
Japanese - MS UI Gothic
Korean - New Gulim
Spanish: America - Arial
Spanish:Spain - Arial
Polish - Arial
Portuguese: Brazil - Arial
Russian - Arial
Turkish - Arial

Ok, so some fonts support multiple languages and they are encoded into different parts of the Font-Family (At this point the lights are not on much, but Im still stirring the swill around.)

Seems ugly to me to have to change font every time I change to a different language that I want to display. I really didn't like this option and I dug up the following information. This article is DaBomb for explaining where languages reside per font and such (http://www.alanwood.net/unicode/fontsbyrange.html) Also if you plunder that site you will find even more information about stuff you probably don't care about but poke around anyway. (Now the lights are getting brighter...)

Kevin Pirkl
December 09, 2004

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

So I know that if fonts are by range then they should be able to be encoded. It struck me that if fonts can be by range then perhaps I could HTMLEncode them and all would display well in a web page, but NOOOOOOOO! NOOOOO! It doesn't. I have guessed that the way Windows works with the hex-encoded crap and font display in a browser is to just know where to look for the corresponding font by range, but don't bother with HTMLEncode: it only works with characters up to 255. Why only those characters would encode I don't really know, but I needed a way of extending the encoding and I dug up this URL (http://users.bigpond.com/conceptdevelopment/Localization/HtmlEncode/). <-- Wow, someone with the same problem and a VIABLE SOLUTION. This site carries some extended encoder code that "will 'encode' upper-range/extended ASCII characters beyond (eg. decimal 128 - 255) as Html." (http://users.bigpond.com/conceptdevelopment/Localization/HtmlEncode/code3.html). Use it and then the web page will display all the languages correctly if all the fonts are installed. At this point the tables can export to MS Excel as well, which is nice to point out.
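
The linked encoder code is not reproduced here. As a rough sketch of the same idea in Python (an assumption about what such an extended encoder does, not the code from that site), every non-ASCII character can be written as a numeric character reference so the page itself stays plain ASCII:

```python
import html


def encode_as_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


encoded = encode_as_char_refs("Korean: 한")  # hypothetical sample text
print(encoded)                 # Korean: &#54620;
print(html.unescape(encoded))  # a browser-style decode restores the character
```

Note that this sketch leaves markup characters such as `<` and `&` alone, so it would be applied in addition to normal HTML escaping rather than instead of it.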

Well to some degree I am probably talking out my ass but WTH if you don't put it out there then you will never figure anything out. Thats how I think it works but heck I really have not delved into the details that much because it fixed our problem anyhow, it just works

This article might help with understanding WTH is going on with Encoding, Code Pages and Encoding Web Pages (http://www.microsoft.com/globaldev/getWR/steps/wrg_codepage.mspx) or may just confuse you more.

Well dats my bit for now. Hope it helps some.

Kevin Pirkl

Rick Strahl
December 09, 2004

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

Thanks for the info Kevin.

Couple of things. I have no problem displaying the data properly. If I have the raw data I can UTF-8 encode and it will in fact display correctly in the browser (assuming the right languages and character sets are installed). In fact, doing this with ASP.NET which also uses UTF-8 by default all of this is completely transparent.

As long as you bank on Unicode displaying in the browser and the charsets are installed it should just work in the browser. If necessary you can add a specific Locale suffix in addition to the encoding.

My issue at hand is how to get the data from an incoming Web page into something that VFP doesn't screw up with 'just giving up'. Inside of VFP I don't even need to see the data so I would actually prefer to get the data in binary format and pass it through the system that way. This actually works in getting the data out of SQL:

loSQL.Execute([select nShapeValueId,CAST( CAST( sLongDescription as nVarChar(150) ) as VarBinary(150)) as sLongDescription from tbShapeValues],"TShapes")

By casting to binary the data stays free of VFP's encoding. I haven't figured out how to get this data back into SQL Server. VFP doesn't support parameterized queries with Unicode data (ie. using the N' prefix). If need be ADO might do this though.

The other problem I have now that is more serious is that if I get data from a Web page at some point it will end up in a string and VFP will screw up at that point. I can get UTF-8 encoded input, but if I use STRCONV() it will give me a screwed up string. Even going UTF-8 to Unicode doesn't work as far as I can tell.

Gabe
December 10, 2004

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

Well, as you probably know, ASP.NET has no problem handling unicode text and interacting with SQL server to store said text. Would it be worth your while or possible to write the language handling part of the application in ASP.NET?

Kevin Pirkl
December 10, 2004

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

Yeah you may be stuck doing a majority of your work through ADO. Not much luck finding VFP localization Good News, as I would call it, with a couple of hours of GOOGLE'ing it. I plastered 16 languages into one file and used SQLExec to retrieve it and see exactly what you're saying on the ???? field garbage. Yup, ADO may be your only saving grace in this case. You may have already looked at this article from your screen prints above. By Margaret Duddy: http://www.stevenblack.com/INTLUsingAsianCharacters.asp from (http://www.stevenblack.com/) where there is a lot of stuff and links on VFP Internationalization. Margaret Duddy reiterates correct Windows font selection for character set/language display like the list I provided above (URL references 2.) ADO returned strings and correct font selections might be your best answer.

For the web page display I still like the extended HTML encode function and the fact that it lets Windows find the correct font to display from for you.

Anyhow cheers.

Rick Strahl
December 10, 2004

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

Actually with a little help from Steve I was able to get it all to work yesterday. The key is to push everything through the system in binary until display time. This works fine for Web interfaces, but in a Windows interface it's not possible to display more than a single codepage at a time in VFP - anytime you assign to string your page output gets screwed.

I'll post an article for what I found in a few days...
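
This is not the author's code, but as a rough sketch of what "binary until display time" can look like, the Python below never converts to an ANSI string. It assumes incoming form values are URL-encoded UTF-8, that SQL Server's nvarchar data cast to varbinary is UTF-16LE, and that a `0x...` hex literal wrapped in a T-SQL `CAST(... AS nvarchar(n))` is read back as that same UTF-16LE text. All function names are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def form_value_to_utf16(raw_value: str) -> bytes:
    """URL-decode a form value and transcode UTF-8 -> UTF-16LE, skipping ANSI."""
    utf8_bytes = unquote_to_bytes(raw_value.replace("+", " "))
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


def to_sql_hex_literal(utf16_bytes: bytes) -> str:
    """Format raw bytes as a hex literal that can be embedded in a SQL statement."""
    return "0x" + utf16_bytes.hex().upper()


def varbinary_to_utf8(utf16_bytes: bytes) -> bytes:
    """Transcode bytes read from a varbinary column into UTF-8 for the browser."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


# Inbound: the posted value for a single Korean character.
utf16 = form_value_to_utf16("%ED%95%9C")
print(to_sql_hex_literal(utf16))

# Outbound: bytes from the database become UTF-8 only at display time.
print(varbinary_to_utf8(utf16))
```

Since the hex literal contains only ASCII characters, it passes through an ANSI-only layer untouched, and no quote escaping is needed for the value.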

Rick Strahl
January 18, 2005

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

There's a full article on Unicode use in Visual FoxPro online now at:

http://www.west-wind.com/presentations/foxunicode/foxunicode.asp

tony
February 17, 2005

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

I meet a very strange thing, when I save a file with some Chinese Characters in asp format, the browser can display it correctly. But if I save it in aspx format, the browser can not display it correctly. All the pages are in utf-8 encoded. can anyone help me out?

Jenny
April 02, 2005

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

Hi, Tony:
I met the same problem. It can be solved by adding "N" in front of the field contents. such as: update tableName set col=N'fieldContents' where ...(if type of col is nvarchar or ntext...

Vinayak
June 17, 2005

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

Hi its all ok,
but I am using ADO for retrieving from database, but it is giving me same.
Can u give me some simple examples in VB, for rectrieving, adding, and updating these data into the database?

Can u please send details to my mail id
vinayak.khavasi@in.bosch.com.

regards
Vinayak.


superedge.net
August 22, 2005

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

this is chinese. let's see how it displays here :)
?: ???? ?: ????


VFP 愛用者社區
October 08, 2006

# VFP 愛用者社區 :: View Article - Using Unicode in Visual FoxPro (repost)


Elvir
January 08, 2010

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

I think that this language issue is trouble for many programers around the World, and computer language company made so weak support and solutions, like whole World is still on ASCII 255 characters. Shame. I like FoxPro, all version, now on VFP9 testing, but multibyte conversion is so clumsy developed and supported.
Our national letters as follows:
A B C Č Ć D Đ E F G H I J K L LJ M N NJ O P Q R S Š T U V Z Ž X Y

MARLO
October 18, 2014

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

hi is anybody can help me regarding the russian characters for example in the entry form or if you upload to foxpro 9 dbf files it became scrats ?????...
I already change the settings of my computer to russian but the same...if you type in foxpro dbf file russian character p the system will change it to b i dont understand why....

Rick Strahl
October 20, 2014

# re: Help with working with multiple Unicode Locales in a VFP ANSI application

@Marlo - the Codepage has to be the same both on the input end and the output end. So the database has to be in the right code page, the client that created the record had to be using the same codepage and the user on the other end reading the data has to use the same codepage. If anywhere in there the codepage doesn't match you get the ???? character translation issue.
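
To illustrate the mismatch, here is a small Python sketch (not VFP), assuming `cp1251` and `cp1252` as stand-ins for the Windows Cyrillic and Western code pages and a made-up sample word:

```python
russian = "Привет"  # hypothetical sample text

# The writer stores the text using the Cyrillic code page.
stored = russian.encode("cp1251")

# A reader using the same code page sees the original text.
print(stored.decode("cp1251"))  # Привет

# A reader using a different code page sees other characters for the same bytes.
print(stored.decode("cp1252"))  # Ïðèâåò

# A writer whose code page has no Cyrillic at all stores only question marks.
print(russian.encode("cp1252", errors="replace"))  # b'??????'
```

The second case is recoverable by re-reading the bytes with the right code page; the third is not, because the original characters were never stored.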

West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2024