Integrating Word Spell Checking for Text and HTML

January 28, 2005 • 6 comments


After experimenting with various modes of Word integration, mainly to provide spell checking in Help Builder, I decided to provide a simpler route to spell checking in addition to the full-fledged Word insertion I talked about a couple of days ago.

Doing a spell check in Word is pretty simple if you have plain text to deal with: it only takes a few lines of code to automate Word and pop up the Spell Checking dialog interactively. If you keep Word invisible, the Word dialog pops up right on top of your application like this:

In the image above, Help Builder is performing a spell check on the body content, which is in HTML edit mode, meaning the content is not actually text but HTML. Spell checking an HTML document is not a trivial task, and the function below includes some logic to handle HTML spell checking as well.

************************************************************************
* wwUtils :: SpellCheck
****************************************
*** Function: Uses Word's Spell Check functionality to spell check
***           a string of text interactively
***   Assume: Requires Word 2000 and IE 4 or later COM objects
***     Pass: lcText   - the text to spell check
***           llIsHtml - if .t. the text is HTML
***   Return: Spellchecked text
************************************************************************
FUNCTION SpellCheck(lcText,llIsHtml)
LOCAL loWord, loDoc, x, y, loError, loIEDoc, lcTText, laErrors[1,2]

IF EMPTY(lcText)
   RETURN ""
ENDIF

IF !ISCOMOBJECT("Word.Application")
   RETURN lcText
ENDIF

IF !llIsHtml
   *** Plain Text - simple assign and retrieve
   loWord = CREATEOBJECT("Word.Application")
   loDoc = loWord.Documents.Add(,,1,.T.)
   loDoc.Content.Text = lcText
   loDoc.CheckSpelling()
   lcText = loDoc.Content.Text

   *** Close without saving, quit Word and release all references
   loWord.Visible = .f.
   loDoc.Close(.f.)
   loDoc = .null.
   loWord.Quit(.f.)
   loWord = .null.

   RETURN lcText
ENDIF

*** HTML - load into IE, retrieve text, replace
*** changed text
LOCAL loIE as InternetExplorer.Application
loIE = CREATEOBJECT("InternetExplorer.Application")
loIE.Navigate("about:blank")
DO WHILE loIE.Busy
   DOEVENTS
ENDDO

loIEDoc = loIE.Document
loIEDoc.Body.innerHtml = lcText
lcTText = loIEDoc.Body.innerText

*** Release the document and shut down IE so it doesn't keep running
loIEDoc = .null.
loIE.Quit()
loIE = .null.

loWord = CREATEOBJECT("Word.Application")
*loWord.WindowState= 2 && wdWindowStateMinimize
loDoc = loWord.Documents.Add(,,1,.T.)
loDoc.Content.Text = lcTText

*** Pick up all the error text: the error's Range plus its original text
x = 0
FOR EACH loError IN loDoc.SpellingErrors
   x = x + 1
   DIMENSION laErrors[X,2]
   laErrors[x,1] = loError
   laErrors[x,2] = loError.Text && Old text
ENDFOR
loError = .null.

IF x > 0
   loDoc.SpellingErrors.Item(1).CheckSpelling()
ENDIF

*** Apply each corrected word to the original HTML
FOR y = 1 TO x
   *** Use == for an exact comparison: with SET EXACT OFF (the default)
   *** # would miss corrections that only append characters
   IF !(laErrors[y,1].Text == laErrors[y,2])
      lcText = STRTRAN(lcText,laErrors[y,2],laErrors[y,1].Text)
   ENDIF
   laErrors[y,1] = .null.
ENDFOR

loWord.Visible = .f.
loDoc.Close(.f.)
loDoc= .null.
loWord.Quit(.f.)
loWord = .null.

RETURN lcText

* wwUtils :: SpellCheck

The basic spell check is pretty simple: create Word, add a document, assign the text to the Content object, spell check, then read the text back out and shut down Word. It's very important to shut Word down properly (close the document without saving, call Quit() and release all references) or you'll end up with hung references, and Word will keep running invisibly. Note also that I don't make Word visible. The Spell Checking dialog is independent of the Word document container and pops up on top of the current application. However, it's neither modal nor tied to your application, so clicking anywhere else causes the window to disappear (or rather, drop to the bottom of the window stack).

For HTML, the code uses the InternetExplorer.Application object to retrieve the innerText of the HTML document. This is the easiest way to retrieve just the text from an HTML document, although there are other ways, including manual parsing and stripping of tags. The plain text is then passed to Word for spell checking. Before the spell check runs, the code collects all the spelling errors into an array, storing each error's Range object along with its original text. After the spell check completes, it compares each range's current text with the stored original and, where they differ, replaces the original text with the corrected text in the HTML document.
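The manual alternative mentioned above, stripping tags yourself instead of going through IE, might look roughly like the following. This is a minimal sketch, assuming VFP 7 or later for STREXTRACT(); StripHtmlTags() is a hypothetical helper, not part of wwUtils. It is crude: it doesn't skip script or style content, only decodes a handful of entities, and treats any "<...>" pair as a tag. Each tag is replaced with a space so text in adjacent elements doesn't run together.

```foxpro
*** Usage: adjacent table cells stay separated by spaces
? "[" + StripHtmlTags("<td>one</td><td>two</td>") + "]"   && [ one  two ]
? StripHtmlTags("Fish &amp; chips")                        && Fish & chips

************************************************************************
* StripHtmlTags (hypothetical helper, sketch only)
****************************************
FUNCTION StripHtmlTags(lcHtml)
LOCAL lcText, lcTag

lcText = lcHtml

*** Grab the first "<...>" sequence, delimiters included (flag 4).
*** STREXTRACT() returns "" when no complete tag is left.
lcTag = STREXTRACT(lcText, "<", ">", 1, 4)
DO WHILE !EMPTY(lcTag)
   *** Replace every copy of this exact tag with a space in one pass
   lcText = STRTRAN(lcText, lcTag, " ")
   lcTag = STREXTRACT(lcText, "<", ">", 1, 4)
ENDDO

*** Decode a few common entities after stripping, &amp; last so
*** that "&amp;lt;" ends up as the literal text "&lt;"
lcText = STRTRAN(lcText, "&nbsp;", " ")
lcText = STRTRAN(lcText, "&lt;", "<")
lcText = STRTRAN(lcText, "&gt;", ">")
lcText = STRTRAN(lcText, "&quot;", CHR(34))
lcText = STRTRAN(lcText, "&amp;", "&")

RETURN lcText
ENDFUNC
```

Because the spell check results are written back into the HTML with a text replacement rather than by position, the extra spaces this introduces don't affect the replacement step.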

This works fairly well, but with the HTML portion there are a couple of issues to watch out for:

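Replacements replace all instances of the changed text, because STRTRAN() does a plain substring replace. That means you might replace text that shouldn't be changed, including matches inside longer words or inside HTML tags and attribute values. Replacing a short word like IN, for example, will cause problems.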

The innerText output is not always well-formed text. For example, text in two adjacent table cells runs together without a space. This can produce a few false positives during the spell check that don't require any action.
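One way to soften the replacement problem is to replace only whole-word matches. The following is a minimal sketch, not part of wwUtils: ReplaceWholeWord() is a hypothetical helper that treats a match as a whole word only when it isn't directly preceded or followed by a letter or digit. Like STRTRAN() it is case-sensitive, and it still knows nothing about HTML markup, so a word inside a tag or attribute value (say class="in") would still be replaced.

```foxpro
*** Usage: whole-word replacement vs. plain STRTRAN()
? ReplaceWholeWord("in Inn begin in.", "in", "on")   && on Inn begin on.
? STRTRAN("in Inn begin in.", "in", "on")            && on Inn begon on.

************************************************************************
* ReplaceWholeWord (hypothetical helper, sketch only)
****************************************
FUNCTION ReplaceWholeWord(lcText, lcOld, lcNew)
LOCAL lcResult, lnStart, lnPos, lnLen, lcBefore, lcAfter

IF EMPTY(lcOld)
   RETURN lcText
ENDIF

lcResult = ""
lnStart = 1              && first character of lcText not yet copied
lnLen = LEN(lcOld)

DO WHILE .T.
   *** Find the next match in the not yet scanned part of the text
   lnPos = AT(lcOld, SUBSTR(lcText, lnStart))
   IF lnPos = 0
      EXIT
   ENDIF
   lnPos = lnStart + lnPos - 1   && convert to a position in lcText

   *** Characters on either side of the match ("" at the string edges)
   lcBefore = IIF(lnPos > 1, SUBSTR(lcText, lnPos - 1, 1), "")
   lcAfter = SUBSTR(lcText, lnPos + lnLen, 1)

   IF ISALPHA(lcBefore) OR ISDIGIT(lcBefore) OR ;
      ISALPHA(lcAfter) OR ISDIGIT(lcAfter)
      *** Part of a longer word: copy through the first matched
      *** character and keep scanning right after it
      lcResult = lcResult + SUBSTR(lcText, lnStart, lnPos - lnStart + 1)
      lnStart = lnPos + 1
   ELSE
      *** Whole word: copy the text before it, then the replacement
      lcResult = lcResult + SUBSTR(lcText, lnStart, lnPos - lnStart) + lcNew
      lnStart = lnPos + lnLen
   ENDIF
ENDDO

*** Append whatever follows the last match
RETURN lcResult + SUBSTR(lcText, lnStart)
ENDFUNC
```

In SpellCheck, the STRTRAN() call in the replacement loop could be swapped for ReplaceWholeWord() with the same three arguments. It reduces the collateral damage from short words but doesn't eliminate it.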

It's not perfect for HTML, but it's workable, and certainly better than no spell checking at all.

The Voices of Reason

Gabe
January 28, 2005

# re: Integrating Word Spell Checking for Text and HTML

You may have answered this before, but why not use a 3rd party spell check? There are several free or open source spell check components (both COM and .NET) that work well.

Rick Strahl
January 28, 2005

# re: Integrating Word Spell Checking for Text and HTML

I haven't seen any free COM spell checkers. I did look at a few commercial ActiveX controls, but they were all pretty bad. Most of them work only with textboxes, and textboxes don't provide the support I need (tabs, selections, etc.). I'm still looking for a decent edit control...

The other issue is dictionaries. If you have a custom spell checker you need dictionaries for various languages etc. and to ship all those files takes a ton of space. Help Builder is big enough as it is...

Help Builder already uses Word for a variety of things, so using it for spell checking doesn't seem like a big issue. The nice thing is that it's an easy integration, and it's a spell checker most people are already familiar with.

Alan Sheffield
February 01, 2005

# re: Integrating Word Spell Checking for Text and HTML

We have used similar code to provide spell checking for our application, and we have encountered a problem. The user can click on VFP and send the spell check window to the back. With the Word window hidden, there is nothing to click on in the taskbar, so the only way to retrieve the spell checker is to Alt+Tab back to it. I've looked into setting the window to "always on top", but I can't see how to get a reference to the document's window, and setting the property on the Word object's window does not seem to have an effect.

a92rttt@hotmail.com
Any suggestions?

jo
February 16, 2006

# re: Integrating Word Spell Checking for Text and HTML

Can you make it modal?

Rick Strahl's Web Log
October 10, 2006

# Embedding Microsoft Word as a control into Desktop Forms - Rick Strahl's Web Log

Did you ever want to embed Word into your own applications and use it as a fancy text box for editing rich content? You can use the Web Browser Control with a Word document to display, edit and programmatically control Word documents quite easily.

Bill Totten
February 21, 2013

# re: Integrating Word Spell Checking for Text and HTML

Rick,
A) you are a great programmer and I thank you for your help, and B) you live in my dream place!

That said, is there a way to use Word to spell and grammar check realtime in a VFP (8 or 9) editbox - so that as the user types they see spelling and grammar errors?

Mahalo,
Bill

Rick Strahl's Weblog