Rick Strahl's Weblog  

RTF to HTML Conversion



I've had a really frustrating day today trying to find a solution for properly pasting code from Visual Studio and Visual FoxPro into Html Help Builder. It actually works today in the product, but the process - especially in WYSIWYG editing mode - is not intuitive and I'm guessing most people won't find the menu option for getting code properly pasted until much later. <s>

Html Help Builder supports two editing modes: Text Mode and WYSIWYG mode. Text mode is mostly plain text markup with HTML tags escaped in "extended tags" (i.e. <<b>>Text<</b>>). The reason for this is that formatting tags stay distinct from literal HTML, which makes it easy to embed actual HTML as text.
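To illustrate the extended-tag idea, here's a minimal Python sketch. It is not Help Builder's actual implementation: the function name `render_extended` and the regular expression are assumptions made up for illustration. It turns `<<tag>>` markup into real HTML tags and escapes everything else, so literal HTML in the text shows up as text.

```python
import html
import re

# Hypothetical pattern: an extended tag is <<name ...>> or <</name>>.
_EXTENDED_TAG = re.compile(r"<<(/?[A-Za-z][^<>]*)>>")

def render_extended(text):
    """Convert <<tag>> markup to real tags; escape all other text."""
    parts = []
    pos = 0
    for match in _EXTENDED_TAG.finditer(text):
        # Text between extended tags is literal, so it gets HTML-escaped.
        parts.append(html.escape(text[pos:match.start()], quote=False))
        # The extended tag itself becomes a real HTML tag.
        parts.append("<" + match.group(1) + ">")
        pos = match.end()
    parts.append(html.escape(text[pos:], quote=False))
    return "".join(parts)

print(render_extended("Use <<b>>bold<</b>> instead of <b>"))
```

Only the double-bracket form is treated as markup, so a plain `<b>` typed into the topic text survives as visible text.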

Anyway, currently code insertion works in one of two ways. In text mode you paste code as plain text and then mark it up manually by highlighting the block and picking a language. Personally I actually prefer this approach because it allows some flexibility: you can edit the code later in the editor and still get proper syntax coloring for the edited code. IOW, Html Help Builder actually performs the coloring (based on your style choices).

In text mode you can simply highlight the code and select the language:

This works great in text mode and as you can see the code isn't actually marked up - there are just <<code lang="C#">> and <</code>> tags around any code, so the code can be edited and it will still be properly syntax colored with the changes. The code itself just remains plain text.
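As a rough sketch of why this keeps code editable, the stored topic text only needs the language name and the raw code, and coloring can happen when the topic is rendered. The Python below is an assumption-laden illustration, not Help Builder's code: `CODE_BLOCK`, `extract_code_blocks`, `render_code_blocks` and `plain_colorizer` are hypothetical names, and the "colorizer" is a stand-in that only escapes the code.

```python
import html
import re

# Hypothetical pattern for <<code lang="...">> ... <</code>> blocks.
CODE_BLOCK = re.compile(r'<<code\s+lang="([^"]*)"\s*>>(.*?)<</code>>', re.DOTALL)

def extract_code_blocks(topic_text):
    """Return (language, code) pairs; the code is untouched plain text."""
    return [(m.group(1), m.group(2)) for m in CODE_BLOCK.finditer(topic_text)]

def render_code_blocks(topic_text, colorize):
    """Replace each block with the HTML returned by colorize(language, code)."""
    return CODE_BLOCK.sub(lambda m: colorize(m.group(1), m.group(2)), topic_text)

def plain_colorizer(language, code):
    # Stand-in for a real syntax colorer: escapes the code, tags the language.
    return '<pre class="%s">%s</pre>' % (
        html.escape(language),
        html.escape(code, quote=False),
    )

topic = 'Intro <<code lang="C#">>int x = a < b ? 1 : 2;<</code>> done'
print(extract_code_blocks(topic))
print(render_code_blocks(topic, plain_colorizer))
```

Because the stored text never contains the generated markup, editing the code and re-rendering always colors the current version.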

In WYSIWYG mode there's a special menu option - Paste Code or Formatted Text (Ctrl-F6) - which when activated prompts you to select a code language (or No Formatting) and then picks up the text from the clipboard and does the syntax coloring on it. In this scenario the code is actually pasted as HTML, so once pasted it's no longer editable with syntax coloring - it's just colorful text <s>.

The above approach works reasonably well, but especially for the WYSIWYG mode it would be nice to just be able to paste code from Visual Studio, Visual FoxPro and so on without any special work. The problem here is that the HTML Edit control used in Html Help Builder doesn't accept RTF formatted code very well. This is quite common - Rich HTML Edit controls in Web apps, and even desktop applications like Live Writer that use the Html Editor control, don't deal with pasted code very well either.

So for a while I've been thinking about adding better RTF to HTML conversion support to make this process a bit more transparent. I went down this road on a few occasions and have always been frustrated by a lack of decent tools.

I started with a few .NET samples and utilities. There are actually quite a few utilities aimed at cleaning up VS.NET source code for pasting into blogs. But most of the free samples have a number of problems with formatting. While most work fine with code, several of the tools I tried just choked when I threw XML out of Visual Studio or text out of Visual FoxPro (which actually has invalid RTF formatting altogether) at them. Natch, I need this to be solid and more universal than that.

So I started looking at a few commercial components which worked a lot better with a wide variety of code. While I have no problem paying for a tool I need, I hate spending a lot of money for something that's essentially a small utility function. There are several controls out there but they are around $300-$400. Is it just me or does that seem damn pricey? Worse though, not one of them is decently documented, and they all reek of a quickly thrown together component. I can't convince myself (just yet) to go down this path.

So before I get desperate I thought I'd throw out the question here: Has anybody used a component they can recommend, or does anybody have an RTF converter sitting around? COM or .NET will work, but actually COM would be the preferred route for me here.

FWIW, here's how I would like to handle the Paste operation (this is using one of the commercial controls as a sample):

   PROCEDURE HTMLTextContainerEvents_onpaste() AS LOGICAL
      LOCAL lcText, lcHtml, loRange, loEvent

      *** Check for HTML - just let it paste
      lcText = GetClipboardText("Html Format")
      IF !ISNULL(lcText)
         RETURN .T.
      ENDIF

      *** Check for RTF - we'll want to try and convert this to HTML
      lcText = GetClipboardText("Rich Text Format")
      IF !ISNULL(lcText)
         LOCAL Converter as EasyByte.Rtf2HtmlV6
         Converter = CREATEOBJECT("EasyByte.Rtf2HtmlV6")

         Converter.CleanHTML = "yes"
         Converter.Links = "yes"

         *** Doesn't work
         * Converter.XHTMLOutput = "yes"

         Converter.RTF_Text = lcText
         lcHtml = Converter.ConvertRTF()

         *** Paste the converted HTML over the current selection
         loRange = this.oDoc.Selection.CreateRange()
         IF !ISNULL(loRange)
            loRange.PasteHtml(lcHtml)
         ENDIF

         *** Cancel the default paste so the content isn't pasted twice
         loEvent = this.GetEventObject()
         loEvent.ReturnValue = .F.
         loEvent.CancelBubble = .T.
      ENDIF

      *** Plain text falls through to the default paste
      RETURN .T.
   ENDPROC

Basically there's a check to see whether the text on the clipboard is HTML. HTML works fine for pasting so it just pastes. Then there's a check for RTF, and if it is RTF the text is converted into HTML and pasted in place of the default paste. If it's not RTF then it's plain text and that just gets pasted as is.
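The decision order described above can be sketched independently of the editor control. This Python version is an illustration only: `choose_paste_action`, the dictionary that models the clipboard, and the `rtf_to_html` callable are all assumptions standing in for the real clipboard access and whichever RTF converter ends up being used.

```python
from typing import Callable, Mapping, Optional, Tuple

def choose_paste_action(
    clipboard: Mapping[str, str],
    rtf_to_html: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Decide how to handle a paste.

    Returns ("default", None) when the editor's own paste should run,
    or ("paste_html", html) when converted HTML should be pasted instead.
    """
    # 1. HTML on the clipboard pastes fine as is.
    if "HTML Format" in clipboard:
        return ("default", None)

    # 2. RTF gets converted, and the converted HTML replaces the default paste.
    rtf = clipboard.get("Rich Text Format")
    if rtf is not None:
        return ("paste_html", rtf_to_html(rtf))

    # 3. Anything else is plain text and pastes as is.
    return ("default", None)

# Fake converter for demonstration; a real one would parse the RTF.
fake_converter = lambda rtf: "<pre>converted</pre>"
print(choose_paste_action({"Rich Text Format": r"{\rtf1 x}"}, fake_converter))
```

Checking for HTML first matters because applications often put several formats on the clipboard at once, and there's no point converting RTF when an HTML version is already available.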

It actually works well and definitely is much more discoverable than fishing for the Ctrl-F6 key <g>.

Posted in Help Builder  FoxPro  

The Voices of Reason


 

Robert Dean
August 05, 2007

# re: RTF to HTML Conversion

Rick,

I faced a similar problem a while back (3yrs ago) where we needed to convert a bunch of RTF text fields into HTML in a hurry.

It was for a demo so commercial solutions were not feasible. Instead, I chose to create a quick console app that used Word 2003 to convert all fields into HTML. The app extracted the RTF content for each record from the database and saved it as a .rtf file on disk. Using the Word converter engine, it opened the document and saved it as an HTML file. The file was then opened and the content was extracted and saved into a new field.

I used Word 2003 because it had a "clean" HTML converter that removed most of the Microsoft tags and classes that usually accompany this process.

It was unorthodox and ugly, but it worked and the client was satisfied. If you are interested, I can look for the code and send it to you.

Thanks for sharing your work with the world!!!

Peter Bromberg
August 05, 2007

# re: RTF to HTML Conversion

Rick,
Have you seen CopySourceAsHtml VS.NET 2005 add-in? I believe he makes the source available:

http://www.jtleigh.com/people/colin/software/CopySourceAsHtml/

Rick Strahl
August 05, 2007

# re: RTF to HTML Conversion

@Robert - thanks, but I want to avoid a dependency on Word, plus dealing with the massive HTML that Word generates even for its "clean" version <s>...

@Peter - yup played around with that component. It works with code out of VS but doesn't do so well with text from other places. Still checking that code to see if I can tweak it to work a little better with uhm, mal-formed RTF (RTF Textbox and VFP code for example).

Aaron Fischer
August 06, 2007

# re: RTF to HTML Conversion

You can check SharpDevelop's code/webservice: http://codeconverter.sharpdevelop.net/FormatCode.aspx

Rick Strahl
August 06, 2007

# re: RTF to HTML Conversion

@Aaron - Code conversion I can do, but it's basically RTF conversion that's required. The problem is that the paste may come from other places like an RTF textbox or WordPad for example.

The other issue in that regard is that I'd need to know whether I'm dealing with code and what type of code is being pasted and that's always the issue.

I suppose one approach might be to intercept the RTF pasting and then prompt the user for the type they are posting (ie. Text, or Code). Hmmm... that may be a reasonable option actually given how crappy the components I've looked at are. Thanks - I think you may have just given me an idea that will work in the interim until (or if) I find a decent RTF conversion component.

Marty Cantwell
August 07, 2007

# re: RTF to HTML Conversion

Have you tried the Bennet-tec Alltext control lately? I've been using their ActiveX control for several years in a project. The HTML license will convert between RTF and HTML (and back) but I don't know how well. I've not used it myself...

http://www.bennet-tec.com

They do seem pricey now-a-days though...

Craig J.
May 09, 2008

# re: RTF to HTML Conversion

Did you find a resolution to this?

I'm looking around at all the same stuff and find nothing that really suits my needs.

Indra Preet Singh
October 30, 2010

# re: RTF to HTML Conversion


I know that this is a very old post,
but in keeping with the post's topic,

I just want to know whether there is any free web service available to convert an RTF file to an HTML file.

If there is, please do share it.

Help is appreciated.

West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2024