Image Problems when Importing HTML into Microsoft Word via Automation
A simple way to import an HTML file into Word is to open it in Word, copy the entire selection, and paste it into another document that contains the proper format template to export to. It looks something like this (using Visual FoxPro 8 code):
llError = .F.
TRY
   oWord = CREATE("word.application")
   oWord.VISIBLE = .F.
   DOEVENTS
CATCH
   MESSAGEBOX("Unable to load the Word COM object:" + CHR(13) + CHR(13) + ;
              MESSAGE())
   llError = .T.
ENDTRY
IF llError
   RETURN
ENDIF
*** Start by loading the HTML file
oDoc = oWord.Documents.OPEN(lcFile)
DOEVENTS
*** Select and copy the whole thing to the ClipBoard
oWord.SELECTION.WholeStory
oWord.SELECTION.COPY
oDoc.CLOSE()
*** Copy the template file
COPY FILE (THISFORM.oHelp.cProjPath + ;
"templates\msword\helpbuildertemplate.doc") TO ;
( FORCEEXT(lcFile,"doc") )
oDoc = oWord.Documents.OPEN(FORCEEXT(lcFile,"doc"))
oWord.SELECTION.Paste()
oDoc.SAVEAS(FORCEEXT( lcFile, "doc" ))
This works great with one exception: all the images in the document are treated as external to the document, i.e. linked images that must exist on disk. What I really need, though, is images that are embedded in the document.
After much shitty research through the woefully incomplete docs for Office Automation I found a way to embed ‘most’ images easily (I’ll come back to the ‘most’ part shortly as this is the reason for this rant). The following is a VBA macro I created that I call from the VFP code:
' This macro replaces external image links with
' embedded images so the document is self-contained
' This macro must be run while the images are in place
Sub ReplaceImages()
    For Each oField In ActiveDocument.Fields
        If oField.Type = wdFieldIncludePicture Then
            oField.LinkFormat.SavePictureWithDocument = True
        End If
    Next
End Sub
It took me a while to find the SavePictureWithDocument option. It determines whether images are embedded in the document or live externally as linked files.
Well, after some back and forth I found out that the above approach is simple, but causes major problems when dealing with a large document. When running the above code on small documents it works well, but with larger documents (where large refers to the images embedded) memory usage goes through the roof and Word locks up. So… back to the drawing board and some adjustments to code I originally used which is more complex but interactively removes the image and then pastes it back into the document as an embedded image:
'***************************************************************************
'*** ReplaceImages
'*****************
' This macro replaces external image links with
' embedded images so the document is self-contained
' This macro must be run while the images are in place
Sub ReplaceImages()
    ' Folder of the active document, including the trailing backslash
    lcPath = ActiveDocument.FullName
    lcPath = Left(lcPath, InStrRev(lcPath, "\"))

    For Each oField In ActiveDocument.Fields
        If oField.Type = wdFieldIncludePicture Then
            ' Extract the quoted image path from the field code
            lcText = oField.Code
            lnLoc1 = InStr(1, lcText, Chr(34))
            lnLoc2 = InStr(lnLoc1 + 1, lcText, Chr(34))
            lcCode = Mid(lcText, lnLoc1 + 1, lnLoc2 - 1 - lnLoc1)
            lcCode = Replace(lcCode, "/", "\")
            lcCode = Replace(lcCode, "\\", "\")

            ' Select the field so the embedded picture replaces it
            oField.Select

            ' Try the path as given first
            If FileExists(lcCode) Then
                Selection.InlineShapes.AddPicture FileName:=lcCode, LinkToFile:=False, SaveWithDocument:=True
                GoTo DonePath
            End If

            ' Otherwise try the path relative to the document folder
            lcCode = Replace(lcCode, "\\", "\")
            lcCode = lcPath + lcCode
            If FileExists(lcCode) Then
                Selection.InlineShapes.AddPicture FileName:=lcCode, LinkToFile:=False, SaveWithDocument:=True
            End If
DonePath:
        End If
    Next
End Sub

Private Function FileExists(ByVal FileName As String)
    Dim FileSize As Long
    On Error GoTo FileExists_Error
    FileSize = FileLen(FileName)
    FileExists = True
    GoTo FileExists_Exit
FileExists_Error:
    FileExists = False
FileExists_Exit:
    On Error GoTo 0
End Function
This code works well even on a very large document of over 500 pages. It’s still not fast, but it doesn’t seem to hurt Word resources much and shows something happening on the screen that doesn’t make the app appear locked up.
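The path handling in the macro is easy to get wrong, so here is a minimal Python sketch of just that string logic, separate from Word. It is not part of the original macro: the function names `picture_path_from_field_code` and `candidate_paths` are made up for illustration, and the sample field code text is an assumption about what an INCLUDEPICTURE field typically contains.

```python
import re
from typing import List, Optional


def picture_path_from_field_code(field_code: str) -> Optional[str]:
    """Return the first double-quoted value in a field code, normalized
    the same way the macro does it, or None if there is no quoted value."""
    match = re.search(r'"([^"]*)"', field_code)
    if match is None:
        return None
    # Forward slashes become backslashes, then doubled backslashes collapse.
    path = match.group(1).replace("/", "\\")
    return path.replace("\\\\", "\\")


def candidate_paths(field_code: str, document_full_name: str) -> List[str]:
    """Paths to probe, in the macro's order: the path as written, then the
    same path relative to the folder that contains the document."""
    path = picture_path_from_field_code(field_code)
    if path is None:
        return []
    # Keep everything up to and including the last backslash.
    doc_dir = document_full_name[: document_full_name.rfind("\\") + 1]
    return [path, doc_dir + path]
```

A caller would test each candidate for existence and embed the first one found, which is what the two FileExists checks in the macro do.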
Now to the MOST part. The FOR loop does not catch all images. The problem is that there appears to be a bug in the Fields collection parsing as it does not catch all images from the HTML. Specifically if the image is marked up with any extra attributes the image does not show up in the Fields collection. Something as simple as:
<img src="images/wwhelp.gif" align="right">
causes the image to not be included in the fields list, which bites big time. Removing the align attribute or any other extra attribute (like HSPACE) causes the image to show up fine.
I have yet to figure out how to get at those images, but in the meantime I've been removing these attributes from the default generation code in Help Builder, which works for some of the automatically generated images such as icons for headers and class/data lists.
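If the HTML cannot be changed at the source, the same workaround could be applied as a preprocessing step before Word opens the file. The following is only a sketch under assumptions: `strip_img_attrs` is a hypothetical helper, it uses a regular expression rather than a real HTML parser, and it removes only the attributes named above (align and hspace) unless told otherwise.

```python
import re
from typing import Iterable

# Matches a whole <img ...> tag, case-insensitively.
_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_img_attrs(html: str, attrs: Iterable[str] = ("align", "hspace")) -> str:
    """Remove the given attributes from every <img> tag, leaving all
    other tags and attributes untouched."""
    names = "|".join(re.escape(name) for name in attrs)
    # An attribute is: whitespace, name, '=', then a quoted or bare value.
    attr_pattern = re.compile(
        r"\s+(?:%s)\s*=\s*(?:\"[^\"]*\"|'[^']*'|[^\s>]+)" % names,
        re.IGNORECASE,
    )
    return _IMG_TAG.sub(lambda m: attr_pattern.sub("", m.group(0)), html)
```

For example, `strip_img_attrs('<img src="images/wwhelp.gif" align="right">')` returns `<img src="images/wwhelp.gif">`. Layout that depended on the removed attributes is lost, so this trades appearance for images that show up in the Fields collection.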
It always amazes me how things like this get through a testing process.
The Voices of Reason
# re: Image Problems when Importing HTML into Microsoft Word via Automation
http://weblogs.mozillazine.org/djst/archives/004866.html
The casing tip was a fantastic one - thanks Rick!
# re: Image Problems when Importing HTML into Microsoft Word via Automation
It was a lucky chance that I found this post via Google!
I'd like to share some thoughts regarding the subject.
I'm trying to convert .html to .doc via automation from outside of Word, so I'm free to use my favorite language, JavaScript. I also want to do this in the background. I couldn't get rid of Word's modal messages when simply saving the .html as a Word document: there were ugly modal messages complaining about the .css in the original .html. I finally succeeded using the famous copy/paste technique:
<script>
myWordApp.Documents.Open(myHTML)
myWordApp.Documents(myHTML).Select()
myWordApp.Selection.Copy()
myWordApp.Documents(myHTML).Close()
var newDoc=VMBVariables.myWordApp.Documents.Add()
myWordApp.Selection.Paste()
</script>
Here is how I implemented Rick's approach:
<script>
var wdFieldIncludePicture=67
with(newDoc)
{ for(var i=1;i<=Fields.Count;i++)
with(Fields.Item(i))
{ if(Type==wdFieldIncludePicture)
{ Select()
with(VMBVariables.myWordApp.Selection.InlineShapes.AddPicture(LinkFormat.SourceFullName,false,true))
ScaleWidth=ScaleHeight=300
}
}
}
</script>
Now a little bit about above code:
I've removed all that fancy path calculation, because the Field object actually has a property containing the full path to an external image: LinkFormat.SourceFullName. I rely on it completely, which is why I skip checking whether the file exists. Even if I had to check, I'd do something like the following, simply to suppress a possible run-time error:
<script>
try{
VMBVariables.myWordApp.Selection.InlineShapes.AddPicture(myPictureFileName,false,true)
}
catch(e){
}
</script>
Finally, you may wonder why I'm scaling the resulting image by 300%. I don't know! That is still an open question for me: why does Word shrink my images by about 3 times? Even funnier, the Format Picture dialog assures me that these 3-times-smaller images are 100% of the original size. And when I enlarge them, they are not distorted, i.e. Word really did SHRINK them. Needless to say, neither the Reset button nor InlineShape.Reset() helped.
That's aLL
bb!
# re: Image Problems when Importing HTML into Microsoft Word via Automation
i was facing the same problem while generating the word document with html through asp.net
thanks for your hint
thank you very much once again
# Weird Symbols when using word
When I type up my work in Microsoft Word, I email it to myself so I can continue with it when I get home. I then download the work from my email, and when I open it in Microsoft Word the work is still there, but now it contains dots (.) between each word and also a strange symbol that gets in the way, ruins the presentation and follows the cursor. Why does this happen? Could someone please help me out? Thanks very much!
# re: Image Problems when Importing HTML into Microsoft Word via Automation
Check that in Word,
under Tools -> Options -> View tab, the formatting marks are unchecked.
If they are checked you will get weird symbols representing
returns, spaces and so on.
Just see if this is the reason.
# re: immd. help plzzz... Image Problems when Importing HTML into Microsoft Word via Automation
I am embedding linked images in a doc.
The flow I followed:
Identify the shapes; if a shape is a linked picture, get its details (full file name, top, left, width, height, z-order position, i.e. the alignment details) and delete the linked picture.
Then, using these alignment properties and the AddPicture function, I try to embed the image.
The problem I'm facing: the linked picture is not identified as msoLinkedPicture but as msoAutoShape,
so I'm not able to embed the image at the linked image's position.
Also,
whatever value I give for Top in the AddPicture arguments, it places the picture at top=0. Why?
# re: Image Problems when Importing HTML into Microsoft Word via Automation
The following code (in perl) shows my solution to unlinking images. This works on my doc on Word 2002 but you may find it behaves badly on very large docs (as per the original solution).
$bit = $word->ActiveDocument->Shapes;
foreach $sec ( in $bit )
{
# $sec->ConvertToInlineShape;
$linkFormat = $sec->LinkFormat;
$linkFormat->{SavePictureWithDocument} = 1;
$linkFormat->BreakLink;
}
$bit = $word->ActiveDocument->Fields;
foreach $sec ( in $bit )
{
$linkFormat = $sec->LinkFormat;
$linkFormat->{SavePictureWithDocument} = 1;
$linkFormat->BreakLink;
}
# re: Image Problems when Importing HTML into Microsoft Word via Automation
You're probably long past caring, but have you tried simply unlinking all the fields rather than iterating through each one? The following is a VB.NET snippet. This unlinks all fields, so it may not fit all situations, but it should help boost performance.
' Iterate through each section of the document
' and unlink fields.
For Each docSection As Word.Section In .Sections
docSection.Range.Fields.Update()
docSection.Range.Fields.Unlink()
' Do same for each header/footer in section.
Next docSection
# re: Image Problems when Importing HTML into Microsoft Word via Automation
ActiveDocument.InlineShapes.AddPicture "http://www.lucenaturale.com/MercedWinter_150.jpg", Savewithdocument:=True, LinktoFile:=False
I get a 5152 "not a valid file name" message. But if I use that SAME filename directly in Word's "Insert Picture|From File" dialog, it works fine! Also, if I use VBA in Excel with the Excel technique (and the same filename), it works fine:
dim p as object
set p=ActiveSheet.Pictures.Insert("http://www.lucenaturale.com/MercedWinter_150.jpg")
This one is driving me nuts. The InlineShapes.AddPicture method works just fine for local images, even images on a local intranet referenced with a network path (//).
# re: Image Problems when Importing HTML into Microsoft Word via Automation
.Addpicture Filename:="name" etc.
# re: Image Problems when Importing HTML into Microsoft Word via Automation
wdApp.Selection.Fields.Add Range:=Selection.Range, Text:="INCLUDEPICTURE ""<URL>"" \d", PreserveFormatting:=False
# re: Image Problems when Importing HTML into Microsoft Word via Automation
Anyone?
# re: Image Problems when Importing HTML into Microsoft Word via Automation
In my tests, Word converts those images to shapes. You can loop through the shapes in the doc, look for those of type MsoShapeType.msoLinkedPicture, and then set each shape's LinkFormat.SavePictureWithDocument property to true.
Hope that helps.
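As a rough illustration of that loop, here is a Python sketch written against any object that exposes Word's Shapes collection (for example a Document obtained through COM automation). The function name `embed_linked_shapes` is hypothetical, and the numeric value 11 for msoLinkedPicture is taken from the Office MsoShapeType enumeration rather than from this thread, so treat it as an assumption to verify.

```python
MSO_LINKED_PICTURE = 11  # MsoShapeType.msoLinkedPicture (assumed value, verify)


def embed_linked_shapes(document) -> int:
    """Mark every linked-picture shape to be saved with the document.

    `document` only needs a `Shapes` iterable whose items have `Type` and
    `LinkFormat.SavePictureWithDocument`, mirroring Word's object model.
    Returns the number of shapes that were changed.
    """
    changed = 0
    for shape in document.Shapes:
        if shape.Type == MSO_LINKED_PICTURE:
            shape.LinkFormat.SavePictureWithDocument = True
            changed += 1
    return changed
```

Shapes of other types are skipped, so this would not help in the case reported earlier in the thread where a linked picture shows up as msoAutoShape.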
# re: Image Problems when Importing HTML into Microsoft Word via Automation
Call the Update method on the associated field (or all fields) before you unlink.
# re: Image Problems when Importing HTML into Microsoft Word via Automation
http://west-wind.com/weblog/posts/1178.aspx
# re: Image Problems when Importing HTML into Microsoft Word via Automation
I'm getting real annoyed..I tried switching the code, but it only gives 'image unavailable' in this random thing. Argh hard to explain. been doing this for a couple months..but I have to print something and arh!!
thx
ps: how come I was in the dormwire thing a few minutes ago...odd..
# re: Image Problems when Importing HTML into Microsoft Word via Automation
BB, I tried your code and it works great. Except, like you, I'm having the image resize issue, and I've tried the scale thing, but not working. I'm using ASP - could you write the asp equivalent of your .NET code?
<script>
var wdFieldIncludePicture=67
with(newDoc)
{ for(var i=1;i<=Fields.Count;i++)
with(Fields.Item(i))
{ if(Type==wdFieldIncludePicture)
{ Select()
with(VMBVariables.myWordApp.Selection.InlineShapes.AddPicture(LinkFormat.SourceFullName,false,true))
ScaleWidth=ScaleHeight=300
}
}
}
</script>
Thanks!
# re: Image Problems when Importing HTML into Microsoft Word via Automation
Oh, my friend, there is really nothing special about that code, so, i suppose, you can paste it in ASP directly. Okay, lemme try to spell it in VB:
<script>
...
With newDoc
For i As Integer = 1 To .Fields.Count
With .Fields.Item(i)
If .Type=wdFieldIncludePicture Then
.Select
With newDoc.Application.Selection.InlineShapes.AddPicture(.LinkFormat.SourceFullName, False, True)
.ScaleWidth=300
.ScaleHeight=300
End With
End If
End With
Next i
End With
</script>
Sorry for that confusing "VMBVariable.." -- it has nothing to the topic..
# re: Image Problems when Importing html into Microsoft Word via Automation
I am not very good at expressing myself in computerese - so, here it goes:
Until last week, I was able to import images off the net (non-copyright images) without a problem. I could right click and copy directly into MS WORD, or do a Save Picture As on the Desktop, or Save Target As.
Now I can no longer do this. NO IMAGES at all. I cannot place an image on the desktop that is viewable in WORD or in Photoshop Elements 3. All I get is "The file listed will not be imported as a true file type."
Also, when I'm in word and hit Insert and do Insert Picture From File, I get: "An Error occured while importing this file."
I am in contact with Microsoft support but so far nothing's changed. What could have happened? I mean, did I click something in MS WORD I shouldn't have - or did something bizarre just happen with WORD - I don't know.
If you understand the problem and how to fix it, could you be so kind as to babytalk through whatever procedure I need to perform to get Images again? Ever so grateful,
Thanks,
Kit
segobibi@aol.com
# re: Image Problems when Importing HTML into Microsoft Word via Automation
I am trying the code u gave in asp. but it is giving error. can u please guide how exactly this code is to be utilized and embeded in the application so that image also appear in the doc.
please help me... i am confused.
thanx alot
# re: Image Problems when Importing HTML into Microsoft Word via Automation
Set WordApp = CreateObject("word.application")
Set WordDoc = WordApp.Documents.Add()
WordDoc.InlineShapes.AddPicture "c:\inetpub\wwwroot\test\logo.gif",False,True
# re: Image Problems when Importing HTML into Microsoft Word via Automation
I have an opposite problem to solve but still is uses the image that was copied from Microsoft Word so I hope somebody can help me out coz it's driving me nuts!
My program must allow the users to copy formatted-text and images from Microsoft Word and paste it into a web page. Actually,I'm using a 3rd party tool and it does all the formatting then it generates the HTML string of the web page.
The problem is when the image is pasted, the image file is saved to a directory like the following:
file:///C:\DOCUME~1\...\LOCALS~1\Temp\msoclip1\01\imageName.gif
In reality, I have to save the image into a special folder and not into the "DOCUME~1\..." folder.
I was thinking of using "CreateHTMLDocument" to load the HTML string so I can iterate through the elements (using JavaScript), but I found out that the W3C does not recommend using it, or that it is no longer supported (or something to that effect).
SO CAN SOMEBODY HELP ME OUT ON WHAT TO DO? THANKS A LOT!
# re: Image Problems when Importing HTML into Microsoft Word via Automation
It might not paste well here but I tried this and it worked fine;
I used Word 97;
and I changed the Microsoft article at the PreserveFormatting bit as follows:
On my vba, I use 2 lines -
line 1 ends at the underscore:
Sub InsertIncludePictureField()
' NOTE: Replace <Internet Address> with a valid URL.
Selection.Fields.Add Range:=Selection.Range, Text:= _
"INCLUDEPICTURE ""http://www.lpga.com/content/photos/pp_Ammaccapane_Danielle_lg.jpg"" \d", PreserveFormatting:=False
End Sub