Rumor has it that Microsoft Office 2012 will support PDF import. There is also an add-in for Microsoft Office 2007 that lets you export and save to the PDF and XPS formats in eight 2007 Microsoft Office programs. It also lets you send files as e-mail attachments in the PDF and XPS formats in a subset of these programs.

Download 2007 Microsoft Office Add-in Microsoft Save as PDF or XPS - Microsoft Download Center - Download Details

Nuance used to have a PDF converter, but it was not very good.

[Aug 12, 2010] Sun PDF Import Extension

a free replacement for Adobe Acrobat
Linux Journal

The Sun PDF Import Extension is one of the most popular extensions ever created. For the last two years, it has been near the top of the list of most popular downloads on the Extensions site -- and no wonder, considering that it is a free replacement for Adobe Acrobat, which is currently priced at $449US. However, the extension does have some quirks and limitations that you have to learn to work around.

The first quirk you have to overcome is obtaining it. To start with, you need to be running OpenOffice.org 3.0 or higher.

That is probably not a problem for most users, but finding a usable copy of the extension may be. When you click the Get it! button on the extensions site, the link takes you to a page about Oracle Open Office, the successor to Sun Microsystems's StarOffice. This page mentions the PDF Import Extension, but provides no downloads.

To download the extension, you need to be alert when your browser switches to the page that thanks you for downloading, and choose a manual download before you can get the file.

Even then, to judge from the comments on the extensions page (and my own experience), you may have trouble using the extension after you install it from Tools -> Extension Manager. The easiest way to get the extension is to check your distribution's repository to see if it is included as a package, as in Debian.

You will know that the installation succeeded if you try to open a PDF file and it displays in Draw.

By contrast, if you get a few characters of gibberish, you need to keep searching for another way of getting the extension. You might be able to find an alternative download site with an earlier version that you can use. Don't worry if the version number is far below the 1.01 release mentioned on the extension page; the version numbers took a huge, unwarranted leap, and (so far as I can tell) a .4x version will not be much different in functionality from the 1.01 release.

Using the extension

Once you have the Sun PDF Import Extension installed, you need to know its limitations. Unfortunately, it's a mixture of good and bad news.

The good news is that the extension works extremely well with text, preserving all types of formatting including font size, bold, italics, strike-through and underlining. Fonts, too, are preserved, although their names are not always parsed correctly and may have a few additional characters at the end of them. Should the fonts not be available on your system, the extension tries to replace them with a font whose characters are metrically equivalent. The positioning of text is also maintained in all-text documents, so that a brochure that has text scattered over the page is imported as accurately as a white paper that is a solid block of paragraphs.

The extension places each line of text in a separate text frame. Each fragment of a line separated by a tab or spacing is also placed in a separate text frame. This arrangement means that you can easily correct typos, or add a few words if the line is short. Add much more, and you will throw off the line spacing in the document. You can, of course, add your own text frames, but you will have to work carefully not to interfere with the line spacing or the bottom margin -- to say nothing of moving every line carefully downwards. Still, the effort may be worthwhile if you need to edit or recover an important document.

Another problem is that true Adobe forms and graphics are not imported at all. At the most, you will have only their frames, and, at times, especially with PNG graphics, the positioning of text will be thrown off by the missing elements. In these cases, if you want to include the forms or graphics from a PDF made outside of OpenOffice.org, you will have to capture them and insert them manually into the Draw document.

If you import a PDF created within OpenOffice.org, you may be able to import forms and graphics -- providing that you set the PDF to Hybrid format when you exported the file. A Hybrid PDF combines Acrobat and Open Document formats. A PDF reader like Adobe Acrobat that cannot parse Open Document Format will simply ignore it, but, when you come to import the file into OpenOffice.org for editing, the forms and graphics will be imported along with the text. The cost of using Hybrid format is that your files will be an average of about 20% larger, but that is a relatively small price to pay for the convenience of the kludge.

Finally, when you are finished editing, remember not to save the file, but to use File -> Export to PDF instead.

[Aug 23, 2008] How to convert PDF files to HTML or XML files in openSUSE SUSE & openSUSE

Converting a PDF file into an HTML or XML file has been made easy by a small, useful utility called pdftohtml. pdftohtml is an Xpdf-based tool that can convert PDF files to HTML or XML format. It also supports encrypted files, and it handles images in the PDF file by converting them to PNG image files.
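As a rough illustration of the workflow described above, here is a minimal Python sketch that builds a pdftohtml command line and reads text fragments back from its XML output. The helper names are hypothetical. The `-xml` and `-upw` flags are the ones pdftohtml documents for XML output and for the user password of an encrypted file; the element and attribute names (`page`, `text`, `number`, `top`, `left`) are an assumption about the usual shape of the XML and may differ between versions.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET


def pdftohtml_command(pdf_path, out_base, xml=False, user_password=None):
    """Build the argument list for one pdftohtml run (hypothetical helper)."""
    cmd = ["pdftohtml"]
    if xml:
        cmd.append("-xml")  # write XML instead of HTML
    if user_password is not None:
        cmd += ["-upw", user_password]  # user password for an encrypted PDF
    cmd += [pdf_path, out_base]
    return cmd


def run_pdftohtml(pdf_path, out_base, **options):
    """Run pdftohtml if it is installed; raise RuntimeError otherwise."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml was not found on PATH")
    subprocess.run(pdftohtml_command(pdf_path, out_base, **options), check=True)


def text_fragments(xml_string):
    """Return (page number, top, left, text) for every <text> element.

    Assumes the XML has <page number="..."> elements containing
    <text top="..." left="..."> elements.
    """
    root = ET.fromstring(xml_string)
    fragments = []
    for page in root.iter("page"):
        for node in page.iter("text"):
            fragments.append((
                int(page.get("number")),
                int(node.get("top")),
                int(node.get("left")),
                "".join(node.itertext()),  # flattens inline <b>/<i> markup
            ))
    return fragments
```

Typical use would be `run_pdftohtml("paper.pdf", "paper", xml=True)` followed by `text_fragments(open("paper.xml").read())`; the output file name is an assumption about how the tool names its result.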

PDF import and hybrid PDFs as a new extension - Ninja

The extension installs as easily as any OpenOffice.org or Firefox extension. OpenOffice.org extensions cannot register file associations with the operating system (though you can set them up manually), but importing a PDF is as simple as clicking on File and then Open. The import process takes a long time compared to opening an OpenOffice.org document because of the necessary guesswork caused by the limitations of PDF.

For a test, I exported ODF_text_reference_v1_1.odt from OpenOffice.org and imported it again. When the initial screen appeared with the results, I stared at it in disbelief. It looked just like the original. The text, layout, font faces, text colors, bold, italics, underline, and picture were well preserved.

Below are the original in Writer and the imported document in Draw. Doesn't it take more than a glance to identify which is the original?

Alternative PDF import

OpenOffice.org did not pioneer PDF import, not even in the open source market. Some of the work in OpenOffice.org is done by xpdf, a PDF viewer. To import PDFs, open source alternatives include pdftohtml, Abiword, KWord, and Inkscape. There are also a host of proprietary applications.

Depending on your needs, there are other ways to import PDFs into OpenOffice.org. To import PDFs into Writer or Impress, you may be able to combine the new PDF import extension with copy and paste. If you just need to extract text, copy the text in Adobe Acrobat Reader and paste it into OpenOffice.org. This retains some formatting.

[Mar 21, 2008] PHP html2ps 2.0.41 (Stable) by Konstantin

About: html2ps is a PHP equivalent of the popular Perl script by the same name that accurately converts HTML with images, complex tables (including rowspan/colspan), layers/divs, and CSS styles to PostScript and PDF. Unlike most other HTML2PS/HTML2PDF converters, it offers good CSS 2.1 support and is very tolerant of invalid HTML. It can even convert CSS-intensive sites.

Changes: A large number of layout engine fixes and improvements were made.

[Mar 13, 2008] pisa 3.0.15 by spirito

About: pisa converts HTML to PDF using the ReportLab Toolkit, html5lib, and pyPdf. It supports HTML 5 and CSS 2.1 (and some of CSS 3). The main benefit of this tool is that a user with Web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies.

Changes: New features: barcode and a table of contents. Many bugfixes. Better CSS support.
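
To illustrate the "HTML and CSS as a PDF template" idea behind pisa, here is a minimal Python sketch. The template helper and its names are hypothetical. The conversion step assumes the later `xhtml2pdf` packaging of pisa and its `pisa.CreatePDF(src, dest=...)` call; the 3.0.x releases listed here were imported differently, so treat the import line as an assumption.

```python
from html import escape

# Plain CSS; the converter is expected to apply it when laying out the PDF.
STYLE = "h1 { font-size: 18pt; } td { border: 1px solid #999; padding: 4px; }"


def build_report_html(title, rows):
    """Render a small HTML page; rows is a list of (label, value) pairs."""
    cells = "".join(
        "<tr><td>" + escape(str(label)) + "</td><td>" + escape(str(value)) + "</td></tr>"
        for label, value in rows
    )
    return (
        "<html><head><style>" + STYLE + "</style></head><body>"
        "<h1>" + escape(title) + "</h1><table>" + cells + "</table>"
        "</body></html>"
    )


def html_to_pdf(html, pdf_path):
    """Write html to pdf_path as PDF; return True if no error was reported."""
    # Assumption: pisa installed under its later package name, xhtml2pdf.
    from xhtml2pdf import pisa

    with open(pdf_path, "wb") as out:
        status = pisa.CreatePDF(html, dest=out)
    return not status.err
```

The template is ordinary HTML, so it can be previewed in a browser before it is ever passed to `html_to_pdf`.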

[Jan 18, 2008] Project details for pdf2djvu by Jakub Wilk

pdf2djvu 0.4.2

About: pdf2djvu creates DjVu files from PDF files. It's able to extract: graphics, text layer, hyperlinks, document outline (bookmarks), and metadata.
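
A minimal Python sketch of driving pdf2djvu from a script. The helper names are hypothetical, and the `-o` (output file) and `-d` (resolution in dpi) flags are taken from the current pdf2djvu manual page; whether the 0.4.2 release listed above accepts exactly these flags is an assumption.

```python
import shutil
import subprocess


def pdf2djvu_command(pdf_path, djvu_path, dpi=None):
    """Build the argument list for one pdf2djvu run (hypothetical helper)."""
    cmd = ["pdf2djvu", "-o", djvu_path]  # -o: write a bundled DjVu file
    if dpi is not None:
        cmd += ["-d", str(dpi)]  # -d: rendering resolution in dots per inch
    cmd.append(pdf_path)
    return cmd


def run_pdf2djvu(pdf_path, djvu_path, dpi=None):
    """Run pdf2djvu if it is installed; raise RuntimeError otherwise."""
    if shutil.which("pdf2djvu") is None:
        raise RuntimeError("pdf2djvu was not found on PATH")
    subprocess.run(pdf2djvu_command(pdf_path, djvu_path, dpi), check=True)
```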

coolwanglu-pdf2htmlEX · GitHub

pdf2htmlEX renders PDF files in HTML using modern Web technologies. It aims to provide accurate rendering while staying optimized for Web display.

It is optimized for modern web browsers such as Mozilla Firefox & Google Chrome.

This program is designed for scientific papers with complicated formulas and figures, so precise rendering is the #1 concern. General PDF files are also supported.
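
A minimal Python sketch for converting a batch of papers with pdf2htmlEX. The function names are hypothetical; `--dest-dir` and `--zoom` are options listed in the pdf2htmlEX usage text, but option names have changed between releases, so check `pdf2htmlEX --help` for the version you have.

```python
import shutil
import subprocess


def pdf2htmlex_commands(pdf_paths, dest_dir, zoom=None):
    """Build one pdf2htmlEX argument list per input PDF, in sorted order."""
    commands = []
    for pdf in sorted(pdf_paths):
        cmd = ["pdf2htmlEX", "--dest-dir", str(dest_dir)]
        if zoom is not None:
            cmd += ["--zoom", str(zoom)]  # scale factor for the rendered pages
        cmd.append(str(pdf))
        commands.append(cmd)
    return commands


def convert_all(pdf_paths, dest_dir, zoom=None):
    """Run every planned command; raise if pdf2htmlEX is not installed."""
    if shutil.which("pdf2htmlEX") is None:
        raise RuntimeError("pdf2htmlEX was not found on PATH")
    for cmd in pdf2htmlex_commands(pdf_paths, dest_dir, zoom):
        subprocess.run(cmd, check=True)
```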


Not supported yet

ScanSoft - PDF Solutions - PDF Converter 3

This is a pretty revolutionary product that integrates with MS Word and provides much-needed functionality at a fraction of the cost of Adobe Acrobat. You need to have MS Word to install the product, as it is a plug-in and not a standalone program. Integration is seamless: you get an additional item, Open PDF, in the File menu.

Based on my experience, it converts fairly complex documents and presentations from PDF to MS Word 2003 with rather high quality. Conversion quality is excellent (almost perfect for text in all the documents that I tried). Presentations are also converted very well (those that I tried were rather simple, mainly text-slide PDF presentations).

I did not test it on documents with complex layout (newspaper-type documents).

You can convert from MS Word format to any format supported by MS Word, including HTML, but the generated HTML is very complex.

I also noticed that conversion of long documents is rather slow.

Accessibility Tools page -- Adobe PDF2HTML conversion page



