Converting HTML or web pages to DOCX

DOCX is the file format of Microsoft Word documents. It is the most widely used word processing file format in the world, so it is often chosen where the contents of a file need to remain editable, in contrast to a static PDF document.

One example of this flexibility might be to create a report that can be customised as required by the user.

There are libraries that can create Word documents, but they are often complex to use, because each part of the document, from titles to paragraphs, has to be defined in the library's object structure and then combined. Converting HTML or a webpage to DOCX is even more difficult. The few libraries that do this, such as phpdocx, try to do so using the CSS styles contained in the HTML passed to them. However, because this HTML is never rendered in a browser, these libraries cannot correctly interpret the information in style sheets and JavaScript files, which leads to inaccurate conversions.

GrabzIt overcomes this problem by running a custom HTML to DOCX parser in a real browser instance. Because the parser reads the true style of each HTML element as determined by the browser, it can produce Word document conversions that are as accurate as possible.

In fact, this ability to reliably convert HTML to DOCX also offers a better approach to creating Word documents than defining one element, such as a paragraph or table, at a time in code. HTML is such a widely understood language for defining documents that it is much easier to create your desired document in HTML and then pass it to GrabzIt to convert into the final Word document. It is also simple to write a bit of code that builds an HTML document, allowing the whole document creation process to be potentially automated.
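As a rough illustration of how little code this takes, the Python sketch below builds a small report as an HTML string using only the standard library. The function name build_report_html and the report layout are hypothetical examples; the output is ordinary HTML that could then be handed to an HTML to DOCX converter such as GrabzIt.

```python
from html import escape


def build_report_html(title: str, rows: list[tuple[str, float]]) -> str:
    """Hypothetical helper: build a simple HTML report with a heading,
    an intro paragraph and a table of (item, value) rows."""
    # Escape all user-supplied text so characters like & or < cannot break the markup.
    body_rows = "\n".join(
        f"    <tr><td>{escape(name)}</td><td>{value:.2f}</td></tr>"
        for name, value in rows
    )
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{escape(title)}</title></head>
<body>
  <h1>{escape(title)}</h1>
  <p>This report contains {len(rows)} item(s).</p>
  <table border="1">
    <tr><th>Item</th><th>Value</th></tr>
{body_rows}
  </table>
</body>
</html>"""


if __name__ == "__main__":
    # Example data only; in practice this might come from a database or user input.
    print(build_report_html("Quarterly Sales", [("Widgets", 1200.5), ("Gadgets & Gizmos", 830.0)]))
```

Because the document is just a string, it can be customised per user (different rows, titles or sections) before being sent off for conversion.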

Alternatively you can also pass GrabzIt the URL of a webpage that you want converted into DOCX.

Of course, GrabzIt’s HTML to DOCX API does more than simple HTML to DOCX conversions. It also offers more advanced features such as custom headers and footers, where a header and footer template is created and passed to the API as a template ID. This templating system is sophisticated and can accept custom template variables, tables, images and much more, so the header and footer can be built exactly how you want them.

Another useful feature is the ability to merge multiple Word documents, allowing you to create a book-like document of a website or other content. This works simply: the ID of a previously created document is passed to the new document being created with GrabzIt’s API, and the old document is then inserted at the beginning of the new one. Repeating this lets you chain documents together.

The features listed here are only a small sample of what is available in GrabzIt’s API; covering all of the flexible features it provides in detail would probably take another article.