These five libraries cover the essential lifecycle of working with Microsoft Word (.docx) files in a JavaScript environment. docx-preview focuses on rendering documents directly in the browser for user viewing. docxtemplater specializes in merging data into existing Word templates. mammoth prioritizes extracting clean HTML or text from documents, sacrificing complex layout fidelity for content accuracy. officegen is a legacy tool for generating Office files from scratch. Finally, jszip serves as the low-level engine that many of these libraries rely on to handle the ZIP compression inherent in the .docx format.
Working with Microsoft Word files in a web environment is a common requirement for business applications, document management systems, and reporting tools. The ecosystem offers specialized tools for different stages of the document lifecycle: viewing, templating, extraction, and generation. Let's break down how docx-preview, docxtemplater, jszip, mammoth, and officegen solve these problems differently.
docx-preview is designed to render .docx files directly in the browser using HTML and CSS. It parses the document structure and attempts to replicate the layout without requiring a server-side conversion to PDF.
import { renderAsync } from 'docx-preview';
const blob = await fetch('/document.docx').then(r => r.blob());
const container = document.getElementById('container');
// Renders the docx into the provided DOM element
await renderAsync(blob, container, null, {
className: 'docx',
inWrapper: true
});
mammoth can also display content, but it converts the document to HTML first. This is not a visual preview of the layout, but rather a content extraction.
import mammoth from 'mammoth';
const buffer = await fetch('/document.docx').then(r => r.arrayBuffer());
// Converts to HTML, ignoring complex layout fidelity
const result = await mammoth.convertToHtml({ arrayBuffer: buffer });
// mammoth does not sanitize its output, so only inject HTML from documents you trust
document.getElementById('content').innerHTML = result.value;
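The object that convertToHtml resolves to carries a messages array next to value. The sketch below shows one way to inspect it before displaying the HTML. It assumes each message has a type ('warning' or 'error') and a message string, as mammoth's README describes. The helper name splitMessages and the sample data are hypothetical.

```javascript
// Hypothetical helper: groups mammoth conversion messages by severity.
// Assumes each entry looks like { type: 'warning' | 'error', message: string }.
function splitMessages(messages = []) {
  const grouped = { warnings: [], errors: [] };
  for (const { type, message } of messages) {
    if (type === 'error') grouped.errors.push(message);
    else grouped.warnings.push(message);
  }
  return grouped;
}

// Usage sketch with a hand-written result shaped like mammoth's output.
const sampleResult = {
  value: '<p>Hello</p>',
  messages: [{ type: 'warning', message: "Unrecognised paragraph style: 'Fancy'" }],
};
const { warnings, errors } = splitMessages(sampleResult.messages);
if (errors.length === 0) {
  console.log(`Converted with ${warnings.length} warning(s)`);
}
```

Warnings about unrecognised styles are usually a hint that a style map entry is missing rather than a reason to reject the document.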
officegen, docxtemplater, and jszip do not offer rendering capabilities. They are focused on manipulation and generation.
docxtemplater is a widely used choice for merging JSON data into a Word template. It uses a syntax similar to Mustache, with tags such as {name} written inside the Word document itself.
import Docxtemplater from 'docxtemplater';
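import PizZip from 'pizzip'; // synchronous fork of jszip that docxtemplater reads from
const content = await fetch('/template.docx').then(r => r.arrayBuffer());
const zip = new PizZip(content);
const doc = new Docxtemplater(zip, {
  delimiters: { start: '{', end: '}' }, // the default tag delimiters, shown for clarity
  paragraphLoop: true // avoids stray empty paragraphs around loop tags
});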
// Replace {name} in the docx with actual data
doc.render({ name: 'John Doe', role: 'Developer' });
const outBlob = doc.getZip().generate({ type: 'blob' });
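When a template contains a malformed tag, render throws. The following is a minimal sketch of how such failures might be reported. It assumes the error shape documented by docxtemplater, where a multi-error keeps its sub-errors in error.properties.errors and each sub-error has properties.explanation. The helper names explainTemplateError and renderSafely are hypothetical.

```javascript
// Hypothetical helper: flattens a docxtemplater error into readable lines.
// Assumes sub-errors live in error.properties.errors (per docxtemplater's docs).
function explainTemplateError(error) {
  const subErrors = error && error.properties && error.properties.errors;
  if (Array.isArray(subErrors)) {
    return subErrors.map((e) => (e.properties && e.properties.explanation) || e.message);
  }
  return [error && error.message ? error.message : String(error)];
}

// Usage sketch: wrap the render call from the example above.
function renderSafely(doc, data) {
  try {
    doc.render(data);
    return { ok: true, problems: [] };
  } catch (error) {
    return { ok: false, problems: explainTemplateError(error) };
  }
}
```

Returning the explanations instead of rethrowing lets a server respond with a list of template problems that a template author can act on.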
officegen generates documents from scratch using API calls. It does not support templating in the same way; you define the structure programmatically.
const officegen = require('officegen');
const fs = require('fs');
const docx = officegen('docx');
docx.on('finalize', function (written) { console.log('Done'); });
// Define structure via code, not a template file
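// officegen builds text through paragraph objects: createP() adds a paragraph, addText() fills it
docx.createP().addText('Hello World');
docx.createP().addText('Role: Developer');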
const out = fs.createWriteStream('out.docx');
docx.generate(out);
mammoth, docx-preview, and jszip do not have built-in templating engines. jszip could theoretically be used to swap XML nodes manually, but that is fragile and not recommended.
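To see why manual XML swapping is fragile, consider that Word often stores what looks like one word as several runs. The sketch below uses simplified, hand-written WordprocessingML strings (not taken from a real file) and a hypothetical naiveReplace helper to show a plain string replacement silently missing a placeholder.

```javascript
// Naive approach: treat word/document.xml as a string and replace the tag.
// Note that it also skips XML escaping of the inserted value.
function naiveReplace(xml, tag, value) {
  return xml.split(`{${tag}}`).join(value);
}

// One placeholder stored in a single run...
const singleRun = '<w:p><w:r><w:t>Hello {name}</w:t></w:r></w:p>';
// ...and the same visible text split across two runs, which editors
// can produce after spell-checking or formatting changes.
const splitRuns =
  '<w:p><w:r><w:t>Hello {na</w:t></w:r><w:r><w:t>me}</w:t></w:r></w:p>';

console.log(naiveReplace(singleRun, 'name', 'Ada').includes('Hello Ada')); // true
console.log(naiveReplace(splitRuns, 'name', 'Ada') === splitRuns); // true: nothing was replaced
```

A templating library has to account for this kind of run splitting, which is a large part of what you gain by not doing the replacement by hand.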
mammoth shines when you need to get text or HTML out of a document. It focuses on semantic meaning rather than visual exactness.
import mammoth from 'mammoth';
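// Extract raw text (the path input is for Node.js; in the browser pass { arrayBuffer } instead)
const textResult = await mammoth.extractRawText({ path: 'document.docx' });
console.log(textResult.value);
// Extract HTML with a custom style map
const htmlResult = await mammoth.convertToHtml(
  { path: 'document.docx' },
  { styleMap: ["p[style-name='Section Title'] => h1:fresh"] }
);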
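For indexing, the raw text usually needs to be cut into units. The sketch below assumes, following mammoth's README, that extractRawText ends each paragraph with two newlines. The helper toParagraphs and the sample string are hypothetical.

```javascript
// Hypothetical helper: turns mammoth's raw-text output into paragraph records.
// Assumes paragraphs are separated by blank lines.
function toParagraphs(rawText) {
  return rawText
    .split(/\n{2,}/)
    .map((text) => text.trim())
    .filter((text) => text.length > 0)
    .map((text, index) => ({ index, text, words: text.split(/\s+/).length }));
}

// Usage sketch with a hand-written string in place of textResult.value.
const sampleText = 'Quarterly Report\n\nRevenue grew in all regions.\n\n';
console.log(toParagraphs(sampleText));
```

Each record can then be sent to a search index with its position, so a hit can be traced back to the paragraph it came from.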
docx-preview renders to DOM nodes, not a string format, so it is not suitable for data extraction.
docxtemplater is for inputting data, not extracting it, although you can inspect the XML if needed.
officegen is for creation only.
jszip is the underlying utility that handles the ZIP compression. Since .docx files are essentially ZIP archives containing XML files, jszip allows you to touch the raw internals.
import JSZip from 'jszip';
const zip = new JSZip();
// Add a file to the zip structure
zip.file('Hello.txt', 'Hello World\n');
// Generate the zip file
const content = await zip.generateAsync({ type: 'blob' });
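The same API can open an existing .docx. The sketch below reads one XML part from the package. It relies on JSZip.loadAsync, zip.file(name), and the async('string') method of the returned entry. The helper readDocxPart is hypothetical, and the JSZip class is passed in as a parameter so the snippet does not depend on a particular module system.

```javascript
// Sketch: read one XML part out of a .docx package.
// `JSZipLib` is expected to be the JSZip class (import JSZip from 'jszip').
async function readDocxPart(JSZipLib, data, partName = 'word/document.xml') {
  const zip = await JSZipLib.loadAsync(data);
  const entry = zip.file(partName); // null when the part does not exist
  if (!entry) {
    throw new Error(`Part not found in package: ${partName}`);
  }
  return entry.async('string');
}

// Usage, assuming /document.docx is reachable from the page:
// const data = await fetch('/document.docx').then((r) => r.arrayBuffer());
// const xml = await readDocxPart(JSZip, data);
```

The main body of a Word document lives in word/document.xml, which is why it is used as the default part name here.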
docxtemplater does not open the archive itself: you hand it a zip instance, typically PizZip (a synchronous fork of jszip), and call getZip() to save the result.
officegen handles ZIP generation internally, so you never touch the archive layer yourself.
It is important to note the maintenance status of these tools. officegen has been around for a long time and is considered legacy by many in the community. For new projects requiring programmatic generation, the docx library (by dolanmiu) is often recommended for better TypeScript support and active maintenance.
// Modern alternative for generation (not in comparison list but worth noting)
import { Document, Packer, Paragraph } from 'docx';
const doc = new Document({
sections: [{
properties: {},
children: [new Paragraph('Hello World')]
}]
});
const blob = await Packer.toBlob(doc);
docx-preview and docxtemplater remain actively maintained and are the go-to choices for viewing and templating respectively.
| Feature | docx-preview | docxtemplater | mammoth | officegen | jszip |
|---|---|---|---|---|---|
| Primary Goal | View in Browser | Fill Templates | Extract Content | Generate Files | ZIP Utility |
| Input | .docx Blob | .docx Template + JSON | .docx File | JSON/Config | Files/Data |
| Output | HTML DOM | .docx File | HTML/Text | .docx File | .zip/.docx |
| Layout Fidelity | High (Visual) | High (Preserved) | Low (Semantic) | N/A (New File) | N/A |
| Templating | No | Yes (Mustache-style) | No | No (Code-based) | No |
| Maintenance | Active | Active | Active | Legacy/Slow | Active |
When designing a system that handles Word documents, you will likely combine these tools rather than picking just one.
- Use docxtemplater on the server to fill templates with database data, then serve the resulting file. Use docx-preview on the client if users need to verify the document before downloading.
- Use mammoth to strip content out of legacy Word docs and import it into your CMS as HTML. Do not use docx-preview for this, as you need data, not a visual render.
- Use jszip to unzip the .docx, modify the XML, and zip it back up when no higher-level library covers the change you need.

Final Thought: Avoid using officegen for complex new development if possible, because the API is older and less flexible than modern alternatives. For viewing and templating, docx-preview and docxtemplater are robust choices that handle the heavy lifting of the OpenXML standard for you.
Choose docx-preview when you need to display .docx files directly in a web browser without a server-side conversion to PDF. It is ideal for document management systems, intranets, or any app where users need to review Word files client-side. Be aware that it replicates the layout using HTML and CSS, so complex formatting might not match Microsoft Word perfectly.
Choose docxtemplater if your workflow involves filling out pre-designed Word templates with dynamic data, such as generating invoices, contracts, or reports. It preserves the original document's styling and layout better than generating from scratch. It is the standard choice for server-side or client-side mail-merge style operations.
Choose jszip when you need low-level control over the ZIP structure of a .docx file or when building custom tooling that other libraries don't cover. Most developers will use it indirectly, since libraries such as docx-preview depend on it, but it is essential if you need to manipulate the raw XML parts inside the document package manually.
Choose mammoth when your goal is to extract the text content of a document for display on a web page or for indexing in a search engine. It is not suitable if you need to preserve exact visual layout, but it excels at producing clean, semantic HTML from Word styles. Use this for content migration or reading workflows.
Choose officegen only for legacy maintenance or simple generation tasks where modern alternatives are not an option. For new projects, consider more modern libraries like docx (by dolanmiu) as officegen has seen slower maintenance cycles. It is useful if you need to generate basic .docx, .xlsx, or .pptx files without heavy dependencies.
Docx rendering library
Demo - https://volodymyrbaydalka.github.io/docxjs/
The goal of this project is to render/convert DOCX documents into HTML documents while keeping the HTML as semantic as possible. That means the library is limited by HTML capabilities (for example, Google Docs renders *.docx documents on a canvas as an image).
npm install docx-preview
<!--lib uses jszip-->
<script src="https://unpkg.com/jszip/dist/jszip.min.js"></script>
<script src="docx-preview.min.js"></script>
<script>
var docData = <document Blob>;
docx.renderAsync(docData, document.getElementById("container"))
.then(x => console.log("docx: finished"));
</script>
<body>
...
<div id="container"></div>
...
</body>
// renders document into specified element
renderAsync(
document: Blob | ArrayBuffer | Uint8Array, // could be any type that supported by JSZip.loadAsync
bodyContainer: HTMLElement, //element to render document content
styleContainer: HTMLElement, //element to render document styles, numberings, fonts. If null, bodyContainer will be used.
options: {
className: string = "docx", //class name/prefix for default and document style classes
inWrapper: boolean = true, //enables rendering of wrapper around document content
hideWrapperOnPrint: boolean = false, //disable wrapper styles on print
ignoreWidth: boolean = false, //disables rendering width of page
ignoreHeight: boolean = false, //disables rendering height of page
ignoreFonts: boolean = false, //disables fonts rendering
breakPages: boolean = true, //enables page breaking on page breaks
ignoreLastRenderedPageBreak: boolean = true, //disables page breaking on lastRenderedPageBreak elements
experimental: boolean = false, //enables experimental features (tab stops calculation)
trimXmlDeclaration: boolean = true, //if true, xml declaration will be removed from xml documents before parsing
useBase64URL: boolean = false, //if true, images, fonts, etc. will be converted to base 64 URL, otherwise URL.createObjectURL is used
renderChanges: boolean = false, //enables experimental rendering of document changes (insertions/deletions)
renderHeaders: boolean = true, //enables headers rendering
renderFooters: boolean = true, //enables footers rendering
renderFootnotes: boolean = true, //enables footnotes rendering
renderEndnotes: boolean = true, //enables endnotes rendering
renderComments: boolean = false, //enables experimental comments rendering
renderAltChunks: boolean = true, //enables altChunks (html parts) rendering
debug: boolean = false, //enables additional logging
}): Promise<WordDocument>
/// ==== experimental / internal API ===
// this API could be used to modify document before rendering
// renderAsync = parseAsync + renderDocument
// parse document and return internal document object
parseAsync(
document: Blob | ArrayBuffer | Uint8Array,
options: Options
): Promise<WordDocument>
// render internal document object into specified container
renderDocument(
wordDocument: WordDocument,
bodyContainer: HTMLElement,
styleContainer: HTMLElement,
options: Options
): Promise<void>
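The two-step flow described above might be wired together as follows. This is only a sketch against the signatures listed here, and this part of the API is marked experimental. The api parameter stands for the docx-preview module, passed in so the snippet does not depend on how it is loaded. The function name parseThenRender and the inspect callback are hypothetical.

```javascript
// Sketch of renderAsync split into its two documented steps.
// `api` is expected to expose parseAsync and renderDocument
// (for example: import * as api from 'docx-preview').
async function parseThenRender(api, data, config) {
  const {
    container,                  // element that receives the document content
    styleContainer = container, // reuse the body container unless told otherwise
    options = {},
    inspect = () => {},         // hypothetical hook: look at the parsed document first
  } = config;

  const wordDocument = await api.parseAsync(data, options);
  await inspect(wordDocument);
  await api.renderDocument(wordDocument, container, styleContainer, options);
  return wordDocument;
}

// Usage in a page, assuming a Blob named blob and an element with id "container":
// await parseThenRender(api, blob, {
//   container: document.getElementById('container'),
//   inspect: (doc) => console.log(doc),
// });
```

Since the internal document structure may change between releases, anything done inside inspect should be treated as version-specific.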
Thumbnails are included only as an example and are not part of the library. The library renders DOCX into HTML, so it can't be used efficiently for thumbnails.
The table of contents is built using TOC fields, and there is no efficient way to get the table of contents at this point, since fields are not supported yet (http://officeopenxml.com/WPtableOfContents.php)
Currently the library breaks pages in these cases:
- <w:br w:type="page"/> is inserted, that is, when the user inserts a page break
- <w:lastRenderedPageBreak/> is inserted; this can be written by an editor application such as MS Word (ignoreLastRenderedPageBreak should be set to false)

Realtime page breaking is not implemented because it requires recalculating sizes on each insertion, and that could affect performance a lot.
If page breaking is crucial for you, I would recommend:
- using an editor, such as MS Word, that inserts <w:lastRenderedPageBreak/> break points

NOTE: by default ignoreLastRenderedPageBreak is set to true. You may need to set it to false to make the library break at <w:lastRenderedPageBreak/> break points.
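The note above could be captured in a small options helper. This is a sketch that only uses the breakPages and ignoreLastRenderedPageBreak options listed earlier. The helper name pageBreakOptions and its honorEditorBreaks flag are hypothetical.

```javascript
// Hypothetical helper: builds the page-break related options.
// honorEditorBreaks = true asks the library to also break on
// <w:lastRenderedPageBreak/> markers written by the editor.
function pageBreakOptions({ honorEditorBreaks = false } = {}) {
  return {
    breakPages: true,
    ignoreLastRenderedPageBreak: !honorEditorBreaks,
  };
}

// Usage sketch (renderAsync comes from docx-preview):
// await renderAsync(blob, container, null, pageBreakOptions({ honorEditorBreaks: true }));
console.log(pageBreakOptions({ honorEditorBreaks: true }));
```

Calling the helper without arguments reproduces the library defaults, so it can be introduced without changing how existing documents render.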
So far I haven't settled on a final approach to parsing documents or on the final structure of the API. Only the renderAsync function is stable, and its definition shouldn't change in the future. The inner implementation of parsing and rendering may change at any time.
Please do not include the contents of the ./dist folder in your PRs. Otherwise I will most likely reject them due to stability and security concerns.