html-to-docx vs html-docx-js
HTML to DOCX Conversion Libraries
Similar npm packages: html-to-docx, html-docx-js

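HTML to DOCX conversion libraries convert HTML content into Microsoft Word documents (DOCX format). They let applications export web content as editable documents, which is useful for generating reports, documents, or other text content. With these libraries, developers can turn HTML into a document format that users can easily share and edit.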

Statistics

Package        Downloads   Stars   Size     Issues   Published      License
html-to-docx   103,213     476     4.8 MB   106      3 years ago    MIT
html-docx-js   28,002      1,148   -        83       10 years ago   MIT

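Feature Comparison: html-to-docx vs html-docx-js

Feature Support

  • html-to-docx:

    html-to-docx offers broader functionality and handles complex HTML structures, including tables, images, and a variety of styles. It copes better with complex documents and suits applications that need high-quality output.

  • html-docx-js:

    html-docx-js focuses on converting basic HTML elements and styles to DOCX and suits simple documents. It supports text, paragraphs, headings, and basic styling, but its support for complex layouts is limited. Its output relies on the altChunk feature, so the generated documents may not open correctly in word processors without altChunk support, such as Google Docs and LibreOffice Writer.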

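Customization

  • html-to-docx:

    html-to-docx offers more extensive customization options, giving developers fine-grained control over styles, layout, and content structure, which is an advantage when generating complex documents. Its documentOptions argument covers page size and orientation, margins, headers and footers, fonts, page and line numbering, and document metadata.

  • html-docx-js:

    html-docx-js allows some customization of the generated DOCX document, including fonts, colors, and paragraph styles, which users can adjust from their JavaScript code.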

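Performance

  • html-to-docx:

    html-to-docx performs well on complex documents, handling large amounts of data and complex layouts, which makes it a fit for performance-sensitive applications.

  • html-docx-js:

    html-docx-js performs well on simple documents and converts them quickly, but it may run into performance bottlenecks with large amounts of complex content.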

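Community Support

  • html-to-docx:

    html-to-docx has a more active community and more documentation and examples, making it easier for developers to find solutions and get help.

  • html-docx-js:

    html-docx-js has a relatively small community with limited documentation and sample code, so meeting specific requirements may take more custom development.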

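Learning Curve

  • html-to-docx:

    html-to-docx has a relatively steep learning curve; developers need more time to understand its features and usage, especially when working with complex documents.

  • html-docx-js:

    html-docx-js has a gentler learning curve and lets beginners get started quickly, especially when generating simple documents.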

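How to Choose: html-to-docx vs html-docx-js

  • html-to-docx:

    Choose html-to-docx if you need a more full-featured library that supports more complex HTML structures and styles. It suits applications that must handle complex document formatting.

  • html-docx-js:

    Choose html-docx-js if you need a lightweight solution that converts HTML to DOCX quickly and some control over the generated document is enough. It supports basic styles and formatting but may be limited with complex layouts.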

html-to-docx README

html-to-docx


html-to-docx is a JavaScript library for converting HTML documents to the DOCX format supported by Microsoft Word 2007+, LibreOffice Writer, Google Docs, WPS Writer, and others.

It was inspired by the html-docx-js project but mitigates a problem with the documents html-docx-js generates: they are not compatible with word processors, such as Google Docs and LibreOffice Writer, that don't support the altChunk feature.

html-to-docx previously used libtidy to clean up the HTML before parsing, but it was removed because it caused many dependency issues related to node-gyp.

Disclaimer

Although html-to-docx is already running in production, it is not a complete solution, so please make sure it covers all the cases you typically encounter.

It currently doesn't work directly in the browser, but it has been tested with React.

Installation

Use npm to install html-to-docx.

npm install html-to-docx

Usage

await HTMLtoDOCX(htmlString, headerHTMLString, documentOptions, footerHTMLString)

Full-fledged examples can be found in the example/ directory.
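The four positional arguments are easy to mix up, particularly because the header and footer strings only take effect when the header and footer flags in documentOptions are enabled. The helper below, `buildHtmlToDocxArgs`, is hypothetical and not part of html-to-docx; it is a minimal sketch, assuming the parameter semantics listed under Parameters, that assembles the arguments in the documented order and turns on the matching flag whenever a header or footer string is supplied.

```javascript
// Hypothetical helper (not part of html-to-docx): assembles the four
// positional arguments of HTMLtoDOCX in the documented order.
function buildHtmlToDocxArgs(bodyHtml, { headerHtml = null, footerHtml = null, options = {} } = {}) {
  // Copy the caller's options so the original object is not mutated.
  const documentOptions = { ...options };
  // The header/footer string is only rendered when its flag is true,
  // so enable the flag whenever a string is provided.
  if (headerHtml !== null) documentOptions.header = true;
  if (footerHtml !== null) documentOptions.footer = true;
  // Order: htmlString, headerHTMLString, documentOptions, footerHTMLString.
  return [bodyHtml, headerHtml, documentOptions, footerHtml];
}

const args = buildHtmlToDocxArgs('<h1>Report</h1><p>Body text</p>', {
  footerHtml: '<p>Confidential</p>',
  options: { orientation: 'landscape', pageNumber: true },
});
console.log(args[2]); // { orientation: 'landscape', pageNumber: true, footer: true }
```

The resulting array can then be spread into the call, as in `await HTMLtoDOCX(...args)`. Setting the footer flag here also satisfies the documented requirement that page numbers only appear when the footer flag is true.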

Parameters

  • htmlString <String> clean HTML string containing the document content.
  • headerHTMLString <String> clean HTML string containing the header. Defaults to <p></p> if the header flag is true.
  • documentOptions <?Object>
    • orientation <"portrait"|"landscape"> defines the general orientation of the document. Defaults to portrait.
    • pageSize <?Object> Defaults to U.S. letter portrait orientation.
      • width <Number> width of the page for all pages in this section in TWIP. Defaults to 12240. Maximum 31680. Supports equivalent measurement in pixel, cm or inch.
      • height <Number> height of the page for all pages in this section in TWIP. Defaults to 15840. Maximum 31680. Supports equivalent measurement in pixel, cm or inch.
    • margins <?Object>
      • top <Number> distance between the top of the text margins for the main document and the top of the page for all pages in this section in TWIP. Defaults to 1440. Supports equivalent measurement in pixel, cm or inch.
      • right <Number> distance between the right edge of the page and the right edge of the text extents for this document in TWIP. Defaults to 1800. Supports equivalent measurement in pixel, cm or inch.
      • bottom <Number> distance between the bottom of text margins for the document and the bottom of the page in TWIP. Defaults to 1440. Supports equivalent measurement in pixel, cm or inch.
      • left <Number> distance between the left edge of the page and the left edge of the text extents for this document in TWIP. Defaults to 1800. Supports equivalent measurement in pixel, cm or inch.
      • header <Number> distance from the top edge of the page to the top edge of the header in TWIP. Defaults to 720. Supports equivalent measurement in pixel, cm or inch.
      • footer <Number> distance from the bottom edge of the page to the bottom edge of the footer in TWIP. Defaults to 720. Supports equivalent measurement in pixel, cm or inch.
      • gutter <Number> amount of extra space, in TWIP, added to the specified margin above any existing margin values; typically used when a document is being created for binding. Defaults to 0. Supports equivalent measurement in pixel, cm or inch.
    • title <?String> title of the document.
    • subject <?String> subject of the document.
    • creator <?String> creator of the document. Defaults to html-to-docx
    • keywords <?Array<String>> keywords associated with the document. Defaults to ['html-to-docx'].
    • description <?String> description of the document.
    • lastModifiedBy <?String> last modifier of the document. Defaults to html-to-docx.
    • revision <?Number> revision of the document. Defaults to 1.
    • createdAt <?Date> time of creation of the document. Defaults to current time.
    • modifiedAt <?Date> time of last modification of the document. Defaults to current time.
    • headerType <"default"|"first"|"even"> type of header. Defaults to default.
    • header <?Boolean> flag to enable header. Defaults to false.
    • footerType <"default"|"first"|"even"> type of footer. Defaults to default.
    • footer <?Boolean> flag to enable footer. Defaults to false.
    • font <?String> font name to be used. Defaults to Times New Roman.
    • fontSize <?Number> font size in HIP (half-points). Defaults to 22 (11 pt). Supports equivalent measurement in pt.
    • complexScriptFontSize <?Number> font size for complex scripts in HIP (half-points). Defaults to 22 (11 pt). Supports equivalent measurement in pt.
    • table <?Object>
      • row <?Object>
        • cantSplit <?Boolean> flag to prevent a table row from splitting across pages. Defaults to false (rows may split).
    • pageNumber <?Boolean> flag to enable page numbers in the footer. Defaults to false. Page numbers appear only if the footer flag is also set to true.
    • skipFirstHeaderFooter <?Boolean> flag to skip first page header and footer. Defaults to false.
    • lineNumber <?Boolean> flag to enable line numbering. Defaults to false.
    • lineNumberOptions <?Object>
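      • start <Number> starting line number minus 1; for example, 0 starts numbering at 1. Defaults to 0.
      • countBy <Number> interval between numbered lines (number of skipped lines + 1); for example, 1 numbers every line and 5 numbers every fifth line. Defaults to 1.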
      • restart <"continuous"|"newPage"|"newSection"> numbering restart strategy. Defaults to continuous.
    • numbering <?Object>
      • defaultOrderedListStyleType <?String> default ordered list style type. Defaults to decimal.
    • decodeUnicode <?Boolean> flag to enable unicode decoding of header, body and footer. Defaults to false.
    • lang <?String> language localization code for spell checker to work properly. Defaults to en-US.
  • footerHTMLString <String> clean html string equivalent of footer. Defaults to <p></p> if footer flag is true.
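Most size options above are expressed in TWIP (twentieths of a point, so 1440 TWIP = 1 inch) and font sizes in HIP (half-points). The functions below are a hypothetical sketch, not part of html-to-docx, for computing these values by hand or checking what the defaults mean; they assume the CSS convention of 96 pixels per inch. They show that the defaults correspond to a US letter page (8.5 × 11 in), 1 in top/bottom and 1.25 in left/right margins, and an 11 pt font.

```javascript
// Hypothetical conversion helpers (not part of html-to-docx).
// 1 inch = 72 pt = 1440 TWIP; 1 inch = 2.54 cm; assumes 96 px per inch (CSS).
const TWIP_PER_INCH = 1440;

function inchesToTwip(inches) {
  return Math.round(inches * TWIP_PER_INCH);
}

function cmToTwip(cm) {
  return Math.round((cm / 2.54) * TWIP_PER_INCH);
}

function pxToTwip(px) {
  return Math.round((px / 96) * TWIP_PER_INCH); // 1 px = 15 TWIP at 96 DPI
}

function twipToInches(twip) {
  return twip / TWIP_PER_INCH;
}

// Font sizes are given in HIP (half-points): 11 pt = 22 HIP.
function ptToHip(pt) {
  return Math.round(pt * 2);
}

// The documented defaults, expressed in inches and points.
console.log(twipToInches(12240), twipToInches(15840)); // 8.5 11 (US letter)
console.log(twipToInches(1440), twipToInches(1800)); // 1 1.25 (default margins)
console.log(ptToHip(11)); // 22 (default fontSize)

// Example: an A4 page (21 x 29.7 cm) with 2 cm margins, in TWIP.
const a4Options = {
  pageSize: { width: cmToTwip(21), height: cmToTwip(29.7) },
  margins: { top: cmToTwip(2), right: cmToTwip(2), bottom: cmToTwip(2), left: cmToTwip(2) },
};
console.log(a4Options.pageSize); // { width: 11906, height: 16838 }
```

Rounding to whole TWIP keeps the values integral, which matches the integer defaults listed above; since the library is documented to accept pixel, cm, or inch values as well, converting by hand is optional.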

Returns

<Promise<Buffer|Blob>>
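Because the promise may resolve to either a Node.js Buffer or a Blob, code that writes the result to disk can normalize it first. The `toBuffer` function below is a hypothetical sketch, not part of html-to-docx; it assumes Node.js 18 or later, where `Blob` is available as a global.

```javascript
// Hypothetical helper (not part of html-to-docx): normalizes the resolved
// value of HTMLtoDOCX (a Buffer or a Blob) into a Node.js Buffer.
// Assumes Node.js 18+, where Blob is a global.
async function toBuffer(result) {
  if (Buffer.isBuffer(result)) {
    return result; // already a Buffer, nothing to do
  }
  if (result instanceof Blob) {
    // Blob#arrayBuffer() resolves to the raw bytes of the document.
    return Buffer.from(await result.arrayBuffer());
  }
  throw new TypeError('Expected a Buffer or a Blob');
}

// Usage sketch (inside an async function):
// const fs = require('fs');
// const buffer = await toBuffer(await HTMLtoDOCX(htmlString));
// fs.writeFileSync('output.docx', buffer);
```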

Notes

A page break can be inserted with a div that has the class name "page-break" or a "page-break-after" style, regardless of the style's value; any content inside the div is ignored. For example: <div class="page-break" style="page-break-after: always;"></div>
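When content is generated section by section, the page-break markup can be inserted programmatically. `joinWithPageBreaks` below is a hypothetical helper, not part of html-to-docx; it simply joins HTML fragments with the empty page-break div shown above, so that each fragment after the first starts on a new page.

```javascript
// Hypothetical helper (not part of html-to-docx): joins HTML fragments with
// the page-break div that html-to-docx recognizes. The div is left empty
// because any content inside it would be ignored.
const PAGE_BREAK = '<div class="page-break" style="page-break-after: always;"></div>';

function joinWithPageBreaks(fragments) {
  return fragments.join(PAGE_BREAK);
}

const html = joinWithPageBreaks(['<h1>Chapter 1</h1>', '<h1>Chapter 2</h1>']);
console.log(html);
// <h1>Chapter 1</h1><div class="page-break" style="page-break-after: always;"></div><h1>Chapter 2</h1>
```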

The CSS list-style-type property is now supported on <ol> elements. For example:

  <ol style="list-style-type:lower-alpha;">
    <li>List item</li>
    ...
  </ol>

List of supported list-style-types:

  • upper-alpha, will result in A. List item
  • lower-alpha, will result in a. List item
  • upper-roman, will result in I. List item
  • lower-roman, will result in i. List item
  • lower-alpha-bracket-end, will result in a) List item
  • decimal-bracket-end, will result in 1) List item
  • decimal-bracket, will result in (1) List item
  • decimal, (the default) will result in 1. List item
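To preview which marker a given item will get, the mapping in the list above can be reproduced in a few lines. `listMarker` is a hypothetical sketch of the documented output formats, not html-to-docx's own implementation; letter markers are only handled for items 1 to 26, since the list above does not say what follows Z.

```javascript
// Hypothetical sketch (not html-to-docx's implementation): returns the marker
// shown for the n-th item (1-based) of an <ol> with the given list-style-type.
function toRoman(n) {
  const table = [
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'], [100, 'C'], [90, 'XC'],
    [50, 'L'], [40, 'XL'], [10, 'X'], [9, 'IX'], [5, 'V'], [4, 'IV'], [1, 'I'],
  ];
  let out = '';
  for (const [value, symbol] of table) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

function toLetter(n) {
  // Only 1..26 is handled here; the documented list says nothing beyond Z.
  if (n < 1 || n > 26) throw new RangeError('letter markers are sketched for 1..26 only');
  return String.fromCharCode(64 + n); // 1 -> 'A'
}

function listMarker(style, n) {
  switch (style) {
    case 'upper-alpha': return `${toLetter(n)}.`;
    case 'lower-alpha': return `${toLetter(n).toLowerCase()}.`;
    case 'upper-roman': return `${toRoman(n)}.`;
    case 'lower-roman': return `${toRoman(n).toLowerCase()}.`;
    case 'lower-alpha-bracket-end': return `${toLetter(n).toLowerCase()})`;
    case 'decimal-bracket-end': return `${n})`;
    case 'decimal-bracket': return `(${n})`;
    case 'decimal': return `${n}.`;
    default: throw new Error(`unsupported list-style-type: ${style}`);
  }
}

console.log([1, 2, 3, 4].map((n) => listMarker('upper-roman', n)).join(' ')); // I. II. III. IV.
console.log(listMarker('lower-alpha-bracket-end', 3)); // c)
```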

You can also add a data-start="n" attribute to start numbering at the n-th marker.

For example, <ol data-start="2"> starts numbering at B., b., II., ii., or 2., depending on the list style.
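When lists are generated from data, the style and the starting position can be set on the <ol> element directly. `orderedListHtml` is a hypothetical helper, not part of html-to-docx, that emits the list-style-type style and the data-start attribute described above. It escapes the item text, assuming the items are plain text rather than HTML, and assumes the style and start values come from trusted code.

```javascript
// Hypothetical helper (not part of html-to-docx): builds an <ol> that uses
// the list-style-type and data-start conventions documented above.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function orderedListHtml(items, { style = 'decimal', start = 1 } = {}) {
  // style and start are assumed trusted and are not escaped.
  const attrs = [`style="list-style-type:${style};"`];
  // data-start is only needed when numbering should not begin at the first marker.
  if (start !== 1) attrs.push(`data-start="${start}"`);
  const lis = items.map((item) => `<li>${escapeHtml(item)}</li>`).join('');
  return `<ol ${attrs.join(' ')}>${lis}</ol>`;
}

console.log(orderedListHtml(['Scope', 'Risks'], { style: 'upper-roman', start: 2 }));
// <ol style="list-style-type:upper-roman;" data-start="2"><li>Scope</li><li>Risks</li></ol>
```

Per the notes above, the two items in this example would be numbered II. and III.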

Font family doesn't work consistently across word processors:

  • Word Desktop works as intended.
  • LibreOffice ignores the fontTable.xml file and picks a font by itself.
  • Word Online ignores the fontTable.xml file and picks the closest font in its font library.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please create new branches off develop when contributing.


License

MIT
