xmlbuilder vs cheerio vs dom-serializer vs htmlparser2
HTML and XML Processing Libraries for Node.js

cheerio, dom-serializer, htmlparser2, and xmlbuilder are Node.js libraries for processing HTML and XML documents, but they serve different roles in the document lifecycle. htmlparser2 is a fast, streaming parser for HTML and XML that can emit events or build a DOM-like tree. cheerio provides a jQuery-inspired API for server-side DOM manipulation; it parses HTML with parse5 by default and can use htmlparser2 for parsing and dom-serializer for output instead, for example in XML mode. dom-serializer is a lightweight utility that converts DOM node trees back into HTML or XML strings. xmlbuilder is designed specifically for generating well-formed XML documents from JavaScript objects using a fluent, chainable API. Together, these packages cover parsing, manipulation, serialization, and generation of markup documents in Node.js environments.

Stat Detail

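| Package | Downloads | Stars | Size | Issues | Publish | License |
|---|---|---|---|---|---|---|
| xmlbuilder | 45,854,761 | 926 | - | 7 | 6 years ago | MIT |
| cheerio | 18,108,052 | 30,144 | 1.01 MB | 35 | a month ago | MIT |
| dom-serializer | 0 | 141 | 28.8 kB | 7 | - | MIT |
| htmlparser2 | 0 | 4,800 | 306 kB | 22 | a month ago | MIT |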

HTML and XML Processing in Node.js: Cheerio, dom-serializer, htmlparser2, and xmlbuilder Compared

When you need to parse, manipulate, or generate HTML or XML in a Node.js environment, four packages often come up: cheerio, dom-serializer, htmlparser2, and xmlbuilder. While they sometimes overlap in functionality, each serves a distinct role in the data processing pipeline. Understanding their responsibilities — and how they fit together — is key to choosing the right tool for your task.

🧩 Core Responsibilities: Parsing vs. Manipulation vs. Serialization vs. Generation

These libraries aren’t direct competitors — they solve different problems in the document lifecycle:

  • htmlparser2 is a streaming HTML/XML parser. It reads raw markup and emits events or builds a DOM-like tree.
  • cheerio is a jQuery-like API for server-side DOM manipulation. It parses HTML with parse5 by default and can use htmlparser2 instead (for example in XML mode), in which case it relies on dom-serializer to output markup.
  • dom-serializer is a minimalist serializer that converts a DOM node tree back into a string of HTML or XML.
  • xmlbuilder is an XML generation library focused on building well-formed XML documents from scratch using a fluent API.

Let’s look at how each works in practice.

🔍 Parsing HTML: htmlparser2 vs Cheerio

Using htmlparser2 directly

If you need fine-grained control over parsing — like handling streaming input or custom tag behavior — use htmlparser2 directly.

// htmlparser2: Parse with event handlers
import { Parser } from 'htmlparser2';

const parser = new Parser({
  onopentag(name, attribs) {
    if (name === 'script' && attribs.src) {
      console.log('External script:', attribs.src);
    }
  },
  ontext(text) {
    console.log('Text:', text);
  }
});

parser.write('<div>Hello <script src="app.js"></script></div>');
parser.end();

You can also use it to build a DOM tree:

// htmlparser2: Build DOM tree
import { parseDocument } from 'htmlparser2';

const doc = parseDocument('<ul><li>One</li><li>Two</li></ul>');
console.log(doc.children[0].children); // Array of nodes

Using cheerio for jQuery-style manipulation

For most web scraping or HTML transformation tasks, cheerio is more convenient because it gives you a familiar API.

// cheerio: Parse and manipulate
import * as cheerio from 'cheerio';

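// Pass `false` as the third argument to parse the input as a fragment;
// otherwise $.html() wraps the output in <html>, <head>, and <body>.
const $ = cheerio.load('<ul><li>One</li><li>Two</li></ul>', null, false);
$('li').last().remove();
$('ul').append('<li>Three</li>');

console.log($.html()); // <ul><li>One</li><li>Three</li></ul>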

Under the hood, cheerio parses HTML with parse5 by default. In XML mode, or when explicitly configured, it uses htmlparser2 to parse the input and dom-serializer to produce the final string.

📤 Serializing DOM Trees: dom-serializer

dom-serializer doesn’t parse anything — it only turns a DOM node (like those produced by htmlparser2) into a string.

// dom-serializer: Convert DOM node to HTML
import render from 'dom-serializer'; // the package's default export is `render`
import { parseDocument } from 'htmlparser2';

const doc = parseDocument('<p>Hello <em>world</em></p>');
const htmlString = render(doc);
console.log(htmlString); // <p>Hello <em>world</em></p>

You can pass options to control output:

// dom-serializer: With options
const xmlString = render(doc, { xmlMode: true });
// In XML mode, elements without children are written as self-closing tags (e.g. <br/>)

Note: when cheerio is working on an htmlparser2 tree (for example in XML mode), it calls dom-serializer for you in $.html() and $.xml(), so you rarely need to use it directly unless you’re working with raw htmlparser2 trees.
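
Conceptually, serialization is a recursive walk over the node tree. The following is a minimal sketch of that idea in plain JavaScript, using simplified, hypothetical node objects (the `toMarkup`, `escapeText`, and `escapeAttr` names are made up for illustration). It is not dom-serializer's implementation and ignores void elements, comments, and XML-specific rules.

```javascript
// Simplified node shapes, loosely modeled on an htmlparser2-style tree (assumption):
//   { type: 'text', data: string }
//   { type: 'tag', name: string, attribs: object, children: Node[] }

// Escape characters in text that would otherwise be read as markup.
function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Escape characters that would break a double-quoted attribute value.
function escapeAttr(s) {
  return s.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Recursively turn a node (or an array of nodes) into a markup string.
function toMarkup(node) {
  if (Array.isArray(node)) return node.map(toMarkup).join('');
  if (node.type === 'text') return escapeText(node.data);

  const attrs = Object.entries(node.attribs || {})
    .map(([key, value]) => ` ${key}="${escapeAttr(String(value))}"`)
    .join('');
  const inner = toMarkup(node.children || []);
  return `<${node.name}${attrs}>${inner}</${node.name}>`;
}

const tree = {
  type: 'tag',
  name: 'p',
  attribs: { class: 'greeting' },
  children: [
    { type: 'text', data: 'Hello ' },
    { type: 'tag', name: 'em', attribs: {}, children: [{ type: 'text', data: 'world' }] },
  ],
};

console.log(toMarkup(tree)); // <p class="greeting">Hello <em>world</em></p>
```

A real serializer also has to handle the cases this sketch skips, which is the main reason to use dom-serializer rather than a hand-written walker.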

🏗️ Generating XML from Scratch: xmlbuilder

None of the other packages are designed to build structured XML documents programmatically. That’s where xmlbuilder shines.

// xmlbuilder: Create XML declaratively
import { create } from 'xmlbuilder';

// create() returns the root element; ele() returns the newly added child
const root = create('root')
  .att('version', '1.0');

root.ele('child').txt('Hello');
root.ele('sibling').att('id', '2');

// The formatting option is `pretty` (not `prettyPrint`)
console.log(root.end({ pretty: true }));
/* Output:
<?xml version="1.0"?>
<root version="1.0">
  <child>Hello</child>
  <sibling id="2"/>
</root>
*/

This is ideal for generating sitemaps, RSS feeds, or API responses in XML format — tasks where you start with data objects, not raw markup.

🔄 Real-World Workflows: How These Packages Fit Together

Scenario 1: Web Scraping and Cleanup

You fetch an HTML page, remove ads, and extract clean content.

Use cheerio — it handles parsing, selection, and serialization in one chain.

import * as cheerio from 'cheerio';

const $ = cheerio.load(htmlFromNetwork);
$('.ad-banner').remove();
const cleanHtml = $.html('.main-content');

Scenario 2: Transforming Large HTML Streams

You’re processing gigabytes of HTML logs and need to extract metadata without loading everything into memory.

Use htmlparser2 with streaming mode — avoid DOM construction entirely.

import { createReadStream } from 'fs';
import { Parser } from 'htmlparser2';

const parser = new Parser({
  onopentag(name, attribs) {
    if (name === 'meta' && attribs.name === 'author') {
      console.log('Author:', attribs.content);
    }
  }
});

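// Parser is not a Node.js stream, so feed it chunks manually instead of calling pipe()
const input = createReadStream('huge-file.html', { encoding: 'utf8' });
input.on('data', chunk => parser.write(chunk));
input.on('end', () => parser.end());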
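
One reason to hand raw chunks to a stateful parser, rather than searching each chunk yourself, is that a tag can straddle a chunk boundary. The sketch below uses only plain JavaScript and a hypothetical two-chunk input to show a naive per-chunk search missing a tag that is present in the full document. A streaming parser such as htmlparser2 keeps its tokenizer state between write() calls, so it does not have this problem.

```javascript
// Hypothetical input: one <meta> tag that happens to be split across two chunks,
// as can occur when a file is read as a stream.
const chunks = ['<head><meta name="auth', 'or" content="Ada"></head>'];

const authorPattern = /<meta name="author" content="([^"]*)">/;

// Naive approach: inspect each chunk in isolation.
const perChunkMatches = chunks
  .map(chunk => authorPattern.exec(chunk))
  .filter(match => match !== null)
  .map(match => match[1]);

// Buffering everything finds the tag, but gives up the memory benefit of streaming.
const bufferedMatch = authorPattern.exec(chunks.join(''));

console.log(perChunkMatches);  // []
console.log(bufferedMatch[1]); // Ada
```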

Scenario 3: Building an XML Sitemap

You have a list of URLs and need to output a valid sitemap.xml.

Use xmlbuilder — it ensures proper structure, escaping, and formatting.

import { create } from 'xmlbuilder';

const sitemap = create('urlset')
  .att('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

urls.forEach(url => {
  sitemap.ele('url').ele('loc').txt(url).up().up();
});

const xml = sitemap.end({ pretty: true }); // the option is `pretty`, not `prettyPrint`
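
For comparison, here is roughly what the same task involves without a builder. This is a hand-rolled sketch in plain JavaScript: `escapeXml`, `buildSitemapByHand`, and the example URL are hypothetical names and data chosen for illustration, not part of xmlbuilder. The point is that every value placed into the document has to be escaped explicitly, which is easy to forget when concatenating strings.

```javascript
// Escape the five characters that have special meaning in XML.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a sitemap string by concatenation, escaping each URL.
function buildSitemapByHand(urls) {
  const entries = urls
    .map(url => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join('\n');
  return [
    '<?xml version="1.0"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    '</urlset>',
  ].join('\n');
}

// A query string with "&" is the classic case that breaks unescaped output.
const exampleUrls = ['https://example.com/search?q=xml&page=2'];
console.log(buildSitemapByHand(exampleUrls));
```

With xmlbuilder, the txt() and att() calls take care of this escaping, so the calling code only deals with data.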

Scenario 4: Custom HTML Serializer

You’ve modified a DOM tree using low-level htmlparser2 nodes and need to output HTML.

Use dom-serializer directly.

import { parseDocument } from 'htmlparser2';
import render from 'dom-serializer'; // default export; there is no `serialize` export

const doc = parseDocument('<div><span>old</span></div>');
const span = doc.children[0].children[0];
span.children[0].data = 'new'; // the text node is the span's first child

const output = render(doc);
// <div><span>new</span></div>

⚠️ Common Misconceptions

  • “Can I use xmlbuilder to parse XML?” → No. It only builds XML. For parsing, use htmlparser2 (which supports XML mode) or a dedicated XML parser like @xmldom/xmldom.
  • “Does cheerio work with XML?” → Partially. It’s optimized for HTML, although it can parse XML when loaded with the xml option (which switches parsing to htmlparser2). For XML-heavy work, xmlbuilder (for generation) or htmlparser2 in xmlMode (for parsing) are usually better choices.
  • “Is dom-serializer needed if I use cheerio?” → Not directly. cheerio includes it as a dependency and calls it for you when it serializes htmlparser2-based documents.

📊 Summary Table

| Package | Primary Role | Input | Output | Best For |
|---|---|---|---|---|
| htmlparser2 | Streaming parser | HTML/XML string | Events or DOM tree | Low-level parsing, streaming, custom handlers |
| cheerio | DOM manipulation (jQuery-like) | HTML string | Modified HTML | Web scraping, HTML transformation |
| dom-serializer | DOM-to-string converter | DOM node tree | HTML/XML string | Serializing parsed trees |
| xmlbuilder | XML document builder | JavaScript objects | Well-formed XML | Generating sitemaps, feeds, config files |

💡 Final Guidance

  • If you’re scraping or cleaning HTML, reach for cheerio first — it’s the most productive for typical tasks.
  • If you’re processing huge files or need custom parsing logic, drop down to htmlparser2.
  • If you’re generating XML from data, xmlbuilder is purpose-built for that job.
  • You’ll rarely import dom-serializer directly — it’s a utility used by other libraries under the hood.

These tools complement each other. In fact, cheerio’s power comes from combining a parser (parse5 by default, htmlparser2 optionally) and a serializer such as dom-serializer with a jQuery-like API. Choose based on whether your task starts with markup (parsing/manipulation) or data (generation).

How to Choose: xmlbuilder vs cheerio vs dom-serializer vs htmlparser2

  • xmlbuilder:

    Choose xmlbuilder when your goal is to generate well-formed, properly escaped XML documents from structured data — such as sitemaps, RSS feeds, or configuration files. It excels at building XML from scratch with a fluent API but cannot parse existing XML. Avoid it for HTML manipulation or general-purpose markup processing; it’s strictly an XML generator.

  • cheerio:

    Choose cheerio when you need to parse, traverse, and manipulate HTML on the server using a jQuery-like syntax — ideal for web scraping, HTML cleanup, or template transformation. It abstracts away lower-level parsing and serialization details, making it the most developer-friendly option for common HTML manipulation tasks. Avoid it if you're working with strict XML (namespaces, CDATA) or need streaming performance on very large documents.

  • dom-serializer:

    Choose dom-serializer only when you already have a DOM node tree (e.g., from htmlparser2) and need to convert it back to a string of HTML or XML. It’s a low-level utility rarely used directly in application code, as higher-level libraries like cheerio include and use it internally. Don’t use it for parsing or building documents from scratch — it only handles serialization.

  • htmlparser2:

    Choose htmlparser2 when you need fine-grained control over HTML or XML parsing, such as processing large files via streams, implementing custom tag handlers, or avoiding full DOM construction for performance. It’s the foundation many other libraries build on, so use it directly only when cheerio’s abstraction is too limiting or when memory efficiency is critical.

README for xmlbuilder

xmlbuilder-js

An XML builder for node.js similar to java-xmlbuilder.

Announcing xmlbuilder2:

The new release of xmlbuilder is available at xmlbuilder2! xmlbuilder2 has been redesigned from the ground up to be fully conforming to the modern DOM specification. It supports XML namespaces, provides built-in converters for multiple formats, collection functions, and more. Please see upgrading from xmlbuilder in the wiki.

New development will be focused towards xmlbuilder2; xmlbuilder will only receive critical bug fixes.

Installation:

npm install xmlbuilder

Usage:

var builder = require('xmlbuilder');

var xml = builder.create('root')
  .ele('xmlbuilder')
    .ele('repo', {'type': 'git'}, 'git://github.com/oozcitak/xmlbuilder-js.git')
  .end({ pretty: true});

console.log(xml);

will result in:

<?xml version="1.0"?>
<root>
  <xmlbuilder>
    <repo type="git">git://github.com/oozcitak/xmlbuilder-js.git</repo>
  </xmlbuilder>
</root>

It is also possible to convert objects into nodes:

var builder = require('xmlbuilder');

var obj = {
  root: {
    xmlbuilder: {
      repo: {
        '@type': 'git', // attributes start with @
        '#text': 'git://github.com/oozcitak/xmlbuilder-js.git' // text node
      }
    }
  }
};

var xml = builder.create(obj).end({ pretty: true});
console.log(xml);

If you need to do some processing:

var builder = require('xmlbuilder');

var root = builder.create('squares');
root.com('f(x) = x^2');
for(var i = 1; i <= 5; i++)
{
  var item = root.ele('data');
  item.att('x', i);
  item.att('y', i * i);
}

var xml = root.end({ pretty: true});
console.log(xml);

This will result in:

<?xml version="1.0"?>
<squares>
  <!-- f(x) = x^2 -->
  <data x="1" y="1"/>
  <data x="2" y="4"/>
  <data x="3" y="9"/>
  <data x="4" y="16"/>
  <data x="5" y="25"/>
</squares>

See the wiki for details and more complex examples.