jsdom vs xml2js vs cheerio vs xpath vs x-path
HTML and XML Parsing Libraries Comparison
What Are HTML and XML Parsing Libraries?

HTML and XML parsing libraries are essential tools for web developers who need to manipulate and extract data from web pages or XML documents. These libraries provide functionalities to traverse, query, and modify the structure of HTML or XML, making it easier to scrape data, automate tasks, or transform documents. Each library has its unique features and use cases, catering to different needs in web scraping, DOM manipulation, and XML processing.

Stat Detail

Package   Downloads    Stars    Size      Issues   Publish        License
jsdom     29,395,586   20,932   3.18 MB   477      2 days ago     MIT
xml2js    22,618,787   4,933    3.44 MB   247      2 years ago    MIT
cheerio   10,295,239   29,336   1.25 MB   46       8 months ago   MIT
xpath     3,468,169    230      183 kB    23       a year ago     MIT
x-path    77,335       -        -         -        10 years ago   MIT

Feature Comparison: jsdom vs xml2js vs cheerio vs xpath vs x-path

Parsing Capability

  • jsdom:

    jsdom offers a complete DOM and HTML parsing implementation, simulating a browser environment. It can handle complex HTML documents and provides a wide range of browser APIs, making it suitable for applications that need to run scripts as if they were in a browser.

  • xml2js:

    xml2js focuses on converting XML to JavaScript objects, allowing developers to easily work with XML data in a more familiar format. It simplifies the process of parsing XML and accessing its contents programmatically.

  • cheerio:

    Cheerio provides a fast and flexible way to parse HTML and manipulate the resulting DOM structure. It allows for jQuery-like syntax, making it easy to traverse and manipulate elements, which is particularly useful for web scraping.

  • xpath:

    xpath is a lightweight library that evaluates XPath expressions against XML documents. It operates on DOM nodes supplied by a separate XML parser (for example @xmldom/xmldom), so it can extract data from XML without the overhead of a browser-like environment such as jsdom.

  • x-path:

    x-path is designed specifically for parsing XML documents and executing XPath queries. It allows for precise data extraction from XML structures, making it ideal for applications that require detailed data manipulation.

Performance

  • jsdom:

    While jsdom provides a comprehensive DOM simulation, it may have performance overhead compared to lighter libraries. It is best suited for scenarios where full browser capabilities are necessary, but it may not be the fastest option for simple parsing tasks.

  • xml2js:

    xml2js is designed for efficient XML parsing and conversion to JavaScript objects. It performs well for most XML documents, but performance may vary with extremely large or complex XML structures.

  • cheerio:

    Cheerio is optimized for performance, making it a great choice for high-speed web scraping tasks. It operates in a lightweight manner, allowing for quick parsing and manipulation of HTML without the need for a browser.

  • xpath:

    xpath is lightweight and efficient for evaluating XPath expressions, making it suitable for quick data extraction tasks from XML without significant performance concerns.

  • x-path:

    x-path is efficient in executing XPath queries, providing fast data extraction from XML documents. Its performance is generally high for XML processing tasks, especially when dealing with large datasets.

Ease of Use

  • jsdom:

    jsdom has a steeper learning curve due to its comprehensive feature set and browser-like environment. However, it offers powerful capabilities for those who need to simulate a full browser context.

  • xml2js:

    xml2js is user-friendly and simplifies the process of working with XML. Its ability to convert XML to JavaScript objects makes it accessible for developers who may not be familiar with XML parsing.

  • cheerio:

    Cheerio's jQuery-like syntax makes it very easy to learn and use, especially for developers familiar with jQuery. Its API is intuitive, allowing for rapid development and data extraction.

  • xpath:

    xpath is easy to use for evaluating XPath expressions, but it requires some understanding of XPath syntax. It is lightweight and straightforward for those who need to extract data from XML.

  • x-path:

    x-path is straightforward to use for those familiar with XPath syntax. It provides a clear interface for querying XML, making it easy to extract specific data points.

Use Cases

  • jsdom:

    jsdom is suitable for testing front-end code, simulating browser behavior, and running scripts that rely on browser APIs. It is often used in environments where a full DOM is necessary for accurate testing.

  • xml2js:

    xml2js is perfect for applications that need to parse XML data and convert it into a more manageable JavaScript format. It is often used in scenarios where XML data needs to be manipulated or transformed.

  • cheerio:

    Cheerio is ideal for web scraping, data extraction, and server-side HTML manipulation. It is commonly used in Node.js applications where quick access to HTML elements is required.

  • xpath:

    xpath is useful for lightweight XML data extraction tasks, especially when working with XML documents that require specific queries without the need for a browser-like DOM environment.

  • x-path:

    x-path is best for applications that need to extract data from XML documents using XPath queries. It is commonly used in data processing and transformation tasks involving XML.

Community and Support

  • jsdom:

    jsdom is well-supported and has a robust community, making it easy to find help and resources. It is frequently updated to keep pace with browser standards and features.

  • xml2js:

    xml2js has a good level of community support and documentation, making it accessible for developers needing to work with XML data in JavaScript.

  • cheerio:

    Cheerio has a strong community and is widely used in web scraping projects, providing ample resources and documentation for developers. Its popularity ensures ongoing support and updates.

  • xpath:

    xpath has a niche community focused on XML processing, providing sufficient resources for developers who require XPath functionality.

  • x-path:

    x-path has a much smaller community than the others and, according to the package statistics above, was last published 10 years ago, so expect limited documentation and little ongoing support.

How to Choose: jsdom vs xml2js vs cheerio vs xpath vs x-path
  • jsdom:

    Select jsdom if you require a full-fledged DOM implementation that simulates a browser environment. It is particularly useful for testing and running scripts that depend on browser APIs, making it suitable for applications that need to manipulate HTML or XML as if they were in a web browser.

  • xml2js:

    Use xml2js when you need to convert XML data into JavaScript objects for easier manipulation. This library is beneficial for applications that require parsing XML data and working with it in a more JavaScript-friendly format.

  • cheerio:

    Choose Cheerio if you need a fast and lightweight library for server-side DOM manipulation and jQuery-like syntax. It is ideal for web scraping tasks where you want to extract data from HTML documents without the overhead of a full browser environment.

  • xpath:

    Choose xpath if you need a lightweight library to evaluate XPath expressions on XML documents. It is particularly useful for extracting data from XML without the need for a browser-like DOM implementation such as jsdom.

  • x-path:

    Opt for x-path if you need a library specifically for querying XML documents using XPath expressions. It is best suited for scenarios where you want to extract data from XML structures efficiently and with precision.

README for jsdom


jsdom

jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG DOM and HTML Standards, for use with Node.js. In general, the goal of the project is to emulate enough of a subset of a web browser to be useful for testing and scraping real-world web applications.

The latest versions of jsdom require Node.js v18 or newer. (Versions of jsdom below v23 still work with previous Node.js versions, but are unsupported.)

Basic usage

const jsdom = require("jsdom");
const { JSDOM } = jsdom;

To use jsdom, you will primarily use the JSDOM constructor, which is a named export of the jsdom main module. Pass the constructor a string. You will get back a JSDOM object, which has a number of useful properties, notably window:

const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
console.log(dom.window.document.querySelector("p").textContent); // "Hello world"

(Note that jsdom will parse the HTML you pass it just like a browser does, including implied <html>, <head>, and <body> tags.)

The resulting object is an instance of the JSDOM class, which contains a number of useful properties and methods besides window. In general, it can be used to act on the jsdom from the "outside," doing things that are not possible with the normal DOM APIs. For simple cases, where you don't need any of this functionality, we recommend a coding pattern like

const { window } = new JSDOM(`...`);
// or even
const { document } = (new JSDOM(`...`)).window;

Full documentation on everything you can do with the JSDOM class is below, in the section "JSDOM Object API".

Customizing jsdom

The JSDOM constructor accepts a second parameter which can be used to customize your jsdom in the following ways.

Simple options

const dom = new JSDOM(``, {
  url: "https://example.org/",
  referrer: "https://example.com/",
  contentType: "text/html",
  includeNodeLocations: true,
  storageQuota: 10000000
});

  • url sets the value returned by window.location, document.URL, and document.documentURI, and affects things like resolution of relative URLs within the document and the same-origin restrictions and referrer used while fetching subresources. It defaults to "about:blank".
  • referrer just affects the value read from document.referrer. It defaults to no referrer (which reflects as the empty string).
  • contentType affects the value read from document.contentType, as well as how the document is parsed: as HTML or as XML. Values that are not a HTML MIME type or an XML MIME type will throw. It defaults to "text/html". If a charset parameter is present, it can affect binary data processing.
  • includeNodeLocations preserves the location info produced by the HTML parser, allowing you to retrieve it with the nodeLocation() method (described below). It also ensures that line numbers reported in exception stack traces for code running inside <script> elements are correct. It defaults to false to give the best performance, and cannot be used with an XML content type since our XML parser does not support location info.
  • storageQuota is the maximum size in code units for the separate storage areas used by localStorage and sessionStorage. Attempts to store data larger than this limit will cause a DOMException to be thrown. By default, it is set to 5,000,000 code units per origin, as inspired by the HTML specification.

Note that both url and referrer are canonicalized before they're used, so e.g. if you pass in "https:example.com", jsdom will interpret that as if you had given "https://example.com/". If you pass an unparseable URL, the call will throw. (URLs are parsed and serialized according to the URL Standard.)
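
The canonicalization described here follows the URL Standard, which the URL class built into Node.js also implements. As a rough illustration, the sketch below uses that built-in class as a stand-in, on the assumption that it treats these inputs the same way jsdom's own URL parsing does. The helper name canonicalize is hypothetical.

```javascript
// Sketch only: Node.js's built-in WHATWG URL class stands in for the
// parsing jsdom applies to the `url` and `referrer` options.
function canonicalize(input) {
  // new URL() throws a TypeError when the input cannot be parsed.
  return new URL(input).href;
}

const canonical = canonicalize("https:example.com");
console.log(canonical); // "https://example.com/"

let threw = false;
try {
  canonicalize("not a url");
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
```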

Executing scripts

jsdom's most powerful ability is that it can execute scripts inside the jsdom. These scripts can modify the content of the page and access all the web platform APIs jsdom implements.

However, this is also highly dangerous when dealing with untrusted content. The jsdom sandbox is not foolproof, and code running inside the DOM's <script>s can, if it tries hard enough, get access to the Node.js environment, and thus to your machine. As such, the ability to execute scripts embedded in the HTML is disabled by default:

const dom = new JSDOM(`<body>
  <div id="content"></div>
  <script>document.getElementById("content").append(document.createElement("hr"));</script>
</body>`);

// The script will not be executed, by default:
console.log(dom.window.document.getElementById("content").children.length); // 0

To enable executing scripts inside the page, you can use the runScripts: "dangerously" option:

const dom = new JSDOM(`<body>
  <div id="content"></div>
  <script>document.getElementById("content").append(document.createElement("hr"));</script>
</body>`, { runScripts: "dangerously" });

// The script will be executed and modify the DOM:
console.log(dom.window.document.getElementById("content").children.length); // 1

Again we emphasize to only use this when feeding jsdom code you know is safe. If you use it on arbitrary user-supplied code, or code from the Internet, you are effectively running untrusted Node.js code, and your machine could be compromised.

If you want to execute external scripts, included via <script src="">, you'll also need to ensure that jsdom loads them. To do this, add the option resources: "usable" as described below. (You'll likely also want to set the url option, for the reasons discussed there.)

Event handler attributes, like <div onclick="">, are also governed by this setting; they will not function unless runScripts is set to "dangerously". (However, event handler properties, like div.onclick = ..., will function regardless of runScripts.)

If you are simply trying to execute script "from the outside", instead of letting <script> elements and event handler attributes run "from the inside", you can use the runScripts: "outside-only" option, which enables fresh copies of all the JavaScript spec-provided globals to be installed on window. This includes things like window.Array, window.Promise, etc. It also, notably, includes window.eval, which allows running scripts, but with the jsdom window as the global:

const dom = new JSDOM(`<body>
  <div id="content"></div>
  <script>document.getElementById("content").append(document.createElement("hr"));</script>
</body>`, { runScripts: "outside-only" });

// run a script outside of JSDOM:
dom.window.eval('document.getElementById("content").append(document.createElement("p"));');

console.log(dom.window.document.getElementById("content").children.length); // 1
console.log(dom.window.document.getElementsByTagName("hr").length); // 0
console.log(dom.window.document.getElementsByTagName("p").length); // 1

This is turned off by default for performance reasons, but is safe to enable.

Note that in the default configuration, without setting runScripts, the values of window.Array, window.eval, etc. will be the same as those provided by the outer Node.js environment. That is, window.eval === eval will hold, so window.eval will not run scripts in a useful way.

We strongly advise against trying to "execute scripts" by mashing together the jsdom and Node global environments (e.g. by doing global.window = dom.window), and then executing scripts or test code inside the Node global environment. Instead, you should treat jsdom like you would a browser, and run all scripts and tests that need access to a DOM inside the jsdom environment, using window.eval or runScripts: "dangerously". This might require, for example, creating a browserify bundle to execute as a <script> element—just like you would in a browser.

Finally, for advanced use cases you can use the dom.getInternalVMContext() method, documented below.

Pretending to be a visual browser

jsdom does not have the capability to render visual content, and will act like a headless browser by default. It provides hints to web pages through APIs such as document.hidden that their content is not visible.

When the pretendToBeVisual option is set to true, jsdom will pretend that it is rendering and displaying content. It does this by:

  • Changing document.hidden to return false instead of true
  • Changing document.visibilityState to return "visible" instead of "prerender"
  • Enabling window.requestAnimationFrame() and window.cancelAnimationFrame() methods, which otherwise do not exist

const window = (new JSDOM(``, { pretendToBeVisual: true })).window;

window.requestAnimationFrame(timestamp => {
  console.log(timestamp > 0);
});

Note that jsdom still does not do any layout or rendering, so this is really just about pretending to be visual, not about implementing the parts of the platform a real, visual web browser would implement.

Loading subresources

Basic options

By default, jsdom will not load any subresources such as scripts, stylesheets, images, or iframes. If you'd like jsdom to load such resources, you can pass the resources: "usable" option, which will load all usable resources. Those are:

  • Frames and iframes, via <frame> and <iframe>
  • Stylesheets, via <link rel="stylesheet">
  • Scripts, via <script>, but only if runScripts: "dangerously" is also set
  • Images, via <img>, but only if the canvas npm package is also installed (see "Canvas Support" below)

When attempting to load resources, recall that the default value for the url option is "about:blank", which means that any resources included via relative URLs will fail to load. (The result of trying to parse the URL /something against the URL about:blank is an error.) So, you'll likely want to set a non-default value for the url option in those cases, or use one of the convenience APIs that do so automatically.
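
The failure mode for relative URLs can be reproduced outside jsdom. This sketch again uses the URL class built into Node.js as a stand-in for jsdom's URL resolution (an assumption; both follow the URL Standard). The helper tryResolve and the example.org base are hypothetical.

```javascript
// Sketch only: resolve a relative URL against a base, returning null on failure.
function tryResolve(relative, base) {
  try {
    return new URL(relative, base).href;
  } catch (err) {
    return null; // the relative URL could not be resolved against this base
  }
}

// "about:blank" cannot serve as a base for path-relative URLs.
const againstBlank = tryResolve("/something", "about:blank");

// A hierarchical base URL works as expected.
const againstSite = tryResolve("/something", "https://example.org/page.html");

console.log(againstBlank); // null
console.log(againstSite);  // "https://example.org/something"
```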

Advanced configuration

To more fully customize jsdom's resource-loading behavior, you can pass an instance of the ResourceLoader class as the resources option value:

const resourceLoader = new jsdom.ResourceLoader({
  proxy: "http://127.0.0.1:9001",
  strictSSL: false,
  userAgent: "Mellblomenator/9000",
});
const dom = new JSDOM(``, { resources: resourceLoader });

The three options to the ResourceLoader constructor are:

  • proxy is the address of an HTTP proxy to be used.
  • strictSSL can be set to false to disable the requirement that SSL certificates be valid.
  • userAgent affects the User-Agent header sent, and thus the resulting value for navigator.userAgent. It defaults to `Mozilla/5.0 (${process.platform || "unknown OS"}) AppleWebKit/537.36 (KHTML, like Gecko) jsdom/${jsdomVersion}`.

You can further customize resource fetching by subclassing ResourceLoader and overriding the fetch() method. For example, here is a version that overrides the response provided for a specific URL:

class CustomResourceLoader extends jsdom.ResourceLoader {
  fetch(url, options) {
    // Override the contents of this script to do something unusual.
    if (url === "https://example.com/some-specific-script.js") {
      return Promise.resolve(Buffer.from("window.someGlobal = 5;"));
    }

    return super.fetch(url, options);
  }
}

jsdom will call your custom resource loader's fetch() method whenever it encounters a "usable" resource, per the above section. The method takes a URL string, as well as a few options which you should pass through unmodified if calling super.fetch(). It must return a promise for a Node.js Buffer object, or return null if the resource is intentionally not to be loaded. In general, most cases will want to delegate to super.fetch(), as shown.

One of the options you will receive in fetch() will be the element (if applicable) that is fetching a resource.

class CustomResourceLoader extends jsdom.ResourceLoader {
  fetch(url, options) {
    if (options.element) {
      console.log(`Element ${options.element.localName} is requesting the url ${url}`);
    }

    return super.fetch(url, options);
  }
}

Virtual consoles

Like web browsers, jsdom has the concept of a "console". This records both information directly sent from the page, via scripts executing inside the document, as well as information from the jsdom implementation itself. We call the user-controllable console a "virtual console", to distinguish it from the Node.js console API and from the inside-the-page window.console API.

By default, the JSDOM constructor will return an instance with a virtual console that forwards all its output to the Node.js console. To create your own virtual console and pass it to jsdom, you can override this default by doing

const virtualConsole = new jsdom.VirtualConsole();
const dom = new JSDOM(``, { virtualConsole });

Code like this will create a virtual console with no behavior. You can give it behavior by adding event listeners for all the possible console methods:

virtualConsole.on("error", () => { ... });
virtualConsole.on("warn", () => { ... });
virtualConsole.on("info", () => { ... });
virtualConsole.on("dir", () => { ... });
// ... etc. See https://console.spec.whatwg.org/#logging

(Note that it is probably best to set up these event listeners before calling new JSDOM(), since errors or console-invoking script might occur during parsing.)

If you simply want to redirect the virtual console output to another console, like the default Node.js one, you can do

virtualConsole.sendTo(console);

There is also a special event, "jsdomError", which will fire with error objects to report errors from jsdom itself. This is similar to how error messages often show up in web browser consoles, even if they are not initiated by console.error. So far, the following errors are output this way:

  • Errors loading or parsing subresources (scripts, stylesheets, frames, and iframes)
  • Script execution errors that are not handled by a window onerror event handler that returns true or calls event.preventDefault()
  • Not-implemented errors resulting from calls to methods, like window.alert, which jsdom does not implement, but installs anyway for web compatibility

If you're using sendTo(c) to send errors to c, by default it will call c.error(errorStack[, errorDetail]) with information from "jsdomError" events. If you'd prefer to maintain a strict one-to-one mapping of events to method calls, and perhaps handle "jsdomError"s yourself, then you can do

virtualConsole.sendTo(c, { omitJSDOMErrors: true });

Cookie jars

Like web browsers, jsdom has the concept of a cookie jar, storing HTTP cookies. Cookies that have a URL on the same domain as the document, and are not marked HTTP-only, are accessible via the document.cookie API. Additionally, all cookies in the cookie jar will impact the fetching of subresources.

By default, the JSDOM constructor will return an instance with an empty cookie jar. To create your own cookie jar and pass it to jsdom, you can override this default by doing

const cookieJar = new jsdom.CookieJar(store, options);
const dom = new JSDOM(``, { cookieJar });

This is mostly useful if you want to share the same cookie jar among multiple jsdoms, or prime the cookie jar with certain values ahead of time.

Cookie jars are provided by the tough-cookie package. The jsdom.CookieJar constructor is a subclass of the tough-cookie cookie jar which by default sets the looseMode: true option, since that matches better how browsers behave. If you want to use tough-cookie's utilities and classes yourself, you can use the jsdom.toughCookie module export to get access to the tough-cookie module instance packaged with jsdom.

Intervening before parsing

jsdom allows you to intervene in the creation of a jsdom very early: after the Window and Document objects are created, but before any HTML is parsed to populate the document with nodes:

const dom = new JSDOM(`<p>Hello</p>`, {
  beforeParse(window) {
    window.document.childNodes.length === 0;
    window.someCoolAPI = () => { /* ... */ };
  }
});

This is especially useful if you are wanting to modify the environment in some way, for example adding shims for web platform APIs jsdom does not support.

JSDOM object API

Once you have constructed a JSDOM object, it will have the following useful capabilities:

Properties

The property window retrieves the Window object that was created for you.

The properties virtualConsole and cookieJar reflect the options you pass in, or the defaults created for you if nothing was passed in for those options.

Serializing the document with serialize()

The serialize() method will return the HTML serialization of the document, including the doctype:

const dom = new JSDOM(`<!DOCTYPE html>hello`);

dom.serialize() === "<!DOCTYPE html><html><head></head><body>hello</body></html>";

// Contrast with:
dom.window.document.documentElement.outerHTML === "<html><head></head><body>hello</body></html>";

Getting the source location of a node with nodeLocation(node)

The nodeLocation() method will find where a DOM node is within the source document, returning the parse5 location info for the node:

const dom = new JSDOM(
  `<p>Hello
    <img src="foo.jpg">
  </p>`,
  { includeNodeLocations: true }
);

const document = dom.window.document;
const bodyEl = document.body; // implicitly created
const pEl = document.querySelector("p");
const textNode = pEl.firstChild;
const imgEl = document.querySelector("img");

console.log(dom.nodeLocation(bodyEl));   // null; it's not in the source
console.log(dom.nodeLocation(pEl));      // { startOffset: 0, endOffset: 39, startTag: ..., endTag: ... }
console.log(dom.nodeLocation(textNode)); // { startOffset: 3, endOffset: 13 }
console.log(dom.nodeLocation(imgEl));    // { startOffset: 13, endOffset: 32 }

Note that this feature only works if you have set the includeNodeLocations option; node locations are off by default for performance reasons.

Interfacing with the Node.js vm module using getInternalVMContext()

The built-in vm module of Node.js is what underpins jsdom's script-running magic. Some advanced use cases, like pre-compiling a script and then running it multiple times, benefit from using the vm module directly with a jsdom-created Window.

To get access to the contextified global object, suitable for use with the vm APIs, you can use the getInternalVMContext() method:

const { Script } = require("vm");

const dom = new JSDOM(``, { runScripts: "outside-only" });
const script = new Script(`
  if (!this.ran) {
    this.ran = 0;
  }

  ++this.ran;
`);

const vmContext = dom.getInternalVMContext();

script.runInContext(vmContext);
script.runInContext(vmContext);
script.runInContext(vmContext);

console.assert(dom.window.ran === 3);

This is somewhat-advanced functionality, and we advise sticking to normal DOM APIs (such as window.eval() or document.createElement("script")) unless you have very specific needs.

Note that this method will throw an exception if the JSDOM instance was created without runScripts set, or if you are using jsdom in a web browser.
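
To show the underlying mechanism in isolation, here is a sketch that uses only the Node.js vm module, with plain contexts created by vm.createContext() standing in for jsdom windows (an assumption made so the example runs without jsdom). One compiled Script is reused across two independent contexts.

```javascript
const vm = require("vm");

// Compile once; the same Script object can then run in any context.
const counterScript = new vm.Script(`
  if (!this.ran) {
    this.ran = 0;
  }
  ++this.ran;
`);

// Two independent contextified globals (stand-ins for two jsdom windows).
const contextA = vm.createContext({});
const contextB = vm.createContext({});

counterScript.runInContext(contextA);
counterScript.runInContext(contextA);
counterScript.runInContext(contextB);

console.log(contextA.ran); // 2
console.log(contextB.ran); // 1
```

Each context keeps its own global state, which is why the counters differ.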

Reconfiguring the jsdom with reconfigure(settings)

The top property on window is marked [Unforgeable] in the spec, meaning it is a non-configurable own property and thus cannot be overridden or shadowed by normal code running inside the jsdom, even using Object.defineProperty.

Similarly, at present jsdom does not handle navigation (such as setting window.location.href = "https://example.com/"); doing so will cause the virtual console to emit a "jsdomError" explaining that this feature is not implemented, and nothing will change: there will be no new Window or Document object, and the existing window's location object will still have all the same property values.

However, if you're acting from outside the window, e.g. in some test framework that creates jsdoms, you can override one or both of these using the special reconfigure() method:

const dom = new JSDOM();

dom.window.top === dom.window;
dom.window.location.href === "about:blank";

dom.reconfigure({ windowTop: myFakeTopForTesting, url: "https://example.com/" });

dom.window.top === myFakeTopForTesting;
dom.window.location.href === "https://example.com/";

Note that changing the jsdom's URL will impact all APIs that return the current document URL, such as window.location, document.URL, and document.documentURI, as well as the resolution of relative URLs within the document, and the same-origin checks and referrer used while fetching subresources. It will not, however, perform navigation to the contents of that URL; the contents of the DOM will remain unchanged, and no new instances of Window, Document, etc. will be created.

Convenience APIs

fromURL()

In addition to the JSDOM constructor itself, jsdom provides a promise-returning factory method for constructing a jsdom from a URL:

JSDOM.fromURL("https://example.com/", options).then(dom => {
  console.log(dom.serialize());
});

The returned promise will fulfill with a JSDOM instance if the URL is valid and the request is successful. Any redirects will be followed to their ultimate destination.

The options provided to fromURL() are similar to those provided to the JSDOM constructor, with the following additional restrictions and consequences:

  • The url and contentType options cannot be provided.
  • The referrer option is used as the HTTP Referer request header of the initial request.
  • The resources option also affects the initial request; this is useful if you want to, for example, configure a proxy (see above).
  • The resulting jsdom's URL, content type, and referrer are determined from the response.
  • Any cookies set via HTTP Set-Cookie response headers are stored in the jsdom's cookie jar. Similarly, any cookies already in a supplied cookie jar are sent as HTTP Cookie request headers.

fromFile()

Similar to fromURL(), jsdom also provides a fromFile() factory method for constructing a jsdom from a filename:

JSDOM.fromFile("stuff.html", options).then(dom => {
  console.log(dom.serialize());
});

The returned promise will fulfill with a JSDOM instance if the given file can be opened. As usual in Node.js APIs, the filename is given relative to the current working directory.

The options provided to fromFile() are similar to those provided to the JSDOM constructor, with the following additional defaults:

  • The url option will default to a file URL corresponding to the given filename, instead of to "about:blank".
  • The contentType option will default to "application/xhtml+xml" if the given filename ends in .xht, .xhtml, or .xml; otherwise it will continue to default to "text/html".

fragment()

For the very simplest of cases, you might not need a whole JSDOM instance with all its associated power. You might not even need a Window or Document! Instead, you just need to parse some HTML, and get a DOM object you can manipulate. For that, we have fragment(), which creates a DocumentFragment from a given string:

const frag = JSDOM.fragment(`<p>Hello</p><p><strong>Hi!</strong>`);

frag.childNodes.length === 2;
frag.querySelector("strong").textContent === "Hi!";
// etc.

Here frag is a DocumentFragment instance, whose contents are created by parsing the provided string. The parsing is done using a <template> element, so you can include any element there (including ones with weird parsing rules like <td>). It's also important to note that the resulting DocumentFragment will not have an associated browsing context: that is, elements' ownerDocument will have a null defaultView property, resources will not load, etc.

All invocations of the fragment() factory result in DocumentFragments that share the same template owner Document. This allows many calls to fragment() with no extra overhead. But it also means that calls to fragment() cannot be customized with any options.

Note that serialization is not as easy with DocumentFragments as it is with full JSDOM objects. If you need to serialize your DOM, you should probably use the JSDOM constructor more directly. But for the special case of a fragment containing a single element, it's pretty easy to do through normal means:

const frag = JSDOM.fragment(`<p>Hello</p>`);
console.log(frag.firstChild.outerHTML); // logs "<p>Hello</p>"

Other noteworthy features

Canvas support

jsdom includes support for using the canvas package to extend any <canvas> elements with the canvas API. To make this work, you need to include canvas as a dependency in your project, as a peer of jsdom. If jsdom can find version 3.x of the canvas package, it will use it, but if it's not present, then <canvas> elements will behave like <div>s.

Encoding sniffing

In addition to supplying a string, the JSDOM constructor can also be supplied binary data, in the form of a Node.js Buffer or a standard JavaScript binary data type like ArrayBuffer, Uint8Array, DataView, etc. When this is done, jsdom will sniff the encoding from the supplied bytes, scanning for <meta charset> tags just like a browser does.

If the supplied contentType option contains a charset parameter, that encoding will override the sniffed encoding—unless a UTF-8 or UTF-16 BOM is present, in which case those take precedence. (Again, this is just like a browser.)

This encoding sniffing also applies to JSDOM.fromFile() and JSDOM.fromURL(). In the latter case, any Content-Type headers sent with the response will take priority, in the same fashion as the constructor's contentType option.

Note that in many cases supplying bytes in this fashion can be better than supplying a string. For example, if you attempt to use Node.js's buffer.toString("utf-8") API, Node.js will not strip any leading BOMs. If you then give this string to jsdom, it will interpret it verbatim, leaving the BOM intact. But jsdom's binary data decoding code will strip leading BOMs, just like a browser; in such cases, supplying buffer directly will give the desired result.
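The BOM behavior described above can be checked without jsdom at all. The following sketch builds a small UTF-8 buffer with a leading BOM and shows that toString() keeps it as a U+FEFF character; the sample markup is made up for illustration, and the jsdom call is left as a comment because it assumes jsdom is installed.

```javascript
// A UTF-8 byte order mark (EF BB BF) followed by a tiny HTML document.
const bytes = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from("<p>Hello</p>", "utf-8"),
]);

// Decoding with toString() keeps the BOM as a leading U+FEFF character.
const decoded = bytes.toString("utf-8");
const startsWithBom = decoded.charCodeAt(0) === 0xfeff;
console.log(startsWithBom); // true

// Passing the bytes themselves lets jsdom decode them and strip the BOM:
//   const { JSDOM } = require("jsdom");
//   const dom = new JSDOM(bytes);
```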

Closing down a jsdom

Timers in the jsdom (set by window.setTimeout() or window.setInterval()) will, by definition, execute code in the future in the context of the window. Since there is no way to execute code in the future without keeping the process alive, outstanding jsdom timers will keep your Node.js process alive. Similarly, since there is no way to execute code in the context of an object without keeping that object alive, outstanding jsdom timers will prevent garbage collection of the window on which they are scheduled.

If you want to be sure to shut down a jsdom window, use window.close(), which will terminate all running timers (and also remove any event listeners on the window and document).
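One way to make sure window.close() always runs is to wrap your work in try/finally. The sketch below assumes you pass in an object shaped like a JSDOM instance (anything with a window property); withWindow is a hypothetical helper name, not something jsdom provides.

```javascript
// Hypothetical helper: runs a callback against a jsdom window and always
// closes that window afterwards, even if the callback throws or rejects.
async function withWindow(dom, callback) {
  try {
    return await callback(dom.window);
  } finally {
    // Terminates outstanding timers so they no longer keep the process alive.
    dom.window.close();
  }
}

// Usage, assuming jsdom is installed:
//   const { JSDOM } = require("jsdom");
//   const dom = new JSDOM(`<p>Hello</p>`, { runScripts: "dangerously" });
//   withWindow(dom, (window) => window.document.querySelector("p").textContent)
//     .then(console.log);
```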

Debugging the DOM using Chrome DevTools

In Node.js you can debug programs using Chrome DevTools. See the official documentation for how to get started.

By default jsdom elements are formatted as plain old JS objects in the console. To make it easier to debug, you can use jsdom-devtools-formatter, which lets you inspect them like real DOM elements.

Caveats

Asynchronous script loading

People often have trouble with asynchronous script loading when using jsdom. Many pages load scripts asynchronously, but there is no way to tell when they're done doing so, and thus when it's a good time to run your code and inspect the resulting DOM structure. This is a fundamental limitation; we cannot predict what scripts on the web page will do, and so cannot tell you when they are done loading more scripts.

This can be worked around in a few ways. The best way, if you control the page in question, is to use whatever mechanisms are given by the script loader to detect when loading is done. For example, if you're using a module loader like RequireJS, the code could look like:

// On the Node.js side:
const window = (new JSDOM(...)).window;
window.onModulesLoaded = () => {
  console.log("ready to roll!");
};

<!-- Inside the HTML you supply to jsdom -->
<script>
requirejs(["entry-module"], () => {
  window.onModulesLoaded();
});
</script>

If you do not control the page, you could try workarounds such as polling for the presence of a specific element.
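A polling workaround might look like the following sketch. waitForElement, its interval and timeout options, and the "#results" selector in the usage comment are all hypothetical; the helper only assumes that the object you pass in has a querySelector() method, as a jsdom document does.

```javascript
// Hypothetical helper: polls `document` until `selector` matches an element,
// or rejects once `timeout` milliseconds have passed.
function waitForElement(document, selector, { interval = 50, timeout = 2000 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeout;
    const check = () => {
      const element = document.querySelector(selector);
      if (element) {
        resolve(element);
      } else if (Date.now() >= deadline) {
        reject(new Error(`Timed out waiting for ${selector}`));
      } else {
        // Uses Node.js's own timer, not the jsdom window's.
        setTimeout(check, interval);
      }
    };
    check();
  });
}

// Usage, assuming `dom` is a JSDOM instance whose scripts add #results later:
//   waitForElement(dom.window.document, "#results").then((el) => {
//     console.log(el.textContent);
//   });
```

Polling cannot tell you that the page has finished all of its work, only that one particular element has appeared, so pick an element that the page adds late.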

For more details, see the discussion in #640, especially @matthewkastor's insightful comment.

Unimplemented parts of the web platform

Although we enjoy adding new features to jsdom and keeping it up to date with the latest web specs, it has many missing APIs. Please feel free to file an issue for anything missing, but we're a small and busy team, so a pull request might work even better.

Some features of jsdom are provided by our dependencies. Notable documentation in that regard includes the list of supported CSS selectors for our CSS selector engine, nwsapi.

Beyond just features that we haven't gotten to yet, there are two major features that are currently outside the scope of jsdom. These are:

  • Navigation: the ability to change the global object, and all other objects, when clicking a link or assigning location.href or similar.
  • Layout: the ability to calculate where elements will be visually laid out as a result of CSS, which impacts methods like getBoundingClientRect() or properties like offsetTop.

Currently jsdom has dummy behaviors for some aspects of these features, such as sending a "not implemented" "jsdomError" to the virtual console for navigation, or returning zeros for many layout-related properties. Often you can work around these limitations in your code, e.g. by creating new JSDOM instances for each page you "navigate" to during a crawl, or using Object.defineProperty() to change what various layout-related getters and methods return.
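As a rough illustration of the Object.defineProperty() workaround, the sketch below shadows two layout-related members on a single element with fixed values. stubLayout and its option names are hypothetical, and the numbers you pass are entirely up to your test; nothing here performs real layout.

```javascript
// Hypothetical helper: overrides layout-related members on one element so
// code under test sees the values you choose instead of jsdom's zeros.
function stubLayout(element, { offsetTop = 0, rect = {} } = {}) {
  Object.defineProperty(element, "offsetTop", {
    configurable: true,
    get: () => offsetTop,
  });
  Object.defineProperty(element, "getBoundingClientRect", {
    configurable: true,
    // Any field not supplied in `rect` falls back to zero.
    value: () => ({
      x: 0, y: 0, top: 0, left: 0, bottom: 0, right: 0, width: 0, height: 0,
      ...rect,
    }),
  });
}

// Usage, assuming `dom` is a JSDOM instance containing a <div id="box">:
//   const box = dom.window.document.querySelector("#box");
//   stubLayout(box, { offsetTop: 120, rect: { top: 120, height: 40 } });
```

Defining the properties on the element itself, rather than on a prototype, keeps the override local to that one node.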

Note that other tools in the same space, such as PhantomJS, do support these features. On the wiki, we have a more complete writeup about jsdom vs. PhantomJS.

Supporting jsdom

jsdom is a community-driven project maintained by a team of volunteers. You could support jsdom by:

Getting help

If you need help with jsdom, please feel free to use any of the following venues: