cheerio vs css-select vs domutils vs htmlparser2 vs jsdom vs xpath
Web Scraping and DOM Manipulation

Web scraping and DOM manipulation libraries in JavaScript provide tools for parsing, traversing, and manipulating HTML and XML documents. These libraries enable developers to extract data from web pages, modify content, and perform tasks like automated testing or data analysis. They vary in features, performance, and API design, catering to different use cases such as server-side scraping, client-side manipulation, or working with structured data. Popular libraries include cheerio for jQuery-like manipulation, jsdom for a full DOM implementation, and htmlparser2 for fast, streaming HTML parsing.

Stat Detail

Package      Downloads  Stars   Size     Issues  Last publish  License
cheerio      0          30,238  1.01 MB  44      2 months ago  MIT
css-select   0          606     213 kB   4       5 days ago    BSD-2-Clause
domutils     0          223     119 kB   9       9 days ago    BSD-2-Clause
htmlparser2  0          4,816   235 kB   9       7 days ago    MIT
jsdom        0          21,532  6.93 MB  416     8 days ago    MIT
xpath        0          233     183 kB   24      2 years ago   MIT

Feature Comparison: cheerio vs css-select vs domutils vs htmlparser2 vs jsdom vs xpath

Parsing Capabilities

  • cheerio:

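    cheerio parses HTML and XML documents quickly and exposes a jQuery-like interface for manipulation. For HTML it uses parse5 by default. parse5 follows the HTML specification's error-recovery rules, so it handles malformed markup much like a browser does. For XML, and optionally for HTML, it uses the more forgiving htmlparser2.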

  • css-select:

    css-select does not parse documents itself. It queries a tree built by another parser: by default the domhandler format produced by htmlparser2, and an adapter option lets it work with other tree formats. It focuses solely on matching elements against CSS selectors, which keeps it lightweight.

  • domutils:

    domutils does not parse documents. It provides utilities for traversing and manipulating domhandler trees, which is the node format htmlparser2 produces. It is designed to work alongside the libraries that do the parsing.

  • htmlparser2:

    htmlparser2 is a fast, streaming parser for HTML and XML that handles malformed markup gracefully. It supports incremental parsing, making it memory-efficient for large documents.

  • jsdom:

    jsdom provides a complete DOM implementation, including parsing HTML and XML documents. It adheres to web standards, making it suitable for applications that require a fully-featured DOM environment.

  • xpath:

    xpath does not parse documents but operates on existing DOM trees. It allows for querying XML and HTML documents using XPath expressions, which can be more powerful than CSS selectors for complex queries.

DOM Manipulation

  • cheerio:

    cheerio excels at DOM manipulation, offering a wide range of methods similar to jQuery. It allows for easy traversal, modification, and extraction of elements, making it ideal for web scraping and data manipulation.

  • css-select:

    css-select does not provide DOM manipulation capabilities; it is solely focused on selecting elements based on CSS selectors. Manipulation must be done using other libraries.

  • domutils:

    domutils provides basic DOM manipulation utilities, such as inserting, removing, and modifying nodes. It is lightweight and complements other libraries but lacks a comprehensive API.

  • htmlparser2:

    htmlparser2 is primarily a parsing library and does not implement DOM manipulation itself. Its parseDocument helper builds a tree with domhandler, and it re-exports domutils as DomUtils for traversing and modifying that tree.

  • jsdom:

    jsdom offers full DOM manipulation capabilities, including support for events, styles, and attributes. It provides a rich API that closely resembles the browser DOM, making it suitable for complex manipulations and testing.

  • xpath:

    xpath does not manipulate the DOM; it is used for querying and selecting nodes based on XPath expressions. Manipulation must be done using other DOM manipulation libraries.

Performance

  • cheerio:

    cheerio is lightweight and fast for parsing and manipulating HTML, but it loads the entire document into memory, which can be a limitation for very large files.

  • css-select:

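    css-select is efficient for selecting elements, especially on a lightweight tree such as the one domhandler builds. It compiles each selector into a matching function, which is also exposed directly as compile so it can be reused. Its cost therefore depends mainly on selector complexity and on how many nodes are searched.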

  • domutils:

    domutils is designed to be lightweight, with minimal overhead for each utility function. Its search helpers, such as findAll, walk the tree, so their cost grows with the number of nodes visited.

  • htmlparser2:

    htmlparser2 is one of the fastest HTML parsers available, especially for streaming and incremental parsing. It is optimized for performance and memory usage, making it suitable for large documents.

  • jsdom:

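    jsdom is more resource-intensive than the lightweight libraries because it implements a large part of the browser DOM and related web APIs. Expect higher memory use and slower parsing than with cheerio or htmlparser2. It remains practical for most server-side and testing workloads that actually need those APIs.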

  • xpath:

    xpath performance depends on the complexity of the XPath queries and the size of the DOM being queried. It can be slower than CSS selectors for simple tasks but is more powerful for complex queries.

Use Cases

  • cheerio:

    cheerio is ideal for web scraping, data extraction, and server-side HTML manipulation. Its jQuery-like API makes it easy to work with, especially for tasks that involve traversing and modifying the DOM.

  • css-select:

    css-select is useful for projects that require a lightweight, standalone CSS selector engine. It can be used in conjunction with other libraries for custom selection logic without a full DOM.

  • domutils:

    domutils is best suited for projects that need utility functions for working with DOM-like structures. It is often used alongside other libraries to enhance their functionality.

  • htmlparser2:

    htmlparser2 is perfect for applications that require fast and efficient parsing of HTML and XML, especially when handling large or malformed documents. It is commonly used in web scraping and data processing tools.

  • jsdom:

    jsdom is ideal for testing, server-side rendering, and applications that require a full DOM environment. It is widely used in automated testing frameworks and for simulating browser behavior on the server.

  • xpath:

    xpath is useful for applications that need to perform complex queries on XML or HTML documents. It is often used in data extraction, transformation, and processing tasks where advanced querying capabilities are required.

Ease of Use: Code Examples

  • cheerio:

    Web Scraping with cheerio

    const cheerio = require('cheerio');
    const axios = require('axios');
    
    async function scrapeWebsite(url) {
      const { data } = await axios.get(url);
      const $ = cheerio.load(data);
      const titles = [];
    
      $('h1, h2, h3').each((index, element) => {
        titles.push($(element).text());
      });
    
      return titles;
    }
    
    scrapeWebsite('https://example.com').then(console.log);
    
  • css-select:

    Selecting Elements with css-select

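    const { selectAll } = require('css-select');
    const { parseDocument } = require('htmlparser2');
    const { textContent } = require('domutils');
    
    const html = '<div><p class="text">Hello</p><p class="text">World</p></div>';
    // css-select does not parse HTML; it queries a domhandler tree like the one parseDocument returns
    const root = parseDocument(html);
    const elements = selectAll('.text', root);
    
    console.log(elements.map(el => textContent(el))); // Output: [ 'Hello', 'World' ]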
    
  • domutils:

    DOM Manipulation with domutils

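    const { Element } = require('domhandler');
    const { appendChild, removeElement, getOuterHTML } = require('domutils');
    
    // domutils has no element factory; create nodes with domhandler's Element class
    const root = new Element('div', {});
    const child = new Element('span', {});
    appendChild(root, child);
    
    console.log(getOuterHTML(root)); // <div><span></span></div>
    removeElement(child);
    console.log(getOuterHTML(root)); // <div></div>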
    
  • htmlparser2:

    Parsing HTML with htmlparser2

    const { Parser } = require('htmlparser2');
    
    const parser = new Parser({
      onopentag(name) { console.log('Opening tag:', name); },
      ontext(text) { console.log('Text:', text); },
      onclosetag(name) { console.log('Closing tag:', name); },
    });
    
    parser.write('<div>Hello <span>World</span></div>');
    parser.end();
    
  • jsdom:

    DOM Manipulation with jsdom

    const { JSDOM } = require('jsdom');
    const dom = new JSDOM('<!DOCTYPE html><p>Hello world</p>');
    const document = dom.window.document;
    const p = document.querySelector('p');
    p.textContent = 'Hello, jsdom!';
    console.log(p.textContent); // Output: Hello, jsdom!
    
  • xpath:

    XPath Querying with xpath

    const xpath = require('xpath');
    // The original xmldom package is deprecated; @xmldom/xmldom is its maintained successor
    const { DOMParser } = require('@xmldom/xmldom');
    
    const xml = '<root><item>1</item><item>2</item></root>';
    const doc = new DOMParser().parseFromString(xml, 'text/xml');
    const nodes = xpath.select('//item', doc);
    
    nodes.forEach(node => console.log(node.textContent)); // Output: 1, then 2 on the next line
    

How to Choose: cheerio vs css-select vs domutils vs htmlparser2 vs jsdom vs xpath

  • cheerio:

    Choose cheerio if you need a fast, lightweight library for server-side HTML manipulation with a jQuery-like API. It is ideal for web scraping and data extraction tasks where you don't need a full DOM implementation.

  • css-select:

    Select css-select when you need a standalone CSS selector engine for querying elements in an already-parsed HTML or XML tree (it parses selectors, not documents). It is useful for projects that require custom selection logic without a full DOM.

  • domutils:

    Use domutils if you need utility functions for working with DOM-like structures, especially in conjunction with other libraries. It provides a lightweight solution for manipulating and traversing trees without a full DOM.

  • htmlparser2:

    Opt for htmlparser2 when you require a fast, flexible HTML and XML parser that supports streaming and incremental parsing. It is suitable for high-performance applications where memory efficiency is critical.

  • jsdom:

    Choose jsdom if you need a full-featured, standards-compliant DOM implementation for Node.js. It is ideal for testing, server-side rendering, and applications that require a complete browser-like environment.

  • xpath:

    Select xpath when you need to perform XPath queries on XML or HTML documents. It is useful for projects that require advanced querying capabilities beyond CSS selectors.

README for cheerio

cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.


import * as cheerio from 'cheerio';
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

$.html();
//=> <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>

Installation

Install Cheerio using a package manager like npm, yarn, or bun.

npm install cheerio
# or
yarn add cheerio
# or
bun add cheerio

Features

❤ Proven syntax: Cheerio implements a subset of core jQuery. Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API.

ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result parsing, manipulating, and rendering are incredibly efficient.

❁ Incredibly flexible: Cheerio wraps around parse5 for parsing HTML and can optionally use the forgiving htmlparser2. Cheerio can parse nearly any HTML or XML document. Cheerio works in both browser and server environments.

API

Loading

First you need to load in the HTML. This step in jQuery is implicit, since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document.

// ESM or TypeScript:
import * as cheerio from 'cheerio';

// In other environments:
const cheerio = require('cheerio');

const $ = cheerio.load('<ul id="fruits">...</ul>');

$.html();
//=> <html><head></head><body><ul id="fruits">...</ul></body></html>

Selectors

Once you've loaded the HTML, you can use jQuery-style selectors to find elements within the document.

$( selector, [context], [root] )

selector searches within the context scope which searches within the root scope. selector and context can be a string expression, DOM Element, array of DOM elements, or cheerio object. root, if provided, is typically the HTML document string.

This selector method is the starting point for traversing and manipulating the document. Like in jQuery, it's the primary method for selecting elements in the document.

$('.apple', '#fruits').text();
//=> Apple

$('ul .pear').attr('class');
//=> pear

$('li[class=orange]').html();
//=> Orange

Rendering

When you're ready to render the document, you can call the html method on the "root" selection:

$.root().html();
//=>  <html>
//      <head></head>
//      <body>
//        <ul id="fruits">
//          <li class="apple">Apple</li>
//          <li class="orange">Orange</li>
//          <li class="pear">Pear</li>
//        </ul>
//      </body>
//    </html>

If you want to render the outerHTML of a selection, you can use the outerHTML prop:

$('.pear').prop('outerHTML');
//=> <li class="pear">Pear</li>

You may also render the text content of a Cheerio object using the text method:

const $ = cheerio.load('This is <em>content</em>.');
$('body').text();
//=> This is content.

The "DOM Node" object

Cheerio collections are made up of objects that bear some resemblance to browser-based DOM nodes. You can expect them to define the following properties:

  • tagName
  • parentNode
  • previousSibling
  • nextSibling
  • nodeValue
  • firstChild
  • childNodes
  • lastChild

Screencasts

https://vimeo.com/31950192

This video tutorial is a follow-up to Nettut's "How to Scrape Web Pages with Node.js and jQuery", using cheerio instead of JSDOM + jQuery. This video shows how easy it is to use cheerio and how much faster cheerio is than JSDOM + jQuery.

Cheerio in the real world

Are you using cheerio in production? Add it to the wiki!

Sponsors

Does your company use Cheerio in production? Please consider sponsoring this project! Your help will allow maintainers to dedicate more time and resources to its development and support.

Headlining Sponsors

Tidelift Github AirBnB HasData

Other Sponsors

OnlineCasinosSpelen Nieuwe-Casinos.net

Backers

Become a backer to show your support for Cheerio and help us maintain and improve this open source project.

Vasy Kafidoff

License

MIT