cheerio vs domino vs jsdom vs puppeteer
Web Scraping and DOM Manipulation Libraries

These libraries are widely used tools for web scraping and DOM manipulation in Node.js. Between them they parse HTML, implement or drive a browser, and let you interact with web pages programmatically. Each has its own strengths, which suit it to different use cases, from lightweight HTML parsing to full browser automation.

Stat Detail

Package     Stars    Size      Issues   Last publish   License
cheerio     30,150   1.01 MB   35       a month ago    MIT
domino      789      -         37       6 years ago    BSD-2-Clause
jsdom       21,514   3.4 MB    426      21 days ago    MIT
puppeteer   93,737   63 kB     286      3 days ago     Apache-2.0

Feature Comparison: cheerio vs domino vs jsdom vs puppeteer

Parsing and Manipulation

  • cheerio:

    Cheerio provides a fast and flexible API for parsing HTML and XML, allowing you to manipulate the DOM using a jQuery-like syntax. It is optimized for performance and is particularly effective for server-side web scraping.

  • domino:

    Domino offers a lightweight DOM implementation that allows basic manipulation and rendering of HTML. It is less feature-rich than the others but is useful for simple tasks where a full browser environment is not necessary.

  • jsdom:

    jsdom provides a full-fledged DOM and HTML environment that closely mimics browser behavior. It supports a wide range of web standards, making it suitable for complex DOM manipulations and testing.

  • puppeteer:

    Puppeteer allows you to manipulate the DOM of a real browser instance, enabling you to interact with web pages as a user would. This includes handling dynamic content, form submissions, and more.

Browser Simulation

  • cheerio:

    Cheerio does not simulate a browser; it simply parses HTML and allows manipulation. It is not suitable for handling JavaScript-heavy sites that require a browser context.

  • domino:

    Domino provides a basic simulation of the DOM but lacks the full capabilities of a browser. It is useful for rendering static HTML but not for executing scripts.

  • jsdom:

    jsdom simulates a browser environment, allowing you to run scripts (an opt-in behavior controlled by its runScripts option) and interact with the DOM as if in a real browser. This makes it suitable for testing and running client-side code in Node.js.

  • puppeteer:

    Puppeteer does not simulate a browser: it drives a real Chrome or Chromium instance (headless by default), so it can execute JavaScript, interact with elements, and capture screenshots or PDFs. It is the most powerful of the four for browser automation.

Performance

  • cheerio:

    Cheerio is highly performant for parsing and manipulating static HTML due to its lightweight nature. It is optimized for speed and is suitable for handling large amounts of HTML quickly.

  • domino:

    Domino is lightweight and fast for basic DOM manipulations but may not perform as well with complex HTML structures compared to Cheerio or jsdom.

  • jsdom:

    jsdom is slower than Cheerio due to its comprehensive feature set and standards compliance. It is more suitable for scenarios where accurate browser behavior is essential.

  • puppeteer:

    Puppeteer is powerful but can be slower due to the overhead of launching a full browser instance. It is best used when you need the capabilities of a real browser.

Use Cases

  • cheerio:

    Cheerio is best suited for server-side web scraping and simple HTML manipulation tasks where speed is critical and JavaScript execution is not required.

  • domino:

    Domino is ideal for lightweight applications that need basic DOM manipulation without the overhead of a full browser environment.

  • jsdom:

    jsdom is perfect for testing client-side libraries and applications in a Node.js environment, as well as for scenarios where a more accurate DOM simulation is needed.

  • puppeteer:

    Puppeteer is the go-to choice for automated testing, scraping dynamic content, and generating visual outputs like screenshots and PDFs.

Learning Curve

  • cheerio:

    Cheerio has a low learning curve, especially for those familiar with jQuery. Its API is straightforward and easy to grasp for basic HTML manipulation.

  • domino:

    Domino is simple to use but may require some understanding of the DOM API. It is less complex than jsdom but offers fewer features.

  • jsdom:

    jsdom has a moderate learning curve due to its comprehensive feature set. Familiarity with browser APIs is beneficial for effective use.

  • puppeteer:

    Puppeteer has a steeper learning curve due to its extensive capabilities and the need to understand browser automation concepts, but it is well-documented.

How to Choose: cheerio vs domino vs jsdom vs puppeteer

  • cheerio:

    Choose Cheerio if you need a fast and lightweight library for parsing and manipulating HTML and XML. It is ideal for server-side web scraping where you want a jQuery-like syntax without the overhead of a full browser environment.

  • domino:

    Select Domino if you require a minimalistic DOM implementation that can be used for server-side rendering and manipulation. It is particularly useful for environments where you want to simulate a DOM without the complexities of a full browser context.

  • jsdom:

    Opt for jsdom if you need a comprehensive and standards-compliant DOM and HTML environment. It is suitable for testing and simulating browser behavior in Node.js, allowing you to run scripts as if they were in a real browser.

  • puppeteer:

    Use Puppeteer if you need to control a headless Chrome browser for tasks like automated testing, scraping dynamic content, or generating screenshots and PDFs. It provides a high-level API over the Chrome DevTools Protocol, making it powerful for browser automation.
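The guidance above can be condensed into a small lookup. This is a hypothetical helper: the function name and its requirement flags are invented for illustration and are not part of any of these packages.

```javascript
// Hypothetical helper: chooseLibrary and its flags are illustrative, not part of any package.
// Checks run from the most to the least capable tool, mirroring the "How to Choose" guidance.
function chooseLibrary({
  needsRealBrowser = false,     // screenshots, PDFs, pages that only work in a real Chrome/Chromium
  needsScriptExecution = false, // run client-side code or tests against a standards-compliant DOM
  needsDomApi = false,          // prefer the W3C DOM API (querySelector, textContent) over jQuery syntax
} = {}) {
  if (needsRealBrowser) return 'puppeteer';
  if (needsScriptExecution) return 'jsdom';
  if (needsDomApi) return 'domino';
  return 'cheerio'; // default: fast static parsing with jQuery-like selectors
}

console.log(chooseLibrary({}));                             // cheerio
console.log(chooseLibrary({ needsScriptExecution: true })); // jsdom
```

jsdom also offers the DOM API; the helper routes that case to domino only because the comparison positions domino as the minimal option. Real projects may weigh maintenance status (see the last-publish column in the table above) as well.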

README for cheerio

cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

Chinese Readme

import * as cheerio from 'cheerio';
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

$.html();
//=> <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>

Installation

Install Cheerio using a package manager like npm, yarn, or bun.

npm install cheerio
# or
yarn add cheerio
# or
bun add cheerio

Features

❤ Proven syntax: Cheerio implements a subset of core jQuery. Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API.

ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result, parsing, manipulating, and rendering are incredibly efficient.

❁ Incredibly flexible: Cheerio wraps around parse5 for parsing HTML and can optionally use the forgiving htmlparser2. Cheerio can parse nearly any HTML or XML document. Cheerio works in both browser and server environments.

API

Loading

First you need to load in the HTML. This step in jQuery is implicit, since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document.

// ESM or TypeScript:
import * as cheerio from 'cheerio';

// In other environments:
const cheerio = require('cheerio');

const $ = cheerio.load('<ul id="fruits">...</ul>');

$.html();
//=> <html><head></head><body><ul id="fruits">...</ul></body></html>

Selectors

Once you've loaded the HTML, you can use jQuery-style selectors to find elements within the document.

$( selector, [context], [root] )

selector is searched for within the context scope, and context is in turn searched for within the root scope. selector and context can each be a string expression, a DOM element, an array of DOM elements, or a Cheerio object. root, if provided, is typically the HTML document string.

This selector method is the starting point for traversing and manipulating the document. Like in jQuery, it's the primary method for selecting elements in the document.

$('.apple', '#fruits').text();
//=> Apple

$('ul .pear').attr('class');
//=> pear

$('li[class=orange]').html();
//=> Orange

Rendering

When you're ready to render the document, you can call the html method on the "root" selection:

$.root().html();
//=>  <html>
//      <head></head>
//      <body>
//        <ul id="fruits">
//          <li class="apple">Apple</li>
//          <li class="orange">Orange</li>
//          <li class="pear">Pear</li>
//        </ul>
//      </body>
//    </html>

If you want to render the outerHTML of a selection, you can use the outerHTML prop:

$('.pear').prop('outerHTML');
//=> <li class="pear">Pear</li>

You may also render the text content of a Cheerio object using the text method:

const $ = cheerio.load('This is <em>content</em>.');
$('body').text();
//=> This is content.

The "DOM Node" object

Cheerio collections are made up of objects that bear some resemblance to browser-based DOM nodes. You can expect them to define the following properties:

  • tagName
  • parentNode
  • previousSibling
  • nextSibling
  • nodeValue
  • firstChild
  • childNodes
  • lastChild

Screencasts

https://vimeo.com/31950192

This video tutorial is a follow-up to Nettuts+'s "How to Scrape Web Pages with Node.js and jQuery", using cheerio instead of jsdom + jQuery. It shows how easy cheerio is to use and how much faster it is than jsdom + jQuery.

Cheerio in the real world

Are you using cheerio in production? Add it to the wiki!

Sponsors

Does your company use Cheerio in production? Please consider sponsoring this project! Your help will allow maintainers to dedicate more time and resources to its development and support.

Headlining Sponsors

Tidelift GitHub Airbnb HasData

Other Sponsors

OnlineCasinosSpelen Nieuwe-Casinos.net

Backers

Become a backer to show your support for Cheerio and help us maintain and improve this open source project.

Vasy Kafidoff

License

MIT