cheerio vs html vs htmlparser2 vs jsdom
HTML Parsing and Manipulation

HTML parsing libraries in JavaScript read, manipulate, and extract data from HTML documents. They are essential for web scraping, server-side rendering, and DOM manipulation outside the browser, particularly in Node.js, where the browser's DOM API is not available. Typical features include parsing HTML strings, traversing the resulting tree, and modifying elements, attributes, and content. Which library to choose depends on the project's needs: performance requirements, ease of use, and the complexity of the HTML manipulation involved.

Stat Detail

Package       Downloads   Stars    Size      Issues   Publish        License
cheerio       0           30,298   1.01 MB   39       3 months ago   MIT
html          0           76       -         11       10 years ago   BSD
htmlparser2   0           4,777    235 kB    12       a month ago    MIT
jsdom         0           21,564   7.03 MB   411      19 hours ago   MIT

Feature Comparison: cheerio vs html vs htmlparser2 vs jsdom

Parsing Speed

  • cheerio:

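    cheerio parses HTML with parse5 by default and can optionally use htmlparser2, a faster and more forgiving parser. Because it works with a simple, consistent DOM model instead of emulating a browser, parsing stays fast, which makes it a good choice for web scraping and other tasks that need quick HTML parsing.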

  • html:

    html is designed to be lightweight and efficient, providing fast parsing and serialization of HTML documents. It is suitable for applications that need quick processing of HTML without significant overhead.

  • htmlparser2:

    htmlparser2 is one of the fastest HTML parsers available in the Node.js ecosystem. It is designed for performance, especially when dealing with large documents, making it ideal for streaming and memory-efficient parsing.

  • jsdom:

    jsdom is slower compared to the other libraries because it emulates a full browser environment. The performance trade-off is worth it for applications that need a complete DOM implementation, but it may not be suitable for tasks that require only simple parsing.

DOM Manipulation

  • cheerio:

    cheerio provides a jQuery-like API for manipulating the DOM, making it easy to select, modify, and traverse elements. It is particularly useful for tasks like web scraping, where you need to extract or modify content quickly.

  • html:

    html provides basic DOM manipulation capabilities, but it is not as feature-rich as cheerio or jsdom. It is suitable for simple tasks that require minimal manipulation of HTML elements.

  • htmlparser2:

    htmlparser2 focuses on parsing rather than manipulation. Its core is a low-level, event-driven API for HTML and XML. It can also build a DOM tree and exposes basic helper functions for walking and editing that tree, but it has no CSS selector engine or jQuery-style API, so developers often combine it with other libraries for manipulation tasks.

  • jsdom:

    jsdom offers a full-featured DOM API, including support for advanced features like event handling, CSSOM, and more. It is the best choice for applications that require comprehensive DOM manipulation and a browser-like environment.

Memory Usage

  • cheerio:

    cheerio is memory-efficient, especially when compared to full browser emulation libraries. However, it still loads the entire HTML document into memory, which can be a concern for very large documents.

  • html:

    html is lightweight and has a small memory footprint, making it suitable for applications that need to process HTML without consuming significant resources.

  • htmlparser2:

    htmlparser2 is designed for low memory usage, particularly when used in streaming mode. It is ideal for applications that need to parse large documents without loading them entirely into memory.

  • jsdom:

    jsdom consumes more memory than the other libraries because it creates a complete DOM tree and emulates a browser environment. This makes it less suitable for memory-constrained applications.

Feature Completeness

  • cheerio:

    cheerio provides a comprehensive set of features for HTML manipulation, including support for CSS selectors, attribute manipulation, and content editing. It is a great all-around tool for web scraping and simple DOM tasks.

  • html:

    html offers basic features for parsing and serializing HTML, but it lacks advanced capabilities like CSS selector support or event handling. It is best used for simple tasks that do not require extensive functionality.

  • htmlparser2:

    htmlparser2 is focused on parsing. Beyond its event callbacks it offers only basic tree-building and helper functions, with no selector engine or high-level manipulation API. It is a low-level library that excels at parsing but requires additional tools for more complex tasks.

  • jsdom:

    jsdom is the most feature-complete library in this group, offering a full implementation of the DOM, including support for events, styles, and more. It is ideal for applications that need a complete web environment in Node.js.

Ease of Use: Code Examples

  • cheerio:

    cheerio is easy to use, especially for developers familiar with jQuery. Its API is intuitive and well-documented, making it quick to learn and implement.

  • html:

    html has a simple API that is easy to understand, but its lack of advanced features may require developers to implement additional functionality on their own.

  • htmlparser2:

    htmlparser2 has a more complex API due to its low-level nature. It may take some time for developers to become proficient, especially if they are not familiar with event-driven parsing.

  • jsdom:

    jsdom has a comprehensive API that mirrors the browser DOM, but its complexity can be overwhelming for beginners. It is well-documented, which helps ease the learning curve.

How to Choose: cheerio vs html vs htmlparser2 vs jsdom

  • cheerio:

    Choose cheerio if you need a fast and lightweight solution for parsing and manipulating HTML on the server side. It is ideal for web scraping and simple DOM manipulation tasks without the overhead of a full browser environment.

  • html:

    Choose html if you need a simple and efficient way to parse and serialize HTML documents. It is suitable for projects that require basic HTML manipulation without the need for complex features or a large API.

  • htmlparser2:

    Choose htmlparser2 if you need a high-performance, event-driven parser for handling large HTML or XML documents. It is ideal for applications that require streaming parsing and low memory usage, such as web crawlers and data extraction tools.

  • jsdom:

    Choose jsdom if you need a full-featured DOM implementation in Node.js that closely mimics a real browser environment. It is suitable for applications that require advanced DOM manipulation, event handling, and support for modern web APIs.

README for cheerio

cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

中文文档 (Chinese Readme)

import * as cheerio from 'cheerio';
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

$.html();
//=> <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>

Installation

Install Cheerio using a package manager like npm, yarn, or bun.

npm install cheerio
# or
bun add cheerio

Features

❤ Proven syntax: Cheerio implements a subset of core jQuery. Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API.

ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result parsing, manipulating, and rendering are incredibly efficient.

❁ Incredibly flexible: Cheerio wraps around parse5 for parsing HTML and can optionally use the forgiving htmlparser2. Cheerio can parse nearly any HTML or XML document. Cheerio works in both browser and server environments.

API

Loading

First you need to load in the HTML. This step in jQuery is implicit, since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document.

// ESM or TypeScript:
import * as cheerio from 'cheerio';

// In other environments:
const cheerio = require('cheerio');

const $ = cheerio.load('<ul id="fruits">...</ul>');

$.html();
//=> <html><head></head><body><ul id="fruits">...</ul></body></html>

Selectors

Once you've loaded the HTML, you can use jQuery-style selectors to find elements within the document.

$( selector, [context], [root] )

selector searches within the context scope, which in turn searches within the root scope. selector and context can each be a string expression, a DOM element, an array of DOM elements, or a cheerio object. root, if provided, is typically the HTML document string.

This selector method is the starting point for traversing and manipulating the document. Like in jQuery, it's the primary method for selecting elements in the document.

$('.apple', '#fruits').text();
//=> Apple

$('ul .pear').attr('class');
//=> pear

$('li[class=orange]').html();
//=> Orange

Rendering

When you're ready to render the document, you can call the html method on the "root" selection:

$.root().html();
//=>  <html>
//      <head></head>
//      <body>
//        <ul id="fruits">
//          <li class="apple">Apple</li>
//          <li class="orange">Orange</li>
//          <li class="pear">Pear</li>
//        </ul>
//      </body>
//    </html>

If you want to render the outerHTML of a selection, you can use the outerHTML prop:

$('.pear').prop('outerHTML');
//=> <li class="pear">Pear</li>

You may also render the text content of a Cheerio object using the text method:

const $ = cheerio.load('This is <em>content</em>.');
$('body').text();
//=> This is content.

The "DOM Node" object

Cheerio collections are made up of objects that bear some resemblance to browser-based DOM nodes. You can expect them to define the following properties:

  • tagName
  • parentNode
  • previousSibling
  • nextSibling
  • nodeValue
  • firstChild
  • childNodes
  • lastChild

Screencasts

https://vimeo.com/31950192

This video tutorial is a follow-up to Nettut's "How to Scrape Web Pages with Node.js and jQuery", using cheerio instead of JSDOM + jQuery. This video shows how easy it is to use cheerio and how much faster cheerio is than JSDOM + jQuery.

Cheerio in the real world

Are you using cheerio in production? Add it to the wiki!

License

MIT