rss-parser vs feedparser
RSS and Atom Feed Parsing Comparison
What's RSS and Atom Feed Parsing?

RSS and Atom Feed Parsing libraries in Node.js are tools that allow developers to read and extract data from RSS (Really Simple Syndication) and Atom feeds. These feeds are XML-based formats used by websites to syndicate content, such as blog posts, news articles, and podcasts. Parsing these feeds enables applications to aggregate, display, or process the content programmatically. feedparser is a streaming XML parser specifically designed for parsing RSS and Atom feeds, while rss-parser is a lightweight and easy-to-use library that provides a simple API for parsing RSS and Atom feeds into JavaScript objects.

Stat Detail

Package      Downloads   Stars   Size      Issues   Last Publish   License
rss-parser   404,521     1,453   1.87 MB   67       2 years ago    MIT
feedparser   18,030      1,973   -         20       5 years ago    MIT
Feature Comparison: rss-parser vs feedparser

Parsing Methodology

  • rss-parser:

    rss-parser parses feeds by fetching the entire XML document and then converting it into a JavaScript object. While this method is simple and effective for most feeds, it may not be as memory-efficient as streaming parsers for very large feeds.

  • feedparser:

    feedparser uses a streaming approach to parse feeds, which means it processes the XML data as it is received, rather than loading the entire feed into memory. This makes it highly efficient for parsing large feeds without consuming excessive memory.

Event-Driven vs. Promise-Based

  • rss-parser:

    rss-parser provides a promise-based API for parsing feeds, which allows for easy integration with asynchronous code. This makes it simple to use in modern JavaScript applications, especially those that leverage async/await syntax.

  • feedparser:

    feedparser follows an event-driven architecture, emitting events as different parts of the feed are parsed (e.g., meta, readable, end). This allows for real-time processing of feed data, making it suitable for applications that need to handle data as it arrives.

Memory Usage

  • rss-parser:

    rss-parser may consume more memory when parsing large feeds, as it loads the entire XML document into memory before processing it. This can be a limitation for applications that need to handle very large feeds or operate in memory-constrained environments.

  • feedparser:

    feedparser is designed to be memory-efficient, as it processes data in chunks and does not require the entire feed to be loaded into memory. This makes it a good choice for parsing large feeds or handling multiple feeds simultaneously without risking memory overflow.

Customization and Extensibility

  • rss-parser:

    rss-parser provides basic customization options, such as mapping additional or renamed XML fields into the output via its customFields option and passing options through to the underlying xml2js parser. However, it is less extensible than feedparser for more complex or specialized parsing requirements.

  • feedparser:

    feedparser offers a high degree of customization, allowing developers to implement their own logic for handling different feed elements. Its event-driven nature makes it easy to extend and adapt for specialized parsing needs.

Example Code

  • rss-parser:

    Example of using rss-parser to parse a feed:

    const Parser = require('rss-parser');
    const parser = new Parser();
    
    (async () => {
      const feed = await parser.parseURL('https://example.com/feed');
      console.log(feed.title);
      feed.items.forEach(item => {
        console.log(`${item.title} - ${item.link}`);
      });
    })();
    
  • feedparser:

    Example of using feedparser to parse a feed:

    const FeedParser = require('feedparser');
    const request = require('request'); // note: 'request' is deprecated; any readable stream of the feed XML can be piped in
    
    const req = request('https://example.com/feed');
    const feedparser = new FeedParser();
    
    req.on('error', (error) => console.error(error));
    req.pipe(feedparser);
    
    feedparser.on('error', (error) => console.error(error));
    feedparser.on('readable', () => {
      let item;
      // read() must be called on the parser instance; an arrow function
      // does not bind `this` to the stream
      while ((item = feedparser.read())) {
        console.log(item.title);
      }
    });
    
    feedparser.on('end', () => console.log('Feed parsing completed.'));
    
How to Choose: rss-parser vs feedparser
  • rss-parser:

    Choose rss-parser if you prefer a lightweight and simple solution with a straightforward API for quick parsing of RSS and Atom feeds. It is suitable for projects where ease of use and quick integration are priorities, and it handles most common use cases without extensive configuration.

  • feedparser:

    Choose feedparser if you need a robust, streaming parser that can handle large feeds efficiently and provides detailed event-driven parsing. It is ideal for applications that require fine-grained control over the parsing process and can handle both RSS and Atom formats seamlessly.

README for rss-parser

rss-parser


A small library for turning RSS XML feeds into JavaScript objects.

Installation

npm install --save rss-parser

Usage

You can parse RSS from a URL (parser.parseURL) or an XML string (parser.parseString).

Both callbacks and Promises are supported.

NodeJS

Here's an example in NodeJS using Promises with async/await:

let Parser = require('rss-parser');
let parser = new Parser();

(async () => {

  let feed = await parser.parseURL('https://www.reddit.com/.rss');
  console.log(feed.title);

  feed.items.forEach(item => {
    console.log(item.title + ':' + item.link)
  });

})();

TypeScript

When using TypeScript, you can set a type to control the custom fields:

import Parser from 'rss-parser';

type CustomFeed = {foo: string};
type CustomItem = {bar: number};

const parser: Parser<CustomFeed, CustomItem> = new Parser({
  customFields: {
    feed: ['foo', 'baz'],
    //            ^ will error because `baz` is not a key of CustomFeed
    item: ['bar']
  }
});

(async () => {

  const feed = await parser.parseURL('https://www.reddit.com/.rss');
  console.log(feed.title); // feed will have a `foo` property, type as a string

  feed.items.forEach(item => {
    console.log(item.title + ':' + item.link) // item will have a `bar` property type as a number
  });
})();

Web

We recommend using a bundler like webpack, but we also provide pre-built browser distributions in the dist/ folder. If you use the pre-built distribution, you'll need a polyfill for Promise support.

Here's an example in the browser using callbacks:

<script src="/node_modules/rss-parser/dist/rss-parser.min.js"></script>
<script>

// Note: some RSS feeds can't be loaded in the browser due to CORS security.
// To get around this, you can use a proxy.
const CORS_PROXY = "https://cors-anywhere.herokuapp.com/"

let parser = new RSSParser();
parser.parseURL(CORS_PROXY + 'https://www.reddit.com/.rss', function(err, feed) {
  if (err) throw err;
  console.log(feed.title);
  feed.items.forEach(function(entry) {
    console.log(entry.title + ':' + entry.link);
  })
})

</script>

Upgrading from v2 to v3

A few minor breaking changes were made in v3. Here's what you need to know:

  • You need to construct a new Parser() before calling parseString or parseURL
  • parseFile is no longer available (for better browser support)
  • options are now passed to the Parser constructor
  • parsed.feed is now just feed (top-level object removed)
  • feed.entries is now feed.items (to better match RSS XML)

Output

Check out the full output format in test/output/reddit.json

feedUrl: 'https://www.reddit.com/.rss'
title: 'reddit: the front page of the internet'
description: ""
link: 'https://www.reddit.com/'
items:
    - title: 'The water is too deep, so he improvises'
      link: 'https://www.reddit.com/r/funny/comments/3skxqc/the_water_is_too_deep_so_he_improvises/'
      pubDate: 'Thu, 12 Nov 2015 21:16:39 +0000'
      creator: "John Doe"
      content: '<a href="http://example.com">this is a link</a> &amp; <b>this is bold text</b>'
      contentSnippet: 'this is a link & this is bold text'
      guid: 'https://www.reddit.com/r/funny/comments/3skxqc/the_water_is_too_deep_so_he_improvises/'
      categories:
          - funny
      isoDate: '2015-11-12T21:16:39.000Z'
Notes:
  • The contentSnippet field strips out HTML tags and unescapes HTML entities
  • The dc: prefix will be removed from all fields
  • Both dc:date and pubDate will be available in ISO 8601 format as isoDate
  • If author is specified, but not dc:creator, creator will be set to author (see article)
  • Atom's updated becomes lastBuildDate for consistency

XML Options

Custom Fields

If your RSS feed contains fields that aren't currently returned, you can access them using the customFields option.

let parser = new Parser({
  customFields: {
    feed: ['otherTitle', 'extendedDescription'],
    item: ['coAuthor','subtitle'],
  }
});

parser.parseURL('https://www.reddit.com/.rss', function(err, feed) {
  console.log(feed.extendedDescription);

  feed.items.forEach(function(entry) {
    console.log(entry.coAuthor + ':' + entry.subtitle);
  })
})

To rename fields, you can pass in an array with two items, in the format [fromField, toField]:

let parser = new Parser({
  customFields: {
    item: [
      ['dc:coAuthor', 'coAuthor'],
    ]
  }
})

To pass additional flags, provide an object as the third array item. Currently there are two such flags:

  • keepArray (false) - set to true to return all values for fields that can have multiple entries.
  • includeSnippet (false) - set to true to add an additional field, ${toField}Snippet, with HTML stripped out

let parser = new Parser({
  customFields: {
    item: [
      ['media:content', 'media:content', {keepArray: true}],
    ]
  }
})

Default RSS version

If your RSS Feed doesn't contain a <rss> tag with a version attribute, you can pass a defaultRSS option for the Parser to use:

let parser = new Parser({
  defaultRSS: 2.0
});

xml2js passthrough

rss-parser uses xml2js to parse XML. You can pass these options to new xml2js.Parser() by specifying options.xml2js:

let parser = new Parser({
  xml2js: {
    emptyTag: '--EMPTY--',
  }
});

HTTP Options

Timeout

You can set the amount of time (in milliseconds) to wait before the HTTP request times out (default 60 seconds):

let parser = new Parser({
  timeout: 1000,
});

Headers

You can pass headers to the HTTP request:

let parser = new Parser({
  headers: {'User-Agent': 'something different'},
});

Redirects

By default, parseURL will follow up to five redirects. You can change this with options.maxRedirects.

let parser = new Parser({maxRedirects: 100});

Request passthrough

rss-parser uses Node's built-in http/https modules to make requests. You can pass options through to http.get()/https.get() by specifying options.requestOptions:

e.g. to allow a self-signed certificate:

let parser = new Parser({
  requestOptions: {
    rejectUnauthorized: false
  }
});

Contributing

Contributions are welcome! If you are adding a feature or fixing a bug, please be sure to add a test case.

Running Tests

The tests run the RSS parser for several sample RSS feeds in test/input and output the resulting JSON into test/output. If there are any changes to the output files, the tests will fail.

To check if your changes affect the output of any test cases, run

npm test

To update the output files with your changes, run

WRITE_GOLDEN=true npm test

Publishing Releases

npm run build
git commit -a -m "Build distribution"
npm version minor # or major/patch
npm publish
git push --follow-tags