compromise vs natural vs retext
Natural Language Processing and Text Analysis in JavaScript

compromise, natural, and retext are three distinct JavaScript libraries for working with text, and they solve different problems. compromise is a lightweight, rule-based NLP library that runs in the browser and in Node.js, focusing on fast text matching and tagging without heavy dependencies. natural is a general-purpose NLP toolkit that provides classic algorithms such as stemming, TF-IDF, and classifiers; it targets Node.js first and needs bundling for browser use. retext is a plugin-based text processor focused on prose quality and style analysis, built on the unified ecosystem. In short, natural offers broad algorithmic coverage, compromise prioritizes speed and simplicity for frontend tasks, and retext is suited to linting and transforming human-readable content.

Stat Detail

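| Package | Downloads | Stars | Size | Issues | Publish | License |
|---|---|---|---|---|---|---|
| compromise | 0 | 12,102 | 2.59 MB | 118 | 2 days ago | MIT |
| natural | 0 | 10,878 | 13.8 MB | 82 | 3 months ago | MIT |
| retext | 0 | 2,431 | 10.3 kB | 0 | 3 years ago | MIT |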

Natural Language Processing in JavaScript: compromise vs natural vs retext

When adding text intelligence to a JavaScript application, developers often face a choice between speed, algorithmic depth, and prose quality. compromise, natural, and retext represent three different approaches to this challenge. compromise focuses on lightweight pattern matching in the browser. natural provides a broad suite of classic NLP algorithms for Node and web. retext specializes in analyzing and improving human writing through a plugin system. Let's compare how they handle common text processing tasks.

🔍 Core Purpose: Extraction vs Analysis vs Proofreading

compromise is built for quick text extraction and tagging.

  • It treats text as a sequence of terms to match against rules.
  • Best for finding names, dates, or specific phrases in user input.
// compromise: Tagging and matching
import nlp from 'compromise';

const doc = nlp('John Smith bought 5 apples in New York');
const people = doc.match('#Person').out('array');
// Output: ['John Smith']

const places = doc.match('#Place').out('array');
// Output: ['New York']

natural offers a toolkit of standard NLP algorithms.

  • It includes stemmers, classifiers, and spell checkers.
  • Best for backend processing or heavy analysis tasks.
// natural: Stemming and analysis
import natural from 'natural';

const stemmer = natural.PorterStemmer;
const word = 'running';
const stemmed = stemmer.stem(word);
// Output: 'run'

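const tfidf = new natural.TfIdf();
tfidf.addDocument('this document is about cheese');
tfidf.addDocument('this document is about milk');
// Score every document against a term; a higher measure means a more distinctive term
tfidf.tfidfs('cheese', (i, measure) => {
  console.log(`document ${i}: ${measure}`);
});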
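
To make the TF-IDF idea concrete, here is a minimal dependency-free sketch. It assumes the textbook weighting (raw term count multiplied by ln(N / df)); natural's exact formula may differ, and the names `tokenize` and `tfidfScores` are hypothetical helpers, not part of any library.

```javascript
// Tokenize on whitespace; real tokenizers also handle punctuation and stopwords.
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// Returns one score per document for the given term.
function tfidfScores(term, documents) {
  const docs = documents.map(tokenize);
  // Document frequency: how many documents contain the term at least once.
  const df = docs.filter((tokens) => tokens.includes(term)).length;
  // Textbook inverse document frequency; 0 when the term never occurs.
  const idf = df === 0 ? 0 : Math.log(docs.length / df);
  // Term frequency is the raw count of the term in each document.
  return docs.map((tokens) => tokens.filter((t) => t === term).length * idf);
}

const corpus = [
  'this document is about cheese',
  'this document is about milk',
];

console.log(tfidfScores('cheese', corpus));   // positive for the first document only
console.log(tfidfScores('document', corpus)); // zero everywhere: the term is in every document
```

Words shared by every document carry no weight under this formula, which is why TF-IDF is useful for finding what makes one document different from the rest.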

retext focuses on prose quality and grammar.

  • It parses text into a syntax tree for detailed inspection.
  • Best for linting content, checking readability, or enforcing style guides.
// retext: Prose analysis
import { retext } from 'retext'; // named export in current ESM releases
import retextEnglish from 'retext-english';
import retextReadability from 'retext-readability';

const processor = retext()
  .use(retextEnglish)
  .use(retextReadability);

const result = await processor.process('This sentence is short.');
// result.messages lists warnings for hard-to-read sentences (likely empty for a sentence this simple)

⚡ Performance and Environment: Browser vs Node

compromise is optimized for the browser.

  • It avoids heavy dependencies and large models.
  • Parsing is synchronous, but fast enough to run on every keypress for typical input.
// compromise: Browser-friendly
import nlp from 'compromise';

// A bare import like this needs a bundler or an import map; a <script> tag from a CDN also works
const doc = nlp('Hello world');
console.log(doc.json());

natural works in Node and browser but is heavier.

  • Some features rely on Node-specific modules (like file system).
  • Requires bundling configuration for frontend use.
// natural: Node-focused features
import natural from 'natural';

// Some classifiers may need training data loaded from files
const classifier = new natural.BayesClassifier();
classifier.addDocument('i feel good', 'positive');
classifier.train();

retext is environment-agnostic but plugin-dependent.

  • Core is small, but plugins add weight.
  • Fully compatible with browser and Node environments.
// retext: Plugin architecture
import { retext } from 'retext';
import retextSyllables from 'retext-syllables';

// Add only the plugins you need
const processor = retext().use(retextSyllables);
const result = await processor.process('Example text');

🛠️ API Design: Chains vs Objects vs Processors

compromise uses a chainable API on document objects.

  • Methods return new document objects for further filtering.
  • Feels like querying a database of terms.
// compromise: Chaining queries
import nlp from 'compromise';

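const doc = nlp('Steve Jobs founded Apple in 1976');

const founder = doc
  .if('founded')       // keep only sentences that mention 'founded'
  .match('#Person+')   // then select the person names in them
  .out('text');
// Expected: 'Steve Jobs' (depends on the tagger recognizing the name)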
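
The "each method returns a new document" idea can be sketched without the library. The `TermView` class below is hypothetical and only mimics the chaining pattern over hand-tagged terms; compromise's own document objects are far richer and infer tags for you.

```javascript
// Hypothetical mini "view" over pre-tagged terms (not compromise's implementation).
class TermView {
  constructor(terms) {
    this.terms = terms;
  }

  // Keep only terms that carry the tag. Returns a NEW view; the original is untouched.
  withTag(tag) {
    return new TermView(this.terms.filter((term) => term.tags.includes(tag)));
  }

  // Drop terms that carry the tag, again returning a new view.
  withoutTag(tag) {
    return new TermView(this.terms.filter((term) => !term.tags.includes(tag)));
  }

  // Join the remaining terms back into a string.
  text() {
    return this.terms.map((term) => term.text).join(' ');
  }
}

// Tags are assigned by hand here; a real tagger would infer them.
const view = new TermView([
  { text: 'John', tags: ['Person', 'Noun'] },
  { text: 'bought', tags: ['Verb'] },
  { text: 'apples', tags: ['Noun', 'Plural'] },
]);

console.log(view.withTag('Noun').withoutTag('Person').text()); // → 'apples'
console.log(view.text()); // → 'John bought apples'
```

Because every step returns a fresh view, a query can be narrowed step by step while the parent document stays available for other queries.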

natural uses class-based instantiation.

  • You create instances of tools (stemmer, classifier, etc.).
  • More traditional object-oriented approach.
// natural: Class instances
import natural from 'natural';

const spellcheck = new natural.Spellcheck(['apple', 'apply']);
const corrections = spellcheck.getCorrections('appl', 1);
// Both 'apple' and 'apply' are within edit distance 1 of 'appl'
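
The second argument in the spell-check call above is a maximum edit distance. As a rough illustration of what that means, the sketch below filters a word list by Levenshtein distance. This is an assumption about the general technique, not natural's actual implementation, and `editDistance` and `corrections` are hypothetical names.

```javascript
// Classic dynamic-programming Levenshtein distance between two strings.
function editDistance(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Keep dictionary words within maxDistance edits of the input word.
function corrections(word, dictionary, maxDistance = 1) {
  return dictionary.filter((candidate) => editDistance(word, candidate) <= maxDistance);
}

console.log(corrections('appl', ['apple', 'apply', 'banana']));
// → ['apple', 'apply']
```

A production spell checker would also rank candidates, for example by word frequency, rather than returning them in dictionary order.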

retext uses a unified processor pipeline.

  • You attach plugins to a processor and run text through it.
  • Separates parsing from analysis logic.
// retext: Processor pipeline
import { retext } from 'retext';
import retextProfanities from 'retext-profanities';

const processor = retext().use(retextProfanities);
const result = await processor.process('This is damn good');
// result.messages lists profanity warnings
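
The processor-plus-plugins pattern can be sketched in a few lines of plain JavaScript. Everything here is hypothetical (`createProcessor`, `longSentences`, and the crude sentence splitter standing in for a real syntax tree); it only shows how plugins inspect a parsed structure and append messages, not how retext or unified are implemented.

```javascript
// Hypothetical mini-processor illustrating the plugin pipeline pattern.
function createProcessor() {
  const plugins = [];
  return {
    use(plugin) {
      plugins.push(plugin);
      return this; // allow chaining: createProcessor().use(a).use(b)
    },
    process(text) {
      // "Parse" step: split into sentences; a real parser builds a full syntax tree.
      const sentences = text.match(/[^.!?]+[.!?]*/g) || [];
      const tree = sentences.map((value, index) => ({ index, value: value.trim() }));
      // "Analysis" step: each plugin reads the tree and may add messages to the file.
      const file = { value: text, messages: [] };
      for (const plugin of plugins) plugin(tree, file);
      return file;
    },
  };
}

// Hypothetical plugin: warn about sentences with more than maxWords words.
const longSentences = (maxWords) => (tree, file) => {
  for (const sentence of tree) {
    const words = sentence.value.split(/\s+/).filter(Boolean).length;
    if (words > maxWords) {
      file.messages.push(`Sentence ${sentence.index + 1} has ${words} words (max ${maxWords})`);
    }
  }
};

const file = createProcessor()
  .use(longSentences(5))
  .process('Short one. This sentence, however, keeps going for quite a while.');

console.log(file.messages);
// → ['Sentence 2 has 9 words (max 5)']
```

Keeping parsing inside the processor and checks inside plugins is what lets each rule stay small and independently installable.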

📦 Feature Coverage: What Each Library Does Best

| Feature | compromise | natural | retext |
|---|---|---|---|
| Named Entity Recognition | ✅ Built-in (#Person, #Place) | ⚠️ Limited/Manual | ❌ Via plugins |
| Stemming/Lemmatization | ⚠️ Basic | ✅ Full support | ✅ Via plugins |
| Sentiment Analysis | ❌ Not core focus | ✅ Built-in | ✅ Via plugins |
| Grammar/Style Checking | ❌ No | ❌ No | ✅ Core strength |
| Spell Checking | ❌ No | ✅ Built-in | ✅ Via plugins |
| Browser Optimization | ✅ High | ⚠️ Moderate | ✅ High |

Real-World Scenarios

Scenario 1: Frontend Search Filter

You need to extract locations from a user's search query to filter a map.

  • ✅ Best choice: compromise
  • Why? It runs fast in the browser and understands terms like #Place out of the box.
// compromise: Extracting locations
import nlp from 'compromise';

function extractLocation(query) {
  return nlp(query).match('#Place').text();
}

Scenario 2: Product Review Analysis

You want to classify customer reviews as positive or negative on a server.

  • ✅ Best choice: natural
  • Why? It includes sentiment analysis and classifiers ready for backend use.
// natural: Sentiment analysis
import natural from 'natural';

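// Language, stemmer, and vocabulary type ('afinn') are constructor arguments
const analyzer = new natural.SentimentAnalyzer('English', natural.PorterStemmer, 'afinn');
const tokens = ['this', 'product', 'is', 'great'];
const score = analyzer.getSentiment(tokens); // above zero for positive text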
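
To show what a lexicon-based sentiment score represents, here is a minimal sketch. It assumes the score is the sum of per-word valences divided by the token count, which is how natural describes its analyzer as far as I know; the `valence` table is made up for illustration and is not the AFINN data.

```javascript
// Hypothetical valence table; real lexicons such as AFINN cover thousands of words.
const valence = new Map([
  ['great', 3],
  ['good', 2],
  ['bad', -2],
  ['terrible', -3],
]);

// Average the valence of all tokens; unknown words count as neutral (0).
function averageSentiment(tokens) {
  if (tokens.length === 0) return 0;
  const total = tokens.reduce(
    (sum, token) => sum + (valence.get(token.toLowerCase()) ?? 0),
    0
  );
  return total / tokens.length;
}

console.log(averageSentiment(['this', 'product', 'is', 'great']));    // → 0.75
console.log(averageSentiment(['this', 'product', 'is', 'terrible'])); // → -0.75
```

Averaging keeps scores comparable between short and long reviews, at the cost of diluting a single strong word in a long text.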

Scenario 3: Content Management System

You run a blog platform and want to warn authors about complex sentences.

  • ✅ Best choice: retext
  • Why? It specializes in readability and style checks with clear messages.
// retext: Readability check
import { retext } from 'retext';
import retextReadability from 'retext-readability';

const processor = retext().use(retextReadability);
const file = await processor.process('Long complex text...');
console.warn(file.messages); // Shows readability issues

⚠️ Maintenance and Ecosystem

compromise is actively maintained with a focus on browser compatibility.

  • Updates frequently add new term tags and matching rules.
  • Community plugins extend its vocabulary.

natural has a long history but slower update cycles.

  • It is stable but may lack modern NLP advancements.
  • Some features are better suited for Node.js environments.

retext is part of the unified ecosystem (like remark and rehype).

  • Benefits from a large plugin library for specific tasks.
  • Well-maintained by a dedicated community focused on text tooling.

💡 The Big Picture

compromise is your lightweight frontend scanner 📱. Use it when you need to understand user input quickly without sending data to a server. It trades deep linguistic accuracy for speed and ease of use in the browser.

natural is your general-purpose NLP Swiss Army knife 🔪. Use it for backend services that need standard algorithms like stemming, classification, or TF-IDF. It is reliable for traditional NLP tasks but heavier than compromise.

retext is your writing-assistant editor ✍️. Use it when the quality of human text matters. It is a common choice for linting prose, flagging style and usage problems, and enforcing style guides in content-heavy applications.

Final Thought: These tools are not interchangeable. If you need to extract entities in the browser, pick compromise. If you need sentiment analysis on a server, pick natural. If you need to check grammar, pick retext. Matching the tool to the specific text problem will save you significant engineering time.

How to Choose: compromise vs natural vs retext

  • compromise:

    Choose compromise if you need fast, lightweight text processing directly in the browser without a build step or heavy dependencies. It is ideal for frontend tasks like extracting entities, matching patterns, or simple tagging where performance and bundle size matter more than deep linguistic accuracy.

  • natural:

    Choose natural if you require classic NLP algorithms like stemming, spell-checking, or sentiment analysis in a Node.js backend or a bundled frontend app. It is suitable for projects that need a broad toolkit of standard NLP features and can tolerate a larger bundle size for more comprehensive functionality.

  • retext:

    Choose retext if your goal is to analyze or improve human writing, such as checking grammar, readability, or style. It is the best fit for content platforms, editors, or documentation tools where prose quality is critical, leveraging a rich plugin ecosystem for specific linguistic rules.

README for compromise

compromise
modest natural language processing
npm install compromise
french • german • italian • spanish
don't you find it strange,
    how easy text is to make,

    ↬ᔐᖜ↬  and how hard it is to actually parse and use?

compromise tries its best to turn text into data.
it makes limited and sensible decisions.
it's not as smart as you'd think.
import nlp from 'compromise'

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'
don't be fancy, at all:
if (doc.has('simon says #Verb')) {
  return true
}
grab parts of the text:
let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"

and get data:

import plg from 'compromise-speech'
nlp.extend(plg)

let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
  "text": "Milwaukee",
  "terms": [{
    "normal": "milwaukee",
    "syllables": ["mil", "wau", "kee"]
  }]
}]
*/

avoid the problems of brittle parsers:

let doc = nlp("we're not gonna take it..")

doc.has('gonna') // true
doc.has('going to') // true (implicit)

// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'

and whip stuff around like it's data:

let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'

-because it actually is-

let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

Use it on the client-side:

<script src="https://unpkg.com/compromise"></script>
<script>
  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>

or likewise:

import nlp from 'compromise'

var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'

compromise is ~250kb (minified).

it's pretty fast. It can run on keypress.

it works mainly by conjugating all forms of a basic word list.

The final lexicon is ~14,000 words.

you can read more about how it works, here. it's weird.

okay -

compromise/one

A tokenizer of words, sentences, and punctuation.

import nlp from 'compromise/one'

let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
  normal:"wayne's world party time",
    terms:[{ text: "Wayne's", normal: "wayne" },
      ...
      ]
  }]
*/

compromise/one splits your text up, wraps it in a handy API,

    and does nothing else -

/one is quick - most sentences take a 10th of a millisecond.

It can do ~1mb of text a second - or 10 wikipedia pages.

Infinite jest takes 3s.

You can also parallelize, or stream text to it with compromise-speed.

compromise/two

A part-of-speech tagger, and grammar-interpreter.

import nlp from 'compromise/two'

let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"

compromise/two automatically calculates the very basic grammar of each word.

this is more useful than people sometimes realize.

Light grammar helps you write cleaner templates, and get closer to the information.

compromise has 83 tags, arranged in a handsome graph.

#FirstName → #Person → #ProperNoun → #Noun
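
One way to picture the tag graph is as a set of parent links that can be walked upward. The sketch below is a hypothetical four-tag slice, not compromise's internal data; it borrows the `isA` key that the plugin format uses for declaring a parent tag, and `impliedTags` is an illustrative helper.

```javascript
// Hypothetical slice of a tag graph: each tag points at its parent via `isA`.
const tagGraph = {
  FirstName: { isA: 'Person' },
  Person: { isA: 'ProperNoun' },
  ProperNoun: { isA: 'Noun' },
  Noun: {},
};

// Walk up the graph to collect every tag implied by a starting tag.
function impliedTags(tag, graph) {
  const chain = [];
  const seen = new Set(); // guards against accidental cycles
  let current = tag;
  while (
    current &&
    Object.prototype.hasOwnProperty.call(graph, current) &&
    !seen.has(current)
  ) {
    chain.push(current);
    seen.add(current);
    current = graph[current].isA;
  }
  return chain;
}

console.log(impliedTags('FirstName', tagGraph));
// → ['FirstName', 'Person', 'ProperNoun', 'Noun']
```

This is why a match on #Noun can also find a word that was only ever tagged as a #FirstName.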

you can see the grammar of each word by running doc.debug()

you can see the reasoning for each tag with nlp.verbose('tagger').

if you prefer Penn tags, you can derive them with:

let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()

compromise/three

Phrase and sentence tooling.

import nlp from 'compromise/three'

let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"

compromise/three is a set of tooling to zoom into and operate on parts of a text.

.numbers() grabs all the numbers in a document, for example - and extends it with new methods, like .subtract().

When you have a phrase, or group of words, you can see additional metadata about it with .json()

let doc = nlp('four out of five dentists')
console.log(doc.fractions().json())
/*[{
    text: 'four out of five',
    terms: [ [Object], [Object], [Object], [Object] ],
    fraction: { numerator: 4, denominator: 5, decimal: 0.8 }
  }
]*/

let doc = nlp('$4.09CAD')
doc.money().json()
/*[{
    text: '$4.09CAD',
    terms: [ [Object] ],
    number: { prefix: '$', num: 4.09, suffix: 'cad'}
  }
]*/

API

Compromise/one

Output
  • .text() - return the document as text
  • .json() - return the document as data
  • .debug() - pretty-print the interpreted document
  • .out() - a named or custom output
  • .html({}) - output custom html tags for matches
  • .wrap({}) - produce custom output for document matches
Utils
  • .found [getter] - is this document empty?
  • .docs [getter] - get term objects as json
  • .length [getter] - count the # of characters in the document (string length)
  • .isView [getter] - identify a compromise object
  • .compute() - run a named analysis on the document
  • .clone() - deep-copy the document, so that no references remain
  • .termList() - return a flat list of all Term objects in match
  • .cache({}) - freeze the current state of the document, for speed-purposes
  • .uncache() - un-freezes the current state of the document, so it may be transformed
  • .freeze({}) - prevent any tags from being removed, in these terms
  • .unfreeze({}) - allow tags to change again, as default
Accessors
Match

(match methods use the match-syntax.)

  • .match('') - return a new Doc, with this one as a parent
  • .not('') - return all results except for this
  • .matchOne('') - return only the first match
  • .if('') - return each current phrase, only if it contains this match ('only')
  • .ifNo('') - Filter-out any current phrases that have this match ('notIf')
  • .has('') - Return a boolean if this match exists
  • .before('') - return all terms before a match, in each phrase
  • .after('') - return all terms after a match, in each phrase
  • .union() - return combined matches without duplicates
  • .intersection() - return only duplicate matches
  • .complement() - get everything not in another match
  • .settle() - remove overlaps from matches
  • .growRight('') - add any matching terms immediately after each match
  • .growLeft('') - add any matching terms immediately before each match
  • .grow('') - add any matching terms before or after each match
  • .sweep(net) - apply a series of match objects to the document
  • .splitOn('') - return a Document with three parts for every match ('splitOn')
  • .splitBefore('') - partition a phrase before each matching segment
  • .splitAfter('') - partition a phrase after each matching segment
  • .join() - merge any neighbouring terms in each match
  • .joinIf(leftMatch, rightMatch) - merge any neighbouring terms under given conditions
  • .lookup([]) - quick find for an array of string matches
  • .autoFill() - create type-ahead assumptions on the document
Tag
  • .tag('') - Give all terms the given tag
  • .tagSafe('') - Only apply tag to terms if it is consistent with current tags
  • .unTag('') - Remove this tag from the given terms
  • .canBe('') - return only the terms that can be this tag
Case
Whitespace
  • .pre('') - add this punctuation or whitespace before each match
  • .post('') - add this punctuation or whitespace after each match
  • .trim() - remove start and end whitespace
  • .hyphenate() - connect words with hyphen, and remove whitespace
  • .dehyphenate() - remove hyphens between words, and set whitespace
  • .toQuotations() - add quotation marks around these matches
  • .toParentheses() - add brackets around these matches
Loops
  • .map(fn) - run each phrase through a function, and create a new document
  • .forEach(fn) - run a function on each phrase, as an individual document
  • .filter(fn) - return only the phrases that return true
  • .find(fn) - return a document with only the first phrase that matches
  • .some(fn) - return true or false if there is one matching phrase
  • .random(fn) - sample a subset of the results
Insert
Transform
  • .sort('method') - re-arrange the order of the matches (in place)
  • .reverse() - reverse the order of the matches, but not the words
  • .unique() - remove any duplicate matches
Lib

(these methods are on the main nlp object)

compromise/two:

Contractions

compromise/three:

Nouns
Verbs
Numbers
Sentences
Adjectives
Misc selections

.extend():

This library comes with a considerate, common-sense baseline for english grammar.

You're free to change, or lay-waste to any settings - which is the fun part actually.

the easiest part is just to suggest tags for any given words:

let myWords = {
  kermit: 'FirstName',
  fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)

or make heavier changes with a compromise-plugin.

import nlp from 'compromise'
nlp.extend({
  // add new tags
  tags: {
    Character: {
      isA: 'Person',
      notA: 'Adjective',
    },
  },
  // add or change words in the lexicon
  words: {
    kermit: 'Character',
    gonzo: 'Character',
  },
  // change inflections
  irregulars: {
    get: {
      pastTense: 'gotten',
      gerund: 'gettin',
    },
  },
  // add new methods to compromise
  api: View => {
    View.prototype.kermitVoice = function () {
      this.sentences().prepend('well,')
      this.match('i [(am|was)]').prepend('um,')
      return this
    }
  },
})

Docs:

gentle introduction:
Documentation:

| Concepts | API | Plugins |
|---|---|---|
| Accuracy | Accessors | Adjectives |
| Caching | Constructor-methods | Dates |
| Case | Contractions | Export |
| Filesize | Insert | Hash |
| Internals | Json | Html |
| Justification | Character Offsets | Keypress |
| Lexicon | Loops | Ngrams |
| Match-syntax | Match | Numbers |
| Performance | Nouns | Paragraphs |
| Plugins | Output | Scan |
| Projects | Selections | Sentences |
| Tagger | Sorting | Syllables |
| Tags | Split | Pronounce |
| Tokenization | Text | Strict |
| Named-Entities | Utils | Penn-tags |
| Whitespace | Verbs | Typeahead |
| World data | Normalization | Sweep |
| Fuzzy-matching | Typescript | Mutation |
| Root-forms | | |

Talks:
Articles:
Some fun Applications:
Comparisons

Plugins:

These are some helpful extensions:

Dates

npm install compromise-dates

Stats

npm install compromise-stats

Speech

npm install compromise-speech

Wikipedia

npm install compromise-wikipedia


Typescript

we're committed to typescript/deno support, both in main and in the official-plugins:

import nlp from 'compromise'
import stats from 'compromise-stats'

const nlpEx = nlp.extend(stats)

nlpEx('This is type safe!').ngrams({ min: 1 })

Limitations:

  • slash-support: We currently split slashes up as different words, like we do for hyphens. so things like this don't work: nlp('the koala eats/shoots/leaves').has('koala leaves') //false

  • inter-sentence match: By default, sentences are the top-level abstraction. Inter-sentence, or multi-sentence matches aren't supported without a plugin: nlp("that's it. Back to Winnipeg!").has('it back')//false

  • nested match syntax: the beauty (and danger) of regex is that you can recurse indefinitely. Our match syntax is much weaker. Things like this are not (yet) possible: doc.match('(modern (major|minor))? general'). Complex matches must be achieved with successive .match() statements.

  • dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.

FAQ

    ☂️ Isn't javascript too...

      yeah it is!
      it wasn't built to compete with NLTK, and may not fit every project.
      string processing is synchronous too, and parallelizing node processes is weird.
      See here for information about speed & performance, and here for project motivations

    💃 Can it run on my arduino-watch?

      Only if it's water-proof!
      Read quick start for running compromise in workers, mobile apps, and all sorts of funny environments.

    🌎 Compromise in other Languages?

    ✨ Partial builds?

      we do offer a tokenize-only build, which has the POS-tagger pulled-out.
      but otherwise, compromise isn't easily tree-shaken.
      the tagging methods are competitive, and greedy, so it's not recommended to pull things out.
      Note that without a full POS-tagging, the contraction-parser won't work perfectly. ((spencer's cool) vs. (spencer's house))
      It's recommended to run the library fully.

See Also:

  • en-pos - very clever javascript pos-tagger by Alex Corvi

  • naturalNode - fancier statistical nlp in javascript

  • winkJS - POS-tagger, tokenizer, machine-learning in javascript

  • dariusk/pos-js - fastTag fork in javascript

  • compendium-js - POS and sentiment analysis in javascript

  • nodeBox linguistics - conjugation, inflection in javascript

  • reText - very impressive text utilities in javascript

  • superScript - conversation engine in js

  • jsPos - javascript build of the time-tested Brill-tagger

  • spaCy - speedy, multilingual tagger in C/python

  • Prose - quick tagger in Go by Joseph Kato

  • TextBlob - python tagger

MIT