compromise, natural, and retext are three distinct JavaScript libraries for working with text, and they solve different problems. compromise is a lightweight NLP library designed to run in the browser, focused on fast text matching and tagging without heavy dependencies. natural is a general-purpose NLP toolkit, aimed primarily at Node.js, that provides classic algorithms like stemming, TF-IDF, and classifiers. retext is a plugin-based text processor built on the unified ecosystem and focused on prose quality, grammar checking, and style analysis. natural offers the broadest algorithmic coverage, compromise prioritizes speed and simplicity for frontend tasks, and retext excels at linting and transforming human-readable content.
When adding text intelligence to a JavaScript application, developers often face a choice between speed, algorithmic depth, and prose quality. compromise, natural, and retext represent three different approaches to this challenge. compromise focuses on lightweight pattern matching in the browser. natural provides a broad suite of classic NLP algorithms for Node and web. retext specializes in analyzing and improving human writing through a plugin system. Let's compare how they handle common text processing tasks.
compromise is built for quick text extraction and tagging.
// compromise: Tagging and matching
import nlp from 'compromise';
const doc = nlp('John Smith bought 5 apples in New York');
const people = doc.match('#Person').out('array');
// Output: ['John Smith']
const places = doc.match('#Place').out('array');
// Output: ['New York']
natural offers a toolkit of standard NLP algorithms.
// natural: Stemming and analysis
import natural from 'natural';
const stemmer = natural.PorterStemmer;
const word = 'running';
const stemmed = stemmer.stem(word);
// Output: 'run'
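const tfidf = new natural.TfIdf();
tfidf.addDocument('this document is about cheese');
tfidf.addDocument('this document is about milk');
// Score one term against every document
tfidf.tfidfs('cheese', (i, measure) => {
  console.log(`document #${i} is ${measure}`);
});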
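To show what a TF-IDF score measures, here is a from-scratch sketch of the textbook weighting: term frequency times ln(N / document frequency). It is not natural's implementation. natural's exact IDF formula may differ, so its numbers will not necessarily match. `tfidfScore` and `tokenizeWords` are hypothetical helper names.

```javascript
// Textbook TF-IDF sketch (not natural's implementation):
//   tf(t, d) = number of times term t occurs in document d
//   idf(t)   = ln(N / df(t)), where df(t) = number of documents containing t
function tokenizeWords(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

function tfidfScore(term, docIndex, docs) {
  const tokenized = docs.map(tokenizeWords);
  const tf = tokenized[docIndex].filter((t) => t === term).length;
  const df = tokenized.filter((tokens) => tokens.includes(term)).length;
  if (tf === 0) return 0; // if tf > 0 then df >= 1, so no division by zero below
  return tf * Math.log(docs.length / df);
}

const docs = [
  'this document is about cheese',
  'this document is about milk',
];

console.log(tfidfScore('cheese', 0, docs)); // ln(2), about 0.693
console.log(tfidfScore('cheese', 1, docs)); // 0, the term is absent
console.log(tfidfScore('document', 0, docs)); // 0, the term is in every document
```

A term that appears in every document gets zero weight. That is why 'cheese' and 'milk' are what distinguish these two documents.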
retext focuses on prose quality and grammar.
// retext: Prose analysis
import { retext } from 'retext';
import retextEnglish from 'retext-english';
import retextReadability from 'retext-readability';
const processor = retext()
.use(retextEnglish)
.use(retextReadability);
const result = await processor.process('This sentence is short.');
// result.messages lists sentences flagged as hard to read (none are expected for input this short)
compromise is optimized for the browser.
// compromise: Browser-friendly
import nlp from 'compromise';
// A prebuilt script-tag build also works without any bundler configuration
const doc = nlp('Hello world');
console.log(doc.json());
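natural is Node.js-first and heavier than compromise.
// natural: Node-focused features
import natural from 'natural';
// Saving and loading trained classifiers goes through the filesystem (Node.js)
const classifier = new natural.BayesClassifier();
classifier.addDocument('i feel good', 'positive');
classifier.addDocument('i feel terrible', 'negative');
classifier.train();
classifier.classify('i feel good today'); // should return 'positive'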
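The snippet above does not show what a Bayes text classifier does internally. The sketch below illustrates the general technique: multinomial naive Bayes with add-one (Laplace) smoothing. The `NaiveBayes` class and its tokenizer are hypothetical. natural's own preprocessing, such as stemming, may differ, so treat this as an illustration rather than a reimplementation.

```javascript
// Minimal multinomial naive Bayes with add-one (Laplace) smoothing.
// A sketch of the technique, not natural's BayesClassifier internals.
class NaiveBayes {
  constructor() {
    this.docCounts = new Map(); // label -> number of training documents
    this.wordCounts = new Map(); // label -> Map(word -> count)
    this.totalWords = new Map(); // label -> total word count for that label
    this.vocab = new Set(); // every word seen in training
    this.totalDocs = 0;
  }

  static tokenize(text) {
    return text.toLowerCase().match(/[a-z']+/g) || [];
  }

  addDocument(text, label) {
    this.totalDocs += 1;
    this.docCounts.set(label, (this.docCounts.get(label) || 0) + 1);
    if (!this.wordCounts.has(label)) {
      this.wordCounts.set(label, new Map());
      this.totalWords.set(label, 0);
    }
    const counts = this.wordCounts.get(label);
    for (const word of NaiveBayes.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocab.add(word);
    }
  }

  classify(text) {
    let best = null;
    let bestScore = -Infinity;
    for (const [label, docCount] of this.docCounts) {
      const counts = this.wordCounts.get(label);
      const denominator = this.totalWords.get(label) + this.vocab.size;
      // log P(label) + sum of log P(word | label)
      let score = Math.log(docCount / this.totalDocs);
      for (const word of NaiveBayes.tokenize(text)) {
        score += Math.log(((counts.get(word) || 0) + 1) / denominator);
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}

const nb = new NaiveBayes();
nb.addDocument('i feel good', 'positive');
nb.addDocument('this is great', 'positive');
nb.addDocument('i feel bad', 'negative');
nb.addDocument('this is awful', 'negative');
console.log(nb.classify('i feel great')); // 'positive'
```

Scores are summed in log space because multiplying many small probabilities would underflow to zero.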
retext is environment-agnostic but plugin-dependent.
// retext: Plugin architecture
import { retext } from 'retext';
import retextSyllables from 'retext-syllables';
// Add only the plugins you need
const processor = retext().use(retextSyllables);
const result = await processor.process('Example text');
compromise uses a chainable API on document objects.
// compromise: Chaining queries
import nlp from 'compromise';
const doc = nlp('Steve Jobs founded Apple in 1976');
const founder = doc
  .sentences()
  .if('founded') // keep only sentences that mention "founded"
  .match('#Person') // then narrow down to the person
  .text();
// Output: 'Steve Jobs'
natural uses class-based instantiation.
// natural: Class instances
import natural from 'natural';
const spellcheck = new natural.Spellcheck(['apple', 'apply']);
const corrections = spellcheck.getCorrections('appl', 1);
// Output: ['apple', 'apply']
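`getCorrections('appl', 1)` returns the dictionary words within one edit of the input. The sketch below shows that idea with classic Levenshtein distance (insertions, deletions, substitutions). natural's Spellcheck may generate and rank candidates differently. `editDistance` and `suggestCorrections` are hypothetical helpers, not part of natural's API.

```javascript
// Levenshtein edit distance (insertions, deletions, substitutions),
// computed with a rolling one-row dynamic-programming table.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Dictionary words within maxDistance edits of the input, closest first.
function suggestCorrections(word, dictionary, maxDistance = 1) {
  return dictionary
    .map((candidate) => ({ candidate, d: editDistance(word, candidate) }))
    .filter(({ d }) => d <= maxDistance)
    .sort((x, y) => x.d - y.d)
    .map(({ candidate }) => candidate);
}

console.log(suggestCorrections('appl', ['apple', 'apply', 'ample'])); // ['apple', 'apply']
```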
retext uses a unified processor pipeline.
// retext: Processor pipeline
import { retext } from 'retext';
import retextProfanities from 'retext-profanities';
const processor = retext().use(retextProfanities);
const result = await processor.process('This is damn good');
// result.messages lists profanity warnings
| Feature | compromise | natural | retext |
|---|---|---|---|
| Named Entity Recognition | ✅ Built-in (#Person, #Place) | ⚠️ Limited/Manual | ✅ Via plugins |
| Stemming/Lemmatization | ⚠️ Basic | ✅ Full support | ✅ Via plugins |
| Sentiment Analysis | ❌ Not a core focus | ✅ Built-in | ✅ Via plugins |
| Grammar/Style Checking | ❌ No | ❌ No | ✅ Core strength |
| Spell Checking | ❌ No | ✅ Built-in | ✅ Via plugins |
| Browser Optimization | ✅ High | ⚠️ Moderate | ✅ High |
You need to extract locations from a user's search query to filter a map.
Best fit: compromise
// compromise: Extracting locations
import nlp from 'compromise';
function extractLocation(query) {
return nlp(query).match('#Place').text();
}
You want to classify customer reviews as positive or negative on a server.
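Best fit: natural
// natural: Sentiment analysis
import natural from 'natural';
// Arguments: language, stemmer, vocabulary type (e.g. 'afinn')
const analyzer = new natural.SentimentAnalyzer('English', natural.PorterStemmer, 'afinn');
const tokenizer = new natural.WordTokenizer();
const tokens = tokenizer.tokenize('this product is great');
const score = analyzer.getSentiment(tokens);
// score > 0 suggests a positive review, score < 0 a negative one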
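`getSentiment` is a lexicon lookup: each known word carries a score, and the scores are combined. Below is a minimal sketch of AFINN-style scoring that averages per token. We assume this mirrors how natural normalizes its result, but that is unverified. The lexicon values are illustrative, not taken from AFINN.

```javascript
// Lexicon-based sentiment sketch in the spirit of AFINN scoring.
// The word scores are illustrative, not the real AFINN list.
const SENTIMENT_LEXICON = new Map([
  ['great', 3],
  ['good', 3],
  ['love', 3],
  ['bad', -3],
  ['awful', -3],
  ['broken', -1],
]);

// Average score per token: > 0 leans positive, < 0 leans negative.
function lexiconSentiment(tokens) {
  if (tokens.length === 0) return 0;
  const total = tokens.reduce(
    (sum, t) => sum + (SENTIMENT_LEXICON.get(t.toLowerCase()) || 0),
    0
  );
  return total / tokens.length;
}

console.log(lexiconSentiment(['this', 'product', 'is', 'great'])); // 0.75
console.log(lexiconSentiment(['arrived', 'broken', 'and', 'awful'])); // -1
```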
You run a blog platform and want to warn authors about complex sentences.
Best fit: retext
// retext: Readability check
import { retext } from 'retext';
import retextReadability from 'retext-readability';
const processor = retext().use(retextReadability);
const file = await processor.process('Long complex text...');
console.warn(file.messages); // Shows readability issues
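As far as we know, retext-readability combines several readability formulas. One of the most common is Flesch reading ease, which is higher for easier text. The sketch below computes it with a crude vowel-group syllable counter, so expect some drift from dedicated tools. `countSyllables` and `fleschReadingEase` are hypothetical names.

```javascript
// Flesch reading ease sketch:
//   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
// Syllables are estimated by counting vowel groups, a rough heuristic
// that miscounts some words (silent "e", for example).
function countSyllables(word) {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

function fleschReadingEase(text) {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words = text.match(/[A-Za-z']+/g) || [];
  if (sentences.length === 0 || words.length === 0) return null;
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return (
    206.835 -
    1.015 * (words.length / sentences.length) -
    84.6 * (syllables / words.length)
  );
}

console.log(fleschReadingEase('The cat sat.').toFixed(2)); // '119.19'
```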
compromise is actively maintained with a focus on browser compatibility.
natural has a long history but slower update cycles.
retext is part of the unified ecosystem (like remark and rehype).
compromise is your lightweight frontend scanner. Use it when you need to understand user input quickly without sending data to a server. It trades deep linguistic accuracy for speed and ease of use in the browser.
natural is your general-purpose NLP Swiss Army knife. Use it for backend services that need standard algorithms like stemming, classification, or TF-IDF. It is reliable for traditional NLP tasks but heavier than compromise.
retext is your writing assistant. Use it when the quality of human text matters. It is a strong fit for linting prose, checking grammar, and enforcing style guides in content-heavy applications.
Final Thought: These tools are not interchangeable. If you need to extract entities in the browser, pick compromise. If you need sentiment analysis on a server, pick natural. If you need to check grammar, pick retext. Matching the tool to the specific text problem will save you significant engineering time.
Choose compromise if you need fast, lightweight text processing directly in the browser without a build step or heavy dependencies. It is ideal for frontend tasks like extracting entities, matching patterns, or simple tagging where performance and bundle size matter more than deep linguistic accuracy.
Choose natural if you require classic NLP algorithms like stemming, spell-checking, or sentiment analysis in a Node.js backend or a bundled frontend app. It is suitable for projects that need a broad toolkit of standard NLP features and can tolerate a larger bundle size for more comprehensive functionality.
Choose retext if your goal is to analyze or improve human writing, such as checking grammar, readability, or style. It is the best fit for content platforms, editors, or documentation tools where prose quality is critical, leveraging a rich plugin ecosystem for specific linguistic rules.
npm install compromise
isn't it strange how easy text is to make, and how hard it is to actually parse and use?
compromise tries its best to turn text into data.
it makes limited and sensible decisions.
it's not as smart as you'd think.
import nlp from 'compromise'
let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'
function simonSays(text) {
  return nlp(text).has('simon says #Verb')
}
let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"
and get data:
import nlp from 'compromise'
import plg from 'compromise-speech'
nlp.extend(plg)
let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
"text": "Milwaukee",
"terms": [{
"normal": "milwaukee",
"syllables": ["mil", "wau", "kee"]
}]
}]
*/
avoid the problems of brittle parsers:
let doc = nlp("we're not gonna take it..")
doc.has('gonna') // true
doc.has('going to') // true (implicit)
// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'
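At its simplest, contraction expansion is a table lookup. The sketch below expands a few hypothetical entries. It skips ambiguous forms such as "it's" (it is / it has) and "he'd" (he had / he would), which need grammatical context to resolve.

```javascript
// Dictionary-based contraction expansion sketch.
// The table is a small illustrative sample, not compromise's data.
const CONTRACTIONS = new Map([
  ["we're", 'we are'],
  ["don't", 'do not'],
  ["can't", 'cannot'],
  ["isn't", 'is not'],
  ['gonna', 'going to'],
  ['wanna', 'want to'],
]);

function expandContractions(text) {
  return text.replace(/[A-Za-z']+/g, (word) => {
    const expansion = CONTRACTIONS.get(word.toLowerCase());
    if (expansion === undefined) return word;
    // Keep an initial capital, e.g. "We're" -> "We are"
    return /^[A-Z]/.test(word)
      ? expansion[0].toUpperCase() + expansion.slice(1)
      : expansion;
  });
}

console.log(expandContractions("we're not gonna take it.."));
// 'we are not going to take it..'
```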
and whip stuff around like it's data:
let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'
-because it actually is-
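Before you can add 20 to 'ninety five thousand and fifty two', the words have to become a number. The sketch below covers that parsing step for cardinal numbers only. compromise's own number handling covers far more, and it also turns the result back into words, which this sketch does not. `wordsToNumber` is a hypothetical name.

```javascript
// Parse English cardinal number words into an integer.
// Covers units, tens, "hundred", "thousand" and "million" only;
// ordinals, decimals and "a hundred" are not handled.
const SMALL_NUMBERS = new Map(Object.entries({
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19, twenty: 20, thirty: 30, forty: 40, fifty: 50, sixty: 60,
  seventy: 70, eighty: 80, ninety: 90,
}));
const SCALES = new Map([['thousand', 1e3], ['million', 1e6]]);

function wordsToNumber(text) {
  let total = 0; // finished groups, e.g. 95000
  let current = 0; // group still being built, e.g. 52
  for (const word of text.toLowerCase().split(/[\s-]+/)) {
    if (word === '' || word === 'and') continue;
    if (SMALL_NUMBERS.has(word)) {
      current += SMALL_NUMBERS.get(word);
    } else if (word === 'hundred') {
      current *= 100;
    } else if (SCALES.has(word)) {
      total += current * SCALES.get(word);
      current = 0;
    } else {
      throw new Error(`unknown number word: ${word}`);
    }
  }
  return total + current;
}

const parsed = wordsToNumber('ninety five thousand and fifty two');
console.log(parsed); // 95052
console.log(parsed + 20); // 95072
```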
let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'
Use it on the client-side:
<script src="https://unpkg.com/compromise"></script>
<script>
var doc = nlp('two bottles of beer')
doc.numbers().minus(1)
document.body.innerHTML = doc.text()
// 'one bottle of beer'
</script>
or likewise:
import nlp from 'compromise'
var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'
compromise is ~250kb (minified).
it's pretty fast. It can run on keypress.
it works mainly by conjugating all forms of a basic word list.
The final lexicon is ~14,000 words.
you can read more about how it works in the compromise documentation. it's weird.
okay -
compromise/one
A tokenizer of words, sentences, and punctuation.
import nlp from 'compromise/one'
let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
normal:"wayne's world party time",
terms:[{ text: "Wayne's", normal: "wayne" },
...
]
}]
*/
compromise/one splits your text up and wraps it in a handy API.
/one is quick - most sentences take a tenth of a millisecond.
It can do ~1mb of text a second - or 10 wikipedia pages.
Infinite Jest takes 3s.
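To see what a tokenizer has to do, here is a deliberately naive sketch. It splits sentences on ., ! or ? followed by whitespace, then strips punctuation from each word. It would wrongly break after abbreviations like 'Mrs.'. Unlike the `/one` output above, it also keeps the possessive in `normal`. `splitSentences` is a hypothetical name.

```javascript
// Naive sentence and word tokenizer sketch.
// Sentences end at ., ! or ? followed by whitespace; words are split on
// whitespace and trimmed of surrounding punctuation (apostrophes are kept).
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.trim() !== '')
    .map((sentence) => ({
      text: sentence.trim(),
      terms: sentence
        .split(/\s+/)
        .map((w) => w.replace(/^[^\w']+|[^\w']+$/g, ''))
        .filter((w) => w !== '')
        .map((w) => ({ text: w, normal: w.toLowerCase() })),
    }));
}

console.log(JSON.stringify(splitSentences("Wayne's World, party time. Excellent!")));
```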
compromise/two
A part-of-speech tagger, and grammar-interpreter.
import nlp from 'compromise/two'
let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"
this is more useful than people sometimes realize.
Light grammar helps you write cleaner templates, and get closer to the information.
compromise has 83 tags, arranged in a handsome graph.
#FirstName → #Person → #ProperNoun → #Noun
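Because tags form an is-a graph, tagging a word #FirstName implies every ancestor tag as well. The sketch below resolves those ancestors from a child-to-parent map. The map holds only the chain above. The real tagset has 83 tags, and its internal representation is not reproduced here. `TAG_PARENT` and `withAncestors` are hypothetical names.

```javascript
// Sketch of an "is-a" tag hierarchy: each tag points to its parent.
// Only the FirstName -> Person -> ProperNoun -> Noun chain is included.
const TAG_PARENT = new Map([
  ['FirstName', 'Person'],
  ['Person', 'ProperNoun'],
  ['ProperNoun', 'Noun'],
]);

// Walk up the graph to collect a tag and all of its ancestors.
function withAncestors(tag) {
  const tags = [tag];
  const seen = new Set(tags);
  let current = tag;
  while (TAG_PARENT.has(current)) {
    current = TAG_PARENT.get(current);
    if (seen.has(current)) break; // guard against accidental cycles
    seen.add(current);
    tags.push(current);
  }
  return tags;
}

console.log(withAncestors('FirstName')); // ['FirstName', 'Person', 'ProperNoun', 'Noun']
```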
you can see the grammar of each word by running doc.debug()
you can see the reasoning for each tag with nlp.verbose('tagger').
if you prefer Penn tags, you can derive them with:
let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()
compromise/three
Phrase and sentence tooling.
import nlp from 'compromise/three'
let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"
compromise/three is a set of tooling to zoom into and operate on parts of a text.
.numbers() grabs all the numbers in a document, for example - and extends it with new methods, like .subtract().
When you have a phrase, or group of words, you can see additional metadata about it with .json()
let doc = nlp('four out of five dentists')
console.log(doc.fractions().json())
/*[{
text: 'four out of five',
terms: [ [Object], [Object], [Object], [Object] ],
fraction: { numerator: 4, denominator: 5, decimal: 0.8 }
}
]*/
let doc = nlp('$4.09CAD')
doc.money().json()
/*[{
text: '$4.09CAD',
terms: [ [Object] ],
number: { prefix: '$', num: 4.09, suffix: 'cad'}
}
]*/
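Splitting a money string into prefix, number and suffix can be approximated with one regular expression. The `parseMoney` helper below is a hypothetical sketch that mirrors the shape of the `number` object above. It ignores currency words like 'dollars' and other variations that a real parser has to handle.

```javascript
// Regex sketch: optional currency prefix, a number with optional
// thousands separators and decimals, then an optional alphabetic suffix.
function parseMoney(str) {
  const m = str.trim().match(/^([^\d\s.-]*)\s*(-?\d[\d,]*(?:\.\d+)?)\s*([A-Za-z]*)$/);
  if (!m) return null;
  return {
    prefix: m[1],
    num: parseFloat(m[2].replace(/,/g, '')),
    suffix: m[3].toLowerCase(),
  };
}

console.log(parseMoney('$4.09CAD')); // { prefix: '$', num: 4.09, suffix: 'cad' }
```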
(match methods use the match-syntax.)
(these methods are on the main nlp object)
nlp.tokenize(str) - parse text without running POS-tagging
nlp.lazy(str, match) - scan through a text with minimal analysis
nlp.plugin({}) - mix in a compromise-plugin
nlp.parseMatch(str) - pre-parse any match statements into json
nlp.world() - grab or change library internals
nlp.model() - grab all current linguistic data
nlp.methods() - grab or change internal methods
nlp.hooks() - see which compute methods run automatically
nlp.verbose(mode) - log our decision-making for debugging
nlp.version - current semver version of the library
nlp.addWords(obj, isFrozen?) - add new words to the lexicon
nlp.addTags(obj) - add new tags to the tagSet
nlp.typeahead(arr) - add words to the auto-fill dictionary
nlp.buildTrie(arr) - compile a list of words into a fast lookup form
nlp.buildNet(arr) - compile a list of matches into a fast match form
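nlp.buildTrie and nlp.typeahead both rely on fast prefix lookups. The minimal trie sketch below shows the idea: exact lookups plus prefix completion. compromise's compiled trie format is its own and is not reproduced here. The `Trie` class is hypothetical.

```javascript
// Minimal prefix trie: exact-word lookup and prefix completion.
// Illustrates the general data structure, not compromise's compiled format.
class Trie {
  constructor(words = []) {
    this.root = { children: new Map(), end: false };
    words.forEach((w) => this.insert(w));
  }

  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), end: false });
      }
      node = node.children.get(ch);
    }
    node.end = true;
  }

  // Follow the characters of `prefix`; null if the path does not exist.
  find(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return null;
    }
    return node;
  }

  has(word) {
    const node = this.find(word);
    return node !== null && node.end;
  }

  // All stored words that start with `prefix` (typeahead-style completion).
  complete(prefix) {
    const start = this.find(prefix);
    const out = [];
    const walk = (node, acc) => {
      if (node.end) out.push(acc);
      for (const [ch, child] of node.children) walk(child, acc + ch);
    };
    if (start) walk(start, prefix);
    return out;
  }
}

const trie = new Trie(['milwaukee', 'minneapolis', 'winnipeg']);
console.log(trie.has('winnipeg')); // true
console.log(trie.has('winni')); // false
console.log(trie.complete('mi')); // ['milwaukee', 'minneapolis']
```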
Example transformations:
'football captain' → 'football captains'
'turnovers' → 'turnover'
'will go' → 'went'
'walked' → 'walks'
'walked' → 'will walk'
'walks' → 'walk'
'walks' → 'walking'
'drive' → 'had driven'
'went' → 'did not go'
"didn't study" → 'studied'
Example numbers: 5, five, fifth or 5th, five or 5, '$2.50'
Example sentence transformations:
he walks → he walked
he walked → he walks
he walks → he will walk
he walks → he walk
he walks → he didn't walk
Example matches: ?!? or !, 'quick', 'wash-out', '(939) 555-0113', '#nlp', 'hi@compromise.cool', :), '@nlp_compromise', 'compromise.cool', 'he', 'but', 'of', 'Mrs.', people() + places() + organizations(), 'quickly', 'FBI', "Spencer's"
This library comes with a considerate, common-sense baseline for english grammar.
You're free to change, or lay waste to, any settings - which is the fun part actually.
the easiest part is just to suggest tags for any given words:
let myWords = {
kermit: 'FirstName',
fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)
or make heavier changes with a compromise-plugin.
import nlp from 'compromise'
nlp.extend({
// add new tags
tags: {
Character: {
isA: 'Person',
notA: 'Adjective',
},
},
// add or change words in the lexicon
words: {
kermit: 'Character',
gonzo: 'Character',
},
// change inflections
irregulars: {
get: {
pastTense: 'gotten',
gerund: 'gettin',
},
},
// add new methods to compromise
api: View => {
View.prototype.kermitVoice = function () {
this.sentences().prepend('well,')
this.match('i [(am|was)]').prepend('um,')
return this
}
},
})
These are some helpful extensions:
npm install compromise-dates
parses dates (June 8th or 03/03/18), durations (2 weeks or 5mins), and times (4:30pm or half past five).
npm install compromise-stats
.tfidf({}) - rank words by frequency and uniqueness
.ngrams({}) - list all repeating sub-phrases, by word-count
.unigrams() - n-grams with one word
.bigrams() - n-grams with two words
.trigrams() - n-grams with three words
.startgrams() - n-grams including the first term of a phrase
.endgrams() - n-grams including the last term of a phrase
.edgegrams() - n-grams including the first or last term of a phrase
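n-grams are contiguous word sequences counted across phrases. The sketch below does `.bigrams()`-style counting and adds a startgrams helper that follows the descriptions in the list above. The function names and return shapes are assumptions, not compromise-stats' output format.

```javascript
// Count every contiguous n-word sub-phrase (n-gram) across phrases,
// most frequent first.
function countNgrams(phrases, n) {
  const counts = new Map();
  for (const phrase of phrases) {
    const words = phrase.toLowerCase().split(/\s+/).filter(Boolean);
    for (let i = 0; i + n <= words.length; i++) {
      const gram = words.slice(i, i + n).join(' ');
      counts.set(gram, (counts.get(gram) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// The n-gram that includes the first term of each phrase.
function startgrams(phrases, n) {
  return phrases
    .map((p) => p.toLowerCase().split(/\s+/).filter(Boolean))
    .filter((words) => words.length >= n)
    .map((words) => words.slice(0, n).join(' '));
}

const phrases = ['the quick brown fox', 'the quick red fox'];
console.log(countNgrams(phrases, 2)[0]); // ['the quick', 2]
console.log(startgrams(phrases, 2)); // ['the quick', 'the quick']
```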
npm install compromise-syllables
npm install compromise-wikipedia
we're committed to typescript/deno support, both in main and in the official-plugins:
import nlp from 'compromise'
import stats from 'compromise-stats'
const nlpEx = nlp.extend(stats)
nlpEx('This is type safe!').ngrams({ min: 1 })
slash-support:
We currently split slashes up as different words, like we do for hyphens. So things like this don't work:
nlp('the koala eats/shoots/leaves').has('koala leaves') //false
inter-sentence match:
By default, sentences are the top-level abstraction.
Inter-sentence, or multi-sentence matches aren't supported without a plugin:
nlp("that's it. Back to Winnipeg!").has('it back')//false
nested match syntax:
the danger (and beauty) of regex is that you can recurse indefinitely.
Our match syntax is much weaker. Things like this are not (yet) possible:
doc.match('(modern (major|minor))? general')
complex matches must be achieved with successive .match() statements.
dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.
en-pos - very clever javascript pos-tagger by Alex Corvi
naturalNode - fancier statistical nlp in javascript
winkJS - POS-tagger, tokenizer, machine-learning in javascript
dariusk/pos-js - fastTag fork in javascript
compendium-js - POS and sentiment analysis in javascript
nodeBox linguistics - conjugation, inflection in javascript
reText - very impressive text utilities in javascript
superScript - conversation engine in js
jsPos - javascript build of the time-tested Brill-tagger
spaCy - speedy, multilingual tagger in C/python
Prose - quick tagger in Go by Joseph Kato
TextBlob - python tagger
MIT