compromise vs franc vs linguist-js vs natural
Natural Language Processing Libraries for JavaScript

compromise, franc, linguist-js, and natural are JavaScript libraries for text analysis that run in the browser or in Node.js. natural is a comprehensive suite offering tokenization, stemming, and classification. compromise focuses on fast, lightweight text parsing and entity extraction without heavy dependencies. franc detects the human language of a given text string, while linguist-js ports GitHub's Linguist to identify the programming language of source files. These tools let developers add text analysis, search optimization, and localization features to web applications without relying on external APIs.

Stat Detail

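| Package | Downloads | Stars | Size | Issues | Publish | License |
| --- | --- | --- | --- | --- | --- | --- |
| compromise | 0 | 12,088 | 2.59 MB | 119 | 3 months ago | MIT |
| franc | 0 | 4,395 | 272 kB | 5 | 2 years ago | MIT |
| linguist-js | 0 | 46 | 250 kB | 1 | a year ago | ISC |
| natural | 0 | 10,873 | 13.8 MB | 82 | 3 months ago | MIT |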

Natural Language Processing in JavaScript: compromise vs franc vs linguist-js vs natural

Building text-aware features in JavaScript often means choosing between heavy server-side APIs and lightweight client-side libraries. The packages compromise, franc, linguist-js, and natural take different approaches to text processing. natural and compromise analyze the structure of text, franc identifies the human language of a string, and linguist-js identifies the programming language of source files. Let's compare how they handle real-world text tasks.

Core Purpose: Analysis vs Detection

natural is a general-purpose NLP framework.

  • It provides tools for tokenization, stemming, and classification.
  • Best for deep text analysis like sentiment or keyword extraction.
// natural: Stemming a word  
const natural = require('natural');
const stemmer = natural.PorterStemmer;
console.log(stemmer.stem('running')); // 'run'

compromise is a lightweight text parser.

  • It focuses on matching terms and extracting entities quickly.
  • Best for simple search logic or cleaning user input.
// compromise: Extracting people's names  
import compromise from 'compromise';
const doc = compromise('John went to Paris');
const people = doc.people().out('array'); // ['John']

franc is a language detector.

  • It identifies the human language of a string.
  • Best for routing content to the right translation service.
// franc: Detecting language
import { franc } from 'franc'; // named export in current releases
const lang = franc('Hallo, wie geht es dir?'); // 'deu' (German)

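linguist-js is a programming-language detector.

  • It ports GitHub's Linguist, which classifies source files by language.
  • Best for detecting code languages from file names and contents; it does not identify human languages.
// linguist-js: Detecting the language of a source file
// Sketch based on the package's promise API and its fileContent option; check the README for your version
const linguist = require('linguist-js');
linguist(['hello.js'], { fileContent: ['console.log("hello")'] }).then(({ files }) => {
  console.log(files.results); // maps each file path to its detected language
});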

Performance and Bundle Weight

Bundle size matters significantly for frontend apps. Heavy NLP libraries can slow down initial page loads.

natural includes many algorithms by default.

  • You import the whole library or specific sub-modules.
  • Can be heavy for browser-only use cases.
// natural: Pulling out one class (require('natural') still loads the whole package)
const TfIdf = require('natural').TfIdf;
const tfidf = new TfIdf();

compromise is designed for the browser.

  • It uses a compressed grammar model.
  • Loads faster than full NLP suites.
// compromise: Chainable API for quick ops  
const doc = compromise(text);
const verbs = doc.verbs().out('array');

franc is minimalistic.

  • It uses trigram matching for speed.
  • Very small impact on build size.
// franc: Quick check with minimum length  
const lang = franc('Hello', { minLength: 3 }); 
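
To make the trigram idea concrete, here is a toy sketch. It is not franc's implementation or data: the two profiles are built from tiny hand-written sample sentences, and `guessLanguage`, `buildProfile`, and `trigrams` are hypothetical names used only for illustration. A real detector compares against ranked trigram tables for many languages.

```javascript
// Toy trigram language guesser. NOT franc's code or data:
// both profiles come from tiny hand-made sample sentences.

// Split text into overlapping 3-character sequences.
function trigrams(text) {
  const clean = ' ' + text.toLowerCase().replace(/[^a-zäöüß]+/g, ' ').trim() + ' ';
  const out = [];
  for (let i = 0; i + 3 <= clean.length; i++) out.push(clean.slice(i, i + 3));
  return out;
}

// Record which trigrams occur in a sample text, with their counts.
function buildProfile(sample) {
  const counts = new Map();
  for (const t of trigrams(sample)) counts.set(t, (counts.get(t) || 0) + 1);
  return counts;
}

const profiles = {
  eng: buildProfile('the quick brown fox jumps over the lazy dog and then the other one'),
  deu: buildProfile('der schnelle braune fuchs springt über den faulen hund und dann der andere'),
};

// Score = number of input trigrams that also appear in a profile.
function guessLanguage(text, minLength = 10) {
  if (text.length < minLength) return 'und'; // too short to judge
  const grams = trigrams(text);
  let best = 'und';
  let bestScore = 0;
  for (const [code, profile] of Object.entries(profiles)) {
    const score = grams.filter((t) => profile.has(t)).length;
    if (score > bestScore) {
      best = code;
      bestScore = score;
    }
  }
  return best;
}

console.log(guessLanguage('the dog and the fox'));    // eng
console.log(guessLanguage('der hund und der fuchs')); // deu
console.log(guessLanguage('hi'));                     // und
```

The `minLength` guard mirrors why short inputs are unreliable: a handful of trigrams rarely separates one language from another.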

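linguist-js carries GitHub Linguist's language data.

  • It ships heuristics and language definitions to approximate GitHub's results.
  • It analyzes files and folders asynchronously, so it is slower than a single-string check like franc and answers a different question.
// linguist-js: Async analysis of a folder (returns a promise)
const { languages } = await linguist('./src');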

Feature Depth: Stemming, Tagging, and Detection

Different tasks require different levels of linguistic understanding.

Tokenization and Stemming

natural supports standard algorithms.

  • Use Porter or Lancaster stemmers out of the box.
  • Good for search indexing.
// natural: Tokenizing and stemming  
const tokenizer = new natural.WordTokenizer();
const tokens = tokenizer.tokenize('running runs ran');
const stemmed = tokens.map(t => stemmer.stem(t));
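
For intuition about what a stemmer does, the sketch below strips a few common suffixes. It is deliberately simplified and is not the Porter algorithm that natural ships; `toyStem` and its suffix list are hypothetical and will mis-stem many real words.

```javascript
// Toy suffix stripper. NOT the Porter algorithm; it only shows why
// "running" and "runs" can collapse to the same index key.
const SUFFIXES = ['ing', 'ed', 'es', 's']; // checked in this order

function toyStem(word) {
  const w = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    // Keep at least three characters so short words stay intact.
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      let stem = w.slice(0, -suffix.length);
      // Undo consonant doubling: "runn" -> "run".
      if (/([b-df-hj-np-tv-z])\1$/.test(stem)) stem = stem.slice(0, -1);
      return stem;
    }
  }
  return w;
}

console.log(['running', 'runs', 'ran'].map(toyStem)); // [ 'run', 'run', 'ran' ]
```

Note that "ran" is left alone: suffix stripping cannot relate irregular forms, which is one reason production stemmers carry more rules.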

compromise handles terms differently.

  • It keeps words in context rather than stripping suffixes.
  • Better for preserving meaning in short phrases.
// compromise: Normalizing terms
const doc = compromise('running runs ran');
const normalized = doc.normalize().text();

franc does not stem or tokenize.

  • It only returns an ISO 639-3 language code.
  • Use it before passing text to other tools.
// franc: Language code only
const code = franc('Bonjour'); // 'und': input is shorter than the default minLength of 10

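linguist-js does not stem or tokenize.

  • It focuses on the programming language of each file.
  • Returns the language name and its Linguist type (for example programming, markup, data, or prose).
// linguist-js: Language metadata
// Result shape as described in the package README; verify against your installed version:
// languages.results -> { JavaScript: { type: 'programming', bytes: ... } }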

Real-World Scenarios

Scenario 1: Search Bar Autocomplete

You need to match user input against a database of products, handling plurals and typos.

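  • Best choice: natural
  • Why? Stemming helps match "run" with "running". Typos need a separate string-distance measure, such as natural.JaroWinklerDistance.
// natural: Search indexing
// TfIdf does not stem by default, so stem both the documents and the query
tfidf.addDocument(stemmer.tokenizeAndStem('this document is about running'));
tfidf.addDocument(stemmer.tokenizeAndStem('this document is about runs'));
tfidf.tfidfs(stemmer.stem('run'), (i, measure) => console.log(i, measure)); // scores each document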
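
The scoring behind this scenario can be sketched without any library. The weighting below (raw term count times log(N / df)) is one common textbook variant and may differ from the formula natural uses; the token arrays are assumed to be already stemmed, and `tfidfScore` is a hypothetical helper name.

```javascript
// Minimal tf-idf over pre-stemmed tokens (textbook variant, not natural's exact formula).
const docs = [
  ['document', 'about', 'run'],         // e.g. stemmed "this document is about running"
  ['document', 'about', 'run', 'shoe'],
  ['recipe', 'about', 'bread'],
];

function tfidfScore(term, doc, corpus) {
  const tf = doc.filter((t) => t === term).length;          // occurrences in this document
  const df = corpus.filter((d) => d.includes(term)).length; // documents containing the term
  if (tf === 0 || df === 0) return 0;
  return tf * Math.log(corpus.length / df);
}

const scores = docs.map((d) => tfidfScore('run', d, docs));
console.log(scores); // the first two documents score above zero, the third scores 0
```

A term that appears in every document, such as 'about' here, scores 0 under this variant because log(N / N) is 0, which is why tf-idf favors distinctive terms.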

Scenario 2: User Input Validation

You have a contact form and need to ensure users write in English.

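  • Best choice: franc
  • Why? Fast detection lets you reject or flag messages in the wrong language.
// franc: Validate input
const lang = franc(inputText);
// 'und' means franc could not decide (for example, very short input), so let it pass
if (lang !== 'eng' && lang !== 'und') {
  throw new Error('Please write in English');
}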
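
One way to structure this check so it is easy to test is to inject the detector. In the sketch below, `checkLanguage` and `fakeDetect` are hypothetical names: `detect` can be any function returning an ISO 639-3 code (franc would fit), and the stand-in detector exists only so the example runs without dependencies.

```javascript
// `detect` is any function that returns an ISO 639-3 code, for example franc.
// It is passed in so this sketch runs without installing a detector.
function checkLanguage(text, detect, expected = 'eng') {
  const code = detect(text);
  // 'und' means "undetermined": accept it rather than rejecting short messages.
  if (code === 'und') return { ok: true, code, reason: 'undetermined' };
  const ok = code === expected;
  return { ok, code, reason: ok ? 'match' : 'mismatch' };
}

// Stand-in detector for the demo only; replace it with a real one.
const fakeDetect = (text) =>
  text.length < 10 ? 'und' : /\bthe\b/i.test(text) ? 'eng' : 'deu';

console.log(checkLanguage('Hi', fakeDetect));                        // ok: true (undetermined)
console.log(checkLanguage('Where is the station?', fakeDetect));     // ok: true (match)
console.log(checkLanguage('Wo ist der Bahnhof bitte?', fakeDetect)); // ok: false (mismatch)
```

Returning a reason instead of throwing leaves the caller free to decide whether an undetermined result should pass, warn, or trigger a follow-up question.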

Scenario 3: Chatbot Intent Parsing

You need to extract names and locations from simple chat messages.

  • Best choice: compromise
  • Why? Entity extraction is built-in and lightweight.
// compromise: Extract entities  
const doc = compromise('Book a flight to London for John');
const locations = doc.places().out('array'); // ['London']

Scenario 4: Code Snippet Detection

You are building a documentation site and need to tag code blocks.

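  • Best choice: linguist-js
  • Why? It applies GitHub Linguist's rules for programming languages. Detection leans on file names and extensions, so give each snippet a representative file name.
// linguist-js: Detect code
// Sketch using the promise API with the fileContent option; 'snippet.py' is a made-up file name
linguist(['snippet.py'], { fileContent: ['def hello(): print("hi")'] }).then(({ files }) => {
  console.log(files.results); // maps each file path to its detected language
});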
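
Linguist-style detection roughly works by trying cheap signals first, such as a shebang line or the file extension, before falling back to content heuristics. The sketch below shows only those first two steps with a tiny hand-written table; it is not Linguist's data set or linguist-js's code, and `guessCodeLanguage` is a hypothetical name.

```javascript
// Hypothetical, hand-written tables: NOT Linguist's real data set.
const BY_EXTENSION = new Map([['.js', 'JavaScript'], ['.py', 'Python'], ['.rb', 'Ruby']]);
const BY_INTERPRETER = new Map([['node', 'JavaScript'], ['python3', 'Python'], ['ruby', 'Ruby']]);

function guessCodeLanguage(fileName, content = '') {
  // 1. A shebang on the first line names the interpreter directly.
  const firstLine = content.split('\n', 1)[0];
  const shebang = firstLine.match(/^#!.*?(\w+)\s*$/);
  if (shebang && BY_INTERPRETER.has(shebang[1])) return BY_INTERPRETER.get(shebang[1]);

  // 2. Otherwise fall back to the file extension.
  const dot = fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : fileName.slice(dot).toLowerCase();
  return BY_EXTENSION.get(ext) || null; // null: no signal matched
}

console.log(guessCodeLanguage('hello.py', 'def hello(): print("hi")'));       // Python
console.log(guessCodeLanguage('run', '#!/usr/bin/env node\nconsole.log(1)')); // JavaScript
console.log(guessCodeLanguage('notes.txt', 'hi'));                            // null
```

This is why a bare code block with no file name gives a detector little to work with: most of the signal lives outside the snippet's text.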

Maintenance and Compatibility

Library stability is crucial for long-term projects.

natural is mature but has had periods of low activity.

  • It works well in Node.js.
  • Browser support requires bundler configuration.
// natural: Node.js focus  
// May require polyfills for browser usage

compromise is actively maintained.

  • Regular updates for grammar rules.
  • Strong TypeScript support.
// compromise: Modern ES modules  
import compromise from 'compromise';

franc is stable and minimal.

  • Rarely changes because the algorithm is fixed.
  • Safe for production.
// franc: Stable API (named export; recent major versions are ESM-only)
import { franc } from 'franc';

linguist-js depends on GitHub's data.

  • Updates when Linguist updates.
  • Can be slower to adapt to new languages.
// linguist-js: Async API
// Returns a promise, so use await or .then()

Summary Table

| Feature | natural | compromise | franc | linguist-js |
| --- | --- | --- | --- | --- |
| Primary Use | Full NLP Suite | Text Parsing | Language Detection | Code Language Detection |
| Stemming | Yes (Porter, etc.) | No (Normalization) | No | No |
| Entity Extraction | Basic | Advanced | No | No |
| Language ID | No | No | Yes (ISO 639-3) | Programming languages only (GitHub Linguist) |
| Bundle Size | Large | Small | Tiny | Medium |
| Environment | Node/Browser | Browser/Node | Browser/Node | Browser/Node |

The Big Picture

natural is the heavy lifter. Use it when you need serious text analysis like sentiment scoring or search indexing on the server.

compromise is the swift parser. Use it in the browser for quick text manipulation, entity extraction, or lightweight chatbots.

franc is the gatekeeper. Use it to quickly check what language a user is writing in before processing their input.

linguist-js is the specialist. Use it when you need GitHub Linguist's rules for detecting the programming language of source files.

Final Thought: Don't use a cannon to kill a fly. If you just need to know the language, franc is better than natural. If you need to parse sentences in the browser, compromise is better than natural. Match the tool to the specific text problem you are solving.

How to Choose: compromise vs franc vs linguist-js vs natural

  • compromise:

    Choose compromise if you need fast, client-side text parsing and entity extraction without the overhead of a full NLP suite. It is ideal for lightweight applications like browser-based search filters, simple chatbots, or text normalization where speed and small bundle size are critical. It excels at matching patterns in text rather than deep linguistic analysis.

  • franc:

    Choose franc if your primary requirement is human-language detection with a very small footprint. It is perfect for forms or content inputs where you need to validate or tag the language before processing. It is more focused than natural, and unlike linguist-js it targets human languages rather than programming languages. Accuracy improves with longer input; very short strings come back as 'und'.

  • linguist-js:

    Choose linguist-js if you need programming-language detection compatible with GitHub's Linguist logic. It is suitable for projects that already rely on GitHub's language heuristics, such as repository statistics or tooling that classifies source files. It does not identify human languages, so pair it with franc when you need both.

  • natural:

    Choose natural if you require a full-featured NLP toolkit including stemming, tf-idf, classification, and phonetics. It is best for server-side Node.js applications or heavy-duty text analysis where you need established algorithms like Porter Stemmer or Soundex. Avoid it for lightweight client-side tasks due to its larger size and broader scope.

README for compromise

compromise
modest natural language processing
npm install compromise
french • german • italian • spanish
don't you find it strange,
    how easy text is to make,

    and how hard it is to actually parse and use?

compromise tries its best to turn text into data.
it makes limited and sensible decisions.
it's not as smart as you'd think.
import nlp from 'compromise'

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'
don't be fancy, at all:
if (doc.has('simon says #Verb')) {
  return true
}
grab parts of the text:
let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"

and get data:

import plg from 'compromise-speech'
nlp.extend(plg)

let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
  "text": "Milwaukee",
  "terms": [{
    "normal": "milwaukee",
    "syllables": ["mil", "wau", "kee"]
  }]
}]
*/

avoid the problems of brittle parsers:

let doc = nlp("we're not gonna take it..")

doc.has('gonna') // true
doc.has('going to') // true (implicit)

// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'

and whip stuff around like it's data:

let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'

-because it actually is-

let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

Use it on the client-side:

<script src="https://unpkg.com/compromise"></script>
<script>
  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>

or likewise:

import nlp from 'compromise'

var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'

compromise is ~250kb (minified).

it's pretty fast. It can run on keypress.

it works mainly by conjugating all forms of a basic word list.

The final lexicon is ~14,000 words.

you can read more about how it works, here. it's weird.
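
As a rough picture of "conjugating a basic word list", the sketch below expands three base verbs into a small lookup table. The rules and tag names are simplified stand-ins chosen for illustration, not compromise's actual conjugation logic, and `conjugate` and `buildLexicon` are hypothetical helpers.

```javascript
// Toy version of "conjugate a small word list into a bigger lexicon".
// Rules and tag names are simplified stand-ins, not compromise's own.
function conjugate(verb) {
  const base = verb.endsWith('e') ? verb.slice(0, -1) : verb; // bake -> bak
  return {
    [verb]: 'Infinitive',
    [verb + 's']: 'PresentTense',
    [base + 'ed']: 'PastTense',
    [base + 'ing']: 'Gerund',
  };
}

function buildLexicon(verbs) {
  const lexicon = {};
  for (const v of verbs) Object.assign(lexicon, conjugate(v));
  return lexicon;
}

const lexicon = buildLexicon(['walk', 'bake', 'jump']);
console.log(Object.keys(lexicon).length); // 12
console.log(lexicon['baking']);           // Gerund
```

Three base words become twelve entries, which suggests how a modest word list can grow into a lexicon of many thousands of forms.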

okay -

compromise/one

A tokenizer of words, sentences, and punctuation.

import nlp from 'compromise/one'

let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
  normal:"wayne's world party time",
    terms:[{ text: "Wayne's", normal: "wayne" },
      ...
      ]
  }]
*/

compromise/one splits your text up, wraps it in a handy API,

    and does nothing else -

/one is quick - most sentences take a 10th of a millisecond.

It can do ~1mb of text a second - or 10 wikipedia pages.

Infinite jest takes 3s.

You can also parallelize, or stream text to it with compromise-speed.

compromise/two

A part-of-speech tagger, and grammar-interpreter.

import nlp from 'compromise/two'

let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"

compromise/two automatically calculates the very basic grammar of each word.

this is more useful than people sometimes realize.

Light grammar helps you write cleaner templates, and get closer to the information.

compromise has 83 tags, arranged in a handsome graph.

#FirstName โ†’ #Person โ†’ #ProperNoun โ†’ #Noun

you can see the grammar of each word by running doc.debug()

you can see the reasoning for each tag with nlp.verbose('tagger').
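
The tag chain above implies inheritance: a term tagged #FirstName also counts as a #Person and a #Noun. The sketch below models that with a plain parent map. Only the four tag names come from the text; `expandTag` and `isA` are hypothetical helpers, not compromise's internals.

```javascript
// Toy tag graph. The four tags come from the chain in the text;
// the lookup code is an illustration only.
const parentOf = new Map([
  ['FirstName', 'Person'],
  ['Person', 'ProperNoun'],
  ['ProperNoun', 'Noun'],
]);

// Walk up the graph, collecting the tag and every ancestor.
function expandTag(tag) {
  const chain = [tag];
  while (parentOf.has(chain[chain.length - 1])) {
    chain.push(parentOf.get(chain[chain.length - 1]));
  }
  return chain;
}

// A term tagged #FirstName therefore also answers to #Noun.
function isA(tag, ancestor) {
  return expandTag(tag).includes(ancestor);
}

console.log(expandTag('FirstName'));   // [ 'FirstName', 'Person', 'ProperNoun', 'Noun' ]
console.log(isA('FirstName', 'Noun')); // true
console.log(isA('Noun', 'Person'));    // false
```

Inheritance in one direction is what lets a broad pattern such as #Noun match a specific term while a narrow pattern such as #FirstName stays selective.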

if you prefer Penn tags, you can derive them with:

let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()

compromise/three

Phrase and sentence tooling.

import nlp from 'compromise/three'

let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"

compromise/three is a set of tooling to zoom into and operate on parts of a text.

.numbers() grabs all the numbers in a document, for example - and extends it with new methods, like .subtract().

When you have a phrase, or group of words, you can see additional metadata about it with .json()

let doc = nlp('four out of five dentists')
console.log(doc.fractions().json())
/*[{
    text: 'four out of five',
    terms: [ [Object], [Object], [Object], [Object] ],
    fraction: { numerator: 4, denominator: 5, decimal: 0.8 }
  }
]*/
let doc = nlp('$4.09CAD')
doc.money().json()
/*[{
    text: '$4.09CAD',
    terms: [ [Object] ],
    number: { prefix: '$', num: 4.09, suffix: 'cad'}
  }
]*/

API

Compromise/one

Output
  • .text() - return the document as text
  • .json() - return the document as data
  • .debug() - pretty-print the interpreted document
  • .out() - a named or custom output
  • .html({}) - output custom html tags for matches
  • .wrap({}) - produce custom output for document matches
Utils
  • .found [getter] - is this document empty?
  • .docs [getter] - get term objects as json
  • .length [getter] - count the # of characters in the document (string length)
  • .isView [getter] - identify a compromise object
  • .compute() - run a named analysis on the document
  • .clone() - deep-copy the document, so that no references remain
  • .termList() - return a flat list of all Term objects in match
  • .cache({}) - freeze the current state of the document, for speed-purposes
  • .uncache() - un-freezes the current state of the document, so it may be transformed
  • .freeze({}) - prevent any tags from being removed, in these terms
  • .unfreeze({}) - allow tags to change again, as default
Accessors
Match

(match methods use the match-syntax.)

  • .match('') - return a new Doc, with this one as a parent
  • .not('') - return all results except for this
  • .matchOne('') - return only the first match
  • .if('') - return each current phrase, only if it contains this match ('only')
  • .ifNo('') - Filter-out any current phrases that have this match ('notIf')
  • .has('') - Return a boolean if this match exists
  • .before('') - return all terms before a match, in each phrase
  • .after('') - return all terms after a match, in each phrase
  • .union() - return combined matches without duplicates
  • .intersection() - return only duplicate matches
  • .complement() - get everything not in another match
  • .settle() - remove overlaps from matches
  • .growRight('') - add any matching terms immediately after each match
  • .growLeft('') - add any matching terms immediately before each match
  • .grow('') - add any matching terms before or after each match
  • .sweep(net) - apply a series of match objects to the document
  • .splitOn('') - return a Document with three parts for every match ('splitOn')
  • .splitBefore('') - partition a phrase before each matching segment
  • .splitAfter('') - partition a phrase after each matching segment
  • .join() - merge any neighbouring terms in each match
  • .joinIf(leftMatch, rightMatch) - merge any neighbouring terms under given conditions
  • .lookup([]) - quick find for an array of string matches
  • .autoFill() - create type-ahead assumptions on the document
Tag
  • .tag('') - Give all terms the given tag
  • .tagSafe('') - Only apply tag to terms if it is consistent with current tags
  • .unTag('') - Remove this tag from the given terms
  • .canBe('') - return only the terms that can be this tag
Case
Whitespace
  • .pre('') - add this punctuation or whitespace before each match
  • .post('') - add this punctuation or whitespace after each match
  • .trim() - remove start and end whitespace
  • .hyphenate() - connect words with hyphen, and remove whitespace
  • .dehyphenate() - remove hyphens between words, and set whitespace
  • .toQuotations() - add quotation marks around these matches
  • .toParentheses() - add brackets around these matches
Loops
  • .map(fn) - run each phrase through a function, and create a new document
  • .forEach(fn) - run a function on each phrase, as an individual document
  • .filter(fn) - return only the phrases that return true
  • .find(fn) - return a document with only the first phrase that matches
  • .some(fn) - return true or false if there is one matching phrase
  • .random(fn) - sample a subset of the results
Insert
Transform
  • .sort('method') - re-arrange the order of the matches (in place)
  • .reverse() - reverse the order of the matches, but not the words
  • .unique() - remove any duplicate matches
Lib

(these methods are on the main nlp object)

compromise/two:

Contractions

compromise/three:

Nouns
Verbs
Numbers
Sentences
Adjectives
Misc selections

.extend():

This library comes with a considerate, common-sense baseline for english grammar.

You're free to change, or lay-waste to any settings - which is the fun part actually.

the easiest part is just to suggest tags for any given words:

let myWords = {
  kermit: 'FirstName',
  fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)

or make heavier changes with a compromise-plugin.

import nlp from 'compromise'
nlp.extend({
  // add new tags
  tags: {
    Character: {
      isA: 'Person',
      notA: 'Adjective',
    },
  },
  // add or change words in the lexicon
  words: {
    kermit: 'Character',
    gonzo: 'Character',
  },
  // change inflections
  irregulars: {
    get: {
      pastTense: 'gotten',
      gerund: 'gettin',
    },
  },
  // add new methods to compromise
  api: View => {
    View.prototype.kermitVoice = function () {
      this.sentences().prepend('well,')
      this.match('i [(am|was)]').prepend('um,')
      return this
    }
  },
})

Docs:

gentle introduction:
Documentation:
| Concepts | API | Plugins |
| --- | --- | --- |
| Accuracy | Accessors | Adjectives |
| Caching | Constructor-methods | Dates |
| Case | Contractions | Export |
| Filesize | Insert | Hash |
| Internals | Json | Html |
| Justification | Character Offsets | Keypress |
| Lexicon | Loops | Ngrams |
| Match-syntax | Match | Numbers |
| Performance | Nouns | Paragraphs |
| Plugins | Output | Scan |
| Projects | Selections | Sentences |
| Tagger | Sorting | Syllables |
| Tags | Split | Pronounce |
| Tokenization | Text | Strict |
| Named-Entities | Utils | Penn-tags |
| Whitespace | Verbs | Typeahead |
| World data | Normalization | Sweep |
| Fuzzy-matching | Typescript | Mutation |
| Root-forms | | |

Plugins:

These are some helpful extensions:

Dates

npm install compromise-dates

Stats

npm install compromise-stats

Speech

npm install compromise-speech

Wikipedia

npm install compromise-wikipedia


Typescript

we're committed to typescript/deno support, both in main and in the official-plugins:

import nlp from 'compromise'
import stats from 'compromise-stats'

const nlpEx = nlp.extend(stats)

nlpEx('This is type safe!').ngrams({ min: 1 })

Limitations:

  • slash-support: We currently split slashes up as different words, like we do for hyphens, so things like this don't work: nlp('the koala eats/shoots/leaves').has('koala leaves') // false

  • inter-sentence match: By default, sentences are the top-level abstraction. Inter-sentence, or multi-sentence matches aren't supported without a plugin: nlp("that's it. Back to Winnipeg!").has('it back') // false

  • nested match syntax: the beauty, and danger, of regex is that you can recurse indefinitely. Our match syntax is much weaker. Things like this are not (yet) possible: doc.match('(modern (major|minor))? general'). Complex matches must be achieved with successive .match() statements.

  • dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.

FAQ

    Isn't javascript too...

      yeah it is!
      it wasn't built to compete with NLTK, and may not fit every project.
      string processing is synchronous too, and parallelizing node processes is weird.
      See here for information about speed & performance, and here for project motivations

    Can it run on my arduino-watch?

      Only if it's water-proof!
      Read quick start for running compromise in workers, mobile apps, and all sorts of funny environments.

    Compromise in other Languages?

    Partial builds?

      we do offer a tokenize-only build, which has the POS-tagger pulled-out.
      but otherwise, compromise isn't easily tree-shaken.
      the tagging methods are competitive, and greedy, so it's not recommended to pull things out.
      Note that without a full POS-tagging, the contraction-parser won't work perfectly. ((spencer's cool) vs. (spencer's house))
      It's recommended to run the library fully.

See Also:

  • en-pos - very clever javascript pos-tagger by Alex Corvi

  • naturalNode - fancier statistical nlp in javascript

  • winkJS - POS-tagger, tokenizer, machine-learning in javascript

  • dariusk/pos-js - fastTag fork in javascript

  • compendium-js - POS and sentiment analysis in javascript

  • nodeBox linguistics - conjugation, inflection in javascript

  • reText - very impressive text utilities in javascript

  • superScript - conversation engine in js

  • jsPos - javascript build of the time-tested Brill-tagger

  • spaCy - speedy, multilingual tagger in C/python

  • Prose - quick tagger in Go by Joseph Kato

  • TextBlob - python tagger

MIT