compromise vs franc vs linguist-js vs natural
Natural Language Processing Libraries

Natural Language Processing (NLP) libraries are essential tools for developers working with text data, enabling them to analyze, understand, and generate human language in a meaningful way. These libraries provide various functionalities such as language detection, tokenization, part-of-speech tagging, and more, making it easier to build applications that can interact with users in natural language. The choice of an NLP library can significantly impact the efficiency and effectiveness of text processing tasks, depending on the specific requirements of the project.

Stat Detail

Package      Downloads  Stars   Size     Issues  Publish      License
compromise   0          12,053  2.59 MB  119     20 days ago  MIT
franc        0          4,380   272 kB   6       2 years ago  MIT
linguist-js  0          46      250 kB   2       a year ago   ISC
natural      0          10,873  13.8 MB  80      18 days ago  MIT

Feature Comparison: compromise vs franc vs linguist-js vs natural

Language Detection

  • compromise:

    Compromise does not detect languages. Its core library ships a baseline for English grammar, so text should already be known to be English before it is passed in.

  • franc:

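    Franc specializes in language detection. It compares the character n-grams (trigrams) of the input against per-language profiles and returns the most likely language. Coverage depends on the build: the 400+ figure applies to the franc-all package, while the default franc package covers fewer languages. Results are more reliable on longer input than on very short strings, which makes it a common choice as a preprocessing step in multilingual applications.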

  • linguist-js:

    Linguist-js is designed to detect programming languages rather than natural languages, making it ideal for analyzing code repositories and identifying the languages used in source files.

  • natural:

    Natural does not focus specifically on language detection but can be integrated with other libraries for this purpose, providing a more comprehensive NLP solution.
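
To make the n-gram idea concrete, here is a minimal sketch of trigram-based language guessing. It is not Franc's code or API: the two sample sentences, the `SAMPLES` table, and the `guessLanguage` helper are all made up for illustration, and a real detector uses far larger profiles and a distance measure rather than a simple overlap count.

```javascript
// Toy trigram language guesser. Illustrative only: NOT franc's implementation,
// and the tiny sample texts below are invented for this demo.
const SAMPLES = {
  eng: 'the quick brown fox jumps over the lazy dog and then the cat runs with this and that',
  fra: 'le chat et le chien sont dans la maison avec les enfants et une voiture pour nous',
}

// Split text into overlapping 3-character slices, padded with a space at each edge.
function trigrams(text) {
  const clean = ' ' + text.toLowerCase().replace(/[^\p{L}]+/gu, ' ').trim() + ' '
  const grams = []
  for (let i = 0; i + 3 <= clean.length; i++) {
    grams.push(clean.slice(i, i + 3))
  }
  return grams
}

// Build one set of known trigrams per language.
const PROFILES = Object.fromEntries(
  Object.entries(SAMPLES).map(([lang, sample]) => [lang, new Set(trigrams(sample))])
)

// Score each language by how many of the input's trigrams appear in its profile.
// Returns 'und' (undetermined) when nothing overlaps.
function guessLanguage(text) {
  const grams = trigrams(text)
  let best = { lang: 'und', score: 0 }
  for (const [lang, profile] of Object.entries(PROFILES)) {
    const score = grams.filter((gram) => profile.has(gram)).length
    if (score > best.score) best = { lang, score }
  }
  return best.lang
}

console.log(guessLanguage('the dog and the cat')) // → 'eng'
console.log(guessLanguage('le chien et la maison')) // → 'fra'
```

With profiles this small the guess is fragile, which is the same reason real detectors tend to be unreliable on very short input.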

Text Processing Capabilities

  • compromise:

    Compromise offers a range of text processing capabilities, including part-of-speech tagging, noun phrase extraction, and text transformation. It is designed for quick and efficient manipulation of natural language text.

  • franc:

    Franc is limited to language detection and does not provide text processing capabilities beyond identifying the language of a given text.

  • linguist-js:

    Linguist-js focuses on analyzing programming languages and does not provide general text processing functionalities for natural language.

  • natural:

    Natural provides extensive text processing capabilities, including tokenization, stemming, classification, and sentiment analysis, making it suitable for a wide range of NLP tasks.

Performance

  • compromise:

    Compromise is lightweight and optimized for performance, allowing for fast text processing without significant overhead. It is suitable for applications where speed is a priority.

  • franc:

    Franc is designed for high performance in language detection, providing quick results even with large text inputs, making it efficient for real-time applications.

  • linguist-js:

    Linguist-js is efficient in analyzing code and can quickly identify programming languages, but its performance may vary depending on the complexity of the code being analyzed.

  • natural:

    Natural is comprehensive but may have performance trade-offs due to its extensive feature set. It is suitable for applications where a wide range of NLP functionalities is required, but performance optimization may be necessary for large datasets.

Ease of Use

  • compromise:

    Compromise is user-friendly and has a straightforward API, making it easy for developers to integrate into their applications without a steep learning curve.

  • franc:

    Franc is simple to use, with minimal setup required for language detection, making it accessible for developers who need quick language identification.

  • linguist-js:

    Linguist-js is designed for developers familiar with language analysis; however, it may require some understanding of programming language syntax for effective use.

  • natural:

    Natural has a steeper learning curve due to its comprehensive feature set, but it provides extensive documentation and examples to help developers get started.

Extensibility

  • compromise:

    Compromise is designed to be extensible, allowing developers to create custom plugins and enhance its capabilities for specific use cases.

  • franc:

    Franc is not designed for extensibility; it focuses solely on language detection without providing hooks for additional functionality.

  • linguist-js:

    Linguist-js can be extended to support additional programming languages, making it flexible for developers working with diverse codebases.

  • natural:

    Natural is extensible and allows developers to add custom algorithms and models, making it suitable for specialized NLP tasks.
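
As a rough picture of what "extending support for additional programming languages" can mean, the sketch below detects a language from a file extension and lets the caller register new ones. Everything here is hypothetical: `EXTENSIONS`, `registerLanguage`, `detectByExtension`, and `countLanguages` are illustrative names, not the Linguist-js API, and extension lookup is only one of several signals a real detector would use.

```javascript
// Hypothetical extension table for illustration; NOT linguist-js data or API.
const EXTENSIONS = new Map([
  ['.js', 'JavaScript'],
  ['.ts', 'TypeScript'],
  ['.py', 'Python'],
  ['.rs', 'Rust'],
])

// Add (or override) an extension-to-language mapping.
function registerLanguage(extension, name) {
  EXTENSIONS.set(extension.toLowerCase(), name)
}

// Return the language for a file name, or null when the extension is unknown.
function detectByExtension(filename) {
  const dot = filename.lastIndexOf('.')
  if (dot <= 0) return null // no extension, or a dotfile such as ".gitignore"
  return EXTENSIONS.get(filename.slice(dot).toLowerCase()) ?? null
}

// Tally languages across a list of file names.
function countLanguages(files) {
  const counts = {}
  for (const file of files) {
    const lang = detectByExtension(file) ?? 'Unknown'
    counts[lang] = (counts[lang] || 0) + 1
  }
  return counts
}

registerLanguage('.zig', 'Zig')
console.log(countLanguages(['src/index.js', 'src/util.js', 'build.zig', 'Makefile']))
// → { JavaScript: 2, Zig: 1, Unknown: 1 }
```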

How to Choose: compromise vs franc vs linguist-js vs natural

  • compromise:

    Choose Compromise if you need a lightweight library focused on natural language understanding and manipulation, particularly for tasks like parsing, tagging, and transforming text. It is ideal for applications that require quick and simple text processing without the overhead of more complex NLP solutions.

  • franc:

    Choose Franc if your primary requirement is language detection. It is a small, fast library that can identify several hundred languages (over 400 with the franc-all build), making it suitable for applications that need to handle multilingual content or require language identification as a preprocessing step.

  • linguist-js:

    Choose Linguist-js if you need to analyze and classify programming languages rather than natural languages. It is particularly useful for applications that involve code analysis or require language detection in source code files, providing insights into the languages used in a codebase.

  • natural:

    Choose Natural if you are looking for a comprehensive NLP toolkit that includes a wide range of functionalities such as tokenization, stemming, classification, and phonetics. It is suitable for more complex NLP tasks and provides a robust set of tools for building sophisticated language processing applications.

README for compromise

compromise
modest natural language processing
npm install compromise
don't you find it strange,
    how easy text is to make,

     ᔐᖜ   and how hard it is to actually parse and use?

compromise tries its best to turn text into data.
it makes limited and sensible decisions.
it's not as smart as you'd think.
import nlp from 'compromise'

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'
don't be fancy, at all:
if (doc.has('simon says #Verb')) {
  return true
}
grab parts of the text:
let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"
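
For readers wondering what a pattern like 'simon says #Verb' is doing, here is a minimal sketch of matching over tagged terms. It is a toy, not compromise's matcher: the terms are tagged by hand (standing in for a real tagger), and `termMatches` and `matchPattern` are hypothetical helpers that only understand literal words and `#Tag` tokens.

```javascript
// Hand-tagged terms stand in for the output of a real tagger (an assumption for this demo).
const terms = [
  { text: 'simon', tags: ['Noun'] },
  { text: 'says', tags: ['Verb'] },
  { text: 'jump', tags: ['Verb'] },
  { text: 'high', tags: ['Adverb'] },
]

// A '#Tag' token matches any term carrying that tag; anything else must match the word itself.
function termMatches(term, token) {
  if (token.startsWith('#')) return term.tags.includes(token.slice(1))
  return term.text.toLowerCase() === token.toLowerCase()
}

// Slide the pattern across the terms and return the first contiguous match as text, or null.
function matchPattern(termList, pattern) {
  const tokens = pattern.trim().split(/\s+/)
  for (let start = 0; start + tokens.length <= termList.length; start++) {
    const hit = tokens.every((token, i) => termMatches(termList[start + i], token))
    if (hit) {
      return termList
        .slice(start, start + tokens.length)
        .map((term) => term.text)
        .join(' ')
    }
  }
  return null
}

console.log(matchPattern(terms, 'simon says #Verb')) // → 'simon says jump'
console.log(matchPattern(terms, 'simon says #Adjective')) // → null
```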

and get data:

import plg from 'compromise-speech'
nlp.extend(plg)

let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
  "text": "Milwaukee",
  "terms": [{
    "normal": "milwaukee",
    "syllables": ["mil", "wau", "kee"]
  }]
}]
*/

avoid the problems of brittle parsers:

let doc = nlp("we're not gonna take it..")

doc.has('gonna') // true
doc.has('going to') // true (implicit)

// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'
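
The idea of expanding contractions can be sketched with a plain lookup table. This is only an illustration under simplifying assumptions: the `CONTRACTIONS` table and `expandContractions` helper are hypothetical, the text is split on single spaces, and ambiguous cases (such as a possessive 's) are ignored entirely.

```javascript
// Tiny hand-written table; a real parser needs grammar to resolve ambiguous forms.
const CONTRACTIONS = new Map([
  ["we're", 'we are'],
  ['gonna', 'going to'],
  ["don't", 'do not'],
])

// Replace each space-separated word that appears in the table; leave everything else alone.
function expandContractions(text) {
  return text
    .split(' ')
    .map((word) => CONTRACTIONS.get(word.toLowerCase()) ?? word)
    .join(' ')
}

console.log(expandContractions("we're not gonna take it.."))
// → 'we are not going to take it..'
```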

and whip stuff around like it's data:

let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'

-because it actually is-

let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

Use it on the client-side:

<script src="https://unpkg.com/compromise"></script>
<script>
  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>

or likewise:

import nlp from 'compromise'

var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'

compromise is ~250kb (minified).

it's pretty fast. It can run on keypress.

it works mainly by conjugating all forms of a basic word list.

The final lexicon is ~14,000 words.

you can read more about how it works, here. it's weird.
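
To illustrate "conjugating all forms of a basic word list", here is a minimal sketch that expands a few base verbs into a tagged lexicon. It is not how compromise builds its lexicon: `conjugate` and `buildLexicon` are hypothetical helpers, and the rules cover only the simplest regular verbs (no irregulars, no consonant doubling).

```javascript
// Naive regular-verb conjugation. Real English has many irregular and spelling
// exceptions that this toy ignores (an assumption made to keep the sketch short).
function conjugate(verb) {
  const stem = verb.endsWith('e') ? verb.slice(0, -1) : verb
  return {
    Infinitive: verb,
    PresentTense: verb + 's',
    PastTense: stem + 'ed',
    Gerund: stem + 'ing',
  }
}

// Expand every base verb into its forms, and map each form to a tag.
function buildLexicon(baseVerbs) {
  const lexicon = {}
  for (const verb of baseVerbs) {
    for (const [tag, form] of Object.entries(conjugate(verb))) {
      lexicon[form] = tag
    }
  }
  return lexicon
}

const lexicon = buildLexicon(['walk', 'bake'])
console.log(Object.keys(lexicon).length) // → 8
console.log(lexicon['baking']) // → 'Gerund'
```

Two base words become eight lexicon entries here, which gives a feel for how a modest word list can grow into a lexicon of thousands of forms.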

okay -

compromise/one

A tokenizer of words, sentences, and punctuation.

import nlp from 'compromise/one'

let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
  normal:"wayne's world party time",
    terms:[{ text: "Wayne's", normal: "wayne" },
      ...
      ]
  }]
*/

compromise/one splits your text up, wraps it in a handy API,

    and does nothing else -

/one is quick - most sentences take a 10th of a millisecond.

It can do ~1mb of text a second - or 10 wikipedia pages.

Infinite jest takes 3s.

You can also parallelize, or stream text to it with compromise-speed.
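
As a rough sketch of what a tokenizer of sentences and words involves, the snippet below splits text with two regular expressions. It is not compromise/one: `tokenize` is a hypothetical helper, the sentence splitter breaks on abbreviations, and its `normal` form only lowercases and trims punctuation, which is less normalization than the library's own output shows.

```javascript
// Naive tokenizer: sentences first, then whitespace-separated terms.
function tokenize(text) {
  // Split after '.', '!' or '?' when followed by whitespace (abbreviations will fool this).
  const sentences = text.split(/(?<=[.!?])\s+/).filter((sentence) => sentence.length > 0)
  return sentences.map((sentence) => ({
    text: sentence,
    terms: sentence
      .split(/\s+/)
      .filter((word) => word.length > 0)
      .map((word) => ({
        text: word,
        // lowercase, then strip non-letter/non-digit characters from both ends
        normal: word.toLowerCase().replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, ''),
      })),
  }))
}

const data = tokenize("Wayne's World, party time. Excellent!")
console.log(data.length) // → 2
console.log(data[0].terms.map((term) => term.normal))
// → [ "wayne's", 'world', 'party', 'time' ]
```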

compromise/two

A part-of-speech tagger, and grammar-interpreter.

import nlp from 'compromise/two'

let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"

compromise/two automatically calculates the very basic grammar of each word.

this is more useful than people sometimes realize.

Light grammar helps you write cleaner templates, and get closer to the information.

compromise has 83 tags, arranged in a handsome graph.

#FirstName → #Person → #ProperNoun → #Noun

you can see the grammar of each word by running doc.debug()

you can see the reasoning for each tag with nlp.verbose('tagger').
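
The tag graph can be pictured as each tag pointing at a parent. The sketch below models only the #FirstName, #Person, #ProperNoun, #Noun chain; the `TAGS` table, `expandTag`, and `tagIsA` are hypothetical names for illustration, and the `isA` key simply borrows the wording used in compromise plugins rather than reproducing the library's internals.

```javascript
// Each tag names its parent via isA. Only one chain is modeled here;
// the real tag set is much larger.
const TAGS = {
  Noun: {},
  ProperNoun: { isA: 'Noun' },
  Person: { isA: 'ProperNoun' },
  FirstName: { isA: 'Person' },
}

// Walk up the parents, collecting the tag and all of its ancestors.
function expandTag(tag) {
  const chain = []
  const seen = new Set() // guards against accidental cycles in the table
  let current = tag
  while (current && TAGS[current] && !seen.has(current)) {
    chain.push(current)
    seen.add(current)
    current = TAGS[current].isA
  }
  return chain
}

// True when `tag` is `ancestor` itself or descends from it.
function tagIsA(tag, ancestor) {
  return expandTag(tag).includes(ancestor)
}

console.log(expandTag('FirstName')) // → [ 'FirstName', 'Person', 'ProperNoun', 'Noun' ]
console.log(tagIsA('Person', 'Noun')) // → true
console.log(tagIsA('Noun', 'Person')) // → false
```

This is why a pattern written against a general tag can also pick up terms that carry a more specific one.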

if you prefer Penn tags, you can derive them with:

let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()

compromise/three

Phrase and sentence tooling.

import nlp from 'compromise/three'

let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"

compromise/three is a set of tooling to zoom into and operate on parts of a text.

.numbers() grabs all the numbers in a document, for example - and extends it with new methods, like .subtract().

When you have a phrase, or group of words, you can see additional metadata about it with .json()

let doc = nlp('four out of five dentists')
console.log(doc.fractions().json())
/*[{
    text: 'four out of five',
    terms: [ [Object], [Object], [Object], [Object] ],
    fraction: { numerator: 4, denominator: 5, decimal: 0.8 }
  }
]*/

let doc = nlp('$4.09CAD')
doc.money().json()
/*[{
    text: '$4.09CAD',
    terms: [ [Object] ],
    number: { prefix: '$', num: 4.09, suffix: 'cad'}
  }
]*/

API

Compromise/one

Output
  • .text() - return the document as text
  • .json() - return the document as data
  • .debug() - pretty-print the interpreted document
  • .out() - a named or custom output
  • .html({}) - output custom html tags for matches
  • .wrap({}) - produce custom output for document matches
Utils
  • .found [getter] - is this document empty?
  • .docs [getter] - get term objects as json
  • .length [getter] - count the # of characters in the document (string length)
  • .isView [getter] - identify a compromise object
  • .compute() - run a named analysis on the document
  • .clone() - deep-copy the document, so that no references remain
  • .termList() - return a flat list of all Term objects in match
  • .cache({}) - freeze the current state of the document, for speed-purposes
  • .uncache() - un-freezes the current state of the document, so it may be transformed
  • .freeze({}) - prevent any tags from being removed, in these terms
  • .unfreeze({}) - allow tags to change again, as default
Accessors
Match

(match methods use the match-syntax.)

  • .match('') - return a new Doc, with this one as a parent
  • .not('') - return all results except for this
  • .matchOne('') - return only the first match
  • .if('') - return each current phrase, only if it contains this match ('only')
  • .ifNo('') - Filter-out any current phrases that have this match ('notIf')
  • .has('') - Return a boolean if this match exists
  • .before('') - return all terms before a match, in each phrase
  • .after('') - return all terms after a match, in each phrase
  • .union() - return combined matches without duplicates
  • .intersection() - return only duplicate matches
  • .complement() - get everything not in another match
  • .settle() - remove overlaps from matches
  • .growRight('') - add any matching terms immediately after each match
  • .growLeft('') - add any matching terms immediately before each match
  • .grow('') - add any matching terms before or after each match
  • .sweep(net) - apply a series of match objects to the document
  • .splitOn('') - return a Document with three parts for every match ('splitOn')
  • .splitBefore('') - partition a phrase before each matching segment
  • .splitAfter('') - partition a phrase after each matching segment
  • .join() - merge any neighbouring terms in each match
  • .joinIf(leftMatch, rightMatch) - merge any neighbouring terms under given conditions
  • .lookup([]) - quick find for an array of string matches
  • .autoFill() - create type-ahead assumptions on the document
Tag
  • .tag('') - Give all terms the given tag
  • .tagSafe('') - Only apply tag to terms if it is consistent with current tags
  • .unTag('') - Remove this tag from the given terms
  • .canBe('') - return only the terms that can be this tag
Case
Whitespace
  • .pre('') - add this punctuation or whitespace before each match
  • .post('') - add this punctuation or whitespace after each match
  • .trim() - remove start and end whitespace
  • .hyphenate() - connect words with hyphen, and remove whitespace
  • .dehyphenate() - remove hyphens between words, and set whitespace
  • .toQuotations() - add quotation marks around these matches
  • .toParentheses() - add brackets around these matches
Loops
  • .map(fn) - run each phrase through a function, and create a new document
  • .forEach(fn) - run a function on each phrase, as an individual document
  • .filter(fn) - return only the phrases that return true
  • .find(fn) - return a document with only the first phrase that matches
  • .some(fn) - return true or false if there is one matching phrase
  • .random(fn) - sample a subset of the results
Insert
Transform
  • .sort('method') - re-arrange the order of the matches (in place)
  • .reverse() - reverse the order of the matches, but not the words
  • .unique() - remove any duplicate matches
Lib

(these methods are on the main nlp object)

compromise/two:

Contractions

compromise/three:

Nouns
Verbs
Numbers
Sentences
Adjectives
Misc selections

.extend():

This library comes with a considerate, common-sense baseline for english grammar.

You're free to change, or lay-waste to any settings - which is the fun part actually.

the easiest part is just to suggest tags for any given words:

let myWords = {
  kermit: 'FirstName',
  fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)

or make heavier changes with a compromise-plugin.

import nlp from 'compromise'
nlp.extend({
  // add new tags
  tags: {
    Character: {
      isA: 'Person',
      notA: 'Adjective',
    },
  },
  // add or change words in the lexicon
  words: {
    kermit: 'Character',
    gonzo: 'Character',
  },
  // change inflections
  irregulars: {
    get: {
      pastTense: 'gotten',
      gerund: 'gettin',
    },
  },
  // add new methods to compromise
  api: View => {
    View.prototype.kermitVoice = function () {
      this.sentences().prepend('well,')
      this.match('i [(am|was)]').prepend('um,')
      return this
    }
  },
})

Docs:

gentle introduction:
Documentation:
Concepts        API                  Plugins
Accuracy        Accessors            Adjectives
Caching         Constructor-methods  Dates
Case            Contractions         Export
Filesize        Insert               Hash
Internals       Json                 Html
Justification   Character Offsets    Keypress
Lexicon         Loops                Ngrams
Match-syntax    Match                Numbers
Performance     Nouns                Paragraphs
Plugins         Output               Scan
Projects        Selections           Sentences
Tagger          Sorting              Syllables
Tags            Split                Pronounce
Tokenization    Text                 Strict
Named-Entities  Utils                Penn-tags
Whitespace      Verbs                Typeahead
World data      Normalization        Sweep
Fuzzy-matching  Typescript           Mutation
Root-forms
Talks:
Articles:
Some fun Applications:
Comparisons

Plugins:

These are some helpful extensions:

Dates

npm install compromise-dates

Stats

npm install compromise-stats

Speech

npm install compromise-speech

Wikipedia

npm install compromise-wikipedia


Typescript

we're committed to typescript/deno support, both in main and in the official-plugins:

import nlp from 'compromise'
import stats from 'compromise-stats'

const nlpEx = nlp.extend(stats)

nlpEx('This is type safe!').ngrams({ min: 1 })

Limitations:

  • slash-support: We currently split slashes up as different words, like we do for hyphens, so things like this don't work:
    nlp('the koala eats/shoots/leaves').has('koala leaves') // false

  • inter-sentence match: By default, sentences are the top-level abstraction. Inter-sentence, or multi-sentence matches aren't supported without a plugin:
    nlp("that's it. Back to Winnipeg!").has('it back') // false

  • nested match syntax: the beauty (and danger) of regex is that you can recurse indefinitely. Our match syntax is much weaker. Things like this are not (yet) possible:
    doc.match('(modern (major|minor))? general')
    complex matches must be achieved with successive .match() statements.

  • dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.

FAQ

    ☂️ Isn't javascript too...

      yeah it is!
      it wasn't built to compete with NLTK, and may not fit every project.
      string processing is synchronous too, and parallelizing node processes is weird.
      See here for information about speed & performance, and here for project motivations

    💃 Can it run on my arduino-watch?

      Only if it's water-proof!
      Read quick start for running compromise in workers, mobile apps, and all sorts of funny environments.

    🌎 Compromise in other Languages?

    ✨ Partial builds?

      we do offer a tokenize-only build, which has the POS-tagger pulled-out.
      but otherwise, compromise isn't easily tree-shaken.
      the tagging methods are competitive, and greedy, so it's not recommended to pull things out.
      Note that without a full POS-tagging, the contraction-parser won't work perfectly. ((spencer's cool) vs. (spencer's house))
      It's recommended to run the library fully.

See Also:

MIT