string-similarity vs natural vs similarity vs fuzzyset
String Similarity and Fuzzy Matching Libraries Comparison
What Are String Similarity and Fuzzy Matching Libraries?

These libraries measure how similar two strings are, which is useful in applications such as search engines, data deduplication, and natural language processing. They implement different algorithms to compute similarity scores, so developers can choose the one that best fits their use case. Good string matching leads to better search results, improved data matching, and more intuitive interfaces.

Stat Detail

| Package           | Downloads | Stars  | Size    | Issues | Publish      | License        |
| string-similarity | 2,355,417 | 2,526  | -       | 23     | 4 years ago  | ISC            |
| natural           | 215,353   | 10,708 | 13.8 MB | 81     | 7 months ago | MIT            |
| similarity        | 96,282    | 75     | -       | 0      | 5 years ago  | ISC            |
| fuzzyset          | 19,567    | 1,373  | 35.6 kB | 1      | 3 years ago  | see LICENSE.md |

Feature Comparison: string-similarity vs natural vs similarity vs fuzzyset

Algorithm Type

  • string-similarity:

    String-similarity is based on Dice's coefficient, as its README states, and scores two strings by the adjacent character pairs they share. It returns a value between 0 and 1 and copes well with short strings and common typographical errors.

  • natural:

    Natural implements various algorithms including Jaro-Winkler, Levenshtein, and cosine similarity, providing a comprehensive toolkit for different similarity needs in natural language processing.

  • similarity:

    Similarity focuses on cosine similarity and Jaccard index, making it suitable for applications that require vector space models and set comparisons.

  • fuzzyset:

    FuzzySet breaks strings into character n-grams to find candidate matches and can use Levenshtein distance to score them, allowing for flexible matching that accounts for typos and variations in string input.
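
Levenshtein distance comes up repeatedly in the list above, so a small illustration may help. The following is a minimal sketch of the textbook algorithm, not code taken from any of these packages. The names `levenshtein` and `levenshteinSimilarity` are hypothetical, and dividing by the longer string's length is just one common way to turn a distance into a 0 to 1 score.

```javascript
// Textbook dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: scale the distance into a 0..1 score (1 = identical).
function levenshteinSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(levenshteinSimilarity("healed", "sealed"));
```

Because the score depends on edit operations rather than shared character pairs, it will generally differ from the rating a Dice-based library gives for the same inputs.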

Performance

  • string-similarity:

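    String-similarity is lightweight and fast. It has had no production dependencies since version 2.0.0, and since version 3.0.0 compareTwoStrings runs in O(n) instead of O(n^2), so it suits applications that need quick similarity checks without heavy computational overhead.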

  • natural:

    Natural's performance depends on the algorithm used. It is generally efficient for most NLP tasks, but it may be slower than FuzzySet on large datasets.

  • similarity:

    Similarity is designed for efficiency in calculating similarity scores, particularly for larger datasets, but may require more memory for complex calculations.

  • fuzzyset:

    FuzzySet is optimized for performance, making it suitable for applications that require real-time fuzzy matching, such as search suggestions and autocomplete features.

Use Cases

  • string-similarity:

    String-similarity is ideal for applications that need to compare short strings, such as user input validation, form field matching, and simple search functionalities.

  • natural:

    Natural is versatile and can be used in a wide range of NLP applications, including sentiment analysis, tokenization, and text classification.

  • similarity:

    Similarity is particularly useful in scenarios involving document comparison, plagiarism detection, and recommendation systems based on textual content.

  • fuzzyset:

    FuzzySet is best suited for applications that require fast fuzzy matching, such as search engines, spell checkers, and data cleaning tools.

Ease of Use

  • string-similarity:

    String-similarity is user-friendly and easy to implement, making it a good choice for developers who need quick and effective string comparison.

  • natural:

    Natural provides a rich set of features but may have a steeper learning curve due to its comprehensive nature and various algorithms available.

  • similarity:

    Similarity offers a simple API, making it easy to use for basic string comparison tasks without extensive setup.

  • fuzzyset:

    FuzzySet has a straightforward API that makes it easy to integrate into projects, especially for developers looking for quick fuzzy matching solutions.

Community and Maintenance

  • string-similarity:

    String-similarity is lightweight and has by far the most weekly downloads of the four (about 2.4 million), but its last release was 4 years ago, so do not expect frequent updates.

  • natural:

    Natural has the largest community of the four by GitHub stars (10,708) and extensive documentation, making it easier to find resources and support for various NLP tasks.

  • similarity:

    Similarity has a small community presence (75 GitHub stars) and was last published 5 years ago, so fewer resources are available than for larger libraries.

  • fuzzyset:

    FuzzySet has a smaller community, and its last release was several years ago, so check that it still meets your needs before adopting it.

How to Choose: string-similarity vs natural vs similarity vs fuzzyset
  • fuzzyset:

    Choose FuzzySet if you need a library that provides fuzzy matching capabilities with a focus on performance and efficiency. It is particularly useful for applications that require quick lookups and can handle large datasets effectively.

README for string-similarity

string-similarity

Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.
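
To make the idea concrete, here is a minimal sketch of Dice's coefficient computed over character bigrams. It is an illustration only, not the package's source. The function names `bigramCounts` and `diceCoefficient` are hypothetical, and the whitespace stripping is an assumption based on the 3.0.0 release note about disregarding spaces.

```javascript
// Count each pair of adjacent characters (bigram) in a string.
function bigramCounts(text) {
  const counts = new Map();
  for (let i = 0; i < text.length - 1; i++) {
    const gram = text.substring(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice's coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b).
function diceCoefficient(first, second) {
  // Assumption: whitespace is ignored before comparing.
  const a = first.replace(/\s+/g, "");
  const b = second.replace(/\s+/g, "");
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0; // no bigrams to compare

  const counts = bigramCounts(a);
  let shared = 0;
  for (let i = 0; i < b.length - 1; i++) {
    const gram = b.substring(i, i + 2);
    const remaining = counts.get(gram) || 0;
    if (remaining > 0) {
      counts.set(gram, remaining - 1); // each bigram of a can be matched once
      shared++;
    }
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

console.log(diceCoefficient("healed", "sealed")); // 0.8
```

"healed" and "sealed" each have five bigrams and share four of them ("ea", "al", "le", "ed"), which gives 2 × 4 / 10 = 0.8.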

Usage

For Node.js

Install using:

npm install string-similarity --save

In your code:

var stringSimilarity = require("string-similarity");

var similarity = stringSimilarity.compareTwoStrings("healed", "sealed");

var matches = stringSimilarity.findBestMatch("healed", [
  "edward",
  "sealed",
  "theatre",
]);

For browser apps

Include <script src="//unpkg.com/string-similarity/umd/string-similarity.min.js"></script> to get the latest version.

Or <script src="//unpkg.com/string-similarity@4.0.1/umd/string-similarity.min.js"></script> to get a specific version (4.0.1 in this case).

This exposes a global variable called stringSimilarity which you can start using.

<script>
  stringSimilarity.compareTwoStrings('what!', 'who?');
</script>

(The package is exposed as UMD, so you can consume it as such)

API

The package contains two methods:

compareTwoStrings(string1, string2)

Returns a fraction between 0 and 1, which indicates the degree of similarity between the two strings. 0 indicates completely different strings, 1 indicates identical strings. The comparison is case-sensitive.

Arguments
  1. string1 (string): The first string
  2. string2 (string): The second string

Order does not make a difference.

Returns

(number): A fraction from 0 to 1, both inclusive. Higher number indicates more similarity.

Examples
stringSimilarity.compareTwoStrings("healed", "sealed");
// → 0.8

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "For sale: table in very good  condition, olive green in colour."
);
// → 0.6060606060606061

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "For sale: green Subaru Impreza, 210,000 miles"
);
// → 0.2558139534883721

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "Wanted: mountain bike with at least 21 gears."
);
// → 0.1411764705882353
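
Because the comparison is case-sensitive, callers who want "Healed" and "healed" to count as the same string need to normalise the inputs first. A minimal sketch follows, assuming a hypothetical wrapper named `compareIgnoringCase` that accepts any comparison function of the shape `(string1, string2) => number`. The `exactMatch` comparator is a stand-in so the snippet runs without installing anything.

```javascript
// Hypothetical wrapper: trim and lower-case both inputs, then delegate
// to whatever comparison function the caller supplies.
function compareIgnoringCase(compare, string1, string2) {
  const normalize = (s) => s.trim().toLowerCase();
  return compare(normalize(string1), normalize(string2));
}

// Stand-in comparator for illustration only; in a real project you would
// pass stringSimilarity.compareTwoStrings here instead.
const exactMatch = (a, b) => (a === b ? 1 : 0);

console.log(exactMatch("Healed", "healed"));                       // 0
console.log(compareIgnoringCase(exactMatch, "Healed", "healed ")); // 1
```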

findBestMatch(mainString, targetStrings)

Compares mainString against each string in targetStrings.

Arguments
  1. mainString (string): The string to match each target string against.
  2. targetStrings (Array): Each string in this array will be matched against the main string.

Returns

(Object): An object with a ratings property, which gives a similarity rating for each target string, a bestMatch property, which specifies which target string was most similar to the main string, and a bestMatchIndex property, which specifies the index of the bestMatch in the targetStrings array.

Examples
stringSimilarity.findBestMatch('Olive-green table for sale, in extremely good condition.', [
  'For sale: green Subaru Impreza, 210,000 miles',
  'For sale: table in very good condition, olive green in colour.',
  'Wanted: mountain bike with at least 21 gears.'
]);
// →
{ ratings:
   [ { target: 'For sale: green Subaru Impreza, 210,000 miles',
       rating: 0.2558139534883721 },
     { target: 'For sale: table in very good condition, olive green in colour.',
       rating: 0.6060606060606061 },
     { target: 'Wanted: mountain bike with at least 21 gears.',
       rating: 0.1411764705882353 } ],
  bestMatch:
   { target: 'For sale: table in very good condition, olive green in colour.',
     rating: 0.6060606060606061 },
  bestMatchIndex: 1
}
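
findBestMatch always reports a best match, even when every rating is low, so callers may want to apply their own cut-off. The sketch below assumes a hypothetical helper named `bestMatchAbove` and works on a plain object with the shape documented above. The thresholds 0.5 and 0.8 are arbitrary values chosen for illustration.

```javascript
// Hypothetical helper: accept the best match only if its rating reaches the threshold.
function bestMatchAbove(result, threshold) {
  const { bestMatch, bestMatchIndex } = result;
  if (bestMatch.rating < threshold) return null;
  return { target: bestMatch.target, index: bestMatchIndex, rating: bestMatch.rating };
}

// Result object copied from the findBestMatch example above.
const result = {
  ratings: [
    { target: "For sale: green Subaru Impreza, 210,000 miles", rating: 0.2558139534883721 },
    { target: "For sale: table in very good condition, olive green in colour.", rating: 0.6060606060606061 },
    { target: "Wanted: mountain bike with at least 21 gears.", rating: 0.1411764705882353 },
  ],
  bestMatch: {
    target: "For sale: table in very good condition, olive green in colour.",
    rating: 0.6060606060606061,
  },
  bestMatchIndex: 1,
};

console.log(bestMatchAbove(result, 0.5)); // the "table" listing, at index 1
console.log(bestMatchAbove(result, 0.8)); // null
```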

Release Notes

2.0.0

  • Removed production dependencies
  • Updated to ES6 (this breaks backward-compatibility for pre-ES6 apps)

3.0.0

  • Performance improvement for compareTwoStrings(..): now O(n) instead of O(n^2)
  • The algorithm has been tweaked slightly to disregard spaces and word boundaries. This will change the rating values slightly but not enough to make a significant difference
  • Adding a bestMatchIndex to the results for findBestMatch(..) to point to the best match in the supplied targetStrings array

3.0.1

  • Refactoring: removed unused functions; used substring instead of substr
  • Updated dependencies

4.0.1

  • Distributing as a UMD build to be used in browsers.

4.0.2

  • Update dependencies to latest versions.

4.0.3

  • Make compatible with IE and ES5. Also, update deps. (see PR56)

4.0.4

  • Simplify some conditional statements. Also, update deps. (see PR50)
