string-similarity vs string-similarity-js vs similarity
String Similarity Libraries Comparison
What Are String Similarity Libraries?

String similarity libraries provide algorithms and functions to measure how similar two strings are, which is useful in various applications such as search engines, data deduplication, and natural language processing. These libraries typically implement different algorithms like Levenshtein distance, Jaccard index, and cosine similarity to quantify the degree of similarity between strings. By leveraging these libraries, developers can enhance user experiences through features like fuzzy searching, recommendation systems, and duplicate detection in datasets.

Stat Detail

Package               Weekly Downloads  Stars  Size     Issues  Publish      License
string-similarity     2,355,417         2,526  -        23      4 years ago  ISC
string-similarity-js  209,524           97     12.7 kB  2       -            MIT
similarity            96,282            75     -        0       5 years ago  ISC

Feature Comparison: string-similarity vs string-similarity-js vs similarity

Algorithm Variety

  • string-similarity:

    The 'string-similarity' package is built on a single measure, Dice's coefficient, which its README describes as mostly better than Levenshtein distance. It offers a pairwise comparison and a best-match search over a list of candidate strings.

  • string-similarity-js:

    The 'string-similarity-js' package scores two strings by the short substrings they share, a measure in the Sørensen-Dice family, and exposes it through a single scoring function rather than a menu of algorithms.

  • similarity:

    The 'similarity' package exposes a single function that returns a score between 0 and 1 derived from Levenshtein edit distance, favoring ease of use over a choice of algorithms.
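
Edit distance is one of the measures named in this comparison. As a rough illustration of how a Levenshtein distance can be turned into a 0-to-1 score, here is a minimal sketch. The function names `levenshtein` and `editSimilarity` are hypothetical and are not taken from any of the three packages, and dividing by the longer string's length is just one common normalization.

```javascript
// Illustrative sketch only; not the source of any package discussed here.

// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars of a into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 minus the distance divided by the longer length.
function editSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty strings are treated as identical
  return 1 - levenshtein(a, b) / longest;
}

console.log(levenshtein("kitten", "sitting")); // 3
```

The nested loops make the cost proportional to the product of the two string lengths, which is worth keeping in mind when comparing long inputs.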

Performance

  • string-similarity:

    Since version 3.0.0, compareTwoStrings runs in O(n) rather than O(n^2) time, according to the release notes. findBestMatch compares the main string against every target string, so its cost grows with the number of candidates.

  • string-similarity-js:

    No benchmark figures are given for this package. Its 12.7 kB published size keeps bundle cost low, but any speed difference from the other two libraries should be measured on your own data.

  • similarity:

    This package is presented as suitable for speed-sensitive uses such as real-time search, but no benchmark figures are cited here; test with representative input before relying on it.

Ease of Use

  • string-similarity:

    The API consists of two functions, compareTwoStrings and findBestMatch, both of which report ratings between 0 and 1. Comparison is case-sensitive, so normalize case beforehand if it should not count.

  • string-similarity-js:

    The API is straightforward, with little to learn beyond a single scoring call.

  • similarity:

    With a simple API and minimal configuration, 'similarity' is easy to integrate and use, making it ideal for developers who want quick results without extensive setup.

Community and Support

  • string-similarity:

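    With about 2.36 million weekly downloads and roughly 2,500 GitHub stars, this package has by far the largest user base of the three, which makes examples and answers to common issues easier to find. Note that it was last published 4 years ago.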

  • string-similarity-js:

    With about 210,000 weekly downloads and 97 stars, this package has a much smaller community than 'string-similarity', so fewer third-party examples are likely to be available.

  • similarity:

    'similarity' has the fewest downloads (about 96,000 per week) and stars (75) of the three and was last published 5 years ago, so community resources and support may be limited.

Use Cases

  • string-similarity:

    Ideal for fuzzy matching, such as autocomplete suggestions or matching user input against a list of known values, which findBestMatch handles directly.

  • string-similarity-js:

    Recommended when you want a small, MIT-licensed scorer for tasks such as data cleaning and deduplication.

  • similarity:

    Best suited for applications that require basic string similarity checks without the need for complex configurations, such as simple search functionalities.

How to Choose: string-similarity vs string-similarity-js vs similarity
  • similarity:

    Choose 'similarity' if you need a lightweight library with a minimal, straightforward API and have no need to configure how strings are compared.

README for string-similarity

string-similarity

Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.
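
To give a feel for the measure, here is a minimal sketch of a Dice coefficient computed over adjacent character pairs (bigrams). The README does not spell out how the coefficient is applied, so the use of bigrams is an assumption, and the helper names `bigramCounts` and `diceCoefficient` are hypothetical. This is not the package's source code.

```javascript
// Illustrative sketch only: a bigram-based Dice coefficient.

// Count how many times each adjacent character pair occurs in the text.
function bigramCounts(text) {
  const counts = new Map();
  for (let i = 0; i < text.length - 1; i++) {
    const gram = text.substring(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice coefficient: 2 * |shared bigrams| / (|bigrams in first| + |bigrams in second|).
function diceCoefficient(first, second) {
  if (first === second) return 1;
  // Strings shorter than two characters have no bigrams to compare.
  if (first.length < 2 || second.length < 2) return 0;

  const counts = bigramCounts(first);
  let shared = 0;
  for (let i = 0; i < second.length - 1; i++) {
    const gram = second.substring(i, i + 2);
    const remaining = counts.get(gram) || 0;
    if (remaining > 0) {
      // Consume one occurrence so repeated bigrams are not double-counted.
      counts.set(gram, remaining - 1);
      shared++;
    }
  }
  return (2 * shared) / (first.length - 1 + (second.length - 1));
}

console.log(diceCoefficient("healed", "sealed")); // 0.8
```

"healed" and "sealed" each contain five bigrams and share four of them ("ea", "al", "le", "ed"), giving 2 * 4 / 10 = 0.8. Each string is scanned once, so the work is linear in the combined length.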

Usage

For Node.js

Install using:

npm install string-similarity --save

In your code:

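var stringSimilarity = require("string-similarity");

// Rate how similar two strings are (a number from 0 to 1).
var similarity = stringSimilarity.compareTwoStrings("healed", "sealed");

// Rate each candidate against "healed" and report the best match.
var matches = stringSimilarity.findBestMatch("healed", [
  "edward",
  "sealed",
  "theatre",
]);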

For browser apps

Include <script src="//unpkg.com/string-similarity/umd/string-similarity.min.js"></script> to get the latest version.

Or <script src="//unpkg.com/string-similarity@4.0.1/umd/string-similarity.min.js"></script> to get a specific version (4.0.1 in this case).

This exposes a global variable called stringSimilarity which you can start using.

<script>
  stringSimilarity.compareTwoStrings('what!', 'who?');
</script>

(The package is exposed as UMD, so you can consume it as such)

API

The package contains two methods:

compareTwoStrings(string1, string2)

Returns a fraction between 0 and 1, which indicates the degree of similarity between the two strings. 0 indicates completely different strings, 1 indicates identical strings. The comparison is case-sensitive.

Arguments
  1. string1 (string): The first string
  2. string2 (string): The second string

Order does not make a difference.

Returns

(number): A fraction from 0 to 1, both inclusive. Higher number indicates more similarity.

Examples
stringSimilarity.compareTwoStrings("healed", "sealed");
// → 0.8

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "For sale: table in very good  condition, olive green in colour."
);
// → 0.6060606060606061

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "For sale: green Subaru Impreza, 210,000 miles"
);
// → 0.2558139534883721

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "Wanted: mountain bike with at least 21 gears."
);
// → 0.1411764705882353
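
Because the comparison is case-sensitive, callers who want case to be ignored need to normalize their input first. A minimal sketch of one way to do that follows. The wrapper `compareIgnoringCase` is hypothetical and not part of the package; it takes the scoring function as a parameter so it can be used with compareTwoStrings or any other (string, string) => number scorer.

```javascript
// Hypothetical wrapper, not part of string-similarity.
// `compare` is any function that takes two strings and returns a rating,
// for example stringSimilarity.compareTwoStrings.
function compareIgnoringCase(compare, first, second) {
  // Lower-case both inputs so that "Healed" and "HEALED" are rated as identical.
  return compare(first.toLowerCase(), second.toLowerCase());
}

// Possible usage, assuming the package has been loaded as shown earlier:
// compareIgnoringCase(stringSimilarity.compareTwoStrings, "Healed", "HEALED");
```

toLowerCase() is a simple choice that covers common cases; locale-specific text may call for a different normalization.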

findBestMatch(mainString, targetStrings)

Compares mainString against each string in targetStrings.

Arguments
  1. mainString (string): The string to match each target string against.
  2. targetStrings (Array): Each string in this array will be matched against the main string.
Returns

(Object): An object with a ratings property, which gives a similarity rating for each target string, a bestMatch property, which specifies which target string was most similar to the main string, and a bestMatchIndex property, which specifies the index of the bestMatch in the targetStrings array.

Examples
stringSimilarity.findBestMatch('Olive-green table for sale, in extremely good condition.', [
  'For sale: green Subaru Impreza, 210,000 miles',
  'For sale: table in very good condition, olive green in colour.',
  'Wanted: mountain bike with at least 21 gears.'
]);
// →
{ ratings:
   [ { target: 'For sale: green Subaru Impreza, 210,000 miles',
       rating: 0.2558139534883721 },
     { target: 'For sale: table in very good condition, olive green in colour.',
       rating: 0.6060606060606061 },
     { target: 'Wanted: mountain bike with at least 21 gears.',
       rating: 0.1411764705882353 } ],
  bestMatch:
   { target: 'For sale: table in very good condition, olive green in colour.',
     rating: 0.6060606060606061 },
  bestMatchIndex: 1
}
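
The shape of the returned object can be understood through a small re-implementation. The sketch below is hypothetical and is not the package's source: `findBestMatchSketch` and its `compare` parameter are illustrative names, with `compare` standing in for a scorer such as compareTwoStrings.

```javascript
// Hypothetical re-implementation for illustration only.
// `compare` is any (string, string) => number scorer.
function findBestMatchSketch(compare, mainString, targetStrings) {
  // Rate every target against the main string, keeping the original order.
  const ratings = targetStrings.map((target) => ({
    target,
    rating: compare(mainString, target),
  }));

  // Track the index of the highest rating; ties keep the earliest target.
  let bestMatchIndex = 0;
  ratings.forEach((entry, index) => {
    if (entry.rating > ratings[bestMatchIndex].rating) {
      bestMatchIndex = index;
    }
  });

  return { ratings, bestMatch: ratings[bestMatchIndex], bestMatchIndex };
}
```

In this sketch an empty targetStrings array yields an undefined bestMatch; how the real package treats that case is not covered by the README excerpt here.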

Release Notes

2.0.0

  • Removed production dependencies
  • Updated to ES6 (this breaks backward-compatibility for pre-ES6 apps)

3.0.0

  • Performance improvement for compareTwoStrings(..): now O(n) instead of O(n^2)
  • The algorithm has been tweaked slightly to disregard spaces and word boundaries. This will change the rating values slightly but not enough to make a significant difference
  • Added a bestMatchIndex to the results for findBestMatch(..) to point to the best match in the supplied targetStrings array

3.0.1

  • Refactoring: removed unused functions; used substring instead of substr
  • Updated dependencies

4.0.1

  • Distributing as a UMD build to be used in browsers.

4.0.2

  • Update dependencies to latest versions.

4.0.3

  • Make compatible with IE and ES5. Also, update deps. (see PR56)

4.0.4

  • Simplify some conditional statements. Also, update deps. (see PR50)
