string-similarity vs levenshtein-edit-distance vs natural
String Similarity Measurement Libraries Comparison
What's String Similarity Measurement Libraries?

String similarity measurement libraries are essential tools in web development for comparing and analyzing textual data. They help in various applications such as search optimization, data deduplication, and natural language processing by quantifying how similar two strings are. These libraries implement different algorithms to calculate similarity scores or distances, enabling developers to choose the most suitable method based on their specific use cases and performance requirements.

[Charts: Package Weekly Downloads Trend; GitHub Stars Ranking]
Stat Detail

Package                    Downloads  Stars   Size     Issues  Publish        License
string-similarity          1,692,429  2,527   -        23      4 years ago    ISC
levenshtein-edit-distance  232,390    71      12.4 kB  0       -              MIT
natural                    183,526    10,770  13.8 MB  82      9 months ago   MIT
Feature Comparison: string-similarity vs levenshtein-edit-distance vs natural

Algorithm Variety

  • string-similarity:

    String Similarity implements a single algorithm, Dice's coefficient (as its README below states), rather than Jaccard or Cosine similarity. This keeps it simple: it is designed for quick and easy string matching without overwhelming the user with options.

  • levenshtein-edit-distance:

    This package implements only the Levenshtein distance algorithm, which calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another. It is efficient for basic string comparison tasks but offers no other algorithms.

  • natural:

    Natural provides a wide range of string similarity algorithms, including Levenshtein, Jaro-Winkler, and Dice's coefficient. This variety lets developers choose the most appropriate method for their specific use case.
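To make one of those algorithms concrete, here is a from-scratch sketch of the standard Jaro-Winkler similarity in JavaScript. It is not Natural's implementation: the function names `jaro` and `jaroWinkler`, the scaling factor p = 0.1 and the 4-character prefix cap are the conventional textbook choices, assumed here, and Natural's exact details may differ.

```javascript
// Sketch of standard Jaro-Winkler similarity (not Natural's source code).

// Jaro similarity: count characters that match within a sliding range,
// then penalize matched characters that appear in a different order.
function jaro(s1, s2) {
  if (s1 === s2) return 1;
  const len1 = s1.length;
  const len2 = s2.length;
  if (len1 === 0 || len2 === 0) return 0;

  // Two characters only count as a match if they are at most this far apart.
  const matchRange = Math.max(0, Math.floor(Math.max(len1, len2) / 2) - 1);
  const matched1 = new Array(len1).fill(false);
  const matched2 = new Array(len2).fill(false);

  let matches = 0;
  for (let i = 0; i < len1; i++) {
    const start = Math.max(0, i - matchRange);
    const end = Math.min(len2 - 1, i + matchRange);
    for (let j = start; j <= end; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk the matched characters of both strings in order;
  // each position where they differ counts as half a transposition.
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < len1; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;

  return (matches / len1 + matches / len2 + (matches - t) / matches) / 3;
}

// Winkler's adjustment: boost the Jaro score for a shared prefix
// of up to 4 characters, scaled by p (conventionally 0.1).
function jaroWinkler(s1, s2, p = 0.1) {
  const j = jaro(s1, s2);
  const maxPrefix = Math.min(4, s1.length, s2.length);
  let prefix = 0;
  while (prefix < maxPrefix && s1[prefix] === s2[prefix]) prefix++;
  return j + prefix * p * (1 - j);
}

console.log(jaro("MARTHA", "MARHTA").toFixed(4));        // 0.9444
console.log(jaroWinkler("MARTHA", "MARHTA").toFixed(4)); // 0.9611
```

The prefix boost rewards strings that agree at the start, which is why Jaro-Winkler is popular for names. Some variants apply the boost only when the Jaro score exceeds a threshold (commonly 0.7). This sketch always applies it.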

Performance

  • string-similarity:

    String Similarity is designed for fast execution. Its release notes state that compareTwoStrings has run in O(n) time instead of O(n^2) since version 3.0.0, which makes it a good choice for applications that need real-time comparisons.

  • levenshtein-edit-distance:

    This package is optimized for computing edit distances and suits fast comparisons of short strings. Levenshtein distance is usually computed with a dynamic-programming algorithm whose running time grows with the product of the two string lengths (O(m·n)), so comparisons slow down noticeably as strings get longer.

  • natural:

    Natural's performance varies depending on the algorithm used. While it offers a comprehensive set of features, some algorithms may be slower than others. Developers should benchmark performance against their specific use cases to ensure efficiency.
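To show why Levenshtein cost grows with string length, here is a minimal sketch of the classic dynamic-programming algorithm. It is not the levenshtein-edit-distance package's source, and `levenshtein` is a hypothetical name. Conceptually it fills an (m+1) × (n+1) table of prefix distances, so the running time is O(m·n). Keeping only two rows reduces memory to O(min(m, n)).

```javascript
// Sketch of the classic dynamic-programming Levenshtein distance
// (not the levenshtein-edit-distance package's own code).
// Time: O(m * n). Extra memory: two rows sized to the shorter string.
function levenshtein(a, b) {
  // Loop over the longer string so each row holds the shorter one.
  if (a.length < b.length) {
    [a, b] = [b, a];
  }

  // prev[j] = distance between the first i-1 chars of a and first j chars of b.
  // Row 0: turning "" into b's first j characters takes j insertions.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  let curr = new Array(b.length + 1);

  for (let i = 1; i <= a.length; i++) {
    curr[0] = i; // deleting i characters of a gives ""
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (free if the characters match)
      );
    }
    // The row just computed becomes the previous row for the next iteration.
    [prev, curr] = [curr, prev];
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(levenshtein("healed", "sealed"));  // 1
```

Every cell depends on its left, upper and upper-left neighbours. Doubling both string lengths therefore roughly quadruples the work, which matters when comparing long texts.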

Ease of Use

  • string-similarity:

    String Similarity is user-friendly and easy to integrate into projects. Its API is intuitive, making it accessible for developers who need to implement string similarity checks quickly.

  • levenshtein-edit-distance:

    This package is very straightforward to use, with a simple API that allows developers to quickly implement string distance calculations without extensive setup or configuration.

  • natural:

    Natural has a steeper learning curve due to its extensive features and capabilities. While it offers powerful tools for NLP, new users may need some time to familiarize themselves with its API and functionalities.

Extensibility

  • string-similarity:

    String Similarity exposes just two functions, compareTwoStrings and findBestMatch, and has no plugin mechanism. Extending it means wrapping or forking it, so it is a weaker foundation than Natural for complex NLP tasks.

  • levenshtein-edit-distance:

    This package is not designed for extensibility; it focuses solely on the Levenshtein distance algorithm without additional features or customization options.

  • natural:

    Natural is highly extensible, allowing developers to add custom algorithms and features as needed. This makes it suitable for projects that may evolve over time and require additional NLP capabilities.

Community and Support

  • string-similarity:

    String Similarity has a moderate community presence, with sufficient documentation and examples available. While it may not be as extensive as Natural, it still offers adequate support for most common use cases.

  • levenshtein-edit-distance:

    This package has a smaller community and limited support compared to others, which may affect the availability of resources and documentation for troubleshooting.

  • natural:

    Natural has a larger community and more extensive documentation, providing better support for developers. This can be beneficial for troubleshooting and finding examples of usage.

How to Choose: string-similarity vs levenshtein-edit-distance vs natural
  • string-similarity:

    Opt for String Similarity if you want a simple, effective way to score string similarity with Dice's coefficient and to pick the closest candidate from a list. This package is particularly useful for fuzzy matching and deduplication tasks.

  • levenshtein-edit-distance:

    Choose this package if you need a straightforward implementation of the Levenshtein distance algorithm for basic edit-distance calculations between strings. It is lightweight and easy to integrate into projects without additional dependencies.

  • natural:

    Select Natural if you require a comprehensive natural language processing toolkit that includes various string similarity algorithms along with additional features like tokenization, stemming, and classification. This package is ideal for more complex applications that need a broader set of NLP capabilities.

README for string-similarity

string-similarity

Finds the degree of similarity between two strings, based on Dice's Coefficient, which in most cases works better than Levenshtein distance.
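Dice's coefficient compares two sets. For strings, a common construction uses the multisets X and Y of adjacent character pairs (bigrams). This construction reproduces the README's score of 0.8 for "healed" vs "sealed":

```latex
\mathrm{Dice}(X, Y) = \frac{2\,\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert}

% "healed": X = {he, ea, al, le, ed}
% "sealed": Y = {se, ea, al, le, ed}
% shared bigrams: ea, al, le, ed
\mathrm{Dice}(X, Y) = \frac{2 \cdot 4}{5 + 5} = 0.8
```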


Usage

For Node.js

Install using:

npm install string-similarity --save

In your code:

var stringSimilarity = require("string-similarity");

var similarity = stringSimilarity.compareTwoStrings("healed", "sealed");

var matches = stringSimilarity.findBestMatch("healed", [
  "edward",
  "sealed",
  "theatre",
]);

For browser apps

Include <script src="//unpkg.com/string-similarity/umd/string-similarity.min.js"></script> to get the latest version.

Or <script src="//unpkg.com/string-similarity@4.0.1/umd/string-similarity.min.js"></script> to get a specific version (4.0.1 in this case).

This exposes a global variable called stringSimilarity which you can start using.

<script>
  stringSimilarity.compareTwoStrings('what!', 'who?');
</script>

(The package is exposed as UMD, so you can consume it as such)

API

The package contains two methods:

compareTwoStrings(string1, string2)

Returns a fraction between 0 and 1, which indicates the degree of similarity between the two strings. 0 indicates completely different strings, 1 indicates identical strings. The comparison is case-sensitive.

Arguments
  1. string1 (string): The first string
  2. string2 (string): The second string

Order does not make a difference.

Returns

(number): A fraction from 0 to 1, both inclusive. Higher number indicates more similarity.

Examples
stringSimilarity.compareTwoStrings("healed", "sealed");
// → 0.8

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "For sale: table in very good condition, olive green in colour."
);
// → 0.6060606060606061

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "For sale: green Subaru Impreza, 210,000 miles"
);
// → 0.2558139534883721

stringSimilarity.compareTwoStrings(
  "Olive-green table for sale, in extremely good condition.",
  "Wanted: mountain bike with at least 21 gears."
);
// → 0.1411764705882353
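The following sketch computes bigram-based Dice's coefficient in JavaScript. It follows the behavior described in the README and release notes: case-sensitive, whitespace ignored, order-independent, result between 0 and 1. It is an illustration, not the package's source, and `diceCoefficient` is a hypothetical name. For the healed/sealed pair it returns the same 0.8 as the README.

```javascript
// Sketch of Dice's coefficient over character bigrams
// (illustrative; not string-similarity's actual source).
function diceCoefficient(first, second) {
  // Release 3.0.0 says spaces are disregarded, so strip all whitespace.
  first = first.replace(/\s+/g, "");
  second = second.replace(/\s+/g, "");

  if (first === second) return 1;
  // A string shorter than 2 characters has no bigrams to compare.
  if (first.length < 2 || second.length < 2) return 0;

  // Count the first string's bigrams as a multiset in a single O(n) pass.
  const counts = new Map();
  for (let i = 0; i < first.length - 1; i++) {
    const bigram = first.substring(i, i + 2);
    counts.set(bigram, (counts.get(bigram) || 0) + 1);
  }

  // Walk the second string's bigrams, consuming matches from the multiset
  // so a repeated bigram is only counted as often as it occurs in both.
  let intersection = 0;
  for (let i = 0; i < second.length - 1; i++) {
    const bigram = second.substring(i, i + 2);
    const remaining = counts.get(bigram) || 0;
    if (remaining > 0) {
      counts.set(bigram, remaining - 1);
      intersection++;
    }
  }

  // |X| + |Y| = (first.length - 1) + (second.length - 1)
  return (2 * intersection) / (first.length + second.length - 2);
}

console.log(diceCoefficient("healed", "sealed")); // 0.8
console.log(diceCoefficient("sealed", "healed")); // 0.8 (order does not matter)
```

Using a Map of bigram counts keeps each comparison linear in the total string length, in line with the O(n) claim in the 3.0.0 release notes.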

findBestMatch(mainString, targetStrings)

Compares mainString against each string in targetStrings.

Arguments
  1. mainString (string): The string to match each target string against.
  2. targetStrings (Array): Each string in this array will be matched against the main string.

Returns

(Object): An object with three properties: ratings, which gives a similarity rating for each target string; bestMatch, which specifies the target string most similar to the main string; and bestMatchIndex, which specifies the index of bestMatch in the targetStrings array.

Examples
stringSimilarity.findBestMatch('Olive-green table for sale, in extremely good condition.', [
  'For sale: green Subaru Impreza, 210,000 miles',
  'For sale: table in very good condition, olive green in colour.',
  'Wanted: mountain bike with at least 21 gears.'
]);
// →
{ ratings:
   [ { target: 'For sale: green Subaru Impreza, 210,000 miles',
       rating: 0.2558139534883721 },
     { target: 'For sale: table in very good condition, olive green in colour.',
       rating: 0.6060606060606061 },
     { target: 'Wanted: mountain bike with at least 21 gears.',
       rating: 0.1411764705882353 } ],
  bestMatch:
   { target: 'For sale: table in very good condition, olive green in colour.',
     rating: 0.6060606060606061 },
  bestMatchIndex: 1
}
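The result shape above can be reproduced with a small sketch. It scores every target, keeps the ratings, and records the highest-scoring entry and its index. `bestMatchBy` and its third `score` parameter are hypothetical (the package's findBestMatch takes only two arguments). Keeping the earliest index on ties is a choice made for this sketch only. A compact bigram Dice scorer is included so the block runs on its own.

```javascript
// Hypothetical findBestMatch-style helper: not the package's implementation.
// `score(main, target)` must return a similarity where higher means more similar.
function bestMatchBy(mainString, targetStrings, score) {
  if (targetStrings.length === 0) {
    throw new Error("targetStrings must contain at least one string");
  }
  const ratings = targetStrings.map((target) => ({
    target,
    rating: score(mainString, target),
  }));
  // Strict ">" keeps the earliest index when ratings tie.
  let bestMatchIndex = 0;
  for (let i = 1; i < ratings.length; i++) {
    if (ratings[i].rating > ratings[bestMatchIndex].rating) bestMatchIndex = i;
  }
  return { ratings, bestMatch: ratings[bestMatchIndex], bestMatchIndex };
}

// Compact bigram Dice scorer used as the default example scorer.
function dice(a, b) {
  a = a.replace(/\s+/g, "");
  b = b.replace(/\s+/g, "");
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const counts = new Map();
  for (let i = 0; i < a.length - 1; i++) {
    const bigram = a.slice(i, i + 2);
    counts.set(bigram, (counts.get(bigram) || 0) + 1);
  }
  let shared = 0;
  for (let i = 0; i < b.length - 1; i++) {
    const bigram = b.slice(i, i + 2);
    const remaining = counts.get(bigram) || 0;
    if (remaining > 0) {
      counts.set(bigram, remaining - 1);
      shared++;
    }
  }
  return (2 * shared) / (a.length + b.length - 2);
}

const result = bestMatchBy("healed", ["edward", "sealed", "theatre"], dice);
console.log(result.bestMatch);      // { target: 'sealed', rating: 0.8 }
console.log(result.bestMatchIndex); // 1
```

Passing the scorer in separates the ranking step from the similarity measure. An edit-distance-based score could be swapped in, as long as higher values mean more similar.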

Release Notes

2.0.0

  • Removed production dependencies
  • Updated to ES6 (this breaks backward-compatibility for pre-ES6 apps)

3.0.0

  • Performance improvement for compareTwoStrings(..): now O(n) instead of O(n^2)
  • The algorithm has been tweaked slightly to disregard spaces and word boundaries. This will change the rating values slightly but not enough to make a significant difference
  • Adding a bestMatchIndex to the results for findBestMatch(..) to point to the best match in the supplied targetStrings array

3.0.1

  • Refactoring: removed unused functions; used substring instead of substr
  • Updated dependencies

4.0.1

  • Distributing as a UMD build to be used in browsers.

4.0.2

  • Update dependencies to latest versions.

4.0.3

  • Make compatible with IE and ES5. Also, update deps. (see PR56)

4.0.4

  • Simplify some conditional statements. Also, update deps. (see PR50)
