similarity vs string-similarity vs string-similarity-js
Implementing Fuzzy String Matching in JavaScript

similarity, string-similarity, and string-similarity-js are libraries designed to measure how alike two strings are. They are essential for features like search autocomplete, typo correction, and record deduplication. While they share a common goal, they differ in the mathematical algorithms they use (Levenshtein distance vs. Sorensen-Dice coefficient) and their maintenance status. Choosing the right one depends on whether you need edit-distance accuracy or n-gram performance, and how much you value long-term package stability.

Stat Detail

Package | Downloads | Stars | Size | Issues | Publish | License
similarity | 0 | 79 | - | 0 | 6 years ago | ISC
string-similarity | 0 | 2,534 | - | 23 | 5 years ago | ISC
string-similarity-js | 0 | 109 | 12.7 kB | 2 | - | MIT

Fuzzy String Matching: similarity vs string-similarity vs string-similarity-js

When building search bars, auto-correct features, or data deduplication tools, exact string matches are rarely enough. Users make typos, and data comes in inconsistent formats. This is where fuzzy matching libraries come in. similarity, string-similarity, and string-similarity-js are the top contenders in the JavaScript ecosystem, but they solve the problem using different math and offer different levels of maintenance confidence.

🧮 Core Algorithms: Edit Distance vs. N-Grams

The most critical technical difference lies in the algorithm. This dictates how "similarity" is calculated and impacts performance on different string lengths.

similarity uses the Levenshtein Distance algorithm.

  • It counts the minimum number of single-character edits (insertions, deletions, substitutions) required to change one word into the other.
  • Sensitive to character order, so it suits typo detection where the position of each change matters.
  • Computationally heavier on very long strings (O(n*m) complexity).
// similarity: Levenshtein based
import similarity from 'similarity';

// Returns a number between 0 and 1
const score = similarity('kitten', 'sitting'); 
// Output: ~0.57 (3 edits needed)
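
To make the edit-distance idea concrete, here is a minimal, dependency-free sketch of the calculation. The function names levenshtein and editSimilarity are hypothetical and this is not the package's source. It assumes the score is the share of the longer string left untouched by edits, which matches the 0.57 figure above, and unlike the package's default it compares case-sensitively.

```javascript
// Dynamic-programming Levenshtein distance that keeps only the previous row.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // Row 0: turning an empty prefix of `a` into the first j characters of `b` takes j insertions.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters of `a` into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution, or a free match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: normalise the distance into a 0..1 score.
function editSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : (longest - levenshtein(a, b)) / longest;
}

console.log(levenshtein('kitten', 'sitting'));    // 3
console.log(editSimilarity('kitten', 'sitting')); // ≈ 0.571 (4 of 7 characters untouched)
```

The nested loops visit every pair of character positions, which is where the O(n*m) cost comes from.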

string-similarity and string-similarity-js use the Sorensen-Dice Coefficient.

  • They break strings into bigrams (pairs of characters) and compare the overlap.
  • Typical implementations count bigrams in a hash map, so the cost grows roughly linearly with string length.
  • Often yields more intuitive results for search suggestions and short queries.
// string-similarity: Dice Coefficient
import stringSimilarity from 'string-similarity';

const score = stringSimilarity.compareTwoStrings('kitten', 'sitting');
// Output: ≈0.36 (2 shared bigrams, "it" and "tt": 2 * 2 / (5 + 6) = 4 / 11)

// string-similarity-js: Dice Coefficient
// The package README documents a single named export, stringSimilarity
import { stringSimilarity } from 'string-similarity-js';

const score = stringSimilarity('kitten', 'sitting');
// Output: ≈0.36 (same bigram arithmetic: 4 / 11)
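
The bigram arithmetic can be checked by hand with a short, dependency-free sketch. The names bigramCounts and diceSimilarity are hypothetical and neither package's source is reproduced here. The sketch assumes case-sensitive comparison with no whitespace handling, whereas the real libraries each apply their own normalization.

```javascript
// Count overlapping character pairs; repeated pairs are counted separately (a multiset).
function bigramCounts(str) {
  const counts = new Map();
  for (let i = 0; i < str.length - 1; i++) {
    const pair = str.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: Dice = 2 * shared bigrams / (bigrams in a + bigrams in b)
function diceSimilarity(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0; // no bigrams to compare

  const remaining = bigramCounts(a);
  let shared = 0;
  for (let i = 0; i < b.length - 1; i++) {
    const pair = b.slice(i, i + 2);
    const left = remaining.get(pair) || 0;
    if (left > 0) {
      remaining.set(pair, left - 1); // each bigram of `a` can be matched only once
      shared++;
    }
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

console.log(diceSimilarity('night', 'nacht'));    // 0.25 (only "ht" is shared: 2 / 8)
console.log(diceSimilarity('kitten', 'sitting')); // ≈ 0.364 (4 / 11)
```

Each string is scanned once, so the work is proportional to the combined length of the two inputs.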

🛠️ API Design and Features

similarity and string-similarity-js each expose a single comparison function, while string-similarity adds a helper for ranking a list of candidates, which is a common need in search UIs.

similarity is minimal.

  • Exports a single function.
  • You must write your own loop to find the best match in an array.
// similarity: Manual best match
import similarity from 'similarity';

const target = 'apple';
const candidates = ['apply', 'banana', 'app'];

// Score each candidate once and keep the highest-scoring one
const best = candidates.reduce((prev, curr) => {
  const score = similarity(curr, target);
  return score > prev.score ? { name: curr, score } : prev;
}, { name: '', score: 0 });

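string-similarity includes helpers; string-similarity-js exports one scoring function.

  • compareTwoStrings (string-similarity): Direct comparison.
  • findBestMatch (string-similarity): Rates every candidate in an array and reports the best one, which saves development time for autocomplete features.
  • stringSimilarity (string-similarity-js): Direct comparison only; ranking a list takes a short loop.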
// string-similarity: Built-in best match
import stringSimilarity from 'string-similarity';

const target = 'apple';
const candidates = ['apply', 'banana', 'app'];

const match = stringSimilarity.findBestMatch(target, candidates);
// Returns { bestMatch: ..., ratings: [...] }
console.log(match.bestMatch.target);

// string-similarity-js: Manual best match (the package README documents one scoring function)
import { stringSimilarity } from 'string-similarity-js';

const target = 'apple';
const candidates = ['apply', 'banana', 'app'];

// Rate every candidate, then keep the highest-rated one
const best = candidates
  .map((candidate) => ({ target: candidate, rating: stringSimilarity(target, candidate) }))
  .reduce((a, b) => (b.rating > a.rating ? b : a));
console.log(best.target);
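
When a library offers only a two-string comparison, a small wrapper can supply the ranking step. The sketch below is a hypothetical helper (rankMatches) that accepts any scoring function of the form (a, b) => number. The result shape loosely mirrors the bestMatch/ratings idea shown above, but it sorts the ratings and is not any package's API. The prefixScore scorer is a stand-in used only to keep the demo dependency-free.

```javascript
// Hypothetical helper: rate every candidate with the supplied scorer, best first.
function rankMatches(target, candidates, scoreFn) {
  const ratings = candidates
    .map((candidate) => ({ target: candidate, rating: scoreFn(target, candidate) }))
    .sort((x, y) => y.rating - x.rating);
  return { ratings, bestMatch: ratings[0] ?? null };
}

// Stand-in scorer for the demo only: shared prefix length over the longer length.
function prefixScore(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : i / longest;
}

const result = rankMatches('apple', ['apply', 'banana', 'app'], prefixScore);
console.log(result.bestMatch.target);              // 'apply'
console.log(result.ratings.map((r) => r.target));  // [ 'apply', 'app', 'banana' ]
```

In real use, the scoring function from whichever library you picked would be passed in place of prefixScore.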

⚠️ Maintenance and Security Status

For architectural decisions, package health is as important as code quality. A library that is no longer maintained poses a security and compatibility risk.

similarity

  • Historically stable due to its simplicity.
  • Low surface area for bugs.
  • Verify current repo activity before adopting for critical infrastructure.

string-similarity

  • Widely adopted but has faced periods of low maintenance activity.
  • Past concerns about unmerged PRs and slow issue resolution led the community to seek alternatives.
  • Use with caution in long-term enterprise projects.

string-similarity-js

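  • An independent, much smaller implementation of the Dice coefficient rather than a fork of string-similarity.
  • Exposes a single scoring function, which keeps the surface area for bugs low.
  • Worth considering for new projects that need the Dice coefficient; check its release activity first, as with the other two.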

🌐 Real-World Scenarios

Scenario 1: Search Autocomplete

You need to match user input against a list of product names.

  • Best choice: string-similarity-js
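  • Why? The Dice coefficient handles partial matches well, and the single scoring function is easy to wrap in a short ranking helper.
import { stringSimilarity } from 'string-similarity-js';

// Returns the highest-scoring product name (undefined for an empty list)
function suggest(query, products) {
  const scored = products.map((name) => ({ name, rating: stringSimilarity(query, name) }));
  scored.sort((a, b) => b.rating - a.rating);
  return scored[0]?.name;
}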

Scenario 2: Password Similarity Check

You need to ensure a new password is not too close to the old one (security policy).

  • Best choice: similarity
  • Why? Edit distance maps directly onto policies phrased as a minimum number of changed characters, so its scores are easier to reason about here. Note that similarity ignores case by default, so a change of casing alone still counts as identical.
import similarity from 'similarity';

function validatePassword(old, newPwd) {
  if (similarity(old, newPwd) > 0.6) {
    throw new Error('Password too similar to previous one');
  }
}

Scenario 3: Data Deduplication

You have a list of company names whose legal suffixes vary ("Inc.", "Incorporated", "Ltd") and need to merge the duplicates.

  • Best choice: string-similarity-js
  • Why? N-gram matching handles abbreviations and slight variations better than strict edit distance.
// Named export, per the package README
import { stringSimilarity } from 'string-similarity-js';

const score = stringSimilarity('Acme Inc', 'Acme Incorporated');
// Moderately high score expected: every bigram of the shorter name also appears in the longer one
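
Deduplication usually works better when names are normalized before they are scored. The following is a rough sketch under stated assumptions: the suffix list, the helper names (normalizeCompanyName, groupDuplicates), and the 0.8 threshold are all illustrative, and the greedy grouping compares each name only against the first member of each group. Any of the libraries' scoring functions could be passed as scoreFn. The demo uses an exact-match stand-in so it runs without dependencies.

```javascript
// Assumed suffix list for illustration; extend it for real data.
const LEGAL_SUFFIXES = /\b(incorporated|inc|limited|ltd|llc|corporation|corp)\b\.?/g;

// Lowercase, drop legal suffixes, and collapse punctuation and whitespace.
function normalizeCompanyName(name) {
  return name
    .toLowerCase()
    .replace(LEGAL_SUFFIXES, ' ')
    .replace(/[^a-z0-9]+/g, ' ')
    .trim();
}

// Greedy grouping: a name joins the first group whose key scores at or above the threshold.
function groupDuplicates(names, scoreFn, threshold = 0.8) {
  const groups = []; // each entry: { key: normalized first member, members: original names }
  for (const name of names) {
    const key = normalizeCompanyName(name);
    const home = groups.find((group) => scoreFn(group.key, key) >= threshold);
    if (home) home.members.push(name);
    else groups.push({ key, members: [name] });
  }
  return groups.map((group) => group.members);
}

// Stand-in scorer for the demo: 1 for identical normalized names, otherwise 0.
const exactMatch = (a, b) => (a === b ? 1 : 0);

const names = ['Acme Inc.', 'Acme Incorporated', 'Globex Corporation', 'ACME, Ltd'];
console.log(groupDuplicates(names, exactMatch));
// [ [ 'Acme Inc.', 'Acme Incorporated', 'ACME, Ltd' ], [ 'Globex Corporation' ] ]
```

With a fuzzy scorer in place of exactMatch, the same loop would also merge names that differ by typos, at the cost of having to tune the threshold.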

📊 Summary Table

Feature | similarity | string-similarity | string-similarity-js
Algorithm | Levenshtein Distance | Sorensen-Dice Coefficient | Sorensen-Dice Coefficient
Best Match Helper | ❌ (Manual implementation) | ✅ (findBestMatch) | ❌ (Manual implementation)
Performance | O(n*m); slower on long strings | Roughly linear in string length | Roughly linear in string length
Maintenance | Stable / Low Activity | ⚠️ Historical Concerns | Verify current activity
Use Case | Security, Edit Tracking | Search, Autocomplete | Search, Autocomplete

💡 The Big Picture

similarity is the specialist tool 🛠️ for when you care about exact character edits. It is mathematically rigorous for security or diffing tools but lacks convenience features for search UIs.

string-similarity is the veteran 🎖️ that popularized fuzzy matching in JS. While its API is excellent, the maintenance risk makes it hard to recommend for new greenfield projects in 2024 and beyond.

string-similarity-js is the lightweight alternative 🚀. It offers the same Dice-based scoring through a single function and leaves list ranking to you. For most frontend architects building search or suggestion features, it is a reasonable default once you have confirmed its release activity.

Final Thought: Don't just pick the most downloaded package. Pick the one whose algorithm matches your data shape and whose maintenance status matches your risk tolerance. For most UI search tasks, string-similarity-js offers the best balance of features and safety.

How to Choose: similarity vs string-similarity vs string-similarity-js

  • similarity:

    Choose similarity if your use case relies on edit distance (Levenshtein algorithm), where you care about the number of character changes needed to transform one string into another. It is lightweight and ideal for simple comparisons where performance on very long strings is not the primary bottleneck. However, ensure you verify its current maintenance status before committing to it for enterprise projects.

  • string-similarity:

    Choose string-similarity if you need the Sorensen-Dice coefficient algorithm, which is often better for short strings and search suggestions. It offers useful helpers like findBestMatch. However, be aware of historical maintenance concerns; verify the current repository activity before using it in critical production systems.

  • string-similarity-js:

    Choose string-similarity-js if you want the Sorensen-Dice algorithm in a very small package. It exposes a single scoring function rather than the compareTwoStrings and findBestMatch helpers of string-similarity, so it is not a drop-in replacement and ranking a list of candidates is left to you. Check its release activity before adopting it, as you would for the other two packages.

README for similarity

similarity


How similar are these two strings?

Install

npm:

npm install similarity

Use

var similarity = require('similarity')

similarity('food', 'food') // 1
similarity('food', 'fool') // 0.75
similarity('ding', 'plow') // 0
similarity('chicken', 'chick') // 0.714285714
similarity('ES6-Shim', 'es6 shim') // 0.875 (case insensitive)
similarity('ES6-Shim', 'es6 shim', {sensitive: true}) // 0.5 (case sensitive)

API

similarity(left, right[, options])

Get the similarity (number) between two values (strings), where 0 is dissimilar, and 1 is equal.

  • options.sensitive (boolean, default: false) — Turn on (true) to treat casing differences as differences

CLI

Usage: similarity [options] <word> <word>

How similar are these two strings?

Options:

  -h, --help           output usage information
  -v, --version        output version number
  -s, --sensitive      be sensitive to casing differences

Usage:

# output similarity
$ similarity sitting kitten
0.5714285714285714
$ similarity saturday sunday
0.625

See also

Note: This module uses Levenshtein distance to measure similarity, but there are many other algorithms for string comparison. Here are a few:

  • clj-fuzzy — Handy collection of algorithms dealing with fuzzy strings and phonetics
  • natural — General natural language facilities for node
  • string-similarity — Finds degree of similarity between two strings, based on Dice’s coefficient
  • dice-coefficient — Sørensen–Dice coefficient
  • jaro-winkler — The Jaro-Winkler distance metric

License

ISC © Zeke Sikelianos