compare-pdf, pdf-lib, pdf2json, and pdfjs-dist are npm packages that enable different aspects of PDF handling in JavaScript environments. pdfjs-dist is Mozilla's official PDF.js distribution, focused on rendering and extracting text from existing PDFs in the browser. pdf-lib provides tools to create, modify, and manipulate PDF documents programmatically, including merging, splitting, and adding content. pdf2json converts PDFs into structured JSON representations, primarily for text and layout analysis. compare-pdf is a utility designed specifically to compare two PDF documents, either visually (by rendered page images) or byte for byte (by their base64 encodings). Each serves distinct use cases within the broader PDF processing ecosystem.
Working with PDFs in frontend applications is notoriously tricky. These four packages — compare-pdf, pdf-lib, pdf2json, and pdfjs-dist — each solve different parts of the puzzle. Let’s break down what they do, how they work, and when to use which.
The biggest difference lies in their core purpose:
- pdfjs-dist is for rendering and reading PDFs (like a browser-based PDF viewer).
- pdf-lib is for creating and editing PDFs (like a lightweight PDF editor in code).
- pdf2json is for parsing PDFs into structured data (like turning a PDF into a JSON object).
- compare-pdf is for comparing two PDFs (like a diff tool for documents).

You'll rarely use more than one of these in the same project unless you have a very specific pipeline (e.g., generate PDFs with pdf-lib, then regression-test them against baseline files with compare-pdf).
Only pdfjs-dist is designed to render PDFs visually in the browser. The others don’t produce canvas or DOM output.
// pdfjs-dist: Render first page to canvas
import * as pdfjsLib from 'pdfjs-dist';
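// Point workerSrc at the worker script shipped with pdfjs-dist; the filename
// depends on the version (e.g. pdf.worker.min.mjs in recent releases)
pdfjsLib.GlobalWorkerOptions.workerSrc = 'pdf.worker.js';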
const loadingTask = pdfjsLib.getDocument('document.pdf');
const pdf = await loadingTask.promise;
const page = await pdf.getPage(1);
const scale = 1.5;
const viewport = page.getViewport({ scale });
const canvas = document.getElementById('pdf-canvas');
const context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
const renderContext = {
canvasContext: context,
viewport: viewport
};
await page.render(renderContext).promise;
None of the other packages support this. pdf-lib can read a PDF but won't draw it. pdf2json gives you coordinates but no pixels. compare-pdf rasterizes pages internally, but only to PNG files used for comparison, not for display.
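At `scale: 1`, `page.getViewport()` reports the page size in PDF user-space units (normally points, 1/72 inch), so the `scale` in the example above multiplies those dimensions directly. On high-density screens, a common pattern is to size the canvas backing store by `window.devicePixelRatio` while keeping its CSS size at the viewport size. The helper below is a hypothetical sketch of that arithmetic only; `computeCanvasSize` is not a pdfjs-dist API.

```javascript
// Hypothetical helper (not part of pdfjs-dist): compute canvas dimensions for a
// page rendered at `scale` on a display with the given device pixel ratio.
// pageWidthPt / pageHeightPt are the width/height of page.getViewport({ scale: 1 }).
function computeCanvasSize(pageWidthPt, pageHeightPt, scale, devicePixelRatio = 1) {
  const cssWidth = pageWidthPt * scale; // size the canvas occupies on screen
  const cssHeight = pageHeightPt * scale;
  return {
    cssWidth,
    cssHeight,
    // Backing-store size in device pixels, floored to whole pixels for canvas.width/height
    pixelWidth: Math.floor(cssWidth * devicePixelRatio),
    pixelHeight: Math.floor(cssHeight * devicePixelRatio),
  };
}

// US Letter (612 x 792 pt) at scale 1.5 on a 2x display
const size = computeCanvasSize(612, 792, 1.5, 2);
console.log(size.pixelWidth, size.pixelHeight); // 1836 2376
```

You would then set `canvas.width`/`canvas.height` to the pixel values, `canvas.style.width`/`canvas.style.height` to the CSS values, and pass a `transform` such as `[dpr, 0, 0, dpr, 0, 0]` to `page.render()` so PDF.js draws at the higher resolution.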
Only pdf-lib lets you build or change PDF content programmatically.
// pdf-lib: Create a new PDF and add text
import { PDFDocument, StandardFonts } from 'pdf-lib';
const pdfDoc = await PDFDocument.create();
const page = pdfDoc.addPage([400, 300]);
const font = await pdfDoc.embedFont(StandardFonts.Helvetica);
page.drawText('Hello, world!', {
x: 50,
y: 250,
size: 20,
font: font
});
const pdfBytes = await pdfDoc.save();
// Now you can download or upload pdfBytes
pdfjs-dist can extract text but not insert it. pdf2json is read-only. compare-pdf doesn’t touch content creation at all.
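pdf-lib follows the PDF coordinate system: the origin (0, 0) is the bottom-left corner of the page and y grows upward, so `y: 250` on the 300-point-tall page above puts the text's baseline 50 points below the top edge. If your layout is specified from the top (as in CSS), a small conversion helps. Both functions below are hypothetical helpers, not pdf-lib APIs; in real code the page height would come from `page.getHeight()`.

```javascript
// Hypothetical helpers (not part of pdf-lib): convert top-left-based positions
// into PDF user space, where y is measured upward from the bottom edge.
function topToPdfY(offsetFromTop, pageHeight) {
  return pageHeight - offsetFromTop;
}

// Boxes drawn with drawRectangle-style options are anchored at their
// bottom-left corner, so the box height must be subtracted as well.
function topLeftRectToPdf(rect, pageHeight) {
  return {
    x: rect.x,
    y: pageHeight - rect.top - rect.height,
    width: rect.width,
    height: rect.height,
  };
}

console.log(topToPdfY(50, 300)); // 250
console.log(topLeftRectToPdf({ x: 50, top: 20, width: 100, height: 30 }, 300).y); // 250
```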
If you want to turn a PDF into structured data (not just raw text), pdf2json is your only option here.
// pdf2json: Parse PDF into JSON (Node.js)
import PDFParser from 'pdf2json';
const pdfParser = new PDFParser();
pdfParser.on('pdfParser_dataReady', (pdfData) => {
console.log(pdfData); // Full JSON structure with text blocks and coordinates
});
pdfParser.on('pdfParser_dataError', (errData) => {
console.error(errData.parserError); // the error details are on parserError
});
pdfParser.loadPDF('document.pdf');
Note: pdf2json uses an event-based API and is primarily designed for Node.js, though it can work in browsers with Buffer polyfills.
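Once `pdfParser_dataReady` fires, you typically walk the JSON yourself. The sketch below assumes the commonly documented shape, where `pdfData.Pages[i].Texts[j].R[k].T` holds URI-encoded text (older pdf2json releases nested everything under `pdfData.formImage`). `extractTextRuns` is a hypothetical helper, and it passes `x`/`y` through unchanged in whatever units your pdf2json version reports, so verify the shape against real output first.

```javascript
// Sketch: flatten pdf2json output into { page, x, y, text } records.
// Assumes pdfData.Pages[i].Texts[j].R[k].T with URI-encoded T values;
// falls back to the legacy pdfData.formImage wrapper if present.
function extractTextRuns(pdfData) {
  const root = pdfData.formImage ?? pdfData;
  const records = [];
  (root.Pages ?? []).forEach((page, pageIndex) => {
    for (const block of page.Texts ?? []) {
      // A text block can hold several runs; decode and join them into one string
      const text = (block.R ?? []).map((run) => decodeURIComponent(run.T)).join('');
      records.push({ page: pageIndex, x: block.x, y: block.y, text });
    }
  });
  return records;
}

// Minimal hand-written sample in the assumed shape (not real parser output)
const sample = {
  Pages: [{ Texts: [{ x: 3.1, y: 2.5, R: [{ T: 'Invoice%20%23123' }] }] }],
};
console.log(extractTextRuns(sample)[0].text); // Invoice #123
```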
In contrast, pdfjs-dist can extract plain text but loses layout structure:
// pdfjs-dist: Extract plain text only
// (assumes pdfjsLib is imported and its worker configured as in the rendering example)
const pdf = await pdfjsLib.getDocument('document.pdf').promise;
const page = await pdf.getPage(1);
const textContent = await page.getTextContent();
const text = textContent.items.map(item => item.str).join(' ');
pdf-lib can read metadata or copy pages but doesn’t parse content into objects. compare-pdf doesn’t extract data for reuse.
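The plain-text join above discards line breaks. Each PDF.js text item also carries a `transform` matrix whose last two entries are the item's x/y position in PDF user space, so you can approximately regroup items into lines. `itemsToLines` and its `yTolerance` parameter are hypothetical, and this is a heuristic: multi-column layouts, rotated text, and tables will still come out scrambled, which is exactly the layout loss described above.

```javascript
// Sketch: rebuild rough lines from PDF.js getTextContent() items.
// Assumes each text item has `str` and a `transform` whose entries [4] and [5]
// are x and y in PDF user space (y grows upward).
function itemsToLines(items, yTolerance = 2) {
  const lines = [];
  for (const item of items) {
    if (typeof item.str !== 'string') continue; // skip marked-content entries
    const x = item.transform[4];
    const y = item.transform[5];
    // Items whose baselines are within yTolerance are treated as one line
    let line = lines.find((l) => Math.abs(l.y - y) <= yTolerance);
    if (!line) {
      line = { y, parts: [] };
      lines.push(line);
    }
    line.parts.push({ x, str: item.str });
  }
  return lines
    .sort((a, b) => b.y - a.y) // top of the page first
    .map((l) => l.parts.sort((a, b) => a.x - b.x).map((p) => p.str).join(' ').trim());
}

// Hand-written items in the assumed shape (not real PDF.js output)
const demoItems = [
  { str: 'World', transform: [1, 0, 0, 1, 60, 700] },
  { str: 'Hello', transform: [1, 0, 0, 1, 10, 700.5] },
  { str: 'Second line', transform: [1, 0, 0, 1, 10, 680] },
];
console.log(itemsToLines(demoItems)); // [ 'Hello World', 'Second line' ]
```

With PDF.js, you would call it as `itemsToLines((await page.getTextContent()).items).join('\n')`.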
Only compare-pdf offers built-in comparison logic.
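// compare-pdf: Check whether two PDFs match (Node.js only)
import comparePdf from 'compare-pdf';
// By default, files are looked up in ./data/actualPdfs and ./data/baselinePdfs
const result = await new comparePdf()
.actualPdfFile('a.pdf')
.baselinePdfFile('b.pdf')
.compare();
console.log(result.status); // "passed" or "failed"; see also result.message and result.details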
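Under the hood, compare-pdf converts each page of both PDFs to PNG images (using GraphicsMagick/ImageMagick and Ghostscript, or its native engine) and compares the images pixel by pixel; alternatively, it can compare the files' base64 encodings. Because it is a Node.js tool with native system dependencies, it is not suitable for browsers or real-time use.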
None of the other packages include comparison features. You’d have to build your own diff logic using pdfjs-dist text output or pdf2json structures — which is error-prone and fragile.
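To make the fragility concrete, here is a minimal sketch of a text-only page diff. It assumes you have already extracted one string per page (for example with pdfjs-dist as shown earlier); `diffPageTexts` is a hypothetical helper. It normalizes whitespace, but it is blind to layout, images, fonts, and colors, so visually different pages with the same text still count as equal.

```javascript
// Sketch of the do-it-yourself route: compare two PDFs page by page using
// already-extracted text (one string per page).
function diffPageTexts(pagesA, pagesB) {
  // Extractors split and space text differently, so collapse all whitespace
  const normalize = (s) => s.replace(/\s+/g, ' ').trim();
  const differingPages = [];
  const pageCount = Math.max(pagesA.length, pagesB.length);
  for (let i = 0; i < pageCount; i++) {
    // A page missing from either side counts as a difference
    if (pagesA[i] === undefined || pagesB[i] === undefined ||
        normalize(pagesA[i]) !== normalize(pagesB[i])) {
      differingPages.push(i);
    }
  }
  return { equal: differingPages.length === 0, differingPages };
}

console.log(diffPageTexts(['Total: 10'], ['Total:  10', 'Page 2']).differingPages); // [ 1 ]
```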
Environment support:
- pdfjs-dist: Works in browsers and Node.js. Requires a worker script for full functionality (pdf.worker.js in older releases; the filename varies by version).
- pdf-lib: Pure JavaScript with no native dependencies. Works in browsers and Node.js.
- pdf2json: Built on PDF.js, extended with its own parser. Primarily for Node.js; browser use requires polyfills.
- compare-pdf: Node.js only. It converts pages to images with native tools (GraphicsMagick/ImageMagick and Ghostscript) or its native engine, which still needs system libraries such as Cairo and Pango. That makes it unsuitable for frontend-only apps; it is best used in testing or CI environments.

A few caveats:
- compare-pdf is not actively maintained as of 2024 and relies on heavy native dependencies. Avoid it in production web apps; consider server-side alternatives or manual diffing.
- pdf2json output varies widely depending on how the source PDF was generated. Scanned PDFs (images) yield empty text results.
- pdf-lib cannot extract text or images from existing PDFs; it can only copy pages or add new content.
- pdfjs-dist cannot modify PDFs; it is a reader/renderer only.

Can these packages be combined? Rarely, but here are some possible scenarios:
- Use pdfjs-dist to render a PDF in the UI, then send it to a backend that uses pdf-lib to add a watermark before saving.
- Use pdf2json on the server to extract data, then use pdf-lib to generate a cleaned-up version.

Avoid combining them in the browser unless you've measured performance; PDF processing is CPU-heavy.
| Feature | compare-pdf | pdf-lib | pdf2json | pdfjs-dist |
|---|---|---|---|---|
| Render PDF to screen | ❌ | ❌ | ❌ | ✅ |
| Create new PDF | ❌ | ✅ | ❌ | ❌ |
| Edit existing PDF | ❌ | ✅ | ❌ | ❌ |
| Extract structured JSON | ❌ | ❌ | ✅ | ❌ |
| Extract plain text | ❌ | ❌ | ✅ (via structure) | ✅ |
| Compare two PDFs | ✅ | ❌ | ❌ | ❌ |
| Browser-friendly | ❌ (Node.js only, native dependencies) | ✅ | ⚠️ (with polyfills) | ✅ |
| Node.js compatible | ✅ | ✅ | ✅ | ✅ |
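In short:
- Displaying PDFs to users: pdfjs-dist
- Creating or editing PDFs: pdf-lib
- Extracting structured data: pdf2json
- Comparing PDFs: compare-pdf, only for testing/CI, not production apps

Don't try to force one package to do another's job; they're built for fundamentally different tasks. Pick the right tool for your specific PDF problem.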
Choose compare-pdf only if your sole requirement is to perform visual or structural comparisons between two existing PDF files. It is a narrow-purpose tool with limited functionality beyond comparison, so avoid it if you need general PDF manipulation, creation, or rendering capabilities. Verify its compatibility with your target PDF types, as complex or encrypted documents may not be handled reliably.
Choose pdf-lib when you need to programmatically create, edit, or combine PDFs in the browser or Node.js. It supports adding text, images, and annotations, as well as copying pages between documents. It is written entirely in JavaScript with no native dependencies, making it ideal for client-side applications where you can't rely on server-side PDF tools. However, it cannot render PDFs visually or extract text and layout from existing documents; it only creates and modifies them.
Choose pdf2json when your goal is to convert a PDF into a machine-readable JSON format that preserves text content and approximate layout coordinates. It’s useful for data extraction or analysis tasks where you need to parse document structure rather than display or edit the PDF. Note that it does not support image extraction or PDF generation, and its output may vary significantly based on the source PDF’s internal structure.
Choose pdfjs-dist when you need to render PDFs in the browser (e.g., in a viewer component) or extract raw text content, which it returns in content-stream order (usually, but not always, the reading order). As the official distribution of Mozilla's PDF.js, it's battle-tested for rendering fidelity and supports a wide range of PDF features. It's the best choice for displaying PDFs to users or performing basic text scraping, but it does not allow modification of PDF content or creation of new documents.
Standalone Node module that compares PDFs
To use the GraphicsMagick (gm) engine, install the following system dependencies:
On MS Windows, please use the 32bit version of GhostScript
brew install graphicsmagick
brew install imagemagick
brew install ghostscript
Install npm module
npm install compare-pdf
Below is the default configuration, showing the folders where the PDFs should be placed. By default, they live in a data folder at the root of your project.
The config also contains settings for image comparison, such as density, quality, tolerance, and threshold. It also has a flag to enable or disable cleanup of the actual and baseline PNG folders.
{
paths: {
actualPdfRootFolder: process.cwd() + "/data/actualPdfs",
baselinePdfRootFolder: process.cwd() + "/data/baselinePdfs",
actualPngRootFolder: process.cwd() + "/data/actualPngs",
baselinePngRootFolder: process.cwd() + "/data/baselinePngs",
diffPngRootFolder: process.cwd() + "/data/diffPngs"
},
settings: {
imageEngine: 'graphicsMagick',
density: 100,
quality: 70,
tolerance: 0,
threshold: 0.05,
cleanPngPaths: true,
matchPageCount: true,
disableFontFace: true,
verbosity: 0
}
}
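PDF to Image Conversion
disableFontFace: by default, PDF.js converts embedded fonts to OpenType and loads them via @font-face rules. When this setting is true (as in the default configuration above), fonts are instead rendered by a built-in font renderer that constructs the glyphs with primitive path commands.
Image Comparison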
By default, PDFs are compared using the "byImage" comparison type.
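// Assumes a mocha test file with chai's expect in scope
const comparePdf = require("compare-pdf");
it("Should be able to verify same PDFs", async () => {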
let comparisonResults = await new comparePdf()
.actualPdfFile("same.pdf")
.baselinePdfFile("baseline.pdf")
.compare();
expect(comparisonResults.status).to.equal("passed");
});
it("Should be able to verify different PDFs", async () => {
const ComparePdf = new comparePdf();
let comparisonResults = await ComparePdf.actualPdfFile("notSame.pdf")
.baselinePdfFile("baseline.pdf")
.compare("byImage");
expect(comparisonResults.status).to.equal("failed");
expect(comparisonResults.message).to.equal("notSame.pdf is not the same as baseline.pdf.");
expect(comparisonResults.details).to.not.be.null;
});
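The "byImage" comparison works on rendered page images. As a rough, conceptual sketch of what a pixel comparison involves (not compare-pdf's implementation, whose tolerance and threshold settings may be defined differently), the function below counts pixels that differ between two same-sized RGBA buffers and fails if the differing fraction exceeds a limit. `comparePixels`, `channelTolerance`, and `maxDiffRatio` are hypothetical names chosen for this sketch.

```javascript
// Conceptual sketch (not compare-pdf's code): compare two raw RGBA buffers of
// the same width and height, pixel by pixel.
function comparePixels(actual, baseline, { channelTolerance = 0, maxDiffRatio = 0 } = {}) {
  if (actual.length !== baseline.length) {
    return { passed: false, diffPixels: null, ratio: null };
  }
  let diffPixels = 0;
  for (let i = 0; i < actual.length; i += 4) {
    // A pixel differs if any of its four channels moved beyond the tolerance
    for (let c = 0; c < 4; c++) {
      if (Math.abs(actual[i + c] - baseline[i + c]) > channelTolerance) {
        diffPixels++;
        break;
      }
    }
  }
  const ratio = diffPixels / (actual.length / 4);
  return { passed: ratio <= maxDiffRatio, diffPixels, ratio };
}

// Two 2 x 1 images whose second pixel differs slightly in the red channel
const a = Uint8Array.from([0, 0, 0, 255, 10, 10, 10, 255]);
const b = Uint8Array.from([0, 0, 0, 255, 12, 10, 10, 255]);
console.log(comparePixels(a, b).passed); // false
console.log(comparePixels(a, b, { channelTolerance: 2 }).passed); // true
```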
You can mask areas of the images that have dynamic values (e.g., dates or IDs) before the comparison. Use the addMask method with the pageIndex (starting at 0) and the coordinates.
it("Should be able to verify same PDFs with Masks", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("maskedSame.pdf")
.baselinePdfFile("baseline.pdf")
.addMask(1, { x0: 35, y0: 70, x1: 145, y1: 95 })
.addMask(1, { x0: 185, y0: 70, x1: 285, y1: 95 })
.compare();
expect(comparisonResults.status).to.equal("passed");
});
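The mask coordinates above look like pixel positions in the rendered page images. Assuming that is the case, and that `density` is the rendering resolution in DPI, a box you measured in PDF points (1/72 inch) scales by `density / 72`. `pointsBoxToPixels` is a hypothetical helper; both assumptions are worth checking against the images compare-pdf writes (for example the diff PNGs in `data/diffPngs`).

```javascript
// Hypothetical helper: convert a box measured in PDF points (1/72 inch) into
// pixel coordinates in a page image rendered at `density` DPI.
// The box must already be measured from the top-left corner, like image coordinates.
function pointsBoxToPixels(box, density = 100) {
  const toPx = (pt) => Math.round((pt * density) / 72);
  return { x0: toPx(box.x0), y0: toPx(box.y0), x1: toPx(box.x1), y1: toPx(box.y1) };
}

// A full US Letter page (612 x 792 pt) at the default density of 100
console.log(pointsBoxToPixels({ x0: 0, y0: 0, x1: 612, y1: 792 })); // { x0: 0, y0: 0, x1: 850, y1: 1100 }
```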
You can also pass page masks in bulk as an array to the addMasks method.
it("Should be able to verify different PDFs with Masks", async () => {
const ComparePdf = new comparePdf();
let masks = [
{ pageIndex: 1, coordinates: { x0: 35, y0: 70, x1: 145, y1: 95 } },
{ pageIndex: 1, coordinates: { x0: 185, y0: 70, x1: 285, y1: 95 } }
];
let comparisonResults = await ComparePdf.actualPdfFile("maskedNotSame.pdf")
.baselinePdfFile("baseline.pdf")
.addMasks(masks)
.compare();
expect(comparisonResults.status).to.equal("failed");
expect(comparisonResults.message).to.equal("maskedNotSame.pdf is not the same as baseline.pdf.");
expect(comparisonResults.details).to.not.be.null;
});
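Conceptually, masking works by painting the same solid color over the masked rectangles in both the actual and baseline images before they are compared, so whatever was underneath can no longer cause a mismatch. A minimal sketch on raw RGBA pixel buffers follows; `applyMasks` is hypothetical, and treating `x1`/`y1` as exclusive bounds is an assumption rather than documented compare-pdf behavior.

```javascript
// Sketch: black out rectangular regions of a raw RGBA image (width x height).
// Masks use the { x0, y0, x1, y1 } shape from addMask; x1/y1 are treated as
// exclusive, and rectangles are clamped to the image bounds.
function applyMasks(pixels, width, height, masks, rgba = [0, 0, 0, 255]) {
  for (const { x0, y0, x1, y1 } of masks) {
    for (let y = Math.max(0, y0); y < Math.min(height, y1); y++) {
      for (let x = Math.max(0, x0); x < Math.min(width, x1); x++) {
        const offset = (y * width + x) * 4;
        for (let c = 0; c < 4; c++) pixels[offset + c] = rgba[c];
      }
    }
  }
  return pixels; // modified in place
}

const img = new Uint8Array(4 * 2 * 4).fill(255); // 4 x 2 white image
applyMasks(img, 4, 2, [{ x0: 1, y0: 0, x1: 3, y1: 1 }]);
console.log(Array.from(img.subarray(4, 8))); // [ 0, 0, 0, 255 ]
```

Applying the same masks to both images before comparing is what makes the masked regions always match.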
If you need to compare only a certain area of the pdf, you can do so by utilising the cropPage method and passing the pageIndex (starts at 0), the width and height along with the x and y coordinates.
it("Should be able to verify same PDFs with Croppings", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("same.pdf")
.baselinePdfFile("baseline.pdf")
.cropPage(1, { width: 530, height: 210, x: 0, y: 415 })
.compare();
expect(comparisonResults.status).to.equal("passed");
});
Similar to masks, you can also pass all croppings in bulk to the cropPages method. You can have multiple croppings on the same page.
it("Should be able to verify same PDFs with multiple Croppings", async () => {
let croppings = [
{ pageIndex: 0, coordinates: { width: 210, height: 180, x: 615, y: 265 } },
{ pageIndex: 0, coordinates: { width: 210, height: 180, x: 615, y: 520 } },
{ pageIndex: 1, coordinates: { width: 530, height: 210, x: 0, y: 415 } }
];
let comparisonResults = await new comparePdf()
.actualPdfFile("same.pdf")
.baselinePdfFile("baseline.pdf")
.cropPages(croppings)
.compare();
expect(comparisonResults.status).to.equal("passed");
});
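Cropping is the inverse of masking: instead of hiding regions, only the selected window is compared. A minimal sketch of extracting such a window from a raw RGBA buffer follows; `cropRegion` is hypothetical, and reading `x`/`y` as the window's top-left corner in image pixels is an assumption.

```javascript
// Sketch: copy a width x height window starting at (x, y) out of a raw RGBA
// image, mirroring the { width, height, x, y } shape that cropPage accepts.
// The window must lie inside the image.
function cropRegion(pixels, imageWidth, { width, height, x, y }) {
  const out = new Uint8Array(width * height * 4);
  for (let row = 0; row < height; row++) {
    // Copy one row of the window at a time
    const start = ((y + row) * imageWidth + x) * 4;
    out.set(pixels.subarray(start, start + width * 4), row * width * 4);
  }
  return out;
}

const image = new Uint8Array(3 * 2 * 4); // 3 x 2 image
for (let i = 0; i < 6; i++) image[i * 4] = i; // tag each pixel's red channel with its index
const cropped = cropRegion(image, 3, { width: 2, height: 2, x: 1, y: 0 });
console.log([0, 4, 8, 12].map((o) => cropped[o])); // [ 1, 2, 4, 5 ]
```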
Should you need to test only specific page indexes in a pdf, you can do so by specifying an array of page indexes using the onlyPageIndexes method as shown below.
it("Should be able to verify only specific page indexes", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("notSame.pdf")
.baselinePdfFile("baseline.pdf")
.onlyPageIndexes([1])
.compare();
expect(comparisonResults.status).to.equal("passed");
});
On the flip side, should you need to skip specific page indexes in a pdf, you can do so by specifying an array of page indexes using the skipPageIndexes method as shown below.
it("Should be able to skip specific page indexes", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("notSame.pdf")
.baselinePdfFile("baseline.pdf")
.skipPageIndexes([0])
.compare();
expect(comparisonResults.status).to.equal("passed");
});
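The two page filters can be pictured as a simple selection over 0-based page indexes. The sketch below is hypothetical: `selectPageIndexes` is not a compare-pdf function, and the order in which it applies both filters when both are given (only first, then skip) is an assumption.

```javascript
// Sketch: decide which 0-based page indexes take part in a comparison,
// in the spirit of onlyPageIndexes / skipPageIndexes.
function selectPageIndexes(pageCount, { only = null, skip = [] } = {}) {
  const all = Array.from({ length: pageCount }, (_, i) => i);
  const candidates = only ? all.filter((i) => only.includes(i)) : all;
  return candidates.filter((i) => !skip.includes(i));
}

console.log(selectPageIndexes(3, { only: [1] })); // [ 1 ]
console.log(selectPageIndexes(3, { skip: [0] })); // [ 1, 2 ]
```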
Starting from v1.1.6, you can pass buffers instead of file paths. This is very useful when the PDFs come from an API call.
const fs = require("fs");
// Default folder paths, taken from a fresh instance's config
const config = new comparePdf().config;
it('Should be able to verify same PDFs using direct buffer', async () => {
const actualPdfFilename = "same.pdf";
const baselinePdfFilename = "baseline.pdf";
const actualPdfBuffer = fs.readFileSync(`${config.paths.actualPdfRootFolder}/${actualPdfFilename}`);
const baselinePdfBuffer = fs.readFileSync(`${config.paths.baselinePdfRootFolder}/${baselinePdfFilename}`);
let comparisonResults = await new comparePdf()
.actualPdfBuffer(actualPdfBuffer, actualPdfFilename)
.baselinePdfBuffer(baselinePdfBuffer, baselinePdfFilename)
.compare();
expect(comparisonResults.status).to.equal('passed');
});
it('Should be able to verify same PDFs using direct buffer passing filename in another way', async () => {
const actualPdfFilename = "same.pdf";
const baselinePdfFilename = "baseline.pdf";
const actualPdfBuffer = fs.readFileSync(`${config.paths.actualPdfRootFolder}/${actualPdfFilename}`);
const baselinePdfBuffer = fs.readFileSync(`${config.paths.baselinePdfRootFolder}/${baselinePdfFilename}`);
let comparisonResults = await new comparePdf()
.actualPdfBuffer(actualPdfBuffer)
.actualPdfFile(actualPdfFilename)
.baselinePdfBuffer(baselinePdfBuffer)
.baselinePdfFile(baselinePdfFilename)
.compare();
expect(comparisonResults.status).to.equal('passed');
});
Passing "byBase64" as the comparison type to the compare method checks whether the base64 encodings of the actual and baseline files are identical.
it("Should be able to verify same PDFs", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("same.pdf")
.baselinePdfFile("baseline.pdf")
.compare("byBase64");
expect(comparisonResults.status).to.equal("passed");
});
it("Should be able to verify different PDFs", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("notSame.pdf")
.baselinePdfFile("baseline.pdf")
.compare("byBase64");
expect(comparisonResults.status).to.equal("failed");
expect(comparisonResults.message).to.equal("notSame.pdf is not the same as baseline.pdf.");
});
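Because base64 is just a re-encoding of the bytes, a "byBase64"-style check passes only when the two files are byte-for-byte identical. That makes it fast but strict: a regenerated PDF often differs in embedded metadata such as creation dates or document IDs even when every page looks the same, which is when "byImage" is needed. A minimal sketch with Node.js Buffers (as returned by `fs.readFileSync`); both function names are hypothetical.

```javascript
// Conceptual sketch of a byBase64-style check on Node.js Buffers.
function sameByBase64(actual, baseline) {
  return actual.toString('base64') === baseline.toString('base64');
}

// Equivalent result without the encoding step: compare the bytes directly
function sameBytes(actual, baseline) {
  return actual.equals(baseline);
}

console.log(sameByBase64(Buffer.from('%PDF-1.7 A'), Buffer.from('%PDF-1.7 A'))); // true
console.log(sameBytes(Buffer.from('%PDF-1.7 A'), Buffer.from('%PDF-1.7 B'))); // false
```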
You can also directly pass buffers instead of filepaths
it('Should be able to verify same PDFs using direct buffer', async () => {
const actualPdfFilename = "same.pdf";
const baselinePdfFilename = "baseline.pdf";
const actualPdfBuffer = fs.readFileSync(`${config.paths.actualPdfRootFolder}/${actualPdfFilename}`);
const baselinePdfBuffer = fs.readFileSync(`${config.paths.baselinePdfRootFolder}/${baselinePdfFilename}`);
let comparisonResults = await new comparePdf(config)
.actualPdfBuffer(actualPdfBuffer, actualPdfFilename)
.baselinePdfBuffer(baselinePdfBuffer, baselinePdfFilename)
.compare('byBase64');
expect(comparisonResults.status).to.equal('passed');
});
Users can override the default configuration by passing their custom config when initialising the class
it("Should be able to override default configs", async () => {
let config = {
paths: {
actualPdfRootFolder: process.cwd() + "/data/newActualPdfs",
baselinePdfRootFolder: process.cwd() + "/data/baselinePdfs",
actualPngRootFolder: process.cwd() + "/data/actualPngs",
baselinePngRootFolder: process.cwd() + "/data/baselinePngs",
diffPngRootFolder: process.cwd() + "/data/diffPngs"
},
settings: {
density: 100,
quality: 70,
tolerance: 0,
threshold: 0.05,
cleanPngPaths: false,
matchPageCount: true
}
};
let comparisonResults = await new comparePdf(config)
.actualPdfFile("newSame.pdf")
.baselinePdfFile("baseline.pdf")
.compare();
expect(comparisonResults.status).to.equal("passed");
});
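The example above spells out every path and setting. Whether compare-pdf merges a partial config with its defaults is not stated here, so if you would rather override only a few keys, one defensive option is to merge your overrides into a copy of the defaults yourself (for instance starting from `new comparePdf().config`, the property the next example uses) and pass the result to the constructor. `mergeConfig` is a hypothetical helper.

```javascript
// Hypothetical helper: deep-merge overrides into a copy of a default config so
// you only spell out the keys you change. Plain objects are merged recursively;
// strings, numbers, and arrays are replaced. The defaults are not mutated.
function mergeConfig(defaults, overrides) {
  const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  const result = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    result[key] = isPlainObject(value) && isPlainObject(defaults[key])
      ? mergeConfig(defaults[key], value)
      : value;
  }
  return result;
}

const defaults = {
  paths: { actualPdfRootFolder: 'data/actualPdfs', baselinePdfRootFolder: 'data/baselinePdfs' },
  settings: { density: 100, cleanPngPaths: true },
};
const merged = mergeConfig(defaults, { settings: { cleanPngPaths: false } });
console.log(merged.settings); // { density: 100, cleanPngPaths: false }
```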
it("Should be able to override specific config property", async () => {
const ComparePdf = new comparePdf();
ComparePdf.config.paths.actualPdfRootFolder = process.cwd() + "/data/newActualPdfs";
let comparisonResults = await ComparePdf.actualPdfFile("newSame.pdf")
.baselinePdfFile("baseline.pdf")
.compare();
expect(comparisonResults.status).to.equal("passed");
});
Users can pass just the filename with or without extension as long as the pdfs are inside the default or custom configured actual and baseline paths
it("Should be able to pass just the name of the pdf with extension", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("same.pdf")
.baselinePdfFile("baseline.pdf")
.compare();
expect(comparisonResults.status).to.equal("passed");
});
it("Should be able to pass just the name of the pdf without extension", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("same")
.baselinePdfFile("baseline")
.compare();
expect(comparisonResults.status).to.equal("passed");
});
Users can also pass a relative path of the pdf files as parameters
it("Should be able to verify same PDFs using relative paths", async () => {
let comparisonResults = await new comparePdf()
.actualPdfFile("../data/actualPdfs/same.pdf")
.baselinePdfFile("../data/baselinePdfs/baseline.pdf")
.compare();
expect(comparisonResults.status).to.equal("passed");
});
To speed up your test runs, you can use the "byBase64" comparison type first and compare "byImage" only when it fails. This gives you the best of both worlds: fast execution, plus an image diff to inspect when there is a difference.
it("Should be able to verify PDFs byBase64 and when it fails then byImage", async () => {
let comparisonResultsByBase64 = await new comparePdf()
.actualPdfFile("notSame.pdf")
.baselinePdfFile("baseline.pdf")
.compare("byBase64");
expect(comparisonResultsByBase64.status).to.equal("failed");
expect(comparisonResultsByBase64.message).to.equal(
"notSame.pdf is not the same as baseline.pdf compared by their base64 values."
);
if (comparisonResultsByBase64.status === "failed") {
let comparisonResultsByImage = await new comparePdf()
.actualPdfFile("notSame.pdf")
.baselinePdfFile("baseline.pdf")
.compare("byImage");
expect(comparisonResultsByImage.status).to.equal("failed");
expect(comparisonResultsByImage.message).to.equal(
"notSame.pdf is not the same as baseline.pdf compared by their images."
);
expect(comparisonResultsByImage.details).to.not.be.null;
}
});
macOS users encountering "dyld: Library not loaded" error? Then follow the answer from this stackoverflow post to set the correct path to *.dylib.
If you have issues running the app using Apple Silicon, be sure to install the following:
brew install pkg-config cairo pango
brew install libpng jpeg giflib librsvg