pdf-lib, pdf-parse, and pdfjs-dist are JavaScript libraries used for working with PDF documents in web applications, but they serve different purposes and operate at different layers of the PDF processing stack. pdf-lib is designed for creating and modifying PDFs programmatically, allowing developers to add text, images, and annotations to existing documents or generate new ones from scratch. pdf-parse focuses exclusively on extracting text content from PDF files in a Node.js environment, offering a simple API for basic text retrieval without rendering capabilities. pdfjs-dist is the official distribution of Mozilla's PDF.js library, providing comprehensive PDF rendering, text extraction, and metadata parsing in both browser and Node.js environments, with support for displaying PDFs visually using canvas or SVG.
Working with PDFs in web applications is notoriously tricky due to the format’s complexity. The three main libraries—pdf-lib, pdf-parse, and pdfjs-dist—address different parts of the problem: authoring, text extraction, and rendering/inspection. Understanding their roles prevents costly architectural missteps.
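pdf-lib: PDF Authoring and Modification
pdf-lib treats PDFs as editable documents. You can create new documents from scratch, add text and images to existing ones, fill form fields, and merge pages.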
It operates on the PDF’s internal structure (objects, streams, cross-reference tables) but does not render pages visually or interpret layout for text extraction.
// pdf-lib: Create a new PDF and add text
import { PDFDocument, StandardFonts } from 'pdf-lib';
const pdfDoc = await PDFDocument.create();
const page = pdfDoc.addPage([500, 300]);
const font = await pdfDoc.embedFont(StandardFonts.Helvetica);
page.drawText('Hello from pdf-lib!', { x: 50, y: 250, size: 20, font });
const pdfBytes = await pdfDoc.save(); // Uint8Array
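The same API also edits documents that already exist. The sketch below loads a PDF and draws a label near the top of every page. The names `stampOrigin` and `stampEveryPage`, the `margin` default, and the color and opacity values are illustrative assumptions, not part of pdf-lib; the library is imported lazily only to keep the helper usable on its own.

```javascript
// Pure helper: where to start drawing so the label is horizontally centred
// near the top edge. `margin` is a hypothetical default, not a pdf-lib setting.
function stampOrigin(pageSize, textWidth, fontSize, margin = 24) {
  return {
    x: (pageSize.width - textWidth) / 2,
    y: pageSize.height - margin - fontSize,
  };
}

// Loads an existing PDF (Uint8Array or ArrayBuffer), stamps each page,
// and resolves to the modified bytes.
async function stampEveryPage(inputBytes, label = 'CONFIDENTIAL') {
  // Lazy import: pdf-lib is only loaded when stamping is actually requested.
  const { PDFDocument, StandardFonts, rgb } = await import('pdf-lib');

  const pdfDoc = await PDFDocument.load(inputBytes);
  const font = await pdfDoc.embedFont(StandardFonts.HelveticaBold);
  const size = 18;
  const textWidth = font.widthOfTextAtSize(label, size);

  for (const page of pdfDoc.getPages()) {
    // getSize() returns { width, height } in PDF points.
    const { x, y } = stampOrigin(page.getSize(), textWidth, size);
    page.drawText(label, { x, y, size, font, color: rgb(0.8, 0, 0), opacity: 0.6 });
  }
  return pdfDoc.save(); // Promise<Uint8Array>
}
```

Keeping the position math in a separate function makes it easy to check without a real PDF, and the stamping loop stays a thin layer over pdf-lib's own calls.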
pdf-parse: Simple Text Extraction (Node.js Only)
pdf-parse is a thin wrapper around pdfjs-dist that only extracts raw text from a PDF buffer. It returns a string with all text concatenated, stripped of formatting, positioning, and structure. The API is a single asynchronous call that returns a Promise, and it offers little control over parsing behavior.
// pdf-parse: Extract plain text (Node.js only)
import pdf from 'pdf-parse';
import fs from 'fs';
const pdfBuffer = fs.readFileSync('document.pdf');
const data = await pdf(pdfBuffer);
console.log(data.text); // "All text concatenated into one string"
⚠️ Note:
pdf-parse is not usable in browsers because it relies on Node.js Buffer objects and doesn’t expose PDF.js’s rendering APIs.
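pdfjs-dist: Full PDF Rendering and Analysis
pdfjs-dist is Mozilla’s battle-tested PDF engine. It can render pages to <canvas> or <svg> for display, extract text with positional data, and expose document metadata such as info, outlines, and attachments.
It’s the foundation for Firefox’s built-in PDF viewer and works in both browser and Node.js (with polyfills).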
// pdfjs-dist: Render first page to canvas (browser)
import * as pdfjsLib from 'pdfjs-dist';
// Set worker path (required)
pdfjsLib.GlobalWorkerOptions.workerSrc = './pdf.worker.js';
const loadingTask = pdfjsLib.getDocument('document.pdf');
const pdf = await loadingTask.promise;
const page = await pdf.getPage(1);
const canvas = document.getElementById('pdf-canvas');
const viewport = page.getViewport({ scale: 1.5 });
canvas.height = viewport.height;
canvas.width = viewport.width;
const renderContext = { canvasContext: canvas.getContext('2d'), viewport };
await page.render(renderContext).promise;
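The fixed `scale: 1.5` above is fine for a demo, but a viewer usually fits the page to its container and accounts for high-density displays. The sketch below computes those numbers; `planCanvas` and its return shape are hypothetical, while `getViewport` and the `transform` render parameter are the pdfjs-dist pieces it is meant to feed.

```javascript
// Given the page's unscaled viewport ({ width, height } from
// page.getViewport({ scale: 1 })), work out the scale that fits the
// container plus the backing-store size for a high-DPI screen.
function planCanvas(baseViewport, containerWidth, devicePixelRatio = 1) {
  const scale = containerWidth / baseViewport.width;
  const cssWidth = Math.floor(baseViewport.width * scale);
  const cssHeight = Math.floor(baseViewport.height * scale);
  return {
    scale,
    cssWidth,
    cssHeight,
    // The canvas bitmap is larger than its CSS box on high-DPI screens.
    pixelWidth: Math.floor(cssWidth * devicePixelRatio),
    pixelHeight: Math.floor(cssHeight * devicePixelRatio),
    // Passed to page.render() so drawing is scaled up to the larger bitmap.
    transform:
      devicePixelRatio !== 1
        ? [devicePixelRatio, 0, 0, devicePixelRatio, 0, 0]
        : undefined,
  };
}

// Intended wiring in the browser (not executed here):
//   const plan = planCanvas(page.getViewport({ scale: 1 }),
//                           container.clientWidth, window.devicePixelRatio || 1);
//   const viewport = page.getViewport({ scale: plan.scale });
//   canvas.width = plan.pixelWidth;
//   canvas.height = plan.pixelHeight;
//   canvas.style.width = `${plan.cssWidth}px`;
//   canvas.style.height = `${plan.cssHeight}px`;
//   await page.render({ canvasContext: canvas.getContext('2d'),
//                       viewport, transform: plan.transform }).promise;
```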
When you need to pull text out of a PDF, the approach varies drastically.
pdf-lib cannot extract text. It reads PDF structure but doesn’t interpret content streams as human-readable text.
pdf-parse gives you a blob of text with no layout info:
// pdf-parse output example
"Title\nSection 1\nThis is paragraph one.\nSection 2\nAnother paragraph."
pdfjs-dist provides structured text with positioning, which lets you reconstruct reading order or detect columns:
// pdfjs-dist: Get text with layout info
const page = await pdf.getPage(1);
const textContent = await page.getTextContent();
// textContent.items = [
// { str: "Title", transform: [1, 0, 0, 1, 50, 700], ... },
// { str: "Section 1", transform: [1, 0, 0, 1, 50, 680], ... }
// ]
const text = textContent.items.map(item => item.str).join(' ');
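Joining every item with a space discards the layout that the `transform` values carry. A minimal sketch of rebuilding lines from those values follows. It assumes unrotated text, where `transform[4]` is the x position and `transform[5]` is the baseline y; `groupIntoLines` and `tolerance` are hypothetical names.

```javascript
// Group text items into lines, top to bottom, left to right.
// `items` is shaped like textContent.items from getTextContent().
function groupIntoLines(items, tolerance = 2) {
  const fragments = items
    .filter((item) => item.str.trim() !== '')
    .map((item) => ({ str: item.str, x: item.transform[4], y: item.transform[5] }))
    // PDF y grows upward, so a larger y is higher on the page.
    .sort((a, b) => b.y - a.y || a.x - b.x);

  const lines = [];
  for (const fragment of fragments) {
    const last = lines[lines.length - 1];
    // Baselines within `tolerance` points count as the same line.
    if (last && Math.abs(last.y - fragment.y) <= tolerance) {
      last.fragments.push(fragment);
    } else {
      lines.push({ y: fragment.y, fragments: [fragment] });
    }
  }

  return lines.map((line) =>
    line.fragments
      .sort((a, b) => a.x - b.x)
      .map((fragment) => fragment.str)
      .join(' ')
  );
}
```

Multi-column pages need an extra pass that splits on large horizontal gaps; this sketch only handles a single column.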
If you need to know where text appears on the page (for redaction, search highlighting, or data scraping), only pdfjs-dist delivers.
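For search highlighting, the item's position has to be converted from PDF space (origin at the bottom-left) to canvas space (origin at the top-left). The sketch below does that by hand under stated assumptions: the page is unrotated, its box starts at (0, 0), and `item.width` and `item.height` are in unscaled PDF units. `toCanvasRect` is a hypothetical helper, and the rectangle sits on the baseline, so descenders fall slightly outside it.

```javascript
// Convert one text item to a rectangle in canvas pixels.
// `pageHeight` is the unscaled page height (viewport.height at scale 1);
// `scale` is the scale that was used for rendering.
function toCanvasRect(item, pageHeight, scale) {
  const x = item.transform[4];
  const baselineY = item.transform[5];
  return {
    left: x * scale,
    // Flip the y axis: measure from the top edge instead of the bottom.
    top: (pageHeight - baselineY - item.height) * scale,
    width: item.width * scale,
    height: item.height * scale,
  };
}

// Example: a 100x12pt run at (50, 700) on a US Letter page rendered at 1.5x.
const rect = toCanvasRect(
  { str: 'Title', transform: [1, 0, 0, 1, 50, 700], width: 100, height: 12 },
  792,
  1.5
);
// rect is { left: 75, top: 120, width: 150, height: 18 }
```

For rotated pages or non-zero page origins, the viewport's own conversion methods are the safer route than this manual arithmetic.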
Only pdfjs-dist can render PDFs visually. Neither pdf-lib nor pdf-parse has any rendering capability.
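In the browser, pdfjs-dist draws pages to <canvas> with high fidelity, including vector graphics, embedded fonts, and transparency. Older releases can also render to SVG for scalable output, but that backend was deprecated upstream and is absent from recent releases.
// pdfjs-dist: Render to SVG (alternative to canvas; older releases only)
const viewport = page.getViewport({ scale: 1.5 });
const operatorList = await page.getOperatorList();
// SVGGraphics is a class, constructed with the page's shared and per-page objects
const svgGraphics = new pdfjsLib.SVGGraphics(page.commonObjs, page.objs);
const svg = await svgGraphics.getSVG(operatorList, viewport); // resolves to an <svg> element
document.body.appendChild(svg);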
Attempting to use pdf-lib for display will fail—it outputs byte arrays, not renderable DOM elements.
| Library | Browser | Node.js | Requires Worker | External Dependencies |
|---|---|---|---|---|
| pdf-lib | ✅ | ✅ | ❌ | None |
| pdf-parse | ❌ | ✅ | ❌ | Depends on pdfjs-dist |
| pdfjs-dist | ✅ | ✅* | ✅ (in browser) | Needs worker script |
* Node.js usage requires polyfills for window, document, etc., and careful bundler configuration.
pdfjs-dist’s worker requirement in browsers is non-negotiable for performance—it offloads parsing to a Web Worker to avoid blocking the UI thread. You must host pdf.worker.js and set GlobalWorkerOptions.workerSrc.
pdf-parse bundles pdfjs-dist but hides this complexity, at the cost of flexibility.
You need to create PDF invoices from user data and let them download it.

- pdf-lib: It generates PDFs client-side with custom fonts, tables, and logos.
- pdfjs-dist: Overkill. You don’t need rendering.
- pdf-parse: Can’t create PDFs.

Users upload PDFs and view them page-by-page in your app.

- pdfjs-dist: Only it can render pages accurately.

Your server receives PDFs and needs to index their content for search.

- pdf-parse (simplest API)
- pdfjs-dist directly
- pdf-lib: No text extraction

Stamp “CONFIDENTIAL” on every page of a user-uploaded document.

- pdf-lib: Load the PDF, iterate pages, draw text overlay, save.
- pdfjs-dist: Can’t modify document structure.
- pdf-parse: Read-only.

How steep the learning curve is differs by library:

- pdf-lib: Moderate. You work with pages, fonts, and drawing commands. Documentation is clear with examples for common tasks.
- pdf-parse: Very low. One function, one promise, one string result.
- pdfjs-dist: High. You manage document loading, page retrieval, rendering contexts, and worker setup. The API exposes PDF internals (operator lists, transforms, glyphs).

For quick text extraction in Node.js, pdf-parse wins on simplicity. For anything involving modification or rendering, you’ll invest time learning the chosen library’s model.
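For the search-indexing scenario above, the plain string from pdf-parse still has to be turned into something searchable. The sketch below builds a small in-memory inverted index; `tokenize`, `buildIndex`, and `search` are hypothetical helpers, and each document's `text` field is assumed to hold whatever the extraction step returned (for example `data.text`).

```javascript
// Lower-case the text and split it into runs of letters and digits.
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

// documents: [{ id, text }]. Returns Map<term, Set<id>>.
function buildIndex(documents) {
  const index = new Map();
  for (const { id, text } of documents) {
    for (const term of new Set(tokenize(text))) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// Returns the ids of documents that contain every term in the query.
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  const postings = terms.map((term) => index.get(term) ?? new Set());
  return [...postings[0]].filter((id) => postings.every((set) => set.has(id)));
}

// Usage with strings standing in for extracted PDF text:
const index = buildIndex([
  { id: 'a.pdf', text: 'Title\nSection 1\nThis is paragraph one.' },
  { id: 'b.pdf', text: 'Section 2\nAnother paragraph.' },
]);
const hits = search(index, 'section one'); // only a.pdf contains both terms
```

A production system would hand the text to a real search engine; the point here is only that flat text with no layout is enough for term lookup.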
These libraries don’t interoperate directly, but you can chain them in workflows:

1. Use pdfjs-dist in the browser to render a PDF for preview
2. Use pdf-lib to add a signature page
3. Use pdf-parse to extract text for archival

However, note that pdf-lib and pdfjs-dist parse PDFs differently, so they may not agree on page count or structure if the PDF is malformed.
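Because the parsers can disagree on malformed files, a chained workflow may want a sanity check before passing bytes from one library to the next. The sketch below compares page counts from any set of parsers. `comparePageCounts` and the `counters` shape are hypothetical; the commented wiring shows how pdf-lib's `getPageCount()` and pdfjs-dist's `numPages` would plug in.

```javascript
// counters: { name: async (bytes) => pageCount }.
// Resolves to the per-parser counts and whether they all agree.
async function comparePageCounts(bytes, counters) {
  const counts = {};
  for (const [name, countPages] of Object.entries(counters)) {
    try {
      // Each parser gets its own copy, since some take ownership of the buffer.
      counts[name] = await countPages(bytes.slice());
    } catch (error) {
      counts[name] = null; // this parser rejected the file
    }
  }
  const values = Object.values(counts);
  const agree = values.every((value) => value !== null && value === values[0]);
  return { counts, agree };
}

// Intended wiring (not executed here; assumes both libraries are imported):
//   const result = await comparePageCounts(pdfBytes, {
//     pdfLib: async (b) => (await PDFDocument.load(b)).getPageCount(),
//     pdfjs: async (b) => (await pdfjsLib.getDocument({ data: b }).promise).numPages,
//   });
//   if (!result.agree) console.warn('Parsers disagree:', result.counts);
```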
| Task | Best Library | Why |
|---|---|---|
| Create/edit PDFs | pdf-lib | Designed for document authoring |
| Display PDFs in browser | pdfjs-dist | Only one with rendering engine |
| Extract raw text (Node.js) | pdf-parse | Simplest API for basic text |
| Extract structured text | pdfjs-dist | Provides layout and positioning |
| Fill PDF forms | pdf-lib | Supports AcroForm manipulation |
| Analyze document metadata | pdfjs-dist | Exposes info, outlines, attachments |
Choose based on whether you’re writing (pdf-lib), reading for display (pdfjs-dist), or reading for text only (pdf-parse). Mixing them is possible but adds bundle size—avoid unless necessary.
Choose pdfjs-dist when you need full-featured PDF handling—including rendering pages to canvas/SVG for display in the browser, extracting text with positional data, accessing document metadata, or supporting interactive features like links and forms. It’s the most capable option for viewing or deeply analyzing PDFs, but its API is lower-level and more complex. Use it when you require pixel-perfect rendering or detailed document inspection, especially in frontend applications.
Choose pdf-lib when you need to programmatically create, edit, or manipulate PDF documents—such as adding watermarks, filling form fields, merging pages, or generating reports from scratch. It works in both browser and Node.js environments and offers fine-grained control over PDF structure without requiring a rendering context. However, it does not render PDFs visually or extract formatted text layout; it’s purely for document authoring and modification.
Choose pdf-parse only if you're working in a Node.js backend and need a lightweight way to extract raw text from PDF files without caring about visual rendering, layout, fonts, or images. It depends on pdfjs-dist internally but wraps it into a promise-based API focused solely on text content. Avoid it in browser environments, and note that it provides little control over parsing and no access to document structure beyond plain text.
PDF.js is a Portable Document Format (PDF) library that is built with HTML5. Our goal is to create a general-purpose, web standards-based platform for parsing and rendering PDFs.
This is a pre-built version of the PDF.js source code. It is automatically generated by the build scripts.
For usage with older browsers/environments, without native support for the
latest JavaScript features, please see the legacy/ folder.
Please see this wiki page for information about supported browsers/environments.
See https://github.com/mozilla/pdf.js for learning and contributing.