pdf.js.mirror/AGENTS.md
2025-12-23 16:13:51 +01:00

6.9 KiB

Overview

PDF.js is a Portable Document Format (PDF) viewer built with JavaScript, HTML5 Canvas, and CSS. It's a Mozilla project that provides a general-purpose, web standards-based platform for parsing and rendering PDFs without requiring native code or plugins.

Common Commands

Development Server

npx gulp server

Then open http://localhost:8888/web/viewer.html to view the PDF viewer. Test PDFs are available at http://localhost:8888/test/pdfs/?frame

Building

Build for modern browsers:

npx gulp generic

This generates pdf.js and pdf.worker.js in build/generic/build/.

Build for distribution (creates pdfjs-dist package):

npx gulp dist
npx gulp dist-install    # Build and install locally

Testing

Run all tests:

npx gulp test

Run unit tests only:

npx gulp unittest

Run integration tests (browser-based tests using Puppeteer):

npx gulp integrationtest

Run font tests:

npx gulp fonttest

Run a single test file by modifying test/test_manifest.json or using test runner options.

Linting and Formatting

Lint JavaScript:

npx gulp lint

Format code (uses Prettier and ESLint):

npx eslint --fix <file>

Type Checking

Run TypeScript type checking:

npx gulp typestest

Architecture

High-Level Structure

PDF.js has a multi-layer architecture that separates concerns between PDF parsing, rendering, and UI:

1. Core Layer (src/core/)

The core layer handles PDF parsing and interpretation. Key responsibilities:

  • PDF parsing: Parsing PDF structure, cross-reference tables, streams
  • Font handling: CFF, TrueType, Type1 font parsing and conversion (font.js, fonts.js, cff_*.js, type1_*.js)
  • Image decoding: JPEG, JBIG2, JPX/JPEG2000 decoders
  • Operators: Processing PDF drawing operators (operator_list.js, evaluator.js)
  • XFA Forms: XML Forms Architecture support (src/core/xfa/)
  • Color spaces: ICC profiles, device color spaces (colorspace.js, icc_colorspace.js)
  • Runs in a Web Worker for performance isolation

Entry point: src/pdf.worker.js

2. Display Layer (src/display/)

The display layer provides the API for rendering PDFs to canvas and managing documents. Key components:

  • API: Main public API (api.js) - PDFDocumentProxy, PDFPageProxy, getDocument()
  • Canvas rendering: Renders PDF operations to HTML5 canvas (canvas.js)
  • Text layer: Extracts and positions text for selection/search (text_layer.js)
  • Annotation layer: Renders and handles PDF annotations (annotation_layer.js)
  • Editor layer: Supports PDF editing (annotations, highlights, stamps) (editor/)
  • Metadata: Parses XMP metadata (metadata.js)
  • Streams: Handles PDF data fetching (fetch, network, node) (fetch_stream.js, network.js, node_stream.js)

Entry point: src/pdf.js

3. Scripting Layer (src/scripting_api/)

Implements JavaScript execution for interactive PDFs (form calculations, validations, button actions).

  • Sandboxed execution environment
  • Implements Acrobat JavaScript API objects (App, Doc, Field, etc.)

Entry points: src/pdf.scripting.js, src/pdf.sandbox.js

4. Web Viewer (web/)

The complete PDF viewer application with UI. Key components:

  • Main app: Application orchestration (app.js)
  • Viewer: Page rendering and layout (pdf_viewer.js, pdf_page_view.js)
  • Toolbar: Zoom, page navigation, print, download controls
  • Sidebar: Thumbnails, outlines, attachments (pdf_sidebar.js, pdf_thumbnail_view.js, pdf_outline_viewer.js)
  • Find controller: Text search functionality (pdf_find_controller.js)
  • Annotation editors: UI for creating/editing annotations (annotation_editor_layer_builder.js)
  • Presentation mode: Full-screen presentation (pdf_presentation_mode.js)

Entry point: web/viewer.html + web/viewer.mjs

5. Shared Utilities (src/shared/)

Common utilities used across layers:

  • Message handling: Worker communication (message_handler.js)
  • Utilities: Common functions and constants (util.js)
  • Image utilities: Image processing helpers (image_utils.js)

Worker Communication

PDF.js uses a Web Worker architecture:

  • Main thread (display layer) communicates with worker thread (core layer) via MessageHandler
  • Keeps PDF parsing off the main thread for better performance
  • Messages include: page rendering requests, text content extraction, metadata queries

Build System

  • Uses Gulp for build orchestration (gulpfile.mjs)
  • Webpack bundles modules into browser-compatible formats
  • Babel transpiles for browser compatibility (configurable targets in gulpfile)
  • Preprocessor replaces build-time constants (e.g., typeof PDFJSDev !== "undefined" checks)
  • Multiple build targets: generic, components, minified, legacy (older browser support)

External Dependencies

Located in external/:

  • bcmaps: Binary CMaps for CJK fonts
  • standard_fonts: Core 14 PDF fonts metrics
  • cmapscompress: Tools for compressing CMaps
  • openjpeg: JPEG2000 decoder (WASM)
  • quickjs: JavaScript engine for sandboxed execution

Translations

Translations in l10n/ are imported from Mozilla Firefox Nightly. Only the file l10n/en-US/viewer.ftl can be updated.

Development Notes

Adding New Features

When adding features that span multiple layers:

  1. Start with the core layer if parsing/interpretation changes are needed
  2. Update the display layer API if new capabilities need exposure
  3. Modify the web viewer if UI changes are required
  4. Ensure worker communication handles new message types

Preprocessor Directives

Code uses preprocessor checks for build-time conditionals:

if (typeof PDFJSDev !== "undefined" && PDFJSDev.test("GENERIC")) {
  // Generic build-specific code
}

Common flags: GENERIC, MOZCENTRAL, CHROME, MINIFIED, TESTING, LIB, SKIP_BABEL, IMAGE_DECODERS

Testing

  • Unit tests use Jasmine framework (test/unit/)
  • Integration tests use Puppeteer for browser automation (test/integration/)
  • Test PDFs downloaded from manifest (test/test_manifest.json)
  • Reference images for visual regression testing (test/ref/)

Code Style

  • Uses ESLint with custom configuration (eslint.config.mjs)
  • Prettier for formatting
  • Stylelint for CSS
  • No semicolons required (ASI enabled)
  • Single quotes for strings

Pull Request Process

  • Keep PRs focused on a single issue
  • Provide a test PDF if the issue is PDF-specific
  • Ensure tests pass (npx gulp test)
  • Run linting (npx gulp lint)
  • Follow existing code patterns
  • Don't modify translations directly (they come from Firefox)

Performance Considerations

  • Core parsing runs in a Web Worker - keep main thread work minimal
  • Canvas rendering can be expensive - use appropriate scale factors
  • Text layer generation is separate from rendering - can be deferred
  • Annotation layer is optional - only enable when needed