Overview
PDF.js is a Portable Document Format (PDF) viewer built with JavaScript, HTML5 Canvas, and CSS. It's a Mozilla project that provides a general-purpose, web standards-based platform for parsing and rendering PDFs without requiring native code or plugins.
Common Commands
Development Server
npx gulp server
Then open http://localhost:8888/web/viewer.html to view the PDF viewer. Test PDFs are available at http://localhost:8888/test/pdfs/?frame
Building
Build for modern browsers:
npx gulp generic
This generates pdf.mjs and pdf.worker.mjs in build/generic/build/.
Build for distribution (creates pdfjs-dist package):
npx gulp dist
npx gulp dist-install # Build and install locally
Testing
Run all tests:
npx gulp test
Run unit tests only:
npx gulp unittest
Run integration tests (browser-based tests using Puppeteer):
npx gulp integrationtest
Run font tests:
npx gulp fonttest
Run a single test file by modifying test/test_manifest.json or using test runner options.
Linting and Formatting
Lint JavaScript:
npx gulp lint
Format code (uses Prettier and ESLint):
npx eslint --fix <file>
Type Checking
Run TypeScript type checking:
npx gulp typestest
Architecture
High-Level Structure
PDF.js has a multi-layer architecture that separates concerns between PDF parsing, rendering, and UI:
1. Core Layer (src/core/)
The core layer handles PDF parsing and interpretation. Key responsibilities:
- PDF parsing: Parsing PDF structure, cross-reference tables, streams
- Font handling: CFF, TrueType, Type1 font parsing and conversion (font.js, fonts.js, cff_*.js, type1_*.js)
- Image decoding: JPEG, JBIG2, JPX/JPEG2000 decoders
- Operators: Processing PDF drawing operators (operator_list.js, evaluator.js)
- XFA Forms: XML Forms Architecture support (src/core/xfa/)
- Color spaces: ICC profiles, device color spaces (colorspace.js, icc_colorspace.js)
- Runs in a Web Worker for performance isolation
Entry point: src/pdf.worker.js
2. Display Layer (src/display/)
The display layer provides the API for rendering PDFs to canvas and managing documents. Key components:
- API: Main public API (api.js), including PDFDocumentProxy, PDFPageProxy, and getDocument()
- Canvas rendering: Renders PDF operations to HTML5 canvas (canvas.js)
- Text layer: Extracts and positions text for selection/search (text_layer.js)
- Annotation layer: Renders and handles PDF annotations (annotation_layer.js)
- Editor layer: Supports PDF editing (annotations, highlights, stamps) (editor/)
- Metadata: Parses XMP metadata (metadata.js)
- Streams: Handles PDF data fetching (fetch, network, node) (fetch_stream.js, network.js, node_stream.js)
Entry point: src/pdf.js
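As a rough illustration of how the display-layer API listed above is typically driven, the sketch below loads a document and renders its first page. The helper name renderFirstPage and its parameters are assumptions made for this example; the library object is passed in as an argument so the sketch does not depend on how the build is loaded. The calls used (getDocument().promise, getPage(), getViewport({ scale }), render().promise) follow commonly documented usage, but check api.js for the exact options supported by the version in use.

```javascript
// Sketch only: `renderFirstPage` is a hypothetical helper, not part of PDF.js.
// `pdfjsLib` is any object exposing getDocument(), e.g. the built library.
async function renderFirstPage(pdfjsLib, source, canvas, scale = 1.5) {
  // getDocument() returns a loading task; its promise resolves to a
  // PDFDocumentProxy.
  const pdf = await pdfjsLib.getDocument(source).promise;

  // Page numbers are 1-based; the result is a PDFPageProxy.
  const page = await pdf.getPage(1);
  const viewport = page.getViewport({ scale });

  // Size the canvas to the scaled page before drawing.
  canvas.width = Math.floor(viewport.width);
  canvas.height = Math.floor(viewport.height);

  // render() returns a task whose promise settles when drawing finishes.
  await page.render({
    canvasContext: canvas.getContext("2d"),
    viewport,
  }).promise;

  return { numPages: pdf.numPages, width: canvas.width, height: canvas.height };
}
```

In a browser, the worker script location usually has to be configured (via GlobalWorkerOptions.workerSrc) before calling a helper like this, so that parsing runs in the core layer's worker.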
3. Scripting Layer (src/scripting_api/)
Implements JavaScript execution for interactive PDFs (form calculations, validations, button actions).
- Sandboxed execution environment
- Implements Acrobat JavaScript API objects (App, Doc, Field, etc.)
Entry points: src/pdf.scripting.js, src/pdf.sandbox.js
4. Web Viewer (web/)
The complete PDF viewer application with UI. Key components:
- Main app: Application orchestration (app.js)
- Viewer: Page rendering and layout (pdf_viewer.js, pdf_page_view.js)
- Toolbar: Zoom, page navigation, print, download controls
- Sidebar: Thumbnails, outlines, attachments (pdf_sidebar.js, pdf_thumbnail_view.js, pdf_outline_viewer.js)
- Find controller: Text search functionality (pdf_find_controller.js)
- Annotation editors: UI for creating/editing annotations (annotation_editor_layer_builder.js)
- Presentation mode: Full-screen presentation (pdf_presentation_mode.js)
Entry point: web/viewer.html + web/viewer.mjs
5. Shared Utilities (src/shared/)
Common utilities used across layers:
- Message handling: Worker communication (message_handler.js)
- Utilities: Common functions and constants (util.js)
- Image utilities: Image processing helpers (image_utils.js)
Worker Communication
PDF.js uses a Web Worker architecture:
- Main thread (display layer) communicates with worker thread (core layer) via MessageHandler
- Keeps PDF parsing off the main thread for better performance
- Messages include: page rendering requests, text content extraction, metadata queries
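The request/response pattern described above can be sketched in a few lines. Everything in this example is hypothetical: MiniHandler is not the PDF.js MessageHandler, createPortPair is an in-memory stand-in for a real Worker connection, and the GetPageCount action name is invented for illustration. It only shows the general idea of tagging each request with an id so the reply can be matched to the promise that is waiting for it.

```javascript
// Hypothetical sketch; MiniHandler is NOT the real PDF.js MessageHandler.

// In-memory stand-in for the two ends of a Worker connection.
function createPortPair() {
  const a = { onmessage: null };
  const b = { onmessage: null };
  a.postMessage = data => queueMicrotask(() => b.onmessage?.({ data }));
  b.postMessage = data => queueMicrotask(() => a.onmessage?.({ data }));
  return [a, b];
}

class MiniHandler {
  constructor(port) {
    this.port = port;
    this.nextId = 1;
    this.pending = new Map(); // request id -> { resolve, reject }
    this.actions = new Map(); // action name -> handler function
    port.onmessage = event => this.#onMessage(event.data);
  }

  // Register a handler for an action name (used on the "worker" side).
  on(name, handler) {
    this.actions.set(name, handler);
  }

  // Send a request and return a promise for the matching reply.
  request(name, payload) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.port.postMessage({ type: "request", id, name, payload });
    });
  }

  async #onMessage(msg) {
    if (msg.type === "request") {
      try {
        const handler = this.actions.get(msg.name);
        if (!handler) {
          throw new Error(`Unknown action: ${msg.name}`);
        }
        const result = await handler(msg.payload);
        this.port.postMessage({ type: "reply", id: msg.id, result });
      } catch (err) {
        this.port.postMessage({ type: "reply", id: msg.id, error: err.message });
      }
      return;
    }
    // Otherwise this is a reply: settle the promise waiting on this id.
    const entry = this.pending.get(msg.id);
    if (!entry) {
      return;
    }
    this.pending.delete(msg.id);
    if ("error" in msg) {
      entry.reject(new Error(msg.error));
    } else {
      entry.resolve(msg.result);
    }
  }
}

const [mainPort, workerPort] = createPortPair();
const mainSide = new MiniHandler(mainPort); // plays the display layer
const workerSide = new MiniHandler(workerPort); // plays the core layer

workerSide.on("GetPageCount", ({ docId }) => ({ docId, numPages: 3 }));
```

With this setup, mainSide.request("GetPageCount", { docId: "a" }) returns a promise that settles once the worker side has replied, and a request for an unregistered action rejects instead of hanging.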
Build System
- Uses Gulp for build orchestration (gulpfile.mjs)
- Webpack bundles modules into browser-compatible formats
- Babel transpiles for browser compatibility (configurable targets in gulpfile)
- Preprocessor replaces build-time constants (e.g., typeof PDFJSDev !== "undefined" checks)
- Multiple build targets: generic, components, minified, legacy (older browser support)
External Dependencies
Located in external/:
- bcmaps: Binary CMaps for CJK fonts
- standard_fonts: Metrics for the 14 core PDF fonts
- cmapscompress: Tools for compressing CMaps
- openjpeg: JPEG2000 decoder (WASM)
- quickjs: JavaScript engine for sandboxed execution
Translations
Translations in l10n/ are imported from Mozilla Firefox Nightly. Only the file l10n/en-US/viewer.ftl can be updated.
Development Notes
Adding New Features
When adding features that span multiple layers:
- Start with the core layer if parsing/interpretation changes are needed
- Update the display layer API if new capabilities need exposure
- Modify the web viewer if UI changes are required
- Ensure worker communication handles new message types
Preprocessor Directives
Code uses preprocessor checks for build-time conditionals:
if (typeof PDFJSDev !== "undefined" && PDFJSDev.test("GENERIC")) {
// Generic build-specific code
}
Common flags: GENERIC, MOZCENTRAL, CHROME, MINIFIED, TESTING, LIB, SKIP_BABEL, IMAGE_DECODERS
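To see why the guard starts with a typeof check, the sketch below simulates both situations at runtime: code running without any PDFJSDev object defined, and code running with one. The helpers isBuildFlagSet and makeStubPDFJSDev are hypothetical names invented for this example, and the stub only understands a single flag name per call. In an actual build the preprocessor resolves these checks at build time rather than at runtime.

```javascript
// Hypothetical helpers for illustration; not part of PDF.js.

// Evaluate a build guard against `scope`, which stands in for the global
// object so both cases can be simulated without touching real globals.
function isBuildFlagSet(flag, scope = globalThis) {
  const dev = scope.PDFJSDev;
  // Same shape as the guard used in the source: the typeof check keeps the
  // expression safe when no PDFJSDev object exists at all.
  return typeof dev !== "undefined" && dev.test(flag);
}

// Stub that mimics a PDFJSDev.test() call for a fixed set of enabled flags.
function makeStubPDFJSDev(enabledFlags) {
  const flags = new Set(enabledFlags);
  return { test: flag => flags.has(flag) };
}

const genericBuild = { PDFJSDev: makeStubPDFJSDev(["GENERIC"]) };
const noPreprocessor = {}; // PDFJSDev is not defined here

console.log(isBuildFlagSet("GENERIC", genericBuild));
console.log(isBuildFlagSet("MOZCENTRAL", genericBuild));
console.log(isBuildFlagSet("GENERIC", noPreprocessor));
```

The last call is the interesting one: without the typeof check, evaluating PDFJSDev.test(...) where PDFJSDev is not defined would throw instead of quietly skipping the build-specific branch.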
Testing
- Unit tests use the Jasmine framework (test/unit/)
- Integration tests use Puppeteer for browser automation (test/integration/)
- Test PDFs downloaded from manifest (test/test_manifest.json)
- Reference images for visual regression testing (test/ref/)
Code Style
- Uses ESLint with custom configuration (eslint.config.mjs)
- Prettier for formatting
- Stylelint for CSS
- Semicolons terminate statements (do not rely on ASI)
- Double quotes for strings, as in the preprocessor example above
Pull Request Process
- Keep PRs focused on a single issue
- Provide a test PDF if the issue is PDF-specific
- Ensure tests pass (npx gulp test)
- Run linting (npx gulp lint)
- Follow existing code patterns
- Don't modify translations directly (they come from Firefox)
Performance Considerations
- Core parsing runs in a Web Worker - keep main thread work minimal
- Canvas rendering can be expensive - use appropriate scale factors
- Text layer generation is separate from rendering - can be deferred
- Annotation layer is optional - only enable when needed
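One way to think about "appropriate scale factors" is to derive the render scale from the container width and the display's pixel ratio, then cap the resulting canvas area. The sketch below is an illustration under stated assumptions: chooseRenderScale, its parameters, and the default pixel budget are all hypothetical and not taken from PDF.js.

```javascript
// Hypothetical helper for illustration; not part of PDF.js.
// `page` holds the unscaled page size, `containerWidth` is in CSS pixels.
function chooseRenderScale(page, containerWidth, pixelRatio = 1, maxPixels = 2 ** 24) {
  // Scale at which the page exactly fills the container width.
  const cssScale = containerWidth / page.width;

  // High-DPI displays need a larger backing store for sharp output.
  let renderScale = cssScale * pixelRatio;

  // Cap the canvas area so very large pages or zoom levels do not
  // allocate an excessively large bitmap.
  const area = page.width * page.height * renderScale ** 2;
  if (area > maxPixels) {
    renderScale *= Math.sqrt(maxPixels / area);
  }
  return { cssScale, renderScale };
}

// A 600x800 page in a 300px-wide container on a 2x display.
const { cssScale, renderScale } = chooseRenderScale({ width: 600, height: 800 }, 300, 2);
console.log(cssScale, renderScale);
```

The canvas would then be rendered at renderScale and displayed at the CSS size implied by cssScale, which keeps output sharp on high-DPI screens while bounding memory use.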