823 Commits

Author SHA1 Message Date
Jonas Jenwald
bdc16f8999
Merge pull request #20868 from Snuffleupagus/exportData-compileFontInfo
Move the `compileFontInfo` call into the `Font.prototype.exportData` method (PR 20197 follow-up)
2026-03-18 11:14:46 +01:00
calixteman
b7d71eb7f8 Don't throw when printing with pdfBug enabled
It fixes #20847.
2026-03-17 10:54:32 +01:00
Jonas Jenwald
7d963ddc7c Move the compileFontInfo call into the Font.prototype.exportData method (PR 20197 follow-up)
After the changes in PR 20197 the code in the `TranslatedFont.prototype.send` method is not all that readable[1] given how it handles e.g. the `charProcOperatorList` data used with Type3 fonts.
Since this is the only spot where `Font.prototype.exportData` is used, it seems much simpler to move the `compileFontInfo` call there and *directly* return the intended data rather than messing with it after the fact.

Finally, while it doesn't really matter, the patch flips the order of the `charProcOperatorList` and `extra` properties throughout the code-base since the former is used with Type3 fonts while the latter (effectively) requires that debugging is enabled.

---

[1] I had to re-read it twice, also looking at all the involved methods, in order to convince myself that it's actually correct.
2026-03-16 09:29:17 +01:00
Tim van der Meij
315491dd32
Merge pull request #20840 from Snuffleupagus/getDocument-rm-length
[api-minor] Remove the `length` parameter from `getDocument`
2026-03-15 11:48:02 +01:00
Jonas Jenwald
09a9a7bd0b [api-minor] Remove the length parameter from getDocument
This is an old API-parameter that is now unused within the PDF.js project itself, and its description says that it's (partly) being used for "range requests operations".
Note that the `length` API-parameter is used to set the *initial* `contentLength` in various `BasePDFStreamReader` implementations, however it's always overridden by the "Content-Length" header (sent by the server) when that one exists *and* is a valid number. While we currently fallback to the keep the initial `contentLength` otherwise, note however how in that case range requests will always be *disabled* and thus the only spot in the code-base [where `fullReader.contentLength` is necessary](873378b718/src/core/worker.js (L230-L236)) cannot actually be reached.

Hence the only possible reason to use the `length` API-parameter would be for improved progress reporting[1] during streaming of PDF data in rare cases where the "Content-Length" header is missing/invalid, but the user *somehow* has information from another source about the correct `length` of the PDF document.
That situation feels very much like an edge-case, but it's obviously impossible to know if someone is depending on it. However, please note that there's a work-around available for users affected by this removal:
 - Implement a `PDFDataRangeTransport` instance together with custom data-fetching[2], since in that case its `length`-parameter will always be used as-is.

Finally, updates various `BasePDFStreamReader` implementations to only set the `_isRangeSupported` field once the headers are available (since previously we'd just overwrite the "initial" value anyway).

---

[1] I.e. to avoid the "indeterminate" loadingBar being displayed in the viewer.

[2] This is what e.g. the Firefox PDF Viewer uses.
2026-03-13 23:42:45 +01:00
Jonas Jenwald
3842936edf Split the src/shared/obj-bin-transform.js file into separate files for the main/worker threads (PR 20197 follow-up)
On the worker-thread only the static `write` methods are actually used, and on the main-thread only class instances are being created.
Hence this, after PR 20197, leads to a bunch of dead code in both of the *built* `pdf.mjs` and `pdf.worker.js` files.

This patch reduces the size of the `gulp mozcentral` output by `21 419` bytes, i.e. `21` kilo-bytes, which I believe is way too large of a saving to not do this.
(I can't even remember the last time we managed to reduce build-size this much with a single patch.)
2026-03-13 11:21:24 +01:00
Tim van der Meij
decbce7b9b
Merge pull request #20856 from Snuffleupagus/fix-font-clearData
Fix the `FontInfo.prototype.clearData` method to actually remove the data as intended (PR 20197 follow-up)
2026-03-12 20:39:02 +01:00
Jonas Jenwald
e88a5652de Fix the FontInfo.prototype.clearData method to actually remove the data as intended (PR 20197 follow-up)
The purpose of PR 11844 was to reduce memory usage once fonts have been attached to the DOM, since the font-data can be quite large in many cases.

Unfortunately the new `clearData` method added in PR 20197 doesn't actually remove *anything*, it just replaces the font-data with zeros which doesn't help when the underlying `ArrayBuffer` itself isn't modified.
The method does include a commented-out `resize` call[1], but uncommenting that just breaks rendering completely.

To address this regression, without having to make large or possibly complex changes, this patch simply changes the `clearData` method to replace the internal buffer/view with its contents *before* the font-data.
While this does lead to a data copy, the size of this data is usually orders of magnitude smaller than the font-data that we're removing.

---

[1] Slightly off-topic, but I don't think that patches should include commented-out code since there's a very real risk that those things never get found/fixed.
At the very least such cases should be clearly marked with `// TODO: ...` comments, and should possibly also have an issue filed about fixing the TODO.
2026-03-12 18:15:42 +01:00
Jonas Jenwald
9aa1ce8f14 Move the PagesMapper class into its own file
The `PagesMapper` class currently makes up one third of the `src/display/display_utils.js` file size, and since its introduction it's grown (a fair bit) in size.
Note that the intention with files such as `src/display/display_utils.js` was to have somewhere to place functionality too small/simple to deserve its own file.
2026-03-11 12:28:13 +01:00
calixteman
9d093d9607
Merge pull request #20626 from nicolo-ribaudo/images-right-click
Add support for right-clicking on images (bug 1012805)
2026-03-11 11:45:51 +01:00
Tim van der Meij
44a63549b0
Merge pull request #20831 from calixteman/internal_viewer
Add a new internal viewer to explore the structure of PDF files.
2026-03-10 20:48:40 +01:00
Jonas Jenwald
5ef582fb20 Use optional chaining a little bit more in the src/display/api.js file
That format is preferred where possible, since it leads to ever so slightly shorter code overall.
2026-03-10 15:51:05 +01:00
Nicolò Ribaudo
4f7a025e21
Separate bbox tracking from dependencies tracking
When recording bboxes for images, it's enough to record their
clip box / bounding box without needing to run the full bbox
tracking of the image's dependencies.
2026-03-10 14:51:03 +01:00
Nicolò Ribaudo
886c90d1a5
Add support for right-clicking on images
This patch adds right-click support for images in the PDF, allowing
users to download them. To minimize memory consumption, we:
- Do not store the images separately, and instead crop them out of the
  PDF page canvas
- Only extract the images when needed (i.e. when the user right-clicks
  on them), rather than eagery having all of them available.

To do so, we layer one empty 0x0 canvas per image, stretched to cover
the whole image, and only populate its contents on right click.
These images need to be inside the text layer: they cannot be _behind_
it, otherwise they would be covered by the text layer's container and
not be clickable, and they cannot be in front of it, otherwise they
would make the text spans unselectable.

This feature is managed by a new preference, `imagesRightClickMinSize`:
- when it's set to `-1`, right-click support is disabled
- when set to `0`, all images are available for right click
- when set to a positive integer, only images whose width and height are
  greater than or equal to that value (in the PDF page frame of
  reference) are available for right click.

This features is disabled by default outside of MOZCENTRAL, as it
significantly degrades the text selection experience in non-Firefox
browsers.
2026-03-10 14:51:03 +01:00
calixteman
9d81fafa8c
Add a new internal viewer to explore the structure of PDF files.
The one from pdf.js.utils is a bit too old: a lot of bugs have been fixed
in the code that parses PDF files since then.
It's just an internal development tool, so it doesn't need to be perfect,
but it should be good enough to be useful.
2026-03-09 14:16:12 +01:00
Calixte Denizet
0e48c16c3c
Add a UI to undo cut/delete and cancel a copy (bug 2021352, bug 2010832)
This happens in a bar on top of the thumbnails sidebar.
The label depending on the selected thumbnails is fixed.
2026-03-09 10:44:11 +01:00
Jonas Jenwald
1f69cf964c Ensure that percent === NaN is consistently reported by the onProgress callback
With these changes `0`, `NaN`, `null`, and `undefined` in the `total`-property all result in `percent === NaN` being reported by the callback, since previously e.g. `0` would result in `percent === 100` being reported unconditionally which doesn't make a lot of sense.

Also, remove the "indeterminate" loadingBar (in the viewer) if the `PDFDocumentLoadingTask` fails since there won't be any more data arriving and displaying the animation thus seems wrong.
2026-03-08 10:21:55 +01:00
Jonas Jenwald
82fc2c94f0 Include transfers correctly in the "GetOperatorList" message (PR 16588 follow-up)
Currently the transfers aren't actually being used with the "GetOperatorList" message, since the placement of the parameter is wrong; note the method signature: 909a700afa/src/shared/message_handler.js (L219-L229)
This goes back to PR 16588, which added the transfers parameter, and unfortunately we all missed that :-(

Simply fixing the parameter isn't enough however, since that broke printing of Stamp-editors (and possibly others). The solution here is to *not* transfer data during printing, given that a single `PrintAnnotationStorage` instance is being used for all pages.
2026-02-25 15:55:43 +01:00
Jonas Jenwald
2e07715c9d Reduce function creation when using Map.prototype.getOrInsertComputed()
With the exception of the first invocation the callback function is unused, which means that a lot of pointless functions may be created.
To avoid this we introduce helper functions for simple cases, such as creating `Map`s and `Objects`s.
2026-02-24 08:58:28 +01:00
Jonas Jenwald
e2c8fc6140 Use Map.prototype.getOrInsertComputed() in the src/display/api.js file 2026-02-22 21:01:13 +01:00
Calixte Denizet
d755fba96a
Add support for deleting, cutting, copying and pasting pages (bug 2010830, 2010831) 2026-02-18 16:43:00 +01:00
Jonas Jenwald
c1b824f2e5 Convert PDFPageProxy.prototype.getTextContent to an asynchronous method
This is a tiny bit shorter, which cannot hurt.
2026-02-11 19:14:10 +01:00
calixteman
4b4ab10c54
Set a pages mapper per loaded document
It fixes #20629.
2026-02-08 21:09:27 +01:00
Jonas Jenwald
d25f13d1fd Report loading progress "automatically" when using the PDFDataTransportStream class, and remove the PDFDataRangeTransport.prototype.onDataProgress method
This is consistent with the other `BasePDFStream` implementations, and simplifies the API surface of the `PDFDataRangeTransport` class (note the changes in the viewer).
Given that the `onDataProgress` method was changed to a no-op this won't affect third-party users, assuming there even are any since this code was written specifically for the Firefox PDF Viewer.
2026-02-01 18:20:19 +01:00
Jonas Jenwald
ecb09d62fc Add the current loading percentage to the onPassword callback
The percentage calculation is currently "spread out" across various viewer functionality, which we can avoid by having the API handle that instead.

Also, remove the `this.#lastProgress` special-case[1] and just register a "normal" `fullReader.onProgress` callback unconditionally. Once `headersReady` is resolved the callback can simply be removed when not needed, since the "worst" thing that could theoretically happen is that the loadingBar (in the viewer) updates sooner this way. In practice though, since `fullReader.read` cannot return data until `headersReady` is resolved, this change is not actually observable in the API.

---

[1] This was added in PR 8617, close to a decade ago, but it's not obvious to me that it was ever necessary to implement it that way.
2026-01-31 16:33:58 +01:00
Jonas Jenwald
4a8fb4dde1 Add an abstract BasePDFStream class, that all the old IPDFStream implementations inherit from
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.

Also, spotted during rebasing, pass the `enableHWA` option "correctly" (i.e. as part of the existing `transportParams`) to the `WorkerTransport`-class to keep the constructor simpler.
2026-01-30 14:15:39 +01:00
Calixte Denizet
806133379e
Refactor a bit page mapping stuff in order to be able to support delete/copy pages 2026-01-26 16:53:52 +01:00
Ujjwal Sharma
3a85770af1 Encode FontPath data into an ArrayBuffer
Serialize FontPath commands into a binary format
and store it in an ArrayBuffer so that it can
eventually be stored in a SharedArrayBuffer.
2025-12-06 03:00:48 +05:30
Calixte Denizet
50c48cf11b Add telemetry for tagged pdfs (bug 1997134) 2025-11-17 19:47:16 +01:00
Calixte Denizet
bc87f4e8d6 Add the possibility to create a pdf from different ones (bug 1997379)
For now it's just possible to create a single pdf in selecting some pages in different pdf sources.
The merge is for now pretty basic (it's why it's still a WIP) none of these data are merged for now:
 - the struct trees
 - the page labels
 - the outlines
 - named destinations
For there are 2 new ref tests where some new pdfs are created: one with some extracted pages and an other
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs in selecting the page we want and the new images are
taken from the generated pdfs.
2025-11-07 14:57:48 +01:00
Aditi
fa631806bf Serialize pattern data into ArrayBuffer
Follow up on https://github.com/mozilla/pdf.js/pull/20197,
This serializes pattern data into an ArrayBuffer which is
then transferred from the worker to the main thread.

It sets up the stage for us to eventually switch to a
SharedArrayBuffer in the future.
2025-10-11 01:58:07 +05:30
Calixte Denizet
19ff148163 Fix incremental saving with hybrid references
This patch removes some previous fixes which are now likely fixed by #17636.

Fixes #20302.
2025-10-04 18:31:55 +02:00
Ujjwal Sharma
4bed7370f4 [WIP] Serialize font data into an ArrayBuffer
This PR serializes font data into an ArrayBuffer
that is then transfered from the worker to the
main thread. It's more efficient than the current
solution which clones the "export data" object
which includes the font data as a Uint8Array.

It prepares us to switch to a SharedArrayBuffer
in the future, which would allow us to share
the font data with multiple agents, which would be
crucial for the upcoming "renderer" worker.
2025-09-19 12:02:40 +05:30
Nicolò Ribaudo
42b4d97115
Fix JSDoc description in src/display/api.js 2025-09-12 15:01:41 +02:00
Nicolò Ribaudo
e4ea2e0c79
Store ops bboxes in a linear Uint8Array
This PR changes the way we store bounding boxes so that they use less
memory and can be more easily shared across threads in the future.

Instead of storing the bounding box and list of dependencies for each
operation that renders _something_, we now only store the bounding box
of _every_ operation and no dependencies list. The bounding box of
each operation covers the bounding box of all the operations affected
by it that render something. For example, the bounding box of a
`setFont` operation will be the bounding box of all the `showText`
operations that use that font.

This affects the debugging experience in pdfBug, since now the bounding
box of an operation may be larger than what it renders itself. To help
with this, now when hovering on an operation we also highlight (in red)
all its dependents. We highlight with white stripes operations that do
not affect any part of the page (i.e. with an empty bbox).

To save memory, we now save bounding box x/y coordinates as uint8
rather than float64. This effectively gives us a 256x256 uniform grid
that covers the page, which is high enough resolution for the usecase.
2025-09-09 10:24:48 +02:00
Nicolò Ribaudo
6a22da9c2e
Add logic to track rendering area of various PDF ops
This commit is a first step towards #6419, and it can also help with
first compute which ops can affect what is visible in that part of
the page.

This commit adds logic to track operations with their respective
bounding boxes. Only operations that actually cause something to
be rendered have a bounding box and dependencies.

Consider the following example:
```
0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...] -> eoFill
```
here we have three rendering operations: the showText op (2) and the
path (4). (2) depends on (0), (1) and (3), while (4) only depends on
(0). Both (2) and (4) have a bounding box.

This tracking happens when first rendering a PDF: we then use the
recorded information to optimize future partial renderings of a PDF, so
that we can skip operations that do not affected the PDF area on the
canvas.

All this logic only runs when the new `enableOptimizedPartialRendering`
preference, disabled by default, is enabled.

The bounding boxes and dependencies are also shown in the pdfBug
stepper. When hovering over a step now:
- it highlights the steps that they depend on
- it highlights on the PDF itself the bounding box
2025-08-22 18:26:59 +02:00
Calixte Denizet
9e5ee1e5a7 [Editor] Add the ability to get all the editable annotations in a pdf document
We want to be able to show all the comments in a pdf even if the pages where they are
haven't been rendered.
And it'll help to fix the issue #18915.
2025-08-18 21:31:11 +02:00
Knut Hühne
760c8d632c
Mark canvasContext as optional
In #20016, the `canvasContext` property of `RenderParameters` was deprecated in favor of the new `canvas` property.

The JSDoc was updated to include the new parameter along with the old one.

I think the old one should be enclosed in `[]` to mark it as optional (and to allow usage from TypeScript with just the `canvas` parameter provided).

I also reordered the properties so that all required properties come first, follow by optional ones.
2025-08-06 12:38:30 +02:00
Calixte Denizet
1c15ba7789 Use the canvas context from mozPrintCallback when printing a pdf from the Firefox viewer
The context is specific to the callback and cannot be created from the canvas itself.
2025-07-11 12:55:39 +02:00
Ujjwal Sharma
b1b728d47f [api-minor] Move getContext call to InternalRenderTask
This is a precursor to moving the call into a
worker thread to let us use `OffscreenCanvas`. The
current position wouldn't work since we make
transformations to the canvas object after the
getContext call, which isn't allowed for
OffscreenCanvas. Also it isn't allowed to clone or
`transferControlToOffscreen` the canvas after the
`getContext` call.
2025-07-04 00:53:51 +02:00
Jonas Jenwald
e91b480c09 Move the PDFObjects class to its own file
This isn't directly part of the official API, and having this class in its own file could help avoid future changes (e.g. issue 18148) affecting the size of the `src/display/api.js` file unnecessarily.
2025-05-20 13:47:36 +02:00
Jonas Jenwald
0105237af6 Move a few helper functions/classes out of the src/display/api.js file
Given that this file represents the official API, it's difficult to avoid it becoming fairly large as we add new functionality. However, it also contains a couple of smaller (and internal) helpers that we can move into a new utils-file.

Also, we inline the `DEFAULT_RANGE_CHUNK_SIZE` constant since it's only used *once* and its value has never been changed in over a decade.
2025-05-20 13:47:36 +02:00
Tim van der Meij
f6e4b1cf4a
Merge pull request #19945 from Snuffleupagus/NetworkStream-rm-Node-Error
Remove Node.js-specific checks when using the Fetch API
2025-05-18 12:05:13 +02:00
Jonas Jenwald
a882195e9b Remove Node.js-specific checks when using the Fetch API
Given that Node.js has full support for the Fetch API since version 21, see the "History" data at https://nodejs.org/api/globals.html#fetch, it seems unnecessary for us to manually check for various globals before using it.

Since our primary development target is browsers in general, and Firefox in particular, being able to remove Node.js-specific compatibility code is always helpful.

Note that we still, for now, support Node.js version 20 and if the relevant globals are not available then Errors will instead be thrown from within the `PDFFetchStream` class.
2025-05-18 10:49:02 +02:00
Jonas Jenwald
99b23ea1f6 Use private fields in the PDFDataRangeTransport class 2025-05-18 10:16:46 +02:00
Jonas Jenwald
fc697b3602 Utilize private fields and methods more in the PDFWorker class
This replaces, wherever possible, the old semi-private fields and methods with actually private ones.
2025-05-17 18:05:49 +02:00
Jonas Jenwald
ab672f0b77 Replace PDFWorker.fromPort with a generic PDFWorker.create method
This allows us to simply invoke `PDFWorker.create` unconditionally from the `getDocument` function, without having to manually check if a global `workerPort` is available first.
2025-05-17 16:13:41 +02:00
Jonas Jenwald
e5e9d18289 Allow using the workerPort option in Firefox 2025-05-15 19:30:43 +02:00
Jonas Jenwald
6b961c424f Update Webpack to version 5.99.5 (issue 19808)
In Webpack version `5.99.0` the way that `export` statements are handled was changed slightly, with much less boilerplate code being generated, which unfortunately breaks our `tweakWebpackOutput` function that's used to expose the exported properties globally and that e.g. the viewer depends upon.

Given that we were depending on formatting that should most likely be viewed as nothing more than an internal implementation detail in Webpack, we instead work-around this by manually defining the structures that were previously generated.
Obviously this will lead to a tiny bit more manual work in the future, however we don't change the API-surface often enough that it should be a big issue *and* the relevant unit-tests are updated such that it shouldn't be possible to break this.

*NOTE:* In the future we might want to consider no longer using global properties like this, and instead rely only on proper `export`s throughout the code-base.
However changing this would likely be non-trivial (given edge-cases), and it'd be an `api-major` change, so let's just do the minimal amount of work to unblock Webpack updates for now.
2025-04-13 16:48:19 +02:00
Jonas Jenwald
8bcc3664c9 [api-minor] Attempt to improve support for using the PDF.js builds with Vite
Similar to Webpack there's apparently other bundlers that will not leave `import`-calls alone unless magic comments are used.
Hence we extend the builder to also append `/* @vite-ignore */` comments to `import`-calls, in order to attempt to improve support for using the PDF.js builds together with Vite.

This patch also renames `__non_webpack_import__` to `__raw_import__` since the functionality is no longer bundler-specific.

***PLEASE NOTE:*** This patch is provided as-is, and it does *not* mean that the PDF.js project can/will provide official support for Vite.
2025-03-28 16:34:00 +01:00