1459 Commits

Author SHA1 Message Date
Jonas Jenwald
3a372fde94 [api-minor] Replace the CMapReaderFactory, StandardFontDataFactory, and WasmFactory API options with a single factory/option
Currently we have no less than three different, but very similar, factories for reading built-in CMap files, standard font files, and wasm files on the main-thread.[1]
These factories were added at different points in time, since I cannot imagine that we'd add essentially three copies of the same code otherwise.

Nowadays these factories are often not even used[2], since worker-thread fetching is used whenever possible to improve performance. In particular, they will *only* be used when either:
 - The PDF.js library runs in Node.js environments.
 - The user manually sets `useWorkerFetch = false` when calling `getDocument`.
 - The user provides custom `CMapReaderFactory`, `StandardFontDataFactory`, and/or `WasmFactory` instances when calling `getDocument`.

By replacing these factories with *a single* new `BinaryDataFactory` factory/option the number of `getDocument` options are thus reduced, which cannot hurt.
This also reduces the total bundle-size of the Firefox PDF Viewer a little bit, and it slightly reduces the number of import maps that need to be maintained.

*Please note:* For users that provide custom `CMapReaderFactory`, `StandardFontDataFactory`, and `WasmFactory` instances when calling `getDocument` this will be a breaking change, however it's unlikely that (many) such users exist.
(The *internal* format data-format of `CMapReaderFactory` was changed in PR 18951, and there hasn't been a single question/complaint about it in well over a year.)

---

[1] Any new functionality could easily lead to more such factories being added in the future, which wouldn't be great.

[2] Note that the Firefox PDF Viewer no longer use these factories, since it "forcibly" sets `useWorkerFetch = true` during building.
2026-03-22 15:49:06 +01:00
calixteman
ec24053ddf
Don't add an EOL after a superscript 2026-03-22 14:20:18 +01:00
Jonas Jenwald
262aeef3fa [api-minor] Simplify BaseCMapReaderFactory by having the worker-thread create the filename
The `BaseCMapReaderFactory`, `BaseStandardFontDataFactory`, and `BaseWasmFactory` classes are all very similar, and the only difference is really in their respective `fetch` methods.
By have the worker-thread "compute" the complete `filename` it's possible to simplify the `BaseCMapReaderFactory.prototype.fetch` method, which will allow future improvements to all of these classes.

A couple of things to note:
 - This code is unused, and it's not even bundled, in the Firefox PDF Viewer.
 - In browsers it's unused by default, and worker-thread fetching will always be used when possible since that's more efficient.

*Please note:* For users that provide a custom `CMapReaderFactory` instance when calling `getDocument` this could be a breaking change, however it's unlikely that any such users exist.
(The *internal* format of this data was changed previously in PR 18951, and there hasn't been a single question/complaint about it in well over a year.)
2026-03-21 15:54:40 +01:00
Tim van der Meij
ab228da9ce
Merge pull request #20931 from Snuffleupagus/rm-factory-name-validation
Remove explicit `name`/`filename` validation in the `BaseCMapReaderFactory`, `BaseStandardFontDataFactory`, and `BaseWasmFactory` classes
2026-03-20 20:15:23 +01:00
calixteman
16aee06aac
Merge pull request #20925 from calixteman/reorganize_save_annotations
Add the possibility to save added annotations when reorganizing a pdf (bug 2023086)
2026-03-20 16:32:10 +01:00
Jonas Jenwald
5299eb2b83 Remove explicit name/filename validation in the BaseCMapReaderFactory, BaseStandardFontDataFactory, and BaseWasmFactory classes
Given that these classes are only used from the "FetchBinaryData" message handler, the `name`/`filename` parameters should never actually be missing and if they are that's a bug elsewhere in the code-base.
Furthermore a missing `name`/`filename` parameter would result in a "nonsense" URL and the actual data fetching would then fail instead, hence keeping this old validation code just doesn't seem necessary.
2026-03-20 15:50:26 +01:00
Calixte Denizet
04272de41d
Add the possibility to save added annotations when reorganizing a pdf (bug 2023086) 2026-03-20 10:55:47 +01:00
Calixte Denizet
c17801b77e Avoid getting null value in RefSet when cloning 2026-03-20 10:41:58 +01:00
Tim van der Meij
ff1af5a058
Merge pull request #20916 from calixteman/fix_co
When merging pdfs, fix the CO after the fields have been cloned
2026-03-19 21:22:43 +01:00
Tim van der Meij
6245bb201c
Merge pull request #20915 from calixteman/fix_pageindice
Avoid to use a used slot when looking for a new page position
2026-03-19 21:22:32 +01:00
Tim van der Meij
8cae5d17f2
Merge pull request #20917 from calixteman/fix_dup_name_dest
Fix the destination names when they're duplicated
2026-03-19 21:22:19 +01:00
Jonas Jenwald
7609a42209 Use toBeInstanceOf consistently in the unit-tests
There's currently a lot of unit-tests that manually check `instanceof`, let's replace that with the built-in Jasmine matcher function; see https://jasmine.github.io/api/edge/matchers.html#toBeInstanceOf
2026-03-19 17:18:25 +01:00
Calixte Denizet
cf67c1ef1e
Fix the destination names when they're duplicated 2026-03-19 10:52:39 +01:00
Calixte Denizet
b7da4b80a9
When merging pdfs, fix the CO after the fields have been cloned 2026-03-19 10:09:40 +01:00
Calixte Denizet
0bee641fed
Avoid to use a used slot when looking for a new page position 2026-03-19 09:40:16 +01:00
Jonas Jenwald
bdc16f8999
Merge pull request #20868 from Snuffleupagus/exportData-compileFontInfo
Move the `compileFontInfo` call into the `Font.prototype.exportData` method (PR 20197 follow-up)
2026-03-18 11:14:46 +01:00
Calixte Denizet
e67892d035
Add support for saving outlines after reorganize/merge (bug 2009574) 2026-03-17 22:22:13 +01:00
Jonas Jenwald
7d963ddc7c Move the compileFontInfo call into the Font.prototype.exportData method (PR 20197 follow-up)
After the changes in PR 20197 the code in the `TranslatedFont.prototype.send` method is not all that readable[1] given how it handles e.g. the `charProcOperatorList` data used with Type3 fonts.
Since this is the only spot where `Font.prototype.exportData` is used, it seems much simpler to move the `compileFontInfo` call there and *directly* return the intended data rather than messing with it after the fact.

Finally, while it doesn't really matter, the patch flips the order of the `charProcOperatorList` and `extra` properties throughout the code-base since the former is used with Type3 fonts while the latter (effectively) requires that debugging is enabled.

---

[1] I had to re-read it twice, also looking at all the involved methods, in order to convince myself that it's actually correct.
2026-03-16 09:29:17 +01:00
calixteman
3ff52e415f
Merge pull request #20862 from calixteman/bug2023106
Check for having Ref before adding them in a RefSet (bug 2023106)
2026-03-15 22:15:58 +01:00
Calixte Denizet
0fca64f01e
Check for having Ref before adding them in a RefSet (bug 2023106) 2026-03-15 22:03:39 +01:00
Tim van der Meij
315491dd32
Merge pull request #20840 from Snuffleupagus/getDocument-rm-length
[api-minor] Remove the `length` parameter from `getDocument`
2026-03-15 11:48:02 +01:00
Jonas Jenwald
09a9a7bd0b [api-minor] Remove the length parameter from getDocument
This is an old API-parameter that is now unused within the PDF.js project itself, and its description says that it's (partly) being used for "range requests operations".
Note that the `length` API-parameter is used to set the *initial* `contentLength` in various `BasePDFStreamReader` implementations, however it's always overridden by the "Content-Length" header (sent by the server) when that one exists *and* is a valid number. While we currently fallback to the keep the initial `contentLength` otherwise, note however how in that case range requests will always be *disabled* and thus the only spot in the code-base [where `fullReader.contentLength` is necessary](873378b718/src/core/worker.js (L230-L236)) cannot actually be reached.

Hence the only possible reason to use the `length` API-parameter would be for improved progress reporting[1] during streaming of PDF data in rare cases where the "Content-Length" header is missing/invalid, but the user *somehow* has information from another source about the correct `length` of the PDF document.
That situation feels very much like an edge-case, but it's obviously impossible to know if someone is depending on it. However, please note that there's a work-around available for users affected by this removal:
 - Implement a `PDFDataRangeTransport` instance together with custom data-fetching[2], since in that case its `length`-parameter will always be used as-is.

Finally, updates various `BasePDFStreamReader` implementations to only set the `_isRangeSupported` field once the headers are available (since previously we'd just overwrite the "initial" value anyway).

---

[1] I.e. to avoid the "indeterminate" loadingBar being displayed in the viewer.

[2] This is what e.g. the Firefox PDF Viewer uses.
2026-03-13 23:42:45 +01:00
Jonas Jenwald
3842936edf Split the src/shared/obj-bin-transform.js file into separate files for the main/worker threads (PR 20197 follow-up)
On the worker-thread only the static `write` methods are actually used, and on the main-thread only class instances are being created.
Hence this, after PR 20197, leads to a bunch of dead code in both of the *built* `pdf.mjs` and `pdf.worker.js` files.

This patch reduces the size of the `gulp mozcentral` output by `21 419` bytes, i.e. `21` kilo-bytes, which I believe is way too large of a saving to not do this.
(I can't even remember the last time we managed to reduce build-size this much with a single patch.)
2026-03-13 11:21:24 +01:00
calixteman
9d093d9607
Merge pull request #20626 from nicolo-ribaudo/images-right-click
Add support for right-clicking on images (bug 1012805)
2026-03-11 11:45:51 +01:00
Nicolò Ribaudo
886c90d1a5
Add support for right-clicking on images
This patch adds right-click support for images in the PDF, allowing
users to download them. To minimize memory consumption, we:
- Do not store the images separately, and instead crop them out of the
  PDF page canvas
- Only extract the images when needed (i.e. when the user right-clicks
  on them), rather than eagery having all of them available.

To do so, we layer one empty 0x0 canvas per image, stretched to cover
the whole image, and only populate its contents on right click.
These images need to be inside the text layer: they cannot be _behind_
it, otherwise they would be covered by the text layer's container and
not be clickable, and they cannot be in front of it, otherwise they
would make the text spans unselectable.

This feature is managed by a new preference, `imagesRightClickMinSize`:
- when it's set to `-1`, right-click support is disabled
- when set to `0`, all images are available for right click
- when set to a positive integer, only images whose width and height are
  greater than or equal to that value (in the PDF page frame of
  reference) are available for right click.

This features is disabled by default outside of MOZCENTRAL, as it
significantly degrades the text selection experience in non-Firefox
browsers.
2026-03-10 14:51:03 +01:00
Jonas Jenwald
a1b769caea Improve the validateRangeRequestCapabilities unit-tests
A number of these unit-tests didn't actually cover the intended code-paths, since many of them *accidentally* matched the "file size is smaller than two range requests"-check.

The patch also updates `validateRangeRequestCapabilities` to use return-value names that are consistent with the class fields used in the various stream implementations.
2026-03-08 18:28:50 +01:00
calixteman
baf8647b1f
Add the possibility to merge/update acroforms when merging/extracting (bug 2015853) 2026-03-07 19:03:02 +01:00
Jonas Jenwald
229e3642be Change the Dict.prototype.getRawValues method to return an iterator
This method is usually used with loops, and it should be a tiny bit more efficient to use an iterator directly rather than first iterating through ` Map`-values to create a temporary `Array` that we finally iterate through at the call-site.

Note that the `getRawValues` method is old code, and originally the `Dict` class stored its data in a regular `Object`, hence why the old code was written that way.
2026-03-04 16:07:49 +01:00
Jonas Jenwald
58996f21b2 Change the Dict.prototype.getKeys method to return an iterator
This method is usually used with loops, and it should be a tiny bit more efficient to use an iterator directly rather than first iterating through ` Map`-keys to create a temporary `Array` that we finally iterate through at the call-site.

Note that the `getKeys` method is old code, and originally the `Dict` class stored its data in a regular `Object`, hence why the old code was written that way.
2026-03-04 16:07:49 +01:00
calixteman
ed390c06a1
Fix intermittent issue with a unit test
Avoid to rely on timing in the test, which can cause intermittent failures.
Instead, we check that the image is cached at the document/page level.
2026-03-01 22:59:04 +01:00
Tim van der Meij
1861a4c4ad
Merge pull request #20756 from Snuffleupagus/PDFDataRangeTransport-tests
Improve the `PDFDataRangeTransport` unit-tests
2026-03-01 20:10:34 +01:00
Jonas Jenwald
fecb0aab1d Improve the PDFDataRangeTransport unit-tests
- Add a new test using only streaming, since that was missing and the lack of which most likely contributed to previous bugs in the `PDFDataRangeTransport` implementation (see PR 10675 and 20634).

 - Improve the "ranges and streaming" test, to utilize both ranges *and* streaming properly, since the way it was written seemed somewhat unrealistic given how data will normally arrive when `PDFDataRangeTransport` is being used.

 - Provide more `initialData`, in relevant tests, since a length smaller than `rangeChunkSize` seem pretty pointless.

 - Test the `contentDispositionFilename`, and `contentLength`, handling in the `PDFDataRangeTransport` implementation.
2026-02-27 14:55:39 +01:00
Jeff Muizelaar
8fa6ef36e4 Remove scientific notation parsing.
This behaviour comes from the initial pdf.js commit but is wrong and
doesn't match other PDF readers like muPDF or pdfium.

From PDF Spec 7.3.3:

A PDF writer shall not use the PostScript language syntax for numbers with non-decimal radices (such
as 16#FFFE) or in exponential format (such as 6.02E23).
2026-02-26 20:22:34 -05:00
calixteman
bc8efa190c
Merge pull request #20719 from calixteman/update_jasmine
Update Jasmine to version 6.0.0
2026-02-25 09:56:54 +01:00
calixteman
ab7629871a
Update Jasmine to version 6.0.0
It fixes #20715.

`failedExpectations` was removed from `suiteStarted` and `specStarted` events.
HtmlReporter and HtmlSpecFilter have been deprecated and removed.
2026-02-24 23:30:48 +01:00
Jonas Jenwald
0d4e587a5f Reduce allocations when using Map.prototype.getOrInsert() with Arrays
Change all these cases to use `Map.prototype.getOrInsertComputed()` instead, in combination with a helper function for creating the `Array`s (similar to the previous patch).
2026-02-24 09:03:32 +01:00
Jonas Jenwald
2e07715c9d Reduce function creation when using Map.prototype.getOrInsertComputed()
With the exception of the first invocation the callback function is unused, which means that a lot of pointless functions may be created.
To avoid this we introduce helper functions for simple cases, such as creating `Map`s and `Objects`s.
2026-02-24 08:58:28 +01:00
Calixte Denizet
0bb59f15cb
Add some unit tests for functions in image_utils.js 2026-02-20 22:43:42 +01:00
Jonas Jenwald
8ba83e73fa Start using Response.prototype.bytes() in the code-base
In all cases where we currently use `Response.prototype.arrayBuffer()` the result is immediately wrapped in a `Uint8Array`, which can be avoided by instead using the newer `Response.prototype.bytes()` method; see https://developer.mozilla.org/en-US/docs/Web/API/Response/bytes
2026-02-12 11:20:05 +01:00
calixteman
4b4ab10c54
Set a pages mapper per loaded document
It fixes #20629.
2026-02-08 21:09:27 +01:00
calixteman
22b97d1741
Flush the text content chunk only on real font changes (bug 2013793) 2026-02-03 23:11:31 +01:00
Jonas Jenwald
bfd17b2586
Merge pull request #20615 from Snuffleupagus/transport-onProgress
Report loading progress "automatically" when using the `PDFDataTransportStream` class, and remove the `PDFDataRangeTransport.prototype.onDataProgress` method
2026-02-01 22:36:43 +01:00
Jonas Jenwald
d152e92185
Merge pull request #20614 from Snuffleupagus/BasePDFStream-url
Change all relevant `BasePDFStream` implementations to take an actual `URL` instance
2026-02-01 22:13:28 +01:00
Tim van der Meij
3f21efc942
Merge pull request #20607 from Snuffleupagus/rm-web-interfaces
Replace the various interfaces in `web/interfaces.js` with proper classes
2026-02-01 20:31:13 +01:00
Jonas Jenwald
586e85888b Change all relevant BasePDFStream implementations to take an actual URL instance
Currently this code expects a "url string", rather than a proper `URL` instance, which seems completely unnecessary now. The explanation for this is, as so often is the case, "historical reasons" since a lot of this code predates the general availability of `URL`.
2026-02-01 18:21:13 +01:00
Jonas Jenwald
d25f13d1fd Report loading progress "automatically" when using the PDFDataTransportStream class, and remove the PDFDataRangeTransport.prototype.onDataProgress method
This is consistent with the other `BasePDFStream` implementations, and simplifies the API surface of the `PDFDataRangeTransport` class (note the changes in the viewer).
Given that the `onDataProgress` method was changed to a no-op this won't affect third-party users, assuming there even are any since this code was written specifically for the Firefox PDF Viewer.
2026-02-01 18:20:19 +01:00
Jonas Jenwald
023af46186 Replace the IRenderableView interface with an abstract RenderableView class
This should help reduce the maintenance burden of the code, since you no longer need to remember to update separate code when touching the different page/thumbnail classes.
2026-02-01 17:56:06 +01:00
Tim van der Meij
384c6208b2
Merge pull request #20565 from kairosci/fix-bug-20557
fix: Fix mailto links truncated at dash
2026-02-01 17:34:34 +01:00
Jonas Jenwald
ecb09d62fc Add the current loading percentage to the onPassword callback
The percentage calculation is currently "spread out" across various viewer functionality, which we can avoid by having the API handle that instead.

Also, remove the `this.#lastProgress` special-case[1] and just register a "normal" `fullReader.onProgress` callback unconditionally. Once `headersReady` is resolved the callback can simply be removed when not needed, since the "worst" thing that could theoretically happen is that the loadingBar (in the viewer) updates sooner this way. In practice though, since `fullReader.read` cannot return data until `headersReady` is resolved, this change is not actually observable in the API.

---

[1] This was added in PR 8617, close to a decade ago, but it's not obvious to me that it was ever necessary to implement it that way.
2026-01-31 16:33:58 +01:00
Jonas Jenwald
4ca205bac3 Add an abstract BasePDFStreamRangeReader class, that all the old IPDFStreamRangeReader implementations inherit from
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.
2026-01-30 14:15:39 +01:00