3600 Commits

Author SHA1 Message Date
Tim van der Meij
a5a27a5ca7
Merge pull request #20705 from Snuffleupagus/#collectParents-getOrInsert
Use `Map.prototype.getOrInsert()` in the `#collectParents` method
2026-02-22 12:55:41 +01:00
Tim van der Meij
8189ca358c
Merge pull request #20703 from Snuffleupagus/#collectFieldObjects-getOrInsert
Use `Map.prototype.getOrInsert()` in the `#collectFieldObjects` method
2026-02-22 12:39:54 +01:00
Jonas Jenwald
3e7ad8d6bf Use Map.prototype.getOrInsert() in the #collectParents method 2026-02-21 11:42:42 +01:00
Jonas Jenwald
210c969c4c Use Map.prototype.getOrInsert() in the #collectFieldObjects method 2026-02-21 11:23:32 +01:00
Jonas Jenwald
76a5aed05f Use Map.prototype.getOrInsert() in the getNewAnnotationsMap helper 2026-02-21 11:03:00 +01:00
Tim van der Meij
82de22428a
Merge pull request #20660 from Snuffleupagus/ChunkedStream-async-sendRequest
Convert `ChunkedStreamManager.prototype.sendRequest` to an asynchronous method
2026-02-20 21:39:26 +01:00
calixteman
a5c62b7489
Merge pull request #20691 from Snuffleupagus/rm-unnecessary-Map-entries
Remove unnecessary `Map.prototype.entries()` usage
2026-02-20 17:44:19 +01:00
Jonas Jenwald
374f524c29 Remove unnecessary Map.prototype.entries() usage
A `Map` instance can be iterated directly with a `for...of` loop, hence using its `entries` method is not actually necessary.
2026-02-20 13:44:00 +01:00
Jonas Jenwald
7fd939763e Remove unnecessary class constructors in the src folder
There's a number of classes where the constructors can be removed completely by instead using class fields, which help to slightly shorten the code.

It seems that `unicorn/prefer-class-fields` ESLint plugin, see PR 20657, unfortunately isn't able to detect all of these cases.
2026-02-19 00:08:57 +01:00
Jonas Jenwald
e1cc24c595 Set the annotationType automatically in the Annotation constructor
Rather than assigning it manually in every extending class, we can utilize the fact that the `AnnotationType`-entries are simply the upper-case version of the `/Subtype` (when it exists) in the Annotation dictionary.
2026-02-18 14:47:42 +01:00
Jonas Jenwald
62ac1b844a
Merge pull request #20669 from Snuffleupagus/decode-truncate
Truncate too long /Decode map entries (issue 20668)
2026-02-16 20:39:12 +01:00
Jonas Jenwald
31b4612ac0 Truncate too long /Decode map entries (issue 20668) 2026-02-16 16:22:00 +01:00
Jonas Jenwald
0a9176422e Remove Object.hasOwn usage from the src/core/xref.js file
This should not be necessary, given the following checks done early during the worker initialization: c5746949ac/src/core/worker.js (L124-L141)
2026-02-15 16:39:39 +01:00
Jonas Jenwald
59fbad617b Convert ChunkedStreamManager.prototype.sendRequest to an asynchronous method
This is not only shorter, but (in my opinion) it also simplifies the code.

*Note:* In order to keep the *five* different `BasePDFStreamRangeReader` implementations consistent, we purposely don't re-factor the `PDFWorkerStreamRangeReader` class to support `for await...of` iteration.
2026-02-14 15:49:31 +01:00
Tim van der Meij
1d6307f5d4
Merge pull request #20657 from Snuffleupagus/unicorn-prefer-class-fields
Enable the `unicorn/prefer-class-fields` ESLint plugin rule
2026-02-14 15:26:28 +01:00
Tim van der Meij
f22fb6bbfb
Merge pull request #20652 from Snuffleupagus/ChunkedStream-sendRequest-skip-empty
Avoid parsing skipped range requests in `ChunkedStreamManager` (PR 10694 follow-up)
2026-02-14 13:55:31 +01:00
Jonas Jenwald
170599f1e7 Enable the unicorn/prefer-class-fields ESLint plugin rule
This leads to slightly shorter code[1] when initializing classes, and in some cases we can even remove the constructors, which shouldn't hurt; see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-class-fields.md

It's probably possible to also change a lot of these class fields to private ones[2], however it's often difficult to tell at a glance if that's safe hence this patch only does this for the `PDFRenderingQueue`.

---

[1] This reduces the size of the `gulp mozcentral` output by 999 bytes, for a mostly mechanical code change.

[2] That sort of re-factoring should generally be done separately, on a class-by-class basis, to reduce the risk of regressions.
2026-02-14 12:33:34 +01:00
Jonas Jenwald
520928719c Move and re-use the stripPath helper function more
There's a couple of spots that essentially re-implement that function.
2026-02-13 17:38:21 +01:00
Jonas Jenwald
b3c07f4b3d Avoid parsing skipped range requests in ChunkedStreamManager (PR 10694 follow-up)
While we don't dispatch the actual range request after PR 10694 we still parse the returned data, which ends up being an *empty* `ArrayBuffer` and thus cannot affect the `ChunkedStream.prototype._loadedChunks` property.
Given that no actual data arrived, it's thus pointless[1] to invoke the `ChunkedStreamManager.prototype.onReceiveData` method in this case (and it also avoids sending effectively duplicate "DocProgress" messages).

---
[1] With the *possible* exception of `disableAutoFetch === false` being set, see f24768d7b4/src/core/chunked_stream.js (L499-L517) however that never happens when streaming is being used; note f24768d7b4/src/core/worker.js (L237-L238)
2026-02-12 18:01:54 +01:00
Jonas Jenwald
8ba83e73fa Start using Response.prototype.bytes() in the code-base
In all cases where we currently use `Response.prototype.arrayBuffer()` the result is immediately wrapped in a `Uint8Array`, which can be avoided by instead using the newer `Response.prototype.bytes()` method; see https://developer.mozilla.org/en-US/docs/Web/API/Response/bytes
2026-02-12 11:20:05 +01:00
Jonas Jenwald
6a3d5fea6c Replace a few cases of "manual" font name normalization with the normalizeFontName helper function 2026-02-08 16:56:50 +01:00
Jonas Jenwald
e9c509aca9 Normalize the font name in getBaseFontMetrics (issue 20246)
We tried to lookup the font metrics using the font name as-is, which didn't work since the PDF file in question has non-embedded fonts with names that include commas.
Hence the font names need to be normalized here as well, similar to elsewhere in the font code.
2026-02-08 16:56:15 +01:00
calixteman
58ac273f1f
Merge pull request #20503 from andriivitiv/Fix-Worker-was-terminated-error
Fix `Worker was terminated` error when loading is cancelled
2026-02-06 09:59:05 +01:00
calixteman
b92bdf80a2
Merge pull request #20628 from calixteman/bug2014399
Cap the max canvas dimensions in order to avoid to downscale large images in the worker (bug 2014399)
2026-02-06 09:42:04 +01:00
Tim van der Meij
f302323c7e
Merge pull request #20627 from Snuffleupagus/ChunkedStream-onReceiveData-rm-copy
Improve progress reporting in `ChunkedStreamManager`, and prevent unnecessary data copy in `ChunkedStream.prototype.onReceiveData`
2026-02-05 21:27:42 +01:00
Calixte Denizet
ff42c0bd50
Cap the max canvas dimensions in order to avoid to downscale large images in the worker (bug 2014399) 2026-02-05 20:25:36 +01:00
Jonas Jenwald
b3cd042ded Prevent unnecessary data copy in ChunkedStream.prototype.onReceiveData
This method is only invoked via `ChunkedStreamManager.prototype.sendRequest`, which currently returns data in `Uint8Array` format (since it potentially combines multiple `ArrayBuffer`s).
Hence we end up doing a short-lived, but still completely unnecessary, data copy[1] in `ChunkedStream.prototype.onReceiveData` when handling range requests. In practice this is unlikely to be a big problem by default, given that streaming is used and the (low) value of the `rangeChunkSize` API-option. (However, in custom PDF.js deployments it might affect things more.)

Given that no data copy is better than a short lived one, let's fix this small oversight and add non-production `assert`s to keep it working as intended.
This way we also improve consistency, since all other streaming and range request methods (see e.g. `BasePDFStream` and related code) only return `ArrayBuffer` data.

---
[1] Remember that `new Uint8Array(arrayBuffer)` only creates a view of the underlying `arrayBuffer`, whereas `new Uint8Array(typedArray)` actually creates a copy of the `typedArray`.
2026-02-05 16:16:36 +01:00
Jonas Jenwald
01deb085f8 Improve progress reporting in the ChunkedStreamManager
Currently there's two small bugs, which have existed around a decade, in the `loaded` property that's sent via the "DocProgress" message from the `ChunkedStreamManager.prototype.onReceiveData` method.

 - When the entire PDF has loaded the `loaded` property can become larger than the `total` property, which obviously doesn't make sense.
   This happens whenever the size of the PDF is *not* a multiple of the `rangeChunkSize` API-option, which is a very common situation.

 - When streaming is being used, the `loaded` property can become smaller than the actually loaded amount of data.
   This happens whenever the size of a streamed chunk is *not* a multiple of the `rangeChunkSize` API-option, which is a common situation.
2026-02-05 16:04:45 +01:00
calixteman
22b97d1741
Flush the text content chunk only on real font changes (bug 2013793) 2026-02-03 23:11:31 +01:00
calixteman
1c12b07726
Merge pull request #20613 from calixteman/ccittfax_pdfium
Use the ccittfax decoder from pdfium
2026-02-02 15:07:53 +01:00
calixteman
88c2051698
Use the ccittfax decoder from pdfium
The decoder is a dependency of the jbig2 one and is already
included in pdf.js, so we just need to wire it up.
It improves the performance of documents using ccittfax images.
2026-02-02 11:10:32 +01:00
Tim van der Meij
f4326e17c4
Merge pull request #20610 from calixteman/brotli
Add support for Brotli decompression
2026-02-01 20:41:06 +01:00
calixteman
43273fde27
Add support for Brotli decompression
For now, `BrotliDecode` hasn't been specified but it should be in a
close future.
So when it's possible we use the native `DecompressionStream` API
with "brotli" as argument.
If that fails or if we've to decompress in a sync context, we fallback
to `BrotliStream` which a pure js implementation (see README in external/brotli).
2026-01-31 16:25:53 +01:00
Jonas Jenwald
4ca205bac3 Add an abstract BasePDFStreamRangeReader class, that all the old IPDFStreamRangeReader implementations inherit from
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.
2026-01-30 14:15:39 +01:00
Jonas Jenwald
54d8c5e7b4 Add an abstract BasePDFStreamReader class, that all the old IPDFStreamReader implementations inherit from
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.

Also, remove the `rangeChunkSize` not defined checks in all the relevant stream-constructor implementations.
Note how the API, since some time, always validates *and* provides that parameter when creating a `BasePDFStreamReader`-instance.
2026-01-30 14:15:39 +01:00
Jonas Jenwald
4a8fb4dde1 Add an abstract BasePDFStream class, that all the old IPDFStream implementations inherit from
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.

Also, spotted during rebasing, pass the `enableHWA` option "correctly" (i.e. as part of the existing `transportParams`) to the `WorkerTransport`-class to keep the constructor simpler.
2026-01-30 14:15:39 +01:00
Jonas Jenwald
a80f10ff1a Remove the onProgress callback from the IPDFStreamRangeReader interface
Note how there's *nowhere* in the code-base where the `IPDFStreamRangeReader.prototype.onProgress` callback is actually being set and used, however the loadingBar (in the viewer) still works just fine since loading progress is already reported via:
 - The `ChunkedStreamManager` instance respectively the `getPdfManager` function, through the use of a "DocProgress" message, on the worker-thread.
 - A `IPDFStreamReader.prototype.onProgress` callback, on the main-thread.

Furthermore, it would definitely *not* be a good idea to add any `IPDFStreamRangeReader.prototype.onProgress` callbacks since they only include the `loaded`-property which would trigger the "indeterminate" loadingBar (in the viewer).

Looking briefly at the history of this code it's not clear, at least to me, when this became unused however it's probably close to a decade ago.
2026-01-30 14:15:39 +01:00
Jonas Jenwald
05b78ce03c Stop registering an onProgress callback on the PDFWorkerStreamRangeReader-instance, in the ChunkedStreamManager class
Given that nothing in the `PDFWorkerStreamRangeReader` class attempts to invoke the `onProgress` callback, this is effectively dead code now.
Looking briefly at the history of this code it's not clear, at least to me, when this became unused however it's probably close to a decade ago.

Finally, note also how progress is already being reported through the `ChunkedStreamManager.prototype.onReceiveData` method.
2026-01-30 14:15:38 +01:00
Jonas Jenwald
987265720e Remove the unused IPDFStreamRangeReader.prototype.isStreamingSupported getter
This getter was only invoked from `src/display/network.js` and `src/core/chunked_stream.js`, however in both cases it's hardcoded to `false` and thus isn't actually needed.
This originated in PR 6879, close to a decade ago, for a potential TODO which was never implemented and it ought to be OK to just simplify this now.
2026-01-30 14:15:38 +01:00
Jonas Jenwald
62d5408cf0 Stop tracking progressiveDataLength in the ChunkedStreamManager class
Currently this property is essentially "duplicated", so let's instead use the identical one that's availble on the `ChunkedStream` instance.
2026-01-30 14:15:38 +01:00
Tim van der Meij
471adfd023
Merge pull request #20596 from Snuffleupagus/FileSpec-fixes
Simplify the `FileSpec` class, and remove no longer needed polyfills
2026-01-29 22:03:38 +01:00
Jonas Jenwald
5b368dd58a Remove the Uint8Array.prototype.toHex(), Uint8Array.prototype.toBase64(), and Uint8Array.fromBase64() polyfills
(During rebasing of the previous patches I happened to look at the polyfills and noticed that this one could be removed now.)

See:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/toHex#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/toBase64#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/fromBase64#browser_compatibility

Note that technically this functionality can still be disabled via a preference in Firefox, however that's slated for removal in [bug 1985120](https://bugzilla.mozilla.org/show_bug.cgi?id=1985120).
Looking at the Firefox source-code, see https://searchfox.org/firefox-main/search?q=array.tobase64%28%29&path=&case=false&regexp=false, you can see that it's already being used *unconditionally* elsewhere in the browser hence removing the polyfills ought to be fine (since toggling the preference would break other parts of the browser).
2026-01-29 17:27:43 +01:00
Jonas Jenwald
b3f35b6007 Return the rawFilename as-is even if it's empty, from FileSpec.prototype.serializable
It's more correct to return the `rawFilename` as-is, and limit the fallback for empty filenames to only the `filename` property.
2026-01-29 17:25:44 +01:00
Calixte Denizet
806133379e
Refactor a bit page mapping stuff in order to be able to support delete/copy pages 2026-01-26 16:53:52 +01:00
calixteman
9f660be8a2
Use DecompressionStream in async code
Usually, content stream or fonts are compressed using FlateDecode.
So use the DecompressionStream API to decompress those streams
in the async code path.
2026-01-25 14:22:19 +01:00
Jonas Jenwald
640a3106d5 Remove caching/shadowing from the FileSpec getters, and simplify the code
Given that only the `FileSpec.prototype.serializable` getter is ever invoked from "outside" of the class, and only once per `FileSpec`-instance, the caching/shadowing isn't actually necessary.

Furthermore the `_contentRef`-caching wasn't actually correct, since it ended up storing a `BaseStream`-instance and those should *generally* never be cached.
(Since calling `BaseStream.prototype.getBytes()` more than once, without resetting the stream in between, will return an empty TypedArray after the first time.)
2026-01-25 13:16:29 +01:00
Jonas Jenwald
84b5866853 Reduce duplication in the pickPlatformItem helper function
Also, tweak code/comment used when handling "GoToR" destinations.
2026-01-25 13:16:06 +01:00
calixteman
ce296d8d42
Add the possibility to order the pages in an extracted pdf (bug 1997379)
or in a merged one.
2026-01-19 18:58:23 +01:00
Calixte Denizet
b5ed988267
Don't use contents stream which have an image format
The original bug has been filled in mupdf bug tracker:
https://bugs.ghostscript.com/show_bug.cgi?id=709033

The attached pdf can be open in Chrome but not in Acrobat.
2026-01-13 18:39:17 +01:00
calixteman
eab33828a9
Fix wasm url issue for the jbig2 decoder
and add a test for jbig2 decoding with the js decoder.
2026-01-04 00:08:59 +01:00