309 Commits

Author SHA1 Message Date
Tim van der Meij
4ecbd0cbe2
Merge pull request #20726 from Snuffleupagus/getOrInsertComputed-fewer-functions
Reduce allocations and function creation when using `getOrInsert` and `getOrInsertComputed`
2026-02-24 23:32:36 +01:00
Jonas Jenwald
0d4e587a5f Reduce allocations when using Map.prototype.getOrInsert() with Arrays
Change all these cases to use `Map.prototype.getOrInsertComputed()` instead, in combination with a helper function for creating the `Array`s (similar to the previous patch).
2026-02-24 09:03:32 +01:00
Calixte Denizet
97d973ce09
After cut & paste, the thumbnail must be correctly rendered (bug 2018162) 2026-02-23 18:38:33 +01:00
Jonas Jenwald
210c969c4c Use Map.prototype.getOrInsert() in the #collectFieldObjects method 2026-02-21 11:23:32 +01:00
Tim van der Meij
471adfd023
Merge pull request #20596 from Snuffleupagus/FileSpec-fixes
Simplify the `FileSpec` class, and remove no longer needed polyfills
2026-01-29 22:03:38 +01:00
Jonas Jenwald
5b368dd58a Remove the Uint8Array.prototype.toHex(), Uint8Array.prototype.toBase64(), and Uint8Array.fromBase64() polyfills
(During rebasing of the previous patches I happened to look at the polyfills and noticed that this one could be removed now.)

See:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/toHex#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/toBase64#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/fromBase64#browser_compatibility

Note that technically this functionality can still be disabled via a preference in Firefox, however that's slated for removal in [bug 1985120](https://bugzilla.mozilla.org/show_bug.cgi?id=1985120).
Looking at the Firefox source-code, see https://searchfox.org/firefox-main/search?q=array.tobase64%28%29&path=&case=false&regexp=false, you can see that it's already being used *unconditionally* elsewhere in the browser hence removing the polyfills ought to be fine (since toggling the preference would break other parts of the browser).
2026-01-29 17:27:43 +01:00
Calixte Denizet
806133379e
Refactor a bit page mapping stuff in order to be able to support delete/copy pages 2026-01-26 16:53:52 +01:00
calixteman
9f660be8a2
Use DecompressionStream in async code
Usually, content stream or fonts are compressed using FlateDecode.
So use the DecompressionStream API to decompress those streams
in the async code path.
2026-01-25 14:22:19 +01:00
Calixte Denizet
b5ed988267
Don't use contents stream which have an image format
The original bug has been filled in mupdf bug tracker:
https://bugs.ghostscript.com/show_bug.cgi?id=709033

The attached pdf can be open in Chrome but not in Acrobat.
2026-01-13 18:39:17 +01:00
Calixte Denizet
bc87f4e8d6 Add the possibility to create a pdf from different ones (bug 1997379)
For now it's just possible to create a single pdf in selecting some pages in different pdf sources.
The merge is for now pretty basic (it's why it's still a WIP) none of these data are merged for now:
 - the struct trees
 - the page labels
 - the outlines
 - named destinations
For there are 2 new ref tests where some new pdfs are created: one with some extracted pages and an other
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs in selecting the page we want and the new images are
taken from the generated pdfs.
2025-11-07 14:57:48 +01:00
calixteman
aeceee1df3
Revert "Add some telemetry in order to know what are the certificates used in pdfs (bug 1973573)" 2025-10-29 15:41:34 +01:00
Calixte Denizet
ebc3411727 Use the cached annotations when collecting them by types 2025-08-21 18:04:00 +02:00
Calixte Denizet
9e5ee1e5a7 [Editor] Add the ability to get all the editable annotations in a pdf document
We want to be able to show all the comments in a pdf even if the pages where they are
haven't been rendered.
And it'll help to fix the issue #18915.
2025-08-18 21:31:11 +02:00
Calixte Denizet
8fc51dc089 [Editor] Add the possibility to add a popup to an annotation when saving
When saving/printing, only update the properties which are provided and set
a default value only when there is no pre-existing one.
2025-07-11 21:42:21 +02:00
Calixte Denizet
194e2ede4d Add some telemetry in order to know what are the certificates used in pdfs (bug 1973573) 2025-06-24 22:23:29 +02:00
Calixte Denizet
3bdc5d54fe Get the text under highlight/squiggly/underline/strikethrough annotations (bug 1885505)
and add an invisible element containing the text in the annotation layer to make
it readable by a screen reader.
2025-06-22 21:47:29 +02:00
calixteman
293506ada7
Merge pull request #19903 from Snuffleupagus/shorten-fieldObjects-getter
Shorten the `PDFDocument.prototype.fieldObjects` getter slightly
2025-05-09 15:49:51 +02:00
Jonas Jenwald
1f7581b5c6 Shorten the PDFDocument.prototype.fieldObjects getter slightly
The effect is probably not even measurable, however this patch ever so slightly reduces the asynchronicity in the `fieldObjects` getter. These changes should be safe since:

 - We're inside of the `PDFDocument`-class and the `annotationGlobals`-getter, which will always return a (shadowed) Promise and won't throw `MissingDataException`s, can be accessed directly without going through the `BasePdfManager`-instance.

 - The `acroForm`-dictionary can be accessed through the `annotationGlobals`-data, removing the need to "manually" look it up and thus the need for using `Promise.all` here.

 - We can also lookup the /Fields-data, in the `acroForm`-dictionary, synchronously since the initial `formInfo.hasFields` check guarantees that it's available.
2025-05-07 17:47:09 +02:00
Jonas Jenwald
36fafbc05c Use object destructuring a bit more in the src/core/document.js file 2025-05-07 13:41:50 +02:00
Jonas Jenwald
92b065c87e Replace a number of semi-private fields with actual private ones in src/core/document.js
These are fields that can be moved out of their class constructors, and be initialized directly.
2025-05-07 13:41:44 +02:00
Jonas Jenwald
39803a9f25 Replace a number of semi-private methods with actual private ones in src/core/document.js
There's a few remaining cases that are used with either cached getters or `BasePdfManager.prototype.ensure`-methods, and those cannot be converted.
2025-05-07 13:41:36 +02:00
Jonas Jenwald
0ded85e9b3 Add a Page helper method to create a PartialEvaluator-instance
Currently we repeat the same identical code five times in the `Page`-class when creating a `PartialEvaluator`-instance, which given the number of parameters it needs seems like unnecessary duplication.
2025-05-07 13:41:29 +02:00
Jonas Jenwald
62009ffa70 Simplify how the ObjectLoader is used
The `ObjectLoader.prototype.load` method has a fast-path, which avoids any lookup/parsing if the entire PDF document is already loaded.
However, we still need to create an `ObjectLoader`-instance which seems unnecessary in that case.

Hence we introduce a *static* `ObjectLoader.load` method, which will help avoid creating `ObjectLoader`-instances needlessly and also (slightly) shortens the call-sites.
To ensure that the new method will be used, we extend the `no-restricted-syntax` ESLint rule to "forbid" direct usage of `new ObjectLoader()`.
2025-05-06 15:49:59 +02:00
Jonas Jenwald
d9548b1c18 Slightly re-factor how we pre-load fonts and images in XFA documents
Rather than "manually" invoking the methods from the `src/core/worker.js` file we introduce a single `PDFDocument`-method that handles this for us, and make the current methods private.
Since this code is only invoked at most *once* per document, and only for XFA documents, we can use `BasePdfManager.prototype.ensureDoc` directly rather than needing a stand-alone method.
2025-05-04 13:44:33 +02:00
Jonas Jenwald
604153957a Reduce duplication when parsing fonts in loadXfaFonts
Currently we repeat virtually the same code when calling the `PartialEvaluator.prototype.handleSetFont` method, which we can avoid by introducing an inline helper function.
2025-05-04 13:42:17 +02:00
Tim van der Meij
5ca57fbd4b
Merge pull request #19885 from Snuffleupagus/loadXfaImages-simplify
Simplify the `loadXfaImages` method and related code
2025-05-04 13:41:06 +02:00
Jonas Jenwald
b531720d9c Simplify the serializeXfaData method and related code
Rather than having a dedicated `BasePdfManager`-method for this one call-site we can instead change `PDFDocument.prototype.serializeXfaData` to a non-async method, that we invoke via `BasePdfManager.prototype.ensureDoc`.
2025-05-03 11:20:42 +02:00
Jonas Jenwald
122822a750 Simplify the loadXfaImages method and related code
Currently we create an intermediate `Dict` during parsing, however that seems unnecessary since (note especially the second point):
 - The `NameOrNumberTree.prototype.getAll` method will already resolve any references, as needed, during parsing.
 - The `Catalog.prototype.xfaImages` getter is invoked, via the `BasePdfManager`-instance, such that any `MissingDataException`s are already handled correctly.
2025-05-02 11:53:41 +02:00
Jonas Jenwald
312c85bfd6
Merge pull request #19815 from Snuffleupagus/getMergedResources-size
Ensure that "local" /Contents stream-dict /Resources aren't empty (PR 19803 follow-up)
2025-04-25 10:46:04 +02:00
Jonas Jenwald
76f23ce3b5 Catch, and ignore, errors during Page.prototype.getStructTree
This way any errors thrown during parsing of the page-structTree will not be forwarded to the viewer.
2025-04-17 13:57:30 +02:00
Jonas Jenwald
245d9ba925 Ensure that "local" /Contents stream-dict /Resources aren't empty (PR 19803 follow-up)
This is a small, and quite possibly pointless, optimization which ensures that any "local" /Resources aren't empty, to avoid needlessly trying to load and merge dictionaries.
2025-04-14 09:58:15 +02:00
Jonas Jenwald
834423b51d Add more logical assignment in the src/ folder
This patch uses nullish coalescing assignment in cases where it's immediately obvious from surrounding code that doing so is safe, and logical OR assignment elsewhere (mostly the changes in XFA code).
2025-04-12 17:28:33 +02:00
Jonas Jenwald
1c80412f61 Change PDFDocument.prototype._xfaStreams to return a Map
Using a `Map` rather than an `Object` is a nicer, since it has better support for both iteration and checking if a key exists.
We also change the initial values to be `null`, rather than empty strings, and reduce duplication when creating the `Map`.

*Please note:* Since this is worker-thread code, these changes are "invisible" at the API-level.
2025-04-12 12:47:22 +02:00
Jonas Jenwald
7a94fafd30 Prefer /Resources from the /Contents stream-dict, if available
In rare cases /Resources are also found in the /Contents stream-dict, in addition to in the /Page dict, hence we need to prefer those when available; see `issue18894.pdf`.
2025-04-11 16:54:22 +02:00
Jonas Jenwald
d00482380a Introduce more async code in the src/core/document.js file 2025-03-17 13:20:51 +01:00
Jonas Jenwald
3e8d01ad7c Move the calculateMD5 function into its own file
This allows us to remove a closure, and we also change the code to initialize various constants lazily.
2025-03-08 15:56:05 +01:00
Jonas Jenwald
7b5cd9cddd Use arrow functions with some Promise.then calls
A lot of this is fairly old code, which we can shorten slightly by using arrow functions instead of "regular" functions.
2025-03-02 19:57:38 +01:00
Jonas Jenwald
4be79748c9 Add a GlobalColorSpaceCache to reduce unnecessary re-parsing
This complements the existing `LocalColorSpaceCache`, which is unique to each `getOperatorList`-invocation since it also caches by `Name`, which should help reduce unnecessary re-parsing especially for e.g. `ICCBased` ColorSpaces once we properly support those.
2025-03-01 14:21:05 +01:00
Jonas Jenwald
d428db63c3 Improve the "FontFallback" handling on the worker-thread
Remove the `Catalog.prototype.fontFallback` method, and move its code into `PDFDocument.prototype.fontFallback` instead, to reduce the indirection a little bit.
Pass the `evaluatorOptions` directly to the `TranslatedFont.prototype.fallback` method, since nothing else in the `TranslatedFont`-class needs it now.
2025-02-24 09:34:58 +01:00
Jonas Jenwald
36979e9eb2 Fix all outstanding ESLint arrow-body-style warnings
Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.
2025-02-17 15:45:44 +01:00
Tim van der Meij
4d4e1befeb
Merge pull request #19289 from Snuffleupagus/issue-19281
Skip LinkAnnotations when collecting field objects (issue 19281)
2025-01-04 13:32:18 +01:00
Jonas Jenwald
6f062abb76 Skip LinkAnnotations when collecting field objects (issue 19281)
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
2025-01-04 11:54:45 +01:00
Jonas Jenwald
74c1795c9f Use Dict iteration more (PR 19051 follow-up)
There's a few cases where we're looping through the result of `Dict.prototype.getKeys` and then manually look-up the values, which after PR 19051 can be replaced with direct iteration instead.
2025-01-02 15:09:19 +01:00
Jonas Jenwald
2c0cc48d1b Replace the forEach method in Dict with "proper" iteration support 2024-11-17 12:45:32 +01:00
Calixte Denizet
4bf7787084 Simplify saving added/modified annotations.
Having this map to collect the different changes will allow to know if some objects have already been modified.
2024-11-12 10:59:38 +01:00
Jonas Jenwald
0b864ee7d5 Shorten the Page.prototype.userUnit getter slightly 2024-11-10 16:30:07 +01:00
Jonas Jenwald
b26dc19392 Ensure that serializing of StructTree-data cannot fail during loading
I discovered that doing skip-cache re-reloading of https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf would *intermittently* cause (some of) the AnnotationLayers to break with errors printed in the console (see below).

In hindsight this bug is really obvious, however it took me quite some time to find it, since the `StructTreePage.prototype.serializable` getter will lookup various data and all of those cases can fail during loading when streaming and/or range requests are being used.

Finally, to prevent any future errors, ensure that the viewer won't break in these sort of situations.

```
Uncaught (in promise)
Object { message: "Missing data [19098296, 19098297)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19098296, 19098297)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55

\#renderAnnotationLayer: "UnknownErrorException: Missing data [17552729, 17552730)". viewer.mjs:8737:15

Uncaught (in promise)
Object { message: "Missing data [17552729, 17552730)", name: "UnknownErrorException", details: "MissingDataException: Missing data [17552729, 17552730)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55
```
2024-11-01 17:43:59 +01:00
Jonas Jenwald
8f47d06d07 Add helper functions to allow using new Uint8Array methods
This allows using the new methods in browsers that support them, e.g. Firefox 133+, while still providing fallbacks where necessary; see https://github.com/tc39/proposal-arraybuffer-base64

*Please note:* These are not actual polyfills, but only implements what we need in the PDF.js code-base. Eventually this patch should be reverted, once support is generally available.
2024-10-29 10:22:35 +01:00
Jonas Jenwald
f9fc477080 Improve the implementation of the PDFDocument.fingerprints-getter
- Add explicit `length` validation of the /ID entries. Given the `EMPTY_FINGERPRINT` constant we're already *implicitly* assuming a particular length.

 - Move the constants into the `fingerprints`-getter, since they're not used anywhere else.

 - Replace the `hexString` helper function with the standard `Uint8Array.prototype.toHex` method; see https://github.com/tc39/proposal-arraybuffer-base64
2024-10-29 10:22:35 +01:00
Jonas Jenwald
662bd022ce Reduce duplication in the PDFDocument.calculationOrderIds getter 2024-10-08 12:24:09 +02:00