300 Commits

Author SHA1 Message Date
Calixte Denizet
04272de41d
Add the possibility to save added annotations when reorganizing a pdf (bug 2023086) 2026-03-20 10:55:47 +01:00
Jonas Jenwald
09a9a7bd0b [api-minor] Remove the length parameter from getDocument
This is an old API-parameter that is now unused within the PDF.js project itself, and its description says that it's (partly) being used for "range requests operations".
Note that the `length` API-parameter is used to set the *initial* `contentLength` in various `BasePDFStreamReader` implementations, however it's always overridden by the "Content-Length" header (sent by the server) when that one exists *and* is a valid number. While we currently fallback to the keep the initial `contentLength` otherwise, note however how in that case range requests will always be *disabled* and thus the only spot in the code-base [where `fullReader.contentLength` is necessary](873378b718/src/core/worker.js (L230-L236)) cannot actually be reached.

Hence the only possible reason to use the `length` API-parameter would be for improved progress reporting[1] during streaming of PDF data in rare cases where the "Content-Length" header is missing/invalid, but the user *somehow* has information from another source about the correct `length` of the PDF document.
That situation feels very much like an edge-case, but it's obviously impossible to know if someone is depending on it. However, please note that there's a work-around available for users affected by this removal:
 - Implement a `PDFDataRangeTransport` instance together with custom data-fetching[2], since in that case its `length`-parameter will always be used as-is.

Finally, updates various `BasePDFStreamReader` implementations to only set the `_isRangeSupported` field once the headers are available (since previously we'd just overwrite the "initial" value anyway).

---

[1] I.e. to avoid the "indeterminate" loadingBar being displayed in the viewer.

[2] This is what e.g. the Firefox PDF Viewer uses.
2026-03-13 23:42:45 +01:00
Jonas Jenwald
60d6abdf4f A couple of small improvements of the new internal viewer
- Mention the internal viewer in the README, such that it's easier to find.

 - Implement a new `INTERNAL_VIEWER` define, such that it's easier to limit code to only the "internal-viewer" gulp target.

 - Only include the "GetRawData" message-handler when needed. Note that the `MessageHandler` [already throws](eb159abd6a/src/shared/message_handler.js (L121-L123)) for any missing handler.

 - Move the various new helper functions from `src/core/document.js` and into their own file. The reasons for doing this are:
    - That file is already quite large and complex as-is, and these helper functions are slightly orthogonal to its main functionality.
    - Babel isn't able to remove all of the new code, and by moving this into a separate file we can guarantee that no extra code ends up in e.g. Firefox.
2026-03-10 23:41:35 +01:00
calixteman
9d81fafa8c
Add a new internal viewer to explore the structure of PDF files.
The one from pdf.js.utils is a bit too old: a lot of bugs have been fixed
in the code that parses PDF files since then.
It's just an internal development tool, so it doesn't need to be perfect,
but it should be good enough to be useful.
2026-03-09 14:16:12 +01:00
Jonas Jenwald
ddd69ce4e0 Remove the "DocProgress" loaded fallback from the getPdfManager function
Falling back to use the `loaded` byteLength if the server `contentLength` is unknown doesn't make a lot of sense, since it'd lead to the `onProgress` callback reporting `percent === 100` repeatedly while the document is loading despite that being obviously wrong.
Instead we'll now report `percent === NaN` in that case, thus showing the indeterminate progressBar, which seems more correct if the `contentLength` is unknown.

Please note that this code-path is normally not even reached, since streaming is enabled by default (applies e.g. to the Firefox PDF Viewer).
2026-03-08 10:22:01 +01:00
Jonas Jenwald
7f4e29ed22 Change the "Terminate" worker-thread handler to an asynchronous function
This is a tiny bit shorter, which cannot hurt.
2026-03-06 11:24:12 +01:00
Jonas Jenwald
e8ab3cb335 Convert the data reading in getPdfManager to be asynchronous
This is not only shorter, but (in my opinion) it also simplifies the code.

*Note:* In order to keep the *five* different `BasePDFStreamReader` implementations consistent, we purposely don't re-factor the `PDFWorkerStreamReader` class to support `for await...of` iteration.
2026-03-05 22:50:26 +01:00
calixteman
58ac273f1f
Merge pull request #20503 from andriivitiv/Fix-Worker-was-terminated-error
Fix `Worker was terminated` error when loading is cancelled
2026-02-06 09:59:05 +01:00
Jonas Jenwald
4a8fb4dde1 Add an abstract BasePDFStream class, that all the old IPDFStream implementations inherit from
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.

Also, spotted during rebasing, pass the `enableHWA` option "correctly" (i.e. as part of the existing `transportParams`) to the `WorkerTransport`-class to keep the constructor simpler.
2026-01-30 14:15:39 +01:00
Calixte Denizet
806133379e
Refactor a bit page mapping stuff in order to be able to support delete/copy pages 2026-01-26 16:53:52 +01:00
Andrii Vitiv
9677798ba0
Fix Worker was terminated error when loading is cancelled
Fixes https://github.com/mozilla/pdf.js/issues/11595, where cancelling loading with `loadingTask.destroy()` before it finishes throws a `Worker was terminated` error that CANNOT be caught.

When worker is terminated, an error is thrown here:

6c746260a9/src/core/worker.js (L374)

Then `onFailure` runs, in which we throw again via `ensureNotTerminated()`. However, this second error is never caught (and cannot be), resulting in console spam.

There is no need to throw any additional errors since the termination is already reported [here](6c746260a9/src/core/worker.js (L371-L373)), and `onFailure` is supposed to handle errors, not throw them.
2025-12-14 18:15:10 +02:00
Calixte Denizet
50c48cf11b Add telemetry for tagged pdfs (bug 1997134) 2025-11-17 19:47:16 +01:00
Calixte Denizet
bc87f4e8d6 Add the possibility to create a pdf from different ones (bug 1997379)
For now it's just possible to create a single pdf in selecting some pages in different pdf sources.
The merge is for now pretty basic (it's why it's still a WIP) none of these data are merged for now:
 - the struct trees
 - the page labels
 - the outlines
 - named destinations
For there are 2 new ref tests where some new pdfs are created: one with some extracted pages and an other
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs in selecting the page we want and the new images are
taken from the generated pdfs.
2025-11-07 14:57:48 +01:00
Calixte Denizet
19ff148163 Fix incremental saving with hybrid references
This patch removes some previous fixes which are now likely fixed by #17636.

Fixes #20302.
2025-10-04 18:31:55 +02:00
Calixte Denizet
9e5ee1e5a7 [Editor] Add the ability to get all the editable annotations in a pdf document
We want to be able to show all the comments in a pdf even if the pages where they are
haven't been rendered.
And it'll help to fix the issue #18915.
2025-08-18 21:31:11 +02:00
Jonas Jenwald
d9548b1c18 Slightly re-factor how we pre-load fonts and images in XFA documents
Rather than "manually" invoking the methods from the `src/core/worker.js` file we introduce a single `PDFDocument`-method that handles this for us, and make the current methods private.
Since this code is only invoked at most *once* per document, and only for XFA documents, we can use `BasePdfManager.prototype.ensureDoc` directly rather than needing a stand-alone method.
2025-05-04 13:44:33 +02:00
Jonas Jenwald
b531720d9c Simplify the serializeXfaData method and related code
Rather than having a dedicated `BasePdfManager`-method for this one call-site we can instead change `PDFDocument.prototype.serializeXfaData` to a non-async method, that we invoke via `BasePdfManager.prototype.ensureDoc`.
2025-05-03 11:20:42 +02:00
Jonas Jenwald
91ba147317 Check that the Object.prototype hasn't been incorrectly extended (PR 11582 follow-up)
This complements, and extends, the existing check of the `Array.prototype` in the worker-thread.
To simplify the implementation we'll now abort immediately, rather than collecting all "bad" properties.
2025-04-18 12:19:29 +02:00
Jonas Jenwald
dad6febc39 Pass the /Info-strings as a Map to the src/core/writer.js code
We want to iterate through the data in the `computeMD5` function, and `Map`s have "nicer" support for that than generic objects.
(Somewhat recently `Map` performance was improved in Firefox, however this also isn't really performance sensitive code.)
2025-04-04 13:36:13 +02:00
Jonas Jenwald
3c93d63731 Pass the XRef-instance explicitly to the StructTreeRoot class
This avoids the current situation where we're accessing it through various dictionaries, since that's a somewhat brittle solution given that in the general case a `Dict`-instance may not have the `xref`-field set (see e.g. the empty-Dict).
2025-03-25 18:04:51 +01:00
Jonas Jenwald
7b5cd9cddd Use arrow functions with some Promise.then calls
A lot of this is fairly old code, which we can shorten slightly by using arrow functions instead of "regular" functions.
2025-03-02 19:57:38 +01:00
Jonas Jenwald
6bde49a606 Reduce duplication when handling "DocException" and "PasswordRequest" messages
Rather than having to manually implement the exception-handling for the "DocException" message, we can instead re-use (and slightly extend) the existing `wrapReason` function since that one already does what we need.

Furthermore, we can also simplify handling of the "PasswordRequest" message a little bit and again re-use the `wrapReason` function.

Finally, the patch makes the following smaller changes:
 - Avoid needlessly re-creating exceptions in the `wrapReason` function.
 - Use a slightly shorter parameter name in the `wrapReason` function.
 - Remove the unused entries in the `CallbackKind`/`StreamKind` enumerations.
2024-12-26 12:55:49 +01:00
Jonas Jenwald
ec1a05c104 Add missing startWorkerTask calls in the "SaveDocument" handler
Without these calls we'll not actually wait for saving to complete when document destruction runs; compare with other `WorkerTask`-usage in this file.
While I cannot imagine that this has caused any problems for library users, the code is however not technically correct as-is.
2024-12-21 14:22:18 +01:00
Jonas Jenwald
ede589dd6e Shorten the WorkerMessageHandler class a little bit
- Use `this` in all scopes where that's possible, to avoid having to spell out `WorkerMessageHandler` everywhere.

 - Inline the `isMessagePort` helper function, since there's only a single call-site.

 - Use a static initialization block to move more code into the `WorkerMessageHandler` class itself.
2024-11-30 14:07:16 +01:00
Jonas Jenwald
8ec399d7e1 Convert the getPdfManager function to be asynchronous
This is fairly old code, and by making the function `async` we can handle initialization errors "automatically" without the need for try-catch statements.
2024-11-22 17:49:43 +01:00
Jonas Jenwald
2c0cc48d1b Replace the forEach method in Dict with "proper" iteration support 2024-11-17 12:45:32 +01:00
Calixte Denizet
4bf7787084 Simplify saving added/modified annotations.
Having this map to collect the different changes will allow to know if some objects have already been modified.
2024-11-12 10:59:38 +01:00
Jonas Jenwald
196f7d7df1 Inline the flushChunks helper function, used in getPdfManager on the worker-thread
- This helper function has only a single call-site, and the function is fairly short.

 - It'll only be invoked if range requests are *disabled*, or if the entire PDF manages to load *before* the headers are resolved (which is very unlikely).
   Hence, by default, this helper function is not invoked.

 - By inlining the code we're able to utilize the existing error-handling at the call-site, rather than having to duplicate it, which further reduces the size of this code.

Finally, while slightly unrelated, this patch also adds optional chaining in one spot in the file (PR 16424 follow-up).
2024-11-02 11:06:30 +01:00
Calixte Denizet
3103deaa44 Fix missing annotation parent in using the one from the Fields entry
Fixes #15096.
2024-10-04 20:00:19 +02:00
Tim van der Meij
c77b97daff
Update the JS/CSS files for the new Prettier/Stylelint versions 2024-07-13 16:29:47 +02:00
Jonas Jenwald
a4ffc1066c Move the internal API/Worker isEditing-state into RenderingIntentFlag
In *hindsight* this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value.
Note that `RenderingIntentFlag` is *internal* functionality, not exposed in the official API, which means that it can be extended and modified as necessary.
2024-07-04 23:34:30 +02:00
Calixte Denizet
64635f3b35 [api-minor][Editor] When switching to editing mode, redraw pages containing editable annotations
Right now, editable annotations are using their own canvas when they're drawn, but
it induces several issues:
 - if the annotation has to be composed with the page then the canvas must be correctly
   composed with its parent. That means we should move the canvas under canvasWrapper
   and we should extract composing info from the drawing instructions...
   Currently it's the case with highlight annotations.
 - we use some extra memory for those canvas even if the user will never edit them, which
   the case for example when opening a pdf in Fenix.

So with this patch, all the editable annotations are drawn on the canvas. When the
user switches to editing mode, then the pages with some editable annotations are redrawn but
without them: they'll be replaced by their counterpart in the annotation editor layer.
2024-07-02 14:11:40 +02:00
Jonas Jenwald
f6cd03955b [api-minor] Move the page reference/number caching into the API
Rather than having to handle this *manually* throughout the viewer, this functionality can instead be moved into the API which simplifies the code slightly.
2024-04-29 18:54:06 +02:00
Jonas Jenwald
e4d0e84802 [api-minor] Replace the PromiseCapability with Promise.withResolvers()
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers

The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
 - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
 - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
 - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
 - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
     - For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
     - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
 - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.

---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
2024-04-01 11:42:37 +02:00
Calixte Denizet
2133da166e When updating, write the xref table in the same format as the previous one (bug 1878916)
The specs are unclear about what kind of xref table format must be used.
In checking the validity of some pdfs in the preflight tool from Acrobat
we can guess that having the same format is the correct way to do.
The pdf in the mentioned bug, after having been changed, wasn't correctly
displayed in neither Chrome nor Acrobat: it's now fixed.
2024-02-13 14:14:37 +01:00
Jonas Jenwald
37e98e39f6 Skip any whitespace after the first object in linearized PDFs (issue 17665)
This way the code is now consistent with the non-linearized branch in the `PDFDocument.startXRef` getter.
2024-02-12 22:05:36 +01:00
Calixte Denizet
f2196f7803 StructParents entry isn't required on pages with no tagged contents (bug 1855641) 2023-09-28 14:23:10 +02:00
Calixte Denizet
a8573d4e1b [Editor] Add the ability to create/update the structure tree when saving a pdf containing newly added annotations (bug 1845087)
When there is no tree, the tags for the new annotions are just put under the root element.
When there is a tree, we insert the new tags at the right place in using the value
of structTreeParentId (added in PR #16916).
2023-09-16 18:34:58 +02:00
Jonas Jenwald
ff96c413d3 Use await even more in the "SaveDocument" worker-thread handler
Given that the function is already asynchronous we can make use of `await` even more and reduce the amount of indentation a little bit.
2023-09-16 13:06:48 +02:00
Jonas Jenwald
50937a3539 Ensure that the entire PDF document is loaded *before* we begin saving it
When I started looking at PR 16938 it occurred to me that some of the new structTree-methods are synchronously accessing certain dictionary-data (not used during "normal" structTree-parsing), which may not be generally safe since everything in a dictionary could be a reference (and the relevant data may not have been loaded yet).

Rather than suggesting that we make all those new methods even more asynchronous, to me the overall simplest and safest solution is to ensure that the *entire* PDF document has been loaded *before* we begin saving it. In practice this shouldn't really affect "performance" of saving noticeably, since it's always depended on the entire PDF document being downloaded.

Finally note that with the exception of the PDF document possibly not having been fully downloaded when saving is triggered, all other "global" document properties are pretty much guaranteed to already be available at this point.
2023-09-12 13:26:57 +02:00
Jonas Jenwald
64e8557fb5 [api-minor] Deprecate the PDFDocumentProxy.getJavaScript method
This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used.

Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.
2023-08-01 09:02:05 +02:00
Calixte Denizet
33fdec1392 Don't replace Acroform dictionary if nothing has changed when saving (bug 1844572) 2023-07-22 17:51:06 +02:00
Jonas Jenwald
88524bf9ae Don't reset temporary XRef-entries during saving (PR 16392 follow-up)
*Please note:* I'm not aware of any bugs caused by this, however that might be more luck than anything else.

In PR 16392 the `incrementalUpdate` function, and all of its various helpers, were made asynchronous. However the call-site in `src/core/worker.js` wasn't updated, which means that we currently reset temporary XRef-entries while saving is ongoing.
2023-07-20 15:49:59 +02:00
Jonas Jenwald
3a886e7264 Move the isNodeJS-helper into the src/shared/util.js file
With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the *built* files.
2023-07-17 16:42:25 +02:00
Calixte Denizet
599b9498f2 [Editor] Add support for printing/saving newly added Stamp annotations
In order to minimize the size the of a saved pdf, we generate only one
image and use a reference in each annotation using it.
When printing, it's slightly different since we have to render each page
independantly but we use the same image within a page.
2023-06-26 15:47:05 +02:00
Calixte Denizet
71479fdd21 [Editor] Avoid to have duplicated entries in the Annot array when saving an existing and modified annotation 2023-06-15 22:02:10 +02:00
Jonas Jenwald
1753e321cd Remove the compatibility checks in WorkerMessageHandler.createDocumentHandler
For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used).

Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain *basic* functionality, such as e.g. text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments.

*Please note:* For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.
2023-05-07 13:43:19 +02:00
Jonas Jenwald
ed8be6f882 [api-minor] Update the minimum supported Node.js version to 18
This patch updates the minimum supported environments as follows:
 - Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases

Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.
2023-05-07 13:43:19 +02:00
Jonas Jenwald
d950b91c4e Introduce some logical assignment in the src/core/ folder 2023-04-29 13:49:37 +02:00
Jonas Jenwald
317abd6d07 Change the createPromiseCapability helper function into a PromiseCapability class
This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.
2023-04-29 13:43:24 +02:00