Marmelator/pdf.js.mirror

mirror of https://github.com/mozilla/pdf.js.git synced 2026-04-20 04:04:03 +02:00

Author	SHA1	Message	Date
Calixte Denizet	04272de41d	Add the possibility to save added annotations when reorganizing a pdf (bug 2023086)	2026-03-20 10:55:47 +01:00
Jonas Jenwald	09a9a7bd0b	[api-minor] Remove the `length` parameter from `getDocument` This is an old API-parameter that is now unused within the PDF.js project itself, and its description says that it's (partly) being used for "range requests operations". Note that the `length` API-parameter is used to set the initial `contentLength` in various `BasePDFStreamReader` implementations, however it's always overridden by the "Content-Length" header (sent by the server) when that one exists and is a valid number. While we currently fall back to keeping the initial `contentLength` otherwise, note however how in that case range requests will always be disabled and thus the only spot in the code-base [where `fullReader.contentLength` is necessary](`873378b718/src/core/worker.js (L230-L236)`) cannot actually be reached. Hence the only possible reason to use the `length` API-parameter would be for improved progress reporting[1] during streaming of PDF data in rare cases where the "Content-Length" header is missing/invalid, but the user somehow has information from another source about the correct `length` of the PDF document. That situation feels very much like an edge-case, but it's obviously impossible to know if someone is depending on it. However, please note that there's a work-around available for users affected by this removal: - Implement a `PDFDataRangeTransport` instance together with custom data-fetching[2], since in that case its `length`-parameter will always be used as-is. Finally, this updates various `BasePDFStreamReader` implementations to only set the `_isRangeSupported` field once the headers are available (since previously we'd just overwrite the "initial" value anyway). --- [1] I.e. to avoid the "indeterminate" loadingBar being displayed in the viewer. [2] This is what e.g. the Firefox PDF Viewer uses.	2026-03-13 23:42:45 +01:00
Jonas Jenwald	60d6abdf4f	A couple of small improvements of the new internal viewer - Mention the internal viewer in the README, such that it's easier to find. - Implement a new `INTERNAL_VIEWER` define, such that it's easier to limit code to only the "internal-viewer" gulp target. - Only include the "GetRawData" message-handler when needed. Note that the `MessageHandler` [already throws](`eb159abd6a/src/shared/message_handler.js (L121-L123)`) for any missing handler. - Move the various new helper functions from `src/core/document.js` and into their own file. The reasons for doing this are: - That file is already quite large and complex as-is, and these helper functions are slightly orthogonal to its main functionality. - Babel isn't able to remove all of the new code, and by moving this into a separate file we can guarantee that no extra code ends up in e.g. Firefox.	2026-03-10 23:41:35 +01:00
calixteman	9d81fafa8c	Add a new internal viewer to explore the structure of PDF files. The one from pdf.js.utils is a bit too old: a lot of bugs have been fixed in the code that parses PDF files since then. It's just an internal development tool, so it doesn't need to be perfect, but it should be good enough to be useful.	2026-03-09 14:16:12 +01:00
Jonas Jenwald	ddd69ce4e0	Remove the "DocProgress" `loaded` fallback from the `getPdfManager` function Falling back to use the `loaded` byteLength if the server `contentLength` is unknown doesn't make a lot of sense, since it'd lead to the `onProgress` callback reporting `percent === 100` repeatedly while the document is loading despite that being obviously wrong. Instead we'll now report `percent === NaN` in that case, thus showing the indeterminate progressBar, which seems more correct if the `contentLength` is unknown. Please note that this code-path is normally not even reached, since streaming is enabled by default (applies e.g. to the Firefox PDF Viewer).	2026-03-08 10:22:01 +01:00
Jonas Jenwald	7f4e29ed22	Change the "Terminate" worker-thread handler to an asynchronous function This is a tiny bit shorter, which cannot hurt.	2026-03-06 11:24:12 +01:00
Jonas Jenwald	e8ab3cb335	Convert the data reading in `getPdfManager` to be asynchronous This is not only shorter, but (in my opinion) it also simplifies the code. Note: In order to keep the five different `BasePDFStreamReader` implementations consistent, we purposely don't re-factor the `PDFWorkerStreamReader` class to support `for await...of` iteration.	2026-03-05 22:50:26 +01:00
calixteman	58ac273f1f	Merge pull request #20503 from andriivitiv/Fix-`Worker-was-terminated`-error Fix `Worker was terminated` error when loading is cancelled	2026-02-06 09:59:05 +01:00
Jonas Jenwald	4a8fb4dde1	Add an abstract `BasePDFStream` class, that all the old `IPDFStream` implementations inherit from Given that there's no less than five different, but very similar, implementations this helps reduce code duplication and simplifies maintenance. Also, spotted during rebasing, pass the `enableHWA` option "correctly" (i.e. as part of the existing `transportParams`) to the `WorkerTransport`-class to keep the constructor simpler.	2026-01-30 14:15:39 +01:00
Calixte Denizet	806133379e	Refactor a bit page mapping stuff in order to be able to support delete/copy pages	2026-01-26 16:53:52 +01:00
Andrii Vitiv	9677798ba0	Fix `Worker was terminated` error when loading is cancelled Fixes https://github.com/mozilla/pdf.js/issues/11595, where cancelling loading with `loadingTask.destroy()` before it finishes throws a `Worker was terminated` error that CANNOT be caught. When worker is terminated, an error is thrown here: `6c746260a9/src/core/worker.js (L374)` Then `onFailure` runs, in which we throw again via `ensureNotTerminated()`. However, this second error is never caught (and cannot be), resulting in console spam. There is no need to throw any additional errors since the termination is already reported [here](`6c746260a9/src/core/worker.js (L371-L373)`), and `onFailure` is supposed to handle errors, not throw them.	2025-12-14 18:15:10 +02:00
Calixte Denizet	50c48cf11b	Add telemetry for tagged pdfs (bug 1997134)	2025-11-17 19:47:16 +01:00
Calixte Denizet	bc87f4e8d6	Add the possibility to create a pdf from different ones (bug 1997379) For now it's just possible to create a single pdf by selecting some pages from different pdf sources. The merge is for now pretty basic (which is why it's still a WIP); none of the following data is merged for now: - the struct trees - the page labels - the outlines - named destinations There are 2 new ref tests where some new pdfs are created: one with some extracted pages and another one (encrypted) which is just rewritten. The ref images are generated from the original pdfs by selecting the pages we want and the new images are taken from the generated pdfs.	2025-11-07 14:57:48 +01:00
Calixte Denizet	19ff148163	Fix incremental saving with hybrid references This patch removes some previous fixes which are now likely fixed by #17636. Fixes #20302.	2025-10-04 18:31:55 +02:00
Calixte Denizet	9e5ee1e5a7	[Editor] Add the ability to get all the editable annotations in a pdf document We want to be able to show all the comments in a pdf even if the pages where they are haven't been rendered. And it'll help to fix the issue #18915.	2025-08-18 21:31:11 +02:00
Jonas Jenwald	d9548b1c18	Slightly re-factor how we pre-load fonts and images in XFA documents Rather than "manually" invoking the methods from the `src/core/worker.js` file we introduce a single `PDFDocument`-method that handles this for us, and make the current methods private. Since this code is only invoked at most once per document, and only for XFA documents, we can use `BasePdfManager.prototype.ensureDoc` directly rather than needing a stand-alone method.	2025-05-04 13:44:33 +02:00
Jonas Jenwald	b531720d9c	Simplify the `serializeXfaData` method and related code Rather than having a dedicated `BasePdfManager`-method for this one call-site we can instead change `PDFDocument.prototype.serializeXfaData` to a non-async method, that we invoke via `BasePdfManager.prototype.ensureDoc`.	2025-05-03 11:20:42 +02:00
Jonas Jenwald	91ba147317	Check that the `Object.prototype` hasn't been incorrectly extended (PR 11582 follow-up) This complements, and extends, the existing check of the `Array.prototype` in the worker-thread. To simplify the implementation we'll now abort immediately, rather than collecting all "bad" properties.	2025-04-18 12:19:29 +02:00
Jonas Jenwald	dad6febc39	Pass the /Info-strings as a `Map` to the `src/core/writer.js` code We want to iterate through the data in the `computeMD5` function, and `Map`s have "nicer" support for that than generic objects. (Somewhat recently `Map` performance was improved in Firefox, however this also isn't really performance sensitive code.)	2025-04-04 13:36:13 +02:00
Jonas Jenwald	3c93d63731	Pass the `XRef`-instance explicitly to the `StructTreeRoot` class This avoids the current situation where we're accessing it through various dictionaries, since that's a somewhat brittle solution given that in the general case a `Dict`-instance may not have the `xref`-field set (see e.g. the empty-Dict).	2025-03-25 18:04:51 +01:00
Jonas Jenwald	7b5cd9cddd	Use arrow functions with some `Promise.then` calls A lot of this is fairly old code, which we can shorten slightly by using arrow functions instead of "regular" functions.	2025-03-02 19:57:38 +01:00
Jonas Jenwald	6bde49a606	Reduce duplication when handling "DocException" and "PasswordRequest" messages Rather than having to manually implement the exception-handling for the "DocException" message, we can instead re-use (and slightly extend) the existing `wrapReason` function since that one already does what we need. Furthermore, we can also simplify handling of the "PasswordRequest" message a little bit and again re-use the `wrapReason` function. Finally, the patch makes the following smaller changes: - Avoid needlessly re-creating exceptions in the `wrapReason` function. - Use a slightly shorter parameter name in the `wrapReason` function. - Remove the unused entries in the `CallbackKind`/`StreamKind` enumerations.	2024-12-26 12:55:49 +01:00
Jonas Jenwald	ec1a05c104	Add missing `startWorkerTask` calls in the "SaveDocument" handler Without these calls we'll not actually wait for saving to complete when document destruction runs; compare with other `WorkerTask`-usage in this file. While I cannot imagine that this has caused any problems for library users, the code is however not technically correct as-is.	2024-12-21 14:22:18 +01:00
Jonas Jenwald	ede589dd6e	Shorten the `WorkerMessageHandler` class a little bit - Use `this` in all scopes where that's possible, to avoid having to spell out `WorkerMessageHandler` everywhere. - Inline the `isMessagePort` helper function, since there's only a single call-site. - Use a static initialization block to move more code into the `WorkerMessageHandler` class itself.	2024-11-30 14:07:16 +01:00
Jonas Jenwald	8ec399d7e1	Convert the `getPdfManager` function to be asynchronous This is fairly old code, and by making the function `async` we can handle initialization errors "automatically" without the need for try-catch statements.	2024-11-22 17:49:43 +01:00
Jonas Jenwald	2c0cc48d1b	Replace the `forEach` method in `Dict` with "proper" iteration support	2024-11-17 12:45:32 +01:00
Calixte Denizet	4bf7787084	Simplify saving added/modified annotations. Having this map to collect the different changes will allow to know if some objects have already been modified.	2024-11-12 10:59:38 +01:00
Jonas Jenwald	196f7d7df1	Inline the `flushChunks` helper function, used in `getPdfManager` on the worker-thread - This helper function has only a single call-site, and the function is fairly short. - It'll only be invoked if range requests are disabled, or if the entire PDF manages to load before the headers are resolved (which is very unlikely). Hence, by default, this helper function is not invoked. - By inlining the code we're able to utilize the existing error-handling at the call-site, rather than having to duplicate it, which further reduces the size of this code. Finally, while slightly unrelated, this patch also adds optional chaining in one spot in the file (PR 16424 follow-up).	2024-11-02 11:06:30 +01:00
Calixte Denizet	3103deaa44	Fix missing annotation parent by using the one from the Fields entry Fixes #15096.	2024-10-04 20:00:19 +02:00
Tim van der Meij	c77b97daff	Update the JS/CSS files for the new Prettier/Stylelint versions	2024-07-13 16:29:47 +02:00
Jonas Jenwald	a4ffc1066c	Move the internal API/Worker `isEditing`-state into `RenderingIntentFlag` In hindsight this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value. Note that `RenderingIntentFlag` is internal functionality, not exposed in the official API, which means that it can be extended and modified as necessary.	2024-07-04 23:34:30 +02:00
Calixte Denizet	64635f3b35	[api-minor][Editor] When switching to editing mode, redraw pages containing editable annotations Right now, editable annotations are using their own canvas when they're drawn, but it induces several issues: - if the annotation has to be composed with the page then the canvas must be correctly composed with its parent. That means we should move the canvas under canvasWrapper and we should extract composing info from the drawing instructions... Currently it's the case with highlight annotations. - we use some extra memory for those canvases even if the user will never edit them, which is the case for example when opening a pdf in Fenix. So with this patch, all the editable annotations are drawn on the canvas. When the user switches to editing mode, then the pages with some editable annotations are redrawn but without them: they'll be replaced by their counterpart in the annotation editor layer.	2024-07-02 14:11:40 +02:00
Jonas Jenwald	f6cd03955b	[api-minor] Move the page reference/number caching into the API Rather than having to handle this manually throughout the viewer, this functionality can instead be moved into the API which simplifies the code slightly.	2024-04-29 18:54:06 +02:00
Jonas Jenwald	e4d0e84802	[api-minor] Replace the `PromiseCapability` with `Promise.withResolvers()` This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does almost the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular: - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state. - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used. - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however not a problem since the value of a Promise cannot be changed once fulfilled or rejected. - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them: - For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is removed on its first (valid) invocation. - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks. - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and test-only code. --- [1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).	2024-04-01 11:42:37 +02:00
Calixte Denizet	2133da166e	When updating, write the xref table in the same format as the previous one (bug 1878916) The specs are unclear about what kind of xref table format must be used. In checking the validity of some pdfs in the preflight tool from Acrobat we can guess that having the same format is the correct approach. The pdf in the mentioned bug, after having been changed, wasn't correctly displayed in either Chrome or Acrobat: it's now fixed.	2024-02-13 14:14:37 +01:00
Jonas Jenwald	37e98e39f6	Skip any whitespace after the first object in linearized PDFs (issue 17665) This way the code is now consistent with the non-linearized branch in the `PDFDocument.startXRef` getter.	2024-02-12 22:05:36 +01:00
Calixte Denizet	f2196f7803	StructParents entry isn't required on pages with no tagged contents (bug 1855641)	2023-09-28 14:23:10 +02:00
Calixte Denizet	a8573d4e1b	[Editor] Add the ability to create/update the structure tree when saving a pdf containing newly added annotations (bug 1845087) When there is no tree, the tags for the new annotations are just put under the root element. When there is a tree, we insert the new tags at the right place using the value of structTreeParentId (added in PR #16916).	2023-09-16 18:34:58 +02:00
Jonas Jenwald	ff96c413d3	Use `await` even more in the "SaveDocument" worker-thread handler Given that the function is already asynchronous we can make use of `await` even more and reduce the amount of indentation a little bit.	2023-09-16 13:06:48 +02:00
Jonas Jenwald	50937a3539	Ensure that the entire PDF document is loaded before we begin saving it When I started looking at PR 16938 it occurred to me that some of the new structTree-methods are synchronously accessing certain dictionary-data (not used during "normal" structTree-parsing), which may not be generally safe since everything in a dictionary could be a reference (and the relevant data may not have been loaded yet). Rather than suggesting that we make all those new methods even more asynchronous, to me the overall simplest and safest solution is to ensure that the entire PDF document has been loaded before we begin saving it. In practice this shouldn't really affect "performance" of saving noticeably, since it's always depended on the entire PDF document being downloaded. Finally note that with the exception of the PDF document possibly not having been fully downloaded when saving is triggered, all other "global" document properties are pretty much guaranteed to already be available at this point.	2023-09-12 13:26:57 +02:00
Jonas Jenwald	64e8557fb5	[api-minor] Deprecate the `PDFDocumentProxy.getJavaScript` method This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used. Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.	2023-08-01 09:02:05 +02:00
Calixte Denizet	33fdec1392	Don't replace Acroform dictionary if nothing has changed when saving (bug 1844572)	2023-07-22 17:51:06 +02:00
Jonas Jenwald	88524bf9ae	Don't reset temporary XRef-entries during saving (PR 16392 follow-up) Please note: I'm not aware of any bugs caused by this, however that might be more luck than anything else. In PR 16392 the `incrementalUpdate` function, and all of its various helpers, were made asynchronous. However the call-site in `src/core/worker.js` wasn't updated, which means that we currently reset temporary XRef-entries while saving is ongoing.	2023-07-20 15:49:59 +02:00
Jonas Jenwald	3a886e7264	Move the `isNodeJS`-helper into the `src/shared/util.js` file With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the built files.	2023-07-17 16:42:25 +02:00
Calixte Denizet	599b9498f2	[Editor] Add support for printing/saving newly added Stamp annotations In order to minimize the size of a saved pdf, we generate only one image and use a reference in each annotation using it. When printing, it's slightly different since we have to render each page independently but we use the same image within a page.	2023-06-26 15:47:05 +02:00
Calixte Denizet	71479fdd21	[Editor] Avoid having duplicated entries in the Annot array when saving an existing and modified annotation	2023-06-15 22:02:10 +02:00
Jonas Jenwald	1753e321cd	Remove the compatibility checks in `WorkerMessageHandler.createDocumentHandler` For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used). Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain basic functionality, such as e.g. text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments. Please note: For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.	2023-05-07 13:43:19 +02:00
Jonas Jenwald	ed8be6f882	[api-minor] Update the minimum supported Node.js version to 18 This patch updates the minimum supported environments as follows: - Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.	2023-05-07 13:43:19 +02:00
Jonas Jenwald	d950b91c4e	Introduce some logical assignment in the `src/core/` folder	2023-04-29 13:49:37 +02:00
Jonas Jenwald	317abd6d07	Change the `createPromiseCapability` helper function into a `PromiseCapability` class This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.	2023-04-29 13:43:24 +02:00
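
The last entry above mentions a `PromiseCapability` class with a `settled` getter, which a later commit (e4d0e84802) replaces with `Promise.withResolvers()`. As a rough illustration of the idea only, and not the actual PDF.js code, such a class could be sketched as follows; the field layout and the usage at the bottom are assumptions made for this example.

```javascript
// Hypothetical sketch of a promise "capability" with a `settled` getter.
// This is an illustration, not the implementation used in PDF.js.
class PromiseCapability {
  // Tracks whether resolve/reject has been called at least once.
  #settled = false;

  constructor() {
    // Expose the promise together with its resolve/reject functions.
    this.promise = new Promise((resolve, reject) => {
      this.resolve = value => {
        this.#settled = true;
        resolve(value);
      };
      this.reject = reason => {
        this.#settled = true;
        reject(reason);
      };
    });
  }

  get settled() {
    return this.#settled;
  }
}

// Usage: resolving twice is harmless, since the first value wins.
const capability = new PromiseCapability();
capability.resolve("first");
capability.resolve("second"); // ignored by the underlying Promise
capability.promise.then(value => console.log(value)); // logs "first"
```

The native `Promise.withResolvers()` returns the same `promise`/`resolve`/`reject` triple but has no `settled` state, which is why the commit message describes call-sites tracking that state manually or dropping the checks.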
