One drawback of the current implementation is that the GPU device can be
unavailable at the time of the first pattern fill, which causes the
GPU-accelerated canvas to be move on the main thread because of putImageData.
Most of the shading patterns stuff will be moved to the GPU and in order
to avoid creating some useless data we've to know if the GPU is available or not.
So in this patch we create the GPU device during the worker initialization
and pass a flag to the evaluator to know if the GPU is available or not.
These classes, and various related code, became unused after PR 21023 with only unit-tests actually running that code now.
Also removes the `isEvalSupported` API option, since the `PostScriptCompiler` was the only remaining code where `eval` was used.
The `PDFDataTransportStream` constructor has always registered exactly one listener for each type of data that an `PDFDataRangeTransport` instance can receive.
Given that an end-user of the `PDFDataRangeTransport` class will supply data through its `onData...` methods, it's also somewhat difficult to understand why additional end-user registered listeners would be needed (since the data is already, by definition, available to the user).
Furthermore, since TypedArray data is being transferred nowadays it's not even clear that multiple listeners (of the same kind) would generally work.
All in all, let's simplify this old code a little bit by using *a single* (internal) listener in the `PDFDataRangeTransport` class.
This code already isn't used (or even bundled) in the Firefox PDF Viewer, and it also slightly reduces the number of import maps that need to be maintained.
Currently we have no less than three different, but very similar, factories for reading built-in CMap files, standard font files, and wasm files on the main-thread.[1]
These factories were added at different points in time, since I cannot imagine that we'd add essentially three copies of the same code otherwise.
Nowadays these factories are often not even used[2], since worker-thread fetching is used whenever possible to improve performance. In particular, they will *only* be used when either:
- The PDF.js library runs in Node.js environments.
- The user manually sets `useWorkerFetch = false` when calling `getDocument`.
- The user provides custom `CMapReaderFactory`, `StandardFontDataFactory`, and/or `WasmFactory` instances when calling `getDocument`.
By replacing these factories with *a single* new `BinaryDataFactory` factory/option the number of `getDocument` options are thus reduced, which cannot hurt.
This also reduces the total bundle-size of the Firefox PDF Viewer a little bit, and it slightly reduces the number of import maps that need to be maintained.
*Please note:* For users that provide custom `CMapReaderFactory`, `StandardFontDataFactory`, and `WasmFactory` instances when calling `getDocument` this will be a breaking change, however it's unlikely that (many) such users exist.
(The *internal* format data-format of `CMapReaderFactory` was changed in PR 18951, and there hasn't been a single question/complaint about it in well over a year.)
---
[1] Any new functionality could easily lead to more such factories being added in the future, which wouldn't be great.
[2] Note that the Firefox PDF Viewer no longer use these factories, since it "forcibly" sets `useWorkerFetch = true` during building.
The `BaseCMapReaderFactory`, `BaseStandardFontDataFactory`, and `BaseWasmFactory` classes are all very similar, and the only difference is really in their respective `fetch` methods.
By have the worker-thread "compute" the complete `filename` it's possible to simplify the `BaseCMapReaderFactory.prototype.fetch` method, which will allow future improvements to all of these classes.
A couple of things to note:
- This code is unused, and it's not even bundled, in the Firefox PDF Viewer.
- In browsers it's unused by default, and worker-thread fetching will always be used when possible since that's more efficient.
*Please note:* For users that provide a custom `CMapReaderFactory` instance when calling `getDocument` this could be a breaking change, however it's unlikely that any such users exist.
(The *internal* format of this data was changed previously in PR 18951, and there hasn't been a single question/complaint about it in well over a year.)
The WebGPU feature hasn't been released yet but it's interesting to see how
we can use it in order to speed up the rendering of some objects.
This patch allows to render mesh patterns using WebGPU.
I didn't see any significant performance improvement on my machine (mac M2)
but it may be different on other platforms.
Given that we "forcibly" set `useWorkerFetch = true` for the MOZCENTRAL build-target there's a small amount of dead code as a result, which we can thus remove during building.
Providing one of these parameters is necessary when calling `getDocument`, since otherwise there's nothing to actually load. However, we currently don't enforce that properly and if there's more than one of these parameters provided the behaviour isn't well defined.[1]
The new behaviour is thus, in order:
1. Use the `data` parameter, since the PDF is already available and no additional loading is necessary.
2. Use the `range` parameter, for custom PDF loading (e.g. the Firefox PDF Viewer).
3. Use the `url` parameter, and have the PDF.js library load the PDF with a suitable `networkStream`.
4. Throw an error, since there's no way to load the PDF.
---
[1] E.g. if both `data` and `range` is provided, we'd load the document directly (since it's available) and also initialize a pointless `PDFDataTransportStream` instance.
After the changes in PR 20197 the code in the `TranslatedFont.prototype.send` method is not all that readable[1] given how it handles e.g. the `charProcOperatorList` data used with Type3 fonts.
Since this is the only spot where `Font.prototype.exportData` is used, it seems much simpler to move the `compileFontInfo` call there and *directly* return the intended data rather than messing with it after the fact.
Finally, while it doesn't really matter, the patch flips the order of the `charProcOperatorList` and `extra` properties throughout the code-base since the former is used with Type3 fonts while the latter (effectively) requires that debugging is enabled.
---
[1] I had to re-read it twice, also looking at all the involved methods, in order to convince myself that it's actually correct.
This is an old API-parameter that is now unused within the PDF.js project itself, and its description says that it's (partly) being used for "range requests operations".
Note that the `length` API-parameter is used to set the *initial* `contentLength` in various `BasePDFStreamReader` implementations, however it's always overridden by the "Content-Length" header (sent by the server) when that one exists *and* is a valid number. While we currently fallback to the keep the initial `contentLength` otherwise, note however how in that case range requests will always be *disabled* and thus the only spot in the code-base [where `fullReader.contentLength` is necessary](873378b718/src/core/worker.js (L230-L236)) cannot actually be reached.
Hence the only possible reason to use the `length` API-parameter would be for improved progress reporting[1] during streaming of PDF data in rare cases where the "Content-Length" header is missing/invalid, but the user *somehow* has information from another source about the correct `length` of the PDF document.
That situation feels very much like an edge-case, but it's obviously impossible to know if someone is depending on it. However, please note that there's a work-around available for users affected by this removal:
- Implement a `PDFDataRangeTransport` instance together with custom data-fetching[2], since in that case its `length`-parameter will always be used as-is.
Finally, updates various `BasePDFStreamReader` implementations to only set the `_isRangeSupported` field once the headers are available (since previously we'd just overwrite the "initial" value anyway).
---
[1] I.e. to avoid the "indeterminate" loadingBar being displayed in the viewer.
[2] This is what e.g. the Firefox PDF Viewer uses.
On the worker-thread only the static `write` methods are actually used, and on the main-thread only class instances are being created.
Hence this, after PR 20197, leads to a bunch of dead code in both of the *built* `pdf.mjs` and `pdf.worker.js` files.
This patch reduces the size of the `gulp mozcentral` output by `21 419` bytes, i.e. `21` kilo-bytes, which I believe is way too large of a saving to not do this.
(I can't even remember the last time we managed to reduce build-size this much with a single patch.)
The purpose of PR 11844 was to reduce memory usage once fonts have been attached to the DOM, since the font-data can be quite large in many cases.
Unfortunately the new `clearData` method added in PR 20197 doesn't actually remove *anything*, it just replaces the font-data with zeros which doesn't help when the underlying `ArrayBuffer` itself isn't modified.
The method does include a commented-out `resize` call[1], but uncommenting that just breaks rendering completely.
To address this regression, without having to make large or possibly complex changes, this patch simply changes the `clearData` method to replace the internal buffer/view with its contents *before* the font-data.
While this does lead to a data copy, the size of this data is usually orders of magnitude smaller than the font-data that we're removing.
---
[1] Slightly off-topic, but I don't think that patches should include commented-out code since there's a very real risk that those things never get found/fixed.
At the very least such cases should be clearly marked with `// TODO: ...` comments, and should possibly also have an issue filed about fixing the TODO.
The `PagesMapper` class currently makes up one third of the `src/display/display_utils.js` file size, and since its introduction it's grown (a fair bit) in size.
Note that the intention with files such as `src/display/display_utils.js` was to have somewhere to place functionality too small/simple to deserve its own file.
When recording bboxes for images, it's enough to record their
clip box / bounding box without needing to run the full bbox
tracking of the image's dependencies.
This patch adds right-click support for images in the PDF, allowing
users to download them. To minimize memory consumption, we:
- Do not store the images separately, and instead crop them out of the
PDF page canvas
- Only extract the images when needed (i.e. when the user right-clicks
on them), rather than eagery having all of them available.
To do so, we layer one empty 0x0 canvas per image, stretched to cover
the whole image, and only populate its contents on right click.
These images need to be inside the text layer: they cannot be _behind_
it, otherwise they would be covered by the text layer's container and
not be clickable, and they cannot be in front of it, otherwise they
would make the text spans unselectable.
This feature is managed by a new preference, `imagesRightClickMinSize`:
- when it's set to `-1`, right-click support is disabled
- when set to `0`, all images are available for right click
- when set to a positive integer, only images whose width and height are
greater than or equal to that value (in the PDF page frame of
reference) are available for right click.
This features is disabled by default outside of MOZCENTRAL, as it
significantly degrades the text selection experience in non-Firefox
browsers.
The one from pdf.js.utils is a bit too old: a lot of bugs have been fixed
in the code that parses PDF files since then.
It's just an internal development tool, so it doesn't need to be perfect,
but it should be good enough to be useful.
With these changes `0`, `NaN`, `null`, and `undefined` in the `total`-property all result in `percent === NaN` being reported by the callback, since previously e.g. `0` would result in `percent === 100` being reported unconditionally which doesn't make a lot of sense.
Also, remove the "indeterminate" loadingBar (in the viewer) if the `PDFDocumentLoadingTask` fails since there won't be any more data arriving and displaying the animation thus seems wrong.
Currently the transfers aren't actually being used with the "GetOperatorList" message, since the placement of the parameter is wrong; note the method signature: 909a700afa/src/shared/message_handler.js (L219-L229)
This goes back to PR 16588, which added the transfers parameter, and unfortunately we all missed that :-(
Simply fixing the parameter isn't enough however, since that broke printing of Stamp-editors (and possibly others). The solution here is to *not* transfer data during printing, given that a single `PrintAnnotationStorage` instance is being used for all pages.
With the exception of the first invocation the callback function is unused, which means that a lot of pointless functions may be created.
To avoid this we introduce helper functions for simple cases, such as creating `Map`s and `Objects`s.
This is consistent with the other `BasePDFStream` implementations, and simplifies the API surface of the `PDFDataRangeTransport` class (note the changes in the viewer).
Given that the `onDataProgress` method was changed to a no-op this won't affect third-party users, assuming there even are any since this code was written specifically for the Firefox PDF Viewer.
The percentage calculation is currently "spread out" across various viewer functionality, which we can avoid by having the API handle that instead.
Also, remove the `this.#lastProgress` special-case[1] and just register a "normal" `fullReader.onProgress` callback unconditionally. Once `headersReady` is resolved the callback can simply be removed when not needed, since the "worst" thing that could theoretically happen is that the loadingBar (in the viewer) updates sooner this way. In practice though, since `fullReader.read` cannot return data until `headersReady` is resolved, this change is not actually observable in the API.
---
[1] This was added in PR 8617, close to a decade ago, but it's not obvious to me that it was ever necessary to implement it that way.
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.
Also, spotted during rebasing, pass the `enableHWA` option "correctly" (i.e. as part of the existing `transportParams`) to the `WorkerTransport`-class to keep the constructor simpler.
For now it's just possible to create a single pdf in selecting some pages in different pdf sources.
The merge is for now pretty basic (it's why it's still a WIP) none of these data are merged for now:
- the struct trees
- the page labels
- the outlines
- named destinations
For there are 2 new ref tests where some new pdfs are created: one with some extracted pages and an other
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs in selecting the page we want and the new images are
taken from the generated pdfs.