This is an old API-parameter that is now unused within the PDF.js project itself, and its description says that it's (partly) being used for "range requests operations".
Note that the `length` API-parameter is used to set the *initial* `contentLength` in various `BasePDFStreamReader` implementations, however it's always overridden by the "Content-Length" header (sent by the server) when that one exists *and* is a valid number. While we currently fallback to the keep the initial `contentLength` otherwise, note however how in that case range requests will always be *disabled* and thus the only spot in the code-base [where `fullReader.contentLength` is necessary](873378b718/src/core/worker.js (L230-L236)) cannot actually be reached.
Hence the only possible reason to use the `length` API-parameter would be for improved progress reporting[1] during streaming of PDF data in rare cases where the "Content-Length" header is missing/invalid, but the user *somehow* has information from another source about the correct `length` of the PDF document.
That situation feels very much like an edge-case, but it's obviously impossible to know if someone is depending on it. However, please note that there's a work-around available for users affected by this removal:
- Implement a `PDFDataRangeTransport` instance together with custom data-fetching[2], since in that case its `length`-parameter will always be used as-is.
Finally, updates various `BasePDFStreamReader` implementations to only set the `_isRangeSupported` field once the headers are available (since previously we'd just overwrite the "initial" value anyway).
---
[1] I.e. to avoid the "indeterminate" loadingBar being displayed in the viewer.
[2] This is what e.g. the Firefox PDF Viewer uses.
On the worker-thread only the static `write` methods are actually used, and on the main-thread only class instances are being created.
Hence this, after PR 20197, leads to a bunch of dead code in both of the *built* `pdf.mjs` and `pdf.worker.js` files.
This patch reduces the size of the `gulp mozcentral` output by `21 419` bytes, i.e. `21` kilo-bytes, which I believe is way too large of a saving to not do this.
(I can't even remember the last time we managed to reduce build-size this much with a single patch.)
This patch adds right-click support for images in the PDF, allowing
users to download them. To minimize memory consumption, we:
- Do not store the images separately, and instead crop them out of the
PDF page canvas
- Only extract the images when needed (i.e. when the user right-clicks
on them), rather than eagery having all of them available.
To do so, we layer one empty 0x0 canvas per image, stretched to cover
the whole image, and only populate its contents on right click.
These images need to be inside the text layer: they cannot be _behind_
it, otherwise they would be covered by the text layer's container and
not be clickable, and they cannot be in front of it, otherwise they
would make the text spans unselectable.
This feature is managed by a new preference, `imagesRightClickMinSize`:
- when it's set to `-1`, right-click support is disabled
- when set to `0`, all images are available for right click
- when set to a positive integer, only images whose width and height are
greater than or equal to that value (in the PDF page frame of
reference) are available for right click.
This features is disabled by default outside of MOZCENTRAL, as it
significantly degrades the text selection experience in non-Firefox
browsers.
A number of these unit-tests didn't actually cover the intended code-paths, since many of them *accidentally* matched the "file size is smaller than two range requests"-check.
The patch also updates `validateRangeRequestCapabilities` to use return-value names that are consistent with the class fields used in the various stream implementations.
This method is usually used with loops, and it should be a tiny bit more efficient to use an iterator directly rather than first iterating through ` Map`-values to create a temporary `Array` that we finally iterate through at the call-site.
Note that the `getRawValues` method is old code, and originally the `Dict` class stored its data in a regular `Object`, hence why the old code was written that way.
This method is usually used with loops, and it should be a tiny bit more efficient to use an iterator directly rather than first iterating through ` Map`-keys to create a temporary `Array` that we finally iterate through at the call-site.
Note that the `getKeys` method is old code, and originally the `Dict` class stored its data in a regular `Object`, hence why the old code was written that way.
- Add a new test using only streaming, since that was missing and the lack of which most likely contributed to previous bugs in the `PDFDataRangeTransport` implementation (see PR 10675 and 20634).
- Improve the "ranges and streaming" test, to utilize both ranges *and* streaming properly, since the way it was written seemed somewhat unrealistic given how data will normally arrive when `PDFDataRangeTransport` is being used.
- Provide more `initialData`, in relevant tests, since a length smaller than `rangeChunkSize` seem pretty pointless.
- Test the `contentDispositionFilename`, and `contentLength`, handling in the `PDFDataRangeTransport` implementation.
This behaviour comes from the initial pdf.js commit but is wrong and
doesn't match other PDF readers like muPDF or pdfium.
From PDF Spec 7.3.3:
A PDF writer shall not use the PostScript language syntax for numbers with non-decimal radices (such
as 16#FFFE) or in exponential format (such as 6.02E23).
It fixes#20715.
`failedExpectations` was removed from `suiteStarted` and `specStarted` events.
HtmlReporter and HtmlSpecFilter have been deprecated and removed.
Change all these cases to use `Map.prototype.getOrInsertComputed()` instead, in combination with a helper function for creating the `Array`s (similar to the previous patch).
With the exception of the first invocation the callback function is unused, which means that a lot of pointless functions may be created.
To avoid this we introduce helper functions for simple cases, such as creating `Map`s and `Objects`s.
In all cases where we currently use `Response.prototype.arrayBuffer()` the result is immediately wrapped in a `Uint8Array`, which can be avoided by instead using the newer `Response.prototype.bytes()` method; see https://developer.mozilla.org/en-US/docs/Web/API/Response/bytes
Report loading progress "automatically" when using the `PDFDataTransportStream` class, and remove the `PDFDataRangeTransport.prototype.onDataProgress` method
Currently this code expects a "url string", rather than a proper `URL` instance, which seems completely unnecessary now. The explanation for this is, as so often is the case, "historical reasons" since a lot of this code predates the general availability of `URL`.
This is consistent with the other `BasePDFStream` implementations, and simplifies the API surface of the `PDFDataRangeTransport` class (note the changes in the viewer).
Given that the `onDataProgress` method was changed to a no-op this won't affect third-party users, assuming there even are any since this code was written specifically for the Firefox PDF Viewer.
This should help reduce the maintenance burden of the code, since you no longer need to remember to update separate code when touching the different page/thumbnail classes.
The percentage calculation is currently "spread out" across various viewer functionality, which we can avoid by having the API handle that instead.
Also, remove the `this.#lastProgress` special-case[1] and just register a "normal" `fullReader.onProgress` callback unconditionally. Once `headersReady` is resolved the callback can simply be removed when not needed, since the "worst" thing that could theoretically happen is that the loadingBar (in the viewer) updates sooner this way. In practice though, since `fullReader.read` cannot return data until `headersReady` is resolved, this change is not actually observable in the API.
---
[1] This was added in PR 8617, close to a decade ago, but it's not obvious to me that it was ever necessary to implement it that way.
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.
Also, remove the `rangeChunkSize` not defined checks in all the relevant stream-constructor implementations.
Note how the API, since some time, always validates *and* provides that parameter when creating a `BasePDFStreamReader`-instance.
Given that there's no less than *five* different, but very similar, implementations this helps reduce code duplication and simplifies maintenance.
Also, spotted during rebasing, pass the `enableHWA` option "correctly" (i.e. as part of the existing `transportParams`) to the `WorkerTransport`-class to keep the constructor simpler.
Modified the regex in web/autolinker.js to explicitly allow hyphens (-) in
the domain part of email addresses, while maintaining the exclusion of
other punctuation. This fixes mailto links like user@uni-city.tld being
truncated at the hyphen.
Fixes#20557
Usually, content stream or fonts are compressed using FlateDecode.
So use the DecompressionStream API to decompress those streams
in the async code path.
Update the styles and HTML to reflect the new views manager concept.
For now, nothing about split/merge functionality is implemented or visible.
The new styles for the outline, attachments, and layers will be added later.
The thumbnail view is now accessible with the keyboard.
This commit moves all the logic to scale up&down `<span>`s in the text
layer, introduced in #18283, to CSS.
The motivation for this change is that #18283 is still not enough for
all cases. That PR fixed the problem in Chrome&Firefox desktop, which
allow users to set an actual minimum font size in the browser settings.
However, other browsers (e.g. the Chrome-based WebView on Android) have
more complex logic and they scale up small text rather than simply
applying a minimum.
A workaround for that behavior is probably out of scope for PDF.js
itself as it only affects not officially supported platforms. However,
having access to the actual expected font height (through
`--font-height`) allows embedders of PDF.js to implement a workaround by
themselves.