Given that `CompiledFont.prototype.getPathJs` already returns data in the desired TypedArray format, we should be able to directly copy the font-path data which helps shorten the code a little bit (rather than the "manual" handling in PR 20346).
To ensure that this keeps working as expected, a non-production `assert` is added to prevent any future surprises.
This is an old API-parameter that is now unused within the PDF.js project itself, and its description says that it's (partly) being used for "range requests operations".
Note that the `length` API-parameter is used to set the *initial* `contentLength` in various `BasePDFStreamReader` implementations, however it's always overridden by the "Content-Length" header (sent by the server) when that one exists *and* is a valid number. While we currently fallback to the keep the initial `contentLength` otherwise, note however how in that case range requests will always be *disabled* and thus the only spot in the code-base [where `fullReader.contentLength` is necessary](873378b718/src/core/worker.js (L230-L236)) cannot actually be reached.
Hence the only possible reason to use the `length` API-parameter would be for improved progress reporting[1] during streaming of PDF data in rare cases where the "Content-Length" header is missing/invalid, but the user *somehow* has information from another source about the correct `length` of the PDF document.
That situation feels very much like an edge-case, but it's obviously impossible to know if someone is depending on it. However, please note that there's a work-around available for users affected by this removal:
- Implement a `PDFDataRangeTransport` instance together with custom data-fetching[2], since in that case its `length`-parameter will always be used as-is.
Finally, updates various `BasePDFStreamReader` implementations to only set the `_isRangeSupported` field once the headers are available (since previously we'd just overwrite the "initial" value anyway).
---
[1] I.e. to avoid the "indeterminate" loadingBar being displayed in the viewer.
[2] This is what e.g. the Firefox PDF Viewer uses.
Currently the `bbox`, `fontMatrix`, and `defaultVMetrics` getters duplicate almost the same code, which we can avoid by adding a new helper method (similar to existing ones for reading numbers and strings).
The added `assert` in the new helper method also caught a bug in how the `defaultVMetrics` length was compiled.
On the worker-thread only the static `write` methods are actually used, and on the main-thread only class instances are being created.
Hence this, after PR 20197, leads to a bunch of dead code in both of the *built* `pdf.mjs` and `pdf.worker.js` files.
This patch reduces the size of the `gulp mozcentral` output by `21 419` bytes, i.e. `21` kilo-bytes, which I believe is way too large of a saving to not do this.
(I can't even remember the last time we managed to reduce build-size this much with a single patch.)
It has few cool features:
- all the canvas used during the rendering can be viewed;
- the different properties in the graphics state can be viewed;
- the different paths can be viewed.
- Mention the internal viewer in the README, such that it's easier to find.
- Implement a new `INTERNAL_VIEWER` define, such that it's easier to limit code to only the "internal-viewer" gulp target.
- Only include the "GetRawData" message-handler when needed. Note that the `MessageHandler` [already throws](eb159abd6a/src/shared/message_handler.js (L121-L123)) for any missing handler.
- Move the various new helper functions from `src/core/document.js` and into their own file. The reasons for doing this are:
- That file is already quite large and complex as-is, and these helper functions are slightly orthogonal to its main functionality.
- Babel isn't able to remove all of the new code, and by moving this into a separate file we can guarantee that no extra code ends up in e.g. Firefox.
This method is only used with loops, and it should be a tiny bit more efficient to use an iterator directly rather than first iterating through the underlying data to create a temporary `Array` that we finally iterate through at the call-site.
*Please note:* As port of these changes the chars/glyph caches, on the `Font` instances, are changed to use `Map`s rather than Objects.
The one from pdf.js.utils is a bit too old: a lot of bugs have been fixed
in the code that parses PDF files since then.
It's just an internal development tool, so it doesn't need to be perfect,
but it should be good enough to be useful.
Falling back to use the `loaded` byteLength if the server `contentLength` is unknown doesn't make a lot of sense, since it'd lead to the `onProgress` callback reporting `percent === 100` repeatedly while the document is loading despite that being obviously wrong.
Instead we'll now report `percent === NaN` in that case, thus showing the indeterminate progressBar, which seems more correct if the `contentLength` is unknown.
Please note that this code-path is normally not even reached, since streaming is enabled by default (applies e.g. to the Firefox PDF Viewer).
This is not only shorter, but (in my opinion) it also simplifies the code.
*Note:* In order to keep the *five* different `BasePDFStreamReader` implementations consistent, we purposely don't re-factor the `PDFWorkerStreamReader` class to support `for await...of` iteration.
Looking at the `BinaryCMapStream` implementation, it's basically a "regular" `Stream` but with added functionality for reading compressed CMap data.
Hence, by letting `BinaryCMapStream` extend `Stream`, we can remove an effectively duplicate method and simplify/shorten the code a tiny bit.
Currently the `customNames` are read one byte at a time, in a loop, and at every iteration converted to a string.
This can be replaced with the `BaseStream.prototype.getString` method, which didn't exist back when this function was written.
This method is usually used with loops, and it should be a tiny bit more efficient to use an iterator directly rather than first iterating through ` Map`-values to create a temporary `Array` that we finally iterate through at the call-site.
Note that the `getRawValues` method is old code, and originally the `Dict` class stored its data in a regular `Object`, hence why the old code was written that way.
This method is usually used with loops, and it should be a tiny bit more efficient to use an iterator directly rather than first iterating through ` Map`-keys to create a temporary `Array` that we finally iterate through at the call-site.
Note that the `getKeys` method is old code, and originally the `Dict` class stored its data in a regular `Object`, hence why the old code was written that way.
This changes a number of loops currently using `Dict.prototype.{getKeys, getRaw}`, since it should be a tiny bit more efficient to use an iterator directly rather than first iterating through `Map`-keys to create a temporary `Array` that we finally iterate through at the call-site.
Note that the `getKeys` method is much older than `getRawEntries`, and originally the `Dict` class stored its data in a regular `Object`, hence why the old code was written that way.
This behaviour comes from the initial pdf.js commit but is wrong and
doesn't match other PDF readers like muPDF or pdfium.
From PDF Spec 7.3.3:
A PDF writer shall not use the PostScript language syntax for numbers with non-decimal radices (such
as 16#FFFE) or in exponential format (such as 6.02E23).
Change all these cases to use `Map.prototype.getOrInsertComputed()` instead, in combination with a helper function for creating the `Array`s (similar to the previous patch).
With the exception of the first invocation the callback function is unused, which means that a lot of pointless functions may be created.
To avoid this we introduce helper functions for simple cases, such as creating `Map`s and `Objects`s.