Given that this function is only ever used during *parsing* of the PDF document, which happens in the worker-thread, this has always added (a little bit of) dead code in the built `pdf.mjs` file.
Given that the various utility-files naturally increase in size over time, it shouldn't hurt to shorten `src/core/core_utils.js` a little bit by moving a few of its string helper functions to their own file.
Note that for the integration tests the coverage information ends up
being processed in the Node.js context where `window` is not available,
so we use `globalThis` instead for the function that merges individual
test's coverage information into the global object because that is
available in all contexts we support. For clarity we also rename said
function since we're not exclusively dealing with `window` nor worker
data anymore.
Given that this function is only ever used in `src/core/` code, let's avoid a little bit of dead code in the *built* `pdf.mjs` file.
Also, place the `AnnotationPrefix` and `AnnotationEditorPrefix` constants together in `src/shared/util.js` since that should aid readability.
Note that `Ref`s and `Name`s are cached globally[1], since that helps reduce object creation (a lot) during parsing.
That cache will be cleared after a period of inactivity in the viewer[2], which is why those primitives cannot *safely* be compared with just `===`/`!==` and also (partially) why abstractions such as `RefSet`/`RefSetCache` are necessary.
Currently `deepCompare` doesn't handle `Ref`s and `Name`s correctly, which may lead to future *intermittent* bugs in any code using the `deepCompare` helper function.
---
[1] This applies to `Cmd` as well, however that doesn't matter in the context of this patch.
[2] Currently, and for more than a decade, set to 30 seconds.
This fixes two things that I overlooked in PR 21225, more specifically:
- Use proper, rather than semi, private class fields in `WasmImage`.
- Make tracking of `WasmImage` instances optional, to avoid keeping data alive permanently in the `IMAGE_DECODERS` build.
Using Array.prototype.shift() to drain the traversal queue makes each
visited node move the remaining queued entries. For large name/number
trees this can make getAll() spend quadratic time in queue management.
Iterate over the queue with for...of instead. Children pushed while
iterating are still visited, and the queue no longer needs repeated
front removals.
Given that these classes are, with the exception of their `decode` methods, virtually identical this helps reduce code duplication and simplifies maintenance.
These changes reduce the size of the `gulp mozcentral` build-target by `1292` bytes, which obviously isn't a lot but still cannot hurt.
This was necessary before charset compilation was implemented, however that's been supported for many years and this is just dead code now.
- PR 9340, back in 2018, stopped using the `raw` field.
- PR 10591, back in 2019, implemented proper charset compilation.
The `CFFFont.prototype.data` should contain a `Uint8Array`, however if compilation failed it was being set to a `Stream` instance which will thus fail elsewhere in the font-code.
*Please note:* This was found by code inspection, since I don't have a PDF document that's fixed by this change.
Return the data as-is from the `CFFCompiler.prototype.compile` method, rather than making a copy of it first.
The reason that it was implemented this way in PR 21053 was to avoid keeping a potentially large `ArrayBuffer` alive, see https://github.com/mozilla/pdf.js/pull/21053#discussion_r3045402988
Having traced all the call-sites in the font-code that directly or indirectly invoke that code, I've now managed to conclude that the compiled CFF-data is never stored on the `Font` instance and using the data as-is thus shouldn't increase permanent memory usage.
This avoids having to "duplicate" dummy `readBlock` methods in a couple of image-stream classes.
Also, move a few `DecodeStream` field definitions to (ever so slightly) shorten the code.
When pages carry explicit pageIndices (e.g. after a delete),
resolve insertAfter against that layout instead of the empty
base sequence. Also reject partial pageIndices combined with
insertAfter, which would race against the extraction's auto-fill.
Many `parseInt` call-sites already provide the `radix` argument, and this rule helps improve consistency in the code-base; see https://eslint.org/docs/latest/rules/radix
*Please note:* The rule is disabled in `src/scripting_api/util.js` for now, since it's not obvious at a glance (at least to me) what the correct `radix` argument should be there.
*Note:* This is similar to PR 19525, which did the same thing for the OpenJPEG decoder.
The advantages of doing this are:
- The same JBig2 decoder is used regardless of WASM being supported or not, which means consistent rendering.
- The old `Jbig2Image` implementation has various bugs and missing features.
- Less code that needs to be maintained in the PDF.js project, since both the CCITT and the JBig2 decoder is replaced.
The disadvantage of doing this is:
- Slightly larger bundle size, however the effect is limited since a fair amount of PDF.js code can be removed. For the `gulp mozcentral` target the size increase is approximately 54 kilo-bytes (which is small compared to the 452 kilo-bytes for the JS version of the OpenJPEG decoder).
Instrument JS files on-the-fly via babel-plugin-istanbul when --coverage
or --coverage-per-test is passed, producing an aggregate lcov/HTML report
at the end of the run. A persistent PDFWorker accumulates worker-thread
coverage alongside the main-thread coverage, collected via a new
GetWorkerCoverage message handler.
With --coverage-per-test, an inverted index
(build/coverage/per-test-index.json) is also built as tests run, mapping
each hit source line and function name to the numeric IDs of the tests
that exercised it, keeping the output compact. The new
`gulp test_search --code=file::line_or_function` tool queries the index,
and passing --code to browsertest pre-filters the test run to only those
tests.
Coverage output formats are selectable via --coverage-formats (default:
info; also accepts html, json, text, cobertura, clover).
With the exception of `glyphsIds` the length of the other segments can be trivially determined upfront, which is obvious in hindsight. This way unnecessary allocations can be avoided when building the "cmap" table.
This helps reduce the amount of boilerplate code needed in multiple spots throughout the font code, and more importantly it'll help when building TrueType tables whose final size is non-trivial to compute upfront.
Compared to the other TrueType table building functions, see previous patches, these ones are not trivial to convert to use TypedArrays properly.
However, in order to simplify the `OpenTypeFileBuilder` implementation a little bit we can at least have these functions return TypedArray data.