3675 Commits

Author SHA1 Message Date
Jonas Jenwald
e5330f06fa Move the stringToPDFString helper function into the src/core/string_utils.js file
Given that this function is only ever used during *parsing* of the PDF document, which happens in the worker-thread, this has always added (a little bit of) dead code in the built `pdf.mjs` file.
2026-05-15 12:10:30 +02:00
Jonas Jenwald
7a7e7049c1 Shorten the isAscii helper function a tiny bit 2026-05-15 11:56:33 +02:00
Jonas Jenwald
153cef615e Move a couple of src/core/ string helper functions into their own file
Given that the various utility-files naturally increase in size over time, it shouldn't hurt to shorten `src/core/core_utils.js` a little bit by moving a few of its string helper functions to their own file.
2026-05-15 11:49:54 +02:00
Tim van der Meij
26dc195a65
Collect coverage information for the integration tests
Note that for the integration tests the coverage information ends up
being processed in the Node.js context where `window` is not available,
so we use `globalThis` instead for the function that merges each
test's coverage information into the global object because that is
available in all contexts we support. For clarity we also rename said
function since we're not exclusively dealing with `window` nor worker
data anymore.
2026-05-14 12:34:12 +02:00
Jonas Jenwald
5bc5791a86
Merge pull request #21257 from Snuffleupagus/deepCompare-Refs
Update the `deepCompare` helper function to handle `Ref`s and `Name`s correctly
2026-05-12 11:53:02 +02:00
Jonas Jenwald
aecb571ea6 Move the getModificationDate helper function into src/core/core_utils.js
Given that this function is only ever used in `src/core/` code, let's avoid a little bit of dead code in the *built* `pdf.mjs` file.

Also, place the `AnnotationPrefix` and `AnnotationEditorPrefix` constants together in `src/shared/util.js` since that should aid readability.
2026-05-11 14:13:23 +02:00
Jonas Jenwald
326df1f711 Update the deepCompare helper function to handle Refs and Names correctly
Note that `Ref`s and `Name`s are cached globally[1], since that helps reduce object creation (a lot) during parsing.
That cache will be cleared after a period of inactivity in the viewer[2], which is why those primitives cannot *safely* be compared with just `===`/`!==` and also (partially) why abstractions such as `RefSet`/`RefSetCache` are necessary.

Currently `deepCompare` doesn't handle `Ref`s and `Name`s correctly, which may lead to future *intermittent* bugs in any code using the `deepCompare` helper function.

---

[1] This applies to `Cmd` as well, however that doesn't matter in the context of this patch.

[2] Currently, and for more than a decade, set to 30 seconds.
2026-05-11 13:18:54 +02:00
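The identity problem described in 326df1f711 can be illustrated with a minimal sketch. The `Ref` class, its cache, `clearCache` and `isRefsEqual` below are simplified stand-ins written for this illustration; they are assumptions, not the actual PDF.js implementation.

```javascript
// Hypothetical, simplified stand-in for a globally cached primitive.
class Ref {
  static #cache = new Map();

  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }

  // Return the cached instance for (num, gen), creating it if needed.
  static get(num, gen) {
    const key = `${num}R${gen}`;
    let ref = Ref.#cache.get(key);
    if (!ref) {
      ref = new Ref(num, gen);
      Ref.#cache.set(key, ref);
    }
    return ref;
  }

  // Simulates the cleanup that runs after a period of inactivity.
  static clearCache() {
    Ref.#cache.clear();
  }
}

// Compare by value rather than by object identity.
function isRefsEqual(a, b) {
  return a.num === b.num && a.gen === b.gen;
}

const before = Ref.get(10, 0);
const sameCache = Ref.get(10, 0);
Ref.clearCache();
const after = Ref.get(10, 0);

console.log(before === sameCache); // true: same cached object
console.log(before === after); // false: identity is lost once the cache is cleared
console.log(isRefsEqual(before, after)); // true: the values still match
```

Under these assumptions, any comparison helper that falls back to `===` for such objects would give different answers depending on whether the cache was cleared between the two lookups.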
Tim van der Meij
702d60aa18
Merge pull request #21230 from calixteman/avoid_cycles
Avoid cycles when getting operator list in patterns
2026-05-10 18:15:01 +02:00
Tim van der Meij
3b58a339c8
Merge pull request #21213 from saripovdenis/perf-name-tree-getall-queue-index
perf: Avoid multi-second getDestinations stalls for PDFs with many named destinations
2026-05-10 18:13:12 +02:00
Calixte Denizet
29fcf0aa76
Avoid cycles when getting operator list in patterns 2026-05-07 22:30:51 +02:00
Calixte Denizet
b39440b6e0
Simplify '#getFilteredPageIndices' and '#resolveInsertAfterIndices' 2026-05-07 21:41:37 +02:00
Tim van der Meij
e81507c167
Merge pull request #21228 from calixteman/bug2027682
Place new annotations on the correct page when extracting pages (bug 2027682)
2026-05-07 21:12:15 +02:00
Calixte Denizet
4c62a49483
Place new annotations on the correct page when extracting pages (bug 2027682) 2026-05-06 18:44:02 +02:00
Jonas Jenwald
3f6a2feef6 Tweak the WasmImage implementation a little bit (PR 21225 follow-up)
This fixes two things that I overlooked in PR 21225, more specifically:

 - Use proper, rather than semi, private class fields in `WasmImage`.

 - Make tracking of `WasmImage` instances optional, to avoid keeping data alive permanently in the `IMAGE_DECODERS` build.
2026-05-06 17:52:35 +02:00
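For readers unfamiliar with the distinction drawn in 3f6a2feef6, here is a small sketch of "semi" private (underscore-prefixed) versus proper private (`#`) class fields. The class and field names are hypothetical and are not taken from `WasmImage`.

```javascript
class SemiPrivateImage {
  _data = new Uint8Array(4); // "private" by naming convention only
}

class PrivateImage {
  #data = new Uint8Array(4); // privacy enforced by the language

  get length() {
    return this.#data.length;
  }
}

console.log(Object.keys(new SemiPrivateImage())); // [ '_data' ]
console.log(Object.keys(new PrivateImage())); // []
console.log(new PrivateImage().length); // 4
```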
saripovdenis
473f9b4592 Avoid quadratic traversal in NameOrNumberTree.getAll
Using Array.prototype.shift() to drain the traversal queue makes each
visited node move the remaining queued entries. For large name/number
trees this can make getAll() spend quadratic time in queue management.

Iterate over the queue with for...of instead. Children pushed while
iterating are still visited, and the queue no longer needs repeated
front removals.
2026-05-06 09:51:57 +08:00
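The queue change described in 473f9b4592 can be sketched roughly as follows. The `{ value, kids }` node shape and both function names are assumptions made for this example; they do not mirror the `NameOrNumberTree` code.

```javascript
// Hypothetical tree node shape: { value, kids } (not the PDF.js data model).
function collectWithShift(root) {
  const queue = [root];
  const values = [];
  while (queue.length > 0) {
    // Each shift() removes the front entry, which may require moving
    // all remaining entries down by one position.
    const node = queue.shift();
    values.push(node.value);
    queue.push(...(node.kids ?? []));
  }
  return values;
}

function collectWithForOf(root) {
  const queue = [root];
  const values = [];
  // The array iterator re-checks the length on every step, so entries
  // pushed during iteration are still visited.
  for (const node of queue) {
    values.push(node.value);
    queue.push(...(node.kids ?? []));
  }
  return values;
}

const tree = {
  value: "root",
  kids: [
    { value: "a", kids: [{ value: "a1" }, { value: "a2" }] },
    { value: "b", kids: [{ value: "b1" }] },
  ],
};

console.log(collectWithForOf(tree)); // [ 'root', 'a', 'b', 'a1', 'a2', 'b1' ]
```

Both versions visit nodes in the same breadth-first order. The trade-off in the `for...of` version is that visited entries stay in the array until the traversal finishes.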
Jonas Jenwald
6ff0f8690f Add an abstract WasmImage class, that JBig2CCITTFaxImage and JpxImage inherit from
Given that these classes are, with the exception of their `decode` methods, virtually identical this helps reduce code duplication and simplifies maintenance.

These changes reduce the size of the `gulp mozcentral` build-target by `1292` bytes, which obviously isn't a lot but still cannot hurt.
2026-05-05 17:25:18 +02:00
Jonas Jenwald
ac6a9230d1 Replace TrueTypeTableBuilder and CompilerOutput with a single class
Given that both of these classes are so similar, let's replace them with a single `DataBuilder` class instead to reduce unnecessary code-duplication.
2026-05-04 15:01:53 +02:00
Jonas Jenwald
53fd89682c Remove the unused raw field from the CFFCharset class
This was necessary before charset compilation was implemented, however that's been supported for many years and this is just dead code now.
 - PR 9340, back in 2018, stopped using the `raw` field.
 - PR 10591, back in 2019, implemented proper charset compilation.
2026-05-03 18:51:24 +02:00
Jonas Jenwald
027671e6dc Replace a loop with TypedArray.prototype.set() in the compileFDSelect method
Given that the `fdSelect.fdSelect` data is a regular Array, this code can be simplified a tiny bit.
2026-05-03 16:32:48 +02:00
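As a sketch of the kind of simplification 027671e6dc refers to: `TypedArray.prototype.set()` accepts a regular Array and a target offset, so an element-by-element copy loop can be replaced by one call. The data and the one-byte header below are made up for illustration.

```javascript
// Hypothetical data: a plain Array of font-dictionary indices.
const fdSelect = [0, 1, 1, 2, 0];
const headerLength = 1; // assumed one-byte header, for illustration only

// Loop version: copy one element at a time.
const viaLoop = new Uint8Array(headerLength + fdSelect.length);
for (let i = 0; i < fdSelect.length; i++) {
  viaLoop[headerLength + i] = fdSelect[i];
}

// set() version: copy the whole Array starting at the given offset.
const viaSet = new Uint8Array(headerLength + fdSelect.length);
viaSet.set(fdSelect, headerLength);

console.log(Array.from(viaSet)); // [ 0, 0, 1, 1, 2, 0 ]
```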
Jonas Jenwald
e5e82b9617 Don't create a DataView for the "CFF " TrueType table in readTableEntry
Given that the "CFF " table may be replaced completely during font-parsing, it doesn't make sense to read and/or modify it piecewise.
2026-05-03 13:17:23 +02:00
Jonas Jenwald
b65eedc636 Set the correct data if compilation fails in the CFFFont constructor
The `CFFFont.prototype.data` should contain a `Uint8Array`, however if compilation failed it was being set to a `Stream` instance which will thus fail elsewhere in the font-code.

*Please note:* This was found by code inspection, since I don't have a PDF document that's fixed by this change.
2026-05-03 13:17:18 +02:00
Jonas Jenwald
521f4dc554 Remove the CompilerOutput.prototype.finalData getter (PR 21053 follow-up)
Return the data as-is from the `CFFCompiler.prototype.compile` method, rather than making a copy of it first.
The reason that it was implemented this way in PR 21053 was to avoid keeping a potentially large `ArrayBuffer` alive, see https://github.com/mozilla/pdf.js/pull/21053#discussion_r3045402988

Having traced all the call-sites in the font-code that directly or indirectly invoke that code, I've now managed to conclude that the compiled CFF-data is never stored on the `Font` instance and using the data as-is thus shouldn't increase permanent memory usage.
2026-05-03 13:13:50 +02:00
Jonas Jenwald
a8715f6f96 Don't provide unused /DecodeParms when initializing JpxStream 2026-05-02 12:20:28 +02:00
Jonas Jenwald
adf07ea51c
Merge pull request #21200 from Snuffleupagus/Intersector-grid-push
Shorten how intersectors are added to the grid in the `Intersector` constructor
2026-04-30 12:56:38 +02:00
Jonas Jenwald
4a5c455c0b Shorten how intersectors are added to the grid in the Intersector constructor
Thanks to modern JavaScript features this code can be simplified a tiny bit.
2026-04-30 12:06:08 +02:00
Jonas Jenwald
f26b98c7c4 Simplify the nextChunk handling in the DecryptStream class
This is old code, that can be simplified a tiny bit with modern JavaScript features.
2026-04-30 11:40:34 +02:00
Jonas Jenwald
1f6bfa0890 Add an abstract readBlock method in the DecodeStream class
This avoids having to "duplicate" dummy `readBlock` methods in a couple of image-stream classes.
Also, move a few `DecodeStream` field definitions to (ever so slightly) shorten the code.
2026-04-29 13:02:15 +02:00
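The pattern in 1f6bfa0890 can be sketched as below, assuming heavily simplified stand-in classes. The class names, the error message and the `decodeImage` method are hypothetical; the real `DecodeStream` carries far more state.

```javascript
// Simplified stand-in for a base stream class.
class BaseDecodeStream {
  eof = false;

  // "Abstract" method: subclasses that decode block-wise override it.
  readBlock() {
    throw new Error("Abstract method `readBlock` called");
  }
}

// A block-wise decoder provides its own implementation.
class CountingStream extends BaseDecodeStream {
  blocksRead = 0;

  readBlock() {
    this.blocksRead++;
    this.eof = this.blocksRead >= 3;
  }
}

// An image-style stream decodes everything in one go, and no longer
// needs to declare a dummy `readBlock` of its own.
class ImageLikeStream extends BaseDecodeStream {
  decodeImage() {
    this.eof = true;
    return new Uint8Array(0);
  }
}

const counting = new CountingStream();
while (!counting.eof) {
  counting.readBlock();
}
console.log(counting.blocksRead); // 3
```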
Jonas Jenwald
3475806311 Convert Catalog.prototype.getPageIndex to an asynchronous method
This simplifies/shortens a piece of old code, which shouldn't hurt.
2026-04-28 11:34:41 +02:00
Jonas Jenwald
339f755a52 Add more validation in the Catalog.prototype.getPageIndex method
 - Ensure that the /Kids-entries are Arrays, before trying to iterate through them.
 - Ensure that the /Count-entries are (positive) integers.
2026-04-28 11:33:50 +02:00
Calixte Denizet
c9a7ff0506 Fix merging PDFs with conflicting AcroForm /DR (bug 2035197) 2026-04-27 18:54:52 +02:00
Calixte Denizet
64b25a8f47
Fix merging a PDF after a page deletion (bug 2034804)
When pages carry explicit pageIndices (e.g. after a delete),
resolve insertAfter against that layout instead of the empty
base sequence. Also reject partial pageIndices combined with
insertAfter, which would race against the extraction's auto-fill.
2026-04-26 22:37:22 +02:00
Jonas Jenwald
5c11bf15b0
Merge pull request #21160 from Snuffleupagus/more-hexNumbers
Move the `hexNumbers` Array into `Util`, to enable using it in the viewer
2026-04-26 13:08:42 +02:00
Jonas Jenwald
9b238b9719 Move the hexNumbers Array into Util, to enable using it in the viewer
This reduces some code duplication, and the new `Util.hexNums` property is now computed lazily.
2026-04-26 12:05:12 +02:00
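One way a lazily computed static property such as the one mentioned in 9b238b9719 could look is sketched here. The `HexUtil` class and its `makeHexColor` method are hypothetical, and the `Object.defineProperty` approach is an assumption rather than the mechanism PDF.js uses.

```javascript
// Hypothetical sketch of a lazily computed static lookup table.
class HexUtil {
  static get hexNums() {
    const table = Array.from(Array(256).keys(), n =>
      n.toString(16).padStart(2, "0")
    );
    // Replace the getter with a plain data property, so that the table
    // is built only on first access.
    Object.defineProperty(HexUtil, "hexNums", { value: table });
    return table;
  }

  static makeHexColor(r, g, b) {
    const hex = HexUtil.hexNums;
    return `#${hex[r]}${hex[g]}${hex[b]}`;
  }
}

console.log(HexUtil.makeHexColor(255, 128, 0)); // #ff8000
```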
Tim van der Meij
2674a9f3e4
Merge pull request #21137 from calixteman/bug2022700
Don't decode name of the checkboxes exported values (bug 2022700)
2026-04-26 12:00:58 +02:00
Jonas Jenwald
e6dba6ee34 Enable the radix ESLint rule
Many `parseInt` call-sites already provide the `radix` argument, and this rule helps improve consistency in the code-base; see https://eslint.org/docs/latest/rules/radix

*Please note:* The rule is disabled in `src/scripting_api/util.js` for now, since it's not obvious at a glance (at least to me) what the correct `radix` argument should be there.
2026-04-25 12:13:12 +02:00
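A brief illustration of why an explicit `radix` matters, relating to e6dba6ee34. The `parseDecimal` helper is a hypothetical name used only in this example.

```javascript
// Hypothetical helper that always parses as base 10.
function parseDecimal(str) {
  return parseInt(str, 10);
}

// Without a radix, a "0x" prefix switches parseInt to base 16.
console.log(parseInt("0x10")); // 16
// With an explicit radix of 10, parsing stops at the first non-digit ("x").
console.log(parseDecimal("0x10")); // 0
// An explicit radix also documents the intent for other bases.
console.log(parseInt("ff", 16)); // 255
console.log(parseInt("101", 2)); // 5
```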
Jonas Jenwald
aa7289d28b Remove the unused MIN_INT_32 constant (PR 21139 follow-up) 2026-04-24 13:29:22 +02:00
calixteman
25204d359a
Merge pull request #21136 from calixteman/bug2033908
Avoid to add outlines having a deleted page which leads to clone a useless page (bug 2033908)
2026-04-23 22:24:58 +02:00
Jonas Jenwald
a6988582d2 [api-minor] Replace the CCITT and JBig2 fallback decoders with a JS version of the PDFium decoder
*Note:* This is similar to PR 19525, which did the same thing for the OpenJPEG decoder.

The advantages of doing this are:
 - The same JBig2 decoder is used regardless of WASM being supported or not, which means consistent rendering.
 - The old `Jbig2Image` implementation has various bugs and missing features.
 - Less code that needs to be maintained in the PDF.js project, since both the CCITT and the JBig2 decoders are replaced.

The disadvantage of doing this is:
 - Slightly larger bundle size, however the effect is limited since a fair amount of PDF.js code can be removed. For the `gulp mozcentral` target the size increase is approximately 54 kilo-bytes (which is small compared to the 452 kilo-bytes for the JS version of the OpenJPEG decoder).
2026-04-22 23:24:26 +02:00
Calixte Denizet
42ccca7ee8
Don't decode name of the checkboxes exported values (bug 2022700) 2026-04-22 18:30:43 +02:00
Calixte Denizet
a52c8334f5 Avoid to add outlines having a deleted page which leads to clone a useless page (bug 2033908) 2026-04-21 22:23:28 +02:00
calixteman
3aab546524 Add code coverage support for browser/ref tests
Instrument JS files on-the-fly via babel-plugin-istanbul when --coverage
or --coverage-per-test is passed, producing an aggregate lcov/HTML report
at the end of the run. A persistent PDFWorker accumulates worker-thread
coverage alongside the main-thread coverage, collected via a new
GetWorkerCoverage message handler.

With --coverage-per-test, an inverted index
(build/coverage/per-test-index.json) is also built as tests run, mapping
each hit source line and function name to the numeric IDs of the tests
that exercised it, keeping the output compact. The new
`gulp test_search --code=file::line_or_function` tool queries the index,
and passing --code to browsertest pre-filters the test run to only those
tests.

Coverage output formats are selectable via --coverage-formats (default:
info; also accepts html, json, text, cobertura, clover).
2026-04-20 21:46:18 +02:00
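The inverted index described in 3aab546524 could be built along the following lines. This is only a sketch: the input shape, the function names and the example paths are assumptions, and the real layout of `per-test-index.json` may differ.

```javascript
// Hypothetical input: perTestHits[testId] lists the "file::line_or_function"
// locations that were hit by the test with that numeric ID.
function buildInvertedIndex(perTestHits) {
  const index = new Map();
  perTestHits.forEach((locations, testId) => {
    // De-duplicate so that each test ID is recorded once per location.
    for (const location of new Set(locations)) {
      if (!index.has(location)) {
        index.set(location, []);
      }
      index.get(location).push(testId);
    }
  });
  return index;
}

// Look up which tests exercised a given location.
function testsForCode(index, code) {
  return index.get(code) ?? [];
}

const coverageIndex = buildInvertedIndex([
  ["src/core/catalog.js::getPageIndex", "src/core/catalog.js::120"],
  ["src/core/catalog.js::120"],
  ["src/core/fonts.js::createCmapTable"],
]);

console.log(testsForCode(coverageIndex, "src/core/catalog.js::120")); // [ 0, 1 ]
```

Storing numeric test IDs rather than test names keeps each entry small, which matches the stated goal of a compact output.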
Calixte Denizet
db89d3a0e6 Correctly compute the bbox when simplifying the path construction
It fixes #21126.
2026-04-20 18:42:09 +02:00
Jonas Jenwald
c155a86733 Store the Type1 program privateData in a Map, rather than an Object
This is nicer when checking if fields exist in `Type1Font.prototype.wrap`, and a couple of loops in that method are also "modernized" slightly.
2026-04-18 12:32:22 +02:00
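To illustrate why a `Map` can be nicer than an Object for existence checks, as noted in c155a86733, consider this small sketch. The field values are invented for the example, and the actual `privateData` contents are not shown in this log.

```javascript
// Hypothetical field values, for illustration only.
const asObject = { BlueValues: [0, 10] };
const asMap = new Map([["BlueValues", [0, 10]]]);

// With a plain Object, `in` also finds inherited properties.
console.log("BlueValues" in asObject); // true
console.log("toString" in asObject); // true (inherited from Object.prototype)

// With a Map, has() only reports keys that were explicitly set.
console.log(asMap.has("BlueValues")); // true
console.log(asMap.has("toString")); // false
```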
Jonas Jenwald
92a0a91046 Pre-compute the length of more intermediate tables in createCmapTable (PR 21103 follow-up)
With the exception of `glyphsIds` the length of the other segments can be trivially determined upfront, which is obvious in hindsight. This way unnecessary allocations can be avoided when building the "cmap" table.
2026-04-16 11:46:59 +02:00
Jonas Jenwald
0a4e8d024d Use TypedArrays in the createNameTable function 2026-04-16 11:46:57 +02:00
Jonas Jenwald
cb935c35d3 Use TypedArrays in the createCmapTable function 2026-04-14 20:36:34 +02:00
Jonas Jenwald
f9ecebe63c Add a helper class for building TrueType font tables
This helps reduce the amount of boilerplate code needed in multiple spots throughout the font code, and more importantly it'll help when building TrueType tables whose final size is non-trivial to compute upfront.
2026-04-14 20:36:34 +02:00
Jonas Jenwald
634ce3c163 Convert the return value in createCmapTable and createNameTable to a TypedArray
Compared to the other TrueType table building functions, see previous patches, these ones are not trivial to convert to use TypedArrays properly.
However, in order to simplify the `OpenTypeFileBuilder` implementation a little bit we can at least have these functions return TypedArray data.
2026-04-14 12:28:45 +02:00
Jonas Jenwald
e8ed6c6e24 Use a TypedArray in the createOS2Table function 2026-04-14 10:43:42 +02:00
Jonas Jenwald
aa0bc24e95 Use a TypedArray in the createPostTable function 2026-04-14 10:43:42 +02:00