3406 Commits

Author SHA1 Message Date
Jonas Jenwald
5b368dd58a Remove the Uint8Array.prototype.toHex(), Uint8Array.prototype.toBase64(), and Uint8Array.fromBase64() polyfills
(During rebasing of the previous patches I happened to look at the polyfills and noticed that this one could be removed now.)

See:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/toHex#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/toBase64#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/fromBase64#browser_compatibility

Note that technically this functionality can still be disabled via a preference in Firefox, however that's slated for removal in [bug 1985120](https://bugzilla.mozilla.org/show_bug.cgi?id=1985120).
Looking at the Firefox source-code, see https://searchfox.org/firefox-main/search?q=array.tobase64%28%29&path=&case=false&regexp=false, you can see that it's already being used *unconditionally* elsewhere in the browser hence removing the polyfills ought to be fine (since toggling the preference would break other parts of the browser).
2026-01-29 17:27:43 +01:00
Jonas Jenwald
b3f35b6007 Return the rawFilename as-is even if it's empty, from FileSpec.prototype.serializable
It's more correct to return the `rawFilename` as-is, and limit the fallback for empty filenames to only the `filename` property.
2026-01-29 17:25:44 +01:00
Jonas Jenwald
640a3106d5 Remove caching/shadowing from the FileSpec getters, and simplify the code
Given that only the `FileSpec.prototype.serializable` getter is ever invoked from "outside" of the class, and only once per `FileSpec`-instance, the caching/shadowing isn't actually necessary.

Furthermore the `_contentRef`-caching wasn't actually correct, since it ended up storing a `BaseStream`-instance and those should *generally* never be cached.
(Since calling `BaseStream.prototype.getBytes()` more than once, without resetting the stream in between, will return an empty TypedArray after the first time.)
2026-01-25 13:16:29 +01:00
Jonas Jenwald
84b5866853 Reduce duplication in the pickPlatformItem helper function
Also, tweak code/comment used when handling "GoToR" destinations.
2026-01-25 13:16:06 +01:00
calixteman
ce296d8d42
Add the possibility to order the pages in an extracted pdf (bug 1997379)
or in a merged one.
2026-01-19 18:58:23 +01:00
Calixte Denizet
b5ed988267
Don't use contents stream which have an image format
The original bug has been filled in mupdf bug tracker:
https://bugs.ghostscript.com/show_bug.cgi?id=709033

The attached pdf can be open in Chrome but not in Acrobat.
2026-01-13 18:39:17 +01:00
calixteman
eab33828a9
Fix wasm url issue for the jbig2 decoder
and add a test for jbig2 decoding with the js decoder.
2026-01-04 00:08:59 +01:00
calixteman
98c1955bd4
Use the PDFium JBig2 decoder compiled into wasm
The decoder is ~4x faster than the JS decoder on large images.
2026-01-03 22:05:14 +01:00
calixteman
424c7989aa
Get glyph contours when stroking using a pattern
Fix issue #20513 (second part).
2025-12-28 22:55:59 +01:00
calixteman
5518c8a544
Use CIDToGIDMap when the font is a type 2 with an OpenType font
It fixes #18062.
2025-12-28 14:51:06 +01:00
Tim van der Meij
1990fa7cd0
Merge pull request #20538 from calixteman/issue13425
Fix the loca table length when there is enough space for it
2025-12-28 13:52:32 +01:00
calixteman
22932f7b68
Fix the loca table length when there is enough space for it
It fixes #13425.
2025-12-28 11:21:40 +01:00
calixteman
1dffcf7f25
Remove undefStack stuff in the cff parser
I think it should have been removed with #2527 so it should be useless now.
Because of that stuff, some commands with a wrong number of arguments
weren't stripped out (see the pdf in #13850).
2025-12-27 16:59:29 +01:00
calixteman
91033c2199
Fix the encoding for some missing chinese fonts
It fixes #20489.
2025-12-23 14:05:27 +01:00
Tim van der Meij
d946f05841
Merge pull request #20440 from Gaurang-5/master
Fix infinite loop in JBIG2 decoder with >4 referred-to segments
2025-12-09 20:42:51 +01:00
calixteman
f75812b0af
Merge pull request #20346 from ryzokuken/binary-fontpath
Encode FontPath data into an ArrayBuffer
2025-12-08 13:59:23 +01:00
Tim van der Meij
de5709a7cd
Merge pull request #20454 from xiaobai2017666/russian-char
Extend getGlyphMapForStandardFonts with some Russian entries (issue 20453)
2025-12-07 18:28:41 +01:00
Gaurang Bhatia
ac8d80a8e4 Fix infinite loop in JBIG2 decoder with >4 referred-to segments and add regression test 2025-12-07 06:46:16 +05:30
Ujjwal Sharma
3a85770af1 Encode FontPath data into an ArrayBuffer
Serialize FontPath commands into a binary format
and store it in an ArrayBuffer so that it can
eventually be stored in a SharedArrayBuffer.
2025-12-06 03:00:48 +05:30
Weismann
365cc69cae Extend getGlyphMapForStandardFonts with some Russian entries (issue 20453) 2025-12-01 10:21:27 +08:00
Calixte Denizet
516aea5562 [XFA] Set default max value in occur tag to -1 (bug 1998843) 2025-11-21 17:53:38 +01:00
calixteman
c6b61a34e6
Merge pull request #20436 from calixteman/merge_struct_trees
Merge the structure trees coming from different pdfs (bug 1997379)
2025-11-17 20:10:06 +01:00
Calixte Denizet
e13a618df3 Merge the structure trees coming from different pdfs (bug 1997379) 2025-11-17 19:56:36 +01:00
Calixte Denizet
50c48cf11b Add telemetry for tagged pdfs (bug 1997134) 2025-11-17 19:47:16 +01:00
calixteman
e7288dca8e
Merge pull request #20431 from calixteman/split_merge_p4
Add a wrapper for the new xref in order to be able to get some values from cloned dictionaries
2025-11-11 21:47:42 +01:00
Tim van der Meij
bc4d90711a
Merge pull request #20432 from calixteman/version
Version entry in the catalog has to be a name and not a string
2025-11-11 20:31:59 +01:00
Calixte Denizet
a98b0b1fb5 Version entry in the catalog has to be a name and not a string 2025-11-09 15:34:57 +01:00
Calixte Denizet
65881f0e21 Add a wrapper for the new xref in order to be able to get some values from cloned dictionaries 2025-11-09 15:28:43 +01:00
Calixte Denizet
37f4712f7e Update the named page destinations when some pdf are combined (bug 1997379)
and remove link annotations pointing on a deleted page.
2025-11-07 18:22:19 +01:00
Calixte Denizet
ad97c5b816 Update the page labels tree when a pdf is extracted (bug 1997379) 2025-11-07 15:59:57 +01:00
calixteman
85ed401b82
Merge pull request #20409 from calixteman/split_merge_p1
Add the possibility to create a pdf from different ones (bug 1997379)
2025-11-07 15:05:52 +01:00
Calixte Denizet
bc87f4e8d6 Add the possibility to create a pdf from different ones (bug 1997379)
For now it's just possible to create a single pdf in selecting some pages in different pdf sources.
The merge is for now pretty basic (it's why it's still a WIP) none of these data are merged for now:
 - the struct trees
 - the page labels
 - the outlines
 - named destinations
For there are 2 new ref tests where some new pdfs are created: one with some extracted pages and an other
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs in selecting the page we want and the new images are
taken from the generated pdfs.
2025-11-07 14:57:48 +01:00
Calixte Denizet
04db38558a Create the number tree for the ParentTree only one time 2025-11-05 17:49:55 +01:00
Tim van der Meij
6e7a6eb52b
Merge pull request #20408 from calixteman/fix_mml_encoding
Don't set the MathML namespace for attributes in MathML tags (bug 1997343)
2025-11-01 14:58:15 +01:00
Tim van der Meij
c696648826
Merge pull request #20404 from mozilla/revert-20031-telemetry_signature_certificate
Revert "Add some telemetry in order to know what are the certificates used in pdfs (bug 1973573)"
2025-11-01 14:55:07 +01:00
Calixte Denizet
6db23139be Don't set the MathML namespace for attributes in MathML tags (bug 1997343)
And by default a XML file is UTF-8 encoded so correctly decode the embedded file.
2025-10-30 18:37:19 +01:00
Edoardo Cavazza
4c22b99df3 Collect all child nodes of lists and tables 2025-10-29 17:30:46 +01:00
calixteman
aeceee1df3
Revert "Add some telemetry in order to know what are the certificates used in pdfs (bug 1973573)" 2025-10-29 15:41:34 +01:00
Coelacanthus
6590063614
Add the font PT Astra Serif as a possible substitution for Times New Roman
Metric-compatible font with Times New Roman created by ParaType, based on
their serif font PT Serif, released under OFL-1.1 license.

https://www.paratype.com/fonts/pt/pt-astra-serif

Signed-off-by: Coelacanthus <uwu@coelacanthus.name>
2025-10-29 17:15:31 +08:00
calixteman
520363b350
Merge pull request #20384 from calixteman/bug1937438
Make MathML elements visible in the struct tree (bug 1937438)
2025-10-23 17:55:42 +02:00
Calixte Denizet
e5a62c8d06 Make MathML elements visible in the struct tree (bug 1937438)
It'll help to make math equations "visible" for screen readers.
MS Office has a specific way to add some MathML code to struc tree leaf
and this patch handles it.
2025-10-23 16:29:01 +02:00
calixteman
1a8689b9be
Merge pull request #20340 from Aditi-1400/serialize-pattern-ab
Serialize pattern data into ArrayBuffer
2025-10-22 11:05:22 +02:00
Calixte Denizet
199b3d04df Fix stream use when getting the text (follow-up of #20373) 2025-10-18 22:58:27 +02:00
Calixte Denizet
05f368056d Use stream for whatever substrem in stream classes
and add a method in order to get the original stream.
When writing an existing stream it'll help to have the original one instead of the filtered one.
2025-10-17 22:26:05 +02:00
Calixte Denizet
bb2a1126e6 Use a binary format for the glyph paths
We used a SVG string which can be pass to the Path2D ctor but it's a bit slower than
building the path step by step.
Having numerical data instead of a string will help the font data serialization.
2025-10-16 15:52:51 +02:00
Calixte Denizet
aab521327b Very slightly improve intersector performance
It just avoid useless computations.
2025-10-13 14:55:44 +02:00
Aditi
fa631806bf Serialize pattern data into ArrayBuffer
Follow up on https://github.com/mozilla/pdf.js/pull/20197,
This serializes pattern data into an ArrayBuffer which is
then transferred from the worker to the main thread.

It sets up the stage for us to eventually switch to a
SharedArrayBuffer in the future.
2025-10-11 01:58:07 +05:30
calixteman
30fdf16071
Merge pull request #20354 from Aditi-1400/use-enum
Use enums instead of string for mesh shading figure type
2025-10-10 18:49:50 +02:00
calixteman
0d8a300777
Merge pull request #20353 from calixteman/improve_intersector
[Annotation] Improve the performance of the code for getting glyphs which belongs to annotations bounding boxes (bug 1987914)
2025-10-10 13:31:03 +02:00
Calixte Denizet
c4d436764c [Annotation] Improve the performance of the code for getting glyphs which belongs to annotations bounding boxes (bug 1987914)
Instead of looking at every bbox, we use a grid (64x64) where each cell of the grid is associated with the bboxes
touching it.
In order to get the potential bboxes containing a point, we just have to compute the number of the cell containing
it and in using the associated described above, we can quickly know if the point is contained.
With the pdf in the mentioned bug, it's ~20 times faster.
2025-10-10 13:28:18 +02:00