calixteman
eab33828a9
Fix wasm url issue for the jbig2 decoder
...
and add a test for jbig2 decoding with the js decoder.
2026-01-04 00:08:59 +01:00
calixteman
98c1955bd4
Use the PDFium JBig2 decoder compiled into wasm
...
The decoder is ~4x faster than the JS decoder on large images.
2026-01-03 22:05:14 +01:00
calixteman
424c7989aa
Get glyph contours when stroking using a pattern
...
Fix issue #20513 (second part).
2025-12-28 22:55:59 +01:00
calixteman
5518c8a544
Use CIDToGIDMap when the font is a type 2 with an OpenType font
...
It fixes #18062.
2025-12-28 14:51:06 +01:00
Tim van der Meij
1990fa7cd0
Merge pull request #20538 from calixteman/issue13425
...
Fix the loca table length when there is enough space for it
2025-12-28 13:52:32 +01:00
calixteman
22932f7b68
Fix the loca table length when there is enough space for it
...
It fixes #13425.
2025-12-28 11:21:40 +01:00
calixteman
1dffcf7f25
Remove undefStack stuff in the cff parser
...
I think it should have been removed with #2527, so it should be useless now.
Because of it, some commands with a wrong number of arguments
weren't stripped out (see the pdf in #13850).
2025-12-27 16:59:29 +01:00
calixteman
91033c2199
Fix the encoding for some missing Chinese fonts
...
It fixes #20489.
2025-12-23 14:05:27 +01:00
Tim van der Meij
d946f05841
Merge pull request #20440 from Gaurang-5/master
...
Fix infinite loop in JBIG2 decoder with >4 referred-to segments
2025-12-09 20:42:51 +01:00
calixteman
f75812b0af
Merge pull request #20346 from ryzokuken/binary-fontpath
...
Encode FontPath data into an ArrayBuffer
2025-12-08 13:59:23 +01:00
Tim van der Meij
de5709a7cd
Merge pull request #20454 from xiaobai2017666/russian-char
...
Extend getGlyphMapForStandardFonts with some Russian entries (issue 20453)
2025-12-07 18:28:41 +01:00
Gaurang Bhatia
ac8d80a8e4
Fix infinite loop in JBIG2 decoder with >4 referred-to segments and add regression test
2025-12-07 06:46:16 +05:30
Ujjwal Sharma
3a85770af1
Encode FontPath data into an ArrayBuffer
...
Serialize FontPath commands into a binary format
and store it in an ArrayBuffer so that it can
eventually be stored in a SharedArrayBuffer.
2025-12-06 03:00:48 +05:30
Weismann
365cc69cae
Extend getGlyphMapForStandardFonts with some Russian entries (issue 20453)
2025-12-01 10:21:27 +08:00
Calixte Denizet
516aea5562
[XFA] Set default max value in occur tag to -1 (bug 1998843)
2025-11-21 17:53:38 +01:00
calixteman
c6b61a34e6
Merge pull request #20436 from calixteman/merge_struct_trees
...
Merge the structure trees coming from different pdfs (bug 1997379)
2025-11-17 20:10:06 +01:00
Calixte Denizet
e13a618df3
Merge the structure trees coming from different pdfs (bug 1997379)
2025-11-17 19:56:36 +01:00
Calixte Denizet
50c48cf11b
Add telemetry for tagged pdfs (bug 1997134)
2025-11-17 19:47:16 +01:00
calixteman
e7288dca8e
Merge pull request #20431 from calixteman/split_merge_p4
...
Add a wrapper for the new xref in order to be able to get some values from cloned dictionaries
2025-11-11 21:47:42 +01:00
Tim van der Meij
bc4d90711a
Merge pull request #20432 from calixteman/version
...
Version entry in the catalog has to be a name and not a string
2025-11-11 20:31:59 +01:00
Calixte Denizet
a98b0b1fb5
Version entry in the catalog has to be a name and not a string
2025-11-09 15:34:57 +01:00
Calixte Denizet
65881f0e21
Add a wrapper for the new xref in order to be able to get some values from cloned dictionaries
2025-11-09 15:28:43 +01:00
Calixte Denizet
37f4712f7e
Update the named page destinations when some pdfs are combined (bug 1997379)
...
and remove link annotations pointing to a deleted page.
2025-11-07 18:22:19 +01:00
Calixte Denizet
ad97c5b816
Update the page labels tree when a pdf is extracted (bug 1997379)
2025-11-07 15:59:57 +01:00
calixteman
85ed401b82
Merge pull request #20409 from calixteman/split_merge_p1
...
Add the possibility to create a pdf from different ones (bug 1997379)
2025-11-07 15:05:52 +01:00
Calixte Denizet
bc87f4e8d6
Add the possibility to create a pdf from different ones (bug 1997379)
...
For now it's only possible to create a single pdf by selecting some pages from different pdf sources.
The merge is pretty basic for now (which is why it's still a WIP); none of the following data are merged yet:
- the struct trees
- the page labels
- the outlines
- named destinations
There are 2 new ref tests where some new pdfs are created: one with some extracted pages and another
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs by selecting the pages we want, and the new images are
taken from the generated pdfs.
2025-11-07 14:57:48 +01:00
Calixte Denizet
04db38558a
Create the number tree for the ParentTree only one time
2025-11-05 17:49:55 +01:00
Tim van der Meij
6e7a6eb52b
Merge pull request #20408 from calixteman/fix_mml_encoding
...
Don't set the MathML namespace for attributes in MathML tags (bug 1997343)
2025-11-01 14:58:15 +01:00
Tim van der Meij
c696648826
Merge pull request #20404 from mozilla/revert-20031-telemetry_signature_certificate
...
Revert "Add some telemetry in order to know what are the certificates used in pdfs (bug 1973573)"
2025-11-01 14:55:07 +01:00
Calixte Denizet
6db23139be
Don't set the MathML namespace for attributes in MathML tags (bug 1997343)
...
And by default an XML file is UTF-8 encoded, so correctly decode the embedded file.
2025-10-30 18:37:19 +01:00
Edoardo Cavazza
4c22b99df3
Collect all child nodes of lists and tables
2025-10-29 17:30:46 +01:00
calixteman
aeceee1df3
Revert "Add some telemetry in order to know what are the certificates used in pdfs (bug 1973573)"
2025-10-29 15:41:34 +01:00
Coelacanthus
6590063614
Add the font PT Astra Serif as a possible substitution for Times New Roman
...
Metric-compatible font with Times New Roman created by ParaType, based on
their serif font PT Serif, released under OFL-1.1 license.
https://www.paratype.com/fonts/pt/pt-astra-serif
Signed-off-by: Coelacanthus <uwu@coelacanthus.name>
2025-10-29 17:15:31 +08:00
calixteman
520363b350
Merge pull request #20384 from calixteman/bug1937438
...
Make MathML elements visible in the struct tree (bug 1937438)
2025-10-23 17:55:42 +02:00
Calixte Denizet
e5a62c8d06
Make MathML elements visible in the struct tree (bug 1937438)
...
It'll help to make math equations "visible" for screen readers.
MS Office has a specific way to add some MathML code to struct tree leaves
and this patch handles it.
2025-10-23 16:29:01 +02:00
calixteman
1a8689b9be
Merge pull request #20340 from Aditi-1400/serialize-pattern-ab
...
Serialize pattern data into ArrayBuffer
2025-10-22 11:05:22 +02:00
Calixte Denizet
199b3d04df
Fix stream use when getting the text (follow-up of #20373)
2025-10-18 22:58:27 +02:00
Calixte Denizet
05f368056d
Use stream for whatever substream in stream classes
...
and add a method in order to get the original stream.
When writing an existing stream, it'll help to have the original one instead of the filtered one.
2025-10-17 22:26:05 +02:00
Calixte Denizet
bb2a1126e6
Use a binary format for the glyph paths
...
We used an SVG string which can be passed to the Path2D constructor, but that's a bit slower than
building the path step by step.
Having numerical data instead of a string will help with the font data serialization.
2025-10-16 15:52:51 +02:00
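The idea of replacing an SVG path string with numerical data can be illustrated with a minimal sketch. The opcode values, the `encodePath` and `replayPath` helpers, and the flat `Float32Array` layout are all assumptions made for illustration; the format actually used by the commit may differ. The replay target is any object exposing the usual `Path2D` drawing methods.

```javascript
// Hypothetical opcodes and argument counts (not taken from the commit).
const OP = { MOVE_TO: 0, LINE_TO: 1, CURVE_TO: 2, CLOSE: 3 };
const ARG_COUNT = [2, 2, 6, 0];

// Pack a list of [op, ...args] commands into a single Float32Array.
function encodePath(commands) {
  let length = 0;
  for (const [op] of commands) {
    length += 1 + ARG_COUNT[op];
  }
  const data = new Float32Array(length);
  let i = 0;
  for (const [op, ...args] of commands) {
    if (args.length !== ARG_COUNT[op]) {
      throw new Error(`Wrong number of arguments for opcode ${op}`);
    }
    data[i++] = op;
    data.set(args, i);
    i += args.length;
  }
  return data;
}

// Replay the packed commands, step by step, on an object that has the
// Path2D drawing methods (a real Path2D in a browser, or a test double).
function replayPath(data, path) {
  for (let i = 0; i < data.length; ) {
    const op = data[i++];
    switch (op) {
      case OP.MOVE_TO:
        path.moveTo(data[i++], data[i++]);
        break;
      case OP.LINE_TO:
        path.lineTo(data[i++], data[i++]);
        break;
      case OP.CURVE_TO:
        path.bezierCurveTo(
          data[i++], data[i++], data[i++], data[i++], data[i++], data[i++]
        );
        break;
      case OP.CLOSE:
        path.closePath();
        break;
      default:
        throw new Error(`Unknown opcode ${op}`);
    }
  }
  return path;
}
```

In a browser, `replayPath(data, new Path2D())` would build the path without any string parsing, and `data.buffer` can be sent between threads as plain binary data.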
Calixte Denizet
aab521327b
Very slightly improve intersector performance
...
It just avoids useless computations.
2025-10-13 14:55:44 +02:00
Aditi
fa631806bf
Serialize pattern data into ArrayBuffer
...
Follow-up to https://github.com/mozilla/pdf.js/pull/20197.
This serializes pattern data into an ArrayBuffer which is
then transferred from the worker to the main thread.
It sets up the stage for us to eventually switch to a
SharedArrayBuffer in the future.
2025-10-11 01:58:07 +05:30
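What "transferred" means for an ArrayBuffer can be sketched as follows. The payload layout and the `serializePattern`/`deserializePattern` helpers are hypothetical, and `structuredClone` with a `transfer` list stands in here for a worker's `postMessage(message, [buffer])`, so that the sketch runs without a worker.

```javascript
// Hypothetical payload: a 32-bit kind tag followed by float coordinates.
function serializePattern(kind, coords) {
  const buffer = new ArrayBuffer(4 + 4 * coords.length);
  new Uint32Array(buffer, 0, 1)[0] = kind;
  new Float32Array(buffer, 4).set(coords);
  return buffer;
}

function deserializePattern(buffer) {
  return {
    kind: new Uint32Array(buffer, 0, 1)[0],
    coords: Array.from(new Float32Array(buffer, 4)),
  };
}

const sentBuffer = serializePattern(2, [0, 0, 100, 50]);

// Transferring moves ownership of the memory instead of copying it:
// the sender's buffer is detached and the receiver gets the bytes.
const receivedBuffer = structuredClone(sentBuffer, { transfer: [sentBuffer] });
const pattern = deserializePattern(receivedBuffer);
```

After the transfer, `sentBuffer.byteLength` is 0 on the sending side, which is the main practical difference from a SharedArrayBuffer, where both sides would keep access to the same memory.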
calixteman
30fdf16071
Merge pull request #20354 from Aditi-1400/use-enum
...
Use enums instead of string for mesh shading figure type
2025-10-10 18:49:50 +02:00
calixteman
0d8a300777
Merge pull request #20353 from calixteman/improve_intersector
...
[Annotation] Improve the performance of the code for getting glyphs which belong to annotation bounding boxes (bug 1987914)
2025-10-10 13:31:03 +02:00
Calixte Denizet
c4d436764c
[Annotation] Improve the performance of the code for getting glyphs which belong to annotation bounding boxes (bug 1987914)
...
Instead of looking at every bbox, we use a grid (64x64) where each cell of the grid is associated with the bboxes
touching it.
In order to get the potential bboxes containing a point, we just have to compute the index of the cell containing
it and, using the association described above, we can quickly know if the point is contained.
With the pdf in the mentioned bug, it's ~20 times faster.
2025-10-10 13:28:18 +02:00
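The grid lookup described in this commit message can be sketched roughly as below. The `BBoxGrid` class, its method names, and the `[minX, minY, maxX, maxY]` box layout are assumptions for illustration and are not taken from the patch; only the 64x64 grid size comes from the message.

```javascript
const GRID_SIZE = 64;

class BBoxGrid {
  // `bboxes` is an array of [minX, minY, maxX, maxY]; `pageBox` has the same shape.
  constructor(bboxes, pageBox) {
    this.bboxes = bboxes;
    this.x0 = pageBox[0];
    this.y0 = pageBox[1];
    this.cellWidth = (pageBox[2] - pageBox[0]) / GRID_SIZE;
    this.cellHeight = (pageBox[3] - pageBox[1]) / GRID_SIZE;
    this.cells = Array.from({ length: GRID_SIZE * GRID_SIZE }, () => []);

    // Register each bbox in every cell it touches.
    bboxes.forEach((bbox, index) => {
      const [cx0, cy0] = this.cellOf(bbox[0], bbox[1]);
      const [cx1, cy1] = this.cellOf(bbox[2], bbox[3]);
      for (let cy = cy0; cy <= cy1; cy++) {
        for (let cx = cx0; cx <= cx1; cx++) {
          this.cells[cy * GRID_SIZE + cx].push(index);
        }
      }
    });
  }

  // Column and row of the cell containing (x, y), clamped to the grid.
  cellOf(x, y) {
    const clamp = v => Math.min(GRID_SIZE - 1, Math.max(0, v));
    return [
      clamp(Math.floor((x - this.x0) / this.cellWidth)),
      clamp(Math.floor((y - this.y0) / this.cellHeight)),
    ];
  }

  // Indices of the bboxes containing the point: only the candidates
  // registered in the point's cell get the exact containment test.
  query(x, y) {
    const [cx, cy] = this.cellOf(x, y);
    return this.cells[cy * GRID_SIZE + cx].filter(index => {
      const [minX, minY, maxX, maxY] = this.bboxes[index];
      return x >= minX && x <= maxX && y >= minY && y <= maxY;
    });
  }
}
```

A lookup then costs one cell computation plus a test against the few boxes registered in that cell, instead of a test against every box on the page.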
Aditi
e8d08c941c
Use enums instead of string for mesh shading figure type
2025-10-10 04:21:03 +05:30
Calixte Denizet
9797dc0eb4
Improve performance of the struct tree build (bug 1987914)
...
For the pdf in bug 1987914, the overall time spent in `addTopLevelNode` drops from ~6s to ~70ms.
2025-10-09 16:08:56 +02:00
Calixte Denizet
19ff148163
Fix incremental saving with hybrid references
...
This patch removes some previous fixes which are now likely fixed by #17636.
Fixes #20302.
2025-10-04 18:31:55 +02:00
Calixte Denizet
4d15bfec0d
Only apply word spacing when there is a 0x20 in the text chunk
...
Fixes #20319.
2025-10-03 22:18:02 +02:00
calixteman
3234912c86
Merge pull request #20224 from james-atticus/improve-serif-fallback-font-name-matching
...
Improve serif fallback font name matching
2025-10-01 19:58:13 +02:00
Ujjwal Sharma
4bed7370f4
[WIP] Serialize font data into an ArrayBuffer
...
This PR serializes font data into an ArrayBuffer
that is then transferred from the worker to the
main thread. It's more efficient than the current
solution, which clones the "export data" object
that includes the font data as a Uint8Array.
It prepares us to switch to a SharedArrayBuffer
in the future, which would allow us to share
the font data with multiple agents; that will be
crucial for the upcoming "renderer" worker.
2025-09-19 12:02:40 +05:30