For now it's just possible to create a single pdf in selecting some pages in different pdf sources.
The merge is for now pretty basic (it's why it's still a WIP) none of these data are merged for now:
- the struct trees
- the page labels
- the outlines
- named destinations
For there are 2 new ref tests where some new pdfs are created: one with some extracted pages and an other
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs in selecting the page we want and the new images are
taken from the generated pdfs.
This PR changes the way we store bounding boxes so that they use less
memory and can be more easily shared across threads in the future.
Instead of storing the bounding box and list of dependencies for each
operation that renders _something_, we now only store the bounding box
of _every_ operation and no dependencies list. The bounding box of
each operation covers the bounding box of all the operations affected
by it that render something. For example, the bounding box of a
`setFont` operation will be the bounding box of all the `showText`
operations that use that font.
This affects the debugging experience in pdfBug, since now the bounding
box of an operation may be larger than what it renders itself. To help
with this, now when hovering on an operation we also highlight (in red)
all its dependents. We highlight with white stripes operations that do
not affect any part of the page (i.e. with an empty bbox).
To save memory, we now save bounding box x/y coordinates as uint8
rather than float64. This effectively gives us a 256x256 uniform grid
that covers the page, which is high enough resolution for the usecase.
This commit is a first step towards #6419, and it can also help with
first compute which ops can affect what is visible in that part of
the page.
This commit adds logic to track operations with their respective
bounding boxes. Only operations that actually cause something to
be rendered have a bounding box and dependencies.
Consider the following example:
```
0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...] -> eoFill
```
here we have three rendering operations: the showText op (2) and the
path (4). (2) depends on (0), (1) and (3), while (4) only depends on
(0). Both (2) and (4) have a bounding box.
This tracking happens when first rendering a PDF: we then use the
recorded information to optimize future partial renderings of a PDF, so
that we can skip operations that do not affected the PDF area on the
canvas.
All this logic only runs when the new `enableOptimizedPartialRendering`
preference, disabled by default, is enabled.
The bounding boxes and dependencies are also shown in the pdfBug
stepper. When hovering over a step now:
- it highlights the steps that they depend on
- it highlights on the PDF itself the bounding box
It fixes#20065.
The only to get a path (from the path generator) is when the font is embedded.
So when we need a path (disableFontFace: true or when we want to use a pattern for stroking/filling), it's impossible
to fulfil.
Necessary because when there is no Popup annotation created along
with a Text annotation, the Popup annotation created by pdf.js
does not receive the noRotate flag
The shadow was taken into account when computing the bounding box of the section
containing the link and it was making the clip path wrong.
Since the shadow is almost invisible because of the opacity, the yellow color and the clip
we can remove it without causing any visual regressions (and as a side effect it'll avoid
to use resources to compute it when displayed).
We have a fallback for the common case of Type3 fonts without a /FontDescriptor dictionary, however we also need to handle the case where it's present but lacking the required /FontName entry.
In the referenced PDF document the keys, in the /Dests dictionary, need to account for PDFDocEncoding.
To improve destination handling in general we'll now unconditionally fallback to always checking all destinations.
For simplicity we will abort /Form XObject parsing *immediately* when encountering a circular reference, rather than letting it continue up until some limit (as e.g. PDFium appears to do), which should be fine since there are never any guarantees if/how *corrupt* PDF documents will render.
Previously we'd only do this for Type1/CFF fonts, see e.g. PR 6736, since the font-program may update the /FontMatrix.
However, it seems that we should do this unconditionally to account for fonts with non-default /FontMatrix-entries in the font-dictionary (which seem to be pretty rare).
In PDF version 2.0 the handling of Indexed color spaces was clarified as follows:
> The index value should be an integer in the range 0 to hival. If the value is a real number, it shall be rounded to the nearest integer (0.5 values shall be rounded up); if it is outside the range 0 to hival, it shall be adjusted to the nearest value within that range.
Please refer to https://github.com/pdf-association/pdf-differences/tree/main/IndexedColor
In the included PDF document the Type3-font doesn't contain any glyph definition for "space", despite that character being referenced in the /Contents stream.
While missing Type3-glyphs obviously cannot be rendered, we still need to update the current canvas position such that any char/word-spacing is correctly applied.
The test-case was found at https://github.com/pdf-association/pdf-differences/tree/main/Type3WordSpacing
Originally this function would "manually" invoke the rendering commands for Type3-glyphs, however that was changed some time ago:
- Initial `Path2D` support was added in PR 14858, but the old code kept for Node.js compatibility.
- Since PR 15951 we've been using a `Path2D` polyfill in Node.js environments.
Hence, after the previous commit, we can further simplify this function by *directly* returning/using the `Path2D` object when rendering Type3-glyphs; see also https://github.com/mozilla/pdf.js/pull/19731#discussion_r2018712695
While this won't improve performance significantly, when compared to the introduction of `Path2D`, it definately cannot hurt.
With this patch, all the paths components are collected in the worker until a path
operation is met (i.e., stroke, fill, ...).
Then in the canvas a Path2D is created and will replace the path data transfered from the worker,
this way when rescaling, the Path2D can be reused.
In term of performances, using Path2D is very slightly improving speed when scaling the canvas.