This patch adds right-click support for images in the PDF, allowing
users to download them. To minimize memory consumption, we:
- Do not store the images separately, and instead crop them out of the
PDF page canvas
- Only extract the images when needed (i.e. when the user right-clicks
on them), rather than eagery having all of them available.
To do so, we layer one empty 0x0 canvas per image, stretched to cover
the whole image, and only populate its contents on right click.
These images need to be inside the text layer: they cannot be _behind_
it, otherwise they would be covered by the text layer's container and
not be clickable, and they cannot be in front of it, otherwise they
would make the text spans unselectable.
This feature is managed by a new preference, `imagesRightClickMinSize`:
- when it's set to `-1`, right-click support is disabled
- when set to `0`, all images are available for right click
- when set to a positive integer, only images whose width and height are
greater than or equal to that value (in the PDF page frame of
reference) are available for right click.
This features is disabled by default outside of MOZCENTRAL, as it
significantly degrades the text selection experience in non-Firefox
browsers.
With the exception of the first invocation the callback function is unused, which means that a lot of pointless functions may be created.
To avoid this we introduce helper functions for simple cases, such as creating `Map`s and `Objects`s.
We used a SVG string which can be pass to the Path2D ctor but it's a bit slower than
building the path step by step.
Having numerical data instead of a string will help the font data serialization.
The position of the text rendered by `showText` is affected
incrementally by the preceding `showText` operations "on the same line".
For this reason, we keep track of all of them (with their dependencies)
in `sameLineText`.
`sameLineText` can be reset whenever we explicitly position the text
somewhere else. We were previously only doing it for `moveText`, and
this patch updates the code to also do it on `setTextMatrix` (which
resets `this.current.x/y` in `CanvasGraphics` to 0).
The complexity of subsequent `sameLineText` operations dependency
tracking grows quadratically with the number of operations on the same
line, so this patch fixes the performance problem when there are whole
pages of text that only use `setTextMatrix` and not `moveText`.
This PR changes the way we store bounding boxes so that they use less
memory and can be more easily shared across threads in the future.
Instead of storing the bounding box and list of dependencies for each
operation that renders _something_, we now only store the bounding box
of _every_ operation and no dependencies list. The bounding box of
each operation covers the bounding box of all the operations affected
by it that render something. For example, the bounding box of a
`setFont` operation will be the bounding box of all the `showText`
operations that use that font.
This affects the debugging experience in pdfBug, since now the bounding
box of an operation may be larger than what it renders itself. To help
with this, now when hovering on an operation we also highlight (in red)
all its dependents. We highlight with white stripes operations that do
not affect any part of the page (i.e. with an empty bbox).
To save memory, we now save bounding box x/y coordinates as uint8
rather than float64. This effectively gives us a 256x256 uniform grid
that covers the page, which is high enough resolution for the usecase.
This commit is a first step towards #6419, and it can also help with
first compute which ops can affect what is visible in that part of
the page.
This commit adds logic to track operations with their respective
bounding boxes. Only operations that actually cause something to
be rendered have a bounding box and dependencies.
Consider the following example:
```
0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...] -> eoFill
```
here we have three rendering operations: the showText op (2) and the
path (4). (2) depends on (0), (1) and (3), while (4) only depends on
(0). Both (2) and (4) have a bounding box.
This tracking happens when first rendering a PDF: we then use the
recorded information to optimize future partial renderings of a PDF, so
that we can skip operations that do not affected the PDF area on the
canvas.
All this logic only runs when the new `enableOptimizedPartialRendering`
preference, disabled by default, is enabled.
The bounding boxes and dependencies are also shown in the pdfBug
stepper. When hovering over a step now:
- it highlights the steps that they depend on
- it highlights on the PDF itself the bounding box
It fixes#20065.
The only to get a path (from the path generator) is when the font is embedded.
So when we need a path (disableFontFace: true or when we want to use a pattern for stroking/filling), it's impossible
to fulfil.
It seems that the `@napi-rs/canvas` dependency has *basic* canvas-filter support, whereas the "old" `canvas` dependency didn't, hence we no longer need the Node.js-specific checks in the `src/display/canvas.js` file.
Note that I've successfully tested the [`pdf2png` example](https://github.com/mozilla/pdf.js/tree/master/examples/node/pdf2png) with this patch applied and things appear to work as before.
After PR 19731 the format of compiled Type3-glyphs is now simple enough that the compilation can be moved to the worker-thread, without introducing any significant additional complexity.
This allows us to, ever so slightly, simplify the implementation in `src/display/canvas.js` since the Type3 operatorLists will now directly include standard path-rendering operators (using the format introduced in PR 19689).
As part of these changes we also stop caching Type3 image masks since: we've not come across any cases where that actually helps, they're usually fairly small, and it simplifies the code.
Note that one "negative" change introduced in this patch is that we'll now compile Type3-glyphs *eagerly*, whereas previously we'd only do that lazily upon their first use.
However, this doesn't seem to impact performance in any noticeable way since the compilation is fast enough (way below 1 ms/glyph in my testing) and Type3-fonts are also limited to just 256 glyphs. Also, many (or most?) Type3-fonts don't even use image masks and are thus not affected by these changes.
In using the Firefox profiler (with JS allocations tracking) and wuppertal.pdf, I noticed
we were using a bit too much memory for a function which is supposed to just compute 2 numbers.
The memory used by itself isn't so important but having a too much objects lead to waste some time
to gc them.
So this patch aims to simplify it a bit.
The color-parameter is already available through `IR` (i.e. the internal representation), and after the changes in PR 4824 (which landed in 2014) we no longer need any special handling for it.
In the included PDF document the Type3-font doesn't contain any glyph definition for "space", despite that character being referenced in the /Contents stream.
While missing Type3-glyphs obviously cannot be rendered, we still need to update the current canvas position such that any char/word-spacing is correctly applied.
The test-case was found at https://github.com/pdf-association/pdf-differences/tree/main/Type3WordSpacing
Originally this function would "manually" invoke the rendering commands for Type3-glyphs, however that was changed some time ago:
- Initial `Path2D` support was added in PR 14858, but the old code kept for Node.js compatibility.
- Since PR 15951 we've been using a `Path2D` polyfill in Node.js environments.
Hence, after the previous commit, we can further simplify this function by *directly* returning/using the `Path2D` object when rendering Type3-glyphs; see also https://github.com/mozilla/pdf.js/pull/19731#discussion_r2018712695
While this won't improve performance significantly, when compared to the introduction of `Path2D`, it definately cannot hurt.
Rather than updating the transform every time that we're painting a Type3-glyph, we can instead just compute the "final" coordinates during building of the `Path2D` objects.
The rendering bug with issue17779.pdf is due to the fact that we call save on the suspended ctx
but not on the the current ctx. So each time we've something like save/transform/restore then
the transform not "removed" when restoring.
So this patch just apply the save/restore operations to ctx which are mirrored on the suspended one.
With this patch, all the paths components are collected in the worker until a path
operation is met (i.e., stroke, fill, ...).
Then in the canvas a Path2D is created and will replace the path data transfered from the worker,
this way when rescaling, the Path2D can be reused.
In term of performances, using Path2D is very slightly improving speed when scaling the canvas.
Currently we lookup the `devicePixelRatio`, with fallback handling, in a number of spots in the code-base.
Rather than duplicating code we can instead add a new static method in the `OutputScale` class, since that one is now exposed in the API.
It fixes#19360.
Each glyph in the test case has a fill and a stroke pattern, so the current transform used
to scale the glyph outline must be the same.
In setting the stroke color to green, I noticed that the last outline contains some non-closed
subpaths, so when generating the glyph outline, every time we 'moveTo', we close the previous
subpath.
Given that `ImageData` has been supported for many years in all browsers, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/ImageData#browser_compatibility), we have a `typeof` check that's only necessary in Node.js environments.
Since the `@napi-rs/canvas` package provides that functionality, we can thus add an `ImageData` polyfill which allows us to ever so slightly simplify the code.
It fixes#18956.
In the patch #18029, for performance reasons and because I thought it was useless, I deliberately chose to not fill the mask
with the backdrop color when it's full black: it was a bad idea.
So in this patch we always add the backdrop color to the mask.