Given that this file represents the official API, it's difficult to avoid it becoming fairly large as we add new functionality. However, it also contains a couple of smaller (and internal) helpers that we can move into a new utils-file.
Also, we inline the `DEFAULT_RANGE_CHUNK_SIZE` constant since it's only used *once* and its value has never been changed in over a decade.
To make it easier to tell which PDF.js version/commit that the *built* files correspond to, they have (since many years) included `pdfjsVersion` and `pdfjsBuild` constants with that information.
As currently implemented this has a few shortcomings:
- It requires manually adding the code, with its preprocessor statements, in all relevant files.
- It requires ESLint disable statements, since it's obviously unused code.
- Being unused, this code is removed in the minified builds.
- This information would be more appropriate as comments, however Babel discards all comments during building.
- It would be helpful to have this information at the top of the *built* files, however it's being moved during building.
To address all of these issues, we'll instead utilize Webpack to insert the version/commit information as a comment placed just after the license header.
In Webpack version `5.99.0` the way that `export` statements are handled was changed slightly, with much less boilerplate code being generated, which unfortunately breaks our `tweakWebpackOutput` function that's used to expose the exported properties globally and that e.g. the viewer depends upon.
Given that we were depending on formatting that should most likely be viewed as nothing more than an internal implementation detail in Webpack, we instead work-around this by manually defining the structures that were previously generated.
Obviously this will lead to a tiny bit more manual work in the future, however we don't change the API-surface often enough that it should be a big issue *and* the relevant unit-tests are updated such that it shouldn't be possible to break this.
*NOTE:* In the future we might want to consider no longer using global properties like this, and instead rely only on proper `export`s throughout the code-base.
However changing this would likely be non-trivial (given edge-cases), and it'd be an `api-major` change, so let's just do the minimal amount of work to unblock Webpack updates for now.
Currently we have a number of spots in the code-base where we need to clamp a value to a [min, max] range. This is either implemented using `Math.min`/`Math.max` or with a local helper function, which leads to some unnecessary duplication.
Hence this patch adds and re-uses a single helper function for this, which we'll hopefully be able to remove in the future once https://github.com/tc39/proposal-math-clamp/ becomes generally available.
Currently we re-implement the same helper function twice, which in hindsight seems like the wrong decision since that way it's quite easy for the implementations to accidentally diverge.
The reason for doing it this way was because the code in the worker-thread is able to check for `Ref`- and `Name`-instances directly, which obviously isn't possible in the viewer but can be solved by passing validation-functions to the helper.
Given that most inferred links will overlap existing LinkAnnotations, creating a lot of unused `borderStyle` objects seem unnecessary.
Hence we can move that into the `AnnotationLayer.prototype.addLinkAnnotations` method instead, which also allows us to slightly reduce the API-surface.
Automatically detect links in the text content of a file and automatically
generate link annotations at the appropriate locations to achieve
automatic link detection and hyperlinking.
These old exceptions have a fair amount of overlap given how/where they are being used, which is likely because they were introduced at different points in time, hence we can shorten and simplify the code by replacing them with a more general `ResponseException` instead.
Besides an error message, the new `ResponseException` instances also include:
- A numeric `status` field containing the server response status, similar to the old `UnexpectedResponseException`.
- A boolean `missing` field, to allow easily detecting the situations where `MissingPDFException` was previously thrown.
- Ensure that `pdfjsTestingUtils` is available when running benchmarking, since that shouldn't be done in TESTING-mode.
- Exclude the `test/stats/results/` folder from linting, since it'll contain *generated* JSON-files.
- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.
- By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.
- This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilo-bytes, which isn't a lot but still cannot hurt.
One goal is to make the code for drawing with the Ink tool similar to the one to free highlighting:
it doesn't really make sense to have so different ways to do almost the same thing.
When the zoom level is high, it'll avoid to create a too big canvas covering all the page which consume
more memory, makes the drawing very slow and the overall user xp pretty bad.
A second goal is to be able to easily implement more drawing tools where we would just have to implement
how to draw from the pointer coordinates.
After the binary CMap format had been added there were also some ideas about *maybe* providing other formats, see [here](https://github.com/mozilla/pdf.js/pull/8064#issuecomment-279730182), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
As far as I can tell `Outliner` is only exposed in the API because we need to access it when running some of the reference-tests, but is otherwise not used.
Hence this seems like something that should be kept *internal* and thus only exposed in TESTING-builds.
This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer.
By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites.
This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists.
Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API.
For *simple* invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers
The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
- In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
- In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
- In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
- In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
- For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
- For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
- In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.
---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
The doorhanger for highlighting has a basic color picker composed of 5 predefined colors
to set the default color to use.
These colors can be changed thanks to a preference for now but it's something which could
be changed in the Firefox settings in the future.
Each highlight has in its own toolbar a color picker to just change its color.
The different color pickers are so similar (modulo few differences in their styles) that
this patch introduces a new class ColorPicker which provides a color picker component
which could be reused in future editors.
All in all, a large part of this patch is dedicated to color picker itself and its style
and the rest is almost a matter of wiring the component.
The goal is to be able to get these outlines to fill the shape corresponding
to a text selection in order to highlight some text contents.
The outlines will be used either to show selected/hovered highlights.
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]
- Expose the `fetchData` helper function in the API, such that the viewer is able to access it.
- Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.
---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
This has been deprecated since version `2.15.349`, which is a year ago.
Removing this will also simplify some upcoming changes, specifically outputting of JavaScript modules in the builds.
By leveraging import maps we can get rid of *most* of the remaining `require`-calls in the `src/display/`-folder, since we should strive to use modern `import`-statements wherever possible.
The only remaining cases are Node.js-specific dependencies, since those seem very difficult to convert unless we start producing a bundle *specifically* for Node.js environments.
After PR 16226 the deprecated SVG back-end is now unused in development mode, with the exception of unit-tests, hence we can re-factor how it's exposed in the API to avoid including a useless webpack-closure in e.g. the *built-in* Firefox PDF Viewer.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.
When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
This patch extends PR 16115 to work in all browsers, regardless of their `OffscreenCanvas` support, such that transfer functions will be applied to general rendering (and not just image data).
In order to do this we introduce the `BaseFilterFactory` that is then extended in browsers/Node.js environments, similar to all the other factories used in the API, such that we always have the necessary factory available in `src/display/canvas.js`.
These changes help simplify the existing `putBinaryImageData` function, and the new method can easily be stubbed-out in the Firefox PDF Viewer.
*Please note:* This patch removes the old *partial* transfer function support, which only applied to image data, from Node.js environments since the `node-canvas` package currently doesn't support filters. However, this should hopefully be fine given that:
- Transfer functions are not very commonly used in PDF documents.
- Browsers in general, and Firefox in particular, are the *primary* development target for the PDF.js library.
- The FAQ only lists Node.js as *mostly* supported, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
This was deprecated in PR 15758, which has now been included in three official PDF.js releases.
While PR 15880 did limit the bundle-size impact of this functionality on e.g. the Firefox PDF Viewer, it still leads to some unnecessary "bloat" that these changes remove.
Furthermore, with this being deprecated there'd also be no effort put into e.g. extending the `UNSUPPORTED_FEATURES` list when handling future error cases.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kind of images.
It'll slightly improve performances (and maybe slightly decrease memory use).
Since an image can be rendered in using some transfer maps but because of
OffscreenCanvas we don't have the underlying pixels array the transfer maps
stuff is re-implemented in using the SVG filter feComponentTransfer.
In the mac case we don't want to care about the scaleFactor threshold
because else if too big another move could start and then subsequent
events aren't considered as wheel events.
It isn't really ideal and at some point we'll need to find a way at
least for the Firefox case to get the real events instead of the fake
wheel ones.
Given that this is internal functionality, not exposed in the official API, it's not entirely clear (at least to me) why we can't just initialize this directly in `src/display/api.js` instead.
When testing both the development viewer and all the ways in which we run tests, everthing still appears to work just fine with this patch.
The idea is just to resuse what we got on the first draw.
Now, we only update the scaleX of the different spans and the other values
are dependant of --scale-factor.
Move some properties in the CSS in order to avoid any updates in JS.
This patch has been successfully tested in a local, artifact, Firefox build.
*Please note:* The only thing that'll no longer work for PDF documents opened using "data:"-URLs is middle-clicking on internal/outline links, in order to open the destination in a new tab. This is however an extremely small loss of functionality, and as can be seen in the bug the alternative (i.e. doing nothing) is surely much worse.