386 Commits

Author SHA1 Message Date
Calixte Denizet
04272de41d
Add the possibility to save added annotations when reorganizing a pdf (bug 2023086) 2026-03-20 10:55:47 +01:00
Tim van der Meij
ff1af5a058
Merge pull request #20916 from calixteman/fix_co
When merging pdfs, fix the CO after the fields have been cloned
2026-03-19 21:22:43 +01:00
Tim van der Meij
6245bb201c
Merge pull request #20915 from calixteman/fix_pageindice
Avoid to use a used slot when looking for a new page position
2026-03-19 21:22:32 +01:00
Tim van der Meij
8cae5d17f2
Merge pull request #20917 from calixteman/fix_dup_name_dest
Fix the destination names when they're duplicated
2026-03-19 21:22:19 +01:00
Jonas Jenwald
7609a42209 Use toBeInstanceOf consistently in the unit-tests
There's currently a lot of unit-tests that manually check `instanceof`, let's replace that with the built-in Jasmine matcher function; see https://jasmine.github.io/api/edge/matchers.html#toBeInstanceOf
2026-03-19 17:18:25 +01:00
Calixte Denizet
cf67c1ef1e
Fix the destination names when they're duplicated 2026-03-19 10:52:39 +01:00
Calixte Denizet
b7da4b80a9
When merging pdfs, fix the CO after the fields have been cloned 2026-03-19 10:09:40 +01:00
Calixte Denizet
0bee641fed
Avoid to use a used slot when looking for a new page position 2026-03-19 09:40:16 +01:00
Calixte Denizet
e67892d035
Add support for saving outlines after reorganize/merge (bug 2009574) 2026-03-17 22:22:13 +01:00
calixteman
baf8647b1f
Add the possibility to merge/update acroforms when merging/extracting (bug 2015853) 2026-03-07 19:03:02 +01:00
calixteman
ed390c06a1
Fix intermittent issue with a unit test
Avoid to rely on timing in the test, which can cause intermittent failures.
Instead, we check that the image is cached at the document/page level.
2026-03-01 22:59:04 +01:00
Jonas Jenwald
fecb0aab1d Improve the PDFDataRangeTransport unit-tests
- Add a new test using only streaming, since that was missing and the lack of which most likely contributed to previous bugs in the `PDFDataRangeTransport` implementation (see PR 10675 and 20634).

 - Improve the "ranges and streaming" test, to utilize both ranges *and* streaming properly, since the way it was written seemed somewhat unrealistic given how data will normally arrive when `PDFDataRangeTransport` is being used.

 - Provide more `initialData`, in relevant tests, since a length smaller than `rangeChunkSize` seem pretty pointless.

 - Test the `contentDispositionFilename`, and `contentLength`, handling in the `PDFDataRangeTransport` implementation.
2026-02-27 14:55:39 +01:00
calixteman
4b4ab10c54
Set a pages mapper per loaded document
It fixes #20629.
2026-02-08 21:09:27 +01:00
calixteman
22b97d1741
Flush the text content chunk only on real font changes (bug 2013793) 2026-02-03 23:11:31 +01:00
Jonas Jenwald
d25f13d1fd Report loading progress "automatically" when using the PDFDataTransportStream class, and remove the PDFDataRangeTransport.prototype.onDataProgress method
This is consistent with the other `BasePDFStream` implementations, and simplifies the API surface of the `PDFDataRangeTransport` class (note the changes in the viewer).
Given that the `onDataProgress` method was changed to a no-op this won't affect third-party users, assuming there even are any since this code was written specifically for the Firefox PDF Viewer.
2026-02-01 18:20:19 +01:00
Jonas Jenwald
ecb09d62fc Add the current loading percentage to the onPassword callback
The percentage calculation is currently "spread out" across various viewer functionality, which we can avoid by having the API handle that instead.

Also, remove the `this.#lastProgress` special-case[1] and just register a "normal" `fullReader.onProgress` callback unconditionally. Once `headersReady` is resolved the callback can simply be removed when not needed, since the "worst" thing that could theoretically happen is that the loadingBar (in the viewer) updates sooner this way. In practice though, since `fullReader.read` cannot return data until `headersReady` is resolved, this change is not actually observable in the API.

---

[1] This was added in PR 8617, close to a decade ago, but it's not obvious to me that it was ever necessary to implement it that way.
2026-01-31 16:33:58 +01:00
calixteman
ce296d8d42
Add the possibility to order the pages in an extracted pdf (bug 1997379)
or in a merged one.
2026-01-19 18:58:23 +01:00
Calixte Denizet
e13a618df3 Merge the structure trees coming from different pdfs (bug 1997379) 2025-11-17 19:56:36 +01:00
Calixte Denizet
37f4712f7e Update the named page destinations when some pdf are combined (bug 1997379)
and remove link annotations pointing on a deleted page.
2025-11-07 18:22:19 +01:00
Calixte Denizet
ad97c5b816 Update the page labels tree when a pdf is extracted (bug 1997379) 2025-11-07 15:59:57 +01:00
Calixte Denizet
bc87f4e8d6 Add the possibility to create a pdf from different ones (bug 1997379)
For now it's just possible to create a single pdf in selecting some pages in different pdf sources.
The merge is for now pretty basic (it's why it's still a WIP) none of these data are merged for now:
 - the struct trees
 - the page labels
 - the outlines
 - named destinations
For there are 2 new ref tests where some new pdfs are created: one with some extracted pages and an other
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs in selecting the page we want and the new images are
taken from the generated pdfs.
2025-11-07 14:57:48 +01:00
Calixte Denizet
19ff148163 Fix incremental saving with hybrid references
This patch removes some previous fixes which are now likely fixed by #17636.

Fixes #20302.
2025-10-04 18:31:55 +02:00
Calixte Denizet
4d15bfec0d Only apply word spacing when there is a 0x20 in the text chunk
Fixes #20319.
2025-10-03 22:18:02 +02:00
Calixte Denizet
af144be3ba Don't iterate over all empty slots in the xref entries (bug 1980958) 2025-08-25 14:02:08 +02:00
Calixte Denizet
ebc3411727 Use the cached annotations when collecting them by types 2025-08-21 18:04:00 +02:00
Calixte Denizet
9e5ee1e5a7 [Editor] Add the ability to get all the editable annotations in a pdf document
We want to be able to show all the comments in a pdf even if the pages where they are
haven't been rendered.
And it'll help to fix the issue #18915.
2025-08-18 21:31:11 +02:00
Calixte Denizet
57ce4f8f43 Use a HTML date/time input when a field requires a date or a time.
The user will be able to enter a date in the format corresponding to their locale
and it'll be formatted in using the format provided by the pdf.
2025-07-24 22:01:45 +02:00
calixteman
1b427a3af5
Merge pull request #20016 from ryzokuken/move-getcontext
[api-minor] Move getContext call to InternalRenderTask
2025-07-08 22:20:19 +02:00
Ujjwal Sharma
b1b728d47f [api-minor] Move getContext call to InternalRenderTask
This is a precursor to moving the call into a
worker thread to let us use `OffscreenCanvas`. The
current position wouldn't work since we make
transformations to the canvas object after the
getContext call, which isn't allowed for
OffscreenCanvas. Also it isn't allowed to clone or
`transferControlToOffscreen` the canvas after the
`getContext` call.
2025-07-04 00:53:51 +02:00
Calixte Denizet
3bdc5d54fe Get the text under highlight/squiggly/underline/strikethrough annotations (bug 1885505)
and add an invisible element containing the text in the annotation layer to make
it readable by a screen reader.
2025-06-22 21:47:29 +02:00
Calixte Denizet
5789afd3f8 Create the css color to use with the canvas in the worker
It slightly reduces the time spent to draw and the memory used.
2025-05-19 14:52:24 +02:00
Jonas Jenwald
ab672f0b77 Replace PDFWorker.fromPort with a generic PDFWorker.create method
This allows us to simply invoke `PDFWorker.create` unconditionally from the `getDocument` function, without having to manually check if a global `workerPort` is available first.
2025-05-17 16:13:41 +02:00
Jonas Jenwald
b629bafd1c Allow to, optionally, keep Unicode escape sequences in stringToPDFString (PR 17331 follow-up)
Currently *some* of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty.
The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification.

Hence it seems that we need a way to optionally disable that behaviour, to avoid a "badly" formatted string from becoming empty (or truncated), at least for cases where we are:
 - Parsing named destinations[2] and URLs.
 - Handling "strings" that are actually /Name-instances.
 - Building a lookup Object/Map based on some PDF data-structure.

*NOTE:* The issue that prompted this patch is obviously related to destinations, however I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above.

---
[1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27".

[2] Unfortunately just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues 14847 and 14864.
2025-04-30 20:51:10 +02:00
Jonas Jenwald
adc9eb5a5a Always fallback to checking all destinations, when lookup fails (issue 19835)
In the referenced PDF document the keys, in the /Dests dictionary, need to account for PDFDocEncoding.
To improve destination handling in general we'll now unconditionally fallback to always checking all destinations.
2025-04-20 14:53:10 +02:00
Calixte Denizet
be1f5671bb [api-minor] Use a Path2D when doing a path operation in the canvas (bug 1946953)
With this patch, all the paths components are collected in the worker until a path
operation is met (i.e., stroke, fill, ...).
Then in the canvas a Path2D is created and will replace the path data transfered from the worker,
this way when rescaling, the Path2D can be reused.
In term of performances, using Path2D is very slightly improving speed when scaling the canvas.
2025-03-22 20:35:24 +01:00
Jonas Jenwald
9e8d4e4d46 [api-minor] Attempt to support fetching the raw data of the PDF document from the PDFDocumentLoadingTask-instance (issue 15085)
The new API-functionality will allow a PDF document to be downloaded in the viewer e.g. while the PasswordPrompt is open, or in cases when document initialization failed.
Normally the raw data of the PDF document would be accessed via the `PDFDocumentProxy.prototype.getData` method, however in these cases the `PDFDocumentProxy`-instance isn't available.
2025-03-16 10:09:44 +01:00
Jonas Jenwald
7b5cd9cddd Use arrow functions with some Promise.then calls
A lot of this is fairly old code, which we can shorten slightly by using arrow functions instead of "regular" functions.
2025-03-02 19:57:38 +01:00
Jonas Jenwald
2e62f426fe Use arrow function with various Array methods
A lot of this is quite old code, which we can shorten slightly by using arrow functions instead of "regular" functions.
2025-03-02 15:19:04 +01:00
Jonas Jenwald
d5ce35f744 Move the EXIF-block replacement into JpegStream (PR 19356 follow-up)
Currently we modify the EXIF-block in place, which may end up "breaking" the JPEG-data of the original PDF document since e.g. saving it from the viewer no longer contains the real EXIF-block.
Hence the EXIF-block replacement is moved into the `JpegStream` class, such that we can copy the data before doing the replacement.
2025-02-20 12:41:39 +01:00
Jonas Jenwald
36979e9eb2 Fix all outstanding ESLint arrow-body-style warnings
Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.
2025-02-17 15:45:44 +01:00
Jonas Jenwald
33cba30bdb Search for destinations in both /Names and /Dests dictionaries (issue 19474)
Currently we only use either one of them, preferring the NameTree when it's available.
2025-02-14 15:49:05 +01:00
Jonas Jenwald
db43f158dc Inline the default Factory-definitions in getDocument
- Most of the these are only used in the `src/display/api.js` file, and this leads to slightly shorter code.

 - A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays.

 - For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.
2025-01-18 14:09:14 +01:00
Jonas Jenwald
75cba72ca6 [api-major] Replace MissingPDFException and UnexpectedResponseException with one exception
These old exceptions have a fair amount of overlap given how/where they are being used, which is likely because they were introduced at different points in time, hence we can shorten and simplify the code by replacing them with a more general `ResponseException` instead.

Besides an error message, the new `ResponseException` instances also include:
 - A numeric `status` field containing the server response status, similar to the old `UnexpectedResponseException`.

 - A boolean `missing` field, to allow easily detecting the situations where `MissingPDFException` was previously thrown.
2025-01-16 22:51:05 +01:00
Jonas Jenwald
6f062abb76 Skip LinkAnnotations when collecting field objects (issue 19281)
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
2025-01-04 11:54:45 +01:00
Jonas Jenwald
c6e3fc4fe6 Take the userUnit into account in the PageViewport class (issue 19176) 2024-12-08 15:51:04 +01:00
Jonas Jenwald
f8d11a3a3a
Merge pull request #19074 from Rob--W/issue-12744-test
Add test cases for redirected responses
2024-12-02 19:06:55 +01:00
Rob Wu
f97b4b9a66 Add test cases for redirected responses
Regression tests for issue #12744 and PR #19028
2024-12-02 17:57:49 +01:00
Rob Wu
28b0220bc2 Replace createTemporaryNodeServer with TestPdfsServer
Some tests rely on the presence of a server that serves PDF files.
When tests are run from a web browser, the test files and PDF files are
served by the same server (WebServer), but in Node.js that server is not
around.

Currently, the tests that depend on it start a minimal Node.js server
that re-implements part of the functionality from WebServer.

To avoid code duplication when tests depend on more complex behaviors,
this patch replaces createTemporaryNodeServer with the existing
WebServer, wrapped in a new test utility that has the same interface in
Node.js and non-Node.js environments (=TestPdfsServer).

This patch has been tested by running the refactored tests in the
following three configurations:

1. From the browser:
   - http://localhost:8888/test/unit/unit_test.html?spec=api
   - http://localhost:8888/test/unit/unit_test.html?spec=fetch_stream

2. Run specific tests directly with jasmine without legacy bundling:
   `JASMINE_CONFIG_PATH=test/unit/clitests.json ./node_modules/.bin/jasmine --filter='^api|^fetch_stream'`

3. `gulp unittestcli`
2024-12-02 17:57:49 +01:00
Rob Wu
131d4650a5 Drop trailing whitespace from test/unit/api_spec.js
test/unit/api_spec.js is the only JS file in the tree with trailing
whitespace. Because `trim_trailing_whitespace = true` in .editorconfig,
any editor supporting EditorConfig would trim whitespace when the file
is changed, which results in test failures.

This commit fixes the issue by trimming the trailing whitespace and
adjusting the test expectations.
2024-11-24 23:37:16 +01:00
Jonas Jenwald
1a56b35af7
Merge pull request #19003 from Snuffleupagus/api-unittest-image-helpers
Add helper functions to load image blob/bitmap data in `test/unit/api_spec.js`
2024-11-06 09:11:28 +01:00