Currently *some* of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty.
The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification.
Hence it seems that we need a way to optionally disable that behaviour, to avoid a "badly" formatted string from becoming empty (or truncated), at least for cases where we are:
- Parsing named destinations[2] and URLs.
- Handling "strings" that are actually /Name-instances.
- Building a lookup Object/Map based on some PDF data-structure.
*NOTE:* The issue that prompted this patch is obviously related to destinations, however I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above.
---
[1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27".
[2] Unfortunately just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues 14847 and 14864.
- Check that the `filename` is actually a string, before parsing it further.
- Use proper "shadowing" in the `filename` getter.
- Add a bit more validation of the data in `pickPlatformItem`.
- Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.
Unless you actually need to check that something is both a `Dict` and also of the *correct* type, using `instanceof Dict` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.
This patch uses ESLint to enforce this, since we obviously still want to keep the `isDict` helper function for where it makes sense.
At this point all the various Stream-classes extends an abstract base-class, hence this helper function is no longer necessary and only adds unnecessary indirection in the code.
Given that the GitHub Advanced Security workflow now covers everything that LGTM does, but generally faster and with better GitHub-integration, there's no longer much point in also running LGTM separately.
As a follow-up to this patch, we should also disable/remove the LGTM-integration from the PDF.js repository.
Apparently I didn't put one of the disable comments on the *correct* line, since I didn't read the instructions carefully enough, so let's try again.
Note that, most unfortunately, disabling of warnings isn't applied until *after* a patch has been merged.
Most of the warnings we don't really care about, and those are simply white-listed using inline comments; however two cases prompted actual code changes:
- In `src/display/pattern_helper.js` the branch in question is indeed unreachable, and should thus be safe to remove. (This code originated in PR 4192, which is now over seven years ago.)
- In `test/test.js`, the function in question indeed doesn't accept any arguments. (The patch also re-formats a string just above, which didn't seem worthy of a separated patch.)
This now leaves only *one* warning in the LGTM report, however that one is a false positive that we'll need to report upstream.
The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of *distinct* functionality.
In order to improve readability and make it easier to navigate through the code, this patch moves the `FileSpec` into its own file.