pdf.js/core at 26f6f77db6644ffaeb20f059bd904acf059ad239

Marmelator/pdf.js

Jonas Jenwald c33b8d7692 Cache the normalized unicode-value on the Glyph-instance

Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it *lazily* initialized.

Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be.

*Please note:* The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.
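
The lazy caching described in this commit message can be illustrated with a minimal sketch. This is not the actual pdf.js code: the `Glyph` class below is a simplified stand-in, `normalizeUnicode` and the `normalizeCalls` counter are hypothetical names, and `String.prototype.normalize("NFKC")` is assumed here only as a placeholder for whatever normalization the real patch performs.

```javascript
// Hypothetical sketch of a lazily initialized, cached value on a glyph object.
// None of these names are taken from the pdf.js source.

// Counts how often the (assumed) expensive normalization actually runs.
let normalizeCalls = 0;

// Placeholder normalization step; NFKC is an assumption for illustration.
function normalizeUnicode(str) {
  normalizeCalls++;
  return str.normalize("NFKC");
}

class Glyph {
  // Stays null until the normalized value is first requested.
  #normalizedUnicode = null;

  constructor(unicode) {
    this.unicode = unicode;
  }

  // Compute on first access, then return the cached result.
  get normalizedUnicode() {
    if (this.#normalizedUnicode === null) {
      this.#normalizedUnicode = normalizeUnicode(this.unicode);
    }
    return this.#normalizedUnicode;
  }
}

// Text extraction tends to see the same Glyph instance many times.
const fi = new Glyph("\uFB01"); // the "fi" ligature
let text = "";
for (let i = 0; i < 1000; i++) {
  text += fi.normalizedUnicode;
}

console.log(normalizeCalls); // 1
```

With the figures quoted above, 595 unique instances among 69236 characters means the normalization would run 595 times instead of 69236, i.e. 1 - 595/69236 ≈ 99.1 percent of lookups are served from the cached value.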

2022-11-03 22:36:53 +01:00

..

Fix property chain orders of Operators in isDotExpression and isSomPredicate

2022-09-21 17:20:23 +02:00

.eslintrc

Enable the ESLint no-var rule globally

2021-03-13 16:12:53 +01:00

annotation.js

Merge pull request #15615 from calixteman/bug1796741

2022-10-31 09:58:27 +01:00

arithmetic_decoder.js

Re-factor how the ESLint no-var rule is enabled in the src/ folder

2020-10-03 20:15:29 +02:00

ascii_85_stream.js

Fix the remaining ESLint operator-assignment errors

2021-07-04 15:23:56 +02:00

ascii_hex_stream.js

Fix the remaining ESLint operator-assignment errors

2021-07-04 15:23:56 +02:00

base_stream.js

[api-minor] Remove the forceClamped-functionality in the Streams (issue 14849)

2022-04-29 14:46:30 +02:00

bidi.js

Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656)

2021-11-03 20:31:57 +01:00

calibri_factors.js

XFA - Fix font scale factors (bug 1720888)

2021-07-28 19:10:42 +02:00

catalog.js

[api-minor] Let Catalog.getAllPageDicts return an *empty* dictionary when loading the first /Page fails (issue 15590)

2022-11-03 12:51:48 +01:00

ccitt_stream.js

Prefer instanceof Dict rather than calling isDict() with one argument

2022-02-21 12:44:56 +01:00

ccitt.js

Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305)

2021-12-07 13:57:25 +01:00

cff_font.js

Take the /CIDToGIDMap into account when getting the glyph mapping for CFF fonts (issue 15559)

2022-10-13 10:02:25 +02:00

cff_parser.js

Use more for...of loops in the code-base

2022-10-03 13:08:38 +02:00

charsets.js

Use ESLint to ensure that exports are sorted alphabetically

2021-01-09 20:37:51 +01:00

chunked_stream.js

Remove the PdfManager.onLoadedStream method (PR 15616 follow-up)

2022-10-29 14:42:17 +02:00

cleanup_helper.js

Add a (global) cache to the getCharUnicodeCategory function

2022-01-25 09:59:34 +01:00

cmap.js

Remove the unused CMapCompressionType.STREAM value

2022-10-08 17:10:05 +02:00

colorspace.js

Prefer instanceof Name rather than calling isName() with one argument

2022-02-21 12:45:00 +01:00

core_utils.js

Re-factor the PDF version parsing in the worker-thread

2022-10-15 12:06:39 +02:00

crypto.js

Simplify the way to compute the remainder modulo 3 in PDF20Hash function

2022-10-07 14:43:31 +02:00

dataset_reader.js

Refactor some xfa*** getters in document.js

2022-04-03 20:38:12 +02:00

decode_stream.js

[api-minor] Remove the forceClamped-functionality in the Streams (issue 14849)

2022-04-29 14:46:30 +02:00

decrypt_stream.js

Replace loop with TypedArray.prototype.set in the DecryptStream.readBlock method

2022-10-06 14:43:24 +02:00

default_appearance.js

Combine Array.from and Array.prototype.map calls

2022-10-28 13:46:30 +02:00

document.js

Re-factor the PDF version parsing in the worker-thread

2022-10-15 12:06:39 +02:00

encodings.js

Use ESLint to ensure that exports are sorted alphabetically

2021-01-09 20:37:51 +01:00

evaluator.js

Cache the normalized unicode-value on the Glyph-instance

2022-11-03 22:36:53 +01:00

file_spec.js

Prefer instanceof Dict rather than calling isDict() with one argument

2022-02-21 12:44:56 +01:00

flate_stream.js

Remove some, with Prettier 2.3.0, unnecessary // prettier-ignore comments

2021-05-19 11:36:03 +02:00

font_renderer.js

Enable the unicorn/prefer-at ESLint plugin rule (PR 15008 follow-up)

2022-06-09 21:21:19 +02:00

fonts_utils.js

Include and use the 14 standard fonts files.

2021-06-07 11:10:11 -07:00

fonts.js

Cache the normalized unicode-value on the Glyph-instance

2022-11-03 22:36:53 +01:00

function.js

Use more for...of loops in the code-base

2022-10-03 13:08:38 +02:00

glyf.js

Font renderer - get int8 instead of uint8 in composite glyphs (bug 1749563)

2022-01-18 22:06:23 +01:00

glyphlist.js

Add more non-standard ligatures in the glyphlist.js file (issue 15516)

2022-09-27 16:31:51 +02:00

helvetica_factors.js

XFA - Fix font scale factors (bug 1720888)

2021-07-28 19:10:42 +02:00

image_utils.js

Add general iteration support in the RefSet and RefSetCache classes

2022-03-18 14:27:34 +01:00

image.js

[api-minor] Make isOffscreenCanvasSupported configurable via the API (issue 14952)

2022-10-07 00:10:46 +02:00

jbig2_stream.js

Prefer instanceof Dict rather than calling isDict() with one argument

2022-02-21 12:44:56 +01:00

jbig2.js

Enable the ESLint prefer-spread rule

2022-08-06 10:17:00 +02:00

jpeg_stream.js

Prefer instanceof Dict rather than calling isDict() with one argument

2022-02-21 12:44:56 +01:00

jpg.js

Use more for...of loops in the code-base

2022-10-03 13:08:38 +02:00

jpx_stream.js

Stop special-casing the dict parameter in the Jbig2Stream/JpegStream/JpxStream constructors

2021-04-28 13:44:47 +02:00

jpx.js

[JPEG 2000] Add support for resetContextProbabilities (bug 1731483)

2022-02-26 13:05:23 +01:00

liberationsans_widths.js

XFA - Fix font scale factors (bug 1720888)

2021-07-28 19:10:42 +02:00

lzw_stream.js

Move the DecodeStream and StreamsSequenceStream from src/core/stream.js and into its own file

2021-04-28 10:16:51 +02:00

metadata_parser.js

Move the XML-parser to the src/core/-folder

2021-02-17 13:12:01 +01:00

metrics.js

[api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335)

2022-01-30 15:53:31 +01:00

myriadpro_factors.js

XFA - Fix font scale factors (bug 1720888)

2021-07-28 19:10:42 +02:00

name_number_tree.js

Support destinations in NameTrees with encoded keys (issue 14847)

2022-04-27 11:19:55 +02:00

object_loader.js

Use more for...of loops in the code-base

2022-10-03 13:08:38 +02:00

opentype_file_builder.js

Use more for...of loops in the code-base

2022-10-03 13:08:38 +02:00

operator_list.js

[api-minor] Improve thumbnail handling in documents that contain interactive forms

2022-07-30 16:53:32 +02:00

parser.js

Let Lexer.getNumber treat more invalid "numbers" as zero (issue 15604)

2022-10-20 22:36:15 +02:00

pattern.js

Enable the unicorn/prefer-at ESLint plugin rule (PR 15008 follow-up)

2022-06-09 21:21:19 +02:00

pdf_manager.js

Remove the PdfManager.onLoadedStream method (PR 15616 follow-up)

2022-10-29 14:42:17 +02:00

predictor_stream.js

Prefer instanceof Dict rather than calling isDict() with one argument

2022-02-21 12:44:56 +01:00

primitives.js

Enable the unicorn/prefer-logical-operator-over-ternary ESLint plugin rule

2022-07-12 10:52:37 +02:00

ps_parser.js

Remove the closure used with the PostScriptToken class

2021-07-24 13:05:46 +02:00

run_length_stream.js

Move the DecodeStream and StreamsSequenceStream from src/core/stream.js and into its own file

2021-04-28 10:16:51 +02:00

segoeui_factors.js

XFA - Fix font scale factors (bug 1720888)

2021-07-28 19:10:42 +02:00

standard_fonts.js

Extend getSupplementalGlyphMapForCalibri with some umlauts (issue 15594)

2022-10-19 17:49:40 +02:00

stream.js

[api-minor] Remove the forceClamped-functionality in the Streams (issue 14849)

2022-04-29 14:46:30 +02:00

struct_tree.js

Correct typos

2022-04-09 09:43:18 +09:00

to_unicode_map.js

Convert src/core/to_unicode_map.js to use standard classes

2021-05-02 21:00:29 +02:00

type1_font.js

Use more for...of loops in the code-base

2022-10-03 13:08:38 +02:00

type1_parser.js

Remove the remaining closures in the src/core/type1_parser.js file

2022-08-14 12:50:26 +02:00

unicode.js

Add a (global) cache to the getCharUnicodeCategory function

2022-01-25 09:59:34 +01:00

worker_stream.js

Replace a bunch of Array.prototype.forEach() cases with for...of loops instead

2021-04-24 13:00:19 +02:00

worker.js

Remove the PdfManager.onLoadedStream method (PR 15616 follow-up)

2022-10-29 14:42:17 +02:00

writer.js

Enable the unicorn/prefer-at ESLint plugin rule (PR 15008 follow-up)

2022-06-09 21:21:19 +02:00

xfa_fonts.js

Enable the unicorn/prefer-at ESLint plugin rule (PR 15008 follow-up)

2022-06-09 21:21:19 +02:00

xml_parser.js

Use more for...of loops in the code-base

2022-10-03 13:08:38 +02:00

xref.js

Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590)

2022-11-03 12:46:30 +01:00