Given that these classes are, with the exception of their `decode` methods, virtually identical this helps reduce code duplication and simplifies maintenance.
These changes reduce the size of the `gulp mozcentral` build-target by `1292` bytes, which obviously isn't a lot but still cannot hurt.
*Note:* This is similar to PR 19525, which did the same thing for the OpenJPEG decoder.
The advantages of doing this are:
- The same JBig2 decoder is used regardless of WASM being supported or not, which means consistent rendering.
- The old `Jbig2Image` implementation has various bugs and missing features.
- Less code that needs to be maintained in the PDF.js project, since both the CCITT and the JBig2 decoder is replaced.
The disadvantage of doing this is:
- Slightly larger bundle size, however the effect is limited since a fair amount of PDF.js code can be removed. For the `gulp mozcentral` target the size increase is approximately 54 kilo-bytes (which is small compared to the 452 kilo-bytes for the JS version of the OpenJPEG decoder).
While this cache will not contain a huge amount of data in practice, it's nonetheless a *global* cache that currently will never be cleared.
This patch also removes the existing closure, since it shouldn't really be necessary nowadays given that the code is a JavaScript module which means that only explicitly listed properties will be exported.
Given that the regular expression has already become more complex (after the initial patch adding it), it seems to me that it probably cannot hurt to add a global cache to reduce unnecessary re-parsing.
Obviously the `Glyph`-instances are being cached *per* font, however in most documents multiple fonts are being used and in practice there's very often a fair amount of overlap between the /ToUnicode-data in different fonts[1].
Consider for example loading and rendering the entire `tracemonkey.pdf` document (from the test-suite), which isn't a particularily large document. In that case the `getCharUnicodeCategory` function is being called a total of `601` times, however there's only `106` *unique* unicode-chars being checked.
*Please note:* In practice I suppose that this won't have a *huge* effect on overall performance, however given the relative simplicity of this patch I figured that it'd not hurt to submit it for review.
---
[1] Consider e.g. how there's usually different fonts used for regular, bold, respectively italic text.