- disable the scanned-page filter, since dropping scanned pages prevents computing the image hash and the frontend OCR hint, both of which are wanted
- optimize image extraction by converting arrays, instead of byte streams, to PIL images
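
To illustrate the image extraction change above, here is a minimal sketch of the two conversion paths. It assumes the extractor can hand over raw pixel data as a NumPy uint8 array of shape (H, W, 3). The function names and the sample data are hypothetical and are not taken from this change.

```python
import io

import numpy as np
from PIL import Image


def image_from_bytes(encoded: bytes) -> Image.Image:
    """Byte-stream path: decode an encoded image (e.g. PNG) through a buffer."""
    img = Image.open(io.BytesIO(encoded))
    img.load()  # Image.open is lazy; force the decode to happen here
    return img


def image_from_array(pixels: np.ndarray) -> Image.Image:
    """Array path: wrap raw uint8 pixels of shape (H, W, 3) without decoding."""
    return Image.fromarray(pixels)


# Hypothetical 2x2 RGB sample standing in for one extracted image.
pixels = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

# Encode the sample as PNG only to have input for the byte-stream path.
buf = io.BytesIO()
Image.fromarray(pixels).save(buf, format="PNG")

from_bytes = image_from_bytes(buf.getvalue())
from_array = image_from_array(pixels)
```

Both paths yield the same pixels. The array path skips the encode and decode round trip, which is presumably where the saving comes from.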