143 Commits

Author SHA1 Message Date
Matthias Bisping
dffe1c18fc [WIP] Either refactoring
Add alternative formulation for monadic chain
2023-02-06 19:34:56 +01:00
Matthias Bisping
066cf17add [WIP] Either refactoring 2023-02-06 18:40:42 +01:00
Matthias Bisping
f53f0fea29 [WIP] Either refactoring
Propagate error and metadata
2023-02-06 18:18:36 +01:00
Matthias Bisping
274a5f56d4 [WIP] Either refactoring
Fix test assertion
2023-02-06 17:51:45 +01:00
Matthias Bisping
3235a857f6 [WIP] Either-refactoring
Replace Maybe with Either to allow passing on error information or
metadata which would otherwise be swallowed by Nothing.
2023-02-06 16:57:45 +01:00
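
The motivation in the commit above (Nothing swallows the error, a Left can carry it) can be illustrated with a minimal sketch. This is not the repository's code: `Left`, `Right`, `bind`, `validate_image`, `add_area`, and the dictionary fields are all hypothetical names chosen for illustration, and the project may well use a library implementation of Either instead.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Left:
    """Failure branch: carries the error and the metadatum of the image."""
    error: str
    metadatum: dict


@dataclass(frozen=True)
class Right:
    """Success branch: carries the value handed to the next step."""
    value: Any


Either = Union[Left, Right]


def bind(result: Either, step: Callable[[Any], Either]) -> Either:
    # A Left short-circuits the chain but, unlike Nothing, keeps its payload.
    return step(result.value) if isinstance(result, Right) else result


def validate_image(image: dict) -> Either:
    # Hypothetical validation step; the real check is not shown in this log.
    if image["width"] <= 0 or image["height"] <= 0:
        return Left("invalid image dimensions", {"page": image["page"]})
    return Right(image)


def add_area(image: dict) -> Either:
    # Hypothetical follow-up step that only runs on valid images.
    return Right({**image, "area": image["width"] * image["height"]})


def process(image: dict) -> Either:
    # Chain the steps; the first Left is returned unchanged.
    return bind(bind(Right(image), validate_image), add_area)
```

With a Maybe-style chain the failing case would collapse to a bare Nothing; here the caller still sees which page failed and why.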
Matthias Bisping
89989543d8 [WIP] Monadic refactoring
Integrate image validation step into monadic chain.

At the moment we lose the error information through this. Refactoring to
the Either monad can bring it back.
2023-02-06 16:12:41 +01:00
Matthias Bisping
022bd4856a [WIP] Monadic refactoring 2023-02-06 15:16:41 +01:00
Matthias Bisping
ca3898cb53 [WIP] Monadic refactoring 2023-02-06 15:10:34 +01:00
Matthias Bisping
d8f37bed5c [WIP] Monadic refactoring 2023-02-06 15:09:51 +01:00
Matthias Bisping
906fee0e5d [WIP] Monadic refactoring 2023-02-06 15:03:35 +01:00
Matthias Bisping
4e3168e51c [WIP] Monadic refactoring 2023-02-06 14:36:25 +01:00
Matthias Bisping
f645984ea4 Update dependencies 2023-02-06 13:25:07 +01:00
Matthias Bisping
0cf8e047c5 Refactoring 2023-02-06 13:22:33 +01:00
Matthias Bisping
112e18ebb5 Tweak logging 2023-02-06 13:21:41 +01:00
Matthias Bisping
1d1eb8b649 Track missing test data files 2023-02-06 13:21:34 +01:00
Matthias Bisping
0244ba7f17 Make test for bad xref work RED-6084-adhoc-scanned-pages-filtering-alternative_17 2023-02-06 12:18:25 +01:00
Matthias Bisping
825099d946 Replace bad-xref file 2023-02-06 11:47:42 +01:00
Matthias Bisping
f6dbfcab43 Add test for handling of bad xrefs 2023-02-06 11:31:43 +01:00
Matthias Bisping
e63f66a126 Refactoring
- Rename metadata -> metadatum in some more places to make it clear that
  it is the metadata of a single image in that context
- Re-order function definitions according to caller hierarchy
2023-02-06 10:46:56 +01:00
Matthias Bisping
6136bf57d4 Start tracking test/data with DVC 2023-02-06 10:07:16 +01:00
Matthias Bisping
290a8de3e3 Stop tracking test/data 2023-02-06 10:06:43 +01:00
Julius Unverfehrt
4d43e385c5 replace image extraction logic final RED-6084-adhoc-scanned-pages-filtering-alternative_16 RED-6189-bugfix_2 test 2023-02-06 09:43:28 +01:00
Julius Unverfehrt
bd0279ddd1 introduce normalizing function for image extraction 2023-02-03 12:25:27 +01:00
Julius Unverfehrt
2995d5ee48 refactoring 2023-02-03 11:14:14 +01:00
Julius Unverfehrt
eff1bb4124 adjust behavior of filtering of invalid images 1.20.6 RED-6084-adhoc-scanned-pages-filtering-alternative_14 2023-02-03 09:04:02 +01:00
Julius Unverfehrt
c478333111 add log in callback to display which file is processed 2023-02-03 08:25:36 +01:00
Julius Unverfehrt
978f48e8f9 add ad hoc logic for bad xref handling 1.20.5 RED-6084-adhoc-scanned-pages-filtering-alternative_8 2023-02-02 15:39:44 +01:00
Julius Unverfehrt
94652aafe4 beautify 2023-02-02 15:26:33 +01:00
Julius Unverfehrt
c4416636c0 beautify 1.20.4 RED-6084-adhoc-scanned-pages-filtering-alternative_6 2023-02-02 14:10:32 +01:00
Julius Unverfehrt
c0b41e77b8 implement ad hoc channel count detection for new image extraction RED-6084-adhoc-scanned-pages-filtering-alternative_5 2023-02-02 13:57:56 +01:00
Julius Unverfehrt
73f7491c8f improve performance
- disable scanned page filter, since dropping these pages prevents
computing the image hash and the frontend OCR hint, which are both
wanted
- optimize image extraction by using arrays instead of byte streams for
the conversion to PIL images
RED-6084-adhoc-scanned-pages-filtering-alternative_4
2023-02-02 13:37:03 +01:00
Julius Unverfehrt
2385584dcb refactor scanned page filtering 1.20.3 RED-6084-adhoc-scanned-pages-filtering-alternative_2 2023-02-01 15:49:36 +01:00
Julius Unverfehrt
b880e892ec refactor scanned page filtering WIP 2023-02-01 15:47:40 +01:00
Julius Unverfehrt
8c7349c2d1 refactor scanned page filtering WIP 2023-02-01 15:36:16 +01:00
Julius Unverfehrt
c55777e339 refactor scanned page filtering WIP 2023-02-01 15:16:12 +01:00
Julius Unverfehrt
0f440bdb09 refactor scanned page filtering WIP 2023-02-01 15:14:27 +01:00
Julius Unverfehrt
436a32ad2b refactor scanned page filtering WIP 2023-02-01 15:07:35 +01:00
Julius Unverfehrt
9ec6cc19ba refactor scanned page filtering WIP 2023-02-01 14:53:26 +01:00
Julius Unverfehrt
2d385b0a73 refactor scanned page filtering WIP 2023-02-01 14:38:55 +01:00
Julius Unverfehrt
5bd5e0cf2b refactor
- reduce code duplication by adapting functions of the module
- use the modules enums for image metadata
- improve readability of the scanned page detection heuristic
RED-6084-adhoc-scanned-pages-filtering_7
2023-02-01 12:43:59 +01:00
Julius Unverfehrt
876260f403 improve the readability of variable names and docstrings RED-6084-adhoc-scanned-pages-filtering_6 2023-02-01 10:08:36 +01:00
Julius Unverfehrt
368c54a8be clean-up filter logic
- Logic adapted so that it can potentially be
easily removed again from the extraction logic
1.20.2 RED-6084-adhoc-scanned-pages-filtering_4
2023-02-01 08:49:30 +01:00
Julius Unverfehrt
1490d27308 introduce adhoc filter for scanned pages 1.20.1 RED-6084-adhoc-scanned-pages-filtering_2 2023-01-31 17:18:28 +01:00
Julius Unverfehrt
4eb7f3c40a rename publishing flag refactor-adhoc-additions_3 2023-01-31 10:37:27 +01:00
Julius Unverfehrt
98dc001123 revert adhoc figure detection changes
- revert pipeline and serve logic to their state before the figure
detection data changes for image extraction: figure detection data as
input is not supported for now
refactor-adhoc-additions_2
2023-01-30 12:41:22 +01:00
Francisco Schulz
25fc7d84b9 Pull request #38: update dependencies
Merge in RR/image-prediction from fschulz/update-to-new-pyinfra-version to master

* commit 'd63f8c4eaf39ef7346188b585fb9d968de72db87':
  update dependencies
1.15.0 1.16.0 1.17.0 1.18.0 1.19.0 1.20.0
2022-10-13 15:33:53 +02:00
Francisco Schulz
d63f8c4eaf update dependencies 2022-10-13 15:23:27 +02:00
Viktor Seifert
549b2aac5c Pull request #37: RED-5324: Update pyinfra to include storage-region fix
Merge in RR/image-prediction from RED-5324 to master

* commit 'c72ef26a6caac8d87cdc08dd19dbe235247129d4':
  RED-5324: Update pyinfra to include storage-region fix
1.14.0 aure_storage_check_2 azure_storage_check_1
2022-09-30 15:27:03 +02:00
Viktor Seifert
c72ef26a6c RED-5324: Update pyinfra to include storage-region fix 2022-09-30 15:24:18 +02:00
Julius Unverfehrt
561a7f527c Pull request #36: RED-4206 wrap queue callback in process to manage memory allocation with the operating system and force deallocation after processing.
Merge in RR/image-prediction from RED-4206-fix-unwanted-restart-bug to master

Squashed commit of the following:

commit 3dfe7b861816ef9019103e16a23efd97a08fb617
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Sep 22 13:53:32 2022 +0200

    RED-4206 wrap queue callback in process to manage memory allocation with the operating system and force deallocation after processing.
1.13.0
2022-09-22 13:56:44 +02:00
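
The idea behind RED-4206 (run the callback in a child process so the operating system reclaims its memory when the process exits) can be sketched as follows. This is an assumption-laden illustration, not the repository's implementation: `run_in_subprocess` and `_run` are hypothetical names, and the sketch assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp


def _run(callback, message, conn):
    # Runs in the child: compute the result and send it back to the parent.
    try:
        conn.send(callback(message))
    finally:
        conn.close()


def run_in_subprocess(callback, message):
    """Call callback(message) in a child process and return its result.

    All memory the callback allocates belongs to the child and is returned
    to the operating system when the child exits.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_run, args=(callback, message, child_conn))
    proc.start()
    child_conn.close()  # the parent keeps only the receiving end
    try:
        # Receive before joining so a large result cannot block the child.
        result = parent_conn.recv()
    finally:
        proc.join()
        parent_conn.close()
    return result
```

If the callback raises, the child closes the pipe without sending anything and `recv` raises `EOFError` in the parent; a production version would need to forward the exception explicitly.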