138 Commits

Author SHA1 Message Date
Matthias Bisping
89989543d8 [WIP] Monadic refactoring
Integrate image validation step into monadic chain.

At the moment, this step loses the error information. Refactoring to an
Either monad can bring it back.
2023-02-06 16:12:41 +01:00
Matthias Bisping
022bd4856a [WIP] Monadic refactoring 2023-02-06 15:16:41 +01:00
Matthias Bisping
ca3898cb53 [WIP] Monadic refactoring 2023-02-06 15:10:34 +01:00
Matthias Bisping
d8f37bed5c [WIP] Monadic refactoring 2023-02-06 15:09:51 +01:00
Matthias Bisping
906fee0e5d [WIP] Monadic refactoring 2023-02-06 15:03:35 +01:00
Matthias Bisping
4e3168e51c [WIP] Monadic refactoring 2023-02-06 14:36:25 +01:00
Matthias Bisping
f645984ea4 Update dependencies 2023-02-06 13:25:07 +01:00
Matthias Bisping
0cf8e047c5 Refactoring 2023-02-06 13:22:33 +01:00
Matthias Bisping
112e18ebb5 Tweak logging 2023-02-06 13:21:41 +01:00
Matthias Bisping
1d1eb8b649 Track missing test data files 2023-02-06 13:21:34 +01:00
Matthias Bisping
0244ba7f17 Make test for bad xref work RED-6084-adhoc-scanned-pages-filtering-alternative_17 2023-02-06 12:18:25 +01:00
Matthias Bisping
825099d946 Replace bad-xref file 2023-02-06 11:47:42 +01:00
Matthias Bisping
f6dbfcab43 Add test for handling of bad xrefs 2023-02-06 11:31:43 +01:00
Matthias Bisping
e63f66a126 Refactoring
- Rename metadata -> metadatum in some more places to make it clear that
  it is the metadata of a single image in that context
- Re-order function definitions according to caller hierarchy
2023-02-06 10:46:56 +01:00
Matthias Bisping
6136bf57d4 Start tracking test/data with DVC 2023-02-06 10:07:16 +01:00
Matthias Bisping
290a8de3e3 Stop tracking test/data 2023-02-06 10:06:43 +01:00
Julius Unverfehrt
4d43e385c5 replace image extraction logic final RED-6084-adhoc-scanned-pages-filtering-alternative_16 RED-6189-bugfix_2 test 2023-02-06 09:43:28 +01:00
Julius Unverfehrt
bd0279ddd1 introduce normalizing function for image extraction 2023-02-03 12:25:27 +01:00
Julius Unverfehrt
2995d5ee48 refactoring 2023-02-03 11:14:14 +01:00
Julius Unverfehrt
eff1bb4124 adjust behavior of filtering of invalid images 1.20.6 RED-6084-adhoc-scanned-pages-filtering-alternative_14 2023-02-03 09:04:02 +01:00
Julius Unverfehrt
c478333111 add log in callback to display which file is processed 2023-02-03 08:25:36 +01:00
Julius Unverfehrt
978f48e8f9 add ad hoc logic for bad xref handling 1.20.5 RED-6084-adhoc-scanned-pages-filtering-alternative_8 2023-02-02 15:39:44 +01:00
Julius Unverfehrt
94652aafe4 beautify 2023-02-02 15:26:33 +01:00
Julius Unverfehrt
c4416636c0 beautify 1.20.4 RED-6084-adhoc-scanned-pages-filtering-alternative_6 2023-02-02 14:10:32 +01:00
Julius Unverfehrt
c0b41e77b8 implement ad hoc channel count detection for new image extraction RED-6084-adhoc-scanned-pages-filtering-alternative_5 2023-02-02 13:57:56 +01:00
Julius Unverfehrt
73f7491c8f improve performance
- disable scanned page filter, since dropping these pages disables the
computation of the image hash and the frontend OCR hint, which are both
wanted
- optimize image extraction by using arrays instead of byte streams for
the conversion to PIL images
RED-6084-adhoc-scanned-pages-filtering-alternative_4
2023-02-02 13:37:03 +01:00
Julius Unverfehrt
2385584dcb refactor scanned page filtering 1.20.3 RED-6084-adhoc-scanned-pages-filtering-alternative_2 2023-02-01 15:49:36 +01:00
Julius Unverfehrt
b880e892ec refactor scanned page filtering WIP 2023-02-01 15:47:40 +01:00
Julius Unverfehrt
8c7349c2d1 refactor scanned page filtering WIP 2023-02-01 15:36:16 +01:00
Julius Unverfehrt
c55777e339 refactor scanned page filtering WIP 2023-02-01 15:16:12 +01:00
Julius Unverfehrt
0f440bdb09 refactor scanned page filtering WIP 2023-02-01 15:14:27 +01:00
Julius Unverfehrt
436a32ad2b refactor scanned page filtering WIP 2023-02-01 15:07:35 +01:00
Julius Unverfehrt
9ec6cc19ba refactor scanned page filtering WIP 2023-02-01 14:53:26 +01:00
Julius Unverfehrt
2d385b0a73 refactor scanned page filtering WIP 2023-02-01 14:38:55 +01:00
Julius Unverfehrt
5bd5e0cf2b refactor
- reduce code duplication by adapting functions of the module
- use the module's enums for image metadata
- improve readability of the scanned page detection heuristic
RED-6084-adhoc-scanned-pages-filtering_7
2023-02-01 12:43:59 +01:00
Julius Unverfehrt
876260f403 improve the readability of variable names and docstrings RED-6084-adhoc-scanned-pages-filtering_6 2023-02-01 10:08:36 +01:00
Julius Unverfehrt
368c54a8be clean-up filter logic
- Logic adapted so that it can potentially be
easily removed again from the extraction logic
1.20.2 RED-6084-adhoc-scanned-pages-filtering_4
2023-02-01 08:49:30 +01:00
Julius Unverfehrt
1490d27308 introduce adhoc filter for scanned pages 1.20.1 RED-6084-adhoc-scanned-pages-filtering_2 2023-01-31 17:18:28 +01:00
Julius Unverfehrt
4eb7f3c40a rename publishing flag refactor-adhoc-additions_3 2023-01-31 10:37:27 +01:00
Julius Unverfehrt
98dc001123 revert adhoc figure detection changes
- revert pipeline and serve logic to pre figure detection data for image
extraction changes: figure detection data as input not supported for now
refactor-adhoc-additions_2
2023-01-30 12:41:22 +01:00
Francisco Schulz
25fc7d84b9 Pull request #38: update dependencies
Merge in RR/image-prediction from fschulz/update-to-new-pyinfra-version to master

* commit 'd63f8c4eaf39ef7346188b585fb9d968de72db87':
  update dependencies
1.15.0 1.16.0 1.17.0 1.18.0 1.19.0 1.20.0
2022-10-13 15:33:53 +02:00
Francisco Schulz
d63f8c4eaf update dependencies 2022-10-13 15:23:27 +02:00
Viktor Seifert
549b2aac5c Pull request #37: RED-5324: Update pyinfra to include storage-region fix
Merge in RR/image-prediction from RED-5324 to master

* commit 'c72ef26a6caac8d87cdc08dd19dbe235247129d4':
  RED-5324: Update pyinfra to include storage-region fix
1.14.0 aure_storage_check_2 azure_storage_check_1
2022-09-30 15:27:03 +02:00
Viktor Seifert
c72ef26a6c RED-5324: Update pyinfra to include storage-region fix 2022-09-30 15:24:18 +02:00
Julius Unverfehrt
561a7f527c Pull request #36: RED-4206 wrap queue callback in process to manage memory allocation with the operating system and force deallocation after processing.
Merge in RR/image-prediction from RED-4206-fix-unwanted-restart-bug to master

Squashed commit of the following:

commit 3dfe7b861816ef9019103e16a23efd97a08fb617
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Sep 22 13:53:32 2022 +0200

    RED-4206 wrap queue callback in process to manage memory allocation with the operating system and force deallocation after processing.
1.13.0
2022-09-22 13:56:44 +02:00
Julius Unverfehrt
48dd52131d Pull request #35: update test dockerfile
Merge in RR/image-prediction from make-sec-build-work to master

Squashed commit of the following:

commit 08149d3a99681f4900a7d4b6a5f656b1c25ebdb3
Merge: 76b5a45 0538377
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Sep 21 13:43:24 2022 +0200

    Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction into make-sec-build-work

commit 76b5a4504adc709107af9e5958970ec24ae3f5ef
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Sep 21 13:41:46 2022 +0200

    update test dockerfile
1.12.0
2022-09-21 13:47:40 +02:00
Christoph Schabert
053837722b Pull request #34: hotfix: fix key prepare
Merge in RR/image-prediction from hotfix/keyPrep to master

* commit '98e639d83f72f0cde34cb9c009d84ed4e3b0d138':
  hotfix: fix key prepare
1.11.0
2022-09-20 11:36:11 +02:00
cschabert
98e639d83f hotfix: fix key prepare 2022-09-20 11:34:55 +02:00
Julius Unverfehrt
13d4427c78 Pull request #33: RED-5202 port hotfixes
Merge in RR/image-prediction from RED-5202-port-hotfixes to master

Squashed commit of the following:

commit 9674901235264de6b74d679fd39a52775ac4aee1
Merge: ec2ab89 9763d2c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 15:55:58 2022 +0200

    Merge remote-tracking branch 'origin' into RED-5202-port-hotfixes

commit ec2ab890b8307942d147d6b8b236f6a3c1d0aebc
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 15:49:17 2022 +0200

    swap case when the log is printed for env var parsing

commit aaa02ea35e9c1b3b307116d7e3e32c93fd79ef5d
Merge: 5d87066 521222e
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 15:28:39 2022 +0200

    Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction into RED-5202-port-hotfixes

commit 5d87066b40b28f919b1346f5e5396b46445b4e00
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 15:25:01 2022 +0200

    remove warning log for non existent non default env var

commit 23c61ef49ef918b29952150d4a6e61b99d60ac64
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 15:14:19 2022 +0200

    make env var parser discrete

commit c1b92270354c764861da0f7782348e9cd0725d76
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Sep 12 13:28:44 2022 +0200

    fixed statefulness issue with os.environ in tests

commit ad9c5657fe93079d5646ba2b70fa091e8d2daf76
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Sep 12 13:04:55 2022 +0200

    - Adapted response formatting logic for threshold maps passed via env vars.
    - Added test for reading threshold maps and values from env vars.

commit c60e8cd6781b8e0c3ec69ccd0a25375803de26f0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 11:38:01 2022 +0200

    add parser for environment variables WIP

commit 101b71726c697f30ec9298ba62d2203bd7da2efb
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 09:52:33 2022 +0200

    Add typehints, make custom page quotient breach function private since the intention of outsourcing it from build_image_info is to make it testable separately

commit 04aee4e62781e78cd54c6d20e961dcd7bf1fc081
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 09:25:59 2022 +0200

    DotIndexable default get method exception made more specific

commit 4584e7ba66400033dc5f1a38473b644eeb11e67c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 08:55:05 2022 +0200

    RED-5202 port temporary broken image handling so the hotfix won't be lost by upgrading the service. A proper solution is still desirable (see RED-5148)

commit 5f99622646b3f6d3a842aebef91ff8e082072cd6
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 08:47:02 2022 +0200

    RED-5202 add per class customizable max image to page quotient setting for signatures, default is 0.4. Can be overwritten by , set to null to use default value or set to value that should be used.
1.10.0
2022-09-12 15:59:50 +02:00
Julius Unverfehrt
9763d2ca65 Pull request #32: RED-5202 port hotfixes
Merge in RR/image-prediction from RED-5202-port-hotfixes to master

Squashed commit of the following:

commit aaa02ea35e9c1b3b307116d7e3e32c93fd79ef5d
Merge: 5d87066 521222e
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 15:28:39 2022 +0200

    Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction into RED-5202-port-hotfixes

commit 5d87066b40b28f919b1346f5e5396b46445b4e00
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 15:25:01 2022 +0200

    remove warning log for non existent non default env var

commit 23c61ef49ef918b29952150d4a6e61b99d60ac64
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 15:14:19 2022 +0200

    make env var parser discrete

commit c1b92270354c764861da0f7782348e9cd0725d76
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Sep 12 13:28:44 2022 +0200

    fixed statefulness issue with os.environ in tests

commit ad9c5657fe93079d5646ba2b70fa091e8d2daf76
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Sep 12 13:04:55 2022 +0200

    - Adapted response formatting logic for threshold maps passed via env vars.
    - Added test for reading threshold maps and values from env vars.

commit c60e8cd6781b8e0c3ec69ccd0a25375803de26f0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 11:38:01 2022 +0200

    add parser for environment variables WIP

commit 101b71726c697f30ec9298ba62d2203bd7da2efb
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 09:52:33 2022 +0200

    Add typehints, make custom page quotient breach function private since the intention of outsourcing it from build_image_info is to make it testable separately

commit 04aee4e62781e78cd54c6d20e961dcd7bf1fc081
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 09:25:59 2022 +0200

    DotIndexable default get method exception made more specific

commit 4584e7ba66400033dc5f1a38473b644eeb11e67c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 08:55:05 2022 +0200

    RED-5202 port temporary broken image handling so the hotfix won't be lost by upgrading the service. A proper solution is still desirable (see RED-5148)

commit 5f99622646b3f6d3a842aebef91ff8e082072cd6
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Sep 12 08:47:02 2022 +0200

    RED-5202 add per class customizable max image to page quotient setting for signatures, default is 0.4. Can be overwritten by , set to null to use default value or set to value that should be used.
1.9.0
2022-09-12 15:29:47 +02:00