Matthias Bisping
55bdd48d2a
Update dependencies
RED-6084-adhoc-scanned-pages-filtering-refactoring_12
RED-6084-adhoc-scanned-pages-filtering-refactoring_13
2023-02-09 15:47:31 +01:00
Matthias Bisping
970275b257
Refactoring
...
Make alpha channel check monadic to streamline error handling
2023-02-09 15:35:37 +01:00
Matthias Bisping
e99e97e23f
Refactoring
...
- Rename
- Refactor image extraction functions
2023-02-07 14:35:58 +01:00
Matthias Bisping
76b1b0ca24
Refactoring
2023-02-07 11:55:30 +01:00
Matthias Bisping
cb1c461049
Refactoring
2023-02-07 11:44:01 +01:00
Matthias Bisping
092069221a
Add to-do
2023-02-07 10:18:53 +01:00
Matthias Bisping
3cea4dad2d
Refactoring
...
- Rename
- Add typehints everywhere
2023-02-07 10:16:25 +01:00
Matthias Bisping
865e0819a1
Add type explanation
2023-02-06 19:39:00 +01:00
Matthias Bisping
01d3d5d33f
Formatting
2023-02-06 19:37:49 +01:00
Matthias Bisping
dffe1c18fc
[WIP] Either refactoring
...
Add alternative formulation for monadic chain
2023-02-06 19:34:56 +01:00
Matthias Bisping
066cf17add
[WIP] Either refactoring
2023-02-06 18:40:42 +01:00
Matthias Bisping
f53f0fea29
[WIP] Either refactoring
...
Propagate error and metadata
2023-02-06 18:18:36 +01:00
Matthias Bisping
274a5f56d4
[WIP] Either refactoring
...
Fix test assertion
2023-02-06 17:51:45 +01:00
Matthias Bisping
3235a857f6
[WIP] Either-refactoring
...
Replace Maybe with Either to allow passing on error information or
metadata which otherwise get sucked up by Nothing.
2023-02-06 16:57:45 +01:00
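Note on the Maybe -> Either change described in 3235a857f6: the sketch below illustrates the general idea only and is not the code from this repository. The `Left`/`Right` classes, the check functions and the dict-based image metadatum are all hypothetical stand-ins, chosen so the example runs without third-party libraries. With Maybe, a failed step collapses to Nothing and the reason is lost; with Either, the failing step returns a Left that carries the error or metadata through the rest of the chain.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Right:
    """Success case: wraps a value that later steps may keep processing."""
    value: Any

    def bind(self, f: Callable[[Any], Any]) -> Any:
        # Apply the next step to the wrapped value.
        return f(self.value)


@dataclass(frozen=True)
class Left:
    """Failure case: wraps the error information (Nothing would drop it)."""
    error: Any

    def bind(self, f: Callable[[Any], Any]) -> "Left":
        # Short-circuit: skip all later steps and keep the original error.
        return self


# Hypothetical validation steps; the real checks in the repository may differ.
def check_not_empty(image: dict):
    if image["width"] * image["height"] == 0:
        return Left({"xref": image["xref"], "reason": "empty image"})
    return Right(image)


def check_no_alpha(image: dict):
    if image["channels"] == 4:
        return Left({"xref": image["xref"], "reason": "alpha channel"})
    return Right(image)


def validate(image: dict):
    """Chain the checks; the first Left encountered is the final result."""
    return Right(image).bind(check_not_empty).bind(check_no_alpha)


ok = validate({"xref": 1, "width": 8, "height": 8, "channels": 3})
bad = validate({"xref": 2, "width": 8, "height": 8, "channels": 4})
```

Here `ok` is a `Right` holding the unchanged metadatum, while `bad` is a `Left` whose payload still names the offending xref and the reason, which is the information a Maybe-based chain would have discarded.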
Matthias Bisping
89989543d8
[WIP] Monadic refactoring
...
Integrate image validation step into monadic chain.
At the moment we lose the error information through this. Refactoring to
the Either monad can bring it back.
2023-02-06 16:12:41 +01:00
Matthias Bisping
022bd4856a
[WIP] Monadic refactoring
2023-02-06 15:16:41 +01:00
Matthias Bisping
ca3898cb53
[WIP] Monadic refactoring
2023-02-06 15:10:34 +01:00
Matthias Bisping
d8f37bed5c
[WIP] Monadic refactoring
2023-02-06 15:09:51 +01:00
Matthias Bisping
906fee0e5d
[WIP] Monadic refactoring
2023-02-06 15:03:35 +01:00
Matthias Bisping
4e3168e51c
[WIP] Monadic refactoring
2023-02-06 14:36:25 +01:00
Matthias Bisping
f645984ea4
Update dependencies
2023-02-06 13:25:07 +01:00
Matthias Bisping
0cf8e047c5
Refactoring
2023-02-06 13:22:33 +01:00
Matthias Bisping
112e18ebb5
Tweak logging
2023-02-06 13:21:41 +01:00
Matthias Bisping
1d1eb8b649
Track missing test data files
2023-02-06 13:21:34 +01:00
Matthias Bisping
0244ba7f17
Make test for bad xref work
RED-6084-adhoc-scanned-pages-filtering-alternative_17
2023-02-06 12:18:25 +01:00
Matthias Bisping
825099d946
Replace bad-xref file
2023-02-06 11:47:42 +01:00
Matthias Bisping
f6dbfcab43
Add test for handling of bad xrefs
2023-02-06 11:31:43 +01:00
Matthias Bisping
e63f66a126
Refactoring
...
- Rename metadata -> metadatum in some more places to make it clear that
it is the metadata of a single image in that context
- Re-order function definitions according to caller hierarchy
2023-02-06 10:46:56 +01:00
Matthias Bisping
6136bf57d4
Start tracking test/data with DVC
2023-02-06 10:07:16 +01:00
Matthias Bisping
290a8de3e3
Stop tracking test/data
2023-02-06 10:06:43 +01:00
Julius Unverfehrt
4d43e385c5
replace image extraction logic final
RED-6084-adhoc-scanned-pages-filtering-alternative_16
RED-6189-bugfix_2
test
2023-02-06 09:43:28 +01:00
Julius Unverfehrt
bd0279ddd1
introduce normalizing function for image extraction
2023-02-03 12:25:27 +01:00
Julius Unverfehrt
2995d5ee48
refactoring
2023-02-03 11:14:14 +01:00
Julius Unverfehrt
eff1bb4124
adjust behavior of filtering of invalid images
1.20.6
RED-6084-adhoc-scanned-pages-filtering-alternative_14
2023-02-03 09:04:02 +01:00
Julius Unverfehrt
c478333111
add log in callback to display which file is processed
2023-02-03 08:25:36 +01:00
Julius Unverfehrt
978f48e8f9
add ad hoc logic for bad xref handling
1.20.5
RED-6084-adhoc-scanned-pages-filtering-alternative_8
2023-02-02 15:39:44 +01:00
Julius Unverfehrt
94652aafe4
beautify
2023-02-02 15:26:33 +01:00
Julius Unverfehrt
c4416636c0
beautify
1.20.4
RED-6084-adhoc-scanned-pages-filtering-alternative_6
2023-02-02 14:10:32 +01:00
Julius Unverfehrt
c0b41e77b8
implement ad hoc channel count detection for new image extraction
RED-6084-adhoc-scanned-pages-filtering-alternative_5
2023-02-02 13:57:56 +01:00
Julius Unverfehrt
73f7491c8f
improve performance
...
- disable scanned page filter, since dropping scanned pages prevents the
computation of the image hash and the frontend OCR hint, which are both
wanted
- optimize image extraction by using arrays instead of byte streams for
the conversion to PIL images
RED-6084-adhoc-scanned-pages-filtering-alternative_4
2023-02-02 13:37:03 +01:00
Julius Unverfehrt
2385584dcb
refactor scanned page filtering
1.20.3
RED-6084-adhoc-scanned-pages-filtering-alternative_2
2023-02-01 15:49:36 +01:00
Julius Unverfehrt
b880e892ec
refactor scanned page filtering WIP
2023-02-01 15:47:40 +01:00
Julius Unverfehrt
8c7349c2d1
refactor scanned page filtering WIP
2023-02-01 15:36:16 +01:00
Julius Unverfehrt
c55777e339
refactor scanned page filtering WIP
2023-02-01 15:16:12 +01:00
Julius Unverfehrt
0f440bdb09
refactor scanned page filtering WIP
2023-02-01 15:14:27 +01:00
Julius Unverfehrt
436a32ad2b
refactor scanned page filtering WIP
2023-02-01 15:07:35 +01:00
Julius Unverfehrt
9ec6cc19ba
refactor scanned page filtering WIP
2023-02-01 14:53:26 +01:00
Julius Unverfehrt
2d385b0a73
refactor scanned page filtering WIP
2023-02-01 14:38:55 +01:00
Julius Unverfehrt
5bd5e0cf2b
refactor
...
- reduce code duplication by adapting functions of the module
- use the module's enums for image metadata
- improve readability of the scanned page detection heuristic
RED-6084-adhoc-scanned-pages-filtering_7
2023-02-01 12:43:59 +01:00
Julius Unverfehrt
876260f403
improve the readability of variable names and docstrings
RED-6084-adhoc-scanned-pages-filtering_6
2023-02-01 10:08:36 +01:00