Merge in RR/image-prediction from RED-5202-port-hotfixes to master
Squashed commit of the following:
commit 9674901235264de6b74d679fd39a52775ac4aee1
Merge: ec2ab89 9763d2c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 15:55:58 2022 +0200
Merge remote-tracking branch 'origin' into RED-5202-port-hotfixes
commit ec2ab890b8307942d147d6b8b236f6a3c1d0aebc
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 15:49:17 2022 +0200
swap case when the log is printed for env var parsing
commit aaa02ea35e9c1b3b307116d7e3e32c93fd79ef5d
Merge: 5d87066 521222e
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 15:28:39 2022 +0200
Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction into RED-5202-port-hotfixes
commit 5d87066b40b28f919b1346f5e5396b46445b4e00
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 15:25:01 2022 +0200
remove warning log for non existent non default env var
commit 23c61ef49ef918b29952150d4a6e61b99d60ac64
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 15:14:19 2022 +0200
make env var parser discrete
commit c1b92270354c764861da0f7782348e9cd0725d76
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date: Mon Sep 12 13:28:44 2022 +0200
fixed statefulness issue with os.environ in tests
commit ad9c5657fe93079d5646ba2b70fa091e8d2daf76
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date: Mon Sep 12 13:04:55 2022 +0200
- Adapted response formatting logic for threshold maps passed via env vars.
- Added test for reading threshold maps and values from env vars.
commit c60e8cd6781b8e0c3ec69ccd0a25375803de26f0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 11:38:01 2022 +0200
add parser for environment variables WIP
commit 101b71726c697f30ec9298ba62d2203bd7da2efb
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 09:52:33 2022 +0200
Add type hints, make custom page quotient breach function private since the intention of outsourcing it from build_image_info is to make it testable separately
commit 04aee4e62781e78cd54c6d20e961dcd7bf1fc081
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 09:25:59 2022 +0200
DotIndexable default get method exception made more specific
commit 4584e7ba66400033dc5f1a38473b644eeb11e67c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 08:55:05 2022 +0200
RED-5202 port temporary broken image handling so the hotfix won't be lost by upgrading the service. A proper solution is still desirable (see RED-5148)
commit 5f99622646b3f6d3a842aebef91ff8e082072cd6
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Mon Sep 12 08:47:02 2022 +0200
RED-5202 add per class customizable max image to page quotient setting for signatures, default is 0.4. Can be overwritten by , set to null to use default value or set to value that should be used.
Setup
Build the base image, then the service image:
docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .
Usage
Without Docker
python scripts/run_pipeline.py /path/to/a/pdf
With Docker
Shell 1
docker run --rm --net=host image-prediction
Shell 2
python scripts/pyinfra_mock.py /path/to/a/pdf
Tests
For example, run the following command to execute all tests and print a coverage report:
coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m
After building the service image as described above, you can also run the tests in a container:
./run_tests.sh
Message Body Formats
Request Format
Request messages must provide the fields "dossierId" and "fileId". A request looks like this:
{
  "dossierId": "<string identifier>",
  "fileId": "<string identifier>"
}
Any additional keys are ignored.
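The request contract above can be sketched in a few lines of Python. This is an illustration only, not the service's actual parsing code: the function name `parse_request` and the choice to raise `ValueError` on a missing field are assumptions made for this example.

```python
import json

# The two fields the README names as required.
REQUIRED_FIELDS = ("dossierId", "fileId")


def parse_request(body: str) -> dict:
    """Return the required fields of a request body (hypothetical helper).

    Additional keys are dropped, mirroring "Any additional keys are ignored".
    """
    message = json.loads(body)
    missing = [field for field in REQUIRED_FIELDS if field not in message]
    if missing:
        # Assumption: a missing field is treated as an error.
        raise ValueError(f"request is missing required fields: {missing}")
    return {field: message[field] for field in REQUIRED_FIELDS}


if __name__ == "__main__":
    raw = '{"dossierId": "debug", "fileId": "abc", "comment": "ignored"}'
    print(parse_request(raw))
```

Keeping only the known fields makes it explicit that extra keys never influence processing.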
Response Format
A response body describes each image found in the document: its identified class, the confidence of the classification, its position and size, and the results of additional convenience filters, which can be configured through environment variables. A response body looks like this:
{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}
An image metadata record (an entry in the "data" field of a response body) looks like this:
{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "alpha": false,
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
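A consumer of the response might reduce such a record to the few values it needs. The sketch below is a hypothetical client-side helper (`summarize` is not part of the service) and uses a trimmed copy of the sample record. In this sample, `width` and `height` match `x2 - x1` and `y2 - y1`, and the `imageFormat` quotient matches width divided by height; that this holds for every record is an assumption.

```python
# Trimmed copy of the sample record above; only the fields used here are kept.
record = {
    "classification": {
        "label": "logo",
        "probabilities": {"logo": 1.0, "signature": 1.1599173226749333e-17},
    },
    "position": {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6},
    "geometry": {"width": 57.44999999999999, "height": 31.149999999999977},
    "filters": {"allPassed": True},
}


def summarize(record: dict) -> dict:
    """Extract label, confidence, page and filter verdict (hypothetical helper)."""
    label = record["classification"]["label"]
    geometry = record["geometry"]
    return {
        "label": label,
        # Probability assigned to the predicted label.
        "confidence": record["classification"]["probabilities"][label],
        "page": record["position"]["pageNumber"],
        # Width-to-height ratio, recomputed from the geometry block.
        "aspect_ratio": geometry["width"] / geometry["height"],
        "passed": record["filters"]["allPassed"],
    }


if __name__ == "__main__":
    print(summarize(record))
```

A client that only wants trustworthy detections could keep the records whose `passed` value is true and discard the rest.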
Configuration
The configuration file is config.yaml. All relevant variables can be overridden by
exporting environment variables.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "INFO" | Logging level for log file messages |
| VERBOSE | true | Service prints document processing progress to stdout |
| BATCH_SIZE | 16 | Number of images in memory simultaneously per service instance |
| RUN_ID | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from |
| MIN_REL_IMAGE_SIZE | 0.05 | Minimally permissible image size to page size ratio |
| MAX_REL_IMAGE_SIZE | 0.75 | Maximally permissible image size to page size ratio |
| MIN_IMAGE_FORMAT | 0.1 | Minimally permissible image width to height ratio |
| MAX_IMAGE_FORMAT | 10 | Maximally permissible image width to height ratio |
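
The override behaviour described by the table can be sketched as follows. This is a minimal illustration, not the service's own environment variable parser: `DEFAULTS`, `_convert` and `load_settings` are hypothetical names, and the accepted spellings for boolean values are an assumption.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "LOGGING_LEVEL_ROOT": "INFO",
    "VERBOSE": True,
    "BATCH_SIZE": 16,
    "RUN_ID": "fabfb1f192c745369b88cab34471aba7",
    "MIN_REL_IMAGE_SIZE": 0.05,
    "MAX_REL_IMAGE_SIZE": 0.75,
    "MIN_IMAGE_FORMAT": 0.1,
    "MAX_IMAGE_FORMAT": 10.0,
}


def _convert(raw: str, default):
    """Cast a raw environment string to the type of its default value."""
    if isinstance(default, bool):  # bool must be checked before int
        return raw.strip().lower() in ("1", "true", "yes")
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, float):
        return float(raw)
    return raw


def load_settings(environ=None) -> dict:
    """Return the defaults, overridden by any variables set in the environment."""
    environ = os.environ if environ is None else environ
    return {
        name: _convert(environ[name], default) if name in environ else default
        for name, default in DEFAULTS.items()
    }


if __name__ == "__main__":
    settings = load_settings({"BATCH_SIZE": "32", "VERBOSE": "false"})
    print(settings["BATCH_SIZE"], settings["VERBOSE"])
```

Passing the environment as an argument, rather than reading `os.environ` directly, keeps the function free of global state and easy to test.
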
See also: https://git.iqser.com/projects/RED/repos/helm/browse/redaction/templates/image-service-v2