Merge in RR/image-prediction from RED-4329-add-prometheus to master
Squashed commit of the following:
commit 7fcf256c5277a3cfafcaf76c3116e3643ad01fa4
Merge: 8381621 c14d00c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Jun 21 15:41:14 2022 +0200
Merge branch 'master' into RED-4329-add-prometheus
commit 8381621ae08b1a91563c9c655020ec55bb58ecc5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Jun 21 15:24:50 2022 +0200
add prometheus endpoint
commit 26f07088b0a711b6f9db0974f5dfc8aa8ad4e1dc
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Jun 21 15:14:34 2022 +0200
refactor
commit c563aa505018f8a14931a16a9061d361b5d4c383
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Jun 21 15:10:19 2022 +0200
test bamboo build
commit 2b8446e703617c6897b6149846f2548ec292a9a1
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Jun 21 14:40:44 2022 +0200
RED-4329 add prometheus endpoint with summary metric
Setup
Build the base image, then the service image:
docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .
Usage
Without Docker
python scripts/run_pipeline.py /path/to/a/pdf
With Docker
Shell 1
docker run --rm --net=host image-prediction
Shell 2
python scripts/pyinfra_mock.py /path/to/a/pdf
Tests
For example, run the following command to execute all tests and produce a coverage report:
coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m
After building the service image as described above, you can also run the tests in a container:
./run_tests.sh
Message Body Formats
Request Format
Request messages must provide the fields "dossierId" and "fileId". A request should look like this:
{
  "dossierId": "<string identifier>",
  "fileId": "<string identifier>"
}
Any additional keys are ignored.
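As a minimal sketch, a request body of this shape could be assembled in Python as follows. The helper name `build_request` is hypothetical and not part of the service; it only illustrates the two required fields and the fact that extra keys are tolerated.

```python
import json


def build_request(dossier_id: str, file_id: str, **extra) -> str:
    """Serialize a request body with the two required fields.

    `build_request` is a hypothetical helper for illustration only.
    Extra keys are passed through unchanged; the service ignores them.
    """
    if not isinstance(dossier_id, str) or not isinstance(file_id, str):
        raise TypeError("dossierId and fileId must be strings")
    # Required fields are written last so that extra keys cannot shadow them.
    body = {**extra, "dossierId": dossier_id, "fileId": file_id}
    return json.dumps(body)


message = build_request("debug", "13ffa9851740c8d20c4c7d1706d72f2a")
print(message)
```

How the message is delivered to the service is outside the scope of this sketch.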
Response Format
For each image, a response body contains the identified class, the confidence of the classification, the position and size of the image, and the results of additional convenience filters, which can be configured through environment variables. A response body looks like this:
{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}
An image metadata record (an entry in the "data" field of a response body) looks like this:
{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "alpha": false,
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
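The following sketch shows how a consumer might read such a record in Python. It uses an abbreviated copy of the example record; the function name `summarize` is hypothetical. The consistency checks at the end reflect what can be observed in this one example (width and height match the coordinate differences, and the "imageFormat" quotient matches width divided by height). That the service computes the values this way is an assumption.

```python
import math

# Abbreviated copy of the example record shown above.
record = {
    "classification": {
        "label": "logo",
        "probabilities": {"logo": 1.0, "signature": 1.1599173226749333e-17},
    },
    "position": {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6},
    "geometry": {"width": 57.44999999999999, "height": 31.149999999999977},
    "filters": {
        "geometry": {"imageFormat": {"quotient": 1.8443017656500813}},
        "allPassed": True,
    },
}


def summarize(record: dict) -> tuple:
    """Return (label, confidence, all filters passed) for one record.

    Hypothetical helper; assumes the predicted label is also a key
    of the "probabilities" mapping, as in the example.
    """
    classification = record["classification"]
    label = classification["label"]
    confidence = classification["probabilities"][label]
    return label, confidence, record["filters"]["allPassed"]


label, confidence, passed = summarize(record)

# Checks that hold for the example record (assumed, not documented, relations).
pos, geo = record["position"], record["geometry"]
width_matches = math.isclose(pos["x2"] - pos["x1"], geo["width"])
height_matches = math.isclose(pos["y2"] - pos["y1"], geo["height"])
quotient_matches = math.isclose(
    geo["width"] / geo["height"],
    record["filters"]["geometry"]["imageFormat"]["quotient"],
)
print(label, confidence, passed, width_matches, height_matches, quotient_matches)
```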
Configuration
The configuration file is located at config.yaml. All relevant variables can be configured by
exporting environment variables.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "INFO" | Logging level for log file messages |
| VERBOSE | true | Service prints document processing progress to stdout |
| BATCH_SIZE | 16 | Number of images in memory simultaneously per service instance |
| RUN_ID | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from |
| MIN_REL_IMAGE_SIZE | 0.05 | Minimum permissible image size to page size ratio |
| MAX_REL_IMAGE_SIZE | 0.75 | Maximum permissible image size to page size ratio |
| MIN_IMAGE_FORMAT | 0.1 | Minimum permissible image width to height ratio |
| MAX_IMAGE_FORMAT | 10 | Maximum permissible image width to height ratio |
See also: https://git.iqser.com/projects/RED/repos/helm/browse/redaction/templates/image-service-v2
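To illustrate how the format thresholds from the table might translate into the "tooTall" and "tooWide" flags of a response, here is a rough Python sketch. It is not the service's implementation: the function names `load_thresholds` and `format_flags` are hypothetical, the sketch reads environment variables directly instead of going through config.yaml, and the mapping of "below minimum" to "tooTall" and "above maximum" to "tooWide" is an assumption.

```python
import os

# Defaults copied from the configuration table.
DEFAULTS = {
    "MIN_IMAGE_FORMAT": 0.1,
    "MAX_IMAGE_FORMAT": 10.0,
}


def load_thresholds(environ=os.environ) -> dict:
    """Read thresholds from the environment, falling back to the defaults."""
    return {name: float(environ.get(name, default)) for name, default in DEFAULTS.items()}


def format_flags(width: float, height: float, thresholds: dict) -> dict:
    """Compare the width to height ratio against the thresholds.

    Assumption: a ratio below the minimum means "too tall" and a ratio
    above the maximum means "too wide".
    """
    quotient = width / height
    return {
        "quotient": quotient,
        "tooTall": quotient < thresholds["MIN_IMAGE_FORMAT"],
        "tooWide": quotient > thresholds["MAX_IMAGE_FORMAT"],
    }


# Width and height taken from the example record in the response format section.
flags = format_flags(57.45, 31.15, load_thresholds({}))
print(flags)
```

Passing an explicit mapping such as `load_thresholds({"MAX_IMAGE_FORMAT": "5"})` makes the override behaviour easy to try without touching the real environment.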