Merge in RR/image-prediction from RED-6205-add-prometheus-monitoring to master
Squashed commit of the following:
commit 6932b5ee579a31d0317dc3f76acb8dd2845fdb4b
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 17:30:57 2023 +0100
update pyinfra
commit d6e55534623eae2edcddaa6dd333f93171d421dc
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 16:30:14 2023 +0100
set pyinfra subproject to current master commit
commit 030dc660e6060ae326c32fba8c2944a10866fbb6
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 16:25:19 2023 +0100
adapt serve script to advanced pyinfra API including monitoring of the processing time of images.
commit 0fa0c44c376c52653e517d257a35793797f7be31
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 15:19:57 2023 +0100
Update dockerfile to work with new pyinfra package setup utilizing pyproject.toml instad of setup.py and requirments.txt
commit aad53c4d313f908de93a13e69e2cb150db3be6cb
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 14:16:04 2023 +0100
remove no longer needed dependencies
Setup
Build base image
docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .
Usage
Without Docker
py scripts/run_pipeline.py /path/to/a/pdf
With Docker
Shell 1
docker run --rm --net=host image-prediction
Shell 2
python scripts/pyinfra_mock.py /path/to/a/pdf
Tests
Run for example this command to execute all tests and get a coverage report:
coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m
After having built the service container as specified above, you can also run tests in a container as follows:
./run_tests.sh
Message Body Formats
Request Format
The request messages need to provide the fields "dossierId" and "fileId". A request should look like this:
{
"dossierId": "<string identifier>",
"fileId": "<string identifier>"
}
Any additional keys are ignored.
Response Format
Response bodies contain information about the identified class of the image, the confidence of the classification, the position and size of the image as well as the results of additional convenience filters which can be configured through environment variables. A response body looks like this:
{
"dossierId": "debug",
"fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
"data": [...]
}
An image metadata record (entry in "data" field of a response body) looks like this:
{
"classification": {
"label": "logo",
"probabilities": {
"logo": 1.0,
"signature": 1.1599173226749333e-17,
"other": 2.994595513398207e-23,
"formula": 4.352109377281029e-31
}
},
"position": {
"x1": 475.95,
"x2": 533.4,
"y1": 796.47,
"y2": 827.62,
"pageNumber": 6
},
"geometry": {
"width": 57.44999999999999,
"height": 31.149999999999977
},
"alpha": false,
"filters": {
"geometry": {
"imageSize": {
"quotient": 0.05975350599135938,
"tooLarge": false,
"tooSmall": false
},
"imageFormat": {
"quotient": 1.8443017656500813,
"tooTall": false,
"tooWide": false
}
},
"probability": {
"unconfident": false
},
"allPassed": true
}
}
Configuration
A configuration file is located under config.yaml. All relevant variables can be configured via
exporting environment variables.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "INFO" | Logging level for log file messages |
| VERBOSE | true | Service prints document processing progress to stdout |
| BATCH_SIZE | 16 | Number of images in memory simultaneously per service instance |
| RUN_ID | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from |
| MIN_REL_IMAGE_SIZE | 0.05 | Minimally permissible image size to page size ratio |
| MAX_REL_IMAGE_SIZE | 0.75 | Maximally permissible image size to page size ratio |
| MIN_IMAGE_FORMAT | 0.1 | Minimally permissible image width to height ratio |
| MAX_IMAGE_FORMAT | 10 | Maximally permissible image width to height ratio |
See also: https://git.iqser.com/projects/RED/repos/helm/browse/redaction/templates/image-service-v2