Merge in RR/image-prediction from docstrfix to master
Squashed commit of the following:
commit 8ccb07037074cc88ba5b72e4bedd5bc346eb0256
Merge: 77cd0a8 5d611d5
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date: Mon Jul 4 11:50:52 2022 +0200
Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction into docstrfix
commit 77cd0a860a69bfb8f4390dabdca23455b340bd9e
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date: Mon Jul 4 11:46:25 2022 +0200
fixed docstring
commit eb53464ca9f1ccf881d90ece592ad50226decd7a
Merge: 4efb9c7 fd0e4dc
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date: Tue Jun 21 15:22:03 2022 +0200
Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction
commit 4efb9c79b10f23fa556ce43c8e7f05944dae1af6
Merge: 84a8b0a 9f18ef9
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date: Thu May 12 11:51:30 2022 +0200
Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction
commit 84a8b0a290081616240c3876f8db8a1ae8592096
Merge: 1624ee4 6030f40
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date: Thu May 12 10:18:56 2022 +0200
Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction
commit 1624ee40376b84a4519025343f913120c464407a
Author: Matthias Bisping <Matthias.Bisping@iqser.com>
Date: Mon Apr 25 16:51:13 2022 +0200
Pull request #11: fixed assignment
Merge in RR/image-prediction from image_prediction_service_overhaul_xref_and_empty_result_fix_fix to master
Squashed commit of the following:
commit 7312e57d1127b081bfdc6e96311e8348d3f8110d
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date: Mon Apr 25 16:45:12 2022 +0200
logging setup changed
commit 955e353d74f414ee2d57b234bdf84d32817d14bf
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date: Mon Apr 25 16:37:52 2022 +0200
fixed assignment
Setup
Build the base image and the service image
docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .
Usage
Without Docker
python scripts/run_pipeline.py /path/to/a/pdf
With Docker
Shell 1
docker run --rm --net=host image-prediction
Shell 2
python scripts/pyinfra_mock.py /path/to/a/pdf
Tests
For example, run the following command to execute all tests and print a coverage report:
coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m
After building the service image as described above, you can also run the tests in a container:
./run_tests.sh
Message Body Formats
Request Format
Request messages must provide the fields "dossierId" and "fileId". A request should look like this:
{
  "dossierId": "<string identifier>",
  "fileId": "<string identifier>"
}
Any additional keys are ignored.
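The request contract above can be illustrated with a minimal sketch. It assumes the body arrives as a JSON string; `parse_request` and `REQUIRED_FIELDS` are hypothetical names chosen for illustration and are not part of the service.

```python
import json

# Hypothetical helper names; only the two field names come from this README.
REQUIRED_FIELDS = ("dossierId", "fileId")


def parse_request(body: str) -> dict:
    """Return the required fields of a request body, or raise ValueError."""
    message = json.loads(body)
    missing = [key for key in REQUIRED_FIELDS if key not in message]
    if missing:
        raise ValueError(f"request is missing required fields: {missing}")
    # Extra keys are dropped, mirroring "any additional keys are ignored".
    return {key: message[key] for key in REQUIRED_FIELDS}


request = parse_request('{"dossierId": "debug", "fileId": "abc", "extra": 1}')
print(request)
```

Rejecting incomplete messages early keeps the failure close to the sender; whether the service itself raises or silently skips such messages is not stated here.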
Response Format
A response body contains, for each image, the predicted class and the confidence of the classification, the position and size of the image, and the results of additional convenience filters that can be configured through environment variables. A response body looks like this:
{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}
An image metadata record (an entry in the "data" field of a response body) looks like this:
{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "alpha": false,
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
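A consumer of such a record might reduce it to a few fields, roughly as sketched below. The `summarize` helper is hypothetical. The sketch assumes that "label" is always a key of "probabilities" and that "geometry" equals the extent of the "position" box; both hold for the example record but are not guaranteed by this README.

```python
import math

# The example record from above, written as a Python literal.
record = {
    "classification": {
        "label": "logo",
        "probabilities": {
            "logo": 1.0,
            "signature": 1.1599173226749333e-17,
            "other": 2.994595513398207e-23,
            "formula": 4.352109377281029e-31,
        },
    },
    "position": {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6},
    "geometry": {"width": 57.44999999999999, "height": 31.149999999999977},
    "alpha": False,
    "filters": {
        "geometry": {
            "imageSize": {"quotient": 0.05975350599135938, "tooLarge": False, "tooSmall": False},
            "imageFormat": {"quotient": 1.8443017656500813, "tooTall": False, "tooWide": False},
        },
        "probability": {"unconfident": False},
        "allPassed": True,
    },
}


def summarize(record: dict) -> dict:
    """Hypothetical helper: condense one image metadata record."""
    label = record["classification"]["label"]
    position, geometry = record["position"], record["geometry"]
    # Assumption: width and height are the extents of the position box.
    width = position["x2"] - position["x1"]
    height = position["y2"] - position["y1"]
    return {
        "label": label,
        "confidence": record["classification"]["probabilities"][label],
        "page": position["pageNumber"],
        "geometry_consistent": math.isclose(width, geometry["width"], rel_tol=1e-9)
        and math.isclose(height, geometry["height"], rel_tol=1e-9),
        "all_passed": record["filters"]["allPassed"],
    }


summary = summarize(record)
print(summary)
```

In the example record, the "imageFormat" quotient also matches width divided by height, which is consistent with the width to height ratio described in the configuration table.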
Configuration
The configuration file is located at config.yaml. All relevant variables can be configured by exporting environment variables.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "INFO" | Logging level for log file messages |
| VERBOSE | true | Service prints document processing progress to stdout |
| BATCH_SIZE | 16 | Number of images in memory simultaneously per service instance |
| RUN_ID | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from |
| MIN_REL_IMAGE_SIZE | 0.05 | Minimum permissible ratio of image size to page size |
| MAX_REL_IMAGE_SIZE | 0.75 | Maximum permissible ratio of image size to page size |
| MIN_IMAGE_FORMAT | 0.1 | Minimum permissible ratio of image width to image height |
| MAX_IMAGE_FORMAT | 10 | Maximum permissible ratio of image width to image height |
See also: https://git.iqser.com/projects/RED/repos/helm/browse/redaction/templates/image-service-v2
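The way environment variables could override the defaults in the table, and how the four geometry thresholds might map to the filter flags in a response, can be sketched as follows. This is an illustration only: `load_settings` and `geometry_flags` are hypothetical helpers, the sketch does not read config.yaml, and the use of strict comparisons is an assumption rather than documented service behaviour.

```python
import os

# Defaults copied from the table above (MAX_IMAGE_FORMAT written as a float
# so that fractional overrides parse cleanly).
DEFAULTS = {
    "LOGGING_LEVEL_ROOT": "INFO",
    "VERBOSE": True,
    "BATCH_SIZE": 16,
    "RUN_ID": "fabfb1f192c745369b88cab34471aba7",
    "MIN_REL_IMAGE_SIZE": 0.05,
    "MAX_REL_IMAGE_SIZE": 0.75,
    "MIN_IMAGE_FORMAT": 0.1,
    "MAX_IMAGE_FORMAT": 10.0,
}


def load_settings(environ=None) -> dict:
    """Hypothetical loader: environment values override the defaults."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(name)
        if raw is None:
            settings[name] = default
        elif isinstance(default, bool):
            # Checked before the generic cast because bool("false") is True.
            settings[name] = raw.strip().lower() in ("1", "true", "yes")
        else:
            settings[name] = type(default)(raw)
    return settings


def geometry_flags(size_quotient: float, format_quotient: float, settings: dict) -> dict:
    """Assumed mapping from quotients to flags; strictness is a guess."""
    return {
        "tooSmall": size_quotient < settings["MIN_REL_IMAGE_SIZE"],
        "tooLarge": size_quotient > settings["MAX_REL_IMAGE_SIZE"],
        "tooTall": format_quotient < settings["MIN_IMAGE_FORMAT"],
        "tooWide": format_quotient > settings["MAX_IMAGE_FORMAT"],
    }


# An explicit mapping stands in for os.environ to keep the example reproducible.
settings = load_settings({"BATCH_SIZE": "32"})
flags = geometry_flags(0.05975350599135938, 1.8443017656500813, settings)
print(settings["BATCH_SIZE"], flags)
```

With the default thresholds, the quotients from the example record yield no raised flags, which agrees with the "filters" entry shown in the response format section.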