### Setup

Build the base image, then the service image:

```bash
docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .
```

### Usage

#### Without Docker

```bash
py scripts/run_pipeline.py /path/to/a/pdf
```

#### With Docker

Shell 1

```bash
docker run --rm --net=host image-prediction
```

Shell 2

```bash
python scripts/pyinfra_mock.py /path/to/a/pdf
```

### Message Body Formats

#### Request Format

Request messages must provide the fields `"dossierID"` and `"fileID"`. The file to be processed is assumed to be located in the MinIO store under `redaction//.ORIG.pdf.gz`. A request should look like this:

```json
{
  "dossierID": "",
  "fileID": ""
}
```

Any additional keys are ignored.

#### Response Format

Response bodies contain the identified class of the image, the confidence of the classification, the position and size of the image, and the results of additional convenience filters, which can be configured through environment variables. A response body looks like this:

```json
{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}
```

An image metadata record (an entry in the `"data"` field of a response body) looks like this:

```json
{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
```

## Configuration

A configuration file is located under `incl/image_service/config.yaml`.
All relevant variables can be configured by exporting environment variables.

| __Environment Variable__ | Default                            | Description                                                     |
|--------------------------|------------------------------------|-----------------------------------------------------------------|
| __LOGGING_LEVEL_ROOT__   | "INFO"                             | Logging level for log file messages                             |
| __VERBOSE__              | *true*                             | Service prints document processing progress to stdout           |
| __BATCH_SIZE__           | 32                                 | Number of images in memory simultaneously per service instance  |
| __RUN_ID__               | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from      |
| __MIN_REL_IMAGE_SIZE__   | 0.05                               | Minimally permissible image size to page size ratio             |
| __MAX_REL_IMAGE_SIZE__   | 0.75                               | Maximally permissible image size to page size ratio             |
| __MIN_IMAGE_FORMAT__     | 0.1                                | Minimally permissible image width to height ratio               |
| __MAX_IMAGE_FORMAT__     | 10                                 | Maximally permissible image width to height ratio               |

See also: https://git.iqser.com/projects/RED/repos/helm/browse/redaction/templates/image-service-v2

## Liveness and Readiness

The service runs a webserver on `0.0.0.0:8080` which responds to GET requests on `0.0.0.0:8080/ready` and `0.0.0.0:8080/health` with the status of the service (status code 200 for nominal status).

Each service instance is monitored independently. A request to `0.0.0.0:8080` is forwarded to subordinate webservers, each coupled to exactly one service instance. The responses of the subordinate webservers are aggregated under either a universal (all) or an existential (any) quantifier (see `CHECK_QUANTIFIER`).

Note that checks are evaluated lazily: aggregation stops as soon as the result is determined. Missing check logs from some subordinate webservers are therefore not unexpected when using an existential quantifier.
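
The lazy aggregation of per-instance checks can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's actual implementation: the function `aggregate_checks`, the helper `make_check`, the instance names, and the quantifier values `"all"` and `"any"` are hypothetical. This README only states that `CHECK_QUANTIFIER` selects between a universal and an existential quantifier.

```python
from typing import Callable, Iterable, List


def aggregate_checks(checks: Iterable[Callable[[], bool]], quantifier: str = "all") -> bool:
    """Aggregate per-instance check results lazily (hypothetical helper)."""
    # Generator expression: a check only runs when all()/any() asks for it.
    results = (check() for check in checks)
    if quantifier == "all":
        return all(results)  # stops at the first failing check
    if quantifier == "any":
        return any(results)  # stops at the first passing check
    raise ValueError(f"unknown quantifier: {quantifier!r}")


# Record which instances were actually queried (stand-in for check logs).
calls: List[str] = []


def make_check(name: str, healthy: bool) -> Callable[[], bool]:
    """Build a fake check for one service instance (illustration only)."""
    def check() -> bool:
        calls.append(name)
        return healthy
    return check


checks = [
    make_check("instance-0", True),
    make_check("instance-1", True),
    make_check("instance-2", False),
]

status_any = aggregate_checks(checks, "any")
called_any = list(calls)

calls.clear()
status_all = aggregate_checks(checks, "all")
called_all = list(calls)

print(status_any, called_any)
print(status_all, called_all)
```

In this sketch the existential quantifier queries only `instance-0`, because `any` stops at the first passing check, which is why logs from the remaining instances can be absent. The universal quantifier has to query every instance until one fails.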