
Setup

Build base image and service image

docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .

Usage

Without Docker

python scripts/run_pipeline.py /path/to/a/pdf

With Docker

Shell 1

docker run --rm --net=host image-prediction

Shell 2

python scripts/pyinfra_mock.py /path/to/a/pdf

Tests

For example, run the following command to execute all tests and print a coverage report:

coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m

After building the service image as described above, you can also run the tests in a container:

./run_tests.sh

Message Body Formats

Request Format

Request messages must provide the fields "dossierId" and "fileId". A request looks like this:

{
    "dossierId": "<string identifier>",
    "fileId": "<string identifier>"
}

Any additional keys are ignored.
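
As a minimal sketch of the rule above, a consumer could validate a request and drop everything except the two required fields. The helper name `parse_request` and the choice of `ValueError` are assumptions made for illustration; they are not taken from the service code.

```python
import json

# Field names taken from the request format above.
REQUIRED_FIELDS = ("dossierId", "fileId")


def parse_request(body: str) -> dict:
    """Return only the required fields of a JSON request body (hypothetical helper)."""
    message = json.loads(body)
    missing = [field for field in REQUIRED_FIELDS if field not in message]
    if missing:
        raise ValueError(f"request is missing required field(s): {', '.join(missing)}")
    # Additional keys are dropped, mirroring "Any additional keys are ignored."
    return {field: message[field] for field in REQUIRED_FIELDS}


if __name__ == "__main__":
    print(parse_request('{"dossierId": "debug", "fileId": "abc", "extra": 1}'))
```

How the actual service reacts to a missing field is not documented here, so raising an exception is only one plausible choice.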

Response Format

Response bodies report, for each image, the identified class, the confidence of the classification, the position and size of the image, and the results of additional convenience filters, which can be configured through environment variables. A response body looks like this:

{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}

An image metadata record (an entry in the "data" field of a response body) looks like this:

{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "alpha": false,
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
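
The following sketch shows one way a client might condense such a record. `summarize_record` and `SAMPLE_RECORD` are hypothetical names, and the sample is an abbreviated copy of the record above. In that sample, the "geometry" values match the extents of the "position" box (x2 - x1 and y2 - y1); this is an observation about the example, not a documented guarantee.

```python
# Abbreviated copy of the example record above (hypothetical constant name).
SAMPLE_RECORD = {
    "classification": {
        "label": "logo",
        "probabilities": {"logo": 1.0, "signature": 1.1599173226749333e-17},
    },
    "position": {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6},
    "geometry": {"width": 57.44999999999999, "height": 31.149999999999977},
    "alpha": False,
    "filters": {"allPassed": True},
}


def summarize_record(record: dict) -> dict:
    """Condense one image metadata record (hypothetical helper)."""
    classification = record["classification"]
    label = classification["label"]
    position = record["position"]
    return {
        "label": label,
        # Confidence is read as the probability assigned to the predicted label.
        "confidence": classification["probabilities"][label],
        "page": position["pageNumber"],
        # Box extents computed from the position; compare with record["geometry"].
        "width": position["x2"] - position["x1"],
        "height": position["y2"] - position["y1"],
        "passed": record["filters"]["allPassed"],
    }


if __name__ == "__main__":
    print(summarize_record(SAMPLE_RECORD))
```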

Configuration

The configuration file is config.yaml. All relevant variables can be set by exporting environment variables.

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| LOGGING_LEVEL_ROOT | "INFO" | Logging level for log file messages |
| VERBOSE | true | Service prints document processing progress to stdout |
| BATCH_SIZE | 16 | Number of images in memory simultaneously per service instance |
| RUN_ID | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from |
| MIN_REL_IMAGE_SIZE | 0.05 | Minimally permissible image size to page size ratio |
| MAX_REL_IMAGE_SIZE | 0.75 | Maximally permissible image size to page size ratio |
| MIN_IMAGE_FORMAT | 0.1 | Minimally permissible image width to height ratio |
| MAX_IMAGE_FORMAT | 10 | Maximally permissible image width to height ratio |
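
To illustrate how the four geometry thresholds could relate to the "filters" section of a response, here is a sketch under stated assumptions. The helper names, the strict comparisons, and the mapping of each flag to a threshold are guesses for illustration; the service's actual logic is not shown in this README. Only the variable names and defaults come from the configuration listed above.

```python
import os

# Defaults copied from the configuration listed above.
DEFAULTS = {
    "MIN_REL_IMAGE_SIZE": 0.05,
    "MAX_REL_IMAGE_SIZE": 0.75,
    "MIN_IMAGE_FORMAT": 0.1,
    "MAX_IMAGE_FORMAT": 10.0,
}


def load_thresholds(env=None) -> dict:
    """Read thresholds from a mapping (os.environ by default), falling back to the defaults."""
    env = os.environ if env is None else env
    return {name: float(env.get(name, default)) for name, default in DEFAULTS.items()}


def geometry_flags(size_quotient: float, format_quotient: float, thresholds: dict) -> dict:
    """Derive geometry flags from the two quotients (assumed semantics)."""
    return {
        "tooSmall": size_quotient < thresholds["MIN_REL_IMAGE_SIZE"],
        "tooLarge": size_quotient > thresholds["MAX_REL_IMAGE_SIZE"],
        # A small width-to-height ratio means a tall image, a large one a wide image.
        "tooTall": format_quotient < thresholds["MIN_IMAGE_FORMAT"],
        "tooWide": format_quotient > thresholds["MAX_IMAGE_FORMAT"],
    }


if __name__ == "__main__":
    # Quotients taken from the example image metadata record.
    print(geometry_flags(0.05975350599135938, 1.8443017656500813, load_thresholds()))
```

With the default thresholds, the quotients from the example record yield no raised flags under these assumptions, which agrees with the flags shown in that record.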

See also: https://git.iqser.com/projects/RED/repos/helm/browse/redaction/templates/image-service-v2

Description
Analysis container service for redai-image