
Setup

Build the base image and the service image

docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .

Usage

Without Docker

python scripts/run_pipeline.py /path/to/a/pdf

With Docker

Shell 1

docker run --rm --net=host image-prediction

Shell 2

python scripts/pyinfra_mock.py /path/to/a/pdf

Tests

For example, run the following command to execute all tests and produce a coverage report:

coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m

After building the service image as described above, you can also run the tests in a container:

./run_tests.sh

Message Body Formats

Request Format

Request messages must provide the fields "dossierId" and "fileId". A request looks like this:

{
    "dossierId": "<string identifier>",
    "fileId": "<string identifier>"
}

Any additional keys are ignored.
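
The request contract above can be illustrated with a small Python sketch. The helper names `build_request` and `parse_request` are hypothetical and not part of the service; they only show one way to produce a valid body and to drop extra keys.

```python
import json

# The two fields the README names as required.
REQUIRED_FIELDS = ("dossierId", "fileId")


def build_request(dossier_id: str, file_id: str) -> str:
    """Serialize a request body containing the two required fields."""
    return json.dumps({"dossierId": dossier_id, "fileId": file_id})


def parse_request(body: str) -> dict:
    """Return the required fields of a request; raise if one is missing."""
    message = json.loads(body)
    missing = [key for key in REQUIRED_FIELDS if key not in message]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Extra keys are dropped here, mirroring "any additional keys are ignored".
    return {key: message[key] for key in REQUIRED_FIELDS}
```

How the service itself reacts to a missing field is not documented here; raising `ValueError` is an assumption made for the sketch.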

Response Format

A response body reports, for each image found, the predicted class, the confidence of the classification, the position and size of the image, and the results of additional convenience filters, which can be configured through environment variables. A response body looks like this:

{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}

An image metadata record (entry in "data" field of a response body) looks like this:

{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "alpha": false,
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
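
As a rough sketch of how a consumer might read such a record, the Python snippet below extracts the predicted label with its probability and recomputes the geometry from the position. The helper names are hypothetical. The relations `width = x2 - x1`, `height = y2 - y1` and `imageFormat.quotient = width / height` are inferred from the example values above, not confirmed by the service code.

```python
import math

# Abbreviated copy of the example record shown above.
record = {
    "classification": {
        "label": "logo",
        "probabilities": {"logo": 1.0, "signature": 1.1599173226749333e-17},
    },
    "position": {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6},
    "geometry": {"width": 57.44999999999999, "height": 31.149999999999977},
    "filters": {"allPassed": True},
}


def predicted_label(rec: dict) -> tuple:
    """Return the predicted label and the probability assigned to it."""
    label = rec["classification"]["label"]
    return label, rec["classification"]["probabilities"][label]


def geometry_from_position(position: dict) -> dict:
    """Derive width and height from the corner coordinates (assumed relation)."""
    return {
        "width": position["x2"] - position["x1"],
        "height": position["y2"] - position["y1"],
    }


label, probability = predicted_label(record)
derived = geometry_from_position(record["position"])

# Compare with a tolerance, since the values are floating-point numbers.
assert math.isclose(derived["width"], record["geometry"]["width"])
assert math.isclose(derived["height"], record["geometry"]["height"])

# Width-to-height ratio; compare with "imageFormat.quotient" in the example.
format_quotient = derived["width"] / derived["height"]
```

A consumer that only needs a yes/no decision can rely on `record["filters"]["allPassed"]` instead of inspecting the individual filters.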

Configuration

The configuration file is config.yaml. All relevant variables can also be set by exporting environment variables.

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| LOGGING_LEVEL_ROOT | "INFO" | Logging level for log file messages |
| VERBOSE | true | Service prints document processing progress to stdout |
| BATCH_SIZE | 16 | Number of images in memory simultaneously per service instance |
| RUN_ID | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from |
| MIN_REL_IMAGE_SIZE | 0.05 | Minimum permissible ratio of image size to page size |
| MAX_REL_IMAGE_SIZE | 0.75 | Maximum permissible ratio of image size to page size |
| MIN_IMAGE_FORMAT | 0.1 | Minimum permissible ratio of image width to height |
| MAX_IMAGE_FORMAT | 10 | Maximum permissible ratio of image width to height |
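
To make the relation between the four geometry thresholds and the "filters.geometry" block of a response concrete, here is a minimal Python sketch. It is not the service's implementation: `read_thresholds` and `geometry_filters` are hypothetical names, the strict comparisons are an assumption, and so is the mapping of "tooTall" to a width-to-height ratio below MIN_IMAGE_FORMAT and "tooWide" to a ratio above MAX_IMAGE_FORMAT.

```python
import os

# Defaults copied from the configuration table above.
DEFAULTS = {
    "MIN_REL_IMAGE_SIZE": 0.05,
    "MAX_REL_IMAGE_SIZE": 0.75,
    "MIN_IMAGE_FORMAT": 0.1,
    "MAX_IMAGE_FORMAT": 10.0,
}


def read_thresholds(environ=None) -> dict:
    """Read thresholds from the environment, falling back to the defaults."""
    environ = os.environ if environ is None else environ
    return {name: float(environ.get(name, default)) for name, default in DEFAULTS.items()}


def geometry_filters(size_quotient: float, format_quotient: float, thresholds: dict) -> dict:
    """Evaluate the size and format filters (assumed strict comparisons)."""
    return {
        "imageSize": {
            "quotient": size_quotient,
            "tooLarge": size_quotient > thresholds["MAX_REL_IMAGE_SIZE"],
            "tooSmall": size_quotient < thresholds["MIN_REL_IMAGE_SIZE"],
        },
        "imageFormat": {
            "quotient": format_quotient,
            "tooTall": format_quotient < thresholds["MIN_IMAGE_FORMAT"],
            "tooWide": format_quotient > thresholds["MAX_IMAGE_FORMAT"],
        },
    }


# Quotients taken from the example record in the response format section.
filters = geometry_filters(0.05975350599135938, 1.8443017656500813, read_thresholds({}))
```

Passing an explicit mapping such as `read_thresholds({"MIN_REL_IMAGE_SIZE": "0.1"})` keeps the sketch independent of the process environment, which also makes it easy to test.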

See also: https://git.iqser.com/projects/RED/repos/helm/browse/redaction/templates/image-service-v2

Description
Analysis container service for redai-image