image-classification-service


Julius Unverfehrt 81520b1a53 Pull request #46 : RED-6205 add prometheus monitoring

Merge in RR/image-prediction from RED-6205-add-prometheus-monitoring to master

Squashed commit of the following:

commit 6932b5ee579a31d0317dc3f76acb8dd2845fdb4b
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 17:30:57 2023 +0100

    update pyinfra

commit d6e55534623eae2edcddaa6dd333f93171d421dc
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 16:30:14 2023 +0100

    set pyinfra subproject to current master commit

commit 030dc660e6060ae326c32fba8c2944a10866fbb6
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 16:25:19 2023 +0100

    adapt serve script to advanced pyinfra API including monitoring of the processing time of images.

commit 0fa0c44c376c52653e517d257a35793797f7be31
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 15:19:57 2023 +0100

    Update Dockerfile to work with the new pyinfra package setup, which uses pyproject.toml instead of setup.py and requirements.txt

commit aad53c4d313f908de93a13e69e2cb150db3be6cb
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 14:16:04 2023 +0100

    remove no longer needed dependencies

2023-03-17 16:12:59 +01:00

.dvc

Pull request #9 : Tdd refactoring

2022-04-25 12:25:41 +02:00

bamboo-specs

build dev image and push to nexus

2023-02-14 16:30:18 +01:00

data

Pull request #9 : Tdd refactoring

2022-04-25 12:25:41 +02:00

doc

Pull request #9 : Tdd refactoring

2022-04-25 12:25:41 +02:00

image_prediction

add log config to __init__.py

2023-02-15 15:44:56 +01:00

incl

Pull request #46 : RED-6205 add prometheus monitoring

2023-03-17 16:12:59 +01:00

scripts

Pull request #39 : RED-6084 Improve image extraction speed

2023-02-10 08:33:13 +01:00

src

Pull request #46 : RED-6205 add prometheus monitoring

2023-03-17 16:12:59 +01:00

test

Merge branch 'master' of ssh://git.iqser.com:2222/rr/image-prediction into RED-6189-bugfix

2023-02-13 17:23:07 +01:00

__init__.py

Pull request #1 : Setup

2022-02-03 11:44:11 +01:00

.coveragerc

Pull request #9 : Tdd refactoring

2022-04-25 12:25:41 +02:00

.dockerignore

Pull request #9 : Docker image tuning, batching of pdf pages and misc other

2022-02-21 15:36:38 +01:00

.dvcignore

Pull request #1 : Setup

2022-02-03 11:44:11 +01:00

.gitignore

update

2023-02-14 16:25:49 +01:00

.gitmodules

Pull request #46 : RED-6205 add prometheus monitoring

2023-03-17 16:12:59 +01:00

banner.txt

Pull request #9 : Tdd refactoring

2022-04-25 12:25:41 +02:00

config.yaml

format + set verbose to False by default

2023-02-14 16:26:24 +01:00

Dockerfile

Pull request #46 : RED-6205 add prometheus monitoring

2023-03-17 16:12:59 +01:00

Dockerfile_base

Quickfix Dockerfile

2022-03-02 10:56:37 +01:00

Dockerfile_tests

Pull request #46 : RED-6205 add prometheus monitoring

2023-03-17 16:12:59 +01:00

pytest.ini

Pull request #9 : Tdd refactoring

2022-04-25 12:25:41 +02:00

README.md

Pull request #9 : Tdd refactoring

2022-04-25 12:25:41 +02:00

requirements.txt

Pull request #39 : RED-6084 Improve image extraction speed

2023-02-10 08:33:13 +01:00

run_tests.sh

Pull request #9 : Tdd refactoring

2022-04-25 12:25:41 +02:00

setup.py

adapt service-container to image-service-v2

2022-03-01 14:17:37 +01:00

sonar-project.properties

adapt service-container to image-service-v2

2022-03-01 14:17:37 +01:00

version.yaml

RED-4758: adjust to new buildjob

2022-08-03 14:59:07 +02:00

README.md

Setup

Build the base image and the service image

docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .

Usage

Without Docker

python scripts/run_pipeline.py /path/to/a/pdf

With Docker

Shell 1

docker run --rm --net=host image-prediction

Shell 2

python scripts/pyinfra_mock.py /path/to/a/pdf

Tests

For example, run the following command to execute all tests and print a coverage report:

coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m

After having built the service container as specified above, you can also run tests in a container as follows:

./run_tests.sh

Message Body Formats

Request Format

Request messages must provide the fields "dossierId" and "fileId". A request should look like this:

{
    "dossierId": "<string identifier>",
    "fileId": "<string identifier>"
}

Any additional keys are ignored.
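As a minimal sketch of the request contract described above, the following Python snippet builds a request body and checks an incoming one. The helper names `build_request` and `parse_request` are hypothetical and are not part of this repository; the only facts taken from this README are the two required field names and the rule that additional keys are ignored.

```python
import json

# Field names taken from the request format above.
REQUIRED_FIELDS = ("dossierId", "fileId")


def build_request(dossier_id: str, file_id: str) -> str:
    """Serialize a request body containing the two required fields."""
    return json.dumps({"dossierId": dossier_id, "fileId": file_id})


def parse_request(body: str) -> dict:
    """Return the required fields of a request body (hypothetical helper).

    Raises ValueError if a required field is missing. Extra keys are
    dropped, mirroring the rule that additional keys are ignored.
    """
    message = json.loads(body)
    missing = [key for key in REQUIRED_FIELDS if key not in message]
    if missing:
        raise ValueError(f"request is missing required fields: {missing}")
    return {key: message[key] for key in REQUIRED_FIELDS}
```

How the service itself validates messages is not documented here, so treat the error handling as an illustration only.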

Response Format

Response bodies describe each image found in the document: its predicted class, the confidence of the classification, its position and size, and the results of additional convenience filters, which can be configured through environment variables. A response body looks like this:

{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}

An image metadata record (entry in "data" field of a response body) looks like this:

{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "alpha": false,
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
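A consumer of these records might reduce each one to the handful of values it needs. The sketch below is one way to do that in Python; `summarize_record` and `accepted_records` are hypothetical helpers, and the example record is an abbreviated copy of the one shown above. In that example the "geometry" values equal x2 - x1 and y2 - y1, so the sketch recomputes width and height from "position"; whether the service derives them this way is an assumption.

```python
# Abbreviated copy of the example record from this README.
EXAMPLE_RECORD = {
    "classification": {
        "label": "logo",
        "probabilities": {"logo": 1.0, "signature": 1.1599173226749333e-17},
    },
    "position": {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6},
    "filters": {"allPassed": True},
}


def summarize_record(record: dict) -> dict:
    """Extract label, confidence, page, and size from one metadata record."""
    probabilities = record["classification"]["probabilities"]
    position = record["position"]
    # Class with the highest probability; expected to match "label".
    best_label = max(probabilities, key=probabilities.get)
    return {
        "label": record["classification"]["label"],
        "best_label": best_label,
        "confidence": probabilities[best_label],
        "page": position["pageNumber"],
        "width": position["x2"] - position["x1"],
        "height": position["y2"] - position["y1"],
        "passed": record["filters"]["allPassed"],
    }


def accepted_records(response: dict) -> list:
    """Keep only the records whose filters all passed."""
    return [r for r in response["data"] if r["filters"]["allPassed"]]
```

Calling `accepted_records` on a full response body keeps only the images that none of the convenience filters flagged.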

Configuration

The service is configured through config.yaml. Every variable listed below can be overridden by exporting an environment variable with the same name.

Environment Variable	Default	Description
LOGGING_LEVEL_ROOT	"INFO"	Logging level for log file messages
VERBOSE	false	Service prints document processing progress to stdout
BATCH_SIZE	16	Number of images in memory simultaneously per service instance
RUN_ID	"fabfb1f192c745369b88cab34471aba7"	The ID of the mlflow run to load the image classifier from
MIN_REL_IMAGE_SIZE	0.05	Minimally permissible image size to page size ratio
MAX_REL_IMAGE_SIZE	0.75	Maximally permissible image size to page size ratio
MIN_IMAGE_FORMAT	0.1	Minimally permissible image width to height ratio
MAX_IMAGE_FORMAT	10	Maximally permissible image width to height ratio
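To illustrate how the threshold variables could relate to the "filters" block of a response, here is a hedged Python sketch. It reads the four thresholds from the environment, falling back to the defaults in the table, and applies the width-to-height check. `load_thresholds` and `image_format_filter` are hypothetical names, and the service's actual configuration loading is not shown in this README. The mapping of "tooTall" to a quotient below the minimum and "tooWide" to a quotient above the maximum, as well as the strict comparisons, are assumptions.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "MIN_REL_IMAGE_SIZE": 0.05,
    "MAX_REL_IMAGE_SIZE": 0.75,
    "MIN_IMAGE_FORMAT": 0.1,
    "MAX_IMAGE_FORMAT": 10.0,
}


def load_thresholds(environ=os.environ) -> dict:
    """Read each threshold from the environment, else use its default."""
    return {name: float(environ.get(name, default)) for name, default in DEFAULTS.items()}


def image_format_filter(width: float, height: float, thresholds: dict) -> dict:
    """Flag images whose width-to-height ratio falls outside the limits.

    Assumption: a ratio below the minimum means "too tall" and a ratio
    above the maximum means "too wide".
    """
    quotient = width / height
    return {
        "quotient": quotient,
        "tooTall": quotient < thresholds["MIN_IMAGE_FORMAT"],
        "tooWide": quotient > thresholds["MAX_IMAGE_FORMAT"],
    }
```

For the example record in the response section (width about 57.45, height about 31.15), this gives a quotient of roughly 1.84 with both flags false, which agrees with the "imageFormat" entry shown there.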

Releases 29

Release 2.20.0 Latest

2025-01-31 13:08:10 +01:00

Languages

Python 96.5%

Dockerfile 1.8%

Shell 1.7%