image-classification-service


Julius Unverfehrt 5d611d5fae Pull request #17 : RED-4329 add prometheus

Merge in RR/image-prediction from RED-4329-add-prometheus to master

Squashed commit of the following:

commit 7fcf256c5277a3cfafcaf76c3116e3643ad01fa4
Merge: 8381621 c14d00c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 21 15:41:14 2022 +0200

    Merge branch 'master' into RED-4329-add-prometheus

commit 8381621ae08b1a91563c9c655020ec55bb58ecc5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 21 15:24:50 2022 +0200

    add prometheus endpoint

commit 26f07088b0a711b6f9db0974f5dfc8aa8ad4e1dc
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 21 15:14:34 2022 +0200

    refactor

commit c563aa505018f8a14931a16a9061d361b5d4c383
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 21 15:10:19 2022 +0200

    test bamboo build

commit 2b8446e703617c6897b6149846f2548ec292a9a1
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 21 14:40:44 2022 +0200

    RED-4329 add prometheus endpoint with summary metric

2022-06-21 15:49:01 +02:00


README.md

Setup

Build the base image, then the service image

docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .

Usage

Without Docker

python scripts/run_pipeline.py /path/to/a/pdf

With Docker

Shell 1

docker run --rm --net=host image-prediction

Shell 2

python scripts/pyinfra_mock.py /path/to/a/pdf

Tests

For example, run the following command to execute all tests and print a coverage report:

coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m

After building the service image as described above, you can also run the tests inside a container:

./run_tests.sh

Message Body Formats

Request Format

Request messages must provide the fields "dossierId" and "fileId". A request looks like this:

{
    "dossierId": "<string identifier>",
    "fileId": "<string identifier>"
}
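As a minimal sketch of the rule above (only "dossierId" and "fileId" are required, any additional keys are ignored), a consumer could extract the identifiers like this. The helper name `parse_request` is hypothetical and is not taken from the service's code:

```python
import json
from typing import Any, Dict


def parse_request(body: bytes) -> Dict[str, Any]:
    """Hypothetical helper: keep only the required identifiers of a request.

    Raises KeyError if "dossierId" or "fileId" is missing; any other keys
    in the message are dropped, mirroring the documented request format.
    """
    message: Dict[str, Any] = json.loads(body)
    return {key: message[key] for key in ("dossierId", "fileId")}


raw = b'{"dossierId": "debug", "fileId": "13ffa9851740c8d20c4c7d1706d72f2a", "extra": 1}'
print(parse_request(raw))
```

Failing fast with a KeyError on a missing identifier is a design choice for this sketch; the actual service may handle malformed messages differently.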

Any additional keys are ignored.

Response Format

Response bodies contain, for each image, the predicted class, the classification confidence, the image's position and size, and the results of additional convenience filters, which can be configured through environment variables. A response body looks like this:

{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}
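The envelope above echoes the request identifiers and carries one metadata record per image under "data". A minimal sketch of assembling it, assuming the records have already been computed; `build_response` is a hypothetical name used for illustration only:

```python
import json
from typing import Any, Dict, List


def build_response(request: Dict[str, Any], records: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: wrap per-image records in the response envelope."""
    body = {
        "dossierId": request["dossierId"],  # echoed from the request
        "fileId": request["fileId"],        # echoed from the request
        "data": records,                    # one metadata record per image
    }
    return json.dumps(body)


print(build_response({"dossierId": "debug", "fileId": "13ffa9851740c8d20c4c7d1706d72f2a"}, []))
```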

An image metadata record (entry in "data" field of a response body) looks like this:

{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "alpha": false,
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
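The sample record is consistent with a simple derivation: "width" is x2 - x1, "height" is y2 - y1, the "imageFormat" quotient is width / height (57.45 / 31.15 ≈ 1.844), and "label" is the class with the highest probability. The sketch below reproduces those numbers from the sample. These formulas are inferred from the example, not confirmed from the service's code, and the function names are hypothetical:

```python
from typing import Dict


def derive_geometry(position: Dict[str, float]) -> Dict[str, float]:
    # Assumption: width and height are the extents of the bounding box.
    return {
        "width": position["x2"] - position["x1"],
        "height": position["y2"] - position["y1"],
    }


def format_quotient(geometry: Dict[str, float]) -> float:
    # Assumption: the "imageFormat" quotient is width divided by height.
    return geometry["width"] / geometry["height"]


def top_label(probabilities: Dict[str, float]) -> str:
    # Assumption: the reported label is the most probable class.
    return max(probabilities, key=probabilities.get)


position = {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6}
geometry = derive_geometry(position)
print(geometry, format_quotient(geometry))
print(top_label({"logo": 1.0, "signature": 1.16e-17, "other": 2.99e-23, "formula": 4.35e-31}))
```

The long decimals in the sample (for example 57.44999999999999) are ordinary floating-point rounding from this kind of subtraction.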

Configuration

The configuration file is config.yaml. All relevant variables can be overridden by exporting environment variables:

Environment Variable	Default	Description
LOGGING_LEVEL_ROOT	"INFO"	Logging level for log file messages
VERBOSE	true	Whether the service prints document processing progress to stdout
BATCH_SIZE	16	Number of images held in memory simultaneously per service instance
RUN_ID	"fabfb1f192c745369b88cab34471aba7"	ID of the MLflow run from which the image classifier is loaded
MIN_REL_IMAGE_SIZE	0.05	Minimum permissible ratio of image size to page size
MAX_REL_IMAGE_SIZE	0.75	Maximum permissible ratio of image size to page size
MIN_IMAGE_FORMAT	0.1	Minimum permissible ratio of image width to height
MAX_IMAGE_FORMAT	10	Maximum permissible ratio of image width to height
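A minimal sketch of how the four filter thresholds might be read from the environment and turned into the "geometry" filter flags of a metadata record. It assumes a flag is set when a quotient falls outside its [min, max] range (a width-to-height quotient below MIN_IMAGE_FORMAT counts as "tooTall", above MAX_IMAGE_FORMAT as "tooWide"); this mapping is inferred from the field names, and `FilterConfig`, `load_filter_config` and `geometry_filters` are hypothetical names:

```python
import os
from dataclasses import dataclass
from typing import Dict


@dataclass
class FilterConfig:
    min_rel_image_size: float
    max_rel_image_size: float
    min_image_format: float
    max_image_format: float


def load_filter_config() -> FilterConfig:
    # Defaults from the configuration table; exported variables override them.
    return FilterConfig(
        min_rel_image_size=float(os.environ.get("MIN_REL_IMAGE_SIZE", 0.05)),
        max_rel_image_size=float(os.environ.get("MAX_REL_IMAGE_SIZE", 0.75)),
        min_image_format=float(os.environ.get("MIN_IMAGE_FORMAT", 0.1)),
        max_image_format=float(os.environ.get("MAX_IMAGE_FORMAT", 10)),
    )


def geometry_filters(size_quotient: float, format_quotient: float,
                     cfg: FilterConfig) -> Dict[str, Dict[str, object]]:
    # Assumption: each flag marks a quotient outside its permitted range.
    return {
        "imageSize": {
            "quotient": size_quotient,
            "tooLarge": size_quotient > cfg.max_rel_image_size,
            "tooSmall": size_quotient < cfg.min_rel_image_size,
        },
        "imageFormat": {
            "quotient": format_quotient,
            "tooTall": format_quotient < cfg.min_image_format,
            "tooWide": format_quotient > cfg.max_image_format,
        },
    }


cfg = load_filter_config()
print(geometry_filters(0.05975350599135938, 1.8443017656500813, cfg))
```

With the default thresholds, the quotients from the sample record produce all-false flags, matching the example response.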

Releases 29

Release 2.20.0 Latest

2025-01-31 13:08:10 +01:00

Languages

Python 96.5%

Dockerfile 1.8%

Shell 1.7%