image-classification-service

Marmelator/image-classification-service

Matthias Bisping 0d06ad657e readme updated and config

2022-04-25 11:19:35 +02:00

.dvc

reimplemented model loader logic and moved base weights into mlflow run dir

2022-03-29 19:50:43 +02:00

bamboo-specs

no compose down

2022-04-20 15:28:27 +02:00

data

model builder path in mlruns adjusted

2022-04-20 14:43:09 +02:00

doc

refactoring

2022-03-31 19:17:48 +02:00

image_prediction

refactoring

2022-04-25 11:19:26 +02:00

incl

removed submodule

2022-04-19 14:36:54 +02:00

scripts

refactoring

2022-04-25 11:19:26 +02:00

src

skip server predict test

2022-04-19 20:01:59 +02:00

test

Pull request #8 : Pipeline refactoring

2022-04-25 10:08:49 +02:00

__init__.py

Pull request #1 : Setup

2022-02-03 11:44:11 +01:00

.coveragerc

misc

2022-04-03 04:35:44 +02:00

.dockerignore

Pull request #9 : Docker image tuning, batching of pdf pages and misc other

2022-02-21 15:36:38 +01:00

.dvcignore

Pull request #1 : Setup

2022-02-03 11:44:11 +01:00

.gitignore

coverage combine

2022-04-20 15:22:17 +02:00

.gitmodules

removed submodule

2022-04-19 14:36:54 +02:00

banner.txt

changed banner

2022-03-31 19:50:12 +02:00

config.yaml

readme updated and config

2022-04-25 11:19:35 +02:00

Dockerfile

add banner.txt to container

2022-04-25 11:17:49 +02:00

Dockerfile_base

Quickfix Dockerfile

2022-03-02 10:56:37 +01:00

Dockerfile_tests

coverage combine

2022-04-20 15:22:17 +02:00

pytest.ini

eliminated redai dependency; updated requirement versions

2022-04-01 21:10:41 +02:00

README.md

readme updated and config

2022-04-25 11:19:35 +02:00

requirements.txt

containerized tests

2022-04-19 17:58:19 +02:00

run_tests.sh

rm debug ls

2022-04-20 15:39:58 +02:00

setup.py

adapt service-container to image-service-v2

2022-03-01 14:17:37 +01:00

sonar-project.properties

adapt service-container to image-service-v2

2022-03-01 14:17:37 +01:00

README.md

Setup

Build the base image and the service image

docker build -f Dockerfile_base -t image-prediction-base .
docker build -f Dockerfile -t image-prediction .

Usage

Without Docker

python scripts/run_pipeline.py /path/to/a/pdf

With Docker

Shell 1

docker run --rm --net=host image-prediction

Shell 2

python scripts/pyinfra_mock.py /path/to/a/pdf

Message Body Formats

Request Format

Request messages must provide the fields "dossierID" and "fileID". The file to be processed is expected to be located in the MinIO store under redaction/<dossierID>/<fileID>.ORIG.pdf.gz. A request should look like this:

{
    "dossierID": "<string identifier>",
    "fileID": "<string identifier>"
}

Any additional keys are ignored.
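As a minimal sketch of the request format above, the following Python snippet serializes a request body and derives the storage path described in the text. The helper names build_request and source_object_path are hypothetical and not part of the service; only the two field names and the path pattern are taken from this README.

```python
import json


def build_request(dossier_id: str, file_id: str) -> str:
    """Serialize a request body with the two required fields."""
    return json.dumps({"dossierID": dossier_id, "fileID": file_id})


def source_object_path(request_body: str) -> str:
    """Derive the storage path the README describes for a request (hypothetical helper)."""
    message = json.loads(request_body)
    # Only the two required keys are read; any additional keys are ignored.
    return f"redaction/{message['dossierID']}/{message['fileID']}.ORIG.pdf.gz"


if __name__ == "__main__":
    body = build_request("debug", "13ffa9851740c8d20c4c7d1706d72f2a")
    print(source_object_path(body))
```

Whether "redaction" is a bucket name or a key prefix is not stated here, so the sketch only builds the path string and does not talk to MinIO.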

Response Format

Response bodies report, for each image, the identified class, the confidence of the classification, the position and size of the image, and the results of additional convenience filters, which can be configured through environment variables. A response body looks like this:

{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}

An image metadata record (entry in "data" field of a response body) looks like this:

{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
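A consumer of such a response might reduce each record to the fields it needs. The sketch below is an illustration under assumptions: the function names summarize and accepted_labels are hypothetical, and the record is an abbreviated copy of the example above. It reads the predicted label, its probability, and the width-to-height ratio, and keeps only records whose filters all passed.

```python
def summarize(record: dict) -> dict:
    """Reduce one image metadata record to a few fields (hypothetical helper)."""
    classification = record["classification"]
    label = classification["label"]
    geometry = record["geometry"]
    return {
        "label": label,
        # Probability assigned to the predicted label.
        "confidence": classification["probabilities"][label],
        "page": record["position"]["pageNumber"],
        # Width-to-height ratio recomputed from the geometry block.
        "aspect_ratio": geometry["width"] / geometry["height"],
        "passed": record["filters"]["allPassed"],
    }


def accepted_labels(response: dict) -> list:
    """Labels of all records in a response body that passed every filter."""
    return [
        record["classification"]["label"]
        for record in response["data"]
        if record["filters"]["allPassed"]
    ]


# Abbreviated copy of the example record shown above.
EXAMPLE_RECORD = {
    "classification": {
        "label": "logo",
        "probabilities": {"logo": 1.0, "signature": 1.1599173226749333e-17},
    },
    "position": {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6},
    "geometry": {"width": 57.44999999999999, "height": 31.149999999999977},
    "filters": {
        "geometry": {"imageFormat": {"quotient": 1.8443017656500813}},
        "probability": {"unconfident": False},
        "allPassed": True,
    },
}

if __name__ == "__main__":
    print(summarize(EXAMPLE_RECORD))
```

In the example record, the recomputed width-to-height ratio agrees with the reported imageFormat quotient, which suggests (but does not confirm) that the quotient is derived this way.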

Configuration

A configuration file is located under incl/image_service/config.yaml. All relevant variables can be configured by exporting environment variables.

| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "INFO" | Logging level for log file messages |
| VERBOSE | true | Service prints document processing progress to stdout |
| BATCH_SIZE | 32 | Number of images in memory simultaneously per service instance |
| RUN_ID | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from |
| MIN_REL_IMAGE_SIZE | 0.05 | Minimally permissible image size to page size ratio |
| MAX_REL_IMAGE_SIZE | 0.75 | Maximally permissible image size to page size ratio |
| MIN_IMAGE_FORMAT | 0.1 | Minimally permissible image width to height ratio |
| MAX_IMAGE_FORMAT | 10 | Maximally permissible image width to height ratio |
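To illustrate how these variables could be consumed, the sketch below reads them from the environment with the documented defaults and applies the four geometry thresholds to a pair of quotients. This is not the service's actual loader: the names load_settings and geometry_filters are hypothetical, and the comparison directions (for example, a quotient below MIN_IMAGE_FORMAT meaning "too tall") are assumptions inferred from the descriptions and the response field names.

```python
import os

# Defaults as documented in the table above.
DEFAULTS = {
    "LOGGING_LEVEL_ROOT": "INFO",
    "VERBOSE": True,
    "BATCH_SIZE": 32,
    "RUN_ID": "fabfb1f192c745369b88cab34471aba7",
    "MIN_REL_IMAGE_SIZE": 0.05,
    "MAX_REL_IMAGE_SIZE": 0.75,
    "MIN_IMAGE_FORMAT": 0.1,
    "MAX_IMAGE_FORMAT": 10.0,
}


def _parse(raw: str, default):
    """Convert an environment string to the type of the documented default."""
    if isinstance(default, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in ("1", "true", "yes")
    return type(default)(raw)


def load_settings(environ=os.environ) -> dict:
    """Overlay exported environment variables on the documented defaults."""
    return {
        name: _parse(environ[name], default) if name in environ else default
        for name, default in DEFAULTS.items()
    }


def geometry_filters(size_quotient: float, format_quotient: float, settings: dict) -> dict:
    """Apply the four thresholds; the comparison directions are assumptions."""
    return {
        "tooSmall": size_quotient < settings["MIN_REL_IMAGE_SIZE"],
        "tooLarge": size_quotient > settings["MAX_REL_IMAGE_SIZE"],
        "tooTall": format_quotient < settings["MIN_IMAGE_FORMAT"],
        "tooWide": format_quotient > settings["MAX_IMAGE_FORMAT"],
    }


if __name__ == "__main__":
    settings = load_settings()
    print(geometry_filters(0.05975350599135938, 1.8443017656500813, settings))
```

With the default thresholds, the quotients from the example record in the Response Format section trip none of the four flags, which matches the flags reported there.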

Liveness and Readiness

The service runs a webserver on 0.0.0.0:8080 which responds to GET requests on 0.0.0.0:8080/ready and 0.0.0.0:8080/health with the status of the service (status code 200 for nominal status). Each service instance is monitored independently: a request to 0.0.0.0:8080 is forwarded to subordinate webservers, each coupled to exactly one service instance. The responses of the subordinate webservers are aggregated under either a universal (all) or an existential quantifier (see CHECK_QUANTIFIER). Note that checks are evaluated lazily, so missing check logs from subordinate webservers are not unexpected when using an existential quantifier.
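The lazy evaluation mentioned above can be illustrated with Python's built-in all and any, which stop consuming a generator as soon as the result is decided. This is a sketch of the idea only, not the service's implementation: the aggregate function, the "all"/"any" keys, and the toy checks are hypothetical, and the values actually accepted by CHECK_QUANTIFIER are not documented here.

```python
from typing import Callable, Iterable

# Hypothetical mapping from a quantifier name to the aggregating built-in.
QUANTIFIERS = {"all": all, "any": any}


def aggregate(checks: Iterable[Callable[[], bool]], quantifier: str = "all") -> bool:
    """Aggregate per-instance checks lazily under the chosen quantifier."""
    # The generator runs each check only when all()/any() asks for the next value,
    # so evaluation stops as soon as the overall result is known.
    return QUANTIFIERS[quantifier](check() for check in checks)


def make_check(name: str, healthy: bool, log: list) -> Callable[[], bool]:
    """Build a toy check that records its own invocation in `log`."""
    def check() -> bool:
        log.append(name)
        return healthy
    return check


if __name__ == "__main__":
    log = []
    checks = [make_check("instance-0", True, log), make_check("instance-1", True, log)]
    print(aggregate(checks, "any"), log)
```

Under the existential quantifier, the first healthy instance decides the result, so later instances are never queried and produce no check logs.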

Releases 29

Release 2.20.0 Latest

2025-01-31 13:08:10 +01:00
