image-classification-service/README.md

### Setup

Build the base image:
```bash
docker build -t image-classification-image --progress=plain --no-cache \
    -f Dockerfile \
    --build-arg USERNAME=$GITLAB_USER \
    --build-arg TOKEN=$GITLAB_ACCESS_TOKEN \
    .
```

### Usage

#### Without Docker


```bash
python scripts/run_pipeline.py /path/to/a/pdf
```

#### With Docker

Shell 1: run the container

```bash
docker run --rm --net=host image-classification-image
```

Shell 2: run the pyinfra mock script with a PDF

```bash
python scripts/pyinfra_mock.py /path/to/a/pdf
```

### Tests

For example, run the following command to execute all tests and produce a coverage report:

```bash
coverage run -m pytest test --tb=native -q -s -vvv -x && coverage combine && coverage report -m
```

After building the service image as described above, you can also run the tests in a container:

```bash
./run_tests.sh
```

### Message Body Formats


#### Request Format

Request messages must provide the fields `"dossierId"` and `"fileId"`. A request should look like this:

```json
{
    "dossierId": "<string identifier>",
    "fileId": "<string identifier>"
}
```

Any additional keys are ignored.
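
The request format can be sketched in Python as follows. This is a minimal illustration, not the service's actual code: the helper names `build_request` and `parse_request` are hypothetical, and raising `ValueError` on a missing field is an assumption about how a consumer might validate messages.

```python
import json

REQUIRED_FIELDS = ("dossierId", "fileId")


def build_request(dossier_id: str, file_id: str) -> str:
    """Serialize a request body containing the two required fields."""
    return json.dumps({"dossierId": dossier_id, "fileId": file_id})


def parse_request(body: str) -> dict:
    """Return only the required fields; raise if one is missing (assumed behavior)."""
    message = json.loads(body)
    missing = [key for key in REQUIRED_FIELDS if key not in message]
    if missing:
        raise ValueError(f"request is missing required field(s): {missing}")
    # Additional keys are dropped, mirroring "any additional keys are ignored".
    return {key: message[key] for key in REQUIRED_FIELDS}


if __name__ == "__main__":
    print(parse_request(build_request("debug", "13ffa9851740c8d20c4c7d1706d72f2a")))
```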


#### Response Format

Response bodies report, for each image, the predicted class, the confidence of the classification, the position and
size of the image, and the results of additional convenience filters, which can be configured through environment
variables. A response body looks like this:

```json
{
  "dossierId": "debug",
  "fileId": "13ffa9851740c8d20c4c7d1706d72f2a",
  "data": [...]
}
```

An image metadata record (an entry in the `"data"` field of a response body) looks like this:

```json
{
  "classification": {
    "label": "logo",
    "probabilities": {
      "logo": 1.0,
      "signature": 1.1599173226749333e-17,
      "other": 2.994595513398207e-23,
      "formula": 4.352109377281029e-31
    }
  },
  "position": {
    "x1": 475.95,
    "x2": 533.4,
    "y1": 796.47,
    "y2": 827.62,
    "pageNumber": 6
  },
  "geometry": {
    "width": 57.44999999999999,
    "height": 31.149999999999977
  },
  "alpha": false,
  "filters": {
    "geometry": {
      "imageSize": {
        "quotient": 0.05975350599135938,
        "tooLarge": false,
        "tooSmall": false
      },
      "imageFormat": {
        "quotient": 1.8443017656500813,
        "tooTall": false,
        "tooWide": false
      }
    },
    "probability": {
      "unconfident": false
    },
    "allPassed": true
  }
}
```
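
A consumer might reduce such a record to the fields it needs, as in the sketch below. The function name `summarize` is hypothetical, and the record is abbreviated to the keys the sketch reads. In the sample above, `"geometry"` agrees with `x2 - x1` and `y2 - y1` from `"position"`; treating that as a general rule is an assumption.

```python
# Sample record copied from the example above ("filters" abbreviated to the summary flag).
record = {
    "classification": {
        "label": "logo",
        "probabilities": {
            "logo": 1.0,
            "signature": 1.1599173226749333e-17,
            "other": 2.994595513398207e-23,
            "formula": 4.352109377281029e-31,
        },
    },
    "position": {"x1": 475.95, "x2": 533.4, "y1": 796.47, "y2": 827.62, "pageNumber": 6},
    "geometry": {"width": 57.44999999999999, "height": 31.149999999999977},
    "alpha": False,
    "filters": {"allPassed": True},
}


def summarize(record: dict) -> dict:
    """Extract label, confidence, page, size, and the overall filter verdict."""
    label = record["classification"]["label"]
    probabilities = record["classification"]["probabilities"]
    position = record["position"]
    return {
        "label": label,
        "confidence": probabilities[label],
        "page": position["pageNumber"],
        # Size derived from the bounding box (assumed to match "geometry").
        "width": position["x2"] - position["x1"],
        "height": position["y2"] - position["y1"],
        "passed": record["filters"]["allPassed"],
    }


summary = summarize(record)
print(summary["label"], summary["confidence"], summary["passed"])
```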


### Configuration

The configuration file is `config.yaml`. All relevant variables can also be set by exporting environment variables.

| __Environment Variable__           | Default                            | Description                                                                            |
|------------------------------------|------------------------------------|----------------------------------------------------------------------------------------|
| __LOGGING_LEVEL_ROOT__             | "INFO"                             | Logging level for log file messages                                                    |
| __VERBOSE__                        | *true*                             | Service prints document processing progress to stdout                                  |
| __BATCH_SIZE__                     | 16                                 | Number of images in memory simultaneously per service instance                         |
| __RUN_ID__                         | "fabfb1f192c745369b88cab34471aba7" | The ID of the mlflow run to load the image classifier from                             |
| __MIN_REL_IMAGE_SIZE__             | 0.05                               | Minimum permissible ratio of image size to page size                                   |
| __MAX_REL_IMAGE_SIZE__             | 0.75                               | Maximum permissible ratio of image size to page size                                   |
| __MIN_IMAGE_FORMAT__               | 0.1                                | Minimum permissible ratio of image width to height                                     |
| __MAX_IMAGE_FORMAT__               | 10                                 | Maximum permissible ratio of image width to height                                     |

See also: https://git.iqser.com/projects/RED/repos/helm/browse/redaction/templates/image-service-v2
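
To illustrate how the geometry thresholds in the configuration table relate to the `"filters"` section of a response record, here is a hedged sketch. It is not the service's implementation: the helper names `load_thresholds` and `geometry_filters` are hypothetical, and both the mapping of flags to thresholds (for example, `tooTall` when the width-to-height ratio falls below `MIN_IMAGE_FORMAT`) and the use of strict comparisons are assumptions.

```python
import os

# Defaults copied from the configuration table.
DEFAULTS = {
    "MIN_REL_IMAGE_SIZE": 0.05,
    "MAX_REL_IMAGE_SIZE": 0.75,
    "MIN_IMAGE_FORMAT": 0.1,
    "MAX_IMAGE_FORMAT": 10.0,
}


def load_thresholds(environ=None) -> dict:
    """Read thresholds from the environment, falling back to the table defaults."""
    environ = os.environ if environ is None else environ
    return {name: float(environ.get(name, default)) for name, default in DEFAULTS.items()}


def geometry_filters(size_quotient: float, format_quotient: float, thresholds: dict) -> dict:
    """Build a "geometry" filter entry shaped like the one in the response format."""
    return {
        "imageSize": {
            "quotient": size_quotient,
            "tooLarge": size_quotient > thresholds["MAX_REL_IMAGE_SIZE"],
            "tooSmall": size_quotient < thresholds["MIN_REL_IMAGE_SIZE"],
        },
        "imageFormat": {
            "quotient": format_quotient,
            # Assumed mapping: a small width-to-height ratio means the image is too tall.
            "tooTall": format_quotient < thresholds["MIN_IMAGE_FORMAT"],
            "tooWide": format_quotient > thresholds["MAX_IMAGE_FORMAT"],
        },
    }


if __name__ == "__main__":
    # Quotients taken from the sample record in the response format section.
    print(geometry_filters(0.05975350599135938, 1.8443017656500813, load_thresholds()))
```

With the default thresholds, the quotients from the sample record yield `false` for all four flags, which matches the sample.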