
Infrastructure to deploy Research Projects

The infrastructure is expected to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.

Configuration

The configuration is located in /config.yaml. All relevant variables can be overridden by exporting environment variables.

| Environment Variable | Default | Description |
| --- | --- | --- |
| **service** | | |
| LOGGING_LEVEL_ROOT | DEBUG | Logging level for the service logger |
| RESPONSE_TYPE | "stream" | Whether the analysis response is stored as a file on storage or sent as a stream: "file" or "stream" |
| RESPONSE_FILE_EXTENSION | ".NER_ENTITIES.json.gz" | Extension of the file that stores the analysis response on storage |
| **probing_webserver** | | |
| PROBING_WEBSERVER_HOST | "0.0.0.0" | Probe webserver address |
| PROBING_WEBSERVER_PORT | 8080 | Probe webserver port |
| PROBING_WEBSERVER_MODE | production | Webserver mode: {development, production} |
| **rabbitmq** | | |
| RABBITMQ_HOST | localhost | RabbitMQ host address |
| RABBITMQ_PORT | 5672 | RabbitMQ host port |
| RABBITMQ_USERNAME | user | RabbitMQ username |
| RABBITMQ_PASSWORD | bitnami | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 7200 | AMQP heartbeat timeout in seconds |
| **queues** | | |
| REQUEST_QUEUE | request_queue | Requests to the service |
| RESPONSE_QUEUE | response_queue | Responses from the service |
| DEAD_LETTER_QUEUE | dead_letter_queue | Messages that failed to process |
| **callback** | | |
| RETRY | False | Toggles retry behaviour |
| MAX_ATTEMPTS | 3 | Number of times a message may fail before being published to the dead letter queue |
| ANALYSIS_ENDPOINT | "http://127.0.0.1:5000" | Endpoint of the analysis container |
| **storage** | | |
| STORAGE_BACKEND | s3 | The type of storage to use: {s3, azure} |
| STORAGE_BUCKET | "pyinfra-test-bucket" | The bucket / container to pull files specified in queue requests from |
| TARGET_FILE_EXTENSION | ".TEXT.json.gz" | Type of file to pull from storage: .TEXT.json.gz or .ORIGIN.pdf.gz |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Storage endpoint (for s3 / MinIO) |
| STORAGE_KEY | | Access key for the storage backend |
| STORAGE_SECRET | | Secret key for the storage backend |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Azure connection string (used when STORAGE_BACKEND is azure) |
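
How an exported variable takes precedence over the YAML default is easiest to see in a short sketch. This is a minimal illustration, assuming config.yaml mirrors the sections above; the structure and names are illustrative, not pyinfra's actual loader:

    import os

    import yaml

    # Load the defaults shipped in config.yaml (structure assumed here).
    with open("config.yaml") as f:
        config = yaml.safe_load(f)

    # An exported environment variable overrides the YAML default, e.g.
    # `export RABBITMQ_HOST=rabbitmq.internal` before starting the service.
    rabbitmq_host = os.environ.get("RABBITMQ_HOST", config["rabbitmq"]["host"])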

Response Format

RESPONSE_TYPE == "stream"

Response format (ANALYSIS_DATA is a placeholder for the analysis result):

{
  "dossierId": "klaus",
  "fileId": "1a7fd8ac0da7656a487b68f89188be82",
  "imageMetadata": ANALYSIS_DATA
}

Response example for image-prediction:

{
  "dossierId": "klaus",
  "fileId": "1a7fd8ac0da7656a487b68f89188be82",
  "imageMetadata": [
    {
      "classification": {
        "label": "logo",
        "probabilities": {
          "formula": 0.0,
          "logo": 1.0,
          "other": 0.0,
          "signature": 0.0
        }
      },
      "filters": {
        "allPassed": true,
        "geometry": {
          "imageFormat": {
            "quotient": 1.570791527313267,
            "tooTall": false,
            "tooWide": false
          },
          "imageSize": {
            "quotient": 0.19059804229011604,
            "tooLarge": false,
            "tooSmall": false
          }
        },
        "probability": {
          "unconfident": false
        }
      },
      "geometry": {
        "height": 107.63999999999999,
        "width": 169.08000000000004
      },
      "position": {
        "pageNumber": 1,
        "x1": 213.12,
        "x2": 382.20000000000005,
        "y1": 568.7604,
        "y2": 676.4004
      }
    }
  ]
}
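
For orientation, here is a minimal sketch of a consumer reading such responses from the response queue, assuming the RabbitMQ defaults from the table above; it is illustrative and not part of pyinfra:

    import json

    import pika

    # Connect with the defaults from the configuration table above.
    credentials = pika.PlainCredentials("user", "bitnami")
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="localhost", port=5672, credentials=credentials)
    )
    channel = connection.channel()

    def on_response(ch, method, properties, body):
        message = json.loads(body)
        # dossierId / fileId identify the analyzed file;
        # imageMetadata carries the ANALYSIS_DATA payload.
        print(message["dossierId"], message["fileId"])
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="response_queue", on_message_callback=on_response)
    channel.start_consuming()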

RESPONSE_TYPE == "file"

Creates a response file on the request storage, named dossierId/fileId + RESPONSE_FILE_EXTENSION, with the ANALYSIS_DATA as content.
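
The resulting object name can be sketched like so, using the values from the example above (the exact joining logic in pyinfra may differ):

    # Naming rule from above: dossierId/fileId + RESPONSE_FILE_EXTENSION.
    dossier_id = "klaus"
    file_id = "1a7fd8ac0da7656a487b68f89188be82"
    response_file_extension = ".NER_ENTITIES.json.gz"

    object_name = f"{dossier_id}/{file_id}{response_file_extension}"
    # -> "klaus/1a7fd8ac0da7656a487b68f89188be82.NER_ENTITIES.json.gz"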

Development

Local Setup

You can run the infrastructure either as a module via src/serve.py or as a Docker container simulating the Kubernetes environment:

  1. Install the module / build the Docker image

    pip install -e .
    pip install -r requirements.txt
    
    docker build -f Dockerfile -t pyinfra .
    
  2. Run RabbitMQ & MinIO

    docker-compose up
    
  3. Run the module

    python src/serve.py    
    

    OR as a container (host networking lets it reach RabbitMQ and MinIO on localhost):

    docker run --net=host pyinfra
    

Start your prediction container, for example ner-prediction or image-prediction (follow the corresponding README to build the container).

To put a file on the queue:

python src/manage_minio.py add --file path/to/file dossierID
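
Under the hood this roughly amounts to an upload via the MinIO SDK. A hedged sketch, assuming the storage defaults from the table above and that objects are keyed by dossier ID; the actual manage_minio.py may differ:

    from minio import Minio

    # Storage defaults from the configuration table above; STORAGE_KEY /
    # STORAGE_SECRET have no documented defaults, so placeholders are used.
    client = Minio("127.0.0.1:9000", access_key="<STORAGE_KEY>",
                   secret_key="<STORAGE_SECRET>", secure=False)

    # Assumed object naming: the dossier ID prefixes the uploaded file.
    client.fput_object("pyinfra-test-bucket", "dossierID/file.TEXT.json.gz",
                       "path/to/file")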

To start the mock client:

python src/mock_client.py    

Hints:

After stopping docker-compose up, use docker-compose down to remove the containers created by up.

If uploaded files are stuck, clean the MinIO storage with python src/manage_minio.py purge, or delete the local MinIO data folder in pyinfra with sudo rm -rf data.
