# pyinfra

Infrastructure to deploy Research Projects.

The infrastructure is expected to be deployed in the same Pod (or local environment) as the analysis container and handles all outbound communication.

## Configuration

The configuration is located in `/config.yaml`. Every variable can be overridden by exporting the corresponding environment variable.

| Environment Variable | Default | Description |
|---|---|---|
| **service** | | |
| `LOGGING_LEVEL_ROOT` | `DEBUG` | Logging level for the service logger |
| `RESPONSE_AS_FILE` | `False` | Whether the response is stored as a file on storage or sent as a stream |
| `RESPONSE_FILE_EXTENSION` | `".NER_ENTITIES.json.gz"` | Extension of the file that stores the analyzed response on storage |
| **probing_webserver** | | |
| `PROBING_WEBSERVER_HOST` | `"0.0.0.0"` | Probe webserver address |
| `PROBING_WEBSERVER_PORT` | `8080` | Probe webserver port |
| `PROBING_WEBSERVER_MODE` | `production` | Webserver mode: `{development, production}` |
| **rabbitmq** | | |
| `RABBITMQ_HOST` | `localhost` | RabbitMQ host address |
| `RABBITMQ_PORT` | `5672` | RabbitMQ host port |
| `RABBITMQ_USERNAME` | `user` | RabbitMQ username |
| `RABBITMQ_PASSWORD` | `bitnami` | RabbitMQ password |
| `RABBITMQ_HEARTBEAT` | `7200` | AMQP heartbeat timeout in seconds |
| **queues** | | |
| `REQUEST_QUEUE` | `request_queue` | Requests to the service |
| `RESPONSE_QUEUE` | `response_queue` | Responses from the service |
| `DEAD_LETTER_QUEUE` | `dead_letter_queue` | Messages that failed to process |
| **callback** | | |
| `RETRY` | `False` | Toggles retry behaviour |
| `MAX_ATTEMPTS` | `3` | Number of times a message may fail before being published to the dead letter queue |
| `ANALYSIS_ENDPOINT` | `"http://127.0.0.1:5000"` | Endpoint of the analysis container |
| **storage** | | |
| `STORAGE_BACKEND` | `s3` | The type of storage to use: `{s3, azure}` |
| `STORAGE_BUCKET` | `"pyinfra-test-bucket"` | The bucket / container to pull files specified in queue requests from |
| `TARGET_FILE_EXTENSION` | `".TEXT.json.gz"` | Type of file to pull from storage: `.TEXT.json.gz` or `.ORIGIN.pdf.gz` |
| `STORAGE_ENDPOINT` | `"http://127.0.0.1:9000"` | Storage endpoint address |
| `STORAGE_KEY` | | Storage access key |
| `STORAGE_SECRET` | | Storage secret key |
| `STORAGE_AZURECONNECTIONSTRING` | `"DefaultEndpointsProtocol=..."` | Azure storage connection string |
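The env-override behaviour can be sketched as follows. This is a minimal illustration of the pattern, not the service's actual loader; `config_value` is a hypothetical helper, and the fallbacks mirror the defaults in the table above:

```python
import os

def config_value(name: str, default: str) -> str:
    """Return the exported environment override for `name`, or `default`
    (the value that would otherwise come from config.yaml)."""
    return os.environ.get(name, default)

# Defaults apply until the matching environment variable is exported.
rabbitmq_host = config_value("RABBITMQ_HOST", "localhost")
request_queue = config_value("REQUEST_QUEUE", "request_queue")
print(rabbitmq_host, request_queue)
```

Exporting e.g. `RABBITMQ_HOST=rabbit.internal` before starting the service would then take precedence over the value in `config.yaml`.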

## Response Format

### `RESPONSE_AS_FILE == False`

The response is returned directly (as a stream) in the following format:

```json
{
  "dossierId": "klaus",
  "fileId": "1a7fd8ac0da7656a487b68f89188be82",
  "imageMetadata": ANALYSIS_DATA
}
```

Example response for an image-prediction analysis:

```json
{
  "dossierId": "klaus",
  "fileId": "1a7fd8ac0da7656a487b68f89188be82",
  "imageMetadata": [
    {
      "classification": {
        "label": "logo",
        "probabilities": {
          "formula": 0.0,
          "logo": 1.0,
          "other": 0.0,
          "signature": 0.0
        }
      },
      "filters": {
        "allPassed": true,
        "geometry": {
          "imageFormat": {
            "quotient": 1.570791527313267,
            "tooTall": false,
            "tooWide": false
          },
          "imageSize": {
            "quotient": 0.19059804229011604,
            "tooLarge": false,
            "tooSmall": false
          }
        },
        "probability": {
          "unconfident": false
        }
      },
      "geometry": {
        "height": 107.63999999999999,
        "width": 169.08000000000004
      },
      "position": {
        "pageNumber": 1,
        "x1": 213.12,
        "x2": 382.20000000000005,
        "y1": 568.7604,
        "y2": 676.4004
      }
    }
  ]
}
```
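Assembling this envelope can be sketched as below. `build_response` is a hypothetical helper for illustration, not part of the service's API; the key names come from the examples above:

```python
import json

def build_response(dossier_id: str, file_id: str, analysis_data) -> str:
    """Wrap ANALYSIS_DATA in the response envelope shown above."""
    return json.dumps({
        "dossierId": dossier_id,
        "fileId": file_id,
        "imageMetadata": analysis_data,
    })

msg = build_response("klaus", "1a7fd8ac0da7656a487b68f89188be82",
                     [{"classification": {"label": "logo"}}])
print(msg)
```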

### `RESPONSE_AS_FILE == True`

Creates a response file on the request storage, named `dossierId/fileId + RESPONSE_FILE_EXTENSION`, with the `ANALYSIS_DATA` as content.
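The naming scheme can be sketched as follows; `response_key` is a hypothetical helper shown only to illustrate how the storage key is composed from the identifiers and `RESPONSE_FILE_EXTENSION`:

```python
def response_key(dossier_id: str, file_id: str,
                 extension: str = ".NER_ENTITIES.json.gz") -> str:
    """Build the storage key: dossierId/fileId + RESPONSE_FILE_EXTENSION."""
    return f"{dossier_id}/{file_id}{extension}"

print(response_key("klaus", "1a7fd8ac0da7656a487b68f89188be82"))
# → klaus/1a7fd8ac0da7656a487b68f89188be82.NER_ENTITIES.json.gz
```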

## Development

### Local Setup

You can run the infrastructure either as a module via `src/serve.py` or as a Docker container simulating the Kubernetes environment:

1. Install the module / build the Docker image:

   ```shell
   pip install -e .
   pip install -r requirements.txt

   docker build -f Dockerfile -t pyinfra .
   ```

2. Run RabbitMQ & MinIO:

   ```shell
   docker-compose up
   ```

3. Run the module / the Docker container:

   ```shell
   python src/serve.py

   docker run --net=host pyinfra
   ```
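Once the service is running, a test message can be published to `REQUEST_QUEUE`. The sketch below assumes the defaults from the configuration table, the third-party `pika` AMQP client, and a request schema with `dossierId`/`fileId` keys mirroring the response envelope; the actual request schema may differ:

```python
import json

def build_request(dossier_id: str, file_id: str) -> bytes:
    """Assumed request payload, mirroring the keys of the response envelope."""
    return json.dumps({"dossierId": dossier_id, "fileId": file_id}).encode()

def publish_request(body: bytes, queue: str = "request_queue") -> None:
    """Publish one request to the local RabbitMQ started by docker-compose."""
    import pika  # third-party AMQP client: pip install pika

    params = pika.ConnectionParameters(
        host="localhost", port=5672,
        credentials=pika.PlainCredentials("user", "bitnami"),
    )
    with pika.BlockingConnection(params) as connection:
        connection.channel().basic_publish(
            exchange="", routing_key=queue, body=body)

# With RabbitMQ up (step 2), uncomment to publish a test request:
# publish_request(build_request("klaus", "1a7fd8ac0da7656a487b68f89188be82"))
```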