# Infrastructure to deploy Research Projects

The infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.

## Configuration

The configuration is located in `/config.yaml`. All relevant variables can be overridden by exporting environment variables.

| Environment Variable          | Default                        | Description                                                                                       |
|-------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------|
| _service_                     |                                |                                                                                                   |
| LOGGING_LEVEL_ROOT            | DEBUG                          | Logging level for the service logger                                                              |
| RESPONSE_TYPE                 | "stream"                       | Whether the analysis response is stored as a file on storage or sent as a stream: "file" or "stream" |
| RESPONSE_FILE_EXTENSION       | ".NER_ENTITIES.json.gz"        | Extension of the file that stores the analyzed response on storage                                |
| _probing_webserver_           |                                |                                                                                                   |
| PROBING_WEBSERVER_HOST        | "0.0.0.0"                      | Probe webserver address                                                                           |
| PROBING_WEBSERVER_PORT        | 8080                           | Probe webserver port                                                                              |
| PROBING_WEBSERVER_MODE        | production                     | Webserver mode: {development, production}                                                         |
| _rabbitmq_                    |                                |                                                                                                   |
| RABBITMQ_HOST                 | localhost                      | RabbitMQ host address                                                                             |
| RABBITMQ_PORT                 | 5672                           | RabbitMQ host port                                                                                |
| RABBITMQ_USERNAME             | user                           | RabbitMQ username                                                                                 |
| RABBITMQ_PASSWORD             | bitnami                        | RabbitMQ password                                                                                 |
| RABBITMQ_HEARTBEAT            | 7200                           | AMQP heartbeat timeout in seconds                                                                 |
| _queues_                      |                                |                                                                                                   |
| REQUEST_QUEUE                 | request_queue                  | Requests to the service                                                                           |
| RESPONSE_QUEUE                | response_queue                 | Responses from the service                                                                        |
| DEAD_LETTER_QUEUE             | dead_letter_queue              | Messages that failed to process                                                                   |
| _callback_                    |                                |                                                                                                   |
| RETRY                         | False                          | Toggles retry behaviour                                                                           |
| MAX_ATTEMPTS                  | 3                              | Number of times a message may fail before being published to the dead letter queue                |
| ANALYSIS_ENDPOINT             | "http://127.0.0.1:5000"        | Endpoint of the analysis container                                                                |
| _storage_                     |                                |                                                                                                   |
| STORAGE_BACKEND               | s3                             | The type of storage to use: {s3, azure}                                                           |
| STORAGE_BUCKET                | "pyinfra-test-bucket"          | The bucket / container to pull files specified in queue requests from                             |
| TARGET_FILE_EXTENSION         | ".TEXT.json.gz"                | Type of file to pull from storage: .TEXT.json.gz or .ORIGIN.pdf.gz                                |
| STORAGE_ENDPOINT              | "http://127.0.0.1:9000"        | Endpoint of the s3 storage                                                                        |
| STORAGE_KEY                   |                                | Access key for the s3 storage                                                                     |
| STORAGE_SECRET                |                                | Secret key for the s3 storage                                                                     |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for the azure storage                                                           |

## Response Format

### RESPONSE_TYPE == "stream"

Response format:

```json
{
    "dossierId": "klaus",
    "fileId": "1a7fd8ac0da7656a487b68f89188be82",
    "imageMetadata": ANALYSIS_DATA
}
```

Response example for image prediction:

```json
{
    "dossierId": "klaus",
    "fileId": "1a7fd8ac0da7656a487b68f89188be82",
    "imageMetadata": [
        {
            "classification": {
                "label": "logo",
                "probabilities": {
                    "formula": 0.0,
                    "logo": 1.0,
                    "other": 0.0,
                    "signature": 0.0
                }
            },
            "filters": {
                "allPassed": true,
                "geometry": {
                    "imageFormat": {
                        "quotient": 1.570791527313267,
                        "tooTall": false,
                        "tooWide": false
                    },
                    "imageSize": {
                        "quotient": 0.19059804229011604,
                        "tooLarge": false,
                        "tooSmall": false
                    }
                },
                "probability": {
                    "unconfident": false
                }
            },
            "geometry": {
                "height": 107.63999999999999,
                "width": 169.08000000000004
            },
            "position": {
                "pageNumber": 1,
                "x1": 213.12,
                "x2": 382.20000000000005,
                "y1": 568.7604,
                "y2": 676.4004
            }
        }
    ]
}
```

### RESPONSE_TYPE == "file"

Creates a response file on the request storage, named `dossierId / fileId + RESPONSE_FILE_EXTENSION`, with the `ANALYSIS_DATA` as content.

## Development

### Local Setup

You can run the infrastructure either as a module via `src/serve.py` or as a Docker container simulating the Kubernetes environment.

1. Install the module / build the Docker image

   ```bash
   pip install -e .
   pip install -r requirements.txt
   ```

   ```bash
   docker build -f Dockerfile -t pyinfra .
   ```

2. Run RabbitMQ & MinIO

   ```bash
   docker-compose up
   ```

3. Run the module / Docker container

   ```bash
   python src/serve.py
   ```

   ```bash
   docker run --net=host pyinfra
   ```
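Step 2 assumes a `docker-compose.yml` in the repository root that provides RabbitMQ and MinIO; the repository's actual file may differ. A minimal sketch that matches the defaults in the configuration table (the `user` / `bitnami` credentials and ports 5672 and 9000):

```yaml
# Illustrative only -- service names, images, and MinIO credentials are assumptions.
version: "3"
services:
  rabbitmq:
    image: bitnami/rabbitmq:latest
    environment:
      - RABBITMQ_USERNAME=user
      - RABBITMQ_PASSWORD=bitnami
    ports:
      - "5672:5672"
  minio:
    image: minio/minio:latest
    command: server /data
    environment:
      - MINIO_ROOT_USER=minio-key       # would go into STORAGE_KEY
      - MINIO_ROOT_PASSWORD=minio-secret # would go into STORAGE_SECRET
    ports:
      - "9000:9000"
```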
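The configuration rule above (values in `/config.yaml` are defaults, exported environment variables win) can be sketched as follows. This is a minimal illustration, not the service's actual loader; `DEFAULTS` and `load_setting` are hypothetical names:

```python
import os

# Hypothetical defaults, standing in for what /config.yaml would provide.
DEFAULTS = {
    "RESPONSE_TYPE": "stream",
    "RABBITMQ_HOST": "localhost",
    "RABBITMQ_PORT": "5672",
}

def load_setting(name: str) -> str:
    """Return the environment override if set, otherwise the config default."""
    return os.environ.get(name, DEFAULTS[name])

# Exporting a variable overrides the default...
os.environ["RABBITMQ_HOST"] = "rabbitmq.internal"
print(load_setting("RABBITMQ_HOST"))   # environment override wins
print(load_setting("RESPONSE_TYPE"))   # falls back to the default
```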
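The two response modes can be summarized in a short sketch: the stream mode sends a JSON envelope with `dossierId`, `fileId`, and the analysis payload, while the file mode writes the payload to a storage object named from the dossier id, file id, and `RESPONSE_FILE_EXTENSION`. The helper names below are hypothetical; only the field names and the naming scheme come from this README:

```python
import json

def build_stream_response(dossier_id: str, file_id: str, analysis_data) -> str:
    """RESPONSE_TYPE == "stream": the response envelope described above."""
    return json.dumps({
        "dossierId": dossier_id,
        "fileId": file_id,
        "imageMetadata": analysis_data,
    })

def response_object_name(dossier_id: str, file_id: str,
                         extension: str = ".NER_ENTITIES.json.gz") -> str:
    """RESPONSE_TYPE == "file": object name of the response file on storage."""
    return f"{dossier_id}/{file_id}{extension}"

msg = build_stream_response("klaus", "1a7fd8ac0da7656a487b68f89188be82", [])
print(msg)
print(response_object_name("klaus", "1a7fd8ac0da7656a487b68f89188be82"))
# → klaus/1a7fd8ac0da7656a487b68f89188be82.NER_ENTITIES.json.gz
```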