Infrastructure to deploy Research Projects
The Infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.
Configuration
The configuration is located in /config.yaml. All relevant variables can be set by exporting environment variables.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | DEBUG | Logging level for service logger |
| PROBING_WEBSERVER_HOST | "0.0.0.0" | Probe webserver address |
| PROBING_WEBSERVER_PORT | 8080 | Probe webserver port |
| PROBING_WEBSERVER_MODE | production | Webserver mode: {development, production} |
| RABBITMQ_HOST | localhost | RabbitMQ host address |
| RABBITMQ_PORT | 5672 | RabbitMQ host port |
| RABBITMQ_USERNAME | user | RabbitMQ username |
| RABBITMQ_PASSWORD | bitnami | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 7200 | Controls AMQP heartbeat timeout in seconds |
| REQUEST_QUEUE | request_queue | Requests to service |
| RESPONSE_QUEUE | response_queue | Responses by service |
| DEAD_LETTER_QUEUE | dead_letter_queue | Messages that failed to process |
| ANALYSIS_ENDPOINT | "http://127.0.0.1:5000" | Endpoint for analysis container |
| STORAGE_BACKEND | s3 | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | Bucket / container from which the files specified in queue requests are pulled |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | root | User for s3 storage |
| STORAGE_SECRET | password | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Azure storage container name |
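As a minimal sketch of how these variables could be resolved, the snippet below reads a subset of them from the environment and falls back to the defaults in the table. `DEFAULTS` and `load_settings` are hypothetical names used for illustration and are not part of pyinfra. The rule that the Azure backend takes its container name from STORAGE_AZURECONTAINERNAME while s3 uses STORAGE_BUCKET is an assumption based on the variable descriptions.

```python
import os

# Defaults copied from the table above (subset, for illustration only).
DEFAULTS = {
    "RABBITMQ_HOST": "localhost",
    "RABBITMQ_PORT": "5672",
    "STORAGE_BACKEND": "s3",
    "STORAGE_BUCKET": "redaction",
    "STORAGE_AZURECONTAINERNAME": "redaction",
}


def load_settings(env=None):
    """Return settings, preferring exported environment variables over defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Environment values are strings; the port is needed as an integer.
    settings["RABBITMQ_PORT"] = int(settings["RABBITMQ_PORT"])
    # Assumption: the Azure backend reads its container name from
    # STORAGE_AZURECONTAINERNAME, while s3 uses STORAGE_BUCKET.
    if settings["STORAGE_BACKEND"] == "azure":
        settings["bucket"] = settings["STORAGE_AZURECONTAINERNAME"]
    else:
        settings["bucket"] = settings["STORAGE_BUCKET"]
    return settings
```

Calling `load_settings()` with no argument reads from `os.environ`; passing a plain dictionary instead makes the lookup easy to test without touching the real environment.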
Response Format
Expected AMQP input message:
{
    "dossierId": "",
    "fileId": "",
    "targetFileExtension": "",
    "responseFileExtension": ""
}
Optionally, the input message can contain a field with the key "operations".
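A request body with these keys could be assembled as in the sketch below. `build_request` and `REQUIRED_KEYS` are hypothetical helpers, not part of pyinfra, and the format of the "operations" value is not specified here, so the sketch passes it through unchanged.

```python
import json

# The four keys of the expected input message shown above.
REQUIRED_KEYS = ("dossierId", "fileId", "targetFileExtension", "responseFileExtension")


def build_request(dossier_id, file_id, target_ext, response_ext, operations=None):
    """Serialize a request body containing the four expected keys."""
    message = {
        "dossierId": dossier_id,
        "fileId": file_id,
        "targetFileExtension": target_ext,
        "responseFileExtension": response_ext,
    }
    # "operations" is optional; it is only added when a value is supplied.
    if operations is not None:
        message["operations"] = operations
    return json.dumps(message)
```

The returned string is what a client would publish to the queue named by REQUEST_QUEUE; the publishing step itself is left out because it depends on the AMQP client in use.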
AMQP output message:
{
    "dossierId": "",
    "fileId": "",
    ...
}
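Because the output message repeats "dossierId" and "fileId", a client can use that pair to match a response to the request it sent. The sketch below reads only those two keys and ignores the elided fields; `response_key` is a hypothetical helper name.

```python
import json


def response_key(raw_body):
    """Return (dossierId, fileId) from a response body; other fields are ignored."""
    message = json.loads(raw_body)
    return message["dossierId"], message["fileId"]
```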
Development
Either run src/serve.py or the built Docker image.
Setup
Install the module:
pip install -e .
pip install -r requirements.txt
Or build the Docker image:
docker build -f Dockerfile -t pyinfra .
Usage
Shell 1: Start a MinIO and a RabbitMQ docker container.
docker-compose up
Shell 2: Add files to the local MinIO storage.
python scripts/manage_minio.py add <MinIO target folder> -d path/to/a/folder/with/PDFs
Shell 2: Run pyinfra-server.
python src/serve.py
Or as a container:
docker run --net=host pyinfra
Shell 3: Run analysis-container.
Shell 4: Start a client that sends requests to process PDFs from the MinIO store and annotates these PDFs according to the service responses.
python scripts/mock_client.py