PyInfra
About
Common module providing the infrastructure to deploy research projects. The infrastructure is expected to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.
Configuration
The default configuration is located in /config.yaml. All relevant variables can be overridden by exporting environment variables (see the table below and the sketch that follows it).
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for service logger |
| MONITORING_ENABLED | True | Enables Prometheus monitoring |
| PROMETHEUS_METRIC_PREFIX | "redactmanager_research_service" | Prometheus metric prefix, per convention '{product_name}_{service_name}' |
| PROMETHEUS_HOST | "127.0.0.1" | Prometheus webserver address |
| PROMETHEUS_PORT | 8080 | Prometheus webserver port |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | Controls AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | Controls AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Requests to service |
| RESPONSE_QUEUE | "response_queue" | Responses by service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Messages that failed to process |
| STORAGE_BACKEND | "s3" | The storage backend to use (s3 or azure) |
| STORAGE_BUCKET | "redaction" | The bucket / container from which files referenced in queue requests are pulled |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for S3 storage |
| STORAGE_KEY | "root" | Access key / user for S3 storage |
| STORAGE_SECRET | "password" | Secret key / password for S3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Azure Blob Storage container name |
| WRITE_CONSUMER_TOKEN | "False" | Whether to write a consumer token to a file |
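For example, the defaults above can be overridden by exporting the corresponding environment variables before the configuration is loaded. A minimal sketch (values are illustrative; the exact precedence is determined by the config loader):

```python
import os

# Illustrative overrides only; any variable from the table above can be set
# the same way, e.g. in the Pod spec or a local shell.
os.environ["LOGGING_LEVEL_ROOT"] = "INFO"
os.environ["RABBITMQ_HOST"] = "rabbitmq.internal"  # hypothetical host
os.environ["STORAGE_BACKEND"] = "azure"

# Import and load the configuration after the environment is set.
from pyinfra import config

pyinfra_config = config.get_config()  # exported values take the place of the /config.yaml defaults
```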
Message Format
Expected AMQP input message:
Use either the new format, where absolute file paths are provided, or the legacy format with dossierId and fileId as strings. A tenant ID can optionally be provided in the message header (key: "X-TENANT-ID").
```json
{
    "targetFilePath": "",
    "responseFilePath": ""
}
```
or
```json
{
    "dossierId": "",
    "fileId": "",
    "targetFileExtension": "",
    "responseFileExtension": ""
}
```
Optionally, the input message can contain a field with the key "operations".
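As an illustration, a request in the legacy format could be published with pika roughly as follows. The queue name and credentials are the defaults from the configuration table, and the IDs and extensions are placeholders, not prescribed values:

```python
import json

import pika

# Connect using the default broker settings from the configuration table.
connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="localhost",
        port=5672,
        credentials=pika.PlainCredentials("user", "bitnami"),
    )
)
channel = connection.channel()

message = {
    "dossierId": "dossier-123",   # hypothetical IDs for illustration
    "fileId": "file-456",
    "targetFileExtension": "pdf",
    "responseFileExtension": "json",
    "operations": [],             # optional field, see above
}

channel.basic_publish(
    exchange="",
    routing_key="request_queue",  # default REQUEST_QUEUE
    body=json.dumps(message),
    properties=pika.BasicProperties(headers={"X-TENANT-ID": "tenant-a"}),
)
connection.close()
```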
AMQP output message:
```json
{
    "targetFilePath": "",
    "responseFilePath": ""
}
```
or
```json
{
    "dossierId": "",
    "fileId": ""
}
```
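Correspondingly, a response can be read from the response queue. A small sketch, assuming the default RESPONSE_QUEUE and broker settings:

```python
import json

import pika

# Connect with the default broker settings and fetch a single message.
connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="localhost",
        port=5672,
        credentials=pika.PlainCredentials("user", "bitnami"),
    )
)
channel = connection.channel()

method, properties, body = channel.basic_get(queue="response_queue", auto_ack=True)
if body is not None:
    response = json.loads(body)
    print(response)  # e.g. {"dossierId": "...", "fileId": "..."}
connection.close()
```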
Usage & API
Setup
Add the desired version of the pyinfra package to your pyproject.toml file and make sure to add our GitLab registry as a source.
For now, all internal packages used by pyinfra also have to be added to the pyproject.toml file.
Run poetry lock and poetry install to install the packages.
You can look up the latest version of the package in the GitLab registry. For the versions of internal dependencies in use, please refer to the pyproject.toml file.
```toml
[tool.poetry.dependencies]
pyinfra = { version = "x.x.x", source = "gitlab-research" }
kn-utils = { version = "x.x.x", source = "gitlab-research" }

[[tool.poetry.source]]
name = "gitlab-research"
url = "https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi/simple"
priority = "explicit"
```
API
```python
from pyinfra import config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

pyinfra_config = config.get_config()
process_payload = make_payload_processor(process_data, config=pyinfra_config)
queue_manager = QueueManager(pyinfra_config)
queue_manager.start_consuming(process_payload)
```
process_data should accept a dict (JSON) or bytes (PDF) as input and return a list of results.
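A minimal sketch of such a callback (the return shape is illustrative; only the dict-or-bytes contract comes from this README):

```python
from typing import Union


def process_data(payload: Union[dict, bytes]) -> list:
    """Hypothetical callback passed to make_payload_processor."""
    if isinstance(payload, dict):
        # JSON request, e.g. the dossierId / fileId message shown above.
        return [{"received": payload}]
    # Raw bytes, e.g. a PDF pulled from storage.
    return [{"received_bytes": len(payload)}]
```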
Scripts
Run pyinfra locally
Shell 1: Start minio and rabbitmq containers
```bash
cd tests && docker-compose up
```
Shell 2: Start pyinfra with callback mock
```bash
python scripts/start_pyinfra.py
```
Shell 3: Upload dummy content on storage and publish message
```bash
python scripts/send_request.py
```
Tests
Running all tests takes longer than you are probably used to, mainly because the startup times for docker-compose-dependent tests are quite high. The tests are therefore split into two parts: the first contains all tests that do not require docker-compose, the second all tests that do. By default, only the first part is executed, but when releasing a new version, all tests should be run.