# PyInfra

1. [ About ](#about)
2. [ Configuration ](#configuration)
3. [ Response Format ](#response-format)
4. [ Usage & API ](#usage--api)
5. [ Scripts ](#scripts)
6. [ Tests ](#tests)

## About

Common module providing the infrastructure to deploy research projects. The infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.

## Configuration

The configuration is located in `/config.yaml`. All relevant variables can be overridden by exporting environment variables.

| Environment Variable          | Default                          | Description                                                               |
|-------------------------------|----------------------------------|---------------------------------------------------------------------------|
| LOGGING_LEVEL_ROOT            | "DEBUG"                          | Logging level for the service logger                                      |
| MONITORING_ENABLED            | True                             | Enables Prometheus monitoring                                             |
| PROMETHEUS_METRIC_PREFIX      | "redactmanager_research_service" | Prometheus metric prefix, per convention '{product_name}_{service_name}'  |
| PROMETHEUS_HOST               | "127.0.0.1"                      | Prometheus webserver address                                              |
| PROMETHEUS_PORT               | 8080                             | Prometheus webserver port                                                 |
| RABBITMQ_HOST                 | "localhost"                      | RabbitMQ host address                                                     |
| RABBITMQ_PORT                 | "5672"                           | RabbitMQ host port                                                        |
| RABBITMQ_USERNAME             | "user"                           | RabbitMQ username                                                         |
| RABBITMQ_PASSWORD             | "bitnami"                        | RabbitMQ password                                                         |
| RABBITMQ_HEARTBEAT            | 60                               | AMQP heartbeat timeout in seconds                                         |
| RABBITMQ_CONNECTION_SLEEP     | 5                                | AMQP connection sleep timer in seconds                                    |
| REQUEST_QUEUE                 | "request_queue"                  | Queue for requests to the service                                         |
| RESPONSE_QUEUE                | "response_queue"                 | Queue for responses from the service                                      |
| DEAD_LETTER_QUEUE             | "dead_letter_queue"              | Queue for messages that failed to process                                 |
| STORAGE_BACKEND               | "s3"                             | The type of storage to use {s3, azure}                                    |
| STORAGE_BUCKET                | "redaction"                      | The bucket / container to pull files specified in queue requests from     |
| STORAGE_ENDPOINT              | "http://127.0.0.1:9000"          | Endpoint for S3 storage                                                   |
| STORAGE_KEY                   | "root"                           | User for S3 storage                                                       |
| STORAGE_SECRET                | "password"                       | Password for S3 storage                                                   |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..."   | Connection string for Azure storage                                       |
| STORAGE_AZURECONTAINERNAME    | "redaction"                      | Azure storage container name                                              |
| WRITE_CONSUMER_TOKEN          | "False"                          | Whether to write a consumer token to a file                               |

## Response Format

### Expected AMQP input message

Either use the legacy format, where `dossierId` and `fileId` are passed as strings, or the new format, where absolute paths are used. A tenant ID can optionally be provided in the message header (key: `"X-TENANT-ID"`).

```json
{
    "targetFilePath": "",
    "responseFilePath": ""
}
```

or

```json
{
    "dossierId": "",
    "fileId": "",
    "targetFileExtension": "",
    "responseFileExtension": ""
}
```

Optionally, the input message can contain a field with the key `"operations"`.

### AMQP output message

```json
{
    "targetFilePath": "",
    "responseFilePath": ""
}
```

or

```json
{
    "dossierId": "",
    "fileId": ""
}
```
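For illustration, here is a minimal sketch of how such a request could be published from Python. It is not part of pyinfra itself (the project's `scripts/send_request.py` does this end to end, see [Scripts](#scripts)); it assumes the `pika` client, the default RabbitMQ values from the configuration table above, and hypothetical file paths that should point to objects in the configured storage bucket.

```python
import json

import pika

# Assumption: default RabbitMQ settings from the configuration table above.
credentials = pika.PlainCredentials("user", "bitnami")
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost", port=5672, credentials=credentials)
)
channel = connection.channel()

# New-format request; the file paths are hypothetical and should reference
# objects in the configured storage bucket. The request queue is assumed to
# have been declared already (e.g. by a running pyinfra instance).
message = {
    "targetFilePath": "dossier-1/file-1/document.pdf",
    "responseFilePath": "dossier-1/file-1/result.json",
}
channel.basic_publish(
    exchange="",
    routing_key="request_queue",
    body=json.dumps(message),
    # Optional tenant header.
    properties=pika.BasicProperties(headers={"X-TENANT-ID": "example-tenant"}),
)
connection.close()
```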
## Usage & API

### Setup

Add the desired version of the pyinfra package to your `pyproject.toml` file and make sure to add our GitLab registry as a source. For now, all internal packages used by pyinfra also have to be added to the `pyproject.toml` file. Run `poetry lock` and `poetry install` to install the packages. You can look up the latest version of the package in the [GitLab registry](https://gitlab.knecon.com/knecon/research/pyinfra/-/packages). For the versions of internal dependencies currently in use, refer to the [pyproject.toml](pyproject.toml) file.

```toml
[tool.poetry.dependencies]
pyinfra = { version = "x.x.x", source = "gitlab-research" }
kn-utils = { version = "x.x.x", source = "gitlab-research" }

[[tool.poetry.source]]
name = "gitlab-research"
url = "https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi/simple"
priority = "explicit"
```

### API

```python
from pyinfra import config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

pyinfra_config = config.get_config()

# process_data is your own callback; see the note below.
process_payload = make_payload_processor(process_data, config=pyinfra_config)

queue_manager = QueueManager(pyinfra_config)
queue_manager.start_consuming(process_payload)
```

`process_data` should accept a `dict` (JSON) or `bytes` (PDF) as input and return a list of results.

## Scripts

### Run pyinfra locally

**Shell 1**: Start MinIO and RabbitMQ containers

```bash
$ cd tests && docker-compose up
```

**Shell 2**: Start pyinfra with a callback mock

```bash
$ python scripts/start_pyinfra.py
```

**Shell 3**: Upload dummy content to storage and publish a message

```bash
$ python scripts/send_request.py
```

## Tests

Running all tests takes a bit longer than you are probably used to because, among other things, the startup times for the docker-compose-dependent tests are quite high. This is why the tests are split into two parts: the first part contains all tests that do not require docker-compose, and the second part contains all tests that do. By default, only the first part is executed, but when releasing a new version, all tests should be executed.
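How the two parts are selected is defined in the project's test configuration; the commands below are only a hypothetical illustration, assuming a pytest-based setup in which the docker-compose-dependent tests carry a marker (named `docker_compose` here purely for illustration).

```bash
# Part 1 (default): tests that do not require docker-compose
$ poetry run pytest

# Part 2 (e.g. before a release): start the services used by the tests,
# then run the docker-compose-dependent part as well (marker name is hypothetical)
$ cd tests && docker-compose up -d && cd ..
$ poetry run pytest -m docker_compose
```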