Merge in RR/pyinfra from RED-6273-multi-tenant-storage to master
Squashed commit of the following:
commit 0fead1f8b59c9187330879b4e48d48355885c27c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 28 15:02:22 2023 +0200
fix typos
commit 892a803726946876f8b8cd7905a0e73c419b2fb1
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date: Tue Mar 28 14:41:49 2023 +0200
Refactoring
Replace custom storage caching logic with LRU decorator
commit eafcd90260731e3360ce960571f07dee8f521327
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 24 12:50:13 2023 +0100
fix bug in storage connection from endpoint
commit d0c9fb5b7d1c55ae2f90e8faa1efec9f7587c26a
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 24 11:49:34 2023 +0100
add logs to PayloadProcessor
- set log messages to determine if x-tenant
storage connection is working
commit 97309fe58037b90469cf7a3de342d4749a0edfde
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 24 10:41:59 2023 +0100
update PayloadProcessor
- introduce storage cache to make every unique
storage connection only once
- add functionality to pass optional processing
kwargs in queue message like the operation key to
the processing function
commit d48e8108fdc0d463c89aaa0d672061ab7dca83a0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Wed Mar 22 13:34:43 2023 +0100
add multi-tenant storage connection 1st iteration
- forward x-tenant-id from queue message header to
payload processor
- add functions to receive storage infos from an
endpoint or the config. This enables hashing and
caching of connections created from these infos
- add function to initialize storage connections
from storage infos
- streamline and refactor tests to make them more
readable and robust and to make it easier to add
new tests
- update payload processor with first iteration
of multi tenancy storage connection support
with connection caching and backwards compability
commit 52c047c47b98e62d0b834a9b9b6c0e2bb0db41e5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 21 15:35:57 2023 +0100
add AES/GCM cipher functions
- decrypt x-tenant storage connection strings
PyInfra
About
Common Module with the infrastructure to deploy Research Projects. The Infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.
Configuration
A configuration is located in /config.yaml. All relevant variables can be configured via exporting environment variables.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for service logger |
| MONITORING_ENABLED | True | Enables Prometheus monitoring |
| PROMETHEUS_METRIC_PREFIX | "redactmanager_research_service" | Prometheus metric prefix, per convention '{product_name}_{service name}' |
| PROMETHEUS_HOST | "127.0.0.1" | Prometheus webserver address |
| PROMETHEUS_PORT | 8080 | Prometheus webserver port |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | Controls AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | Controls AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Requests to service |
| RESPONSE_QUEUE | "response_queue" | Responses by service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Messages that failed to process |
| STORAGE_BACKEND | "s3" | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | "root" | User for s3 storage |
| STORAGE_SECRET | "password" | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | AKS container |
| WRITE_CONSUMER_TOKEN | "False" | Value to see if we should write a consumer token to a file |
Response Format
Expected AMQP input message:
{
"dossierId": "",
"fileId": "",
"targetFileExtension": "",
"responseFileExtension": ""
}
Optionally, the input message can contain a field with the key "operations".
AMQP output message:
{
"dossierId": "",
"fileId": ""
}
Usage & API
Setup
Install project dependencies
make poetry
You don't have to install it independently in the project repo, just import pyinfra in any .py-file
or install form another project
poetry add git+ssh://git@git.iqser.com:2222/rr/pyinfra.git#TAG-NUMBER
API
from pyinfra.config import get_config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager
queue_manager = QueueManager(get_config())
queue_manager.start_consuming(make_payload_processor(data_processor))
The data_processor should expect a dict or bytes (pdf) as input and should return a list of results.
Scripts
Run pyinfra locally
Shell 1: Start minio and rabbitmq containers
$ cd tests && docker-compose up
Shell 2: Start pyinfra with callback mock
$ python scripts/start_pyinfra.py
Shell 3: Upload dummy content on storage and publish message
$ python scripts/mock_process_request.py
Tests
The tests take a bit longer than you are probably used to, because among other things the required startup times are quite high. The test runtime can be accelerated by setting 'autouse' to 'False'. In that case, run 'docker-compose up' in the tests dir manually before running the tests.