Merge in RR/pyinfra from RED-6205-monitoring to master
Squashed commit of the following:
commit 529cedfd7c065a3f7364e4596b923f25f0af76b5
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date: Thu Mar 16 14:57:26 2023 +0100
Remove unnecessary default argument to dict.get
commit b718531f568e89df77cc05039e5e7afe7111b9a4
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 14:56:50 2023 +0100
refactor
commit c039b0c25a6cd2ad2a72d237d0930c484c8e427c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 13:22:17 2023 +0100
increase package version to reflect the recent changes
commit 0a983a4113f25cd692b68869e1f33ffbf7efc6f0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 13:16:39 2023 +0100
remove processing result conversion to a list, since the ner-prediction service actually returns a dictionary. It is now expected that the result is sized (supports len()) to perform the monitoring and JSON-dumpable to upload it.
commit 541bf321410471dc09a354669b2778402286c09f
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 12:48:07 2023 +0100
remove no longer needed requirements
commit cfa182985d989a5b92a9a069a603daee72f37d49
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 11:14:58 2023 +0100
refactor payload formatting
- introduce PayloadFormatter class for better typehinting and bundling
of functionality
- parametrize payload formatting so the PayloadProcesser can adapt
better to different services/products
- move file extension parsing to its own module
commit f57663b86954b7164eeb6db013d862af88ec4584
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Wed Mar 15 12:22:08 2023 +0100
refactor payload parsing
- introduce QueueMessagePayloadParser for generality
and typehinting
- refactor file extension parsing algorithm
commit 713fb4a0dddecf5442ceda3988444d9887869dcf
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 17:07:02 2023 +0100
fix tests
commit a22ecf7ae93bc0bec235fba3fd9cbf6c1778aa13
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 16:31:26 2023 +0100
refactor payload parsing
- parameterize file and compression types allowed for files to download
and upload via config
- make a real value bag out of QueueMessagePayload and do the parsing
beforehand
- refactor file extension parser to be more robust
commit 50b578d054ca47a94c907f5f8b585eca7ed626ac
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 13:21:32 2023 +0100
add monitoring
- add an optional Prometheus monitor to track the average processing
time of a service per relevant parameter, which is at this point defined
via the number of resulting elements.
commit de525e7fa2f846f7fde5b9a4b466039238da10cd
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 12:57:24 2023 +0100
fix bug where the file extension parser did not work if the file endings have prefixes
PyInfra
About
Common module with the infrastructure to deploy Research Projects. The infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.
Configuration
The configuration is located in /config.yaml. All relevant variables can be overridden by exporting environment variables.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for service logger |
| MONITORING_ENABLED | True | Enables Prometheus monitoring |
| PROMETHEUS_METRIC_PREFIX | "redactmanager_research_service" | Prometheus metric prefix, per convention '{product_name}_{service_name}' |
| PROMETHEUS_HOST | "127.0.0.1" | Prometheus webserver address |
| PROMETHEUS_PORT | 8080 | Prometheus webserver port |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | Controls AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | Controls AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Requests to service |
| RESPONSE_QUEUE | "response_queue" | Responses by service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Messages that failed to process |
| STORAGE_BACKEND | "s3" | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | "root" | User for s3 storage |
| STORAGE_SECRET | "password" | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Azure storage container name |
| WRITE_CONSUMER_TOKEN | "False" | Whether a consumer token should be written to a file |
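Since every setting can be overridden through the environment, one way to adjust them from Python is to export the variables before the configuration is loaded. A minimal sketch (the keys mirror the table above, the values are made up; get_config is introduced in the API section below):

```python
import os

# Override selected defaults; the keys mirror the environment variables above.
os.environ["RABBITMQ_HOST"] = "rabbitmq.internal"  # made-up host
os.environ["MONITORING_ENABLED"] = "False"
os.environ["PROMETHEUS_PORT"] = "9090"

# Import after the environment is set so the overrides are picked up.
from pyinfra.config import get_config

config = get_config()
```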
Message Format
Expected AMQP input message:

```json
{
    "dossierId": "",
    "fileId": "",
    "targetFileExtension": "",
    "responseFileExtension": ""
}
```
Optionally, the input message can contain a field with the key "operations".
AMQP output message:

```json
{
    "dossierId": "",
    "fileId": ""
}
```
Usage & API
Setup
Install the project dependencies:

```shell
$ make poetry
```

You don't have to install it separately in the project repo; just import pyinfra in any .py file. To install it from another project:

```shell
$ poetry add git+ssh://git@git.iqser.com:2222/rr/pyinfra.git#TAG-NUMBER
```
API
```python
from pyinfra.config import get_config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

queue_manager = QueueManager(get_config())
queue_manager.start_consuming(make_payload_processor(data_processor))
```
The data_processor callback should accept a dict or bytes (e.g. a PDF) as input and return a list of results.
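As a minimal sketch, a callback wired into the API above could look like this (the processing logic is purely illustrative):

```python
from typing import Union

from pyinfra.config import get_config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager


def data_processor(payload: Union[dict, bytes]) -> list:
    """Illustrative processor; a real service runs its analysis here."""
    if isinstance(payload, bytes):
        # e.g. a downloaded PDF passed through as raw bytes
        return [{"num_bytes": len(payload)}]
    # e.g. a parsed JSON document
    return [{"keys": sorted(payload)}]


queue_manager = QueueManager(get_config())
queue_manager.start_consuming(make_payload_processor(data_processor))
```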
Scripts
Run pyinfra locally:

Shell 1: Start the minio and rabbitmq containers:

```shell
$ cd tests && docker-compose up
```

Shell 2: Start pyinfra with a callback mock:

```shell
$ python scripts/start_pyinfra.py
```

Shell 3: Upload dummy content to storage and publish a message:

```shell
$ python scripts/mock_process_request.py
```
Tests
The tests take longer than you are probably used to because, among other things, the required container startup times are quite high. The test runtime can be reduced by setting 'autouse' to 'False'; in that case, run 'docker-compose up' in the tests dir manually before running the tests.
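The 'autouse' flag refers to a pytest fixture that starts the containers once per session. A hypothetical sketch of that pattern (the fixture name and compose invocation are assumptions, not taken from the repo):

```python
import subprocess

import pytest


# With autouse=True the containers start automatically for every test run;
# set autouse=False and run `docker-compose up` in the tests dir yourself
# to skip the startup overhead.
@pytest.fixture(scope="session", autouse=True)
def docker_services():
    subprocess.run(["docker-compose", "up", "-d"], cwd="tests", check=True)
    yield
    subprocess.run(["docker-compose", "down"], cwd="tests", check=True)
```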