Merge in RR/pyinfra from RED-6366-refactor to master
Squashed commit of the following:
commit 8807cda514b5cc24b1be208173283275d87dcb97
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 10 13:15:15 2023 +0100
enable docker-compose autouse for automatic tests
commit c4579581d3e9a885ef387ee97f3f3a5cf4731193
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 10 12:35:49 2023 +0100
black
commit ac2b754c5624ef37ce310fce7196c9ea11bbca03
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 10 12:30:23 2023 +0100
refactor storage url parsing
- move parsing and validation to config where the connection url is
actually read in
- improve readability of parsing fn
commit 371802cc10b6d946c4939ff6839571002a2cb9f4
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 10 10:48:00 2023 +0100
refactor
commit e8c381c29deebf663e665920752c2965d7abce16
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 10 09:57:34 2023 +0100
rename
commit c8628a509316a651960dfa806d5fe6aacb7a91c1
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 10 09:37:01 2023 +0100
renaming and refactoring
commit 4974d4f56fd73bc55bd76aa7a9bbb16babee19f4
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Fri Mar 10 08:53:09 2023 +0100
refactor payload processor
- limit make_uploader and make_downloader cache
- partially apply them when the class is initialized with storage and
bucket to make the logic and behaviour more comprehensive
- renaming functional pipeline steps to be more expressive
commit f8d51bfcad2b815c8293ab27dd66b256255c5414
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 9 15:30:32 2023 +0100
remove monitor and rename Payload
commit 412ddaa207a08aff1229d7acd5d95402ac8cd578
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 2 10:15:39 2023 +0100
remove azure connection string and disable respective test for now for security reasons
commit 7922a2d9d325f3b9008ad4e3e56b241ba179f52c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Wed Mar 1 13:30:58 2023 +0100
make payload formatting function names more expressive
commit 7517e544b0f5a434579cc9bada3a37e7ac04059f
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Wed Mar 1 13:24:57 2023 +0100
add some type hints
commit 095410d3009f2dcbd374680dd0f7b55de94c9e76
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date: Wed Mar 1 10:54:58 2023 +0100
Refactoring
- Renaming
- Docstring adjustments
commit e992f0715fc2636eb13eb5ffc4de0bcc5d433fc8
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date: Wed Mar 1 09:43:26 2023 +0100
Re-wording and typo fixes
commit 3c2d698f9bf980bc4b378a44dc20c2badc407b3e
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Feb 28 14:59:59 2023 +0100
enable auto startup for docker compose in tests
commit 55773b4fb0b624ca4745e5b8aeafa6f6a0ae6436
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Feb 28 14:59:37 2023 +0100
Extended tests for queue manager
commit 14f7f943f60b9bfb9fe77fa3cef99a1e7d094333
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Feb 28 13:39:00 2023 +0100
enable auto startup for docker compose in tests
commit 7caf354491c84c6e0b0e09ad4d41cb5dfbfdb225
Merge: 49d47ba d0277b8
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Feb 28 13:32:52 2023 +0100
Merge branch 'RED-6205-prometheus' of ssh://git.iqser.com:2222/rr/pyinfra into RED-6205-prometheus
commit 49d47baba8ccf11dee48a4c1cbddc3bbd12471e5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Feb 28 13:32:42 2023 +0100
adjust Payload Processor signature
commit d0277b86bc54994b6032774bf0ec2d7b19d7f517
Merge: 5184a18 f6b35d6
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Tue Feb 28 11:07:16 2023 +0100
Pull request #61: Change Sec Trigger to PR
Merge in RR/pyinfra from cschabert/PlanSpecjava-1677578703647 to RED-6205-prometheus
* commit 'f6b35d648c88ddbce1856445c3b887bce669265c':
Change Sec Trigger to PR
commit f6b35d648c88ddbce1856445c3b887bce669265c
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Tue Feb 28 11:05:13 2023 +0100
Change Sec Trigger to PR
... and 20 more commits
PyInfra
About
Common module with the infrastructure to deploy research projects. The infrastructure expects to be deployed in the same pod / local environment as the analysis container and handles all outbound communication.
Configuration
The configuration is located in /config.yaml. All relevant variables can be set by exporting the environment variables listed below.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for service logger |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | Controls AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | Controls AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Requests to service |
| RESPONSE_QUEUE | "response_queue" | Responses by service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Messages that failed to process |
| STORAGE_BACKEND | "s3" | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | "root" | User for s3 storage |
| STORAGE_SECRET | "password" | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Name of the Azure storage container |
| WRITE_CONSUMER_TOKEN | "False" | Whether to write a consumer token to a file |
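As a minimal sketch of how environment overrides like these can work, the snippet below reads a few of the variables from the table and falls back to the listed defaults. The `load_settings` helper and the `DEFAULTS` dict are hypothetical and only illustrate the pattern; they are not pyinfra's actual `get_config` implementation.

```python
import os

# Defaults copied from the table above; only a subset is shown.
DEFAULTS = {
    "RABBITMQ_HOST": "localhost",
    "RABBITMQ_PORT": "5672",
    "RABBITMQ_HEARTBEAT": "60",
    "STORAGE_BACKEND": "s3",
    "STORAGE_BUCKET": "redaction",
}


def load_settings(environ=None):
    """Hypothetical helper: return settings, preferring exported environment variables."""
    environ = os.environ if environ is None else environ
    settings = {name: environ.get(name, default) for name, default in DEFAULTS.items()}
    # Values from the environment are strings, so convert the numeric ones.
    settings["RABBITMQ_PORT"] = int(settings["RABBITMQ_PORT"])
    settings["RABBITMQ_HEARTBEAT"] = int(settings["RABBITMQ_HEARTBEAT"])
    # The table lists s3 and azure as the only storage backends.
    if settings["STORAGE_BACKEND"] not in {"s3", "azure"}:
        raise ValueError(f"unsupported STORAGE_BACKEND: {settings['STORAGE_BACKEND']!r}")
    return settings
```

Calling `load_settings()` without arguments reads from the process environment; passing a plain dict instead makes the behaviour easy to check in isolation.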
Response Format
Expected AMQP input message:
{
"dossierId": "",
"fileId": "",
"targetFileExtension": "",
"responseFileExtension": ""
}
Optionally, the input message can contain a field with the key "operations".
AMQP output message:
{
"dossierId": "",
"fileId": ""
}
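The relationship between the two message formats can be sketched as follows. The functions `parse_request` and `build_response` are hypothetical names chosen for illustration, and whether pyinfra validates incoming messages this way is an assumption; only the field names are taken from the formats above.

```python
import json

# Keys of the expected AMQP input message; "operations" is optional.
REQUIRED_KEYS = ("dossierId", "fileId", "targetFileExtension", "responseFileExtension")


def parse_request(body: bytes) -> dict:
    """Hypothetical helper: decode a message body and check the required keys."""
    message = json.loads(body)
    missing = [key for key in REQUIRED_KEYS if key not in message]
    if missing:
        raise ValueError(f"missing keys in request: {missing}")
    return message


def build_response(request: dict) -> dict:
    """Hypothetical helper: the output message carries only the two identifiers."""
    return {"dossierId": request["dossierId"], "fileId": request["fileId"]}
```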
Usage & API
Setup
Install project dependencies
make poetry
You don't have to install it separately inside the project repo; just import pyinfra in any .py file,
or install it from another project:
poetry add git+ssh://git@git.iqser.com:2222/rr/pyinfra.git#TAG-NUMBER
API
from pyinfra.config import get_config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager
queue_manager = QueueManager(get_config())
queue_manager.start_consuming(make_payload_processor(data_processor))
The data_processor should expect a dict or bytes (pdf) as input and should return a list of results.
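A data_processor that satisfies this contract might look like the sketch below. The function body and the shape of the result entries are assumptions made for illustration; the source only states that the input is a dict or bytes and that the return value is a list of results.

```python
from typing import List, Union


def data_processor(data: Union[dict, bytes]) -> List[dict]:
    """Hypothetical processor: describe the input instead of analysing it."""
    if isinstance(data, bytes):
        # For example, the raw bytes of a PDF file.
        return [{"type": "bytes", "size": len(data)}]
    if isinstance(data, dict):
        return [{"type": "dict", "keys": sorted(data)}]
    raise TypeError(f"unsupported input type: {type(data).__name__}")
```

A real processor would replace the two branches with the actual analysis; the point here is only that both input types are handled and that a list is always returned.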
Scripts
Run pyinfra locally
Shell 1: Start minio and rabbitmq containers
$ cd tests && docker-compose up
Shell 2: Start pyinfra with callback mock
$ python scripts/start_pyinfra.py
Shell 3: Upload dummy content on storage and publish message
$ python scripts/mock_process_request.py
Tests
The tests take longer than you are probably used to because, among other things, the required startup times are quite high. To speed up the test run, set 'autouse' to 'False'. In that case, run 'docker-compose up' in the tests directory manually before running the tests.