Merge in RR/pyinfra from cschabert/PlanSpecjava-1678717832322 to master
Squashed commit of the following:
commit 3ae2b191e777739738d91d114c376ac78efa193f
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Tue Mar 14 08:36:54 2023 +0100
PlanSpec.java edited online with Bitbucket
commit 2aa012242c77958701ca7b3400ed4b3272cd7d95
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Tue Mar 14 08:34:40 2023 +0100
sonar-scan.sh edited online with Bitbucket
commit 2dd8c21229f40f4972b632702c4bcf4ad71bf7ae
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Tue Mar 14 08:33:50 2023 +0100
sonar-scan.sh edited online with Bitbucket
commit 8837c31d664a7cb913ac538c9403871352b014a3
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Tue Mar 14 08:33:17 2023 +0100
sonar-scan.sh edited online with Bitbucket
commit 0de23c519fcbb9f991a85389fe1644af4256266b
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Tue Mar 14 08:28:00 2023 +0100
config-keys.sh edited online with Bitbucket
commit 4f971967e5055e368bc3c779f7f400bbf9b86a42
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 08:22:17 2023 +0100
update bamboo agent username
commit 37fa1bbf9f83ec3d242a32e2051b6f1615102307
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 08:08:46 2023 +0100
remove venv install
commit 44180f403ac8a5b1b33090081c45e30121dbae8d
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 08:07:13 2023 +0100
add venv install
commit eac141bf8f430af3f7406a89df5147cd93231278
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 08:05:51 2023 +0100
add venv install
commit 24b37f9f83db20e90d3bd528f4111f524b7485c5
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Mon Mar 13 15:47:03 2023 +0100
Set new image for Sonar Scan
commit b734389316f60b2fdbe4bdcdf00d1f2f14e61266
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date: Mon Mar 13 15:30:45 2023 +0100
update java version for sonar-scan
PyInfra
About
Common module providing the infrastructure to deploy research projects. The infrastructure expects to be deployed in the same Pod or local environment as the analysis container and handles all outbound communication.
Configuration
The configuration is located in /config.yaml. All relevant variables can be overridden by exporting environment variables.
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for service logger |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | Controls AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | Controls AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Requests to service |
| RESPONSE_QUEUE | "response_queue" | Responses by service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Messages that failed to process |
| STORAGE_BACKEND | "s3" | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | "root" | User for s3 storage |
| STORAGE_SECRET | "password" | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Azure storage container name |
| WRITE_CONSUMER_TOKEN | "False" | Whether to write a consumer token to a file |
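To illustrate how an exported variable takes precedence over a documented default, here is a minimal sketch. It is not pyinfra's actual configuration loader (that is `get_config` together with /config.yaml). The helper names `read_setting` and `read_flag` are hypothetical, and only a subset of the table is reproduced.

```python
import os

# Defaults copied from the table above; only a subset is shown.
DEFAULTS = {
    "RABBITMQ_HOST": "localhost",
    "RABBITMQ_PORT": "5672",
    "RABBITMQ_HEARTBEAT": "60",
    "STORAGE_BACKEND": "s3",
    "WRITE_CONSUMER_TOKEN": "False",
}


def read_setting(name, environ=None):
    """Return the exported value for `name`, or the documented default.

    Hypothetical helper for illustration; not part of pyinfra.
    """
    environ = os.environ if environ is None else environ
    return environ.get(name, DEFAULTS[name])


def read_flag(name, environ=None):
    """Interpret a string setting such as "False" as a boolean.

    Hypothetical helper; how pyinfra parses such values is an assumption.
    """
    return read_setting(name, environ).strip().lower() in {"1", "true", "yes"}


if __name__ == "__main__":
    # Shows the effective value in the current shell environment.
    print("RABBITMQ_HOST =", read_setting("RABBITMQ_HOST"))
    print("WRITE_CONSUMER_TOKEN =", read_flag("WRITE_CONSUMER_TOKEN"))
```

Running the script after `export RABBITMQ_HOST=rabbit` would print the exported value instead of the default.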
Response Format
Expected AMQP input message:
{
"dossierId": "",
"fileId": "",
"targetFileExtension": "",
"responseFileExtension": ""
}
Optionally, the input message can contain a field with the key "operations".
AMQP output message:
{
"dossierId": "",
"fileId": ""
}
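The two message shapes above can be checked with a short sketch like the following. It only mirrors the documented keys; the function names `parse_request` and `build_response` are hypothetical and are not part of pyinfra, and the values in the usage example are placeholders.

```python
import json

# Keys of the expected AMQP input message, as documented above.
REQUIRED_REQUEST_KEYS = (
    "dossierId",
    "fileId",
    "targetFileExtension",
    "responseFileExtension",
)


def parse_request(body):
    """Decode a JSON message body (str or bytes) and check the required keys."""
    message = json.loads(body)
    missing = [key for key in REQUIRED_REQUEST_KEYS if key not in message]
    if missing:
        raise ValueError(f"request is missing keys: {missing}")
    return message


def build_response(request):
    """Build the AMQP output message, which carries only the two identifiers."""
    return json.dumps(
        {"dossierId": request["dossierId"], "fileId": request["fileId"]}
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    body = json.dumps(
        {
            "dossierId": "dossier-1",
            "fileId": "file-1",
            "targetFileExtension": "pdf",
            "responseFileExtension": "json",
        }
    )
    print(build_response(parse_request(body)))
```

The optional "operations" field passes through `parse_request` untouched, since only the presence of the required keys is checked.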
Usage & API
Setup
Install project dependencies
make poetry
You don't have to install it independently in the project repo; just import pyinfra in any .py file,
or install it from another project:
poetry add git+ssh://git@git.iqser.com:2222/rr/pyinfra.git#TAG-NUMBER
API
from pyinfra.config import get_config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager
queue_manager = QueueManager(get_config())
queue_manager.start_consuming(make_payload_processor(data_processor))
The data_processor must accept either a dict or bytes (for example, a PDF) as input and must return a list of results.
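As a rough sketch of a callable that satisfies this contract, the function below accepts either a dict or bytes and always returns a list. The single positional argument and the shape of the result entries are assumptions made for illustration; check them against the pyinfra version you use.

```python
def data_processor(data):
    """Hypothetical processor: describe the payload and return a list of results."""
    if isinstance(data, (bytes, bytearray)):
        # Binary payload, e.g. the raw bytes of a PDF file.
        return [
            {
                "kind": "bytes",
                "size": len(data),
                # PDF files start with the "%PDF-" marker.
                "is_pdf": bytes(data[:5]) == b"%PDF-",
            }
        ]
    if isinstance(data, dict):
        # Structured payload, e.g. decoded JSON.
        return [{"kind": "dict", "keys": sorted(data)}]
    raise TypeError(f"unsupported payload type: {type(data).__name__}")


if __name__ == "__main__":
    print(data_processor(b"%PDF-1.7\n"))
    print(data_processor({"text": "hello"}))
```

A function like this would be passed to `make_payload_processor` in place of `data_processor` in the snippet above.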
Scripts
Run pyinfra locally
Shell 1: Start minio and rabbitmq containers
$ cd tests && docker-compose up
Shell 2: Start pyinfra with callback mock
$ python scripts/start_pyinfra.py
Shell 3: Upload dummy content on storage and publish message
$ python scripts/mock_process_request.py
Tests
The tests take longer than you may be used to, mainly because the required containers have long startup times. To speed up the test run, set 'autouse' to 'False'. In that case, run 'docker-compose up' in the tests directory manually before running the tests.