Merge in RR/pyinfra from bugfix/RED-6205-prometheus-port to master

Squashed commit of the following:

commit e97d81bebfe34c24d8da4e4392ff7dbd3638e685
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 21 15:48:04 2023 +0100

    increase package version

commit c7e181a462e275c5f2cbf1e6df4c88dfefbe36b7
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 21 15:43:46 2023 +0100

    fix prometheus address

    - change loopback address to all available network interfaces to enable
      external metric scraping
    - disable ENV input for prometheus address and port since they should
      not be set in HELM
# PyInfra

## About

Common module providing the infrastructure to deploy Research Projects. The infrastructure expects to be deployed in the same pod / local environment as the analysis container and handles all outbound communication.

## Configuration

The configuration is located in /config.yaml. All relevant variables can be overridden by exporting environment variables (see the sketch after the table).
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for service logger |
| MONITORING_ENABLED | True | Enables Prometheus monitoring |
| PROMETHEUS_METRIC_PREFIX | "redactmanager_research_service" | Prometheus metric prefix, per convention '{product_name}_{service_name}' |
| PROMETHEUS_HOST | "127.0.0.1" | Prometheus webserver address |
| PROMETHEUS_PORT | 8080 | Prometheus webserver port |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | Controls AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | Controls AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Requests to service |
| RESPONSE_QUEUE | "response_queue" | Responses by service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Messages that failed to process |
| STORAGE_BACKEND | "s3" | Storage backend type, one of {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | "root" | User for s3 storage |
| STORAGE_SECRET | "password" | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Azure storage container name |
| WRITE_CONSUMER_TOKEN | "False" | Whether to write a consumer token to a file |
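The defaults above can be overridden before the config is loaded. A minimal sketch, assuming get_config (from the API section below) reads these variables from the process environment; the override values are hypothetical:

    import os

    # Hypothetical overrides; must be set before get_config() is called
    os.environ["RABBITMQ_HOST"] = "rabbitmq.internal"
    os.environ["STORAGE_BACKEND"] = "azure"

    from pyinfra.config import get_config

    config = get_config()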
## Message Format

Expected AMQP input message:

    {
        "dossierId": "",
        "fileId": "",
        "targetFileExtension": "",
        "responseFileExtension": ""
    }

Optionally, the input message can contain a field with the key "operations" (included in the publishing sketch below).

AMQP output message:

    {
        "dossierId": "",
        "fileId": ""
    }
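For illustration, a request of this shape can be published with plain pika. The connection values mirror the defaults from the configuration table; the queue name and field values are assumptions, not part of the pyinfra API:

    import json
    import pika

    # Connect using the default credentials from the configuration table
    params = pika.ConnectionParameters(
        host="localhost",
        port=5672,
        credentials=pika.PlainCredentials("user", "bitnami"),
    )
    connection = pika.BlockingConnection(params)
    channel = connection.channel()

    # Request message with the optional "operations" field included
    message = {
        "dossierId": "dossier-1",
        "fileId": "file-1",
        "targetFileExtension": "pdf",
        "responseFileExtension": "json",
        "operations": [],  # optional; shape of entries is project-specific
    }
    channel.basic_publish(
        exchange="",
        routing_key="request_queue",
        body=json.dumps(message).encode("utf-8"),
    )
    connection.close()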
## Usage & API

### Setup

Install the project dependencies:

    make poetry

Inside the project repo you don't need to install it separately; just import pyinfra in any .py file. To use it from another project, install it with:

    poetry add git+ssh://git@git.iqser.com:2222/rr/pyinfra.git#TAG-NUMBER
### API

    from pyinfra.config import get_config
    from pyinfra.payload_processing.processor import make_payload_processor
    from pyinfra.queue.queue_manager import QueueManager

    # Build the queue manager from the environment-driven config
    queue_manager = QueueManager(get_config())

    # Wrap the domain callback and start consuming request messages
    queue_manager.start_consuming(make_payload_processor(data_processor))

The data_processor should expect a dict or bytes (PDF) as input and should return a list of results; a minimal sketch follows.
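This sketch only illustrates the expected signature; the result fields are hypothetical:

    from typing import Union

    def data_processor(payload: Union[dict, bytes]) -> list:
        # Raw PDF bytes pulled from storage
        if isinstance(payload, bytes):
            return [{"size": len(payload)}]
        # Parsed request message
        return [{"dossierId": payload.get("dossierId")}]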
## Scripts

### Run pyinfra locally

Shell 1: Start the minio and rabbitmq containers

    $ cd tests && docker-compose up

Shell 2: Start pyinfra with a callback mock

    $ python scripts/start_pyinfra.py

Shell 3: Upload dummy content to storage and publish a message

    $ python scripts/mock_process_request.py
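For reference, a plausible skeleton for a script like scripts/start_pyinfra.py, wiring the API above to a mock callback; the echo behaviour is an assumption, not the actual script:

    from pyinfra.config import get_config
    from pyinfra.payload_processing.processor import make_payload_processor
    from pyinfra.queue.queue_manager import QueueManager

    def mock_callback(payload):
        # Echo the payload back as a single result (hypothetical behaviour)
        return [payload]

    if __name__ == "__main__":
        queue_manager = QueueManager(get_config())
        queue_manager.start_consuming(make_payload_processor(mock_callback))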
## Tests

The tests take longer than you are probably used to because, among other things, the container startup times are high. The test runtime can be reduced by setting 'autouse' to 'False'; in that case, run 'docker-compose up' in the tests dir manually before running the tests.
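The 'autouse' flag refers to a pytest fixture. A minimal sketch of how such a docker-compose fixture is commonly wired; the fixture name and compose invocation are assumptions about this repo:

    import subprocess
    import pytest

    @pytest.fixture(scope="session", autouse=True)  # set autouse=False to start containers manually
    def docker_services():
        # Bring up minio and rabbitmq for the whole test session
        subprocess.run(["docker-compose", "up", "-d"], cwd="tests", check=True)
        yield
        subprocess.run(["docker-compose", "down"], cwd="tests", check=True)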