PyInfra

  1. About
  2. Configuration
  3. Response Format
  4. Usage & API
  5. Scripts
  6. Tests

About

PyInfra is a common module that provides the infrastructure for deploying research projects. It expects to be deployed in the same Pod or local environment as the analysis container and handles all outbound communication.

Configuration

The default configuration is located in /config.yaml. Every variable listed below can be overridden by exporting an environment variable of the same name.

| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for the service logger |
| MONITORING_ENABLED | True | Enables Prometheus monitoring |
| PROMETHEUS_METRIC_PREFIX | "redactmanager_research_service" | Prometheus metric prefix, per convention '{product_name}_{service name}' |
| PROMETHEUS_HOST | "127.0.0.1" | Prometheus webserver address |
| PROMETHEUS_PORT | 8080 | Prometheus webserver port |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Queue for requests to the service |
| RESPONSE_QUEUE | "response_queue" | Queue for responses from the service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Queue for messages that failed to process |
| STORAGE_BACKEND | "s3" | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container from which files specified in queue requests are pulled |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | "root" | User for s3 storage |
| STORAGE_SECRET | "password" | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Azure storage container name |
| WRITE_CONSUMER_TOKEN | "False" | Whether to write a consumer token to a file |
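
As a rough illustration of the override behaviour described above, the sketch below resolves a setting from the environment and falls back to a default. It is not pyinfra's actual loader: `resolve_setting` and the `DEFAULTS` dict are hypothetical names, only a few of the documented variables are included, and the type-casting rules are an assumption.

```python
import os

# Hypothetical subset of the documented defaults (not pyinfra's real config object).
DEFAULTS = {
    "RABBITMQ_HOST": "localhost",
    "RABBITMQ_PORT": "5672",
    "RABBITMQ_HEARTBEAT": 60,
    "MONITORING_ENABLED": True,
}


def resolve_setting(name, environ=None):
    """Return the environment value for `name`, cast to the type of its default."""
    environ = os.environ if environ is None else environ
    default = DEFAULTS[name]
    raw = environ.get(name)
    if raw is None:
        return default
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(default, bool):
        return raw.strip().lower() in {"1", "true", "yes"}
    if isinstance(default, int):
        return int(raw)
    return raw
```

Passing a plain dict as `environ` keeps the helper easy to try out without touching the real process environment.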

Response Format

Expected AMQP input message:

{
   "dossierId": "",
   "fileId": "",
   "targetFileExtension": "",
   "responseFileExtension": ""
}

Optionally, the input message can contain a field with the key "operations".

AMQP output message:

{
  "dossierId": "",
  "fileId": ""
}
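
To make the two message shapes concrete, here is a minimal sketch that builds a request body and derives the matching response. The helper names `build_request` and `build_response` and all example values are hypothetical. The sketch assumes the response simply echoes "dossierId" and "fileId", and it treats the value of "operations" as opaque because its structure is not specified here.

```python
import json


def build_request(dossier_id, file_id, target_ext, response_ext, operations=None):
    """Serialize an input message with the four documented keys."""
    message = {
        "dossierId": dossier_id,
        "fileId": file_id,
        "targetFileExtension": target_ext,
        "responseFileExtension": response_ext,
    }
    # "operations" is optional; its expected structure is an assumption.
    if operations is not None:
        message["operations"] = operations
    return json.dumps(message)


def build_response(request_body):
    """Derive an output message that carries over the two identifying keys."""
    request = json.loads(request_body)
    return json.dumps(
        {"dossierId": request["dossierId"], "fileId": request["fileId"]}
    )
```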

Usage & API

Setup

Install project dependencies

 make poetry

Inside the project repo, pyinfra does not need to be installed separately; just import pyinfra in any .py file.

To install it from another project:

poetry add git+ssh://git@git.iqser.com:2222/rr/pyinfra.git#TAG-NUMBER

API

from pyinfra.config import get_config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

queue_manager = QueueManager(get_config())
queue_manager.start_consuming(make_payload_processor(data_processor))

The data_processor should expect a dict or bytes (pdf) as input and should return a list of results.
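
A data_processor matching this contract might look like the sketch below. The body is purely illustrative: the result keys and the PDF header check are assumptions, and the exact way pyinfra calls the function (for example, whether it passes extra keyword arguments) is not covered here.

```python
def data_processor(data):
    """Accept a dict or raw bytes (e.g. a PDF) and return a list of results."""
    if isinstance(data, bytes):
        # PDF files start with the magic bytes "%PDF-".
        return [
            {
                "type": "bytes",
                "size": len(data),
                "looks_like_pdf": data.startswith(b"%PDF-"),
            }
        ]
    if isinstance(data, dict):
        # Report the top-level keys in a stable order.
        return [{"type": "dict", "keys": sorted(data)}]
    raise TypeError(f"unsupported input type: {type(data).__name__}")
```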

Scripts

Run pyinfra locally

Shell 1: Start the MinIO and RabbitMQ containers

$ cd tests && docker-compose up

Shell 2: Start pyinfra with callback mock

$ python scripts/start_pyinfra.py 

Shell 3: Upload dummy content to storage and publish a message

$ python scripts/mock_process_request.py

Tests

The tests run longer than you are probably used to, partly because the required startup times are quite high. To speed them up, set 'autouse' to 'False' and run 'docker-compose up' in the tests directory manually before running the tests.
