PyInfra

  1. About
  2. Configuration
  3. Message Format
  4. Usage & API
  5. Scripts
  6. Tests

About

Common module with the infrastructure to deploy Research projects. The infrastructure expects to be deployed in the same pod / local environment as the analysis container and handles all outbound communication.

Configuration

The default configuration is located in /config.yaml. All relevant variables can be overridden by exporting environment variables.

| Environment Variable | Default | Description |
| --- | --- | --- |
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for the service logger |
| MONITORING_ENABLED | True | Enables Prometheus monitoring |
| PROMETHEUS_METRIC_PREFIX | "redactmanager_research_service" | Prometheus metric prefix, per convention '{product_name}_{service_name}' |
| PROMETHEUS_HOST | "127.0.0.1" | Prometheus webserver address |
| PROMETHEUS_PORT | 8080 | Prometheus webserver port |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | Controls the AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | Controls the AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Queue for requests to the service |
| RESPONSE_QUEUE | "response_queue" | Queue for responses by the service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Queue for messages that failed to process |
| STORAGE_BACKEND | "s3" | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for S3 storage |
| STORAGE_KEY | "root" | User for S3 storage |
| STORAGE_SECRET | "password" | Password for S3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Azure storage container name |
| WRITE_CONSUMER_TOKEN | "False" | Whether to write a consumer token to a file |
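
For example, a minimal sketch of overriding a few of these defaults from Python before the configuration is loaded (this assumes config.get_config() reads the environment at call time; the values shown are placeholders):

import os

# Placeholder overrides -- the variable names are taken from the table above
os.environ["RABBITMQ_HOST"] = "rabbitmq.example.internal"
os.environ["STORAGE_BACKEND"] = "azure"

from pyinfra import config

pyinfra_config = config.get_config()  # reads /config.yaml plus the exported variables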

Message Format

Expected AMQP input message:

Either use the legacy format, with dossierId and fileId as strings, or the new format, where absolute file paths are used. A tenant ID can optionally be provided in the message header (key: "X-TENANT-ID").

{
  "targetFilePath": "",
  "responseFilePath": ""
}

or

{
   "dossierId": "",
   "fileId": "",
   "targetFileExtension": "",
   "responseFileExtension": ""
}

Optionally, the input message can contain a field with the key "operations".
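
As an illustration, here is a minimal sketch of publishing a request in the new format with pika (the connection details, queue name, paths, and tenant ID are placeholders based on the defaults above):

import json

import pika

# Placeholder connection details -- see the configuration table above
connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="localhost",
        port=5672,
        credentials=pika.PlainCredentials("user", "bitnami"),
    )
)
channel = connection.channel()

message = {
    "targetFilePath": "dossier-1/file-1.pdf",    # placeholder paths
    "responseFilePath": "dossier-1/file-1.json",
}

channel.basic_publish(
    exchange="",
    routing_key="request_queue",
    body=json.dumps(message),
    # The tenant ID is optional and travels in the message headers
    properties=pika.BasicProperties(headers={"X-TENANT-ID": "tenant-a"}),
)
connection.close()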

AMQP output message:

{
  "targetFilePath": "",
  "responseFilePath": ""
}

or

{
  "dossierId": "",
  "fileId": ""
}
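
And a matching sketch for fetching a single response from the response queue (again with placeholder connection details):

import json

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="localhost",
        port=5672,
        credentials=pika.PlainCredentials("user", "bitnami"),
    )
)
channel = connection.channel()

# basic_get returns (None, None, None) when the queue is empty
method, properties, body = channel.basic_get(queue="response_queue", auto_ack=True)
if body is not None:
    print(json.loads(body))  # e.g. {"targetFilePath": ..., "responseFilePath": ...}
connection.close()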

Usage & API

Setup

Add the respective version of the pyinfra package to your pyproject.toml file. Make sure to add our GitLab registry as a source. For now, all internal packages used by pyinfra also have to be added to the pyproject.toml file. Execute poetry lock and poetry install to install the packages.

[tool.poetry.dependencies]
pyinfra = { version = "1.6.0", source = "gitlab-research" }
kn-utils = { version = "0.1.4", source = "gitlab-research" }

[[tool.poetry.source]]
name = "gitlab-research"
url = "https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi/simple"
priority = "explicit"

API

from pyinfra import config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

# Load the configuration (config.yaml defaults plus environment variable overrides)
pyinfra_config = config.get_config()

# Wrap the project-specific processing function with pyinfra's payload handling
process_payload = make_payload_processor(process_data, config=pyinfra_config)

# Connect to the queues and start consuming requests
queue_manager = QueueManager(pyinfra_config)
queue_manager.start_consuming(process_payload)

process_data should expect a dict (JSON) or bytes (PDF) as input and return a list of results.
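
A minimal sketch of such a function (the processing logic shown here is a placeholder, not the actual analysis):

def process_data(payload):
    # pyinfra passes a dict for JSON requests or bytes for PDF requests
    if isinstance(payload, (bytes, bytearray)):
        # e.g. derive results from the raw PDF bytes
        return [{"size": len(payload)}]
    # e.g. echo the parsed JSON payload back as a single result
    return [payload]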

Scripts

Run pyinfra locally

Shell 1: Start the MinIO and RabbitMQ containers

$ cd tests && docker-compose up

Shell 2: Start pyinfra with a callback mock

$ python scripts/start_pyinfra.py 

Shell 3: Upload dummy content to storage and publish a message

$ python scripts/send_request.py

Tests

Running all tests takes a bit longer than you are probably used to because, among other things, startup times are quite high for the docker-compose dependent tests. This is why the tests are split into two parts: the first contains all tests that do not require docker-compose, and the second contains all tests that do. By default, only the first part is executed, but when releasing a new version, all tests should be executed.
