# PyInfra

## About
Common module providing the infrastructure to deploy Research Projects. The infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.
## Configuration
The default configuration is located in /config.yaml. All relevant variables can be overridden by exporting environment variables (a short sketch follows the table below).
| Environment Variable | Default | Description |
|---|---|---|
| LOGGING_LEVEL_ROOT | "DEBUG" | Logging level for service logger |
| MONITORING_ENABLED | True | Enables Prometheus monitoring |
| PROMETHEUS_METRIC_PREFIX | "redactmanager_research_service" | Prometheus metric prefix, per convention '{product_name}_{service name}' |
| PROMETHEUS_HOST | "127.0.0.1" | Prometheus webserver address |
| PROMETHEUS_PORT | 8080 | Prometheus webserver port |
| RABBITMQ_HOST | "localhost" | RabbitMQ host address |
| RABBITMQ_PORT | "5672" | RabbitMQ host port |
| RABBITMQ_USERNAME | "user" | RabbitMQ username |
| RABBITMQ_PASSWORD | "bitnami" | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 60 | Controls AMQP heartbeat timeout in seconds |
| RABBITMQ_CONNECTION_SLEEP | 5 | Controls AMQP connection sleep timer in seconds |
| REQUEST_QUEUE | "request_queue" | Requests to service |
| RESPONSE_QUEUE | "response_queue" | Responses by service |
| DEAD_LETTER_QUEUE | "dead_letter_queue" | Messages that failed to process |
| STORAGE_BACKEND | "s3" | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | "root" | User for s3 storage |
| STORAGE_SECRET | "password" | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | Azure Blob Storage container name |
| WRITE_CONSUMER_TOKEN | "False" | Whether to write a consumer token to a file |
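
As a minimal sketch, configuration values can be overridden by exporting the corresponding environment variables before the config is loaded. The assumption that exported variables take precedence over /config.yaml, and the specific keys shown, are taken from the table above; the host and endpoint values are placeholders.

```python
import os

from pyinfra import config

# Assumed behaviour: exported environment variables override the defaults in /config.yaml.
os.environ["RABBITMQ_HOST"] = "rabbitmq.internal"               # placeholder host
os.environ["STORAGE_ENDPOINT"] = "http://minio.internal:9000"   # placeholder endpoint
os.environ["STORAGE_BUCKET"] = "redaction"

pyinfra_config = config.get_config()
```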
## Message Format
Expected AMQP input message:
Either use the legacy format with dossierId and fileId as strings, or the new format where absolute file paths are used. A tenant ID can optionally be provided in the message header (key: "X-TENANT-ID").
```json
{
  "targetFilePath": "",
  "responseFilePath": ""
}
```

or

```json
{
  "dossierId": "",
  "fileId": "",
  "targetFileExtension": "",
  "responseFileExtension": ""
}
```
Optionally, the input message can contain a field with the key "operations".
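
For local experiments, a request message can be published with pika along these lines. This is only a sketch: the connection details and queue name mirror the defaults from the configuration table, the file paths are placeholders, and scripts/send_request.py remains the reference.

```python
import json

import pika

# Placeholder connection details matching the defaults from the configuration table.
connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="localhost",
        port=5672,
        credentials=pika.PlainCredentials("user", "bitnami"),
    )
)
channel = connection.channel()

# New (absolute path) format; the paths are placeholders.
message = {
    "targetFilePath": "some/target/file.pdf",
    "responseFilePath": "some/response/file.json",
    # "operations": ...,  # optional; content is service-specific
}

channel.basic_publish(
    exchange="",
    routing_key="request_queue",
    body=json.dumps(message),
    properties=pika.BasicProperties(headers={"X-TENANT-ID": "tenant-a"}),  # optional header
)
connection.close()
```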
AMQP output message:
```json
{
  "targetFilePath": "",
  "responseFilePath": ""
}
```

or

```json
{
  "dossierId": "",
  "fileId": ""
}
```
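
Correspondingly, a caller could read a result from the response queue; again just a sketch with the assumed default connection settings and queue name.

```python
import json

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="localhost",
        port=5672,
        credentials=pika.PlainCredentials("user", "bitnami"),
    )
)
channel = connection.channel()

# Blocks until one response arrives, acknowledges it, and prints the payload.
for method, properties, body in channel.consume("response_queue"):
    response = json.loads(body)
    print(response)
    channel.basic_ack(method.delivery_tag)
    break

channel.cancel()
connection.close()
```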
## Usage & API

### Setup
Add the desired version of the pyinfra package to your `pyproject.toml` file and make sure to add our GitLab registry as a source.
For now, all internal packages used by pyinfra also have to be added to the `pyproject.toml` file.
Run `poetry lock` and `poetry install` to install the packages.
```toml
[tool.poetry.dependencies]
pyinfra = { version = "1.6.0", source = "gitlab-research" }
kn-utils = { version = "0.1.4", source = "gitlab-research" }

[[tool.poetry.source]]
name = "gitlab-research"
url = "https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi/simple"
priority = "explicit"
```
### API
```python
from pyinfra import config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

# Load the configuration (defaults from /config.yaml, overridable via environment variables).
pyinfra_config = config.get_config()

# Wrap your own process_data callback so pyinfra can feed it queue payloads.
process_payload = make_payload_processor(process_data, config=pyinfra_config)

# Connect to RabbitMQ and start consuming requests.
queue_manager = QueueManager(pyinfra_config)
queue_manager.start_consuming(process_payload)
```
process_data should accept either a dict (parsed JSON) or bytes (e.g. a PDF) as input and return a list of results.
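
A hypothetical process_data callback could look like the sketch below; the shape of the returned results is an assumption and depends on what your service is expected to produce.

```python
from typing import List, Union


def process_data(payload: Union[dict, bytes]) -> List[dict]:
    """Illustrative callback only: a real implementation would run the actual analysis."""
    if isinstance(payload, bytes):
        # Binary input, e.g. a PDF pulled from storage.
        return [{"contentLength": len(payload)}]
    # JSON input already parsed into a dict.
    return [payload]
```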
## Scripts

### Run pyinfra locally
Shell 1: Start MinIO and RabbitMQ containers

```shell
$ cd tests && docker-compose up
```

Shell 2: Start pyinfra with callback mock

```shell
$ python scripts/start_pyinfra.py
```

Shell 3: Upload dummy content to storage and publish a message

```shell
$ python scripts/send_request.py
```
## Tests
Running all tests takes a bit longer than you are probably used to because, among other things, the startup times for the docker-compose-dependent tests are quite high. This is why the tests are split into two parts: the first contains all tests that do not require docker-compose, and the second contains all tests that do. By default, only the first part is executed, but when releasing a new version, all tests should be run.