Julius Unverfehrt 793a427c50 Pull request #68 : RED-6273 multi tenant storage

Merge in RR/pyinfra from RED-6273-multi-tenant-storage to master

Squashed commit of the following:

commit 0fead1f8b59c9187330879b4e48d48355885c27c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 15:02:22 2023 +0200

    fix typos

commit 892a803726946876f8b8cd7905a0e73c419b2fb1
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Tue Mar 28 14:41:49 2023 +0200

    Refactoring

    Replace custom storage caching logic with LRU decorator

commit eafcd90260731e3360ce960571f07dee8f521327
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 24 12:50:13 2023 +0100

    fix bug in storage connection from endpoint

commit d0c9fb5b7d1c55ae2f90e8faa1efec9f7587c26a
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 24 11:49:34 2023 +0100

    add logs to PayloadProcessor

    - set log messages to determine if x-tenant
    storage connection is working

commit 97309fe58037b90469cf7a3de342d4749a0edfde
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 24 10:41:59 2023 +0100

    update PayloadProcessor

    - introduce storage cache to make every unique
    storage connection only once
    - add functionality to pass optional processing
    kwargs in queue message like the operation key to
    the processing function

commit d48e8108fdc0d463c89aaa0d672061ab7dca83a0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Mar 22 13:34:43 2023 +0100

    add multi-tenant storage connection 1st iteration

    - forward x-tenant-id from queue message header to
    payload processor
    - add functions to receive storage infos from an
    endpoint or the config. This enables hashing and
    caching of connections created from these infos
    - add function to initialize storage connections
    from storage infos
    - streamline and refactor tests to make them more
    readable and robust and to make it easier to add
     new tests
    - update payload processor with first iteration
    of multi tenancy storage connection support
    with connection caching and backwards compability

commit 52c047c47b98e62d0b834a9b9b6c0e2bb0db41e5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 21 15:35:57 2023 +0100

    add AES/GCM cipher functions

    - decrypt x-tenant storage connection strings

2023-03-28 15:04:14 +02:00

bamboo-specs

Pull request #64 : update java version for sonar-scan

2023-03-14 08:39:41 +01:00

pyinfra

Pull request #68 : RED-6273 multi tenant storage

2023-03-28 15:04:14 +02:00

scripts

Pull request #68 : RED-6273 multi tenant storage

2023-03-28 15:04:14 +02:00

tests

Pull request #68 : RED-6273 multi tenant storage

2023-03-28 15:04:14 +02:00

.coveragerc

Pull request #30 : Multiple consumers fix

2022-04-21 17:45:14 +02:00

.gitignore

ignore bamboo YAML configs

2022-11-15 09:00:46 +01:00

.pre-commit-config.yaml

Pull request #57 : Bugfix/RED-5277 investigate missing heartbeat error

2023-02-15 16:02:17 +01:00

.python-version

convert into python package

2022-11-03 16:10:12 +01:00

Makefile

Pull request #57 : Bugfix/RED-5277 investigate missing heartbeat error

2023-02-15 16:02:17 +01:00

poetry.lock

Pull request #68 : RED-6273 multi tenant storage

2023-03-28 15:04:14 +02:00

pyproject.toml

Pull request #68 : RED-6273 multi tenant storage

2023-03-28 15:04:14 +02:00

README.md

Pull request #65 : RED-6205 monitoring

2023-03-16 16:08:44 +01:00

requirements.txt

Pull request #57 : Bugfix/RED-5277 investigate missing heartbeat error

2023-02-15 16:02:17 +01:00

sonar-project.properties

add sonar config

2022-11-10 15:25:44 +01:00

README.md

PyInfra

About
Configuration
Response Format
Usage & API
Scripts
Tests

About

Common module that provides the infrastructure for deploying research projects. The infrastructure expects to be deployed in the same pod or local environment as the analysis container and handles all outbound communication.

Configuration

The configuration is located in /config.yaml. All relevant variables can be set by exporting environment variables.

Environment Variable	Default	Description
LOGGING_LEVEL_ROOT	"DEBUG"	Logging level for service logger
MONITORING_ENABLED	True	Enables Prometheus monitoring
PROMETHEUS_METRIC_PREFIX	"redactmanager_research_service"	Prometheus metric prefix; by convention '{product_name}_{service_name}'
PROMETHEUS_HOST	"127.0.0.1"	Prometheus webserver address
PROMETHEUS_PORT	8080	Prometheus webserver port
RABBITMQ_HOST	"localhost"	RabbitMQ host address
RABBITMQ_PORT	"5672"	RabbitMQ host port
RABBITMQ_USERNAME	"user"	RabbitMQ username
RABBITMQ_PASSWORD	"bitnami"	RabbitMQ password
RABBITMQ_HEARTBEAT	60	Controls AMQP heartbeat timeout in seconds
RABBITMQ_CONNECTION_SLEEP	5	Controls AMQP connection sleep timer in seconds
REQUEST_QUEUE	"request_queue"	Requests to service
RESPONSE_QUEUE	"response_queue"	Responses by service
DEAD_LETTER_QUEUE	"dead_letter_queue"	Messages that failed to process
STORAGE_BACKEND	"s3"	The type of storage to use {s3, azure}
STORAGE_BUCKET	"redaction"	The bucket / container to pull files specified in queue requests from
STORAGE_ENDPOINT	"http://127.0.0.1:9000"	Endpoint for s3 storage
STORAGE_KEY	"root"	User for s3 storage
STORAGE_SECRET	"password"	Password for s3 storage
STORAGE_AZURECONNECTIONSTRING	"DefaultEndpointsProtocol=..."	Connection string for Azure storage
STORAGE_AZURECONTAINERNAME	"redaction"	AKS container
WRITE_CONSUMER_TOKEN	"False"	Whether to write a consumer token to a file
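
As an illustration of how an exported variable takes precedence over a default from the table, here is a minimal sketch. It does not reproduce pyinfra's own configuration loading; the helper `read_setting` and the `DEFAULTS` dict are hypothetical and mirror only a few rows of the table.

```python
import os

# Hypothetical subset of the defaults listed in the table above.
DEFAULTS = {
    "RABBITMQ_HOST": "localhost",
    "RABBITMQ_PORT": "5672",
    "RABBITMQ_HEARTBEAT": "60",
    "STORAGE_BACKEND": "s3",
}


def read_setting(name, environ=None):
    """Return the exported value for `name`, falling back to the default."""
    environ = os.environ if environ is None else environ
    return environ.get(name, DEFAULTS[name])


# Simulate `export RABBITMQ_HOST=rabbitmq.internal` without touching
# the real process environment.
env = {"RABBITMQ_HOST": "rabbitmq.internal"}
print(read_setting("RABBITMQ_HOST", env))       # exported value wins
print(read_setting("RABBITMQ_HEARTBEAT", env))  # falls back to the default
```

Values read from the environment are always strings, so numeric settings such as the port or heartbeat would still need to be converted by the consumer.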

Response Format

Expected AMQP input message:

{
   "dossierId": "",
   "fileId": "",
   "targetFileExtension": "",
   "responseFileExtension": ""
}

Optionally, the input message can contain a field with the key "operations".

AMQP output message:

{
  "dossierId": "",
  "fileId": ""
}
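
The relationship between the two messages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helpers `parse_request` and `build_response` are hypothetical, the sample values are placeholders, and treating all four input keys as required is an assumption rather than documented behaviour.

```python
import json

# Keys shown in the expected input message above.
REQUIRED_KEYS = ("dossierId", "fileId", "targetFileExtension", "responseFileExtension")


def parse_request(body):
    """Decode a message body and check that the expected keys are present."""
    message = json.loads(body)
    missing = [key for key in REQUIRED_KEYS if key not in message]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return message


def build_response(request):
    """Keep only the identifiers that the output message carries."""
    return {"dossierId": request["dossierId"], "fileId": request["fileId"]}


body = json.dumps({
    "dossierId": "dossier-1",
    "fileId": "file-1",
    "targetFileExtension": "pdf",
    "responseFileExtension": "json",
    "operations": ["example"],  # optional key; the value shape is an assumption
}).encode()

print(build_response(parse_request(body)))
```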

Usage & API

Setup

Install project dependencies

 make poetry

You don't have to install it independently in the project repo; just import pyinfra in any .py file,

or install it from another project:

poetry add git+ssh://git@git.iqser.com:2222/rr/pyinfra.git#TAG-NUMBER

API

from pyinfra.config import get_config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

queue_manager = QueueManager(get_config())
queue_manager.start_consuming(make_payload_processor(data_processor))

The data_processor should expect a dict or bytes (pdf) as input and should return a list of results.
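
A minimal sketch of a callable with that shape is shown below. The accepted input types and the list return value come from the description above; the `**kwargs` parameter is an assumption (the project history mentions optional processing kwargs passed from the queue message), and the result fields are invented for illustration.

```python
def data_processor(data, **kwargs):
    """Accept a dict or raw bytes (e.g. a PDF) and return a list of results."""
    if isinstance(data, dict):
        # Structured input: report which keys were received.
        return [{"type": "dict", "keys": sorted(data)}]
    if isinstance(data, (bytes, bytearray)):
        # Binary input: report the size and whether it starts with the PDF magic bytes.
        return [{"type": "bytes", "size": len(data), "is_pdf": bytes(data[:5]) == b"%PDF-"}]
    raise TypeError(f"unsupported input type: {type(data).__name__}")


print(data_processor({"b": 1, "a": 2}))
print(data_processor(b"%PDF-1.7 ..."))
```

A function like this could then be passed to make_payload_processor as in the snippet above.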

Scripts

Run pyinfra locally

Shell 1: Start minio and rabbitmq containers

$ cd tests && docker-compose up

Shell 2: Start pyinfra with callback mock

$ python scripts/start_pyinfra.py

Shell 3: Upload dummy content on storage and publish message

$ python scripts/mock_process_request.py

Tests

The tests take longer than you are probably used to, partly because the required startup times are quite high. The test runtime can be reduced by setting 'autouse' to 'False'. In that case, run 'docker-compose up' manually in the tests directory before running the tests.

Releases 36

Release 4.1.0 Latest

2025-01-22 12:38:26 +01:00
