# PyInfra
1. [About](#about)
2. [Configuration](#configuration)
3. [Response Format](#response-format)
4. [Usage & API](#usage--api)
5. [Scripts](#scripts)
6. [Tests](#tests)
## About
Common module with the infrastructure to deploy research projects.

The infrastructure expects to be deployed in the same pod / local environment as the analysis container and handles all outbound communication.

## Configuration

The configuration is located in `/config.yaml`. All relevant variables can be configured by exporting environment variables.
| Environment Variable          | Default                          | Description                                                                  |
|-------------------------------|----------------------------------|------------------------------------------------------------------------------|
| LOGGING_LEVEL_ROOT            | "DEBUG"                          | Logging level for the service logger                                         |
| MONITORING_ENABLED            | True                             | Enables Prometheus monitoring                                                |
| PROMETHEUS_METRIC_PREFIX      | "redactmanager_research_service" | Prometheus metric prefix, per the convention `{product_name}_{service_name}` |
| PROMETHEUS_HOST               | "127.0.0.1"                      | Prometheus webserver address                                                 |
| PROMETHEUS_PORT               | 8080                             | Prometheus webserver port                                                    |
| RABBITMQ_HOST                 | "localhost"                      | RabbitMQ host address                                                        |
| RABBITMQ_PORT                 | "5672"                           | RabbitMQ host port                                                           |
| RABBITMQ_USERNAME             | "user"                           | RabbitMQ username                                                            |
| RABBITMQ_PASSWORD             | "bitnami"                        | RabbitMQ password                                                            |
| RABBITMQ_HEARTBEAT            | 60                               | AMQP heartbeat timeout in seconds                                            |
| RABBITMQ_CONNECTION_SLEEP     | 5                                | AMQP connection sleep timer in seconds                                       |
| REQUEST_QUEUE                 | "request_queue"                  | Queue for requests to the service                                            |
| RESPONSE_QUEUE                | "response_queue"                 | Queue for responses from the service                                         |
| DEAD_LETTER_QUEUE             | "dead_letter_queue"              | Queue for messages that failed to process                                    |
| STORAGE_BACKEND               | "s3"                             | The type of storage to use (`s3` or `azure`)                                 |
| STORAGE_BUCKET                | "redaction"                      | The bucket / container to pull files specified in queue requests from        |
| STORAGE_ENDPOINT              | "http://127.0.0.1:9000"          | Endpoint for S3 storage                                                      |
| STORAGE_KEY                   | "root"                           | Access key for S3 storage                                                    |
| STORAGE_SECRET                | "password"                       | Secret key for S3 storage                                                    |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..."   | Connection string for Azure storage                                          |
| STORAGE_AZURECONTAINERNAME    | "redaction"                      | Azure Blob Storage container name                                            |
| WRITE_CONSUMER_TOKEN          | "False"                          | Whether a consumer token should be written to a file                         |
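For local development, the defaults above can be overridden by exporting the corresponding environment variables before starting the service. A minimal sketch (the values below are placeholders, not recommended settings):

```shell
# Placeholder values for a local setup -- adjust to your environment
export LOGGING_LEVEL_ROOT="INFO"
export RABBITMQ_HOST="localhost"
export RABBITMQ_PORT="5672"
export STORAGE_BACKEND="s3"
export STORAGE_ENDPOINT="http://127.0.0.1:9000"
export STORAGE_BUCKET="redaction"
```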
## Response Format
### Expected AMQP input message

Either use the legacy format, with `dossierId` and `fileId` as strings, or the new format, where absolute paths are used.
A tenant ID can optionally be provided in the message header (key: `"X-TENANT-ID"`).
```json
{
    "targetFilePath": "",
    "responseFilePath": ""
}
```
or
```json
{
    "dossierId": "",
    "fileId": "",
    "targetFileExtension": "",
    "responseFileExtension": ""
}
```
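The two request formats can be told apart by their keys. As an illustration (the helper `detect_request_format` below is hypothetical, not part of pyinfra):

```python
import json

def detect_request_format(message: dict) -> str:
    """Hypothetical helper: classify an incoming request message."""
    if "targetFilePath" in message and "responseFilePath" in message:
        return "path"  # new format with absolute paths
    if "dossierId" in message and "fileId" in message:
        return "legacy"  # legacy format with string identifiers
    raise ValueError("unknown request format")

# Example: parse a raw AMQP body and classify it
body = b'{"dossierId": "d-1", "fileId": "f-1", "targetFileExtension": ".pdf", "responseFileExtension": ".json"}'
print(detect_request_format(json.loads(body)))  # -> legacy
```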
Optionally, the input message can contain a field with the key `"operations"`.

### AMQP output message
```json
{
    "targetFilePath": "",
    "responseFilePath": ""
}
```
or
```json
{
    "dossierId": "",
    "fileId": ""
}
```
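The output mirrors the format of the request it answers. A sketch of this pairing (`build_response` is a hypothetical helper, not part of pyinfra):

```python
def build_response(request: dict, response_file_path: str = "") -> dict:
    """Hypothetical helper: answer in the same format as the request."""
    if "targetFilePath" in request:
        # New format: echo the target path and report where the response was written
        return {"targetFilePath": request["targetFilePath"],
                "responseFilePath": response_file_path}
    # Legacy format: echo the identifiers back
    return {"dossierId": request["dossierId"], "fileId": request["fileId"]}

print(build_response({"dossierId": "d-1", "fileId": "f-1"}))
```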
## Usage & API

### Setup

Add the respective version of the pyinfra package to your pyproject.toml file, and make sure to add our GitLab registry as a source.
For now, all internal packages used by pyinfra also have to be added to the pyproject.toml file.
Execute `poetry lock` and `poetry install` to install the packages.

You can look up the latest version of the package in the [GitLab registry](https://gitlab.knecon.com/knecon/research/pyinfra/-/packages).
For the versions of internal dependencies currently in use, please refer to the [pyproject.toml](pyproject.toml) file.
```toml
[tool.poetry.dependencies]
pyinfra = { version = "x.x.x", source = "gitlab-research" }
kn-utils = { version = "x.x.x", source = "gitlab-research" }

[[tool.poetry.source]]
name = "gitlab-research"
url = "https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi/simple"
priority = "explicit"
```
### API
```python
from pyinfra import config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

# Load the configuration (config.yaml, overridden by environment variables)
pyinfra_config = config.get_config()

# Wrap the project-specific `process_data` function (defined by your service)
process_payload = make_payload_processor(process_data, config=pyinfra_config)

# Consume messages from the request queue and publish the results
queue_manager = QueueManager(pyinfra_config)
queue_manager.start_consuming(process_payload)
```
`process_data` should expect a dict (JSON) or bytes (PDF) as input and return a list of results.
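A minimal sketch of a conforming `process_data`; the function body is purely illustrative, only the signature contract (dict or bytes in, list out) comes from pyinfra:

```python
def process_data(payload):
    """Accepts a dict (parsed JSON) or bytes (a PDF) and returns a list of results."""
    if isinstance(payload, dict):
        # JSON input: e.g. report the top-level keys
        return [{"type": "json", "keys": sorted(payload)}]
    if isinstance(payload, (bytes, bytearray)):
        # PDF input: e.g. report its size; a real service would parse the PDF here
        return [{"type": "pdf", "size": len(payload)}]
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")

print(process_data({"a": 1, "b": 2}))  # -> [{'type': 'json', 'keys': ['a', 'b']}]
```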
## Scripts

### Run pyinfra locally

**Shell 1**: Start the minio and rabbitmq containers

```bash
$ cd tests && docker-compose up
```

**Shell 2**: Start pyinfra with a callback mock

```bash
$ python scripts/start_pyinfra.py
```

**Shell 3**: Upload dummy content to storage and publish a message

```bash
$ python scripts/send_request.py
```
## Tests
Running all tests takes a bit longer than you are probably used to because, among other things, the startup times for docker-compose dependent tests are quite high. The tests are therefore split into two parts: the first contains all tests that do not require docker-compose, and the second contains all tests that do. By default, only the first part is executed, but when releasing a new version, all tests should be executed.