Merge in RR/pyinfra from RED-6205-monitoring to master
Squashed commit of the following:
commit 529cedfd7c065a3f7364e4596b923f25f0af76b5
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date: Thu Mar 16 14:57:26 2023 +0100
Remove unnecessary default argument to dict.get
commit b718531f568e89df77cc05039e5e7afe7111b9a4
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 14:56:50 2023 +0100
refactor
commit c039b0c25a6cd2ad2a72d237d0930c484c8e427c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 13:22:17 2023 +0100
increase package version to reflect the recent changes
commit 0a983a4113f25cd692b68869e1f33ffbf7efc6f0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 13:16:39 2023 +0100
remove processing result conversion to a list, since the ner-prediction service actually returns a dictionary. The result is now expected to be sized (to perform the monitoring) and JSON-dumpable (to upload it).
commit 541bf321410471dc09a354669b2778402286c09f
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 12:48:07 2023 +0100
remove no longer needed requirements
commit cfa182985d989a5b92a9a069a603daee72f37d49
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Thu Mar 16 11:14:58 2023 +0100
refactor payload formatting
- introduce PayloadFormatter class for better typehinting and bundling
of functionality
- parametrize payload formatting so the PayloadProcesser can adapt
better to different services/products
- move file extension parsing to its own module
commit f57663b86954b7164eeb6db013d862af88ec4584
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Wed Mar 15 12:22:08 2023 +0100
refactor payload parsing
- introduce QueueMessagePayloadParser for generality
and typehinting
- refactor file extension parsing algorithm
commit 713fb4a0dddecf5442ceda3988444d9887869dcf
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 17:07:02 2023 +0100
fix tests
commit a22ecf7ae93bc0bec235fba3fd9cbf6c1778aa13
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 16:31:26 2023 +0100
refactor payload parsing
- parameterize file and compression types allowed for files to download
and upload via config
- make a real value bag out of QueueMessagePayload and do the parsing
beforehand
- refactor file extension parser to be more robust
commit 50b578d054ca47a94c907f5f8b585eca7ed626ac
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 13:21:32 2023 +0100
add monitoring
- add an optional Prometheus monitor that tracks the average processing
time of a service per relevant parameter, which at this point is defined
via the number of resulting elements.
commit de525e7fa2f846f7fde5b9a4b466039238da10cd
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date: Tue Mar 14 12:57:24 2023 +0100
fix bug where the file extension parser failed if the file endings have prefixes
# PyInfra

1. [ About ](#about)
2. [ Configuration ](#configuration)
3. [ Response Format ](#response-format)
4. [ Usage & API ](#usage--api)
5. [ Scripts ](#scripts)
6. [ Tests ](#tests)

## About

Common module with the infrastructure to deploy Research Projects.

The infrastructure expects to be deployed in the same pod / local environment as the analysis container and handles all outbound communication.

## Configuration

The configuration is located in `/config.yaml`. All relevant variables can be configured by exporting environment variables.

| Environment Variable          | Default                          | Description                                                              |
|-------------------------------|----------------------------------|--------------------------------------------------------------------------|
| LOGGING_LEVEL_ROOT            | "DEBUG"                          | Logging level for the service logger                                     |
| MONITORING_ENABLED            | True                             | Enables Prometheus monitoring                                            |
| PROMETHEUS_METRIC_PREFIX      | "redactmanager_research_service" | Prometheus metric prefix, per convention "{product_name}_{service_name}" |
| PROMETHEUS_HOST               | "127.0.0.1"                      | Prometheus webserver address                                             |
| PROMETHEUS_PORT               | 8080                             | Prometheus webserver port                                                |
| RABBITMQ_HOST                 | "localhost"                      | RabbitMQ host address                                                    |
| RABBITMQ_PORT                 | "5672"                           | RabbitMQ host port                                                       |
| RABBITMQ_USERNAME             | "user"                           | RabbitMQ username                                                        |
| RABBITMQ_PASSWORD             | "bitnami"                        | RabbitMQ password                                                        |
| RABBITMQ_HEARTBEAT            | 60                               | AMQP heartbeat timeout in seconds                                        |
| RABBITMQ_CONNECTION_SLEEP     | 5                                | AMQP connection sleep timer in seconds                                   |
| REQUEST_QUEUE                 | "request_queue"                  | Queue for requests to the service                                        |
| RESPONSE_QUEUE                | "response_queue"                 | Queue for responses by the service                                       |
| DEAD_LETTER_QUEUE             | "dead_letter_queue"              | Queue for messages that failed to process                                |
| STORAGE_BACKEND               | "s3"                             | The type of storage to use {s3, azure}                                   |
| STORAGE_BUCKET                | "redaction"                      | The bucket / container to pull files specified in queue requests from    |
| STORAGE_ENDPOINT              | "http://127.0.0.1:9000"          | Endpoint for S3 storage                                                  |
| STORAGE_KEY                   | "root"                           | User for S3 storage                                                      |
| STORAGE_SECRET                | "password"                       | Password for S3 storage                                                  |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..."   | Connection string for Azure storage                                      |
| STORAGE_AZURECONTAINERNAME    | "redaction"                      | Azure storage container name                                             |
| WRITE_CONSUMER_TOKEN          | "False"                          | Whether to write a consumer token to a file                              |

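As an illustration, selected defaults can be overridden by setting the corresponding environment variables before the service starts. A minimal sketch in Python, assuming the config loader reads these variables at startup (the concrete values are placeholders):

```python
import os

# Hypothetical overrides; any variable from the table above can be set this
# way before the service process reads /config.yaml.
os.environ["RABBITMQ_HOST"] = "rabbitmq.internal"
os.environ["RABBITMQ_PORT"] = "5672"
os.environ["STORAGE_BACKEND"] = "s3"
os.environ["STORAGE_ENDPOINT"] = "http://minio.internal:9000"
os.environ["MONITORING_ENABLED"] = "False"
```

In a container deployment the same effect is usually achieved with `env` entries in the pod spec or `-e` flags to `docker run`.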
## Response Format

### Expected AMQP input message:

```json
{
    "dossierId": "",
    "fileId": "",
    "targetFileExtension": "",
    "responseFileExtension": ""
}
```

Optionally, the input message can contain a field with the key `"operations"`.

### AMQP output message:

```json
{
    "dossierId": "",
    "fileId": ""
}
```

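For illustration, a request message matching the expected input format can be constructed and serialized like this (the concrete values are placeholders; publishing to the queue is left to the AMQP client):

```python
import json

# Placeholder values; real messages reference files in the configured storage
request = {
    "dossierId": "dossier-123",
    "fileId": "file-456",
    "targetFileExtension": ".pdf",
    "responseFileExtension": ".json",
}
body = json.dumps(request)  # the AMQP message body is the JSON string

# After processing, the service responds with the reduced message shape
expected_response_keys = {"dossierId", "fileId"}
```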
## Usage & API

### Setup

Install project dependencies

```bash
make poetry
```

Within the project repo you don't have to install it separately; just `import pyinfra` in any `.py` file.

Or install it from another project:

```bash
poetry add git+ssh://git@git.iqser.com:2222/rr/pyinfra.git#TAG-NUMBER
```

### API

```python
from pyinfra.config import get_config
from pyinfra.payload_processing.processor import make_payload_processor
from pyinfra.queue.queue_manager import QueueManager

queue_manager = QueueManager(get_config())
queue_manager.start_consuming(make_payload_processor(data_processor))
```

The `data_processor` should accept a dict or bytes (PDF) as input and return a sized, JSON-serializable result.

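A minimal `data_processor` might look like the following sketch; the payload handling and result shape here are illustrative assumptions, not part of the pyinfra API:

```python
import json

def data_processor(payload):
    """Hypothetical callback: receives a parsed dict or raw PDF bytes and
    returns a result that is sized (supports len()) and JSON-dumpable."""
    if isinstance(payload, bytes):
        # e.g. run the analysis model on the raw PDF; here we only record size
        result = {"entities": [], "numBytes": len(payload)}
    else:
        result = {"entities": [], "fields": sorted(payload)}
    json.dumps(result)  # sanity check: must be JSON-serializable for upload
    return result
```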
## Scripts

### Run pyinfra locally

**Shell 1**: Start the minio and rabbitmq containers

```bash
$ cd tests && docker-compose up
```

**Shell 2**: Start pyinfra with a callback mock

```bash
$ python scripts/start_pyinfra.py
```

**Shell 3**: Upload dummy content to storage and publish a message

```bash
$ python scripts/mock_process_request.py
```

## Tests

The tests take a bit longer than you are probably used to because, among other things, the required startup times are quite high. The test runtime can be reduced by setting `autouse` to `False`; in that case, run `docker-compose up` in the `tests` directory manually before running the tests.