pyinfra/README.md
Francisco Schulz 7a740403bb update
2022-10-13 14:04:00 +02:00

107 lines
4.5 KiB
Markdown
Executable File

# Infrastructure to deploy Research Projects
The Infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.
## Configuration
A configuration is located in `/config.yaml`. All relevant variables can be configured via exporting environment variables.
| Environment Variable | Default | Description |
| ----------------------------- | ------------------------------ | --------------------------------------------------------------------- |
| LOGGING_LEVEL_ROOT | DEBUG | Logging level for service logger |
| PROBING_WEBSERVER_HOST | "0.0.0.0" | Probe webserver address |
| PROBING_WEBSERVER_PORT | 8080 | Probe webserver port |
| PROBING_WEBSERVER_MODE | production | Webserver mode: {development, production} |
| RABBITMQ_HOST | localhost | RabbitMQ host address |
| RABBITMQ_PORT | 5672 | RabbitMQ host port |
| RABBITMQ_USERNAME | user | RabbitMQ username |
| RABBITMQ_PASSWORD | bitnami | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 7200 | Controls AMQP heartbeat timeout in seconds |
| REQUEST_QUEUE | request_queue | Requests to service |
| RESPONSE_QUEUE | response_queue | Responses by service |
| DEAD_LETTER_QUEUE | dead_letter_queue | Messages that failed to process |
| ANALYSIS_ENDPOINT | "http://127.0.0.1:5000" | Endpoint for analysis container |
| STORAGE_BACKEND | s3 | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "redaction" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | root | User for s3 storage |
| STORAGE_SECRET | password | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| STORAGE_AZURECONTAINERNAME | "redaction" | AKS container |
## Response Format
### Expected AMQP input message:
```json
{
"dossierId": "",
"fileId": "",
"targetFileExtension": "",
"responseFileExtension": ""
}
```
Optionally, the input message can contain a field with the key `"operations"`.
### AMQP output message:
```json
{
"dossierId": "",
"fileId": "",
...
}
```
## Development
Either run `src/serve.py` or the built Docker image.
### Setup
Install module.
```bash
pip install -e .
pip install -r requirements.txt
```
or build docker image.
```bash
docker build -f Dockerfile -t pyinfra .
```
### Usage
**Shell 1:** Start a MinIO and a RabbitMQ docker container.
```bash
docker-compose up
```
**Shell 2:** Add files to the local minio storage.
```bash
python scripts/manage_minio.py add <MinIO target folder> -d path/to/a/folder/with/PDFs
```
**Shell 2:** Run pyinfra-server.
```bash
python src/serve.py
```
or as container:
```bash
docker run --net=host pyinfra
```
**Shell 3:** Run analysis-container.
**Shell 4:** Start a client that sends requests to process PDFs from the MinIO store and annotates these PDFs according to the service responses.
```bash
python scripts/mock_client.py
```