Compare commits

315 Commits

Author SHA1 Message Date
Julius Unverfehrt
3ef4246d1e chore: fuzzy pin kn-utils to allow for future updates 2025-01-22 12:36:38 +01:00
Julius Unverfehrt
841c492639 Merge branch 'chore/RES-871-update-callback' into 'master'
feat: BREAKING CHANGE: download callback now forwards all files as bytes

See merge request knecon/research/pyinfra!108
2025-01-16 11:11:59 +01:00
Julius Unverfehrt
ead069d3a7 chore: adjust docstrings 2025-01-16 10:35:06 +01:00
Julius Unverfehrt
044ea6cf0a feat: streamline download to always include the filename of the downloaded file 2025-01-16 10:29:50 +01:00
Julius Unverfehrt
ff7547e2c6 fix: remove faulty import 2025-01-16 10:29:50 +01:00
Julius Unverfehrt
fbf79ef758 chore: regenerate BOM 2025-01-16 10:29:50 +01:00
Julius Unverfehrt
f382887d40 chore: seek and destroy proto in code 2025-01-16 10:29:50 +01:00
Julius Unverfehrt
5c4400aa8b feat: BREAKING CHANGE: download callback now forwards all files as bytes 2025-01-16 10:29:46 +01:00
Jonathan Kössler
5ce66f18a0 Merge branch 'bugfix/RED-10722' into 'master'
fix: dlq init

See merge request knecon/research/pyinfra!109
2025-01-15 10:56:12 +01:00
Jonathan Kössler
ea0c55930a chore: remove test nack 2025-01-15 10:00:50 +01:00
Jonathan Kössler
87f57e2244 fix: dlq init 2025-01-14 16:39:47 +01:00
Jonathan Kössler
3fb8c4e641 fix: do not use groups for packages 2024-12-18 16:33:35 +01:00
Jonathan Kössler
e23f63acf0 Merge branch 'chore/nexus-package-registry' into 'master'
RES-914: move package registry to nexus

See merge request knecon/research/pyinfra!106
2024-11-20 10:02:52 +01:00
Jonathan Kössler
d3fecc518e chore: move integration tests to own subfolder 2024-11-18 17:31:15 +01:00
Jonathan Kössler
341500d463 chore: set lower bound for opentelemetry dependencies 2024-11-18 17:28:11 +01:00
Jonathan Kössler
e002f77fd5 Revert "chore: update opentelemetry for proto v5 support"
This reverts commit 3c6d8f2dcc73b17f329f9cecb8d4d301f848dc1e.
2024-11-18 17:19:37 +01:00
Jonathan Kössler
3c6d8f2dcc chore: update opentelemetry for proto v5 support 2024-11-18 15:14:34 +01:00
Jonathan Kössler
f6d6ba40bb chore: add pytest-cov 2024-11-18 13:57:39 +01:00
Jonathan Kössler
6a0bbad108 ops: update CI 2024-11-18 13:53:11 +01:00
Jonathan Kössler
527a671a75 feat: move package registry to nexus 2024-11-18 13:49:48 +01:00
Jonathan Kössler
cf91189728 Merge branch 'feature/RED-10441' into 'master'
RED-10441: separate queue and webserver shutdown

See merge request knecon/research/pyinfra!105
2024-11-13 17:17:13 +01:00
Jonathan Kössler
61a6d0eeed feat: separate queue and webserver shutdown 2024-11-13 17:02:21 +01:00
Jonathan Kössler
bc0b355ff9 Merge branch 'feature/RED-10441' into 'master'
RED-10441: ensure queue manager shutdown

See merge request knecon/research/pyinfra!104
2024-11-13 16:34:25 +01:00
Jonathan Kössler
235e27b74c chore: bump version 2024-11-13 16:31:48 +01:00
Jonathan Kössler
1540c2894e feat: ensure shutdown of queue manager 2024-11-13 16:30:18 +01:00
Jonathan Kössler
9b60594ce1 Merge branch 'feature/RED-10441' into 'master'
RED-10441: Fix graceful shutdown

See merge request knecon/research/pyinfra!103
2024-11-13 14:48:34 +01:00
Jonathan Kössler
3d3c76b466 chore: bump version 2024-11-13 13:55:15 +01:00
Jonathan Kössler
9d4ec84b49 fix: use signals for graceful shutdown 2024-11-13 13:54:41 +01:00
Jonathan Kössler
8891249d7a Merge branch 'feature/RED-10441' into 'master'
RED-10441: fix abandoned queues

See merge request knecon/research/pyinfra!102
2024-11-13 09:35:36 +01:00
Jonathan Kössler
e51e5c33eb chore: cleanup 2024-11-12 17:24:57 +01:00
Jonathan Kössler
04c90533b6 refactor: fetch active tenants before start 2024-11-12 17:11:33 +01:00
Jonathan Kössler
86af05c12c feat: add logger to retry 2024-11-12 16:50:23 +01:00
Jonathan Kössler
c6e336cb35 refactor: tenant queues init 2024-11-12 15:55:11 +01:00
Jonathan Kössler
bf6f95f3e0 feat: exit on ClientResponseError 2024-11-12 15:32:11 +01:00
Jonathan Kössler
ed2bd1ec86 refactor: raise error if tenant service is not available 2024-11-12 13:30:21 +01:00
Julius Unverfehrt
9906f68e0a chore: bump versions to enable package rebuild (current package has the wrong hash due to backup issues) 2024-11-11 12:47:27 +01:00
Julius Unverfehrt
0af648d66c fix: rebuild since mia and update rebuild kn_utils 2024-11-08 13:52:08 +01:00
Jonathan Kössler
46dc1fdce4 Merge branch 'feature/RES-809' into 'master'
RES-809: update kn_utils

See merge request knecon/research/pyinfra!101
2024-10-23 18:01:25 +02:00
Jonathan Kössler
bd2f0b9b9a feat: switch out tenacity retry with kn_utils 2024-10-23 16:06:06 +02:00
Jonathan Kössler
131afd7d3e chore: update kn_utils 2024-10-23 16:04:08 +02:00
Jonathan Kössler
98532c60ed Merge branch 'feature/RES-858-fix-graceful-shutdown' into 'master'
RES-858: fix graceful shutdown for unexpected broker disconnects

See merge request knecon/research/pyinfra!100
2024-09-30 09:54:25 +02:00
Jonathan Kössler
45377ba172 feat: improve on close callback and simplify exception handling 2024-09-27 17:11:10 +02:00
Jonathan Kössler
f855224e29 feat: add on close callback 2024-09-27 10:00:41 +02:00
Jonathan Kössler
541219177f feat: add error handling to shutdown logic 2024-09-26 12:28:55 +02:00
Jonathan Kössler
4119a7d7d7 chore: bump version 2024-09-26 11:05:12 +02:00
Jonathan Kössler
e2edfa7260 fix: simplify webserver shutdown 2024-09-26 10:33:05 +02:00
Jonathan Kössler
b70b16c541 Merge branch 'feature/RES-856-test-proto-format' into 'master'
RES-856: add type tests for proto format

See merge request knecon/research/pyinfra!99
2024-09-26 10:07:29 +02:00
Jonathan Kössler
e8d9326e48 chore: rewrite lock and bump version 2024-09-26 09:45:42 +02:00
Jonathan Kössler
9669152e14 Merge branch 'master' into feature/RES-856-test-proto-format 2024-09-26 09:39:28 +02:00
Jonathan Kössler
ed3f8088e1 Merge branch 'feature/RES-844-fix-tracing' into 'master'
RES-844: fix opentelemetry tracing

See merge request knecon/research/pyinfra!98
2024-09-26 09:13:52 +02:00
Jonathan Kössler
66eaa9a748 feat: set range for protobuf version 2024-09-25 14:16:40 +02:00
Jonathan Kössler
3a04359320 chore: bump pyinfra version 2024-09-25 11:59:52 +02:00
Jonathan Kössler
b46fcbd977 feat: add AioPikaInstrumentor 2024-09-25 11:58:51 +02:00
Jonathan Kössler
e75df42bec feat: skip keys in int conversion 2024-09-25 11:07:20 +02:00
Jonathan Kössler
3bab86fe83 chore: update test files 2024-09-24 11:59:08 +02:00
Jonathan Kössler
c5d53b8665 feat: add file comparison 2024-09-24 11:57:33 +02:00
Jonathan Kössler
09d39930e7 chore: cleanup test 2024-09-23 16:43:59 +02:00
Jonathan Kössler
a81f1bf31a chore: update protobuf to 25.5 2024-09-23 16:41:57 +02:00
Francisco Schulz
0783e95d22 Merge branch 'RED-10017-investigate-crashing-py-services-when-upload-large-number-of-files' into 'master'
fix: add semaphore to AsyncQueueManager to limit concurrent tasks

See merge request knecon/research/pyinfra!97
2024-09-23 15:19:40 +02:00
Francisco Schulz
8ec13502a9 fix: add semaphore to AsyncQueueManager to limit concurrent tasks 2024-09-23 15:19:40 +02:00
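The fix above limits concurrent tasks with a semaphore. The general technique can be sketched as follows. This is a minimal illustration using asyncio.Semaphore, not the actual AsyncQueueManager code; run_limited, fake_process and max_concurrent are hypothetical names.

```python
import asyncio


async def run_limited(payloads, handler, max_concurrent=2):
    """Run handler over payloads with at most max_concurrent tasks in flight.

    Hypothetical helper: it only illustrates the semaphore pattern and also
    records the peak number of simultaneously running handlers.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(payload):
        nonlocal in_flight, peak
        # Tasks beyond the limit wait here until a slot is released.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await handler(payload)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(p) for p in payloads))
    return list(results), peak


async def fake_process(payload):
    # Stand-in for real message processing (assumption).
    await asyncio.sleep(0.01)
    return payload * 2
```

A caller would run it with asyncio.run(run_limited(messages, fake_process)). All tasks are created up front, but only max_concurrent of them execute the handler at any time.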
Jonathan Kössler
43881de526 feat: add tests for types of documentreader 2024-09-20 16:42:55 +02:00
Julius Unverfehrt
67c30a5620 fix: recompile proto schemas with experimental schema update 2024-09-20 15:23:13 +02:00
Francisco Schulz
8e21b2144c Merge branch 'fix-poetry-version' into 'master'
chore: update package version

See merge request knecon/research/pyinfra!96
2024-09-02 16:56:58 +02:00
francisco.schulz
5b45cae9a0 chore: update package version 2024-09-02 10:53:09 -04:00
Francisco Schulz
f2a5a2ea0e Merge branch 'custom-build-image-classification-service-protobuf' into 'master'
fix(temp): set protobuf version range to >=v3,<v4 so image-classification model keeps working

See merge request knecon/research/pyinfra!95
2024-09-02 16:48:56 +02:00
francisco.schulz
2133933d25 chore: update dependencies 2024-08-30 08:42:19 -04:00
francisco.schulz
4c8dc6ccc0 fix(temp): set protobuf version range to >=v3,<v4 so image-classification model keeps working 2024-08-30 08:37:31 -04:00
Julius Unverfehrt
5f31e2b15f Merge branch 'RES-842-pyinfra-fix-rabbit-mq-handler-shuts-down-when-queues-not-available-yet' into 'master'
fix(queuemanager): add retries to prevent container from shutting down when queues are not available yet

See merge request knecon/research/pyinfra!94
2024-08-30 13:59:02 +02:00
francisco.schulz
88aef57c5f chore: version increase 2024-08-29 11:18:36 -04:00
francisco.schulz
2b129b35f4 fix(queuemanager): add retries to prevent container from shutting down when queues are not available yet 2024-08-29 11:17:11 -04:00
Jonathan Kössler
facb9726f9 Merge branch 'feature/RES-840-add-client-connector-error' into 'master'
feat: add ClientConnectorError

See merge request knecon/research/pyinfra!93
2024-08-28 14:39:40 +02:00
Jonathan Kössler
b6a2069a6a feat: add ClientConnectorError 2024-08-28 10:28:12 +02:00
Jonathan Kössler
f626ef2e6f Merge branch 'bugfix/RES-834-service-disconnects' into 'master'
fix: pod restarts due to health check

See merge request knecon/research/pyinfra!92
2024-08-26 15:10:51 +02:00
Jonathan Kössler
318779413a fix: add signal to webserver 2024-08-23 17:23:53 +02:00
Jonathan Kössler
f27b1fbba1 chore: bump version 2024-08-23 16:56:54 +02:00
Jonathan Kössler
f2018f9c86 fix: process message in thread in event loop 2024-08-23 16:56:24 +02:00
Julius Unverfehrt
a5167d1230 Merge branch 'bugfix/RES-826-fix-initial-startup' into 'master'
fix: add async webserver for probes

See merge request knecon/research/pyinfra!91
2024-08-21 17:25:35 +02:00
Jonathan Kössler
1e939febc2 refactor: function naming 2024-08-21 17:02:04 +02:00
Jonathan Kössler
564f2cbb43 chore: bump version 2024-08-21 16:25:17 +02:00
Jonathan Kössler
fa44f36088 feat: add async webserver for probes 2024-08-21 16:24:20 +02:00
Jonathan Kössler
2970823cc1 Merge branch 'refactor/tenant_queue_settings' into 'master'
refactor: tenant queues settings

See merge request knecon/research/pyinfra!90
2024-08-19 14:43:24 +02:00
Jonathan Kössler
dba348a621 refactor: tenant queues settings 2024-08-19 14:37:48 +02:00
Jonathan Kössler
5020e54dcc Merge branch 'fix/RES-820-channel-opening' into 'master'
fix: use is_initialized instead of is_open

See merge request knecon/research/pyinfra!89
2024-08-16 14:23:46 +02:00
Jonathan Kössler
2bc332831e fix: use is_initialized instead of is_open 2024-08-16 12:37:28 +02:00
Jonathan Kössler
b3f1529be2 chore: bump version 2024-08-06 09:48:09 +02:00
Jonathan Kössler
789f6a7f7c Merge branch 'feat/RES-757-protobuffer' into 'master'
feat: add protobuffer

See merge request knecon/research/pyinfra!87
2024-08-06 09:44:01 +02:00
Jonathan Kössler
06ce8bbb22 Merge branch 'master' into feat/RES-757-protobuffer 2024-08-05 11:01:40 +02:00
Jonathan Kössler
fdde56991b Merge branch 'refactor/RES-780-graceful-shutdown' into 'master'
refactor: graceful shutdown

See merge request knecon/research/pyinfra!88
2024-08-02 13:57:04 +02:00
Jonathan Kössler
cb8509b120 refactor: message counter 2024-08-01 17:42:59 +02:00
Jonathan Kössler
47b42e95e2 refactor: graceful shutdown 2024-08-01 15:31:58 +02:00
Jonathan Kössler
536284ed84 chore: update readme 2024-08-01 09:56:13 +02:00
Jonathan Kössler
aeac1c58f9 chore: bump pyinfra version 2024-07-31 16:05:42 +02:00
Jonathan Kössler
b12b1ce42b refactor: use protoc 4.25.x as compiler to avoid dependency issues 2024-07-31 16:04:43 +02:00
Jonathan Kössler
50b7a877e9 fix: poetry lock 2024-07-30 10:45:37 +02:00
Jonathan Kössler
f3d0f24ea6 Merge branch 'master' into feat/RES-757-protobuffer 2024-07-30 10:40:56 +02:00
Jonathan Kössler
8f1ad1a4bd Merge branch 'feature/RES-731-add-queues-per-tenant' into 'master'
feat: refactor to work asynchronously

See merge request knecon/research/pyinfra!86
2024-07-29 15:06:05 +02:00
Jonathan Kössler
2a2028085e feat: add async retry for tenant server calls 2024-07-25 14:45:19 +02:00
Jonathan Kössler
66aaeca928 fix: async queue test 2024-07-24 17:28:13 +02:00
Jonathan Kössler
23aaaf68b1 refactor: simplify rabbitmq config 2024-07-23 18:34:50 +02:00
Jonathan Kössler
c7e0df758e feat: add async health endpoint 2024-07-23 15:42:48 +02:00
Jonathan Kössler
13d670091c chore: update readme 2024-07-22 17:31:32 +02:00
Jonathan Kössler
1520e96287 refactor: cleanup codebase 2024-07-22 16:57:02 +02:00
Jonathan Kössler
28451e8f8f chore: bump pyinfra version 2024-07-22 16:54:28 +02:00
Jonathan Kössler
596d4a9bd0 feat: add expiration for tenant event queue and retry to tenant api call 2024-07-22 16:48:31 +02:00
Julius Unverfehrt
70d3a210a1 feat: update data loader tests
We now compare the output of the proto-to-JSON conversion to expected JSON
files. This revealed multiple differences between the files.

FIXED: the int64 type was cast into a string in Python. We now get proper
integers.

TODO: Empty fields are omitted by proto, but the JSON files have them and the
services implementing pyinfra might expect them. We have to test this
behaviour and adjust the tests accordingly.
2024-07-18 12:36:29 +02:00
Jonathan Kössler
f935056fa9 refactor: dataloader to not crash on unknown file formats 2024-07-17 13:54:50 +02:00
Jonathan Kössler
eeb4c3ce29 fix: add await to is_ready 2024-07-17 11:41:31 +02:00
Jonathan Kössler
b8833c7560 fix: settings mapping 2024-07-17 10:51:14 +02:00
Julius Unverfehrt
f175633f30 chore: track proto buf test data with dvc 2024-07-16 17:36:50 +02:00
Julius Unverfehrt
ceac21c1ef deps: add dvc 2024-07-16 17:35:03 +02:00
Julius Unverfehrt
0d232226fd feat: integrate proto data loader in pipeline 2024-07-16 17:34:39 +02:00
Julius Unverfehrt
9d55b3be89 feat: implement proto data loader 2024-07-16 16:32:58 +02:00
Julius Unverfehrt
edba6fc4da feat: track proto schemata & add compilations to package 2024-07-16 16:31:48 +02:00
Julius Unverfehrt
c5d8a6ed84 feat: add proto requirements and instructions to readme for compiling the schemata 2024-07-16 16:30:32 +02:00
Julius Unverfehrt
c16000c774 fix(tracing test): make test work in case azure connection string is missing 2024-07-15 16:13:41 +02:00
Jonathan Kössler
02665a5ef8 feat: align async queue manager 2024-07-12 15:14:13 +02:00
Jonathan Kössler
9c28498d8a feat: rollback testing logic for send_request 2024-07-12 15:12:46 +02:00
Jonathan Kössler
3c3580d3bc feat: add backwards compatibility 2024-07-12 12:26:56 +02:00
Jonathan Kössler
8ac16de0fa feat: add backwards compatibility 2024-07-12 12:23:45 +02:00
Jonathan Kössler
8844df44ce feat: add async_v2 2024-07-12 12:12:55 +02:00
Jonathan Kössler
a5162d5bf0 chore: update poetry deps 2024-07-12 12:10:31 +02:00
francisco.schulz
f9aec74d55 chore: clean up + improve robustness 2024-07-11 15:54:21 -04:00
francisco.schulz
7559118822 fix: remove sleep commands 2024-07-11 14:50:11 -04:00
francisco.schulz
5ff65f2cf4 feat(tests): add RabbitMQHandler class tests 2024-07-11 14:46:41 -04:00
francisco.schulz
cc25a20c24 feat(process_input_message): add message processing logic with support to pass in external message processor 2024-07-11 12:21:48 -04:00
francisco.schulz
f723bcb9b1 fix(fetch_active_tenants): proper async API call 2024-07-11 12:06:59 -04:00
francisco.schulz
abde776cd1 feat(RabbitMQHandler): add async test class 2024-07-11 11:55:52 -04:00
francisco.schulz
aa23894858 chore(dependencies): update 2024-07-11 11:55:17 -04:00
Jonathan Kössler
2da4f37620 feat: wip for multiple tenants - for pkg build 2024-07-11 12:49:07 +02:00
Jonathan Kössler
9b20a67ace feat: wip for multiple tenants - for pkg build 2024-07-11 11:41:09 +02:00
Jonathan Kössler
7b6408e0de feat: wip for multiple tenants - for pkg build 2024-07-11 11:04:02 +02:00
Jonathan Kössler
6e7c4ccb7b feat: wip for multiple tenants - for pkg build 2024-07-10 11:45:47 +02:00
Jonathan Kössler
b2e3ae092f feat: wip for multiple tenants 2024-07-09 18:20:55 +02:00
Jonathan Kössler
de41030e69 feat: wip for multiple tenants 2024-07-05 13:27:16 +02:00
Jonathan Kössler
c81d967aee feat: wip for multiple tenants 2024-07-03 17:51:47 +02:00
Jonathan Kössler
30330937ce feat: wip for multiple tenants 2024-07-02 18:07:23 +02:00
Jonathan Kössler
7624208188 feat: wip for multiple tenants 2024-07-01 18:15:04 +02:00
Jonathan Kössler
6fabe1ae8c feat: wip for multiple tenants 2024-06-28 15:41:53 +02:00
Jonathan Kössler
3532f949a9 refactor: remove second trace setup 2024-06-26 18:15:51 +02:00
Jonathan Kössler
65cc1c9aad fix: improve error handling for tracing settings 2024-06-26 18:02:52 +02:00
Jonathan Kössler
2484a5e9f7 chore: bump pyinfra version 2024-06-17 13:53:42 +02:00
Julius Unverfehrt
88fe7383f3 Merge branch 'feature/RES-718-add-azure-monitoring' into 'master'
RES-718: add azure tracing

See merge request knecon/research/pyinfra!85
2024-06-17 12:25:09 +02:00
Jonathan Kössler
18a0ddc2d3 feat: add tracing settings to validator 2024-06-13 08:47:50 +02:00
Jonathan Kössler
5328e8de03 refactor: streamline tracing types 2024-06-12 10:41:52 +02:00
Jonathan Kössler
9661d75d8a refactor: update tracing info for Azure Monitor 2024-06-11 14:31:06 +02:00
Jonathan Kössler
7dbcdf1650 feat: add azure opentelemetry monitoring 2024-06-11 12:00:18 +02:00
Julius Unverfehrt
4536f9d35b Merge branch 'RES-671-multi-file-dl' into 'master'
feat: add multiple file download

See merge request knecon/research/pyinfra!84
2024-04-18 16:47:00 +02:00
Julius Unverfehrt
a1e7b3b565 build: add SBOM and increment package version 2024-04-18 16:39:46 +02:00
Julius Unverfehrt
b810449bba feat: add multiple file download
The download function is now overloaded: in addition to the existing
single file path string, it accepts a dict with file paths as values.
In the dict case, the data is forwarded as a dict of the same
structure.
2024-04-18 16:35:55 +02:00
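The overloaded download described in the commit above could look roughly like this. It is a sketch under assumptions: the function signature and the fetch callable are hypothetical and do not reflect the real pyinfra API.

```python
from typing import Callable, Dict, Union

PathSpec = Union[str, Dict[str, str]]


def download(paths: PathSpec, fetch: Callable[[str], bytes]):
    """Download one file or several files.

    paths is either a single file path (str) or a dict whose values are
    file paths. fetch is a hypothetical callable that returns the bytes
    stored under one path.
    """
    if isinstance(paths, str):
        # Single path: forward the data as-is.
        return fetch(paths)
    if isinstance(paths, dict):
        # Dict of paths: forward a dict with the same keys.
        return {key: fetch(path) for key, path in paths.items()}
    raise TypeError(f"unsupported path specification: {type(paths).__name__}")
```

For example, download({"pdf": "a.pdf", "meta": "a.json"}, fetch) returns a dict with the keys "pdf" and "meta".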
Julius Unverfehrt
f67813702a Merge branch 'RED-8978-no-crash-on-non-existing-files' into 'master'
fix: add error handling for file not found error

See merge request knecon/research/pyinfra!83
2024-04-16 16:28:25 +02:00
Julius Unverfehrt
ed4f912acf build: increment service version 2024-04-16 16:21:57 +02:00
Julius Unverfehrt
021222475b fix: add error handling for file not found error
When a file couldn't be downloaded from storage, the queue consumer now
informs the operator with a log and rejects the message, without crashing
but continuing its honest work.
2024-04-16 16:20:08 +02:00
Julius Unverfehrt
876253b3fb tests: add test for file not found error 2024-04-16 16:19:45 +02:00
Julius Unverfehrt
1689cd762b fix(CI): fix CI 2024-01-31 12:03:07 +01:00
Julius Unverfehrt
dc413cea82 Merge branch 'opentel' into 'master'
RES-506, RES-507, RES-499, RES-434, RES-398

See merge request knecon/research/pyinfra!82
2024-01-31 11:21:17 +01:00
Julius Unverfehrt
bfb27383e4 fix(settings): change precedence to ENV ROOT_PATH > root_path arg 2024-01-31 10:24:29 +01:00
Julius Unverfehrt
af914ab3ae fix(argparse): automatically output settings path 2024-01-31 10:12:32 +01:00
Julius Unverfehrt
7093e01925 feat(opentelemetry): add webserver tracing to default pipeline 2024-01-31 09:09:13 +01:00
Julius Unverfehrt
88cfb2b1c1 fix(settings): add debug log 2024-01-30 14:52:35 +01:00
Julius Unverfehrt
c1301d287f fix(dependencies): move opentel deps to main since groups are not packaged with CI script 2024-01-30 14:31:08 +01:00
Julius Unverfehrt
f1b8e5a25f refac(arg parse): rename settings parsing fn for clarity 2024-01-30 13:27:19 +01:00
Julius Unverfehrt
fff5be2e50 feat(settings): improve config loading logic
Load settings from .toml files, .env and environment variables. Also ensures a ROOT_PATH environment variable is
set. If ROOT_PATH is not set and no root_path argument is passed, the current working directory is used as root.
Settings paths can be a single .toml file, a folder containing .toml files or a list of .toml files and folders.
If a folder is passed, all .toml files in the folder are loaded. If settings path is None, only .env and
environment variables are loaded. If settings_path are relative paths, they are joined with the root_path argument.
2024-01-30 12:56:58 +01:00
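The path resolution rules in the commit above can be illustrated with a small sketch. It covers only how settings paths expand to .toml files, not the .env or environment variable loading. collect_toml_files is a hypothetical name, and sorting the files of a folder is an assumption.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Union

PathLike = Union[str, Path]


def collect_toml_files(
    settings_path: Optional[Union[PathLike, Iterable[PathLike]]],
    root_path: PathLike,
) -> List[Path]:
    """Expand a settings path specification into a list of .toml files."""
    if settings_path is None:
        # Only .env and environment variables would be loaded in this case.
        return []
    if isinstance(settings_path, (str, Path)):
        candidates = [settings_path]
    else:
        candidates = list(settings_path)

    root = Path(root_path)
    files: List[Path] = []
    for candidate in candidates:
        path = Path(candidate)
        if not path.is_absolute():
            # Relative paths are joined with the root path.
            path = root / path
        if path.is_dir():
            # A folder contributes all .toml files it contains (sorted here).
            files.extend(sorted(path.glob("*.toml")))
        else:
            files.append(path)
    return files
```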
Julius Unverfehrt
ec9ab21198 package: increment major version and update kn-utils 2024-01-25 11:08:50 +01:00
Julius Unverfehrt
b2f073e0c5 refactor: IoC for callback, update readme 2024-01-25 10:41:48 +01:00
Julius Unverfehrt
f6f56b8d8c refactor: simplify storage connection logic 2024-01-25 09:08:51 +01:00
Isaac Riley
8ff637d6ba chore: add opentelemetry subsection to README.md; formatting 2024-01-25 08:25:19 +01:00
Julius Unverfehrt
c18475a77d feat(opentelemetry): improve readability 2024-01-24 17:46:54 +01:00
Julius Unverfehrt
e0b32fa448 feat(opentelemetry): fastAPI tracing
The tests don't work yet since the webserver has to run in a thread and
the traces don't get exported from the thread with local json exporting.
However, with an export to an external server this should still work.
WIP
2024-01-24 15:52:42 +01:00
Julius Unverfehrt
da163897c4 feat(opentelemetry): add fastapi instrumentation 2024-01-24 14:26:10 +01:00
Julius Unverfehrt
a415666830 feat(opentelemetry): put logic in own module 2024-01-24 14:00:11 +01:00
Julius Unverfehrt
739a7c0731 feat(opentelemetry): add queue instrumenting test 2024-01-24 13:26:01 +01:00
Isaac Riley
936bb4fe80 feat: add opentelemetry on top of newly refactored pyinfra 2024-01-24 08:09:42 +01:00
Julius Unverfehrt
725d6dce45 Update readme 2024-01-23 18:08:57 +01:00
Julius Unverfehrt
be602d8411 Adjust logs 2024-01-23 14:10:56 +01:00
Julius Unverfehrt
429a85b609 Disable automated tests until we find a way to run docker compose beforehand 2024-01-23 10:26:44 +01:00
Julius Unverfehrt
d6eeb65ccc Update scripts 2024-01-23 10:25:56 +01:00
Julius Unverfehrt
adfbd650e6 Add config tests, add type validation to config loading 2024-01-23 08:51:44 +01:00
Julius Unverfehrt
73eba97ede Add serving example
TODO: - update readme
      - check if logs are adequate
2024-01-19 14:53:06 +01:00
Julius Unverfehrt
8cd1d6b283 add retries to queue consuming, so we retry at least a bit if something happens. Eventually the container should crash though, since unfixable problems sadly do exist. 2024-01-19 14:15:00 +01:00
Julius Unverfehrt
87cbf89672 finish config loading logic 2024-01-19 14:05:05 +01:00
Julius Unverfehrt
9c2f34e694 Put add health check in own function 2024-01-19 13:13:12 +01:00
Julius Unverfehrt
fbbfc553ae fix message encoding for response, rename some functions 2024-01-19 12:46:02 +01:00
Julius Unverfehrt
b7f860f36b WIP: add callback factory and update example scripts 2024-01-18 17:10:04 +01:00
Julius Unverfehrt
6802bf5960 refactor: download and upload file logic, module structure, remove redundant files so far 2024-01-18 15:54:38 +01:00
Julius Unverfehrt
ec5ad09fa8 refactor: multi tenant storage connection 2024-01-18 11:34:21 +01:00
Julius Unverfehrt
17c5eebdf6 finish prometheus 2024-01-18 08:19:46 +01:00
Julius Unverfehrt
358e227251 fix prometheus tests WIP 2024-01-17 17:39:53 +01:00
Julius Unverfehrt
f31693d36a refactor: adapt prometheus monitoring logic to work with other webservers WIP 2024-01-16 17:24:53 +01:00
Julius Unverfehrt
e5c8a6e9f1 refactor: update storages with dynaconf logic, add validators, repair test 2024-01-16 15:34:56 +01:00
Julius Unverfehrt
27917863c9 refactor: finish queue manager, queue manager tests, also add validation logic, integrate new settings 2024-01-16 14:35:23 +01:00
Julius Unverfehrt
ebc519ee0d refactor: finish queue manager, queue manager tests, also add validation logic, integrate new settings 2024-01-16 14:16:27 +01:00
Julius Unverfehrt
b49645cce4 refactor: queue manager and config logic WIP 2024-01-15 16:46:33 +01:00
Julius Unverfehrt
64871bbb62 refactor: add basic queue manager test 2024-01-15 10:30:07 +01:00
Julius Unverfehrt
1f482f2476 fix: storage test 2024-01-09 16:07:48 +01:00
Francisco Schulz
8dfba74682 Merge branch 'RED-7958-logging-issues-of-python-services' into 'master'
Red 7958 logging issues of python services

See merge request knecon/research/pyinfra!81
2023-11-28 10:21:37 +01:00
francisco.schulz
570689ed9b increment version 2023-11-28 09:35:46 +01:00
francisco.schulz
5db56d8449 update CI template 2023-11-28 09:20:22 +01:00
francisco.schulz
3a9d34f9c0 add loglevel tests & fix broken exception and error log tests 2023-11-28 09:16:42 +01:00
francisco.schulz
3084d6338c update dependencies for kn-utils@0.2.4.dev112 2023-11-28 09:16:05 +01:00
Julius Unverfehrt
3a3a8e4ce1 Merge branch 'feature/version-upgrade-knutils-logging' into 'master'
Upgrade python version & change logger

See merge request knecon/research/pyinfra!80
2023-11-13 15:48:22 +01:00
Julius Unverfehrt
bb00c83a80 Upgrade python version & change logger
- Upgrades Python version to 3.10 and syncs packages with Isaac's list.
- Changes loguru logger to kn_utils logger.
- Overrides python version in CI script (temporarily until all services
  are updated and CI template can be adjusted).
2023-11-13 15:28:49 +01:00
Julius Unverfehrt
b297894505 Merge branch 'feature/stack-trace-for-exeptions' into 'master'
Add stacktrace to processing failures

See merge request knecon/research/pyinfra!79
2023-09-05 13:04:00 +02:00
Julius Unverfehrt
261b991049 Add stacktrace to processing failures
If a processing failure occurs in the processing callback, pyinfra now
prints the stack trace in addition to the exception.

Also removes knutils logging for now, since it still contains bugs and
it should be tested first in a non-production environment if
production-readiness is given.
2023-09-05 12:59:45 +02:00
Julius Unverfehrt
84c4e7601f Update kn-utils package
Update kn-utils for missing loglevels fix, which is needed for queue
manager error logging.
2023-08-30 15:58:29 +02:00
Julius Unverfehrt
201ed5b9a8 Merge branch 'feature/RED-6685-support-absolute-paths' into 'master'
Add support for absolute file paths

See merge request knecon/research/pyinfra!77
2023-08-23 14:11:46 +02:00
Julius Unverfehrt
72547201f3 Adjust log levels to reduce log clutter
Also updates readme and adds pytest execution to CI script.
2023-08-23 12:38:34 +02:00
Julius Unverfehrt
c09476cfae Update tests
All components from payload processing downwards are tested.

Tests that depend on docker compose have been disabled by default
because they take too long to use during development. Furthermore, the
queue manager tests are not stable, a refactoring with inversion of
control is urgently needed to make the components properly testable. The
storage tests are stable and should be run once before releasing, this
should be implemented via the CI script.

Also adds, if present, tenant Id and operation kwargs to storage and
queue response.
2023-08-22 17:33:22 +02:00
Julius Unverfehrt
e580a66347 Refactor storage provider & payload parser
Applies strategy pattern to payload parsing logic to improve
maintainability and testability.
Renames storage manager to storage provider.
2023-08-22 10:46:27 +02:00
Julius Unverfehrt
294688ea66 RED-7002 Forward exceptions from thread context
PyInfra now reports exceptions that happen inside the processing
callback.
Also refactors queue manager logging to fit the new logger by
changing the "%s", var logic to f-strings, since the former syntax is not
supported with knutils logging.
2023-08-22 10:46:27 +02:00
Julius Unverfehrt
7187f0ec0c RES-343 Update logging to knutils logger 2023-08-22 10:46:14 +02:00
Julius Unverfehrt
ef916ee790 Refactor payload processing logic
Streamlines payload processor class by encapsulating closely dependent
logic, to improve readability and maintainability.
2023-08-18 12:49:21 +02:00
Julius Unverfehrt
48d74b4307 Add support for absolute file paths
Introduces new payload parsing logic to be able to process absolute file
paths. The queue message is expected to contain the keys
"targetFilePath" and "responseFilePath".

To ensure backward-compatibility, the legacy "dossierId", "fileId"
messages are still supported.
2023-08-18 12:45:53 +02:00
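The two message formats named in the commit above could be told apart as sketched here. Only the four key names come from the commit message. parse_payload and the returned structure are hypothetical, and how legacy IDs map to storage paths is deliberately left out.

```python
def parse_payload(message: dict) -> dict:
    """Classify a queue message as absolute-path or legacy format."""
    if "targetFilePath" in message and "responseFilePath" in message:
        # New format: the message carries absolute file paths.
        return {
            "format": "absolute",
            "target": message["targetFilePath"],
            "response": message["responseFilePath"],
        }
    if "dossierId" in message and "fileId" in message:
        # Legacy format, still supported for backward compatibility.
        return {
            "format": "legacy",
            "dossier_id": message["dossierId"],
            "file_id": message["fileId"],
        }
    raise ValueError("message matches neither the absolute-path nor the legacy format")
```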
Francisco Schulz
692ff204c3 Merge branch 'bugfix/RES-269' into 'master'
Bugfix/res 269

See merge request knecon/research/pyinfra!75
2023-08-17 09:55:27 +02:00
Francisco Schulz
03eddadcb9 update template 2023-08-17 09:48:35 +02:00
francisco.schulz
daddec7dc3 increment version 2023-07-18 16:59:50 +02:00
francisco.schulz
370e978fa7 upgrade dependencies, allow python>=3.8 2023-07-18 16:54:29 +02:00
Julius Unverfehrt
366d040ceb Merge branch 'RES-201-red-research-services-investigate-why-k-8-s-startup-probes-are-not-starting' into 'master'
RES-201 red research services investigate why k 8 s startup probes are not starting

See merge request knecon/research/pyinfra!74
2023-06-26 13:57:25 +02:00
francisco.schulz
9598b963ee remove dist/* files 2023-06-21 15:28:12 +02:00
francisco.schulz
2bacc4d971 update dependencies 2023-06-21 14:13:48 +02:00
francisco.schulz
d228c0a891 temporarily disable tests 2023-06-21 08:12:20 +02:00
francisco.schulz
4e6b4e2969 update dependencies 2023-06-20 17:13:26 +02:00
francisco.schulz
892b6e8236 use template CI 2023-06-20 17:13:08 +02:00
francisco.schulz
d63435e092 change k8s startup probe script to function call 2023-06-20 17:04:03 +02:00
Julius Unverfehrt
7e995bd78b Merge branch 'RES-196-red-hotfix-persistent-service-address' into 'master'
Fix: New tenant storage information endpoint

See merge request knecon/research/pyinfra!73
2023-06-15 16:29:38 +02:00
Julius Unverfehrt
c4e03d4641 Fix: New tenant storage information endpoint
Parametrize tenant endpoint and public decryption key as environment
variables and set the default value to the new endpoint.
2023-06-15 16:22:30 +02:00
Francisco Schulz
233b546f6f Merge branch 'update-azure-dependencies' into 'master'
update azure dependencies

See merge request knecon/research/pyinfra!72
2023-05-16 14:47:09 +02:00
francisco.schulz
5ed41a392a update version number 2023-05-16 14:18:46 +02:00
francisco.schulz
4a0c59b070 update deps 2023-05-16 13:42:53 +02:00
Christoph Schabert
e67ebc27b1 Merge branch 'RES-109-add-gitlab-ci' into 'master'
RES-109: add gitlab ci

See merge request knecon/research/pyinfra!71
2023-04-20 09:43:36 +02:00
francisco.schulz
309119cb62 update version 2023-04-18 15:48:50 +02:00
francisco.schulz
a381ac6b87 temp disable tests 2023-04-18 15:38:32 +02:00
francisco.schulz
6d49f0ccb9 add CI 2023-04-18 15:37:19 +02:00
Francisco Schulz
873abdca0c remove redundant files 2023-04-18 10:28:08 +02:00
Francisco Schulz
decd3710ab remove bamboo-spec 2023-04-18 10:18:35 +02:00
Julius Unverfehrt
d838413500 Pull request #70: Bugfix/RED-6273 forward processing kwargs
Merge in RR/pyinfra from bugfix/RED-6273-forward-processing-kwargs to master

Squashed commit of the following:

commit 2f45f7329dc6fd6166e08bad720e022e722737ad
Merge: 0a6d5df 0f4646e
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 17:55:24 2023 +0200

    Merge branch 'master' of ssh://git.iqser.com:2222/rr/pyinfra into bugfix/RED-6273-forward-processing-kwargs

commit 0a6d5dfc1a6edd8e6d171b50334b812a79f9288d
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 17:51:05 2023 +0200

    update pyinfra version

commit cd417c4b515d2a5d190af883af770bc660e15bb8
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 17:48:12 2023 +0200

    Revert poetry update

    - adds strange rust dependency for some reason
2023-03-28 17:57:20 +02:00
Julius Unverfehrt
0f4646e390 Pull request #69: fix monitoring preventing operation kwargs for processing fn getting forwarded
Merge in RR/pyinfra from bugfix/RED-6273-forward-operation-kwargs to master

Squashed commit of the following:

commit 347add07f8ea6e085064660ae79f0df9013dd9d6
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 17:16:41 2023 +0200

    update pyinfra version

commit 3c17047377aca666a015eaf0f06190d3dfa28c1c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 17:13:59 2023 +0200

    fix monitoring preventing operation kwargs for processing fn getting forwarded
2023-03-28 17:17:09 +02:00
Julius Unverfehrt
793a427c50 Pull request #68: RED-6273 multi tenant storage
Merge in RR/pyinfra from RED-6273-multi-tenant-storage to master

Squashed commit of the following:

commit 0fead1f8b59c9187330879b4e48d48355885c27c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 15:02:22 2023 +0200

    fix typos

commit 892a803726946876f8b8cd7905a0e73c419b2fb1
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Tue Mar 28 14:41:49 2023 +0200

    Refactoring

    Replace custom storage caching logic with LRU decorator

commit eafcd90260731e3360ce960571f07dee8f521327
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 24 12:50:13 2023 +0100

    fix bug in storage connection from endpoint

commit d0c9fb5b7d1c55ae2f90e8faa1efec9f7587c26a
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 24 11:49:34 2023 +0100

    add logs to PayloadProcessor

    - set log messages to determine if x-tenant
    storage connection is working

commit 97309fe58037b90469cf7a3de342d4749a0edfde
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 24 10:41:59 2023 +0100

    update PayloadProcessor

    - introduce storage cache to make every unique
    storage connection only once
    - add functionality to pass optional processing
    kwargs in queue message like the operation key to
    the processing function

commit d48e8108fdc0d463c89aaa0d672061ab7dca83a0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Mar 22 13:34:43 2023 +0100

    add multi-tenant storage connection 1st iteration

    - forward x-tenant-id from queue message header to
    payload processor
    - add functions to receive storage infos from an
    endpoint or the config. This enables hashing and
    caching of connections created from these infos
    - add function to initialize storage connections
    from storage infos
    - streamline and refactor tests to make them more
    readable and robust and to make it easier to add
    new tests
    - update payload processor with first iteration
    of multi tenancy storage connection support
    with connection caching and backwards compatibility

commit 52c047c47b98e62d0b834a9b9b6c0e2bb0db41e5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 21 15:35:57 2023 +0100

    add AES/GCM cipher functions

    - decrypt x-tenant storage connection strings
2023-03-28 15:04:14 +02:00
Julius Unverfehrt
0f24a7f26d Pull request #67: fix prometheus address
Merge in RR/pyinfra from bugfix/RED-6205-prometheus-port to master

Squashed commit of the following:

commit e97d81bebfe34c24d8da4e4392ff7dbd3638e685
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 21 15:48:04 2023 +0100

    increase package version

commit c7e181a462e275c5f2cbf1e6df4c88dfefbe36b7
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 21 15:43:46 2023 +0100

    fix prometheus address

    - change loopback address to all available network interfaces to enable
    external metric scraping
    - disable ENV input for prometheus address and port since they should
    not be set in HELM
2023-03-21 15:54:47 +01:00
Julius Unverfehrt
ff6f437e84 Pull request #66: add safety measure for monitoring in case a service didn't find any results.
Merge in RR/pyinfra from add-safety-measure to master

* commit 'b985679d6b30b3a983c7b1df5fb23eef0dc95cd3':
  add safety measure for monitoring in case a service didn't find any results.
2023-03-16 17:29:12 +01:00
Julius Unverfehrt
b985679d6b add safety measure for monitoring in case a service didn't find any results. 2023-03-16 17:27:33 +01:00
Julius Unverfehrt
d6de45d783 Pull request #65: RED-6205 monitoring
Merge in RR/pyinfra from RED-6205-monitoring to master

Squashed commit of the following:

commit 529cedfd7c065a3f7364e4596b923f25f0af76b5
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Thu Mar 16 14:57:26 2023 +0100

    Remove unnecessary default argument to dict.get

commit b718531f568e89df77cc05039e5e7afe7111b9a4
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 14:56:50 2023 +0100

    refactor

commit c039b0c25a6cd2ad2a72d237d0930c484c8e427c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 13:22:17 2023 +0100

    increase package version to reflect the recent changes

commit 0a983a4113f25cd692b68869e1f33ffbf7efc6f0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 13:16:39 2023 +0100

    remove processing result conversion to a list, since ner-prediction service actually returns a dictionary. It is now expected that the result is sized to perform the monitoring and json dumpable to upload it.

commit 541bf321410471dc09a354669b2778402286c09f
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 12:48:07 2023 +0100

    remove no longer needed requirements

commit cfa182985d989a5b92a9a069a603daee72f37d49
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 16 11:14:58 2023 +0100

    refactor payload formatting

    - introduce PayloadFormatter class for better typehinting and bundling
    of functionality
    - parametrize payload formatting so the PayloadProcessor can adapt
    better to different services/products
    - move file extension parsing to its own module

commit f57663b86954b7164eeb6db013d862af88ec4584
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Mar 15 12:22:08 2023 +0100

    refactor payload parsing

    - introduce QueueMessagePayloadParser for generality
    and typehinting
    - refactor file extension parsing algorithm

commit 713fb4a0dddecf5442ceda3988444d9887869dcf
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 14 17:07:02 2023 +0100

    fix tests

commit a22ecf7ae93bc0bec235fba3fd9cbf6c1778aa13
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 14 16:31:26 2023 +0100

    refactor payload parsing

    - parameterize file and compression types allowed for files to download
    and upload via config
    - make a real value bag out of QueueMessagePayload and do the parsing
    beforehand
    - refactor file extension parser to be more robust

commit 50b578d054ca47a94c907f5f8b585eca7ed626ac
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 14 13:21:32 2023 +0100

    add monitoring

    - add an optional prometheus monitor to monitor the average processing
    time of a service per relevant parameter that is at this point defined
    via the number of resulting elements.

commit de525e7fa2f846f7fde5b9a4b466039238da10cd
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 14 12:57:24 2023 +0100

    fix bug in file extension parser not working if the file endings have prefixes
2023-03-16 16:08:44 +01:00
Christoph Schabert
564c429834 Pull request #64: update java version for sonar-scan
Merge in RR/pyinfra from cschabert/PlanSpecjava-1678717832322 to master

Squashed commit of the following:

commit 3ae2b191e777739738d91d114c376ac78efa193f
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Tue Mar 14 08:36:54 2023 +0100

    PlanSpec.java edited online with Bitbucket

commit 2aa012242c77958701ca7b3400ed4b3272cd7d95
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Tue Mar 14 08:34:40 2023 +0100

    sonar-scan.sh edited online with Bitbucket

commit 2dd8c21229f40f4972b632702c4bcf4ad71bf7ae
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Tue Mar 14 08:33:50 2023 +0100

    sonar-scan.sh edited online with Bitbucket

commit 8837c31d664a7cb913ac538c9403871352b014a3
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Tue Mar 14 08:33:17 2023 +0100

    sonar-scan.sh edited online with Bitbucket

commit 0de23c519fcbb9f991a85389fe1644af4256266b
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Tue Mar 14 08:28:00 2023 +0100

    config-keys.sh edited online with Bitbucket

commit 4f971967e5055e368bc3c779f7f400bbf9b86a42
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 14 08:22:17 2023 +0100

    update bamboo agent username

commit 37fa1bbf9f83ec3d242a32e2051b6f1615102307
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 14 08:08:46 2023 +0100

    remove venv install

commit 44180f403ac8a5b1b33090081c45e30121dbae8d
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 14 08:07:13 2023 +0100

    add venv install

commit eac141bf8f430af3f7406a89df5147cd93231278
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 14 08:05:51 2023 +0100

    add venv install

commit 24b37f9f83db20e90d3bd528f4111f524b7485c5
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Mon Mar 13 15:47:03 2023 +0100

    Set new image for Sonar Scan

commit b734389316f60b2fdbe4bdcdf00d1f2f14e61266
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Mon Mar 13 15:30:45 2023 +0100

    update java version for sonar-scan
2023-03-14 08:39:41 +01:00
Julius Unverfehrt
3c4739ad8b Pull request #63: RED-6366 refactor
Merge in RR/pyinfra from RED-6366-refactor to master

Squashed commit of the following:

commit 8807cda514b5cc24b1be208173283275d87dcb97
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 10 13:15:15 2023 +0100

    enable docker-compose autouse for automatic tests

commit c4579581d3e9a885ef387ee97f3f3a5cf4731193
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 10 12:35:49 2023 +0100

    black

commit ac2b754c5624ef37ce310fce7196c9ea11bbca03
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 10 12:30:23 2023 +0100

    refactor storage url parsing

    - move parsing and validation to config where the connection url is
    actually read in
    - improve readability of parsing fn

commit 371802cc10b6d946c4939ff6839571002a2cb9f4
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 10 10:48:00 2023 +0100

    refactor

commit e8c381c29deebf663e665920752c2965d7abce16
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 10 09:57:34 2023 +0100

    rename

commit c8628a509316a651960dfa806d5fe6aacb7a91c1
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 10 09:37:01 2023 +0100

    renaming and refactoring

commit 4974d4f56fd73bc55bd76aa7a9bbb16babee19f4
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Mar 10 08:53:09 2023 +0100

    refactor payload processor

    - limit make_uploader and make_downloader cache
    - partially apply them when the class is initialized with storage and
    bucket to make the logic and behaviour more comprehensive
    - renaming functional pipeline steps to be more expressive

commit f8d51bfcad2b815c8293ab27dd66b256255c5414
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 9 15:30:32 2023 +0100

    remove monitor and rename Payload

commit 412ddaa207a08aff1229d7acd5d95402ac8cd578
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Mar 2 10:15:39 2023 +0100

    remove azure connection string and disable respective test for now for security reasons

commit 7922a2d9d325f3b9008ad4e3e56b241ba179f52c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Mar 1 13:30:58 2023 +0100

    make payload formatting function names more expressive

commit 7517e544b0f5a434579cc9bada3a37e7ac04059f
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Mar 1 13:24:57 2023 +0100

    add some type hints

commit 095410d3009f2dcbd374680dd0f7b55de94c9e76
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Wed Mar 1 10:54:58 2023 +0100

    Refactoring

    - Renaming
    - Docstring adjustments

commit e992f0715fc2636eb13eb5ffc4de0bcc5d433fc8
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Wed Mar 1 09:43:26 2023 +0100

    Re-wording and typo fixes

commit 3c2d698f9bf980bc4b378a44dc20c2badc407b3e
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Feb 28 14:59:59 2023 +0100

    enable auto startup for docker compose in tests

commit 55773b4fb0b624ca4745e5b8aeafa6f6a0ae6436
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Feb 28 14:59:37 2023 +0100

    Extended tests for queue manager

commit 14f7f943f60b9bfb9fe77fa3cef99a1e7d094333
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Feb 28 13:39:00 2023 +0100

    enable auto startup for docker compose in tests

commit 7caf354491c84c6e0b0e09ad4d41cb5dfbfdb225
Merge: 49d47ba d0277b8
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Feb 28 13:32:52 2023 +0100

    Merge branch 'RED-6205-prometheus' of ssh://git.iqser.com:2222/rr/pyinfra into RED-6205-prometheus

commit 49d47baba8ccf11dee48a4c1cbddc3bbd12471e5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Feb 28 13:32:42 2023 +0100

    adjust Payload Processor signature

commit d0277b86bc54994b6032774bf0ec2d7b19d7f517
Merge: 5184a18 f6b35d6
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Tue Feb 28 11:07:16 2023 +0100

    Pull request #61: Change Sec Trigger to PR

    Merge in RR/pyinfra from cschabert/PlanSpecjava-1677578703647 to RED-6205-prometheus

    * commit 'f6b35d648c88ddbce1856445c3b887bce669265c':
      Change Sec Trigger to PR

commit f6b35d648c88ddbce1856445c3b887bce669265c
Author: Christoph Schabert <christoph.schabert@iqser.com>
Date:   Tue Feb 28 11:05:13 2023 +0100

    Change Sec Trigger to PR

... and 20 more commits
2023-03-13 15:11:25 +01:00
Julius Unverfehrt
46157031b5 Pull request #59: adjust response headers
Merge in RR/pyinfra from RED-6118-multi-tenancy-patch to master

Squashed commit of the following:

commit 02e471622e59baf5d2bb5c61980cea43ca1c6d61
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Feb 16 16:36:19 2023 +0100

    move acknowledgment function to outer scope

commit f9efffd8e6d90d5e371c66574b1afe361a1da146
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Feb 16 16:04:07 2023 +0100

    adjust response headers

    - change response formatting: only forward the
    request message headers instead of all properties
    - adjust build script to only increase patch
    version on master push
2023-02-16 16:37:57 +01:00
Julius Unverfehrt
c97ae3d2c2 Pull request #56: RED-6118 multi tenancy
Merge in RR/pyinfra from RED-6118-multi-tenancy to master

Squashed commit of the following:

commit 0a1301f9d7a12a1097e6bf9a1bb0a94025312d0a
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Feb 16 09:12:54 2023 +0100

    delete (for now) not needed exception module

commit 9b624f9c95c129bf186eaea8405a14d359ccb1ae
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Feb 16 09:08:57 2023 +0100

    implement message properties forwarding

    - revert tenant validation logic since this functionality is not wanted
    - implement request message properties forwarding to response message.
    Thus, all message headers including x-tenant-id are present in the
    response.

commit ddac812d32eeec09d9434c32595875eb354767f8
Merge: ed4b495 6828c65
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Feb 15 17:00:54 2023 +0100

    Merge branch 'master' of ssh://git.iqser.com:2222/rr/pyinfra into RED-6118-multi-tenancy

commit ed4b4956c6cb6d201fc29b0318078dfb8fa99006
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Feb 15 10:00:28 2023 +0100

    refactor

commit 970fd72aa73ace97d36f129031fb143209c5076b
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Feb 14 17:22:54 2023 +0100

    RED-6118 make pyinfra multi-tenant ready

    - refactor message validation logic
    - add tenant validation step:
    	- messages without header/tenant id are accepted for now, until
    	  multi-tenancy is implemented in backend
    	- only valid tenant is 'redaction'

commit 0f04e799620e01b3346eeaf86f3e941830824202
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Feb 14 15:42:28 2023 +0100

    add dev scripts

    - add scripts to ease pyinfra development by allowing to run pyinfra
    locally with callback mock and publishing script.
2023-02-16 09:44:43 +01:00
Francisco Schulz
6828c65396 Pull request #58: fix version conflict
Merge in RR/pyinfra from hotfix/version-conflict-during-build to master

* commit '73bfef686782112d448469ec14d84cab5965f318':
  increment version
2023-02-15 16:14:55 +01:00
Francisco Schulz
73bfef6867 increment version 2023-02-15 16:12:22 +01:00
Francisco Schulz
1af171bd3f Pull request #57: Bugfix/RED-5277 investigate missing heartbeat error
Merge in RR/pyinfra from bugfix/RED-5277-investigate-missing-heartbeat-error to master

Squashed commit of the following:

commit 9e139e79e46c52014986f9afb2c6534281b55c10
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Wed Feb 15 14:56:44 2023 +0100

    RED-5277: Moved async processing to its own functions

commit 244a941299dbf75b254adcad8b068b2917c6bf79
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Wed Feb 15 11:26:00 2023 +0100

    Revert "only set git tag on release and master branches"

    This reverts commit 9066856d223f0646723fa1c62c444e16a9bb3ce9.

commit adb35db6fa6daf4b79263a918716c34905e8b3bc
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Wed Feb 15 11:11:07 2023 +0100

    increment version

commit 9066856d223f0646723fa1c62c444e16a9bb3ce9
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Wed Feb 15 11:10:49 2023 +0100

    only set git tag on release and master branches

commit ee11e018efdbc63a740008e7fa2415cbb12476ae
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Wed Feb 15 10:18:08 2023 +0100

    configure root logger in `__init__.py`
    only set log levels for other loggers, inherit config

commit 776399912ddf1e936138cceb2af981f27d333823
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Wed Feb 15 10:16:57 2023 +0100

    update dependency via `poetry update`

commit 804a8d9fbd1ded3e154fe9b3cafa32428522ca0f
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Wed Feb 15 10:16:25 2023 +0100

    increment version

commit cf057daed23d5f5b0f6f3a1a31e956e015e86368
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 17:59:55 2023 +0100

    update

commit 51717d85fce592b8bf38a8b5235faa04379cce1a
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 17:48:51 2023 +0100

    define sonar source

commit ace57c211a61d8e473a700da161806f882b19dc6
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 17:46:24 2023 +0100

    update plan

commit 1fcc00eb18ed692e2646873b4a233a00b5f6d93b
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 17:46:13 2023 +0100

    fix typo

commit 20b59768a68d985e1bf2fe6f93a1e6283bac5cb0
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 17:43:39 2023 +0100

    increment version

commit 8e7b4bf302b5591b2c490ad89c8a01a87c5b4741
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 17:11:59 2023 +0100

    get rid of extra logger

commit 3fd3eb255c252d1e208b88b475ec8a07c521619d
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 16:45:56 2023 +0100

    increment version

commit b0b5e5ebd94554cdafed6cff333d73a9ba08bea1
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 16:40:22 2023 +0100

    update

commit b87b3c351722d6949833c397178bc0354c754d90
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 16:38:41 2023 +0100

    fix tag issue from build

commit 73f3dcb280b6f905eeef3c69123b1252e6c934b1
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 14:21:57 2023 +0100

    add comments & update logging

commit 72a9e2c51f5bf98fc9f0803183fc1d28aaea9e35
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 12:06:09 2023 +0100

    cleanup comments

commit 587814944921f0f148e4d3c4c76d4edffff55bba
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 11:16:17 2023 +0100

    use thread executor in a `with` statement

commit 9561a6b447d98d2f0d536f63c0946d7bf1e2ca7d
Author: Francisco Schulz <Francisco.Schulz@iqser.com>
Date:   Tue Feb 14 10:42:49 2023 +0100

    fix unbound issue `callback_result` & shutdown thread executor

... and 23 more commits
2023-02-15 16:02:17 +01:00
Francisco Schulz
61efbdaffd Pull request #55: add master to non-dev branches
Merge in RR/pyinfra from bugfix/add-master-to-non-dev-branches to master

* commit 'c94604cc666ec4a9d3803c949f228cbf4291aaf2':
  add master to non-dev branches
2022-11-16 10:44:43 +01:00
Francisco Schulz
c94604cc66 add master to non-dev branches 2022-11-16 10:35:33 +01:00
Francisco Schulz
edbe5fa4f0 Pull request #54: Feature/MLOPS-32 update pyinfra to use pypoetry.toml
Merge in RR/pyinfra from feature/MLOPS-32-update-pyinfra-to-use-pypoetry.toml to master

* commit '37d8ee49a22ab9ee81792217404ed0a7daea65c2': (34 commits)
  add convenience command for version updates
  testing version is ahead in project
  test equal version number
  echo latest git version tag
  update tag fetching
  rollback
  testing hardcoded
  remove specific planRepository
  remove parentheses
  change project key
  add planRepositories config
  fix typo: licence -> license
  ignore bamboo YAML configs
  switch back to bamboo Java config
  update version tag manually
  remove superfluous `then`
  isolate feature/bugfix/hotfix and dev tag setting
  fix script `echo` was missing
  add version update shortcut
  show pyproject.toml file
  ...
2022-11-15 16:03:59 +01:00
Francisco Schulz
37d8ee49a2 add convenience command for version updates 2022-11-15 15:56:54 +01:00
Francisco Schulz
7732e884c5 testing version is ahead in project 2022-11-15 15:56:35 +01:00
Francisco Schulz
40e516b4e8 test equal version number 2022-11-15 15:47:32 +01:00
Francisco Schulz
c8c0210945 echo latest git version tag 2022-11-15 15:41:12 +01:00
Francisco Schulz
280b14b4a0 update tag fetching 2022-11-15 15:13:19 +01:00
Kevin Tumma
203c0f669c rollback 2022-11-15 14:14:12 +01:00
Kevin Tumma
8227e18580 testing hardcoded 2022-11-15 14:07:23 +01:00
Francisco Schulz
73bb38f917 remove specific planRepository 2022-11-15 13:18:25 +01:00
Francisco Schulz
fa76003983 remove parentheses 2022-11-15 11:40:16 +01:00
Francisco Schulz
19540c7c08 change project key 2022-11-15 11:39:08 +01:00
Francisco Schulz
2fe4a75a57 add planRepositories config 2022-11-15 11:37:40 +01:00
Francisco Schulz
b5deb7b292 fix typo: licence -> license 2022-11-15 09:28:41 +01:00
Francisco Schulz
5cd30c08b3 ignore bamboo YAML configs 2022-11-15 09:00:46 +01:00
Francisco Schulz
17cbbeb620 switch back to bamboo Java config 2022-11-15 08:59:47 +01:00
Francisco Schulz
9c4cf3d220 update version tag manually 2022-11-14 17:04:30 +01:00
Francisco Schulz
3ccb1d2370 remove superfluous then 2022-11-14 16:26:19 +01:00
Francisco Schulz
398b1c271f isolate feature/bugfix/hotfix and dev tag setting 2022-11-14 16:20:16 +01:00
Francisco Schulz
05658784be fix script
`echo` was missing
2022-11-14 15:49:44 +01:00
Francisco Schulz
974df96bb9 add version update shortcut 2022-11-14 15:49:17 +01:00
Francisco Schulz
ca3f812527 show pyproject.toml file 2022-11-14 15:38:10 +01:00
Francisco Schulz
d78e6c45fb separate git-tag into own stage 2022-11-14 15:37:56 +01:00
Francisco Schulz
41220d3c80 increment version 2022-11-14 10:39:36 +01:00
Francisco Schulz
2d2e72c86e remove redundancies 2022-11-14 10:39:30 +01:00
Francisco Schulz
37b0280ab6 update tag logic 2022-11-10 17:38:18 +01:00
Francisco Schulz
4bd6ee867f fix circumflex formatting 2022-11-10 16:52:11 +01:00
Francisco Schulz
b0efed4007 update regex 2022-11-10 16:45:32 +01:00
Francisco Schulz
28ee14e92f echo bamboo vars 2022-11-10 16:04:38 +01:00
Francisco Schulz
fb3d4b5fc9 add sonar config 2022-11-10 15:25:44 +01:00
Francisco Schulz
84351fd75c fix formatting issue 2022-11-10 13:24:51 +01:00
Francisco Schulz
244aaec470 use inline config-keys script opposed to file 2022-11-10 13:21:38 +01:00
Francisco Schulz
18d614f61c use bamboo config YAML 2022-11-10 13:17:01 +01:00
Francisco Schulz
7472939f21 ignore checks for bamboo.yml
otherwise check-yaml throws multi-file exception
2022-11-10 13:16:54 +01:00
Francisco Schulz
a819e60632 update 2022-11-10 13:09:09 +01:00
Francisco Schulz
05d5582479 convert into python package
- remove build specs
- move pytest.ini into pyproject.toml
- update readme
- add pre-commit config
- run formatters
- add Makefile
2022-11-03 16:10:12 +01:00
Francisco Schulz
64d6a8cec6 Pull request #53: Feature/MLOPS-23 pyinfra does not use the value of the storage azurecontainername environment
Merge in RR/pyinfra from feature/MLOPS-23-pyinfra-does-not-use-the-value-of-the-storage_azurecontainername-environment to master

* commit '7a740403bb65db97c8e4cb54de00aac3536b2e4c':
  update
  update test config
  update test config
  add pytests to check if a configured bucket can be found
  add submodule initialization
  load different env vars for the  variable depending on the set
2022-10-13 15:09:53 +02:00
Francisco Schulz
7a740403bb update 2022-10-13 14:04:00 +02:00
Francisco Schulz
b2cd529519 update test config 2022-10-13 13:53:53 +02:00
Francisco Schulz
1891519e19 update test config 2022-10-13 13:17:31 +02:00
Francisco Schulz
843d91c61a add pytests to check if a configured bucket can be found 2022-10-13 11:26:58 +02:00
Francisco Schulz
5b948fdcc5 add submodule initialization 2022-10-13 11:01:29 +02:00
Francisco Schulz
bb5b73e189 load different env vars for the variable depending on the set 2022-10-13 10:29:57 +02:00
Viktor Seifert
94beb544fa Pull request #52: RED-5324: Added missing storage-region to key to config and to minio-client creation so that storage access works on s3
Merge in RR/pyinfra from RED-5324 to master

* commit 'ffaa4a668b447fe3f4708d99ec6fccec14f85693':
  RED-5324: Added missing storage-region to key to config and to minio-client creation so that storage access works on s3
2022-09-30 15:15:47 +02:00
Viktor Seifert
ffaa4a668b RED-5324: Added missing storage-region to key to config and to minio-client creation so that storage access works on s3 2022-09-30 15:12:58 +02:00
Julius Unverfehrt
88b4c5c7ce Pull request #51: RED-5009 pyinfra now truly rejects messages that couldn't be processed by the callback
Merge in RR/pyinfra from RED-5009-fix-ack-bug to master

Squashed commit of the following:

commit 7b00edf6fe1167345e774d658fcd2e60c01d05d5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Aug 24 14:52:57 2022 +0200

    RED-5009 pyinfra now truly rejects messages that couldn't be processed by the callback (e.g. unobtainable storage file)
2022-08-24 15:00:00 +02:00
Viktor Seifert
71ad2af4eb Pull request #50: RED-5009: Changed callback to not process redelivered messages to prevent endless retries
Merge in RR/pyinfra from RED-5009 to master

Squashed commit of the following:

commit 1f8114379bdeb3af8640c71c2edde2a672bb358c
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Mon Aug 22 16:55:04 2022 +0200

    RED-5009: Added the possibility for a callback to signal that a message should be declined/dead-lettered

commit be674c2915f6f149c581bc2fe2783217fe424df8
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Aug 19 16:26:38 2022 +0200

    RED-5009: Changed callback to not process redelivered messages to prevent endless retries
2022-08-23 10:22:13 +02:00
Julius Unverfehrt
be82114f83 Pull request #49: Add exists to storage
Merge in RR/pyinfra from add-exists-to-storage to master

Squashed commit of the following:

commit 48d5e1c9e103702bfebfc115e811576514e115c3
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Aug 12 13:32:40 2022 +0200

    refactor

commit 711d2c8dbf7c78e26133e3ea3a57670fe829059b
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Aug 12 11:45:42 2022 +0200

    add method to check if objects exist for azure and s3
2022-08-12 13:35:12 +02:00
Viktor Seifert
0f6512df54 Pull request #48: RED-4653: Changed token-file path to the temp dir
Merge in RR/pyinfra from RED-4653 to master

* commit '8b050fe9b16cbea37b4becf7de54b25a9a4dbf63':
  RED-4653: Changed token-file path to the temp dir
2022-08-02 10:52:42 +02:00
Viktor Seifert
8b050fe9b1 RED-4653: Changed token-file path to the temp dir 2022-08-02 10:44:56 +02:00
Viktor Seifert
046f26d0e9 Pull request #47: RED-4653
Merge in RR/pyinfra from RED-4653 to master

* commit '7e2cb20040a6be7510f5b06b0a522c1b044d5ee3':
  RED-4653: Corrected if-operator
  RED-4653: Added value to config to prevent writing the token as a default since that is only useful in a container
  RED-4653: Implemented a startup probe for k8s
2022-08-02 09:59:35 +02:00
Viktor Seifert
7e2cb20040 RED-4653: Corrected if-operator 2022-08-01 17:38:37 +02:00
Viktor Seifert
8867da3557 RED-4653: Added value to config to prevent writing the token as a default since that is only useful in a container 2022-08-01 17:00:44 +02:00
Viktor Seifert
eed5912516 RED-4653: Implemented a startup probe for k8s 2022-08-01 16:19:13 +02:00
Viktor Seifert
3ccc4a1547 Pull request #46: RED-4653
Merge in RR/pyinfra from RED-4653 to master

* commit '0efbd2c98cecaa1e33991473b1b120827df60ae9':
  RED-4653: Removed unnecessary string formatting
  RED-4653: Reordered code to prevent errors on application shutdown
  RED-4653: Changed code to close only the connection instead of the channel & connection to see if that is sufficient for a clean shutdown
  RED-4653: Added some debugging code to test if closing the connection is needed
  RED-4653: Corrected exception block to not swallow exceptions
  RED-4653: Switch to closing channel instead of only cancelling subscription on shutdown.
  RED-4653: Corrected signal handler by correctly handling passed params
2022-08-01 14:29:35 +02:00
Viktor Seifert
0efbd2c98c RED-4653: Removed unnecessary string formatting 2022-08-01 14:15:03 +02:00
Viktor Seifert
89ce61996c RED-4653: Reordered code to prevent errors on application shutdown 2022-08-01 14:08:58 +02:00
Viktor Seifert
2cffab279d RED-4653: Changed code to close only the connection instead of the channel & connection to see if that is sufficient for a clean shutdown 2022-08-01 13:39:09 +02:00
Viktor Seifert
76985e83ed RED-4653: Added some debugging code to test if closing the connection is needed 2022-08-01 13:25:53 +02:00
Viktor Seifert
bbf013385a RED-4653: Corrected exception block to not swallow exceptions 2022-08-01 13:17:01 +02:00
Viktor Seifert
5cdf4df4a3 RED-4653: Switch to closing channel instead of only cancelling subscription on shutdown.
Changed queue-consumption shutdown to close the channel before closing the connection, since only cancelling the consumers doesn't clean-up the channel correctly, which in turn can cause an error when closing the connection.  Also reordered the code so that the connection and channel are only opened when queue-consumption starts.
2022-08-01 12:25:27 +02:00
Viktor Seifert
fc1f23a24d RED-4653: Corrected signal handler by correctly handling passed params 2022-08-01 11:26:15 +02:00
Isaac Riley
6c2652837a Pull request #45: clean up config hygiene; align queue manager and storage signature
Merge in RR/pyinfra from tidy_up to master

* commit 'db8f617aa78698760e5aaa198445d349755366a1':
  clean up config hygiene; align queue manager and storage signature
2022-07-26 15:12:36 +02:00
Isaac Riley
db8f617aa7 clean up config hygiene; align queue manager and storage signature 2022-07-26 14:56:21 +02:00
Viktor Seifert
e3abf2be0f Pull request #44: RED-4653
Merge in RR/pyinfra from RED-4653 to master

Squashed commit of the following:

commit 14ed6d2ee79f9a6bc4bad187dc775f7476a05d97
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Tue Jul 26 11:08:16 2022 +0200

    RED-4653: Disabled coverage check since there are no tests at the moment

commit e926631b167d03e8cc0867db5b5c7d44d6612dcf
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Tue Jul 26 10:58:50 2022 +0200

    RED-4653: Re-added test execution scripts

commit 94648cc449bbc392864197a1796f99f8953b7312
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Tue Jul 26 10:50:42 2022 +0200

    RED-4653: Changed error case for processing messages to not requeue the message since that will be handled in DLQ logic

commit d77982dfedcec49482293d79818283c8d7a17dc7
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Tue Jul 26 10:46:32 2022 +0200

    RED-4653: Removed unnecessary logging message

commit 8c00fd75bf04f8ecc0e9cda654f8e053d4cfb66f
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Tue Jul 26 10:03:35 2022 +0200

    RED-4653: Re-added wrongly removed config

commit 759d72b3fa093b19f97e68d17bf53390cd5453c7
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Tue Jul 26 09:57:47 2022 +0200

    RED-4653: Removed leftover Docker commands

commit 2ff5897ee38e39d6507278b6a82176be2450da16
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Tue Jul 26 09:48:08 2022 +0200

    RED-4653: Removed leftover Docker config

commit 1074167aa98f9f59c0f0f534ba2f1ba09ffb0958
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Tue Jul 26 09:41:21 2022 +0200

    RED-4653: Removed Docker build stage since it is not needed for a project that is used as a Python module

commit ec769c6cd74a74097d8ebe4800ea6e2ea86236cc
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Mon Jul 25 16:11:50 2022 +0200

    RED-4653: Renamed function for better clarity and consistency

commit 96e8ac4316ac57aac90066f35422d333c532513b
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Mon Jul 25 15:07:40 2022 +0200

    RED-4653: Added code to cancel the queue subscription on application exit to queue manager so that it can exit gracefully

commit 64d8e0bd15730898274c08d34f9c34fbac559422
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Mon Jul 25 13:57:06 2022 +0200

    RED-4653: Moved queue cancellation to a separate method so that it can be called on application exit

commit aff1d06364f5694c5922f37d961e401c12243221
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Mon Jul 25 11:51:16 2022 +0200

    RED-4653: Re-ordered message processing so that ack occurs after publishing the result, to prevent message loss

commit 9339186b86f2fe9653366c22fcdc9f7fc096b138
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Jul 22 18:07:25 2022 +0200

    RED-4653: RED-4653: Reordered code to acknowledge message before publishing a result message

commit 2d6fe1cbd95cd86832b086c6dfbcfa62b3ffa16f
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Jul 22 17:00:04 2022 +0200

    RED-4653: Hopefully corrected storage bucket env var name

commit 8f1ef0dd5532882cb12901721195d9acb336286c
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Jul 22 16:37:27 2022 +0200

    RED-4653: Switched to validating the connection url via a regex since the validators lib parses our endpoints incorrectly

commit 8d0234fcc5ff7ed1ae7695a17856c6af050065bd
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Jul 22 15:02:54 2022 +0200

    RED-4653: Corrected exception creation

commit 098a62335b3b695ee409363d429ac07284de7138
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Jul 22 14:42:22 2022 +0200

    RED-4653: Added a descriptive error message when the storage endpoint is nor a correct url

commit 379685f964a4de641ce6506713f1ea8914a3f5ab
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Jul 22 14:11:48 2022 +0200

    RED-4653: Removed variable re-use to make the code clearer

commit 4bf1a023453635568e16b1678ef5ad994c534045
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Thu Jul 21 17:41:55 2022 +0200

    RED-4653: Added explicit conversion of the heartbeat config value to an int before passing it to pika

commit 8f2bc4e028aafdef893458d1433a05724f534fce
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Mon Jul 18 16:41:31 2022 +0200

    RED-4653: Set heartbeat to lower value so that disconnects are detected more quickly

... and 6 more commits
2022-07-26 13:15:07 +02:00
Julius Unverfehrt
3f645484d9 Pull request #41: RED-4564
Merge in RR/pyinfra from RED-4564 to master

Squashed commit of the following:

commit bf4c85a0ab9fed19a44508f2cbef6858cbb32259
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Jul 8 15:46:11 2022 +0200

    RED-4564: POC-test to see if cancelling the consumer prevents messages from being stuck

commit 12ebd186b220f263ac2275463b0c124e8f4210fc
Author: Viktor Seifert <viktor.seifert@iqser.com>
Date:   Fri Jul 8 14:32:05 2022 +0200

    RED-4564: Print full exception with traceback when processing from the queue
2022-07-11 11:28:18 +02:00
201 changed files with 37461 additions and 5353 deletions

View File

@ -1,106 +0,0 @@
data
/build_venv/
/.venv/
/misc/
/incl/image_service/test/
/scratch/
/bamboo-specs/
README.md
Dockerfile
*idea
*misc
*egg-innfo
*pycache*
# Git
.git
.gitignore
# CI
.codeclimate.yml
.travis.yml
.taskcluster.yml
# Docker
.docker
# Byte-compiled / optimized / DLL files
__pycache__/
*/__pycache__/
*/*/__pycache__/
*/*/*/__pycache__/
*.py[cod]
*/*.py[cod]
*/*/*.py[cod]
*/*/*/*.py[cod]
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/**
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml
# Translations
*.mo
*.pot
# Django stuff:
*.log
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Virtual environment
.env/
.venv/
#venv/
# PyCharm
.idea
# Python mode for VIM
.ropeproject
*/.ropeproject
*/*/.ropeproject
*/*/*/.ropeproject
# Vim swap files
*.swp
*/*.swp
*/*/*.swp
*/*/*/*.swp

2
.dvc/.gitignore vendored Normal file

@ -0,0 +1,2 @@
/config.local
/cache

5
.dvc/config Normal file

@ -0,0 +1,5 @@
[core]
remote = azure
['remote "azure"']
url = azure://pyinfra-dvc
connection_string =

3
.dvcignore Normal file

@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore

57
.gitignore vendored

@ -1,10 +1,53 @@
# Environments
.env
.venv
__pycache__
data/
env/
venv/
.DS_Store
# Project folders
*.vscode/
.idea
*_app
*pytest_cache
*joblib
*tmp
*profiling
*logs
*docker
*drivers
*bamboo-specs/target
.coverage
data
build_venv
reports
pyinfra.egg-info
bamboo-specs/target
.pytest_cache
/.coverage
.idea
# Python specific files
__pycache__/
*.py[cod]
*.ipynb
*.ipynb_checkpoints
# file extensions
*.log
*.csv
*.pkl
*.profile
*.cbm
*.egg-info
# temp files
*.swp
*~
*.un~
# keep files
!notebooks/*.ipynb
# keep folders
!secrets
!data/*
!drivers
# ignore files
bamboo.yml

23
.gitlab-ci.yml Normal file

@ -0,0 +1,23 @@
# CI for services, check gitlab repo for python package CI
include:
- project: "Gitlab/gitlab"
ref: main
file: "/ci-templates/research/python_pkg-test-build-release.gitlab-ci.yml"
# set project variables here
variables:
NEXUS_PROJECT_DIR: research # subfolder in Nexus docker-gin where your container will be stored
IMAGENAME: $CI_PROJECT_NAME # if the project URL is gitlab.example.com/group-name/project-1, CI_PROJECT_NAME is project-1
REPORTS_DIR: reports
FF_USE_FASTZIP: "true" # enable fastzip - a faster zip implementation that also supports level configuration.
ARTIFACT_COMPRESSION_LEVEL: default # can also be set to fastest, fast, slow and slowest. If just enabling fastzip is not enough try setting this to fastest or fast.
CACHE_COMPRESSION_LEVEL: default # same as above, but for caches
# TRANSFER_METER_FREQUENCY: 5s # will display transfer progress every 5 seconds for artifacts and remote caches. For debugging purposes.
############
# UNIT TESTS
unit-tests:
variables:
###### UPDATE/EDIT ######
UNIT_TEST_DIR: "tests/unit_test"

55
.pre-commit-config.yaml Normal file

@ -0,0 +1,55 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
exclude: ^(docs/|notebooks/|data/|src/configs/|tests/|.hooks/)
default_language_version:
python: python3.10
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
name: Check Gitlab CI (unsafe)
args: [--unsafe]
files: .gitlab-ci.yml
- id: check-yaml
exclude: .gitlab-ci.yml
- id: check-toml
- id: detect-private-key
- id: check-added-large-files
args: ['--maxkb=10000']
- id: check-case-conflict
- id: mixed-line-ending
- repo: https://github.com/pre-commit/mirrors-pylint
rev: v3.0.0a5
hooks:
- id: pylint
language: system
args:
- --disable=C0111,R0903
- --max-line-length=120
- repo: https://github.com/pre-commit/mirrors-isort
rev: v5.10.1
hooks:
- id: isort
args:
- --profile black
- repo: https://github.com/psf/black
rev: 24.10.0
hooks:
- id: black
# exclude: ^(docs/|notebooks/|data/|src/secrets/)
args:
- --line-length=120
- repo: https://github.com/compilerla/conventional-pre-commit
rev: v3.6.0
hooks:
- id: conventional-pre-commit
pass_filenames: false
stages: [commit-msg]
# args: [] # optional: list of Conventional Commits types to allow e.g. [feat, fix, ci, chore, test]

1
.python-version Normal file

@ -0,0 +1 @@
3.10

View File

@ -1,19 +0,0 @@
FROM python:3.8
# Use a virtual environment.
RUN python -m venv /app/venv
ENV PATH="/app/venv/bin:$PATH"
# Upgrade pip.
RUN python -m pip install --upgrade pip
# Make a directory for the service files and copy the service repo into the container.
WORKDIR /app/service
COPY . .
# Install module & dependencies
RUN python3 -m pip install -e .
RUN python3 -m pip install -r requirements.txt
# Run the service loop.
CMD ["python", "src/serve.py"]

View File

@ -1,19 +0,0 @@
ARG BASE_ROOT="nexus.iqser.com:5001/red/"
ARG VERSION_TAG="dev"
FROM ${BASE_ROOT}pyinfra:${VERSION_TAG}
EXPOSE 5000
EXPOSE 8080
RUN python3 -m pip install coverage
# Make a directory for the service files and copy the service repo into the container.
WORKDIR /app/service
COPY . .
# Install module & dependencies
RUN python3 -m pip install -e .
RUN python3 -m pip install -r requirements.txt
CMD coverage run -m pytest test/ -x && coverage report -m && coverage xml

85
Makefile Normal file

@ -0,0 +1,85 @@
.PHONY: \
poetry in-project-venv dev-env use-env install install-dev tests \
update-version sync-version-with-git \
docker docker-build-run docker-build docker-run \
docker-rm docker-rm-container docker-rm-image \
pre-commit get-licenses prep-commit \
docs sphinx_html sphinx_apidoc
.DEFAULT_GOAL := run
export DOCKER=docker
export DOCKERFILE=Dockerfile
export IMAGE_NAME=rule_engine-image
export CONTAINER_NAME=rule_engine-container
export HOST_PORT=9999
export CONTAINER_PORT=9999
export PYTHON_VERSION=python3.8
# all commands should be executed in the root dir of the project,
# specific environments should be deactivated
poetry: in-project-venv use-env dev-env
in-project-venv:
poetry config virtualenvs.in-project true
use-env:
poetry env use ${PYTHON_VERSION}
dev-env:
poetry install --with dev
install:
poetry add $(pkg)
install-dev:
poetry add --dev $(pkg)
requirements:
poetry export --without-hashes --output requirements.txt
update-version:
poetry version prerelease
sync-version-with-git:
git pull -p && poetry version $$(git describe --tags --abbrev=0 $$(git rev-list --tags --max-count=1))
docker: docker-rm docker-build-run
docker-build-run: docker-build docker-run
docker-build:
$(DOCKER) build \
--no-cache --progress=plain \
-t $(IMAGE_NAME) -f $(DOCKERFILE) .
docker-run:
$(DOCKER) run -it --rm -p $(HOST_PORT):$(CONTAINER_PORT)/tcp --name $(CONTAINER_NAME) $(IMAGE_NAME) python app.py
docker-rm: docker-rm-container docker-rm-image
docker-rm-container:
-$(DOCKER) rm $(CONTAINER_NAME)
docker-rm-image:
-$(DOCKER) image rm $(IMAGE_NAME)
tests:
poetry run pytest ./tests
prep-commit: docs get-licenses sync-version-with-git update-version pre-commit
pre-commit:
pre-commit run --all-files
get-licenses:
pip-licenses --format=json --order=license --with-urls > pkg-licenses.json
docs: sphinx_apidoc sphinx_html
sphinx_html:
poetry run sphinx-build -b html docs/source/ docs/build/html -E -a
sphinx_apidoc:
poetry run sphinx-apidoc -o ./docs/source/modules ./src/rule_engine

239
README.md

@ -1,103 +1,220 @@
# Infrastructure to deploy Research Projects
# PyInfra
The Infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.
1. [ About ](#about)
2. [ Configuration ](#configuration)
3. [ Queue Manager ](#queue-manager)
4. [ Module Installation ](#module-installation)
5. [ Scripts ](#scripts)
6. [ Tests ](#tests)
7. [ Opentelemetry protobuf dependency hell ](#opentelemetry-protobuf-dependency-hell)
## About
Shared library for the research team, containing code related to infrastructure and communication with other services.
Offers a simple interface for processing data and sending responses via AMQP, monitoring via Prometheus and storage
access via S3 or Azure. It also exports traces via OpenTelemetry for queue messages and webserver requests.
To start, see the [complete example](pyinfra/examples.py) which shows how to use all features of the service and can be
imported and used directly for default research service pipelines (data ID in message, download data from storage,
upload result while offering Prometheus monitoring, /health and /ready endpoints and multi tenancy support).
## Configuration
A configuration is located in `/config.yaml`. All relevant variables can be configured via exporting environment variables.
Configuration is done via `Dynaconf`. This means that you can use environment variables, a `.env` file or `.toml`
file(s) to configure the service. You can also combine these methods. The precedence is
`environment variables > .env > .toml`. It is recommended to load settings with the provided
[`load_settings`](pyinfra/config/loader.py) function, which you can combine with the provided
[`parse_args`](pyinfra/config/loader.py) function. This allows you to load settings from a `.toml` file or a folder with
`.toml` files and override them with environment variables.
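The precedence described above can be illustrated with a small, library-independent sketch. This is not how Dynaconf implements layering; the function name `resolve_settings` and the sample keys and values are assumptions made only for illustration.

```python
from typing import Any, Dict, Mapping


def resolve_settings(
    toml_values: Mapping[str, Any],
    dotenv_values: Mapping[str, Any],
    env_values: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge three setting layers; later layers win (env > .env > .toml)."""
    merged: Dict[str, Any] = {}
    # Apply the layers from lowest to highest precedence.
    for layer in (toml_values, dotenv_values, env_values):
        merged.update(layer)
    return merged


# Hypothetical sample values, not taken from a real settings file.
toml_values = {"logging.level": "INFO", "rabbitmq.host": "localhost"}
dotenv_values = {"rabbitmq.host": "rabbitmq.internal"}
env_values = {"logging.level": "DEBUG"}

settings = resolve_settings(toml_values, dotenv_values, env_values)
print(settings)
```

In this sketch the `.env` value replaces the `.toml` host, and the environment variable replaces the `.toml` log level, which mirrors the stated order.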
| Environment Variable | Default | Description |
|-------------------------------|--------------------------------|-----------------------------------------------------------------------|
| LOGGING_LEVEL_ROOT | DEBUG | Logging level for service logger |
| PROBING_WEBSERVER_HOST | "0.0.0.0" | Probe webserver address |
| PROBING_WEBSERVER_PORT | 8080 | Probe webserver port |
| PROBING_WEBSERVER_MODE | production | Webserver mode: {development, production} |
| RABBITMQ_HOST | localhost | RabbitMQ host address |
| RABBITMQ_PORT | 5672 | RabbitMQ host port |
| RABBITMQ_USERNAME | user | RabbitMQ username |
| RABBITMQ_PASSWORD | bitnami | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 7200 | Controls AMQP heartbeat timeout in seconds |
| REQUEST_QUEUE | request_queue | Requests to service |
| RESPONSE_QUEUE | response_queue | Responses by service |
| DEAD_LETTER_QUEUE | dead_letter_queue | Messages that failed to process |
| ANALYSIS_ENDPOINT | "http://127.0.0.1:5000" | Endpoint for analysis container |
| STORAGE_BACKEND | s3 | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "pyinfra-test-bucket" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | root | User for s3 storage |
| STORAGE_SECRET | password | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
The following table shows all necessary settings. You can find a preconfigured settings file for this service in
Bitbucket. The table lists the complete settings; you only need all of them if you use all features of the service,
as described in the [complete example](pyinfra/examples.py).
## Response Format
| Environment Variable | Internal / .toml Name | Description |
| ------------------------------------------ | --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| LOGGING\_\_LEVEL | logging.level | Log level |
| DYNAMIC_TENANT_QUEUES\_\_ENABLED | dynamic_tenant_queues.enabled | Enable dynamically created per-tenant queues |
| METRICS\_\_PROMETHEUS\_\_ENABLED | metrics.prometheus.enabled | Enable Prometheus metrics collection |
| METRICS\_\_PROMETHEUS\_\_PREFIX | metrics.prometheus.prefix | Prefix for Prometheus metrics (e.g. {product}-{service}) |
| WEBSERVER\_\_HOST | webserver.host | Host of the webserver (offering e.g. /prometheus, /ready and /health endpoints) |
| WEBSERVER\_\_PORT | webserver.port | Port of the webserver |
| RABBITMQ\_\_HOST | rabbitmq.host | Host of the RabbitMQ server |
| RABBITMQ\_\_PORT | rabbitmq.port | Port of the RabbitMQ server |
| RABBITMQ\_\_USERNAME | rabbitmq.username | Username for the RabbitMQ server |
| RABBITMQ\_\_PASSWORD | rabbitmq.password | Password for the RabbitMQ server |
| RABBITMQ\_\_HEARTBEAT | rabbitmq.heartbeat | Heartbeat for the RabbitMQ server |
| RABBITMQ\_\_CONNECTION_SLEEP | rabbitmq.connection_sleep | Sleep interval during message processing. Must be a divisor of the heartbeat and should not be too large, since queue interactions (such as receiving new messages) only happen in these intervals. This is also the minimum time the service needs to process a message. |
| RABBITMQ\_\_INPUT_QUEUE | rabbitmq.input_queue | Name of the input queue in single queue setting |
| RABBITMQ\_\_OUTPUT_QUEUE | rabbitmq.output_queue | Name of the output queue in single queue setting |
| RABBITMQ\_\_DEAD_LETTER_QUEUE | rabbitmq.dead_letter_queue | Name of the dead letter queue in single queue setting |
| RABBITMQ\_\_TENANT_EVENT_QUEUE_SUFFIX | rabbitmq.tenant_event_queue_suffix | Suffix for the tenant event queue in multi tenant/queue setting |
| RABBITMQ\_\_TENANT_EVENT_DLQ_SUFFIX | rabbitmq.tenant_event_dlq_suffix | Suffix for the dead letter queue in multi tenant/queue setting |
| RABBITMQ\_\_TENANT_EXCHANGE_NAME | rabbitmq.tenant_exchange_name | Name of tenant exchange in multi tenant/queue setting |
| RABBITMQ\_\_QUEUE_EXPIRATION_TIME | rabbitmq.queue_expiration_time | Time until queue expiration in multi tenant/queue setting |
| RABBITMQ\_\_SERVICE_REQUEST_QUEUE_PREFIX | rabbitmq.service_request_queue_prefix | Service request queue prefix in multi tenant/queue setting |
| RABBITMQ\_\_SERVICE_REQUEST_EXCHANGE_NAME | rabbitmq.service_request_exchange_name | Service request exchange name in multi tenant/queue setting |
| RABBITMQ\_\_SERVICE_RESPONSE_EXCHANGE_NAME | rabbitmq.service_response_exchange_name | Service response exchange name in multi tenant/queue setting |
| RABBITMQ\_\_SERVICE_DLQ_NAME | rabbitmq.service_dlq_name | Service dead letter queue name in multi tenant/queue setting |
| STORAGE\_\_BACKEND | storage.backend | Storage backend to use (currently only "s3" and "azure" are supported) |
| STORAGE\_\_S3\_\_BUCKET | storage.s3.bucket | Name of the S3 bucket |
| STORAGE\_\_S3\_\_ENDPOINT | storage.s3.endpoint | Endpoint of the S3 server |
| STORAGE\_\_S3\_\_KEY | storage.s3.key | Access key for the S3 server |
| STORAGE\_\_S3\_\_SECRET | storage.s3.secret | Secret key for the S3 server |
| STORAGE\_\_S3\_\_REGION | storage.s3.region | Region of the S3 server |
| STORAGE\_\_AZURE\_\_CONTAINER | storage.azure.container_name | Name of the Azure container |
| STORAGE\_\_AZURE\_\_CONNECTION_STRING | storage.azure.connection_string | Connection string for the Azure server |
| STORAGE\_\_TENANT_SERVER\_\_PUBLIC_KEY | storage.tenant_server.public_key | Public key of the tenant server |
| STORAGE\_\_TENANT_SERVER\_\_ENDPOINT | storage.tenant_server.endpoint | Endpoint of the tenant server |
| TRACING\_\_ENABLED | tracing.enabled | Enable tracing |
| TRACING\_\_TYPE | tracing.type | Tracing mode. Possible values: "opentelemetry", "azure_monitor" (expects the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable). |
| TRACING\_\_OPENTELEMETRY\_\_ENDPOINT | tracing.opentelemetry.endpoint | Endpoint to which OpenTelemetry traces are exported |
| TRACING\_\_OPENTELEMETRY\_\_SERVICE_NAME | tracing.opentelemetry.service_name | Name of the service as displayed in the traces collected |
| TRACING\_\_OPENTELEMETRY\_\_EXPORTER | tracing.opentelemetry.exporter | Name of exporter |
| KUBERNETES\_\_POD_NAME | kubernetes.pod_name | Service pod name |
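Most rows in the table above follow one naming convention: the environment variable is the dotted name in upper case, with each `.` replaced by a double underscore. The sketch below shows that convention with two hypothetical helpers, `env_to_dotted` and `dotted_to_env`, which are not part of pyinfra or Dynaconf. One row (`STORAGE__AZURE__CONTAINER` and `storage.azure.container_name`) does not follow the pattern, so the table remains authoritative.

```python
def env_to_dotted(env_name: str) -> str:
    """Convert e.g. 'RABBITMQ__HOST' to 'rabbitmq.host' (hypothetical helper)."""
    # A double underscore separates nesting levels; single underscores stay.
    return ".".join(part.lower() for part in env_name.split("__"))


def dotted_to_env(dotted_name: str) -> str:
    """Convert e.g. 'rabbitmq.host' to 'RABBITMQ__HOST' (hypothetical helper)."""
    return "__".join(part.upper() for part in dotted_name.split("."))


print(env_to_dotted("METRICS__PROMETHEUS__ENABLED"))
print(dotted_to_env("rabbitmq.dead_letter_queue"))
```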
### Expected AMQP input message:
## Setup
**IMPORTANT** you need to set the following environment variables before running the setup script:
- ``$NEXUS_USER`` your Nexus user (usually equal to firstname.lastname@knecon.com)
- ``$NEXUS_PASSWORD`` your Nexus password (usually equal to your Azure Login)
```shell
# create venv and activate it
source ./scripts/setup/devenvsetup.sh {{ cookiecutter.python_version }} $NEXUS_USER $NEXUS_PASSWORD
source .venv/bin/activate
```
### OpenTelemetry
OpenTelemetry (via its Python SDK) is set up to be as unobtrusive as possible; for typical use cases it can be
configured
from environment variables, without additional work in the microservice app, although additional configuration is
possible.
`TRACING__OPENTELEMETRY__ENDPOINT` should typically be set
to `http://otel-collector-opentelemetry-collector.otel-collector:4318/v1/traces`.
## Queue Manager
The queue manager is responsible for consuming messages from the input queue, processing them and sending the response
to the output queue. The default callback also downloads data from the storage and uploads the result to the storage.
The response message does not contain the data itself, but the identifiers from the input message (including headers
beginning with "X-").
### Standalone Usage
```python
from pyinfra.queue.manager import QueueManager
from pyinfra.queue.callback import make_download_process_upload_callback, DataProcessor
from pyinfra.config.loader import load_settings
settings = load_settings("path/to/settings")
processing_function: DataProcessor # function should expect a dict (json) or bytes (pdf) as input and should return a json serializable object.
queue_manager = QueueManager(settings)
callback = make_download_process_upload_callback(processing_function, settings)
queue_manager.start_consuming(callback)  # callback is already wrapped above; do not wrap it again
```
### Usage in a Service
This is the recommended way to use the module. This includes the webserver, Prometheus metrics and health endpoints.
Custom endpoints can be added by adding a new route to the `app` object beforehand. Settings are loaded from files
specified as CLI arguments (e.g. `--settings-path path/to/settings.toml`). The values can also be set or overridden via
environment variables (e.g. `LOGGING__LEVEL=DEBUG`).
The callback can be replaced with a custom one, for example if the data to process is contained in the message itself
and not on the storage.
```python
from pyinfra.config.loader import load_settings, parse_settings_path
from pyinfra.examples import start_standard_queue_consumer
from pyinfra.queue.callback import make_download_process_upload_callback, DataProcessor
processing_function: DataProcessor
arguments = parse_settings_path()
settings = load_settings(arguments.settings_path)
callback = make_download_process_upload_callback(processing_function, settings)
start_standard_queue_consumer(callback, settings) # optionally also pass a fastAPI app object with preconfigured routes
```
### AMQP input message:
Either use the legacy format with dossierId and fileId as strings or the new format where absolute paths are used.
All headers beginning with "X-" are forwarded to the message processor, and returned in the response message (e.g.
"X-TENANT-ID" is used to acquire storage information for the tenant).
```json
{
"dossierId": "",
"fileId": "",
"targetFilePath": "",
"responseFilePath": ""
}
```
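The header forwarding described above can be sketched as follows. This is an illustration only, not pyinfra's implementation: the helper names `forwarded_headers` and `build_response`, the case-sensitive prefix match, and the sample values are all assumptions.

```python
from typing import Any, Dict, Mapping, Tuple

# Assumption: the prefix is matched case-sensitively, exactly as written in the text.
FORWARDED_PREFIX = "X-"


def forwarded_headers(headers: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only the headers whose name begins with 'X-'."""
    return {name: value for name, value in headers.items() if name.startswith(FORWARDED_PREFIX)}


def build_response(body: Mapping[str, Any], headers: Mapping[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return the identifiers from the request plus the forwarded headers.

    The processed data itself is not included; it is assumed to live in storage.
    """
    return dict(body), forwarded_headers(headers)


# Hypothetical request, using the field names from the message format above.
request_body = {"dossierId": "dossier-1", "fileId": "file-1"}
request_headers = {"X-TENANT-ID": "tenant-a", "content-type": "application/json"}

response_body, response_headers = build_response(request_body, request_headers)
print(response_body, response_headers)
```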
Optionally, the input message can contain a field with the key `"operations"`.
### AMQP output message:
or
```json
{
"dossierId": "",
"fileId": "",
...
"targetFileExtension": "",
"responseFileExtension": ""
}
```
## Development
## Module Installation
Either run `src/serve.py` or the built Docker image.
Add the respective version of the pyinfra package to your pyproject.toml file. Make sure to add our gitlab registry as a
source.
For now, all internal packages used by pyinfra also have to be added to the pyproject.toml file (namely kn-utils).
Execute `poetry lock` and `poetry install` to install the packages.
### Setup
You can look up the latest version of the package in
the [gitlab registry](https://gitlab.knecon.com/knecon/research/pyinfra/-/packages).
For the used versions of internal dependencies, please refer to the [pyproject.toml](pyproject.toml) file.
Install module.
```toml
[tool.poetry.dependencies]
pyinfra = { version = "x.x.x", source = "gitlab-research" }
kn-utils = { version = "x.x.x", source = "gitlab-research" }
```bash
pip install -e .
pip install -r requirements.txt
[[tool.poetry.source]]
name = "gitlab-research"
url = "https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi/simple"
priority = "explicit"
```
or build docker image.
## Scripts
### Run pyinfra locally
**Shell 1**: Start minio and rabbitmq containers
```bash
docker build -f Dockerfile -t pyinfra .
$ cd tests && docker compose up
```
### Usage
**Shell 1:** Start a MinIO and a RabbitMQ docker container.
**Shell 2**: Start pyinfra with callback mock
```bash
docker-compose up
$ python scripts/start_pyinfra.py
```
**Shell 2:** Add files to the local minio storage.
**Shell 3**: Upload dummy content on storage and publish message
```bash
python scripts/manage_minio.py add <MinIO target folder> -d path/to/a/folder/with/PDFs
$ python scripts/send_request.py
```
**Shell 2:** Run pyinfra-server.
## Tests
```bash
python src/serve.py
```
or as container:
Tests require a running minio and rabbitmq container, meaning you have to run `docker compose up` in the tests folder
before running the tests.
```bash
docker run --net=host pyinfra
```
## OpenTelemetry Protobuf Dependency Hell
**Shell 3:** Run analysis-container.
**Shell 4:** Start a client that sends requests to process PDFs from the MinIO store and annotates these PDFs according to the service responses.
```bash
python scripts/mock_client.py
```
**Note**: Status 2025/01/09: the currently used `opentelemetry-exporter-otlp-proto-http` version `1.25.0` requires
a `protobuf` version < `5.x.x` and is not compatible with the latest protobuf version `5.27.x`. This is an [open issue](https://github.com/open-telemetry/opentelemetry-python/issues/3958) in opentelemetry, because [support for 4.25.x ends in Q2 '25](https://protobuf.dev/support/version-support/#python).
Therefore, we should keep this in mind and update the dependency once opentelemetry includes support for `protobuf 5.27.x`.

View File

@ -1,40 +0,0 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs-parent</artifactId>
<version>7.1.2</version>
<relativePath/>
</parent>
<artifactId>bamboo-specs</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<sonar.skip>true</sonar.skip>
</properties>
<dependencies>
<dependency>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs-api</artifactId>
</dependency>
<dependency>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs</artifactId>
</dependency>
<!-- Test dependencies -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<!-- run 'mvn test' to perform offline validation of the plan -->
<!-- run 'mvn -Ppublish-specs' to upload the plan to your Bamboo server -->
</project>

View File

@ -1,179 +0,0 @@
package buildjob;
import com.atlassian.bamboo.specs.api.BambooSpec;
import com.atlassian.bamboo.specs.api.builders.BambooKey;
import com.atlassian.bamboo.specs.api.builders.docker.DockerConfiguration;
import com.atlassian.bamboo.specs.api.builders.permission.PermissionType;
import com.atlassian.bamboo.specs.api.builders.permission.Permissions;
import com.atlassian.bamboo.specs.api.builders.permission.PlanPermissions;
import com.atlassian.bamboo.specs.api.builders.plan.Job;
import com.atlassian.bamboo.specs.api.builders.plan.Plan;
import com.atlassian.bamboo.specs.api.builders.plan.PlanIdentifier;
import com.atlassian.bamboo.specs.api.builders.plan.Stage;
import com.atlassian.bamboo.specs.api.builders.plan.branches.BranchCleanup;
import com.atlassian.bamboo.specs.api.builders.plan.branches.PlanBranchManagement;
import com.atlassian.bamboo.specs.api.builders.project.Project;
import com.atlassian.bamboo.specs.builders.task.CheckoutItem;
import com.atlassian.bamboo.specs.builders.task.InjectVariablesTask;
import com.atlassian.bamboo.specs.builders.task.ScriptTask;
import com.atlassian.bamboo.specs.builders.task.VcsCheckoutTask;
import com.atlassian.bamboo.specs.builders.task.CleanWorkingDirectoryTask;
import com.atlassian.bamboo.specs.builders.task.VcsTagTask;
import com.atlassian.bamboo.specs.builders.trigger.BitbucketServerTrigger;
import com.atlassian.bamboo.specs.model.task.InjectVariablesScope;
import com.atlassian.bamboo.specs.api.builders.Variable;
import com.atlassian.bamboo.specs.util.BambooServer;
import com.atlassian.bamboo.specs.builders.task.ScriptTask;
import com.atlassian.bamboo.specs.model.task.ScriptTaskProperties.Location;
/**
* Plan configuration for Bamboo.
* Learn more on: <a href="https://confluence.atlassian.com/display/BAMBOO/Bamboo+Specs">https://confluence.atlassian.com/display/BAMBOO/Bamboo+Specs</a>
*/
@BambooSpec
public class PlanSpec {
private static final String SERVICE_NAME = "pyinfra";
private static final String SERVICE_KEY = SERVICE_NAME.toUpperCase().replaceAll("-","");
/**
* Run main to publish plan on Bamboo
*/
public static void main(final String[] args) throws Exception {
//By default credentials are read from the '.credentials' file.
BambooServer bambooServer = new BambooServer("http://localhost:8085");
Plan plan = new PlanSpec().createDockerBuildPlan();
bambooServer.publish(plan);
PlanPermissions planPermission = new PlanSpec().createPlanPermission(plan.getIdentifier());
bambooServer.publish(planPermission);
}
private PlanPermissions createPlanPermission(PlanIdentifier planIdentifier) {
Permissions permission = new Permissions()
.userPermissions("atlbamboo", PermissionType.EDIT, PermissionType.VIEW, PermissionType.ADMIN, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("research", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("Development", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("QA", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.loggedInUserPermissions(PermissionType.VIEW)
.anonymousUserPermissionView();
return new PlanPermissions(planIdentifier.getProjectKey(), planIdentifier.getPlanKey()).permissions(permission);
}
private Project project() {
return new Project()
.name("RED")
.key(new BambooKey("RED"));
}
public Plan createDockerBuildPlan() {
return new Plan(
project(),
SERVICE_NAME, new BambooKey(SERVICE_KEY))
.description("Docker build for pyinfra")
.stages(
new Stage("Build Stage")
.jobs(
new Job("Build Job", new BambooKey("BUILD"))
.tasks(
new CleanWorkingDirectoryTask()
.description("Clean working directory.")
.enabled(true),
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Set config and keys.")
.inlineBody("mkdir -p ~/.ssh\n" +
"echo \"${bamboo.bamboo_agent_ssh}\" | base64 -d >> ~/.ssh/id_rsa\n" +
"echo \"host vector.iqser.com\" > ~/.ssh/config\n" +
"echo \" user bamboo-agent\" >> ~/.ssh/config\n" +
"chmod 600 ~/.ssh/config ~/.ssh/id_rsa"),
new ScriptTask()
.description("Build Docker container.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/docker-build.sh")
.argument(SERVICE_NAME))
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/release_build:4.2.0")
.volume("/var/run/docker.sock", "/var/run/docker.sock"))),
new Stage("Sonar Stage")
.jobs(
new Job("Sonar Job", new BambooKey("SONAR"))
.tasks(
new CleanWorkingDirectoryTask()
.description("Clean working directory.")
.enabled(true),
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Set config and keys.")
.inlineBody("mkdir -p ~/.ssh\n" +
"echo \"${bamboo.bamboo_agent_ssh}\" | base64 -d >> ~/.ssh/id_rsa\n" +
"echo \"host vector.iqser.com\" > ~/.ssh/config\n" +
"echo \" user bamboo-agent\" >> ~/.ssh/config\n" +
"chmod 600 ~/.ssh/config ~/.ssh/id_rsa"),
new ScriptTask()
.description("Run Sonarqube scan.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/sonar-scan.sh")
.argument(SERVICE_NAME),
new ScriptTask()
.description("Shut down any running docker containers.")
.location(Location.INLINE)
.inlineBody("pip install docker-compose\n" +
"docker-compose down"))
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/release_build:4.2.0")
.volume("/var/run/docker.sock", "/var/run/docker.sock"))),
new Stage("Licence Stage")
.jobs(
new Job("Git Tag Job", new BambooKey("GITTAG"))
.tasks(
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Build git tag.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/git-tag.sh"),
new InjectVariablesTask()
.description("Inject git tag.")
.path("git.tag")
.namespace("g")
.scope(InjectVariablesScope.LOCAL),
new VcsTagTask()
.description("${bamboo.g.gitTag}")
.tagName("${bamboo.g.gitTag}")
.defaultRepository())
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/release_build:4.4.1")),
new Job("Licence Job", new BambooKey("LICENCE"))
.enabled(false)
.tasks(
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Build licence.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/create-licence.sh"))
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/maven:3.6.2-jdk-13-3.0.0")
.volume("/etc/maven/settings.xml", "/usr/share/maven/ref/settings.xml")
.volume("/var/run/docker.sock", "/var/run/docker.sock"))))
.linkedRepositories("RR / " + SERVICE_NAME)
.triggers(new BitbucketServerTrigger())
.planBranchManagement(new PlanBranchManagement()
.createForVcsBranch()
.delete(new BranchCleanup()
.whenInactiveInRepositoryAfterDays(14))
.notificationForCommitters());
}
}


@ -1,19 +0,0 @@
#!/bin/bash
set -e
if [[ "${bamboo_version_tag}" != "dev" ]]
then
${bamboo_capability_system_builder_mvn3_Maven_3}/bin/mvn \
-f ${bamboo_build_working_directory}/pom.xml \
versions:set \
-DnewVersion=${bamboo_version_tag}
${bamboo_capability_system_builder_mvn3_Maven_3}/bin/mvn \
-f ${bamboo_build_working_directory}/pom.xml \
-B clean deploy \
-e -DdeployAtEnd=true \
-Dmaven.wagon.http.ssl.insecure=true \
-Dmaven.wagon.http.ssl.allowall=true \
-Dmaven.wagon.http.ssl.ignore.validity.dates=true \
-DaltDeploymentRepository=iqser_release::default::https://nexus.iqser.com/repository/gin4-platform-releases
fi


@ -1,13 +0,0 @@
#!/bin/bash
set -e
SERVICE_NAME=$1
python3 -m venv build_venv
source build_venv/bin/activate
python3 -m pip install --upgrade pip
echo "index-url = https://${bamboo_nexus_user}:${bamboo_nexus_password}@nexus.iqser.com/repository/python-combind/simple" >> pip.conf
docker build -f Dockerfile -t nexus.iqser.com:5001/red/$SERVICE_NAME:${bamboo_version_tag} .
echo "${bamboo_nexus_password}" | docker login --username "${bamboo_nexus_user}" --password-stdin nexus.iqser.com:5001
docker push nexus.iqser.com:5001/red/$SERVICE_NAME:${bamboo_version_tag}


@ -1,9 +0,0 @@
#!/bin/bash
set -e
if [[ "${bamboo_version_tag}" = "dev" ]]
then
echo "gitTag=${bamboo_planRepository_1_branch}_${bamboo_buildNumber}" > git.tag
else
echo "gitTag=${bamboo_version_tag}" > git.tag
fi


@ -1,58 +0,0 @@
#!/bin/bash
set -e
export JAVA_HOME=/usr/bin/sonar-scanner/jre
python3 -m venv build_venv
source build_venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install dependency-check
python3 -m pip install docker-compose
python3 -m pip install coverage
echo "docker-compose down"
docker-compose down
sleep 30
echo "coverage report generation"
bash run_tests.sh
if [ ! -f reports/coverage.xml ]
then
exit 1
fi
SERVICE_NAME=$1
echo "dependency-check:aggregate"
mkdir -p reports
dependency-check --enableExperimental -f JSON -f XML \
--disableAssembly -s . -o reports --project $SERVICE_NAME --exclude ".git/**" --exclude "venv/**" \
--exclude "build_venv/**" --exclude "**/__pycache__/**" --exclude "bamboo-specs/**"
if [[ -z "${bamboo_repository_pr_key}" ]]
then
echo "Sonar Scan for branch: ${bamboo_planRepository_1_branch}"
/usr/bin/sonar-scanner/bin/sonar-scanner -X \
-Dsonar.projectKey=RED_$SERVICE_NAME \
-Dsonar.host.url=https://sonarqube.iqser.com \
-Dsonar.login=${bamboo_sonarqube_api_token_secret} \
-Dsonar.dependencyCheck.jsonReportPath=reports/dependency-check-report.json \
-Dsonar.dependencyCheck.xmlReportPath=reports/dependency-check-report.xml \
-Dsonar.dependencyCheck.htmlReportPath=reports/dependency-check-report.html \
-Dsonar.python.coverage.reportPaths=reports/coverage.xml
else
echo "Sonar Scan for PR with key: ${bamboo_repository_pr_key}"
/usr/bin/sonar-scanner/bin/sonar-scanner \
-Dsonar.projectKey=RED_$SERVICE_NAME \
-Dsonar.host.url=https://sonarqube.iqser.com \
-Dsonar.login=${bamboo_sonarqube_api_token_secret} \
-Dsonar.pullrequest.key=${bamboo_repository_pr_key} \
-Dsonar.pullrequest.branch=${bamboo_repository_pr_sourceBranch} \
-Dsonar.pullrequest.base=${bamboo_repository_pr_targetBranch} \
-Dsonar.dependencyCheck.jsonReportPath=reports/dependency-check-report.json \
-Dsonar.dependencyCheck.xmlReportPath=reports/dependency-check-report.xml \
-Dsonar.dependencyCheck.htmlReportPath=reports/dependency-check-report.html \
-Dsonar.python.coverage.reportPaths=reports/coverage.xml
fi


@ -1,16 +0,0 @@
package buildjob;
import com.atlassian.bamboo.specs.api.builders.plan.Plan;
import com.atlassian.bamboo.specs.api.exceptions.PropertiesValidationException;
import com.atlassian.bamboo.specs.api.util.EntityPropertiesBuilders;
import org.junit.Test;
public class PlanSpecTest {
@Test
public void checkYourPlanOffline() throws PropertiesValidationException {
Plan plan = new PlanSpec().createDockerBuildPlan();
EntityPropertiesBuilders.build(plan);
}
}


@ -1,6 +0,0 @@
___ _ _ ___ __
o O O | _ \ | || | |_ _| _ _ / _| _ _ __ _
o | _/ \_, | | | | ' \ | _| | '_| / _` |
TS__[O] _|_|_ _|__/ |___| |_||_| _|_|_ _|_|_ \__,_|
{======|_| ``` |_| ````|_|`````|_|`````|_|`````|_|`````|_|`````|
./o--000' `-0-0-' `-0-0-' `-0-0-' `-0-0-' `-0-0-' `-0-0-' `-0-0-'

27459
bom.json Normal file

File diff suppressed because it is too large


@ -1,87 +0,0 @@
service:
logging_level: $LOGGING_LEVEL_ROOT|DEBUG # Logging level for service logger
name: $SERVICE_NAME|research # Default service name for research service, used for prometheus metric name
response_formatter: default # formats analysis payloads of response messages
upload_formatter: projecting # formats analysis payloads of objects uploaded to storage
# Note: This is not really the right place for this. It should be configured on a per-service basis.
operation: $OPERATION|default
# operation needs to be specified in deployment config for services that are called without an operation specified
operations:
conversion:
input:
multi: False
subdir: ""
extension: ORIGIN.pdf.gz
output:
subdir: "pages_as_images"
extension: json.gz
extraction:
input:
multi: False
subdir: ""
extension: ORIGIN.pdf.gz
output:
subdir: "extracted_images"
extension: json.gz
table_parsing:
input:
multi: True
subdir: "pages_as_images"
extension: json.gz
output:
subdir: "table_parses"
extension: json.gz
image_classification:
input:
multi: True
subdir: "extracted_images"
extension: json.gz
output:
subdir: ""
extension: IMAGE_INFO.json.gz
default:
input:
multi: False
subdir: ""
extension: in.gz
output:
subdir: ""
extension: out.gz
probing_webserver:
host: $PROBING_WEBSERVER_HOST|"0.0.0.0" # Probe webserver address
port: $PROBING_WEBSERVER_PORT|8080 # Probe webserver port
mode: $PROBING_WEBSERVER_MODE|production # webserver mode: {development, production}
rabbitmq:
host: $RABBITMQ_HOST|localhost # RabbitMQ host address
port: $RABBITMQ_PORT|5672 # RabbitMQ host port
user: $RABBITMQ_USERNAME|user # RabbitMQ username
password: $RABBITMQ_PASSWORD|bitnami # RabbitMQ password
heartbeat: $RABBITMQ_HEARTBEAT|7200 # Controls AMQP heartbeat timeout in seconds
queues:
input: $REQUEST_QUEUE|request_queue # Requests to service
output: $RESPONSE_QUEUE|response_queue # Responses by service
dead_letter: $DEAD_LETTER_QUEUE|dead_letter_queue # Messages that failed to process
callback:
analysis_endpoint: $ANALYSIS_ENDPOINT|"http://127.0.0.1:5000"
storage:
backend: $STORAGE_BACKEND|s3 # The type of storage to use {s3, azure}
bucket: "STORAGE_BUCKET_NAME|STORAGE_AZURECONTAINERNAME|pyinfra-test-bucket" # The bucket / container to pull files specified in queue requests from
s3:
endpoint: $STORAGE_ENDPOINT|"http://127.0.0.1:9000"
access_key: $STORAGE_KEY|root
secret_key: $STORAGE_SECRET|password
region: $STORAGE_REGION|"eu-west-1"
azure:
connection_string: $STORAGE_AZURECONNECTIONSTRING|"DefaultEndpointsProtocol=https;AccountName=iqserdevelopment;AccountKey=4imAbV9PYXaztSOMpIyAClg88bAZCXuXMGJG0GA1eIBpdh2PlnFGoRBnKqLy2YZUSTmZ3wJfC7tzfHtuC6FEhQ==;EndpointSuffix=core.windows.net"
retry:
tries: 3
delay: 5
jitter: [1, 3]


@ -1,76 +0,0 @@
Processing service interface
image classification now : JSON (Mdat PDF) -> (Data PDF -> JSON [Mdat ImObj])
image classification future: JSON [Mdat FunkIm] | Mdat PDF -> (Data [FunkIm] -> JSON [Mdat FunkIm])
object detection : JSON [Mdat PagIm] | Mdat PDF -> (Data [PagIm] -> JSON [[Mdat SemIm]])
NER : JSON [Mdat Dict] -> (Data [Dict] -> JSON [Mdat])
table parsing : JSON [Mdat FunkIm] | Mdat PDF -> (Data [PagIm] -> JSON [[Mdat FunkIm]])
pdf2image : Mdat (fn, [Int], PDF) -> (JSON ([Int], Data PDF) -> [(FunkIm, Mdat)])
image classification now : Mdat (fn, [Int], file) -> (Data PDF -> JSON [Mdat ImObj])
image classification future: Mdat (fn, [Int], dir) -> (Data [FunkIm] -> JSON [Mdat FunkIm])
object detection : Mdat (fn, [Int], dir) -> (Data [PagIm] -> JSON [[Mdat SemIm]])
table parsing : Mdat (fn, [Int], dir) -> (Data [PagIm] -> JSON [[Mdat FunkIm]])
NER : Mdat (fn, [Int], file) -> (Data [Dict] -> JSON [Mdat])
pdf2image : Mdat (fn, [Int], file) -> (JSON ([Int], Data PDF) -> [(FunkIm, Mdat)])
from funcy import identity
access(mdat):
if mdat.path is file:
request = {"data": load(mdat.path), "metadata": mdat}
elif mdat.path is dir:
get_indexed = identity if not mdat.idx else itemgetter(*mdat.idx)
request = {"data": get_indexed(get_files(mdat.path)), "metadata": mdat}
else:
raise BadRequest
storage:
fileId: {
pages: [PagIm]
images: [FunkIm]
sections: gz
}
---------------
assert if targetPath is file then response list must be singleton
{index: [], dir: fileID.pdf.gz, targetPath: fileID.images.json.gz} -> [{data: pdf bytes, metadata: request: ...] -> [{data: null, metadata: request: null, response: {classification infos: ...}]
image classification now : Mdat (fn, [Int], file) -> [JSON (Data PDF, Mdat)] -> [JSON (Data null, Mdat [ImObj])] | 1 -> 1
assert if targetPath is file then response list must be singleton
{index: [], dir: fileID/images, targetPath: fileID.images.json.gz} -> [{data: image bytes, metadata: request: {image location...}] -> [{data: null, metadata: request: null, response: {classification infos: ...}]
image classification future: Mdat (fn, [Int], dir) -> JSON (Data [FunkIm], Mdat) -> [JSON (Data null, Mdat [FunkIm])] |
object detection : Mdat (fn, [Int], dir) -> (Data [PagIm] -> JSON [[Mdat SemIm]])
table parsing : Mdat (fn, [Int], dir) -> (Data [PagIm] -> JSON [[Mdat FunkIm]])
NER : Mdat (fn, [Int], file) -> (Data [Dict] -> JSON [Mdat])
pdf2image : Mdat (fn, [Int], file) -> (JSON ([Int], Data PDF) -> [(FunkIm, Mdat)])
aggregate <==> targetpath is file and index is empty
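
The `access(mdat)` pseudocode in the notes above can be made concrete. The following is a minimal sketch, not the pyinfra implementation: it assumes the metadata is reduced to a path plus an optional list of indices, reads from the local filesystem instead of the object storage, and raises `ValueError` where the notes say `BadRequest`. The function names and the `demo` helper are illustrative only.

```python
import tempfile
from pathlib import Path


def access(path, idx=()):
    """Build a request dict for a file or for (a selection of) the files in a directory."""
    path = Path(path)
    metadata = {"path": str(path), "idx": list(idx)}
    if path.is_file():
        return {"data": path.read_bytes(), "metadata": metadata}
    if path.is_dir():
        files = sorted(p for p in path.iterdir() if p.is_file())
        # An empty index means "take everything", mirroring the `identity` branch in the notes.
        selected = [files[i] for i in idx] if idx else files
        return {"data": [p.read_bytes() for p in selected], "metadata": metadata}
    # Stand-in for the `BadRequest` mentioned in the notes.
    raise ValueError(f"Neither a file nor a directory: {path}")


def demo():
    """Create three page files in a temporary directory and access them in different ways."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            (Path(tmp) / f"page_{i}.bin").write_bytes(f"page {i}".encode())
        everything = access(tmp)["data"]
        subset = access(tmp, idx=[0, 2])["data"]
        single = access(Path(tmp) / "page_1.bin")["data"]
    return everything, subset, single
```

The list comprehension is a deliberate deviation from the notes: `itemgetter(*idx)` returns a bare item rather than a tuple when it is given exactly one index, whereas the comprehension always yields a list.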


@ -1,32 +0,0 @@
version: '2'
services:
minio:
image: minio/minio:RELEASE.2022-06-11T19-55-32Z
ports:
- "9000:9000"
environment:
- MINIO_ROOT_PASSWORD=password
- MINIO_ROOT_USER=root
volumes:
- ./data/minio_store:/data
command: server /data
network_mode: "bridge"
rabbitmq:
image: docker.io/bitnami/rabbitmq:3.9.8
ports:
- '4369:4369'
- '5551:5551'
- '5552:5552'
- '5672:5672'
- '25672:25672'
- '15672:15672'
environment:
- RABBITMQ_SECURE_PASSWORD=yes
- RABBITMQ_VM_MEMORY_HIGH_WATERMARK=100%
- RABBITMQ_DISK_FREE_ABSOLUTE_LIMIT=20Gi
network_mode: "bridge"
volumes:
- /opt/bitnami/rabbitmq/.rabbitmq/:/data/bitnami
volumes:
mdata:

6802
poetry.lock generated Normal file

File diff suppressed because it is too large


@ -0,0 +1 @@


@ -1,65 +0,0 @@
import logging
from funcy import merge, omit, lmap
from pyinfra.exceptions import AnalysisFailure
from pyinfra.pipeline_factory import CachedPipelineFactory
logger = logging.getLogger(__name__)
class Callback:
"""This is the callback that is applied to items pulled from the storage. It forwards these items to an analysis
endpoint.
"""
def __init__(self, pipeline_factory: CachedPipelineFactory):
self.pipeline_factory = pipeline_factory
def __get_pipeline(self, endpoint):
return self.pipeline_factory.get_pipeline(endpoint)
@staticmethod
def __run_pipeline(pipeline, analysis_input: dict):
"""
TODO: Since data and metadata are passed as singletons, there is no buffering and hence no batching happening
within the pipeline. However, the queue acknowledgment logic needs to be changed in order to facilitate
passing non-singletons, to only ack a message, once a response is pulled from the output queue of the
pipeline. Probably the pipeline return value needs to contain the queue message frame (or so), in order for
the queue manager to tell which message to ack.
TODO: casting list (lmap) on `analysis_response_stream` is a temporary solution, while the client pipeline
operates on singletons ([data], [metadata]).
"""
def combine_storage_item_metadata_with_queue_message_metadata(analysis_input):
return merge(analysis_input["metadata"], omit(analysis_input, ["data", "metadata"]))
def remove_queue_message_metadata(analysis_result):
metadata = omit(analysis_result["metadata"], queue_message_keys(analysis_input))
return {**analysis_result, "metadata": metadata}
def queue_message_keys(analysis_input):
return {*analysis_input.keys()}.difference({"data", "metadata"})
try:
data = analysis_input["data"]
metadata = combine_storage_item_metadata_with_queue_message_metadata(analysis_input)
analysis_response_stream = pipeline([data], [metadata])
analysis_response_stream = lmap(remove_queue_message_metadata, analysis_response_stream)
return analysis_response_stream
except Exception as err:
logger.error(err)
raise AnalysisFailure from err
def __call__(self, analysis_input: dict):
"""data_metadata_pack: {'dossierId': ..., 'fileId': ..., 'pages': ..., 'operation': ...}"""
operation = analysis_input.get("operation", "")
pipeline = self.__get_pipeline(operation)
try:
logger.debug(f"Requesting analysis for operation '{operation}'...")
return self.__run_pipeline(pipeline, analysis_input)
except AnalysisFailure:
logger.warning(f"Exception caught when calling analysis endpoint for operation '{operation}'.")
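
The metadata handling in `__run_pipeline` above can be illustrated without `funcy`. This is a standard-library sketch under the assumption that `merge` lets later dictionaries win and `omit` drops the listed keys; the function names `combine_metadata` and `strip_queue_fields` are made up for this example.

```python
def combine_metadata(analysis_input):
    """Merge the storage item's metadata with the remaining queue message fields."""
    queue_fields = {k: v for k, v in analysis_input.items() if k not in ("data", "metadata")}
    # Queue message fields come last, so they win on key collisions.
    return {**analysis_input["metadata"], **queue_fields}


def strip_queue_fields(analysis_result, analysis_input):
    """Remove the queue message fields from a result's metadata again."""
    queue_keys = set(analysis_input) - {"data", "metadata"}
    metadata = {k: v for k, v in analysis_result["metadata"].items() if k not in queue_keys}
    return {**analysis_result, "metadata": metadata}
```

The round trip means the analysis endpoint sees fields such as `dossierId` and `fileId`, while the response metadata only keeps what belongs to the storage item or was added by the analysis.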


@ -1,120 +0,0 @@
import logging
from functools import lru_cache
from funcy import project, identity, rcompose
from pyinfra.callback import Callback
from pyinfra.config import parse_disjunction_string
from pyinfra.file_descriptor_builder import RedFileDescriptorBuilder
from pyinfra.file_descriptor_manager import FileDescriptorManager
from pyinfra.pipeline_factory import CachedPipelineFactory
from pyinfra.queue.consumer import Consumer
from pyinfra.queue.queue_manager.pika_queue_manager import PikaQueueManager
from pyinfra.server.client_pipeline import ClientPipeline
from pyinfra.server.dispatcher.dispatchers.rest import RestDispatcher
from pyinfra.server.interpreter.interpreters.rest_callback import RestPickupStreamer
from pyinfra.server.packer.packers.rest import RestPacker
from pyinfra.server.receiver.receivers.rest import RestReceiver
from pyinfra.storage import storages
from pyinfra.visitor import QueueVisitor
from pyinfra.visitor.downloader import Downloader
from pyinfra.visitor.response_formatter.formatters.default import DefaultResponseFormatter
from pyinfra.visitor.response_formatter.formatters.identity import IdentityResponseFormatter
from pyinfra.visitor.strategies.response.aggregation import AggregationStorageStrategy, ProjectingUploadFormatter
logger = logging.getLogger(__name__)
class ComponentFactory:
def __init__(self, config):
self.config = config
@lru_cache(maxsize=None)
def get_consumer(self, callback=None):
callback = callback or self.get_callback()
return Consumer(self.get_visitor(callback), self.get_queue_manager())
@lru_cache(maxsize=None)
def get_callback(self, analysis_base_url=None):
analysis_base_url = analysis_base_url or self.config.rabbitmq.callback.analysis_endpoint
callback = Callback(CachedPipelineFactory(base_url=analysis_base_url, pipeline_factory=self.get_pipeline))
def wrapped(body):
body_repr = project(body, ["dossierId", "fileId", "operation"])
logger.info(f"Processing {body_repr}...")
result = callback(body)
logger.info(f"Completed processing {body_repr}...")
return result
return wrapped
@lru_cache(maxsize=None)
def get_visitor(self, callback):
return QueueVisitor(
callback=callback,
data_loader=self.get_downloader(),
response_strategy=self.get_response_strategy(),
response_formatter=self.get_response_formatter(),
)
@lru_cache(maxsize=None)
def get_queue_manager(self):
return PikaQueueManager(self.config.rabbitmq.queues.input, self.config.rabbitmq.queues.output)
@staticmethod
@lru_cache(maxsize=None)
def get_pipeline(endpoint):
return ClientPipeline(
RestPacker(), RestDispatcher(endpoint), RestReceiver(), rcompose(RestPickupStreamer(), RestReceiver())
)
@lru_cache(maxsize=None)
def get_storage(self):
return storages.get_storage(self.config.storage.backend)
@lru_cache(maxsize=None)
def get_response_strategy(self, storage=None):
return AggregationStorageStrategy(
storage=storage or self.get_storage(),
file_descriptor_manager=self.get_file_descriptor_manager(),
upload_formatter=self.get_upload_formatter(),
)
@lru_cache(maxsize=None)
def get_file_descriptor_manager(self):
return FileDescriptorManager(
bucket_name=parse_disjunction_string(self.config.storage.bucket),
file_descriptor_builder=self.get_operation_file_descriptor_builder(),
)
@lru_cache(maxsize=None)
def get_upload_formatter(self):
return {"identity": identity, "projecting": ProjectingUploadFormatter()}[self.config.service.upload_formatter]
@lru_cache(maxsize=None)
def get_operation_file_descriptor_builder(self):
return RedFileDescriptorBuilder(
operation2file_patterns=self.get_operation2file_patterns(),
default_operation_name=self.config.service.operation,
)
@lru_cache(maxsize=None)
def get_response_formatter(self):
return {"default": DefaultResponseFormatter(), "identity": IdentityResponseFormatter()}[
self.config.service.response_formatter
]
@lru_cache(maxsize=None)
def get_operation2file_patterns(self):
if self.config.service.operation != "default":
self.config.service.operations["default"] = self.config.service.operations[self.config.service.operation]
return self.config.service.operations
@lru_cache(maxsize=None)
def get_downloader(self, storage=None):
return Downloader(
storage=storage or self.get_storage(),
bucket_name=parse_disjunction_string(self.config.storage.bucket),
file_descriptor_manager=self.get_file_descriptor_manager(),
)


@ -1,84 +0,0 @@
"""Implements a config object with dot-indexing syntax."""
import os
from functools import partial
from itertools import chain
from operator import truth
from typing import Iterable
from envyaml import EnvYAML
from frozendict import frozendict
from funcy import first, juxt, butlast, last, lmap
from pyinfra.locations import CONFIG_FILE
def _get_item_and_maybe_make_dotindexable(container, item):
ret = container[item]
return DotIndexable(ret) if isinstance(ret, dict) else ret
class DotIndexable:
def __init__(self, x):
self.x = x
def __getattr__(self, item):
return _get_item_and_maybe_make_dotindexable(self.x, item)
def __repr__(self):
return self.x.__repr__()
def __getitem__(self, item):
return self.__getattr__(item)
def __setitem__(self, key, value):
self.x[key] = value
class Config:
def __init__(self, config_path):
self.__config = EnvYAML(config_path)
def __getattr__(self, item):
if item in self.__config:
return _get_item_and_maybe_make_dotindexable(self.__config, item)
def __getitem__(self, item):
return self.__getattr__(item)
def __setitem__(self, key, value):
self.__config[key] = value
def to_dict(self, frozen=True):
return to_dict(self.__config.export(), frozen=frozen)
def __hash__(self):
return hash(self.to_dict())
def to_dict(v, frozen=True):
def make_dict(*args, **kwargs):
return (frozendict if frozen else dict)(*args, **kwargs)
if isinstance(v, list):
return tuple(map(partial(to_dict, frozen=frozen), v))
elif isinstance(v, DotIndexable):
return make_dict({key: to_dict(value, frozen=frozen) for key, value in v.x.items()})
elif isinstance(v, dict):
return make_dict({key: to_dict(value, frozen=frozen) for key, value in v.items()})
else:
return v
CONFIG = Config(CONFIG_FILE)
def parse_disjunction_string(disjunction_string):
def try_parse_env_var(disjunction_string):
try:
return os.environ[disjunction_string]
except KeyError:
return None
options = disjunction_string.split("|")
identifiers, fallback_value = juxt(butlast, last)(options)
return first(chain(filter(truth, map(try_parse_env_var, identifiers)), [fallback_value]))
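
The resolution order of `parse_disjunction_string` may be easier to see in a version without `funcy`. The sketch below is an assumed equivalent, not the original code: all options but the last are treated as environment variable names, the last one is a literal fallback, and unset or empty variables are skipped. The name `resolve_disjunction` and the injectable `environ` parameter are additions for illustration and testing.

```python
import os


def resolve_disjunction(disjunction_string, environ=None):
    """Return the first non-empty environment value named in the string, else the last option."""
    environ = os.environ if environ is None else environ
    *identifiers, fallback = disjunction_string.split("|")
    for name in identifiers:
        value = environ.get(name)
        if value:  # skips both missing and empty variables
            return value
    return fallback
```

For the bucket setting in the deleted config, this would mean that `STORAGE_BUCKET_NAME` takes precedence over `STORAGE_AZURECONTAINERNAME`, and `pyinfra-test-bucket` is used only when neither is set.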

133
pyinfra/config/loader.py Normal file

@ -0,0 +1,133 @@
import argparse
import os
from functools import partial
from pathlib import Path
from typing import Union
from dynaconf import Dynaconf, ValidationError, Validator
from funcy import lflatten
from kn_utils.logging import logger
# This path is meant for testing purposes and convenience. It probably won't reflect the actual root path when pyinfra is
# installed as a package, so don't use it in production code, but define your own root path as described in load_settings.
local_pyinfra_root_path = Path(__file__).parents[2]
def load_settings(
settings_path: Union[str, Path, list] = "config/",
root_path: Union[str, Path] = None,
validators: list[Validator] = None,
):
"""Load settings from .toml files, .env and environment variables. Also ensures a ROOT_PATH environment variable is
set. If ROOT_PATH is not set and no root_path argument is passed, the current working directory is used as root.
Settings paths can be a single .toml file, a folder containing .toml files or a list of .toml files and folders.
If a ROOT_PATH environment variable is set, it is not overwritten by the root_path argument.
If a folder is passed, all .toml files in the folder are loaded. If settings path is None, only .env and
environment variables are loaded. If settings_path are relative paths, they are joined with the root_path argument.
"""
root_path = get_or_set_root_path(root_path)
validators = validators or get_pyinfra_validators()
settings_files = normalize_to_settings_files(settings_path, root_path)
settings = Dynaconf(
load_dotenv=True,
envvar_prefix=False,
settings_files=settings_files,
)
validate_settings(settings, validators)
logger.info("Settings loaded and validated.")
return settings
def normalize_to_settings_files(settings_path: Union[str, Path, list], root_path: Union[str, Path]):
if settings_path is None:
logger.info("No settings path specified, only loading .env and ENVs.")
settings_files = []
elif isinstance(settings_path, str) or isinstance(settings_path, Path):
settings_files = [settings_path]
elif isinstance(settings_path, list):
settings_files = settings_path
else:
raise ValueError(f"Invalid settings path: {settings_path=}")
settings_files = lflatten(map(partial(_normalize_and_verify, root_path=root_path), settings_files))
logger.debug(f"Normalized settings files: {settings_files}")
return settings_files
def _normalize_and_verify(settings_path: Path, root_path: Path):
settings_path = Path(settings_path)
root_path = Path(root_path)
if not settings_path.is_absolute():
logger.debug(f"Settings path is not absolute, joining with root path: {root_path}")
settings_path = root_path / settings_path
if settings_path.is_dir():
logger.debug(f"Settings path is a directory, loading all .toml files in the directory: {settings_path}")
settings_files = list(settings_path.glob("*.toml"))
elif settings_path.is_file():
logger.debug(f"Settings path is a file, loading specified file: {settings_path}")
settings_files = [settings_path]
else:
raise ValueError(f"Invalid settings path: {settings_path=}, {root_path=}")
return settings_files
def get_or_set_root_path(root_path: Union[str, Path] = None):
env_root_path = os.environ.get("ROOT_PATH")
if env_root_path:
root_path = env_root_path
logger.debug(f"'ROOT_PATH' environment variable is set to {root_path}.")
elif root_path:
logger.info(f"'ROOT_PATH' environment variable is not set, setting to {root_path}.")
os.environ["ROOT_PATH"] = str(root_path)
else:
root_path = Path.cwd()
logger.info(f"'ROOT_PATH' environment variable is not set, defaulting to working directory {root_path}.")
os.environ["ROOT_PATH"] = str(root_path)
return root_path
def get_pyinfra_validators():
import pyinfra.config.validators
return lflatten(
validator for validator in pyinfra.config.validators.__dict__.values() if isinstance(validator, list)
)
def validate_settings(settings: Dynaconf, validators):
settings_valid = True
for validator in validators:
try:
validator.validate(settings)
except ValidationError as e:
settings_valid = False
logger.warning(e)
if not settings_valid:
raise ValidationError("Settings validation failed.")
logger.debug("Settings validated.")
def parse_settings_path():
parser = argparse.ArgumentParser()
parser.add_argument(
"settings_path",
help="Path to settings file(s) or folder(s). Must be .toml file(s) or folder(s) containing .toml files.",
nargs="+",
)
return parser.parse_args().settings_path
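
The precedence implemented by `get_or_set_root_path` (environment variable first, then the argument, then the working directory) can be sketched as a function over an explicit mapping. This is an illustration under assumptions, not the loader's code: `resolve_root_path`, the `environ` parameter and the `cwd` parameter are hypothetical, and unlike the original the sketch always returns a `Path`.

```python
from pathlib import Path


def resolve_root_path(environ, root_path=None, cwd=None):
    """Resolve the root path and make sure ROOT_PATH is set in the given mapping."""
    env_root = environ.get("ROOT_PATH")
    if env_root:
        # An existing ROOT_PATH is never overwritten by the argument.
        return Path(env_root)
    resolved = Path(root_path) if root_path else Path(cwd or Path.cwd())
    environ["ROOT_PATH"] = str(resolved)
    return resolved
```

Passing the mapping explicitly keeps the sketch testable without touching `os.environ`; in the loader itself the process environment is read and written directly.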


@ -0,0 +1,57 @@
from dynaconf import Validator
queue_manager_validators = [
Validator("rabbitmq.host", must_exist=True, is_type_of=str),
Validator("rabbitmq.port", must_exist=True, is_type_of=int),
Validator("rabbitmq.username", must_exist=True, is_type_of=str),
Validator("rabbitmq.password", must_exist=True, is_type_of=str),
Validator("rabbitmq.heartbeat", must_exist=True, is_type_of=int),
Validator("rabbitmq.connection_sleep", must_exist=True, is_type_of=int),
Validator("rabbitmq.input_queue", must_exist=True, is_type_of=str),
Validator("rabbitmq.output_queue", must_exist=True, is_type_of=str),
Validator("rabbitmq.dead_letter_queue", must_exist=True, is_type_of=str),
]
azure_storage_validators = [
Validator("storage.azure.connection_string", must_exist=True, is_type_of=str),
Validator("storage.azure.container", must_exist=True, is_type_of=str),
]
s3_storage_validators = [
Validator("storage.s3.endpoint", must_exist=True, is_type_of=str),
Validator("storage.s3.key", must_exist=True, is_type_of=str),
Validator("storage.s3.secret", must_exist=True, is_type_of=str),
Validator("storage.s3.region", must_exist=True, is_type_of=str),
Validator("storage.s3.bucket", must_exist=True, is_type_of=str),
]
storage_validators = [
Validator("storage.backend", must_exist=True, is_type_of=str),
]
multi_tenant_storage_validators = [
Validator("storage.tenant_server.endpoint", must_exist=True, is_type_of=str),
Validator("storage.tenant_server.public_key", must_exist=True, is_type_of=str),
]
prometheus_validators = [
Validator("metrics.prometheus.prefix", must_exist=True, is_type_of=str),
Validator("metrics.prometheus.enabled", must_exist=True, is_type_of=bool),
]
webserver_validators = [
Validator("webserver.host", must_exist=True, is_type_of=str),
Validator("webserver.port", must_exist=True, is_type_of=int),
]
tracing_validators = [
Validator("tracing.enabled", must_exist=True, is_type_of=bool),
Validator("tracing.type", must_exist=True, is_type_of=str)
]
opentelemetry_validators = [
Validator("tracing.opentelemetry.endpoint", must_exist=True, is_type_of=str),
Validator("tracing.opentelemetry.service_name", must_exist=True, is_type_of=str),
Validator("tracing.opentelemetry.exporter", must_exist=True, is_type_of=str)
]
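
What the `must_exist` and `is_type_of` checks above express, combined with the collect-all-failures behaviour of `validate_settings`, can be imitated on a plain nested dictionary. The following is a standard-library stand-in and not Dynaconf's implementation; `find_violations` and `REQUIREMENTS` are names invented for this sketch.

```python
def find_violations(settings, requirements):
    """Return one message per dotted key that is missing or has the wrong type."""
    violations = []
    for dotted_key, expected_type in requirements.items():
        node = settings
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                violations.append(f"{dotted_key} is required")
                break
            node = node[part]
        else:
            # Only reached when every path segment was found.
            if not isinstance(node, expected_type):
                violations.append(f"{dotted_key} must be of type {expected_type.__name__}")
    return violations


# A small, assumed subset of the keys checked by the validators above.
REQUIREMENTS = {"rabbitmq.host": str, "rabbitmq.port": int, "webserver.port": int}
```

Reporting every violation at once, rather than stopping at the first, lets a misconfigured deployment be fixed in one pass.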


@ -1,8 +0,0 @@
from functools import lru_cache
from pyinfra.component_factory import ComponentFactory
@lru_cache(maxsize=None)
def get_component_factory(config):
return ComponentFactory(config)
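
Caching a factory per config with `lru_cache` only works if the config object is hashable, which is presumably why the removed `Config` class defined `__hash__`. The sketch below shows the pattern with hypothetical stand-ins (`FrozenConfig`, `Factory`, `get_factory`); none of them are pyinfra classes.

```python
from functools import lru_cache


class FrozenConfig:
    """Hypothetical stand-in for the config object; hashable so it can serve as a cache key."""

    def __init__(self, **values):
        self._values = tuple(sorted(values.items()))

    def __hash__(self):
        return hash(self._values)


class Factory:
    """Hypothetical stand-in for the component factory."""

    def __init__(self, config):
        self.config = config


@lru_cache(maxsize=None)
def get_factory(config):
    # Called once per distinct cache key; later calls return the cached instance.
    return Factory(config)
```

Because the sketch defines `__hash__` but no `__eq__` (as the removed `Config` did), equality falls back to identity: passing the same object twice returns the same factory, while two separately constructed configs with equal content get separate factories.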

169
pyinfra/examples.py Normal file

@ -0,0 +1,169 @@
import asyncio
import signal
import sys
import aiohttp
from aiormq.exceptions import AMQPConnectionError
from dynaconf import Dynaconf
from fastapi import FastAPI
from kn_utils.logging import logger
from pyinfra.config.loader import get_pyinfra_validators, validate_settings
from pyinfra.queue.async_manager import AsyncQueueManager, RabbitMQConfig
from pyinfra.queue.callback import Callback
from pyinfra.queue.manager import QueueManager
from pyinfra.utils.opentelemetry import instrument_app, instrument_pika, setup_trace
from pyinfra.webserver.prometheus import (
add_prometheus_endpoint,
make_prometheus_processing_time_decorator_from_settings,
)
from pyinfra.webserver.utils import (
add_health_check_endpoint,
create_webserver_thread_from_settings,
run_async_webserver,
)
shutdown_flag = False
async def graceful_shutdown(manager: AsyncQueueManager, queue_task, webserver_task):
global shutdown_flag
shutdown_flag = True
logger.info("SIGTERM received, shutting down gracefully...")
if queue_task and not queue_task.done():
queue_task.cancel()
# await queue manager shutdown
await asyncio.gather(queue_task, manager.shutdown(), return_exceptions=True)
if webserver_task and not webserver_task.done():
webserver_task.cancel()
# await webserver shutdown
await asyncio.gather(webserver_task, return_exceptions=True)
logger.info("Shutdown complete.")
async def run_async_queues(manager: AsyncQueueManager, app, port, host):
"""Run the async webserver and the async queue manager concurrently."""
queue_task = None
webserver_task = None
tenant_api_available = True
# add signal handler for SIGTERM and SIGINT
loop = asyncio.get_running_loop()
loop.add_signal_handler(
signal.SIGTERM, lambda: asyncio.create_task(graceful_shutdown(manager, queue_task, webserver_task))
)
loop.add_signal_handler(
signal.SIGINT, lambda: asyncio.create_task(graceful_shutdown(manager, queue_task, webserver_task))
)
try:
active_tenants = await manager.fetch_active_tenants()
queue_task = asyncio.create_task(manager.run(active_tenants=active_tenants), name="queues")
webserver_task = asyncio.create_task(run_async_webserver(app, port, host), name="webserver")
await asyncio.gather(queue_task, webserver_task)
except asyncio.CancelledError:
logger.info("Main task was cancelled, initiating shutdown.")
except AMQPConnectionError as e:
logger.warning(f"AMQPConnectionError: {e} - shutting down.")
except (aiohttp.ClientResponseError, aiohttp.ClientConnectorError):
logger.warning("Tenant server did not answer - shutting down.")
tenant_api_available = False
except Exception as e:
logger.error(f"An error occurred while running async queues: {e}", exc_info=True)
sys.exit(1)
finally:
if shutdown_flag:
logger.debug("Graceful shutdown already in progress.")
else:
logger.warning("Initiating shutdown due to error or manual interruption.")
if not tenant_api_available:
sys.exit(0)
if queue_task and not queue_task.done():
queue_task.cancel()
if webserver_task and not webserver_task.done():
webserver_task.cancel()
await asyncio.gather(*(t for t in (queue_task, webserver_task) if t is not None), manager.shutdown(), return_exceptions=True)
logger.info("Shutdown complete.")
def start_standard_queue_consumer(
callback: Callback,
settings: Dynaconf,
app: FastAPI = None,
):
"""Default serving logic for research services.
Supplies /health, /ready and /prometheus endpoints (if enabled). The callback is monitored for processing time per
message. Queue messages are also traced via OpenTelemetry (if enabled).
Workload is received via queue messages and processed by the callback function (see pyinfra.queue.callback for
callbacks).
"""
validate_settings(settings, get_pyinfra_validators())
logger.info("Starting webserver and queue consumer...")
app = app or FastAPI()
if settings.metrics.prometheus.enabled:
logger.info("Prometheus metrics enabled.")
app = add_prometheus_endpoint(app)
callback = make_prometheus_processing_time_decorator_from_settings(settings)(callback)
if settings.tracing.enabled:
setup_trace(settings)
instrument_pika(dynamic_queues=settings.dynamic_tenant_queues.enabled)
instrument_app(app)
if settings.dynamic_tenant_queues.enabled:
logger.info("Dynamic tenant queues enabled. Running async queues.")
config = RabbitMQConfig(
host=settings.rabbitmq.host,
port=settings.rabbitmq.port,
username=settings.rabbitmq.username,
password=settings.rabbitmq.password,
heartbeat=settings.rabbitmq.heartbeat,
input_queue_prefix=settings.rabbitmq.service_request_queue_prefix,
tenant_event_queue_suffix=settings.rabbitmq.tenant_event_queue_suffix,
tenant_exchange_name=settings.rabbitmq.tenant_exchange_name,
service_request_exchange_name=settings.rabbitmq.service_request_exchange_name,
service_response_exchange_name=settings.rabbitmq.service_response_exchange_name,
service_dead_letter_queue_name=settings.rabbitmq.service_dlq_name,
queue_expiration_time=settings.rabbitmq.queue_expiration_time,
pod_name=settings.kubernetes.pod_name,
)
manager = AsyncQueueManager(
config=config,
tenant_service_url=settings.storage.tenant_server.endpoint,
message_processor=callback,
max_concurrent_tasks=(
settings.asyncio.max_concurrent_tasks if hasattr(settings.asyncio, "max_concurrent_tasks") else 10
),
)
else:
logger.info("Dynamic tenant queues disabled. Running sync queues.")
manager = QueueManager(settings)
app = add_health_check_endpoint(app, manager.is_ready)
if isinstance(manager, AsyncQueueManager):
asyncio.run(run_async_queues(manager, app, port=settings.webserver.port, host=settings.webserver.host))
elif isinstance(manager, QueueManager):
webserver = create_webserver_thread_from_settings(app, settings)
webserver.start()
try:
manager.start_consuming(callback)
except Exception as e:
logger.error(f"An error occurred while consuming messages: {e}", exc_info=True)
sys.exit(1)
else:
logger.warning(f"Behavior for type {type(manager)} is not defined")
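
The shutdown path above cancels the queue and webserver tasks and then gathers them. A minimal, self-contained sketch of that cancel-and-wait pattern follows. The helper name `cancel_and_wait` and the `worker` coroutine are hypothetical and not part of pyinfra; the sketch also assumes that tasks which were never created (still `None`) should simply be skipped.

```python
import asyncio


async def cancel_and_wait(*tasks):
    """Cancel every task that is still running, then wait for all of them."""
    # Hypothetical helper: skip tasks that were never created (None).
    existing = [task for task in tasks if task is not None]
    for task in existing:
        if not task.done():
            task.cancel()
    # return_exceptions=True collects CancelledError instead of raising it.
    return await asyncio.gather(*existing, return_exceptions=True)


async def worker():
    # Stand-in for a long-running consumer or webserver coroutine.
    await asyncio.sleep(3600)


async def main():
    task = asyncio.create_task(worker(), name="queues")
    await asyncio.sleep(0)  # give the worker a chance to start
    results = await cancel_and_wait(task, None)
    return task.cancelled(), results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Filtering out `None` before gathering matters because `asyncio.gather` rejects arguments that are not awaitable.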


@ -1,50 +0,0 @@
class AnalysisFailure(Exception):
pass
class DataLoadingFailure(Exception):
pass
class ProcessingFailure(Exception):
pass
class UnknownStorageBackend(ValueError):
pass
class InvalidEndpoint(ValueError):
pass
class UnknownClient(ValueError):
pass
class ConsumerError(Exception):
pass
class NoSuchContainer(KeyError):
pass
class IntentionalTestException(RuntimeError):
pass
class UnexpectedItemType(ValueError):
pass
class NoBufferCapacity(ValueError):
pass
class InvalidMessage(ValueError):
pass
class InvalidStorageItemFormat(ValueError):
pass


@ -1,99 +0,0 @@
import abc
import os
from operator import itemgetter
from funcy import project
class FileDescriptorBuilder:
@abc.abstractmethod
def build_file_descriptor(self, queue_item_body, end="input"):
raise NotImplementedError
@abc.abstractmethod
def build_matcher(self, file_descriptor):
raise NotImplementedError
@staticmethod
@abc.abstractmethod
def build_storage_upload_info(analysis_payload, request_metadata):
raise NotImplementedError
@abc.abstractmethod
def get_path_prefix(self, queue_item_body):
raise NotImplementedError
class RedFileDescriptorBuilder(FileDescriptorBuilder):
"""Defines concrete descriptors for storage objects based on queue messages"""
def __init__(self, operation2file_patterns, default_operation_name):
self.operation2file_patterns = operation2file_patterns or self.get_default_operation2file_patterns()
self.default_operation_name = default_operation_name
@staticmethod
def get_default_operation2file_patterns():
return {"default": {"input": {"subdir": "", "extension": ".in"}, "output": {"subdir": "", "extension": ".out"}}}
def build_file_descriptor(self, queue_item_body, end="input"):
def pages():
if end == "input":
if "id" in queue_item_body:
return [queue_item_body["id"]]
else:
return queue_item_body["pages"] if file_pattern["multi"] else []
elif end == "output":
return [queue_item_body["id"]]
else:
raise ValueError(f"Invalid argument: {end=}") # TODO: use an enum for `end`
operation = queue_item_body.get("operation", self.default_operation_name)
file_pattern = self.operation2file_patterns[operation][end]
file_descriptor = {
**project(queue_item_body, ["dossierId", "fileId", "pages"]),
"pages": pages(),
"extension": file_pattern["extension"],
"subdir": file_pattern["subdir"],
}
return file_descriptor
def build_matcher(self, file_descriptor):
def make_filename(file_id, subdir, suffix):
return os.path.join(file_id, subdir, suffix) if subdir else f"{file_id}.{suffix}"
dossier_id, file_id, subdir, pages, extension = itemgetter(
"dossierId", "fileId", "subdir", "pages", "extension"
)(file_descriptor)
matcher = os.path.join(
dossier_id, make_filename(file_id, subdir, self.__build_page_regex(pages, subdir) + extension)
)
return matcher
@staticmethod
def __build_page_regex(pages, subdir):
n_pages = len(pages)
if n_pages > 1:
page_re = "id:(" + "|".join(map(str, pages)) + ")."
elif n_pages == 1:
page_re = f"id:{pages[0]}."
else: # no pages specified -> either all pages or no pages, depending on whether a subdir is specified
page_re = r"id:\d+." if subdir else ""
return page_re
@staticmethod
def build_storage_upload_info(analysis_payload, request_metadata):
storage_upload_info = {**request_metadata, "id": analysis_payload["metadata"].get("id", 0)}
return storage_upload_info
def get_path_prefix(self, queue_item_body):
prefix = "/".join(itemgetter("dossierId", "fileId")(self.build_file_descriptor(queue_item_body, end="input")))
return prefix


@ -1,63 +0,0 @@
from pyinfra.file_descriptor_builder import FileDescriptorBuilder
class FileDescriptorManager:
"""Decorates a file descriptor builder with additional convenience functionality and this way provides a
comprehensive interface for all file descriptor related operations, while the concrete descriptor logic is
implemented in a file descriptor builder.
TODO: This is supposed to be fully decoupled from the concrete file descriptor builder implementation, however some
bad coupling is still left.
"""
def __init__(self, bucket_name, file_descriptor_builder: FileDescriptorBuilder):
self.bucket_name = bucket_name
self.operation_file_descriptor_builder = file_descriptor_builder
def get_input_object_name(self, queue_item_body: dict):
return self.get_object_name(queue_item_body, end="input")
def get_output_object_name(self, queue_item_body: dict):
return self.get_object_name(queue_item_body, end="output")
def get_object_name(self, queue_item_body: dict, end):
file_descriptor = self.build_file_descriptor(queue_item_body, end=end)
object_name = self.__build_matcher(file_descriptor)
return object_name
def build_file_descriptor(self, queue_item_body, end="input"):
return self.operation_file_descriptor_builder.build_file_descriptor(queue_item_body, end=end)
def build_input_matcher(self, queue_item_body):
return self.build_matcher(queue_item_body, end="input")
def build_output_matcher(self, queue_item_body):
return self.build_matcher(queue_item_body, end="output")
def build_matcher(self, queue_item_body, end):
file_descriptor = self.build_file_descriptor(queue_item_body, end=end)
return self.__build_matcher(file_descriptor)
def __build_matcher(self, file_descriptor):
return self.operation_file_descriptor_builder.build_matcher(file_descriptor)
def get_input_object_descriptor(self, queue_item_body):
return self.get_object_descriptor(queue_item_body, end="input")
def get_output_object_descriptor(self, storage_upload_info):
return self.get_object_descriptor(storage_upload_info, end="output")
def get_object_descriptor(self, queue_item_body, end):
# TODO: this is complected with the Storage class API
# FIXME: bad coupling
return {
"bucket_name": self.bucket_name,
"object_name": self.get_object_name(queue_item_body, end=end),
}
def build_storage_upload_info(self, analysis_payload, request_metadata):
return self.operation_file_descriptor_builder.build_storage_upload_info(analysis_payload, request_metadata)
def get_path_prefix(self, queue_item_body):
return self.operation_file_descriptor_builder.get_path_prefix(queue_item_body)


@ -1,63 +0,0 @@
import logging
import requests
from flask import Flask, jsonify
from waitress import serve
from pyinfra.config import CONFIG
logger = logging.getLogger()
def run_probing_webserver(app, host=None, port=None, mode=None):
if not host:
host = CONFIG.probing_webserver.host
if not port:
port = CONFIG.probing_webserver.port
if not mode:
mode = CONFIG.probing_webserver.mode
if mode == "development":
app.run(host=host, port=port, debug=True)
elif mode == "production":
serve(app, host=host, port=port)
def set_up_probing_webserver():
# TODO: implement meaningful checks
app = Flask(__name__)
informed_about_missing_prometheus_endpoint = False
@app.route("/ready", methods=["GET"])
def ready():
resp = jsonify("OK")
resp.status_code = 200
return resp
@app.route("/health", methods=["GET"])
def healthy():
resp = jsonify("OK")
resp.status_code = 200
return resp
@app.route("/prometheus", methods=["GET"])
def get_metrics_from_analysis_endpoint():
nonlocal informed_about_missing_prometheus_endpoint
try:
resp = requests.get(f"{CONFIG.rabbitmq.callback.analysis_endpoint}/prometheus")
resp.raise_for_status()
except ConnectionError:
return ""
except requests.exceptions.HTTPError as err:
if resp.status_code == 404:
if not informed_about_missing_prometheus_endpoint:
logger.warning(f"Got no metrics from analysis prometheus endpoint: {err}")
informed_about_missing_prometheus_endpoint = True
else:
logging.warning(f"Caught {err}")
return resp.text
return app


@ -1,18 +0,0 @@
"""Defines constant paths relative to the module root path."""
from pathlib import Path
MODULE_DIR = Path(__file__).resolve().parents[0]
PACKAGE_ROOT_DIR = MODULE_DIR.parents[0]
TEST_DIR = PACKAGE_ROOT_DIR / "test"
CONFIG_FILE = PACKAGE_ROOT_DIR / "config.yaml"
TEST_CONFIG_FILE = TEST_DIR / "config.yaml"
COMPOSE_PATH = PACKAGE_ROOT_DIR
BANNER_FILE = PACKAGE_ROOT_DIR / "banner.txt"


@ -1,14 +0,0 @@
import abc
class ParsingError(Exception):
pass
class BlobParser(abc.ABC):
@abc.abstractmethod
def parse(self, blob: bytes):
pass
def __call__(self, blob: bytes):
return self.parse(blob)


@ -1,67 +0,0 @@
import logging
from funcy import rcompose
from pyinfra.parser.blob_parser import ParsingError
logger = logging.getLogger(__name__)
class Either:
def __init__(self, item):
self.item = item
def bind(self):
return self.item
class Left(Either):
pass
class Right(Either):
pass
class EitherParserWrapper:
def __init__(self, parser):
self.parser = parser
def __log(self, result):
if isinstance(result, Right):
logger.log(logging.DEBUG - 5, f"{self.parser.__class__.__name__} succeeded or forwarded on {result.bind()}")
else:
logger.log(logging.DEBUG - 5, f"{self.parser.__class__.__name__} failed on {result.bind()}")
return result
def parse(self, item: Either):
if isinstance(item, Left):
try:
return Right(self.parser(item.bind()))
except ParsingError:
return item
elif isinstance(item, Right):
return item
else:
return self.parse(Left(item))
def __call__(self, item):
return self.__log(self.parse(item))
class EitherParserComposer:
def __init__(self, *parsers):
self.parser = rcompose(*map(EitherParserWrapper, parsers))
def parse(self, item):
result = self.parser(item)
if isinstance(result, Right):
return result.bind()
else:
raise ParsingError("All parsers failed.")
def __call__(self, item):
return self.parse(item)
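
The composer above effectively returns the result of the first parser that succeeds and raises only when every parser fails. A minimal sketch of the same behavior with a plain loop is shown below. The function names `parse_json`, `parse_string` and `parse_with_fallback` are illustrative assumptions, not pyinfra API.

```python
import json


class ParsingError(Exception):
    """Raised when a parser cannot handle the given blob."""


def parse_json(blob: bytes):
    # Hypothetical parser: decode bytes and load them as JSON.
    try:
        return json.loads(blob.decode())
    except (UnicodeDecodeError, json.JSONDecodeError) as err:
        raise ParsingError from err


def parse_string(blob: bytes):
    # Hypothetical parser: decode bytes as UTF-8 text.
    try:
        return blob.decode()
    except UnicodeDecodeError as err:
        raise ParsingError from err


def parse_with_fallback(blob: bytes, *parsers):
    """Try each parser in order and return the first successful result."""
    for parser in parsers:
        try:
            return parser(blob)
        except ParsingError:
            continue
    raise ParsingError("All parsers failed.")
```

Parser order matters here: placing `parse_string` first would accept any valid UTF-8 blob as text and JSON would never be parsed.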


@ -1,7 +0,0 @@
from pyinfra.parser.blob_parser import BlobParser
class IdentityBlobParser(BlobParser):
def parse(self, data: bytes):
return data


@ -1,21 +0,0 @@
import json
from pyinfra.parser.blob_parser import BlobParser, ParsingError
from pyinfra.server.packing import string_to_bytes
class JsonBlobParser(BlobParser):
def parse(self, data: bytes):
try:
data = data.decode()
data = json.loads(data)
except (UnicodeDecodeError, json.JSONDecodeError, AttributeError) as err:
raise ParsingError from err
try:
data["data"] = string_to_bytes(data["data"])
except (KeyError, TypeError) as err:
raise ParsingError from err
return data


@ -1,9 +0,0 @@
from pyinfra.parser.blob_parser import BlobParser, ParsingError
class StringBlobParser(BlobParser):
def parse(self, data: bytes):
try:
return data.decode()
except Exception as err:
raise ParsingError from err


@ -1,18 +0,0 @@
class CachedPipelineFactory:
def __init__(self, base_url, pipeline_factory):
self.base_url = base_url
self.operation2pipeline = {}
self.pipeline_factory = pipeline_factory
def get_pipeline(self, operation: str):
pipeline = self.operation2pipeline.get(operation, None) or self.__register_pipeline(operation)
return pipeline
def __register_pipeline(self, operation):
endpoint = self.__make_endpoint(operation)
pipeline = self.pipeline_factory(endpoint)
self.operation2pipeline[operation] = pipeline
return pipeline
def __make_endpoint(self, operation):
return f"{self.base_url}/{operation}"


@ -0,0 +1,329 @@
import asyncio
import concurrent.futures
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set
import aiohttp
from aio_pika import ExchangeType, IncomingMessage, Message, connect
from aio_pika.abc import (
AbstractChannel,
AbstractConnection,
AbstractExchange,
AbstractIncomingMessage,
AbstractQueue,
)
from aio_pika.exceptions import (
ChannelClosed,
ChannelInvalidStateError,
ConnectionClosed,
)
from aiormq.exceptions import AMQPConnectionError
from kn_utils.logging import logger
from kn_utils.retry import retry
@dataclass
class RabbitMQConfig:
host: str
port: int
username: str
password: str
heartbeat: int
input_queue_prefix: str
tenant_event_queue_suffix: str
tenant_exchange_name: str
service_request_exchange_name: str
service_response_exchange_name: str
service_dead_letter_queue_name: str
queue_expiration_time: int
pod_name: str
connection_params: Dict[str, object] = field(init=False)
def __post_init__(self):
self.connection_params = {
"host": self.host,
"port": self.port,
"login": self.username,
"password": self.password,
"client_properties": {"heartbeat": self.heartbeat},
}
class AsyncQueueManager:
def __init__(
self,
config: RabbitMQConfig,
tenant_service_url: str,
message_processor: Callable[[Dict[str, Any]], Dict[str, Any]],
max_concurrent_tasks: int = 10,
):
self.config = config
self.tenant_service_url = tenant_service_url
self.message_processor = message_processor
self.semaphore = asyncio.Semaphore(max_concurrent_tasks)
self.connection: AbstractConnection | None = None
self.channel: AbstractChannel | None = None
self.tenant_exchange: AbstractExchange | None = None
self.input_exchange: AbstractExchange | None = None
self.output_exchange: AbstractExchange | None = None
self.tenant_exchange_queue: AbstractQueue | None = None
self.tenant_queues: Dict[str, AbstractQueue] = {}
self.consumer_tags: Dict[str, str] = {}
self.message_count: int = 0
@retry(tries=5, exceptions=AMQPConnectionError, reraise=True, logger=logger)
async def connect(self) -> None:
logger.info("Attempting to connect to RabbitMQ...")
self.connection = await connect(**self.config.connection_params)
self.connection.close_callbacks.add(self.on_connection_close)
self.channel = await self.connection.channel()
await self.channel.set_qos(prefetch_count=1)
logger.info("Successfully connected to RabbitMQ")
async def on_connection_close(self, sender, exc):
"""This is a callback for unexpected connection closures."""
logger.debug(f"Sender: {sender}")
if isinstance(exc, ConnectionClosed):
logger.warning("Connection to RabbitMQ lost. Attempting to reconnect...")
try:
active_tenants = await self.fetch_active_tenants()
await self.run(active_tenants=active_tenants)
logger.debug("Reconnected to RabbitMQ successfully")
except Exception as e:
logger.warning(f"Failed to reconnect to RabbitMQ: {e}")
# cancel queue manager and webserver to shutdown service
tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
[task.cancel() for task in tasks if task.get_name() in ["queues", "webserver"]]
else:
logger.debug("Connection closed on purpose.")
async def is_ready(self) -> bool:
if self.connection is None or self.connection.is_closed:
try:
await self.connect()
except Exception as e:
logger.error(f"Failed to connect to RabbitMQ: {e}")
return False
return True
@retry(tries=5, exceptions=(AMQPConnectionError, ChannelInvalidStateError), reraise=True, logger=logger)
async def setup_exchanges(self) -> None:
self.tenant_exchange = await self.channel.declare_exchange(
self.config.tenant_exchange_name, ExchangeType.TOPIC, durable=True
)
self.input_exchange = await self.channel.declare_exchange(
self.config.service_request_exchange_name, ExchangeType.DIRECT, durable=True
)
self.output_exchange = await self.channel.declare_exchange(
self.config.service_response_exchange_name, ExchangeType.DIRECT, durable=True
)
# we must declare DLQ to handle error messages
self.dead_letter_queue = await self.channel.declare_queue(
self.config.service_dead_letter_queue_name, durable=True
)
@retry(tries=5, exceptions=(AMQPConnectionError, ChannelInvalidStateError), reraise=True, logger=logger)
async def setup_tenant_queue(self) -> None:
self.tenant_exchange_queue = await self.channel.declare_queue(
f"{self.config.pod_name}_{self.config.tenant_event_queue_suffix}",
durable=True,
arguments={
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": self.config.service_dead_letter_queue_name,
"x-expires": self.config.queue_expiration_time,
},
)
await self.tenant_exchange_queue.bind(self.tenant_exchange, routing_key="tenant.*")
self.consumer_tags["tenant_exchange_queue"] = await self.tenant_exchange_queue.consume(
self.process_tenant_message
)
async def process_tenant_message(self, message: AbstractIncomingMessage) -> None:
try:
async with message.process():
message_body = json.loads(message.body.decode())
logger.debug(f"Tenant message received: {message_body}")
tenant_id = message_body["tenantId"]
routing_key = message.routing_key
if routing_key == "tenant.created":
await self.create_tenant_queues(tenant_id)
elif routing_key == "tenant.delete":
await self.delete_tenant_queues(tenant_id)
except Exception as e:
logger.error(e, exc_info=True)
async def create_tenant_queues(self, tenant_id: str) -> None:
queue_name = f"{self.config.input_queue_prefix}_{tenant_id}"
logger.info(f"Declaring queue: {queue_name}")
try:
input_queue = await self.channel.declare_queue(
queue_name,
durable=True,
arguments={
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": self.config.service_dead_letter_queue_name,
},
)
await input_queue.bind(self.input_exchange, routing_key=tenant_id)
self.consumer_tags[tenant_id] = await input_queue.consume(self.process_input_message)
self.tenant_queues[tenant_id] = input_queue
logger.info(f"Created and started consuming queue for tenant {tenant_id}")
except Exception as e:
logger.error(e, exc_info=True)
async def delete_tenant_queues(self, tenant_id: str) -> None:
if tenant_id in self.tenant_queues:
# somehow queue.delete() does not work here
await self.channel.queue_delete(f"{self.config.input_queue_prefix}_{tenant_id}")
del self.tenant_queues[tenant_id]
del self.consumer_tags[tenant_id]
logger.info(f"Deleted queues for tenant {tenant_id}")
async def process_input_message(self, message: IncomingMessage) -> None:
async def process_message_body_and_await_result(unpacked_message_body):
async with self.semaphore:
loop = asyncio.get_running_loop()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as thread_pool_executor:
logger.info("Processing payload in a separate thread.")
result = await loop.run_in_executor(
thread_pool_executor, self.message_processor, unpacked_message_body
)
return result
async with message.process(ignore_processed=True):
if message.redelivered:
logger.warning(f"Declining message with {message.delivery_tag=} due to it being redelivered.")
await message.nack(requeue=False)
return
if message.body.decode("utf-8") == "STOP":
logger.info("Received stop signal, stopping consumption...")
await message.ack()
# TODO: shutdown is probably not the right call here - align w/ Dev what should happen on stop signal
await self.shutdown()
return
self.message_count += 1
try:
tenant_id = message.routing_key
filtered_message_headers = (
{k: v for k, v in message.headers.items() if k.lower().startswith("x-")} if message.headers else {}
)
logger.debug(f"Processing message with {filtered_message_headers=}.")
result: dict = (
await process_message_body_and_await_result({**json.loads(message.body), **filtered_message_headers})
) or {}
if result:
await self.publish_to_output_exchange(tenant_id, result, filtered_message_headers)
await message.ack()
logger.debug(f"Message with {message.delivery_tag=} acknowledged.")
else:
raise ValueError(f"Could not process message with {message.body=}.")
except json.JSONDecodeError:
await message.nack(requeue=False)
logger.error(f"Invalid JSON in input message: {message.body}", exc_info=True)
except FileNotFoundError as e:
logger.warning(f"{e}, declining message with {message.delivery_tag=}.", exc_info=True)
await message.nack(requeue=False)
except Exception as e:
await message.nack(requeue=False)
logger.error(f"Error processing input message: {e}", exc_info=True)
finally:
self.message_count -= 1
async def publish_to_output_exchange(self, tenant_id: str, result: Dict[str, Any], headers: Dict[str, Any]) -> None:
await self.output_exchange.publish(
Message(body=json.dumps(result).encode(), headers=headers),
routing_key=tenant_id,
)
logger.info(f"Published result to queue {tenant_id}.")
@retry(tries=5, exceptions=(aiohttp.ClientResponseError, aiohttp.ClientConnectorError), reraise=True, logger=logger)
async def fetch_active_tenants(self) -> Set[str]:
async with aiohttp.ClientSession() as session:
async with session.get(self.tenant_service_url) as response:
response.raise_for_status()
if response.content_type == "application/json":
data = await response.json()
return {tenant["tenantId"] for tenant in data}
else:
logger.error(
f"Failed to fetch active tenants. Content type is not JSON: {response.headers['content-type'].lower()}"
)
return set()
@retry(
tries=5,
exceptions=(
AMQPConnectionError,
ChannelInvalidStateError,
),
reraise=True,
logger=logger,
)
async def initialize_tenant_queues(self, active_tenants: set) -> None:
for tenant_id in active_tenants:
await self.create_tenant_queues(tenant_id)
async def run(self, active_tenants: set) -> None:
await self.connect()
await self.setup_exchanges()
await self.initialize_tenant_queues(active_tenants=active_tenants)
await self.setup_tenant_queue()
logger.info("RabbitMQ handler is running. Press CTRL+C to exit.")
async def close_channels(self) -> None:
try:
if self.channel and not self.channel.is_closed:
# Cancel queues to stop fetching messages
logger.debug("Cancelling queues...")
for tenant, queue in self.tenant_queues.items():
await queue.cancel(self.consumer_tags[tenant])
if self.tenant_exchange_queue:
await self.tenant_exchange_queue.cancel(self.consumer_tags["tenant_exchange_queue"])
while self.message_count != 0:
logger.debug(f"Messages are still being processed: {self.message_count=} ")
await asyncio.sleep(2)
await self.channel.close(exc=asyncio.CancelledError)
logger.debug("Channel closed.")
else:
logger.debug("No channel to close.")
except ChannelClosed:
logger.warning("Channel was already closed.")
except ConnectionClosed:
logger.warning("Connection was lost, unable to close channel.")
except Exception as e:
logger.error(f"Error during channel shutdown: {e}")
async def close_connection(self) -> None:
try:
if self.connection and not self.connection.is_closed:
await self.connection.close(exc=asyncio.CancelledError)
logger.debug("Connection closed.")
else:
logger.debug("No connection to close.")
except ConnectionClosed:
logger.warning("Connection was already closed.")
except Exception as e:
logger.error(f"Error closing connection: {e}")
async def shutdown(self) -> None:
logger.info("Shutting down RabbitMQ handler...")
await self.close_channels()
await self.close_connection()
logger.info("RabbitMQ handler shut down successfully.")
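
The input message handler above limits concurrency with a semaphore and runs the blocking message processor in a worker thread so the event loop stays responsive. A minimal, self-contained sketch of that pattern follows. The names `process_in_thread` and `slow_double` are hypothetical stand-ins and the limit of 2 is an arbitrary assumption for illustration.

```python
import asyncio
import concurrent.futures
import time


async def process_in_thread(semaphore, processor, payload):
    """Run a blocking processor in a worker thread, bounded by the semaphore."""
    async with semaphore:
        loop = asyncio.get_running_loop()
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return await loop.run_in_executor(pool, processor, payload)


def slow_double(payload):
    # Stand-in for a blocking message processor.
    time.sleep(0.05)
    return {"value": payload["value"] * 2}


async def main():
    semaphore = asyncio.Semaphore(2)  # at most two payloads in flight
    payloads = [{"value": i} for i in range(4)]
    jobs = (process_in_thread(semaphore, slow_double, p) for p in payloads)
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which thread finishes first.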

42
pyinfra/queue/callback.py Normal file

@ -0,0 +1,42 @@
from typing import Callable
from dynaconf import Dynaconf
from kn_utils.logging import logger
from pyinfra.storage.connection import get_storage
from pyinfra.storage.utils import (
download_data_bytes_as_specified_in_message,
upload_data_as_specified_in_message,
DownloadedData,
)
DataProcessor = Callable[[dict[str, DownloadedData] | DownloadedData, dict], dict | list | str]
Callback = Callable[[dict], dict]
def make_download_process_upload_callback(data_processor: DataProcessor, settings: Dynaconf) -> Callback:
"""Default callback for processing queue messages.
Data will be downloaded from the storage as specified in the message. If a tenant id is specified, the storage
will be configured to use that tenant id, otherwise the storage is configured as specified in the settings.
The data is then passed to the data processor, together with the message. The data processor should return a
JSON-serializable object. This object is then uploaded to the storage as specified in the message. The response
message is just the original message.
"""
def inner(queue_message_payload: dict) -> dict:
logger.info("Processing payload with download-process-upload callback...")
storage = get_storage(settings, queue_message_payload.get("X-TENANT-ID"))
data: dict[str, DownloadedData] | DownloadedData = download_data_bytes_as_specified_in_message(
storage, queue_message_payload
)
result = data_processor(data, queue_message_payload)
upload_data_as_specified_in_message(storage, queue_message_payload, result)
return queue_message_payload
return inner
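
The download-process-upload flow described in the docstring above can be sketched without the real storage backend. In the sketch below, `InMemoryStorage`, `make_callback`, and the message keys `targetFilePath` and `responseFilePath` are assumptions made for illustration; they are not taken from pyinfra's storage utilities.

```python
import json


class InMemoryStorage:
    """Hypothetical stand-in for a storage backend, keyed by object name."""

    def __init__(self):
        self.objects = {}


def make_callback(data_processor, storage):
    """Build a callback that downloads, processes, and uploads per message."""

    def inner(message: dict) -> dict:
        # Download: fetch the raw bytes named in the message (assumed key).
        data = storage.objects[message["targetFilePath"]]
        # Process: the processor receives the bytes and the message.
        result = data_processor(data, message)
        # Upload: store the JSON-serialized result (assumed key).
        storage.objects[message["responseFilePath"]] = json.dumps(result).encode()
        # The response message is just the original message.
        return message

    return inner


if __name__ == "__main__":
    storage = InMemoryStorage()
    storage.objects["in.bin"] = b"hello"
    callback = make_callback(lambda data, msg: {"length": len(data)}, storage)
    callback({"targetFilePath": "in.bin", "responseFilePath": "out.json"})
    print(storage.objects["out.json"])
```

Returning the unchanged message keeps the response small; consumers read the result from storage rather than from the queue.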


@ -1,16 +0,0 @@
from pyinfra.queue.queue_manager.queue_manager import QueueManager
class Consumer:
def __init__(self, visitor, queue_manager: QueueManager):
self.queue_manager = queue_manager
self.visitor = visitor
def consume_and_publish(self, n=None):
self.queue_manager.consume_and_publish(self.visitor, n=n)
def basic_consume_and_publish(self):
self.queue_manager.basic_consume_and_publish(self.visitor)
def consume(self, **kwargs):
return self.queue_manager.consume(**kwargs)

229
pyinfra/queue/manager.py Normal file

@ -0,0 +1,229 @@
import atexit
import concurrent.futures
import json
import logging
import signal
import sys
from typing import Callable, Union
import pika
import pika.exceptions
from dynaconf import Dynaconf
from kn_utils.logging import logger
from kn_utils.retry import retry
from pika.adapters.blocking_connection import BlockingChannel, BlockingConnection
from pyinfra.config.loader import validate_settings
from pyinfra.config.validators import queue_manager_validators
pika_logger = logging.getLogger("pika")
pika_logger.setLevel(logging.WARNING) # disables non-informative pika log clutter
MessageProcessor = Callable[[dict], dict]
class QueueManager:
def __init__(self, settings: Dynaconf):
validate_settings(settings, queue_manager_validators)
self.input_queue = settings.rabbitmq.input_queue
self.output_queue = settings.rabbitmq.output_queue
self.dead_letter_queue = settings.rabbitmq.dead_letter_queue
self.connection_parameters = self.create_connection_parameters(settings)
self.connection: Union[BlockingConnection, None] = None
self.channel: Union[BlockingChannel, None] = None
self.connection_sleep = settings.rabbitmq.connection_sleep
self.processing_callback = False
self.received_signal = False
atexit.register(self.stop_consuming)
signal.signal(signal.SIGTERM, self._handle_stop_signal)
signal.signal(signal.SIGINT, self._handle_stop_signal)
self.max_retries = settings.rabbitmq.max_retries or 5
self.max_delay = settings.rabbitmq.max_delay or 60
@staticmethod
def create_connection_parameters(settings: Dynaconf):
credentials = pika.PlainCredentials(username=settings.rabbitmq.username, password=settings.rabbitmq.password)
pika_connection_params = {
"host": settings.rabbitmq.host,
"port": settings.rabbitmq.port,
"credentials": credentials,
"heartbeat": settings.rabbitmq.heartbeat,
}
return pika.ConnectionParameters(**pika_connection_params)
@retry(
tries=5,
exceptions=(pika.exceptions.AMQPConnectionError, pika.exceptions.ChannelClosedByBroker),
reraise=True,
)
def establish_connection(self):
if self.connection and self.connection.is_open:
logger.debug("Connection to RabbitMQ already established.")
return
logger.info("Establishing connection to RabbitMQ...")
self.connection = pika.BlockingConnection(parameters=self.connection_parameters)
logger.debug("Opening channel...")
self.channel = self.connection.channel()
self.channel.basic_qos(prefetch_count=1)
args = {
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": self.dead_letter_queue,
}
self.channel.queue_declare(self.input_queue, arguments=args, auto_delete=False, durable=True)
self.channel.queue_declare(self.output_queue, arguments=args, auto_delete=False, durable=True)
logger.info("Connection to RabbitMQ established, channel open.")
def is_ready(self):
try:
self.establish_connection()
return self.channel.is_open
except Exception as e:
logger.error(f"Failed to establish connection: {e}")
return False
@retry(
tries=5,
exceptions=pika.exceptions.AMQPConnectionError,
reraise=True,
)
def start_consuming(self, message_processor: MessageProcessor):
on_message_callback = self._make_on_message_callback(message_processor)
try:
self.establish_connection()
self.channel.basic_consume(self.input_queue, on_message_callback)
logger.info("Starting to consume messages...")
self.channel.start_consuming()
except pika.exceptions.AMQPConnectionError as e:
logger.error(f"AMQP Connection Error: {e}")
raise
except Exception as e:
logger.error(f"An unexpected error occurred while consuming messages: {e}", exc_info=True)
raise
finally:
self.stop_consuming()
def stop_consuming(self):
if self.channel and self.channel.is_open:
logger.info("Stopping consuming...")
self.channel.stop_consuming()
logger.info("Closing channel...")
self.channel.close()
if self.connection and self.connection.is_open:
logger.info("Closing connection to RabbitMQ...")
self.connection.close()
def publish_message_to_input_queue(self, message: Union[str, bytes, dict], properties: pika.BasicProperties = None):
if isinstance(message, str):
message = message.encode("utf-8")
elif isinstance(message, dict):
message = json.dumps(message).encode("utf-8")
self.establish_connection()
self.channel.basic_publish(
"",
self.input_queue,
properties=properties,
body=message,
)
logger.info(f"Published message to queue {self.input_queue}.")
def purge_queues(self):
self.establish_connection()
try:
self.channel.queue_purge(self.input_queue)
self.channel.queue_purge(self.output_queue)
logger.info("Queues purged.")
except pika.exceptions.ChannelWrongStateError:
pass
def get_message_from_output_queue(self):
self.establish_connection()
return self.channel.basic_get(self.output_queue, auto_ack=True)
def _make_on_message_callback(self, message_processor: MessageProcessor):
def process_message_body_and_await_result(unpacked_message_body):
# Processing the message in a separate thread is necessary for the main thread pika client to be able to
# process data events (e.g. heartbeats) while the message is being processed.
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as thread_pool_executor:
logger.info("Processing payload in separate thread.")
future = thread_pool_executor.submit(message_processor, unpacked_message_body)
# TODO: This block is probably not necessary, but kept since the implications of removing it are
# unclear. Remove it in a future iteration where fewer changes are being made to the code base.
while future.running():
logger.debug("Waiting for payload processing to finish...")
self.connection.sleep(self.connection_sleep)
return future.result()
def on_message_callback(channel, method, properties, body):
logger.info(f"Received message from queue with delivery_tag {method.delivery_tag}.")
self.processing_callback = True
if method.redelivered:
logger.warning(f"Declining message with {method.delivery_tag=} due to it being redelivered.")
channel.basic_nack(method.delivery_tag, requeue=False)
return
if body.decode("utf-8") == "STOP":
logger.info("Received stop signal, stopping consuming...")
channel.basic_ack(delivery_tag=method.delivery_tag)
self.stop_consuming()
return
try:
filtered_message_headers = (
{k: v for k, v in properties.headers.items() if k.lower().startswith("x-")}
if properties.headers
else {}
)
logger.debug(f"Processing message with {filtered_message_headers=}.")
result: dict = (
process_message_body_and_await_result({**json.loads(body), **filtered_message_headers}) or {}
)
channel.basic_publish(
"",
self.output_queue,
json.dumps(result).encode(),
properties=pika.BasicProperties(headers=filtered_message_headers),
)
logger.info(f"Published result to queue {self.output_queue}.")
channel.basic_ack(delivery_tag=method.delivery_tag)
logger.debug(f"Message with {method.delivery_tag=} acknowledged.")
except FileNotFoundError as e:
logger.warning(f"{e}, declining message with {method.delivery_tag=}.")
channel.basic_nack(method.delivery_tag, requeue=False)
except Exception:
logger.warning(f"Failed to process message with {method.delivery_tag=}, declining...", exc_info=True)
channel.basic_nack(method.delivery_tag, requeue=False)
raise
finally:
self.processing_callback = False
if self.received_signal:
self.stop_consuming()
sys.exit(0)
return on_message_callback
def _handle_stop_signal(self, signum, *args, **kwargs):
logger.info(f"Received signal {signum}, stopping consuming...")
self.received_signal = True
if not self.processing_callback:
self.stop_consuming()
sys.exit(0)
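
The message callback above keeps only headers whose names start with `x-` and merges them into the decoded JSON body before handing the result to the processor. A minimal standalone sketch of that step follows; the helper names `filter_x_headers` and `build_payload` are hypothetical and not part of pyinfra.

```python
import json
from typing import Optional


def filter_x_headers(headers: Optional[dict]) -> dict:
    """Keep only headers whose key starts with "x-" (case-insensitive)."""
    if not headers:
        return {}
    return {k: v for k, v in headers.items() if k.lower().startswith("x-")}


def build_payload(body: bytes, headers: Optional[dict]) -> dict:
    """Merge the decoded JSON body with the filtered headers; headers win on key clashes."""
    return {**json.loads(body), **filter_x_headers(headers)}


payload = build_payload(b'{"fileId": "abc"}', {"X-Tenant-Id": "t1", "content-type": "json"})
print(payload)
```

Because the headers are unpacked last, a header such as `x-tenant-id` would override a body key of the same name, which matches the merge order used in the callback.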


@ -1,172 +0,0 @@
import json
import logging
from itertools import islice
import pika
from pyinfra.config import CONFIG
from pyinfra.exceptions import ProcessingFailure, DataLoadingFailure
from pyinfra.queue.queue_manager.queue_manager import QueueHandle, QueueManager
from pyinfra.visitor import QueueVisitor
logger = logging.getLogger("pika")
logger.setLevel(logging.WARNING)
logger = logging.getLogger()
def monkey_patch_queue_handle(channel, queue) -> QueueHandle:
empty_message = (None, None, None)
def is_empty_message(message):
return message == empty_message
queue_handle = QueueHandle()
queue_handle.empty = lambda: is_empty_message(channel.basic_get(queue))
def produce_items():
while True:
message = channel.basic_get(queue)
if is_empty_message(message):
break
method_frame, properties, body = message
channel.basic_ack(method_frame.delivery_tag)
yield json.loads(body)
queue_handle.to_list = lambda: list(produce_items())
return queue_handle
def get_connection_params():
credentials = pika.PlainCredentials(username=CONFIG.rabbitmq.user, password=CONFIG.rabbitmq.password)
kwargs = {
"host": CONFIG.rabbitmq.host,
"port": CONFIG.rabbitmq.port,
"credentials": credentials,
"heartbeat": CONFIG.rabbitmq.heartbeat,
}
parameters = pika.ConnectionParameters(**kwargs)
return parameters
def get_n_previous_attempts(props):
return 0 if props.headers is None else props.headers.get("x-retry-count", 0)
def attempts_remain(n_attempts, max_attempts):
return n_attempts < max_attempts
class PikaQueueManager(QueueManager):
def __init__(self, input_queue, output_queue, dead_letter_queue=None, connection_params=None):
super().__init__(input_queue, output_queue)
if not connection_params:
connection_params = get_connection_params()
self.connection = pika.BlockingConnection(parameters=connection_params)
self.channel = self.connection.channel()
self.channel.basic_qos(prefetch_count=1)
if not dead_letter_queue:
dead_letter_queue = CONFIG.rabbitmq.queues.dead_letter
args = {"x-dead-letter-exchange": "", "x-dead-letter-routing-key": dead_letter_queue}
self.channel.queue_declare(input_queue, arguments=args, auto_delete=False, durable=True)
self.channel.queue_declare(output_queue, arguments=args, auto_delete=False, durable=True)
def republish(self, body: bytes, n_current_attempts, frame):
self.channel.basic_publish(
exchange="",
routing_key=self._input_queue,
body=body,
properties=pika.BasicProperties(headers={"x-retry-count": n_current_attempts}),
)
self.channel.basic_ack(delivery_tag=frame.delivery_tag)
def publish_request(self, request):
logger.debug(f"Publishing {request}")
self.channel.basic_publish("", self._input_queue, json.dumps(request).encode())
def reject(self, body, frame):
logger.error(f"Adding to dead letter queue: {body}")
self.channel.basic_reject(delivery_tag=frame.delivery_tag, requeue=False)
def publish_response(self, message, visitor: QueueVisitor, max_attempts=3):
logger.debug(f"Processing {message}.")
frame, properties, body = message
n_attempts = get_n_previous_attempts(properties) + 1
try:
response_messages = visitor(json.loads(body))
if isinstance(response_messages, dict):
response_messages = [response_messages]
for response_message in response_messages:
response_message = json.dumps(response_message).encode()
self.channel.basic_publish("", self._output_queue, response_message)
self.channel.basic_ack(frame.delivery_tag)
except (ProcessingFailure, DataLoadingFailure):
logger.error(f"Message failed to process {n_attempts}/{max_attempts} times: {body}")
if attempts_remain(n_attempts, max_attempts):
self.republish(body, n_attempts, frame)
else:
self.reject(body, frame)
def pull_request(self):
return self.channel.basic_get(self._input_queue)
def consume(self, inactivity_timeout=None, n=None):
logger.debug("Consuming")
gen = self.channel.consume(self._input_queue, inactivity_timeout=inactivity_timeout)
yield from islice(gen, n)
def consume_and_publish(self, visitor: QueueVisitor, n=None):
logger.info("Consuming input queue.")
for message in self.consume(n=n):
self.publish_response(message, visitor)
def basic_consume_and_publish(self, visitor: QueueVisitor):
logger.info("Basic consuming input queue.")
def callback(channel, frame, properties, body):
message = (frame, properties, body)
return self.publish_response(message, visitor)
self.channel.basic_consume(self._input_queue, callback)
self.channel.start_consuming()
def clear(self):
try:
self.channel.queue_purge(self._input_queue)
self.channel.queue_purge(self._output_queue)
assert self.input_queue.to_list() == []
assert self.output_queue.to_list() == []
except pika.exceptions.ChannelWrongStateError:
pass
@property
def input_queue(self) -> QueueHandle:
return monkey_patch_queue_handle(self.channel, self._input_queue)
@property
def output_queue(self) -> QueueHandle:
return monkey_patch_queue_handle(self.channel, self._output_queue)
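
The removed `PikaQueueManager` tracked retries through an `x-retry-count` header: a failed message was republished with an incremented count until `max_attempts` was reached, then rejected to the dead letter queue. The decision can be sketched without a broker as below; `next_action` is a hypothetical helper, and `SimpleNamespace` stands in for `pika.BasicProperties`.

```python
from types import SimpleNamespace


def get_n_previous_attempts(props) -> int:
    """Read the retry counter from the message headers, defaulting to 0."""
    return 0 if props.headers is None else props.headers.get("x-retry-count", 0)


def next_action(props, max_attempts: int = 3) -> str:
    """Return "republish" while attempts remain, otherwise "reject"."""
    n_attempts = get_n_previous_attempts(props) + 1
    return "republish" if n_attempts < max_attempts else "reject"


first_failure = SimpleNamespace(headers=None)
third_failure = SimpleNamespace(headers={"x-retry-count": 2})
print(next_action(first_failure), next_action(third_failure))
```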


@ -1,51 +0,0 @@
import abc
class QueueHandle:
def empty(self) -> bool:
raise NotImplementedError
def to_list(self) -> list:
raise NotImplementedError
class QueueManager(abc.ABC):
def __init__(self, input_queue, output_queue):
self._input_queue = input_queue
self._output_queue = output_queue
@abc.abstractmethod
def publish_request(self, request):
raise NotImplementedError
@abc.abstractmethod
def publish_response(self, response, callback):
raise NotImplementedError
@abc.abstractmethod
def pull_request(self):
raise NotImplementedError
@abc.abstractmethod
def consume(self, **kwargs):
raise NotImplementedError
@abc.abstractmethod
def clear(self):
raise NotImplementedError
@abc.abstractmethod
def input_queue(self) -> QueueHandle:
raise NotImplementedError
@abc.abstractmethod
def output_queue(self) -> QueueHandle:
raise NotImplementedError
@abc.abstractmethod
def consume_and_publish(self, callback, n=None):
raise NotImplementedError
@abc.abstractmethod
def basic_consume_and_publish(self, callback):
raise NotImplementedError


@ -1,37 +0,0 @@
import logging
from collections import deque

from funcy import repeatedly, identity

from pyinfra.exceptions import NoBufferCapacity
from pyinfra.server.nothing import Nothing

logger = logging.getLogger(__name__)


def bufferize(fn, buffer_size=3, persist_fn=identity, null_value=None):
    def buffered_fn(item):
        if item is not Nothing:
            buffer.append(persist_fn(item))

        response_payload = fn(repeatedly(buffer.popleft, n_items_to_pop(buffer, item is Nothing)))

        return response_payload or null_value

    def buffer_full(current_buffer_size):
        if current_buffer_size > buffer_size:
            logger.warning(f"Overfull buffer. size: {current_buffer_size}; intended capacity: {buffer_size}")

        return current_buffer_size == buffer_size

    def n_items_to_pop(buffer, final):
        current_buffer_size = len(buffer)
        return (final or buffer_full(current_buffer_size)) * current_buffer_size

    if not buffer_size > 0:
        raise NoBufferCapacity("Buffer size must be greater than zero.")

    buffer = deque()

    return buffered_fn
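
The buffering idea in `bufferize` (collect items, apply `fn` to a whole batch once the buffer is full, and flush on a final sentinel) can be illustrated with the standard library only. This is a simplified sketch rather than the removed implementation: `FLUSH` is a hypothetical stand-in for pyinfra's `Nothing`, the sketch flushes when the buffer holds at least `buffer_size` items, and it skips calling `fn` while the buffer is still filling.

```python
from collections import deque

FLUSH = object()  # hypothetical sentinel standing in for pyinfra's `Nothing`


def bufferize(fn, buffer_size=3):
    """Wrap `fn` so it is applied to batches of up to `buffer_size` items."""
    buffer = deque()

    def buffered_fn(item):
        if item is not FLUSH:
            buffer.append(item)
        # Flush when the buffer is full or the caller signals the end of the stream.
        if item is FLUSH or len(buffer) >= buffer_size:
            batch = list(buffer)
            buffer.clear()
            return fn(batch)
        return []

    return buffered_fn


summed = bufferize(lambda batch: [sum(batch)] if batch else [], buffer_size=2)
results = [summed(x) for x in (1, 2, 3, FLUSH)]
print(results)
```

The final `FLUSH` call is what releases the leftover item `3`; without it, a partially filled buffer would never be processed.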


@ -1,24 +0,0 @@
from collections import deque
from itertools import takewhile

from funcy import repeatedly

from pyinfra.server.nothing import is_not_nothing, Nothing


def stream_queue(queue):
    yield from takewhile(is_not_nothing, repeatedly(queue.popleft))


class Queue:
    def __init__(self):
        self.__queue = deque()

    def append(self, package) -> None:
        self.__queue.append(package)

    def popleft(self):
        return self.__queue.popleft() if self.__queue else Nothing

    def __bool__(self):
        return bool(self.__queue)
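
`stream_queue` drains the queue lazily, so items appended while the stream is being consumed are still picked up. A rough standard-library equivalent, assuming a plain `deque` instead of the `Queue` wrapper and its `Nothing` sentinel:

```python
from collections import deque


def stream_queue(queue: deque):
    """Yield items from the left until the deque is empty."""
    while queue:
        yield queue.popleft()


q = deque(["a", "b"])
stream = stream_queue(q)
first = next(stream)
q.append("c")  # appended after streaming started, still delivered
rest = list(stream)
print(first, rest)
```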


@ -1,44 +0,0 @@
from itertools import chain, takewhile
from typing import Iterable
from funcy import first, repeatedly, mapcat
from pyinfra.server.buffering.bufferize import bufferize
from pyinfra.server.nothing import Nothing, is_not_nothing
class FlatStreamBuffer:
"""Wraps a stream buffer and chains its output. Also flushes the stream buffer when applied to an iterable."""
def __init__(self, fn, buffer_size=3):
"""Function `fn` needs to be mappable and return an iterable; ideally `fn` returns a generator."""
self.stream_buffer = StreamBuffer(fn, buffer_size=buffer_size)
def __call__(self, items):
items = chain(items, [Nothing])
yield from mapcat(self.stream_buffer, items)
class StreamBuffer:
"""Puts a streaming function between an input and an output buffer."""
def __init__(self, fn, buffer_size=3):
"""Function `fn` needs to be mappable and return an iterable; ideally `fn` returns a generator."""
self.fn = bufferize(fn, buffer_size=buffer_size, null_value=[])
self.result_stream = chain([])
def __call__(self, item) -> Iterable:
self.push(item)
yield from takewhile(is_not_nothing, repeatedly(self.pop))
def push(self, item):
self.result_stream = chain(self.result_stream, self.compute(item))
def compute(self, item):
try:
yield from self.fn(item)
except TypeError as err:
raise TypeError("Function failed with type-error. Is it mappable?") from err
def pop(self):
return first(chain(self.result_stream, [Nothing]))


@ -1,16 +0,0 @@
from funcy import rcompose, flatten
# TODO: remove the dispatcher component from the pipeline; it no longer actually dispatches
class ClientPipeline:
def __init__(self, packer, dispatcher, receiver, interpreter):
self.pipe = rcompose(
packer,
dispatcher,
receiver,
interpreter,
flatten, # each analysis call returns an iterable. Can be empty, singleton or multi item. Hence, flatten.
)
def __call__(self, *args, **kwargs):
yield from self.pipe(*args, **kwargs)


@ -1,27 +0,0 @@
from itertools import tee
from typing import Iterable
def inspect(prefix="inspect", embed=False):
"""Can be used to inspect compositions of generator functions by placing it between two functions."""
def inner(x):
if isinstance(x, Iterable) and not isinstance(x, dict) and not isinstance(x, tuple):
x, y = tee(x)
y = list(y)
else:
y = x
l = f" {len(y)} items" if isinstance(y, list) else ""
print(f"{prefix}{l}:", y)
if embed:
import IPython
IPython.embed()
return x
return inner


@ -1,30 +0,0 @@
import abc
from typing import Iterable

from more_itertools import peekable

from pyinfra.server.nothing import Nothing


def has_next(peekable_iter):
    return peekable_iter.peek(Nothing) is not Nothing


class Dispatcher:
    def __call__(self, packages: Iterable[dict]):
        yield from self.dispatch_methods(packages)

    def dispatch_methods(self, packages):
        packages = peekable(packages)
        for package in packages:
            method = self.patch if has_next(packages) else self.post
            response = method(package)
            yield response

    @abc.abstractmethod
    def patch(self, package):
        raise NotImplementedError

    @abc.abstractmethod
    def post(self, package):
        raise NotImplementedError
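
The dispatcher sends every package except the last with `patch` and the last one with `post`, which requires looking one element ahead. A minimal sketch of the same lookahead without `more_itertools` follows; it yields method names instead of calling them so the behaviour is easy to see.

```python
def dispatch_methods(packages):
    """Yield ("patch", package) for every package except the last, which gets "post"."""
    iterator = iter(packages)
    try:
        current = next(iterator)
    except StopIteration:
        return  # nothing to dispatch
    for upcoming in iterator:
        yield "patch", current
        current = upcoming
    yield "post", current


calls = list(dispatch_methods(["p1", "p2", "p3"]))
print(calls)
```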


@ -1,21 +0,0 @@
from itertools import takewhile
from funcy import repeatedly, notnone
from pyinfra.server.dispatcher.dispatcher import Dispatcher
from pyinfra.server.stream.queued_stream_function import QueuedStreamFunction
class QueuedStreamFunctionDispatcher(Dispatcher):
def __init__(self, queued_stream_function: QueuedStreamFunction):
self.queued_stream_function = queued_stream_function
def patch(self, package):
self.queued_stream_function.push(package)
# TODO: this is wonky and a result of the pipeline components having shifted behaviour through previous
# refactorings. The analogous functionality for the rest pipeline is in the interpreter. Correct this
# asymmetry!
yield from takewhile(notnone, repeatedly(self.queued_stream_function.pop))
def post(self, package):
yield from self.patch(package)


@ -1,14 +0,0 @@
import requests
from pyinfra.server.dispatcher.dispatcher import Dispatcher
class RestDispatcher(Dispatcher):
def __init__(self, endpoint):
self.endpoint = endpoint
def patch(self, package):
return requests.patch(self.endpoint, json=package)
def post(self, package):
return requests.post(self.endpoint, json=package)


@ -1,8 +0,0 @@
import abc
from typing import Iterable
class Interpreter(abc.ABC):
@abc.abstractmethod
def __call__(self, payloads: Iterable):
pass


@ -1,8 +0,0 @@
from typing import Iterable
from pyinfra.server.interpreter.interpreter import Interpreter
class IdentityInterpreter(Interpreter):
def __call__(self, payloads: Iterable):
yield from payloads


@ -1,23 +0,0 @@
from typing import Iterable
import requests
from funcy import takewhile, repeatedly, mapcat
from pyinfra.server.interpreter.interpreter import Interpreter
def stream_responses(endpoint):
def receive():
response = requests.get(endpoint)
return response
def more_is_coming(response):
return response.status_code == 206
response_stream = takewhile(more_is_coming, repeatedly(receive))
yield from response_stream
class RestPickupStreamer(Interpreter):
def __call__(self, payloads: Iterable):
yield from mapcat(stream_responses, payloads)
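
`stream_responses` polls the pickup endpoint and keeps yielding responses while the server answers with status 206 (partial content). The loop can be sketched without HTTP as below; `receive` is a hypothetical callable standing in for `requests.get(endpoint)`, and the responses are plain dicts rather than `requests.Response` objects.

```python
def stream_responses(receive):
    """Call `receive` repeatedly and yield responses while the status code is 206."""
    while True:
        response = receive()
        if response["status_code"] != 206:
            return
        yield response


fake_responses = iter(
    [
        {"status_code": 206, "body": "part 1"},
        {"status_code": 206, "body": "part 2"},
        {"status_code": 204, "body": None},
    ]
)
bodies = [r["body"] for r in stream_responses(lambda: next(fake_responses))]
print(bodies)
```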


@ -1,39 +0,0 @@
from functools import lru_cache
from funcy import identity
from prometheus_client import CollectorRegistry, Summary
from pyinfra.server.operation_dispatcher import OperationDispatcher
class OperationDispatcherMonitoringDecorator:
def __init__(self, operation_dispatcher: OperationDispatcher, naming_policy=identity):
self.operation_dispatcher = operation_dispatcher
self.operation2metric = {}
self.naming_policy = naming_policy
@property
@lru_cache(maxsize=None)
def registry(self):
return CollectorRegistry(auto_describe=True)
def make_summary_instance(self, op: str):
return Summary(f"{self.naming_policy(op)}_seconds", f"Time spent on {op}.", registry=self.registry)
def submit(self, operation, request):
return self.operation_dispatcher.submit(operation, request)
def pickup(self, operation):
with self.get_monitor(operation):
return self.operation_dispatcher.pickup(operation)
def get_monitor(self, operation):
monitor = self.operation2metric.get(operation, None) or self.register_operation(operation)
return monitor.time()
def register_operation(self, operation):
summary = self.make_summary_instance(operation)
self.operation2metric[operation] = summary
return summary


@ -1,17 +0,0 @@
from itertools import chain
from typing import Iterable, Union, Tuple
from pyinfra.exceptions import UnexpectedItemType
def normalize(itr: Iterable[Union[Tuple, Iterable]]) -> Iterable[Tuple]:
return chain.from_iterable(map(normalize_item, normalize_item(itr)))
def normalize_item(itm: Union[Tuple, Iterable]) -> Iterable:
if isinstance(itm, tuple):
return [itm]
elif isinstance(itm, Iterable):
return itm
else:
raise UnexpectedItemType("Encountered an item that could not be normalized to a list.")
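
`normalize` turns a single tuple, a flat iterable of tuples, or a one-level nested iterable into a flat stream of tuples. The following sketch restates the two functions with `TypeError` in place of pyinfra's `UnexpectedItemType` (an assumption made to keep the example self-contained) and shows the three input shapes.

```python
from collections.abc import Iterable
from itertools import chain


def normalize_item(item):
    """Wrap a tuple in a list; pass other iterables through unchanged."""
    if isinstance(item, tuple):
        return [item]
    if isinstance(item, Iterable):
        return item
    raise TypeError("Encountered an item that could not be normalized to a list.")


def normalize(itr):
    return chain.from_iterable(map(normalize_item, normalize_item(itr)))


single = list(normalize((b"data", {"k": 1})))
many = list(normalize([(b"a", {}), (b"b", {})]))
nested = list(normalize([[(b"a", {})], (b"b", {})]))
print(single, many, nested)
```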


@ -1,6 +0,0 @@
class Nothing:
    pass


def is_not_nothing(x):
    return x is not Nothing


@ -1,33 +0,0 @@
from itertools import starmap, tee
from typing import Dict
from funcy import juxt, zipdict, cat
from pyinfra.server.stream.queued_stream_function import QueuedStreamFunction
from pyinfra.server.stream.rest import LazyRestProcessor
class OperationDispatcher:
def __init__(self, operation2function: Dict[str, QueuedStreamFunction]):
submit_suffixes, pickup_suffixes = zip(*map(juxt(submit_suffix, pickup_suffix), operation2function))
processors = starmap(LazyRestProcessor, zip(operation2function.values(), submit_suffixes, pickup_suffixes))
self.operation2processor = zipdict(submit_suffixes + pickup_suffixes, cat(tee(processors)))
@classmethod
@property
def pickup_suffix(cls):
return pickup_suffix("")
def submit(self, operation, request):
return self.operation2processor[operation].push(request)
def pickup(self, operation):
return self.operation2processor[operation].pop()
def submit_suffix(op: str):
return "" if not op else op
def pickup_suffix(op: str):
return "pickup" if not op else f"{op}_pickup"
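
The suffix helpers define which URL path submits work and which path picks up results for an operation. A small sketch of the resulting mapping follows; `routes` is a hypothetical helper added for illustration, and the leading slash is an assumption based on the Flask routes in the server module.

```python
def submit_suffix(op):
    return "" if not op else op


def pickup_suffix(op):
    return "pickup" if not op else f"{op}_pickup"


def routes(operations):
    """Map each operation to its (submit path, pickup path) pair."""
    return {op: (f"/{submit_suffix(op)}", f"/{pickup_suffix(op)}") for op in operations}


table = routes([None, "ocr"])
print(table)
```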


@ -1,8 +0,0 @@
import abc
from typing import Iterable
class Packer(abc.ABC):
@abc.abstractmethod
def __call__(self, data: Iterable, metadata: Iterable):
pass


@ -1,14 +0,0 @@
from itertools import starmap
from typing import Iterable
from pyinfra.server.packer.packer import Packer
def bundle(data: bytes, metadata: dict):
package = {"data": data, "metadata": metadata}
return package
class IdentityPacker(Packer):
def __call__(self, data: Iterable, metadata):
yield from starmap(bundle, zip(data, metadata))


@ -1,9 +0,0 @@
from typing import Iterable
from pyinfra.server.packer.packer import Packer
from pyinfra.server.packing import pack_data_and_metadata_for_rest_transfer
class RestPacker(Packer):
def __call__(self, data: Iterable[bytes], metadata: Iterable[dict]):
yield from pack_data_and_metadata_for_rest_transfer(data, metadata)


@ -1,34 +0,0 @@
import base64
from _operator import itemgetter
from itertools import starmap
from typing import Iterable

from funcy import compose

from pyinfra.utils.func import starlift, lift


def pack_data_and_metadata_for_rest_transfer(data: Iterable, metadata: Iterable):
    yield from starmap(pack, zip(data, metadata))


def unpack_fn_pack(fn):
    return compose(starlift(pack), fn, lift(unpack))


def pack(data: bytes, metadata: dict):
    package = {"data": bytes_to_string(data), "metadata": metadata}
    return package


def unpack(package):
    data, metadata = itemgetter("data", "metadata")(package)
    return string_to_bytes(data), metadata


def bytes_to_string(data: bytes) -> str:
    return base64.b64encode(data).decode()


def string_to_bytes(data: str) -> bytes:
    return base64.b64decode(data.encode())
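
Packing base64-encodes the raw bytes so that a package can travel as JSON, and unpacking reverses that. A self-contained round trip, restating `pack` and `unpack` with the standard library only, might look like this:

```python
import base64
import json


def pack(data: bytes, metadata: dict) -> dict:
    """Encode bytes as a base64 string so the package is JSON-serializable."""
    return {"data": base64.b64encode(data).decode(), "metadata": metadata}


def unpack(package: dict):
    """Decode the base64 string back into bytes."""
    return base64.b64decode(package["data"].encode()), package["metadata"]


package = pack(b"\x00\xffPDF", {"page": 1})
wire = json.dumps(package)  # would fail on raw bytes without the base64 step
data, metadata = unpack(json.loads(wire))
print(data, metadata)
```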


@ -1,8 +0,0 @@
import abc
from typing import Iterable
class Receiver(abc.ABC):
@abc.abstractmethod
def __call__(self, package: Iterable):
pass


@ -1,11 +0,0 @@
from typing import Iterable
from pyinfra.server.receiver.receiver import Receiver
from funcy import notnone
class QueuedStreamFunctionReceiver(Receiver):
def __call__(self, responses: Iterable):
for response in filter(notnone, responses):
yield response


@ -1,16 +0,0 @@
from typing import Iterable
import requests
from funcy import chunks, flatten
from pyinfra.server.receiver.receiver import Receiver
class RestReceiver(Receiver):
def __init__(self, chunk_size=3):
self.chunk_size = chunk_size
def __call__(self, responses: Iterable[requests.Response]):
for response in flatten(chunks(self.chunk_size, responses)):
response.raise_for_status()
yield response.json()


@ -1,100 +0,0 @@
from functools import singledispatch
from typing import Dict, Callable, Union
from flask import Flask, jsonify, request
from prometheus_client import generate_latest
from pyinfra.config import CONFIG
from pyinfra.server.buffering.stream import FlatStreamBuffer
from pyinfra.server.monitoring import OperationDispatcherMonitoringDecorator
from pyinfra.server.operation_dispatcher import OperationDispatcher
from pyinfra.server.stream.queued_stream_function import QueuedStreamFunction
@singledispatch
def set_up_processing_server(arg: Union[dict, Callable], buffer_size=1):
"""Produces a processing server given a streamable function or a mapping from operations to streamable functions.
Streamable functions are constructed by calling pyinfra.server.utils.make_streamable_and_wrap_in_packing_logic on a
function taking a tuple of data and metadata and also returning a tuple or yielding tuples of data and metadata.
If the function doesn't produce data, data should be an empty byte string.
If the function doesn't produce metadata, metadata should be an empty dictionary.
Args:
arg: streamable function or mapping of operations: str to streamable functions
buffer_size: If your function operates on batches this parameter controls how many items are aggregated before
your function is applied.
TODO: buffer_size has to be controllable on a per-function basis.
Returns:
Processing server: flask app
"""
pass
@set_up_processing_server.register
def _(operation2stream_fn: dict, buffer_size=1):
return __stream_fn_to_processing_server(operation2stream_fn, buffer_size)
@set_up_processing_server.register
def _(stream_fn: object, buffer_size=1):
operation2stream_fn = {None: stream_fn}
return __stream_fn_to_processing_server(operation2stream_fn, buffer_size)
def __stream_fn_to_processing_server(operation2stream_fn: dict, buffer_size):
operation2stream_fn = {
op: QueuedStreamFunction(FlatStreamBuffer(fn, buffer_size)) for op, fn in operation2stream_fn.items()
}
return __set_up_processing_server(operation2stream_fn)
def __set_up_processing_server(operation2function: Dict[str, QueuedStreamFunction]):
app = Flask(__name__)
dispatcher = OperationDispatcherMonitoringDecorator(
OperationDispatcher(operation2function),
naming_policy=naming_policy,
)
def ok():
resp = jsonify("OK")
resp.status_code = 200
return resp
@app.route("/ready", methods=["GET"])
def ready():
return ok()
@app.route("/health", methods=["GET"])
def healthy():
return ok()
@app.route("/prometheus", methods=["GET"])
def prometheus():
return generate_latest(registry=dispatcher.registry)
@app.route("/<operation>", methods=["POST", "PATCH"])
def submit(operation):
return dispatcher.submit(operation, request)
@app.route("/", methods=["POST", "PATCH"])
def submit_default():
return dispatcher.submit("", request)
@app.route("/<operation>", methods=["GET"])
def pickup(operation):
return dispatcher.pickup(operation)
return app
def naming_policy(op_name: str):
pop_suffix = OperationDispatcher.pickup_suffix
prefix = f"redactmanager_{CONFIG.service.name}"
op_display_name = op_name.replace(f"_{pop_suffix}", "") if op_name != pop_suffix else "default"
complete_display_name = f"{prefix}_{op_display_name}"
return complete_display_name


@ -1,21 +0,0 @@
from funcy import first

from pyinfra.server.buffering.queue import stream_queue, Queue


class QueuedStreamFunction:
    def __init__(self, stream_function):
        """Combines a stream function with a queue.

        Args:
            stream_function: Needs to operate on iterables.
        """
        self.queue = Queue()
        self.stream_function = stream_function

    def push(self, item):
        self.queue.append(item)

    def pop(self):
        items = stream_queue(self.queue)
        return first(self.stream_function(items))
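
`pop` only takes the first result of the stream function, so it relies on that function being lazy: a generator-based function consumes just enough queued items to produce one result and leaves the rest in the queue. A standard-library sketch of this push/pop behaviour, using a plain `deque` and `next(..., None)` in place of pyinfra's `Queue` and funcy's `first`:

```python
from collections import deque


class QueuedStreamFunction:
    """Pairs a lazy stream function with a queue of pending items."""

    def __init__(self, stream_function):
        self.queue = deque()
        self.stream_function = stream_function

    def push(self, item):
        self.queue.append(item)

    def _drain(self):
        while self.queue:
            yield self.queue.popleft()

    def pop(self):
        # Take only the first result; a lazy stream function leaves later items queued.
        return next(iter(self.stream_function(self._drain())), None)


doubler = QueuedStreamFunction(lambda items: (x * 2 for x in items))
doubler.push(1)
doubler.push(2)
popped = [doubler.pop(), doubler.pop(), doubler.pop()]
print(popped)
```

An eager stream function (one that builds a full list first) would drain the whole queue on the first `pop` and discard every result after the first.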


@ -1,51 +0,0 @@
import logging
from flask import jsonify
from funcy import drop
from pyinfra.server.nothing import Nothing
from pyinfra.server.stream.queued_stream_function import QueuedStreamFunction
logger = logging.getLogger(__name__)
class LazyRestProcessor:
def __init__(self, queued_stream_function: QueuedStreamFunction, submit_suffix="submit", pickup_suffix="pickup"):
self.submit_suffix = submit_suffix
self.pickup_suffix = pickup_suffix
self.queued_stream_function = queued_stream_function
def push(self, request):
self.queued_stream_function.push(request.json)
return jsonify(replace_suffix(request.base_url, self.submit_suffix, self.pickup_suffix))
def pop(self):
result = self.queued_stream_function.pop() or Nothing
if not valid(result):
logger.error(f"Received invalid result: {result}")
result = Nothing
if result is Nothing:
logger.info("Analysis completed successfully.")
resp = jsonify("No more items left")
resp.status_code = 204
else:
logger.debug("Partial analysis completed.")
resp = jsonify(result)
resp.status_code = 206
return resp
def valid(result):
return isinstance(result, dict) or result is Nothing
def replace_suffix(strn, suf, repl):
return remove_last_n(strn, len(suf)) + repl
def remove_last_n(strn, n):
return "".join(reversed(list(drop(n, reversed(strn)))))
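
`replace_suffix` drops as many trailing characters as the submit suffix is long and appends the pickup suffix, which turns a submit URL into its pickup URL. A slicing-based sketch of the same behaviour follows; the URLs are made-up examples. Like the original, it does not check that the text really ends with the given suffix.

```python
def replace_suffix(text: str, suffix: str, replacement: str) -> str:
    """Drop len(suffix) trailing characters and append the replacement."""
    return text[: len(text) - len(suffix)] + replacement


named = replace_suffix("http://host/ocr", "ocr", "ocr_pickup")
default = replace_suffix("http://host/", "", "pickup")
print(named, default)
```

Slicing with `len(text) - len(suffix)` rather than `-len(suffix)` keeps the empty-suffix case correct, since `text[:-0]` would return an empty string.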


@ -1,16 +0,0 @@
from funcy import compose, identity
from pyinfra.server.normalization import normalize
from pyinfra.server.packing import unpack_fn_pack
from pyinfra.utils.func import starlift
def make_streamable_and_wrap_in_packing_logic(fn, batched):
fn = make_streamable(fn, batched)
fn = unpack_fn_pack(fn)
return fn
def make_streamable(fn, batched):
# FIXME: something broken with batched == True
return compose(normalize, (identity if batched else starlift)(fn))


@ -1,34 +0,0 @@
from abc import ABC, abstractmethod
class StorageAdapter(ABC):
def __init__(self, client):
self.__client = client
@abstractmethod
def make_bucket(self, bucket_name):
raise NotImplementedError
@abstractmethod
def has_bucket(self, bucket_name):
raise NotImplementedError
@abstractmethod
def put_object(self, bucket_name, object_name, data):
raise NotImplementedError
@abstractmethod
def get_object(self, bucket_name, object_name):
raise NotImplementedError
@abstractmethod
def get_all_objects(self, bucket_name):
raise NotImplementedError
@abstractmethod
def clear_bucket(self, bucket_name):
raise NotImplementedError
@abstractmethod
def get_all_object_names(self, bucket_name, prefix=None):
raise NotImplementedError


@ -1,64 +0,0 @@
import logging
from operator import attrgetter
from azure.storage.blob import ContainerClient, BlobServiceClient
from pyinfra.storage.adapters.adapter import StorageAdapter
logger = logging.getLogger(__name__)
logging.getLogger("azure").setLevel(logging.WARNING)
logging.getLogger("urllib3").setLevel(logging.WARNING)
class AzureStorageAdapter(StorageAdapter):
def __init__(self, client):
super().__init__(client=client)
self.__client: BlobServiceClient = self._StorageAdapter__client
def has_bucket(self, bucket_name):
container_client = self.__client.get_container_client(bucket_name)
return container_client.exists()
def __provide_container_client(self, bucket_name) -> ContainerClient:
self.make_bucket(bucket_name)
container_client = self.__client.get_container_client(bucket_name)
return container_client
def make_bucket(self, bucket_name):
container_client = self.__client.get_container_client(bucket_name)
if not container_client.exists():
self.__client.create_container(bucket_name)
def put_object(self, bucket_name, object_name, data):
logger.debug(f"Uploading '{object_name}'...")
container_client = self.__provide_container_client(bucket_name)
blob_client = container_client.get_blob_client(object_name)
blob_client.upload_blob(data, overwrite=True)
def get_object(self, bucket_name, object_name):
logger.debug(f"Downloading '{object_name}'...")
container_client = self.__provide_container_client(bucket_name)
blob_client = container_client.get_blob_client(object_name)
blob_data = blob_client.download_blob()
return blob_data.readall()
def get_all_objects(self, bucket_name):
container_client = self.__provide_container_client(bucket_name)
blobs = container_client.list_blobs()
for blob in blobs:
logger.debug(f"Downloading '{blob.name}'...")
blob_client = container_client.get_blob_client(blob)
blob_data = blob_client.download_blob()
data = blob_data.readall()
yield data
def clear_bucket(self, bucket_name):
logger.debug(f"Clearing Azure container '{bucket_name}'...")
container_client = self.__client.get_container_client(bucket_name)
blobs = container_client.list_blobs()
container_client.delete_blobs(*blobs)
def get_all_object_names(self, bucket_name, prefix=None):
container_client = self.__provide_container_client(bucket_name)
blobs = container_client.list_blobs(name_starts_with=prefix)
return map(attrgetter("name"), blobs)


@ -1,58 +0,0 @@
import io
import logging
from itertools import repeat
from operator import attrgetter
from minio import Minio
from pyinfra.exceptions import DataLoadingFailure
from pyinfra.storage.adapters.adapter import StorageAdapter
logger = logging.getLogger(__name__)
class S3StorageAdapter(StorageAdapter):
def __init__(self, client):
super().__init__(client=client)
self.__client: Minio = self._StorageAdapter__client
def make_bucket(self, bucket_name):
if not self.has_bucket(bucket_name):
self.__client.make_bucket(bucket_name)
def has_bucket(self, bucket_name):
return self.__client.bucket_exists(bucket_name)
def put_object(self, bucket_name, object_name, data):
logger.debug(f"Uploading '{object_name}'...")
data = io.BytesIO(data)
self.__client.put_object(bucket_name, object_name, data, length=data.getbuffer().nbytes)
def get_object(self, bucket_name, object_name):
logger.debug(f"Downloading '{object_name}'...")
response = None
try:
response = self.__client.get_object(bucket_name, object_name)
return response.data
except Exception as err:
raise DataLoadingFailure("Failed getting object from s3 client") from err
finally:
if response:
response.close()
response.release_conn()
def get_all_objects(self, bucket_name):
for obj in self.__client.list_objects(bucket_name, recursive=True):
logger.debug(f"Downloading '{obj.object_name}'...")
yield self.get_object(bucket_name, obj.object_name)
def clear_bucket(self, bucket_name):
logger.debug(f"Clearing S3 bucket '{bucket_name}'...")
objects = self.__client.list_objects(bucket_name, recursive=True)
for obj in objects:
self.__client.remove_object(bucket_name, obj.object_name)
def get_all_object_names(self, bucket_name, prefix=None):
objs = self.__client.list_objects(bucket_name, recursive=True, prefix=prefix)
return map(attrgetter("object_name"), objs)


@ -1,11 +0,0 @@
from azure.storage.blob import BlobServiceClient
from pyinfra.config import CONFIG
def get_azure_client(connection_string=None) -> BlobServiceClient:
if not connection_string:
connection_string = CONFIG.storage.azure.connection_string
return BlobServiceClient.from_connection_string(conn_str=connection_string)

@@ -1,40 +0,0 @@
import re

from minio import Minio

from pyinfra.config import CONFIG
from pyinfra.exceptions import InvalidEndpoint


def parse_endpoint(endpoint):
    # FIXME Greedy matching (.+) since we get random storage names on kubernetes (eg http://red-research-headless:9000)
    # FIXME this has been broken and accepts invalid URLs
    endpoint_pattern = r"(?P<protocol>https?)*(?:://)*(?P<address>(?:(?:(?:\d{1,3}\.){3}\d{1,3})|.+)(?:\:\d+)?)"

    match = re.match(endpoint_pattern, endpoint)

    if not match:
        raise InvalidEndpoint(f"Endpoint {endpoint} is invalid; expected {endpoint_pattern}")

    return {"secure": match.group("protocol") == "https", "endpoint": match.group("address")}


def get_s3_client(params=None) -> Minio:
    """
    Args:
        params: dict like
            {
                "endpoint": <storage_endpoint>
                "access_key": <storage_key>
                "secret_key": <storage_secret>
            }
    """
    if not params:
        params = CONFIG.storage.s3

    return Minio(
        **parse_endpoint(params.endpoint),
        access_key=params.access_key,
        secret_key=params.secret_key,
        region=params.region,
    )
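
Review note: the FIXME comments in the removed parse_endpoint say the regex accepts invalid URLs. A minimal sketch of a stricter parser, assuming the standard library's urllib.parse is acceptable here. The name parse_endpoint_strict and the local InvalidEndpoint class are hypothetical stand-ins (the real exception lives in pyinfra.exceptions), and this is not the approach the commit takes. Like the original, an endpoint without a scheme is treated as non-secure.

```python
from urllib.parse import urlsplit


class InvalidEndpoint(ValueError):
    """Stand-in for pyinfra.exceptions.InvalidEndpoint (assumption)."""


def parse_endpoint_strict(endpoint: str) -> dict:
    """Hypothetical stricter variant of parse_endpoint, for illustration only."""
    # urlsplit only fills in the host when a scheme is present, so default to http.
    candidate = endpoint if "://" in endpoint else f"http://{endpoint}"

    try:
        parts = urlsplit(candidate)
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError as err:
        raise InvalidEndpoint(f"Endpoint {endpoint!r} is invalid") from err

    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise InvalidEndpoint(f"Endpoint {endpoint!r} is invalid")

    # Only a bare host[:port] is accepted: no path, query or fragment.
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        raise InvalidEndpoint(f"Endpoint {endpoint!r} must not contain a path or query")

    address = parts.hostname if port is None else f"{parts.hostname}:{port}"
    return {"secure": parts.scheme == "https", "endpoint": address}
```

The returned dict has the same keys as the original ("secure", "endpoint"), so it could be unpacked into Minio(...) the same way. Note that urlsplit lowercases the hostname and strips brackets from IPv6 literals, which this sketch does not handle.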

@@ -0,0 +1,89 @@
from functools import lru_cache

import requests
from dynaconf import Dynaconf
from kn_utils.logging import logger

from pyinfra.config.loader import validate_settings
from pyinfra.config.validators import (
    multi_tenant_storage_validators,
    storage_validators,
)
from pyinfra.storage.storages.azure import get_azure_storage_from_settings
from pyinfra.storage.storages.s3 import get_s3_storage_from_settings
from pyinfra.storage.storages.storage import Storage
from pyinfra.utils.cipher import decrypt


def get_storage(settings: Dynaconf, tenant_id: str = None) -> Storage:
    """Establishes a storage connection.
    If tenant_id is provided, gets storage connection information from tenant server. These connections are cached.
    Otherwise, gets storage connection information from settings.
    """
    logger.info("Establishing storage connection...")

    if tenant_id:
        logger.info(f"Using tenant storage for {tenant_id}.")
        validate_settings(settings, multi_tenant_storage_validators)

        return get_storage_for_tenant(
            tenant_id,
            settings.storage.tenant_server.endpoint,
            settings.storage.tenant_server.public_key,
        )

    logger.info("Using default storage.")
    validate_settings(settings, storage_validators)

    return storage_dispatcher[settings.storage.backend](settings)


storage_dispatcher = {
    "azure": get_azure_storage_from_settings,
    "s3": get_s3_storage_from_settings,
}


@lru_cache(maxsize=10)
def get_storage_for_tenant(tenant: str, endpoint: str, public_key: str) -> Storage:
    response = requests.get(f"{endpoint}/{tenant}").json()

    maybe_azure = response.get("azureStorageConnection")
    maybe_s3 = response.get("s3StorageConnection")
    assert (maybe_azure or maybe_s3) and not (maybe_azure and maybe_s3), "Only one storage backend can be used."

    if maybe_azure:
        connection_string = decrypt(public_key, maybe_azure["connectionString"])
        backend = "azure"
        storage_info = {
            "storage": {
                "azure": {
                    "connection_string": connection_string,
                    "container": maybe_azure["containerName"],
                },
            }
        }
    elif maybe_s3:
        secret = decrypt(public_key, maybe_s3["secret"])
        backend = "s3"
        storage_info = {
            "storage": {
                "s3": {
                    "endpoint": maybe_s3["endpoint"],
                    "key": maybe_s3["key"],
                    "secret": secret,
                    "region": maybe_s3["region"],
                    "bucket": maybe_s3["bucketName"],
                },
            }
        }
    else:
        raise Exception(f"Unknown storage backend in {response}.")

    storage_settings = Dynaconf()
    storage_settings.update(storage_info)

    storage = storage_dispatcher[backend](storage_settings)

    return storage
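
Review note: get_storage_for_tenant relies on an assert to enforce that the tenant server response carries exactly one of azureStorageConnection and s3StorageConnection. Assertions are skipped when Python runs with -O, and the trailing else branch is unreachable while the assert is active. A possible sketch of an explicit check follows; select_backend is a hypothetical helper, not part of this commit, and only the two response keys are taken from the diff.

```python
def select_backend(response: dict) -> tuple:
    """Return (backend_name, connection_info) if exactly one backend is configured.

    Hypothetical helper for illustration; raises ValueError otherwise.
    """
    candidates = {
        "azure": response.get("azureStorageConnection"),
        "s3": response.get("s3StorageConnection"),
    }
    # Keep only the backends that are actually present (non-empty) in the response.
    present = {name: conn for name, conn in candidates.items() if conn}

    if len(present) != 1:
        found = sorted(present) or "none"
        raise ValueError(f"Expected exactly one storage backend, found {found}.")

    return next(iter(present.items()))
```

With a helper like this, the caller could branch on the returned backend name and build storage_info as before, with the error raised regardless of interpreter flags.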

@@ -1,44 +0,0 @@
import logging

from pyinfra.config import CONFIG
from pyinfra.exceptions import DataLoadingFailure
from pyinfra.storage.adapters.adapter import StorageAdapter
from pyinfra.utils.retry import retry

logger = logging.getLogger(__name__)
logger.setLevel(CONFIG.service.logging_level)


class Storage:
    def __init__(self, adapter: StorageAdapter):
        self.__adapter = adapter

    def make_bucket(self, bucket_name):
        self.__adapter.make_bucket(bucket_name)

    def has_bucket(self, bucket_name):
        return self.__adapter.has_bucket(bucket_name)

    def put_object(self, bucket_name, object_name, data):
        self.__adapter.put_object(bucket_name, object_name, data)

    def get_object(self, bucket_name, object_name):
        return self.__get_object(bucket_name, object_name)

    @retry(DataLoadingFailure)
    def __get_object(self, bucket_name, object_name):
        try:
            return self.__adapter.get_object(bucket_name, object_name)
        except Exception as err:
            logging.error(err)
            raise DataLoadingFailure from err

    def get_all_objects(self, bucket_name):
        return self.__adapter.get_all_objects(bucket_name)

    def clear_bucket(self, bucket_name):
        return self.__adapter.clear_bucket(bucket_name)

    def get_all_object_names(self, bucket_name, prefix=None):
        return self.__adapter.get_all_object_names(bucket_name, prefix=prefix)

@@ -1,26 +0,0 @@
from pyinfra.exceptions import UnknownStorageBackend
from pyinfra.storage.adapters.azure import AzureStorageAdapter
from pyinfra.storage.adapters.s3 import S3StorageAdapter
from pyinfra.storage.clients.azure import get_azure_client
from pyinfra.storage.clients.s3 import get_s3_client
from pyinfra.storage.storage import Storage


def get_azure_storage(config=None):
    return Storage(AzureStorageAdapter(get_azure_client(config)))


def get_s3_storage(config=None):
    return Storage(S3StorageAdapter(get_s3_client(config)))


def get_storage(storage_backend):
    if storage_backend == "s3":
        storage = get_s3_storage()
    elif storage_backend == "azure":
        storage = get_azure_storage()
    else:
        raise UnknownStorageBackend(f"Unknown storage backend '{storage_backend}'.")

    return storage

Some files were not shown because too many files have changed in this diff.