Compare commits


266 Commits

Author SHA1 Message Date
Julius Unverfehrt
a1bfec765c Pull request #43: Image prediction v2 support
Merge in RR/pyinfra from image-prediction-v2-support to 2.0.0

Squashed commit of the following:

commit 37c536324e847357e86dd9b72d1e07ad792ed90f
Merge: 77d1db8 01bfb1d
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jul 11 13:53:56 2022 +0200

    Merge branch '2.0.0' of ssh://git.iqser.com:2222/rr/pyinfra into image-prediction-v2-support

commit 77d1db8e8630de8822c124eb39f4cd817ed1d3e1
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jul 11 13:07:41 2022 +0200

    add operation assignment via config if operation is not defined by caller

commit 36c8ca48a8c6151f713c093a23de110901ba6b02
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jul 11 10:33:34 2022 +0200

    refactor nothing part 2

commit f6cd0ef986802554dd544b9b7a24073d3b3f05b5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jul 11 10:28:49 2022 +0200

    refactor nothing

commit 1e70d49531e89613c70903be49290b94ee014f65
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jul 6 17:42:12 2022 +0200

    enable docker-compose fixture

commit 9fee32cecdd120cfac3e065fb8ad2b4f37b49226
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jul 6 17:40:35 2022 +0200

    added 'multi' key to actual operation configurations

commit 4287f6d9878dd361489b8490eafd06f81df472ce
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jul 6 16:56:12 2022 +0200

    removed debug prints

commit 23a533e8f99222c7e598fb0864f65e9aa3508a3b
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jul 6 16:31:50 2022 +0200

    completed correcting / cleaning upload and download logic with regard to operations and ids. next: remove debug code

commit 33246d1ff94989d2ea70242c7ae2e58afa4d35c1
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jul 6 14:37:17 2022 +0200

    corrected / cleaned upload and download logic with regard to operations and ids

commit 7f2b4e882022c6843cb2f80df202caa495c54ee9
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Tue Jul 5 18:41:07 2022 +0200

    partially decomplected file descriptor manager from concrete and non-generic descriptor code

commit 40b892da17670dae3b8eba1700877c1dcf219852
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Tue Jul 5 09:53:46 2022 +0200

    typo

commit ec4fa8e6f4551ff1f8d4f78c484b7a260f274898
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Tue Jul 5 09:52:41 2022 +0200

    typo

commit 701b43403c328161fd96a73ce388a66035cca348
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jul 4 17:26:53 2022 +0200

    made adjustments for image classification with pyinfra 2.x; added related fixmes

commit 7a794bdcc987631cdc4d89b5620359464e2e018e
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jul 4 13:05:26 2022 +0200

    removed obsolete imports

commit 3fc6a7ef5d0172dbce1c4292d245eced2f378b5a
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jul 4 11:47:12 2022 +0200

    enable docker-compose fixture

commit 36d8d3bc851b06d94cf12a73048a00a67ef79c42
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jul 4 11:46:53 2022 +0200

    renaming

commit 3bf00d11cd041dff325b66f13fcd00d3ce96b8b5
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 30 12:47:57 2022 +0200

    refactoring: added cached pipeline factory

commit 90e735852af2f86e35be845fabf28494de952edb
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jun 29 13:47:08 2022 +0200

    renaming

commit 93b3d4b202b41183ed8cabe193a4bfa03f520787
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jun 29 13:25:03 2022 +0200

    further refactored server setup code: moving and decomplecting

commit 8b2ed83c7ade5bd811cb045d56fbfb0353fa385e
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jun 29 12:53:09 2022 +0200

    refactored server setup code: factored out and decoupled operation registry and prometheus summary registry

... and 6 more commits
2022-07-11 14:17:59 +02:00
Julius Unverfehrt
01bfb1d668 Pull request #40: 2.0.0 documentation
Merge in RR/pyinfra from 2.0.0-documentation to 2.0.0

Squashed commit of the following:

commit 7a794bdcc987631cdc4d89b5620359464e2e018e
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jul 4 13:05:26 2022 +0200

    removed obsolete imports

commit 3fc6a7ef5d0172dbce1c4292d245eced2f378b5a
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jul 4 11:47:12 2022 +0200

    enable docker-compose fixture

commit 36d8d3bc851b06d94cf12a73048a00a67ef79c42
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jul 4 11:46:53 2022 +0200

    renaming

commit 3bf00d11cd041dff325b66f13fcd00d3ce96b8b5
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 30 12:47:57 2022 +0200

    refactoring: added cached pipeline factory

commit 90e735852af2f86e35be845fabf28494de952edb
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jun 29 13:47:08 2022 +0200

    renaming

commit 93b3d4b202b41183ed8cabe193a4bfa03f520787
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jun 29 13:25:03 2022 +0200

    further refactored server setup code: moving and decomplecting

commit 8b2ed83c7ade5bd811cb045d56fbfb0353fa385e
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jun 29 12:53:09 2022 +0200

    refactored server setup code: factored out and decoupled operation registry and prometheus summary registry

commit da2dce762bdd6889165fbb320dc9ee8a0bd089b2
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Tue Jun 28 19:40:04 2022 +0200

    adjusted test target

commit 70df7911b9b92f4b72afd7d4b33ca2bbf136295e
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Tue Jun 28 19:32:38 2022 +0200

    minor refactoring

commit 0937b63dc000346559bde353381304b273244109
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jun 27 13:59:59 2022 +0200

    support for empty operation suffix

commit 5e56917970962a2e69bbd66a324bdb4618c040bd
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jun 27 12:52:36 2022 +0200

    minor refactoring

commit 40665a7815ae5927b3877bda14fb77deef37d667
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jun 27 10:57:04 2022 +0200

    optimization: prefix filtering via storage API for get_all_object_names

commit af0892a899d09023eb0e61eecb63e03dc2fd3b60
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Mon Jun 27 10:55:47 2022 +0200

    topological sorting of definitions by caller hierarchy
2022-07-11 09:09:44 +02:00
Matthias Bisping
94254e1681 Pull request #38: 2.0.0 input output file pattern for download strategy
Merge in RR/pyinfra from 2.0.0-input-output-file-pattern-for-download-strategy to 2.0.0

Squashed commit of the following:

commit c7ce79ebbeace6a8cb7925ed69eda2d7cd2a4783
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Jun 24 12:35:29 2022 +0200

    refactor

commit 80f04e544962760adb2dc60c9dd03ccca22167d6
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Fri Jun 24 11:06:10 2022 +0200

    refactoring of component factory, callback and client-pipeline getter

commit 6c024e1a789e1d55f0739c6846e5c02e8b7c943d
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 20:04:10 2022 +0200

    operations section in config cleaned up; added upload formatter

commit c85800aefc224967cea591c1ec4cf1aaa3ac8215
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 19:22:51 2022 +0200

    refactoring; removed obsolete config entries and code

commit 4be125952d82dc868935c8c73ad87fd8f0bd1d6c
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 19:14:47 2022 +0200

    removed obsolete code

commit ac69a5c8e3f1e2fd7e828a17eeab97984f4f9746
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 18:58:41 2022 +0200

    refactoring: rm dl strat module

commit efd36d0fc4f8f36d267bfa9d35415811fe723ccc
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 18:33:51 2022 +0200

    refactoring: multi dl strat -> downloader, rm single dl strat

commit afffdeb993500a6abdb6fe85a549e3d6e97e9ee7
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 16:39:22 2022 +0200

    operations section in config cleaned up

commit 671129af3e343490e0fb277a2b0329aa3027fd73
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 16:09:16 2022 +0200

    rename prometheus metric name to include service name

commit 932a3e314b382315492aecab95b1f02f2916f8a6
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 14:43:23 2022 +0200

    cleaned up file descr mngr

commit 79350b4ce71fcd095ed6a5e1d3a598ea246fae53
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 12:26:15 2022 +0200

    refactoring WIP: moving response stratgey logic into storage strategy (needs to be refactored as well, later) and file descr mngr. Here the moved code needs to be cleaned up.

commit 7e48c66f0c378b25a433a4034eefdc8a0957e775
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 12:00:48 2022 +0200

    refactoring; removed operation / response folder from output path

commit 8e6cbdaf23c48f6eeb52512b7f382d5727e206d6
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 11:08:37 2022 +0200

    refactoring; added operation -> file pattern mapping to file descr mngr (mainly for self-documentaton purposes)

commit 2c80d7cec0cc171e099e5b13aadd2ae0f9bf4f02
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 10:59:57 2022 +0200

    refactoring: introduced input- and output-file specific methods to file descr mngr

commit ecced37150eaac3008cc1b01b235e5f7135e504b
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 10:43:26 2022 +0200

    refactoring

commit 3828341e98861ff8d63035ee983309ad5064bb30
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Jun 23 10:42:46 2022 +0200

    refactoring

commit 9a7c412523d467af40feb6924823ca89e28aadfe
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 17:04:54 2022 +0200

    add prometheus metric name for default operation

commit d207b2e274ba53b2a21a18c367bb130fb05ee1cd
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 17:02:55 2022 +0200

    Merge config

commit d3fdf36b12d8def18810454765e731599b833bfc
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Jun 22 17:01:12 2022 +0200

    added fixmes / todos

commit f49d0b9cb7764473ef9d127bc5d88525a4a16a23
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 16:28:25 2022 +0200

    update script

... and 47 more commits
2022-06-24 12:59:26 +02:00
Julius Unverfehrt
0d87c60fce parametrize download strategy 2022-06-20 11:43:33 +02:00
Julius Unverfehrt
2dff7d62aa remove duplicate pickup metrics 2022-06-20 10:27:36 +02:00
Julius Unverfehrt
e9424aee04 remove duplicate pickup metrics 2022-06-20 08:29:33 +02:00
Julius Unverfehrt
9d73f42982 remove duplicate pickup metrics 2022-06-20 08:19:51 +02:00
Matthias Bisping
4aef3316a3 renaming 2022-06-15 15:14:17 +02:00
Matthias Bisping
41172d6abb formatting 2022-06-15 15:13:46 +02:00
Matthias Bisping
71af6f703b using function local registry for prometheus 2022-06-15 15:13:10 +02:00
Julius Unverfehrt
a5ff59069a Merge branch '2.0.0' of ssh://git.iqser.com:2222/rr/pyinfra into 2.0.0 2022-06-15 14:56:06 +02:00
Julius Unverfehrt
965d79b08f add prometheus endpoint to analysis server 2022-06-15 14:52:22 +02:00
Matthias Bisping
ca6a2f8d32 fixed docker fixture 2022-06-15 14:14:38 +02:00
Matthias Bisping
86eb3a6f7e enanbled docker fixture 2022-06-15 14:01:56 +02:00
Matthias Bisping
87cf1ad189 removed obsolete imports 2022-06-15 14:00:36 +02:00
Matthias Bisping
7865a767c7 added type hint 2022-06-15 14:00:09 +02:00
Matthias Bisping
3897e44378 Merge branch '2.0.0' of ssh://git.iqser.com:2222/rr/pyinfra into 2.0.0 2022-06-15 12:25:55 +02:00
Matthias Bisping
1558398c56 made object name construction logic part of download strategies 2022-06-15 12:25:27 +02:00
Matthias Bisping
8537d4af50 made object name construction logic part of download strategies 2022-06-15 12:02:41 +02:00
Matthias Bisping
116c2b8924 changed default target file extension 2022-06-15 10:31:05 +02:00
Matthias Bisping
45f04590cc removed obsolete code 2022-06-15 10:25:58 +02:00
Matthias Bisping
bb729b6c26 wrapped retry decortaor, so retry behaviour can be controlled via config and set to a lower value for tests to save time 2022-06-15 10:25:53 +02:00
Matthias Bisping
24be8d3f13 test config options for logging and docker; changed object name construction 2022-06-15 09:59:47 +02:00
Matthias Bisping
147416bfad pin minio and rabbitmq again 2022-06-14 17:05:33 +02:00
Matthias Bisping
c8fb15b9f7 rm retry decorator on clear_bucket, unpin minio 2022-06-14 17:02:22 +02:00
Matthias Bisping
771df7c78d make bucket before running test; rabbitmq 3.9 again 2022-06-14 16:58:27 +02:00
Matthias Bisping
f9972a95a7 fixed minio version 2022-06-14 16:44:46 +02:00
Matthias Bisping
83e1b5f029 added retry to clear bucket 2022-06-14 16:34:10 +02:00
Matthias Bisping
c1b5cbeb51 logging setup changed 2022-06-14 16:28:45 +02:00
Matthias Bisping
4fcc89f938 s3 backend fixture no longer needs to not come last 2022-06-14 15:40:23 +02:00
Matthias Bisping
d1242aee6c enable docker-compose fixture 2022-06-14 15:33:34 +02:00
Matthias Bisping
0442ecd7b3 Merge branch 'file_extensions_and_index_handler_via_config' of ssh://git.iqser.com:2222/rr/pyinfra into file_extensions_and_index_handler_via_config 2022-06-14 15:33:14 +02:00
Matthias Bisping
e64ade3135 added comments to new config params 2022-06-14 15:33:11 +02:00
Julius Unverfehrt
ace919d078 set xfail for broken tests, set docker-compose rabbitmq version to version running on production server 2022-06-14 15:10:08 +02:00
Matthias Bisping
d179fdede6 consumer test runs again...? 2022-06-14 14:17:08 +02:00
Julius Unverfehrt
9b975b759b set xfail for broken tests 2022-06-14 14:06:06 +02:00
Julius Unverfehrt
c033d98acd adjust test for new return type of visitor, add download strategy parameter to config 2022-06-14 12:33:56 +02:00
Julius Unverfehrt
bb7e631f91 introduce flag to distinguish between server side tests and complete integration tests 2022-06-14 11:56:47 +02:00
Julius Unverfehrt
d8b5be9e72 refactoring 2022-06-14 11:26:46 +02:00
Julius Unverfehrt
2954bbc1ad refactoring 2022-06-14 11:21:31 +02:00
Matthias Bisping
a69f613fe6 completed multi download to single response logic. but broke pipeline test again, maybe? 2022-06-14 11:08:03 +02:00
Matthias Bisping
fa3b08aef5 added dependencies 2022-06-13 15:38:14 +02:00
Matthias Bisping
14ab23b2cc fixed bug in operation wrapper returning a tuple instead of an singleton-iterable with a tuple in one of the return-cases. 2022-06-13 15:36:17 +02:00
Matthias Bisping
8a64e5d868 narrowed down the pytest bug: n_items interacts with s3_backend: when n_items has more than one entry, s3_backend must not be the last decorator 2022-06-13 10:36:26 +02:00
Matthias Bisping
051cea3ded found bug in pytest fixture setup causing serve_test to fail (s3 backend fixture function vs s3 backend decorator fixture) 2022-06-13 10:15:33 +02:00
Matthias Bisping
40bc8c2c2c debugging of queue problem, when queue is not consumed by skipping a test configuration WIP 2022-06-13 09:49:10 +02:00
Matthias Bisping
9962651d88 download strategy WIP: added 1 -> n upload logic 2022-06-10 14:06:19 +02:00
Matthias Bisping
249c6203b2 download strategy WIP 2022-06-10 13:26:43 +02:00
Matthias Bisping
3a3c497383 added logic for uploading response files to a folder, which defaults to the name of the operation used 2022-06-10 13:03:40 +02:00
Matthias Bisping
13b6388e5a fixed default identifier type 2022-06-09 16:24:11 +02:00
Matthias Bisping
1940b974b1 added id setting to operatin mocks and thus removed need for random seed of randomly seeded hash in storage item identifier 2022-06-09 15:18:41 +02:00
Matthias Bisping
7e46a66698 fixed bug in stream_pages: return -> yield 2022-06-09 14:58:42 +02:00
Matthias Bisping
5b45f5fa15 added dependency 2022-06-08 14:27:34 +02:00
Matthias Bisping
e43504f08d commented out consumer tests. something is wrong with the fixtures, leading to the tests failing when run together with other tests. Consumer functionality is covered by serve_test.py, but dedicated tests should be restored at a later point. 2022-06-08 11:03:50 +02:00
Matthias Bisping
bffaa0786e readded correct consumer test code which git messed up 2022-06-07 15:48:54 +02:00
Matthias Bisping
8d209d63c7 Merge branch 'master' of ssh://git.iqser.com:2222/rr/pyinfra into fixing_consumer_tests 2022-06-07 15:44:39 +02:00
Matthias Bisping
0dee98b23d consumer test adjustment WIP 2022-06-07 15:11:38 +02:00
Matthias Bisping
91701929e5 adjusted stream buffer test for core-operations taking tuples now 2022-06-07 15:08:58 +02:00
Matthias Bisping
c55e41f2d8 refactoring; tweaked json-blob-parser; added standardization case for decodable strings as storage items 2022-06-07 14:55:40 +02:00
Matthias Bisping
f718b2f7ef parser composer checks for either-type 2022-06-03 16:34:37 +02:00
Matthias Bisping
730bdfb220 refactoring 2022-06-03 16:26:12 +02:00
Matthias Bisping
e48fa85784 replaced trial and error logic for parsing blobs in visitor with parser composer instance 2022-06-03 16:18:56 +02:00
Matthias Bisping
6e5af4092e added parsers and parser composer for clean handling of storage blobs in the context of interpreting downloaded blobs 2022-06-03 16:13:15 +02:00
Matthias Bisping
ea2d3223fb parsing strategy error handling for bytes as not an encoded string 2022-06-03 14:55:04 +02:00
Matthias Bisping
26573eeda3 introduced parsing strategy for storage blobs as part of the queue visitor 2022-06-03 14:49:19 +02:00
Matthias Bisping
7730950b50 cleaning up standardization method for downloaded storage items (WIP) 2022-06-03 14:08:40 +02:00
Matthias Bisping
9a47388017 adjusted param fixtures for serve test 2022-06-03 13:48:33 +02:00
Matthias Bisping
8a2b60a8f5 applied black 2022-06-03 13:42:52 +02:00
Matthias Bisping
9232385dea modified core operations to return metadata for better classification mock test 2022-06-03 13:40:59 +02:00
Matthias Bisping
eb81e96400 added extraction test case (with page index) to serving test 2022-06-03 13:13:45 +02:00
Julius Unverfehrt
e7ee0cda42 add compression for storage item before upload, update script for extraction 2022-06-02 15:49:28 +02:00
Julius Unverfehrt
90f8f9da36 update script for extraction 2022-06-02 15:12:33 +02:00
Julius Unverfehrt
c2d7127a84 add log for Consumer Error, fix page index hash function 2022-06-02 14:15:03 +02:00
Matthias Bisping
bfe8bbb8cb reorganized queue message metadata in request 2022-06-01 16:00:46 +02:00
Matthias Bisping
ecff50ae7c submit endpoint is now 'submit' or 'operation' 2022-06-01 11:46:51 +02:00
Matthias Bisping
7a1b215d69 removed obsolete code 2022-06-01 11:37:47 +02:00
Matthias Bisping
01ce914417 generalizing server setup from operations WIP 2022-06-01 10:58:44 +02:00
Matthias Bisping
2b72174605 Merge branch 'integration_test_in_order_develop_aggregation_stratgey' of ssh://git.iqser.com:2222/rr/pyinfra into integration_test_in_order_develop_aggregation_stratgey 2022-05-31 18:40:47 +02:00
Matthias Bisping
586871a26f added queue message body to analysis input dict 2022-05-31 18:40:40 +02:00
Julius Unverfehrt
187055e5eb adjust mock_client script for conversion 2022-05-31 18:19:41 +02:00
Matthias Bisping
3046b4dc26 misc minor fixes while integrating with pdf2image 2022-05-31 17:58:28 +02:00
Matthias Bisping
dd591bd24b removed obsolete code 2022-05-31 16:23:14 +02:00
Matthias Bisping
12fa52f38c removed obsolete code 2022-05-31 16:22:27 +02:00
Matthias Bisping
043fa1ee53 removed obsolete code 2022-05-31 16:22:11 +02:00
Matthias Bisping
1fa6bbdbc6 removed obsolete code 2022-05-31 16:19:37 +02:00
Matthias Bisping
93747d0f63 test param adjustment 2022-05-31 16:14:48 +02:00
Matthias Bisping
dc4f578e94 reorganized serve-test to use only default-objects instead of test-object 2022-05-31 16:13:47 +02:00
Matthias Bisping
93da0d12bb refactoring: added paramater 'n' to consume_and_publish 2022-05-31 15:30:23 +02:00
Matthias Bisping
a688cbd7bd refactoring 2022-05-31 13:57:20 +02:00
Matthias Bisping
0104395790 removed obsolete code 2022-05-31 13:50:44 +02:00
Matthias Bisping
18a9683ddb operation field in queue message WIP 2022-05-31 13:47:40 +02:00
Matthias Bisping
ae2509dc59 modified visitor and queue manager for 1 -> n (1 request to n response messages) 2022-05-30 13:04:12 +02:00
Matthias Bisping
bf9f6ba8e2 tweaked response upload related logic and repaired visitor tests that were broken by new visitor code written for accomodating the aggregation storage strategy 2022-05-30 12:10:14 +02:00
Matthias Bisping
868a53b23f response file path depending on response metadata and request page index complete 2022-05-25 17:37:18 +02:00
Matthias Bisping
2d1ec16714 modified serve test to use components from fixtures; response file path depending on response metadata and request page index WIP 2022-05-25 16:56:08 +02:00
Matthias Bisping
9e2ed6a9f9 fix: data was doubly encoded and hence always triggering the immediate upload path 2022-05-24 14:51:24 +02:00
Matthias Bisping
ab56c9a173 added todo comment for modifying acknowledging logic at some point to allow input buffering to take effect 2022-05-24 14:39:39 +02:00
Matthias Bisping
298d8d3e2c metadata as part of storage item test works 2022-05-23 15:59:56 +02:00
Matthias Bisping
0842ec0d91 metadata as part of storage item WIP 2022-05-23 15:36:18 +02:00
Matthias Bisping
c944cdb1a7 refactoring: splitting source data from encoded data in data fixture 2022-05-23 13:53:50 +02:00
Matthias Bisping
7b998cdaf6 refactoring 2022-05-23 13:14:18 +02:00
Matthias Bisping
426967ee46 refactoring 2022-05-23 11:39:30 +02:00
Matthias Bisping
54ca81d577 moved parameter combination based test skipping into operation factory 2022-05-23 11:28:35 +02:00
Matthias Bisping
13888524fb refactoring 2022-05-23 10:40:28 +02:00
Matthias Bisping
a7ffaeb18f changed return value of file name listing function for storages to return strings of filenames rather than tuples of bucket name and file name 2022-05-23 10:19:07 +02:00
Matthias Bisping
c97393f690 skipping undefined combinations for analysis_task 2022-05-23 10:07:28 +02:00
Matthias Bisping
7ff466e0ea added test for empty return-data operation, like classification 2022-05-23 09:58:43 +02:00
Matthias Bisping
02b0009219 data AND metadata is being uploaded instead of data only 2022-05-18 16:35:18 +02:00
Matthias Bisping
cf13f67394 refactoring: serve_test can now be run with input_data_items like image, pdf etc 2022-05-18 10:56:14 +02:00
Matthias Bisping
0ab86206ec fixed bug introduced by overwritng 'body' as explanatory variable within try-block, which resultet in republish() receiving parsed body, instead of body as bytes 2022-05-18 10:32:54 +02:00
Matthias Bisping
35542f994c integration test for lazy pipeline 2022-05-18 09:24:12 +02:00
Matthias Bisping
fb712af7c6 storag aggregation strategy working in first version 2022-05-17 22:30:54 +02:00
Matthias Bisping
6cb13051eb fixed following bugs:
- upper() did yield instead of return
 - metdadata was not repeated when zipping with results generator
 - since test metadata was empty dict,  target data was therefore empty always, since results were zipped with {}
 - hence added check for target lengths > 0
 - fixed return value of queued stream function dispatcher; only returned first item of 1 -> n results
2022-05-17 21:48:16 +02:00
Matthias Bisping
456cb4157d refactoring: move 2022-05-17 17:27:58 +02:00
Matthias Bisping
6945760045 refactoring: move 2022-05-17 15:59:04 +02:00
Matthias Bisping
47f1d77c03 renaming 2022-05-17 12:12:43 +02:00
Matthias Bisping
fb325ce43d renaming 2022-05-17 12:10:24 +02:00
Matthias Bisping
5590669939 pipelin laziness test works again 2022-05-17 10:44:43 +02:00
Matthias Bisping
9c262e7138 non-rest pipeline works again 2022-05-17 10:27:32 +02:00
Matthias Bisping
e5a4e7e994 applied black 2022-05-16 15:00:09 +02:00
Matthias Bisping
89f562aa71 refactoring: move 2022-05-16 14:58:19 +02:00
Matthias Bisping
1074f44b30 no buffer capacity test; commented out probably dead codde -- removing next 2022-05-16 14:34:37 +02:00
Matthias Bisping
96bf831b00 refactoring: move 2022-05-16 13:31:11 +02:00
Matthias Bisping
5d2b71d647 target data fixture and test for flat stream buffer on different data 2022-05-16 13:21:31 +02:00
Matthias Bisping
d12124b2d5 refactoring 2022-05-16 13:04:37 +02:00
Matthias Bisping
7adbdefb4e refactoring: skipping invalid parameter combinations 2022-05-16 12:18:04 +02:00
Matthias Bisping
a1c292a485 refactoring: pulled core operation taking only data out from operation taking data and metadata 2022-05-16 11:53:52 +02:00
Matthias Bisping
948575d199 renaming 2022-05-16 11:43:48 +02:00
Matthias Bisping
2070f300c9 refactoring: queued stream function returns first of generator 2022-05-16 10:28:53 +02:00
Matthias Bisping
092a0e2964 renaming 2022-05-13 17:16:27 +02:00
Matthias Bisping
40777ae609 refactoring: simplyfing lazy processor to queued function WIP 2022-05-13 16:59:50 +02:00
Matthias Bisping
2434e0ea55 refactoring; stream buffer tests 2022-05-13 16:38:20 +02:00
Matthias Bisping
08ad83b6a5 renaming 2022-05-13 15:04:01 +02:00
Matthias Bisping
9870aa38d1 renaming 2022-05-13 15:02:05 +02:00
Matthias Bisping
3b7605772e refactoring 2022-05-13 12:42:07 +02:00
Matthias Bisping
1acf16dc91 refactoring: flat stream buffer now takes over stream buffer flushing 2022-05-13 12:29:18 +02:00
Matthias Bisping
c09e5df23e refactoring: introduced flat stream buffer class 2022-05-13 12:13:59 +02:00
Matthias Bisping
bfdce62ccf refactoring; renaming 2022-05-12 19:46:29 +02:00
Matthias Bisping
1552cd10cc refactoring: further simplififyied queue consuming 2022-05-12 19:26:22 +02:00
Matthias Bisping
8b0c2d4e07 refactoring: further simplified queue consuming function; added one -> many test fixture param 2022-05-12 17:55:45 +02:00
Matthias Bisping
461c0fe6a6 refactoring: simplified queue consuming function 2022-05-12 17:29:20 +02:00
Matthias Bisping
9b5fc4ff77 refactoring: made queued buffer coupler into a function 2022-05-12 17:19:52 +02:00
Matthias Bisping
e3793e5c7c refactoring: split stream processor into two functions; moved queue streaming Nothing check from caller to stream function 2022-05-12 17:12:26 +02:00
Matthias Bisping
1a04dfb426 renaming 2022-05-12 14:42:42 +02:00
Matthias Bisping
e151d2005b refactoring 2022-05-11 17:01:56 +02:00
Matthias Bisping
da2572b8be refactoring 2022-05-11 16:59:41 +02:00
Matthias Bisping
eb8ace4ddd refactoring 2022-05-11 16:38:36 +02:00
Matthias Bisping
096068367f fixed recursion issue 2022-05-11 12:40:03 +02:00
Matthias Bisping
7bd35dce67 refactoring: introduced queue-buffer-coupler; introduced recursion depth issie -- fixing next 2022-05-11 12:39:36 +02:00
Matthias Bisping
1eb4dbc657 refactoring: split queue processor into output buffer and queue streamer 2022-05-11 10:54:27 +02:00
Matthias Bisping
ccf7a7379d refactoring: introduced queue wrapper 2022-05-11 10:44:50 +02:00
Matthias Bisping
b1a318872e refactoring 2022-05-11 10:16:36 +02:00
Matthias Bisping
abc56e6d9f refactoring: factored out queue processor fom lazy bufferizer 2022-05-11 10:07:34 +02:00
Matthias Bisping
c8579a8ad0 renaming 2022-05-10 12:49:36 +02:00
Matthias Bisping
c68e19e6e4 refactoring rest stream processor 2022-05-10 12:40:19 +02:00
Matthias Bisping
de0deaa2f4 renaming 2022-05-10 12:19:42 +02:00
Matthias Bisping
83ce7692e6 renaming; adjusted and added tests for lazy bufferize (formerlay on-demand processor) 2022-05-10 12:18:20 +02:00
Matthias Bisping
949413af4a fixed bug in compute_next of on demand processor that skipped all but the first return value of 1 -> n functions 2022-05-10 11:07:58 +02:00
Matthias Bisping
3dba896038 removed obsolete imports 2022-05-09 14:52:33 +02:00
Matthias Bisping
453281d48d re-order imports 2022-05-09 14:52:07 +02:00
Matthias Bisping
5b913983eb refactoring: move; added null value param to bufferize 2022-05-09 14:50:54 +02:00
Matthias Bisping
c2ed6d78b7 reintroduced buffering wrapper with Nothing item as flushing signal. buffering is controlled via chunking in the REST receiver on client side 2022-05-09 14:23:12 +02:00
Matthias Bisping
f29bd7d4d3 removed need for bufferize wrapper by composing with first(chunks(...)) and applying to on-demand processor execution chain; broke mock pipeline, fixing next 2022-05-09 11:06:14 +02:00
Matthias Bisping
ec620abf54 lazy pipeline test 2022-05-09 01:17:53 +02:00
Matthias Bisping
5b30a32fff added client pipeline without webserver backend 2022-05-09 00:43:21 +02:00
Matthias Bisping
c092e7bcab refactoring; server pipeline WIP 2022-05-08 18:36:15 +02:00
Matthias Bisping
1d09337378 endpoint suffixes passed to stream processor 2022-05-08 17:28:55 +02:00
Matthias Bisping
8c1ad64464 refactoring; additional buffer test 2022-05-07 00:29:58 +02:00
Matthias Bisping
1e21913e37 processor test; refactoring 2022-05-07 00:23:22 +02:00
Matthias Bisping
132a1a1b50 renaming 2022-05-06 23:46:21 +02:00
Matthias Bisping
f99d779c29 refactoring 2022-05-06 23:39:29 +02:00
Matthias Bisping
f428372511 renaming 2022-05-06 23:19:25 +02:00
Matthias Bisping
54359501f9 renaming 2022-05-06 23:19:03 +02:00
Matthias Bisping
cbba561116 replaced eager endpoint in sender test 2022-05-06 23:18:37 +02:00
Matthias Bisping
1daaf2b904 refactoring 2022-05-06 23:13:48 +02:00
Matthias Bisping
866df2dee3 renaming 2022-05-06 23:07:46 +02:00
Matthias Bisping
962b398a9c refactoring: added processor adapter and streamer 2022-05-06 22:49:59 +02:00
Matthias Bisping
3ae4fd8986 refactoring / formatting 2022-05-06 19:57:33 +02:00
Matthias Bisping
98af600787 fixed result validation case for Nothing value 2022-05-06 19:57:05 +02:00
Matthias Bisping
ec650464a8 bufferize test 2022-05-06 19:56:43 +02:00
Matthias Bisping
ed69011bf6 removed comment out lines 2022-05-06 19:16:17 +02:00
Matthias Bisping
0fc3db2fae test params shown with names in pytest log 2022-05-06 19:15:44 +02:00
Matthias Bisping
53eee983c4 added result validaton for processor 2022-05-06 19:15:17 +02:00
Matthias Bisping
5760a6f354 fixed buffer issue: buffer can overflow when called lazily, for some reason. looking into it next. 2022-05-06 19:15:02 +02:00
Matthias Bisping
dca3eaaa54 fixed buffering and result streaming: all items are yielded individually and computed on demand now 2022-05-06 12:14:09 +02:00
Matthias Bisping
1b04c46853 removed code related to eager endpoint 2022-05-05 17:25:42 +02:00
Matthias Bisping
68c24c863f removed eager endpoint (/process) 2022-05-05 17:21:47 +02:00
Matthias Bisping
4a3ac150cf fixed initial computation queue state 2022-05-05 16:42:43 +02:00
Matthias Bisping
373c38113f refactoring 2022-05-05 16:26:18 +02:00
Matthias Bisping
b58a9d11c3 refactoring: added processor class 2022-05-05 16:08:59 +02:00
Matthias Bisping
bd5fe82e06 refactoring: rename 2022-05-05 15:47:25 +02:00
Matthias Bisping
b62652957a refactoring: pipeline 2022-05-05 15:15:05 +02:00
Matthias Bisping
7a3bb9334b removed obsolete code 2022-05-05 14:26:46 +02:00
Matthias Bisping
7ccad043f4 refactoring: rename 2022-05-05 14:25:24 +02:00
Matthias Bisping
4fd6a5aa2a docstring update 2022-05-05 12:52:40 +02:00
Matthias Bisping
a7a44267f1 refactoring: added pipeline class 2022-05-05 12:49:49 +02:00
Matthias Bisping
685edaa62f removed obsolete imports 2022-05-05 11:57:29 +02:00
Matthias Bisping
82d7b7f8cb refactoring: simplify pickup endpoint extraction 2022-05-05 11:56:06 +02:00
Matthias Bisping
d4ffd75e26 added rest callback interpreter 2022-05-05 11:45:52 +02:00
Matthias Bisping
7a1db32c3b refactoring: rename 2022-05-05 11:22:17 +02:00
Matthias Bisping
456fb1db06 refactoring: move 2022-05-05 10:23:51 +02:00
Matthias Bisping
e221b00933 refactoring: Sender, Receiver 2022-05-05 10:17:38 +02:00
Matthias Bisping
24313241a8 sync 2022-05-04 17:46:35 +02:00
Matthias Bisping
ef0e805223 sender baseclass 2022-05-04 17:05:44 +02:00
Matthias Bisping
531ff8d3e0 refactoring; fixed sender 2022-05-04 16:57:08 +02:00
Matthias Bisping
14d83abd72 refactoring; rest sender 2022-05-04 16:46:24 +02:00
Matthias Bisping
00ea224379 refactoring; packer test; sender test 2022-05-04 16:41:29 +02:00
Matthias Bisping
625552ec7c packing and bundling test params 2022-05-04 15:19:52 +02:00
Matthias Bisping
8afd87e44f added packing and bundling test 2022-05-04 15:18:15 +02:00
Matthias Bisping
1d70fb628e removed debug prints 2022-05-04 14:58:22 +02:00
Matthias Bisping
f32004c3a4 added packer test 2022-05-04 14:57:40 +02:00
Matthias Bisping
a301876ab9 refactoring: metadata argument as iterable instead of dict 2022-05-04 14:55:04 +02:00
Matthias Bisping
35045128b4 renaming 2022-05-04 13:51:05 +02:00
Matthias Bisping
463e1b2024 corrected imports 2022-05-04 12:14:30 +02:00
Matthias Bisping
630ed51b27 refactoring 2022-05-04 10:52:19 +02:00
Matthias Bisping
a4079f6710 refactoring 2022-05-04 10:45:13 +02:00
Matthias Bisping
cf0a877569 refactoring 2022-05-04 10:37:46 +02:00
Matthias Bisping
7d8659f257 topological sorting of definitions by caller hierarchy 2022-05-03 18:21:51 +02:00
Matthias Bisping
51a6bf9875 refactoring 2022-05-03 18:06:28 +02:00
Matthias Bisping
85d7ad52dc refactoring 2022-05-03 18:02:53 +02:00
Matthias Bisping
c00973b676 removed debug prints 2022-05-03 17:56:33 +02:00
Matthias Bisping
16fa992cae pickup endpoint working 2022-05-03 17:52:13 +02:00
Matthias Bisping
c315247625 parametrized number of pages for pdf fixture 2022-05-03 15:47:11 +02:00
Matthias Bisping
29fb0dda30 improved 1 -> n test and explanation 2022-05-03 15:43:30 +02:00
Matthias Bisping
ae39ccc8e2 renaming 2022-05-03 14:32:35 +02:00
Matthias Bisping
8575567890 refactoring 2022-05-03 14:25:01 +02:00
Matthias Bisping
92190a42f0 refactoring: move 2022-05-03 09:56:12 +02:00
Matthias Bisping
ea9b405d2a signature harminization for 1 -> 1 and 1 -> n completed 2022-05-02 15:50:14 +02:00
Matthias Bisping
77f23a2185 data nesting harmonized for 1 -> 1 and 1 -> n; pdf page extraction (1 -> n) working for non-batched usage 2022-04-29 17:22:54 +02:00
Matthias Bisping
3172a00aaa refactoring 2022-04-29 17:08:34 +02:00
Matthias Bisping
501f0bd5fc signature harminization for 1 -> 1 and 1 -> n WIP: batched and not batched working again 2022-04-29 16:46:25 +02:00
Matthias Bisping
fd57261631 signature harminization for 1 -> 1 and 1 -> n WIP 2022-04-29 15:43:20 +02:00
Matthias Bisping
23070f3480 changed operation signature to return iterables for 1 -> n compatibility 2022-04-29 13:34:27 +02:00
Matthias Bisping
2550a0eff2 refactoring 2022-04-29 12:34:54 +02:00
Matthias Bisping
f053a072d6 signatures for services updated 2022-04-29 12:01:13 +02:00
Matthias Bisping
00276dbcc7 added batching wrapper for internally batching functions 2022-04-29 10:35:55 +02:00
Matthias Bisping
bc0e9ed643 renaming 2022-04-29 09:51:41 +02:00
Matthias Bisping
940bc3a689 changed buffering function behaviour: applies function to buffer. function needs to be lifted from the outside if single items are to be processed. 2022-04-28 21:58:38 +02:00
Matthias Bisping
a999ce2c3b refactoring 2022-04-28 21:36:42 +02:00
Matthias Bisping
e8315ffea9 signature of process_fn changed. Now returns {'data': ..., 'metadata'} instead of {'data': ...} 2022-04-28 21:12:25 +02:00
Matthias Bisping
087b5af929 refactoring 2022-04-28 18:28:35 +02:00
Matthias Bisping
f47d458217 send json instead of data 2022-04-28 18:05:07 +02:00
Matthias Bisping
0d503d1c1d refactoring 2022-04-28 14:30:50 +02:00
Matthias Bisping
3b0d0868b9 refactoring: method dispatch via peekable rather than special empty data request 2022-04-28 13:01:01 +02:00
Matthias Bisping
da84ff5112 buffer size constrained by assertion 2022-04-27 18:45:14 +02:00
Matthias Bisping
c9f26000d7 test parametrization for number of input items and buffer size 2022-04-27 18:38:43 +02:00
Matthias Bisping
67c4bac4b7 sync 2022-04-27 17:45:48 +02:00
Matthias Bisping
ab5839a126 refactoring; extended partial posting by image payload data 2022-04-27 16:29:52 +02:00
Matthias Bisping
fa4e5e5e0e refactoring; made test dynamic relative to import 2022-04-27 14:15:52 +02:00
Matthias Bisping
e58addf8c4 refactoring 2022-04-27 14:09:18 +02:00
Matthias Bisping
00e21b00ba refactoring 2022-04-27 14:02:41 +02:00
Matthias Bisping
fabc78efce refactoring; formatting 2022-04-27 13:41:48 +02:00
Matthias Bisping
90af62ed2c refactoring 2022-04-27 13:38:55 +02:00
Matthias Bisping
2af648254e formatting 2022-04-27 10:13:46 +02:00
Matthias Bisping
9e8172427c refactoring 2022-04-27 10:08:57 +02:00
Matthias Bisping
e903c69a07 refactoring 2022-04-27 09:13:19 +02:00
Matthias Bisping
7419612c21 partial request by manual receiver buffering V1 2022-04-26 19:54:37 +02:00
Julius Unverfehrt
656bc7cd63 explore partial responses 2022-04-26 16:34:58 +02:00
Julius Unverfehrt
f6ca9c9ac5 explore partial responses 2022-04-26 16:17:50 +02:00
Matthias Bisping
5a948ef7ad partial response test WIP 2022-04-26 16:15:54 +02:00
Matthias Bisping
4078b3e4ec signatures for services 2022-04-26 14:37:12 +02:00
Matthias Bisping
b7882d4452 refactoring: introduced sub-conftest files 2022-04-26 13:13:26 +02:00
Matthias Bisping
01daa634ec refactoring: made docker-compose api call non-autousing 2022-04-26 13:01:15 +02:00
Matthias Bisping
64624a3fd3 refactoring: moved conftest up a dir 2022-04-26 12:52:24 +02:00
Matthias Bisping
afd67d87a6 updated test container dockerfile for new location of tests directory 2022-04-26 12:48:15 +02:00
Matthias Bisping
37881da08e restructuring: moved test out of module scope 2022-04-26 12:45:12 +02:00
201 changed files with 5356 additions and 37464 deletions

106
.dockerignore Normal file

@@ -0,0 +1,106 @@
data
/build_venv/
/.venv/
/misc/
/incl/image_service/test/
/scratch/
/bamboo-specs/
README.md
Dockerfile
*idea
*misc
*egg-innfo
*pycache*
# Git
.git
.gitignore
# CI
.codeclimate.yml
.travis.yml
.taskcluster.yml
# Docker
.docker
# Byte-compiled / optimized / DLL files
__pycache__/
*/__pycache__/
*/*/__pycache__/
*/*/*/__pycache__/
*.py[cod]
*/*.py[cod]
*/*/*.py[cod]
*/*/*/*.py[cod]
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/**
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml
# Translations
*.mo
*.pot
# Django stuff:
*.log
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Virtual environment
.env/
.venv/
#venv/
# PyCharm
.idea
# Python mode for VIM
.ropeproject
*/.ropeproject
*/*/.ropeproject
*/*/*/.ropeproject
# Vim swap files
*.swp
*/*.swp
*/*/*.swp
*/*/*/*.swp

2
.dvc/.gitignore vendored

@@ -1,2 +0,0 @@
/config.local
/cache


@@ -1,5 +0,0 @@
[core]
remote = azure
['remote "azure"']
url = azure://pyinfra-dvc
connection_string =


@@ -1,3 +0,0 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore

57
.gitignore vendored

@@ -1,53 +1,10 @@
# Environments
.env
.venv
env/
venv/
.DS_Store
# Project folders
*.vscode/
.idea
*_app
*pytest_cache
*joblib
*tmp
*profiling
*logs
*docker
*drivers
*bamboo-specs/target
.coverage
data
__pycache__
data/
build_venv
reports
# Python specific files
__pycache__/
*.py[cod]
*.ipynb
*.ipynb_checkpoints
# file extensions
*.log
*.csv
*.pkl
*.profile
*.cbm
*.egg-info
# temp files
*.swp
*~
*.un~
# keep files
!notebooks/*.ipynb
# keep folders
!secrets
!data/*
!drivers
# ignore files
bamboo.yml
pyinfra.egg-info
bamboo-specs/target
.pytest_cache
/.coverage
.idea


@@ -1,23 +0,0 @@
# CI for services, check gitlab repo for python package CI
include:
- project: "Gitlab/gitlab"
ref: main
file: "/ci-templates/research/python_pkg-test-build-release.gitlab-ci.yml"
# set project variables here
variables:
NEXUS_PROJECT_DIR: research # subfolder in Nexus docker-gin where your container will be stored
IMAGENAME: $CI_PROJECT_NAME # if the project URL is gitlab.example.com/group-name/project-1, CI_PROJECT_NAME is project-1
REPORTS_DIR: reports
FF_USE_FASTZIP: "true" # enable fastzip - a faster zip implementation that also supports level configuration.
ARTIFACT_COMPRESSION_LEVEL: default # can also be set to fastest, fast, slow and slowest. If just enabling fastzip is not enough try setting this to fastest or fast.
CACHE_COMPRESSION_LEVEL: default # same as above, but for caches
# TRANSFER_METER_FREQUENCY: 5s # will display transfer progress every 5 seconds for artifacts and remote caches. For debugging purposes.
############
# UNIT TESTS
unit-tests:
variables:
###### UPDATE/EDIT ######
UNIT_TEST_DIR: "tests/unit_test"


@@ -1,55 +0,0 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
exclude: ^(docs/|notebooks/|data/|src/configs/|tests/|.hooks/)
default_language_version:
python: python3.10
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
name: Check Gitlab CI (unsafe)
args: [--unsafe]
files: .gitlab-ci.yml
- id: check-yaml
exclude: .gitlab-ci.yml
- id: check-toml
- id: detect-private-key
- id: check-added-large-files
args: ['--maxkb=10000']
- id: check-case-conflict
- id: mixed-line-ending
- repo: https://github.com/pre-commit/mirrors-pylint
rev: v3.0.0a5
hooks:
- id: pylint
language: system
args:
- --disable=C0111,R0903
- --max-line-length=120
- repo: https://github.com/pre-commit/mirrors-isort
rev: v5.10.1
hooks:
- id: isort
args:
- --profile black
- repo: https://github.com/psf/black
rev: 24.10.0
hooks:
- id: black
# exclude: ^(docs/|notebooks/|data/|src/secrets/)
args:
- --line-length=120
- repo: https://github.com/compilerla/conventional-pre-commit
rev: v3.6.0
hooks:
- id: conventional-pre-commit
pass_filenames: false
stages: [commit-msg]
# args: [] # optional: list of Conventional Commits types to allow e.g. [feat, fix, ci, chore, test]


@@ -1 +0,0 @@
3.10

19
Dockerfile Executable file

@@ -0,0 +1,19 @@
FROM python:3.8
# Use a virtual environment.
RUN python -m venv /app/venv
ENV PATH="/app/venv/bin:$PATH"
# Upgrade pip.
RUN python -m pip install --upgrade pip
# Make a directory for the service files and copy the service repo into the container.
WORKDIR /app/service
COPY . .
# Install module & dependencies
RUN python3 -m pip install -e .
RUN python3 -m pip install -r requirements.txt
# Run the service loop.
CMD ["python", "src/serve.py"]

19
Dockerfile_tests Executable file

@@ -0,0 +1,19 @@
ARG BASE_ROOT="nexus.iqser.com:5001/red/"
ARG VERSION_TAG="dev"
FROM ${BASE_ROOT}pyinfra:${VERSION_TAG}
EXPOSE 5000
EXPOSE 8080
RUN python3 -m pip install coverage
# Make a directory for the service files and copy the service repo into the container.
WORKDIR /app/service
COPY . .
# Install module & dependencies
RUN python3 -m pip install -e .
RUN python3 -m pip install -r requirements.txt
CMD coverage run -m pytest test/ -x && coverage report -m && coverage xml


@@ -1,85 +0,0 @@
.PHONY: \
poetry in-project-venv dev-env use-env install install-dev tests \
update-version sync-version-with-git \
docker docker-build-run docker-build docker-run \
docker-rm docker-rm-container docker-rm-image \
pre-commit get-licenses prep-commit \
docs sphinx_html sphinx_apidoc
.DEFAULT_GOAL := run
export DOCKER=docker
export DOCKERFILE=Dockerfile
export IMAGE_NAME=rule_engine-image
export CONTAINER_NAME=rule_engine-container
export HOST_PORT=9999
export CONTAINER_PORT=9999
export PYTHON_VERSION=python3.8
# all commands should be executed in the root dir or the project,
# specific environments should be deactivated
poetry: in-project-venv use-env dev-env
in-project-venv:
poetry config virtualenvs.in-project true
use-env:
poetry env use ${PYTHON_VERSION}
dev-env:
poetry install --with dev
install:
poetry add $(pkg)
install-dev:
poetry add --dev $(pkg)
requirements:
poetry export --without-hashes --output requirements.txt
update-version:
poetry version prerelease
sync-version-with-git:
git pull -p && poetry version $(git rev-list --tags --max-count=1 | git describe --tags --abbrev=0)
docker: docker-rm docker-build-run
docker-build-run: docker-build docker-run
docker-build:
$(DOCKER) build \
--no-cache --progress=plain \
-t $(IMAGE_NAME) -f $(DOCKERFILE) .
docker-run:
$(DOCKER) run -it --rm -p $(HOST_PORT):$(CONTAINER_PORT)/tcp --name $(CONTAINER_NAME) $(IMAGE_NAME) python app.py
docker-rm: docker-rm-container docker-rm-image
docker-rm-container:
-$(DOCKER) rm $(CONTAINER_NAME)
docker-rm-image:
-$(DOCKER) image rm $(IMAGE_NAME)
tests:
poetry run pytest ./tests
prep-commit:
docs get-license sync-version-with-git update-version pre-commit
pre-commit:
pre-commit run --all-files
get-licenses:
pip-licenses --format=json --order=license --with-urls > pkg-licenses.json
docs: sphinx_apidoc sphinx_html
sphinx_html:
poetry run sphinx-build -b html docs/source/ docs/build/html -E -a
sphinx_apidoc:
poetry run sphinx-apidoc -o ./docs/source/modules ./src/rule_engine

245
README.md

@@ -1,220 +1,103 @@
# PyInfra
# Infrastructure to deploy Research Projects
1. [ About ](#about)
2. [ Configuration ](#configuration)
3. [ Queue Manager ](#queue-manager)
4. [ Module Installation ](#module-installation)
5. [ Scripts ](#scripts)
6. [ Tests ](#tests)
7. [ Opentelemetry protobuf dependency hell ](#opentelemetry-protobuf-dependency-hell)
## About
Shared library for the research team, containing code related to infrastructure and communication with other services.
Offers a simple interface for processing data and sending responses via AMQP, monitoring via Prometheus and storage
access via S3 or Azure. It also exports traces via OpenTelemetry for queue messages and webserver requests.
To start, see the [complete example](pyinfra/examples.py) which shows how to use all features of the service and can be
imported and used directly for default research service pipelines (data ID in message, download data from storage,
upload result while offering Prometheus monitoring, /health and /ready endpoints and multi tenancy support).
The Infrastructure expects to be deployed in the same Pod / local environment as the analysis container and handles all outbound communication.
## Configuration
Configuration is done via `Dynaconf`. This means that you can use environment variables, a `.env` file or `.toml`
file(s) to configure the service. You can also combine these methods. The precedence is
`environment variables > .env > .toml`. It is recommended to load settings with the provided
[`load_settings`](pyinfra/config/loader.py) function, which you can combine with the provided
[`parse_args`](pyinfra/config/loader.py) function. This allows you to load settings from a `.toml` file or a folder with
`.toml` files and override them with environment variables.
A configuration is located in `/config.yaml`. All relevant variables can be configured via exporting environment variables.
The following table shows all necessary settings. You can find a preconfigured settings file for this service in
bitbucket. These are the complete settings; you only need all of them if you use all features of the service, as described in
the [complete example](pyinfra/examples.py).
| Environment Variable | Default | Description |
|-------------------------------|--------------------------------|-----------------------------------------------------------------------|
| LOGGING_LEVEL_ROOT | DEBUG | Logging level for service logger |
| PROBING_WEBSERVER_HOST | "0.0.0.0" | Probe webserver address |
| PROBING_WEBSERVER_PORT | 8080 | Probe webserver port |
| PROBING_WEBSERVER_MODE | production | Webserver mode: {development, production} |
| RABBITMQ_HOST | localhost | RabbitMQ host address |
| RABBITMQ_PORT | 5672 | RabbitMQ host port |
| RABBITMQ_USERNAME | user | RabbitMQ username |
| RABBITMQ_PASSWORD | bitnami | RabbitMQ password |
| RABBITMQ_HEARTBEAT | 7200 | Controls AMQP heartbeat timeout in seconds |
| REQUEST_QUEUE | request_queue | Requests to service |
| RESPONSE_QUEUE | response_queue | Responses by service |
| DEAD_LETTER_QUEUE | dead_letter_queue | Messages that failed to process |
| ANALYSIS_ENDPOINT | "http://127.0.0.1:5000" | Endpoint for analysis container |
| STORAGE_BACKEND | s3 | The type of storage to use {s3, azure} |
| STORAGE_BUCKET | "pyinfra-test-bucket" | The bucket / container to pull files specified in queue requests from |
| STORAGE_ENDPOINT | "http://127.0.0.1:9000" | Endpoint for s3 storage |
| STORAGE_KEY | root | User for s3 storage |
| STORAGE_SECRET | password | Password for s3 storage |
| STORAGE_AZURECONNECTIONSTRING | "DefaultEndpointsProtocol=..." | Connection string for Azure storage |
| Environment Variable | Internal / .toml Name | Description |
| ------------------------------------------ | --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| LOGGING\_\_LEVEL | logging.level | Log level |
| DYNAMIC_TENANT_QUEUES\_\_ENABLED | dynamic_tenant_queues.enabled | Enable the mode in which per-tenant queues are created dynamically |
| METRICS\_\_PROMETHEUS\_\_ENABLED | metrics.prometheus.enabled | Enable Prometheus metrics collection |
| METRICS\_\_PROMETHEUS\_\_PREFIX | metrics.prometheus.prefix | Prefix for Prometheus metrics (e.g. {product}-{service}) |
| WEBSERVER\_\_HOST | webserver.host | Host of the webserver (offering e.g. /prometheus, /ready and /health endpoints) |
| WEBSERVER\_\_PORT | webserver.port | Port of the webserver |
| RABBITMQ\_\_HOST | rabbitmq.host | Host of the RabbitMQ server |
| RABBITMQ\_\_PORT | rabbitmq.port | Port of the RabbitMQ server |
| RABBITMQ\_\_USERNAME | rabbitmq.username | Username for the RabbitMQ server |
| RABBITMQ\_\_PASSWORD | rabbitmq.password | Password for the RabbitMQ server |
| RABBITMQ\_\_HEARTBEAT | rabbitmq.heartbeat | Heartbeat for the RabbitMQ server |
| RABBITMQ\_\_CONNECTION_SLEEP | rabbitmq.connection_sleep | Sleep interval during message processing. Must be a divisor of the heartbeat and should not be too large, since queue interactions (such as receiving new messages) only happen at these intervals; this is also the minimum time the service needs to process a message. |
| RABBITMQ\_\_INPUT_QUEUE | rabbitmq.input_queue | Name of the input queue in single queue setting |
| RABBITMQ\_\_OUTPUT_QUEUE | rabbitmq.output_queue | Name of the output queue in single queue setting |
| RABBITMQ\_\_DEAD_LETTER_QUEUE | rabbitmq.dead_letter_queue | Name of the dead letter queue in single queue setting |
| RABBITMQ\_\_TENANT_EVENT_QUEUE_SUFFIX | rabbitmq.tenant_event_queue_suffix | Suffix for the tenant event queue in multi tenant/queue setting |
| RABBITMQ\_\_TENANT_EVENT_DLQ_SUFFIX | rabbitmq.tenant_event_dlq_suffix | Suffix for the dead letter queue in multi tenant/queue setting |
| RABBITMQ\_\_TENANT_EXCHANGE_NAME | rabbitmq.tenant_exchange_name | Name of tenant exchange in multi tenant/queue setting |
| RABBITMQ\_\_QUEUE_EXPIRATION_TIME | rabbitmq.queue_expiration_time | Time until queue expiration in multi tenant/queue setting |
| RABBITMQ\_\_SERVICE_REQUEST_QUEUE_PREFIX | rabbitmq.service_request_queue_prefix | Service request queue prefix in multi tenant/queue setting |
| RABBITMQ\_\_SERVICE_REQUEST_EXCHANGE_NAME | rabbitmq.service_request_exchange_name | Service request exchange name in multi tenant/queue setting |
| RABBITMQ\_\_SERVICE_RESPONSE_EXCHANGE_NAME | rabbitmq.service_response_exchange_name | Service response exchange name in multi tenant/queue setting |
| RABBITMQ\_\_SERVICE_DLQ_NAME | rabbitmq.service_dlq_name | Service dead letter queue name in multi tenant/queue setting |
| STORAGE\_\_BACKEND | storage.backend | Storage backend to use (currently only "s3" and "azure" are supported) |
| STORAGE\_\_S3\_\_BUCKET | storage.s3.bucket | Name of the S3 bucket |
| STORAGE\_\_S3\_\_ENDPOINT | storage.s3.endpoint | Endpoint of the S3 server |
| STORAGE\_\_S3\_\_KEY | storage.s3.key | Access key for the S3 server |
| STORAGE\_\_S3\_\_SECRET | storage.s3.secret | Secret key for the S3 server |
| STORAGE\_\_S3\_\_REGION | storage.s3.region | Region of the S3 server |
| STORAGE\_\_AZURE\_\_CONTAINER | storage.azure.container_name | Name of the Azure container |
| STORAGE\_\_AZURE\_\_CONNECTION_STRING | storage.azure.connection_string | Connection string for the Azure server |
| STORAGE\_\_TENANT_SERVER\_\_PUBLIC_KEY | storage.tenant_server.public_key | Public key of the tenant server |
| STORAGE\_\_TENANT_SERVER\_\_ENDPOINT | storage.tenant_server.endpoint | Endpoint of the tenant server |
| TRACING\_\_ENABLED | tracing.enabled | Enable tracing |
| TRACING\_\_TYPE | tracing.type | Tracing mode - possible values: "opentelemetry", "azure_monitor" (Expects the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable.) |
| TRACING\_\_OPENTELEMETRY\_\_ENDPOINT | tracing.opentelemetry.endpoint | Endpoint to which OpenTelemetry traces are exported |
| TRACING\_\_OPENTELEMETRY\_\_SERVICE_NAME | tracing.opentelemetry.service_name | Name of the service as displayed in the traces collected |
| TRACING\_\_OPENTELEMETRY\_\_EXPORTER | tracing.opentelemetry.exporter | Name of exporter |
| KUBERNETES\_\_POD_NAME | kubernetes.pod_name | Service pod name |
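To make the mapping in the table above concrete, the following is a minimal, purely illustrative sketch (not taken from the repository) of how a nested `.toml` key and its double-underscore environment variable are assumed to interact through the `load_settings` helper mentioned earlier; the settings file path and values are hypothetical.
```python
# Illustrative only: assumes load_settings returns a Dynaconf-style settings
# object with nested attribute access, matching the "Internal / .toml Name"
# column above (e.g. rabbitmq.host).
import os

from pyinfra.config.loader import load_settings

# Hypothetical settings.toml contents:
#   [rabbitmq]
#   host = "localhost"
#   port = 5672

# Environment variables take precedence over the .toml value.
os.environ["RABBITMQ__HOST"] = "rabbitmq.my-namespace.svc"

settings = load_settings("path/to/settings.toml")
print(settings.rabbitmq.host)  # expected: "rabbitmq.my-namespace.svc"
print(settings.rabbitmq.port)  # expected: 5672
```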
## Response Format
## Setup
**IMPORTANT** you need to set the following environment variables before running the setup script:
- ``$NEXUS_USER`` your Nexus user (usually equal to firstname.lastname@knecon.com)
- ``$NEXUS_PASSWORD`` your Nexus password (usually equal to your Azure Login)
```shell
# create venv and activate it
source ./scripts/setup/devenvsetup.sh {{ cookiecutter.python_version }} $NEXUS_USER $NEXUS_PASSWORD
source .venv/bin/activate
```
### OpenTelemetry
OpenTelemetry (via its Python SDK) is set up to be as unobtrusive as possible; for typical use cases it can be
configured from environment variables, without additional work in the microservice app, although additional
configuration is possible.
`TRACING__OPENTELEMETRY__ENDPOINT` should typically be set
to `http://otel-collector-opentelemetry-collector.otel-collector:4318/v1/traces`.
## Queue Manager
The queue manager is responsible for consuming messages from the input queue, processing them and sending the response
to the output queue. The default callback also downloads data from the storage and uploads the result to the storage.
The response message does not contain the data itself, but the identifiers from the input message (including headers
beginning with "X-").
### Standalone Usage
```python
from pyinfra.queue.manager import QueueManager
from pyinfra.queue.callback import make_download_process_upload_callback, DataProcessor
from pyinfra.config.loader import load_settings
settings = load_settings("path/to/settings")
processing_function: DataProcessor # function should expect a dict (json) or bytes (pdf) as input and should return a json serializable object.
queue_manager = QueueManager(settings)
callback = make_download_process_upload_callback(processing_function, settings)
queue_manager.start_consuming(callback)
```
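For illustration, here is a minimal function satisfying the `DataProcessor` contract described in the comment above (a dict parsed from JSON or raw bytes in, a JSON-serialisable object out); the function itself is hypothetical and not part of pyinfra.
```python
from typing import Union


def describe_payload(data: Union[dict, bytes]) -> dict:
    """Toy DataProcessor: summarise whatever the callback downloaded."""
    if isinstance(data, bytes):
        # e.g. a PDF pulled from S3/Azure by the default callback
        return {"kind": "binary", "size_bytes": len(data)}
    # e.g. a JSON document
    return {"kind": "json", "keys": sorted(data.keys())}
```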
### Usage in a Service
This is the recommended way to use the module. This includes the webserver, Prometheus metrics and health endpoints.
Custom endpoints can be added by adding a new route to the `app` object beforehand. Settings are loaded from files
specified as CLI arguments (e.g. `--settings-path path/to/settings.toml`). The values can also be set or overridden via
environment variables (e.g. `LOGGING__LEVEL=DEBUG`).
The callback can be replaced with a custom one, for example if the data to process is contained in the message itself
and not on the storage.
```python
from pyinfra.config.loader import load_settings, parse_settings_path
from pyinfra.examples import start_standard_queue_consumer
from pyinfra.queue.callback import make_download_process_upload_callback, DataProcessor

# The processing function should expect a dict (json) or bytes (pdf) as input
# and return a JSON-serializable object.
processing_function: DataProcessor

arguments = parse_settings_path()
settings = load_settings(arguments.settings_path)
callback = make_download_process_upload_callback(processing_function, settings)
start_standard_queue_consumer(callback, settings)  # optionally also pass a FastAPI app object with preconfigured routes
```
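If the payload is embedded in the queue message itself, a custom callback can skip the storage round trip. The sketch
below assumes the callback receives the parsed message body as a dict and returns a JSON-serializable response; check
`pyinfra.queue.callback` for the exact signature expected by your version.
```python
from pyinfra.config.loader import load_settings, parse_settings_path
from pyinfra.examples import start_standard_queue_consumer


def inline_payload_callback(message_body: dict) -> dict:
    # Hypothetical custom callback: the data to process is contained in the message,
    # so nothing is downloaded from or uploaded to the storage.
    payload = message_body.get("payload", {})
    result = {"numItems": len(payload)}
    # Return the identifiers (and any "X-" headers) plus the analysis result;
    # the bulky data itself stays out of the response.
    identifiers = {k: v for k, v in message_body.items() if k != "payload"}
    return {**identifiers, "result": result}


arguments = parse_settings_path()
settings = load_settings(arguments.settings_path)
start_standard_queue_consumer(inline_payload_callback, settings)
```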
### AMQP input message:
Either use the legacy format, with dossierId and fileId as strings, or the new format, where absolute paths are used.
All headers beginning with "X-" are forwarded to the message processor and returned in the response message (e.g.
"X-TENANT-ID" is used to acquire storage information for the tenant).
Legacy format:
```json
{
  "dossierId": "",
  "fileId": ""
}
```
or new format:
```json
{
  "targetFilePath": "",
  "responseFilePath": ""
}
```
Optionally, the input message can contain a field with the key `"operations"`.
### AMQP output message:
```json
{
  "dossierId": "",
  "fileId": "",
  "targetFileExtension": "",
  "responseFileExtension": "",
  ...
}
```
## Module Installation
Add the respective version of the pyinfra package to your pyproject.toml file. Make sure to add our gitlab registry as
a source. For now, all internal packages used by pyinfra also have to be added to the pyproject.toml file (namely
kn-utils). Execute `poetry lock` and `poetry install` to install the packages.
You can look up the latest version of the package in
the [gitlab registry](https://gitlab.knecon.com/knecon/research/pyinfra/-/packages).
For the used versions of internal dependencies, please refer to the [pyproject.toml](pyproject.toml) file.
### Setup
```toml
[tool.poetry.dependencies]
pyinfra = { version = "x.x.x", source = "gitlab-research" }
kn-utils = { version = "x.x.x", source = "gitlab-research" }
[[tool.poetry.source]]
name = "gitlab-research"
url = "https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi/simple"
priority = "explicit"
```
## Scripts

### Run pyinfra locally

**Shell 1**: Start minio and rabbitmq containers
```bash
cd tests && docker compose up
```

**Shell 2**: Start pyinfra with callback mock
```bash
python scripts/start_pyinfra.py
```

**Shell 3**: Upload dummy content on storage and publish message
```bash
python scripts/send_request.py
```

## Tests

Tests require a running minio and rabbitmq container, meaning you have to run `docker compose up` in the tests folder
before running the tests.

## Development

Install the module:
```bash
pip install -e .
pip install -r requirements.txt
```
or build the docker image:
```bash
docker build -f Dockerfile -t pyinfra .
```
Either run `src/serve.py` or the built Docker image.

### Usage

**Shell 1:** Start a MinIO and a RabbitMQ docker container.
```bash
docker-compose up
```

**Shell 2:** Add files to the local minio storage.
```bash
python scripts/manage_minio.py add <MinIO target folder> -d path/to/a/folder/with/PDFs
```

**Shell 2:** Run pyinfra-server.
```bash
python src/serve.py
```
or as container:
```bash
docker run --net=host pyinfra
```

**Shell 3:** Run analysis-container.

**Shell 4:** Start a client that sends requests to process PDFs from the MinIO store and annotates these PDFs according to the service responses.
```bash
python scripts/mock_client.py
```

## OpenTelemetry Protobuf Dependency Hell

**Note**: Status 2025/01/09: the currently used `opentelemetry-exporter-otlp-proto-http` version `1.25.0` requires
a `protobuf` version < `5.x.x` and is not compatible with the latest protobuf version `5.27.x`. This is an [open issue](https://github.com/open-telemetry/opentelemetry-python/issues/3958) in opentelemetry, because [support for 4.25.x ends in Q2 '25](https://protobuf.dev/support/version-support/#python).
Therefore, we should keep this in mind and update the dependency once opentelemetry includes support for `protobuf 5.27.x`.
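Until then, the constraint can be made explicit in `pyproject.toml`. This is an illustrative sketch; the exact pins
used by the project may differ.
```toml
[tool.poetry.dependencies]
# Illustrative pins only: keep protobuf below 5.x while the 1.25.0 exporter is in use.
opentelemetry-exporter-otlp-proto-http = "1.25.0"
protobuf = ">=4.25.0,<5.0.0"
```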

bamboo-specs/pom.xml Normal file

@ -0,0 +1,40 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs-parent</artifactId>
<version>7.1.2</version>
<relativePath/>
</parent>
<artifactId>bamboo-specs</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<sonar.skip>true</sonar.skip>
</properties>
<dependencies>
<dependency>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs-api</artifactId>
</dependency>
<dependency>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs</artifactId>
</dependency>
<!-- Test dependencies -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<!-- run 'mvn test' to perform offline validation of the plan -->
<!-- run 'mvn -Ppublish-specs' to upload the plan to your Bamboo server -->
</project>


@ -0,0 +1,179 @@
package buildjob;
import com.atlassian.bamboo.specs.api.BambooSpec;
import com.atlassian.bamboo.specs.api.builders.BambooKey;
import com.atlassian.bamboo.specs.api.builders.docker.DockerConfiguration;
import com.atlassian.bamboo.specs.api.builders.permission.PermissionType;
import com.atlassian.bamboo.specs.api.builders.permission.Permissions;
import com.atlassian.bamboo.specs.api.builders.permission.PlanPermissions;
import com.atlassian.bamboo.specs.api.builders.plan.Job;
import com.atlassian.bamboo.specs.api.builders.plan.Plan;
import com.atlassian.bamboo.specs.api.builders.plan.PlanIdentifier;
import com.atlassian.bamboo.specs.api.builders.plan.Stage;
import com.atlassian.bamboo.specs.api.builders.plan.branches.BranchCleanup;
import com.atlassian.bamboo.specs.api.builders.plan.branches.PlanBranchManagement;
import com.atlassian.bamboo.specs.api.builders.project.Project;
import com.atlassian.bamboo.specs.builders.task.CheckoutItem;
import com.atlassian.bamboo.specs.builders.task.InjectVariablesTask;
import com.atlassian.bamboo.specs.builders.task.ScriptTask;
import com.atlassian.bamboo.specs.builders.task.VcsCheckoutTask;
import com.atlassian.bamboo.specs.builders.task.CleanWorkingDirectoryTask;
import com.atlassian.bamboo.specs.builders.task.VcsTagTask;
import com.atlassian.bamboo.specs.builders.trigger.BitbucketServerTrigger;
import com.atlassian.bamboo.specs.model.task.InjectVariablesScope;
import com.atlassian.bamboo.specs.api.builders.Variable;
import com.atlassian.bamboo.specs.util.BambooServer;
import com.atlassian.bamboo.specs.builders.task.ScriptTask;
import com.atlassian.bamboo.specs.model.task.ScriptTaskProperties.Location;
/**
* Plan configuration for Bamboo.
* Learn more on: <a href="https://confluence.atlassian.com/display/BAMBOO/Bamboo+Specs">https://confluence.atlassian.com/display/BAMBOO/Bamboo+Specs</a>
*/
@BambooSpec
public class PlanSpec {
private static final String SERVICE_NAME = "pyinfra";
private static final String SERVICE_KEY = SERVICE_NAME.toUpperCase().replaceAll("-","");
/**
* Run main to publish plan on Bamboo
*/
public static void main(final String[] args) throws Exception {
//By default credentials are read from the '.credentials' file.
BambooServer bambooServer = new BambooServer("http://localhost:8085");
Plan plan = new PlanSpec().createDockerBuildPlan();
bambooServer.publish(plan);
PlanPermissions planPermission = new PlanSpec().createPlanPermission(plan.getIdentifier());
bambooServer.publish(planPermission);
}
private PlanPermissions createPlanPermission(PlanIdentifier planIdentifier) {
Permissions permission = new Permissions()
.userPermissions("atlbamboo", PermissionType.EDIT, PermissionType.VIEW, PermissionType.ADMIN, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("research", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("Development", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("QA", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.loggedInUserPermissions(PermissionType.VIEW)
.anonymousUserPermissionView();
return new PlanPermissions(planIdentifier.getProjectKey(), planIdentifier.getPlanKey()).permissions(permission);
}
private Project project() {
return new Project()
.name("RED")
.key(new BambooKey("RED"));
}
public Plan createDockerBuildPlan() {
return new Plan(
project(),
SERVICE_NAME, new BambooKey(SERVICE_KEY))
.description("Docker build for pyinfra")
.stages(
new Stage("Build Stage")
.jobs(
new Job("Build Job", new BambooKey("BUILD"))
.tasks(
new CleanWorkingDirectoryTask()
.description("Clean working directory.")
.enabled(true),
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Set config and keys.")
.inlineBody("mkdir -p ~/.ssh\n" +
"echo \"${bamboo.bamboo_agent_ssh}\" | base64 -d >> ~/.ssh/id_rsa\n" +
"echo \"host vector.iqser.com\" > ~/.ssh/config\n" +
"echo \" user bamboo-agent\" >> ~/.ssh/config\n" +
"chmod 600 ~/.ssh/config ~/.ssh/id_rsa"),
new ScriptTask()
.description("Build Docker container.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/docker-build.sh")
.argument(SERVICE_NAME))
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/release_build:4.2.0")
.volume("/var/run/docker.sock", "/var/run/docker.sock"))),
new Stage("Sonar Stage")
.jobs(
new Job("Sonar Job", new BambooKey("SONAR"))
.tasks(
new CleanWorkingDirectoryTask()
.description("Clean working directory.")
.enabled(true),
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Set config and keys.")
.inlineBody("mkdir -p ~/.ssh\n" +
"echo \"${bamboo.bamboo_agent_ssh}\" | base64 -d >> ~/.ssh/id_rsa\n" +
"echo \"host vector.iqser.com\" > ~/.ssh/config\n" +
"echo \" user bamboo-agent\" >> ~/.ssh/config\n" +
"chmod 600 ~/.ssh/config ~/.ssh/id_rsa"),
new ScriptTask()
.description("Run Sonarqube scan.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/sonar-scan.sh")
.argument(SERVICE_NAME),
new ScriptTask()
.description("Shut down any running docker containers.")
.location(Location.FILE)
.inlineBody("pip install docker-compose\n" +
"docker-compose down"))
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/release_build:4.2.0")
.volume("/var/run/docker.sock", "/var/run/docker.sock"))),
new Stage("Licence Stage")
.jobs(
new Job("Git Tag Job", new BambooKey("GITTAG"))
.tasks(
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Build git tag.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/git-tag.sh"),
new InjectVariablesTask()
.description("Inject git tag.")
.path("git.tag")
.namespace("g")
.scope(InjectVariablesScope.LOCAL),
new VcsTagTask()
.description("${bamboo.g.gitTag}")
.tagName("${bamboo.g.gitTag}")
.defaultRepository())
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/release_build:4.4.1")),
new Job("Licence Job", new BambooKey("LICENCE"))
.enabled(false)
.tasks(
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Build licence.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/create-licence.sh"))
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/maven:3.6.2-jdk-13-3.0.0")
.volume("/etc/maven/settings.xml", "/usr/share/maven/ref/settings.xml")
.volume("/var/run/docker.sock", "/var/run/docker.sock"))))
.linkedRepositories("RR / " + SERVICE_NAME)
.triggers(new BitbucketServerTrigger())
.planBranchManagement(new PlanBranchManagement()
.createForVcsBranch()
.delete(new BranchCleanup()
.whenInactiveInRepositoryAfterDays(14))
.notificationForCommitters());
}
}


@ -0,0 +1,19 @@
#!/bin/bash
set -e
if [[ \"${bamboo_version_tag}\" != \"dev\" ]]
then
${bamboo_capability_system_builder_mvn3_Maven_3}/bin/mvn \
-f ${bamboo_build_working_directory}/pom.xml \
versions:set \
-DnewVersion=${bamboo_version_tag}
${bamboo_capability_system_builder_mvn3_Maven_3}/bin/mvn \
-f ${bamboo_build_working_directory}/pom.xml \
-B clean deploy \
-e -DdeployAtEnd=true \
-Dmaven.wagon.http.ssl.insecure=true \
-Dmaven.wagon.http.ssl.allowall=true \
-Dmaven.wagon.http.ssl.ignore.validity.dates=true \
-DaltDeploymentRepository=iqser_release::default::https://nexus.iqser.com/repository/gin4-platform-releases
fi


@ -0,0 +1,13 @@
#!/bin/bash
set -e
SERVICE_NAME=$1
python3 -m venv build_venv
source build_venv/bin/activate
python3 -m pip install --upgrade pip
echo "index-url = https://${bamboo_nexus_user}:${bamboo_nexus_password}@nexus.iqser.com/repository/python-combind/simple" >> pip.conf
docker build -f Dockerfile -t nexus.iqser.com:5001/red/$SERVICE_NAME:${bamboo_version_tag} .
echo "${bamboo_nexus_password}" | docker login --username "${bamboo_nexus_user}" --password-stdin nexus.iqser.com:5001
docker push nexus.iqser.com:5001/red/$SERVICE_NAME:${bamboo_version_tag}


@ -0,0 +1,9 @@
#!/bin/bash
set -e
if [[ "${bamboo_version_tag}" = "dev" ]]
then
echo "gitTag=${bamboo_planRepository_1_branch}_${bamboo_buildNumber}" > git.tag
else
echo "gitTag=${bamboo_version_tag}" > git.tag
fi


@ -0,0 +1,58 @@
#!/bin/bash
set -e
export JAVA_HOME=/usr/bin/sonar-scanner/jre
python3 -m venv build_venv
source build_venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install dependency-check
python3 -m pip install docker-compose
python3 -m pip install coverage
echo "docker-compose down"
docker-compose down
sleep 30
echo "coverage report generation"
bash run_tests.sh
if [ ! -f reports/coverage.xml ]
then
exit 1
fi
SERVICE_NAME=$1
echo "dependency-check:aggregate"
mkdir -p reports
dependency-check --enableExperimental -f JSON -f XML \
--disableAssembly -s . -o reports --project $SERVICE_NAME --exclude ".git/**" --exclude "venv/**" \
--exclude "build_venv/**" --exclude "**/__pycache__/**" --exclude "bamboo-specs/**"
if [[ -z "${bamboo_repository_pr_key}" ]]
then
echo "Sonar Scan for branch: ${bamboo_planRepository_1_branch}"
/usr/bin/sonar-scanner/bin/sonar-scanner -X\
-Dsonar.projectKey=RED_$SERVICE_NAME \
-Dsonar.host.url=https://sonarqube.iqser.com \
-Dsonar.login=${bamboo_sonarqube_api_token_secret} \
-Dsonar.dependencyCheck.jsonReportPath=reports/dependency-check-report.json \
-Dsonar.dependencyCheck.xmlReportPath=reports/dependency-check-report.xml \
-Dsonar.dependencyCheck.htmlReportPath=reports/dependency-check-report.html \
-Dsonar.python.coverage.reportPaths=reports/coverage.xml
else
echo "Sonar Scan for PR with key1: ${bamboo_repository_pr_key}"
/usr/bin/sonar-scanner/bin/sonar-scanner \
-Dsonar.projectKey=RED_$SERVICE_NAME \
-Dsonar.host.url=https://sonarqube.iqser.com \
-Dsonar.login=${bamboo_sonarqube_api_token_secret} \
-Dsonar.pullrequest.key=${bamboo_repository_pr_key} \
-Dsonar.pullrequest.branch=${bamboo_repository_pr_sourceBranch} \
-Dsonar.pullrequest.base=${bamboo_repository_pr_targetBranch} \
-Dsonar.dependencyCheck.jsonReportPath=reports/dependency-check-report.json \
-Dsonar.dependencyCheck.xmlReportPath=reports/dependency-check-report.xml \
-Dsonar.dependencyCheck.htmlReportPath=reports/dependency-check-report.html \
-Dsonar.python.coverage.reportPaths=reports/coverage.xml
fi


@ -0,0 +1,16 @@
package buildjob;
import com.atlassian.bamboo.specs.api.builders.plan.Plan;
import com.atlassian.bamboo.specs.api.exceptions.PropertiesValidationException;
import com.atlassian.bamboo.specs.api.util.EntityPropertiesBuilders;
import org.junit.Test;
public class PlanSpecTest {
@Test
public void checkYourPlanOffline() throws PropertiesValidationException {
Plan plan = new PlanSpec().createDockerBuildPlan();
EntityPropertiesBuilders.build(plan);
}
}

banner.txt Normal file

@ -0,0 +1,6 @@
___ _ _ ___ __
o O O | _ \ | || | |_ _| _ _ / _| _ _ __ _
o | _/ \_, | | | | ' \ | _| | '_| / _` |
TS__[O] _|_|_ _|__/ |___| |_||_| _|_|_ _|_|_ \__,_|
{======|_| ``` |_| ````|_|`````|_|`````|_|`````|_|`````|_|`````|
./o--000' `-0-0-' `-0-0-' `-0-0-' `-0-0-' `-0-0-' `-0-0-' `-0-0-'

bom.json

File diff suppressed because it is too large

config.yaml Executable file

@ -0,0 +1,87 @@
service:
logging_level: $LOGGING_LEVEL_ROOT|DEBUG # Logging level for service logger
name: $SERVICE_NAME|research # Default service name for research service, used for prometheus metric name
response_formatter: default # formats analysis payloads of response messages
upload_formatter: projecting # formats analysis payloads of objects uploaded to storage
# Note: This is not really the right place for this. It should be configured on a per-service basis.
operation: $OPERATION|default
# operation needs to be specified in deployment config for services that are called without an operation specified
operations:
conversion:
input:
multi: False
subdir: ""
extension: ORIGIN.pdf.gz
output:
subdir: "pages_as_images"
extension: json.gz
extraction:
input:
multi: False
subdir: ""
extension: ORIGIN.pdf.gz
output:
subdir: "extracted_images"
extension: json.gz
table_parsing:
input:
multi: True
subdir: "pages_as_images"
extension: json.gz
output:
subdir: "table_parses"
extension: json.gz
image_classification:
input:
multi: True
subdir: "extracted_images"
extension: json.gz
output:
subdir: ""
extension: IMAGE_INFO.json.gz
default:
input:
multi: False
subdir: ""
extension: in.gz
output:
subdir: ""
extension: out.gz
probing_webserver:
host: $PROBING_WEBSERVER_HOST|"0.0.0.0" # Probe webserver address
port: $PROBING_WEBSERVER_PORT|8080 # Probe webserver port
mode: $PROBING_WEBSERVER_MODE|production # webserver mode: {development, production}
rabbitmq:
host: $RABBITMQ_HOST|localhost # RabbitMQ host address
port: $RABBITMQ_PORT|5672 # RabbitMQ host port
user: $RABBITMQ_USERNAME|user # RabbitMQ username
password: $RABBITMQ_PASSWORD|bitnami # RabbitMQ password
heartbeat: $RABBITMQ_HEARTBEAT|7200 # Controls AMQP heartbeat timeout in seconds
queues:
input: $REQUEST_QUEUE|request_queue # Requests to service
output: $RESPONSE_QUEUE|response_queue # Responses by service
dead_letter: $DEAD_LETTER_QUEUE|dead_letter_queue # Messages that failed to process
callback:
analysis_endpoint: $ANALYSIS_ENDPOINT|"http://127.0.0.1:5000"
storage:
backend: $STORAGE_BACKEND|s3 # The type of storage to use {s3, azure}
bucket: "STORAGE_BUCKET_NAME|STORAGE_AZURECONTAINERNAME|pyinfra-test-bucket" # The bucket / container to pull files specified in queue requests from
s3:
endpoint: $STORAGE_ENDPOINT|"http://127.0.0.1:9000"
access_key: $STORAGE_KEY|root
secret_key: $STORAGE_SECRET|password
region: $STORAGE_REGION|"eu-west-1"
azure:
connection_string: $STORAGE_AZURECONNECTIONSTRING|"DefaultEndpointsProtocol=https;AccountName=iqserdevelopment;AccountKey=4imAbV9PYXaztSOMpIyAClg88bAZCXuXMGJG0GA1eIBpdh2PlnFGoRBnKqLy2YZUSTmZ3wJfC7tzfHtuC6FEhQ==;EndpointSuffix=core.windows.net"
retry:
tries: 3
delay: 5
jitter: [1, 3]

doc/signatures.txt Normal file

@ -0,0 +1,76 @@
Processing service interface
image classification now : JSON (Mdat PDF) -> (Data PDF -> JSON [Mdat ImObj]
image classification future: JSON [Mdat FunkIm] | Mdat PDF -> (Data [FunkIm] -> JSON [Mdat FunkIm])
object detection : JSON [Mdat PagIm] | Mdat PDF -> (Data [PagIm] -> JSON [[Mdat SemIm]])
NER : JSON [Mdat Dict] -> (Data [Dict] -> JSON [Mdat])
table parsing : JSON [Mdat FunkIm] | Mdat PDF -> (Data [PagIm] -> JSON [[Mdat FunkIm]])
pdf2image : Mdat (fn, [Int], PDF) -> (JSON ([Int], Data PDF) -> [(FunkIm, Mdat)])
image classification now : Mdat (fn, [Int], file) -> (Data PDF -> JSON [Mdat ImObj]
image classification future: Mdat (fn, [Int], dir) -> (Data [FunkIm] -> JSON [Mdat FunkIm])
object detection : Mdat (fn, [Int], dir) -> (Data [PagIm] -> JSON [[Mdat SemIm]])
table parsing : Mdat (fn, [Int], dir) -> (Data [PagIm] -> JSON [[Mdat FunkIm]])
NER : Mdat (fn, [Int], file) -> (Data [Dict] -> JSON [Mdat])
pdf2image : Mdat (fn, [Int], file) -> (JSON ([Int], Data PDF) -> [(FunkIm, Mdat)])
from funcy import identity
access(mdat):
if mdat.path is file:
request = {"data": load(mdat.path), "metadata": mdat}
elif mdat.path is dir:
get_indexed = identity if not mdat.idx else itemgetter(*mdat.idx)
request = {"data": get_indexed(get_files(mdat.path)), "metadata": mdat}
else:
raise BadRequest
storage:
fileId: {
pages: [PagIm]
images: [FunkIm]
sections: gz
}
---------------
assert if targetPath is file then response list must be singleton
{index: [], dir: fileID.pdf.gz, targetPath: fileID.images.json.gz} -> [{data: pdf bytes, metadata: request: ...] -> [{data: null, metadata: request: null, response: {classification infos: ...}]
image classification now : Mdat (fn, [Int], file) -> [JSON (Data PDF, Mdat)] -> [JSON (Data null, Mdat [ImObj])] | 1 -> 1
assert if targetPath is file then response list must be singleton
{index: [], dir: fileID/images, targetPath: fileID.images.json.gz} -> [{data: image bytes, metadata: request: {image location...}] -> [{data: null, metadata: request: null, response: {classification infos: ...}]
image classification future: Mdat (fn, [Int], dir) -> JSON (Data [FunkIm], Mdat) -> [JSON (Data null, Mdat [FunkIm])] |
object detection : Mdat (fn, [Int], dir) -> (Data [PagIm] -> JSON [[Mdat SemIm]])
table parsing : Mdat (fn, [Int], dir) -> (Data [PagIm] -> JSON [[Mdat FunkIm]])
NER : Mdat (fn, [Int], file) -> (Data [Dict] -> JSON [Mdat])
pdf2image : Mdat (fn, [Int], file) -> (JSON ([Int], Data PDF) -> [(FunkIm, Mdat)])
aggregate <==> targetpath is file and index is empty

docker-compose.yml Executable file

@ -0,0 +1,32 @@
version: '2'
services:
minio:
image: minio/minio:RELEASE.2022-06-11T19-55-32Z
ports:
- "9000:9000"
environment:
- MINIO_ROOT_PASSWORD=password
- MINIO_ROOT_USER=root
volumes:
- ./data/minio_store:/data
command: server /data
network_mode: "bridge"
rabbitmq:
image: docker.io/bitnami/rabbitmq:3.9.8
ports:
- '4369:4369'
- '5551:5551'
- '5552:5552'
- '5672:5672'
- '25672:25672'
- '15672:15672'
environment:
- RABBITMQ_SECURE_PASSWORD=yes
- RABBITMQ_VM_MEMORY_HIGH_WATERMARK=100%
- RABBITMQ_DISK_FREE_ABSOLUTE_LIMIT=20Gi
network_mode: "bridge"
volumes:
- /opt/bitnami/rabbitmq/.rabbitmq/:/data/bitnami
volumes:
mdata:

poetry.lock generated

File diff suppressed because it is too large


@ -1 +0,0 @@

pyinfra/callback.py Normal file

@ -0,0 +1,65 @@
import logging
from funcy import merge, omit, lmap
from pyinfra.exceptions import AnalysisFailure
from pyinfra.pipeline_factory import CachedPipelineFactory
logger = logging.getLogger(__name__)
class Callback:
"""This is the callback that is applied to items pulled from the storage. It forwards these items to an analysis
endpoint.
"""
def __init__(self, pipeline_factory: CachedPipelineFactory):
self.pipeline_factory = pipeline_factory
def __get_pipeline(self, endpoint):
return self.pipeline_factory.get_pipeline(endpoint)
@staticmethod
def __run_pipeline(pipeline, analysis_input: dict):
"""
TODO: Since data and metadata are passed as singletons, there is no buffering and hence no batching happening
within the pipeline. However, the queue acknowledgment logic needs to be changed in order to facilitate
passing non-singletons, to only ack a message, once a response is pulled from the output queue of the
pipeline. Probably the pipeline return value needs to contains the queue message frame (or so), in order for
the queue manager to tell which message to ack.
TODO: casting list (lmap) on `analysis_response_stream` is a temporary solution, while the client pipeline
operates on singletons ([data], [metadata]).
"""
def combine_storage_item_metadata_with_queue_message_metadata(analysis_input):
return merge(analysis_input["metadata"], omit(analysis_input, ["data", "metadata"]))
def remove_queue_message_metadata(analysis_result):
metadata = omit(analysis_result["metadata"], queue_message_keys(analysis_input))
return {**analysis_result, "metadata": metadata}
def queue_message_keys(analysis_input):
return {*analysis_input.keys()}.difference({"data", "metadata"})
try:
data = analysis_input["data"]
metadata = combine_storage_item_metadata_with_queue_message_metadata(analysis_input)
analysis_response_stream = pipeline([data], [metadata])
analysis_response_stream = lmap(remove_queue_message_metadata, analysis_response_stream)
return analysis_response_stream
except Exception as err:
logger.error(err)
raise AnalysisFailure from err
def __call__(self, analysis_input: dict):
"""data_metadata_pack: {'dossierId': ..., 'fileId': ..., 'pages': ..., 'operation': ...}"""
operation = analysis_input.get("operation", "")
pipeline = self.__get_pipeline(operation)
try:
logging.debug(f"Requesting analysis for operation '{operation}'...")
return self.__run_pipeline(pipeline, analysis_input)
except AnalysisFailure:
logging.warning(f"Exception caught when calling analysis endpoint for operation '{operation}'.")


@ -0,0 +1,120 @@
import logging
from functools import lru_cache
from funcy import project, identity, rcompose
from pyinfra.callback import Callback
from pyinfra.config import parse_disjunction_string
from pyinfra.file_descriptor_builder import RedFileDescriptorBuilder
from pyinfra.file_descriptor_manager import FileDescriptorManager
from pyinfra.pipeline_factory import CachedPipelineFactory
from pyinfra.queue.consumer import Consumer
from pyinfra.queue.queue_manager.pika_queue_manager import PikaQueueManager
from pyinfra.server.client_pipeline import ClientPipeline
from pyinfra.server.dispatcher.dispatchers.rest import RestDispatcher
from pyinfra.server.interpreter.interpreters.rest_callback import RestPickupStreamer
from pyinfra.server.packer.packers.rest import RestPacker
from pyinfra.server.receiver.receivers.rest import RestReceiver
from pyinfra.storage import storages
from pyinfra.visitor import QueueVisitor
from pyinfra.visitor.downloader import Downloader
from pyinfra.visitor.response_formatter.formatters.default import DefaultResponseFormatter
from pyinfra.visitor.response_formatter.formatters.identity import IdentityResponseFormatter
from pyinfra.visitor.strategies.response.aggregation import AggregationStorageStrategy, ProjectingUploadFormatter
logger = logging.getLogger(__name__)
class ComponentFactory:
def __init__(self, config):
self.config = config
@lru_cache(maxsize=None)
def get_consumer(self, callback=None):
callback = callback or self.get_callback()
return Consumer(self.get_visitor(callback), self.get_queue_manager())
@lru_cache(maxsize=None)
def get_callback(self, analysis_base_url=None):
analysis_base_url = analysis_base_url or self.config.rabbitmq.callback.analysis_endpoint
callback = Callback(CachedPipelineFactory(base_url=analysis_base_url, pipeline_factory=self.get_pipeline))
def wrapped(body):
body_repr = project(body, ["dossierId", "fileId", "operation"])
logger.info(f"Processing {body_repr}...")
result = callback(body)
logger.info(f"Completed processing {body_repr}...")
return result
return wrapped
@lru_cache(maxsize=None)
def get_visitor(self, callback):
return QueueVisitor(
callback=callback,
data_loader=self.get_downloader(),
response_strategy=self.get_response_strategy(),
response_formatter=self.get_response_formatter(),
)
@lru_cache(maxsize=None)
def get_queue_manager(self):
return PikaQueueManager(self.config.rabbitmq.queues.input, self.config.rabbitmq.queues.output)
@staticmethod
@lru_cache(maxsize=None)
def get_pipeline(endpoint):
return ClientPipeline(
RestPacker(), RestDispatcher(endpoint), RestReceiver(), rcompose(RestPickupStreamer(), RestReceiver())
)
@lru_cache(maxsize=None)
def get_storage(self):
return storages.get_storage(self.config.storage.backend)
@lru_cache(maxsize=None)
def get_response_strategy(self, storage=None):
return AggregationStorageStrategy(
storage=storage or self.get_storage(),
file_descriptor_manager=self.get_file_descriptor_manager(),
upload_formatter=self.get_upload_formatter(),
)
@lru_cache(maxsize=None)
def get_file_descriptor_manager(self):
return FileDescriptorManager(
bucket_name=parse_disjunction_string(self.config.storage.bucket),
file_descriptor_builder=self.get_operation_file_descriptor_builder(),
)
@lru_cache(maxsize=None)
def get_upload_formatter(self):
return {"identity": identity, "projecting": ProjectingUploadFormatter()}[self.config.service.upload_formatter]
@lru_cache(maxsize=None)
def get_operation_file_descriptor_builder(self):
return RedFileDescriptorBuilder(
operation2file_patterns=self.get_operation2file_patterns(),
default_operation_name=self.config.service.operation,
)
@lru_cache(maxsize=None)
def get_response_formatter(self):
return {"default": DefaultResponseFormatter(), "identity": IdentityResponseFormatter()}[
self.config.service.response_formatter
]
@lru_cache(maxsize=None)
def get_operation2file_patterns(self):
if self.config.service.operation != "default":
self.config.service.operations["default"] = self.config.service.operations[self.config.service.operation]
return self.config.service.operations
@lru_cache(maxsize=None)
def get_downloader(self, storage=None):
return Downloader(
storage=storage or self.get_storage(),
bucket_name=parse_disjunction_string(self.config.storage.bucket),
file_descriptor_manager=self.get_file_descriptor_manager(),
)

pyinfra/config.py Normal file

@ -0,0 +1,84 @@
"""Implements a config object with dot-indexing syntax."""
import os
from functools import partial
from itertools import chain
from operator import truth
from typing import Iterable
from envyaml import EnvYAML
from frozendict import frozendict
from funcy import first, juxt, butlast, last, lmap
from pyinfra.locations import CONFIG_FILE
def _get_item_and_maybe_make_dotindexable(container, item):
ret = container[item]
return DotIndexable(ret) if isinstance(ret, dict) else ret
class DotIndexable:
def __init__(self, x):
self.x = x
def __getattr__(self, item):
return _get_item_and_maybe_make_dotindexable(self.x, item)
def __repr__(self):
return self.x.__repr__()
def __getitem__(self, item):
return self.__getattr__(item)
def __setitem__(self, key, value):
self.x[key] = value
class Config:
def __init__(self, config_path):
self.__config = EnvYAML(config_path)
def __getattr__(self, item):
if item in self.__config:
return _get_item_and_maybe_make_dotindexable(self.__config, item)
def __getitem__(self, item):
return self.__getattr__(item)
def __setitem__(self, key, value):
self.__config.key = value
def to_dict(self, frozen=True):
return to_dict(self.__config.export(), frozen=frozen)
def __hash__(self):
return hash(self.to_dict())
def to_dict(v, frozen=True):
def make_dict(*args, **kwargs):
return (frozendict if frozen else dict)(*args, **kwargs)
if isinstance(v, list):
return tuple(map(partial(to_dict, frozen=frozen), v))
elif isinstance(v, DotIndexable):
return make_dict({k: to_dict(v, frozen=frozen) for k, v in v.x.items()})
elif isinstance(v, dict):
return make_dict({k: to_dict(v, frozen=frozen) for k, v in v.items()})
else:
return v
CONFIG = Config(CONFIG_FILE)
def parse_disjunction_string(disjunction_string):
def try_parse_env_var(disjunction_string):
try:
return os.environ[disjunction_string]
except KeyError:
return None
options = disjunction_string.split("|")
identifiers, fallback_value = juxt(butlast, last)(options)
return first(chain(filter(truth, map(try_parse_env_var, identifiers)), [fallback_value]))


@ -1,133 +0,0 @@
import argparse
import os
from functools import partial
from pathlib import Path
from typing import Union
from dynaconf import Dynaconf, ValidationError, Validator
from funcy import lflatten
from kn_utils.logging import logger
# This path is meant for testing purposes and convenience. It probably won't reflect the actual root path when pyinfra is
# installed as a package, so don't use it in production code, but define your own root path as described in load config.
local_pyinfra_root_path = Path(__file__).parents[2]
def load_settings(
settings_path: Union[str, Path, list] = "config/",
root_path: Union[str, Path] = None,
validators: list[Validator] = None,
):
"""Load settings from .toml files, .env and environment variables. Also ensures a ROOT_PATH environment variable is
set. If ROOT_PATH is not set and no root_path argument is passed, the current working directory is used as root.
Settings paths can be a single .toml file, a folder containing .toml files or a list of .toml files and folders.
If a ROOT_PATH environment variable is set, it is not overwritten by the root_path argument.
If a folder is passed, all .toml files in the folder are loaded. If settings path is None, only .env and
environment variables are loaded. If settings_path are relative paths, they are joined with the root_path argument.
"""
root_path = get_or_set_root_path(root_path)
validators = validators or get_pyinfra_validators()
settings_files = normalize_to_settings_files(settings_path, root_path)
settings = Dynaconf(
load_dotenv=True,
envvar_prefix=False,
settings_files=settings_files,
)
validate_settings(settings, validators)
logger.info("Settings loaded and validated.")
return settings
def normalize_to_settings_files(settings_path: Union[str, Path, list], root_path: Union[str, Path]):
if settings_path is None:
logger.info("No settings path specified, only loading .env end ENVs.")
settings_files = []
elif isinstance(settings_path, str) or isinstance(settings_path, Path):
settings_files = [settings_path]
elif isinstance(settings_path, list):
settings_files = settings_path
else:
raise ValueError(f"Invalid settings path: {settings_path=}")
settings_files = lflatten(map(partial(_normalize_and_verify, root_path=root_path), settings_files))
logger.debug(f"Normalized settings files: {settings_files}")
return settings_files
def _normalize_and_verify(settings_path: Path, root_path: Path):
settings_path = Path(settings_path)
root_path = Path(root_path)
if not settings_path.is_absolute():
logger.debug(f"Settings path is not absolute, joining with root path: {root_path}")
settings_path = root_path / settings_path
if settings_path.is_dir():
logger.debug(f"Settings path is a directory, loading all .toml files in the directory: {settings_path}")
settings_files = list(settings_path.glob("*.toml"))
elif settings_path.is_file():
logger.debug(f"Settings path is a file, loading specified file: {settings_path}")
settings_files = [settings_path]
else:
raise ValueError(f"Invalid settings path: {settings_path=}, {root_path=}")
return settings_files
def get_or_set_root_path(root_path: Union[str, Path] = None):
env_root_path = os.environ.get("ROOT_PATH")
if env_root_path:
root_path = env_root_path
logger.debug(f"'ROOT_PATH' environment variable is set to {root_path}.")
elif root_path:
logger.info(f"'ROOT_PATH' environment variable is not set, setting to {root_path}.")
os.environ["ROOT_PATH"] = str(root_path)
else:
root_path = Path.cwd()
logger.info(f"'ROOT_PATH' environment variable is not set, defaulting to working directory {root_path}.")
os.environ["ROOT_PATH"] = str(root_path)
return root_path
def get_pyinfra_validators():
import pyinfra.config.validators
return lflatten(
validator for validator in pyinfra.config.validators.__dict__.values() if isinstance(validator, list)
)
def validate_settings(settings: Dynaconf, validators):
settings_valid = True
for validator in validators:
try:
validator.validate(settings)
except ValidationError as e:
settings_valid = False
logger.warning(e)
if not settings_valid:
raise ValidationError("Settings validation failed.")
logger.debug("Settings validated.")
def parse_settings_path():
parser = argparse.ArgumentParser()
parser.add_argument(
"settings_path",
help="Path to settings file(s) or folder(s). Must be .toml file(s) or a folder(s) containing .toml files.",
nargs="+",
)
return parser.parse_args().settings_path


@ -1,57 +0,0 @@
from dynaconf import Validator
queue_manager_validators = [
Validator("rabbitmq.host", must_exist=True, is_type_of=str),
Validator("rabbitmq.port", must_exist=True, is_type_of=int),
Validator("rabbitmq.username", must_exist=True, is_type_of=str),
Validator("rabbitmq.password", must_exist=True, is_type_of=str),
Validator("rabbitmq.heartbeat", must_exist=True, is_type_of=int),
Validator("rabbitmq.connection_sleep", must_exist=True, is_type_of=int),
Validator("rabbitmq.input_queue", must_exist=True, is_type_of=str),
Validator("rabbitmq.output_queue", must_exist=True, is_type_of=str),
Validator("rabbitmq.dead_letter_queue", must_exist=True, is_type_of=str),
]
azure_storage_validators = [
Validator("storage.azure.connection_string", must_exist=True, is_type_of=str),
Validator("storage.azure.container", must_exist=True, is_type_of=str),
]
s3_storage_validators = [
Validator("storage.s3.endpoint", must_exist=True, is_type_of=str),
Validator("storage.s3.key", must_exist=True, is_type_of=str),
Validator("storage.s3.secret", must_exist=True, is_type_of=str),
Validator("storage.s3.region", must_exist=True, is_type_of=str),
Validator("storage.s3.bucket", must_exist=True, is_type_of=str),
]
storage_validators = [
Validator("storage.backend", must_exist=True, is_type_of=str),
]
multi_tenant_storage_validators = [
Validator("storage.tenant_server.endpoint", must_exist=True, is_type_of=str),
Validator("storage.tenant_server.public_key", must_exist=True, is_type_of=str),
]
prometheus_validators = [
Validator("metrics.prometheus.prefix", must_exist=True, is_type_of=str),
Validator("metrics.prometheus.enabled", must_exist=True, is_type_of=bool),
]
webserver_validators = [
Validator("webserver.host", must_exist=True, is_type_of=str),
Validator("webserver.port", must_exist=True, is_type_of=int),
]
tracing_validators = [
Validator("tracing.enabled", must_exist=True, is_type_of=bool),
Validator("tracing.type", must_exist=True, is_type_of=str)
]
opentelemetry_validators = [
Validator("tracing.opentelemetry.endpoint", must_exist=True, is_type_of=str),
Validator("tracing.opentelemetry.service_name", must_exist=True, is_type_of=str),
Validator("tracing.opentelemetry.exporter", must_exist=True, is_type_of=str)
]


@ -0,0 +1,8 @@
from functools import lru_cache
from pyinfra.component_factory import ComponentFactory
@lru_cache(maxsize=None)
def get_component_factory(config):
return ComponentFactory(config)


@ -1,169 +0,0 @@
import asyncio
import signal
import sys
import aiohttp
from aiormq.exceptions import AMQPConnectionError
from dynaconf import Dynaconf
from fastapi import FastAPI
from kn_utils.logging import logger
from pyinfra.config.loader import get_pyinfra_validators, validate_settings
from pyinfra.queue.async_manager import AsyncQueueManager, RabbitMQConfig
from pyinfra.queue.callback import Callback
from pyinfra.queue.manager import QueueManager
from pyinfra.utils.opentelemetry import instrument_app, instrument_pika, setup_trace
from pyinfra.webserver.prometheus import (
add_prometheus_endpoint,
make_prometheus_processing_time_decorator_from_settings,
)
from pyinfra.webserver.utils import (
add_health_check_endpoint,
create_webserver_thread_from_settings,
run_async_webserver,
)
shutdown_flag = False
async def graceful_shutdown(manager: AsyncQueueManager, queue_task, webserver_task):
global shutdown_flag
shutdown_flag = True
logger.info("SIGTERM received, shutting down gracefully...")
if queue_task and not queue_task.done():
queue_task.cancel()
# await queue manager shutdown
await asyncio.gather(queue_task, manager.shutdown(), return_exceptions=True)
if webserver_task and not webserver_task.done():
webserver_task.cancel()
# await webserver shutdown
await asyncio.gather(webserver_task, return_exceptions=True)
logger.info("Shutdown complete.")
async def run_async_queues(manager: AsyncQueueManager, app, port, host):
"""Run the async webserver and the async queue manager concurrently."""
queue_task = None
webserver_task = None
tenant_api_available = True
# add signal handler for SIGTERM and SIGINT
loop = asyncio.get_running_loop()
loop.add_signal_handler(
signal.SIGTERM, lambda: asyncio.create_task(graceful_shutdown(manager, queue_task, webserver_task))
)
loop.add_signal_handler(
signal.SIGINT, lambda: asyncio.create_task(graceful_shutdown(manager, queue_task, webserver_task))
)
try:
active_tenants = await manager.fetch_active_tenants()
queue_task = asyncio.create_task(manager.run(active_tenants=active_tenants), name="queues")
webserver_task = asyncio.create_task(run_async_webserver(app, port, host), name="webserver")
await asyncio.gather(queue_task, webserver_task)
except asyncio.CancelledError:
logger.info("Main task was cancelled, initiating shutdown.")
except AMQPConnectionError as e:
logger.warning(f"AMQPConnectionError: {e} - shutting down.")
except (aiohttp.ClientResponseError, aiohttp.ClientConnectorError):
logger.warning("Tenant server did not answer - shutting down.")
tenant_api_available = False
except Exception as e:
logger.error(f"An error occurred while running async queues: {e}", exc_info=True)
sys.exit(1)
finally:
if shutdown_flag:
logger.debug("Graceful shutdown already in progress.")
else:
logger.warning("Initiating shutdown due to error or manual interruption.")
if not tenant_api_available:
sys.exit(0)
if queue_task and not queue_task.done():
queue_task.cancel()
if webserver_task and not webserver_task.done():
webserver_task.cancel()
await asyncio.gather(queue_task, manager.shutdown(), webserver_task, return_exceptions=True)
logger.info("Shutdown complete.")
def start_standard_queue_consumer(
callback: Callback,
settings: Dynaconf,
app: FastAPI = None,
):
"""Default serving logic for research services.
Supplies /health, /ready and /prometheus endpoints (if enabled). The callback is monitored for processing time per
message. Also traces the queue messages via openTelemetry (if enabled).
Workload is received via queue messages and processed by the callback function (see pyinfra.queue.callback for
callbacks).
"""
validate_settings(settings, get_pyinfra_validators())
logger.info("Starting webserver and queue consumer...")
app = app or FastAPI()
if settings.metrics.prometheus.enabled:
logger.info("Prometheus metrics enabled.")
app = add_prometheus_endpoint(app)
callback = make_prometheus_processing_time_decorator_from_settings(settings)(callback)
if settings.tracing.enabled:
setup_trace(settings)
instrument_pika(dynamic_queues=settings.dynamic_tenant_queues.enabled)
instrument_app(app)
if settings.dynamic_tenant_queues.enabled:
logger.info("Dynamic tenant queues enabled. Running async queues.")
config = RabbitMQConfig(
host=settings.rabbitmq.host,
port=settings.rabbitmq.port,
username=settings.rabbitmq.username,
password=settings.rabbitmq.password,
heartbeat=settings.rabbitmq.heartbeat,
input_queue_prefix=settings.rabbitmq.service_request_queue_prefix,
tenant_event_queue_suffix=settings.rabbitmq.tenant_event_queue_suffix,
tenant_exchange_name=settings.rabbitmq.tenant_exchange_name,
service_request_exchange_name=settings.rabbitmq.service_request_exchange_name,
service_response_exchange_name=settings.rabbitmq.service_response_exchange_name,
service_dead_letter_queue_name=settings.rabbitmq.service_dlq_name,
queue_expiration_time=settings.rabbitmq.queue_expiration_time,
pod_name=settings.kubernetes.pod_name,
)
manager = AsyncQueueManager(
config=config,
tenant_service_url=settings.storage.tenant_server.endpoint,
message_processor=callback,
max_concurrent_tasks=(
settings.asyncio.max_concurrent_tasks if hasattr(settings.asyncio, "max_concurrent_tasks") else 10
),
)
else:
logger.info("Dynamic tenant queues disabled. Running sync queues.")
manager = QueueManager(settings)
app = add_health_check_endpoint(app, manager.is_ready)
if isinstance(manager, AsyncQueueManager):
asyncio.run(run_async_queues(manager, app, port=settings.webserver.port, host=settings.webserver.host))
elif isinstance(manager, QueueManager):
webserver = create_webserver_thread_from_settings(app, settings)
webserver.start()
try:
manager.start_consuming(callback)
except Exception as e:
logger.error(f"An error occurred while consuming messages: {e}", exc_info=True)
sys.exit(1)
else:
logger.warning(f"Behavior for type {type(manager)} is not defined")

pyinfra/exceptions.py Normal file

@ -0,0 +1,50 @@
class AnalysisFailure(Exception):
pass
class DataLoadingFailure(Exception):
pass
class ProcessingFailure(Exception):
pass
class UnknownStorageBackend(ValueError):
pass
class InvalidEndpoint(ValueError):
pass
class UnknownClient(ValueError):
pass
class ConsumerError(Exception):
pass
class NoSuchContainer(KeyError):
pass
class IntentionalTestException(RuntimeError):
pass
class UnexpectedItemType(ValueError):
pass
class NoBufferCapacity(ValueError):
pass
class InvalidMessage(ValueError):
pass
class InvalidStorageItemFormat(ValueError):
pass


@ -0,0 +1,99 @@
import abc
import os
from operator import itemgetter
from funcy import project
class FileDescriptorBuilder:
@abc.abstractmethod
def build_file_descriptor(self, queue_item_body, end="input"):
raise NotImplementedError
@abc.abstractmethod
def build_matcher(self, file_descriptor):
raise NotImplementedError
@staticmethod
@abc.abstractmethod
def build_storage_upload_info(analysis_payload, request_metadata):
raise NotImplementedError
@abc.abstractmethod
def get_path_prefix(self, queue_item_body):
raise NotImplementedError
class RedFileDescriptorBuilder(FileDescriptorBuilder):
"""Defines concrete descriptors for storage objects based on queue messages"""
def __init__(self, operation2file_patterns, default_operation_name):
self.operation2file_patterns = operation2file_patterns or self.get_default_operation2file_patterns()
self.default_operation_name = default_operation_name
@staticmethod
def get_default_operation2file_patterns():
return {"default": {"input": {"subdir": "", "extension": ".in"}, "output": {"subdir": "", "extension": ".out"}}}
def build_file_descriptor(self, queue_item_body, end="input"):
def pages():
if end == "input":
if "id" in queue_item_body:
return [queue_item_body["id"]]
else:
return queue_item_body["pages"] if file_pattern["multi"] else []
elif end == "output":
return [queue_item_body["id"]]
else:
raise ValueError(f"Invalid argument: {end=}") # TODO: use an enum for `end`
operation = queue_item_body.get("operation", self.default_operation_name)
file_pattern = self.operation2file_patterns[operation][end]
file_descriptor = {
**project(queue_item_body, ["dossierId", "fileId", "pages"]),
"pages": pages(),
"extension": file_pattern["extension"],
"subdir": file_pattern["subdir"],
}
return file_descriptor
def build_matcher(self, file_descriptor):
def make_filename(file_id, subdir, suffix):
return os.path.join(file_id, subdir, suffix) if subdir else f"{file_id}.{suffix}"
dossier_id, file_id, subdir, pages, extension = itemgetter(
"dossierId", "fileId", "subdir", "pages", "extension"
)(file_descriptor)
matcher = os.path.join(
dossier_id, make_filename(file_id, subdir, self.__build_page_regex(pages, subdir) + extension)
)
return matcher
@staticmethod
def __build_page_regex(pages, subdir):
n_pages = len(pages)
if n_pages > 1:
page_re = "id:(" + "|".join(map(str, pages)) + ")."
elif n_pages == 1:
page_re = f"id:{pages[0]}."
else: # no pages specified -> either all pages or no pages, depending on whether a subdir is specified
page_re = r"id:\d+." if subdir else ""
return page_re
@staticmethod
def build_storage_upload_info(analysis_payload, request_metadata):
storage_upload_info = {**request_metadata, "id": analysis_payload["metadata"].get("id", 0)}
return storage_upload_info
def get_path_prefix(self, queue_item_body):
prefix = "/".join(itemgetter("dossierId", "fileId")(self.build_file_descriptor(queue_item_body, end="input")))
return prefix


@ -0,0 +1,63 @@
from pyinfra.file_descriptor_builder import FileDescriptorBuilder
class FileDescriptorManager:
"""Decorates a file descriptor builder with additional convenience functionality and this way provides a
comprehensive interface for all file descriptor related operations, while the concrete descriptor logic is
implemented in a file descriptor builder.
TODO: This is supposed to be fully decoupled from the concrete file descriptor builder implementation, however some
bad coupling is still left.
"""
def __init__(self, bucket_name, file_descriptor_builder: FileDescriptorBuilder):
self.bucket_name = bucket_name
self.operation_file_descriptor_builder = file_descriptor_builder
def get_input_object_name(self, queue_item_body: dict):
return self.get_object_name(queue_item_body, end="input")
def get_output_object_name(self, queue_item_body: dict):
return self.get_object_name(queue_item_body, end="output")
def get_object_name(self, queue_item_body: dict, end):
file_descriptor = self.build_file_descriptor(queue_item_body, end=end)
object_name = self.__build_matcher(file_descriptor)
return object_name
def build_file_descriptor(self, queue_item_body, end="input"):
return self.operation_file_descriptor_builder.build_file_descriptor(queue_item_body, end=end)
def build_input_matcher(self, queue_item_body):
return self.build_matcher(queue_item_body, end="input")
def build_output_matcher(self, queue_item_body):
return self.build_matcher(queue_item_body, end="output")
def build_matcher(self, queue_item_body, end):
file_descriptor = self.build_file_descriptor(queue_item_body, end=end)
return self.__build_matcher(file_descriptor)
def __build_matcher(self, file_descriptor):
return self.operation_file_descriptor_builder.build_matcher(file_descriptor)
def get_input_object_descriptor(self, queue_item_body):
return self.get_object_descriptor(queue_item_body, end="input")
def get_output_object_descriptor(self, storage_upload_info):
return self.get_object_descriptor(storage_upload_info, end="output")
def get_object_descriptor(self, queue_item_body, end):
# TODO: this is complected with the Storage class API
# FIXME: bad coupling
return {
"bucket_name": self.bucket_name,
"object_name": self.get_object_name(queue_item_body, end=end),
}
def build_storage_upload_info(self, analysis_payload, request_metadata):
return self.operation_file_descriptor_builder.build_storage_upload_info(analysis_payload, request_metadata)
def get_path_prefix(self, queue_item_body):
return self.operation_file_descriptor_builder.get_path_prefix(queue_item_body)

pyinfra/flask.py Normal file

@ -0,0 +1,63 @@
import logging
import requests
from flask import Flask, jsonify
from waitress import serve
from pyinfra.config import CONFIG
logger = logging.getLogger()
def run_probing_webserver(app, host=None, port=None, mode=None):
if not host:
host = CONFIG.probing_webserver.host
if not port:
port = CONFIG.probing_webserver.port
if not mode:
mode = CONFIG.probing_webserver.mode
if mode == "development":
app.run(host=host, port=port, debug=True)
elif mode == "production":
serve(app, host=host, port=port)
def set_up_probing_webserver():
# TODO: implement meaningful checks
app = Flask(__name__)
informed_about_missing_prometheus_endpoint = False
@app.route("/ready", methods=["GET"])
def ready():
resp = jsonify("OK")
resp.status_code = 200
return resp
@app.route("/health", methods=["GET"])
def healthy():
resp = jsonify("OK")
resp.status_code = 200
return resp
@app.route("/prometheus", methods=["GET"])
def get_metrics_from_analysis_endpoint():
nonlocal informed_about_missing_prometheus_endpoint
try:
resp = requests.get(f"{CONFIG.rabbitmq.callback.analysis_endpoint}/prometheus")
resp.raise_for_status()
except requests.exceptions.ConnectionError:
return ""
except requests.exceptions.HTTPError as err:
if resp.status_code == 404:
if not informed_about_missing_prometheus_endpoint:
logger.warning(f"Got no metrics from analysis prometheus endpoint: {err}")
informed_about_missing_prometheus_endpoint = True
else:
logging.warning(f"Caught {err}")
return resp.text
return app

pyinfra/locations.py Normal file

@ -0,0 +1,18 @@
"""Defines constant paths relative to the module root path."""
from pathlib import Path
MODULE_DIR = Path(__file__).resolve().parents[0]
PACKAGE_ROOT_DIR = MODULE_DIR.parents[0]
TEST_DIR = PACKAGE_ROOT_DIR / "test"
CONFIG_FILE = PACKAGE_ROOT_DIR / "config.yaml"
TEST_CONFIG_FILE = TEST_DIR / "config.yaml"
COMPOSE_PATH = PACKAGE_ROOT_DIR
BANNER_FILE = PACKAGE_ROOT_DIR / "banner.txt"


@ -0,0 +1,14 @@
import abc
class ParsingError(Exception):
pass
class BlobParser(abc.ABC):
@abc.abstractmethod
def parse(self, blob: bytes):
pass
def __call__(self, blob: bytes):
return self.parse(blob)


@ -0,0 +1,67 @@
import logging
from funcy import rcompose
from pyinfra.parser.blob_parser import ParsingError
logger = logging.getLogger(__name__)
class Either:
def __init__(self, item):
self.item = item
def bind(self):
return self.item
class Left(Either):
pass
class Right(Either):
pass
class EitherParserWrapper:
def __init__(self, parser):
self.parser = parser
def __log(self, result):
if isinstance(result, Right):
logger.log(logging.DEBUG - 5, f"{self.parser.__class__.__name__} succeeded or forwarded on {result.bind()}")
else:
logger.log(logging.DEBUG - 5, f"{self.parser.__class__.__name__} failed on {result.bind()}")
return result
def parse(self, item: Either):
if isinstance(item, Left):
try:
return Right(self.parser(item.bind()))
except ParsingError:
return item
elif isinstance(item, Right):
return item
else:
return self.parse(Left(item))
def __call__(self, item):
return self.__log(self.parse(item))
class EitherParserComposer:
def __init__(self, *parsers):
self.parser = rcompose(*map(EitherParserWrapper, parsers))
def parse(self, item):
result = self.parser(item)
if isinstance(result, Right):
return result.bind()
else:
raise ParsingError("All parsers failed.")
def __call__(self, item):
return self.parse(item)


@ -0,0 +1,7 @@
from pyinfra.parser.blob_parser import BlobParser
class IdentityBlobParser(BlobParser):
def parse(self, data: bytes):
return data


@ -0,0 +1,21 @@
import json
from pyinfra.parser.blob_parser import BlobParser, ParsingError
from pyinfra.server.packing import string_to_bytes
class JsonBlobParser(BlobParser):
def parse(self, data: bytes):
try:
data = data.decode()
data = json.loads(data)
except (UnicodeDecodeError, json.JSONDecodeError, AttributeError) as err:
raise ParsingError from err
try:
data["data"] = string_to_bytes(data["data"])
except (KeyError, TypeError) as err:
raise ParsingError from err
return data


@ -0,0 +1,9 @@
from pyinfra.parser.blob_parser import BlobParser, ParsingError
class StringBlobParser(BlobParser):
def parse(self, data: bytes):
try:
return data.decode()
except Exception as err:
raise ParsingError from err


@ -0,0 +1,18 @@
class CachedPipelineFactory:
def __init__(self, base_url, pipeline_factory):
self.base_url = base_url
self.operation2pipeline = {}
self.pipeline_factory = pipeline_factory
def get_pipeline(self, operation: str):
pipeline = self.operation2pipeline.get(operation, None) or self.__register_pipeline(operation)
return pipeline
def __register_pipeline(self, operation):
endpoint = self.__make_endpoint(operation)
pipeline = self.pipeline_factory(endpoint)
self.operation2pipeline[operation] = pipeline
return pipeline
def __make_endpoint(self, operation):
return f"{self.base_url}/{operation}"


@ -1,329 +0,0 @@
import asyncio
import concurrent.futures
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set
import aiohttp
from aio_pika import ExchangeType, IncomingMessage, Message, connect
from aio_pika.abc import (
AbstractChannel,
AbstractConnection,
AbstractExchange,
AbstractIncomingMessage,
AbstractQueue,
)
from aio_pika.exceptions import (
ChannelClosed,
ChannelInvalidStateError,
ConnectionClosed,
)
from aiormq.exceptions import AMQPConnectionError
from kn_utils.logging import logger
from kn_utils.retry import retry
@dataclass
class RabbitMQConfig:
host: str
port: int
username: str
password: str
heartbeat: int
input_queue_prefix: str
tenant_event_queue_suffix: str
tenant_exchange_name: str
service_request_exchange_name: str
service_response_exchange_name: str
service_dead_letter_queue_name: str
queue_expiration_time: int
pod_name: str
connection_params: Dict[str, object] = field(init=False)
def __post_init__(self):
self.connection_params = {
"host": self.host,
"port": self.port,
"login": self.username,
"password": self.password,
"client_properties": {"heartbeat": self.heartbeat},
}
class AsyncQueueManager:
def __init__(
self,
config: RabbitMQConfig,
tenant_service_url: str,
message_processor: Callable[[Dict[str, Any]], Dict[str, Any]],
max_concurrent_tasks: int = 10,
):
self.config = config
self.tenant_service_url = tenant_service_url
self.message_processor = message_processor
self.semaphore = asyncio.Semaphore(max_concurrent_tasks)
self.connection: AbstractConnection | None = None
self.channel: AbstractChannel | None = None
self.tenant_exchange: AbstractExchange | None = None
self.input_exchange: AbstractExchange | None = None
self.output_exchange: AbstractExchange | None = None
self.tenant_exchange_queue: AbstractQueue | None = None
self.tenant_queues: Dict[str, AbstractChannel] = {}
self.consumer_tags: Dict[str, str] = {}
self.message_count: int = 0
@retry(tries=5, exceptions=AMQPConnectionError, reraise=True, logger=logger)
async def connect(self) -> None:
logger.info("Attempting to connect to RabbitMQ...")
self.connection = await connect(**self.config.connection_params)
self.connection.close_callbacks.add(self.on_connection_close)
self.channel = await self.connection.channel()
await self.channel.set_qos(prefetch_count=1)
logger.info("Successfully connected to RabbitMQ")
async def on_connection_close(self, sender, exc):
"""This is a callback for unexpected connection closures."""
logger.debug(f"Sender: {sender}")
if isinstance(exc, ConnectionClosed):
logger.warning("Connection to RabbitMQ lost. Attempting to reconnect...")
try:
active_tenants = await self.fetch_active_tenants()
await self.run(active_tenants=active_tenants)
logger.debug("Reconnected to RabbitMQ successfully")
except Exception as e:
logger.warning(f"Failed to reconnect to RabbitMQ: {e}")
# cancel queue manager and webserver to shutdown service
tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
[task.cancel() for task in tasks if task.get_name() in ["queues", "webserver"]]
else:
logger.debug("Connection closed on purpose.")
async def is_ready(self) -> bool:
if self.connection is None or self.connection.is_closed:
try:
await self.connect()
except Exception as e:
logger.error(f"Failed to connect to RabbitMQ: {e}")
return False
return True
@retry(tries=5, exceptions=(AMQPConnectionError, ChannelInvalidStateError), reraise=True, logger=logger)
async def setup_exchanges(self) -> None:
self.tenant_exchange = await self.channel.declare_exchange(
self.config.tenant_exchange_name, ExchangeType.TOPIC, durable=True
)
self.input_exchange = await self.channel.declare_exchange(
self.config.service_request_exchange_name, ExchangeType.DIRECT, durable=True
)
self.output_exchange = await self.channel.declare_exchange(
self.config.service_response_exchange_name, ExchangeType.DIRECT, durable=True
)
# we must declare DLQ to handle error messages
self.dead_letter_queue = await self.channel.declare_queue(
self.config.service_dead_letter_queue_name, durable=True
)
@retry(tries=5, exceptions=(AMQPConnectionError, ChannelInvalidStateError), reraise=True, logger=logger)
async def setup_tenant_queue(self) -> None:
self.tenant_exchange_queue = await self.channel.declare_queue(
f"{self.config.pod_name}_{self.config.tenant_event_queue_suffix}",
durable=True,
arguments={
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": self.config.service_dead_letter_queue_name,
"x-expires": self.config.queue_expiration_time,
},
)
await self.tenant_exchange_queue.bind(self.tenant_exchange, routing_key="tenant.*")
self.consumer_tags["tenant_exchange_queue"] = await self.tenant_exchange_queue.consume(
self.process_tenant_message
)
async def process_tenant_message(self, message: AbstractIncomingMessage) -> None:
try:
async with message.process():
message_body = json.loads(message.body.decode())
logger.debug(f"Tenant message received: {message_body}")
tenant_id = message_body["tenantId"]
routing_key = message.routing_key
if routing_key == "tenant.created":
await self.create_tenant_queues(tenant_id)
elif routing_key == "tenant.delete":
await self.delete_tenant_queues(tenant_id)
except Exception as e:
logger.error(e, exc_info=True)
async def create_tenant_queues(self, tenant_id: str) -> None:
queue_name = f"{self.config.input_queue_prefix}_{tenant_id}"
logger.info(f"Declaring queue: {queue_name}")
try:
input_queue = await self.channel.declare_queue(
queue_name,
durable=True,
arguments={
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": self.config.service_dead_letter_queue_name,
},
)
await input_queue.bind(self.input_exchange, routing_key=tenant_id)
self.consumer_tags[tenant_id] = await input_queue.consume(self.process_input_message)
self.tenant_queues[tenant_id] = input_queue
logger.info(f"Created and started consuming queue for tenant {tenant_id}")
except Exception as e:
logger.error(e, exc_info=True)
async def delete_tenant_queues(self, tenant_id: str) -> None:
if tenant_id in self.tenant_queues:
# somehow queue.delete() does not work here
await self.channel.queue_delete(f"{self.config.input_queue_prefix}_{tenant_id}")
del self.tenant_queues[tenant_id]
del self.consumer_tags[tenant_id]
logger.info(f"Deleted queues for tenant {tenant_id}")
async def process_input_message(self, message: IncomingMessage) -> None:
async def process_message_body_and_await_result(unpacked_message_body):
async with self.semaphore:
loop = asyncio.get_running_loop()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as thread_pool_executor:
logger.info("Processing payload in a separate thread.")
result = await loop.run_in_executor(
thread_pool_executor, self.message_processor, unpacked_message_body
)
return result
async with message.process(ignore_processed=True):
if message.redelivered:
logger.warning(f"Declining message with {message.delivery_tag=} due to it being redelivered.")
await message.nack(requeue=False)
return
if message.body.decode("utf-8") == "STOP":
logger.info("Received stop signal, stopping consumption...")
await message.ack()
# TODO: shutdown is probably not the right call here - align w/ Dev what should happen on stop signal
await self.shutdown()
return
self.message_count += 1
try:
tenant_id = message.routing_key
filtered_message_headers = (
{k: v for k, v in message.headers.items() if k.lower().startswith("x-")} if message.headers else {}
)
logger.debug(f"Processing message with {filtered_message_headers=}.")
result: dict = await (
process_message_body_and_await_result({**json.loads(message.body), **filtered_message_headers})
or {}
)
if result:
await self.publish_to_output_exchange(tenant_id, result, filtered_message_headers)
await message.ack()
logger.debug(f"Message with {message.delivery_tag=} acknowledged.")
else:
raise ValueError(f"Could not process message with {message.body=}.")
except json.JSONDecodeError:
await message.nack(requeue=False)
logger.error(f"Invalid JSON in input message: {message.body}", exc_info=True)
except FileNotFoundError as e:
logger.warning(f"{e}, declining message with {message.delivery_tag=}.", exc_info=True)
await message.nack(requeue=False)
except Exception as e:
await message.nack(requeue=False)
logger.error(f"Error processing input message: {e}", exc_info=True)
finally:
self.message_count -= 1
async def publish_to_output_exchange(self, tenant_id: str, result: Dict[str, Any], headers: Dict[str, Any]) -> None:
await self.output_exchange.publish(
Message(body=json.dumps(result).encode(), headers=headers),
routing_key=tenant_id,
)
logger.info(f"Published result to queue {tenant_id}.")
@retry(tries=5, exceptions=(aiohttp.ClientResponseError, aiohttp.ClientConnectorError), reraise=True, logger=logger)
async def fetch_active_tenants(self) -> Set[str]:
async with aiohttp.ClientSession() as session:
async with session.get(self.tenant_service_url) as response:
response.raise_for_status()
if response.headers["content-type"].lower() == "application/json":
data = await response.json()
return {tenant["tenantId"] for tenant in data}
else:
logger.error(
f"Failed to fetch active tenants. Content type is not JSON: {response.headers['content-type'].lower()}"
)
return set()
@retry(
tries=5,
exceptions=(
AMQPConnectionError,
ChannelInvalidStateError,
),
reraise=True,
logger=logger,
)
async def initialize_tenant_queues(self, active_tenants: set) -> None:
for tenant_id in active_tenants:
await self.create_tenant_queues(tenant_id)
async def run(self, active_tenants: set) -> None:
await self.connect()
await self.setup_exchanges()
await self.initialize_tenant_queues(active_tenants=active_tenants)
await self.setup_tenant_queue()
logger.info("RabbitMQ handler is running. Press CTRL+C to exit.")
async def close_channels(self) -> None:
try:
if self.channel and not self.channel.is_closed:
# Cancel queues to stop fetching messages
logger.debug("Cancelling queues...")
for tenant, queue in self.tenant_queues.items():
await queue.cancel(self.consumer_tags[tenant])
if self.tenant_exchange_queue:
await self.tenant_exchange_queue.cancel(self.consumer_tags["tenant_exchange_queue"])
while self.message_count != 0:
logger.debug(f"Messages are still being processed: {self.message_count=} ")
await asyncio.sleep(2)
await self.channel.close(exc=asyncio.CancelledError)
logger.debug("Channel closed.")
else:
logger.debug("No channel to close.")
except ChannelClosed:
logger.warning("Channel was already closed.")
except ConnectionClosed:
logger.warning("Connection was lost, unable to close channel.")
except Exception as e:
logger.error(f"Error during channel shutdown: {e}")
async def close_connection(self) -> None:
try:
if self.connection and not self.connection.is_closed:
await self.connection.close(exc=asyncio.CancelledError)
logger.debug("Connection closed.")
else:
logger.debug("No connection to close.")
except ConnectionClosed:
logger.warning("Connection was already closed.")
except Exception as e:
logger.error(f"Error closing connection: {e}")
async def shutdown(self) -> None:
logger.info("Shutting down RabbitMQ handler...")
await self.close_channels()
await self.close_connection()
logger.info("RabbitMQ handler shut down successfully.")

View File

@ -1,42 +0,0 @@
from typing import Callable
from dynaconf import Dynaconf
from kn_utils.logging import logger
from pyinfra.storage.connection import get_storage
from pyinfra.storage.utils import (
download_data_bytes_as_specified_in_message,
upload_data_as_specified_in_message,
DownloadedData,
)
DataProcessor = Callable[[dict[str, DownloadedData] | DownloadedData, dict], dict | list | str]
Callback = Callable[[dict], dict]
def make_download_process_upload_callback(data_processor: DataProcessor, settings: Dynaconf) -> Callback:
"""Default callback for processing queue messages.
Data will be downloaded from the storage as specified in the message. If a tenant id is specified, the storage
will be configured to use that tenant id, otherwise the storage is configured as specified in the settings.
The data is then passed to the data processor, together with the message. The data processor should return a
JSON-serializable object. This object is then uploaded to the storage as specified in the message. The response
message is just the original message.
"""
def inner(queue_message_payload: dict) -> dict:
logger.info(f"Processing payload with download-process-upload callback...")
storage = get_storage(settings, queue_message_payload.get("X-TENANT-ID"))
data: dict[str, DownloadedData] | DownloadedData = download_data_bytes_as_specified_in_message(
storage, queue_message_payload
)
result = data_processor(data, queue_message_payload)
upload_data_as_specified_in_message(storage, queue_message_payload, result)
return queue_message_payload
return inner

16
pyinfra/queue/consumer.py Normal file
View File

@ -0,0 +1,16 @@
from pyinfra.queue.queue_manager.queue_manager import QueueManager
class Consumer:
def __init__(self, visitor, queue_manager: QueueManager):
self.queue_manager = queue_manager
self.visitor = visitor
def consume_and_publish(self, n=None):
self.queue_manager.consume_and_publish(self.visitor, n=n)
def basic_consume_and_publish(self):
self.queue_manager.basic_consume_and_publish(self.visitor)
def consume(self, **kwargs):
return self.queue_manager.consume(**kwargs)

View File

@ -1,229 +0,0 @@
import atexit
import concurrent.futures
import json
import logging
import signal
import sys
from typing import Callable, Union
import pika
import pika.exceptions
from dynaconf import Dynaconf
from kn_utils.logging import logger
from kn_utils.retry import retry
from pika.adapters.blocking_connection import BlockingChannel, BlockingConnection
from pyinfra.config.loader import validate_settings
from pyinfra.config.validators import queue_manager_validators
pika_logger = logging.getLogger("pika")
pika_logger.setLevel(logging.WARNING) # disables non-informative pika log clutter
MessageProcessor = Callable[[dict], dict]
class QueueManager:
def __init__(self, settings: Dynaconf):
validate_settings(settings, queue_manager_validators)
self.input_queue = settings.rabbitmq.input_queue
self.output_queue = settings.rabbitmq.output_queue
self.dead_letter_queue = settings.rabbitmq.dead_letter_queue
self.connection_parameters = self.create_connection_parameters(settings)
self.connection: Union[BlockingConnection, None] = None
self.channel: Union[BlockingChannel, None] = None
self.connection_sleep = settings.rabbitmq.connection_sleep
self.processing_callback = False
self.received_signal = False
atexit.register(self.stop_consuming)
signal.signal(signal.SIGTERM, self._handle_stop_signal)
signal.signal(signal.SIGINT, self._handle_stop_signal)
self.max_retries = settings.rabbitmq.max_retries or 5
self.max_delay = settings.rabbitmq.max_delay or 60
@staticmethod
def create_connection_parameters(settings: Dynaconf):
credentials = pika.PlainCredentials(username=settings.rabbitmq.username, password=settings.rabbitmq.password)
pika_connection_params = {
"host": settings.rabbitmq.host,
"port": settings.rabbitmq.port,
"credentials": credentials,
"heartbeat": settings.rabbitmq.heartbeat,
}
return pika.ConnectionParameters(**pika_connection_params)
@retry(
tries=5,
exceptions=(pika.exceptions.AMQPConnectionError, pika.exceptions.ChannelClosedByBroker),
reraise=True,
)
def establish_connection(self):
if self.connection and self.connection.is_open:
logger.debug("Connection to RabbitMQ already established.")
return
logger.info("Establishing connection to RabbitMQ...")
self.connection = pika.BlockingConnection(parameters=self.connection_parameters)
logger.debug("Opening channel...")
self.channel = self.connection.channel()
self.channel.basic_qos(prefetch_count=1)
args = {
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": self.dead_letter_queue,
}
self.channel.queue_declare(self.input_queue, arguments=args, auto_delete=False, durable=True)
self.channel.queue_declare(self.output_queue, arguments=args, auto_delete=False, durable=True)
logger.info("Connection to RabbitMQ established, channel open.")
def is_ready(self):
try:
self.establish_connection()
return self.channel.is_open
except Exception as e:
logger.error(f"Failed to establish connection: {e}")
return False
@retry(
tries=5,
exceptions=pika.exceptions.AMQPConnectionError,
reraise=True,
)
def start_consuming(self, message_processor: Callable):
on_message_callback = self._make_on_message_callback(message_processor)
try:
self.establish_connection()
self.channel.basic_consume(self.input_queue, on_message_callback)
logger.info("Starting to consume messages...")
self.channel.start_consuming()
except pika.exceptions.AMQPConnectionError as e:
logger.error(f"AMQP Connection Error: {e}")
raise
except Exception as e:
logger.error(f"An unexpected error occurred while consuming messages: {e}", exc_info=True)
raise
finally:
self.stop_consuming()
def stop_consuming(self):
if self.channel and self.channel.is_open:
logger.info("Stopping consuming...")
self.channel.stop_consuming()
logger.info("Closing channel...")
self.channel.close()
if self.connection and self.connection.is_open:
logger.info("Closing connection to RabbitMQ...")
self.connection.close()
def publish_message_to_input_queue(self, message: Union[str, bytes, dict], properties: pika.BasicProperties = None):
if isinstance(message, str):
message = message.encode("utf-8")
elif isinstance(message, dict):
message = json.dumps(message).encode("utf-8")
self.establish_connection()
self.channel.basic_publish(
"",
self.input_queue,
properties=properties,
body=message,
)
logger.info(f"Published message to queue {self.input_queue}.")
def purge_queues(self):
self.establish_connection()
try:
self.channel.queue_purge(self.input_queue)
self.channel.queue_purge(self.output_queue)
logger.info("Queues purged.")
except pika.exceptions.ChannelWrongStateError:
pass
def get_message_from_output_queue(self):
self.establish_connection()
return self.channel.basic_get(self.output_queue, auto_ack=True)
def _make_on_message_callback(self, message_processor: MessageProcessor):
def process_message_body_and_await_result(unpacked_message_body):
# Processing the message in a separate thread is necessary for the main thread pika client to be able to
# process data events (e.g. heartbeats) while the message is being processed.
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as thread_pool_executor:
logger.info("Processing payload in separate thread.")
future = thread_pool_executor.submit(message_processor, unpacked_message_body)
# TODO: This block is probably not necessary, but kept since the implications of removing it are
# unclear. Remove it in a future iteration where fewer changes are being made to the code base.
while future.running():
logger.debug("Waiting for payload processing to finish...")
self.connection.sleep(self.connection_sleep)
return future.result()
def on_message_callback(channel, method, properties, body):
logger.info(f"Received message from queue with delivery_tag {method.delivery_tag}.")
self.processing_callback = True
if method.redelivered:
logger.warning(f"Declining message with {method.delivery_tag=} due to it being redelivered.")
channel.basic_nack(method.delivery_tag, requeue=False)
return
if body.decode("utf-8") == "STOP":
logger.info(f"Received stop signal, stopping consuming...")
channel.basic_ack(delivery_tag=method.delivery_tag)
self.stop_consuming()
return
try:
filtered_message_headers = (
{k: v for k, v in properties.headers.items() if k.lower().startswith("x-")}
if properties.headers
else {}
)
logger.debug(f"Processing message with {filtered_message_headers=}.")
result: dict = (
process_message_body_and_await_result({**json.loads(body), **filtered_message_headers}) or {}
)
channel.basic_publish(
"",
self.output_queue,
json.dumps(result).encode(),
properties=pika.BasicProperties(headers=filtered_message_headers),
)
logger.info(f"Published result to queue {self.output_queue}.")
channel.basic_ack(delivery_tag=method.delivery_tag)
logger.debug(f"Message with {method.delivery_tag=} acknowledged.")
except FileNotFoundError as e:
logger.warning(f"{e}, declining message with {method.delivery_tag=}.")
channel.basic_nack(method.delivery_tag, requeue=False)
except Exception:
logger.warning(f"Failed to process message with {method.delivery_tag=}, declining...", exc_info=True)
channel.basic_nack(method.delivery_tag, requeue=False)
raise
finally:
self.processing_callback = False
if self.received_signal:
self.stop_consuming()
sys.exit(0)
return on_message_callback
def _handle_stop_signal(self, signum, *args, **kwargs):
logger.info(f"Received signal {signum}, stopping consuming...")
self.received_signal = True
if not self.processing_callback:
self.stop_consuming()
sys.exit(0)

View File

@ -0,0 +1,172 @@
import json
import logging
from itertools import islice
import pika
from pyinfra.config import CONFIG
from pyinfra.exceptions import ProcessingFailure, DataLoadingFailure
from pyinfra.queue.queue_manager.queue_manager import QueueHandle, QueueManager
from pyinfra.visitor import QueueVisitor
logger = logging.getLogger("pika")
logger.setLevel(logging.WARNING)
logger = logging.getLogger()
def monkey_patch_queue_handle(channel, queue) -> QueueHandle:
empty_message = (None, None, None)
def is_empty_message(message):
return message == empty_message
queue_handle = QueueHandle()
queue_handle.empty = lambda: is_empty_message(channel.basic_get(queue))
def produce_items():
while True:
message = channel.basic_get(queue)
if is_empty_message(message):
break
method_frame, properties, body = message
channel.basic_ack(method_frame.delivery_tag)
yield json.loads(body)
queue_handle.to_list = lambda: list(produce_items())
return queue_handle
def get_connection_params():
credentials = pika.PlainCredentials(username=CONFIG.rabbitmq.user, password=CONFIG.rabbitmq.password)
kwargs = {
"host": CONFIG.rabbitmq.host,
"port": CONFIG.rabbitmq.port,
"credentials": credentials,
"heartbeat": CONFIG.rabbitmq.heartbeat,
}
parameters = pika.ConnectionParameters(**kwargs)
return parameters
def get_n_previous_attempts(props):
return 0 if props.headers is None else props.headers.get("x-retry-count", 0)
def attempts_remain(n_attempts, max_attempts):
return n_attempts < max_attempts
class PikaQueueManager(QueueManager):
def __init__(self, input_queue, output_queue, dead_letter_queue=None, connection_params=None):
super().__init__(input_queue, output_queue)
if not connection_params:
connection_params = get_connection_params()
self.connection = pika.BlockingConnection(parameters=connection_params)
self.channel = self.connection.channel()
self.channel.basic_qos(prefetch_count=1)
if not dead_letter_queue:
dead_letter_queue = CONFIG.rabbitmq.queues.dead_letter
args = {"x-dead-letter-exchange": "", "x-dead-letter-routing-key": dead_letter_queue}
self.channel.queue_declare(input_queue, arguments=args, auto_delete=False, durable=True)
self.channel.queue_declare(output_queue, arguments=args, auto_delete=False, durable=True)
def republish(self, body: bytes, n_current_attempts, frame):
self.channel.basic_publish(
exchange="",
routing_key=self._input_queue,
body=body,
properties=pika.BasicProperties(headers={"x-retry-count": n_current_attempts}),
)
self.channel.basic_ack(delivery_tag=frame.delivery_tag)
def publish_request(self, request):
logger.debug(f"Publishing {request}")
self.channel.basic_publish("", self._input_queue, json.dumps(request).encode())
def reject(self, body, frame):
logger.error(f"Adding to dead letter queue: {body}")
self.channel.basic_reject(delivery_tag=frame.delivery_tag, requeue=False)
def publish_response(self, message, visitor: QueueVisitor, max_attempts=3):
logger.debug(f"Processing {message}.")
frame, properties, body = message
n_attempts = get_n_previous_attempts(properties) + 1
try:
response_messages = visitor(json.loads(body))
if isinstance(response_messages, dict):
response_messages = [response_messages]
for response_message in response_messages:
response_message = json.dumps(response_message).encode()
self.channel.basic_publish("", self._output_queue, response_message)
self.channel.basic_ack(frame.delivery_tag)
except (ProcessingFailure, DataLoadingFailure):
logger.error(f"Message failed to process {n_attempts}/{max_attempts} times: {body}")
if attempts_remain(n_attempts, max_attempts):
self.republish(body, n_attempts, frame)
else:
self.reject(body, frame)
def pull_request(self):
return self.channel.basic_get(self._input_queue)
def consume(self, inactivity_timeout=None, n=None):
logger.debug("Consuming")
gen = self.channel.consume(self._input_queue, inactivity_timeout=inactivity_timeout)
yield from islice(gen, n)
def consume_and_publish(self, visitor: QueueVisitor, n=None):
logger.info(f"Consuming input queue.")
for message in self.consume(n=n):
self.publish_response(message, visitor)
def basic_consume_and_publish(self, visitor: QueueVisitor):
logger.info(f"Basic consuming input queue.")
def callback(channel, frame, properties, body):
message = (frame, properties, body)
return self.publish_response(message, visitor)
self.channel.basic_consume(self._input_queue, callback)
self.channel.start_consuming()
def clear(self):
try:
self.channel.queue_purge(self._input_queue)
self.channel.queue_purge(self._output_queue)
assert self.input_queue.to_list() == []
assert self.output_queue.to_list() == []
except pika.exceptions.ChannelWrongStateError:
pass
@property
def input_queue(self) -> QueueHandle:
return monkey_patch_queue_handle(self.channel, self._input_queue)
@property
def output_queue(self) -> QueueHandle:
return monkey_patch_queue_handle(self.channel, self._output_queue)

View File

@ -0,0 +1,51 @@
import abc
class QueueHandle:
def empty(self) -> bool:
raise NotImplementedError
def to_list(self) -> list:
raise NotImplementedError
class QueueManager(abc.ABC):
def __init__(self, input_queue, output_queue):
self._input_queue = input_queue
self._output_queue = output_queue
@abc.abstractmethod
def publish_request(self, request):
raise NotImplementedError
@abc.abstractmethod
def publish_response(self, response, callback):
raise NotImplementedError
@abc.abstractmethod
def pull_request(self):
raise NotImplementedError
@abc.abstractmethod
def consume(self, **kwargs):
raise NotImplementedError
@abc.abstractmethod
def clear(self):
raise NotImplementedError
@abc.abstractmethod
def input_queue(self) -> QueueHandle:
raise NotImplementedError
@abc.abstractmethod
def output_queue(self) -> QueueHandle:
raise NotImplementedError
@abc.abstractmethod
def consume_and_publish(self, callback, n=None):
raise NotImplementedError
@abc.abstractmethod
def basic_consume_and_publish(self, callback):
raise NotImplementedError

View File

@ -0,0 +1,37 @@
import logging
from collections import deque
from funcy import repeatedly, identity
from pyinfra.exceptions import NoBufferCapacity
from pyinfra.server.nothing import Nothing
logger = logging.getLogger(__name__)
def bufferize(fn, buffer_size=3, persist_fn=identity, null_value=None):
def buffered_fn(item):
if item is not Nothing:
buffer.append(persist_fn(item))
response_payload = fn(repeatedly(buffer.popleft, n_items_to_pop(buffer, item is Nothing)))
return response_payload or null_value
def buffer_full(current_buffer_size):
if current_buffer_size > buffer_size:
logger.warning(f"Overfull buffer. size: {current_buffer_size}; intended capacity: {buffer_size}")
return current_buffer_size == buffer_size
def n_items_to_pop(buffer, final):
current_buffer_size = len(buffer)
return (final or buffer_full(current_buffer_size)) * current_buffer_size
if not buffer_size > 0:
raise NoBufferCapacity("Buffer size must be greater than zero.")
buffer = deque()
return buffered_fn
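A minimal sketch of the buffering behaviour above, assuming a hypothetical batch_sum function: the wrapped function only runs once the buffer reaches capacity or the Nothing sentinel flushes it; until then the null value is returned.

from pyinfra.server.buffering.bufferize import bufferize
from pyinfra.server.nothing import Nothing

def batch_sum(items):
    # hypothetical batch function: consumes whatever the buffer hands over
    items = list(items)
    return sum(items) if items else None

buffered = bufferize(batch_sum, buffer_size=2)
assert buffered(1) is None       # buffer not full yet, the function sees no items
assert buffered(2) == 3          # capacity reached, the function is applied to [1, 2]
assert buffered(5) is None       # a new buffer starts filling
assert buffered(Nothing) == 5    # the Nothing sentinel flushes the remainder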

View File

@ -0,0 +1,24 @@
from collections import deque
from itertools import takewhile
from funcy import repeatedly
from pyinfra.server.nothing import is_not_nothing, Nothing
def stream_queue(queue):
yield from takewhile(is_not_nothing, repeatedly(queue.popleft))
class Queue:
def __init__(self):
self.__queue = deque()
def append(self, package) -> None:
self.__queue.append(package)
def popleft(self):
return self.__queue.popleft() if self.__queue else Nothing
def __bool__(self):
return bool(self.__queue)

View File

@ -0,0 +1,44 @@
from itertools import chain, takewhile
from typing import Iterable
from funcy import first, repeatedly, mapcat
from pyinfra.server.buffering.bufferize import bufferize
from pyinfra.server.nothing import Nothing, is_not_nothing
class FlatStreamBuffer:
"""Wraps a stream buffer and chains its output. Also flushes the stream buffer when applied to an iterable."""
def __init__(self, fn, buffer_size=3):
"""Function `fn` needs to be mappable and return an iterable; ideally `fn` returns a generator."""
self.stream_buffer = StreamBuffer(fn, buffer_size=buffer_size)
def __call__(self, items):
items = chain(items, [Nothing])
yield from mapcat(self.stream_buffer, items)
class StreamBuffer:
"""Puts a streaming function between an input and an output buffer."""
def __init__(self, fn, buffer_size=3):
"""Function `fn` needs to be mappable and return an iterable; ideally `fn` returns a generator."""
self.fn = bufferize(fn, buffer_size=buffer_size, null_value=[])
self.result_stream = chain([])
def __call__(self, item) -> Iterable:
self.push(item)
yield from takewhile(is_not_nothing, repeatedly(self.pop))
def push(self, item):
self.result_stream = chain(self.result_stream, self.compute(item))
def compute(self, item):
try:
yield from self.fn(item)
except TypeError as err:
raise TypeError("Function failed with type-error. Is it mappable?") from err
def pop(self):
return first(chain(self.result_stream, [Nothing]))
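A minimal sketch of the flat variant, assuming a hypothetical double_all batch function: the buffer aggregates two items before double_all runs, and the trailing Nothing appended by FlatStreamBuffer flushes the rest.

from pyinfra.server.buffering.stream import FlatStreamBuffer

def double_all(items):
    # hypothetical batch function: maps over whatever the buffer hands over
    return [2 * item for item in items]

buffered_doubler = FlatStreamBuffer(double_all, buffer_size=2)
assert list(buffered_doubler([1, 2, 3])) == [2, 4, 6]   # [1, 2] processed as a batch, [3] on the final flush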

View File

@ -0,0 +1,16 @@
from funcy import rcompose, flatten
# TODO: remove the dispatcher component from the pipeline; it no longer actually dispatches
class ClientPipeline:
def __init__(self, packer, dispatcher, receiver, interpreter):
self.pipe = rcompose(
packer,
dispatcher,
receiver,
interpreter,
flatten, # each analysis call returns an iterable. Can be empty, singleton or multi item. Hence, flatten.
)
def __call__(self, *args, **kwargs):
yield from self.pipe(*args, **kwargs)
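A minimal assembly sketch using the REST components added elsewhere in this diff; the module paths, the endpoint URL, and the sample payload are assumptions.

from pyinfra.server.packer.rest_packer import RestPacker                         # module path assumed
from pyinfra.server.dispatcher.rest_dispatcher import RestDispatcher             # module path assumed
from pyinfra.server.receiver.rest_receiver import RestReceiver                   # module path assumed
from pyinfra.server.interpreter.rest_pickup_streamer import RestPickupStreamer   # module path assumed

pipeline = ClientPipeline(
    packer=RestPacker(),
    dispatcher=RestDispatcher("http://localhost:5000/ocr"),
    receiver=RestReceiver(chunk_size=3),
    interpreter=RestPickupStreamer(),
)
for response in pipeline([b"raw image bytes"], [{"filename": "page-1.png"}]):
    ...  # consume whatever the configured interpreter yields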

View File

@ -0,0 +1,27 @@
from itertools import tee
from typing import Iterable
def inspect(prefix="inspect", embed=False):
"""Can be used to inspect compositions of generator functions by placing inbetween two functions."""
def inner(x):
if isinstance(x, Iterable) and not isinstance(x, dict) and not isinstance(x, tuple):
x, y = tee(x)
y = list(y)
else:
y = x
l = f" {len(y)} items" if isinstance(y, list) else ""
print(f"{prefix}{l}:", y)
if embed:
import IPython
IPython.embed()
return x
return inner
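A small usage sketch; the prefix and the sample stream are hypothetical.

items = inspect("parsed")(iter([1, 2, 3]))   # prints "parsed 3 items: [1, 2, 3]"
assert list(items) == [1, 2, 3]              # the inspected stream is passed through untouched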

View File

@ -0,0 +1,30 @@
import abc
from typing import Iterable
from more_itertools import peekable
from pyinfra.server.nothing import Nothing
def has_next(peekable_iter):
return peekable_iter.peek(Nothing) is not Nothing
class Dispatcher:
def __call__(self, packages: Iterable[dict]):
yield from self.dispatch_methods(packages)
def dispatch_methods(self, packages):
packages = peekable(packages)
for package in packages:
method = self.patch if has_next(packages) else self.post
response = method(package)
yield response
@abc.abstractmethod
def patch(self, package):
raise NotImplementedError
@abc.abstractmethod
def post(self, package):
raise NotImplementedError

View File

@ -0,0 +1,21 @@
from itertools import takewhile
from funcy import repeatedly, notnone
from pyinfra.server.dispatcher.dispatcher import Dispatcher
from pyinfra.server.stream.queued_stream_function import QueuedStreamFunction
class QueuedStreamFunctionDispatcher(Dispatcher):
def __init__(self, queued_stream_function: QueuedStreamFunction):
self.queued_stream_function = queued_stream_function
def patch(self, package):
self.queued_stream_function.push(package)
# TODO: this is wonky and a result of the pipeline components having shifted behaviour through previous
# refactorings. The analogous functionality for the rest pipeline is in the interpreter. Correct this
# asymmetry!
yield from takewhile(notnone, repeatedly(self.queued_stream_function.pop))
def post(self, package):
yield from self.patch(package)

View File

@ -0,0 +1,14 @@
import requests
from pyinfra.server.dispatcher.dispatcher import Dispatcher
class RestDispatcher(Dispatcher):
def __init__(self, endpoint):
self.endpoint = endpoint
def patch(self, package):
return requests.patch(self.endpoint, json=package)
def post(self, package):
return requests.post(self.endpoint, json=package)

View File

@ -0,0 +1,8 @@
import abc
from typing import Iterable
class Interpreter(abc.ABC):
@abc.abstractmethod
def __call__(self, payloads: Iterable):
pass

View File

@ -0,0 +1,8 @@
from typing import Iterable
from pyinfra.server.interpreter.interpreter import Interpreter
class IdentityInterpreter(Interpreter):
def __call__(self, payloads: Iterable):
yield from payloads

View File

@ -0,0 +1,23 @@
from typing import Iterable
import requests
from funcy import takewhile, repeatedly, mapcat
from pyinfra.server.interpreter.interpreter import Interpreter
def stream_responses(endpoint):
def receive():
response = requests.get(endpoint)
return response
def more_is_coming(response):
return response.status_code == 206
response_stream = takewhile(more_is_coming, repeatedly(receive))
yield from response_stream
class RestPickupStreamer(Interpreter):
def __call__(self, payloads: Iterable):
yield from mapcat(stream_responses, payloads)

View File

@ -0,0 +1,39 @@
from functools import lru_cache
from funcy import identity
from prometheus_client import CollectorRegistry, Summary
from pyinfra.server.operation_dispatcher import OperationDispatcher
class OperationDispatcherMonitoringDecorator:
def __init__(self, operation_dispatcher: OperationDispatcher, naming_policy=identity):
self.operation_dispatcher = operation_dispatcher
self.operation2metric = {}
self.naming_policy = naming_policy
@property
@lru_cache(maxsize=None)
def registry(self):
return CollectorRegistry(auto_describe=True)
def make_summary_instance(self, op: str):
return Summary(f"{self.naming_policy(op)}_seconds", f"Time spent on {op}.", registry=self.registry)
def submit(self, operation, request):
return self.operation_dispatcher.submit(operation, request)
def pickup(self, operation):
with self.get_monitor(operation):
return self.operation_dispatcher.pickup(operation)
def get_monitor(self, operation):
monitor = self.operation2metric.get(operation, None) or self.register_operation(operation)
return monitor.time()
def register_operation(self, operation):
summary = self.make_summary_instance(operation)
self.operation2metric[operation] = summary
return summary

View File

@ -0,0 +1,17 @@
from itertools import chain
from typing import Iterable, Union, Tuple
from pyinfra.exceptions import UnexpectedItemType
def normalize(itr: Iterable[Union[Tuple, Iterable]]) -> Iterable[Tuple]:
return chain.from_iterable(map(normalize_item, normalize_item(itr)))
def normalize_item(itm: Union[Tuple, Iterable]) -> Iterable:
if isinstance(itm, tuple):
return [itm]
elif isinstance(itm, Iterable):
return itm
else:
raise UnexpectedItemType("Encountered an item that could not be normalized to a list.")
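A minimal sketch of what normalize accepts: either a single (data, metadata) tuple or an iterable of such tuples, both ending up as a flat iterable of tuples. The sample payloads are hypothetical.

single = (b"payload", {"page": 1})
many = [(b"a", {"page": 1}), (b"b", {"page": 2})]

assert list(normalize(single)) == [single]
assert list(normalize(many)) == many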

View File

@ -0,0 +1,6 @@
class Nothing:
pass
def is_not_nothing(x):
return x is not Nothing

View File

@ -0,0 +1,33 @@
from itertools import starmap, tee
from typing import Dict
from funcy import juxt, zipdict, cat
from pyinfra.server.stream.queued_stream_function import QueuedStreamFunction
from pyinfra.server.stream.rest import LazyRestProcessor
class OperationDispatcher:
def __init__(self, operation2function: Dict[str, QueuedStreamFunction]):
submit_suffixes, pickup_suffixes = zip(*map(juxt(submit_suffix, pickup_suffix), operation2function))
processors = starmap(LazyRestProcessor, zip(operation2function.values(), submit_suffixes, pickup_suffixes))
self.operation2processor = zipdict(submit_suffixes + pickup_suffixes, cat(tee(processors)))
@classmethod
@property
def pickup_suffix(cls):
return pickup_suffix("")
def submit(self, operation, request):
return self.operation2processor[operation].push(request)
def pickup(self, operation):
return self.operation2processor[operation].pop()
def submit_suffix(op: str):
return "" if not op else op
def pickup_suffix(op: str):
return "pickup" if not op else f"{op}_pickup"

View File

@ -0,0 +1,8 @@
import abc
from typing import Iterable
class Packer(abc.ABC):
@abc.abstractmethod
def __call__(self, data: Iterable, metadata: Iterable):
pass

View File

@ -0,0 +1,14 @@
from itertools import starmap
from typing import Iterable
from pyinfra.server.packer.packer import Packer
def bundle(data: bytes, metadata: dict):
package = {"data": data, "metadata": metadata}
return package
class IdentityPacker(Packer):
def __call__(self, data: Iterable, metadata):
yield from starmap(bundle, zip(data, metadata))

View File

@ -0,0 +1,9 @@
from typing import Iterable
from pyinfra.server.packer.packer import Packer
from pyinfra.server.packing import pack_data_and_metadata_for_rest_transfer
class RestPacker(Packer):
def __call__(self, data: Iterable[bytes], metadata: Iterable[dict]):
yield from pack_data_and_metadata_for_rest_transfer(data, metadata)

34
pyinfra/server/packing.py Normal file
View File

@ -0,0 +1,34 @@
import base64
from operator import itemgetter
from itertools import starmap
from typing import Iterable
from funcy import compose
from pyinfra.utils.func import starlift, lift
def pack_data_and_metadata_for_rest_transfer(data: Iterable, metadata: Iterable):
yield from starmap(pack, zip(data, metadata))
def unpack_fn_pack(fn):
return compose(starlift(pack), fn, lift(unpack))
def pack(data: bytes, metadata: dict):
package = {"data": bytes_to_string(data), "metadata": metadata}
return package
def unpack(package):
data, metadata = itemgetter("data", "metadata")(package)
return string_to_bytes(data), metadata
def bytes_to_string(data: bytes) -> str:
return base64.b64encode(data).decode()
def string_to_bytes(data: str) -> bytes:
return base64.b64decode(data.encode())
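A minimal round-trip sketch of the helpers above (the sample payload is hypothetical); unpack_fn_pack composes them so a server-side function can work on raw bytes while the transport stays JSON-safe.

package = pack(b"\x89PNG...", {"filename": "page-1.png"})
assert isinstance(package["data"], str)                                # bytes travel base64-encoded
assert unpack(package) == (b"\x89PNG...", {"filename": "page-1.png"})  # and decode back losslessly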

View File

@ -0,0 +1,8 @@
import abc
from typing import Iterable
class Receiver(abc.ABC):
@abc.abstractmethod
def __call__(self, package: Iterable):
pass

View File

@ -0,0 +1,11 @@
from typing import Iterable
from pyinfra.server.receiver.receiver import Receiver
from funcy import notnone
class QueuedStreamFunctionReceiver(Receiver):
def __call__(self, responses: Iterable):
for response in filter(notnone, responses):
yield response

View File

@ -0,0 +1,16 @@
from typing import Iterable
import requests
from funcy import chunks, flatten
from pyinfra.server.receiver.receiver import Receiver
class RestReceiver(Receiver):
def __init__(self, chunk_size=3):
self.chunk_size = chunk_size
def __call__(self, responses: Iterable[requests.Response]):
for response in flatten(chunks(self.chunk_size, responses)):
response.raise_for_status()
yield response.json()

100
pyinfra/server/server.py Normal file
View File

@ -0,0 +1,100 @@
from functools import singledispatch
from typing import Dict, Callable, Union
from flask import Flask, jsonify, request
from prometheus_client import generate_latest
from pyinfra.config import CONFIG
from pyinfra.server.buffering.stream import FlatStreamBuffer
from pyinfra.server.monitoring import OperationDispatcherMonitoringDecorator
from pyinfra.server.operation_dispatcher import OperationDispatcher
from pyinfra.server.stream.queued_stream_function import QueuedStreamFunction
@singledispatch
def set_up_processing_server(arg: Union[dict, Callable], buffer_size=1):
"""Produces a processing server given a streamable function or a mapping from operations to streamable functions.
Streamable functions are constructed by calling pyinfra.server.utils.make_streamable_and_wrap_in_packing_logic on a
function taking a tuple of data and metadata and also returning a tuple or yielding tuples of data and metadata.
If the function doesn't produce data, data should be an empty byte string.
If the function doesn't produce metadata, metadata should be an empty dictionary.
Args:
arg: streamable function or mapping of operations: str to streamable functions
buffer_size: If your function operates on batches this parameter controls how many items are aggregated before
your function is applied.
TODO: buffer_size has to be controllable on per function basis.
Returns:
Processing server: flask app
"""
pass
@set_up_processing_server.register
def _(operation2stream_fn: dict, buffer_size=1):
return __stream_fn_to_processing_server(operation2stream_fn, buffer_size)
@set_up_processing_server.register
def _(stream_fn: object, buffer_size=1):
operation2stream_fn = {None: stream_fn}
return __stream_fn_to_processing_server(operation2stream_fn, buffer_size)
def __stream_fn_to_processing_server(operation2stream_fn: dict, buffer_size):
operation2stream_fn = {
op: QueuedStreamFunction(FlatStreamBuffer(fn, buffer_size)) for op, fn in operation2stream_fn.items()
}
return __set_up_processing_server(operation2stream_fn)
def __set_up_processing_server(operation2function: Dict[str, QueuedStreamFunction]):
app = Flask(__name__)
dispatcher = OperationDispatcherMonitoringDecorator(
OperationDispatcher(operation2function),
naming_policy=naming_policy,
)
def ok():
resp = jsonify("OK")
resp.status_code = 200
return resp
@app.route("/ready", methods=["GET"])
def ready():
return ok()
@app.route("/health", methods=["GET"])
def healthy():
return ok()
@app.route("/prometheus", methods=["GET"])
def prometheus():
return generate_latest(registry=dispatcher.registry)
@app.route("/<operation>", methods=["POST", "PATCH"])
def submit(operation):
return dispatcher.submit(operation, request)
@app.route("/", methods=["POST", "PATCH"])
def submit_default():
return dispatcher.submit("", request)
@app.route("/<operation>", methods=["GET"])
def pickup(operation):
return dispatcher.pickup(operation)
return app
def naming_policy(op_name: str):
pop_suffix = OperationDispatcher.pickup_suffix
prefix = f"redactmanager_{CONFIG.service.name}"
op_display_name = op_name.replace(f"_{pop_suffix}", "") if op_name != pop_suffix else "default"
complete_display_name = f"{prefix}_{op_display_name}"
return complete_display_name
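A minimal wiring sketch for the server factory above, assuming pyinfra.server.utils.make_streamable_and_wrap_in_packing_logic as added in this diff; the echo function and the "echo" operation name are hypothetical.

from pyinfra.server.utils import make_streamable_and_wrap_in_packing_logic

def echo(data: bytes, metadata: dict):
    # hypothetical processing function: returns the data untouched and tags the metadata
    return data, {**metadata, "echoed": True}

stream_fn = make_streamable_and_wrap_in_packing_logic(echo, batched=False)
app = set_up_processing_server({"echo": stream_fn}, buffer_size=1)
# app.run(port=5000)  # would expose POST/PATCH /echo, GET /echo_pickup, plus /ready, /health and /prometheus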

View File

@ -0,0 +1,21 @@
from funcy import first
from pyinfra.server.buffering.queue import stream_queue, Queue
class QueuedStreamFunction:
def __init__(self, stream_function):
"""Combines a stream function with a queue.
Args:
stream_function: Needs to operate on iterables.
"""
self.queue = Queue()
self.stream_function = stream_function
def push(self, item):
self.queue.append(item)
def pop(self):
items = stream_queue(self.queue)
return first(self.stream_function(items))
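A minimal sketch of the queue/stream combination above; upper_stream is a hypothetical stream function that maps over an iterable. Pushed items are drained lazily, and each pop yields the next available result.

def upper_stream(items):
    # hypothetical stream function: maps over whatever has been queued so far
    return (item.upper() for item in items)

qsf = QueuedStreamFunction(upper_stream)
qsf.push("a")
qsf.push("b")
assert qsf.pop() == "A"   # first result produced from the queued items
assert qsf.pop() == "B"   # the remaining item is drained on the next pop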

View File

@ -0,0 +1,51 @@
import logging
from flask import jsonify
from funcy import drop
from pyinfra.server.nothing import Nothing
from pyinfra.server.stream.queued_stream_function import QueuedStreamFunction
logger = logging.getLogger(__name__)
class LazyRestProcessor:
def __init__(self, queued_stream_function: QueuedStreamFunction, submit_suffix="submit", pickup_suffix="pickup"):
self.submit_suffix = submit_suffix
self.pickup_suffix = pickup_suffix
self.queued_stream_function = queued_stream_function
def push(self, request):
self.queued_stream_function.push(request.json)
return jsonify(replace_suffix(request.base_url, self.submit_suffix, self.pickup_suffix))
def pop(self):
result = self.queued_stream_function.pop() or Nothing
if not valid(result):
logger.error(f"Received invalid result: {result}")
result = Nothing
if result is Nothing:
logger.info("Analysis completed successfully.")
resp = jsonify("No more items left")
resp.status_code = 204
else:
logger.debug("Partial analysis completed.")
resp = jsonify(result)
resp.status_code = 206
return resp
def valid(result):
return isinstance(result, dict) or result is Nothing
def replace_suffix(strn, suf, repl):
return remove_last_n(strn, len(suf)) + repl
def remove_last_n(strn, n):
return "".join(reversed(list(drop(n, reversed(strn)))))

16
pyinfra/server/utils.py Normal file
View File

@ -0,0 +1,16 @@
from funcy import compose, identity
from pyinfra.server.normalization import normalize
from pyinfra.server.packing import unpack_fn_pack
from pyinfra.utils.func import starlift
def make_streamable_and_wrap_in_packing_logic(fn, batched):
fn = make_streamable(fn, batched)
fn = unpack_fn_pack(fn)
return fn
def make_streamable(fn, batched):
# FIXME: something broken with batched == True
return compose(normalize, (identity if batched else starlift)(fn))

View File

@ -0,0 +1,34 @@
from abc import ABC, abstractmethod
class StorageAdapter(ABC):
def __init__(self, client):
self.__client = client
@abstractmethod
def make_bucket(self, bucket_name):
raise NotImplementedError
@abstractmethod
def has_bucket(self, bucket_name):
raise NotImplementedError
@abstractmethod
def put_object(self, bucket_name, object_name, data):
raise NotImplementedError
@abstractmethod
def get_object(self, bucket_name, object_name):
raise NotImplementedError
@abstractmethod
def get_all_objects(self, bucket_name):
raise NotImplementedError
@abstractmethod
def clear_bucket(self, bucket_name):
raise NotImplementedError
@abstractmethod
def get_all_object_names(self, bucket_name, prefix=None):
raise NotImplementedError

View File

@ -0,0 +1,64 @@
import logging
from operator import attrgetter
from azure.storage.blob import ContainerClient, BlobServiceClient
from pyinfra.storage.adapters.adapter import StorageAdapter
logger = logging.getLogger(__name__)
logging.getLogger("azure").setLevel(logging.WARNING)
logging.getLogger("urllib3").setLevel(logging.WARNING)
class AzureStorageAdapter(StorageAdapter):
def __init__(self, client):
super().__init__(client=client)
# re-bind the base class's name-mangled client attribute so this adapter can use it with a concrete type
self.__client: BlobServiceClient = self._StorageAdapter__client
def has_bucket(self, bucket_name):
container_client = self.__client.get_container_client(bucket_name)
return container_client.exists()
def __provide_container_client(self, bucket_name) -> ContainerClient:
self.make_bucket(bucket_name)
container_client = self.__client.get_container_client(bucket_name)
return container_client
def make_bucket(self, bucket_name):
container_client = self.__client.get_container_client(bucket_name)
if not container_client.exists(): self.__client.create_container(bucket_name)
def put_object(self, bucket_name, object_name, data):
logger.debug(f"Uploading '{object_name}'...")
container_client = self.__provide_container_client(bucket_name)
blob_client = container_client.get_blob_client(object_name)
blob_client.upload_blob(data, overwrite=True)
def get_object(self, bucket_name, object_name):
logger.debug(f"Downloading '{object_name}'...")
container_client = self.__provide_container_client(bucket_name)
blob_client = container_client.get_blob_client(object_name)
blob_data = blob_client.download_blob()
return blob_data.readall()
def get_all_objects(self, bucket_name):
container_client = self.__provide_container_client(bucket_name)
blobs = container_client.list_blobs()
for blob in blobs:
logger.debug(f"Downloading '{blob.name}'...")
blob_client = container_client.get_blob_client(blob)
blob_data = blob_client.download_blob()
data = blob_data.readall()
yield data
def clear_bucket(self, bucket_name):
logger.debug(f"Clearing Azure container '{bucket_name}'...")
container_client = self.__client.get_container_client(bucket_name)
blobs = container_client.list_blobs()
container_client.delete_blobs(*blobs)
def get_all_object_names(self, bucket_name, prefix=None):
container_client = self.__provide_container_client(bucket_name)
blobs = container_client.list_blobs(name_starts_with=prefix)
return map(attrgetter("name"), blobs)

View File

@ -0,0 +1,58 @@
import io
import logging
from itertools import repeat
from operator import attrgetter
from minio import Minio
from pyinfra.exceptions import DataLoadingFailure
from pyinfra.storage.adapters.adapter import StorageAdapter
logger = logging.getLogger(__name__)
class S3StorageAdapter(StorageAdapter):
def __init__(self, client):
super().__init__(client=client)
# re-bind the base class's name-mangled client attribute so this adapter can use it with a concrete type
self.__client: Minio = self._StorageAdapter__client
def make_bucket(self, bucket_name):
if not self.has_bucket(bucket_name):
self.__client.make_bucket(bucket_name)
def has_bucket(self, bucket_name):
return self.__client.bucket_exists(bucket_name)
def put_object(self, bucket_name, object_name, data):
logger.debug(f"Uploading '{object_name}'...")
data = io.BytesIO(data)
self.__client.put_object(bucket_name, object_name, data, length=data.getbuffer().nbytes)
def get_object(self, bucket_name, object_name):
logger.debug(f"Downloading '{object_name}'...")
response = None
try:
response = self.__client.get_object(bucket_name, object_name)
return response.data
except Exception as err:
raise DataLoadingFailure("Failed getting object from s3 client") from err
finally:
if response:
response.close()
response.release_conn()
def get_all_objects(self, bucket_name):
for obj in self.__client.list_objects(bucket_name, recursive=True):
logger.debug(f"Downloading '{obj.object_name}'...")
yield self.get_object(bucket_name, obj.object_name)
def clear_bucket(self, bucket_name):
logger.debug(f"Clearing S3 bucket '{bucket_name}'...")
objects = self.__client.list_objects(bucket_name, recursive=True)
for obj in objects:
self.__client.remove_object(bucket_name, obj.object_name)
def get_all_object_names(self, bucket_name, prefix=None):
objs = self.__client.list_objects(bucket_name, recursive=True, prefix=prefix)
return map(attrgetter("object_name"), objs)

View File

@ -0,0 +1,11 @@
from azure.storage.blob import BlobServiceClient
from pyinfra.config import CONFIG
def get_azure_client(connection_string=None) -> BlobServiceClient:
if not connection_string:
connection_string = CONFIG.storage.azure.connection_string
return BlobServiceClient.from_connection_string(conn_str=connection_string)

View File

@ -0,0 +1,40 @@
import re
from minio import Minio
from pyinfra.config import CONFIG
from pyinfra.exceptions import InvalidEndpoint
def parse_endpoint(endpoint):
# FIXME Greedy matching (.+) since we get random storage names on kubernetes (eg http://red-research-headless:9000)
# FIXME this has been broken and accepts invalid URLs
endpoint_pattern = r"(?P<protocol>https?)*(?:://)*(?P<address>(?:(?:(?:\d{1,3}\.){3}\d{1,3})|.+)(?:\:\d+)?)"
match = re.match(endpoint_pattern, endpoint)
if not match:
raise InvalidEndpoint(f"Endpoint {endpoint} is invalid; expected {endpoint_pattern}")
return {"secure": match.group("protocol") == "https", "endpoint": match.group("address")}
def get_s3_client(params=None) -> Minio:
"""
Args:
params: dict like
{
"endpoint": <storage_endpoint>
"access_key": <storage_key>
"secret_key": <storage_secret>
}
"""
if not params:
params = CONFIG.storage.s3
return Minio(
**parse_endpoint(params.endpoint),
access_key=params.access_key,
secret_key=params.secret_key,
region=params.region,
)
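A quick check of what parse_endpoint currently yields for typical endpoints (hostnames hypothetical); as the FIXMEs note, the pattern is deliberately permissive.

assert parse_endpoint("https://minio.example.com:9000") == {"secure": True, "endpoint": "minio.example.com:9000"}
assert parse_endpoint("http://red-research-headless:9000")["secure"] is False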

View File

@ -1,89 +0,0 @@
from functools import lru_cache
import requests
from dynaconf import Dynaconf
from kn_utils.logging import logger
from pyinfra.config.loader import validate_settings
from pyinfra.config.validators import (
multi_tenant_storage_validators,
storage_validators,
)
from pyinfra.storage.storages.azure import get_azure_storage_from_settings
from pyinfra.storage.storages.s3 import get_s3_storage_from_settings
from pyinfra.storage.storages.storage import Storage
from pyinfra.utils.cipher import decrypt
def get_storage(settings: Dynaconf, tenant_id: str = None) -> Storage:
"""Establishes a storage connection.
If tenant_id is provided, gets storage connection information from tenant server. These connections are cached.
Otherwise, gets storage connection information from settings.
"""
logger.info("Establishing storage connection...")
if tenant_id:
logger.info(f"Using tenant storage for {tenant_id}.")
validate_settings(settings, multi_tenant_storage_validators)
return get_storage_for_tenant(
tenant_id,
settings.storage.tenant_server.endpoint,
settings.storage.tenant_server.public_key,
)
logger.info("Using default storage.")
validate_settings(settings, storage_validators)
return storage_dispatcher[settings.storage.backend](settings)
storage_dispatcher = {
"azure": get_azure_storage_from_settings,
"s3": get_s3_storage_from_settings,
}
@lru_cache(maxsize=10)
def get_storage_for_tenant(tenant: str, endpoint: str, public_key: str) -> Storage:
response = requests.get(f"{endpoint}/{tenant}").json()
maybe_azure = response.get("azureStorageConnection")
maybe_s3 = response.get("s3StorageConnection")
assert (maybe_azure or maybe_s3) and not (maybe_azure and maybe_s3), "Only one storage backend can be used."
if maybe_azure:
connection_string = decrypt(public_key, maybe_azure["connectionString"])
backend = "azure"
storage_info = {
"storage": {
"azure": {
"connection_string": connection_string,
"container": maybe_azure["containerName"],
},
}
}
elif maybe_s3:
secret = decrypt(public_key, maybe_s3["secret"])
backend = "s3"
storage_info = {
"storage": {
"s3": {
"endpoint": maybe_s3["endpoint"],
"key": maybe_s3["key"],
"secret": secret,
"region": maybe_s3["region"],
"bucket": maybe_s3["bucketName"],
},
}
}
else:
raise Exception(f"Unknown storage backend in {response}.")
storage_settings = Dynaconf()
storage_settings.update(storage_info)
storage = storage_dispatcher[backend](storage_settings)
return storage

View File

@ -0,0 +1,44 @@
import logging
from pyinfra.config import CONFIG
from pyinfra.exceptions import DataLoadingFailure
from pyinfra.storage.adapters.adapter import StorageAdapter
from pyinfra.utils.retry import retry
logger = logging.getLogger(__name__)
logger.setLevel(CONFIG.service.logging_level)
class Storage:
def __init__(self, adapter: StorageAdapter):
self.__adapter = adapter
def make_bucket(self, bucket_name):
self.__adapter.make_bucket(bucket_name)
def has_bucket(self, bucket_name):
return self.__adapter.has_bucket(bucket_name)
def put_object(self, bucket_name, object_name, data):
self.__adapter.put_object(bucket_name, object_name, data)
def get_object(self, bucket_name, object_name):
return self.__get_object(bucket_name, object_name)
@retry(DataLoadingFailure)
def __get_object(self, bucket_name, object_name):
try:
return self.__adapter.get_object(bucket_name, object_name)
except Exception as err:
logging.error(err)
raise DataLoadingFailure from err
def get_all_objects(self, bucket_name):
return self.__adapter.get_all_objects(bucket_name)
def clear_bucket(self, bucket_name):
return self.__adapter.clear_bucket(bucket_name)
def get_all_object_names(self, bucket_name, prefix=None):
return self.__adapter.get_all_object_names(bucket_name, prefix=prefix)
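A minimal wiring sketch for the facade above, assuming the S3 adapter and client factory added in this diff; their module paths, and the bucket and object names, are assumptions.

from pyinfra.storage.adapters.s3 import S3StorageAdapter   # module path assumed
from pyinfra.storage.clients.s3 import get_s3_client       # module path assumed

storage = Storage(S3StorageAdapter(get_s3_client()))
storage.make_bucket("reports")
storage.put_object("reports", "2022/07/report.json", b"{}")
assert storage.get_object("reports", "2022/07/report.json") == b"{}"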

Some files were not shown because too many files have changed in this diff.