16 Commits

Author SHA1 Message Date
Matthias Bisping
2c908162f1 refactoring 2022-04-05 16:31:57 +02:00
Matthias Bisping
4756b8c9bd refactoring 2022-04-05 13:03:22 +02:00
Matthias Bisping
e0885c545a added page range parameter to extractor 2022-04-05 13:03:17 +02:00
Matthias Bisping
ce69f7d160 removed obsolete imports 2022-04-04 21:50:10 +02:00
Matthias Bisping
8f61c4cba2 doc.extract_image(xref) can yield None; hence added filtering for None images 2022-04-04 21:49:45 +02:00
Matthias Bisping
5c23898280 added log messages to all pipeline components; converting pipeline output to list for REST transport; refactoring; added e2e test (flask + pipeline)... but hangs 2022-04-02 02:44:30 +02:00
Matthias Bisping
91dd467142 applied black 2022-03-30 19:38:15 +02:00
Matthias Bisping
258c1ab02d testing label mappers for raising of exceptions when encountering unexpected input formats 2022-03-30 18:15:45 +02:00
Matthias Bisping
45a07c620a fixed chaining bug that led to greedy evaluation 2022-03-30 00:53:34 +02:00
Matthias Bisping
ade318c7b7 made classifier accept tuples of images in addition to np.arrays; added pipeline (wip) 2022-03-29 22:00:34 +02:00
Matthias Bisping
7340fb6dda replaced string keys for metadata fields with enum members 2022-03-29 20:29:44 +02:00
Matthias Bisping
e818b05472 applied black 2022-03-28 16:39:34 +02:00
Matthias Bisping
b818ee4724 fixed misaligned metadata and images 2022-03-28 16:38:46 +02:00
Julius Unverfehrt
9461be29d5 add ParsablePDFImageExtractor test 2022-03-28 15:42:54 +02:00
Matthias Bisping
643ab99bd3 added parsable pdf image extractor 2022-03-28 11:27:05 +02:00
Matthias Bisping
a5147c9a58 added image extractor interface and mock 2022-03-27 23:05:27 +02:00