159 Commits

Author SHA1 Message Date
Julius Unverfehrt
fc8a9e15f8 Pull request #12: Diff font sizes on page
Merge in RR/cv-analysis from diff-font-sizes-on-page to master

Squashed commit of the following:

commit d1b32a3e8fadd45d38040e1ba96672ace240ae29
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 30 14:43:30 2022 +0200

    add tests for figure detection first iteration

commit c38a7701afaad513320f157fe7188b3f11a682ac
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 30 14:26:08 2022 +0200

    update text tests with new test cases

commit ccc0c1a177c7d69c9575ec0267a492c3eef008e3
Author: llocarnini <lillian.locarnini@iqser.com>
Date:   Wed Jun 29 23:09:24 2022 +0200

    added fixture for differently scaled text on page and parameter for different font style

commit 5f36a634caad2849e673de7d64abb5b6c3a6055f
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 17:03:52 2022 +0200

    add pdf2pdf annotate script for figure detection

commit 7438c170371e166e82ab19f9dfdf1bddd89b7bb3
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 16:24:52 2022 +0200

    optimize algorithm

commit 93bf8820f856d3815bab36b13c0df189c45d01e0
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 16:11:15 2022 +0200

    black

commit 59c639eec7d3f9da538b0ad6cd6215456c92eb58
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 16:10:39 2022 +0200

    add tests for figure detection pipeline

commit bada688d88231843e9d299d255d9c4e0d5ca9788
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 13:34:36 2022 +0200

    refactor tests

commit 614388a18b46d670527727c11f63e8174aed3736
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 13:34:14 2022 +0200

    introduce pipeline logic for figure detection

commit 7195f892d543294829aebe80e260b4395b89cb36
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 11:58:41 2022 +0200

    update reqs

commit 4408e7975853196c5e363dd2ddf62e15fe6f4944
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 11:56:16 2022 +0200

    add figure detection test

commit 5ff472c2d96238ca2bc1d2368d3d02e62db98713
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 11:56:09 2022 +0200

    add figure detection test

commit 66c1307e57c84789d64cb8e41d8e923ac98eebde
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 10:36:50 2022 +0200

    refactor draw boxes to work as intended on inverted image

commit 00a39050d051ae43b2a8f2c4efd6bfbd2609dead
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 28 10:36:11 2022 +0200

    refactor module structure

commit f8af01894c387468334a332e75f7dbf545a91f86
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jun 27 17:07:47 2022 +0200

    add: figure detection now agnostic to input image background color, refactor tests

commit 3bc63da783bced571d53b29b6d82648c9f93e886
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jun 27 14:31:15 2022 +0200

    add text removal tests

commit 6e794a7cee3fd7633aa5084839775877b0f8794c
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jun 27 12:12:27 2022 +0200

    figure detection tests WIP

commit f8b20d4c9845de6434142e3dab69ce467fbc7a75
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Jun 24 15:39:37 2022 +0200

    add tests for figure_detection WIP

commit f2a52a07a5e261962214dff40ba710c93993f6fb
Author: llocarnini <lillian.locarnini@iqser.com>
Date:   Fri Jun 24 14:28:44 2022 +0200

    added third test case "figure_and_text"

commit 8f45c88278cdcd32a121ea8269c8eca816bffd0b
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Fri Jun 24 13:25:17 2022 +0200

    add tests for figure_detection
master_17
2022-06-30 14:50:58 +02:00
Julius Unverfehrt
3ae4d81bb9 update dependencies master_16 2022-06-23 16:54:13 +02:00
Julius Unverfehrt
618880241c update dependencies 2022-06-23 16:46:26 +02:00
Julius Unverfehrt
956e673701 update dependencies 2022-06-23 16:37:46 +02:00
Julius Unverfehrt
a0abae195c update dependencies 2022-06-23 16:30:53 +02:00
Julius Unverfehrt
6d1ca4d6a3 Pull request #11: Integrate new pyinfra
Merge in RR/cv-analysis from integrate-new-pyinfra to master

Squashed commit of the following:

commit f27b7eb342838b7a235a062a04363dc417f859ad
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 14:24:03 2022 +0200

    refactor table test

commit 9f57cc7d72bffc106c852041666b2f11eb6eacc3
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 14:07:37 2022 +0200

    debug bamboo

commit 30911cc5a34559a8b622634ddf974a9860481d17
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 13:22:04 2022 +0200

    track test data with dvc

commit 501460c3c99482879ae585872bd67fd67693c47a
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 13:19:39 2022 +0200

    untrack test data

commit f65ade167802901a6f402618c062df0120279df3
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 12:02:43 2022 +0200

    refactor&extend tests

commit 8c9dc41ddeda5b0f630a267e328d1c09f69bdb04
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 09:36:26 2022 +0200

    debug bamboo

commit f0b38130502475cf9bfa8632d3b0eb3a84b32b7d
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 09:27:42 2022 +0200

    debug bamboo

commit 0f188b4eb5293cf2bc4024fb397f161ad3b867bd
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 09:23:38 2022 +0200

    update build script

commit 281e13d822790deefa3d1a4f2519d300d84cded3
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 09:21:31 2022 +0200

    refactor tests

commit e90e84cb3b13b2903611985cc9eb3b5b7bf0262e
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 08:54:29 2022 +0200

    parametrize analysis_fn for server logic, refactor tests

commit 20734bcd14fec489e80ea6900dba64de4b190398
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jun 23 08:53:16 2022 +0200

    outsource tests from module

commit cd2c41762df1a231f2ed1d43c3b71d2443530ffa
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 14:26:36 2022 +0200

    add tests for analyse server logic

commit 16497ac4ec8b0d7064f6d8dd887c189f0d955a1d
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 11:36:34 2022 +0200

    debug build script

commit 45688c1c6d9b738cce519edcdc044aae3b800cd1
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 11:33:13 2022 +0200

    debug build script

commit 0576140916c0cd9d290dd02225621e5360665d71
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 10:51:51 2022 +0200

    update tests

commit fcbecdde95cef46bce46545af65d040cc918447b
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 10:04:30 2022 +0200

    rename operations, update requirements

commit 7b40f6d643bb332fd7dd0867d64f17db16ede5bb
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jun 22 10:03:48 2022 +0200

    adjust deployment scripts

commit b66f937d2e0abc79e68bce6ee058bc0bd5cb86e5
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Jun 21 13:32:44 2022 +0200

    refactor server logic, use operation2function logic for pyinfra server

commit 5e7247f85cacaa6c0643796a98f13642db3e59e1
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jun 20 17:23:11 2022 +0200

    add server logic for pyinfra 2

commit eecb985fed76af9404bd99f0104508efe7d75e35
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Mon Jun 20 16:24:05 2022 +0200

    add server logic for pyinfra 2.0.0

... and 3 more commits
2022-06-23 14:45:08 +02:00
Julius Unverfehrt
0858a69364 update planspec to add pyinfra as a subrepo to bamboo, since it can't be updated on other branches 2022-06-22 12:43:58 +02:00
Isaac Riley
268329a57f add pyinfra_compat.py 2022-06-20 13:48:16 +02:00
Isaac Riley
b66a7f15e1 added pyinfra_compat file, usage: from cv_analysis.pyinfra_compat import analyze_byteslist; page_results = analyze_byteslist(img_bytes_list) 2022-06-14 09:09:00 +02:00
Isaac Riley
0d9d577187 reformat 2022-06-13 13:04:15 +02:00
Isaac Riley
c62ab08b98 ready for integration with pyinfra 2022-06-13 12:59:00 +02:00
Isaac Riley
01803d452a Merge branch 'fig-detection-scanned-pdfs' master_8 2022-05-24 17:07:09 +02:00
llocarnini
f5a75f3949 changes in export_example_pages.py as well as removing unused imports in table_parsing.py 2022-05-24 16:20:52 +02:00
llocarnini
e6a173053b Merge remote-tracking branch 'origin/fig-detection-scanned-pdfs' into fig-detection-scanned-pdfs 2022-05-24 09:33:18 +02:00
llocarnini
90dfacab21 deleted function for processing testfiles 2022-05-24 09:32:48 +02:00
llocarnini
c4c85ace6d added locations and changed names for test_files 2022-05-24 09:31:29 +02:00
Isaac Riley
a4626e635a removed problematic dvc file 2022-05-24 08:19:25 +02:00
Isaac Riley
3f33ab4f3d resolve a DVC conflict 2022-05-24 08:01:42 +02:00
llocarnini
179ad20165 minor changes, refactoring and testfiles added 2022-05-17 09:17:24 +02:00
llocarnini
0e30e97f80 Merge branch 'master' of ssh://git.iqser.com:2222/rr/cv-analysis into fig-detection-scanned-pdfs
 Conflicts:
	cv_analysis/figure_detection.py
	cv_analysis/layout_parsing.py
	cv_analysis/table_parsing.py
	scripts/annotate.py
2022-05-04 09:33:14 +02:00
Isaac Riley
21d1f087c8 fixed show parameter, for development only master_7 2022-04-27 11:27:38 +02:00
llocarnini
98ed9a4220 Merge remote-tracking branch 'origin/fig-detection-scanned-pdfs' into fig-detection-scanned-pdfs 2022-04-27 11:12:43 +02:00
llocarnini
2c39ffbcdd changed kernel and iteration for better text removal 2022-04-27 11:12:23 +02:00
Isaac Riley
81fe5139c2 fixed tests, passed (still need to extend tests) 2022-04-27 10:52:35 +02:00
Isaac Riley
41e5f55ea7 got changes to table parsing from other branch 2022-04-27 09:18:57 +02:00
Isaac Riley
b806c3c13d fix for table parsing when no outer line is present 2022-04-27 09:15:15 +02:00
Isaac Riley
4ac1cce0e8 reformatting 2022-04-26 16:01:57 +02:00
llocarnini
19fe6965fb added line in display so the visual logger doesn't open too many plots
changes to fig_detection_with_layout.py so tables are parsed as well

reuse of the external-contour addition in table_parsing.py
2022-04-26 11:19:27 +02:00
Isaac Riley
9327fb7231 fixed json format and refactored service functions 2022-04-22 11:22:16 +02:00
llocarnini
17f5b22443 Merge branch 'master' of ssh://git.iqser.com:2222/rr/cv-analysis into fig-detection-scanned-pdfs
 Conflicts:
	cv_analysis/figure_detection.py
	cv_analysis/layout_parsing.py
	cv_analysis/table_parsing.py
	scripts/annotate.py
2022-04-22 10:24:09 +02:00
llocarnini
11a2465789 few corrections for including smaller figures 2022-04-22 10:12:28 +02:00
Isaac Riley
88bb8dbddf added visual logger for development 2022-04-21 15:10:35 +02:00
Isaac Riley
0ea556a7e0 slightly refactored table parsing and deleted unneeded file 2022-04-21 09:17:12 +02:00
llocarnini
3669b6b341 fig_detection_with_layout.py: approach to label the content of a page through layout detection; table parsing for detected tables still needs to be added, and the overall code needs to be reviewed
layout_parsing.py added condition so fig_detection_with_layout.py works
table_parsing.py uncommented line for better table parsing
text.py changed kernel sizes
2022-04-20 09:43:30 +02:00
llocarnini
420e484896 the thresholds deciding whether a contour is likely a primary text structure could be tuned better, as text structures are not always removed; this leads to over-detection of figures 2022-04-12 16:48:29 +02:00
Isaac Riley
0b96980cc5 keyword 'show' to fix annotation script without causing problems for non-script usage 2022-04-11 09:44:47 +02:00
Isaac Riley
64258ed6e1 fixed hyphen/underscore confusion in cv_analysis master_5 2022-03-23 14:42:39 +01:00
Isaac Riley
80b0ca4ec5 tiny change to test build server 2022-03-23 14:35:00 +01:00
Isaac Riley
af898a37ac fixed naming errors 2022-03-23 13:55:30 +01:00
Isaac Riley
395456f196 Merge branch 'master' of ssh://git.iqser.com:2222/rr/vidocp 2022-03-23 13:51:26 +01:00
Isaac Riley
8730b34018 change name from vidocp to cv-analysis 2022-03-23 13:46:57 +01:00
Christoph  Schabert
604e9aa1b8 PlanSpec.java edited online with Bitbucket master_2 2022-03-23 13:45:28 +01:00
Isaac Riley
addacf9ed6 modify tests to not use poppler-utils, in order to pass sonar scan master_20 2022-03-23 09:48:47 +01:00
Isaac Riley
ad302aba79 try without test master_19 2022-03-22 14:14:47 +01:00
Isaac Riley
2726b85ef2 fixed build config minutia 2022-03-22 14:06:15 +01:00
Isaac Riley
d37fa6eaf7 remove option to ignore tests in sonar scan 2022-03-22 13:20:52 +01:00
Isaac Riley
dac6d47dc2 uncomment testing code in sonar script 2022-03-22 13:04:11 +01:00
Isaac Riley
7d22db92cf added table tests for use with sonar master_14 2022-03-22 12:54:10 +01:00
Isaac Riley
635fb84811 post-monitoring debug, especially of deskewing and skew check master_13 2022-03-17 21:51:15 +01:00
Isaac Riley
fa479adfb0 manually added tests from test branch to avoid major conflicts master_12 2022-03-15 12:17:09 +01:00