Compare commits

152 Commits

Author SHA1 Message Date
Jonathan Kössler
799fe331c3 Merge branch 'bugfix/RED-10722' into 'master'
RED-10722: fix dead letter queue

Closes RED-10722

See merge request redactmanager/cv-analysis-service!32
2025-01-16 09:28:41 +01:00
Jonathan Kössler
dfbfc50556 chore: update version 2025-01-15 13:40:01 +01:00
Jonathan Kössler
63fbd387a3 chore: update pyinfra to v3.4.2 2025-01-15 13:32:57 +01:00
Jonathan Kössler
41dbfc69d9 chore: update pyinfra to v3.4.2 2025-01-14 16:52:13 +01:00
Jonathan Kössler
b73e9b2ed9 Merge branch 'feature/RED-10441' into 'master'
RED-10441: fix abandoned queues

Closes RED-10441

See merge request redactmanager/cv-analysis-service!31
2024-11-13 17:27:22 +01:00
Jonathan Kössler
92692281ce chore: update pyinfra to v3.3.5 2024-11-13 17:22:24 +01:00
Jonathan Kössler
cb0c58d699 chore: update pyinfra to v3.3.4 2024-11-13 16:41:04 +01:00
Jonathan Kössler
eb96403fe2 chore: update pyinfra to v3.3.3 2024-11-13 14:53:11 +01:00
Jonathan Kössler
c8daf888c6 chore: update pyinfra to v3.3.2 2024-11-13 09:45:43 +01:00
Jonathan Kössler
eb921c365d Merge branch 'chore/update_pyinfra' into 'master'
RES-858: fix graceful shutdown

See merge request redactmanager/cv-analysis-service!30
2024-09-30 11:01:07 +02:00
Jonathan Kössler
7762f81a4a chore: update pyinfra to v3.2.11 2024-09-30 10:07:29 +02:00
Jonathan Kössler
e991cfe1bf Merge branch 'chore/update_pyinfra' into 'master'
RES-844 && RES-856: fix tracing & proto format

See merge request redactmanager/cv-analysis-service!29
2024-09-27 08:21:46 +02:00
Jonathan Kössler
35c5ee5831 fix: opentelemetry service name 2024-09-26 13:46:05 +02:00
Jonathan Kössler
e97f34391a chore: update pyinfra to v3.2.10 2024-09-26 13:44:48 +02:00
Francisco Schulz
1fa10721aa Merge branch 'RED-10017-investigate-crashing-py-services-when-upload-large-number-of-files' into 'master'
RED-10017 "Investigate crashing py services when upload large number of files"

See merge request redactmanager/cv-analysis-service!28
2024-09-23 18:55:08 +02:00
Francisco Schulz
7f0d0a48db RED-10017 "Investigate crashing py services when upload large number of files" 2024-09-23 18:55:08 +02:00
Francisco Schulz
333cd498b9 Merge branch 'RES-842-pyinfra-fix-rabbit-mq-handler-shuts-down-when-queues-not-available-yet' into 'master'
chore: update pyinfra version, increase pkg version

Closes RES-842

See merge request redactmanager/cv-analysis-service!27
2024-08-30 14:57:14 +02:00
francisco.schulz
9df8c8f936 chore: update service version 2024-08-30 08:25:00 -04:00
francisco.schulz
60adf0c381 chore: update pyinfra version 2024-08-30 08:15:34 -04:00
francisco.schulz
537f605a85 chore: remove renovate bot config 2024-08-29 11:48:15 -04:00
francisco.schulz
66987ab8e9 chore: update pyinfra version, increase pkg version 2024-08-29 11:29:07 -04:00
Jonathan Kössler
43570142c3 Merge branch 'feature/RES-840-add-client-connector-error' into 'master'
fix: add exception handling for ClientConnectorError

Closes RES-840

See merge request redactmanager/cv-analysis-service!26
2024-08-28 15:47:01 +02:00
Jonathan Kössler
d457f49001 chore: update pyinfra version 2024-08-28 14:47:29 +02:00
Jonathan Kössler
536928c032 Merge branch 'feature/RES-826-pyinfra-update' into 'master'
chore: bump pyinfra version

Closes RES-826

See merge request redactmanager/cv-analysis-service!25
2024-08-26 16:15:17 +02:00
Jonathan Kössler
dc6183490f chore: bump pyinfra version 2024-08-26 15:13:59 +02:00
Jonathan Kössler
bbc2d0c8bf chore: bump pyinfra version 2024-08-22 09:33:26 +02:00
Jonathan Kössler
3462faf8c7 Merge branch 'feature/RES-731-add-queues-per-tenant' into 'master'
RES-731: add queues per tenant

Closes RES-731

See merge request redactmanager/cv-analysis-service!24
2024-08-19 15:03:38 +02:00
Jonathan Kössler
b136cc9ff3 RES-731: add queues per tenant 2024-08-19 15:03:37 +02:00
Julius Unverfehrt
cf431df1cb Merge branch 'table_lines' into 'master'
Table lines

See merge request redactmanager/cv-analysis-service!23
2024-05-15 16:53:27 +02:00
iriley
23406004ed chore: remove debug code 2024-05-15 16:38:51 +02:00
iriley
b0467f2335 chore: debug coordinate remapping logic, esp mirroring 2024-05-15 16:33:58 +02:00
Julius Unverfehrt
e86214f6b7 Merge branch 'table_lines' into 'master'
fix: mapping of image coordinates to pdf coordinates (table inference)

See merge request redactmanager/cv-analysis-service!22
2024-05-15 13:02:24 +02:00
iriley
3b8d6eda04 fix: mapping of image coordinates to pdf coordinates (table inference) 2024-05-15 11:48:31 +02:00
Isaac Riley
3c9ddfcf0f Merge branch 'table_lines' into 'master'
fix: check nonzero list length in filter_fp_col_lines

See merge request redactmanager/cv-analysis-service!21
2024-05-13 09:39:20 +02:00
iriley
b854312b08 fix: check nonzero list length in filter_fp_col_lines 2024-05-13 08:40:31 +02:00
Isaac Riley
0f45a25bc8 Merge branch 'table_lines' into 'master'
fix: make envvar conditional unfailable

See merge request redactmanager/cv-analysis-service!20
2024-05-08 15:33:45 +02:00
iriley
8762363aa9 Merge branch 'table_lines' 2024-05-08 15:20:51 +02:00
iriley
72d26c4712 fix: make envvar conditional unfailable 2024-05-08 13:44:11 +02:00
Francisco Schulz
62fb637978 fix: generate docs when merging into master branch 2024-05-07 14:17:19 +02:00
Isaac Riley
802372a504 Update .gitlab-ci.yml file to build docs in build job rather than unit-tests 2024-05-07 12:41:46 +02:00
Isaac Riley
ceb1c00784 Update .gitlab-ci.yml file 2024-05-07 12:27:47 +02:00
Isaac Riley
f1f9e8d2bc Update .gitlab-ci.yml file 2024-05-07 10:59:21 +02:00
Isaac Riley
8fcb6f29fb Update .gitlab-ci.yml 2024-05-07 10:40:01 +02:00
Isaac Riley
79926b9990 Update .gitlab-ci.yml file 2024-05-07 09:07:00 +02:00
Isaac Riley
6d37622e95 Remove error in .gitlab-ci.yml file 2024-05-07 07:47:58 +02:00
Isaac Riley
6341512250 Define pages job in .gitlab-ci.yml file 2024-05-07 07:42:16 +02:00
Isaac Riley
713697b32d Update .gitlab-ci.yml file 2024-05-07 07:40:52 +02:00
Isaac Riley
b6e2540399 Update .gitlab-ci.yml 2024-05-07 07:31:41 +02:00
Isaac Riley
78b8f18865 Update .gitlab-ci.yaml 2024-05-07 07:25:53 +02:00
Isaac Riley
55795b9e58 Merge branch 'renovate/configure' into 'master'
Configure Renovate

See merge request redactmanager/cv-analysis-service!4
2024-05-07 06:50:34 +02:00
Isaac Riley
d2ec32b37c Merge branch 'update-ci' into 'master'
update CI to include GitLab Pages

See merge request redactmanager/cv-analysis-service!12
2024-05-07 06:33:45 +02:00
francisco.schulz
3202d95638 fix(build): faulty file reference 2024-05-06 15:59:55 -04:00
francisco.schulz
8c1e30c6df fix: unignore BOM 2024-05-06 15:49:19 -04:00
francisco.schulz
127fd7a399 feat: update dockerfile + add BOM 2024-05-06 15:45:29 -04:00
Francisco Schulz
560c73a5cb feat: use include statement for DVC 2024-05-06 17:47:39 +02:00
Francisco Schulz
d821b93af9 fix: dvc command
- URL not detected
2024-05-06 17:23:14 +02:00
Francisco Schulz
2f20ec4ecd feat: update CI and add DVC job 2024-05-06 17:10:56 +02:00
Isaac Riley
c2027df1c7 Merge branch 'table_lines' into 'master'
Add automatic documentation

See merge request redactmanager/cv-analysis-service!11
2024-05-06 15:50:57 +02:00
Isaac Riley
a966b49f89 Merge branch 'master' into 'table_lines'
# Conflicts:
#   .gitlab-ci.yml
2024-05-06 15:35:05 +02:00
iriley
8d81551da3 feat: add minimal working docs 2024-05-06 15:31:45 +02:00
Isaac Riley
626da20afd Update .gitlab-ci.yml 2024-04-29 15:53:08 +02:00
iriley
a55b34379a fix: remove before_script in gitlab-ci 2024-04-29 15:45:44 +02:00
Isaac Riley
2c5c3669a4 Update .gitlab-ci.yml file 2024-04-29 15:42:45 +02:00
Isaac Riley
55b8e209d3 Update .gitlab-ci.yml file 2024-04-29 15:41:24 +02:00
iriley
ab5096dd86 chore: fix readme problem with docs and modify gitlab ci to build docs 2024-04-29 15:39:33 +02:00
iriley
3a5fc32ec8 feat: add gitlab ci, makefile, sphinx docs 2024-04-29 14:57:36 +02:00
iriley
2c6232a1bf fix: remove typing errors (mypy) 2024-04-29 13:58:35 +02:00
iriley
b43033e6bf chore: repo housekeeping: adapt pre-commit and versioning script 2024-04-29 13:20:14 +02:00
iriley
5d13d8b3d0 chore: formatting and linting 2024-04-29 12:09:44 +02:00
Julius Unverfehrt
f213a16cd0 Merge branch 'table_lines' into 'master'
feat: table line inference (experimental for deployment)

See merge request redactmanager/cv-analysis-service!10
2024-04-26 15:14:51 +02:00
iriley
9e04693ee1 chore: update pyinfra 2024-04-26 15:07:33 +02:00
iriley
fee357872f fix: fix col line filter to handle empty list 2024-04-26 15:02:33 +02:00
iriley
12bb7ee25f fix: disable table inference test for now 2024-04-26 15:02:33 +02:00
iriley
f7a0db2651 feat: remove relextrema because not working; use pure numpy instead 2024-04-26 15:02:33 +02:00
Julius Unverfehrt
1d3b077ace chore: parse args in scripts, add colors for drawing lines 2024-04-26 15:02:33 +02:00
iriley
102617fe2f fix: coordinate remapping 2024-04-26 15:02:33 +02:00
Julius Unverfehrt
0f0fe516d0 feat: work in progress: add annotation logic 2024-04-26 15:02:33 +02:00
Julius Unverfehrt
8de913840f feat: work in progress: add Portable Document Format (PDF) coordinate conversion 2024-04-26 15:02:33 +02:00
Julius Unverfehrt
aefb73bf28 chore: update the README and adapt a test 2024-04-26 15:02:33 +02:00
Julius Unverfehrt
20f8dcd336 feat: adapt interface for production 2024-04-26 15:02:33 +02:00
iriley
681e59d24e chore: add test file to dvc 2024-04-26 15:02:33 +02:00
iriley
abd350cc42 fix: import error and fitz api correction in extract_images_from_pdf; table inference test 2024-04-26 15:02:33 +02:00
iriley
e264c948cf feat: adapt pipeline for new table inference + pyinfra 2024-04-26 15:02:27 +02:00
iriley
ddd680bb4c fix: change None to list when HoughLinesP returns None 2024-04-26 15:00:06 +02:00
iriley
ebdf3cefbf rebase: feat: use line-mapping logic 2024-04-26 14:58:10 +02:00
Julius Unverfehrt
ffb10876f5 fix: RED-8978: update pyinfra 2024-04-16 16:40:05 +02:00
Isaac Riley
95abb5d5fb Merge branch 'table_lines' into 'master'
Table lines

See merge request redactmanager/cv-analysis-service!9
2024-03-08 10:42:54 +01:00
Isaac Riley
482673f927 Table lines 2024-03-08 10:42:54 +01:00
Julius Unverfehrt
a52226d8fe chore(logger): support spring log levels 2024-02-28 16:37:18 +01:00
Julius Unverfehrt
fa959332cb chore(build): fix broken build logic
Standardizes project structure so the dockerbuild works
2024-02-08 15:14:43 +01:00
Julius Unverfehrt
688217f3cd Merge branch 'RES-535-update-pyinfra' into 'master'
feat(opentel,dynaconf): adapt new pyinfra

Closes RES-535

See merge request redactmanager/cv-analysis-service!8
2024-02-08 12:33:05 +01:00
Julius Unverfehrt
183aad4bf8 chore: major service version increment 2024-02-08 11:39:04 +01:00
Julius Unverfehrt
0a11471191 feat(opentel,dynaconf): adapt new pyinfra
This commit also disables a broken test that cannot be fixed. There are
also many scripts that did not work anyway (and are not needed in my
eyes) that were not updated. The scripts that are needed to run the
service processing locally still work.
2024-02-08 11:19:33 +01:00
Julius Unverfehrt
55fb4e06f2 chore(dev): add ipython dev dependency 2024-02-08 11:17:55 +01:00
Julius Unverfehrt
306c9b67cf feat(opentel): upgrade dependencies 2024-02-08 10:23:18 +01:00
Francisco Schulz
60b1c15f82 Merge branch 'RED-7958-logging-issues-of-python-services' into 'master'
RED-7958 "Logging issues of python services"

See merge request redactmanager/cv-analysis-service!6
2023-12-12 11:45:51 +01:00
Francisco Schulz
940d7b9277 disable integration tests 2023-12-12 11:21:42 +01:00
Francisco Schulz
d1c2610bd5 use png file from ./test directory for integration tests 2023-12-11 13:56:03 +01:00
Francisco Schulz
50831036f5 use image-classification-service integration test file (729.pdf) 2023-12-11 13:24:36 +01:00
francisco.schulz
726aae03a6 update knutils 2023-12-11 10:58:50 +01:00
Francisco Schulz
423842a4c9 use default integration test branch 2023-12-07 13:23:18 +01:00
francisco.schulz
6426c14fb7 use py3.10 branch for integration tests 2023-12-04 10:26:23 +01:00
francisco.schulz
6070736df9 define INTEGRATION_TEST_FILE 2023-11-28 14:28:48 +01:00
francisco.schulz
295a5dea77 use python 3.10 2023-11-28 11:03:03 +01:00
francisco.schulz
515cd2309b update to new CI template 2023-11-28 10:56:21 +01:00
francisco.schulz
be65ea4ff5 update dependencies 2023-11-28 10:55:19 +01:00
francisco.schulz
85885f929b remove psw 2023-11-28 10:55:11 +01:00
francisco.schulz
fc7d4ee829 ignore DS_Store 2023-11-28 10:55:01 +01:00
Julius Unverfehrt
83a922deed Merge branch 'feature/RED-6685-support-absolute-paths' into 'master'
Upgrade pyinfra (absolute FP support)

Closes RED-6685

See merge request redactmanager/cv-analysis-service!5
2023-08-23 15:59:57 +02:00
Julius Unverfehrt
efcd661948 Upgrade pyinfra (absolute FP support)
- Update pyinfra with absolute file path support (still supports
  dossierID fileID format)
- Update CI, use new template
2023-08-23 15:49:42 +02:00
Kevin Tumma
4c4ed8ba1e Add renovate.json 2023-07-14 09:03:30 +00:00
Julius Unverfehrt
415d2b135b Merge branch 'RES-196-red-hotfix-persistent-service-address' into 'master'
Resolve RES-196 "Red hotfix persistent service address"

Closes RES-196

See merge request redactmanager/cv-analysis-service!3
2023-06-26 12:56:41 +02:00
francisco.schulz
5538f12d3f increment version 2023-06-21 15:36:20 +02:00
francisco.schulz
a08799d7b8 add docker scripts 2023-06-21 15:33:42 +02:00
francisco.schulz
db55d4ccf9 update dependencies, pyinfra@1.5.9 2023-06-21 15:33:29 +02:00
francisco.schulz
76940a28ba add k8s startup probe script 2023-06-21 14:10:38 +02:00
francisco.schulz
5331cb7c5b copy scripts folder 2023-06-21 14:09:24 +02:00
francisco.schulz
d44ed1c596 update dependencies 2023-06-21 14:09:10 +02:00
francisco.schulz
384d4b6f73 reference CI template tag 2023-06-19 12:22:44 +02:00
francisco.schulz
861c3e347e test new CI 2023-06-19 12:14:21 +02:00
francisco.schulz
9c753fede3 update CI 2023-06-19 11:49:28 +02:00
francisco.schulz
fa93255ba1 increment version 2023-06-19 11:35:28 +02:00
francisco.schulz
f743bf6171 add example comment 2023-06-19 11:25:45 +02:00
francisco.schulz
4ee343f6df update dependencies & increment version 2023-06-19 11:25:35 +02:00
francisco.schulz
335da13cb5 copy cv_analysis folder, not only files 2023-06-19 11:25:07 +02:00
Francisco Schulz
441814f201 Merge branch 'RES-142-migrate-red-cv-analysis-service' into 'master'
Resolve RES-142 "Migrate red cv analysis service"

Closes RES-142

See merge request redactmanager/cv-analysis-service!2
2023-06-07 16:46:58 +02:00
francisco.schulz
f9a9a86bc7 use patched CI 2023-06-07 16:45:28 +02:00
francisco.schulz
d98f38607f increase patch version 2023-06-07 15:56:02 +02:00
francisco.schulz
63d2f891e4 correct version number 1.19.0 2023-06-07 15:41:28 +02:00
francisco.schulz
cb974b19b6 use CI template ref:0.2.0 2023-06-07 15:36:26 +02:00
Francisco Schulz
019f0da11a Update .gitlab-ci.yml file 2023-06-07 15:18:40 +02:00
Francisco Schulz
9adc0e2ced echo DVC vars 2023-06-07 15:09:59 +02:00
Francisco Schulz
11515f6f71 Merge branch 'RES-142-migrate-red-cv-analysis-service-patch-0d3b' into 'RES-142-migrate-red-cv-analysis-service'
add config without connection_string

See merge request redactmanager/cv-analysis-service!1
2023-06-07 13:08:30 +02:00
Francisco Schulz
ee5f960a3f add config without connection_string 2023-06-07 13:07:43 +02:00
francisco.schulz
5b991d3a69 remove DVC config from git 2023-06-07 12:55:50 +02:00
francisco.schulz
6033fec952 remove git submodules 2023-06-07 12:53:37 +02:00
francisco.schulz
c64f02696d add CI 2023-06-07 12:53:05 +02:00
francisco.schulz
79163c33cf update dockerfile to work with poetry 2023-06-07 12:52:52 +02:00
francisco.schulz
44dd613715 add dev setup script 2023-06-07 12:52:27 +02:00
francisco.schulz
3654ab3c8d update 2023-06-07 12:52:11 +02:00
francisco.schulz
bb6ba8e0e9 update dependencies 2023-06-07 12:51:59 +02:00
francisco.schulz
6323884683 remove old CI files 2023-06-07 12:51:45 +02:00
Julius Unverfehrt
def2d2d108 Pull request #41: RED-6273 multi tenant storage
Merge in RR/cv-analysis from RED-6273-multi-tenant-storage to master

Squashed commit of the following:

commit ed07dc26b0323bf1f0a5b336e4075c8cc3d20d29
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 18:03:16 2023 +0200

    update pyinfra version with removed falsy dependencies from pyinfra

commit 202ce84419465ddcbe3e263e58917cc92b2639a6
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 17:27:44 2023 +0200

    update pyinfra for bugfix

commit 87122ffb965c016383bfd49e2eaaa6ab3b5d7101
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Tue Mar 28 15:50:29 2023 +0200

    Update pyinfra for multi-tenancy support

    Update serve script with PayloadProcessor from pyinfra
2023-03-28 18:12:01 +02:00
Julius Unverfehrt
cfbd2e287a update pyinfra with fixed prometheus port 2023-03-21 16:07:58 +01:00
Julius Unverfehrt
436824c926 Pull request #40: add table processing time monitoring
Merge in RR/cv-analysis from RED-6205-add-prometheus-monitoring to master

* commit '1a4ae6735d4ad3112fc1c48496e216f6d69ff675':
  add table processing time monitoring
2023-03-17 07:45:10 +01:00
Julius Unverfehrt
1a4ae6735d add table processing time monitoring 2023-03-16 17:33:49 +01:00
Julius Unverfehrt
08c0096c07 Pull request #38: upgrade references
Merge in RR/cv-analysis from RED-6118-multi-tenancy-patch to master

* commit '233c6facfd75771885ae87c79b57bcb53c71d6e7':
  upgrade references
2023-02-16 16:47:40 +01:00
Julius Unverfehrt
233c6facfd upgrade references 2023-02-16 16:45:19 +01:00
Francisco Schulz
4ce6c9bdc9 Pull request #37: RED-5277 fix heartbeat issue
Merge in RR/cv-analysis from RED-5277-fix-heartbeat-issue to master

* commit '5bb9282da6aa1d75182c2172c601bed534099b0f':
  use python 3.8 in build
  update serve.py to work with new pyinfra version
  update reference to pyinfra
2023-02-16 11:06:06 +01:00
Francisco Schulz
5bb9282da6 use python 3.8 in build 2023-02-16 11:00:06 +01:00
Francisco Schulz
eef371e2a8 update serve.py to work with new pyinfra version 2023-02-16 10:47:13 +01:00
Francisco Schulz
ad45e2c1da update reference to pyinfra 2023-02-16 10:46:55 +01:00
298 changed files with 160222 additions and 4131 deletions

View File

@ -10,7 +10,7 @@ omit =
*/build_venv/*
*/incl/*
source =
cv_analysis
cv_analysis
relative_files = True
data_file = .coverage
@ -46,4 +46,4 @@ ignore_errors = True
directory = reports
[xml]
output = reports/coverage.xml
output = reports/coverage.xml

View File

@ -97,4 +97,4 @@ target/
*.swp
*/*.swp
*/*/*.swp
*/*/*/*.swp
*/*/*/*.swp

View File

@ -1,7 +1,10 @@
[core]
remote = vector
autostage = true
remote = azure_remote
['remote "vector"']
url = ssh://vector.iqser.com/research/nonml_cv_doc_parsing/
port = 22
['remote "azure_remote"']
url = azure://cv-sa-dvc/
connection_string = "DefaultEndpointsProtocol=https;AccountName=cvsacricket;AccountKey=KOuTAQ6Mp00ePTT5ObYmgaHlxwS1qukY4QU4Kuk7gy/vldneA+ZiKjaOpEFtqKA6Mtym2gQz8THy+ASts/Y1Bw==;EndpointSuffix=core.windows.net"
['remote "local"']
url = ../dvc_local_remote

77
.gitignore vendored
View File

@ -1,27 +1,52 @@
# Environments
.env
.venv
env/
venv/
.pytest*
.python-version
.DS_Store
# Project folders
scratch/
*.vscode/
.idea
*_app
*pytest_cache
*joblib
*tmp
*profiling
*logs
*docker
*drivers
*bamboo-specs/target
# Python specific files
__pycache__/
*.egg-info/
deskew_model/
build_venv/
/pdfs/
/results/
/pdfs/
/env/
/.idea/
/.idea/.gitignore
/.idea/misc.xml
/.idea/inspectionProfiles/profiles_settings.xml
/.idea/table_parsing.iml
/.idea/vcs.xml
/results/
/table_parsing.egg-info
/target/
/tests/
/cv_analysis.egg-info/dependency_links.txt
/cv_analysis.egg-info/PKG-INFO
/cv_analysis.egg-info/SOURCES.txt
/cv_analysis.egg-info/top_level.txt
/.vscode/
/cv_analysis/test/test_data/example_pages.json
/data/metadata_testing_files.csv
.coverage
/data/
*.py[cod]
*.ipynb
*.ipynb_checkpoints
# file extensions
*.log
*.csv
*.json
*.pkl
*.profile
*.cbm
# temp files
*.swp
*~
*.un~
# keep files
!notebooks/*.ipynb
# keep folders
!secrets
!data/*
!drivers
# unignore files
!bom.*

30
.gitlab-ci.backup.yml Normal file
View File

@ -0,0 +1,30 @@
include:
  - project: "Gitlab/gitlab"
    ref: 0.3.0
    file: "/ci-templates/research/dvc-versioning-build-release.gitlab-ci.yml"

variables:
  NEXUS_PROJECT_DIR: red
  IMAGENAME: "${CI_PROJECT_NAME}"

#################################
# temp. disable integration tests, b/c they don't cover the CV analysis case yet
trigger integration tests:
  rules:
    - when: never

release build:
  stage: release
  needs:
    - job: set custom version
      artifacts: true
      optional: true
    - job: calculate patch version
      artifacts: true
      optional: true
    - job: calculate minor version
      artifacts: true
      optional: true
    - job: build docker nexus
      artifacts: true
#################################

35
.gitlab-ci.yml Normal file
View File

@ -0,0 +1,35 @@
# CI for services, check gitlab repo for python package CI
include:
  - project: "Gitlab/gitlab"
    ref: main
    file: "/ci-templates/research/versioning-build-test-release.gitlab-ci.yml"
  - project: "Gitlab/gitlab"
    ref: main
    file: "/ci-templates/research/docs.gitlab-ci.yml"

# set project variables here
variables:
  NEXUS_PROJECT_DIR: red # subfolder in Nexus docker-gin where your container will be stored
  IMAGENAME: $CI_PROJECT_NAME # if the project URL is gitlab.example.com/group-name/project-1, CI_PROJECT_NAME is project-1

pages:
  only:
    - master # KEEP THIS, necessary because `master` branch and not `main` branch

###################
# INTEGRATION TESTS
trigger-integration-tests:
  extends: .integration-tests
  # ADD THE MODEL BUILD WHICH SHOULD TRIGGER THE INTEGRATION TESTS
  # needs:
  #   - job: docker-build::model_name
  #     artifacts: true
  rules:
    - when: never

#########
# RELEASE
release:
  extends: .release
  needs:
    - !reference [.needs-versioning, needs] # leave this line as is

View File

@ -0,0 +1,61 @@
import subprocess
import sys
from pathlib import Path

import semver
from loguru import logger
from semver.version import Version

logger.remove()
logger.add(sys.stdout, level="INFO")


def bashcmd(cmds: list) -> str:
    try:
        logger.debug(f"running: {' '.join(cmds)}")
        return subprocess.run(cmds, check=True, capture_output=True, text=True).stdout.strip("\n")
    except:
        logger.warning(f"Error executing the following bash command: {' '.join(cmds)}.")
        raise


def get_highest_existing_git_version_tag() -> str:
    """Get highest versions from git tags depending on bump level"""
    try:
        git_tags = bashcmd(["git", "tag", "-l"]).split()
        semver_compat_tags = list(filter(Version.is_valid, git_tags))
        highest_git_version_tag = max(semver_compat_tags, key=semver.version.Version.parse)
        logger.info(f"Highest git version tag: {highest_git_version_tag}")
        return highest_git_version_tag
    except:
        logger.warning("Error getting git version tags")
        raise


def auto_bump_version() -> bool:
    active = Path(".autoversion").is_file()
    logger.debug(f"Automated version bump is set to '{active}'")
    return active


def main() -> None:
    poetry_project_version = bashcmd(["poetry", "version", "-s"])
    logger.info(f"Poetry project version: {poetry_project_version}")

    highest_git_version_tag = get_highest_existing_git_version_tag()

    comparison_result = semver.compare(poetry_project_version, highest_git_version_tag)
    if comparison_result in (-1, 0):
        logger.warning("Poetry version must be greater than git tag version.")
        if auto_bump_version():
            logger.info(bashcmd(["poetry", "version", highest_git_version_tag]))
            sys.exit(0)
        sys.exit(1)
    else:
        logger.info(f"All good: {poetry_project_version} > {highest_git_version_tag}")


if __name__ == "__main__":
    main()

72
.pre-commit-config.yaml Normal file
View File

@ -0,0 +1,72 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
exclude: ^(docs/|notebooks/|data/|src/configs/|tests/|.hooks/|bom.json)
default_language_version:
  python: python3.10
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
        args: [--unsafe] # needed for .gitlab-ci.yml
      - id: check-toml
      - id: detect-private-key
      - id: check-added-large-files
        args: ['--maxkb=10000']
      - id: check-case-conflict
      - id: mixed-line-ending

  # - repo: https://github.com/pre-commit/mirrors-pylint
  #   rev: v3.0.0a5
  #   hooks:
  #     - id: pylint
  #       args:
  #         - --disable=C0111,R0903,E0401
  #         - --max-line-length=120

  - repo: https://github.com/pre-commit/mirrors-isort
    rev: v5.10.1
    hooks:
      - id: isort
        args:
          - --profile black

  - repo: https://github.com/psf/black
    rev: 24.10.0
    hooks:
      - id: black
        # exclude: ^(docs/|notebooks/|data/|src/secrets/)
        args:
          - --line-length=120

  - repo: https://github.com/compilerla/conventional-pre-commit
    rev: v4.0.0
    hooks:
      - id: conventional-pre-commit
        pass_filenames: false
        stages: [commit-msg]
        # args: [] # optional: list of Conventional Commits types to allow e.g. [feat, fix, ci, chore, test]

  - repo: local
    hooks:
      - id: version-checker
        name: version-checker
        entry: python .hooks/poetry_version_check.py
        language: python
        always_run: true
        additional_dependencies:
          - "semver"
          - "loguru"

  # - repo: local
  #   hooks:
  #     - id: docker-build-test
  #       name: testing docker build
  #       entry: ./scripts/ops/docker-compose-build-run.sh
  #       language: script
  #       # always_run: true
  #       pass_filenames: false
  #       args: []
  #       stages: [pre-commit]

View File

@ -1,30 +1,78 @@
FROM python:3.10
###############
# BUILDER IMAGE
FROM python:3.10-slim as builder
RUN python -m venv /app/venv
ENV PATH="/app/venv/bin:$PATH"
ARG GITLAB_USER
ARG GITLAB_ACCESS_TOKEN
RUN python -m pip install --upgrade pip
ARG PYPI_REGISTRY_RESEARCH=https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi
ARG POETRY_SOURCE_REF_RESEARCH=gitlab-research
WORKDIR /app/service
ARG PYPI_REGISTRY_RED=https://gitlab.knecon.com/api/v4/groups/12/-/packages/pypi
ARG POETRY_SOURCE_REF_RED=gitlab-red
COPY ./requirements.txt ./requirements.txt
RUN python3 -m pip install -r requirements.txt
ARG PYPI_REGISTRY_FFORESIGHT=https://gitlab.knecon.com/api/v4/groups/269/-/packages/pypi
ARG POETRY_SOURCE_REF_FFORESIGHT=gitlab-fforesight
COPY ./incl/pyinfra/requirements.txt ./incl/pyinfra/requirements.txt
RUN python -m pip install -r incl/pyinfra/requirements.txt
ARG VERSION=dev
COPY ./incl/pdf2image/requirements.txt ./incl/pdf2image/requirements.txt
RUN python -m pip install -r incl/pdf2image/requirements.txt
LABEL maintainer="Research <research@knecon.com>"
LABEL version="${VERSION}"
COPY ./incl ./incl
WORKDIR /app
RUN python3 -m pip install -e incl/pyinfra
RUN python3 -m pip install -e incl/pdf2image
###########
# ENV SETUP
ENV PYTHONDONTWRITEBYTECODE=true
ENV PYTHONUNBUFFERED=true
ENV POETRY_HOME=/opt/poetry
ENV PATH="$POETRY_HOME/bin:$PATH"
RUN apt-get update && \
apt-get install -y curl git bash build-essential libffi-dev libssl-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN curl -sSL https://install.python-poetry.org | python3 -
RUN poetry --version
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create true && \
poetry config virtualenvs.in-project true && \
poetry config installer.max-workers 10 && \
poetry config repositories.${POETRY_SOURCE_REF_RESEARCH} ${PYPI_REGISTRY_RESEARCH} && \
poetry config http-basic.${POETRY_SOURCE_REF_RESEARCH} ${GITLAB_USER} ${GITLAB_ACCESS_TOKEN} && \
poetry config repositories.${POETRY_SOURCE_REF_RED} ${PYPI_REGISTRY_RED} && \
poetry config http-basic.${POETRY_SOURCE_REF_RED} ${GITLAB_USER} ${GITLAB_ACCESS_TOKEN} && \
poetry config repositories.${POETRY_SOURCE_REF_FFORESIGHT} ${PYPI_REGISTRY_FFORESIGHT} && \
poetry config http-basic.${POETRY_SOURCE_REF_FFORESIGHT} ${GITLAB_USER} ${GITLAB_ACCESS_TOKEN} && \
poetry install --without=dev,docs,test -vv --no-interaction --no-root
##################
# COPY SOURCE CODE
COPY ./config ./config
COPY ./src ./src
COPY ./cv_analysis ./cv_analysis
COPY ./setup.py ./setup.py
RUN python3 -m pip install -e .
###############
# WORKING IMAGE
FROM python:3.10-slim
CMD ["python3", "-u", "src/serve.py"]
# COPY BILL OF MATERIALS (BOM)
COPY bom.json /bom.json
# COPY SOURCE CODE FROM BUILDER IMAGE
COPY --from=builder /app /app
WORKDIR /app
ENV PATH="/app/.venv/bin:$PATH"
############
# NETWORKING
EXPOSE 5000
EXPOSE 8080
################
# LAUNCH COMMAND
CMD [ "python", "src/serve.py"]

94
Makefile Normal file
View File

@ -0,0 +1,94 @@
.PHONY: \
	poetry in-project-venv dev-env use-env install install-dev tests \
	update-version sync-version-with-git \
	docker docker-build-run docker-build docker-run \
	docker-rm docker-rm-container docker-rm-image \
	pre-commit get-licenses prep-commit \
	docs sphinx_html sphinx_apidoc bom

.DEFAULT_GOAL := run

export DOCKER=docker
export DOCKERFILE=Dockerfile
export IMAGE_NAME=cv_analysis_service-image
export CONTAINER_NAME=cv_analysis_service-container
export HOST_PORT=9999
export CONTAINER_PORT=9999
export PYTHON_VERSION=python3.10

# all commands should be executed in the root dir of the project,
# specific environments should be deactivated

poetry: in-project-venv use-env dev-env

in-project-venv:
	poetry config virtualenvs.in-project true

use-env:
	poetry env use ${PYTHON_VERSION}

dev-env:
	poetry install --with dev && poetry update

install:
	poetry add $(pkg)

install-dev:
	poetry add --dev $(pkg)

requirements:
	poetry export --without-hashes --output requirements.txt

update-version:
	poetry version prerelease

sync-version-with-git:
	git pull -p && poetry version $(git rev-list --tags --max-count=1 | git describe --tags --abbrev=0)

bom:
	cyclonedx-py poetry -o bom.json

docker: docker-rm docker-build-run

docker-build-run: docker-build docker-run

docker-build:
	$(DOCKER) build \
		--no-cache --progress=plain \
		-t $(IMAGE_NAME) -f $(DOCKERFILE) \
		--build-arg USERNAME=${USERNAME} \
		--build-arg TOKEN=${GITLAB_TOKEN} \
		.

docker-run:
	$(DOCKER) run -it --rm -p $(HOST_PORT):$(CONTAINER_PORT)/tcp --name $(CONTAINER_NAME) $(IMAGE_NAME)

docker-rm: docker-rm-container docker-rm-image

docker-rm-container:
	-$(DOCKER) rm $(CONTAINER_NAME)

docker-rm-image:
	-$(DOCKER) image rm $(IMAGE_NAME)

tests:
	poetry run pytest ./tests

# prerequisites belong on the target line; the licenses target is named get-licenses
prep-commit: docs get-licenses sync-version-with-git update-version pre-commit

pre-commit:
	pre-commit run --all-files

get-licenses:
	pip-licenses --format=json --order=license --with-urls > pkg-licenses.json

docs: sphinx_apidoc sphinx_html

sphinx_html:
	poetry run sphinx-build -b html docs/source/ docs/build/html -E -a

sphinx_apidoc:
	cp ./README.md ./docs/source/README.md && cp -r ./data ./docs/source/data/ && poetry run sphinx-apidoc ./src -o ./docs/source/modules --no-toc --module-first --follow-links --separate --force

bom:
	cyclonedx-py poetry -o bom.json

View File

@ -1,8 +1,60 @@
# cv-analysis &mdash; Visual (CV-Based) Document Parsing
# cv-analysis - Visual (CV-Based) Document Parsing
parse_pdf()
This repository implements computer vision based approaches for detecting and parsing visual features such as tables or
previous redactions in documents.
## API
Input message:
```json
{
  "targetFilePath": {
    "pdf": "absolute file path",
    "vlp_output": "absolute file path"
  },
  "responseFilePath": "absolute file path",
  "operation": "table_image_inference"
}
```
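As a rough illustration of the input message, the sketch below assembles the same fields in Python and serializes them to JSON. The helper name `build_request` and the example paths are hypothetical and are not part of this repository; how the message is actually published to the service is not shown here.

```python
import json


def build_request(pdf_path: str, vlp_output_path: str, response_path: str) -> dict:
    """Assemble an input message with the fields listed in the README (hypothetical helper)."""
    return {
        "targetFilePath": {
            "pdf": pdf_path,
            "vlp_output": vlp_output_path,
        },
        "responseFilePath": response_path,
        "operation": "table_image_inference",
    }


# Example paths are placeholders, chosen only for illustration.
message = build_request(
    "/example/dossier/file.pdf",
    "/example/dossier/file.vlp.json",
    "/example/dossier/file.tables.json",
)
print(json.dumps(message, indent=2))
```

Keeping the message construction in one small function makes it easy to validate the three paths before anything is sent.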
Response is uploaded to the storage as specified in the `responseFilePath` field. The structure is as follows:
```json
{
  ...,
  "data": [
    {
      "pageNum": 0,
      "bbox": {
        "x1": 55.3407,
        "y1": 247.0246,
        "x2": 558.5602,
        "y2": 598.0585
      },
      "uuid": "2b10c1a2-393c-4fca-b9e3-0ad5b774ac84",
      "label": "table",
      "tableLines": [
        {
          "x1": 0,
          "y1": 16,
          "x2": 1399,
          "y2": 16
        },
        ...
      ],
      "imageInfo": {
        "height": 693,
        "width": 1414
      }
    },
    ...
  ]
}
```
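A consumer of this response might want to separate the detected table lines by orientation. The following is a minimal sketch under assumptions: it uses a hand-written dictionary that mirrors one element of `data`, the second line in `tableLines` is a hypothetical value added for illustration, and `split_table_lines` is not a function of this repository. It also assumes that lines are axis-aligned, which the single sample line suggests but the README does not state.

```python
from typing import Dict, List

# One entry shaped like an element of "data"; the vertical line is hypothetical.
entry = {
    "pageNum": 0,
    "bbox": {"x1": 55.3407, "y1": 247.0246, "x2": 558.5602, "y2": 598.0585},
    "label": "table",
    "tableLines": [
        {"x1": 0, "y1": 16, "x2": 1399, "y2": 16},
        {"x1": 700, "y1": 0, "x2": 700, "y2": 692},  # hypothetical vertical line
    ],
    "imageInfo": {"height": 693, "width": 1414},
}


def split_table_lines(table: dict) -> Dict[str, List[dict]]:
    """Group table lines into horizontal, vertical, and other (hypothetical helper)."""
    groups: Dict[str, List[dict]] = {"horizontal": [], "vertical": [], "other": []}
    for line in table["tableLines"]:
        if line["y1"] == line["y2"]:
            groups["horizontal"].append(line)
        elif line["x1"] == line["x2"]:
            groups["vertical"].append(line)
        else:
            groups["other"].append(line)
    return groups


groups = split_table_lines(entry)
print({name: len(lines) for name, lines in groups.items()})
```

Lines that are neither horizontal nor vertical end up in `other`, so unexpected input is kept visible instead of being dropped.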
## Installation
```bash
@ -31,10 +83,9 @@ The below snippet shows how to find the outlines of previous redactions.
```python
from cv_analysis.redaction_detection import find_redactions
import pdf2image
import pdf2image
import numpy as np
pdf_path = ...
page_index = ...

View File

@ -1,40 +0,0 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs-parent</artifactId>
<version>7.1.2</version>
<relativePath/>
</parent>
<artifactId>bamboo-specs</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<sonar.skip>true</sonar.skip>
</properties>
<dependencies>
<dependency>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs-api</artifactId>
</dependency>
<dependency>
<groupId>com.atlassian.bamboo</groupId>
<artifactId>bamboo-specs</artifactId>
</dependency>
<!-- Test dependencies -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<!-- run 'mvn test' to perform offline validation of the plan -->
<!-- run 'mvn -Ppublish-specs' to upload the plan to your Bamboo server -->
</project>


@ -1,178 +0,0 @@
package buildjob;
import static com.atlassian.bamboo.specs.builders.task.TestParserTask.createJUnitParserTask;
import java.time.LocalTime;
import com.atlassian.bamboo.specs.api.BambooSpec;
import com.atlassian.bamboo.specs.api.builders.BambooKey;
import com.atlassian.bamboo.specs.api.builders.docker.DockerConfiguration;
import com.atlassian.bamboo.specs.api.builders.permission.PermissionType;
import com.atlassian.bamboo.specs.api.builders.permission.Permissions;
import com.atlassian.bamboo.specs.api.builders.permission.PlanPermissions;
import com.atlassian.bamboo.specs.api.builders.plan.Job;
import com.atlassian.bamboo.specs.api.builders.plan.Plan;
import com.atlassian.bamboo.specs.api.builders.plan.PlanIdentifier;
import com.atlassian.bamboo.specs.api.builders.plan.Stage;
import com.atlassian.bamboo.specs.api.builders.plan.branches.BranchCleanup;
import com.atlassian.bamboo.specs.api.builders.plan.branches.PlanBranchManagement;
import com.atlassian.bamboo.specs.api.builders.project.Project;
import com.atlassian.bamboo.specs.builders.task.CheckoutItem;
import com.atlassian.bamboo.specs.builders.task.InjectVariablesTask;
import com.atlassian.bamboo.specs.builders.task.ScriptTask;
import com.atlassian.bamboo.specs.builders.task.VcsCheckoutTask;
import com.atlassian.bamboo.specs.builders.task.CleanWorkingDirectoryTask;
import com.atlassian.bamboo.specs.builders.task.VcsTagTask;
import com.atlassian.bamboo.specs.builders.trigger.BitbucketServerTrigger;
import com.atlassian.bamboo.specs.builders.trigger.ScheduledTrigger;
import com.atlassian.bamboo.specs.model.task.InjectVariablesScope;
import com.atlassian.bamboo.specs.api.builders.Variable;
import com.atlassian.bamboo.specs.util.BambooServer;
import com.atlassian.bamboo.specs.builders.task.ScriptTask;
import com.atlassian.bamboo.specs.model.task.ScriptTaskProperties.Location;
/**
* Plan configuration for Bamboo.
* Learn more on: <a href="https://confluence.atlassian.com/display/BAMBOO/Bamboo+Specs">https://confluence.atlassian.com/display/BAMBOO/Bamboo+Specs</a>
*/
@BambooSpec
public class PlanSpec {
private static final String SERVICE_NAME = "cv-analysis";
private static final String SERVICE_KEY = SERVICE_NAME.toUpperCase().replaceAll("-","").replaceAll("_","");
/**
* Run main to publish plan on Bamboo
*/
public static void main(final String[] args) throws Exception {
//By default credentials are read from the '.credentials' file.
BambooServer bambooServer = new BambooServer("http://localhost:8085");
Plan plan = new PlanSpec().createDockerBuildPlan();
bambooServer.publish(plan);
PlanPermissions planPermission = new PlanSpec().createPlanPermission(plan.getIdentifier());
bambooServer.publish(planPermission);
Plan secPlan = new PlanSpec().createSecBuild();
bambooServer.publish(secPlan);
PlanPermissions secPlanPermission = new PlanSpec().createPlanPermission(secPlan.getIdentifier());
bambooServer.publish(secPlanPermission);
}
private PlanPermissions createPlanPermission(PlanIdentifier planIdentifier) {
Permissions permission = new Permissions()
.userPermissions("atlbamboo", PermissionType.EDIT, PermissionType.VIEW, PermissionType.ADMIN, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("research", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("Development", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.groupPermissions("QA", PermissionType.EDIT, PermissionType.VIEW, PermissionType.CLONE, PermissionType.BUILD)
.loggedInUserPermissions(PermissionType.VIEW)
.anonymousUserPermissionView();
return new PlanPermissions(planIdentifier.getProjectKey(), planIdentifier.getPlanKey()).permissions(permission);
}
private Project project() {
return new Project()
.name("RED")
.key(new BambooKey("RED"));
}
public Plan createDockerBuildPlan() {
return new Plan(
project(),
SERVICE_NAME, new BambooKey(SERVICE_KEY))
// .description("Docker build for cv-analysis.")
// .variables()
.stages(new Stage("Build Stage")
.jobs(
new Job("Build Job", new BambooKey("BUILD"))
.tasks(
new CleanWorkingDirectoryTask()
.description("Clean working directory.")
.enabled(true),
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Set config and keys.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/key-prepare.sh"),
new ScriptTask()
.description("Build Docker container.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/docker-build.sh")
.argument(SERVICE_NAME),
new InjectVariablesTask()
.description("Inject git tag.")
.path("git.tag")
.namespace("g")
.scope(InjectVariablesScope.LOCAL),
new VcsTagTask()
.description("${bamboo.g.gitTag}")
.tagName("${bamboo.g.gitTag}")
.defaultRepository())
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/release_build:4.5.0")
.volume("/var/run/docker.sock", "/var/run/docker.sock")),
new Job("Licence Job", new BambooKey("LICENCE"))
.enabled(false)
.tasks(
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Build licence.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/create-licence.sh"))
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/maven:3.6.2-jdk-13-3.0.0")
.volume("/etc/maven/settings.xml", "/usr/share/maven/ref/settings.xml")
.volume("/var/run/docker.sock", "/var/run/docker.sock"))))
.linkedRepositories("RR / " + SERVICE_NAME)
.triggers(
new BitbucketServerTrigger())
.planBranchManagement(
new PlanBranchManagement()
.createForVcsBranch()
.delete(
new BranchCleanup()
.whenInactiveInRepositoryAfterDays(14))
.notificationForCommitters());
}
public Plan createSecBuild() {
return new Plan(project(), SERVICE_NAME + "-Sec", new BambooKey(SERVICE_KEY + "SEC")).description("Security Analysis Plan")
.stages(new Stage("Default Stage").jobs(
new Job("Sonar Job", new BambooKey("SONAR"))
.tasks(
new CleanWorkingDirectoryTask()
.description("Clean working directory.")
.enabled(true),
new VcsCheckoutTask()
.description("Checkout default repository.")
.checkoutItems(new CheckoutItem().defaultRepository()),
new ScriptTask()
.description("Set config and keys.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/key-prepare.sh"),
new ScriptTask()
.description("Run Sonarqube scan.")
.location(Location.FILE)
.fileFromPath("bamboo-specs/src/main/resources/scripts/sonar-scan.sh")
.argument(SERVICE_NAME))
.dockerConfiguration(
new DockerConfiguration()
.image("nexus.iqser.com:5001/infra/release_build:4.2.0")
.volume("/var/run/docker.sock", "/var/run/docker.sock"))))
.linkedRepositories("RR / " + SERVICE_NAME)
.triggers(
new ScheduledTrigger()
.scheduleOnceDaily(LocalTime.of(23, 00)))
.planBranchManagement(
new PlanBranchManagement()
.createForVcsBranchMatching("release.*")
.notificationForCommitters());
}
}


@ -1,19 +0,0 @@
#!/bin/bash
set -e
if [[ \"${bamboo_version_tag}\" != \"dev\" ]]
then
${bamboo_capability_system_builder_mvn3_Maven_3}/bin/mvn \
-f ${bamboo_build_working_directory}/pom.xml \
versions:set \
-DnewVersion=${bamboo_version_tag}
${bamboo_capability_system_builder_mvn3_Maven_3}/bin/mvn \
-f ${bamboo_build_working_directory}/pom.xml \
-B clean deploy \
-e -DdeployAtEnd=true \
-Dmaven.wagon.http.ssl.insecure=true \
-Dmaven.wagon.http.ssl.allowall=true \
-Dmaven.wagon.http.ssl.ignore.validity.dates=true \
-DaltDeploymentRepository=iqser_release::default::https://nexus.iqser.com/repository/gin4-platform-releases
fi


@ -1,53 +0,0 @@
#!/bin/bash
set -e
SERVICE_NAME=$1
if [[ "$bamboo_planRepository_branchName" == "master" ]]
then
branchVersion=$(cat version.yaml | grep -Eo "version: .*" | sed -s 's|version: \(.*\)\..*\..*|\1|g')
latestVersion=$( semver $(git tag -l "${branchVersion}.*" ) | tail -n1 )
newVersion="$(semver $latestVersion -p -i minor)"
echo "new release on master with version $newVersion"
elif [[ "$bamboo_planRepository_branchName" == release* ]]
then
branchVersion=$(echo $bamboo_planRepository_branchName | sed -s 's|release\/\([0-9]\+\.[0-9]\+\)\.x|\1|')
latestVersion=$( semver $(git tag -l "${branchVersion}.*" ) | tail -n1 )
newVersion="$(semver $latestVersion -p -i patch)"
echo "new release on $bamboo_planRepository_branchName with version $newVersion"
elif [[ "${bamboo_version_tag}" != "dev" ]]
then
newVersion="${bamboo_version_tag}"
echo "new special version build with $newVersion"
else
newVersion="${bamboo_planRepository_1_branch}_${bamboo_buildNumber}"
echo "gitTag=${newVersion}" > git.tag
echo "dev build with tag ${newVersion}"
python3 -m venv build_venv
source build_venv/bin/activate
python3 -m pip install --upgrade pip
pip install dvc
pip install 'dvc[ssh]'
dvc pull
echo "index-url = https://${bamboo_nexus_user}:${bamboo_nexus_password}@nexus.iqser.com/repository/python-combind/simple" >> pip.conf
echo "${bamboo_nexus_password}" | docker login --username "${bamboo_nexus_user}" --password-stdin nexus.iqser.com:5001
docker build -f Dockerfile .
exit 0
fi
echo "gitTag=${newVersion}" > git.tag
python3 -m venv build_venv
source build_venv/bin/activate
python3 -m pip install --upgrade pip
pip install dvc
pip install 'dvc[ssh]'
dvc pull
echo "index-url = https://${bamboo_nexus_user}:${bamboo_nexus_password}@nexus.iqser.com/repository/python-combind/simple" >> pip.conf
docker build -f Dockerfile -t nexus.iqser.com:5001/red/$SERVICE_NAME:${newVersion} .
echo "${bamboo_nexus_password}" | docker login --username "${bamboo_nexus_user}" --password-stdin nexus.iqser.com:5001
docker push nexus.iqser.com:5001/red/$SERVICE_NAME:${newVersion}
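The version selection in the script above (pick the newest tag under a prefix, then bump the minor version on `master` or the patch version on a release branch) can be sketched in Python. This is an illustration under stated assumptions, not a replacement for the `semver` CLI: it only handles plain `MAJOR.MINOR.PATCH` tags and ignores prerelease tags, and `next_version` is a hypothetical name.

```python
import re

# Assumption: only plain MAJOR.MINOR.PATCH tags are considered.
_TAG = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def next_version(tags, prefix, bump):
    """Return the next version after the newest tag that starts with `prefix.`.

    bump="minor" mirrors the master branch case (patch resets to 0),
    bump="patch" mirrors the release branch case.
    """
    versions = []
    for tag in tags:
        match = _TAG.match(tag)
        if match and tag.startswith(prefix + "."):
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        raise ValueError(f"no tags found for prefix {prefix!r}")

    # Tuples of ints compare numerically, so 1.10.0 sorts after 1.2.3.
    major, minor, patch = max(versions)
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported bump {bump!r}")


tags = ["1.2.3", "1.2.10", "1.10.0", "2.0.0", "not-a-version"]
master_version = next_version(tags, "1", "minor")
release_version = next_version(tags, "1.2", "patch")
print(master_version, release_version)
```

Comparing integer tuples rather than strings is the point of the sketch: a plain lexical sort would place `1.2.10` before `1.2.3`.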


@ -1,8 +0,0 @@
#!/bin/bash
set -e
mkdir -p ~/.ssh
echo "${bamboo_agent_ssh}" | base64 -d >> ~/.ssh/id_rsa
echo "host vector.iqser.com" > ~/.ssh/config
echo " user bamboo-agent" >> ~/.ssh/config
chmod 600 ~/.ssh/config ~/.ssh/id_rsa


@ -1,67 +0,0 @@
#!/bin/bash
set -e
export JAVA_HOME=/usr/bin/sonar-scanner/jre
python3 -m venv build_venv
source build_venv/bin/activate
python3 -m pip install --upgrade pip
echo "dev setup for unit test and coverage"
pip install -e incl/pyinfra
pip install -r incl/pyinfra/requirements.txt
pip install -e incl/pdf2image
pip install -r incl/pdf2image/requirements.txt
pip install -e .
pip install -r requirements.txt
echo "DVC pull step"
dvc pull
echo "coverage calculation"
coverage run -m pytest
echo "coverage report generation"
coverage report -m
coverage xml
SERVICE_NAME=$1
echo "dependency-check:aggregate"
mkdir -p reports
dependency-check --enableExperimental -f JSON -f HTML -f XML \
--disableAssembly -s . -o reports --project $SERVICE_NAME --exclude ".git/**" --exclude "venv/**" \
--exclude "build_venv/**" --exclude "**/__pycache__/**"
if [[ -z "${bamboo_repository_pr_key}" ]]
then
echo "Sonar Scan for branch: ${bamboo_planRepository_1_branch}"
/usr/bin/sonar-scanner/bin/sonar-scanner -X\
-Dsonar.projectKey=RED_$SERVICE_NAME \
-Dsonar.sources=src,cv_analysis \
-Dsonar.host.url=https://sonarqube.iqser.com \
-Dsonar.login=${bamboo_sonarqube_api_token_secret} \
-Dsonar.branch.name=${bamboo_planRepository_1_branch} \
-Dsonar.dependencyCheck.jsonReportPath=reports/dependency-check-report.json \
-Dsonar.dependencyCheck.xmlReportPath=reports/dependency-check-report.xml \
-Dsonar.dependencyCheck.htmlReportPath=reports/dependency-check-report.html \
-Dsonar.python.coverage.reportPaths=reports/coverage.xml
else
echo "Sonar Scan for PR with key1: ${bamboo_repository_pr_key}"
/usr/bin/sonar-scanner/bin/sonar-scanner \
-Dsonar.projectKey=RED_$SERVICE_NAME \
-Dsonar.sources=src,cv_analysis \
-Dsonar.host.url=https://sonarqube.iqser.com \
-Dsonar.login=${bamboo_sonarqube_api_token_secret} \
-Dsonar.pullrequest.key=${bamboo_repository_pr_key} \
-Dsonar.pullrequest.branch=${bamboo_repository_pr_sourceBranch} \
-Dsonar.pullrequest.base=${bamboo_repository_pr_targetBranch} \
-Dsonar.dependencyCheck.jsonReportPath=reports/dependency-check-report.json \
-Dsonar.dependencyCheck.xmlReportPath=reports/dependency-check-report.xml \
-Dsonar.dependencyCheck.htmlReportPath=reports/dependency-check-report.html \
-Dsonar.python.coverage.reportPaths=reports/coverage.xml
fi


@ -1,22 +0,0 @@
package buildjob;
import com.atlassian.bamboo.specs.api.builders.plan.Plan;
import com.atlassian.bamboo.specs.api.exceptions.PropertiesValidationException;
import com.atlassian.bamboo.specs.api.util.EntityPropertiesBuilders;
import org.junit.Test;
public class PlanSpecTest {
@Test
public void checkYourPlanOffline() throws PropertiesValidationException {
Plan plan = new PlanSpec().createDockerBuildPlan();
EntityPropertiesBuilders.build(plan);
}
@Test
public void checkYourSecPlanOffline() throws PropertiesValidationException {
Plan secPlan = new PlanSpec().createSecBuild();
EntityPropertiesBuilders.build(secPlan);
}
}

30096
bom.json Normal file

File diff suppressed because it is too large

67
config/pyinfra.toml Normal file

@ -0,0 +1,67 @@
[asyncio]
max_concurrent_tasks = 10
[dynamic_tenant_queues]
enabled = true
[metrics.prometheus]
enabled = true
prefix = "redactmanager_cv_analysis_service"
[tracing]
enabled = true
# possible values "opentelemetry" | "azure_monitor" (Expects APPLICATIONINSIGHTS_CONNECTION_STRING environment variable.)
type = "azure_monitor"
[tracing.opentelemetry]
endpoint = "http://otel-collector-opentelemetry-collector.otel-collector:4318/v1/traces"
service_name = "redactmanager_cv_analysis_service"
exporter = "otlp"
[webserver]
host = "0.0.0.0"
port = 8080
[rabbitmq]
host = "localhost"
port = 5672
username = ""
password = ""
heartbeat = 60
# Must be a divisor of heartbeat and should not be too large, since queue interactions (such as receiving new messages) only happen at these intervals
# This is also the minimum time the service needs to process a message
connection_sleep = 5
input_queue = "request_queue"
output_queue = "response_queue"
dead_letter_queue = "dead_letter_queue"
tenant_event_queue_suffix = "_tenant_event_queue"
tenant_event_dlq_suffix = "_tenant_events_dlq"
tenant_exchange_name = "tenants-exchange"
queue_expiration_time = 300000 # 5 minutes in milliseconds
service_request_queue_prefix = "cv_analysis_request_queue"
service_request_exchange_name = "cv_analysis_request_exchange"
service_response_exchange_name = "cv_analysis_response_exchange"
service_dlq_name = "cv_analysis_dlq"
[storage]
backend = "s3"
[storage.s3]
bucket = "redaction"
endpoint = "http://127.0.0.1:9000"
key = ""
secret = ""
region = "eu-central-1"
[storage.azure]
container = "redaction"
connection_string = ""
[storage.tenant_server]
public_key = ""
endpoint = "http://tenant-user-management:8081/internal-api/tenants"
[kubernetes]
pod_name = "test_pod"
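The comment on `connection_sleep` states a constraint that could be checked when the settings are loaded. The following is a minimal sketch, assuming the two values have already been read into plain integers; `check_rabbitmq_timing` is a hypothetical helper and is not part of pyinfra.

```python
def check_rabbitmq_timing(heartbeat, connection_sleep):
    """Validate that connection_sleep evenly divides heartbeat.

    Returns the number of sleep intervals per heartbeat period.
    """
    if connection_sleep <= 0:
        raise ValueError("connection_sleep must be positive")
    if heartbeat % connection_sleep != 0:
        raise ValueError(
            f"connection_sleep={connection_sleep} does not divide heartbeat={heartbeat}"
        )
    return heartbeat // connection_sleep


# Values taken from the [rabbitmq] section above.
intervals = check_rabbitmq_timing(heartbeat=60, connection_sleep=5)
print(intervals)

# queue_expiration_time is given in milliseconds; convert it to minutes.
queue_expiration_minutes = 300000 / 1000 / 60
print(queue_expiration_minutes)
```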

19
config/settings.toml Normal file

@ -0,0 +1,19 @@
[logging]
level = "INFO"
visual_logging_level = "DISABLED"
visual_logging_output_folder = "/tmp/debug"
[table_parsing]
skip_pages_without_images = true
[paths]
root = "@format {env[ROOT_PATH]}"
dvc_data_dir = "${paths.root}/data"
pdf_for_testing = "${paths.dvc_data_dir}/pdfs_for_testing"
png_for_testing = "${paths.dvc_data_dir}/pngs_for_testing"
png_figures_detected = "${paths.png_for_testing}/figures_detected"
png_tables_detected = "${paths.png_for_testing}/tables_detected_by_tp"
hashed_pdfs_for_testing = "${paths.pdf_for_testing}/hashed"
metadata_test_files = "${paths.dvc_data_dir}/metadata_testing_files.csv"
test_dir = "${paths.dvc_data_dir}/test"
test_data_dir = "${paths.dvc_data_dir}/test/test_data"


@ -1,31 +0,0 @@
import os


def get_config():
    return Config()


def _env_flag(name, default):
    # os.environ.get returns a string when the variable is set, so "false" would be truthy
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


class Config:
    def __init__(self):
        self.logging_level_root = os.environ.get("LOGGING_LEVEL_ROOT", "INFO")
        self.table_parsing_skip_pages_without_images = _env_flag("TABLE_PARSING_SKIP_PAGES_WITHOUT_IMAGES", True)

        # visual_logging_level: NOTHING > INFO > DEBUG > ALL
        self.visual_logging_level = "DISABLED"
        self.visual_logging_output_folder = "/tmp/debug"

        # locations
        # FIXME: is everything here necessary?
        root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
        self.dvc_data_dir = os.path.join(root, "data")
        self.pdf_for_testing = os.path.join(self.dvc_data_dir, "pdfs_for_testing")
        self.png_for_testing = os.path.join(self.dvc_data_dir, "pngs_for_testing")
        self.png_figures_detected = os.path.join(self.png_for_testing, "figures_detected")
        self.png_tables_detected = os.path.join(self.png_for_testing, "tables_detected_by_tp")
        self.hashed_pdfs_for_testing = os.path.join(self.pdf_for_testing, "hashed")
        self.metadata_test_files = os.path.join(self.dvc_data_dir, "metadata_testing_files.csv")
        self.test_dir = os.path.join(root, "test")
        self.test_data_dir = os.path.join(self.test_dir, "test_data")

    def __getitem__(self, key):
        return getattr(self, key)


@ -1,61 +0,0 @@
from dataclasses import asdict
from operator import truth

from funcy import lmap, flatten

from cv_analysis.figure_detection.figure_detection import detect_figures
from cv_analysis.table_parsing import parse_tables
from cv_analysis.utils.structures import Rectangle
from pdf2img.conversion import convert_pages_to_images
from pdf2img.default_objects.image import ImagePlus, ImageInfo
from pdf2img.default_objects.rectangle import RectanglePlus


def get_analysis_pipeline(operation, table_parsing_skip_pages_without_images):
    if operation == "table":
        return make_analysis_pipeline(
            parse_tables,
            table_parsing_formatter,
            dpi=200,
            skip_pages_without_images=table_parsing_skip_pages_without_images,
        )
    elif operation == "figure":
        return make_analysis_pipeline(detect_figures, figure_detection_formatter, dpi=200)
    else:
        # a bare `raise` outside an except block fails with a RuntimeError; raise an explicit error instead
        raise ValueError(f"Unknown operation: {operation!r}")


def make_analysis_pipeline(analysis_fn, formatter, dpi, skip_pages_without_images=False):
    def analyse_pipeline(pdf: bytes, index=None):
        def parse_page(page: ImagePlus):
            image = page.asarray()
            rects = analysis_fn(image)
            if not rects:
                return
            infos = formatter(rects, page, dpi)
            return infos

        pages = convert_pages_to_images(pdf, index=index, dpi=dpi, skip_pages_without_images=skip_pages_without_images)
        results = map(parse_page, pages)

        # drop pages without results, then flatten the per-page results into one stream
        yield from flatten(filter(truth, results))

    return analyse_pipeline


def table_parsing_formatter(rects, page: ImagePlus, dpi):
    def format_rect(rect: Rectangle):
        rect_plus = RectanglePlus.from_pixels(*rect.xyxy(), page.info, alpha=False, dpi=dpi)
        return rect_plus.asdict(derotate=True)

    bboxes = lmap(format_rect, rects)

    return {"pageInfo": page.asdict(natural_index=True), "tableCells": bboxes}


def figure_detection_formatter(rects, page, dpi):
    def format_rect(rect: Rectangle):
        rect_plus = RectanglePlus.from_pixels(*rect.xyxy(), page.info, alpha=False, dpi=dpi)
        return asdict(ImageInfo(page.info, rect_plus.asbbox(derotate=False), rect_plus.alpha))

    return lmap(format_rect, rects)
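The pipeline construction above (an analysis function and a formatter combined into a lazy per-page generator that skips empty pages) can be illustrated without the PDF and image dependencies. The sketch below is a simplified stand-in, assuming pages are plain Python objects and that the formatter returns a list; `make_pipeline`, `find_even_numbers` and `label` are hypothetical names used only for this example.

```python
def make_pipeline(analysis_fn, formatter):
    """Combine an analysis function and a formatter into a lazy per-page pipeline."""

    def run(pages):
        for page_number, page in enumerate(pages):
            results = analysis_fn(page)
            if not results:
                # pages without detections produce no output at all
                continue
            yield from formatter(results, page_number)

    return run


def find_even_numbers(page):
    # toy "analysis": a page is a list of ints, detections are the even ones
    return [value for value in page if value % 2 == 0]


def label(results, page_number):
    # toy "formatter": attach the page number to every detection
    return [{"pageNum": page_number, "value": value} for value in results]


pipeline = make_pipeline(find_even_numbers, label)
output = list(pipeline([[1, 2, 4], [3, 5], [6]]))
print(output)
```

Because `run` is a generator, pages are only analysed as the consumer iterates, which matches the lazy `map`/`filter` chain in the original code.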


@ -1,139 +0,0 @@
from functools import partial
from itertools import chain, starmap
from operator import attrgetter

import cv2
import numpy as np
from funcy import lmap, lfilter

from cv_analysis.layout_parsing import parse_layout
from cv_analysis.utils.postprocessing import remove_isolated  # xywh_to_vecs, xywh_to_vec_rect, adjacent1d
from cv_analysis.utils.structures import Rectangle
from cv_analysis.utils.visual_logging import vizlogger


def add_external_contours(image, image_h_w_lines_only):
    contours, _ = cv2.findContours(image_h_w_lines_only, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        cv2.rectangle(image, (x, y), (x + w, y + h), 255, 1)

    return image


def apply_motion_blur(image: np.ndarray, angle, size=80):
    """Solidifies and slightly extends detected lines.

    Args:
        image (np.ndarray): page image as array
        angle: direction in which to apply blur, 0 or 90
        size (int): kernel size; 80 found empirically to work well

    Returns:
        np.ndarray
    """
    # kernel with a single row of ones through the centre, rotated to the requested angle
    k = np.zeros((size, size), dtype=np.float32)
    vizlogger.debug(k, "tables08_blur_kernel1.png")
    k[(size - 1) // 2, :] = np.ones(size, dtype=np.float32)
    vizlogger.debug(k, "tables09_blur_kernel2.png")
    k = cv2.warpAffine(
        k,
        cv2.getRotationMatrix2D((size / 2 - 0.5, size / 2 - 0.5), angle, 1.0),
        (size, size),
    )
    vizlogger.debug(k, "tables10_blur_kernel3.png")
    k = k * (1.0 / np.sum(k))
    vizlogger.debug(k, "tables11_blur_kernel4.png")
    blurred = cv2.filter2D(image, -1, k)
    return blurred


def isolate_vertical_and_horizontal_components(img_bin):
    """Identifies and reinforces horizontal and vertical lines in a binary image.

    Args:
        img_bin (np.ndarray): array corresponding to single binarized page image

    Returns:
        np.ndarray
    """
    line_min_width = 48
    kernel_h = np.ones((1, line_min_width), np.uint8)
    kernel_v = np.ones((line_min_width, 1), np.uint8)

    # morphological opening keeps only runs at least line_min_width pixels long
    img_bin_h = cv2.morphologyEx(img_bin, cv2.MORPH_OPEN, kernel_h)
    img_bin_v = cv2.morphologyEx(img_bin, cv2.MORPH_OPEN, kernel_v)
    img_lines_raw = img_bin_v | img_bin_h

    kernel_h = np.ones((1, 30), np.uint8)
    kernel_v = np.ones((30, 1), np.uint8)
    img_bin_h = cv2.dilate(img_bin_h, kernel_h, iterations=2)
    img_bin_v = cv2.dilate(img_bin_v, kernel_v, iterations=2)

    img_bin_h = apply_motion_blur(img_bin_h, 0)
    img_bin_v = apply_motion_blur(img_bin_v, 90)

    img_bin_extended = img_bin_h | img_bin_v

    _, img_bin_extended = cv2.threshold(img_bin_extended, 120, 255, cv2.THRESH_BINARY)
    img_bin_final = cv2.dilate(img_bin_extended, np.ones((1, 1), np.uint8), iterations=1)
    # add contours before lines are extended by blurring
    img_bin_final = add_external_contours(img_bin_final, img_lines_raw)

    return img_bin_final


def find_table_layout_boxes(image: np.ndarray):
    def is_large_enough(box):
        (x, y, w, h) = box
        if w * h >= 100000:
            return Rectangle.from_xywh(box)

    layout_boxes = parse_layout(image)
    return lmap(is_large_enough, layout_boxes)


def preprocess(image: np.ndarray):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if len(image.shape) > 2 else image
    _, image = cv2.threshold(image, 195, 255, cv2.THRESH_BINARY)
    return ~image


def turn_connected_components_into_rects(image: np.ndarray):
    def is_large_enough(stat):
        x1, y1, w, h, area = stat
        return area > 2000 and w > 35 and h > 25

    _, _, stats, _ = cv2.connectedComponentsWithStats(~image, connectivity=8, ltype=cv2.CV_32S)

    stats = lfilter(is_large_enough, stats)
    if stats:
        stats = np.vstack(stats)
        # drop the area column and the first two remaining components
        return stats[:, :-1][2:]
    return []


def parse_tables(image: np.ndarray, show=False):
    """Runs the full table parsing process.

    Args:
        image (np.ndarray): single PDF page, converted to a numpy array

    Returns:
        list: list of rectangles corresponding to table cells
    """
    image = preprocess(image)
    image = isolate_vertical_and_horizontal_components(image)
    rects = turn_connected_components_into_rects(image)
    rects = list(map(Rectangle.from_xywh, rects))
    rects = remove_isolated(rects)

    return rects
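The size filter applied to connected components can be shown in isolation. The sketch below uses plain tuples in place of the OpenCV stats rows and reuses the thresholds from the code above (`area > 2000`, `w > 35`, `h > 25`); `stats_to_xyxy` is a hypothetical helper, and the sketch does not reproduce the step that drops the first two remaining components.

```python
def is_large_enough(stat, min_area=2000, min_width=35, min_height=25):
    """Return True if a (x, y, w, h, area) row is big enough to be a table cell."""
    _x, _y, w, h, area = stat
    return area > min_area and w > min_width and h > min_height


def stats_to_xyxy(stats):
    """Keep sufficiently large components and convert them to (x1, y1, x2, y2)."""
    return [(x, y, x + w, y + h) for x, y, w, h, area in stats if is_large_enough((x, y, w, h, area))]


# Toy rows in the (x, y, w, h, area) layout; the values are made up.
stats = [
    (0, 0, 1414, 693, 900000),  # kept: large component
    (10, 10, 30, 200, 6000),    # dropped: width is not above 35
    (50, 60, 120, 40, 4800),    # kept
    (5, 5, 40, 30, 1200),       # dropped: area is not above 2000
]
cells = stats_to_xyxy(stats)
print(cells)
```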

BIN
data/2017-1078223.pdf Normal file

Binary file not shown.

Binary file not shown.

File diff suppressed because it is too large

Binary file not shown.

30
devenvsetup.sh Normal file

@ -0,0 +1,30 @@
#!/bin/bash
python_version=$1
gitlab_user=$2
gitlab_personal_access_token=$3
# cookiecutter https://gitlab.knecon.com/knecon/research/template-python-project.git --checkout master
# latest_dir=$(ls -td -- */ | head -n 1) # should be the dir cookiecutter just created
# cd $latest_dir
pyenv install $python_version
pyenv local $python_version
pyenv shell $python_version
pip install --upgrade pip
pip install poetry
poetry config installer.max-workers 10
# research package registry
poetry config repositories.gitlab-research https://gitlab.knecon.com/api/v4/groups/19/-/packages/pypi
poetry config http-basic.gitlab-research ${gitlab_user} ${gitlab_personal_access_token}
# redactmanager package registry
poetry config repositories.gitlab-red https://gitlab.knecon.com/api/v4/groups/12/-/packages/pypi
poetry config http-basic.gitlab-red ${gitlab_user} ${gitlab_personal_access_token}
poetry env use $(pyenv which python)
poetry install --with=dev
poetry update
source .venv/bin/activate


@ -28,4 +28,4 @@ services:
volumes:
- /opt/bitnami/rabbitmq/.rabbitmq/:/data/bitnami
volumes:
mdata:
mdata:

4
docs/build/html/.buildinfo vendored Normal file

@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 04e9c6c5d3e412413c2949e598da60dc
tags: 645f666f9bcd5a90fca523b33c5a78b7

BIN
docs/build/html/.doctrees/README.doctree vendored Normal file

Binary file not shown.

Binary file not shown.

BIN
docs/build/html/.doctrees/index.doctree vendored Normal file

Binary file not shown.


657
docs/build/html/README.html vendored Normal file

@ -0,0 +1,657 @@
<!DOCTYPE html>
<html lang="en" data-content_root="./" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<title>cv-analysis - Visual (CV-Based) Document Parsing &#8212; CV Analysis Service 2.5.2 documentation</title>
<script data-cfasync="false">
document.documentElement.dataset.mode = localStorage.getItem("mode") || "";
document.documentElement.dataset.theme = localStorage.getItem("theme") || "light";
</script>
<!-- Loaded before other Sphinx assets -->
<link href="_static/styles/theme.css?digest=8d27b9dea8ad943066ae" rel="stylesheet" />
<link href="_static/styles/bootstrap.css?digest=8d27b9dea8ad943066ae" rel="stylesheet" />
<link href="_static/styles/pydata-sphinx-theme.css?digest=8d27b9dea8ad943066ae" rel="stylesheet" />
<link href="_static/vendor/fontawesome/6.5.1/css/all.min.css?digest=8d27b9dea8ad943066ae" rel="stylesheet" />
<link rel="preload" as="font" type="font/woff2" crossorigin href="_static/vendor/fontawesome/6.5.1/webfonts/fa-solid-900.woff2" />
<link rel="preload" as="font" type="font/woff2" crossorigin href="_static/vendor/fontawesome/6.5.1/webfonts/fa-brands-400.woff2" />
<link rel="preload" as="font" type="font/woff2" crossorigin href="_static/vendor/fontawesome/6.5.1/webfonts/fa-regular-400.woff2" />
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=a746c00c" />
<link rel="stylesheet" type="text/css" href="https://assets.readthedocs.org/static/css/badge_only.css" />
<!-- Pre-loaded scripts that we'll load fully later -->
<link rel="preload" as="script" href="_static/scripts/bootstrap.js?digest=8d27b9dea8ad943066ae" />
<link rel="preload" as="script" href="_static/scripts/pydata-sphinx-theme.js?digest=8d27b9dea8ad943066ae" />
<script src="_static/vendor/fontawesome/6.5.1/js/all.min.js?digest=8d27b9dea8ad943066ae"></script>
<script src="_static/documentation_options.js?v=afc61bbc"></script>
<script src="_static/doctools.js?v=9a2dae69"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script>DOCUMENTATION_OPTIONS.pagename = 'README';</script>
<script async="async" src="https://assets.readthedocs.org/static/javascript/readthedocs-doc-embed.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="cv_analysis package" href="modules/cv_analysis.html" />
<link rel="prev" title="Welcome to CV Analysis Service documentation!" href="index.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
<!-- RTD Extra Head -->
<link rel="stylesheet" href="https://assets.readthedocs.org/static/css/readthedocs-doc-embed.css" type="text/css" />
<script type="application/json" id="READTHEDOCS_DATA">{"ad_free": "", "api_host": "", "builder": "sphinx", "canonical_url": "", "docroot": "", "features": {"docsearch_disabled": false}, "global_analytics_code": null, "language": "", "page": "README", "programming_language": "", "project": "", "source_suffix": ".md", "subprojects": {}, "theme": "", "user_analytics_code": null, "version": ""}</script>
<!--
Using this variable directly instead of using `JSON.parse` is deprecated.
The READTHEDOCS_DATA global variable will be removed in the future.
-->
<script type="text/javascript">
READTHEDOCS_DATA = JSON.parse(document.getElementById('READTHEDOCS_DATA').innerHTML);
</script>
<script type="text/javascript" src="https://assets.readthedocs.org/static/javascript/readthedocs-analytics.js" async="async"></script>
<!-- end RTD <extrahead> -->
</head>
<body data-bs-spy="scroll" data-bs-target=".bd-toc-nav" data-offset="180" data-bs-root-margin="0px 0px -60%" data-default-mode="">
<a id="pst-skip-link" class="skip-link" href="#main-content">Skip to main content</a>
<div id="pst-scroll-pixel-helper"></div>
<button type="button" class="btn rounded-pill" id="pst-back-to-top">
<i class="fa-solid fa-arrow-up"></i>
Back to top
</button>
<input type="checkbox"
class="sidebar-toggle"
name="__primary"
id="__primary"/>
<label class="overlay overlay-primary" for="__primary"></label>
<input type="checkbox"
class="sidebar-toggle"
name="__secondary"
id="__secondary"/>
<label class="overlay overlay-secondary" for="__secondary"></label>
<div class="search-button__wrapper">
<div class="search-button__overlay"></div>
<div class="search-button__search-container">
<form class="bd-search d-flex align-items-center"
action="search.html"
method="get">
<i class="fa-solid fa-magnifying-glass"></i>
<input type="search"
class="form-control"
name="q"
id="search-input"
placeholder="Search the docs ..."
aria-label="Search the docs ..."
autocomplete="off"
autocorrect="off"
autocapitalize="off"
spellcheck="false"/>
<span class="search-button__kbd-shortcut"><kbd class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd>K</kbd></span>
</form></div>
</div>
<header class="bd-header navbar navbar-expand-lg bd-navbar">
<div class="bd-header__inner bd-page-width">
<label class="sidebar-toggle primary-toggle" for="__primary">
<span class="fa-solid fa-bars"></span>
</label>
<div class="col-lg-3 navbar-header-items__start">
<div class="navbar-item">
<a class="navbar-brand logo" href="index.html">
<img src="_static/logo.png" class="logo__image only-light" alt="CV Analysis Service 2.5.2 documentation - Home"/>
<script>document.write(`<img src="_static/logo.png" class="logo__image only-dark" alt="CV Analysis Service 2.5.2 documentation - Home"/>`);</script>
</a></div>
</div>
<div class="col-lg-9 navbar-header-items">
<div class="me-auto navbar-header-items__center">
<div class="navbar-item">
<nav class="navbar-nav">
<ul class="bd-navbar-elements navbar-nav">
<li class="nav-item current active">
<a class="nav-link nav-internal" href="#">
cv-analysis - Visual (CV-Based) Document Parsing
</a>
</li>
<li class="nav-item">
<a class="nav-link nav-internal" href="modules/cv_analysis.html">
cv_analysis package
</a>
</li>
<li class="nav-item">
<a class="nav-link nav-internal" href="modules/serve.html">
serve module
</a>
</li>
</ul>
</nav></div>
</div>
<div class="navbar-header-items__end">
<div class="navbar-item navbar-persistent--container">
<script>
document.write(`
<button class="btn navbar-btn search-button-field search-button__button" title="Search" aria-label="Search" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="fa-solid fa-magnifying-glass"></i>
<span class="search-button__default-text">Search</span>
<span class="search-button__kbd-shortcut"><kbd class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd class="kbd-shortcut__modifier">K</kbd></span>
</button>
`);
</script>
</div>
<div class="navbar-item">
<script>
document.write(`
<button class="btn btn-sm navbar-btn theme-switch-button" title="light/dark" aria-label="light/dark" data-bs-placement="bottom" data-bs-toggle="tooltip">
<span class="theme-switch nav-link" data-mode="light"><i class="fa-solid fa-sun fa-lg"></i></span>
<span class="theme-switch nav-link" data-mode="dark"><i class="fa-solid fa-moon fa-lg"></i></span>
<span class="theme-switch nav-link" data-mode="auto"><i class="fa-solid fa-circle-half-stroke fa-lg"></i></span>
</button>
`);
</script></div>
</div>
</div>
<div class="navbar-persistent--mobile">
<script>
document.write(`
<button class="btn navbar-btn search-button-field search-button__button" title="Search" aria-label="Search" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="fa-solid fa-magnifying-glass"></i>
<span class="search-button__default-text">Search</span>
<span class="search-button__kbd-shortcut"><kbd class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd class="kbd-shortcut__modifier">K</kbd></span>
</button>
`);
</script>
</div>
<label class="sidebar-toggle secondary-toggle" for="__secondary" tabindex="0">
<span class="fa-solid fa-outdent"></span>
</label>
</div>
</header>
<div class="bd-container">
<div class="bd-container__inner bd-page-width">
<div class="bd-sidebar-primary bd-sidebar">
<div class="sidebar-header-items sidebar-primary__section">
<div class="sidebar-header-items__center">
<div class="navbar-item">
<nav class="navbar-nav">
<ul class="bd-navbar-elements navbar-nav">
<li class="nav-item current active">
<a class="nav-link nav-internal" href="#">
cv-analysis - Visual (CV-Based) Document Parsing
</a>
</li>
<li class="nav-item">
<a class="nav-link nav-internal" href="modules/cv_analysis.html">
cv_analysis package
</a>
</li>
<li class="nav-item">
<a class="nav-link nav-internal" href="modules/serve.html">
serve module
</a>
</li>
</ul>
</nav></div>
</div>
<div class="sidebar-header-items__end">
<div class="navbar-item">
<script>
document.write(`
<button class="btn btn-sm navbar-btn theme-switch-button" title="light/dark" aria-label="light/dark" data-bs-placement="bottom" data-bs-toggle="tooltip">
<span class="theme-switch nav-link" data-mode="light"><i class="fa-solid fa-sun fa-lg"></i></span>
<span class="theme-switch nav-link" data-mode="dark"><i class="fa-solid fa-moon fa-lg"></i></span>
<span class="theme-switch nav-link" data-mode="auto"><i class="fa-solid fa-circle-half-stroke fa-lg"></i></span>
</button>
`);
</script></div>
</div>
</div>
<div class="sidebar-primary-items__start sidebar-primary__section">
<div class="sidebar-primary-item">
<nav class="bd-docs-nav bd-links"
aria-label="Section Navigation">
<p class="bd-links__title" role="heading" aria-level="1">Section Navigation</p>
<div class="bd-toc-item navbar-nav"></div>
</nav></div>
</div>
<div class="sidebar-primary-items__end sidebar-primary__section">
</div>
<div id="rtd-footer-container"></div>
</div>
<main id="main-content" class="bd-main">
<div class="bd-content">
<div class="bd-article-container">
<div class="bd-header-article">
<div class="header-article-items header-article__inner">
<div class="header-article-items__start">
<div class="header-article-item">
<nav aria-label="Breadcrumb">
<ul class="bd-breadcrumbs">
<li class="breadcrumb-item breadcrumb-home">
<a href="index.html" class="nav-link" aria-label="Home">
<i class="fa-solid fa-home"></i>
</a>
</li>
<li class="breadcrumb-item active" aria-current="page">cv-analysis...</li>
</ul>
</nav>
</div>
</div>
</div>
</div>
<div id="searchbox"></div>
<article class="bd-article">
<section id="cv-analysis-visual-cv-based-document-parsing">
<h1>cv-analysis - Visual (CV-Based) Document Parsing<a class="headerlink" href="#cv-analysis-visual-cv-based-document-parsing" title="Link to this heading">#</a></h1>
<p>parse_pdf()
This repository implements computer vision based approaches for detecting and parsing visual features such as tables or
previous redactions in documents.</p>
<section id="api">
<h2>API<a class="headerlink" href="#api" title="Link to this heading">#</a></h2>
<p>Input message:</p>
<div class="highlight-json notranslate"><div class="highlight"><pre><span></span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;targetFilePath&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;pdf&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;absolute file path&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;vlp_output&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;absolute file path&quot;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">&quot;responseFilePath&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;absolute file path&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;operation&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;table_image_inference&quot;</span>
<span class="p">}</span>
</pre></div>
</div>
<p>Response is uploaded to the storage as specified in the <code class="docutils literal notranslate"><span class="pre">responseFilePath</span></code> field. The structure is as follows:</p>
<div class="highlight-json notranslate"><div class="highlight"><pre><span></span><span class="p">{</span>
<span class="w"> </span><span class="err">...</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;data&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;pageNum&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;bbox&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;x1&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">55.3407</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;y1&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">247.0246</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;x2&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">558.5602</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;y2&quot;</span><span class="p">:</span><span class="w"> </span><span class="mf">598.0585</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">&quot;uuid&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;2b10c1a2-393c-4fca-b9e3-0ad5b774ac84&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;label&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;table&quot;</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;tableLines&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;x1&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;y1&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">16</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;x2&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">1399</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;y2&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">16</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="err">...</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="nt">&quot;imageInfo&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">&quot;height&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">693</span><span class="p">,</span>
<span class="w"> </span><span class="nt">&quot;width&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">1414</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="err">...</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</pre></div>
</div>
</section>
<section id="installation">
<h2>Installation<a class="headerlink" href="#installation" title="Link to this heading">#</a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>ssh://git@git.iqser.com:2222/rr/cv-analysis.git
<span class="nb">cd</span><span class="w"> </span>cv-analysis
python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span>env
<span class="nb">source</span><span class="w"> </span>env/bin/activate
pip<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </span>.
pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
dvc<span class="w"> </span>pull
</pre></div>
</div>
</section>
<section id="usage">
<h2>Usage<a class="headerlink" href="#usage" title="Link to this heading">#</a></h2>
<section id="as-an-api">
<h3>As an API<a class="headerlink" href="#as-an-api" title="Link to this heading">#</a></h3>
<p>The module provides functions for the individual tasks; each returns a collection of points whose form depends on
the specific task.</p>
<section id="redaction-detection-api">
<h4>Redaction Detection (API)<a class="headerlink" href="#redaction-detection-api" title="Link to this heading">#</a></h4>
<p>The snippet below shows how to find the outlines of previous redactions.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">cv_analysis.redaction_detection</span> <span class="kn">import</span> <span class="n">find_redactions</span>
<span class="kn">import</span> <span class="nn">pdf2image</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="n">pdf_path</span> <span class="o">=</span> <span class="o">...</span>
<span class="n">page_index</span> <span class="o">=</span> <span class="o">...</span>
<span class="n">page</span> <span class="o">=</span> <span class="n">pdf2image</span><span class="o">.</span><span class="n">convert_from_path</span><span class="p">(</span><span class="n">pdf_path</span><span class="p">,</span> <span class="n">first_page</span><span class="o">=</span><span class="n">page_index</span><span class="p">,</span> <span class="n">last_page</span><span class="o">=</span><span class="n">page_index</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">page</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">page</span><span class="p">)</span>
<span class="n">redaction_contours</span> <span class="o">=</span> <span class="n">find_redactions</span><span class="p">(</span><span class="n">page</span><span class="p">)</span>
</pre></div>
</div>
</section>
</section>
</section>
<section id="as-a-cli-tool">
<h2>As a CLI Tool<a class="headerlink" href="#as-a-cli-tool" title="Link to this heading">#</a></h2>
<p>Core API functionalities can be used through a CLI.</p>
<section id="table-parsing">
<h3>Table Parsing<a class="headerlink" href="#table-parsing" title="Link to this heading">#</a></h3>
<p>The table parsing utility detects tables and segments them into individual cells.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>scripts/annotate.py<span class="w"> </span>data/test_pdf.pdf<span class="w"> </span><span class="m">7</span><span class="w"> </span>--type<span class="w"> </span>table
</pre></div>
</div>
<p>The below image shows a parsed table, where each table cell has been detected individually.</p>
<p><img alt="Table Parsing Demonstration" src="_images/table_parsing.png" /></p>
</section>
<section id="redaction-detection-cli">
<h3>Redaction Detection (CLI)<a class="headerlink" href="#redaction-detection-cli" title="Link to this heading">#</a></h3>
<p>The redaction detection utility detects previous redactions in PDFs (filled black rectangles).</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>scripts/annotate.py<span class="w"> </span>data/test_pdf.pdf<span class="w"> </span><span class="m">2</span><span class="w"> </span>--type<span class="w"> </span>redaction
</pre></div>
</div>
<p>The below image shows the detected redactions with green outlines.</p>
<p><img alt="Redaction Detection Demonstration" src="_images/redaction_detection.png" /></p>
</section>
<section id="layout-parsing">
<h3>Layout Parsing<a class="headerlink" href="#layout-parsing" title="Link to this heading">#</a></h3>
<p>The layout parsing utility detects elements such as paragraphs, tables and figures.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>scripts/annotate.py<span class="w"> </span>data/test_pdf.pdf<span class="w"> </span><span class="m">7</span><span class="w"> </span>--type<span class="w"> </span>layout
</pre></div>
</div>
<p>The below image shows the detected layout elements on a page.</p>
<p><img alt="Layout Parsing Demonstration" src="_images/layout_parsing.png" /></p>
</section>
<section id="figure-detection">
<h3>Figure Detection<a class="headerlink" href="#figure-detection" title="Link to this heading">#</a></h3>
<p>The figure detection utility detects figures specifically, which can be missed by the generic layout parsing utility.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>scripts/annotate.py<span class="w"> </span>data/test_pdf.pdf<span class="w"> </span><span class="m">3</span><span class="w"> </span>--type<span class="w"> </span>figure
</pre></div>
</div>
<p>The below image shows the detected figure on a page.</p>
<p><img alt="Figure Detection Demonstration" src="_images/figure_detection.png" /></p>
</section>
</section>
<section id="running-as-a-service">
<h2>Running as a service<a class="headerlink" href="#running-as-a-service" title="Link to this heading">#</a></h2>
<section id="building">
<h3>Building<a class="headerlink" href="#building" title="Link to this heading">#</a></h3>
<p>Build base image</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>bash<span class="w"> </span>setup/docker.sh
</pre></div>
</div>
<p>Build head image</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>build<span class="w"> </span>-f<span class="w"> </span>Dockerfile<span class="w"> </span>-t<span class="w"> </span>cv-analysis<span class="w"> </span>.<span class="w"> </span>--build-arg<span class="w"> </span><span class="nv">BASE_ROOT</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
</pre></div>
</div>
</section>
<section id="usage-service">
<h3>Usage (service)<a class="headerlink" href="#usage-service" title="Link to this heading">#</a></h3>
<p>Shell 1</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>run<span class="w"> </span>--rm<span class="w"> </span>--net<span class="o">=</span>host<span class="w"> </span>cv-analysis
</pre></div>
</div>
<p>Shell 2</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>scripts/client_mock.py<span class="w"> </span>--pdf_path<span class="w"> </span>/path/to/a/pdf
</pre></div>
</div>
</section>
</section>
</section>
</article>
<footer class="prev-next-footer">
<div class="prev-next-area">
<a class="left-prev"
href="index.html"
title="previous page">
<i class="fa-solid fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
<p class="prev-next-title">Welcome to CV Analysis Service documentation!</p>
</div>
</a>
<a class="right-next"
href="modules/cv_analysis.html"
title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
<p class="prev-next-title">cv_analysis package</p>
</div>
<i class="fa-solid fa-angle-right"></i>
</a>
</div>
</footer>
</div>
<div class="bd-sidebar-secondary bd-toc"><div class="sidebar-secondary-items sidebar-secondary__inner">
<div class="sidebar-secondary-item">
<div
id="pst-page-navigation-heading-2"
class="page-toc tocsection onthispage">
<i class="fa-solid fa-list"></i> On this page
</div>
<nav class="bd-toc-nav page-toc" aria-labelledby="pst-page-navigation-heading-2">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#api">API</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#installation">Installation</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#usage">Usage</a><ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#as-an-api">As an API</a><ul class="nav section-nav flex-column">
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#redaction-detection-api">Redaction Detection (API)</a></li>
</ul>
</li>
</ul>
</li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#as-a-cli-tool">As a CLI Tool</a><ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#table-parsing">Table Parsing</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#redaction-detection-cli">Redaction Detection (CLI)</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#layout-parsing">Layout Parsing</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#figure-detection">Figure Detection</a></li>
</ul>
</li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#running-as-a-service">Running as a service</a><ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#building">Building</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#usage-service">Usage (service)</a></li>
</ul>
</li>
</ul>
</nav></div>
<div class="sidebar-secondary-item">
<div class="tocsection sourcelink">
<a href="_sources/README.md.txt">
<i class="fa-solid fa-file-lines"></i> Show Source
</a>
</div>
</div>
</div></div>
</div>
<footer class="bd-footer-content">
</footer>
</main>
</div>
</div>
<!-- Scripts loaded after <body> so the DOM is not blocked -->
<script src="_static/scripts/bootstrap.js?digest=8d27b9dea8ad943066ae"></script>
<script src="_static/scripts/pydata-sphinx-theme.js?digest=8d27b9dea8ad943066ae"></script>
<footer class="bd-footer">
<div class="bd-footer__inner bd-page-width">
<div class="footer-items__start">
<div class="footer-item">
<p class="copyright">
© Copyright All rights reserved.
<br/>
</p>
</div>
<div class="footer-item">
<p class="sphinx-version">
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 7.3.7.
<br/>
</p>
</div>
</div>
<div class="footer-items__end">
<div class="footer-item">
<p class="theme-version">
Built with the <a href="https://pydata-sphinx-theme.readthedocs.io/en/stable/index.html">PyData Sphinx Theme</a> 0.15.2.
</p></div>
</div>
</div>
</footer>
</body>
</html>


178
docs/build/html/_sources/README.md.txt vendored Normal file
View File

@ -0,0 +1,178 @@
# cv-analysis - Visual (CV-Based) Document Parsing
parse_pdf()
This repository implements computer vision based approaches for detecting and parsing visual features such as tables or
previous redactions in documents.
## API
Input message:
```json
{
"targetFilePath": {
"pdf": "absolute file path",
"vlp_output": "absolute file path"
},
"responseFilePath": "absolute file path",
"operation": "table_image_inference"
}
```
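A minimal sketch of assembling this message in Python. The helper name `build_request` and the example paths are hypothetical and not part of the service; only the field names and the `table_image_inference` operation are taken from the message above.

```python
import json


def build_request(pdf_path, vlp_output_path, response_path,
                  operation="table_image_inference"):
    """Hypothetical helper: assemble an input message with the fields shown above."""
    return {
        "targetFilePath": {
            "pdf": pdf_path,
            "vlp_output": vlp_output_path,
        },
        "responseFilePath": response_path,
        "operation": operation,
    }


# Example paths are placeholders; use the absolute paths of your storage.
message = build_request(
    "/data/in/example.pdf",
    "/data/in/example.vlp_output.json",
    "/data/out/example.response.json",
)
print(json.dumps(message, indent=2))
```

Keeping the message construction in one function makes it harder to misspell a field name when several requests are built in a row.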
Response is uploaded to the storage as specified in the `responseFilePath` field. The structure is as follows:
```json
{
...,
"data": [
{
"pageNum": 0,
"bbox": {
"x1": 55.3407,
"y1": 247.0246,
"x2": 558.5602,
"y2": 598.0585
},
"uuid": "2b10c1a2-393c-4fca-b9e3-0ad5b774ac84",
"label": "table",
"tableLines": [
{
"x1": 0,
"y1": 16,
"x2": 1399,
"y2": 16
},
...
],
"imageInfo": {
"height": 693,
"width": 1414
}
},
...
]
}
```
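As an illustration, the following sketch reads a response of this shape and summarizes each entry. The helper `summarize_tables` and the trimmed sample payload are assumptions made for this example; treating a line with `y1 == y2` as horizontal is plain geometry, not a documented guarantee of the service.

```python
import json

# Trimmed sample with the fields shown above, written as valid JSON.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "pageNum": 0,
      "bbox": {"x1": 55.3407, "y1": 247.0246, "x2": 558.5602, "y2": 598.0585},
      "uuid": "2b10c1a2-393c-4fca-b9e3-0ad5b774ac84",
      "label": "table",
      "tableLines": [{"x1": 0, "y1": 16, "x2": 1399, "y2": 16}],
      "imageInfo": {"height": 693, "width": 1414}
    }
  ]
}
"""


def summarize_tables(response):
    """Hypothetical helper: build one summary dict per entry in response["data"]."""
    summaries = []
    for entry in response["data"]:
        bbox = entry["bbox"]
        lines = entry.get("tableLines", [])
        summaries.append({
            "page": entry["pageNum"],
            "uuid": entry["uuid"],
            "label": entry["label"],
            "bbox_width": bbox["x2"] - bbox["x1"],
            "bbox_height": bbox["y2"] - bbox["y1"],
            # A segment whose endpoints share a y value is horizontal,
            # one whose endpoints share an x value is vertical.
            "horizontal_lines": sum(1 for ln in lines if ln["y1"] == ln["y2"]),
            "vertical_lines": sum(1 for ln in lines if ln["x1"] == ln["x2"]),
        })
    return summaries


summaries = summarize_tables(json.loads(SAMPLE_RESPONSE))
for summary in summaries:
    print(summary)
```

Note that the `bbox` values and the `tableLines` values in the sample are on different scales; the sketch does not attempt to convert between them.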
## Installation
```bash
git clone ssh://git@git.iqser.com:2222/rr/cv-analysis.git
cd cv-analysis
python -m venv env
source env/bin/activate
pip install -e .
pip install -r requirements.txt
dvc pull
```
## Usage
### As an API
The module provides functions for the individual tasks; each returns a collection of points whose form depends on
the specific task.
#### Redaction Detection (API)
The snippet below shows how to find the outlines of previous redactions.
```python
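from cv_analysis.redaction_detection import find_redactions
import pdf2image
import numpy as np
pdf_path = ...
page_index = ...  # page number to render (pdf2image counts pages from 1)
# Render only the requested page; convert_from_path returns a list of PIL images.
page = pdf2image.convert_from_path(pdf_path, first_page=page_index, last_page=page_index)[0]
# Convert the PIL image to a NumPy array before running the detection.
page = np.array(page)
redaction_contours = find_redactions(page)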
```
## As a CLI Tool
Core API functionalities can be used through a CLI.
### Table Parsing
The table parsing utility detects tables and segments them into individual cells.
```bash
python scripts/annotate.py data/test_pdf.pdf 7 --type table
```
The below image shows a parsed table, where each table cell has been detected individually.
![Table Parsing Demonstration](data/table_parsing.png)
### Redaction Detection (CLI)
The redaction detection utility detects previous redactions in PDFs (filled black rectangles).
```bash
python scripts/annotate.py data/test_pdf.pdf 2 --type redaction
```
The below image shows the detected redactions with green outlines.
![Redaction Detection Demonstration](data/redaction_detection.png)
### Layout Parsing
The layout parsing utility detects elements such as paragraphs, tables and figures.
```bash
python scripts/annotate.py data/test_pdf.pdf 7 --type layout
```
The below image shows the detected layout elements on a page.
![Layout Parsing Demonstration](data/layout_parsing.png)
### Figure Detection
The figure detection utility detects figures specifically, which can be missed by the generic layout parsing utility.
```bash
python scripts/annotate.py data/test_pdf.pdf 3 --type figure
```
The below image shows the detected figure on a page.
![Figure Detection Demonstration](data/figure_detection.png)
## Running as a service
### Building
Build base image
```bash
bash setup/docker.sh
```
Build head image
```bash
docker build -f Dockerfile -t cv-analysis . --build-arg BASE_ROOT=""
```
### Usage (service)
Shell 1
```bash
docker run --rm --net=host cv-analysis
```
Shell 2
```bash
python scripts/client_mock.py --pdf_path /path/to/a/pdf
```

37
docs/build/html/_sources/index.rst.txt vendored Normal file
View File

@ -0,0 +1,37 @@
.. CV Analysis Service documentation master file, created by
sphinx-quickstart on Mon Sep 12 12:04:24 2022.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
=============================================
Welcome to CV Analysis Service documentation!
=============================================
.. note::
If you'd like to change the looks of things 👉 https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html
Table of Contents
-----------------
.. toctree::
:maxdepth: 3
:caption: README
README.md
.. toctree::
:maxdepth: 3
:caption: Modules
modules/cv_analysis
modules/serve
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

View File

@ -0,0 +1,7 @@
cv\_analysis.config module
==========================
.. automodule:: cv_analysis.config
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.figure\_detection.figure\_detection module
=======================================================
.. automodule:: cv_analysis.figure_detection.figure_detection
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.figure\_detection.figures module
=============================================
.. automodule:: cv_analysis.figure_detection.figures
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,17 @@
cv\_analysis.figure\_detection package
======================================
.. automodule:: cv_analysis.figure_detection
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
cv_analysis.figure_detection.figure_detection
cv_analysis.figure_detection.figures
cv_analysis.figure_detection.text

View File

@ -0,0 +1,7 @@
cv\_analysis.figure\_detection.text module
==========================================
.. automodule:: cv_analysis.figure_detection.text
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.layout\_parsing module
===================================
.. automodule:: cv_analysis.layout_parsing
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.locations module
=============================
.. automodule:: cv_analysis.locations
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.redaction\_detection module
========================================
.. automodule:: cv_analysis.redaction_detection
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,30 @@
cv\_analysis package
====================
.. automodule:: cv_analysis
:members:
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 4
cv_analysis.figure_detection
cv_analysis.server
cv_analysis.utils
Submodules
----------
.. toctree::
:maxdepth: 4
cv_analysis.config
cv_analysis.layout_parsing
cv_analysis.locations
cv_analysis.redaction_detection
cv_analysis.table_inference
cv_analysis.table_parsing

View File

@ -0,0 +1,7 @@
cv\_analysis.server.pipeline module
===================================
.. automodule:: cv_analysis.server.pipeline
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,15 @@
cv\_analysis.server package
===========================
.. automodule:: cv_analysis.server
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
cv_analysis.server.pipeline

View File

@ -0,0 +1,7 @@
cv\_analysis.table\_inference module
====================================
.. automodule:: cv_analysis.table_inference
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.table\_parsing module
==================================
.. automodule:: cv_analysis.table_parsing
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.annotate module
==================================
.. automodule:: cv_analysis.utils.annotate
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.banner module
================================
.. automodule:: cv_analysis.utils.banner
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.connect\_rects module
========================================
.. automodule:: cv_analysis.utils.connect_rects
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.display module
=================================
.. automodule:: cv_analysis.utils.display
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.draw module
==============================
.. automodule:: cv_analysis.utils.draw
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.filters module
=================================
.. automodule:: cv_analysis.utils.filters
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.image\_extraction module
===========================================
.. automodule:: cv_analysis.utils.image_extraction
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.open\_pdf module
===================================
.. automodule:: cv_analysis.utils.open_pdf
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.postprocessing module
========================================
.. automodule:: cv_analysis.utils.postprocessing
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
cv\_analysis.utils.preprocessing module
=======================================
.. automodule:: cv_analysis.utils.preprocessing
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,28 @@
cv\_analysis.utils package
==========================

.. automodule:: cv_analysis.utils
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

.. toctree::
   :maxdepth: 4

   cv_analysis.utils.annotate
   cv_analysis.utils.banner
   cv_analysis.utils.connect_rects
   cv_analysis.utils.display
   cv_analysis.utils.draw
   cv_analysis.utils.filters
   cv_analysis.utils.image_extraction
   cv_analysis.utils.open_pdf
   cv_analysis.utils.postprocessing
   cv_analysis.utils.preprocessing
   cv_analysis.utils.structures
   cv_analysis.utils.test_metrics
   cv_analysis.utils.utils
   cv_analysis.utils.visual_logging

@ -0,0 +1,7 @@
cv\_analysis.utils.structures module
====================================

.. automodule:: cv_analysis.utils.structures
   :members:
   :undoc-members:
   :show-inheritance:

@ -0,0 +1,7 @@
cv\_analysis.utils.test\_metrics module
=======================================

.. automodule:: cv_analysis.utils.test_metrics
   :members:
   :undoc-members:
   :show-inheritance:

@ -0,0 +1,7 @@
cv\_analysis.utils.utils module
===============================

.. automodule:: cv_analysis.utils.utils
   :members:
   :undoc-members:
   :show-inheritance:

@ -0,0 +1,7 @@
cv\_analysis.utils.visual\_logging module
=========================================

.. automodule:: cv_analysis.utils.visual_logging
   :members:
   :undoc-members:
   :show-inheritance:

@ -0,0 +1,7 @@
serve module
============

.. automodule:: serve
   :members:
   :undoc-members:
   :show-inheritance:

docs/build/html/_static/basic.css vendored Normal file
@ -0,0 +1,925 @@
/*
* basic.css
* ~~~~~~~~~
*
* Sphinx stylesheet -- basic theme.
*
* :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
/* -- main layout ----------------------------------------------------------- */
div.clearer {
clear: both;
}
div.section::after {
display: block;
content: '';
clear: left;
}
/* -- relbar ---------------------------------------------------------------- */
div.related {
width: 100%;
font-size: 90%;
}
div.related h3 {
display: none;
}
div.related ul {
margin: 0;
padding: 0 0 0 10px;
list-style: none;
}
div.related li {
display: inline;
}
div.related li.right {
float: right;
margin-right: 5px;
}
/* -- sidebar --------------------------------------------------------------- */
div.sphinxsidebarwrapper {
padding: 10px 5px 0 10px;
}
div.sphinxsidebar {
float: left;
width: 270px;
margin-left: -100%;
font-size: 90%;
word-wrap: break-word;
overflow-wrap : break-word;
}
div.sphinxsidebar ul {
list-style: none;
}
div.sphinxsidebar ul ul,
div.sphinxsidebar ul.want-points {
margin-left: 20px;
list-style: square;
}
div.sphinxsidebar ul ul {
margin-top: 0;
margin-bottom: 0;
}
div.sphinxsidebar form {
margin-top: 10px;
}
div.sphinxsidebar input {
border: 1px solid #98dbcc;
font-family: sans-serif;
font-size: 1em;
}
div.sphinxsidebar #searchbox form.search {
overflow: hidden;
}
div.sphinxsidebar #searchbox input[type="text"] {
float: left;
width: 80%;
padding: 0.25em;
box-sizing: border-box;
}
div.sphinxsidebar #searchbox input[type="submit"] {
float: left;
width: 20%;
border-left: none;
padding: 0.25em;
box-sizing: border-box;
}
img {
border: 0;
max-width: 100%;
}
/* -- search page ----------------------------------------------------------- */
ul.search {
margin: 10px 0 0 20px;
padding: 0;
}
ul.search li {
padding: 5px 0 5px 20px;
background-image: url(file.png);
background-repeat: no-repeat;
background-position: 0 7px;
}
ul.search li a {
font-weight: bold;
}
ul.search li p.context {
color: #888;
margin: 2px 0 0 30px;
text-align: left;
}
ul.keywordmatches li.goodmatch a {
font-weight: bold;
}
/* -- index page ------------------------------------------------------------ */
table.contentstable {
width: 90%;
margin-left: auto;
margin-right: auto;
}
table.contentstable p.biglink {
line-height: 150%;
}
a.biglink {
font-size: 1.3em;
}
span.linkdescr {
font-style: italic;
padding-top: 5px;
font-size: 90%;
}
/* -- general index --------------------------------------------------------- */
table.indextable {
width: 100%;
}
table.indextable td {
text-align: left;
vertical-align: top;
}
table.indextable ul {
margin-top: 0;
margin-bottom: 0;
list-style-type: none;
}
table.indextable > tbody > tr > td > ul {
padding-left: 0em;
}
table.indextable tr.pcap {
height: 10px;
}
table.indextable tr.cap {
margin-top: 10px;
background-color: #f2f2f2;
}
img.toggler {
margin-right: 3px;
margin-top: 3px;
cursor: pointer;
}
div.modindex-jumpbox {
border-top: 1px solid #ddd;
border-bottom: 1px solid #ddd;
margin: 1em 0 1em 0;
padding: 0.4em;
}
div.genindex-jumpbox {
border-top: 1px solid #ddd;
border-bottom: 1px solid #ddd;
margin: 1em 0 1em 0;
padding: 0.4em;
}
/* -- domain module index --------------------------------------------------- */
table.modindextable td {
padding: 2px;
border-collapse: collapse;
}
/* -- general body styles --------------------------------------------------- */
div.body {
min-width: 360px;
max-width: 800px;
}
div.body p, div.body dd, div.body li, div.body blockquote {
-moz-hyphens: auto;
-ms-hyphens: auto;
-webkit-hyphens: auto;
hyphens: auto;
}
a.headerlink {
visibility: hidden;
}
a:visited {
color: #551A8B;
}
h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
h4:hover > a.headerlink,
h5:hover > a.headerlink,
h6:hover > a.headerlink,
dt:hover > a.headerlink,
caption:hover > a.headerlink,
p.caption:hover > a.headerlink,
div.code-block-caption:hover > a.headerlink {
visibility: visible;
}
div.body p.caption {
text-align: inherit;
}
div.body td {
text-align: left;
}
.first {
margin-top: 0 !important;
}
p.rubric {
margin-top: 30px;
font-weight: bold;
}
img.align-left, figure.align-left, .figure.align-left, object.align-left {
clear: left;
float: left;
margin-right: 1em;
}
img.align-right, figure.align-right, .figure.align-right, object.align-right {
clear: right;
float: right;
margin-left: 1em;
}
img.align-center, figure.align-center, .figure.align-center, object.align-center {
display: block;
margin-left: auto;
margin-right: auto;
}
img.align-default, figure.align-default, .figure.align-default {
display: block;
margin-left: auto;
margin-right: auto;
}
.align-left {
text-align: left;
}
.align-center {
text-align: center;
}
.align-default {
text-align: center;
}
.align-right {
text-align: right;
}
/* -- sidebars -------------------------------------------------------------- */
div.sidebar,
aside.sidebar {
margin: 0 0 0.5em 1em;
border: 1px solid #ddb;
padding: 7px;
background-color: #ffe;
width: 40%;
float: right;
clear: right;
overflow-x: auto;
}
p.sidebar-title {
font-weight: bold;
}
nav.contents,
aside.topic,
div.admonition, div.topic, blockquote {
clear: left;
}
/* -- topics ---------------------------------------------------------------- */
nav.contents,
aside.topic,
div.topic {
border: 1px solid #ccc;
padding: 7px;
margin: 10px 0 10px 0;
}
p.topic-title {
font-size: 1.1em;
font-weight: bold;
margin-top: 10px;
}
/* -- admonitions ----------------------------------------------------------- */
div.admonition {
margin-top: 10px;
margin-bottom: 10px;
padding: 7px;
}
div.admonition dt {
font-weight: bold;
}
p.admonition-title {
margin: 0px 10px 5px 0px;
font-weight: bold;
}
div.body p.centered {
text-align: center;
margin-top: 25px;
}
/* -- content of sidebars/topics/admonitions -------------------------------- */
div.sidebar > :last-child,
aside.sidebar > :last-child,
nav.contents > :last-child,
aside.topic > :last-child,
div.topic > :last-child,
div.admonition > :last-child {
margin-bottom: 0;
}
div.sidebar::after,
aside.sidebar::after,
nav.contents::after,
aside.topic::after,
div.topic::after,
div.admonition::after,
blockquote::after {
display: block;
content: '';
clear: both;
}
/* -- tables ---------------------------------------------------------------- */
table.docutils {
margin-top: 10px;
margin-bottom: 10px;
border: 0;
border-collapse: collapse;
}
table.align-center {
margin-left: auto;
margin-right: auto;
}
table.align-default {
margin-left: auto;
margin-right: auto;
}
table caption span.caption-number {
font-style: italic;
}
table caption span.caption-text {
}
table.docutils td, table.docutils th {
padding: 1px 8px 1px 5px;
border-top: 0;
border-left: 0;
border-right: 0;
border-bottom: 1px solid #aaa;
}
th {
text-align: left;
padding-right: 5px;
}
table.citation {
border-left: solid 1px gray;
margin-left: 1px;
}
table.citation td {
border-bottom: none;
}
th > :first-child,
td > :first-child {
margin-top: 0px;
}
th > :last-child,
td > :last-child {
margin-bottom: 0px;
}
/* -- figures --------------------------------------------------------------- */
div.figure, figure {
margin: 0.5em;
padding: 0.5em;
}
div.figure p.caption, figcaption {
padding: 0.3em;
}
div.figure p.caption span.caption-number,
figcaption span.caption-number {
font-style: italic;
}
div.figure p.caption span.caption-text,
figcaption span.caption-text {
}
/* -- field list styles ----------------------------------------------------- */
table.field-list td, table.field-list th {
border: 0 !important;
}
.field-list ul {
margin: 0;
padding-left: 1em;
}
.field-list p {
margin: 0;
}
.field-name {
-moz-hyphens: manual;
-ms-hyphens: manual;
-webkit-hyphens: manual;
hyphens: manual;
}
/* -- hlist styles ---------------------------------------------------------- */
table.hlist {
margin: 1em 0;
}
table.hlist td {
vertical-align: top;
}
/* -- object description styles --------------------------------------------- */
.sig {
font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
}
.sig-name, code.descname {
background-color: transparent;
font-weight: bold;
}
.sig-name {
font-size: 1.1em;
}
code.descname {
font-size: 1.2em;
}
.sig-prename, code.descclassname {
background-color: transparent;
}
.optional {
font-size: 1.3em;
}
.sig-paren {
font-size: larger;
}
.sig-param.n {
font-style: italic;
}
/* C++ specific styling */
.sig-inline.c-texpr,
.sig-inline.cpp-texpr {
font-family: unset;
}
.sig.c .k, .sig.c .kt,
.sig.cpp .k, .sig.cpp .kt {
color: #0033B3;
}
.sig.c .m,
.sig.cpp .m {
color: #1750EB;
}
.sig.c .s, .sig.c .sc,
.sig.cpp .s, .sig.cpp .sc {
color: #067D17;
}
/* -- other body styles ----------------------------------------------------- */
ol.arabic {
list-style: decimal;
}
ol.loweralpha {
list-style: lower-alpha;
}
ol.upperalpha {
list-style: upper-alpha;
}
ol.lowerroman {
list-style: lower-roman;
}
ol.upperroman {
list-style: upper-roman;
}
:not(li) > ol > li:first-child > :first-child,
:not(li) > ul > li:first-child > :first-child {
margin-top: 0px;
}
:not(li) > ol > li:last-child > :last-child,
:not(li) > ul > li:last-child > :last-child {
margin-bottom: 0px;
}
ol.simple ol p,
ol.simple ul p,
ul.simple ol p,
ul.simple ul p {
margin-top: 0;
}
ol.simple > li:not(:first-child) > p,
ul.simple > li:not(:first-child) > p {
margin-top: 0;
}
ol.simple p,
ul.simple p {
margin-bottom: 0;
}
aside.footnote > span,
div.citation > span {
float: left;
}
aside.footnote > span:last-of-type,
div.citation > span:last-of-type {
padding-right: 0.5em;
}
aside.footnote > p {
margin-left: 2em;
}
div.citation > p {
margin-left: 4em;
}
aside.footnote > p:last-of-type,
div.citation > p:last-of-type {
margin-bottom: 0em;
}
aside.footnote > p:last-of-type:after,
div.citation > p:last-of-type:after {
content: "";
clear: both;
}
dl.field-list {
display: grid;
grid-template-columns: fit-content(30%) auto;
}
dl.field-list > dt {
font-weight: bold;
word-break: break-word;
padding-left: 0.5em;
padding-right: 5px;
}
dl.field-list > dd {
padding-left: 0.5em;
margin-top: 0em;
margin-left: 0em;
margin-bottom: 0em;
}
dl {
margin-bottom: 15px;
}
dd > :first-child {
margin-top: 0px;
}
dd ul, dd table {
margin-bottom: 10px;
}
dd {
margin-top: 3px;
margin-bottom: 10px;
margin-left: 30px;
}
.sig dd {
margin-top: 0px;
margin-bottom: 0px;
}
.sig dl {
margin-top: 0px;
margin-bottom: 0px;
}
dl > dd:last-child,
dl > dd:last-child > :last-child {
margin-bottom: 0;
}
dt:target, span.highlighted {
background-color: #fbe54e;
}
rect.highlighted {
fill: #fbe54e;
}
dl.glossary dt {
font-weight: bold;
font-size: 1.1em;
}
.versionmodified {
font-style: italic;
}
.system-message {
background-color: #fda;
padding: 5px;
border: 3px solid red;
}
.footnote:target {
background-color: #ffa;
}
.line-block {
display: block;
margin-top: 1em;
margin-bottom: 1em;
}
.line-block .line-block {
margin-top: 0;
margin-bottom: 0;
margin-left: 1.5em;
}
.guilabel, .menuselection {
font-family: sans-serif;
}
.accelerator {
text-decoration: underline;
}
.classifier {
font-style: oblique;
}
.classifier:before {
font-style: normal;
margin: 0 0.5em;
content: ":";
display: inline-block;
}
abbr, acronym {
border-bottom: dotted 1px;
cursor: help;
}
.translated {
background-color: rgba(207, 255, 207, 0.2)
}
.untranslated {
background-color: rgba(255, 207, 207, 0.2)
}
/* -- code displays --------------------------------------------------------- */
pre {
overflow: auto;
overflow-y: hidden; /* fixes display issues on Chrome browsers */
}
pre, div[class*="highlight-"] {
clear: both;
}
span.pre {
-moz-hyphens: none;
-ms-hyphens: none;
-webkit-hyphens: none;
hyphens: none;
white-space: nowrap;
}
div[class*="highlight-"] {
margin: 1em 0;
}
td.linenos pre {
border: 0;
background-color: transparent;
color: #aaa;
}
table.highlighttable {
display: block;
}
table.highlighttable tbody {
display: block;
}
table.highlighttable tr {
display: flex;
}
table.highlighttable td {
margin: 0;
padding: 0;
}
table.highlighttable td.linenos {
padding-right: 0.5em;
}
table.highlighttable td.code {
flex: 1;
overflow: hidden;
}
.highlight .hll {
display: block;
}
div.highlight pre,
table.highlighttable pre {
margin: 0;
}
div.code-block-caption + div {
margin-top: 0;
}
div.code-block-caption {
margin-top: 1em;
padding: 2px 5px;
font-size: small;
}
div.code-block-caption code {
background-color: transparent;
}
table.highlighttable td.linenos,
span.linenos,
div.highlight span.gp { /* gp: Generic.Prompt */
user-select: none;
-webkit-user-select: text; /* Safari fallback only */
-webkit-user-select: none; /* Chrome/Safari */
-moz-user-select: none; /* Firefox */
-ms-user-select: none; /* IE10+ */
}
div.code-block-caption span.caption-number {
padding: 0.1em 0.3em;
font-style: italic;
}
div.code-block-caption span.caption-text {
}
div.literal-block-wrapper {
margin: 1em 0;
}
code.xref, a code {
background-color: transparent;
font-weight: bold;
}
h1 code, h2 code, h3 code, h4 code, h5 code, h6 code {
background-color: transparent;
}
.viewcode-link {
float: right;
}
.viewcode-back {
float: right;
font-family: sans-serif;
}
div.viewcode-block:target {
margin: -1px -10px;
padding: 0 10px;
}
/* -- math display ---------------------------------------------------------- */
img.math {
vertical-align: middle;
}
div.body div.math p {
text-align: center;
}
span.eqno {
float: right;
}
span.eqno a.headerlink {
position: absolute;
z-index: 1;
}
div.math:hover a.headerlink {
visibility: visible;
}
/* -- printout stylesheet --------------------------------------------------- */
@media print {
div.document,
div.documentwrapper,
div.bodywrapper {
margin: 0 !important;
width: 100%;
}
div.sphinxsidebar,
div.related,
div.footer,
#top-link {
display: none;
}
}

Some files were not shown because too many files have changed in this diff.