330 Commits

Author SHA1 Message Date
Kilian Schuettler
6e5b1f1978 RED-9139: move document to module in redaction-service
* add feature version
2024-11-14 16:39:48 +01:00
Kilian Schuettler
96acefed78 RED-9139: move document to module in redaction-service
* add TableOfContents node
2024-11-14 16:39:48 +01:00
Kilian Schuettler
366241e6c6 RED-9139: move document to module in redaction-service
* add TableOfContents node
2024-11-14 16:39:48 +01:00
Kilian Schuettler
7f472ccc52 RED-9139: move document to module in redaction-service
* add TableOfContents node
2024-11-14 16:39:48 +01:00
Kilian Schuettler
6f807c7d94 RED-9139: add new TableOfContents Node
* rename previous TableOfContent to SectionTree
* added protobuf compile script
2024-11-14 16:39:48 +01:00
Kilian Schuettler
6e04c15f3d RED-9139: add new TableOfContents Node
* rename previous TableOfContent to SectionTree
* added protobuf compile script
2024-11-14 16:39:48 +01:00
Kilian Schuettler
1384584e2f RED-9139: more robust TOC detection
* detect numbers in words, and not just whole words that are numbers
2024-11-14 16:39:46 +01:00
Kilian Schuettler
e58011e111 RED-9139: more robust TOC detection
* detect numbers in words, and not just whole words that are numbers
2024-11-14 16:39:21 +01:00
Kilian Schüttler
7ee1f9e360 RED-9139: more robust TOC detection 2024-11-13 10:54:39 +01:00
Kilian Schüttler
c90874da7a RED-10249: regex found incorrectly due to wrong text sorting 2024-11-04 12:51:37 +01:00
Kilian Schüttler
4683c696a5 Merge branch 'RED-10247' into 'main'
RED-10247: dictionary entry not found in footer due to wrong text sorting

See merge request fforesight/layout-parser!251
2024-10-25 18:30:35 +02:00
Kilian Schuettler
95c02ce3cf RED-10247: dictionary entry not found in footer due to wrong text sorting 2024-10-25 17:18:14 +02:00
Kilian Schuettler
65c1f03ea3 RED-10270: fix NumberFormatException 2024-10-24 10:59:05 +02:00
Kilian Schüttler
af05218e37 RED-10127: rename TextPositionSequence to Word 2024-10-18 12:20:15 +02:00
Kilian Schüttler
c64445d54b Hotfix 2024-10-18 12:12:15 +02:00
Kilian Schuettler
5f04b45554 RED-10127: add more units 2024-10-15 09:47:39 +02:00
Kilian Schuettler
9d2596e5ef RED-10127: improve list classification
* add one more format to list identification
* add 'ppb' to known units
* special case for headlines continuing with 14C after the identifier (quite often in some specific files)
2024-10-14 17:21:44 +02:00
Kilian Schüttler
7b073eb4f3 RED-10127: add list classification 2024-10-10 10:50:10 +02:00
Kilian Schüttler
6c7442ac6d RED-10127: improve headline detection 2024-10-09 08:48:48 +02:00
Maverick Studer
9d1ffdd779 RM-187: Footers are recognized in the middle of the page 2024-10-08 14:27:44 +02:00
Maverick Studer
fe2ed1807e RED-9123: Improve performance of re-analysis (Spike) 2024-10-07 12:28:10 +02:00
Maverick Studer
8a80abfff1 RED-9010: remove redaction log 2024-09-19 11:34:32 +02:00
Dominique Eifländer
4f40c9dbc9 RED-9975: Fixed missing section numbers in layout grid 2024-09-18 11:22:37 +02:00
Kilian Schüttler
469da38952 Red 9974: improve headline classification, fix font size calculation 2024-09-16 14:06:48 +02:00
Kilian Schuettler
8e165a41d7 hotfix: viewerDocService doesn't remove existing marked content 2024-09-11 16:34:21 +02:00
Kilian Schüttler
393103e074 RED-9975: improve SuperSection handling 2024-09-11 13:38:09 +02:00
Dominique Eifländer
fec19f4afb RED-9976: Removed sorting that scrambles text in PDFTextStripper 2024-09-10 12:50:37 +02:00
Kilian Schüttler
519e95735c Hotfix: unmerge super large tables 2024-09-05 15:05:21 +02:00
Maverick Studer
46ea7edc4c RED-9942: File only with images not recognised 2024-09-05 10:49:12 +02:00
Kilian Schuettler
ce628a99f7 hotfix: add Java advanced imaging 2024-09-04 15:18:12 +02:00
Maverick Studer
dc892d0fec RED-9524: File processing does not annotate images 2024-09-04 13:27:06 +02:00
Kilian Schuettler
befb6b1df6 RED-9964: fix errors with images 2024-09-03 16:37:48 +02:00
maverickstuder
4a06059258 Update tenant-commons for dlq fix 2024-09-03 13:15:08 +02:00
Dominique Eifländer
7c2db6c3c5 RED-9988: Fixed NPE when image representation is not present 2024-09-02 09:51:59 +02:00
Kilian Schüttler
8e14b74da2 Red 9975: fix outline detection 2024-09-02 09:02:36 +02:00
Kilian Schüttler
c5178ea5c2 RED-9964: don't merge tables on non-consecutive pages 2024-08-30 14:00:48 +02:00
Dominique Eifländer
bb40345f79 RED-9974: Improved headline detection for documine old 2024-08-30 10:36:22 +02:00
Kilian Schuettler
f6ca5a3c17 RED-9975: activate outline detection 2024-08-29 14:18:29 +02:00
Maverick Studer
15e3dced35 Merge branch 'tenants-retry' into 'main'
Tenants retry logic and queue renames

See merge request fforesight/layout-parser!197
2024-08-29 13:46:54 +02:00
Maverick Studer
933054b332 Tenants retry logic and queue renames 2024-08-29 13:46:54 +02:00
Kilian Schuettler
8626b106d0 RED-9975: activate outline detection 2024-08-29 12:16:07 +02:00
Maverick Studer
3b33405cbf RED-9331: Explore possibilities for fair upload / analysis processing per tenant 2024-08-27 09:27:37 +02:00
Maverick Studer
62e07686d7 RED-9918: Azure entity recognition (Spike) 2024-08-26 14:34:46 +02:00
Dominique Eifländer
81469413b0 RED-9760: Fixed nullpointer in TextPageBlock 2024-08-13 13:18:50 +02:00
Kilian Schüttler
8e115dcd8a RED-9760: change compareDouble to something sensible 2024-08-12 16:02:50 +02:00
Kilian Schuettler
b0ae00aa02 hotfix: threshold adjustments 2024-08-12 14:52:18 +02:00
Kilian Schuettler
d16377a24a hotfix: line comparison with center coordinates 2024-08-09 15:45:23 +02:00
Dominique Eifländer
1953b5924f RED-9760: Changed lineSeparation threshold for documine old 2024-08-09 14:42:14 +02:00
Kilian Schüttler
69bcd4f68d hotfix reading order 2024-08-09 11:49:12 +02:00
Timo Bejan
cdc2081785 CLARI-140 - case issue 2024-08-08 22:40:11 +03:00