maverickstuder
6a0661cf09
RED-7074: Design Subsection section tree structure algorithm
...
* bugfix
2024-05-15 13:51:49 +02:00
maverickstuder
2d33615b94
RED-7074: Design Subsection section tree structure algorithm
...
* added redactmanager logic for headline classification to documine and clarifynd
* refactored headline classification
* added supersection for non-leaf sections (containing other sections instead of only paragraphs, images, ...)
* bugfix for certain edge cases in some files running into error state
2024-05-15 10:29:39 +02:00
maverickstuder
1856fed640
RED-7074: Design Subsection section tree structure algorithm
...
* improved merging of headlines as well as splitting logic so that more headlines are detected correctly
2024-05-14 17:41:44 +02:00
maverickstuder
2fcaeb3d8c
RED-7074: Design Subsection section tree structure algorithm
...
* added supersection and changed logic so that each normal section only contains leaf nodes
* added SectionIdentifier logic for headline splitting and merging
* fixed many edge cases which resulted in error state files
2024-05-14 10:51:05 +02:00
maverickstuder
4e07ba4ff1
RED-7074: Design Subsection section tree structure algorithm
...
* import optimized
2024-05-08 14:16:29 +02:00
maverickstuder
cfb6f0acfa
RED-7074: Design Subsection section tree structure algorithm
...
* extensive refactoring of the splitting logic for text blocks, which resulted in some empty blocks being created that cannot then be localized (i.e. by containsBlock)
2024-05-08 14:15:27 +02:00
maverickstuder
a9338262c5
RED-7074: Design Subsection section tree structure algorithm
...
* fix for boundary error
2024-05-07 15:51:54 +02:00
maverickstuder
d2dc369df3
RED-7074: Design Subsection section tree structure algorithm
...
* temp
2024-05-07 14:25:54 +02:00
maverickstuder
f7aeb9a406
RED-7074: Design Subsection section tree structure algorithm
...
* refactoring
2024-05-02 10:36:36 +02:00
maverickstuder
9bf2f5c56c
Merge remote-tracking branch 'origin/RED-7074' into RED-7074
...
# Conflicts:
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/LayoutParsingPipeline.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/ClassificationDocument.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/outline/OutlineValidationService.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/outline/TableOfContentItem.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/outline/TableOfContents.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/classification/RedactManagerClassificationService.java
# layoutparser-service/layoutparser-service-server/src/test/java/com/knecon/fforesight/service/layoutparser/server/graph/ViewerDocumentTest.java
# layoutparser-service/layoutparser-service-server/src/test/resources/files/new/UTT-Books-53.pdf
2024-04-30 14:44:26 +02:00
maverickstuder
c071a133e6
RED-7074: Design Subsection section tree structure algorithm
...
* added toc enrichment logic and changed section computation to build upon created toc
2024-04-30 14:41:17 +02:00
maverickstuder
9f9ea68706
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:49 +02:00
maverickstuder
85e3cf0ecc
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:49 +02:00
maverickstuder
17756f5977
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:48 +02:00
maverickstuder
59d9d6c3e6
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:34 +02:00
maverickstuder
c888746761
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:34 +02:00
maverickstuder
7279d0a870
RED-7074: Design Subsection section tree structure algorithm
...
* first draft
2024-04-29 15:00:34 +02:00
maverickstuder
c84a199f9d
RED-7074: Design Subsection section tree structure algorithm
...
* first draft
2024-04-29 15:00:32 +02:00
Corina Olariu
4e7c3f584b
RED-8992 - Enable to add annotation on header with line breaks
...
- don't reorder textblocks classified as headers and footers
- add unit test
2024-04-25 11:23:10 +03:00
Dominique Eifländer
8442e60055
RED-8932 Fixed headline not being merged with identifier
2024-04-24 11:45:38 +02:00
Dominique Eifländer
58acbab85f
Merge branch 'RED-8826' into 'main'
...
Red 8826
See merge request fforesight/layout-parser!138
2024-04-23 13:12:51 +02:00
Kilian Schüttler
c1afe9b11f
Red 7384
2024-04-23 12:13:19 +02:00
Dominique Eifländer
683f7f1fb8
RED-8826: Do not classify textblocks in graphics as headlines
2024-04-23 09:28:28 +02:00
Dominique Eifländer
b53930328a
RED-8826: Implemented graphics detection
2024-04-19 15:05:17 +02:00
maverickstuder
09148960cf
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-19 11:31:34 +02:00
maverickstuder
77ee8dd5bd
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-18 17:52:33 +02:00
Kilian Schuettler
f256f9b30f
RED-8995: unclassified text might be missing from document data
...
* treat TablePageBlock.OTHER like PARAGRAPH (no special treatment)
2024-04-18 17:42:34 +02:00
maverickstuder
e9d1bdc94f
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-17 14:31:48 +02:00
maverickstuder
894355c7cd
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-16 12:35:26 +02:00
maverickstuder
ca35feeb63
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-15 16:43:40 +02:00
maverickstuder
a32a43fc62
RED-7074: Design Subsection section tree structure algorithm
...
* first draft
2024-04-10 12:28:42 +02:00
maverickstuder
7f675b41cf
RED-7074: Design Subsection section tree structure algorithm
...
* first draft
2024-04-09 16:53:57 +02:00
Kilian Schüttler
c4d9c5df02
Merge branch 'RED-8747-fp' into 'main'
...
RED-8747 - Entities not merged properly - fp
See merge request fforesight/layout-parser!131
2024-04-09 16:30:02 +02:00
Corina Olariu
976f408237
RED-8747 - Entities not merged properly - fp
...
- rework the extraction of rulings from the table cells
2024-04-09 14:38:48 +03:00
Corina Olariu
014eba9fc3
RED-8747 - Entities not merged properly - fp
...
- fix typo
- add validate table test
2024-04-09 12:14:57 +03:00
yhampe
c13ff7fbf6
RED-8402: Header and footer are not indexed / searched
...
checkstyle
added review comments
2024-04-08 12:17:49 +02:00
yhampe
0c3194276a
RED-8402: Header and footer are not indexed / searched
...
added headers and footers to simplifiedtext
2024-04-08 12:02:36 +02:00
Corina Olariu
f185b13f2b
RED-8747 - Entities not merged properly - fp
...
- use the rulings from the found tables instead of all rulings as splitting rulings in the blockification service
2024-04-08 09:42:32 +03:00
Dominique Eifländer
990c376ce6
Merge branch 'RED-8873' into 'main'
...
RED-8773 - Fix images not appearing on specific file
See merge request fforesight/layout-parser!123
2024-04-05 10:11:23 +02:00
Kilian Schuettler
f18bda1d4e
RED-8799: LayoutGrid is drawn incorrectly for some tables
2024-04-04 13:33:22 +02:00
Andrei Isvoran
456b8fe4a1
RED-8773 - Fix images not appearing on specific file
2024-04-03 10:20:46 +03:00
maverickstuder
9778ece992
RED-8702: Explore document databases to store entityLog
...
* fix for duplicate images in document structure that are linked to multiple sections
2024-04-02 14:19:14 +02:00
Timo Bejan
5c1708f97f
Issue with merging text blocks multiple times
2024-03-22 12:47:05 +02:00
Dominique Eifländer
8e7e588d26
RED-8627: Fixed scrambled text after sorting
2024-03-19 10:58:36 +01:00
Dominique Eifländer
1d765a6baa
RED-7141: Fixed more overlap problems
2024-03-14 16:30:52 +01:00
Dominique Eifländer
27aa418029
RED-7141: Fixed overlapping blocks
2024-03-13 16:14:55 +01:00
Dominique Eifländer
92fd1a72de
RED-7141: Readded lost mergeLinesInZones
2024-03-12 13:42:40 +01:00
Dominique Eifländer
0d3d25e7d7
Merge branch 'RED-7141-hotfix' into 'main'
...
RED-7141: Align backend text sorting with Webviewer sorting
See merge request fforesight/layout-parser!115
2024-03-12 11:15:41 +01:00
maverickstuder
956fbff872
RED-7141: Align backend text sorting with Webviewer sorting
...
* hotfix for tables not being detected due to wrong x-y-sorting
2024-03-12 11:06:53 +01:00
maverickstuder
16be2467fd
RED-8715: Improve NearestNeighbor Algorithm in LayoutParser
...
* replaced the old algorithm with an algorithm based on a kd-tree
2024-03-11 14:42:28 +01:00