210 Commits

Author SHA1 Message Date
maverickstuder
b08ed2037e RED-7074: Design Subsection section tree structure algorithm
* fix pmd and checkstyle
2024-05-15 16:46:15 +02:00
maverickstuder
b50bfed69d RED-7074: Design Subsection section tree structure algorithm
* fix all failing tests
2024-05-15 16:40:57 +02:00
maverickstuder
49f13d1f03 RED-7074: Design Subsection section tree structure algorithm
* post rebase fixup
2024-05-15 15:09:31 +02:00
maverickstuder
61c90fc30d Merge branch 'main' into RED-7074
# Conflicts:
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/LayoutParsingPipeline.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/text/TextPageBlock.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/SectionsBuilderService.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/TableExtractionService.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/blockification/DocstrumBlockificationService.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/classification/DocuMineClassificationService.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/factory/DocumentGraphFactory.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/factory/SectionNodeFactory.java
#	layoutparser-service/layoutparser-service-server/src/test/java/com/knecon/fforesight/service/layoutparser/server/HeadlinesGoldStandardIntegrationTest.java
2024-05-15 14:17:59 +02:00
maverickstuder
6a0661cf09 RED-7074: Design Subsection section tree structure algorithm
* bugfix
2024-05-15 13:51:49 +02:00
maverickstuder
2d33615b94 RED-7074: Design Subsection section tree structure algorithm
* added redactmanager logic for headline classification to documine and clarifynd
* refactored headline classification
* added supersection for non-leaf sections (containing other sections instead of only paragraphs, images, ...)
* bugfix for certain edge cases in some files running into error state
2024-05-15 10:29:39 +02:00
maverickstuder
1856fed640 RED-7074: Design Subsection section tree structure algorithm
* improved merging of headlines as well as splitting logic so that more headlines are detected correctly
2024-05-14 17:41:44 +02:00
maverickstuder
2fcaeb3d8c RED-7074: Design Subsection section tree structure algorithm
* added supersection and changed logic so that each normal section only contains leaf nodes
* added SectionIdentifier logic for headline splitting and merging
* fixed many edge cases which resulted in error state files
2024-05-14 10:51:05 +02:00
Andrei Isvoran
40465e8778 RED-9149 - Improvements 2024-05-13 15:13:37 +03:00
Andrei Isvoran
a76b2ace3f RED-9149 - Address comments 2024-05-13 13:18:33 +03:00
Andrei Isvoran
aeaca2f278 RED-9149 - Header and footer extraction by page-association 2024-05-10 16:04:06 +03:00
Andrei Isvoran
f1dbcc24a2 RED-9149 - Header and footer extraction by page-association 2024-05-10 15:49:08 +03:00
Andrei Isvoran
fda25852d1 RED-9149 - Header and footer extraction by page-association 2024-05-10 15:17:41 +03:00
maverickstuder
4e07ba4ff1 RED-7074: Design Subsection section tree structure algorithm
* import optimized
2024-05-08 14:16:29 +02:00
maverickstuder
cfb6f0acfa RED-7074: Design Subsection section tree structure algorithm
* lots of refactoring to the splitting logic for text blocks, which resulted in some empty blocks being created that then cannot be localized (e.g. by containsBlock)
2024-05-08 14:15:27 +02:00
Dominique Eifländer
87001090d5 RED-8933: Fixed bugs in DocumineClassificationService 2024-05-08 13:01:23 +02:00
Kilian Schuettler
6a65d7f9fc RED-8825: minor fixes
* also added overrides via env variables
2024-05-07 17:37:42 +02:00
maverickstuder
a9338262c5 RED-7074: Design Subsection section tree structure algorithm
* fix for boundary error
2024-05-07 15:51:54 +02:00
maverickstuder
d2dc369df3 RED-7074: Design Subsection section tree structure algorithm
* temp
2024-05-07 14:25:54 +02:00
Kilian Schuettler
e935cc7b14 RED-8825: some fixes, and experimental column detector 2024-05-06 14:24:39 +02:00
Kilian Schuettler
abb249e966 RED-8825: general layoutparsing improvements
* fix checkstyle
2024-05-03 00:15:31 +02:00
Kilian Schuettler
60acbac53f RED-8825: general layoutparsing improvements
* fixing a bunch of coordinates
2024-05-03 00:06:29 +02:00
Kilian Schuettler
a3decd292d RED-8825: general layoutparsing improvements
* fix RulingCleaningService
2024-05-02 23:00:22 +02:00
Kilian Schuettler
b6f0a21886 RED-8825: general layoutparsing improvements
* refactor all coordinates
2024-05-02 21:01:25 +02:00
maverickstuder
f7aeb9a406 RED-7074: Design Subsection section tree structure algorithm
* refactoring
2024-05-02 10:36:36 +02:00
Kilian Schuettler
d61cac8b4f RED-8825: general layoutparsing improvements
* fix tests
2024-04-30 16:06:22 +02:00
maverickstuder
9bf2f5c56c Merge remote-tracking branch 'origin/RED-7074' into RED-7074
# Conflicts:
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/LayoutParsingPipeline.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/ClassificationDocument.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/outline/OutlineValidationService.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/outline/TableOfContentItem.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/outline/TableOfContents.java
#	layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/classification/RedactManagerClassificationService.java
#	layoutparser-service/layoutparser-service-server/src/test/java/com/knecon/fforesight/service/layoutparser/server/graph/ViewerDocumentTest.java
#	layoutparser-service/layoutparser-service-server/src/test/resources/files/new/UTT-Books-53.pdf
2024-04-30 14:44:26 +02:00
maverickstuder
c071a133e6 RED-7074: Design Subsection section tree structure algorithm
* added toc enrichment logic and changed section computation to build upon created toc
2024-04-30 14:41:17 +02:00
Kilian Schuettler
ae46c5f1ca RED-8825: general layoutparsing improvements
* fix tests
2024-04-30 11:55:18 +02:00
Kilian Schuettler
15ea385f4d RED-8825: general improvements
* some more refactoring
 * fixed text ruling classification for vertical text
 * shrunk min graphics size
2024-04-30 10:44:32 +02:00
Kilian Schuettler
08be18db2d RED-8825: general improvements
* some more refactoring
2024-04-29 20:09:53 +02:00
Kilian Schuettler
64209255cb RED-8825: general improvements
* classify rulings as underline/strikethrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
 - Header/Footer by Ruling for all rotations
 - actually the ticket, optimizing layoutparsing for documine
2024-04-29 17:24:15 +02:00
Kilian Schuettler
4761d2e1a2 RED-8825: general improvements
* classify rulings as underline/strikethrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
 - Header/Footer by Ruling for all rotations
 - actually the ticket, optimizing layoutparsing for documine
2024-04-29 17:22:33 +02:00
Kilian Schuettler
1916e626df RED-8825: general improvements
* classify rulings as underline/strikethrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
 - Header/Footer by Ruling for all rotations
 - actually the ticket, optimizing layoutparsing for documine
2024-04-29 17:15:19 +02:00
Kilian Schuettler
e4663ac8db RED-8825: added split by ruling into every step of docstrum 2024-04-29 15:54:56 +02:00
Kilian Schuettler
6a691183dc RED-8825: improve layoutparsing
* added improved debugging capabilities to viewer-doc
* refactored coordinates (wip)
* refactored line intersection algorithm
* removed cropbox correction from pdfbox text positions
2024-04-29 15:54:56 +02:00
Kilian Schuettler
3dd215288a RED-8825: improve layoutparsing
* added improved debugging capabilities to viewer-doc
* refactored coordinates (wip)
* refactored line intersection algorithm
* removed cropbox correction from pdfbox text positions
2024-04-29 15:54:53 +02:00
maverickstuder
9f9ea68706 RED-7074: Design Subsection section tree structure algorithm
* first draft: further implementations
2024-04-29 15:00:49 +02:00
maverickstuder
85e3cf0ecc RED-7074: Design Subsection section tree structure algorithm
* first draft: further implementations
2024-04-29 15:00:49 +02:00
maverickstuder
17756f5977 RED-7074: Design Subsection section tree structure algorithm
* first draft: further implementations
2024-04-29 15:00:48 +02:00
maverickstuder
59d9d6c3e6 RED-7074: Design Subsection section tree structure algorithm
* first draft: further implementations
2024-04-29 15:00:34 +02:00
maverickstuder
c888746761 RED-7074: Design Subsection section tree structure algorithm
* first draft: further implementations
2024-04-29 15:00:34 +02:00
maverickstuder
7279d0a870 RED-7074: Design Subsection section tree structure algorithm
* first draft
2024-04-29 15:00:34 +02:00
maverickstuder
c84a199f9d RED-7074: Design Subsection section tree structure algorithm
* first draft
2024-04-29 15:00:32 +02:00
Corina Olariu
4e7c3f584b RED-8992 - Enable to add annotation on header with line breaks
- don't reorder textblocks classified as headers and footers
- add unit test
2024-04-25 11:23:10 +03:00
Dominique Eifländer
8442e60055 RED-8932 Fixed not merged headline with identifier 2024-04-24 11:45:38 +02:00
Dominique Eifländer
58acbab85f Merge branch 'RED-8826' into 'main'
Red 8826

See merge request fforesight/layout-parser!138
2024-04-23 13:12:51 +02:00
Kilian Schüttler
c1afe9b11f Red 7384 2024-04-23 12:13:19 +02:00
Dominique Eifländer
683f7f1fb8 RED-8826: Do not classify textblocks in graphics as headlines 2024-04-23 09:28:28 +02:00
Dominique Eifländer
b53930328a RED-8826: Implemented graphics detection 2024-04-19 15:05:17 +02:00