maverickstuder
0c8b2e6d44
RED-7074: Design Subsection section tree structure algorithm
...
* added abstract class SectionNode
* Section and SuperSection both extend the SectionNode class, so that neither inherits from the other and no fields are duplicated
2024-05-22 13:02:16 +02:00
maverickstuder
b50bfed69d
RED-7074: Design Subsection section tree structure algorithm
...
* fix all failing tests
2024-05-15 16:40:57 +02:00
maverickstuder
49f13d1f03
RED-7074: Design Subsection section tree structure algorithm
...
* post rebase fixup
2024-05-15 15:09:31 +02:00
maverickstuder
61c90fc30d
Merge branch 'main' into RED-7074
...
# Conflicts:
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/LayoutParsingPipeline.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/model/text/TextPageBlock.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/SectionsBuilderService.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/TableExtractionService.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/blockification/DocstrumBlockificationService.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/classification/DocuMineClassificationService.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/factory/DocumentGraphFactory.java
# layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/services/factory/SectionNodeFactory.java
# layoutparser-service/layoutparser-service-server/src/test/java/com/knecon/fforesight/service/layoutparser/server/HeadlinesGoldStandardIntegrationTest.java
2024-05-15 14:17:59 +02:00
maverickstuder
6a0661cf09
RED-7074: Design Subsection section tree structure algorithm
...
* bugfix
2024-05-15 13:51:49 +02:00
maverickstuder
2d33615b94
RED-7074: Design Subsection section tree structure algorithm
...
* added redactmanager logic for headline classification to documine and clarifynd
* refactored headline classification
* added supersection for non-leaf sections (containing other sections instead of only paragraphs, images, ...)
* bugfix for certain edge cases in some files running into error state
2024-05-15 10:29:39 +02:00
maverickstuder
1856fed640
RED-7074: Design Subsection section tree structure algorithm
...
* improved merging of headlines as well as splitting logic so that more headlines are detected correctly
2024-05-14 17:41:44 +02:00
maverickstuder
2fcaeb3d8c
RED-7074: Design Subsection section tree structure algorithm
...
* added supersection and changed logic so that each normal section only contains leaf nodes
* added SectionIdentifier logic for headline splitting and merging
* fixed many edge cases which resulted in error state files
2024-05-14 10:51:05 +02:00
maverickstuder
cfb6f0acfa
RED-7074: Design Subsection section tree structure algorithm
...
* lots of refactoring of the splitting logic for text blocks, which resulted in some empty blocks being created that cannot then be localized (i.e. by containsBlock)
2024-05-08 14:15:27 +02:00
maverickstuder
d2dc369df3
RED-7074: Design Subsection section tree structure algorithm
...
* temp
2024-05-07 14:25:54 +02:00
Kilian Schuettler
bcd1eb9afa
RED-8825: general layoutparsing improvements
...
* added test for table line classification
2024-05-03 00:13:48 +02:00
Kilian Schuettler
60acbac53f
RED-8825: general layoutparsing improvements
...
* fixing a bunch of coordinates
2024-05-03 00:06:29 +02:00
Kilian Schuettler
b6f0a21886
RED-8825: general layoutparsing improvements
...
* refactor all coordinates
2024-05-02 21:01:25 +02:00
maverickstuder
f7aeb9a406
RED-7074: Design Subsection section tree structure algorithm
...
* refactoring
2024-05-02 10:36:36 +02:00
maverickstuder
c071a133e6
RED-7074: Design Subsection section tree structure algorithm
...
* added toc enrichment logic and changed section computation to build upon created toc
2024-04-30 14:41:17 +02:00
Kilian Schuettler
ae46c5f1ca
RED-8825: general layoutparsing improvements
...
* fix tests
2024-04-30 11:55:18 +02:00
Kilian Schuettler
15ea385f4d
RED-8825: general improvements
...
* some more refactoring
* fixed text ruling classification for vertical text
* shrunk min graphics size
2024-04-30 10:44:32 +02:00
Kilian Schuettler
08be18db2d
RED-8825: general improvements
...
* some more refactoring
2024-04-29 20:09:53 +02:00
Kilian Schuettler
1916e626df
RED-8825: general improvements
...
* classify rulings as underline/strikethrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
  - Header/Footer by Ruling for all rotations
  - actually the ticket, optimizing layoutparsing for documine
2024-04-29 17:15:19 +02:00
Kilian Schuettler
e4663ac8db
RED-8825: added split by ruling into every step of docstrum
2024-04-29 15:54:56 +02:00
Kilian Schuettler
3dd215288a
RED-8825: improve layoutparsing
...
* added improved debugging capabilities to viewer-doc
* refactored coordinates (wip)
* refactored line intersection algorithm
* removed cropbox correction from pdfbox text positions
2024-04-29 15:54:53 +02:00
maverickstuder
85e3cf0ecc
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:49 +02:00
maverickstuder
17756f5977
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:48 +02:00
maverickstuder
59d9d6c3e6
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:34 +02:00
maverickstuder
c888746761
RED-7074: Design Subsection section tree structure algorithm
...
* first draft: further implementations
2024-04-29 15:00:34 +02:00
maverickstuder
7279d0a870
RED-7074: Design Subsection section tree structure algorithm
...
* first draft
2024-04-29 15:00:34 +02:00
maverickstuder
c84a199f9d
RED-7074: Design Subsection section tree structure algorithm
...
* first draft
2024-04-29 15:00:32 +02:00
Corina Olariu
4e7c3f584b
RED-8992 - Enable to add annotation on header with line breaks
...
- don't reorder textblocks classified as headers and footers
- add unit test
2024-04-25 11:23:10 +03:00
Yannik Hampe
84bdb4d1ed
Merge branch 'RED-8701' into 'main'
...
RED-8701 - Move files to customer data repositories
See merge request fforesight/layout-parser!137
2024-04-25 09:06:35 +02:00
Dominique Eifländer
8442e60055
RED-8932 Fixed headline not being merged with its identifier
2024-04-24 11:45:38 +02:00
Corina Olariu
0ef67fc07b
RED-8701 - Move files to customer data repositories
...
- update junit tests and syngenta submodule
2024-04-23 14:54:56 +03:00
Corina Olariu
bdcb9aeda4
RED-8701 - Move files to customer data repositories
...
- update junit tests
2024-04-23 11:49:29 +03:00
Corina Olariu
6a86036a78
Merge branch 'main' into RED-8701
2024-04-23 11:46:59 +03:00
Corina Olariu
a358d7565e
RED-8701 - Move files to customer data repositories
...
- update junit tests
2024-04-23 11:12:57 +03:00
Corina Olariu
069a6c0b49
RED-8701 - Move files to customer data repositories
...
- update syngenta submodule
2024-04-23 10:44:23 +03:00
Corina Olariu
7eab3a4088
RED-8701 - Move files to customer data repositories
...
- remove customer files from project
2024-04-22 14:57:51 +03:00
Corina Olariu
970fc99ed1
RED-8701 - Move files to customer data repositories
...
- update junit test
2024-04-22 14:14:47 +03:00
Corina Olariu
48c54f63a0
RED-8701 - Move files to customer data repositories
...
- update submodules
2024-04-22 13:57:39 +03:00
Corina Olariu
20e4e5ddff
RED-8701 - Move files to customer data repositories
...
- update unit tests with the new path to submodules for customer files
2024-04-22 13:37:27 +03:00
Dominique Eifländer
b53930328a
RED-8826: Implemented graphics detection
2024-04-19 15:05:17 +02:00
Corina Olariu
cc9816c8cb
RED-8701 - Move files to customer data repositories
...
- use git lfs to store customer files
2024-04-18 20:31:35 +03:00
yhampe
8099a00bb6
RED-8402: Header and footer are not indexed / searched
...
added unit test and file
2024-04-18 14:39:01 +02:00
Corina Olariu
319268c53d
RED-8747 - Entities not merged properly - fp
...
- update test
2024-04-09 12:24:19 +03:00
Corina Olariu
014eba9fc3
RED-8747 - Entities not merged properly - fp
...
- fix typo
- add validate table test
2024-04-09 12:14:57 +03:00
Corina Olariu
f185b13f2b
RED-8747 - Entities not merged properly - fp
...
- use the rulings from the found tables, instead of all rulings, as splitting rulings in the blockification service
2024-04-08 09:42:32 +03:00
Dominique Eifländer
8e7e588d26
RED-8627: Fixed scrambled text after sorting
2024-03-19 10:58:36 +01:00
Dominique Eifländer
1d765a6baa
RED-7141: Fixed more overlap problems
2024-03-14 16:30:52 +01:00
Dominique Eifländer
27aa418029
RED-7141: Fixed overlapping blocks
2024-03-13 16:14:55 +01:00
Dominique Eifländer
92fd1a72de
RED-7141: Re-added lost mergeLinesInZones
2024-03-12 13:42:40 +01:00
maverickstuder
16be2467fd
RED-8715: Improve NearestNeighbor Algorithm in LayoutParser
...
* replaced the old algorithm with an algorithm based on a kd-tree
2024-03-11 14:42:28 +01:00