Corina Olariu
5f5a6258c5
Merge branch 'main' into RED-9206
2024-06-05 13:34:14 +03:00
Maverick Studer
5d33ad570e
RED-7074: Design Subsection section tree structure algorithm
2024-06-05 12:28:00 +02:00
Corina Olariu
fd698a78fc
RED-9206 - Sections are no longer correctly separated from each other in the test file
...
- introduce new layout parsing type: REDACT_MANAGER_WITHOUT_DUPLICATE_PARAGRAPH to include changes from REDACT_MANAGER apart from duplicate paragraph.
- updated junit tests
2024-06-04 20:55:37 +03:00
Maverick Studer
fc06dba2ce
RED-7074: Design Subsection section tree structure algorithm
2024-06-04 15:07:40 +02:00
Maverick Studer
efb1a748af
RED-7074: Design Subsection section tree structure algorithm
2024-05-28 14:48:21 +02:00
Maverick Studer
48b7a22e2b
RED-7074: Design Subsection section tree structure algorithm
2024-05-24 13:30:25 +02:00
Corina Olariu
0ed1481517
RED-9177 - Layout parser fails to process file
...
- use originFile as viewerDocumentFile
- return layoutGridOCGName as soon as the name is found, without checking further properties
2024-05-22 13:02:42 +03:00
Andrei Isvoran
3835d03036
RED-9149 - Remove header detection
2024-05-20 14:59:34 +03:00
Kilian Schuettler
8648ed0952
hotfix for clarifynd
2024-05-15 14:02:02 +02:00
Andrei Isvoran
40465e8778
RED-9149 - Improvements
2024-05-13 15:13:37 +03:00
Andrei Isvoran
a76b2ace3f
RED-9149 - Address comments
2024-05-13 13:18:33 +03:00
Andrei Isvoran
aeaca2f278
RED-9149 - Header and footer extraction by page-association
2024-05-10 16:04:06 +03:00
Andrei Isvoran
f1dbcc24a2
RED-9149 - Header and footer extraction by page-association
2024-05-10 15:49:08 +03:00
Andrei Isvoran
fda25852d1
RED-9149 - Header and footer extraction by page-association
2024-05-10 15:17:41 +03:00
Dominique Eifländer
87001090d5
RED-8933: Fixed bugs in DocumineClassificationService
2024-05-08 13:01:23 +02:00
Kilian Schuettler
6a65d7f9fc
RED-8825: minor fixes
...
* also added overrides via env variables
2024-05-07 17:37:42 +02:00
Kilian Schuettler
e935cc7b14
RED-8825: some fixes, and experimental column detector
2024-05-06 14:24:39 +02:00
Kilian Schuettler
abb249e966
RED-8825: general layoutparsing improvements
...
* fix checkstyle
2024-05-03 00:15:31 +02:00
Kilian Schuettler
60acbac53f
RED-8825: general layoutparsing improvements
...
* fixing a bunch of coordinates
2024-05-03 00:06:29 +02:00
Kilian Schuettler
a3decd292d
RED-8825: general layoutparsing improvements
...
* fix RulingCleaningService
2024-05-02 23:00:22 +02:00
Kilian Schuettler
b6f0a21886
RED-8825: general layoutparsing improvements
...
* refactor all coordinates
2024-05-02 21:01:25 +02:00
Kilian Schuettler
d61cac8b4f
RED-8825: general layoutparsing improvements
...
* fix tests
2024-04-30 16:06:22 +02:00
Kilian Schuettler
ae46c5f1ca
RED-8825: general layoutparsing improvements
...
* fix tests
2024-04-30 11:55:18 +02:00
Kilian Schuettler
15ea385f4d
RED-8825: general improvements
...
* some more refactoring
* fixed text ruling classification for vertical text
* shrunk min graphics size
2024-04-30 10:44:32 +02:00
Kilian Schuettler
08be18db2d
RED-8825: general improvements
...
* some more refactoring
2024-04-29 20:09:53 +02:00
Kilian Schuettler
64209255cb
RED-8825: general improvements
...
* classify rulings as underline/strikethrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
- Header/Footer by Ruling for all rotations
- actually the ticket, optimizing layoutparsing for documine
2024-04-29 17:24:15 +02:00
Kilian Schuettler
4761d2e1a2
RED-8825: general improvements
...
* classify rulings as underline/strikethrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
- Header/Footer by Ruling for all rotations
- actually the ticket, optimizing layoutparsing for documine
2024-04-29 17:22:33 +02:00
Kilian Schuettler
1916e626df
RED-8825: general improvements
...
* classify rulings as underline/strikethrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
- Header/Footer by Ruling for all rotations
- actually the ticket, optimizing layoutparsing for documine
2024-04-29 17:15:19 +02:00
Kilian Schuettler
e4663ac8db
RED-8825: added split by ruling into every step of docstrum
2024-04-29 15:54:56 +02:00
Kilian Schuettler
6a691183dc
RED-8825: improve layoutparsing
...
* added improved debugging capabilities to viewer-doc
* refactored coordinates (wip)
* refactored line intersection algorithm
* removed cropbox correction from pdfbox text positions
2024-04-29 15:54:56 +02:00
Kilian Schuettler
3dd215288a
RED-8825: improve layoutparsing
...
* added improved debugging capabilities to viewer-doc
* refactored coordinates (wip)
* refactored line intersection algorithm
* removed cropbox correction from pdfbox text positions
2024-04-29 15:54:53 +02:00
Corina Olariu
4e7c3f584b
RED-8992 - Enable adding annotations on headers with line breaks
...
- don't reorder textblocks classified as headers and footers
- add unit test
2024-04-25 11:23:10 +03:00
Dominique Eifländer
8442e60055
RED-8932 Fixed headline not merged with identifier
2024-04-24 11:45:38 +02:00
Dominique Eifländer
58acbab85f
Merge branch 'RED-8826' into 'main'
...
Red 8826
See merge request fforesight/layout-parser!138
2024-04-23 13:12:51 +02:00
Kilian Schüttler
c1afe9b11f
Red 7384
2024-04-23 12:13:19 +02:00
Dominique Eifländer
683f7f1fb8
RED-8826: Do not classify textblocks in graphics as headlines
2024-04-23 09:28:28 +02:00
Dominique Eifländer
b53930328a
RED-8826: Implemented graphics detection
2024-04-19 15:05:17 +02:00
Kilian Schuettler
f256f9b30f
RED-8995: unclassified text might be missing from document data
...
* treat TablePageBlock.OTHER like PARAGRAPH (no special treatment)
2024-04-18 17:42:34 +02:00
Kilian Schüttler
c4d9c5df02
Merge branch 'RED-8747-fp' into 'main'
...
RED-8747 - Entities not merged properly - fp
See merge request fforesight/layout-parser!131
2024-04-09 16:30:02 +02:00
Corina Olariu
976f408237
RED-8747 - Entities not merged properly - fp
...
- rework the extraction of rulings from the table cells
2024-04-09 14:38:48 +03:00
Corina Olariu
014eba9fc3
RED-8747 - Entities not merged properly - fp
...
- fix typo
- add validate table test
2024-04-09 12:14:57 +03:00
yhampe
c13ff7fbf6
RED-8402: Header and footer are not indexed / searched
...
checkstyle
added review comments
2024-04-08 12:17:49 +02:00
yhampe
0c3194276a
RED-8402: Header and footer are not indexed / searched
...
added headers and footers to simplifiedtext
2024-04-08 12:02:36 +02:00
Corina Olariu
f185b13f2b
RED-8747 - Entities not merged properly - fp
...
- use the rulings from the found tables instead of all rulings as splitting rulings in the blockification service
2024-04-08 09:42:32 +03:00
Dominique Eifländer
990c376ce6
Merge branch 'RED-8873' into 'main'
...
RED-8773 - Fix images not appearing on specific file
See merge request fforesight/layout-parser!123
2024-04-05 10:11:23 +02:00
Kilian Schuettler
f18bda1d4e
RED-8799: LayoutGrid is drawn wrong for some tables
2024-04-04 13:33:22 +02:00
Andrei Isvoran
456b8fe4a1
RED-8773 - Fix images not appearing on specific file
2024-04-03 10:20:46 +03:00
maverickstuder
9778ece992
RED-8702: Explore document databases to store entityLog
...
* fix for duplicate images in document structure that are linked to multiple sections
2024-04-02 14:19:14 +02:00
Timo Bejan
5c1708f97f
Issue with merging text blocks multiple times
2024-03-22 12:47:05 +02:00
Dominique Eifländer
8e7e588d26
RED-8627: Fixed scrambled text after sorting
2024-03-19 10:58:36 +01:00