78 Commits

Author SHA1 Message Date
yhampe
b4a225144d RED-8481: Use visual layout parsing to detect signatures
working on failing tests
2024-02-15 10:16:07 +01:00
yhampe
903b1c1fd4 RED-8481: Use visual layout parsing to detect signatures
fixed failing tests because of null pointer
2024-02-15 09:27:07 +01:00
yhampe
c3e7582ee3 RED-8481: Use visual layout parsing to detect signatures
fixed failing tests because of null pointer
2024-02-14 12:33:36 +01:00
yhampe
cfc5db45cd RED-8481: Use visual layout parsing to detect signatures
fixed failing tests because of null pointer
2024-02-14 12:24:32 +01:00
yhampe
fbd0196719 RED-8481: Use visual layout parsing to detect signatures
implemented visuallayoutparsingresult
2024-02-14 12:16:37 +01:00
Kilian Schuettler
23eb0c40a3 RED-8156: refactor ViewerDocumentService as a dependency for ocr-service
* various improvements to experimental parsing steps
* added embed fonts functionality to viewer doc
2024-02-06 16:59:51 +01:00
Timo Bejan
88855de2da Red 8085 2024-01-29 10:31:36 +01:00
Kilian Schüttler
ba1c7c07ab RED-7384: fixes for migration 2023-12-20 12:40:00 +01:00
Dominique Eifländer
dacc2f7f43 DM-589: Filter wrongly detected cells that come from borders caused by rotation at scanning 2023-11-20 15:54:02 +01:00
yhampe
207d9dec97 * added back in if statement
* removed unneeded comment
2023-11-16 12:40:49 +01:00
yhampe
1316a067fe * removed double checking for height of cell 2023-11-16 08:51:12 +01:00
yhampe
e203210ade * removed unneeded properties 2023-11-16 08:23:58 +01:00
Dominique Eifländer
a6ba66b1aa TAAS-103: Fixed values in wrong cells 2023-11-15 13:36:46 +01:00
yhampe
c3e69b2cdf * fixed bug with incorrect empty cell count by adding threshold to cell.contains 2023-11-15 10:44:47 +01:00
yhampe
f69331e7d8 * renamed page to firstPage in DocumentStructure and Table 2023-11-07 10:21:19 +01:00
yhampe
01493dc033 TAAS-103: Table Detection and rotated text
* added page property to DocumentStructure to be able to get page of found tables
* added a method to TableExtractionService to get the table area
* added calculateMinCharWidthAndMaxCharHeightInsideTable to LayoutParsingPipeline to calculate the values based upon table area
* refactored PDFLinesTextStripper for better readability
* removed textMatrix from RedTextPosition as it is no longer needed
2023-11-07 08:47:28 +01:00
Corina Olariu
0e0a811f9d RED-7806 - Specific customer document cannot be processed
- add brackets
2023-10-25 11:36:54 +03:00
Corina Olariu
efa3d75479 RED-7806 - Specific customer document cannot be processed
- check for font name null before using to avoid the NPE
2023-10-25 09:16:47 +03:00
Corina Olariu
3bab61c446 RED-7434 - Remove Section Grid entirely
- remove sectionGrid relation (including SectionGridCreatorService)
- update junit tests
2023-10-20 09:09:22 +03:00
Dominique Eifländer
567cbc178b hotfix: Fixed parsing for specific taas document 2023-10-17 15:52:19 +02:00
Corina Olariu
3839de215c RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- rollback to getDir().getDegrees()
2023-10-04 15:27:13 +03:00
Corina Olariu
b4d68594f1 RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- use rotation instead of getDir().getDegrees()
2023-10-04 14:22:15 +03:00
Corina Olariu
99ed331a1e RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- use getXDirAdj instead of getX
- add fontSizeCounter for landscape pages also
2023-10-04 14:13:38 +03:00
Corina Olariu
f2c0991987 RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- fix PMD findings
2023-10-04 14:09:46 +03:00
Kilian Schuettler
5792ff4a93 TAAS-104: merge visually intersecting Paragraphs
* fix build
2023-09-05 16:54:23 +02:00
Kilian Schuettler
621c3f269d TAAS-104: merge visually intersecting Paragraphs 2023-09-05 16:09:05 +02:00
deiflaender
306a53ea79 RED-7461: Fixed wrong textblock classification if footer is marked as header 2023-09-01 12:07:47 +02:00
Kilian Schuettler
28ec4c9ccb TAAS-89: added log entry and an end2end test 2023-08-31 14:28:18 +02:00
Kilian Schuettler
f87e2d75b5 TAAS-89: fixed weird bug with empty sections 2023-08-31 11:41:22 +02:00
Kilian Schuettler
261ef4c367 TAAS-89: added some more documentation
* fixed weird bug with empty sections
2023-08-31 10:49:32 +02:00
Kilian Schuettler
3a18923ef5 upgrade PDFBox to 3.0.0
* disable experimental ruling header stuff
2023-08-21 17:54:20 +02:00
Kilian Schuettler
2b15fd1d3c RED-7461: improve header/footer recognition 2023-08-21 17:49:13 +02:00
deiflaender
0cb8029f0a RED-7461: Fixed PR findings 2023-08-21 16:57:37 +02:00
deiflaender
b270b9c942 RED-7461: Use marked content to classify headers and footers if available 2023-08-21 16:02:24 +02:00
deiflaender
60615ec5d8 RED-7461: First working iteration of header and footer improvement 2023-08-21 15:31:11 +02:00
Timo Bejan
83d39ba3a5 Fixed issue with weird colors 2023-08-18 16:21:45 +03:00
Kilian Schuettler
0387cdd143 RED-7158: fix for all page rotations
* also make lines thinner
2023-08-15 14:55:41 +02:00
Kilian Schuettler
9aa9cb2d54 RED-7158: add layoutgrid into new ViewerDocument as optional content
* set layer to invisible by default
2023-08-15 13:14:16 +02:00
Kilian Schuettler
63de8ef82d RED-7158: add layoutgrid into new ViewerDocument as optional content
* downgraded storage-commons
2023-08-14 16:07:11 +02:00
Kilian Schuettler
ea0af08c31 RED-7851: add layoutgrid to new viewer document as optional content 2023-08-14 16:06:23 +02:00
Kilian Schuettler
4bd6e7e343 update PDFBox Version 2023-08-09 12:41:28 +02:00
Kilian Schuettler
17259ed805 add renovate, fix checkstyle 2023-08-09 10:11:02 +02:00
Andrei Isvoran
5c1dca5933 RED-6864 - Switch to DELETE_ON_CLOSE 2023-08-09 09:30:37 +03:00
Andrei Isvoran
cfca5376a0 RED-6864 - Switch to new storage-commons download 2023-08-08 17:16:40 +02:00
deiflaender
5877aea3f7 DM-165: Fixed numberFormatException on German local machines 2023-08-07 12:12:00 +02:00
deiflaender
f2b92de827 DM-165: Fixed indexOutOfBounds error in TableNodeFactory 2023-08-05 10:20:05 +02:00
Kilian Schuettler
4a5464d6aa Refactoring to make downstream refactoring easier 2023-08-04 15:16:36 +02:00
deiflaender
150aea55c0 RED-5253: Ported last documine changes 2023-08-04 09:55:35 +02:00
Kilian Schuettler
d6a74dc9f9 add field id to image data 2023-07-31 16:32:11 +02:00
Kilian Schuettler
2a55654fcf add simplifiedText 2023-07-31 15:30:03 +02:00