171 Commits

Author SHA1 Message Date
Dominique Eifländer
5e88cb9a2d RED-8933: Fixed bugs in DocumineClassificationService 2024-05-08 12:56:51 +02:00
Corina Olariu
f4f01644f7 RED-8992 - Enable to add annotation on header with line breaks
- don't reorder textblocks classified as headers and footers
- add unit test
2024-04-24 13:36:36 +03:00
Dominique Eifländer
9eaecdf378 RED-8932 Fixed not merged headline with identifier 2024-04-24 11:44:17 +02:00
Kilian Schuettler
0dda309829 RED-7384: add empty textBlock to Image to ensure continuous textranges across all SemanticNodes 2024-04-23 11:30:13 +02:00
Kilian Schuettler
37f7a6a03f RED-8995: swap incremental save for save without compression to correct wrong layers in rare cases 2024-04-22 11:00:43 +02:00
Kilian Schuettler
2addf63baf RED-8995: unclassified text might be missing from document data
* treat TablePageBlock.OTHER like PARAGRAPH (no special treatment)
2024-04-17 17:40:21 +02:00
Corina Olariu
a01958c842 RED-8747 - Entities not merged properly 2024-04-09 16:30:24 +02:00
Kilian Schüttler
fd7c461c8d RED-8799: LayoutGrid is drawn wrong for some tables 2024-04-05 13:42:36 +02:00
Andrei Isvoran
34b260bb60 RED-8773 - Fix images not appearing on specific file 2024-04-03 10:21:45 +03:00
Dominique Eifländer
350513a699 RED-8627: Fixed scrambled text after sorting 2024-03-19 11:16:07 +01:00
Kilian Schuettler
007cbfd1ee RED-7384: Fixes for ClassCastException
* changed save -> incrementalSave
* always use origin file instead of reusing viewerdoc
* Sometimes the viewer document is corrupted after saving and is missing the content streams on a random page. For the files we viewed, this did not seem to happen with incrementalSave. It might only be a timing issue, though.
2024-03-08 12:42:40 +01:00
Maverick Studer
33f726c689 RED-8550: Faulty table recognition and text duplication leads to huge sections
(cherry picked from commit 74f55a5cbf905d0f869d7aa2c12c80a6d9c42e36)
2024-02-29 13:09:50 +01:00
Maverick Studer
18a28e82d0 RED-8550: Faulty table recognition and text duplication leads to huge sections
* cherrypick
2024-02-21 14:19:48 +01:00
Kilian Schuettler
015984891f RED-8156: refactor ViewerDocumentService as a dependency for ocr-service
* fix pmd
2024-02-06 17:17:26 +01:00
Kilian Schuettler
66fcb62833 RED-8156: refactor ViewerDocumentService as a dependency for ocr-service
* fix pmd
2024-02-06 17:09:21 +01:00
Kilian Schuettler
48824f56a8 RED-8156: refactor ViewerDocumentService as a dependency for ocr-service
* fix pmd
2024-02-06 17:06:53 +01:00
Kilian Schuettler
785628537f RED-8156: refactor ViewerDocumentService as a dependency for ocr-service
* various improvements to experimental parsing steps
* added embed fonts functionality to viewer doc
* fix checkstyle
2024-02-06 17:03:38 +01:00
Kilian Schuettler
23eb0c40a3 RED-8156: refactor ViewerDocumentService as a dependency for ocr-service
* various improvements to experimental parsing steps
* added embed fonts functionality to viewer doc
2024-02-06 16:59:51 +01:00
Dominique Eifländer
e4f3557b36 RED-8171: Traces do not stop at @Async 2024-02-02 13:22:57 +01:00
Timo Bejan
88855de2da Red 8085 2024-01-29 10:31:36 +01:00
Dominique Eifländer
12344d57b2 RED-8106: Make documentdata serializable 2023-12-21 13:42:25 +01:00
Dominique Eifländer
b779c72041 RED-1137: Do not observe actuator endpoints 2023-12-20 14:05:00 +01:00
Kilian Schüttler
ba1c7c07ab RED-7384: fixes for migration 2023-12-20 12:40:00 +01:00
Dominique Eifländer
da2cdc288e RED-5223: Use tracing-commons from fforesight 2023-12-13 15:31:26 +01:00
Dominique Eifländer
711548d1a7 hotfix: removed dlq from response queue to be equal to persistence-service 2023-12-13 09:47:27 +01:00
Dominique Eifländer
750ccf4ce2 RED-5223: Enabled tracing, upgrade spring, use logstash-logback-encoder for json logs 2023-12-11 15:06:23 +01:00
Andrei Isvoran
d8c9659469 RED-7715 - Add log4j config to enable switching between json/line logs 2023-12-06 11:59:42 +02:00
Dominique Eifländer
dacc2f7f43 DM-589: Filter wrongly detected cells that are borders from rotation at scanning 2023-11-20 15:54:02 +01:00
yhampe
207d9dec97 * added back in if statement
* removed unneeded comment
2023-11-16 12:40:49 +01:00
yhampe
1316a067fe * removed double checking for height of cell 2023-11-16 08:51:12 +01:00
yhampe
e203210ade * removed unneeded properties 2023-11-16 08:23:58 +01:00
yhampe
b25d46291a * checkstyle 2023-11-16 08:12:47 +01:00
yhampe
84148d3b6e * fixed tests 2023-11-16 07:51:08 +01:00
Dominique Eifländer
a6ba66b1aa TAAS-103: Fixed values in wrong cells 2023-11-15 13:36:46 +01:00
yhampe
c3e69b2cdf * fixed bug with incorrect empty cell count by adding threshold to cell.contains 2023-11-15 10:44:47 +01:00
yhampe
f69331e7d8 * renamed page to firstPage in DocumentStructure and Table 2023-11-07 10:21:19 +01:00
yhampe
01493dc033 TAAS-103: Table Detection and rotated text
* added page property to DocumentStructure to be able to get page of found tables

* added a method to TableExtractionService to get the table area

* added calculateMinCharWidthAndMaxCharHeightInsideTable to LayoutParsingPipeline to calculate the values based upon table area

* refactored PDFLinesTextStripper for better readability

* removed textMatrix from RedTextPosition as it is no longer needed
2023-11-07 08:47:28 +01:00
yhampe
459e0c8be7 TAAS-103: 2023-11-07 08:39:15 +01:00
Corina Olariu
0e0a811f9d RED-7806 - Specific customer document cannot be processed
- add brackets
2023-10-25 11:36:54 +03:00
Corina Olariu
efa3d75479 RED-7806 - Specific customer document cannot be processed
- check for font name null before using to avoid the NPE
2023-10-25 09:16:47 +03:00
Corina Olariu
3bab61c446 RED-7434 - Remove Section Grid entirely
- remove sectionGrid relation (including SectionGridCreatorService)
- update junit tests
2023-10-20 09:09:22 +03:00
Dominique Eifländer
567cbc178b hotfix: Fixed parsing for specific taas document 2023-10-17 15:52:19 +02:00
Dominique Eifländer
8647cf5a18 RED-7759: Upgraded storage-commons to newest Windows-compatible version 2023-10-13 12:15:22 +02:00
Corina Olariu
daba0bf8a6 RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- remove finally clause
2023-10-04 17:46:46 +03:00
Corina Olariu
3839de215c RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- rollback to getDir().getDegrees()
2023-10-04 15:27:13 +03:00
Corina Olariu
b4d68594f1 RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- use rotation instead of getDir().getDegrees()
2023-10-04 14:22:15 +03:00
Corina Olariu
99ed331a1e RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- use getXDirAdj instead of getX
- add fontSizeCounter for landscape pages also
2023-10-04 14:13:38 +03:00
Corina Olariu
f2c0991987 RED-7607 - Rotating pages leads to lost annotations (RM & DM)
- fix PMD findings
2023-10-04 14:09:46 +03:00
Kilian Schuettler
5792ff4a93 TAAS-104: merge visually intersecting Paragraphs
* fix build
2023-09-05 16:54:23 +02:00
Kilian Schuettler
621c3f269d TAAS-104: merge visually intersecting Paragraphs 2023-09-05 16:09:05 +02:00