179 Commits

Author SHA1 Message Date
Dominique Eifländer
d10c0a7900 Revert idRemoval fix, removed duplicate code 2021-01-08 13:46:55 +01:00
Timo
e3a960d086 idRemoval fix 2021-01-08 11:29:47 +02:00
Dominique Eifländer
e23ed69e04 Avoid IndexOutOfBoundsException if dictionary entry has blank at end 2021-01-07 16:20:51 +01:00
Dominique Eifländer
633fb403e0 Fixed RegEx for et al. recommendations 2021-01-07 13:07:21 +01:00
Timo Bejan
e58b4ff6c1 Pull request #95: Feature/ruleset integration
Merge in RED/redaction-service from feature/ruleset-integration to master

* commit '2c4350b8f369c00177781edc4567df1f2806a2fe':
  Rules Tester
  rule update fix
2021-01-06 18:47:57 +01:00
Timo
2c4350b8f3 Rules Tester 2021-01-06 19:37:39 +02:00
Timo
07ffeab3ae rule update fix 2021-01-06 19:30:40 +02:00
Timo Bejan
f087d4afdb Pull request #94: dev mode features, exception generalisation
Merge in RED/redaction-service from feature/ruleset-integration to master

* commit 'c2669ab56843f3ec335cfb1cd7d1e59dc828fb98':
  fixed tests
  fixed tests
  dev mode features, exception generalisation
2021-01-06 16:23:56 +01:00
Timo
c2669ab568 fixed tests 2021-01-06 17:04:07 +02:00
Timo
a824aa20a5 fixed tests 2021-01-06 17:02:43 +02:00
Timo
6412cf37d9 dev mode features, exception generalisation 2021-01-06 16:41:16 +02:00
Dominique Eifländer
de725a630c RED-727: Added possibility to redact/addRecommendations by regEx in rules. Added email regEx and et al. author recommendation regEx 2021-01-06 14:53:49 +01:00
Timo
09069d11ad RedactionLog now stores ruleSetId 2021-01-06 10:11:16 +02:00
Timo
5aba4b69ba RuleSetId integration and drools update 2021-01-06 01:45:38 +02:00
Dominique Eifländer
e8256c49dc Fixed annotating cell with more than one TextBlock (Mismatch between EntityPositionSequence and found Entity) 2021-01-05 14:53:09 +01:00
Dominique Eifländer
599c7bd6e4 Tables with only 2 columns are treated as one text 2021-01-05 12:23:24 +01:00
Dominique Eifländer
609018a051 Fixed false positive dictionary problems 2021-01-04 16:34:55 +01:00
Dominique Eifländer
704e6a4b5a Find annotations also in Header cells 2021-01-04 11:53:03 +01:00
Timo
0bc5abb29d fixed text-after and text-before spacing 2021-01-03 12:40:21 +02:00
Dominique Eifländer
79b57e85cd Handle '\u00A0' character the same way as ' ' 2020-12-23 10:57:58 +01:00
Dominique Eifländer
000b145e71 Fixed 'Comparison method violates its general contract' by using QuickSort from PDFBox 2020-12-22 16:04:29 +01:00
Dominique Eifländer
caf6277de9 RED-882: Added textBefore and textAfter to redaction log 2020-12-18 14:31:27 +01:00
deiflaender
bfa363a3d2 RED-871: Fixed endless processing on document with corrupted contentStream 2020-12-11 11:26:48 +01:00
deiflaender
50ec16601c Fixed table offset bug 2020-12-10 19:33:58 +01:00
deiflaender
e43bd1b711 RED-864, Added isDictionaryEntry to redactionLog. Fixed order of dictionary types 2020-12-10 12:37:51 +01:00
deiflaender
44613ee117 Made dictionaries thread-safe 2020-12-09 17:09:11 +01:00
deiflaender
608ea4bbcc RED-824: Add author recommendations based on vertebrate study tables 2020-12-07 15:53:20 +01:00
deiflaender
c90eee23c4 Fixed duplicate Textblock in Tables 2020-12-03 15:17:33 +01:00
deiflaender
4ef6e0e2ef RED-740: Improved section recognition 2020-11-27 15:39:31 +01:00
Timo
3c4d6dd2f2 fixed awt auto * 2020-11-26 20:20:43 +02:00
Timo
0e645ab273 fixed Tests & fallback 2020-11-26 20:12:36 +02:00
Timo
cc1a3c9e49 removed import 2020-11-26 18:56:02 +02:00
Timo
536d4689f3 Added rank of dictionary to processing entities in redaction service, simplified code 2020-11-26 18:52:44 +02:00
deiflaender
d466d9b032 Rename sponsor dictionary to CBI_sponsor 2020-11-26 12:09:19 +01:00
deiflaender
746a25c00d RED-783: Separate rules for PII and CBI, RED-780: Added PII rules for author(s) and performing laboratory 2020-11-26 09:36:26 +01:00
deiflaender
6de87d051e RED-743: Updated to latest rules and dictionaries 2020-11-23 12:24:22 +01:00
deiflaender
9c2926451d RED-744: Expose section grid 2020-11-19 11:37:57 +01:00
deiflaender
bdc231f3c2 RED-473: Fixed missing batched produced at annotation 2020-11-18 13:29:33 +01:00
deiflaender
efb05d15a1 Adjusted rules 2020-11-18 12:54:09 +01:00
deiflaender
a4f8d2f424 Changed rules and dictionaries 2020-11-17 09:57:04 +01:00
deiflaender
e178393c23 Fixed paragraph recognition 2020-11-16 16:00:45 +01:00
deiflaender
f75aff5186 RED-692: Added matchedRule to RedactionLog 2020-11-16 11:55:58 +01:00
deiflaender
85d73dae47 RED-632: Set status DECLINED in RedactionLog for manual removals that are DECLINED 2020-11-12 16:14:45 +01:00
deiflaender
936683f94d RED-629: Each annotation is one entry in the RedactionLog 2020-11-11 15:29:28 +01:00
deiflaender
efe49ac2c1 RED-419: Avoid duplicate entries 2020-11-05 12:27:24 +01:00
deiflaender
c6d3e3a4cf RED-515: Added manualRedactionType to redactionLog, to know whether it must be added in pdftron-redaction-service if it is requested 2020-11-04 15:47:00 +01:00
Thierry Göckel
122d5a556e Bump conf service version and remove obsolete object init 2020-11-04 11:49:13 +01:00
Thierry Göckel
355cb16679 Fix test 2020-11-03 21:10:17 +01:00
Thierry Göckel
c285c384ce Fix rule in drools file, too 2020-11-03 20:48:23 +01:00
Thierry Göckel
02052fbb6a Fix sponsor companies rule and add corresponding test 2020-11-03 20:47:11 +01:00