From 64209255cb90e0eda917a37a0088c8a6ca90c1ee Mon Sep 17 00:00:00 2001
From: Kilian Schuettler
Date: Mon, 29 Apr 2024 17:24:15 +0200
Subject: [PATCH] RED-8825: general improvements

* classify rulings as underline/striketrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
  - Header/Footer by Ruling for all rotations
  - actually the ticket, optimizing layoutparsing for documine
---
 .../processor/utils/RulingIntersectionFinder.java | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/utils/RulingIntersectionFinder.java b/layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/utils/RulingIntersectionFinder.java
index 2107abc..e69bcee 100644
--- a/layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/utils/RulingIntersectionFinder.java
+++ b/layoutparser-service/layoutparser-service-processor/src/main/java/com/knecon/fforesight/service/layoutparser/processor/utils/RulingIntersectionFinder.java
@@ -26,6 +26,12 @@ public class RulingIntersectionFinder {
     /**
      * Implementation to find line intersection in O(P + n log n), where n is the number of lines and P the numer of intersections.
      * based on Segment Intersection by Piotr Indyk
+     *
+     * @param horizontals a list of non-overlapping horizontal rulings
+     * @param verticals a list of non-overlapping vertical rulings
+     * @return a Map of each found intersection point pointing to the two lines forming the intersection.
+     */
+    /*
      * The algorithm assumes there are only horizontal and vertical lines which are unique in their coordinates. (E.g. no overlapping horizontal lines exist)
      * As a high level overview, the algorithm uses a sweep line advancing from left to right.
      * It dynamically updates the horizontal rulings which are intersected by the current sweep line.
@@ -37,10 +43,6 @@ public class RulingIntersectionFinder {
      * Since we are using this implementation to find table cells, one can expect this worst case to always be the case.
      * A simple runtime comparison for a single page with the most lines we can expect (SinglePages/AbsolutelyEnormousTable.pdf with 30 horizontals and 144 verticals) shows this implementation takes roughly 14 ms, whereas the naive approach takes 7 ms. Both are negligible, but the naive approach is two times as fast.
      * If we would like to make this faster, we would need a better data structure for 'TreeMap horizontalRulingsInCurrentSweep', where we can query the TreeMap for all horizontal rulings in a given interval in O(log n).
-     *
-     * @param horizontals a list of non-overlapping horizontal rulings
-     * @param verticals a list of non-overlapping vertical rulings
-     * @return a Map of each found intersection point pointing to the two lines forming the intersection.
      */
     public Map find(List horizontals, List verticals) {
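
Note on the comment being moved by this patch: the sweep it describes can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in RulingIntersectionFinder. The names SweepLineSketch, Horizontal, Vertical, Crossing and Event are hypothetical stand-ins, and the sketch returns a List where the real method returns a Map. It assumes Java 17, x1 <= x2, y1 <= y2, and horizontals that do not overlap, as the Javadoc requires.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class SweepLineSketch {

    /** Hypothetical stand-in for a horizontal ruling: spans x1..x2 at height y. */
    record Horizontal(double x1, double x2, double y) {}

    /** Hypothetical stand-in for a vertical ruling: spans y1..y2 at position x. */
    record Vertical(double x, double y1, double y2) {}

    /** One intersection point together with the two rulings forming it. */
    record Crossing(double x, double y, Horizontal h, Vertical v) {}

    /** Sweep event; kind 0 = horizontal starts, 1 = vertical, 2 = horizontal ends. */
    private record Event(double x, int kind, Horizontal h, Vertical v) {}

    static List<Crossing> find(List<Horizontal> horizontals, List<Vertical> verticals) {
        List<Event> events = new ArrayList<>();
        for (Horizontal h : horizontals) {
            events.add(new Event(h.x1(), 0, h, null));
            events.add(new Event(h.x2(), 2, h, null));
        }
        for (Vertical v : verticals) {
            events.add(new Event(v.x(), 1, null, v));
        }
        // Sweep from left to right. At equal x, starts come before verticals and
        // verticals before ends, so rulings that merely touch still intersect.
        events.sort(Comparator.comparingDouble(Event::x).thenComparingInt(Event::kind));

        // Horizontals currently crossed by the sweep line, keyed by their y coordinate.
        TreeMap<Double, Horizontal> active = new TreeMap<>();
        List<Crossing> result = new ArrayList<>();
        for (Event e : events) {
            switch (e.kind()) {
                case 0 -> active.put(e.h().y(), e.h());
                // Two-argument remove: only drop the entry if it is still this ruling.
                case 2 -> active.remove(e.h().y(), e.h());
                default -> {
                    // Range view over all active horizontals inside the vertical's y extent.
                    for (Horizontal h : active.subMap(e.v().y1(), true, e.v().y2(), true).values()) {
                        result.add(new Crossing(e.v().x(), h.y(), h, e.v()));
                    }
                }
            }
        }
        return result;
    }
}
```

In a table, nearly every vertical crosses nearly every active horizontal, so the inner loop visits about as many pairs as a naive double loop would. That is consistent with the comment's observation that the sweep does not beat the naive approach on such pages.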