RED-8825: general improvements

* classify rulings as underline/strikethrough
* improve performance of CleanRulings.lineBetween
* use lineBetween where possible
* wip, still todo:
 - Header/Footer by Ruling for all rotations
 - actually the ticket, optimizing layoutparsing for documine
Kilian Schuettler 2024-04-29 17:24:15 +02:00
parent 4761d2e1a2
commit 64209255cb

@@ -26,6 +26,12 @@ public class RulingIntersectionFinder {
/**
* Implementation to find line intersections in O(P + n log n), where n is the number of lines and P the number of intersections.
* based on <a href="http://people.csail.mit.edu/indyk/6.838-old/handouts/lec2.pdf">Segment Intersection by Piotr Indyk</a>
*
* @param horizontals a list of non-overlapping horizontal rulings
* @param verticals a list of non-overlapping vertical rulings
* @return a Map of each found intersection point pointing to the two lines forming the intersection.
*/
/*
* The algorithm assumes there are only horizontal and vertical lines which are unique in their coordinates. (E.g. no overlapping horizontal lines exist)
* As a high level overview, the algorithm uses a sweep line advancing from left to right.
* It dynamically updates the horizontal rulings which are intersected by the current sweep line.
@@ -37,10 +43,6 @@ public class RulingIntersectionFinder {
* Since we use this implementation to find table cells, we should expect this worst case to be the typical case.
* A simple runtime comparison for a single page with the most lines we can expect (SinglePages/AbsolutelyEnormousTable.pdf with 30 horizontals and 144 verticals) shows this implementation takes roughly 14 ms, whereas the naive approach takes 7 ms. Both are negligible, but the naive approach is twice as fast.
* If we would like to make this faster, we would need a better data structure for 'TreeMap<Ruling, Void> horizontalRulingsInCurrentSweep', where we can query the TreeMap for all horizontal rulings in a given interval in O(log n).
*
* @param horizontals a list of non-overlapping horizontal rulings
* @param verticals a list of non-overlapping vertical rulings
* @return a Map of each found intersection point pointing to the two lines forming the intersection.
*/
public Map<Point2D, IntersectingRulings> find(List<Ruling> horizontals, List<Ruling> verticals) {
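
The sweep described in the comment above can be illustrated with a small standalone sketch. This is not the code of `RulingIntersectionFinder`: the class name `SweepSketch`, the array encoding of rulings (`{xStart, xEnd, y}` for horizontals, `{x, yStart, yEnd}` for verticals) and both method names are assumptions made for illustration only. The active set is keyed by y coordinate, so `TreeMap.subMap` can serve as the interval query the comment asks for; whether that fits the real `Ruling` type is untested here. A naive pairwise check is included for comparison.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration, not the repository's implementation.
class SweepSketch {

    // Event kinds. The numeric order matters: at equal x, a horizontal is
    // added before verticals are queried and removed only afterwards, so
    // rulings that merely touch at an endpoint still count as intersecting.
    static final int ADD = 0, QUERY = 1, REMOVE = 2;

    /** horizontals: {xStart, xEnd, y}; verticals: {x, yStart, yEnd}; start <= end. */
    static Set<Point2D> sweep(List<double[]> horizontals, List<double[]> verticals) {
        List<double[]> events = new ArrayList<>(); // each event: {x, kind, a, b}
        for (double[] h : horizontals) {
            events.add(new double[]{h[0], ADD, h[2], 0});
            events.add(new double[]{h[1], REMOVE, h[2], 0});
        }
        for (double[] v : verticals) {
            events.add(new double[]{v[0], QUERY, v[1], v[2]});
        }
        // Sweep from left to right; ties are broken by event kind.
        events.sort(Comparator.<double[]>comparingDouble(e -> e[0])
                .thenComparingDouble(e -> e[1]));

        // y coordinate -> number of currently active horizontals at that y.
        TreeMap<Double, Integer> active = new TreeMap<>();
        Set<Point2D> result = new HashSet<>();
        for (double[] e : events) {
            int kind = (int) e[1];
            if (kind == ADD) {
                active.merge(e[2], 1, Integer::sum);
            } else if (kind == REMOVE) {
                active.computeIfPresent(e[2], (y, count) -> count == 1 ? null : count - 1);
            } else {
                // Range query: all active horizontals with yStart <= y <= yEnd.
                for (double y : active.subMap(e[2], true, e[3], true).keySet()) {
                    result.add(new Point2D.Double(e[0], y));
                }
            }
        }
        return result;
    }

    /** Naive O(h * v) comparison: test every horizontal against every vertical. */
    static Set<Point2D> naive(List<double[]> horizontals, List<double[]> verticals) {
        Set<Point2D> result = new HashSet<>();
        for (double[] h : horizontals) {
            for (double[] v : verticals) {
                boolean xInside = h[0] <= v[0] && v[0] <= h[1];
                boolean yInside = v[1] <= h[2] && h[2] <= v[2];
                if (xInside && yInside) {
                    result.add(new Point2D.Double(v[0], h[2]));
                }
            }
        }
        return result;
    }
}
```

For the small ruling counts quoted in the comment (30 horizontals, 144 verticals), the constant overhead of building and sorting events is likely to dominate, which would be consistent with the naive loop measuring faster there.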