From d0126d36a765d374da7b1fa659ffb9c4907704bc Mon Sep 17 00:00:00 2001
From: Kilian Schuettler
Date: Tue, 8 Aug 2023 10:57:20 +0200
Subject: [PATCH] proomping 2

---
 drools-prompt/drools-prompt | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drools-prompt/drools-prompt b/drools-prompt/drools-prompt
index 224d8bb2..79f9831e 100644
--- a/drools-prompt/drools-prompt
+++ b/drools-prompt/drools-prompt
@@ -16,7 +16,7 @@ The Section, Table, TableCell, Paragraph, and Headline implement a common interf
 - TableCells may have any child, except TableCells.
 - Paragraphs and Headlines have no children.
 - Sections may have any child except TableCells, but if it contains Paragraphs as well as Tables, it is split into a Section with multiple Sections as children, where any child Section only contains either Tables or Paragraphs.
- The first Headline remains in the Parent Section, while all others are put into the child section they belong to.
+ Further, if the first SemanticNode is a Headline it remains the first child in the Parent Section, before any subsections.
 ----------------------------------------------------------------
 The relevant functions for SemanticNode:
 
@@ -49,10 +49,12 @@ Set getPages()
 * @param pageNumber The page number to check.
 * @return True if this node is found on the specified page number, false otherwise.
 */
-boolean isOnPage(int pageNumber)
+boolean onPage(int pageNumber)
 
 /**
-* Returns the closest Headline associated with this SemanticNode
+* For Sections it searches its children and returns the first Headline.
+* For Paragraphs, Tables, and TableCells it returns getHeadline() of getParent()
+* For Headline it returns itself and for Headers or Footers it returns an empty dummy Headline.
 *
 * @return First Headline found.
 */
@@ -60,7 +62,7 @@ Headline getHeadline()
 
 /**
 * @return The SemanticNode representing the Parent in the DocumentTree
-* throws NotFoundException, when no parent is present
+* When no parent is present, the Document is returned. And for the Document itself it throws an UnsupportedOperationException.
 */
 SemanticNode getParent()
 
@@ -330,6 +332,10 @@ The Page Object has the following functions:
 */
 public TextBlock getMainBodyTextBlock()
 /**
+* @return All SemanticNodes that occur on the page, except Header and Footer
+*/
+public List getMainBody()
+/**
 * Gets all Entities located on the page
 * @return Set of all Entities associated with this Page
 */
@@ -341,8 +347,8 @@ Set getEntities();
 */
 Integer getPageNumber();
 ----------------------------------------------------------------
-The goal of the Rules is to find pieces of Text that we want to redact.
-There are two different types of rules, during one you create new Entities and in the other you change or remove existing Entities.
+The goal of the Rules is to find pieces of Text that we want to redact. These pieces of text are represented as RedactionEntities
+There are two different types of rules, one you create new Entities and in the other you change or remove existing Entities.
 An Entity is any piece of text, uniquely identified in the Document by its Boundary, its Type and its EntityType.
 The Boundary consists of a start and stop index in the text of the document. The Type is a String like "PII", which stands for
 The goal is to find entities that fulfill certain conditions. Each SemanticNode has its own set of entities, but these sets may have intersections.
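
Reviewer note: the revised getHeadline() and getParent() comments in this patch describe a small dispatch by node type, which may be easier to check against a sketch. The Java below is only an illustration under stated assumptions, not the project's implementation. All `Sketch*` class names are hypothetical stand-ins. The patch does not say what a Section without a Headline child returns, whether the search is recursive, or how the Document answers getHeadline(); this sketch assumes a direct-children search, an empty dummy Headline as the fallback, and a Document that behaves like a Section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SemanticNode; only getHeadline()/getParent() are modelled.
abstract class SketchNode {
    private SketchNode parent;                 // null until the node is attached
    protected final SketchNode document;       // null only for the Document itself
    protected final List<SketchNode> children = new ArrayList<>();

    SketchNode(SketchNode document) { this.document = document; }

    SketchNode add(SketchNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // "When no parent is present, the Document is returned."
    SketchNode getParent() { return parent != null ? parent : document; }

    abstract SketchHeadline getHeadline();

    // Assumption: only direct children are searched; an empty dummy Headline is the fallback.
    protected SketchHeadline firstHeadlineChild() {
        for (SketchNode child : children) {
            if (child instanceof SketchHeadline) {
                return (SketchHeadline) child;
            }
        }
        return new SketchHeadline(document, "");
    }
}

class SketchHeadline extends SketchNode {
    final String text;
    SketchHeadline(SketchNode document, String text) { super(document); this.text = text; }
    // A Headline returns itself.
    @Override SketchHeadline getHeadline() { return this; }
}

class SketchSection extends SketchNode {
    SketchSection(SketchNode document) { super(document); }
    // A Section returns the first Headline among its children.
    @Override SketchHeadline getHeadline() { return firstHeadlineChild(); }
}

// Stands in for Paragraph, Table and TableCell, which all delegate to the parent.
class SketchLeaf extends SketchNode {
    SketchLeaf(SketchNode document) { super(document); }
    @Override SketchHeadline getHeadline() { return getParent().getHeadline(); }
}

class SketchHeaderFooter extends SketchNode {
    SketchHeaderFooter(SketchNode document) { super(document); }
    // Headers and Footers return an empty dummy Headline.
    @Override SketchHeadline getHeadline() { return new SketchHeadline(document, ""); }
}

class SketchDocument extends SketchNode {
    SketchDocument() { super(null); }
    // The Document itself has no parent.
    @Override SketchNode getParent() {
        throw new UnsupportedOperationException("the Document has no parent");
    }
    // Assumption: the Document answers like a Section.
    @Override SketchHeadline getHeadline() { return firstHeadlineChild(); }
}
```

With this shape, a Paragraph nested in a Section resolves to that Section's first Headline, and a node that was never attached falls back to the Document, which is how the new getParent() comment reads.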