proomping 2
parent b2621fb4b3
commit d0126d36a7
@@ -16,7 +16,7 @@ The Section, Table, TableCell, Paragraph, and Headline implement a common interf
 - TableCells may have any child, except TableCells.
 - Paragraphs and Headlines have no children.
 - Sections may have any child except TableCells. If a Section contains both Paragraphs and Tables, it is split into a parent Section with multiple child Sections, each of which contains either only Tables or only Paragraphs.
-The first Headline remains in the Parent Section, while all others are put into the child section they belong to.
+Further, if the first SemanticNode is a Headline, it remains the first child of the parent Section, before any subsections.

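The splitting rule described above can be read roughly as follows. This is only a minimal sketch: `Kind`, `Node`, and `split` are hypothetical stand-ins, not the project's real classes. It also assumes two things the text does not state: that child Sections are formed from consecutive runs of Paragraphs or Tables, and that only the first child can be a Headline.

```java
import java.util.ArrayList;
import java.util.List;

class SectionSplitSketch {

    // Hypothetical node kinds; the real SemanticNode types are richer.
    enum Kind { SECTION, HEADLINE, PARAGRAPH, TABLE }

    // Hypothetical stand-in for a SemanticNode.
    static final class Node {
        final Kind kind;
        final List<Node> children = new ArrayList<>();

        Node(Kind kind) {
            this.kind = kind;
        }
    }

    // Returns the section unchanged unless it mixes Paragraphs and Tables.
    static Node split(Node section) {
        boolean hasParagraphs = section.children.stream().anyMatch(c -> c.kind == Kind.PARAGRAPH);
        boolean hasTables = section.children.stream().anyMatch(c -> c.kind == Kind.TABLE);
        if (!hasParagraphs || !hasTables) {
            return section;
        }

        Node parent = new Node(Kind.SECTION);
        Node currentChild = null;
        for (int i = 0; i < section.children.size(); i++) {
            Node child = section.children.get(i);
            // A leading Headline stays in the parent, before any subsections.
            if (i == 0 && child.kind == Kind.HEADLINE) {
                parent.children.add(child);
                continue;
            }
            // Assumption: a new child Section starts whenever the kind changes.
            if (currentChild == null || currentChild.children.get(0).kind != child.kind) {
                currentChild = new Node(Kind.SECTION);
                parent.children.add(currentChild);
            }
            currentChild.children.add(child);
        }
        return parent;
    }
}
```

Under these assumptions, a Section holding a Headline, two Paragraphs, and a Table becomes a parent with three children: the Headline, a Section with the two Paragraphs, and a Section with the Table.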
 ----------------------------------------------------------------
 The relevant functions for SemanticNode:
@@ -49,10 +49,12 @@ Set<Page> getPages()
 * @param pageNumber The page number to check.
 * @return True if this node is found on the specified page number, false otherwise.
 */
-boolean isOnPage(int pageNumber)
+boolean onPage(int pageNumber)

 /**
 * Returns the closest Headline associated with this SemanticNode.
 * For Sections, it searches its children and returns the first Headline.
 * For Paragraphs, Tables, and TableCells, it returns getHeadline() of getParent().
 * For a Headline, it returns itself, and for Headers or Footers, it returns an empty dummy Headline.
 *
 * @return First Headline found.
 */
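The per-type behaviour in the getHeadline() comment above could be sketched as below. All class names here are simplified, hypothetical stand-ins for the real node types. The fallback for a Section without a Headline child is an assumption, since the comment does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.List;

class HeadlineLookupSketch {

    // Hypothetical stand-in for SemanticNode.
    static abstract class Node {
        Node parent;
        final List<Node> children = new ArrayList<>();

        abstract Headline getHeadline();

        // Helper for building small trees in this sketch only.
        <T extends Node> T add(T child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    static final class Headline extends Node {
        final String text;

        Headline(String text) {
            this.text = text;
        }

        @Override
        Headline getHeadline() {
            return this; // a Headline returns itself
        }
    }

    static final class Section extends Node {
        @Override
        Headline getHeadline() {
            for (Node child : children) {
                if (child instanceof Headline) {
                    return (Headline) child; // first Headline among the children
                }
            }
            // Assumption: fall back to the parent, or to an empty dummy Headline.
            return parent != null ? parent.getHeadline() : new Headline("");
        }
    }

    // Tables and TableCells would delegate in the same way.
    static final class Paragraph extends Node {
        @Override
        Headline getHeadline() {
            return parent.getHeadline();
        }
    }

    // Footers would behave the same.
    static final class Header extends Node {
        @Override
        Headline getHeadline() {
            return new Headline(""); // empty dummy Headline
        }
    }
}
```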
@@ -60,7 +62,7 @@ Headline getHeadline()

 /**
 * @return The SemanticNode representing the Parent in the DocumentTree
-* throws NotFoundException, when no parent is present
+* When no parent is present, the Document is returned. For the Document itself, it throws an UnsupportedOperationException.
 */
 SemanticNode getParent()

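The getParent() contract in its newer wording (return the Document when there is no parent, throw for the Document itself) might look like the following. This is a sketch with hypothetical, stripped-down types; how the real nodes hold a reference to their Document is an assumption.

```java
class ParentLookupSketch {

    // Hypothetical, reduced version of the SemanticNode interface.
    interface SemanticNode {
        SemanticNode getParent();
    }

    static final class Document implements SemanticNode {
        @Override
        public SemanticNode getParent() {
            // The Document is the root of the tree, so it has no parent.
            throw new UnsupportedOperationException("The Document has no parent");
        }
    }

    static final class Section implements SemanticNode {
        private final Document document;
        private final SemanticNode parent; // null for top-level nodes (assumption)

        Section(Document document, SemanticNode parent) {
            this.document = document;
            this.parent = parent;
        }

        @Override
        public SemanticNode getParent() {
            // Without an explicit parent, the Document acts as the parent.
            return parent != null ? parent : document;
        }
    }
}
```

With this shape, callers can walk upwards from any node without a null check. The walk ends when it reaches the Document.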
@@ -330,6 +332,10 @@ The Page Object has the following functions:
 */
 public TextBlock getMainBodyTextBlock()
+/**
+* @return All SemanticNodes that occur on the page, except Header and Footer
+*/
+public List<SemanticNode> getMainBody()
 /**
 * Gets all Entities located on the page
 * @return Set of all Entities associated with this Page
 */
@@ -341,8 +347,8 @@ Set<RedactionEntity> getEntities();
 */
 Integer getPageNumber();
 ----------------------------------------------------------------
-The goal of the Rules is to find pieces of Text that we want to redact.
-There are two different types of rules, during one you create new Entities and in the other you change or remove existing Entities.
+The goal of the Rules is to find pieces of Text that we want to redact. These pieces of text are represented as RedactionEntities.
+There are two different types of rules: in one you create new Entities, and in the other you change or remove existing Entities.
 An Entity is any piece of text, uniquely identified in the Document by its Boundary, its Type and its EntityType. The Boundary consists of a start and stop index in the text of the document.
 The Type is a String like "PII", which stands for
 The goal is to find entities that fulfill certain conditions. Each SemanticNode has its own set of entities, but these sets may overlap.
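The identity rule for Entities (Boundary plus Type plus EntityType) and the overlapping per-node sets can be illustrated with a small sketch. Everything here is hypothetical: `EntityKey`, the placeholder `EntityType` values, and the `shared` helper are not taken from the project. The sketch also makes no claim about whether the stop index is inclusive.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EntityIdentitySketch {

    // Start and stop index in the document text.
    static final class Boundary {
        final int start;
        final int stop;

        Boundary(int start, int stop) {
            if (start < 0 || stop < start) {
                throw new IllegalArgumentException("invalid boundary");
            }
            this.start = start;
            this.stop = stop;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Boundary)) return false;
            Boundary other = (Boundary) o;
            return start == other.start && stop == other.stop;
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, stop);
        }
    }

    // Placeholder values; the real EntityType constants are not given in the text.
    enum EntityType { KIND_A, KIND_B }

    // Hypothetical identity of an Entity: Boundary, Type, and EntityType together.
    static final class EntityKey {
        final Boundary boundary;
        final String type;
        final EntityType entityType;

        EntityKey(Boundary boundary, String type, EntityType entityType) {
            this.boundary = boundary;
            this.type = type;
            this.entityType = entityType;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof EntityKey)) return false;
            EntityKey other = (EntityKey) o;
            return boundary.equals(other.boundary)
                    && type.equals(other.type)
                    && entityType == other.entityType;
        }

        @Override
        public int hashCode() {
            return Objects.hash(boundary, type, entityType);
        }
    }

    // Entities that two nodes have in common, for example a Section and one of its Paragraphs.
    static Set<EntityKey> shared(Set<EntityKey> first, Set<EntityKey> second) {
        Set<EntityKey> result = new HashSet<>(first);
        result.retainAll(second);
        return result;
    }
}
```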