renaming; readme

Matthias Bisping 2022-02-05 14:42:00 +01:00
parent c435d25ac0
commit 8c88fc594d
2 changed files with 30 additions and 4 deletions

README.md Normal file

@@ -0,0 +1,26 @@
# Table Parsing
This repository implements computer-vision-based approaches for detecting and parsing visual features such as tables or previous redactions.
## Installation
```bash
git clone ssh://git@git.iqser.com:2222/rr/table_parsing.git
cd table_parsing
python3 -m venv env
source env/bin/activate
pip install -e .
pip install -r requirements.txt
```
## Usage
```bash
# Parse tables on the second page of a PDF (the page index is zero-based)
python scripts/annotate.py <path to pdf> 1 --type table
# Detect redactions (black filled rectangles) on the first page of a PDF
python scripts/annotate.py <path to pdf> 0 --type redaction
```


@@ -9,7 +9,7 @@ def parse_args():
     parser = argparse.ArgumentParser()
     parser.add_argument("pdf_path")
     parser.add_argument("page_index", type=int)
-    parser.add_argument("--object", choices=["table", "box", "layout"], default="table")
+    parser.add_argument("--type", choices=["table", "redaction", "layout"], default="table")
     args = parser.parse_args()
@@ -18,10 +18,10 @@ def parse_args():
 if __name__ == "__main__":
     args = parse_args()
-    if args.object == "table":
+    if args.type == "table":
         annotate_tables_in_pdf(args.pdf_path, page_index=args.page_index)
-    elif args.object == "box":
+    elif args.type == "redaction":
         annotate_boxes_in_pdf(args.pdf_path, page_index=args.page_index)
-    elif args.object == "layout":
+    elif args.type == "layout":
         annotate_layout_in_pdf(args.pdf_path, page_index=args.page_index)
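
Since the rename touches both the `choices` list and every branch of the `if`/`elif` chain, one possible follow-up is to keep the mapping from `--type` values to annotation functions in a single dictionary. The sketch below is only an illustration of that idea, not what this commit does. The three `annotate_*_in_pdf` functions are stand-in stubs (the real ones live in the repository and are not shown here), and `ANNOTATORS` and `main` are hypothetical names.

```python
import argparse


# Stand-in stubs for the repository's annotation functions (assumed
# signatures: a PDF path plus a zero-based page index as a keyword).
def annotate_tables_in_pdf(pdf_path, page_index):
    return f"tables:{pdf_path}:{page_index}"


def annotate_boxes_in_pdf(pdf_path, page_index):
    return f"boxes:{pdf_path}:{page_index}"


def annotate_layout_in_pdf(pdf_path, page_index):
    return f"layout:{pdf_path}:{page_index}"


# Hypothetical dispatch table: one place that ties each --type value
# to the function handling it.
ANNOTATORS = {
    "table": annotate_tables_in_pdf,
    "redaction": annotate_boxes_in_pdf,
    "layout": annotate_layout_in_pdf,
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("pdf_path")
    parser.add_argument("page_index", type=int)
    # The valid choices are derived from the dispatch table, so they
    # cannot drift apart from the branches.
    parser.add_argument("--type", choices=sorted(ANNOTATORS), default="table")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    return ANNOTATORS[args.type](args.pdf_path, page_index=args.page_index)
```

With this layout, a future rename such as `box` to `redaction` would only change one dictionary key. Calling `main()` without arguments reads the command line as usual, while passing a list such as `main(["doc.pdf", "0", "--type", "redaction"])` makes the dispatch easy to exercise in tests.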