Julius Unverfehrt e7b28f5bda Pull request #18: Remove pil
Merge in RR/cv-analysis from remove_pil to master

Squashed commit of the following:

commit 83c8d88f3d48404251470176c70979ee75ae068b
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jul 21 10:51:51 2022 +0200

    remove deprecated server tests

commit cebc03b5399ac257a74036b41997201f882f5b74
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Thu Jul 21 10:51:08 2022 +0200

    remove deprecated server tests

commit ce2845b0c51f001b7b5b8b195d6bf7e034ec4e39
Author: Julius Unverfehrt <julius.unverfehrt@iqser.com>
Date:   Wed Jul 20 17:05:00 2022 +0200

    repair tests to work without pillow WIP

commit 023fdab8322f28359a24c63e32635a3d0deccbe4
Author: Isaac Riley <Isaac.Riley@iqser.com>
Date:   Wed Jul 20 16:40:36 2022 +0200

    fixed typo

commit 33850ca83a175f74789ae6b9bebd057ed84b7fb3
Author: Isaac Riley <Isaac.Riley@iqser.com>
Date:   Wed Jul 20 16:38:37 2022 +0200

    fixed import from refactored open_img.py

commit dbc6d345f074e538948e2c4f94ebed8a5ef520bc
Author: Isaac Riley <Isaac.Riley@iqser.com>
Date:   Wed Jul 20 16:32:42 2022 +0200

    removed PIL from production code, now only in scripts
2022-07-21 13:25:00 +02:00


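from numpy import array, ndarray
import pdf2image
from PIL import Image

from cv_analysis.utils.preprocessing import preprocess_page_array


def open_pdf(pdf, first_page=0, last_page=None):
    # Shift the page arguments by one: pdf2image numbers pages from 1.
    first_page += 1
    last_page = None if last_page is None else last_page + 1

    if isinstance(pdf, str):
        # A string is treated as a path to an image or a PDF file.
        if pdf.lower().endswith((".png", ".jpg", ".jpeg")):
            pages = [Image.open(pdf)]
        elif pdf.lower().endswith(".pdf"):
            pages = pdf2image.convert_from_path(pdf, first_page=first_page, last_page=last_page)
        else:
            raise IOError("Invalid file extension. Accepted filetypes:\n\t.png\n\t.jpg\n\t.jpeg\n\t.pdf")
    elif isinstance(pdf, bytes):
        # Raw bytes are treated as the contents of a PDF file.
        pages = pdf2image.convert_from_bytes(pdf, first_page=first_page, last_page=last_page)
    elif isinstance(pdf, (list, ndarray)):
        # Already-loaded pages are returned unchanged, without preprocessing.
        return pdf
    else:
        # Previously an unsupported type fell through to an undefined `pages`.
        raise TypeError(f"Unsupported input type: {type(pdf).__name__}")

    pages = [preprocess_page_array(array(p)) for p in pages]
    return pages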
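
open_pdf adds one to both page arguments before handing them to pdf2image, which suggests that callers pass 0-based, inclusive page indices. The sketch below isolates that conversion under this assumption. The name `to_pdf2image_range` is hypothetical and not part of cv_analysis.

```python
from typing import Optional, Tuple


def to_pdf2image_range(
    first_page: int = 0, last_page: Optional[int] = None
) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: map 0-based inclusive indices to 1-based page numbers."""
    # None means "through the last page" and is passed on unchanged.
    return first_page + 1, (None if last_page is None else last_page + 1)


# Defaults select the whole document.
print(to_pdf2image_range())
# 0-based pages 2 through 4 are the third through fifth pages.
print(to_pdf2image_range(2, 4))
```

Keeping the shift in one place would make the 0-based convention explicit for both the path and bytes branches.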