Matthias Bisping 5cdf93b923 Pull request #39: RED-6084 Improve image extraction speed
Merge in RR/image-prediction from RED-6084-adhoc-scanned-pages-filtering-refactoring to master

Squashed commit of the following:

commit bd6d83e7363b1c1993babcceb434110a6312c645
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Thu Feb 9 16:08:25 2023 +0100

    Tweak logging

commit 55bdd48d2a3462a8b4a6b7194c4a46b21d74c455
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Thu Feb 9 15:47:31 2023 +0100

    Update dependencies

commit 970275b25708c05e4fbe78b52aa70d791d5ff17a
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Thu Feb 9 15:35:37 2023 +0100

    Refactoring

    Make alpha channel check monadic to streamline error handling

commit e99e97e23fd8ce16f9a421d3e5442fccacf71ead
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Tue Feb 7 14:32:29 2023 +0100

    Refactoring

    - Rename
    - Refactor image extraction functions

commit 76b1b0ca2401495ec03ba2b6483091b52732eb81
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Tue Feb 7 11:55:30 2023 +0100

    Refactoring

commit cb1c461049d7c43ec340302f466447da9f95a499
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Tue Feb 7 11:44:01 2023 +0100

    Refactoring

commit 092069221a85ac7ac19bf838dcbc7ab1fde1e12b
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Tue Feb 7 10:18:53 2023 +0100

    Add to-do

commit 3cea4dad2d9703b8c79ddeb740b66a3b8255bb2a
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Tue Feb 7 10:11:35 2023 +0100

    Refactoring

    - Rename
    - Add typehints everywhere

commit 865e0819a14c420bc2edff454d41092c11c019a4
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 19:38:57 2023 +0100

    Add type explanation

commit 01d3d5d33f1ccb05aea1cec1d1577572b1a4deaa
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 19:37:49 2023 +0100

    Formatting

commit dffe1c18fc3a322a6b08890d4438844e8122faaf
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 19:34:13 2023 +0100

    [WIP] Either refactoring

    Add alternative formulation for monadic chain

commit 066cf17add404a313520cd794c06e3264cf971c9
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 18:40:30 2023 +0100

    [WIP] Either refactoring

commit f53f0fea298cdab88deb090af328b34d37e0198e
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 18:18:34 2023 +0100

    [WIP] Either refactoring

    Propagate error and metadata

commit 274a5f56d4fcb9c67fac5cf43e9412ec1ab5179e
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 17:51:35 2023 +0100

    [WIP] Either refactoring

    Fix test assertion

commit 3235a857f6e418e50484cbfff152b0f63efb2f53
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 16:57:31 2023 +0100

    [WIP] Either-refactoring

    Replace Maybe with Either to allow passing on error information or
    metadata which otherwise get sucked up by Nothing.

commit 89989543d87490f8b20a0a76055605d34345e8f4
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 16:12:40 2023 +0100

    [WIP] Monadic refactoring

    Integrate image validation step into monadic chain.

    At the moment we lost the error information through this. Refactoring to
    Either monad can bring it back.

commit 022bd4856a51aa085df5fe983fd77b99b53d594c
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 15:16:41 2023 +0100

    [WIP] Monadic refactoring

commit ca3898cb539607c8c3dd01c57e60211a5fea8a7d
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 15:10:34 2023 +0100

    [WIP] Monadic refactoring

commit d8f37bed5cbd6bdd2a0b52bae46fcdbb50f9dff2
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 15:09:51 2023 +0100

    [WIP] Monadic refactoring

commit 906fee0e5df051f38076aa1d2725e52a182ade13
Author: Matthias Bisping <matthias.bisping@axbit.com>
Date:   Mon Feb 6 15:03:35 2023 +0100

    [WIP] Monadic refactoring

... and 35 more commits
2023-02-10 08:33:13 +01:00

44 lines
1.8 KiB
Python

from funcy import juxt
from image_prediction.classifier.classifier import Classifier
from image_prediction.classifier.image_classifier import ImageClassifier
from image_prediction.compositor.compositor import TransformerCompositor
from image_prediction.encoder.encoders.hash_encoder import HashEncoder
from image_prediction.estimator.adapter.adapter import EstimatorAdapter
from image_prediction.formatter.formatters.camel_case import Snake2CamelCaseKeyFormatter
from image_prediction.formatter.formatters.enum import EnumFormatter
from image_prediction.image_extractor.extractors.parsable import ParsablePDFImageExtractor
from image_prediction.label_mapper.mappers.probability import ProbabilityMapper
from image_prediction.model_loader.loader import ModelLoader
from image_prediction.model_loader.loaders.mlflow import MlflowConnector
from image_prediction.redai_adapter.mlflow import MlflowModelReader
from image_prediction.transformer.transformers.coordinate.pdfnet import PDFNetCoordinateTransformer
from image_prediction.transformer.transformers.response import ResponseTransformer
def get_mlflow_model_loader(mlruns_dir):
model_loader = ModelLoader(MlflowConnector(MlflowModelReader(mlruns_dir)))
return model_loader
def get_image_classifier(model_loader, model_identifier):
model, classes = juxt(model_loader.load_model, model_loader.load_classes)(model_identifier)
return ImageClassifier(Classifier(EstimatorAdapter(model), ProbabilityMapper(classes)))
def get_extractor(**kwargs):
image_extractor = ParsablePDFImageExtractor(**kwargs)
return image_extractor
def get_formatter():
formatter = TransformerCompositor(
PDFNetCoordinateTransformer(), EnumFormatter(), ResponseTransformer(), Snake2CamelCaseKeyFormatter()
)
return formatter
def get_encoder():
return HashEncoder()