From e4dc6631b54a84df43cd2be1e66a20b174f71b94 Mon Sep 17 00:00:00 2001
From: Matthias Bisping <Matthias.Bisping@iqser.com>
Date: Thu, 3 Feb 2022 11:44:11 +0100
Subject: [PATCH] Pull request #1: Setup

Merge in RR/fb_detr_prediction_container from setup to master

Squashed commit of the following:

commit 7fae4878d4250676367b7201fa163a4b67f79f84
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Feb 3 11:22:12 2022 +0100

    readded annotation to client

commit ff788030f6b3b342919a7fd31dfa66940033d7e1
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Feb 3 11:15:16 2022 +0100

    applied black

commit 3521444f678950a2772b725c6964751e0e655736
Merge: 4080aff 51d6597
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Feb 3 10:39:11 2022 +0100

    Merge branch 'setup' of ssh://git.iqser.com:2222/rr/fb_detr_prediction_container into setup

commit 4080affd21a02ad32c61fbd2027511f51a202d63
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Thu Feb 3 10:39:02 2022 +0100

    added poppler-utils installation to Dockerfile, since pdf2image is only a wrapper for it

commit 51d6597b056ae9ac693280f65a3f37d46b1276cf
Author: Julius Unverfehrt <Julius.Unverfehrt@iqser.com>
Date:   Thu Feb 3 09:43:35 2022 +0100

    Structure change for local backbone lookup (working now)

commit ac314d5148d6e026c67f00df45a8bbc70c15b52d
Author: Julius Unverfehrt <Julius.Unverfehrt@iqser.com>
Date:   Thu Feb 3 09:35:41 2022 +0100

    env bug fixed

commit 1c3221fe4956911b29fd8fede8d07dcdefad06d8
Author: Julius Unverfehrt <Julius.Unverfehrt@iqser.com>
Date:   Thu Feb 3 09:23:55 2022 +0100

    ENV correctly set now

commit 58069440583f1f78cfb2fb796fa4dc4a63e2916a
Author: Julius Unverfehrt <Julius.Unverfehrt@iqser.com>
Date:   Thu Feb 3 08:41:29 2022 +0100

    ENV for local torch model lookup set

commit f0501cf0bf904793e8e04afbd3d80ee84af9d981
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 18:28:44 2022 +0100

    changed host and port for flask

commit 986fda22f6656b10930628d0d284995b33ea2df5
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 17:33:07 2022 +0100

    added debug webserver method

commit 64b857ce53757ec2b7e7c327962fa65b551603a0
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 16:59:11 2022 +0100

    moved utils into module; fixed open-cv (maybe)

commit c62ada183135e12b41a29c6822472e33698f947f
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 15:55:10 2022 +0100

    made bash scripts executable

commit 982bdd7503c14fcf1776ae10c38589475199545e
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 15:35:16 2022 +0100

    service building logic added (WIP)

commit 46e5e3b8e67e54ecedaeee4765a3437f08fa4b17
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 14:37:28 2022 +0100

    applied black

commit ad93130e66d2e87bc86b2bf1de6234f3c037df48
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 14:36:09 2022 +0100

    fixed formatting (w, h -> x2, y2); added drawing logic to caller mock

commit df76f033599e66aaa52143f5e2b156530f643df9
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 13:54:34 2022 +0100

    page indices in predictions

commit 5e87c57dff752419486d1a44de9a734e3f840816
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 13:17:34 2022 +0100

    service main loop WIP (working in basic version)

commit ba5ec3d57621d090201413309126955940602be9
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 13:03:52 2022 +0100

    service main loop WIP

commit 77266f6982ec826eadcdd8a18c5ccf0fc380611b
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 11:24:27 2022 +0100

    fixed bug for self.classes == None

commit 858ef7589d6914ad503660a3ddc5e75bf72a6bb7
Author: Matthias Bisping <matthias.bisping@iqser.com>
Date:   Wed Feb 2 11:09:11 2022 +0100

    removed 'postprocessors' argument and attribute

... and 32 more commits
---
 .dvc/.gitignore                               |   3 +
 .dvc/config                                   |   6 +
 .dvc/plots/confusion.json                     | 107 ++++++++++++++++
 .dvc/plots/confusion_normalized.json          | 112 ++++++++++++++++
 .dvc/plots/linear.json                        | 116 +++++++++++++++++
 .dvc/plots/scatter.json                       | 104 +++++++++++++++
 .dvc/plots/simple.json                        |  31 +++++
 .dvc/plots/smooth.json                        |  39 ++++++
 .dvcignore                                    |   3 +
 .gitmodules                                   |   3 +
 Dockerfile                                    |  35 +++++
 __init__.py                                   |   0
 config.yaml                                   |   7 +
 data/.gitignore                               |   1 +
 data/checkpoint.pth.dvc                       |   4 +
 data/hub/checkpoints/.gitignore               |   1 +
 .../hub/checkpoints/resnet50-0676ba61.pth.dvc |   4 +
 fb_detr/__init__.py                           |   0
 fb_detr/locations.py                          |   7 +
 fb_detr/predictor.py                          | 121 ++++++++++++++++++
 fb_detr/utils/__init__.py                     |   0
 fb_detr/utils/config.py                       |  18 +++
 incl/__init__.py                              |   0
 incl/detr                                     |   1 +
 requirements.txt                              |  14 ++
 scripts/client_mock.py                        |  58 +++++++++
 scripts/flask_test.py                         |  35 +++++
 scripts/predict.py                            |  58 +++++++++
 setup.py                                      |  13 ++
 setup/docker.sh                               |  14 ++
 setup/docker_local.sh                         |   8 ++
 src/run_service.py                            |  66 ++++++++++
 32 files changed, 989 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvc/plots/confusion.json
 create mode 100644 .dvc/plots/confusion_normalized.json
 create mode 100644 .dvc/plots/linear.json
 create mode 100644 .dvc/plots/scatter.json
 create mode 100644 .dvc/plots/simple.json
 create mode 100644 .dvc/plots/smooth.json
 create mode 100644 .dvcignore
 create mode 100644 .gitmodules
 create mode 100644 Dockerfile
 create mode 100644 __init__.py
 create mode 100644 config.yaml
 create mode 100644 data/.gitignore
 create mode 100644 data/checkpoint.pth.dvc
 create mode 100644 data/hub/checkpoints/.gitignore
 create mode 100644 data/hub/checkpoints/resnet50-0676ba61.pth.dvc
 create mode 100644 fb_detr/__init__.py
 create mode 100644 fb_detr/locations.py
 create mode 100644 fb_detr/predictor.py
 create mode 100644 fb_detr/utils/__init__.py
 create mode 100644 fb_detr/utils/config.py
 create mode 100644 incl/__init__.py
 create mode 160000 incl/detr
 create mode 100644 requirements.txt
 create mode 100644 scripts/client_mock.py
 create mode 100644 scripts/flask_test.py
 create mode 100644 scripts/predict.py
 create mode 100644 setup.py
 create mode 100755 setup/docker.sh
 create mode 100755 setup/docker_local.sh
 create mode 100644 src/run_service.py
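
Reviewer note: the rejection-class filtering in fb_detr/predictor.py (Predictor.filter_predictions) is easier to follow with a small standalone sketch. The version below is only an illustration under stated assumptions: it uses plain dicts and lists instead of the model output, the helper name drop_rejected and the sample data are hypothetical, and unlike the patch it copies each prediction instead of mutating it in place. The intended behavior is the same: boxes of the rejection class are dropped, pages left without detections are skipped, and each remaining page keeps its original index.

```python
from itertools import compress


def drop_rejected(prediction, rejection_class):
    """Return a copy of `prediction` without boxes of `rejection_class` (hypothetical helper)."""
    boxes, classes = prediction["bboxes"], prediction["classes"]
    keep = [c != rejection_class for c in classes]
    return {
        **prediction,
        "bboxes": list(compress(boxes, keep)),
        "classes": list(compress(classes, keep)),
    }


def filter_predictions(predictions, rejection_class):
    """Yield one dict per page that still has detections, tagged with its page index."""
    for page_idx, prediction in enumerate(predictions):
        kept = drop_rejected(prediction, rejection_class)
        if kept["classes"]:
            yield {"page_idx": page_idx, **kept}


# Made-up sample data: page 0 holds only a rejected box, page 1 holds one of each.
pages = [
    {"bboxes": [{"x1": 0, "y1": 0, "x2": 10, "y2": 10}], "classes": ["other"]},
    {
        "bboxes": [
            {"x1": 1, "y1": 2, "x2": 30, "y2": 40},
            {"x1": 5, "y1": 5, "x2": 9, "y2": 9},
        ],
        "classes": ["logo", "other"],
    },
]
result = list(filter_predictions(pages, rejection_class="other"))
```

With this sample input, only page 1 survives, and it keeps its "logo" box together with page_idx 1, which is what scripts/client_mock.py relies on when it looks up the page to annotate.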

diff --git a/.dvc/.gitignore b/.dvc/.gitignore
new file mode 100644
index 0000000..528f30c
--- /dev/null
+++ b/.dvc/.gitignore
@@ -0,0 +1,3 @@
+/config.local
+/tmp
+/cache
diff --git a/.dvc/config b/.dvc/config
new file mode 100644
index 0000000..bde583a
--- /dev/null
+++ b/.dvc/config
@@ -0,0 +1,6 @@
+[core]
+    remote = vector
+    autostage = true
+['remote "vector"']
+    url = ssh://vector.iqser.com/research/detr_server/
+    port = 22
diff --git a/.dvc/plots/confusion.json b/.dvc/plots/confusion.json
new file mode 100644
index 0000000..af1b48d
--- /dev/null
+++ b/.dvc/plots/confusion.json
@@ -0,0 +1,107 @@
+{
+    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
+    "data": {
+        "values": "<DVC_METRIC_DATA>"
+    },
+    "title": "<DVC_METRIC_TITLE>",
+    "facet": {
+        "field": "rev",
+        "type": "nominal"
+    },
+    "spec": {
+        "transform": [
+            {
+                "aggregate": [
+                    {
+                        "op": "count",
+                        "as": "xy_count"
+                    }
+                ],
+                "groupby": [
+                    "<DVC_METRIC_Y>",
+                    "<DVC_METRIC_X>"
+                ]
+            },
+            {
+                "impute": "xy_count",
+                "groupby": [
+                    "rev",
+                    "<DVC_METRIC_Y>"
+                ],
+                "key": "<DVC_METRIC_X>",
+                "value": 0
+            },
+            {
+                "impute": "xy_count",
+                "groupby": [
+                    "rev",
+                    "<DVC_METRIC_X>"
+                ],
+                "key": "<DVC_METRIC_Y>",
+                "value": 0
+            },
+            {
+                "joinaggregate": [
+                    {
+                        "op": "max",
+                        "field": "xy_count",
+                        "as": "max_count"
+                    }
+                ],
+                "groupby": []
+            },
+            {
+                "calculate": "datum.xy_count / datum.max_count",
+                "as": "percent_of_max"
+            }
+        ],
+        "encoding": {
+            "x": {
+                "field": "<DVC_METRIC_X>",
+                "type": "nominal",
+                "sort": "ascending",
+                "title": "<DVC_METRIC_X_LABEL>"
+            },
+            "y": {
+                "field": "<DVC_METRIC_Y>",
+                "type": "nominal",
+                "sort": "ascending",
+                "title": "<DVC_METRIC_Y_LABEL>"
+            }
+        },
+        "layer": [
+            {
+                "mark": "rect",
+                "width": 300,
+                "height": 300,
+                "encoding": {
+                    "color": {
+                        "field": "xy_count",
+                        "type": "quantitative",
+                        "title": "",
+                        "scale": {
+                            "domainMin": 0,
+                            "nice": true
+                        }
+                    }
+                }
+            },
+            {
+                "mark": "text",
+                "encoding": {
+                    "text": {
+                        "field": "xy_count",
+                        "type": "quantitative"
+                    },
+                    "color": {
+                        "condition": {
+                            "test": "datum.percent_of_max > 0.5",
+                            "value": "white"
+                        },
+                        "value": "black"
+                    }
+                }
+            }
+        ]
+    }
+}
diff --git a/.dvc/plots/confusion_normalized.json b/.dvc/plots/confusion_normalized.json
new file mode 100644
index 0000000..1d38849
--- /dev/null
+++ b/.dvc/plots/confusion_normalized.json
@@ -0,0 +1,112 @@
+{
+    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
+    "data": {
+        "values": "<DVC_METRIC_DATA>"
+    },
+    "title": "<DVC_METRIC_TITLE>",
+    "facet": {
+        "field": "rev",
+        "type": "nominal"
+    },
+    "spec": {
+        "transform": [
+            {
+                "aggregate": [
+                    {
+                        "op": "count",
+                        "as": "xy_count"
+                    }
+                ],
+                "groupby": [
+                    "<DVC_METRIC_Y>",
+                    "<DVC_METRIC_X>"
+                ]
+            },
+            {
+                "impute": "xy_count",
+                "groupby": [
+                    "rev",
+                    "<DVC_METRIC_Y>"
+                ],
+                "key": "<DVC_METRIC_X>",
+                "value": 0
+            },
+            {
+                "impute": "xy_count",
+                "groupby": [
+                    "rev",
+                    "<DVC_METRIC_X>"
+                ],
+                "key": "<DVC_METRIC_Y>",
+                "value": 0
+            },
+            {
+                "joinaggregate": [
+                    {
+                        "op": "sum",
+                        "field": "xy_count",
+                        "as": "sum_y"
+                    }
+                ],
+                "groupby": [
+                    "<DVC_METRIC_Y>"
+                ]
+            },
+            {
+                "calculate": "datum.xy_count / datum.sum_y",
+                "as": "percent_of_y"
+            }
+        ],
+        "encoding": {
+            "x": {
+                "field": "<DVC_METRIC_X>",
+                "type": "nominal",
+                "sort": "ascending",
+                "title": "<DVC_METRIC_X_LABEL>"
+            },
+            "y": {
+                "field": "<DVC_METRIC_Y>",
+                "type": "nominal",
+                "sort": "ascending",
+                "title": "<DVC_METRIC_Y_LABEL>"
+            }
+        },
+        "layer": [
+            {
+                "mark": "rect",
+                "width": 300,
+                "height": 300,
+                "encoding": {
+                    "color": {
+                        "field": "percent_of_y",
+                        "type": "quantitative",
+                        "title": "",
+                        "scale": {
+                            "domain": [
+                                0,
+                                1
+                            ]
+                        }
+                    }
+                }
+            },
+            {
+                "mark": "text",
+                "encoding": {
+                    "text": {
+                        "field": "percent_of_y",
+                        "type": "quantitative",
+                        "format": ".2f"
+                    },
+                    "color": {
+                        "condition": {
+                            "test": "datum.percent_of_y > 0.5",
+                            "value": "white"
+                        },
+                        "value": "black"
+                    }
+                }
+            }
+        ]
+    }
+}
diff --git a/.dvc/plots/linear.json b/.dvc/plots/linear.json
new file mode 100644
index 0000000..65549f9
--- /dev/null
+++ b/.dvc/plots/linear.json
@@ -0,0 +1,116 @@
+{
+    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
+    "data": {
+        "values": "<DVC_METRIC_DATA>"
+    },
+    "title": "<DVC_METRIC_TITLE>",
+    "width": 300,
+    "height": 300,
+    "layer": [
+        {
+            "encoding": {
+                "x": {
+                    "field": "<DVC_METRIC_X>",
+                    "type": "quantitative",
+                    "title": "<DVC_METRIC_X_LABEL>"
+                },
+                "y": {
+                    "field": "<DVC_METRIC_Y>",
+                    "type": "quantitative",
+                    "title": "<DVC_METRIC_Y_LABEL>",
+                    "scale": {
+                        "zero": false
+                    }
+                },
+                "color": {
+                    "field": "rev",
+                    "type": "nominal"
+                }
+            },
+            "layer": [
+                {
+                    "mark": "line"
+                },
+                {
+                    "selection": {
+                        "label": {
+                            "type": "single",
+                            "nearest": true,
+                            "on": "mouseover",
+                            "encodings": [
+                                "x"
+                            ],
+                            "empty": "none",
+                            "clear": "mouseout"
+                        }
+                    },
+                    "mark": "point",
+                    "encoding": {
+                        "opacity": {
+                            "condition": {
+                                "selection": "label",
+                                "value": 1
+                            },
+                            "value": 0
+                        }
+                    }
+                }
+            ]
+        },
+        {
+            "transform": [
+                {
+                    "filter": {
+                        "selection": "label"
+                    }
+                }
+            ],
+            "layer": [
+                {
+                    "mark": {
+                        "type": "rule",
+                        "color": "gray"
+                    },
+                    "encoding": {
+                        "x": {
+                            "field": "<DVC_METRIC_X>",
+                            "type": "quantitative"
+                        }
+                    }
+                },
+                {
+                    "encoding": {
+                        "text": {
+                            "type": "quantitative",
+                            "field": "<DVC_METRIC_Y>"
+                        },
+                        "x": {
+                            "field": "<DVC_METRIC_X>",
+                            "type": "quantitative"
+                        },
+                        "y": {
+                            "field": "<DVC_METRIC_Y>",
+                            "type": "quantitative"
+                        }
+                    },
+                    "layer": [
+                        {
+                            "mark": {
+                                "type": "text",
+                                "align": "left",
+                                "dx": 5,
+                                "dy": -5
+                            },
+                            "encoding": {
+                                "color": {
+                                    "type": "nominal",
+                                    "field": "rev"
+                                }
+                            }
+                        }
+                    ]
+                }
+            ]
+        }
+    ]
+}
diff --git a/.dvc/plots/scatter.json b/.dvc/plots/scatter.json
new file mode 100644
index 0000000..9af9304
--- /dev/null
+++ b/.dvc/plots/scatter.json
@@ -0,0 +1,104 @@
+{
+    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
+    "data": {
+        "values": "<DVC_METRIC_DATA>"
+    },
+    "title": "<DVC_METRIC_TITLE>",
+    "width": 300,
+    "height": 300,
+    "layer": [
+        {
+            "encoding": {
+                "x": {
+                    "field": "<DVC_METRIC_X>",
+                    "type": "quantitative",
+                    "title": "<DVC_METRIC_X_LABEL>"
+                },
+                "y": {
+                    "field": "<DVC_METRIC_Y>",
+                    "type": "quantitative",
+                    "title": "<DVC_METRIC_Y_LABEL>",
+                    "scale": {
+                        "zero": false
+                    }
+                },
+                "color": {
+                    "field": "rev",
+                    "type": "nominal"
+                }
+            },
+            "layer": [
+                {
+                    "mark": "point"
+                },
+                {
+                    "selection": {
+                        "label": {
+                            "type": "single",
+                            "nearest": true,
+                            "on": "mouseover",
+                            "encodings": [
+                                "x"
+                            ],
+                            "empty": "none",
+                            "clear": "mouseout"
+                        }
+                    },
+                    "mark": "point",
+                    "encoding": {
+                        "opacity": {
+                            "condition": {
+                                "selection": "label",
+                                "value": 1
+                            },
+                            "value": 0
+                        }
+                    }
+                }
+            ]
+        },
+        {
+            "transform": [
+                {
+                    "filter": {
+                        "selection": "label"
+                    }
+                }
+            ],
+            "layer": [
+                {
+                    "encoding": {
+                        "text": {
+                            "type": "quantitative",
+                            "field": "<DVC_METRIC_Y>"
+                        },
+                        "x": {
+                            "field": "<DVC_METRIC_X>",
+                            "type": "quantitative"
+                        },
+                        "y": {
+                            "field": "<DVC_METRIC_Y>",
+                            "type": "quantitative"
+                        }
+                    },
+                    "layer": [
+                        {
+                            "mark": {
+                                "type": "text",
+                                "align": "left",
+                                "dx": 5,
+                                "dy": -5
+                            },
+                            "encoding": {
+                                "color": {
+                                    "type": "nominal",
+                                    "field": "rev"
+                                }
+                            }
+                        }
+                    ]
+                }
+            ]
+        }
+    ]
+}
diff --git a/.dvc/plots/simple.json b/.dvc/plots/simple.json
new file mode 100644
index 0000000..9cf71ce
--- /dev/null
+++ b/.dvc/plots/simple.json
@@ -0,0 +1,31 @@
+{
+    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
+    "data": {
+        "values": "<DVC_METRIC_DATA>"
+    },
+    "title": "<DVC_METRIC_TITLE>",
+    "width": 300,
+    "height": 300,
+    "mark": {
+        "type": "line"
+    },
+    "encoding": {
+        "x": {
+            "field": "<DVC_METRIC_X>",
+            "type": "quantitative",
+            "title": "<DVC_METRIC_X_LABEL>"
+        },
+        "y": {
+            "field": "<DVC_METRIC_Y>",
+            "type": "quantitative",
+            "title": "<DVC_METRIC_Y_LABEL>",
+            "scale": {
+                "zero": false
+            }
+        },
+        "color": {
+            "field": "rev",
+            "type": "nominal"
+        }
+    }
+}
diff --git a/.dvc/plots/smooth.json b/.dvc/plots/smooth.json
new file mode 100644
index 0000000..d497ce7
--- /dev/null
+++ b/.dvc/plots/smooth.json
@@ -0,0 +1,39 @@
+{
+    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
+    "data": {
+        "values": "<DVC_METRIC_DATA>"
+    },
+    "title": "<DVC_METRIC_TITLE>",
+    "mark": {
+        "type": "line"
+    },
+    "encoding": {
+        "x": {
+            "field": "<DVC_METRIC_X>",
+            "type": "quantitative",
+            "title": "<DVC_METRIC_X_LABEL>"
+        },
+        "y": {
+            "field": "<DVC_METRIC_Y>",
+            "type": "quantitative",
+            "title": "<DVC_METRIC_Y_LABEL>",
+            "scale": {
+                "zero": false
+            }
+        },
+        "color": {
+            "field": "rev",
+            "type": "nominal"
+        }
+    },
+    "transform": [
+        {
+            "loess": "<DVC_METRIC_Y>",
+            "on": "<DVC_METRIC_X>",
+            "groupby": [
+                "rev"
+            ],
+            "bandwidth": 0.3
+        }
+    ]
+}
diff --git a/.dvcignore b/.dvcignore
new file mode 100644
index 0000000..5197305
--- /dev/null
+++ b/.dvcignore
@@ -0,0 +1,3 @@
+# Add patterns of files dvc should ignore, which could improve
+# the performance. Learn more at
+# https://dvc.org/doc/user-guide/dvcignore
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..6ea2203
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "incl/detr"]
+	path = incl/detr
+	url = ssh://git@git.iqser.com:2222/rr/detr.git
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..04c159b
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,35 @@
+FROM python:3.8 as builder1
+
+# Use a virtual environment.
+RUN python -m venv /app/venv
+ENV PATH="/app/venv/bin:$PATH"
+
+# Upgrade pip.
+RUN python -m pip install --upgrade pip
+
+# Make a directory for the service files and copy the service repo into the container.
+WORKDIR /app/service
+COPY . ./
+
+# Set up service as a module and install all its dependencies.
+RUN bash setup/docker_local.sh
+
+# Start a fresh build stage and copy over only the relevant files, leaving behind
+# temporary files produced during setup to reduce the final image's size.
+FROM python:3.8
+
+WORKDIR /app/
+COPY --from=builder1  /app .
+ENV PATH="/app/venv/bin:$PATH"
+
+WORKDIR /app/service
+
+RUN apt-get update --yes \
+    && apt-get install --yes vim poppler-utils \
+    && rm -rf /var/lib/apt/lists/*
+
+EXPOSE 5000
+EXPOSE 8080
+
+# Run the service loop.
+CMD ["python3", "src/run_service.py"]
diff --git a/__init__.py b/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/config.yaml b/config.yaml
new file mode 100644
index 0000000..f1b38a3
--- /dev/null
+++ b/config.yaml
@@ -0,0 +1,7 @@
+device: cpu
+threshold: .5
+
+classes: ["logo", "other", "formula", "signature", "handwriting_other"]
+rejection_class: "other"
+
+checkpoint: checkpoint.pth
diff --git a/data/.gitignore b/data/.gitignore
new file mode 100644
index 0000000..65ac288
--- /dev/null
+++ b/data/.gitignore
@@ -0,0 +1 @@
+/checkpoint.pth
diff --git a/data/checkpoint.pth.dvc b/data/checkpoint.pth.dvc
new file mode 100644
index 0000000..7707825
--- /dev/null
+++ b/data/checkpoint.pth.dvc
@@ -0,0 +1,4 @@
+outs:
+- md5: 9face65530febd41a0722e0513da2264
+  size: 496696129
+  path: checkpoint.pth
diff --git a/data/hub/checkpoints/.gitignore b/data/hub/checkpoints/.gitignore
new file mode 100644
index 0000000..17c6958
--- /dev/null
+++ b/data/hub/checkpoints/.gitignore
@@ -0,0 +1 @@
+/resnet50-0676ba61.pth
diff --git a/data/hub/checkpoints/resnet50-0676ba61.pth.dvc b/data/hub/checkpoints/resnet50-0676ba61.pth.dvc
new file mode 100644
index 0000000..1110d26
--- /dev/null
+++ b/data/hub/checkpoints/resnet50-0676ba61.pth.dvc
@@ -0,0 +1,4 @@
+outs:
+- md5: b94941323912291bb67db6fdb1d80c11
+  size: 102530333
+  path: resnet50-0676ba61.pth
diff --git a/fb_detr/__init__.py b/fb_detr/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/fb_detr/locations.py b/fb_detr/locations.py
new file mode 100644
index 0000000..264cdda
--- /dev/null
+++ b/fb_detr/locations.py
@@ -0,0 +1,7 @@
+from pathlib import Path
+
+
+MODULE_ROOT = Path(__file__).resolve().parents[1]
+CONFIG_FILE = MODULE_ROOT / "config.yaml"
+DATA_DIR = MODULE_ROOT / "data"
+TORCH_HOME = DATA_DIR
diff --git a/fb_detr/predictor.py b/fb_detr/predictor.py
new file mode 100644
index 0000000..8055120
--- /dev/null
+++ b/fb_detr/predictor.py
@@ -0,0 +1,121 @@
+import argparse
+from itertools import compress, starmap
+from operator import itemgetter
+from pathlib import Path
+from typing import Iterable
+
+import torch
+from detr.models import build_model
+from detr.test import get_args_parser, infer
+from iteration_utilities import starfilter
+
+from fb_detr.utils.config import read_config
+
+
+def load_model(checkpoint_path):
+
+    parser = argparse.ArgumentParser(parents=[get_args_parser()])
+    args = parser.parse_args()
+
+    if args.output_dir:
+        Path(args.output_dir).mkdir(parents=True, exist_ok=True)
+
+    device = torch.device(read_config("device"))
+
+    model, _, _ = build_model(args)
+
+    checkpoint = torch.load(checkpoint_path, map_location="cpu")
+    model.load_state_dict(checkpoint["model"])
+
+    model.to(device)
+
+    return model
+
+
+class Predictor:
+    def __init__(self, checkpoint_path, classes=None, rejection_class=None):
+        self.model = load_model(checkpoint_path)
+        self.classes = classes
+        self.rejection_class = rejection_class
+
+    @staticmethod
+    def __format_boxes(boxes):
+
+        keys = "x1", "y1", "x2", "y2"
+
+        x1s = boxes[:, 0].tolist()
+        y1s = boxes[:, 1].tolist()
+        x2s = boxes[:, 2].tolist()
+        y2s = boxes[:, 3].tolist()
+
+        boxes = [dict(zip(keys, vs)) for vs in zip(x1s, y1s, x2s, y2s)]
+
+        return boxes
+
+    @staticmethod
+    def __normalize_to_tuple(maybe_multiple):
+        return maybe_multiple if isinstance(maybe_multiple, tuple) else (maybe_multiple,)
+
+    def __format_classes(self, classes):
+        if self.classes:
+            return self.__normalize_to_tuple(itemgetter(*classes.tolist())(self.classes))
+        else:
+            return classes.tolist()
+
+    def __format_prediction(self, output: dict):
+
+        boxes, classes = itemgetter("bboxes", "classes")(output)
+
+        if len(boxes):
+            boxes = self.__format_boxes(boxes)
+            classes = self.__format_classes(classes)
+        else:
+            boxes, classes = [], []
+
+        output["bboxes"] = boxes
+        output["classes"] = classes
+
+        return output
+
+    def __filter_predictions_for_image(self, predictions):
+
+        boxes, classes = itemgetter("bboxes", "classes")(predictions)
+
+        if boxes:
+            keep = map(lambda c: c != self.rejection_class, classes)
+            compressed = list(compress(zip(boxes, classes), keep))
+            boxes, classes = map(list, zip(*compressed)) if compressed else ([], [])
+            predictions["bboxes"] = boxes
+            predictions["classes"] = classes
+
+        return predictions
+
+    def filter_predictions(self, predictions):
+        def detections_present(_, prediction):
+            return bool(prediction["classes"])
+
+        def build_return_dict(page_idx, predictions):
+            return {"page_idx": page_idx, **predictions}
+
+        filtered_rejections = map(self.__filter_predictions_for_image, predictions)
+        filtered_no_detections = starfilter(detections_present, enumerate(filtered_rejections))
+        filtered_no_detections = starmap(build_return_dict, filtered_no_detections)
+
+        return filtered_no_detections
+
+    def format_predictions(self, outputs: Iterable):
+        return map(self.__format_prediction, outputs)
+
+    def predict(self, images, threshold=None, format_output=False):
+
+        if threshold is None:
+            threshold = read_config("threshold")
+
+        predictions = infer(images, self.model, read_config("device"), threshold)
+
+        if format_output:
+            predictions = self.format_predictions(predictions)
+            if self.rejection_class:
+                predictions = self.filter_predictions(predictions)
+
+        return predictions
diff --git a/fb_detr/utils/__init__.py b/fb_detr/utils/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/fb_detr/utils/config.py b/fb_detr/utils/config.py
new file mode 100644
index 0000000..c0d9334
--- /dev/null
+++ b/fb_detr/utils/config.py
@@ -0,0 +1,18 @@
+import yaml
+
+from fb_detr.locations import CONFIG_FILE
+
+
+def read_config(key, config_path: str = CONFIG_FILE):
+    """Reads the value associated with a key from a YAML config file.
+
+    Args:
+        key: Key whose value to look up.
+        config_path: Path to the config file.
+
+    Returns:
+        The value associated with `key`.
+    """
+    with open(config_path) as f:
+        config = yaml.safe_load(f)
+        return config[key]
diff --git a/incl/__init__.py b/incl/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/incl/detr b/incl/detr
new file mode 160000
index 0000000..7e3258c
--- /dev/null
+++ b/incl/detr
@@ -0,0 +1 @@
+Subproject commit 7e3258ccc1fa2be7a9d8ab333873b79de7005809
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..7d4c102
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,14 @@
+torch==1.10.2
+numpy==1.22.1
+#opencv-python==4.5.5.62
+opencv-python-headless==4.5.5.62
+torchvision==0.11.3
+pycocotools==2.0.4
+scipy==1.7.3
+pdf2image==1.16.0
+PyYAML==6.0
+Flask==2.0.2
+requests==2.27.1
+iteration-utilities==0.11.0
+dvc==2.9.3
+dvc[ssh]
diff --git a/scripts/client_mock.py b/scripts/client_mock.py
new file mode 100644
index 0000000..e3960cf
--- /dev/null
+++ b/scripts/client_mock.py
@@ -0,0 +1,58 @@
+import argparse
+import json
+from operator import itemgetter
+
+import pdf2image
+import requests
+from PIL import ImageDraw
+
+
+def draw_coco_box(draw: ImageDraw.ImageDraw, bbox, klass):
+    x1, y1, x2, y2 = itemgetter("x1", "y1", "x2", "y2")(bbox)
+    draw.rectangle(((x1, y1), (x2, y2)), outline="red")
+    draw.text((x1, y1), text=klass, fill=(0, 0, 0, 100))
+
+
+def draw_coco_boxes(image, bboxes, classes):
+
+    draw = ImageDraw.Draw(image)
+    for bbox, klass in zip(bboxes, classes):
+        draw_coco_box(draw, bbox, klass)
+
+    return image
+
+
+def annotate(pdf_path, predictions):
+    pages = pdf2image.convert_from_path(pdf_path)
+
+    for prd in predictions:
+        page_idx, boxes, classes = itemgetter("page_idx", "bboxes", "classes")(prd)
+        page = pages[page_idx]
+        image = draw_coco_boxes(page, boxes, classes)
+        image.save(f"/tmp/serv_out/{page_idx}.png")
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--pdf_path", required=True)
+    args = parser.parse_args()
+
+    return args
+
+
+def main(args):
+    with open(args.pdf_path, "rb") as pdf:
+        response = requests.post("http://localhost:8080", data=pdf)
+
+    response.raise_for_status()
+
+    predictions = response.json()
+
+    print(json.dumps(predictions, indent=2))
+
+    annotate(args.pdf_path, predictions)
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    main(args)
diff --git a/scripts/flask_test.py b/scripts/flask_test.py
new file mode 100644
index 0000000..2934f11
--- /dev/null
+++ b/scripts/flask_test.py
@@ -0,0 +1,35 @@
+import argparse
+
+from PIL import Image
+from flask import Flask, request, jsonify
+from pathlib import Path
+
+app = Flask(__name__)
+
+
+@app.before_first_request
+def init():
+    from fb_detr.predictor import Predictor
+
+    global PRED
+
+    PRED = Predictor(args.resume)
+
+
+@app.route("/", methods=["GET", "POST"])
+def predict_request():
+    if request.method == "POST":
+        image_folder_path = request.form.get("image_folder_path")
+        images = list(map(Image.open, Path(image_folder_path).glob("*.png")))
+        results = PRED.predict(images, format_output=True)
+        # Return every result; a `return` inside a loop only sends the first one.
+        return jsonify(list(results))
+    if request.method == "GET":
+        return "Not implemented"
+
+
+parser = argparse.ArgumentParser()
+parser.add_argument("--resume", required=True)
+args = parser.parse_args()
+
+app.run()
diff --git a/scripts/predict.py b/scripts/predict.py
new file mode 100644
index 0000000..3c13358
--- /dev/null
+++ b/scripts/predict.py
@@ -0,0 +1,58 @@
+import argparse
+import json
+from pathlib import Path
+
+from detr.test import draw_boxes
+from pdf2image import pdf2image
+
+from fb_detr.predictor import Predictor
+
+
+def parse_args():
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--resume", required=True)
+    parser.add_argument("--output_dir", required=True)
+    parser.add_argument("--pdf_path", required=True)
+    parser.add_argument("--draw_boxes", default=False, action="store_true")
+
+    args = parser.parse_args()
+
+    return args
+
+
+def build_image_paths(image_root_dir):
+    return [*map(str, Path(image_root_dir).glob("*.png"))]
+
+
+def pdf_to_pages(pdf_path):
+    pages = pdf2image.convert_from_path(pdf_path)
+    return pages
+
+
+def main():
+
+    # TODO: de-hardcode
+
+    classes = {1: "logo", 2: "other", 3: "formula", 4: "signature", 5: "handwriting_other"}
+
+    args = parse_args()
+    predictor = Predictor(args.resume, classes=classes, rejection_class="other")
+
+    images = pdf_to_pages(args.pdf_path)
+    outputs = predictor.predict(images, 0.5)
+
+    if args.draw_boxes:
+        for im, o in zip(images, outputs):
+            if len(o["bboxes"]):
+                draw_boxes(image=im, **o, output_path=args.output_dir)
+
+    else:
+        outputs = predictor.format_predictions(outputs)
+        outputs = predictor.filter_predictions(outputs)
+        for o in outputs:
+            print(json.dumps(o, indent=2))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/setup.py b/setup.py
new file mode 100644
index 0000000..21d77de
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,13 @@
+#!/usr/bin/env python
+
+from setuptools import setup
+
+setup(
+    name="fb_detr",
+    version="0.1.0",
+    description="",
+    author="",
+    author_email="",
+    url="",
+    packages=["fb_detr", "fb_detr.utils"],
+)
diff --git a/setup/docker.sh b/setup/docker.sh
new file mode 100755
index 0000000..537d3d3
--- /dev/null
+++ b/setup/docker.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+set -e
+
+python3 -m venv build_venv
+source build_venv/bin/activate
+python3 -m pip install --upgrade pip
+
+pip install dvc
+pip install 'dvc[ssh]'
+dvc pull
+
+git submodule update --init --recursive
+
+docker build -t detr-server .
diff --git a/setup/docker_local.sh b/setup/docker_local.sh
new file mode 100755
index 0000000..1b951d1
--- /dev/null
+++ b/setup/docker_local.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+set -e
+
+pip install -e .
+pip install -r requirements.txt
+
+cd incl/detr
+pip install -e .
diff --git a/src/run_service.py b/src/run_service.py
new file mode 100644
index 0000000..58528b2
--- /dev/null
+++ b/src/run_service.py
@@ -0,0 +1,66 @@
+import argparse
+import os
+
+from fb_detr.locations import DATA_DIR
+from fb_detr.locations import TORCH_HOME
+from fb_detr.predictor import Predictor
+from flask import Flask, request, jsonify
+from pdf2image import pdf2image
+from fb_detr.utils.config import read_config
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--resume")
+    args = parser.parse_args()
+
+    return args
+
+
+def load_classes():
+    classes = read_config("classes")
+    id2class = dict(enumerate(classes, start=1))
+    return id2class
+
+
+def get_checkpoint():
+    return DATA_DIR / read_config("checkpoint")
+
+
+def set_torch_env():
+    os.environ["TORCH_HOME"] = str(TORCH_HOME)
+
+
+def main(args):
+    set_torch_env()
+
+    def initialize_predictor():
+        checkpoint = args.resume or get_checkpoint()
+        predictor = Predictor(checkpoint, classes=load_classes(), rejection_class=read_config("rejection_class"))
+        return predictor
+
+    app = Flask(__name__)
+
+    @app.route("/", methods=["POST"])
+    def predict_request():
+
+        pdf = request.data
+
+        pages = pdf2image.convert_from_bytes(pdf)
+        predictions = predictor.predict(pages, format_output=True)
+
+        return jsonify(list(predictions))
+
+    @app.route("/status", methods=["GET"])
+    def status():
+        response = "OK"
+        return jsonify(response)
+
+    predictor = initialize_predictor()
+
+    app.run(host="0.0.0.0", port=8080)
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    main(args)