Split import and processing pipeline
When a user uploads a file:
- We save the file to disk, create a `Scan` entry, and start the importing task.
- In the importing task, we extract every image we can find in the file. We store this image information in the `Image` table, containing: `path`, `origin`, `status` and `scan_id`.
- After the import task is finished, we start the processing task for the `scan_id`.
- The processing task reads all `Image` entries from the database and processes them according to `status`, creating `Page` entries in the process.
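
The split between the two tasks can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `FakeDatabase`, `import_task` and `processing_task` are hypothetical names, the database is replaced by in-memory lists, and the real image extraction and page recognition are left out. It also assumes that `failed` images are simply skipped by the processing task.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Image:
    id: int
    path: str     # location of the extracted image on disk
    origin: str   # location of the image inside the uploaded file
    status: str   # "normal", "raw" or "failed"
    scan_id: int


@dataclass
class Page:
    id: int
    image_id: int


@dataclass
class FakeDatabase:
    """In-memory stand-in for the real database (illustration only)."""
    images: list = field(default_factory=list)
    pages: list = field(default_factory=list)
    ids: count = field(default_factory=lambda: count(1))


def import_task(db, scan_id, extracted):
    """Stage 1: record every extracted image, without processing it."""
    for path, origin, status in extracted:
        db.images.append(Image(next(db.ids), path, origin, status, scan_id))


def processing_task(db, scan_id):
    """Stage 2: create a Page for each unprocessed normal/raw image."""
    done = {page.image_id for page in db.pages}
    for image in db.images:
        if image.scan_id != scan_id or image.id in done:
            continue
        if image.status in ("normal", "raw"):  # assumption: failed is skipped
            db.pages.append(Page(next(db.ids), image.id))


db = FakeDatabase()
import_task(db, 1, [("1.png", "page 1", "normal"), ("2.png", "page 2", "failed")])
processing_task(db, 1)
```

Because the processing task only looks at `Image` entries, it can be run again for the same `scan_id` without repeating the import.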
The `Image` table:
- `id`: primary key
- `path`: path to the image on disk
- `origin`: path to the image in the original scan file
- `status`: one of `normal`, `raw`, `failed`
- `scan_id`: points to the scan
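
As a rough illustration, the columns above could be expressed as the following schema, shown here with Python's built-in `sqlite3` module. The table names, the `NOT NULL` constraints, the `CHECK` on `status` and the single-column `scan` table are assumptions made for this sketch; the project's real models may be declared differently.

```python
import sqlite3

SCHEMA = """
CREATE TABLE scan (
    id INTEGER PRIMARY KEY          -- placeholder; other Scan columns omitted
);
CREATE TABLE image (
    id      INTEGER PRIMARY KEY,
    path    TEXT NOT NULL,          -- path to the image on disk
    origin  TEXT NOT NULL,          -- path to the image in the original scan file
    status  TEXT NOT NULL CHECK (status IN ('normal', 'raw', 'failed')),
    scan_id INTEGER NOT NULL REFERENCES scan (id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the illustrative scan and image tables."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

With the `CHECK` constraint in place, inserting a row whose `status` is anything other than the three listed values raises `sqlite3.IntegrityError`.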
The `Page` table:
- Add `image_id`
- Remove `path`
We can deduce what to do with an `Image` in the following way (related to #455):
- The status `normal`/`raw` without a `Page` entry means: waiting for processing with the normal/raw pipeline.
- The status `normal` with a `Page` entry means: processed successfully with the normal pipeline, immutable.
- The status `raw` with a `Page` entry means: processed successfully with the raw pipeline; we still allow manually assigning a different student, page and copy.
- The status `failed` without a `Page` entry means: the user can either manually assign student, page and copy, or delete the image altogether.
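
The rules above amount to a small decision table over `status` and whether a `Page` entry exists. A minimal sketch, in which `ImageState` and `image_state` are hypothetical names; the combination of `failed` with a `Page` entry is not covered by the rules, so it is reported as unspecified rather than guessed:

```python
from enum import Enum


class ImageState(Enum):
    WAITING = "waiting for processing"
    PROCESSED_IMMUTABLE = "processed with the normal pipeline, immutable"
    PROCESSED_EDITABLE = "processed with the raw pipeline, manual reassignment allowed"
    NEEDS_MANUAL_ACTION = "user must assign student, page and copy, or delete"
    UNSPECIFIED = "combination not covered by the rules"


def image_state(status: str, has_page: bool) -> ImageState:
    """Map an image's status and the presence of a Page entry to a state."""
    if status not in ("normal", "raw", "failed"):
        raise ValueError(f"unknown status: {status!r}")
    if status == "failed":
        # Only 'failed' without a Page entry is described in the rules.
        return ImageState.UNSPECIFIED if has_page else ImageState.NEEDS_MANUAL_ACTION
    if not has_page:
        return ImageState.WAITING
    if status == "normal":
        return ImageState.PROCESSED_IMMUTABLE
    return ImageState.PROCESSED_EDITABLE
```

For example, `image_state("raw", True)` yields `ImageState.PROCESSED_EDITABLE`, while `image_state("raw", False)` yields `ImageState.WAITING`.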