Already flattened PDFs cannot be processed
If you create an exam with a flattened PDF, the downloaded copies cannot be processed.
Steps to reproduce:
- Create a new exam with any flattened PDF, for example tests/data/flattened-a4-2pages.pdf.
- Finalize the exam and download a copy
- Upload it to the submissions
- It will fail to process:
Failed on all x pages.
The reason is that the original PDF already contains exactly one image that meets the specifications needed for pikepdf. When you download a copy, the data matrix is drawn as an inline image, which does not count towards the total of image resources on the page. When our pipeline tries to extract the image from the PDF with pikepdf, it sees only the original image and thinks it is a scan. This original image of course does not contain a data matrix, and thus the processing fails.
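To make the failure mode concrete, here is a minimal sketch of the kind of check described above. It models a page as plain data instead of calling pikepdf, and the names PageModel and looks_like_scan are hypothetical, not the pipeline's actual code. The "exactly one image resource" rule is an assumption based on the description in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class PageModel:
    """Simplified stand-in for a PDF page (hypothetical, for illustration only)."""
    # Images listed in the page resources: what an image-resource lookup sees.
    resource_images: list = field(default_factory=list)
    # Images embedded directly in the content stream: invisible to that lookup.
    inline_images: list = field(default_factory=list)


def looks_like_scan(page: PageModel) -> bool:
    """Assumed heuristic: a page with exactly one image resource is a scan."""
    return len(page.resource_images) == 1


# Flattened original: one full-page image, no data matrix yet.
original = PageModel(resource_images=["full-page image"])

# Downloaded copy: the data matrix is added, but only as an inline image.
copy = PageModel(
    resource_images=["full-page image"],
    inline_images=["data matrix"],
)

# Both pages pass the scan check, so only the full-page image would be extracted.
print(looks_like_scan(original), looks_like_scan(copy))
```

In this model the downloaded copy is indistinguishable from a real scan, because the data matrix never shows up among the image resources.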
Possible solutions I can think of:
- Draw the data matrix as a regular image instead of an inline image. However, this could have performance implications, as the PDF would then need a resource table, which it probably didn't have before.
- Feed the page to Wand after reading the data matrix has failed? Unfortunately, this solution is a lot more work.
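The second option could look roughly like the sketch below. The function read_page and its three callables are hypothetical placeholders: extract_embedded stands for the existing pikepdf extraction, render_page for rasterizing the full page (for example with Wand), and read_data_matrix for the decoder. None of these names come from the actual code base.

```python
from typing import Any, Callable, Optional


def read_page(
    page: Any,
    extract_embedded: Callable[[Any], Optional[Any]],
    render_page: Callable[[Any], Any],
    read_data_matrix: Callable[[Any], Optional[str]],
) -> Optional[str]:
    """Try the embedded image first; render the whole page only if that fails."""
    image = extract_embedded(page)
    if image is not None:
        code = read_data_matrix(image)
        if code is not None:
            return code
    # Fallback: rasterize the full page so that inline images are included.
    return read_data_matrix(render_page(page))


# Usage with stub callables that mimic a downloaded copy of a flattened PDF:
# the embedded image has no data matrix, the rendered page does.
rendered_pages = []


def fake_render(page):
    rendered_pages.append(page)
    return "rendered page"


result = read_page(
    "page 1",
    extract_embedded=lambda page: "embedded image",
    render_page=fake_render,
    read_data_matrix=lambda img: "exam-code" if img == "rendered page" else None,
)
print(result)
```

Real scans would still take the fast path, since the page is only rendered after the embedded image fails to yield a data matrix.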
This is a minor issue though, and it is probably only relevant to testing, so I'm not sure if we want to fix it.