Partially rewrite `image_extraction.py`

Ghost User requested to merge (removed):image-extraction into master

This MR makes the following internal changes to image_extraction.py:

  • Change the arguments of both the PikePDF and Wand extraction functions to take a pikepdf.Page instance instead of the page number and the pikepdf.Pdf instance. In turn, the for-loop now iterates over the enumerated (index, page) pairs of the PDF's pages.
  • extract_image_pikepdf now uses page.images to retrieve the images on the page. This slightly changes the function's behaviour, because it no longer depends on the XObject existing. As a result, the AttributeError is now only raised if the MediaBox attribute does not exist. The docstring and the unit test that evaluates this behaviour are updated to reflect the change.
  • extract_image_pikepdf now uses the actual difference in aspect ratio between the PDF page and the image, rather than the difference in ratio between width and height.
  • Add the description of the dpi argument of the extract_image_wand function to its docstring.
  • Use the allowed line length of 120 characters to the full extent.
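
For reviewers, the changes above can be illustrated with a minimal sketch. It is not the code from image_extraction.py: the names `aspect_ratio_difference`, `full_page_image_names`, `scan_pages` and the `tolerance` parameter are hypothetical, and the pages are stand-in objects that only mimic the `MediaBox` and `images` attributes of a pikepdf.Page, so the snippet runs without pikepdf installed.

```python
from types import SimpleNamespace


def aspect_ratio_difference(media_box, image_width, image_height):
    """Absolute difference between the page and image aspect ratios (width / height)."""
    x0, y0, x1, y1 = (float(value) for value in media_box)
    page_ratio = abs(x1 - x0) / abs(y1 - y0)
    image_ratio = image_width / image_height
    return abs(page_ratio - image_ratio)


def full_page_image_names(page, tolerance=0.01):
    """Hypothetical helper: names of images whose aspect ratio matches the page.

    Accessing page.MediaBox raises AttributeError when the attribute is missing;
    no XObject lookup is needed because the images come from page.images.
    """
    media_box = page.MediaBox
    names = []
    for name, image in page.images.items():
        # Assumption: each image exposes Width and Height, like a PDF image XObject.
        difference = aspect_ratio_difference(media_box, int(image.Width), int(image.Height))
        if difference <= tolerance:
            names.append(name)
    return names


def scan_pages(pages):
    """Loop over enumerated pages instead of passing a page number plus the whole PDF."""
    return [(index, full_page_image_names(page)) for index, page in enumerate(pages)]


# Stand-in for an A4 page holding one full-page scan and one square logo.
demo_page = SimpleNamespace(
    MediaBox=[0, 0, 595, 842],
    images={
        "/Im0": SimpleNamespace(Width=1190, Height=1684),
        "/Im1": SimpleNamespace(Width=100, Height=100),
    },
)
demo_result = scan_pages([demo_page])
```

With a real document, `pages` would be the `pages` attribute of an opened pikepdf.Pdf; the comparison threshold used in the actual module is not stated in this MR, so `tolerance` is purely illustrative.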