loongson/pypi/: pdf2image-1.16.3 metadata and description

Homepage Simple index

A wrapper around the pdftoppm and pdftocairo command line tools to convert PDF to a PIL Image list.

author Edouard Belval
author_email edouard@belval.org
classifiers
  • Development Status :: 5 - Production/Stable
  • Intended Audience :: Developers
  • License :: OSI Approved :: MIT License
  • Programming Language :: Python :: 3
  • Programming Language :: Python :: 3.7
  • Programming Language :: Python :: 3.8
  • Programming Language :: Python :: 3.9
  • Programming Language :: Python :: 3.10
description_content_type text/markdown
keywords pdf image png jpeg jpg convert
license MIT
platform
  • UNKNOWN
requires_dist
  • pillow

Because this project isn't in the mirror_whitelist, no releases from root/pypi are included.

File Tox results History
pdf2image-1.16.3-py3-none-any.whl
Size
11 KB
Type
Python Wheel
Python
3
  • Replaced 2 time(s)
  • Uploaded to loongson/pypi by loongson 2023-06-01 00:42:58
pdf2image-1.16.3.tar.gz
Size
13 KB
Type
Source
  • Replaced 2 time(s)
  • Uploaded to loongson/pypi by loongson 2023-06-01 00:42:59

pdf2image

CircleCI PyPI version codecov Downloads GitHub CI

A python (3.7+) module that wraps pdftoppm and pdftocairo to convert PDF to a PIL Image object

How to install

pip install pdf2image

Windows

Windows users will have to build or download poppler for Windows. I recommend @oschwartz10612 version which is the most up-to-date. You will then have to add the bin/ folder to PATH or use poppler_path = r"C:\path\to\poppler-xx\bin" as an argument in convert_from_path.

Mac

Mac users will have to install poppler.

Installing using Brew:

brew install poppler

Linux

Most distros ship with pdftoppm and pdftocairo. If they are not installed, refer to your package manager to install poppler-utils

Platform-independant (Using conda)

  1. Install poppler: conda install -c conda-forge poppler
  2. Install pdf2image: pip install pdf2image

How does it work?

from pdf2image import convert_from_path, convert_from_bytes

from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)

Then simply do:

images = convert_from_path('/home/belval/example.pdf')

OR

images = convert_from_bytes(open('/home/belval/example.pdf', 'rb').read())

OR better yet

import tempfile

with tempfile.TemporaryDirectory() as path:
    images_from_path = convert_from_path('/home/belval/example.pdf', output_folder=path)
    # Do something here

images will be a list of PIL Image representing each page of the PDF document.

Here are the definitions:

convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)

convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)

What's new?

Performance tips

Limitations / known issues