MuPDF

Language: C

PDF / Document Processing

MuPDF was developed by Artifex Software to provide a fast and compact alternative to larger PDF libraries. It is commonly used in mobile applications, desktop software, and embedded systems where performance and low resource usage are critical.

MuPDF is a lightweight, high-performance PDF, XPS, and EPUB rendering library written in C. It allows text extraction, page rendering, and manipulation of PDF documents with minimal memory usage.

Installation

linux: sudo apt install mupdf mupdf-tools libmupdf-dev

mac: brew install mupdf

windows: Download precompiled binaries from https://mupdf.com/downloads/

Usage

MuPDF provides APIs to render PDF pages to bitmaps, extract text, access metadata, and annotate documents. It supports interactive viewing, searching, and converting pages to images or other formats.

Rendering a PDF page to PNG

# Command-line example
mutool draw -o output.png input.pdf 1

Uses the `mutool` command-line utility to render the first page of a PDF to a PNG image.

Extracting text from a PDF page

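# Command-line example
mutool draw -F txt -o output.txt input.pdf 1

Writes the text of the first page to `output.txt`. Note that `mutool extract input.pdf` does something different: it dumps the fonts and images embedded in the document, not its text.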

Using the C API to open a PDF document

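#include <stdio.h>
#include "mupdf/fitz.h"
int main(void) {
    fz_context *ctx = fz_new_context(NULL, NULL, FZ_STORE_DEFAULT);
    fz_document *doc = NULL;
    if (!ctx) return 1;  /* context allocation failed */
    fz_var(doc);
    fz_try(ctx) {
        fz_register_document_handlers(ctx);  /* required before opening files */
        doc = fz_open_document(ctx, "input.pdf");
        printf("Number of pages: %d\n", fz_count_pages(ctx, doc));
    } fz_always(ctx) {
        fz_drop_document(ctx, doc);  /* runs on success and on error */
    } fz_catch(ctx) {
        fprintf(stderr, "MuPDF error: %s\n", fz_caught_message(ctx));
    }
    fz_drop_context(ctx);
    return 0;
}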

Opens a PDF document using the MuPDF C API and prints the number of pages.

Rendering a page to an image buffer

// Load a page with fz_load_page, then call fz_new_pixmap_from_page to render it into an fz_pixmap.
// Save the pixmap with fz_save_pixmap_as_png, or pass its sample buffer to a GUI toolkit for display.

Renders PDF pages programmatically for GUI display or image export.

Text extraction using C API

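// Use fz_new_stext_page_from_page to extract structured text (blocks, lines, characters) from a loaded page.
// Older MuPDF releases exposed this as fz_text_page; the type was later renamed to fz_stext_page.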

Enables text extraction programmatically, suitable for search or text analysis applications.

Error Handling

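Failed to open document: Ensure the file path is correct and the PDF is not corrupted. MuPDF reports this by throwing an error, which must be caught with `fz_catch`.

Memory allocation errors: Drop every context, document, page, and pixmap you create, and do the cleanup in `fz_always` so that it also runs when an error is thrown.

Text extraction returns empty: Some PDFs, typically scans, store pages as images with no text layer; OCR is then required.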
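
One way to detect the empty-text case before falling back to OCR is to check whether the extracted string contains anything other than whitespace. The sketch below is plain standard C. `text_is_blank` is a hypothetical helper, not part of MuPDF, intended to be called on whatever string the extraction step produced.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when `text` is NULL, empty, or only
 * whitespace (which suggests the page has no text layer), 0 otherwise. */
int text_is_blank(const char *text)
{
    if (text == NULL)
        return 1;
    for (const char *p = text; *p != '\0'; p++) {
        /* Cast to unsigned char: isspace is undefined for negative values. */
        if (!isspace((unsigned char)*p))
            return 0;
    }
    return 1;
}
```

For example, `text_is_blank(" \n")` returns 1 and `text_is_blank("Hello")` returns 0.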

Best Practices

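Pair every `fz_new_context` with an `fz_drop_context`, and drop the context last, after everything created from it.

Wrap MuPDF API calls in `fz_try` / `fz_catch`: most functions signal failure by throwing an error rather than by returning an error code. `fz_new_context` is the exception and returns NULL on failure.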

Use MuPDF for lightweight PDF rendering where memory efficiency is important.

Release resources such as documents and pixmaps after use.

Combine MuPDF with GUI libraries for building custom PDF viewers.
