
Using qpdf in Cloud Functions for Firebase (2nd gen)
At Coup, we frequently collect and process PDFs, including invoices, payment receipts, letters, and notices. We categorize, process, and eventually combine these PDFs, often as part of our effort to produce well-organized documentation of our customers' research & development costs. This requires some form of PDF processing capability, and in Node.js, our environment of choice, the readily available options are limited. For example, the otherwise excellent pdf-lib package does not support processing encrypted PDFs, which many user-provided PDFs are.
A great option would be to use qpdf, a native library supporting many operations on PDFs, including decrypting them. However, qpdf is written in C++ and not readily available in Cloud Functions for Firebase / Google Cloud Functions.
ℹ️ An alternative for processing encrypted PDFs is Ghostscript, which is even available as a system package in Cloud Functions for Firebase. However, Ghostscript focuses on printing PDFs, which can lead to information like embedded files or attachments being lost. That is a no-go in our use case. In contrast, qpdf is built to perform only content-preserving transformations.
Fortunately, it is possible to make qpdf available in Cloud Functions for Firebase. To do so, grab the stand-alone Linux binary distributed by qpdf. At the time of writing this post, that is qpdf-12.2.0-bin-linux-x86_64.zip. This ZIP file needs to be included in the Cloud Function deployment, for example by placing it in a bin folder next to the entry index file. The binary within the ZIP should work in Cloud Functions, which run in a container image based on Ubuntu 22.04.
Next, when executing a Cloud Function, the ZIP file needs to be extracted and the permissions of the contained qpdf binary changed to allow executing it. The following code achieves just that:
```ts
import { execFile } from 'child_process';
import { chmodSync, existsSync, mkdirSync } from 'fs';
import { tmpdir } from 'os';
import * as path from 'path';
import { logger } from 'firebase-functions';

async function installQPDF(): Promise<string> {
  // point to the zip file - depending on your folder structure, this path may
  // look different:
  const zipPath = path.join(
    __dirname, // /workspace/lib/functions/src/
    '..', // /workspace/lib/functions/
    '..', // /workspace/lib/
    '..', // /workspace/
    'bin',
    'qpdf-12.2.0-bin-linux-x86_64.zip'
  );

  // setup folder to unzip qpdf into:
  const extractDir = path.join(tmpdir(), 'qpdf-12.2.0-bin-linux-x86_64');
  if (!existsSync(extractDir)) {
    mkdirSync(extractDir, { recursive: true });
  }

  // the qpdf binary will be available within a `bin` subfolder:
  const qpdfPath = path.join(extractDir, 'bin', 'qpdf');

  // save processing, if qpdf was already installed:
  if (existsSync(qpdfPath)) {
    logger.info('qpdf already installed');
    return qpdfPath;
  }

  try {
    await new Promise<void>((resolve, reject) => {
      execFile(
        'unzip',
        ['-o', zipPath, '-d', extractDir],
        (error, _, stderr) => {
          if (error) {
            logger.error('Failed to unzip qpdf', { error, stderr });
            reject(error);
          } else {
            resolve();
          }
        }
      );
    });

    // allow executing the qpdf binary:
    chmodSync(qpdfPath, 0o755);

    logger.info(`Installed qpdf to be executable at ${qpdfPath}`);
    return qpdfPath;
  } catch (error) {
    throw new Error('Failed to install qpdf');
  }
}
```
Importantly, qpdf should be provided in zipped form, as it contains symlinks within the lib folder. We found that trying to upload the unzipped files led to runtime errors when executing qpdf (file too short).
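Since 2nd gen instances can serve several requests at the same time, two invocations might both reach the unzip step before either has finished. One way to avoid that is to cache the installation promise per instance. The following is a minimal sketch, not part of the original setup: `memoizeAsync` and `getQpdfPath` are hypothetical names, and `installQPDF` is assumed to be the function shown above.

```typescript
// Hypothetical helper: wraps an async function so that concurrent and
// subsequent callers share one in-flight (or completed) promise.
function memoizeAsync<T>(fn: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;

  return () => {
    if (!pending) {
      pending = fn().catch((error) => {
        // Forget the failed attempt so that a later call can retry.
        pending = undefined;
        throw error;
      });
    }
    return pending;
  };
}

// Assumed usage, with installQPDF defined as in the post:
// const getQpdfPath = memoizeAsync(installQPDF);
// const qpdfPath = await getQpdfPath();
```

With this in place, the ZIP is extracted at most once per successful attempt on each instance, and a failed extraction does not poison later invocations.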
The installQPDF function returns the path pointing to the qpdf binary, which can then be called as a child process:
```ts
// requires: import { execFile } from 'child_process';
const qpdfPath = await installQPDF();

// example: decrypt some file
await new Promise((resolve, reject) => {
  execFile(
    qpdfPath,
    ['--password=some_password', '--decrypt', 'file/to/input.pdf', 'file/to/output.pdf'],
    (error, stdout) => {
      if (error) reject(error);
      else resolve(stdout);
    }
  );
});
```
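One detail worth handling when calling qpdf this way: qpdf exits with status 3 when it finished with warnings but still wrote its output, and with status 2 on actual errors. Because `execFile` treats every non-zero exit status as an error, a slightly damaged but recoverable PDF would be rejected by the snippet above. The sketch below shows one possible wrapper; `runQpdf` and `QpdfResult` are hypothetical names, and whether warnings are acceptable is a decision for your own use case.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

interface QpdfResult {
  stdout: string;
  stderr: string;
  warnings: boolean; // true if the process exited with status 3
}

// Hypothetical wrapper: runs the binary and treats exit status 3
// (warnings, output still produced) as success.
async function runQpdf(qpdfPath: string, args: string[]): Promise<QpdfResult> {
  try {
    const { stdout, stderr } = await execFileAsync(qpdfPath, args);
    return { stdout, stderr, warnings: false };
  } catch (error) {
    // For a non-zero exit, Node attaches the exit status as `code`,
    // along with whatever was written to stdout and stderr.
    const failure = error as { code?: number | string; stdout?: string; stderr?: string };
    if (failure.code === 3) {
      return { stdout: failure.stdout ?? '', stderr: failure.stderr ?? '', warnings: true };
    }
    throw error;
  }
}
```

A call then looks like `await runQpdf(qpdfPath, ['--decrypt', inputPath, outputPath])`, and the returned `stderr` can be logged whenever `warnings` is true.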
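As with the qpdf binary itself, the input and output PDFs can be placed temporarily in the folder returned by tmpdir from Node's os module. Make sure to clean them up after processing, since the temporary directory in Cloud Functions is an in-memory filesystem and leftover files count against the function's memory.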
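The temporary-file handling described above could be sketched as follows. This is an illustration under assumptions, not the code we run: `withTempPdf` and `decryptPdf` are hypothetical helpers, and the PDF is assumed to arrive as a `Buffer`. Each call gets its own working directory, which is removed in a `finally` block even if qpdf fails.

```typescript
import { execFile } from 'node:child_process';
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Hypothetical helper: writes `input` to a fresh temp directory, lets
// `transform` produce an output file, returns its bytes, and always
// removes the directory afterwards.
async function withTempPdf(
  input: Buffer,
  transform: (inputPath: string, outputPath: string) => Promise<void>
): Promise<Buffer> {
  const workDir = await mkdtemp(join(tmpdir(), 'qpdf-work-'));
  try {
    const inputPath = join(workDir, 'input.pdf');
    const outputPath = join(workDir, 'output.pdf');
    await writeFile(inputPath, input);
    await transform(inputPath, outputPath);
    return await readFile(outputPath);
  } finally {
    await rm(workDir, { recursive: true, force: true });
  }
}

// Hypothetical wrapper around the decrypt call shown earlier.
async function decryptPdf(qpdfPath: string, input: Buffer, password: string): Promise<Buffer> {
  return withTempPdf(input, async (inputPath, outputPath) => {
    await execFileAsync(qpdfPath, [`--password=${password}`, '--decrypt', inputPath, outputPath]);
  });
}
```

Using a unique directory per call, rather than fixed file names directly under the temp folder, keeps concurrent invocations on the same instance from overwriting each other's files.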