Layered and Cached PDF Generation in Rails by David Higgins, Developer

We can take a layered approach to PDF generation in Rails, combining a few techniques that improve performance and reduce the work that has to be redone as the model changes. This lets us respond dynamically to a changing model while still providing a good user experience. Done carefully, the same pattern can be reused for different models in a Rails application, and it also lets us reuse the existing view templates for PDF generation.

- Layered and Cached PDF Generation: the criteria for what we would like to accomplish.
- Caching: how to set up the cache for PDF generation.
- Rails Models: what we need in our Rails models to make this work.
- PDF Generation: the core of the PDF generation module and options for extending it.
- References & Prerequisites:
  - CombinePDF, a pure-Ruby library for parsing, merging, and stamping PDF files
  - Grover, a Ruby gem that converts HTML to PDF using Puppeteer and headless Chromium

Layered and Cached PDF Generation

We recently needed to generate dynamic PDFs for a Rails model, Order, which has a has_many relationship with OrderItem. Each item can have an attached file that is either an image or a PDF document. We wanted to create and cache an up-to-date PDF for each Order on demand.

Some of our criteria for the task included:

- Don't generate the document unnecessarily.
- Add headers and footers to each page.
- Append the associated PDFs and images for each OrderItem.
- Don't regenerate parts of the page that don't change (i.e. headers and footers).
- Support custom margins for OrderItem images.
- Expire outdated generated versions.
- Be portable to other use cases.

The external tools we primarily used are CombinePDF and Grover. Internally, we defined a Rails model, PdfStorage, to cache the generated documents in the database. This model can cache multiple variations of the document; in our case we cache the headers and footers, the core document, the attached images, the attached PDFs, and the final document.

Caching

Because we don't generate the document beforehand, the model generates and caches it when a controller action requests it. The cache entry is tied to the model through its updated_at timestamp, so updating the model (and therefore updated_at) means the next request misses the cache and generates a fresh document. Outdated versions stay in the cache until a background job expires them as desired.
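The background job's expiry policy is left open here. As a hedged sketch, assuming the job keeps only the newest cached version of each document name, the selection step could look like this in plain Ruby. CacheEntry and stale_entries are hypothetical stand-ins for PdfStorage rows and the job's query, not part of the original code:

```ruby
# Hypothetical stand-in for a PdfStorage row: only the fields the sweep needs.
CacheEntry = Struct.new(:name, :cached_at, keyword_init: true)

# Assumed expiry policy: keep the newest cached version of each name and treat
# every older version of that name as stale.
def stale_entries(entries)
  entries.group_by(&:name).flat_map do |_name, versions|
    newest = versions.max_by(&:cached_at)
    versions.reject { |entry| entry.equal?(newest) }
  end
end

entries = [
  CacheEntry.new(name: "gid://app/Order/1", cached_at: Time.utc(2024, 1, 1)),
  CacheEntry.new(name: "gid://app/Order/1", cached_at: Time.utc(2024, 2, 1)),
  CacheEntry.new(name: "gid://app/Order/2", cached_at: Time.utc(2024, 1, 15))
]

stale = stale_entries(entries)
stale.each { |entry| puts "expire #{entry.name} cached at #{entry.cached_at}" }
```

In the app itself, the deletion could reuse the Cache class's remove_name_on_or_before(date, name), passing the cached_at of the newest stale version for each name. Because PdfStorage validates cached_at as unique per name, that date can never match the version being kept.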

  
class OrdersController < ApplicationController
...
  def show
    respond_to do |format|
      format.html { render :show }
      format.pdf { send_data document.to_pdf, filename: document.filename, type: "application/pdf" }
    end
  end
...
  private
...
  # Memoized so that to_pdf and filename are called on the same document instance.
  def document
    @document ||= @order&.document_with_attachments
  end
end

The model for caching the documents is very straightforward and does not require any outside relationships to the rest of the application:

  
class PdfStorage < ApplicationRecord
  validates :cached_at, presence: true, uniqueness: {scope: :name}, on: :create
  validates :data, presence: true, on: :create

  scope :on_or_before, ->(date) do
    where("cached_at <= ?", date)
  end
end

To manage the cache, we define a Cache class within our ParademPdf module:

  
module ParademPdf
  class Cache
    # Distinct document names currently held in the cache.
    def cached_files
      PdfStorage.distinct.pluck(:name)
    end

    def remove_all
      PdfStorage.delete_all
    end

    # Expire every cached version stored on or before +date+.
    def remove_on_or_before(date)
      PdfStorage.on_or_before(date).delete_all
    end

    # Expire old versions of a single document.
    def remove_name_on_or_before(date, name)
      PdfStorage.where(name:).on_or_before(date).delete_all
    end

    def remove(name)
      PdfStorage.where(name:).delete_all
    end

    def cached?(name)
      PdfStorage.exists?(name:)
    end

    # Return the data cached for (name, cached_at). On a miss, build it with the
    # block, store it, and return it.
    def check_cache(name, cached_at)
      pdf_storage = PdfStorage.find_by(name:, cached_at:)
      return pdf_storage.data if pdf_storage.present?

      data = yield
      PdfStorage.create(name:, data:, cached_at:)
      data
    end
  end
end
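To see the check_cache contract in isolation, here is a minimal in-memory sketch. MemoryCache is hypothetical: it swaps the PdfStorage table for a Hash keyed by [name, cached_at] but keeps the same behaviour, returning the stored data on a hit and otherwise running the block, storing its result, and returning it.

```ruby
# Hypothetical in-memory stand-in for ParademPdf::Cache: a Hash replaces the
# PdfStorage table, keyed by the same (name, cached_at) pair.
class MemoryCache
  def initialize
    @store = {}
  end

  # Same contract as Cache#check_cache: hit -> stored data, miss -> yield and store.
  def check_cache(name, cached_at)
    key = [name, cached_at]
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end

  # Mirrors remove_name_on_or_before: drop versions of +name+ cached at or before +date+.
  def remove_name_on_or_before(date, name)
    @store.delete_if { |(entry_name, cached_at), _data| entry_name == name && cached_at <= date }
  end

  def size
    @store.size
  end
end

cache = MemoryCache.new
renders = 0
render = -> { renders += 1; "%PDF-render-#{renders}" }

v1 = Time.utc(2024, 3, 1)
first = cache.check_cache("gid://app/Order/1", v1) { render.call }
again = cache.check_cache("gid://app/Order/1", v1) { render.call } # hit: block not run

v2 = Time.utc(2024, 3, 2) # the record was updated, so updated_at moved forward
fresh = cache.check_cache("gid://app/Order/1", v2) { render.call } # miss: re-render

cache.remove_name_on_or_before(v1, "gid://app/Order/1") # expire the old version
```

One difference worth noting: the real check_cache calls PdfStorage.create, which returns an unsaved record rather than raising when validation fails (for example, if two requests race on the same key), so the document is still returned but simply not cached.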

Rails Models

Inside the Rails application, the only methods we need to add are document and document_with_attachments in Order, and image_document, image_document_pdf, and file_download in OrderItem, plus a few scopes on OrderItem that select attachments by content type.

  
class Order < ApplicationRecord
  has_many :items, class_name: "OrderItem", dependent: :destroy
...
  def document
    ParademPdf::OrderDocument.new(
      name: "Order #{id}",
      date: updated_at,
      record: self,
      template: "pdfs/_order",
      remote_url:,
      protocol:
    )
  end

  def document_with_attachments
    ParademPdf::OrderDocumentWithAttachments.new(
      attachments: items.with_image_attachments.collect(&:image_document_pdf),
      supplements: items.with_pdf_attachments.collect(&:file_download),
      name: "Order #{id} with attachments",
      date: updated_at,
      record: self,
      template: "pdfs/_order",
      remote_url:,
      protocol:
    )
  end
...
end

class OrderItem < ApplicationRecord
  has_one_attached :file, dependent: :destroy

  scope :with_file_attachments, -> { joins(file_attachment: :blob) }

  scope :with_image_attachments, -> do
    with_file_attachments
      .references(:file_attachment)
      .where(ActiveStorage::Blob.arel_table[:content_type].matches("image/%"))
  end

  scope :with_pdf_attachments, -> do
    with_file_attachments
      .references(:file_attachment)
      .where(ActiveStorage::Blob.arel_table[:content_type].matches("application/pdf"))
  end
...
  def file_download
    file.download
  end

  def image_document
    return unless file.attached? && file.content_type.include?("image")

    ParademPdf::OrderImageAttachmentDocument.new(
      name: description,
      date: order.delivery_date,
      record: self,
      template: "pdfs/_order_item_image_attachment",
      remote_url:,
      protocol:
    )
  end

  def image_document_pdf
    image_document.to_pdf
  end
...
end

The key points to note are that the Rails app doesn't need to be concerned with the cache, and that we can reuse the application's existing view templates to generate the PDFs.
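The two OrderItem scopes split attachments with SQL LIKE patterns: image/% for images and application/pdf for PDFs. As a plain-Ruby sketch of the same split (Attachment and partition_attachments are hypothetical, standing in for OrderItem rows and the two scopes):

```ruby
# Hypothetical stand-in for an OrderItem with an attached file.
Attachment = Struct.new(:description, :content_type, keyword_init: true)

# Mirrors the scopes: LIKE "image/%" is a prefix match, and LIKE "application/pdf"
# (no wildcard) acts like an equality check, ignoring database case-sensitivity
# differences. Items matching neither pattern are left out of both groups.
def partition_attachments(items)
  {
    images: items.select { |item| item.content_type.start_with?("image/") },
    pdfs: items.select { |item| item.content_type == "application/pdf" }
  }
end

items = [
  Attachment.new(description: "Photo", content_type: "image/png"),
  Attachment.new(description: "Spec sheet", content_type: "application/pdf"),
  Attachment.new(description: "Notes", content_type: "text/plain")
]

groups = partition_attachments(items)
```

Because document_with_attachments collects the two groups separately, all image pages come before all supplemental PDFs in the merged document, regardless of the items' original order.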

PDF Generation

The core of the PDF generation module includes just three classes: ParademPdf::Cache, ParademPdf::Configuration, and ParademPdf::Document.

The configuration class simply allows us to define default margins:

  
module ParademPdf
  class Configuration
    def self.header_margins
      {
        top: "50px",
        right: "50px",
        bottom: "0",
        left: "50px"
      }
    end

    def self.body_margins
      {
        top: "120px",   # 50px base margin + 70px for header height
        right: "50px",
        bottom: "80px", # 50px base margin + 30px for footer height
        left: "50px"
      }
    end

    def self.footer_margins
      {
        right: "50px",
        bottom: "50px",
        left: "50px"
      }
    end
  end
end
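The body margins hard-code the arithmetic described in the comments (base margin plus header or footer height). A hedged sketch of deriving them instead, so that changing the header height updates the body margin automatically; BASE_MARGIN, HEADER_HEIGHT, FOOTER_HEIGHT, px, and body_margins_for are hypothetical names, with values taken from the comments above:

```ruby
# Hypothetical constants, using the values from the Configuration comments (in px).
BASE_MARGIN = 50
HEADER_HEIGHT = 70
FOOTER_HEIGHT = 30

def px(value)
  "#{value}px"
end

# Body margins leave room for the header and footer stamped onto each page later.
def body_margins_for(base: BASE_MARGIN, header: HEADER_HEIGHT, footer: FOOTER_HEIGHT)
  {
    top: px(base + header),
    right: px(base),
    bottom: px(base + footer),
    left: px(base)
  }
end

margins = body_margins_for
```

Because the header and footer are separate single-page PDFs overlaid on each body page, these margins are what keep body content from running underneath them.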

At its core, PDF generation combines the various cached pieces into the final document. The main method in the Document class is to_pdf:

  
    def to_pdf
      @cache.check_cache(record_cache_key, @record.updated_at) do
        pdf = Grover.new(render_html(@template, assigns: assignments), margin: body_margins).to_pdf

        merge_header_and_footer(pdf)
      end
    end

This checks whether the document has already been generated for the record's current updated_at and, if so, returns the cached copy. Otherwise, it renders the body with Grover and merges in the headers and footers:

  
    def merge_header_and_footer(pdf)
      document = CombinePDF.parse(pdf)
      page_total = document.pages.length

      document.pages.each_with_index do |page, index|
        page << parse_header(page_number: index + 1, page_total:) if header_template
        page << parse_footer(page_number: index + 1, page_total:) if footer_template
      end

      document.to_pdf
    end
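In CombinePDF, page << other_page stamps one page on top of another, so each body page ends up with its header and footer overlaid. A plain-Ruby sketch of the loop's bookkeeping, where pages are modelled as hypothetical arrays of layer labels instead of real PDF pages:

```ruby
# Hypothetical model: each page is an Array of layer labels, body first.
# Appending mimics CombinePDF's `page << stamp`, which draws the stamp on top.
def stamp_pages(pages, header: true, footer: true)
  page_total = pages.length
  pages.each_with_index do |page, index|
    page_number = index + 1 # 1-based, like the original loop
    page << "header #{page_number}/#{page_total}" if header
    page << "footer #{page_number}/#{page_total}" if footer
  end
  pages
end

pages = stamp_pages([["body 1"], ["body 2"], ["body 3"]])
pages.each { |layers| puts layers.join(" | ") }
```

The page total has to be known before the loop starts, which is why the whole body is rendered and parsed before any header or footer is generated.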

The headers & footers are generated and cached in much the same way as the core document. Here is how we generate a header:

  
    def header_pdf(page_number:, page_total:)
      @cache.check_cache(header_cache_key(page_number), header_timestamp) do
        locals = {
          page_number:,
          page_total:
        }.merge(header_assignments)

        Grover.new(render_html(header_template, locals:), margin: header_margins).to_pdf
      end
    end
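Because each header is cached under header_cache_key(page_number) and header_timestamp, a header is rendered at most once per key and timestamp. A hedged sketch that counts renders, using a Hash in place of PdfStorage (HeaderCacheDemo is hypothetical), compares a per-page key with a single shared key:

```ruby
# Hypothetical demo: counts how many header renders a cache key scheme causes.
class HeaderCacheDemo
  attr_reader :renders

  def initialize(&key_for_page)
    @key_for_page = key_for_page # block: page_number -> cache key
    @store = {}
    @renders = 0
  end

  # Same shape as header_pdf: look up (key, timestamp), render only on a miss.
  def header_pdf(page_number, timestamp)
    key = [@key_for_page.call(page_number), timestamp]
    @store[key] ||= begin
      @renders += 1
      "header for page #{page_number}"
    end
  end
end

stamp = Time.utc(2024, 3, 1)

per_page = HeaderCacheDemo.new { |n| "gid://app/Order/1/header/#{n}" }
shared = HeaderCacheDemo.new { |_n| "gid://app/Order/header" }

# Generate a three-page document twice with each scheme.
2.times { (1..3).each { |n| per_page.header_pdf(n, stamp) } }
2.times { (1..3).each { |n| shared.header_pdf(n, stamp) } }
```

The per-page scheme renders three headers once and then reuses them; the shared scheme renders a single header. With the shared key every page receives whatever was rendered first, so it is only safe when the header template ignores page_number and page_total.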

In broad terms, this lets us subclass Document to create variations of the core document while still reusing any cached pieces we have already generated. For our use case, we generate the core order document with OrderDocument and extend it with OrderDocumentWithAttachments:

  
module ParademPdf
  class OrderDocumentWithAttachments < OrderDocument
    attr_reader(:attachments, :supplements)

    def initialize(attachments:, supplements:, **args)
      super(**args)

      @attachments = attachments
      @supplements = supplements
    end

    def to_pdf
      @cache.check_cache(record_with_attachments_cache_key, @record.updated_at) do
        Document.merge_documents([super].concat(attachments, supplements))
      end
    end

    private

    def record_with_attachments_cache_key
      "#{record_cache_key}/with_attachments"
    end
  end
end

The key point is that we override to_pdf, use the cache again, and merge the core document (obtained via super) with both the attached images and the supplemental PDFs. The combined document gets its own cache key so that it doesn't collide with the cached core document. We also have a document class, OrderImageAttachmentDocument, that generates a page for each image in much the same way, but omits the headers and footers and overrides the top margin.
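The layering relies on super running inside the subclass's cache block, so the core document keeps its own cache entry and the combined document gets another. A hedged, self-contained sketch with hypothetical classes (DemoCache, CoreDoc, CoreDocWithExtras) and a Hash as the cache:

```ruby
# Hypothetical cache: a Hash keyed by [name, cached_at], counting misses per name.
class DemoCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = Hash.new(0)
  end

  def check_cache(name, cached_at)
    key = [name, cached_at]
    return @store[key] if @store.key?(key)

    @misses[name] += 1
    @store[key] = yield
  end
end

class CoreDoc
  def initialize(cache:, key:, updated_at:)
    @cache = cache
    @key = key
    @updated_at = updated_at
  end

  def to_pdf
    @cache.check_cache(@key, @updated_at) { "core" }
  end
end

# Like OrderDocumentWithAttachments: a suffixed key, with `super` inside the block
# so the core document is fetched from (or stored in) its own cache entry.
class CoreDocWithExtras < CoreDoc
  def initialize(extras:, **args)
    super(**args)
    @extras = extras
  end

  def to_pdf
    @cache.check_cache("#{@key}/with_attachments", @updated_at) do
      [super].concat(@extras).join("+")
    end
  end
end

cache = DemoCache.new
at = Time.utc(2024, 3, 1)
core = CoreDoc.new(cache: cache, key: "gid://app/Order/1", updated_at: at)
combined = CoreDocWithExtras.new(extras: ["image", "pdf"], cache: cache, key: "gid://app/Order/1", updated_at: at)

core.to_pdf              # renders and caches the core document
result = combined.to_pdf # reuses the cached core, caches the combined result
combined.to_pdf          # hit on the combined key
```

Dropping the /with_attachments suffix would make the subclass's lookup hit the core document's entry and return it without any attachments.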

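In OrderDocument, the header and footer timestamps roll over monthly, and the keys are built from the app ID and the record's class rather than from the record itself. This means the headers and footers for this document type are generated at most once a month and are shared by every order: one header for all pages, and one footer per page number. One caveat: the footer key includes the page number but not the page total, so if the footer template prints the total (e.g. "Page 2 of 5"), a footer cached for one page count could be reused for an order with a different count; adding page_total to the key would avoid that.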

  
module ParademPdf
  class OrderDocument < Document
    private

    def assignments
      {
        order: @record
      }
    end

    def footer_cache_key(page_number)
      "#{app_id}/#{@record.class}/footer/#{page_number}"
    end

    def footer_timestamp
      Time.now.beginning_of_month
    end

    def header_cache_key(_page_number)
      "#{app_id}/#{@record.class}/header"
    end

    def header_timestamp
      Time.now.beginning_of_month
    end
  end
end
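Time.now.beginning_of_month comes from ActiveSupport. A plain-Ruby sketch of the same bucketing (month_bucket is a hypothetical helper) shows why it limits regeneration to once a month: every time within a month maps to the same timestamp, so the (key, timestamp) lookup keeps hitting until the month rolls over.

```ruby
# Hypothetical plain-Ruby equivalent of ActiveSupport's beginning_of_month,
# keeping the original time's UTC offset.
def month_bucket(time)
  Time.new(time.year, time.month, 1, 0, 0, 0, time.utc_offset)
end

early = month_bucket(Time.utc(2024, 3, 2, 9, 30))
late = month_bucket(Time.utc(2024, 3, 31, 23, 59))
next_month = month_bucket(Time.utc(2024, 4, 1, 0, 0))
```

Entries from previous months are not deleted automatically; they remain in PdfStorage until something like remove_on_or_before clears them.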

The complete Document class is below. Subclasses customize it by overriding its private hooks (assignments, templates, margins, cache keys, and timestamps):

  
module ParademPdf
  class Document
    attr_reader(:remote_url, :protocol, :doc_name, :margin_symbol_offset)

    # This is a helper method to merge multiple pdfs into one.
    def self.merge_documents(pdfs)
      document = CombinePDF.new
      pdfs.each do |pdf_file|
        document << CombinePDF.parse(pdf_file)
      end

      document.to_pdf
    end

    def initialize(name:, date:, record:, template:, remote_url:, protocol:)
      @name = name
      @date = date
      @template = template
      @remote_url = remote_url
      @protocol = protocol
      @renderer = ApplicationController.renderer
      @cache = Cache.new
      @record = record
    end

    def body_margins
      Configuration.body_margins
    end

    def filename
      return "#{@name}#{filename_suffix}" if @date.blank?

      "#{@name} - #{@date}#{filename_suffix}"
    end

    def filename_suffix
      ".pdf"
    end

    def to_pdf
      @cache.check_cache(record_cache_key, @record.updated_at) do
        pdf = Grover.new(render_html(@template, assigns: assignments), margin: body_margins).to_pdf

        merge_header_and_footer(pdf)
      end
    end

    private

    def assignments
      {}
    end

    def footer_assignments
      {}
    end

    def footer_cache_key(page_number)
      "#{record_cache_key}/footer/#{page_number}"
    end

    def footer_margins
      Configuration.footer_margins
    end

    def footer_pdf(page_number:, page_total:)
      @cache.check_cache(footer_cache_key(page_number), footer_timestamp) do
        locals = {
          page_number:,
          page_total:
        }.merge(footer_assignments)

        Grover.new(render_html(footer_template, locals:), margin: footer_margins).to_pdf
      end
    end

    def footer_template
      "pdfs/_footer"
    end

    def footer_timestamp
      @record.created_at
    end

    def header_assignments
      {}
    end

    def header_cache_key(page_number)
      "#{record_cache_key}/header/#{page_number}"
    end

    def header_margins
      Configuration.header_margins
    end

    def header_pdf(page_number:, page_total:)
      @cache.check_cache(header_cache_key(page_number), header_timestamp) do
        locals = {
          page_number:,
          page_total:
        }.merge(header_assignments)

        Grover.new(render_html(header_template, locals:), margin: header_margins).to_pdf
      end
    end

    def header_template
      "pdfs/_header"
    end

    def header_timestamp
      @record.created_at
    end

    # This method merges the header, body and footer pages into one pdf.
    def merge_header_and_footer(pdf)
      document = CombinePDF.parse(pdf)
      page_total = document.pages.length

      document.pages.each_with_index do |page, index|
        page << parse_header(page_number: index + 1, page_total:) if header_template
        page << parse_footer(page_number: index + 1, page_total:) if footer_template
      end

      document.to_pdf
    end

    def render_html(template, assigns: {}, locals: {})
      Grover::HTMLPreprocessor.process(
        @renderer.render_to_string(
          template:,
          layout: "pdf",
          assigns:,
          locals:
        ),
        remote_url, protocol
      )
    end

    def parse_header(page_number:, page_total:)
      CombinePDF.parse(header_pdf(page_number:, page_total:)).pages[0]
    end

    def parse_footer(page_number:, page_total:)
      CombinePDF.parse(footer_pdf(page_number:, page_total:)).pages[0]
    end

    def record_cache_key
      @record.to_global_id.to_s
    end

    def app_id
      @app_id ||= "gid://#{Rails.application.config.global_id.app}"
    end
  end
end
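The cache keys decide what is shared. The default keys build on record_cache_key (the record's GlobalID, such as gid://<app>/Order/42), so headers and footers are per record, while OrderDocument's keys build on app_id and the class name, so they are shared by every order. A hedged sketch that builds both schemes side by side; the app name "shop" and both lambdas are hypothetical:

```ruby
APP_ID = "gid://shop" # hypothetical; the real value comes from config.global_id.app

# Default Document keys: derived from the record's GlobalID, so unique per record.
default_keys = lambda do |model, id, page_number|
  record_key = "#{APP_ID}/#{model}/#{id}"
  {
    body: record_key,
    header: "#{record_key}/header/#{page_number}",
    footer: "#{record_key}/footer/#{page_number}"
  }
end

# OrderDocument-style keys: header and footer ignore the record id, so they are shared.
shared_keys = lambda do |model, id, page_number|
  {
    body: "#{APP_ID}/#{model}/#{id}",
    header: "#{APP_ID}/#{model}/header",
    footer: "#{APP_ID}/#{model}/footer/#{page_number}"
  }
end

a = default_keys.call("Order", 1, 2)
b = default_keys.call("Order", 2, 2)
c = shared_keys.call("Order", 1, 2)
d = shared_keys.call("Order", 2, 2)
```

In both schemes the body stays per record; only the scope of the header and footer entries changes.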
