We can take a layered approach to PDF generation in Rails, combining a few techniques to improve performance and reduce the work that has to be redone as the model changes. That way we can respond dynamically to a changing model and still provide a good user experience. Done carefully, the same pattern can be reused for different models in a Rails application, and it also lets us reuse our existing view templates for PDF generation.
Table of contents
- Layered and Cached PDF Generation: the criteria for what we would like to accomplish.
- Caching: how to set up the cache for the PDF generation.
- Rails Models: what we need in our Rails models to make this work.
- PDF Generation: the core of the PDF generation module and options for extending it.
- References & Prerequisites
Layered and Cached PDF Generation
We recently needed to generate dynamic PDFs for a Rails model, `Order`, which has a `has_many` relationship with `OrderItem`. Each item can have an attached file, either an image or a PDF document. What we wanted was to create and cache an up-to-date PDF for each `Order` on demand.
Some of our criteria for the task included:
- Don't generate the document unnecessarily.
- Add headers and footers to each page created.
- Append associated pdfs and images for each OrderItem.
- Don't regenerate parts of the page that don't change, i.e., headers and footers.
- Support custom margins for OrderItem images.
- Expire outdated generated versions.
- Be portable to other use cases.
The external tools we primarily used are CombinePDF and Grover. Internally, we defined a Rails model, `PdfStorage`, to cache the generated documents in the database. This model needs to be able to cache multiple variations of the document; in our case we cache the headers & footers, the core document, attached images, attached PDFs, and the final document.
Caching
Because we don't generate the document beforehand, the model generates and caches it when a controller action asks for it. Each cache entry is tied to the model's `updated_at` timestamp, so updating the model invalidates the cached copy: the next request misses the cache and generates a fresh document. A background job then removes the outdated entries as desired.
```ruby
class OrdersController < ApplicationController
  # ...

  def show
    respond_to do |format|
      format.html { render :show }
      format.pdf do
        # The document builds its own filename (see Document#filename).
        send_data document.to_pdf, filename: document.filename, type: "application/pdf"
      end
    end
  end

  # ...

  private

  # ...

  def document
    @document ||= @order&.document_with_attachments
  end
end
```
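To make the invalidation flow concrete, here is a minimal, framework-free sketch. `InMemoryPdfCache` and `PdfRecord` are hypothetical stand-ins for the `PdfStorage` table and the `Order` model; the cache follows the same read-through contract as `check_cache` in the `ParademPdf::Cache` class below, keyed on a name plus the record's `updated_at`.

```ruby
require "time"

# Hypothetical stand-in for the PdfStorage table: entries are keyed by
# [name, cached_at], like PdfStorage.find_by(name:, cached_at:).
class InMemoryPdfCache
  def initialize
    @rows = {}
  end

  # Same contract as ParademPdf::Cache#check_cache: return stored data for this
  # name/timestamp pair, otherwise run the block, store its result, return it.
  def check_cache(name, cached_at)
    key = [name, cached_at]
    return @rows[key] if @rows.key?(key)

    @rows[key] = yield
  end

  def size
    @rows.size
  end
end

# Hypothetical record with only the attributes the cache relies on.
PdfRecord = Struct.new(:id, :updated_at)

cache = InMemoryPdfCache.new
order = PdfRecord.new(42, Time.utc(2024, 1, 10, 9, 0))
renders = 0

fetch_pdf = lambda do
  cache.check_cache("Order #{order.id}", order.updated_at) do
    renders += 1
    "%PDF rendered for #{order.updated_at.iso8601}"
  end
end

fetch_pdf.call
fetch_pdf.call                           # hit: same name and timestamp
order.updated_at = Time.utc(2024, 1, 11) # what updating the model does in Rails
fetch_pdf.call                           # miss: new timestamp, render again

puts renders    # prints 2
puts cache.size # prints 2 (the stale entry stays until a cleanup removes it)
```

The stale entry is unreachable but still stored, which is why a separate cleanup step is needed.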
The model for caching the documents is very straightforward and does not require any outside relationships to the rest of the application:
```ruby
class PdfStorage < ApplicationRecord
  validates :cached_at, presence: true, uniqueness: { scope: :name }, on: :create
  validates :data, presence: true, on: :create

  scope :on_or_before, ->(date) do
    where("cached_at <= ?", date)
  end
end
```
To manage the cache, we define a `Cache` class within our `ParademPdf` module:
```ruby
module ParademPdf
  class Cache
    attr_reader :remote_url, :protocol, :full_path

    # Distinct document names currently stored.
    def cached_files
      PdfStorage.distinct.pluck(:name)
    end

    def remove_all
      PdfStorage.delete_all
    end

    # Deletes every entry cached on or before the given time (inclusive).
    def remove_on_or_before(date)
      PdfStorage.on_or_before(date).delete_all
    end

    def remove_name_on_or_before(date, name)
      PdfStorage.where(name:).on_or_before(date).delete_all
    end

    def remove(name)
      PdfStorage.where(name:).delete_all
    end

    def cached?(name)
      PdfStorage.exists?(name:)
    end

    # Read-through cache: return the stored data for this name/timestamp pair,
    # otherwise run the block, store its result and return it.
    def check_cache(name, cached_at)
      pdf_storage = PdfStorage.find_by(name:, cached_at:)
      return pdf_storage.data if pdf_storage.present?

      data = yield
      PdfStorage.create(name:, data:, cached_at:)
      data
    end
  end
end
```
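The cleanup side can be sketched the same way. This in-memory analogue (`SweepableCache` and `PdfRow` are hypothetical names) mirrors `remove_on_or_before`, `remove_name_on_or_before` and `cached_files`, and shows that the `on_or_before` scope is inclusive and removes every entry older than the cutoff, not only superseded versions: an order that simply hasn't changed recently loses its cached copy too and is regenerated on its next request.

```ruby
# Hypothetical in-memory rows standing in for PdfStorage records.
PdfRow = Struct.new(:name, :cached_at)

class SweepableCache
  attr_reader :rows

  def initialize(rows)
    @rows = rows.dup
  end

  # Mirrors Cache#remove_on_or_before: cached_at <= date is removed.
  def remove_on_or_before(date)
    @rows.reject! { |row| row.cached_at <= date }
  end

  # Mirrors Cache#remove_name_on_or_before: same cutoff, limited to one name.
  def remove_name_on_or_before(date, name)
    @rows.reject! { |row| row.name == name && row.cached_at <= date }
  end

  # Mirrors Cache#cached_files: distinct names still in the cache.
  def cached_files
    @rows.map(&:name).uniq
  end
end

cutoff = Time.utc(2024, 2, 1)
cache = SweepableCache.new([
  PdfRow.new("Order 1", Time.utc(2024, 1, 5)),  # superseded version
  PdfRow.new("Order 1", Time.utc(2024, 2, 3)),  # current version
  PdfRow.new("Order 2", Time.utc(2024, 1, 20)), # current, but older than cutoff
  PdfRow.new("Order 3", Time.utc(2024, 2, 1))   # exactly on the cutoff
])

cache.remove_on_or_before(cutoff)
p cache.rows.map { |r| [r.name, r.cached_at.day] } # prints [["Order 1", 3]]
p cache.cached_files                              # prints ["Order 1"]
```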
Rails Models
Inside the Rails application, the only methods we need to add are `document` and `document_with_attachments` in `Order`, and `image_document`, `image_document_pdf` and `file_download` (plus a few attachment scopes) in `OrderItem`.
```ruby
class Order < ApplicationRecord
  has_many :items, class_name: "OrderItem", dependent: :destroy
  # ...

  def document
    ParademPdf::OrderDocument.new(
      name: "Order #{id}",
      date: updated_at,
      record: self,
      template: "pdfs/_order",
      remote_url:,
      protocol:
    )
  end

  def document_with_attachments
    ParademPdf::OrderDocumentWithAttachments.new(
      attachments: items.with_image_attachments.collect(&:image_document_pdf),
      supplements: items.with_pdf_attachments.collect(&:file_download),
      name: "Order #{id} with attachments",
      date: updated_at,
      record: self,
      template: "pdfs/_order",
      remote_url:,
      protocol:
    )
  end

  # ...
end
```
```ruby
class OrderItem < ApplicationRecord
  # touch: true bumps the parent order's updated_at whenever an item changes,
  # so the order's cached PDFs are regenerated on the next request.
  belongs_to :order, touch: true
  has_one_attached :file, dependent: :destroy

  scope :with_file_attachments, -> { joins(file_attachment: :blob) }

  scope :with_image_attachments, -> do
    with_file_attachments
      .references(:file_attachment)
      .where(ActiveStorage::Blob.arel_table[:content_type].matches("image/%"))
  end

  scope :with_pdf_attachments, -> do
    with_file_attachments
      .references(:file_attachment)
      .where(ActiveStorage::Blob.arel_table[:content_type].matches("application/pdf"))
  end

  # ...

  def file_download
    file.download
  end

  def image_document
    return unless file.attached? && file.content_type.start_with?("image/")

    ParademPdf::OrderImageAttachmentDocument.new(
      name: description,
      date: order.delivery_date,
      record: self,
      template: "pdfs/_order_item_image_attachment",
      remote_url:,
      protocol:
    )
  end

  def image_document_pdf
    image_document&.to_pdf
  end

  # ...
end
```
The key points to note are that the Rails app doesn't need to be concerned with the cache, and that we can reuse our existing view templates to generate the PDFs.
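One consequence of the two attachment scopes is worth spelling out: the merged document is the core order, then every image page, then every PDF supplement, regardless of how the items are interleaved, and an attachment that is neither an image nor a PDF is silently left out. A pure-Ruby sketch of that ordering, using a hypothetical `SketchItem` struct in place of `OrderItem` and assuming items come back in insertion order (the scopes add no `order` clause):

```ruby
# Hypothetical stand-in for OrderItem: a description and a content type.
SketchItem = Struct.new(:description, :content_type)

# Mirrors the scopes: content_type LIKE 'image/%' and LIKE 'application/pdf'.
def image_items(items)
  items.select { |item| item.content_type&.start_with?("image/") }
end

def pdf_items(items)
  items.select { |item| item.content_type == "application/pdf" }
end

# Mirrors document_with_attachments: the core document, then all image pages,
# then all PDF supplements.
def merge_order(items)
  ["core"] +
    image_items(items).map { |i| "image:#{i.description}" } +
    pdf_items(items).map { |i| "pdf:#{i.description}" }
end

items = [
  SketchItem.new("spec sheet", "application/pdf"),
  SketchItem.new("photo", "image/png"),
  SketchItem.new("notes", "text/plain"),
  SketchItem.new("sketch", "image/jpeg")
]

p merge_order(items)
# prints ["core", "image:photo", "image:sketch", "pdf:spec sheet"]
```

If the interleaved order matters, one option is to collect a single ordered list of items and convert each one by type before merging.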
PDF Generation
The core of the PDF generation module includes just three classes: `ParademPdf::Cache`, `ParademPdf::Configuration`, and `ParademPdf::Document`.
The configuration class simply allows us to define default margins:
```ruby
module ParademPdf
  class Configuration
    def self.header_margins
      {
        top: "50px",
        right: "50px",
        bottom: "0",
        left: "50px"
      }
    end

    def self.body_margins
      {
        top: "120px", # 50px base margin + 70px for header height
        right: "50px",
        bottom: "80px", # 50px base margin + 30px for footer height
        left: "50px"
      }
    end

    def self.footer_margins
      {
        right: "50px",
        bottom: "50px",
        left: "50px"
      }
    end
  end
end
```
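The body margins are larger than the header and footer margins because the header and footer are rendered as separate pages and stamped onto each body page (`page << ...` in `merge_header_and_footer`), so the body content has to leave room for them. Deriving the numbers instead of hard-coding them keeps them in sync if the header grows. A sketch, where `BASE_MARGIN`, `HEADER_HEIGHT`, `FOOTER_HEIGHT` and `derived_body_margins` are hypothetical names that reproduce the arithmetic in the comments above:

```ruby
# Hypothetical constants matching the comments in Configuration.body_margins.
BASE_MARGIN = 50
HEADER_HEIGHT = 70
FOOTER_HEIGHT = 30

def px(value)
  "#{value}px"
end

# Body margins = base page margin plus the space reserved for the stamped
# header (top) and footer (bottom).
def derived_body_margins(base: BASE_MARGIN, header_height: HEADER_HEIGHT, footer_height: FOOTER_HEIGHT)
  {
    top: px(base + header_height),
    right: px(base),
    bottom: px(base + footer_height),
    left: px(base)
  }
end

margins = derived_body_margins
puts margins[:top]    # prints 120px
puts margins[:bottom] # prints 80px
```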
The core of the PDF generation combines the various cached pieces into the final document. The main method in the `Document` class is `to_pdf`:
```ruby
def to_pdf
  @cache.check_cache(record_cache_key, @record.updated_at) do
    pdf = Grover.new(render_html(@template, assigns: assignments), margin: body_margins).to_pdf
    merge_header_and_footer(pdf)
  end
end
```
This checks whether the document has already been generated for the record's current `updated_at` and, if so, returns the cached copy. Otherwise it renders the document and merges in the headers & footers:
```ruby
def merge_header_and_footer(pdf)
  document = CombinePDF.parse(pdf)
  page_total = document.pages.length
  document.pages.each_with_index do |page, index|
    page << parse_header(page_number: index + 1, page_total:) if header_template
    page << parse_footer(page_number: index + 1, page_total:) if footer_template
  end
  document.to_pdf
end
```
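The per-page loop is easier to follow without PDFs involved. In this sketch (the `StampingSketch` class and its string "layers" are hypothetical), each page is an array and `<<` appends a layer, standing in for CombinePDF stamping one page onto another. Page numbers are 1-based, and a subclass can drop the header or footer by returning `nil` from its template method.

```ruby
# Hypothetical, framework-free model of merge_header_and_footer.
class StampingSketch
  def initialize(header_template: "pdfs/_header", footer_template: "pdfs/_footer")
    @header_template = header_template
    @footer_template = footer_template
  end

  # Stamps a header and footer layer onto every page, skipping either one
  # when its template is nil.
  def merge_header_and_footer(pages)
    page_total = pages.length
    pages.each_with_index do |page, index|
      page << header(page_number: index + 1, page_total: page_total) if @header_template
      page << footer(page_number: index + 1, page_total: page_total) if @footer_template
    end
    pages
  end

  private

  def header(page_number:, page_total:)
    "header #{page_number}/#{page_total}"
  end

  def footer(page_number:, page_total:)
    "footer #{page_number}/#{page_total}"
  end
end

p StampingSketch.new.merge_header_and_footer([["body 1"], ["body 2"]])
# prints [["body 1", "header 1/2", "footer 1/2"], ["body 2", "header 2/2", "footer 2/2"]]

p StampingSketch.new(header_template: nil).merge_header_and_footer([["body"]])
# prints [["body", "footer 1/1"]]
```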
The headers & footers are generated and cached in much the same way as the core document. Here is how we generate a header:
```ruby
def header_pdf(page_number:, page_total:)
  @cache.check_cache(header_cache_key(page_number), header_timestamp) do
    locals = {
      page_number:,
      page_total:
    }.merge(header_assignments)
    Grover.new(render_html(header_template, locals:), margin: header_margins).to_pdf
  end
end
```
In broad terms, this means we can extend the `Document` class to create variations of the core document while still reusing the cache for the pieces we have already generated. For our use case, we generate a PDF that combines the core document with its attachments by subclassing `OrderDocument` as `OrderDocumentWithAttachments`:
```ruby
module ParademPdf
  class OrderDocumentWithAttachments < OrderDocument
    attr_reader(:attachments, :supplements)

    def initialize(attachments:, supplements:, **args)
      super(**args)
      @attachments = attachments
      @supplements = supplements
    end

    def to_pdf
      @cache.check_cache(record_with_attachments_cache_key, @record.updated_at) do
        Document.merge_documents([super].concat(attachments, supplements))
      end
    end

    private

    def record_with_attachments_cache_key
      "#{record_cache_key}/with_attachments"
    end
  end
end
```
The key part to note is that we override `to_pdf`, use the cache again, and merge the core document (via `super`) with both the attached images and the supplemental PDFs. We use a separate cache key here so the combined document doesn't collide with the core document's entry. We also have a document class, `OrderImageAttachmentDocument`, that generates a page for each image in much the same way, but omits the headers & footers and overrides the top margin.
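The layering can be exercised without Grover or CombinePDF. In this sketch the class names are hypothetical, a Hash-backed cache stands in for `PdfStorage`, and strings stand in for PDF bytes. It shows that the combined document gets its own cache entry while `super` still goes through the core document's entry, so requesting the version with attachments after the plain one renders only the new layer.

```ruby
# Hypothetical Hash-backed cache that counts misses.
class SketchCache
  attr_reader :store, :misses

  def initialize
    @store = {}
    @misses = 0
  end

  def check_cache(name, cached_at)
    key = [name, cached_at]
    return @store[key] if @store.key?(key)

    @misses += 1
    @store[key] = yield
  end
end

class SketchDocument
  def initialize(cache:, key:, updated_at:)
    @cache = cache
    @key = key
    @updated_at = updated_at
  end

  def to_pdf
    @cache.check_cache(record_cache_key, @updated_at) { "core" }
  end

  private

  def record_cache_key
    @key
  end
end

class SketchDocumentWithAttachments < SketchDocument
  def initialize(attachments:, **args)
    super(**args)
    @attachments = attachments
  end

  # Separate key so the combined output never overwrites the core entry;
  # `super` inside the block still uses the core document's cache entry.
  def to_pdf
    @cache.check_cache("#{record_cache_key}/with_attachments", @updated_at) do
      ([super] + @attachments).join("+")
    end
  end
end

cache = SketchCache.new
args = { cache: cache, key: "gid://my-app/Order/1", updated_at: Time.utc(2024, 1, 10) }

SketchDocument.new(**args).to_pdf # miss: renders and stores the core document
full = SketchDocumentWithAttachments.new(attachments: ["image", "supplement"], **args)
puts full.to_pdf                  # prints core+image+supplement
puts cache.misses                 # prints 2 (the core entry was reused)
```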
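In our `OrderDocument` class, the header and footer cache keys leave out the record's id and use the start of the current month as their timestamp, so headers and footers for this document type are generated only once a month and shared by every order. The trade-off is that those templates shouldn't render anything order-specific: the header key ignores the page number, and the footer key includes the page number but not the page total.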
```ruby
module ParademPdf
  class OrderDocument < Document
    private

    def assignments
      {
        order: @record
      }
    end

    # Shared by every order: one footer per page number per month.
    def footer_cache_key(page_number)
      "#{app_id}/#{@record.class}/footer/#{page_number}"
    end

    def footer_timestamp
      Time.current.beginning_of_month
    end

    # Shared by every order and every page: one header per month.
    def header_cache_key(_page_number)
      "#{app_id}/#{@record.class}/header"
    end

    def header_timestamp
      Time.current.beginning_of_month
    end
  end
end
```
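To see what the monthly keys buy, here is a sketch that counts the distinct header and footer fragments a month of orders needs. `APP_ID`, `month_bucket` and `fragment_renders` are hypothetical names, and `month_bucket` is a UTC-only stand-in for ActiveSupport's `beginning_of_month`; the key formats copy `OrderDocument` above.

```ruby
APP_ID = "gid://my-app" # hypothetical; the real value comes from GlobalID config

# UTC-only stand-in for Time.current.beginning_of_month.
def month_bucket(time)
  Time.utc(time.year, time.month, 1)
end

def header_cache_key(record_class, _page_number)
  "#{APP_ID}/#{record_class}/header"
end

def footer_cache_key(record_class, page_number)
  "#{APP_ID}/#{record_class}/footer/#{page_number}"
end

# Counts the distinct [key, timestamp] fragments rendered when stamping
# several orders (given by their page counts) within one month.
def fragment_renders(page_counts, now)
  rendered = {}
  page_counts.each do |page_total|
    (1..page_total).each do |page_number|
      rendered[[header_cache_key("Order", page_number), month_bucket(now)]] = true
      rendered[[footer_cache_key("Order", page_number), month_bucket(now)]] = true
    end
  end
  rendered.size
end

# Two orders with 3 and 2 pages: 1 shared header + footers for pages 1..3.
puts fragment_renders([3, 2], Time.utc(2024, 5, 17)) # prints 4
```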
The entirety of our `Document` class is below and should be fairly extensible:
```ruby
module ParademPdf
  class Document
    attr_reader(:remote_url, :protocol, :doc_name, :margin_symbol_offset)

    # This is a helper method to merge multiple pdfs into one.
    def self.merge_documents(pdfs)
      document = CombinePDF.new
      pdfs.each do |pdf_file|
        document << CombinePDF.parse(pdf_file)
      end
      document.to_pdf
    end

    def initialize(name:, date:, record:, template:, remote_url:, protocol:)
      @name = name
      @date = date
      @template = template
      @remote_url = remote_url
      @protocol = protocol
      @renderer = ApplicationController.renderer
      @cache = Cache.new
      @record = record
    end

    def body_margins
      Configuration.body_margins
    end

    def filename
      return "#{@name}#{filename_suffix}" if @date.blank?

      "#{@name} - #{@date}#{filename_suffix}"
    end

    def filename_suffix
      ".pdf"
    end

    def to_pdf
      @cache.check_cache(record_cache_key, @record.updated_at) do
        pdf = Grover.new(render_html(@template, assigns: assignments), margin: body_margins).to_pdf
        merge_header_and_footer(pdf)
      end
    end

    private

    def assignments
      {}
    end

    def footer_assignments
      {}
    end

    def footer_cache_key(page_number)
      "#{record_cache_key}/footer/#{page_number}"
    end

    def footer_margins
      Configuration.footer_margins
    end

    def footer_pdf(page_number:, page_total:)
      @cache.check_cache(footer_cache_key(page_number), footer_timestamp) do
        locals = {
          page_number:,
          page_total:
        }.merge(footer_assignments)
        Grover.new(render_html(footer_template, locals:), margin: footer_margins).to_pdf
      end
    end

    def footer_template
      "pdfs/_footer"
    end

    def footer_timestamp
      @record.created_at
    end

    def header_assignments
      {}
    end

    def header_cache_key(page_number)
      "#{record_cache_key}/header/#{page_number}"
    end

    def header_margins
      Configuration.header_margins
    end

    def header_pdf(page_number:, page_total:)
      @cache.check_cache(header_cache_key(page_number), header_timestamp) do
        locals = {
          page_number:,
          page_total:
        }.merge(header_assignments)
        Grover.new(render_html(header_template, locals:), margin: header_margins).to_pdf
      end
    end

    def header_template
      "pdfs/_header"
    end

    def header_timestamp
      @record.created_at
    end

    # This method merges the header, body and footer pages into one pdf.
    def merge_header_and_footer(pdf)
      document = CombinePDF.parse(pdf)
      page_total = document.pages.length
      document.pages.each_with_index do |page, index|
        page << parse_header(page_number: index + 1, page_total:) if header_template
        page << parse_footer(page_number: index + 1, page_total:) if footer_template
      end
      document.to_pdf
    end

    def render_html(template, assigns: {}, locals: {})
      Grover::HTMLPreprocessor.process(
        @renderer.render_to_string(
          template:,
          layout: "pdf",
          assigns:,
          locals:
        ),
        remote_url, protocol
      )
    end

    def parse_header(page_number:, page_total:)
      CombinePDF.parse(header_pdf(page_number:, page_total:)).pages[0]
    end

    def parse_footer(page_number:, page_total:)
      CombinePDF.parse(footer_pdf(page_number:, page_total:)).pages[0]
    end

    def record_cache_key
      @record.to_global_id.to_s
    end

    def app_id
      @app_id ||= "gid://#{Rails.application.config.global_id.app}"
    end
  end
end
```
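For reference, every key the base class produces hangs off the record's GlobalID, so each record gets its own core, header and footer entries (which is exactly what `OrderDocument` opts out of for headers and footers). A sketch with a hypothetical `KeySketch` class and a literal string in place of `to_global_id`; since `blank?` comes from ActiveSupport, plain `nil?` is used here.

```ruby
require "date"

# Hypothetical copy of Document's key and filename scheme; global_id stands
# in for record.to_global_id.to_s (format gid://app/Model/id).
class KeySketch
  def initialize(name:, date:, global_id:)
    @name = name
    @date = date
    @global_id = global_id
  end

  def record_cache_key
    @global_id
  end

  def header_cache_key(page_number)
    "#{record_cache_key}/header/#{page_number}"
  end

  def footer_cache_key(page_number)
    "#{record_cache_key}/footer/#{page_number}"
  end

  def filename
    return "#{@name}.pdf" if @date.nil?

    "#{@name} - #{@date}.pdf"
  end
end

doc = KeySketch.new(name: "Order 42", date: Date.new(2024, 5, 17), global_id: "gid://my-app/Order/42")
puts doc.footer_cache_key(2) # prints gid://my-app/Order/42/footer/2
puts doc.filename            # prints Order 42 - 2024-05-17.pdf
```

With a timestamp such as `updated_at` instead of a `Date`, the interpolated filename includes the full time as well.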