PostgreSQL Views in Rails by David Higgins, Developer

PostgreSQL Views can be a powerful tool for joining data from multiple databases in a Rails project. Should your project only have read access to the target database, you can use a view to join the data from the target database via a Rails model. However, getting everything setup and working can be a bit difficult. This is a not so brief reference for how to get the most out of using views in a Rails app.

TLDR / Table of contents

At a high level these are several gotchas to pay attention to get everything working as expected:

Data permissions Ensure that you are able to assign privileges to the target database that contains the tables you want to join to.
Data types Ensure that the data types in the target database are correct for the columns you want to join to. Add indexes as appropriate.
View migrations Install the dblink extension for PostgreSQL and configure the view in a rails migration.
Local data models Define the model within the Rails app with the appropriate primary key and relationships. Methods & scopes here should work as expected for regular Rails models. Add a helper model for testing the view.
Testing Mock the external database and table data. This is a bit tricky when working with separate data sources.
CI / CD Ensure that the migration actually creates the view properly when running tests.
Deployment Ensure that the view is created on staging and production. This may need to be done manually.
References & Prerequisites
- PostgreSQL 7.2+
- PostgreSQL Views
- PostgreSQL dblink
- pg gem
- Rspec Rails for this example, however any testing framework should suffice.

Background

Recently I have been working with a client that has been using two separate databases since Rails 4 or 5. So the database connection to the target data is not quite modern & is read only, however the project has been largely updated to recent gem versions and is running Rails 7.0. Both databases are hosted on the same server and is using different credentials for each database without superuser access on staging and production.

The client is doing some machine learning and processing on the data and would like to make the result available back to the Rails app. The data does not need to be edited and other options for ingesting the data would involve duplicating the data locally and create the possibility for it to get out of sync.

Using a PostgreSQL View seemed like the best option for this project. The benefits that using a view provides include:

The data is always in sync.
Ability to use existing rails gems and conventions to use the external data as if it was local data.
Ability to map table and column names to match naming conventions.
Ability to search, sort, and paginate the external data.

While the standard rails migrations worked well for local dev, the staging and production environments required a bit of manual work to create the view and expose the appropriate data. The primary difficulty we run into is that the client is using the same instance to host all of their databases and it would be inadvisable to make permissions on the instance less strict.

Some of data tables we wanted view in Rails required some updating before connecting them to a view. Specifically the target database tables were missing:

Columns for id, created_at, and updated_at
Indexes for several important columns. These are not required for the view to work, but are recommended to improve query performance. (ie id, uuid)

Gotchas

Let's dig into a more concrete example of how to address the issues we ran across.

The following sections will need to be manually run on each staging server to connect them to their appropriate external database connection and tables. The following is an example of how to connect the external_data table for a staging server. Once the setup has been complete it should not need to be run again unless the target database is changed. The same process will need to be done for each table that we want to create a view for. The Rails migrations for those tables should serve as a template to create the views manually. Note: These instructions assume that the target database is running on the same host. It should be possible to adapt these instructions to work with a remote host as well.

Legend

External database: external_db_staging
External database table name: external_data
External database user: external_user
External database password: *_supersecret_value_* (not shown)
Rails database user: rails_db_user
Rails view: analytic_data_views

Data permissions

We want to ensure that the rails_db_user user has the appropriate privileges to the target database.

Run these commands on the target database:

-- Grant CONNECT privilege to the user on the target database
GRANT CONNECT ON DATABASE external_db_staging TO rails_db_user;

-- Grant SELECT privilege to the user on all tables in the target database
GRANT SELECT ON ALL TABLES IN SCHEMA public TO rails_db_user;

-- Grant SELECT privilege to the user on all future tables in the target database
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO rails_db_user;

Data types

Recommended columns to either added or be present in the external database table:

id
uuid (What we want to use to join to data in the Rails app)
created_at
updated_at

Add a sequence for the id column if necessary.

CREATE SEQUENCE external_data_id_seq START 1;
ALTER TABLE external_data ALTER COLUMN id SET DEFAULT nextval('external_data_id_seq');

Create indexes on the external data as appropriate.

CREATE UNIQUE INDEX external_data_pkey ON external_data (id);
CREATE INDEX index_external_data_on_uuid ON external_data (uuid);
CREATE INDEX index_external_data_on_reference_date ON external_data (reference_date);

View Migrations

This is the view that will be used to join the data from the external database to the Rails app. The view is created in the primary database so that the Rails can access and index the data.

We can also include in the migration the ability to check if the dblink extension is install.

The point of the migration here is to facilitate local dev and testing environments. Running it on staging or production will likely produce a broken view.

Some points to keep in mind:

We are extracting database details from environment variables.
When using the connection in local dev or our testing environments as a superuser we do not need to provide the user or password in the connection details.
We are using parallel specs to speed up the tests so we are including the variable for the test number.

Note 1: This bit is not able to run via the standard rails rake migration functions on staging & production. It will be address in a following section.

Note 2: We are not using the same table and column names internally that the external database is using. This is useful as we can follow the terminology has already defined within the Rails app and be a bit more explicit in what we are referring to. This also allows us to re-order columns so that they are a bit closer to what the Rails standard is.

# db/migrate/20240426211995_add_view_for_analytic_data_on_external_database.rb
class AddViewForAnalyticDataOnExternalDatabase < ActiveRecord::Migration[7.0]
  def up
    return if view_exists?(:analytic_data_views)

    external_connection =
      if ENV["EXTERNAL_DB_USER"].present? && ENV["EXTERNAL_DB_PASSWORD"].present?
        "dbname=#{ENV["EXTERNAL_DATABASE"]} user=#{ENV["EXTERNAL_DB_USER"]} " \
        "password=#{ENV["EXTERNAL_DB_PASSWORD"]} options=-csearch_path="
      else
        "dbname=#{ENV["EXTERNAL_DATABASE"]}#{ENV["TEST_ENV_NUMBER"]} options=-csearch_path="
      end

    execute <<~SQL
      DO $$
        BEGIN
          IF NOT EXISTS (
            SELECT 1
              FROM   pg_catalog.pg_extension
              WHERE  extname = 'dblink')
              THEN CREATE EXTENSION dblink;
          END IF;

          CREATE OR REPLACE VIEW analytic_data_views AS
          SELECT *
          FROM dblink('#{external_connection}',
            'SELECT
              id, reference_date, ext_division as division, processed_data as relevant_data,
              prior as previous_data, average, uuid, created_at, updated_at
              FROM public.external_data')
            AS t1(
              id int,
              reference_date date,
              division text,
              relevant_data text,
              previous_data text,
              average float8,
              uuid text,
              created_at timestamptz,
              updated_at timestamptz);
            END
      $$;
    SQL
  end

  def down
    execute <<~SQL
      DROP VIEW IF EXISTS analytic_data_views;
    SQL
  end
end

Testing Migration

Our current setup is using a rake task to seed the external test database. Your mileage may vary here. The important thing is that you can create the external database and necessary tables.

# lib/tasks/external_database.rake
namespace :external_database do
  task create_tables: :environment do
    p "Creating external database tables..."
    ApplicationRecord.establish_connection "external_db_#{Rails.env}".to_sym

    ActiveRecord::Migration.class_eval do
      unless table_exists? :external_data
        create_table :external_data do |t|
          t.date :reference_date
          t.text :uuid
          t.text :ext_division
          t.text :processed_data
          t.text :prior
          t.float :average

          t.timestamps

          t.index :uuid, unique: false
          t.index :reference_date
        end
      end
    end
  end
end

Local data models

We want to create two models to access the data in the Rails app. The first model is used to access the data in the view. The second model is used to assist when running tests and seeding the external test database.

Note: We are defining the default scope to be readonly.

# app/models/analytic_data_view.rb
class AnalyticDataView < ApplicationRecord
  self.primary_key = "id"
  default_scope { readonly.order(reference_date: :desc) }

  has_one :parent_model, foreign_key: :uuid, primary_key: :uuid
  # Note: There should be a reflective relationship between the parent model and
  #       the analytic data view.
  #       `has_many :analytic_data_views, foreign_key: :uuid, primary_key: :uuid`

  class << self
    # Note: We are using Ransack to filter the data in rails, however this is
    #       not required to connect the view.
    #       Standard Rails conventions will suffice.
    def ransackable_attributes(auth_object = nil)
      %w[id reference_date division average relevant_data previous_data uuid created_at updated_at]
    end

    def ransackable_associations(auth_object = nil)
      %w[parent_model]
    end
  end
end

# app/models/external_database/external_data.rb
class ExternalDatabase::ExternalData < Daex::Base
  self.abstract_class = true
  establish_connection "external_db_#{Rails.env}".to_sym

  self.table_name = "external_data"
end

Testing

In our example we are using Rspec for the testing framework. However the concepts here should translate to other testing frameworks.

In our spec_helper file we are manually creating fixtures that live outside of the transactions within descriptions and test. This is important for the data to already exist in the external test database before we open a transaction chain. Otherwise the view will not see the data because it has not been committed externally.

# spec/spec_helper.rb
RSpec.configure do |config|
# ...
  config.before(:suite) do
    ExternalDatabase::ExternalData.create!(
      reference_date: Date.yesterday,
      uuid: "b8399d43-97b3-472f-aa8e-bec972b04323",
      ext_division: "division",
      assignment: "RNDM12",
      prior: "RNDM14",
      average: 100.0,
      created_at: DateTime.now,
      updated_at: DateTime.now)
    ExternalDatabase::ExternalData.create!(
      reference_date: Date.yesterday,
      uuid: "08b6159d-ef0e-4955-aaf0-2b301cf87870",
      ext_division: "another division",
      assignment: "RDO12",
      prior: "RDO14",
      average: 105.0,
      created_at: DateTime.now,
      updated_at: DateTime.now)
  end

  configure.after(:suite) do
    ExternalDatabase::ExternalData.destroy_all
  end
end

This allows us to create related items and write expectations against the view as we would normally do. As we create these outside of the transaction chain the data should be generic so that several different tests can be run against the same read only data.

CI / CD

We are using Semaphore for our continuous integration. The important thing to keep in mind here is to ensure that you are setting up the external database and running the migrations before running the tests. And connecting the view the external database properly.

Note: You may want to ensure that the view is created properly by running the specific migration down & up:

bin/rails db:migrate:down VERSION=20240426211995 RAILS_ENV=test
bin/rails db:migrate:up VERSION=20240426211995 RAILS_ENV=test

Deployment

The rails migration for the view was not creating the view properly on our staging and production environments. We were able to resolve this by manually creating the view in the database. Once the view was created everything worked as expected.

CREATE OR REPLACE VIEW analytic_data_views AS
  SELECT *
  FROM dblink('dbname=external_db_staging user=external_user password=*_supersecret_value_* options=-csearch_path=',
    'SELECT
      id, reference_date, ext_division as division, processed_data as relevant_data,
      prior as previous_data, average, uuid, created_at, updated_at
      FROM public.external_data')
    AS t1(
      id int,
      reference_date date,
      division text,
      relevant_data text,
      previous_data text,
      average float8,
      uuid text,
      created_at timestamptz,
      updated_at timestamptz);

Use the following command should you need to remove an incorrectly created view from the primary database:

DROP VIEW IF EXISTS analytic_data_views;

Final Thoughts

While it adds some complexity that we are not used to in a Rails application, employing a PostgreSQL View can be a great way to access data in a more efficient way. It also reduces the need to keep the data synchronized, and duplicating the space needed to store that data.

– David Higgins, Developer