PostgreSQL Views can be a powerful tool for joining data from multiple databases in a Rails project. Should your project only have read access to the target database, you can use a view to join the data from the target database via a Rails model. However, getting everything setup and working can be a bit difficult. This is a not so brief reference for how to get the most out of using views in a Rails app.
TLDR / Table of contents
At a high level these are several gotchas to pay attention to get everything working as expected:
- Data permissions Ensure that you are able to assign privileges to the target database that contains the tables you want to join to.
- Data types Ensure that the data types in the target database are correct for the columns you want to join to. Add indexes as appropriate.
- View migrations Install the dblink extension for PostgreSQL and configure the view in a rails migration.
- Local data models Define the model within the Rails app with the appropriate primary key and relationships. Methods & scopes here should work as expected for regular Rails models. Add a helper model for testing the view.
- Testing Mock the external database and table data. This is a bit tricky when working with separate data sources.
- CI / CD Ensure that the migration actually creates the view properly when running tests.
-
Deployment Ensure that the view is created on staging and production. This may need to be done manually.
- References & Prerequisites
- PostgreSQL 7.2+
- PostgreSQL Views
- PostgreSQL dblink
- pg gem
- Rspec Rails for this example, however any testing framework should suffice.
Background
Recently I have been working with a client that has been using two separate databases since Rails 4 or 5. So the database connection to the target data is not quite modern & is read only, however the project has been largely updated to recent gem versions and is running Rails 7.0. Both databases are hosted on the same server and is using different credentials for each database without superuser access on staging and production.
The client is doing some machine learning and processing on the data and would like to make the result available back to the Rails app. The data does not need to be edited and other options for ingesting the data would involve duplicating the data locally and create the possibility for it to get out of sync.
Using a PostgreSQL View seemed like the best option for this project. The benefits that using a view provides include:
- The data is always in sync.
- Ability to use existing rails gems and conventions to use the external data as if it was local data.
- Ability to map table and column names to match naming conventions.
- Ability to search, sort, and paginate the external data.
While the standard rails migrations worked well for local dev, the staging and production environments required a bit of manual work to create the view and expose the appropriate data. The primary difficulty we run into is that the client is using the same instance to host all of their databases and it would be inadvisable to make permissions on the instance less strict.
Some of data tables we wanted view in Rails required some updating before connecting them to a view. Specifically the target database tables were missing:
- Columns for
id
,created_at
, andupdated_at
- Indexes for several important columns. These are not required for the view to
work, but are recommended to improve query performance. (ie
id
,uuid
)
Gotchas
Let's dig into a more concrete example of how to address the issues we ran across.
The following sections will need to be manually run on each staging server to
connect them to their appropriate external database connection and tables. The
following is an example of how to connect the external_data
table for a
staging
server. Once the setup has been complete it should not need to be run
again unless the target database is changed. The same process will need to be
done for each table that we want to create a view for. The Rails migrations for
those tables should serve as a template to create the views manually. Note:
These instructions assume that the target database is running on the same host.
It should be possible to adapt these instructions to work with a remote host as
well.
Legend
- External database:
external_db_staging
- External database table name:
external_data
- External database user:
external_user
- External database password:
*_supersecret_value_*
(not shown) - Rails database user:
rails_db_user
- Rails view:
analytic_data_views
Data permissions
We want to ensure that the rails_db_user
user has the appropriate privileges
to the target database.
Run these commands on the target database:
-- Grant CONNECT privilege to the user on the target database
GRANT CONNECT ON DATABASE external_db_staging TO rails_db_user;
-- Grant SELECT privilege to the user on all tables in the target database
GRANT SELECT ON ALL TABLES IN SCHEMA public TO rails_db_user;
-- Grant SELECT privilege to the user on all future tables in the target database
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO rails_db_user;
Data types
Recommended columns to either added or be present in the external database table:
id
uuid
(What we want to use to join to data in the Rails app)created_at
updated_at
Add a sequence for the id
column if necessary.
CREATE SEQUENCE external_data_id_seq START 1;
ALTER TABLE external_data ALTER COLUMN id SET DEFAULT nextval('external_data_id_seq');
Create indexes on the external data as appropriate.
CREATE UNIQUE INDEX external_data_pkey ON external_data (id);
CREATE INDEX index_external_data_on_uuid ON external_data (uuid);
CREATE INDEX index_external_data_on_reference_date ON external_data (reference_date);
View Migrations
This is the view that will be used to join the data from the external database to the Rails app. The view is created in the primary database so that the Rails can access and index the data.
We can also include in the migration the ability to check if the dblink extension is install.
The point of the migration here is to facilitate local dev and testing environments. Running it on staging or production will likely produce a broken view.
Some points to keep in mind:
- We are extracting database details from environment variables.
- When using the connection in local dev or our testing environments as a superuser we do not need to provide the user or password in the connection details.
- We are using parallel specs to speed up the tests so we are including the variable for the test number.
Note 1: This bit is not able to run via the standard rails rake migration functions on staging & production. It will be address in a following section.
Note 2: We are not using the same table and column names internally that the external database is using. This is useful as we can follow the terminology has already defined within the Rails app and be a bit more explicit in what we are referring to. This also allows us to re-order columns so that they are a bit closer to what the Rails standard is.
# db/migrate/20240426211995_add_view_for_analytic_data_on_external_database.rb
class AddViewForAnalyticDataOnExternalDatabase < ActiveRecord::Migration[7.0]
def up
return if view_exists?(:analytic_data_views)
external_connection =
if ENV["EXTERNAL_DB_USER"].present? && ENV["EXTERNAL_DB_PASSWORD"].present?
"dbname=#{ENV["EXTERNAL_DATABASE"]} user=#{ENV["EXTERNAL_DB_USER"]} " \
"password=#{ENV["EXTERNAL_DB_PASSWORD"]} options=-csearch_path="
else
"dbname=#{ENV["EXTERNAL_DATABASE"]}#{ENV["TEST_ENV_NUMBER"]} options=-csearch_path="
end
execute <<~SQL
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1
FROM pg_catalog.pg_extension
WHERE extname = 'dblink')
THEN CREATE EXTENSION dblink;
END IF;
CREATE OR REPLACE VIEW analytic_data_views AS
SELECT *
FROM dblink('#{external_connection}',
'SELECT
id, reference_date, ext_division as division, processed_data as relevant_data,
prior as previous_data, average, uuid, created_at, updated_at
FROM public.external_data')
AS t1(
id int,
reference_date date,
division text,
relevant_data text,
previous_data text,
average float8,
uuid text,
created_at timestamptz,
updated_at timestamptz);
END
$$;
SQL
end
def down
execute <<~SQL
DROP VIEW IF EXISTS analytic_data_views;
SQL
end
end
Testing Migration
Our current setup is using a rake task to seed the external test database. Your mileage may vary here. The important thing is that you can create the external database and necessary tables.
# lib/tasks/external_database.rake
namespace :external_database do
task create_tables: :environment do
p "Creating external database tables..."
ApplicationRecord.establish_connection "external_db_#{Rails.env}".to_sym
ActiveRecord::Migration.class_eval do
unless table_exists? :external_data
create_table :external_data do |t|
t.date :reference_date
t.text :uuid
t.text :ext_division
t.text :processed_data
t.text :prior
t.float :average
t.timestamps
t.index :uuid, unique: false
t.index :reference_date
end
end
end
end
end
Local data models
We want to create two models to access the data in the Rails app. The first model is used to access the data in the view. The second model is used to assist when running tests and seeding the external test database.
Note: We are defining the default scope to be readonly.
# app/models/analytic_data_view.rb
class AnalyticDataView < ApplicationRecord
self.primary_key = "id"
default_scope { readonly.order(reference_date: :desc) }
has_one :parent_model, foreign_key: :uuid, primary_key: :uuid
# Note: There should be a reflective relationship between the parent model and
# the analytic data view.
# `has_many :analytic_data_views, foreign_key: :uuid, primary_key: :uuid`
class << self
# Note: We are using Ransack to filter the data in rails, however this is
# not required to connect the view.
# Standard Rails conventions will suffice.
def ransackable_attributes(auth_object = nil)
%w[id reference_date division average relevant_data previous_data uuid created_at updated_at]
end
def ransackable_associations(auth_object = nil)
%w[parent_model]
end
end
end
# app/models/external_database/external_data.rb
class ExternalDatabase::ExternalData < Daex::Base
self.abstract_class = true
establish_connection "external_db_#{Rails.env}".to_sym
self.table_name = "external_data"
end
Testing
In our example we are using Rspec for the testing framework. However the concepts here should translate to other testing frameworks.
In our spec_helper file we are manually creating fixtures that live outside of the transactions within descriptions and test. This is important for the data to already exist in the external test database before we open a transaction chain. Otherwise the view will not see the data because it has not been committed externally.
# spec/spec_helper.rb
RSpec.configure do |config|
# ...
config.before(:suite) do
ExternalDatabase::ExternalData.create!(
reference_date: Date.yesterday,
uuid: "b8399d43-97b3-472f-aa8e-bec972b04323",
ext_division: "division",
assignment: "RNDM12",
prior: "RNDM14",
average: 100.0,
created_at: DateTime.now,
updated_at: DateTime.now)
ExternalDatabase::ExternalData.create!(
reference_date: Date.yesterday,
uuid: "08b6159d-ef0e-4955-aaf0-2b301cf87870",
ext_division: "another division",
assignment: "RDO12",
prior: "RDO14",
average: 105.0,
created_at: DateTime.now,
updated_at: DateTime.now)
end
configure.after(:suite) do
ExternalDatabase::ExternalData.destroy_all
end
end
This allows us to create related items and write expectations against the view as we would normally do. As we create these outside of the transaction chain the data should be generic so that several different tests can be run against the same read only data.
CI / CD
We are using Semaphore for our continuous integration. The important thing to keep in mind here is to ensure that you are setting up the external database and running the migrations before running the tests. And connecting the view the external database properly.
Note: You may want to ensure that the view is created properly by running the specific migration down & up:
bin/rails db:migrate:down VERSION=20240426211995 RAILS_ENV=test
bin/rails db:migrate:up VERSION=20240426211995 RAILS_ENV=test
Deployment
The rails migration for the view was not creating the view properly on our staging and production environments. We were able to resolve this by manually creating the view in the database. Once the view was created everything worked as expected.
CREATE OR REPLACE VIEW analytic_data_views AS
SELECT *
FROM dblink('dbname=external_db_staging user=external_user password=*_supersecret_value_* options=-csearch_path=',
'SELECT
id, reference_date, ext_division as division, processed_data as relevant_data,
prior as previous_data, average, uuid, created_at, updated_at
FROM public.external_data')
AS t1(
id int,
reference_date date,
division text,
relevant_data text,
previous_data text,
average float8,
uuid text,
created_at timestamptz,
updated_at timestamptz);
Use the following command should you need to remove an incorrectly created view from the primary database:
DROP VIEW IF EXISTS analytic_data_views;
Final Thoughts
While it adds some complexity that we are not used to in a Rails application, employing a PostgreSQL View can be a great way to access data in a more efficient way. It also reduces the need to keep the data synchronized, and duplicating the space needed to store that data.
– David Higgins, Developer