Polling for File Generation

Background of the Problem

[Image: orders table. Caption: A list of orders]

We are currently generating a CSV of orders for a vendor. At the moment this isn't a big problem: we do not have many vendors or many orders, so the CSV generates relatively quickly.

class OrdersController < ApplicationController
  # ...

  def download_csv
    orders = Order.where(id: order_ids)

    file = Tempfile.new

    # Write a header row, then one row per order
    CSV.open(file.path, "wb") do |csv|
      csv << ["Name", "Address", "City", "County", "Postcode", "Email"]
      orders.each do |order|
        csv << [order.name, order.address, order.city, order.county, order.postcode, order.email]
      end
    end

    # send_file takes a path rather than an open File
    send_file file.path
  end
end

However, we know that the number of suppliers and orders is bound to increase, so we need a better way of generating files, one that will scale without blocking our precious Ruby processes.

Obviously, this is where workers come in to do all the heavy lifting. By moving this work into a background job, we free up our Ruby processes to serve other requests.

class OrdersController < ApplicationController
  # ...

  def download_csv
    GenerateCSVJob.perform_async(params[:order_ids])

    # Code to send file back to user
  end
end

However, how can I tell the controller that the file has finished being generated so that it can be sent back to the user?

The Implementation

This is the point at which we need to do some polling, which is the reason for writing this article.

Now that we have the controller calling the worker to generate the file, we need a way of tracking the generated file. As we are not using a model for this, we don't have any handy IDs to keep track of it. This is where I like to use timestamps instead.

require 'csv'

class GenerateCSVJob
  include Sidekiq::Worker

  def perform(timestamp, order_ids)
    orders = Order.where(id: order_ids)

    file = Tempfile.new(timestamp.to_s)

    # Write a header row, then one row per order
    CSV.open(file.path, "wb") do |csv|
      csv << ["Name", "Address", "City", "County", "Postcode", "Email"]
      orders.each do |order|
        csv << [order.name, order.address, order.city, order.county, order.postcode, order.email]
      end
    end

    # Rename only once the CSV is fully written, so the final file name
    # never points at a partially written file
    File.rename(file.path, "/tmp/#{timestamp}_order.csv")
  end
end
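
The rename at the end of the job matters for polling: the final file name only appears once the CSV is complete, so a check for that name should never see a half-written file. Here is a minimal standalone sketch of the same idea, outside Rails and Sidekiq. The `write_csv_atomically` helper, its arguments and the `.partial` suffix are hypothetical names used for illustration, and the sketch assumes the partial and final files live on the same filesystem.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: write rows under a temporary name, then rename to the
# final name so that readers only ever see a complete file.
def write_csv_atomically(final_path, header, rows)
  partial_path = "#{final_path}.partial"

  CSV.open(partial_path, 'wb') do |csv|
    csv << header
    rows.each { |row| csv << row }
  end

  # On POSIX systems, a rename within one filesystem is atomic.
  File.rename(partial_path, final_path)
  final_path
end

# Usage: write into a throwaway directory and print the result.
Dir.mktmpdir do |dir|
  path = File.join(dir, '1700000000_order.csv')
  write_csv_atomically(path, %w[Name Email], [['Ada', 'ada@example.com']])
  puts File.read(path)
end
```

Writing next to the final path, rather than wherever Tempfile happens to put its files, keeps both names on one filesystem, which is what the rename relies on.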

Now that we have a unique way of identifying the file we have just generated, we can find it again in order to pass it back to the user.

class OrdersController < ApplicationController
  # ...

  def download_csv
    timestamp = Time.zone.now.to_i.to_s

    GenerateCSVJob.perform_async(timestamp, params[:order_ids])

    # The job runs asynchronously, so the file may not exist yet at this point
    send_file "/tmp/#{timestamp}_order.csv"
  end
end

However, it is much cleaner to send back a URL that the client can poll instead. This is also a good point to clean up the OrdersController and move the logic into its own controller.

class CSVExportsController < ApplicationController
  def create
    timestamp = Time.zone.now.to_i.to_s

    GenerateCSVJob.perform_async(timestamp, params[:order_ids])

    respond_to do |format|
      format.json { render json: { url: csv_export_path(timestamp) }, status: :ok }
    end
  end
end
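
One caveat with this identifier: `to_i` gives the time in whole seconds, so two exports requested within the same second would share a file name. One possible way to reduce that risk, while keeping the identifier timestamp-based and made up of digits only, is to append a few random digits. This is a sketch rather than part of the implementation above, and `export_identifier` is a hypothetical helper name.

```ruby
require 'securerandom'

# Hypothetical helper: epoch seconds followed by six random digits.
# Collisions are still possible, just far less likely than with seconds alone.
def export_identifier(now = Time.now)
  format('%d%06d', now.to_i, SecureRandom.random_number(1_000_000))
end

puts export_identifier
```

In the controller, the result of such a helper would take the place of `Time.zone.now.to_i.to_s`.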

Now we need to add an action to check whether the file exists yet or not, and if it does, then send it back to the user.

class CSVExportsController < ApplicationController
  def create
    # ...
  end

  def show
    timestamp = params[:id]
    if File.exist?("/tmp/#{timestamp}_order.csv")
      send_file "/tmp/#{timestamp}_order.csv"
    else
      head :not_found
    end
  end
end

However, I prefer to send a link back to the user which they can then use to download the file whenever they want.

class CSVExportsController < ApplicationController
  def create
    # ...
  end

  def show
    timestamp = params[:id]
    if File.exist?("/tmp/#{timestamp}_order.csv")
      respond_to do |format|
        format.csv { send_file "/tmp/#{timestamp}_order.csv" }
        format.json do
          render json: { file: csv_export_path(timestamp, format: :csv) }
        end
      end
    else
      head :not_found
    end
  end
end
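
Because `params[:id]` comes from the request and is interpolated into a file path, it is worth checking that it really looks like a timestamp before touching the filesystem. Here is a minimal sketch, assuming the identifier is always made up of digits only, as the timestamps in this article are. `EXPORT_DIR` and `export_path_for` are hypothetical names.

```ruby
# Hypothetical constant and helper: only build a path for digit-only identifiers.
EXPORT_DIR = '/tmp'

def export_path_for(id)
  # \A and \z anchor the whole string, so embedded newlines are rejected too
  return nil unless id.to_s.match?(/\A\d+\z/)

  File.join(EXPORT_DIR, "#{id}_order.csv")
end

p export_path_for('1700000000')       # prints "/tmp/1700000000_order.csv"
p export_path_for('../../etc/passwd') # prints nil
```

The show action could then respond with `head :not_found` whenever the helper returns nil, before calling `File.exist?`.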

But of course, none of this will work without the Ajax to marry it all up.

(($) ->
  $ ->
    # Provide some context to the user so they know what is happening after we submit the form
    $('button[data-behavior="generate_csv"]').click ->
      $('button[data-behavior="generate_csv"]').hide()
      $('div[data-behavior="generating_csv"]').show()

    $('form[data-attribute="generate_csv_form"]').on 'ajax:success', (e, data, xhr) ->
      # Uncheck the checkboxes
      $('input:checkbox').prop('checked', false)

      # This will be our URL to check if the file exists
      url = data.url

      # Set up our polling function
      poll = (url) ->
        $.ajax({
          type: "GET",
          dataType: 'json',
          url: url,
          error: ->
            # If the file does not exist yet, try again in five seconds
            setTimeout ( => poll(url) ), 5000
          success: (data, status, xhr) ->
            # Now that the file exists, populate the download link with the download URL and then show it
            $('div[data-behavior="generating_csv"]').hide()
            $('a[data-attribute="download_csv_link"]').attr("href", data.file)
            $('div[data-behavior="download_csv"]').show()
        })

      # Start polling csv_export_path to see if the file exists yet
      poll(url)
) jQuery
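
The poll function retries every five seconds for as long as the request fails, with no upper limit. The same loop with a cap on the number of attempts can be sketched in plain Ruby, for example for a script or a test that waits for an export to appear. `poll_until` and its keyword arguments are hypothetical names, and the file path is only an example.

```ruby
# Hypothetical helper: call the block until it returns a truthy value,
# sleeping between attempts, and give up after max_attempts tries.
def poll_until(interval: 5, max_attempts: 12)
  max_attempts.times do |attempt|
    result = yield
    return result if result

    # No need to wait after the final attempt
    sleep(interval) if attempt < max_attempts - 1
  end
  nil
end

# Usage: wait for a file that a background job is expected to create.
path = '/tmp/1700000000_order.csv'
found = poll_until(interval: 0.1, max_attempts: 3) { File.exist?(path) && path }
puts found ? "Ready: #{found}" : 'Gave up waiting'
```

A cap like this gives the caller a chance to report a failure instead of waiting forever if the job never produces the file.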

The great thing about sending a URL back, rather than just the file, is that we can add other formats to the respond_to block if we ever need to generate other types of files, for example a PDF of order labels for the supplier to print and stick on their orders.

class FileExportsController < ApplicationController
  def create
    # ...
  end

  def show
    timestamp = params[:id]
    file_type = params[:file_type] # csv, pdf, etc.
    path = "/tmp/#{timestamp}_order.#{file_type}"

    if File.exist?(path)
      respond_to do |format|
        format.send(file_type) { send_file path }
        format.json do
          render json: { file: file_export_path(timestamp, format: file_type,
                                                            file_type: file_type) }
        end
      end
    else
      head :not_found
    end
  end
end
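
Since `file_type` comes straight from the request and ends up both in `format.send` and in the file path, it is worth restricting it to known values first. Here is a minimal sketch, assuming CSV and PDF are the only supported formats. `ALLOWED_FILE_TYPES` and `permitted_file_type` are hypothetical names.

```ruby
# Hypothetical allowlist of export formats.
ALLOWED_FILE_TYPES = %w[csv pdf].freeze

# Returns the normalised file type, or nil if it is not on the allowlist.
def permitted_file_type(raw)
  type = raw.to_s.downcase
  ALLOWED_FILE_TYPES.include?(type) ? type : nil
end

p permitted_file_type('CSV')           # prints "csv"
p permitted_file_type('instance_eval') # prints nil
```

The show action could respond with `head :not_found` when this returns nil, so that only the expected format methods are ever called.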

If you know of an alternative approach to achieving the same result, then please let me know in the comments below.

Resources

An example project with all the relevant code can be found under my GitHub account.