module Roda::RodaPlugins::MailProcessor

  1. lib/roda/plugins/mail_processor.rb

The mail_processor plugin allows your Roda application to process mail using a routing tree. Quick example:

class MailProcessor < Roda
  plugin :mail_processor

  route do |r|
    # Match based on the To header, extracting the ticket_id
    r.to(/ticket\+(\d+)@example\.com/) do |ticket_id|
      if ticket = Ticket[ticket_id.to_i]
        # Mark the mail as handled if there is a valid ticket associated
        r.handle do
          ticket.add_note(text: mail_text, from: from)
        end
      end
    end

    # Match based on the To or CC header
    r.rcpt "post@example.com" do
      # Match based on the body, capturing the post id and tag
      r.body(/^Post: (\d+)-(\w+)/) do |post_id, tag|
        unhandled_mail("no matching post") unless post = Post[post_id.to_i]
        unhandled_mail("tag doesn't match for post") unless post.tag == tag

        # Match based on APPROVE somewhere in the mail text,
        # marking the mail as handled
        r.handle_text /\bAPPROVE\b/i do
          post.approve!(from)
        end

        # Match based on DENY somewhere in the mail text,
        # marking the mail as handled
        r.handle_text /\bDENY\b/i do
          post.deny!(from)
        end
      end
    end
  end
end

Processing Mail

To submit a mail for processing via the mail_processor routing tree, call the process_mail method with a Mail instance:

MailProcessor.process_mail(Mail.new do
  # ...
end)

You can use this to process mail messages from the filesystem:

MailProcessor.process_mail(Mail.read('/path/to/message.eml'))

If you have a service that delivers mail via an HTTP POST request (for realtime processing), you can have your web routes convert the web request into a Mail instance and then call process_mail:

r.post "email" do
  # check request is submitted by trusted sender

  # Use only one of the following two calls, depending on how the
  # service delivers the mail.

  # If request body is the raw mail body
  r.body.rewind
  MailProcessor.process_mail(Mail.new(r.body.read))

  # If request body is in a parameter named content
  MailProcessor.process_mail(Mail.new(r.params['content']))

  # If the HTTP request requires a specific response status code (such as 204)
  response.status = 204

  nil
end

Note that when receiving messages via HTTP, you need to make sure you check that the request is trusted. How to do this depends on the delivery service, but could involve using HTTP basic authentication, checking for valid API tokens, or checking that a message includes a signature/hash that matches the expected value.
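As one illustration of the API token option, here is a minimal sketch that uses only Ruby's standard library. The names EXPECTED_TOKEN, secure_equal?, and trusted_request? are hypothetical and not part of the plugin, and where the token arrives (a header, a parameter) depends on the delivery service.

```ruby
require 'digest'

# Hypothetical shared secret; in practice, load it from configuration.
EXPECTED_TOKEN = 'replace-with-a-long-random-value'

# Compare two strings without revealing, through timing, where they differ.
# Hashing both sides first gives them the same length.
def secure_equal?(a, b)
  da = Digest::SHA256.digest(a)
  db = Digest::SHA256.digest(b)
  diff = 0
  da.bytes.zip(db.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# token is whatever value the delivery service sent with the request.
def trusted_request?(token)
  token.is_a?(String) && secure_equal?(token, EXPECTED_TOKEN)
end
```

A web route could call a check like this before converting the request into a Mail instance, and return an error status when it fails.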

If you have set up a default retriever_method for Mail, you can call process_mailbox, which will process all mail in the given mailbox (using Mail.find_and_delete):

MailProcessor.process_mailbox

You can also use a :retriever option to provide a specific retriever:

MailProcessor.process_mailbox(retriever: Mail::POP3.new)

Routing Mail

The mail_processor plugin handles routing similarly to Roda’s default routing for web requests. However, because mail processing may not return a result, the plugin uses a more explicit approach to decide whether a message has been handled. If the r.handle method is called during routing, the mail is considered handled; otherwise it is considered not handled. The unhandled_mail method can be called at any point to stop routing and consider the mail as not handled (even if inside an r.handle block).

Here are the mail routing methods and what they use for matching:

from: match on the mail From address
to: match on the mail To address
cc: match on the mail CC address
rcpt: match on the mail recipients (To and CC addresses by default)
subject: match on the mail subject
body: match on the mail body
text: match on text extracted from the message (same as mail body by default)
header: match on a mail header

All of these routing methods accept a single argument, except for r.header, which can take two arguments.

Each of these routing methods also has a r.handle_* method (e.g. r.handle_from), which will call r.handle implicitly to mark the mail as handled if the routing method matches and control is passed to the block.

The address matchers (from, to, cc, rcpt) perform a case-insensitive match if given a string or array of strings, and a regular regexp match if given a regexp.

The content matchers (subject, body, text) perform a case-sensitive substring search if given a string or array of strings, and a regular regexp match if given a regexp.

The header matcher should be called with a key and an optional value. If the matcher is called with a key and not a value, it matches if a header matching the key is present in the message, yielding the header value. If the matcher is called with a key and a value, it matches if a header matching the key is present and the header value matches the value given, using the same criteria as the content matchers.

In all cases for matchers, if a string is given and matches, the match block is called without arguments. If an array of strings is given, and one of the strings matches, the match block is called with the matching string as its argument. If a regexp is given, the match block is called with the regexp captures. This is the same behavior as Roda’s general string, array, and regexp matchers.
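The matching rules above can be sketched in plain Ruby. This is an illustrative re-implementation under simplifying assumptions, not the plugin's code: match_args and string_match? are hypothetical names, and the address case compares against a single address rather than a list of addresses.

```ruby
# Returns nil when there is no match, otherwise the arguments that would
# be yielded to the match block.
def match_args(matcher, value, case_insensitive: false)
  case matcher
  when String
    # String: the block receives no arguments
    [] if string_match?(matcher, value, case_insensitive)
  when Array
    # Array of strings: the block receives the string that matched
    found = matcher.find { |s| string_match?(s, value, case_insensitive) }
    [found] if found
  when Regexp
    # Regexp: the block receives the captures
    md = matcher.match(value)
    md.captures if md
  end
end

def string_match?(string, value, case_insensitive)
  if case_insensitive
    # Address matchers: comparison that ignores case
    string.casecmp?(value)
  else
    # Content matchers: case-sensitive substring search
    value.include?(string)
  end
end
```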

Recipient-Specific Routing

To allow splitting up the mail processor routing tree based on recipients, you can use the rcpt class method, which takes any number of string or regexp arguments for recipient addresses, and a block to handle the routing for those addresses instead of using the default routing:

MailProcessor.rcpt('a@example.com') do |r|
  r.text /Post: (\d+)-(\h+)/ do |post_id, hmac|
    unhandled_mail("no matching Post") unless post = Post[post_id.to_i]
    unhandled_mail("HMAC doesn't match for post") unless hmac == post.hmac_for_address(from.first)

    r.handle_text 'APPROVE' do
      post.approved_by(from)
    end

    r.handle_text 'DENY' do
      post.denied_by(from)
    end
  end
end

The rcpt class method does not mark the messages as handled, because in most cases you will need to do additional matching to extract the information necessary to handle the mail. You will need to call r.handle or a similar method inside the block to mark the mail as handled.

Matching on strings provided to the rcpt class method is an O(1) operation as the strings are stored lowercase in a hash. Matching on regexps provided to the rcpt class method is an O(n) operation on the number of regexps.
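The lookup cost described above can be illustrated with a small stand-alone sketch. RcptTable is a hypothetical class written for this example, not the plugin's implementation, and checking strings before regexps is an assumption made here.

```ruby
class RcptTable
  def initialize
    @strings = {} # lowercased address => handler
    @regexps = [] # [regexp, handler] pairs, checked in order
  end

  def add(*addresses, &handler)
    addresses.each do |address|
      case address
      when String then @strings[address.downcase] = handler
      when Regexp then @regexps << [address, handler]
      else raise ArgumentError, "unsupported recipient: #{address.inspect}"
      end
    end
  end

  # Returns the handler for the address, or nil when nothing matches
  # (the point at which default routing would take over).
  def lookup(address)
    # O(1): a single hash lookup on the lowercased address
    handler = @strings[address.downcase]
    return handler if handler

    # O(n): try each regexp until one matches
    _, handler = @regexps.find { |regexp, _| regexp.match?(address) }
    handler
  end
end
```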

If you would like to break up the routing tree using something other than the recipient address, you can use the multi_route plugin.

Hooks

The mail_processor plugin offers hooks for processing mail.

For mail that is handled successfully, you can use the handled_mail hook:

MailProcessor.handled_mail do
  # nothing by default
end

For mail that is not handled successfully, either because r.handle was not called during routing or because the unhandled_mail method was called explicitly, you can use the unhandled_mail hook.

The default is to reraise the UnhandledMail exception that was raised during routing, so that calling code will not be able to ignore errors when processing mail. However, you may want to save such mails to a special location or forward them as attachments for manual review, and the unhandled_mail hook allows you to do that:

MailProcessor.unhandled_mail do
  # raise by default

  # Forward the mail as an attachment to an admin
  m = Mail.new
  m.to 'admin@example.com'
  m.subject '[APP] Unhandled Received Email'
  m.add_file(filename: 'message.eml', content: mail.encoded)
  m.deliver
end

Finally, for all processed mail, regardless of whether it was handled or not, there is an after_mail hook, which can be used to archive all processed mail:

MailProcessor.after_mail do
  # nothing by default

  # Add it to a received_mail table using Sequel
  DB[:received_mail].insert(:message=>mail.encoded)
end

The after_mail hook is called after the handled_mail or unhandled_mail hook is called, even if routing, the handled_mail hook, or the unhandled_mail hook raises an exception. The handled_mail and unhandled_mail hooks are not called if an exception is raised during routing (other than for UnhandledMail exceptions).
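The ordering of the hooks can be modeled with Ruby's rescue/else/ensure. The following is a simplified sketch of that control flow, not the plugin's source: the process method, the hooks hash, and the local UnhandledMail class are stand-ins introduced for illustration.

```ruby
# Stand-in for the plugin's UnhandledMail exception
class UnhandledMail < StandardError; end

# Runs the routing block, then the hooks in the order described above.
# hooks is a hash of callables keyed by :handled, :unhandled, and :after.
# The block should return true if the mail was handled.
def process(hooks)
  handled = yield
  raise UnhandledMail, 'mail was not handled' unless handled
rescue UnhandledMail => e
  # Only UnhandledMail reaches the unhandled hook
  hooks[:unhandled].call(e)
else
  # Runs only when routing finished without raising
  hooks[:handled].call
ensure
  # Always runs, even when routing or a hook raised
  hooks[:after].call
end
```

Any other exception raised by the block skips both the handled and unhandled hooks, still runs the after hook, and then propagates to the caller.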

Extracting Text from Mail

The most common use of the mail_processor plugin is to handle replies to mails sent out by the application, so that recipients can reply to mail to make changes without having to access the application directly. When handling replies, it is common to want to extract only the text of the reply, and ignore the text of the message that was replied to. Because there is no consistent way to format replies in mail, various approaches have evolved to do this, with some gems devoted to extracting the reply text from a message.

The mail_processor plugin does not choose any particular approach for extracting text from mail, but it includes the ability to configure how to do that via the mail_text class method. This method affects the r.text match method, as well as the mail_text instance method. By default, the decoded body of the mail is used as the mail text.

MailProcessor.mail_text do
  # mail.body.decoded by default

  # Alternatively, use a gem to extract only the reply text (pick one):

  # https://github.com/github/email_reply_parser
  EmailReplyParser.parse_reply(mail.body.decoded)

  # https://github.com/fiedl/extended_email_reply_parser
  mail.parse
end
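To show why reply extraction matters, here is a deliberately naive sketch that needs no gem. The naive_reply_text method is hypothetical and only recognizes two common conventions ("> " quoting and an "On ... wrote:" attribution line); the gems linked above handle far more cases.

```ruby
# Keeps the lines before the first quoted line or attribution line.
def naive_reply_text(body)
  lines = body.lines.take_while do |line|
    !line.start_with?('>') && !line.match?(/\AOn .+ wrote:\s*\z/)
  end
  lines.join.strip
end

# Possible use with the plugin (assuming the mail_text class method
# described above):
#
#   MailProcessor.mail_text do
#     naive_reply_text(mail.body.decoded)
#   end
```

Without a step like this, a quoted original that mentions both APPROVE and DENY would match whichever text route is tried first, regardless of what the sender actually wrote.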

Security

Note that due to the way mail delivery works via SMTP, the actual sender and recipient of the mail (the SMTP envelope MAIL FROM and RCPT TO addresses) may not match the sender and receiver embedded in the message. Because mail_processor routing relies on parsing the mail, it does not have access to the actual sender and recipient used at the SMTP level, unless a mail server adds that information as a header to the mail (and clears any existing header to prevent spoofing). Keep that in mind when you are setting up your mail routes. If you have set up your mail server to add the SMTP RCPT TO information to a header, you may want to only consider that header when looking for the recipients of the message, instead of looking at the To and CC headers. You can override the default behavior for determining the recipients (this will affect the rcpt class method, r.rcpt match method, and mail_recipients instance method):

MailProcessor.mail_recipients do
  # Assuming the information is in the X-SMTP-To header
  Array(header['X-SMTP-To'].decoded)
end

Also note that unlike when handling web requests where you can rely on storing authentication information in the session, when processing mail, you should manually authenticate each message, as email is trivially forged. One way to do this is assigning and storing a unique identifier when sending each message, and checking for a matching identifier when receiving a response. Another option is including a computable authentication code (e.g. HMAC) in the message, and then when receiving a response, recomputing the authentication code and seeing if it matches the authentication code in the message. The unique identifier approach requires storing a large number of identifiers, but allows you to remove the identifier after a reply is received (to ensure only one response is handled). The authentication code approach does not require additional storage, but does not allow you to ensure only a single response is handled.
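Both approaches can be sketched with Ruby's standard library. Everything named here (SECRET, ReplyTokens, hmac_for) is hypothetical, and the in-memory hash stands in for whatever persistent storage an application would use.

```ruby
require 'openssl'
require 'securerandom'

# Hypothetical secret; load it from configuration in a real application.
SECRET = 'replace-with-a-long-random-secret'

# Approach 1: a unique identifier, stored when sending and removed on
# first use so that only one reply is handled.
class ReplyTokens
  def initialize
    @tokens = {} # token => post id; a database table in practice
  end

  def issue(post_id)
    token = SecureRandom.hex(16)
    @tokens[token] = post_id
    token
  end

  # Returns the post id once, then nil for later replies with the same token.
  def redeem(token)
    @tokens.delete(token)
  end
end

# Approach 2: an authentication code that can be recomputed from the
# message contents, so nothing needs to be stored.
def hmac_for(post_id, address)
  OpenSSL::HMAC.hexdigest('SHA256', SECRET, "#{post_id}:#{address.downcase}")
end
```

When checking a received code against the recomputed one, a constant-time comparison is preferable to == in production code.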

Avoiding Mail Loops

If processing the mail results in sending out additional mail, be careful not to send a response to the sender of the email. If the sender has an auto-responder, you can otherwise end up with a mail loop, where every mail you send triggers a response, which you then process and respond to in turn.
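One additional safeguard, not provided by the plugin, is to skip responding when the incoming message looks automated. The sketch below checks the Auto-Submitted header defined in RFC 3834 and the conventional Precedence header. The automated_mail? method and the plain hash of headers are assumptions made for illustration, and not every auto-responder sets these headers.

```ruby
# headers: hash of header name => value (hypothetical shape)
def automated_mail?(headers)
  # Header names are case-insensitive, so normalize names and values
  normalized = headers.each_with_object({}) do |(name, value), h|
    h[name.to_s.downcase] = value.to_s.strip.downcase
  end

  # RFC 3834: any Auto-Submitted value other than "no" marks automated mail
  auto_submitted = normalized['auto-submitted']
  return true if auto_submitted && auto_submitted != 'no'

  # Conventional markers for bulk and mailing list mail
  %w[bulk junk list].include?(normalized['precedence'])
end
```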