Shipping Software in FinTech

Posted by Dennis Ideler & Oleksandr Kruk

Funding Circle is one of Europe’s leading FinTech companies. Our headquarters are based in the City of London, one of the oldest financial centres in the world and still a global leader. Naturally there is a lot of financial regulation with which our business must comply.

In this post we describe how regulation affects our agile development process, and present an open-source tool that helps with some of the challenges faced. Afterwards we discuss some of the implementation details – this section is more technical and is included as optional reading.


Becoming a regulated environment

Funding Circle has seen tremendous growth over the last year. We’ve become an established financial institution and with that comes financial regulation. One of the regulatory bodies in the UK is the Financial Conduct Authority (FCA). We have been operating under interim permission from the FCA since April 2014 and continue to work closely with them to obtain a full operating license.

With regulation comes process. There are a lot more checks in place now than when we were a startup, and rightly so, to protect our clients’ money and data. The increased trust that comes with such process is a win, but if the business isn’t careful, the tradeoff is a loss of speed from all the red tape.

Our goal was to become compliant; an anti-goal was slowing down. To be successful, we have to be able to make quick decisions and deploy software multiple times a day. A non-goal was to become faster – if that happened it would be great, but we wouldn’t actively be trying to achieve it.

Becoming FCA compliant affects virtually every department of our business. For the tech team it meant being able to answer ‘yes’ to a very long list of questions about engineering and security practices. Fortunately for us there were quite a lot of practices which we already satisfied or exceeded. If we can prove during an audit that we follow good practices, we can answer ‘yes’ to those questions. For those where we cannot, we have to explain what we are doing to get to a ‘yes’ answer.

Move fast and safe

One of the sections of the compliance checklist was focused on the governance of software delivery. Our approach was to build a system that would create an inspectable audit trail of the software development lifecycle, highlighting practices that we follow. It would have to withstand a surprise technical audit, meaning we had to create a detailed audit trail of events where we could easily view the state of the world at a specific time in history.

After an inception meeting we had a quick look to see if there were any existing tools but couldn’t find any that met our criteria. After many discussions and diagrams in a cramped cold room we dubbed “The Icebox”, we all decided on the following:

  • Our unit of currency would be Git SHAs. These are the software versions that ultimately get shipped.
  • Track the full delivery process for all released software versions, from story inception to deployment and everything in between.
  • Keep the tool fairly generic and not too tightly coupled to specific services, in case the business decides to switch from GitHub to Bitbucket, for example.
  • The tool would only observe and alert, not lock down – though nothing prevents another process from locking things down based on the information retrieved from our tool.

We open sourced the tool from the beginning, as it would be a lot more challenging to do so after it had been built. It’s called Shipment Tracker1 and you can find it on GitHub.

Use cases

Shipment Tracker brings our software development processes closer to becoming fully compliant, though it’s important to note that it only addresses certain needs. It’s not a silver bullet – there will be other changes to your processes and tools that need to be made in addition to using a tool such as Shipment Tracker.

Feature Reviews

When a feature is ready to be reviewed by the product owner (PO), the developer presents them with a Feature Review. This is a page that has a checklist of various processes, such as relevant tickets and their state, test results, which user acceptance testing (UAT) environment it’s been deployed to, QA approval, and so on.

Feature Review

[fig. a] The Feature Review page as shown for a specific software version that’s under review.

Some of these processes can be optional. For example, not every change needs a QA review.

Each panel gets its information from events. Shipment Tracker is continuously receiving and storing events from many different sources – more on this later.

A Feature Review gives the change-control process more visibility. We can use it when signing off a feature to make sure it’s in a good state. We can also use it during an audit to see which criteria were met at specific points in time.

Because this page is a projection of events, we can show the state of a Feature Review at any time by replaying events up until a given time. If no time is specified, we show the Feature Review at its current state by replaying all events up to the very last one.

Feature Review timestamp

[fig. b] Feature Reviews can be viewed at specific times, down to milliseconds if needed.

On every push to GitHub, Shipment Tracker creates a commit status for the associated Feature Review.

[fig. c] GitHub Commit Statuses by Shipment Tracker show the state of any associated Feature Review(s).

When no associated Feature Review exists, it asks you to create one and link it to at least one (JIRA) ticket. If an associated Feature Review does exist for the commit, it links to it. Once a Feature Review exists for the feature branch, any child commits on the same branch will have a Feature Review auto-created for them.

A detailed description of all the Feature Review panels can be found on Shipment Tracker’s wiki.

Releases

Every repository tracked by Shipment Tracker has its own Releases page where you can see what’s been deployed. A Release is defined by its software version (Git SHA). Only commits made directly on the canonical branch2 are considered to be Releases, as that’s the source of releasable software3. This means commits made on feature branches will not be shown, but their merge commit will be.

Releases

[fig. d] The Releases page gives a region-specific overview of a project’s deploy queue.

An application can be deployed in multiple regions, with each region having a different deploy queue. This is why Releases are per geography – indicated by a flag for each geography. New geographies can easily be added as two-letter country codes4 by setting an environment variable.

The page is divided into two sections: pending releases and deployed releases. Releases are considered pending by default. To be marked as deployed, Shipment Tracker must receive a production deploy event for that software version or for one of its children.

We can see whether or not a release has been approved. Unapproved releases are indicated by a red row and one of a variety of statuses, such as being in a pre-approval state, having had a code change pushed after approval, or simply lacking an associated Feature Review.

Release Statuses

[fig. e] Releases can be flagged if they haven’t gone through the proper approval process.

Deployment Alerts

The Shipment Tracker tool is an observer, not a gatekeeper. Sometimes things won’t follow the happy path.

Unauthorised Releases will trigger a deploy alert at the time of deployment. This can be for a variety of reasons, such as deploying an older software version. A full list can be found on the Shipment Tracker wiki.

Deploy Alert

[fig. f] Risky deploys are alerted to the business via a Slack channel.

Currently such alerts go to a Slack channel, but this can easily be extended to send email alerts. Here the Deployer or Product Owner can justify the release or retrospectively rectify the situation.

The notification links back to the Releases page.

Search

The Shipment Tracker landing page is the Search page. Here you can search for released features that have gone through the Shipment Tracker process. By default it shows releases for the current day.

Search

[fig. g] Tickets that have been released can be found via the Search page.

The query can contain the app name or SHA of a deployed commit, or any keywords from a ticket title or description. The search criteria are weighted. For example, matches against deploy data are the most relevant, so those results will appear first.

The results will show all relevant tickets with their title and a snippet of their description. The bottom of each ticket panel shows deployment information, such as the region, time of deployment, app name, and short SHA.

Implementation

Tech Stack

Shipment Tracker is a typical Rails application with a PostgreSQL database and background jobs. It has an authenticated GUI and API. We use Auth0 for user authentication, but this is easily configurable with environment variables. In fact, because we built Shipment Tracker as an open-source application from the start, it affected a lot of our design decisions. We strove to follow the guidelines of a twelve-factor app wherever possible.

A few areas where Shipment Tracker stands out from other Rails apps:

  • Interaction with Git – there’s quite a bit of this
  • PostgreSQL is used in a variety of ways, including as a NoSQL store (via JSON) and for full-text search
  • Storing state as an event stream, which creates an audit trail with time-travel capability

Event Sourcing

Shipment Tracker relies on the concept of Event Sourcing. In summary, event sourcing treats every change to the system’s state as an event and records each of those changes as an event object. This allows you to reconstruct the state of your application at any given time by replaying the events, assuming you have stored every event since the application began operating.

Shipment Tracker stores all events in PostgreSQL, a relational database management system (RDBMS).

Shipment Tracker

[fig. h] Shipment Tracker keeps track of the software development lifecycle by receiving events from a variety of external and internal sources. For brevity, not all event sources are shown here.

Shipment Tracker uses event sourcing to provide traceability of all the relevant actions taken during the software development and delivery process. The platform provides webhook endpoints to which all the external services report. These webhooks are the source of all the events that the system needs to build its current state. Tracking new data is as easy as creating a new type of event with a specific endpoint.
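As a rough illustration, assuming a route such as post 'events/:type', an event-ingestion endpoint could look like the sketch below. The event class names and lookup table are our assumptions for illustration, not Shipment Tracker’s actual API:

class EventsController < ApplicationController
  skip_before_action :verify_authenticity_token # webhook posts carry no CSRF token

  # Only accept known event types; class names are illustrative.
  EVENT_TYPES = {
    'jira'     => JiraEvent,
    'circleci' => CircleCiEvent,
    'deploy'   => DeployEvent,
  }.freeze

  def create
    event_class = EVENT_TYPES.fetch(params[:type]) { return head :not_found }
    # Store the raw payload; snapshots are built from it later.
    event_class.create!(details: request.request_parameters)
    head :ok
  end
end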

Snapshots

To get the current state of the world, state has to be accumulated by applying all events, from the very beginning until the specific point of interest in time. This is inefficient and doesn’t scale well. The time it takes to apply events grows linearly as the number of events grows.

Instead of storing events and lazily (re)applying them whenever the system has to project information, we can apply events as we receive them and persist the accumulated state so it can be efficiently looked up. This is called “snapshotting” and is common in systems that use event sourcing.

A snapshot is a record of accumulated state from events, for a specific model. For example, events from a CI source such as CircleCI, Travis, or Jenkins would be normalised into build snapshots with these attributes:5

Build entity
[fig. i] Events are normalized into persisted snapshots, which are more efficient to work with.

So any information that is ultimately projected to users is retrieved from snapshots instead of raw events.

Queries – which are used in controllers – collect information to be projected in the views. A Query can have multiple Repositories for data retrieval. Every Repository has a data store, which contains Snapshots for a specific model. Finally, a Snapshot normalizes a raw Event.

Relations
[fig. j] Views project information from Queries. Queries use Repositories for interacting with the persistence layer. Repositories each have a Snapshot store. Snapshots store accumulated state of Events.
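To make the relationship concrete, here is a simplified sketch of how these pieces could fit together. The class and method names below (FeatureReviewQuery, BuildsRepository, BuildSnapshot) are illustrative rather than taken from the actual codebase:

# Illustrative sketch of the Query -> Repository -> Snapshot chain.
class FeatureReviewQuery
  def initialize(sha, at: nil)
    @sha = sha
    @at  = at # optional "time travel" timestamp
    @builds_repository = BuildsRepository.new
  end

  # Collects snapshot data for the view to project.
  def builds
    @builds_repository.builds_for(version: @sha, at: @at)
  end
end

class BuildsRepository
  # Reads accumulated state from a snapshot table rather than raw events.
  def builds_for(version:, at: nil)
    scope = BuildSnapshot.where(version: version) # hypothetical snapshot model
    scope = scope.where('event_created_at <= ?', at) if at
    scope.to_a
  end
end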

Snapshotting is a continuous process. There is an infinite loop that snapshots every new event. To identify new events, there is an EventCount table that keeps track of the last event applied for each snapshot type. Events that have an id greater than the last event applied are considered new and are pending snapshot.

EventCount entity
[fig. k] The EventCounts table tracks the last event applied for each event repository.
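The loop itself can be as simple as the following sketch; helper names such as last_id_for and update_last_id are ours, while the real implementation lives in the Shipment Tracker repository:

# Sketch of the continuous snapshotting loop. Helper names are illustrative.
loop do
  repositories.each do |repository|                  # e.g. builds, tickets, deploys
    last_id = EventCount.last_id_for(repository)     # hypothetical lookup
    BaseEvent.where('id > ?', last_id).order(:id).each do |event|
      repository.apply(event)                        # fold the event into its snapshots
      EventCount.update_last_id(repository, event.id)
    end
  end
  sleep 5                                            # brief pause between passes
end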

Sometimes we need to wipe all snapshots and recreate them. An example is adding a new attribute to a snapshot. We would have to recreate all snapshots of that type so that events are re-applied and the relevant information can be extracted for the new field. In these cases we put the application into maintenance mode. A Capistrano task sets DATA_MAINTENANCE=true, stops the background workers (so they don’t process new incoming events), then restarts the application. A rake task is then automatically triggered to recreate snapshots - but only up to the previous snapshot count. After resnapshotting is completed, maintenance mode is automatically turned off and any new events that were queued are snapshotted as usual.

Base events

All events are derived from an abstract BaseEvent model and stored in the database.

Event entity
[fig. l] Events have a straightforward structure. Event-specific details are stored as JSON. The unstructured data allows for quick and easy modification of payloads from event sources.

The only “custom” field is the details column, which holds all event metadata. The others are common columns provided by Rails. We have the typical timestamp columns, which we need for event sourcing, and we also have a type column, which we need for Single Table Inheritance (STI).

Specific types of events derived from the base event will have their type column set. This allows us to use BaseEvent to query all events. We can also narrow down our queries by using a specific event class, such as JiraEvent.

Note that the details column uses a JSON data type. This makes it very easy for us to accept any type of payload for event details, which is essential as we have a large number of event sources, each very different. Many of these are third-party sources where we have no control over the payload. So we accept anything, but we only pick out what we need from the details payload, and that varies between events.
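A stripped-down sketch of what that hierarchy could look like (the table name and reader methods are assumptions for illustration):

# Sketch of the STI event hierarchy. The events table has id, type,
# details (JSON) and the usual Rails timestamp columns.
class BaseEvent < ActiveRecord::Base
  self.table_name = 'events' # table name assumed for illustration
end

class DeployEvent < BaseEvent
  # Pick only the fields we care about out of the raw deploy payload.
  def app_name
    details['app_name']
  end

  def version
    details['version']
  end
end

BaseEvent.count          # all events, regardless of type
DeployEvent.order(:id)   # only deploy events, narrowed via the type column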

JIRA events

Let’s look at the JIRA case. JIRA allows you to specify an endpoint to be notified of the specific modifications you are interested in. In our case, we are interested in the ticket lifecycle, meaning that whenever a ticket moves between states – e.g. “To Do”, “In Progress”, “Ready for Deploy” – we want to be notified. The reason for this interest in a ticket’s state changes is that it allows us to answer questions like:

  1. “Was the code changed in the context of ticket A after the ticket was approved by the PO?”
  2. “What is the current state of ticket A?”
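For reference, a JIRA “issue updated” webhook payload carries the ticket key and its new status. A minimal sketch of pulling those out (field paths follow Atlassian’s webhook format; the surrounding handler is omitted):

# Sketch: extracting what we need from a JIRA "issue updated" webhook payload.
payload = JSON.parse(request.body.read)

ticket_key = payload['issue']['key']                        # e.g. "FC-123"
new_status = payload['issue']['fields']['status']['name']   # e.g. "Ready for Deploy"

JiraEvent.create!(details: payload)   # the raw payload is what actually gets stored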

In order to explain how we answer those questions, we first need to introduce GitHub push notifications.

GitHub events

Like JIRA, GitHub simplifies our lives by allowing us to set up an endpoint to which it will send the events we are interested in. For answering our questions, we are only interested in receiving a notification event when a code change is pushed to a repository we are tracking. We do not filter by the destination of the commits; pushes to all branches are processed. How do we use the information from GitHub push events? First, let’s look at some key fields in the body of a GitHub push event:

{
  repo: "audited_repo",
  sha: "29efedc",
  parent: "4acde2d"
}

As you probably guessed, the sha key points to the SHA of the commit we are being notified about, while the parent key points to the SHA of that commit’s parent.

With this information we can easily identify which repository was updated, but what about information on branches? In order to be able to quickly query the repository for relevant information, we decided to maintain a local copy of each repository on every deployed Shipment Tracker instance. We’ll give more details on this later, but for now let’s assume that we have an up-to-date local copy of each repository that we track. We use rugged to manage the locally cloned repositories. Rugged is a very useful gem (a wrapper for the C library libgit2) that allows us to query a repository and find out which commit belongs to which branch, which commits were made between commits A and B, whether commit A was merged to the master branch, and other useful information.
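For a flavour of what such queries look like, here is a small sketch using rugged’s public API; the repository path and SHA are placeholders:

require 'rugged'

# Paths and SHAs below are placeholders for illustration.
repo   = Rugged::Repository.new('/var/repos/audited_repo')
commit = repo.rev_parse('29efedc')     # resolves short SHAs

# Has this commit been merged to master (i.e. is it reachable from the tip)?
master_tip = repo.branches['master'].target_id
merged = commit.oid == master_tip || repo.descendant_of?(master_tip, commit.oid)

# Commits between the commit and the master tip: walk from the tip,
# hiding everything already reachable from the commit.
walker = Rugged::Walker.new(repo)
walker.push(master_tip)
walker.hide(commit.oid)
commits_between = walker.map(&:oid)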

Now that we are notified whenever a change is made to the code, and we have an easy way to find out more about a commit by querying the local repository, we can do some interesting things. Whenever we receive a push event from GitHub, Shipment Tracker does the following:

  • Check whether there is any association between the parent commit and a JIRA ticket; if so, associate that same ticket (or tickets) with the newly pushed commit.
  • Check the status of the associated ticket and post it to GitHub as a commit status. The status can be one of those shown in figure c.

These commit statuses on GitHub are a convenient way for developers to check what status the related ticket is in. They can also be used to implement a specific “merge to master” policy, e.g. a developer can only merge the PR if the JIRA ticket is approved (this is not the practice at Funding Circle, given that we prefer to monitor and alert rather than restrict people’s actions and slow down the development process).
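Posting the status itself is a single call to the GitHub Status API. The sketch below uses the octokit gem purely for illustration – we are not asserting that Shipment Tracker itself uses octokit:

require 'octokit'

client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])

# state is one of: 'pending', 'success', 'failure', 'error'.
# Repo, SHA and URL below are placeholders (a full SHA is needed in practice).
client.create_status(
  'my_org/audited_repo',
  '29efedc',
  'pending',
  context:     'shipment-tracker/feature-review',
  description: 'Awaiting approval of the associated Feature Review',
  target_url:  'https://shipment-tracker.example.com/feature_reviews?sha=29efedc',
)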

How do we keep the local copies of git repositories up to date?

In the GitHub events section we mentioned that we keep a copy of every git repository that is under audit by Shipment Tracker. Now let’s look at how we keep those copies up to date. In order to be able to query the most recent state of the code in each local repo, we need to update them frequently. We have experimented with two different solutions:

  1. Fetch the repo updates whenever a web request requires querying the repository.
  2. Run a background job which compares each local repo against its remote every minute and, if the local repo is outdated, fetches the updates for that repo.

Let’s look at each solution closely and analyse the pros and cons. Solution 1 is quite efficient, given that we are not polling the repo all the time; the repo is also only cloned when there is a clear need to query it. The problem starts to manifest when we deal with larger repos which take some time to update, or when we have to clone a repo on first access. The size of the repo significantly impacts the response time of the web request, which is quite frustrating for users.

After living with this solution for a couple of months, we decided that the impact on the user experience was too big and started developing a better version, which resulted in solution 2 mentioned above.

We are still evaluating this second solution and we have already identified a couple of advantages and disadvantages. This solution provides a much better user experience in most cases by keeping all the repositories up to date in the background. All the repositories are scanned in an infinite loop.

It provides quite a fast way of keeping all the repos up to date in most cases, but it also has the associated cost of a tireless background worker. One of the problems we foresee is scaling with the number of tracked repositories.

Imagine we have a thousand repositories to keep track of, and assume that in a given update loop 300 of them are outdated. If an update operation takes on average 2 seconds, then 2 × 300 gives us a delay of 600 s, or 10 minutes, before the last repository is updated. This may be acceptable for small companies which are unlikely to reach those numbers, but when you have dozens of teams working in parallel on a couple of repositories each, you can hit the limit of what is acceptable quite easily. Luckily this is not the worst problem to have, since it can be solved with threading. By using ten threads we can reduce the longest delay to one minute, which looks much more acceptable, and if it’s not, we can always use more threads. In fact, the most reasonable approach to this problem is probably to define the maximum acceptable delay and calculate the number of threads needed to satisfy that criterion.

Let’s look at an example with the same number of outdated repositories, 300, and an average time of 2 s per repo update. Assuming we want a maximum delay of 10 seconds, the number of threads needed would be threads = 300 / (10 / 2) = 60. Sixty is not a low number by any means, but it’s still manageable. In the real world we would of course define a maximum threshold for the number of threads and establish the maximum number of repos for which we can guarantee a delay of at most 10 s.
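As a rough sketch of what a threaded updater could look like, using plain Ruby threads; the helper methods (tracked_repositories, behind_remote?, fetch_updates) and the timings are assumptions, not the production implementation:

THREADS = 10 # chosen from the max-delay calculation above

loop do
  outdated = tracked_repositories.select(&:behind_remote?) # hypothetical helpers
  queue    = Queue.new
  outdated.each { |repo| queue << repo }

  workers = Array.new(THREADS) do
    Thread.new do
      until queue.empty?
        repo = queue.pop(true) rescue break # non-blocking pop; stop when drained
        repo.fetch_updates                  # e.g. a `git fetch` via rugged
      end
    end
  end
  workers.each(&:join)

  sleep 60 # compare local against remote every minute
end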

CircleCI events

In the case of CircleCI, we receive notification events whenever a build finishes. These events allow us to determine whether the build for a given commit was successful. This information can be seen in the “Test Results” section of the Feature Review page pictured above.

This information is relevant for the Product Owner and QA: it is a quick way of confirming that the implementation of this specific feature builds correctly and passes all tests. Another use case, planned for the future, is to allow an automated deploy once the build goes green. A CircleCI build would package the application version and store it in a convenient place; then, with a single click of a button, that version could be deployed.

Deploy events

Deploy events can come from any machine or service responsible for a deploy. In figure h the source is Jenkins CI, but it could also come from CircleCI, Heroku or Capistrano for example.

Depending on the source, the payload can be different. A typical deploy could have the following fields:

{
  app_name: "my_app",
  version: "sha",
  deployed_by: "someone",
  locale: "gb",
  environment: "staging",
  servers: ["a.example.com", "b.example.com"],
}

With that information Shipment Tracker can link a deploy to a specific software version for audited projects.
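For illustration, a deploy script could report such an event with a simple HTTP POST. The endpoint URL below is a placeholder rather than Shipment Tracker’s real API:

# Sketch: reporting a deploy to Shipment Tracker from a deploy script.
require 'net/http'
require 'json'

uri = URI('https://shipment-tracker.example.com/events/deploy') # placeholder URL
payload = {
  app_name:    'my_app',
  version:     `git rev-parse HEAD`.strip,
  deployed_by: ENV['USER'],
  locale:      'gb',
  environment: 'production',
  servers:     ['a.example.com', 'b.example.com']
}

Net::HTTP.post(uri, payload.to_json, 'Content-Type' => 'application/json') # Ruby 2.4+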

Search

The landing page of Shipment Tracker is a search page for Released Tickets, as seen in figure g. For this we use PostgreSQL’s full text search instead of a dedicated full-text search tool such as Elasticsearch.

Postgres’s text search system preprocesses documents6 by reducing them to the tsvector format – a compact representation of the full document. The tsvector data type stores lexemes (normalised words) with associated positions and weights. It’s like a hash map built specifically for weighted search.

Here’s an example of a tsvector for a document:

"'accept':9B,16B,35 'api':6B,28,54 'borrow':8B 'creat':4B 'fulli':13B 'fund':14B 'goal':18 'integr':46 'loan':11B,36 'make':49 'new':53 'one':25 'place':41 'result':32 'stori':21 'test':5B,47 'updat':44"

Some of the words look weird because they’re stemmed. Postgres uses a dictionary to eliminate common words that should not be considered in a search, and to stem words so that different derived forms of the same word will match. For example, “jumping” and “jumped” would be stored as a lexeme like “jump”.

Searching and ranking are performed entirely on the tsvector — the original text only needs to be retrieved when the document has been selected for display to a user.

Search results are ranked. Certain parts of a document (e.g. titles) can be given higher relevance by using weights, so that when there are matches, the most relevant results are shown first.

Consider the following table.

CREATE TABLE released_tickets (
  id integer NOT NULL,
  key character varying,
  title character varying,
  description text,
  tsv tsvector
);

Full text search can be slow. It’s recommended to index the tsvector column(s) to accelerate search.

CREATE INDEX index_released_tickets_on_tsv ON released_tickets
USING gin (tsv);

We use the pg_search gem to extend ActiveRecord (the default ORM for Rails). It provides ActiveRecord callbacks to update the tsv indexes, but some ActiveRecord operations skip callbacks, so it’s much safer to have a trigger defined on the database.

The following trigger sets the weighted tsvector for records that are inserted or updated, keeping them indexed.

CREATE FUNCTION released_tickets_trigger() RETURNS TRIGGER AS $$
BEGIN
  new.tsv :=
    setweight(to_tsvector(coalesce(new.title, '')), 'A') ||
    setweight(to_tsvector(coalesce(new.description, '')), 'B');
  RETURN new;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER released_tickets_tsv_update
BEFORE INSERT OR UPDATE ON released_tickets
FOR EACH ROW EXECUTE PROCEDURE released_tickets_trigger();
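On the Ruby side, the model can then expose a search scope over the maintained tsv column. A rough sketch using pg_search (option names follow the gem’s documentation; the weighting and prefix settings are illustrative choices):

class ReleasedTicket < ActiveRecord::Base
  include PgSearch   # `include PgSearch::Model` in newer versions of the gem

  pg_search_scope :search_for,
                  against: %i(title description),
                  using: {
                    tsearch: {
                      tsvector_column: 'tsv', # use the trigger-maintained column
                      prefix: true,           # match partially typed words
                    },
                  }
end

# ReleasedTicket.search_for('loan api') returns results ordered by rank.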

Data analysis

A pleasant side effect we hadn’t anticipated is that we are now collecting a lot of information about our development process and can answer interesting questions about it, such as:

  1. How many times do we deploy in a day?
  2. How long, on average, does it take from the start of development to the deploy of a feature?
  3. How many unauthorised deploys do we have per month?

This information is extremely valuable when you want to improve or speed up the process, because it lets you see where the bottlenecks are and focus immediately on the pain points without losing too much time on investigation.
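For example, a question like “how many times do we deploy in a day?” can be answered directly from the stored events. A rough sketch, assuming a DeployEvent subclass and the deploy payload fields shown earlier:

# Sketch: production deploys per day over the last 30 days, from stored events.
deploys = DeployEvent.where('created_at > ?', 30.days.ago)
                     .select { |event| event.details['environment'] == 'production' }

deploys.group_by { |event| event.created_at.to_date }
       .map { |date, events| [date, events.size] }
       .to_h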

Testing

Shipment Tracker has various layers of testing.

Unit tests

Most of the tests are unit tests, using the RSpec testing framework for Ruby. In many cases we stub external dependencies of the system under test so the unit is tested in isolation.
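For instance, a unit test might stub out the GitHub client so that no network calls are made. The class under test here is a tiny illustrative one defined in the spec itself, not taken from the actual suite:

require 'rspec'
require 'octokit'

# A tiny illustrative class; the real notifier in Shipment Tracker differs.
class CommitStatusNotifier
  def pending(repo:, sha:)
    Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])
      .create_status(repo, sha, 'pending', context: 'shipment-tracker/feature-review')
  end
end

RSpec.describe CommitStatusNotifier do
  it 'posts a pending status for the given commit without hitting the network' do
    client = instance_double(Octokit::Client)
    allow(Octokit::Client).to receive(:new).and_return(client)
    expect(client).to receive(:create_status)
      .with('org/repo', 'abc123', 'pending', anything)

    described_class.new.pending(repo: 'org/repo', sha: 'abc123')
  end
end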

Acceptance tests

Our acceptance tests are integration tests. Here we almost never stub, except for cases such as communication with a third-party service. These tests use Cucumber (for the Gherkin syntax), Capybara (for simulating user interaction), and RSpec (for test assertions).

Our acceptance tests are usually written in a BDD style, focusing on the business value from a customer’s perspective. The narrative follows the style of a user story. For example:

Feature: Managing Repository Locations
  As an application onboarder
  I want to add a repository from GitHub
  Because I want an audit trail of the application's development

Scenario: Add repositories
  Given I am on the new repository location form
  When I enter a valid uri "ssh://github.com/new_app"
  Then I should see the repository locations:
    | Name      | URI                            |
    | new_app   | ssh://github.com/new_app       |

From a development perspective, the outside-in approach lets us focus on getting the feature working first, and then on making it better while staying confident that it doesn’t break. This prevents us from getting bogged down in design details. Because acceptance tests are expensive, we don’t use them to test exhaustively; instead we only test the business-critical paths, which are usually happy paths.

Linters

After the unit tests and acceptance tests have run, we run a Ruby style linter, RuboCop. It runs last because the changes it prompts are usually cosmetic, and we prefer to see that the code behaves as specified before making cosmetic changes. On projects with a slower test suite, you’ll usually see such linters run first.

Performance tests

We introduced performance tests to benchmark projections before and after snapshotting. They were used to justify implementing snapshots, as event sourcing wasn’t scalable without them – page requests would very quickly start to time out.

These tests do not run by default as they considerably slow down the test suite. The results can be seen in these slides.

Testing with Git

Testing with Git can be difficult. Git repositories need to be built up and torn down on the spot, while keeping the tests easy to understand. To make this easier, we introduced some helpers.

The interface for interacting with a test Git repository is the GitTestRepository class. This class helps us create basic test repositories and handles all the common operations for us, such as creating commits, and creating and merging branches. To reference specific commits in our tests without knowing what their SHAs will be beforehand, we use “pretend commits” – a hash that maps a predetermined human-readable key such as “#abc” to an actual commit.

Tests that required git quickly became very verbose and difficult to read and write, so we introduced a RepositoryBuilder class that builds GitTestRepository instances for us based on an ASCII diagram of a Git tree.

   o-A-B---
  /        \
-o-------o--C---o

Relevant commits are named with a letter. Other commits are simply an ‘o’. The diagrams should be read from left to right. Here is an example of a test that requires git interaction.

Conclusion

Overall the introduction of Shipment Tracker has been a positive experience, and it has been running successfully for almost a year. We are really happy to be able to carry on developing with agility while being regulated, instead of setting up blockers which would frustrate the development team and slow down the company.

There are some usability enhancements to be made so that we keep distractions to a minimum. For example, a typo fix to the README is not a change that affects production; perhaps a commit-message keyword could skip some of the strict checks. There are also some missing features, such as alerting on out-of-hours deploys or unlinking tickets from a Feature Review.

As a nice-to-have, it would also be interesting to create a statistics page that charts the answers to the questions mentioned in the Data analysis section.

We hope this project might be of use to others and would love to see some contributions given its open-source nature. Don’t hesitate to contact us if you are interested in exploring more of Shipment Tracker.


  1. The name “Snowden” won an internal poll but was rejected due to its controversial meaning. 

  2. In most cases the canonical branch will be “origin/master”. 

  3. Container-based deployments using a tool such as Docker are becoming increasingly popular. In such cases we recommend setting the image tag to the commit SHA. This ensures the software version can be easily found when sending a deploy event. 

  4. Country codes as defined in ISO 3166-1 alpha-2. For example, ‘GB’ for Great Britain and ‘US’ for the United States of America. 

  5. Notice the event_created_at field which each snapshot has. It allows the system to easily view history at specific points in time. For example, we may want to show that a build for a software version was failing at the time of ticket approval, but that a later rebuild was passing. Damn flaky tests. 

  6. For text search purposes, a document is the unit of searching. In other words, the text to be searched.