Hurdle 5: How Do I Go About Building an Exchange?
Throwing myself at more unfamiliar territory, I figured the first thing to being able to develop an exchange was to understand how the logic worked behind the scenes.
Understanding Exchange Logic
Let’s recap. An exchange works by people simply selecting the odds they’re willing to bet at and the exchange executes the bet if there exists a bet at those odds or better for the opposing team. If not, the bet gets added to the exchange until a bet arrives that meets that criteria.
The same is true of stock exchanges, replacing bets for the price of a share, so I scoured the web for any tutorials that shared the inner workings. A quick Google search for “how does a stock exchange work?” yielded mainly results of how a stock market worked on too high of a level, discussing the buyer, seller, clearing houses, brokerages, issuing stocks, etc. It wasn’t quite what I was looking for.
What I soon realized though, was that what I was really looking to understand was how an order book worked.
Order Books 101
Definition: An order book is a buy and sell list of a specific security, which includes each order’s number of shares, whether it was a buy or sell order, and the price at which the order was listed.
There are also two different types of orders I’ll discuss that can be executed against an exchange, limit and market orders:
Limit Order: An order to buy or sell a security at a specific price or better. A buy limit order can only be executed at the limit price or lower, and a sell limit order can only be executed at the limit price or higher.
Market Order: An order to buy or sell a security immediately. This type of order guarantees that the order will be executed, but does not guarantee the execution price.
Finally, I was able to find a great tutorial from Udacity, as part of their Machine Learning for Trading course, on how orders are executed against the order book.
Order Book POC
After doing a couple examples by hand, I understood the concept and put it in the context of sports betting, converting the following terms:
- “bid” to “back” or to bet for a team
- “ask” to “lay” or to bet against a team
- “price” to “odds”
- “size” to “amount”
After a few more examples by hand in this context, I decided I should be able to code the concept out locally, executing a few bets against the exchange, using priority queues to order the bets for each team. A few hours later, I finally had a working POC.
Now, it was time to design for this at scale.
Requirements
- Ingest, store, and distribute sporting event updates in near real-time
- Execute and distribute market and limit orders in real-time
- Cancel all non-executed orders in the order book when an event ends
Again, I’ll address these one at a time along with the tradeoffs and complications at each step.
Ingest, store, and distribute sporting event updates in near real-time
In order to have an order book, I needed to relay to the application back end which events were actually available. As a reminder, this would also enable the type-ahead, event pages, and any future features related to the event itself such as real-time score updates.
As mentioned in the diagram previously, I knew the sink of this component would be a kinesis stream, however, the source was completely unknown. Where could I get near real-time sporting event updates? More importantly where could I get these updates without breaking the bank?
Many of you are probably wondering at this point, how come cost is never really strictly defined or mentioned as a constraint?
The Budget
It’s worth addressing here that I worked on this project primarily on weekends and knew it could take anywhere from three to nine months, thus I wanted to keep the monthly budget under $100, as spending more than $1000 on a personal project purely for educational purposes and experience seemed a bit unreasonable. That being said, this was a flexible number and we all know AWS services aren’t free. While AWS Activate does offer up to $100,000 for startups, this wasn’t that.
The Source
At this point I searched the web for vendors and API’s that provided the following:
- Real-time or near real-time event updates
- Full-season schedules
- Free to use or a low-cost subscription
- An easy to use API or SDK
- Supported football, basketball, baseball, and hockey
From there I found a few different options that I looked into:
While some of these had outrageous prices or poor documentation, I decided to go with SportsReference which was not only free, but had great documentation as well as a python library that pulled data from sports-reference.com. Now I just had to decide what this was going to run on.
The Infrastructure
The infrastructure seemed pretty straight forward as I required something that was always up, didn’t require much scaling, and preferably could be containerized. ECS seemed like an obvious solution and I was getting pretty comfortable with it at this point.
The other requirement here was that I was going to need low-latency persistent storage for watermarks or cached events in order to not send duplicate event updates, especially in the case of a container restarting for any reason. I originally was thinking of using DynamoDB for this, but I’ll discuss later what I ended up using as it went hand-in-hand with some requirements for the component that would execute orders against the order book.
The Development
The application itself had to pull data from the source, check the data against the cache, and send data to kinesis. To be honest, I decided to have some fun here and make some optimization that I most definitely didn’t need, but hey, why not. There were two development patterns I really admired that I had seen before:
- Airflow’s concept of Operators and DAG’s, or directed acyclic graphs, to modularize code, especially in orchestration of processing data
- Kinesis or Kafka’s concept of using an intermediary pub/sub or queueing device to separate the processing of one component from another
As such, I decided to play around with “operators” which would execute some defined work, but could also pass messages to one another through a queueing mechanism. This meant processing occurred simultaneously and relatively independently while operating on a single machine via multi-threading.
It certainly wasn’t perfect but it had potential. This kept delays from the SportsReference library to be independent of processing, as well as independent of delays in writes to Kinesis which have their own latency issues. For example, Boto3 works over HTTP and latency builds over multiple PutRecord calls to Kinesis. While this can be reduced with batching calls using PutRecords, it certainly does have its own set of limitations.
Execute and distribute market and limit orders in real-time
This part was essentially the core logic of the entire exchange. Again, it would mock the POC done earlier on order books but would have to be scalable and fast, so let’s check out some requirements:
- Can handle up to thousands of sports games or events
- Can handle up to thousands of orders or bets per second
- Can persist partially executed or non-executed orders in a priority queueing mechanism
Processing
Let’s start with the easy part first. An ECS cluster with Smart Wallet’s Kinesis Consumer and DynamoDb integrity checks could easily scale up to 1 container per Kinesis shard as well as be always up, and maintain long-lived connections which some databases prefer. Based on our partitioning scheme, this would be more than fast enough to handle the expected load.
Winner: ECS & DynamoDb
Storage
Now for an implementation of a priority queue with low latency and persistent storage. There were quite a few options that could work as priorities queues. For example:
- SQL: tables for priority queues with stored procedures
- DynamoDb: tables for priorities queues with a Java SDK Client
- RabbitMQ: priority queue support
Most of these felt like trying to fit a square block in a round hole. Some seemed unnecessarily complex, while others didn’t meet performance requirements or could only handle discrete queueing priority with limited cardinality. In this case, our priority was based on the desired odds of an order, which could essentially be any integer.
But wait — what about Redis? Redis is an open source, in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams. That’s right, it has sorted sets (or priority queues) as a first class citizen! On top of that, it had a simple Python library with great documentation.
Furthermore, AWS offered ElastiCache for Redis as a fully managed service that provided sub-millisecond latency. While data on Redis is not typically persistent, ElastiCache does offer AOF, append only files, that would help mitigate some of my durability concerns.
Winner: ElastiCache for Redis
The Development
Of any part of the system, this was the most critical to get right. I knew it would need clean abstractions to make logic easy to follow as well as strong testing.
With regards to abstractions I might’ve gone a little overboard but in the end it did make development simple. For example, I abstracted an interface over Redis operations to make the priority queue slightly easier to use. Instead of operations like zpopmin
, zpopmax
, zrangebyscore
, I created the following abstraction:
Additionally I could separate the exchange from the handling of incoming orders through the following abstraction:
The other half of this was testing, especially when looking to iterate quickly. Ideally I wanted to be able to test most if not all functionality locally. To do so, I created clients that mocked Kinesis and Redis, but would still be able to perform operations in memory and store data in such a way that it could be accessed in test cases.
Kinesis was relatively straight forward, mocking Boto3’s put_record
call, adding data to a list, no problem. Redis, however, was a little trickier as I had to mock many of the methods with the help of the in-memory Sorted Containers library which included SortedSets
. In the end, the implementation was as follows:
from sortedcontainers import SortedSet
class MockRedis:
def __init__(self):
self.kv_store = {} def zadd(self, name: str, mapping: map):
for data, score in mapping.items():
if name in self.kv_store:
self.kv_store[name].add((data.encode("utf-8"), score))
else:
self.kv_store[name] = SortedSet([(data.encode("utf-8"), score)], key=lambda value: value[1]) def zpopmin(self, name: str):
if name in self.kv_store:
try:
return [self.kv_store[name].pop(index=0)]
except IndexError:
return []
else:
return [] def zpopmax(self, name: str):
if name in self.kv_store:
try:
return [self.kv_store[name].pop(index=-1)]
except IndexError:
return []
else:
return [] def get(self, name: str):
return self.kv_store.get(name) def delete(self, *names):
for name in names:
if name in self.kv_store:
del self.kv_store[name] def set(self, name: str, value: str):
self.kv_store[name] = value.encode("utf-8") def zrangebyscore(self, name: str, min: any, max: any, withscores: bool = False):
if name in self.kv_store:
values = list(self.kv_store[name].irange((None, min), (None, max)))
if withscores:
return values
return list(map(lambda x: x[0], values))
return [] def zrem(self, name: str, *values: any):
if name in self.kv_store:
for value in values:
matching_value = None
for sorted_set_val in self.kv_store[name]:
if value.encode("utf-8") == sorted_set_val[0]:
matching_value = sorted_set_val
break
if matching_value:
self.kv_store[name].remove(matching_value)
Nice! Local development was looking promising. Only issue — I had no way of viewing orders in the order book as ElasticCache Redis didn’t exactly come with a UI.
The Order Book UI
I was looking for something that I could both run locally and in AWS. While I did mock the Redis client, I was also running Redis locally on Docker to get a little closer to the real deal. While scouring the web for docker images for Redis UI’s, I came across Redis Commander, a Redis web management tool written in node.js.
Redis Commander to help view and manage RedisI found a nice HackerNoon article on configuring the Redis Commander docker-compose file for the UI locally, and for the AWS environment I created a task definition for an ECS cluster and routed an ALB to it.
Cancel all non-executed orders in the order book when an event ends
Ok, time for the last requirement. While at this point I could execute bets as well as store them in the order book, I needed some way of draining all bets for an event that ended, as the back end application would have to communicate updates in bet statuses to users.
The solution — a poison pill. A poison pill is a design pattern using a known predefined data item that allows to provide graceful shutdown for separate distributed consumption process. In other words, it’s a message that typically shuts down a component of a system. In this case, I wasn’t technically using a poison pill but I was using a predefined data item on the incoming orders Kinesis stream that communicated to the component executing orders to dequeue the order book until empty, send the orders to Kinesis as cancelled, and mark the event as closed.
This meant that when the component integrating events from the SportReference API noticed an event had completed, it not only communicated it out over Kinesis, but also dropped it in the incoming orders Kinesis stream.
The Architecture
At this point, we’ve discussed a lot of individual design decisions, but this design puts it all in one place, including the communication between components. Note that I’ve also thrown some Kinesis Firehoses in there with S3 buckets so I could view communication between the back end and exchange via Athena, a serverless interactive query service for analyzing data in S3 using standard SQL.
0 Comments