Building it yourself: How to implement metered billing

Kshitij Grover

In this post, we’ll work through how we might implement metered billing using the example of a hypothetical image processing platform called PixelMate. Throughout this post, we’ll try to make as few assumptions as possible — the intent is not to construct an overly complex example, but to map out what’s required to build a system that will accurately bill our customers.

Suppose that PixelMate offers the following off-the-shelf plans that users can sign up for:

  • A monthly plan at $500/mo, with 1000 included images. After 1000 images, you’re charged $0.01/image for the first 100, and $0.008/image thereafter.
  • A more generous annual plan with an upfront yearly payment of $5000, with 1500 included images each month. After the included monthly images, we’ll charge $0.005 per image.

In order to charge customers today, we use Stripe Invoicing. PixelMate’s BizOps team goes through a monthly close process where the charges for each of our 80 customers are tallied and invoices are sent out by the 4th day of the month. Because PixelMate is growing, this process has started to get painful, so we’re off to make it better with some automation.

Existing infrastructure: a simple starting point

Let’s assume a simple starting point for the lifecycle of a request to PixelMate.

  1. The system receives an API request on application servers from an end user, with an API token. The request contains the image that needs to be processed, and we use the API token to identify the customer.
  2. Images are uploaded to S3, processed by async workers, and the final result is stored in our data model. 
  3. In order to notify the end user that processing is complete, the async worker logs an event to a Kinesis Firehose stream, which is then consumed by a system which fires off a completion notification via email.

Using this existing infrastructure, we’ll sketch out how to architect a full-featured metering and billing system relying on Stripe Billing.

Sketching out our events pipeline

We’ll start off by storing the image events data so we can charge for it, taking advantage of our existing Firehose stream (though of course we could be using a different streaming solution like Kafka). Because we care about the durability and accuracy of billing data, we’ll route the Firehose stream into a new table in our data warehouse, a Redshift cluster. We choose Redshift because it’s already being used for other OLAP-style queries by our data science team, and because we know we’ll eventually want to store a lot of events in this table.

This table will be simple, with a row per event in our Firehose stream. The resulting row will minimally include a timestamp to represent the request time, a request_id to uniquely identify the request, and a customer_id to identify which one of our customers took action.

	{
		"message_timestamp_ms": 1520032588000,
		"timestamp": "2018-03-02T00:00:00.000Z",
		"request_id": "5d0b0000-0000-0000-0000",
		"customer_id": "cus_2Wm0M3xPjg",
		"image_id": "im_12340101"
	}

Since Firehose (with the appropriate retries) should offer us an at-least-once delivery guarantee to Redshift, it’s pretty safe to assume that we won’t normally have any missing events. That being said, we’ll need to find some way to de-duplicate this data in Redshift, since there may be cases where an INSERT is performed multiple times. To do this, we’ll schedule a daily query that swaps out the existing table with a new one built with a SELECT DISTINCT keyed on request ID. Even with this, duplicates that arrive between runs could still lead to a double charge, and we can’t risk unhappy customers, so we’ll also add a DISTINCT to each usage query to make sure we can’t accidentally double-count one of these events.
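To make the two layers of de-duplication concrete, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for Redshift so that it can run anywhere; the table name image_events, its columns, and both function names are assumptions for illustration, not part of PixelMate's actual schema.

```python
import sqlite3

# SQLite stands in for Redshift; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_events "
    "(request_id TEXT, customer_id TEXT, image_id TEXT, ts TEXT)"
)
conn.executemany(
    "INSERT INTO image_events VALUES (?, ?, ?, ?)",
    [
        ("req-1", "cus_A", "im_1", "2018-03-02T00:00:00Z"),
        ("req-1", "cus_A", "im_1", "2018-03-02T00:00:00Z"),  # duplicate delivery
        ("req-2", "cus_A", "im_2", "2018-03-02T00:05:00Z"),
        ("req-3", "cus_B", "im_3", "2018-03-02T00:07:00Z"),
    ],
)


def rebuild_deduplicated(conn):
    """Daily job: build a copy with one row per request_id, then swap it in."""
    conn.execute(
        "CREATE TABLE image_events_dedup AS "
        "SELECT request_id, MIN(customer_id) AS customer_id, "
        "MIN(image_id) AS image_id, MIN(ts) AS ts "
        "FROM image_events GROUP BY request_id"
    )
    conn.execute("DROP TABLE image_events")
    conn.execute("ALTER TABLE image_events_dedup RENAME TO image_events")
    conn.commit()


def usage_count(conn, customer_id, start, end):
    """Count billable events in [start, end), safe even if duplicates remain."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT request_id) FROM image_events "
        "WHERE customer_id = ? AND ts >= ? AND ts < ?",
        (customer_id, start, end),
    ).fetchone()
    return row[0]
```

The COUNT(DISTINCT request_id) in the usage query is what protects us between daily rebuilds: it returns the same number whether or not the swap has run yet.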

Products and prices in Stripe Billing

Since we already use Stripe’s invoices, let’s set up Stripe Billing to handle the automation. Stripe Billing feels like a natural tool to reach for, since Stripe is already being used in our organization and their documentation advertises what looks like comprehensive support for the usage-based billing use case. When a user signs up for PixelMate, we’ll receive a signup request and the plan that they’re intending to use; this should then create a subscription in Stripe Billing so we start charging the customer based on their plan.

Accordingly, we’ll create the following in Stripe:

  • A product called PixelMate Base Fee that is associated with two prices: one that charges $500 on a monthly cadence, and another that charges $5,000 on an annual cadence.
  • A product called PixelMate Overage that is set up to use Metered Billing. Here, again, we’ll set up two prices with the appropriate tier rates. When configuring this product, we have to make a coupled decision: how we want to determine the final usage based on the records we report. We’ll come back to this in a second!
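As a rough sketch of what the two overage prices might look like, the snippet below builds the parameter dictionaries we would pass when creating each price. Nothing is sent to Stripe here. The field names (billing_scheme, tiers_mode, tiers, unit_amount_decimal, and the recurring options) reflect our understanding of Stripe's Prices API for metered prices at the time of writing and should be checked against the current documentation; the helper name and the product ID are placeholders.

```python
def overage_price_params(product_id, tiers, aggregate_usage):
    """Build parameters for a metered price with graduated tiers.

    `tiers` is a list of (up_to, cents_per_image) pairs, where up_to=None
    marks the final open-ended tier. `aggregate_usage` is the "coupled
    decision": how reported usage records are combined into one quantity.
    """
    return {
        "product": product_id,
        "currency": "usd",
        "billing_scheme": "tiered",
        "tiers_mode": "graduated",
        "recurring": {
            "interval": "month",
            "usage_type": "metered",
            "aggregate_usage": aggregate_usage,
        },
        "tiers": [
            {
                "up_to": "inf" if up_to is None else up_to,
                "unit_amount_decimal": cents,  # decimal string, in cents
            }
            for up_to, cents in tiers
        ],
    }


# Monthly plan: 1000 included, next 100 at $0.01, then $0.008 per image.
MONTHLY_OVERAGE = overage_price_params(
    "prod_overage_placeholder",
    [(1000, "0"), (1100, "1"), (None, "0.8")],
    aggregate_usage="last_during_period",
)

# Annual plan: 1500 included each month, then $0.005 per image.
ANNUAL_OVERAGE = overage_price_params(
    "prod_overage_placeholder",
    [(1500, "0"), (None, "0.5")],
    aggregate_usage="last_during_period",
)
```

The included images are modeled as a first tier priced at zero, so the same price object covers both the allowance and the overage rates.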

For monthly subscribers, we'll subscribe to both the base fee and the overage, setting up the invoices with both line items — we’re all set in terms of billing behavior for that option. 

Unfortunately, the annual case isn’t straightforward: Stripe Billing won’t let us set up an invoice with both an annual and monthly fee — different cadences aren’t supported. To get around this, we’ll have to create two Stripe subscriptions for each customer, one to each product. This isn’t ideal, since our customers will see two separate invoices, and failures could mean that these subscription lifecycles aren’t always aligned. Nevertheless, we’re set up to start charging both the platform fee and overage.

Connecting usage to Stripe Billing

As images are processed in PixelMate, we want to make sure we’re charging each customer what they owe. Since we’ve set up a metered price, the recommended approach is to use a UsageRecord (Stripe Docs) to report either an incremental datapoint or a cumulative value.

It’s easy enough for us to execute a query over Redshift for a given billing period, so let’s choose to report the cumulative value; this means we’ll set the usage to the latest usage record in the period. In particular, here’s a brief overview of how our reporting pipeline will work:

  • Using a CloudWatch event (i.e. distributed cron), every hour, we’ll trigger the run of a new service called the StripeUsageReporter.
  • We’ll loop through each of our customers and query for their Stripe subscription based on information in our data model.
  • We’ll filter to the Stripe subscription that has the relevant metered price, and find the billing period bounds corresponding to the metered line item on the next invoice.
  • We’ll execute an aggregate query in Redshift with the period bounds; the result is the cumulative usage in the period so far, which is the value we want to report.
  • We’ll make an update call to Stripe with a UsageRecord containing that value. In order to make sure our request is idempotent, we’ll use an idempotency key which is a function of the timeframe over which we’re reporting.
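The last step of the pipeline above can be sketched as follows. This only builds the request we would send; it does not call Stripe. The function names and the dictionary layout are our own, while quantity, timestamp, and action (with the value "set" for cumulative reporting) are the usage record parameters as we understand them from Stripe's documentation.

```python
import hashlib
from datetime import datetime, timezone


def hour_window_start(now):
    """Truncate a timezone-aware datetime to the start of its UTC hour."""
    return now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def idempotency_key(subscription_item_id, window_start):
    """Derive a stable key from the subscription item and reporting window."""
    raw = f"{subscription_item_id}:{window_start.isoformat()}"
    return "usage-" + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


def usage_record_request(subscription_item_id, cumulative_quantity, now):
    """Assemble one hourly report for a single metered subscription item."""
    window = hour_window_start(now)
    return {
        "subscription_item": subscription_item_id,
        "params": {
            "quantity": cumulative_quantity,  # cumulative usage so far
            "timestamp": int(now.timestamp()),
            "action": "set",  # overwrite rather than increment
        },
        "idempotency_key": idempotency_key(subscription_item_id, window),
    }
```

Two runs within the same hour produce the same key, so a retried run can't create a second record for that window. One caveat, as far as we know: Stripe rejects a reused idempotency key when the request parameters differ, so a retry should resend the originally computed body rather than a freshly queried quantity.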

As we’re evaluating this design, there’s a troubling edge case: what happens if we miss a reporting window? This could happen because our containers are down for an extended period of time, CloudWatch has an outage, events aren’t being delivered to Redshift, or there’s simply a bug in our reporter. Stripe does not allow reporting a usage record for a subscription item once the billing period has already passed, so any of these failures would lead to missed usage.

Even barring cases where our usage reporter is down for unforeseen reasons, this can happen in the normal course of business. For example, when our usage reporter is active around the end of the month, it may very well fail to report usage for a subset of customers in the previous period, because the month boundary has gone by. It’s possible we may be able to report it for the following month instead, but there will be plenty of downstream issues here (we’ll lose the ability to audit since our invoices don’t match Redshift timestamps, our revenue recognition will be wrong, and so on). 

When discussing this edge case with our product manager and finance analyst, we decide to prioritize the guiding principles of invoice accuracy and invoice clarity. We should ensure that users pay exactly for what they consume, and there is full transparency and clarity on what’s being billed. We collectively decide that the edge case around billing period boundaries is not one we can accept from a business perspective, and go back to the drawing board for a working solution to report usage to Stripe Billing.

Stripe Billing setup (take 2)

Instead of using the UsageRecord reporting APIs and a metered price, what if we just rely on Stripe to handle the invoice generation, and have our system take care of the usage entirely? Luckily, we can take advantage of the fact that Stripe doesn’t immediately issue our invoices. 

We can build a new webhook consumer which listens for the invoice.created webhook every time a billing period ends, and edit the draft invoice to include our usage item. By default, invoices will issue an hour after they’re created but we can give ourselves some more breathing room by turning off the auto-advance behavior. We now can make sure that our usage is added to the invoice before it’s sent.
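A minimal sketch of the first thing the webhook consumer has to do: decide whether an incoming event is a draft subscription invoice we should add usage to, and pull out the identifiers and period bounds. The function name and the sample event are hypothetical. Treating the invoice's period_start and period_end as the usage window is an assumption that should be verified against Stripe's documentation for subscription invoices.

```python
def usage_window_for_event(event):
    """Return (invoice_id, customer_id, period_start, period_end) for a draft
    subscription invoice announced by invoice.created, or None otherwise."""
    if event.get("type") != "invoice.created":
        return None
    invoice = event["data"]["object"]
    # Skip one-off invoices and anything that is no longer editable.
    if invoice.get("status") != "draft" or not invoice.get("subscription"):
        return None
    return (
        invoice["id"],
        invoice["customer"],
        invoice["period_start"],
        invoice["period_end"],
    )


# Hypothetical event payload, trimmed to the fields used above.
SAMPLE_EVENT = {
    "type": "invoice.created",
    "data": {
        "object": {
            "id": "in_placeholder",
            "customer": "cus_2Wm0M3xPjg",
            "subscription": "sub_placeholder",
            "status": "draft",
            "period_start": 1517443200,  # 2018-02-01T00:00:00Z
            "period_end": 1519862400,    # 2018-03-01T00:00:00Z
        }
    },
}
```

With the window in hand, the consumer can run the Redshift aggregate for that customer and period, then attach the resulting line items to the draft invoice before it is finalized.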

Although this does work, we’ve now taken on a lot more responsibility: we can no longer rely on Stripe’s graduated price modeling, because our system is solely responsible for generating the new invoice line item which should include both the quantity and the amount we want to charge. This has two consequences:

  • Our PixelMate data model is now the source of truth for the customer’s tiering price configuration, since Stripe just manages the base monthly fee.
  • The webhook consumer that we build has to now implement the tiering logic to translate a quantity from Redshift into one or more line items that need to be added to Stripe. Even with a simple multi-tier graduated price, this is error-prone and sensitive business logic!
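To show the kind of tiering logic the webhook consumer now owns, here is a small sketch of a graduated price calculation using the monthly plan's numbers from the start of this post. The function name and the shape of the returned line items are our own choices; Decimal is used so that fractional-cent rates like $0.008 don't pick up floating point error.

```python
from decimal import Decimal

# (upper bound of tier, price per image); None marks the open-ended tier.
MONTHLY_TIERS = [
    (1000, Decimal("0")),      # included images
    (1100, Decimal("0.01")),   # next 100 images
    (None, Decimal("0.008")),  # everything beyond that
]


def graduated_line_items(quantity, tiers):
    """Split a usage quantity across graduated tiers, one item per paid tier."""
    items = []
    lower = 0
    for upper, unit_price in tiers:
        if quantity <= lower:
            break  # usage never reached this tier
        in_tier = (quantity if upper is None else min(quantity, upper)) - lower
        if unit_price > 0:
            items.append(
                {
                    "quantity": in_tier,
                    "unit_price": unit_price,
                    "amount": in_tier * unit_price,
                }
            )
        if upper is None:
            break
        lower = upper
    return items
```

For example, 1,250 images in a month yields two line items: 100 images at $0.01 ($1.00) and 150 images at $0.008 ($1.20), for $2.20 of overage on top of the $500 base fee. Usage at or below 1,000 images yields no line items at all.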

Regardless, this solution meets our correctness criteria even if it falls a bit short on simplicity or separation of concerns.

Providing mid-period visibility

With some work, we’ve now set up a system that allows us to bill our customers accurately and on-time for their incurred usage. How might we now provide both internal teams and our end users some visibility into their upcoming invoice? This would be a valuable part of the PixelMate admin console and allow our product and growth team to better understand how user adoption is tracking.

This isn’t possible using Stripe’s upcoming invoice API with the approach we’ve settled on so far, since Stripe doesn’t know about usage until the invoice is about to finalize. 

Let’s explore creating an InvoiceItem on the Stripe customer instead; this is a pending invoice item that can be updated during the period and will automatically get picked up by the next invoice. Because an invoice item isn’t attached to a single subscription/invoice, we’ll attempt to provide our mid-period visibility by merging together the subscription’s upcoming invoice and pending items. We will also have to manage the lifecycle of this invoice item ourselves; for example, we’ll want to make sure that we don’t create a new invoice item for a new month until the previous invoice has been issued with the existing pending item, and also ensure that any one-off invoices we issue don’t pick up this item. This is getting a little convoluted, so let’s back away from this idea.

We’ll resort to just hooking up a tool like Metabase to our Redshift instance — at least this will get us a view of usage in a month and some simple day-by-day visualizations for our internal teams.  With some more engineering work, we’ll be able to combine this with our application datastore to build an internal dashboard of the customer’s upcoming invoice.

Changing prices for new customers

Six months down the road, we’ve gotten feedback from customers that our tier boundaries aren’t set exactly correctly. Pricing is not a “set it and forget it” exercise, so our product partner is keen on us iterating on it. Our new goal is to make a slight modification to each plan, increasing the price of the annual plan slightly and tweaking tier allocations.

We will roll this change out to only new customers at first, as we want to monitor impact to our conversion and upgrade funnel before rolling it out to existing customers.

Because Stripe is no longer the source of truth for pricing, we have to build the data model and logic to orchestrate the price change. We’ll create a versioned tiering model, where each version has an effective_date, helping us to successfully maintain old pricing for existing customers. This will also ensure that invoices that haven’t yet been issued will still be on the existing pricing.
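One way to picture the versioned tiering model is sketched below. The class, the field names other than effective_date, and the version IDs are hypothetical, and the numbers in the second version are placeholders rather than PixelMate's real new pricing.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class PriceVersion:
    version_id: str
    effective_date: date
    tiers: tuple  # ((upper_bound_or_None, unit_price), ...)


VERSIONS = [
    PriceVersion(
        "monthly_v1",
        date(2022, 11, 1),
        ((1000, Decimal("0")), (1100, Decimal("0.01")), (None, Decimal("0.008"))),
    ),
    # Placeholder values for the revised tiers.
    PriceVersion(
        "monthly_v2",
        date(2023, 5, 1),
        ((1200, Decimal("0")), (1400, Decimal("0.01")), (None, Decimal("0.008"))),
    ),
]


def version_for_signup(signup_date, versions):
    """Pick the newest version already in effect on the signup date."""
    eligible = [v for v in versions if v.effective_date <= signup_date]
    if not eligible:
        raise ValueError(f"no price version in effect on {signup_date}")
    return max(eligible, key=lambda v: v.effective_date)
```

Because versions are immutable and only ever appended, a customer who signed up under monthly_v1 keeps resolving to the same tiers no matter how many later versions we add.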

We’ll also revisit our Stripe user provisioning step and modify it so that when a subscription is created, we’ll attach some metadata that identifies a price_version_id for a version in our data model. When an invoice finalizes, we’ll find the corresponding subscription’s price_version based on the metadata field, and use that to generate the additional invoice line items. 

When we want to change pricing on existing customers, we won’t be able to rely on metadata from the subscription creation. Instead, we’ll have to add more business logic to our codebase that identifies which invoices for a subscription should use the old and new tier amounts.
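A sketch of what that extra business logic might look like: each subscription keeps the version ID from its creation metadata, plus a list of scheduled cutovers that we maintain in our own data model. The function name, the cutover list, and the rule that a period is priced by the version in effect at its start are all assumptions made for illustration.

```python
from datetime import date


def version_id_for_period(metadata_version_id, scheduled_changes, period_start):
    """Resolve which price version applies to an invoice's usage period.

    `scheduled_changes` is a list of (cutover_date, version_id) pairs for one
    subscription. A period uses the latest cutover on or before its start;
    with no applicable cutover, the version from subscription metadata wins.
    """
    current = metadata_version_id
    for cutover, version_id in sorted(scheduled_changes):
        if cutover <= period_start:
            current = version_id
    return current


# Hypothetical existing customer moved to the new pricing on 2023-08-01.
CHANGES = [(date(2023, 8, 1), "monthly_v2")]
```

With this rule, a period that starts on July 1 stays on the old tiers, and the period starting August 1 is the first to be billed on the new ones; an invoice that straddles a cutover never mixes two versions.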

The bottom line

Building metered billing with Stripe is doable, but it’s tricky to get right and even more complex to maintain. For most engineering or product organizations, billing infrastructure is understandably not an area of sustained engineering investment. If you’re facing this challenge and want to get time back for your core engineering work, we'd love to help. Orb handles streaming and deduplication of events, flexible in-product queries, and supports your stakeholders through the full revenue journey. You’re invited to explore our sandbox to get a sense of the product.

(By the way, if plumbing work like this excites you, we’re hiring!)

May 10, 2023
