Friday, February 02, 2024

 

We're just selling a few products to our customers; it can't be that hard, can it?

Background

We were several months into designing and building a microservice-based platform to replace a number of monolithic B2B applications (each with its own business organisation, products and services) when we discovered that no one in the business had considered how the products might be sold and billed from a single place; everyone had just assumed we would do it the way we had always done it.

This seemed like a simple issue: cross-selling products to our customers was what we had always done, just in a very manual way.
We were planning to combine each business into a single data-driven platform where we would link a product to a customer and invoice them for the number of users using that product.

Investigations

However, after doing some research across the different business heads and sales teams, it turned out there wasn't a lot of common ground to start from.

In reality:

  •         Customers were being charged different prices for the same products.
  •         Prices were time boxed or volume controlled.
  •         Some sales were based on sliding scales.
  •         Contracts sometimes didn’t mention pricing.
  •         Price changes were difficult due to no end dates in contracts.
  •         Forecasts were impossible as we sometimes billed in arrears.
  •         The number of drivers changed on a nearly daily basis.

There were also a few additional wrinkles I found that sat outside of what we already knew:

  •         Pricing: some customers were being charged in euros and a few in other currencies.
  •         Content: predominantly in English, but we had one product that supported 5 foreign languages.
  •         Versioning: some content was updated constantly and only some customers were eligible to see it.
  •         Users being added at any time should get the latest content their company has access to.
  •         Users being added or removed might trigger a charge or refund depending on the contract.

This new information left us with a lot of new technical challenges that had not been picked up in the investigation stages.

What we did as a business

Our generic business model was simple: our customers have drivers, and their drivers have risk associated with them. We mapped that risk and showed companies where it was and how to mitigate it.

Based on a bespoke per-customer algorithm, drivers might have a range of products added to their user account depending on what they do:

  •         a regular licence check
  •         elearning courses
  •         online assessments
  •         on road training
  •         telematics installed in their vehicle

We charged the customers for each product their drivers completed.

Billing was very ad hoc and purely manual.
Our monthly bill run was taking weeks, and no one really had a solid idea of where the money was coming from or when it might arrive.

Automation

I really wanted to limit the manual work needed to run the new platform, so I decided to introduce a workflow process that would cover the assignment and billing of products to drivers.
In later iterations the workflow became intrinsic and central to how the entire platform functioned, as any repeatable task was linked to a step in a workflow somewhere.
I tasked one of the senior engineers with finding some options, and after some demos and discussions we ended up using an off-the-shelf, open-source C# platform called Elsa: https://v2.elsaworkflows.io/
It had hooks in and out of our systems for triggers and its own admin UI tooling, which saved us a lot of development time.
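
To give a feel for what a single workflow step covered, here is a minimal sketch. It is not Elsa's API, and the interface and type names (IOfferPackageStore, IAssignmentService, IBillingService) are hypothetical stand-ins for the platform services the real activities called into:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    // Hypothetical stand-ins for the platform services the workflow activities called.
    public interface IOfferPackageStore { Task<IReadOnlyList<Guid>?> GetAvailableProductIdsAsync(Guid packageId, DateTimeOffset now); }
    public interface IAssignmentService { Task AssignAsync(Guid driverId, Guid productId); }
    public interface IBillingService    { Task RaiseChargeAsync(Guid driverId, Guid packageId); }

    public class AssignAndBillStep
    {
        private readonly IOfferPackageStore _packages;
        private readonly IAssignmentService _assignments;
        private readonly IBillingService _billing;

        public AssignAndBillStep(IOfferPackageStore packages, IAssignmentService assignments, IBillingService billing)
            => (_packages, _assignments, _billing) = (packages, assignments, billing);

        public async Task RunAsync(Guid driverId, Guid packageId)
        {
            // Only proceed if the Offer Package is available 'now'.
            var productIds = await _packages.GetAvailableProductIdsAsync(packageId, DateTimeOffset.UtcNow);
            if (productIds is null)
                throw new InvalidOperationException("Offer Package is not currently available.");

            // Assign every product in the package to the driver...
            foreach (var productId in productIds)
                await _assignments.AssignAsync(driverId, productId);

            // ...then raise whatever charge the customer's contract defines.
            await _billing.RaiseChargeAsync(driverId, packageId);
        }
    }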

Mobile Contracts

I decided that the best solution for "how will we sell and bill products to our customers" was to build it like a mobile phone bundle/contract.
Everyone understands how these work and I had done this before in the telco world.
I was hopeful it would allow us to sell any number of items bundled together to any number of customers at any price.

I set out the technical business requirements for the Offer Package.
It would be a structural wrapper around our content, products, prices, policies, templates and any other data linked to what we would “sell” to a customer.

The technical design sessions became busy over time, as the team and I were modelling against ideas that were still evolving.
To help we developed proof of concepts and mock-ups as we went to make sure we were on the right track.
We were able to hide the complexity and repetitive work inside the workflow engine.

Data Structure

Working closely with my data team, we worked out a base data structure that would work within a relational database and fit into the overall tech roadmap we had in place.
Our biggest goal was to avoid any data duplication anywhere, as it was a huge issue in the existing monoliths.

The Offer Package structure ended up as follows: each Offer Package had many Products, and each Product had versioned content in different languages with a default price for each available currency.

After modelling this out against what the sales teams would need to sell, we realised we had to be able to control when an Offer Package would be available, so we added availability dates.
These were then added to Products, Prices and Pricelists, and in the end every entity in the database had them; they became some of our core fields, used for all sorts of reporting and automation checks.
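
As a rough sketch of the shape we ended up with (the class and property names here are illustrative, not the production schema), each level carried its own availability dates:

    using System;
    using System.Collections.Generic;

    // Illustrative entity shapes only; every level carries available from/to dates.
    public class OfferPackage
    {
        public Guid Id { get; set; }
        public string Name { get; set; } = "";
        public List<Product> Products { get; set; } = new();
        public DateTimeOffset AvailableFrom { get; set; }
        public DateTimeOffset? AvailableTo { get; set; }
    }

    public class Product
    {
        public Guid Id { get; set; }
        public List<ProductContent> Content { get; set; } = new();   // versioned, per language
        public List<Price> DefaultPrices { get; set; } = new();      // one default price per currency
        public DateTimeOffset AvailableFrom { get; set; }
        public DateTimeOffset? AvailableTo { get; set; }
    }

    public class ProductContent
    {
        public Guid Id { get; set; }
        public string LanguageCode { get; set; } = "en";
        public int Version { get; set; }
        public DateTimeOffset AvailableFrom { get; set; }
        public DateTimeOffset? AvailableTo { get; set; }
    }

    public class Price
    {
        public Guid Id { get; set; }
        public string CurrencyCode { get; set; } = "GBP";
        public decimal Amount { get; set; }
        public DateTimeOffset AvailableFrom { get; set; }
        public DateTimeOffset? AvailableTo { get; set; }
    }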

Building the validation for this was a headache for the developers, and a lot of unit tests were created to cover a very large number of use cases.

The validation had to consider cases like this: an Offer Package could be assigned to a customer at any time, but it would only be visible to a workflow (to be assigned to a user) after its available-from date had passed. However, if the products or prices inside the Offer Package were not available, the workflow would error, the users would not be assigned the right products, and no charges would be raised.
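
In essence the availability check boiled down to something like this (a minimal sketch against the illustrative entities above, not the real validation code):

    using System;
    using System.Linq;

    public static class OfferPackageValidation
    {
        // A package can be picked up by a workflow only when it, and every
        // product and default price inside it, is available 'now'.
        public static bool IsWorkable(OfferPackage package, DateTimeOffset now) =>
            IsAvailable(package.AvailableFrom, package.AvailableTo, now)
            && package.Products.All(p =>
                   IsAvailable(p.AvailableFrom, p.AvailableTo, now)
                   && p.DefaultPrices.All(price => IsAvailable(price.AvailableFrom, price.AvailableTo, now)));

        private static bool IsAvailable(DateTimeOffset from, DateTimeOffset? to, DateTimeOffset now) =>
            from <= now && (to is null || now < to);
    }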

As the complexity grew so did the technical workshops and whiteboard sessions to make sure we were not missing anything.

Educating the developers on what the architecture looked like was imperative; proof-of-concept models and UI were being drafted and built every sprint.

Once the team had agreed on a model that felt like we were on the right track, user stories were created.
Demos were given to the stakeholders at the beginning of each testing release.
Any ideas from these were stored for a later development phase so we could get the core done as quickly as possible.

Versioning

The next challenge was how to store the versioned, multilingual product content; each type of content had a different data structure: text, videos, images, files, full assessments, third-party data, external API calls.
A standard relational database model really wasn't the best option for storing this mixed data set efficiently, but we couldn't move away from it.

One piece of content, like a Word document or PDF, would need to be made available to a driver in their own language, and it would always need to be up to date when they read it for the first time. However, if it was something they signed and agreed to, they should never see an updated version of the same document, as it would not be valid.
But if it's not something that has been agreed, then maybe they should see the latest version every time, even if it has changed.
This gets more complicated when managing elearning content: if you have started a course and it gets updated after you started, should you get the new version or continue on the old one?

All of these use cases had to be managed via flags in the data; every workflow decision needed to be answerable with a data query.
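
As a simple illustration of that idea (the flag name pinVersionOnFirstView and the types below are made up for the example, not the real columns):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class ContentVersion
    {
        public int Version { get; set; }
        public DateTimeOffset AvailableFrom { get; set; }
        public DateTimeOffset? AvailableTo { get; set; }
    }

    public static class ContentSelection
    {
        public static ContentVersion? Resolve(
            IEnumerable<ContentVersion> versions,
            bool pinVersionOnFirstView,   // e.g. a policy the user has signed and agreed to
            int? versionAlreadySeen,
            DateTimeOffset now)
        {
            // If the content was signed/agreed, keep showing the version they agreed to.
            if (pinVersionOnFirstView && versionAlreadySeen is not null)
                return versions.FirstOrDefault(v => v.Version == versionAlreadySeen);

            // Otherwise always show the latest version that is available 'now'.
            return versions
                .Where(v => v.AvailableFrom <= now && (v.AvailableTo is null || now < v.AvailableTo))
                .OrderByDescending(v => v.Version)
                .FirstOrDefault();
        }
    }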

Complex Content Storage

Each product was available in multiple languages, each language could have multiple versions.
Some versions were ready for the future, some had been available in the past, but only one was ever available 'now', with 'now' being relative to when the user wanted to view it.

My initial solution was to use a SQL field to store the serialised, structured model JSON for the content, with standard fields for the generic data: name, dates, version ID, etc.

I had done this sort of thing before with health data models in XML, and although the developers weren't overly comfortable with storing the data like this, once we ran a proof of concept through they saw the benefits.
Each product type would have its own model in C# that let the system store and read the JSON directly, based on a lookup type stored in the data.
It also enabled us to update the structure of the models by either migrating the JSON itself or simply creating a new model type to save and read the new data type.
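
A minimal sketch of that pattern (the model classes, the lookup values and the use of System.Text.Json are illustrative; the real models and lookup types were more involved):

    using System;
    using System.Text.Json;

    public interface IContentModel { }

    public class FileLinkContent : IContentModel
    {
        public string Url { get; set; } = "";
        public string Title { get; set; } = "";
    }

    public class AssessmentContent : IContentModel
    {
        public string Title { get; set; } = "";
        public string[] Questions { get; set; } = Array.Empty<string>();
    }

    public static class ContentSerializer
    {
        // Serialise any model into the JSON column.
        public static string ToJson(IContentModel model) =>
            JsonSerializer.Serialize(model, model.GetType());

        // Read the JSON column back using the lookup type stored alongside it.
        public static IContentModel FromJson(string contentType, string json) =>
            contentType switch
            {
                "file-link"  => JsonSerializer.Deserialize<FileLinkContent>(json)!,
                "assessment" => JsonSerializer.Deserialize<AssessmentContent>(json)!,
                _ => throw new NotSupportedException($"Unknown content type '{contentType}'.")
            };
    }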

Unfortunately, when we started volume testing this as part of a proof of concept, it didn’t work exactly as we expected.
For smaller pieces of content, like a URL to a file, both reads and writes were fast: the serialised JSON would be ingested by the correct class and it worked really well.
However, when working with complex data like assessments/surveys or large amounts of HTML, the read time became too long.
We had to split the versions of each piece of content out into their own records, so these also had available from/to dates.
This wasn't a huge amount of extra work, just an extra layer of data, and it left us with a much cleaner data structure. We still used the JSON concept, but now we could find just the content we needed 'now' much faster than opening a large piece of JSON, parsing it and hunting for the right 'now' content.
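
After the split, finding the 'now' content became a narrow query rather than a JSON parse. A sketch of that lookup (the row type and queryable source are stand-ins for the real table):

    using System;
    using System.Linq;

    public class ContentVersionRow
    {
        public Guid ProductId { get; set; }
        public string LanguageCode { get; set; } = "en";
        public DateTimeOffset AvailableFrom { get; set; }
        public DateTimeOffset? AvailableTo { get; set; }
        public string Json { get; set; } = "{}";   // the serialised content model
    }

    public static class ContentQueries
    {
        // Return only the single version available 'now' for a product/language,
        // so just that row's JSON needs to be loaded and deserialised.
        public static ContentVersionRow? Current(
            IQueryable<ContentVersionRow> rows, Guid productId, string languageCode, DateTimeOffset now) =>
            rows.Where(r => r.ProductId == productId
                         && r.LanguageCode == languageCode
                         && r.AvailableFrom <= now
                         && (r.AvailableTo == null || now < r.AvailableTo))
                .OrderByDescending(r => r.AvailableFrom)
                .FirstOrDefault();
    }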


Cache time

After the table adjustment we ran some more efficiency/benchmarking tests on the new data layers and found that storing the complete UI view in the table as serialised JSON was extremely fast.
Adding a caching layer on top of the database behind the API gave us even more flexibility.

If we needed [/product/123/content/568] from the API, it returned the JSON object directly from the cache if it was available, or from the database if not, storing the result in the cache for next time.
The UI could render the object without any computation time.

A workflow service monitored the cache and cleared out anything older than a set time frame.
We added a separate mechanism that reacted to changed data in the database and removed dirty cache entries.
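
The read path was a classic cache-aside pattern. A minimal sketch, using IMemoryCache for brevity (the IContentReader interface, key format and 30-minute expiry are assumptions for the example, not the production values):

    using System;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Caching.Memory;

    // Hypothetical stand-in for the database read that builds the UI view JSON.
    public interface IContentReader { Task<string> LoadRenderedJsonAsync(int productId, int contentId); }

    public class CachedContentService
    {
        private readonly IMemoryCache _cache;
        private readonly IContentReader _db;

        public CachedContentService(IMemoryCache cache, IContentReader db) => (_cache, _db) = (cache, db);

        public async Task<string> GetAsync(int productId, int contentId)
        {
            var key = $"/product/{productId}/content/{contentId}";

            // Return straight from the cache when we can...
            if (_cache.TryGetValue(key, out string? json) && json is not null)
                return json;

            // ...otherwise load from the database and store the result for next time.
            json = await _db.LoadRenderedJsonAsync(productId, contentId);
            _cache.Set(key, json, TimeSpan.FromMinutes(30));   // expiry value is an assumption
            return json;
        }

        // Called when the underlying data changes, so dirty entries are removed.
        public void Invalidate(int productId, int contentId) =>
            _cache.Remove($"/product/{productId}/content/{contentId}");
    }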

Take away

Looking back, there were times when I am sure my team thought I was losing the plot with some of the decisions being made; however, the end result was a slick system with the smallest amount of human admin necessary to keep things running.
There are parts of this I have not mentioned that were just as big a challenge as Offer Packages, like billing across currencies and the definition of 'now' for two users in different timezones.

The Offer Packages and Workflow were a turning point in the evolution of how software was being developed.
Abstracting business ideas away into generic data structures, with automation and solid technology behind them, leaves everyone in a place where misunderstandings or changes can be handled without redeveloping code, and moves closer to my personal goal of configuration over code.

Don't get me wrong, it wasn't perfect, but it was a lot cleaner and, more importantly, leaner than the systems we were leaving behind.


Complexity

Over time the underlying data became very simple, and everything ended up being based on a generic table structure.

The models we mapped out in the database were mapped directly to real world values, giving the data meaning and context, end to end across the platform and its services.
When we reviewed the system when it was up and running with real data in it, we could see that things made sense to both developers and business stakeholders alike.
Any change to the system now had to have a reason for being made; new fields had to add value to be included.
We were finally only storing the data we really needed.

What I didn’t get time to do

This wasn't a true microservice architecture when I left because, although the services themselves (Offer Package, Prices, Billing, etc.) were independent pieces of functionality running on separate Azure resources, there was still only a single database.
Workflow was the only true microservice, although we shared C# code with it so it could understand our logic.

The SQL database schema was set out to allow splitting or sharding, but we were waiting until we had the data volumes before starting, as the cost was just too high to justify on a small customer base.

Future thoughts

Abstract data-driven systems are powerful, but they are also very complicated to put together.
We could have built a very specific piece of software, which would have given us a mirror of what we were replacing. Going abstract was a bold move and took longer than we wanted, but the end result is a data-driven platform that, with a few changes of config, could be used for a completely different business without redevelopment.



