Avoiding Microservice Dependency Chaos using 3LA

Greenlight Engineering
13 min read · Mar 24, 2023

At Greenlight, we are undergoing a fundamental change in how we structure our services. As with many startups, we began our journey using a monolithic architecture. This was perfect while we searched for product-market fit. It allowed us to move quickly with a small team, and without a lot of operational overhead.

We are now in a phase where we need to scale beyond what our previous practices could support. Modular monoliths and microservices are both common industry solutions. Regardless of which design is chosen, the question of boundaries between components is critical to get right. If you get it wrong, the resulting architecture includes unnecessary crosstalk and coupling. These inefficiencies prevent you from achieving the very scale you were seeking.

We have chosen to migrate to a microservices-based architecture. One of the key challenges with this approach is avoiding the Distributed Monolith problem. The goal of a microservices architecture is to create services that are independently deployable, with minimal dependencies between them.

With a Distributed Monolith, services are tightly coupled, either because the code from the existing monolith was extracted as-is, bringing along its tangled dependencies, or because insufficient thought went into factoring the services and data to minimize the dependencies.

Regardless of the reason, you end up with the worst of both worlds when you have a Distributed Monolith. You simultaneously introduce increased operational overhead from required synchronized deployments while also having to ensure you don’t fall victim to the Fallacies of Distributed Computing!

This is where the Three-Layer Architecture (3LA) comes into play. The 3LA provides a set of design constraints which result in services with minimal coupling and greater reusability. When a design constraint is violated, it is a sign that we have factored the design incorrectly. This has proven invaluable as we have worked through the high level design of our Long Term Architecture (LTA).

Context and Terminology

Before we get into the details of the 3LA, it is important to provide additional context and define some important terms.

  • Domain — A grouping of related business needs that the system must fulfill. Domains may span multiple layers in the 3LA.
  • Capability — A set of one or more related resources and operations inside a domain that provide a subset of the needed functionality. Capabilities live in a single layer in the 3LA.
  • Service — The physical implementation of one or more capabilities. This is the unit of deployment within the architecture.
  • Layer — A collection of services that share a set of design constraints within the 3LA.
  • Orchestrate — The process of chaining together a series of actions across a number of services to implement a workflow.

We follow a domain-oriented architecture. This includes the use of Domain Driven Design (DDD) to scope our domains and service boundaries around capabilities. Capability boundaries are linked to service boundaries. Dependencies between capabilities result in dependencies between services and vice versa. Any solution looking to tame complexity must watch for complexity in both conceptual and implementation boundaries.

An Overview of the 3LA

The layers in the 3LA are named Commerce, Orchestration, and Core.

A simplified view of the 3LA

Let’s walk through each layer and incrementally derive our design constraints.

The Commerce Layer

The Commerce layer holds all product specific nuances. Business logic, things the marketing department wants to experiment with changing, and APIs that are UI adjacent all live here. The Commerce layer builds upon and orchestrates calls across lower layer services. The APIs in this layer make up the topmost layer of the backend.

A tangled monolith

When working in a monolith, we essentially only have a Commerce layer. All the lower layers are buried in the implementation of the topmost endpoints. Unless the monolith is extremely modular on the inside, a tangled web of dependencies develops that prevents any sort of clear layer boundaries from forming (also known as a Big Ball of Mud). This results in product nuances invading every part of the monolith. The 3LA includes constraints which help pull these nuances up into the Commerce layer, but the first step in the process is to define the capabilities in that layer.

A capability defines a boundary around a set of related functionality. Formally defining boundaries around the topmost functionality introduces abstractions (in the form of Commerce layer capability definitions) that can then be analyzed. We move from just seeing individual endpoints to seeing capabilities. We can categorize and name them. We can assign ownership of their implementations to teams. Most importantly, we can start to see the dependencies between them.

Since managing dependencies is the core problem the 3LA is designed to solve, it includes constraints about what dependencies are allowed. The first is that it forbids synchronous, lateral, cross-domain communication between capabilities in the same layer. This forces related things to stay together and allows unrelated things to be split out.

The Core and Orchestration Layers

What if things are not independent?

This works great for completely independent domains, but it quickly becomes a problem whenever you need to reuse data. How can you access data owned by another Commerce layer service?

Sharing data using a Core capability

The Core layer solves this problem. Core services are primarily data stores and vendor integrations.

Any data that needs to be shared across Commerce layer capabilities gets pushed down into a Core layer capability.

Being able to reuse data only gets us so far. At some point, multiple Commerce layer capabilities will need to perform a very similar set of operations against the same data. How do we handle this duplication? We have two solutions.

Embracing Duplication

The first is to not solve it. Eliminating duplication is a trade-off. There is a cost to centralizing anything. You have to ensure that changes to the shared thing are compatible with all of its clients. You need to ensure clients are kept up to date. Breaking changes require cross-team coordination. The list goes on. Sometimes, the best thing to do with duplication is embrace it.

If the duplication is relatively small, if the duplication exists by chance rather than by design, or if the cost of maintenance outweighs the cost of duplicating, accept the duplication. Standardizing on the approach or just comparing notes is often sufficient to achieve the advantages of sharing without the overhead of shared components.

When duplication overhead exceeds sharing overhead

The second option is what the Orchestration layer is intended to solve. The Orchestration layer contains all non-trivial orchestration logic that is needed by multiple capabilities. This is what gets added when you decide the overhead of duplication is too costly. Not all domains will have the need to share orchestration logic with other domains so some architectures will find little need for this layer.
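To make the split between the three layers concrete, here is a minimal sketch. Every name in it (LedgerCore, TransferOrchestrator, pay_allowance, pay_chore_reward) is hypothetical and chosen only for illustration; it is not Greenlight's actual design. It assumes two Commerce capabilities that both need the same multi-step workflow against a shared Core resource.

```python
class LedgerCore:
    """Core capability (hypothetical): a generic balance store with no product knowledge."""

    def __init__(self, balances: dict[str, int]) -> None:
        self._balances = dict(balances)

    def balance(self, account_id: str) -> int:
        return self._balances[account_id]

    def adjust(self, account_id: str, delta_cents: int) -> None:
        new_balance = self._balances[account_id] + delta_cents
        if new_balance < 0:
            # Validation only protects an internal constraint.
            raise ValueError("balance cannot go negative")
        self._balances[account_id] = new_balance


class TransferOrchestrator:
    """Orchestration capability (hypothetical): the shared workflow, no product rules."""

    def __init__(self, ledger: LedgerCore) -> None:
        self._ledger = ledger

    def transfer(self, source: str, destination: str, amount_cents: int) -> None:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        self._ledger.adjust(source, -amount_cents)
        self._ledger.adjust(destination, amount_cents)


# Commerce layer: product-specific rules stay here and reuse the shared workflow.
def pay_allowance(orchestrator: TransferOrchestrator, parent: str, child: str,
                  weekly_cents: int) -> None:
    orchestrator.transfer(parent, child, weekly_cents)


def pay_chore_reward(orchestrator: TransferOrchestrator, parent: str, child: str,
                     reward_cents: int, approved: bool) -> None:
    # The approval rule is business logic, so it lives in Commerce.
    if approved:
        orchestrator.transfer(parent, child, reward_cents)
```

If only one Commerce capability ever needed the transfer workflow, this sketch would keep it in that capability and skip the Orchestration class entirely.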

More Dependency Restrictions

Cross layer dependencies getting out of control

The no cross-domain, synchronous, lateral communication rule prevents tangled dependencies from forming in a single layer, but we just added two additional layers. What’s to prevent out of control dependencies between the layers?

Restricting synchronous call patterns to reduce dependencies

Synchronous communication is always top-down. Higher layer services may call lower layer services, but lower layer services may not call higher-layer ones. This does wonders for preventing product specific nuances and unmanageable dependencies from seeping into the lower layer services.
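The two synchronous dependency rules described so far are simple enough to check mechanically. The sketch below is one possible way to express them, assuming each capability is tagged with a layer and a domain. The Capability type, the sync_call_violation function, and the example capability names are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Layer(IntEnum):
    # A higher value means a higher layer in the 3LA.
    CORE = 1
    ORCHESTRATION = 2
    COMMERCE = 3


@dataclass(frozen=True)
class Capability:
    name: str
    layer: Layer
    domain: str


def sync_call_violation(caller: Capability, callee: Capability) -> Optional[str]:
    """Return the violated constraint, or None if the synchronous call is allowed."""
    if callee.layer > caller.layer:
        return "synchronous calls must be top-down"
    if callee.layer == caller.layer and callee.domain != caller.domain:
        return "no synchronous, lateral, cross-domain calls within a layer"
    return None


# Hypothetical capabilities, for illustration only.
checkout = Capability("checkout", Layer.COMMERCE, "payments")
rewards = Capability("rewards", Layer.COMMERCE, "loyalty")
ledger = Capability("ledger", Layer.CORE, "payments")

print(sync_call_violation(checkout, ledger))   # allowed: higher layer calls lower
print(sync_call_violation(checkout, rewards))  # lateral and cross-domain
print(sync_call_violation(ledger, checkout))   # lower layer calling upward
```

A check like this could run against a declared dependency map during design review. It only covers synchronous calls; asynchronous events are deliberately out of its scope.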

Limiting Autonomy to Increase Reuse

Orchestration and Core just do what they are told

Here’s a 3LA design constraint that may seem overly restrictive at first.

Core and Orchestration services are not allowed to initiate requests on their own. Another way of putting this is that they are not allowed to perform any independent action. They do not run anything on a schedule nor do they register callbacks. They do what they are told and no more. Any validation they perform is only to ensure that the requests they receive make sense and do not violate any internal constraints.

This constraint provides further protection from lower-layer services collecting business logic that belongs in the Commerce layer. For example, an orchestrator cannot own the closing of inactive customer accounts after so many days. It must be told to perform an account deactivation by something in a layer above it. The number of days and the other conditions under which an account should be closed are things the business owns, so they do not belong in the lower layers of the architecture.
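The account-closing example can be sketched as follows. The class and function names and the 90-day threshold are assumptions made for illustration. The point is only where the policy lives: the Commerce-layer job decides which accounts qualify, and the orchestrator acts when told.

```python
from datetime import date, timedelta


class AccountOrchestrator:
    """Orchestration-layer sketch: acts only when called, with no schedule or policy."""

    def __init__(self) -> None:
        self.deactivated: list[str] = []

    def deactivate_account(self, account_id: str) -> None:
        # Validation is limited to checking that the request makes sense.
        if not account_id:
            raise ValueError("account_id is required")
        self.deactivated.append(account_id)


# Commerce layer: the business-owned rule. The value is a hypothetical threshold.
INACTIVITY_LIMIT = timedelta(days=90)


def close_inactive_accounts(last_activity: dict[str, date], today: date,
                            orchestrator: AccountOrchestrator) -> None:
    """Commerce-layer job: apply the business rule, then tell the orchestrator."""
    for account_id, last_seen in last_activity.items():
        if today - last_seen > INACTIVITY_LIMIT:
            orchestrator.deactivate_account(account_id)
```

Whatever triggers close_inactive_accounts on a schedule would also sit in the Commerce layer, since lower layers may not run anything periodically.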

A Hidden Danger

Taken together, the rules for top-down communications, no lateral communications, and no independent action drive the creation of generic, reusable services in the Core layer and shared orchestration logic in the Orchestration layer.

What we don’t want to happen!

There’s still a danger, however, that these lower layer services are not as reusable as they first appear. There are times when service implementations make choices about how to behave based on who is calling them and not the operation being called. This can seem like the easier route when you need to support another caller with slightly different requirements but do not want to change the existing callers.

The 3LA forbids this with the constraint: no assumption is made about the caller or intent. In other words, the behavior of an API is the same regardless of who calls the API, as long as its parameters are the same.
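A small contrast may help here. Both functions below are hypothetical, as are the caller name and the limits. The first branches on the identity of the caller; the second takes the varying requirement as an explicit parameter, so its behavior depends only on its inputs.

```python
def is_within_limit_by_caller(amount_cents: int, caller: str) -> bool:
    """Anti-pattern: the result depends on who is calling."""
    limit_cents = 5_000 if caller == "teen-app" else 50_000
    return amount_cents <= limit_cents


def is_within_limit(amount_cents: int, limit_cents: int) -> bool:
    """3LA-friendly: same parameters, same behavior, whoever calls."""
    if amount_cents < 0 or limit_cents < 0:
        raise ValueError("amounts must be non-negative")
    return amount_cents <= limit_cents
```

With the second form, adding a new caller with a different limit requires no change to the lower-layer service. The caller, which owns the product nuance, supplies the limit.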

Adjusting DDD to Incorporate 3LA Constraints

As described in a previous post, we make use of Domain Driven Design (DDD). DDD modeling starts at the Commerce layer, without any regard to the 3LA. We identify all of the APIs and resources we need to handle our use cases. From there we start iterating to incorporate the design constraints.

When the pieces fit

We map out dependencies bidirectionally and identify places where the 3LA constraints are violated. This causes us to split single capabilities into multiple ones. The split off capabilities are pushed into lower layers in the architecture which reduces the number of constraint violations. Shared resources get pushed down into Core. Groupings of shared operations move to the Orchestration layer. And then we hit a snag. What if the pieces don’t fit?

This is not an uncommon occurrence. The design constraints discussed up to this point handle the easy cases. Let’s walk through a few more challenging scenarios and discuss how we handle them.

Cannot Get There From Here

Scenario: Asynchronous State Changes

Decoupling the communication of async state changes

Imagine a Commerce capability called C-1. It calls Orchestrator O-A synchronously. O-A performs as much of the orchestration as it can synchronously before returning a response. Some time later, O-A makes a change to the state of the request. How does C-1 find out about it? O-A cannot call C-1 directly as that would create a hard dependency on C-1. Callbacks could be used, but doing so in a way that doesn’t couple C-1 and O-A is difficult.

The eagle-eyed reader may have noticed the qualification to the top-down communication constraint. It says all synchronous communication has to be top-down, but it says nothing about asynchronous communication. This is very much intentional as it opens the door for our solution.

Persistent event queues! O-A publishes events for all notable state changes. Any service, anywhere in the backend, may register to receive such events. Neither the producer nor the consumer needs to know who is on the other end, only where to send or receive the necessary events.

This fits our scenario perfectly. When O-A makes an important state change, it just emits an event about it and C-1 receives it so it can take whatever follow-up action is necessary.
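The flow between C-1 and O-A can be sketched with an in-memory stand-in for the queue. The EventBus class, the topic name, and the method names are all assumptions made for illustration. A real persistent queue would add durability, retries, and asynchronous delivery, none of which this sketch attempts.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """In-memory stand-in for a persistent event queue (no durability or retries)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in list(self._handlers[topic]):
            handler(event)


class OrchestratorA:
    """O-A: knows the bus and the topic, but nothing about C-1."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus

    def start_request(self, request_id: str) -> str:
        # Synchronous part: do what can be done now, then return.
        return "pending"

    def complete_request(self, request_id: str) -> None:
        # Later state change: announce it without knowing who listens.
        self._bus.publish("request.state_changed",
                          {"request_id": request_id, "state": "completed"})


class CommerceC1:
    """C-1: calls O-A top-down and learns about later changes through events."""

    def __init__(self, bus: EventBus, orchestrator: OrchestratorA) -> None:
        self.states: dict[str, str] = {}
        self._orchestrator = orchestrator
        bus.subscribe("request.state_changed", self._on_state_changed)

    def submit(self, request_id: str) -> None:
        self.states[request_id] = self._orchestrator.start_request(request_id)

    def _on_state_changed(self, event: Event) -> None:
        self.states[event["request_id"]] = event["state"]
```

The only synchronous dependency in this sketch points downward, from C-1 to O-A. The upward flow of information goes through the bus, which both sides depend on instead of each other.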

Can Only Go Down so Far

Scenario: Commerce Owned APIs

How to avoid pushing a commerce resource into core

Core resources are often not directly exposed outside of the Commerce layer. They tend to be overly generic and lacking in some product specific metadata which makes them more difficult to show directly to customers. This, of course, is by design as it makes them more general purpose.

Commerce capabilities can easily fill this product specific gap by maintaining metadata about the Core resources and providing APIs to expose capability specific variants of them.

This works great when a single Commerce capability does this (say C-A), but the moment you have another Commerce capability that would like to use that API (say C-B), you are stuck. The 3LA constraints appear to force you to push that Commerce specific resource variant owned by C-A down into the Core layer. This adds overhead and feels like overkill when we are talking about one or two fields (such as a name and a category).

The solution here is to recognize that Commerce capabilities are called by Clients that live outside of the backend 3LA. Some clients implement or support the UI. Others are third-party vendor adjacent. Clients translate between the data models used by our UIs or third-party vendor webhooks and the ones the 3LA exposes.

Clients are allowed to orchestrate across Commerce layer capabilities as long as that orchestration does not contain business logic. This neatly solves our scenario as a UI screen showing data from C-B that needs data from C-A can simply make the needed call, extract the data and pass it into C-B.

In other words, when pushing functionality down doesn’t work, consider pushing orchestration up.
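As a rough illustration of pushing orchestration up, the sketch below has a client fetch a field from C-A and pass it into C-B. The class names, method names, and sample data are hypothetical. Note that C-B never calls C-A, and the client function contains no business logic, only plumbing.

```python
class CapabilityA:
    """Commerce capability C-A: owns product-specific metadata (name, category)."""

    def __init__(self) -> None:
        # Sample data for illustration only.
        self._metadata = {"acct-1": {"name": "Vacation fund", "category": "savings"}}

    def get_metadata(self, resource_id: str) -> dict:
        return dict(self._metadata[resource_id])


class CapabilityB:
    """Commerce capability C-B: receives what it needs as parameters."""

    def build_summary(self, resource_id: str, display_name: str) -> dict:
        return {"resource_id": resource_id, "title": f"Summary for {display_name}"}


def render_summary_screen(c_a: CapabilityA, c_b: CapabilityB, resource_id: str) -> dict:
    """Client-side orchestration: fetch from C-A, pass into C-B."""
    metadata = c_a.get_metadata(resource_id)
    return c_b.build_summary(resource_id, display_name=metadata["name"])
```

If the client started deciding things based on the metadata, such as hiding summaries for certain categories, that would be business logic and would belong back in a Commerce capability.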

On Becoming Event Driven

Scenario: If this, then that.

Event driven orchestration

The example from an earlier section of having a Commerce layer capability own the orchestration of inactive account closures demonstrates a powerful pattern that may not be immediately apparent.

Core capabilities can (and are expected to) emit events for all notable state changes. These events unlock the ability to create event driven architectures. A Core capability may not be able to execute a chunk of business logic based on when an account last received a transaction, but it can publish the events to provide the raw data which can then be interpreted through a business logic based lens in the Commerce layer.
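One way to picture this is a Commerce-layer consumer that keeps its own view of the raw events. In the sketch below, the event shape (account_id, posted_on), the class name, and the day threshold are assumptions for illustration. The Core capability would only publish that a transaction was posted; deciding what counts as inactive stays in Commerce.

```python
from datetime import date


class InactivityTracker:
    """Commerce-layer consumer: reads raw Core events through a business lens."""

    def __init__(self, inactive_after_days: int) -> None:
        self._inactive_after_days = inactive_after_days
        self._last_transaction: dict[str, date] = {}

    def on_transaction_posted(self, event: dict) -> None:
        account_id = event["account_id"]
        posted_on = event["posted_on"]
        previous = self._last_transaction.get(account_id)
        # Events may arrive out of order, so keep the latest date seen.
        if previous is None or posted_on > previous:
            self._last_transaction[account_id] = posted_on

    def inactive_accounts(self, today: date) -> list[str]:
        return sorted(
            account_id
            for account_id, last in self._last_transaction.items()
            if (today - last).days > self._inactive_after_days
        )
```

Changing the inactivity rule then means changing only this Commerce-layer consumer. The Core capability and its events stay as they are.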

One of the beauties of the 3LA is that it unlocks future potential. If a capability doesn’t emit events for every notable state change when it is first implemented, these events can be added later as a concrete need for them emerges.

Architectures built on the 3LA can incrementally become event driven over time due to the dependency patterns it encourages. Implementations are centralized, duplication is kept to a minimum, and reuse is the norm. This makes emitting and consuming new events a much more straightforward enterprise.

All the Data

Scenario: Data Lakes

Using events to provide eventually consistent views of state changes

Some capabilities, such as those in the Risk domain, need data from all over the place. There’s no one layer where such a capability can exist and get access to everything it needs. Even if there were, you wouldn’t want to have to add synchronous calls to Risk every time it has a new data requirement.

Analytics or activity feed use cases have similar properties. Their data needs are pervasive and ever changing.

In the 3LA, all services are expected to publish asynchronous events about all notable state changes to resources. This goes beyond the targeted event publishing mentioned in our first scenario as it includes state changes that were not made in response to any particular request.

Pervasive use of events enables capabilities to be event sourced, decoupled, and decentralized.
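A capability like Risk could build its own eventually consistent view by folding events from many producers into local state. The sketch below assumes each event carries an event_id, a type, and an account_id, and that the queue may deliver an event more than once. The event types and field names are hypothetical.

```python
class RiskProfileView:
    """Hypothetical Risk-side projection built only from published events."""

    # Maps an event type to the counter it increments.
    _COUNTED = {
        "transaction.posted": "transactions",
        "address.changed": "address_changes",
    }

    def __init__(self) -> None:
        self._seen_event_ids: set[str] = set()
        self.profiles: dict[str, dict[str, int]] = {}

    def apply(self, event: dict) -> None:
        if event["event_id"] in self._seen_event_ids:
            return  # duplicate delivery, already applied
        self._seen_event_ids.add(event["event_id"])

        counter = self._COUNTED.get(event["type"])
        if counter is None:
            return  # unknown types are ignored so producers can add new ones

        profile = self.profiles.setdefault(
            event["account_id"], {"transactions": 0, "address_changes": 0}
        )
        profile[counter] += 1
```

When Risk has a new data requirement, this sketch would handle it by adding an entry for another event type. No producer needs a new synchronous call to Risk.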

Summary

The 3LA design principles at a glance

The elements within the 3LA are:

  • Clients — formally outside of the backend 3LA, but may perform orchestration between Commerce layer capabilities. Clients also own translating between an external data model and the one used by the backend.
  • Commerce Layer — where all product specific nuance and business logic lives.
  • Orchestration Layer — where Commerce capabilities push shared orchestration logic when the cost of maintaining duplicate copies is too high.
  • Core Layer — where shared datastores and vendor integrations live.

The design constraints of the 3LA are:

  • All product specific nuances and business logic live in the Commerce layer; none may live in the Orchestration or Core layers or in Clients.
  • All synchronous communication is top-down.
  • No cross-domain, synchronous, lateral communication is allowed between capabilities in the same layer.
  • All capabilities are expected to publish asynchronous events for notable state changes of owned resources.
  • Orchestration and Core capability operations are always triggered by higher-layer capabilities. They do not initiate requests on their own or take any other form of independent action (such as scheduling things to happen on a periodic basis).

The combination of the layers and design constraints within the 3LA is potent. They provide clear guidance to teams looking to migrate from a monolithic architecture into microservices. Microservice based architectures that adhere to these principles can easily be modified to support additional, unforeseen use cases. They enable teams to work more autonomously and with greater agility. This is exactly why we have adopted the 3LA at Greenlight. If you are going through a similar transformation, we hope our 3LA design principles will help you too!

- Joshua Benuck, Principal Engineer (Greenlight)
