Some notes on architecture diagrams

I look at a lot of architecture diagrams. Most of them are not about systems I personally shepherd, and I have no control over what is presented to me. But boy, do I have opinions.

In this blog post, I outline some principles I consider immutable regardless of system shape, and explain why I hold them. I describe the views that I, and probably you, need. Finally, we talk a bit about how to produce them, acknowledging that this depends on which technologies you use.

The point isn’t to generate pretty pictures; in fact, I couldn’t care less about the layout or color scheme (okay, okay, readability would be nice). I want truthfulness: a diagram I can rely on without memorizing a bunch of verbal addenda and errata when I meet the architect for the first time.

The cool thing about these diagrams is that we all want the same things from them: whether you are an architect, an individual contributor, an auditor, or involved in due diligence (DD), you want to see data flow, external surface area, network segmentation, data warehousing, and so on. If we do things right, we can use the same diagrams (perhaps with slight redactions) for onboarding, Architecture Decision Records, threat modelling and other workshops, and DD and audits. Wouldn’t that be nice?

Some principles

Below is a quick list of my principles. If you do most of these, we can be friends (if you don’t, I’ll still be your friend, but I’ll judge you for your life choices).

Generate the data. The diagram should rely on data, and the data should be autogenerated. Generate it during CI, export it from your cloud or hosting provider, do what you need to do to get to the truth.
Make it text-first. This makes the diagram diffable over time. You see what components were added, removed, changed, moved. It’s awesome.
Timestamp it. Embed some meta-information (commit hash, generation time, etc.) in the diagram.
Regenerate it on change. Whenever the architecture changes, the diagram has to change. Ideally, regenerate it after CI has run (I always love to see a docs stage in CI definitions).
Use redactions to make it publishable. If you want to share it somewhere, redact sensitive information first (easy when you generate a bunch of data in a sensible format!). This way you can share it in external documentation or during DDs (thank you!).
Multiple views over the model. A model can encapsulate the whole system, but a view cannot. Give me multiple pictures (see below for a list of views).
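To make the first few principles concrete, here is a minimal Python sketch that renders a small model as Mermaid flowchart text and stamps it with the commit hash and generation time. The model shape (`nodes`/`edges` dicts) and the `render_mermaid` function are my own assumptions for illustration, not a standard format; in GitLab CI you could feed it the predefined `CI_COMMIT_SHA` variable.

```python
from datetime import datetime, timezone

# Hypothetical model shape: nodes with an id and label, edges with endpoints and a label.
MODEL = {
    "nodes": [
        {"id": "web", "label": "Web frontend"},
        {"id": "api", "label": "Orders API"},
        {"id": "db", "label": "Orders DB"},
    ],
    "edges": [
        {"from": "web", "to": "api", "label": "HTTPS"},
        {"from": "api", "to": "db", "label": "SQL/TLS"},
    ],
}


def render_mermaid(model: dict, commit: str, generated_at: datetime) -> str:
    """Render the model as Mermaid flowchart text with provenance metadata."""
    lines = [
        "flowchart LR",
        # Lines starting with %% are Mermaid comments: the metadata travels
        # with the text but does not show up in the rendered picture.
        f"  %% commit: {commit}",
        f"  %% generated: {generated_at.isoformat()}",
    ]
    # Sort everything so the output is deterministic and diffs stay small.
    for node in sorted(model["nodes"], key=lambda n: n["id"]):
        lines.append(f'  {node["id"]}["{node["label"]}"]')
    for edge in sorted(model["edges"], key=lambda e: (e["from"], e["to"])):
        lines.append(f'  {edge["from"]} -->|{edge["label"]}| {edge["to"]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder commit; in GitLab CI you might read os.environ["CI_COMMIT_SHA"].
    print(render_mermaid(MODEL, "abc1234", datetime.now(timezone.utc)))
```

Sorting keeps the output stable, so a diff shows real changes rather than reordering. The timestamp line changes on every run, though; if that gets noisy, compare diagrams while ignoring the `%%` lines.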

Now, if we had a tool that made it easy and cheap to create views of your system, this would all be so simple…

Some anti-patterns

Of course, all principles come with their inverse. Here are some things I don’t want to see or do:

Don’t draw the diagram by hand. I have no idea how accurate it truly is, and I always discover some blind spots that erode trust in the model.
Don’t just commit pictures. I want the generation data and metadata, so I can diff things over time.
No one-diagram-to-rule-them-all. No architecture can be captured from all angles in one diagram. Generate more than one (more information in the next section).
Don’t just draw entities. I need to see data flow and boundaries to understand the system.
Don’t build it all at once. What I describe here is a fully specified system. You don’t need to start with that, though, and you probably shouldn’t. Start with a simple import from K8s, AWS, or the like, and iterate from there.
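Because the model is committed as data rather than as a picture, diffing it over time is cheap. A sketch, assuming the same kind of hypothetical `nodes`/`edges` model with stable `id`s and at most one edge per (from, to) pair; the two snapshots could be the model JSON at two different commits, loaded with `json.load`.

```python
def index(model: dict) -> tuple[dict, dict]:
    """Key nodes by id and edges by (from, to) so snapshots can be compared.

    Assumes at most one edge per (from, to) pair; duplicates would overwrite.
    """
    nodes = {n["id"]: n for n in model["nodes"]}
    edges = {(e["from"], e["to"]): e for e in model["edges"]}
    return nodes, edges


def compare(before: dict, after: dict) -> dict:
    """Classify keys as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            k for k in before.keys() & after.keys() if before[k] != after[k]
        ),
    }


def diff_models(old: dict, new: dict) -> dict:
    """Diff nodes and edges of two model snapshots."""
    old_nodes, old_edges = index(old)
    new_nodes, new_edges = index(new)
    return {
        "nodes": compare(old_nodes, new_nodes),
        "edges": compare(old_edges, new_edges),
    }


if __name__ == "__main__":
    old = {"nodes": [{"id": "web", "label": "Web"}, {"id": "api", "label": "API"}],
           "edges": [{"from": "web", "to": "api"}]}
    new = {"nodes": [{"id": "api", "label": "Orders API"}, {"id": "queue", "label": "Queue"}],
           "edges": [{"from": "api", "to": "queue"}]}
    print(diff_models(old, new))
```

This is the "what was added, removed, changed" view from the principles, computed from data instead of squinting at two PNGs.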

The views we both need

Whether we look at the system from the outside or live inside it, we need the same views. For the purposes of this blog post, I came up with some snappy names, but I suck at naming, so go by their descriptions.

Context: external actors, major systems, trust boundaries.
Container: services, data stores, queues, external SaaS.
Component: internals of the service (for your most critical services).
Data flow: data flows, classifications, trust zone crossings. Think DFD.
Network segmentation: VPC/VNet, subnets, SG/NSG/NetworkPolicy edges, ingress/egress points.
Auth flows: OIDC/OAuth exchanges, token scopes, service-to-service auth.
Data persistence: stores, retention, RPO/RTO, backup/restore, encryption.
Runtime topology: deployment graph (K8s namespaces, workloads, ingresses).

These first three are essentially the C4 model views, and we just add some other useful things on top.
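One way to get multiple views without maintaining multiple hand-made diagrams is to tag each element of a single model with the views it belongs to and filter at render time. The sketch below assumes hypothetical `views` and `zone` fields on each node; `zone_crossings` picks out edges whose endpoints sit in different trust zones, which is roughly what a data flow view needs to highlight.

```python
# Hypothetical model: each node lists the views it appears in and its trust zone.
MODEL = {
    "nodes": [
        {"id": "user", "zone": "internet", "views": ["context", "dataflow"]},
        {"id": "api", "zone": "dmz", "views": ["context", "container", "dataflow"]},
        {"id": "db", "zone": "private", "views": ["container", "dataflow", "persistence"]},
    ],
    "edges": [
        {"from": "user", "to": "api", "data": "PII"},
        {"from": "api", "to": "db", "data": "PII"},
    ],
}


def view(model: dict, name: str) -> dict:
    """Project the model onto one view: its nodes plus the edges between them."""
    nodes = [n for n in model["nodes"] if name in n["views"]]
    visible = {n["id"] for n in nodes}
    # Drop edges whose other end is not part of this view.
    edges = [e for e in model["edges"] if e["from"] in visible and e["to"] in visible]
    return {"nodes": nodes, "edges": edges}


def zone_crossings(model: dict) -> list[tuple[str, str]]:
    """Return (from, to) pairs for edges that cross a trust zone boundary."""
    zone = {n["id"]: n["zone"] for n in model["nodes"]}
    return [(e["from"], e["to"]) for e in model["edges"] if zone[e["from"]] != zone[e["to"]]]


if __name__ == "__main__":
    print(view(MODEL, "container"))
    print(zone_crossings(view(MODEL, "dataflow")))
```

The point of the design is that every view is derived from the same model, so they cannot drift apart from each other.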

I know that some of these are easier to generate than others, and creating all of these accurately and automatically isn’t always feasible. This is a best-in-class write-up, and for some systems it might remain aspirational. Try to get as close to it as possible (and then back off when the asymptotes kick in).

Data sources and what to generate

Here are some things that might serve as data sources for your diagrams. These are so specific to the technologies you use that I won’t even try to cover them all. Take them as examples, not as a complete list.

Infra: Terraform/Terragrunt plan graphs, cloud inventory, K8s manifests/resources.
App surface: OpenAPI/AsyncAPI/gRPC/proto; service discovery; ingress rules.
Auth: IdP config (e.g., Keycloak realms/clients/roles), OPA policies, service RBAC, IAM roles and rules.
Data: DB schemas, dbt DAG, lineage (OpenLineage/Marquez), S3/bucket inventories.
Runtime: tracing/telemetry (OpenTelemetry/Jaeger) to call graphs/sequence diagrams.
Policies: network policies, security groups, firewall rules for segmentation views.
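As an example of turning telemetry into a call graph: given spans with a parent reference, any parent/child pair that belongs to two different services is a cross-service call. The span records here are a simplified, hypothetical shape; real OpenTelemetry spans carry a `parent_span_id` and a `service.name` resource attribute, which you would map onto these fields first.

```python
from collections import Counter

# Simplified, hypothetical span records (mapped from OpenTelemetry data beforehand).
SPANS = [
    {"span_id": "1", "parent_id": None, "service": "web"},
    {"span_id": "2", "parent_id": "1", "service": "api"},
    {"span_id": "3", "parent_id": "2", "service": "api"},
    {"span_id": "4", "parent_id": "3", "service": "db"},
    {"span_id": "5", "parent_id": "2", "service": "queue"},
]


def call_edges(spans: list[dict]) -> Counter:
    """Count caller -> callee service pairs observed in the spans."""
    by_id = {s["span_id"]: s for s in spans}
    edges: Counter = Counter()
    for span in spans:
        parent = by_id.get(span["parent_id"])
        # Spans within the same service are internal work, not edges.
        if parent is not None and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges


if __name__ == "__main__":
    for (caller, callee), count in sorted(call_edges(SPANS).items()):
        print(f"{caller} -> {callee}: {count}")
```

Traces are usually sampled, so treat the counts as relative call volume, not absolute numbers. The edges themselves are what feed the container and data flow views.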

Hopefully, you won’t need all of these data sources. But whatever you use, consolidate it into a single format (JSON or YAML are usually good choices) that renderers then consume to produce pictures (or Mermaid/PlantUML/whatever diagrams).
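A minimal consolidation sketch: each source gets a small adapter that emits nodes, and the adapters’ outputs are merged by `id` into one JSON document. The K8s adapter expects manifests that are already parsed (for example with PyYAML’s `yaml.safe_load_all`); the inventory record shape is hypothetical, so swap in whatever your provider actually exports.

```python
import json


def from_k8s(manifests: list[dict]) -> list[dict]:
    """Turn parsed K8s manifests into workload nodes, keyed as namespace/name."""
    nodes = []
    for manifest in manifests:
        if manifest.get("kind") == "Deployment":
            meta = manifest["metadata"]
            # Assumption: manifests without a namespace end up in "default".
            namespace = meta.get("namespace", "default")
            nodes.append({"id": f'{namespace}/{meta["name"]}', "kind": "workload", "source": "k8s"})
    return nodes


def from_inventory(resources: list[dict]) -> list[dict]:
    """Hypothetical cloud inventory export: records with "type" and "name"."""
    return [{"id": r["name"], "kind": r["type"], "source": "cloud"} for r in resources]


def consolidate(*node_lists: list[dict]) -> dict:
    """Merge nodes from all adapters by id; later sources add or override fields."""
    merged: dict[str, dict] = {}
    for nodes in node_lists:
        for node in nodes:
            merged.setdefault(node["id"], {}).update(node)
    return {"nodes": [merged[key] for key in sorted(merged)]}


def to_json(model: dict) -> str:
    """Stable serialization (sorted keys, fixed indent) keeps git diffs readable."""
    return json.dumps(model, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    manifests = [
        {"kind": "Deployment", "metadata": {"name": "orders-api", "namespace": "shop"}},
        {"kind": "Service", "metadata": {"name": "orders-api", "namespace": "shop"}},
    ]
    inventory = [{"type": "postgres", "name": "orders-db"}]
    print(to_json(consolidate(from_k8s(manifests), from_inventory(inventory))))
```

Renderers then only ever read this one document, which is what makes it easy to add or swap data sources later.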

A production pipeline sketch

I sketched an example GitLab pipeline (full disclosure: I used an LLM for a lot of this, so you’ll probably have to either fix most of this or just take it as pseudocode) in this gist. It showcases all the steps and generations.

This might seem like a lot (and it is!), but keep in mind that this is the maximalist approach, and you don’t have to start with the full package. Start with something simple and work your way up. Build an MVP in a day and then iterate (context and data flow are two good ones to start with).
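One small piece worth having early is a check in the docs stage that fails when a committed diagram no longer matches a freshly generated one. This is a sketch under my own assumptions (Mermaid output, metadata kept in `%%` comment lines, two file paths passed on the command line), not something taken from the gist; auto-committing the regenerated file is an equally valid alternative.

```python
import sys
from pathlib import Path


def significant_lines(text: str) -> list[str]:
    """Ignore blank lines and Mermaid %% comments (commit hash, timestamp)."""
    return [
        line.rstrip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("%%")
    ]


def is_stale(committed: str, regenerated: str) -> bool:
    """A committed diagram is stale if its content differs from a fresh render."""
    return significant_lines(committed) != significant_lines(regenerated)


def main(committed_path: str, regenerated_path: str) -> int:
    committed = Path(committed_path).read_text(encoding="utf-8")
    regenerated = Path(regenerated_path).read_text(encoding="utf-8")
    if is_stale(committed, regenerated):
        print(f"{committed_path} is out of date; regenerate and commit it.", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python check_diagram.py docs/context.mmd build/context.mmd (hypothetical paths)
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Ignoring the comment lines means a new commit hash or timestamp alone does not count as drift; only a change in the actual diagram does.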

Metadata overlays

Some metadata will probably not be captured by these generators. We either inline the metadata about each service in the diagram, or we link to the service documentation and/or source from inside the diagram.

For each node, we need owners, SLAs, data privacy and authZ information, external dependencies, and so on somewhere. For each edge, we need protocol, transport, auth, rate, etc. Not all of this should be inline; it would crowd the diagram. Link to external sources of information wherever possible.
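A sketch of that split in Mermaid: the owner goes inline, everything else sits behind a `click` link to the service docs, and edges show only the protocol while auth and rate stay in the model. The field names and `docs.example.com` URLs are placeholders, and some renderers disable `click` interactions depending on Mermaid’s security level.

```python
# Hypothetical model with overlay metadata on nodes and edges.
MODEL = {
    "nodes": [
        {"id": "api", "label": "Orders API", "owner": "team-orders",
         "docs": "https://docs.example.com/services/orders-api"},
        {"id": "db", "label": "Orders DB", "owner": "team-orders",
         "docs": "https://docs.example.com/stores/orders-db"},
    ],
    "edges": [
        {"from": "api", "to": "db", "protocol": "PostgreSQL/TLS", "auth": "IAM", "rate": "200 rps"},
    ],
}


def render_with_overlays(model: dict) -> str:
    """Render nodes with an inline owner and a docs link; edges show only the protocol."""
    lines = ["flowchart LR"]
    for n in model["nodes"]:
        # Only the owner goes inline; SLAs, privacy, authZ, etc. live behind the link.
        lines.append(f'  {n["id"]}["{n["label"]}<br/>owner: {n["owner"]}"]')
        lines.append(f'  click {n["id"]} "{n["docs"]}" "Service docs"')
    for e in model["edges"]:
        # Auth and rate stay in the model (and the linked docs) to avoid clutter.
        lines.append(f'  {e["from"]} -->|{e["protocol"]}| {e["to"]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_with_overlays(MODEL))
```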

A quick note on redactions

There will be a point in a company’s lifetime when these diagrams need to be shared externally. Maybe some of them are part of the external documentation. Maybe they’re being handed over to a potential buyer or investor.

Some data will then need to be redacted. The nice thing about a data- and model-first approach, though, is that redactions are easy to implement: they are just filters.

And, since you have multiple views of the system, you can additionally control sharing by choosing the view. You can throw out the nitty-gritty like the runtime topology or network segmentation without any additional work.
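Here is what "just filters" can look like: a function that drops nodes at certain sensitivity levels, strips sensitive fields from the rest, and removes any edge that would otherwise dangle. The `sensitivity` levels and field names are hypothetical; use whatever your model actually carries.

```python
# Hypothetical model with a per-node sensitivity level and some internal details.
MODEL = {
    "nodes": [
        {"id": "api", "sensitivity": "public", "host": "10.0.1.12"},
        {"id": "db", "sensitivity": "public", "host": "10.0.2.5", "account_id": "123456789012"},
        {"id": "break-glass", "sensitivity": "internal"},
    ],
    "edges": [
        {"from": "api", "to": "db"},
        {"from": "break-glass", "to": "db"},
    ],
}

SENSITIVE_FIELDS = frozenset({"host", "ip", "account_id"})


def redact(model: dict, drop_levels=frozenset({"internal"}), fields=SENSITIVE_FIELDS) -> dict:
    """Return a new model without sensitive nodes, sensitive fields, or dangling edges."""
    nodes = [
        {k: v for k, v in n.items() if k not in fields}
        for n in model["nodes"]
        if n.get("sensitivity") not in drop_levels
    ]
    kept = {n["id"] for n in nodes}
    edges = [dict(e) for e in model["edges"] if e["from"] in kept and e["to"] in kept]
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    print(redact(MODEL))
```

This is a denylist, so nodes without a `sensitivity` field slip through. For anything leaving the company, an allowlist (keep only nodes explicitly marked `public`) is the safer default. Since the function takes a model and returns a new one, it composes with whatever view selection you already do.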

Fin

In this blog post, we looked at how to build useful architecture diagrams for internal and external use. We talked about principles and anti-patterns, views and how to create and manage them.

I hope you took away something from this blog post that makes your diagrams more useful. They are the first view into your system, and they are what you’ll see when you close your eyes to think about your architecture.