Customers, partners, and more recently investors ask the question “How is one Intent-Based Networking (IBN) offering different from the others?” More often than not, I hear answers that sound subjective, descriptive, and open to interpretation. What is needed is a tangible, fact-based nomenclature that can help you reason about the maturity of IBN solutions. Using this nomenclature, we should be able to map IBN solutions to an IBN maturity level, starting with Level 0 (low maturity/incomplete) and going up to Level 3, which enables a fully compliant IBN solution (mature/complete).
Apstra introduced the notion of intent-based networking and Self-Operating Networks in June of 2016, and in our blog “Intent Based Networking: What Is It” we provided a definition of IBN, listing capabilities that a complete IBN system needs to provide. The goal is to classify maturity levels of various IBN implementations and enable network operators to cut through the marketing hype and make sound buying decisions. With that in mind, let me introduce the IBN maturity levels.
Level 0 IBN: Basic Automation
As IBN matures it is expected that there will be early implementations that do not have all the required capabilities initially implemented. Capabilities that may be present in these Level 0 systems include the ability to:
Generate device configurations from declarative specifications. For example: scripts running Ansible modules or other declarative libraries such as NAPALM.
Support a heterogeneous infrastructure.
Ingest real-time network status in a protocol- and transport-agnostic way.
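To make the first capability concrete, here is a minimal sketch of generating a device configuration from a declarative specification. The spec layout, the `render_config` helper, and the vendor-neutral configuration syntax are all assumptions made for illustration; they are not taken from Ansible, NAPALM, or any particular product.

```python
from string import Template

# Hypothetical declarative spec: it states what the device should look like,
# not the sequence of commands needed to get there.
SPEC = {
    "hostname": "leaf1",
    "interfaces": [
        {"name": "eth1", "description": "to spine1", "ip": "10.0.0.1/31"},
        {"name": "eth2", "description": "to spine2", "ip": "10.0.0.3/31"},
    ],
}

# Illustrative, vendor-neutral syntax (an assumption). A heterogeneous
# deployment would keep one template set per network operating system.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    "  description $description\n"
    "  ip address $ip\n"
)


def render_config(spec: dict) -> str:
    """Render a full device configuration from a declarative spec."""
    parts = [f"hostname {spec['hostname']}\n"]
    for interface in spec["interfaces"]:
        parts.append(INTERFACE_TEMPLATE.substitute(interface))
    return "".join(parts)


if __name__ == "__main__":
    print(render_config(SPEC))
```

Note what this sketch does not do: once the configuration is rendered, nothing ties the spec to what the network is actually doing. That gap is what keeps this kind of automation at Level 0.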
The main capability missing at Level 0, which is the requirement for the next level, is the presence of a single source of truth that contains both intent and operational network state. This is the key requirement that enables reasoning whether or not the intent has been met and is therefore a fundamental aspect of a mature IBN implementation. Most IBN companies today are Level 0.
Level 1 IBN: Single Source of Truth
An implementation classified as Level 1 implements a single source of truth containing the intent and the network operational state. It contains data and state artifacts related to all aspects of a network service lifecycle: design, build, deploy, and validate. It is very important to emphasize that this single source of truth contains both the intent and the context-rich operational network state. This clarification is important because, in the absence of stored intent, it would be impossible to validate that the business intent has been met. In the absence of context, it would be impossible to reason about the impact of operational failures on business objects. It would also be impossible to reason about new business rule changes (for example, can this change be implemented without impacting existing business objects and services?). To make an analogy, if you were playing in a symphonic orchestra you would look at the conductor (intent), as well as listen to other players (operational state), in order to produce the most fitting performance of your instrument.
Next, how do you test whether an implementation has a single source of truth? It is fairly simple: there should be an API that allows you to query the single source of truth to get the answers. For example, there should be an API that answers the question “What is the link utilization on all the links that carry traffic of customer Pepsi?” or “Which customers will be impacted if link x fails or gets congested?”
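As a rough illustration of what such an API could look like, the sketch below keeps intent (which customers ride on which links) and operational state (link utilization) in one in-memory store and answers both questions with a single call each. The `SourceOfTruth` class, its method names, the link names, the utilization figures, and the second customer "Acme" are hypothetical; this is not the AOS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    utilization: float    # operational state: fraction of capacity in use
    customers: frozenset  # intent: customers whose traffic uses this link


class SourceOfTruth:
    """Toy store that holds intent and operational state side by side."""

    def __init__(self, links):
        self._links = {link.name: link for link in links}

    def link_utilization_for(self, customer):
        """Utilization of every link that carries the customer's traffic."""
        return {
            link.name: link.utilization
            for link in self._links.values()
            if customer in link.customers
        }

    def customers_impacted_by(self, link_name):
        """Customers affected if the named link fails or gets congested."""
        return sorted(self._links[link_name].customers)


# Made-up example data.
LINKS = [
    Link("spine1-leaf1", 0.62, frozenset({"Pepsi", "Acme"})),
    Link("spine1-leaf2", 0.15, frozenset({"Acme"})),
    Link("spine2-leaf1", 0.40, frozenset({"Pepsi"})),
]

if __name__ == "__main__":
    sot = SourceOfTruth(LINKS)
    print(sot.link_utilization_for("Pepsi"))
    print(sot.customers_impacted_by("spine1-leaf1"))
```

The point of the sketch is only that one query spans both kinds of data. A production system would need a far richer model, but the test remains the same: can the question be answered with one call?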
Now imagine how you would answer these questions without a single source of truth. You would have to:
1. Consult a network map:
   a. Verify it reflects the current state of the network physically
   b. Verify the operational state of all the links
2. Determine how the tenant is configured:
   a. Where do the end points reside
   b. Overlay the end points on the network map in step 1
3. Collect all the data:
   a. From the union of step 1 and 2 check the NMS for all needed counters
   b. Do a calculation by hand of the effect of the reflowing of traffic for a link failure
… and stitch together the answer yourself, or write a script, or hire professional services to connect the dots. The need to stitch together the answer yourself indicates you are dealing with a Level 0 system. Getting the answer should be the job of IBN, especially as it forms the foundation for getting to the next level of maturity.
Another angle is to look at what you are missing if you don’t have a single source of truth. When the need to troubleshoot a problem arises, your experts will not be able to ask some IBN solutions the right questions and will have to dig into the problem themselves, which usually involves manual, repetitive, mundane tasks that waste time (and money) and cause dissatisfaction. If you have only operational state and no intent, you will not be able to tell what is right and what is wrong. Interface “down” may be a perfectly fine status if your intent was not to have a cable plugged into it. And interface “up” may be an indication that an intruder has plugged into your network. How do you know, even for such a simple example, what is right and what is wrong? You cannot look at a device configuration for an answer, as someone may have entered something incorrectly. And you cannot always rely on control plane protocols, as they may have bugs. What you see on the device is not always what you meant to be there.
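The interface example can be sketched in a few lines. The interface names and the `find_anomalies` helper below are hypothetical; the only point is that "up" or "down" becomes right or wrong only once it is compared against stored intent.

```python
# Intent: what the operator meant. eth3 is deliberately left without a cable.
INTENT = {"eth1": "up", "eth2": "up", "eth3": "down"}

# Operational state: what the device reports right now (made-up values).
OBSERVED = {"eth1": "up", "eth2": "down", "eth3": "up"}


def find_anomalies(intent, observed):
    """Return (interface, expected, actual) for every deviation from intent."""
    anomalies = []
    for name in sorted(intent):
        expected = intent[name]
        actual = observed.get(name, "unknown")
        if actual != expected:
            anomalies.append((name, expected, actual))
    return anomalies


if __name__ == "__main__":
    for name, expected, actual in find_anomalies(INTENT, OBSERVED):
        print(f"{name}: expected {expected}, observed {actual}")
```

Here eth2 is flagged because it is down when it should be up, and eth3 is flagged because it is up when nothing should be plugged in. With operational state alone, neither case could be told apart from normal operation.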
AOS has been built from day one with this data-centric and single source of truth paradigm as a fundamental guiding principle. It is not an afterthought. Integrations built around message buses are not data-centric. They are message-centric. For analysis and comparison between the two, along with the limitations of message-based systems when applied to an IBN domain space, check out our AOS Architecture white paper.
In summary, Level 1 introduces the notion of a single source of truth that ties intent and operational state together and can therefore be used to reason about whether the intent has been met. Consequently, in the absence of a single source of truth, the solution has little to do with IBN.
Another way to look at this: Level 1 can give you answers to important questions about the state of your intent and your infrastructure. The next level is having IBN ask the right questions on your behalf and do it at the right time.
Level 2 IBN: Real-time Change Validation
Level 2 addresses an important capability: real-time validation that the intent is met.
Change is inevitable, and the fundamental task on IBN’s plate is how to deal with it. One set of changes comes from the operator in the form of a business rule change or policy change. Even more challenging ones (as you don’t control them) are changes coming from the infrastructure in terms of operational status changes or failures. If something failed and you didn’t ask a question, you will not know about it.
The real-time aspect comes in the form of real-time notification in response to a subscription that an event of interest took place. If an IBN implementation is doing batch processing at scheduled intervals it is not real-time. Batch processing may be perfectly suitable for some use cases and completely inadequate for others.
Subscription is also a mechanism for scaling the complexity by partitioning the problem domain into manageable chunks. You don’t want everyone notified of every change. You want specialized behavior implemented by its own module and by subscribing only to a subset of events that are relevant to this specific behavior.
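A minimal sketch of this subscription idea follows, assuming a simple topic-based store. The `StateStore` class and the topic names are hypothetical, and the sketch is far simpler than the graph-based mechanism AOS is described as using. Each specialized behavior registers a callback for the one kind of event it cares about and is notified only when that state actually changes.

```python
from collections import defaultdict


class StateStore:
    """Toy state store that notifies subscribers at the moment of change."""

    def __init__(self):
        self._state = {}
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register callback(key, old_value, new_value) for one topic."""
        self._subscribers[topic].append(callback)

    def update(self, topic, key, value):
        """Record a value and notify the topic's subscribers if it changed."""
        old = self._state.get((topic, key))
        if old == value:
            return  # nothing changed, so nobody is notified
        self._state[(topic, key)] = value
        for callback in self._subscribers.get(topic, []):
            callback(key, old, value)


if __name__ == "__main__":
    store = StateStore()
    store.subscribe(
        "link_status",
        lambda key, old, new: print(f"link {key}: {old} -> {new}"),
    )
    store.update("link_status", "leaf1:eth1", "up")
    store.update("bgp_session", "leaf1:spine1", "established")  # no subscriber
    store.update("link_status", "leaf1:eth1", "down")
```

The link-status module never sees the BGP update, and no polling interval is involved: the callback runs as part of the update itself.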
While real-time is the meat of this requirement, “programmatic reasoning” also deserves attention as it is about how the validation is implemented. Let’s give a few examples to introduce this concept, in the context of validation.
Let’s say a business rule change like ‘add a new tenant and/or security zone’ was submitted to an IBN system. Before making the change, the IBN system may want to query the following:
Does the request conflict with any existing policy?
What resources (IPs, VNIs) should be used?
Another example is an operator who wants to put one of the leaf switches into maintenance mode. Before performing the request the IBN system may want to check:
Is this an appropriate time window for this action?
Are there any other leafs in maintenance mode? How many?
Is there a mission critical app running on servers attached to this leaf switch?
All of the points noted involve some sort of query against the single source of truth, containing resource allocations, policies, and operational state. The query response is processed by a callback function. “Programmatic reasoning” is a requirement that all these specialized behaviors are implemented using a standard, repeatable pattern. While most systems can be implemented in software, we are insisting on the need for testable, maintainable software. How do you ensure this with your IBN vendor? Simply ask to add a few specialized behaviors and see how repeatable and digestible that process actually is.
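The maintenance-mode example can be sketched as a list of checks that all follow the same shape: a query against the single source of truth, plus a callback that evaluates the response. The state layout, the `Check` type, and the limit of one other leaf in maintenance are assumptions made for illustration.

```python
from typing import Any, Callable, NamedTuple


class Check(NamedTuple):
    question: str
    query: Callable[[dict, str], Any]  # reads the single source of truth
    evaluate: Callable[[Any], bool]    # callback: is the answer acceptable?


# Hypothetical policy: at most one *other* leaf may already be in maintenance.
MAX_OTHER_LEAFS_IN_MAINTENANCE = 1

# Made-up snapshot of policies and operational state.
STATE = {
    "maintenance_window_open": True,
    "leafs_in_maintenance": ["leaf3"],
    "critical_apps": {"leaf1": [], "leaf2": ["billing"]},
}

CHECKS = [
    Check(
        "Is this an appropriate time window?",
        lambda state, leaf: state["maintenance_window_open"],
        lambda is_open: is_open is True,
    ),
    Check(
        "How many other leafs are in maintenance mode?",
        lambda state, leaf: len(
            [name for name in state["leafs_in_maintenance"] if name != leaf]
        ),
        lambda count: count <= MAX_OTHER_LEAFS_IN_MAINTENANCE,
    ),
    Check(
        "Is a mission-critical app running on servers attached to this leaf?",
        lambda state, leaf: state["critical_apps"].get(leaf, []),
        lambda apps: not apps,
    ),
]


def failed_checks(state, leaf):
    """Return the questions whose answers block the maintenance request."""
    return [
        check.question
        for check in CHECKS
        if not check.evaluate(check.query(state, leaf))
    ]


if __name__ == "__main__":
    for leaf in ("leaf1", "leaf2"):
        print(leaf, failed_checks(STATE, leaf) or "ok to proceed")
```

Adding a new specialized behavior means appending one more `Check`, not writing a new integration. That repeatability is what the "programmatic reasoning" requirement asks for.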
So what happens in the absence of these capabilities? If things go wrong and you rely on your batch processing analytics to tell you 10 minutes later what happened, it may be too late. If you build a fragile integration to close the loop between configuration and operational state, it will likely break in the presence of change if you don’t have a single source of truth and pub-sub support. Will this integration tell you when things fail, when an administrator changes a policy or modifies a service, or when the infrastructure resources that your prized customer “Pepsi” traverses change, so that you can guarantee a service you can be proud of? Can the thresholds that trigger anomalies be set dynamically based on the current operational status, without your intervention? When you put your spine into maintenance mode you expect traffic to increase on the remaining devices, but can your IBN system recognize that and adjust the thresholds? Or do you turn off monitoring in these situations to avoid a flood of these “expected” anomalies and likely miss the real ones? Can you have anomalies trigger data collection and analytics that give you deep insights by collecting and analyzing data when (real-time) and where (leveraging pub-sub) appropriate, something you couldn’t afford to do all the time on all your resources?
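The threshold question lends itself to a small worked example. The sketch below assumes that traffic is spread evenly across the active spines, so taking some of them out of service scales the expected load on the rest by total/active. The function names, the even-spread assumption, and the 10-point tolerance are all illustrative.

```python
def expected_utilization(baseline, total_spines, spines_in_maintenance):
    """Expected per-spine utilization, assuming load is spread evenly.

    baseline is the per-spine utilization when all spines are active.
    """
    active = total_spines - spines_in_maintenance
    if active <= 0:
        raise ValueError("at least one spine must remain active")
    return baseline * total_spines / active


def is_anomalous(observed, baseline, total_spines, spines_in_maintenance,
                 tolerance=0.10):
    """Flag utilization that exceeds the state-aware expectation."""
    expected = expected_utilization(
        baseline, total_spines, spines_in_maintenance
    )
    return observed > expected + tolerance


if __name__ == "__main__":
    # Four spines at 30% each; one goes into maintenance and a
    # remaining spine is observed at 45%.
    print(is_anomalous(0.45, 0.30, 4, spines_in_maintenance=0))
    print(is_anomalous(0.45, 0.30, 4, spines_in_maintenance=1))
```

With all four spines active, 45% is well above the expected 30% and is flagged. With one spine in maintenance, the expectation rises to roughly 40%, so the same reading is within tolerance and monitoring can stay switched on.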
AOS has supported a powerful pub-sub mechanism layered on top of a graph-based representation from day one, and it is a pillar of our support for future innovation. Message-based pub-sub doesn’t even scratch the surface of the complexity required to support the rich set of relationships between the intent, resources, services, policies and capabilities that exist in modern infrastructures. And pure graph databases don’t support a real-time, granular pub-sub mechanism.
Programmatic reasoning is also an enabler for creating more sophisticated functions such as root cause analysis and identification of complex symptoms (“Is my total fabric ECMP imbalanced?”). Being at least Level 2 is the only reasonable way to implement IBN requirements in a scalable and extensible way.
Level 3 IBN: Self-Operation
While validation closes the loop between the intent and operational state by providing observability, the last step is doing something about what is being observed, if and when it is applicable. This ultimate level requires corrective actions and takes IBN on a path to self-operating networks. This step is absolutely impossible to tackle if one has not built solid foundations in Levels 1 and 2. This step does not appear to be a huge technological challenge given the supporting features in the first three levels, but it is expected that current operational practices and (understandably) human reluctance to relinquish control to software will throttle the adoption. As the maturity of IBN solutions increases, so will the acceptance of the capabilities offered at this level.
AOS is at present a Level 2 IBN solution, for reasons that have nothing to do with technology, as the technology capability is there. Technological advances alone are not a catalyst for change. As an industry, it is our collective response to the opportunities presented by technology in general that will drive the transition to self-operating networks. As this digital transformation takes hold in human and organizational behavior, so will self-operating networks come to life.
Completeness and Scale
In addition to solutions being classified according to maturity model levels there may be other constraints present in the solution that need to be taken into consideration. Constraints you should give second thought to when evaluating an IBN solution include:
Solution may be vendor specific
Solution may be tailored to a specific reference architecture with no intent to make it generally applicable
Solution may focus on a subset of network service lifecycle phases (a subset of design, build, deploy, validate)
Solution may be focused on a specific function (security, reachability, etc.)
Solution’s single source of truth may be focused on intent only or operational state only
Solution may be applicable only at limited scale
If any of these constraints are present they could be added as a qualifier to the classification, for example:
Level 1 (reachability only)
Level 2 (vendor x only, deployment only)
When constraints are present it may be important to assess how likely they are to be removed in the future as the implementation evolves. In other words, is the implementation likely to remain a silo applicable to a constrained domain space, or is it going to break the silo and remove the constraint? Also note that putting multiple silos together as a “bundle” usually reduces the maturity level of the composite to Level 0, as bundling fundamentally breaks a “single source of truth”.
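The classification rules, including qualifiers and the effect of bundling, can be summarized in a short sketch. The `Offering` fields are a hypothetical reading of the criteria in this article and not a formal scoring tool; in particular, the sketch treats a store that holds only intent or only operational state as falling short of a single source of truth.

```python
from dataclasses import dataclass


@dataclass
class Offering:
    single_store: bool = False              # one queryable store, not a bundle
    stores_intent: bool = False
    stores_operational_state: bool = False
    realtime_validation: bool = False       # Level 2 capability
    corrective_actions: bool = False        # Level 3 capability
    qualifiers: tuple = ()                  # e.g. ("reachability only",)


def maturity_level(offering):
    """Map capabilities to a level; each level requires the ones below it."""
    has_single_source_of_truth = (
        offering.single_store
        and offering.stores_intent
        and offering.stores_operational_state
    )
    if not has_single_source_of_truth:
        return 0
    if not offering.realtime_validation:
        return 1
    if not offering.corrective_actions:
        return 2
    return 3


def classify(offering):
    """Return a label such as 'Level 1 (reachability only)'."""
    label = f"Level {maturity_level(offering)}"
    if offering.qualifiers:
        label += f" ({', '.join(offering.qualifiers)})"
    return label


if __name__ == "__main__":
    silo = Offering(single_store=True, stores_intent=True,
                    stores_operational_state=True,
                    qualifiers=("reachability only",))
    # A bundle of silos holds intent and state but not in one store.
    bundle = Offering(stores_intent=True, stores_operational_state=True)
    print(classify(silo))
    print(classify(bundle))
```

The bundle lands at Level 0 even though each of its parts stores useful data, because no single store can answer a question that spans them.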
Single vendor solutions (including “white box only” solutions) limit your freedom of choice. You may be perfectly happy today with your trusted vendor, be satisfied with their pace of innovation and support, yet want the option to switch to another vendor in the future for the same reasons. Or add a white box offering. Or you may enjoy the openness of white box but want to mix it strategically with the quality and support of a trusted brite box vendor for cost and performance trade-offs. But where that trade-off line lies should be your decision.
More often than not, constraints indicate that certain assumptions have already been made early in the design and modeling process, and removing these constraints later may be a challenge. You need to understand if an IBN solution is based on a flexible underlying platform that can evolve hand in hand with innovation and new reference designs.
Therefore, understanding the ability of an implementation to evolve is crucial.
You can make sense of the different capabilities of the various offerings that claim IBN by mapping each into one of the levels we have identified. Once you fit an offering into a level, you can understand the limitations from a capability standpoint. The next step is evaluating an offering on how complete and scalable the offering is at that level. But if you are really looking to deploy IBN, as you should be, you can use the levels to think about what capabilities you need for your network and what scale and scope you require for these capabilities.
You may also want to consider the ability of the chosen solution to evolve. Phenotypic plasticity refers to the ability of the code in our DNA to express itself variably in response to changes in the environment. For example, in response to availability of more food and increased activity, mammals can gain muscle mass whereas snakes cannot. In turn, you should check your IBN vendor’s DNA for this kind of plasticity. If you buy yourself a snake, don’t expect it to grow muscle in response to a new requirement, though your vendor may try to sell you a bigger, more muscular snake. If you are not into building a snake farm, and if you need to adapt to a change in the environment and stay competitive by keeping your options open, buy yourself a mammal.
This guide should help you differentiate between snakes and mammals. Whether an IBN solution has a single source of truth is not an opinion, it is a fact. Whether a single source of truth contains both intent and operational state is not an opinion, it is a fact. Whether an IBN solution includes a real-time aspect and a granular pub-sub mechanism to give you the right answer at the right time is not an opinion but a fact. Whether an IBN solution has built-in support for a multi-vendor environment (and not just a theoretical one) is a fact. These levels are black and white. And these technical capabilities clearly map to your benefits as discussed in this blog. With this information you are empowered to be the judge yourself.
To learn more about IBN, or to speak with someone about where IBN offerings fit in the maturity model, please schedule a 1:1 briefing.
About the Author:
Sasha Ratkovic is the Co-Founder and CTO of Apstra, Inc. He is a thought leader in Intent-Based Analytics and a very early pioneer in Intent-Based Networking and Self-Operating Networks. He has deep expertise in domain abstraction and intent-driven automation. As CTO at Apstra, he drives the architecture direction of the Apstra Operating System (AOS) and is deeply involved in all aspects of the Apstra product and engineering efforts. Prior to Apstra, Sasha was a distinguished engineer at Juniper Networks where he led automation efforts for data center products. Sasha holds a Ph.D. in Electrical Engineering from UCLA.
Portions of this article by Sasha Ratkovic appeared previously in Network World on July 19, 2018 at https://www.networkworld.com/article/3286812/lan-wan/a-taxonomy-of-intent-based-networking-ibn.html