Aeon-ZTPS: Bootstrap a Multi-Vendor Network

Apstra Blog

Whether you have a data center, campus, or remote branch deployment, bootstrapping new equipment is not fun. The job is time consuming and highly error prone when done manually. People want a “human-free” solution, commonly called zero-touch provisioning (ZTP). The basic concept is simple: power on equipment from factory reset, get an IP address from DHCP, install the right version of the network operating system (NOS), and finally apply the device-specific configuration. Simple in theory, not so great in practice. Here’s why:
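As an illustrative sketch of the “simple in theory” part, the DHCP step often points a factory-reset device at a bootstrap script via standard options. The addresses, subnet, and filename below are hypothetical; an ISC dhcpd configuration fragment might look like:

```
subnet 192.0.2.0 netmask 255.255.255.0 {
  # Hand out addresses to factory-reset devices.
  range 192.0.2.100 192.0.2.200;
  # Option 66: the file server holding NOS images and scripts (hypothetical address).
  option tftp-server-name "192.0.2.10";
  # Option 67: the bootstrap script the device fetches and runs (hypothetical name).
  option bootfile-name "ztp-bootstrap.sh";
}
```

Everything past this step — validating the NOS version per platform, applying device-specific configuration, and handling failures — is where the “not so great in practice” part begins.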

Bamboozled by the Cloud

A few months back, I took the outrageous (to some) position of comparing the public cloud to a black hole. I claimed it was like a black hole because it would suck all the IT capabilities and expertise out of your organization, and eventually suck in your business as well. Or perhaps, like Marc Andreessen, I should say “eat”, as in: “software is eating entire industries.”

Amazon has done an amazing job of promoting their ancillary business, AWS, as the “public cloud”. This is astounding because it is only public in the sense that anyone can enter if they pay the price of admission. By this definition, Disneyland™ would be a public park.
Actually, the primary difference between Disneyland and AWS is that Disneyland is far easier to leave. AWS has been furiously deploying “services” that seduce you and your developers into using it. It is easy to check in but almost impossible to leave.

Well, this rant was two months ago, and I don’t think you want to hear it again. Or if you do, you can re-read my December blog, “Public Cloud as a Black Hole: There are Choices before the Event Horizon”; I wouldn’t change a word.

However, I am back because AWS just gave me another great opportunity to revisit my black hole analogy. On Tuesday, February 28, 2017, large portions of AWS went down, or should I say “black”. As a consequence, literally thousands of web sites went black, as their content disappeared into the black hole that S3 had become. In some cases, it was only parts of web sites, in others the service just became very slow, while others were just “gone” — disappeared over the event horizon. I don’t know if the rest of the world will call this “black Tuesday,” but I will for now.

But wait! The cloud is not supposed to be like that. It is just supposed to be floating along somewhere, nice and white and fluffy, taking care of our computing needs, tended to by meticulous Bezos minions who know exactly how to keep it floating. Returning to the comparison to Disneyland, this park is Fantasy Land.

This story is another triumph of marketing. Let’s get real. Years ago, “cloud” had a bunch of foggy meanings in computing: “your head is in the clouds,” “cloudy reasoning,” or “marketing is where the rubber meets the clouds.” What is this “public cloud” (oops, I mean Jeff Bezos’s cloud) in reality?

In reality, the “public cloud” is a large complex set of data centers, each hosting thousands of servers and complex networking, growing rapidly to suck up as much business as possible. After all, that’s the strategy — get big fast. I don’t blame them for this one bit. Yet, rapid growth always ups the risk of instability. New systems have to be installed and configured, and connected to existing networks and systems, all running the risk of incorrect connection and misconfiguration. And, new operators need to be trained and deployed and learn from “experience,” i.e. making mistakes.

I value old sayings because, if they are old and we know them, there is a lot of truth in them that keeps them alive. How about the one: “The bigger they are, the harder they fall.” AWS is the biggest cloud by far. Or, as my mother would say, “Don’t put all your eggs in one basket”. This was not qualified by some phrase like “unless the basket is managed in the cloud” or “unless it is a public basket”.

I was once quoted as saying “the only reason all the computers in the world haven’t crashed at the same time is because they aren’t all connected yet.” (I’m not taking it back.) The Internet is steadily solving that connectivity problem, and the cloud makes them strongly connected.

But perhaps more prophetic for “Black Tuesday” is a comment by famed computer scientist Leslie Lamport who said, “Cloud computing is having a computer you never heard of bringing your work to a halt.” Actually, he said this about distributed computing, but that was before the “cloud” term became the word du jour. Nonetheless, are you willing to have your work come to a halt because of some computer or person completely outside of your organization? Apparently, almost 148,000 sites got “lamported” on Black Tuesday.

I can’t resist pointing out another analogy between AWS and a black hole. You can’t see a black hole because light can’t escape from it; it is not only dark but keeps you in the dark, so to speak. With AWS drinking their own Kool-Aid, the AWS status dashboard is apparently hosted on S3, so while S3 was down, the dashboard still showed it as up. You have to laugh when the status page itself is down, except when the rest of your business is down with it. So, don’t count on the cloud telling you when it is raining. You just know it when your web presence and enterprise productivity get soaked.

Now, everyone can make mistakes and everyone can have hardware failures. We can’t expect, and don’t expect, the AWS folks to be perfect. But how much better are they, really, than what you can accomplish in your private cloud? Here, I think the rapid growth of AWS is working against them. What is the average number of years of experience of operators at AWS if they are doubling as rapidly as they claim? The average has to be going down, because they are hiring faster than experienced operators can be produced, which takes years. Moreover, rapid growth means instability. You don’t need the growth, but AWS shareholders do, so you are being exposed to instability to your detriment and their benefit.

And, it is well-known that most failures are due to operator error. Let me pay homage to a hero of mine, the beloved Jim Gray who wrote 30 years ago: “We can’t hope for better people. The only hope is to simplify and reduce human intervention in these aspects of the system.” We humans just make mistakes in dealing with these complex systems, especially when everything is changing around you, like at AWS. We can only reduce human intervention with automation.

Intent-based Networking Systems

Gartner recently released a great report titled “Innovation Insight: Intent-Based Networking Systems” by Andrew Lerner, Joe Skorupa, and Sanjit Ganguli (ID: G00323513, Feb 7, 2017). The report covers key aspects of what makes an intent-based system, the impact intent-based networking will have going forward, and recommendations on how organizations can start realizing the benefits of these systems now.

In this post, we’re going to cover the three recommendations that Gartner makes in the report and how the Apstra Operating System (AOS) is an ideal way to get started with intent-based networking.

“Mandate support for open, RESTful APIs when purchasing new networking infrastructure in order to support integration within an intent-based system moving forward.”

There is quite a diverse range of programmable “interfaces” for network devices today, both on-box and off-box: NETCONF, REST, Python, Tcl, SLAX, XSLT, OpenFlow, I2RS, and more. In fact, on some recent platforms programmers can even access a Broadcom shell on the device, allowing direct interaction with the forwarding hardware. Many devices also allow users to install software directly on the box, providing importable libraries for Python and C.
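Even without vendor tooling, many of these interfaces are easy to exercise directly. As a minimal, vendor-neutral sketch, here is how a NETCONF `<get-config>` RPC (RFC 6241) can be assembled with Python’s standard library; in practice you would send it to a device over an SSH session, for example with a NETCONF client library:

```python
import xml.etree.ElementTree as ET

# NETCONF base namespace from RFC 6241.
NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"

def build_get_config(source="running"):
    """Build a NETCONF <get-config> RPC as an XML string."""
    rpc = ET.Element(f"{{{NC_NS}}}rpc", {"message-id": "101"})
    get_config = ET.SubElement(rpc, f"{{{NC_NS}}}get-config")
    src = ET.SubElement(get_config, f"{{{NC_NS}}}source")
    # Request the named configuration datastore (e.g. "running").
    ET.SubElement(src, f"{{{NC_NS}}}{source}")
    return ET.tostring(rpc, encoding="unicode")

rpc_xml = build_get_config()
print(rpc_xml)
```

The same retrieval could be done via a REST call or an on-box Python library on platforms that offer them; the point is that a driver can use whichever of these interfaces a given device actually exposes.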

With a flexible intent-based networking system like AOS, developers can build “drivers” for network devices using virtually any combination of these interfaces. This is a very important point. Look, we know networking is messy. Not all platforms will have the same diverse range of support for programmable interfaces. While it is true you should only be buying programmable devices going forward, you will want the kind of flexibility that AOS provides in an intent-based system in order to make use of whatever arbitrary set of interfaces a given device exposes.

Even if you don’t intend to write your own code against these programmable interfaces, intent-based solutions like AOS require them. AOS is a distributed operating system, and so the underlying hardware devices will have “device drivers” much like the devices on a computer require device drivers. The drivers that ship with AOS work optimally when the hardware has robust programmability features like the ones described above.

“Pilot intent-based networking solutions by deploying them pragmatically in phases over time, versus a full initial implementation.”

Many network automation solutions require entire networks to be rebuilt and reconfigured. If a given network configuration does not fit neatly with the assumptions that software developers made when building such a solution, then that software becomes an obstacle in the day-to-day operations of your network.

AOS mitigates this problem in several ways. First, AOS manages the network in a way that is intuitive to network engineers: in “sections.” Your network is divided into sections with one or more campus networks, one or more data center pods, one or more extranets, and so on. If you need to add a section to your network, for instance for a Big Data project, then you can leverage AOS today to design, deploy, and operate the ideal Big Data network using modern best practices.

Second, AOS can be customized to accommodate parts of your existing network so that you don’t have to redesign or reconfigure them. AOS is extensible. It can be adapted to your topology and to the devices you use as discussed above.

Last, AOS can be introduced into your environment very minimally in “telemetry only” mode, harvesting telemetry from the devices in your network and streaming it to a collection tool of your choosing.
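The shape of a telemetry-only integration is simple to sketch. The snippet below is a hypothetical illustration, not AOS code (the device name, interface name, and counter fields are invented): harvest a raw counter reading, normalize it into a record, and serialize it for whatever collector you stream to.

```python
import json
import time

def normalize_sample(device, interface, counters):
    """Turn a raw counter reading into a collector-friendly record."""
    return {
        "timestamp": time.time(),
        "device": device,
        "interface": interface,
        "rx_bytes": counters.get("rx_bytes", 0),
        "tx_bytes": counters.get("tx_bytes", 0),
    }

# A raw reading as it might be harvested from a device (invented values).
raw = {"rx_bytes": 1024, "tx_bytes": 2048}
record = normalize_sample("leaf1", "Ethernet1", raw)

# In telemetry-only mode, records like this would be streamed to a
# collection tool of your choosing; here we just serialize one.
print(json.dumps(record))
```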

“Budget for intent-based networking solutions using improved network agility, increased network uptime, and/or better alignment with business initiatives as the funding drivers.”

One of the key lessons coming off the SDN hype-cycle is that networking matters. Automating the network is a tough thing to do, and it’s not likely that any solution will eliminate the need for network engineers. If a solution is going to have any impact at all on agility, uptime, cost, and risk, then that solution must be built with profound insight into what network engineers actually do.

By helping network engineers design, deploy, and operate networks faster and with fewer mistakes, AOS finally delivers on decades of promises to increase network uptime and agility while reducing cost and risk.

Gartner Cool Vendor and IBNS: The Journey to Autonomous Network Operations

It’s been an exciting few months at Apstra! Among everything else going on, we have earned recognition from industry analyst firm Gartner. First, Apstra Operating System™ (AOS) was profiled in Gartner’s Innovation Insight report on Intent-Based Networking Systems (IBNS) as the only full offering defining a new category in data center networking. Gartner predicts 1,000 IBNS deployments by 2020. Then Apstra was profiled in their Market Guide for Network Automation report in March. And earlier this week, Apstra was named a Gartner Cool Vendor in Enterprise Networking, 2017.

On behalf of the entire Apstra team, we are very proud and appreciative of the validation that comes with Gartner’s vote of confidence.

We started Apstra because we knew that networking infrastructures are increasingly becoming a critical asset of businesses — especially in this age of IoT, Self-Driving Cars, ubiquitous Virtual Reality, and Delivery Drones, all of which require tremendous networking resources at the core.

We started Apstra because it was (and still is!) mind-boggling that network engineers are asked to operate their ever-growing and ever more critical networks by typing arcane CLI commands on a box-by-box basis. It made no sense that while systems as mundane as thermostats were being fully automated using well-understood approaches (intent-driven, closed-loop, and vendor-agnostic), no such approaches to automation existed in the networking world.

So we set ourselves to drive the industry towards delivering a Self-Operating Network™ — a network that configures itself, defends itself, and fixes itself; and with AOS, we pioneered Intent-Based Networking and delivered the first and only vendor-agnostic, intent-based Self Operating Network.

According to Gartner, an Intent-Based Networking System such as AOS reduces network infrastructure delivery times to business leaders by 50% to 90%, while simultaneously reducing the number and duration of outages by at least 50%.

It will also reduce the percentage of operations teams using the command line interface (CLI) from 85% today to 30% in 2020!

We believe in a world where CLI is no longer used in networking, the same way DOS is no longer used to run your computer. We believe in a world where network operations are fully autonomous — delivering on massive improvements in uptime, agility, economics, and enabling operations that scale at the speed of your business.

If you share our belief; if your network infrastructure is becoming increasingly critical to your business; if you are interested in increased uptime, agility, and fundamentally different economics for your data center network infrastructure, then I invite you to follow Gartner’s recommendation. Don’t wait for your refresh cycle! You can fund your Intent-Based Networking System using uptime improvements alone, not to mention improved agility and massively streamlined operations.

We would love to hear from you — join the journey towards no CLI and autonomous network operations!


Claw Back the Cost of Network Failure

The network is the underlying foundation of the data center. If that foundation becomes unstable, everything else, apps and all, is affected. The bummer is that most large data center networks are, in fact, unstable.

Complex networks experience rapid entropy and require constant human care. This entropy manifests itself as a lack of network agility and poor network availability. Whatever the reason a network is not working as desired, the effects are the same: apps and services quickly break, impacting both operations and customers.

Of course, CIOs and IT departments would reduce network entropy and instability if they could. However, they can’t, and here’s why.