Serverless Eventing: Modernizing Legacy Streaming with Kafka

Knative Eventing offers a variety of EventSources to use for building a serverless eventing platform. In my previous blog post, I talked about SinkBinding, and we used that concept to create an EventSource that pulls Twitter data.

What happens when you have a legacy system though? It often doesn't make practical sense to throw out a functioning system just to replace it with the "latest and greatest". Also, software solutions rarely fit a "one size fits all" approach. You may also find yourself in a "hybrid" or "brownfield" situation.

For example, perhaps you have a legacy messaging bus that is not just used for your serverless application but also used for IoT devices. This is a perfect example of when you will have a "hybrid environment". Dismantling a working system just to improve the backend makes no sense. We can still modernize our applications with serverless eventing.

I am a fan of Apache Kafka. Admittedly, I didn't know much about Kafka prior to my time at Google. Through my interactions, I got to know the people at Confluent and they taught me about the ins and outs of Kafka. Scalyr has an amazing blog post that goes into the benefits of Kafka. Confluent also has some information here on Enterprise use-cases for Kafka.

My TL;DR is that Kafka is an open source stream-processing platform with high throughput, low latency, and reliable sends. This is ideal for people who want to ensure that their messages are reliably sent, in the order that they were received, and in real time. That's probably why a significant number of Fortune 100 companies use it.
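To make those guarantees concrete, here is a rough sketch of a producer written with them in mind. The kafka-python library, the broker address, and the topic name are my own assumptions for illustration, not anything from the demo:

```python
import json

def serialize_rate(pair: str, rate: float) -> bytes:
    """Encode a currency-rate record as JSON bytes for a Kafka message value."""
    return json.dumps({"pair": pair, "rate": rate}).encode("utf-8")

# Producing requires a running broker, so this part is illustrative only
# (kafka-python and the broker address are assumptions):
#
# from kafka import KafkaProducer
# producer = KafkaProducer(
#     bootstrap_servers="localhost:9092",
#     acks="all",  # wait for full acknowledgement: the "reliable sends"
# )
# # Messages sharing a key land on the same partition, preserving order.
# producer.send("currency-rates", key=b"USD/EUR",
#               value=serialize_rate("USD/EUR", 0.92))
# producer.flush()
```

The `acks` setting and keyed partitioning are where the "reliable, ordered" properties actually come from: ordering in Kafka is guaranteed per partition, not across the whole topic.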

In this demo, we will use the Confluent Operator. There are many others, such as Strimzi, but I chose the Confluent Operator for a few reasons.

1) It is a bit more mature. Strimzi is still a CNCF Sandbox project, while the Confluent Operator is already used in production deployments.
2) Larger Community
3) Comes with some interesting tools like Control Center
4) I have been playing with it for longer

So let's check out my tutorial here. I will wait for you to complete it.

Do Lab

Come Back

That was fun, wasn't it? We basically created an application that pulls currency exchange information and sends it to a Kafka producer. While this simple example doesn't seem like much, imagine having 100 microservices that all conduct their own processing with currency data and need to send the results to Kafka. Would it make sense to have developers hard code connectors into each microservice, giving the Kafka administrator many points of failure to diagnose? How about supporting multiple libraries, since some services are written in NodeJS, some in Go, and some in Python?

You create a service that simply egresses the data out of the container and let SinkBinding determine where those messages should go. Then you can connect to an event sink, which can be one service or a multitude of services using Channels, Brokers, and Triggers. You conceivably have one codebase handling Kafka event production while supporting multiple microservices.
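As a sketch of that egress pattern, every microservice could share one tiny helper that posts its events to whatever `K_SINK` points at, with no Kafka library in sight. The helper below is my own illustration, not code from the lab; the event type, source path, localhost fallback, and uuid-based event ID are all assumptions:

```python
import json
import os
import urllib.request
import uuid

def send_event(event_type: str, source: str, data: dict) -> urllib.request.Request:
    """Build a CloudEvents v1.0 binary-mode HTTP request aimed at K_SINK."""
    return urllib.request.Request(
        os.environ.get("K_SINK", "http://localhost:8080/"),  # injected by Knative
        data=json.dumps(data).encode("utf-8"),
        headers={
            "Ce-Specversion": "1.0",     # CloudEvents spec version
            "Ce-Type": event_type,       # e.g. a hypothetical currency-rate type
            "Ce-Source": source,         # identifies the producing service
            "Ce-Id": str(uuid.uuid4()),  # unique per event
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example: build the request (sending it would be urllib.request.urlopen(req)
# once a sink is actually running).
req = send_event("dev.example.currency.rate", "/services/rates",
                 {"pair": "USD/EUR", "rate": 0.92})
```

Because the destination comes from the environment at runtime, the same helper works whether the sink is a logging service today or a Kafka-backed Broker tomorrow.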

Next time, we will learn how to consume with Serverless Eventing.

Serverless Eventing: SinkBinding 101

You are writing an application where you are trying to analyze tweets in near-real time for a live TV broadcast. A tweet sent 5 hours ago serves you no purpose in this context. How would you do this?

We need to create an application capable of collecting the events and sending them to be ingested, or consumed. This is the Producer-Consumer pattern. One process creates an event and transmits it while the other receives the event and does something with the data.

Knative Eventing allows for the creation of first-party event sources via SinkBinding. In the simplest terms, a SinkBinding is responsible for matching your producer to a consumer. In this context, your consumer is the "event sink".

The consumer can be any Kubernetes resource that employs a PodSpec. This can be a DaemonSet, Job, StatefulSet, Deployment, or even a Knative Service. If you aren't new to Knative, you may better know this as a ContainerSource. ContainerSources are coming back to Knative Eventing, but the favored solution is SinkBinding.

The ContainerSource YAML contained both the sink binding and the deployment definitions. While this was simpler, it also limited what Knative Eventing could use as a source. By decoupling the deployment definition from the sink definition, we open up the possibilities.

I have a tutorial here in GitHub. Here you will create a small Python application that will pull 25 tweets every 30 seconds and then send them to a Knative service that will simply log all the messages coming in.

You will need to create an application in Twitter to get the necessary API keys to do this tutorial. Everything is detailed in the README file in the GitHub repository. Give it a try, then come back and let's talk.

Alright, you tried it? Wasn't it cool? Now you may be asking yourself why this matters. After all, couldn't you just write these applications to utilize some kind of messaging bus like RabbitMQ or Kafka?

Sure, but we are talking about serverless here. We are talking about finding ways to simplify the developer experience, and asking developers or operators to maintain additional tooling to make the application work moves us away from the goal of being serverless.

There definitely are cloud providers who offer fully managed versions of these tools, but fully managed != serverless. The maintenance requirements are out of the picture, but you still have to bind sources to sinks.

Knative Eventing allows you to declaratively bind sources to their sinks. If you look at the code, you didn't have to hard code where you wanted to send the data or where you were receiving it from. You also didn't have to import special libraries just to connect to your source or sink.

The code for your event source used the K_SINK environment variable. This variable is used by Knative to know where to route the traffic. For my consumer, I created an endpoint that received all incoming traffic.

The SinkBinding YAML was all that I needed in order to tell Knative how to route my events. This shows how you can simplify binding event sources to their receivers.
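For reference, a SinkBinding of that shape looks roughly like this. The resource names and the Deployment/Service pair are placeholders of my own, not the exact YAML from the tutorial:

```yaml
apiVersion: sources.knative.dev/v1
kind: SinkBinding
metadata:
  name: twitter-source-binding
spec:
  subject:                    # the event producer: anything with a PodSpec
    apiVersion: apps/v1
    kind: Deployment
    name: twitter-source
  sink:                       # the event sink; Knative injects K_SINK into the subject
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display
```

Notice that neither the producer's code nor the consumer's code appears here: the binding is pure configuration, which is exactly what makes it swappable later.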

This is a very simple example where events are sent straight from the source to the sink. In a future tutorial, we will show the power of Brokers and Channels when you want more robust routing for your events.

Serverless Eventing: What is it anyway?

Let's take a step back and define "Serverless Eventing".

What is Serverless?

It is no secret that I love serverless computing. As you know, we have discovered a way to transmit data through condensation. We actually beam data into nimbostratus clouds and that's where the compute and storage takes place. You experience outages when the sky is clear.

Alright, I am just kidding. We have not yet discovered how to host data in actual clouds but someone in a lab somewhere is working on it. I actually go into details in an earlier post.

TL;DR: Serverless is about abstracting infrastructure from the developer so that they can focus on code. I have heard people often confuse "Managed Services" with "Serverless". On the surface, they look the same. The responsibility of infrastructure management has shifted from the user to the provider in both scenarios.

Managed Services, however, tend to stop at that point. If you have a managed WordPress installation, for example, it is still WordPress at the end of the day. The only difference is that the responsibility for managing uptime and patches lies with a provider, not the user.

What differentiates serverless from Managed Services comes down to three points.

  1. Simplified Code Deployment: The core tenet of serverless computing is to be primarily code-centric. This is accomplished by creating a packaging standard that allows developers to define their code runtime, bundle that definition with their code, and push to production. Two common examples are Buildpacks and containers.
  2. Resource Allocation On-Demand: Abstracting the infrastructure takes the power of resource control away from the developers. Developers need to know that the resources they need will be available and will scale with increased traffic. A serverless platform needs to guarantee the ability to bring up workers to execute code. We also need to ensure that developers only pay for the resources that they use.
  3. Stateless Microservices: To take full advantage of serverless architecture, one should decompose their application into smaller, stateless microservices. Having these components act independently of each other allows for more agile development, easier packaging, and code specialization.

What is Eventing?

So what is Eventing? Admittedly, I didn't make up the word. Knative Eventing is where the word first came to light for me, but I prefer it to "event-driven".

If we take a step back, in my earlier post I talk about "data-driven" applications. The term is vague: data is simply information, and it tells you nothing about the state of that information. The data could be current, or it could be sitting in cold storage. "Event-driven" is being data-driven, but with a real-time aspect.

Events run our world. I am not just talking about conferences and concerts. A recent transaction on your credit card is an event. Adding an item to a cart with an online retailer is an event. Clicking on a recommended item is an event. Your ride share app telling you that your driver is now 4 minutes away instead of 5 is an event.

A modern application will generate and/or ingest hundreds if not thousands of events in a given day. This is what we can call "eventing": an action verb for producing and consuming events in an application. These events are active in nature. The context and content of an event tells your application what to do next.

In the credit card example above, to prevent fraudulent activity, one would want to analyze the transaction as soon as it happens. This would kick off another event where an initial fraud analysis takes place, looking for obvious signs. Depending on the results, there are two possible events: either "everything is okay, proceed as you were" or "this doesn't look right, let's further analyze the charge". That in turn may kick off its own set of events, and so on.

Bringing it home, what is "Serverless Eventing"?

Let's combine these two concepts. We want to create an application that is event-driven and also serverless. Easy: just create a Kafka cluster, create some topics, set up your microservices to consume, and you are good to go. Right?

Yes and no. That is certainly one way you can do it, but Kafka, or any pub/sub-style messaging bus, is really more of a component of Serverless Eventing.

Let's look at our definition of Serverless again. The application needs to be stateless, resources need to scale, and we need to simplify the bundling of code. This is done by abstracting the infrastructure components from the developer. Similarly, we need to abstract the binding of event sources to their sinks (producers and consumers).

The traditional method is to do so imperatively. You tell your application to get these messages from this topic hosted on this cluster using this API endpoint, and so on. In a serverless model, we want to scale, and imperatively defining the bindings prohibits that. As our application grows and our services work with more events, we need to declaratively bind the events.

We don't want to code our services to directly consume from a specific source, requiring specific libraries, and so on. Instead, our services should be able to passively receive events and react accordingly based on our services' business logic. As your application grows, rather than having to bind event sources to your services by hardcoding it, you define the connections and event flow using the Serverless Eventing tool.

While Knative Eventing isn't the only tool pursuing this vision, I personally believe it has one of the best opinionated structures to execute on it. You define event sources, brokers, channels, and sinks as Kubernetes objects via YAML. These components then deliver the messages to your services.

Due to its open source nature, you are able to extend the Knative Eventing CRDs to support new event sources and make other customizations. Knative Eventing supports a number of event sources such as GitHub, Kubernetes, Kafka, and more. You are essentially given a toolbox to build your own pipelines for event-driven applications.

Knative Eventing also allows for both simple and advanced routing. Simple routing will take an event from a source and send it to a specific sink. Advanced routing gives you some intelligence when it comes to determining what to do with the event before it reaches the sink. You can then route it to the proper sink based on a set of criteria.
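As a taste of the advanced case, a Knative Trigger can filter events flowing through a Broker by attribute before they reach a sink. The names and the event type below are hypothetical, chosen to echo the fraud-analysis example:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: fraud-check-trigger
spec:
  broker: default
  filter:
    attributes:
      type: dev.example.card.transaction  # only route events with this CloudEvents type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: fraud-analyzer
```

Each sink gets its own Trigger with its own filter, so routing criteria live in configuration rather than in any service's code.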

In summary, Serverless Eventing enables developers to declaratively bind event sources to their sinks, abstracting the process from the developer. Through this blog series, you will learn specific ways to do this.

Serverless Eventing: A Series

"I want a data-driven organization!" We have all heard this in recent years haven't we? Maybe it was at an All-Hands, maybe your manager, probably at a mixer in Silicon Valley somewhere. My healthcare provider told me about their new "data-driven matching".

Sometimes I wonder if the term holds any weight anymore or if it has just become a new marketing term that we use to sound futuristic. Are they collecting data? What data are they collecting? How are they extracting meaning from the data? Is it being applied properly? These are all very good questions when evaluating what it means to be "data-driven".

Depending on who you ask, you will get a lot of answers. I will break down my opinion on the concept.

Let's first break down the concept of data. We don't need to go too deep as data is information. Your weight, your birthday, your name, and more. I tend to see this as static data. This data largely persists through time and changes are infrequent. This data has value to many organizations. I may want to send a "Happy Birthday" email to my customers when it's their birthday or offer them 10% off of a service. Knowing where a customer resides allows you to market specials at the closest location of your grocery chain.

Data in and of itself is useless though. I can collect every iota of data in the world but if it sits in a storage bucket somewhere, what use is it? Data should be collected in order to be analyzed and derive meaning. Why did people purchase more pizza in April than in February? What are the most clicked parts of your website during the summer months?

When you are able to analyze and derive meaning, you then need to apply that meaning toward decision making. Maybe books on baking aren't as popular in the spring months as they are in the autumn months, so you should consider that when stocking your bookstore. THAT is what I think about when I hear data-driven. It means collecting, analyzing, and using data to drive the decisions in your or your organization's lifecycle.

In the age of IoT, smartphones, and responsive websites, however, we have created a new breed of data. According to IDC, nearly 30% of all data will be real-time in nature. This data is dynamic. If I am using a fitness app to track my running, every second I am in a new location. The value of a given cryptocurrency can change every second as well.

This dynamic data I often frame as events. This is more than just information, this is information that is time-bound. If I am waiting for a ride share, I want to know if they are 5 minutes away or 15. I want to know what traffic looks like right now as I head home from work, not a week ago.

This new advent of real-time dynamic data has pushed us from needing to be just data-driven to being event-driven. If we are receiving data in real-time that is time-critical, we need to analyze and act on it just as fast.

This brings us to SERVERLESS EVENTING. Now what is "Serverless Eventing"? This isn't trying to send event data directly through the sky and bypassing the need for bare-metal. Serverless is about simplifying the developer experience and enabling them to manage their own deployments. This allows developers to focus more on code and packaging code and less on managing networks, routes, storage, server uptimes, and everything else infrastructure related.

Serverless Eventing is simplifying the developer experience in building real-time streaming applications. Streaming applications usually use a message queue or bus, and you hardcode the functionality imperatively: you write libraries and functions to connect to your Kafka topics and repeat this for every service you create.

Serverless Eventing allows you to declaratively define your streaming pipeline. It offers abstractions to simplify connecting code to events. These abstractions are also reusable and scalable. As your application grows to have more services, you can declaratively connect your events to your applications.

I am starting this series to cover some concepts and examples related to Serverless Eventing. This will center around Knative Eventing. Knative is a Kubernetes-native platform that was open sourced in 2018 by Google. It enables operators to design serverless platforms for their developers. The Eventing component focuses on allowing developers to declaratively build serverless applications.

I will show a variety of tutorials and concepts using Knative Eventing stand-alone as well as showing how to integrate it with legacy systems such as Apache Kafka. You can follow my code and demos at this GitHub repo.

Please feel free to share and ping me with questions.

What the *bleep* is Multi/Hybrid-Cloud Anyway?

With 2019 coming to an end, it has become clear that Hybrid Cloud and Multi-Cloud have become the buzzwords of the year.

It wasn't that long ago that we were talking about getting everyone onto the cloud and now we are talking about bringing the cloud to their datacenters. The cloud was supposed to promise endless scaling, new technology and innovation, etc. Are we going backwards or is there something more happening?

Let's start by talking a bit about the "state" of the cloud today. An IDG report from 2018 stated that 73% of all enterprises have at least one application in the cloud. An additional 17% plan on moving part of their workload into the cloud.

Now in the same report, 44% of organizations are multi-cloud. For the purposes of level-setting, multi-cloud means having workloads on more than one public cloud provider. Now this kind of makes sense. Many people will start their migration story on one vendor. As they move more of their workloads to the cloud, sometimes they decide to continue working with vendor A, but other times they decide to move a workload to vendor B or C.

Now on the flip side, we have this thing called "Hybrid Cloud". To level-set again, "Hybrid Cloud" means running workloads both in the cloud and on-premises (private cloud). RightScale did a study showing that roughly 72% of enterprises use a private cloud.

This provided a unique opportunity for integrators. Many infrastructure-as-code providers filled the gap, helping organizations both manage the multiple environments and enable connections between them. While this did help fill a gap, it did not address the underlying problem: opinionated vendor lock-in.

Every vendor had its own opinionated way to execute on common standards. Something as common as VMs operates differently depending on whether we are talking about VMWare, OpenShift, OpenStack, AWS, Azure, GCP, etc. This is where the problem with Multi/Hybrid Cloud begins to take hold.

Organizations usually have to hire and train staff who understand specific vendor platforms. Sure, you can find tooling to simplify it, but the same tooling usually has to have multiple configurations to handle each opinionated platform.

Now there have been several attempts to create a standardized format over the years. Containers were supposed to help, and while the technology has taken off, we have a variety of options such as layers, buildpacks, containers, etc. Kubernetes essentially won the orchestration wars, but just like Linux, every vendor has its own "distro" of Kubernetes.

So where are we today? The strategy that the major vendors have attempted is to bring their opinionated platform to the consumer. This is why Hybrid/Multi Cloud has become a buzzword. Rather than offering tooling and/or opinions to connect clouds, they will just bring their cloud to you with some additional tooling.

Right now the major players are AWS Outposts, Azure Arc, Google Cloud Anthos, VMWare Tanzu and Cloud Foundation.

Each has their own opinion on how to bring their public cloud to the datacenter and other clouds. We are still seeing opinionated computing but are now able to bundle the offering and deploy.

From a strategy perspective, it makes perfect sense. After all, cloud computing is a consumption based business. You need people on your platform using resources in order to realize revenue. If there are workloads on other clouds, private or public, that is a lost opportunity.

The obvious concern is that we go backwards in terms of "openness". Innovative people worked to find ways to fill the gap between cloud vendors and their proprietary platforms. This brought us tools like Kubernetes, Chef, and Jenkins, just to name a few (really the tip of the iceberg). Could this move cause us to regress? Is having a black box deployed across platforms the way to go?

I think it could be, if done right. It is said that Kubernetes is the "Linux of the Cloud". In Linux land, we saw many different distros. You could get a fully supported enterprise version from Red Hat, Suse, or Ubuntu LTS, you could run a popular OSS version like CentOS or Debian, or you could go deep into the weeds with Gentoo or anything you can find on DistroWatch. The benefit is that at the end of the day, Linux is Linux, and the experience was largely the same regardless of the path.

Kubernetes could easily allow this as well. Enterprises and hobbyists alike could create their own "Kubernetes distribution", and applications living on them deploy the same regardless. Yes, you would get an opinionated configuration, but the actual experience will largely be the same.

Some vendors are choosing this path but others are just extending their current black box to other clouds. Which one is the better approach? Well as a fan of OSS, I would prefer Kubernetes, but I guess the market will speak.

As we go into 2020, I am excited to see what will happen in the world of Multi/Hybrid Cloud. It seems like 2019 was the "announcement year" and 2020 and beyond will be the "practice year". Stay tuned for all the announcements, and let me know your experience with your platform.

The Promise of Cloud Compute is Serverless

In recent years I have noticed more and more people seem to be dropping the term "serverless". It is up there with "Blockchain", "Crypto", and "Micro-services" in the trending tech nomenclature.

It makes sense why people would love it though. I mean I just pay for what I use. I don't think about networks or storage or patching servers. I just write and deploy my code and all of the server operator work is abstracted away.
