Networks get smarter with stream oriented computing
December 21, 2015
Companies struggle with reliable device signaling for a myriad of reasons: security, dropped connections, catch-up on missed data, firewall/NAT environments, scaling, replication, and redundancy. Much as they do with CDNs (Content Delivery Networks) such as Akamai, many companies prefer to use a service already proven to work and scale, and to focus instead on what they really want to be building, which isn’t infrastructure.
Today’s DSNs (Data Stream Networks) are limited to routing data reliably. You can bring data from many devices to a single point, or from a single point to many devices, or even many to many. This is of huge value, of course, but one thing doesn’t change: the data. The data itself is moving around, but it isn’t changing. It’s just a stream of packets going from one end to another. But what if you *could* change the data mid-stream? We think that would open lots of new and interesting possibilities.
Consider this hypothetical example: Suppose you want to build a chat app for millions of junior high kids to talk with each other. You might want to filter for profanity. Conceptually this is a simple thing, just an “if” statement: if a message contains any word on a blacklist, rewrite the message or send a notice back to the student.
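To show how small that “if” statement really is, here is a minimal Python sketch. The word list, the function name `filter_message`, and the choice to mask words with asterisks are all assumptions made for illustration, not part of any particular DSN's API.

```python
import re

# Hypothetical blacklist; a real app would load a maintained word list.
BLACKLIST = {"darn", "heck"}


def filter_message(text: str, blacklist: set[str] = BLACKLIST) -> tuple[str, bool]:
    """Return the (possibly rewritten) message and whether anything was masked."""
    flagged = False

    def mask(match: re.Match) -> str:
        nonlocal flagged
        word = match.group(0)
        # The "if" statement at the heart of the filter.
        if word.lower() in blacklist:
            flagged = True
            return "*" * len(word)
        return word

    # Visit every run of letters and mask the ones on the blacklist.
    cleaned = re.sub(r"[A-Za-z']+", mask, text)
    return cleaned, flagged
```

The second return value lets the caller decide whether to also send a notice back to the student. The hard part is not this function but running it on every message between millions of users.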
In practice, however, implementing a global chat across millions of users is surprisingly difficult. Without filtering, messages will go directly from student A to student B through the DSN. However, to implement filtering you would have to take data out of every stream, pull it down to your servers, do some processing, then send it back to the DSN to its original destination. What should be a simple “if” statement becomes a big distributed computing problem. There has to be a better way.
What if the network itself could be smarter? What if you could move some of the computation into the data stream itself instead of having to pull it down to your server, when your server isn’t the end destination anyway? This is what we call “stream oriented computing,” where computation is moved into the data stream itself.
With stream oriented computing, simple computations can remain simple even as they scale to millions of users. Imagine you are tracking temperatures from a thousand sensors and want to know whether any of them rises or falls by more than 1 percent within a five-minute window. This would be trivial to build if computation lived in the network, but very difficult if all of the real-time datapoints have to be moved to a server first.
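The sensor check might look something like the following Python sketch. It assumes one reading of “rises or falls more than 1 percent”: a new value is flagged when it differs by more than 1 percent from any earlier reading by the same sensor in the last five minutes. The class name `DriftDetector` and the message fields are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60  # five-minute window
THRESHOLD = 0.01         # 1 percent


class DriftDetector:
    """Flags a reading that moved more than `threshold` within `window` seconds."""

    def __init__(self, window: float = WINDOW_SECONDS, threshold: float = THRESHOLD):
        self.window = window
        self.threshold = threshold
        # sensor_id -> deque of (timestamp, value), oldest first
        self.readings = defaultdict(deque)

    def observe(self, sensor_id: str, timestamp: float, value: float) -> bool:
        history = self.readings[sensor_id]
        # Forget readings that have fallen out of the window.
        while history and timestamp - history[0][0] > self.window:
            history.popleft()
        # Compare against every reading still in the window (skip zero baselines).
        alert = any(
            old != 0 and abs(value - old) / abs(old) > self.threshold
            for _, old in history
        )
        history.append((timestamp, value))
        return alert
```

Each sensor only needs its own last five minutes of readings, which is why this kind of logic is a plausible fit for running next to the stream rather than on a central server.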
Once computation is moved into the data stream, interesting things become possible. The network could merge and split streams based on geolocation, time zone, or other situational criteria. The geolocation of a data point could be used to look up additional information to enrich the data, like lat/long to addresses, or finding all Thai restaurants nearby.
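As a rough sketch of enriching and splitting by location, the Python below attaches a place name to each message and routes it to a per-place sub-stream. The `PLACES` table is a hypothetical stand-in for a real reverse-geocoding service, and the one-degree grid and the `lat`/`lon` field names are assumptions chosen to keep the example small.

```python
import math
from collections import defaultdict

# Hypothetical lookup table standing in for a real reverse-geocoding service.
PLACES = {
    (37, -123): "San Francisco",
    (51, -1): "London",
}


def grid_cell(lat: float, lon: float) -> tuple[int, int]:
    """Bucket a coordinate into a one-degree grid cell."""
    return (math.floor(lat), math.floor(lon))


def enrich(message: dict) -> dict:
    """Return a copy of the message with a place name attached."""
    cell = grid_cell(message["lat"], message["lon"])
    return {**message, "place": PLACES.get(cell, "unknown")}


def split_by_place(messages) -> dict[str, list[dict]]:
    """Enrich each message, then route it to a per-place sub-stream."""
    streams = defaultdict(list)
    for message in messages:
        enriched = enrich(message)
        streams[enriched["place"]].append(enriched)
    return dict(streams)
```

The same shape works for time zones or any other situational key: compute the key from the message, then use it to pick the outgoing stream.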
Data from multiple sensors could be aggregated into a single timestamped stream before being sent to a central server. And for those sensors, the data stream network could be running a constant calculation to determine the rolling average of a value like a temperature reading or CO2 levels. Or how about a voting app? The end server doesn’t care about individual votes, only the aggregated totals. Let the network handle that math. It all happens on the fly as data streams through the global network with minimal latency.
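Both aggregations are only a few lines each. The Python sketch below assumes a rolling average over the last N readings (a time-based window would work just as well) and a plain tally of vote choices; the names `RollingAverage` and `tally` are made up for this example.

```python
from collections import Counter, deque


class RollingAverage:
    """Average of the most recent `size` readings, updated in constant time."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("size must be positive")
        self.values = deque(maxlen=size)
        self.total = 0.0

    def add(self, value: float) -> float:
        # When the window is full, the oldest value is about to be evicted.
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]
        self.values.append(value)
        self.total += value
        return self.total / len(self.values)


def tally(votes) -> dict[str, int]:
    """Collapse a stream of individual votes into totals per choice."""
    totals = Counter()
    for choice in votes:
        totals[choice] += 1
    return dict(totals)
```

In both cases the end server would receive one small, already-aggregated value instead of every individual datapoint.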
Writing software this way won’t be easy at first. We have to split our computation up into separable components, and not all computation is appropriate to put into the network. Many things will still live on a server. However, any code that is conceptually simple but hard to scale to millions of users is a prime candidate for moving into a smarter network. It will also help to use existing languages like JavaScript and SQL instead of having to learn specialty query languages.
Long term, we will see a generation of apps that are essentially server-less. Rather than shoving messages around, servers will act like flight controllers, giving the network of planes directions from a central point instead of flying the planes directly. Stream oriented computing will fundamentally change how we build real-time applications.
Josh Marinacci is Technical Marketing Manager at PubNub.