We've spent time researching, reading, and searching for ways to approach this problem. That is, making sense of the massive influx of data that exists today.

And as we searched, all roads led back to one man: Claude Shannon. A Mathematical Theory of Communication didn't just make the internet possible; it created a new science and offered a model for quantifying the world in ways never before imagined. Surprise! The paper is from 1948, and we're still finding new ways to apply its concepts in 2018. Can it help with the massive influx of data? We think so.

What resonated so deeply with us wasn't the history lesson of how Shannon came up with the math. No, that forms the underpinnings of the digital age and it makes sense. However, it was something else. Something seemingly innocuous.

It was his definition of the word "surprise." Let's start with the basics.

Merriam-Webster's dictionary offers several definitions of surprise, and it's the last one that drives at the heart of the matter for us. Surprise, in the traditional sense, is a feeling provoked by an unexpected event.

Let's go one step further and see if we can apply this definition to our influx-of-data problem.

What if surprise were something we could measure and make sense of? Shannon took this everyday notion and made it mathematical: he used the likelihood of a given message to assign a value to the information it carries through a communications channel. The less likely the message, the more information it delivers.
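Shannon's measure of surprise, often called self-information or surprisal, is simply the negative log of an event's probability. A minimal sketch in Python (the example probabilities are our own, not from the paper):

```python
import math

def surprisal(p: float) -> float:
    """Shannon's self-information: the 'surprise' of an event
    with probability p, measured in bits."""
    return -math.log2(p)

# A certain event (p = 1) carries zero bits: no surprise at all.
certain = surprisal(1.0)

# A fair coin flip (p = 0.5) carries exactly one bit.
coin = surprisal(0.5)

# A rare event (p = 0.01) carries far more information: about 6.64 bits.
rare = surprisal(0.01)
```

Note the shape of the curve: halving an event's probability adds exactly one bit of surprise, which is why rare messages are the valuable ones.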

Why is this important? We've just gone from talking about feelings (Merriam-Webster) to talking about ratios (math). Pierre Baldi pushed this math further in his 2002 paper, A Computational Theory of Surprise.
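Baldi's framing is Bayesian: surprise is how far new data forces an observer's beliefs to move, measured as the divergence between the posterior and the prior. A rough sketch of that idea, using a discrete belief distribution of our own invention:

```python
import math

def kl_divergence_bits(posterior: list[float], prior: list[float]) -> float:
    """Kullback-Leibler divergence D(posterior || prior) in bits:
    how much the data moved our beliefs."""
    return sum(q * math.log2(q / p)
               for q, p in zip(posterior, prior) if q > 0)

# Before seeing the data: three hypotheses, all equally plausible.
prior = [1/3, 1/3, 1/3]

# After seeing the data: belief concentrates on the first hypothesis.
posterior = [0.8, 0.1, 0.1]

# Bayesian surprise: data that doesn't change our beliefs scores zero;
# data that upends them scores high.
surprise = kl_divergence_bits(posterior, prior)
```

Data that merely confirms what we already believed produces a posterior equal to the prior, and therefore zero surprise, which matches the intuition the post builds from the dictionary definition.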

A Surprise Shift in Definition

We can't overstate how massive a shift this is. Shannon's work quantifying the information held within the English language (surprise: it's remarkably redundant and inefficient) shows just how much room there is to make even the most mundane parts of our lives more efficient.

In today's age of streaming data, we see this as a critical concept for understanding the decision-making that happens behind the massive, ever-growing influx of data. All those 'data lakes' are already wondering when the levee is going to break.

We see the measure of surprise as a key way to identify the variables and events that carry the most information, and therefore the most value. It also reminds us of statistical significance: business intelligence is focused on answering questions about an organization's data, and those questions come down to hypotheses and confidence that the null hypothesis is very unlikely (high surprise). Surprise isn't just how we feel; it's far more useful as a measurement and an indicator of trends in data moving through a (communications) channel.
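To make the streaming-data angle concrete, here is a toy sketch of our own (the event stream and event names are hypothetical): score each event type in a log by its surprisal under its empirical frequency, so the rare, high-information events float to the top.

```python
import math
from collections import Counter

# Hypothetical event stream: mostly routine traffic, one rare event.
stream = (["login"] * 54) + (["logout"] * 45) + ["privilege_escalation"]

counts = Counter(stream)
total = len(stream)

# Surprisal (in bits) of each event type under its observed frequency.
surprisal_bits = {event: -math.log2(n / total)
                  for event, n in counts.items()}

# Rank events by surprise: the high-surprisal ones are the
# high-value signals worth a closer look.
ranked = sorted(surprisal_bits, key=surprisal_bits.get, reverse=True)
```

Here the one-in-a-hundred `privilege_escalation` event scores about 6.6 bits, while the routine `login` events score under one bit each: surprise, used as a measurement, does the triage for us.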
