Five Essential Points on Data Visualization

The goal of data visualization is to act as a catalyst for some sort of behavior change, yet most practitioners focus on other things.

At its most common, a data visualization project consists of having some data, having a vague idea of the user’s actual needs, and a strong conviction that what you’re doing is perfect for the problem. Well, in 99.99% of the cases it’s not.

This process could be visualized kind of like this.

For the sake of completeness, here is the real-time / streaming analytics version.

It’s very easy to take data, mix it with some sort of “black magic” (read: JavaScript) and have everyone look at it in awe. But how about after the presentation? How about after the first 10 uses? How about after a year? How about when something needs to be rapidly changed? NOBODY gets the spec right at first go, and analytics dashboards are a moving target because user needs are.

It may be straightforward to build things with various libraries, but that’s the wrong problem.

The number one reason for frustration (user) and failure (business) regarding data visualization is “data puking”. Don’t just puke, tell a story that helps people to take action (change behavior).

The basic principle is that generally you’ll understand the value in your work better than anyone else does. Be very mindful of that.

Around 2008 I led a team of young turks who shared a goal: to start a new era in web analytics. At that point, Omniture had not been acquired yet, and Google Analytics was really bad. I mean really bad. There were three of us trying to change things (STATSIT, Nuconomy and KissMetrics) and show the big boys how to do it. I think to some extent we did show them, at least Google. Below is a screenshot from the keynote speech that Avinash Kaushik, then Google’s head evangelist for analytics, gave at the Google Analytics summit in 2009. In it you see the goal setting view of STATSIT web analytics.

As the screen shows, almost 10 years ago we had built-in goal setting for ‘behaviors’ and ‘long term’ goals, in addition to what were more commonly referred to as conversions. While ‘behaviors’ have to some extent become part of the analytics toolkit today, there is still much room for improvement in terms of ‘long term’ goal tracking. This is just one example of how analytics is nowhere near as goal-driven as it could/should be.

Related to the goal aspect, you could follow these 6 steps to avoid headaches later:

  1. Identify what the current goals of the user are
  2. Make sure that the visualizations correspond to those goals
  3. Create a simple prototype first
  4. After some use, figure out what the goals are now
  5. Make changes rapidly (this will make users want it badly)
  6. Keep iterating

Here is an example of a simple user feedback loop:

Now go do it. Forget bells and whistles and ‘wow factors’ until you figure out what the user actually needs. Note that ‘needs’ and ‘wants’ are often entirely different.

Going back to the point about starting simple, you might ask, how do you know it’s simple enough?

Ask yourself: what do I want the user to do after they look at my visualization / dashboard? Most of the stuff, maybe 99%, is “nice to know” with a sprinkle of “actionable”. A very simple example would be to use color coding where the color already sends a contextually relevant message to the user. For example, something that is critical is darker red than something that is a bug. A nice to have would be light yellow, whereas a must have would be darker yellow. We don’t see a lot of this.
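The color coding idea could be sketched roughly like this in Python. The severity labels, the hex values and the `color_for` helper are all assumptions made up for illustration; the only point is that the color carries the priority before the user reads a single label.

```python
# Hypothetical severity labels mapped to colors (illustrative values only).
# Red shades are for problems, yellow shades for feature requests;
# the darker the shade, the more urgent the item.
SEVERITY_COLORS = {
    "critical": "#8b0000",      # dark red
    "bug": "#f08080",           # lighter red
    "must_have": "#b8860b",     # dark yellow
    "nice_to_have": "#fffacd",  # light yellow
}

NEUTRAL = "#cccccc"  # grey fallback for anything without a known severity


def color_for(severity: str) -> str:
    """Return the hex color for a severity label, or neutral grey if unknown."""
    return SEVERITY_COLORS.get(severity, NEUTRAL)


# Made-up items, just to show the mapping in use.
items = [
    ("Checkout page times out", "critical"),
    ("Typo in footer", "bug"),
    ("Export to CSV", "must_have"),
    ("Dark mode", "nice_to_have"),
]

for title, severity in items:
    print(f"{color_for(severity)}  {severity:<13} {title}")
```

The same hex values can be handed to whatever plotting library you already use, so the mapping lives in one place instead of being scattered across charts.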

But we do see (still) a lot of this.

Endless menus, with endless depth and no indication of what to pay attention to (or what not to pay attention to).

Behavior change is not some buzzword, but an actual thing. You can (and should) engineer it exactly like you should engineer other aspects of a visualization project. One of the best ways to get started with the idea of behavior change engineering is to adopt a framework that is specifically made for the purpose.

The basic premise in behavior change is that you need three things:

  1. Ability
  2. Motivation
  3. Trigger

99(.9)% of all data visualization projects fail to appropriately address this. A good way to start learning how to address it is to think carefully about the kind of behaviors you want the user to take.

Once you know exactly the kind of behaviors you are aiming for, you can identify the relevant abilities, motivations and triggers, and put them together into some amazing dataviz!
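One way to keep yourself honest here is to write each target behavior down together with its ability, motivation and trigger, and then check which of the three is still empty. The sketch below is only an illustration of that habit; the `TargetBehavior` class, its fields and the example values are hypothetical and not part of any framework or library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetBehavior:
    """One behavior you want the user to take after seeing the visualization."""
    action: str
    ability: str = ""     # what makes the action easy enough to do
    motivation: str = ""  # why the user would want to do it
    trigger: str = ""     # what in the visualization prompts the action

    def missing(self) -> List[str]:
        """Return the names of the three elements that are still empty."""
        names = ("ability", "motivation", "trigger")
        return [name for name in names if not getattr(self, name).strip()]


# A made-up example: the trigger has not been thought through yet.
behavior = TargetBehavior(
    action="Call customers who are about to churn",
    ability="Phone number is shown next to each account",
    motivation="Retention counts toward the team's quarterly target",
)

print(behavior.missing())
```

If `missing()` returns anything, the visualization is not ready to support that behavior yet.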

Technology (dashboards / plots / tables) is only as useful as the context it’s in. For example, if you eat nothing but soup and I give you a fork, then that is not a great technology at all. For salad eating, on the other hand, a fork is a brilliant piece of technology.

Generally, what you think is relevant is far less relevant to others, because you tend to establish relevance based on your own ideas and preferences, and others do it based on theirs.

For example, I needed to do a lot of descriptive stats tables, but I did not want to leave my dev environment, which is IPython / Jupyter. After painfully searching high and low for a solution, I came to conclude that there was no meaningful way to do it without moving the data out first. I was proficient in two different visualization libraries (Matplotlib and Seaborn), Pandas and a bunch of other PyData tools, but this was not something I could do with any of them. So I built it, which I find is often the best way to get exactly what you need the way you need it. In this case I was the user, so it was easy to know the actual need. That’s the kind of match there should be between demand and supply.
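To make “descriptive stats table” concrete, here is a minimal sketch using only the Python standard library. It is not the tool described in this post and it does not use its API; the `describe` and `render` functions and the sample numbers are made up for illustration. Pandas’ `DataFrame.describe()` gives a similar numeric summary if you already have a DataFrame.

```python
import statistics


def describe(columns):
    """Return a dict of summary statistics for each named column of numbers."""
    rows = {}
    for name, values in columns.items():
        rows[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return rows


def render(rows):
    """Format the summary as a fixed-width text table, one line per column."""
    lines = [
        f"{'column':<12}{'count':>8}{'mean':>10}{'median':>10}{'min':>8}{'max':>8}"
    ]
    for name, s in rows.items():
        lines.append(
            f"{name:<12}{s['count']:>8}{s['mean']:>10.2f}"
            f"{s['median']:>10.2f}{s['min']:>8}{s['max']:>8}"
        )
    return "\n".join(lines)


# Made-up sample data, standing in for whatever is already in the notebook.
data = {
    "sessions": [12, 15, 9, 20],
    "signups": [1, 3, 0, 4],
}

print(render(describe(data)))
```

Because everything stays in plain Python objects, the table can be printed in a notebook cell without moving the data anywhere else.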

That then led to doing some other really basic plots that I found hard to do with some of the well established packages out there:

Later I made astetik into a Python package so everyone else needing to do descriptive data tables could do it as well. I figured that might be a few people.

pip install astetik

My last example is related to deep learning, the most froth-inducing topic in the world of analytics. Again I was looking at what I could do with the libraries I already knew how to use, and what I could do with Keras, scikit-learn, or TensorFlow. None of it gave the exact thing I needed as my primary result view to get started in doing hundreds and thousands of tests with all kinds of data. None of it supported the kind of visualization I would need. So I started with this…

Then that led to the understanding of what I actually needed, so I was able to move forward building that.

Generally, what you think is important (for yourself) is just a reflection of your limited understanding of the scope / domain. The more time you put into the initial build, the less likely you are to make significant changes (according to your actual needs) later. This is called emotional attachment; avoid it like the plague. Otherwise the whole thing may end up something like this…

As a parting word, I want to leave you with something to think about. We often think that we’re experts in a given topic, and that’s really not helping. It’s much better to start with a blank sheet of paper and assume you know nothing at all. Always go back to the needs of the user, and to your inability to accurately understand them (even if you’re the user). Things are always changing, and an analytics dashboard / dataviz is not an art piece sitting in a museum (even though it often ends up like that) but a living, continuously evolving thing.

At its best, data visualization is a thing that can help make people’s lives less frustrating, their work more productive, and help them feel more joy in whatever it is that they are doing. Have fun!




Worked with machine intelligence for 15 years, and built the interwebs for 25. Nothing here is my own.