Channel: dataviz – Healthy Algorithms
Viewing all 40 articles

Baby faces and Chernoff faces

I’ve been spending a lot of time looking at faces lately. Baby faces, to be specific. Baby faces are just wonderful, especially my baby, if I do say so myself. In fact, that adorable baby, said face, and said staring are making me forget my “blogging” voice. I think I can recover it with a little effort.

There is a face-related visualization technique that I’ve been planning to write about, and this seems like just the moment: recall Chernoff Faces. This 1970s-era method for multidimensional data visualization is so cute that it has been a recurring example in visualization education for 40 years, but it is so hard to use that it is almost never used in practice. Have you seen it before?

There are two major problems with this method: it is almost always a mistake to use it for public communication (Eugene Turner’s map of LA above is a rare exception), and it is almost always a mistake to use it in a dashboard. It has lived on because it is so cute, but it made it to publication in the first place because it was used appropriately: as a human computation aid in a clustering task.


It looks to me like there are three different shaped faces in the above figure. Agree?

This hasn’t always gone so well, though, as an example from a subsequent application by different authors shows below. In the following, do you see anything worth grouping together? I guess they show only a single exemplar from each cluster here, so they’re all supposed to look different.

But I’m still excited about it for the same reason as everyone else: faces are so cute! I have come up with a related human computation task where it may also be useful: outlier detection.

I’m not the first to want to do this, and it has even been attempted in health services research before:

(from here)


(from here)

A quick search turns up lots of code for generating faces from multidimensional data, but nothing as cute as Chernoff’s original work. I’ll remedy that in a near future post. Unless you know of something already out there that I missed.



DataViz in Python: Chernoff Faces with Matplotlib

As promised previously, here is my effort towards drawing Chernoff faces as cute as they were in the original paper.

Isn’t that just the cutest? Code here.
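
The linked code is not reproduced here. As an independent and much-simplified sketch of the idea (not my replication of Chernoff’s original parameterization), each row of data can drive a handful of facial features in Matplotlib. The four features and their scaling below are hypothetical choices made only for illustration, and every value is assumed to be pre-scaled to [0, 1].

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse


def draw_face(ax, features):
    """Draw one simplified face; `features` holds four values in [0, 1]."""
    width, eye_size, brow_slant, smile = features

    # Head: an ellipse whose width grows with the first feature.
    ax.add_patch(Ellipse((0, 0), 1.2 + 0.6 * width, 2.0,
                         facecolor="none", edgecolor="black"))

    for side in (-1, 1):
        # Eyes: size driven by the second feature.
        ax.add_patch(Ellipse((0.35 * side, 0.3),
                             0.15 + 0.25 * eye_size, 0.1 + 0.15 * eye_size,
                             facecolor="none", edgecolor="black"))
        # Eyebrows: short segments tilted by the third feature.
        tilt = 0.2 * (brow_slant - 0.5)
        ax.plot([0.2 * side, 0.5 * side], [0.6 + tilt, 0.6 - tilt],
                color="black")

    # Mouth: a parabola that curves up (smile) above 0.5 and down below it.
    xs = [-0.3 + 0.06 * i for i in range(11)]
    ys = [-0.5 + 4 * (smile - 0.5) * x * x for x in xs]
    ax.plot(xs, ys, color="black")

    ax.set_xlim(-1.1, 1.1)
    ax.set_ylim(-1.2, 1.2)
    ax.set_aspect("equal")
    ax.axis("off")


def plot_faces(rows):
    """Draw one face per data row, side by side, and return the figure."""
    fig, axes = plt.subplots(1, len(rows), figsize=(2 * len(rows), 2.5),
                             squeeze=False)
    for ax, row in zip(axes[0], rows):
        draw_face(ax, row)
    return fig


# Three made-up rows of pre-scaled data.
example_rows = [(0.1, 0.2, 0.9, 0.8), (0.9, 0.9, 0.1, 0.2), (0.5, 0.5, 0.5, 0.5)]
fig = plot_faces(example_rows)
# fig.savefig("faces.png")  # uncomment to write the image to disk
```

Real data would need to be rescaled column by column before being passed in, and the choice of which variable drives which feature strongly affects what a viewer notices.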

My replication was greatly aided by a graphic that I call “the missing figure”, which I found on a website that has since disappeared. It helped me understand what Chernoff’s paper was talking about when it says “the corner of the face”:

Here are a few other ways to look at it:

Varying one parameter at a time:

Lots and lots of random faces:

Making that figure reminds me of a wonderfully vintage quote from Chernoff’s 1972 paper:

“At this time the cost of drawing these faces is about 20 to 25 cents per face on the IBM 360-67 at Stanford University using the Calcomp Plotter. Most of this cost is in the computing, and I believe that it should be possible to reduce it considerably.”

I’d pay $250 for a beautiful plot of that last figure. But it is nice that the price has come down.


Baby Faces II

I knew I read something about baby faces recently, and now I found it. It was in a Malcolm Gladwell book. This guy should be my inspiration.


Same Chernoff

I’ve been reading up on the Chernoff face for data visualization, which, as I’ve mentioned, is so cute. This helped me demonstrate that I haven’t forgotten everything from my grad school days, like a bound with a name similar to the face. Back in my olden days, I thought Chernoff bounds were the cutest, and Chernoff faces were quite far from what I spent my time on.

So what a nice continuity that I learned it was the same Chernoff who lent his name to both the bound and the face. He seems to have a good sense of humor about both, and says that a different Herman deserves credit for the bound:

My result, involving the infimum of a moment generating function, was less elegant and less general than the Cramer result, but did not require a special condition that Cramer required. Also, my proof could be described as crudely beating the problem to death. Herman claimed that he could get a lower bound much easier. I challenged him, and he produced a short Chebyshev Inequality type proof, which was so trivial that I did not trouble to cite his contribution.

What a mistake! It seems that Shannon had incorrectly applied the Central Limit theorem to the far tails of the distribution in one of his papers on Information theory. When his error was pointed out, he discovered the lower bound of Rubin in my paper and rescued his results. As a result I have gained great fame in electrical engineering circles for the Chernoff bound which was really due to Herman. One consequence of the simplicity of the proof was that no one ever bothered to read the original paper of which I was very proud. For years they referred to Rubin’s bound as the Chernov bound, not even spelling my name correctly. … Fortunately for me, my lasting fame, if any, will depend, not on Rubin’s bound, but on Chernoff faces.
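
For reference, here is the bound in the form it usually takes in textbooks today, together with the short Markov-inequality argument behind it. This is a sketch in modern notation, not a transcription of the statement in the original paper. It assumes a real random variable X with moment generating function M_X(t) = E[e^{tX}] and any threshold a.

```latex
\Pr[X \ge a]
  = \Pr\!\left[e^{tX} \ge e^{ta}\right]
  \le e^{-ta}\,\mathbb{E}\!\left[e^{tX}\right]
  \quad \text{for every } t > 0,
\qquad\text{hence}\qquad
\Pr[X \ge a] \le \inf_{t > 0} e^{-ta}\, M_X(t).
```

The first equality holds because the exponential is increasing for t > 0, and the inequality is Markov’s inequality applied to the nonnegative variable e^{tX}.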


What is “hello, world” for statistical graphics?

D3js axes are going to be hard to teach

Brushing handler, a recent development (?)

The key to interaction in D3js


Javascript style?

Animations and transitions in D3js

Force-directed graph with clusters in D3js

D3js speed

Global Data Viz in translation

IHME has recently worked with the World Bank to release a series of regional reports on relevant findings from the Global Burden of Disease 2010 Study. It is cool to see this work getting disseminated, and now even in non-English editions. This raises questions for data visualization translations, like should 1990 and 2010 be in reversed positions when accompanying right-to-left text?



GeoJSON for Norway Counties

Will I attend a MOOC?


Matplotlib and d3js, together at last

There is an exciting new project in pythonic interactive data visualization that I have my eye on: mpld3. It plays well with matplotlib-based pretty plotting packages, and has the beginnings of a plugin framework for adding custom interactivity.

I used it to mock up a Cartesian fish eye distortion plot, something I’ve wanted for DisMod-MR ever since I learned about it. (Sometimes the interactivity doesn’t work in that notebook, and requires reloading everything… cutting edge software has some rough edges.)
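
For readers who have not met the idea, a Cartesian fisheye stretches the axis near a focus point and compresses it farther away, while leaving the axis endpoints fixed. The sketch below is plain Python and is not the code from my notebook; it assumes the commonly used transform g(t) = (d + 1)t / (dt + 1) on the normalized distance t from the focus, and the function and parameter names are hypothetical.

```python
def fisheye(x, focus, distortion=3.0, lo=0.0, hi=1.0):
    """Map x in [lo, hi] so that space near `focus` is magnified.

    `distortion` = 0 leaves x unchanged; larger values magnify more.
    """
    if x == focus:
        return focus
    left = x < focus
    # Distance from the focus to the axis boundary on the same side as x.
    m = (focus - lo) if left else (hi - focus)
    if m == 0:
        m = hi - lo
    t = abs(x - focus) / m  # normalized distance from the focus, in [0, 1]
    g = (distortion + 1) * t / (distortion * t + 1)
    return focus + (-1 if left else 1) * m * g


# Distort ten evenly spaced tick positions around a focus at 0.5.
ticks = [i / 10 for i in range(11)]
warped = [fisheye(x, focus=0.5) for x in ticks]
```

For a two-dimensional plot, the same function is applied to the x and y coordinates independently, each with its own focus, which is what keeps grid lines straight.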


Stylish tooltips in mpld3

IDV in Python: Interactive heatmap with Pandas and mpld3

I’ve been having a good time following the development of the mpld3 package, and I think it has a lot of potential for making interactive data visualization part of my regular workflow instead of that special something extra. A few weeks ago, an mpld3 user showed up with an interesting challenge, and solved their own problem quite well.

I finally got a chance to look at it today, and with a little spit-and-polish this could be something really useful for me.



IDV in Python: adding text callouts to a scatter plot interactively with mpld3

I’ve been pretty interested in the potential of interactive data visualization recently, especially ever since I saw the reaction that the Global Burden of Disease 2010 visualization tool, GBD Compare, received last year. And one promising technology for making this stuff routine is mpld3, a mashup of the Python plotting library matplotlib and the JavaScript visualization kernel d3. Have I mentioned this before?

The thing about interactive data visualization is that it’s not always clear what is useful because it excites my reptile brain, and what is useful for more logical reasons. But I was asking a colleague to add some callouts to a (non-interactive) figure recently when I realized that this is a chance for interactivity to be _obviously_ useful. These finishing touches on a graphic often take me tons of time, and using a command-line plotting program just can’t be the right way to do it. How about an mpld3 plugin that lets me add text callouts interactively? And when I’m done, it can “save” the callouts, by creating the necessary Python script to generate them again? Here it is, in a notebook.
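
The “save” step is the easiest part to illustrate on its own. The following is a minimal sketch, not the plugin from the notebook: it assumes the callouts have already been collected as a list of dictionaries with hypothetical keys `text`, `xy`, and `xytext`, and it writes out one Matplotlib `annotate` call per callout.

```python
def callouts_to_script(callouts, ax_name="ax"):
    """Return Python source that recreates each callout with Axes.annotate.

    `callouts` is a list of dicts with keys "text", "xy" (the point being
    labeled) and "xytext" (where the label sits); this layout is an
    assumption made for illustration.
    """
    lines = []
    for c in callouts:
        lines.append(
            "{ax}.annotate({text!r}, xy={xy!r}, xytext={xytext!r}, "
            "arrowprops=dict(arrowstyle='->'))".format(
                ax=ax_name,
                text=c["text"],
                xy=tuple(c["xy"]),
                xytext=tuple(c["xytext"]),
            )
        )
    return "\n".join(lines)


# One made-up callout, as it might come back from an interactive session.
script = callouts_to_script(
    [{"text": "outlier", "xy": (1.0, 2.0), "xytext": (1.5, 2.5)}]
)
print(script)
```

Pasting the generated lines after the plotting commands of the original script reproduces the callouts without any interactive step.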


SciPy2014 Plotting Contest
