Data Chain for January 2016 – Things We Say

One of the things I want to do in 2016 is get more involved with the Tableau community.  I’m not calling it a resolution, since that would jinx it and doom it to failure.  Luckily for me, a number of projects were created after TCC15 to encourage such involvement (Data Dare, Makeover Monday, Data Chain, etc.).  I chose Data Chain because I liked the idea of a regular deadline and someone on the other end expecting me to deliver something, namely a postcard.

The Data Chain assignment for January was “Things We Say”.  I initially interpreted this literally and considered counting specific things that I say (like swear words, thank-yous, etc.) but then after watching a video from my friend on having more meaningful connections with people, I decided to track how well (or not) I do that.  Specifically, I wanted to track my conversations throughout the week.  Since I work from home, and we live on a couple of acres, I spend a fair bit of time on my own.  I’m not isolated in the same way Jack was in The Shining, but I do experience long stretches of time when I don’t physically encounter another human being outside of my immediate family.  So I thought it would be interesting to see how often I have actual conversations with people and thereby opportunities for meaningful connection.


Data Collection

There’s no app for tracking conversations (although I’m sure the NSA is working on one), so I set up a Google Sheet to capture my data.  As there are no clear rules for what defines a conversation, I also had to establish some basic parameters to guide my data collection activities:

  • Live Conversation – has to last at least 5 minutes
  • Staggered Conversation (i.e. things like email and text where there can be a delay between responses but a single topic of conversation is sustained) – has to have at least 3 communications from each side.
  • Group meetings don’t count.  I participate in many of these every day, but I wouldn’t qualify them as opportunities for meaningful connection with other people given their focus on things like project timelines, business performance, etc.

My goal with this was to filter out all of the short exchanges that litter a day (e.g. quick greetings, single comments on Facebook, brief IMs along the lines of “Hey, can I get X” followed by “Here you go!”, etc.).  I also added fields for whether the conversation was personal or work-related, what type of communication it was (in person, phone, IM, etc.), what time of day it occurred, etc.
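For anyone who would rather apply those rules in code than by eyeball, here is a minimal Python sketch of the filter.  The `Exchange` record and the labels "live", "staggered" and "group" are names I made up for illustration; they are not the columns in my actual Google Sheet.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    kind: str                   # hypothetical labels: "live", "staggered", "group"
    minutes: float = 0          # duration, used for live exchanges
    messages_per_side: int = 0  # fewest messages sent by either side (staggered)


def counts_as_conversation(e: Exchange) -> bool:
    """Apply the three ground rules listed above."""
    if e.kind == "group":
        return False                      # group meetings never count
    if e.kind == "live":
        return e.minutes >= 5             # live: at least 5 minutes
    if e.kind == "staggered":
        return e.messages_per_side >= 3   # staggered: 3+ messages from each side
    return False                          # anything unrecognized is ignored


log = [
    Exchange("live", minutes=2),                 # quick greeting
    Exchange("live", minutes=25),                # real chat
    Exchange("staggered", messages_per_side=1),  # "Hey, can I get X" / "Here you go!"
    Exchange("group", minutes=60),               # project meeting
]
kept = [e for e in log if counts_as_conversation(e)]
```

In this made-up log only the 25-minute chat survives, which is the kind of filtering I was after.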

Setting up the spreadsheet was certainly easier than populating it.  I’ve never done any “quantified self” analysis before, and so I had to constantly remind myself to pay attention to my exchanges and dutifully enter the relevant data.  Having the spreadsheet open permanently on my second laptop helped with the work-related conversations, but things got a lot harder when I was away from my office.  Even though I could enter the data via the Google Sheets app on my phone, I had to remember to do that.  This exercise definitely gave me renewed appreciation for what Dear Data and Dear Data 2 have accomplished; I just needed to produce something in a month, while they needed to do something every week!

The Visualization

I wanted a visual that I could not only draw on a postcard (with my limited drawing skills) but that I could also replicate in Tableau.  Some of Andy Kriebel’s recent Dear Data 2 posts (beauty and positive feelings) employed hub and spoke charts to organize the data in a way that was not only visually appealing, but seemed relatively easy to create on paper.  The only thing I thought I might want to do a little differently was to have the length of the spokes vary based on the length of the conversation.  Shouldn’t be too hard, right?

Wrong.  Figuring this part out was definitely harder than it needed to be (and I probably spent far too much time on it, but frustration can quickly morph into obsession).  When I started Googling articles on how to create these hub and spoke charts, I found very few resources that explained how to easily generate the x and y coordinates.  Many of them veered into explanations of network graphs (which can be mesmerizing) or were based on having geographic data (where the lat/long coordinates became the natural x and y values), or just assumed I wasn’t scared of math (“simple trigonometry” are two words I don’t use together, and I have no idea why I’d ever need to understand $\mathbf{B}(t) = (1 - t)[(1 - t) \mathbf{P}_0 + t \mathbf{P}_1] + t[(1 - t) \mathbf{P}_1 + t \mathbf{P}_2],\ 0 \le t \le 1$ in order to create a chart that resembles a flower drawn by a kindergartner).

I just needed a way to assign somewhat meaningful x and y coordinates to my 45 rows of non-geographic data.  If I had only a handful of data points I needed to plot, this post by Andy Kriebel would have worked well.  But I needed something a bit less tedious.  NodeXL (an Excel plug-in) was mentioned in numerous articles so I downloaded that and entered my “edge” values (i.e. the “to” and “from” data points) only to generate something that looked sophisticated but told me nothing.

Could I study this tool more, and network graphs in general, and produce a better result?  Yup.  Did I have the time or the desire to do so?  Nope.  So I decided to take a different approach and generate the numbers myself based on the available data.  As I said before, I wanted the length of the line to represent the duration of the conversation.  I also wanted a visual way to differentiate some of the key attributes, specifically whether it was a personal or work conversation, and then whether it was face-to-face (i.e. I could see the person) or not.  I decided on the following:

  • Personal conversations would point up, while work conversations would point down.
  • Face-to-face conversations would point to the right, while the rest would point to the left.

Using those simple rules, I threw the file into Alteryx and had it spit out a modified Excel file with two rows for each interaction (since the hub and spoke needs these two points in order to draw the line) along with the appropriate vector coordinates.  I could have done this in Excel, but it’s hard not to fire up Alteryx if you have access to it.  Although, to be honest, having Alteryx perform these fairly simple data preparations sometimes feels like asking The Incredible Hulk to open a jar that’s just a wee bit too tight.

The resulting file looked like this:
Each interaction had two rows, with the field Path Order identifying the 1st and 2nd row for each.  The X & Y coordinates for Path Order = 1 are all (0, 0), since I wanted the lines to all start from the center, while the X & Y coordinates for Path Order = 2 are just the duration, with the sign (positive or negative) being determined by what kind of conversation it was.  If it was personal, then the Y coordinate is positive, but if it was work-related then the Y is negative.  Similarly, if it was a face-to-face conversation (e.g. in-person, video chat) then the X coordinate is positive, but if it wasn’t then the X value is negative.  Pretty rudimentary, but I could wrap my brain around it.
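If you don’t have Alteryx handy, the same two-rows-per-conversation logic is only a few lines of Python.  This is a rough sketch of the rules described above, not the actual workflow I built; the function name and the dictionary keys are placeholders I picked for the example.

```python
def spoke_rows(conv_id, duration, personal, face_to_face):
    """Return the two rows (hub and tip) that draw one spoke.

    duration     -- length of the conversation in minutes
    personal     -- True for personal, False for work
    face_to_face -- True if I could see the person (in person, video chat)
    """
    x = duration if face_to_face else -duration  # right vs. left
    y = duration if personal else -duration      # up vs. down
    return [
        {"id": conv_id, "path_order": 1, "x": 0, "y": 0},  # every spoke starts at the hub
        {"id": conv_id, "path_order": 2, "x": x, "y": y},  # tip of the spoke
    ]


# A 30-minute work phone call points down and to the left.
rows = spoke_rows(conv_id=1, duration=30, personal=False, face_to_face=False)
```

Because X and Y both equal the duration (give or take the sign), every spoke runs along a diagonal, so all the conversations of the same kind sit on the same line.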
I pointed Tableau to the Excel file, constructed the viz the way the articles instructed, and voila…

Ummm….that’s not right.  Well, it is right based on the data I fed in, but it isn’t what I wanted it to look like.  I wanted some space between the data points vs. having them overlap like that.  Luckily, there’s a really easy way to add that space inside of Tableau.  I remembered a jittering method I’d employed before and gave that a try.  I created a new field called X * Index (where I just multiplied the X measure by the INDEX() function) and then placed that on the columns shelf instead of the X measure.  I set the Compute Using to be at the level of detail required to get the right amount of space….
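To show what that multiplication does outside of Tableau, here is a tiny Python sketch.  Tableau’s INDEX() returns the position of the current row within its partition, starting at 1, so the list below stands in for one partition; which rows end up in a partition depends on the Compute Using setting, and the values here are made up.

```python
def jitter_x(xs):
    """Multiply each X by its 1-based position, like X * INDEX() in one partition."""
    return [x * i for i, x in enumerate(xs, start=1)]


# Three 10-minute conversations that would otherwise sit on top of each other.
spread = jitter_x([10, 10, 10])  # [10, 20, 30]

# The hub rows stay at the center because 0 times anything is still 0.
hub = jitter_x([0, 0, 0])        # [0, 0, 0]
```

The trade-off is that X no longer equals the duration once it has been stretched, so the spacing is there for readability rather than as a measurement.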

…and finally ended up with the visualization I wanted.


So did I learn anything about myself through this exercise?  Yes and no.  I did expect the general distribution that my conversations took (personal being predominantly in person while work was all virtual, work dominating the mornings and afternoons, at least during the week, etc.), but I was surprised by two things:

1. Despite the fact that I have a number of friends whom I interact with online, I don’t have that many actual conversations with them.  My interactions are generally quite brief.  This is something I should definitely work on.

2. I spent an average of 205 minutes each day (out of a possible 1,020 waking minutes) having conversations with people, with a couple days considerably below that already low average.  While I know some of that other time was spent doing normal things like working, reading, editing my photos, watching TV, etc., it does highlight to me that I might be more cocooned than I want to be.

There might be something to this quantified self stuff. 🙂

The Postcard

As I haven’t drawn anything in quite a while, I decided to solicit some assistance from my 5-year-old son, who draws all of the time.  I showed him what I was trying to do, and while he sort of liked the visual I was going for, he much preferred the “different colored suns” that he saw in Andy’s post…and so he drew one of those instead, naturally (to a 5-year-old) swapping out the sun for a black hole. 🙂

The Tableau Workbook

If the embedded version below doesn’t render properly, you can click over to it here.

