I’m fairly new to the state of Washington (having only moved here 6 months ago), and am very eager to explore all of the natural wonders that now surround me. But unlike my younger years, where I could just head out on my own for hikes of undetermined length and difficulty (so long as I didn’t die, I was totally okay with the uncertainty that went along with unplanned forays into the outdoors), I now need to be a bit more pragmatic and find hikes that my family will enjoy, specifically my kids. The 4-year old gets excited by anything new but has little legs, while the teenage girls can certainly walk longer but have a pretty steep boredom threshold.
There are a number of great resources out there for researching hikes. The two I use the most are the Washington Trail Association website and the book The Best Hikes with Kids. Both are very thorough, with lots of great information, but in both cases I often find myself flipping back and forth between pages as I try to compare hikes. The impatient data geek in me figured there had to be a better way.
Getting the Data
The first thing I needed to do was grab the necessary data. Up until now, I’ve always dealt with tabular data, pulled either from Excel or a data warehouse (spoiled, I know). In this case, the data I needed was inside pages and pages of search results. Furthermore, some of the data wasn’t on the main search results page, but rather one level deeper on the hike details page. Even after I filtered the Washington Trail Association’s search results for just the kid-friendly hikes, that still left well over 500 hits. That’s a lot of data to manually transcribe, and I’ve never been that ambitious.
While most of the data I needed was in string or numeric format, the hike features (e.g. whether there were mountain views, whether dogs were allowed, etc.) were all represented by icons, so I couldn’t just map those to a text column. However, if I changed the column type to html, I could grab the underlying URL string for each unique icon, figuring I could just strip out everything but the feature string later.
Another problem was that the features (and icons) were inconsistent between hikes. In other words, not only did hikes contain a different number of features, but the order of the icons could vary from hike to hike. So I just created nine generic icon fields (e.g. icon 1, icon 2, etc.) and mapped them to the specific location on each page. This resulted in each icon field containing multiple values, but I knew this was something I could correct after I had extracted the data.
Preparing the data for Tableau
After extracting all of the hike data, I ended up with two CSV files – one with the primary hike data, and another with the secondary data. The first one could be imported into Tableau without incident, since it was structured optimally with one row per hike and each column representing a unique attribute like name, region, rating etc. The second file, however, contained those nine separate icon columns. Before I could import that into Tableau, I had to turn those into two columns, one with the icon number and the second with icon URL string.
Using the Tableau Excel Reshaping tool made that very easy. Now I could import this second file into Tableau as well.
The final piece of data prep was stripping out all of the URL string junk and just leaving the feature description. A Tableau string function using a combination of mid, len and find did the trick. I now had two data files I could play with.
Were I to need to do this type of data prep regularly, or on a significantly larger data set, I think I’d definitely look to Alteryx to automate the workflow. But for a small and ad-hoc use-case like this, import.io and the Tableau data reshaping tool were more than sufficient, despite the little hiccups.
Building the Tableau View
I wanted my Tableau view to do a few things:
1. Let me filter the list of possible hikes based on things like region, feature set, distance etc.
2. Compare those hikes in terms of rating, popularity and distance
3. See where they were on a map
4. Link out to additional info like trip reports and photos
The only slightly tricky part was #1, simply because I had imported the hike data with multiple rows per hike in order to accommodate the feature field. And while I could use the single Hike Feature field to filter those rows, what I really wanted the filter to do was only show me the hikes that met all of my filter criteria. For example, if I selected ‘Waterfalls’, ‘Mountain Views’ and ‘Dogs Allowed’, I would want to see only those hikes where all three of those were true. As you can see below, simply applying the Hike Feature filter gives me all hikes where any of those criteria are true.
As happens more often than not in Tableau, table calculations to the rescue! I created three table calculations to help me identify which hikes met all of the features in my filter set:
1. # of Selected Features – I used total(countd([Hike Feature])), set to compute along Table Down to give me the total number of features that had been selected.
2. # of Matching Features – I used window_sum(countd([Feature Number])), set to compute along all Names and Hike Features but restarting every Name, to tell me how many features were present per unique hike.
3. Matching Hikes? – this is just a Boolean comparison of the two prior calculations ([# of Matching Features]=[# of Selected Features]) to tell me which hikes had the same number of features as the number selected. Placing this on the filter shelf, with only “True” selected, ensures that my view will only contain hikes that meet all of my filter criteria.
Once I had that, the rest was predominantly formatting & design decisions.
- I had originally planned to make a map the central part of the view, but felt that it wouldn’t allow for the kind of comparisons I wanted to make. In addition, you couldn’t see all of the hike names at once, which you can with a bar chart. So I placed a comparative bar chart in the top (i.e. main) section of the view, and used the map as a supporting chart, linked to the bar chart via a highlight action.
- I first placed all of the hike details and description in the tool-tip, but then quickly realized that the tool-tip often obscured the map below (given how long some of the hike descriptions were). So I moved the hike info down next to the map, and left the bar chart tool-tip leaner, focusing primarily on the links to other resources.
- Speaking of the links, I wanted to provide more than just a link back to the hike details page that I pulled the data from. So I also created a link to a Google Image search, passing the hike name + “WA” as a search parameter (given how generic some of the hike names were, adding WA to the string increased the changes of getting relevant photos). Being a photographer, I often want to assess a hike based on the photographic opportunities it might offer me. This is something that often drives my family insane. 🙂
- I created a parameter and an associated calculated field that lets me sort the hikes based on things like rating, popularity, distance etc.
- Since I borrowed many of the icons from the WTA website, I chose a green hue for the text and bars that I felt was complimentary.
Click the image of the dashboard to go to the interactive version.
Thanks for reading.