Wednesday, October 7, 2009

"Mystery Data CSV" Parsing


This week for Visualizing Data we were given a "mystery" data set, along with some hints that the set (wink, wink) might contain x-y coordinates. This was an exercise not only in parsing CSVs, but also in taking in data and deriving meaning from it.


A quick parse of the data revealed the x-y coordinates, which turned out to trace a map of the world. The third (data) value was indeterminate, but appeared to represent some sort of variable (population, energy consumption?) associated with more populous areas. When used as a pixel's alpha value, the picture quickly came to resemble the well-known images of Earth from space at night.
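
For anyone who wants to try the same idea, here is a minimal Python sketch of that parse. It is not the code from this post. It assumes each row holds three numeric fields in the order x, y, value, and the function names `parse_points` and `to_alpha` are made up for illustration. The linear scaling onto 0..255 is also just one plausible way to turn the third value into an alpha.

```python
import csv
import io


def parse_points(text):
    """Parse rows of 'x,y,value' into float triples, skipping rows that don't fit."""
    points = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # blank or short line
        try:
            x, y, value = (float(field) for field in row[:3])
        except ValueError:
            continue  # non-numeric row, e.g. a header
        points.append((x, y, value))
    return points


def to_alpha(points):
    """Scale the third column linearly onto 0..255 (an assumed mapping)."""
    if not points:
        return []
    values = [v for _, _, v in points]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
    return [(x, y, round(255 * (v - lo) / span)) for x, y, v in points]


# Made-up sample rows, only to show the shape of the input.
sample = "x,y,value\n10,20,0.0\n30,40,5.0\n50,60,10.0\n"
pixels = to_alpha(parse_points(sample))
```

Each entry in `pixels` is an (x, y, alpha) triple, so drawing one point per entry with that alpha gives the "lights at night" look.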


While this was all well and good, it didn't seem to reveal anything about the data, other than that it was exactly what it appeared to be, and that there were worldwide trends. In an effort to learn slightly more about it, I decided to project the data values onto the y-axis, and the original y-axis into z space. This meant that the map was being rendered horizontally, with the height above the map representing the data value at a given point.
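
The remapping itself is only an axis swap. A rough Python sketch, assuming points are (x, y, value) triples and using a hypothetical `height_scale` parameter to exaggerate the heights (neither the name nor the parameter comes from the original code):

```python
def to_3d(points, height_scale=1.0):
    """Turn (x, y, value) into (x, height, depth).

    The data value becomes the new y (height) and the old y becomes z (depth),
    so the map lies flat and the data rises out of it.
    """
    return [(x, value * height_scale, y) for x, y, value in points]


# Made-up points, only to show the swap.
flat = [(1.0, 2.0, 3.0), (4.0, 5.0, 0.5)]
raised = to_3d(flat, height_scale=2.0)
```

Whether height should point up or down depends on the renderer's coordinate system, so the sign of `height_scale` may need flipping.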


Once this was done, it revealed a few more interesting facts about the data:


1) Despite the "hot spots", there isn't a particular area of the world that lacks high data points. High and low values occur across the entire breadth of the map.


2) The data appears to be highly stratified across the map, resulting in data "rows" at distinct levels on the y-axis. While I can't be sure why this is, it seems likely that these "rows" are the result of estimates or rounding employed in the data collection.
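
One way to check the second observation without a 3D view is to count how many distinct values the third column actually takes. The Python sketch below is only an illustration: `value_levels` is a hypothetical helper, the sample points are invented, and rounding to six decimals is an arbitrary choice to absorb floating-point noise.

```python
from collections import Counter


def value_levels(points, decimals=6):
    """Count how often each (rounded) data value occurs across all points."""
    return Counter(round(value, decimals) for _, _, value in points)


# Invented points; a real data set would come from the parsed CSV.
sample_points = [
    (0, 0, 1.0), (1, 0, 1.0),
    (2, 1, 2.0), (3, 1, 2.0), (4, 2, 2.0),
    (5, 2, 3.0),
]
levels = value_levels(sample_points)
distinct = len(levels)
```

If `distinct` is small compared with the number of points, the values are clustered on a few levels, which would be consistent with rounding or estimation, though it doesn't prove it.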


Overall, this exercise had me parse CSVs, which is relatively trivial. However, it also forced me to look at the data a little more closely, and doing so revealed some facts that might otherwise have been overlooked in the 2D model.


Download code by clicking here
