Wednesday, November 14, 2012

So Far this Semester

This semester I am working with Jim McCusker on the Health Data Project as part of an independent study with Professor McGuinness. I have been working on a variety of graphs and charts using the d3 visualization library. All of the charts accept data and a JSON object which contains options. I have also included clustering, polynomial fitting, and other analysis algorithms to improve the value of the charts. So far I have created code which can produce bar graphs, heat maps, parallel coordinates, pie charts, and scatter plots.

Above is an example of my pie chart. Given a one-dimensional array, it gives each entry a slice whose area is proportional to the entry's value. I plan on improving this graph by adding an optional legend. I am also trying to devise a way to assign colors based on some analysis of the data. I also plan to allow each section to be broken down further, which would allow the creation of a radial tree.
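To make the "slice proportional to value" idea concrete, here is a minimal sketch of how the slice angles could be computed. The function name `pieSlices` and the shape of the returned objects are hypothetical, chosen for illustration; this is not the project's actual code, and it assumes the values are non-negative.

```javascript
// Hypothetical helper (not the project's code): turns a one-dimensional
// array of non-negative values into slices whose angles, in radians,
// are proportional to each value.
function pieSlices(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  let start = 0;
  return values.map((value) => {
    // Guard against an all-zero array so we never divide by zero.
    const angle = total === 0 ? 0 : (value / total) * 2 * Math.PI;
    const slice = { value, startAngle: start, endAngle: start + angle };
    start += angle;
    return slice;
  });
}

// Example: the third entry is twice as large as the others,
// so it receives half of the circle.
const slices = pieSlices([1, 1, 2]);
```

Because every slice shares the same radius, making the angle proportional to the value also makes the area proportional to it.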
The above graph is an example of my Parallel Coordinates graph. The difference in color was produced using a K-means clustering algorithm. The axes were ordered to minimize the linear and quadratic least squares regression errors between neighboring axes. More information on this algorithm can be found here. I implemented both of these analytical tools last semester. Since then I have implemented a second ordering algorithm which relies on a visual heuristic rather than a mathematical one. The new algorithm minimizes the number of times the lines cross between neighboring axes. I am also working on a heuristic which minimizes the total of the slopes of all the lines drawn. I am also considering implementing a different clustering algorithm, Expectation Maximization (EM). The EM algorithm produces the probability that each point lies in each cluster. This is unlike the K-means algorithm, which assigns each point to exactly one cluster. Using the EM algorithm, the color of each point can be based on these probabilities, which would allow for easier identification of outliers.
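The crossing heuristic can be illustrated with a small sketch. This is only one plausible way to score an axis ordering, not necessarily how my implementation does it. The names `countCrossings` and `totalCrossings` are hypothetical, and the sketch assumes each axis is stored as an array of values with one entry per data point, with all axes increasing in the same direction.

```javascript
// Hypothetical sketch: count how many pairs of lines cross between two
// neighboring axes. Lines i and j cross when their relative order on the
// left axis is the opposite of their relative order on the right axis.
function countCrossings(left, right) {
  let crossings = 0;
  for (let i = 0; i < left.length; i++) {
    for (let j = i + 1; j < left.length; j++) {
      if ((left[i] - left[j]) * (right[i] - right[j]) < 0) {
        crossings++;
      }
    }
  }
  return crossings;
}

// Score a full axis ordering by summing crossings over neighboring axes.
// `columns` maps an axis name to its values; `order` lists axis names.
function totalCrossings(columns, order) {
  let total = 0;
  for (let k = 0; k + 1 < order.length; k++) {
    total += countCrossings(columns[order[k]], columns[order[k + 1]]);
  }
  return total;
}

// Example data: axis b is the reverse of axes a and c.
const columns = { a: [1, 2, 3], b: [3, 2, 1], c: [1, 2, 3] };
const crossingsABC = totalCrossings(columns, ["a", "b", "c"]);
const crossingsACB = totalCrossings(columns, ["a", "c", "b"]);
```

An ordering search would then prefer `["a", "c", "b"]`, since placing the two identical axes next to each other removes the crossings between them. The pairwise loop is quadratic in the number of points, which is fine for a sketch but may need a faster inversion count on large data sets.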


The above graph is an example of my Scatter Plot chart. This chart allows for the inclusion of a polynomial computed using a least squares algorithm. The algorithm supports any polynomial of degree zero or more. In the above graph, the polynomial has degree one. The polynomial accentuates the trend in the data.
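For readers curious about the fit itself, here is a minimal sketch of a least squares polynomial fit using the normal equations and Gaussian elimination. This is a textbook approach and not necessarily the one my chart uses. The name `polyFit` is hypothetical, and the sketch assumes there are at least `degree + 1` distinct x values; otherwise the system is singular and the result is not meaningful.

```javascript
// Hypothetical sketch: least squares fit of a polynomial of the given
// degree. Returns coefficients [c0, c1, ..., cDegree] for
// c0 + c1*x + ... + cDegree*x^degree.
function polyFit(xs, ys, degree) {
  const n = degree + 1;
  // Build the augmented normal-equation matrix [A^T A | A^T y],
  // where entry (r, c) of A^T A is the sum of x^(r + c).
  const m = [];
  for (let r = 0; r < n; r++) {
    const row = new Array(n + 1).fill(0);
    for (let i = 0; i < xs.length; i++) {
      for (let c = 0; c < n; c++) {
        row[c] += Math.pow(xs[i], r + c);
      }
      row[n] += ys[i] * Math.pow(xs[i], r);
    }
    m.push(row);
  }
  // Gauss-Jordan elimination with partial pivoting.
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
        pivot = r;
      }
    }
    const tmp = m[col];
    m[col] = m[pivot];
    m[pivot] = tmp;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const factor = m[r][col] / m[col][col];
      for (let c = col; c <= n; c++) {
        m[r][c] -= factor * m[col][c];
      }
    }
  }
  // The matrix is now diagonal, so each coefficient is a simple division.
  return m.map((row, i) => row[n] / row[i]);
}

// Example: points that lie exactly on y = 1 + 2x, fitted with degree one.
const line = polyFit([0, 1, 2, 3], [1, 3, 5, 7], 1);
```

With degree zero the fit reduces to the mean of the y values. The normal equations become poorly conditioned at high degrees, so a production version might prefer a QR-based solver.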
Above is not a random assortment of colored rectangles, but an example of my Heat Map chart. In its current state, the Heat Map is not very useful. I plan on improving it by making the transitions between the rectangles a gradient. I also plan on adding an elegant way of displaying the names of the dimensions to give it more meaning. I was also thinking about creating an algorithm that would allow the reordering of ordinal dimensions in order to minimize the difference between adjacent points.
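As a rough illustration of the reordering idea, here is a sketch of a greedy nearest-neighbor ordering of heat map rows. Nothing like this is implemented yet, so the approach, the function names `rowDistance` and `greedyRowOrder`, and the choice of summed absolute difference as the distance are all assumptions. A greedy pass is also not guaranteed to find the best ordering.

```javascript
// Hypothetical distance between two heat map rows: the sum of absolute
// differences between corresponding cells.
function rowDistance(a, b) {
  return a.reduce((sum, v, i) => sum + Math.abs(v - b[i]), 0);
}

// Hypothetical greedy ordering: start with the first row, then repeatedly
// append the unused row closest to the row placed last.
// Returns an array of row indices.
function greedyRowOrder(rows) {
  if (rows.length === 0) return [];
  const order = [0];
  const remaining = new Set();
  for (let i = 1; i < rows.length; i++) {
    remaining.add(i);
  }
  while (remaining.size > 0) {
    const last = rows[order[order.length - 1]];
    let best = -1;
    let bestDist = Infinity;
    for (const i of remaining) {
      const d = rowDistance(last, rows[i]);
      if (d < bestDist) {
        best = i;
        bestDist = d;
      }
    }
    order.push(best);
    remaining.delete(best);
  }
  return order;
}

// Example: two low rows and two high rows, interleaved in the input.
const rowOrder = greedyRowOrder([[0, 0], [10, 10], [1, 1], [9, 9]]);
```

In the example the greedy pass groups the two low rows together and the two high rows together, which is the kind of smoothing the reordering is meant to achieve.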

Above is an example of my Bar Chart. It supports showing multiple dimensions at once, with variable padding between each group. The orientation of the chart can also be adjusted by changing a single value in the JSON object supplied to the function. Due to the nature of the Bar Chart, I have no analytic tools planned to add to it.

All of these were designed for use in the Health Data Project; however, I have designed them to be as modular as possible.