Wednesday, November 14, 2012

So Far this Semester

This semester I am working with Jim McCusker on the Health Data Project as part of an independent study with Professor McGuinness.  I have been working on a variety of graphs and charts using the d3 visualization library.  All of the charts accept data and a JSON object which contains options.  I have also included clustering, polynomial fitting, and other analysis algorithms to improve the value of the charts.  So far I have created code which can produce bar graphs, heat maps, parallel coordinates, pie charts, and scatter plots.
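Roughly, the charts are invoked like this; the function name and option keys below are placeholders to illustrate the shared interface, not my actual API:

```javascript
// Minimal sketch of the shared calling convention: each chart is a function
// that takes a d3 selection, an array of data, and a JSON options object.
// barChart and the option keys are illustrative names only.
var data = [4, 8, 15, 16, 23, 42];

var options = {
  width: 640,        // width of the generated SVG in pixels
  height: 480,       // height of the generated SVG in pixels
  padding: 0.1       // spacing between elements, as a fraction of their size
};

barChart(d3.select("#chart"), data, options);   // hypothetical bar graph call
```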

Above is an example of my pie chart graph.  Given a one-dimensional array, it gives each entry a slice whose area is proportional to its value.  I plan on improving this graph by adding an optional legend.  I am also trying to devise a way to assign colors based on some analysis of the data.  I was also planning on allowing each section to be broken down further, which would allow the creation of a radial tree.
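The core of the slice construction looks roughly like the sketch below, written against the d3 v2/v3-era pie layout and arc generator; my chart wraps this in the shared options interface, so the element id and colors here are only illustrative:

```javascript
// Each value gets an angular slice proportional to its share of the total.
var data = [12, 5, 8, 20];
var width = 300, height = 300, radius = Math.min(width, height) / 2;

var color = d3.scale.category10();        // simple categorical palette

var arc = d3.svg.arc()                    // converts angles into SVG path data
    .innerRadius(0)
    .outerRadius(radius);

var pie = d3.layout.pie()                 // computes start and end angles
    .value(function(d) { return d; });

var svg = d3.select("#pie").append("svg")
    .attr("width", width)
    .attr("height", height)
  .append("g")
    .attr("transform", "translate(" + width / 2 + "," + height / 2 + ")");

svg.selectAll("path")
    .data(pie(data))
  .enter().append("path")
    .attr("d", arc)
    .style("fill", function(d, i) { return color(i); });
```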
The above graph is an example of my Parallel Coordinates graph.  The difference in color was produced using a K-means clustering algorithm.  The axes were ordered to minimize the linear and quadratic least squares regression errors between neighboring axes.  More information on this algorithm can be found here.  I implemented both of these analytical tools last semester.  Since then I have implemented a second ordering algorithm which relies on a more visual heuristic as opposed to a mathematical one.  The new algorithm minimizes the number of times the lines cross between neighboring axes.  I am also working on a heuristic which minimizes the total of the slopes of all the lines drawn.  I am also considering implementing a different clustering algorithm, Expectation Maximization (EM).  The EM algorithm produces the probability that each point lies in each cluster, unlike the K-means algorithm, which assigns each point to exactly one cluster.  Using the EM algorithm, the color of each point can be based on these probabilities, which would allow for easier identification of outliers.
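To give a flavor of the crossing-count heuristic, here is a simplified greedy sketch; it is not necessarily how my implementation orders the axes, and it assumes each datum is an object keyed by dimension name:

```javascript
// Two lines cross between neighboring axes a and b exactly when their order
// on axis a differs from their order on axis b.
function crossings(data, a, b) {
  var count = 0;
  for (var i = 0; i < data.length; i++) {
    for (var j = i + 1; j < data.length; j++) {
      var da = data[i][a] - data[j][a];
      var db = data[i][b] - data[j][b];
      if (da * db < 0) count++;   // relative order flips, so the lines cross
    }
  }
  return count;
}

// Greedy ordering: start from the first dimension and repeatedly append the
// unused dimension with the fewest crossings against the last axis placed.
function orderAxes(data, dims) {
  var order = [dims[0]], remaining = dims.slice(1);
  while (remaining.length > 0) {
    var last = order[order.length - 1], best = 0;
    for (var i = 1; i < remaining.length; i++) {
      if (crossings(data, last, remaining[i]) < crossings(data, last, remaining[best])) {
        best = i;
      }
    }
    order.push(remaining.splice(best, 1)[0]);
  }
  return order;
}
```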


The above graph is an example of my Scatter Plot chart.  This chart allows for the inclusion of a polynomial computed using a least squares algorithm.  The algorithm supports any polynomial of degree zero or more.  In the above graph, the polynomial has degree one.  The polynomial allows the trend in the data to be accentuated.
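For reference, a self-contained least squares fit of arbitrary degree can be done with the normal equations; the sketch below is a simplified stand-in for the chart's actual fitting code:

```javascript
// Fit a polynomial of the given degree by solving (X^T X) c = X^T y, where X
// is the Vandermonde matrix of the x values.
function polyfit(xs, ys, degree) {
  var n = degree + 1, i, j, k;

  // Build the normal-equation matrix A = X^T X and vector b = X^T y.
  var A = [], b = [];
  for (i = 0; i < n; i++) {
    A.push([]);
    for (j = 0; j < n; j++) {
      var s = 0;
      for (k = 0; k < xs.length; k++) s += Math.pow(xs[k], i + j);
      A[i].push(s);
    }
    var t = 0;
    for (k = 0; k < xs.length; k++) t += ys[k] * Math.pow(xs[k], i);
    b.push(t);
  }

  // Solve A c = b with Gaussian elimination (no pivoting; fine for a sketch).
  for (i = 0; i < n; i++) {
    for (j = i + 1; j < n; j++) {
      var f = A[j][i] / A[i][i];
      for (k = i; k < n; k++) A[j][k] -= f * A[i][k];
      b[j] -= f * b[i];
    }
  }
  var c = new Array(n);
  for (i = n - 1; i >= 0; i--) {
    var sum = b[i];
    for (j = i + 1; j < n; j++) sum -= A[i][j] * c[j];
    c[i] = sum / A[i][i];
  }
  return c;   // coefficients c[0] + c[1]*x + ... + c[degree]*x^degree
}

// Example: fit a degree-one polynomial (a line), as in the screenshot above.
var coeffs = polyfit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], 1);
```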
Above is not a random assortment of colored rectangles, but an example of my Heat Map chart.  In its current state, the Heat Map is not very useful.  I plan on improving it by making the transitions between the rectangles a gradient.  I also plan on adding an elegant way of displaying the names of the dimensions to give it more meaning.  I was also thinking about creating an algorithm that would allow the reordering of ordinal dimensions in order to minimize the difference between adjacent points.
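The gradient idea amounts to mapping each cell's value onto a continuous color scale rather than a discrete bucket; since d3's linear scale interpolates between colors, something like the following sketch (toy data and placeholder colors) should work:

```javascript
// Toy 2-by-3 heat map; flatten it to find the overall value range.
var values = [ [1, 4, 2], [7, 3, 5] ];
var flat = values.reduce(function(a, row) { return a.concat(row); }, []);

// Linear scale from the smallest value (white) to the largest (dark blue).
var color = d3.scale.linear()
    .domain([d3.min(flat), d3.max(flat)])
    .range(["#ffffff", "#2171b5"]);

// color(value) now returns a smoothly interpolated fill for each rectangle.
```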

Above is an example of my Bar Chart.  It supports showing multiple dimensions at once, with variable padding between each group.  The orientation of the chart can also be adjusted by changing a single value in the JSON object supplied to the function.  Due to the nature of the Bar Chart, I have no analytic tools planned to add to the chart.

All of these were designed with the purpose of being used in the Health Data Project; however, I have designed them to be as modular as possible.

Wednesday, May 2, 2012

For the past few weeks I have been working on the datacube browser for the LOBI project.  These efforts have paralleled assignments for a class I am taking in visualization.  As a part of my final project for this class, I decided to improve my portion of the datacube browser, which is a parallel coordinates graph.  I devised a way for the system to automatically order the axes in a meaningful way.  I used a least squares algorithm to fit polynomials of degree one and two to pairs of dimensions to determine how well the data exhibited trends which are visually appealing.  I also limited the appearance of randomness in the k-means clustering by assigning the colors based on attributes of each cluster instead of randomly.  I plan to further improve these algorithms to show more interesting trends and appear less random, and possibly not be random at all.  Any suggestions on what types of trends are visually appealing in parallel coordinates would be welcome.  If anyone wants a more detailed explanation of my algorithms, I have a PDF with a write-up of the project available.
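The color assignment follows the general pattern sketched below; the ranking attribute shown (the mean of each cluster's first dimension) is just one example of the kind of cluster attribute that can be used, not necessarily the one my code uses:

```javascript
// Rank clusters by an attribute of the cluster itself and map that rank onto
// an ordered color scale, so the colors carry some meaning instead of being random.
function clusterColors(clusters) {
  var means = clusters.map(function(cluster, i) {
    var sum = cluster.reduce(function(s, p) { return s + p[0]; }, 0);
    return { index: i, mean: sum / cluster.length };
  });
  means.sort(function(a, b) { return a.mean - b.mean; });

  var scale = d3.scale.linear()
      .domain([0, clusters.length - 1])
      .range(["#fee8c8", "#e34a33"]);     // light-to-dark ordered palette

  var colors = [];
  means.forEach(function(m, rank) { colors[m.index] = scale(rank); });
  return colors;                          // colors[i] is the fill for cluster i
}
```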

Thursday, April 5, 2012

LOBI

I have been working on the datacube browser for the Linked Open Biomedical Investigations (LOBI) project.  I added a parallel coordinates graph underneath the existing scatter plot matrix, as shown below:



I used a framework similar to the one outlined two posts ago, which includes the k-means clustering.  The datacube browser allows a user to select specific dimensions.  Then a scatter plot matrix, which shows a scatter plot for every pair of dimensions, and a parallel coordinates chart are constructed.  The year can also be specified by the slider at the bottom.  Each time the dimensions or years are changed, the charts are redrawn.
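The cells of the scatter plot matrix come from enumerating every unordered pair of the selected dimensions, roughly as in the sketch below (the dimension names are placeholders):

```javascript
// Every unordered pair (i, j) with i < j becomes one cell of the matrix.
function dimensionPairs(dims) {
  var pairs = [];
  for (var i = 0; i < dims.length; i++) {
    for (var j = i + 1; j < dims.length; j++) {
      pairs.push([dims[i], dims[j]]);
    }
  }
  return pairs;
}

// e.g. dimensionPairs(["age", "weight", "dose"]) ->
// [["age","weight"], ["age","dose"], ["weight","dose"]]
```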

I still have some minor problems to fix on the parallel coordinates graph.  After I fix these, I am going to start teaching myself about RDF and SPARQL.

Tuesday, March 27, 2012

Linked Open Biomedical Investigations

I have started to work on the Linked Open Biomedical Investigations project with Jim McCusker.  So far I have just looked at the existing code in an attempt to understand it.  I plan on giving the project a considerable amount of time, which will include improving my d3 and JavaScript skills.  I am also trying to generalize the k-means function I used in the project in my previous post for use in this project.  I have run into difficulties because of my lack of JavaScript knowledge.

Wednesday, March 7, 2012

Learning d3

I have continued to teach myself D3 and JavaScript.  I used D3 for my midterm project in Introduction to Visualization.  The project was a visualization of my own music listening habits using parallel coordinates.  I had previously created a similar visualization using VTK, but I wanted to improve on it by recreating it with D3.

Original visualization using VTK
D3 allowed more flexibility, because the visualization was not constrained by a rigid class.  Everything was explicitly drawn and driven by the data it was given.  I used D3's CSV parser to quickly import the data.  I then ran the data through my own implementation of K-means clustering.  I then drew the lines, axes, and other features.  I also included a brush tool to help in analysis.  Below are screenshots of the visualization.
This shows the parallel coordinates with K-means Clustering, where K is equal to 3
K-means clustering takes k random data points.  It then assigns every point to the nearest of these k points.  Then each of the k points is recomputed to be the average of all points assigned to it.  This is repeated until the assignment of labels does not change.  Since K-means starts with k random points, the results are different every time it is run.
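For concreteness, here is a simplified sketch of the loading-and-clustering step, not my exact code.  It assumes the d3 v2-era d3.csv callback that receives the parsed rows as objects keyed by column name, and the file and column names are placeholders for my listening data:

```javascript
d3.csv("listening.csv", function(rows) {
  // Convert the string fields into numeric points before clustering.
  var points = rows.map(function(r) {
    return [+r.playCount, +r.trackLength];
  });
  var labels = kmeans(points, 3);   // labels[i] is the cluster of points[i]
  // ...draw the parallel coordinates, coloring each line by its label...
});

// K-means as described above: start from k random points, assign every point
// to its nearest mean, recompute each mean as the average of its members, and
// repeat until no assignment changes.
function kmeans(points, k) {
  var dims = points[0].length;

  // Pick k distinct random points as initial means (assumes points.length >= k).
  var chosen = {}, means = [];
  while (means.length < k) {
    var idx = Math.floor(Math.random() * points.length);
    if (!chosen[idx]) { chosen[idx] = true; means.push(points[idx].slice()); }
  }

  // Squared Euclidean distance between two points.
  function dist(a, b) {
    var s = 0;
    for (var d = 0; d < dims; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
  }

  var labels = new Array(points.length), changed = true;
  while (changed) {
    changed = false;
    // Assignment step: label each point with its nearest mean.
    points.forEach(function(p, i) {
      var best = 0;
      for (var c = 1; c < k; c++) {
        if (dist(p, means[c]) < dist(p, means[best])) best = c;
      }
      if (labels[i] !== best) { labels[i] = best; changed = true; }
    });
    // Update step: move each mean to the average of its assigned points.
    for (var c = 0; c < k; c++) {
      var members = points.filter(function(p, i) { return labels[i] === c; });
      if (members.length === 0) continue;
      for (var d = 0; d < dims; d++) {
        var sum = 0;
        members.forEach(function(p) { sum += p[d]; });
        means[c][d] = sum / members.length;
      }
    }
  }
  return labels;
}
```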

The pictures below show the usefulness of the brush tool.  The clusters overlapped and it was difficult to see them, but the brush tool was able to limit which points were in the foreground.


This is the visualization with k equal to 4 and without the brush tool


This is the same visualization, with the brush tool to limit what points are visible

Monday, February 27, 2012

Starting at the Tetherless World Constellation

After receiving an email about opportunities for research at the TWC, I immediately sent an email to the listed addresses expressing my interest in doing research there.  After filling out some forms, I was ready to start.

I first read a paper about the design process of creating an ontology.  The paper was interesting and not too difficult to read.  It first defined what an ontology was, then moved on to some problems that may arise when creating one.  The paper used the example of wine to illustrate the problems and solutions presented.  After reading the paper, I felt confident that I could design an ontology on an abstract level.

The first project I have worked on is the Population Science Grid.  For this project I have been learning the JavaScript library d3.  D3 is a tool for creating interactive, data-driven visualizations.  I first learned HTML and JavaScript.  I then did some d3 tutorials and tried to create some visualizations on my own.  My next step is to take data from a CSV file and use d3 to make a visualization.