These are chat archives for FreeCodeCamp/DataScience

21st
Feb 2016
evaristoc
@evaristoc
Feb 21 2016 14:26
People:
A first overview of the Sankey diagram, made as part of the camper path analysis project, can be found at:
https://bl.ocks.org/evaristoc/2d6c4f8c06c64768f119
evaristoc
@evaristoc
Feb 21 2016 14:45
Just to mention some relevant information about the first results:
  • Sankey diagrams offered interesting hints when the number of nodes was relatively small. For example, during the period of Feb to Apr 2015, before the change of curriculum, most of the campers sampled during that time (about 1000) followed similar paths, with few exceptions. That helped in making a visualisation.
  • However, after the April 2015 change in the curriculum, the number of campers grew, reaching 30 times the Feb-Apr amount by Sep-Nov of the same year, and with it the number of starting points and possible paths. Additionally, the number of challenges such as waypoints grew, implying even more starting points and different paths. Under those conditions a simple Sankey diagram became an insufficient and ineffective visualisation.
  • For a more effective analysis after that period, we should probably apply tools from the area of sequential pattern mining, together with the visualisations more commonly used in that area.
  • Sankey diagrams could still be a useful technique if we fragment the data, for example by analysing each starting point separately. But bottlenecks (places where the abandon rate is higher) or aggregation points (nodes that are very common along all paths) might then not be easy to identify.
There is more that we can do to improve the Sankey diagram visualisation I made, for example giving the nodes the full challenge name instead of naming them with the coded system.
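To make the points above a bit more concrete, here is a minimal Python sketch of how camper paths could be turned into Sankey links, split by starting point, and checked for possible bottlenecks. It is only an illustration under assumptions: the data format (one list of challenge codes per camper), the codes `w1`, `w2`, `w3`, `b1` and all function names are hypothetical and are not taken from the actual project data.

```python
from collections import Counter

# Hypothetical sample: one ordered list of challenge codes per camper.
sample_paths = [
    ["w1", "w2", "w3"],
    ["w1", "w2"],
    ["w1", "w3"],
    ["b1", "w2", "w3"],
]

def sankey_links(paths):
    """Count transitions between consecutive challenges across all paths.

    Each (source, target) key with its count is one weighted Sankey link.
    """
    links = Counter()
    for path in paths:
        for source, target in zip(path, path[1:]):
            links[(source, target)] += 1
    return links

def abandon_rates(paths):
    """For each node, the share of campers visiting it whose path ended there."""
    visits = Counter()
    ends = Counter()
    for path in paths:
        if not path:
            continue
        visits.update(set(path))  # count each camper once per node
        ends[path[-1]] += 1
    return {node: ends[node] / visits[node] for node in visits}

def split_by_start(paths):
    """Group paths by their first challenge, one smaller diagram per group."""
    groups = {}
    for path in paths:
        if path:
            groups.setdefault(path[0], []).append(path)
    return groups

links = sankey_links(sample_paths)
rates = abandon_rates(sample_paths)
groups = split_by_start(sample_paths)
```

Note that this treats the last node of every path as an abandon point, so it would overstate the rate for campers who were simply still active when the sample was taken. Computing the rates on the full data before splitting by starting point is one possible way to keep bottlenecks visible even if the diagrams themselves are fragmented.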
evaristoc
@evaristoc
Feb 21 2016 14:54
NOTE: sequential pattern mining is one of the most important techniques used in bioinformatics...
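As a very rough idea of what sequential pattern mining could look like on camper paths, the sketch below counts how many paths contain each contiguous sub-path of a given length and keeps the frequent ones. This is a deliberate simplification and not a full algorithm such as PrefixSpan (which also handles gaps between items); the sample data, the function name and the parameters `length` and `min_support` are hypothetical.

```python
from collections import Counter

# Hypothetical sample: one ordered list of challenge codes per camper.
sample_paths = [
    ["w1", "w2", "w3"],
    ["w1", "w2"],
    ["w1", "w3"],
    ["b1", "w2", "w3"],
]

def frequent_subpaths(paths, length, min_support):
    """Return contiguous sub-paths of `length` found in at least `min_support` paths."""
    support = Counter()
    for path in paths:
        # Use a set so a sub-path repeated within one path is counted once.
        seen = {
            tuple(path[i:i + length])
            for i in range(len(path) - length + 1)
        }
        support.update(seen)
    return {sub: count for sub, count in support.items() if count >= min_support}

frequent = frequent_subpaths(sample_paths, length=2, min_support=2)
```

With the sample above, only the sub-paths `("w1", "w2")` and `("w2", "w3")` reach a support of 2; the same counting could be repeated for longer lengths to see which sequences of challenges stay common.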
evaristoc
@evaristoc
Feb 21 2016 23:15
People:
I will be working on this one too:
http://codepen.io/ecccs/pen/obreEd