Some relevant notes on the first results:
Sankey diagrams offered interesting hints while the number of nodes was relatively small. For example, from Feb to Apr 2015, before the curriculum change, most of the campers sampled during that time (about 1000) followed similar paths, with few exceptions. That made the data easy to visualise.
However, after the April 2015 change in the curriculum, the number of campers grew, reaching 30 times the Feb-Apr figure by Sep-Nov of the same year, and with it the number of starting points and possible paths. The number of challenges, such as waypoints, also grew, which again means more starting points and more distinct paths. Under those conditions a simple Sankey diagram is no longer a sufficient or effective visualisation.
For a more effective analysis of the data after that period, we should probably apply tools from the area of sequential pattern mining, along with the visualisations more commonly used in that area.
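To give an idea of what I mean by sequential pattern mining, here is a minimal sketch in Python. It only counts contiguous sub-paths and how many campers share them, which is a much simpler stand-in for a real sequential pattern mining algorithm. The function name `frequent_subpaths`, the node codes and the sample paths are all made up for illustration; they are not taken from the real data.

```python
from collections import Counter

def frequent_subpaths(paths, length=2, min_support=2):
    """Return contiguous sub-paths of `length` nodes shared by at least
    `min_support` campers. Each camper counts once per sub-path."""
    support = Counter()
    for path in paths:
        # A set, so a sub-path repeated by the same camper is counted once.
        seen = {tuple(path[i:i + length])
                for i in range(len(path) - length + 1)}
        support.update(seen)
    return {sub: n for sub, n in support.items() if n >= min_support}

# Hypothetical paths: one list of node codes per camper, in visiting order.
paths = [
    ["w001", "w002", "w003"],
    ["w001", "w002", "w004"],
    ["w005", "w002", "w003"],
]

frequent = frequent_subpaths(paths, length=2, min_support=2)
for sub, n in sorted(frequent.items()):
    print(" -> ".join(sub), n)
```

Raising `length` shows longer common routes, and raising `min_support` filters out the rare ones, which is the part a Sankey diagram cannot do for us once there are too many paths.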
Sankey diagrams could still be a useful technique if we fragment the data, for example by analysing each starting point separately. However, bottlenecks (places where the abandon rate is higher) and aggregation points (nodes that are very common across all paths) might still not be easy to identify that way.
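As a rough sketch of how the fragmentation and the two measures could be computed outside of the diagram, assuming each camper's path is available as an ordered list of node codes: the names `split_by_start` and `node_stats` and the sample data are hypothetical. I also assume that the last node in a path is where the camper abandoned, which is not strictly true for campers who are still active or who finished.

```python
from collections import Counter, defaultdict

def split_by_start(paths):
    """Group paths by their first node, so each group can be drawn separately."""
    groups = defaultdict(list)
    for path in paths:
        if path:
            groups[path[0]].append(path)
    return dict(groups)

def node_stats(paths):
    """For each node, return the abandon rate (campers who stopped there divided
    by campers who reached it) and the share of all paths that include it."""
    paths = [p for p in paths if p]
    reached = Counter()   # campers who visited the node at least once
    stopped = Counter()   # campers whose path ends at the node (assumed abandon)
    for path in paths:
        reached.update(set(path))
        stopped[path[-1]] += 1
    total = len(paths)
    return {
        node: {
            "abandon_rate": stopped[node] / reached[node],
            "share_of_paths": reached[node] / total,
        }
        for node in reached
    }

# Hypothetical sample: four campers, two starting points.
paths = [
    ["a", "b", "c"],
    ["a", "b"],
    ["d", "b", "c"],
    ["a", "b"],
]

groups = split_by_start(paths)
stats = node_stats(paths)
```

Nodes with a high `abandon_rate` are candidate bottlenecks, and nodes with a `share_of_paths` close to 1 are candidate aggregation points. Each group returned by `split_by_start` could then go into its own, smaller Sankey diagram.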
There is more we can do to improve the Sankey diagram visualisation I made, for example labelling the nodes with the full challenge name instead of the code from the coded system.
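The relabelling could be as simple as a lookup applied to the links before plotting. In this sketch I assume the links are held as (source, target, value) tuples. The codes, the challenge names and the function name `relabel_links` are placeholders, not the real ones.

```python
def relabel_links(links, names):
    """Replace node codes with full names; unknown codes are left unchanged."""
    return [(names.get(src, src), names.get(dst, dst), value)
            for src, dst, value in links]

# Placeholder mapping from node code to full challenge name.
names = {"w001": "Challenge A", "w002": "Challenge B"}

# Placeholder links: (source code, target code, number of campers).
links = [("w001", "w002", 40), ("w002", "w003", 25)]

labelled = relabel_links(links, names)
```

Codes missing from the mapping keep their original label, so a partial list of names still gives a usable diagram.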