These are chat archives for FreeCodeCamp/DataScience

13th
Nov 2017
Alice Jiang
@becausealice2
Nov 13 2017 01:00
I have a link on reusable charts, one sec
It's old, but the theory behind it stands. https://bost.ocks.org/mike/chart/ @GoldbergData
Josh Goldberg
@GoldbergData
Nov 13 2017 01:04
Thanks @becausealice2
CamperBot
@camperbot
Nov 13 2017 01:04
goldbergdata sends brownie points to @becausealice2 :sparkles: :thumbsup: :sparkles:
api offline
Josh Goldberg
@GoldbergData
Nov 13 2017 01:05
Here is my github for Kaggle. I’ll be adding more visualizations tonight after dinner.
Matthew Barlowe
@mcbarlowe
Nov 13 2017 02:15
Anybody know about setting up a kruskal minimum spanning tree algorithm in python?
K Krishna Chaitanya
@kkchaitu27
Nov 13 2017 03:21
hi, is anyone interested in learning data science through onine?
evaristoc
@evaristoc
Nov 13 2017 11:47
@mcbarlowe
I have one code that involves recursion but for what?
Matthew Barlowe
@mcbarlowe
Nov 13 2017 12:27
I just have a problem in these final projects where I need to set one up using an adjacency list that I just can’t wrap my head around
evaristoc
@evaristoc
Nov 13 2017 12:28

It is a greedy algorithm. Wikipedia's pseudo:

KRUSKAL(G):
1 A = ∅
2 foreach v ∈ G.V:
3    MAKE-SET(v)
4 foreach (u, v) in G.E ordered by weight(u, v), increasing:
5    if FIND-SET(u) ≠ FIND-SET(v):
6       A = A ∪ {(u, v)}
7       UNION(u, v)
8 return A

You have to define MAKE-SET, FIND-SET, and UNION. The notation of the pseudo-code doesn't match the explanation given in the Wikipedia article. The explanation should goes as follows:

  1. create a graph A (a set of trees), where each vertex in the graph is a separate tree.
  2. create a set S containing all the edges in the graph (this is the function MAKE-SET given above and what does is modifying G to add G.E; in the psuedo-code, G.V represents NODES and G.E represents EDGES)
  3. while S (G.E) is nonempty and A is not yet spanning:
    1. remove an edge with minimum weight from S (G.E)
    2. if the removed edge connects two different trees (FIND-SET(u) ≠ FIND-SET(v)) then add it to the forest A, combining two trees into a single tree (the UNION(u,v) function)

See that the set of (u,v) nodes from G.E edges to be evaluated involves the combination of ALL the nodes without duplication.

The pseudo-code in Wikipedia doesn't show any additional trick. The recursion (a transversal) in the other code I have is used for something called path compression according to the literature and it would make the algorithm to run faster (almost constant time).

A modification of the code I have would involve to iterate over a generator or a ordered queue of a combination of (u,v) edges.

A well implemented naive Kruskal is already fast (log time) for the worst case.

Matthew Barlowe
@mcbarlowe
Nov 13 2017 12:41
Yeah I just don’t know how to implement that in python really
evaristoc
@evaristoc
Nov 13 2017 12:41
First thing to do as important is that it is a greedy algo, @mcbarlowe, so you would have to use the length of the edge and order it based on something, usually weight. The algo will start looking those paths that are the shortest ones and add them as trees in A. If there is a short path that connects some of those trees, then that would make a new tree. At the beginning is easy, as trees will consists on just 2 edges ones. But what when there is a connected tree of more than 2 vertices? You have to move around the tree's vertices and find the edge that would connect it to a neighbouring tree with the shortest path. If there is a path already between two points, you should reject any added connection otherwise becomes a cycle.
@mcbarlowe There is info online for sure. I am reading some.
One important aspect of Kruskal is finding the vertex that will better connect to the neighbouring tree.
So you have to evaluate the tree before connecting.
Matthew Barlowe
@mcbarlowe
Nov 13 2017 12:43
Like I’ve got what’re I’ve sorted by edges the main thing I’m having trouble setting up is checking whether the addition of an edge creates a cycle
evaristoc
@evaristoc
Nov 13 2017 12:46
I think you might base that on dicts. In the reference I am reading suggests that you will be create a matroid.
evaristoc
@evaristoc
Nov 13 2017 12:55
The algo seems to suggests that the trick for excluding them might resides in the UNION and FIND functions. You should exclude from finding all those values given in UNION that are already connected. It looks like a simple if.
That is the naive Kruskal.
The book I normally use? "Python Algorithms" by Hetland. I read it all. And I would do that again and again. The code for the Kruskal is there. But it is more important to understand it. So that is why I am not giving up the code to you yet.
And again, you will certainly find it online for sure. Easily.
Matthew Barlowe
@mcbarlowe
Nov 13 2017 13:07
Yeah that’s fine I’ll check that book out and I’ve found plenty of material on it and i understand what the algorithm is supposed to do it’s just getting it to do it in code is the hard part
evaristoc
@evaristoc
Nov 13 2017 15:33

I see, @mcbarlowe .

  • It is greedy so it search based on a simple rule: shortest edges left at each iteration.
  • Every short edge that joints two points is accepted as part of the tree unless they join two joined points that already belong to the same tree. You know that they belong to the same tree because they are already registered on it. To determine if they belong to the same tree you could use different techniques. Transversal is one.
  • The trees that are formed, all must be put in a container for further comparison.
  • Once accepted, you don't visit that short edge any more (you delete it from the list).
  • If for the contrary the next short edge belongs to two different trees, you accept it and union those trees into one. Then you continue looking at the remaining edges ALWAYS by starting from the shortest from them all.

That is more or less what it is happening.

Let me know if that helps.
evaristoc
@evaristoc
Nov 13 2017 15:42
You avoid cycling by finding a way to determine that two vertices were already registered as part of the same tree. You delete that edge from the list of edges to be visited (so you won't visit it anymore) and go to the next edge. Then if the two points joined belong to different trees, you join them and create a new tree that is the union of those two.
Simple but effective. These people working in algorithms are really cleaver in finding optima.
Matthew Barlowe
@mcbarlowe
Nov 13 2017 16:05
Yeah I think I have an idea now we’ll see though once I start to code t
Actually the thing where you said check to see if two vertices are part of the tree made it click for me thanks also @evaristoc I was trying to do some recursive path function I couldn’t get working
CamperBot
@camperbot
Nov 13 2017 16:07
mcbarlowe sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 380 | @evaristoc |http://www.freecodecamp.com/evaristoc
Matthew Barlowe
@mcbarlowe
Nov 13 2017 18:12
def question3(G):

  nodes = set()
  mst = []
  edges = [(key, x[0], x[1]) for key in G.keys() for x in G[key]]

  sort_edges = sorted(edges, key = lambda x: x[2])

  for edge in sort_edges:
    if mst == []:
      mst.append(edge)
      nodes.add(edge[0])
    else:
      if edge[0] in nodes and edge[1] in nodes:
        pass
      else:
        nodes.add(edge[0])
        nodes.add(edge[1])
        mst.append(edge)
  print(mst)


test = {'A': [('B', 2), ('C', 6)],
        'B': [('A', 2), ('C', 5)], 
        'C': [('A', 6), ('B', 5)],
        'D': [('C', 8), ('C', 5)]
}

question3(test)
yes I think I've got it finally lol
it's not the most optimal since it still has to iterate through all the edges but it works at least on my simple test cases
Matthew Barlowe
@mcbarlowe
Nov 13 2017 19:02
never mind it doesn't work but it's close
evaristoc
@evaristoc
Nov 13 2017 19:05
@mcbarlowe I think to not iterate all the edges, eliminate them from a list of edges to be evaluated. That is what it makes it greedy.
I think you have duplicates?

Duplicate edges ((u,v) == (v,u)) being evaluated twice.

I think, I am not completely sure.

evaristoc
@evaristoc
Nov 13 2017 19:12
For an example of a finding function:
def naive_find(C,u): #where u is the vertex and C the component to which u belongs
   while C[u] != u:
      u = C[u]
   return u
The search is for those vertices in the component that are different to the one you are evaluating.
Anyway... I think this comment is bringing you less clarity...
evaristoc
@evaristoc
Nov 13 2017 19:17
I think in general the code you made is ok. The only issue I see is finding that the two vertices belong or not to the same tree.
Matthew Barlowe
@mcbarlowe
Nov 13 2017 19:17
what do you mean by component is that the edge the vertex belongs to?
evaristoc
@evaristoc
Nov 13 2017 19:18
Tree
Well... in this case a tree.
Component is a broader concept in network analysis.
For what I see from your code the finding function is still to be improved, as well as how to move to the next edge being the previous ones already evaluated.

It has indeed to evaluate all the edges, that is correct. But it will discharge immediately those that are linking points belonging to the same tree.

If all points are already linked, the evaluation of two points in the same tree should happen in at least linear time.

evaristoc
@evaristoc
Nov 13 2017 19:25

nodes.add(edge[1]) is not really tracking. The track should involve another greedy aspect of the algo: it should be a variable that won't have memory, only the best last value. In the book, he uses a dictionary that updates the value of C[u] by the best last vertex that fits the test:

C[u] = v

That occurs in the Union part.

def naive_union(C,u,v):
   u = naive_find(C,u)
   v = naive_find(C,v)
   C[u] = v
Find the next vertex from u and from v and if they are in different trees, assuming the edge that link them is short (as you have sorted), then complete an union.
CamperBot
@camperbot
Nov 13 2017 19:29
find the next vertex from u and from v and if they are in different trees assuming the edge that link them is short as you have sorted then complete an union
nothing found
evaristoc
@evaristoc
Nov 13 2017 19:30
CamperBot is listening too much reggae lately...
Ahhh the findis an instruction... ok...
find something
CamperBot
@camperbot
Nov 13 2017 19:31
find something
nothing found
evaristoc
@evaristoc
Nov 13 2017 19:31
OK.
Matthew Barlowe
@mcbarlowe
Nov 13 2017 19:35
lol no I think I'm geting closer to understanding it i jsut need to play around with it more
evaristoc
@evaristoc
Nov 13 2017 19:35

Sorry I said

the best last vertex that fits the test

In this case is the best FIRST vertex

Good luck!