It's old, but the theory behind it stands. https://bost.ocks.org/mike/chart/ @GoldbergData

goldbergdata sends brownie points to @becausealice2 :sparkles: :thumbsup: :sparkles:

api offline

Anybody know about setting up a kruskal minimum spanning tree algorithm in python?

hi, is anyone interested in learning data science through onine?

@mcbarlowe

I have one code that involves recursion but for what?

I have one code that involves recursion but for what?

I just have a problem in these final projects where I need to set one up using an adjacency list that I just can’t wrap my head around

It is a greedy algorithm. Wikipedia's pseudo:

```
KRUSKAL(G):
1 A = ∅
2 foreach v ∈ G.V:
3 MAKE-SET(v)
4 foreach (u, v) in G.E ordered by weight(u, v), increasing:
5 if FIND-SET(u) ≠ FIND-SET(v):
6 A = A ∪ {(u, v)}
7 UNION(u, v)
8 return A
```

You have to define MAKE-SET, FIND-SET, and UNION. The notation of the pseudo-code doesn't match the explanation given in the Wikipedia article. The explanation should goes as follows:

- create a graph A (a set of trees), where each vertex in the graph is a separate tree.
- create a set S containing all the edges in the graph (this is the function MAKE-SET given above and what does is modifying G to add G.E; in the psuedo-code, G.V represents NODES and G.E represents EDGES)
- while S (G.E) is nonempty and A is not yet spanning:
- remove an edge with minimum weight from S (G.E)
- if the removed edge connects two different trees (FIND-SET(u) ≠ FIND-SET(v)) then add it to the forest A, combining two trees into a single tree (the UNION(u,v) function)

See that the set of (u,v) nodes from G.E edges to be evaluated involves the combination of ALL the nodes without duplication.

The pseudo-code in Wikipedia doesn't show any additional trick. The recursion (a transversal) in the other code I have is used for something called path compression according to the literature and it would make the algorithm to run faster (almost constant time).

A modification of the code I have would involve to iterate over a generator or a ordered queue of a combination of (u,v) edges.

A well implemented naive Kruskal is already fast (log time) for the worst case.

Yeah I just don’t know how to implement that in python really

First thing to do as important is that it is a greedy algo, @mcbarlowe, so you would have to use the length of the edge and order it based on something, usually weight. The algo will start looking those paths that are the shortest ones and add them as trees in A. If there is a short path that connects some of those trees, then that would make a new tree. At the beginning is easy, as trees will consists on just 2 edges ones. But what when there is a connected tree of more than 2 vertices? You have to move around the tree's vertices and find the edge that would connect it to a neighbouring tree with the shortest path. If there is a path already between two points, you should reject any added connection otherwise becomes a cycle.

@mcbarlowe There is info online for sure. I am reading some.

One important aspect of Kruskal is finding the vertex that will better connect to the neighbouring tree.

So you have to evaluate the tree before connecting.

Like I’ve got what’re I’ve sorted by edges the main thing I’m having trouble setting up is checking whether the addition of an edge creates a cycle

I think you might base that on dicts. In the reference I am reading suggests that you will be create a matroid.

The algo seems to suggests that the trick for excluding them might resides in the UNION and FIND functions. You should exclude from finding all those values given in UNION that are already connected. It looks like a simple

`if`

.
That is the naive Kruskal.

The book I normally use? "Python Algorithms" by Hetland. I read it all. And I would do that again and again. The code for the Kruskal is there. But it is more important to understand it. So that is why I am not giving up the code to you yet.

And again, you will certainly find it online for sure. Easily.

Yeah that’s fine I’ll check that book out and I’ve found plenty of material on it and i understand what the algorithm is supposed to do it’s just getting it to do it in code is the hard part

I see, @mcbarlowe .

- It is greedy so it search based on a simple rule: shortest edges left at each iteration.
- Every short edge that joints two points is accepted as part of the tree unless they join two joined points that already belong to the same tree. You know that they belong to the same tree because they are already registered on it. To determine if they belong to the same tree you could use different techniques. Transversal is one.
- The trees that are formed, all must be put in a container for further comparison.
- Once accepted, you don't visit that short edge any more (you delete it from the list).
- If for the contrary the next short edge belongs to two different trees, you accept it and union those trees into one. Then you continue looking at the remaining edges ALWAYS by starting from the shortest from them all.

That is more or less what it is happening.

Let me know if that helps.

You avoid cycling by finding a way to determine that two vertices were already registered as part of the same tree. You delete that edge from the list of edges to be visited (so you won't visit it anymore) and go to the next edge. Then if the two points joined belong to different trees, you join them and create a new tree that is the union of those two.

Simple but effective. These people working in algorithms are really cleaver in finding optima.

Yeah I think I have an idea now we’ll see though once I start to code t

Actually the thing where you said check to see if two vertices are part of the tree made it click for me thanks also @evaristoc I was trying to do some recursive path function I couldn’t get working

mcbarlowe sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:

:cookie: 380 | @evaristoc |http://www.freecodecamp.com/evaristoc

```
def question3(G):
nodes = set()
mst = []
edges = [(key, x[0], x[1]) for key in G.keys() for x in G[key]]
sort_edges = sorted(edges, key = lambda x: x[2])
for edge in sort_edges:
if mst == []:
mst.append(edge)
nodes.add(edge[0])
else:
if edge[0] in nodes and edge[1] in nodes:
pass
else:
nodes.add(edge[0])
nodes.add(edge[1])
mst.append(edge)
print(mst)
test = {'A': [('B', 2), ('C', 6)],
'B': [('A', 2), ('C', 5)],
'C': [('A', 6), ('B', 5)],
'D': [('C', 8), ('C', 5)]
}
question3(test)
```

yes I think I've got it finally lol
it's not the most optimal since it still has to iterate through all the edges but it works at least on my simple test cases

@mcbarlowe I think to not iterate all the edges, eliminate them from a list of edges to be evaluated. That is what it makes it greedy.

I think you have duplicates?

Duplicate edges ((u,v) == (v,u)) being evaluated twice.

I think, I am not completely sure.

For an example of a finding function:

```
def naive_find(C,u): #where u is the vertex and C the component to which u belongs
while C[u] != u:
u = C[u]
return u
```

The search is for those vertices in the component that are different to the one you are evaluating.

Anyway... I think this comment is bringing you less clarity...

I think in general the code you made is ok. The only issue I see is finding that the two vertices belong or not to the same tree.

what do you mean by component is that the edge the vertex belongs to?

Well... in this case a tree.

Component is a broader concept in network analysis.

For what I see from your code the finding function is still to be improved, as well as how to move to the next edge being the previous ones already evaluated.

It has indeed to evaluate all the edges, that is correct. But it will discharge immediately those that are linking points belonging to the same tree.

If all points are already linked, the evaluation of two points in the same tree should happen in at least linear time.

`nodes.add(edge[1])`

is not really tracking. The track should involve another greedy aspect of the algo: it should be a variable that won't have memory, only the best last value. In the book, he uses a dictionary that updates the value of C[u] by the best last vertex that fits the test:

`C[u] = v`

That occurs in the Union part.

```
def naive_union(C,u,v):
u = naive_find(C,u)
v = naive_find(C,v)
C[u] = v
```

Find the next vertex from u and from v and if they are in different trees, assuming the edge that link them is short (as you have sorted), then complete an union.

find **the next vertex from u and from v and if they are in different trees assuming the edge that link them is short as you have sorted then complete an union**

nothing found

nothing found

Ahhh the

`find`

is an instruction... ok...
find something

lol no I think I'm geting closer to understanding it i jsut need to play around with it more

Sorry I said

the best last vertex that fits the test

In this case is the best FIRST vertex

Good luck!