These are chat archives for typelevel/general

9th
Feb 2018
Owein Reese
@wheaties
Feb 09 14:57
Anyone have an idea of when the idea of immutable data structures first made it to publication?
My CTO is asking for background reference publications and I don't even know how far back to go.
Guillaume Martres
@smarter
Feb 09 15:10
Luka Jacobowitz
@LukaJCB
Feb 09 15:11
AFAIK it goes back to the first LISPs
Emily Pillmore
@emilypi
Feb 09 15:11
^McCarthy had papers introducing the first cons-operator for LISP
Luka Jacobowitz
@LukaJCB
Feb 09 15:12
So probably 60s or something like that
Owein Reese
@wheaties
Feb 09 15:13
yeah, I assumed it was around the time of LISP or before
Will look at McCarthy's earlier LISP papers on the con-operator. Thanks @emilypi
Emily Pillmore
@emilypi
Feb 09 15:17
You could take this a ways back to recursion theory in the early 1900’s. I recall seeing papers as early as the 50’s (i think from Steenrod [double checking] iirc) that wrote about using monoids to describe recursion and generating trees. There’s probably stuff that came sooner I just don’t know any specifically off the top of my head
Owein Reese
@wheaties
Feb 09 15:21
yeah, problem is, most of the early papers dealt with theory wherein they mapped to functions (e.g. side effect free)
I have to wonder if they even considered mutability as a first class thing to talk about
pragmatically, the compilers and such had to deal with it
Emily Pillmore
@emilypi
Feb 09 15:23
and we’ve paid for it ever since
Owein Reese
@wheaties
Feb 09 15:24
if you've ever had to write to something with almost no RAM in an old language, you'd get mutability real fast. I can't tell you how often I had to swap ram to keep the part of a program I cared about in memory.
Emily Pillmore
@emilypi
Feb 09 15:25
Lisp seemed to do an okay job of not abusing mutability
Mark Tomko
@mtomko
Feb 09 20:06

With a persistent data structure, do we know if Scala can garbage collect if we do

val m1 = Map(1 -> 'x')
val m2 = Map(1 -> 'x') + 1 -> 'y'

Is the intermediate value used to construct m2 available for garbage collection? Or, does m2 take up more space than m1 because the structure doesn't make it easy for (1 -> 'x') to be garbage collected?

Lars Hupel
@larsrh
Feb 09 20:42
In that particular case m1 if not referenced elsewhere will be garbage-collected, because immutable maps are special-cases for ≤ 4 elements
s/cases/cased
Emily Pillmore
@emilypi
Feb 09 20:42
Interesting, thank you @larsrh
Mark Tomko
@mtomko
Feb 09 20:43
Interesting. So, for larger maps, I presume the problem gets harder.
But in the general case, you'll have structural sharing
IIRC the default immutable maps in Scala are hash tries?
not sure
but structural sharing means: some old stuff is reused, other parts aren't
Mark Tomko
@mtomko
Feb 09 20:47
I can't quite visualize it in my head, but I'd assume that the tries are constructed in such a way that you can remove elements, then, and expect some or most of the removed structure would be available for the garbage collector, as in the tree example on that wikipedia page.
Lars Hupel
@larsrh
Feb 09 20:48
The idea is: either structure gets reused, or garbage collected
you want to reuse as much as possible to save space (in case you have references to older generations) and reduce GC churn (in case you don't)
Mark Tomko
@mtomko
Feb 09 20:49
yeah.
Lars Hupel
@larsrh
Feb 09 20:50
@emilypi np, happy to explain
@mtomko Have you seen Okasaki's book?
there's a ton of this stuff in there
Mark Tomko
@mtomko
Feb 09 20:51
Yes! It's a neat book. I was digging around in there for examples of complex structures with deletions
Lars Hupel
@larsrh
Feb 09 20:51
Deletion is hard :smile:
Mark Tomko
@mtomko
Feb 09 20:51
There seemed to be some, but it's been long enough since I was immersed in it, answering the question was slow going.
Lars Hupel
@larsrh
Feb 09 20:51
one of the most common persistent data structures, red-black trees, often come with wrong implementations of deletion
in the other cases authors will write "deletion is left as an exercise for the reader" :smile:
(I know this because my PhD supervisor is in the business of verifying immutable data structures)
Mark Tomko
@mtomko
Feb 09 20:53
I tried to implement Okasaki's red-black tree because I needed some data structure based off it - an interval tree, I think... I got it working but mine was too slow and we had to go find some other library that had one. Some day I want to figure out what I did wrong, but who knows if I'll have the time.
Lars Hupel
@larsrh
Feb 09 20:56
A nice take on this is using dependent types to ensure that the invariants are correct
Mark Tomko
@mtomko
Feb 09 20:56
Okasaki's implementation is so pretty, and he describes it as being quite fast so I'm sure I must have done something terrible along the way.
Lars Hupel
@larsrh
Feb 09 20:56
unfortunately this is a) quite hard in Scala (@milessabin tried it on stage at Typelevel Summit in Philadelphia) and b) many implementations don't do delete
here's one that I found for Haskell: https://fstaals.net/RedBlackTree.html
@mtomko if you can paste your code I can have a look
Mark Tomko
@mtomko
Feb 09 20:59
I will have to dig around. I'm sure at some point I ended up deleting it although it's definitely in a git repository.
oh, I think I can find it... I'll dig it out and make a gist. It's been a long time since I've looked at this.
Mark Tomko
@mtomko
Feb 09 21:08
wow, I think this is impossible for a human to grok. I'm sort of ashamed :)
Lars Hupel
@larsrh
Feb 09 21:08
I can post some of my old code if it helps :wink:
That's what I had. It's unfortunate that it's sort of intrinsically tied up in being an interval tree.
And I clearly was trying to be terse but erred on the side of unreadability.
It is a good exercise to look at one's own code from years previous, as a check to one's hubris.
Lars Hupel
@larsrh
Feb 09 21:11
I have to admit I have no clue whether or not it is a good idea to base an Interval tree on RBTs
Mark Tomko
@mtomko
Feb 09 21:11
I don't remember where I got the idea to do that, it's been 3 years.
Lars Hupel
@larsrh
Feb 09 21:12
No clue. But I vaguely remember that https://github.com/stew/dogs had something like that
Mark Tomko
@mtomko
Feb 09 21:12
And that may well have been the real problem, not that the RBT was no good but rather that the RBT didn't really provide the sort of search I needed. I think I saw reference in this mess to the fact that I'd had to occasionally search both branches of the tree, which was probably the real problem.
Yeah, the Diet in dogs was something I looked at, and I seem to recall there was something about what that provided that was insufficient.
Mark Tomko
@mtomko
Feb 09 21:19
In our case, the intervals were mappings of features such as genes to the genome. We are basically answering the question of, "If I mess with the genome at position X, what features will I wreck?". I might not understand the Diet correctly, but it seems like it acts more like a set that could represent a single discontinuous feature along the genome.
Anyway, sorry for monopolizing the channel for a while. Thank you to everyone for being kind and informative and tolerating my questions :)
Lars Hupel
@larsrh
Feb 09 21:27
ah I see, that sounds vaguely like the sort of thing R trees could be used for
no worries about the channel :smile:
Mark Tomko
@mtomko
Feb 09 21:28
I'll look up R trees. We ended up using an interval tree from the Stanford NLP library, which works fine and is fast enough, but I think it brings in a lot of dependencies considering we only want just a tiny fragment o their code. I have never investigated how their tree is implemented.