Ignat Insarov
@kindaro:matrix.org
[m]
Maybe we can start with the abstraction of a data set. This can be a directory of CSV files or an SQL database. What is the appropriate type? Already not an easy question.
chreekat
@b:chreekat.net
[m]

What is the simplest thing that can be done with a data set that is still useful?

I immediately think of https://datasette.io/

Ignat Insarov
@kindaro:matrix.org
[m]
To glue two data sets together is one simple thing I can think of. Like, a canonical data set monoid.
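A minimal sketch of that monoid, assuming a data set is just a list of rows keyed by column name (the DataSet type here is hypothetical, not from any existing library):

import qualified Data.Map.Strict as M

-- A hypothetical row-oriented data set: each row maps column names to values.
newtype DataSet a = DataSet { rows :: [M.Map String a] }

-- "Gluing" two data sets is concatenation of their rows, which gives a
-- lawful Semigroup and Monoid.
instance Semigroup (DataSet a) where
  DataSet xs <> DataSet ys = DataSet (xs ++ ys)

instance Monoid (DataSet a) where
  mempty = DataSet []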
Ignat Insarov
@kindaro:matrix.org
[m]
There is also visualization. In principle every data set is isomorphic to its visualization and its storage.
There are also projections — destructive operations like an average of a column or a correlation between columns.
Finally, there are generative operations that create «fake» data sets from a bunch of parameters. They may be either various noises or carefully derived predictions.
There is also distance between data sets of the same type that lets us do validation.
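A rough sketch of those projections, one generative operation, and the distance idea, reusing the hypothetical DataSet type from the sketch above (the function names are illustrative only; randomRIO is from the 'random' package):

import qualified Data.Map.Strict as M
import Data.Maybe (mapMaybe)
import Control.Monad (replicateM)
import System.Random (randomRIO)

-- A destructive projection: the mean of one column, if the column has any values.
columnMean :: String -> DataSet Double -> Maybe Double
columnMean col (DataSet rs) =
  case mapMaybe (M.lookup col) rs of
    [] -> Nothing
    xs -> Just (sum xs / fromIntegral (length xs))

-- A generative operation: a "fake" one-column data set of n uniform noise samples.
noiseColumn :: String -> Int -> IO (DataSet Double)
noiseColumn col n =
  DataSet . map (M.singleton col) <$> replicateM n (randomRIO (0, 1))

-- A crude distance between two data sets of the same shape: the absolute
-- difference of their means on a shared column, usable for validation.
columnDistance :: String -> DataSet Double -> DataSet Double -> Maybe Double
columnDistance col a b = abs <$> ((-) <$> columnMean col a <*> columnMean col b)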
Ignat Insarov
@kindaro:matrix.org
[m]
So, we have read → project → generate → validate → visualize.
Suppose we split the data set into 10 random slices. Do we now have a data set of data sets?
Now we can talk about how well a given prediction is validated as a projection of the data set of data sets. For example, we can visualize it. Cool!
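A sketch of that slicing step, again over the hypothetical DataSet type:

import System.Random (randomRIO)

-- Split a data set into k pseudo-random slices: a data set of data sets,
-- in the sense discussed above.
splitInto :: Int -> DataSet a -> IO [DataSet a]
splitInto k (DataSet rs) = do
  tagged <- mapM (\r -> (,) r <$> randomRIO (0, k - 1)) rs
  pure [ DataSet [ r | (r, i) <- tagged, i == j ] | j <- [0 .. k - 1] ]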
Ignat Insarov
@kindaro:matrix.org
[m]
Yes, what I mean is not a single visualization. There is a «full set» of visualizations that displays all dimensions of all data points.
Something like a scatter plot matrix.
We want to be able to see all the information that there is, so at least there is a monic arrow from the type of data sets to the type of visualizations.
And a visualization can be written down as a data set — we want it to be the same data set we started with.
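One way to write the "monic arrow" idea down as a sketch, with a deliberately trivial hypothetical Visualization type (a real one would carry something like a scatter plot matrix):

-- A fully faithful visualization wraps every dimension of every data point.
newtype Visualization a = Visualization { unVis :: DataSet a }

visualize :: DataSet a -> Visualization a
visualize = Visualization

-- Writing the visualization back down recovers the original data set,
-- i.e. readBack . visualize = id, which is what makes visualize monic.
readBack :: Visualization a -> DataSet a
readBack = unVis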
Ignat Insarov
@kindaro:matrix.org
[m]
Maybe we can say that there are many different visualizations for the same data set. Like, say, with different colour schemes.
Ignat Insarov
@kindaro:matrix.org
[m]
But, for example, a box plot is not a visualization. It is visualization after projection.
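In the terms of the sketches above, that distinction is just composition (projectSummary is a hypothetical stand-in for the quartile projection a real box plot would use):

import qualified Data.Map.Strict as M
import Data.Maybe (fromMaybe)

-- A box plot draws a projection, not the raw data: project first, then visualize.
boxPlot :: String -> DataSet Double -> Visualization Double
boxPlot col = visualize . projectSummary col
  where
    -- Stand-in projection: a one-row data set holding the column mean.
    projectSummary :: String -> DataSet Double -> DataSet Double
    projectSummary c ds =
      DataSet [M.singleton (c ++ "_mean") (fromMaybe 0 (columnMean c ds))]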
chreekat
@b:chreekat.net
[m]

Interesting! I see where you're going.

But is that actually useful? :) Datasets don't just exist to be manipulated, but to explain something useful about the real world. What I mean is that a tool that simplifies creating a box plot out of a CSV file is more likely to build a user base than a general algebra over datasets.

Ignat Insarov
@kindaro:matrix.org
[m]
I can easily build a tool that does something specific for Joe, Kim and Mary. But no amount of such tools will get Haskell to a place I want to put it in. We must show developers that they can develop, and then developers will show users that they can use.
Ignat Insarov
@kindaro:matrix.org
[m]

Bold claim: every widespread language has at least one widespread framework, in the broad sense of the word.

  • Python: Flask, PyTorch.
  • JavaScript: DOM API, React.
  • C: Linux, the POSIX standard.
  • R: the R REPL.

I say the success of a language is impossible unless the hard decisions have already been made for some practical area of expertise.

This does not imply causality, of course. It is hard to establish causality in history. So, I find this observation persuasive — this is as good as it gets. If we are to make any decision, the decision to get a framework going is a good one.
chreekat
@b:chreekat.net
[m]
Haskell's "framework", so far, is geared towards language design
Vincent Meade
@vmeade
Hello
Jaeyoung Lee
@jaeyounkg
Hi all, anyone still alive here?
chreekat
@b:chreekat.net
[m]
For some definition of 'alive'
Jaeyoung Lee
@jaeyounkg
I stumbled upon dataHaskell recently and thought it looked cool. Then I decided to give it a try, first by using Chart. I spent the entire day trying to set it up in a stack project but failed because there was no LTS version providing all the necessary component libraries of Chart(?). Then I found out that the Chart repository has basically been inactive since 2016 or so...
This community seems largely inactive, especially recently, but still: I love both Haskell and data science, and I think it'd be really cool to have a Haskell environment where you can do advanced data science stuff easily. If I wanna contribute to dataHaskell, where can I start?
Jaeyoung Lee
@jaeyounkg
I read the recent conversation from @kindaro:matrix.org and it seems like we still don't even have a commonly agreed-upon uniform framework providing basic types, like matrices?
chreekat
@b:chreekat.net
[m]
Seems so. I guess it would help to start by clarifying the ideas. I mean, what would really help is if some hacker genius just threw together a new tool and language that was heavily influenced by Haskell on the one hand, and by successful tools on the other. Maybe it would be an EDSL, maybe not.
Marco Zocca
@ocramz_:matrix.org
[m]
hi tebu1783, I second your sentiments re. the need for this kind of tooling in Haskell, and for a supportive community
but this language is both a blessing and a curse: because it favors experimentation and is very large, there are more "standards" than actual users
we also have to reinvent many things from scratch that other languages get for free, like subtyping or inheritance
Man of Letters
@man_of_letters:mozilla.org
[m]
@tebu1783: if there's no suitable LTS, would cabal work?
chreekat
@b:chreekat.net
[m]
I'm a Nix advocate; I'll try to keep the zealotry to a minimum though
Jaeyoung Lee
@jaeyounkg
Thanks for the response, @ocramz_:matrix.org
That reminded me of the classic essay The Lisp Curse. I thought this wouldn't be the case for Haskell because it's way more restricted than Lisp (in the sense that it has no macros)... But the "Haskell Curse" is still caused by the enormous number of language extensions and new advanced features built on top of those?
@man_of_letters:mozilla.org @b:chreekat.net I haven't given cabal a go, but now that you mention it, I should try using Nix and cabal as a last resort, too :-)
chreekat
@b:chreekat.net
[m]
Fwiw I don't feel like the number of language extensions is a problem, especially with the GHC20xx concept that is now available. But the fragmentation of the "standard" library is a problem
And the other problem as far as data goes is the Num type class
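One commonly cited aspect of that problem, as a sketch: Num bundles unrelated operations, so an elementwise instance for a vector-like type has to invent meanings for all of them.

newtype Vec = Vec [Double]

instance Num Vec where
  Vec xs + Vec ys = Vec (zipWith (+) xs ys)
  Vec xs * Vec ys = Vec (zipWith (*) xs ys)
  negate (Vec xs) = Vec (map negate xs)
  abs    (Vec xs) = Vec (map abs xs)
  signum (Vec xs) = Vec (map signum xs)
  fromInteger n   = Vec (repeat (fromInteger n))  -- what length should this be?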
Marco Zocca
@ocramz_:matrix.org
[m]
see? that's exactly what I meant above: 2 users, 2 orthogonal opinions.
chreekat
@b:chreekat.net
[m]
Haha
Marco Zocca
@ocramz_:matrix.org
[m]
IMO the Haskell type system should simply be bypassed for data/ML workflows, with metaprogramming, generics and so on. Stuff like type families and singletons is simply a pain in the ass at scale.
this is my rationale for 'heidi' and my recent work on tensor compilation
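A rough sketch of what "bypassing" the type system for rows can look like (hypothetical types, not heidi's actual API): column values are erased into one small sum type, and static typing comes back only at the boundaries.

import qualified Data.Map.Strict as M

-- Dynamically typed cell values.
data Value = VInt Int | VDouble Double | VText String
  deriving (Eq, Show)

-- A row is just a map from column name to value; no per-column type parameters.
type Row = M.Map String Value

-- Typed access happens only at the edges.
getDouble :: String -> Row -> Maybe Double
getDouble col row = case M.lookup col row of
  Just (VDouble x) -> Just x
  Just (VInt n)    -> Just (fromIntegral n)
  _                -> Nothing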
chreekat
@b:chreekat.net
[m]
Yeah, I'm thinking the same thing, in that the boilerplate of type decorations is just a tremendous pain in the ass sometimes. And I agree that opening the door to over-engineering is probably not the right idea in this situation
chreekat
@b:chreekat.net
[m]

My personal experience in this domain was at university, studying numerical analysis and computational physics. For some dumb reason I really wanted to use C for everything at first. When I got around to anything using matrices, though, the boilerplate just killed me. I switched to Octave. Sure, having a bunch of math stuff built in was useful, but what really saved me was a compact syntax built for the problem domain.

That's why I'm not sure just using raw Haskell would be the right idea.

I never got around to learning python, so I'm curious to know in what ways it was flexible enough to get heavy adoption for mathematical programming

Marco Zocca
@ocramz_:matrix.org
[m]
chreekat: Python is great for doing science because it runs your code no matter what. No matter what they say about pre-registering experiments, scientists just love to tinker and make up hypotheses after the fact
Haskell is the opposite: it forces you to formalize the data domain very early. And the more type stuff you add to the domain, the harder it becomes to refactor
chreekat
@b:chreekat.net
[m]
Marco Zocca: yes, that makes sense
Yves Parès
@YPares
Hey guys, I'm looking for simple models in Haskell to generate random unique names (as in first names), just to identify stuff (as random UUIDs would, but in a pronounceable manner). Grenade seems to be able to train models that could do that, but maybe it's overkill, and also it doesn't seem really maintained. Any idea? I'd like something light that generates serializable models so it can easily be embedded in a Haskell application
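A lighter-weight sketch than a neural model, assuming a corpus of example names to imitate: a character bigram model, which is just a Map and therefore trivially serializable (all names here are hypothetical, not from Grenade or any other library).

import qualified Data.Map.Strict as M
import System.Random (randomRIO)

-- Successors observed for each character; '^' marks the start, '$' the end.
type Bigrams = M.Map Char [Char]

train :: [String] -> Bigrams
train names = M.fromListWith (++)
  [ (a, [b]) | name <- names, (a, b) <- zip ('^' : name) (name ++ "$") ]

-- Walk the bigram chain from the start marker, capping the length at 20.
generate :: Bigrams -> IO String
generate model = go (20 :: Int) '^'
  where
    go 0 _ = pure ""
    go k c = case M.lookup c model of
      Nothing    -> pure ""
      Just nexts -> do
        i <- randomRIO (0, length nexts - 1)
        let n = nexts !! i
        if n == '$' then pure "" else (n :) <$> go (k - 1) n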
Man of Letters
@man_of_letters:mozilla.org
[m]
@YPares: hi! what kind of neural networks are these (in newbie terms: fully connected, convolutional, recurrent, etc.)? Do you have in mind some classic model described somewhere in detail?
also, is the pronounceability learned by imitation and not verified, or is there a loss function that really enforces that during the training? (pardon me if that's trivial to google)
Samuel Schlesinger
@SamuelSchlesinger

chreekat: Python is great for doing science because it runs your code no matter what. No matter what they say about pre-registering experiments, scientists just love to tinker and make up hypotheses after the fact

I must say, I've had some horrible experiments where I ran Python code for hours and then it finally printed out:

Traceback (most recent call last):
  File "boop.py", line 1, in <module>
    print(x)
NameError: name 'x' is not defined