Man of Letters
@man_of_letters:mozilla.org
[m]
I'm being told "dataHaskell_Lobby might like to add a topic"
sam, stites: you are listed as admins, so perhaps you can do that (not sure about others)?
Marco Zocca
@ocramz_:matrix.org
[m]
👋
Man of Letters: I haven't seen Sam around here in a long time. I know he's doing a PhD at Northeastern, so that might explain it :)
Man of Letters
@man_of_letters:mozilla.org
[m]
oh dear ;)
Ignat Insarov
@kindaro:matrix.org
[m]
Hello gurus.
I was wondering if there is some sort of a standard set of types for doing data stuff?
So that different packages can talk to each other?

For example, these three packages have completely different typing:

https://hackage.haskell.org/package/kmeans-0.1.3/docs/Data-KMeans.html
https://hackage.haskell.org/package/clustering-0.4.1/docs/AI-Clustering-KMeans.html
https://hackage.haskell.org/package/roc-cluster-0.1.0.0/docs/Data-Cluster-ROC.html

Since they all do more or less the same thing, it stands to reason that they should have compatible types.

Ignat Insarov
@kindaro:matrix.org
[m]
There are other packages for clustering on Hackage; I looked into this topic some time ago. Some use lists, some use vectors, some use custom data types.
Is there some way we can start talking about common interfaces or frameworks?
chreekat
@b:chreekat.net
[m]

Hi!

I wouldn't try to unify types: each domain has its specific needs, which are best documented as individual types. Instead, use functions as the interface between libraries.

Types are cheap
Ignat Insarov
@kindaro:matrix.org
[m]
Another example is Vega, the graphing library written, I think, in JavaScript. Some bindings to it were posted here a while ago and I tried them out. I do not recall the details, but you are basically asked to construct a JSON blob.
Ideally I want to get some clustering and send it to a state of the art graphing front end in one line.
I understand and anticipate the position that open source should organize itself. But I do not believe in it. Yes, there are some gains to be had from domain specific types. Maybe one kind of clustering needs lists and another needs vectors. But there are also gains from coöperation.
I want to coöperate. Either I can go to Python or R, where the open source process had time and money to play out. Or I can try to coöperate with Haskell. But there is no option to stay in Haskell and do everything by myself.
I think the community should think carefully and roll out a unified understanding of what statistics in Haskell should look like.
Ignat Insarov
@kindaro:matrix.org
[m]

For example, there is Frames.

https://github.com/acowley/Frames

Is it good? If it is good, why is it not everywhere? I cannot afford to find out by trial and error. I can spend an evening, maybe a weekend, but it is not a safe bet. R is a safe bet: it has unified types that everything works with.

Consider also that I am a highly proficient Haskell programmer. I can make sense of the source code, I have a good understanding of how vectors are different from lists and so on. Not to say that I have some special arcane knowledge, but there is no way an average person will spend time figuring out which types work best for their specific problem. People want stuff to work well out of the box, and rightly so.
chreekat
@b:chreekat.net
[m]
I, uh, wasn't talking about open source philosophy. I'm sorry there's no good library or framework in Haskell for what you want to do right now. A unified understanding can't really compete with decades of use by countless users. My point about individual types still stands, but I know that doesn't really help if the community isn't there.
I'm not even a data scientist, I just ended up here by accident 😬
chreekat
@b:chreekat.net
[m]
Oh hey, that reminds me of something. Is there anybody out there who has tried to use Haskell with terabytes of memory? I don't do data science myself, but I do devops on (Java) software that does supply chain analysis on huge machines with single-digit-terabyte heap sizes, and it makes me curious what that would look like with Haskell.
Ignat Insarov
@kindaro:matrix.org
[m]
How about we build a good library or framework?
Ignat Insarov
@kindaro:matrix.org
[m]
Say what. If we can give people good input and output based on solid types, everyone will be incentivized to write to these types.
So, for example, if we can send a CSV file through type X into graphing in one line, people will happily work with X because they will start from this.
chreekat
@b:chreekat.net
[m]
I agree! Too often Haskell libraries are clearly designed to prove some generality (valuable academic work) rather than designed to get some job done. Start with some key tasks a person wants to accomplish and build the library around that. Types can follow the purpose
Ignat Insarov
@kindaro:matrix.org
[m]
Maybe we can start with the abstraction of a data set. This can be a directory of CSV files or an SQL database. What is the appropriate type? Not an easy question already.
chreekat
@b:chreekat.net
[m]

What is the simplest thing that can be done with a data set that is still useful?

I immediately think of https://datasette.io/

Ignat Insarov
@kindaro:matrix.org
[m]
To glue two data sets together is one simple thing I can think of. Like, a canonical data set monoid.
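A minimal sketch of what such a data set monoid could look like, assuming a hypothetical `DataSet` type that is nothing more than a list of rows. The names `DataSet`, `Row`, `january` and `february` are invented for illustration and do not come from any existing package; a real design would also have to decide what happens when the columns of the two data sets differ.

```haskell
module Main where

-- Hypothetical: a row is a list of (column name, value) pairs.
type Row = [(String, Double)]

-- Hypothetical: a data set is just a list of rows.
newtype DataSet = DataSet { rows :: [Row] }
  deriving (Eq, Show)

-- Gluing two data sets together appends their rows.
instance Semigroup DataSet where
  DataSet a <> DataSet b = DataSet (a ++ b)

-- The empty data set is the identity for gluing.
instance Monoid DataSet where
  mempty = DataSet []

-- Two made-up data sets with the same columns.
january :: DataSet
january = DataSet [[("x", 1), ("y", 2)]]

february :: DataSet
february = DataSet [[("x", 3), ("y", 4)]]

main :: IO ()
main = do
  -- Number of rows after gluing.
  print (length (rows (january <> february)))
  -- mconcat over several data sets agrees with pairwise gluing.
  print (mconcat [january, mempty, february] == january <> february)
```

Because the instance only relies on list concatenation, the monoid laws hold for free; anything smarter, such as merging on a key column, would need its own argument for associativity.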
Ignat Insarov
@kindaro:matrix.org
[m]
There is also visualization. In principle every data set is isomorphic to its visualization and its storage.
There are also projections — destructive operations like an average of a column or a correlation between columns.
Finally, there are generative operations that create «fake» data sets from a bunch of parameters. They may be either various noises or carefully derived predictions.
There is also distance between data sets of the same type that lets us do validation.
Ignat Insarov
@kindaro:matrix.org
[m]
So, we have read → project → generate → validate → visualize.
Suppose we split the data set into 10 random slices. Do we now have a data set of data sets?
Now we can talk about how well a given prediction is validated as a projection of the data set of data sets. For example, we can visualize it. Cool!
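A rough sketch of the project → generate → validate part of that pipeline, under strong simplifying assumptions: a data set is a single numeric column, the projection is the mean, the generated "fake" data set is a constant prediction, and the distance is the mean absolute difference. All names here (`DataSet`, `project`, `generate`, `distance`, `validate`, `observed`) are hypothetical, and reading and visualizing are left out.

```haskell
module Main where

-- Hypothetical and deliberately tiny: a data set is one numeric column.
type DataSet = [Double]

-- Projection: a destructive summary of the data set (here, the mean).
-- Note: the mean of an empty data set is NaN in this sketch.
project :: DataSet -> Double
project xs = sum xs / fromIntegral (length xs)

-- Generation: build a "fake" data set from parameters
-- (here, n copies of a predicted value).
generate :: Int -> Double -> DataSet
generate n mu = replicate n mu

-- Distance between two data sets of the same length:
-- the mean absolute difference of corresponding points.
distance :: DataSet -> DataSet -> Double
distance xs ys = project (zipWith (\x y -> abs (x - y)) xs ys)

-- Validation: how far is the generated prediction from the observed data?
validate :: DataSet -> Double
validate obs = distance obs (generate (length obs) (project obs))

-- Made-up observations.
observed :: DataSet
observed = [1, 2, 3, 6]

main :: IO ()
main = do
  print (project observed)   -- 3.0
  print (validate observed)  -- 1.5
```

Splitting `observed` into slices and mapping `validate` over them would give a list of scores, which is itself a `DataSet` in this sketch, so it can be projected again.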
Ignat Insarov
@kindaro:matrix.org
[m]
Yes, what I mean is not a single visualization. There is a «full set» of visualizations that displays all dimensions of all data points.
Something like a scatter plot matrix.
We want to be able to see all the information that there is, so at least there is a monic arrow from the type of data sets to the type of visualizations.
And a visualization can be written down as a data set — we want it to be the same data set we started with.
Ignat Insarov
@kindaro:matrix.org
[m]
Maybe we can say that there are many different visualizations for the same data set. Like, say, with different colour schemes.
Ignat Insarov
@kindaro:matrix.org
[m]
But, for example, a box plot is not a visualization. It is visualization after projection.
chreekat
@b:chreekat.net
[m]

Interesting! I see where you're going.

But is that actually useful? :) Datasets don't just exist to be manipulated, but to explain something useful about the real world. What I mean is that a tool to simplify creating a box plot out of a CSV file is more likely to build a user base than a general algebra over datasets

Ignat Insarov
@kindaro:matrix.org
[m]
I can easily build a tool that does something specific for Joe, Kim and Mary. But no amount of such tools will get Haskell to a place I want to put it in. We must show developers that they can develop, and then developers will show users that they can use.
Ignat Insarov
@kindaro:matrix.org
[m]

Bold claim: every widespread language has at least one widespread framework, in the wide sense of the word.

  • Python: Flask, PyTorch.
  • JavaScript: DOM API, React.
  • C: Linux, the POSIX standard.
  • R: the R repl.

I say success of a language is impossible unless hard decisions are already made for some practical area of expertise.

This does not imply causality of course. It is hard to establish causality in History. So, I find this observation persuasive — this is as good as it gets. If we are to make any decision, the decision to get a framework going is a good one.
chreekat
@b:chreekat.net
[m]
Haskell's "framework", so far, is geared towards language design
Vincent Meade
@vmeade
Hello
Jaeyoung Lee
@jaeyounkg
Hi all, anyone still alive here?