Man of Letters
@man_of_letters:mozilla.org
[m]
but after some more searching I've found yours on gitter and gitter has now moved to Matrix
but this group is not part of the Haskell space (and it is hosted on the Gitter server rather than on the most popular one, matrix.org), so I couldn't find it: https://chat.mozilla.org/#/room/#haskell-space-meta:matrix.org
would you like to join the Haskell space on Matrix and become more discoverable?
I think it's enough to ask on the link above
I'm guessing the link to give the Haskell space admins is something like https://app.element.io/#/room/#dataHaskell_Lobby:gitter.im
Man of Letters
@man_of_letters:mozilla.org
[m]
BTW, SPJ's AD talk is now at https://www.youtube.com/watch?v=q3BOJ8gK29M
purepani
@purepani:matrix.org
[m]
Hi! I am relatively new to data science in Haskell and was wondering about the current state of this group.
Marco Z
@ocramz
hi there! things have been pretty quiet here for a while. We lost early momentum, mission was/is unclear, people are busy :)
@purepani your best bet is the "current environment" page, which is a directory of related packages
Marco Zocca
@ocramz_:matrix.org
[m]
hi from Matrix o/
Man of Letters
@man_of_letters:mozilla.org
[m]
if nobody objects I will ask admins to add us to the Haskell space on Matrix in a couple of days
Man of Letters
@man_of_letters:mozilla.org
[m]
done
🎉
we are now in Haskell space on Matrix
I'm being told "dataHaskell_Lobby might like to add a topic"
sam, stites: you are listed as admins, so perhaps you can do that (not sure about others)?
Marco Zocca
@ocramz_:matrix.org
[m]
👋
Man of Letters: I haven't seen Sam around here in a long time. I know he's doing a PhD at Northeastern, so that might explain it :)
Man of Letters
@man_of_letters:mozilla.org
[m]
oh dear ;)
Ignat Insarov
@kindaro:matrix.org
[m]
Hello gurus.
I was wondering if there is some sort of a standard set of types for doing data stuff?
So that different packages can talk to each other?

For example, these three packages have completely different typing:

https://hackage.haskell.org/package/kmeans-0.1.3/docs/Data-KMeans.html
https://hackage.haskell.org/package/clustering-0.4.1/docs/AI-Clustering-KMeans.html
https://hackage.haskell.org/package/roc-cluster-0.1.0.0/docs/Data-Cluster-ROC.html

Since they all do more or less the same thing, it stands to reason that they should have compatible types.

Ignat Insarov
@kindaro:matrix.org
[m]
There are other packages for clustering on Hackage; I looked at this topic some time ago. Some use lists, some use vectors, some use custom data types.
Is there some way we can get to talk about the common interfaces or frameworks?
chreekat
@b:chreekat.net
[m]

Hi!

I don't try to unify types - each domain has its specific needs which are best documented as individual types. Instead, use functions as the interface between libraries.

Types are cheap
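
A minimal sketch of "functions as the interface", assuming two hypothetical libraries that model points differently. `PointA`, `PointB`, `toB`, `fromB`, `centroidB` and `centroidOfA` are all made up for illustration and do not mirror any of the packages linked above:

```haskell
-- "Library A" (hypothetical): points are plain lists of coordinates.
type PointA = [Double]

-- "Library B" (hypothetical): points are a dedicated 2D record.
data PointB = PointB { px :: Double, py :: Double }
  deriving (Eq, Show)

-- The adapters are the interface between the two libraries.
-- Conversion to B can fail, because a list may not have exactly two entries.
toB :: PointA -> Maybe PointB
toB [x, y] = Just (PointB x y)
toB _      = Nothing

fromB :: PointB -> PointA
fromB (PointB x y) = [x, y]

-- A function that only exists in "library B": the centroid of some points.
centroidB :: [PointB] -> Maybe PointB
centroidB [] = Nothing
centroidB ps = Just (PointB (mean (map px ps)) (mean (map py ps)))
  where
    mean xs = sum xs / fromIntegral (length xs)

-- Data from "library A" reaches "library B" through the adapters,
-- and neither library has to change its own types.
centroidOfA :: [PointA] -> Maybe PointA
centroidOfA ps = fmap fromB (centroidB =<< traverse toB ps)
```

Each library keeps the type that documents its own needs; the cost is one small conversion function per direction, and the `Maybe` makes the mismatch between the two representations explicit.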
Ignat Insarov
@kindaro:matrix.org
[m]
Another example is Vega, the graphing library written, I think, in JavaScript. Some sort of bindings to it were posted here a while ago and I tried them out. I do not recall the details, but you are basically asked to construct a JSON blob.
Ideally I want to get some clustering and send it to a state of the art graphing front end in one line.
I understand and anticipate the position that open source should organize itself. But I do not believe in it. Yes, there are some gains to be had from domain specific types. Maybe one kind of clustering needs lists and another needs vectors. But there are also gains from coöperation.
I want to coöperate. Either I can go to Python or R, where the open source process had time and money to play out. Or I can try to coöperate with Haskell. But there is no option to stay in Haskell and do everything by myself.
I think the community should think carefully and roll out a unified understanding of what statistics in Haskell should look like.
Ignat Insarov
@kindaro:matrix.org
[m]

For example, there is Frames.

https://github.com/acowley/Frames

Is it good? If it is good, why is it not everywhere? I cannot afford to find out from my own experience. I can spend an evening, maybe a weekend, but it is not a safe bet. R is a safe bet: they have their unified typing that everything works with.

Consider also that I am a highly proficient Haskell programmer. I can make sense of the source code, I have a good understanding of how vectors are different from lists and so on. Not to say that I have some special arcane knowledge, but there is no way an average person will spend time figuring out which types work best for their specific problem. People want stuff to work well out of the box, and rightly so.
chreekat
@b:chreekat.net
[m]
I, uh, wasn't talking about open source philosophy. I'm sorry there's no good library or framework in haskell for what you want to do right now. A unified understanding can't really compete with decades of use by countless users. My point about individual types still stands, but I know that doesn't really help if the community isn't there
I'm not even a data scientist, I just ended up here by accident 😬
chreekat
@b:chreekat.net
[m]
Oh hey, that makes me remember something. Is there anybody out there who has tried to use Haskell on terabytes of memory? I don't do data science myself, but I do devops on (Java) software that does supply chain analysis on huge machines using single digit terabyte heap size, and it makes me curious what that would look like with Haskell
Ignat Insarov
@kindaro:matrix.org
[m]
How about we build a good library or framework?
Ignat Insarov
@kindaro:matrix.org
[m]
Say what. If we can give people good input and output based on solid types, everyone will be incentivized to write to these types.
So, for example, if we can send a CSV file through type X into graphing in one line, people will happily work with X because they will start from this.
chreekat
@b:chreekat.net
[m]
I agree! Too often Haskell libraries are clearly designed to prove some generality (valuable academic work) rather than designed to get some job done. Start with some key tasks a person wants to accomplish and build the library around that. Types can follow the purpose
Ignat Insarov
@kindaro:matrix.org
[m]
Maybe we can start with the abstraction of a data set. This can be a directory of CSV files or an SQL database. What is the appropriate type? Not an easy question already.
chreekat
@b:chreekat.net
[m]

What is the simplest thing that can be done with a data set that is still useful?

I immediately think of https://datasette.io/

Ignat Insarov
@kindaro:matrix.org
[m]
To glue two data sets together is one simple thing I can think of. Like, a canonical data set monoid.
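
A minimal sketch of such a monoid, assuming a data set is nothing more than a list of rows of one type. `DataSet`, `Reading`, `january`, `february` and `combined` are hypothetical names, and gluing here means appending rows; joining by columns would need a different structure:

```haskell
-- Hypothetical data set type: an ordered collection of rows of one type.
newtype DataSet row = DataSet { rows :: [row] }
  deriving (Eq, Show)

-- Gluing two data sets of the same row type appends their rows.
instance Semigroup (DataSet row) where
  DataSet xs <> DataSet ys = DataSet (xs ++ ys)

-- The empty data set is the identity for gluing.
instance Monoid (DataSet row) where
  mempty = DataSet []

-- Example row type, made up for illustration.
data Reading = Reading { sensor :: String, value :: Double }
  deriving (Eq, Show)

january, february :: DataSet Reading
january  = DataSet [Reading "a" 1.0, Reading "b" 2.5]
february = DataSet [Reading "a" 1.5]

-- mconcat glues any number of data sets, e.g. one per CSV file in a directory.
combined :: DataSet Reading
combined = mconcat [january, february]
```

Because the row type is a parameter, two data sets can only be glued when their rows have the same type, which is the kind of compatibility check a shared interface would need.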
Ignat Insarov
@kindaro:matrix.org
[m]
There is also visualization. In principle every data set is isomorphic to its visualization and its storage.
There are also projections — destructive operations like an average of a column or a correlation between columns.
Finally, there are generative operations that create «fake» data sets from a bunch of parameters. They may be either various noises or carefully derived predictions.
There is also distance between data sets of the same type that lets us do validation.
Ignat Insarov
@kindaro:matrix.org
[m]
So, we have read → project → generate → validate → visualize.
Suppose we split the data set into 10 random slices. Do we now have a data set of data sets?
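
A rough sketch of the read → project → generate → validate → visualize steps, assuming the simplest possible data set: one numeric column. Every name here (`Column`, `readColumn`, `mean`, `generate`, `distance`, `visualize`) is hypothetical, and the choices of mean, constant prediction and root-mean-square distance are only placeholders for each stage:

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Assumption: a data set is a single column of numbers.
type Column = [Double]

-- read: one number per line; lines that do not parse are skipped.
readColumn :: String -> Column
readColumn = mapMaybe readMaybe . lines

-- project: a destructive operation, here the average of the column.
-- (Returns NaN for an empty column.)
mean :: Column -> Double
mean xs = sum xs / fromIntegral (length xs)

-- generate: a "fake" data set built from parameters,
-- here a constant prediction repeated n times.
generate :: Int -> Double -> Column
generate n m = replicate n m

-- validate: a distance between two data sets of the same type,
-- here the root-mean-square difference over paired entries.
distance :: Column -> Column -> Double
distance xs ys = sqrt (sum (zipWith sq xs ys) / fromIntegral (length xs))
  where
    sq x y = (x - y) * (x - y)

-- visualize: a crude text bar chart, one bar per entry.
visualize :: Column -> String
visualize = unlines . map (\x -> replicate (round x) '#')
```

With these pieces, `putStr (visualize (readColumn "1\n2\n3\n"))` goes from text to a chart in one line, and `distance xs (generate (length xs) (mean xs))` measures how well the constant prediction fits.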