Austin Huang
@austinvhuang

@complyue pretty interesting case study!

If the initial goal is to obtain a qualitative understanding of the data, do those qualitative properties materially change when working with downsampled versions? I'm probably not close enough to your use case, but for me, insight-oriented analyses of large datasets usually hit diminishing returns well before full dataset scans, because the inferences (qualitative description or explicit parameter estimation) converge well before that. Capability-oriented ML models like language and vision are a different story, of course...

On the topic of visualization, I've always thought there should be a way to automatically do dimensionality reduction (like UMAP) for EDA on arbitrary structured data by declaratively specifying a set of columns that may be heterogeneous in nature. Columns of numerical data should be automatically normalized, categorical variables should automatically be run through categorical embeddings, etc. I haven't seen anyone outside commercial vendors tackle this, though I've always felt it should be doable in a pretty general way with a bit of effort.
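A minimal Haskell sketch of that declarative idea. All names here are hypothetical, one-hot encoding stands in for learned categorical embeddings, and the UMAP step itself is out of scope:

```haskell
-- Hypothetical sketch: declaratively tag columns, then preprocess each
-- heterogeneous column into numeric features, ready for a UMAP-style
-- dimensionality-reduction step (not implemented here).
import Data.List (nub, elemIndex)
import Data.Maybe (fromMaybe)

data ColSpec
  = Numeric [Double]      -- to be z-score normalized
  | Categorical [String]  -- stand-in for learned categorical embeddings

-- z-score normalization for a numeric column (one feature per row)
normalize :: [Double] -> [[Double]]
normalize xs = [[(x - mu) / sigma] | x <- xs]
  where
    mu    = sum xs / n
    sigma = sqrt (sum [(x - mu) ^ 2 | x <- xs] / n)
    n     = fromIntegral (length xs)

-- one-hot encoding as a crude stand-in for an embedding lookup
oneHot :: [String] -> [[Double]]
oneHot xs = [ [ if i == idx x then 1 else 0 | i <- [0 .. k - 1] ] | x <- xs ]
  where
    cats  = nub xs
    k     = length cats
    idx x = fromMaybe 0 (elemIndex x cats)

-- encode every column, then concatenate the features row-wise
preprocess :: [ColSpec] -> [[Double]]
preprocess cols = foldr1 (zipWith (++)) (map enc cols)
  where
    enc (Numeric ds)     = normalize ds
    enc (Categorical cs) = oneHot cs
```

For example, `preprocess [Numeric [1,2,3], Categorical ["a","b","a"]]` yields three rows of three features each, which a reduction step could then consume.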

I'm probably not close enough to your use case to comment very usefully (I'm also not sure what the "narrative" approach you mention refers to). Thanks for sharing though.

Austin Huang
@austinvhuang

@YPares yeah it's an interesting development, though I haven't had time to try it myself.

Given the amount of resources and adoption Jupyter has, it has always felt far more stagnant than it should be as a technology. Maybe it's time for new players like this and Observable to come in.

Does it solve the reproducibility and in-memory state spaghetti of Jupyter notebooks? If it does, I'd be especially interested.

Compl Yue
@complyue
@austinvhuang Thanks for sharing your insights, very informative to me. I can't put my thoughts on solid theoretical ground, but down-sampling is roughly the approach we're taking: we use an event system to adapt to the varying significance each model (with its parameters) contributes to the overall expectation along the timeline, and present only captured sums of sufficient importance to a human for estimation and possible analysis. Or simply put: the vast majority of the data, even when some parts are meaningful, renders the rest mutually plain noise. Finding the boundaries of the meaningful parts of the data is the hardest work, I presume, and tailoring by importance should be effective in attacking this. Ideally only 1-2% of the dataset would be left for human analysis; we definitely need vis tools beyond that point, but we still need vis tools to get there.
Yves Parès
@YPares
Hello guys, just to say that the blog post presenting porcupine is up https://www.tweag.io/posts/2019-10-30-porcupine.html cc @mgajda @mmesch
@austinvhuang I don't know about Observable. I should check them out
I'm not sure Netflix has solved reproducibility of notebooks but they tout they've made a big step towards it anyway :)
Michał J. Gajda
@mgajda

@YPares Cool! Will read it.

By the way, did you see streamly? https://github.com/composewell/streamly
And their posts on how to write streaming benchmarks: https://github.com/composewell/streaming-benchmarks

I wonder if we can make an improved Porcupine+Streamly pipeline platform that can replace Shake as well?
Yves Parès
@YPares
I invited you to https://gitter.im/tweag/porcupine to continue the discussion
Austin Huang
@austinvhuang

I've not taken a deep dive into Observable. The creator of D3, Mike Bostock, is one of the people behind it. It seems to solve the out-of-control notebook state problem far better than Jupyter does, taking advantage of JavaScript's asynchronous nature (cells are promises, variable re-definition is disallowed).

This stuff is far beyond jupyter IMO:

https://observablehq.com/collection/@observablehq/explorables

I like this one, which covers a technical topic many researchers are not well versed in (but which is super important in ML):
https://observablehq.com/@tophtucker/theres-plenty-of-room-in-the-corners?collection=@observablehq/explorables

On the other hand, it's all javascript which would require a cultural sea change for it to take over as mainstream in the data science / machine learning world. I wouldn't put anything beyond javascript, but a lot of shifts would have to align.

Compl Yue
@complyue
wow, I'm astonished by Observable notebooks 😲; I regret not discovering them earlier. Lucky to see them mentioned here, thanks!!
They've achieved this much with bare-metal JS; it makes me feel type-safety is less of a concern in all the aspects they did so well.
Compl Yue
@complyue
also the first time I've come across the 'ball' concept, many thanks for sharing! @austinvhuang
Yves Parès
@YPares
Well all JS might even be an advantage for us :) ==> compile to JS
and this one: https://nextjournal.com/
both interesting approaches

iodide is based on webassembly
nextjournal has some functional programming ideas behind it:

The good news is that I believe that it is possible to fix computational reproducibility by applying a principle of functional programming at the systems level: immutability (which is what we're doing at Nextjournal). I will show how applying immutability to code, data and the computational environment gives us a much better chance of keeping things runnable in a follow-up post.

Yves Parès
@YPares
"In Nextjournal, you can use multiple programming language runtimes together in a single notebook. Values can be exchanged between runtimes using files."
Files?? C'mon...
^^
show some love
Compl Yue
@complyue
I haven't been using arrayfire, but I love the idea and appreciate their effort to make it viable.
Compl Yue
@complyue
for Haskell, I'd think a GHC backend spitting out CUDA C would be even more sexy, but I see no sign of it coming into being.
Marco Z
@ocramz
@complyue that's what accelerate does, more or less
Compl Yue
@complyue
@ocramz thanks! This place is full of surprises. The only 'accelerate' I knew before now was https://developer.apple.com/documentation/accelerate ; now I know AccelerateHS too, cool :D
Marco Z
@ocramz
@complyue :) yeah, sorry for not elaborating earlier, I was on my mobile. http://hackage.haskell.org/package/accelerate does quite a few things, and I'm not qualified to describe them all in detail, but one of them used to be rewriting high-level Haskell array programs in terms of predefined CUDA C splices, which are then executed on the GPU. Nowadays, IIUC, this is subsumed by compiling to LLVM intermediate representation: https://hackage.haskell.org/package/accelerate-llvm-ptx
there are some actual experts lurking around here, such as the main author @tmcdonell and one of the core LLVM-Hs contributors @cocreature
there's also a whole gitter channel dedicated to AccelerateHS : https://gitter.im/AccelerateHS/Lobby
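The high-level style being described can be seen in the classic dot-product example from the accelerate documentation. A sketch, assuming the accelerate package is installed; on a GPU you'd swap the interpreter for the accelerate-llvm-ptx backend:

```haskell
-- Classic dot-product example from the accelerate docs: the program is
-- written in the Acc embedded language, and the chosen backend compiles
-- it (interpreter here; LLVM CPU or PTX GPU backends in production).
import Data.Array.Accelerate as A
import Data.Array.Accelerate.Interpreter (run)  -- swap for the PTX backend on a GPU

dotp :: Acc (Vector Double) -> Acc (Vector Double) -> Acc (Scalar Double)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let xs = fromList (Z :. 4) [1, 2, 3, 4] :: Vector Double
      ys = fromList (Z :. 4) [5, 6, 7, 8] :: Vector Double
  -- 1*5 + 2*6 + 3*7 + 4*8 = 70
  print (run (dotp (use xs) (use ys)))
```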
Austin Huang
@austinvhuang
@dmjio very nice! Have been interested to see how arrayfire was coming along.
Marco Z
@ocramz
@dmjio speaking of which, I was looking at the API docs and was puzzled by the use of unsafePerformIO in the wrapper functions, e.g. https://hackage.haskell.org/package/arrayfire-0.1.0.0/docs/src/ArrayFire.FFI.html#op3 ; wouldn't it be better to catch the FFI error codes and convert them into a Haskell sum type?
sure, it's possible to catch an AFException, but I think the "pure" return types in the main API are misleading
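A sketch of the alternative being suggested. The names and status codes here (AFError, checkStatus, wrap) are hypothetical and not the actual arrayfire binding:

```haskell
-- Hypothetical sketch of surfacing FFI status codes as a sum type
-- instead of hiding them behind unsafePerformIO + exceptions.
import Foreign.C.Types (CInt)

-- the error codes a C library might return, as a Haskell sum type
data AFError
  = AFOutOfMemory
  | AFInvalidArg
  | AFUnknown CInt
  deriving (Show, Eq)

-- translate a raw status code into Either (codes are made up)
checkStatus :: CInt -> Either AFError ()
checkStatus 0   = Right ()
checkStatus 101 = Left AFOutOfMemory
checkStatus 202 = Left AFInvalidArg
checkStatus n   = Left (AFUnknown n)

-- the wrapper keeps the effect visible as IO (Either AFError a),
-- rather than a "pure" result that secretly performs IO
wrap :: IO (CInt, a) -> IO (Either AFError a)
wrap act = do
  (status, result) <- act
  pure (result <$ checkStatus status)
```

Here a foreign call would play the role of the `IO (CInt, a)` action; callers then pattern-match on the `Either` instead of catching exceptions.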
Doug Burke
@DougBurke
@complyue (and other people interested in large-scale data visualization in the browser); I just saw mention of https://github.com/uwdata/falcon from one of the Vega folks
Austin Huang
@austinvhuang

@DougBurke awesome! "initially loads reduced interactive resolutions, then progressively improves them" seems right.

if this can be abstracted and reused for arbitrary backend data + frontend consumers that would be quite useful.

Compl Yue
@complyue
@DougBurke @ocramz thanks for the info. I've suffered a lot behind the GFW for a few days; even with several VPN-like solutions, connections to major sites barely work. Our government is insane to hard-ban even tech sites, sigh!
Compl Yue
@complyue

I looked at falcon; about the data sizes it handles:

10M flights in the browser and ~180M flights or ~1.7B stars when connected to OmniSciDB (formerly known as MapD)

makes me remember that, some months ago, I had to increase Chrome's heap size to around 12GB to visualize one of my datasets (with a BokehJS frontend and a Go backend), since even the 64-bit version of Chrome has a default heap size limit of around 3.5GB:

performance.memory.jsHeapSizeLimit/1024/1024/1024
3.501772880554199

seems they haven't hit this limit, while Bokeh already exceeded it.

Yves Parès
@YPares
Hi guys, a few people and I started a room for pro/semi-pro Haskellers to discuss the architecture of purely functional programs: mostly how to deal with effect composition frameworks (be it raw monad stacks, free monads/applicatives, mtl, extensible-effects, polysemy, etc.), which one is the best, and how to interoperate between libs written in different frameworks.
Its goal is to be "Effect composition for the rest of us": we want to focus on practical intermediate-to-advanced material (i.e. remain accessible to those who don't know, or don't want to know, about the Coyoneda lemma or right Kan extensions)
Ideal scenario: we determine the best effect composition pattern to rule them all and take over the pure FP world
Good enough scenario: we produce insights, use cases and material (or just discussions) about which way to best address effect composition in such specific scenarios and how to deal with interop between frameworks
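For context, the mtl-style flavor of effect composition the room discusses can be sketched like this. A toy example, assuming the mtl package, not a recommendation of one framework over another:

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Toy mtl-style effect: programs are written against a type class,
-- and an interpreter is chosen at the edge. The interop question is
-- how patterns like this compose with free monads, polysemy, etc.
import Control.Monad.State

class Monad m => Logger m where
  logMsg :: String -> m ()

-- production-style interpreter: write to stdout
instance Logger IO where
  logMsg = putStrLn

-- test interpreter: collect messages purely in State
instance Logger (State [String]) where
  logMsg msg = modify (++ [msg])

-- effect-polymorphic program: knows only the Logger interface
program :: Logger m => m ()
program = do
  logMsg "starting"
  logMsg "done"

-- run the pure interpreter to inspect the accumulated log
collected :: [String]
collected = execState program []
```

The same `program` runs unchanged in `IO` or in `State [String]`; the friction the room wants to address shows up when one library picks this pattern and another picks free monads or polysemy.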
Yves Parès
@YPares
You can join the public community room https://gitter.im/pure-fp-architects/community if you are interested; just drop a line there. The actual room is invite-only.
Yves Parès
@YPares
(by "which one is the best" I meant "which one seems to be the most suited in specific contexts", I'm not intending to solve what the Haskell community as a whole has been struggling with for 20 years ^^, don't worry)
Austin Huang
@austinvhuang
@YPares do you do much ghc hacking or deeper dives into the RTS? or know anyone that might be useful to ask about the RTS?
Torsten Scholak
@tscholak
Hi, didn’t know about this channel, I see lots of familiar faces!
Tim Pierson
@o1lo01ol1o
:wave:
Marco Z
@ocramz
Hi @tscholak , welcome !
Austin Huang
@austinvhuang

Hi @tscholak!

Indeed... upside and downside of doing data science + machine learning in Haskell, you know everyone after about five minutes!

that's alright though, there's a strong selection bias for interesting people :)
Yves Parès
@YPares
@austinvhuang For my part, I've never dabbled with GHC internals, but I think we have people at Tweag who have worked on the RTS. I can ask around
Austin Huang
@austinvhuang
Thanks @YPares !
Compl Yue
@complyue
I almost succeeded in distributing custom GHC by source with Nix, but met an annoying problem, any tip is appreciated: NixOS/nixpkgs#73443
David Johnson
@dmjio
hey