Compl Yue
@complyue
and Haskell code inside the project alters the web UI by sending text packets in JSON format back through the websocket connection, see https://github.com/complyue/hadui/blob/b1924e2be62a8256dc719b6b14c965b6da726041/hadui/src/HaduiRT.hs#L117 and the JSON cmd is interpreted at the web page here: https://github.com/complyue/hadui/blob/b1924e2be62a8256dc719b6b14c965b6da726041/hadui/web/wsc.js#L122
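The shape of such a command packet can be sketched in a base-only toy like the one below. The field names (`type`, `msg`) are illustrative guesses, not hadui's actual protocol; the real encoding and dispatch live in the HaduiRT.hs and wsc.js files linked above.

```haskell
-- Toy sketch of a JSON command packet, as text to be sent to the web page.
-- Field names are illustrative; see HaduiRT.hs / wsc.js for the real protocol.
-- `show` on a String happens to produce valid JSON for plain ASCII text.
mkUiCmd :: String -> String -> String
mkUiCmd ty payload =
  "{\"type\":" ++ show ty ++ ",\"msg\":" ++ show payload ++ "}"

-- In a real server this text would go out over the websocket, e.g. with
-- something like Network.WebSockets.sendTextData conn (mkUiCmd "msg" "hello").
```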
Compl Yue
@complyue

I did some updates to the home page, added rating examples and a brief:
Orientation
hadui is data-science oriented; it is not suitable as a general-purpose web framework.

all exported functions from all modules in the stack project in question are exposed to the frontend in a flat namespace. this is ideal for supporting analytical workflows, but overly open or even prohibitive for business workflows.

Platform Support
macOS - regularly used on Mojave
Linux - regularly used on Ubuntu 18.04
Windows - should work in Docker in theory, not attempted yet
GHC
currently you have to be comfortable compiling yourself an experimental version of GHC 8.6.5 with the :frontend cmd to start using hadui.

do this trick to incorporate it into your stack's GHC installation.

the mod to GHC is very light; there should be no difficulty migrating it to other GHC versions, but for the time being that hasn't been attempted. an MR to GHC has been thought of but not carried out yet.

Tony Day
@tonyday567
@complyue looked promising until I read the "experimental" ghc compilation step. gulp Is this a significant technical workaround or for development convenience?
Tony Day
@tonyday567
I'm currently playing around with interactive charts, using websockets via javascript-bridge. A natural development direction would be to hook this up to ghci. https://github.com/tonyday567/chart-svg
Compl Yue
@complyue
@tonyday567 the experimental change to ghc is minimal; the purpose is to delegate the initialization of the GHC api state to stack ghci, so the ecosystem of haskell stack is readily available to hadui. that said, I don't think data analysts would be afraid to use a custom-compiled GHC (e.g. with SIMD vectorization optimized); after all, that's one of the greatest things about an open-sourced tool stack.
Stefan Dresselhaus
@Drezil
@complyue depends on how you can set things up. If you just have a "download, unzip & run" solution there would be no friction of adoption. If people need to clone, patch & compile themselves then I don't think that will be the way to go..
Compl Yue
@complyue
@Drezil can't agree more, ultimately hadui should just work out-of-the-box with stack build --exec hadui in any new stack project. but I'm in no hurry toward that before it's prepared for a wider audience. the most critical issue for now is the arbitrary-code-execution vulnerability, and I don't think it's realistic to expect a GHC minor release on the 8.6 branch even if I submit an MR. GHC 8.8 is still outside stack lts right now; I'd think of an MR to the 8.8 branch once stack lts embraces it.
Compl Yue
@complyue
stack itself has not matured its custom GHC instance feature enough, or I could use that to package a custom GHC for easy fetching. it all just needs time, I'd think ;)
Stefan Dresselhaus
@Drezil
why not put an MR into ghc 8.8 before stack embraces it?
for the latter: hvr has his GHC PPA .. so it should be easily doable for the common ubuntu folks.. https://launchpad.net/~hvr/%2Barchive/ubuntu/ghc
Compl Yue
@complyue
um, I'd like to stick with stack, so that my whole team uses just its stable features. and I'm prioritizing the work at hand according to internal needs anyway.
Stefan Dresselhaus
@Drezil
ah, ok :)
Compl Yue
@complyue
btw, I haven't haskelled for long; do not many people compile GHC for themselves? when I think of it, what comes to mind is tensorflow, where you almost always compile it yourself, as the stock release is way too conservative about hardware requirements; you have to compile with sse4.2, avx etc. enabled to not waste the capabilities of decent CPUs
Stefan Dresselhaus
@Drezil
not really .. as with tensorflow you don't care about sse4 & avx if you use a graphics card anyway ...
mostly it is: pip install tensorflow & use it.
at least in my experience..
might be different if you have a big, dedicated team at a big corporation .. but most of my experiences are in science (students to post-docs that are happy when it just works(tm)) and in small businesses where you just have 1-2 people doing ML
Compl Yue
@complyue
okay, my team is fewer than 10 people, but we have to spin up dozens of rack servers for constant crunching, so that's not typical :)
Guillaume Desforges
@GuillaumeDesforges
Hey everybody!
I was thinking about writing some sort of cached pipeline for my machine-learning experimentation workflow. For the mechanism I have in mind, I need to compare the running function plus its inputs against previously run functions and inputs. I was thinking about comparing hashes.
Would it be possible to get the hash of a piece of source code (at compile time, I guess)?
Yves Parès
@YPares
@GuillaumeDesforges Hi! Hashing the source code would be difficult to do without some Template Haskell trickery. http://hackage.haskell.org/package/funflow does what you want, and it just requires you to update a salt whenever you modify your code and want to invalidate your cache
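The salt idea can be sketched in a few lines. This is a minimal toy, not funflow's actual API: it derives a cache key from a salt plus a rendering of the inputs, using a hand-rolled FNV-1a hash (a real pipeline would use the hashable or cryptonite packages instead). Bumping the salt changes every key, which is exactly the "invalidate the cache after editing code" behaviour described above.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- Toy FNV-1a hash over a String; Word64 arithmetic wraps as FNV expects.
-- In practice use the hashable or cryptonite packages (an assumption,
-- not funflow's implementation).
fnv1a :: String -> Word64
fnv1a = foldl' step 0xcbf29ce484222325
  where
    step h c = (h `xor` fromIntegral (ord c)) * 0x100000001b3

-- Cache key for a pipeline step: hash the salt together with a rendering
-- of the inputs. Bumping the salt invalidates all cached results.
cacheKey :: Int -> String -> Word64
cacheKey salt inputs = fnv1a (show salt ++ "|" ++ inputs)
```

A lookup layer would then store each step's result under `cacheKey salt inputs` and recompute only on a miss.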
Tony Day
@tonyday567
@complyue Is hadui like a ghci replacement, similar to intero, say, with interactive custom commands, or is it more fundamentally a different way to use GHC? Or something else?
Compl Yue
@complyue
@tonyday567 my own use case is to have vscode+hie open for developing all code of the stack project, with hadui-dev run as the ide's build task (it keeps running all the time), then a browser window opens the hadui page to enter interactive code, mostly parameter scripts to trigger plotting in the web browser.
Compl Yue
@complyue
hadui-dev will print the source location to the ide console when a runtime error occurs, so you just click the link to navigate to the source in vscode. I'm just now working on the error-formatting code, and have found it non-trivial to extract useful info from runtime errors with ghc :/
Compl Yue
@complyue
hadui-dap is planned to support breakpoints, single stepping, variable inspection etc. in vscode to debug hadui projects, though there's no tight schedule for that.
Compl Yue
@complyue
Surprise! it may be painless for you to try out hadui right now. I was wrong to regard stack's custom GHC instance feature as immature; that was because I'd built GHC with its new hadrian method. I'm amazed to find that stack actually works pretty well with a bindist of GHC built with good old make.
so you are 3 cmds away from a running hadui-demo on your machine, given you are on a decent macOS or Linux. please let me know whether it succeeded or not if you ever try it. https://github.com/complyue/hadui-demo
Marco Z
@ocramz
@complyue trying the demo (osx mavericks), but my system doesn't seem to have xz
this project is a very cool idea; my only concern (pretty much like @tonyday567's) is with the custom ghc build. Congratulations on getting it all to work together though!
Compl Yue
@complyue
@ocramz thanks! is stack able to install stock GHC on your system? if so, I think maybe it's distributed in a compression format other than xz, and maybe I can re-pack the bindist to make it work.
Marco Z
@ocramz
yep I routinely use stack :)
Compl Yue
@complyue
okay, found
Compl Yue
@complyue
@ocramz I've updated hadui-demo to use bz2, please pull and try the build again
I'm away from my mac; it was packed on linux, so I'm not quite sure it will work out, but I think it very probably will.
Austin Huang
@austinvhuang
@complyue any reason for bokeh vs vega?
Compl Yue
@complyue
@austinvhuang thanks for the pointer! we used to be python-centric, so I ignored vega in the first place; you just reminded me that we've drifted off the python ecosystem, so vega is an option now :)
Compl Yue
@complyue
@austinvhuang after a quick refresh: I think we'll stay with bokeh because of its acceptable lag when visualizing data points on the order of millions, due to its design of rendering with WebGL by default, https://www.anaconda.com/python-data-visualization-2018-why-so-many-libraries/ check out the 'Data Size' section there. bokeh has long been battle-tested with us in this regard.
another killer feature of bokeh for us is this: https://docs.bokeh.org/en/latest/docs/user_guide/interaction/linking.html#userguide-interaction-linking we usually have a few, sometimes up to 30, figures shown, with their x axes (or both x+y) linked for zoom/pan/selection. I haven't tried hard enough with other frameworks to implement this effect, but bokeh just works.
Doug Burke
@DougBurke
@complyue Vega can do linked views for pan, zoom, and selection - e.g. see http://hackage.haskell.org/package/hvega-0.4.1.1/docs/Graphics-Vega-Tutorials-VegaLite.html#g:29 - but I have not tried it out on very large datasets (my guess is that it isn't optimised for this use case).
Compl Yue
@complyue
@DougBurke yeah, this feature seems on a par. you even made it work with IHaskell 👍, I wish I had dug harder in stackage/hackage ;-)
um, d3-based visualizations all hit a data-size bottleneck sooner than WebGL-based ones; I hit that wall 2~3 years ago, and because of python, have been stuck with bokeh all along.
Isaac Shapira
@fresheyeball_gitlab
Howdy!
I am here to leeeeaaaarn!
Yves Parès
@YPares
Hi!
We are here to teeeeaaaaaaach!
(within the limits of the reasonable)
Isaac Shapira
@fresheyeball_gitlab
@YPares many sauces of awesome
Austin Huang
@austinvhuang

@complyue vega is supported from python in the form of the altair bindings https://altair-viz.github.io/ - I use it all the time when working with python!

Once you get to million datapoints, I tend to lean towards bespoke apps that either serve data on-demand or expose data at the right level of granularity (google maps style). By the time one is dealing with > 30k datapoints, you're either thinking of the data in the form of a density, or inspecting points in a local region of the data space. But I do get there's something nice about a framework that takes care of this for you without building from scratch.

i'm a big fan of crossfiltering/linking concepts as well. There's probably room in the DS ecosystem for a rshiny killer with crossfiltering as a basis.

Austin Huang
@austinvhuang
welcome @fresheyeball_gitlab !
Compl Yue
@complyue

@austinvhuang at the very early stage of choosing a vis kit (years ago), I intentionally avoided declarative plotting tools, i.e. echarts, plotly etc. I decided that later interaction with the original data source was important, and that incremental updates to the chart would always be on the way; I had in mind the way stock k-charts are updated in realtime. bokeh fits this idea pretty well. but today I'd say that's not so important.

wrt data size as the problem for me: my team is not particularly strong at data modeling, they need to see something before they can capture something meaningful from the data, then start informed analysis. I developed a home-brew array database that mmaps an entire dataset (sized from 20GB ~ 300GB) into each computing node's memory address space, each node with typically 144GB RAM, so it's trivial for a backend process to fully scan the dataset by means of memory reads. repeated scans are perfectly cached by the os's kernel page cache and shared by all processes on that node, so only the 1st scan on a node needs to consume bandwidth to the storage server to fill its kernel page cache. so throughput of massive data is really cheap in my env.

at hand now is the problem of efficiency in data analysis. I've identified it as my analysts' limited capability to describe the data well enough with what they've got. I'm investigating some sort of narrative method for data description, leading them to start by telling what they'd like to see, then, in order to see that, what's needed, and so on, hopefully finally landing in what data we actually have. I started haskelling for exactly this purpose: finding a proper DSL to establish that communication.

so far the DSL is not as ideal as I'd like, as free monads seem an unacceptable performance killer; we have to stay with mtl style, simple transformers or even vanilla monads. and I've actually found my direction points to massively concurrent event simulation to achieve the narrative style of data description; though haskell seems pretty good at handling concurrency and parallelism, I've found no reference implementation for my idea.
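The "simple transformers" style mentioned here can be illustrated with a tiny sketch over `StateT` from the transformers boot library. The DSL verb `observe` and the program below are hypothetical examples, not the actual DSL under discussion; the point is that each step is a direct state update, with no free-monad syntax tree to build and interpret:

```haskell
import Control.Monad.Trans.State (State, execState, modify)

-- A toy "narrative" step: record one observation directly into state,
-- avoiding the per-node allocation cost of a free-monad interpreter.
observe :: Int -> State [Int] ()
observe x = modify (x :)

-- Run a narrative program and return the observations in order.
runNarrative :: State [Int] () -> [Int]
runNarrative prog = reverse (execState prog [])
```

Swapping this for a free-monad encoding would mean reifying every `observe` as a constructor and folding an interpreter over the tree, which is where the performance cost comes from.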

Compl Yue
@complyue
I'd characterize visualization in my scenario as less hypothesis showcasing and more blind data exploration, where the more data we see in the first place, the more meaningful the clues we can extract, given the fixed brain power/capability we have in the team.