    Marc Vertes

    Hello everyone,

    It's a great pleasure for me to announce that today we
    are launching our skale platform, see

    It's in beta, it's still experimental, but it is online and ready
    to use right now.

    After creating an account, you will be able to deploy skale
    applications simply with skale deploy, and run them in the cloud
    with skale run.

    From the web interface, you will be able to schedule the execution
    of your app, or to scale it up (adding parallel workers).

    Do not hesitate to give us any feedback or suggestions.

    So let's start this journey at , and enjoy.

    Thank you for your support,


    Marc Vertes

    w541:skale/dd> skale deploy
    deploy error: Error: Command failed: /bin/sh -c git remote remove skale; git remote add skale ""; git add -A .; git commit -m "automatic commit"; git push skale master

    * Please tell me who you are.


    git config --global user.email "you@example.com"
    git config --global user.name "Your Name"

    to set your account's default identity.
    Omit --global to set the identity only in this repository.

    fatal: unable to auto-detect email address (got 'felix@w541.(none)')
    error: src refspec master does not match any.
    error: failed to push some refs to ''


    Should I run the above git config commands on my laptop? Should I use the same user email as I used to create my skale account?
    Marc Vertes
    Hello @philippe56, thanks for your report. It seems that git complains on your laptop because this was its first use. We should handle that. Otherwise, you should not need to use git directly (besides the initial config, which we will fix). When asked for login and password, provide the ones used on the website. Login/password is only required once.
    Thank you, skale deploy is working fine now. The deployed demo app runs and completes successfully on the remote infrastructure :)
    Marc Vertes
    cool, thanks :)
    @mvertes thanks! Success! I will check it for sure.
    Cosmetic remark: on my Mozilla Firefox 45.0 client, plain text on the web page is displayed in the same blue color as clickable links. Using a slightly different color for plain text would probably help.
    Marc Vertes
    Announce: skale-engine 0.6.2 has just landed. It adds dataset streaming to and from AWS S3, on the fly gzip/gunzip, bug fixes and performance improvements.
    Marc Vertes
    Thanks @philippe56 for your feedback, we will improve the page!
    Manfred Touron
    @mvertes @CedricArtigue, are you coming to the Paris Machine Learning meetup tonight?
    Marc Vertes
    Hi @moul, yes I will be there. @CedricArtigue will not. See you there I hope
    Alexandre Vallette
    hello, I'm new to skale and I'm confused. I'm trying to parallelize a computation that takes an iterator as input and applies a function imported from a library. My first problem is that the objectStream doesn't take an iterator as input (or at least not the iterator given by the library). I dodged the problem momentarily by converting it into an array. Second, when I try to apply a function inside the map, it says my function (levenshtein) is not defined.
    here is my code:
    var cmb = Combinatorics.bigCombination(Object.keys(adverts), 2);
    // while(a = console.log(a);
    sc.parallelize(cmb.toArray()).forEach(
        function(x){
            var distance = levenshtein.get(adverts[x[0]].texte, adverts[x[1]].texte);
            console.log(distance);
        },
        function(){ sc.end(); });
    any help would be appreciated
    @CedricArtigue hope your tour is going well
    Marc Vertes
    Hello @valletea, the function that you want to map is from an external dependency which must be made available first in the worker context. This is possible but not exposed by the API yet. What you can try is to manually serialize the dependency by including it inside the mapper function. I will come back shortly with a better solution.
    NB. this week is a travel week, I will be off most of the time.
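The workaround suggested above (putting the dependency inside the mapper function) can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `shipToWorker` is a hypothetical stand-in for however a function's source text reaches a worker, `editDistance` is a small hand-written replacement for the library's `levenshtein.get`, and the texts are passed in the data itself, on the assumption that captured variables such as `adverts` would be just as invisible to a worker as the library is.

```javascript
// Hypothetical stand-in for sending a function to a worker: only the source
// text travels, so anything captured from the surrounding scope is left behind.
function shipToWorker(fn) {
  return new Function('return (' + fn.toString() + ')')();
}

// Self-contained mapper: the edit-distance helper lives inside the function
// body, so it is serialized together with the mapper.
function pairDistance(pair) {
  function editDistance(a, b) {
    // Dynamic-programming Levenshtein distance, computed one row at a time.
    let prev = [];
    for (let j = 0; j <= b.length; j++) prev.push(j);
    for (let i = 1; i <= a.length; i++) {
      const cur = [i];
      for (let j = 1; j <= b.length; j++) {
        const cost = a[i - 1] === b[j - 1] ? 0 : 1;
        cur.push(Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost));
      }
      prev = cur;
    }
    return prev[b.length];
  }
  return editDistance(pair[0], pair[1]);
}

// The rebuilt function still works because it references nothing outside itself.
const remoteMapper = shipToWorker(pairDistance);
console.log(remoteMapper(['kitten', 'sitting'])); // prints 3
```

With a real cluster, the array passed to the engine would then hold pairs of texts rather than pairs of keys, so that each element carries everything the mapper needs.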
    Hi is skale-engine already used in production somewhere?
    Marc Vertes
    Hello @frankbaele, skale-engine has been used and tested for several months under contract in some major e-commerce and adtech companies, for data preparation upstream of machine learning and business analytics. Not production yet, but staging. For a couple of these companies, transition to production is already planned for the following weeks.
    ok thx
    have you guys played around with workers spun up in docker containers?
    Marc Vertes
    Yes, all of our clusters are deployed using docker, including swarms. We can provide sample config files if required
    A sample config would be awesome, no rush :)
    Marc Vertes
    if you haven't noticed yet, sample docker files are available in skale-engine, as per @frankbaele's request
    Hi everyone, I am a big fan of distributed systems and I like the idea of having a tool like Spark for js. Kudos to authors
    I'd like to know if anyone has run some processing on bigger data, like 50 TB or 100 TB
    If you are interested I can provide 10K azure credits to test that
    Marc Vertes
    @artakvg, we have used skale to process datasets of a few terabytes, split into compressed files of around 100 MB each (so ~2000 files in an S3 bucket; Azure storage is OK too). Total uncompressed data was around 10 TB, with a cluster of ~12 EC2 m4.2xlarge instances (8 CPUs, 32 GB RAM). Jobs were mostly built around aggregateByKey and coGroup variants, plus various map/reduce.
    @artakvg, thanks for your proposal, will followup by mail
    Aureliano Bergese
    hi there, I'm pretty new to skale. I'm working on a poc and I need to run cassandra query IN a forEach iteration, but I cannot have the cassandra-driver working INTO the worker. Anybody already had a similar problem?
    Marc Vertes
    Hello @auridevil, the worker needs to be modified for that. External dependencies which need "require" cannot be added in the worker yet. A cassandra connector needs to be developed. See the hackers guide as a starting point, I can assist in doing it too.
    Marc Vertes
    Specifically the section "adding a new source", which would be the type of cassandra request connector, if I understand your need correctly
    Aureliano Bergese
    Ok, I'll have a look right now and report to you here. thank you for now!
    Aureliano Bergese
    well, not really, I need cassandra data to decorate working data, so I just need, for each row of my huge data file, to make a query on cassandra and get more information about it (a classical 'select * from table where id=..'), in order to reduce the data to my needs.
    Cédric Artigue
    hi @auridevil, even if skale had a working connector for cassandraDB, running a query for each entry of your source datafile may be a bit too much. It may be more efficient to run a join on 2 key-value datasets where the key is the id and the value is the document from each source. After the join you can then apply a map to process the joined data and generate the final decorated documents. You could start prototyping this using a second file containing a cassandra dump. Ask me if you need help on this.
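The join-then-map approach described above can be prototyped on small in-memory arrays before involving a cluster. The sketch below is plain JavaScript, not the skale API: `joinByKey` is a hypothetical helper, and the sample rows and dump are made up for illustration.

```javascript
// Hypothetical sample data: rows from the large data file and a dump of the
// cassandra table, both as [id, document] pairs.
const rows = [[1, {amount: 10}], [2, {amount: 25}], [3, {amount: 7}]];
const dump = [[1, {name: 'alice'}], [2, {name: 'bob'}]];

// Inner join of two [key, value] arrays: index one side by key, then
// stream the other side against the index.
function joinByKey(left, right) {
  const index = new Map();
  for (const [key, value] of right) {
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(value);
  }
  const out = [];
  for (const [key, value] of left) {
    for (const match of index.get(key) || []) out.push([key, [value, match]]);
  }
  return out;
}

// Map step: merge each joined pair into one decorated document.
const decorated = joinByKey(rows, dump).map(
  ([id, [row, extra]]) => Object.assign({id}, row, extra)
);
console.log(decorated);
```

Rows without a match in the dump (id 3 here) are dropped, as in any inner join; keeping them would call for an outer join instead.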
    Marc Vertes

    I'm happy to announce the release of skale-1.2.0

    This is a major feature release. Install it with npm


    • Skale-engine is renamed to skale. Version is now 1.2.0, identical to 0.8.0.
    • Add a machine learning library with classification, regression, clustering
    • Allow dependencies to be deployed in workers with the new routine sc.require(). This will considerably ease the integration of various connectors to data sources, databases, etc.
    • Major improvements to documentation website


    • The test suite has been fully reworked, and now uses individual files that can be executed separately
    • Tests are considerably faster and easier to develop and debug
    • Both standalone and distributed engine are now systematically tested
    • save(): now supports output to CSV format
    • save(), textFile(): automatic forward of AWS env and credentials to workers
    • Workers: control garbage collect by command line option
    • Modernize javascript syntax
    • Continuous integration: add MacOSX target in addition to Linux and Windows


    • Fix a problem in sample()
    • Fix support of undefined keys in aggregateByKey()
    • Fix debug traces


    P.S. Please note also that the gitter room has been renamed to skale!

    jose hilario
    hi! I like skale, but I have been trying this
    var cursor = db.collection('clients').find();
    and objectStream doesn't work for me
    thanks :)
    Aureliano Bergese
    hey guys, happy to hear about the new skale version, back on my little project
    are there any known issues with console.log? I'm trying out the new engine and even the samples don't print anything at all
    Kristjan Siimson
    It's been almost a year since last message here, is this project still alive?
    Kristjan Siimson
    I was trying to use hashtable, as I am dealing with some huge data structures, but I am not sure if it can be added, or how to add it to dependencies. The index.jsx uses this: var HashTable = require('./build/Release/native.node').HashTable;
    @siimsoni Yes! skale is alive and kicking and getting very close to mainnet for more