Rufus Pollock
@rufuspollock
@Branko-Dj @adyork could we get a graph onto https://datahub.io/cryptocurrency/bitcoin asap?
Branko
@Branko-Dj
@rufuspollock I will create a graph for it
Rufus Pollock
@rufuspollock

@rufuspollock I mean a database that keeps references to the actual files. Some fields from the header that might be relevant to data aggregation in analyses etc. can also be saved for easy querying, but once you want the data itself (pixel data or a header field that's not included in the database schema), you have to read it from the DICOM file.

Sorry for the slow reply @ZviBaratz!

To answer your question: I think you could a) pull out the metadata and save it into datapackage.json; b) if I understand correctly that you want to do specific post-processing on the data, e.g. to generate all of the info for a particular scan, I'd do that with a separate workflow after storing the basic packages.
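For what it's worth, a minimal sketch of option (a), assuming the pydicom library is used to read the headers; the header fields, file path, and package name below are illustrative placeholders, not anything agreed in this thread:

import json
import pydicom  # assumption: pydicom reads the DICOM headers

# hypothetical set of header fields worth keeping for easy querying
HEADER_FIELDS = ['PatientID', 'StudyDate', 'Modality', 'SeriesDescription']

def build_descriptor(dicom_paths):
    resources = []
    for path in dicom_paths:
        # stop_before_pixels skips pixel data, since only headers are needed here
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        metadata = {field: str(getattr(ds, field, '')) for field in HEADER_FIELDS}
        resources.append({'path': path, 'format': 'dicom', 'dicom': metadata})
    return {'name': 'dicom-scans', 'resources': resources}

with open('datapackage.json', 'w') as f:
    json.dump(build_descriptor(['scans/scan_0001.dcm']), f, indent=2)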

JavaScriptFamily
@JavaScriptFamily
Hi all,
Can you please help me? I have created a Frictionless data package using the Data Package library for PHP.
I want to validate it.
Is there an online platform to validate my package?
Thanks
Anuar Ustayev
@anuveyatsu
@JavaScriptFamily Hi :wave: please, have a look at this page - https://datahub.io/tools/validate
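A package created with any of the Data Package libraries (PHP included) can also be checked locally from its datapackage.json; a minimal sketch assuming the datapackage-py library, with the path as a placeholder:

from datapackage import Package

package = Package('datapackage.json')
if package.valid:
    print('datapackage.json is valid')
else:
    for error in package.errors:
        print(error)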
Branko
@Branko-Dj
@rufuspollock the graph for bitcoin is now available
Raja Sahe
@rajasahe
I need an Indian cities house prices dataset. How can I get that? Any ideas?
Rufus Pollock
@rufuspollock
@rajasahe please email us at support@datahub.io
Stephen Abbott Pugh
@StephenAbbott
@rufuspollock Sincere apologies. I failed to follow up and flag the issue I raised with you here on September 12th. I have now raised an issue on datahub-qa as you suggested datahq/datahub-qa#245
Amber York
@adyork

@akariv, I have been looking through the dataflows tutorials that use custom functions and nested flows, trying to figure out how I can use custom functions for one resource when there are many.

For example, I have a row processor:

def mycustomfcn(row):

What I want to do is this in a flow specifying one resource:
mycustomfcn(resources='mclane_log') and be able to specify whether it is a package, row, or rows processor somehow.

Any tips?

I can also do stuff directly in the flow like this but again, I can't figure out how to specify one resource if there are many.

lambda row: dict(row, val=row['val']/5),

Irakli Mchedlishvili
@zelima
@adyork how about
def mycustomfcn(package):
    yield package.pkg
    resources = iter(package)

    for resource in resources:
        if resource.res.name == 'my-resource-name':
            # do stuff here, e.g. keep only rows whose 'x' is in 1..5
            yield filter(lambda row: row['x'] in [1, 2, 3, 4, 5], resource)
        else:
            # pass the other resources through unchanged
            yield resource
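For reference, a fuller runnable sketch of the same pattern, assuming the dataflows library; the file names, the 'mclane_log' resource name, and the scaling step below are placeholders echoing the question, not a tested recipe:

from dataflows import Flow, load, printer

def on_resource(resource_name, row_fn):
    # package processor: apply row_fn to one named resource only,
    # streaming every other resource through unchanged
    def processor(package):
        yield package.pkg                    # pass the datapackage descriptor through
        for resource in package:
            if resource.res.name == resource_name:
                yield map(row_fn, resource)  # transform rows of the target resource
            else:
                yield resource
    return processor

def scale_val(row):
    row['val'] = row['val'] / 5
    return row

Flow(
    load('mclane_log.csv'),      # resource named 'mclane_log'
    load('other_data.csv'),
    on_resource('mclane_log', scale_val),
    printer(),
).process()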
Amber York
@adyork
Thanks @zelima! I will try that.
Amber York
@adyork

Found a bit simpler way to get one resource:

resource = package.pkg.get_resource('seagrass')

Irakli Mchedlishvili
@zelima
:+1:
Rakesh Kumar Devalapally
@devalapa_gitlab
Hi, I am working on the temperatures dataset and I see entries like these:
France
France(Europe)
Can anyone explain the difference between these two?
Irakli Mchedlishvili
@zelima
@devalapa_gitlab can you please paste the link to the dataset?
Rakesh Kumar Devalapally
@devalapa_gitlab
Irakli Mchedlishvili
@zelima
@devalapa_gitlab the data is coming from https://data.giss.nasa.gov/gistemp/ - I believe you will find the answer there
Rakesh Kumar Devalapally
@devalapa_gitlab
thank you
@zelima I will check it out
Irakli Mchedlishvili
@zelima
:+1:
Shrif Rai
@joyryder
hello brothers
fabirubiru
@fabirubiru
Hi everyone
I'm new to this and I want to learn about Datahub. Could someone help me or share any documentation about it?
Anuar Ustayev
@anuveyatsu
@joyryder Hi there!
@fabirubiru Hi! Sure, you can start here - http://datahub.io/docs
David Cottrell
@david-cottrell_gitlab
Is there a way to delete a datapackage? I ended up pushing a package called "datapackage", renamed it and repushed so now I have two. Have searched a lot but do not yet see how to delete.
Stephen Abbott Pugh
@StephenAbbott
Hi. I've been trying to install version 0.4.5 of the Data publishing app for MacOS but keep getting an error message. Is there a different version I should try? My laptop OS is MacOS High Sierra (version 10.13.6)
Rufus Pollock
@rufuspollock

@david-cottrell_gitlab

Is there a way to delete a datapackage? I ended up pushing a package called "datapackage", renamed it and repushed so now I have two. Have searched a lot but do not yet see how to delete.

You can make it unpublished atm so no-one can see it - we are working on a purge-type command, but for now unpublishing is the way to go ...

Hi. I've been trying to install version 0.4.5 of the Data publishing app for MacOS but keep getting an error message. Is there a different version I should try? My laptop OS is MacOS High Sierra (version 10.13.6)

Can you give a bit more detail on the error message - and we can check that build :smile:

Stephen Abbott Pugh
@StephenAbbott

Can you give a bit more detail on the error message - and we can check that build :smile:

I've downloaded version 0.4.5 . When I open the application, it says 'Please wait, we are installing the CLI tool on this machine'. The install reaches 100% and then I get asked to update permissions on the downloaded CLI. I grant these permissions and then see an error message which just says 'Something went wrong while CLI tool update. We will try again automatically in 1 minute'. I've tried installing this version a few times now.

Rufus Pollock
@rufuspollock
@StephenAbbott ok - can you open an issue in github and we’ll look. In the meantime do you want to try installing the cli tool directly?
Stephen Abbott Pugh
@StephenAbbott
@rufuspollock I've opened an issue now: datahq/datahub-qa#246. I'll try installing the CLI tool directly
Stephen Abbott Pugh
@StephenAbbott
Would anyone from the Datahub team be available tomorrow (Tuesday 18th December) for a conversation to help me resolve an issue I'm having with installing data as a command line tool? I'm hoping to use datahub.io to publish some data relating to an academic paper which is due for publication on Wednesday 19th December or shortly afterwards. Thanks
Rufus Pollock
@rufuspollock
@anuveyatsu could you connect with @StephenAbbott tomorrow (tuesday)?
Anuar Ustayev
@anuveyatsu
Hi @StephenAbbott I’m around today so let me know when you’re online :smile:
Stephen Abbott Pugh
@StephenAbbott
@anuveyatsu Thanks! Will DM you now
Stephen Abbott Pugh
@StephenAbbott
Thanks to @anuveyatsu, we've resolved the issue :thumbsup:
Anuar Ustayev
@anuveyatsu
@StephenAbbott :+1: and I’ve just updated the installation docs mentioning your issue :smile:
Zane Selvans
@zaneselvans
Does anyone here have favorite references for how one goes about testing data processing pipelines? We're using pytest now to run the entire ETL process and then generate a bunch of outputs... but it seems like a kludge: it takes a long time to run on the entire dataset, and it mixes together testing the code and testing the data, which seem like tasks that should be isolated from each other to the extent possible.
Rufus Pollock
@rufuspollock
@zaneselvans great question. If you are using dataflows you can test each step using pytest etc.
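To make that concrete, a minimal sketch of testing a single step in isolation with pytest, assuming the dataflows library; the step and the fixture rows are made up for illustration:

from dataflows import Flow

def upper_name(row):
    # the step under test: a plain dataflows row processor
    row['name'] = row['name'].upper()

def test_upper_name_step():
    # run a tiny in-memory resource through just this one step
    rows = [{'name': 'alice'}, {'name': 'bob'}]
    results, _, _ = Flow(rows, upper_name).results()
    assert results[0] == [{'name': 'ALICE'}, {'name': 'BOB'}]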
David Cottrell
@david-cottrell_gitlab
@zaneselvans are you testing the pipeline framework or pipeline instances (the data)?
I would say do not use anything from the usual testing world