    Chris White
    @cicdw
    Hi @khmurakami ! Check out our First Steps section on "Tasks" here: https://docs.prefect.io/guide/getting_started/first-steps.html#tasks
    We are also working on a recommended "best practices" for task design / implementation, but that might not be out for a bit
    Laszlo Sragner
    @xLaszlo_twitter
    hi,
    I am getting the following error when using the DaskExecutor (there is no error otherwise):
    [2019-06-25 11:38:01,945] INFO - prefect.TaskRunner | Unexpected error: PicklingError('Cannot pickle files that do not map to an actual file',)
    if I pickle/cloudpickle the individual tasks there is no error
    Chris White
    @cicdw
    Hi @xLaszlo_twitter - the outputs of your tasks are also pickled during execution; it looks like one of your tasks is returning something that is not pickleable
    Laszlo Sragner
    @xLaszlo_twitter
    thanks, how do you know this from the error?
    Chris White
    @cicdw
    no problem! I can't say for sure without a reproducible example, but you said your tasks themselves were pickleable and that log comes from the TaskRunner, so I'm just making an educated guess
    Laszlo Sragner
    @xLaszlo_twitter
    cool, thanks again, I am testing the outputs now
    Laszlo Sragner
    @xLaszlo_twitter
    yeah, one of the outputs had an open file handle
    pickle's error message was better than cloudpickle's
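    The failure mode above can be reproduced with plain pickle, independent of Prefect: open file-like objects are not serializable, so a task result that holds one will fail when the executor tries to pickle it. A minimal stdlib-only sketch (the `result` dict is hypothetical, standing in for a task's return value):

    ```python
    import io
    import pickle

    # A hypothetical task result that holds an open file-like object:
    result = {"name": "report", "stream": io.StringIO("data")}
    try:
        pickle.dumps(result)
    except TypeError as exc:  # Python 3 raises TypeError; Python 2 raised pickle.PicklingError
        print(f"not pickleable: {exc}")

    # Returning plain data instead of the open handle works fine:
    result = {"name": "report", "data": "data"}
    roundtrip = pickle.loads(pickle.dumps(result))
    assert roundtrip == result
    ```

    Reading the file's contents inside the task and returning the string (or closing the handle and returning a path) sidesteps the problem.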
    Josh
    @wilsojb
    Hi! I'm just getting started with Prefect and I've been working through some of the examples provided on github. I'm curious, is there a place in this chatroom for "getting started" help? I'm running into issues that the docs do not seem to support.
    Chris White
    @cicdw
    Hey @wilsojb ! The gitter room isn't organized into channels the way Slack or Discord are, so for now feel free to ask right here!
    Josh
    @wilsojb
    Sweet. Thanks @cicdw . I'm sure my issue is quite simple. I started with the write_tweets_to_airtable.py example, but stripped it down to the bare bones (almost hello world). Here it is.
    import prefect
    from prefect import Flow, Parameter, task
    
    
    @task
    def say_hello(name):
        print(f"Hello, {name}!")
    
    def main():
        with Flow("My First Flow") as flow:
            twitter_user = Parameter("twitter_user")
    
            # tweet_id = Parameter("tweet_id", required=False)
    
            say_hello(twitter_user)
    
        flow_context = dict()
        with prefect.context(flow_context):
            # run the flow
            state = flow.run(parameters=dict(
                twitter_user="Josh"
                # tweet_id="1046873748559814656"
            ))
    
    
    if __name__ == '__main__':
        main()
    Chris White
    @cicdw

    ok cool; I'm not 100% sure what the

        flow_context = dict()
        with prefect.context(flow_context):

    is trying to achieve, but otherwise this looks OK to me
    Josh
    @wilsojb
    Now this works, but if I uncomment both lines related to the tweet_id Parameter, I get the following error:
    Traceback (most recent call last):
      File "first-flow.py", line 30, in <module>
        main()
      File "first-flow.py", line 25, in main
        tweet_id="1046873748559814656"
      File "/Users/josh/Envs/prefect/lib/python3.6/site-packages/prefect/core/flow.py", line 976, in run
        fmt_params
    ValueError: Flow.run received the following unexpected parameters: tweet_id
    Chris White
    @cicdw
    Ah OK
    Josh
    @wilsojb
    It appears I need to USE that Parameter somewhere
    Chris White
    @cicdw
    Yes, I can see how this might be confusing -> yes, that's exactly right. Simply instantiating a task won't add it to the Flow
    Josh
    @wilsojb
    If I add it to say_hello, then this works as expected; no more complaints. But if I decide not to use it, then I get this
    Chris White
    @cicdw
    you have to either explicitly add it, or use it in another task
    so you could add it to your flow via:
    flow.add_task(tweet_id)
    and then your example should run, without updating say_hello
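    Putting the whole exchange together, a minimal sketch of the fix, assuming the Prefect 0.x API used throughout this thread (the task and parameter names come from Josh's example above):

    ```python
    from prefect import Flow, Parameter, task


    @task
    def say_hello(name):
        print(f"Hello, {name}!")


    with Flow("My First Flow") as flow:
        twitter_user = Parameter("twitter_user")
        tweet_id = Parameter("tweet_id", required=False)
        say_hello(twitter_user)

    # tweet_id is never passed to a task, so it must be registered
    # explicitly; otherwise Flow.run rejects a value for it.
    flow.add_task(tweet_id)

    state = flow.run(parameters=dict(
        twitter_user="Josh",
        tweet_id="1046873748559814656",
    ))
    ```

    Without the `flow.add_task(tweet_id)` line, `flow.run` raises the `ValueError: Flow.run received the following unexpected parameters: tweet_id` shown above.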
    Josh
    @wilsojb
    ahh sweet! that worked. thanks for your help. Anywhere in the docs I can jump to to learn more? This design pattern is a bit confusing to me.
    Chris White
    @cicdw
    anytime! Outside of the Getting Started docs and Concept docs (which it sounds like you've used), there isn't another good document explaining Task behavior like this. This has come up before though, so I've prioritized writing one that would cover this and other things
    I hope to have it out sometime next week
    Josh
    @wilsojb
    Awesome, sounds good. I see what my issue was: since I didn't add it to the flow, how would the flow know what tweet_id is when it's passed to one of its methods (i.e. flow.run)? Anyway, thanks for your help.
    Chris White
    @cicdw
    anytime! let me know if you come across any other issues
    Braunk
    @braunreyes
    hello everyone... it seems as though a lot of the documentation is geared toward running this workflow engine in the Cloud product once it's ready. Is there any reason not to just leverage the workflow syntax for simply running inside an ephemeral container?
    the task flow logic seems pretty easy to write and could easily be run from start to finish within a Docker container that runs on a schedule
    Chris White
    @cicdw
    Hi @braunreyes - you can definitely run your flow inside an ephemeral container; most people want the full power of the Cloud product because it includes a live-updating database of all task states, which then drives the web interface and also makes the system more robust generally (in case the process running the flow dies for any reason, for example)
    Running your Flow inside a container should work fine though, it just might not always be easy to debug when things go wrong. However, given the limited availability of Cloud at this moment, I think it's a reasonable way to go for the time being
    Braunk
    @braunreyes
    My use case is that I want to empower our data analysts with basic python skills to run basic workflows for running scripts, bash commands, and stuff
    I think it's fine if they just have logs for what failed
    Chris White
    @cicdw
    Yea that makes sense - and if you decide you need more than that, feel free to shoot me an e-mail (chris at prefect.io) and we can set up some time to talk about our product timeline, demo the UI, etc.
    Joseph Abrahamson
    @tel
    @cicdw been a while since I was here, lots of life things! to reply to your question: I believe the snapshots should live locally much of the time, but sometimes in the cloud. I think in the general case I'm a little less concerned about build times (which is how I think about the benefits of cloud storage) and more about consistency and reproducibility which can all be handled locally
    Chris White
    @cicdw

    hey everyone! In case you haven't seen, by popular request we are going to migrate our public discussions to Slack for better history / organization: https://t.co/UyS12hy3mO
    I'll leave the gitter open for a while to reroute people there
    @/all ^^
    jasonjho
    @jasonjho
    @cicdw Regarding state and metadata (such as a log of Task runs), you mentioned a while back that Prefect Cloud will have this setup by default. If we wanted to roll our own DB, how custom is this kind of setup and is there any Airflow-like equivalent to setting up your "metadata db"?
    Braunk
    @braunreyes
    @cicdw slack invite is not working
    Braunk
    @braunreyes
    No it does not
    jamesn
    @jamesn43543455_twitter
    @cicdw if my tasks are executed purely in sequence, how can I avoid organizing these tasks in a multi-threaded way? Prefect seems to assign each task to a separate thread by default, which is needlessly complicated for my application.
    Chris White
    @cicdw
    Hi @jamesn43543455_twitter - use a LocalExecutor (https://docs.prefect.io/api/unreleased/engine/executors.html#localexecutor)
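    A sketch of that suggestion, assuming the 0.x `flow.run(executor=...)` signature from the linked docs (the two tasks here are hypothetical placeholders):

    ```python
    from prefect import Flow, task
    from prefect.engine.executors import LocalExecutor


    @task
    def step_one():
        return 1


    @task
    def step_two(x):
        return x + 1


    with Flow("sequential") as flow:
        step_two(step_one())

    # LocalExecutor runs each task synchronously in the main process,
    # with no thread or process pool.
    state = flow.run(executor=LocalExecutor())
    ```

    Since `step_two` depends on `step_one`, the tasks run strictly one after another in a single thread.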
    also FYI we don't monitor this channel much anymore - for faster responses I'd recommend joining our Slack channel: https://join.slack.com/t/prefect-public/shared_invite/enQtNzE5OTU3OTQwNzc1LTQ5M2FkZmQzZjI0ODg1ZTBmOTc0ZjVjYWFjMWExZDAyYzBmYjVmMTE1NTQ1Y2IxZTllOTc4MmI3NzYxMDlhYWU
    Mike McCarty
    @mmccarty
    Hi all! Is there a way to run a Flow asynchronously?
    Chris White
    @cicdw
    Hi @mmccarty ! Sorry for the delayed response, we don't check the gitter channel as frequently anymore (we migrated to slack). When you say "run a Flow asynchronously", I assume you are referring to the individual tasks running asynchronously -- for this we recommend using the DaskExecutor which supports asynchronous execution (some relevant docs here: https://docs.prefect.io/guide/tutorials/dask-cluster.html), but let me know if I misunderstood your question!
    Mike McCarty
    @mmccarty
    Ah, I see the message above. I’ll join the slack channel