S3 datastore operation _get_s3_object failed (An error occurred (400) when calling the HeadObject operation: Bad Request). Retrying 7 more times..
Hi all, I have a question regarding inheritance of flows.
We have an algorithm that has multiple versions (v1.0, v1.1, v1.2, etc.). Mostly these versions are similar to the primary version, so we only want to override some steps. When I checked whether Metaflow supports this, I found this ticket: Netflix/metaflow#245.
It looks like we have two options to implement this;
Which one do you think is the best choice in the Metaflow ecosystem? Or do you have different ideas about this?
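For concreteness, a hypothetical sketch of the kind of override we have in mind (whether plain subclassing like this actually works is exactly what the ticket is about, so treat the class names and steps as made up):

from metaflow import FlowSpec, step

class AlgorithmFlowV10(FlowSpec):

    @step
    def start(self):
        self.next(self.train)

    @step
    def train(self):
        # v1.0 training logic lives here
        self.next(self.end)

    @step
    def end(self):
        pass

class AlgorithmFlowV11(AlgorithmFlowV10):

    @step
    def train(self):
        # v1.1 overrides only this step; everything else is inherited
        self.next(self.end)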
Internally we export our flows to Meson (Netflix's workflow orchestrator), and we are shortly going to release a similar integration with AWS Step Functions - Netflix/metaflow#2
Hi @savingoyal, when can we hope for this feature? I have seen the Google doc for it and we are excited and keen to try it out. Also, is there a nightly build we can access?
Hi all, I am having some trouble with FlowSpec.merge_artifacts. I have a scatter-join sequence of steps that looks like this:
@step
def set_up(self):
    self.foreach_tuple = TUPLE_OF_VALUES_TO_PROCESS
    self.next(self.process, foreach='foreach_tuple')

@step
def process(self):
    self.value = self.input
    # (process self.value, not touching self.foreach_tuple)
    self.next(self.join_step)

@step
def join_step(self, inputs):
    self.merge_artifacts(inputs)
Running my code results in a MergeArtifactsException due to input.foreach_tuple having a different value in each of the inputs. This is despite the fact that I haven't touched foreach_tuple since its assignment in set_up.
The odd thing is that when I look inside FlowSpec.merge_artifacts, the values of input._datastore['foreach_tuple'] are the same for each of the inputs, as expected. However, the SHAs (as accessed via input._datastore.items()) are all different. Any idea what may be causing the SHAs to all differ (triggering the MergeArtifactsException) while the values are all the same?
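(For what it's worth, I can sidestep the exception with the exclude argument, assuming I'm reading its purpose correctly:

self.merge_artifacts(inputs, exclude=['foreach_tuple'])

but I'd still like to understand why the SHAs differ.)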
Thanks in advance.
Hi, we're running into an issue where the AWS Batch job will succeed but the flow won't proceed to subsequent steps. It looks like it's hanging, waiting for the Batch job to complete even though it already has. Any ideas as to what might be going on here? We're just running the 00-helloworld flow with a @batch decorator on the hello step. The logs are below:
2020-07-21 11:36:24.118 Workflow starting (run-id 23):
2020-07-21 11:36:24.182 [23/start/65 (pid 19754)] Task is starting.
2020-07-21 11:36:25.281 [23/start/65 (pid 19754)] HelloFlow is starting.
2020-07-21 11:36:25.609 [23/start/65 (pid 19754)] Task finished successfully.
2020-07-21 11:36:25.847 [23/hello/66 (pid 19811)] Task is starting.
2020-07-21 11:36:26.597 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status SUBMITTED)...
2020-07-21 11:36:29.747 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNABLE)...
2020-07-21 11:36:59.777 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNABLE)...
2020-07-21 11:37:29.831 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNABLE)...
2020-07-21 11:38:00.027 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNABLE)...
2020-07-21 11:38:04.489 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status STARTING)...
2020-07-21 11:38:30.738 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNING)...
2020-07-21 11:53:04.709 1 tasks are running: e.g. ....
2020-07-21 11:53:04.709 0 tasks are waiting in the queue.
2020-07-21 11:53:04.709 0 steps are pending: e.g. ....
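For reference, the flow is just the tutorial's HelloFlow with the decorator added, roughly like this (the resource values are illustrative):

from metaflow import FlowSpec, batch, step

class HelloFlow(FlowSpec):

    @step
    def start(self):
        print("HelloFlow is starting.")
        self.next(self.hello)

    @batch(cpu=1, memory=500)  # made-up resource requests
    @step
    def hello(self):
        print("Metaflow says: Hi!")
        self.next(self.end)

    @step
    def end(self):
        print("HelloFlow is all done.")

if __name__ == '__main__':
    HelloFlow()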
Any help would be appreciated. Thanks.
Hi, I just had a quick question about debugging a Metaflow run. It seems like there's some sort of buffering of stdout taking place. If I open a console in debug mode and execute print('foo\nbar'), the system will only print foo to the console. I'll have to print something else to see bar.
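To illustrate what I mean (in the debug console):

print('foo\nbar')  # only 'foo' appears
print('x')         # now 'bar' (followed by 'x') shows up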
Any idea how I might find a workaround for this issue? It gets triggered if you try to print the contents of a pandas DataFrame, which can be a real pain when you're debugging.
Hi, I found METAFLOW_CLIENT_CACHE_PATH in the codebase, but providing that as an env var doesn't seem to be doing anything. Any suggestions? Thanks.
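(For reference, I'm setting it like this before using the client, with a made-up path, in case I'm holding it wrong: export METAFLOW_CLIENT_CACHE_PATH=/tmp/metaflow_client_cache)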
Also, we published a new Administrator's Guide to Metaflow, which should be of interest to many people here: https://admin-docs.metaflow.org/
The admin guide was inspired by your questions and comments here, so a huge thanks to everyone! I hope you will find it useful. Please give feedback, especially if you notice something missing or misrepresented.
Hi, I have a question about running a project using Metaflow on AWS. How does Metaflow on Batch handle imports of other modules that are required by the Python script being run using --with batch?
If, for instance, in the tutorial example python 02-statistics/stats.py --with batch run --max-workers 4, stats.py had a module import such as import moduleX, and the functions in moduleX were used by stats.py, how would Batch handle the dependency on moduleX? That module would typically be on my local filesystem, and --with batch only runs stats.py on Batch.
In my project I have a library of modules and helper functions that are called within the script that runs as a Metaflow DAG. I'm unclear how I can make these other modules available when running on Batch, given they are not in the file that defines the Metaflow DAG tasks.
Hi, I'm struggling to work out how to use images that are not on Docker Hub; e.g. I'd like to use images from AWS ECR.
metaflow configure aws hints that if METAFLOW_BATCH_CONTAINER_REGISTRY is not set then "https://hub.docker.com/" is used; however, if I explicitly set it to that value then I get an AWS Batch error:
CannotPullContainerError: invalid reference format This could be a transient error. Use @retry to retry.
edit: everything works normally if I don't configure METAFLOW_BATCH_CONTAINER_REGISTRY at all.
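For reference, what I'd like to end up with is something like this (the account ID, region, and image name are made up):

METAFLOW_BATCH_CONTAINER_REGISTRY=123456789012.dkr.ecr.us-east-1.amazonaws.com
METAFLOW_BATCH_CONTAINER_IMAGE=my-repo/my-image:latest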
Hi, has anyone gotten the conda dependency management to work with PyCharm and been able to debug?
I am running it with a Miniconda interpreter and it seems to start fine, but it never actually starts executing anything in the flow. However, I don't really get any error either. If I run it from my command line it works just fine. Running this on OSX.
runfile('/Users/maxpagel/Developer/infrastructure/TimescaleDB/Scripts/metaFlowPlayground.py', args=['--environment=conda', 'run'], wdir='/Users/maxpagel/Developer/infrastructure/TimescaleDB/Scripts')
Metaflow 2.1.1 executing BranchFlow for user:maxpagel
Validating your flow...
The graph looks good!
Process finished with exit code 1
Hey guys, I'm on a fresh Metaflow installation and can successfully run flows locally. Attempting to run the tutorial 05-helloaws, I am met with:
An error occurred (ClientException) when calling the SubmitJob operation: JobQueue [arn] not found
Any advice on how this JobQueue could have failed to be stood up when I used the CloudFormation template provided?
I can even see the CREATE_COMPLETE status in the Resources tab of the CloudFormation > Stacks page in the AWS console.
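(In case it helps, I assume I can cross-check what Batch actually sees with the AWS CLI, e.g. aws batch describe-job-queues, though I'm not sure what I'd be looking for beyond the queue name and its ARN.)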
Hi all, I have a question regarding Metaflow Parameters.
I want to add a parameter like this:
dry_run = Parameter('dry_run', default=False)
When I add this parameter, I need to pass it as --dry_run=False, but I want to use it as --dry-run=False.
So when I changed the parameter name to dry-run, I realized that this doesn't work (dry_run = Parameter('dry-run', default=False)).
Do you think this is a bug? Or is dry-run a reserved parameter name?
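A minimal repro of what I mean (ParamFlow is just a made-up name):

from metaflow import FlowSpec, Parameter, step

class ParamFlow(FlowSpec):
    # works, but forces the underscore spelling --dry_run on the CLI:
    dry_run = Parameter('dry_run', default=False)
    # what I wanted instead; this is the variant that fails for me:
    # dry_run = Parameter('dry-run', default=False)

    @step
    def start(self):
        print('dry_run =', self.dry_run)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    ParamFlow()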
join = TRUE arguments, but Metaflow complains about an incorrect number of steps. I feel like I'm missing something conceptual about how the branching works here. Can anyone help me? I'll put my attempt in a reply because it's quite lengthy.