Hey community,
I am having issues with importing local modules.
I have no idea why this behaviour appears to be unique to Metaflow.
I have tried the package list command to make sure everything I expect is being packaged up.
My project structure is as follows:
.
├── README.md
├── python
│   ├── db
│   │   ├── __init__.py
│   │   ├── database_adapter.py
│   │   └── database_connector.py
│   ├── sectional_model_flow.py
│   └── __init__.py
├── requirements.txt
└── tests
    └── unit
        ├── aws
        │   ├── test_ssm.py
        │   └── test_sts.py
        └── db
            ├── test_database_adapter.py
            └── test_database_connector.py
I structured it this way because Metaflow doesn't like me importing, say from sectional_model_flow.py, with from python.db.database_connector import DatabaseConnector;
it gives me a ModuleNotFoundError.
So I tried following the instructions here (Netflix/metaflow#175), and that allowed the flow to run, but now my tests can't run because the import above has to change to from db.database_connector import DatabaseConnector.
What am I missing, and why does this behave so differently from a normal Python project?
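One workaround sketch, assuming Metaflow packages up the directory containing sectional_model_flow.py so the flow sees db as a top-level package: keep the flow's imports as from db.database_connector import DatabaseConnector, and make the test runner resolve imports from the same root. The tests/conftest.py below is a hypothetical pytest hook, not something Metaflow provides:

# tests/conftest.py (hypothetical)
import sys
from pathlib import Path

# Put <repo>/python on sys.path so tests resolve `db.*` exactly like the
# flow script does when Metaflow packages the flow's own directory.
REPO_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO_ROOT / "python"))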
Hi, I have a question about Metaflow RDS DB migration.
I set up the Metaflow CloudFormation stack in a number of AWS accounts some time ago.
Now I want to update the CF template and enable encryption for the RDS DB.
Since this change will replace the DB, I want to safely migrate the data to new DB instances.
Here's my migration plan: 1) deploy new Metaflow stack, 2) migrate data from old DB to new DB with pg_dump and pg_restore, 3) recreate .metaflowconfig files, and 4) delete old stack.
How can I be sure that my older deployments of Metaflow will have the same tables/schema as newer deployments?
Looking at the docs here, I'm not sure how to proceed.
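The sanity check I'd add before step 4 is to diff the table lists of the old and new databases once the new metadata service has applied its migrations (a sketch, assuming both RDS instances are reachable with psycopg2 from wherever this runs; the DSNs are placeholders):

import psycopg2

# Placeholder connection strings for the old and new Metaflow RDS instances.
OLD_DSN = "postgresql://user:pass@old-db-host:5432/metaflow"
NEW_DSN = "postgresql://user:pass@new-db-host:5432/metaflow"

def public_tables(dsn):
    # Return the set of table names in the public schema.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT table_name FROM information_schema.tables "
            "WHERE table_schema = 'public'"
        )
        return {row[0] for row in cur.fetchall()}

old_tables, new_tables = public_tables(OLD_DSN), public_tables(NEW_DSN)
print("only in old DB:", old_tables - new_tables)
print("only in new DB:", new_tables - old_tables)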
Hey everyone. Metaflow is awesome. I have a question, sorry if it's a recurring one. I'd like to open a database connection during the initialization phase of my class that inherits from FlowSpec. I tried something like:
class MyFlow(FlowSpec):
    def __init__(self):
        self.qdb = querydb.QueryDB()
        FlowSpec.__init__(self)
    ...
But when I run the flow I get an internal error. Any hints?
Hm, when I wrote it as:
def __init__(self):
    self.qdb = querydb.QueryDB()
    super(MyFlow, self).__init__()
I get the more informative error:
2021-10-31 21:19:58.767 [161/start/2407 (pid 15355)] blob = pickle.dumps(obj, protocol=2)
2021-10-31 21:19:58.767 [161/start/2407 (pid 15355)] TypeError: cannot pickle 'psycopg2.extensions.connection' object
2021-10-31 21:19:58.767 [161/start/2407 (pid 15355)]
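As far as I can tell, the error is Metaflow pickling everything assigned to self as an artifact when the step finishes, and a live psycopg2 connection can't be pickled. A sketch of the pattern that avoids it, opening the connection inside a step and storing only picklable results on self (querydb is the module from my snippet above; the query method is hypothetical):

from metaflow import FlowSpec, step

import querydb  # my own module wrapping the psycopg2 connection

class MyFlow(FlowSpec):

    @step
    def start(self):
        # Open the connection inside the step and keep it in a local variable,
        # so Metaflow never tries to pickle the connection itself as an artifact.
        qdb = querydb.QueryDB()
        self.rows = qdb.query("SELECT 1")  # hypothetical method; persist only picklable results
        self.next(self.end)

    @step
    def end(self):
        print(self.rows)

if __name__ == "__main__":
    MyFlow()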
Hi everybody, I'm new to Metaflow and running into a few efficiency issues: I'm currently experimenting with Metaflow + Batch to compute (PyTorch model) embeddings for datasets in the 10-100MM range, where each such dataset is an S3 bucket full of JPGs (this input data format is a given, as it's driven by customers).
Q: What is best practice here to efficiently implement the data loading logic? The datasets are too big to store locally, and I thought one of the tricks here is to stream directly from S3? But won't any of Metaflow's S3.get functions make a local copy first?
Q: More precisely: currently I have written a PyTorch Dataset that makes a single S3.get call for each S3 URI (in the __getitem__ function), but this is slow (which makes sense). Any leads on where to go next?
(I would be happy to see a snippet about doing large scale inference (not training) using Metaflow + PyTorch)
from metaflow import S3
with S3(s3root='s3://my-bucket/savin/tmp/s3demo/') as s3:
    s3.get_many(['fruit', 'animal'])
to fetch objects from S3. I'm wondering if there are any examples of how to do this effectively in a PyTorch DataLoader with multiple workers? There's a lot of multiprocessing baked into both libraries, and I'm curious if there's a tutorial / best practice for how to do this with a PyTorch DataLoader? Thanks!
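Not an official recipe, just a sketch of the direction I'd try: an IterableDataset where each DataLoader worker opens its own metaflow.S3 context inside __iter__ (which runs in the worker process) and pulls its shard of keys with get_many in chunks, so downloads happen in parallel per chunk instead of one S3.get per item. The s3root, keys, chunk size, and resize are all placeholders:

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

from metaflow import S3

class S3ImageDataset(IterableDataset):
    def __init__(self, s3root, keys, chunk_size=256):
        self.s3root = s3root          # e.g. 's3://my-bucket/images/' (placeholder)
        self.keys = keys              # object keys relative to s3root
        self.chunk_size = chunk_size  # objects fetched per get_many call

    def __iter__(self):
        # Shard the keys so each DataLoader worker downloads a disjoint subset.
        info = get_worker_info()
        keys = self.keys if info is None else self.keys[info.id::info.num_workers]
        # One S3 context per worker process; get_many downloads each chunk in
        # parallel to local temp files that are cleaned up when the context exits.
        with S3(s3root=self.s3root) as s3:
            for i in range(0, len(keys), self.chunk_size):
                for obj in s3.get_many(keys[i:i + self.chunk_size]):
                    img = Image.open(obj.path).convert("RGB").resize((224, 224))
                    yield torch.from_numpy(np.asarray(img)).permute(2, 0, 1)

keys = ["img_00001.jpg", "img_00002.jpg"]  # placeholder: however you enumerate the dataset
loader = DataLoader(S3ImageDataset("s3://my-bucket/images/", keys), batch_size=64, num_workers=4)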
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='metaf-albui-16iw6xoay172l-1453067463.us-east-1.elb.amazonaws.com', port=80): Max retries exceeded with url: /flows/ContentAgnosticModelBuilder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa74192e8d0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
PermissionError: [Errno 13] Permission denied: '/app'
This is the error I get when trying to log an artifact using my Dockerized MLflow server. My Dockerfile is:
FROM python:3-slim
WORKDIR /mlflow/
RUN pip install --no-cache-dir mlflow==1.23.1
EXPOSE 5000
ENV BACKEND_URI sqlite:////app/mlflow/mlflow.db
ENV ARTIFACT_ROOT /app/mlflow/artifacts
CMD mlflow server \
    --backend-store-uri ${BACKEND_URI} \
    --default-artifact-root ${ARTIFACT_ROOT} \
    --host 0.0.0.0 \
    --port 5000
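For context, a sketch of the client call that hits this (assuming the artifact is logged from outside the container): with a plain filesystem --default-artifact-root, the client receives /app/mlflow/artifacts/... as the run's artifact URI and tries to write to that local path itself, which is one way to end up with Permission denied: '/app'. The tracking URI and file name are placeholders:

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # placeholder tracking server URL

with mlflow.start_run():
    # The run's artifact URI resolves to /app/mlflow/artifacts/<exp>/<run>/artifacts,
    # so log_artifact() writes to that path on the client machine; if /app doesn't
    # exist or isn't writable there, this raises PermissionError: [Errno 13].
    mlflow.log_artifact("model.pkl")  # placeholder artifact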
Hi, does Metaflow's S3 API work with session tokens?
MetaflowS3Exception Traceback (most recent call last)
<ipython-input-318-a0128f9c1437> in <module>
5 # url = s3.put('example_object', message)
6 # print("Message saved at", url)
----> 7 s3.get('test.csv')
~/Github/venvs/lib/python3.9/site-packages/metaflow/datatools/s3.py in get(self, key, return_missing, return_info)
612 addl_info = None
613 try:
--> 614 path, addl_info = self._one_boto_op(_download, url)
615 except MetaflowS3NotFound:
616 if return_missing:
~/Github/venvs/lib/python3.9/site-packages/metaflow/datatools/s3.py in _one_boto_op(self, op, url, create_tmp_file)
930 # add some jitter to make sure retries are not synchronized
931 time.sleep(2 ** i + random.randint(0, 10))
--> 932 raise MetaflowS3Exception(
933 "S3 operation failed.\n" "Key requested: %s\n" "Error: %s" % (url, error)
934 )
MetaflowS3Exception: S3 operation failed.
Key requested: s3://demo-nonprod-sagemaker/joshzastrow/sandbox/test.csv
Error: An error occurred (InvalidToken) when calling the GetObject operation: The provided token is malformed or otherwise invalid.
I have aws_access_key, aws_secret_access_key and aws_session_token set up as my default and user profile in ~/.aws/credentials. I ran metaflow configure aws and only configured AWS S3 as the storage backend. I've tried setting the profile both for AWS and in Metaflow (export METAFLOW_PROFILE=my_profile).
I have been able to write to S3 using an s3fs object:
s3_file_system = s3fs.S3FileSystem(
    anon=False,
    s3_additional_kwargs={'ServerSideEncryption': 'AES256'},
    profile=PROFILE
)
I'm wondering if there is an additional encryption argument that needs to be set for Metaflow to work with its S3 API?
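One sanity check I'd run (a sketch, assuming Metaflow's S3 client goes through boto3's default credential chain rather than the s3fs profile argument): fetch the same key with plain boto3 under the default chain and under the named profile. If the default chain also fails with InvalidToken, the issue is the credentials being picked up (e.g. stale AWS_* environment variables shadowing ~/.aws/credentials) rather than anything Metaflow-specific. The profile name is a placeholder:

import boto3

# Same key as in the traceback above.
BUCKET, KEY = "demo-nonprod-sagemaker", "joshzastrow/sandbox/test.csv"

# First with the default chain (what Metaflow would see), then with the named profile.
for session in (boto3.Session(), boto3.Session(profile_name="my_profile")):
    creds = session.get_credentials()
    print("session token present:", bool(creds.token))
    session.client("s3").get_object(Bucket=BUCKET, Key=KEY)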
Hey,
I deployed Metaflow with the provided CloudFormation stack with default values and everything works fine. But I received a security concern from the AWS Support security advisor saying that I should disable the public IP of the task definition, because it is accessible from the Internet. Why do we need a public IP? Can we disable it?
I appreciate your explanation. I will forward it to my manager :)
Best regards and thanks