Hi, I am searching for a way to build a tf.data.Dataset or tfio.IODataset from Apache Parquet files residing on S3. However, I cannot access the data from S3. For example,
tf.data.Dataset.list_files(s3uri + "/*", shuffle=True)
gives the error InvalidArgumentError: Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true. Summarized data: b'No files matched pattern:...
On SageMaker Studio this worked out of the box, but I assume they mount S3? Is there a good way to achieve this?
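For context, here is a minimal sketch of what I am trying to do. The bucket and prefix below are placeholders, and I am assuming tfio.IODataset.from_parquet is the right entry point for reading each file:

```python
import tensorflow as tf
import tensorflow_io as tfio  # importing tensorflow_io registers the s3:// filesystem scheme

# Placeholder bucket/prefix, not my real data.
s3uri = "s3://my-bucket/parquet-data"

# This is the call that currently fails with "No files matched pattern".
files = tf.data.Dataset.list_files(s3uri + "/*", shuffle=True)

# If listing worked, the plan is to build one IODataset per Parquet file
# and concatenate them into a single dataset.
dataset = None
for path in files.as_numpy_iterator():
    per_file = tfio.IODataset.from_parquet(path.decode("utf-8"))
    dataset = per_file if dataset is None else dataset.concatenate(per_file)

for row in dataset.take(1):
    print(row)
```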
You should be able to access objects directly via an s3://bucket/object URI without mounting. If it is not working, it could be related to configuration, as the S3 filesystem needs information about the AWS region and permissions, supplied either through a config file or environment variables (see https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
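A minimal sketch of supplying that configuration through environment variables; the region, keys, and bucket below are placeholders, and credentials can equally come from ~/.aws/credentials:

```python
import os

# The S3 filesystem picks up the standard AWS environment variables;
# set them before any S3 access (values below are placeholders).
os.environ["AWS_REGION"] = "eu-west-1"
os.environ["AWS_ACCESS_KEY_ID"] = "..."        # or rely on ~/.aws/credentials
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
# For a non-default endpoint you may also need:
# os.environ["S3_ENDPOINT"] = "s3.eu-west-1.amazonaws.com"

import tensorflow as tf
import tensorflow_io as tfio  # noqa: F401  # registers the s3:// scheme

# Quick check that listing works before building the dataset.
print(tf.io.gfile.glob("s3://my-bucket/parquet-data/*"))
```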
@diggerk thanks for the suggestion. I got the stack trace below, but it does not seem to be very informative.
Current thread 0x00007fe3a832d740 (most recent call first):
File "/home/victor.xie/.cache/pypoetry/virtualenvs/image-inference-pipeline-2_RwuX6D-py3.8/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59 in quick_execute
File "/home/victor.xie/.cache/pypoetry/virtualenvs/image-inference-pipeline-2_RwuX6D-py3.8/lib/python3.8/site-packages/tensorflow/python/ops/gen_io_ops.py", line 596 in read_file_eager_fallback
File "/home/victor.xie/.cache/pypoetry/virtualenvs/image-inference-pipeline-2_RwuX6D-py3.8/lib/python3.8/site-packages/tensorflow/python/ops/gen_io_ops.py", line 558 in read_file
File "test/test_tf_s3_support.py", line 30 in <module>
I get the error unable to open file: libtensorflow_io.so. Has anyone successfully gotten this to build on an M1 Mac? I'm using Python 3.9 and TensorFlow is working just fine (it's able to see my Mac's GPU, etc.).