@freeman-lab I had a chance to look at this this morning. Your hunch that the partition argument was causing the slowdown was correct. If it isn't passed in, it defaults to the number of files, even when running on Spark; previous versions did not do that (the behavior seemed more dynamic). I passed in a value based on the usual `spark.default.parallelism` guideline (roughly 3-4x the number of CPUs in the cluster), and the timing came back in line with what we were seeing in version 0.6. I briefly compared the v0.6 and v1.0 code and couldn't see any real difference in how the reader handles the partitions argument. In our application, which currently uses the v0.6 Thunder codebase, we do not pass anything to the `npartitions` argument.
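
For reference, here's a rough sketch of the workaround I used. The helper name and the load call are illustrative, not Thunder's exact API; the only point is choosing `npartitions` from the cluster's core count instead of letting it default to the file count:

```python
def suggested_npartitions(total_cores, factor=3):
    """Pick a partition count of roughly `factor` x the cluster's total
    CPU cores, following the spark.default.parallelism guideline of
    3-4x the number of cores. Clamped to at least 1."""
    return max(1, factor * total_cores)

# e.g. a cluster with 32 cores total:
npartitions = suggested_npartitions(total_cores=32)  # -> 96

# then pass it explicitly when loading, e.g. (illustrative call):
# data = tsc.loadImages(path, npartitions=npartitions)
```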