Type: Bug
Status: To Do
Priority: High
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Environment: AWS EKS
Spark supports loading a DataFrame from multiple paths: https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/sql/DataFrameReader.html#load(scala.collection.Seq)
When I tried to do the same with Snappy SQL by passing two path options, I got unexpected results:
CREATE TEMPORARY TABLE users(uuid string, date_created long) USING parquet OPTIONS( path 's3a://bucket/users_2018_01_01-2018_01_31/', path 's3a://bucket/users_2018_02_01-2018_02_31/');
The statement succeeds, but a subsequent SELECT returns only the data from the last path ('s3a://bucket/users_2018_02_01-2018_02_31/'); the data from the first path is missing.
Unfortunately, I cannot use Hive partitioning here to add the data partition by partition.
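For comparison, here is a minimal sketch of the multi-path load in plain Spark that I am trying to reproduce, written with PySpark. The two paths are the ones from the statement above. The helper name `load_users` is hypothetical, and `spark` is assumed to be an existing SparkSession; whether a Snappy session behaves the same way through this reader is an assumption I have not verified.

```python
from typing import List

# Paths copied from the CREATE TEMPORARY TABLE statement in this report.
USER_PATHS: List[str] = [
    "s3a://bucket/users_2018_01_01-2018_01_31/",
    "s3a://bucket/users_2018_02_01-2018_02_31/",
]


def load_users(spark, paths: List[str] = USER_PATHS):
    """Read every path into one DataFrame and expose it as the temp view `users`.

    `spark` is assumed to be an already created SparkSession (hypothetical
    helper, not part of any library).
    """
    # DataFrameReader.parquet takes the paths as varargs, so all of them
    # are passed in a single call instead of as repeated options.
    df = spark.read.parquet(*paths).select("uuid", "date_created")
    # Register the combined result under the same name as the SQL table.
    df.createOrReplaceTempView("users")
    return df
```

With this approach, querying `users` would be expected to return rows from both directories, which is the behaviour I expected from the SQL statement with two path options.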