This is Spark 2.4.4 and Delta Lake 0.5.0.
I am trying to create a table using the delta data source and it seems I am missing something. Although the CREATE TABLE USING delta command appeared to work, the table directory is not created and insertInto does not work either.
The following CREATE TABLE USING delta ran fine, but insertInto failed.
scala> sql("""
create table t5
USING delta
LOCATION '/tmp/delta'
""").show
scala> spark.catalog.listTables.where('name === "t5").show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
| t5| default| null| EXTERNAL| false|
+----+--------+-----------+---------+-----------+
scala> spark.range(5).write.option("mergeSchema", true).insertInto("t5")
org.apache.spark.sql.AnalysisException: `default`.`t5` requires that the data to be inserted have the same number of columns as the target table: target table has 0 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).;
at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:341)
...
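The error hints at what happened: the table was registered in the metastore with an empty schema, since /tmp/delta contained no Delta transaction log for CREATE TABLE to infer a schema from. A quick way to confirm (a hypothetical check, not part of the original session):
scala> // t5 was registered with zero columns, so insertInto has nothing to match against
scala> spark.catalog.listColumns("t5").show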
I then thought of creating the table with the columns defined explicitly, but that did not work either.
scala> sql("""
create table t6
(id LONG, name STRING)
USING delta
LOCATION '/tmp/delta'
""").show
org.apache.spark.sql.AnalysisException: delta does not allow user-specified schemas.;
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3370)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3370)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
... 54 elided
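With Delta Lake 0.5.0 on Spark 2.4.x this is expected: the delta source rejects user-specified schemas in CREATE TABLE, so the schema has to come from data that already exists at the location. A sketch of the usual pre-0.7.0 workaround (the table name t5_fixed is made up for illustration):
scala> // Writing in the delta format first creates /tmp/delta/_delta_log, which records the schema
scala> spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta")

scala> // CREATE TABLE ... USING delta LOCATION then picks the schema up from the transaction log
scala> sql("CREATE TABLE t5_fixed USING delta LOCATION '/tmp/delta'")

scala> spark.table("t5_fixed").show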
Delta Lake 0.7.0 with Spark 3.0.0 (both just released) support the CREATE TABLE SQL command directly.
Make sure to "install" Delta SQL by setting the spark.sql.catalog.spark_catalog configuration property to org.apache.spark.sql.delta.catalog.DeltaCatalog.
$ ./bin/spark-shell \
--packages io.delta:delta-core_2.12:0.7.0 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
scala> spark.version
res0: String = 3.0.0
scala> sql("CREATE TABLE delta_101 (id LONG) USING delta").show
++
||
++
++
scala> spark.table("delta_101").show
+---+
| id|
+---+
+---+
scala> sql("DESCRIBE EXTENDED delta_101").show(truncate = false)
+----------------------------+---------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------------------------------+-------+
|id |bigint | |
| | | |
|# Partitioning | | |
|Not partitioned | | |
| | | |
|# Detailed Table Information| | |
|Name |default.delta_101 | |
|Location |file:/Users/jacek/dev/oss/spark/spark-warehouse/delta_101| |
|Provider |delta | |
|Table Properties |[] | |
+----------------------------+---------------------------------------------------------+-------+
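Circling back to the original question, insertInto should now line up with the table's single id column (a quick check, not part of the original output):
scala> spark.range(5).write.insertInto("delta_101")

scala> spark.table("delta_101").show  // expect ids 0 through 4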