Spark 1: How to implement a Spark External Datasource
Spark Data Source API: Extending Our Spark SQL Query Engine
Spark SQL External DataSource, Part 1: Examples
Spark SQL External DataSource, Part 2: Source Code Analysis
Reference implementations:
Spark 2: How to implement a Spark External Datasource
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-datasource-api.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-datasource.html
http://www.spark.tc/exploring-the-apache-spark-datasource-api/
New features:
- Subquery support
- Richer read/write API support, including
- RelationProvider
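A minimal sketch of a `RelationProvider`, assuming the `org.apache.spark.sql.sources` API; the package name, `RangeRelation`, and the single-column `id` schema are illustrative, not part of the original text.

```scala
package example.datasource

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical relation that produces rows 0..9; a real implementation
// would scan the external system described by `parameters`.
class RangeRelation(val sqlContext: SQLContext) extends BaseRelation with TableScan {
  override def schema: StructType =
    StructType(StructField("id", LongType, nullable = false) :: Nil)

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.range(0, 10).map(Row(_))
}

// Spark SQL looks up this class when the data source is referenced
// by its fully qualified package name in format(...).
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new RangeRelation(sqlContext)
}
```

With this on the classpath, `sqlContext.read.format("example.datasource").load()` would resolve `DefaultSource` and scan the relation.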
[Key reference](http://wiki.baidu.com/pages/viewpage.action?pageId=213816907)
- DataSourceRegister
Trait `DataSourceRegister` is an interface for registering data sources under their `shortName` aliases (so they can be looked up later).
```scala
package org.apache.spark.sql.sources

trait DataSourceRegister {
  def shortName(): String
}
```
It allows users to use the data source alias as the format type instead of the fully qualified class name.
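A sketch of how a provider might mix in `DataSourceRegister`, assuming the `org.apache.spark.sql.sources` API; the class name and the `"myformat"` alias are illustrative.

```scala
package example.datasource

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}

// Hypothetical provider: mixing in DataSourceRegister lets users write
// spark.read.format("myformat") instead of the fully qualified class name.
class MyFormatProvider extends RelationProvider with DataSourceRegister {

  // The alias under which this data source is registered.
  override def shortName(): String = "myformat"

  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    ??? // build a BaseRelation against the external system here
}
```

For the alias to resolve, the class also has to be listed in a `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` file on the classpath, since Spark discovers registered aliases through Java's `ServiceLoader` mechanism.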