Embedded SQL endpoint¤
SQL endpoint dataset parameters
The dataset only requires that the tableNamePrefix parameter is given. This will be used as the prefix for the names of the generated tables. When a set of entities is written to the endpoint a view is generated for each entity type (defined by an ‘rdf_type’ attribute). That means that the mapping or data source that are used as input for the SQL endpoint need to have a type or require a user defined type mapping.
The operator has a compatibility mode. This mode will avoid complex types such as Arrays. When arrays exist in the input they are converted to a String using the given arraySeparator. This avoids errors and warnings in some Jdbc clients that are unable to handle typed arrays and may make working with software like Excel easier.
The parameter aliasMap of the endpoint allows the specification of column aliases. The map is a comma separated list of key-value pairs.
Each key and value is denoted by key:value. An example for renaming 2 columns (source1, source2 to target1, target2) in the result would be:
source1:target1,source2:target2
Note: Table and column (mapping target) names will be automatically converted to be valid in as many databases as possible. Table names will be shortened to 128 characters. Only a-z, A-Z, 0-9 and _ are allowed. Others will be replaced with an underscore. Column names undergo the same transformation but will be converted to lower case as well. The log will inform about changes. The table names will be generated based on the target type of each mapping. The user needs to make sure that each object mapping specifies a unique type. If two object mappings define the same type, only the last one will be written.
SQL endpoint activity
See [ActivityDocumentation] for a general description of the Data Integration activities. The activity will start automatically, when the SQL endpoint is used as a data sink and Data Integration is configured to make the SQL endpoint accessible remotely.
When the activity is started and running it returns the server status and JDBC URL as its value.
Stopping the activity will drop all views generated by the activity. It can be restarted by rerunning the workflow containing it as a sink.
Remote client configuration (via JDBC and ODBC)
Within Data Integration the SQL endpoint can be used as a source or sink like any other dataset. If the startThriftServer option is set to ‘true’ access via JDBC or ODBC is possible.
ODBC and JDBC drivers can be used to connect to relational databases.
When selecting a version of a driver the client operating system and its type (32bit/64 bit) are the most important factors. The version of the client drivers sometimes is the same as the server’s. If no version of a driver is given, the newest driver of the vendor should work, as it should be backwards compatible.
Any JDBC or ODBC client can connect to an SQL endpoint dataset. SparkSQL uses the same query processing as Hive, therefore the requirements for the client are:
- A JDBC driver compatible with Hive 1.2.11 (platform independent driver org.apache.hive.jdbc.HiveDriver is needed) or
- A JDBC driver compatible with Spark 2.3.3
- A Hive ODBC driver (ODBC driver for the client architecture and operating system needed)
A detailed instruction to connect to a Hive or SparkSQL endpoint with various tools (e.g. SQuirreL, beeline, SQL Developer, …) can be found at Apache HiveServer2 Clients. The database client DBeaver can connect to the SQL endpoint out of the box.
Parameter¤
Table name prefix¤
Prefix of the table that will be shared. In the case of complex mappings more than one table will be created. If one name is given it will be used as a prefix for table names. If left empty the table names will be generated from the user name and time stamps and start with ‘root’, ‘object-mapping’
- ID: tableNamePrefix
- Datatype: string
- Default Value: None
Cache¤
Optional boolean option that selects if the table should be cached by Spark or not (default = true).
- ID: cache
- Datatype: boolean
- Default Value: true
Array separator¤
The character that is used to separate the parts of array values. Write \t to specify the tab character.
- ID: arraySeparator
- Datatype: string
- Default Value: |
Compatibility¤
If true, basic types will be used for unusual data types that otherwise may result in client errors. Try switching this on, if a client has weird error messages. (Default = true)
- ID: useCompatibleTypes
- Datatype: boolean
- Default Value: true
Map¤
Mapping of column names. Similar to aliases E.g. ‘c1:c2’ would rename column c1 into c2.
- ID: map
- Datatype: stringmap
- Default Value: None
Advanced Parameter¤
None