Datasets
Datasets are collections of data that can be read or written.
Intended audience: Linked Data Experts and Domain Experts
Name | Description |
---|---|
Alignment | Writes the alignment format specified at http://alignapi.gforge.inria.fr/format.html. |
Avro | Read from or write to an Apache Avro file. |
Binary file | Reads and writes binary files. A typical use-case for this dataset is to process PDF documents or images. |
CSV | Read from or write to a CSV file. |
Excel | Read from or write to an Excel workbook in Open XML format (XLSX). |
Excel (Google Drive) | Read data from a remote Google Spreadsheet. |
Excel (OneDrive, Office365) | Read data from a remote OneDrive or Office365 spreadsheet. |
Hive database | Read from or write to an embedded Apache Hive endpoint. |
In-memory dataset | A Dataset that holds all data in-memory. |
Internal dataset | Dataset for storing entities between workflow steps. The underlying dataset type can be configured using the dataset.internal.* configuration parameters. |
Internal dataset (single graph) | Dataset for storing entities between workflow steps. This variant uses the same graph for all internal datasets in a workflow. The underlying dataset type can be configured using the dataset.internal.* configuration parameters. |
JDBC endpoint | Connect to an existing JDBC endpoint. |
JSON | Read from or write to a JSON or JSON Lines file. |
Knowledge Graph | Read RDF from or write RDF to a Knowledge Graph embedded in Corporate Memory. |
Multi CSV ZIP | Reads multiple CSV files from, or writes them to, a single ZIP file. |
Neo4j | Connect to a Neo4j graph database. |
ORC | Read from or write to an Apache ORC file. |
Parquet | Read from or write to an Apache Parquet file. |
RDF file | Dataset which retrieves and writes all entities from/to an RDF file. For reading, the dataset is loaded in-memory and thus the size is restricted by the available memory. Large datasets should be loaded into an external RDF store and retrieved using the SPARQL dataset instead. |
Snowflake JDBC endpoint | Connect to a Snowflake JDBC endpoint. |
SparkSQL view | Use the SQL endpoint dataset instead. |
SPARQL endpoint | Connect to an existing SPARQL endpoint. |
SQL endpoint | Provides a JDBC endpoint that exposes workflow or transformation results as tables, which can be queried using SQL. |
Text | Reads and writes plain text files. |
XML | Read from or write to an XML file. |
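
To make the Multi CSV ZIP layout concrete, the following is a minimal Python sketch of a single ZIP archive that holds several CSV files. It only illustrates the file layout using the standard library and is not how the dataset itself is implemented. The function names `write_csv_zip` and `read_csv_zip` and the example file names are hypothetical.

```python
import csv
import io
import zipfile


def write_csv_zip(tables):
    """Pack several tables into one in-memory ZIP, one CSV file per table.

    `tables` maps a file name to a list of rows (the first row is the header).
    Hypothetical helper for illustration only.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in tables.items():
            text = io.StringIO(newline="")
            csv.writer(text).writerows(rows)
            # writestr encodes str data as UTF-8
            archive.writestr(name, text.getvalue())
    return buffer.getvalue()


def read_csv_zip(data):
    """Read every .csv member of a ZIP archive into a dict of row lists."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members
            with archive.open(name) as member:
                text = io.TextIOWrapper(member, encoding="utf-8", newline="")
                tables[name] = list(csv.reader(text))
    return tables


if __name__ == "__main__":
    archive_bytes = write_csv_zip({
        "people.csv": [["id", "name"], ["1", "Ada"]],
        "cities.csv": [["id", "city"], ["1", "Berlin"]],
    })
    for file_name, rows in read_csv_zip(archive_bytes).items():
        print(file_name, rows)
```

Each CSV file in the archive is treated as its own table here. All values come back as strings, since CSV carries no type information.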