Datasets
Datasets are collections of data that can be read or written.
Intended audience: Linked Data Experts and Domain Experts
| Name | Description |
|---|---|
| Alignment | Writes alignments in the format specified at http://alignapi.gforge.inria.fr/format.html. |
| Avro | Read from or write to an Apache Avro file. |
| Binary file | Reads and writes binary files. A typical use-case for this dataset is to process PDF documents or images. |
| CSV | Read from or write to a CSV file. |
| Embedded Spark SQL view | Deprecated: Use the embedded SQL endpoint dataset instead. |
| Embedded SQL endpoint | Provides a JDBC endpoint that exposes workflow or transformation results as tables, which can be queried using SQL. |
| Excel | Read from or write to an Excel workbook in Open XML format (XLSX). The sheet is selected by specifying it as type in the subsequent workflow operator. |
| Excel (Google Drive) | Read data from a remote Google Spreadsheet. |
| Excel (OneDrive, Office365) | Read data from a remote OneDrive or Office 365 spreadsheet. |
| Hive database | Read from or write to an embedded Apache Hive endpoint. |
| In-memory dataset | A Dataset that holds all data in-memory. |
| Internal dataset | Dataset for storing entities between workflow steps. The underlying dataset type can be configured using the dataset.internal.* configuration parameters. |
| Internal dataset (single graph) | Dataset for storing entities between workflow steps. This variant uses the same graph for all internal datasets in a workflow. The underlying dataset type can be configured using the dataset.internal.* configuration parameters. |
| JSON | Read from or write to a JSON or JSON Lines file. |
| Knowledge Graph | Read RDF from or write RDF to a Knowledge Graph embedded in Corporate Memory. |
| Multi CSV ZIP | Reads multiple CSV files from, or writes them to, a single ZIP file. |
| Neo4j | Read from or write to a Neo4j graph database. |
| ORC | Read from or write to an Apache ORC file. |
| Parquet | Read from or write to an Apache Parquet file. |
| RDF file | Dataset which retrieves and writes all entities from/to an RDF file. For reading, the dataset is loaded in-memory and thus the size is restricted by the available memory. Large datasets should be loaded into an external RDF store and retrieved using the SPARQL dataset instead. |
| Remote SQL endpoint | Connect to an existing JDBC endpoint. |
| Snowflake SQL endpoint | Connect to a Snowflake JDBC endpoint. |
| SPARQL endpoint | Connects to an existing SPARQL endpoint. |
| Text | Reads and writes plain text files. |
| XML | Read from or write to an XML file. |
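To illustrate one of the file layouts above, the following is a minimal sketch of the archive structure a Multi CSV ZIP dataset works with: a single ZIP file containing several CSV files, where each CSV becomes one table. The file names and columns here are illustrative assumptions, not names used by the product.

```python
import csv
import io
import zipfile

# Illustrative tables; file names and columns are hypothetical.
tables = {
    "persons.csv": [["id", "name"], ["1", "Alice"], ["2", "Bob"]],
    "cities.csv": [["id", "city"], ["1", "Berlin"]],
}

# Write each table as a separate CSV entry inside one ZIP archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for filename, rows in tables.items():
        out = io.StringIO()
        csv.writer(out).writerows(rows)
        zf.writestr(filename, out.getvalue())

# Reading back: each CSV entry in the archive corresponds to one table.
with zipfile.ZipFile(buffer) as zf:
    names = sorted(zf.namelist())
print(names)  # ['cities.csv', 'persons.csv']
```

The same round trip, done by the dataset itself, spares you from zipping and unzipping the individual CSV files by hand.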