Datasets
Datasets are collections of data that can be read or written.
Intended audience: Linked Data Experts and Domain Experts
| Name | Description |
|---|---|
| Alignment | Writes alignments in the format specified at http://alignapi.gforge.inria.fr/format.html. |
| Avro | Read from or write to an Apache Avro file. |
| Binary file | Reads and writes binary files. A typical use-case for this dataset is to process PDF documents or images. |
| CSV | Read from or write to a CSV file. |
| Embedded Spark SQL view | Deprecated: Use the embedded SQL endpoint dataset instead. |
| Embedded SQL endpoint | Provides a JDBC endpoint that exposes workflow or transformation results as tables, which can be queried using SQL. |
| Excel | Read from or write to an Excel workbook in Open XML format (XLSX). The sheet is selected by specifying it as type in the subsequent workflow operator. |
| Excel (Google Drive) | Read data from a remote Google Spreadsheet. |
| Excel (OneDrive, Office365) | Read data from a remote OneDrive or Office 365 spreadsheet. |
| Hive database | Read from or write to an embedded Apache Hive endpoint. |
| In-memory dataset | A Dataset that holds all data in-memory. |
| Internal dataset | Dataset for storing entities between workflow steps. The underlying dataset type can be configured using the dataset.internal.* configuration parameters. |
| Internal dataset (single graph) | Dataset for storing entities between workflow steps. This variant uses the same graph for all internal datasets in a workflow. The underlying dataset type can be configured using the dataset.internal.* configuration parameters. |
| JSON | Read from or write to a JSON or JSON Lines file. |
| Knowledge Graph | Read RDF from or write RDF to a Knowledge Graph embedded in Corporate Memory. |
| Multi CSV ZIP | Reads multiple CSV files from, or writes them to, a single ZIP file. |
| Neo4j | Read from or write to a Neo4j graph database. |
| ORC | Read from or write to an Apache ORC file. |
| Parquet | Read from or write to an Apache Parquet file. |
| RDF file | Dataset which retrieves and writes all entities from/to an RDF file. For reading, the dataset is loaded in-memory and thus the size is restricted by the available memory. Large datasets should be loaded into an external RDF store and retrieved using the SPARQL dataset instead. |
| Remote SQL endpoint | Connect to an existing JDBC endpoint. |
| Snowflake SQL endpoint | Connect to a Snowflake JDBC endpoint. |
| SPARQL endpoint | Connects to an existing SPARQL endpoint. |
| Text | Reads and writes plain text files. |
| XML | Read from or write to an XML file. |
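To illustrate one of the file layouts above, the following is a minimal sketch of the archive structure a Multi CSV ZIP dataset works with: a single ZIP file containing several CSV files, where each CSV becomes one table. The file names and columns here are illustrative assumptions, not names used by the product.

```python
import csv
import io
import zipfile

# Illustrative tables; file names and columns are hypothetical.
tables = {
    "persons.csv": [["id", "name"], ["1", "Alice"], ["2", "Bob"]],
    "cities.csv": [["id", "city"], ["1", "Berlin"]],
}

# Write each table as a separate CSV entry inside one ZIP archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for filename, rows in tables.items():
        out = io.StringIO()
        csv.writer(out).writerows(rows)
        zf.writestr(filename, out.getvalue())

# Reading back: each CSV entry in the archive corresponds to one table.
with zipfile.ZipFile(buffer) as zf:
    names = sorted(zf.namelist())
print(names)  # ['cities.csv', 'persons.csv']
```

The same round trip, done by the dataset itself, spares you from zipping and unzipping the individual CSV files by hand.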