
Datasets

Datasets are collections of data that can be read or written.

Intended audience: Linked Data Experts and Domain Experts

| Name | Description |
| --- | --- |
| Alignment | Writes alignments in the format specified at http://alignapi.gforge.inria.fr/format.html. |
| Avro | Read from or write to an Apache Avro file. |
| Binary file | Reads and writes binary files. A typical use case for this dataset is to process PDF documents or images. |
| CSV | Read from or write to a CSV file. |
| Embedded Spark SQL view | Deprecated: Use the Embedded SQL endpoint dataset instead. |
| Embedded SQL endpoint | Provides a JDBC endpoint that exposes workflow or transformation results as tables, which can be queried using SQL. |
| Excel | Read from or write to an Excel workbook in Open XML format (XLSX). The sheet is selected by specifying it as the type in the subsequent workflow operator. |
| Excel (Google Drive) | Read data from a remote Google Spreadsheet. |
| Excel (OneDrive, Office365) | Read data from a remote OneDrive or Office365 spreadsheet. |
| Hive database | Read from or write to an embedded Apache Hive endpoint. |
| In-memory dataset | A dataset that holds all data in memory. |
| Internal dataset | Dataset for storing entities between workflow steps. The underlying dataset type can be configured using the dataset.internal.* configuration parameters. |
| Internal dataset (single graph) | Dataset for storing entities between workflow steps. This variant uses the same graph for all internal datasets in a workflow. The underlying dataset type can be configured using the dataset.internal.* configuration parameters. |
| JSON | Read from or write to a JSON or JSON Lines file. |
| Knowledge Graph | Read RDF from or write RDF to a Knowledge Graph embedded in Corporate Memory. |
| Multi CSV ZIP | Reads multiple CSV files from, or writes them to, a single ZIP file. |
| Neo4j | A Neo4j graph. |
| ORC | Read from or write to an Apache ORC file. |
| Parquet | Read from or write to an Apache Parquet file. |
| RDF file | Reads all entities from, or writes them to, an RDF file. For reading, the dataset is loaded into memory, so its size is limited by the available memory. Large datasets should be loaded into an external RDF store and retrieved using the SPARQL endpoint dataset instead. |
| Remote SQL endpoint | Connect to an existing JDBC endpoint. |
| Snowflake SQL endpoint | Connect to a Snowflake JDBC endpoint. |
| SPARQL endpoint | Connects to an existing SPARQL endpoint. |
| Text | Reads and writes plain text files. |
| XML | Read from or write to an XML file. |
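
The JSON dataset accepts both JSON and JSON Lines files. As a rough illustration of how the two layouts differ, the sketch below parses the same two records from each format using only the Python standard library. It is not how the dataset is implemented; the function names and sample records are hypothetical and chosen purely for illustration.

```python
import json


def parse_json_document(text):
    """Parse a regular JSON file: the whole text is one JSON value."""
    data = json.loads(text)
    # Wrap a single top-level value in a list so both parsers return lists.
    return data if isinstance(data, list) else [data]


def parse_json_lines(text):
    """Parse JSON Lines: every non-empty line is its own JSON value."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample data: the same two records in both layouts.
json_text = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
jsonl_text = '{"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}\n'

records_from_json = parse_json_document(json_text)
records_from_jsonl = parse_json_lines(jsonl_text)
```

Because each JSON Lines record sits on its own line, such a file can be processed or appended to one record at a time, whereas a regular JSON document has to be parsed as a whole.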
