
Datasets

Datasets are collections of data that can be read or written.
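To illustrate the read/write contract, the following Python sketch models a dataset as a container that entities can be written to and read back from, loosely following the idea of the in-memory dataset. The classes `Entity` and `InMemoryDataset` and their methods are hypothetical illustrations, not Corporate Memory's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class Entity:
    """Hypothetical entity: a URI plus multi-valued properties."""
    uri: str
    values: Dict[str, List[str]] = field(default_factory=dict)


class InMemoryDataset:
    """Hypothetical sketch of a dataset that keeps all entities in memory.

    It only illustrates "can be read or written"; it is not the product's API.
    """

    def __init__(self) -> None:
        # Entities keyed by URI; dict preserves insertion order.
        self._entities: Dict[str, Entity] = {}

    def write(self, entities: Iterable[Entity]) -> None:
        # Store each entity, replacing an earlier entity with the same URI.
        for entity in entities:
            self._entities[entity.uri] = entity

    def read(self) -> Iterator[Entity]:
        # Yield the stored entities in the order they were first written.
        yield from self._entities.values()


if __name__ == "__main__":
    ds = InMemoryDataset()
    ds.write([Entity("urn:ex:1", {"name": ["Alice"]})])
    print([e.uri for e in ds.read()])  # ['urn:ex:1']
```

Keeping everything in a dictionary makes reads fast but ties the dataset's size to the available memory, which is the same trade-off described for the RDF file dataset.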

Intended audience: Linked Data Experts and Domain Experts

| Name | Description |
| --- | --- |
| Alignment | Writes the alignment format specified at http://alignapi.gforge.inria.fr/format.html. |
| Avro | Read from or write to an Apache Avro file. |
| Binary file | Read and write binary files. A typical use case for this dataset is processing PDF documents or images. |
| CSV | Read from or write to a CSV file. |
| Excel | Read from or write to an Excel workbook in Open XML format (XLSX). |
| Excel (Google Drive) | Read data from a remote Google spreadsheet. |
| Excel (OneDrive, Office365) | Read data from a remote OneDrive or Office365 spreadsheet. |
| Hive database | Read from or write to an embedded Apache Hive endpoint. |
| In-memory dataset | A dataset that holds all data in memory. |
| Internal dataset | Dataset for storing entities between workflow steps. The underlying dataset type can be configured using the `dataset.internal.*` configuration parameters. |
| Internal dataset (single graph) | Dataset for storing entities between workflow steps. This variant uses the same graph for all internal datasets in a workflow. The underlying dataset type can be configured using the `dataset.internal.*` configuration parameters. |
| JDBC endpoint | Connect to an existing JDBC endpoint. |
| JSON | Read from or write to a JSON or JSON Lines file. |
| Knowledge Graph | Read RDF from or write RDF to a Knowledge Graph embedded in Corporate Memory. |
| Multi CSV ZIP | Read from or write to multiple CSV files contained in a single ZIP file. |
| Neo4j | Connect to a Neo4j graph database. |
| ORC | Read from or write to an Apache ORC file. |
| Parquet | Read from or write to an Apache Parquet file. |
| RDF file | Read all entities from or write them to an RDF file. For reading, the whole file is loaded into memory, so its size is limited by the available memory. Load large datasets into an external RDF store and retrieve them with the SPARQL endpoint dataset instead. |
| Snowflake JDBC endpoint | Connect to a Snowflake JDBC endpoint. |
| SparkSQL view | Use the SQL endpoint dataset instead. |
| SPARQL endpoint | Connect to an existing SPARQL endpoint. |
| SQL endpoint | Provides a JDBC endpoint that exposes workflow or transformation results as tables, which can be queried using SQL. |
| Text | Read and write plain text files. |
| XML | Read from or write to an XML file. |
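The Multi CSV ZIP dataset bundles several CSV files into one archive. The following Python sketch uses only the standard library to illustrate such a layout, assuming one `<table>.csv` member per table, each with a header row. The function names and the file layout are hypothetical; this is not how Corporate Memory implements the dataset.

```python
import csv
import io
import zipfile
from typing import Dict, List

Rows = List[Dict[str, str]]


def write_multi_csv_zip(tables: Dict[str, Rows]) -> bytes:
    """Write each table as <name>.csv into a single in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in tables.items():
            text = io.StringIO()
            # Assumption: all rows share the keys of the first row (the header).
            fieldnames = list(rows[0]) if rows else []
            writer = csv.DictWriter(text, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
            archive.writestr(f"{name}.csv", text.getvalue())
    return buffer.getvalue()


def read_multi_csv_zip(data: bytes) -> Dict[str, Rows]:
    """Read every .csv member of a ZIP archive into a list of row dicts."""
    tables: Dict[str, Rows] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            if not member.endswith(".csv"):
                continue  # ignore non-CSV members
            text = archive.read(member).decode("utf-8")
            reader = csv.DictReader(io.StringIO(text))
            tables[member[: -len(".csv")]] = list(reader)
    return tables
```

Reading and writing whole members as strings keeps the sketch short; a real implementation handling large files would more likely stream each member instead.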
