
Custom Workflow Tasks

A custom workflow task is an operator that can be used in a workflow.

Intended audience: Linked Data Experts and Domain Experts

Name Description
Add project files Adds file resources to the project that are piped into the input port.
Cancel Workflow Cancels a workflow if a specified condition is fulfilled. A typical use case for this operator is to cancel the workflow execution if the input data is empty.
Combine CSV files Combine CSV files with the same structure into one dataset.
Concatenate to file Concatenates values into a file.
Create Embeddings Fetch and output LLM-created embeddings from input entities.
Create/Update Salesforce Objects Manipulate data in your organization’s Salesforce account.
Delete project files Removes file resources from the project based on a regular expression.
Distinct by Removes duplicated entities based on a user-defined path. Note that this operator does not retain the order of the entities.
Download file Downloads a file from a given URL.
Download Nextcloud files Download files from a given Nextcloud instance.
Download Office 365 Files Download files from Microsoft OneDrive or Sites.
Download SSH files Download files from a given SSH instance.
Evaluate template Evaluates a template on a sequence of entities. Can be used after a transformation or directly after datasets that output a single table, such as CSV or Excel.
Execute a command in a kubernetes pod Connect to a cluster, execute a command and gather the output.
Execute commands via SSH Execute commands on a given SSH instance.
Execute Instructions Send instructions (prompt) to an LLM and process the result.
Execute REST requests REST operator that fetches and optionally merges data from a REST endpoint. It supports executing multiple requests, either via input entities that each overwrite config parameters or via paging. If you only need to download a single file, the “Download file” operator might be the better option. Most features are currently only supported for JSON REST APIs. When multiple requests are executed, the REST operator can produce a merged JSON result, i.e. it concatenates all results into a JSON array. Alternatively, multiple results can be written directly to a file (of a JSON dataset), either as a merged JSON file or as one file per request inside a ZIP file. By default, the output of this operator is an entity with a single property ‘result’, which holds the (concatenated) JSON string.
Execute Spark function Applies a specified Scala function to a specified field.
Extract from PDF files Extract text and tables from PDF files.
Generate base36 IRDIs Create unique ECLASS IRDIs.
Generate SHACL shapes from data Generate SHACL node and property shapes from a data graph.
Get project files Get file resources from the project.
GraphQL query Executes a custom GraphQL query against a GraphQL endpoint and saves the result to a JSON dataset.
Join tables Joins a set of inputs into a single table. Expects a list of entity tables and links. All entity tables are joined into the first entity table using the provided links.
jq Process a JSON document with a jq filter / program.
JQL query Search and retrieve JIRA issues.
Kafka Consumer (Receive Messages) Reads messages from a Kafka topic and saves them to a messages dataset (Consumer).
Kafka Producer (Send Messages) Reads a messages dataset and sends records to a Kafka topic (Producer).
List Nextcloud files List directories and files from a given Nextcloud folder.
List Office 365 Files List files from OneDrive or Sites.
List project files List file resources from the project.
List SSH files List files from a given SSH instance.
Merge tables Stores sets of instance and mapping inputs as relational tables with the mapping as an n:m relation. Expects a list of entity tables and links. All entity tables have a relation to the first entity table using the provided links.
Normalize units of measurement Custom task that substitutes numeric values and their associated unit symbols with a representation normalized to SI units.
OAuth2 Authentication Provide an OAuth2 access token for other tasks (via config port).
Office 365 Upload Files Upload files to OneDrive or a SharePoint site.
Parse JSON Parses an incoming entity as a JSON dataset. Typically, it is used before a transformation task. Takes exactly one input of which only the first entity is processed.
Parse XML Takes exactly one input and reads either the defined inputPath or the first value of the first entity as an XML document. Then executes the given output entity schema, similar to the XML dataset, to construct the result entities.
Parse YAML Parses files, source code or input values as YAML documents.
Pivot The pivot operator takes data in separate rows, aggregates it and converts it into columns.
Request RDF triples A task that requests all triples from an RDF dataset.
Scheduler Executes a workflow at specified intervals.
Search addresses Looks up locations from textual descriptions using the configured geocoding API. Outputs results as RDF.
Search Vector Embeddings Search for top-k metadata stored in Postgres Vector Store (PGVector).
Send email Sends an email using an SMTP server.
Send Mattermost messages Send messages to Mattermost channels and/or users.
Set or Overwrite parameters Connect this task to a config port of another task in order to set or overwrite the parameter values of that task.
SHACL validation with pySHACL Performs SHACL validation with pySHACL.
SOQL query (Salesforce) Executes a custom Salesforce Object Query (SOQL) to return sets of data from your organization’s Salesforce account.
Spark SQL query Executes a custom SQL query on the first input Spark dataframe and returns the result as its output.
SPARQL Construct query A task that executes a SPARQL Construct query on a SPARQL-enabled data source and outputs the SPARQL result. If the result should be written to the same RDF store it is read from, the SPARQL Update operator is preferable.
SPARQL Select query A task that executes a SPARQL Select query on a SPARQL-enabled data source and outputs the SPARQL result. If the SPARQL source is defined on a specific graph, a FROM clause is added to the query at execution time, unless the query already contains a GRAPH or FROM clause. FROM NAMED clauses are not injected.
SPARQL Update query A task that outputs SPARQL Update queries for every entity from the input based on a SPARQL Update template. The output of this operator should be connected to the SPARQL datasets to which the results should be written.
Split file Split a file into multiple parts with a specified size.
SQL Update query A task that outputs SQL queries. The output of this operator should be connected to a remote SQL endpoint on which queries should be executed.
Start Workflow per Entity Loop over the output of a task and start a sub-workflow for each entity.
Store Vector Embeddings Store embeddings into Postgres Vector Store (PGVector).
Unpivot Given a list of table columns, transforms those columns into attribute-value pairs.
Update SemSpect Tell SemSpect to prepare a Knowledge Graph for visualization.
Upload File to Knowledge Graph Uploads an N-Triples or Turtle (limited support) file from the file repository to a ‘Knowledge Graph’ dataset. The output of this operator can be the input of datasets that support graph store file upload, e.g. ‘Knowledge Graph’. The file will be uploaded to the graph specified in that dataset.
Upload files to Nextcloud Upload files to a given Nextcloud instance.
Upload local files Replace a file dataset resource with a local file or upload multiple local files to a project.
Upload SSH files Upload files to a given SSH instance.
Validate Entities Use a JSON schema to validate entities or a JSON dataset.
Validate Knowledge Graph Use SHACL shapes to validate resources in a Knowledge Graph.
Validate XML Validates an XML dataset against a provided XML schema (XSD) file. Any errors are written to the output. Can be used in conjunction with the Cancel Workflow operator in order to stop the workflow if errors have been found.
XSLT A task that converts an XML resource via an XSLT script and writes the transformed output into a file resource.
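
The Pivot and Unpivot entries above describe inverse reshaping operations. The following is a minimal Python sketch of that idea on plain dictionaries, not the operators' implementation. The function names, the `key`, `attribute` and `value` parameters, and the choice of `sum` as the aggregation are all assumptions made for illustration.

```python
from collections import defaultdict


def unpivot(rows, key, columns):
    """Turn the listed columns of each row into attribute-value pairs."""
    out = []
    for row in rows:
        for col in columns:
            if col in row:
                out.append({key: row[key], "attribute": col, "value": row[col]})
    return out


def pivot(rows, key, attribute="attribute", value="value", aggregate=sum):
    """Group rows by key, aggregate values per attribute, emit one column each."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row[key]][row[attribute]].append(row[value])
    return [
        {key: k, **{attr: aggregate(vals) for attr, vals in cols.items()}}
        for k, cols in grouped.items()
    ]


# Hypothetical sample data: one row per entity, one column per measurement.
wide = [
    {"id": "a", "height": 2, "width": 3},
    {"id": "b", "height": 5, "width": 7},
]
long_rows = unpivot(wide, "id", ["height", "width"])
round_trip = pivot(long_rows, "id")
```

Here `long_rows` holds one row per entity and column, and pivoting it by `id` yields the original wide rows again. When several rows share the same key and attribute, the aggregation function decides how they collapse into a single cell.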

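The FROM clause rule described for the SPARQL Select query task can be sketched in a few lines of Python. This is an illustration of the stated rule only, not the product's code: the function name `add_default_graph` is hypothetical, the keyword detection is a naive regular expression, and treating an existing FROM NAMED clause as "a FROM clause is already present" is an assumption.

```python
import re
from typing import Optional

# Naive keyword check: FROM or GRAPH as whole words, case-insensitive.
# Assumption: this also matches FROM NAMED, and it can be fooled by the
# same words inside IRIs or string literals; a real parser would be stricter.
_HAS_GRAPH_SCOPE = re.compile(r"\b(?:FROM|GRAPH)\b", re.IGNORECASE)

# Start of the query body: an optional WHERE keyword followed by "{".
_BODY_START = re.compile(r"(?:\bWHERE\b\s*)?\{", re.IGNORECASE)


def add_default_graph(query: str, graph_iri: Optional[str]) -> str:
    """Insert 'FROM <graph_iri>' unless the query already scopes a graph."""
    if not graph_iri or _HAS_GRAPH_SCOPE.search(query):
        return query  # nothing to inject, or the query is left untouched
    match = _BODY_START.search(query)
    if match is None:
        return query  # no query body found; leave the text as it is
    start = match.start()
    return f"{query[:start]}FROM <{graph_iri}> {query[start:]}"


# Hypothetical graph IRI and queries for demonstration.
GRAPH = "http://example.org/graph"
plain = add_default_graph("SELECT ?s WHERE { ?s ?p ?o }", GRAPH)
scoped = add_default_graph("SELECT ?s WHERE { GRAPH ?g { ?s ?p ?o } }", GRAPH)
```

With these inputs, `plain` gains a `FROM <http://example.org/graph>` clause in front of `WHERE`, while `scoped` is returned unchanged because it already contains a GRAPH clause.
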