
Execute Instructions¤

Python Plugin

This operator is part of a Python Plugin Package. To use it, you need to install the package, e.g. with cmemc.

Overview¤

This plugin executes Large Language Model (LLM) instructions over entity collections, enabling AI-powered text generation, analysis, and transformation tasks within Corporate Memory workflows.

Core Functionality¤

  • LLM Integration: Supports OpenAI API, Azure OpenAI, and OpenAI-compatible endpoints (Anthropic Claude, OpenRouter, etc.)
  • Entity Processing: Processes entities individually or in batches with configurable concurrency
  • Template System: Uses Jinja2 templates for dynamic prompt generation from entity data
  • Output Formats: Supports text, JSON mode, and structured outputs with Pydantic schemas
  • Performance Optimization: Includes batching, rate limiting, and async processing for high-throughput scenarios

Input/Output Behavior¤

After processing, each entity receives an additional path (default: _instruction_output) containing the LLM response. Input/output ports are automatically configured based on template variables:

  • No placeholders: No input ports required
  • With placeholders: Dynamic input ports created for each template variable
  • Port ordering: Variables sorted alphabetically determine port order
  • Schema handling: Fixed schemas when using specific entity paths, flexible schemas otherwise

Template System¤

Uses Jinja2 templating for dynamic prompts:

{{ variable }}           # Entire entity as JSON
{{ variable.name }}      # Specific entity property
{{ variable_a.title }}   # Property from first additional input port
{{ variable_b.content }} # Property from second additional input port

The following template processing rules are implemented:

  1. Variable Extraction: Automatically detects template variables to configure input ports (see the sketch after this list)
  2. Entity Iteration: Main processing iterates over first input port entities
  3. Additional Inputs: Secondary ports provide context data for template rendering
  4. Consumption Modes: Choose between first-entity or all-entities consumption from additional ports
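
A minimal sketch of how variable extraction could work, using Jinja2's meta module (the plugin's actual implementation may differ):

from jinja2 import Environment, meta

template_source = "Compare {{ variable.name }} with {{ variable_a.title }}"

# Parse the template and collect all top-level (undeclared) variables.
env = Environment()
ast = env.parse(template_source)
variables = meta.find_undeclared_variables(ast)

# Alphabetical ordering of the variables determines the input port order.
print(sorted(variables))  # ['variable', 'variable_a']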

Output Formats¤

  1. Text Output (Default) - Standard LLM text responses for general-purpose tasks.
  2. JSON Mode - Ensures valid JSON output format. Add JSON structure requirements to your prompt template.
  3. Structured Output - Uses Pydantic schemas for type-safe, validated responses:
from pydantic import BaseModel

class StructuredOutput(BaseModel):
    title: str
    summary: str
    keywords: list[str]
    confidence_score: float
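
For illustration, a hedged sketch of how such a schema can validate a raw model response, reusing the StructuredOutput class above (the plugin performs this validation internally):

from pydantic import ValidationError

# raw_response stands in for the JSON text returned by the model.
raw_response = '{"title": "T", "summary": "S", "keywords": ["a"], "confidence_score": 0.9}'

try:
    result = StructuredOutput.model_validate_json(raw_response)
    print(result.title, result.confidence_score)
except ValidationError as err:
    # Invalid or incomplete model output surfaces as a validation error.
    print(err)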

Performance Features¤

Parallel Processing:

  • Concurrent Requests: Configurable semaphore-controlled API calls
  • Batch Processing: Entities processed in configurable batch sizes
  • Rate Limiting: Optional delays between requests
  • Memory Optimization: Streaming processing with generator patterns

Error Handling:

  • Graceful Degradation: Continue processing on API errors (configurable)
  • Detailed Logging: Comprehensive error reporting and debugging information
  • Workflow Integration: Proper cancellation support and progress reporting
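
A minimal sketch of the semaphore-controlled concurrency pattern described above, assuming a hypothetical call_llm coroutine (the plugin's internals may differ):

import asyncio

async def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a single API request."""
    await asyncio.sleep(0.1)  # placeholder for network I/O
    return f"response for: {prompt}"

async def process_batch(prompts: list[str], max_concurrent: int = 10,
                        request_delay: float = 0.0) -> list[str]:
    # The semaphore caps the number of in-flight API requests.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited_call(prompt: str) -> str:
        async with semaphore:
            if request_delay:
                await asyncio.sleep(request_delay)  # optional rate limiting
            return await call_llm(prompt)

    return await asyncio.gather(*(limited_call(p) for p in prompts))

results = asyncio.run(process_batch([f"entity {i}" for i in range(5)]))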

API Compatibility¤

Supported Providers:

  • OpenAI: Direct API access with full feature support
  • Azure OpenAI: Enterprise Azure-hosted services with API versioning
  • OpenAI-Compatible: Anthropic Claude, OpenRouter, local models, and other compatible endpoints

Authentication:

  • API Keys: Secure password-type parameters for API authentication
  • Azure Integration: Supports Azure OpenAI API versioning and endpoint configuration
  • Flexible Endpoints: Custom base URLs for various providers
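
A hedged sketch of how these settings could map onto the official openai Python client; the endpoint values here are placeholders, and the plugin-side parameter names are listed below under Parameter:

from openai import AzureOpenAI, OpenAI

# Direct OpenAI or OpenAI-compatible endpoint (e.g. OpenRouter).
client = OpenAI(
    base_url="https://api.openai.com/v1/",
    api_key="sk-...",  # assumption: supplied via the API key parameter
)

# Azure-hosted OpenAI deployment with explicit API versioning.
azure_client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # hypothetical endpoint
    api_key="...",
    api_version="2024-02-01",
)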

Advanced Configuration¤

Message Templates¤

Customize the conversation structure beyond simple prompts:

[
    {"role": "system", "content": "You are a data analyst."},
    {"role": "user", "content": "{{ instruction_prompt }}"}
]
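
As an illustration, the content of each message might be rendered with Jinja2 before the list is sent to the chat completion API (a hedged sketch, not the plugin's exact code):

from jinja2 import Template

messages = [
    {"role": "system", "content": "You are a data analyst."},
    {"role": "user", "content": "{{ instruction_prompt }}"},
]

# Render each message content, substituting the instruction prompt.
rendered = [
    {**m, "content": Template(m["content"]).render(
        instruction_prompt="Summarize this entity: ...")}
    for m in messages
]
print(rendered[1]["content"])  # Summarize this entity: ...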

Performance Tuning¤

  • Temperature Control: Adjust creativity vs. determinism (0.0-2.0)
  • Timeout Management: Request-level timeout configuration
  • Concurrency Limits: Prevent rate limiting with request throttling
  • Batch Optimization: Balance memory usage vs. throughput

Best Practices¤

  1. Schema Design: Use specific entity paths in templates for fixed schemas
  2. Error Strategy: Enable error continuation for large datasets
  3. Performance: Adjust concurrency and batch size based on API limits
  4. Templates: Design prompts with clear instructions and expected outputs
  5. Testing: Start with small entity sets to validate templates and outputs

For detailed prompting guidance, see OpenAI’s Text Generation Guide.

Parameter¤

Base URL¤

The base URL of the OpenAI compatible API (without endpoint path).

  • ID: base_url
  • Datatype: string
  • Default Value: https://api.openai.com/v1/

API Type¤

Select the API client type. This determines the authentication method and endpoint configuration used for API requests. Choose OPENAI for direct OpenAI API access or AZURE_OPENAI for Azure-hosted OpenAI services. Consider using the API version advanced parameter in case you access Azure-hosted OpenAI services.

  • ID: api_type
  • Datatype: enumeration
  • Default Value: OPENAI

API key¤

An optional API key for authentication.

  • ID: api_key
  • Datatype: password
  • Default Value: None

Instruct Model¤

The identifier of the instruct model to use. Note that some providers do not support a model list endpoint; in that case, simply create a custom entry. Available model IDs for some public providers can be found here: Claude, OpenRouter, Azure.

  • ID: model
  • Datatype: string
  • Default Value: gpt-4o-mini

Instruction Prompt Template¤

The instruction prompt template. Please have a look at the task documentation for detailed instructions.

  • ID: instruct_prompt_template
  • Datatype: code-jinja2
  • Default Value: Write a paragraph about this entity: {{ entity }}

Advanced Parameter¤

API Version¤

Azure OpenAI API version (only used when API Type is AZURE_OPENAI). For more information about OpenAI API versions on Azure, please see the documentation.

  • ID: api_version
  • Datatype: string
  • Default Value: None

Temperature (between 0 and 2)¤

A parameter that controls the randomness and creativity of the model. A high temperature value (0.8 - 1.0) increases randomness and creativity. This is useful for open-ended tasks like storytelling or brainstorming. A low temperature value (0.0 - 0.4) produces more deterministic and focused outputs. This is suitable for factual or technical tasks.

  • ID: temperature
  • Datatype: double
  • Default Value: 1.0

Timeout (seconds)¤

The timeout for a single API request in seconds.

  • ID: timeout
  • Datatype: double
  • Default Value: 300

Instruction Output Path¤

The entity path where the instruction result will be provided.

  • ID: instruction_output_path
  • Datatype: string
  • Default Value: _instruction_output

Messages Template¤

A list of messages comprising the conversation, compatible with the OpenAI chat completion API message object. Have a look at Message roles and instruction following to learn about the different priority levels given to messages with different roles.

  • ID: messages_template
  • Datatype: code-json
  • Default Value:
    [
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "{{ instruction_prompt }}"
        }
    ]
    

Consume all entities from additional input ports¤

If enabled, all entities from additional input ports will be consumed. Otherwise, only the first entity of each additional port will be used.

  • ID: consume_all_entities
  • Datatype: boolean
  • Default Value: false

Output Format¤

Specifies the format that the model must output. Possible values are:

  • TEXT - Standard text output
  • STRUCTURED_OUTPUT - Output follows a given schema. Add your schema as a Pydantic model in the parameter below.
  • JSON_MODE - A more basic version of the structured outputs feature, where you have to add your structure to the prompt template.

  • ID: output_format
  • Datatype: enumeration
  • Default Value: TEXT

Pydantic Schema¤

The Pydantic schema definition with a mandatory class named StructuredOutput(BaseModel). This is only used in combination with the Structured Output format. A schema may have up to 100 object properties total, with up to 5 levels of nesting. The total string length of all property names, definition names, enum values, and const values cannot exceed 15,000 characters.

  • ID: pydantic_schema
  • Datatype: code-python
  • Default Value:
    from pydantic import BaseModel
    
    class StructuredOutput(BaseModel):
        title: str
        abstract: str
        keywords: list[str]
    

Raise on API errors¤

How to react to API errors. When enabled, any API error will cause the workflow to stop with an exception. When disabled, API errors are logged and the error message is written to the entity output, allowing the workflow to continue processing other entities.

  • ID: raise_on_error
  • Datatype: boolean
  • Default Value: true

Maximum Concurrent Requests¤

Maximum number of concurrent API requests to prevent rate limiting and resource exhaustion.

  • ID: max_concurrent_requests
  • Datatype: Long
  • Default Value: 10

Batch Size¤

Number of entities to process in each batch for memory optimization.

  • ID: batch_size
  • Datatype: Long
  • Default Value: 100

Request Delay (seconds)¤

Delay between API requests in seconds to respect rate limits.

  • ID: request_delay
  • Datatype: double
  • Default Value: 0.0
