Split file
Python Plugin
This operator is part of a Python Plugin Package. In order to use it, you need to install it, e.g. with cmemc.
A task that splits a text file into multiple parts of a specified size.
Options
Input filename
The input file to be split.
Example: An input file with the name input.nt will be split into files with the names input_000000001.nt, input_000000002.nt, input_000000003.nt, etc.
⚠️ Existing files will be overwritten!
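The naming scheme in the example can be reproduced with a few lines of Python. This is a minimal sketch, not the plugin's own code: the helper name chunk_filename is hypothetical, and it assumes the index is 1-based, zero-padded to nine digits, and inserted before the last extension only (so data.tar.gz would become data.tar_000000001.gz).

```python
from pathlib import PurePosixPath


def chunk_filename(input_name: str, index: int) -> str:
    """Return the name of the chunk with the given 1-based index (hypothetical helper).

    An underscore and the index, zero-padded to nine digits, are inserted
    between the file stem and its (last) extension; the parent directory is kept.
    """
    path = PurePosixPath(input_name)
    return str(path.with_name(f"{path.stem}_{index:09d}{path.suffix}"))
```

For example, chunk_filename("input.nt", 2) yields input_000000002.nt, matching the second file in the example above.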
Chunk size
The maximum size of each chunk file, expressed in the unit selected under “Size unit”.
Size unit
The unit of the size value: kilobyte (KB), megabyte (MB), gigabyte (GB), or number of lines (Lines).
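For the byte-based units, "Chunk size" and "Size unit" together define a byte limit per chunk file. The page does not state whether KB means 1,000 or 1,024 bytes, nor whether lines are kept intact when splitting by size, so the sketch below assumes binary multiples (1 KB = 1024 bytes) and whole-line chunks. The names UNIT_FACTORS, max_chunk_bytes, and split_by_bytes are hypothetical.

```python
# Assumption: binary multiples; the plugin may use decimal (1 KB = 1000 bytes) instead.
UNIT_FACTORS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}


def max_chunk_bytes(chunk_size: float, size_unit: str) -> int:
    """Convert the "Chunk size" and "Size unit" options into a byte limit."""
    if size_unit not in UNIT_FACTORS:
        # "Lines" counts lines, not bytes, so it has no byte limit.
        raise ValueError(f"not a byte-based unit: {size_unit!r}")
    return int(chunk_size * UNIT_FACTORS[size_unit])


def split_by_bytes(lines: list[bytes], limit: int) -> list[list[bytes]]:
    """Group whole lines into chunks whose total size does not exceed limit.

    Assumption: lines are never cut, so a single line longer than the
    limit ends up in a chunk of its own.
    """
    chunks: list[list[bytes]] = []
    current: list[bytes] = []
    size = 0
    for line in lines:
        # Start a new chunk if adding this line would exceed the limit.
        if current and size + len(line) > limit:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks
```

Under these assumptions, a chunk size of 1.5 with unit KB gives a limit of 1536 bytes.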
Delete input file
Delete the input file after splitting.
Include header
Include the header in each chunk file. The first line of the input file is treated as the header.
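The interaction of "Include header" and "Delete input file" with the Lines unit can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: the function name split_by_lines is hypothetical, the header is assumed not to count toward the line limit, and the whole file is read into memory for brevity (a production version would stream it).

```python
from pathlib import Path


def split_by_lines(input_path: str, lines_per_chunk: int,
                   include_header: bool = False,
                   delete_input: bool = False) -> list[Path]:
    """Split a text file into chunk files of at most lines_per_chunk data lines.

    If include_header is True, the first line is treated as the header and is
    written at the top of every chunk (assumed not to count toward the limit).
    Returns the paths of the written chunk files.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    src = Path(input_path)
    with src.open("r", encoding="utf-8") as f:
        lines = f.readlines()
    header: list[str] = []
    if include_header and lines:
        header, lines = lines[:1], lines[1:]
    written: list[Path] = []
    for start in range(0, len(lines), lines_per_chunk):
        index = start // lines_per_chunk + 1
        # input.nt -> input_000000001.nt, input_000000002.nt, ...
        target = src.with_name(f"{src.stem}_{index:09d}{src.suffix}")
        # Mode "w" overwrites an existing file with the same name.
        with target.open("w", encoding="utf-8") as out:
            out.writelines(header + lines[start:start + lines_per_chunk])
        written.append(target)
    if delete_input:
        src.unlink()
    return written
```

With a file containing a header line and three data lines, a limit of 2 produces two chunk files, each starting with the header.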
Use internal projects directory
Use the internal projects directory of DataIntegration to fetch and store files, instead of using the API. If enabled, the “Internal projects directory” parameter has to be set.
Internal projects directory
The path to the internal projects directory. If “Use internal projects directory” is disabled, this parameter has no effect.
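The dependency between these two options amounts to a simple validation rule. The sketch below is hypothetical (the function resolve_storage is not part of the plugin): it returns the directory to use when the option is enabled, returns None to signal that the API should be used otherwise, and rejects an enabled option with an empty path.

```python
from pathlib import Path
from typing import Optional


def resolve_storage(use_directory: bool, projects_path: Optional[str]) -> Optional[Path]:
    """Decide where files are fetched from and stored (hypothetical helper).

    Returns the internal projects directory when "Use internal projects
    directory" is enabled, or None to signal that the API should be used.
    """
    if not use_directory:
        # The directory parameter has no effect in this case.
        return None
    if not projects_path:
        raise ValueError('"Internal projects directory" has to be set when '
                         '"Use internal projects directory" is enabled')
    return Path(projects_path)
```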
Parameter
Input filename
The input file to be split.
- Datatype: string
- Default Value: None
Chunk size
The maximum size of the chunk files.
- Datatype: double
- Default Value: None
Size unit
The unit of the size value: kilobyte (KB), megabyte (MB), gigabyte (GB), or number of lines (Lines).
- Datatype: string
- Default Value: MB
Include header
Include the header in each chunk file. The first line of the input file is treated as the header.
- Datatype: boolean
- Default Value: false
Delete input file
Delete the input file after splitting.
- Datatype: boolean
- Default Value: false
Use internal projects directory
Use the internal projects directory of DataIntegration to fetch and store files, instead of using the API. If enabled, the “Internal projects directory” parameter has to be set.
- Datatype: boolean
- Default Value: false
Internal projects directory
The path to the internal projects directory. If “Use internal projects directory” is disabled, this parameter has no effect.
- Datatype: string
- Default Value: /data/datalake
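The parameter table maps naturally onto a typed configuration object. The dataclass below is a hypothetical mirror of the table, with datatypes and defaults taken from it; the field names and the validation in __post_init__ (allowed units, positive chunk size) are assumptions, not the plugin's actual parameter identifiers or checks.

```python
from dataclasses import dataclass
from typing import Optional

VALID_UNITS = ("KB", "MB", "GB", "Lines")


@dataclass
class SplitFileParameters:
    """Hypothetical container mirroring the parameter table and its defaults."""
    input_filename: Optional[str] = None   # string, default None
    chunk_size: Optional[float] = None     # double, default None
    size_unit: str = "MB"                  # string, default MB
    include_header: bool = False           # boolean, default false
    delete_file: bool = False              # boolean, default false
    use_directory: bool = False            # boolean, default false
    projects_path: str = "/data/datalake"  # string, default /data/datalake

    def __post_init__(self) -> None:
        # Assumed checks: only the four documented units, and a positive size.
        if self.size_unit not in VALID_UNITS:
            raise ValueError(f"unknown size unit: {self.size_unit!r}")
        if self.chunk_size is not None and self.chunk_size <= 0:
            raise ValueError("chunk size must be positive")
```

Usage: SplitFileParameters(input_filename="input.nt", chunk_size=100, size_unit="Lines") leaves every other option at its documented default.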