Split file¤

Python Plugin

This operator is part of a Python Plugin Package. To use it, you must first install the package, for example with cmemc.

A task that splits a text file into multiple parts of a specified maximum size.

Options¤

Input filename¤

The input file to be split.
Example: An input file with the name input.nt will be split into files with the names input_000000001.nt, input_000000002.nt, input_000000003.nt, etc.
⚠️ Existing files will be overwritten!
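The naming scheme from the example can be reproduced with a short sketch. The helper `chunk_name` is hypothetical and not part of the plugin; it only assumes what the example shows, namely a nine-digit, zero-padded counter inserted before the file extension.

```python
from pathlib import Path


def chunk_name(input_name: str, index: int) -> str:
    """Return the file name of the chunk with the given 1-based index.

    Hypothetical helper: mirrors the pattern input_000000001.nt from the
    example above. Only the file name is returned, without any directory.
    """
    path = Path(input_name)
    # Nine-digit, zero-padded counter placed between stem and extension.
    return f"{path.stem}_{index:09d}{path.suffix}"


print([chunk_name("input.nt", i) for i in range(1, 4)])
# ['input_000000001.nt', 'input_000000002.nt', 'input_000000003.nt']
```

Because the chunk names are derived only from the input name and a counter, running the task again on the same input produces the same names, which is why existing files are overwritten.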

Chunk size¤

The maximum size of each chunk file, expressed in the unit selected under “Size unit”.

Size unit¤

The unit of the size value: kilobyte (KB), megabyte (MB), gigabyte (GB), or number of lines (Lines).
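As a rough illustration of how “Chunk size” and “Size unit” combine, the sketch below converts a size value into a byte limit. The function `max_bytes` and the table `UNIT_FACTORS` are hypothetical, and the use of binary multiples (1 KB = 1024 bytes) is an assumption: this page does not state whether the plugin uses 1024 or 1000 as the base.

```python
# Assumption: binary multiples. The plugin may use decimal multiples instead.
UNIT_FACTORS = {
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
}


def max_bytes(chunk_size: float, size_unit: str) -> int:
    """Convert a chunk size and unit into a maximum number of bytes.

    Hypothetical helper. The unit "Lines" is a line count, not a byte
    size, so it is rejected here.
    """
    if size_unit == "Lines":
        raise ValueError("'Lines' is a line count and has no byte equivalent")
    return int(chunk_size * UNIT_FACTORS[size_unit])


print(max_bytes(1.5, "MB"))
```

Fractional values such as 1.5 are possible because the chunk size is a floating-point number.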

Delete input file¤

Delete the input file after splitting.

Include header¤

Include the header in each split. The first line of the input file is treated as the header.
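The effect of this option is easiest to see when splitting by number of lines. The following is a minimal sketch, not the plugin's implementation: `split_lines` is a hypothetical function, and it assumes that the repeated header line does not count towards the chunk size, which this page does not specify.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_lines(
    lines: Iterable[str], chunk_lines: int, include_header: bool = False
) -> Iterator[List[str]]:
    """Yield chunks of at most chunk_lines lines.

    If include_header is true, the first input line is treated as the
    header and prepended to every chunk. Assumption: the header is not
    counted towards chunk_lines.
    """
    it = iter(lines)
    # Take the first line as the header (empty list if the input is empty).
    header = list(islice(it, 1)) if include_header else []
    while True:
        body = list(islice(it, chunk_lines))
        if not body:
            break
        yield header + body


rows = ["id,name", "1,a", "2,b", "3,c"]
print(list(split_lines(rows, 2, include_header=True)))
# [['id,name', '1,a', '2,b'], ['id,name', '3,c']]
print(list(split_lines(rows, 2, include_header=False)))
# [['id,name', '1,a'], ['2,b', '3,c']]
```

With the option disabled, the first line is treated like any other line and appears only in the first chunk.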

Use internal projects directory¤

Use the internal projects directory of DataIntegration to fetch and store files, instead of using the API. If enabled, the “Internal projects directory” parameter has to be set.

Internal projects directory¤

The path to the internal projects directory. If “Use internal projects directory” is disabled, this parameter has no effect.

Parameter¤

Input filename¤

The input file to be split.

  • Datatype: string
  • Default Value: None

Chunk size¤

The maximum size of each chunk file, expressed in the unit selected under “Size unit”.

  • Datatype: double
  • Default Value: None

Size unit¤

The unit of the size value: kilobyte (KB), megabyte (MB), gigabyte (GB), or number of lines (Lines).

  • Datatype: string
  • Default Value: MB

Include header¤

Include the header in each split. The first line of the input file is treated as the header.

  • Datatype: boolean
  • Default Value: false

Delete input file¤

Delete the input file after splitting.

  • Datatype: boolean
  • Default Value: false

Use internal projects directory¤

Use the internal projects directory of DataIntegration to fetch and store files, instead of using the API. If enabled, the “Internal projects directory” parameter has to be set.

  • Datatype: boolean
  • Default Value: false

Internal projects directory¤

The path to the internal projects directory. If “Use internal projects directory” is disabled, this parameter has no effect.

  • Datatype: string
  • Default Value: /data/datalake
