Transformations

Transformations are the basic building blocks of ETL processes. Each transformation reads rows from its input and sends transformed rows to its output.

You’re highly encouraged to use the rdc.etl.transform.Transform class as a base for your custom transforms, as it defines the whole I/O logic. All transformations provided by the package are subclasses of rdc.etl.transform.Transform.

class rdc.etl.transform.Transform(transform=None, input_channels=None, output_channels=None)

Base class and decorator for transformations.

transform(hash, channel=0)

Core transformation method that will be called for each input data row.

INPUT_CHANNELS

List of input channel names.

OUTPUT_CHANNELS

List of output channel names.

Example:

>>> @Transform
... def my_transform(hash, channel=STDIN):
...     yield hash.copy({'foo': hash['foo'].upper()})

>>> print list(my_transform(
...         H(('foo', 'bar'), ('bar', 'alpha')),
...         H(('foo', 'baz'), ('bar', 'omega')),
...     ))
[H{'foo': 'BAR', 'bar': 'alpha'}, H{'foo': 'BAZ', 'bar': 'omega'}]
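The example above uses the decorator form. Since subclassing is the encouraged route for custom transforms, here is a rough sketch of the same logic written as a subclass. To stay runnable without the package installed, it defines a deliberately simplified stand-in named Transform and uses plain dicts in place of H; both are assumptions made for illustration, and the real rdc.etl.transform.Transform does much more (channels and the I/O logic in particular).

```python
# Simplified stand-in for rdc.etl.transform.Transform, for illustration only.
# It is NOT the library's implementation: channels and I/O are left out.
class Transform(object):
    def __init__(self, transform=None):
        # Decorator form: the wrapped callable replaces transform().
        if transform is not None:
            self.transform = transform

    def transform(self, row, channel=0):
        # Subclasses override this and yield zero or more output rows.
        raise NotImplementedError

    def __call__(self, *rows):
        # Feed each input row to transform() and pass on whatever it yields.
        for row in rows:
            for out in self.transform(row):
                yield out


class UppercaseFoo(Transform):
    """Hypothetical custom transform: upper-cases the 'foo' field."""

    def transform(self, row, channel=0):
        out = dict(row)  # copy, so the input row is left untouched
        out['foo'] = out['foo'].upper()
        yield out


rows = [
    {'foo': 'bar', 'bar': 'alpha'},
    {'foo': 'baz', 'bar': 'omega'},
]
result = list(UppercaseFoo()(*rows))
print(result)
```

With the real base class, only the subclass body should be needed; the stand-in exists purely so the sketch runs on its own.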

Builtin transformations reference

Design notes