Utilities

Helper and utility transformations.

Log

class rdc.etl.transform.util.Log(field_filter=None, condition=None, clean=None)

Identity transform with a console output side effect: rows are passed on unchanged and also printed, so you can watch what is going through the Queues at a given point of an ETL process.
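To make the "identity plus side effect" idea concrete, here is a minimal sketch in plain Python. It assumes rows are plain dicts and that a transform can be modelled as a generator over rows; `log_rows` is a hypothetical name and this is not rdc.etl's Transform API. The real Log also accepts `field_filter`, `condition` and `clean`, which are not modelled here.

```python
import io
from typing import Iterable, Iterator, Optional, TextIO


def log_rows(rows: Iterable[dict], out: Optional[TextIO] = None) -> Iterator[dict]:
    """Hypothetical identity transform: print each row, then pass it on unchanged."""
    for row in rows:
        print(row, file=out)  # side effect only; file=None means the current sys.stdout
        yield row             # identity: the very same row object continues downstream


# Usage: capture the log in a buffer instead of the console.
buf = io.StringIO()
rows = [{"name": "alice"}, {"name": "bob"}]
passed = list(log_rows(rows, out=buf))
```

Because the generator yields each input object untouched, inserting it anywhere in a pipeline changes nothing downstream; only the printed output is new.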

Stop

class rdc.etl.transform.util.Stop(transform=None, input_channels=None, output_channels=None)

Sink transform that stops everything coming through the pipes: the rows it receives are not passed on.
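A sink can be sketched, under the same assumption that rows are plain dicts flowing through generators, as a function that drains its input and yields nothing. `stop_rows` and `source` are hypothetical names for illustration only; rdc.etl's Stop may be implemented differently.

```python
from typing import Iterable, Iterator


def stop_rows(rows: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical sink: consume every input row and emit nothing."""
    for _ in rows:
        pass  # the row is swallowed; nothing is forwarded
    yield from ()  # makes this a generator that produces no rows


seen = []


def source() -> Iterator[dict]:
    """Hypothetical upstream step that records which rows were pulled."""
    for i in range(3):
        seen.append(i)
        yield {"id": i}


out = list(stop_rows(source()))
```

Upstream rows are still pulled (so earlier steps run to completion), but the downstream side receives an empty stream.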

Override

class rdc.etl.transform.util.Override(override_data=None)

Simple transform that overwrites some field values with constant values provided in a Hash.
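The effect of Override on a single row can be sketched with plain dicts standing in for Hash objects. `override_rows` is a hypothetical helper, and it assumes that keys present in `override_data` but missing from a row are added rather than ignored; the library's exact behaviour for missing keys is not stated above.

```python
from typing import Any, Iterable, Iterator, Mapping


def override_rows(rows: Iterable[dict], override_data: Mapping[str, Any]) -> Iterator[dict]:
    """Hypothetical Override: set the given fields to constant values in every row."""
    for row in rows:
        out = dict(row)            # copy so the input row is left untouched
        out.update(override_data)  # constants replace any existing values
        yield out


rows = [{"name": "alice", "country": "fr"}, {"name": "bob", "country": "de"}]
result = list(override_rows(rows, {"country": "us"}))
# → [{'name': 'alice', 'country': 'us'}, {'name': 'bob', 'country': 'us'}]
```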

Clean

class rdc.etl.transform.util.Clean(transform=None, input_channels=None, output_channels=None)

Removes all fields whose keys start with an underscore (_).
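The filtering rule is a one-line dict comprehension. This sketch uses a plain dict in place of a Hash, and `clean_row` is a hypothetical name, not part of rdc.etl.

```python
def clean_row(row: dict) -> dict:
    """Hypothetical Clean: drop every field whose key starts with an underscore."""
    return {key: value for key, value in row.items() if not str(key).startswith("_")}


cleaned = clean_row({"id": 1, "_tmp": "x", "__meta": {}, "name": "widget"})
# → {'id': 1, 'name': 'widget'}
```

This makes the underscore prefix a convenient convention for temporary fields used by earlier steps and dropped before output.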

SimpleTransform

class rdc.etl.extra.simple.SimpleTransform(*filters)

SimpleTransform is an attempt to make trivial transformations easy to build, using a fluent API and many shortcuts for applying filters to individual fields.

The API is not stable and this will probably go into an “extra” module later.

Example:

>>> from rdc.etl.extra.simple import SimpleTransform
>>> t = SimpleTransform()

Apply the “upper” method to the “name” field and store the result back in the “name” field.

>>> t.add('name').filter('upper') 
<rdc.etl.extra.simple._SimpleItemTransformationDescriptor object at ...>

Apply the lambda to the content of the “description” field and store the result in the “full_description” field.

>>> t.add('full_description', 'description').filter(lambda v: 'Description: ' + v) 
<rdc.etl.extra.simple._SimpleItemTransformationDescriptor object at ...>
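The fluent style works because `add()` returns a per-field descriptor whose `filter()` returns the descriptor itself, so calls can be chained. The sketch below is a hypothetical mini re-implementation (`MiniSimpleTransform` and `FieldDescriptor` are invented names), not rdc.etl's code; it assumes a string filter names a method of the value and that fields not mentioned pass through unchanged.

```python
from typing import Callable, Dict, List, Optional, Union

Filter = Union[str, Callable]


class FieldDescriptor:
    """Hypothetical per-field descriptor: a source field plus a chain of filters."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.filters: List[Filter] = []

    def filter(self, f: Filter) -> "FieldDescriptor":
        self.filters.append(f)
        return self  # returning self is what makes the calls chainable

    def compute(self, row: dict):
        value = row[self.source]
        for f in self.filters:
            # A string names a method of the value (e.g. 'upper'); otherwise call f.
            value = getattr(value, f)() if isinstance(f, str) else f(value)
        return value


class MiniSimpleTransform:
    """Hypothetical fluent builder; not the library's actual implementation."""

    def __init__(self) -> None:
        self.fields: Dict[str, FieldDescriptor] = {}

    def add(self, target: str, source: Optional[str] = None) -> FieldDescriptor:
        # With one argument, the field is read from and written back to the same key.
        descriptor = FieldDescriptor(source if source is not None else target)
        self.fields[target] = descriptor
        return descriptor

    def __call__(self, row: dict) -> dict:
        out = dict(row)  # assumption: unmentioned fields pass through unchanged
        for target, descriptor in self.fields.items():
            out[target] = descriptor.compute(row)
        return out


t = MiniSimpleTransform()
t.add('name').filter('upper')
t.add('full_description', 'description').filter(lambda v: 'Description: ' + v)
result = t({'name': 'widget', 'description': 'a small part'})
```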

Remove the previously defined “useless” descriptor. This does not remove the “useless” field from transformed hashes; it is only useful to override a definition inherited from a parent transform.

>>> t.useless = 'foo'
>>> t.delete('useless')

Mark the “notanymore” field for deletion upon transform. Output hashes will no longer contain this field.

>>> t.remove('notanymore')

Add a constant field: every output hash will contain “test_field”, all with the same “foo bar” value.

>>> t.test_field = 'foo bar'
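The difference between deleting a descriptor, removing a field and adding a constant field can be sketched on plain dicts. `apply_spec`, `constants` and `removed` are hypothetical names chosen to mirror the example above; they are an illustration of the described semantics, not rdc.etl internals.

```python
from typing import Any, Dict, Set


def apply_spec(row: dict, constants: Dict[str, Any], removed: Set[str]) -> dict:
    """Hypothetical: strip fields marked for removal, then add constant fields."""
    out = {k: v for k, v in row.items() if k not in removed}
    out.update(constants)  # every output row gets the same constant values
    return out


constants = {'useless': 'foo', 'test_field': 'foo bar'}
del constants['useless']   # like delete(): forget the descriptor only
removed = {'notanymore'}   # like remove(): strip the field from output rows

row = {'name': 'widget', 'useless': 'kept', 'notanymore': 42}
out = apply_spec(row, constants, removed)
# → {'name': 'widget', 'useless': 'kept', 'test_field': 'foo bar'}
```

Note that the incoming “useless” value survives: deleting the descriptor only stops the transform from setting it, whereas marking “notanymore” for removal strips it from every output row.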