Kickstart

To get started, you should also read the pragmatic examples in the Cookbook.

Create an empty project

If you want to bootstrap an ETL project on your computer, you can do so with the provided PasteScript template.

pip install PasteScript
paster create -t etl_project MyProject

Overview of concepts

Extract

Extract is a flexible base class for writing extract transformations. We use a generator here; real-life extracts would usually read from databases, web services, files, and so on.

from rdc.etl.transform.extract import Extract

@Extract
def my_extract():
    yield {'foo': 'bar', 'bar': 'min'}
    yield {'foo': 'boo', 'bar': 'put'}
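
As a rough illustration of the "files" case, the sketch below yields the same two rows from CSV data instead of hard-coded dicts. It uses only the standard library and is not wired into rdc.etl; rows_from_csv and SAMPLE_CSV are hypothetical names, and the in-memory stream stands in for a real file.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a real file on disk.
SAMPLE_CSV = "foo,bar\nbar,min\nboo,put\n"


def rows_from_csv(stream):
    """Yield one dict per CSV record, keyed by the header row."""
    for record in csv.DictReader(stream):
        yield dict(record)


rows = list(rows_from_csv(io.StringIO(SAMPLE_CSV)))
```

A generator body of this shape is the kind of function the decorator above is applied to; only the source of the rows changes.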

For more information, see the extracts reference.

Transform

Transform is a flexible base class for all kinds of transformations.

from rdc.etl.transform import Transform

@Transform
def my_transform(hash, channel):
    yield hash.update({
        'foo': hash['foo'].upper()
    })
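
The effect on a single row can be checked without the library. This is a minimal sketch using a plain dict; upper_foo is a hypothetical helper, not part of rdc.etl. A built-in dict's update() returns None, so the sketch returns the dict itself. The example above presumably relies on the library's hash object returning itself from update(), which is an assumption not verified here.

```python
def upper_foo(row):
    """Return a copy of ``row`` with the 'foo' value upper-cased."""
    updated = dict(row)  # copy, so the caller's dict is left untouched
    updated.update({'foo': row['foo'].upper()})  # dict.update() returns None
    return updated


result = upper_foo({'foo': 'bar', 'bar': 'min'})
```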

For more information, see the transformations reference.

Load

We’ll use the screen as our load target ...

from rdc.etl.transform.util import Log

my_load = Log()

For more information, see the loads reference.

Note

Log is not a “load” transformation stricto sensu (it acts as an identity transformation, sending whatever comes in on its default input channel to its default output channel), but we’ll use it as such for demonstration purposes.

Run

Let’s create a Job. It will be used to:

  • Connect transformations
  • Manage threads
  • Monitor execution

from rdc.etl.job import Job

job = Job()

The Job has an add_chain() method that can be used to easily plug an ordered list of transformations together.

job.add_chain(my_extract, my_transform, my_load)

Our job is ready; you can now run it.

job()
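
To make the data flow through such a chain concrete, here is a single-threaded sketch built from plain generators. It is not how Job is implemented (Job also manages threads and monitoring); chain, extract, transform and log are hypothetical names used only for this illustration.

```python
def extract():
    yield {'foo': 'bar', 'bar': 'min'}
    yield {'foo': 'boo', 'bar': 'put'}


def transform(rows):
    # Upper-case the 'foo' value of every row.
    for row in rows:
        yield dict(row, foo=row['foo'].upper())


def log(rows, seen):
    # Identity stage: record each row, then pass it on unchanged.
    for row in rows:
        seen.append(row)
        yield row


def chain(source, *stages):
    """Feed the output of each stage into the next, in order."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


seen = []
output = list(chain(extract(), transform, lambda rows: log(rows, seen)))
```

Each stage consumes the rows produced by the previous one, which is the ordering that the list of transformations passed to add_chain() expresses.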

For more information, see the jobs documentation.