Version: 1.2.0 (latest)

extract.resource

with_table_name

def with_table_name(item: TDataItems, table_name: str) -> DataItemWithMeta

[view_source]

Marks item to be dispatched to table table_name when yielded from resource function.

with_hints

def with_hints(item: TDataItems,
hints: TResourceHints,
create_table_variant: bool = False) -> DataItemWithMeta

[view_source]

Marks item to update the resource with specified hints.

Will create a separate variant of hints for a table if name is provided in hints and create_table_variant is set.

Create TResourceHints with make_hints. Setting table_name will dispatch the item to a specified table, like with_table_name

DltResource Objects

class DltResource(Iterable[TDataItem], DltResourceHints)

[view_source]

Implements dlt resource. Contains a data pipe that wraps a generating item and table schema that can be adjusted

source_name

Name of the source that contains this instance of the resource, set when added to DltResourcesDict

section

A config section name

SPEC

A SPEC that defines the signature of a callable (parametrized) resource/transformer

from_data

@classmethod
def from_data(cls,
data: Any,
name: str = None,
section: str = None,
hints: TResourceHints = None,
selected: bool = True,
data_from: Union["DltResource", Pipe] = None,
inject_config: bool = False) -> Self

[view_source]

Creates an instance of DltResource from compatible data with a given name and section.

Internally (in the most common case) a new instance of Pipe with name is created from data and optionally connected to an existing pipe data_from to form a transformer (dependent resource).

If inject_config is set to True and data is a callable, the callable is wrapped in incremental and config injection wrappers.

name

@property
def name() -> str

[view_source]

Resource name inherited from the pipe

with_name

def with_name(new_name: str) -> TDltResourceImpl

[view_source]

Clones the resource with a new name. Such a resource keeps separate state and loads data to the new_name table by default.

is_transformer

@property
def is_transformer() -> bool

[view_source]

Checks if the resource is a transformer that takes data from another resource

requires_args

@property
def requires_args() -> bool

[view_source]

Checks if resource has unbound arguments

incremental

@property
def incremental() -> IncrementalResourceWrapper

[view_source]

Gets incremental transform if it is in the pipe

validator

@property
def validator() -> Optional[ValidateItem]

[view_source]

Gets validator transform if it is in the pipe

validator

@validator.setter
def validator(validator: Optional[ValidateItem]) -> None

[view_source]

Add/remove or replace the validator in pipe

max_table_nesting

@property
def max_table_nesting() -> Optional[int]

[view_source]

A schema hint for resource that sets the maximum depth of nested table above which the remaining nodes are loaded as structs or JSON.

pipe_data_from

def pipe_data_from(data_from: Union[TDltResourceImpl, Pipe]) -> None

[view_source]

Replaces the parent in the transformer resource pipe from which the data is piped.

add_pipe

def add_pipe(data: Any) -> None

[view_source]

Creates additional pipe for the resource from the specified data

select_tables

def select_tables(*table_names: Iterable[str]) -> TDltResourceImpl

[view_source]

For resources that dynamically dispatch data to several tables, selects the tables that will receive data, effectively filtering out other data items.

Both with_table_name marker and data-based (function) table name hints are supported.
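The filtering effect described above can be pictured with a small stand-alone model. It does not use dlt and is not how dlt implements it: Marked and keep_tables are hypothetical names that only mimic items carrying a table name marker and a selection step that drops items bound for other tables.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass
class Marked:
    """Hypothetical stand-in for a data item marked with a target table name."""
    item: Any
    table_name: str


def keep_tables(items: Iterable[Marked], *table_names: str) -> Iterator[Marked]:
    """Toy model of table selection: pass through only items bound for the named tables."""
    wanted = set(table_names)
    for marked in items:
        if marked.table_name in wanted:
            yield marked


events = [
    Marked({"id": 1}, "issues"),
    Marked({"id": 2}, "pull_requests"),
    Marked({"id": 3}, "issues"),
]

# Items dispatched to "pull_requests" are filtered out.
selected_ids = [m.item["id"] for m in keep_tables(events, "issues")]
```

In this model the item with id 2 never reaches a table, which is the "filtering out other data items" effect the description mentions.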

add_map

def add_map(item_map: ItemTransformFunc[TDataItem],
insert_at: int = None) -> TDltResourceImpl

[view_source]

Adds the mapping function defined in item_map to the resource pipe at position insert_at.

item_map receives single data items; dlt will enumerate any lists of data items automatically.

Arguments:

  • item_map ItemTransformFunc[TDataItem] - A function taking a single data item and optional meta argument. Returns transformed data item.
  • insert_at int, optional - At which step in pipe to insert the mapping. Defaults to None which inserts after last step

Returns:

  • "DltResource" - returns self
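The "single data items" rule for add_map can be illustrated with a stand-alone sketch. It is a simplified model, not dlt code: apply_map and add_full_name are hypothetical helpers, and the optional meta argument is left out.

```python
from typing import Any, Callable, Iterable, Iterator


def apply_map(pages: Iterable[Any], item_map: Callable[[Any], Any]) -> Iterator[Any]:
    """Toy model of a map step: item_map is called once per single data item."""
    for page in pages:
        if isinstance(page, list):
            # A yielded list is enumerated, so item_map never sees the list itself.
            yield [item_map(item) for item in page]
        else:
            yield item_map(page)


def add_full_name(item: dict) -> dict:
    # Returns the transformed data item; the input dict is left untouched.
    return {**item, "full_name": f"{item['first']} {item['last']}"}


pages = [
    {"first": "Ada", "last": "Lovelace"},
    [{"first": "Alan", "last": "Turing"}, {"first": "Grace", "last": "Hopper"}],
]
mapped = list(apply_map(pages, add_full_name))
```

The mapping function is written for one dict at a time, yet it is applied both to the lone item and to every element of the yielded list.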

add_yield_map

def add_yield_map(item_map: ItemTransformFunc[Iterator[TDataItem]],
insert_at: int = None) -> TDltResourceImpl

[view_source]

Adds the generating function defined in item_map to the resource pipe at position insert_at.

item_map receives single data items; dlt will enumerate any lists of data items automatically. It may yield 0 or more data items and can be used, for example, to pivot an item into a sequence of rows.

Arguments:

  • item_map ItemTransformFunc[Iterator[TDataItem]] - A function taking a single data item and optional meta argument. Yields 0 or more data items.
  • insert_at int, optional - At which step in pipe to insert the generator. Defaults to None which inserts after last step

Returns:

  • "DltResource" - returns self
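As a rough illustration of the pivot use case for add_yield_map, the sketch below models a step whose function yields zero or more rows per input item. It is plain Python rather than dlt's implementation; apply_yield_map, pivot_scores and the sample data are assumptions made for the example.

```python
from typing import Any, Callable, Iterable, Iterator


def apply_yield_map(
    items: Iterable[Any], item_map: Callable[[Any], Iterator[Any]]
) -> Iterator[Any]:
    """Toy model of a yield-map step: each input item may produce 0..n outputs."""
    for item in items:
        yield from item_map(item)


def pivot_scores(item: dict) -> Iterator[dict]:
    # One output row per (subject, score) pair; an empty "scores" dict yields nothing.
    for subject, score in item["scores"].items():
        yield {"student": item["student"], "subject": subject, "score": score}


items = [
    {"student": "ada", "scores": {"math": 9, "physics": 8}},
    {"student": "bob", "scores": {}},
]
rows = list(apply_yield_map(items, pivot_scores))
```

Here the first item expands into two rows and the second into none, so the output length differs from the input length, which a plain map step cannot express.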

add_filter

def add_filter(item_filter: ItemTransformFunc[bool],
insert_at: int = None) -> TDltResourceImpl

[view_source]

Adds the filter defined in item_filter to the resource pipe at position insert_at.

item_filter receives single data items; dlt will enumerate any lists of data items automatically.

Arguments:

  • item_filter ItemTransformFunc[bool] - A function taking a single data item and optional meta argument. Returns bool. If True, item is kept
  • insert_at int, optional - At which step in pipe to insert the filter. Defaults to None which inserts after last step

Returns:

  • "DltResource" - returns self
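Why the position of a filter matters can be sketched with a toy list of steps. This is only a model under stated assumptions: add_step, run, double and keep_small are hypothetical, and the positions refer to the toy list, not to the actual layout of a dlt pipe.

```python
from typing import Callable, List, Optional

Step = Callable[[List[int]], List[int]]


def add_step(steps: List[Step], step: Step, insert_at: Optional[int] = None) -> None:
    """Toy model of insert_at: None places the step after the last one."""
    if insert_at is None:
        steps.append(step)
    else:
        steps.insert(insert_at, step)


def run(steps: List[Step], items: List[int]) -> List[int]:
    # Feed the output of each step into the next one, in list order.
    for step in steps:
        items = step(items)
    return items


def double(items: List[int]) -> List[int]:
    # Stands in for a mapping step.
    return [i * 2 for i in items]


def keep_small(items: List[int]) -> List[int]:
    # Stands in for a filter step: an item is kept when the predicate is True.
    return [i for i in items if i < 5]


filter_last: List[Step] = []
add_step(filter_last, double)
add_step(filter_last, keep_small)  # default: filter sees the doubled values

filter_first: List[Step] = []
add_step(filter_first, double)
add_step(filter_first, keep_small, insert_at=0)  # filter runs before the map

out_last = run(filter_last, [1, 2, 3, 4])
out_first = run(filter_first, [1, 2, 3, 4])
```

The same two steps give different results depending on order, which is the reason to pass insert_at explicitly when a filter should act on untransformed items.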

add_limit

def add_limit(max_items: int) -> TDltResourceImpl

[view_source]

Adds a limit max_items to the resource pipe.

This mutates the encapsulated generator to stop after max_items items are yielded. This is useful for testing and debugging.

Notes:

  1. Transformers won't be limited. They should process all the data they receive fully to avoid inconsistencies in generated datasets.
  2. Each yielded item may contain several records. add_limit only limits the "number of yields", not the total number of records.
  3. Async resources with a limit added may occasionally produce one item more than the limit on some runs. This behavior is not deterministic.

Arguments:

  • max_items int - The maximum number of items to yield

Returns:

  • "DltResource" - returns self
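Note 2 above is easy to miss, so here is a minimal stand-alone sketch of it. It models the limit with itertools.islice, which is an assumption made for illustration and not a statement about how dlt implements add_limit; pages and limit_yields are hypothetical names.

```python
from itertools import islice
from typing import Iterator, List


def pages() -> Iterator[List[int]]:
    # Each yield is one "item" that happens to carry three records.
    yield [1, 2, 3]
    yield [4, 5, 6]
    yield [7, 8, 9]


def limit_yields(gen: Iterator[List[int]], max_items: int) -> Iterator[List[int]]:
    """Toy model of a limit: stop after max_items yields, whatever their size."""
    return islice(gen, max_items)


limited = list(limit_yields(pages(), 2))
records = [record for page in limited for record in page]
```

With a limit of 2 the model stops after two yields but still lets six records through, so a limit chosen for testing should be sized with the page length in mind.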

parallelize

def parallelize() -> TDltResourceImpl

[view_source]

Wraps the resource to execute each item in a threadpool to allow multiple resources to extract in parallel.

The resource must be a generator or generator function or a transformer function.

bind

def bind(*args: Any, **kwargs: Any) -> TDltResourceImpl

[view_source]

Binds the parametrized resource to passed arguments. Modifies resource pipe in place. Does not evaluate generators or iterators.

args_bound

@property
def args_bound() -> bool

[view_source]

Returns true if the parameters of the resource are bound to values. Such a resource cannot be called again. Note that resources are lazily evaluated and arguments are only formally checked. Configuration has not been injected yet either.

explicit_args

@property
def explicit_args() -> StrAny

[view_source]

Returns a dictionary of arguments used to parametrize the resource. Does not include defaults and injected args.

state

@property
def state() -> StrAny

[view_source]

Gets resource-scoped state from the active pipeline. PipelineStateNotAvailable is raised if pipeline context is not available

__call__

def __call__(*args: Any, **kwargs: Any) -> TDltResourceImpl

[view_source]

Binds the parametrized resources to passed arguments. Creates and returns a bound resource. Generators and iterators are not evaluated.

__or__

def __or__(transform: Union["DltResource", AnyFun]) -> "DltResource"

[view_source]

Allows piping data across resources and transform functions with the | operator. This is the LEFT side OR, so self may be a resource or a transformer.

__ror__

def __ror__(data: Union[Iterable[Any], Iterator[Any]]) -> TDltResourceImpl

[view_source]

Allows piping data across resources and transform functions with the | operator. This is the RIGHT side OR, so self may not be a resource and the LEFT must be an object that does not implement |, e.g. a list.
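The split between the left side and right side operator follows Python's general rules for binary operators, which the stand-alone sketch below demonstrates. Stage is a hypothetical toy class, not DltResource, and it applies a plain function to every item.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Union


class Stage:
    """Toy pipeline stage: applies func to every item of its source."""

    def __init__(self, func: Callable[[Any], Any], source: Optional[Iterable[Any]] = None):
        self.func = func
        self.source = source

    def __or__(self, other: Union["Stage", Callable[[Any], Any]]) -> "Stage":
        # stage | other: self is the LEFT operand and becomes the source of other.
        func = other.func if isinstance(other, Stage) else other
        return Stage(func, source=self)

    def __ror__(self, data: Iterable[Any]) -> "Stage":
        # data | stage: list defines no | operator, so Python falls back to
        # the reflected method of the RIGHT operand.
        return Stage(self.func, source=data)

    def __iter__(self) -> Iterator[Any]:
        for item in self.source:
            yield self.func(item)


# The list on the left triggers __ror__; the resulting Stage on the left triggers __or__.
piped = [1, 2, 3] | Stage(lambda x: x + 1) | (lambda x: x * 10)
result = list(piped)
```

Nothing is computed until the final Stage is iterated, which loosely mirrors the lazy evaluation described for resources elsewhere on this page.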

__iter__

def __iter__() -> Iterator[TDataItem]

[view_source]

Opens iterator that yields the data items from the resources in the same order as in Pipeline class.

A read-only state is provided, initialized from active pipeline state. The state is discarded after the iterator is closed.
