Version: 1.2.0 (latest)

SQLAlchemy destination

The SQLAlchemy destination allows you to use any database that has an SQLAlchemy dialect implemented as a destination.

Currently, MySQL and SQLite are considered to have full support and are tested as part of the dlt CI suite. Other dialects are not tested but should generally work.

Install dlt with SQLAlchemy

Install dlt with the sqlalchemy extra dependency:

pip install "dlt[sqlalchemy]"

Note that database drivers are not included and need to be installed separately for the database you plan on using. For example, for MySQL:

pip install mysqlclient

Refer to the SQLAlchemy documentation on dialects for information about client libraries required for supported databases.

Create a pipeline

1. Initialize a project with a pipeline that loads to MS SQL by running:

dlt init chess sqlalchemy

2. Install the necessary dependencies for SQLAlchemy by running:

pip install -r requirements.txt

or run:

pip install "dlt[sqlalchemy]"

3. Install your database client library.

E.g., for MySQL:

pip install mysqlclient

4. Enter your credentials into .dlt/secrets.toml.

For example, replace with your database connection info:

[destination.sqlalchemy.credentials]
database = "dlt_data"
username = "loader"
password = "<password>"
host = "localhost"
port = 3306
driver_name = "mysql"

Alternatively, a valid SQLAlchemy database URL can be used, either in secrets.toml or as an environment variable. E.g.

[destination.sqlalchemy]
credentials = "mysql://loader:<password>@localhost:3306/dlt_data"

export DESTINATION__SQLALCHEMY__CREDENTIALS="mysql://loader:<password>@localhost:3306/dlt_data"

An SQLAlchemy Engine can also be passed directly by creating an instance of the destination:

import sqlalchemy as sa
import dlt

engine = sa.create_engine('sqlite:///chess_data.db')

pipeline = dlt.pipeline(
    pipeline_name='chess',
    destination=dlt.destinations.sqlalchemy(engine),
    dataset_name='main'
)

Notes on SQLite

Dataset files

When using an SQLite database file, each dataset is stored in a separate file since SQLite does not support multiple schemas in a single database file. Under the hood, this uses ATTACH DATABASE.

The file is stored in the same directory as the main database file (provided by your database URL).

E.g., if your SQLite URL is sqlite:////home/me/data/chess_data.db and your dataset_name is games, the data is stored in /home/me/data/chess_data__games.db

Note: If the dataset name is main, no additional file is created as this is the default SQLite database.

In-memory databases

In-memory databases require a persistent connection as the database is destroyed when the connection is closed. Normally, connections are opened and closed for each load job and in other stages during the pipeline run. To ensure the database persists throughout the pipeline run, you need to pass in an SQLAlchemy Engine object instead of credentials. This engine is not disposed of automatically by dlt. Example:

import dlt
import sqlalchemy as sa

# Create the SQLite engine
engine = sa.create_engine('sqlite:///:memory:')

# Configure the destination instance and create pipeline
pipeline = dlt.pipeline('my_pipeline', destination=dlt.destinations.sqlalchemy(engine), dataset_name='main')

# Run the pipeline with some data
pipeline.run([1,2,3], table_name='my_table')

# The engine is still open and you can query the database
with engine.connect() as conn:
    result = conn.execute(sa.text('SELECT * FROM my_table'))
    print(result.fetchall())

Write dispositions

The following write dispositions are supported:

append
replace with truncate-and-insert and insert-from-staging replace strategies. staging-optimized falls back to insert-from-staging.
merge with delete-insert and scd2 merge strategies.

Data loading

Data is loaded in a dialect-agnostic manner with an insert statement generated by SQLAlchemy's core API. Rows are inserted in batches as long as the underlying database driver supports it. By default, the batch size is 10,000 rows.

Syncing of `dlt` state

This destination fully supports dlt state sync.

Data types

All dlt data types are supported, but how they are stored in the database depends on the SQLAlchemy dialect. For example, SQLite does not have DATETIME or TIMESTAMP types, so timestamp columns are stored as TEXT in ISO 8601 format.

Supported file formats

typed-jsonl is used by default. JSON-encoded data with typing information included.
parquet is supported.

Supported column hints

unique hints are translated to UNIQUE constraints via SQLAlchemy (granted the database supports it).

SQLAlchemy destination

Install dlt with SQLAlchemy

Create a pipeline

Notes on SQLite

Dataset files

In-memory databases

Write dispositions

Data loading

Syncing of `dlt` state

Data types

Supported file formats

Supported column hints

DHelp

Ask a question

SQLAlchemy destination

Install dlt with SQLAlchemy​

Create a pipeline​

Notes on SQLite​

Dataset files​

In-memory databases​

Write dispositions​

Data loading​

Syncing of dlt state​

Data types​

Supported file formats​

Supported column hints​

DHelp

Ask a question

Install dlt with SQLAlchemy

Create a pipeline

Notes on SQLite

Dataset files

In-memory databases

Write dispositions

Data loading

Syncing of `dlt` state

Data types

Supported file formats

Supported column hints