Django Bulkmodel

This project adds a number of features missing from Django’s ORM, including heterogeneous updates, concurrent writes, retrieval of records after bulk-creating them, and offline connection management.


What BulkModel does, by example

Suppose you have the following model:

from django.db import models
from bulkmodel.models import BulkModel

class Foo(BulkModel):
    name = models.CharField(max_length=50, blank=False)
    value = models.IntegerField(null=False)

Some things you can do:

Retrieve bulk-created model instances

import random
import string
from random import randint

ls = []
for i in range(10):
    ls.append(Foo(
        # random 25-character uppercase string
        name=''.join(random.choices(string.ascii_uppercase, k=25)),

        # random value
        value=randint(0, 1000),
    ))

# create instances and return a queryset of the created items
foos = Foo.objects.bulk_create(ls, return_queryset=True)

Heterogeneously update data

The .update() method on a queryset performs a homogeneous update. That is, one or more columns for all the records in the queryset are updated to the same value.

Django-bulkmodel adds a queryset method, update_fields(), that lets you set a different value for each primary key.

for foo in foos:
    foo.value += randint(100, 200)

# update all fields that changed
foos.update_fields()

# or update just the value field
foos.update_fields('value')
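
To make the idea of a heterogeneous update concrete, here is a minimal sketch of one way to write a different value per primary key in a single SQL statement, using a CASE expression and the standard-library sqlite3 module. This is an illustration only, not necessarily how django-bulkmodel implements update_fields(); the helper heterogeneous_update and the foo table are hypothetical.

```python
import sqlite3


def heterogeneous_update(conn, table, column, values_by_pk):
    """Set `column` to a different value per primary key in one UPDATE.

    Hypothetical helper for illustration; not part of django-bulkmodel.
    `table` and `column` are interpolated directly, so they must be trusted.
    """
    if not values_by_pk:
        return 0
    # One "WHEN ? THEN ?" branch per row; the values travel as parameters.
    branches = " ".join("WHEN ? THEN ?" for _ in values_by_pk)
    placeholders = ", ".join("?" for _ in values_by_pk)
    sql = (
        f"UPDATE {table} SET {column} = CASE id {branches} END "
        f"WHERE id IN ({placeholders})"
    )
    params = []
    for pk, value in values_by_pk.items():
        params.extend([pk, value])
    # The WHERE clause restricts the update to the listed primary keys.
    params.extend(values_by_pk.keys())
    return conn.execute(sql, params).rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO foo (id, name, value) VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "B", 20), (3, "C", 30)],
)

# Rows 1 and 3 get different new values; row 2 is left untouched.
updated = heterogeneous_update(conn, "foo", "value", {1: 110, 3: 330})
rows = conn.execute("SELECT id, value FROM foo ORDER BY id").fetchall()
```

A homogeneous update, by contrast, would need only a single `SET value = ?` with one parameter for every row.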

Concurrent writes

With the batch_size parameter that ships with Django, batches are inserted synchronously: each batch blocks until it has been written to the database.

If your database hardware is sufficient and you’re on Python 3.4+, you can decrease overall write time by inserting batches concurrently. With django-bulkmodel you simply pass concurrent=True to any write operation.

foos = ...

# concurrently write foos into the database
Foo.objects.bulk_create(foos, concurrent=True, batch_size=1000, max_concurrent_workers=10)

# a regular (homogeneous) update can be written concurrently
foos.update(concurrent=True, batch_size=1000, max_concurrent_workers=10)

# and so can a heterogeneous update
foos.update_fields(concurrent=True, batch_size=1000, max_concurrent_workers=10)
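
The general pattern behind concurrent batch writes can be sketched with the standard library alone: split the records into batches and hand each batch to a worker pool. The sketch below is a generic illustration under assumed names (chunked, write_concurrently and fake_write are hypothetical) and does not reflect django-bulkmodel's internals; a real implementation would also have to manage a database connection per worker.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def write_concurrently(items, write_batch, batch_size, max_workers):
    """Run `write_batch` on every batch in a thread pool.

    Hypothetical helper for illustration; not part of django-bulkmodel.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in batch order, although the batches
        # themselves may finish in any order.
        return list(pool.map(write_batch, chunked(items, batch_size)))


def fake_write(batch):
    """Stand-in for a database write: reports how many rows it received."""
    return len(batch)


counts = write_concurrently(
    list(range(2500)), fake_write, batch_size=1000, max_workers=10
)
```

Here 2,500 records are split into three batches of 1000, 1000 and 500 items, and counts holds the size of each batch in submission order.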


