1. Bulk Create
Django ships with a bulk_create method that supports a batch_size parameter for writing in batches.
Django-bulkmodel extends this queryset method with some new options.
1.1. Returning a queryset
At the database level, a bulk insert reports little more than the number of rows it created, and Django's bulk_create sets primary keys on the returned objects only on some database backends.
Yet there are many cases where you want to obtain the created records for further manipulation, and there is no way to do this without the primary key associated with each record.
Django-bulkmodel exposes a parameter called return_queryset, which returns the created data as a queryset.
import random
import string

from django.db import models
from bulkmodel.models import BulkModel


class Foo(BulkModel):
    name = models.CharField(max_length=50, blank=False)
    value = models.IntegerField(null=False)


foo_objects = []
for i in range(10):
    foo_objects.append(Foo(
        # random 25-character uppercase string
        name=''.join(random.choices(string.ascii_uppercase, k=25)),
        # random integer between 0 and 1000 (inclusive)
        value=random.randint(0, 1000),
    ))

# create instances and return a queryset of the created items
foos = Foo.objects.bulk_create(foo_objects, return_queryset=True)
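To see why primary keys matter, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Django or django-bulkmodel. It records the highest existing id before a multi-row insert and then re-selects every row above it. This only works under stated assumptions: there is a single writer, and ids are assigned in increasing order. It illustrates the problem and is not how return_queryset is implemented. The helper bulk_insert_returning_rows and the foo table are hypothetical.

```python
import sqlite3


def bulk_insert_returning_rows(conn, rows):
    """Hypothetical helper: insert (name, value) rows and return them with their new ids.

    Assumes a single writer and increasing INTEGER PRIMARY KEY ids, so every row
    with an id above the pre-insert maximum belongs to this batch.
    """
    with conn:  # one transaction for the whole batch
        (max_before,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM foo").fetchone()
        conn.executemany("INSERT INTO foo (name, value) VALUES (?, ?)", rows)
    # re-select the batch by primary key so it can be manipulated further
    return conn.execute(
        "SELECT id, name, value FROM foo WHERE id > ? ORDER BY id", (max_before,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT NOT NULL, value INTEGER NOT NULL)"
)
conn.execute("INSERT INTO foo (name, value) VALUES ('existing', 1)")
conn.commit()

created = bulk_insert_returning_rows(conn, [("A", 10), ("B", 20), ("C", 30)])
print(created)  # [(2, 'A', 10), (3, 'B', 20), (4, 'C', 30)]
```

With concurrent writers, the "ids above the old maximum" trick can pick up other sessions' rows. A real implementation needs the database to report the generated keys.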
1.2. Writing data by copying from a buffer
bulk_create writes data with INSERT statements, one per batch. Depending on your schema and database, it may be faster to load data from a file path or an in-memory buffer instead.
For supported databases, a BulkModel queryset exposes this functionality through copy_from_objects.
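import random
import string

foos = []
for i in range(10):
    foos.append(Foo(
        # random 25-character uppercase string
        name=''.join(random.choices(string.ascii_uppercase, k=25)),
        # random integer between 0 and 1000 (inclusive)
        value=random.randint(0, 1000),
    ))

# load the instances by copying them into the database and return a queryset
foos = Foo.objects.copy_from_objects(foos, return_queryset=True)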
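A copy-based load typically streams the whole batch to the database as one text payload instead of issuing INSERT statements. The standard-library sketch below shows how a list of objects could be serialized into an in-memory CSV buffer. This is the kind of buffer PostgreSQL's COPY ... FROM STDIN accepts, for example through psycopg2's cursor.copy_expert. The FooRecord dataclass and the objects_to_csv_buffer helper are hypothetical stand-ins, and this is not a description of how copy_from_objects works internally.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FooRecord:
    """Hypothetical stand-in for a Foo model instance."""
    name: str
    value: int


def objects_to_csv_buffer(objects, fields):
    """Hypothetical helper: write the given attributes of each object as CSV rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for obj in objects:
        writer.writerow([getattr(obj, f) for f in fields])
    buf.seek(0)  # rewind so a consumer reads from the start
    return buf


records = [FooRecord("ABC", 1), FooRecord("DEF", 2)]
buf = objects_to_csv_buffer(records, ["name", "value"])
print(buf.getvalue())
# With PostgreSQL and psycopg2, a buffer like this could be loaded with e.g.
# cursor.copy_expert("COPY foo (name, value) FROM STDIN WITH (FORMAT csv)", buf)
```

Building the payload in memory trades RAM for fewer round trips. For very large loads, a temporary file or a streaming generator is the usual alternative.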
The return_queryset parameter is available on all write methods. See the Queryset Reference for more details.
1.3. Missing signals
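Django's bulk_create does not call each model's save() method, so the pre_save and post_save signals are not sent.
A BulkModel adds several signals of its own, including signals around creating data in bulk.
These signals are tied to the two methods of creating data documented above: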
pre_bulk_create / post_bulk_create: signals fired when data is created with bulk_create
pre_copy_from_instances / post_copy_from_instances: signals fired when data is created with copy_from_objects
You can optionally turn off signal emission when creating data by passing send_signals=False.
foo_objects = ...
# do not send signals (send_signals defaults to True)
Foo.objects.bulk_create(foo_objects, send_signals=False)
For more information see the signals user guide or the signals API reference.
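The pattern behind these signals can be sketched without any dependencies. Receivers subscribe to a "pre" and a "post" signal, the bulk write fires them around the actual insert, and a send_signals flag skips both. In this sketch, Signal is a toy stand-in for django.dispatch.Signal, and bulk_create_with_signals is hypothetical. Neither reproduces django-bulkmodel's internals or the exact arguments its signals send.

```python
from typing import Callable, List


class Signal:
    """Toy stand-in for a signal: receivers are called in connection order."""

    def __init__(self) -> None:
        self.receivers: List[Callable] = []

    def connect(self, receiver: Callable) -> None:
        self.receivers.append(receiver)

    def send(self, sender, **kwargs) -> None:
        for receiver in self.receivers:
            receiver(sender=sender, **kwargs)


pre_bulk_create = Signal()
post_bulk_create = Signal()


def bulk_create_with_signals(model_name, objs, store, send_signals=True):
    """Hypothetical bulk write wrapped in pre/post signals, skippable via send_signals."""
    if send_signals:
        pre_bulk_create.send(sender=model_name, instances=objs)
    store.extend(objs)  # stands in for the actual database write
    if send_signals:
        post_bulk_create.send(sender=model_name, instances=objs)
    return len(objs)


events = []
pre_bulk_create.connect(lambda sender, instances: events.append(("pre", sender, len(instances))))
post_bulk_create.connect(lambda sender, instances: events.append(("post", sender, len(instances))))

table = []
bulk_create_with_signals("Foo", ["a", "b"], table)
bulk_create_with_signals("Foo", ["c"], table, send_signals=False)
print(events)  # [('pre', 'Foo', 2), ('post', 'Foo', 2)]
```

Receivers get the whole batch in one call rather than one call per instance, which is what makes bulk signals cheaper than per-row pre_save/post_save.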
1.4. Concurrent writes
You can speed up data loading by splitting the work into batches and writing the batches concurrently.
A BulkModel queryset exposes three parameters to give you full control over this process:
batch_size: The number of records in each chunk written to the database; this parameter can be used with or without concurrency.
concurrent: If True, the write happens concurrently. The default is False.
max_concurrent_workers: The total number of concurrent workers involved in the event loop.
Example
foos = ...
# concurrently write foos into the database and get the created rows back as a queryset
foos = Foo.objects.bulk_create(foos, concurrent=True, batch_size=1000, max_concurrent_workers=10, return_queryset=True)
# a regular (homogeneous) update can be written concurrently
foos.update(concurrent=True, batch_size=1000, max_concurrent_workers=10)
# and so can a heterogeneous update
foos.update_fields(concurrent=True, batch_size=1000, max_concurrent_workers=10)
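A homogeneous update sets the same values on every matched row, as a single UPDATE ... SET does. A heterogeneous update lets each row receive its own value. One common way to perform the latter in a single statement is an UPDATE with a CASE expression keyed on the primary key. The sketch below shows that technique with sqlite3. The helper update_values_by_id and the foo table are hypothetical, and this is not necessarily how update_fields builds its queries.

```python
import sqlite3


def update_values_by_id(conn, table, column, values_by_id):
    """Hypothetical helper: give each row its own value in one UPDATE statement.

    table and column are interpolated into the SQL, so they must be trusted
    identifiers; ids and values are passed as query parameters.
    """
    if not values_by_id:
        return 0
    ids = list(values_by_id)
    case_sql = " ".join("WHEN ? THEN ?" for _ in ids)
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        f"UPDATE {table} SET {column} = CASE id {case_sql} END "
        f"WHERE id IN ({placeholders})"
    )
    # (id, value) pairs for the CASE branches, then the ids for the WHERE clause
    params = [p for pk in ids for p in (pk, values_by_id[pk])] + ids
    with conn:
        return conn.execute(sql, params).rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)")
conn.executemany("INSERT INTO foo (id, value) VALUES (?, ?)", [(1, 0), (2, 0), (3, 0)])
conn.commit()

updated = update_values_by_id(conn, "foo", "value", {1: 10, 3: 30})
rows = conn.execute("SELECT id, value FROM foo ORDER BY id").fetchall()
print(updated, rows)  # 2 [(1, 10), (2, 0), (3, 30)]
```

The WHERE clause matters. Without it, rows not listed in the CASE would be set to NULL, which here would violate the NOT NULL constraint.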
For more information see the concurrent writes user guide or the queryset API reference.
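The batch_size, concurrent and max_concurrent_workers parameters describe a familiar pattern: split the records into chunks, then run the chunk writes on an event loop while capping how many run at once. The asyncio sketch below illustrates that pattern with a semaphore. Here chunks, concurrent_write and write_batch are hypothetical, the write only simulates database latency, and nothing here reflects how django-bulkmodel schedules its workers.

```python
import asyncio


def chunks(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def concurrent_write(items, batch_size, max_concurrent_workers):
    """Hypothetical concurrent loader; returns the written items and peak concurrency."""
    written = []  # stands in for the database table
    active = 0    # batch writes currently in progress
    peak = 0      # highest number of simultaneous writes observed
    semaphore = asyncio.Semaphore(max_concurrent_workers)

    async def write_batch(batch):
        """Hypothetical batch write; the sleep simulates I/O latency."""
        nonlocal active, peak
        async with semaphore:  # at most max_concurrent_workers batches at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)
            written.extend(batch)
            active -= 1

    await asyncio.gather(*(write_batch(b) for b in chunks(items, batch_size)))
    return written, peak


written, peak = asyncio.run(
    concurrent_write(list(range(10_000)), batch_size=1000, max_concurrent_workers=4)
)
print(len(written), peak)  # 10000 4
```

Because Django's ORM calls are synchronous, a real loader built this way would have to offload each batch write, for example with loop.run_in_executor, or use an async database driver.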