.. documentation/sphinx/source/bulkload.rst
.. Michael Stack d537de748b Add bulkload page and correct error in user-facing bulkload doc. (#12585) 2025-12-09

##############################
BulkLoad (Dev)
##############################

Overview
========

The BulkLoad feature works in conjunction with :doc:`BulkDump <bulkdump>` to provide a complete data migration solution.
BulkLoad takes manifest files and SST files generated by :doc:`BulkDump <bulkdump>` and efficiently loads them into a target FoundationDB cluster.

Input and Requirements
----------------------

To start a bulkload job, the user provides:

1. **Job ID**: The unique identifier from the corresponding BulkDump job
2. **Key Range**: The range of keys to load (must be within the dumped range)
3. **Source Path**: Either a local directory or a `blobstore URL <https://apple.github.io/foundationdb/backups.html#backup-urls>`_ containing the dump files

**Required Configuration:**

BulkLoad requires the following server knobs to be enabled:

- ``--knob_shard_encode_location_metadata=1``: Enables shard-aware location metadata
- ``--knob_enable_read_lock_on_range=1``: Enables exclusive range locking during load operations

**Input File Structure:**

BulkLoad expects the input files to be organized as produced by :doc:`BulkDump <bulkdump>`.

How to use?
-----------

Currently, FDBCLI tools and low-level ManagementAPIs are provided to submit or clear a job.
These operations are achieved by issuing transactions that update the bulkload metadata and take exclusive locks on the target range.
Submitting a job involves validating the input parameters, taking an exclusive read lock on the target range, and writing job metadata.
When submitting a job, the API checks for any ongoing bulkload job or conflicting locks; if one exists, the job is rejected, otherwise it is accepted.
Clearing a job releases the range lock and marks the job as cancelled in the metadata.

FDBCLI provides the following interfaces for these operations:

1. Submit a job: ``bulkload load <JobID> <BeginKey> <EndKey> <RootFolder>``, where ``JobID`` is from BulkDump and ``RootFolder`` is a local directory or blobstore URL
2. Clear a job: ``bulkload cancel <JobID>``
3. Enable the feature: ``bulkload mode on|off`` (the ``bulkload mode`` command prints the current value, on or off, of the mode)
4. Check status: ``bulkload status`` (shows information about the currently running job)
5. View history: ``bulkload history`` (shows completed job history)

For detailed usage examples and a quickstart guide, see :doc:`bulkload-user`.

ManagementAPI provides the following interfaces for these operations:

1. Submit a job: ``submitBulkLoadJob(BulkLoadJobState jobState)``
2. Clear a job: ``cancelBulkLoadJob(UID jobId)``
3. Enable the feature: ``setBulkLoadMode(int mode)``, where mode = 1 enables and mode = 0 disables the feature
4. Get job status: ``getBulkLoadJobStatus(Database cx)``
5. BulkLoad job metadata is generated by ``createBulkLoadJob()``
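
The acceptance rules described above (at most one job at a time, no conflicting range locks) can be sketched as a toy model. All class and method names below are illustrative assumptions, not the actual ManagementAPI implementation:

```python
# Illustrative sketch only: models the acceptance rules of a hypothetical
# submit_bulkload_job(), not the real FoundationDB ManagementAPI.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Cluster:
    active_job: Optional[str] = None                # at most one bulkload job
    range_locks: List[Tuple[str, str]] = field(default_factory=list)

    def submit_bulkload_job(self, job_id: str, begin: str, end: str) -> bool:
        """Reject if a job is already running or the range is locked."""
        if self.active_job is not None:
            return False                            # ongoing job: reject
        if any(not (end <= b or e <= begin) for b, e in self.range_locks):
            return False                            # conflicting lock: reject
        self.range_locks.append((begin, end))       # exclusive read lock
        self.active_job = job_id                    # persist job metadata
        return True

    def cancel_bulkload_job(self, job_id: str) -> None:
        """Release the lock and clear the job, as clearing a job does."""
        if self.active_job == job_id:
            self.active_job = None
            self.range_locks.clear()
```

A second submission while a job is active is rejected, and becomes acceptable again once the first job is cancelled.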

Mechanisms
==========

Workflow
--------

- Users submit a BulkLoad job via ``submitBulkLoadJob()`` specifying the source JobID, target range, and data location
- The API validates the job parameters and checks for conflicting BulkLoad/BulkDump jobs
- An exclusive read lock is taken on the entire target range using ``takeExclusiveReadLockOnRange()``
- Job metadata is persisted to the bulkload job space (``\xff/bulkLoadJob/`` prefix) and the task space is initialized
- DD's ``bulkLoadJobManager()`` detects the new job and downloads the global job-manifest.txt file
- DD parses the job manifest to build a map of manifest entries by key range
- DD creates BulkLoad tasks by grouping manifest entries (up to ``MANIFEST_COUNT_MAX_PER_BULKLOAD_TASK`` per task)
- Each task is persisted to the task metadata space (``\xff/bulkLoadTask/`` prefix) and triggers data movement
- DD's ``doBulkLoadTask()`` coordinates with data movement system to load SST files into target shards
- Storage servers receive data movement requests containing BulkLoad task information
- Storage servers download SST files, validate integrity, and apply data using storage engine ingestion
- Tasks complete and are marked as ``BulkLoadPhase::Complete`` or ``BulkLoadPhase::Error``
- When all tasks finish, the job is finalized and moved to job history, and the range lock is released
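
The task-creation step above, where DD groups manifest entries into tasks, can be illustrated with a small sketch. The constant name echoes the real knob, but the function and data shapes are assumptions:

```python
# Illustrative sketch only: groups sorted, contiguous manifest ranges into
# tasks of at most MANIFEST_COUNT_MAX_PER_BULKLOAD_TASK entries each. The
# function and dict layout are assumptions, not the data distributor's code.
MANIFEST_COUNT_MAX_PER_BULKLOAD_TASK = 3  # value chosen for illustration

def partition_manifests(manifest_ranges):
    """Split sorted (begin, end) manifest ranges into bulkload tasks.

    Each task's range is the union of the manifest ranges it contains,
    mirroring the rule that task ranges are unions of manifest ranges.
    """
    tasks = []
    step = MANIFEST_COUNT_MAX_PER_BULKLOAD_TASK
    for i in range(0, len(manifest_ranges), step):
        group = manifest_ranges[i:i + step]
        tasks.append({"range": (group[0][0], group[-1][1]),
                      "manifests": group})
    return tasks
```

For example, four contiguous manifest ranges with a per-task limit of three yield two tasks: one covering the first three ranges and one covering the remainder.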

Range Locking
-------------

BulkLoad uses FoundationDB's range locking mechanism to ensure data consistency:

- ``registerRangeLockOwner()`` registers the BulkLoad system as a lock owner with name ``"BulkLoad"``
- ``takeExclusiveReadLockOnRange()`` takes an exclusive read lock on the entire job range during ``submitBulkLoadJob()``
- This prevents any concurrent transactions from modifying data in the target range
- Lock-aware transactions can still read from the range during the load process
- The lock is automatically released via ``releaseExclusiveReadLockOnRange()`` when the job completes, is cancelled, or errors
- Range locks are managed through the ``\xff/rangeLock/`` keyspace, with owner information in ``\xff/rangeLockOwner/``
- BulkLoad jobs will fail with ``range_lock_reject`` if the target range is already locked by another operation
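
A toy model of the owner registry and exclusive read locks described above may help; the real mechanism lives in the ``\xff/rangeLock/`` and ``\xff/rangeLockOwner/`` keyspaces, and everything in this sketch is an assumption:

```python
# Illustrative sketch only: a toy in-memory model of range-lock ownership
# and exclusive read locks. Class and method names loosely mirror the
# functions named above but are not the FoundationDB implementation.
class RangeLockError(Exception):
    """Stands in for the range_lock_reject error."""

class RangeLockManager:
    def __init__(self):
        self.owners = set()
        self.locks = {}                        # (begin, end) -> owner name

    def register_owner(self, name):
        self.owners.add(name)                  # cf. registerRangeLockOwner()

    def take_exclusive_read_lock(self, begin, end, owner):
        if owner not in self.owners:
            raise RangeLockError("unknown lock owner")
        for (b, e) in self.locks:
            if not (end <= b or e <= begin):   # ranges overlap
                raise RangeLockError("range_lock_reject")
        self.locks[(begin, end)] = owner       # cf. takeExclusiveReadLockOnRange()

    def release_exclusive_read_lock(self, begin, end, owner):
        if self.locks.get((begin, end)) == owner:
            del self.locks[(begin, end)]       # cf. releaseExclusiveReadLockOnRange()
```

In this model, a lock request overlapping an existing lock is rejected, and the range becomes lockable again once the holder releases it.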

Invariants
----------

- At any time, the FDB cluster accepts at most one bulkload job. ``submitBulkLoadJob()`` checks for existing BulkLoad or BulkDump jobs and rejects with ``bulkload_task_failed()`` if a conflict exists
- DD partitions jobs into tasks where each task contains up to ``MANIFEST_COUNT_MAX_PER_BULKLOAD_TASK`` manifest entries
- Task ranges are determined by the union of all manifest ranges within the job range
- Tasks are assigned to data movement operations that target the appropriate storage servers for each shard
- BulkLoad tasks are tracked in ``BulkLoadTaskCollection`` to coordinate with data movement and prevent shard boundary changes
- Each task validates source data integrity through manifest checksums and range validation
- Tasks complete atomically - either all manifests in a task succeed or the entire task is marked as error
- The job range remains exclusively locked throughout the entire operation until completion or cancellation
- Task metadata persists through DD restarts - incomplete tasks are automatically resumed

Data Validation
---------------

- **Manifest Validation**: Task ranges are validated against source manifest files using ``getBulkLoadManifestMetadataFromEntry()``
- **Job Coverage Validation**: The job range must be entirely covered by the source dataset or the job fails with ``bulkload_dataset_not_cover_required_range()``
- **Task Atomicity**: Each task either completes entirely or fails - partial task completion is not supported
- **SST File Integrity**: Storage engines validate SST file integrity during ingestion
- **Range Alignment**: Task ranges are aligned with shard boundaries and manifest boundaries
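
The job-coverage check above can be sketched as a scan over sorted manifest ranges; the exception name echoes ``bulkload_dataset_not_cover_required_range()``, and all code here is an assumption rather than FoundationDB source:

```python
# Illustrative sketch only: verifies that sorted, non-overlapping manifest
# ranges fully cover a requested job range. Names are hypothetical.
class DatasetNotCoverRequiredRange(Exception):
    pass

def validate_job_coverage(job_begin, job_end, manifest_ranges):
    """manifest_ranges: sorted, non-overlapping (begin, end) pairs."""
    cursor = job_begin                         # next point that must be covered
    for begin, end in manifest_ranges:
        if begin > cursor:
            break                              # gap: cursor is uncovered
        cursor = max(cursor, end)
        if cursor >= job_end:
            return True                        # job range fully covered
    raise DatasetNotCoverRequiredRange(
        f"dataset covers only up to {cursor!r}, need {job_end!r}")
```

A gap anywhere inside the requested range, even if both sides are covered, fails the check.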

Failure Handling
----------------

- **DD Restart**: Tasks persist through DD restarts via ``\xff/bulkLoadTask/`` metadata and are automatically resumed
- **Task Retry**: Failed tasks are retried automatically by the BulkLoad engine up to configured limits
- **Job Cancellation**: ``cancelBulkLoadJob()`` clears all metadata and releases range locks immediately
- **Data Movement Conflicts**: Tasks coordinate with data movement system through ``BulkLoadTaskCollection`` to handle shard reassignments
- **Lock Conflicts**: Jobs fail immediately with ``range_lock_reject`` if the target range is already locked
- **Manifest Download Failures**: Network/S3 failures during manifest download cause the job to error and move to history
- **Task Error Handling**: Individual task failures are marked as ``BulkLoadPhase::Error`` and can be acknowledged by users
- **Range Coverage Failures**: Jobs fail with ``bulkload_dataset_not_cover_required_range()`` if source data doesn't cover the requested range
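
The retry-until-limit behavior above can be modeled in a few lines. The retry limit name is hypothetical; the real engine's knobs and phases differ:

```python
# Illustrative sketch only: retry a bulkload task up to a configured limit,
# then mark it as an error. MAX_TASK_RETRIES is a hypothetical name.
MAX_TASK_RETRIES = 3  # value chosen for illustration

def run_task_with_retries(task, max_retries=MAX_TASK_RETRIES):
    """Return 'Complete' on success, 'Error' once retries are exhausted."""
    for _attempt in range(max_retries + 1):
        try:
            task()
            return "Complete"                  # cf. BulkLoadPhase::Complete
        except Exception:
            continue                           # transient failure: retry
    return "Error"                             # cf. BulkLoadPhase::Error
```

A task that fails transiently (say, a flaky download) succeeds on a later attempt; one that always fails ends in the error phase.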

Performance Considerations
--------------------------

- **Parallelism**: Controlled by ``DD_BULKLOAD_PARALLELISM`` knob for DD-level parallelism
- **Storage Server Load**: Each storage server handles one bulkload task at a time
- **Network Bandwidth**: Large SST files may saturate network bandwidth during downloads
- **Storage Engine Impact**: Direct SST ingestion bypasses normal write paths for better performance
- **Memory Usage**: SST files are typically loaded into memory for validation before application
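
Bounded parallelism in the spirit of the ``DD_BULKLOAD_PARALLELISM`` knob can be sketched with a fixed-size worker pool; the pool, counters, and knob value here are assumptions for illustration:

```python
# Illustrative sketch only: run tasks with a bounded number in flight and
# report the peak concurrency observed. Not the data distributor's code.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DD_BULKLOAD_PARALLELISM = 2  # value chosen for illustration

def run_tasks(tasks, parallelism=DD_BULKLOAD_PARALLELISM):
    """Run tasks with at most `parallelism` in flight; return peak concurrency."""
    lock = threading.Lock()
    state = {"current": 0, "peak": 0}

    def wrapped(task):
        with lock:
            state["current"] += 1
            state["peak"] = max(state["peak"], state["current"])
        try:
            task()
        finally:
            with lock:
                state["current"] -= 1

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for t in tasks:
            pool.submit(wrapped, t)
    return state["peak"]                       # never exceeds `parallelism`
```

However many tasks are queued, the pool caps how many run at once, which is the role the parallelism knob plays for DD.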