apple-foundationdb

mirror of https://github.com/apple/foundationdb.git synced 2026-01-25 12:28:19 +00:00

Author	SHA1	Message	Date
Zhe Wang	42e17d8bd1	BulkLoading Use RangeLock (#11741) * use range lock in bulk load * refactor BulkLoading workload and nits * add background traffic * nits * address comments	2024-10-31 12:58:13 -07:00
Zhe Wang	43446204ed	Database Per-Range Lock (#11693 ) * range lock framework * improve the framework * persist to txnStateStore * fix bugs * code clean * code clean * bug fix * address comments * add complex test workload and fix bugs found by the workload * add workload correctness check and fix bugs * code clean up * add random range lock injection * fix bugs in RandomRangeLock.actor.cpp * enable random range lock injection in general workloads * add rangelockcycle test * disable random range lock in backup workloads * nits * add range lock ownership concept * enable lock ownership to rangeLock * api deal with tenant * fix CI * add test for multiple rangeLock owners * nits * address comments and renaming * address comments	2024-10-23 16:25:56 -07:00
Jingyu Zhou	712f88a1ff	More protocol version related code removal Removed code handle old protocol versions, i.e., before 7.1	2024-09-18 13:28:06 -07:00
Jingyu Zhou	fc30fc269e	Remove dead code after removing tagLocalityUpgraded usage 20240918-170752-jzhou-33111b2c3e6776aa	2024-09-18 11:23:09 -07:00
Jingyu Zhou	7b76561bb9	Remove tagLocalityUpgraded usage at various places Since we have removed old tlog implementation, so the code path using this tag can be deleted to simplify the code.	2024-09-18 11:23:09 -07:00
Syed Paymaan Raza	c3e7542cda	Update end year in copyright header	2024-08-02 09:40:11 -07:00
Zhe Wang	74990e44bd	Bulk Loading Framework (#11369)	2024-07-23 14:57:28 -07:00
Zhe Wang	9af7eb7422	rebase and comments	2024-05-22 21:53:10 -07:00
Zhe Wang	f74a421988	fix types	2024-05-22 19:26:12 -07:00
Zhe Wang	b87e3003ac	fix-restart-test-dmid	2024-05-22 19:22:35 -07:00
Zhe Wang	33eecd0775	Real-time corruption detection with accumulative checksum (#11255 ) * acs framework * code refactor and fix bugs * add ss crash loop protector * use sharedptr instead of raw pointer * fixed critical bugs and add provate mutation acs to the framework * enable ACS for all mutations except for clear serverTag mutation and fix bugs * fix restarting tests * refactor code and fix bugs * fix AccumulativeChecksumState toString * fix bugs * allow all mutations in acs and fixed bugs * fix bugs and code cleanup * code clean up for adding recovery support * simplify code and support recovery * clear acs state at ss * fix bug * terminate validator if ss will be removed in the current batch * simplify code * add trace * address comments * optimize code * deep copy when adding mutation to acs validator * warp encode and decode persist acs key * make acstable private * remove unless func * remove unless func * remove epoch in ACS validator * add acs mutation counter in SS metrics * code cleanup and make knob check better * make mutation buffer global * simplify code * add comments * make knob randomly set * address comments * ss reboot after acs mismatch found	2024-04-04 15:03:44 -07:00
Dimitris Apostolou	a88114c222	Fix typos	2024-02-07 01:16:00 +02:00
Jingyu Zhou	78174692d3	Merge pull request #11107 from apple/synthesize-data	2024-01-04 09:17:39 -08:00
Dan Lambright	2b4b4ae512	Synthesize data on SS based off parameters from new system transaction	2023-12-20 11:25:47 -05:00
Dan Lambright	5ebe8b0915	move data to value and parse it	2023-12-18 09:10:06 -05:00
Dan Lambright	a20f9d3475	Interfaces to synthesize data	2023-12-13 15:19:17 -05:00
He Liu	6610dda763	Fixed unassigned data move reasons. (#11089)	2023-12-06 18:42:37 -08:00
Zhe Wang	1e9c5bb390	Propagate data move reason from DD to SS (#11063) * encode reason to data move id * address comments * fix data move id decode bug and add assert for data move decode invariant * address comments	2023-11-15 13:07:11 -08:00
He Liu	b8f1670a0e	Physical shard move tss (#11057) * Refactored newDataMoveId() and decodeServerKeysValue(). * Enabled physical shard move for tss. * Added unit test & cleanup. * clean up test configs.	2023-11-13 11:34:07 -08:00
Jingyu Zhou	307491d68e	Use getRange for server metadatas To reduce read load on SSes that serve the reads.	2023-09-18 13:32:53 -07:00
Jingyu Zhou	12fe500633	ClusterController watches changes to storage metadata To retrieve storage metadata for every status json request is very expensive for clusters with a large number of storage servers. So I change the logic so that ClusterController actively monitors changes to storage metadata, and only retrieves them when there is a change.	2023-09-15 14:19:04 -07:00
Hui Liu	af20493ad0	Move lastFlushTs to BlobGranuleBackupConfig (#10505)	2023-06-16 16:12:10 -07:00
Evan Tschannen	359e178dcd	Merge branch 'main' into feature-durable-change-feed # Conflicts: # fdbclient/ClientKnobs.cpp # fdbserver/BlobManager.actor.cpp # fdbserver/worker.actor.cpp	2023-06-11 13:58:35 -07:00
Evan Tschannen	f69f4c73ad	addressed review comments	2023-06-11 13:54:38 -07:00
Evan Tschannen	be8d8a8f72	fix: popping the cache was removing too many versions	2023-06-09 16:20:48 -07:00
Evan Tschannen	197c39b552	cache change feeds using a storage engine to avoid reading them for the server on startup	2023-06-07 08:41:31 -07:00
He Liu	8ad7ec6fdf	Psm ss (#9817 ) * Update NativeAPI getCheckpointForRange(). * Implemented checkpoint in SS. * clean up. * Disabled StorageServerCheckpointTest. * Serialized checkpoint creation and deletion. Simplified checkpoint GC, via deleting CheckpointMetaData::dir. * Fixed PhysicalShardMove test. Where fetchCheckpoint target range is misset. * Minor improvements on CheckpointMetaData and DataMoveMetaData. * fmt. * Optimized PhysicalShardMove test cleanup. * Refactored ShardedRocks checkpoint/restore for psm. * Complete ShardedRocks::restore. * dismiss operation_obsolete, and throw actor_cancelled. * Validate checkpoint when !asKeyValues. * fmt. * Don't read from uninitialized physical shard. * Resolved commments. * cleanup. * Added verify_checksum_before_restore for ShardedRocks. * Added ShardedRocksDB checkpoint/restore unit test. * Populate CheckpointMetaData::dir in RocksDB. * Rename MovingIn as Adding. * Added StorageServerUtils. * Added physical shard move in SS. * Fix on ApplyMetaData, doFetchFile error handling etc. * Debugging incorrect shard size. * Create/delete checkpoints only when Physical shard move is enabled. * Added back SHARD_ENCODE_LOCATION_METADATA. * Fixed bytesSample incorrect issue. Essentially dedicated CheckpointRocksDBCF as key-value based checkpoint, will need to add a new format for the file-based checkpoint. * Cleanup. * Cleanup & compile rocksdb with 8.1 branch. * clean up. * clean up. * Allowed request_maybe_delivered error type in FetchShard. * Added FDBRocksDBVersion.h. * Fixed stuck fetchShard. * Don't create checkpoint on TSS. * Upgrade to RocksDB 8.1.1 * Cleanup. * Fixed accidently deleted db_path and name fields. * Improved trace event. * Removed redundants from previuos ShardedrocksDB. * Cleanup. * cleanup. * cleanup. * reanme `state`. * Cleanup. * Removed excessive TraceEvent. * * Fixed shardMap race condition on different threads * Added Stats, logging data move rates. 
Added `DD_PHYSICAL_SHARD_MOVE_PROBABILITY` to support hybrid data move. * Resolved comments. * fmt. * Use physical shard move in PhysicalShardMoveTest. * Enforce physical-shard-move for PhysicalShardMoveTest. * fmt	2023-05-23 11:18:35 -07:00
Hui Liu	7ca13d8f9c	support blob restore in fdbrestore (#10248)	2023-05-19 14:45:14 -07:00
Josh Slocum	2916a11a86	New ConsistencyScan (#10265 ) * Remove duplicate getRange() for DB handles and update existing GetRange to accept DB handles. * Initial progress checkpoint on new ConsistencyScan role. * Updated TODOs, finished most if not all state updates. * placeholder * Add more TODOs, documentation and comment improvements. * Checkpoint round state to avoid advancing progress if commit fails. * Bug fix, check is supposed to be for overlap, not lack of overlap. * Added more TODO's and added faked read results / exceptions and faked DB size retrieval to prove the consistencyScanCore logic works. * Update JSON schemas and command help. * Add comment about lifetime stats reset. * More TODO comments and some renames for clarity, some bug fixes. * properly stopping consistency scan in simulation so that it doesn't run forever and cause quiet database to fail * removing trailing comma from consistency_scan json schema * Making CC inconsistency not an error if it's intentional tss corruption * consistency scan actually reads storage locations * added check that consistency scan actually completes a round in simulation, fixed bug and added debugging around consistency scan getting stuck * made consistency scan properly fetch database size * refactoring data check to be used in both consistency scan and consistency check * checking that consistency scan always completes at least one round and doesn't get stuck * cleanup * fixing ide build * consistencyscan fdbcli command wasn't actually changing db state * consistencyscan fdbcli command always said enabled even when it wasn't --------- Co-authored-by: Steve Atherton <steve.atherton@snowflake.com>	2023-05-18 15:02:41 -05:00
Zhe Wang	8559d4f1a8	Adding cleanup of old audit metadata (#10137 ) * clean up old audit metadata * change comments * fix audit cleanup rule as PR description claim and reduce timeout of auditStorageCorrectness in tester * address comment * clear audit metadata should not throw error * cleanup progress metadata by type * control number of AuditStatistic events * carefully persist new audit state * add unit tests and fix issues * cleanup * allow audit concurrent run for different types and fix some bug in auditutl * fix ci issue and nits	2023-05-10 19:32:04 -07:00
Josh Slocum	6be0c74d5b	Adding explicit blob range mutation log to handle large number of ranges (#10174) * Adding explicit blob range mutation log to handle large number of ranges * fixing ide build	2023-05-09 11:30:04 -05:00
Steve Atherton	d52113e7a3	Bug fix, check is supposed to be for overlap, not lack of overlap.	2023-05-04 18:08:37 -07:00
Zhe Wang	d254fba6e5	Adding cleanup of audit progress metadata when audit complete (#10118) * cleanup audit progress metadata and tester directly issue audit requests to DD instead of CC * address comments and fix test dd issue request but dd not present	2023-05-03 15:39:22 -07:00
Zhe Wang	d6e7b5f736	Audit storage: validate consistency of replica and shard location metadata (#9628 ) * Implemented AuditUtils.actor.cpp Moved AuditUtils to fdbserver/ * Persist AuditStorageState. * Passed persisted AuditStorageState test. * Added audit_storage_error to indicate a corruption is caught. Throw/Send audit_storage_error when there is a data corruption. Added doAuditStorage() for resuming Audit. * Load and resume AuditStorage when DD restarts. * Generate audit id monotonically. * Fixed minor issue AuditId/Type was not set. * Adding getLatestAuditStates. * Improved persisted errors and added AuditStorageCommand.actor.cpp for fdbcli. * Added `audit_storage` fdbcli command. * fmt. * Fixed null shared_ptr issue. * Improve audit data. * Change DDAuditFailed to SevWarn. * Sev. * set SERVE_AUDIT_STORAGE_PARALLELISM to 1. * Moved AuditUtils* to fdbclient/. * Added getAuditStatus fdbcli command. * Refactor audit storage fdb cli commands. * Added auditStorage in sim. * Cleanup. * Resolved comments. * Resolved comments. * Added SystemData for metadata audit. Refactored audit workflow to make sure all sub-tasks are executed w/o early exit. * Improvements. * Persisted Failed state after too many retries. * Added retryCount for resumeAuditStorage(). * resolving conflict. * Resolved conflicts. 
* allow-merged-to-run * add timeout to audit client * fmt * validate replica * add audit serverKey * address comments and fmt * fix audit_storage_exceeded_request_limit * fix segfault in getLatestAuditStatesImpl * fix bugs * remove timeout from workload * fix bugs * audit local view of shard assignment * fmt * fix-stuck-issue-and-make-dd-audit-storage-self-retry * fix timeout * fix timeout * fix bugs and cleanup * fix nit * change name state to coreState for audit metadata * address comments * code clean * fmt * setup debug * cleanup * clean up * code cleanup * code clean * remove tmp file * fmt * trace portion of shards that of anonymous physical shard * remove unnecessary actor cleanup * do not give up when tr is too old * address commits * refactor * clean * fmt * fix-command-help-text * fix-auditstate-restore-and-enable-restore-to-metadata-audit * address comments * fmrt * debug and improve efficient of resume audit * small change * fix audit cli * bypass completed audit when dd restart * fix auditStorageCommandActor * make mismatch key range more visable * address comments * make local shard metadata check can make progress by retries * address comments * address comments * partition location metadata validation by range and server * unset MIN_TRACE_SEVERITY * address comments and SS auto proceed until failed then notify dd * persistNewAuditState should checkMoveKeysLock * audit storage location metadata partitioned by range and move shard assignment history def to the end of SS structure * code cleanup * fix error message in metadata validation * fix registerAuditsForShardAssignmentHistoryCollection input for local shard validation * add comments to code and add guard to make sure the SS audit does not proceeds automatically for many times without being notified by DD --- to support audit cancellation later * fix coalesceRangeList * replace rangeOverlapping func with operator and use struct instead of complicated type for return value of 
getKeyServer/serverKey/shardInfo * simplify shard assignment history * shardAssignmentRecordRequests should be unorder_map * address comments, make trackShardAssignment simple, make anyChildAuditFailed cover all audit children, keep only one audit actor run at a time on each SS * only run validate shard info once at a time, other audit type does not have this limitation --------- Co-authored-by: He Liu <heliu05023@gmail.com> Co-authored-by: He Liu <heliu@apple.com> Co-authored-by: Zhe Wang <zhewang@Zhes-Laptop.local>	2023-05-01 10:35:52 -07:00
Steve Atherton	b1a17cce0d	Added KeyBackedRangeMap and SystemKey.	2023-04-18 22:03:41 -07:00
Josh Slocum	370feaa3c9	refactoring and adding future compatibility to blob range metadata (#9955) * refactoring and adding future compatibility to blob range metadata * formatting	2023-04-13 15:06:50 -05:00
A.J. Beamon	64b6a5d257	Allow boolean parameters to be nested inside of namespaces or classes	2023-03-30 15:09:59 -07:00
He Liu	0f5e75b34b	Added newDataMoveId(). (#9647) * Added newDataMoveId(). * Added `ENABLE_DD_PHYSICAL_SHARD_MOVE` * fmt. * Replace `teamId` with `shardId`.	2023-03-16 18:06:06 -07:00
Josh Slocum	a5b4212990	adding blob granule logical size	2023-03-15 08:54:49 -05:00
Hui Liu	c43f8b3fdc	Refactor - introduce BlobRestoreController for APIs to manage restore state (#9616)	2023-03-08 07:50:30 -08:00
Evan Tschannen	8872e5a462	Merge pull request #9347 from sfc-gh-etschannen/feature-change-feed-cache added a disk to blob workers	2023-02-24 13:59:03 -08:00
Nim Wijetunga	29819b0645	Change Feed Bug Fix + Encryption Asserts (#9457) * add encryption asserts * modify function name * address pr comments * address pr comments * Trigger Build	2023-02-23 19:33:25 -08:00
Evan Tschannen	8129381689	merge in main	2023-02-21 12:06:35 -08:00
Evan Tschannen	4f9e86b0a4	fixed two bugs that prevented the blob manager from properly loading worker affinity	2023-02-20 16:47:26 -08:00
Hui Liu	aa1d983132	Truncate logs after force-flushing cold blob granules	2023-02-17 10:17:04 -08:00
Evan Tschannen	20bc868ee0	merge in main	2023-02-13 12:41:31 -08:00
Evan Tschannen	bad8b2fad4	blob workers reboot with a different ID and register in the database their previous ID	2023-02-12 10:44:53 -08:00
A.J. Beamon	72c5abc0f5	Refactor storage quotas to store them in a key backed map in the tenant metadata space	2023-01-25 20:48:17 -08:00
Hui Liu	e1b06a62f9	Add tenant metadata ranges to manifest backup	2023-01-24 17:09:04 -08:00
Hui Liu	36e8e5a3bb	Merge pull request #9176 from sfc-gh-huliu/restoreversion Restore to a previous version	2023-01-19 17:37:23 -08:00

396 Commits