Commit Graph

808 Commits

Author SHA1 Message Date
sfc-gh-tclinkenbeard
e724c90ffe Remove unnecessary GLOBAL_TAG_THROTTLING_MIN_TPS knob 2023-05-25 16:45:32 -07:00
He Liu
1900b63acd Merge branch 'main' of https://github.com/apple/foundationdb into delete-data-move-checkpoints-by-id 2023-05-24 13:41:02 -07:00
He Liu
9100507928 Disable physical shard move by default. 2023-05-24 08:51:13 -07:00
Jingyu Zhou
13800ae1a8 Increase BW_RK_SIM_QUIESCE_DELAY to 400s
The blob worker needs more time to catch up, about 388s in the failed simulation
test.

Reproduction:
  seed: -f ./tests/slow/BlobGranuleVerifyLargeClean.toml -s 4068151139 -b on
  commit: 3bdd71cb0 at release-7.3 branch
  build: gcc
2023-05-23 15:54:56 -07:00
He Liu
5160f91e78 Removed SHARD_ENCODE_LOCATION_METADATA. 2023-05-23 13:39:25 -07:00
He Liu
8ad7ec6fdf Psm ss (#9817)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test, where the fetchCheckpoint target range was set incorrectly.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* Refactored ShardedRocks checkpoint/restore for psm.

* Complete ShardedRocks::restore.

* dismiss operation_obsolete, and throw actor_cancelled.

* Validate checkpoint when !asKeyValues.

* fmt.

* Don't read from uninitialized physical shard.

* Resolved comments.

* cleanup.

* Added verify_checksum_before_restore for ShardedRocks.

* Added ShardedRocksDB checkpoint/restore unit test.

* Populate CheckpointMetaData::dir in RocksDB.

* Rename MovingIn as Adding.

* Added StorageServerUtils.

* Added physical shard move in SS.

* Fix on ApplyMetaData, doFetchFile error handling etc.

* Debugging incorrect shard size.

* Create/delete checkpoints only when Physical shard move is enabled.

* Added back SHARD_ENCODE_LOCATION_METADATA.

* Fixed bytesSample incorrect issue.

Essentially dedicated CheckpointRocksDBCF as key-value based checkpoint, will need to add a new format for the file-based checkpoint.

* Cleanup.

* Cleanup & compile rocksdb with 8.1 branch.

* clean up.

* clean up.

* Allowed request_maybe_delivered error type in FetchShard.

* Added FDBRocksDBVersion.h.

* Fixed stuck fetchShard.

* Don't create checkpoint on TSS.

* Upgrade to RocksDB 8.1.1

* Cleanup.

* Fixed accidentally deleted db_path and name fields.

* Improved trace event.

* Removed redundancies from previous ShardedRocksDB.

* Cleanup.

* cleanup.

* cleanup.

* rename `state`.

* Cleanup.

* Removed excessive TraceEvent.

* * Fixed shardMap race condition on different threads
* Added *Stats, logging data move rates.
* Added `DD_PHYSICAL_SHARD_MOVE_PROBABILITY` to support hybrid data move.

* Resolved comments.

* fmt.

* Use physical shard move in PhysicalShardMoveTest.

* Enforce physical-shard-move for PhysicalShardMoveTest.

* fmt
2023-05-23 11:18:35 -07:00
Xiaoxi Wang
969196d8ba Add read ops shard metrics notify bound 2023-05-23 09:46:34 -07:00
He Liu
eaa934dac6 Added more logs about shard management. (#10303) 2023-05-22 18:00:00 -07:00
Yao Xiao
bbf15be05f Knobs to speed up DB open. (#10301) 2023-05-22 16:21:05 -07:00
Hui Liu
7ca13d8f9c support blob restore in fdbrestore (#10248) 2023-05-19 14:45:14 -07:00
Zhe Wu
93ad70db38 Merge pull request #10263 from halfprice/zhewu/gc-generation-using-recoverat
GC earlier TLog generation using each generation's `recover at` version instead of `start version`
2023-05-19 12:07:02 -07:00
Yao Xiao
cef93f7d22 knobs (#10253) 2023-05-18 14:58:09 -07:00
Zhe Wu
0bdfe1889b Add recovered at in CSTATE, and use a knob to guard the use of it 2023-05-16 12:47:00 -07:00
neethuhaneesha
854464a6af Hex values in TSS logs and rocksdb debuglogs mode knob (#10231) 2023-05-16 10:34:58 -07:00
Zhe Wang
852e012eb2 Adding throttling of audit storage tasks and tracing progress of tasks (#10233)
* when trigger doAuditOnStorageServer, check remainingBudgetForAuditTasks

* add trace event of audit progress

* address comments

* code clean up

* make dispatch and schedule audit be more clear

* make dispatch and schedule audit be more clear 2

* make dispatch and schedule audit be more clear 3

* address comments
2023-05-15 16:19:41 -07:00
Jingyu Zhou
9675f13ba9 Reduce STORAGE_FETCH_KEYS_DELAY to speed up data movement
The buggified value of 100s is too long and causes consistency check failures.
2023-05-15 13:56:08 -07:00
neethuhaneesha
92d1da79a9 RocksDB WAL archive options. (#10211) 2023-05-10 21:36:18 -07:00
Zhe Wang
8559d4f1a8 Adding cleanup of old audit metadata (#10137)
* clean up old audit metadata

* change comments

* fix audit cleanup rule to match the PR description and reduce timeout of auditStorageCorrectness in tester

* address comment

* clear audit metadata should not throw error

* cleanup progress metadata by type

* control number of AuditStatistic events

* carefully persist new audit state

* add unit tests and fix issues

* cleanup

* allow concurrent audit runs for different types and fix some bugs in auditutl

* fix ci issue and nits
2023-05-10 19:32:04 -07:00
Yao Xiao
182d2cafbf Log physical shard size in KVS 2023-05-10 12:54:59 -07:00
Yao Xiao
2d1b5d02e2 Range deletion memory usage improvements (#10048) 2023-05-10 10:23:01 -07:00
Yao Xiao
fa101e1e11 Log background error and add knobs for memory tuning. (#9841)
* error logger

* recovery mode
2023-05-10 10:23:01 -07:00
Yao Xiao
fa821c0ed6 Cherrypick #9746 2023-05-10 10:23:01 -07:00
Yao Xiao
abd45c4486 Cherrypick #9665 2023-05-10 10:23:01 -07:00
Josh Slocum
6be0c74d5b Adding explicit blob range mutation log to handle large number of ranges (#10174)
* Adding explicit blob range mutation log to handle large number of ranges

* fixing ide build
2023-05-09 11:30:04 -05:00
Evan Tschannen
c8e8505101 buggified max_shards_on_large_teams (#10105)
* buggified max_shards_on_large_teams, and had the consistency scan verify the proper number of shards have been overreplicated

* fix: when restarting the data distributor, do not allow more than max_shards_on_large_teams shards to be marked as healthy
2023-05-08 16:56:42 -07:00
Ankita Kejriwal
63354f68ad Update knob values for Storage Quota polling intervals (#10154) 2023-05-08 10:06:29 -07:00
neethuhaneesha
8b2f3bcfdc Rocksdb paranoid file checks knob. 2023-05-04 11:49:38 -07:00
Xiaoxi Wang
91de1c880e remove PrepareBlobRestore waiting for inFlight moving 2023-05-03 14:43:23 -07:00
Xiaoxi Wang
d7c089fd13 add timeout to blob migrator getReply to tackle recovery during preparation 2023-05-03 14:43:23 -07:00
Josh Slocum
22155c84f4 adding logic to disable splitting within a truncated tuple, and validating it in test (#10106) 2023-05-03 10:23:46 -05:00
Zhe Wu
fffdfa5b3d Increase MAX_STORAGE_COMMIT_TIME to be in line with LOW_PRIORITY_DURABILITY_LAG 2023-05-02 11:12:52 -07:00
Zhe Wang
d6e7b5f736 Audit storage: validate consistency of replica and shard location metadata (#9628)
* Implemented AuditUtils.actor.cpp

Moved AuditUtils to fdbserver/

* Persist AuditStorageState.

* Passed persisted AuditStorageState test.

* Added audit_storage_error to indicate a corruption is caught.

Throw/Send audit_storage_error when there is a data corruption.

Added doAuditStorage() for resuming Audit.

* Load and resume AuditStorage when DD restarts.

* Generate audit id monotonically.

* Fixed minor issue AuditId/Type was not set.

* Adding getLatestAuditStates.

* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.

* Added `audit_storage` fdbcli command.

* fmt.

* Fixed null shared_ptr issue.

* Improve audit data.

* Change DDAuditFailed to SevWarn.

* Sev.

* set SERVE_AUDIT_STORAGE_PARALLELISM to 1.

* Moved AuditUtils* to fdbclient/.

* Added getAuditStatus fdbcli command.

* Refactor audit storage fdb cli commands.

* Added auditStorage in sim.

* Cleanup.

* Resolved comments.

* Resolved comments.

* Added SystemData for metadata audit.

Refactored audit workflow to make sure all sub-tasks are executed w/o
early exit.

* Improvements.

* Persisted Failed state after too many retries.

* Added retryCount for resumeAuditStorage().

* resolving conflict.

* Resolved conflicts.

* allow-merged-to-run

* add timeout to audit client

* fmt

* validate replica

* add audit serverKey

* address comments and fmt

* fix audit_storage_exceeded_request_limit

* fix segfault in getLatestAuditStatesImpl

* fix bugs

* remove timeout from workload

* fix bugs

* audit local view of shard assignment

* fmt

* fix-stuck-issue-and-make-dd-audit-storage-self-retry

* fix timeout

* fix timeout

* fix bugs and cleanup

* fix nit

* change name state to coreState for audit metadata

* address comments

* code clean

* fmt

* setup debug

* cleanup

* clean up

* code cleanup

* code clean

* remove tmp file

* fmt

* trace portion of shards that are of anonymous physical shard

* remove unnecessary actor cleanup

* do not give up when tr is too old

* address commits

* refactor

* clean

* fmt

* fix-command-help-text

* fix-auditstate-restore-and-enable-restore-to-metadata-audit

* address comments

* fmt

* debug and improve efficiency of resume audit

* small change

* fix audit cli

* bypass completed audit when dd restart

* fix auditStorageCommandActor

* make mismatch key range more visible

* address comments

* allow the local shard metadata check to make progress via retries

* address comments

* address comments

* partition location metadata validation by range and server

* unset MIN_TRACE_SEVERITY

* address comments and SS auto proceed until failed then notify dd

* persistNewAuditState should checkMoveKeysLock

* audit storage location metadata partitioned by range and move shard assignment history def to the end of SS structure

* code cleanup

* fix error message in metadata validation

* fix registerAuditsForShardAssignmentHistoryCollection input for local shard validation

* add comments to code and add guard to make sure the SS audit does not proceed automatically for many times without being notified by DD --- to support audit cancellation later

* fix coalesceRangeList

* replace rangeOverlapping func with operator and use struct instead of complicated type for return value of getKeyServer/serverKey/shardInfo

* simplify shard assignment history

* shardAssignmentRecordRequests should be unordered_map

* address comments, make trackShardAssignment simple, make anyChildAuditFailed cover all audit children, keep only one audit actor run at a time on each SS

* only run validate shard info one at a time; other audit types do not have this limitation

---------

Co-authored-by: He Liu <heliu05023@gmail.com>
Co-authored-by: He Liu <heliu@apple.com>
Co-authored-by: Zhe Wang <zhewang@Zhes-Laptop.local>
2023-05-01 10:35:52 -07:00
Steve Atherton
16d8b1d1f9 Merge pull request #9949 from sfc-gh-etschannen/fix-shard-count
fix: do not let too many shards use large teams
2023-04-29 23:50:49 -07:00
neethuhaneesha
53fe07a709 Enabling auto_prefix_mode to true in rocksdb. (#10050) 2023-04-27 12:11:48 -07:00
He Liu
e11f804f96 ShardedRocks checkpoint/restore for physical shard move (#9752)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test, where the fetchCheckpoint target range was set incorrectly.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* Refactored ShardedRocks checkpoint/restore for psm.

* Complete ShardedRocks::restore.

* dismiss operation_obsolete, and throw actor_cancelled.

* Validate checkpoint when !asKeyValues.

* fmt.

* Don't read from uninitialized physical shard.

* Resolved comments.

* cleanup.

* Added verify_checksum_before_restore for ShardedRocks.

* Added ShardedRocksDB checkpoint/restore unit test.

* Populate CheckpointMetaData::dir in RocksDB.

* Addressed comments.
2023-04-26 09:17:18 -07:00
sfc-gh-tclinkenbeard
9639192a88 Add GLOBAL_TAG_THROTTLING_REPORT_ONLY knob 2023-04-21 11:13:42 -07:00
Nim Wijetunga
22ba818133 Prevent Encryption Key Refresh for Non-Latest Keys (#9959)
prevent refresh for old encryption keys
2023-04-18 09:43:24 -07:00
sfc-gh-tclinkenbeard
7076c050d2 Decrease MIN_TAG_*_PAGES_RATE defaults 2023-04-17 15:16:29 -07:00
sfc-gh-tclinkenbeard
a7217055c8 Update default value of MIN_TAG_WRITE_PAGES_RATE to match default value of MIN_TAG_READ_PAGES_RATE 2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard
4be3c3e7ff Fix initialization of SERVER_KNOBS->MIN_TAG_READ_PAGES_RATE 2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard
0f0eb7c2b6 Add GLOBAL_TAG_THROTTLING_TRACE_INTERVAL knob 2023-04-17 10:09:38 -07:00
Evan Tschannen
12e507e06c rename knobs 2023-04-13 09:40:37 -07:00
Xiaoxi Wang
f7061debde remove unused CPU knob; add comments for EligibilityCounter 2023-04-12 09:33:05 -07:00
Xiaoxi Wang
b0fe14aed5 getTeam based on EligiblityCount 2023-04-12 09:33:05 -07:00
Xiaoxi Wang
7ca44124d4 explain what pivot ratio means; fix the knob assertion 2023-04-12 09:33:05 -07:00
Xiaoxi Wang
5648f827a0 adjust CPU pivot knobs to hack simulation test 2023-04-12 09:33:05 -07:00
Xiaoxi Wang
31fd4bb272 consider consistent low CPU status for 5min 2023-04-12 09:33:05 -07:00
Xiaoxi Wang
490a7b534a add getAverageCPU method; delete default value of GetTeamRequest
arguments (solve conflicts)
2023-04-12 09:33:05 -07:00
Zhe Wu
10a6f3d2d0 Merge pull request #9890 from halfprice/zhewu/log-router-gray-failure
Gray failure detects disconnected remote log router and recover high DC lag
2023-04-07 16:25:11 -07:00
Hui Liu
396f89a3f4 Cleanup stale disk files for double recruitment of storage server (#9794) 2023-04-06 12:13:59 -07:00