* Multiple improvements to AuditStorages (#10685)
* remove dangerous DDAudit assert, add AuditRate knob, add progress check when an ssshard audit completes, add progress check for ssshard in fdbcli
* throttle progress check for ssshard
* fix getAuditProgressByServer
* fix trace event for ss audit
* use the name checkMoveKeysLockForAudit
* add new scheduleAuditLocationMetadata
* address comments
* shorten progress summary for ssshard
* simplify getAuditProgressByServer in fdbcli
* Audit storage for specific engine (#10781)
* audit storage for specific engine
* fix getStorageType
* fix budget of skipAuditOnRange
* fix budget in scheduleAuditOnRange
* fix CI error
* improve trace events
* address comments
* Audit location metadata in DD (#10820)
* Audit location metadata in DD
* nits
* Fix auditStorage: audit task should not retry if the task is issued by an outdated DD (copied from main PR #10844)
* Implemented AuditUtils.actor.cpp
Moved AuditUtils to fdbserver/
* Persist AuditStorageState.
* Passed persisted AuditStorageState test.
* Added audit_storage_error to indicate a corruption is caught.
Throw/Send audit_storage_error when there is a data corruption.
Added doAuditStorage() for resuming Audit.
* Load and resume AuditStorage when DD restarts.
* Generate audit id monotonically.
* Fixed minor issue where AuditId/Type was not set.
* Adding getLatestAuditStates.
* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.
* Added `audit_storage` fdbcli command.
* fmt.
* Fixed null shared_ptr issue.
* Improve audit data.
* Change DDAuditFailed to SevWarn.
* Sev.
* set SERVE_AUDIT_STORAGE_PARALLELISM to 1.
* Moved AuditUtils* to fdbclient/.
* Added getAuditStatus fdbcli command.
* Refactor audit storage fdb cli commands.
* Added auditStorage in sim.
* Cleanup.
* Resolved comments.
* Resolved comments.
* Added SystemData for metadata audit.
Refactored audit workflow to ensure all sub-tasks are executed without
early exit.
* Improvements.
* Persisted Failed state after too many retries.
* Added retryCount for resumeAuditStorage().
* resolving conflict.
* Resolved conflicts.
* allow-merged-to-run
* add timeout to audit client
* fmt
* validate replica
* add audit serverKey
* address comments and fmt
* fix audit_storage_exceeded_request_limit
* fix segfault in getLatestAuditStatesImpl
* fix bugs
* remove timeout from workload
* fix bugs
* audit local view of shard assignment
* fmt
* fix-stuck-issue-and-make-dd-audit-storage-self-retry
* fix timeout
* fix timeout
* fix bugs and cleanup
* fix nit
* rename state to coreState in audit metadata
* address comments
* code clean
* fmt
* setup debug
* cleanup
* clean up
* code cleanup
* code clean
* remove tmp file
* fmt
* trace the portion of shards that belong to the anonymous physical shard
* remove unnecessary actor cleanup
* do not give up when tr is too old
* address comments
* refactor
* clean
* fmt
* fix-command-help-text
* fix-auditstate-restore-and-enable-restore-to-metadata-audit
* address comments
* fmt
* debug and improve efficiency of resume audit
* small change
* fix audit cli
* bypass completed audits when DD restarts
* fix auditStorageCommandActor
* make mismatched key range more visible
* address comments
* ensure the local shard metadata check can make progress through retries
* address comments
* address comments
* partition location metadata validation by range and server
* unset MIN_TRACE_SEVERITY
* address comments; SS audit proceeds automatically until it fails, then notifies DD
* persistNewAuditState should checkMoveKeysLock
* audit storage location metadata partitioned by range and move shard assignment history def to the end of SS structure
* code cleanup
* fix error message in metadata validation
* fix registerAuditsForShardAssignmentHistoryCollection input for local shard validation
* add code comments and add a guard so that the SS audit does not proceed automatically too many times without being notified by DD (to support audit cancellation later)
* fix coalesceRangeList
* replace rangeOverlapping func with operator and use struct instead of complicated type for return value of getKeyServer/serverKey/shardInfo
* simplify shard assignment history
* shardAssignmentRecordRequests should be unordered_map
* address comments, make trackShardAssignment simple, make anyChildAuditFailed cover all audit children, keep only one audit actor run at a time on each SS
* only run one shard info validation at a time; other audit types do not have this limitation
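The bullets on fixing coalesceRangeList and replacing rangeOverlapping with an operator both concern basic key-range arithmetic. A minimal sketch of that arithmetic follows, assuming half-open [begin, end) ranges over lexicographically ordered string keys. KeyRangeSketch, overlaps() and coalesceRanges() are hypothetical stand-ins for illustration only, not the FoundationDB KeyRangeRef API or the actual coalesceRangeList implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a key range: half-open interval [begin, end)
// over byte-string keys, compared lexicographically.
struct KeyRangeSketch {
    std::string begin;
    std::string end;

    // Two half-open ranges overlap iff each one starts before the other ends.
    bool overlaps(const KeyRangeSketch& other) const {
        return begin < other.end && other.begin < end;
    }
};

// Merge overlapping or adjacent ranges into a minimal list sorted by begin key.
std::vector<KeyRangeSketch> coalesceRanges(std::vector<KeyRangeSketch> ranges) {
    std::sort(ranges.begin(), ranges.end(),
              [](const KeyRangeSketch& a, const KeyRangeSketch& b) { return a.begin < b.begin; });
    std::vector<KeyRangeSketch> result;
    for (const auto& r : ranges) {
        if (r.begin >= r.end) {
            continue; // skip empty ranges
        }
        if (!result.empty() && r.begin <= result.back().end) {
            // Overlapping or touching the last merged range: extend it.
            result.back().end = std::max(result.back().end, r.end);
        } else {
            result.push_back(r);
        }
    }
    return result;
}
```

Adjacent ranges such as [a, b) and [b, c) are merged by coalescing even though overlaps() reports false for them, because their union is still one contiguous range.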
---------
Co-authored-by: He Liu <heliu05023@gmail.com>
Co-authored-by: He Liu <heliu@apple.com>
Co-authored-by: Zhe Wang <zhewang@Zhes-Laptop.local>
* Test disabling audit for sims.
* Cleanup.
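The "Generate audit id monotonically" bullet can be illustrated by taking the highest persisted id and adding one. The sketch below assumes a hypothetical std::map stand-in for the persisted audit states of a single audit type; AuditStateMapSketch, nextAuditId() and persistNewAuditSketch() are illustrative names, not FoundationDB functions, and the real persisted layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the persisted audit states of one
// audit type, keyed by audit id (the value is a placeholder description).
using AuditStateMapSketch = std::map<uint64_t, std::string>;

// Next audit id = highest persisted id + 1, starting from 1 when empty.
// std::map keeps keys sorted, so rbegin() yields the maximum id.
uint64_t nextAuditId(const AuditStateMapSketch& persisted) {
    return persisted.empty() ? 1 : persisted.rbegin()->first + 1;
}

// Record a new audit under a freshly generated id and return that id.
uint64_t persistNewAuditSketch(AuditStateMapSketch& persisted, const std::string& range) {
    uint64_t id = nextAuditId(persisted);
    persisted.emplace(id, range);
    return id;
}
```

In a real system, reading the maximum id and writing the new state would need to happen in one transaction so that two concurrent writers cannot issue the same id; this single-threaded sketch ignores that concern.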
Co-authored-by: He Liu <heliu@apple.com>