Commit Graph

417 Commits

Author SHA1 Message Date
Zhe Wang
ca4ab1eca9 Fix traceTooManyEvents and externalTimeouts in BulkLoad test (#11769) 2024-11-11 11:05:43 -08:00
Dan Lambright
4b3525f8a3 Do not test bulk loading and version vector simultaneously (#11759)
Co-authored-by: Dan Lambright <hlambright@apple.com>
2024-11-06 13:39:21 -08:00
Dan Lambright
5716e4e7c2 Do not check for PROXY_USE_RESOLVER_PRIVATE_MUTATIONS in rangeLockEnabled (#11752)
* Do not check for PROXY_USE_RESOLVER_PRIVATE_MUTATIONS in rangeLockEnabled

* Don't modify knob proxy_use_resolver_private_mutations in range lock tests

---------

Co-authored-by: Dan Lambright <hlambright@apple.com>
2024-11-05 12:13:18 -05:00
Dan Lambright
317956ee14 disable version vector with range lock tests (#11746)
* disable version vector with range lock tests

* turn off rangeLock if versionvector is on (#11747)

---------

Co-authored-by: Dan Lambright <hlambright@apple.com>
Co-authored-by: Zhe Wang <zhe.wang@wustl.edu>
2024-10-31 21:07:01 -04:00
Zhe Wang
42e17d8bd1 BulkLoading Use RangeLock (#11741)
* use range lock in bulk load

* refactor BulkLoading workload and nits

* add background traffic

* nits

* address comments
2024-10-31 12:58:13 -07:00
Zhe Wang
43446204ed Database Per-Range Lock (#11693)
* range lock framework

* improve the framework

* persist to txnStateStore

* fix bugs

* code clean

* code clean

* bug fix

* address comments

* add complex test workload and fix bugs found by the workload

* add workload correctness check and fix bugs

* code clean up

* add random range lock injection

* fix bugs in RandomRangeLock.actor.cpp

* enable random range lock injection in general workloads

* add rangelockcycle test

* disable random range lock in backup workloads

* nits

* add range lock ownership concept

* enable lock ownership to rangeLock

* api deal with tenant

* fix CI

* add test for multiple rangeLock owners

* nits

* address comments and renaming

* address comments
2024-10-23 16:25:56 -07:00
Zhe Wang
7d95b87483 improve the probability that sharded rocksdb is selected in simulation tests 2024-10-10 09:48:23 -07:00
He Liu
274ae7dacd Remove storage engine type from DataLossRecovery test. (#11655) 2024-09-12 13:31:03 -07:00
Dan Lambright
eaf08c2414 Disable version vector for tenant test. (#11644)
Co-authored-by: Dan Lambright <hlambright@apple.com>
2024-09-10 14:02:05 -04:00
Zhe Wang
5ee0db13e6 Fix external timeout with ShardedRocksDB and re-enable ShardedRocksDB in simulation tests (#11638)
* speedup sharded rocksdb in simulation

* re-enable shardedrocksdb and disable physical shard move
2024-09-08 10:57:55 -07:00
Zhe Wang
6c502e9707 Solve RocksDB external timeout error and re-enable RocksDB simulation tests (#11577)
* init knob tune

* include rocksdb in tests

* probably reuse rocksdb iterator in simulation

* clear unnecessary knob change
2024-08-16 12:37:18 -07:00
Zhe Wang
dcebf1a9bc Add extraStorageMachineCountPerDC config to simulation (#11529) 2024-07-26 09:05:41 -07:00
Zhe Wang
74990e44bd Bulk Loading Framework (#11369) 2024-07-23 14:57:28 -07:00
neethuhaneesha
a7498b50ad Excluding some sharded rocksdb tests in simulation 2024-05-29 13:28:31 -07:00
Jingyu Zhou
0b64d21c9c Disable sharded rocks for more simulation tests
Found in nightly and can't reproduce the ExternalTimeout error.
2023-09-27 09:59:40 -07:00
Jingyu Zhou
b2d724f4a5 Disable sharded rocks for more sim tests
Saw ExternalTimeout test failures for these.
2023-09-11 10:57:54 -07:00
Xiaoge Su
5a12d12774 fixup! Enable sharded RocksDB storage engine for PhysicalShardMove 2023-08-28 14:56:39 -07:00
He Liu
62859f7151 Disabled ShardedRocks for some sim tests due to timeout issue. (#10778) 2023-08-11 14:34:48 -07:00
Hui Liu
7c8c24bc8d blob restore : Log and skip data copy if we miss data for a certain tenant (#10621) 2023-07-19 09:52:30 -07:00
Ata E Husain Bohra
7779c908b3 EaR: Remove usage of ENABLE_CONFIGURABLE_ENCRYPTION knob (#10570)
Description

Given that configurable encryption has been checked in and tested via
simulation for more than a month, and to avoid the penalty of accessing
KNOBS in the inline commit path, this patch retires the KNOB and makes
configurable encryption the default EaR mode for FDB.

BlobCipher still supports the old header format and encryption semantics;
the dead code will be removed in a follow-up PR.

Testing

devRunCorrectness - 100K
2023-06-30 17:48:09 -07:00
A.J. Beamon
155c03f6fe Decrease transaction rate of backup correctness clean to speed it up 2023-06-23 16:14:03 -07:00
Ata E Husain Bohra
bfbf8cd053 EaR: Update KMS URL refresh policy and fix bugs (#10382)
* EaR: Update KMS URL refresh policy and fix bugs

Description

RESTKmsConnector implements discovery and refresh semantics, i.e.
on bootstrap it discovers the KMS URLs and then periodically refreshes
them (to handle server upgrade scenarios). The current implementation
caches the URLs in a min-heap. While serving a request, the actor
pops elements from the min-heap and attempts to connect to each server;
on failure, the URL is temporarily stored in a stack, and at the end of
request processing the stack is merged back into the heap.
The code doesn't work as expected when multiple requests
consume the heap, causing the following issues:
1. The min-heap retains old URLs that the latest refresh replaced (stack merge).
2. The URL discovery file is read more often than expected, because multiple
requests can empty the heap, causing the code to read the URLs from the file.

The patch proposes the following policy to cache URLs and maintain their priority:
1. Unresponsiveness penalty: a flaky KMS connection or overload can cause
requests to time out or fail; each such instance updates the unresponsiveness
penalty of the associated URL context. The penalty is time-bound and
decays with time.
2. Cached URLs are sorted once a failure is encountered; the priority order
is:
2.1. Server(s) with an unresponsiveness penalty are least preferred.
2.2. Server(s) with high total failures are less preferred.
2.3. Server(s) with high total malformed responses are less preferred.
3. Update RESTClient to throw 'retryable' errors up to the client, such as
'connection_failed' and/or 'timeout'.
4. Extend RESTUrl to support the IPv6 format.

Testing

RESTUnit - 100K (new test added for coverage)
devRunCorrectness
2023-06-14 08:06:39 -07:00
A.J. Beamon
4e49f5b26d Merge pull request #10435 from sfc-gh-jslocum/disable_low_value_tests
removing/disabling explicit test files for quick running unit tests, …
2023-06-08 08:19:22 -07:00
Xiaoxi Wang
85a9f01554 fix format issue; combine mock DD tests into 1 toml file; temporarily
disable add storage servers in MockReadWrite test
2023-06-07 22:01:33 -07:00
Xiaoxi Wang
c307795301 Fix merge shard bug in mock finishMoveKeys; And testRawFinishMoveKeys bug in workload IDDTxnProcessorApiCorrectness.actor.cpp 2023-06-07 22:01:33 -07:00
Xiaoxi Wang
b09483a644 addStoragePerProcess method. Add testClass for mock dd test. 2023-06-07 22:01:33 -07:00
Xiaoxi Wang
e139f5bf90 Correctly handle buggify errors in MGSWaitStorageMetrics 2023-06-07 22:01:33 -07:00
Xiaoxi Wang
b444eb1f22 Make DataDistributor use the configuration object in DDSharedContext; Change Mock test config 2023-06-07 22:01:33 -07:00
Xiaoxi Wang
ac16dbd0d8 Fix mock DD incompatible places 2023-06-07 22:01:33 -07:00
Xiaoxi Wang
bef639ab81 change how MockDataDistributor starts 2023-06-07 22:01:33 -07:00
Josh Slocum
07dd44d659 removing/disabling explicit test files for quick running unit tests, and converting actor fuzz to a unit test 2023-06-07 16:00:46 -05:00
He Liu
8ad7ec6fdf Psm ss (#9817)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test, where the fetchCheckpoint target range was set incorrectly.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* Refactored ShardedRocks checkpoint/restore for psm.

* Complete ShardedRocks::restore.

* dismiss operation_obsolete, and throw actor_cancelled.

* Validate checkpoint when !asKeyValues.

* fmt.

* Don't read from uninitialized physical shard.

* Resolved comments.

* cleanup.

* Added verify_checksum_before_restore for ShardedRocks.

* Added ShardedRocksDB checkpoint/restore unit test.

* Populate CheckpointMetaData::dir in RocksDB.

* Rename MovingIn as Adding.

* Added StorageServerUtils.

* Added physical shard move in SS.

* Fix on ApplyMetaData, doFetchFile error handling etc.

* Debugging incorrect shard size.

* Create/delete checkpoints only when Physical shard move is enabled.

* Added back SHARD_ENCODE_LOCATION_METADATA.

* Fixed bytesSample incorrect issue.

Essentially, CheckpointRocksDBCF is now dedicated to key-value based checkpoints; a new format will need to be added for file-based checkpoints.

* Cleanup.

* Cleanup & compile rocksdb with 8.1 branch.

* clean up.

* clean up.

* Allowed request_maybe_delivered error type in FetchShard.

* Added FDBRocksDBVersion.h.

* Fixed stuck fetchShard.

* Don't create checkpoint on TSS.

* Upgrade to RocksDB 8.1.1

* Cleanup.

* Fixed accidentally deleted db_path and name fields.

* Improved trace event.

* Removed redundants from previous ShardedrocksDB.

* Cleanup.

* cleanup.

* cleanup.

* rename `state`.

* Cleanup.

* Removed excessive TraceEvent.

* Fixed shardMap race condition on different threads
* Added *Stats, logging data move rates.
* Added `DD_PHYSICAL_SHARD_MOVE_PROBABILITY` to support hybrid data move.

* Resolved comments.

* fmt.

* Use physical shard move in PhysicalShardMoveTest.

* Enforce physical-shard-move for PhysicalShardMoveTest.

* fmt
2023-05-23 11:18:35 -07:00
Hui Liu
7ca13d8f9c support blob restore in fdbrestore (#10248) 2023-05-19 14:45:14 -07:00
He Liu
a5f639f859 Fix psm test (#10273) 2023-05-18 14:54:26 -07:00
Ata E Husain Bohra
e25b9ff686 EaR: REST based Simulated KMS Vault request handler interface (#10240)
* EaR: REST based Simulated KMS Vault request handler interface

Description

  diff-1: Address review comments
             Improve unit test case coverage
  diff-2: Extend RESTKmsConnectorUtil to generate HTTP::Header

EaR simulation testing is currently driven through the SimKmsConnector
interface, which exposes endpoints invoked directly by EKP to fetch
encryption keys. This approach leaves the RESTKms communication
path untested. The FDB codebase was recently extended with an HTTPServer
interface, whose absence had been the gap preventing end-to-end testing of
EaR code.

The patch proposes the following changes:
1. Refactor RESTKmsConnector to move common code and definitions
to RESTKmsConnectorUtil namespace
2. Introduce RESTSimKmsVault accepting HTTP format requests and
providing appropriate HTTP response.

Testing

RESTUnit          100K + 5k valgrind
devRunCorrectness 100K

2023-05-17 12:38:09 -07:00
Josh Slocum
9c081f8a08 Sim http server improvements (#10217)
* Passes existing tests

* adding http unit test for wrong md5 sum

* Added new HTTPKeyValueStore workload to test long-running http clients

* fixing warnings
2023-05-12 16:33:32 -05:00
Hui Liu
53e68065e7 Support blob manifest backup for fdbbackup cmdline (#10091) 2023-05-08 16:07:22 -07:00
Josh Slocum
a4dffa087a Adding Simulated HTTP Server and refactoring HTTP code (#10112)
* Adding Simulated HTTP Server and refactoring HTTP code

* fixing formatting

* fixing merge conflicts

* fixing more merge conflicts

* code review feedback

* changing reference counted interface

* more fixes

* fixing ide build i guess
2023-05-05 12:19:17 -05:00
Hui Liu
bd8c15634e Create blob connection per tenant for blob restore (#10070) 2023-05-01 10:26:55 -07:00
Xiaoxi Wang
4b129b3ef4 Fix StorageMetrics wait timeout 2023-04-26 14:23:09 -07:00
Xiaoxi Wang
617a7bcd23 enable MockDDTrackerWorkload 2023-04-26 14:23:09 -07:00
Nim Wijetunga
22ba818133 Prevent Encryption Key Refresh for Non-Latest Keys (#9959)
prevent refresh for old encryption keys
2023-04-18 09:43:24 -07:00
Josh Slocum
d37b2b0a76 Adding BlobFailureInjection workload (#9833)
* Adding BlobFailureInjection workload

* fixing formatting
2023-04-06 15:10:36 -05:00
Zhe Wu
074db3c646 Update SpecialKeySpaceRobustness config 2023-03-31 21:59:12 -07:00
Ata E Husain Bohra
0e720634f3 EaR: Allow RESTKmsConnector validation token newline char sanitization (#9831)
Description

The patch proposes the ability to remove newline characters from KMSConnector
validation tokens.

Testing

RESTKmsConnectorUnit.toml
2023-03-29 16:56:46 -07:00
Hao Fu
b205862798 Fix finishedQueries metric, add metrics reporting in GetMappedRange t… (#9796)
* Fix finishedQueries metric, add metrics reporting in GetMappedRange test [release-7.1] (#9785)

* Fix finishedQueries metric, add metrics reporting in GetMappedRange test

* refactor to make format work

* resolve comments

* Fix more comments

* Fix bugs and change running time of test

* use double for options
2023-03-27 18:17:22 -07:00
Zhe Wu
a278e72b3c Update SpecialKeySpaceRobustness.toml to include extra processes to make exclusion work 2023-03-24 18:49:19 -07:00
Hui Liu
499a4cab93 Add correctness test for point-in-time restore (#9185) 2023-03-14 08:56:34 -07:00
Nim Wijetunga
218ed4519f Strengthen Snapshot Backup/Restore Asserts (#9552)
strengthen backup/restore asserts for encryption
2023-03-08 15:24:02 -08:00
Ata E Husain Bohra
1c997acfd2 Remove Test pattern from RandomUnit Test
Description

Remove Test pattern from RandomUnit Test

Testing
2023-03-08 10:00:14 -08:00