Commit Graph

808 Commits

Author SHA1 Message Date
Zhe Wang
ca4ab1eca9 Fix traceTooManyEvents and externalTimeouts in BulkLoad test (#11769) 2024-11-11 11:05:43 -08:00
Syed Paymaan Raza
7a5f61cc65 Address feedback in PR #11753 (#11757) 2024-11-05 20:30:40 -08:00
Zhe Wang
ab9ce0df15 Cherrypick recent DD changes from release-7.3 (#11754)
* [Release-7.3] TeamRedundant and TeamUnhealthy data moves choose best destination with probability (#11668)

* team redundant and unhealthy data moves can choose best dest with probability

* nits

* nits

* enable wantTrueBestIfMoveout

* fix getteam stuck

* [Release-7.3] Delay team remover when space pivot is low (#11665)

* [Release-7.3] Validate ServerTeam count per server in simulation (#11678)

* validate server team count in simulation

* change naming (not relevant to the PR title)

* address comments and add a new trace event BuildTeamsLastBuildTeamsFailed triggered when buildTeam failed
2024-11-05 18:52:40 -08:00
Syed Paymaan Raza
84fb8f843c Gray failure allows storage servers to complain (#11753) 2024-11-05 16:53:02 -08:00
Syed Paymaan Raza
7d529ec724 Invalidate gray failure complaints from excluded processes (#11749) 2024-11-04 13:49:25 -08:00
neethuhaneesha
64030de741 Max range deletions knob update to prevent OOMs. (#11738) 2024-10-28 11:45:44 -07:00
Yao Xiao
afbcf5ef5f Enable backward read in consistency checker. (#11721)
* Do backward reads in consistency checker.

* Add knob for read options in consistency checker.
2024-10-26 09:57:36 -07:00
Zhe Wang
43446204ed Database Per-Range Lock (#11693)
* range lock framework

* improve the framework

* persist to txnStateStore

* fix bugs

* code clean

* code clean

* bug fix

* address comments

* add complex test workload and fix bugs found by the workload

* add workload correctness check and fix bugs

* code clean up

* add random range lock injection

* fix bugs in RandomRangeLock.actor.cpp

* enable random range lock injection in general workloads

* add rangelockcycle test

* disable random range lock in backup workloads

* nits

* add range lock ownership concept

* enable lock ownership to rangeLock

* api deal with tenant

* fix CI

* add test for multiple rangeLock owners

* nits

* address comments and renaming

* address comments
2024-10-23 16:25:56 -07:00
Syed Paymaan Raza
5f480947ad [fdbserver] Gray failure and simulator improvements related to remote processes (#11717)
* [fdbserver][simulator] Add remoteDesiredTLogCount option

* [fdbserver][simulator] Allow explicitly specifying number of stateless classes in each DC

* [fdbserver][gray_failure] RemoteTLog lagging SS simulation test

* [fdbserver][gray_failure] Consider remote processes + CC inter/intra latency awareness

* [fdbserver][cc] Make processInSameDC O(1)
2024-10-23 13:15:29 -07:00
Yao Xiao
7290369aac Use a single iterator pool for all physical shards. (#11699)
* Rewrite iterator pool.

* simulation fix
2024-10-15 17:28:54 -07:00
Jingyu Zhou
7c6c8ae095 Merge pull request #11709 from yao-xiao-github/knob-main
Update sharded rocksdb knobs.
2024-10-10 15:11:43 -07:00
Yao Xiao
6a87e6042f update knobs 2024-10-10 13:51:50 -07:00
Zhe Wang
fcb0030883 add probability that Memory gets selected 2024-10-10 11:24:40 -07:00
Zhe Wang
7d95b87483 improve the probability that sharded rocksdb is selected in simulation tests 2024-10-10 09:48:23 -07:00
Syed Paymaan Raza
0946f49579 [gray-failure] Remove CC_PAUSE_HEALTH_MONITOR (#11675) 2024-10-07 11:59:19 -07:00
neethuhaneesha
5637f23231 Increasing minimum age to wiggle to avoid re-wiggling migrated rocksdb storage servers (#11683) 2024-09-26 12:59:37 -07:00
Yao Xiao
83dd1f202e Fix block cache size error and improve logging. (#11681) 2024-09-24 13:08:31 -07:00
Jingyu Zhou
b872a8ea57 Merge pull request #11637 from neethuhaneesha/direct_io_enabling
Enabling rocksdb direct_io and wiggle knobs
2024-09-19 10:12:14 -07:00
Syed Paymaan Raza
e1c7cdd3e3 [CC+Worker] Enable WORKER_HEALTH_MONITOR related knobs in simulation tests (#11657) 2024-09-12 23:27:47 -07:00
Yao Xiao
289d02899f Add knobs for caching index blocks. (#11650) 2024-09-11 22:58:26 -07:00
neethuhaneesha
8ff623e523 Enabling rocksdb direct_io and wiggle knobs 2024-09-11 09:28:44 -07:00
Jingyu Zhou
d730db521a Fix a Valgrind error (#11645)
buggifyShortReadWindow used uninitialized variable ENABLE_VERSION_VECTOR.
2024-09-10 15:29:36 -04:00
Dan Lambright
5eafd46351 Disable version vector on batches with backed up mutations (#11634)
Co-authored-by: Dan Lambright <hlambright@apple.com>
2024-09-09 21:18:59 -04:00
Zhe Wang
5ee0db13e6 Fix external timeout with ShardedRocksDB and re-enable ShardedRocksDB in simulation tests (#11638)
* speedup sharded rocksdb in simulation

* re-enable shardedrocksdb and disable physical shard move
2024-09-08 10:57:55 -07:00
Zhe Wang
3305d2e3ee fix storage engine selection (#11586) 2024-08-20 09:24:30 -07:00
Zhe Wang
6c502e9707 Solve RocksDB external timeout error and re-enable RocksDB simulation tests (#11577)
* init knob tune

* include rocksdb in tests

* probably reuse rocksdb iterator in simulation

* clear unnecessary knob change
2024-08-16 12:37:18 -07:00
Syed Paymaan Raza
c3e7542cda Update end year in copyright header 2024-08-02 09:40:11 -07:00
Zhe Wang
a245b9622c Fix a couple of simulation failures (#11543)
* Add usable region check per shard for encode shard location metadata

* nits

* nit

* address comments

* fix SS assertion failed for a wrong data move type generated by an old binary which does not encode the data move type in the data move id

* fix ClientTransactionProfilingCorrectness 7.3 upgrade test considering physical shard move compatibility

* code clean

* split CycleTestRestart in upgrading test from release-7.3

* address comments

* nits
2024-08-01 22:32:32 -07:00
Zhe Wang
def1f0edc8 fix dd stuck due to long fetch shard (#11537) 2024-07-29 21:35:49 -07:00
Zhe Wang
74990e44bd Bulk Loading Framework (#11369) 2024-07-23 14:57:28 -07:00
Dan Lambright
1e834f84c8 Add dynamic knob to disable gray failure recoveries. (#11509)
Co-authored-by: Dan Lambright <hlambright@apple.com>
2024-07-20 14:35:21 -04:00
Yao Xiao
c630fa2296 Fix wait (#11474) 2024-07-18 11:28:34 -07:00
Vishesh Yadav
6cd5ad2ffe Add code back 2024-07-10 18:52:14 -07:00
Vishesh Yadav
591efa1d1b Remove swift 2024-06-29 11:11:48 -07:00
Sreenath Bodagala
d7eb028b2a Enable replica consistency check on data movement (#11415)
* - Enable replica consistency check on data movement (and, randomly, on all reads)

* - Address PR review comments
2024-06-17 17:07:32 -04:00
Yao Xiao
1791d07be1 Improvements (#11363) 2024-05-15 09:04:50 -07:00
Yao Xiao
0d25e0a9f7 Adjust knob. (#11395) 2024-05-14 10:44:40 -07:00
neethuhaneesha
fa15b9df49 RocksDB memtable max range deletions knob update. (#11386) 2024-05-13 15:54:43 -07:00
Sreenath Bodagala
033df029a5 - Support for doing replica consistency check on data movement (#11373) 2024-05-10 14:15:17 -04:00
neethuhaneesha
8ade53977a Adjusting block cache size knob. (#11356) 2024-05-02 13:57:33 -07:00
Yao Xiao
67a588380e shard size log (#11342) 2024-04-29 13:42:19 -07:00
Yao Xiao
9789c7f4ff async io (#11325) 2024-04-22 14:20:11 -07:00
Yao Xiao
81b342fccd Don't remove team when total team count is within threshold (#11295) 2024-04-19 15:40:42 -07:00
neethuhaneesha
ed7a275231 Rocksdb caching knob options. (#11282) 2024-04-17 10:09:14 -07:00
Yao Xiao
be3dcbde62 Sharded RocksDB knob changes. (#11291) 2024-04-16 11:15:08 -07:00
neethuhaneesha
c96dcc74a7 Add rocksdb direct_io knobs. 2024-03-27 10:34:00 -07:00
neethuhaneesha
e26981a7a9 Added max range deletions before flush knob and some knob changes. (#11242) 2024-03-12 14:46:16 -07:00
Yao Xiao
19e3f3e2dd Disable compaction for newly added shard. (#11238)
* Disable compaction for newly added shard.
2024-03-07 14:41:53 -08:00
Johannes M. Scheuermann
484c5deaf0 Allow to disable the removal of maintenance mode when a SS outside of the maintenance zone fails 2024-02-22 18:29:20 +01:00
neethuhaneesha
7db980e185 Rocksdb in-memory data structures protection checksums. (#11206) 2024-02-19 16:46:12 -08:00