Commit Graph

185 Commits

Author SHA1 Message Date
Zhe Wang
43446204ed Database Per-Range Lock (#11693)
* range lock framework

* improve the framework

* persist to txnStateStore

* fix bugs

* code clean

* code clean

* bug fix

* address comments

* add complex test workload and fix bugs found by the workload

* add workload correctness check and fix bugs

* code clean up

* add random range lock injection

* fix bugs in RandomRangeLock.actor.cpp

* enable random range lock injection in general workloads

* add rangelockcycle test

* disable random range lock in backup workloads

* nits

* add range lock ownership concept

* enable lock ownership to rangeLock

* api deal with tenant

* fix CI

* add test for multiple rangeLock owners

* nits

* address comments and renaming

* address comments
2024-10-23 16:25:56 -07:00
Syed Paymaan Raza
c3e7542cda Update end year in copyright header 2024-08-02 09:40:11 -07:00
Jingyu Zhou
a342aa704d Merge pull request #11348 from sbodagala/main
Handle errors thrown by the load balancer replica consistency check framework
2024-05-02 11:50:23 -07:00
Sreenath Bodagala
df2b7b4fe8 - Address PR review comments 2024-05-01 19:41:00 +00:00
Zhe Wang
bf53218556 Improve distributed consistency checker (#11346)
* ConsistencyCheckerUrgent repeated run

* address comments

* avoid trace SevError for TesterRecruitmentTimeout unless it keeps failing for over 1 day

* address comments

* address comments
2024-04-30 14:45:32 -07:00
Sreenath Bodagala
d6f6b45125 - Handle errors thrown during replica consistency check 2024-04-30 21:37:50 +00:00
Zhe Wang
314f4c41c7 Fix ACS mutation bug and improve accumulative checksum (#11319)
* enable acs by default

* code clean

* improve ACS code

* nits

* nits

* fix data corruption issue triggered by acs mutation
2024-04-20 01:31:55 -07:00
Zhe Wang
832972e2da Validate Mutation Version in Accumulative Checksum Framework (#11293)
* validate-mutation-version-in-acs-framework

* turn off knob

* randomly enable feature
2024-04-12 10:15:46 -07:00
Sreenath Bodagala
a4430b9169 Compare storage replicas on reads (#11235)
* - Compare storage replicas on reads (in "loadBalance()")

* - Do consistency check on reads in loadbalance

* - Do replica consistency check in the case where loadBalance issues
requests to multiple storage servers

* - Address a state variable related bug

* - Code formatting

* - API simplification

* - Simplify code

* - Code formatting

* - Address a review comment
2024-04-11 16:08:54 -04:00
Zhe Wang
33eecd0775 Real-time corruption detection with accumulative checksum (#11255)
* acs framework

* code refactor and fix bugs

* add ss crash loop protector

* use sharedptr instead of raw pointer

* fixed critical bugs and add private mutation acs to the framework

* enable ACS for all mutations except for clear serverTag mutation and fix bugs

* fix restarting tests

* refactor code and fix bugs

* fix AccumulativeChecksumState toString

* fix bugs

* allow all mutations in acs and fixed bugs

* fix bugs and code cleanup

* code clean up for adding recovery support

* simplify code and support recovery

* clear acs state at ss

* fix bug

* terminate validator if ss will be removed in the current batch

* simplify code

* add trace

* address comments

* optimize code

* deep copy when adding mutation to acs validator

* wrap encode and decode persist acs key

* make acstable private

* remove useless func

* remove useless func

* remove epoch in ACS validator

* add acs mutation counter in SS metrics

* code cleanup and make knob check better

* make mutation buffer global

* simplify code

* add comments

* make knob randomly set

* address comments

* ss reboot after acs mismatch found
2024-04-04 15:03:44 -07:00
Zhe Wang
b10c7107bb Enable Accumulative Checksum in MutationRef (#11225)
* code clean up and add accumulative checksum bits to mutation ref

* address comments and fix issues

* address comments

* propagate acs index from commit proxy to storage server

* address comments

* address comments

* address comments

* address comments
2024-03-11 09:51:31 -07:00
Zhe Wang
308ff77e91 Consistency Check Urgent (Cherrypick from Release-7.1) (#11217)
* cherry-pick-distributed-consistency-checker

* code cleanup

* refactor code, decouple consistencyCheckerUrgent and consistency checker

* fix workload for consistencycheckurgent

* add new consistencycheckurgent role type

* fix CI

* address comments
2024-02-28 14:22:47 -08:00
He Liu
9d8d52cbb7 Added checksum in MutationRef (#11181)
* Append checksum to param2.

* Pass sim tests w/o validating checksums.

* Code cleanup.

* Renew checksum.

* Remove checksum for all private mutations.

* Added checksum validation at SS.

* Fixed VERSION_TIMESTAMP.

* Disable Mutation Checksum by default.

* Cleanup.

* cleanup.
2024-02-09 13:36:41 -08:00
hao fu
3967136eeb Add Knobs to control retry delays for BlobStore
We learnt that a new connection needs to be made for each HTTP
request sent through a proxy to AWS S3, so requests fail with
retryable errors when they try to re-use connections.

Meanwhile, the delay between retries doubles after each failure,
so with a larger connection pool the delay can become long.

As a result, this change adds Knobs to cap the max delay of
retryable errors: one for general retryable errors,
and another only for connection failure errors.

This change also adds more logging for debugging.
2023-10-23 16:39:34 -07:00
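The capped retry delay described in commit 3967136eeb can be sketched roughly as follows. This is a minimal illustration, not the BlobStore code: `RetryKnobs`, its field names, its default values, and `nextRetryDelay` are all hypothetical, and only the doubling-then-capping behavior comes from the commit message.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical knob container; the names and values are illustrative only
// and are not the actual FoundationDB knobs.
struct RetryKnobs {
    double maxDelayRetryableError = 60.0;   // cap for general retryable errors (seconds)
    double maxDelayConnectionFailed = 10.0; // cap for connection failures (seconds)
};

// Doubles the delay after a failed attempt, then clamps it to the cap
// that applies to the kind of error that was seen.
double nextRetryDelay(double currentDelay, bool connectionFailed, const RetryKnobs& knobs) {
    assert(currentDelay > 0.0);
    const double cap = connectionFailed ? knobs.maxDelayConnectionFailed
                                        : knobs.maxDelayRetryableError;
    return std::min(currentDelay * 2.0, cap);
}
```

With these assumed values, a 40-second delay after a general retryable error becomes 60 seconds rather than 80, and an 8-second delay after a connection failure becomes 10 seconds rather than 16.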
Zhe Wu
aea57f6da4 Create MAX_WRITE_TRANSACTION_LIFE_VERSIONS client knob 2023-09-14 14:01:43 -07:00
Yi Wu
3287098b4a EaR: Handle KMS timeout in storage server and commit proxy 2023-08-28 16:17:43 -07:00
Nim Wijetunga
7f2260bbd2 Add Encryption Related Latency Metrics (#10596)
* add ss and cp latency metrics

* make changes
2023-07-14 11:30:16 -07:00
Ata E Husain Bohra
7779c908b3 EaR: Remove usage of ENABLE_CONFIGURABLE_ENCRYPTION knob (#10570)
Description

Given that configurable encryption has been checked in and tested via
simulation for more than a month, and to avoid the penalty of accessing
KNOBS in the inline commit path, this patch retires the KNOB and makes
configurable encryption the default EaR mode for FDB.

BlobCipher still supports the old format header and encryption semantics;
the dead code will be removed in a follow-up PR.

Testing

devRunCorrectness - 100K
2023-06-30 17:48:09 -07:00
Jefferson Zhong
13853c9f89 Move stepSize knob from ClientKnobs to ServerKnobs 2023-06-16 14:48:11 -07:00
Evan Tschannen
88eed268c3 added a knob for how many bytes are read from disk 2023-06-11 16:10:20 -07:00
Evan Tschannen
359e178dcd Merge branch 'main' into feature-durable-change-feed
# Conflicts:
#	fdbclient/ClientKnobs.cpp
#	fdbserver/BlobManager.actor.cpp
#	fdbserver/worker.actor.cpp
2023-06-11 13:58:35 -07:00
Evan Tschannen
f69f4c73ad addressed review comments 2023-06-11 13:54:38 -07:00
sfc-gh-tclinkenbeard
71846070d6 Update default tag throttling knob values 2023-05-25 16:45:32 -07:00
Josh Slocum
8f241632af adding knob to allow relative paths for local backup containers 2023-05-23 17:06:49 -05:00
Josh Slocum
d038154d69 re-enabling change feed coalesce knob (#10317) 2023-05-23 14:43:11 -05:00
Josh Slocum
629b068145 Bg tenant metadata restarting (#10235)
* making blob metadata optionally deterministic across runs

* Non restarting test passes after refactor

* adding downgrade version test

* formatting
2023-05-23 11:24:13 -05:00
Hui Liu
7ca13d8f9c support blob restore in fdbrestore (#10248) 2023-05-19 14:45:14 -07:00
Jefferson Zhong
3760522dc2 Make stepSize configurable for preloadApplyMutationsKeyVersionMap 2023-05-19 10:57:30 -07:00
Ata E Husain Bohra
18fd2702c4 EaR: Implement SimKmsVault interface, refactor SimKmsConnector (#10194)
Description

Patch implements a SimKmsVault interface allowing unittest/simulation
to satisfy encryption lookup usecases. It also refactors existing
SimKmsConnector to leverage SimKmsVault APIs

Testing

devRunCorrectness - 100K
/simKmsVault - asan & valgrind
EncryptionUnitTest
2023-05-10 12:44:53 -07:00
Jingyu Zhou
78434517ff Increase buggified STORAGE_METRICS_SHARD_LIMIT value
The previous buggified value 3 can be the same as key location size, thus
causing splitStorageMetrics() to get stuck.
2023-05-04 19:31:43 -07:00
Josh Slocum
5b47913882 disabling global connection pool for now (#10054) 2023-04-28 09:48:56 -05:00
Hui Liu
711e040627 RestoreConfig - use restoreRangeSet to replace restoreRanges (#9912) 2023-04-06 11:16:05 -07:00
Josh Slocum
a5b4212990 adding blob granule logical size 2023-03-15 08:54:49 -05:00
Nim Wijetunga
218ed4519f Strengthen Snapshot Backup/Restore Asserts (#9552)
strengthen backup/restore asserts for encryption
2023-03-08 15:24:02 -08:00
Ata E Husain Bohra
d0eec9d0ba EaR: REST KMS fixes - encryption integration testing (#9598)
* EaR: REST KMS fixes - encryption integration testing

Description

Major changes:
1. Multiple fixes for issues observed while performing end-to-end
integration testing of the Encryption at-rest feature.
2. Improve REST module logging. Introduced FLOW_KNOBS->REST_LOG_LEVEL
to allow more granular control of feature logging, decoupled from
the cluster log level.

Testing

Integration testbed:
1. Run fdbserver standalone
2. Run external KMS http-server to serve encryption key fetch requests
2023-03-08 09:49:43 -08:00
Ata E Husain Bohra
a45de70003 EaR: RESTClient HTTP compliance, fix json request content type (#9544)
* EaR: RESTClient HTTP compliance, fix json request content type

Description

  diff-1: Address review comments

RESTClient is responsible for handling FDB <-> KMS communication
for Encryption and other use cases. By design, it only supports
"secure connection", i.e. "https"; however, there is a
need to expand the module to support "http" connections,
for instance in test and dev deployments.

However, given that RESTClient handles highly
sensitive content, such as plaintext "encryption cipher
from a KMS", the feature is guarded by
CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION, which is
settable using the FDBServer command line argument
"--kms-rest-enable_not_secure_connection" (boolean).

Testing

Deployed a standalone fdbserver and communicated with a
simple "http" server
2023-03-06 16:06:03 -08:00
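The guard described in commit a45de70003 amounts to a scheme check gated by a boolean. The sketch below illustrates only that idea: `isConnectionAllowed` and `startsWith` are hypothetical helpers, not RESTClient functions, and the `enableNotSecureConnection` parameter stands in for the REST_KMS_ENABLE_NOT_SECURE_CONNECTION knob named in the commit message.

```cpp
#include <cassert>
#include <string>

// Returns true when `url` begins with `prefix`.
static bool startsWith(const std::string& url, const std::string& prefix) {
    return url.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: "https" is always accepted, "http" only when the
// not-secure-connection switch is on, and any other scheme is rejected.
bool isConnectionAllowed(const std::string& url, bool enableNotSecureConnection) {
    if (startsWith(url, "https://")) {
        return true;
    }
    if (startsWith(url, "http://")) {
        return enableNotSecureConnection;
    }
    return false;
}
```

Keeping the default at `false` means a plaintext connection has to be an explicit opt-in, which matches the intent of limiting it to test and dev deployments.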
Josh Slocum
301f2fd201 disabling feed coalesce for now 2023-02-28 12:07:12 -06:00
sfc-gh-tclinkenbeard
1aef6cb5f7 Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-17 20:41:59 -08:00
Ata E Husain Bohra
99b23ac04d EaR: Configurable encryption support for Tlog mutations (#9394)
* EaR: Configurable encryption support for TLog mutations

Description

  diff-1 : Address review comments

Major changes include:
1. Update the code involved in ensuring Tlog mutation encryption to be
compliant with "configurable encryption" feature.
2. Update ENABLE_CONFIGURABLE_ENCRYPTION flag to be 'true' by default
and BUGGIFY it.

Testing

devRunCorrectness - 100K
2023-02-16 19:01:59 -08:00
Nim Wijetunga
e03eca778c Configurable Encryption Support for Backup (#9375)
Snapshot backup configurable encryption support
2023-02-16 15:03:27 -08:00
A.J. Beamon
13eee09ce8 Merge branch 'main' into metacluster-mgmt-restore 2023-02-10 10:58:01 -08:00
A.J. Beamon
4b13c9c211 Make a few minor fixes, refactor some code for clarity, and improve throughput of repopulating a management cluster 2023-02-10 10:41:55 -08:00
sfc-gh-tclinkenbeard
31c3365215 Increase default value for MAX_TRANSACTION_TAG_LENGTH 2023-02-09 11:31:10 -08:00
A.J. Beamon
2d59c5681d Bug fixes and test improvements for management cluster restoration 2023-02-09 08:42:23 -08:00
Ata E Husain Bohra
9c649d7880 EaR: Configurable encryption framework (#9271)
* EaR: Configurable encryption framework

Description

The EaR implementation supports only a fixed-size on-disk encryption header format.
One drawback of this scheme is that introducing a newer encryption scheme, or
updating the header format in the future, may incur data migration restrictions.
Major changes proposed in the patch include:
1. Flexible Encryption header format allowing the following:
 1.1. Header flags (metadata) can evolve separately from the encryption algorithm
 1.2. Specific encryption algorithm header to allow future extensions.
2. Update the BlobCipher encryption/decryption util classes to work with newer
encryption header format.
3. Continue supporting multiple encryption authentication schemes such as
HMAC-SHA and AES-CMAC; also support running with no encryption authentication scheme.
4. Refactor BlobCipher unit test to enable testing of new format.
5. Configuration knobs to control encryption header flags and algorithm
versions.

Note:
The on-disk header storage footprint savings due to the newer scheme are as follows:
1. No encryption authentication: 54% smaller compared to existing implementation.
2. AES-CMAC: 16% smaller compared to existing implementation.
3. HMAC-SHA encryption authentication: almost same size.


Testing

BlobCipherTest
EncryptionOpsTest
2023-02-08 22:51:05 -08:00
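One way to picture the flexible header from commit 9c649d7880 is a small versioned flags block followed by an algorithm-specific part whose size depends on the authentication scheme. The sketch below is an illustration under stated assumptions, not the BlobCipher layout: every type, field, and function name is hypothetical, "HMAC-SHA" is assumed to mean HMAC-SHA-256 (32-byte digest), and `algoFixedBytes` is a placeholder for whatever fixed fields the algorithm header carries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Authentication schemes named in the commit message.
enum class AuthMode : uint8_t { None, AesCmac, HmacSha };

// Hypothetical flags block: versioned metadata that can evolve
// separately from the algorithm-specific header.
struct HeaderFlags {
    uint8_t flagsVersion;
    uint8_t algoHeaderVersion;
    AuthMode authMode;
};
static_assert(sizeof(HeaderFlags) == 3, "flags block is three single-byte fields");

// Authentication token size per scheme. AES-CMAC tags are 16 bytes;
// 32 bytes assumes HMAC-SHA-256, which the commit message does not state.
std::size_t authTokenBytes(AuthMode mode) {
    switch (mode) {
    case AuthMode::None:
        return 0;
    case AuthMode::AesCmac:
        return 16;
    case AuthMode::HmacSha:
        return 32;
    }
    return 0;
}

// Total header size: flags block, fixed algorithm fields, then the token.
std::size_t headerBytes(const HeaderFlags& flags, std::size_t algoFixedBytes) {
    return sizeof(HeaderFlags) + algoFixedBytes + authTokenBytes(flags.authMode);
}
```

Because the token is only present when a scheme needs it, a header with no authentication is the smallest and an HMAC-SHA header the largest, which is the same ordering as the footprint savings listed in the commit note. The byte counts here are illustrative and do not reproduce the percentages reported there.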
sfc-gh-tclinkenbeard
09ad864eb5 Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-08 11:25:14 -08:00
Nim Wijetunga
86f3665514 Handle EKP Tenant Not Found Errors (#9261)
handle EKP tenant not found errors
2023-02-01 19:15:38 -08:00
Josh Slocum
1b4753a4d4 Fix chunked reads (#9246)
* removing chunked read loop

* reducing memory overhead of async file block cache by freeing some blocks during read if no longer needed
2023-01-30 13:43:24 -06:00
Josh Slocum
f41b61aacf Blobstore static connection pool, and observability improvements (#9234)
* Adding global connection pool for multiple blobstore instances

* adding knob to enable/disable blobstore global connection pool

* Adding BlobStoreMetrics and BlobStoreRequestLatency logging for better blobstore observability
2023-01-27 16:46:26 -06:00
Josh Slocum
0881c0e4e2 Bg perf 2 (#9052)
* added dynamic write amp calculations for blob granule compaction

* changing blob worker parallelism counts to bytes budget to handle less uniform operation sizes

* more snapshotting parallelism for behind feeds

* add a bit of observability when this happens

* adding knobs

* typo

* adjusting some knobs up with buggified granule size

* fixing bugs in dynamic write amp

* fixing formatting

* fixing bug in knob buggification

* fix formatting
2023-01-26 16:56:45 -06:00