* range lock framework
* improve the framework
* persist to txnStateStore
* fix bugs
* code clean
* code clean
* bug fix
* address comments
* add complex test workload and fix bugs found by the workload
* add workload correctness check and fix bugs
* code clean up
* add random range lock injection
* fix bugs in RandomRangeLock.actor.cpp
* enable random range lock injection in general workloads
* add rangelockcycle test
* disable random range lock in backup workloads
* nits
* add range lock ownership concept
* enable lock ownership to rangeLock
* api deal with tenant
* fix CI
* add test for multiple rangeLock owners
* nits
* address comments and renaming
* address comments
* ConsistencyCheckerUrgent repeated run
* address comments
* avoid tracing SevError for TesterRecruitmentTimeout unless it keeps failing for over 1 day
* address comments
* address comments
* - Compare storage replicas on reads (in "loadBalance()")
* - Do consistency check on reads in loadbalance
* - Do replica consistency check in the case where loadBalance issues
requests to multiple storage servers
* - Address a state variable related bug
* - Code formatting
* - API simplification
* - Simplify code
* - Code formatting
* - Address a review comment
* acs framework
* code refactor and fix bugs
* add ss crash loop protector
* use sharedptr instead of raw pointer
* fix critical bugs and add private mutation acs to the framework
* enable ACS for all mutations except for clear serverTag mutation and fix bugs
* fix restarting tests
* refactor code and fix bugs
* fix AccumulativeChecksumState toString
* fix bugs
* allow all mutations in acs and fixed bugs
* fix bugs and code cleanup
* code clean up for adding recovery support
* simplify code and support recovery
* clear acs state at ss
* fix bug
* terminate validator if ss will be removed in the current batch
* simplify code
* add trace
* address comments
* optimize code
* deep copy when adding mutation to acs validator
* wrap encode and decode of persisted acs key
* make acstable private
* remove useless func
* remove useless func
* remove epoch in ACS validator
* add acs mutation counter in SS metrics
* code cleanup and make knob check better
* make mutation buffer global
* simplify code
* add comments
* make knob randomly set
* address comments
* ss reboot after acs mismatch found
* cherry-pick-distributed-consistency-checker
* code cleanup
* refactor code, decouple consistencyCheckerUrgent and consistency checker
* fix workload for consistencycheckurgent
* add new consistencycheckurgent role type
* fix CI
* address comments
We learned that each HTTP request sent through a proxy to AWS S3
needs a new connection, so requests fail with retryable errors when
they try to re-use existing connections.
Meanwhile, the delay between retries doubles after each failure, so
with a larger connection pool the delay can grow long.
As a result, this change adds knobs to cap the max delay for
retryable errors: one for general retryable errors and
another only for connection failure errors.
This change also adds more logging for debugging.
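
The capped backoff described above can be sketched roughly as follows. This is a minimal illustration, not the actual blobstore code: `RetryKnobs`, its field names, its default values, and `nextRetryDelay` are all hypothetical.

```cpp
#include <algorithm>

// Hypothetical knob values for illustration only; the real knob names
// and defaults are not given in this description.
struct RetryKnobs {
    double initialDelay = 0.1;              // seconds before the first retry
    double maxRetryDelay = 10.0;            // cap for general retryable errors
    double maxConnectionFailureDelay = 2.0; // cap for connection failures only
};

// Delay before retry number `attempt` (0-based). The delay doubles after
// each failure and is clamped to the cap that matches the error type.
double nextRetryDelay(const RetryKnobs& knobs, int attempt, bool connectionFailure) {
    const double cap = connectionFailure ? knobs.maxConnectionFailureDelay : knobs.maxRetryDelay;
    double delay = knobs.initialDelay;
    for (int i = 0; i < attempt && delay < cap; ++i) {
        delay *= 2.0;
    }
    return std::min(delay, cap);
}
```

With a separate, smaller cap for connection failures, working through a pool of stale connections costs at most that cap per retry instead of an ever-doubling delay.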
Description
Configurable encryption has been checked in and tested via
simulation for more than a month. To also avoid the penalty of accessing
KNOBS in the inline commit path, this patch retires the KNOB and makes
configurable encryption the default EaR mode for FDB.
BlobCipher still supports the old header format and encryption semantics;
the dead code will be removed in a follow-up PR.
Testing
devRunCorrectness - 100K
* EaR: REST KMS fixes - encryption integration testing
Description
Major changes:
1. Multiple fixes for issues observed while performing end-to-end
integration testing of the Encryption at-rest feature.
2. Improve REST module logging. Introduce FLOW_KNOBS->REST_LOG_LEVEL
for more granular control of feature logging, decoupled from
the cluster log level.
Testing
Integration testbed:
1. Run fdbserver standalone
2. Run external KMS http-server to serve encryption key fetch requests
* EaR: RESTClient HTTP compliance, fix json request content type
Description
diff-1: Address review comments
RESTClient is responsible for handling FDB <-> KMS communication
for Encryption and other use cases. By design, it only supports
"secure connection" i.e. "https"; however, there is a
need to expand the module to support "http" connections,
for instance in test and dev deployments.
However, given that RESTClient handles highly
sensitive content such as plaintext encryption ciphers
from a KMS, the feature is guarded by
CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION, which is
settable using the FDBServer command line argument
"--kms-rest-enable_not_secure_connection" (boolean)
Testing
Deployed a standalone fdbserver and communicated with a
simple "http" server
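
As a rough sketch of the guard described in this change: plain "http" is accepted only when the opt-in knob is set. `RestKnobs`, `UrlDecision`, and `classifyKmsUrl` are hypothetical names for illustration; only the knob CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION comes from the description above.

```cpp
#include <string>

// Hypothetical stand-in for CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION.
struct RestKnobs {
    bool enableNotSecureConnection = false;
};

enum class UrlDecision { Secure, InsecureAllowed, Rejected };

// Classify a KMS URL by scheme. "https" is always accepted; "http" is
// accepted only when the knob opts in; anything else is rejected.
UrlDecision classifyKmsUrl(const std::string& url, const RestKnobs& knobs) {
    if (url.rfind("https://", 0) == 0) {
        return UrlDecision::Secure;
    }
    if (url.rfind("http://", 0) == 0 && knobs.enableNotSecureConnection) {
        return UrlDecision::InsecureAllowed;
    }
    return UrlDecision::Rejected;
}
```

Defaulting the knob to false keeps production deployments on "https" unless an operator explicitly opts in.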
* EaR: Configurable encryption support for TLog mutations
Description
diff-1 : Address review comments
Major changes include:
1. Update the code that handles TLog mutation encryption to be
compliant with the "configurable encryption" feature.
2. Update the ENABLE_CONFIGURABLE_ENCRYPTION flag to be 'true' by default
and BUGGIFY it.
Testing
devRunCorrectness - 100K
* EaR: Configurable encryption framework
Description
The EaR implementation only supports a fixed-size on-disk encryption header format.
One drawback of this scheme is that introducing a newer encryption scheme or
updating the header format in the future may incur data migration restrictions.
Major changes proposed in the patch include:
1. Flexible Encryption header format allowing the following:
1.1. Header flags (metadata) can evolve separately from the encryption algorithm
1.2. Specific encryption algorithm header to allow future extensions.
2. Update the BlobCipher encryption/decryption util classes to work with newer
encryption header format.
3. Continue supporting multiple encryption authentication schemes such as
HMAC-SHA and AES-CMAC; also support running with no encryption authentication.
4. Refactor BlobCipher unit test to enable testing of new format.
5. Configuration knobs to control encryption header flags and algorithm
versions.
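
To illustrate the idea of a flags section that evolves separately from an algorithm-specific section, here is a minimal sketch. It is not the BlobCipher on-disk format: every name, field, and size below is an assumption, including the 32-byte token for HMAC-SHA (which assumes SHA-256) and the 16-byte token for AES-CMAC.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout for illustration; not the actual header format.
enum class AuthMode : uint8_t { None = 0, HmacSha = 1, AesCmac = 2 };

struct HeaderFlags {
    uint8_t flagsVersion;      // version of this flags section
    uint8_t algoHeaderVersion; // version of the algorithm-specific section
    AuthMode authMode;         // selects how many auth-token bytes follow
};

// Auth-token bytes stored per mode (assumed sizes, see the note above).
std::size_t authTokenSize(AuthMode mode) {
    switch (mode) {
    case AuthMode::HmacSha: return 32;
    case AuthMode::AesCmac: return 16;
    case AuthMode::None: return 0;
    }
    return 0;
}

// Serialize the flags first, then the algorithm section (IV plus only as
// many auth-token bytes as the chosen mode needs).
std::vector<uint8_t> encodeHeader(const HeaderFlags& flags,
                                  const std::vector<uint8_t>& iv,
                                  const std::vector<uint8_t>& authToken) {
    std::vector<uint8_t> out;
    out.push_back(flags.flagsVersion);
    out.push_back(flags.algoHeaderVersion);
    out.push_back(static_cast<uint8_t>(flags.authMode));
    out.insert(out.end(), iv.begin(), iv.end());
    const std::size_t n = std::min(authToken.size(), authTokenSize(flags.authMode));
    out.insert(out.end(), authToken.begin(), authToken.begin() + static_cast<std::ptrdiff_t>(n));
    return out;
}
```

Because the token length depends on the mode, a header written without authentication carries no token bytes at all, which is the kind of saving a variable-size format makes possible.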
Note:
The on-disk header storage footprint savings from the newer scheme are as follows:
1. No encryption authentication: 54% smaller compared to existing implementation.
2. AES-CMAC: 16% smaller compared to existing implementation.
3. HMAC-SHA encryption authentication: almost same size.
Testing
BlobCipherTest
EncryptionOpsTest
* Adding global connection pool for multiple blobstore instances
* adding knob to enable/disable blobstore global connection pool
* Adding BlobStoreMetrics and BlobStoreRequestLatency logging for better blobstore observability
* added dynamic write amp calculations for blob granule compaction
* changing blob worker parallelism counts to bytes budget to handle less uniform operation sizes
* more snapshotting parallelism for behind feeds
* add a bit of observability when this happens
* adding knobs
* typo
* adjusting some knobs up with buggified granule size
* fixing bugs in dynamic write amp
* fixing formatting
* fixing bug in knob buggification
* fix formatting