Commit Graph

207 Commits

Author SHA1 Message Date
Syed Paymaan Raza
c3e7542cda Update end year in copyright header 2024-08-02 09:40:11 -07:00
Dimitris Apostolou
a88114c222 Fix typos 2024-02-07 01:16:00 +02:00
Hao Fu
9b17dd8caf Fix backup workers stability issues (#11044)
This PR includes a few stability fixes for Backup Worker

* Fixed memory bookkeeping issue in Backup Worker. Previously
it didn't release flow lock correctly when erasing messages.

* Added TLogServer fix to return 0 from poppedVersion() for
unrecognized log router tags.
2023-11-13 15:55:25 -08:00
Jingyu Zhou
64a6ab7122 Add ID to DEBUG_MUTATION in backup worker 2023-05-03 13:15:18 -07:00
Nim Wijetunga
6e4e6ab2f4 Revert "Revert "Refactor GetEncryptCipherKeys (#9600)"" (#9903)
* Revert "Revert "Refactor GetEncryptCipherKeys (#9600)" (#9708)"
2023-04-05 10:03:48 -07:00
Ata E Husain Bohra
dbcab0b1bd Revert "Refactor GetEncryptCipherKeys (#9600)" (#9708)
This reverts commit 2702665e35.
2023-03-15 12:10:08 -07:00
Jingyu Zhou
b755e668bf Merge pull request #9601 from jzhou77/fix-head
Allow log router to detect slow peeks and to switch DC for peeking
2023-03-09 15:34:24 -08:00
Nim Wijetunga
2702665e35 Refactor GetEncryptCipherKeys (#9600)
* inital commit

* address pr comments
2023-03-08 17:05:03 -08:00
Jingyu Zhou
0259a243ae Switch DC if log router peek becomes stuck
Trying to a different DC if this happens.
2023-03-06 17:41:56 -08:00
Ata E Husain Bohra
99b23ac04d EaR: Configurable encryption support for Tlog mutations (#9394)
* EaR: Configurable encryption support for TLog mutations

Description

  diff-1 : Address review comments

Major changes includes:
1. Update the code involved in ensuring Tlog mutation encryption to be
compliant with "configurable encryption" feature.
2. Update ENABLE_CONFIGURABLE_ENCRYPTION flag to be 'true' by default
and BUGGIFY it.

Testing

devRunCorrectness - 100K
2023-02-16 19:01:59 -08:00
Yi Wu
fe18c87ac6 EaR: commit proxy fetch additional cipher keys post-resolution (#9308)
Commit proxy needs to fetch additional cipher keys post-resolution, since tenant ids for raw access requests and cross-tenant clear ranges are calculated after resolution.
2023-02-14 13:05:51 -08:00
FoundationDB CI
86d6106dc1 format source code after switch to clang 15 2022-12-08 17:26:45 +00:00
Jingyu Zhou
d3f0e396ac Fix backup worker assertion failure
The number of released bytes exceeds the number of acquired bytes in locks.
This is because the bytes counted towards release is calculated after a "wait",
when more bytes could be allocated.
2022-11-19 09:18:49 -08:00
sfc-gh-tclinkenbeard
74212eeacf Encapsulate CounterCollection 2022-10-25 10:17:15 -07:00
A.J. Beamon
3353103d9d Fix filtering of potential backup mutations in commit proxy and backup worker; add code probe to ensure we are testing default backup sharing and add some tests to hit it 2022-09-27 15:25:30 -07:00
A.J. Beamon
fda0d7223d Update backup to include system key ranges needed for tenants. Run simulated backup tests with tenants. 2022-09-22 10:00:13 -07:00
A.J. Beamon
4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
Yi Wu
d831c87d14 Add encryption metrics (#8070)
Adding the following metrics:
* BlobCipherKeyCache hit/miss
* EKP: KMS requests latencies
* For each component that using encryption, they now need to pass a UsageType enum to the encryption helper methods (GetEncryptCipherKeys/GetLatestEncryptCipherKey/encrypt/decrypt) and those methods will help to log get cipher key latency samples and encryption/decryption cpu times accordingly.
2022-09-09 18:43:09 -07:00
Nim Wijetunga
a857609478 refactor ekp interface 2022-08-23 23:04:12 -07:00
Steve Atherton
9a6387af70 Merge pull request #7582 from sfc-gh-yiwu/encrypt_mutation
Make encrypted mutation serialize as normal mutations
2022-08-17 01:30:51 -07:00
Jingyu Zhou
8fb6d59e94 Use error_code_grv_proxy_memory_limit_exceeded
instead of error_code_proxy_memory_limit_exceeded
2022-08-12 11:36:17 -07:00
Jingyu Zhou
ed1b7ef173 Add a client knob GRV_ERROR_RETRY_DELAY 2022-08-12 11:13:32 -07:00
Jingyu Zhou
609e29f4c0 Fix backup worker to handle GRV errors 2022-08-12 11:13:32 -07:00
Yi Wu
db8f00acf6 address comments 2022-08-09 19:16:28 -07:00
Yi Wu
509067016f Make encrypted mutation serialize as normal mutations 2022-08-09 19:16:26 -07:00
Markus Pilman
1de37afd52 Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
A.J. Beamon
1dc78a3b3f Add support for different transaction types (Reference<ITransaction>, Reference<ReadYourWritesTransaction>) to key backed types 2022-07-01 10:02:39 -07:00
Yi Wu
364644673f Support TLog encryption in commit proxy (#6942)
This PR add support for TLog encryption through commit proxy. The encryption is done on per-mutation basis. As CP writes mutations to TLog, it inserts encryption header alongside encrypted mutations. Storage server (and other consumers of TLog such as storage cache and backup worker) decrypts the mutations as they peek TLog.
2022-06-29 14:21:05 -07:00
Markus Pilman
8af056e7b0 fdbclient now compiling 2022-06-23 18:05:36 -06:00
Ray Jenkins
dc9e782ccc OpenTelemetry Tracing Perf Fixes (#6990) 2022-05-02 14:56:51 -05:00
Ray Jenkins
1c5bf135d5 Revert "Migrate to OpenTelemetry tracing. (#6855)" (#6941)
This reverts commit 5df3bac110.
2022-04-25 09:29:56 -05:00
Ray Jenkins
5df3bac110 Migrate to OpenTelemetry tracing. (#6855) 2022-04-20 09:26:37 -05:00
Dan Lambright
e43fde16ec formatting 2022-04-08 17:28:16 -04:00
Dan Lambright
62975f87d1 Formatting 2022-04-08 15:04:46 -04:00
Jingyu Zhou
cfcf0f152c Merge branch 'main-4a085fc84' into vv
Fix Conflicts:
	fdbclient/NativeAPI.actor.cpp
	fdbserver/ClusterRecovery.actor.cpp
	fdbserver/MasterInterface.h
	fdbserver/masterserver.actor.cpp
	flow/error_definitions.h
2022-03-30 22:28:06 -07:00
Jingyu Zhou
e9659b5dd4 Merge branch 'master-PR-6500' into vv
Fix Conflicts:
	fdbclient/CommitProxyInterface.h
	fdbclient/NativeAPI.actor.cpp
	fdbserver/masterserver.actor.cpp
2022-03-30 14:53:49 -07:00
sfc-gh-tclinkenbeard
a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
A.J. Beamon
250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
Dan Lambright
9544379cdf rebase 2022-01-20 11:12:33 -05:00
A.J. Beamon
07e5319477 Merge pull request #6165 from sfc-gh-ajbeamon/native-api-refactor
Refactor Native API Transactions
2022-01-13 14:53:19 -08:00
A.J. Beamon
17415168b6 Make the use provisional proxies parameter be of type UseProvisionalProxies everywhere. 2022-01-13 12:41:20 -08:00
Ata E Husain Bohra
936bf5336a Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191)
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""

Major changes includes:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9.
1.b. This reverts commit d174bb2e06.
1.c. This reverts commit 30b05b469c.

2. Update Status.actor to track ClusterController interface to track
   recovery status.
3. Introduce a ServerKnob to define "cluster recovery trace event"
   prefix; for now keeping it as "Master", however, it should allow
   smooth transition to "Cluster" prefix as it seems more appropriate.
2022-01-06 12:15:51 -08:00
Aaron Molitor
30b05b469c Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit dfe9d184ff.
2021-12-24 11:25:51 -08:00
Ata E Husain Bohra
dfe9d184ff Refactor: ClusterController driving cluster-recovery state machine
At present, cluster recovery process consists of following steps:
1. ClusterController clusterWatchDatabase actor recruits
   master/sequencer process.
2. Sequencer process implements the cluster recovery state machine,
   responsible to recruit all other processes as well restore the
   cluster state.

Patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where ClusterController recruits "sequencer"
   process like other worker processes compared to current scheme
   where "sequencer" process gets special treatment. In newer scheme
   sequencer is responsible for maintaining/providing
   "committed version" (as expected).
2. ClusterController is responsible for worker processes recruitment,
   the sequencer though orchestrating the recovery state machine, it
   need to reachout to the ClusterController for recruiting worker
   processes etc.

NOTE:
Patch has moved the recovery state machine code from
'sequencer' -> 'cluster-controller' process, however, necessary
updates were done for both functionality as well as performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in near future.
2021-12-22 14:06:27 -08:00
Jingyu Zhou
0eeee21a9f Fix duplicated fields in trace events 2021-10-15 09:47:23 -04:00
Dan Lambright
f68d018a06 conflicts 2021-10-08 12:39:39 -04:00
A.J. Beamon
c2885ab70d The BACKUP_LOCK_BYTES knob could be buggified to a value that was too low, resulting in backup getting stuck. 2021-09-24 09:15:30 -07:00
Sreenath Bodagala
852fc96200 Address simulation test failures caused by:
- Assertion failures in MoveKeys.actor.cpp
- Wrong results returned by getRange()

Changes:

DatabaseContext.h, NativeAPI.actor.[h,cpp]:
- Introduce a new flag, TransactionInfo::readVersionObtainedFromGrvProxy.
- Set this flag to true by default, and clear it when the read version of a
transaction is explicitly set (by using setVersion()).
- Modify getLatestCommitVersions() to not populate "latestCommitVersions" if
this flag is not set. (This will cause storage server to read at the specified
read version.)

- Modify getRange() actor to always use the specified version as the read
version (except when the specified version is latestVersion).

- Modify waitForCommittedVersion(), getRawVersion(), and getConsistentReadVersion()
to update local version vector cache after receiving GetReadVersionReply.

IClientApi.h, IConfigTransaction.h, ISingleThreadTransaction.h,
MultiVersionTransaction[.actor].[h,cpp], ThreadSafeTransaction.[h,cpp],
ApiWorkload.h:
- Add methods to get the spanID of a transaction and also the version vector
cached in a transaction. (Likely to be useful for debugging simulation test
failures.)

VersionVector.h:
- Update "maxVersion" when populating/applying a delta. (Note that empty
mutation messages only update VersionVector::maxVersion.)

BackupWorker.actor.cpp:
- Update local version vector cache after receiving GetReadVersionReply message.

Status.actor.cpp:
- Update local version vector cache and
TransactionInfo::info.readVersionObtainedFromGrvProxy after setting the
read version.
2021-09-16 12:00:26 -04:00
Dan Lambright
8689e1f106 merge with master 2021-08-30 15:29:08 -04:00
Steve Atherton
faa4154a56 MutationTracking now uses a vector of keys to track. Removed "Mutation" detail from DEBUG_MUTATION() events because they are duplicates of the fields already logged in the returned MutationTracking event, which are now renamed and combined into "Mutation". Removed more toString() calls in TraceEvent detail values. 2021-08-09 23:30:45 -07:00