1852 Commits

Author SHA1 Message Date
Zhe Wang
43446204ed Database Per-Range Lock (#11693)
* range lock framework

* improve the framework

* persist to txnStateStore

* fix bugs

* code clean

* code clean

* bug fix

* address comments

* add complex test workload and fix bugs found by the workload

* add workload correctness check and fix bugs

* code clean up

* add random range lock injection

* fix bugs in RandomRangeLock.actor.cpp

* enable random range lock injection in general workloads

* add rangelockcycle test

* disable random range lock in backup workloads

* nits

* add range lock ownership concept

* enable lock ownership to rangeLock

* api deal with tenant

* fix CI

* add test for multiple rangeLock owners

* nits

* address comments and renaming

* address comments
2024-10-23 16:25:56 -07:00
John Brownlee
cd2962f10c Rename fields in new trace events to match formatting standards. 2024-08-16 11:44:25 -07:00
John Brownlee
860963ba05 Reformat changes. 2024-08-15 12:31:17 -07:00
John Brownlee
cd4eb794b5 Add debug logging to help validate the transaction options set in fdbbackup. 2024-08-15 12:01:56 -07:00
Syed Paymaan Raza
c3e7542cda Update end year in copyright header 2024-08-02 09:40:11 -07:00
Sreenath Bodagala
93b62f6299 - Cleanup error/trace messages logged in the context of replica comparison (#11467) 2024-06-18 17:33:19 -04:00
Xiaoge Su
3e3eee98fc fixup! Reformat source 2024-06-17 11:41:06 -07:00
Xiaoge Su
afc04366fb Rewrite BUGGIFY related code
This is a rewrite of BUGGIFY function/macros. Seems the performance
improved a lot during the simulation, e.g.

fdbserver -r simulation -b on -f ../CycleTest.toml -s 99438

Without this patch:

Unseed: 54646
Elapsed: 494.091327 simsec, 14.586831 real seconds

With this patch:

Unseed: 54646
Elapsed: 494.091327 simsec, 12.580612 real seconds

I expected the improvement but did not expect a ~13% improvement.
2024-06-17 11:41:06 -07:00
hao fu
6b782c10f6 Fix globalconfig refresh hang issue
CC sets a version to int_max in ClientDBInfo to indicate a refresh; however, the
proxy server rejects this version with a future_version error.

This change fixes the issue by not sending int_max, and instead maintaining a
lastKnown in memory and sending it to grvproxy to get the latest globalconfig.

This change also fixes some Java tests that were used to test the fix.
2024-05-14 15:40:03 -07:00
Sreenath Bodagala
d6f6b45125 - Handle errors thrown during replica consistency check 2024-04-30 21:37:50 +00:00
Jingyu Zhou
9ac965886c Throw errors in getConsistentReadVersion
In the current code, errors are retried in getConsistentReadVersion, so it's
possible that the client has cancelled the GRV request but readVersionBatcher
continues retrying. This can lead to many clients DDoSing the GRV proxies,
especially when the database has been unavailable for a while and clients are
issuing many GRV requests.
2024-04-17 09:13:21 -07:00
Sreenath Bodagala
a4430b9169 Compare storage replicas on reads (#11235)
* - Compare storage replicas on reads (in "loadBalance()")

* - Do consistency check on reads in loadbalance

* - Do replica consistency check in the case where loadBalance issues
requests to multiple storage servers

* - Address a state variable related bug

* - Code formatting

* - API simplification

* - Simplify code

* - Code formatting

* - Address a review comment
2024-04-11 16:08:54 -04:00
Dimitris Apostolou
a88114c222 Fix typos 2024-02-07 01:16:00 +02:00
Josh Slocum
611eb00fe1 stuck watch bug fix
* buggify watch version retry and fix multiple watch race after retry

* watch debugging improvements
2024-01-03 16:05:42 -06:00
Dan Lambright
015167c17e Throttle commits against hot shards (#10970)
* throttle hot shards

* expire throttled shards over time

* add backoff

* Parallelize messaging from RK to CP

* Obtain shards from a single SS

* handle expired transactions

* bump transaction_throttled_hot_shard

* Change SevError to SevWarn for CannotMonitorHotShardForSS

* Add log per request
2023-10-31 12:01:34 -04:00
Sreenath Bodagala
3dcee84898 Merge remote-tracking branch 'apple-upstream/main' 2023-10-09 15:21:16 +00:00
Sreenath Bodagala
3c01b1befe - Add a special key in order to fetch a specific group of status json fields. 2023-09-25 16:23:19 +00:00
Jingyu Zhou
f42dd41ae8 Merge pull request #10810 from sfc-gh-tclinkenbeard/main-fix-clear-cost-estimation
Fix quota throttler clear cost estimation
2023-09-20 20:48:40 -07:00
Zhe Wu
aea57f6da4 Create MAX_WRITE_TRANSACTION_LIFE_VERSIONS client knob 2023-09-14 14:01:43 -07:00
sfc-gh-tclinkenbeard
57eff6c5aa Track cost of point clears 2023-08-22 15:43:13 -07:00
Evan Tschannen
b247f565b7 cancel durable change feed actors in DatabaseContext destructor 2023-06-27 09:22:47 -07:00
A.J. Beamon
75ec56bffb When redoing a key location request, wait until after we've checked whether we've satisfied our min rows 2023-06-20 16:02:12 -07:00
Evan Tschannen
88eed268c3 added a knob for how many bytes are read from disk 2023-06-11 16:10:20 -07:00
Evan Tschannen
a8ceadd917 actor cancellation still needs to unset storage 2023-06-11 14:55:05 -07:00
Evan Tschannen
359e178dcd Merge branch 'main' into feature-durable-change-feed
# Conflicts:
#	fdbclient/ClientKnobs.cpp
#	fdbserver/BlobManager.actor.cpp
#	fdbserver/worker.actor.cpp
2023-06-11 13:58:35 -07:00
Evan Tschannen
f69f4c73ad addressed review comments 2023-06-11 13:54:38 -07:00
Evan Tschannen
7322e21e23 fixed compiler error 2023-06-11 09:25:05 -07:00
Evan Tschannen
334a868dfe fix: respect end when reading from disk; update the starting version when leaving a hole on disk 2023-06-11 09:24:09 -07:00
Evan Tschannen
d03f08f914 fix: not all mutations were being made durable 2023-06-10 18:36:02 -07:00
Evan Tschannen
be8d8a8f72 fix: popping the cache was removing too many versions 2023-06-09 16:20:48 -07:00
Evan Tschannen
33a7f57da5 fix: clear the cache when popping change feeds; do not insert versions into the cache that are already durable 2023-06-09 13:49:33 -07:00
Evan Tschannen
197c39b552 cache change feeds using a storage engine to avoid reading them for the server on startup 2023-06-07 08:41:31 -07:00
Vaidas Gasiunas
60753b5b57 Fix a couple thread-safety issues (#10359)
* Make CodeProbeImpl::_hitCount atomic

* Structure access to TraceLog::logTraceEventMetrics so that it is written before a trace log is opened and only read from one thread after it is opened.

* Fix condition in assert

* Rename TraceLog::log to logMetrics and move initialization of trace log metrics into TraceLog::open

---------

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-05-26 19:36:02 +02:00
He Liu
8ad7ec6fdf Psm ss (#9817)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test, where the fetchCheckpoint target range was set incorrectly.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* Refactored ShardedRocks checkpoint/restore for psm.

* Complete ShardedRocks::restore.

* dismiss operation_obsolete, and throw actor_cancelled.

* Validate checkpoint when !asKeyValues.

* fmt.

* Don't read from uninitialized physical shard.

* Resolved comments.

* cleanup.

* Added verify_checksum_before_restore for ShardedRocks.

* Added ShardedRocksDB checkpoint/restore unit test.

* Populate CheckpointMetaData::dir in RocksDB.

* Rename MovingIn as Adding.

* Added StorageServerUtils.

* Added physical shard move in SS.

* Fix on ApplyMetaData, doFetchFile error handling etc.

* Debugging incorrect shard size.

* Create/delete checkpoints only when Physical shard move is enabled.

* Added back SHARD_ENCODE_LOCATION_METADATA.

* Fixed bytesSample incorrect issue.

Essentially dedicated CheckpointRocksDBCF as key-value based checkpoint, will need to add a new format for the file-based checkpoint.

* Cleanup.

* Cleanup & compile rocksdb with 8.1 branch.

* clean up.

* clean up.

* Allowed request_maybe_delivered error type in FetchShard.

* Added FDBRocksDBVersion.h.

* Fixed stuck fetchShard.

* Don't create checkpoint on TSS.

* Upgrade to RocksDB 8.1.1

* Cleanup.

* Fixed accidentally deleted db_path and name fields.

* Improved trace event.

* Removed redundancies from previous ShardedrocksDB.

* Cleanup.

* cleanup.

* cleanup.

* rename `state`.

* Cleanup.

* Removed excessive TraceEvent.

* * Fixed shardMap race condition on different threads
* Added *Stats, logging data move rates.
* Added `DD_PHYSICAL_SHARD_MOVE_PROBABILITY` to support hybrid data move.

* Resolved comments.

* fmt.

* Use physical shard move in PhysicalShardMoveTest.

* Enforce physical-shard-move for PhysicalShardMoveTest.

* fmt
2023-05-23 11:18:35 -07:00
sfc-gh-tclinkenbeard
7ef66ab356 Add OutstandingWatches and WatchMapSize to TransactionMetrics 2023-05-22 12:07:10 -07:00
Hui Liu
7ca13d8f9c support blob restore in fdbrestore (#10248) 2023-05-19 14:45:14 -07:00
A.J. Beamon
712fefd59f Merge pull request #10213 from sfc-gh-ajbeamon/tenant-code-probes
Add code probes for tenant and metacluster code
2023-05-15 12:13:00 -07:00
Sam Gwydir
6c16875c34 Add networkoption to disable non-TLS connections (#9984)
* Add networkoption to disable non-TLS connections

* add disable plaintext connection to fdbserver

* python doc

* Formatting

* Add tls disable plaintext connection to client api test

* review

* fix negative test

* formatting

* add TLS support to c client config tests

Adds support for TLS in the client and server separately

* add tests for disable_plaintext_connections

Test TLS and Plaintext Clusters and Clients

* Fix documentation

* Rename option to indicate it is client-only

* clearer formatting

* default to allowing plaintext connections

* add SetTLSDisablePlaintextConnection to go bindings
2023-05-13 00:14:11 +02:00
A.J. Beamon
d8141c049d Add code probes for tenant code 2023-05-10 20:44:39 -07:00
Josh Slocum
9a2365daa8 fixing bugs with tenant_mode required on external clients and changin… (#10183)
* fixing bugs with tenant_mode required on external clients and changing test to find them

* Update fdbcli/BlobKeyCommand.actor.cpp

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>

---------

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-05-09 13:41:58 -05:00
Josh Slocum
e69d54fbc0 Block unblobbify (#10182)
* strengthening check for not merging consecutive blob ranges

* implementing expanded unblobbify and changing tests to account
2023-05-09 11:43:11 -05:00
Josh Slocum
6be0c74d5b Adding explicit blob range mutation log to handle large number of ranges (#10174)
* Adding explicit blob range mutation log to handle large number of ranges

* fixing ide build
2023-05-09 11:30:04 -05:00
Jay Zhuang
0ab691b707 Merge pull request #10002 from sfc-gh-jazhuang/readThrough
Fix RangeResult.readThrough misuse
2023-04-27 09:59:11 -07:00
Xiaoxi Wang
a05e078c4a Remove locations.size() == expectedShardCount assertion and add comments 2023-04-26 14:23:09 -07:00
Steve Atherton
7f6d5f296a Merge commit 'e318fc260070ba6ba604930b8f259c9b655938ea' into keybackedrangemap
# Conflicts:
#	flow/include/flow/error_definitions.h
2023-04-25 14:21:23 -07:00
Jingyu Zhou
6b15d67928 Merge pull request #10010 from jzhou77/main
Properly handle proxy_memory_limit_exceeded error for GetKeyServerLocationsRequest
2023-04-25 11:18:03 -07:00
Jingyu Zhou
74bb659f71 Simplify backoff calls per comment 2023-04-25 09:15:16 -07:00
Jingyu Zhou
c544985fe5 Add trace events and adjust backoff
For each success, halve the backoff until it is less than the initial backoff
value, then set the backoff to 0.
2023-04-20 15:56:06 -07:00
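
The backoff adjustment described in commit c544985fe5 can be illustrated with a minimal sketch. This is not the FoundationDB code: the class name `AdaptiveBackoff`, its methods, and the doubling-on-failure policy with a cap are assumptions made for illustration. Only the success path (halve, then reset to 0 once below the initial value) comes from the commit message.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not taken from the FoundationDB source tree.
class AdaptiveBackoff {
public:
    AdaptiveBackoff(double initial, double maximum)
      : initial_(initial), maximum_(maximum) {
        assert(initial > 0.0 && maximum >= initial);
    }

    // Assumed failure policy: start at the initial value, then double up to the cap.
    double onFailure() {
        current_ = (current_ == 0.0) ? initial_ : std::min(current_ * 2.0, maximum_);
        return current_;
    }

    // Success policy from the commit message: halve the backoff; once it
    // falls below the initial value, set it to 0.
    double onSuccess() {
        current_ /= 2.0;
        if (current_ < initial_) {
            current_ = 0.0;
        }
        return current_;
    }

    double current() const { return current_; }

private:
    double initial_;
    double maximum_;
    double current_ = 0.0; // 0 means "no delay before the next request"
};
```

With `AdaptiveBackoff b(1.0, 8.0)`, repeated failures would yield delays of 1, 2, 4, 8, 8, and subsequent successes would yield 4, 2, 1, 0. Decaying gradually rather than resetting on the first success keeps some delay in place while the proxy may still be under memory pressure.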
Jingyu Zhou
a83295e3bd Add backoff to lookupTenantImpl for commit_proxy_memory_limit_exceeded error 2023-04-19 16:55:46 -07:00
Jingyu Zhou
3bfd353a22 Add backoff to getKeyLocation_internal as well 2023-04-19 16:45:50 -07:00