1044 Commits

Author SHA1 Message Date
Sepeth
2a82f22fe5 Fix warnings for long long or int64_t format specifiers by switching to fmt::print* (#11574) 2024-09-12 12:10:40 -07:00
Jingyu Zhou
e1781b5234 Merge pull request #11571 from jzhou77/fix 2024-08-13 19:53:40 -07:00
Syed Paymaan Raza
a12b8a7ffc [fdbcli] Add exclude in progress signal (#11569) 2024-08-13 15:37:01 -07:00
Jingyu Zhou
d0b6430a69 Add more trace events for exclude command
Use PRIORITY_SYSTEM_IMMEDIATE for excludeServersAndLocalities() call.
2024-08-12 18:19:57 -07:00
Syed Paymaan Raza
392bad2bd3 More copyright end year updates (#11556) 2024-08-05 14:00:32 -07:00
Syed Paymaan Raza
c3e7542cda Update end year in copyright header 2024-08-02 09:40:11 -07:00
Zhe Wang
74990e44bd Bulk Loading Framework (#11369) 2024-07-23 14:57:28 -07:00
Jingyu Zhou
d9e4c49503 Fix more -Wunused-variable warnings 2024-07-17 15:35:49 -07:00
Zhe Wang
8e099d276d Improve checkall tool (#11440) (#11476)
* improve checkall

* fmt

* simplify

* nit

* simplify

* nit
2024-06-25 23:56:12 -07:00
neethuhaneesha
c96dcc74a7 Add rocksdb direct_io knobs. 2024-03-27 10:34:00 -07:00
Zhe Wang
a930806d36 cherry-pick-checkall-to-main 2024-02-15 21:35:35 -08:00
Dimitris Apostolou
a88114c222 Fix typos 2024-02-07 01:16:00 +02:00
hao fu
e8cdfc5a0c Fix checkall when shard is large
The begin key has to be updated for the checkall command when a shard
is large; this PR makes that change.
2024-01-09 18:19:49 -08:00
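The begin-key update described in the commit above can be illustrated with a generic paginated range scan. This is a minimal sketch, not the fdbcli implementation: `fetch_batch`, `scan_range`, the in-memory `store`, and the `limit` parameter are all hypothetical names made up for the illustration. The point is only that when a range holds more keys than one request returns, the next request must begin just past the last key already seen.

```python
from typing import Dict, List, Tuple

KeyValue = Tuple[bytes, bytes]


def fetch_batch(store: Dict[bytes, bytes], begin: bytes, end: bytes,
                limit: int) -> Tuple[List[KeyValue], bool]:
    """Hypothetical single request: up to `limit` pairs in [begin, end), plus a `more` flag."""
    keys = sorted(k for k in store if begin <= k < end)
    batch = [(k, store[k]) for k in keys[:limit]]
    return batch, len(keys) > limit


def scan_range(store: Dict[bytes, bytes], begin: bytes, end: bytes,
               limit: int) -> List[KeyValue]:
    """Read the whole range in batches, advancing the begin key after each batch."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    results: List[KeyValue] = []
    while True:
        batch, more = fetch_batch(store, begin, end, limit)
        results.extend(batch)
        if not more:
            return results
        # Without this update every request would return the same first batch.
        # Appending a zero byte gives the smallest key strictly after the last one read.
        begin = batch[-1][0] + b"\x00"
```

With a five-key store and `limit=2`, `scan_range` issues three requests and returns each key exactly once.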
Sreenath Bodagala
8d2feda708 - Print warning (that the check was incomplete) irrespective of
whether the first batch of keys fetched are consistent or not.
2023-12-11 20:07:37 +00:00
Sreenath Bodagala
4edd5ec435 - Print indexes of corrupted keys. Also, print a warning in case the
check was incomplete.
2023-12-11 19:55:10 +00:00
Sreenath Bodagala
b9662794f4 - Print information about "cached" and "more" flags too. 2023-12-11 17:10:30 +00:00
Sreenath Bodagala
182ef6f199 - Bump up the number of keys that "fdbcli checkall" can fetch
per GetKeyValueRequest call.
2023-12-11 01:57:07 +00:00
Sreenath Bodagala
5c31d1a8f5 - Address a review comment 2023-12-07 15:56:05 +00:00
Sreenath Bodagala
fe13f740e6 - Address issues in code related to "fdbcli checkall". 2023-12-07 00:30:34 +00:00
Zhe Wang
1e9c5bb390 Propagate data move reason from DD to SS (#11063)
* encode reason to data move id

* address comments

* fix data move id decode bug and add assert for data move decode invariant

* address comments
2023-11-15 13:07:11 -08:00
He Liu
b8f1670a0e Physical shard move tss (#11057)
* Refactored newDataMoveId() and decodeServerKeysValue().

* Enabled physical shard move for tss.

* Added unit test & cleanup.

* clean up test configs.
2023-11-13 11:34:07 -08:00
Johannes Scheuermann
a0cb59244d Allow the exclusions of localities that are not matching any process (#11033) 2023-10-26 09:24:26 +02:00
Johannes M. Scheuermann
64b45088ae Make sure server list is validated against the excluded localities 2023-10-24 10:28:23 +02:00
William Dowling
0f752473be Merge branch 'main' into radixtree-production 2023-09-25 09:52:20 +02:00
Zhe Wang
29a2f63f8d Fix SSShard Audit (#10896)
* fix ssshard

* address comments

* fmt
2023-09-13 21:15:12 -07:00
Zhe Wu
9e5488dd3d Make sure that storage and tlog are always set to a valid type 2023-09-06 14:58:42 -07:00
Hui Liu
4d2a7d507d Add a new blob restore state to fix a race after data copy (#10854) 2023-09-05 14:04:35 -07:00
Lukas Joswiak
bfb1c51299 Add clearknob fdbcli command
The `clearknob` command clears the value that a knob has been set to in
the configuration database. Note that this does not mean the knob value
itself gets cleared - only the value in the configuration database is
cleared. The value of the knob will revert to whatever is hardcoded in
the corresponding `*Knobs.cpp` file.

Sample `fdbcli` session:

```
Welcome to the fdbcli. For help, type `help'.
fdb> getknob min_trace_severity
`min_trace_severity' is not found
fdb> setknob min_trace_severity 20
Please set a description for the change. Description must be non-empty
description: test
Committed (2)
fdb> getknob min_trace_severity
`min_trace_severity' is `20'
fdb> clearknob min_trace_severity
Please set a description for the change. Description must be non-empty
description: clear
Committed (4)
fdb> getknob min_trace_severity
`min_trace_severity' is not found
```

Transactions are also supported with the new `clearknob` command:

```
Welcome to the fdbcli. For help, type `help'.
fdb> begin
Transaction started
fdb> setknob min_trace_severity 20
fdb> clearknob min_trace_severity
fdb> commit
Please set a description for the change. Description must be non-empty.
description: test
Committed (16)
fdb> getknob min_trace_severity
`min_trace_severity' is not found
```
2023-08-31 17:36:05 -07:00
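The `clearknob` semantics described in the commit above (clearing the configuration-database value so the knob reverts to its hardcoded default) can be modeled as an override layer on top of defaults. This is a toy sketch under stated assumptions, not FoundationDB code: the `KnobConfig` class, its method names, and the default value `"10"` are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, Optional

# Hypothetical stand-in for values hardcoded in a `*Knobs.cpp` file.
# The value "10" is a made-up placeholder, not the real default.
HARDCODED_DEFAULTS: Dict[str, str] = {"min_trace_severity": "10"}


class KnobConfig:
    """Toy model: configuration-database overrides layered over hardcoded defaults."""

    def __init__(self, defaults: Dict[str, str]) -> None:
        self._defaults = dict(defaults)
        self._overrides: Dict[str, str] = {}

    def setknob(self, name: str, value: str) -> None:
        # Store an override in the (simulated) configuration database.
        self._overrides[name] = value

    def clearknob(self, name: str) -> None:
        # Remove only the override; the hardcoded default is untouched.
        self._overrides.pop(name, None)

    def getknob(self, name: str) -> Optional[str]:
        # Like the sessions above, report only what the configuration
        # database holds; None plays the role of "is not found".
        return self._overrides.get(name)

    def effective(self, name: str) -> str:
        # The value in force: the override if present, else the default.
        return self._overrides.get(name, self._defaults[name])
```

In this model, `getknob` returning `None` after `clearknob` corresponds to the "is not found" output in the sessions, while `effective` falls back to the default.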
Zhe Wang
7e8f326277 Audit storage for specific engine (#10781)
* audit storage for specific engine

* fix getStorageType

* fix budget of skipAuditOnRange

* fix budget in scheduleAuditOnRange

* fix CI error

* improve trace events

* address comments
2023-08-23 10:51:24 -07:00
Zhe Wang
f1c17b27fc Multiple improvements to AuditStorages (#10685)
* remove dangerous DDAudit assert, add AuditRate knob, add progress check when ssshard complete, add progress check for ssshard in fdbcli

* throttle progress check for ssshard

* fix getAuditProgressByServer

* fix trace event for ss audit

* using name -- checkMoveKeysLockForAudit

* new scheduleAuditLocationMetadata

* address comments

* shorten progress summary for ssshard

* simplify getAuditProgressByServer in fdbcli
2023-08-14 13:13:49 -07:00
Zhe Wu
eb6f0c613d Add documentation for perpetual_storage_wiggle_engine config 2023-08-10 09:35:57 -07:00
Zhe Wu
ab4ae712e8 Add PerpetualWiggleStorageMigrationWorkload documentation. 2023-08-10 09:35:57 -07:00
Zhe Wu
863038a44c Add improvement for initializing storage server using new perpetual_wiggle_storage_engine config 2023-08-10 09:35:57 -07:00
Jingyu Zhou
22a3ea803c Add "checkall" debug command for fdbcli (#10687)
* Add "checkall" for checking \xff\x02/blog/ keys

* Avoid GRV calls for getlocation

* Update comments

* Add non-stopping checking and remove verbose output

* Update checkall command to accept customized range

* Fix format

* Fix a compiling issue and output
2023-07-26 17:19:16 -07:00
Zhe Wang
522c9d4f0f Add new implementation of audit storage for user data (#10613)
* remainingBudgetForAuditTasks should be managed within audit

* fix CI

* add audit storage test for various ranges

* clean DD

* new auditStorageUserDataQ

* fix assert fail in startTrackShardAssignment

* fix assert fail in ssaudit

* address comments

* replace assert with audit_cancel in ss audits

* add audit check progress tool

* add observability to audit progress and fix audit bugs

* fix audit progress issues and add sim test for audit progress and add trace event for the audit progress and add fdbcli to track the audit progress

* remove old audit storage on SS

* check audit progress when auditCore completes
2023-07-16 09:56:26 -07:00
William Dowling
3ea1ba1648 Remove beta status from RadixTree storage engine 2023-07-05 17:54:54 +02:00
Yanqin Jin
626a8a1a5f SNOW-804199 Support restoring a cluster with a tenant in the error state (#357)
If we restore a cluster and a previously created tenant was not included in the backup, then the tenant will be marked in an error state on the management cluster. It is then up to the operator to resolve the error, generally by deleting the tenant and recreating it if needed.

There is, however, the possibility that we restored a backup that was older than we wanted, and a newer backup would have the tenant. If we tried to restore the newer backup, it would not leave the previously missing tenant in a fully usable state.

We need to have a way to deal with this case. One option is to allow us to clear the error state of a tenant, and that can be performed before (or maybe even after) the second restore.

Test plan:
Joshua test
100K ensemble: 20230613-225414-yajin-439d13ef3c6b3afd fail=0
2023-06-15 22:23:46 -07:00
Josh Slocum
31e4610b56 misc operational and documentation improvements (#10465)
* misc operational and documentation improvements

* fixing doc build
2023-06-12 15:14:01 -05:00
Jon Fu
b4e2aef58b add tenant_id_prefix to metacluster status (#10455) 2023-06-09 15:03:49 -04:00
Jingyu Zhou
66b0699774 Fix IDE build 2023-06-08 16:59:17 -07:00
Jingyu Zhou
b8c0087ca6 Fix compiling errors 2023-06-07 15:10:00 -07:00
Jingyu Zhou
614686f737 Add getlocation and getall fdbcli debug commands
getlocation: returns the SS list for a key
getall: returns both the SS list and values on the SS for a key
2023-06-07 14:36:16 -07:00
He Liu
ea2b611061 Print server IP address. (#10423) 2023-06-07 13:22:25 -07:00
Josh Slocum
220b7d1a37 Consistency scan test improvements (#10402)
* adding consistency scan clear stats and testing in simulation

* Adding test that intentionally injects corruption in consistency scan requests and ensures the scan finds it

* cleanup

* adding assert false to disabled code
2023-06-07 07:21:47 -05:00
Zhe Wang
f8f8f72c4e Add audit storage cancellation (#10386)
* list audits

* cancel audits and corresponding tests

* make audit storage dblock aware

* increase audit retry since we are able to cancel

* fix updateAuditState and fdb github ci

* fmt

* fix fdbcli audit_storage and fix CI issue

* fix fdb cli

* address comments

* fmt
2023-06-06 14:29:53 -07:00
He Liu
fc8543125c Added location_metadata fdbcli to query shard locations, assignments… (#10395)
* Added location_metadata fdbcli to query shard locations, assignments, numbers etc.

* Added `listshards` to get some random physical/non-physical shards.

* Resolved comments.
2023-06-06 10:33:48 -07:00
Zhe Wang
61aaca005e SS Audit Storage Throttling (#10322)
* ss audit storage throttling

* add audit manager to ss

* reduce CONCURRENT_AUDIT_TASK_COUNT_MAX

* revises comments

* fix audit cli

* fix getAuditStates

* remove toStringForCLI
2023-05-29 14:43:47 -07:00
Hui Liu
7ca13d8f9c support blob restore in fdbrestore (#10248) 2023-05-19 14:45:14 -07:00
Josh Slocum
2916a11a86 New ConsistencyScan (#10265)
* Remove duplicate getRange() for DB handles and update existing GetRange to accept DB handles.

* Initial progress checkpoint on new ConsistencyScan role.

* Updated TODOs, finished most if not all state updates.

* placeholder

* Add more TODOs, documentation and comment improvements.

* Checkpoint round state to avoid advancing progress if commit fails.

* Bug fix, check is supposed to be for overlap, not lack of overlap.

* Added more TODO's and added faked read results / exceptions and faked DB size retrieval to prove the consistencyScanCore logic works.

* Update JSON schemas and command help.

* Add comment about lifetime stats reset.

* More TODO comments and some renames for clarity, some bug fixes.

* properly stopping consistency scan in simulation so that it doesn't run forever and cause quiet database to fail

* removing trailing comma from consistency_scan json schema

* Making CC inconsistency not an error if it's intentional tss corruption

* consistency scan actually reads storage locations

* added check that consistency scan actually completes a round in simulation, fixed bug and added debugging around consistency scan getting stuck

* made consistency scan properly fetch database size

* refactoring data check to be used in both consistency scan and consistency check

* checking that consistency scan always completes at least one round and doesn't get stuck

* cleanup

* fixing ide build

* consistencyscan fdbcli command wasn't actually changing db state

* consistencyscan fdbcli command always said enabled even when it wasn't

---------

Co-authored-by: Steve Atherton <steve.atherton@snowflake.com>
2023-05-18 15:02:41 -05:00
Sam Gwydir
6c16875c34 Add networkoption to disable non-TLS connections (#9984)
* Add networkoption to disable non-TLS connections

* add disable plaintext connection to fdbserver

* python doc

* Formatting

* Add tls disable plaintext connection to client api test

* review

* fix negative test

* formatting

* add TLS support to c client config tests

Adds support for TLS in the client and server separately

* add tests for disable_plaintext_connections

Test TLS and Plaintext Clusters and Clients

* Fix documentation

* Rename option to indicate it is client-only

* clearer formatting

* default to allowing plaintext connections

* add SetTLSDisablePlaintextConnection to go bindings
2023-05-13 00:14:11 +02:00