These features have been previously marked for deletion per PR #12400.
This change necessarily affects a lot of files. In general I found it preferable to cut along the FDB <-> tenant boundary, rather than try to cut tenant into multiple pieces, stitch the Frankenstein tenant implementation back together with FDB, and generally remove the limbs one by one. So it is a single big deletion.
Note that some tenant-related metadata has been written in a non-flag-controlled manner by prior releases and probably must be ignored indefinitely. Fortunately this is isolated to include/fdbclient/ClientLogEvents.h. (Details: deleting an Optional from a serialized struct results in deserialization of garbage in upgrade tests. The serialized nullopt to indicate "no Tenant" is formally part of FDB persistent metadata even in FDB clusters that never would have enabled the tenant feature.)
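To illustrate why the Optional cannot simply be dropped, here is a minimal Python sketch. It does not use FDB's serializer or the real layout of the structs in ClientLogEvents.h: the wire layout and the helper names `write_event_old`, `read_event_new_broken`, and `read_event_new_compatible` are hypothetical and chosen only to show the misalignment.

```python
import struct

# Hypothetical wire layout (NOT FDB's real format):
#   1 byte  : "tenant present" flag (0 means no Tenant)
#   8 bytes : start time, little-endian double
#   4 bytes : latency in microseconds, little-endian unsigned int


def write_event_old(start_time: float, latency_us: int) -> bytes:
    """Old writer: always emits the presence byte, even with no tenant."""
    return struct.pack("<BdI", 0, start_time, latency_us)


def read_event_new_broken(buf: bytes):
    """New reader with the Optional deleted: every later field is read
    one byte too early, so the values come back as garbage."""
    return struct.unpack_from("<dI", buf, 0)


def read_event_new_compatible(buf: bytes):
    """New reader that still consumes, and then ignores, the presence byte."""
    _present, start_time, latency_us = struct.unpack_from("<BdI", buf, 0)
    return start_time, latency_us


old_bytes = write_event_old(1.5, 250)
print(read_event_new_broken(old_bytes))      # misaligned values
print(read_event_new_compatible(old_bytes))  # original values
```

The compatible reader is the shape of the fix described above: the field stays in the serialized form and is ignored after decoding.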
During the course of testing these changes, many interesting bugs were encountered. I won't discuss details of them here. Causes include flat-out damage (by me) to production code in the course of removing tenant-related bits (mainly in NativeAPI.actor.cpp and CommitProxyServer.actor.cpp), damage to various workload files (particularly FuzzApiCorrectness.actor.cpp, which is very sensitive to changes), and many toml files needing updated test flags/options.
More testing details: https://quip-apple.com/Zr6VAycxoli9
20251209-012852-gglass-8ff850b772d868f2 compressed=True data_size=35311687 duration=21671404 ended=500000 fail_fast=1000 max_runs=500000 pass=500000 priority=100 remaining=0 runtime=2:31:30 sanity=False started=500000 stopped=20251209-040022 submitted=20251209-012852 timeout=5400 username=gglass
* remove some unneeded tests, and remove mentions of deleted tests from tests/CMakeLists.txt
* Initiate removal of metacluster. NOTE: this seems to also want removal of tenant. Consider removing them together.
* work on removing metacluster
* delete files with `Tenant` in the name, having reviewed them to ensure that they basically contain what the name implies
* fdb_c.h: remove prototypes for C API methods which have been deleted (blob granule) or which are so long deprecated that they are outside any reasonable/documented support window
* Surgical removal of tenant references from files in bindings/ top level directory. Compilation not yet attempted.
* Surgical removal of tenant related stuff from fdbcli/ top level directory. Compilation not yet attempted.
* Misc tenant code removal, and other stuff which I think may not be needed. Compilation still not attempted.
* Remove more tenant or tenant-adjacent or blob-granule-adjacent stuff. Or at least stuff that looks adjacent to that stuff. Not compiled or tested.
* Start removing Tenant stuff from fdbclient/. Far from complete. Compilation not attempted.
* Remove tenant references from many source files. There are still about 7 principal fdbclient/ and fdbserver/ files with a lot of tenant logic left to delete. Also, all of fdbserver/workloads needs to be looked at. Still have not attempted compilation.
* Remove tenant entanglement from watch functionality
* Remove tenant stuff from fdbserver/tester.actor.cpp
* Delete metacluster workloads
* Remove tenant related stuff from workloads. Also taken the liberty of removing some functionality that appears unused or untestable by Apple.
* Checkpoint tenant removal from FuzzApiCorrectness.actor.cpp
* NativeAPI.actor.cpp: `Tenant` has left the building.
* SimulatedCluster.actor.cpp: `Tenant` has left the building
* DDShardTracker.actor.cpp: Tenant evicted
* storageserver.actor.cpp: `tenant` has left the building.
* fdbserver/workloads/FuzzApiCorrectness.actor.cpp: remove tenant references, but some lingering cleanup needed in `loadAndRun`
* FileBackupAgent.actor.cpp: tenant has left the building
* CommitProxyServer.actor.cpp: remove tenant
* Remove more tenant references from misc files such as bindings tests, documentation, and some fdbserver headers I left earlier
* Fix missing-file errors in CMakeLists.txt files. This is the first attempt to compile this stuff.
* checkpoint misc changes to fix compile errors
* checkpoint more compile fixes
* StorageServerInterface.h: put back more verify() calls
* More misc compile fixes
* whole bunch of misc fixups including some code put-backs to address compile errors
* More compile fixes
* More compile fixes. Still does not compile.
* incremental compile fixing
* ...
* ...
* Checkpoint a bunch of compile fixes. Not quite there but getting closer
* More compile fixes. There seem to be about 10 files left, mainly CommitProxyServer.actor.cpp and storageserver.actor.cpp
* IT COMPILES NOW. THIS IS STILL ALL UNTESTED. Unsurprisingly, CommitProxyServer.actor.cpp and storageserver.actor.cpp took the most tweaking.
The updates in CMakeLists.txt and workloads/UnitTests.actor.cpp are basically trivial and mainly reflect
the ordering of dependencies -- that stuff didn't get attempted until all of fdbserver compiled.
* Put back one block relating to encryption at rest mode. Simplify some TODO(gglass) instances.
* Put back some encryption related knobs
* remove `enable_tenants` from local_cluster.py to maybe fix some ctests
* Remove tenant related options from toml files.
* feature-status.md: add a line for encryption at rest, which seems to have been added for multi-tenant; status is now in doubt
* Fix a pretty bad bug introduced in tenant deletion; ensure we don't attempt to construct a std::string of negative length
* workloads/FuzzApiCorrectness.actor.cpp: avoid division by zero
* flow/Platform.actor.cpp: add a try/catch wrapper around side threads; emit a better addr2line type command
* NativeAPI.actor.cpp: fix a bug introduced in tenant removal relating to reporting conflicting keys under conflictingKeysRange
* ReportConflictingKeys.actor.cpp: separate an ANDed assert into two asserts
* SpecialKeySpaceCorrectness.actor.cpp: put back some logic removed with tenant removal. This test was failing due to a bug with conflict key range reporting. Fixed separately in NativeAPI.actor.cpp.
* remove QuotaCommand.actor.cpp
* Force disable tenant and encryption on disk in upgrade tests
* Add back file I guess I deleted? who knows
* put back another file
* design/feature-status.md: update the new row for encryption at rest to firm up the claim that it is experimental, unowned, and scheduled for deletion
* Remove EncryptKeyProxyTest since we do not use it
* new file tests/slow/BulkDumpingS3WithChaos.toml: remove tenantModes setting
* Undo damage to pushToBackupMutations() from removing tenant feature. This caused inverted_range errors and failed commits in backup related simulations.
* tests/restarting/from_7.4.0/Snap*-1: ensure that tenantModes = disabled
* Try again on workloads/FuzzApiCorrectness.actor.cpp
* simplify tenant-free (mostly) FuzzApiCorrectness workload code
* try harder to remove lingering tenant-related brokenness from FuzzApiCorrectness.actor.cpp
* Explicitly specify tenantModes = ['disabled'] in all the -1 restart files
* Remove tenantModes from 7.1-based upgrade tests as it's an unknown option. Hopefully the code doesn't actually turn on tenant stuff
* do not specify tenantModes in downgrade tests
* Downgrade test to_7.4.5: don't say tenantModes
* more tenantModes updates
* Remove a legacy allowDefaultTenant that no longer is meaningful in downgrade to 8.0
* Put back empty Optional<TenantName> turdlets into serialized log events to avoid breaking ClientTransactionProfilingCorrectness upgrade tests (even with tenantMode = disabled)
* disable encryption on a few more upgrade related test cases. That feature is slated for removal anyway
* Remove unneeded workload files that have been subject to #if 0 for a while. Remove commented out block in ClusterRecovery
* disable encryption in more upgrade tests
* Remove choice four-letter words from commentary
* Format 42 files
* Try to fix a doc bug failing the CI build
* More doc compilation error fixes
* Delete more tenant junk from documentation
* fix spelling mistake in comment
* Remove deleted cross-references from documentation. This necessitated editing release 3.0.0 release notes, which is insane.
* Remove more tenant stuff from bindings tests
* Remove more tenant bits from design/ files
* Remove more tenant related stuff
* Delete more tenant references. Put back ten-ant spellings as tenant now that grep output is substantially reduced.
* Put back some tenant stuff into apitester; its deletion seems to have introduced bugs. Also whine about comments some more, because, really, the comments deserve it.
* Updates to workload files and one other thing based on review comments
* de-actorify decodeKVPairs
* format one source file
* Restore transaction tagging doc
* Restore throttle doc details in administration.rst
* Restore fdbserver/workloads/GetEstimatedRangeSize.actor.cpp and associated toml file, minus tenant stuff
* bindings/c/test/{shim related}: update comments and disable functionality that no longer works post-tenant
* put the cli-throttle tag back in
* bindingtester: fix python syntax errors
* remove useless comment
* Remove comment about useless comments, and remove the useless comments
* Initiate deletion of storage cache feature. This is rough and is mostly done by commenting out code in case backtracking is needed. Compiles. Not tested.
* fix some test errors about cache consistency check options which we no longer care about
* design/feature-status.md: Storage Cache status updated to `has been deleted`.
* Delete it for real
* disable BackupS3BlobCorrectness.toml because it fails a lot
* In the interest of a single-purpose, clean diff, put back a removed dumb warning that generates compile error noise
* fix formatting
* Add TODO comment to remove tagLocalityLogRouter
* fix typo
Continuing the deletions of unowned experimental features as listed in #12400.
ChangeFeed is mostly contained in NativeAPI.actor.cpp and storageserver.actor.cpp, with a modest amount of code in dedicated files and a scattering of updates in misc other places where features tend to pile in.
There are a few lingering TODOs for fine-tuning of the additional removal, including a state machine in storageserver.actor.cpp. My preference is to checkpoint this diff before continuing with more experimental/risky fine-grained surgery in close proximity to code which must remain. This PR nets -6000 lines, mostly in NativeAPI.actor.cpp and storageserver.actor.cpp, so benefits should accrue in terms of compile times and general "less unwanted code showing up on your screen" when working in these files.
Ran overnight:
20251016-003236-gglass-46416fec30cddcb0 compressed=True data_size=38462735 duration=14586409 ended=310182 fail=10 fail_fast=10 max_runs=500000 pass=310172 priority=100 remaining=0 runtime=1:21:30 sanity=False started=314133 stopped=20251016-015406 submitted=20251016-003236 timeout=5400 username=gglass
The 10 failures were in a specific recent unit test failure not related to this PR (link to details shared in Slack).
Prior to this:
20251015-232157-gglass-46416fec30cddcb0 compressed=True data_size=38462735 duration=5241939 ended=99998 fail=2 fail_fast=10 max_runs=100000 pass=99996 priority=100 remaining=0 runtime=1:10:39 sanity=False started=100000 stopped=20251016-003236 submitted=20251015-232157 timeout=5400 username=gglass
Those 2 failures were one existing bug (there is a radar for it) and one where amusingly Joshua decided to run a deleted BlobGranule test case, which I am just going to ignore.
* Initiate deletion of changefeed feature. Probably does not compile
* Checkpoint removal of changefeed feature. This set of changes compiles but is untested.
* feature-status.md: ChangeFeed: status is now `has been deleted`.
* Format code. This passed 100k simulations minus 2: one was Joshua running a deleted test case (wtf?) and the second was a test case with an open radar
This is the first experimental feature to be deleted in the list published at PR #12400.
There is more code here than I anticipated. It is about 40,000 lines total, of which about three quarters are in dedicated files which I am deleting, and about one quarter is in shared files. That means about 10k lines in shared files, which is the stuff we tend to notice day to day (that plus the test failures on heretofore not-yet-disabled test cases, which I am now deleting).
I ran 3 million simulations, mostly against 692df86 or very similar code (differing by one TraceEvent). This was prior to syncing with upstream/main, which had no conflicts and from which I don't expect problems. The number of failures in these runs was about 8. We looked at them and believe there is a high likelihood that these are existing issues not related to the changes in this PR. More details on these failures can be found in docs linked from here: https://quip-apple.com/MN7gAyXLjgyn
* change Long Term status for unowned features to "scheduled for deletion" where applicable
* Relax wording about scheduled for deletion features
* Delete blob granule feature. WIP. Does not compile.
* more incremental hacking to remove / comment out blob granule related code
* more hacking to remove blob granule related code, e.g. blob manager and blob migrator roles
* delete more blob granule stuff
* more hacking
* more hacking
* more hacking
* More changes to remove blob granule related code. IT COMPILES NOW
* don't try to run AuthzSecurity tests as we have deleted that workload as part of this effort
* delete more stuff that matches, abbreviates, or smells like blob granule related
* EncryptKeyProxy: don't do blobMetadata stuff, because that is not used and support is being removed
* delete more references to blob granule stuff
* SimulationConfig::setEncryptionAtRestMode: always use DISABLED; also disable EncryptKeyProxyTest.toml
* format code
* manual update to bindings/java/src/tests.cmake to remove a deleted file
* fix compile errors. I guess by default I don't build Java bindings
* remove unneeded blob granule functions rather than #if..#endif them out
* remove more code in #if..#endif
* remove more code in #if 0..#endif
* revert changes to fdb_c.h in preparation for marking removed API calls as removed
* rework C API declarations in preparation for marking blob granule APIs as removed
* deprecate removed blob granule related API functions as of version 740 (and add a comment to request a justification of this convention)
* make progress on broken ctests. E.g. 1) python does not need to do blob granule stuff. 2) authz tests seemingly not needed
* remove blob granule stuff from Java and Python APIs and fix test runner stuff so that ctests pass
* reformat comments to fix compile error. FIXME: why is this error not happening on the default compile commands we use
* hacks all the way down to try to fix the Mac build
* add pointed comment about the perceived pointlessness of the API deprecation scheme embodied in this source file
* really serious about the C++ style comments, aren't we
* remove commented-out code from prior iterative efforts
* put back undeleted code in original order
* delete commented-out code
* update feature-status.md to say blob granule is mostly deleted
* upgrade `mostly deleted` to `has been deleted`
The `clearknob` command clears the value that a knob has been set to in
the configuration database. Note that this does not mean the knob value
itself gets cleared - only the value in the configuration database is
cleared. The value of the knob will revert to whatever is hardcoded in
the corresponding `*Knobs.cpp` file.
Sample `fdbcli` session:
```
Welcome to the fdbcli. For help, type `help'.
fdb> getknob min_trace_severity
`min_trace_severity' is not found
fdb> setknob min_trace_severity 20
Please set a description for the change. Description must be non-empty
description: test
Committed (2)
fdb> getknob min_trace_severity
`min_trace_severity' is `20'
fdb> clearknob min_trace_severity
Please set a description for the change. Description must be non-empty
description: clear
Committed (4)
fdb> getknob min_trace_severity
`min_trace_severity' is not found
```
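The distinction between the configuration database entry and the knob's effective value can be sketched as a toy model. This is an illustration only, not how fdbcli or the knob system is implemented: `ConfigDatabase`, `effective_value`, and the default of `10` for `min_trace_severity` are hypothetical placeholders.

```python
# Hypothetical stand-in for a value hardcoded in a *Knobs.cpp file.
# The number 10 is a placeholder, not the real default.
HARDCODED_DEFAULTS = {"min_trace_severity": 10}


class ConfigDatabase:
    """Toy model: stores only the overrides set through setknob."""

    def __init__(self):
        self._overrides = {}

    def setknob(self, name, value):
        self._overrides[name] = value

    def clearknob(self, name):
        # Removes the override only; the hardcoded default is untouched.
        self._overrides.pop(name, None)

    def getknob(self, name):
        # Like the session above, reports only what the database stores.
        return self._overrides.get(name)


def effective_value(db, name):
    """Value a process would use: the override if present, else the default."""
    override = db.getknob(name)
    return HARDCODED_DEFAULTS[name] if override is None else override


db = ConfigDatabase()
db.setknob("min_trace_severity", 20)
print(db.getknob("min_trace_severity"), effective_value(db, "min_trace_severity"))
db.clearknob("min_trace_severity")
print(db.getknob("min_trace_severity"), effective_value(db, "min_trace_severity"))
```

In this model `getknob` returning nothing after `clearknob` corresponds to the "is not found" output in the session, while the effective value falls back to the hardcoded default.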
Transactions are also supported with the new `clearknob` command:
```
Welcome to the fdbcli. For help, type `help'.
fdb> begin
Transaction started
fdb> setknob min_trace_severity 20
fdb> clearknob min_trace_severity
fdb> commit
Please set a description for the change. Description must be non-empty.
description: test
Committed (16)
fdb> getknob min_trace_severity
`min_trace_severity' is not found
```
* list audits
* cancel audits and corresponding tests
* make audit storage dblock aware
* increase audit retry since we are able to cancel
* fix updateAuditState and fdb github ci
* fmt
* fix fdbcli audit_storage and fix CI issue
* fix fdb cli
* address comments
* fmt
* Added location_metadata fdbcli to query shard locations, assignments, numbers etc.
* Added `listshards` to get some random physical/non-physical shards.
* Resolved comments.
* Add networkoption to disable non-TLS connections
* add disable plaintext connection to fdbserver
* python doc
* Formatting
* Add tls disable plaintext connection to client api test
* review
* fix negative test
* formatting
* add TLS support to c client config tests
Adds support for TLS in the client and server separately
* add tests for disable_plaintext_connections
Test TLS and Plaintext Clusters and Clients
* Fix documentation
* Rename option to indicate it is client-only
* clearer formatting
* default to allowing plaintext connections
* add SetTLSDisablePlaintextConnection to go bindings
* Implemented AuditUtils.actor.cpp
Moved AuditUtils to fdbserver/
* Persist AuditStorageState.
* Passed persisted AuditStorageState test.
* Added audit_storage_error to indicate a corruption is caught.
Throw/Send audit_storage_error when there is a data corruption.
Added doAuditStorage() for resuming Audit.
* Load and resume AuditStorage when DD restarts.
* Generate audit id monotonically.
* Fixed minor issue AuditId/Type was not set.
* Adding getLatestAuditStates.
* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.
* Added `audit_storage` fdbcli command.
* fmt.
* Fixed null shared_ptr issue.
* Improve audit data.
* Change DDAuditFailed to SevWarn.
* Sev.
* set SERVE_AUDIT_STORAGE_PARALLELISM to 1.
* Moved AuditUtils* to fdbclient/.
* Added getAuditStatus fdbcli command.
* Refactor audit storage fdb cli commands.
* Added auditStorage in sim.
* Cleanup.
* Resolved comments.
* Resolved comments.
* Test disabling audit for sims.
* Cleanup.
Co-authored-by: He Liu <heliu@apple.com>
- setknob <knob_name> <knob_value> [config_class]
- getknob <knob_name> [config_class]
- Added new option to begin to specify if it's a configuration txn. Syntax is begin [config-txn]
- Added utility function for converting tuples to string
- Added knobmanagment test in fdbcli_tests.py
* Recruit new singleton for consistency checker.
* Recruit the consistency checker only if enabled.
* Add a yield in monitorConsistencyChecker().
* Minor fixes.
* Consistency check workload enhancements.
* Minor fixes and clarifications.
* clang format
* Clang format.
* Minor fixes, cleanup, debug tracing.
* Misc.
* Move the consistency scan information from dbconfig to a key backed object.
* Move consistency scan config out of db config to a state object and feature rename.
* ConsistencyCheck workload refactor.
* devFormat
* Update fdbcli/ConsistencyScanCommand.actor.cpp
* Review Comments.
Co-authored-by: negoyal <neelam.goyal@gmail.com>
Co-authored-by: Ata E Husain Bohra <ata.husain@snowflake.com>
* Remove API 720 guards for tenants (experimental feature) and the cluster ID special keys (no need to guard)
* Enable the relaxed special key access in transactions that need to use special key-space APIs introduced in 7.2
At least one of the coordinator addresses in cluster file must contain ":tls" suffix
if fdbcli's resolved TLS client configuration holds any of the TLS elements (key, cert, or CA)
Conversely, if none of the TLS elements are configured,
at least one of coordinator addresses must be without ":tls" suffix
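The rule above can be written as a small predicate. This is a sketch of the stated rule only, not the code fdbcli runs: the function name `coordinators_match_tls_config` and its parameters are hypothetical, and the coordinator list is assumed to be already parsed out of the cluster file.

```python
def coordinators_match_tls_config(coordinators, tls_configured):
    """Return True if the coordinator list is consistent with the TLS config.

    coordinators   -- address strings such as "10.0.0.1:4500:tls"
    tls_configured -- True if any TLS element (key, cert, or CA) is set
    """
    has_tls = any(addr.endswith(":tls") for addr in coordinators)
    has_plain = any(not addr.endswith(":tls") for addr in coordinators)
    # With TLS configured, at least one coordinator must be a ":tls" address;
    # without it, at least one must be a plain (non-":tls") address.
    return has_tls if tls_configured else has_plain


mixed = ["10.0.0.1:4500:tls", "10.0.0.2:4500"]
print(coordinators_match_tls_config(mixed, tls_configured=True))
print(coordinators_match_tls_config(["10.0.0.1:4500"], tls_configured=True))
```

A mixed list satisfies the rule in both directions, since each branch only requires at least one address of the matching kind.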
* Add the verify option for \xff\xff/worker_interfaces
* Remove unused code
* update documentations
* update documentations
* solve comments from review
* update some of the comments to be more clear