441 Commits

Author SHA1 Message Date
Jingyu Zhou
2d2a2144f4 Update copyright years to 2013-2026 (#12653)
No functional changes.
2026-01-22 10:49:41 -08:00
Jingyu Zhou
dfbde65a14 Remove blob failure injections (#12620)
* Remove blob failure injections

Follow-up for the cleanup done at #12435. These functions are unused now.

* Fix an assertion failure in simulation

sim2 has "ASSERT(seconds >= -0.0001);" in delay() function, which was
triggering from the tlog code.

Reproduction:

-f ./tests/fast/SidebandSingle.toml -s 3567205446 -b on
2026-01-06 16:09:11 -08:00
gxglass
bab7637d87 Delete multitenant and metacluster features (#12583)
These features have been previously marked for deletion per PR #12400.

This change necessarily affects a lot of files. In general I found it preferable to cut along the FDB <-> tenant boundary, rather than try to cut tenant into multiple pieces, stitch the Frankenstein tenant implementation back together with FDB, and generally remove the limbs one by one. So it is a single big deletion.

Note that some tenant-related metadata has been written in a non-flag-controlled manner by prior releases and probably must be ignored indefinitely. Fortunately this is isolated to include/fdbclient/ClientLogEvents.h. (Details: deleting an Optional from a serialized struct results in deserialization of garbage in upgrade tests. The serialized nullopt to indicate "no Tenant" is formally part of FDB persistent metadata even in FDB clusters that never would have enabled the tenant feature.)
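
To illustrate the serialization point above, here is a minimal sketch of why deleting an Optional from a positionally serialized struct yields garbage. It assumes a hypothetical, simplified wire format (untagged fields written back to back). Every name in it (`writeOldEvent`, `readLatencyFieldDeleted`, `readLatencyCompatible`, the `latencyMs` field) is invented for illustration and is not FDB's actual serializer or the real layout in ClientLogEvents.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical wire format: fields are written back to back with no tags,
// so a reader must consume exactly the fields the writer produced.
using Buffer = std::vector<std::uint8_t>;

void writeU32(Buffer& out, std::uint32_t v) {
    // Little-endian, one byte at a time.
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint32_t readU32(const Buffer& in, std::size_t& pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in.at(pos++)) << (8 * i);
    return v;
}

// Old writer: [presence byte][length + bytes if present][latencyMs].
// Even "no tenant" costs one byte on the wire.
Buffer writeOldEvent(const std::optional<std::string>& tenant, std::uint32_t latencyMs) {
    Buffer out;
    out.push_back(static_cast<std::uint8_t>(tenant.has_value() ? 1 : 0));
    if (tenant) {
        writeU32(out, static_cast<std::uint32_t>(tenant->size()));
        out.insert(out.end(), tenant->begin(), tenant->end());
    }
    writeU32(out, latencyMs);
    return out;
}

// Naive new reader: the optional was deleted from the struct, so the
// presence byte is misread as the first byte of latencyMs.
std::uint32_t readLatencyFieldDeleted(const Buffer& in) {
    std::size_t pos = 0;
    return readU32(in, pos);
}

// Compatible new reader: still consumes the optional, then ignores it.
std::uint32_t readLatencyCompatible(const Buffer& in) {
    std::size_t pos = 0;
    const bool present = in.at(pos++) != 0;
    if (present) {
        const std::uint32_t len = readU32(in, pos);
        pos += len; // skip the tenant bytes
    }
    return readU32(in, pos);
}
```

In this toy format, an event written with no tenant and a latency of 250 reads back as 250 through `readLatencyCompatible`, while `readLatencyFieldDeleted` returns a shifted, meaningless value. Keeping an ignored placeholder in the reader is the only way to stay in step with data already written.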

During the course of testing these changes, many interesting bugs were encountered. I won't discuss details of them here. Causes range from flat out damage (by me) to production code in the course of removing tenant related bits (mainly in NativeAPI.actor.cpp and CommitProxy.actor.cpp), damage to various workload files (particularly FuzzApiCorrectness.actor.cpp, which is very sensitive to changes), and many toml files needing updated test flags/options.

More testing details: https://quip-apple.com/Zr6VAycxoli9

20251209-012852-gglass-8ff850b772d868f2 compressed=True data_size=35311687 duration=21671404 ended=500000 fail_fast=1000 max_runs=500000 pass=500000 priority=100 remaining=0 runtime=2:31:30 sanity=False started=500000 stopped=20251209-040022 submitted=20251209-012852 timeout=5400 username=gglass

* remove some unneeded tests, and remove mentions of deleted tests from tests/CMakeLists.txt

* Initiate removal of metacluster. NOTE: this seems to also want removal of tenant. Consider removing them together.

* work on removing metacluster

* delete files with `Tenant` in the name, having reviewed them to ensure that they basically contain what the name implies

* fdb_c.h: remove prototypes for C API methods which have been deleted (blob granule) or which are so long deprecated that they are outside any reasonable/documented support window

* Surgical removal of tenant references from files in bindings/ top level directory.  Compilation not yet attempted.

* Surgical removal of tenant related stuff from fdbcli/ top level directory.  Compilation not yet attempted.

* Misc tenant code removal, and other stuff which I think may not be needed.  Compilation still not attempted.

* Remove more tenant or tenant-adjacent or blob-granule-adjacent stuff.  Or at least stuff that looks adjacent to that stuff.  Not compiled or tested.

* Start removing Tenant stuff from fdbclient/.  Far from complete.  Compilation not attempted.

* Remove tenant references from many source files.  There are still about 7 principal fdbclient/ and fdbserver/ files with a lot of tenant logic left to delete. Also, all of fdbserver/workloads needs to be looked at.  Still have not attempted compilation.

* Remove tenant entanglement from watch functionality

* Remove tenant stuff from fdbserver/tester.actor.cpp

* Delete metacluster workloads

* Remove tenant related stuff from workloads.  Also taken the liberty of removing some functionality that appears unused or untestable by Apple.

* Checkpoint tenant removal from FuzzApiCorrectness.actor.cpp

* NativeAPI.actor.cpp: `Tenant` has left the building.

* SimulatedCluster.actor.cpp: `Tenant` has left the building

* DDShardTracker.actor.cpp: Tenant evicted

* storageserver.actor.cpp: `tenant` has left the building.

* fdbserver/workloads/FuzzApiCorrectness.actor.cpp: remove tenant references, but some lingering cleanup needed in `loadAndRun`

* FileBackupAgent.actor.cpp: tenant has left the building

* CommitProxyServer.actor.cpp: remove tenant

* Remove more tenant references from misc files such as bindings tests, documentation, and some fdbserver headers I left earlier

* Fix missing-file errors in CMakeLists.txt files.  This is the first attempt to compile this stuff.

* checkpoint misc changes to fix compile errors

* checkpoint more compile fixes

* StorageServerInterface.h: put back more verify() calls

* More misc compile fixes

* whole bunch of misc fixups including some code put-backs to address compile errors

* More compile fixes

* More compile fixes.  Still does not compile.

* incremental compile fixing

* ...

* ...

* Checkpoint a bunch of compile fixes.  Not quite there but getting closer

* More compile fixes.  There seem to be about 10 files left, mainly CommitProxyServer.actor.cpp and storageserver.actor.cpp

* IT COMPILES NOW.  THIS IS STILL ALL UNTESTED.  Unsurprisingly, CommitProxyServer.actor.cpp and storageserver.actor.cpp took the most tweaking.

The updates in CMakeLists.txt and workloads/UnitTests.actor.cpp are basically trivial and mainly reflect
the ordering of dependencies -- that stuff didn't get attempted until all of fdbserver compiled.

* Put back one block relating to encryption at rest mode.  Simplify some TODO(gglass) instances.

* Put back some encryption related knobs

* remove `enable_tenants` from local_cluster.py to maybe fix some ctests

* Remove tenant related options from toml files.

* feature-status.md: add a line for encryption at rest, which seems to have been added for multi-tenant; status is now in doubt

* Fix a pretty bad bug introduced in tenant deletion; ensure we dont attempt to construct a std::string of negative length

* workloads/FuzzApiCorrectness.actor.cpp: avoid division by zero

* flow/Platform.actor.cpp: add a try/catch wrapper around side threads; emit a better addr2line type command

* NativeAPI.actor.cpp: fix a bug introduced in tenant removal relating to reporting conflicting keys under conflictingKeysRange

* ReportConflictingKeys.actor.cpp: separate an ANDed assert into two asserts

* SpecialKeySpaceCorrectness.actor.cpp: put back some logic removed with tenant removal.  This test was failing due to a bug with conflict key range reporting.  Fixed separately in NativeAPI.actor.cpp.

* remove QuotaCommand.actor.cpp

* Force disable tenant and encryption on disk in upgrade tests

* Add back file I guess I deleted?  who knows

* put back another file

* design/feature-status.md: update the new row for encryption at rest to firm up the claim that it is experimental, unowned, and scheduled for deletion

* Remove EncryptKeyProxyTest since we do not use it

* new file tests/slow/BulkDumpingS3WithChaos.toml: remove tenantModes setting

* Undo damage to pushToBackupMutations() from removing tenant feature.  This caused inverted_range errors and failed commits in backup related simulations.

* tests/restarting/from_7.4.0/Snap*-1: ensure that tenantModes = disabled

* Try again on workloads/FuzzApiCorrectness.actor.cpp

* simplify tenant-free (mostly) FuzzApiCorrectness workload code

* try harder to remove lingering tenant-related brokenness from FuzzApiCorrectness.actor.cpp

* Explicitly specify tenantModes = ['disabled'] in all the -1 restart files

* Remove tenantModes from 7.1-based upgrade tests as its an unknown option.  Hopefully the code doesnt actually turn on tenant stuff

* do not specify tenantModes in downgrade tests

* Downgrade test to_7.4.5: dont say tenantModes

* more tenantModes updates

* Remove a legacy allowDefaultTenant that no longer is meaningful in downgrade to 8.0

* Put back empty Optional<TenantName> turdlets into serialized log events to avoid breaking ClientTransactionProfilingCorrectness upgrade tests (even with tenantMode = disabled)

* disable encryption on a few more upgrade related test cases.  That feature is slated for removal anyway

* Remove unneeded workload files that have been subject to #if 0 for a while. Remove commented out block in ClusterRecovery

* disable encryption in more upgrade tests

* Remove choice four-letter words from commentary

* Format 42 files

* Try to fix a doc bug failing the CI build

* More doc compilation error fixes

* Delete more tenant junk from documentation

* fix spelling mistake in comment

* Remove deleted cross-references from documentation.  This necessitated editing release 3.0.0 release notes, which is insane.

* Remove more tenant stuff from bindings tests

* Remove more tenant bits from design/ files

* Remove more tenant related stuff

* Delete more tenant references.  Put back ten-ant spellings as tenant now that grep output is substantially reduced.

* Put back some tenant stuff into apitester; its deletion seems to have introduced bugs.  Also whine about comments some more, because, really, the comments deserve it.

* Updates to workload files and one other thing based on review comments

* de-actorify decodeKVPairs

* format one source file

* Restore transaction tagging doc

* Restore throttle doc details in administration.rst

* Restore fdbserver/workloads/GetEstimatedRangeSize.actor.cpp and associated toml file, minus tenant stuff

* bindings/c/test/{shim related}: update comments and disable functionality that no longer works post-tenant

* put the cli-throttle tag back in

* bindingtester: fix python syntax errors

* remove useless comment

* Remove comment about useless comments, and remove the useless comments
2025-12-09 12:39:41 -08:00
Jingyu Zhou
b7fa9b301c Ignore remoteSatelliteTLogsDead in kill decision making (#12545)
Remote satellite tlogs are not used in recoveries, so even if they are dead,
they have no effect. Simulation found a scenario where remoteTLogsDead is true,
but remoteSatelliteTLogsDead is false. In this case, the simulator doesn't think
it's going to kill too many machines.

However, because one remote tlog is missing, recovery can't reach the
all-tlogs-recruited state, which blocks remote SSes from catching up and
results in a consistency check failure.

Seed: -f ./tests/slow/BackupOldAndNewRestore.toml -s 1884352193 -b on
commit: 53fe3ec741 with gcc build
2025-11-11 19:53:46 -08:00
Jingyu Zhou
b720c6d884 Fix how connectionFailuresDisableDuration is used in Tester (#12537)
* Fix how connectionFailuresDisableDuration is used in Tester

This should be used to disable connection failures, not to enable them.

* Fix a comment
2025-11-03 19:28:30 -08:00
Michael Stack
6c78bcec65 Fix MockS3Server canBeSet() assertion failure. Seen on mac. (#12497)
* Fix MockS3Server canBeSet() assertion failure. Seen on mac.

- Fix UnsentPacketQueue pointer overwrite issue in MockS3Server; was
  overwriting existing response->data.content pointer.
- Replace 'new UnsentPacketQueue()' with 'discardAll()' to use existing content queue
- Remove conditional reference counting that could cause memory corruption
- Simplify clone() method to always return new instance
- Add null check on globals and aggressive clearing of currentProcess.

* ASSERT global id >= 0.
Comment on when currentProcess is nullptr

(Address review feedback)

---------

Co-authored-by: michael stack <stack@duboce.com>
2025-10-23 17:24:41 -07:00
Syed Paymaan Raza
99012d2e05 Clean up some headers and dead code (#12488)
* Clean up some headers and dead code

* self review
2025-10-21 10:30:50 -07:00
Jingyu Zhou
822c9167a2 Fix a bug where speedUpSimulation is not enabled (#12476)
* Fix a bug where speedUpSimulation is not enabled

If the connectionFailureDisableTime is slightly larger than connectionFailureDisableTime,
disableConnectionFailures returns a small value such that disableConnectionFailuresAfter()
does not retry. As a result, clogging stays in place, which causes many test failures.

A typical symptom is that some transaction can't commit due to transaction_too_old errors.
If we look closely, we'll find that there are many ProxyReject errors with QDelay > 5s.
A failure I looked at has only one commit proxy and one resolver, and the latency
between them is quite high:

5.494145 Sim2Connection Machine=2.0.2.1:2 ID=0000000000000000 From=2.0.2.1:2 To=2.0.2.2:2 SendBufSize=1736851 Latency=0.0292578 StableConnection=0
5.494145 Sim2Connection Machine=2.0.2.1:2 ID=0000000000000000 From=2.0.2.2:2 To=2.0.2.1:2 SendBufSize=510278 Latency=0.0158367 StableConnection=0

So after a while, all transactions got either transaction_too_old errors or
ProxyReject'ed, because resolution took about 100ms on average.

To reproduce:
Commit: a3720567a2
Seed: -b on -f tests/fast/BackupCorrectnessWithEKPKeyFetchFailures.toml -s 2458961283

* Add a const DISABLE_CONNECTION_FAILURE_MIN_INTERVAL
2025-10-20 09:37:28 -07:00
gxglass
b1d6dcf0e7 Delete blob granule feature (#12435)
This is the first experimental feature to be deleted in the list published at PR #12400.

There is more code here than I anticipated. It is about 40,000 lines total, of which about three quarters are in dedicated files which I am deleting, and about one quarter is in shared files. That means about 10k lines in shared files, which is the stuff we tend to notice day to day (that plus the test failures on heretofore not-yet-disabled test cases, which I am now deleting).

I ran 3 million simulations, mostly against 692df86 or very similar code (differing by one TraceEvent). This was prior to syncing with upstream/main, which had no conflicts and from which I don't expect problems. The number of failures in these runs was about 8. We looked at them and believe there is a high likelihood that these are existing issues not related to the changes in this PR. More details on these failures can be found in docs linked from here: https://quip-apple.com/MN7gAyXLjgyn

* change Long Term status for unowned features for "scheduled for deletion" where applicable

* Relax wording about scheduled for deletion features

* Delete blob granule feature.  WIP.  Does not compile.

* more incremental hacking to remove / comment out blob granule related code

* more hacking to remove blob granule related code, e.g. blob manager and blob migrator roles

* delete more blob granule stuff

* more hacking

* more hacking

* more hacking

* More changes to remove blob granule related code.  IT COMPILES NOW

* dont try to run AuthzSecurity tests as we have deleted that workload as part of this effort

* delete more stuff that matches, abbreviates, or smells like blob granule related

* EncryptKeyProxy: dont do blobMetadata stuff, because that is not used and support is being removed

* delete more references to blob granule stuff

* SimulationConfig::setEncryptionAtRestMode: always use DISABLED; also disable EncryptKeyProxyTest.toml

* format code

* manual update to bindings/java/src/tests.cmake to remove a deleted file

* fix compile errors.  I guess by default I dont build Java bindings

* remove unneeded blob granule functions rather than #if..#endif them out

* remove more code in #if..#endif

* remove more code in #if 0..#endif

* revert changes to fdb_c.h in preparation for marking removed API calls as removed

* rework C API declarations in preparation for marking blob granule APIs as removed

* deprecate removed blob granule related API functions as of version 740 (and add a comment to request a justification of this convention)

* make progress on broken ctests.  E.g. 1) python does not need to do blob granule stuff.  2) authz tests seemingly not needed

* remove blob granule stuff from Java and Python APIs and fix test runner stuff so that ctests pass

* reformat comments to fix compile error.  FIXME: why is this error not happening on the default compile commands we use

* hacks all the way down to try to fix the Mac build

* add pointed comment about the perceived pointlessness of the API deprecation scheme embodied in this source file

* really serious about the C++ style comments, arent we

* remove commented-out code from prior iterative efforts

* put back undeleted code in original order

* delete commented-out code

* update feature-status.md to say blob granule is mostly deleted

* upgrade `mostly deleted` to `has been deleted`
2025-10-13 16:18:56 -07:00
gxglass
b4fb2f439e Misc small changes and FIXME comment additions from reading fdbrpc code (#12395)
Small changes accumulated while reading through a bunch of files in fdbrpc (mostly).

Throughout: add FIXME comments where it would be useful to have a comment explaining something or suggesting a change of some kind.

Several places: rename a variable to get a better name

A few places: delete code that has been commented out for > 5 years

A few places: delete useless or apparently out of date, incorrect comments

One place: delete a function that has no callers

One place: reorder methods, listing them in the order which users are to call them (i.e. the natural order for explanation)

One or more places: delete unneeded FIXME comments

network.h: put the function comments before the functions they describe

* put function comments before the function they describe

* add fixme comment

* setLocalAddress had an obviously broken function comment.  Then I noticed nobody was calling it.  So delete it

* QueueModel.h: list methods and comments in order that methods are to be called by users; remove a many years old commented out block of code

* fdbrpc.h: FIXMEs to request explanations of Public and Private

* Add a variety of FIXME comments to FlowTransport.cpp to be revisited later

* add FIXMEs for comment additions; remove useless or outdated comments; rename a variable for clarity; other very cosmetic updates

* add FIXMEs for suggested comments; rename a variable; delete one or more useless comments

* simplify some FIXME comments

* add one space to address CODE FORMAT CLEAN CI check

* Updates based on code review comments

* fix comment
2025-09-26 11:43:19 -07:00
Vishesh Yadav
9f094417a2 Fix isOnMainThread in Simulation and Testing (#11978)
* Fix isOnMainThread in Simulation and Testing

isOnMainThread() is used to check if the currently running task
is on the FDB's event loop. However, in simulation this behaviour
is broken and always returns false.

In other modes such as UnitTest mode, `runTests()` is called before
`g_network->run()`. Without a wait() statement, the event loop never
gets a chance to set itself as the main thread, so the tests never see
the current thread as the main thread. Therefore we add a yield inside
`runTests()` to return control to the caller, which continues with
g_network->run(); the event loop eventually schedules the tests again
after initialization.
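
The ordering problem described above can be sketched with a toy event loop. This is an illustrative model under stated assumptions, not Flow's implementation: `ToyLoop`, `post`, `runTestsEagerly`, and `runTestsAfterYield` are hypothetical names, and `post` stands in for the yield added to `runTests()`.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <thread>
#include <utility>

// Toy stand-in for the network event loop. The loop only learns which
// thread is "main" once run() starts executing.
class ToyLoop {
    std::thread::id mainThread_{}; // default id: represents no thread
    std::queue<std::function<void()>> tasks_;

public:
    bool isOnMainThread() const { return std::this_thread::get_id() == mainThread_; }

    // Defer work until the loop is running (the role the yield plays).
    void post(std::function<void()> task) { tasks_.push(std::move(task)); }

    void run() {
        mainThread_ = std::this_thread::get_id();
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop();
            task();
        }
    }
};

// Test body runs synchronously before run(): the main thread is not set yet.
bool runTestsEagerly(ToyLoop& loop) {
    const bool sawMainThread = loop.isOnMainThread();
    loop.run();
    return sawMainThread;
}

// Test body is deferred, so it executes from inside run().
bool runTestsAfterYield(ToyLoop& loop) {
    bool sawMainThread = false;
    loop.post([&] { sawMainThread = loop.isOnMainThread(); });
    loop.run();
    return sawMainThread;
}
```

In this model the eager variant never observes the main thread, while the deferred variant does, which mirrors the reason for yielding before the test body runs.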

* Update fdbserver/tester.actor.cpp

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>

---------

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>
2025-02-27 13:14:17 -08:00
Jingyu Zhou
7459749834 Extend connection failure in GcGenerations workload
The Tester may disable the connection failure after the GcGenerations enables
it. So we want to extend the connection failure for the Tester in this case.
2025-02-05 20:21:21 -08:00
Syed Paymaan Raza
c3e7542cda Update end year in copyright header 2024-08-02 09:40:11 -07:00
Xiaoge Su
cf70d45e6d Add PeerAddress to all PeerAddr/Peer TraceEvent
This is to address #4846
2024-07-09 16:49:22 -07:00
Hao Fu
8555ac9b71 Implement checksum via LRU-like approach to save space (#11194) 2024-02-21 12:24:51 +08:00
Xiaoge Su
91ec1fdf10 Provide actor call backtrace
See design/AcAC.md
2023-09-19 20:58:33 -07:00
Josh Slocum
8ce1796a7a Stable http ports (#10604)
* Ensuring HTTP ports are stable for the same servers, and adding a test http server that ensures no other users accidentally talk to it

* fixing warning with werror

* more werror fixes
2023-07-13 14:06:52 -04:00
Zhe Wu
d8eaf28bfd Simulate connection failure in simulation 2023-07-07 11:34:12 -07:00
Konrad `ktoso` Malawski
c26aa0b2a3 Introduce initial Swift support in fdbserver (#10156)
* [fdbserver] workaround the FRT type layout issue to get Swift getVersion working

* MasterData.actor.h: fix comment typo

* masterserver.swift: some tweaks

* masterserver.swift: remove getVersion function, use the method

* masterserver.swift: print replied version to output for tracing

* [swift] add radar links for C++ interop issues found in getVersion bringup

* Update fdbserver.actor.cpp

* Migrate MasterData closer to full reference type

This removes the workaround for the FRT type layout issue, and gets us closer to making MasterData a full reference type

* [interop] require a new toolchain (>= Oct 19th) to build

* [Swift] fix computation of toAdd for getVersion Swift implementation

* add Swift to FDBClient and add async `atLeast` to NotifiedVersion

* fix

* use new atLeast API in master server

* =build fixup link dependencies in swift fdbclient

* clocks

* +clock implement Clock using Flow's notion of time

* [interop] workaround the immortal retain/release issue

* [swift] add script to get latest centos toolchain

* always install swift hooks; not only in "test" mode

* simulator - first thing running WIP

* cleanups

* more cleanup

* working snapshot

* remove sim debug printlns

* added convenience for whenAtLeast

* try Alex's workaround

* annotate nonnull

* cleanup clock a little bit

* fix missing impls after rebase

* Undo the swift_lookup_Map_UID_CommitProxyVersionReplies workaround

No longer needed - the issue was retain/release

* [flow][swift] add Swift version of BUGGIFY

* [swiftication] add CounterValue type to provide value semantics for Counter types on the Swift side

* remove extraneous requestingProxyUID local

* masterserver: initial Swift state prototype

* [interop] make the Swiftied getVersion work

* masterserver - remove the C++ implementation (it can't be supported as state is now missing)

* Remove unnecessary SWIFT_CXX_REF_IMMORTAL annotations from Flow types

* Remove C++ implementation of CommitProxyVersionReplies - it's in Swift now

* [swift interop] remove more SWIFT_CXX_REF_IMMORTAL

* [swift interop] add SWIFT_CXX_IMMORTAL_SINGLETON_TYPE annotation for semantically meaningful immortal uses

* rename SWIFT_CXX_REF_IMMORTAL -> UNSAFE_SWIFT_CXX_IMMORTAL_REF

* Move master server waitForPrev to swift

* =build fix linking swift in all modules

* =build single link option

* =cmake avoid manual math, just get "last" element from list

* implement Streams support (#18)

* [interop] update to new toolchain #6

* [interop] remove C++ vtable linking workarounds

* [interop] make MasterData proper reference counted SWIFT_CXX_REF_MASTERDATA

* [interop] use Swift array to pass UIDs to registerLastCommitProxyVersionReplies

* [interop] expose MasterServer actor to C++ without wrapper struct

* [interop] we no longer need expose on methods 🥳

* [interop] initial prototype of storing CheckedContinuation on the C++ side

* Example of invoking a synchronous swift function from a C++ unit test. (#21)

* move all "tests" we have in Swift, and priority support into real modules (#24)

* Make set continuation functions inline

* Split flow_swift into flow_swift and flow_swift_future to break circular dependency

* rename SwiftContinuationCallbackStruct to FlowCallbackForSwiftContinuation

* Future interop: use a method in a class template for continuation set call

* Revert "Merge pull request #22 from FoundationDB/cpp-continuation" (#30)

* Basic Swift Guide (#29)

Co-authored-by: Alex Lorenz <arphaman@gmail.com>

* Revert "Revert "Merge pull request #22 from FoundationDB/cpp-continuation" (#30)"

This reverts commit c025fe6258.

* Restore the C++ continuation, but it seems waitValue is broken for CInt somehow now

* disable broken tests - waitValue not accessible

* Streams can be async iterated over (#27)

Co-authored-by: Alex Lorenz <arphaman@gmail.com>

* remove work in progress things (#35)

* remove some not used (yet) code

* remove expose func for CInt, it's a primitive so we always have witness info (#37)

* +masterdata implement provideVersions in Swift (#36)

* serveLiveCommittedVersion in Swift (#38)

* Port updateLiveCommittedVersion to swift (#33)

Co-authored-by: Konrad `ktoso` Malawski <konrad_malawski@apple.com>

* Implement updateRecoveryData in Swift (#39)

Co-authored-by: Alex Lorenz <arphaman@gmail.com>

* Simplify flow_swift to avoid multiple targets and generate separate CheckedContinuation header

* Uncomment test which was blocked on extensions not being picked up (#31)

* [interop] Use a separate target for Swift-to-C++ header generation

* reduce boilerplate in future and stream support (#41)

* [interop] require interop v8 - that will fix linker issue (https://github.com/apple/swift/issues/62448)

* [interop] fix swift_stream_support.h Swift include

* [interop] bump up requirement to version 9

* [interop] Generalize the Flow.Optional -> Swift.Optional conversion using generics

* [WIP] masterServer func in Swift (#45)

* [interop] Try conforms_to with a SWIFT_CONFORMS_TO macro for Optional conformance (#49)

* [interop] include FlowOptionalProtocol source file when generating Flow_CheckedContinuation.h

This header generation step depends on the import of the C++ Flow module, which requires the presence of FlowOptionalProtocol

* conform Future to FlowFutureOps

* some notes

* move to value() so we can use discardable result for Flow.Void

* make calling into Swift async funcs nicer by returning Flow Futures

* [interop] hide initial use of FlowCheckedContinuation in flow.h to break dependency cycle

* [fdbserver] fix an EncryptionOpsUtils.h modularization issue (showed up with modularized libc++)

* Pass GCC toolchain using CMAKE_Swift_COMPILE_EXTERNAL_TOOLCHAIN to Swift's clang importer

* [interop] drop the no longer needed libstdc++ include directories

* [cmake] add a configuration check to ensure Swift can import C++ standard library

* [swift] include msgpack from msgpack_DIR

* [interop] make sure the FDB module maps have 'export' directive

* add import 'flow_swift' to swift_fdbserver_cxx_swift_value_conformance.swift

This is needed for CONFORMS_TO to work in imported modules

* make sure the Swift -> C++ manually bridged function signature matches generated signature

* [interop][workaround] force back use of @expose attribute before _Concurrency issue is fixed

* [interop] make getResolutionBalancer return a pointer to allow Swift to use it

We should revert back to a reference once compiler allows references again

* [interop] add a workaround for 'pop' being marked as unsafe in Swift

* masterserver.swift: MasterData returns the Swift actor pointer in an unsafe manner

* Add a 'getCopy' method to AsyncVar to make it more Swift friendly

* [interop] bump up the toolchain requirement

* Revert "[interop][workaround] force back use of @expose attribute before _Concurrency issue is fixed"

This reverts commit b01b271a76.

* [interop] add FIXME comments highlighting new issue workarounds

* [interop] adopt the new C++ interoperability compiler flag

* [interop] generate swift compile commands

* Do not deduplicate Swift compilation commands

* [interop] generate swift compile commands

* Do not deduplicate Swift compilation commands

* flow actorcompiler.h: add a SWIFT_ACTOR empty macro definition

This is needed to make the actor files parsable by clangd

* [cmake] add missing dependencies

* experimental cross compile

* [cmake] fix triple in cross-compiled cmake flags

* [interop] update to interop toolchain version 16

* [x-compile] add flags for cross-compiling boost

* cleanup x-compile cmake changes

* [cmake] fix typo in CMAKE_Swift_COMPILER_EXTERNAL_TOOLCHAIN config variable

* [interop] pass MasterDataActor from Swift to C++ and back to Swift

* [fdbserver] Swift->C++ header generation for FDBServer should use same module cache path

* Update swift_get_latest_toolchain.sh to fetch 5.9 toolchains

* set HAVE_FLAG_SEARCH_PATHS_FIRST for cross compilation

* Resolve conflicts in net2/sim2/actors, can't build yet

* undo SWIFT_ACTOR changes, not necessary for merge

* guard c++ compiler flags with is_cxx_compile

* Update flow/actorcompiler/ActorParser.cs

Co-authored-by: Evan Wilde <etceterawilde@gmail.com>

* update the boost dependency

* Include boost directory from the container for Swift

* conform flow's Optional to FlowOptionalProtocol again

* Guard entire RocksDBLogForwarder.h with SSD_ROCKSDB_EXPERIMENTAL to avoid failing on missing rocksdb APIs

* remove extraneous merge marker

* [swift] update swift_test_streams.swift to use vars in more places

* Add header guard to flow/include/flow/ThreadSafeQueue.h to fix modularization issue

* Update net and sim impls

* [cmake] use prebuilt libc++ boost only when we're actually using libc++

* [fdbserver] Swift->C++ header generation for FDBServer should use same module cache path

* fixups after merge

* remove CustomStringConvertible conformance that would not be used

* remove self-caused deprecation warnings in future_support

* handle newly added task priority

* reformatting

* future: make value() not mutating

* remove FIXME, not needed anymore

* future: clarify why as functions

* Support TraceEvent in Swift

* Enable TraceEvent using a class wrapper in Swift

* preparing WITH_SWIFT flag

* wip disabled failing Go stuff

* cleanup WITH_SWIFT_FLAG and reenable Go

* wip disabled failing Go stuff

* move setting flag before printing it

* Add SWIFT_IDE_SETUP and cleanup guides and build a bit

* Revert "Wipe packet buffers that held serialized WipedString (#10018)"

This reverts commit e2df6e3302.

* [Swift] Compile workaround in KeyBackedRangeMap; default init is incorrect

* [interop] do not add FlowFutureOps conformance when building flow clang module for Flow checked continuation header pre-generation

* make sure to show  -DUSE_LIBCXX=OFF in readme

* readme updates

* do not print to stderr

* Update Swift and C++ code to build with latest Swift 5.9 toolchain now that we no longer support universal references and bridge the methods that take in a constant reference template parameter correctly

* Fix SERVER_KNOBS and enable their use in masterserver

* Bump to C++20, Swift is now able to handle it as well

* Put waitForPrev behind FLOW_WITH_SWIFT knob

* Forward declare updateLiveCommittedVersion

* Remove unused code

* fix wrong condition set for updateLiveCommittedVersion

* Revert "Revert "Wipe packet buffers that held serialized WipedString (#10018)""

This reverts commit 5ad8dce052.

* Enable go-bindings in cmake

* Revert "Revert "Wipe packet buffers that held serialized WipedString (#10018)""

This reverts commit 5ad8dce052.

* USE_SWIFT flag so we "build without swift" until ready to enable it by default

* uncomment a few tests which were disabled during USE_SWIFT enablement

* the option is WITH_SWIFT, not USE

* formatting

* Fix masterserver compile error

* Fix some build errors.

How did it not merge cleanly? :/

* remove initializer list from constructor

* Expect Swift toolchain only if WITH_SWIFT is enabled

* Don't require Flow_CheckedContinuation when Swift is disabled

* Don't compile FlowCheckedContinuation when WITH_SWIFT=OFF

* No-op Swift macros

* More compile guards

* fix typo

* Run clang-format

* Guard swift/bridging include in fdbrpc

* Remove printf to pass the test

* Remove some more printf to avoid potential issues

TODO: Need to be TraceEvents instead

* Remove __has_feature(nullability) as its only used in Swift

* Don't use __FILENAME__

* Don't call generate_module_map outside WITH_SWIFT

* Add some more cmake stuff under WITH_SWIFT guard

* Some more guards

* Bring back TLSTest.cpp

* clang-format

* fix comment formatting

* Remove unused command line arg

* fix cmake formatting in some files

* Address some review comments

* fix clang-format error

---------

Co-authored-by: Alex Lorenz <arphaman@gmail.com>
Co-authored-by: Russell Sears <russell_sears@apple.com>
Co-authored-by: Evan Wilde <etceterawilde@gmail.com>
Co-authored-by: Alex Lorenz <aleksei_lorenz@apple.com>
Co-authored-by: Vishesh Yadav <vishesh_yadav@apple.com>
Co-authored-by: Vishesh Yadav <vishesh3y@gmail.com>
2023-06-02 16:09:28 -05:00
Josh Slocum
9c081f8a08 Sim http server improvements (#10217)
* Passes existing tests

* adding http unit test for wrong md5 sum

* Added new HTTPKeyValueStore workload to test long-running http clients

* fixing warnings
2023-05-12 16:33:32 -05:00
Josh Slocum
a4dffa087a Adding Simulated HTTP Server and refactoring HTTP code (#10112)
* Adding Simulated HTTP Server and refactoring HTTP code

* fixing formatting

* fixing merge conflicts

* fixing more merge conflicts

* code review feedback

* changing reference counted interface

* more fixes

* fixing ide build i guess
2023-05-05 12:19:17 -05:00
Steve Atherton
95be00a8be Move AsyncFileWriteChecker to right above SimpleFile in the file stack in simulation, which is analogous to where it is created in production and prevents false positive errors caused by stacking it on top of AsyncFileNonDurable multiple times for different users of the same file. 2023-05-03 10:39:13 -07:00
A.J. Beamon
bcb8d01bc4 Merge pull request #9956 from sfc-gh-ajbeamon/fix-can-kill-processes-in-multi-region-config
Fix canKillProcesses to ignore region config when usableRegions is 1
2023-04-12 10:07:17 -07:00
A.J. Beamon
120933da5f When usableRegions is 1 but we have multiple regions in our configuration, canKillProcesses needs to perform the check as if there is only one region 2023-04-12 08:49:25 -07:00
A.J. Beamon
4142762981 The RebootProcessAndSwitch kill type is meant to be used on an entire cluster, but it was not rebooting protected processes in this way. 2023-04-10 15:58:19 -07:00
Josh Slocum
d37b2b0a76 Adding BlobFailureInjection workload (#9833)
* Adding BlobFailureInjection workload

* fixing formatting
2023-04-06 15:10:36 -05:00
Zhe Wu
d576d9a66a Remove debug TraceEvent 2023-03-27 11:47:11 -07:00
Zhe Wu
40dc54223c Add GC generation test, and make all simulation tests pass 2023-03-27 11:46:13 -07:00
Zhe Wu
b4e62b9b3e Update log cursor timeout check 2023-03-21 22:03:17 -07:00
Jingyu Zhou
5c97fb2c20 Use a constant for connectionFailuresDisableDuration 2023-03-09 09:50:24 -08:00
Jingyu Zhou
e18ed14278 Refactor to address comments 2023-03-09 09:39:27 -08:00
Jingyu Zhou
493e81f31d Limit connection failures to be within tests
In particular, disable connection failures when initializing the database
during the startup phase, i.e., before running with test specs.
2023-03-08 15:36:58 -08:00
Russell Sears
bcc05b1058 Improve support for prebuilt boost 2023-02-27 15:38:58 -06:00
Jingyu Zhou
9a257a60a4 Address review comments 2023-02-24 10:47:32 -08:00
Jingyu Zhou
0b2e02c402 Fix rare test failures
Unclog after DB is recovered, otherwise another recovery may become stuck again.
2023-02-23 15:42:33 -08:00
Jingyu Zhou
65443b6541 Fix compiling errors 2023-02-23 15:02:44 -08:00
Jingyu Zhou
ecae81882c Change to only clog once for a particular tlog
If we repeat clogging, different tlogs may be excluded, which can cause the
recovery to get stuck.
2023-02-23 14:31:39 -08:00
Jingyu Zhou
955826f2fe Add ClogTlog workload 2023-02-23 14:31:12 -08:00
Junhyun Shim
d9c126a2d9 Introduce WipedString for Arena block holding AuthZ tokens (#9381)
* Enable secure allocation mode in Arena

This mode allows zeroing out blocks holding sensitive data after use

* Introduce WipedString to all token-holding memory

Also introduce an option flag "sensitive"

* Make pointer equivalency a hard requirement for non-ASAN builds

So that we can detect when Arena/malloc/memory-wipe behavior changes
2023-02-16 10:44:32 +01:00
Jingyu Zhou
622520bd2d Return the source team if remote DC is dead
Also refactor the code with findTeamFromServers().
2023-02-10 11:11:07 -08:00
Jingyu Zhou
6c4a9b5f23 Fix DD stuck when remote DC is dead
When the remote DC is down, the remote team collection of DD can be
initializing, waiting for the remote to recover (all_tlog_recruited state).
However, the getTeam request can already be served by the remote team
collection. So a RelocateShard (data movement such as split or move) will get
a team for the remote DC, but the data movement can't make progress on that
team because the remote DC hasn't recovered yet. Because data movement is
stuck, the primary cannot reach the "storage_recovered" state and stays in
the accepting_commit state.

The specific test failure: slow/ApiCorrectness.toml -s 339026305 -b on
at commit: 0edd899d65

In this test, the primary DC has 1 SS killed, and the remote DC has 2 TLogs and
2 SSes killed. So the remote is dead: the remaining 2 SSes can't make progress
because of the loss of 2 TLogs. repairDeadDatacenter() can't reach the
"storage_recovered" state because DD fails to move shards away from the killed
SS in the primary.

The fix is to exclude the entire remote DC in repairDeadDatacenter(), which
tells DD to mark all SSes in the remote as unhealthy. Another fix is to return
empty results for a getTeam request if the remote team collection is not ready.
This allows the data movement to continue; essentially, the remote team is left
unchanged for the data movement.
2023-02-10 11:11:07 -08:00
Junhyun Shim
be225acd2a Merge remote-tracking branch 'origin/main' into authz-tenant-name-to-tenant-id 2023-02-06 23:13:43 +01:00
Xiaoxi Wang
7190fa0c08 Merge branch 'main' of https://github.com/apple/foundationdb into fix/main/testTimeout 2023-02-03 13:48:54 -08:00
Xiaoxi Wang
b757e8914a fix BOOST_SYSTEM_NO_LIB redefinition in CI 2023-02-03 13:47:50 -08:00
Junhyun Shim
ce652fa284 Replace AuthZ's use of tenant names in token with tenant ID
Also, to minimize audit log loss, handle token usage audit logging at each usage.
This has a side-effect of making the token use log less bursty.
This also subtly changes the dedup cache policy.
Dedup time window used to be 5 seconds (default) since the start of batch-logging.
Now it's 5 seconds from the first usage since the closing of the previous dedup window
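
The new dedup window policy can be sketched roughly as follows. This is a
minimal illustration, not the FoundationDB implementation: the class name
UsageDedupCache, the method shouldLog(), the per-token keying, and the use of
plain double timestamps are all assumptions made for the example. It shows only
the rule described above: a window opens at the first usage seen after the
previous window closed, and lasts 5 seconds by default.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch; all names here are illustrative assumptions.
class UsageDedupCache {
public:
    explicit UsageDedupCache(double windowSeconds = 5.0) : window(windowSeconds) {
        assert(windowSeconds > 0);
    }

    // Returns true if this usage should be written to the audit log.
    bool shouldLog(const std::string& tokenId, double now) {
        auto it = windowStart.find(tokenId);
        if (it != windowStart.end() && now - it->second < window) {
            return false; // still inside the open window: treat as duplicate
        }
        // First usage since the previous window closed: open a new window here.
        windowStart[tokenId] = now;
        return true;
    }

private:
    double window; // dedup window length in seconds
    std::unordered_map<std::string, double> windowStart; // token -> window open time
};
```

With this rule, a token used at t=0 and then not again until t=12 opens its
second window at t=12, rather than at a fixed batch boundary.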
2023-02-03 21:46:31 +01:00
Jingyu Zhou
e96adfa449 Fix excessive killing for HA configuration
In the HA configuration, it's possible that 2 out of 3 machines in the remote
DC were killed, leaving too few machines for a successful recovery. So this PR
switches to Reboot to avoid such excessive killing.
2023-02-01 15:16:10 -08:00
Chaoguang Lin
4c5cbe6cda Merge branch 'main' of github.com:apple/foundationdb into fix-nightly-failure 2023-01-25 18:43:37 -08:00
Chaoguang Lin
fce9490c19 A Fix from Evan 2023-01-25 15:55:24 -08:00
Xiaoge Su
eb4e147ebf Reformat source 2023-01-24 15:06:27 -08:00
Xiaoge Su
0a60142160 Extract ProcessInfo, MachineInfo, KillType out from ISimulator 2023-01-24 14:48:42 -08:00