apple-foundationdb

mirror of https://github.com/apple/foundationdb.git synced 2026-01-25 12:28:19 +00:00

Author	SHA1	Message	Date
Syed Paymaan Raza	c3e7542cda	Update end year in copyright header	2024-08-02 09:40:11 -07:00
Lukas Joswiak	611849cd5c	Throw when we get cancelled instead of sending an error UBSAN was complaining about undefined behavior when trying to access the `SAV` inside a promise after an actor had been cancelled. If we are cancelled, don't try to return an error, just throw.	2023-06-07 15:26:31 -07:00
Lukas Joswiak	795b666e23	Fix a rare configuration database data loss bug See the comment contained in this commit. This bug could only manifest under a specific set of circumstances: 1. A coordinator change is started 2. The coordinator change succeeds, but its action of clearing `previousCoordinatorsKey` is delayed. 3. A minority of `ConfigNode`s have an old state of the configuration database, compared to the majority. 4. A `ConfigNode` in the majority dies and permanently loses data. 5. A long delay occurs on the `PaxosConfigConsumer` when it tries to read the latest changes from the `ConfigNode`s. In the above circumstances, the `ConfigBroadcaster` could incorrectly send a snapshot of an old state of the configuration database to a majority of `ConfigNode`s. This would cause new, durable, and acknowledged commit data to be overwritten. Note that this bug only affects the configuration database (used for knob storage). It does not affect the normal keyspace.	2022-11-22 11:20:04 -08:00
Lukas Joswiak	8d237ba493	Fix various correctness and timeout issues Contains the following fixes: * When handling the special case rollforward where nodes can be rolled forward even if a majority are at version 0, we don't want to reset the live version of the node being rolled forward. This is because a quorum of nodes at version 0 can continue handing out and incrementing their live version, and if they are rolled forward there is the potential for them to go back in time in regard to their live version. So in this one special case, they should maintain their existing live version. * Fixes some unseed issues due to fields not being initialized properly. * Temporarily disables a coordinator restart in the recovery path (in the coordinated state) due to it causing a timeout. This needs more investigation in the future.	2022-09-13 16:53:54 -07:00
Lukas Joswiak	74ac617a34	Add support for changing coordinators to the configuration database Configuration database data lives on the coordinators. When a change coordinators command is issued, the data must be sent to the new coordinators to keep the database consistent.	2022-09-13 16:53:54 -07:00
Markus Pilman	1de37afd52	Make TEST macros C++ only (#7558) * proof of concept * use code-probe instead of test * code probe working on gcc * code probe implemented * renamed TestProbe to CodeProbe * fixed refactoring typo * support filtered output * print probes at end of simulation * fix missed probes print * fix deduplication * Fix refactoring issues * revert bad refactor * make sure file paths are relative * fix more wrong refactor changes	2022-07-19 13:15:51 -07:00
Lukas Joswiak	9ca8a3c683	Reenable status json for dynamic knobs, add unit test	2022-06-21 11:43:05 -07:00
Andrew Noyes	6f500b59c0	Fix a heap-use-after-free in PaxosConfigConsumer.actor.cpp (#7244) * Fix a heap-use-after-free in PaxosConfigConsumer.actor.cpp * Two more defensive local promises * Two more defensive promise copies * Fix latent logic error	2022-05-25 12:08:30 -07:00
Renxuan Wang	154de018ff	One place in PaxosConfigConsumer was missed out in #6926. (#7006) * One place in PaxosConfigConsumer was missed out. * Minor improvements.	2022-04-28 18:32:55 -07:00
Renxuan Wang	c69a07a858	Check in the new Hostname logic. (#6926) * Revert #6655. 20220407-031010-renxuan-c101052c21da8346 compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan * Revert #6271. 20220407-051532-renxuan-470f0fe6aac1c217 compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan * Revert #6266. Remove resolving-related functionalities in connection string. Connection string will be used for storing purpose only, and non-mutable. 20220407-175119-renxuan-55d30ee1a4b42c2f compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan * Add hostname to coordinator interfaces. * Turn on the new hostname logic. * Add the corresponding change in config txns. The most notable change is before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize request streams first. Passed correctness tests. * Return error when hostnames cannot be resolved in coordinators command. * Minor fixes.	2022-04-27 21:54:13 -07:00
Chaoguang Lin	af9deeabc2	Move the Promise<QuorumVersion> before the Future vector to be destroyed after the vector	2022-03-22 16:12:41 -07:00
sfc-gh-tclinkenbeard	a71099471b	Update copyright header dates	2022-03-21 13:36:23 -07:00
sfc-gh-tclinkenbeard	0e7dc83f25	Fix compilation issues with ModelInterface construction in configuration database code	2022-03-16 14:25:32 -07:00
Lukas Joswiak	c3e48fff9f	Update fdbserver/PaxosConfigConsumer.actor.cpp Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>	2022-03-16 08:59:12 -07:00
Lukas Joswiak	582ba5d519	Fix issue with stuck config nodes In rare circumstances where the cluster controller dies / moves to a new machine, sometimes only a minority of `ConfigNode`s received messages telling them they were registered. When the `ConfigNode`s attempt to register with the new broadcaster (on the new cluster controller), the knob system would get stuck because only a minority would be registered. Part of this change allows registration of unregistered `ConfigNode`s if there is no path to a majority of registered nodes.	2022-03-15 11:42:58 -07:00
Lukas Joswiak	d0da6c63c1	Rollforward out of date nodes, compaction fixes	2022-03-14 11:20:56 -07:00
Lukas Joswiak	a8828db58e	Load balance dynamic knob requests This commit also removes an attempt to read the latest configuration snapshot when a rollforward timeout occurs. The normal retry loop will eventually fetch an up to date snapshot and the rollforward will be retried.	2022-02-22 10:53:48 -08:00
Lukas Joswiak	e8354d82bd	Fix timeout issue when using >3 coordinators The calculation to determine how many non-timeout replies had been received was incorrect, causing rollback/rollforward requests to not be sent, causing the dynamic knob subsystem to get stuck.	2022-02-09 13:43:33 -08:00
Lukas Joswiak	7fc4f0d649	Reuse existing quorum timeout error code	2022-02-09 13:43:33 -08:00
Lukas Joswiak	d5a562e6b8	Fix dynamic knobs correctness issues	2022-02-09 13:43:32 -08:00
Lukas Joswiak	30b525a607	Add assertions to check rollback	2021-10-25 12:03:22 -07:00
Lukas Joswiak	c96f560cbe	Verify rollback of a single version in simulation, other small fixes	2021-10-25 12:03:22 -07:00
Lukas Joswiak	6078664792	clang-format	2021-10-25 12:03:22 -07:00
Lukas Joswiak	57c2cf4a24	Retry messages to well known endpoints, add notes for future work	2021-10-25 12:03:22 -07:00
Lukas Joswiak	92998fd20b	Merge rollback message into rollforward message	2021-10-25 12:03:22 -07:00
Lukas Joswiak	7357d7714c	Retry with well known endpoints, move last committed check to consumer	2021-10-25 12:03:22 -07:00
Lukas Joswiak	1631a1b352	Update fdbserver/PaxosConfigConsumer.actor.cpp Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>	2021-10-25 12:03:22 -07:00
Lukas Joswiak	e79c6c7456	Fix issue where previous commit messages were reused Fixes an issue where commit versions from previous requests sent to ConfigNodes were being reused when a new quorum of commit versions was requested. This was occurring due to a failure to reset the state of GetCommittedVersionQuorum after a full snapshot request.	2021-10-25 12:03:22 -07:00
Lukas Joswiak	9d78604c5b	Add rollback and rollforward logic to ConfigBroadcaster	2021-10-25 12:03:22 -07:00
Lukas Joswiak	9a39da85b1	Fix issue where previous commit messages were reused Fixes an issue where commit versions from previous requests sent to ConfigNodes were being reused when a new quorum of commit versions was requested. This was occurring due to a failure to reset the state of GetCommittedVersionQuorum after a full snapshot request.	2021-10-25 12:03:22 -07:00
Lukas Joswiak	48dc91dd7f	Add rollback and rollforward logic to ConfigBroadcaster	2021-10-25 12:03:22 -07:00
sfc-gh-tclinkenbeard	b15daf1886	Added PImpl class This class propagates the constness of methods to their pimpl implementations	2021-08-09 10:04:34 -07:00
sfc-gh-tclinkenbeard	9cfd6ed955	Add simple implementation to PaxosConfigConsumer	2021-07-18 17:07:10 -07:00
sfc-gh-tclinkenbeard	748a3ebfbe	Add GetSnapshotAndChangesRequest type	2021-05-18 15:28:44 -07:00
sfc-gh-tclinkenbeard	ea8396c9be	Improve decoupling of configuration database interfaces and implementations	2021-05-17 15:31:03 -07:00
sfc-gh-tclinkenbeard	32f38394b1	Added dummy PaxosConfigConsumer implementation	2021-05-17 13:41:50 -07:00

36 Commits