apple-foundationdb

mirror of https://github.com/apple/foundationdb.git synced 2026-01-24 20:08:38 +00:00

Author	SHA1	Message	Date
Jingyu Zhou	6fe4d74522	Remove an unused parameter from cleanupRecoveryActorCollection (#12654 ) No functional change	2026-01-22 14:29:04 -08:00
Michael Stack	9d0e169496	Add error_code_audit_storage_task_outdated to bypass list rather than do special-case handling (#12652 )	2026-01-22 13:49:11 -08:00
Jingyu Zhou	2d2a2144f4	Update copyright years to 2013-2026 (#12653 ) No functional changes.	2026-01-22 10:49:41 -08:00
Michael Stack	7802b83882	Fix s3client_test ctest 'ERROR: Missing s3client/ls_test/sub1/file2_2 in ls output' (#12650 ) The correct number of files are listed: 2026-01-16T04:59:53+00:00 ERROR: Missing s3client/ls_test/sub1/file2_2 in ls output 2026-01-16T04:59:53+00:00 === DEBUG: Recursive ls output === Contents of blobstore://127.0.0.1:8081/s3client/ls_test?bucket=test-bucket&region=us-east-1&secure_connection=0: s3client/ls_test/file1_1 26.00 B s3client/ls_test/file1_2 26.00 B s3client/ls_test/sub1/file2_1 26.00 B s3client/ls_test/sub1/file2_2 26.00 B s3client/ls_test/sub1/sub2/file3_1 26.00 B s3client/ls_test/sub1/sub2/file3_2 26.00 B 2026-01-16T04:59:53+00:00 === END DEBUG === ... but we overcount because of the HTTP logging. Lines like this... [4a60d5628a4137592fe32d5a5b949bb8] HTTP starting GET /test-bucket/s3client/ls_test/sub1/file2_2?tagging= .... match the pattern. Just look at stdout. Don't mix in stderr (HTTP logs). Co-authored-by: michael stack <stack@duboce.com>	2026-01-22 08:46:36 -08:00
Michael Stack	564e95b681	Integrate BulkDump/BulkLoad with backup/restore system (#12608 ) * Integrate BulkDump/BulkLoad with backup/restore system This commit adds the ability to use BulkDump for creating backup snapshots and BulkLoad for restoring them, providing faster backup/restore operations for large databases. Key changes: - Add BulkDumpTaskFunc to create SST file snapshots during backup - Add BulkLoadRestoreTaskFunc to restore from BulkDump snapshots - Store bulkDumpJobId in snapshot metadata for restore coordination - Add snapshotMode parameter (0=RANGEFILE, 1=BULKDUMP) to control backup type - Add useRangeFileRestore parameter to control restore method - Add CLIENT_KNOBS for configurable job timeouts - Add test assertions to verify BulkDump/BulkLoad execution - Check for existing running jobs to avoid conflicts when multiple agents run - Properly scope state variables for error handling in Flow actors New test: tests/slow/BackupS3BlobBulkLoadRestore.toml * Update design/bulkload-restore-integration.md	2026-01-21 21:29:23 -08:00
walter	e2baa88a84	Add rocksdb options index_block_restart_interval and index_type (#12639 )	2026-01-21 12:42:38 -08:00
Syed Paymaan Raza	0c67384ac5	Remove unused fdbservice directory (#12623 ) The fdbservice directory contained Windows-specific service code that is no longer maintained and does not look like it is used elsewhere. This removes the directory and its corresponding CMake configuration.	2026-01-20 11:12:20 -08:00
Jingyu Zhou	8c1a69ba60	Refactor log router monitoring and re-recruitment (#12642 )	2026-01-19 18:24:05 -08:00
Jingyu Zhou	3d41289ffb	Merge pull request #12644 from jzhou77/mailmap	2026-01-19 18:22:16 -08:00
Jingyu Zhou	7f06b8e334	Fix IDE build compiling errors	2026-01-19 14:29:12 -08:00
Jingyu Zhou	75fae7434f	Remove leftover storage cache role after #12486 They are unused declarations now.	2026-01-19 13:52:38 -08:00
Jingyu Zhou	5789c55556	Fix a few file read cancellation bugs (#12643 )	2026-01-16 17:57:37 -08:00
Jingyu Zhou	9d7431dbba	Remove unused code probes at master role (#12637 ) Coverage tool found no match for these events, because the refactoring has moved monitoring of txn system failures from master role to CC now.	2026-01-14 15:12:27 -08:00
gxglass	b1848af90a	HealthMetricsApi workload: only check when we've received metrics (#12636 ) rdar://166184432 Sometimes we don't get enough data to compute all of the stats that this workload wants to see as non-zero. Maintain a flag gotMetrics and do checks on metrics only if this flag is set. Debugged by the obvious method of adding TraceEvents to see what was happening with this workload. Also some minor TraceEvent updates and simplify a variable name. 20260113-232530-gglass-05eb3d48db99757b compressed=True data_size=35600940 duration=3977581 ended=100000 fail_fast=1000 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=8:02:14 sanity=False started=100000 stopped=20260114-072744 submitted=20260113-232530 timeout=5400 username=gglass * HealthMetricsApi workload: only check() aggressively when we have received full metrics * HealthMetricsApi.actor.cpp: address review comments, and add an overall comment saying that this seems testable outside simulation	2026-01-14 14:55:13 -08:00
dependabot[bot]	20a5722935	Bump authlib (#12628 ) Bumps the pip group with 1 update in the /tests/TestRunner directory: [authlib](https://github.com/authlib/authlib). Updates `authlib` from 1.6.5 to 1.6.6 - [Release notes](https://github.com/authlib/authlib/releases) - [Changelog](https://github.com/authlib/authlib/blob/main/docs/changelog.rst) - [Commits](https://github.com/authlib/authlib/compare/v1.6.5...v1.6.6) --- updated-dependencies: - dependency-name: authlib dependency-version: 1.6.6 dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-01-14 11:18:38 -08:00
Jingyu Zhou	3b574c4e9b	Fix code coverage data by removing hits from old binaries (#12635 ) * Fix code coverage data by removing hits from old binaries Otherwise, the data are counting from old binaries that may have different lines and thus inflating the total number of probes. * Address a review comment	2026-01-13 16:44:36 -08:00
walter	5614ae64aa	Add rocksdb option max_bytes_for_level_multiplier (#12634 )	2026-01-13 09:46:24 -08:00
Syed Paymaan Raza	ba1d659587	Delete sharded rocks extraneous code (logWriteSize function and its callsites) (#12633 )	2026-01-12 22:39:45 -08:00
Vishesh Yadav	af732673f1	Implement Exclude commands in gRPC (#12603 )	2026-01-12 11:38:36 -08:00
Akanksha Mahajan	2c06c99f1d	EncryptionBackup: Log encryption key file access failures at SevError severity (#12629 ) * Change the error to Sev40 * Fix formatting error	2026-01-09 10:30:17 -08:00
Michael Stack	a5ab18f449	Fix CycleBadRead when BackupS3BlobCorrectness.toml -s 1157546047 -b off (#12627 ) Take a lock while backup and restore are running so we can leave cycle running while backup and restore in operation	2026-01-08 12:23:45 -08:00
Michael Stack	dde3488bcf	s3client_test ctest failed in nightly. Add retry on non-recursive listing (to match retry on recursive listing) (#12617 ) * Retry because mocks3 update is not immediate * Undo disown watchdog on shutdown. Do a couple of kill strategies instead. (Fixes s3_backup_test hang on cleanup after PASSED seen in PR build). * Remove useless comment * Add -a on grep .. could be binary in fdbserver output	2026-01-08 12:04:59 -08:00
Jingyu Zhou	771dec1278	Update mailmap for some authors (#12622 )	2026-01-07 16:54:35 -08:00
Michael Stack	aa35d6cc29	Add restore validation feature: restores to special keyspace allowing validating backup/restore in single cluster (space willing) (#12573 ) * Add restore validation feature with simplified backup gap fix Implements restore validation using audit_storage to verify backup/restore correctness. Includes a minimal fix for the backup gap bug. Key components: - ValidateRestore audit type: compares source keys against restored keys at \xff\x02/rlog/ prefix in storage server - DD audit fixes: propagate validation errors, handle DD failover correctly - RestoreValidation and BackupAndRestoreValidation workloads for testing - Simplified backup gap fix: prevent snapshot from finishing in the same iteration it dispatches the last tasks (single flag + one check)	2026-01-07 15:23:02 -08:00
VXTLS	31d7eadd52	Make `FDB_USE_CSHARP_TOOLS` authoritative and consistently honored across the build (#12615 ) * Make FDB_USE_CSHARP_TOOLS authoritative across the build Historically, FDB_USE_CSHARP_TOOLS acted as a preference hint, and parts of the build could still probe for or assume the presence of C# tooling even when it was disabled. This change makes the option authoritative and consistently honored across the build system. C# tooling is now used only when explicitly enabled and available, and all downstream assumptions are gated accordingly. The default configuration and tool preference order remain unchanged. * cmake files changes * WIP: tmp test * Honor reviewer feedback on C# toolchain detection and actor comparison - Stop assuming C# tooling availability on Windows; explicitly probe for .NET using find_program. - Prefer .NET over mono on all platforms, with mono used only as a fallback. - Fail explicitly when FDB_USE_CSHARP_TOOLS=ON but no C# toolchain is found. - Preserve Python/C# actor output comparison when C# tooling is available, skipping it only when C# is explicitly disabled or unavailable. - Simplify Python argument parsing and remove unnecessary textwrap usage.	2026-01-06 20:55:33 -08:00
Syed Paymaan Raza	2f91f6338c	Remove unused DummyWorkload (#12624 ) DummyWorkload is not referenced by any test files or other code.	2026-01-06 20:20:59 -08:00
Jingyu Zhou	dfbde65a14	Remove blob failure injections (#12620 ) * Remove blob failure injections Follow-up for the cleanup done at #12435. These functions are unused now. * Fix an assertion failure in simulation sim2 has "ASSERT(seconds >= -0.0001);" in delay() function, which was triggering from the tlog code. Reproduction: -f ./tests/fast/SidebandSingle.toml -s 3567205446 -b on	2026-01-06 16:09:11 -08:00
Copilot	c797e35cd0	Add .mailmap for contributor identity mapping (#12602 ) * Initial plan * Add .mailmap with Mohamad Gebai entry Co-authored-by: saintstack <48398+saintstack@users.noreply.github.com> * Clarify .mailmap format with improved comments Co-authored-by: saintstack <48398+saintstack@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: saintstack <48398+saintstack@users.noreply.github.com>	2026-01-06 13:06:30 -08:00
Akanksha Mahajan	e8a1cfb8b8	Add FileLevelEncryption field to BackupDescription JSON output (#12619 )	2026-01-06 12:48:02 -08:00
Alexey Pavlenko	7bb558ddb0	Increase mode_bytes for the StreamingMode.WANT_ALL (#12616 )	2026-01-05 20:55:46 -08:00
Michael Stack	a2bb2e39c1	Run BulkDumpingS3 tests with 3 replicas instead of 1 (#12609 ) Co-authored-by: stack <stack@duboce.com>	2026-01-05 14:56:11 -08:00
Xiaoge Su	e8732a8ccd	Do not force using pypi.python.org when installing sphinx In certain situations, e.g., an internal build of FoundationDB that does not allow access to external websites, forcing https://pypi.python.org/simple prevents the use of other PyPI indexes. This patch removes this hard requirement.	2026-01-05 13:56:07 -08:00
daleiz	401032d042	fix: add virtual dtor for RequestBase (#12614 )	2026-01-05 13:25:36 -08:00
daleiz	13f784b554	Fix misaligned access UB in Endpoint class (#12613 )	2026-01-02 20:38:16 -08:00
Xiaoge Su	f22ed30b97	Use Findbenchmark instead of hardcoding the google-benchmark path in … (#12612 ) * Use Findbenchmark instead of hardcoding the google-benchmark path in flowbench * fixup! Add missing include directories * fixup! Remove extra PRIVATE token * target_link_libraries -> target_include_directories * Support downloading google-benchmark if not found locally This may need to be reconsidered, as it seems more reasonable that the developer prepares the library rather than letting CMake take responsibility for monitoring the availability of an external project. * Remove extra creation of flowbench target	2026-01-02 13:55:51 -08:00
Michael Stack	6e8be999cc	XDB-432-7.4 flexible keys parsing for backup cli (#12605 ) Author: Mark Shabanov <mshabanov@openintegration.inc> Signed-off-by: dlambrig Signed-off-by: sbodagala	2025-12-22 14:55:05 -08:00
neethuhaneesha	0d21f35e70	Tx mutations, LogProtocolMessage, SpanContextMessage need to be skipped before converting them to mutations (#12575 )	2025-12-21 12:39:47 -08:00
neethuhaneesha	afca911361	Backup worker using proxy from command line to upload to S3. (#12565 )	2025-12-19 10:37:28 -08:00
Jingyu Zhou	549d324f13	Remove verbose actorcompiler output (#12600 ) To keep the compiling output clean after PR #12559.	2025-12-18 21:04:30 -08:00
Michael Stack	4772e6f1e6	Exclude single_process_fdbcli_tests (as we do single_process_external_client_fdbcli_tests); it fails with ASAN enabled (#12601 ) Signed-off-by: jzhou77	2025-12-18 11:43:39 -08:00
Jingyu Zhou	271851906e	Re-recruit log routers after failures to avoid recoveries (#12558 ) * Re-recruit log routers after failures Log routers are stateless roles that can reconstruct their state after a crash. This is an attempt to avoid triggering recovery if one of the log routers crashes. To simplify the work, only the current generations of log routers are monitored and re-recruited after crashes. Previous generations of log routers are not handled in this change, as they are short lived and purged after recovery reaches the fully_recovered state. * Monitor log routers after full recovery I.e., monitorAndRecruitLogRouters() waits for full recovery. * Some cleanup * Add WorkerCache for log routers To avoid duplicated log routers running, though only one will be used (but it's confusing when debugging). 20251114-230514-jzhou-59d6afe1e475c495 * Fix monitoring to happen after full recovery 20251115-041144-jzhou-4278fe608ed051b7 * Keep monitoring log routers before recovery completion 20251115-050228-jzhou-7c37cfb1d6e36ced * monitorAndRecruitLogRouters detects recoveries 20251115-205132-jzhou-4d50f7c5914e883a * Make monitorAndRecruitLogRouters long running 20251115-210426-jzhou-e63a8fcf26a76c81 * Recruit failed log routers in parallel 20251115-222239-jzhou-9eb2287e12c93f7a * Fix replaced log router's begin version Use the TLog's reply.popped version as its start version. 
20251116-025329-jzhou-51a4def306038241 * clang-format fix * Address review comments 20251120-221940-jzhou-28d51f8a0400377e compressed=True data_size=37463028 duration=8246235 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:51:39 sanity=False started=100000 stopped=20251120-231119 submitted=20251120-221940 timeout=5400 username=jzhou * Add exponential backoff for log router re-recruitment 20251121-041700-jzhou-7126a109c1e39c76 * Fix crashes 20251121-043947-jzhou-6f373c1c64faa1b2 * Disable a verbose event * Add CC_RERECRUIT_LOG_ROUTER_ENABLED to control this feature 20251215-220843-jzhou-70dad477b39640e9	2025-12-17 15:42:49 -08:00
Trevor Clinkenbeard	170013d69b	Support tracking read latency metrics per read type (#12586 )	2025-12-17 15:02:28 -08:00
Dan Lambright	2d52509099	Reserve vector capacity for tempTagMessages in TLog commit path (#12571 ) * Reserve Vector Capacity for tempTagMessages * Add knob ENABLE_TLOG_TEMP_TAG_MESSAGES_RESERVE --------- Co-authored-by: Dan Lambright <hlambright@apple.com>	2025-12-17 13:50:30 -08:00
Hendrik Hofstadt	1c00763a55	Port actorcompiler to python (#12559 ) * Port actorcompiler to python * Address review and restore C# compiler	2025-12-17 13:33:07 -08:00
Jingyu Zhou	1c2a1dd653	Force simulator to have a cap on satellite logs (#12597 ) * Force simulator to have a cap on satellite logs If not, ChangeConfig workload or simulation may choose a number higher than the available machines. As a result, the recruitment will fail, blocking recovery from finishing, thus making the database unavailable. To reproduce: Seed: -f ./tests/fast/LocalRatekeeper.toml -s 1185956409 -b on Branch: main Commit ID: `280b10fa49` 500k 20251216-233658-jzhou-66053213858cc41d * Address review comments.	2025-12-17 12:22:07 -08:00
Michael Stack	280b10fa49	Add the watchdog cleanup added to other scripts to bulkload test too (#12595 ) * Add the watchdog cleanup added to other scripts to bulkload test too * Add delay before listing files * Wait until the listing is complete rather than hard-coded time (Reviewer suggestion) Signed-off-by: gxglass	2025-12-15 15:46:21 -08:00
Martynas Jurkus	224e0daa8f	Go binding: add GetMainThreadBusyness method to Database (#12594 )	2025-12-15 08:36:30 -08:00
gxglass	df96a141f5	Add back C library implementations for fdb_database_open_tenant and 3 other methods (#12593 ) This allows 7.x python libraries to load these methods. It does this on startup to set up python/C API stuff, regardless of whether the API user actually invokes this functionality (which was experimental and is now removed). More details: rdar://166307379 Testing: ctest -R c_api ctest -R python 20251211-224723-gglass-17ed16f020aa18de compressed=True data_size=35311862 duration=5560013 ended=100000 fail_fast=1000 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:01:01 sanity=False started=100000 stopped=20251211-234824 submitted=20251211-224723 timeout=5400 username=gglass	2025-12-12 09:34:27 -08:00
daleiz	a34f5a46a7	Improve compiler flag for ARM64 (#12589 ) Replace -march=armv8.2-a+crc+simd with -march=armv8.2-a+lse+crc since SIMD (NEON) is already mandatory in ARMv8, and LSE (Large System Extensions) is more important, which is supported on Graviton2 and later.	2025-12-10 12:36:41 -08:00
Michael Stack	f8f90c96f0	Add logging around cleanup and add a watchdog to kill regardless (#12587 ) * Add logging around cleanup and add a watchdog to kill regardless after 30 seconds * Address review comments Signed-off-by: gxglass	2025-12-09 20:09:47 -08:00
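
Note on commit 7802b83882 above: the overcount it describes can be reproduced in a few lines. This is only an illustrative sketch, not the project's test code. The helper `count_listed` and the variable names are hypothetical, and the sample lines are copied from the commit message. It shows why matching a path against stdout merged with stderr counts an HTTP log line as an extra listing entry, while matching against stdout alone does not.

```python
# Hypothetical helper for illustration; not taken from the FoundationDB tests.
def count_listed(output: str, path: str) -> int:
    """Count the lines of `output` that mention `path`."""
    return sum(1 for line in output.splitlines() if path in line)


# Listing lines as they would appear on stdout (from the commit message).
stdout = "\n".join([
    "s3client/ls_test/sub1/file2_1 26.00 B",
    "s3client/ls_test/sub1/file2_2 26.00 B",
])

# An HTTP log line as it would appear on stderr (from the commit message).
stderr = ("[4a60d5628a4137592fe32d5a5b949bb8] HTTP starting GET "
          "/test-bucket/s3client/ls_test/sub1/file2_2?tagging=")

target = "s3client/ls_test/sub1/file2_2"

# Merging the streams lets the HTTP log line match the same pattern.
mixed_count = count_listed(stdout + "\n" + stderr, target)

# Looking at stdout only counts the listing entry once.
stdout_count = count_listed(stdout, target)

print(mixed_count, stdout_count)
```

The fix in the commit corresponds to the second call: only the listing stream is inspected, so log lines cannot inflate the count.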


28256 Commits