Commit Graph

326 Commits

Author SHA1 Message Date
Syed Paymaan Raza
aaba814512 Fix two cases of non-determinism in simulation (#11766) 2024-11-08 14:38:15 -08:00
Vishesh Yadav
42f5e84306 Log all incoming connections 2024-10-09 11:09:50 -07:00
Syed Paymaan Raza
c3e7542cda Update end year in copyright header 2024-08-02 09:40:11 -07:00
Sreenath Bodagala
d7eb028b2a Enable replica consistency check on data movement (#11415)
* - Enable replica consistency check on data movement (and, randomly, on
all reads)

* - Address PR review comments
2024-06-17 17:07:32 -04:00
Sreenath Bodagala
df2b7b4fe8 - Address PR review comments 2024-05-01 19:41:00 +00:00
Sreenath Bodagala
d6f6b45125 - Handle errors thrown during replica consistency check 2024-04-30 21:37:50 +00:00
Sreenath Bodagala
bd68263558 - Disable replica consistency check related knob 2024-04-22 21:53:28 +00:00
Sreenath Bodagala
a4430b9169 Compare storage replicas on reads (#11235)
* - Compare storage replicas on reads (in "loadBalance()")

* - Do consistency check on reads in loadbalance

* - Do replica consistency check in the case where loadBalance issues
requests to multiple storage servers

* - Address a state variable related bug

* - Code formatting

* - API simplification

* - Simplify code

* - Code formatting

* - Address a review comment
2024-04-11 16:08:54 -04:00
Hao Fu
8555ac9b71 Implement checksum via LRU-like approach to save space (#11194) 2024-02-21 12:24:51 +08:00
Johannes M. Scheuermann
0370cc08e1 Add knob to allow fdbserver to abort under abnormal behaviour 2024-02-14 10:15:14 +01:00
Dimitris Apostolou
a88114c222 Fix typos 2024-02-07 01:16:00 +02:00
Jingyu Zhou
7e54174725 Add a knob RESOLVE_PREFER_IPV4_ADDR to prefer IPv4 addresses
The default is to prefer IPv6 addresses.
2023-08-23 14:54:07 -07:00
Yi Wu
e8d3e926b5 Merge REST_KMS_RESTCLIENT knobs with RESTCLIENT knobs 2023-07-17 20:06:02 -07:00
Nim Wijetunga
7f2260bbd2 Add Encryption Related Latency Metrics (#10596)
* add ss and cp latency metrics

* make changes
2023-07-14 11:30:16 -07:00
Evan Tschannen
eb772c0043 added a blob worker specific page cache size for redwood so that it does not have to be changed manually in fdb.conf for all blob worker processes 2023-06-13 10:35:13 -07:00
Yi Wu
7048ad21a8 EaR: reduce metrics logging (#10453)
* EaR: reduce metrics logging

BlobCipherMetrics used to break down by usage type (whether it is for tlog, redwood, backup, etc.), and these counters were printed to the trace log even when encryption was not enabled, or when the specific usage was not happening on a node (e.g. a node with only stateless roles would also print blob cipher counters for redwood). We are reducing BlobCipherMetrics logging by:
1. Defaulting to not breaking down the metrics by usage type; the behavior is controlled by the knob `ENCRYPT_KEY_CACHE_ENABLE_DETAIL_LOGGING`.
2. Lazily initializing the counters when the detailed breakdown is enabled.
3. Not logging counters whose count is 0, even if they are initialized (for example, if a node was recruited as tlog but later drops the tlog role, the tlog counter inside BlobCipherMetrics will no longer be logged).

* buggify BlobCipherMetrics detail logging knob

* format
2023-06-09 12:07:49 -07:00
Nim Wijetunga
95bf14323f EKP and KMS Health Check (#10341)
EKP and KMS Health Check
2023-06-01 16:24:04 -07:00
Josh Slocum
a4dffa087a Adding Simulated HTTP Server and refactoring HTTP code (#10112)
* Adding Simulated HTTP Server and refactoring HTTP code

* fixing formatting

* fixing merge conflicts

* fixing more merge conflicts

* code review feedback

* changing reference counted interface

* more fixes

* fixing ide build i guess
2023-05-05 12:19:17 -05:00
Junhyun Shim
e2df6e3302 Wipe packet buffers that held serialized WipedString (#10018)
* Extend WipedString guarantees to serialized packets

* Apply review suggestions
2023-04-20 16:38:55 +02:00
Ata E Husain Bohra
3f6fcada45 EaR - Misc fixes found using end-to-end integration testing (#9806)
* EaR - Misc fixes found using end-to-end integration testing

Description

Major changes proposed include:
1. RESTClient filtering of trailing `/` characters from the
input URI resource path
2. Avoid EKP exponential backoff, given that RESTClient supports
exponential backoff retries for all retryable errors.
3. Memory allocation optimizations:
 3.1. BaseCipher key management using Standalone semantics
 in KMSConnector interface endpoints
 3.2. Optimize memcpy while looking up encryption keys in EKP endpoints
4. Avoid delay while starting EKP, given its criticality during
cluster recovery.
5. Update BlobCipher to handle variable size BaseCipher buffer
6. Improved logging

Testing

Setup:
1. External KMS server to supply encryption keys (inhouse)
2. Create cluster with: cluster_aware & domain_aware config

* Fix EncryptionOps test

Description

Testing

* EaR - Misc fixes found using end-to-end integration testing

Description

Major changes:
1. Clean up EKP-driven exponential backoff files.
2. Update EKP not to use #1.

Testing

* EaR - Misc fixes found using end-to-end integration testing

Description

Address review comments

Testing

* Fix AES 256 key length value

Description

Testing

* Address review comments

Description

Testing
2023-03-30 22:22:26 -07:00
Jay Zhuang
0efd403e59 Add inplace encryption/decryption API 2023-03-23 15:26:22 -07:00
Ata E Husain Bohra
d0eec9d0ba EaR: REST KMS fixes - encryption integration testing (#9598)
* EaR: REST KMS fixes - encryption integration testing

Description

Major changes:
1. Multiple fixes for issues observed while performing end-to-end
integration testing of the Encryption at-rest feature.
2. Improve REST module logging. Introduced FLOW_KNOBS->REST_LOG_LEVEL
to have more granular control of feature logging disconnected from
the cluster log level.

Testing

Integration testbed:
1. Run fdbserver standalone
2. Run external KMS http-server to serve encryption key fetch requests
2023-03-08 09:49:43 -08:00
Nim Wijetunga
57ff58fd1a EKP Retry Loop on KMS Connection Failures (#9524)
EKP Retry Loop
2023-03-03 09:41:20 -08:00
Junhyun Shim
b811881f41 Allow unthrottled, unsuppressed traces for security-related events (#9459)
* Define API for unsuppressable TraceEvent types

Add trace checking tests for authz trace events

* Revert temporary configurations used for debugging

* Simplify/Modernize flow audit logging API

- Do event type whitelist checks at compile time
- Use ""_audit literal API instead of a tag struct
- Replace int with a lightweight struct for tracking/modifying TraceEvent enablement

* Revert installing signal handler for SIGTERM and refactor test script

Move trace checker to local_cluster.py

* Lengthen public key refresh interval and add more audited events

* Try and make MSVC and Mac build happy

* consteval > constexpr

'inline consteval' still causes link errors in Mac builds
2023-02-27 21:51:13 +01:00
Junhyun Shim
1afd63d7e3 Minimize the risk of TracedTooManyLines in simulation
- Disable audit logging for simulation
- Relax the max_trace_lines knob limit to reduce false positives
2023-02-06 21:50:39 +01:00
Yi Wu
17fdbc46a5 EaR: Add page checksum to Redwood pages in no-auth mode (#8965)
Previously, with EaR we always enabled authentication (e.g. we encrypt Redwood pages). The authentication is a form of checksum, so a dedicated page checksum was not needed. This PR adds back the xxhash page checksum when authentication is disabled. It also changes the knob to disable authentication by default.
2023-01-03 10:30:07 -08:00
Kevin Hoxha
a05649c620 metrics: Add knob to control emission of DDSketch buckets 2022-12-14 14:33:39 -08:00
Kevin Hoxha
3cea754ba3 metrics: Add OTEL metric definitions 2022-12-08 10:07:11 -08:00
Kevin Hoxha
5a9d3343cc metrics: Add IMetricClient and StatsdMetric to send batches over UDP 2022-12-08 10:07:11 -08:00
Kevin Hoxha
f3431fe1e7 metrics: Add MetricsLogger loop and more knobs 2022-12-08 10:07:11 -08:00
Kevin Hoxha
fe73576cc7 metrics: Add knobs and make Counter, LatencySample implement flush() method 2022-12-08 10:07:11 -08:00
Marian Dvorsky
085fce8478 Distributed tracing related improvements (#8942)
Several fixes/improvements related to distributed traces.

Remove "key" attributes and the TRACING_SPAN_ATTRIBUTES_ENABLED knob: we almost never want to log actual keys (as they can contain private data), however, we do want to use other span attributes.
In Transaction::setTransactionID, properly propagate spanContext flags, and set all copies of spancontext
2022-12-07 17:58:52 +01:00
A.J. Beamon
3e9b6ce937 Don't disallow a seed value of 0 2022-12-02 15:32:34 -08:00
Lukas Joswiak
7d73d52a91 Enable tracing in simulation
This will help test the flow of span IDs from the client all the way to
the storage servers.
2022-11-21 09:03:22 -08:00
Sam Gwydir
34b8c5eb2b ENCRYPT_KEY_CACHE_LOGGING_SAMPLE_SIZE -> ENCRYPT_KEY_CACHE_LOGGING_SKETCH_ACCURACY 2022-11-14 10:47:45 -08:00
Sam Gwydir
23706c957b Use DDSketch for Sample Data. 2022-11-12 13:45:46 -08:00
Steve Atherton
d169875423 Add a knob for whether to allow guard pages in memory allocations done via mmapInternal(). The knob defaults to false. 2022-10-16 19:55:07 -07:00
Markus Pilman
c143f1db33 Merge pull request #8455 from sfc-gh-mpilman/features/token-audit-logging
Audit all AuthZ token usages
2022-10-12 14:22:55 -06:00
Kevin Hoxha
ff1b2df8f6 fdbcli: Add options for knob management
- setknob <knob_name> <knob_value> [config_class]
- getknob <knob_name> [config_class]
- Added new option to begin to specify if it's a configuration txn. Syntax is begin [config-txn]
- Added utility function for converting tuples to string
- Added knobmanagment test in fdbcli_tests.py
2022-10-11 15:32:01 -07:00
Markus Pilman
5239c491c4 Audit all AuthZ token usages 2022-10-11 14:34:10 -06:00
Marian Dvorsky
c6c449d047 Extract TaskQueue out of Net2 and reuse it in sim2 (#8330)
* Extract TaskQueue out of Net2 and reuse it in sim2

* empty commit

* Address review comments

* Introduce MAX_RUNLOOP_SLEEP_DELAY

* Apply clang-format
2022-10-10 12:46:06 -07:00
sfc-gh-tclinkenbeard
fec791be62 Merge remote-tracking branch 'origin/main' into split-failure-injection-workloads 2022-09-30 18:00:48 -07:00
Ata E Husain Bohra
03f1d13be3 Enable encryption authentication configurability (#8312)
* Enable encryption authentication configurability

Description

 diff-1: Remove memcpy due to auth-token computation
         Address review comments

Patch proposes major changes:
1. Enable FDB to choose encryption authentication as a configurable
parameter. Fix issues choosing ENCRYPT_HEADER_AUTH_TOKEN_NONE mode.
2. Introduce AES_CMAC as supported encryption authentication scheme.

Patch allows the cluster to govern whether encryption authentication needs to
be enabled and, if so, to choose from two supported schemes:
1. HMAC_SHA_256
2. AES_256_CMAC

Testing

devRunCorrectness - 100K
BlobCipher unittests
EncryptionOps.toml
BlobGranuleCorrectness/BlobGranuleCorrectnessClean
2022-09-29 16:18:55 -07:00
sfc-gh-tclinkenbeard
2434f18c5c Enable failure injection for all simulation tests 2022-09-29 14:14:24 -07:00
sfc-gh-tclinkenbeard
5e71f365fb Set ENABLE_SIMULATION_IMPROVEMENTS to true 2022-09-29 12:40:09 -07:00
Markus Pilman
e1627e0a78 Merge remote-tracking branch 'origin/main' into features/always-inject-faults 2022-09-19 09:38:55 -06:00
Steve Atherton
2bf90ca5ec Change KAIO latency metrics to use LatencySample for easier usability. Rename a SQLite-specific knob to indicate it is specific to SQLite. 2022-09-15 13:27:23 -07:00
Markus Pilman
2d1b58d020 Update flow/Knobs.cpp
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-09-15 11:54:45 -06:00
Markus Pilman
acd24d6c81 Merge remote-tracking branch 'origin/main' into features/always-inject-faults 2022-09-12 16:44:16 -06:00
Yi Wu
d831c87d14 Add encryption metrics (#8070)
Adding the following metrics:
* BlobCipherKeyCache hit/miss
* EKP: KMS requests latencies
* Each component that uses encryption now needs to pass a UsageType enum to the encryption helper methods (GetEncryptCipherKeys/GetLatestEncryptCipherKey/encrypt/decrypt), and those methods log get-cipher-key latency samples and encryption/decryption CPU times accordingly.
2022-09-09 18:43:09 -07:00