Commit Graph

49 Commits

Author SHA1 Message Date
Edward Thomson
c3512fe6d2 Merge branch 'main' into multi-pack-index-odb-write 2021-08-29 21:35:40 -04:00
lhchavez
ea285904dc midx: Introduce git_odb_write_multi_pack_index()
This change introduces git_odb_write_multi_pack_index(), which creates a
`multi-pack-index` file from all the `.pack` files that have been loaded
in the ODB.

Fixes: #5399
2021-08-27 04:12:16 -07:00
lhchavez
9d117e3857 midx: Add a way to write multi-pack-index files
This change adds the git_midx_writer_* functions to allow to
write and create `multi-pack-index` files from `.idx`/`.pack` files.

Part of: #5399
2021-08-27 04:10:37 -07:00
lhchavez
fff209c400 midx: Add a way to write multi-pack-index files
This change adds the git_midx_writer_* functions to allow to
write and create `multi-pack-index` files from `.idx`/`.pack` files.

Part of: #5399
2021-07-26 18:49:18 -07:00
lhchavez
d50d3db64a midx: Fix a bug in git_midx_needs_refresh()
The very last check in the freshness check for the midx was wrong ><
This was also because this function was not tested.
2021-01-07 06:37:11 -08:00
Edward Thomson
0450e3134b tests: ifdef out unused function in no-thread builds 2020-12-05 22:07:49 +00:00
lhchavez
322c15ee85 Make the pack and mwindow implementations data-race-free
This change fixes a packfile heap corruption that can happen when
interacting with multiple packfiles concurrently across multiple
threads. This is exacerbated by setting a lower mwindow open file limit.

This change:

* Renames most of the internal methods in pack.c to clearly indicate
  that they expect to be called with a certain lock held, making
  reasoning about the state of locks a bit easier.
* Splits the `git_pack_file` lock in two: the one in `git_pack_file`
  only protects the `index_map`. The protection to `git_mwindow_file` is
  now in that struct.
* Explicitly checks for freshness of the `git_pack_file` in
  `git_packfile_unpack_header`: this allows the mwindow implementation
  to close files whenever there is enough cache pressure, and
  `git_packfile_unpack_header` will reopen the packfile if needed.
* After a call to `p_munmap()`, the `data` and `len` fields are poisoned
  with `NULL` to make use-after-frees more evident and crash rather than
  being open to the possibility of heap corruption.
* Adds a test case to prevent this from regressing in the future.

Fixes: #5591
2020-11-28 19:40:56 -08:00
lhchavez
f847fa7b34 midx: Support multi-pack-index files in odb_pack.c
This change adds support for reading multi-pack-index files from the
packfile odb backend. This also makes git_pack_file objects open their
backing failes lazily in more scenarios, since the multi-pack-index can
avoid having to open them in some cases (yay!).

This change also refreshes the documentation found in src/odb_pack.c to
match the updated code.

Part of: #5399
2020-11-27 11:40:02 +00:00
Edward Thomson
e316b0d3d6 runtime: move init/shutdown into the "runtime"
Provide a mechanism for system components to register for initialization
and shutdown of the libgit2 runtime.
2020-10-11 20:13:04 +01:00
lhchavez
005e77157d multipack: Introduce a parser for multi-pack-index files
This change is the first in a series to add support for git's
multi-pack-index. This should speed up large repositories significantly.

Part of: #5399
2020-10-05 05:08:38 -07:00
lhchavez
92d42eb3d8 Minor nits and style formatting 2020-07-12 09:53:10 -07:00
lhchavez
d88994da60 Create the repository within the test
This change moves the humongous static repository with 1025 commits into
the test file, now with a more modest 16 commits.
2020-06-27 07:38:02 -07:00
lhchavez
eab2b0448e Review feedback
* Change the default of the file limit to 0 (unlimited).
* Changed the heuristic to close files to be the file that contains the
  least-recently-used window such that the window is the
  most-recently-used in the file, and the file does not have in-use
  windows.
* Parameterized the filelimit test to check for a limit of 1 and 100
  open windows.
2020-06-26 16:45:25 -07:00
lhchavez
9679df5736 mwindow: set limit on number of open files
There are some cases in which repositories accrue a large number of
packfiles. The existing mwindow limit applies only to the total size of
mmap'd files, not on their number. This leads to a situation in which
having lots of small packfiles could exhaust the allowed number of open
files, particularly on macOS, where the default ulimit is very low
(256).

This change adds a new configuration parameter
(GIT_OPT_SET_MWINDOW_FILE_LIMIT) that sets the maximum number of open
packfiles, with a default of 128. This is low enough so that even macOS
users should not hit it during normal use.

Based on PR #5386, originally written by @josharian.

Fixes: #2758
2020-06-21 07:15:26 -07:00
Josh Triplett
5278a00611 git_packbuilder_write: Allow setting path to NULL to use the default path
If given a NULL path, write to the object path of the repository.

Add tests for the new behavior.
2020-05-23 16:07:54 -07:00
Patrick Steinhardt
2dc7b5ef0e tests: pack: add missing asserts around git_packbuilder_write 2020-01-09 12:22:28 +01:00
Patrick Steinhardt
e54343a402 fileops: rename to "futils.h" to match function signatures
Our file utils functions all have a "futils" prefix, e.g.
`git_futils_touch`. One would thus naturally guess that their
definitions and implementation would live in files "futils.h" and
"futils.c", respectively, but in fact they live in "fileops.h".

Rename the files to match expectations.
2019-07-20 19:11:20 +02:00
Edward Thomson
a1ef995dc0 indexer: use git_indexer_progress throughout
Update internal usage of `git_transfer_progress` to
`git_indexer_progreses`.
2019-02-22 11:25:14 +00:00
Patrick Steinhardt
18cf56989a maps: provide high-level iteration interface
Currently, our headers need to leak some implementation details of maps due to
their direct use of indices in the implementation of their foreach macros. This
makes it impossible to completely hide the map structures away, and also makes
it impossible to include the khash implementation header in the C files of the
respective map only.

This is now being fixed by providing a high-level iteration interface
`map_iterate`, which takes as inputs the map that shall be iterated over, an
iterator as well as the locations where keys and values shall be put into. For
simplicity's sake, the iterator is a simple `size_t` that shall initialized to
`0` on the first call. All existing foreach macros are then adjusted to make use
of this new function.
2019-02-15 13:16:48 +01:00
Patrick Steinhardt
7e926ef306 maps: provide a uniform entry count interface
There currently exist two different function names for getting the entry count
of maps, where offmaps offset and string maps use `num_entries` and OID maps use
`size`. In most programming languages with built-in map types, this is simply
called `size`, which is also shorter to type. Thus, this commit renames the
other two functions `num_entries` to match the common way and adjusts all
callers.
2019-02-15 13:16:48 +01:00
Dhruva Krishnamurthy
004a339874 Allow bypassing check '.keep' files using libgit2 option 'GIT_OPT_IGNORE_PACK_KEEP_FILE_CHECK' 2019-02-02 07:56:03 -08:00
Edward Thomson
f673e232af git_error: use new names in internal APIs and usage
Move to the `git_error` name in the internal API for error-related
functions.
2019-01-22 22:30:35 +00:00
Edward Thomson
168fe39bea object_type: use new enumeration names
Use the new object_type enumeration names within the codebase.
2018-12-01 11:54:57 +00:00
Patrick Steinhardt
852bc9f496 khash: remove intricate knowledge of khash types
Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types,
simply use `size_t` instead. This decouples code from the khash stuff
and makes it possible to move the khash includes into the implementation
files.
2018-11-28 15:22:27 +01:00
Patrick Steinhardt
65a4b06eb1 tests: indexer: add test to exercise our connectivity checking
The new connectivity tests are not currently being verified at all due
to being turned off by default. Create two test cases for a pack file
which fails our checks and one which suceeds.
2018-06-22 09:52:12 +02:00
Patrick Steinhardt
c16556aadd indexer: introduce options struct to git_indexer_new
We strive to keep an options structure to many functions to be able to
extend options in the future without breaking the API. `git_indexer_new`
doesn't have one right now, but we want to be able to add an option
for enabling strict packfile verification.

Add a new `git_indexer_options` structure and adjust callers to use
that.
2018-06-22 09:52:12 +02:00
Patrick Steinhardt
ecf4f33a4e Convert usage of git_buf_free to new git_buf_dispose 2018-06-10 19:34:37 +02:00
lhchavez
c3514b0b82 Fix unpack double free
If an element has been cached, but then the call to
packfile_unpack_compressed() fails, the very next thing that happens is
that its data is freed and then the element is not removed from the
cache, which frees the data again.

This change sets obj->data to NULL to avoid the double-free. It also
stops trying to resolve deltas after two continuous failed rounds of
resolution, and adds a test for this.
2017-12-23 14:59:07 +00:00
lhchavez
c8aaba2441 libFuzzer: Fix missing trailer crash
This change fixes an invalid memory access when the trailer is missing /
corrupt.

Found using libFuzzer.
2017-12-08 14:37:46 +00:00
lhchavez
400caed3e0 libFuzzer: Fix a git_packfile_stream leak
This change ensures that the git_packfile_stream object in
git_indexer_append() does not leak when the stream has errors.

Found using libFuzzer.
2017-12-06 04:32:16 +00:00
Edward Thomson
6f960b553b Merge pull request #4088 from chescock/packfile-name-using-complete-hash
Ensure packfiles with different contents have different names
2017-06-11 10:37:46 +01:00
Patrick Steinhardt
6c23704df5 settings: rename GIT_OPT_ENABLE_SYNCHRONOUS_OBJECT_CREATION
Initially, the setting has been solely used to enable the use of
`fsync()` when creating objects. Since then, the use has been extended
to also cover references and index files. As the option is not yet part
of any release, we can still correct this by renaming the option to
something more sensible, indicating not only correlation to objects.

This commit renames the option to `GIT_OPT_ENABLE_FSYNC_GITDIR`. We also
move the variable from the object to repository source code.
2017-06-08 21:40:18 +02:00
Chris Hescock
c0e5415566 indexer: name pack files after trailer hash
Upstream git.git has changed the way how packfiles are named.
Previously, they were using a hash of the contained object's OIDs, which
has then been changed to use the hash of the complete packfile instead.
See 1190a1acf (pack-objects: name pack files after trailer hash,
2013-12-05) in the git.git repository for more information on this
change.

This commit changes our logic to match the behavior of core git.
2017-05-19 10:10:54 -04:00
Edward Thomson
1c04a96b25 Honor core.fsyncObjectFiles 2017-03-02 09:11:33 +00:00
Edward Thomson
3ac05d1149 win32: don't fsync parent directories on Windows
Windows doesn't support it.
2017-02-28 13:29:01 +00:00
Edward Thomson
2a5ad7d0f2 fsync: call it "synchronous" object writing
Rename `GIT_OPT_ENABLE_SYNCHRONIZED_OBJECT_CREATION` ->
`GIT_OPT_ENABLE_SYNCHRONOUS_OBJECT_CREATION`.
2017-02-28 13:29:01 +00:00
Edward Thomson
1229e1c4d7 fsync parent directories when fsyncing
When fsync'ing files, fsync the parent directory in the case where we
rename a file into place, or create a new file, to ensure that the
directory entry is flushed correctly.
2017-02-28 13:28:36 +00:00
Edward Thomson
1c2c0ae2a4 packbuilder: honor git_object__synchronized_writing
Honor `git_object__synchronized_writing` when creating a packfile and
corresponding index.
2017-02-28 13:27:50 +00:00
lhchavez
f5586f5c73 Addressed review feedback 2017-01-14 16:37:00 +00:00
lhchavez
a7ff6e5e5e Fix the memory leak 2017-01-03 18:24:51 -08:00
lhchavez
def644e48a Add a test 2017-01-01 17:35:29 -08:00
Jacques Germishuys
0478b7f472 Silence unused return value warning 2014-09-26 12:12:09 +02:00
Ciro Santilli
3b2cb2c91e Factor 40 and 41 constants from source. 2014-09-16 13:07:04 +02:00
Edward Thomson
0cee70ebb7 Introduce cl_assert_equal_oid 2014-07-01 14:40:16 -04:00
Carlos Martín Nieto
b3b66c5793 Share packs across repository instances
Opening the same repository multiple times will currently open the same
file multiple times, as well as map the same region of the file multiple
times. This is not necessary, as the packfile data is immutable.

Instead of opening and closing packfiles directly, introduce an
indirection and allocate packfiles globally. This does mean locking on
each packfile open, but we already use this lock for the global mwindow
list so it doesn't introduce a new contention point.
2014-06-23 21:50:36 +02:00
Philip Kelley
d7a294633d Fix a bug in the pack::packbuilder suite 2014-05-17 16:58:09 -04:00
Russell Belfer
7697e54176 Test cancel from indexer progress callback
This adds tests that try canceling an indexer operation from
within the progress callback.

After writing the tests, I wanted to run this under valgrind and
had a number of errors in that situation because mmap wasn't
working.  I added a CMake option to force emulation of mmap and
consolidated the Amiga-specific code into that new place (so we
don't actually need separate Amiga code now, just have to turn on
-DNO_MMAP).

Additionally, I made the indexer code propagate error codes more
reliably than it used to.
2013-12-11 15:02:20 -08:00
Ben Straub
83e1efbf46 Update files that reference tests-clar 2013-11-14 14:10:32 -08:00
Ben Straub
1782038144 Rename tests-clar to tests 2013-11-14 14:05:52 -08:00