plocate

mirror of http://git.sesse.net/plocate synced 2025-10-06 03:32:43 +02:00

Author	SHA1	Message	Date
Steinar H. Gunderson	82f5f11a22	Fix a comment in db.h.	2021-12-28 14:20:04 +01:00
Steinar H. Gunderson	d0f2469aed	Fix an issue where the database could be built with the wrong check_visibility flag. The check_visibility flag would never be set in the header, and thus be set to some random value instead of what the user wanted.	2020-12-05 10:50:49 +01:00
Steinar H. Gunderson	23668b1483	Honor the “require visibility” flag (in the negative).	2020-11-28 18:17:23 +01:00
Steinar H. Gunderson	63fd24efd7	Add a native updatedb. This incorporates some code from mlocate's updatedb, and thus is compatible with /etc/updatedb.conf, and supports all the pruning options from it. All the code has been heavily modified, e.g. the gnulib dependency has been removed and replaced with STL code (kicking 10k+ lines of code), the bind mount code has been fixed (it was all broken since the switch from /etc/mtab to /proc/self/mountinfo) and everything has been reformatted. Like with mlocate, plocate's updatedb is merging, i.e., it can skip readdir() on unchanged directories. (The logic here is also copied pretty verbatim from mlocate.) updatedb reads plocate's native format; there's a new max_version 2 that contains directory timestamps (without it, updatedb will fall back to a full scan). The timestamps increase the database size by only about 1%, which is a good tradeoff when we're getting rid of the entire mlocate database. We liberally use modern features to simplify the implementation; in particular, openat() to avoid race conditions, instead of mlocate's complicated chdir() dance. Unfortunately, the combination of the slightly strange storage order from mlocate, and openat(), means we may need to keep a bunch of file descriptors open, but they are not an expensive resource these days, and we try to bump the limit ourselves if we are allowed to. We also use O_TMPFILE, to make sure we never leave a half-finished file lying around (mlocate's updatedb tries to catch signals instead). All of this may hinder portability, so we might ease up on the requirements later. We don't use io_uring for updatedb at this point. plocate-build does not write the needed timestamps, so the first upgrade from mlocate to native plocate requires a full rescan. NOTE: The format is _not_ frozen yet, and won't be until actual release.	2020-11-25 00:58:09 +01:00
Steinar H. Gunderson	15235ad941	Use zstd dictionaries. Since we have small strings, they can benefit from some shared context, and zstd supports this. plocate-build now reads the mlocate database twice; the first pass samples 1000 random blocks, which it uses to train a 1 kB dictionary. (zstd recommends much larger dictionaries, but practical testing seems to indicate this doesn't help us much, and might actually be harmful.) We get ~20% slower builds and ~7% smaller .db files -- but more interestingly, linear search speed is up ~20% (which indicates that decompression in itself benefits more). We need to read the 1 kB dictionary, but it's practically free since it's stored next to the header and so small. This is a version bump (to version 1), so we're not forward-compatible, but we're backward-compatible (plocate still reads version 0 files just fine). Since we're adding more fields to the header anyway, we can add a new “max_version” field that allows for marking backwards-compatible changes in the future, i.e., if plocate-build adds more information that plocate would like to use but that older plocate versions can simply ignore.	2020-10-13 17:53:02 +02:00
Steinar H. Gunderson	d5f6c3c0a4	Fix searching for very short (1 or 2 bytes) queries. plocate had assumptions about the layout of the file, that no longer held. Use the pad field to simplify things. This requires a database rebuild, but only for short queries. Normal queries will continue to work, so there's no version bump.	2020-10-03 10:49:10 +02:00
Steinar H. Gunderson	96d1b7ab7a	Make some padding in the header explicit.	2020-10-02 18:36:46 +02:00
Steinar H. Gunderson	94cd925830	Rerun clang-format.	2020-09-30 21:52:16 +02:00
Steinar H. Gunderson	c41f998855	Switch trigram lookup from binary search to a hash table. Binary search was fine when we just wanted simplicity, but for I/O optimization on rotating media, we want as few seeks as possible. A hash table with open addressing gives us just that; Robin Hood hashing makes it possible for us to guarantee maximum probe length, so we can just read 256 bytes (plus a little slop) for each lookup and that's it. This kills ~30 ms or so cold-cache. This breaks the format, so we use the chance to add a magic and a proper header to provide some more flexibility in case we want to change the builder.	2020-09-30 19:46:53 +02:00
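
The hash table commit (c41f998855) describes open addressing with Robin Hood hashing and a guaranteed maximum probe length. The sketch below illustrates that general technique in memory only. It is not plocate's code: the slot layout, the hash function, and all names (Slot, RobinHoodTable, max_probe) are assumptions made for illustration, and keys are assumed to be distinct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical slot: maps a packed trigram to some offset in the database.
struct Slot {
    uint32_t trigram = 0;
    uint64_t offset = 0;
    bool used = false;
};

class RobinHoodTable {
public:
    explicit RobinHoodTable(size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Inserts a (distinct) key. Returns false if the table is full.
    bool insert(uint32_t trigram, uint64_t offset) {
        if (size_ == slots_.size()) return false;
        Slot cur{trigram, offset, true};
        size_t pos = home(trigram);
        size_t dist = 0;  // how far `cur` is from its home slot
        for (;;) {
            Slot &s = slots_[pos];
            if (!s.used) {
                s = cur;
                ++size_;
                if (dist > max_probe_) max_probe_ = dist;
                return true;
            }
            // Robin Hood rule: the entry that is further from home keeps
            // the slot; the "richer" one is displaced and moves on.
            size_t existing = distance(s.trigram, pos);
            if (existing < dist) {
                std::swap(s, cur);
                if (dist > max_probe_) max_probe_ = dist;
                dist = existing;
            }
            pos = (pos + 1) % slots_.size();
            ++dist;
        }
    }

    // Looks at no more than max_probe() + 1 consecutive slots, which is
    // what lets an on-disk reader fetch one small fixed-size window.
    std::optional<uint64_t> find(uint32_t trigram) const {
        size_t pos = home(trigram);
        for (size_t dist = 0; dist <= max_probe_; ++dist) {
            const Slot &s = slots_[pos];
            if (!s.used) return std::nullopt;
            if (s.trigram == trigram) return s.offset;
            pos = (pos + 1) % slots_.size();
        }
        return std::nullopt;
    }

    size_t max_probe() const { return max_probe_; }

private:
    // Simple multiplicative hash, chosen only for the example.
    size_t home(uint32_t key) const {
        return static_cast<uint32_t>(key * 2654435761u) % slots_.size();
    }
    size_t distance(uint32_t key, size_t pos) const {
        return (pos + slots_.size() - home(key)) % slots_.size();
    }

    std::vector<Slot> slots_;
    size_t size_ = 0;
    size_t max_probe_ = 0;
};
```

Because every stored entry sits at most max_probe() slots from its home position, a lookup can read one contiguous window starting at the home slot and stop there, which matches the commit's stated goal of as few seeks as possible.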

9 Commits
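
The zstd dictionary commit (15235ad941) and the native updatedb commit (63fd24efd7) describe a version field plus a max_version field for backward-compatible additions, with max_version 2 carrying directory timestamps. A minimal sketch of how a reader might act on those two fields follows. The struct layout, the function names, and the exact comparison rules are assumptions; only the behaviours stated in the commit messages (older files stay readable, no timestamps means a full scan) are taken from the log.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of a database header; the real header has more fields.
struct HypotheticalHeader {
    uint32_t version;      // assumed: format version a reader must understand
    uint32_t max_version;  // assumed: highest compatible feature level written
};

enum class ScanMode { Merge, Full };

// Assumed rule: a reader can parse any file whose version it knows about.
constexpr bool can_read(const HypotheticalHeader &h, uint32_t reader_version) {
    return h.version <= reader_version;
}

// Per the log, max_version 2 is the level that contains directory timestamps.
constexpr bool has_directory_timestamps(const HypotheticalHeader &h) {
    return h.max_version >= 2;
}

// Without timestamps, updatedb falls back to a full scan.
constexpr ScanMode choose_scan_mode(const HypotheticalHeader &h) {
    return has_directory_timestamps(h) ? ScanMode::Merge : ScanMode::Full;
}

// Usage example: a database written by plocate-build (no timestamps)
// versus one written by the native updatedb.
inline void example_usage() {
    HypotheticalHeader from_plocate_build{1, 1};
    HypotheticalHeader from_updatedb{1, 2};
    assert(choose_scan_mode(from_plocate_build) == ScanMode::Full);
    assert(choose_scan_mode(from_updatedb) == ScanMode::Merge);
}
```

Splitting the two fields this way lets a builder add optional data without locking out older readers, which simply ignore feature levels they do not know.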