Viktor Lofgren
48791f56bd
(index) Put back Chesterton's fence
2025-09-24 16:09:54 +02:00
Viktor Lofgren
708caa8791
(index) Update verbatim match handling to account for matches that span multiple tags
2025-09-24 15:43:00 +02:00
Viktor Lofgren
32394f42b9
(index) Update verbatim match handling to account for matches that span multiple tags
2025-09-24 15:41:53 +02:00
Viktor Lofgren
b8e3445ce0
(index) Update verbatim match handling to account for matches that span multiple tags
2025-09-24 15:22:50 +02:00
Viktor Lofgren
4694d36ed2
(index) Tweak ranking bonuses for partial matches
2025-09-24 15:01:29 +02:00
Viktor Lofgren
187b4828e6
(index) Sort doc ids passed to re-ranking
2025-09-24 13:26:53 +02:00
Viktor Lofgren
fbfea8539b
(refac) Merge IndexResultScoreCalculator into IndexResultRankingService
2025-09-24 11:51:16 +02:00
Viktor Lofgren
a40c2a8146
(index) Partition index journal by language to speed up index construction
2025-09-21 13:53:43 +02:00
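A minimal sketch of what partitioning the index journal by language (a40c2a8146) could look like. The `JournalEntry` record and `JournalPartitioner` helper are hypothetical; only the `languageIsoCode` field name comes from the log. Once the entries are bucketed, each language's index can presumably be built independently, for example in parallel.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical journal record; only the languageIsoCode field name is taken from the log
record JournalEntry(long docId, String languageIsoCode, List<String> keywords) {}

final class JournalPartitioner {
    // Buckets entries by language so each partition can be indexed on its own.
    // A TreeMap keeps the partitions in a stable, sorted order.
    static Map<String, List<JournalEntry>> partitionByLanguage(List<JournalEntry> entries) {
        return entries.stream()
                .collect(Collectors.groupingBy(
                        JournalEntry::languageIsoCode,
                        TreeMap::new,
                        Collectors.toList()));
    }
}
```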
Viktor Lofgren
a3416bf48e
(query) Fix timeout settings to use ms and not s
2025-09-19 22:45:22 +02:00
Viktor Lofgren
54c91a84e3
(query) Make the query client give up if the request exceeds its configured timeout by 50%
2025-09-19 18:59:35 +02:00
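A sketch of the client-side give-up rule from 54c91a84e3, with the timeout in milliseconds as a3416bf48e requires. It assumes the response arrives as a `CompletableFuture`. `QueryDeadline`, `GRACE_FACTOR`, `clientDeadlineMs` and `awaitOrGiveUp` are hypothetical names, not the project's actual API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: the client waits at most 150% of the configured timeout
final class QueryDeadline {
    // The client tolerates overshooting the configured timeout by 50%
    static final double GRACE_FACTOR = 1.5;

    // The configured timeout is in milliseconds, not seconds
    static long clientDeadlineMs(long configuredTimeoutMs) {
        return Math.round(configuredTimeoutMs * GRACE_FACTOR);
    }

    // Waits for the response and gives up once the grace deadline has passed
    static <T> T awaitOrGiveUp(CompletableFuture<T> response, long configuredTimeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            return response.get(clientDeadlineMs(configuredTimeoutMs), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            response.cancel(true); // stop waiting on this request
            throw e;
        }
    }
}
```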
Viktor Lofgren
af8a13a7fb
(index) Correct file name compatibility with previous versions
2025-09-19 09:40:43 +02:00
Viktor Lofgren
c661ebb619
(refac) Move language-processing into functions
It has long outgrown the single-responsibility library it once was; as such it is out of place in its original location and fits better among the function-type modules.
2025-09-18 10:30:40 +02:00
Viktor Lofgren
ae31bc8498
(lang+search) Clean up LanguageConfiguration initialization and LangCommand
2025-09-16 11:47:15 +02:00
Viktor Lofgren
938721b793
(index) Backwards compatible loading of old words file in index loading
2025-09-11 15:42:31 +02:00
Viktor Lofgren
f68bcefc75
(index) Correct index construction to use the correct files for Fwd index
2025-09-09 11:21:48 +02:00
Viktor Lofgren
78246b9a63
(index) Fix journal language enumeration
2025-09-08 15:38:26 +02:00
Viktor Lofgren
bffc159486
(language) Make unicode normalization configurable
2025-09-08 13:18:58 +02:00
Viktor Lofgren
edd453531e
(index) Partition keyword lexicons by language
2025-09-04 17:24:48 +02:00
Viktor Lofgren
673c65d3c9
(refac) Fold term-frequency-dict into language-processing
2025-09-03 12:59:10 +02:00
Viktor Lofgren
acb9ec7b15
(refac) Consistently use 'languageIsoCode' for the language field
2025-09-03 12:54:18 +02:00
Viktor Lofgren
47079e05db
(index) Store language information in the index journal
2025-09-03 12:33:24 +02:00
Viktor Lofgren
c93056e77f
(refac) Clean up index code
2025-09-03 09:51:57 +02:00
Viktor Lofgren
6f7530e807
(refac) Clean up index code
2025-09-02 18:53:58 +02:00
Viktor Lofgren
87ce4a1b52
(refac) Clean up index code
2025-09-02 17:52:38 +02:00
Viktor Lofgren
52194cbe7a
(refac) Clean up index code
2025-09-02 17:44:42 +02:00
Viktor Lofgren
fd1ac03c78
(refac) Clean up index code
2025-09-02 17:30:19 +02:00
Viktor Lofgren
5e5b86efb4
(refac) Clean up index code
2025-09-02 17:24:30 +02:00
Viktor Lofgren
f332ec6191
(refac) Clean up index code
2025-09-02 13:13:10 +02:00
Viktor Lofgren
c25c1af437
(refac) Clean up index code
2025-09-02 13:04:05 +02:00
Viktor Lofgren
eb0c911b45
(refac) Clean up index code
2025-09-02 12:50:07 +02:00
Viktor Lofgren
1979870ce4
(refac) Merge index-forward, index-reverse, index/query into index
The project has too many submodules, and it's a bit of a headache to navigate.
2025-09-02 12:30:42 +02:00
Viktor Lofgren
0ba2ea38e1
(index) Move reverse index into a distinct package
2025-09-02 11:59:56 +02:00
Viktor Lofgren
d6cfbceeea
(index) Use a configurable hasher in the index
2025-09-01 13:44:28 +02:00
Viktor Lofgren
e369d200cc
(refac) Simplify index data model by merging SearchParameters, SearchTerms and ResultRankingContext into a new object called SearchContext
The previous design was difficult to reason about, as similar data was stored in several places, and different functions wanted different, nearly but not fully identical, context objects.
This is in preparation for making the keyword hash function configurable, as we want to focus all the code that hashes keywords in one place.
2025-09-01 13:17:11 +02:00
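A rough sketch of the idea behind e369d200cc and d6cfbceeea: one context object owns keyword hashing, so swapping the hash function touches a single place. This `SearchContext` is a hypothetical simplification rather than the project's class. The hasher is modelled as a plain `ToLongFunction<String>`, and 64-bit FNV-1a is used only as an example of a pluggable hasher.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical simplified context: terms and their hashed ids live together
record SearchContext(List<String> terms, long[] termIds, int resultLimit) {

    // Every keyword is hashed here, once, using the configured hasher
    static SearchContext create(List<String> terms, int resultLimit, ToLongFunction<String> hasher) {
        long[] ids = new long[terms.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = hasher.applyAsLong(terms.get(i));
        }
        return new SearchContext(List.copyOf(terms), ids, resultLimit);
    }
}

final class Hashers {
    // 64-bit FNV-1a over the UTF-8 bytes of the keyword (example hasher only)
    static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L; // FNV offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;      // FNV prime
        }
        return h;
    }
}
```

Because the hasher is passed in rather than hard-coded, the writer side and the query side can be handed the same function from configuration.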
Viktor Lofgren
946d64c8da
(index) Make hash algorithm selection configurable, writer-side
2025-09-01 12:03:01 +02:00
Viktor Lofgren
42f043a60f
(API) Add language parameter to the APIs
2025-09-01 09:33:39 +02:00
Viktor Lofgren
70b4ed6d81
(ldb) Pipe language information into LDB database
2025-08-29 10:55:47 +02:00
Viktor Lofgren
45dc6412c1
(converter) Add language column to slop tables
2025-08-29 10:55:47 +02:00
Viktor Lofgren
0525303b68
(index) Add upper limit to span lengths
Apparently outliers exist that are larger than SHORT_MAX. This is probably not interesting, so we'll truncate at 8192 for now.
Adding a logging statement to get more information about which spans these are, so we can address the root cause down the line.
2025-08-17 08:44:57 +02:00
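A minimal sketch of the cap described in 0525303b68. The 8192 limit comes from the commit message. The `SpanLengthLimiter` class, the `clampSpan` method and the half-open `[start, end)` span representation are assumptions made for illustration.

```java
import java.util.logging.Logger;

// Hypothetical sketch: truncate overlong spans and log them for later diagnosis
final class SpanLengthLimiter {
    // Limit taken from the commit message; well below Short.MAX_VALUE (32767)
    static final int MAX_SPAN_LENGTH = 8192;
    private static final Logger logger = Logger.getLogger(SpanLengthLimiter.class.getName());

    // Returns the (possibly truncated) end position of the span [start, end)
    static int clampSpan(int start, int end) {
        int length = end - start;
        if (length > MAX_SPAN_LENGTH) {
            logger.warning("Truncating span [" + start + ", " + end + ") of length " + length);
            return start + MAX_SPAN_LENGTH;
        }
        return end;
    }
}
```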
Viktor Lofgren
51912e0176
(index) Tweak default values for IndexQueryExecution
2025-08-15 08:07:00 +02:00
Viktor Lofgren
de1b4d5372
(index) Make metrics make more sense by normalizing them by query budget
2025-08-15 03:16:22 +02:00
Viktor Lofgren
50ac926060
(index) Make metrics make more sense by normalizing them by query budget
2025-08-15 03:11:57 +02:00
Viktor Lofgren
d711ee75b5
(index) Add performance metrics
2025-08-15 00:48:52 +02:00
Viktor Lofgren
aee262e5f6
(index) Safeguard against arena-leaks during exceptions
The GC would catch these eventually, but it's nice to clean up ourselves in a timely manner.
2025-08-14 19:28:31 +02:00
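A sketch of the general pattern behind aee262e5f6: release a resource on the exception path instead of leaving it for the GC. The hypothetical `ScratchBuffer` stands in for an off-heap arena. `Buffers.openPrepared` is likewise hypothetical and shows only the ownership rule. On success, ownership passes to the caller. On failure, the resource is closed before the exception propagates.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for an off-heap arena or buffer
final class ScratchBuffer implements AutoCloseable {
    private boolean closed;

    boolean isClosed() { return closed; }

    @Override
    public void close() { closed = true; }
}

final class Buffers {
    // Prepares the buffer; on failure it is closed immediately and the exception rethrown
    static ScratchBuffer openPrepared(ScratchBuffer buffer, Consumer<ScratchBuffer> prepare) {
        try {
            prepare.accept(buffer);
            return buffer; // success: the caller now owns the buffer and must close it
        } catch (RuntimeException | Error e) {
            buffer.close(); // failure: release now rather than waiting for the GC
            throw e;
        }
    }
}
```

A plain try-with-resources would not fit here because the buffer must stay open when it is returned to the caller.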
Viktor Lofgren
4a98a3c711
(skiplist) Move to a separate directory instead of in the btree module
2025-08-14 01:09:46 +02:00
Viktor Lofgren
2a2d951c2f
(index) Fix unhinged default values for index.preparationThreads
2025-08-14 00:54:35 +02:00
Viktor Lofgren
379a1be074
(index) Add better timeout handling in UringQueue, fix slow memory leak on timeout exception
2025-08-14 00:52:50 +02:00
Viktor Lofgren
1c49a0f5ad
(index) Add system properties for toggling O_DIRECT mode for positions and spans
2025-08-12 15:11:13 +02:00
Viktor Lofgren
90325be447
(minor) Fix comments
2025-08-11 23:19:53 +02:00
Viktor Lofgren
dc89587af3
(index) Improve disk locality of the positions data
2025-08-11 21:17:12 +02:00