Mirror of https://github.com/helix-editor/spellbook.git (synced 2025-10-06 00:02:48 +02:00)
This is the last part of the `suggester`. Hunspell has a bespoke string similarity measurement called "ngram similarity." Conceptually it is like Jaro or Levenshtein similarity: a measurement of how close two strings are. The suggester resorts to ngram suggestions when it judges that the suggestions from the simple string edits in `suggest_low` are not high quality.

Ngram suggestions are a pipeline:

* Iterate over all stems in the wordlist and take the 100 most promising according to a basic ngram similarity score.
* Expand all affixes for each stem and give each expanded form a score based on another ngram-similarity-based metric. Take up to the top 200 most promising candidates.
* Determine a threshold to eliminate lower-quality candidates.
* Return the most promising candidates that remain.

Because ngram suggestions iterate over the entire wordlist, they are far slower than the basic edit-based suggestions.
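The first stage of that pipeline can be illustrated with a small, self-contained sketch. It assumes a simplified, unweighted, case-sensitive variant of Hunspell's ngram score (for n = 1 up to a maximum, count the n-grams of the misspelled word that also occur in the stem) and uses a min-heap so that only the top-k stems are ever held in memory. The names `ngram_score` and `top_k_stems`, the maximum n-gram length of 3, and the sample stems are all hypothetical; this is not spellbook's actual code, and Hunspell's real scoring has optional weighting and length-difference penalties that are omitted here.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: sums, for n = 1..=max_n, how many n-grams of `a`
/// (counted by starting position) also occur somewhere in `b`. This is a
/// simplified, unweighted, case-sensitive take on Hunspell's ngram score.
fn ngram_score(max_n: usize, a: &str, b: &str) -> usize {
    // Work on chars rather than bytes so multi-byte letters count as one unit.
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if b.is_empty() {
        return 0;
    }
    let mut score = 0;
    for n in 1..=max_n {
        let hits = a
            .windows(n)
            .filter(|gram| b.windows(n).any(|w| w == *gram))
            .count();
        score += hits;
        // Every matching (n+1)-gram contains two matching n-grams, so with
        // fewer than two hits at this length no longer n-gram can match.
        if hits < 2 {
            break;
        }
    }
    score
}

/// Hypothetical helper for the first pipeline stage: score every stem
/// against the misspelled `word` and keep the `k` best, highest first.
fn top_k_stems<'a>(word: &str, stems: &[&'a str], k: usize) -> Vec<(usize, &'a str)> {
    // `Reverse` turns the max-heap into a min-heap, so the root is the
    // weakest candidate kept so far and is cheap to evict.
    let mut heap = BinaryHeap::with_capacity(k + 1);
    for &stem in stems {
        // Assumed maximum n-gram length of 3 for this illustration.
        heap.push(Reverse((ngram_score(3, word, stem), stem)));
        if heap.len() > k {
            heap.pop();
        }
    }
    let mut best: Vec<(usize, &'a str)> = heap.into_iter().map(|Reverse(pair)| pair).collect();
    // Highest score first.
    best.sort_by(|x, y| y.cmp(x));
    best
}

fn main() {
    let stems = ["anti", "ANSI", "answer", "zebra"];
    for (score, stem) in top_k_stems("ansi", &stems, 2) {
        println!("{stem}: {score}");
    }
}
```

With these made-up stems, `main` prints `answer: 6` followed by `anti: 4`; `ANSI` scores 0 because this sketch compares case-sensitively. The heap keeps memory bounded by `k` no matter how large the wordlist is, but every stem still has to be scored, which is why this stage is much slower than the edit-based suggestions.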
56 lines
1.6 KiB
Rust
/*
Most basic example for the suggester for quick debugging.

This example doesn't check whether the input word is in the dictionary first.

## Usage

```
$ cargo run --example suggest ansi
Compiled the dictionary in 127ms
Suggestions for "ansi": "ANSI", "ans", "anti", "ans i" (checked in 1367µs)
```
*/

use std::time::Instant;

use spellbook::Dictionary;

const EN_US_AFF: &str = include_str!("../vendor/en_US/en_US.aff");
const EN_US_DIC: &str = include_str!("../vendor/en_US/en_US.dic");

fn main() {
    // The word to get suggestions for is the first command-line argument.
    let mut args = std::env::args().skip(1);
    let word = match args.next() {
        Some(arg) => arg,
        None => {
            eprintln!("Usage: suggest WORD");
            std::process::exit(1);
        }
    };

    // Time how long it takes to parse the .aff/.dic pair into a Dictionary.
    let now = Instant::now();
    let dict = Dictionary::new(EN_US_AFF, EN_US_DIC).unwrap();
    println!("Compiled the dictionary in {}ms", now.elapsed().as_millis());

    // `suggest` writes its results into the given Vec.
    let mut suggestions = Vec::with_capacity(5);
    let now = Instant::now();
    dict.suggest(&word, &mut suggestions);
    let time = now.elapsed().as_micros();
    if suggestions.is_empty() {
        println!("No suggestions found for \"{word}\" (checked in {time}µs)");
    } else {
        // Render the suggestions as a comma-separated list of quoted words.
        let suggestions = suggestions
            .into_iter()
            .fold(String::new(), |mut s, suggestion| {
                if !s.is_empty() {
                    s.push_str(", ");
                }
                s.push('"');
                s.push_str(&suggestion);
                s.push('"');
                s
            });
        println!("Suggestions for \"{word}\": {suggestions} (checked in {time}µs)");
    }
}