spellbook/examples/suggest.rs
Michael Davis 3f0aa5cab0 Implement "ngram" suggestions
This is the last part of the `suggester`. Hunspell has a bespoke
string similarity measure called "ngram similarity." Conceptually
it's like Jaro or Levenshtein similarity: a measure of how close
two strings are.
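
As a rough illustration, here is a minimal Rust sketch of an n-gram overlap score. It is a simplification under stated assumptions, not Hunspell's or spellbook's actual function: `ngram_similarity` is a hypothetical name. For every substring length `k` from 1 to `n`, it counts how many length-`k` substrings of the first string also occur somewhere in the second. It also leaves out any weighting or length adjustments the real implementation may apply.

```rust
/// Hypothetical, simplified n-gram similarity (not Hunspell's exact formula).
/// For each length k in 1..=n, count the k-grams of `a` that appear anywhere in `b`.
fn ngram_similarity(n: usize, a: &str, b: &str) -> usize {
    // Compare chars rather than bytes so multi-byte UTF-8 is handled correctly.
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut score = 0;
    for k in 1..=n {
        if k > a.len() {
            break; // `a` has no substrings of this length
        }
        // Number of k-grams of `a` that also occur somewhere in `b`.
        let matches = a
            .windows(k)
            .filter(|gram| b.windows(k).any(|other| other == *gram))
            .count();
        score += matches;
    }
    score
}

fn main() {
    let word = "ansi";
    for candidate in ["anti", "and", "ANSI"] {
        let score = ngram_similarity(3, word, candidate);
        println!("{word} vs {candidate}: {score}");
    }
}
```

With `n = 3`, "ansi" vs "anti" scores 4 (the letters a, n and i plus the bigram "an"). "and" scores 3, and "ANSI" scores 0 because this sketch compares characters case-sensitively.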

The suggester resorts to ngram suggestions when it believes that the
simple string edits in `suggest_low` did not produce high-quality
suggestions. Ngram suggestions are a pipeline:

* Iterate over all stems in the wordlist. Take the 100 most promising
  according to a basic ngram similarity score.
* Expand all affixes for each stem and give each expanded form a score
  based on another ngram-similarity-based metric. Take up to the top 200
  most promising candidates.
* Determine a threshold to eliminate lower quality candidates.
* Return the most promising of the remaining candidates.
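
The pipeline steps can be sketched roughly as follows. Everything in this block is hypothetical and does not reflect spellbook's real internals:

* `ngram_suggest` and the plain slice of stems are stand-ins.
* The caller-supplied `expand` closure stands in for affix expansion.
* A single `score` closure stands in for both ngram metrics, which the real suggester keeps separate.
* The threshold is simply half of the best score.

```rust
/// Hypothetical sketch of the ngram suggestion pipeline (not spellbook's API).
fn ngram_suggest<F, S>(
    word: &str,
    stems: &[&str],
    expand: F,
    score: S,
    max_suggestions: usize,
) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
    S: Fn(&str, &str) -> usize,
{
    // Step 1: score every stem in the wordlist and keep the 100 best.
    let mut roots: Vec<(usize, &str)> =
        stems.iter().map(|&stem| (score(word, stem), stem)).collect();
    roots.sort_by(|x, y| y.0.cmp(&x.0)); // stable sort, highest score first
    roots.truncate(100);

    // Step 2: expand each kept stem, score every expanded form, keep up to 200.
    let mut candidates: Vec<(usize, String)> = roots
        .iter()
        .flat_map(|&(_, stem)| expand(stem))
        .map(|form| (score(word, form.as_str()), form))
        .collect();
    candidates.sort_by(|x, y| y.0.cmp(&x.0));
    candidates.truncate(200);

    // Step 3: hypothetical threshold of half the best score (an assumption;
    // the real threshold logic is not reproduced here).
    let threshold = candidates.first().map_or(0, |(best, _)| *best / 2);

    // Step 4: return the most promising candidates above the threshold.
    candidates
        .into_iter()
        .filter(|(s, _)| *s > threshold)
        .take(max_suggestions)
        .map(|(_, form)| form)
        .collect()
}

fn main() {
    // Stand-in scorer: how many chars of `a` occur in `b` (a unigram overlap).
    let score = |a: &str, b: &str| a.chars().filter(|c| b.contains(*c)).count();
    // Stand-in affix expansion: the bare stem plus a hypothetical "-s" form.
    let expand = |stem: &str| vec![stem.to_string(), format!("{stem}s")];
    let suggestions = ngram_suggest("ansi", &["anti", "and", "xyz"], expand, score, 5);
    println!("{suggestions:?}");
}
```

Running it prints `["antis", "anti", "ands"]`. The best expanded form scores 4, so the threshold is 2, and only forms scoring above 2 survive. The cost is also easy to see in this shape: step 1 touches every stem, which is why this path is much slower than edit-based suggestions.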

Notably, because we iterate over the entire wordlist, ngram
suggestions are far slower than the basic edit-based suggestions.
2024-11-11 17:25:37 -05:00

/*
Most basic example for the suggester for quick debugging.
This example doesn't check whether the input word is in the dictionary first.
## Usage
```
$ cargo run --example suggest ansi
Compiled the dictionary in 127ms
Suggestions for "ansi": "ANSI", "ans", "anti", "ans i" (checked in 1367µs)
```
*/
use std::time::Instant;

use spellbook::Dictionary;

const EN_US_AFF: &str = include_str!("../vendor/en_US/en_US.aff");
const EN_US_DIC: &str = include_str!("../vendor/en_US/en_US.dic");

fn main() {
    // The word to look up is the first command-line argument.
    let mut args = std::env::args().skip(1);
    let word = match args.next() {
        Some(arg) => arg,
        None => {
            eprintln!("Usage: suggest WORD");
            std::process::exit(1);
        }
    };

    // Time how long it takes to compile the en_US dictionary.
    let now = Instant::now();
    let dict = Dictionary::new(EN_US_AFF, EN_US_DIC).unwrap();
    println!("Compiled the dictionary in {}ms", now.elapsed().as_millis());

    // Time the suggestion lookup separately from dictionary compilation.
    let mut suggestions = Vec::with_capacity(5);
    let now = Instant::now();
    dict.suggest(&word, &mut suggestions);
    let time = now.elapsed().as_micros();

    if suggestions.is_empty() {
        println!("No suggestions found for \"{word}\" (checked in {time}µs)");
    } else {
        // Join the suggestions into a comma-separated list of quoted strings.
        let suggestions = suggestions
            .into_iter()
            .fold(String::new(), |mut s, suggestion| {
                if !s.is_empty() {
                    s.push_str(", ");
                }
                s.push('"');
                s.push_str(&suggestion);
                s.push('"');
                s
            });
        println!("Suggestions for \"{word}\": {suggestions} (checked in {time}µs)");
    }
}