Viktor
39a055aa94
Update ROADMAP.md
2025-06-07 14:01:01 +02:00
Viktor Lofgren
37aaa90dc9
(deploy) Clean up deploy script
deploy-0211
2025-06-07 13:43:56 +02:00
Viktor
24022c5adc
Merge pull request #203 from MarginaliaSearch/nsfw-domain-lists
Nsfw blocking via UT1 domain lists
deploy-0210
deploy-0209
2025-06-07 13:24:05 +02:00
Viktor Lofgren
1de9ecc0b6
(nsfw) Add metrics to the filtering so we can monitor it
2025-06-07 13:17:05 +02:00
Viktor Lofgren
9b80245ea0
(nsfw) Move filtering to the IndexApiClient, and add filtering options to the internal APIs and public API.
2025-06-07 12:54:20 +02:00
Viktor Lofgren
4e1595c1a6
(nsfw) Initial work on adding UT1-based domain filtering
2025-06-06 14:23:37 +02:00
Viktor Lofgren
0be8585fa5
Add tag format hint to deploy script
2025-06-06 10:03:18 +02:00
Viktor Lofgren
a0fe070fe7
Redeploy browserless and assistant.
deploy-0208
2025-06-06 09:51:39 +02:00
Viktor Lofgren
abe9da0fc6
(search) Ensure the new search UI sets the correct content-type for opensearch.xml
deploy-0207
2025-05-29 12:44:55 +02:00
Viktor Lofgren
56d0128b0a
(dom-sample) Remove redundant code
2025-05-28 17:43:46 +02:00
Viktor Lofgren
840b68ac55
(dom-sample) Minor cleanups
2025-05-28 16:27:27 +02:00
Viktor Lofgren
c34ff6d6c3
(dom-sample) Use WAL journal for dom sample db
deploy-0206
2025-05-28 16:16:28 +02:00
Viktor Lofgren
32780967d8
(dom-sample) Initialize dom sampler
deploy-0205
2025-05-28 16:06:05 +02:00
Viktor Lofgren
7330bc489d
(deploy) Correct deploy script for browserless
deploy-0204
deploy-0203
2025-05-28 15:58:12 +02:00
Viktor Lofgren
ea23f33738
(deploy) Correct deploy script for headlesschrome
deploy-0202
2025-05-28 15:56:05 +02:00
Viktor Lofgren
4a8a028118
(deploy) Deploy assistant and browserless
deploy-0201
2025-05-28 15:50:26 +02:00
Viktor
a25bc647be
Merge pull request #201 from MarginaliaSearch/website-capture
Capture website snapshots
deploy-0200
2025-05-28 15:49:03 +02:00
Viktor Lofgren
a720dba3a2
(deploy) Add browserless to deploy script
2025-05-28 15:48:32 +02:00
Viktor Lofgren
284f382867
(dom-sample) Fix initialization to work the same as screenshot capture
2025-05-28 15:40:09 +02:00
Viktor Lofgren
a80717f138
(dom-sample) Cleanup
2025-05-28 15:32:54 +02:00
Viktor Lofgren
d6da715fa4
(dom-sample) Add basic retrieval logic
First iteration is single threaded for simplicity
2025-05-28 15:18:15 +02:00
Viktor Lofgren
c1ec7aa491
(dom-sample) Add a boolean to the sample db when we've accepted a cookie dialogue
2025-05-28 14:45:19 +02:00
Viktor Lofgren
3daf37e283
(dom-sample) Improve storage of DOM sample data
2025-05-28 14:34:34 +02:00
Viktor Lofgren
44a774d3a8
(browserless) Add --pull option to Docker build command
This ensures we fetch the latest base image when we build.
2025-05-28 14:09:32 +02:00
Viktor Lofgren
597aeaf496
(website-capture) Correct manifest
run_at is set at the content_script level, not the root object.
2025-05-28 14:05:16 +02:00
Viktor Lofgren
06df7892c2
(website-capture) Clean up code
2025-05-27 15:56:59 +02:00
Viktor Lofgren
dc26854268
(website-capture) Add a marker to the network log when we've accepted a cookie dialog
2025-05-27 15:21:02 +02:00
Viktor Lofgren
9f16326cba
(website-capture) Add logic that automatically identifies and agrees to cookie consent popovers
Oftentimes, ads don't load until after you've agreed to the popover.
2025-05-27 15:11:47 +02:00
Viktor Lofgren
ed66d0b3a7
(website-capture) Amend the extension to also capture web request information
2025-05-26 14:00:43 +02:00
Viktor Lofgren
c3afc82dad
(website-capture) Rename scripts to be more consistent with extension terminology
2025-05-26 13:13:11 +02:00
Viktor Lofgren
08e25e539e
(website-capture) Minor cleanups
2025-05-21 14:55:03 +02:00
Viktor Lofgren
4946044dd0
(website-capture) Update BrowserlessClient to use the new image
2025-05-21 14:14:18 +02:00
Viktor Lofgren
edf382e1c5
(website-capture) Add a custom docker image with a new custom extension for DOM capture
The original approach of injecting javascript into the page directly didn't work with pages that reloaded themselves. To work around this, a chrome extension is used instead that does the same work, but subscribes to reload events and re-installs the change listener.
2025-05-21 14:13:54 +02:00
Viktor Lofgren
644cba32e4
(website-capture) Remove dead imports
2025-05-20 16:08:48 +02:00
Viktor Lofgren
34b76390b2
(website-capture) Add storage object for DOM samples
2025-05-20 16:05:54 +02:00
Viktor Lofgren
43cd507971
(crawler) Add a migration workaround so we can still open old slop crawl data with the new column added
deploy-0199
2025-05-19 14:47:38 +02:00
Viktor Lofgren
cc40e99fdc
(crawler) Add a migration workaround so we can still open old slop crawl data with the new column added
deploy-0198
2025-05-19 14:37:59 +02:00
Viktor Lofgren
8a944cf4c6
(crawler) Add request time to crawl data
This is an interesting indicator of website quality.
deploy-0197
2025-05-19 14:07:41 +02:00
Viktor Lofgren
1c128e6d82
(crawler) Add request time to crawl data
This is an interesting indicator of website quality.
deploy-0196
2025-05-19 14:02:03 +02:00
Viktor Lofgren
be039d1a8c
(live-capture) Add a new function for capturing the DOM of a website after rendering
The new code injects a javascript that attempts to trigger popovers, and then alters the DOM to add attributes containing CSS elements with position and visibility.
2025-05-19 13:26:07 +02:00
Viktor Lofgren
4edc0d3267
(converter) Increase work buffer for converter
Conversion on index node 7 in production is crashing ostensibly because this buffer is too small.
deploy-0195
2025-05-18 13:22:44 +02:00
Viktor Lofgren
890f521d0d
(pdf) Fix crash for some bold lines
deploy-0194
2025-05-18 13:05:05 +02:00
Viktor Lofgren
b1814a30f7
(deploy) Redeploy all services.
deploy-0193
2025-05-17 13:11:51 +02:00
Viktor Lofgren
f59a9eb025
(legacy-search) Soften domain limit constraints in URL deduplication
deploy-0192
2025-05-17 00:04:27 +02:00
Viktor Lofgren
599534806b
(search) Soften domain limit constraints in URL deduplication
deploy-0191
2025-05-17 00:00:42 +02:00
Viktor Lofgren
7e8253dac7
(search) Clean up debug logging
2025-05-17 00:00:28 +02:00
Viktor Lofgren
97a6780ea3
(search) Add debug logging for specific query
deploy-0190
2025-05-16 23:41:35 +02:00
Viktor Lofgren
eb634beec8
(search) Add debug logging for specific query
deploy-0189
2025-05-16 23:34:03 +02:00
Viktor Lofgren
269ebd1654
Revert "(query) Add debug logging for specific query"
This reverts commit 39ce40bfeb.
deploy-0188
2025-05-16 23:29:06 +02:00
Viktor Lofgren
39ce40bfeb
(query) Add debug logging for specific query
deploy-0187
2025-05-16 23:23:53 +02:00