(actor) Make ping spawner auto-spawn the process

(ping) Parameterize thread counts for availability and DNS job consumers
(ping) Reduce maximum total connections in HttpClientProvider to improve resource management
2025-10-06 07:32:38 +02:00 · 2025-06-12 13:46:50 +02:00 · 2025-06-12 13:34:58 +02:00 · 2025-06-12 13:04:55 +02:00 · 2025-06-12 12:56:33 +02:00 · 2025-06-12 00:18:07 +02:00
585 changed files with 33165 additions and 4633 deletions
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -1,5 +1,6 @@
 # These are supported funding model platforms

+polar: marginalia-search
 github: MarginaliaSearch
 patreon: marginalia_nu
 open_collective: # Replace with a single Open Collective username
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,4 @@ build/
 lombok.config
 Dockerfile
 run
+jte-classes
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -1,4 +1,4 @@
-# Roadmap 2024-2025
+# Roadmap 2025

 This is a roadmap with major features planned for Marginalia Search.

@@ -8,20 +8,10 @@ be implemented as well.
 Major goals:

 * Reach 1 billion pages indexed
-* Improve technical ability of indexing and search.  Although this area has improved a bit, the
-  search engine is still not very good at dealing with longer queries.

-## Proper Position Index (COMPLETED 2024-09)

-The search engine uses a fixed width bit mask to indicate word positions.  It has the benefit
-of being very fast to evaluate and works well for what it is, but is inaccurate and has the 
-drawback of making support for quoted search terms inaccurate and largely reliant on indexing 
-word n-grams known beforehand.  This limits the ability to interpret longer queries.
-
-The positions mask should be supplemented or replaced with a more accurate (e.g.) gamma coded positions
-list, as is the civilized way of doing this.
-
-Completed with PR https://github.com/MarginaliaSearch/MarginaliaSearch/pull/99
+* Improve technical ability of indexing and search.  ~~Although this area has improved a bit, the
+  search engine is still not very good at dealing with longer queries.~~  (As of PR [#129](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/129), this has improved significantly.  There is still more work to be done )

 ## Hybridize crawler w/ Common Crawl data

@@ -37,8 +27,7 @@ Retaining the ability to independently crawl the web is still strongly desirable

 ## Safe Search

-The search engine has a bit of a problem showing spicy content mixed in with the results.  It would be desirable
-to have a way to filter this out.  It's likely something like a URL blacklist (e.g. [UT1](https://dsi.ut-capitole.fr/blacklists/index_en.php) )
+The search engine has a bit of a problem showing spicy content mixed in with the results.  It would be desirable to have a way to filter this out.  It's likely something like a URL blacklist (e.g. [UT1](https://dsi.ut-capitole.fr/blacklists/index_en.php) )
 combined with naive bayesian filter would go a long way, or something more sophisticated...?

 ## Additional Language Support
@@ -49,23 +38,6 @@ associated with each language added, at least a models file or two, as well as s

 It would be very helpful to find a speaker of a large language other than English to help in the fine tuning.

-## Finalize RSS support (COMPLETED 2024-11)
-
-Marginalia has experimental RSS preview support for a few domains.  This works well and
-it should be extended to all domains.  It would also be interesting to offer search of the
-RSS data itself, or use the RSS set to feed a special live index that updates faster than the
-main dataset. 
-
-Completed with PR [#122](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/122) 
-
-## Support for binary formats like PDF
-
-The crawler needs to be modified to retain them, and the conversion logic needs to parse them.  
-The documents database probably should have some sort of flag indicating it's a PDF as well.
-
-PDF parsing is known to be a bit of a security liability so some thought needs to be put in
-that direction as well.
-
 ## Custom ranking logic

 Stract does an interesting thing where they have configurable search filters.
@@ -74,5 +46,50 @@ This looks like a good idea that wouldn't just help clean up the search filters
 website, but might be cheap enough we might go as far as to offer a number of ad-hoc custom search
 filter for any API consumer.

-I've talked to the stract dev and he does not think it's a good idea to mimic their optics language, 
-which is quite ad-hoc, but instead to work together to find some new common description language for this. 
+I've talked to the stract dev and he does not think it's a good idea to mimic their optics language, which is quite ad-hoc, but instead to work together to find some new common description language for this. 
+
+## Show favicons next to search results
+
+This is expected from search engines.  Basic proof of concept sketch of fetching this data has been done, but the feature is some way from being reality. 
+
+## Specialized crawler for github
+
+One of the search engine's biggest limitations right now is that it does not index github at all.   A specialized crawler that fetches at least the readme.md would go a long way toward providing search capabilities in this domain.
+
+# Completed
+
+## Support for binary formats like PDF (COMPLETED 2025-05)
+
+The crawler needs to be modified to retain them, and the conversion logic needs to parse them.  
+The documents database probably should have some sort of flag indicating it's a PDF as well.
+
+PDF parsing is known to be a bit of a security liability so some thought needs to be put in
+that direction as well.
+
+## Web Design Overhaul (COMPLETED 2025-01)
+
+The design is kinda clunky and hard to maintain, and needlessly outdated-looking.  
+
+PR [#127](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/127)
+
+## Finalize RSS support (COMPLETED 2024-11)
+
+Marginalia has experimental RSS preview support for a few domains.  This works well and
+it should be extended to all domains.  It would also be interesting to offer search of the
+RSS data itself, or use the RSS set to feed a special live index that updates faster than the
+main dataset. 
+
+Completed with PR [#122](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/122) and PR [#125](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/125)
+
+## Proper Position Index (COMPLETED 2024-09)
+
+The search engine uses a fixed width bit mask to indicate word positions.  It has the benefit
+of being very fast to evaluate and works well for what it is, but is inaccurate and has the 
+drawback of making support for quoted search terms inaccurate and largely reliant on indexing 
+word n-grams known beforehand.  This limits the ability to interpret longer queries.
+
+The positions mask should be supplemented or replaced with a more accurate (e.g.) gamma coded positions
+list, as is the civilized way of doing this.
+
+Completed with PR [#99](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/99)
+
--- a/build.gradle
+++ b/build.gradle
@@ -5,7 +5,7 @@ plugins {

    // This is a workaround for a bug in the Jib plugin that causes it to stall randomly
    // https://github.com/GoogleContainerTools/jib/issues/3347
-    id 'com.google.cloud.tools.jib' version '3.4.3' apply(false)
+    id 'com.google.cloud.tools.jib' version '3.4.5' apply(false)
 }

 group 'marginalia'
@@ -43,11 +43,11 @@ subprojects.forEach {it ->
 }

 ext {
-    jvmVersion=23
-    dockerImageBase='container-registry.oracle.com/graalvm/jdk:23'
+    jvmVersion = 24
+    dockerImageBase='container-registry.oracle.com/graalvm/jdk:24'
    dockerImageTag='latest'
    dockerImageRegistry='marginalia'
-    jibVersion = '3.4.3'
+    jibVersion = '3.4.5'
 }

 idea {
--- a/code/common/config/java/nu/marginalia/LanguageModels.java
+++ b/code/common/config/java/nu/marginalia/LanguageModels.java
@@ -24,58 +24,4 @@ public class LanguageModels {
        this.fasttextLanguageModel = fasttextLanguageModel;
        this.segments = segments;
    }
-
-    public static LanguageModelsBuilder builder() {
-        return new LanguageModelsBuilder();
-    }
-
-    public static class LanguageModelsBuilder {
-        private Path termFrequencies;
-        private Path openNLPSentenceDetectionData;
-        private Path posRules;
-        private Path posDict;
-        private Path fasttextLanguageModel;
-        private Path segments;
-
-        LanguageModelsBuilder() {
-        }
-
-        public LanguageModelsBuilder termFrequencies(Path termFrequencies) {
-            this.termFrequencies = termFrequencies;
-            return this;
-        }
-
-        public LanguageModelsBuilder openNLPSentenceDetectionData(Path openNLPSentenceDetectionData) {
-            this.openNLPSentenceDetectionData = openNLPSentenceDetectionData;
-            return this;
-        }
-
-        public LanguageModelsBuilder posRules(Path posRules) {
-            this.posRules = posRules;
-            return this;
-        }
-
-        public LanguageModelsBuilder posDict(Path posDict) {
-            this.posDict = posDict;
-            return this;
-        }
-
-        public LanguageModelsBuilder fasttextLanguageModel(Path fasttextLanguageModel) {
-            this.fasttextLanguageModel = fasttextLanguageModel;
-            return this;
-        }
-
-        public LanguageModelsBuilder segments(Path segments) {
-            this.segments = segments;
-            return this;
-        }
-
-        public LanguageModels build() {
-            return new LanguageModels(this.termFrequencies, this.openNLPSentenceDetectionData, this.posRules, this.posDict, this.fasttextLanguageModel, this.segments);
-        }
-
-        public String toString() {
-            return "LanguageModels.LanguageModelsBuilder(termFrequencies=" + this.termFrequencies + ", openNLPSentenceDetectionData=" + this.openNLPSentenceDetectionData + ", posRules=" + this.posRules + ", posDict=" + this.posDict + ", fasttextLanguageModel=" + this.fasttextLanguageModel + ", segments=" + this.segments + ")";
-        }
-    }
 }
--- a/code/common/config/java/nu/marginalia/UserAgent.java
+++ b/code/common/config/java/nu/marginalia/UserAgent.java
@@ -1,3 +1,8 @@
 package nu.marginalia;

+/**
+ * A record representing a User Agent.
+ * @param uaString - the header value of the User Agent
+ * @param uaIdentifier - what we look for in robots.txt
+ */
 public record UserAgent(String uaString, String uaIdentifier) {}
--- a/code/common/db/java/nu/marginalia/db/DbDomainQueries.java
+++ b/code/common/db/java/nu/marginalia/db/DbDomainQueries.java
@@ -8,18 +8,23 @@ import com.google.inject.Inject;
 import com.google.inject.Singleton;
 import com.zaxxer.hikari.HikariDataSource;
 import nu.marginalia.model.EdgeDomain;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;

 import java.sql.SQLException;
-import java.util.NoSuchElementException;
-import java.util.Optional;
-import java.util.OptionalInt;
+import java.util.*;
 import java.util.concurrent.ExecutionException;

@Singleton
 public class DbDomainQueries {
    private final HikariDataSource dataSource;

+    private static final Logger logger = LoggerFactory.getLogger(DbDomainQueries.class);
+
    private final Cache<EdgeDomain, Integer> domainIdCache = CacheBuilder.newBuilder().maximumSize(10_000).build();
+    private final Cache<EdgeDomain, DomainIdWithNode> domainWithNodeCache = CacheBuilder.newBuilder().maximumSize(10_000).build();
+    private final Cache<Integer, EdgeDomain> domainNameCache = CacheBuilder.newBuilder().maximumSize(10_000).build();
+    private final Cache<String, List<DomainWithNode>> siblingsCache = CacheBuilder.newBuilder().maximumSize(10_000).build();

    @Inject
    public DbDomainQueries(HikariDataSource dataSource)
@@ -28,26 +33,59 @@ public class DbDomainQueries {
    }


-    public Integer getDomainId(EdgeDomain domain) {
-        try (var connection = dataSource.getConnection()) {
-
+    public Integer getDomainId(EdgeDomain domain) throws NoSuchElementException {
+        try {
            return domainIdCache.get(domain, () -> {
-                try (var stmt = connection.prepareStatement("SELECT ID FROM EC_DOMAIN WHERE DOMAIN_NAME=?")) {
+                try (var connection = dataSource.getConnection();
+                     var stmt = connection.prepareStatement("SELECT ID FROM EC_DOMAIN WHERE DOMAIN_NAME=?")) {
+
                    stmt.setString(1, domain.toString());
                    var rsp = stmt.executeQuery();
                    if (rsp.next()) {
                        return rsp.getInt(1);
                    }
                }
+                catch (SQLException ex) {
+                    throw new RuntimeException(ex);
+                }
+
                throw new NoSuchElementException();
            });
        }
+        catch (UncheckedExecutionException ex) {
+            throw new NoSuchElementException();
+        }
        catch (ExecutionException ex) {
            throw new RuntimeException(ex.getCause());
        }
+    }
+
+
+    public DomainIdWithNode getDomainIdWithNode(EdgeDomain domain) throws NoSuchElementException {
+        try {
+            return domainWithNodeCache.get(domain, () -> {
+                try (var connection = dataSource.getConnection();
+                     var stmt = connection.prepareStatement("SELECT ID, NODE_AFFINITY FROM EC_DOMAIN WHERE DOMAIN_NAME=?")) {
+
+                    stmt.setString(1, domain.toString());
+                    var rsp = stmt.executeQuery();
+                    if (rsp.next()) {
+                        return new DomainIdWithNode(rsp.getInt(1), rsp.getInt(2));
+                    }
+                }
                catch (SQLException ex) {
                    throw new RuntimeException(ex);
                }
+
+                throw new NoSuchElementException();
+            });
+        }
+        catch (UncheckedExecutionException ex) {
+            throw new NoSuchElementException();
+        }
+        catch (ExecutionException ex) {
+            throw new RuntimeException(ex.getCause());
+        }
    }

    public OptionalInt tryGetDomainId(EdgeDomain domain) {
@@ -80,22 +118,62 @@ public class DbDomainQueries {
    }

    public Optional<EdgeDomain> getDomain(int id) {
-        try (var connection = dataSource.getConnection()) {

+        EdgeDomain existing = domainNameCache.getIfPresent(id);
+        if (existing != null) {
+            return Optional.of(existing);
+        }
+
+        try (var connection = dataSource.getConnection()) {
            try (var stmt = connection.prepareStatement("SELECT DOMAIN_NAME FROM EC_DOMAIN WHERE ID=?")) {
                stmt.setInt(1, id);
                var rsp = stmt.executeQuery();
                if (rsp.next()) {
-                    return Optional.of(new EdgeDomain(rsp.getString(1)));
+                    var val = new EdgeDomain(rsp.getString(1));
+                    domainNameCache.put(id, val);
+                    return Optional.of(val);
                }
                return Optional.empty();
            }
        }
-        catch (UncheckedExecutionException ex) {
-            throw new RuntimeException(ex.getCause());
-        }
        catch (SQLException ex) {
            throw new RuntimeException(ex);
        }
    }
+
+    public List<DomainWithNode> otherSubdomains(EdgeDomain domain, int cnt) throws ExecutionException {
+        String topDomain = domain.topDomain;
+
+        return siblingsCache.get(topDomain, () -> {
+            List<DomainWithNode> ret = new ArrayList<>();
+
+            try (var conn = dataSource.getConnection();
+                 var stmt = conn.prepareStatement("SELECT DOMAIN_NAME, NODE_AFFINITY FROM EC_DOMAIN WHERE DOMAIN_TOP = ? LIMIT ?")) {
+                stmt.setString(1, topDomain);
+                stmt.setInt(2, cnt);
+
+                var rs = stmt.executeQuery();
+                while (rs.next()) {
+                    var sibling = new EdgeDomain(rs.getString(1));
+
+                    if (sibling.equals(domain))
+                        continue;
+
+                    ret.add(new DomainWithNode(sibling, rs.getInt(2)));
+                }
+            } catch (SQLException e) {
+                logger.error("Failed to get domain neighbors");
+            }
+            return ret;
+        });
+
+    }
+
+    public record DomainWithNode (EdgeDomain domain, int nodeAffinity) {
+        public boolean isIndexed() {
+            return nodeAffinity > 0;
+        }
+    }
+
+    public record DomainIdWithNode (int domainId, int nodeAffinity) { }
 }
--- a/code/common/db/java/nu/marginalia/db/DbDomainStatsExportMultitool.java
+++ b/code/common/db/java/nu/marginalia/db/DbDomainStatsExportMultitool.java
@@ -1,118 +0,0 @@
-package nu.marginalia.db;
-
-import com.zaxxer.hikari.HikariDataSource;
-
-import java.sql.Connection;
-import java.sql.PreparedStatement;
-import java.sql.SQLException;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.OptionalInt;
-
-/** Class used in exporting data.  This is intended to be used for a brief time
- * and then discarded, not kept around as a service.
- */
-public class DbDomainStatsExportMultitool implements AutoCloseable {
-    private final Connection connection;
-    private final int nodeId;
-    private final PreparedStatement knownUrlsQuery;
-    private final PreparedStatement visitedUrlsQuery;
-    private final PreparedStatement goodUrlsQuery;
-    private final PreparedStatement domainNameToId;
-
-    private final PreparedStatement allDomainsQuery;
-    private final PreparedStatement crawlQueueDomains;
-    private final PreparedStatement indexedDomainsQuery;
-
-    public DbDomainStatsExportMultitool(HikariDataSource dataSource, int nodeId) throws SQLException {
-        this.connection = dataSource.getConnection();
-        this.nodeId = nodeId;
-
-        knownUrlsQuery = connection.prepareStatement("""
-                SELECT KNOWN_URLS
-                FROM EC_DOMAIN INNER JOIN DOMAIN_METADATA
-                    ON EC_DOMAIN.ID=DOMAIN_METADATA.ID
-                WHERE DOMAIN_NAME=?
-                """);
-        visitedUrlsQuery = connection.prepareStatement("""
-                SELECT VISITED_URLS
-                FROM EC_DOMAIN INNER JOIN DOMAIN_METADATA
-                    ON EC_DOMAIN.ID=DOMAIN_METADATA.ID
-                WHERE DOMAIN_NAME=?
-                """);
-        goodUrlsQuery = connection.prepareStatement("""
-                SELECT GOOD_URLS
-                FROM EC_DOMAIN INNER JOIN DOMAIN_METADATA
-                    ON EC_DOMAIN.ID=DOMAIN_METADATA.ID
-                WHERE DOMAIN_NAME=?
-                """);
-        domainNameToId = connection.prepareStatement("""
-                SELECT ID
-                FROM EC_DOMAIN
-                WHERE DOMAIN_NAME=?
-                """);
-        allDomainsQuery = connection.prepareStatement("""
-                SELECT DOMAIN_NAME
-                FROM EC_DOMAIN
-                """);
-        crawlQueueDomains = connection.prepareStatement("""
-                SELECT DOMAIN_NAME
-                FROM CRAWL_QUEUE
-                """);
-        indexedDomainsQuery = connection.prepareStatement("""
-                SELECT DOMAIN_NAME
-                FROM EC_DOMAIN
-                WHERE INDEXED > 0
-                """);
-    }
-
-    public OptionalInt getVisitedUrls(String domainName) throws SQLException {
-        return executeNameToIntQuery(domainName, visitedUrlsQuery);
-    }
-
-    public OptionalInt getDomainId(String domainName) throws SQLException {
-        return executeNameToIntQuery(domainName, domainNameToId);
-    }
-
-    public List<String> getCrawlQueueDomains() throws SQLException {
-        return executeListQuery(crawlQueueDomains, 100);
-    }
-    public List<String> getAllIndexedDomains() throws SQLException {
-        return executeListQuery(indexedDomainsQuery, 100_000);
-    }
-
-    private OptionalInt executeNameToIntQuery(String domainName, PreparedStatement statement)
-            throws SQLException {
-        statement.setString(1, domainName);
-        var rs = statement.executeQuery();
-
-        if (rs.next()) {
-            return OptionalInt.of(rs.getInt(1));
-        }
-
-        return OptionalInt.empty();
-    }
-
-    private List<String> executeListQuery(PreparedStatement statement, int sizeHint) throws SQLException {
-        List<String> ret = new ArrayList<>(sizeHint);
-
-        var rs = statement.executeQuery();
-
-        while (rs.next()) {
-            ret.add(rs.getString(1));
-        }
-
-        return ret;
-    }
-
-    @Override
-    public void close() throws SQLException {
-        knownUrlsQuery.close();
-        goodUrlsQuery.close();
-        visitedUrlsQuery.close();
-        allDomainsQuery.close();
-        crawlQueueDomains.close();
-        domainNameToId.close();
-        connection.close();
-    }
-}
--- a/code/common/db/resources/db/migration/V25_01_0_000__nsfw_domains.sql
+++ b/code/common/db/resources/db/migration/V25_01_0_000__nsfw_domains.sql
@@ -0,0 +1,5 @@
+CREATE TABLE IF NOT EXISTS WMSA_prod.NSFW_DOMAINS (
+    ID INT NOT NULL AUTO_INCREMENT,
+    TIER INT NOT NULL,
+    PRIMARY KEY (ID)
+);
--- a/code/common/db/resources/db/migration/V25_01_0_001__ping_domains.sql
+++ b/code/common/db/resources/db/migration/V25_01_0_001__ping_domains.sql
@@ -0,0 +1,213 @@
+
+-- Create metadata tables for domain ping status and security information
+
+-- These are not ICMP pings, but rather HTTP(S) pings to check the availability and security
+-- of web servers associated with domains, to assess uptime and changes in security configurations
+-- indicating ownership changes or security issues.
+
+-- Note: DOMAIN_ID and NODE_ID are used to identify the domain and the node that performed the ping.
+-- These are strictly speaking foreign keys to the EC_DOMAIN table, but as it
+-- is strictly append-only, we do not need to enforce foreign key constraints.
+
+CREATE TABLE IF NOT EXISTS DOMAIN_AVAILABILITY_INFORMATION (
+    DOMAIN_ID INT NOT NULL PRIMARY KEY,
+    NODE_ID INT NOT NULL,
+
+    SERVER_AVAILABLE BOOLEAN NOT NULL,  -- Indicates if the server is available (true) or not (false)
+    SERVER_IP VARBINARY(16),            -- IP address of the server (IPv4 or IPv6)
+    SERVER_IP_ASN INTEGER,              -- Autonomous System number
+
+    DATA_HASH BIGINT,                   -- Hash of the data for integrity checks
+    SECURITY_CONFIG_HASH BIGINT,        -- Hash of the security configuration for integrity checks
+
+    HTTP_SCHEMA ENUM('HTTP', 'HTTPS'),  -- HTTP or HTTPS protocol used
+    HTTP_ETAG VARCHAR(255),             -- ETag of the resource as per HTTP headers
+    HTTP_LAST_MODIFIED VARCHAR(255),    -- Last modified date of the resource as per HTTP headers
+    HTTP_STATUS INT,                    -- HTTP status code (e.g., 200, 404, etc.)
+    HTTP_LOCATION VARCHAR(255),         -- If the server redirects, this is the location of the redirect
+    HTTP_RESPONSE_TIME_MS SMALLINT UNSIGNED, -- Response time in milliseconds
+
+    ERROR_CLASSIFICATION ENUM('NONE', 'TIMEOUT', 'SSL_ERROR', 'DNS_ERROR', 'CONNECTION_ERROR', 'HTTP_CLIENT_ERROR', 'HTTP_SERVER_ERROR', 'UNKNOWN'), -- Classification of the error if the server is not available
+    ERROR_MESSAGE VARCHAR(255),         -- Error message if the server is not available
+
+    TS_LAST_PING TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, -- Timestamp of the last ping
+    TS_LAST_AVAILABLE TIMESTAMP,        -- Timestamp of the last time the server was available
+    TS_LAST_ERROR TIMESTAMP,             -- Timestamp of the last error encountered
+
+    NEXT_SCHEDULED_UPDATE TIMESTAMP NOT NULL,
+    BACKOFF_CONSECUTIVE_FAILURES INT NOT NULL DEFAULT 0, -- Number of consecutive failures to ping the server
+    BACKOFF_FETCH_INTERVAL INT NOT NULL DEFAULT 60 -- Interval in seconds for the next scheduled ping
+) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
+
+CREATE INDEX IF NOT EXISTS DOMAIN_AVAILABILITY_INFORMATION__NODE_ID__DOMAIN_ID_IDX ON DOMAIN_AVAILABILITY_INFORMATION (NODE_ID, DOMAIN_ID);
+CREATE INDEX IF NOT EXISTS DOMAIN_AVAILABILITY_INFORMATION__NEXT_SCHEDULED_UPDATE_IDX ON DOMAIN_AVAILABILITY_INFORMATION (NODE_ID, NEXT_SCHEDULED_UPDATE);
+
+
+
+CREATE TABLE IF NOT EXISTS DOMAIN_SECURITY_INFORMATION (
+    DOMAIN_ID INT NOT NULL PRIMARY KEY,
+    NODE_ID INT NOT NULL,
+
+    ASN INTEGER,                     -- Autonomous System Number (ASN) of the server
+    HTTP_SCHEMA ENUM('HTTP', 'HTTPS'),  -- HTTP or HTTPS protocol used
+    HTTP_VERSION VARCHAR(10),           -- HTTP version used (e.g., HTTP/1.1, HTTP/2)
+    HTTP_COMPRESSION VARCHAR(50),       -- Compression method used (e.g., gzip, deflate, br)
+    HTTP_CACHE_CONTROL TEXT,            -- Cache control directives from HTTP headers
+
+    SSL_CERT_NOT_BEFORE TIMESTAMP,         -- Valid from date (usually same as issued)
+    SSL_CERT_NOT_AFTER TIMESTAMP,          -- Valid until date (usually same as expires)
+
+    SSL_CERT_ISSUER VARCHAR(255),         -- CA that issued the cert
+    SSL_CERT_SUBJECT VARCHAR(255),        -- Certificate subject/CN
+
+    SSL_CERT_PUBLIC_KEY_HASH BINARY(32),     -- SHA-256 hash of the public key
+    SSL_CERT_SERIAL_NUMBER VARCHAR(100),     -- Unique cert serial number
+    SSL_CERT_FINGERPRINT_SHA256 BINARY(32),  -- SHA-256 fingerprint for exact identification
+    SSL_CERT_SAN TEXT,                       -- Subject Alternative Names (JSON array)
+    SSL_CERT_WILDCARD BOOLEAN,               -- Wildcard certificate (*.example.com)
+
+    SSL_PROTOCOL VARCHAR(20),             -- TLS 1.2, TLS 1.3, etc.
+    SSL_CIPHER_SUITE VARCHAR(100),        -- e.g., TLS_AES_256_GCM_SHA384
+    SSL_KEY_EXCHANGE VARCHAR(50),         -- ECDHE, RSA, etc.
+    SSL_CERTIFICATE_CHAIN_LENGTH TINYINT, -- Number of certs in chain
+
+    SSL_CERTIFICATE_VALID BOOLEAN,        -- Valid cert chain
+
+    HEADER_CORS_ALLOW_ORIGIN TEXT,               -- Could be *, specific domains, or null
+    HEADER_CORS_ALLOW_CREDENTIALS BOOLEAN,       -- Credential handling
+    HEADER_CONTENT_SECURITY_POLICY_HASH INT,     -- CSP header, hash of the policy
+    HEADER_STRICT_TRANSPORT_SECURITY VARCHAR(255), -- HSTS header
+    HEADER_REFERRER_POLICY VARCHAR(50),          -- Referrer handling
+    HEADER_X_FRAME_OPTIONS VARCHAR(50),          -- Clickjacking protection
+    HEADER_X_CONTENT_TYPE_OPTIONS VARCHAR(50),   -- MIME sniffing protection
+    HEADER_X_XSS_PROTECTION VARCHAR(50),         -- XSS protection header
+
+    HEADER_SERVER VARCHAR(255),                 -- Server header (e.g., Apache, Nginx, etc.)
+    HEADER_X_POWERED_BY VARCHAR(255),           -- X-Powered-By header (if present)
+
+    TS_LAST_UPDATE TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP -- Timestamp of the last SSL check
+) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
+
+
+CREATE INDEX IF NOT EXISTS DOMAIN_SECURITY_INFORMATION__NODE_ID__DOMAIN_ID_IDX ON DOMAIN_SECURITY_INFORMATION (NODE_ID, DOMAIN_ID);
+
+CREATE TABLE IF NOT EXISTS DOMAIN_SECURITY_EVENTS (
+    CHANGE_ID BIGINT AUTO_INCREMENT PRIMARY KEY, -- Unique identifier for the change
+    DOMAIN_ID INT NOT NULL, -- Domain ID, used as a foreign key to EC_DOMAIN
+    NODE_ID INT NOT NULL,
+
+    TS_CHANGE TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, -- Timestamp of the change
+
+    CHANGE_ASN BOOLEAN  NOT NULL DEFAULT FALSE, -- Indicates if the change is related to ASN (Autonomous System Number)
+    CHANGE_CERTIFICATE_FINGERPRINT BOOLEAN  NOT NULL DEFAULT FALSE, -- Indicates if the change is related to SSL certificate fingerprint
+    CHANGE_CERTIFICATE_PROFILE BOOLEAN  NOT NULL DEFAULT FALSE, -- Indicates if the change is related to SSL certificate profile (e.g., algorithm, exchange)
+    CHANGE_CERTIFICATE_SAN BOOLEAN  NOT NULL DEFAULT FALSE, -- Indicates if the change is related to SSL certificate SAN (Subject Alternative Name)
+    CHANGE_CERTIFICATE_PUBLIC_KEY BOOLEAN  NOT NULL DEFAULT FALSE, -- Indicates if the change is related to SSL certificate public key
+    CHANGE_SECURITY_HEADERS BOOLEAN  NOT NULL DEFAULT FALSE, -- Indicates if the change is related to security headers
+    CHANGE_IP_ADDRESS BOOLEAN  NOT NULL DEFAULT FALSE, -- Indicates if the change is related to IP address
+    CHANGE_SOFTWARE BOOLEAN  NOT NULL DEFAULT FALSE, -- Indicates if the change is related to the generator (e.g., web server software)
+    OLD_CERT_TIME_TO_EXPIRY INT, -- Time to expiry of the old certificate in hours, if applicable
+
+    SECURITY_SIGNATURE_BEFORE BLOB NOT NULL, -- Security signature before the change, gzipped json record
+    SECURITY_SIGNATURE_AFTER BLOB NOT NULL  -- Security signature after the change, gzipped json record
+) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
+
+CREATE INDEX IF NOT EXISTS DOMAIN_SECURITY_EVENTS__NODE_ID__DOMAIN_ID_IDX ON DOMAIN_SECURITY_EVENTS (NODE_ID, DOMAIN_ID);
+CREATE INDEX IF NOT EXISTS DOMAIN_SECURITY_EVENTS__TS_CHANGE_IDX ON DOMAIN_SECURITY_EVENTS (TS_CHANGE);
+
+CREATE TABLE IF NOT EXISTS DOMAIN_AVAILABILITY_EVENTS (
+    DOMAIN_ID INT NOT NULL,
+    NODE_ID INT NOT NULL,
+
+    AVAILABLE BOOLEAN NOT NULL, -- True if the service is available, false if it is not
+    OUTAGE_TYPE ENUM('NONE', 'TIMEOUT', 'SSL_ERROR', 'DNS_ERROR', 'CONNECTION_ERROR', 'HTTP_CLIENT_ERROR', 'HTTP_SERVER_ERROR', 'UNKNOWN') NOT NULL,
+    HTTP_STATUS_CODE INT, -- HTTP status code if available (e.g., 200, 404, etc.)
+    ERROR_MESSAGE VARCHAR(255),       -- Specific error details
+
+    TS_CHANGE TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, -- Timestamp of the last update
+
+    AVAILABILITY_RECORD_ID BIGINT AUTO_INCREMENT,
+    P_KEY_MONTH TINYINT NOT NULL DEFAULT MONTH(TS_CHANGE), -- Month of the change for partitioning
+    PRIMARY KEY (AVAILABILITY_RECORD_ID, P_KEY_MONTH)
+)
+CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
+PARTITION BY RANGE (P_KEY_MONTH) (
+    PARTITION p0 VALUES LESS THAN (1),  -- January
+    PARTITION p1 VALUES LESS THAN (2),  -- February
+    PARTITION p2 VALUES LESS THAN (3),  -- March
+    PARTITION p3 VALUES LESS THAN (4),  -- April
+    PARTITION p4 VALUES LESS THAN (5),  -- May
+    PARTITION p5 VALUES LESS THAN (6),  -- June
+    PARTITION p6 VALUES LESS THAN (7),  -- July
+    PARTITION p7 VALUES LESS THAN (8),  -- August
+    PARTITION p8 VALUES LESS THAN (9),  -- September
+    PARTITION p9 VALUES LESS THAN (10), -- October
+    PARTITION p10 VALUES LESS THAN (11), -- November
+    PARTITION p11 VALUES LESS THAN (12)  -- December
+);
+
+CREATE INDEX DOMAIN_AVAILABILITY_EVENTS__DOMAIN_ID_TS_IDX ON DOMAIN_AVAILABILITY_EVENTS (DOMAIN_ID, TS_CHANGE);
+CREATE INDEX DOMAIN_AVAILABILITY_EVENTS__TS_CHANGE_IDX ON DOMAIN_AVAILABILITY_EVENTS (TS_CHANGE);
+
+CREATE TABLE IF NOT EXISTS DOMAIN_DNS_INFORMATION (
+    DNS_ROOT_DOMAIN_ID INT AUTO_INCREMENT PRIMARY KEY,
+    ROOT_DOMAIN_NAME VARCHAR(255) NOT NULL UNIQUE,
+    NODE_AFFINITY INT NOT NULL,              -- Node ID that performs the DNS check, assign randomly across nodes
+
+    DNS_A_RECORDS TEXT,                      -- JSON array of IPv4 addresses
+    DNS_AAAA_RECORDS TEXT,                   -- JSON array of IPv6 addresses
+    DNS_CNAME_RECORD VARCHAR(255),           -- Canonical name (if applicable)
+    DNS_MX_RECORDS TEXT,                     -- JSON array of mail exchange records
+    DNS_CAA_RECORDS TEXT,                    -- Certificate Authority Authorization
+    DNS_TXT_RECORDS TEXT,                    -- TXT records (SPF, DKIM, verification, etc.)
+    DNS_NS_RECORDS TEXT,                     -- Name servers (JSON array)
+    DNS_SOA_RECORD TEXT,                     -- Start of Authority (JSON object)
+
+    TS_LAST_DNS_CHECK TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+    TS_NEXT_DNS_CHECK TIMESTAMP NOT NULL,
+    DNS_CHECK_PRIORITY TINYINT DEFAULT 0    -- Priority of the DNS check, in case we want to schedule a refresh sooner
+) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
+
+CREATE INDEX DOMAIN_DNS_INFORMATION__PRIORITY_NEXT_CHECK_IDX ON DOMAIN_DNS_INFORMATION (NODE_AFFINITY, DNS_CHECK_PRIORITY DESC, TS_NEXT_DNS_CHECK);
+
+CREATE TABLE IF NOT EXISTS DOMAIN_DNS_EVENTS (
+     DNS_ROOT_DOMAIN_ID INT NOT NULL,
+     NODE_ID INT NOT NULL,
+
+     TS_CHANGE TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+
+-- DNS change type flags
+     CHANGE_A_RECORDS BOOLEAN NOT NULL DEFAULT FALSE,        -- IPv4 address changes
+     CHANGE_AAAA_RECORDS BOOLEAN NOT NULL DEFAULT FALSE,     -- IPv6 address changes
+     CHANGE_CNAME BOOLEAN NOT NULL DEFAULT FALSE,            -- CNAME changes
+     CHANGE_MX_RECORDS BOOLEAN NOT NULL DEFAULT FALSE,       -- Mail server changes
+     CHANGE_CAA_RECORDS BOOLEAN NOT NULL DEFAULT FALSE,      -- Certificate authority changes
+     CHANGE_TXT_RECORDS BOOLEAN NOT NULL DEFAULT FALSE,      -- TXT record changes (SPF, DKIM, etc.)
+     CHANGE_NS_RECORDS BOOLEAN NOT NULL DEFAULT FALSE,       -- Name server changes (big red flag!)
+     CHANGE_SOA_RECORD BOOLEAN NOT NULL DEFAULT FALSE,       -- Start of Authority changes
+
+     DNS_SIGNATURE_BEFORE BLOB NOT NULL,  -- Compressed JSON snapshot of DNS records before change
+     DNS_SIGNATURE_AFTER BLOB NOT NULL,    -- Compressed JSON snapshot of DNS records after change
+
+     DNS_EVENT_ID BIGINT AUTO_INCREMENT,
+     P_KEY_MONTH TINYINT NOT NULL DEFAULT MONTH(TS_CHANGE), -- Month of the change for partitioning
+     PRIMARY KEY (DNS_EVENT_ID, P_KEY_MONTH)
+)
+CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
+PARTITION BY RANGE (P_KEY_MONTH) (
+    PARTITION p0 VALUES LESS THAN (1),  -- January
+    PARTITION p1 VALUES LESS THAN (2),  -- February
+    PARTITION p2 VALUES LESS THAN (3),  -- March
+    PARTITION p3 VALUES LESS THAN (4),  -- April
+    PARTITION p4 VALUES LESS THAN (5),  -- May
+    PARTITION p5 VALUES LESS THAN (6),  -- June
+    PARTITION p6 VALUES LESS THAN (7),  -- July
+    PARTITION p7 VALUES LESS THAN (8),  -- August
+    PARTITION p8 VALUES LESS THAN (9),  -- September
+    PARTITION p9 VALUES LESS THAN (10), -- October
+    PARTITION p10 VALUES LESS THAN (11), -- November
+    PARTITION p11 VALUES LESS THAN (12)  -- December
+);
+
+CREATE INDEX DOMAIN_DNS_EVENTS__DNS_ROOT_DOMAIN_ID_TS_IDX ON DOMAIN_DNS_EVENTS (DNS_ROOT_DOMAIN_ID, TS_CHANGE);
+CREATE INDEX DOMAIN_DNS_EVENTS__TS_CHANGE_IDX ON DOMAIN_DNS_EVENTS (TS_CHANGE);
--- a/code/common/model/java/nu/marginalia/model/DocumentFormat.java
+++ b/code/common/model/java/nu/marginalia/model/DocumentFormat.java
@@ -0,0 +1,24 @@
+package nu.marginalia.model;
+
+public enum DocumentFormat {
+    PLAIN(0, 1, "text"),
+    PDF(0, 1, "pdf"),
+    UNKNOWN(0, 1, "???"),
+    HTML123(0, 1, "html"),
+    HTML4(-0.1, 1.05, "html"),
+    XHTML(-0.1, 1.05, "html"),
+    HTML5(0.5, 1.1, "html");
+
+    /** Used to tune quality score */
+    public final double offset;
+    /** Used to tune quality score */
+    public final double scale;
+    public final String shortFormat;
+
+    DocumentFormat(double offset, double scale, String shortFormat) {
+        this.offset = offset;
+        this.scale = scale;
+        this.shortFormat = shortFormat;
+    }
+
+}
--- a/code/common/model/java/nu/marginalia/model/EdgeDomain.java
+++ b/code/common/model/java/nu/marginalia/model/EdgeDomain.java
@@ -14,7 +14,7 @@ public class EdgeDomain implements Serializable {
    @Nonnull
    public final String topDomain;

-    public EdgeDomain(String host) {
+    public EdgeDomain(@Nonnull String host) {
        Objects.requireNonNull(host, "domain name must not be null");

        host = host.toLowerCase();
@@ -61,6 +61,10 @@ public class EdgeDomain implements Serializable {
        this.topDomain = topDomain;
    }

+    public static String getTopDomain(String host) {
+        return new EdgeDomain(host).topDomain;
+    }
+
    private boolean looksLikeGovTld(String host) {
        if (host.length() < 8)
            return false;
@@ -108,32 +112,6 @@ public class EdgeDomain implements Serializable {
        return topDomain;
    }

-    public String getDomainKey() {
-        int cutPoint = topDomain.indexOf('.');
-        if (cutPoint < 0) {
-            return topDomain;
-        }
-        return topDomain.substring(0, cutPoint).toLowerCase();
-    }
-
-    public String getLongDomainKey() {
-        StringBuilder ret = new StringBuilder();
-
-        int cutPoint = topDomain.indexOf('.');
-        if (cutPoint < 0) {
-            ret.append(topDomain);
-        } else {
-            ret.append(topDomain, 0, cutPoint);
-        }
-
-        if (!subDomain.isEmpty() && !"www".equals(subDomain)) {
-            ret.append(":");
-            ret.append(subDomain);
-        }
-
-        return ret.toString().toLowerCase();
-    }
-
    /** If possible, try to provide an alias domain,
     * i.e. a domain name that is very likely to link to this one
     * */
--- a/code/common/model/java/nu/marginalia/model/EdgeUrl.java
+++ b/code/common/model/java/nu/marginalia/model/EdgeUrl.java
@@ -1,16 +1,14 @@
 package nu.marginalia.model;

 import nu.marginalia.util.QueryParams;
+import org.apache.commons.lang3.StringUtils;

 import javax.annotation.Nullable;
 import java.io.Serializable;
-import java.net.MalformedURLException;
-import java.net.URI;
-import java.net.URISyntaxException;
-import java.net.URL;
+import java.net.*;
+import java.nio.charset.StandardCharsets;
 import java.util.Objects;
 import java.util.Optional;
-import java.util.regex.Pattern;

 public class EdgeUrl implements Serializable {
    public final String proto;
@@ -33,7 +31,7 @@ public class EdgeUrl implements Serializable {

    private static URI parseURI(String url) throws URISyntaxException {
        try {
-            return new URI(urlencodeFixer(url));
+            return EdgeUriFactory.parseURILenient(url);
        } catch (URISyntaxException ex) {
            throw new URISyntaxException("Failed to parse URI '" + url + "'", ex.getMessage());
        }
@@ -51,58 +49,6 @@ public class EdgeUrl implements Serializable {
        }
    }

-    private static Pattern badCharPattern = Pattern.compile("[ \t\n\"<>\\[\\]()',|]");
-
-    /* Java's URI parser is a bit too strict in throwing exceptions when there's an error.
-
-       Here on the Internet, standards are like the picture on the box of the frozen pizza,
-       and what you get is more like what's on the inside, we try to patch things instead,
-       just give it a best-effort attempt att cleaning out broken or unnecessary constructions
-       like bad or missing URLEncoding
-     */
-    public static String urlencodeFixer(String url) throws URISyntaxException {
-        var s = new StringBuilder();
-        String goodChars = "&.?:/-;+$#";
-        String hexChars = "0123456789abcdefABCDEF";
-
-        int pathIdx = findPathIdx(url);
-        if (pathIdx < 0) { // url looks like http://marginalia.nu
-            return url + "/";
-        }
-        s.append(url, 0, pathIdx);
-
-        // We don't want the fragment, and multiple fragments breaks the Java URIParser for some reason
-        int end = url.indexOf("#");
-        if (end < 0) end = url.length();
-
-        for (int i = pathIdx; i < end; i++) {
-            int c = url.charAt(i);
-
-            if (goodChars.indexOf(c) >= 0 || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
-                s.appendCodePoint(c);
-            } else if (c == '%' && i + 2 < end) {
-                int cn = url.charAt(i + 1);
-                int cnn = url.charAt(i + 2);
-                if (hexChars.indexOf(cn) >= 0 && hexChars.indexOf(cnn) >= 0) {
-                    s.appendCodePoint(c);
-                } else {
-                    s.append("%25");
-                }
-            } else {
-                s.append(String.format("%%%02X", c));
-            }
-        }
-
-        return s.toString();
-    }
-
-    private static int findPathIdx(String url) throws URISyntaxException {
-        int colonIdx = url.indexOf(':');
-        if (colonIdx < 0 || colonIdx + 2 >= url.length()) {
-            throw new URISyntaxException(url, "Lacking protocol");
-        }
-        return url.indexOf('/', colonIdx + 2);
-    }

    public EdgeUrl(URI URI) {
        try {
@@ -166,11 +112,32 @@ public class EdgeUrl implements Serializable {
            sb.append(port);
        }

+        EdgeUriFactory.urlencodePath(sb, path);
+
+        if (param != null) {
+            EdgeUriFactory.urlencodeQuery(sb, param);
+        }
+
+        return sb.toString();
+    }
+
+
+    public String toDisplayString() {
+        StringBuilder sb = new StringBuilder(256);
+
+        sb.append(proto);
+        sb.append("://");
+        sb.append(domain);
+
+        if (port != null) {
+            sb.append(':');
+            sb.append(port);
+        }
+
        sb.append(path);

        if (param != null) {
-            sb.append('?');
-            sb.append(param);
+            sb.append('?').append(param);
        }

        return sb.toString();
@@ -247,3 +214,244 @@ public class EdgeUrl implements Serializable {
    }

 }
+
+class EdgeUriFactory {
+    public static URI parseURILenient(String url) throws URISyntaxException {
+
+        if (shouldOmitUrlencodeRepair(url)) {
+            try {
+                return new URI(url);
+            }
+            catch (URISyntaxException ex) {
+                // ignore and run the lenient parser
+            }
+        }
+
+        var s = new StringBuilder(url.length()+8);
+
+        int pathIdx = findPathIdx(url);
+        if (pathIdx < 0) { // url looks like http://marginalia.nu
+            return new URI(url + "/");
+        }
+        s.append(url, 0, pathIdx);
+
+        // We don't want the fragment, and multiple fragments breaks the Java URIParser for some reason
+        int end = url.indexOf("#");
+        if (end < 0) end = url.length();
+
+        int queryIdx = url.indexOf('?');
+        if (queryIdx < 0) queryIdx = end;
+
+        urlencodePath(s, url.substring(pathIdx, queryIdx));
+        if (queryIdx < end) {
+            urlencodeQuery(s, url.substring(queryIdx + 1, end));
+        }
+        return new URI(s.toString());
+    }
+
+    /** Break apart the path element of an URI into its components, and then
+     * urlencode any component that needs it, and recombine it into a single
+     * path element again.
+     */
+    public static void urlencodePath(StringBuilder sb, String path) {
+        if (path == null || path.isEmpty()) {
+            return;
+        }
+
+        String[] pathParts = StringUtils.split(path, '/');
+        if (pathParts.length == 0) {
+            sb.append('/');
+            return;
+        }
+
+        boolean shouldUrlEncode = false;
+        for (String pathPart : pathParts) {
+            if (pathPart.isEmpty()) continue;
+
+            if (needsUrlEncode(pathPart)) {
+                shouldUrlEncode = true;
+                break;
+            }
+        }
+
+        for (String pathPart : pathParts) {
+            if (pathPart.isEmpty()) continue;
+
+            if (shouldUrlEncode) {
+                sb.append('/');
+                sb.append(URLEncoder.encode(pathPart, StandardCharsets.UTF_8).replace("+", "%20"));
+            } else {
+                sb.append('/');
+                sb.append(pathPart);
+            }
+        }
+
+        if (path.endsWith("/")) {
+            sb.append('/');
+        }
+
+    }
+
+    /** Break apart the query element of a URI into its components, and then
+     * urlencode any component that needs it, and recombine it into a single
+     * query element again.
+     */
+    public static void urlencodeQuery(StringBuilder sb, String param) {
+        if (param == null || param.isEmpty()) {
+            return;
+        }
+
+        String[] queryParts = StringUtils.split(param, '&');
+
+        boolean shouldUrlEncode = false;
+        for (String queryPart : queryParts) {
+            if (queryPart.isEmpty()) continue;
+
+            if (needsUrlEncode(queryPart)) {
+                shouldUrlEncode = true;
+                break;
+            }
+        }
+
+        boolean first = true;
+        for (String queryPart : queryParts) {
+            if (queryPart.isEmpty()) continue;
+
+            if (first) {
+                sb.append('?');
+                first = false;
+            } else {
+                sb.append('&');
+            }
+
+            if (shouldUrlEncode) {
+                int idx = queryPart.indexOf('=');
+                if (idx < 0) {
+                    sb.append(URLEncoder.encode(queryPart, StandardCharsets.UTF_8));
+                } else {
+                    sb.append(URLEncoder.encode(queryPart.substring(0, idx), StandardCharsets.UTF_8));
+                    sb.append('=');
+                    sb.append(URLEncoder.encode(queryPart.substring(idx + 1), StandardCharsets.UTF_8));
+                }
+            } else {
+                sb.append(queryPart);
+            }
+        }
+    }
+
+    /** Test if the url element needs URL encoding.
+     * <p></p>
+     * Note we may have been given an already encoded path element,
+     * so we include % and + in the list of good characters
+     */
+    static boolean needsUrlEncode(String urlElement) {
+        for (int i = 0; i < urlElement.length(); i++) {
+            char c = urlElement.charAt(i);
+
+            if (isUrlSafe(c)) continue;
+            if ("+".indexOf(c) >= 0) continue;
+            if (c == '%' && i + 2 < urlElement.length()) {
+                char c1 = urlElement.charAt(i + 1);
+                char c2 = urlElement.charAt(i + 2);
+                if (isHexDigit(c1) && isHexDigit(c2)) {
+                    i += 2;
+                    continue;
+                }
+            }
+
+            return true;
+        }
+
+        return false;
+    }
+
+
+    static boolean isUrlSafe(int c) {
+        if (c >= 'a' && c <= 'z') return true;
+        if (c >= 'A' && c <= 'Z') return true;
+        if (c >= '0' && c <= '9') return true;
+        if (c == '-' || c == '_' || c == '.' || c == '~') return true;
+
+        return false;
+    }
+
+    /** Test if the URL is a valid URL that does not need to be
+     * urlencoded.
+     * <p></p>
+     * This is a very simple heuristic test that does not guarantee
+     * that the URL is valid, but it will identify cases where we
+     * are fairly certain that the URL does not need encoding,
+     * so we can skip a bunch of allocations and string operations
+     * that would otherwise be needed to fix the URL.
+     */
+    static boolean shouldOmitUrlencodeRepair(String url) {
+        int idx = 0;
+        final int len = url.length();
+
+        // Validate the scheme
+        while (idx < len - 2) {
+            char c = url.charAt(idx++);
+            if (c == ':') break;
+            if (!isAsciiAlphabetic(c)) return false;
+        }
+        if (url.charAt(idx++) != '/') return false;
+        if (url.charAt(idx++) != '/') return false;
+
+        // Validate the authority
+        while (idx < len) {
+            char c = url.charAt(idx++);
+            if (c == '/') break;
+            if (c == ':') continue;
+            if (c == '@') continue;
+            if (!isUrlSafe(c)) return false;
+        }
+
+        // Validate the path
+        if (idx >= len) return true;
+
+        while (idx < len) {
+            char c = url.charAt(idx++);
+            if (c == '?') break;
+            if (c == '/') continue;
+            if (c == '#') return true;
+            if (!isUrlSafe(c)) return false;
+        }
+
+        if (idx >= len) return true;
+
+        // Validate the query
+        while (idx < len) {
+            char c = url.charAt(idx++);
+            if (c == '&') continue;
+            if (c == '=') continue;
+            if (c == '#') return true;
+            if (!isUrlSafe(c)) return false;
+        }
+
+        return true;
+    }
+
+
+    private static boolean isAsciiAlphabetic(int c) {
+        return (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
+    }
+
+    private static boolean isHexDigit(int c) {
+        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
+    }
+
+    /** Find the index of the path element in a URL.
+     * <p></p>
+     * The path element starts after the scheme and authority part of the URL,
+     * which is everything up to and including the first slash after the colon.
+     */
+    private static int findPathIdx(String url) throws URISyntaxException {
+        int colonIdx = url.indexOf(':');
+        if (colonIdx < 0 || colonIdx + 3 >= url.length()) {
+            throw new URISyntaxException(url, "Lacking scheme");
+        }
+        return url.indexOf('/', colonIdx + 3);
+    }
+
+
+}
--- a/code/common/model/java/nu/marginalia/model/crawl/HtmlFeature.java
+++ b/code/common/model/java/nu/marginalia/model/crawl/HtmlFeature.java
@@ -28,6 +28,8 @@ public enum HtmlFeature {

    GA_SPAM("special:gaspam"),

+    PDF("format:pdf"),
+
    /** For fingerprinting and ranking */
    OPENGRAPH("special:opengraph"),
    OPENGRAPH_IMAGE("special:opengraph:image"),
--- a/code/common/model/java/nu/marginalia/model/gson/GsonFactory.java
+++ b/code/common/model/java/nu/marginalia/model/gson/GsonFactory.java
@@ -6,11 +6,20 @@ import nu.marginalia.model.EdgeDomain;
 import nu.marginalia.model.EdgeUrl;

 import java.net.URISyntaxException;
+import java.time.Instant;

 public class GsonFactory {
    public static Gson get() {
        return new GsonBuilder()
                .registerTypeAdapterFactory(RecordTypeAdapterFactory.builder().allowMissingComponentValues().create())
+                .registerTypeAdapter(Instant.class, (JsonSerializer<Instant>) (src, typeOfSrc, context) -> new JsonPrimitive(src.toEpochMilli()))
+                .registerTypeAdapter(Instant.class, (JsonDeserializer<Instant>) (json, typeOfT, context) -> {
+                    if (json.isJsonPrimitive() && json.getAsJsonPrimitive().isNumber()) {
+                        return Instant.ofEpochMilli(json.getAsLong());
+                    } else {
+                        throw new JsonParseException("Expected a number for Instant");
+                    }
+                })
                .registerTypeAdapter(EdgeUrl.class, (JsonSerializer<EdgeUrl>) (src, typeOfSrc, context) -> new JsonPrimitive(src.toString()))
                .registerTypeAdapter(EdgeDomain.class, (JsonSerializer<EdgeDomain>) (src, typeOfSrc, context) -> new JsonPrimitive(src.toString()))
                .registerTypeAdapter(EdgeUrl.class, (JsonDeserializer<EdgeUrl>) (json, typeOfT, context) -> {
--- a/code/common/model/java/nu/marginalia/model/html/HtmlStandard.java
+++ b/code/common/model/java/nu/marginalia/model/html/HtmlStandard.java
@@ -1,22 +0,0 @@
-package nu.marginalia.model.html;
-
-// This class really doesn't belong anywhere, but will squat here for now
-public enum HtmlStandard {
-    PLAIN(0, 1),
-    UNKNOWN(0, 1),
-    HTML123(0, 1),
-    HTML4(-0.1, 1.05),
-    XHTML(-0.1, 1.05),
-    HTML5(0.5, 1.1);
-
-    /** Used to tune quality score */
-    public final double offset;
-    /** Used to tune quality score */
-    public final double scale;
-
-    HtmlStandard(double offset, double scale) {
-        this.offset = offset;
-        this.scale = scale;
-    }
-
-}
--- a/code/common/model/java/nu/marginalia/model/idx/DocumentFlags.java
+++ b/code/common/model/java/nu/marginalia/model/idx/DocumentFlags.java
@@ -9,7 +9,7 @@ public enum DocumentFlags {
    GeneratorForum,
    GeneratorWiki,
    Sideloaded,
-    Unused7,
+    PdfFile,
    Unused8,
    ;

--- a/code/common/model/java/nu/marginalia/util/QueryParams.java
+++ b/code/common/model/java/nu/marginalia/util/QueryParams.java
@@ -83,6 +83,11 @@ public class QueryParams {
        if (path.endsWith("StoryView.py")) { // folklore.org is neat
            return param.startsWith("project=") || param.startsWith("story=");
        }
+
+        // www.perseus.tufts.edu:
+        if (param.startsWith("collection=")) return true;
+        if (param.startsWith("doc=")) return true;
+
        return false;
    }
 }
--- a/code/common/model/test/nu/marginalia/model/EdgeDomainTest.java
+++ b/code/common/model/test/nu/marginalia/model/EdgeDomainTest.java
@@ -8,14 +8,6 @@ import static org.junit.jupiter.api.Assertions.assertEquals;

 class EdgeDomainTest {

-    @Test
-    public void testSkepdic() throws URISyntaxException {
-        var domain = new EdgeUrl("http://www.skepdic.com/astrology.html");
-        assertEquals("skepdic", domain.getDomain().getDomainKey());
-        var domain2 = new EdgeUrl("http://skepdic.com/astrology.html");
-        assertEquals("skepdic", domain2.getDomain().getDomainKey());
-    }
-
    @Test
    public void testHkDomain() throws URISyntaxException {
        var domain = new EdgeUrl("http://l7072i3.l7c.net");
--- a/code/common/model/test/nu/marginalia/model/EdgeUrlTest.java
+++ b/code/common/model/test/nu/marginalia/model/EdgeUrlTest.java
@@ -1,6 +1,6 @@
 package nu.marginalia.model;

-import nu.marginalia.model.EdgeUrl;
+import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Test;

 import java.net.URISyntaxException;
@@ -21,25 +21,70 @@ class EdgeUrlTest {
                new EdgeUrl("https://memex.marginalia.nu/#here")
        );
    }
+
    @Test
-    public void testParam() throws URISyntaxException {
-        System.out.println(new EdgeUrl("https://memex.marginalia.nu/index.php?id=1").toString());
-        System.out.println(new EdgeUrl("https://memex.marginalia.nu/showthread.php?id=1&count=5&tracking=123").toString());
-    }
-    @Test
-    void urlencodeFixer() throws URISyntaxException {
-        System.out.println(EdgeUrl.urlencodeFixer("https://www.example.com/#heredoc"));
-        System.out.println(EdgeUrl.urlencodeFixer("https://www.example.com/%-sign"));
-        System.out.println(EdgeUrl.urlencodeFixer("https://www.example.com/%22-sign"));
-        System.out.println(EdgeUrl.urlencodeFixer("https://www.example.com/\n \"huh\""));
+    void testUriFromString() throws URISyntaxException {
+        // We test these URLs several times as we perform URLEncode-fixing both when parsing the URL and when
+        // converting it back to a string, we want to ensure there is no changes along the way.
+
+        Assertions.assertEquals("/", EdgeUriFactory.parseURILenient("https://www.example.com/").getPath());
+        Assertions.assertEquals("https://www.example.com/", EdgeUriFactory.parseURILenient("https://www.example.com/").toString());
+        Assertions.assertEquals("https://www.example.com/", new EdgeUrl("https://www.example.com/").toString());
+
+        Assertions.assertEquals("/", EdgeUriFactory.parseURILenient("https://www.example.com/#heredoc").getPath());
+        Assertions.assertEquals("https://www.example.com/", EdgeUriFactory.parseURILenient("https://www.example.com/#heredoc").toString());
+        Assertions.assertEquals("https://www.example.com/", new EdgeUrl("https://www.example.com/#heredoc").toString());
+
+        Assertions.assertEquals("/trailingslash/", EdgeUriFactory.parseURILenient("https://www.example.com/trailingslash/").getPath());
+        Assertions.assertEquals("https://www.example.com/trailingslash/", EdgeUriFactory.parseURILenient("https://www.example.com/trailingslash/").toString());
+        Assertions.assertEquals("https://www.example.com/trailingslash/", new EdgeUrl("https://www.example.com/trailingslash/").toString());
+
+        Assertions.assertEquals("/%-sign", EdgeUriFactory.parseURILenient("https://www.example.com/%-sign").getPath());
+        Assertions.assertEquals("https://www.example.com/%25-sign", EdgeUriFactory.parseURILenient("https://www.example.com/%-sign").toString());
+        Assertions.assertEquals("https://www.example.com/%25-sign", new EdgeUrl("https://www.example.com/%-sign").toString());
+
+        Assertions.assertEquals("/%-sign/\"-sign", EdgeUriFactory.parseURILenient("https://www.example.com//%-sign/\"-sign").getPath());
+        Assertions.assertEquals("https://www.example.com/%25-sign/%22-sign", EdgeUriFactory.parseURILenient("https://www.example.com//%-sign/\"-sign").toString());
+        Assertions.assertEquals("https://www.example.com/%25-sign/%22-sign", new EdgeUrl("https://www.example.com//%-sign/\"-sign").toString());
+
+        Assertions.assertEquals("/\"-sign", EdgeUriFactory.parseURILenient("https://www.example.com/%22-sign").getPath());
+        Assertions.assertEquals("https://www.example.com/%22-sign", EdgeUriFactory.parseURILenient("https://www.example.com/%22-sign").toString());
+        Assertions.assertEquals("https://www.example.com/%22-sign", new EdgeUrl("https://www.example.com/%22-sign").toString());
+
+        Assertions.assertEquals("/\n \"huh\"", EdgeUriFactory.parseURILenient("https://www.example.com/\n \"huh\"").getPath());
+        Assertions.assertEquals("https://www.example.com/%0A%20%22huh%22", EdgeUriFactory.parseURILenient("https://www.example.com/\n \"huh\"").toString());
+        Assertions.assertEquals("https://www.example.com/%0A%20%22huh%22", new EdgeUrl("https://www.example.com/\n \"huh\"").toString());
+
+        Assertions.assertEquals("/wiki/Sámi", EdgeUriFactory.parseURILenient("https://en.wikipedia.org/wiki/Sámi").getPath());
+        Assertions.assertEquals("https://en.wikipedia.org/wiki/S%C3%A1mi", EdgeUriFactory.parseURILenient("https://en.wikipedia.org/wiki/Sámi").toString());
+        Assertions.assertEquals("https://en.wikipedia.org/wiki/S%C3%A1mi", new EdgeUrl("https://en.wikipedia.org/wiki/Sámi").toString());
+
+        Assertions.assertEquals("https://www.prijatelji-zivotinja.hr/index.en.php?id=2301k", new EdgeUrl("https://www.prijatelji-zivotinja.hr/index.en.php?id=2301k").toString());
    }

    @Test
    void testParms() throws URISyntaxException {
-        System.out.println(new EdgeUrl("https://search.marginalia.nu/?id=123"));
-        System.out.println(new EdgeUrl("https://search.marginalia.nu/?t=123"));
-        System.out.println(new EdgeUrl("https://search.marginalia.nu/?v=123"));
-        System.out.println(new EdgeUrl("https://search.marginalia.nu/?m=123"));
-        System.out.println(new EdgeUrl("https://search.marginalia.nu/?follow=123"));
+        Assertions.assertEquals("id=123", new EdgeUrl("https://search.marginalia.nu/?id=123").param);
+        Assertions.assertEquals("https://search.marginalia.nu/?id=123", new EdgeUrl("https://search.marginalia.nu/?id=123").toString());
+
+        Assertions.assertEquals("t=123", new EdgeUrl("https://search.marginalia.nu/?t=123").param);
+        Assertions.assertEquals("https://search.marginalia.nu/?t=123", new EdgeUrl("https://search.marginalia.nu/?t=123").toString());
+
+        Assertions.assertEquals("v=123", new EdgeUrl("https://search.marginalia.nu/?v=123").param);
+        Assertions.assertEquals("https://search.marginalia.nu/?v=123", new EdgeUrl("https://search.marginalia.nu/?v=123").toString());
+
+        Assertions.assertEquals("id=1", new EdgeUrl("https://memex.marginalia.nu/showthread.php?id=1&count=5&tracking=123").param);
+        Assertions.assertEquals("https://memex.marginalia.nu/showthread.php?id=1",
+                new EdgeUrl("https://memex.marginalia.nu/showthread.php?id=1&count=5&tracking=123").toString());
+
+
+        Assertions.assertEquals("id=1&t=5", new EdgeUrl("https://memex.marginalia.nu/shöwthrëad.php?id=1&t=5&tracking=123").param);
+        Assertions.assertEquals("https://memex.marginalia.nu/sh%C3%B6wthr%C3%ABad.php?id=1&t=5", new EdgeUrl("https://memex.marginalia.nu/shöwthrëad.php?id=1&t=5&tracking=123").toString());
+
+        Assertions.assertEquals("id=1&t=5", new EdgeUrl("https://memex.marginalia.nu/shöwthrëad.php?trëaking=123&id=1&t=5&").param);
+        Assertions.assertEquals("https://memex.marginalia.nu/sh%C3%B6wthr%C3%ABad.php?id=1&t=5", new EdgeUrl("https://memex.marginalia.nu/shöwthrëad.php?trëaking=123&id=1&t=5&").toString());
+
+        Assertions.assertNull(new EdgeUrl("https://search.marginalia.nu/?m=123").param);
+        Assertions.assertNull(new EdgeUrl("https://search.marginalia.nu/?follow=123").param);
    }
 }
--- a/code/common/service/build.gradle
+++ b/code/common/service/build.gradle
@@ -42,6 +42,12 @@ dependencies {
    implementation libs.bundles.curator
    implementation libs.bundles.flyway

+    libs.bundles.jooby.get().each {
+        implementation dependencies.create(it) {
+            exclude group: 'org.slf4j'
+        }
+    }
+
    testImplementation libs.bundles.slf4j.test
    implementation libs.bundles.mariadb

--- a/code/common/service/java/nu/marginalia/process/control/ProcessAdHocTaskHeartbeatImpl.java
+++ b/code/common/service/java/nu/marginalia/process/control/ProcessAdHocTaskHeartbeatImpl.java
@@ -59,17 +59,14 @@ public class ProcessAdHocTaskHeartbeatImpl implements AutoCloseable, ProcessAdHo
     */
    @Override
    public void progress(String step, int stepProgress, int stepCount) {
+        int lastProgress = this.progress;
        this.step = step;
-
-
-        // off by one since we calculate the progress based on the number of steps,
-        // and Enum.ordinal() is zero-based (so the 5th step in a 5 step task is 4, not 5; resulting in the
-        // final progress being 80% and not 100%)
-
        this.progress = (int) Math.round(100. * stepProgress / (double) stepCount);

+        if (this.progress / 10 != lastProgress / 10) {
            logger.info("ProcessTask {} progress: {}%", taskBase, progress);
        }
+    }

    /** Wrap a collection to provide heartbeat progress updates as it's iterated through */
    @Override
--- a/code/common/service/java/nu/marginalia/process/control/ProcessEventLog.java
+++ b/code/common/service/java/nu/marginalia/process/control/ProcessEventLog.java
@@ -0,0 +1,59 @@
+package nu.marginalia.process.control;
+
+import com.google.inject.Inject;
+import com.google.inject.Singleton;
+import com.zaxxer.hikari.HikariDataSource;
+import nu.marginalia.process.ProcessConfiguration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.sql.SQLException;
+import java.util.Objects;
+import java.util.UUID;
+
+@Singleton
+public class ProcessEventLog {
+    private final HikariDataSource dataSource;
+
+    private final Logger logger = LoggerFactory.getLogger(ProcessEventLog.class);
+
+    private final String serviceName;
+    private final UUID instanceUuid;
+    private final String serviceBase;
+
+    @Inject
+    public ProcessEventLog(HikariDataSource dataSource, ProcessConfiguration configuration) {
+        this.dataSource = dataSource;
+
+        this.serviceName = configuration.processName() + ":" + configuration.node();
+        this.instanceUuid = configuration.instanceUuid();
+        this.serviceBase = configuration.processName();
+
+        logger.info("Starting service {} instance {}", serviceName, instanceUuid);
+
+        logEvent("PCS-START", serviceName);
+    }
+
+    public void logEvent(Class<?> type, String message) {
+        logEvent(type.getSimpleName(), message);
+    }
+    public void logEvent(String type, String message) {
+
+        try (var conn = dataSource.getConnection();
+             var stmt = conn.prepareStatement("""
+                        INSERT INTO SERVICE_EVENTLOG(SERVICE_NAME, SERVICE_BASE, INSTANCE, EVENT_TYPE, EVENT_MESSAGE)
+                        VALUES (?, ?, ?, ?, ?)
+                     """)) {
+            stmt.setString(1, serviceName);
+            stmt.setString(2, serviceBase);
+            stmt.setString(3, instanceUuid.toString());
+            stmt.setString(4, type);
+            stmt.setString(5, Objects.requireNonNull(message, ""));
+
+            stmt.executeUpdate();
+        }
+        catch (SQLException ex) {
+            logger.error("Failed to log event {}:{}", type, message);
+        }
+    }
+}
--- a/code/common/service/java/nu/marginalia/process/log/WorkLog.java
+++ b/code/common/service/java/nu/marginalia/process/log/WorkLog.java
@@ -10,7 +10,9 @@ import java.nio.charset.StandardCharsets;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.time.LocalDateTime;
-import java.util.*;
+import java.util.HashSet;
+import java.util.Optional;
+import java.util.Set;
 import java.util.function.Function;

 /** WorkLog is a journal of work done by a process,
@@ -61,6 +63,12 @@ public class WorkLog implements AutoCloseable, Closeable {
        return new WorkLoadIterable<>(logFile, mapper);
    }

+    public static int countEntries(Path crawlerLog) throws IOException{
+        try (var linesStream = Files.lines(crawlerLog)) {
+            return (int) linesStream.filter(WorkLogEntry::isJobId).count();
+        }
+    }
+
    // Use synchro over concurrent set to avoid competing writes
    // - correct is better than fast here, it's sketchy enough to use
    // a PrintWriter
--- a/code/common/service/java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java
+++ b/code/common/service/java/nu/marginalia/service/client/GrpcMultiNodeChannelPool.java
@@ -7,8 +7,6 @@ import nu.marginalia.service.discovery.property.PartitionTraits;
 import nu.marginalia.service.discovery.property.ServiceEndpoint;
 import nu.marginalia.service.discovery.property.ServiceKey;
 import nu.marginalia.service.discovery.property.ServicePartition;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;

 import java.util.List;
 import java.util.concurrent.CompletableFuture;
@@ -24,7 +22,7 @@ import java.util.function.Function;
 public class GrpcMultiNodeChannelPool<STUB> {
    private final ConcurrentHashMap<Integer, GrpcSingleNodeChannelPool<STUB>> pools =
            new ConcurrentHashMap<>();
-    private static final Logger logger = LoggerFactory.getLogger(GrpcMultiNodeChannelPool.class);
+
    private final ServiceRegistryIf serviceRegistryIf;
    private final ServiceKey<? extends PartitionTraits.Multicast> serviceKey;
    private final Function<ServiceEndpoint.InstanceAddress, ManagedChannel> channelConstructor;
--- a/code/common/service/java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java
+++ b/code/common/service/java/nu/marginalia/service/client/GrpcSingleNodeChannelPool.java
@@ -10,6 +10,8 @@ import nu.marginalia.service.discovery.property.ServiceKey;
 import org.jetbrains.annotations.NotNull;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import org.slf4j.Marker;
+import org.slf4j.MarkerFactory;

 import java.time.Duration;
 import java.util.*;
@@ -26,13 +28,13 @@ import java.util.function.Function;
 public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
    private final Map<InstanceAddress, ConnectionHolder> channels = new ConcurrentHashMap<>();

+    private final Marker grpcMarker = MarkerFactory.getMarker("GRPC");
    private static final Logger logger = LoggerFactory.getLogger(GrpcSingleNodeChannelPool.class);

    private final ServiceRegistryIf serviceRegistryIf;
    private final Function<InstanceAddress, ManagedChannel> channelConstructor;
    private final Function<ManagedChannel, STUB> stubConstructor;

-
    public GrpcSingleNodeChannelPool(ServiceRegistryIf serviceRegistryIf,
                                     ServiceKey<? extends PartitionTraits.Unicast> serviceKey,
                                     Function<InstanceAddress, ManagedChannel> channelConstructor,
@@ -48,8 +50,6 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
        serviceRegistryIf.registerMonitor(this);

        onChange();
-
-        awaitChannel(Duration.ofSeconds(5));
    }


@@ -62,10 +62,10 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
        for (var route : Sets.symmetricDifference(oldRoutes, newRoutes)) {
            ConnectionHolder oldChannel;
            if (newRoutes.contains(route)) {
-                logger.info("Adding route {}", route);
+                logger.info(grpcMarker, "Adding route {} => {}", serviceKey, route);
                oldChannel = channels.put(route, new ConnectionHolder(route));
            } else {
-                logger.info("Expelling route {}", route);
+                logger.info(grpcMarker, "Expelling route {} => {}", serviceKey, route);
                oldChannel = channels.remove(route);
            }
            if (oldChannel != null) {
@@ -103,7 +103,7 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
            }

            try {
-                logger.info("Creating channel for {}:{}", serviceKey, address);
+                logger.info(grpcMarker, "Creating channel for {} => {}", serviceKey, address);
                value = channelConstructor.apply(address);
                if (channel.compareAndSet(null, value)) {
                    return value;
@@ -114,7 +114,7 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
                }
            }
            catch (Exception e) {
-                logger.error("Failed to get channel for " + address, e);
+                logger.error(grpcMarker, "Failed to get channel for " + address, e);
                return null;
            }
        }
@@ -206,7 +206,7 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
        }

        for (var e : exceptions) {
-            logger.error("Failed to call service {}", serviceKey, e);
+            logger.error(grpcMarker, "Failed to call service {}", serviceKey, e);
        }

        throw new ServiceNotAvailableException(serviceKey);
--- a/code/common/service/java/nu/marginalia/service/client/ServiceNotAvailableException.java
+++ b/code/common/service/java/nu/marginalia/service/client/ServiceNotAvailableException.java
@@ -4,6 +4,11 @@ import nu.marginalia.service.discovery.property.ServiceKey;

 public class ServiceNotAvailableException extends RuntimeException {
    public ServiceNotAvailableException(ServiceKey<?> key) {
-        super("Service " + key + " not available");
+        super(key.toString());
+    }
+
+    @Override
+    public StackTraceElement[] getStackTrace() { // Suppress stack trace
+        return new StackTraceElement[0];
    }
 }
--- a/code/common/service/java/nu/marginalia/service/control/ServiceAdHocTaskHeartbeatImpl.java
+++ b/code/common/service/java/nu/marginalia/service/control/ServiceAdHocTaskHeartbeatImpl.java
@@ -57,16 +57,13 @@ public class ServiceAdHocTaskHeartbeatImpl implements AutoCloseable, ServiceAdHo
     */
    @Override
    public void progress(String step, int stepProgress, int stepCount) {
+        int lastProgress = this.progress;
        this.step = step;
-
-
-        // off by one since we calculate the progress based on the number of steps,
-        // and Enum.ordinal() is zero-based (so the 5th step in a 5 step task is 4, not 5; resulting in the
-        // final progress being 80% and not 100%)
-
        this.progress = (int) Math.round(100. * stepProgress / (double) stepCount);

-        logger.info("ServiceTask {} progress: {}%", taskBase, progress);
+        if (this.progress / 10 != lastProgress / 10) {
+            logger.info("ProcessTask {} progress: {}%", taskBase, progress);
+        }
    }

    public void shutDown() {
--- a/code/common/service/java/nu/marginalia/service/discovery/ServiceRegistryIf.java
+++ b/code/common/service/java/nu/marginalia/service/discovery/ServiceRegistryIf.java
@@ -1,17 +1,23 @@
 package nu.marginalia.service.discovery;

-import nu.marginalia.service.discovery.monitor.*;
+import com.google.inject.ImplementedBy;
+import nu.marginalia.service.discovery.monitor.ServiceChangeMonitor;
+import nu.marginalia.service.discovery.monitor.ServiceMonitorIf;
 import nu.marginalia.service.discovery.property.ServiceEndpoint;
-import static nu.marginalia.service.discovery.property.ServiceEndpoint.*;
-
 import nu.marginalia.service.discovery.property.ServiceKey;

+import java.util.Collection;
 import java.util.List;
 import java.util.UUID;
+import java.util.function.BiConsumer;
+import java.util.function.Consumer;
+
+import static nu.marginalia.service.discovery.property.ServiceEndpoint.InstanceAddress;

 /** A service registry that allows services to register themselves and
 * be discovered by other services on the network.
 */
+@ImplementedBy(ZkServiceRegistry.class)
 public interface ServiceRegistryIf {
    /**
     * Register a service with the registry.
@@ -57,4 +63,9 @@ public interface ServiceRegistryIf {
     * </ul>
     * */
    void registerMonitor(ServiceMonitorIf monitor) throws Exception;
+
+    void registerProcess(String processName, int nodeId);
+    void deregisterProcess(String processName, int nodeId);
+    void watchProcess(String processName, int nodeId, Consumer<Boolean> callback) throws Exception;
+    void watchProcessAnyNode(String processName, Collection<Integer> nodes, BiConsumer<Boolean, Integer> callback) throws Exception;
 }
--- a/code/common/service/java/nu/marginalia/service/discovery/ZkServiceRegistry.java
+++ b/code/common/service/java/nu/marginalia/service/discovery/ZkServiceRegistry.java
@@ -13,11 +13,10 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 import java.nio.charset.StandardCharsets;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Random;
-import java.util.UUID;
+import java.util.*;
 import java.util.concurrent.TimeUnit;
+import java.util.function.BiConsumer;
+import java.util.function.Consumer;

 import static nu.marginalia.service.discovery.property.ServiceEndpoint.InstanceAddress;

@@ -256,6 +255,90 @@ public class ZkServiceRegistry implements ServiceRegistryIf {
                .forPath("/running-instances");
    }

+    @Override
+    public void registerProcess(String processName, int nodeId) {
+        String path = "/process-locks/" + processName + "/" + nodeId;
+        try {
+            curatorFramework.create()
+                    .creatingParentsIfNeeded()
+                    .withMode(CreateMode.EPHEMERAL)
+                    .forPath(path);
+            livenessPaths.add(path);
+        }
+        catch (Exception ex) {
+            logger.error("Failed to register process {} on node {}", processName, nodeId, ex);
+        }
+    }
+
+    @Override
+    public void deregisterProcess(String processName, int nodeId) {
+        String path = "/process-locks/" + processName + "/" + nodeId;
+        try {
+            curatorFramework.delete().forPath(path);
+            livenessPaths.remove(path);
+        }
+        catch (Exception ex) {
+            logger.error("Failed to deregister process {} on node {}", processName, nodeId, ex);
+        }
+    }
+
+    @Override
+    public void watchProcess(String processName, int nodeId, Consumer<Boolean> callback) throws Exception {
+        String path = "/process-locks/" + processName + "/" + nodeId;
+
+        // first check if the path exists and call the callback accordingly
+
+        if (curatorFramework.checkExists().forPath(path) != null) {
+            callback.accept(true);
+        }
+        else {
+            callback.accept(false);
+        }
+
+        curatorFramework.watchers().add()
+                .usingWatcher((Watcher) change -> {
+                    Watcher.Event.EventType type = change.getType();
+
+                    if (type == Watcher.Event.EventType.NodeCreated) {
+                        callback.accept(true);
+                    }
+                    if (type == Watcher.Event.EventType.NodeDeleted) {
+                        callback.accept(false);
+                    }
+                })
+                .forPath(path);
+
+    }
+
+    @Override
+    public void watchProcessAnyNode(String processName, Collection<Integer> nodes, BiConsumer<Boolean, Integer> callback) throws Exception {
+
+        for (int node : nodes) {
+            String path = "/process-locks/" + processName + "/" + node;
+
+            // first check if the path exists and call the callback accordingly
+            if (curatorFramework.checkExists().forPath(path) != null) {
+                callback.accept(true, node);
+            }
+            else {
+                callback.accept(false, node);
+            }
+
+            curatorFramework.watchers().add()
+                    .usingWatcher((Watcher) change -> {
+                        Watcher.Event.EventType type = change.getType();
+
+                        if (type == Watcher.Event.EventType.NodeCreated) {
+                            callback.accept(true, node);
+                        }
+                        if (type == Watcher.Event.EventType.NodeDeleted) {
+                            callback.accept(false, node);
+                        }
+                    })
+                    .forPath(path);
+        }
+    }
+
    /* Exposed for tests */
    public synchronized void shutDown() {
        if (stopped)
--- a/code/common/service/java/nu/marginalia/service/discovery/property/ServiceEndpoint.java
+++ b/code/common/service/java/nu/marginalia/service/discovery/property/ServiceEndpoint.java
@@ -48,5 +48,10 @@ public record ServiceEndpoint(String host, int port) {
        public int port() {
            return endpoint.port();
        }
+
+        @Override
+        public String toString() {
+            return endpoint().host() + ":" + endpoint.port() + " [" + instance + "]";
+        }
    }
 }
--- a/code/common/service/java/nu/marginalia/service/discovery/property/ServiceKey.java
+++ b/code/common/service/java/nu/marginalia/service/discovery/property/ServiceKey.java
@@ -48,6 +48,19 @@ public sealed interface ServiceKey<P extends ServicePartition> {
        {
            throw new UnsupportedOperationException();
        }
+
+        @Override
+        public String toString() {
+            final String shortName;
+
+            int periodIndex = name.lastIndexOf('.');
+
+            if (periodIndex >= 0) shortName = name.substring(periodIndex+1);
+            else shortName = name;
+
+            return "rest:" + shortName;
+        }
+
    }
    record Grpc<P extends ServicePartition>(String name, P partition) implements ServiceKey<P> {
        public String baseName() {
@@ -64,6 +77,18 @@ public sealed interface ServiceKey<P extends ServicePartition> {
        {
            return new Grpc<>(name, partition);
        }
+
+        @Override
+        public String toString() {
+            final String shortName;
+
+            int periodIndex = name.lastIndexOf('.');
+
+            if (periodIndex >= 0) shortName = name.substring(periodIndex+1);
+            else shortName = name;
+
+            return "grpc:" + shortName + "[" + partition.identifier() + "]";
+        }
    }

 }
--- a/code/common/service/java/nu/marginalia/service/module/DatabaseModule.java
+++ b/code/common/service/java/nu/marginalia/service/module/DatabaseModule.java
@@ -89,7 +89,7 @@ public class DatabaseModule extends AbstractModule {
            config.addDataSourceProperty("prepStmtCacheSize", "250");
            config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");

-            config.setMaximumPoolSize(5);
+            config.setMaximumPoolSize(Integer.getInteger("db.poolSize", 5));
            config.setMinimumIdle(2);

            config.setMaxLifetime(Duration.ofMinutes(9).toMillis());
--- a/code/common/service/java/nu/marginalia/service/module/ServiceConfigurationModule.java
+++ b/code/common/service/java/nu/marginalia/service/module/ServiceConfigurationModule.java
@@ -6,6 +6,7 @@ import nu.marginalia.service.ServiceId;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

+import java.io.IOException;
 import java.net.InetAddress;
 import java.net.NetworkInterface;
 import java.util.Enumeration;
@@ -115,11 +116,12 @@ public class ServiceConfigurationModule extends AbstractModule {
        }
    }

-    public static String getLocalNetworkIP() throws Exception {
+    public static String getLocalNetworkIP() throws IOException {
        Enumeration<NetworkInterface> nets = NetworkInterface.getNetworkInterfaces();

        while (nets.hasMoreElements()) {
            NetworkInterface netif = nets.nextElement();
+            logger.info("Considering network interface {}:  Up? {},  Loopback? {}", netif.getDisplayName(), netif.isUp(), netif.isLoopback());
            if (!netif.isUp() || netif.isLoopback()) {
                continue;
            }
@@ -127,6 +129,7 @@ public class ServiceConfigurationModule extends AbstractModule {
            Enumeration<InetAddress> inetAddresses = netif.getInetAddresses();
            while (inetAddresses.hasMoreElements()) {
                InetAddress addr = inetAddresses.nextElement();
+                logger.info("Considering address {}: SiteLocal? {}, Loopback? {}", addr.getHostAddress(), addr.isSiteLocalAddress(), addr.isLoopbackAddress());
                if (addr.isSiteLocalAddress() && !addr.isLoopbackAddress()) {
                    return addr.getHostAddress();
                }
--- a/code/common/service/java/nu/marginalia/service/server/JoobyService.java
+++ b/code/common/service/java/nu/marginalia/service/server/JoobyService.java
@@ -0,0 +1,187 @@
+package nu.marginalia.service.server;
+
+import io.jooby.*;
+import io.prometheus.client.Counter;
+import nu.marginalia.mq.inbox.MqInboxIf;
+import nu.marginalia.service.client.ServiceNotAvailableException;
+import nu.marginalia.service.discovery.property.ServiceEndpoint;
+import nu.marginalia.service.discovery.property.ServiceKey;
+import nu.marginalia.service.discovery.property.ServicePartition;
+import nu.marginalia.service.module.ServiceConfiguration;
+import nu.marginalia.service.server.jte.JteModule;
+import nu.marginalia.service.server.mq.ServiceMqSubscription;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.slf4j.Marker;
+import org.slf4j.MarkerFactory;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.List;
+
+public class JoobyService {
+    private final Logger logger = LoggerFactory.getLogger(getClass());
+
+    // Marker for filtering out sensitive content from the persistent logs
+    private final Marker httpMarker = MarkerFactory.getMarker("HTTP");
+
+    private final Initialization initialization;
+
+    private final static Counter request_counter = Counter.build("wmsa_request_counter", "Request Counter")
+            .labelNames("service", "node")
+            .register();
+    private final static Counter request_counter_good = Counter.build("wmsa_request_counter_good", "Good Requests")
+            .labelNames("service", "node")
+            .register();
+    private final static Counter request_counter_bad = Counter.build("wmsa_request_counter_bad", "Bad Requests")
+            .labelNames("service", "node")
+            .register();
+    private final static Counter request_counter_err = Counter.build("wmsa_request_counter_err", "Error Requests")
+            .labelNames("service", "node")
+            .register();
+    private final String serviceName;
+    private static volatile boolean initialized = false;
+
+    protected final MqInboxIf messageQueueInbox;
+    private final int node;
+    private GrpcServer grpcServer;
+
+    private ServiceConfiguration config;
+    private final List<MvcExtension> joobyServices;
+    private final ServiceEndpoint restEndpoint;
+
+    public JoobyService(BaseServiceParams params,
+                        ServicePartition partition,
+                        List<DiscoverableService> grpcServices,
+                        List<MvcExtension> joobyServices
+    ) throws Exception {
+
+        this.joobyServices = joobyServices;
+        this.initialization = params.initialization;
+        config = params.configuration;
+        node = config.node();
+
+        String inboxName = config.serviceName();
+        logger.info("Inbox name: {}", inboxName);
+
+        var serviceRegistry = params.serviceRegistry;
+
+        restEndpoint = serviceRegistry.registerService(ServiceKey.forRest(config.serviceId(), config.node()),
+                config.instanceUuid(), config.externalAddress());
+
+        var mqInboxFactory = params.messageQueueInboxFactory;
+        messageQueueInbox = mqInboxFactory.createSynchronousInbox(inboxName, config.node(), config.instanceUuid());
+        messageQueueInbox.subscribe(new ServiceMqSubscription(this));
+
+        serviceName = System.getProperty("service-name");
+
+        initialization.addCallback(params.heartbeat::start);
+        initialization.addCallback(messageQueueInbox::start);
+        initialization.addCallback(() -> params.eventLog.logEvent("SVC-INIT", serviceName + ":" + config.node()));
+        initialization.addCallback(() -> serviceRegistry.announceInstance(config.instanceUuid()));
+
+        Thread.setDefaultUncaughtExceptionHandler((t, e) -> {
+            if (e instanceof ServiceNotAvailableException) {
+                // reduce log spam for this common case
+                logger.error("Service not available: {}", e.getMessage());
+            }
+            else {
+                logger.error("Uncaught exception", e);
+            }
+            request_counter_err.labels(serviceName, Integer.toString(node)).inc();
+        });
+
+        if (!initialization.isReady() && ! initialized ) {
+            initialized = true;
+            grpcServer = new GrpcServer(config, serviceRegistry, partition, grpcServices);
+            grpcServer.start();
+        }
+    }
+
+    public void startJooby(Jooby jooby) {
+
+        logger.info("{} Listening to {}:{} ({})", getClass().getSimpleName(),
+                restEndpoint.host(),
+                restEndpoint.port(),
+                config.externalAddress());
+
+        // FIXME:  This won't work outside of docker, may need to submit a PR to jooby to allow classpaths here
+        if (Files.exists(Path.of("/app/resources/jte")) || Files.exists(Path.of("/app/classes/jte-precompiled"))) {
+            jooby.install(new JteModule(Path.of("/app/resources/jte"), Path.of("/app/classes/jte-precompiled")));
+        }
+        if (Files.exists(Path.of("/app/resources/static"))) {
+            jooby.assets("/*", Paths.get("/app/resources/static"));
+        }
+        var options = new ServerOptions();
+        options.setHost(config.bindAddress());
+        options.setPort(restEndpoint.port());
+
+        // Enable gzip compression of response data, but set compression to the lowest level
+        // since it doesn't really save much more space to dial it up.  It's typically a
+        // single digit percentage difference since HTML already compresses very well with level = 1.
+        options.setCompressionLevel(1);
+
+        // Set a cap on the number of worker threads, as Jooby's default value does not seem to consider
+        // multi-tenant servers with high thread counts, and spins up an exorbitant number of threads in that
+        // scenario
+        options.setWorkerThreads(Math.min(128, options.getWorkerThreads()));
+
+
+        jooby.setServerOptions(options);
+
+        jooby.get("/internal/ping", ctx -> "pong");
+        jooby.get("/internal/started", this::isInitialized);
+        jooby.get("/internal/ready", this::isReady);
+
+        for (var service : joobyServices) {
+            jooby.mvc(service);
+        }
+
+        jooby.before(this::auditRequestIn);
+        jooby.after(this::auditRequestOut);
+    }
+
+    private Object isInitialized(Context ctx) {
+        if (initialization.isReady()) {
+            return "ok";
+        }
+        else {
+            ctx.setResponseCode(StatusCode.FAILED_DEPENDENCY_CODE);
+            return "bad";
+        }
+    }
+
+    public boolean isReady() {
+        return true;
+    }
+
+    private String isReady(Context ctx) {
+        if (isReady()) {
+            return "ok";
+        }
+        else {
+            ctx.setResponseCode(StatusCode.FAILED_DEPENDENCY_CODE);
+            return "bad";
+        }
+    }
+
+    private void auditRequestIn(Context ctx) {
+        request_counter.labels(serviceName, Integer.toString(node)).inc();
+    }
+
+    private void auditRequestOut(Context ctx, Object result, Throwable failure) {
+        if (ctx.getResponseCode().value() < 400) {
+            request_counter_good.labels(serviceName, Integer.toString(node)).inc();
+        }
+        else {
+            request_counter_bad.labels(serviceName, Integer.toString(node)).inc();
+        }
+
+        if (failure != null) {
+            logger.error("Request failed " + ctx.getMethod() + " " + ctx.getRequestURL(), failure);
+            request_counter_err.labels(serviceName, Integer.toString(node)).inc();
+        }
+    }
+
+}
--- a/code/common/service/java/nu/marginalia/service/server/MetricsServer.java
+++ b/code/common/service/java/nu/marginalia/service/server/MetricsServer.java
@@ -6,17 +6,22 @@ import nu.marginalia.service.module.ServiceConfiguration;
 import org.eclipse.jetty.server.Server;
 import org.eclipse.jetty.servlet.ServletContextHandler;
 import org.eclipse.jetty.servlet.ServletHolder;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;

 import java.net.InetSocketAddress;

 public class MetricsServer {

+    private static final Logger logger = LoggerFactory.getLogger(MetricsServer.class);
+
    @Inject
-    public MetricsServer(ServiceConfiguration configuration) throws Exception {
+    public MetricsServer(ServiceConfiguration configuration) {
        // If less than zero, we forego setting up a metrics server
        if (configuration.metricsPort() < 0)
            return;

+        try {
            Server server = new Server(new InetSocketAddress(configuration.bindAddress(), configuration.metricsPort()));

            ServletContextHandler context = new ServletContextHandler();
@@ -25,6 +30,12 @@ public class MetricsServer {

            context.addServlet(new ServletHolder(new MetricsServlet()), "/metrics");

+            logger.info("MetricsServer listening on {}:{}", configuration.bindAddress(), configuration.metricsPort());
+
            server.start();
        }
+        catch (Exception|NoSuchMethodError ex) {
+            logger.error("Failed to set up metrics server", ex);
+        }
+    }
 }
--- a/code/common/service/java/nu/marginalia/service/server/RateLimiter.java
+++ b/code/common/service/java/nu/marginalia/service/server/RateLimiter.java
@@ -35,21 +35,8 @@ public class RateLimiter {
    }


-    public static RateLimiter forExpensiveRequest() {
-        return new RateLimiter(5, 10);
-    }
-
    public static RateLimiter custom(int perMinute) {
-        return new RateLimiter(perMinute, 60);
-    }
-
-    public static RateLimiter forSpamBots() {
-        return new RateLimiter(120, 3600);
-    }
-
-
-    public static RateLimiter forLogin() {
-        return new RateLimiter(3, 15);
+        return new RateLimiter(4 * perMinute, perMinute);
    }

    private void cleanIdleBuckets() {
@@ -62,7 +49,7 @@ public class RateLimiter {
    }

    private Bucket createBucket() {
-        var refill = Refill.greedy(1, Duration.ofSeconds(refillRate));
+        var refill = Refill.greedy(refillRate, Duration.ofSeconds(60));
        var bw = Bandwidth.classic(capacity, refill);
        return Bucket.builder().addLimit(bw).build();
    }
--- a/code/common/service/java/nu/marginalia/service/server/SparkService.java
+++ b/code/common/service/java/nu/marginalia/service/server/SparkService.java
@@ -16,7 +16,7 @@ import spark.Spark;

 import java.util.List;

-public class Service {
+public class SparkService {
    private final Logger logger = LoggerFactory.getLogger(getClass());

    // Marker for filtering out sensitive content from the persistent logs
@@ -43,7 +43,7 @@ public class Service {
    private final int node;
    private GrpcServer grpcServer;

-    public Service(BaseServiceParams params,
+    public SparkService(BaseServiceParams params,
                        Runnable configureStaticFiles,
                        ServicePartition partition,
                        List<DiscoverableService> grpcServices) throws Exception {
@@ -126,18 +126,18 @@ public class Service {
        }
    }

-    public Service(BaseServiceParams params,
+    public SparkService(BaseServiceParams params,
                        ServicePartition partition,
                        List<DiscoverableService> grpcServices) throws Exception {
        this(params,
-                Service::defaultSparkConfig,
+                SparkService::defaultSparkConfig,
                partition,
                grpcServices);
    }

-    public Service(BaseServiceParams params) throws Exception {
+    public SparkService(BaseServiceParams params) throws Exception {
        this(params,
-                Service::defaultSparkConfig,
+                SparkService::defaultSparkConfig,
                ServicePartition.any(),
                List.of());
    }
--- a/code/common/service/java/nu/marginalia/service/server/jte/JteModule.java
+++ b/code/common/service/java/nu/marginalia/service/server/jte/JteModule.java
@@ -0,0 +1,61 @@
+package nu.marginalia.service.server.jte;
+
+import edu.umd.cs.findbugs.annotations.NonNull;
+import edu.umd.cs.findbugs.annotations.Nullable;
+import gg.jte.ContentType;
+import gg.jte.TemplateEngine;
+import gg.jte.resolve.DirectoryCodeResolver;
+import io.jooby.*;
+
+import java.io.File;
+import java.nio.file.Path;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.stream.Stream;
+
+// Temporary workaround for a bug
+// APL-2.0 https://github.com/jooby-project/jooby
+public class JteModule implements Extension {
+    private Path sourceDirectory;
+    private Path classDirectory;
+    private TemplateEngine templateEngine;
+
+    public JteModule(@NonNull Path sourceDirectory, @NonNull Path classDirectory) {
+        this.sourceDirectory = (Path)Objects.requireNonNull(sourceDirectory, "Source directory is required.");
+        this.classDirectory = (Path)Objects.requireNonNull(classDirectory, "Class directory is required.");
+    }
+
+    public JteModule(@NonNull Path sourceDirectory) {
+        this.sourceDirectory = (Path)Objects.requireNonNull(sourceDirectory, "Source directory is required.");
+    }
+
+    public JteModule(@NonNull TemplateEngine templateEngine) {
+        this.templateEngine = (TemplateEngine)Objects.requireNonNull(templateEngine, "Template engine is required.");
+    }
+
+    public void install(@NonNull Jooby application) {
+        if (this.templateEngine == null) {
+            this.templateEngine = create(application.getEnvironment(), this.sourceDirectory, this.classDirectory);
+        }
+
+        ServiceRegistry services = application.getServices();
+        services.put(TemplateEngine.class, this.templateEngine);
+        application.encoder(MediaType.html, new JteTemplateEngine(this.templateEngine));
+    }
+
+    public static TemplateEngine create(@NonNull Environment environment, @NonNull Path sourceDirectory, @Nullable Path classDirectory) {
+        boolean dev = environment.isActive("dev", new String[]{"test"});
+        if (dev) {
+            Objects.requireNonNull(sourceDirectory, "Source directory is required.");
+            Path requiredClassDirectory = (Path)Optional.ofNullable(classDirectory).orElseGet(() -> sourceDirectory.resolve("jte-classes"));
+            TemplateEngine engine = TemplateEngine.create(new DirectoryCodeResolver(sourceDirectory), requiredClassDirectory, ContentType.Html, environment.getClassLoader());
+            Optional<List<String>> var10000 = Optional.ofNullable(System.getProperty("jooby.run.classpath")).map((it) -> it.split(File.pathSeparator)).map(Stream::of).map(Stream::toList);
+            Objects.requireNonNull(engine);
+            var10000.ifPresent(engine::setClassPath);
+            return engine;
+        } else {
+            return classDirectory == null ? TemplateEngine.createPrecompiled(ContentType.Html) : TemplateEngine.createPrecompiled(classDirectory, ContentType.Html);
+        }
+    }
+}
--- a/code/common/service/java/nu/marginalia/service/server/jte/JteTemplateEngine.java
+++ b/code/common/service/java/nu/marginalia/service/server/jte/JteTemplateEngine.java
@@ -0,0 +1,48 @@
+package nu.marginalia.service.server.jte;
+
+import edu.umd.cs.findbugs.annotations.NonNull;
+import gg.jte.TemplateEngine;
+import io.jooby.Context;
+import io.jooby.MapModelAndView;
+import io.jooby.ModelAndView;
+import io.jooby.buffer.DataBuffer;
+import io.jooby.internal.jte.DataBufferOutput;
+
+import java.nio.charset.StandardCharsets;
+import java.util.HashMap;
+import java.util.List;
+
+// Temporary workaround for a bug
+// APL-2.0 https://github.com/jooby-project/jooby
+class JteTemplateEngine implements io.jooby.TemplateEngine {
+    private final TemplateEngine jte;
+    private final List<String> extensions;
+
+    public JteTemplateEngine(TemplateEngine jte) {
+        this.jte = jte;
+        this.extensions = List.of(".jte", ".kte");
+    }
+
+
+    @NonNull @Override
+    public List<String> extensions() {
+        return extensions;
+    }
+
+    @Override
+    public DataBuffer render(Context ctx, ModelAndView modelAndView) {
+        var buffer = ctx.getBufferFactory().allocateBuffer();
+        var output = new DataBufferOutput(buffer, StandardCharsets.UTF_8);
+        var attributes = ctx.getAttributes();
+        if (modelAndView instanceof MapModelAndView mapModelAndView) {
+            var mapModel = new HashMap<String, Object>();
+            mapModel.putAll(attributes);
+            mapModel.putAll(mapModelAndView.getModel());
+            jte.render(modelAndView.getView(), mapModel, output);
+        } else {
+            jte.render(modelAndView.getView(), modelAndView.getModel(), output);
+        }
+
+        return buffer;
+    }
+}
--- a/code/common/service/java/nu/marginalia/service/server/mq/ServiceMqSubscription.java
+++ b/code/common/service/java/nu/marginalia/service/server/mq/ServiceMqSubscription.java
@@ -3,7 +3,6 @@ package nu.marginalia.service.server.mq;
 import nu.marginalia.mq.MqMessage;
 import nu.marginalia.mq.inbox.MqInboxResponse;
 import nu.marginalia.mq.inbox.MqSubscription;
-import nu.marginalia.service.server.Service;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

@@ -15,10 +14,10 @@ import java.util.Map;
 public class ServiceMqSubscription implements MqSubscription {
    private static final Logger logger = LoggerFactory.getLogger(ServiceMqSubscription.class);
    private final Map<String, Method> requests = new HashMap<>();
-    private final Service service;
+    private final Object service;


-    public ServiceMqSubscription(Service service) {
+    public ServiceMqSubscription(Object service) {
        this.service = service;

        /* Wire up all methods annotated with @MqRequest and @MqNotification
--- a/code/common/service/resources/log4j2-json.xml
+++ b/code/common/service/resources/log4j2-json.xml
@@ -3,8 +3,16 @@
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{HH:mm:ss,SSS} %style{%-8markerSimpleName}{FG_Cyan} %highlight{%-5level}{FATAL=red, ERROR=red, WARN=yellow} %-24t %-20c{1}  --  %msg%n"/>
            <Filters>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
                <MarkerFilter marker="QUERY" onMatch="DENY" onMismatch="NEUTRAL" />
                <MarkerFilter marker="HTTP" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="CRAWLER" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ProcessConsole" target="SYSTEM_OUT">
+            <PatternLayout pattern="%style{P}{FG_Cyan} %msg%n"/>
+            <Filters>
+                <MarkerFilter marker="PROCESS" onMatch="ALLOW" onMismatch="DENY" />
            </Filters>
        </Console>
        <RollingFile name="LogToFile" fileName="${env:WMSA_LOG_DIR:-/var/log/wmsa}/wmsa-${sys:service-name}-${env:WMSA_SERVICE_NODE:-0}.log" filePattern="/var/log/wmsa/wmsa-${sys:service-name}-${env:WMSA_SERVICE_NODE:-0}-log-%d{MM-dd-yy-HH-mm-ss}-%i.log.gz"
@@ -13,15 +21,29 @@
            <Filters>
                <MarkerFilter marker="QUERY" onMatch="DENY" onMismatch="NEUTRAL" />
                <MarkerFilter marker="HTTP" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="CRAWLER" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
            </Filters>
            <SizeBasedTriggeringPolicy size="10MB" />
        </RollingFile>
+        <RollingFile name="LogToFile" fileName="${env:WMSA_LOG_DIR:-/var/log/wmsa}/crawler-audit-${env:WMSA_SERVICE_NODE:-0}.log" filePattern="/var/log/wmsa/crawler-audit-${env:WMSA_SERVICE_NODE:-0}-log-%d{MM-dd-yy-HH-mm-ss}-%i.log.gz"
+                     ignoreExceptions="false">
+            <PatternLayout>
+                <Pattern>%d{yyyy-MM-dd HH:mm:ss,SSS}: %msg{nolookups}%n</Pattern>
+            </PatternLayout>
+            <SizeBasedTriggeringPolicy size="100MB" />
+            <Filters>
+                <MarkerFilter marker="CRAWLER" onMatch="ALLOW" onMismatch="DENY" />
+            </Filters>
+        </RollingFile>
    </Appenders>
    <Loggers>
        <Logger name="org.apache.zookeeper" level="WARN" />
-
+        <Logger name="org.apache.pdfbox" level="ERROR" />
+        <Logger name="org.apache.fontbox.ttf" level="ERROR" />
        <Root level="info">
            <AppenderRef ref="Console"/>
+            <AppenderRef ref="ProcessConsole"/>
            <AppenderRef ref="LogToFile"/>
        </Root>
    </Loggers>
--- a/code/common/service/resources/log4j2-prod.xml
+++ b/code/common/service/resources/log4j2-prod.xml
@@ -1,10 +1,49 @@
 <Configuration xmlns="http://logging.apache.org/log4j/2.0/config" >
    <Appenders>
-        <Console name="Console" target="SYSTEM_OUT">
-            <PatternLayout pattern="%d{HH:mm:ss,SSS} %style{%-8markerSimpleName}{FG_Cyan} %highlight{%-5level}{FATAL=red, ERROR=red, WARN=yellow} %-24t %-20c{1}  --  %msg%n"/>
+        <Console name="ConsoleInfo" target="SYSTEM_OUT">
+            <PatternLayout pattern="- %d{HH:mm:ss,SSS} %-20c{1} -- %msg%n"/>
            <Filters>
+                <LevelMatchFilter level="INFO" onMatch="ALLOW" onMismatch="DENY"/>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
                <MarkerFilter marker="QUERY" onMatch="DENY" onMismatch="NEUTRAL" />
                <MarkerFilter marker="HTTP" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="CRAWLER" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ConsoleWarn" target="SYSTEM_OUT">
+            <PatternLayout pattern="⚠ %d{HH:mm:ss,SSS} %-20c{1} -- %msg%n"/>
+            <Filters>
+                <LevelMatchFilter level="WARN" onMatch="ALLOW" onMismatch="DENY"/>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="QUERY" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="HTTP" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="CRAWLER" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ConsoleError" target="SYSTEM_OUT">
+            <PatternLayout pattern="🔥 %d{HH:mm:ss,SSS} %-20c{1} -- %msg%n"/>
+            <Filters>
+                <LevelMatchFilter level="ERROR" onMatch="ALLOW" onMismatch="DENY"/>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="QUERY" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="HTTP" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="CRAWLER" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ConsoleFatal" target="SYSTEM_OUT">
+            <PatternLayout pattern="💀 %d{HH:mm:ss,SSS} %-20c{1} -- %msg%n"/>
+            <Filters>
+                <LevelMatchFilter level="FATAL" onMatch="ALLOW" onMismatch="DENY"/>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="QUERY" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="HTTP" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="CRAWLER" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ProcessConsole" target="SYSTEM_OUT">
+            <PatternLayout pattern="%style{%msg%n}{FG_Cyan}"/>
+            <Filters>
+                <MarkerFilter marker="PROCESS" onMatch="ALLOW" onMismatch="DENY" />
            </Filters>
        </Console>
        <RollingFile name="LogToFile" fileName="${env:WMSA_LOG_DIR:-/var/log/wmsa}/wmsa-${sys:service-name}-${env:WMSA_SERVICE_NODE:-0}.log" filePattern="/var/log/wmsa/wmsa-${sys:service-name}-${env:WMSA_SERVICE_NODE:-0}-log-%d{MM-dd-yy-HH-mm-ss}-%i.log.gz"
@@ -17,14 +56,30 @@
                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
                <MarkerFilter marker="QUERY" onMatch="DENY" onMismatch="NEUTRAL" />
                <MarkerFilter marker="HTTP" onMatch="DENY" onMismatch="NEUTRAL" />
+                <MarkerFilter marker="CRAWLER" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </RollingFile>
+        <RollingFile name="LogToFile" fileName="${env:WMSA_LOG_DIR:-/var/log/wmsa}/crawler-audit-${env:WMSA_SERVICE_NODE:-0}.log" filePattern="/var/log/wmsa/crawler-audit-${env:WMSA_SERVICE_NODE:-0}-log-%d{MM-dd-yy-HH-mm-ss}-%i.log.gz"
+                     ignoreExceptions="false">
+            <PatternLayout>
+                <Pattern>%d{yyyy-MM-dd HH:mm:ss,SSS}: %msg{nolookups}%n</Pattern>
+            </PatternLayout>
+            <SizeBasedTriggeringPolicy size="100MB" />
+            <Filters>
+                <MarkerFilter marker="CRAWLER" onMatch="ALLOW" onMismatch="DENY" />
            </Filters>
        </RollingFile>
    </Appenders>
    <Loggers>
        <Logger name="org.apache.zookeeper" level="WARN" />
-
+        <Logger name="org.apache.pdfbox" level="ERROR" />
+        <Logger name="org.apache.fontbox.ttf" level="ERROR" />
        <Root level="info">
-            <AppenderRef ref="Console"/>
+            <AppenderRef ref="ConsoleInfo"/>
+            <AppenderRef ref="ConsoleWarn"/>
+            <AppenderRef ref="ConsoleError"/>
+            <AppenderRef ref="ConsoleFatal"/>
+            <AppenderRef ref="ProcessConsole"/>
            <AppenderRef ref="LogToFile"/>
        </Root>
    </Loggers>
--- a/code/common/service/resources/log4j2-test.xml
+++ b/code/common/service/resources/log4j2-test.xml
@@ -1,15 +1,50 @@
 <Configuration xmlns="http://logging.apache.org/log4j/2.0/config" >
    <Appenders>
-        <Console name="Console" target="SYSTEM_OUT">
-            <PatternLayout pattern="%d{HH:mm:ss,SSS} %style{%-8markerSimpleName}{FG_Cyan} %highlight{%-5level}{FATAL=red, ERROR=red, WARN=yellow} %-24t %-20c{1}  --  %msg%n"/>
+        <Console name="ConsoleInfo" target="SYSTEM_OUT">
+            <PatternLayout pattern="- %d{HH:mm:ss,SSS} %-20c{1} -- %msg%n"/>
+            <Filters>
+                <LevelMatchFilter level="INFO" onMatch="ALLOW" onMismatch="DENY"/>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ConsoleWarn" target="SYSTEM_OUT">
+            <PatternLayout pattern="⚠ %d{HH:mm:ss,SSS} %-20c{1} -- %msg%n"/>
+            <Filters>
+                <LevelMatchFilter level="WARN" onMatch="ALLOW" onMismatch="DENY"/>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ConsoleError" target="SYSTEM_OUT">
+            <PatternLayout pattern="🔥 %d{HH:mm:ss,SSS} %-20c{1} -- %msg%n"/>
+            <Filters>
+                <LevelMatchFilter level="ERROR" onMatch="ALLOW" onMismatch="DENY"/>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ConsoleFatal" target="SYSTEM_OUT">
+            <PatternLayout pattern="💀 %d{HH:mm:ss,SSS} %-20c{1} -- %msg%n"/>
+            <Filters>
+                <LevelMatchFilter level="FATAL" onMatch="ALLOW" onMismatch="DENY"/>
+                <MarkerFilter marker="PROCESS" onMatch="DENY" onMismatch="NEUTRAL" />
+            </Filters>
+        </Console>
+        <Console name="ProcessConsole" target="SYSTEM_OUT">
+            <PatternLayout pattern="%style{%msg%n}{FG_Cyan}"/>
+            <Filters>
+                <MarkerFilter marker="PROCESS" onMatch="ALLOW" onMismatch="DENY" />
+            </Filters>
        </Console>
    </Appenders>
    <Loggers>
        <Logger name="org.apache.zookeeper" level="WARN" />
-
+        <Logger name="org.apache.pdfbox" level="ERROR" />
+        <Logger name="org.apache.fontbox.ttf" level="ERROR" />
        <Root level="info">
-            <AppenderRef ref="Console"/>
-            <AppenderRef ref="LogToFile"/>
+            <AppenderRef ref="ConsoleInfo"/>
+            <AppenderRef ref="ConsoleWarn"/>
+            <AppenderRef ref="ConsoleError"/>
+            <AppenderRef ref="ConsoleFatal"/>
+            <AppenderRef ref="ProcessConsole"/>
        </Root>
    </Loggers>
 </Configuration>
--- a/code/common/service/test/nu/marginalia/service/discovery/ZkServiceRegistryTest.java
+++ b/code/common/service/test/nu/marginalia/service/discovery/ZkServiceRegistryTest.java
@@ -25,7 +25,7 @@ import static org.mockito.Mockito.when;
 class ZkServiceRegistryTest {
    private static final int ZOOKEEPER_PORT = 2181;
    private static final GenericContainer<?> zookeeper =
-            new GenericContainer<>("zookeeper:3.8.0")
+            new GenericContainer<>("zookeeper:3.8")
                    .withExposedPorts(ZOOKEEPER_PORT);

    List<ZkServiceRegistry> registries = new ArrayList<>();
--- a/code/execution/api/java/nu/marginalia/executor/client/ExecutorExportClient.java
+++ b/code/execution/api/java/nu/marginalia/executor/client/ExecutorExportClient.java
@@ -48,12 +48,13 @@ public class ExecutorExportClient {
        return msgId;
    }

-    public void exportSampleData(int node, FileStorageId fid, int size, String name) {
+    public void exportSampleData(int node, FileStorageId fid, int size, String ctFilter, String name) {
        channelPool.call(ExecutorExportApiBlockingStub::exportSampleData)
                .forNode(node)
                .run(RpcExportSampleData.newBuilder()
                        .setFileStorageId(fid.id())
                        .setSize(size)
+                        .setCtFilter(ctFilter)
                        .setName(name)
                        .build());
    }
--- a/code/execution/api/src/main/protobuf/executor-api.proto
+++ b/code/execution/api/src/main/protobuf/executor-api.proto
@@ -100,6 +100,7 @@ message RpcExportSampleData {
  int64 fileStorageId = 1;
  int32 size = 2;
  string name = 3;
+  string ctFilter = 4;
 }
 message RpcDownloadSampleData {
  string sampleSet = 1;
--- a/code/execution/build.gradle
+++ b/code/execution/build.gradle
@@ -19,6 +19,7 @@ dependencies {
    implementation project(':code:processes:crawling-process')
    implementation project(':code:processes:live-crawling-process')
    implementation project(':code:processes:loading-process')
+    implementation project(':code:processes:ping-process')
    implementation project(':code:processes:converting-process')
    implementation project(':code:processes:index-constructor-process')

@@ -37,6 +38,7 @@ dependencies {
    implementation project(':code:functions:link-graph:api')
    implementation project(':code:functions:live-capture:api')
    implementation project(':code:functions:search-query')
+    implementation project(':code:functions:nsfw-domain-filter')
    implementation project(':code:execution:api')

    implementation project(':code:processes:crawling-process:model')
--- a/code/execution/java/nu/marginalia/actor/ExecutorActor.java
+++ b/code/execution/java/nu/marginalia/actor/ExecutorActor.java
@@ -6,11 +6,13 @@ import java.util.Set;

 public enum ExecutorActor {
    PREC_EXPORT_ALL(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
+    SYNC_NSFW_LISTS(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),

    CRAWL(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
    RECRAWL(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
    RECRAWL_SINGLE_DOMAIN(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
    PROC_CRAWLER_SPAWNER(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
+    PROC_PING_SPAWNER(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED, NodeProfile.SIDELOAD),
    PROC_EXPORT_TASKS_SPAWNER(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
    ADJACENCY_CALCULATION(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
    EXPORT_DATA(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
@@ -20,6 +22,7 @@ public enum ExecutorActor {
    EXPORT_FEEDS(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
    EXPORT_SAMPLE_DATA(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
    DOWNLOAD_SAMPLE(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),
+    MIGRATE_CRAWL_DATA(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED),

    PROC_CONVERTER_SPAWNER(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED, NodeProfile.SIDELOAD),
    PROC_LOADER_SPAWNER(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED, NodeProfile.SIDELOAD),
@@ -34,7 +37,8 @@ public enum ExecutorActor {
    LIVE_CRAWL(NodeProfile.REALTIME),
    PROC_LIVE_CRAWL_SPAWNER(NodeProfile.REALTIME),
    SCRAPE_FEEDS(NodeProfile.REALTIME),
-    UPDATE_RSS(NodeProfile.REALTIME);
+    UPDATE_RSS(NodeProfile.REALTIME)
+    ;

    public String id() {
        return "fsm:" + name().toLowerCase();
--- a/code/execution/java/nu/marginalia/actor/ExecutorActorControlService.java
+++ b/code/execution/java/nu/marginalia/actor/ExecutorActorControlService.java
@@ -49,6 +49,7 @@ public class ExecutorActorControlService {
                                       RecrawlSingleDomainActor recrawlSingleDomainActor,
                                       RestoreBackupActor restoreBackupActor,
                                       ConverterMonitorActor converterMonitorFSM,
+                                       PingMonitorActor pingMonitorActor,
                                       CrawlerMonitorActor crawlerMonitorActor,
                                       LiveCrawlerMonitorActor liveCrawlerMonitorActor,
                                       LoaderMonitorActor loaderMonitor,
@@ -66,7 +67,9 @@ public class ExecutorActorControlService {
                                       DownloadSampleActor downloadSampleActor,
                                       ScrapeFeedsActor scrapeFeedsActor,
                                       ExecutorActorStateMachines stateMachines,
+                                       MigrateCrawlDataActor migrateCrawlDataActor,
                                       ExportAllPrecessionActor exportAllPrecessionActor,
+                                       UpdateNsfwFiltersActor updateNsfwFiltersActor,
                                       UpdateRssActor updateRssActor) throws SQLException {
        this.messageQueueFactory = messageQueueFactory;
        this.eventLog = baseServiceParams.eventLog;
@@ -87,6 +90,7 @@ public class ExecutorActorControlService {
        register(ExecutorActor.PROC_CONVERTER_SPAWNER, converterMonitorFSM);
        register(ExecutorActor.PROC_LOADER_SPAWNER, loaderMonitor);
        register(ExecutorActor.PROC_CRAWLER_SPAWNER, crawlerMonitorActor);
+        register(ExecutorActor.PROC_PING_SPAWNER, pingMonitorActor);
        register(ExecutorActor.PROC_LIVE_CRAWL_SPAWNER, liveCrawlerMonitorActor);
        register(ExecutorActor.PROC_EXPORT_TASKS_SPAWNER, exportTasksMonitorActor);

@@ -107,6 +111,9 @@ public class ExecutorActorControlService {
        register(ExecutorActor.SCRAPE_FEEDS, scrapeFeedsActor);
        register(ExecutorActor.UPDATE_RSS, updateRssActor);

+        register(ExecutorActor.MIGRATE_CRAWL_DATA, migrateCrawlDataActor);
+        register(ExecutorActor.SYNC_NSFW_LISTS, updateNsfwFiltersActor);
+
        if (serviceConfiguration.node() == 1) {
            register(ExecutorActor.PREC_EXPORT_ALL, exportAllPrecessionActor);
        }
--- a/code/execution/java/nu/marginalia/actor/proc/PingMonitorActor.java
+++ b/code/execution/java/nu/marginalia/actor/proc/PingMonitorActor.java
@@ -0,0 +1,186 @@
+package nu.marginalia.actor.proc;
+
+import com.google.gson.Gson;
+import com.google.inject.Inject;
+import com.google.inject.Singleton;
+import nu.marginalia.actor.prototype.RecordActorPrototype;
+import nu.marginalia.actor.state.ActorResumeBehavior;
+import nu.marginalia.actor.state.ActorStep;
+import nu.marginalia.actor.state.Resume;
+import nu.marginalia.actor.state.Terminal;
+import nu.marginalia.mq.MqMessageState;
+import nu.marginalia.mq.persistence.MqMessageHandlerRegistry;
+import nu.marginalia.mq.persistence.MqPersistence;
+import nu.marginalia.mqapi.ProcessInboxNames;
+import nu.marginalia.mqapi.ping.PingRequest;
+import nu.marginalia.nodecfg.NodeConfigurationService;
+import nu.marginalia.nodecfg.model.NodeProfile;
+import nu.marginalia.process.ProcessService;
+import nu.marginalia.service.module.ServiceConfiguration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.sql.SQLException;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+@Singleton
+public class PingMonitorActor extends RecordActorPrototype {
+
+    private final MqPersistence persistence;
+    private final ProcessService processService;
+
+    private final Logger logger = LoggerFactory.getLogger(getClass());
+
+    public static final int MAX_ATTEMPTS = 3;
+    private final String inboxName;
+    private final ProcessService.ProcessId processId;
+    private final ExecutorService executorService = Executors.newSingleThreadExecutor();
+    private final int node;
+    private final boolean isPrimaryNode;
+    private final Gson gson;
+
+    public record Initial() implements ActorStep {}
+    @Resume(behavior = ActorResumeBehavior.RETRY)
+    public record Monitor(int errorAttempts) implements ActorStep {}
+    @Resume(behavior = ActorResumeBehavior.RESTART)
+    public record Run(int attempts) implements ActorStep {}
+    @Terminal
+    public record Aborted() implements ActorStep {}
+
+    @Override
+    public ActorStep transition(ActorStep self) throws Exception {
+        return switch (self) {
+            case Initial i -> {
+                PingRequest request = new PingRequest(isPrimaryNode ? "primary": "secondary");
+
+                persistence.sendNewMessage(inboxName, null, null,
+                        "PingRequest",
+                        gson.toJson(request),
+                        null);
+
+                yield new Monitor(0);
+            }
+            case Monitor(int errorAttempts) -> {
+                for (;;) {
+                    var messages = persistence.eavesdrop(inboxName, 1);
+
+                    if (messages.isEmpty() && !processService.isRunning(processId)) {
+                        synchronized (processId) {
+                            processId.wait(5000);
+                        }
+
+                        if (errorAttempts > 0) { // Reset the error counter if there is silence in the inbox
+                            yield new Monitor(0);
+                        }
+                        // else continue
+                    } else {
+                        // Special: Associate this thread with the message so that we can get tracking
+                        MqMessageHandlerRegistry.register(messages.getFirst().msgId());
+
+                        yield new Run(0);
+                    }
+                }
+            }
+            case Run(int attempts) -> {
+                try {
+                    long startTime = System.currentTimeMillis();
+                    var exec = new TaskExecution();
+                    long endTime = System.currentTimeMillis();
+
+                    if (exec.isError()) {
+                        if (attempts < MAX_ATTEMPTS)
+                            yield new Run(attempts + 1);
+                        else
+                            yield new Error();
+                    }
+                    else if (endTime - startTime < TimeUnit.SECONDS.toMillis(1)) {
+                        // To avoid boot loops, we transition to error if the process
+                        // didn't run for longer than 1 seconds.  This might happen if
+                        // the process crashes before it can reach the heartbeat and inbox
+                        // stages of execution.  In this case it would not report having acted
+                        // on its message, and the process would be restarted forever without
+                        // the attempts counter incrementing.
+                        yield new Error("Process terminated within 1 seconds of starting");
+                    }
+                }
+                catch (InterruptedException ex) {
+                    // We get this exception when the process is cancelled by the user
+
+                    processService.kill(processId);
+                    setCurrentMessageToDead();
+
+                    yield new Aborted();
+                }
+
+                yield new Monitor(attempts);
+            }
+            default -> new Error();
+        };
+    }
+
+    public String describe() {
+        return "Spawns a(n) " + processId +  " process and monitors its inbox for messages";
+    }
+
+    @Inject
+    public PingMonitorActor(Gson gson,
+                                       NodeConfigurationService nodeConfigurationService,
+                                       ServiceConfiguration configuration,
+                                       MqPersistence persistence,
+                                       ProcessService processService) throws SQLException {
+        super(gson);
+        this.gson = gson;
+        this.node = configuration.node();
+        this.persistence = persistence;
+        this.processService = processService;
+        this.inboxName = ProcessInboxNames.PING_INBOX + ":" + node;
+        this.processId = ProcessService.ProcessId.PING;
+
+        this.isPrimaryNode = Set.of(NodeProfile.BATCH_CRAWL, NodeProfile.MIXED)
+                .contains(nodeConfigurationService.get(node).profile());
+    }
+
+    /** Sets the message to dead in the database to avoid
+     * the service respawning on the same task when we
+     * re-enable this actor */
+    private void setCurrentMessageToDead() {
+        try {
+            var messages = persistence.eavesdrop(inboxName, 1);
+
+            if (messages.isEmpty()) // Possibly a race condition where the task is already finished
+                return;
+
+            var theMessage = messages.iterator().next();
+            persistence.updateMessageState(theMessage.msgId(), MqMessageState.DEAD);
+        }
+        catch (SQLException ex) {
+            logger.error("Tried but failed to set the message for " + processId + " to dead", ex);
+        }
+    }
+
+    /** Encapsulates the execution of the process in a separate thread so that
+     * we can interrupt the thread if the process is cancelled */
+    private class TaskExecution {
+        private final AtomicBoolean error = new AtomicBoolean(false);
+        public TaskExecution() throws ExecutionException, InterruptedException {
+            // Run this call in a separate thread so that this thread can be interrupted waiting for it
+            executorService.submit(() -> {
+                try {
+                    processService.trigger(processId);
+                } catch (Exception e) {
+                    logger.warn("Error in triggering process", e);
+                    error.set(true);
+                }
+            }).get(); // Wait for the process to start
+        }
+
+        public boolean isError() {
+            return error.get();
+        }
+    }
+}
--- a/code/execution/java/nu/marginalia/actor/proc/UpdateRssActor.java
+++ b/code/execution/java/nu/marginalia/actor/proc/UpdateRssActor.java
@@ -14,6 +14,8 @@ import nu.marginalia.mq.persistence.MqPersistence;
 import nu.marginalia.nodecfg.NodeConfigurationService;
 import nu.marginalia.nodecfg.model.NodeProfile;
 import nu.marginalia.service.module.ServiceConfiguration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;

 import java.time.Duration;
 import java.time.LocalDateTime;
@@ -29,6 +31,7 @@ public class UpdateRssActor extends RecordActorPrototype {

    private final NodeConfigurationService nodeConfigurationService;
    private final MqPersistence persistence;
+    private static final Logger logger = LoggerFactory.getLogger(UpdateRssActor.class);

    @Inject
    public UpdateRssActor(Gson gson,
@@ -101,8 +104,8 @@ public class UpdateRssActor extends RecordActorPrototype {
            case UpdateRefresh(int count, long msgId) -> {
                MqMessage msg = persistence.waitForMessageTerminalState(msgId, Duration.ofSeconds(10), Duration.ofHours(12));
                if (msg == null) {
-                    // Retry the update
-                    yield new Error("Failed to update feeds: message not found");
+                    logger.warn("UpdateRefresh is taking a very long time");
+                    yield new UpdateRefresh(count, msgId);
                } else if (msg.state() != MqMessageState.OK) {
                    // Retry the update
                    yield new Error("Failed to update feeds: " + msg.state());
@@ -119,8 +122,8 @@ public class UpdateRssActor extends RecordActorPrototype {
            case UpdateClean(long msgId) -> {
                MqMessage msg = persistence.waitForMessageTerminalState(msgId, Duration.ofSeconds(10), Duration.ofHours(12));
                if (msg == null) {
-                    // Retry the update
-                    yield new Error("Failed to update feeds: message not found");
+                    logger.warn("UpdateClean is taking a very long time");
+                    yield new UpdateClean(msgId);
                } else if (msg.state() != MqMessageState.OK) {
                    // Retry the update
                    yield new Error("Failed to update feeds: " + msg.state());
--- a/code/execution/java/nu/marginalia/actor/task/DownloadSampleActor.java
+++ b/code/execution/java/nu/marginalia/actor/task/DownloadSampleActor.java
@@ -8,6 +8,7 @@ import nu.marginalia.actor.state.ActorResumeBehavior;
 import nu.marginalia.actor.state.ActorStep;
 import nu.marginalia.actor.state.Resume;
 import nu.marginalia.service.control.ServiceEventLog;
+import nu.marginalia.service.control.ServiceHeartbeat;
 import nu.marginalia.storage.FileStorageService;
 import nu.marginalia.storage.model.FileStorage;
 import nu.marginalia.storage.model.FileStorageId;
@@ -19,6 +20,7 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 import java.io.*;
+import java.net.HttpURLConnection;
 import java.net.MalformedURLException;
 import java.net.URI;
 import java.net.URL;
@@ -32,6 +34,7 @@ public class DownloadSampleActor extends RecordActorPrototype {

    private final FileStorageService storageService;
    private final ServiceEventLog eventLog;
+    private final ServiceHeartbeat heartbeat;
    private final Logger logger = LoggerFactory.getLogger(getClass());

    @Resume(behavior = ActorResumeBehavior.ERROR)
@@ -66,15 +69,39 @@ public class DownloadSampleActor extends RecordActorPrototype {

                Files.deleteIfExists(Path.of(tarFileName));

-                try (var is = new BufferedInputStream(new URI(downloadURI).toURL().openStream());
+                HttpURLConnection urlConnection = (HttpURLConnection) new URI(downloadURI).toURL().openConnection();
+
+                try (var hb = heartbeat.createServiceAdHocTaskHeartbeat("Downloading sample")) {
+                    long size = urlConnection.getContentLengthLong();
+                    byte[] buffer = new byte[8192];
+
+                    try (var is = new BufferedInputStream(urlConnection.getInputStream());
                         var os = new BufferedOutputStream(Files.newOutputStream(Path.of(tarFileName), StandardOpenOption.CREATE))) {
-                    is.transferTo(os);
+                        long copiedSize = 0;
+
+                        while (copiedSize < size) {
+                            int read = is.read(buffer);
+
+                            if (read < 0) // We've been promised a file of length 'size'
+                                throw new IOException("Unexpected end of stream");
+
+                            os.write(buffer, 0, read);
+                            copiedSize += read;
+
+                            // Update progress bar
+                            hb.progress(String.format("%d MB", copiedSize / 1024 / 1024), (int) (copiedSize / 1024), (int) (size / 1024));
+                        }
+                    }
+
                }
                catch (Exception ex) {
                    eventLog.logEvent(DownloadSampleActor.class, "Error downloading sample");
                    logger.error("Error downloading sample", ex);
                    yield new Error();
                }
+                finally {
+                    urlConnection.disconnect();
+                }

                eventLog.logEvent(DownloadSampleActor.class, "Download complete");
                yield new Extract(fileStorageId, tarFileName);
@@ -170,11 +197,12 @@ public class DownloadSampleActor extends RecordActorPrototype {
    @Inject
    public DownloadSampleActor(Gson gson,
                               FileStorageService storageService,
-                               ServiceEventLog eventLog)
+                               ServiceEventLog eventLog, ServiceHeartbeat heartbeat)
    {
        super(gson);
        this.storageService = storageService;
        this.eventLog = eventLog;
+        this.heartbeat = heartbeat;
    }

 }
--- a/code/execution/java/nu/marginalia/actor/task/ExportSampleDataActor.java
+++ b/code/execution/java/nu/marginalia/actor/task/ExportSampleDataActor.java
@@ -26,32 +26,32 @@ public class ExportSampleDataActor extends RecordActorPrototype {
    private final MqOutbox exportTasksOutbox;
    private final Logger logger = LoggerFactory.getLogger(getClass());

-    public record Export(FileStorageId crawlId, int size, String name) implements ActorStep {}
-    public record Run(FileStorageId crawlId, FileStorageId destId, int size, String name, long msgId) implements ActorStep {
-        public Run(FileStorageId crawlId, FileStorageId destId, int size, String name) {
-            this(crawlId, destId, size, name, -1);
+    public record Export(FileStorageId crawlId, int size, String ctFilter, String name) implements ActorStep {}
+    public record Run(FileStorageId crawlId, FileStorageId destId, int size, String ctFilter, String name, long msgId) implements ActorStep {
+        public Run(FileStorageId crawlId, FileStorageId destId, int size, String name, String ctFilter) {
+            this(crawlId, destId, size, name, ctFilter,-1);
        }
    }

    @Override
    public ActorStep transition(ActorStep self) throws Exception {
        return switch(self) {
-            case Export(FileStorageId crawlId, int size, String name) -> {
+            case Export(FileStorageId crawlId, int size, String ctFilter, String name) -> {
                var storage = storageService.allocateStorage(FileStorageType.EXPORT,
                        "crawl-sample-export",
                        "Crawl Data Sample " + name + "/" + size + " " + LocalDateTime.now()
                );

                if (storage == null) yield new Error("Bad storage id");
-                yield new Run(crawlId, storage.id(), size, name);
+                yield new Run(crawlId, storage.id(), size, ctFilter, name);
            }
-            case Run(FileStorageId crawlId, FileStorageId destId, int size, String name, long msgId) when msgId < 0 -> {
+            case Run(FileStorageId crawlId, FileStorageId destId, int size, String ctFilter, String name, long msgId) when msgId < 0 -> {
                storageService.setFileStorageState(destId, FileStorageState.NEW);

-                long newMsgId = exportTasksOutbox.sendAsync(ExportTaskRequest.sampleData(crawlId, destId, size, name));
-                yield new Run(crawlId, destId, size, name, newMsgId);
+                long newMsgId = exportTasksOutbox.sendAsync(ExportTaskRequest.sampleData(crawlId, destId, ctFilter, size, name));
+                yield new Run(crawlId, destId, size, ctFilter, name, newMsgId);
            }
-            case Run(_, FileStorageId destId, _, _, long msgId) -> {
+            case Run(_, FileStorageId destId, _, _, _, long msgId) -> {
                var rsp = processWatcher.waitResponse(exportTasksOutbox, ProcessService.ProcessId.EXPORT_TASKS, msgId);

                if (rsp.state() != MqMessageState.OK) {
@@ -70,7 +70,7 @@ public class ExportSampleDataActor extends RecordActorPrototype {

    @Override
    public String describe() {
-        return "Export RSS/Atom feeds from crawl data";
+        return "Export sample crawl data";
    }

    @Inject
--- a/code/execution/java/nu/marginalia/actor/task/LiveCrawlActor.java
+++ b/code/execution/java/nu/marginalia/actor/task/LiveCrawlActor.java
@@ -50,12 +50,18 @@ public class LiveCrawlActor extends RecordActorPrototype {
                yield new Monitor("-");
            }
            case Monitor(String feedsHash) -> {
+                // Sleep initially in case this is during start-up
                for (;;) {
+                    try {
+                        Thread.sleep(Duration.ofMinutes(15));
                        String currentHash = feedsClient.getFeedDataHash();
                        if (!Objects.equals(currentHash, feedsHash)) {
                            yield new LiveCrawl(currentHash);
                        }
-                    Thread.sleep(Duration.ofMinutes(15));
+                    }
+                    catch (RuntimeException ex) {
+                        logger.error("Failed to fetch feed data hash");
+                    }
                }
            }
            case LiveCrawl(String feedsHash, long msgId) when msgId < 0 -> {
--- a/code/execution/java/nu/marginalia/actor/task/MigrateCrawlDataActor.java
+++ b/code/execution/java/nu/marginalia/actor/task/MigrateCrawlDataActor.java
@@ -0,0 +1,150 @@
+package nu.marginalia.actor.task;
+
+import com.google.gson.Gson;
+import jakarta.inject.Inject;
+import jakarta.inject.Singleton;
+import nu.marginalia.actor.prototype.RecordActorPrototype;
+import nu.marginalia.actor.state.ActorStep;
+import nu.marginalia.io.CrawlerOutputFile;
+import nu.marginalia.process.log.WorkLog;
+import nu.marginalia.process.log.WorkLogEntry;
+import nu.marginalia.service.control.ServiceHeartbeat;
+import nu.marginalia.slop.SlopCrawlDataRecord;
+import nu.marginalia.storage.FileStorageService;
+import nu.marginalia.storage.model.FileStorage;
+import nu.marginalia.storage.model.FileStorageId;
+import org.apache.logging.log4j.util.Strings;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardCopyOption;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Function;
+
+@Singleton
+public class MigrateCrawlDataActor extends RecordActorPrototype {
+
+    private final FileStorageService fileStorageService;
+    private final ServiceHeartbeat serviceHeartbeat;
+    private static final Logger logger = LoggerFactory.getLogger(MigrateCrawlDataActor.class);
+
+    @Inject
+    public MigrateCrawlDataActor(Gson gson, FileStorageService fileStorageService, ServiceHeartbeat serviceHeartbeat) {
+        super(gson);
+
+        this.fileStorageService = fileStorageService;
+        this.serviceHeartbeat = serviceHeartbeat;
+    }
+
+    public record Run(long fileStorageId) implements ActorStep {}
+
+    @Override
+    public ActorStep transition(ActorStep self) throws Exception {
+        return switch (self) {
+            case Run(long fileStorageId) -> {
+
+                FileStorage storage = fileStorageService.getStorage(FileStorageId.of(fileStorageId));
+                Path root = storage.asPath();
+
+                Path crawlerLog = root.resolve("crawler.log");
+                Path newCrawlerLog = Files.createTempFile(root, "crawler", ".migrate.log");
+
+                int totalEntries = WorkLog.countEntries(crawlerLog);
+
+                try (WorkLog workLog = new WorkLog(newCrawlerLog);
+                     var heartbeat = serviceHeartbeat.createServiceAdHocTaskHeartbeat("Migrating")
+                ) {
+                    int entryIdx = 0;
+
+                    for (Map.Entry<WorkLogEntry, Path> item : WorkLog.iterableMap(crawlerLog, new CrawlDataLocator(root))) {
+
+                        final WorkLogEntry entry = item.getKey();
+                        final Path inputPath = item.getValue();
+
+                        Path outputPath = inputPath;
+                        heartbeat.progress("Migrating" + inputPath.getFileName(), entryIdx++, totalEntries);
+
+                        if (inputPath.toString().endsWith(".parquet")) {
+                            String domain = entry.id();
+                            String id = Integer.toHexString(domain.hashCode());
+
+                            outputPath = CrawlerOutputFile.createSlopPath(root, id, domain);
+
+                            if (Files.exists(inputPath)) {
+                                try {
+                                    SlopCrawlDataRecord.convertFromParquet(inputPath, outputPath);
+                                    Files.deleteIfExists(inputPath);
+                                } catch (Exception ex) {
+                                    outputPath = inputPath; // don't update the work log on error
+                                    logger.error("Failed to convert " + inputPath, ex);
+                                }
+                            }
+                            else if (!Files.exists(inputPath) && !Files.exists(outputPath)) {
+                                // if the input file is missing, and the output file is missing, we just write the log
+                                // record identical to the old one
+                                outputPath = inputPath;
+                            }
+                        }
+
+                        // Write a log entry for the (possibly) converted file
+                        workLog.setJobToFinished(entry.id(), outputPath.toString(), entry.cnt());
+                    }
+                }
+
+                Path oldCrawlerLog = Files.createTempFile(root, "crawler-", ".migrate.old.log");
+                Files.move(crawlerLog, oldCrawlerLog, StandardCopyOption.REPLACE_EXISTING);
+                Files.move(newCrawlerLog, crawlerLog);
+
+                yield new End();
+            }
+            default -> new Error();
+        };
+    }
+
+    private static class CrawlDataLocator implements Function<WorkLogEntry, Optional<Map.Entry<WorkLogEntry, Path>>> {
+
+        private final Path crawlRootDir;
+
+        CrawlDataLocator(Path crawlRootDir) {
+            this.crawlRootDir = crawlRootDir;
+        }
+
+        @Override
+        public Optional<Map.Entry<WorkLogEntry, Path>> apply(WorkLogEntry entry) {
+            var path = getCrawledFilePath(crawlRootDir, entry.path());
+
+            if (!Files.exists(path)) {
+                return Optional.empty();
+            }
+
+            try {
+                return Optional.of(Map.entry(entry, path));
+            }
+            catch (Exception ex) {
+                return Optional.empty();
+            }
+        }
+
+        private Path getCrawledFilePath(Path crawlDir, String fileName) {
+            int sp = fileName.lastIndexOf('/');
+
+            // Normalize the filename
+            if (sp >= 0 && sp + 1< fileName.length())
+                fileName = fileName.substring(sp + 1);
+            if (fileName.length() < 4)
+                fileName = Strings.repeat("0", 4 - fileName.length()) + fileName;
+
+            String sp1 = fileName.substring(0, 2);
+            String sp2 = fileName.substring(2, 4);
+            return crawlDir.resolve(sp1).resolve(sp2).resolve(fileName);
+        }
+    }
+
+    @Override
+    public String describe() {
+        return "Migrates crawl data to the latest format";
+    }
+}
--- a/code/execution/java/nu/marginalia/actor/task/UpdateNsfwFiltersActor.java
+++ b/code/execution/java/nu/marginalia/actor/task/UpdateNsfwFiltersActor.java
@@ -0,0 +1,53 @@
+package nu.marginalia.actor.task;
+
+import com.google.gson.Gson;
+import com.google.inject.Inject;
+import com.google.inject.Singleton;
+import nu.marginalia.actor.prototype.RecordActorPrototype;
+import nu.marginalia.actor.state.ActorStep;
+import nu.marginalia.nsfw.NsfwDomainFilter;
+import nu.marginalia.service.module.ServiceConfiguration;
+
+@Singleton
+public class UpdateNsfwFiltersActor extends RecordActorPrototype {
+    private final ServiceConfiguration serviceConfiguration;
+    private final NsfwDomainFilter nsfwDomainFilter;
+
+    public record Initial() implements ActorStep {}
+    public record Run() implements ActorStep {}
+
+    @Override
+    public ActorStep transition(ActorStep self) throws Exception {
+        return switch(self) {
+            case Initial() -> {
+                if (serviceConfiguration.node() != 1) {
+                    yield new Error("This actor can only run on node 1");
+                }
+                else {
+                    yield new Run();
+                }
+            }
+            case Run() -> {
+                nsfwDomainFilter.fetchLists();
+                yield new End();
+            }
+            default -> new Error();
+        };
+    }
+
+    @Override
+    public String describe() {
+        return "Sync NSFW filters";
+    }
+
+    @Inject
+    public UpdateNsfwFiltersActor(Gson gson,
+                                  ServiceConfiguration serviceConfiguration,
+                                  NsfwDomainFilter nsfwDomainFilter)
+    {
+        super(gson);
+        this.serviceConfiguration = serviceConfiguration;
+        this.nsfwDomainFilter = nsfwDomainFilter;
+    }
+
+}
--- a/code/execution/java/nu/marginalia/execution/ExecutorExportGrpcService.java
+++ b/code/execution/java/nu/marginalia/execution/ExecutorExportGrpcService.java
@@ -49,6 +49,7 @@ public class ExecutorExportGrpcService
                    new ExportSampleDataActor.Export(
                            FileStorageId.of(request.getFileStorageId()),
                            request.getSize(),
+                            request.getCtFilter(),
                            request.getName()
                    )
            );
--- a/code/execution/java/nu/marginalia/process/ProcessService.java
+++ b/code/execution/java/nu/marginalia/process/ProcessService.java
@@ -8,6 +8,7 @@ import nu.marginalia.crawl.CrawlerMain;
 import nu.marginalia.index.IndexConstructorMain;
 import nu.marginalia.livecrawler.LiveCrawlerMain;
 import nu.marginalia.loading.LoaderMain;
+import nu.marginalia.ping.PingMain;
 import nu.marginalia.service.control.ServiceEventLog;
 import nu.marginalia.service.server.BaseServiceParams;
 import nu.marginalia.task.ExportTasksMain;
@@ -41,6 +42,7 @@ public class ProcessService {
        return switch (id) {
            case "converter" -> ProcessId.CONVERTER;
            case "crawler" -> ProcessId.CRAWLER;
+            case "ping" -> ProcessId.PING;
            case "loader" -> ProcessId.LOADER;
            case "export-tasks" -> ProcessId.EXPORT_TASKS;
            case "index-constructor" -> ProcessId.INDEX_CONSTRUCTOR;
@@ -50,6 +52,7 @@ public class ProcessService {

    public enum ProcessId {
        CRAWLER(CrawlerMain.class),
+        PING(PingMain.class),
        LIVE_CRAWLER(LiveCrawlerMain.class),
        CONVERTER(ConverterMain.class),
        LOADER(LoaderMain.class),
@@ -68,6 +71,7 @@ public class ProcessService {
                case LIVE_CRAWLER -> "LIVE_CRAWLER_PROCESS_OPTS";
                case CONVERTER -> "CONVERTER_PROCESS_OPTS";
                case LOADER -> "LOADER_PROCESS_OPTS";
+                case PING -> "PING_PROCESS_OPTS";
                case INDEX_CONSTRUCTOR -> "INDEX_CONSTRUCTION_PROCESS_OPTS";
                case EXPORT_TASKS -> "EXPORT_TASKS_PROCESS_OPTS";
            };
--- a/code/features-search/random-websites/java/nu/marginalia/browse/model/BrowseResultSet.java
+++ b/code/features-search/random-websites/java/nu/marginalia/browse/model/BrowseResultSet.java
@@ -6,4 +6,8 @@ public record BrowseResultSet(Collection<BrowseResult> results, String focusDoma
    public BrowseResultSet(Collection<BrowseResult> results) {
        this(results, "");
    }
+
+    public boolean hasFocusDomain() {
+        return focusDomain != null && !focusDomain.isBlank();
+    }
 }
--- a/code/functions/domain-info/api/java/nu/marginalia/api/domains/DomainsProtobufCodec.java
+++ b/code/functions/domain-info/api/java/nu/marginalia/api/domains/DomainsProtobufCodec.java
@@ -38,6 +38,7 @@ public class DomainsProtobufCodec {
                        sd.getIndexed(),
                        sd.getActive(),
                        sd.getScreenshot(),
+                        sd.getFeed(),
                        SimilarDomain.LinkType.valueOf(sd.getLinkType().name())
                );
            }
--- a/code/functions/domain-info/api/java/nu/marginalia/api/domains/model/DomainInformation.java
+++ b/code/functions/domain-info/api/java/nu/marginalia/api/domains/model/DomainInformation.java
@@ -71,6 +71,23 @@ public class DomainInformation {
        return new String(Character.toChars(firstChar)) + new String(Character.toChars(secondChar));
    }

+    public String getAsnFlag() {
+        if (asnCountry == null || asnCountry.codePointCount(0, asnCountry.length()) != 2) {
+            return "";
+        }
+        String country = asnCountry;
+
+        if ("UK".equals(country)) {
+            country = "GB";
+        }
+
+        int offset = 0x1F1E6;
+        int asciiOffset = 0x41;
+        int firstChar = Character.codePointAt(country, 0) - asciiOffset + offset;
+        int secondChar = Character.codePointAt(country, 1) - asciiOffset + offset;
+        return new String(Character.toChars(firstChar)) + new String(Character.toChars(secondChar));
+    }
+
    public EdgeDomain getDomain() {
        return this.domain;
    }
--- a/code/functions/domain-info/api/java/nu/marginalia/api/domains/model/SimilarDomain.java
+++ b/code/functions/domain-info/api/java/nu/marginalia/api/domains/model/SimilarDomain.java
@@ -9,6 +9,7 @@ public record SimilarDomain(EdgeUrl url,
                            boolean indexed,
                            boolean active,
                            boolean screenshot,
+                            boolean feed,
                            LinkType linkType) {

    public String getRankSymbols() {
@@ -52,12 +53,12 @@ public record SimilarDomain(EdgeUrl url,
            return NONE;
        }

-        public String toString() {
+        public String faIcon() {
            return switch (this) {
-                case FOWARD -> "&#8594;";
-                case BACKWARD -> "&#8592;";
-                case BIDIRECTIONAL -> "&#8646;";
-                case NONE -> "-";
+                case FOWARD -> "fa-solid fa-arrow-right";
+                case BACKWARD -> "fa-solid fa-arrow-left";
+                case BIDIRECTIONAL -> "fa-solid fa-arrow-right-arrow-left";
+                case NONE -> "";
            };
        }

--- a/code/functions/domain-info/api/src/main/protobuf/domain-info.proto
+++ b/code/functions/domain-info/api/src/main/protobuf/domain-info.proto
@@ -101,6 +101,7 @@ message RpcSimilarDomain {
  bool active = 6;
  bool screenshot = 7;
  LINK_TYPE linkType = 8;
+  bool feed = 9;

  enum LINK_TYPE {
      BACKWARD = 0;
--- a/code/functions/domain-info/java/nu/marginalia/functions/domains/SimilarDomainsService.java
+++ b/code/functions/domain-info/java/nu/marginalia/functions/domains/SimilarDomainsService.java
@@ -9,6 +9,7 @@ import gnu.trove.map.hash.TIntIntHashMap;
 import gnu.trove.set.TIntSet;
 import gnu.trove.set.hash.TIntHashSet;
 import it.unimi.dsi.fastutil.ints.Int2DoubleArrayMap;
+import nu.marginalia.WmsaHome;
 import nu.marginalia.api.domains.RpcSimilarDomain;
 import nu.marginalia.api.domains.model.SimilarDomain;
 import nu.marginalia.api.linkgraph.AggregateLinkGraphClient;
@@ -17,10 +18,14 @@ import org.roaringbitmap.RoaringBitmap;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

+import java.nio.file.Path;
+import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.util.ArrayList;
+import java.util.HashSet;
 import java.util.List;
+import java.util.Set;
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.TimeUnit;
@@ -32,12 +37,13 @@ public class SimilarDomainsService {
    private final HikariDataSource dataSource;
    private final AggregateLinkGraphClient linkGraphClient;

-    private volatile TIntIntHashMap domainIdToIdx = new TIntIntHashMap(100_000);
+    private final TIntIntHashMap domainIdToIdx = new TIntIntHashMap(100_000);
    private volatile int[] domainIdxToId;

    public volatile Int2DoubleArrayMap[] relatedDomains;
    public volatile TIntList[] domainNeighbors = null;
    public volatile RoaringBitmap screenshotDomains = null;
+    public volatile RoaringBitmap feedDomains = null;
    public volatile RoaringBitmap activeDomains = null;
    public volatile RoaringBitmap indexedDomains = null;
    public volatile TIntDoubleHashMap domainRanks = null;
@@ -82,6 +88,7 @@ public class SimilarDomainsService {
                domainNames = new String[domainIdToIdx.size()];
                domainNeighbors = new TIntList[domainIdToIdx.size()];
                screenshotDomains = new RoaringBitmap();
+                feedDomains = new RoaringBitmap();
                activeDomains = new RoaringBitmap();
                indexedDomains = new RoaringBitmap();
                relatedDomains = new Int2DoubleArrayMap[domainIdToIdx.size()];
@@ -145,10 +152,12 @@ public class SimilarDomainsService {
                        activeDomains.add(idx);
                }

-                updateScreenshotInfo();
-
                logger.info("Loaded {} domains", domainRanks.size());
                isReady = true;
+
+                // We can defer these as they only populate a roaringbitmap, and will degrade gracefully when not complete
+                updateScreenshotInfo();
+                updateFeedInfo();
            }
        }
        catch (SQLException throwables) {
@@ -156,6 +165,42 @@ public class SimilarDomainsService {
        }
    }

+    private void updateFeedInfo() {
+        Set<String> feedsDomainNames = new HashSet<>(500_000);
+        Path readerDbPath = WmsaHome.getDataPath().resolve("rss-feeds.db").toAbsolutePath();
+        String dbUrl = "jdbc:sqlite:" + readerDbPath;
+
+        logger.info("Opening feed db at " + dbUrl);
+
+        try (var conn = DriverManager.getConnection(dbUrl);
+             var stmt = conn.createStatement()) {
+            var rs = stmt.executeQuery("""
+                select
+                    json_extract(feed, '$.domain') as domain
+                from feed
+                where json_array_length(feed, '$.items') > 0
+                """);
+            while (rs.next()) {
+                feedsDomainNames.add(rs.getString(1));
+            }
+        }
+        catch (SQLException ex) {
+            logger.error("Failed to read RSS feed items", ex);
+        }
+
+        for (int idx = 0; idx < domainNames.length; idx++) {
+            String name = domainNames[idx];
+            if (name == null) {
+                continue;
+            }
+
+            if (feedsDomainNames.contains(name)) {
+                feedDomains.add(idx);
+            }
+        }
+
+    }
+
    private void updateScreenshotInfo() {
        try (var connection = dataSource.getConnection()) {
            try (var stmt = connection.createStatement()) {
@@ -254,6 +299,7 @@ public class SimilarDomainsService {
                    .setIndexed(indexedDomains.contains(idx))
                    .setActive(activeDomains.contains(idx))
                    .setScreenshot(screenshotDomains.contains(idx))
+                    .setFeed(feedDomains.contains(idx))
                    .setLinkType(RpcSimilarDomain.LINK_TYPE.valueOf(linkType.name()))
                    .build());

@@ -369,6 +415,7 @@ public class SimilarDomainsService {
                            .setIndexed(indexedDomains.contains(idx))
                            .setActive(activeDomains.contains(idx))
                            .setScreenshot(screenshotDomains.contains(idx))
+                            .setFeed(feedDomains.contains(idx))
                            .setLinkType(RpcSimilarDomain.LINK_TYPE.valueOf(linkType.name()))
                    .build());

--- a/code/functions/favicon/api/build.gradle
+++ b/code/functions/favicon/api/build.gradle
@@ -0,0 +1,47 @@
+plugins {
+    id 'java'
+
+    id "com.google.protobuf" version "0.9.4"
+    id 'jvm-test-suite'
+}
+
+java {
+    toolchain {
+        languageVersion.set(JavaLanguageVersion.of(rootProject.ext.jvmVersion))
+    }
+}
+
+jar.archiveBaseName = 'favicon-api'
+
+apply from: "$rootProject.projectDir/protobuf.gradle"
+apply from: "$rootProject.projectDir/srcsets.gradle"
+
+dependencies {
+    implementation project(':code:common:model')
+    implementation project(':code:common:config')
+    implementation project(':code:common:service')
+
+    implementation libs.bundles.slf4j
+
+    implementation libs.prometheus
+    implementation libs.notnull
+    implementation libs.guava
+    implementation dependencies.create(libs.guice.get()) {
+        exclude group: 'com.google.guava'
+    }
+    implementation libs.gson
+    implementation libs.bundles.protobuf
+    implementation libs.guava
+    libs.bundles.grpc.get().each {
+        implementation dependencies.create(it) {
+            exclude group: 'com.google.guava'
+        }
+    }
+
+
+
+    testImplementation libs.bundles.slf4j.test
+    testImplementation libs.bundles.junit
+    testImplementation libs.mockito
+
+}
--- a/code/functions/favicon/api/java/nu/marginalia/api/favicon/FaviconClient.java
+++ b/code/functions/favicon/api/java/nu/marginalia/api/favicon/FaviconClient.java
@@ -0,0 +1,39 @@
+package nu.marginalia.api.favicon;
+
+import com.google.inject.Inject;
+import nu.marginalia.service.client.GrpcChannelPoolFactory;
+import nu.marginalia.service.client.GrpcMultiNodeChannelPool;
+import nu.marginalia.service.discovery.property.ServiceKey;
+import nu.marginalia.service.discovery.property.ServicePartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Optional;
+
+public class FaviconClient {
+    private static final Logger logger = LoggerFactory.getLogger(FaviconClient.class);
+
+    private final GrpcMultiNodeChannelPool<FaviconAPIGrpc.FaviconAPIBlockingStub> channelPool;
+
+    @Inject
+    public FaviconClient(GrpcChannelPoolFactory factory) {
+        this.channelPool = factory.createMulti(
+                ServiceKey.forGrpcApi(FaviconAPIGrpc.class, ServicePartition.multi()),
+                FaviconAPIGrpc::newBlockingStub);
+    }
+
+    public record FaviconData(byte[] bytes, String contentType) {}
+
+
+    public Optional<FaviconData> getFavicon(String domain, int node) {
+        RpcFaviconResponse rsp = channelPool.call(FaviconAPIGrpc.FaviconAPIBlockingStub::getFavicon)
+                .forNode(node)
+                .run(RpcFaviconRequest.newBuilder().setDomain(domain).build());
+
+        if (rsp.getData().isEmpty())
+            return Optional.empty();
+
+        return Optional.of(new FaviconData(rsp.getData().toByteArray(), rsp.getContentType()));
+    }
+
+}
--- a/code/functions/favicon/api/src/main/protobuf/favicon.proto
+++ b/code/functions/favicon/api/src/main/protobuf/favicon.proto
@@ -0,0 +1,20 @@
+syntax="proto3";
+package marginalia.api.favicon;
+
+option java_package="nu.marginalia.api.favicon";
+option java_multiple_files=true;
+
+service FaviconAPI {
+  /** Fetches information about a domain. */
+  rpc getFavicon(RpcFaviconRequest) returns (RpcFaviconResponse) {}
+}
+
+message RpcFaviconRequest {
+  string domain = 1;
+}
+
+message RpcFaviconResponse {
+  string domain = 1;
+  bytes data = 2;
+  string contentType = 3;
+}
--- a/code/functions/favicon/build.gradle
+++ b/code/functions/favicon/build.gradle
@@ -0,0 +1,49 @@
+plugins {
+    id 'java'
+
+    id 'application'
+    id 'jvm-test-suite'
+}
+
+java {
+    toolchain {
+        languageVersion.set(JavaLanguageVersion.of(rootProject.ext.jvmVersion))
+    }
+}
+
+apply from: "$rootProject.projectDir/srcsets.gradle"
+
+dependencies {
+    implementation project(':code:common:config')
+    implementation project(':code:common:service')
+    implementation project(':code:common:model')
+    implementation project(':code:common:db')
+    implementation project(':code:functions:favicon:api')
+    implementation project(':code:processes:crawling-process')
+
+    implementation libs.bundles.slf4j
+
+    implementation libs.prometheus
+    implementation libs.guava
+    libs.bundles.grpc.get().each {
+        implementation dependencies.create(it) {
+            exclude group: 'com.google.guava'
+        }
+    }
+
+
+    implementation libs.notnull
+    implementation libs.guava
+    implementation dependencies.create(libs.guice.get()) {
+        exclude group: 'com.google.guava'
+    }
+    implementation dependencies.create(libs.spark.get()) {
+        exclude group: 'org.eclipse.jetty'
+    }
+
+    testImplementation libs.bundles.slf4j.test
+    testImplementation libs.bundles.junit
+    testImplementation libs.mockito
+
+
+}
--- a/code/functions/favicon/java/nu/marginalia/functions/favicon/FaviconGrpcService.java
+++ b/code/functions/favicon/java/nu/marginalia/functions/favicon/FaviconGrpcService.java
@@ -0,0 +1,48 @@
+package nu.marginalia.functions.favicon;
+
+import com.google.inject.Inject;
+import com.google.inject.Singleton;
+import com.google.protobuf.ByteString;
+import io.grpc.stub.StreamObserver;
+import nu.marginalia.api.favicon.FaviconAPIGrpc;
+import nu.marginalia.api.favicon.RpcFaviconRequest;
+import nu.marginalia.api.favicon.RpcFaviconResponse;
+import nu.marginalia.crawl.DomainStateDb;
+import nu.marginalia.service.server.DiscoverableService;
+
+import java.util.Optional;
+
+@Singleton
+public class FaviconGrpcService extends FaviconAPIGrpc.FaviconAPIImplBase implements DiscoverableService {
+    private final DomainStateDb domainStateDb;
+
+    @Inject
+    public FaviconGrpcService(DomainStateDb domainStateDb) {
+        this.domainStateDb = domainStateDb;
+    }
+
+    public boolean shouldRegisterService() {
+        return domainStateDb.isAvailable();
+    }
+
+    @Override
+    public void getFavicon(RpcFaviconRequest request, StreamObserver<RpcFaviconResponse> responseObserver) {
+        Optional<DomainStateDb.FaviconRecord> icon = domainStateDb.getIcon(request.getDomain());
+
+        RpcFaviconResponse response;
+        if (icon.isEmpty()) {
+            response = RpcFaviconResponse.newBuilder().build();
+        }
+        else {
+            var iconRecord = icon.get();
+            response = RpcFaviconResponse.newBuilder()
+                            .setContentType(iconRecord.contentType())
+                            .setDomain(request.getDomain())
+                            .setData(ByteString.copyFrom(iconRecord.imageData()))
+                            .build();
+        }
+
+        responseObserver.onNext(response);
+        responseObserver.onCompleted();
+    }
+}
--- a/code/functions/live-capture/api/java/nu/marginalia/api/feeds/FeedsClient.java
+++ b/code/functions/live-capture/api/java/nu/marginalia/api/feeds/FeedsClient.java
@@ -59,12 +59,6 @@ public class FeedsClient {
                .forEachRemaining(rsp -> consumer.accept(rsp.getDomain(), new ArrayList<>(rsp.getUrlList())));
    }

-    public record UpdatedDomain(String domain, List<String> urls) {
-        public UpdatedDomain(RpcUpdatedLinksResponse rsp) {
-            this(rsp.getDomain(), new ArrayList<>(rsp.getUrlList()));
-        }
-    }
-
    /** Get the hash of the feed data, for identifying when the data has been updated */
    public String getFeedDataHash() {
        return channelPool.call(FeedApiGrpc.FeedApiBlockingStub::getFeedDataHash)
--- a/code/functions/live-capture/api/java/nu/marginalia/api/livecapture/LiveCaptureClient.java
+++ b/code/functions/live-capture/api/java/nu/marginalia/api/livecapture/LiveCaptureClient.java
@@ -5,6 +5,7 @@ import com.google.inject.Singleton;
 import nu.marginalia.api.livecapture.LiveCaptureApiGrpc.LiveCaptureApiBlockingStub;
 import nu.marginalia.service.client.GrpcChannelPoolFactory;
 import nu.marginalia.service.client.GrpcSingleNodeChannelPool;
+import nu.marginalia.service.client.ServiceNotAvailableException;
 import nu.marginalia.service.discovery.property.ServiceKey;
 import nu.marginalia.service.discovery.property.ServicePartition;
 import org.slf4j.Logger;
@@ -29,6 +30,9 @@ public class LiveCaptureClient {
            channelPool.call(LiveCaptureApiBlockingStub::requestScreengrab)
                    .run(RpcDomainId.newBuilder().setDomainId(domainId).build());
        }
+        catch (ServiceNotAvailableException e) {
+            logger.info("requestScreengrab() failed since the service is not available");
+        }
        catch (Exception e) {
            logger.error("API Exception", e);
        }
--- a/code/functions/live-capture/api/src/main/protobuf/feeds.proto
+++ b/code/functions/live-capture/api/src/main/protobuf/feeds.proto
@@ -46,6 +46,7 @@ message RpcFeed {
  string feedUrl = 3;
  string updated = 4;
  repeated RpcFeedItem items = 5;
+  int64 fetchTimestamp = 6;
 }

 message RpcFeedItem {
--- a/code/functions/live-capture/build.gradle
+++ b/code/functions/live-capture/build.gradle
@@ -24,14 +24,17 @@ dependencies {
    implementation project(':code:libraries:message-queue')

    implementation project(':code:execution:api')
+    implementation project(':code:processes:crawling-process:ft-content-type')
+    implementation project(':third-party:rssreader')

    implementation libs.jsoup
-    implementation libs.rssreader
    implementation libs.opencsv
+    implementation libs.slop
    implementation libs.sqlite
    implementation libs.bundles.slf4j
    implementation libs.commons.lang3
    implementation libs.commons.io
+    implementation libs.wiremock

    implementation libs.prometheus
    implementation libs.guava
@@ -54,8 +57,6 @@ dependencies {
    implementation libs.bundles.gson
    implementation libs.bundles.mariadb

-
-
    testImplementation libs.bundles.slf4j.test
    testImplementation libs.bundles.junit
    testImplementation libs.mockito
--- a/code/functions/live-capture/java/nu/marginalia/domsample/DomSampleService.java
+++ b/code/functions/live-capture/java/nu/marginalia/domsample/DomSampleService.java
@@ -0,0 +1,126 @@
+package nu.marginalia.domsample;
+
+import com.google.inject.Inject;
+import com.zaxxer.hikari.HikariDataSource;
+import jakarta.inject.Named;
+import nu.marginalia.domsample.db.DomSampleDb;
+import nu.marginalia.livecapture.BrowserlessClient;
+import nu.marginalia.service.module.ServiceConfiguration;
+import org.apache.commons.lang3.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.time.Duration;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+public class DomSampleService {
+    private final DomSampleDb db;
+    private final HikariDataSource mariadbDataSource;
+    private final URI browserlessURI;
+
+    private static final Logger logger = LoggerFactory.getLogger(DomSampleService.class);
+
+    @Inject
+    public DomSampleService(DomSampleDb db,
+                            HikariDataSource mariadbDataSource,
+                            @Named("browserless-uri") String browserlessAddress,
+                            ServiceConfiguration serviceConfiguration)
+            throws URISyntaxException
+    {
+        this.db = db;
+        this.mariadbDataSource = mariadbDataSource;
+
+        if (StringUtils.isEmpty(browserlessAddress) || serviceConfiguration.node() > 1) {
+            logger.warn("Live capture service will not run");
+            browserlessURI = null;
+        }
+        else {
+            browserlessURI = new URI(browserlessAddress);
+        }
+    }
+
+    public void start() {
+        if (browserlessURI == null) {
+            logger.warn("DomSampleService is not enabled due to missing browserless URI or multi-node configuration");
+            return;
+        }
+
+        Thread.ofPlatform().daemon().start(this::run);
+    }
+
+    public void syncDomains() {
+        Set<String> dbDomains = new HashSet<>();
+
+        logger.info("Fetching domains from database...");
+
+        try (var conn = mariadbDataSource.getConnection();
+            var stmt = conn.prepareStatement("""
+                SELECT DOMAIN_NAME 
+                FROM EC_DOMAIN 
+                WHERE NODE_AFFINITY>0
+                """)
+        ) {
+            var rs = stmt.executeQuery();
+            while (rs.next()) {
+                dbDomains.add(rs.getString("DOMAIN_NAME"));
+            }
+        } catch (Exception e) {
+            throw new RuntimeException("Failed to sync domains", e);
+        }
+
+        logger.info("Found {} domains in database", dbDomains.size());
+
+        db.syncDomains(dbDomains);
+
+        logger.info("Synced domains to sqlite");
+    }
+
+    public void run() {
+
+        try (var client = new BrowserlessClient(browserlessURI)) {
+
+            while (!Thread.currentThread().isInterrupted()) {
+
+                try {
+                    // Grace sleep in case we're operating on an empty domain list
+                    TimeUnit.SECONDS.sleep(15);
+
+                    syncDomains();
+                    var domains = db.getScheduledDomains();
+
+                    for (var domain : domains) {
+                        updateDomain(client, domain);
+                    }
+                } catch (InterruptedException e) {
+                    Thread.currentThread().interrupt();
+                    logger.info("DomSampleService interrupted, stopping...");
+                    return;
+                } catch (Exception e) {
+                    logger.error("Error in DomSampleService run loop", e);
+                }
+            }
+
+        }
+    }
+
+    private void updateDomain(BrowserlessClient client, String domain) {
+        var rootUrl = "https://" + domain + "/";
+        try {
+            var content = client.annotatedContent(rootUrl, new BrowserlessClient.GotoOptions("load", Duration.ofSeconds(10).toMillis()));
+
+            if (content.isPresent()) {
+                db.saveSample(domain, rootUrl, content.get());
+            }
+        } catch (Exception e) {
+            logger.error("Failed to process domain: " + domain, e);
+        }
+        finally {
+            db.flagDomainAsFetched(domain);
+        }
+    }
+
+}
--- a/code/functions/live-capture/java/nu/marginalia/domsample/db/DomSampleDb.java
+++ b/code/functions/live-capture/java/nu/marginalia/domsample/db/DomSampleDb.java
@@ -0,0 +1,174 @@
+package nu.marginalia.domsample.db;
+
+import nu.marginalia.WmsaHome;
+import org.jsoup.Jsoup;
+
+import java.nio.file.Path;
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.SQLException;
+import java.util.*;
+
+public class DomSampleDb implements AutoCloseable {
+    private static final String dbFileName = "dom-sample.db";
+    private final Connection connection;
+
+    public DomSampleDb() throws SQLException{
+        this(WmsaHome.getDataPath().resolve(dbFileName));
+    }
+
+    public DomSampleDb(Path dbPath) throws SQLException {
+        String dbUrl = "jdbc:sqlite:" + dbPath.toAbsolutePath();
+
+        connection = DriverManager.getConnection(dbUrl);
+
+        try (var stmt = connection.createStatement()) {
+            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS samples (url TEXT PRIMARY KEY, domain TEXT, sample BLOB, requests BLOB, accepted_popover BOOLEAN DEFAULT FALSE)");
+            stmt.executeUpdate("CREATE INDEX IF NOT EXISTS domain_index ON samples (domain)");
+            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS schedule (domain TEXT PRIMARY KEY, last_fetch TIMESTAMP DEFAULT NULL)");
+            stmt.execute("PRAGMA journal_mode=WAL");
+        }
+
+    }
+
+    public void syncDomains(Set<String> domains) {
+        Set<String> currentDomains = new HashSet<>();
+        try (var stmt = connection.prepareStatement("SELECT domain FROM schedule")) {
+            var rs = stmt.executeQuery();
+            while (rs.next()) {
+                currentDomains.add(rs.getString("domain"));
+            }
+        } catch (SQLException e) {
+            throw new RuntimeException("Failed to sync domains", e);
+        }
+
+        Set<String> toRemove = new HashSet<>(currentDomains);
+        Set<String> toAdd = new HashSet<>(domains);
+
+        toRemove.removeAll(domains);
+        toAdd.removeAll(currentDomains);
+
+        try (var removeStmt = connection.prepareStatement("DELETE FROM schedule WHERE domain = ?");
+                var addStmt = connection.prepareStatement("INSERT OR IGNORE INTO schedule (domain) VALUES (?)")
+        ) {
+            for (String domain : toRemove) {
+                removeStmt.setString(1, domain);
+                removeStmt.executeUpdate();
+            }
+
+            for (String domain : toAdd) {
+                addStmt.setString(1, domain);
+                addStmt.executeUpdate();
+            }
+        } catch (SQLException e) {
+            throw new RuntimeException("Failed to remove domains", e);
+        }
+    }
+
+    public List<String> getScheduledDomains() {
+        List<String> domains = new ArrayList<>();
+        try (var stmt = connection.prepareStatement("SELECT domain FROM schedule ORDER BY last_fetch IS NULL DESC, last_fetch ASC")) {
+            var rs = stmt.executeQuery();
+            while (rs.next()) {
+                domains.add(rs.getString("domain"));
+            }
+        } catch (SQLException e) {
+            throw new RuntimeException("Failed to get scheduled domains", e);
+        }
+        return domains;
+    }
+
+    public void flagDomainAsFetched(String domain) {
+        try (var stmt = connection.prepareStatement("INSERT OR REPLACE INTO schedule (domain, last_fetch) VALUES (?, CURRENT_TIMESTAMP)")) {
+            stmt.setString(1, domain);
+            stmt.executeUpdate();
+        } catch (SQLException e) {
+            throw new RuntimeException("Failed to flag domain as fetched", e);
+        }
+    }
+
+
+    public record Sample(String url, String domain, String sample, String requests, boolean acceptedPopover) {}
+
+    public List<Sample> getSamples(String domain) throws SQLException {
+        List<Sample> samples = new ArrayList<>();
+
+        try (var stmt = connection.prepareStatement("""
+                SELECT url, sample, requests, accepted_popover
+                FROM samples 
+                WHERE domain = ?
+                """))
+        {
+            stmt.setString(1, domain);
+            var rs = stmt.executeQuery();
+            while (rs.next()) {
+                samples.add(
+                        new Sample(
+                                rs.getString("url"),
+                                domain,
+                                rs.getString("sample"),
+                                rs.getString("requests"),
+                                rs.getBoolean("accepted_popover")
+                        )
+                );
+            }
+        }
+        return samples;
+    }
+
+    public void saveSample(String domain, String url, String rawContent) throws SQLException {
+        var doc = Jsoup.parse(rawContent);
+
+        var networkRequests = doc.getElementById("marginalia-network-requests");
+
+        boolean acceptedPopover = false;
+
+        StringBuilder requestTsv = new StringBuilder();
+        if (networkRequests != null) {
+
+            acceptedPopover = !networkRequests.getElementsByClass("marginalia-agreed-cookies").isEmpty();
+
+            for (var request : networkRequests.getElementsByClass("network-request")) {
+                String method = request.attr("data-method");
+                String urlAttr = request.attr("data-url");
+                String timestamp = request.attr("data-timestamp");
+
+                requestTsv
+                        .append(method)
+                        .append('\t')
+                        .append(timestamp)
+                        .append('\t')
+                        .append(urlAttr.replace('\n', ' '))
+                        .append("\n");
+            }
+
+            networkRequests.remove();
+        }
+
+        doc.body().removeAttr("id");
+
+        String sample = doc.html();
+
+        saveSampleRaw(domain, url, sample, requestTsv.toString().trim(), acceptedPopover);
+
+    }
+
+    public void saveSampleRaw(String domain, String url, String sample, String requests, boolean acceptedPopover) throws SQLException {
+        try (var stmt = connection.prepareStatement("""
+                INSERT OR REPLACE 
+                INTO samples (domain, url, sample, requests, accepted_popover) 
+                VALUES (?, ?, ?, ?, ?)
+                """)) {
+            stmt.setString(1, domain);
+            stmt.setString(2, url);
+            stmt.setString(3, sample);
+            stmt.setString(4, requests);
+            stmt.setBoolean(5, acceptedPopover);
+            stmt.executeUpdate();
+        }
+    }
+
+    public void close() throws SQLException {
+        connection.close();
+    }
+}
--- a/code/functions/live-capture/java/nu/marginalia/livecapture/BrowserlessClient.java
+++ b/code/functions/live-capture/java/nu/marginalia/livecapture/BrowserlessClient.java
@@ -1,21 +1,28 @@
 package nu.marginalia.livecapture;

 import com.google.gson.Gson;
+import nu.marginalia.WmsaHome;
 import nu.marginalia.model.gson.GsonFactory;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 import java.io.IOException;
 import java.net.URI;
+import java.net.URLEncoder;
 import java.net.http.HttpClient;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;
+import java.nio.charset.StandardCharsets;
 import java.time.Duration;
+import java.util.List;
 import java.util.Map;
+import java.util.Optional;

 /** Client for local browserless.io API */
 public class BrowserlessClient implements AutoCloseable {
+
    private static final Logger logger = LoggerFactory.getLogger(BrowserlessClient.class);
+    private static final String BROWSERLESS_TOKEN = System.getProperty("live-capture.browserless-token", "BROWSERLESS_TOKEN");

    private final HttpClient httpClient = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
@@ -25,18 +32,21 @@ public class BrowserlessClient implements AutoCloseable {
    private final URI browserlessURI;
    private final Gson gson = GsonFactory.get();

+    private final String userAgent = WmsaHome.getUserAgent().uaString();
+
    public BrowserlessClient(URI browserlessURI) {
        this.browserlessURI = browserlessURI;
    }

-    public String content(String url, GotoOptions gotoOptions) throws IOException, InterruptedException {
+    public Optional<String> content(String url, GotoOptions gotoOptions) throws IOException, InterruptedException {
        Map<String, Object> requestData = Map.of(
                "url", url,
+                "userAgent", userAgent,
                "gotoOptions", gotoOptions
        );

        var request = HttpRequest.newBuilder()
-                .uri(browserlessURI.resolve("/content"))
+                .uri(browserlessURI.resolve("/content?token="+BROWSERLESS_TOKEN))
                .method("POST", HttpRequest.BodyPublishers.ofString(
                        gson.toJson(requestData)
                ))
@@ -47,10 +57,46 @@ public class BrowserlessClient implements AutoCloseable {

        if (rsp.statusCode() >= 300) {
            logger.info("Failed to fetch content for {}, status {}", url, rsp.statusCode());
-            return null;
+            return Optional.empty();
        }

-        return rsp.body();
+        return Optional.of(rsp.body());
+    }
+
+    /** Fetches content with a marginalia hack extension loaded that decorates the DOM with attributes for
+     * certain CSS attributes, to be able to easier identify popovers and other nuisance elements.
+     */
+    public Optional<String> annotatedContent(String url, GotoOptions gotoOptions) throws IOException, InterruptedException {
+        Map<String, Object> requestData = Map.of(
+                "url", url,
+                "userAgent", userAgent,
+                "gotoOptions", gotoOptions,
+                "waitForSelector", Map.of("selector", "#marginaliahack", "timeout", 15000)
+        );
+
+        // Launch parameters for the browserless instance to load the extension
+        Map<String, Object> launchParameters = Map.of(
+                "args", List.of("--load-extension=/dom-export")
+        );
+
+        String launchParametersStr = URLEncoder.encode(gson.toJson(launchParameters), StandardCharsets.UTF_8);
+
+        var request = HttpRequest.newBuilder()
+                .uri(browserlessURI.resolve("/content?token="+BROWSERLESS_TOKEN+"&launch="+launchParametersStr))
+                .method("POST", HttpRequest.BodyPublishers.ofString(
+                        gson.toJson(requestData)
+                ))
+                .header("Content-type", "application/json")
+                .build();
+
+        var rsp = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
+
+        if (rsp.statusCode() >= 300) {
+            logger.info("Failed to fetch annotated content for {}, status {}", url, rsp.statusCode());
+            return Optional.empty();
+        }
+
+        return Optional.of(rsp.body());
    }

    public byte[] screenshot(String url, GotoOptions gotoOptions, ScreenshotOptions screenshotOptions)
@@ -58,12 +104,13 @@ public class BrowserlessClient implements AutoCloseable {

        Map<String, Object> requestData = Map.of(
                "url", url,
+                "userAgent", userAgent,
                "options", screenshotOptions,
                "gotoOptions", gotoOptions
        );

        var request = HttpRequest.newBuilder()
-                .uri(browserlessURI.resolve("/screenshot"))
+                .uri(browserlessURI.resolve("/screenshot?token="+BROWSERLESS_TOKEN))
                .method("POST", HttpRequest.BodyPublishers.ofString(
                        gson.toJson(requestData)
                ))
@@ -82,7 +129,7 @@ public class BrowserlessClient implements AutoCloseable {
    }

    @Override
-    public void close() throws Exception {
+    public void close() {
        httpClient.shutdownNow();
    }

--- a/code/functions/live-capture/java/nu/marginalia/livecapture/LiveCaptureGrpcService.java
+++ b/code/functions/live-capture/java/nu/marginalia/livecapture/LiveCaptureGrpcService.java
@@ -126,7 +126,6 @@ public class LiveCaptureGrpcService
                }
                else {
                    EdgeDomain domain = domainNameOpt.get();
-                    String domainNameStr = domain.toString();

                    if (!isValidDomainForCapture(domain)) {
                        ScreenshotDbOperations.flagDomainAsFetched(conn, domain);
--- a/code/functions/live-capture/java/nu/marginalia/rss/db/FeedDb.java
+++ b/code/functions/live-capture/java/nu/marginalia/rss/db/FeedDb.java
@@ -8,13 +8,16 @@ import nu.marginalia.rss.model.FeedDefinition;
 import nu.marginalia.rss.model.FeedItems;
 import nu.marginalia.service.module.ServiceConfiguration;
 import org.jetbrains.annotations.NotNull;
+import org.jetbrains.annotations.Nullable;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 import java.io.BufferedInputStream;
+import java.io.IOException;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.StandardCopyOption;
+import java.nio.file.attribute.PosixFileAttributes;
 import java.security.MessageDigest;
 import java.time.Instant;
 import java.util.Base64;
@@ -125,6 +128,26 @@ public class FeedDb {
        return FeedItems.none();
    }

+
+    @Nullable
+    public String getEtag(EdgeDomain domain) {
+        if (!feedDbEnabled) {
+            throw new IllegalStateException("Feed database is disabled on this node");
+        }
+
+        // Capture the current reader to avoid concurrency issues
+        FeedDbReader reader = this.reader;
+        try {
+            if (reader != null) {
+                return reader.getEtag(domain);
+            }
+        }
+        catch (Exception e) {
+            logger.error("Error getting etag for " + domain, e);
+        }
+        return null;
+    }
+
    public Optional<String> getFeedAsJson(String domain) {
        if (!feedDbEnabled) {
            throw new IllegalStateException("Feed database is disabled on this node");
@@ -209,4 +232,36 @@ public class FeedDb {

        reader.getLinksUpdatedSince(since, consumer);
    }
+
+    public Instant getFetchTime() {
+        if (!Files.exists(readerDbPath)) {
+            return Instant.EPOCH;
+        }
+
+        try {
+            return Files.readAttributes(readerDbPath, PosixFileAttributes.class)
+                    .creationTime()
+                    .toInstant();
+        }
+        catch (IOException ex) {
+            logger.error("Failed to read the creatiom time of {}", readerDbPath);
+            return Instant.EPOCH;
+        }
+    }
+
+    public boolean hasData() {
+        if (!feedDbEnabled) {
+            throw new IllegalStateException("Feed database is disabled on this node");
+        }
+
+        // Capture the current reader to avoid concurrency issues
+        FeedDbReader reader = this.reader;
+
+        if (reader != null) {
+            return reader.hasData();
+        }
+
+        return false;
+    }
+
 }
--- a/code/functions/live-capture/java/nu/marginalia/rss/db/FeedDbReader.java
+++ b/code/functions/live-capture/java/nu/marginalia/rss/db/FeedDbReader.java
@@ -8,6 +8,7 @@ import nu.marginalia.rss.model.FeedItems;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

+import javax.annotation.Nullable;
 import java.nio.file.Path;
 import java.sql.Connection;
 import java.sql.DriverManager;
@@ -32,6 +33,7 @@ public class FeedDbReader implements AutoCloseable {
        try (var stmt = connection.createStatement()) {
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS feed (domain TEXT PRIMARY KEY, feed JSON)");
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS errors (domain TEXT PRIMARY KEY, cnt INT DEFAULT 0)");
+            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS etags (domain TEXT PRIMARY KEY, etag TEXT)");
        }
    }

@@ -106,6 +108,22 @@ public class FeedDbReader implements AutoCloseable {
        return FeedItems.none();
    }

+    @Nullable
+    public String getEtag(EdgeDomain domain) {
+        try (var stmt = connection.prepareStatement("SELECT etag FROM etags WHERE DOMAIN = ?")) {
+            stmt.setString(1, domain.toString());
+            var rs = stmt.executeQuery();
+
+            if (rs.next()) {
+                return rs.getString(1);
+            }
+        } catch (SQLException e) {
+            logger.error("Error getting etag for " + domain, e);
+        }
+
+        return null;
+    }
+
    private FeedItems deserialize(String string) {
        return gson.fromJson(string, FeedItems.class);
    }
@@ -141,4 +159,18 @@ public class FeedDbReader implements AutoCloseable {
    }


+    public boolean hasData() {
+        try (var stmt = connection.prepareStatement("SELECT 1 FROM feed LIMIT 1")) {
+            var rs = stmt.executeQuery();
+            if (rs.next()) {
+                return rs.getBoolean(1);
+            }
+            else {
+                return false;
+            }
+        }
+        catch (SQLException ex) {
+            return false;
+        }
+    }
 }
--- a/code/functions/live-capture/java/nu/marginalia/rss/db/FeedDbWriter.java
+++ b/code/functions/live-capture/java/nu/marginalia/rss/db/FeedDbWriter.java
@@ -20,6 +20,7 @@ public class FeedDbWriter implements AutoCloseable {
    private final Connection connection;
    private final PreparedStatement insertFeedStmt;
    private final PreparedStatement insertErrorStmt;
+    private final PreparedStatement insertEtagStmt;
    private final Path dbPath;

    private volatile boolean closed = false;
@@ -34,10 +35,12 @@ public class FeedDbWriter implements AutoCloseable {
        try (var stmt = connection.createStatement()) {
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS feed (domain TEXT PRIMARY KEY, feed JSON)");
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS errors (domain TEXT PRIMARY KEY, cnt INT DEFAULT 0)");
+            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS etags (domain TEXT PRIMARY KEY, etag TEXT)");
        }

        insertFeedStmt = connection.prepareStatement("INSERT INTO feed (domain, feed) VALUES (?, ?)");
        insertErrorStmt = connection.prepareStatement("INSERT INTO errors (domain, cnt) VALUES (?, ?)");
+        insertEtagStmt = connection.prepareStatement("INSERT INTO etags (domain, etag) VALUES (?, ?)");
    }

    public Path getDbPath() {
@@ -56,6 +59,20 @@ public class FeedDbWriter implements AutoCloseable {
        }
    }

+    public synchronized void saveEtag(String domain, String etag) {
+        if (etag == null || etag.isBlank())
+            return;
+
+        try {
+            insertEtagStmt.setString(1, domain.toLowerCase());
+            insertEtagStmt.setString(2, etag);
+            insertEtagStmt.executeUpdate();
+        }
+        catch (SQLException e) {
+            logger.error("Error saving etag for " + domain, e);
+        }
+    }
+
    public synchronized void setErrorCount(String domain, int count) {
        try {
            insertErrorStmt.setString(1, domain);
--- a/code/functions/live-capture/java/nu/marginalia/rss/model/FeedItem.java
+++ b/code/functions/live-capture/java/nu/marginalia/rss/model/FeedItem.java
@@ -1,6 +1,6 @@
 package nu.marginalia.rss.model;

-import com.apptasticsoftware.rssreader.Item;
+import nu.marginalia.rss.svc.SimpleFeedParser;
 import org.apache.commons.lang3.StringUtils;
 import org.jetbrains.annotations.NotNull;
 import org.jsoup.Jsoup;
@@ -18,37 +18,33 @@ public record FeedItem(String title,
    public static final int MAX_DESC_LENGTH = 255;
    public static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

-    public static FeedItem fromItem(Item item, boolean keepFragment) {
-        String title = item.getTitle().orElse("");
+    public static FeedItem fromItem(SimpleFeedParser.ItemData item, boolean keepFragment) {
+        String title = item.title();
        String date = getItemDate(item);
        String description = getItemDescription(item);
        String url;

-        if (keepFragment || item.getLink().isEmpty()) {
-            url = item.getLink().orElse("");
+        if (keepFragment) {
+            url = item.url();
        }
        else {
            try {
-                String link = item.getLink().get();
+                String link = item.url();
                var linkUri = new URI(link);
                var cleanUri = new URI(linkUri.getScheme(), linkUri.getAuthority(), linkUri.getPath(), linkUri.getQuery(), null);
                url = cleanUri.toString();
            }
            catch (Exception e) {
                // fallback to original link if we can't clean it, this is not a very important step
-                url = item.getLink().get();
+                url = item.url();
            }
        }

        return new FeedItem(title, date, description, url);
    }

-    private static String getItemDescription(Item item) {
-        Optional<String> description = item.getDescription();
-        if (description.isEmpty())
-            return "";
-
-        String rawDescription = description.get();
+    private static String getItemDescription(SimpleFeedParser.ItemData item) {
+        String rawDescription = item.description();
        if (rawDescription.indexOf('<') >= 0) {
            rawDescription = Jsoup.parseBodyFragment(rawDescription).text();
        }
@@ -58,15 +54,18 @@ public record FeedItem(String title,

    // e.g. http://fabiensanglard.net/rss.xml does dates like this:  1 Apr 2021 00:00:00 +0000
    private static final DateTimeFormatter extraFormatter = DateTimeFormatter.ofPattern("d MMM yyyy HH:mm:ss Z");
-    private static String getItemDate(Item item) {
+    private static String getItemDate(SimpleFeedParser.ItemData item) {
        Optional<ZonedDateTime> zonedDateTime = Optional.empty();
        try {
            zonedDateTime = item.getPubDateZonedDateTime();
        }
        catch (Exception e) {
-            zonedDateTime = item.getPubDate()
-                    .map(extraFormatter::parse)
-                    .map(ZonedDateTime::from);
+            try {
+                zonedDateTime = Optional.of(ZonedDateTime.from(extraFormatter.parse(item.pubDate())));
+            }
+            catch (Exception e2) {
+                // ignore
+            }
        }

        return zonedDateTime.map(date -> date.format(DATE_FORMAT)).orElse("");
--- a/code/functions/live-capture/java/nu/marginalia/rss/svc/FeedFetcherService.java
+++ b/code/functions/live-capture/java/nu/marginalia/rss/svc/FeedFetcherService.java
@@ -1,10 +1,10 @@
 package nu.marginalia.rss.svc;

-import com.apptasticsoftware.rssreader.Item;
-import com.apptasticsoftware.rssreader.RssReader;
 import com.google.inject.Inject;
 import com.opencsv.CSVReader;
 import nu.marginalia.WmsaHome;
+import nu.marginalia.contenttype.ContentType;
+import nu.marginalia.contenttype.DocumentBodyToString;
 import nu.marginalia.executor.client.ExecutorClient;
 import nu.marginalia.model.EdgeDomain;
 import nu.marginalia.nodecfg.NodeConfigurationService;
@@ -18,7 +18,6 @@ import nu.marginalia.storage.FileStorageService;
 import nu.marginalia.storage.model.FileStorage;
 import nu.marginalia.storage.model.FileStorageType;
 import nu.marginalia.util.SimpleBlockingThreadPool;
-import org.apache.commons.io.input.BOMInputStream;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

@@ -30,15 +29,12 @@ import java.net.URISyntaxException;
 import java.net.http.HttpClient;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;
-import java.nio.charset.StandardCharsets;
 import java.sql.SQLException;
-import java.time.Duration;
-import java.time.LocalDateTime;
-import java.time.ZonedDateTime;
+import java.time.*;
 import java.time.format.DateTimeFormatter;
 import java.util.*;
+import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
-import java.util.concurrent.ThreadLocalRandom;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.function.BiFunction;
@@ -49,8 +45,6 @@ public class FeedFetcherService {
    private static final int MAX_FEED_ITEMS = 10;
    private static final Logger logger = LoggerFactory.getLogger(FeedFetcherService.class);

-    private final RssReader rssReader = new RssReader();
-
    private final FeedDb feedDb;
    private final FileStorageService fileStorageService;
    private final NodeConfigurationService nodeConfigurationService;
@@ -60,7 +54,6 @@ public class FeedFetcherService {
    private final DomainLocks domainLocks = new DomainLocks();

    private volatile boolean updating;
-    private boolean deterministic = false;

    @Inject
    public FeedFetcherService(FeedDb feedDb,
@@ -79,11 +72,6 @@ public class FeedFetcherService {
    public enum UpdateMode {
        CLEAN,
        REFRESH
-    };
-
-    /** Disable random-based heuristics.  This is meant for testing */
-    public void setDeterministic() {
-        this.deterministic = true;
    }

    public void updateFeeds(UpdateMode updateMode) throws IOException {
@@ -92,6 +80,7 @@ public class FeedFetcherService {
            throw new IllegalStateException("Already updating feeds, refusing to start another update");
        }

+
        try (FeedDbWriter writer = feedDb.createWriter();
             HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(15))
@@ -99,6 +88,8 @@ public class FeedFetcherService {
                .followRedirects(HttpClient.Redirect.NORMAL)
                .version(HttpClient.Version.HTTP_2)
                .build();
+             ExecutorService fetchExecutor = Executors.newCachedThreadPool();
+             FeedJournal feedJournal = FeedJournal.create();
             var heartbeat = serviceHeartbeat.createServiceAdHocTaskHeartbeat("Update Rss Feeds")
        ) {
            updating = true;
@@ -124,36 +115,40 @@ public class FeedFetcherService {

            for (var feed : definitions) {
                executor.submitQuietly(() -> {
-                    var oldData = feedDb.getFeed(new EdgeDomain(feed.domain()));
+                    try {
+                        EdgeDomain domain = new EdgeDomain(feed.domain());
+                        var oldData = feedDb.getFeed(domain);

-                    // If we have existing data, we might skip updating it with a probability that increases with time,
-                    // this is to avoid hammering the feeds that are updated very rarely and save some time and resources
-                    // on our end
+                        @Nullable
+                        String ifModifiedSinceDate = switch(updateMode) {
+                            case REFRESH -> getIfModifiedSinceDate(feedDb);
+                            case CLEAN -> null;
+                        };

-                    if (!oldData.isEmpty()) {
-                        Duration duration = feed.durationSinceUpdated();
-                        long daysSinceUpdate = duration.toDays();
-
-
-                        if (deterministic || (daysSinceUpdate > 2 && ThreadLocalRandom.current()
-                                .nextInt(1, 1 + (int) Math.min(10, daysSinceUpdate) / 2) > 1))
-                        {
-                            // Skip updating this feed, just write the old data back instead
-                            writer.saveFeed(oldData);
-                            return;
-                        }
-                    }
+                        @Nullable
+                        String ifNoneMatchTag = switch (updateMode) {
+                            case REFRESH -> feedDb.getEtag(domain);
+                            case CLEAN -> null;
+                        };

                        FetchResult feedData;
                        try (DomainLocks.DomainLock domainLock = domainLocks.lockDomain(new EdgeDomain(feed.domain()))) {
-                        feedData = fetchFeedData(feed, client);
-                    }
-                    catch (Exception ex) {
+                            feedData = fetchFeedData(feed, client, fetchExecutor, ifModifiedSinceDate, ifNoneMatchTag);
+                        } catch (Exception ex) {
                            feedData = new FetchResult.TransientError();
                        }

                        switch (feedData) {
-                        case FetchResult.Success(String value) -> writer.saveFeed(parseFeed(value, feed));
+                            case FetchResult.Success(String value, String etag) -> {
+                                writer.saveEtag(feed.domain(), etag);
+                                writer.saveFeed(parseFeed(value, feed));
+
+                                feedJournal.record(feed.feedUrl(), value);
+                            }
+                            case FetchResult.NotModified() -> {
+                                writer.saveEtag(feed.domain(), ifNoneMatchTag);
+                                writer.saveFeed(oldData);
+                            }
                            case FetchResult.TransientError() -> {
                                int errorCount = errorCounts.getOrDefault(feed.domain().toLowerCase(), 0);
                                writer.setErrorCount(feed.domain().toLowerCase(), ++errorCount);
@@ -163,13 +158,17 @@ public class FeedFetcherService {
                                    writer.saveFeed(oldData);
                                }
                            }
-                        case FetchResult.PermanentError() -> {} // let the definition be forgotten about
+                            case FetchResult.PermanentError() -> {
+                            } // let the definition be forgotten about
                        }

+                    }
+                    finally {
                        if ((definitionsUpdated.incrementAndGet() % 1_000) == 0) {
                            // Update the progress every 1k feeds, to avoid hammering the database and flooding the logs
                            heartbeat.progress("Updated " + definitionsUpdated + "/" + totalDefinitions + " feeds", definitionsUpdated.get(), totalDefinitions);
                        }
+                    }
                });
            }

@@ -196,30 +195,83 @@ public class FeedFetcherService {
        }
    }

-    private FetchResult fetchFeedData(FeedDefinition feed, HttpClient client) {
+    @Nullable
+    static String getIfModifiedSinceDate(FeedDb feedDb) {
+
+        // If the db is fresh, we don't send If-Modified-Since
+        if (!feedDb.hasData())
+            return null;
+
+        Instant cutoffInstant = feedDb.getFetchTime();
+
+        // If we're unable to establish fetch time, we don't send If-Modified-Since
+        if (cutoffInstant == Instant.EPOCH)
+            return null;
+
+        return cutoffInstant.atZone(ZoneId.of("GMT")).format(DateTimeFormatter.RFC_1123_DATE_TIME);
+    }
+
+    private FetchResult fetchFeedData(FeedDefinition feed,
+                                      HttpClient client,
+                                      ExecutorService executorService,
+                                      @Nullable String ifModifiedSinceDate,
+                                      @Nullable String ifNoneMatchTag)
+    {
        try {
            URI uri = new URI(feed.feedUrl());

-            HttpRequest getRequest = HttpRequest.newBuilder()
+            HttpRequest.Builder requestBuilder = HttpRequest.newBuilder()
                    .GET()
                    .uri(uri)
                    .header("User-Agent", WmsaHome.getUserAgent().uaIdentifier())
+                    .header("Accept-Encoding", "gzip")
                    .header("Accept", "text/*, */*;q=0.9")
                    .timeout(Duration.ofSeconds(15))
-                    .build();
+                    ;
+
+            // Set the If-Modified-Since or If-None-Match headers if we have them
+            // though since there are certain idiosyncrasies in server implementations,
+            // we avoid setting both at the same time as that may turn a 304 into a 200.
+            if (ifNoneMatchTag != null) {
+                requestBuilder.header("If-None-Match", ifNoneMatchTag);
+            } else if (ifModifiedSinceDate != null) {
+                requestBuilder.header("If-Modified-Since", ifModifiedSinceDate);
+            }
+
+
+            HttpRequest getRequest = requestBuilder.build();

            for (int i = 0; i < 3; i++) {
-                var rs = client.send(getRequest, HttpResponse.BodyHandlers.ofString());
-                if (429 == rs.statusCode()) {
+
+                /* Note we need to use an executor to time-limit the send() method in HttpClient, as
+                 * its support for timeouts only applies to the time until response starts to be received,
+                 * and does not catch the case when the server starts to send data but then hangs.
+                 */
+                HttpResponse<byte[]> rs = executorService.submit(
+                        () -> client.send(getRequest, HttpResponse.BodyHandlers.ofByteArray()))
+                                .get(15, TimeUnit.SECONDS);
+
+                if (rs.statusCode() == 429) { // Too Many Requests
                    int retryAfter = Integer.parseInt(rs.headers().firstValue("Retry-After").orElse("2"));
                    Thread.sleep(Duration.ofSeconds(Math.clamp(retryAfter, 1, 5)));
-                } else if (200 == rs.statusCode()) {
-                    return new FetchResult.Success(rs.body());
-                } else if (404 == rs.statusCode()) {
-                    return new FetchResult.PermanentError(); // never try again
-                } else {
-                    return new FetchResult.TransientError(); // we try again in a few days
+                    continue;
                }
+
+                String newEtagValue = rs.headers().firstValue("ETag").orElse("");
+
+                return switch (rs.statusCode()) {
+                    case 200 -> {
+                        byte[] responseData = getResponseData(rs);
+
+                        String contentType = rs.headers().firstValue("Content-Type").orElse("");
+                        String bodyText = DocumentBodyToString.getStringData(ContentType.parse(contentType), responseData);
+
+                        yield new FetchResult.Success(bodyText, newEtagValue);
+                    }
+                    case 304 -> new FetchResult.NotModified(); // via If-Modified-Since semantics
+                    case 404 -> new FetchResult.PermanentError(); // never try again
+                    default -> new FetchResult.TransientError(); // we try again later
+                };
            }
        }
        catch (Exception ex) {
@@ -229,8 +281,22 @@ public class FeedFetcherService {
        return new FetchResult.TransientError();
    }

+    private byte[] getResponseData(HttpResponse<byte[]> response) throws IOException {
+        String encoding = response.headers().firstValue("Content-Encoding").orElse("");
+
+        if ("gzip".equals(encoding)) {
+            try (var stream = new GZIPInputStream(new ByteArrayInputStream(response.body()))) {
+                return stream.readAllBytes();
+            }
+        }
+        else {
+            return response.body();
+        }
+    }
+
    public sealed interface FetchResult {
-        record Success(String value) implements FetchResult {}
+        record Success(String value, String etag) implements FetchResult {}
+        record NotModified() implements FetchResult {}
        record TransientError() implements FetchResult {}
        record PermanentError()  implements FetchResult {}
    }
@@ -300,10 +366,7 @@ public class FeedFetcherService {

    public FeedItems parseFeed(String feedData, FeedDefinition definition) {
        try {
-            List<Item> rawItems = rssReader.read(
-                    // Massage the data to maximize the possibility of the flaky XML parser consuming it
-                    new BOMInputStream(new ByteArrayInputStream(feedData.trim().getBytes(StandardCharsets.UTF_8)), false)
-            ).toList();
+            List<SimpleFeedParser.ItemData> rawItems = SimpleFeedParser.parse(feedData);

            boolean keepUriFragment = rawItems.size() < 2 || areFragmentsDisparate(rawItems);

@@ -333,16 +396,16 @@ public class FeedFetcherService {
     * @param items The items to check
     * @return True if we should keep the fragments, false otherwise
     */
-    private boolean areFragmentsDisparate(List<Item> items) {
+    private boolean areFragmentsDisparate(List<SimpleFeedParser.ItemData> items) {
        Set<String> seenFragments = new HashSet<>();

        try {
            for (var item : items) {
-                if (item.getLink().isEmpty()) {
+                if (item.url().isBlank()) {
                    continue;
                }

-                var link = item.getLink().get();
+                var link = item.url();
                if (!link.contains("#")) {
                    continue;
                }
@@ -361,7 +424,7 @@ public class FeedFetcherService {
        return seenFragments.size() > 1;
    }

-    private static class IsFeedItemDateValid implements Predicate<FeedItem> {
+    static class IsFeedItemDateValid implements Predicate<FeedItem> {
        private final String today = ZonedDateTime.now().format(DateTimeFormatter.ISO_ZONED_DATE_TIME);

        public boolean test(FeedItem item) {
--- a/code/functions/live-capture/java/nu/marginalia/rss/svc/FeedJournal.java
+++ b/code/functions/live-capture/java/nu/marginalia/rss/svc/FeedJournal.java
@@ -0,0 +1,76 @@
+package nu.marginalia.rss.svc;
+
+import nu.marginalia.WmsaHome;
+import nu.marginalia.slop.SlopTable;
+import nu.marginalia.slop.column.string.StringColumn;
+import nu.marginalia.slop.desc.StorageType;
+import org.apache.commons.io.FileUtils;
+
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.function.BiConsumer;
+
+/** Utility for recording fetched feeds to a journal, useful in debugging feed parser issues.
+ */
+public interface FeedJournal extends AutoCloseable {
+    StringColumn urlColumn = new StringColumn("url");
+    StringColumn contentsColumn = new StringColumn("contents", StandardCharsets.UTF_8, StorageType.ZSTD);
+
+    void record(String url, String contents) throws IOException;
+    void close() throws IOException;
+
+
+    static FeedJournal create() throws IOException {
+        if (Boolean.getBoolean("feedFetcher.persistJournal")) {
+            Path journalPath = WmsaHome.getDataPath().resolve("feed-journal");
+            if (Files.isDirectory(journalPath)) {
+                FileUtils.deleteDirectory(journalPath.toFile());
+            }
+            Files.createDirectories(journalPath);
+            return new RecordingFeedJournal(journalPath);
+        }
+        else {
+            return new NoOpFeedJournal();
+        }
+    }
+
+    class NoOpFeedJournal implements FeedJournal {
+        @Override
+        public void record(String url, String contents) {}
+
+        @Override
+        public void close() {}
+    }
+
+    class RecordingFeedJournal extends SlopTable implements FeedJournal {
+
+        private final StringColumn.Writer urlWriter;
+        private final StringColumn.Writer contentsWriter;
+
+        public RecordingFeedJournal(Path path) throws IOException {
+            super(path, SlopTable.getNumPages(path, FeedJournal.urlColumn));
+
+            urlWriter = urlColumn.create(this);
+            contentsWriter = contentsColumn.create(this);
+        }
+
+        public synchronized void record(String url, String contents) throws IOException {
+            urlWriter.put(url);
+            contentsWriter.put(contents);
+        }
+    }
+
+    static void replay(Path journalPath, BiConsumer<String, String> urlAndContent) throws IOException {
+        try (SlopTable table = new SlopTable(journalPath)) {
+            final StringColumn.Reader urlReader = urlColumn.open(table);
+            final StringColumn.Reader contentsReader = contentsColumn.open(table);
+
+            while (urlReader.hasRemaining()) {
+                urlAndContent.accept(urlReader.get(), contentsReader.get());
+            }
+        }
+
+    }
+}
--- a/code/functions/live-capture/java/nu/marginalia/rss/svc/FeedsGrpcService.java
+++ b/code/functions/live-capture/java/nu/marginalia/rss/svc/FeedsGrpcService.java
@@ -107,8 +107,7 @@ public class FeedsGrpcService extends FeedApiGrpc.FeedApiImplBase implements Dis

    @Override
    public void getFeed(RpcDomainId request,
-                        StreamObserver<RpcFeed> responseObserver)
-    {
+                        StreamObserver<RpcFeed> responseObserver) {
        if (!feedDb.isEnabled()) {
            responseObserver.onError(new IllegalStateException("Feed database is disabled on this node"));
            return;
@@ -126,7 +125,8 @@ public class FeedsGrpcService extends FeedApiGrpc.FeedApiImplBase implements Dis
                .setDomainId(request.getDomainId())
                .setDomain(domainName.get().toString())
                .setFeedUrl(feedItems.feedUrl())
-                .setUpdated(feedItems.updated());
+                .setUpdated(feedItems.updated())
+                .setFetchTimestamp(feedDb.getFetchTime().toEpochMilli());

        for (var item : feedItems.items()) {
            retB.addItemsBuilder()
--- a/code/functions/live-capture/java/nu/marginalia/rss/svc/SimpleFeedParser.java
+++ b/code/functions/live-capture/java/nu/marginalia/rss/svc/SimpleFeedParser.java
@@ -0,0 +1,102 @@
+package nu.marginalia.rss.svc;
+
+import com.apptasticsoftware.rssreader.DateTimeParser;
+import com.apptasticsoftware.rssreader.util.Default;
+import org.jsoup.Jsoup;
+import org.jsoup.parser.Parser;
+
+import java.time.ZonedDateTime;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+
+public class SimpleFeedParser {
+
+    private static final DateTimeParser dateTimeParser = Default.getDateTimeParser();
+
+    public record ItemData (
+            String title,
+            String description,
+            String url,
+            String pubDate
+    ) {
+        public boolean isWellFormed() {
+            return title != null && !title.isBlank() &&
+                    description != null && !description.isBlank() &&
+                    url != null && !url.isBlank() &&
+                    pubDate != null && !pubDate.isBlank();
+        }
+
+        public Optional<ZonedDateTime> getPubDateZonedDateTime() {
+            try {
+                return Optional.ofNullable(dateTimeParser.parse(pubDate()));
+            }
+            catch (Exception e) {
+                return Optional.empty();
+            }
+        }
+
+    }
+
+    public static List<ItemData> parse(String content) {
+        var doc = Jsoup.parse(content, Parser.xmlParser());
+        List<ItemData> ret = new ArrayList<>();
+
+        doc.select("item, entry").forEach(element -> {
+            String link = "";
+            String title = "";
+            String description = "";
+            String pubDate = "";
+
+            for (String attr : List.of("title", "dc:title")) {
+                if (!title.isBlank())
+                    break;
+                var tag = element.getElementsByTag(attr).first();
+                if (tag != null) {
+                    title = tag.text();
+                }
+            }
+
+            for (String attr : List.of("title", "summary", "content", "description", "dc:description")) {
+                if (!description.isBlank())
+                    break;
+                var tag = element.getElementsByTag(attr).first();
+                if (tag != null) {
+                    description = tag.text();
+                }
+            }
+
+            for (String attr : List.of("pubDate", "published", "updated", "issued", "created", "dc:date")) {
+                if (!pubDate.isBlank())
+                    break;
+                var tag = element.getElementsByTag(attr).first();
+                if (tag != null) {
+                    pubDate = tag.text();
+                }
+            }
+
+            for (String attr : List.of("link", "url")) {
+                if (!link.isBlank())
+                    break;
+                var tag = element.getElementsByTag(attr).first();
+
+                if (tag != null) {
+                    String linkText = tag.text();
+
+                    if (linkText.isBlank()) {
+                        linkText = tag.attr("href");
+                    }
+
+                    link = linkText;
+                }
+
+            }
+
+            ret.add(new ItemData(title, description, link, pubDate));
+        });
+
+
+        return ret;
+    }
+
+}
--- a/code/functions/live-capture/test-resources/nlnet.atom
+++ b/code/functions/live-capture/test-resources/nlnet.atom
@@ -0,0 +1,27 @@
+<feed xmlns="http://www.w3.org/2005/Atom" xml:base="https://nlnet.nl">
+  <title type="text">NLnet news</title>
+  <updated>2025-01-01T00:00:00Z</updated>
+  <id>https://nlnet.nl/feed.atom</id>
+  <link rel="self" type="application/atom+xml" href="https://nlnet.nl/feed.atom"/>
+  <entry>
+    <id>https://nlnet.nl/news/2025/20250101-announcing-grantees-June-call.html</id>
+    <author>
+      <name>NLnet</name>
+    </author>
+    <title type="xhtml">
+      <div xmlns="http://www.w3.org/1999/xhtml">50 Free and Open Source Projects Selected for NGI Zero grants</div>
+    </title>
+    <link href="/news/2025/20250101-announcing-grantees-June-call.html"/>
+    <updated>2025-01-01T00:00:00Z</updated>
+    <content type="xhtml">
+      <div xmlns="http://www.w3.org/1999/xhtml">
+        <p class="paralead">Happy 2025 everyone! On this first day of the fresh new year we are happy to announce 50 project teams were selected to receive NGI Zero grants. We are welcoming projects from 18 countries involving people and organisations of various types: individuals, associations, small and medium enterprises, foundations, universities, and informal collectives. The new projects are all across the different layers of the NGI technology stack: from trustworthy open hardware to services &amp; applications which provide autonomy for end-users.</p>
+        <p>The 50 free and open source projects were selected across two funds. 19 teams will receive grants from the <a href="/commonsfund/">NGI Zero Commons Fund</a>, a broadly themed fund that supports people working on reclaiming the public nature of the internet. The other 31 projects will work within <a href="/core/">NGI Zero Core</a> which focuses on strengthening the open internet architecture. Both funds offer financial and practical support. The latter consisting of <a href="/NGI0/services/">support services</a> such as accessibility and security audits, advice on license compliance, help with testing, documentation or UX design.</p>
+        <h2>If you applied for a grant</h2>
+        <p>This is the selection for the <a href="https://nlnet.nl/news/2024/20240401-call.html">June call</a>. We always inform <em>all</em> applicants about the outcome of the review ahead of the public announcement, if the are selected or not. If you have not heard anything, you probably applied to a later call that is still under review. You can see which call you applied to by checking the application number assigned to the project when you applied. The second number in the sequence refers to the month of the call, so 06 in the case of the June call. (It should not happen, but if you did apply to the June call and did not hear anything, do contact us.)</p>
+        <h2>Meet the new projects!</h2>
+      </div>
+    </content>
+  </entry>
+
+</feed>
--- a/code/functions/live-capture/test/nu/marginalia/domsample/db/DomSampleDbTest.java
+++ b/code/functions/live-capture/test/nu/marginalia/domsample/db/DomSampleDbTest.java
@@ -0,0 +1,113 @@
+package nu.marginalia.domsample.db;
+
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.testcontainers.shaded.org.apache.commons.io.FileUtils;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.*;
+
+import static org.junit.jupiter.api.Assertions.*;
+
+class DomSampleDbTest {
+    Path tempDir;
+
+    @BeforeEach
+    void setUp() throws Exception {
+        tempDir = Files.createTempDirectory("test");
+    }
+
+    @AfterEach
+    void tearDown() throws IOException {
+        FileUtils.deleteDirectory(tempDir.toFile());
+    }
+
+    @Test
+    public void testSetUp() {
+        var dbPath = tempDir.resolve("test.db");
+        try (var db = new DomSampleDb(dbPath)) {
+        }
+        catch (Exception e) {
+            fail("Failed to set up database: " + e.getMessage());
+        }
+    }
+
+    @Test
+    public void testSyncDomains() {
+        var dbPath = tempDir.resolve("test.db");
+        try (var db = new DomSampleDb(dbPath)) {
+
+            db.syncDomains(Set.of("example.com", "test.com", "foobar.com"));
+            assertEquals(Set.of("example.com", "test.com", "foobar.com"), new HashSet<>(db.getScheduledDomains()));
+            db.syncDomains(Set.of("example.com", "test.com"));
+            assertEquals(Set.of("example.com", "test.com"), new HashSet<>(db.getScheduledDomains()));
+            db.syncDomains(Set.of("foobar.com", "test.com"));
+            assertEquals(Set.of("foobar.com", "test.com"), new HashSet<>(db.getScheduledDomains()));
+        }
+        catch (Exception e) {
+            fail("Failed to sync domains: " + e.getMessage());
+        }
+    }
+
+    @Test
+    public void testFetchDomains() {
+        var dbPath = tempDir.resolve("test.db");
+        try (var db = new DomSampleDb(dbPath)) {
+
+            db.syncDomains(Set.of("example.com", "test.com", "foobar.com"));
+            db.flagDomainAsFetched("example.com");
+            db.flagDomainAsFetched("test.com");
+            db.flagDomainAsFetched("foobar.com");
+            assertEquals(List.of("example.com", "test.com", "foobar.com"), db.getScheduledDomains());
+            db.flagDomainAsFetched("test.com");
+            assertEquals(List.of("example.com", "foobar.com", "test.com"), db.getScheduledDomains());
+        }
+        catch (Exception e) {
+            fail("Failed to sync domains: " + e.getMessage());
+        }
+    }
+
+    @Test
+    public void saveLoadSingle() {
+        var dbPath = tempDir.resolve("test.db");
+        try (var db = new DomSampleDb(dbPath)) {
+            db.saveSampleRaw("example.com", "http://example.com/sample", "sample data", "requests data", true);
+            var samples = db.getSamples("example.com");
+            assertEquals(1, samples.size());
+            var sample = samples.getFirst();
+            assertEquals("example.com", sample.domain());
+            assertEquals("http://example.com/sample", sample.url());
+            assertEquals("sample data", sample.sample());
+            assertEquals("requests data", sample.requests());
+            assertTrue(sample.acceptedPopover());
+        }
+        catch (Exception e) {
+            fail("Failed to save/load sample: " + e.getMessage());
+        }
+    }
+
+    @Test
+    public void saveLoadTwo() {
+        var dbPath = tempDir.resolve("test.db");
+        try (var db = new DomSampleDb(dbPath)) {
+            db.saveSampleRaw("example.com", "http://example.com/sample", "sample data", "r1", true);
+            db.saveSampleRaw("example.com", "http://example.com/sample2", "sample data2", "r2", false);
+            var samples = db.getSamples("example.com");
+            assertEquals(2, samples.size());
+
+            Map<String, String> samplesByUrl = new HashMap<>();
+            for (var sample : samples) {
+                samplesByUrl.put(sample.url(), sample.sample());
+            }
+
+            assertEquals("sample data", samplesByUrl.get("http://example.com/sample"));
+            assertEquals("sample data2", samplesByUrl.get("http://example.com/sample2"));
+        }
+        catch (Exception e) {
+            fail("Failed to save/load sample: " + e.getMessage());
+        }
+    }
+}
--- a/code/functions/live-capture/test/nu/marginalia/livecapture/BrowserlessClientTest.java
+++ b/code/functions/live-capture/test/nu/marginalia/livecapture/BrowserlessClientTest.java
@@ -1,36 +1,137 @@
 package nu.marginalia.livecapture;

+import com.github.tomakehurst.wiremock.WireMockServer;
+import com.github.tomakehurst.wiremock.core.WireMockConfiguration;
+import nu.marginalia.WmsaHome;
+import nu.marginalia.domsample.db.DomSampleDb;
+import nu.marginalia.service.module.ServiceConfigurationModule;
 import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
 import org.junit.jupiter.api.Test;
 import org.testcontainers.containers.GenericContainer;
+import org.testcontainers.images.PullPolicy;
 import org.testcontainers.junit.jupiter.Testcontainers;
 import org.testcontainers.utility.DockerImageName;

+import java.io.IOException;
 import java.net.URI;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Map;
+
+import static com.github.tomakehurst.wiremock.client.WireMock.*;
+

@Testcontainers
+@Tag("slow")
 public class BrowserlessClientTest {
-    static GenericContainer<?> container = new GenericContainer<>(DockerImageName.parse("browserless/chrome")).withExposedPorts(3000);
+    // Run gradle docker if this image is not available
+    static GenericContainer<?> container = new GenericContainer<>(DockerImageName.parse("marginalia-browserless"))
+            .withEnv(Map.of("TOKEN", "BROWSERLESS_TOKEN"))
+            .withImagePullPolicy(PullPolicy.defaultPolicy())
+            .withNetworkMode("bridge")
+            .withLogConsumer(frame -> {
+                System.out.print(frame.getUtf8String());
+            })
+            .withExposedPorts(3000);
+
+    static WireMockServer wireMockServer =
+            new WireMockServer(WireMockConfiguration.wireMockConfig()
+                    .port(18089));
+
+    static String localIp;
+
+    static URI browserlessURI;
+    static URI browserlessWssURI;

    @BeforeAll
-    public static void setup() {
+    public static void setup() throws IOException {
        container.start();
+
+        browserlessURI = URI.create(String.format("http://%s:%d/",
+                container.getHost(),
+                container.getMappedPort(3000))
+        );
+
+        browserlessWssURI = URI.create(String.format("ws://%s:%d/?token=BROWSERLESS_TOKEN",
+                container.getHost(),
+                container.getMappedPort(3000))
+        );
+
+
+        wireMockServer.start();
+        wireMockServer.stubFor(get("/").willReturn(aResponse().withStatus(200).withBody("Ok")));
+
+        localIp = ServiceConfigurationModule.getLocalNetworkIP();
+
+    }
+
+    @Tag("flaky")
+    @Test
+    public void testInspectContentUA__Flaky() throws Exception {
+        try (var client = new BrowserlessClient(browserlessURI)) {
+            client.content("http://" + localIp + ":18089/",
+                    BrowserlessClient.GotoOptions.defaultValues()
+            );
+        }
+
+        wireMockServer.verify(getRequestedFor(urlEqualTo("/")).withHeader("User-Agent", equalTo(WmsaHome.getUserAgent().uaString())));
+    }
+
+    @Tag("flaky")
+    @Test
+    public void testInspectScreenshotUA__Flaky() throws Exception {
+        try (var client = new BrowserlessClient(browserlessURI)) {
+            client.screenshot("http://" + localIp + ":18089/",
+                    BrowserlessClient.GotoOptions.defaultValues(),
+                    BrowserlessClient.ScreenshotOptions.defaultValues()
+            );
+        }
+
+        wireMockServer.verify(getRequestedFor(urlEqualTo("/")).withHeader("User-Agent", equalTo(WmsaHome.getUserAgent().uaString())));
    }

    @Test
    public void testContent() throws Exception {
-        try (var client = new BrowserlessClient(URI.create("http://" + container.getHost() + ":" + container.getMappedPort(3000)))) {
-            var content = client.content("https://www.marginalia.nu/", BrowserlessClient.GotoOptions.defaultValues());
-            Assertions.assertNotNull(content, "Content should not be null");
+        try (var client = new BrowserlessClient(browserlessURI)) {
+            var content = client.content("https://www.marginalia.nu/", BrowserlessClient.GotoOptions.defaultValues()).orElseThrow();
+
            Assertions.assertFalse(content.isBlank(), "Content should not be empty");
        }
    }

+    @Test
+    public void testAnnotatedContent() throws Exception {
+
+        try (var client = new BrowserlessClient(browserlessURI);
+             DomSampleDb dbop = new DomSampleDb(Path.of("/tmp/dom-sample.db"))
+        ) {
+            var content = client.annotatedContent("https://marginalia.nu/", BrowserlessClient.GotoOptions.defaultValues()).orElseThrow();
+            dbop.saveSample("marginalia.nu", "https://marginalia.nu/", content);
+            System.out.println(content);
+            Assertions.assertFalse(content.isBlank(), "Content should not be empty");
+
+            dbop.getSamples("marginalia.nu").forEach(sample -> {
+                System.out.println("Sample URL: " + sample.url());
+                System.out.println("Sample Content: " + sample.sample());
+                System.out.println("Sample Requests: " + sample.requests());
+                System.out.println("Accepted Popover: " + sample.acceptedPopover());
+            });
+        }
+        finally {
+            Files.deleteIfExists(Path.of("/tmp/dom-sample.db"));
+        }
+
+    }
+
    @Test
    public void testScreenshot() throws Exception {
-        try (var client = new BrowserlessClient(URI.create("http://" + container.getHost() + ":" + container.getMappedPort(3000)))) {
-            var screenshot = client.screenshot("https://www.marginalia.nu/", BrowserlessClient.GotoOptions.defaultValues(), BrowserlessClient.ScreenshotOptions.defaultValues());
+        try (var client = new BrowserlessClient(browserlessURI)) {
+            var screenshot = client.screenshot("https://www.marginalia.nu/",
+                    BrowserlessClient.GotoOptions.defaultValues(),
+                    BrowserlessClient.ScreenshotOptions.defaultValues());
+
            Assertions.assertNotNull(screenshot, "Screenshot should not be null");
        }
    }
--- a/code/functions/live-capture/test/nu/marginalia/rss/svc/FeedFetcherServiceTest.java
+++ b/code/functions/live-capture/test/nu/marginalia/rss/svc/FeedFetcherServiceTest.java
@@ -96,10 +96,31 @@ class FeedFetcherServiceTest extends AbstractModule {
            feedDb.switchDb(writer);
        }

-        feedFetcherService.setDeterministic();
        feedFetcherService.updateFeeds(FeedFetcherService.UpdateMode.REFRESH);

-        Assertions.assertFalse(feedDb.getFeed(new EdgeDomain("www.marginalia.nu")).isEmpty());
+        var result = feedDb.getFeed(new EdgeDomain("www.marginalia.nu"));
+        System.out.println(result);
+        Assertions.assertFalse(result.isEmpty());
+    }
+
+    @Tag("flaky")
+    @Test
+    public void testFetchRepeatedly() throws Exception {
+        try (var writer = feedDb.createWriter()) {
+            writer.saveFeed(new FeedItems("www.marginalia.nu", "https://www.marginalia.nu/log/index.xml", "", List.of()));
+            feedDb.switchDb(writer);
+        }
+
+        feedFetcherService.updateFeeds(FeedFetcherService.UpdateMode.REFRESH);
+        Assertions.assertNotNull(feedDb.getEtag(new EdgeDomain("www.marginalia.nu")));
+        feedFetcherService.updateFeeds(FeedFetcherService.UpdateMode.REFRESH);
+        Assertions.assertNotNull(feedDb.getEtag(new EdgeDomain("www.marginalia.nu")));
+        feedFetcherService.updateFeeds(FeedFetcherService.UpdateMode.REFRESH);
+        Assertions.assertNotNull(feedDb.getEtag(new EdgeDomain("www.marginalia.nu")));
+
+        var result = feedDb.getFeed(new EdgeDomain("www.marginalia.nu"));
+        System.out.println(result);
+        Assertions.assertFalse(result.isEmpty());
    }

    @Tag("flaky")
@@ -110,7 +131,6 @@ class FeedFetcherServiceTest extends AbstractModule {
            feedDb.switchDb(writer);
        }

-        feedFetcherService.setDeterministic();
        feedFetcherService.updateFeeds(FeedFetcherService.UpdateMode.REFRESH);

        // We forget the feed on a 404 error
--- a/code/functions/math/api/java/nu/marginalia/api/math/model/DictionaryResponse.java
+++ b/code/functions/math/api/java/nu/marginalia/api/math/model/DictionaryResponse.java
@@ -7,4 +7,8 @@ public record DictionaryResponse(String word, List<DictionaryEntry> entries) {
        this.word = word;
        this.entries = entries.stream().toList(); // Make an immutable copy
    }
+
+    public boolean hasEntries() {
+        return !entries.isEmpty();
+    }
 }
--- a/code/functions/nsfw-domain-filter/build.gradle
+++ b/code/functions/nsfw-domain-filter/build.gradle
@@ -0,0 +1,43 @@
+plugins {
+    id 'java'
+    id 'jvm-test-suite'
+}
+
+java {
+    toolchain {
+        languageVersion.set(JavaLanguageVersion.of(rootProject.ext.jvmVersion))
+    }
+}
+
+apply from: "$rootProject.projectDir/srcsets.gradle"
+
+dependencies {
+
+    implementation project(':code:common:config')
+    implementation project(':code:common:model')
+    implementation project(':code:common:db')
+
+
+    implementation libs.bundles.slf4j
+    implementation libs.prometheus
+    implementation libs.guava
+    implementation libs.commons.lang3
+    implementation dependencies.create(libs.guice.get()) {
+        exclude group: 'com.google.guava'
+    }
+    implementation libs.notnull
+    implementation libs.fastutil
+    implementation libs.bundles.mariadb
+
+
+    testImplementation libs.bundles.slf4j.test
+    testImplementation libs.bundles.junit
+    testImplementation libs.mockito
+
+    testImplementation platform('org.testcontainers:testcontainers-bom:1.17.4')
+    testImplementation libs.commons.codec
+    testImplementation project(':code:common:service')
+    testImplementation 'org.testcontainers:mariadb:1.17.4'
+    testImplementation 'org.testcontainers:junit-jupiter:1.17.4'
+    testImplementation project(':code:libraries:test-helpers')
+}
--- a/code/functions/nsfw-domain-filter/java/nu/marginalia/nsfw/NsfwDomainFilter.java
+++ b/code/functions/nsfw-domain-filter/java/nu/marginalia/nsfw/NsfwDomainFilter.java
@@ -0,0 +1,192 @@
+package nu.marginalia.nsfw;
+
+import com.google.inject.Inject;
+import com.google.inject.Singleton;
+import com.google.inject.name.Named;
+import com.zaxxer.hikari.HikariDataSource;
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import org.apache.commons.lang3.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.InputStreamReader;
+import java.net.http.HttpClient;
+import java.net.http.HttpRequest;
+import java.net.http.HttpResponse;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.zip.GZIPInputStream;
+
+@Singleton
+public class NsfwDomainFilter {
+    private final HikariDataSource dataSource;
+
+    private final List<String> dangerLists;
+    private final List<String> smutLists;
+
+    private volatile IntOpenHashSet blockedDomainIdsTier1 = new IntOpenHashSet();
+    private volatile IntOpenHashSet blockedDomainIdsTier2 = new IntOpenHashSet();
+
+    private static final Logger logger = LoggerFactory.getLogger(NsfwDomainFilter.class);
+
+    public static final int NSFW_DISABLE = 0;
+    public static final int NSFW_BLOCK_DANGER = 1;
+    public static final int NSFW_BLOCK_SMUT = 2;
+
+    @Inject
+    public NsfwDomainFilter(HikariDataSource dataSource,
+                            @Named("nsfw.dangerLists") List<String> dangerLists,
+                            @Named("nsfw.smutLists") List<String> smutLists
+                            ) {
+        this.dataSource = dataSource;
+
+        this.dangerLists = dangerLists;
+        this.smutLists = smutLists;
+
+        Thread.ofPlatform().daemon().name("NsfwDomainFilterSync").start(() -> {
+            while (true) {
+                sync();
+                try {
+                    TimeUnit.HOURS.sleep(1);
+                } catch (InterruptedException e) {
+                    Thread.currentThread().interrupt();
+                    break; // Exit the loop if interrupted
+                }
+            }
+        });
+    }
+
+    public boolean isBlocked(int domainId, int tier) {
+        if (tier == 0)
+            return false;
+
+        if (tier >= 1 && blockedDomainIdsTier1.contains(domainId))
+            return true;
+        if (tier >= 2 && blockedDomainIdsTier2.contains(domainId))
+            return true;
+
+        return false;
+    }
+
+    private synchronized void sync() {
+        try (var conn = dataSource.getConnection();
+             var stmt = conn.prepareStatement("SELECT ID, TIER FROM NSFW_DOMAINS")
+        ) {
+            var rs = stmt.executeQuery();
+            IntOpenHashSet tier1 = new IntOpenHashSet();
+            IntOpenHashSet tier2 = new IntOpenHashSet();
+
+            while (rs.next()) {
+                int domainId = rs.getInt("ID");
+                int tier = rs.getInt("TIER");
+
+                switch (tier) {
+                    case 1 -> tier1.add(domainId);
+                    case 2 -> tier2.add(domainId);
+                }
+            }
+
+            this.blockedDomainIdsTier1 = tier1;
+            this.blockedDomainIdsTier2 = tier2;
+
+            logger.info("NSFW domain filter synced: {} tier 1, {} tier 2", tier1.size(), tier2.size());
+
+        }
+        catch (SQLException ex) {
+            logger.error("Failed to sync NSFW domain filter", ex);
+        }
+    }
+
+    public synchronized void fetchLists() {
+        try (var conn = dataSource.getConnection();
+             HttpClient client = HttpClient.newBuilder()
+                     .followRedirects(HttpClient.Redirect.ALWAYS)
+                     .build();
+             var stmt = conn.createStatement();
+             var insertStmt = conn.prepareStatement("INSERT IGNORE INTO NSFW_DOMAINS_TMP (ID, TIER) SELECT ID, ? FROM EC_DOMAIN WHERE DOMAIN_NAME = ?")) {
+
+            stmt.execute("DROP TABLE IF EXISTS NSFW_DOMAINS_TMP");
+            stmt.execute("CREATE TABLE NSFW_DOMAINS_TMP LIKE NSFW_DOMAINS");
+
+            List<String> combinedDangerList = new ArrayList<>(10_000);
+            for (var dangerListUrl : dangerLists) {
+                combinedDangerList.addAll(fetchList(client, dangerListUrl));
+            }
+
+            for (String domain : combinedDangerList) {
+                insertStmt.setInt(1, NSFW_BLOCK_DANGER);
+                insertStmt.setString(2, domain);
+                insertStmt.execute();
+            }
+
+            List<String> combinedSmutList = new ArrayList<>(10_000);
+            for (var smutListUrl : smutLists) {
+                combinedSmutList.addAll(fetchList(client, smutListUrl));
+            }
+
+            for (String domain : combinedSmutList) {
+                insertStmt.setInt(1, NSFW_BLOCK_SMUT);
+                insertStmt.setString(2, domain);
+                insertStmt.addBatch();
+                insertStmt.execute();
+            }
+
+            stmt.execute("""
+                    DROP TABLE IF EXISTS NSFW_DOMAINS
+                    """);
+            stmt.execute("""
+                    RENAME TABLE NSFW_DOMAINS_TMP TO NSFW_DOMAINS
+                    """);
+            sync();
+        }
+        catch (SQLException ex) {
+            logger.error("Failed to fetch NSFW domain lists", ex);
+        }
+     }
+
+     public List<String> fetchList(HttpClient client, String url) {
+
+        logger.info("Fetching NSFW domain list from {}", url);
+
+        var request = HttpRequest.newBuilder()
+                .uri(java.net.URI.create(url))
+                .build();
+
+        try {
+            if (url.endsWith(".gz")) {
+                var response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
+
+                byte[] body = response.body();
+
+                try (var reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(new ByteArrayInputStream(body))))) {
+                    return reader.lines()
+                            .filter(StringUtils::isNotEmpty)
+                            .toList();
+                } catch (Exception e) {
+                    logger.error("Error reading GZIP response from {}", url, e);
+                }
+            } else {
+                var response = client.send(request, HttpResponse.BodyHandlers.ofString());
+                if (response.statusCode() == 200) {
+
+                    return Arrays.stream(StringUtils.split(response.body(), "\n"))
+                            .filter(StringUtils::isNotEmpty)
+                            .toList();
+                } else {
+                    logger.warn("Failed to fetch list from {}: HTTP {}", url, response.statusCode());
+                }
+            }
+        }
+        catch (Exception e) {
+            logger.error("Error fetching NSFW domain list from {}", url, e);
+        }
+
+
+        return List.of();
+     }
+}
--- a/code/functions/nsfw-domain-filter/java/nu/marginalia/nsfw/NsfwFilterModule.java
+++ b/code/functions/nsfw-domain-filter/java/nu/marginalia/nsfw/NsfwFilterModule.java
@@ -0,0 +1,30 @@
+package nu.marginalia.nsfw;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.Provides;
+import jakarta.inject.Named;
+
+import java.util.List;
+
+public class NsfwFilterModule extends AbstractModule {
+
+    @Provides
+    @Named("nsfw.dangerLists")
+    public List<String> nsfwDomainLists1() {
+        return List.of(
+                "https://raw.githubusercontent.com/olbat/ut1-blacklists/refs/heads/master/blacklists/cryptojacking/domains",
+                "https://raw.githubusercontent.com/olbat/ut1-blacklists/refs/heads/master/blacklists/malware/domains",
+                "https://raw.githubusercontent.com/olbat/ut1-blacklists/refs/heads/master/blacklists/phishing/domains"
+        );
+    }
+    @Provides
+    @Named("nsfw.smutLists")
+    public List<String> nsfwDomainLists2() {
+        return List.of(
+                "https://github.com/olbat/ut1-blacklists/raw/refs/heads/master/blacklists/adult/domains.gz",
+                "https://raw.githubusercontent.com/olbat/ut1-blacklists/refs/heads/master/blacklists/gambling/domains"
+        );
+    }
+
+    public void configure() {}
+}
--- a/code/functions/nsfw-domain-filter/test/nu/marginalia/nsfw/NsfwDomainFilterTest.java
+++ b/code/functions/nsfw-domain-filter/test/nu/marginalia/nsfw/NsfwDomainFilterTest.java
@@ -0,0 +1,108 @@
+package nu.marginalia.nsfw;
+
+
+import com.google.inject.AbstractModule;
+import com.google.inject.Guice;
+import com.google.inject.Provides;
+import com.zaxxer.hikari.HikariConfig;
+import com.zaxxer.hikari.HikariDataSource;
+import jakarta.inject.Named;
+import nu.marginalia.test.TestMigrationLoader;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.testcontainers.containers.MariaDBContainer;
+import org.testcontainers.junit.jupiter.Container;
+import org.testcontainers.junit.jupiter.Testcontainers;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.List;
+
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+
+@Tag("slow")
+@Testcontainers
+class NsfwDomainFilterTest extends AbstractModule {
+
+    @Container
+    static MariaDBContainer<?> mariaDBContainer = new MariaDBContainer<>("mariadb")
+            .withDatabaseName("WMSA_prod")
+            .withUsername("wmsa")
+            .withPassword("wmsa")
+            .withNetworkAliases("mariadb");
+
+    static HikariDataSource dataSource;
+    static Path tempDir;
+
+    @BeforeAll
+    public static void setUpDb() throws IOException {
+        tempDir = Files.createTempDirectory(NsfwDomainFilterTest.class.getSimpleName());
+
+        System.setProperty("system.homePath", tempDir.toString());
+
+        HikariConfig config = new HikariConfig();
+        config.setJdbcUrl(mariaDBContainer.getJdbcUrl());
+        config.setUsername("wmsa");
+        config.setPassword("wmsa");
+
+        dataSource = new HikariDataSource(config);
+
+        TestMigrationLoader.flywayMigration(dataSource);
+
+        try (var conn = dataSource.getConnection();
+             var stmt = conn.prepareStatement("INSERT INTO EC_DOMAIN (DOMAIN_NAME, DOMAIN_TOP, NODE_AFFINITY) VALUES (?, ?, 1)")
+        ) {
+
+            // Ensure the database is ready
+            conn.createStatement().execute("SELECT 1");
+
+            stmt.setString(1, "www.google.com");
+            stmt.setString(2, "google.com");
+            stmt.executeUpdate();
+            stmt.setString(1, "www.bing.com");
+            stmt.setString(2, "bing.com");
+            stmt.executeUpdate();
+        } catch (Exception e) {
+            throw new RuntimeException("Failed to connect to the database", e);
+        }
+    }
+
+    @Provides
+    @Named("nsfw.dangerLists")
+    public List<String> nsfwDomainLists1() {
+        return List.of(
+                "https://downloads.marginalia.nu/test/list1"
+        );
+    }
+
+    @Provides
+    @Named("nsfw.smutLists")
+    public List<String> nsfwDomainLists2() {
+        return List.of(
+                "https://downloads.marginalia.nu/test/list2.gz"
+        );
+    }
+
+    public void configure() {
+        bind(HikariDataSource.class).toInstance(dataSource);
+    }
+
+    @Test
+    public void test() {
+        var filter = Guice
+                .createInjector(this)
+                .getInstance(NsfwDomainFilter.class);
+
+        filter.fetchLists();
+
+        assertTrue(filter.isBlocked(1, NsfwDomainFilter.NSFW_BLOCK_DANGER));
+        assertTrue(filter.isBlocked(1, NsfwDomainFilter.NSFW_BLOCK_SMUT));
+        assertFalse(filter.isBlocked(2, NsfwDomainFilter.NSFW_BLOCK_DANGER));
+        assertTrue(filter.isBlocked(2, NsfwDomainFilter.NSFW_BLOCK_SMUT));
+    }
+
+}
--- a/code/functions/search-query/api/java/nu/marginalia/api/searchquery/IndexProtobufCodec.java
+++ b/code/functions/search-query/api/java/nu/marginalia/api/searchquery/IndexProtobufCodec.java
@@ -2,9 +2,6 @@ package nu.marginalia.api.searchquery;

 import nu.marginalia.api.searchquery.model.query.SearchPhraseConstraint;
 import nu.marginalia.api.searchquery.model.query.SearchQuery;
-import nu.marginalia.api.searchquery.model.results.Bm25Parameters;
-import nu.marginalia.api.searchquery.model.results.ResultRankingParameters;
-import nu.marginalia.index.query.limit.QueryLimits;
 import nu.marginalia.index.query.limit.SpecificationLimit;
 import nu.marginalia.index.query.limit.SpecificationLimitType;

@@ -27,37 +24,19 @@ public class IndexProtobufCodec {
                .build();
    }

-    public static  QueryLimits convertQueryLimits(RpcQueryLimits queryLimits) {
-        return new QueryLimits(
-                queryLimits.getResultsByDomain(),
-                queryLimits.getResultsTotal(),
-                queryLimits.getTimeoutMs(),
-                queryLimits.getFetchSize()
-        );
-    }
-
-    public static RpcQueryLimits convertQueryLimits(QueryLimits queryLimits) {
-        return RpcQueryLimits.newBuilder()
-                .setResultsByDomain(queryLimits.resultsByDomain())
-                .setResultsTotal(queryLimits.resultsTotal())
-                .setTimeoutMs(queryLimits.timeoutMs())
-                .setFetchSize(queryLimits.fetchSize())
-                .build();
-    }
-
    public static SearchQuery convertRpcQuery(RpcQuery query) {
-        List<SearchPhraseConstraint> phraeConstraints = new ArrayList<>();
+        List<SearchPhraseConstraint> phraseConstraints = new ArrayList<>();

        for (int j = 0; j < query.getPhrasesCount(); j++) {
            var coh = query.getPhrases(j);
            if (coh.getType() == RpcPhrases.TYPE.OPTIONAL) {
-                phraeConstraints.add(new SearchPhraseConstraint.Optional(List.copyOf(coh.getTermsList())));
+                phraseConstraints.add(new SearchPhraseConstraint.Optional(List.copyOf(coh.getTermsList())));
            }
            else if (coh.getType() == RpcPhrases.TYPE.MANDATORY) {
-                phraeConstraints.add(new SearchPhraseConstraint.Mandatory(List.copyOf(coh.getTermsList())));
+                phraseConstraints.add(new SearchPhraseConstraint.Mandatory(List.copyOf(coh.getTermsList())));
            }
            else if (coh.getType() == RpcPhrases.TYPE.FULL) {
-                phraeConstraints.add(new SearchPhraseConstraint.Full(List.copyOf(coh.getTermsList())));
+                phraseConstraints.add(new SearchPhraseConstraint.Full(List.copyOf(coh.getTermsList())));
            }
            else {
                throw new IllegalArgumentException("Unknown phrase constraint type: " + coh.getType());
@@ -70,7 +49,7 @@ public class IndexProtobufCodec {
                query.getExcludeList(),
                query.getAdviceList(),
                query.getPriorityList(),
-                phraeConstraints
+                phraseConstraints
        );
    }

@@ -103,60 +82,4 @@ public class IndexProtobufCodec {
        return subqueryBuilder.build();
    }

-    public static ResultRankingParameters convertRankingParameterss(RpcResultRankingParameters params) {
-        if (params == null)
-            return ResultRankingParameters.sensibleDefaults();
-
-        return new ResultRankingParameters(
-                new Bm25Parameters(params.getBm25K(), params.getBm25B()),
-                params.getShortDocumentThreshold(),
-                params.getShortDocumentPenalty(),
-                params.getDomainRankBonus(),
-                params.getQualityPenalty(),
-                params.getShortSentenceThreshold(),
-                params.getShortSentencePenalty(),
-                params.getBm25Weight(),
-                params.getTcfFirstPositionWeight(),
-                params.getTcfVerbatimWeight(),
-                params.getTcfProximityWeight(),
-                ResultRankingParameters.TemporalBias.valueOf(params.getTemporalBias().getBias().name()),
-                params.getTemporalBiasWeight(),
-                params.getExportDebugData()
-        );
-    }
-
-    public static RpcResultRankingParameters convertRankingParameterss(ResultRankingParameters rankingParams,
-                                                                       RpcTemporalBias temporalBias)
-    {
-        if (rankingParams == null) {
-            rankingParams = ResultRankingParameters.sensibleDefaults();
-        }
-
-        var builder = RpcResultRankingParameters.newBuilder()
-                        .setBm25B(rankingParams.bm25Params.b())
-                        .setBm25K(rankingParams.bm25Params.k())
-                        .setShortDocumentThreshold(rankingParams.shortDocumentThreshold)
-                        .setShortDocumentPenalty(rankingParams.shortDocumentPenalty)
-                        .setDomainRankBonus(rankingParams.domainRankBonus)
-                        .setQualityPenalty(rankingParams.qualityPenalty)
-                        .setShortSentenceThreshold(rankingParams.shortSentenceThreshold)
-                        .setShortSentencePenalty(rankingParams.shortSentencePenalty)
-                        .setBm25Weight(rankingParams.bm25Weight)
-                        .setTcfFirstPositionWeight(rankingParams.tcfFirstPosition)
-                        .setTcfProximityWeight(rankingParams.tcfProximity)
-                        .setTcfVerbatimWeight(rankingParams.tcfVerbatim)
-                        .setTemporalBiasWeight(rankingParams.temporalBiasWeight)
-                        .setExportDebugData(rankingParams.exportDebugData);
-
-        if (temporalBias != null && temporalBias.getBias() != RpcTemporalBias.Bias.NONE) {
-            builder.setTemporalBias(temporalBias);
-        }
-        else {
-            builder.setTemporalBias(RpcTemporalBias.newBuilder()
-                    .setBias(RpcTemporalBias.Bias.valueOf(rankingParams.temporalBias.name())));
-        }
-
-        return builder.build();
-    }
-
 }
--- a/code/functions/search-query/api/java/nu/marginalia/api/searchquery/QueryClient.java
+++ b/code/functions/search-query/api/java/nu/marginalia/api/searchquery/QueryClient.java
@@ -9,10 +9,9 @@ import nu.marginalia.service.client.GrpcChannelPoolFactory;
 import nu.marginalia.service.client.GrpcSingleNodeChannelPool;
 import nu.marginalia.service.discovery.property.ServiceKey;
 import nu.marginalia.service.discovery.property.ServicePartition;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;

 import javax.annotation.CheckReturnValue;
+import java.time.Duration;

@Singleton
 public class QueryClient  {
@@ -24,13 +23,14 @@ public class QueryClient  {

    private final GrpcSingleNodeChannelPool<QueryApiGrpc.QueryApiBlockingStub> queryApiPool;

-    private final Logger logger = LoggerFactory.getLogger(getClass());
-
    @Inject
-    public QueryClient(GrpcChannelPoolFactory channelPoolFactory) {
+    public QueryClient(GrpcChannelPoolFactory channelPoolFactory) throws InterruptedException {
        this.queryApiPool = channelPoolFactory.createSingle(
                ServiceKey.forGrpcApi(QueryApiGrpc.class, ServicePartition.any()),
                QueryApiGrpc::newBlockingStub);
+
+        // Hold up initialization until we have a downstream connection
+        this.queryApiPool.awaitChannel(Duration.ofSeconds(5));
    }

    @CheckReturnValue
--- a/code/functions/search-query/api/java/nu/marginalia/api/searchquery/QueryProtobufCodec.java
+++ b/code/functions/search-query/api/java/nu/marginalia/api/searchquery/QueryProtobufCodec.java
@@ -1,11 +1,8 @@
 package nu.marginalia.api.searchquery;

-import nu.marginalia.api.searchquery.model.query.ProcessedQuery;
-import nu.marginalia.api.searchquery.model.query.QueryParams;
-import nu.marginalia.api.searchquery.model.query.QueryResponse;
-import nu.marginalia.api.searchquery.model.query.SearchSpecification;
+import nu.marginalia.api.searchquery.model.query.*;
 import nu.marginalia.api.searchquery.model.results.DecoratedSearchResultItem;
-import nu.marginalia.api.searchquery.model.results.ResultRankingParameters;
+import nu.marginalia.api.searchquery.model.results.PrototypeRankingParameters;
 import nu.marginalia.api.searchquery.model.results.SearchResultItem;
 import nu.marginalia.api.searchquery.model.results.SearchResultKeywordScore;
 import nu.marginalia.api.searchquery.model.results.debug.DebugFactor;
@@ -32,12 +29,14 @@ public class QueryProtobufCodec {
        builder.setSearchSetIdentifier(query.specs.searchSetIdentifier);
        builder.setHumanQuery(request.getHumanQuery());

+        builder.setNsfwFilterTierValue(request.getNsfwFilterTierValue());
+
        builder.setQuality(IndexProtobufCodec.convertSpecLimit(query.specs.quality));
        builder.setYear(IndexProtobufCodec.convertSpecLimit(query.specs.year));
        builder.setSize(IndexProtobufCodec.convertSpecLimit(query.specs.size));
        builder.setRank(IndexProtobufCodec.convertSpecLimit(query.specs.rank));

-        builder.setQueryLimits(IndexProtobufCodec.convertQueryLimits(query.specs.queryLimits));
+        builder.setQueryLimits(query.specs.queryLimits);

        // Query strategy may be overridden by the query, but if not, use the one from the request
        if (query.specs.queryStrategy != null && query.specs.queryStrategy != QueryStrategy.AUTO)
@@ -45,9 +44,27 @@ public class QueryProtobufCodec {
        else
            builder.setQueryStrategy(request.getQueryStrategy());

+        if (request.getTemporalBias().getBias() != RpcTemporalBias.Bias.NONE) {
            if (query.specs.rankingParams != null) {
-            builder.setParameters(IndexProtobufCodec.convertRankingParameterss(query.specs.rankingParams, request.getTemporalBias()));
+                builder.setParameters(
+                        RpcResultRankingParameters.newBuilder(query.specs.rankingParams)
+                                .setTemporalBias(request.getTemporalBias())
+                                .build()
+                );
+            } else {
+                builder.setParameters(
+                        RpcResultRankingParameters.newBuilder(PrototypeRankingParameters.sensibleDefaults())
+                                .setTemporalBias(request.getTemporalBias())
+                                .build()
+                );
            }
+        } else if (query.specs.rankingParams != null) {
+            builder.setParameters(query.specs.rankingParams);
+        }
+        // else {
+        // if we have no ranking params, we don't need to set them, the client check and use the default values
+        // so we don't need to send this huge object over the wire
+        // }

        return builder.build();
    }
@@ -60,23 +77,20 @@ public class QueryProtobufCodec {
        builder.setSearchSetIdentifier(query.specs.searchSetIdentifier);
        builder.setHumanQuery(humanQuery);

+        builder.setNsfwFilterTier(RpcIndexQuery.NSFW_FILTER_TIER.DANGER);
+
        builder.setQuality(IndexProtobufCodec.convertSpecLimit(query.specs.quality));
        builder.setYear(IndexProtobufCodec.convertSpecLimit(query.specs.year));
        builder.setSize(IndexProtobufCodec.convertSpecLimit(query.specs.size));
        builder.setRank(IndexProtobufCodec.convertSpecLimit(query.specs.rank));

-        builder.setQueryLimits(IndexProtobufCodec.convertQueryLimits(query.specs.queryLimits));
+        builder.setQueryLimits(query.specs.queryLimits);

        // Query strategy may be overridden by the query, but if not, use the one from the request
        builder.setQueryStrategy(query.specs.queryStrategy.name());

        if (query.specs.rankingParams != null) {
-            builder.setParameters(IndexProtobufCodec.convertRankingParameterss(
-                    query.specs.rankingParams,
-                    RpcTemporalBias.newBuilder().setBias(
-                                    RpcTemporalBias.Bias.NONE)
-                            .build())
-            );
+            builder.setParameters(query.specs.rankingParams);
        }

        return builder.build();
@@ -95,10 +109,11 @@ public class QueryProtobufCodec {
                IndexProtobufCodec.convertSpecLimit(request.getSize()),
                IndexProtobufCodec.convertSpecLimit(request.getRank()),
                request.getDomainIdsList(),
-                IndexProtobufCodec.convertQueryLimits(request.getQueryLimits()),
+                request.getQueryLimits(),
                request.getSearchSetIdentifier(),
                QueryStrategy.valueOf(request.getQueryStrategy()),
-                ResultRankingParameters.TemporalBias.valueOf(request.getTemporalBias().getBias().name()),
+                RpcTemporalBias.Bias.valueOf(request.getTemporalBias().getBias().name()),
+                NsfwFilterTier.fromCodedValue(request.getNsfwFilterTierValue()),
                request.getPagination().getPage()
        );
    }
@@ -294,9 +309,9 @@ public class QueryProtobufCodec {
                IndexProtobufCodec.convertSpecLimit(specs.getYear()),
                IndexProtobufCodec.convertSpecLimit(specs.getSize()),
                IndexProtobufCodec.convertSpecLimit(specs.getRank()),
-                IndexProtobufCodec.convertQueryLimits(specs.getQueryLimits()),
+                specs.getQueryLimits(),
                QueryStrategy.valueOf(specs.getQueryStrategy()),
-                IndexProtobufCodec.convertRankingParameterss(specs.getParameters())
+                specs.hasParameters() ? specs.getParameters() : null
        );
    }

@@ -307,19 +322,20 @@ public class QueryProtobufCodec {
                .addAllTacitExcludes(params.tacitExcludes())
                .addAllTacitPriority(params.tacitPriority())
                .setHumanQuery(params.humanQuery())
-                .setQueryLimits(IndexProtobufCodec.convertQueryLimits(params.limits()))
+                .setQueryLimits(params.limits())
                .setQuality(IndexProtobufCodec.convertSpecLimit(params.quality()))
                .setYear(IndexProtobufCodec.convertSpecLimit(params.year()))
                .setSize(IndexProtobufCodec.convertSpecLimit(params.size()))
                .setRank(IndexProtobufCodec.convertSpecLimit(params.rank()))
                .setSearchSetIdentifier(params.identifier())
                .setQueryStrategy(params.queryStrategy().name())
+                .setNsfwFilterTierValue(params.filterTier().getCodedValue())
                .setTemporalBias(RpcTemporalBias.newBuilder()
                        .setBias(RpcTemporalBias.Bias.valueOf(params.temporalBias().name()))
                        .build())
                .setPagination(RpcQsQueryPagination.newBuilder()
                        .setPage(params.page())
-                        .setPageSize(Math.min(100, params.limits().resultsTotal()))
+                        .setPageSize(Math.min(100, params.limits().getResultsTotal()))
                        .build());

        if (params.nearDomain() != null)
--- a/code/functions/search-query/api/java/nu/marginalia/api/searchquery/model/query/NsfwFilterTier.java
+++ b/code/functions/search-query/api/java/nu/marginalia/api/searchquery/model/query/NsfwFilterTier.java
@@ -0,0 +1,26 @@
+package nu.marginalia.api.searchquery.model.query;
+
+public enum NsfwFilterTier {
+    OFF(0),
+    DANGER(1),
+    PORN_AND_GAMBLING(2);
+
+    private final int codedValue; // same as ordinal() for now, but can be changed later if needed
+
+    NsfwFilterTier(int codedValue) {
+        this.codedValue = codedValue;
+    }
+
+    public static NsfwFilterTier fromCodedValue(int codedValue) {
+        for (NsfwFilterTier tier : NsfwFilterTier.values()) {
+            if (tier.codedValue == codedValue) {
+                return tier;
+            }
+        }
+        throw new IllegalArgumentException("Invalid coded value for NsfwFilterTirer: " + codedValue);
+    }
+
+    public int getCodedValue() {
+        return codedValue;
+    }
+}
--- a/code/functions/search-query/api/java/nu/marginalia/api/searchquery/model/query/QueryParams.java
+++ b/code/functions/search-query/api/java/nu/marginalia/api/searchquery/model/query/QueryParams.java
@@ -1,7 +1,7 @@
 package nu.marginalia.api.searchquery.model.query;

-import nu.marginalia.api.searchquery.model.results.ResultRankingParameters;
-import nu.marginalia.index.query.limit.QueryLimits;
+import nu.marginalia.api.searchquery.RpcQueryLimits;
+import nu.marginalia.api.searchquery.RpcTemporalBias;
 import nu.marginalia.index.query.limit.QueryStrategy;
 import nu.marginalia.index.query.limit.SpecificationLimit;

@@ -21,14 +21,15 @@ public record QueryParams(
        SpecificationLimit size,
        SpecificationLimit rank,
        List<Integer> domainIds,
-        QueryLimits limits,
+        RpcQueryLimits limits,
        String identifier,
        QueryStrategy queryStrategy,
-        ResultRankingParameters.TemporalBias temporalBias,
+        RpcTemporalBias.Bias temporalBias,
+        NsfwFilterTier filterTier,
        int page
 )
 {
-    public QueryParams(String query, QueryLimits limits, String identifier) {
+    public QueryParams(String query, RpcQueryLimits limits, String identifier, NsfwFilterTier filterTier) {
        this(query, null,
                List.of(),
                List.of(),
@@ -42,7 +43,8 @@ public record QueryParams(
                limits,
                identifier,
                QueryStrategy.AUTO,
-                ResultRankingParameters.TemporalBias.NONE,
+                RpcTemporalBias.Bias.NONE,
+                filterTier,
                1 // page
                );
    }
--- a/Show More
+++ b/Show More