Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git, synced 2025-10-06 07:32:38 +02:00

Compare commits: deploy-004 ... deploy-006

24 Commits
Author | SHA1 | Date
---|---|---
 | 4342e42722 | 
 | bc818056e6 | 
 | de2feac238 | 
 | 1e770205a5 | 
 | e44ecd6d69 | 
 | 5b93a0e633 | 
 | 08fb0e5efe | 
 | bcf67782ea | 
 | ef3f175ede | 
 | bbe4b5d9fd | 
 | c67a635103 | 
 | 20b24133fb | 
 | f2567677e8 | 
 | bc2c2061f2 | 
 | 1c7f5a31a5 | 
 | 59a8ea60f7 | 
 | aa9b1244ea | 
 | 2d17233366 | 
 | b245cc9f38 | 
 | 6614d05bdf | 
 | 55aeb03c4a | 
 | faa589962f | 
 | c7edd6b39f | 
 | 79da622e3b | 
ROADMAP.md (39 changed lines)
@@ -1,4 +1,4 @@
-# Roadmap 2024-2025
+# Roadmap 2025
 
 This is a roadmap with major features planned for Marginalia Search.
 
@@ -30,12 +30,6 @@ Retaining the ability to independently crawl the web is still strongly desirable
 The search engine has a bit of a problem showing spicy content mixed in with the results. It would be desirable to have a way to filter this out. It's likely something like a URL blacklist (e.g. [UT1](https://dsi.ut-capitole.fr/blacklists/index_en.php))
 combined with a naive Bayesian filter would go a long way, or something more sophisticated...?
 
-## Web Design Overhaul
-
-The design is kinda clunky and hard to maintain, and needlessly outdated-looking.
-
-In progress: PR [#127](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/127) -- demo available at https://test.marginalia.nu/
-
 ## Additional Language Support
 
 It would be desirable if the search engine supported more languages than English. This is partially about
@@ -62,8 +56,31 @@ filter for any API consumer.
 
 I've talked to the stract dev and he does not think it's a good idea to mimic their optics language, which is quite ad-hoc, but instead to work together to find some new common description language for this.
 
+## Show favicons next to search results
+
+This is expected from search engines. A basic proof-of-concept sketch of fetching this data has been done, but the feature is some way from being reality.
+
+## Specialized crawler for github
+
+One of the search engine's biggest limitations right now is that it does not index github at all. A specialized crawler that fetches at least the readme.md would go a long way toward providing search capabilities in this domain.
+
 # Completed
 
+## Web Design Overhaul (COMPLETED 2025-01)
+
+The design is kinda clunky and hard to maintain, and needlessly outdated-looking.
+
+PR [#127](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/127)
+
+## Finalize RSS support (COMPLETED 2024-11)
+
+Marginalia has experimental RSS preview support for a few domains. This works well and
+it should be extended to all domains. It would also be interesting to offer search of the
+RSS data itself, or use the RSS set to feed a special live index that updates faster than the
+main dataset.
+
+Completed with PR [#122](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/122) and PR [#125](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/125)
+
 ## Proper Position Index (COMPLETED 2024-09)
 
 The search engine uses a fixed width bit mask to indicate word positions. It has the benefit
@@ -76,11 +93,3 @@ list, as is the civilized way of doing this.
 
 Completed with PR [#99](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/99)
 
-## Finalize RSS support (COMPLETED 2024-11)
-
-Marginalia has experimental RSS preview support for a few domains. This works well and
-it should be extended to all domains. It would also be interesting to offer search of the
-RSS data itself, or use the RSS set to feed a special live index that updates faster than the
-main dataset.
-
-Completed with PR [#122](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/122) and PR [#125](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/125)
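The roadmap item above pairs a URL blacklist with a naive Bayesian filter. As a rough illustration of what that second component could look like, here is a minimal two-class sketch over whitespace tokens; this is purely illustrative and not anything from the codebase (class priors are omitted for brevity):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal two-class naive Bayes sketch with Laplace smoothing.
// Purely illustrative of the roadmap idea; not Marginalia code.
class NaiveBayesFilter {
    private final Map<String, Integer> spicyCounts = new HashMap<>();
    private final Map<String, Integer> cleanCounts = new HashMap<>();
    private int spicyTotal = 0, cleanTotal = 0;

    void train(String text, boolean spicy) {
        for (String tok : text.toLowerCase().split("\\s+")) {
            var counts = spicy ? spicyCounts : cleanCounts;
            counts.merge(tok, 1, Integer::sum);
            if (spicy) spicyTotal++; else cleanTotal++;
        }
    }

    /** Log-odds that the text is spicy; positive means "probably spicy". */
    double logOdds(String text) {
        double score = 0;
        // rough shared vocabulary size for Laplace smoothing
        int vocab = spicyCounts.size() + cleanCounts.size() + 1;
        for (String tok : text.toLowerCase().split("\\s+")) {
            double pSpicy = (spicyCounts.getOrDefault(tok, 0) + 1.0) / (spicyTotal + vocab);
            double pClean = (cleanCounts.getOrDefault(tok, 0) + 1.0) / (cleanTotal + vocab);
            score += Math.log(pSpicy) - Math.log(pClean);
        }
        return score;
    }
}
```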
build.gradle

@@ -47,7 +47,7 @@ ext {
     dockerImageBase='container-registry.oracle.com/graalvm/jdk:23'
     dockerImageTag='latest'
     dockerImageRegistry='marginalia'
-    jibVersion = '3.4.3'
+    jibVersion = '3.4.4'
 
 }
 
DbDomainQueries.java

@@ -20,7 +20,10 @@ public class DbDomainQueries {
     private final HikariDataSource dataSource;
 
     private static final Logger logger = LoggerFactory.getLogger(DbDomainQueries.class);
 
     private final Cache<EdgeDomain, Integer> domainIdCache = CacheBuilder.newBuilder().maximumSize(10_000).build();
+    private final Cache<Integer, EdgeDomain> domainNameCache = CacheBuilder.newBuilder().maximumSize(10_000).build();
+    private final Cache<String, List<DomainWithNode>> siblingsCache = CacheBuilder.newBuilder().maximumSize(10_000).build();
 
     @Inject
     public DbDomainQueries(HikariDataSource dataSource)
@@ -30,16 +33,21 @@ public class DbDomainQueries {
 
     public Integer getDomainId(EdgeDomain domain) throws NoSuchElementException {
-        try (var connection = dataSource.getConnection()) {
+        try {
             return domainIdCache.get(domain, () -> {
-                try (var stmt = connection.prepareStatement("SELECT ID FROM EC_DOMAIN WHERE DOMAIN_NAME=?")) {
+                try (var connection = dataSource.getConnection();
+                     var stmt = connection.prepareStatement("SELECT ID FROM EC_DOMAIN WHERE DOMAIN_NAME=?")) {
 
                     stmt.setString(1, domain.toString());
                     var rsp = stmt.executeQuery();
                     if (rsp.next()) {
                         return rsp.getInt(1);
                     }
                 }
+                catch (SQLException ex) {
+                    throw new RuntimeException(ex);
+                }
 
                 throw new NoSuchElementException();
             });
         }
@@ -49,9 +57,6 @@ public class DbDomainQueries {
         catch (ExecutionException ex) {
             throw new RuntimeException(ex.getCause());
         }
-        catch (SQLException ex) {
-            throw new RuntimeException(ex);
-        }
     }
 
     public OptionalInt tryGetDomainId(EdgeDomain domain) {
@@ -84,47 +89,55 @@ public class DbDomainQueries {
     }
 
     public Optional<EdgeDomain> getDomain(int id) {
+        EdgeDomain existing = domainNameCache.getIfPresent(id);
+        if (existing != null) {
+            return Optional.of(existing);
+        }
+
         try (var connection = dataSource.getConnection()) {
             try (var stmt = connection.prepareStatement("SELECT DOMAIN_NAME FROM EC_DOMAIN WHERE ID=?")) {
                 stmt.setInt(1, id);
                 var rsp = stmt.executeQuery();
                 if (rsp.next()) {
-                    return Optional.of(new EdgeDomain(rsp.getString(1)));
+                    var val = new EdgeDomain(rsp.getString(1));
+                    domainNameCache.put(id, val);
+                    return Optional.of(val);
                 }
                 return Optional.empty();
             }
         }
-        catch (UncheckedExecutionException ex) {
-            throw new RuntimeException(ex.getCause());
-        }
         catch (SQLException ex) {
             throw new RuntimeException(ex);
         }
     }
 
-    public List<DomainWithNode> otherSubdomains(EdgeDomain domain, int cnt) {
-        List<DomainWithNode> ret = new ArrayList<>();
+    public List<DomainWithNode> otherSubdomains(EdgeDomain domain, int cnt) throws ExecutionException {
+        String topDomain = domain.topDomain;
 
-        try (var conn = dataSource.getConnection();
-             var stmt = conn.prepareStatement("SELECT DOMAIN_NAME, NODE_AFFINITY FROM EC_DOMAIN WHERE DOMAIN_TOP = ? LIMIT ?")) {
-            stmt.setString(1, domain.topDomain);
-            stmt.setInt(2, cnt);
+        return siblingsCache.get(topDomain, () -> {
+            List<DomainWithNode> ret = new ArrayList<>();
 
-            var rs = stmt.executeQuery();
-            while (rs.next()) {
-                var sibling = new EdgeDomain(rs.getString(1));
+            try (var conn = dataSource.getConnection();
+                 var stmt = conn.prepareStatement("SELECT DOMAIN_NAME, NODE_AFFINITY FROM EC_DOMAIN WHERE DOMAIN_TOP = ? LIMIT ?")) {
+                stmt.setString(1, topDomain);
+                stmt.setInt(2, cnt);
 
-                if (sibling.equals(domain))
-                    continue;
+                var rs = stmt.executeQuery();
+                while (rs.next()) {
+                    var sibling = new EdgeDomain(rs.getString(1));
 
-                ret.add(new DomainWithNode(sibling, rs.getInt(2)));
+                    if (sibling.equals(domain))
+                        continue;
+
+                    ret.add(new DomainWithNode(sibling, rs.getInt(2)));
+                }
+            } catch (SQLException e) {
+                logger.error("Failed to get domain neighbors");
             }
-        } catch (SQLException e) {
-            logger.error("Failed to get domain neighbors");
-        }
 
-        return ret;
+            return ret;
+        });
     }
 
     public record DomainWithNode (EdgeDomain domain, int nodeAffinity) {
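The lookups above now go through Guava caches. One subtlety: `Cache.get(key, loader)` wraps any exception thrown by the loader in an `ExecutionException`, which is why `getDomainId` unwraps `ex.getCause()` and why `otherSubdomains` now declares `throws ExecutionException`. A minimal standalone sketch of the pattern (class and values invented, not from the changeset):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.concurrent.ExecutionException;

// Read-through caching with Guava: the loader runs only on a cache miss,
// and its result is stored under the key for subsequent calls.
class CacheThroughDemo {
    private static final Cache<String, Integer> cache =
            CacheBuilder.newBuilder().maximumSize(10_000).build();

    static int lookup(String key) throws ExecutionException {
        return cache.get(key, () -> {
            // stand-in for the expensive SQL lookup
            return key.length();
        });
    }

    public static void main(String[] args) throws ExecutionException {
        System.out.println(lookup("marginalia.nu")); // loader runs
        System.out.println(lookup("marginalia.nu")); // served from cache
    }
}
```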
DbDomainStatsExportMultitool.java (deleted)

@@ -1,118 +0,0 @@
-package nu.marginalia.db;
-
-import com.zaxxer.hikari.HikariDataSource;
-
-import java.sql.Connection;
-import java.sql.PreparedStatement;
-import java.sql.SQLException;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.OptionalInt;
-
-/** Class used in exporting data. This is intended to be used for a brief time
- * and then discarded, not kept around as a service.
- */
-public class DbDomainStatsExportMultitool implements AutoCloseable {
-    private final Connection connection;
-    private final int nodeId;
-    private final PreparedStatement knownUrlsQuery;
-    private final PreparedStatement visitedUrlsQuery;
-    private final PreparedStatement goodUrlsQuery;
-    private final PreparedStatement domainNameToId;
-
-    private final PreparedStatement allDomainsQuery;
-    private final PreparedStatement crawlQueueDomains;
-    private final PreparedStatement indexedDomainsQuery;
-
-    public DbDomainStatsExportMultitool(HikariDataSource dataSource, int nodeId) throws SQLException {
-        this.connection = dataSource.getConnection();
-        this.nodeId = nodeId;
-
-        knownUrlsQuery = connection.prepareStatement("""
-                SELECT KNOWN_URLS
-                FROM EC_DOMAIN INNER JOIN DOMAIN_METADATA
-                ON EC_DOMAIN.ID=DOMAIN_METADATA.ID
-                WHERE DOMAIN_NAME=?
-                """);
-        visitedUrlsQuery = connection.prepareStatement("""
-                SELECT VISITED_URLS
-                FROM EC_DOMAIN INNER JOIN DOMAIN_METADATA
-                ON EC_DOMAIN.ID=DOMAIN_METADATA.ID
-                WHERE DOMAIN_NAME=?
-                """);
-        goodUrlsQuery = connection.prepareStatement("""
-                SELECT GOOD_URLS
-                FROM EC_DOMAIN INNER JOIN DOMAIN_METADATA
-                ON EC_DOMAIN.ID=DOMAIN_METADATA.ID
-                WHERE DOMAIN_NAME=?
-                """);
-        domainNameToId = connection.prepareStatement("""
-                SELECT ID
-                FROM EC_DOMAIN
-                WHERE DOMAIN_NAME=?
-                """);
-        allDomainsQuery = connection.prepareStatement("""
-                SELECT DOMAIN_NAME
-                FROM EC_DOMAIN
-                """);
-        crawlQueueDomains = connection.prepareStatement("""
-                SELECT DOMAIN_NAME
-                FROM CRAWL_QUEUE
-                """);
-        indexedDomainsQuery = connection.prepareStatement("""
-                SELECT DOMAIN_NAME
-                FROM EC_DOMAIN
-                WHERE INDEXED > 0
-                """);
-    }
-
-    public OptionalInt getVisitedUrls(String domainName) throws SQLException {
-        return executeNameToIntQuery(domainName, visitedUrlsQuery);
-    }
-
-    public OptionalInt getDomainId(String domainName) throws SQLException {
-        return executeNameToIntQuery(domainName, domainNameToId);
-    }
-
-    public List<String> getCrawlQueueDomains() throws SQLException {
-        return executeListQuery(crawlQueueDomains, 100);
-    }
-    public List<String> getAllIndexedDomains() throws SQLException {
-        return executeListQuery(indexedDomainsQuery, 100_000);
-    }
-
-    private OptionalInt executeNameToIntQuery(String domainName, PreparedStatement statement)
-            throws SQLException {
-        statement.setString(1, domainName);
-        var rs = statement.executeQuery();
-
-        if (rs.next()) {
-            return OptionalInt.of(rs.getInt(1));
-        }
-
-        return OptionalInt.empty();
-    }
-
-    private List<String> executeListQuery(PreparedStatement statement, int sizeHint) throws SQLException {
-        List<String> ret = new ArrayList<>(sizeHint);
-
-        var rs = statement.executeQuery();
-
-        while (rs.next()) {
-            ret.add(rs.getString(1));
-        }
-
-        return ret;
-    }
-
-    @Override
-    public void close() throws SQLException {
-        knownUrlsQuery.close();
-        goodUrlsQuery.close();
-        visitedUrlsQuery.close();
-        allDomainsQuery.close();
-        crawlQueueDomains.close();
-        domainNameToId.close();
-        connection.close();
-    }
-}
DatabaseModule.java

@@ -89,7 +89,7 @@ public class DatabaseModule extends AbstractModule {
         config.addDataSourceProperty("prepStmtCacheSize", "250");
         config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
 
-        config.setMaximumPoolSize(5);
+        config.setMaximumPoolSize(Integer.getInteger("db.poolSize", 5));
         config.setMinimumIdle(2);
 
         config.setMaxLifetime(Duration.ofMinutes(9).toMillis());
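The pool size becomes tunable per process via the `db.poolSize` JVM system property (e.g. `-Ddb.poolSize=16` on the command line) instead of being hard-coded at 5. `Integer.getInteger` falls back to the default when the property is unset or not a valid integer:

```java
// Illustration only: how Integer.getInteger resolves the new db.poolSize knob.
class PoolSizeDemo {
    public static void main(String[] args) {
        System.out.println(Integer.getInteger("db.poolSize", 5)); // 5 if not set
        System.setProperty("db.poolSize", "16");
        System.out.println(Integer.getInteger("db.poolSize", 5)); // 16
    }
}
```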
BrowserlessClient.java

@@ -15,7 +15,9 @@ import java.util.Map;
 
 /** Client for local browserless.io API */
 public class BrowserlessClient implements AutoCloseable {
 
     private static final Logger logger = LoggerFactory.getLogger(BrowserlessClient.class);
+    private static final String BROWSERLESS_TOKEN = System.getProperty("live-capture.browserless-token", "BROWSERLESS_TOKEN");
 
     private final HttpClient httpClient = HttpClient.newBuilder()
             .version(HttpClient.Version.HTTP_1_1)
@@ -36,7 +38,7 @@ public class BrowserlessClient implements AutoCloseable {
         );
 
         var request = HttpRequest.newBuilder()
-                .uri(browserlessURI.resolve("/content"))
+                .uri(browserlessURI.resolve("/content?token="+BROWSERLESS_TOKEN))
                 .method("POST", HttpRequest.BodyPublishers.ofString(
                         gson.toJson(requestData)
                 ))
@@ -63,7 +65,7 @@ public class BrowserlessClient implements AutoCloseable {
         );
 
         var request = HttpRequest.newBuilder()
-                .uri(browserlessURI.resolve("/screenshot"))
+                .uri(browserlessURI.resolve("/screenshot?token="+BROWSERLESS_TOKEN))
                 .method("POST", HttpRequest.BodyPublishers.ofString(
                         gson.toJson(requestData)
                 ))
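The token is read once, at class load, from the `live-capture.browserless-token` system property, defaulting to the literal string `BROWSERLESS_TOKEN`, which matches the `TOKEN` value the updated test container is started with (see the test change further down). A deployment would presumably set the real token at launch; a small sketch (token value invented):

```java
// Hypothetical launch-time configuration. The property must be set before
// BrowserlessClient is first loaded, since the token field is static final;
// in practice it would be passed as -Dlive-capture.browserless-token=... on
// the service's command line.
class TokenConfigDemo {
    public static void main(String[] args) {
        System.setProperty("live-capture.browserless-token", "s3cret-token");
        System.out.println(System.getProperty("live-capture.browserless-token", "BROWSERLESS_TOKEN"));
        // requests would then go to /content?token=s3cret-token and
        // /screenshot?token=s3cret-token
    }
}
```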
FeedItem.java

@@ -1,6 +1,6 @@
 package nu.marginalia.rss.model;
 
-import com.apptasticsoftware.rssreader.Item;
+import nu.marginalia.rss.svc.SimpleFeedParser;
 import org.apache.commons.lang3.StringUtils;
 import org.jetbrains.annotations.NotNull;
 import org.jsoup.Jsoup;
@@ -18,37 +18,33 @@ public record FeedItem(String title,
     public static final int MAX_DESC_LENGTH = 255;
     public static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
 
-    public static FeedItem fromItem(Item item, boolean keepFragment) {
-        String title = item.getTitle().orElse("");
+    public static FeedItem fromItem(SimpleFeedParser.ItemData item, boolean keepFragment) {
+        String title = item.title();
         String date = getItemDate(item);
         String description = getItemDescription(item);
         String url;
 
-        if (keepFragment || item.getLink().isEmpty()) {
-            url = item.getLink().orElse("");
+        if (keepFragment) {
+            url = item.url();
         }
         else {
             try {
-                String link = item.getLink().get();
+                String link = item.url();
                 var linkUri = new URI(link);
                 var cleanUri = new URI(linkUri.getScheme(), linkUri.getAuthority(), linkUri.getPath(), linkUri.getQuery(), null);
                 url = cleanUri.toString();
             }
             catch (Exception e) {
                 // fallback to original link if we can't clean it, this is not a very important step
-                url = item.getLink().get();
+                url = item.url();
             }
         }
 
         return new FeedItem(title, date, description, url);
     }
 
-    private static String getItemDescription(Item item) {
-        Optional<String> description = item.getDescription();
-        if (description.isEmpty())
-            return "";
-
-        String rawDescription = description.get();
+    private static String getItemDescription(SimpleFeedParser.ItemData item) {
+        String rawDescription = item.description();
         if (rawDescription.indexOf('<') >= 0) {
             rawDescription = Jsoup.parseBodyFragment(rawDescription).text();
         }
@@ -58,15 +54,18 @@ public record FeedItem(String title,
 
     // e.g. http://fabiensanglard.net/rss.xml does dates like this: 1 Apr 2021 00:00:00 +0000
     private static final DateTimeFormatter extraFormatter = DateTimeFormatter.ofPattern("d MMM yyyy HH:mm:ss Z");
-    private static String getItemDate(Item item) {
+    private static String getItemDate(SimpleFeedParser.ItemData item) {
         Optional<ZonedDateTime> zonedDateTime = Optional.empty();
         try {
             zonedDateTime = item.getPubDateZonedDateTime();
         }
         catch (Exception e) {
-            zonedDateTime = item.getPubDate()
-                    .map(extraFormatter::parse)
-                    .map(ZonedDateTime::from);
+            try {
+                zonedDateTime = Optional.of(ZonedDateTime.from(extraFormatter.parse(item.pubDate())));
+            }
+            catch (Exception e2) {
+                // ignore
+            }
         }
 
         return zonedDateTime.map(date -> date.format(DATE_FORMAT)).orElse("");
FeedFetcherService.java

@@ -1,7 +1,5 @@
 package nu.marginalia.rss.svc;
 
-import com.apptasticsoftware.rssreader.Item;
-import com.apptasticsoftware.rssreader.RssReader;
 import com.google.inject.Inject;
 import com.opencsv.CSVReader;
 import nu.marginalia.WmsaHome;
@@ -20,7 +18,6 @@ import nu.marginalia.storage.FileStorageService;
 import nu.marginalia.storage.model.FileStorage;
 import nu.marginalia.storage.model.FileStorageType;
 import nu.marginalia.util.SimpleBlockingThreadPool;
-import org.apache.commons.io.input.BOMInputStream;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -32,7 +29,6 @@ import java.net.URISyntaxException;
 import java.net.http.HttpClient;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;
-import java.nio.charset.StandardCharsets;
 import java.sql.SQLException;
 import java.time.*;
 import java.time.format.DateTimeFormatter;
@@ -48,8 +44,6 @@ public class FeedFetcherService {
     private static final int MAX_FEED_ITEMS = 10;
     private static final Logger logger = LoggerFactory.getLogger(FeedFetcherService.class);
 
-    private final RssReader rssReader = new RssReader();
-
     private final FeedDb feedDb;
     private final FileStorageService fileStorageService;
     private final NodeConfigurationService nodeConfigurationService;
@@ -72,17 +66,6 @@ public class FeedFetcherService {
         this.nodeConfigurationService = nodeConfigurationService;
         this.serviceHeartbeat = serviceHeartbeat;
         this.executorClient = executorClient;
-
-        // Add support for some alternate date tags for atom
-        rssReader.addItemExtension("issued", this::setDateFallback);
-        rssReader.addItemExtension("created", this::setDateFallback);
-    }
-
-    private void setDateFallback(Item item, String value) {
-        if (item.getPubDate().isEmpty()) {
-            item.setPubDate(value);
-        }
     }
 
     public enum UpdateMode {
@@ -371,12 +354,7 @@ public class FeedFetcherService {
 
     public FeedItems parseFeed(String feedData, FeedDefinition definition) {
         try {
-            feedData = sanitizeEntities(feedData);
+            List<SimpleFeedParser.ItemData> rawItems = SimpleFeedParser.parse(feedData);
 
-            List<Item> rawItems = rssReader.read(
-                    // Massage the data to maximize the possibility of the flaky XML parser consuming it
-                    new BOMInputStream(new ByteArrayInputStream(feedData.trim().getBytes(StandardCharsets.UTF_8)), false)
-            ).toList();
-
             boolean keepUriFragment = rawItems.size() < 2 || areFragmentsDisparate(rawItems);
 
@@ -399,33 +377,6 @@ public class FeedFetcherService {
         }
     }
 
-    private static final Map<String, String> HTML_ENTITIES = Map.of(
-            "&raquo;", "»",
-            "&laquo;", "«",
-            "&mdash;", "--",
-            "&ndash;", "-",
-            "&rsquo;", "'",
-            "&lsquo;", "'",
-            "&quot;", "\"",
-            "&nbsp;", ""
-    );
-
-    /** The XML parser will blow up if you insert HTML entities in the feed XML,
-     * which is unfortunately relatively common. Replace them as far as is possible
-     * with their corresponding characters
-     */
-    static String sanitizeEntities(String feedData) {
-        String result = feedData;
-        for (Map.Entry<String, String> entry : HTML_ENTITIES.entrySet()) {
-            result = result.replace(entry.getKey(), entry.getValue());
-        }
-
-        // Handle lone ampersands not part of a recognized XML entity
-        result = result.replaceAll("&(?!(amp|lt|gt|apos|quot);)", "&amp;");
-
-        return result;
-    }
-
     /** Decide whether to keep URI fragments in the feed items.
      * <p></p>
     * We keep fragments if there are multiple different fragments in the items.
@@ -433,16 +384,16 @@ public class FeedFetcherService {
      * @param items The items to check
      * @return True if we should keep the fragments, false otherwise
      */
-    private boolean areFragmentsDisparate(List<Item> items) {
+    private boolean areFragmentsDisparate(List<SimpleFeedParser.ItemData> items) {
         Set<String> seenFragments = new HashSet<>();
 
         try {
             for (var item : items) {
-                if (item.getLink().isEmpty()) {
+                if (item.url().isBlank()) {
                     continue;
                 }
 
-                var link = item.getLink().get();
+                var link = item.url();
                 if (!link.contains("#")) {
                     continue;
                 }
FeedJournal.java

@@ -10,6 +10,7 @@ import java.io.IOException;
 import java.nio.charset.StandardCharsets;
 import java.nio.file.Files;
 import java.nio.file.Path;
+import java.util.function.BiConsumer;
 
 /** Utility for recording fetched feeds to a journal, useful in debugging feed parser issues.
  */
@@ -59,6 +60,17 @@ public interface FeedJournal extends AutoCloseable {
             urlWriter.put(url);
             contentsWriter.put(contents);
         }
+    }
+
+    static void replay(Path journalPath, BiConsumer<String, String> urlAndContent) throws IOException {
+        try (SlopTable table = new SlopTable(journalPath)) {
+            final StringColumn.Reader urlReader = urlColumn.open(table);
+            final StringColumn.Reader contentsReader = contentsColumn.open(table);
+
+            while (urlReader.hasRemaining()) {
+                urlAndContent.accept(urlReader.get(), contentsReader.get());
+            }
+        }
     }
 }
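A sketch of how the new `replay` hook could be combined with the parser introduced later in this changeset, e.g. to debug parser regressions against previously journaled feeds. The journal path is invented, and the import assumes `FeedJournal` sits in the same package as the other feed service classes:

```java
import nu.marginalia.rss.svc.FeedJournal;
import nu.marginalia.rss.svc.SimpleFeedParser;

import java.io.IOException;
import java.nio.file.Path;

// Hypothetical debugging harness: re-run the parser over every journaled
// feed without re-fetching anything from the network.
class ReplayFeeds {
    public static void main(String[] args) throws IOException {
        FeedJournal.replay(Path.of("/tmp/feed-journal"), (url, contents) -> {
            var items = SimpleFeedParser.parse(contents);
            System.out.printf("%s -> %d items%n", url, items.size());
        });
    }
}
```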
SimpleFeedParser.java (new file)

@@ -0,0 +1,94 @@
+package nu.marginalia.rss.svc;
+
+import com.apptasticsoftware.rssreader.DateTimeParser;
+import com.apptasticsoftware.rssreader.util.Default;
+import org.jsoup.Jsoup;
+import org.jsoup.parser.Parser;
+
+import java.time.ZonedDateTime;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+
+public class SimpleFeedParser {
+
+    private static final DateTimeParser dateTimeParser = Default.getDateTimeParser();
+
+    public record ItemData (
+            String title,
+            String description,
+            String url,
+            String pubDate
+    ) {
+        public boolean isWellFormed() {
+            return title != null && !title.isBlank() &&
+                    description != null && !description.isBlank() &&
+                    url != null && !url.isBlank() &&
+                    pubDate != null && !pubDate.isBlank();
+        }
+
+        public Optional<ZonedDateTime> getPubDateZonedDateTime() {
+            try {
+                return Optional.ofNullable(dateTimeParser.parse(pubDate()));
+            }
+            catch (Exception e) {
+                return Optional.empty();
+            }
+        }
+    }
+
+    public static List<ItemData> parse(String content) {
+        var doc = Jsoup.parse(content, Parser.xmlParser());
+        List<ItemData> ret = new ArrayList<>();
+
+        doc.select("item, entry").forEach(element -> {
+            String link = "";
+            String title = "";
+            String description = "";
+            String pubDate = "";
+
+            for (String attr : List.of("title", "dc:title")) {
+                if (!title.isBlank())
+                    break;
+                var tag = element.getElementsByTag(attr).first();
+                if (tag != null) {
+                    title = tag.text();
+                }
+            }
+
+            for (String attr : List.of("title", "summary", "content", "description", "dc:description")) {
+                if (!description.isBlank())
+                    break;
+                var tag = element.getElementsByTag(attr).first();
+                if (tag != null) {
+                    description = tag.text();
+                }
+            }
+
+            for (String attr : List.of("pubDate", "published", "updated", "issued", "created", "dc:date")) {
+                if (!pubDate.isBlank())
+                    break;
+                var tag = element.getElementsByTag(attr).first();
+                if (tag != null) {
+                    pubDate = tag.text();
+                }
+            }
+
+            for (String attr : List.of("link", "url")) {
+                if (!link.isBlank())
+                    break;
+                var tag = element.getElementsByTag(attr).first();
+                if (tag != null) {
+                    link = tag.text();
+                }
+            }
+
+            ret.add(new ItemData(title, description, link, pubDate));
+        });
+
+        return ret;
+    }
+}
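A quick, self-contained illustration of the new parser's behavior on a minimal RSS fragment (the feed content is invented):

```java
import nu.marginalia.rss.svc.SimpleFeedParser;

// Feeds are parsed leniently with Jsoup's XML parser, so both RSS <item>
// and Atom <entry> elements are picked up from whatever markup arrives.
class ParseExample {
    public static void main(String[] args) {
        String feed = """
                <rss><channel>
                  <item>
                    <title>Hello</title>
                    <description>World</description>
                    <link>https://example.com/post</link>
                    <pubDate>Mon, 06 Jan 2025 00:00:00 +0000</pubDate>
                  </item>
                </channel></rss>
                """;
        for (var item : SimpleFeedParser.parse(feed)) {
            // prints: Hello https://example.com/post true
            System.out.println(item.title() + " " + item.url() + " " + item.isWellFormed());
        }
    }
}
```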
BrowserlessClientTest.java

@@ -2,16 +2,21 @@ package nu.marginalia.livecapture;
 
 import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
 import org.junit.jupiter.api.Test;
 import org.testcontainers.containers.GenericContainer;
 import org.testcontainers.junit.jupiter.Testcontainers;
 import org.testcontainers.utility.DockerImageName;
 
 import java.net.URI;
+import java.util.Map;
 
 @Testcontainers
+@Tag("slow")
 public class BrowserlessClientTest {
-    static GenericContainer<?> container = new GenericContainer<>(DockerImageName.parse("browserless/chrome")).withExposedPorts(3000);
+    static GenericContainer<?> container = new GenericContainer<>(DockerImageName.parse("browserless/chrome"))
+            .withEnv(Map.of("TOKEN", "BROWSERLESS_TOKEN"))
+            .withExposedPorts(3000);
 
     @BeforeAll
     public static void setup() {
TestXmlSanitization.java (deleted)

@@ -1,50 +0,0 @@
-package nu.marginalia.rss.svc;
-
-import com.apptasticsoftware.rssreader.Item;
-import com.apptasticsoftware.rssreader.RssReader;
-import org.junit.jupiter.api.Assertions;
-import org.junit.jupiter.api.Test;
-
-import java.util.List;
-import java.util.Optional;
-
-public class TestXmlSanitization {
-
-    @Test
-    public void testPreservedEntities() {
-        Assertions.assertEquals("&amp;", FeedFetcherService.sanitizeEntities("&amp;"));
-        Assertions.assertEquals("&lt;", FeedFetcherService.sanitizeEntities("&lt;"));
-        Assertions.assertEquals("&gt;", FeedFetcherService.sanitizeEntities("&gt;"));
-        Assertions.assertEquals("&apos;", FeedFetcherService.sanitizeEntities("&apos;"));
-    }
-
-    @Test
-    public void testNlnetTitleTag() {
-        // The NLnet atom feed puts HTML tags in the entry/title tags, which breaks the vanilla RssReader code
-
-        // Verify we're able to consume and strip out the HTML tags
-        RssReader r = new RssReader();
-
-        List<Item> items = r.read(ClassLoader.getSystemResourceAsStream("nlnet.atom")).toList();
-
-        Assertions.assertEquals(1, items.size());
-        for (var item : items) {
-            Assertions.assertEquals(Optional.of("50 Free and Open Source Projects Selected for NGI Zero grants"), item.getTitle());
-        }
-    }
-
-    @Test
-    public void testStrayAmpersand() {
-        Assertions.assertEquals("Bed &amp; Breakfast", FeedFetcherService.sanitizeEntities("Bed & Breakfast"));
-    }
-
-    @Test
-    public void testTranslatedHtmlEntity() {
-        Assertions.assertEquals("Foo -- Bar", FeedFetcherService.sanitizeEntities("Foo &mdash; Bar"));
-    }
-
-    @Test
-    public void testTranslatedHtmlEntityQuot() {
-        Assertions.assertEquals("\"Bob\"", FeedFetcherService.sanitizeEntities("&quot;Bob&quot;"));
-    }
-}
IndexClient.java

@@ -16,20 +16,19 @@ import org.slf4j.LoggerFactory;
 
 import java.util.ArrayList;
 import java.util.Comparator;
-import java.util.Iterator;
 import java.util.List;
 import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
-import static java.lang.Math.clamp;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Consumer;
 
 @Singleton
 public class IndexClient {
     private static final Logger logger = LoggerFactory.getLogger(IndexClient.class);
     private final GrpcMultiNodeChannelPool<IndexApiGrpc.IndexApiBlockingStub> channelPool;
     private final DomainBlacklistImpl blacklist;
-    private static final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
+    private static final ExecutorService executor = Executors.newCachedThreadPool();
 
     @Inject
     public IndexClient(GrpcChannelPoolFactory channelPoolFactory, DomainBlacklistImpl blacklist) {
@@ -51,40 +50,37 @@ public class IndexClient {
 
     /** Execute a query on the index partitions and return the combined results. */
     public AggregateQueryResponse executeQueries(RpcIndexQuery indexRequest, Pagination pagination) {
-        List<CompletableFuture<Iterator<RpcDecoratedResultItem>>> futures =
-            channelPool.call(IndexApiGrpc.IndexApiBlockingStub::query)
-                    .async(executor)
-                    .runEach(indexRequest);
-
         final int requestedMaxResults = indexRequest.getQueryLimits().getResultsTotal();
-        final int resultsUpperBound = requestedMaxResults * channelPool.getNumNodes();
-
-        List<RpcDecoratedResultItem> results = new ArrayList<>(resultsUpperBound);
-
-        for (var future : futures) {
-            try {
-                future.get().forEachRemaining(results::add);
-            }
-            catch (Exception e) {
-                logger.error("Downstream exception", e);
-            }
-        }
+        AtomicInteger totalNumResults = new AtomicInteger(0);
 
-        // Sort the results by ranking score and remove blacklisted domains
-        results.sort(comparator);
-        results.removeIf(this::isBlacklisted);
-
-        int numReceivedResults = results.size();
-
-        // pagination is typically 1-indexed, so we need to adjust the start and end indices
-        int indexStart = (pagination.page - 1) * pagination.pageSize;
-        int indexEnd = (pagination.page) * pagination.pageSize;
-
-        results = results.subList(
-                clamp(indexStart, 0, Math.max(0, results.size() - 1)), // from is inclusive, so subtract 1 from size()
-                clamp(indexEnd, 0, results.size()));
+        List<RpcDecoratedResultItem> results =
+            channelPool.call(IndexApiGrpc.IndexApiBlockingStub::query)
+                .async(executor)
+                .runEach(indexRequest)
+                .stream()
+                .map(future -> future.thenApply(iterator -> {
+                    List<RpcDecoratedResultItem> ret = new ArrayList<>(requestedMaxResults);
+                    iterator.forEachRemaining(ret::add);
+                    totalNumResults.addAndGet(ret.size());
+                    return ret;
+                }))
+                .mapMulti((CompletableFuture<List<RpcDecoratedResultItem>> fut, Consumer<List<RpcDecoratedResultItem>> c) -> {
+                    try {
+                        c.accept(fut.join());
+                    } catch (Exception e) {
+                        logger.error("Error while fetching results", e);
+                    }
+                })
+                .flatMap(List::stream)
+                .filter(item -> !isBlacklisted(item))
+                .sorted(comparator)
+                .skip(Math.max(0, (pagination.page - 1) * pagination.pageSize))
+                .limit(pagination.pageSize)
+                .toList();
 
-        return new AggregateQueryResponse(results, pagination.page(), numReceivedResults);
+        return new AggregateQueryResponse(results, pagination.page(), totalNumResults.get());
     }
 
     private boolean isBlacklisted(RpcDecoratedResultItem item) {
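The refactor replaces the imperative accumulate-sort-sublist flow with a single stream pipeline. `Stream.mapMulti` (Java 16+) is the less common operator here: each element may push zero or more values into a consumer, which lets the pipeline silently drop futures that failed instead of special-casing them. A standalone illustration of the operator (values invented):

```java
import java.util.List;
import java.util.stream.Stream;

// Each element emits zero or one parsed value; unparsable elements are
// simply skipped, mirroring how failed futures are dropped above.
class MapMultiDemo {
    public static void main(String[] args) {
        List<Integer> parsed = Stream.of("1", "2", "x", "3")
                .<Integer>mapMulti((s, sink) -> {
                    try {
                        sink.accept(Integer.parseInt(s));
                    } catch (NumberFormatException e) {
                        // drop it
                    }
                })
                .toList();
        System.out.println(parsed); // [1, 2, 3]
    }
}
```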
WarcRecorder.java

@@ -22,6 +22,7 @@ import java.nio.charset.StandardCharsets;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.security.NoSuchAlgorithmException;
+import java.time.Duration;
 import java.time.Instant;
 import java.util.*;
 
@@ -167,6 +168,19 @@ public class WarcRecorder implements AutoCloseable {
         warcRequest.http(); // force HTTP header to be parsed before body is consumed so that caller can use it
         writer.write(warcRequest);
 
+        if (Duration.between(date, Instant.now()).compareTo(Duration.ofSeconds(9)) > 0
+                && inputBuffer.size() < 2048)
+        {
+            // Fast detection and mitigation of crawler traps that respond with slow
+            // small responses, with a high branching factor
+
+            // Note we bail *after* writing the warc records, this will effectively only
+            // prevent link extraction from the document.
+
+            logger.warn("URL {} took too long to fetch and was too small for the effort", requestUri);
+            return new HttpFetchResult.ResultException(new IOException("Likely crawler trap"));
+        }
+
         return new HttpFetchResult.ResultOk(responseUri,
                 response.code(),
                 inputBuffer.headers(),
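The new guard flags a fetch as a likely crawler trap when it was both slow (over nine seconds) and small (under 2048 bytes). The condition in isolation, as a runnable check (class name invented):

```java
import java.time.Duration;
import java.time.Instant;

// Isolated illustration of the crawler-trap condition added above:
// slow (>9 s) AND small (<2048 bytes) responses get flagged.
class TrapCheckDemo {
    static boolean likelyTrap(Instant fetchStart, int bodySize) {
        return Duration.between(fetchStart, Instant.now()).compareTo(Duration.ofSeconds(9)) > 0
                && bodySize < 2048;
    }

    public static void main(String[] args) {
        System.out.println(likelyTrap(Instant.now().minusSeconds(12), 512)); // true
        System.out.println(likelyTrap(Instant.now().minusSeconds(1), 512));  // false
    }
}
```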
SearchParameters.java

@@ -84,18 +84,33 @@ public record SearchParameters(WebsiteUrl url,
     }
 
     public String renderUrl() {
-        String path = String.format("/search?query=%s&profile=%s&js=%s&adtech=%s&recent=%s&searchTitle=%s&newfilter=%s&page=%d",
-                URLEncoder.encode(query, StandardCharsets.UTF_8),
-                URLEncoder.encode(profile.filterId, StandardCharsets.UTF_8),
-                URLEncoder.encode(js.value, StandardCharsets.UTF_8),
-                URLEncoder.encode(adtech.value, StandardCharsets.UTF_8),
-                URLEncoder.encode(recent.value, StandardCharsets.UTF_8),
-                URLEncoder.encode(searchTitle.value, StandardCharsets.UTF_8),
-                Boolean.valueOf(newFilter).toString(),
-                page
-        );
-
-        return path;
+        StringBuilder pathBuilder = new StringBuilder("/search?");
+        pathBuilder.append("query=").append(URLEncoder.encode(query, StandardCharsets.UTF_8));
+
+        if (profile != SearchProfile.NO_FILTER) {
+            pathBuilder.append("&profile=").append(URLEncoder.encode(profile.filterId, StandardCharsets.UTF_8));
+        }
+        if (js != SearchJsParameter.DEFAULT) {
+            pathBuilder.append("&js=").append(URLEncoder.encode(js.value, StandardCharsets.UTF_8));
+        }
+        if (adtech != SearchAdtechParameter.DEFAULT) {
+            pathBuilder.append("&adtech=").append(URLEncoder.encode(adtech.value, StandardCharsets.UTF_8));
+        }
+        if (recent != SearchRecentParameter.DEFAULT) {
+            pathBuilder.append("&recent=").append(URLEncoder.encode(recent.value, StandardCharsets.UTF_8));
+        }
+        if (searchTitle != SearchTitleParameter.DEFAULT) {
+            pathBuilder.append("&searchTitle=").append(URLEncoder.encode(searchTitle.value, StandardCharsets.UTF_8));
+        }
+        if (page != 1) {
+            pathBuilder.append("&page=").append(page);
+        }
+        if (newFilter) {
+            pathBuilder.append("&newfilter=").append(Boolean.valueOf(newFilter).toString());
+        }
+
+        return pathBuilder.toString();
     }
 
     public RpcTemporalBias.Bias temporalBias() {
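The practical effect is that parameters at their defaults are now omitted from rendered URLs instead of always being spelled out. Roughly, for a plain first-page query (the default parameter values shown here are invented for illustration, not taken from the actual enums):

```java
// Illustrative comparison for a default-everything first-page query:
// before: /search?query=linux&profile=no-filter&js=default&adtech=default&recent=default&searchTitle=default&newfilter=false&page=1
// after:  /search?query=linux
```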
SearchCommand.java

@@ -3,27 +3,22 @@ package nu.marginalia.search.command.commands;
 import com.google.inject.Inject;
 import io.jooby.MapModelAndView;
 import io.jooby.ModelAndView;
-import nu.marginalia.search.JteRenderer;
 import nu.marginalia.search.SearchOperator;
 import nu.marginalia.search.command.SearchCommandInterface;
 import nu.marginalia.search.command.SearchParameters;
 import nu.marginalia.search.model.DecoratedSearchResults;
 import nu.marginalia.search.model.NavbarModel;
 
-import java.io.IOException;
 import java.util.Map;
 import java.util.Optional;
 
 public class SearchCommand implements SearchCommandInterface {
     private final SearchOperator searchOperator;
-    private final JteRenderer jteRenderer;
 
     @Inject
-    public SearchCommand(SearchOperator searchOperator,
-                         JteRenderer jteRenderer) throws IOException {
+    public SearchCommand(SearchOperator searchOperator){
         this.searchOperator = searchOperator;
-        this.jteRenderer = jteRenderer;
     }
 
     @Override
SearchSiteInfoService.java

@@ -28,6 +28,7 @@ import org.slf4j.LoggerFactory;
 import java.sql.SQLException;
 import java.util.*;
 import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
 import java.util.concurrent.Future;
 import java.util.concurrent.TimeUnit;
 import java.util.function.Supplier;
@@ -67,8 +68,12 @@ public class SearchSiteInfoService {
         this.screenshotService = screenshotService;
         this.dataSource = dataSource;
         this.searchSiteSubscriptions = searchSiteSubscriptions;
+
+        Thread.ofPlatform().name("Recently Added Domains Model Updater").start(this::modelUpdater);
     }
 
+    private volatile SiteOverviewModel cachedOverviewModel = new SiteOverviewModel(List.of());
+
     @GET
     @Path("/site")
     public ModelAndView<?> handleOverview(@QueryParam String domain) {
@@ -77,23 +82,48 @@ public class SearchSiteInfoService {
             return new MapModelAndView("redirect.jte", Map.of("url", "/site/"+domain));
         }
 
-        List<SiteOverviewModel.DiscoveredDomain> domains = new ArrayList<>();
-
-        try (var conn = dataSource.getConnection();
-             var stmt = conn.prepareStatement("SELECT DOMAIN_NAME, DISCOVER_DATE FROM EC_DOMAIN WHERE NODE_AFFINITY = 0 ORDER BY ID DESC LIMIT 10")) {
-
-            var rs = stmt.executeQuery();
-            while (rs.next()) {
-                domains.add(new SiteOverviewModel.DiscoveredDomain(rs.getString("DOMAIN_NAME"), rs.getString("DISCOVER_DATE")));
-            }
-        }
-        catch (SQLException ex) {
-            throw new RuntimeException();
-        }
-
         return new MapModelAndView("siteinfo/start.jte",
                 Map.of("navbar", NavbarModel.SITEINFO,
-                        "model", new SiteOverviewModel(domains)));
+                        "model", cachedOverviewModel));
+    }
+
+    private void modelUpdater() {
+        while (!Thread.interrupted()) {
+            List<SiteOverviewModel.DiscoveredDomain> domains = new ArrayList<>();
+
+            // This query can be quite expensive, so we can't run it on demand
+            // for every request. Instead, we run it every 15 minutes and cache
+            // the result.
+
+            try (var conn = dataSource.getConnection();
+                 var stmt = conn.prepareStatement("""
+                         SELECT DOMAIN_NAME, DISCOVER_DATE
+                         FROM EC_DOMAIN
+                         WHERE NODE_AFFINITY = 0
+                         ORDER BY ID DESC
+                         LIMIT 10
+                         """))
+            {
+                var rs = stmt.executeQuery();
+                while (rs.next()) {
+                    domains.add(new SiteOverviewModel.DiscoveredDomain(
+                            rs.getString("DOMAIN_NAME"),
+                            rs.getString("DISCOVER_DATE"))
+                    );
+                }
+            } catch (SQLException ex) {
+                logger.warn("Failed to get recently added domains: {}", ex.getMessage());
+            }
+
+            cachedOverviewModel = new SiteOverviewModel(domains);
+
+            try {
+                TimeUnit.MINUTES.sleep(15);
+            } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                break;
+            }
+        }
     }
 
     public record SiteOverviewModel(List<DiscoveredDomain> domains) {
@@ -107,7 +137,7 @@ public class SearchSiteInfoService {
             @PathParam String domainName,
             @QueryParam String view,
             @QueryParam Integer page
-    ) throws SQLException {
+    ) throws SQLException, ExecutionException {
 
         if (null == domainName || domainName.isBlank()) {
             return null;
@@ -193,7 +223,7 @@ public class SearchSiteInfoService {
         );
     }
 
-    private SiteInfoWithContext listInfo(Context context, String domainName) {
+    private SiteInfoWithContext listInfo(Context context, String domainName) throws ExecutionException {
 
         var domain = new EdgeDomain(domainName);
         final int domainId = domainQueries.tryGetDomainId(domain).orElse(-1);
(JTE search form template)

@@ -36,10 +36,11 @@
 
 </div>
 
+@if (filters.showRecentOption.isSet()) <input type="hidden" name="js" value="${filters.removeJsOption.value()}"> @endif
+@if (filters.reduceAdtechOption.isSet()) <input type="hidden" name="adtech" value="${filters.reduceAdtechOption.value()}"> @endif
+@if (filters.searchTitleOption.isSet()) <input type="hidden" name="searchTitle" value="${filters.searchTitleOption.value()}"> @endif
+@if (filters.showRecentOption.isSet()) <input type="hidden" name="recent" value="${filters.showRecentOption.value()}"> @endif
+
-<input type="hidden" name="js" value="${filters.removeJsOption.value()}">
-<input type="hidden" name="adtech" value="${filters.reduceAdtechOption.value()}">
-<input type="hidden" name="searchTitle" value="${filters.searchTitleOption.value()}">
 <input type="hidden" name="profile" value="${profile}">
-<input type="hidden" name="recent" value="${filters.showRecentOption.value()}">
 </form>
@@ -34,12 +34,12 @@
 <div class="max-w-7xl mx-auto flex flex-col space-y-4 fill-w">
     <div class="border dark:border-gray-600 dark:bg-gray-800 bg-white rounded p-2 m-4 ">
         <div class="text-slate-700 dark:text-white text-sm p-4">
-            <div class="fas fa-wrench mr-1 text-margeblue dark:text-slate-200"></div>
-            This is the new design and home of Marginalia Search. Migration to the new domain <pre class="inline text-red-800 dark:text-red-100">marginalia-search.com</pre> is currently <em>in progress</em>,
-            so mind that some things may be a bit broken for a day or two. <a href="https://about.marginalia-search.com/article/redesign/" class="underline text-liteblue dark:text-blue-200">Read more</a>.
+            <div class="fas fa-gift mr-1 text-margeblue dark:text-slate-200"></div>
+            This is the new design and home of Marginalia Search.
+            You can read about what this entails <a href="https://about.marginalia-search.com/article/redesign/" class="underline text-liteblue dark:text-blue-200">here</a>.
             <p class="my-4"></p>
-            If you have any issues or feedback regarding this change, please email
-            <a href="mailto:contact@marginalia-search.com" class="underline text-liteblue dark:text-blue-200">contact@marginalia-search.com</a>.
+            The old version of Marginalia Search remains available at
+            <a href="https://old-search.marginalia.nu/" class="underline text-liteblue dark:text-blue-200">https://old-search.marginalia.nu/</a>.
         </div>
     </div>
     <div class="mx-auto flex flex-col sm:flex-row my-4 sm:space-x-2 space-y-2 sm:space-y-0 w-full md:w-auto px-2">
@@ -1,5 +1,4 @@
 @import nu.marginalia.db.DbDomainQueries
-@import nu.marginalia.model.EdgeDomain
 @import nu.marginalia.search.svc.SearchSiteInfoService
 @import nu.marginalia.search.svc.SearchSiteInfoService.*
 @import nu.marginalia.search.model.UrlDetails
@@ -81,35 +80,6 @@

 @endif


-@if (!siteInfo.siblingDomains().isEmpty())
-    <div class="mx-3 flex place-items-baseline space-x-2 p-2 bg-gray-100 dark:bg-gray-600 rounded">
-        <i class="fas fa-globe"></i>
-        <span>Related Subdomains</span>
-    </div>
-
-    <table class="min-w-full divide-y divide-gray-200 dark:divide-gray-600 mx-4">
-        <thead>
-        <tr class="bg-gray-50 dark:bg-gray-700">
-            <th scope="col" class="px-2 py-2 text-left text-xs font-medium text-gray-500 dark:text-gray-100 uppercase tracking-wider">Domain Name</th>
-        </tr>
-        </thead>
-        <tbody class="bg-white dark:bg-gray-800 divide-y divide-gray-200 dark:divide-gray-600 text-xs">
-        @for (DbDomainQueries.DomainWithNode sibling : siteInfo.siblingDomains())
-            <tr>
-                <td class="px-3 py-6 md:py-3 whitespace-nowrap">
-                    <a class="text-liteblue dark:text-blue-200" href="/site/${sibling.domain().toString()}">${sibling.domain().toString()}</a>
-
-                    @if (!sibling.isIndexed())
-                        <i class="ml-1 fa-regular fa-question-circle text-gray-400 dark:text-gray-600 text-xs" title="Not indexed"></i>
-                    @endif
-                </td>
-            </tr>
-        @endfor
-        </tbody>
-    </table>
-@endif
-

 @if (siteInfo.domainInformation().isUnknownDomain())
     <div class="mx-3 flex place-items-baseline space-x-2 p-2 bg-gray-100 dark:bg-gray-600 rounded">
         <i class="fa-regular fa-circle-question"></i>
@@ -178,6 +148,36 @@
     </form>
 @endif


+@if (!siteInfo.siblingDomains().isEmpty())
+    <div class="mx-3 flex place-items-baseline space-x-2 p-2 bg-gray-100 dark:bg-gray-600 rounded">
+        <i class="fas fa-globe"></i>
+        <span>Related Subdomains</span>
+    </div>
+
+    <table class="min-w-full divide-y divide-gray-200 dark:divide-gray-600 mx-4">
+        <thead>
+        <tr class="bg-gray-50 dark:bg-gray-700">
+            <th scope="col" class="px-2 py-2 text-left text-xs font-medium text-gray-500 dark:text-gray-100 uppercase tracking-wider">Domain Name</th>
+        </tr>
+        </thead>
+        <tbody class="bg-white dark:bg-gray-800 divide-y divide-gray-200 dark:divide-gray-600 text-xs">
+        @for (DbDomainQueries.DomainWithNode sibling : siteInfo.siblingDomains())
+            <tr>
+                <td class="px-3 py-6 md:py-3 whitespace-nowrap">
+                    <a class="text-liteblue dark:text-blue-200" href="/site/${sibling.domain().toString()}">${sibling.domain().toString()}</a>
+
+                    @if (!sibling.isIndexed())
+                        <i class="ml-1 fa-regular fa-question-circle text-gray-400 dark:text-gray-600 text-xs" title="Not indexed"></i>
+                    @endif
+                </td>
+            </tr>
+        @endfor
+        </tbody>
+    </table>
+@endif
+

 @if (siteInfo.isKnown())
     <div class="mx-3 flex place-items-baseline space-x-2 p-2 bg-gray-100 dark:bg-gray-600 rounded">
         <i class="fas fa-chart-simple"></i>
@@ -28,7 +28,7 @@ import java.util.concurrent.TimeUnit;
 public class ScreenshotCaptureToolMain {

     private static final Logger logger = LoggerFactory.getLogger(ScreenshotCaptureToolMain.class);
-
+    private static final String BROWSERLESS_TOKEN = System.getenv("live-capture.browserless-token");
     public static void main(String[] args) {
         DatabaseModule databaseModule = new DatabaseModule(false);
         var ds = databaseModule.provideConnection();

@@ -107,7 +107,7 @@ public class ScreenshotCaptureToolMain {
         );

         var request = HttpRequest.newBuilder()
-                .uri(new URI("http://browserless:3000/screenshot"))
+                .uri(new URI("http://browserless:3000/screenshot?token=" + BROWSERLESS_TOKEN))
                .method("POST", HttpRequest.BodyPublishers.ofString(
                        gson.toJson(requestData)
                ))
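For reference, the change above passes the browserless API token as a `?token=` query parameter on the `/screenshot` endpoint. A minimal self-contained sketch of the same call pattern, where the host name, payload, and the `BROWSERLESS_TOKEN` environment variable name are illustrative assumptions rather than the project's configuration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Standalone sketch of a token-authenticated browserless screenshot request.
// Host, payload, and env var name are assumptions for illustration only.
class ScreenshotSketch {
    public static void main(String[] args) throws Exception {
        String token = System.getenv("BROWSERLESS_TOKEN"); // assumed variable name
        String payload = "{\"url\": \"https://www.marginalia.nu/\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(new URI("http://browserless:3000/screenshot?token=" + token))
                .header("Content-Type", "application/json")
                .method("POST", HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<byte[]> rsp = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofByteArray());
        System.out.println("status: " + rsp.statusCode() + ", bytes: " + rsp.body().length);
    }
}
```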
@@ -72,11 +72,11 @@ services:
     image: "mariadb:lts"
     container_name: "mariadb"
    env_file: "${INSTALL_DIR}/env/mariadb.env"
-    command: ['mysqld', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
+    command: ['mariadbd', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
     ports:
       - "127.0.0.1:3306:3306/tcp"
     healthcheck:
-      test: mysqladmin ping -h 127.0.0.1 -u ${uval} --password=${pval}
+      test: mariadb-admin ping -h 127.0.0.1 -u ${uval} --password=${pval}
       start_period: 5s
       interval: 5s
       timeout: 5s

@@ -103,11 +103,11 @@ services:
     image: "mariadb:lts"
     container_name: "mariadb"
     env_file: "${INSTALL_DIR}/env/mariadb.env"
-    command: ['mysqld', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
+    command: ['mariadbd', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
     ports:
       - "127.0.0.1:3306:3306/tcp"
     healthcheck:
-      test: mysqladmin ping -h 127.0.0.1 -u ${uval} --password=${pval}
+      test: mariadb-admin ping -h 127.0.0.1 -u ${uval} --password=${pval}
       start_period: 5s
       interval: 5s
       timeout: 5s

@@ -129,11 +129,11 @@ services:
     image: "mariadb:lts"
     container_name: "mariadb"
     env_file: "${INSTALL_DIR}/env/mariadb.env"
-    command: ['mysqld', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
+    command: ['mariadbd', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
     ports:
       - "127.0.0.1:3306:3306/tcp"
     healthcheck:
-      test: mysqladmin ping -h 127.0.0.1 -u ${uval} --password=${pval}
+      test: mariadb-admin ping -h 127.0.0.1 -u ${uval} --password=${pval}
       start_period: 5s
       interval: 5s
       timeout: 5s

@@ -3,11 +3,11 @@ services:
     image: "mariadb:lts"
     container_name: "mariadb"
     env_file: "${INSTALL_DIR}/env/mariadb.env"
-    command: ['mysqld', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
+    command: ['mariadbd', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
     ports:
       - "127.0.0.1:3306:3306/tcp"
     healthcheck:
-      test: mysqladmin ping -h 127.0.0.1 -u ${uval} --password=${pval}
+      test: mariadb-admin ping -h 127.0.0.1 -u ${uval} --password=${pval}
       start_period: 5s
       interval: 5s
       timeout: 5s