Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git, synced 2025-10-06 07:32:38 +02:00
Compare commits: deploy-001...deploy-002 (21 commits)
b62f043910
9b2ceaf37c
8019c2ce18
4da3563d8a
48d0a3089a
594df64b20
78eb1417a7
67edc8f90d
5f576b7d0c
0b65164f60
9be477de33
710af4999a
baeb4a46cd
5e2a8e9f27
cc1a5bdf90
7f7b1ffaba
0ea8092350
483d29497e
bae44497fe
0d59202aca
0ca43f0c9c
.github/FUNDING.yml (1 change, vendored)
@@ -1,5 +1,6 @@
 # These are supported funding model platforms
 
+polar: marginalia-search
 github: MarginaliaSearch
 patreon: marginalia_nu
 open_collective: # Replace with a single Open Collective username
ROADMAP.md (52 changes)
@@ -8,20 +8,10 @@ be implemented as well.
 Major goals:
 
 * Reach 1 billion pages indexed
-* Improve technical ability of indexing and search. Although this area has improved a bit, the
-search engine is still not very good at dealing with longer queries.
-
-## Proper Position Index (COMPLETED 2024-09)
-
-The search engine uses a fixed width bit mask to indicate word positions. It has the benefit
-of being very fast to evaluate and works well for what it is, but is inaccurate and has the
-drawback of making support for quoted search terms inaccurate and largely reliant on indexing
-word n-grams known beforehand. This limits the ability to interpret longer queries.
-
-The positions mask should be supplemented or replaced with a more accurate (e.g.) gamma coded positions
-list, as is the civilized way of doing this.
-
-Completed with PR [#99](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/99)
+* Improve technical ability of indexing and search. ~~Although this area has improved a bit, the
+search engine is still not very good at dealing with longer queries.~~ (As of PR [#129](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/129), this has improved significantly. There is still more work to be done.)
 
 ## Hybridize crawler w/ Common Crawl data
 
@@ -37,8 +27,7 @@ Retaining the ability to independently crawl the web is still strongly desirable
 
 ## Safe Search
 
-The search engine has a bit of a problem showing spicy content mixed in with the results. It would be desirable
-to have a way to filter this out. It's likely something like a URL blacklist (e.g. [UT1](https://dsi.ut-capitole.fr/blacklists/index_en.php) )
+The search engine has a bit of a problem showing spicy content mixed in with the results. It would be desirable to have a way to filter this out. It's likely something like a URL blacklist (e.g. [UT1](https://dsi.ut-capitole.fr/blacklists/index_en.php) )
 combined with naive bayesian filter would go a long way, or something more sophisticated...?
 
 ## Web Design Overhaul
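To make the blacklist-plus-classifier idea in the Safe Search item concrete, here is a minimal sketch of a naive Bayes text filter sitting behind a URL blacklist check. The class name, training interface, decision threshold, and smoothing are all invented for illustration; this is not code from this repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: URL blacklist first, then a naive Bayes log-likelihood-ratio filter. */
class SafeSearchFilterSketch {
    private final Map<String, Integer> spicyCounts = new HashMap<>();
    private final Map<String, Integer> cleanCounts = new HashMap<>();
    private int spicyTotal = 0, cleanTotal = 0;

    void train(String text, boolean spicy) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (spicy) { spicyCounts.merge(word, 1, Integer::sum); spicyTotal++; }
            else       { cleanCounts.merge(word, 1, Integer::sum); cleanTotal++; }
        }
    }

    /** Sum of per-word log ratios with add-one smoothing; positive means "spicy".
     *  (A real implementation would smooth by vocabulary size and add a prior.) */
    double score(String text) {
        double llr = 0;
        for (String word : text.toLowerCase().split("\\W+")) {
            double pSpicy = (spicyCounts.getOrDefault(word, 0) + 1.0) / (spicyTotal + 2.0);
            double pClean = (cleanCounts.getOrDefault(word, 0) + 1.0) / (cleanTotal + 2.0);
            llr += Math.log(pSpicy) - Math.log(pClean);
        }
        return llr;
    }

    boolean shouldFilter(String domain, String text, Set<String> blacklist) {
        // Blacklist catches known-bad domains cheaply; the classifier mops up the rest
        return blacklist.contains(domain) || score(text) > 0;
    }
}
```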
@@ -55,15 +44,6 @@ associated with each language added, at least a models file or two, as well as s
 
 It would be very helpful to find a speaker of a large language other than English to help in the fine tuning.
 
-## Finalize RSS support (COMPLETED 2024-11)
-
-Marginalia has experimental RSS preview support for a few domains. This works well and
-it should be extended to all domains. It would also be interesting to offer search of the
-RSS data itself, or use the RSS set to feed a special live index that updates faster than the
-main dataset.
-
-Completed with PR [#122](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/122) and PR [#125](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/125)
-
 ## Support for binary formats like PDF
 
 The crawler needs to be modified to retain them, and the conversion logic needs to parse them.
@@ -80,5 +60,27 @@ This looks like a good idea that wouldn't just help clean up the search filters
 website, but might be cheap enough we might go as far as to offer a number of ad-hoc custom search
 filter for any API consumer.
 
-I've talked to the stract dev and he does not think it's a good idea to mimic their optics language,
-which is quite ad-hoc, but instead to work together to find some new common description language for this.
+I've talked to the stract dev and he does not think it's a good idea to mimic their optics language, which is quite ad-hoc, but instead to work together to find some new common description language for this.
+
+# Completed
+
+## Proper Position Index (COMPLETED 2024-09)
+
+The search engine uses a fixed width bit mask to indicate word positions. It has the benefit
+of being very fast to evaluate and works well for what it is, but is inaccurate and has the
+drawback of making support for quoted search terms inaccurate and largely reliant on indexing
+word n-grams known beforehand. This limits the ability to interpret longer queries.
+
+The positions mask should be supplemented or replaced with a more accurate (e.g.) gamma coded positions
+list, as is the civilized way of doing this.
+
+Completed with PR [#99](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/99)
+
+## Finalize RSS support (COMPLETED 2024-11)
+
+Marginalia has experimental RSS preview support for a few domains. This works well and
+it should be extended to all domains. It would also be interesting to offer search of the
+RSS data itself, or use the RSS set to feed a special live index that updates faster than the
+main dataset.
+
+Completed with PR [#122](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/122) and PR [#125](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/125)
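For context on the completed position-index item: an Elias gamma code stores a positive integer n as floor(log2 n) zero bits followed by n's binary digits, so small position deltas cost only a few bits. The sketch below is illustrative only and is not the repository's implementation; it assumes sorted, 1-based position lists encoded as deltas.

```java
import java.util.BitSet;

/** Sketch of Elias gamma coding for a sorted word-positions list, stored as deltas. */
class GammaPositionsSketch {
    /** Encode one positive integer: floor(log2 n) zero bits, then n's binary digits. */
    static int writeGamma(BitSet out, int bitPos, int n) {
        if (n < 1) throw new IllegalArgumentException("gamma codes positive integers only");
        int width = 32 - Integer.numberOfLeadingZeros(n); // number of binary digits in n
        bitPos += width - 1;                              // leading zeros: bits left clear
        for (int i = width - 1; i >= 0; i--) {            // then n, MSB first (starts with 1)
            if ((n & (1 << i)) != 0) out.set(bitPos);
            bitPos++;
        }
        return bitPos;
    }

    /** Encode strictly increasing positions as gamma-coded deltas, e.g. [3, 7, 20] -> 3, 4, 13. */
    static BitSet encodePositions(int[] positions) {
        BitSet out = new BitSet();
        int bitPos = 0, prev = 0;
        for (int p : positions) {
            bitPos = writeGamma(out, bitPos, p - prev);
            prev = p;
        }
        return out;
    }
}
```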
@@ -7,8 +7,6 @@ import nu.marginalia.service.discovery.property.PartitionTraits;
 import nu.marginalia.service.discovery.property.ServiceEndpoint;
 import nu.marginalia.service.discovery.property.ServiceKey;
 import nu.marginalia.service.discovery.property.ServicePartition;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 
 import java.util.List;
 import java.util.concurrent.CompletableFuture;
@@ -24,7 +22,7 @@ import java.util.function.Function;
 public class GrpcMultiNodeChannelPool<STUB> {
     private final ConcurrentHashMap<Integer, GrpcSingleNodeChannelPool<STUB>> pools =
             new ConcurrentHashMap<>();
-    private static final Logger logger = LoggerFactory.getLogger(GrpcMultiNodeChannelPool.class);
     private final ServiceRegistryIf serviceRegistryIf;
     private final ServiceKey<? extends PartitionTraits.Multicast> serviceKey;
     private final Function<ServiceEndpoint.InstanceAddress, ManagedChannel> channelConstructor;
@@ -10,6 +10,8 @@ import nu.marginalia.service.discovery.property.ServiceKey;
 import org.jetbrains.annotations.NotNull;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import org.slf4j.Marker;
+import org.slf4j.MarkerFactory;
 
 import java.time.Duration;
 import java.util.*;
@@ -26,13 +28,13 @@ import java.util.function.Function;
 public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
     private final Map<InstanceAddress, ConnectionHolder> channels = new ConcurrentHashMap<>();
 
+    private final Marker grpcMarker = MarkerFactory.getMarker("GRPC");
     private static final Logger logger = LoggerFactory.getLogger(GrpcSingleNodeChannelPool.class);
 
     private final ServiceRegistryIf serviceRegistryIf;
     private final Function<InstanceAddress, ManagedChannel> channelConstructor;
     private final Function<ManagedChannel, STUB> stubConstructor;
 
 
     public GrpcSingleNodeChannelPool(ServiceRegistryIf serviceRegistryIf,
                                      ServiceKey<? extends PartitionTraits.Unicast> serviceKey,
                                      Function<InstanceAddress, ManagedChannel> channelConstructor,
@@ -48,8 +50,6 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
         serviceRegistryIf.registerMonitor(this);
 
         onChange();
-
-        awaitChannel(Duration.ofSeconds(5));
     }
 
 
@@ -62,10 +62,10 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
         for (var route : Sets.symmetricDifference(oldRoutes, newRoutes)) {
             ConnectionHolder oldChannel;
             if (newRoutes.contains(route)) {
-                logger.info("Adding route {}", route);
+                logger.info(grpcMarker, "Adding route {} => {}", serviceKey, route);
                 oldChannel = channels.put(route, new ConnectionHolder(route));
             } else {
-                logger.info("Expelling route {}", route);
+                logger.info(grpcMarker, "Expelling route {} => {}", serviceKey, route);
                 oldChannel = channels.remove(route);
             }
             if (oldChannel != null) {
@@ -103,7 +103,7 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
         }
 
         try {
-            logger.info("Creating channel for {}:{}", serviceKey, address);
+            logger.info(grpcMarker, "Creating channel for {} => {}", serviceKey, address);
             value = channelConstructor.apply(address);
             if (channel.compareAndSet(null, value)) {
                 return value;
@@ -114,7 +114,7 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
             }
         }
         catch (Exception e) {
-            logger.error("Failed to get channel for " + address, e);
+            logger.error(grpcMarker, "Failed to get channel for " + address, e);
             return null;
         }
     }
@@ -206,7 +206,7 @@ public class GrpcSingleNodeChannelPool<STUB> extends ServiceChangeMonitor {
         }
 
         for (var e : exceptions) {
-            logger.error("Failed to call service {}", serviceKey, e);
+            logger.error(grpcMarker, "Failed to call service {}", serviceKey, e);
        }
 
        throw new ServiceNotAvailableException(serviceKey);
@@ -4,6 +4,11 @@ import nu.marginalia.service.discovery.property.ServiceKey;
 
 public class ServiceNotAvailableException extends RuntimeException {
     public ServiceNotAvailableException(ServiceKey<?> key) {
-        super("Service " + key + " not available");
+        super(key.toString());
+    }
+
+    @Override
+    public StackTraceElement[] getStackTrace() { // Suppress stack trace
+        return new StackTraceElement[0];
     }
 }
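A side note on the change above: overriding getStackTrace() hides the trace from log output, but the JVM still captures the stack when the exception is constructed. A cheaper idiom, shown here as a sketch rather than the project's actual approach, is the four-argument RuntimeException constructor with writableStackTrace = false:

```java
public class StacklessException extends RuntimeException {
    public StacklessException(String message) {
        // enableSuppression = false, writableStackTrace = false:
        // no stack trace is captured at construction time at all
        super(message, null, false, false);
    }
}
```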
@@ -48,5 +48,10 @@ public record ServiceEndpoint(String host, int port) {
         public int port() {
             return endpoint.port();
         }
+
+        @Override
+        public String toString() {
+            return endpoint().host() + ":" + endpoint.port() + " [" + instance + "]";
+        }
     }
 }
@@ -48,6 +48,19 @@ public sealed interface ServiceKey<P extends ServicePartition> {
         {
             throw new UnsupportedOperationException();
         }
+
+        @Override
+        public String toString() {
+            final String shortName;
+
+            int periodIndex = name.lastIndexOf('.');
+
+            if (periodIndex >= 0) shortName = name.substring(periodIndex+1);
+            else shortName = name;
+
+            return "rest:" + shortName;
+        }
     }
     record Grpc<P extends ServicePartition>(String name, P partition) implements ServiceKey<P> {
         public String baseName() {
@@ -64,6 +77,18 @@ public sealed interface ServiceKey<P extends ServicePartition> {
         {
             return new Grpc<>(name, partition);
         }
+
+        @Override
+        public String toString() {
+            final String shortName;
+
+            int periodIndex = name.lastIndexOf('.');
+
+            if (periodIndex >= 0) shortName = name.substring(periodIndex+1);
+            else shortName = name;
+
+            return "grpc:" + shortName + "[" + partition.identifier() + "]";
+        }
     }
 
 }
@@ -101,6 +101,7 @@ message RpcSimilarDomain {
     bool active = 6;
     bool screenshot = 7;
     LINK_TYPE linkType = 8;
+    bool feed = 9;
 
     enum LINK_TYPE {
         BACKWARD = 0;
@@ -9,6 +9,7 @@ import gnu.trove.map.hash.TIntIntHashMap;
 import gnu.trove.set.TIntSet;
 import gnu.trove.set.hash.TIntHashSet;
 import it.unimi.dsi.fastutil.ints.Int2DoubleArrayMap;
+import nu.marginalia.WmsaHome;
 import nu.marginalia.api.domains.RpcSimilarDomain;
 import nu.marginalia.api.domains.model.SimilarDomain;
 import nu.marginalia.api.linkgraph.AggregateLinkGraphClient;
@@ -17,10 +18,14 @@ import org.roaringbitmap.RoaringBitmap;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import java.nio.file.Path;
+import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.util.ArrayList;
+import java.util.HashSet;
 import java.util.List;
+import java.util.Set;
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.TimeUnit;
@@ -32,12 +37,13 @@ public class SimilarDomainsService {
     private final HikariDataSource dataSource;
     private final AggregateLinkGraphClient linkGraphClient;
 
-    private volatile TIntIntHashMap domainIdToIdx = new TIntIntHashMap(100_000);
+    private final TIntIntHashMap domainIdToIdx = new TIntIntHashMap(100_000);
     private volatile int[] domainIdxToId;
 
     public volatile Int2DoubleArrayMap[] relatedDomains;
     public volatile TIntList[] domainNeighbors = null;
     public volatile RoaringBitmap screenshotDomains = null;
+    public volatile RoaringBitmap feedDomains = null;
     public volatile RoaringBitmap activeDomains = null;
     public volatile RoaringBitmap indexedDomains = null;
     public volatile TIntDoubleHashMap domainRanks = null;
@@ -82,6 +88,7 @@ public class SimilarDomainsService {
             domainNames = new String[domainIdToIdx.size()];
             domainNeighbors = new TIntList[domainIdToIdx.size()];
             screenshotDomains = new RoaringBitmap();
+            feedDomains = new RoaringBitmap();
             activeDomains = new RoaringBitmap();
             indexedDomains = new RoaringBitmap();
             relatedDomains = new Int2DoubleArrayMap[domainIdToIdx.size()];
@@ -145,10 +152,12 @@ public class SimilarDomainsService {
                     activeDomains.add(idx);
             }
 
-            updateScreenshotInfo();
-
             logger.info("Loaded {} domains", domainRanks.size());
             isReady = true;
+
+            // We can defer these as they only populate a roaringbitmap, and will degrade gracefully when not complete
+            updateScreenshotInfo();
+            updateFeedInfo();
         }
     }
 catch (SQLException throwables) {
@@ -156,6 +165,42 @@ public class SimilarDomainsService {
         }
     }
 
+    private void updateFeedInfo() {
+        Set<String> feedsDomainNames = new HashSet<>(500_000);
+        Path readerDbPath = WmsaHome.getDataPath().resolve("rss-feeds.db").toAbsolutePath();
+        String dbUrl = "jdbc:sqlite:" + readerDbPath;
+
+        logger.info("Opening feed db at " + dbUrl);
+
+        try (var conn = DriverManager.getConnection(dbUrl);
+             var stmt = conn.createStatement()) {
+            var rs = stmt.executeQuery("""
+                select
+                    json_extract(feed, '$.domain') as domain
+                from feed
+                where json_array_length(feed, '$.items') > 0
+                """);
+            while (rs.next()) {
+                feedsDomainNames.add(rs.getString(1));
+            }
+        }
+        catch (SQLException ex) {
+            logger.error("Failed to read RSS feed items", ex);
+        }
+
+        for (int idx = 0; idx < domainNames.length; idx++) {
+            String name = domainNames[idx];
+            if (name == null) {
+                continue;
+            }
+
+            if (feedsDomainNames.contains(name)) {
+                feedDomains.add(idx);
+            }
+        }
+    }
+
     private void updateScreenshotInfo() {
         try (var connection = dataSource.getConnection()) {
             try (var stmt = connection.createStatement()) {
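A note on the query in updateFeedInfo(): from json_extract(feed, '$.domain') and json_array_length(feed, '$.items') one can infer, though the schema itself is not shown in this diff, that each row of the feed table holds a JSON object of roughly the form {"domain": "example.com", "items": [...]}, and a domain is flagged in the feedDomains bitmap only when its feed contains at least one item.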
@@ -254,6 +299,7 @@ public class SimilarDomainsService {
                             .setIndexed(indexedDomains.contains(idx))
                             .setActive(activeDomains.contains(idx))
                             .setScreenshot(screenshotDomains.contains(idx))
+                            .setFeed(feedDomains.contains(idx))
                             .setLinkType(RpcSimilarDomain.LINK_TYPE.valueOf(linkType.name()))
                             .build());
@@ -369,6 +415,7 @@ public class SimilarDomainsService {
                             .setIndexed(indexedDomains.contains(idx))
                             .setActive(activeDomains.contains(idx))
                             .setScreenshot(screenshotDomains.contains(idx))
+                            .setFeed(feedDomains.contains(idx))
                             .setLinkType(RpcSimilarDomain.LINK_TYPE.valueOf(linkType.name()))
                             .build());
@@ -5,6 +5,7 @@ import com.google.inject.Singleton;
 import nu.marginalia.api.livecapture.LiveCaptureApiGrpc.LiveCaptureApiBlockingStub;
 import nu.marginalia.service.client.GrpcChannelPoolFactory;
 import nu.marginalia.service.client.GrpcSingleNodeChannelPool;
+import nu.marginalia.service.client.ServiceNotAvailableException;
 import nu.marginalia.service.discovery.property.ServiceKey;
 import nu.marginalia.service.discovery.property.ServicePartition;
 import org.slf4j.Logger;
@@ -29,6 +30,9 @@ public class LiveCaptureClient {
             channelPool.call(LiveCaptureApiBlockingStub::requestScreengrab)
                     .run(RpcDomainId.newBuilder().setDomainId(domainId).build());
         }
+        catch (ServiceNotAvailableException e) {
+            logger.info("requestScreengrab() failed since the service is not available");
+        }
         catch (Exception e) {
             logger.error("API Exception", e);
         }
@@ -402,6 +402,7 @@ public class FeedFetcherService {
             "&ndash;", "-",
             "&rsquo;", "'",
             "&lsquo;", "'",
+            "&quot;", "\"",
             "&nbsp;", ""
     );
 
@@ -10,7 +10,6 @@ public class TestXmlSanitization {
         Assertions.assertEquals("&", FeedFetcherService.sanitizeEntities("&amp;"));
         Assertions.assertEquals("<", FeedFetcherService.sanitizeEntities("&lt;"));
         Assertions.assertEquals(">", FeedFetcherService.sanitizeEntities("&gt;"));
-        Assertions.assertEquals("&quot;", FeedFetcherService.sanitizeEntities("&quot;"));
         Assertions.assertEquals("'", FeedFetcherService.sanitizeEntities("&apos;"));
     }
 
@@ -23,4 +22,9 @@ public class TestXmlSanitization {
     public void testTranslatedHtmlEntity() {
         Assertions.assertEquals("Foo -- Bar", FeedFetcherService.sanitizeEntities("Foo &mdash; Bar"));
     }
+
+    @Test
+    public void testTranslatedHtmlEntityQuot() {
+        Assertions.assertEquals("\"Bob\"", FeedFetcherService.sanitizeEntities("&quot;Bob&quot;"));
+    }
 }
@@ -9,10 +9,9 @@ import nu.marginalia.service.client.GrpcChannelPoolFactory;
 import nu.marginalia.service.client.GrpcSingleNodeChannelPool;
 import nu.marginalia.service.discovery.property.ServiceKey;
 import nu.marginalia.service.discovery.property.ServicePartition;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 
 import javax.annotation.CheckReturnValue;
+import java.time.Duration;
 
 @Singleton
 public class QueryClient {
@@ -24,13 +23,14 @@ public class QueryClient {
 
     private final GrpcSingleNodeChannelPool<QueryApiGrpc.QueryApiBlockingStub> queryApiPool;
 
-    private final Logger logger = LoggerFactory.getLogger(getClass());
-
     @Inject
-    public QueryClient(GrpcChannelPoolFactory channelPoolFactory) {
+    public QueryClient(GrpcChannelPoolFactory channelPoolFactory) throws InterruptedException {
         this.queryApiPool = channelPoolFactory.createSingle(
                 ServiceKey.forGrpcApi(QueryApiGrpc.class, ServicePartition.any()),
                 QueryApiGrpc::newBlockingStub);
+
+        // Hold up initialization until we have a downstream connection
+        this.queryApiPool.awaitChannel(Duration.ofSeconds(5));
     }
 
     @CheckReturnValue
@@ -25,6 +25,7 @@ public class QueryExpansion {
             this::joinDashes,
             this::splitWordNum,
             this::joinTerms,
+            this::categoryKeywords,
             this::ngramAll
     );
 
@@ -98,6 +99,24 @@ public class QueryExpansion {
         }
     }
 
+    // Category keyword substitution, e.g. guitar wiki -> guitar generator:wiki
+    public void categoryKeywords(QWordGraph graph) {
+
+        for (var qw : graph) {
+
+            // Ensure we only perform the substitution on the last word in the query
+            if (!graph.getNextOriginal(qw).getFirst().isEnd()) {
+                continue;
+            }
+
+            switch (qw.word()) {
+                case "recipe", "recipes" -> graph.addVariant(qw, "category:food");
+                case "forum" -> graph.addVariant(qw, "generator:forum");
+                case "wiki" -> graph.addVariant(qw, "generator:wiki");
+            }
+        }
+    }
+
     // Turn 'lawn chair' into 'lawnchair'
     public void joinTerms(QWordGraph graph) {
         QWord prev = null;
@@ -155,16 +155,25 @@ public class QueryParser {
 
         // Remove trailing punctuation
         int lastChar = str.charAt(str.length() - 1);
-        if (":.,!?$'".indexOf(lastChar) >= 0)
-            entity.replace(new QueryToken.LiteralTerm(str.substring(0, str.length() - 1), lt.displayStr()));
+        if (":.,!?$'".indexOf(lastChar) >= 0) {
+            str = str.substring(0, str.length() - 1);
+            entity.replace(new QueryToken.LiteralTerm(str, lt.displayStr()));
+        }
 
         // Remove term elements that aren't indexed by the search engine
-        if (str.endsWith("'s"))
-            entity.replace(new QueryToken.LiteralTerm(str.substring(0, str.length() - 2), lt.displayStr()));
-        if (str.endsWith("()"))
-            entity.replace(new QueryToken.LiteralTerm(str.substring(0, str.length() - 2), lt.displayStr()));
-        if (str.startsWith("$"))
-            entity.replace(new QueryToken.LiteralTerm(str.substring(1), lt.displayStr()));
+        if (str.endsWith("'s")) {
+            str = str.substring(0, str.length() - 2);
+            entity.replace(new QueryToken.LiteralTerm(str, lt.displayStr()));
+        }
+        if (str.endsWith("()")) {
+            str = str.substring(0, str.length() - 2);
+            entity.replace(new QueryToken.LiteralTerm(str, lt.displayStr()));
+        }
+
+        while (str.startsWith("$") || str.startsWith("_")) {
+            str = str.substring(1);
+            entity.replace(new QueryToken.LiteralTerm(str, lt.displayStr()));
+        }
 
         if (entity.isBlank()) {
             entity.remove();
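The restructuring above also fixes a composition bug: the old code never reassigned str, so each entity.replace(...) sliced the original token and only the last matching rule survived. A hypothetical illustration (class name and sample token invented):

```java
public class TokenNormalizationSketch {
    public static void main(String[] args) {
        String str = "$foo()";

        // Old pattern: every rule sliced the ORIGINAL string, so rules did not compose.
        String afterParens = str.endsWith("()") ? str.substring(0, str.length() - 2) : str; // "$foo"
        String afterDollar = str.startsWith("$") ? str.substring(1) : str;                  // "foo()"
        // Whichever replace() ran last won; the token was never fully normalized.

        // New pattern: reassign str so each rule sees the previous rule's output.
        if (str.endsWith("()")) str = str.substring(0, str.length() - 2);                   // "$foo"
        while (str.startsWith("$") || str.startsWith("_")) str = str.substring(1);          // "foo"

        System.out.println(afterParens + " / " + afterDollar + " / " + str);                // $foo / foo() / foo
    }
}
```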
@@ -1,165 +0,0 @@
-package nu.marginalia.util.language;
-
-import com.google.inject.Inject;
-import nu.marginalia.term_frequency_dict.TermFrequencyDict;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.BufferedReader;
-import java.io.InputStreamReader;
-import java.util.*;
-import java.util.regex.Pattern;
-import java.util.stream.Collectors;
-
-public class EnglishDictionary {
-    private final Set<String> englishWords = new HashSet<>();
-    private final TermFrequencyDict tfDict;
-    private final Logger logger = LoggerFactory.getLogger(getClass());
-
-    @Inject
-    public EnglishDictionary(TermFrequencyDict tfDict) {
-        this.tfDict = tfDict;
-        try (var resource = Objects.requireNonNull(ClassLoader.getSystemResourceAsStream("dictionary/en-words"),
-                "Could not load word frequency table");
-             var br = new BufferedReader(new InputStreamReader(resource))
-        ) {
-            for (;;) {
-                String s = br.readLine();
-                if (s == null) {
-                    break;
-                }
-                englishWords.add(s.toLowerCase());
-            }
-        }
-        catch (Exception ex) {
-            throw new RuntimeException(ex);
-        }
-    }
-
-    public boolean isWord(String word) {
-        return englishWords.contains(word);
-    }
-
-    private static final Pattern ingPattern = Pattern.compile(".*(\\w)\\1ing$");
-
-    public Collection<String> getWordVariants(String s) {
-        var variants = findWordVariants(s);
-
-        var ret = variants.stream()
-                .filter(var -> tfDict.getTermFreq(var) > 100)
-                .collect(Collectors.toList());
-
-        if (s.equals("recipe") || s.equals("recipes")) {
-            ret.add("category:food");
-        }
-
-        return ret;
-    }
-
-
-    public Collection<String> findWordVariants(String s) {
-        int sl = s.length();
-
-        if (sl < 2) {
-            return Collections.emptyList();
-        }
-        if (s.endsWith("s")) {
-            String a = s.substring(0, sl-1);
-            String b = s + "es";
-            if (isWord(a) && isWord(b)) {
-                return List.of(a, b);
-            }
-            else if (isWord(a)) {
-                return List.of(a);
-            }
-            else if (isWord(b)) {
-                return List.of(b);
-            }
-        }
-        if (s.endsWith("sm")) {
-            String a = s.substring(0, sl-1)+"t";
-            String b = s.substring(0, sl-1)+"ts";
-            if (isWord(a) && isWord(b)) {
-                return List.of(a, b);
-            }
-            else if (isWord(a)) {
-                return List.of(a);
-            }
-            else if (isWord(b)) {
-                return List.of(b);
-            }
-        }
-        if (s.endsWith("st")) {
-            String a = s.substring(0, sl-1)+"m";
-            String b = s + "s";
-            if (isWord(a) && isWord(b)) {
-                return List.of(a, b);
-            }
-            else if (isWord(a)) {
-                return List.of(a);
-            }
-            else if (isWord(b)) {
-                return List.of(b);
-            }
-        }
-        else if (ingPattern.matcher(s).matches() && sl > 4) { // humming, clapping
-            var a = s.substring(0, sl-4);
-            var b = s.substring(0, sl-3) + "ed";
-
-            if (isWord(a) && isWord(b)) {
-                return List.of(a, b);
-            }
-            else if (isWord(a)) {
-                return List.of(a);
-            }
-            else if (isWord(b)) {
-                return List.of(b);
-            }
-        }
-        else {
-            String a = s + "s";
-            String b = ingForm(s);
-            String c = s + "ed";
-
-            if (isWord(a) && isWord(b) && isWord(c)) {
-                return List.of(a, b, c);
-            }
-            else if (isWord(a) && isWord(b)) {
-                return List.of(a, b);
-            }
-            else if (isWord(b) && isWord(c)) {
-                return List.of(b, c);
-            }
-            else if (isWord(a) && isWord(c)) {
-                return List.of(a, c);
-            }
-            else if (isWord(a)) {
-                return List.of(a);
-            }
-            else if (isWord(b)) {
-                return List.of(b);
-            }
-            else if (isWord(c)) {
-                return List.of(c);
-            }
-        }
-
-        return Collections.emptyList();
-    }
-
-    public String ingForm(String s) {
-        if (s.endsWith("t") && !s.endsWith("tt")) {
-            return s + "ting";
-        }
-        if (s.endsWith("n") && !s.endsWith("nn")) {
-            return s + "ning";
-        }
-        if (s.endsWith("m") && !s.endsWith("mm")) {
-            return s + "ming";
-        }
-        if (s.endsWith("r") && !s.endsWith("rr")) {
-            return s + "ring";
-        }
-        return s + "ing";
-    }
-}
@@ -0,0 +1,32 @@
+package nu.marginalia.functions.searchquery.query_parser;
+
+import nu.marginalia.functions.searchquery.query_parser.token.QueryToken;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.Test;
+
+import java.util.List;
+
+class QueryParserTest {
+
+    @Test
+    // https://github.com/MarginaliaSearch/MarginaliaSearch/issues/140
+    void parse__builtin_ffs() {
+        QueryParser parser = new QueryParser();
+        var tokens = parser.parse("__builtin_ffs");
+        Assertions.assertEquals(List.of(new QueryToken.LiteralTerm("builtin_ffs", "__builtin_ffs")), tokens);
+    }
+
+    @Test
+    void trailingParens() {
+        QueryParser parser = new QueryParser();
+        var tokens = parser.parse("strcpy()");
+        Assertions.assertEquals(List.of(new QueryToken.LiteralTerm("strcpy", "strcpy()")), tokens);
+    }
+
+    @Test
+    void trailingQuote() {
+        QueryParser parser = new QueryParser();
+        var tokens = parser.parse("bob's");
+        Assertions.assertEquals(List.of(new QueryToken.LiteralTerm("bob", "bob's")), tokens);
+    }
+}
@@ -12,6 +12,7 @@ import nu.marginalia.index.query.limit.SpecificationLimit;
 import nu.marginalia.index.query.limit.SpecificationLimitType;
 import nu.marginalia.segmentation.NgramLexicon;
 import nu.marginalia.term_frequency_dict.TermFrequencyDict;
+import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;
 
@@ -207,6 +208,17 @@ public class QueryFactoryTest {
         System.out.println(subquery);
     }
 
+    @Test
+    public void testExpansion9() {
+        var subquery = parseAndGetSpecs("pie recipe");
+
+        Assertions.assertTrue(subquery.query.compiledQuery.contains(" category:food "));
+
+        subquery = parseAndGetSpecs("recipe pie");
+
+        Assertions.assertFalse(subquery.query.compiledQuery.contains(" category:food "));
+    }
+
     @Test
     public void testParsing() {
         var subquery = parseAndGetSpecs("strlen()");
@@ -27,7 +27,7 @@ public class SentenceSegmentSplitter {
         else {
             // If we flatten unicode, we do this...
             // FIXME: This can almost definitely be cleaned up and simplified.
-            wordBreakPattern = Pattern.compile("([^/_#@.a-zA-Z'+\\-0-9\\u00C0-\\u00D6\\u00D8-\\u00f6\\u00f8-\\u00ff]+)|[|]|(\\.(\\s+|$))");
+            wordBreakPattern = Pattern.compile("([^/<>$:_#@.a-zA-Z'+\\-0-9\\u00C0-\\u00D6\\u00D8-\\u00f6\\u00f8-\\u00ff]+)|[|]|(\\.(\\s+|$))");
         }
     }
@@ -28,6 +28,20 @@ class SentenceExtractorTest {
         System.out.println(dld);
     }
 
+    @Test
+    void testCplusplus() {
+        var dld = sentenceExtractor.extractSentence("std::vector", EnumSet.noneOf(HtmlTag.class));
+        assertEquals(1, dld.length());
+        assertEquals("std::vector", dld.wordsLowerCase[0]);
+    }
+
+    @Test
+    void testPHP() {
+        var dld = sentenceExtractor.extractSentence("$_GET", EnumSet.noneOf(HtmlTag.class));
+        assertEquals(1, dld.length());
+        assertEquals("$_get", dld.wordsLowerCase[0]);
+    }
+
     @Test
     void testPolishArtist() {
         var dld = sentenceExtractor.extractSentence("Uklański", EnumSet.noneOf(HtmlTag.class));
@@ -20,34 +20,11 @@ public record ContentTags(String etag, String lastMod) {
     public void paint(Request.Builder getBuilder) {
 
         if (etag != null) {
-            getBuilder.addHeader("If-None-Match", ifNoneMatch());
+            getBuilder.addHeader("If-None-Match", etag);
         }
 
         if (lastMod != null) {
-            getBuilder.addHeader("If-Modified-Since", ifModifiedSince());
+            getBuilder.addHeader("If-Modified-Since", lastMod);
         }
     }
-
-    private String ifNoneMatch() {
-        // Remove the W/ prefix if it exists
-
-        //'W/' (case-sensitive) indicates that a weak validator is used. Weak etags are
-        // easy to generate, but are far less useful for comparisons. Strong validators
-        // are ideal for comparisons but can be very difficult to generate efficiently.
-        // Weak ETag values of two representations of the same resources might be semantically
-        // equivalent, but not byte-for-byte identical. This means weak etags prevent caching
-        // when byte range requests are used, but strong etags mean range requests can
-        // still be cached.
-        // - https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag
-
-        if (null != etag && etag.startsWith("W/")) {
-            return etag.substring(2);
-        } else {
-            return etag;
-        }
-    }
-
-    private String ifModifiedSince() {
-        return lastMod;
-    }
 }
@@ -34,8 +34,9 @@ import java.util.*;
 public class WarcRecorder implements AutoCloseable {
     /** Maximum time we'll wait on a single request */
     static final int MAX_TIME = 30_000;
-    /** Maximum (decompressed) size we'll fetch */
-    static final int MAX_SIZE = 1024 * 1024 * 10;
+
+    /** Maximum (decompressed) size we'll save */
+    static final int MAX_SIZE = Integer.getInteger("crawler.maxFetchSize", 10 * 1024 * 1024);
 
     private final WarcWriter writer;
     private final Path warcFile;
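Integer.getInteger(String, int) reads a JVM system property and falls back to the default when it is unset, so the cap above becomes tunable at startup without a rebuild. A minimal illustration, assuming only the property name taken from the diff:

```java
public class MaxSizeDemo {
    // Reads -Dcrawler.maxFetchSize=<bytes> if set, else defaults to 10 MiB
    static final int MAX_SIZE = Integer.getInteger("crawler.maxFetchSize", 10 * 1024 * 1024);

    public static void main(String[] args) {
        // e.g. java -Dcrawler.maxFetchSize=52428800 MaxSizeDemo  -> prints 52428800
        System.out.println(MAX_SIZE);
    }
}
```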
@@ -1,11 +1,15 @@
 package nu.marginalia.io;
 
+import nu.marginalia.model.crawldata.CrawledDocument;
+import nu.marginalia.model.crawldata.CrawledDomain;
 import nu.marginalia.model.crawldata.SerializableCrawlData;
 import org.jetbrains.annotations.Nullable;
 
 import java.io.IOException;
 import java.nio.file.Path;
+import java.util.ArrayList;
 import java.util.Iterator;
+import java.util.List;
 
 /** Closable iterator exceptional over serialized crawl data
  * The data may appear in any order, and the iterator must be closed.
@@ -26,6 +30,37 @@ public interface SerializableCrawlDataStream extends AutoCloseable {
     @Nullable
     default Path path() { return null; }
 
+    /** For tests */
+    default List<SerializableCrawlData> asList() throws IOException {
+        List<SerializableCrawlData> data = new ArrayList<>();
+        while (hasNext()) {
+            data.add(next());
+        }
+        return data;
+    }
+
+    /** For tests */
+    default List<CrawledDocument> docsAsList() throws IOException {
+        List<CrawledDocument> data = new ArrayList<>();
+        while (hasNext()) {
+            if (next() instanceof CrawledDocument doc) {
+                data.add(doc);
+            }
+        }
+        return data;
+    }
+
+    /** For tests */
+    default List<CrawledDomain> domainsAsList() throws IOException {
+        List<CrawledDomain> data = new ArrayList<>();
+        while (hasNext()) {
+            if (next() instanceof CrawledDomain domain) {
+                data.add(domain);
+            }
+        }
+        return data;
+    }
+
     // Dummy iterator over nothing
     static SerializableCrawlDataStream empty() {
         return new SerializableCrawlDataStream() {
@@ -26,6 +26,7 @@ import java.net.http.HttpHeaders;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;
 import java.time.Duration;
+import java.util.ArrayList;
 import java.util.List;
 import java.util.Optional;
 import java.util.concurrent.ThreadLocalRandom;
@@ -47,6 +48,8 @@ public class SimpleLinkScraper implements AutoCloseable {
     private final Duration readTimeout = Duration.ofSeconds(10);
     private final DomainLocks domainLocks = new DomainLocks();
 
+    private final static int MAX_SIZE = Integer.getInteger("crawler.maxFetchSize", 10 * 1024 * 1024);
+
     public SimpleLinkScraper(LiveCrawlDataSet dataSet,
                              DbDomainQueries domainQueries,
                              DomainBlacklist domainBlacklist) {
@@ -65,52 +68,68 @@ public class SimpleLinkScraper implements AutoCloseable {
         pool.submitQuietly(() -> retrieveNow(domain, id.getAsInt(), urls));
     }
 
-    public void retrieveNow(EdgeDomain domain, int domainId, List<String> urls) throws Exception {
+    public int retrieveNow(EdgeDomain domain, int domainId, List<String> urls) throws Exception {
+
+        EdgeUrl rootUrl = domain.toRootUrlHttps();
+
+        List<EdgeUrl> relevantUrls = new ArrayList<>();
+
+        for (var url : urls) {
+            Optional<EdgeUrl> optParsedUrl = lp.parseLink(rootUrl, url);
+            if (optParsedUrl.isEmpty()) {
+                continue;
+            }
+            if (dataSet.hasUrl(optParsedUrl.get())) {
+                continue;
+            }
+            relevantUrls.add(optParsedUrl.get());
+        }
+
+        if (relevantUrls.isEmpty()) {
+            return 0;
+        }
+
+        int fetched = 0;
+
         try (HttpClient client = HttpClient
                 .newBuilder()
                 .connectTimeout(connectTimeout)
                 .followRedirects(HttpClient.Redirect.NEVER)
                 .version(HttpClient.Version.HTTP_2)
                 .build();
-             DomainLocks.DomainLock lock = domainLocks.lockDomain(domain) // throttle concurrent access per domain; do not remove
+             // throttle concurrent access per domain; IDE will complain it's not used, but it holds a semaphore -- do not remove:
+             DomainLocks.DomainLock lock = domainLocks.lockDomain(domain)
         ) {
-            EdgeUrl rootUrl = domain.toRootUrlHttps();
-
             SimpleRobotRules rules = fetchRobotsRules(rootUrl, client);
 
             if (rules == null) { // I/O error fetching robots.txt
                 // If we can't fetch the robots.txt,
-                for (var url : urls) {
-                    lp.parseLink(rootUrl, url).ifPresent(this::maybeFlagAsBad);
+                for (var url : relevantUrls) {
+                    maybeFlagAsBad(url);
                 }
-                return;
+                return fetched;
             }
 
             CrawlDelayTimer timer = new CrawlDelayTimer(rules.getCrawlDelay());
 
-            for (var url : urls) {
-                Optional<EdgeUrl> optParsedUrl = lp.parseLink(rootUrl, url);
-                if (optParsedUrl.isEmpty()) {
-                    continue;
-                }
-                if (dataSet.hasUrl(optParsedUrl.get())) {
-                    continue;
-                }
-
-                EdgeUrl parsedUrl = optParsedUrl.get();
-
-                if (!rules.isAllowed(url)) {
+            for (var parsedUrl : relevantUrls) {
+                if (!rules.isAllowed(parsedUrl.toString())) {
                    maybeFlagAsBad(parsedUrl);
                    continue;
                }
 
                switch (fetchUrl(domainId, parsedUrl, timer, client)) {
-                    case FetchResult.Success(int id, EdgeUrl docUrl, String body, String headers)
-                        -> dataSet.saveDocument(id, docUrl, body, headers, "");
+                    case FetchResult.Success(int id, EdgeUrl docUrl, String body, String headers) -> {
+                        dataSet.saveDocument(id, docUrl, body, headers, "");
+                        fetched++;
+                    }
                    case FetchResult.Error(EdgeUrl docUrl) -> maybeFlagAsBad(docUrl);
                }
            }
        }
+
+        return fetched;
     }
 
     private void maybeFlagAsBad(EdgeUrl url) {
@@ -190,7 +209,7 @@ public class SimpleLinkScraper implements AutoCloseable {
         }
 
         byte[] body = getResponseData(response);
-        if (body.length > 1024 * 1024) {
+        if (body.length > MAX_SIZE) {
             return new FetchResult.Error(parsedUrl);
         }
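The lock in the try-with-resources header above looks unused, which the new comment calls out: closing it is what releases a per-domain permit. A rough sketch of that idiom with hypothetical names and an illustrative permit count, not the repository's actual DomainLocks implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Sketch: per-domain throttling via an AutoCloseable semaphore permit. */
class DomainLocksSketch {
    private final Map<String, Semaphore> semaphores = new ConcurrentHashMap<>();

    DomainLock lockDomain(String domain) throws InterruptedException {
        // permit count of 2 is illustrative only
        Semaphore sem = semaphores.computeIfAbsent(domain, d -> new Semaphore(2));
        sem.acquire();
        return new DomainLock(sem);
    }

    record DomainLock(Semaphore sem) implements AutoCloseable {
        @Override
        public void close() { sem.release(); } // permit released when the try block exits
    }
}
```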
@@ -3,8 +3,8 @@ package nu.marginalia.livecrawler;
 import nu.marginalia.db.DomainBlacklistImpl;
 import nu.marginalia.io.SerializableCrawlDataStream;
 import nu.marginalia.model.EdgeDomain;
+import nu.marginalia.model.EdgeUrl;
 import nu.marginalia.model.crawldata.CrawledDocument;
-import nu.marginalia.model.crawldata.CrawledDomain;
 import org.apache.commons.io.FileUtils;
 import org.junit.jupiter.api.AfterEach;
 import org.junit.jupiter.api.Assertions;
@@ -38,7 +38,8 @@ class SimpleLinkScraperTest {
     @Test
     public void testRetrieveNow() throws Exception {
         var scraper = new SimpleLinkScraper(dataSet, null, Mockito.mock(DomainBlacklistImpl.class));
-        scraper.retrieveNow(new EdgeDomain("www.marginalia.nu"), 1, List.of("https://www.marginalia.nu/"));
+        int fetched = scraper.retrieveNow(new EdgeDomain("www.marginalia.nu"), 1, List.of("https://www.marginalia.nu/"));
+        Assertions.assertEquals(1, fetched);
 
         var streams = dataSet.getDataStreams();
         Assertions.assertEquals(1, streams.size());
@@ -46,23 +47,20 @@ class SimpleLinkScraperTest {
         SerializableCrawlDataStream firstStream = streams.iterator().next();
         Assertions.assertTrue(firstStream.hasNext());
 
-        if (firstStream.next() instanceof CrawledDomain domain) {
-            Assertions.assertEquals("www.marginalia.nu",domain.getDomain());
-        }
-        else {
-            Assertions.fail();
-        }
-
-        Assertions.assertTrue(firstStream.hasNext());
-
-        if ((firstStream.next() instanceof CrawledDocument document)) {
-            // verify we decompress the body string
-            Assertions.assertTrue(document.documentBody.startsWith("<!doctype"));
-        }
-        else {
-            Assertions.fail();
-        }
-
-        Assertions.assertFalse(firstStream.hasNext());
+        List<CrawledDocument> documents = firstStream.docsAsList();
+        Assertions.assertEquals(1, documents.size());
+        Assertions.assertTrue(documents.getFirst().documentBody.startsWith("<!doctype"));
+    }
+
+    @Test
+    public void testRetrieveNow_Redundant() throws Exception {
+        dataSet.saveDocument(1, new EdgeUrl("https://www.marginalia.nu/"), "<html>", "", "127.0.0.1");
+        var scraper = new SimpleLinkScraper(dataSet, null, Mockito.mock(DomainBlacklistImpl.class));
+
+        // If the requested URL is already in the dataSet, retrieveNow should short-circuit and not fetch anything
+        int fetched = scraper.retrieveNow(new EdgeDomain("www.marginalia.nu"), 1, List.of("https://www.marginalia.nu/"));
+        Assertions.assertEquals(0, fetched);
     }
 }
@@ -0,0 +1,14 @@
+<section id="frontpage-tips">
+    <h2>Public Beta Available</h2>
+    <div class="info">
+        <p>
+            A redesigned version of the search engine UI is available for beta testing.
+            Feel free to give it a spin, feedback is welcome!
+            The old one will also keep being available if you hate it,
+            or have compatibility issues.
+        </p>
+        <p>
+            <a href="https://test.marginalia.nu/">Try it out!</a>
+        </p>
+    </div>
+</section>
@@ -24,7 +24,7 @@
 <section id="frontpage">
     {{>search/index/index-news}}
     {{>search/index/index-about}}
-    {{>search/index/index-tips}}
+    {{>search/index/index-redesign}}
 </section>
 
 {{>search/parts/search-footer}}
tools/deployment/deployment.py (34 changes, Normal file → Executable file)
@@ -1,3 +1,5 @@
+#!/usr/bin/env python3
+
 from dataclasses import dataclass
 import subprocess, os
 from typing import List, Set, Dict, Optional
@@ -220,6 +222,31 @@ def run_gradle_build(targets: str) -> None:
     if return_code != 0:
         raise BuildError(service, return_code)
 
+
+def find_free_tag() -> str:
+    cmd = ['git', 'tag']
+    result = subprocess.run(cmd, capture_output=True, text=True)
+
+    if result.returncode != 0:
+        raise RuntimeError(f"Git command failed: {result.stderr}")
+
+    existing_tags = set(result.stdout.splitlines())
+
+    for i in range(1, 100000):
+        tag = f'deploy-{i:04d}'
+        if not tag in existing_tags:
+            return tag
+
+    raise RuntimeError(f"Failed to find a free deployment tag")
+
+def add_tags(tags: str) -> None:
+    new_tag = find_free_tag()
+
+    cmd = ['git', 'tag', new_tag, '-am', tags]
+    result = subprocess.run(cmd)
+
+    if result.returncode != 0:
+        raise RuntimeError(f"Git command failed: {result.stderr}")
+
 # Example usage:
 if __name__ == '__main__':
     # Define service configuration
@@ -293,7 +320,9 @@ if __name__ == '__main__':
     parser = argparse.ArgumentParser(
         prog='deployment.py',
         description='Continuous Deployment helper')
+
     parser.add_argument('-v', '--verify', help='Verify the tags are valid, if present', action='store_true')
+    parser.add_argument('-a', '--add', help='Add the tags provided as a new deployment tag, usually combined with -t', action='store_true')
     parser.add_argument('-t', '--tag', help='Use the specified tag value instead of the head git tag starting with deploy-')
 
     args = parser.parse_args()
@@ -314,7 +343,10 @@ if __name__ == '__main__':
     print("Services to build:", plan.services_to_build)
     print("Instances to deploy:", [container.name for container in plan.instances_to_deploy])
 
-    if not args.verify:
+    if args.verify:
+        if args.add:
+            add_tags(args.tag)
+    else:
         print("\nExecution Plan:")
 
         build_and_deploy(plan, SERVICE_CONFIG)
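Reading the argument parser above, tagging a new deployment presumably looks like `./deployment.py -v -a -t 'message'`: verify the plan, then write the next free deploy-NNNN tag with the given value as its annotation message. This invocation is inferred from the code, not from documented usage.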