mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-10-05 21:22:39 +02:00

(doc) Migrate documentation https://docs.marginalia.nu/

Viktor Lofgren
2024-01-22 19:40:08 +01:00
parent a6d257df5b
commit 562012fb22
19 changed files with 46 additions and 1026 deletions


@@ -1,8 +1,10 @@
# Run
When developing locally, this directory will contain run-time data required for
the search engine. In a clean check-out, it only contains the tools required to
bootstrap this directory structure.
This directory is a staging area for running the system. It contains scripts
and templates for installing the system on a server, and for running it locally.
See [https://docs.marginalia.nu/](https://docs.marginalia.nu/) for additional
documentation.
## Requirements
@@ -16,8 +18,7 @@ graalce is a good distribution choice but it doesn't matter too much.
## Set up
To go from a clean check out of the git repo to a running search engine,
follow these steps. This assumes a test deployment. For a production-like
setup... (TODO: write a guide for this).
follow these steps.
You're assumed to be working from the project root the whole time.
@@ -35,106 +36,40 @@ $ run/setup.sh
```shell
$ ./gradlew docker
```
### 3. Initialize the database
Before the system can be brought online, the database needs to be initialized. To do this,
bring up the database in the background, and run the flyway migration tool.
### 3. Install the system
```shell
$ docker-compose up -d mariadb
$ ./gradlew flywayMigrate
$ run/install.sh <install-directory>
```
### 4. Bring the system online.
To install the system, you need to run the install script. It will prompt
you for which installation mode you want to use. The options are:
We'll run it in the foreground in the terminal this time because it's educational to see the logs.
Add `-d` to run in the background.
1. Barebones - This will install a white-label search engine with no data. You can
use this to index your own data. It disables and hides functionality that is strongly
related to the Marginalia project, such as the Marginalia GUI.
2. Full Marginalia Search instance - This will install an instance of the search engine
configured like [search.marginalia.nu](https://search.marginalia.nu). This is useful
for local development and testing.
It will also prompt you for account details for a new mariadb instance, which will be
created for you. The database will be initialized with the schema and data required
for the search engine to run.
After filling out all the details, the script will copy the installation files to the
specified directory.
### 4. Run the system
```shell
$ docker-compose up
$ cd install_directory
$ docker-compose up -d
# To see the logs:
$ docker-compose logs -f
```
There are two docker-compose files available, `docker-compose.yml` and `docker-compose-barebones.yml`;
the latter is a stripped-down version that runs only the bare minimum required, e.g. for
running a white-label version of the system. The former is the full system with all the frills of
Marginalia Search, and is the one used by default.
You can now access a search interface at `http://localhost:8080`, and the admin interface
at `http://localhost:8081/`.
To start the barebones version, run:
```shell
$ docker-compose -f docker-compose-barebones.yml up
```
### 5. You should now be able to access the system.
By default, the docker-compose file publishes the following ports:
| Address | Description |
|-------------------------|------------------|
| http://localhost:8080/ | User-facing GUI |
| http://localhost:8081/ | Operator's GUI |
Note that the operator's GUI does not perform any sort of authentication.
Preferably don't expose it publicly; if you absolutely must, put it behind a
reverse proxy with Basic Auth to add security.
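As one possible approach (a sketch, not part of the project's scripts: the user name `operator`, the placeholder password, and the output path below are all made up for illustration), you could generate an htpasswd-format credentials file for your reverse proxy like this:

```shell
# Generate an htpasswd-format Basic Auth credentials file.
# "operator", "change-me", and /tmp/htpasswd are illustrative only.
HASH=$(openssl passwd -apr1 "change-me")   # apr1 is the htpasswd MD5 scheme
printf 'operator:%s\n' "$HASH" > /tmp/htpasswd
cat /tmp/htpasswd
```

A reverse proxy such as nginx can then point at this file (e.g. via its `auth_basic_user_file` directive) in front of the operator's port.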
### 6. Download Sample Data
A script is available for downloading sample data. The script will download the
data from https://downloads.marginalia.nu/ and extract it to the correct location.
The system will pick the data up automatically.
```shell
$ run/download-samples.sh l
```
Four sets are available:
| Name | Description |
|------|---------------------------------|
| s | Small set, 1000 domains |
| m | Medium set, 2000 domains |
| l | Large set, 5000 domains |
| xl | Extra large set, 50,000 domains |
Warning: The XL set is intended to provide a large amount of data for
setting up a pre-production environment. It may be hard to run on a smaller
machine, and on most machines it will take several hours to process.
The 'm' or 'l' sets are a good compromise between size and processing time
and should work on most machines.
### 7. Process the data
Bring the system online if it isn't (see step 4), then go to the operator's
GUI (see step 5).
* Go to `Node 1 -> Storage -> Crawl Data`
* Hit the toggle to set your crawl data to be active
* Go to `Actions -> Process Crawl Data -> [Trigger Reprocessing]`
This will take anywhere from a few minutes to a few hours depending on which
data set you downloaded. You can monitor the progress from the `Overview` tab.
First the CONVERTER is expected to run; this will process the data into a format
that can easily be inserted into the database and index.
Next the LOADER will run; this will insert the data into the database and index.
Next the link database will repartition itself, and finally the index will be
reconstructed. You can follow the progress of these steps in the `Jobs` listing.
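If you prefer the terminal to the GUI, you can also follow these stages in the service logs. The filter below is a sketch: the stage names CONVERTER and LOADER come from the text above, but the exact log line format is an assumption, and the sample lines are made up for illustration:

```shell
# Real usage (run from the install directory):
#   docker-compose logs -f | grep -E 'CONVERTER|LOADER'
# Demonstration of the filter on made-up sample log lines:
printf '%s\n' \
  'INFO  CONVERTER starting' \
  'DEBUG unrelated chatter' \
  'INFO  LOADER inserting documents' \
  | grep -E 'CONVERTER|LOADER'
```

Only the CONVERTER and LOADER lines survive the filter; the unrelated line is dropped.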
### 8. Run the system
Once all this is done, you can go to the user-facing GUI (see step 5) and try
a search.
Important! Use the 'No Ranking' option when running locally, since you'll very
likely not have enough links for the ranking algorithm to perform well.
## Experiment Runner
The script `experiment.sh` is a launcher for the experiment runner, which is useful when
evaluating new algorithms in processing crawl data.
There is no data in the system yet. To load data into the system,
see the guide at [https://docs.marginalia.nu/](https://docs.marginalia.nu/).