mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-10-05 21:22:39 +02:00

(doc) Migrate documentation https://docs.marginalia.nu/

Viktor Lofgren
2024-01-22 19:40:08 +01:00
parent a6d257df5b
commit 562012fb22
19 changed files with 46 additions and 1026 deletions


@@ -1,8 +1,10 @@
# Run
When developing locally, this directory will contain run-time data required for
the search engine. In a clean check-out, it only contains the tools required to
bootstrap this directory structure.
This directory is a staging area for running the system. It contains scripts
and templates for installing the system on a server, and for running it locally.
See [https://docs.marginalia.nu/](https://docs.marginalia.nu/) for additional
documentation.
## Requirements
@@ -16,8 +18,7 @@ graalce is a good distribution choice but it doesn't matter too much.
## Set up
To go from a clean check out of the git repo to a running search engine,
follow these steps. This assumes a test deployment. For a production-like
setup... (TODO: write a guide for this).
follow these steps.
You're assumed to be working from the project root the whole time.
@@ -35,106 +36,40 @@ $ run/setup.sh
```shell
$ ./gradlew docker
```
### 3. Initialize the database
Before the system can be brought online, the database needs to be initialized. To do this,
bring up the database in the background, and run the flyway migration tool.
### 3. Install the system
```shell
$ docker-compose up -d mariadb
$ ./gradlew flywayMigrate
$ run/install.sh <install-directory>
```
### 4. Bring the system online.
To install the system, you need to run the install script. It will prompt
you for which installation mode you want to use. The options are:
We'll run it in the foreground in the terminal this time because it's educational to see the logs.
Add `-d` to run in the background.
1. Barebones - This will install a white-label search engine with no data. You can
use this to index your own data. It disables and hides functionality that is strongly
related to the Marginalia project, such as the Marginalia GUI.
2. Full Marginalia Search instance - This will install an instance of the search engine
configured like [search.marginalia.nu](https://search.marginalia.nu). This is useful
for local development and testing.
It will also prompt you for account details for a new mariadb instance, which will be
created for you. The database will be initialized with the schema and data required
for the search engine to run.
After filling out all the details, the script will copy the installation files to the
specified directory.
### 4. Run the system
```shell
$ docker-compose up
$ cd install_directory
$ docker-compose up -d
# To see the logs:
$ docker-compose logs -f
```
There are two docker-compose files available, `docker-compose.yml` and `docker-compose-barebones.yml`;
the latter is a stripped-down version that runs only the bare minimum required, e.g. for
running a white-label version of the system. The former is the full system with all the frills of
Marginalia Search, and is the one used by default.
You can now access a search interface at `http://localhost:8080`, and the admin interface
at `http://localhost:8081/`.
To start the barebones version, run:
```shell
$ docker-compose -f docker-compose-barebones.yml up
```
### 5. You should now be able to access the system.
By default, the docker-compose file publishes the following ports:
| Address | Description |
|-------------------------|------------------|
| http://localhost:8080/ | User-facing GUI |
| http://localhost:8081/ | Operator's GUI |
Note that the operator's GUI does not perform any sort of authentication.
Preferably don't expose it publicly; if you absolutely must, put it behind a
reverse proxy with Basic Auth to add security.
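As one possible approach (a sketch, not part of the project's scripts: the user name `operator`, the placeholder password, and the output path below are all made up for illustration), you could generate an htpasswd-format credentials file for your reverse proxy like this:

```shell
# Generate an htpasswd-format Basic Auth credentials file.
# "operator", "change-me", and /tmp/htpasswd are illustrative only.
HASH=$(openssl passwd -apr1 "change-me")   # apr1 is the htpasswd MD5 scheme
printf 'operator:%s\n' "$HASH" > /tmp/htpasswd
cat /tmp/htpasswd
```

A reverse proxy such as nginx can then point at this file (e.g. via its `auth_basic_user_file` directive) in front of the operator's port.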
### 6. Download Sample Data
A script is available for downloading sample data. The script will download the
data from https://downloads.marginalia.nu/ and extract it to the correct location.
The system will pick the data up automatically.
```shell
$ run/download-samples.sh l
```
Four sets are available:
| Name | Description |
|------|---------------------------------|
| s | Small set, 1000 domains |
| m | Medium set, 2000 domains |
| l | Large set, 5000 domains |
| xl | Extra large set, 50,000 domains |
Warning: The XL set is intended to provide a large amount of data for
setting up a pre-production environment. It may be hard to run on a smaller
machine, and on most machines it will take several hours to process.
The 'm' or 'l' sets are a good compromise between size and processing time
and should work on most machines.
### 7. Process the data
Bring the system online if it isn't (see step 4), then go to the operator's
GUI (see step 5).
* Go to `Node 1 -> Storage -> Crawl Data`
* Hit the toggle to set your crawl data to be active
* Go to `Actions -> Process Crawl Data -> [Trigger Reprocessing]`
This will take anywhere from a few minutes to a few hours depending on which
data set you downloaded. You can monitor the progress from the `Overview` tab.
First the CONVERTER is expected to run; this will process the data into a format
that can easily be inserted into the database and index.
Next the LOADER will run; this will insert the data into the database and index.
Next the link database will repartition itself, and finally the index will be
reconstructed. You can follow the progress of these steps in the `Jobs` listing.
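If you prefer the terminal to the GUI, you can also follow these stages in the service logs. The filter below is a sketch: the stage names CONVERTER and LOADER come from the text above, but the exact log line format is an assumption, and the sample lines are made up for illustration:

```shell
# Real usage (run from the install directory):
#   docker-compose logs -f | grep -E 'CONVERTER|LOADER'
# Demonstration of the filter on made-up sample log lines:
printf '%s\n' \
  'INFO  CONVERTER starting' \
  'DEBUG unrelated chatter' \
  'INFO  LOADER inserting documents' \
  | grep -E 'CONVERTER|LOADER'
```

Only the CONVERTER and LOADER lines survive the filter; the unrelated line is dropped.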
### 8. Run the system
Once all this is done, you can go to the user-facing GUI (see step 5) and try
a search.
Important! Use the 'No Ranking' option when running locally, since you'll very
likely not have enough links for the ranking algorithm to perform well.
## Experiment Runner
The script `experiment.sh` is a launcher for the experiment runner, which is useful when
evaluating new algorithms in processing crawl data.
There is no data in the system yet. To load data into the system,
see the guide at [https://docs.marginalia.nu/](https://docs.marginalia.nu/).