Use pointer:fine media query to better distinguish between mobile devices and PCs with a window in portrait orientation.
With this, we never show the mobile filtering functionality on desktop, and never show the touch-inaccessible minimized sidebar on mobile.
A bug was introduced at some point where the special keyword for filtering on javascript was changed from js:true/js:false to special:scripts.
Solves issue #155
The warc->slop converter was rejecting some items because they had headers that were representable in the Warc code's MessageHeader map implementation, but illegal in the HttpHeaders' implementation.
Fixing this by manually filtering these out. Ostensibly the constructor has a filtering predicate, but this annoyingly runs too late and fails to prevent the problem.
The crawler was incorrectly using the request URL as the base URL when resolving relative links. This caused problems when encountering redirects.
For example if we fetch /log, redirecting to /log/ and find links to foo/, and bar/; these would resolve to /foo and /bar, and not /log/foo and /log/bar.
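A minimal sketch of the difference, using plain java.net.URI rather than the crawler's own URL types:
```
import java.net.URI;

class RedirectBaseUrlDemo {
    public static void main(String[] args) {
        URI requestUrl   = URI.create("https://www.example.com/log");
        URI redirectedTo = URI.create("https://www.example.com/log/");

        // Resolving against the request URL drops the final path segment...
        System.out.println(requestUrl.resolve("foo/"));   // https://www.example.com/foo/

        // ...while resolving against the redirect target gives the intended result.
        System.out.println(redirectedTo.resolve("foo/")); // https://www.example.com/log/foo/
    }
}
```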
Nepenthes has been doing the rounds on social media; add an easy detection and mitigation mechanism for this type of trap, as sadly not all webmasters set up their robots.txt correctly. Out-of-the-box crawl limits will also deal with this type of attack, but this fix is faster.
This should have the rpc stream reception be performed in parallel in separate threads, rather than blocking sequentially in the main thread, hopefully giving a slight performance boost.
This solves some issues with the rssreader based parser, which was very picky about the XML being valid. Jsoup is much more lenient when parsing malformed XML.
This is primarily to make the code a bit easier to reason about, and will reduce the level of indirection and data copying in the search-service->query-service->index-service communication chain.
Previously a special db table was used to hold domains slated for crawling, but this is deprecated. Instead, each domain now has a node_affinity flag that decides its indexing state: a value of -1 indicates it shouldn't be crawled, a value of 0 means it's slated for crawling by the next index partition to be crawled, and a positive value means it's assigned to that index partition.
The change set also adds a test case validating the modified behavior.
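A rough sketch of how the flag reads (purely illustrative, not the actual schema accessors):
```
/** Illustrative interpretation of the node_affinity values described above. */
class NodeAffinity {
    static String describe(int nodeAffinity) {
        if (nodeAffinity < 0) {
            return "not slated for crawling";
        } else if (nodeAffinity == 0) {
            return "slated for crawling by the next index partition that crawls";
        } else {
            return "assigned to index partition " + nodeAffinity;
        }
    }
}
```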
Quoted search queries that contained keywords with possessive 's endings were not returning any results, as the index does not retain that suffix, and the query parser was not stripping it away in this code path.
This solves issue #143.
The code was always blocking for up to 5s while waiting for the remote end to become available, meaning some services would stall for several seconds on start-up for no sensible reason.
This should make most services start faster as a result.
Also correct the DbDomainQueries.getDomainId so that it throws NoSuchElementException when domain id is missing, and not UncheckedExecutionException via Cache.
This gives the same upper limit to the live crawler and the big boy crawler, though the live crawler will reject items that are too large, while the big crawler will truncate at that point.
We should not wait until we've fetched robots.txt to decide whether we have any data to fetch! This makes the live crawler very slow and leads to unnecessary requests.
Add `If-None-Match` and `If-Modified-Since` headers as appropriate to the feed fetcher's requests. On well-configured web servers, this should short-circuit the request and reduce the amount of bandwidth and processing that is necessary.
A new table was added to the FeedDb to hold one etag per domain.
If-Modified-Since semantics are based on the creation date for the feed database, which should serve as a cutoff date for the earliest update we can have received.
This completes the changes for Issue #136.
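A simplified sketch of the conditional request logic using java.net.http; the class and parameter names here are stand-ins, not the feed fetcher's actual API:
```
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

class ConditionalFeedFetchSketch {
    static HttpRequest buildRequest(URI feedUrl,
                                    Optional<String> storedEtag,
                                    ZonedDateTime feedDbCreationTime) {
        var builder = HttpRequest.newBuilder(feedUrl).GET();

        // Send back the etag from the last successful fetch, if we have one.
        storedEtag.ifPresent(etag -> builder.header("If-None-Match", etag));

        // The feed db creation date is the earliest update we could have received.
        builder.header("If-Modified-Since",
                feedDbCreationTime.format(DateTimeFormatter.RFC_1123_DATE_TIME));

        return builder.build();
    }

    /** A well-configured server answers 304 Not Modified, letting us skip the body entirely. */
    static boolean isNotModified(HttpResponse<?> response) {
        return response.statusCode() == 304;
    }
}
```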
This change adds `Accept-Encoding: gzip` to all outbound requests from the live crawler and feed fetcher, and the corresponding decoding logic for the compressed response data.
The change addresses issue #136, save for making the fetcher's requests conditional.
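The decoding side roughly boils down to the following, assuming the raw response body is available as an InputStream (names are illustrative):
```
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

class GzipResponseSketch {
    /** Only wrap the body in a GZIPInputStream when the server actually compressed it. */
    static InputStream maybeDecompress(InputStream body, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(body);
        }
        return body;
    }
}
```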
Discovered feed URLs were given a double slash after their domain name in the DB. This will go away in the URL normalizer, so the URLs are still viable, but the commit fixes the issue regardless.
Feed discovery is improved by probing a few likely endpoints when no feed link tag is provided. To store the feed URLs, a sqlite database is added to each crawlset that stores a simple summary of the crawl job, including any feed URLs that have been discovered.
Solves issue #135
Updated the `documentBody` method to improve parsing retries and error handling. Refactored `ContentType` charset processing with cleaner logic, removing redundant handling for unsupported charsets. Also, updated the version of the `slop` library in dependency settings.
Also alter CrawledDocument to not require String parsing of the underlying byte[] data. This should reduce the number of large memory allocations quite significantly, hopefully reducing the GC churn a bit.
Replaces Parquet output and processing with the new Slop-based format. Includes data migration functionality, updates to handling and writing of crawl data, and introduces support for SLOP in domain readers and converters.
Added a custom sorting mechanism to prioritize HTTPS over HTTP and domain-based URLs over raw IPs during deduplication. Ensures "bad duplicates" are discarded while maintaining the original presentation order for user-facing results.
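A sketch of the preference order described above; the record and field names are made up for illustration, and a stable sort keeps the original presentation order for results that compare equal:
```
import java.util.Comparator;
import java.util.List;

class DedupPreferenceSketch {
    record Result(String scheme, String host) {}

    // Lower is better: HTTPS before HTTP, and named domains before raw IPs.
    static final Comparator<Result> PREFERENCE = Comparator
            .comparingInt((Result r) -> "https".equals(r.scheme()) ? 0 : 1)
            .thenComparingInt(r -> r.host().matches("\\d+(\\.\\d+){3}") ? 1 : 0);

    public static void main(String[] args) {
        var results = List.of(
                new Result("http", "example.com"),
                new Result("https", "example.com"),
                new Result("https", "93.184.216.34"));

        // Stream.sorted is stable for ordered sources, so ties keep their original order.
        results.stream().sorted(PREFERENCE).forEach(System.out::println);
    }
}
```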
Previously, the sideloader did not generate a locality-sensitive hashCode for document details. This caused all documents from the same domain to be considered duplicates by the deduplication logic.
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
Added an additional filter step to ensure URLs with binary suffixes are excluded during crawling. This prevents unnecessary processing of non-HTML content, improving the efficiency of the link parsing process.
If the document's URL path is "/", a "special:root" meta flag is now added with the "Synthetic" bit set. This will help searching only for the root document of a website, neat stuff ahead :D
Centralized HTML feature handling with `applyFeatures` in StackexchangeSideloader and added dynamic synthetic term generation. Improved HTML structure in RedditSideloader and enhanced metadata processing with feature-based keywords. Updated DomainLinks to correctly compute link counts using individual link occurrences.
Still missing is a proper build, we're currently pulling in tailwind from a CDN, which is no bueno in prod.
There's also a lot of polish remaining everywhere, dead links, etc.
This update eliminates all occurrences of the OpenNLP token model from the setup script, configuration, and test files, as this model file is no longer used.
Introduced optional alias domain functionality in EdgeDomain class to handle domain variations such as "www" in the anchor tags code, as there are commonly a number of relevant but glancing misses in the atags data.
Adding a tracking message to the export actor means it's possible to run them in a precession.
Adding a new precession actor, and some GUI components for triggering exports.
The change also adds a heartbeat to the export process.
Add a check to ensure only documents with "text/html" content type are processed in FeedExporter, AtagExporter, and TermFrequencyExporter. This prevents non-HTML documents from being parsed and helps maintain data consistency and keep the memory usage down.
Enhanced scoring logic to add bonuses when the query matches single-word anchor (atag) spans exactly. Implemented this by adding conditions in `IndexResultScoreCalculator.java` and creating a new method `containsRangeExact` in `DocumentSpan.java` to check for exact span matches.
Normalize URLs by replacing en-dash with hyphen to prevent encoding errors. This ensures correct handling of a small subset of articles with improperly normalized UTF-8 paths. Added `normalizeUtf8` method to address this issue.
Fixes issue #109.
Introduced a method to decide whether to retain URI fragments in feed items based on their uniqueness. Enhanced FeedItem processing to conditionally strip fragments to maintain clean URLs where applicable.
This is to prevent data corruption. This shouldn't be necessary for the regular loader path, but the live crawler is a bit different and needs some paving of the road ahead of it.
To avoid hammering the same invalid URLs for up to two months, URLs that fail to fetch correctly are, on a dice roll, added to a bad URLs table that prevents further attempts at fetching them.
Removed the redundant "RSS Feed Fetcher" suffix from the User-Agent header in the FeedFetcherService. This will help avoid making the feed fetcher trigger bot mitigation that accepts the regular UA-string.
Since some of the export tasks have been memory hungry, sometimes killing the executor-services, they've been moved to a separate process that can be given a larger Xmx.
While doing this, the ProcessMainClass was given utilities for the boilerplate surrounding receiving mq requests and responding to them, and some effort was also put toward making the process boot sequence a bit more uniform. It's still a bit heterogeneous between different processes, but a bit less so for now.
Node profiles decide which actors are started, and which views are available in the control GUI. This helps keep the system organized, and hides real-time clutter from the batch-oriented nodes.
This is done by applying a large constant offset to the ordinals for the live crawled documents. The chosen value still permits up to 100k documents to be fetched for a single domain with the live crawler, which is ridiculously large.
Some refactoring is still needed, but a dummy actor is in place, along with a process that crawls URLs from the livecapture service's RSS endpoints; the data makes it all the way to being indexable.
Adding a new @Tag("flaky") for tests that do not reliably return successes. These may still be valuable during development, but should not run in CI.
Also tagging a few of the slower tests with the old @Tag("slow"), to speed up the run-time.
The new service 'status-service' will poll public endpoints periodically, and publish a basic read-only UI with the results, as well as publish the results to prometheus.
Move the logic deciding which operation to perform into the actor, updating its state graph to incorporate a counter that runs a clean update once in a blue moon.
We can only do this for files that are not required for unit tests.
As it is illegal to run more than one instance of the control service, this should be fine with regard to race conditions. The boot orchestration will also ensure that no other services will boot up before the downloading is complete.
Also be a bit smarter about pre-allocating queues and sets based on depth rather than the number of provided URLs, which was always zero outside of tests.
The crawl spec abstraction was used to upload lists of domains into the system for future crawling. This was fairly clunky, and it was difficult to understand what was going to be crawled.
Since a while back, a new domains listing view has been added to the control view that allows direct access to the domains table. This is much preferred and means the operator can directly manage domains without specs.
This commit removes the crawl spec abstraction from the code, and changes the GUI to direct to the domains list instead.
Ensure all data is written to writeChannel by looping until the buffer is fully drained. This prevents potential data loss during the close operation and maintains data integrity.
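The fix boils down to the standard NIO idiom of looping until the buffer is drained, roughly like this (names are illustrative):
```
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

class DrainBufferSketch {
    /** A single write() call may consume only part of the buffer; loop until nothing remains. */
    static void writeFully(WritableByteChannel writeChannel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            writeChannel.write(buffer);
        }
    }
}
```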
Updated the method to return Integer.MAX_VALUE if any of the position lists are empty, instead of returning 0. This ensures that empty lists are handled consistently and addresses edge cases where an empty list is encountered.
Adjust the write offset calculation by adding the position of the write buffer. Updated the test to validate the transformation process and ensure correctness of output file positions.
Add a new actor that polls a URL every 6 hours and amends the domain database with any unseen domains, flagging them to be crawled by the next crawl job.
The URLs are specified in data/scrape-urls.txt. If this file is absent, the actor shuts down.
Return an empty byte array when screenshot fetch fails, ensuring downstream processes are not impacted by null responses. Additionally, only attempt to upload the screenshot if the byte array is non-empty, preventing invalid data from being stored.
Changed the scheduling function to use a single schedule call instead of a fixed delay for the init task. The updateScreenshotInfo method was also moved and slightly refactored for clearer readability and consistency.
Always request the main site screenshot to ensure staleness checks and necessary updates. Limit additional screenshot requests for similar and linking domains to avoid overloading with a maximum of 5 requests per view.
To be able to tell service discovery whether to enable a service on a particular runtime, a common base interface DiscoverableService extends BindableService was added.
* Restructure the code to make a bit more sense
* Store full headers in crawl data
* Fix a bug in retry-after header handling that assumed the timeout was in milliseconds and then clamped it to a lower bound of 500ms, meaning it was almost always handled wrong
We want to index all words in the document; stopword handling is moved to the index, where we change the semantics to elide inclusion checks in query construction for a very short list of words tentatively hard-coded in SearchTerms.
The ordering of TermIdsList is assumed to be unchanged by the surrounding code, but the constructor sorts the dang list to be able to do contains() by binary search. This is no bueno.
This is gonna be a merge conflict in the future, but it's too big of a bug to leave for another month.
To help offer verbatim matches for external link texts, we assign these positions in the document a bit after the actual document ends. Integrating this information with the ranking is not performed here.
Don't wait until the loader step is finished to reset the NEW flag, as this leaves manually processed (but not yet loaded) crawl data stuck in "CREATING" in the GUI.
The first change, running index construction in parallel, was previously how it was done, but it was changed to run sequentially to see how it would affect performance. It got worse, so the change is reverted.
Though it's been noted that sorting in parallel is likely not a good idea as it leads to a lot of I/O thrashing, so this is changed to be done sequentially.
This will help these queries deal with domains that do not have a subdomain so that they do not drag up subdomains as well, as they are also given the special site:-keyword for their corresponding parent domain.
Previously it was sorted on a field that would switch to just showing the time whenever the date was the same as the day's date, leading to a bizarre sort order where files created today were typically shown first, followed by the rest of the files with the oldest date first.
This lets the slop library be stand-alone without dependence on coded-sequence.
The change also gets rid of the vestigial seek() method in ColumnReader.
The most common error when dealing with Slop columns is that they can fall out of sync with each other if the programmer accidentally does a conditional read and forgets to skip.
The second most common error is forgetting to close one of the columns in a reader or writer.
To deal with both cases, a new class SlopTable is added that keeps track of the lifecycle of all slop columns and performs a check when closing them that they are in sync.
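A very rough sketch of the idea; the real SlopTable API differs, this only illustrates the lifecycle tracking and the close-time sync check:
```
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for a slop column reader or writer; only the position counter matters here. */
interface TrackedColumn extends AutoCloseable {
    long position();
    void close();
}

/** Sketch of a table object that owns its columns and verifies they stayed in sync. */
class SlopTableSketch implements AutoCloseable {
    private final List<TrackedColumn> columns = new ArrayList<>();

    <T extends TrackedColumn> T register(T column) {
        columns.add(column);
        return column;
    }

    @Override
    public void close() {
        long expected = columns.isEmpty() ? 0 : columns.get(0).position();
        for (TrackedColumn column : columns) {
            if (column.position() != expected) {
                throw new IllegalStateException("Slop columns are out of sync");
            }
            column.close();
        }
    }
}
```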
Turns out throttling to only 1 lock per domain means the crawler chokes hard on large hosting websites such as wordpress. Giving these a slightly larger allowance.
Refactoring keyword extraction to extract spans information.
Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions.
This is a bit of a katamari damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.
The CompressingStorageReader would incorrectly report having data when a file was empty. Preemptively attempting to fill the backing buffer fixes the behavior.
Decorates DocumentSentences with information about which HTML tags they are nested in, and removes some redundant data on this rather memory hungry object. Separator information is encoded as a bit set instead of an array of integers.
The change also cleans up the SentenceExtractor class a fair bit. It no longer extracts ngrams, and a significant amount of redundant operations were removed as well. This is still a pretty unpleasant class to work in, but this is the first step in making it a little bit better.
To let up the pressure on domains with lots of subdomains such as substack, medium, neocities, etc., a per-domain mutex is added that will limit crawling of these domains to one thread at a time.
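Conceptually the mutex is just a per-domain semaphore with a single permit, something along these lines (the crawler's actual bookkeeping is more involved):
```
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Sketch of a per-domain lock limiting how many crawler threads work a domain at once. */
class DomainLockSketch {
    private final ConcurrentHashMap<String, Semaphore> locks = new ConcurrentHashMap<>();

    /** One permit per domain means at most one thread crawls a given domain at a time. */
    Semaphore forDomain(String topDomain) {
        return locks.computeIfAbsent(topDomain.toLowerCase(), k -> new Semaphore(1));
    }

    void crawl(String topDomain, Runnable task) throws InterruptedException {
        Semaphore lock = forDomain(topDomain);
        lock.acquire();
        try {
            task.run();
        } finally {
            lock.release();
        }
    }
}
```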
The revisit logic wasn't sufficiently dampening the recrawl rate for websites that largely have not changed.
Modified it to be more reactive to the degree to which the content has changed, while applying upper and lower limits depending on the size of the crawl set.
Expected behavior changed since the ranking algorithm now takes into account the number of positions of the keyword, and the test loader was previously modified to generate positions based on prime factors of the document id.
Fix rare bug where the takeWhileZero method would fail to repopulate the underlying buffer. This caused intermittent de-compression errors if takeWhileZero happened at a 64 bit boundary while the underlying buffer was empty.
The change also alters how sequence-lengths are encoded, to more consistently use the getGamma method instead of adding special significance to a zero first byte.
Finally, assertions are added checking the invariants of the gamma and delta coding logic as well as UrlIdCodec to earlier detect issues.
The change also restructures the internal API a bit, moving resultsFromDomain from RpcRawResultItem into RpcDecoratedResultItem, as the previous order was driving complexity in the code that generates these objects, and the consumer side of things puts all this data in the same object regardless.
How'd This Ever Work? (tm)
TermFrequencyExporter was using Math.clamp() incorrectly, and SentenceExtractor was synchronizing on its own instance when initializing shared static members, causing rare issues when spinning multiple SE:s up at once.
Adding new ranking parameters to the API and routing them through the system, in order to permit integration of the new position data with the ranking algorithm.
The change also cleans out several parameters that no longer filled any function.
It was incorrectly assumed that a "next" value could not be zero or negative, as this is not representable via the Gamma code. This is incorrect in this case, as we're able to provide a negative offset. Changing to using Integer.MIN_VALUE as an indicator that a value is absent instead, as this will never be used.
The priority index documents file can be trivially compressed to a large degree.
Compression schema:
```
00b -> diff docord (E gamma)
01b -> diff domainid (E delta) + (1 + docord) (E delta)
10b -> rank (E gamma) + domainid,docord (raw)
11b -> 30 bit size header, followed by 1 raw doc id (61 bits)
```
The implementation was incorrectly using 1 bit more than it should. The change also adds a put method for Elias delta; and cleans up the interface a bit.
Previously this was the responsibility of the caller, which led to the possibility of passing in improperly prepared buffers and receiving a bad outcome.
Btree index adds overhead and disk space and doesn't fill any function for the prio index.
* Update finalize logic with a new IO transformer that copies the data and prepends a size
* Update the reader to read the new format
* Added a test
Cache the Charset object returned from Charset.forName() for future use, since we're likely to see the same charset again. Charset.forName(...) can be surprisingly expensive, and its built-in caching strategy, which only caches the last two values seen, doesn't cope well with how we're hitting it with a wide array of random charsets.
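A sketch of such a caching wrapper; the UTF-8 fallback for unknown or malformed names is an assumption here, not necessarily what the actual code does:
```
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.concurrent.ConcurrentHashMap;

class CharsetCacheSketch {
    private static final ConcurrentHashMap<String, Charset> CACHE = new ConcurrentHashMap<>();

    /** Cache Charset.forName() results; fall back to UTF-8 for unknown or malformed names. */
    static Charset lookup(String charsetName) {
        return CACHE.computeIfAbsent(charsetName, name -> {
            try {
                return Charset.forName(name);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                return StandardCharsets.UTF_8;
            }
        });
    }
}
```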
The term data iterator is quite hot and was performing buffer slice operations that were not necessary.
Replacing with a fixed pointer alias that can be repositioned to the relevant data.
The positions data was also being wrapped in a GammaCodedSequence only to be immediately un-wrapped.
Removed this unnecessary step and move to copying the buffer directly instead.
After real-world testing, it was determined that 256 was still a bit too low, but 512 seems like it will only truncate outlier cases like assembly code and certain tabulations.
The change adds a new system property 'system.profile' that makes ProcessService automatically trigger JFR profiling of the processes it spawns. By default, these are put in the log directory.
The change also adds a JVM parameter that makes it shut up about native access.
This commit introduces a readme.md file to document the functionality and usage of the coded-sequence library. It covers the Elias Gamma code support, how sequences are encoded, and methods the library offers to query sequences, iterate over values, access data, and decode sequences.
The default C++ language standard on macOS is gnu++98, which won't build
this module.
Full error:
```
> Task :code:libraries:array:cpp:compileCpp FAILED
src/main/cpp/cpphelpers.cpp:28:5: error: expected expression
[](const p64x2& fst, const p64x2& snd) {
^
```
The input comes from the config file so this isn't a very realistic threat vector, and even if it wasn't it's a query in an empty duckdb instance; but adding a validation check to provide a better error message.
This is not hooked in yet, and the term metadata is still left intact. It should probably shrink to a smaller representation (byte?) with the upcoming removal of the position mask.
Add a new rule that creates an alternative path that omits a word if it's a stopword.
In queries where a stopword is present, and no query ngram expansion is possible, the query should not require the stopword to be present in the index, as this results in no search results being found.
It had been previously assumed that re-writing this function in the style of retain() would make it faster, but it had the opposite effect.
The reason retain() is so fast is due to properties of the data that hold true when intersecting document lists, where long runs of adjacent documents are expected, but not when looking up the data associated with the already intersected documents, where the data is more sparse.
IntArray gets the YAGNI axe. The array library had two implementations, one for longs which was used, and one for ints, which only ever saw bit rot. Removing the latter, as all it ever did was clutter up the codebase and add technical debt. If we need int arrays, we fork LongArray again (or add int capabilities to it)
Also cleaning up the interfaces, removing layers of redundant abstractions and adding javadocs.
Finally adding sz=2 specializations to the quick- and insertion sort algorithms. It seems the JIT isn't optimizing these particularly well; this is an attempt to help it out a bit.
Retire search functions that weren't used, including the native implementations. Drop confusing suffixes on search function names. Search functions no longer encode search misses as negative values.
Replaced binary search function with a branchless version that is much faster.
Cleaned up benchmark code.
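For reference, the standard branchless lower-bound shape looks roughly like this; it is not the library's actual implementation:
```
/** Sketch of a branchless lower-bound over a sorted long[]: index of the first value >= key. */
class BranchlessSearchSketch {
    static int lowerBound(long[] sorted, long key) {
        if (sorted.length == 0) return 0;

        int base = 0;
        int n = sorted.length;
        while (n > 1) {
            int half = n / 2;
            // The ternary typically compiles to a conditional move rather than a branch.
            base = sorted[base + half] < key ? base + half : base;
            n -= half;
        }
        return base + (sorted[base] < key ? 1 : 0);
    }

    public static void main(String[] args) {
        long[] data = {1, 3, 5, 7, 9};
        System.out.println(lowerBound(data, 5));  // 2
        System.out.println(lowerBound(data, 6));  // 3
        System.out.println(lowerBound(data, 10)); // 5
    }
}
```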
This corrects an annoying bug that had the system crash and burn on first start-up due to a race condition in service initialization, where the services were attempting to access the database before it was properly migrated.
A fix was in principle already in place, but it was running too late and did not prevent attempts to access the as-yet uninitialized database. Move the first boot check into the MainClass instead of the Service constructor.
The change also adds more appropriate docker dependencies to the services to fix rare errors resolving the hostname of the database.
Before the gRPC migration, the system would serve both public and internal requests over HTTP, but distinguish the two using path prefixes and a few HTTP Headers (X-Public, X-Context) added by the reverse proxy to prevent misconfigurations.
Since internal requests meaningfully no longer use HTTP, this convention is just an obstacle now, adding the need to always run the system behind a reverse proxy that rewrites the paths.
The change removes the path prefix, and updates the docker templates to reflect the change. This will require a migration for existing systems.
This is necessary as we use zookeeper to orchestrate first-time startup of the services, to ensure that the database is properly migrated by the control service before anything else is permitted to start.
Roll back to JDK 21 for now, and make Java version configurable in the root build.gradle
The project has run into no less than three distinct show-stopping bugs in JDK22, across multiple vendors, and gradle still doesn't fully support it, meaning you need multiple JDK versions installed.
Since the performance fix in 3359f72239 had a huge positive impact without reducing result quality, it's possible to remove the QueryBranchWalker and associated code.
The changeset cleans up the query parsing logic in the query service. It gets rid of a lot of old and largely unmaintainable query-rewriting logic that was based on POS-tagging rules, and adds a new cleaner approach. Query parsing is also refactored, and the internal APIs are updated to remove unnecessary duplication of document-level data across each search term.
A new query segmentation model is introduced based on a dictionary of known n-grams, with tools for extracting this dictionary from Wikipedia data. The changeset introduces a new segmentation model file, which is downloaded with the usual run/setup.sh, as well as an updated term frequency model.
A new intermediate representation of the query is introduced, based on a DAG with predefined vertices initiating and terminating the graph. This is for the benefit of easily writing rules for generating alternative queries, e.g. using the new segmentation data.
The graph is converted to a basic LL(1) syntax loosely reminiscent of a regular expression, where e.g. "( wiby | marginalia | kagi ) ( search engine | searchengine )" expands to "wiby search engine", "wiby searchengine", "marginalia search engine", "marginalia searchengine", "kagi search engine" and "kagi searchengine".
This compiled query is passed to the index, which parses the expression, where it is used for execution of the search and ranking of the results.
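A toy expansion of that syntax, ignoring nesting and only handling a flat sequence of parenthesized alternation groups (the real parser is more general):
```
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryExpansionSketch {
    static List<String> expand(String expression) {
        // Collect each "( a | b | c )" group as a list of trimmed alternatives.
        List<List<String>> groups = new ArrayList<>();
        Matcher groupMatcher = Pattern.compile("\\(([^)]*)\\)").matcher(expression);
        while (groupMatcher.find()) {
            List<String> alternatives = new ArrayList<>();
            for (String alt : groupMatcher.group(1).split("\\|")) {
                alternatives.add(alt.trim());
            }
            groups.add(alternatives);
        }

        // Cartesian product of the groups, in order.
        List<String> results = new ArrayList<>(List.of(""));
        for (List<String> group : groups) {
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String alt : group) {
                    next.add(prefix.isEmpty() ? alt : prefix + " " + alt);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        expand("( wiby | marginalia | kagi ) ( search engine | searchengine )")
                .forEach(System.out::println);
    }
}
```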
Previously, in an experimental change, only the first paragraph was indexed, intended to reduce the amount of noisy tangential hits. This was not a good idea, so the change is reverted.
This functionality fell into disrepair some while ago. It's supposed to allow non-mandatory search terms that boost the ranking if they are present in the document.
The change set cleans up the data model for the term-level data. This used to contain a bunch of fields with document-level metadata. This data-duplication means a larger memory footprint and worse memory locality.
The ranking code is also modified to not accept SearchResultKeywordScores, but rather CompiledQueryLong and CqDataInts containing only the term metadata and the frequency information needed for ranking. This is again an effort to improve memory locality.
The sign of the counter is used to indicate whether a term has appeared as title. Until it's seen in the title, it's provisionally saved as a negative count.
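A minimal sketch of the sign convention described above (method names are made up):
```
/** Sketch of the sign trick: counts stay negative until the term has been seen in a title. */
class TitleFlaggedCounter {
    static int increment(int count) {
        // Grow the magnitude, preserving the "not yet seen in a title" sign.
        return count <= 0 ? count - 1 : count + 1;
    }

    static int markSeenInTitle(int count) {
        // Flip a provisional (negative) count to positive once the term appears in the title.
        return count < 0 ? -count : count;
    }

    static int actualCount(int count) {
        return Math.abs(count);
    }

    static boolean seenInTitle(int count) {
        return count > 0;
    }
}
```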
We no longer break the query into "sets" of search terms and need to adapt the code to not use this assumption.
For the API service, we'll simulate the old behavior to keep the API stable.
For the search service, we'll introduce a new way of calculating positions through tree aggregation.
The code would always re-initialize the static ngramLexicon and rdrposTagger fields with new instances even if they were already instantiated, leading to a ton of unnecessary RAM allocation.
The modified behavior checks for nullity before creating a new instance.
Use SimpleBlockingThreadPool instead of Java's work-stealing pool, as the latter causes runaway memory consumption in some circumstances, while SimpleBlockingThreadPool uses a bounded queue and always pushes back against the supplier if it can't hold any more tasks.
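The push-back idea, reduced to its essence: a bounded queue whose put() blocks the submitter when full. This is only a sketch, not the actual SimpleBlockingThreadPool:
```
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch of a bounded pool whose submit() blocks the caller when the queue is full. */
class BoundedBlockingPoolSketch {
    private final BlockingQueue<Runnable> queue;

    BoundedBlockingPoolSketch(int threads, int queueCapacity) {
        queue = new ArrayBlockingQueue<>(queueCapacity);
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(this::workLoop, "worker-" + i);
            worker.setDaemon(true);
            worker.start();
        }
    }

    /** put() blocks until there is room, pushing back against the task supplier. */
    void submit(Runnable task) throws InterruptedException {
        queue.put(task);
    }

    private void workLoop() {
        try {
            while (true) {
                queue.take().run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```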
Seems to work, tests are green and initial testing finds no errors. Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
This addresses the relatively common case where the graph consists of two segments, such as x y, z w; in this case we want an output like (x_y) (z w | z_w) | x y (z_w). The generated output does somewhat pessimize a few other cases, but this one is arguably more important.
I was following the release demo video for v2024.01.0
https://www.youtube.com/watch?v=PNwMkenQQ24 and when I did 'docker
compose up' the containers couldn't resolve the DNS name for 'zookeeper'
I realized this was because the zookeeper container was using the
default docker network, so I specified the wmsa network explicitly.
The previous behavior listened to too many changes and, based on zookeeper rather than curator assumptions about behavior, added an additional monitor on each invocation of each monitor (which always triggers on service state changes). This led to each monitor re-registering and effectively doubling the number of monitors whenever a service stopped or started, which in turn meant a lot of bizarre thrashing behavior even on changes in services that don't explicitly talk to each other.
This re-registering behavior is no longer done.
Netty and GRPC by default spawn an incredible number of threads on high-core CPUs, which amounts to a fair bit of RAM usage.
Add custom executors that throttle this behavior.
Add the ability to indicate to the search service that a request is malicious, and to poison the results by providing randomly reordered old results instead.
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.
While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's *modul*ar. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.
Cleaning out a lot of old junk from the code, and one thing led to another...
* Build is improved, now constructing docker images with 'jib'. Clean build went from 3 minutes to 50 seconds.
* The ProcessService's spawning is smarter. Will now just spawn a java process instead of relying on the application plugin's generated outputs.
* Project is migrated to GraalVM
* gRPC clients are re-written with a neat fluent/functional style. e.g.
```channelPool.call(grpcStub::method)
.async(executor) // <-- optional
.run(argument);
```
This change is primarily to allow handling ManagedChannel errors, but it turned out to be a pretty clean API overall.
* For now the project is all in on zookeeper
* Service discovery is now based on APIs and not services. Theoretically this means we could ship the same code either as a monolith or as a service mesh.
* To this end, began modularizing a few of the APIs so that they aren't strongly "living" in a service. WIP!
Missing is documentation and testing, and some more breaking apart of code.
The change adds a hostname validation step to remove endpoints from the ZkServiceRegistry when they do not resolve. This is a scenario that primarily happens when running in docker, and the entire system is started and stopped.
The warmup would sometimes crash during a cold start-up, because it could not get an API. Changed the warmup to just create a GrpcSingleNodeChannelPool for the node.
The query service delegates and aggregates IndexDomainLinksApiGrpc
messages to the index services. The query client was accidentally
also doing this, instead of talking to the query service.
Fixed so it correctly talks to the query service and nothing else.
Adds new ways to configure the bind and external IP addresses for a service. Notably, if the environment variable WMSA_IN_DOCKER is present, the system will grab the HOSTNAME variable and announce that as the external address in the service registry.
The default bind address is also changed to be 0.0.0.0 only if WMSA_IN_DOCKER is present, otherwise 127.0.0.1; as this is a more secure default.
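A sketch of the address selection; the non-docker external address below is just a placeholder, as the real configuration plumbing allows overriding it:
```
/** Sketch of the bind/external address selection described above. */
class BindAddressSketch {
    static boolean inDocker() {
        return System.getenv("WMSA_IN_DOCKER") != null;
    }

    /** Bind to all interfaces only inside docker; default to loopback otherwise. */
    static String bindAddress() {
        return inDocker() ? "0.0.0.0" : "127.0.0.1";
    }

    /** In docker, announce the container's HOSTNAME as the external address in the registry. */
    static String externalAddress() {
        return inDocker() ? System.getenv("HOSTNAME") : "127.0.0.1";
    }
}
```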
The previous code made an incorrect assumption that all routes refer to the same node, and would overwrite the route list on each update. This led to storms of closing and opening channels whenever an update was received.
The new code is correctly aware that we may talk to multiple nodes.
To avoid having to either hard-code or manually configure service addresses (possibly several dozen), and to reduce the project's dependency on docker to deal with routing and discovery, the option to use [Zookeeper](https://zookeeper.apache.org/) to manage services and discovery has been added.
A service registry interface was added, with a Zookeeper implementation and a basic implementation that only works on docker and hard-codes everything.
The last remaining REST service, the assistant-service, has been migrated to gRPC.
This also proved a good time to clear out primordial technical debt from the root of the codebase. The 'service-client' library has been taken behind the barn and given a last farewell. It's replaced by a small library for managing gRPC channels.
Since it's no longer used by anything, RxJava has been removed as a dependency from the project.
Although the current state seems reasonably stable, this is a work-in-progress commit.
In the scenario where an operator
* Performs a new crawl from spec
* Doesn't load the data into the index
* Recrawls the data
The recrawl will not find the domains in the database, and the crawl log will be overwritten with an empty file,
irrecoverably losing the crawl log and making it impossible to load!
To mitigate the impact of similar problems, the change saves a backup of the old crawl log, and complains loudly when this happens.
More specifically to this exact scenario however, the parquet-loaded domains are also preemptively inserted into the domain database at the start of the crawl. This should help the DbCrawlSpecProvider to find them regardless of loaded state.
This may seem a bit redundant, but losing crawl data is arguably the worst type of disaster scenario for this software, so the redundancy is merited.
To help services start faster, the blacklist will no longer block until it's loaded. If such a behavior is desirable, a method was added to explicitly wait for the data.
The domain blacklist blocked the start-up of each process that injected it, adding like 30 seconds to the start-up time in prod.
This change moves the loading to a separate thread entirely. For threads or processes that require the blacklist to be definitely loaded, a helper method was added that blocks until that time.
This filter currently does not distinguish itself very much from the unfiltered results, and lends the impression that the filters don't "do anything".
It may come back in some shape or form in the future, with some additional tweaking of the rankings...
Modified the DbCrawlSpecProvider to shuffle domains after loading to ensure a good mix for each crawl. This prevents overloading a server by crawling it in parallel from different subdomains, and avoids crawling big domains all at once.
Refactored the GRPC Stub Pool for better handling of channel SHUTDOWN state. Any disconnected channels are now re-created before returning the stub.
The class was also renamed to GrpcChannelPool, as we no longer pool the stubs.
Clean up the sideloading code a bit, making the Reddit sideloader use the more sophisticated SideloaderProcessing approach to sideloading, instead of mimicking StackexchangeSideloader's cruder approach.
The reddit sideloader now uses the SideloaderProcessing class. It also properly sets js-attributes for the sideloaded documents.
The control GUI now also filters the upload directory items based on name, and disables the items that do not have appropriate filenames.
Fix a bug where sideloading stackexchange files by explicitly selecting the 7z file would fail, since the 7z file would be passed along to the converter rather than the path to the pre-converted .db file.
The change deprecates the 'algorithm' field from the domain ranking set configuration. Instead, the algorithm will be chosen based on whether influence domains are provided, and whether similarity data is present.
The domain ranking code was admittedly a bit of a clown fiesta; at the same time buggy, fragile and inscrutable.
Migrating over to use JGraphT to store the link graph
when doing rankings, and using their PageRank implementation. Also added a modified version that does PersonalizedPageRank.
This change set updates the query APIs to enable the search service to add additional criteria, such as QueryStrategy and TemporalBias.
The QueryStrategy makes it possible to e.g. require a match is in the title of a result, and TemporalBias enables penalizing results that are not within a particular time period.
These options are added to the search interface. The old 'recent results' is modified to use TemporalBias, and a new filter 'Search In Title' is added as well.
The vintage filter is modified to add a temporal bias for the past.
(converter) Loader for reddit data
Adds experimental sideloading support for pushshift.io style reddit data. This dataset is limited to data older than 2023, due to licensing changes making large-scale data extraction difficult.
Since the median post quality on reddit is not very good, the sideloader will only load a subset of self-texts and top-level comments that have sufficiently many upvotes. Empirically this appears to mostly return good matches, even if it probably could index more.
Tests were written for this, but all require local reddit data which can't be distributed with the source code. If these can not be found, the tests will short-circuit as OK. They're mostly there for debugging, and it's fine if they don't always run.
The change also refactors the sideloading a bit since it was a bit messy, and improves the sideload UX a tiny bit.
Improve the UX of the sideload GUI by sorting the results in a sensible fashion, first by whether it's a directory, then by its filename.
The change also changes the timestamp rendering to a more human-readable format than full ISO-8601.
The sideload forms didn't properly set the label 'for' property, meaning that while label tags existed, they weren't appropriately clickable.
Also removed unnecessary limits on the sideload target being a directory for stackexchange and warc. It's been possible to directly load a particular file for a while, but not allowed due to GUI limits.
Look at whether the property 'system.conserveProperty' is enabled when deciding the default pool size for the converter.
If true, a much more conservative default is used, limiting the risk of running out of memory.
Adds experimental support for clustering search results by e.g. domain. At a first stage, this is only enabled for the wiki and forum filters.
The commit also cleans up the UrlDetails class, which contained a number of vestigial entries.
The WARC specification says the records should transparently remove compression. This was not done, leading to the WARC typically being a bit of a gzip-Matryoshka.
Recent changes to the result ranking mean the no filter mode returns sufficiently good results for most queries that filtering by default just makes the search results more restricted.
* (executor-api) Make executor API talk GRPC
The executor's REST API was very fragile and annoying to work with, lacking even basic type safety. Migrate to use GRPC instead. GRPC is a bit of a pain with how verbose it is, but that is probably a lesser evil. This is a fairly straightforward change, but it's also large so a solid round of testing is needed...
The change set breaks out the GrpcStubPool previously residing in the QueryService, and makes it available to all clients.
ServiceId.name was also renamed to avoid the very dangerous clash with Enum.name().
The boilerplate needed for grpc was also extracted into a common gradle file for inclusion into the appropriate build.gradle-files.
!bang query handling seems to have fallen victim to an overzealous refactoring effort, and broke.
It's now repaired, and a test is in place to ensure we know if it breaks again.
The readme for the array library was extremely out of date. Updating it with accurate information about how the library works, and a demo that should compile.
Also added a system property for disabling the use of sun.misc.Unsafe.
Continues 467ba5be20 by breaking out a constant with the name of the primary ranking set. Also ensures it doesn't get spuriously logged as updated during the secondary updating pass.
This change splits the previous 'repartition' action into two steps, one for recalculating the domain rankings, and one for recalculating the other ranking sets. Since only the first is necessary before the index construction, the rest can be delayed until after...
To avoid issues in handling the shotgun blast of MqNotifications, Service was switched over to use a synchronous message queue instead of an asynchronous one.
The change also modifies the behavior so that only node 1 will push the changes to the EC_DOMAIN database table, to avoid unnecessary db locks and contention with the loader.
Additionally, the change fixes a bug where the index construction code wasn't actually picking up the rankings data.
Since the index construction used to be performed by the index-service, merely saving the data to memory was enough for it to be accessible within the index-construction logic, but since it's been broken out into a separate process, the new process just injected an empty DomainRankings object instead.
To fix this, DomainRankings can now be persisted to disk, and a pre-loaded version of the object is injected into the index-construction process.
The codebase used to have a monkey patched version of gson that made special optimizations for the unusually large JSON files that used to store e.g. crawl data.
Since JSON is no longer used in this fashion, the GSON fork is not needed anymore.
To help distinguish between environments, a system property 'control.appBorder' is added that is injected as a body element border property in the control GUI stylesheets.
IndexJournalWriterPagingImpl was modified to not page on number of entries written, but number of (equivalent uncompressed) bytes written.
Since the failure mode if too much data is written per file is quiet corruption of the index, the former behavior was extremely fragile. The new behavior should consistently ensure that the data is sufficiently small to not cause any integer rollovers.
The change in 6dcc20038c was reverted, as there is really no sane reason to have this configurable in software.
The RandomFileAssembler implementations, introduced in commit 53c575db3f were all acting subtly differently. The RWF implementation wrote BigEndian longs instead of the native endianness used by the other implementations (and expected by the index construction code), further the mmap implementation exposed a bug in LongArray.write() that caused it to create a larger file than necessary.
A test was built to ensure the output of these implementations is equivalent.
To cope with writing large files out of order, the system needs some form of strategy to avoid writing them directly to disk, as this causes insane amounts of disk thrashing. By default, the data is just buffered in RAM. This works well on a large server, but smaller systems struggle.
To help systems with small RAM process large amounts of data, the old RandomWriteFunnel is brought back if the system property 'system.conserve-memory' is set to true. RandomWriteFunnel is buffering the construction by creating a series of small files that pigeonhole the writes into rough neighborhoods, and then it goes over the files one by one to construct one area of the file at a time. This is relatively slow and uses more than twice the disk size.
A new interface RandomFileAssembler is introduced as an abstraction for this operation. A third strategy, direct mmaps, is also introduced if the file is very small (less than 1 GB). In this domain, disk thrashing is unlikely since it will comfortably fit in RAM.
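A sketch of the strategy choice; the enum and method names mirror the description above rather than the real API:
```
/** Sketch of the RandomFileAssembler strategy selection described above. */
class FileAssemblerStrategySketch {
    enum Strategy { MMAP, IN_MEMORY, WRITE_FUNNEL }

    static Strategy choose(long expectedSizeBytes) {
        // Small files fit comfortably in RAM, so a direct mmap won't thrash the disk.
        if (expectedSizeBytes < 1L << 30) {
            return Strategy.MMAP;
        }
        // Conserve-memory mode trades speed and temporary disk space for a small RAM footprint.
        if (Boolean.getBoolean("system.conserve-memory")) {
            return Strategy.WRITE_FUNNEL;
        }
        // Default: buffer the out-of-order writes in RAM.
        return Strategy.IN_MEMORY;
    }
}
```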
This update enhances the SideloaderProcessing and DocumentClass modules to specially handle sideloaded wiki documents. Wiki content is generally truncated to the first paragraph, which tends to be too short to be included independently. An additional DocumentClass (SIDELOAD) has been introduced to suppress the length check in this case.
Wrote a new test to examine the redirect behavior of the crawler, ensuring that the redirect URL is the URL that is reported in the parquet file. This works as intended.
Noticed in the course of this that the crawler doesn't add links from meta-tag redirects to the crawl frontier. Added logic to handle this case, amended the test case to verify the new behavior. Added the meta-redirect case to the HtmlDocumentProcessorPlugin as well, so that we consider it a link between documents in the unlikely case that a meta redirect is to another domain.
The flag is `system.languageDetectionModelVersion`.
* If negative, no model is used.
* If 0, both models are used.
* If 1, the old crappy model is used.
* If 2, the new fasttext model is used.
The production configuration assumes all content of interest is 7 bit ASCII, and makes a series of optimizations based on this. This assumption holds poorly in the wild.
Adding an **experimental** system property 'system.noFlattenUnicode', that when set to TRUE, will disable this behavior.
IMPORTANT!! The index needs to be re-constructed when this flag is changed, as different hash functions are selected for the keyword->identifier mappings.
Since the bleed-flags set by the anchor tag logic have been changed to Site and SiteAdjacent, give them a bit more importance when set together with ExternalLink.
UrlDomain and UrlPath are also now more consistently rewarded only once.
This adds a single-node barebones configuration to the install script. It also moves the log4j configuration into system.properties, and sets assertions to disabled by default.
This changeset adds an action for downloading a set of sample data from downloads.marginalia.nu.
It also refactors some leaky abstractions out of FileStorageService. allocateTemporaryStorage has been renamed to allocateStorage. The storage was never temporary in any scenario...
It also doesn't take a storage base, as there was always only one valid option for this input. The allocateStorage method finds the appropriate base itself.
This will avoid having to dig in the message queue to perform this relatively common task.
The control service was also refactored to extract common timestamp formatting logic out of the data objects and into the rendering.
It's a mistake to let it bleed into Title, as this is a high quality signal. We'll co-opt Site and SiteAdjacent instead to reinforce the ExternalLink when count is high.
This avoids concurrent access errors. This is especially important when using Unsafe-based LongArrays, since we have concurrent access to the underlying memory-mapped file. If we pull the rug out from under the caller by closing the file, we'll get a SIGSEGV. Even with a "safe" MemorySegment, we'll get ugly stacktraces if we close the file while a thread is still accessing it.
So we spin up a thread that sleeps for a minute before actually unmapping the file, allowing any ongoing requests to wrap up. This is 100% a hack, but it lets us get away with doing this without adding locks to the index readers.
Since this is "just" mmapped data, and this operation happens optimistically once a month, it should be safe if the call gets lost.
The sideload instruction in the stackexchange template was updated. The instruction now states that stackexchange data will be loaded from a directory on the server and directs users to a new documentation url for more detailed information.
The method `isApplicable` in the `PlainTextDocumentProcessorPlugin` was refactored to handle a wider range of content types beyond merely "text/plain". It now also handles any content type that starts with "text/plain;", to accommodate content types that append a charset as well.
Removed the need to have to run an external tool to pre-process the data in order to load stackexchange-style data into the search engine.
Removed the tool itself.
This stirred up some issues with the dependencies, that were due to both third-party:ing xz and importing it as a dependency. This has been fixed, and :third-party:xz was removed.
java.lang.Error:s were not handled properly, leading to mismatch in the bookkeeping of the FSMs. These are now caught, acted on, and re-thrown.
MqSynchronousInbox also no longer assumes all exceptions are InterruptedException.
This commit extracts several previously hardcoded configuration properties, and makes them available through system.properties.
The documentation is updated to reflect the change.
Dead code was also removed in the process. CrawlSpecGenerator still feels a bit over-engineered, since it was built for a more general case and all implementations but the current one have been removed, but we'll leave it like this for now as it's still fairly readable.
Previously, in order to load encyclopedia data into the search engine, it was necessary to use the encyclopedia.marginalia.nu converter to first create a .db-file. This isn't very ergonomic, so parts of that code-base was lifted in as a 3rd party library, and conversion from .zim to .db is now done automatically.
The output file name is based on the original filename, plus a crc32 hash and a .db-ending, to ensure we can recycle the data on repeat loads.
The change adds a new column to the MESSAGE_QUEUE table called AUDIT_RELATED_ID. This field is populated transparently, using a dictionary mapping Thread IDs to Message IDs, populated by the inbox handlers.
The existing RELATED_ID field has too many semantics associated with it;
among other things the FSM code uses this field in tracking state changes.
The change set also improves the consistency of inbox names. The IndexClient was buggy and populated its outbox with a UUID. This is fixed. All Service2Service outboxes are now prefixed with 'pp:' to make them even easier to differentiate.
It's a confusing default behavior.
This was off for nodes n>1 before as a bandaid since querying indices with no data caused delays and errors. This has been fixed now, so there's no need to do this anymore!
This improves query times, and gets rid of exceptions in the logs when one of the index nodes doesn't have any data loaded, yet is configured to answer queries.
In some scenarios, such as when restoring storage items from json-manifest on db failure, the file storage view would present the items in a non-chronological order. Added a sort() operation to mitigate this.
Use a system.properties file to configure the system. This is loaded statically by MainClass or ProcessMainClass. Update the property names to be more consistent, and update the documentations to reflect the changes.
The EC_DOMAIN_LINK table is deprecated and slated for removal, use QueryClient.getAllDomainLinks() instead.
The ExportDataActor now uses the QueryClient appropriately. The CSV format was also changed to quote the values, to prevent e.g. Excel from interpreting the comma as a decimal separator when previewing the file.
Finally the form for triggering an export was overhauled.
Several tests were manually running migrations in a large copy-paste blob of code. This makes the tests less useful, as it's possible to break the code while keeping the tests green by introducing a new migration that never gets run in the tests, and it also makes it difficult to reason about what the tests are doing.
A new test helper library is introduced with a TestMigrationLoader that can either run all Flyway migrations, or load specific migrations in cases where a particular set of migrations needs to be loaded. Existing tests are migrated to use the new code.
Add a toggle for saving the WARC data generated by the search engine's crawler. Normally this is discarded, but for debugging or archival purposes, retaining it may be of interest.
The warc files are concatenated into larger archives, up to about 1 GB each.
An index is also created containing filenames, domain names, offsets and sizes to help navigate these larger archives.
The warc data is saved in a directory warc/ under the crawl data storage.
This commit overhauls a lot of the UX for the control service, adding a new actions menu to the nodes views. It has many small tweaks to make the work flow better.
It also adds a new /uploads directory in each index node, from which sideloaded data can be selected. This is a bit of a breaking change, as this directory needs to exist in each index node.
The changeset also makes the control service responsible for flyway migrations. This helps reduce the number of places the database configuration needs to be spread out. These automatic migrations can be disabled with -DdisableFlyway=true.
The commit also adds curl to the docker container, to enable docker health checks and interdependencies.
This adds a docker-compose file 'docker-compose-barebones.yml' which will only start the minimal number of services needed to run a whitelabel Marginalia Search-style search engine, with none of the surrounding frills.
The change also adds a minimal search GUI to the query service, which is also available with JSON results if the appropriate Accept header is provided.
This was caused by incorrect usage of the renderInto() function, which was always buggy and should never be used. This method is removed with this change.
The new converter logic assumes that the crawl data is ordered where the domain record comes first, and then a sequence of document records. This is true for the new parquet format, but not for the old zstd/gson format.
To make the new converter compatible with the old format, a specialized reader is introduced that scans for the domain record before running through the sequence of document records, presenting them in the new order.
This is slower than just reading the file beginning to end, so in order to retain performance when this ordering isn't necessary, a CompatibilityLevel flag is added to CrawledDomainReader, permitting the caller to decide how compatible the data needs to be.
Down the line when all the old data is purged, this should be removed, as it amounts to technical debt.
The EC_DOMAIN_LINK MariaDB table stores links between domains. This is problematic, as both updating and querying this table is very slow in relation to how small the data is (~10 GB). This slowness is largely caused by the database enforcing ACID guarantees we don't particularly need.
This changeset replaces the EC_DOMAIN_LINK table with a file in each index node containing 32 bit integer pairs corresponding to links between two domains. This file is loaded in memory in each node, and can be queried via the Query Service.
A migration step is needed before this file is created in each node. Until that happens, the actual data is loaded from the EC_DOMAIN_LINK table, but accessed as though it was a file.
The changeset also migrates/renames the links.db file to documents.db to avoid naming confusion between the two.
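A simplified sketch of such a link file, assuming plain big-endian int pairs and an in-memory list; the actual implementation and class names will differ:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a domain link "table" as a flat file of 32-bit integer pairs
 *  (source domain id, destination domain id), small enough to keep in memory. */
class DomainLinkFile {
    record Link(int source, int dest) {}

    static void write(Path file, List<Link> links) throws IOException {
        try (var out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (Link link : links) {
                out.writeInt(link.source());
                out.writeInt(link.dest());
            }
        }
    }

    static List<Link> readAll(Path file) throws IOException {
        long pairs = Files.size(file) / (2L * Integer.BYTES);
        List<Link> links = new ArrayList<>();
        try (var in = new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
            for (long i = 0; i < pairs; i++) {
                links.add(new Link(in.readInt(), in.readInt()));
            }
        }
        return links;
    }
}
```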
Settings for enabling reproducible builds for all subprojects were added to improve build consistency. This includes preserving file timestamps and ordering files reproducibly.
This is primarily of help for docker, since it uses hashes to determine if a file or image layer has changed.
Added a new constant, MAX_TEXT_LENGTH, to the SentenceExtractor class. If the length of the text input exceeds this limit, the text is truncated to fit within the limit. This modification is designed to prevent excessive resource usage for unusually long text inputs.
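In sketch form, the truncation amounts to something like the following; the actual value of MAX_TEXT_LENGTH in SentenceExtractor is an assumption here:

```java
class SentenceExtractorLimit {
    // The actual value of MAX_TEXT_LENGTH in SentenceExtractor may differ.
    static final int MAX_TEXT_LENGTH = 500_000;

    /** Caps overly long inputs before sentence extraction to bound resource usage. */
    static String capLength(String text) {
        return text.length() > MAX_TEXT_LENGTH
                ? text.substring(0, MAX_TEXT_LENGTH)
                : text;
    }
}
```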
There was a bug where, if the input of ResultValuator.normalize() was negative, it was truncated to zero. This meant that "bad" results always ranked the same. The penalty factor "overallPart" was moved outside of the function and was re-weighted to accomplish a better normalization.
Some of the weights were also re-adjusted based on what appears to produce better results. Needs evaluation.
This seems like it would make the wikipedia search result worse, but it drastically improves the result quality!
This is because wikipedia has a lot of articles that each talk about a lot of irrelevant concepts, and indexing the entire document means tangentially relevant results tend to displace the most relevant results.
Modify ProcessingIterator to be constructed via a factory, to enable re-use of its backing executor service.
This reduces thread churn in the converter sideloader style processing of regular crawl data.
Route the sizeHint from the input parquet file to SideloadProcessing, so that it can set sideloadSizeAdvice appropriately, instead of using a fixed "large" number.
This is necessary to populate the KNOWN_URL column in the domain data table, which is important as it is used in e.g. calculating how far to re-crawl the site in the future.
The URI query string is now URL encoded in the WarcProtocolReconstructor. This change ensures proper encoding of special characters as per the standard URL encoding rules and improves URL validity during the crawling process.
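As a hedged sketch of the idea, each query parameter can be re-encoded before the request line is reconstructed; the real WarcProtocolReconstructor logic may differ:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

class QueryStringEncoder {
    /** Sketch only: re-encodes query parameters so special characters are legal
     *  in the reconstructed request line. */
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```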
Updated ProcessingIterator's queue polling from one second to 50 milliseconds for improved performance. This facilitates faster document processing across more cores, reducing bottlenecks and slow single-threaded processing.
Use ProcessingIterator to fan out processing of documents across more cores, instead of doing all of it in the writer thread blocking everything else with slow single-threaded processing.
This commit adds a safety check that the URL of the document is from the correct domain.
It also adds a sizeHint() method to SerializableCrawlDataStream which *may* provide an indication if the stream is very large and benefits from sideload-style processing (which is slow).
It furthermore addresses a bug where the ProcessedDomain.write() invoked the wrong method on ConverterBatchWriter and only wrote the domain metadata, not the rest...
The processor normally retains the domain data in memory after processing to be able to do additional site-wide analysis. This works well, except there are a number of outlier websites that have an absurd number of documents that can rapidly fill up the heap of the process.
These websites now receive a simplified treatment. This is executed in the converter batch writer thread. This is slower, but the documents will not be persisted in memory.
With the new crawler modifications, the crawl data comes in a slightly different order, and a result of this is that we can optimize the converter. This is a breaking change that will be incompatible with the old style of crawl data, hence it will linger as a branch for a while.
The first step is to move stuff out of the domain processor into the document processor.
Guava's hashers are a bit allocation hungry, and a big driver of GC churn in the crawler. This switches to the modified Murmur hash function used throughout Marginalia.
Modified site info feed template to secure the description field against injected code. Also adjusted search service by extracting samples within the correct scope and including them in the returned site info. This improves the quality and security of the displayed information.
This change integrates the Feedlot RSS Bot with Marginalia's site info view to offer a preview of the latest updates.
The change introduces a new tiny feature that is a feedlot-client based on Java's HttpClient.
This is for filtering results on how many times the term appears on the domain. The intent is to be beneficial in creating e.g. a domain search feature. It's also very helpful when tracking down spammy domains.
A number of crawl jobs get stuck at about 300 documents, or just under. This seems to be because we fail to increase the crawl limit, which is based on MAX(200, 1.25 x GOOD_URLS) with a 1.5x modifier applied upon a recrawl. GOOD_URLS is based on how many documents successfully process, which is typically fairly small. Switching to KNOWN_URLS should let this grow faster.
The SQL query in the DbCrawlSpecProvider class has been updated; 'GOOD_URLS' has been replaced with 'KNOWN_URLS'. This update ensures the correct data is selected from the DOMAIN_METADATA table.
The floor is also increased to 250 from 200.
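In sketch form, the updated crawl budget looks roughly like this; where exactly the 1.5x recrawl modifier is applied is an assumption based on the description above:

```java
class CrawlDepthCalculator {
    /** Sketch of the crawl budget described above: a floor of 250, scaled from
     *  KNOWN_URLS rather than GOOD_URLS, with a 1.5x growth factor on recrawls. */
    static int crawlDepth(int knownUrls, boolean isRecrawl) {
        int depth = Math.max(250, (int) (1.25 * knownUrls));
        if (isRecrawl) {
            depth = (int) (1.5 * depth);
        }
        return depth;
    }
}
```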
Added functionality to remove processes from listing that have not checked in for over a day. A 'removeProcessHeartbeat' function was created to delete the respective entry from the PROCESS_HEARTBEAT table in case heartbeats are absent for more than one day.
This fixes a bug where a prepared statement was created before the table it was supposed to insert into was created. This fails and does nothing.
Furthermore, added the logging that would have warned about this failure, had it been in place.
Since the sideloaders don't populate the documents list in ProcessedDomain to keep the memory footprint manageable, the code that estimates knownUrls etc. will set them to zero, which has negative effects on their ranking. This change will populate them with a bullshit value within a sane ballpark, ensuring that these domains show up in the rankings.
Make some temporary modifications to the CrawledDocument model to support both a "big string" style headers field like in the old formats, and explicit fields as in the new formats. This is a bit awkward to deal with, but it's a necessity until we migrate off the old formats entirely.
The commit also adds a few tests to this logic.
The size of the ArrayBlockingQueue in ConverterWriter.java has been reduced from 4 to 1. This change aims to reduce the memory utilization by not having fully processed domains piling up in RAM. This may cause the writer to go idle in waiting for new data, but that may be preferable to an OOM.
Initialization parameters in DomainLoaderService and DomainIdRegistry have been updated to improve performance. This is done by adding sane default sizes to the hash tables involved, reducing GC churn, but also by setting a sensible fetch size to the queries used, and not fetching irrelevant information such as the domain name.
We do both ip2location and ASN data.
The change also adds some keywords based on autonomous system information, on a somewhat experimental basis. It would be neat to be able to e.g. exclude cloud services or just e.g. cloudflare from the search results.
This variable had a very confusing name, and was dangerously easy to use in the wrong place with the result of getting something that only works as expected half the time.
Ideally this class needs an overhaul, the assumptions it makes about domain names aren't great.
This had the knock-on effect of breaking the anchor tag loading in the processor for a lot of domains, since they'd look up anchor tag data for the wrong domain name.
In encyclopedia, add a class "mw-content-text" that the WikiSpecialization class is looking for during pruning to give the articles a more fair treatment.
Also add generator keywords based on the generator type provided, to ensure that these documents show up in appropriate filters.
Further, add a new document flag value 'Sideloaded' to be able to distinguish these entries.
This is caused by a resource contention with the query code. The proper way to fix this is to use some form of synchronization, but that will slow the code down. So we just hammer it a few times and let the GC deal with the problem if it fails. Not optimal, but fast.
The code now intercepts and deals with potential exceptions during the parsing of search parameters. This is in response to constant bad requests from bots which were cluttering the logs. A catch clause is added that suppresses these errors and redirects to the base URL.
The site info view can't blindly assume that every website supports https. To figure out which schema to use when linking to a site, execute a single-result search for site:domain.name and then grab the schema off the result.
To allow this, a count parameter is introduced to doSiteSearch() in SearchOperator.
There really is no fantastic place to put this logic, but we need to remove entries with an X-Robots-Tags header where that header indicates it doesn't want to be crawled by Marginalia.
We want to mute some of these records so that they don't produce documents, but in some cases we want a document to be produced for accounting purposes.
Added improved tests that reach for known resources on www.marginalia.nu to test the behavior when encountering bad content type and 404s.
The commit also adds some safety try-catch:es around the charset handling, as it may sometimes explode when fed incorrect data, and we do be guessing...
This commit updates CrawlingThenConvertingIntegrationTest with additional tests for invalid, redirecting, and blocked domains. Improvements have also been made to filter out irrelevant entries in ParquetSerializableCrawlDataStream.
This update includes the addition of timestamps to the parquet format for crawl data, as extracted from the Warc stream.
The parquet format stores the timestamp as a 64 bit long, seconds since unix epoch, without a logical type. This is to avoid having to do format conversions when writing and reading the data.
This parquet field populates the timestamp field in CrawledDocument.
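Since the column is a plain 64-bit long holding epoch seconds, conversion to and from java.time is trivial; a small sketch:

```java
import java.time.Instant;

class CrawlTimestamp {
    /** The parquet column stores plain epoch seconds in a 64-bit long with no
     *  logical type, so no format conversion is needed on read or write. */
    static long toColumnValue(Instant fetchTime) {
        return fetchTime.getEpochSecond();
    }

    static Instant fromColumnValue(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds);
    }
}
```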
Add an optional new field to CrawledDocument containing information about whether the domain has cookies. This was previously on the CrawledDomain object, but since the WarcFormat requires us to write a WarcInfo object at the start of a crawl rather than at the end, this information is unobtainable when creating the CrawledDomain object.
Also fix a bug in the deduplication logic in the DomainProcessor class that caused a test to break.
This information is then propagated to the parquet file as a boolean.
For documents that are copied from the reference, use whatever value we last saw. This isn't 100% deterministic and may result in false negatives, but permits websites that used cookies but have stopped to repent and have the change reflected in the search engine more quickly.
This commit includes mostly exception handling, error propagation, a few bug fixes and minor changes to log formatting. The CrawlDelayTimer, HTTP 429 responses and IOException responses are now more accurately handled.
A non-standard WarcXEntityRefused WARC record has also been introduced, essentially acting as a rejected 'response' with different semantics.
Besides these, several existing features have been refined, such as URL encoding, crawl depth incrementing and usage of Content-Length headers.
This commit further cleans up the warc->parquet conversion. It fixes issues with redirect handling in WarcRecorder, and adds information about redirects and errors due to probe failure.
It also refactors the fetch result, body extraction and content type abstractions.
This commit cleans up the warc->parquet conversion. Records with a http status other than 200 are now included.
The commit also fixes a bug where the robots.txt parser would be fed the full HTTP response (and choke), instead of the body.
The DocumentBodyExtractor code has also been cleaned up, and now offers a way of just getting the byte[] representation for later processing, as conversion to and from strings is a bit wasteful.
This is not hooked into anything yet. The change also makes modifications to the parquet-floor library to support reading and writing of byte[] arrays. This is desirable since we may in the future want to support inputs that are not text-based, and codifying the assumption that each document is a string will definitely cause us grief down the line.
This finally resolves issues with gradle confusing IntelliJ by complaining about java incompatibilities (that were never a problem), which caused it to not report test errors correctly.
This commit is in a pretty rough state. It refactors the crawler fairly significantly to offer better separation of concerns. It replaces the zstd compressed json files used to store crawl data with WARC files entirely, and the converter is modified to be able to consume this data. This works, -ish.
There appears to be some bug relating to reading robots.txt, and the X-Robots-Tag header is no longer processed either.
A problem is that the WARC files are a bit too large. It will likely be necessary to introduce a new format to store the crawl data long term, something like parquet; and use WARCs for intermediate storage to enable the crawler to be restarted without needing a recrawl.
At this stage, the crawler will use the WARCs to resume a crawl if it terminates incorrectly.
This is a WIP commit, since the warc files are not fully incorporated into the work flow, they are deleted after the domain is crawled.
The commit also includes fairly invasive refactoring of the crawler classes, to accomplish better separation of concerns.
This is the same as the prefix for the IP address, but I don't think that substantially matters, as the two have such different namespaces there can be no confusion.
In this commit, GeoIP-related classes are refactored and relocated to a common library as they are shared across multiple services.
The crawler is refactored to enable the GeoIpBlocklist to use the new GeoIpDictionary as the base of its decisions.
The converter is modified to query this data to add a geoip:-keyword to documents to permit limiting a search to the country of the hosting server.
The commit also adds due BY-SA attribution in the search engine footer for the source of the IP geolocation data.
This commit also fixes a bug in the loader where the IP field wouldn't always populate as intended, and refactors the DomainInformationService to use significantly fewer SQL queries.
The previous version used a personalized pagerank centering on a few academic domains, but this didn't work very well and most results were not very academia-centric.
This reduces the impact of restarting the search service, as the site information takes a few minutes to load during which it's not available. It also permits exposing this information via API in the future if there is interest in this.
The assistant service was also modified to do a late load of the suggestions trie, as this is a major contributor to its start-up time.
Finally, some changes were made to the client library, a new get() method was added that takes a TypeToken to allow deserialization of generics such as List<Foo>, and the scheduler was also modified to use virtual threads.
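A minimal sketch of a TypeToken-taking get(), assuming Gson for deserialization; the actual client's transport and error handling are omitted:

```java
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.util.List;

/** Sketch only: a get() that takes a TypeToken so generic payloads such as
 *  List<Foo> survive deserialization; HTTP plumbing is left out. */
class JsonClient {
    private final Gson gson = new Gson();

    <T> T get(String json, TypeToken<T> type) {
        return gson.fromJson(json, type.getType());
    }
}

// Usage: List<String> items = client.get(body, new TypeToken<List<String>>() {});
```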
Refactored the getDocumentsStream method in EncyclopediaMarginaliaNuSideloader to use the newly extracted ProcessingIterator class that encapsulates processing a stream of results from e.g. a database query in parallel and returning the computed results as an iterator.
The iterator was also improved on to be more reliable, previous versions of the logic would sometimes deadlock due to false positives in hasMore().
The commit updates EncyclopediaMarginaliaNuSideloader to include the AnchorTextKeywords in processing documents, aiding search result relevance.
It also removes old test-related functionality and a large but fairly useless test previously used to debug a specific problem, to the detriment of the overall code quality.
A race condition was found where precession actors would sometimes skip a step, because when invoking ExecutorRemoteActor.getState(), it would get the last 'OK' actor state from a previous run of the actor!
To avoid this, the trigger method was changed from returning a boolean to the message ID, negative if an error occurred, to be passed to getState to select only messages that pertain to the present or future runs.
The converter was not properly initiating the external links for each domain, causing an NPE in conversion. This needs to be loaded later since we don't know the domain we're processing until we've seen it in the crawl data.
Also made some refactorings to make finding converter bugs easier, and finding the related domain less awkward from the SerializableCrawlData interface.
The code now includes an additional function in the DomainProcessor class that checks if a domain is associated with academia. An academic domain is identified by the ".edu" TLD, or fits a specific regex pattern matching domains like *.ac.ccTld or *.edu.ccTld.
If these conditions are met, the search term "special:academia" is added to the domain.
The existing academia search filter uses personalized pagerank to select academia-adjacent domains, but it isn't working very well. The hope is that filtering on domain names will be more effective, and that it can supplant the ranking-based approach.
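A sketch of the domain-name test described above; the exact pattern in DomainProcessor may differ:

```java
import java.util.regex.Pattern;

class AcademiaDomainCheck {
    // Captures the described cases: anything.edu, anything.ac.<ccTld>, anything.edu.<ccTld>
    private static final Pattern ACADEMIA =
            Pattern.compile(".*\\.(edu|ac\\.[a-z]{2}|edu\\.[a-z]{2})$");

    static boolean isAcademia(String domainName) {
        return ACADEMIA.matcher(domainName.toLowerCase()).matches();
    }
}
// If isAcademia(...) holds, the keyword "special:academia" is attached to the domain.
```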
Partially hook in the WarcRecorder into the crawler process. So far it's not read, but should record the crawled documents.
The WarcRecorder and HttpFetcher classes were also refactored and broken apart to be easier to reason about.
This is a first step of using WARC as an intermediate flight recorder style step in the crawler, ultimately aimed at being able to resume crawls if the crawler is restarted. This component is currently not hooked into anything.
The OkHttp3 client wrapper class 'WarcRecordingFetcherClient' was implemented for web archiving. This allows for the recording of HTTP requests and responses. New classes were introduced, 'WarcDigestBuilder', 'IpInterceptingNetworkInterceptor', and 'WarcProtocolReconstructor'.
The JWarc dependency was added to the build.gradle file, and relevant unit tests were also introduced. Some HttpFetcher-adjacent structural changes were also done for better organization.
This functionality needs to be accessed by the WarcSideloader, which is in the converter. The resultant microlibrary is tiny, but I think in this case it's justifiable.
This update includes the integration of the jwarc library and implements support for Warc file sideloading, as a first trial integration with this library.
In the future this logic probably needs to move into a separate service, as it's still quite slow to load. But this fixes the response times and DoS potential of the previous version.
If a process is violently terminated, the associated file storage may get stuck in the ephemeral 'NEW' state, preventing future operations on the associated data.
To remedy this without having to dig through the database, a button was added to reset the state. It's a band-aid, but the situation is rare enough that I think it's fine.
The repartition endpoint was mis-addressing its mqapi notifications, omitting the proper nodeId. In fixing this, it became apparent that having both @MqRequest and @MqNotification is a serious footgun, and the two should be unified into a single API where the caller isn't burdened with knowledge of the remote end's implementation specifics.
Wrapping these exceptions in a try-catch and logging them with slf4j will ensure they end up in the process logs.
The way it worked using the default exception handler, they'd print on console (which nothing captures!), leading to a very annoying debugging experience.
Tricky problem, creating a procedure apparently needs delimiter shenanigans in Flyway, otherwise it will truncate the END statement and mariadb will be sad.
This is to enable running an external repository for production and test.
Use `./gradlew -Pdocker-registry=registry.foo.bar -Pdocker-tag=my-tag` while building to accomplish this. By default, 'marginalia' is used for the repository and 'latest' as the tag.
This behavior is an old vestige from the days of only having a single loader process. We'd truncate the links table because doing inserts/updates was too slow. This was also important because we had 32 bit IDs, and there's a lot of links between domains to go around...
Instead we delete the rows associated with the current node with a stored procedure PURGE_LINKS_TABLE.
We also update the PRIMARY KEY to a BIGINT. We'd need to load the data in excess of a billion times to hit an ID rollover, so it'll be fine.
Support for anchor tag keywords
* Added new (optional) model file in $WMSA_HOME/data/atags.parquet. Due to size limitations on github, this is available at https://downloads.marginalia.nu/exports
* Converter gets a component for creating a projection of its domains onto the full atags parquet file
* New WordFlag ExternalLink
* These terms are also for now flagged as title words
* The ranking algorithm was tweaked to make better use of ngram information as well as weighting the priority BM25
* Fixed a bug where Title words aliased with UrlDomain words
* Fixed a bug in the encyclopedia sideloader that gave everything too high topology ranking
* Crawler will also use the anchor tag file to prioritize crawling documents with external links.
Don't log the PROCESS stream to the executor's logs, as it will also be logged in the spawned process' log files.
Also tell the spawned process which "service" it is so that it gets a log file with a name that makes sense.
This was caused by a bug in the binary search algorithm causing it to sometimes return positive values when encoding a search miss.
It was also necessary to get rid of the vestiges of the old LongArray and IntArray classes to make this fix doable.
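For reference, a sketch of the conventional miss encoding (as used by java.util.Arrays.binarySearch), where a miss is always returned as a strictly negative value that can never collide with a valid index:

```java
class LongBinarySearch {
    /** Returns the index on a hit, and -(insertionPoint + 1) on a miss. */
    static long binarySearch(long[] data, long key) {
        long low = 0, high = data.length - 1;
        while (low <= high) {
            long mid = (low + high) >>> 1;
            long value = data[(int) mid];
            if (value < key) low = mid + 1;
            else if (value > key) high = mid - 1;
            else return mid;   // hit
        }
        return -(low + 1);     // miss: strictly negative
    }
}
```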
* Encyclopedia sideloader; permit providing base URL.
* Storage base shows node id in GUI
* ProcessLivenessMonitorActor restarts automatically
* Clean-up of outbox code
This turned out to be very difficult to do in small isolated steps.
* Design overhaul of the control gui using bootstrap
* Move the actors out of control-service into a new executor-service, that can be run on multiple nodes
* Add node-affinity to message queue
This makes the index complete in the sense that you can deploy an index instance and build a complete separate application on top of it, without having to go through the Marginalia-laden search service.
* (index-reverse) Parallel construction of the reverse indexes.
* (array) Remove wasteful calculation of numDistinct before merging two sorted arrays.
* (index-reverse) Force changes to disk on close, reduce logging.
* (index-reverse) Clean up merging process and add back logging
* (run) Add a conservative default for INDEX_CONSTRUCTION_PROCESS_OPTS's parallelism as it eats a lot of RAM
* (index-reverse) Better logging during processing
* (array) 2GB+ compatible write() function
* (index-reverse) We are logging like Bolsonaro and I will not have it.
* (reverse-index) Self-diagnostics
* (btree) Fix bug in btree reader to do with large data sizes
Further de-ByteBuffer:ing of these classes is to be done, but this is the smallest most urgently needed benefit.
This commit is a WIP but in a fully working state, pushing due to the importance of the changes to offer lifecycle control over mmaps.
... also move some common configuration into the root build.gradle-file.
Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work. This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.
This is a big performance boost in array.range().get().
Without an override, each access will go through pages[page].get(...) for each get()-operation. This adds up very quickly. BTreeReader does a bunch of get():s on a range()'d array during traversal in the queryData... methods.
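To illustrate the principle with a simplified, assumed model (plain long[][] pages rather than the actual off-heap arrays), a range view can pin its page once instead of resolving it on every get():

```java
/** Sketch (names assumed): the generic get() resolves pages[page] on every call,
 *  while a range pinned to a single page can skip that lookup. */
class PagedLongArray {
    private final long[][] pages;
    private final int pageShift;

    PagedLongArray(long[][] pages, int pageShift) {
        this.pages = pages;
        this.pageShift = pageShift;
    }

    long get(long index) {
        // generic path: page lookup on every single access
        return pages[(int) (index >>> pageShift)][(int) (index & ((1L << pageShift) - 1))];
    }

    /** View over a span assumed to lie within a single page. */
    Range range(long start) {
        long[] page = pages[(int) (start >>> pageShift)];
        int offset = (int) (start & ((1L << pageShift) - 1));
        return new Range(page, offset);
    }

    record Range(long[] page, int offset) {
        long get(int i) {
            return page[offset + i]; // no repeated page resolution
        }
    }
}
```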
* Reduce memory churn in LoaderIndexJournalWriter, fix bug with keyword mappings as well
* Remove remains of OldDomains
* Ensure LOADER_PROCESS_OPTS gets fed to the processes
* LinkdbStatusWriter won't execute batch after each added item post 100 items
This is a system-wide change. The index used to have a lexicon, mapping words to wordIds using a large in-memory hash table. This made index-construction easier, but it also added a fairly significant RAM penalty to both the index service and the loader.
The new design moves to 64 bit word identifiers calculated using the murmur hash of the keyword, and an index construction based on merging smaller indices.
It also became necessary half-way through to upgrade guice as its error reporting wasn't *quite* compatible with JDK20.
This provides a much cleaner separation of concerns, and makes it possible to get rid of a lot of the gunkier parts of the index service. It will also permit lowering the Xmx on the index service a fair bit, so we can get CompressedOOps again :D
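In sketch form, the lexicon-free word id is just a 64-bit hash of the keyword. Marginalia uses its own modified Murmur implementation; Guava's murmur3 stands in here purely for illustration:

```java
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

class WordIds {
    /** Illustrative only: the keyword string deterministically maps to a 64-bit
     *  identifier, so no shared in-memory lexicon is required. */
    static long wordId(String keyword) {
        return Hashing.murmur3_128()
                .hashString(keyword, StandardCharsets.UTF_8)
                .asLong();
    }
}
```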
Deprecate the LoadUrl instruction entirely. We no longer need to be told upfront about which URLs to expect, as IDs are generated from the domain id and document ordinal.
For now, we no longer store new URLs in different domains. We need to re-implement this somehow, probably in a different job or as a different output.
This is a QOL improvement for mobile users, who otherwise would have to scroll all the way up to refresh.
Also removed the confusing "this is a random set of domains"-message when viewing adjacent websites, as it's not random.
Super weird encoding bug that only arises on versions below jdk18 causing crawl data to be read incorrectly.
Seems possibly related to the new standard charset of UTF-8. Maybe some library (unknown which) is attempting to be backwards compatible in a way that totally breaks?
This is mostly a pilot track for sideloading other large websites.
Also change the converter to produce a more compact output (java serialization instead of json).
Also clean up the specializations logic a bit, and add a barebones specialization for phpbb that cleans out paths we aren't interested in but doesn't touch pruning or summarizing logic for now.
* Break apart CrawlerRetreiver
* Break apart HttpFetcher into an interface and impl for testing sanity
* Add special logic for Lemmy, Mediawiki and Discourse to not waste requests on paths that aren't interesting.
* Add response caching to the API service to help SearXNG
* Clean up the code a bit.
* Add an endpoint without a terminating slash for getLicense.
* Add tests for API service.
* Utilities for merging BTrees of entity size 1 and 2.
* Isolate and clean up sorting algorithms.
* Functions for keeping distinct items in a LongArray
... where some terms may previously have been ignored. The latter bug was due to the handling of QueryHeads with AnyOf-style predicates interacting poorly with alreadyConsideredTerms in SearchIndex.java
* Increase accuracy of the position bits.
* Increase their width to 56.
* Use a rolling position scheme for bits 16-56 to increase the average accuracy.
* Result ranking overhaul
* Optimized queries
* BM25 in the index service's ranking
* Make gui less jank
* Javadocs for ranking parameters.
* A deduplication filter step ran too early, and removed many good results on the basis that they partially, but not fully, fit another set of search terms.
* Altered the query creation process to prefer documents where multiple terms appear in the priority index.
Not everyone shows up in the git commit history, doesn't mean they didn't contribute valuable changes.
In such circumstances, their deeds will be recorded here.
* [@samstorment](https://www.github.com/samstorment) provided a design overhaul for [https://explore.marginalia.nu/](https://explore.marginalia.nu/) in [10cad3](https://github.com/MarginaliaSearch/MarginaliaSearch/commit/10cad3abb29b8a87bf5fd56afbc192335e3e94d7)
via [issue #44](https://github.com/MarginaliaSearch/MarginaliaSearch/issues/44).
* [@dreimolo](https://github.com/dreimolo) provided build script [fixes for apple silicon](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/64)
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Note that packages under [third-party/](third-party/) have different licenses, and the code in [code/libraries/](code/libraries/) is dual-licensed under MIT.
d="m 261.75384,-85.665085 -13.08512,15.97528 h 13.498 v 3.4936 H 243.206 v -2.76312 l 13.08512,-15.97528 h -12.8628 v -3.4936 h 18.32552 z"
id="path12603"/><path
d="m 278.84063,-75.787725 v 6.12968 h 12.5452 v 3.46184 h -16.674 v -22.232 h 16.22936 v 3.46184 h -12.10056 v 5.78032 h 10.73488 v 3.39832 z"
id="path12605"/><path
d="m 323.74919,-66.196205 h -4.4464 l -4.54168,-6.5108 q -0.28584,0.03176 -0.85752,0.03176 h -5.01808 v 6.47904 h -4.1288 v -22.232 h 9.14688 q 2.89016,0 5.01808,0.9528 2.15968,0.9528 3.30304,2.73136 1.14336,1.77856 1.14336,4.22408 0,2.50904 -1.23864,4.31936 -1.20688,1.81032 -3.4936,2.6996 z m -4.54168,-14.32376 q 0,-2.12792 -1.39744,-3.27128 -1.39744,-1.14336 -4.09704,-1.14336 h -4.82752 v 8.86104 h 4.82752 q 2.6996,0 4.09704,-1.14336 1.39744,-1.17512 1.39744,-3.30304 z"
d="m 245.81989,-41.935548 v 3.861737 h 7.90356 v 2.180981 h -10.50473 v -14.0063 h 10.2246 v 2.180981 h -7.62343 v 3.641638 h 6.76304 v 2.140963 z"
id="path12612"/><path
d="m 270.04847,-40.414864 v -9.484266 h 2.58116 v 14.0063 h -2.14096 l -7.72347,-9.484266 v 9.484266 h -2.58117 v -14.0063 h 2.14097 z"
id="path12614"/><path
d="m 285.39308,-35.89283 h -2.60117 v -11.80531 h -4.64209 v -2.20099 h 11.88535 v 2.20099 h -4.64209 z"
id="path12616"/><path
d="m 307.52074,-35.89283 h -2.80126 l -2.86129,-4.101845 q -0.18008,0.02001 -0.54024,0.02001 h -3.16142 v 4.081836 h -2.60117 v -14.0063 h 5.76259 q 1.82082,0 3.16142,0.60027 1.36061,0.60027 2.08094,1.720774 0.72032,1.120504 0.72032,2.661197 0,1.580711 -0.78035,2.721224 -0.76034,1.140513 -2.20099,1.700765 z m -2.86129,-9.024059 q 0,-1.340603 -0.88039,-2.060927 -0.8804,-0.720324 -2.58116,-0.720324 h -3.04137 v 5.582511 h 3.04137 q 1.70076,0 2.58116,-0.720324 0.88039,-0.740333 0.88039,-2.080936 z"
id="path12618"/><path
d="m 319.76395,-35.69274 q -2.90131,0 -4.52204,-1.620729 -1.62073,-1.640738 -1.62073,-4.682106 v -7.903555 h 2.60117 v 7.80351 q 0,4.121854 3.5616,4.121854 3.5416,0 3.5416,-4.121854 v -7.80351 h 2.56115 v 7.903555 q 0,3.041368 -1.62073,4.682106 -1.60072,1.620729 -4.50202,1.620729 z"
This is the source code for marginalia.nu, including the [search engine](https://search.marginalia.nu),
the [MEMEX/gemini server](https://memex.marginalia.nu), and the [encyclopedia service](https://encyclopedia.marginalia.nu).
This is the source code for [Marginalia Search](https://search.marginalia.nu).
The aim of the project is to develop new and alternative discovery methods for the Internet.
It's an experimental workshop as much as it is a public service; the overarching goal is to
elevate the more human, non-commercial sides of the Internet. A side-goal is to do this without
requiring datacenters and expensive enterprise hardware, to run this operation on affordable hardware.
The canonical git server for this project is [https://git.marginalia.nu](https://git.marginalia.nu).
It is fine to mirror it on other hosts, but if you have issues or questions
git.marginalia.nu is where you want to go.
A side-goal is to do this without requiring datacenters and enterprise hardware budgets,
to be able to run this operation on affordable hardware with minimal operational overhead.
## Important note about wmsa.local
The long term plan is to refine the search engine so that it provides enough public value
that the project can be funded through grants, donations and commercial API licenses
(non-commercial share-alike is always free).
This project has a [sister repository called wmsa.local](https://git.marginalia.nu/marginalia/wmsa.local)
that contains scripts and configuration files for running and developing the code.
The system can both be run as a copy of Marginalia Search, or as a white-label search engine
for your own data (either crawled or side-loaded). At present the logic isn't very configurable, and a lot of the judgements
made are based on the Marginalia project's goals, but additional configurability is being
worked on!
Without it, development is very unpleasant.
Here's a demo of the set-up and operation of the self-hostable barebones mode of the search engine: [🌎 https://www.youtube.com/watch?v=PNwMkenQQ24](https://www.youtube.com/watch?v=PNwMkenQQ24)
While developing the code, you will want an environment variable WMSA_HOME pointing to
the directory in which wmsa.local is checked out, otherwise the code will not run and
several tests will fail.
## Set up
## Documentation
To set up a local test environment, follow the instructions in [📄 run/readme.md](run/readme.md)!
Documentation is a work in progress. See the [wiki](https://git.marginalia.nu/marginalia/marginalia.nu/wiki).
Further documentation is available at [🌎 https://docs.marginalia.nu/](https://docs.marginalia.nu/).
## Contributing
Before compiling, it's necessary to run [⚙️ run/setup.sh](run/setup.sh).
This will download supplementary model data that is necessary to run the code.
These are also necessary to run the tests.
[CONTRIBUTING.md](CONTRIBUTING.md)
If you wish to hack on the code, check out [📄 doc/ide-configuration.md](doc/ide-configuration.md).
## Supporting
## Hardware Requirements
Consider [supporting this project](https://memex.marginalia.nu/projects/edge/supporting.gmi).
A production-like environment requires a lot of RAM and ideally enterprise SSDs for
the index, as well as some additional terabytes of slower hard drives for storing crawl
data. It can be made to run on smaller hardware by limiting the size of the index.
The system will definitely run on a 32 GB machine, possibly smaller, but at that size it may not perform
very well as it relies on disk caching to be fast.
A local developer's deployment is possible with much smaller hardware (and index size).
## Project Structure
[📁 code/](code/) - The Source Code. See [📄 code/readme.md](code/readme.md) for a further breakdown of the structure and architecture.
[📁 run/](run/) - Scripts and files used to run the search engine locally
[📁 third-party/](third-party/) - Third party code
[📁 doc/](doc/) - Supplementary documentation
[📄 CONTRIBUTING.md](CONTRIBUTING.md) - How to contribute
[📄 LICENSE.md](LICENSE.md) - License terms
## Contact
You can email <kontakt@marginalia.nu> with any questions or feedback.
## License
The bulk of the project is available with AGPL 3.0, with exceptions. Some parts are co-licensed under MIT,
third party code may have different licenses. See the appropriate readme.md / license.md.
## Versioning
The project uses modified Calendar Versioning, where the first two numbers are the year and month coinciding
with the latest crawling operation, and the third number is a patch number.
```
      version
      --
yy.mm.VV
-----
crawl
```
For example, `23.03.02` is a release with crawl data from March 2023 (released in May 2023).
It is the second patch for the 23.03 release.
Versions with the same year and month are compatible with each other, or offer an upgrade path where the same
data set can be used, but across different crawl sets data format changes may be introduced, and you're generally
expected to re-crawl the data from scratch, as crawler data has a shelf life approximately as long as the major release
cycles of this project. After about 2-3 months it gets noticeably stale with many dead links.
For development purposes, crawling is discouraged and sample data is available. See [📄 run/readme.md](run/readme.md)
for more information.
## Funding
### Donations
Consider [donating to the project](https://www.marginalia.nu/marginalia-search/supporting/).
### Grants
This project was funded through the [NGI0 Entrust Fund](https://nlnet.nl/entrust), a fund established by [NLnet](https://nlnet.nl) with financial support from the European Commission's [Next Generation Internet](https://ngi.eu/) programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101069594.
This is a roadmap with major features planned for Marginalia Search.
It's not set in any particular order and other features will definitely
be implemented as well.
Major goals:
* Reach 1 billion pages indexed
* Improve technical ability of indexing and search. ~~Although this area has improved a bit, the
search engine is still not very good at dealing with longer queries.~~ (As of PR [#129](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/129), this has improved significantly. There is still more work to be done.)
## Hybridize crawler w/ Common Crawl data
Sometimes Marginalia's relatively obscure crawler is blocked when attempting to crawl a website, or for
other technical reasons it may be prevented from doing so. A possible work-around is to hybridize the
crawler so that it attempts to fetch such inaccessible websites from common crawl. This is an important
step on the road to 1 billion pages indexed.
As a rough sketch, the crawler would identify target websites, consume CC's index, and then fetch the WARC data
with byte range queries.
Retaining the ability to independently crawl the web is still strongly desirable so going full CC is not an option.
## Safe Search
The search engine has a bit of a problem showing spicy content mixed in with the results. It would be desirable to have a way to filter this out. It's likely something like a URL blacklist (e.g. [UT1](https://dsi.ut-capitole.fr/blacklists/index_en.php))
combined with a naive Bayesian filter would go a long way, or something more sophisticated...?
## Additional Language Support
It would be desirable if the search engine supported more languages than English. This is partially about
rooting out assumptions regarding character encoding, but there's most likely some amount of custom logic
associated with each language added, at least a models file or two, as well as some fine tuning.
It would be very helpful to find a speaker of a large language other than English to help in the fine tuning.
## Support for binary formats like PDF
The crawler needs to be modified to retain them, and the conversion logic needs to parse them.
The documents database probably should have some sort of flag indicating it's a PDF as well.
PDF parsing is known to be a bit of a security liability so some thought needs to be put in
that direction as well.
## Custom ranking logic
Stract does an interesting thing where they have configurable search filters.
This looks like a good idea that wouldn't just help clean up the search filters on the main
website, but might be cheap enough that we could go as far as offering a number of ad-hoc custom search
filters for any API consumer.
I've talked to the stract dev and he does not think it's a good idea to mimic their optics language, which is quite ad-hoc, but instead to work together to find some new common description language for this.
## Show favicons next to search results
This is expected from search engines. Basic proof of concept sketch of fetching this data has been done, but the feature is some way from being reality.
## Specialized crawler for github
One of the search engine's biggest limitations right now is that it does not index github at all. A specialized crawler that fetches at least the readme.md would go a long way toward providing search capabilities in this domain.
# Completed
## Web Design Overhaul (COMPLETED 2025-01)
The design is kinda clunky and hard to maintain, and needlessly outdated-looking.
## Finalize RSS support (COMPLETED)
Marginalia has experimental RSS preview support for a few domains. This works well and
it should be extended to all domains. It would also be interesting to offer search of the
RSS data itself, or use the RSS set to feed a special live index that updates faster than the
main dataset.
Completed with PR [#122](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/122) and PR [#125](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/125)
## Proper Position Index (COMPLETED 2024-09)
The search engine uses a fixed width bit mask to indicate word positions. It has the benefit
of being very fast to evaluate and works well for what it is, but is inaccurate and has the
drawback of making support for quoted search terms inaccurate and largely reliant on indexing
word n-grams known beforehand. This limits the ability to interpret longer queries.
The positions mask should be supplemented or replaced with a more accurate (e.g. gamma coded) positions
list, as is the civilized way of doing this.
Completed with PR [#99](https://github.com/MarginaliaSearch/MarginaliaSearch/pull/99)
INSERT IGNORE INTO CONF_DOMAIN_RANKING_SET(NAME, DESCRIPTION, ALGORITHM, DEPTH, DEFINITION) VALUES ('NONE', 'Reserved: No Ranking Algorithm', 'SPECIAL', 50000, '');
**TODO**: This module should probably be renamed and moved into some other package.
## See Also
The database is constructed by the [loading-process](../../processes/loading-process), and consumed by the [index-service](../../services-core/index-service).