feat: properly include transit data

This commit is contained in:
Jan-Henrik 2026-03-04 11:41:42 +01:00
parent 78240b77aa
commit bfe6146645
22 changed files with 1476 additions and 127 deletions

README.md
View file

@ -16,21 +16,27 @@ Next.js App Server
└── Valkey (API response cache, BullMQ queues)
BullMQ Worker (pipeline queue, concurrency 8)
├── refresh-city → orchestrates full ingest via FlowProducer
├── download-pbf → streams OSM PBF from Geofabrik
├── extract-pois → osmium filter + osm2pgsql flex → raw_pois
├── build-valhalla → clips PBF, builds Valhalla routing tiles + transit tiles
├── download-gtfs-de → downloads & extracts GTFS feed for German ÖPNV
├── generate-grid → PostGIS 200 m hex grid → grid_points
├── compute-scores → two-phase orchestrator (see Scoring below)
├── compute-routing → Valhalla matrix → grid_poi_details
│ (15 parallel jobs: 3 modes × 5 categories)
└── compute-transit → Valhalla isochrones → grid_poi_details (travel_mode='transit')
(1 job per city, covers all categories via PostGIS spatial join)
BullMQ Worker (valhalla queue, concurrency 1)
└── build-valhalla → valhalla_ingest_transit + valhalla_convert_transit (GTFS → tiles),
valhalla_build_tiles (road graph + transit connection),
manages valhalla_service
Valhalla (child process of valhalla worker)
├── sources_to_targets matrix → compute-routing jobs (walking/cycling/driving)
├── isochrone (multimodal) → compute-transit jobs
└── isochrone endpoint → user click → /api/isochrones
Protomaps → self-hosted map tiles (PMTiles)
```
@ -105,52 +111,97 @@ docker compose up postgres valkey -d
### Data pipeline
For each city the pipeline runs in two phases:
**Phase 1 — Routing** (parallel child jobs)
*Walking, cycling, driving — 15 jobs (3 modes × 5 categories):*
A PostGIS KNN lateral join (`<->` operator) finds the 6 nearest POIs in the category for each grid point (200 m hexagonal spacing). Those POI coordinates are sent in batches of 20 to Valhalla's `sources_to_targets` matrix API to obtain exact real-network travel times. The nearest POI per subcategory is persisted to `grid_poi_details`.
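The batching itself is plain array slicing; a minimal TypeScript sketch (the helper name is hypothetical, not taken from this repo):

```typescript
// Split an array into fixed-size batches, as done when sending POI
// coordinates to Valhalla's sources_to_targets matrix API (batch size 20).
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```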
*Transit — 1 job per city (`compute-transit`):*
Valhalla's matrix API does not support transit. Instead, for each grid point a multimodal isochrone is fetched from Valhalla at contour intervals of 5, 10, 15, 20, and 30 minutes (fixed departure: Tuesday 08:00 to ensure reproducible GTFS results). PostGIS `ST_Within` then classifies all POIs in the city into the smallest contour they fall within, giving estimated travel times of 300 s / 600 s / 900 s / 1 200 s / 1 800 s respectively. Grid points outside the transit network are silently skipped — transit contributes nothing to their score and the other modes compensate.
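The quantisation step can be mirrored in a few lines. The real classification runs in PostGIS via `ST_Within`; this sketch (function name illustrative) only shows how the smallest containing contour maps to an estimated travel time:

```typescript
// Contour intervals (minutes) used for the multimodal isochrones.
const CONTOURS_MIN = [5, 10, 15, 20, 30];

// Given flags saying whether a POI lies inside each contour polygon
// (same order as CONTOURS_MIN), return the estimated travel time in
// seconds from the smallest containing contour, or null when the POI
// is outside all contours (no transit estimate for it).
function transitSeconds(containedIn: boolean[]): number | null {
  const idx = containedIn.findIndex(Boolean);
  return idx === -1 ? null : CONTOURS_MIN[idx] * 60;
}
```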
**Phase 2 — Score aggregation**
Scores are precomputed for every combination of:
- 5 thresholds: 5, 10, 15, 20, 30 minutes
- 5 travel modes (see below)
- 5 profiles: Universal, Young Family, Senior, Young Professional, Student
### Travel modes
| Mode | Internal key | How travel time is obtained |
|------|--------------|-----------------------------|
| Best mode | `fifteen` | Synthetic — minimum travel time across walking, cycling, and transit per subcategory. A destination reachable by any of these modes within the threshold counts as accessible. Driving excluded intentionally. |
| Walking | `walking` | Valhalla pedestrian matrix, exact seconds |
| Cycling | `cycling` | Valhalla bicycle matrix, exact seconds |
| Transit | `transit` | Valhalla multimodal isochrone, quantised to 5-min bands (requires GTFS feed) |
| Driving | `driving` | Valhalla auto matrix, exact seconds |
The `fifteen` mode is computed entirely in memory during Phase 2: for each (grid point, category, subcategory) the minimum travel time across the three active modes is used, then scored normally. No extra routing jobs are needed.
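That per-subcategory minimum can be sketched as follows (function name and input shape are assumptions, not the repo's actual code):

```typescript
// Synthetic "fifteen" mode: minimum travel time across walking, cycling,
// and transit for one subcategory. Driving is intentionally excluded.
// Missing modes (no transit coverage, no POI found) are skipped.
function bestModeSeconds(times: {
  walking?: number;
  cycling?: number;
  transit?: number;
}): number | null {
  const candidates = [times.walking, times.cycling, times.transit].filter(
    (t): t is number => typeof t === "number",
  );
  return candidates.length ? Math.min(...candidates) : null;
}
```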
### Scoring formula
Each subcategory *i* within a category contributes a sigmoid score based on the real travel time `t` and the selected threshold `T` (both in seconds):
```
sigmoid(t, T) = 1 / (1 + exp(4 × (t − T) / T))
```
The sigmoid equals 0.5 exactly at the threshold and approaches 1 for very short times. It is continuous, so a 14-minute trip to a park still contributes nearly as much as a 10-minute trip under a 15-minute threshold.
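For reference, the formula as straightforward TypeScript (illustrative, not the repo's actual implementation):

```typescript
// sigmoid(t, T) = 1 / (1 + exp(4 × (t − T) / T))
// t: real travel time in seconds, T: selected threshold in seconds.
// Equals 0.5 at t = T, approaches 1 for t ≪ T and 0 for t ≫ T.
function sigmoid(t: number, T: number): number {
  return 1 / (1 + Math.exp((4 * (t - T)) / T));
}
```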
The category score combines all subcategories via a **complement product**, weighted by per-profile subcategory importance weights `w_i ∈ [0, 1]`:
```
category_score = 1 − ∏ (1 − w_i × sigmoid(t_i, T))
```
This captures diversity of coverage: one nearby supermarket already yields a high score, but also having a pharmacy and a bakery pushes it higher. Missing subcategories (no POI found) are simply omitted from the product and do not penalise the score.
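A sketch of the complement product (hypothetical helper; in the pipeline this runs during Phase 2 aggregation):

```typescript
// category_score = 1 − ∏ (1 − w_i × sigmoid(t_i, T))
// Each entry pairs a subcategory weight w_i ∈ [0, 1] with its precomputed
// sigmoid score. Missing subcategories are simply not passed in, so they
// contribute a factor of 1 and do not penalise the result.
function categoryScore(
  subs: Array<{ weight: number; score: number }>,
): number {
  return 1 - subs.reduce((prod, s) => prod * (1 - s.weight * s.score), 1);
}
```

Note how a second nearby subcategory always raises the score: one item at weight 1 and sigmoid 0.5 yields 0.5, while two such items yield 1 − 0.5 × 0.5 = 0.75.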
### Profiles
Each profile carries two sets of weights:
- **Category weights** (used as slider presets in the UI, range 02): how much relative importance each of the 5 categories receives in the composite score.
- **Subcategory weights** (baked into precomputed scores, range 01): how strongly a specific subcategory contributes to its parent category score.
| Profile | Emoji | Category emphasis | Notable subcategory boosts |
|---------|-------|-------------------|---------------------------|
| Universal | ⚖️ | All equal (1.0) | Balanced baseline |
| Young Family | 👨‍👩‍👧 | Work & School 1.5×, Recreation 1.4×, Service 1.2× | school, kindergarten, playground, clinic all → 1.0 |
| Senior | 🧓 | Culture & Community 1.5×, Service 1.4×, Transport 1.1× | hospital, clinic, pharmacy, social services → 1.0; school → 0.05 |
| Young Professional | 💼 | Transport 1.5×, Recreation 1.1× | metro, train → 1.0; gym 0.9; coworking 0.85; school → 0.1 |
| Student | 🎓 | Work & School 1.5×, Transport 1.4×, Culture 1.2× | university, library → 1.0; bike share 0.85; school → 0.05 |
### Composite score
The composite shown on the heatmap is a weighted average of the 5 category scores. Category weights come from the selected profile but can be adjusted freely in the UI. **All scores are precomputed** — changing the profile, threshold, or travel mode only queries the database; adjusting the category weight sliders re-blends entirely client-side with no round-trip.
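The client-side re-blend is a plain weighted average; a minimal sketch (names assumed, the actual blending code is not part of this diff):

```typescript
// Weighted average of the 5 precomputed category scores using the
// user-adjustable category weights (profile presets, range 0–2).
function compositeScore(
  scores: Record<string, number>,
  weights: Record<string, number>,
): number {
  let num = 0;
  let den = 0;
  for (const [cat, s] of Object.entries(scores)) {
    const w = weights[cat] ?? 1; // unspecified categories default to 1
    num += w * s;
    den += w;
  }
  return den > 0 ? num / den : 0;
}
```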
### Per-location score (pin)
When a user places a pin on the map:
1. The nearest grid point is found via a PostGIS `<->` KNN query.
2. Precomputed `grid_scores` rows for that grid point, travel mode, threshold, and profile are returned — one row per category.
3. Per-subcategory detail rows from `grid_poi_details` are also fetched, showing the name, straight-line distance, and travel time to the nearest POI in each subcategory for the requested mode.
4. An isochrone overlay is fetched live from Valhalla and shown on the map (walking is used as the representative mode for `fifteen` and `transit` since Valhalla's interactive isochrone only supports single-mode costing).
The pin panel also shows estate value data (land price in €/m² from the BORIS NI cadastre) for cities in Lower Saxony, including a percentile rank among all zones in the city and a "peer percentile" rank among zones with similar accessibility scores.
### Hidden gem score
For cities with BORIS NI estate value data, a **hidden gem score** is precomputed per grid point at the end of Phase 2:
```
hidden_gem_score = composite_accessibility × (1 − price_rank_within_decile)
```
- `composite_accessibility` — average of all category scores for that grid point (walking / 15 min / universal profile)
- `price_rank_within_decile``PERCENT_RANK()` of the nearest zone's land price among all zones in the same accessibility decile (0 = cheapest, 1 = most expensive relative to equally accessible peers)
The result is in [0, 1]: high only when a location is both accessible *and* priced below its peers. Stored in `grid_points.hidden_gem_score` and served as a separate MVT overlay at `/api/tiles/hidden-gems/`.
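The formula in code form (illustrative; the pipeline computes this in SQL at the end of Phase 2):

```typescript
// hidden_gem_score = composite_accessibility × (1 − price_rank_within_decile)
// High only when a grid point is both well-connected and cheaper than
// zones with comparable accessibility.
function hiddenGemScore(
  compositeAccessibility: number, // mean of category scores, in [0, 1]
  priceRankWithinDecile: number, // PERCENT_RANK() among peers, in [0, 1]
): number {
  return compositeAccessibility * (1 - priceRankWithinDecile);
}
```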
The map offers three mutually exclusive base overlays (switchable in the control panel):
- **Accessibility** — default grid heatmap coloured by composite score
- **Land value** — BORIS NI zones coloured by €/m² (Lower Saxony cities only)
- **Hidden gems** — grid points coloured by hidden gem score (Lower Saxony cities only)

View file

@ -19,7 +19,7 @@ export async function GET(
return NextResponse.json({
id: job.id,
type: job.data.type,
citySlug: job.data.citySlug,
citySlug: "citySlug" in job.data ? job.data.citySlug : null,
state,
progress: job.progress ?? null,
failedReason: job.failedReason ?? null,

View file

@ -1,7 +1,7 @@
import { NextRequest } from "next/server";
import { Job } from "bullmq";
import { getPipelineQueue, getValhallaQueue } from "@/lib/queue";
import type { PipelineJobData, JobProgress, ComputeScoresJobData } from "@/lib/queue";
import type { PipelineJobData, JobProgress, ComputeScoresJobData, RefreshCityJobData } from "@/lib/queue";
import type { SSEEvent } from "@transportationer/shared";
import { CATEGORY_IDS } from "@transportationer/shared";
@ -20,10 +20,17 @@ export async function GET(
const encoder = new TextEncoder();
let timer: ReturnType<typeof setInterval> | null = null;
// Resolve citySlug from the refresh-city job that was returned to the UI.
// Resolve citySlug and creation timestamp from the refresh-city job.
// We track progress by citySlug across all pipeline stages because
// refresh-city itself completes almost immediately after enqueueing children.
// jobCreatedAt gates failed lookups so we never match results from a
// previous ingest of the same city.
// computeScoresJobId is captured after flow.add() by the worker; once
// available it allows exact-ID matching for the completion check,
// eliminating false positives from previous runs.
let citySlug: string;
let jobCreatedAt: number;
let computeScoresJobId: string | undefined;
try {
const job = await Job.fromId<PipelineJobData>(queue, id);
if (!job) {
@ -31,7 +38,9 @@ export async function GET(
headers: { "Content-Type": "text/event-stream" },
});
}
citySlug = job.data.citySlug ?? "";
citySlug = "citySlug" in job.data ? (job.data.citySlug ?? "") : "";
jobCreatedAt = job.timestamp;
computeScoresJobId = (job.data as RefreshCityJobData).computeScoresJobId;
} catch {
return new Response(fmt({ type: "failed", jobId: id, error: "Queue unavailable" }), {
headers: { "Content-Type": "text/event-stream" },
@ -57,6 +66,15 @@ export async function GET(
const poll = async () => {
try {
// If computeScoresJobId wasn't set when the stream opened (race with
// the worker updating job data), re-read the job once to pick it up.
if (!computeScoresJobId) {
const refreshJob = await Job.fromId<PipelineJobData>(queue, id);
computeScoresJobId = refreshJob
? (refreshJob.data as RefreshCityJobData).computeScoresJobId
: undefined;
}
// 1. Fetch active jobs and waiting-children jobs in parallel.
const [pipelineActive, valhallaActive, waitingChildren] = await Promise.all([
queue.getActive(0, 100),
@ -66,32 +84,60 @@ export async function GET(
// 1a. Parallel routing phase: compute-scores is waiting for its routing
// children to finish. Report aggregate progress instead of one job's pct.
// Only enter this branch when routingDispatched=true (Phase 1 has run).
// Before that, compute-scores is in waiting-children while generate-grid
// is running — fall through to the sequential active-job check instead.
// Match by job ID (exact) when available; fall back to citySlug for the
// brief window before computeScoresJobId is written to the job record.
const csWaiting = waitingChildren.find(
(j) => j.data.citySlug === citySlug && j.data.type === "compute-scores",
(j) =>
j.data.type === "compute-scores" &&
(j.data as ComputeScoresJobData).routingDispatched === true &&
(computeScoresJobId ? j.id === computeScoresJobId : j.data.citySlug === citySlug),
);
if (csWaiting) {
const csData = csWaiting.data as ComputeScoresJobData;
const totalRoutingJobs = csData.modes.length * CATEGORY_IDS.length;
// Transit uses a single compute-transit child, not per-category routing jobs.
const routingModes = csData.modes.filter((m) => m !== "transit");
const totalRoutingJobs = routingModes.length * CATEGORY_IDS.length;
const hasTransit = csData.modes.includes("transit");
// Count jobs that haven't finished yet (active or still waiting in queue)
const pipelineWaiting = await queue.getWaiting(0, 200);
const stillActive = pipelineActive.filter(
const stillRoutingActive = pipelineActive.filter(
(j) => j.data.citySlug === citySlug && j.data.type === "compute-routing",
).length;
const stillWaiting = pipelineWaiting.filter(
const stillRoutingWaiting = pipelineWaiting.filter(
(j) => j.data.citySlug === citySlug && j.data.type === "compute-routing",
).length;
const completedCount = Math.max(0, totalRoutingJobs - stillActive - stillWaiting);
const pct = totalRoutingJobs > 0
? Math.round((completedCount / totalRoutingJobs) * 100)
: 0;
const completedRouting = Math.max(0, totalRoutingJobs - stillRoutingActive - stillRoutingWaiting);
enqueue({
type: "progress",
stage: "Computing scores",
pct,
message: `${completedCount} / ${totalRoutingJobs} routing jobs`,
});
// Check if compute-transit is still running
const transitRunning =
hasTransit &&
(pipelineActive.some((j) => j.data.citySlug === citySlug && j.data.type === "compute-transit") ||
pipelineWaiting.some((j) => j.data.citySlug === citySlug && j.data.type === "compute-transit"));
// compute-transit job also shows its own progress when active — prefer that
const transitActiveJob = pipelineActive.find(
(j) => j.data.citySlug === citySlug && j.data.type === "compute-transit",
);
if (transitActiveJob) {
const p = transitActiveJob.progress as JobProgress | undefined;
if (p?.stage) {
enqueue({ type: "progress", stage: p.stage, pct: p.pct, message: p.message });
return;
}
}
const pct = totalRoutingJobs > 0
? Math.round((completedRouting / totalRoutingJobs) * 100)
: transitRunning ? 99 : 100;
const message = transitRunning && completedRouting >= totalRoutingJobs
? "Routing done — computing transit isochrones…"
: `${completedRouting} / ${totalRoutingJobs} routing jobs`;
enqueue({ type: "progress", stage: "Computing scores", pct, message });
return;
}
@ -117,7 +163,7 @@ export async function GET(
return;
}
// 2. No active stage — check for a recent failure in either queue.
// 2. No active stage — check for a failure that occurred after this refresh started.
const [pipelineFailed, valhallaFailed] = await Promise.all([
queue.getFailed(0, 50),
valhallaQueue.getFailed(0, 50),
@ -126,7 +172,7 @@ export async function GET(
(j) =>
j.data.citySlug === citySlug &&
j.data.type !== "refresh-city" &&
Date.now() - (j.finishedOn ?? 0) < 600_000,
(j.finishedOn ?? 0) > jobCreatedAt,
);
if (recentFail) {
enqueue({
@ -138,13 +184,16 @@ export async function GET(
return;
}
// 3. Check if compute-scores completed recently → full pipeline done.
// 3. Check if the specific compute-scores job completed → pipeline done.
// Use exact job ID match (computeScoresJobId) to avoid false positives
// from a previous run's completed record still in BullMQ's retention window.
const completed = await queue.getCompleted(0, 100);
const finalDone = completed.find(
(j) =>
j.data.citySlug === citySlug &&
j.data.type === "compute-scores" &&
Date.now() - (j.finishedOn ?? 0) < 3_600_000,
const finalDone = completed.find((j) =>
computeScoresJobId
? j.id === computeScoresJobId
: j.data.citySlug === citySlug &&
j.data.type === "compute-scores" &&
(j.finishedOn ?? 0) > jobCreatedAt,
);
if (finalDone) {
enqueue({ type: "completed", jobId: finalDone.id ?? "" });

View file

@ -8,7 +8,7 @@ import {
export const runtime = "nodejs";
const VALID_MODES = ["walking", "cycling", "driving"];
const VALID_MODES = ["walking", "cycling", "driving", "fifteen"];
const VALID_THRESHOLDS = [5, 8, 10, 12, 15, 20, 25, 30];
export async function GET(req: NextRequest) {

View file

@ -4,7 +4,7 @@ import { PROFILE_IDS } from "@transportationer/shared";
export const runtime = "nodejs";
const VALID_MODES = ["walking", "cycling", "driving"];
const VALID_MODES = ["walking", "cycling", "driving", "transit", "fifteen"];
const VALID_THRESHOLDS = [5, 8, 10, 12, 15, 20, 25, 30];
export async function GET(

View file

@ -39,7 +39,7 @@ export default function HomePage() {
const [cities, setCities] = useState<City[]>([]);
const [selectedCity, setSelectedCity] = useState<string | null>(null);
const [profile, setProfile] = useState<ProfileId>("universal");
const [mode, setMode] = useState<TravelMode>("walking");
const [mode, setMode] = useState<TravelMode>("fifteen");
const [threshold, setThreshold] = useState(15);
const [weights, setWeights] = useState({ ...PROFILES["universal"].categoryWeights });
const [activeCategory, setActiveCategory] = useState<CategoryId | "composite">("composite");
@ -204,7 +204,9 @@ export default function HomePage() {
body: JSON.stringify({
lng: pinLocation.lng,
lat: pinLocation.lat,
travelMode: mode,
// "fifteen" and "transit" have no direct Valhalla isochrone costing —
// use walking as the representative display mode for both.
travelMode: (mode === "fifteen" || mode === "transit") ? "walking" : mode,
contourMinutes: isochroneContours(threshold),
}),
})

View file

@ -5,8 +5,10 @@ import type { CategoryId, TravelMode, ProfileId } from "@transportationer/shared
const TRAVEL_MODES: Array<{ value: TravelMode; label: string; icon: string }> =
[
{ value: "fifteen", label: "Best mode", icon: "🏆" },
{ value: "walking", label: "Walking", icon: "🚶" },
{ value: "cycling", label: "Cycling", icon: "🚲" },
{ value: "transit", label: "Transit", icon: "🚌" },
{ value: "driving", label: "Driving", icon: "🚗" },
];
@ -90,12 +92,12 @@ export function ControlPanel({
<p className="text-xs font-medium text-gray-600 mb-2 uppercase tracking-wide">
Travel Mode
</p>
<div className="flex gap-1">
<div className="grid grid-cols-2 gap-1">
{TRAVEL_MODES.map((m) => (
<button
key={m.value}
onClick={() => onModeChange(m.value)}
className={`flex-1 flex flex-col items-center gap-1 py-2 rounded-md text-xs border transition-colors ${
className={`flex flex-col items-center gap-1 py-2 rounded-md text-xs border transition-colors ${
mode === m.value
? "border-brand-500 bg-brand-50 text-brand-700 font-medium"
: "border-gray-200 text-gray-600 hover:border-gray-300"

View file

@ -51,6 +51,8 @@ services:
VALHALLA_CONFIG: /data/valhalla/valhalla.json
VALHALLA_TILES_DIR: /data/valhalla/valhalla_tiles
NODE_ENV: production
# Optional: connect-info.net token for NDS-specific GTFS feed
CONNECT_INFO_TOKEN: ${CONNECT_INFO_TOKEN:-}
ports:
- "127.0.0.1:8002:8002" # Valhalla HTTP API
depends_on:
@ -105,6 +107,8 @@ services:
LUA_SCRIPT: /app/infra/osm2pgsql.lua
VALHALLA_URL: http://valhalla:8002
NODE_ENV: production
# Optional: enables NDS-specific GTFS source for cities in Niedersachsen
CONNECT_INFO_TOKEN: ${CONNECT_INFO_TOKEN:-}
volumes:
- osm_data:/data/osm # Worker downloads PBF here
depends_on:

View file

@ -75,7 +75,7 @@ CREATE INDEX IF NOT EXISTS idx_grid_hidden_gem
CREATE TABLE IF NOT EXISTS grid_scores (
grid_point_id BIGINT NOT NULL REFERENCES grid_points(id) ON DELETE CASCADE,
category TEXT NOT NULL,
travel_mode TEXT NOT NULL CHECK (travel_mode IN ('walking','cycling','driving')),
travel_mode TEXT NOT NULL CHECK (travel_mode IN ('walking','cycling','driving','transit','fifteen')),
threshold_min INTEGER NOT NULL,
profile TEXT NOT NULL DEFAULT 'universal',
nearest_poi_id BIGINT,

package-lock.json (generated)
View file

@ -1482,6 +1482,16 @@
"@types/geojson": "*"
}
},
"node_modules/@types/unzipper": {
"version": "0.10.11",
"resolved": "https://registry.npmjs.org/@types/unzipper/-/unzipper-0.10.11.tgz",
"integrity": "sha512-D25im2zjyMCcgL9ag6N46+wbtJBnXIr7SI4zHf9eJD2Dw2tEB5e+p5MYkrxKIVRscs5QV0EhtU9rgXSPx90oJg==",
"dev": true,
"license": "MIT",
"dependencies": {
"@types/node": "*"
}
},
"node_modules/any-promise": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/any-promise/-/any-promise-1.3.0.tgz",
@ -1579,6 +1589,12 @@
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/bluebird": {
"version": "3.7.2",
"resolved": "https://registry.npmjs.org/bluebird/-/bluebird-3.7.2.tgz",
"integrity": "sha512-XpNj6GDQzdfW+r2Wnn7xiSAd7TM3jzkxGXBGTtWKuSXv1xUV+azxAm8jdWZN06QTQk+2N2XB9jRDkvbmQmcRtg==",
"license": "MIT"
},
"node_modules/braces": {
"version": "3.0.3",
"resolved": "https://registry.npmjs.org/braces/-/braces-3.0.3.tgz",
@ -1765,6 +1781,12 @@
"node": ">= 6"
}
},
"node_modules/core-util-is": {
"version": "1.0.3",
"resolved": "https://registry.npmjs.org/core-util-is/-/core-util-is-1.0.3.tgz",
"integrity": "sha512-ZQBvi1DcpJ4GDqanjucZ2Hj3wEO5pZDS89BWbkcrvdxksJorwUDDZamX9ldFkp9aw2lmBDLgkObEA4DWNJ9FYQ==",
"license": "MIT"
},
"node_modules/cron-parser": {
"version": "4.9.0",
"resolved": "https://registry.npmjs.org/cron-parser/-/cron-parser-4.9.0.tgz",
@ -1847,6 +1869,15 @@
"dev": true,
"license": "MIT"
},
"node_modules/duplexer2": {
"version": "0.1.4",
"resolved": "https://registry.npmjs.org/duplexer2/-/duplexer2-0.1.4.tgz",
"integrity": "sha512-asLFVfWWtJ90ZyOUHMqk7/S2w2guQKxUI2itj3d92ADHhxUSbCMGi1f1cBcJ7xM1To+pE/Khbwo1yuNbMEPKeA==",
"license": "BSD-3-Clause",
"dependencies": {
"readable-stream": "^2.0.2"
}
},
"node_modules/earcut": {
"version": "3.0.2",
"resolved": "https://registry.npmjs.org/earcut/-/earcut-3.0.2.tgz",
@ -1985,6 +2016,20 @@
"url": "https://github.com/sponsors/rawify"
}
},
"node_modules/fs-extra": {
"version": "11.3.3",
"resolved": "https://registry.npmjs.org/fs-extra/-/fs-extra-11.3.3.tgz",
"integrity": "sha512-VWSRii4t0AFm6ixFFmLLx1t7wS1gh+ckoa84aOeapGum0h+EZd1EhEumSB+ZdDLnEPuucsVB9oB7cxJHap6Afg==",
"license": "MIT",
"dependencies": {
"graceful-fs": "^4.2.0",
"jsonfile": "^6.0.1",
"universalify": "^2.0.0"
},
"engines": {
"node": ">=14.14"
}
},
"node_modules/fsevents": {
"version": "2.3.3",
"resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz",
@ -2074,6 +2119,12 @@
"node": ">=16"
}
},
"node_modules/graceful-fs": {
"version": "4.2.11",
"resolved": "https://registry.npmjs.org/graceful-fs/-/graceful-fs-4.2.11.tgz",
"integrity": "sha512-RbJ5/jmFcNNCcDV5o9eTnBLJ/HszWV0P73bc+Ff4nS/rJj+YaS6IGyiOL0VoBYX+l1Wrl3k63h/KrH+nhJ0XvQ==",
"license": "ISC"
},
"node_modules/hasown": {
"version": "2.0.2",
"resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz",
@ -2107,6 +2158,12 @@
],
"license": "BSD-3-Clause"
},
"node_modules/inherits": {
"version": "2.0.4",
"resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz",
"integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==",
"license": "ISC"
},
"node_modules/ini": {
"version": "4.1.3",
"resolved": "https://registry.npmjs.org/ini/-/ini-4.1.3.tgz",
@ -2202,6 +2259,12 @@
"node": ">=0.12.0"
}
},
"node_modules/isarray": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/isarray/-/isarray-1.0.0.tgz",
"integrity": "sha512-VLghIWNM6ELQzo7zwmcg0NmTVyWKYjvIeM83yjp0wRDTmUnrM678fQbcKBo6n2CJEF0szoG//ytg+TKla89ALQ==",
"license": "MIT"
},
"node_modules/isexe": {
"version": "3.1.5",
"resolved": "https://registry.npmjs.org/isexe/-/isexe-3.1.5.tgz",
@ -2237,6 +2300,18 @@
"integrity": "sha512-3CNZ2DnrpByG9Nqj6Xo8vqbjT4F6N+tb4Gb28ESAZjYZ5yqvmc56J+/kuIwkaAMOyblTQhUW7PxMkUb8Q36N3Q==",
"license": "MIT"
},
"node_modules/jsonfile": {
"version": "6.2.0",
"resolved": "https://registry.npmjs.org/jsonfile/-/jsonfile-6.2.0.tgz",
"integrity": "sha512-FGuPw30AdOIUTRMC2OMRtQV+jkVj2cfPqSeWXv1NEAJ1qZ5zb1X6z1mFhbfOB/iy3ssJCD+3KuZ8r8C3uVFlAg==",
"license": "MIT",
"dependencies": {
"universalify": "^2.0.0"
},
"optionalDependencies": {
"graceful-fs": "^4.1.6"
}
},
"node_modules/kdbush": {
"version": "4.0.2",
"resolved": "https://registry.npmjs.org/kdbush/-/kdbush-4.0.2.tgz",
@ -2551,6 +2626,12 @@
"node-gyp-build-optional-packages-test": "build-test.js"
}
},
"node_modules/node-int64": {
"version": "0.4.0",
"resolved": "https://registry.npmjs.org/node-int64/-/node-int64-0.4.0.tgz",
"integrity": "sha512-O5lz91xSOeoXP6DulyHfllpq+Eg00MWitZIbtPfoSEvqIHdl5gfcY6hYzDWnj0qD5tz52PI08u9qUvSVeUBeHw==",
"license": "MIT"
},
"node_modules/node-releases": {
"version": "2.0.27",
"resolved": "https://registry.npmjs.org/node-releases/-/node-releases-2.0.27.tgz",
@ -2840,6 +2921,12 @@
"integrity": "sha512-pcaShQc1Shq0y+E7GqJqvZj8DTthWV1KeHGdi0Z6IAin2Oi3JnLCOfwnCo84qc+HAp52wT9nK9H7FAJp5a44GQ==",
"license": "ISC"
},
"node_modules/process-nextick-args": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/process-nextick-args/-/process-nextick-args-2.0.1.tgz",
"integrity": "sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==",
"license": "MIT"
},
"node_modules/protocol-buffers-schema": {
"version": "3.6.0",
"resolved": "https://registry.npmjs.org/protocol-buffers-schema/-/protocol-buffers-schema-3.6.0.tgz",
@ -2906,6 +2993,21 @@
"pify": "^2.3.0"
}
},
"node_modules/readable-stream": {
"version": "2.3.8",
"resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-2.3.8.tgz",
"integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==",
"license": "MIT",
"dependencies": {
"core-util-is": "~1.0.0",
"inherits": "~2.0.3",
"isarray": "~1.0.0",
"process-nextick-args": "~2.0.0",
"safe-buffer": "~5.1.1",
"string_decoder": "~1.1.1",
"util-deprecate": "~1.0.1"
}
},
"node_modules/readdirp": {
"version": "3.6.0",
"resolved": "https://registry.npmjs.org/readdirp/-/readdirp-3.6.0.tgz",
@ -3021,6 +3123,12 @@
"integrity": "sha512-PdhdWy89SiZogBLaw42zdeqtRJ//zFd2PgQavcICDUgJT5oW10QCRKbJ6bg4r0/UY2M6BWd5tkxuGFRvCkgfHQ==",
"license": "BSD-3-Clause"
},
"node_modules/safe-buffer": {
"version": "5.1.2",
"resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.1.2.tgz",
"integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==",
"license": "MIT"
},
"node_modules/scheduler": {
"version": "0.27.0",
"resolved": "https://registry.npmjs.org/scheduler/-/scheduler-0.27.0.tgz",
@ -3099,6 +3207,15 @@
"integrity": "sha512-qoRRSyROncaz1z0mvYqIE4lCd9p2R90i6GxW3uZv5ucSu8tU7B5HXUP1gG8pVZsYNVaXjk8ClXHPttLyxAL48A==",
"license": "MIT"
},
"node_modules/string_decoder": {
"version": "1.1.1",
"resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.1.1.tgz",
"integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==",
"license": "MIT",
"dependencies": {
"safe-buffer": "~5.1.0"
}
},
"node_modules/styled-jsx": {
"version": "5.1.6",
"resolved": "https://registry.npmjs.org/styled-jsx/-/styled-jsx-5.1.6.tgz",
@ -3352,6 +3469,28 @@
"dev": true,
"license": "MIT"
},
"node_modules/universalify": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/universalify/-/universalify-2.0.1.tgz",
"integrity": "sha512-gptHNQghINnc/vTGIk0SOFGFNXw7JVrlRUtConJRlvaw6DuX0wO5Jeko9sWrMBhh+PsYAZ7oXAiOnf/UKogyiw==",
"license": "MIT",
"engines": {
"node": ">= 10.0.0"
}
},
"node_modules/unzipper": {
"version": "0.12.3",
"resolved": "https://registry.npmjs.org/unzipper/-/unzipper-0.12.3.tgz",
"integrity": "sha512-PZ8hTS+AqcGxsaQntl3IRBw65QrBI6lxzqDEL7IAo/XCEqRTKGfOX56Vea5TH9SZczRVxuzk1re04z/YjuYCJA==",
"license": "MIT",
"dependencies": {
"bluebird": "~3.7.2",
"duplexer2": "~0.1.4",
"fs-extra": "^11.2.0",
"graceful-fs": "^4.2.2",
"node-int64": "^0.4.0"
}
},
"node_modules/update-browserslist-db": {
"version": "1.2.3",
"resolved": "https://registry.npmjs.org/update-browserslist-db/-/update-browserslist-db-1.2.3.tgz",
@ -3387,7 +3526,6 @@
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz",
"integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==",
"dev": true,
"license": "MIT"
},
"node_modules/uuid": {
@ -3442,11 +3580,12 @@
"dependencies": {
"@transportationer/shared": "*",
"bullmq": "^5.13.0",
"ioredis": "^5.4.1",
"postgres": "^3.4.4"
"postgres": "^3.4.4",
"unzipper": "^0.12.3"
},
"devDependencies": {
"@types/node": "^22.0.0",
"@types/unzipper": "^0.10.11",
"tsx": "^4.19.0",
"typescript": "^5.6.0"
}

View file

@ -5,7 +5,10 @@ export type CategoryId =
| "culture_community"
| "recreation";
export type TravelMode = "walking" | "cycling" | "driving";
/** Modes that produce real routing data (matrix or isochrone calls). */
export type RoutingMode = "walking" | "cycling" | "driving" | "transit";
/** All display modes, including the synthetic "fifteen" (best-of walking+cycling+transit). */
export type TravelMode = RoutingMode | "fifteen";
export interface TagFilter {
key: string;

View file

@ -1,3 +1,5 @@
import type { RoutingMode } from "./osm-tags.js";
// ─── Job data types ───────────────────────────────────────────────────────────
export interface DownloadPbfJobData {
@ -24,7 +26,7 @@ export interface GenerateGridJobData {
export interface ComputeScoresJobData {
type: "compute-scores";
citySlug: string;
modes: Array<"walking" | "cycling" | "driving">;
modes: RoutingMode[];
thresholds: number[];
/** Set after compute-routing children are dispatched (internal two-phase state). */
routingDispatched?: boolean;
@ -39,6 +41,11 @@ export interface ComputeRoutingJobData {
category: string;
}
export interface ComputeTransitJobData {
type: "compute-transit";
citySlug: string;
}
export interface BuildValhallaJobData {
type: "build-valhalla";
/** City being added/updated. Absent for removal-only rebuilds. */
@ -55,6 +62,8 @@ export interface RefreshCityJobData {
citySlug: string;
geofabrikUrl: string;
resolutionM?: number;
/** ID of the compute-scores job enqueued for this refresh; set after flow.add(). */
computeScoresJobId?: string;
}
export interface IngestBorisNiJobData {
@ -62,6 +71,20 @@ export interface IngestBorisNiJobData {
citySlug: string;
}
export interface DownloadGtfsDeJobData {
type: "download-gtfs-de";
url: string;
/** Re-download even if data already exists */
force?: boolean;
/**
* Per-city bounding boxes [minLng, minLat, maxLng, maxLat] used to clip the
* GTFS feed after extraction. A stop is kept when it falls inside ANY of the
* bboxes (each already padded by a small buffer in refresh-city.ts).
* When absent the full feed is kept.
*/
bboxes?: [number, number, number, number][];
}
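As a hedged illustration of how such a padded bbox might be produced (`padBbox` is a hypothetical helper; the real padding lives in refresh-city.ts, which is not part of this diff):

```typescript
type Bbox = [number, number, number, number]; // [minLng, minLat, maxLng, maxLat]

// Hypothetical sketch: pad a city bbox by a fixed margin (in degrees) so that
// stops just outside the city limits, e.g. a terminus station, are still kept
// when the GTFS feed is clipped.
function padBbox([minLng, minLat, maxLng, maxLat]: Bbox, marginDeg = 0.05): Bbox {
  return [minLng - marginDeg, minLat - marginDeg, maxLng + marginDeg, maxLat + marginDeg];
}
```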
export type PipelineJobData =
| DownloadPbfJobData
| ExtractPoisJobData
@ -70,11 +93,24 @@ export type PipelineJobData =
| ComputeRoutingJobData
| BuildValhallaJobData
| RefreshCityJobData
| IngestBorisNiJobData;
| IngestBorisNiJobData
| DownloadGtfsDeJobData
| ComputeTransitJobData;
// ─── Job options (BullMQ-compatible plain objects) ────────────────────────────
export const JOB_OPTIONS: Record<PipelineJobData["type"], object> = {
"compute-transit": {
attempts: 1,
removeOnComplete: { age: 86400 * 7 },
removeOnFail: { age: 86400 * 30 },
},
"download-gtfs-de": {
attempts: 2,
backoff: { type: "fixed", delay: 10000 },
removeOnComplete: { age: 86400 * 7 },
removeOnFail: { age: 86400 * 30 },
},
"compute-routing": {
attempts: 2,
backoff: { type: "fixed", delay: 3000 },

View file

@ -12,10 +12,12 @@
"dependencies": {
"@transportationer/shared": "*",
"bullmq": "^5.13.0",
"postgres": "^3.4.4"
"postgres": "^3.4.4",
"unzipper": "^0.12.3"
},
"devDependencies": {
"@types/node": "^22.0.0",
"@types/unzipper": "^0.10.11",
"tsx": "^4.19.0",
"typescript": "^5.6.0"
}

View file

@ -8,13 +8,14 @@ import { handleComputeScores } from "./jobs/compute-scores.js";
import { handleComputeRouting } from "./jobs/compute-routing.js";
import { handleRefreshCity } from "./jobs/refresh-city.js";
import { handleIngestBorisNi } from "./jobs/ingest-boris-ni.js";
import { handleComputeTransit } from "./jobs/compute-transit.js";
console.log("[worker] Starting Transportationer pipeline worker…");
const worker = new Worker<PipelineJobData>(
"pipeline",
async (job: Job<PipelineJobData>, token?: string) => {
console.log(`[worker] Processing job ${job.id} type=${job.data.type} city=${job.data.citySlug}`);
console.log(`[worker] Processing job ${job.id} type=${job.data.type} city=${"citySlug" in job.data ? job.data.citySlug : "n/a"}`);
switch (job.data.type) {
case "download-pbf":
@ -31,6 +32,8 @@ const worker = new Worker<PipelineJobData>(
return handleRefreshCity(job as Job<any>);
case "ingest-boris-ni":
return handleIngestBorisNi(job as Job<any>);
case "compute-transit":
return handleComputeTransit(job as Job<any>);
default:
throw new Error(`Unknown job type: ${(job.data as any).type}`);
}

View file

@ -1,6 +1,7 @@
import type { Job } from "bullmq";
import { execSync, spawn } from "child_process";
import { existsSync, mkdirSync, readFileSync, unlinkSync, writeFileSync } from "fs";
import { existsSync, mkdirSync, readFileSync, readdirSync, rmSync, statSync, unlinkSync, writeFileSync } from "fs";
import * as path from "path";
import type { JobProgress } from "@transportationer/shared";
export type BuildValhallaData = {
@ -13,18 +14,50 @@ export type BuildValhallaData = {
removeSlugs?: string[];
};
const OSM_DATA_DIR = process.env.OSM_DATA_DIR ?? "/data/osm";
const VALHALLA_CONFIG = process.env.VALHALLA_CONFIG ?? "/data/valhalla/valhalla.json";
const OSM_DATA_DIR = process.env.OSM_DATA_DIR ?? "/data/osm";
const VALHALLA_CONFIG = process.env.VALHALLA_CONFIG ?? "/data/valhalla/valhalla.json";
const VALHALLA_TILES_DIR = process.env.VALHALLA_TILES_DIR ?? "/data/valhalla/valhalla_tiles";
const VALHALLA_DATA_DIR = "/data/valhalla";
const VALHALLA_DATA_DIR = "/data/valhalla";
const GTFS_DATA_DIR = process.env.GTFS_DATA_DIR ?? "/data/valhalla/gtfs";
const GTFS_FEED_DIR = `${GTFS_DATA_DIR}/feed`;
/**
* Auxiliary databases downloaded by valhalla_build_tiles on first run.
* Stored OUTSIDE VALHALLA_TILES_DIR so they survive crash-recovery tile
* wipes and don't need to be re-downloaded on retries.
*/
const TIMEZONE_SQLITE = `${VALHALLA_DATA_DIR}/timezone.sqlite`;
const ADMINS_SQLITE = `${VALHALLA_DATA_DIR}/admins.sqlite`;
/**
* Explicit mjolnir.transit_dir used by all four transit-aware Valhalla
* operations (ingest, convert, build_tiles, service). Pinned here to avoid
* the Valhalla default (/data/valhalla/transit) and to persist transit tiles
* between builds.
*
* IMPORTANT build order (per Valhalla docs):
 * 1. valhalla_build_tiles — road graph; also downloads timezone.sqlite
 * 2. valhalla_ingest_transit — GTFS → transit PBF tiles in transit_dir
 * 3. valhalla_convert_transit — reads transit PBFs + road tiles → transit graph
*
* valhalla_convert_transit REQUIRES road tiles to exist (it uses GraphReader
* to look up road node IDs for stop connections). Running it before
* valhalla_build_tiles causes it to crash looking for tiles that don't exist.
*
* TRANSIT_CACHE_MARKER tracks whether ingest PBFs are current relative to the
* GTFS source. valhalla_convert_transit is always re-run after a road build
* because road node IDs change on each rebuild and old transit-to-road
* connections would otherwise be stale.
*/
const TRANSIT_CACHE_DIR = `${VALHALLA_DATA_DIR}/transit_graph`;
/** Written after a successful valhalla_ingest_transit; compared against GTFS source mtime. */
const TRANSIT_CACHE_MARKER = `${TRANSIT_CACHE_DIR}/.ready`;
/** Written by download-gtfs-de after each successful GTFS extraction. */
const GTFS_SOURCE_MARKER = `${GTFS_FEED_DIR}/.source`;
/**
 * Manifest file: maps citySlug → absolute path of its routing PBF.
* Persists in the valhalla_tiles Docker volume across restarts.
*
* For bbox-clipped cities the path is /data/valhalla/{slug}-routing.osm.pbf.
* For whole-region cities (no bbox) the path is /data/osm/{slug}-latest.osm.pbf
* (accessible via the osm_data volume mounted read-only in this container).
*/
const ROUTING_MANIFEST = `${VALHALLA_DATA_DIR}/routing-sources.json`;
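The readManifest/writeManifest helpers used below are elided from this diff; a minimal sketch consistent with the description above (parameterized by path here so it is self-contained, whereas the real helpers presumably read ROUTING_MANIFEST directly):

```typescript
import { existsSync, readFileSync, writeFileSync } from "fs";

type RoutingManifest = Record<string, string>; // citySlug → absolute PBF path

// Sketch: load the manifest, tolerating a missing or corrupt file so a bad
// write never wedges subsequent builds.
function readManifest(manifestPath: string): RoutingManifest {
  if (!existsSync(manifestPath)) return {};
  try {
    return JSON.parse(readFileSync(manifestPath, "utf8")) as RoutingManifest;
  } catch {
    return {};
  }
}

function writeManifest(manifestPath: string, manifest: RoutingManifest): void {
  writeFileSync(manifestPath, JSON.stringify(manifest, null, 2));
}
```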
@ -52,6 +85,40 @@ function runProcess(cmd: string, args: string[]): Promise<void> {
});
}
/**
* Build the IANA timezone SQLite database required by valhalla_ingest_transit.
* Without it, ingest does not write the root index tile (0/000/000.pbf) and
* valhalla_convert_transit crashes trying to load it.
*
* valhalla_build_timezones writes the SQLite database to stdout (no args),
* so we capture stdout and write it to TIMEZONE_SQLITE.
*/
function buildTimezoneDb(): Promise<void> {
return new Promise((resolve, reject) => {
console.log("[build-valhalla] Running: valhalla_build_timezones (output → " + TIMEZONE_SQLITE + ")");
const child = spawn("valhalla_build_timezones", [], {
stdio: ["ignore", "pipe", "inherit"],
});
const chunks: Buffer[] = [];
child.stdout!.on("data", (chunk: Buffer) => chunks.push(chunk));
child.on("error", reject);
child.on("exit", (code) => {
if (code !== 0) {
reject(new Error(`valhalla_build_timezones exited with code ${code}`));
return;
}
const db = Buffer.concat(chunks);
if (db.length < 1024) {
reject(new Error(`valhalla_build_timezones output too small (${db.length} B) — likely failed silently`));
return;
}
writeFileSync(TIMEZONE_SQLITE, db);
console.log(`[build-valhalla] Timezone database written to ${TIMEZONE_SQLITE} (${(db.length / 1024 / 1024).toFixed(1)} MB)`);
resolve();
});
});
}
type JsonObject = Record<string, unknown>;
/** Deep-merge override into base. Objects are merged recursively; arrays and
@ -75,13 +142,11 @@ function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
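The deepMerge body is elided by the hunk above; a minimal sketch consistent with its doc comment (nested objects merged recursively; the wholesale replacement of arrays and scalars is assumed from the truncated comment):

```typescript
type JsonObject = Record<string, unknown>;

function isPlainObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Sketch: recursively merge `override` into `base`. Nested plain objects are
// merged key by key; arrays and primitives from `override` replace the base
// value wholesale (assumed behavior, per the truncated doc comment).
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    out[key] =
      isPlainObject(value) && isPlainObject(out[key])
        ? deepMerge(out[key] as JsonObject, value)
        : value;
  }
  return out;
}
```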
/**
* Generate valhalla.json by starting from the canonical defaults produced by
* valhalla_build_config, then overlaying only the deployment-specific settings.
* This ensures every required field for the installed Valhalla version is present
* without us having to maintain a manual list.
*/
function generateConfig(): void {
mkdirSync(VALHALLA_TILES_DIR, { recursive: true });
mkdirSync(TRANSIT_CACHE_DIR, { recursive: true });
// Get the full default config for this exact Valhalla build.
let base: JsonObject = {};
try {
const out = execSync("valhalla_build_config", {
@ -94,13 +159,18 @@ function generateConfig(): void {
console.warn("[build-valhalla] valhalla_build_config failed, using empty base:", err);
}
// Only override settings specific to this deployment.
const overrides: JsonObject = {
mjolnir: {
tile_dir: VALHALLA_TILES_DIR,
tile_extract: `${VALHALLA_TILES_DIR}.tar`,
timezone: `${VALHALLA_TILES_DIR}/timezone.sqlite`,
admin: `${VALHALLA_TILES_DIR}/admins.sqlite`,
// Stored outside tile_dir so they survive crash-recovery wipes.
timezone: TIMEZONE_SQLITE,
admin: ADMINS_SQLITE,
// All transit operations (ingest, convert, service) read/write here.
transit_dir: TRANSIT_CACHE_DIR,
// valhalla_ingest_transit expects a directory whose subdirectories are
// individual GTFS feeds. feed/ (inside GTFS_DATA_DIR) is one such feed.
transit_feeds_dir: GTFS_DATA_DIR,
},
additional_data: {
elevation: "/data/elevation/",
@ -111,6 +181,12 @@ function generateConfig(): void {
timeout_seconds: 26,
},
},
service_limits: {
isochrone: {
// Transit scoring uses 5 contours [5,10,15,20,30]; Valhalla default is 4.
max_contours: 5,
},
},
};
const config = deepMerge(base, overrides);
@ -118,6 +194,16 @@ function generateConfig(): void {
console.log(`[build-valhalla] Config written to ${VALHALLA_CONFIG}`);
}
/**
* True when valhalla_ingest_transit has been run against the current GTFS data.
* Compares the ingest marker mtime against the GTFS source marker mtime written
* by download-gtfs-de after each successful extraction.
*/
function isTransitIngestFresh(): boolean {
if (!existsSync(TRANSIT_CACHE_MARKER) || !existsSync(GTFS_SOURCE_MARKER)) return false;
return statSync(TRANSIT_CACHE_MARKER).mtimeMs >= statSync(GTFS_SOURCE_MARKER).mtimeMs;
}
export async function handleBuildValhalla(
job: Job<BuildValhallaData>,
restartService: () => Promise<void>,
@ -133,12 +219,9 @@ export async function handleBuildValhalla(
generateConfig();
// ── Step 1: update the routing manifest ──────────────────────────────────
// The manifest maps citySlug → pbfPath for every city that should be
// included in the global tile set. It persists across container restarts.
const manifest = readManifest();
// Remove requested cities
for (const slug of removeSlugs) {
const clippedPbf = `${VALHALLA_DATA_DIR}/${slug}-routing.osm.pbf`;
if (existsSync(clippedPbf)) {
@ -148,7 +231,6 @@ export async function handleBuildValhalla(
delete manifest[slug];
}
// Add/update the city being ingested (absent for removal-only jobs)
if (citySlug && pbfPath) {
await job.updateProgress({
stage: "Building routing graph",
@ -175,7 +257,6 @@ export async function handleBuildValhalla(
]);
routingPbf = clippedPbf;
} else {
// No bbox: use the full PBF from the osm_data volume (mounted :ro here)
if (existsSync(pbfPath)) {
routingPbf = pbfPath;
} else {
@ -193,7 +274,7 @@ export async function handleBuildValhalla(
writeManifest(manifest);
// ── Step 2: build tiles from ALL registered cities ────────────────────────
// ── Step 2: check for cities to build ────────────────────────────────────
const allPbfs = Object.values(manifest).filter(existsSync);
const allSlugs = Object.keys(manifest);
@ -208,19 +289,116 @@ export async function handleBuildValhalla(
return;
}
// ── Step 3: build road tiles ──────────────────────────────────────────────
//
// valhalla_build_tiles MUST run before transit operations:
// • valhalla_convert_transit needs road tiles (GraphReader) to look up road
// node IDs for each transit stop — running it before this step causes the
// "Couldn't load .../0/000/000.pbf" crash.
//
// valhalla_build_tiles ignores any transit tiles in transit_dir (it filters
// them out of the hierarchy build), so there is no "transit connection" pass
// to worry about — transit connectivity is created by convert_transit.
await job.updateProgress({
stage: "Building routing graph",
pct: 10,
message: `Building global routing tiles for: ${allSlugs.join(", ")}`,
message: `Building road routing tiles for: ${allSlugs.join(", ")}`,
} satisfies JobProgress);
// valhalla_build_tiles accepts multiple PBF files as positional arguments,
// so we get one combined tile set covering all cities in a single pass.
await runProcess("valhalla_build_tiles", ["-c", VALHALLA_CONFIG, ...allPbfs]);
// Tiles are fully built — restart the service to pick them up.
// compute-routing jobs will transparently retry their in-flight matrix calls
// across the brief restart window (~5–10 s).
console.log("[build-valhalla] Road tiles built");
// ── Step 4: transit tile preparation ─────────────────────────────────────
//
// Transit runs after road tiles exist. Three sub-steps:
//
// 4a. timezone db — valhalla_build_timezones (one-time, skip if exists).
// valhalla_ingest_transit needs it to assign timezone info to stops.
// Without it, ingest skips writing the root index tile (0/000/000.pbf)
// and valhalla_convert_transit crashes trying to load it.
//
// 4b. valhalla_ingest_transit — GTFS → transit PBF tiles in transit_dir.
// Only re-run when GTFS data changed (expensive: can take hours).
//
// 4c. valhalla_convert_transit — transit PBFs + road tiles → transit graph.
// ALWAYS re-run after a road build because road node IDs change on
// every rebuild; old transit-to-road connections would be stale.
const gtfsReady =
existsSync(GTFS_FEED_DIR) &&
readdirSync(GTFS_FEED_DIR).some((f) => f.endsWith(".txt"));
if (gtfsReady) {
// 4a: timezone database — one-time setup, persists in VALHALLA_DATA_DIR.
// valhalla_ingest_transit needs this to assign timezone info to stops;
// without it the root index tile (0/000/000.pbf) is not written and
// valhalla_convert_transit crashes trying to load it.
if (!existsSync(TIMEZONE_SQLITE)) {
await job.updateProgress({
stage: "Building routing graph",
pct: 73,
message: "Building timezone database (one-time setup)…",
} satisfies JobProgress);
try {
await buildTimezoneDb();
} catch (err) {
console.warn("[build-valhalla] valhalla_build_timezones failed — skipping transit:", err);
// Can't safely run transit ingest without timezone db.
}
}
// 4b: ingest (only when GTFS changed, and only when timezone db is ready)
let ingestPbfsAvailable = isTransitIngestFresh();
if (!ingestPbfsAvailable && existsSync(TIMEZONE_SQLITE)) {
await job.updateProgress({
stage: "Building routing graph",
pct: 75,
message: "Ingesting GTFS transit feeds…",
} satisfies JobProgress);
try {
// Wipe stale/partial PBF tiles before ingesting.
rmSync(TRANSIT_CACHE_DIR, { recursive: true, force: true });
mkdirSync(TRANSIT_CACHE_DIR, { recursive: true });
await runProcess("valhalla_ingest_transit", ["-c", VALHALLA_CONFIG]);
writeFileSync(TRANSIT_CACHE_MARKER, new Date().toISOString());
ingestPbfsAvailable = true;
console.log("[build-valhalla] valhalla_ingest_transit completed");
} catch (err) {
console.warn("[build-valhalla] valhalla_ingest_transit failed (road routing unaffected):", err);
// Wipe partial output so convert doesn't try to read corrupt PBFs.
rmSync(TRANSIT_CACHE_DIR, { recursive: true, force: true });
mkdirSync(TRANSIT_CACHE_DIR, { recursive: true });
}
} else if (ingestPbfsAvailable) {
console.log("[build-valhalla] Transit ingest cache is fresh — skipping ingest");
} else {
console.log("[build-valhalla] timezone.sqlite unavailable — skipping transit ingest");
}
// 4c: convert (always, to reconnect transit to the new road graph)
if (ingestPbfsAvailable) {
await job.updateProgress({
stage: "Building routing graph",
pct: 85,
message: "Connecting transit tiles to road graph…",
} satisfies JobProgress);
try {
await runProcess("valhalla_convert_transit", ["-c", VALHALLA_CONFIG]);
console.log("[build-valhalla] valhalla_convert_transit completed");
} catch (err) {
console.warn("[build-valhalla] valhalla_convert_transit failed (road routing unaffected):", err);
}
}
} else {
console.log("[build-valhalla] No GTFS feed found — skipping transit tile prep");
}
// ── Step 5: restart Valhalla service ─────────────────────────────────────
await job.updateProgress({
stage: "Building routing graph",
pct: 95,

View file

@ -2,7 +2,7 @@ import type { Job } from "bullmq";
import { Queue, WaitingChildrenError } from "bullmq";
import { getSql } from "../db.js";
import { createBullMQConnection } from "../redis.js";
import type { JobProgress } from "@transportationer/shared";
import type { JobProgress, ComputeScoresJobData as ComputeScoresData } from "@transportationer/shared";
import {
CATEGORY_IDS,
PROFILES,
@ -10,17 +10,6 @@ import {
DEFAULT_SUBCATEGORY_WEIGHT,
} from "@transportationer/shared";
export type ComputeScoresData = {
type: "compute-scores";
citySlug: string;
modes: Array<"walking" | "cycling" | "driving">;
thresholds: number[];
/** Persisted after routing children are dispatched to distinguish phase 1 from phase 2. */
routingDispatched?: boolean;
/** When true, ingest-boris-ni is dispatched in Phase 1 to run alongside routing jobs. */
ingestBorisNi?: boolean;
};
const INSERT_CHUNK = 2000;
function subcategoryWeight(profileId: string, subcategory: string): number {
@ -96,11 +85,14 @@ export async function handleComputeScores(
// Enqueue one routing child per (mode, category). Each child registers
// itself to this parent job via opts.parent, so BullMQ tracks completion.
// Transit is handled by a single compute-transit job (not per-category)
// since it uses isochrones rather than the matrix API.
// For NI cities, ingest-boris-ni is also enqueued here so it runs in
// parallel with the routing jobs rather than sequentially after them.
const queue = new Queue("pipeline", { connection: createBullMQConnection() });
try {
for (const mode of modes) {
if (mode === "transit") continue; // handled below as a single job
for (const category of CATEGORY_IDS) {
await queue.add(
"compute-routing",
@ -121,6 +113,21 @@ export async function handleComputeScores(
}
}
// Dispatch transit scoring as a sibling child (one job covers all categories
// via PostGIS isochrone spatial joins, unlike per-category routing jobs).
if (modes.includes("transit")) {
await queue.add(
"compute-transit",
{ type: "compute-transit", citySlug },
{
attempts: 1,
removeOnComplete: { age: 86400 * 7 },
removeOnFail: { age: 86400 * 30 },
parent: { id: job.id!, queue: queue.qualifiedName },
},
);
}
// Dispatch BORIS NI ingest as a sibling child so it runs during routing.
if (job.data.ingestBorisNi) {
await queue.add(
@ -220,6 +227,66 @@ export async function handleComputeScores(
}
}
// Synthesize "multimodal" groups: for each (gpId, category, subcategory),
// take the minimum travel time across walking, cycling, and transit so that
// a destination reachable by any of those modes counts as accessible.
// Driving is intentionally excluded (not a 15-min city metric).
const MULTIMODAL_MODES = new Set(["walking", "cycling", "transit"]); // modes combined into "fifteen"
const mmAccumulator = new Map<string, {
gpId: string;
category: string;
subTimes: Map<string, number | null>;
nearestDistM: number | null;
nearestPoiId: string | null;
nearestTimeS: number | null;
}>();
for (const entry of groups.values()) {
if (!MULTIMODAL_MODES.has(entry.mode)) continue;
const mmKey = `${entry.gpId}:${entry.category}`;
if (!mmAccumulator.has(mmKey)) {
mmAccumulator.set(mmKey, {
gpId: entry.gpId,
category: entry.category,
subTimes: new Map(),
nearestDistM: null,
nearestPoiId: null,
nearestTimeS: null,
});
}
const acc = mmAccumulator.get(mmKey)!;
// Track nearest POI across all multimodal modes
if (entry.nearestDistM !== null && (acc.nearestDistM === null || entry.nearestDistM < acc.nearestDistM)) {
acc.nearestDistM = entry.nearestDistM;
acc.nearestPoiId = entry.nearestPoiId;
acc.nearestTimeS = entry.nearestTimeS;
}
// For each subcategory, keep the minimum travel time across modes
for (const { subcategory, timeS } of entry.subcategoryTimes) {
const existing = acc.subTimes.get(subcategory);
if (existing === undefined) {
acc.subTimes.set(subcategory, timeS);
} else if (existing === null && timeS !== null) {
acc.subTimes.set(subcategory, timeS);
} else if (timeS !== null && existing !== null && timeS < existing) {
acc.subTimes.set(subcategory, timeS);
}
}
}
for (const acc of mmAccumulator.values()) {
const key = `${acc.gpId}:fifteen:${acc.category}`;
groups.set(key, {
gpId: acc.gpId,
mode: "fifteen",
category: acc.category,
subcategoryTimes: Array.from(acc.subTimes.entries()).map(([subcategory, timeS]) => ({ subcategory, timeS })),
nearestPoiId: acc.nearestPoiId,
nearestDistM: acc.nearestDistM,
nearestTimeS: acc.nearestTimeS,
});
}
// Compute and insert scores for every threshold × profile combination.
// Each threshold writes to distinct rows (threshold_min is part of the PK),
// so all thresholds can be processed concurrently without conflicts.

View file

@ -0,0 +1,206 @@
/**
* Compute public-transport accessibility scores for every grid point in a city.
*
* Unlike walking/cycling (which use Valhalla's matrix endpoint), transit routing
* requires Valhalla's isochrone endpoint with multimodal costing, since
* sources_to_targets does not support transit. The approach:
*
* 1. For each grid point, fetch a transit isochrone (5/10/15/20/30 min contours).
* 2. Use PostGIS ST_Within to find which POIs fall in each contour band.
* 3. Assign estimated transit travel time = the smallest contour that contains the POI.
* 4. Write results to grid_poi_details with travel_mode = 'transit'.
*
* If transit routing fails for a grid point (e.g. no GTFS data, or the point is
 * outside the transit network), it is silently skipped — transit contributes
* nothing to that grid point's "fifteen" score, which then falls back to
* walking/cycling only.
*
* Transit scores are computed ONCE per city (not per category) since the isochrone
* polygon covers all categories in a single PostGIS spatial join.
*/
import type { Job } from "bullmq";
import { getSql } from "../db.js";
import { fetchTransitIsochrone } from "../valhalla.js";
import type { JobProgress } from "@transportationer/shared";
import { CATEGORY_IDS } from "@transportationer/shared";
export type ComputeTransitData = {
type: "compute-transit";
citySlug: string;
};
/** Grid points processed per concurrent Valhalla isochrone call. */
const BATCH_CONCURRENCY = 4;
/** Rows per INSERT. */
const INSERT_CHUNK = 2000;
async function asyncPool<T>(
concurrency: number,
items: T[],
fn: (item: T) => Promise<void>,
): Promise<void> {
const queue = [...items];
async function worker(): Promise<void> {
while (queue.length > 0) await fn(queue.shift()!);
}
await Promise.all(Array.from({ length: Math.min(concurrency, items.length) }, worker));
}
export async function handleComputeTransit(job: Job<ComputeTransitData>): Promise<void> {
const { citySlug } = job.data;
const sql = getSql();
const gridPoints = await Promise.resolve(sql<{ id: string; lat: number; lng: number }[]>`
SELECT id::text AS id, ST_Y(geom) AS lat, ST_X(geom) AS lng
FROM grid_points
WHERE city_slug = ${citySlug}
ORDER BY id
`);
if (gridPoints.length === 0) return;
await job.updateProgress({
stage: "Transit routing",
pct: 1,
message: `Computing transit isochrones for ${gridPoints.length} grid points…`,
} satisfies JobProgress);
// Accumulate insert arrays across all batches before bulk-inserting.
const gpIdArr: string[] = [];
const catArr: string[] = [];
const subcatArr: string[] = [];
const poiIdArr: (string | null)[] = [];
const poiNameArr: (string | null)[] = [];
const distArr: (number | null)[] = [];
const timeArr: (number | null)[] = [];
let processed = 0;
let withTransit = 0;
await asyncPool(BATCH_CONCURRENCY, gridPoints, async (gp) => {
const contours = await fetchTransitIsochrone({ lat: gp.lat, lng: gp.lng });
processed++;
if (!contours || contours.length === 0) {
// No transit coverage — skip silently. This grid point gets null transit
// times for all subcategories, so it contributes 0 to transit scores.
return;
}
withTransit++;
// Build per-contour geometry JSON strings for PostGIS.
// The CASE expression checks from innermost to outermost contour — the first
// matching contour gives the best (smallest) estimated transit time.
const c5 = contours.find((c) => c.minutes === 5)?.geojson ?? null;
const c10 = contours.find((c) => c.minutes === 10)?.geojson ?? null;
const c15 = contours.find((c) => c.minutes === 15)?.geojson ?? null;
const c20 = contours.find((c) => c.minutes === 20)?.geojson ?? null;
const c30 = contours.find((c) => c.minutes === 30)?.geojson ?? null;
// Outermost available contour — used for the initial spatial filter.
const outer = c30 ?? c20 ?? c15 ?? c10 ?? c5;
if (!outer) return;
const c5s = c5 ? JSON.stringify(c5) : null;
const c10s = c10 ? JSON.stringify(c10) : null;
const c15s = c15 ? JSON.stringify(c15) : null;
const c20s = c20 ? JSON.stringify(c20) : null;
const c30s = c30 ? JSON.stringify(c30) : null;
const outers = JSON.stringify(outer);
// One query per grid point: find the nearest POI per (category, subcategory)
// within the transit isochrone, classified by contour band.
const rows = await Promise.resolve(sql<{
poi_id: string;
poi_name: string | null;
category: string;
subcategory: string;
dist_m: number;
transit_time_s: number;
}[]>`
SELECT DISTINCT ON (p.category, p.subcategory)
p.osm_id::text AS poi_id,
p.name AS poi_name,
p.category,
p.subcategory,
ST_Distance(
ST_SetSRID(ST_Point(${gp.lng}, ${gp.lat}), 4326)::geography,
p.geom::geography
) AS dist_m,
CASE
${c5s ? sql`WHEN ST_Within(p.geom, ST_SetSRID(ST_GeomFromGeoJSON(${c5s }), 4326)) THEN 300` : sql``}
${c10s ? sql`WHEN ST_Within(p.geom, ST_SetSRID(ST_GeomFromGeoJSON(${c10s}), 4326)) THEN 600` : sql``}
${c15s ? sql`WHEN ST_Within(p.geom, ST_SetSRID(ST_GeomFromGeoJSON(${c15s}), 4326)) THEN 900` : sql``}
${c20s ? sql`WHEN ST_Within(p.geom, ST_SetSRID(ST_GeomFromGeoJSON(${c20s}), 4326)) THEN 1200` : sql``}
ELSE 1800
END AS transit_time_s
FROM raw_pois p
WHERE p.city_slug = ${citySlug}
AND ST_Within(p.geom, ST_SetSRID(ST_GeomFromGeoJSON(${outers}), 4326))
ORDER BY p.category, p.subcategory, transit_time_s ASC, dist_m ASC
`);
for (const row of rows) {
gpIdArr.push(gp.id);
catArr.push(row.category);
subcatArr.push(row.subcategory);
poiIdArr.push(row.poi_id);
poiNameArr.push(row.poi_name);
distArr.push(row.dist_m);
timeArr.push(row.transit_time_s);
}
if (processed % 50 === 0) {
await job.updateProgress({
stage: "Transit routing",
pct: Math.min(95, Math.round((processed / gridPoints.length) * 95)),
message: `${processed}/${gridPoints.length} grid points — ${withTransit} with transit coverage`,
} satisfies JobProgress);
}
});
console.log(`[compute-transit] ${withTransit}/${gridPoints.length} grid points have transit coverage`);
// Bulk-insert all results
for (let i = 0; i < gpIdArr.length; i += INSERT_CHUNK) {
const end = Math.min(i + INSERT_CHUNK, gpIdArr.length);
await Promise.resolve(sql`
INSERT INTO grid_poi_details (
grid_point_id, category, subcategory, travel_mode,
nearest_poi_id, nearest_poi_name, distance_m, travel_time_s
)
SELECT
gp_id::bigint,
cat,
subcat,
'transit',
CASE WHEN poi_id IS NULL THEN NULL ELSE poi_id::bigint END,
poi_name,
dist,
time_s
FROM unnest(
${gpIdArr.slice(i, end)}::text[],
${catArr.slice(i, end)}::text[],
${subcatArr.slice(i, end)}::text[],
${poiIdArr.slice(i, end)}::text[],
${poiNameArr.slice(i, end)}::text[],
${distArr.slice(i, end)}::float8[],
${timeArr.slice(i, end)}::float8[]
) AS t(gp_id, cat, subcat, poi_id, poi_name, dist, time_s)
ON CONFLICT (grid_point_id, category, subcategory, travel_mode)
DO UPDATE SET
nearest_poi_id = EXCLUDED.nearest_poi_id,
nearest_poi_name = EXCLUDED.nearest_poi_name,
distance_m = EXCLUDED.distance_m,
travel_time_s = EXCLUDED.travel_time_s,
computed_at = now()
`);
}
await job.updateProgress({
stage: "Transit routing",
pct: 100,
message: `Transit routing complete: ${gpIdArr.length} POI entries for ${withTransit}/${gridPoints.length} grid points`,
} satisfies JobProgress);
}

View file

@ -0,0 +1,481 @@
/**
* Download and extract a GTFS feed ZIP so Valhalla can build transit tiles.
*
* The feed is saved to GTFS_DATA_DIR (default /data/valhalla/gtfs) inside the
* valhalla container, which owns the valhalla_tiles Docker volume.
*
* After extraction the feed is clipped to the bounding boxes of each known city
* (plus a small buffer) so that valhalla_ingest_transit only processes stops and
 * trips that are relevant — reducing ingest time from hours to minutes for
* country-wide feeds like gtfs.de.
*
* After this job completes, the next build-valhalla run will automatically
* call valhalla_ingest_transit and produce transit-capable routing tiles.
*
* Source: https://download.gtfs.de/germany/nv_free/latest.zip
* Covers all German ÖPNV (local public transport), updated regularly.
*/
import type { Job } from "bullmq";
import {
createReadStream,
createWriteStream,
existsSync,
mkdirSync,
readdirSync,
renameSync,
rmSync,
readFileSync,
writeFileSync,
} from "fs";
import { mkdir } from "fs/promises";
import { pipeline } from "stream/promises";
import { Readable } from "stream";
import { createInterface } from "readline";
import * as path from "path";
import unzipper from "unzipper";
import type { JobProgress } from "@transportationer/shared";
export type DownloadGtfsDeData = {
type: "download-gtfs-de";
url: string;
/** Re-download even if data already exists (default: false) */
force?: boolean;
/**
* Per-city bounding boxes [minLng, minLat, maxLng, maxLat] used to clip the
* feed after extraction. Each bbox should already include a buffer. A stop is
* kept when it falls inside ANY of the boxes. When absent the full feed is kept.
*/
bboxes?: [number, number, number, number][];
};
const GTFS_DATA_DIR = process.env.GTFS_DATA_DIR ?? "/data/valhalla/gtfs";
const GTFS_ZIP_PATH = `${GTFS_DATA_DIR}/feed.zip`;
const GTFS_FEED_DIR = `${GTFS_DATA_DIR}/feed`;
/** Records which source/bboxes last populated GTFS_FEED_DIR. JSON format. */
const SOURCE_MARKER = `${GTFS_FEED_DIR}/.source`;
// ─── Source marker helpers ────────────────────────────────────────────────────
interface SourceMarker {
source: string;
bboxes?: [number, number, number, number][];
}
function readSourceMarker(): SourceMarker | null {
if (!existsSync(SOURCE_MARKER)) return null;
const content = readFileSync(SOURCE_MARKER, "utf8").trim();
try {
return JSON.parse(content) as SourceMarker;
} catch {
// Legacy format: plain string written by older versions
return { source: content };
}
}
function writeSourceMarker(source: string, bboxes?: [number, number, number, number][]): void {
writeFileSync(SOURCE_MARKER, JSON.stringify({ source, bboxes }));
}
/** True when `outer` fully contains `inner`. */
function bboxContains(
outer: [number, number, number, number],
inner: [number, number, number, number],
): boolean {
return outer[0] <= inner[0] && outer[1] <= inner[1] && outer[2] >= inner[2] && outer[3] >= inner[3];
}
/**
* True when every bbox in `requested` is covered by at least one bbox in `existing`.
* If `existing` is empty/absent the data was unfiltered, which covers everything.
*/
function allBboxesCovered(
existing: [number, number, number, number][] | undefined,
requested: [number, number, number, number][],
): boolean {
if (!existing || existing.length === 0) return true; // unfiltered → covers all
return requested.every((req) => existing.some((ex) => bboxContains(ex, req)));
}
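A hedged sketch of how these helpers might feed the handler's skip decision (`shouldSkipDownload` is hypothetical, and the two helpers are repeated verbatim so the sketch is self-contained; the actual handler body is not part of this excerpt):

```typescript
type Bbox = [number, number, number, number]; // [minLng, minLat, maxLng, maxLat]

function bboxContains(outer: Bbox, inner: Bbox): boolean {
  return outer[0] <= inner[0] && outer[1] <= inner[1] && outer[2] >= inner[2] && outer[3] >= inner[3];
}

function allBboxesCovered(existing: Bbox[] | undefined, requested: Bbox[]): boolean {
  if (!existing || existing.length === 0) return true; // unfiltered → covers all
  return requested.every((req) => existing.some((ex) => bboxContains(ex, req)));
}

// Hypothetical: reuse the extracted feed only when it came from the same URL
// and the cached clip already covers every requested bbox.
function shouldSkipDownload(
  data: { url: string; force?: boolean; bboxes?: Bbox[] },
  marker: { source: string; bboxes?: Bbox[] } | null,
): boolean {
  if (data.force || !marker || marker.source !== data.url) return false;
  if (!data.bboxes || data.bboxes.length === 0) {
    // Full feed requested: only an unfiltered cached feed covers that.
    return !marker.bboxes || marker.bboxes.length === 0;
  }
  return allBboxesCovered(marker.bboxes, data.bboxes);
}
```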
// ─── GTFS bbox filter ─────────────────────────────────────────────────────────
/**
* Clip an extracted GTFS feed in-place to the union of the given bboxes.
*
* Algorithm:
* 1. Filter stops.txt: keep stops whose lat/lon falls inside ANY bbox.
 * 2. Pass 1 over stop_times.txt (streaming): collect trip_ids with ≥ 1 stop
 *    inside a bbox.
* 3. Pass 2 over stop_times.txt (streaming): write filtered rows to a temp
* file, then replace the original.
 * 4. Filter trips.txt → collect validRouteIds / validServiceIds / validShapeIds.
* 5. Filter routes.txt, calendar.txt, calendar_dates.txt.
* 6. Stream-filter shapes.txt (can be large).
*/
async function filterGtfsByBboxes(
feedDir: string,
bboxes: [number, number, number, number][],
): Promise<void> {
if (bboxes.length === 0) return;
console.log(
`[download-gtfs-de] Filtering GTFS to ${bboxes.length} bbox(es):`,
bboxes.map((b) => `[${b.map((v) => v.toFixed(3)).join(",")}]`).join(" "),
);
// ── CSV helpers ─────────────────────────────────────────────────────────────
function splitCsv(line: string): string[] {
if (!line.includes('"')) return line.split(",");
const result: string[] = [];
let current = "";
let inQuotes = false;
for (let i = 0; i < line.length; i++) {
const ch = line[i];
if (ch === '"') {
if (inQuotes && line[i + 1] === '"') { current += '"'; i++; }
else inQuotes = !inQuotes;
} else if (ch === "," && !inQuotes) {
result.push(current); current = "";
} else {
current += ch;
}
}
result.push(current);
return result;
}
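The splitter above follows RFC 4180 quoting rules: fields containing commas are wrapped in quotes, and a doubled quote inside a quoted field is a literal quote. A standalone copy with sample rows (the row contents are invented):

```typescript
// Standalone sketch of the quote-aware CSV splitter above (RFC 4180 quoting).
function splitCsvLine(line: string): string[] {
  if (!line.includes('"')) return line.split(",");
  const result: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      // "" inside a quoted field decodes to a literal quote
      if (inQuotes && line[i + 1] === '"') { current += '"'; i++; }
      else inQuotes = !inQuotes;
    } else if (ch === "," && !inQuotes) {
      result.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  result.push(current);
  return result;
}

console.log(splitCsvLine("a,b,c")); // → ["a", "b", "c"]
console.log(splitCsvLine('1,"Hannover, Hbf",x')); // → ["1", "Hannover, Hbf", "x"]
console.log(splitCsvLine('"say ""hi""",2')); // → ['say "hi"', "2"]
```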
function colIndex(header: string): Map<string, number> {
return new Map(splitCsv(header).map((c, i) => [c.trim().replace(/^\uFEFF/, ""), i]));
}
function inAnyBbox(lat: number, lon: number): boolean {
return bboxes.some(([minLng, minLat, maxLng, maxLat]) =>
lat >= minLat && lat <= maxLat && lon >= minLng && lon <= maxLng,
);
}
/** Filter a small CSV file (fits in memory) in-place. */
function filterSmallCsv(
filePath: string,
keepRow: (idx: Map<string, number>, fields: string[]) => boolean,
onKept?: (idx: Map<string, number>, fields: string[]) => void,
): void {
if (!existsSync(filePath)) return;
const lines = readFileSync(filePath, "utf8").split(/\r?\n/).filter((l) => l.trim());
if (lines.length < 2) return;
const idx = colIndex(lines[0]);
const out = [lines[0]];
for (let i = 1; i < lines.length; i++) {
const fields = splitCsv(lines[i]);
if (keepRow(idx, fields)) {
if (onKept) onKept(idx, fields);
out.push(lines[i]);
}
}
writeFileSync(filePath, out.join("\n") + "\n");
}
/** Stream-filter a large CSV file in-place via a temp file. */
async function filterLargeCsv(
filePath: string,
keepRow: (targetCol: number, line: string) => boolean,
getTargetCol: (idx: Map<string, number>) => number,
): Promise<void> {
if (!existsSync(filePath)) return;
const tmpPath = filePath + ".tmp";
const writer = createWriteStream(tmpPath);
let isFirst = true;
let targetCol = -1;
const rl = createInterface({ input: createReadStream(filePath), crlfDelay: Infinity });
for await (const line of rl) {
if (!line.trim()) continue;
if (isFirst) {
isFirst = false;
targetCol = getTargetCol(colIndex(line));
writer.write(line + "\n");
continue;
}
if (keepRow(targetCol, line)) writer.write(line + "\n");
}
await new Promise<void>((resolve, reject) =>
writer.end((err?: unknown) => (err ? reject(err) : resolve())),
);
renameSync(tmpPath, filePath);
}
// ── Step 1: filter stops.txt by bbox → validStopIds ──────────────────────
const stopsPath = path.join(feedDir, "stops.txt");
if (!existsSync(stopsPath)) {
console.log("[download-gtfs-de] No stops.txt — skipping GTFS bbox filter");
return;
}
const validStopIds = new Set<string>();
filterSmallCsv(
stopsPath,
(idx, fields) => {
const lat = parseFloat(fields[idx.get("stop_lat") ?? -1] ?? "NaN");
const lon = parseFloat(fields[idx.get("stop_lon") ?? -1] ?? "NaN");
return inAnyBbox(lat, lon);
},
(idx, fields) => {
validStopIds.add(fields[idx.get("stop_id") ?? -1] ?? "");
},
);
console.log(`[download-gtfs-de] Bbox filter: ${validStopIds.size} stops in area`);
if (validStopIds.size === 0) {
console.warn(
"[download-gtfs-de] No stops found in any bbox — GTFS filter skipped " +
"(check bbox coverage and feed area)",
);
return;
}
// ── Step 2 (pass 1): collect trip_ids that serve the area ─────────────────
const stopTimesPath = path.join(feedDir, "stop_times.txt");
if (!existsSync(stopTimesPath)) {
console.log("[download-gtfs-de] No stop_times.txt — skipping trip filter");
return;
}
const validTripIds = new Set<string>();
{
let stopIdCol = -1;
let tripIdCol = -1;
let isFirst = true;
const rl = createInterface({ input: createReadStream(stopTimesPath), crlfDelay: Infinity });
for await (const line of rl) {
if (!line.trim()) continue;
if (isFirst) {
isFirst = false;
const idx = colIndex(line);
stopIdCol = idx.get("stop_id") ?? -1;
tripIdCol = idx.get("trip_id") ?? -1;
continue;
}
// stop_id and trip_id never contain commas/quotes — fast split is safe
const fields = line.split(",");
if (stopIdCol >= 0 && validStopIds.has(fields[stopIdCol] ?? "")) {
validTripIds.add(fields[tripIdCol] ?? "");
}
}
}
console.log(`[download-gtfs-de] Bbox filter: ${validTripIds.size} trips serve the area`);
// ── Step 2 (pass 2): write filtered stop_times.txt ────────────────────────
await filterLargeCsv(
stopTimesPath,
(tripIdCol, line) => validTripIds.has(line.split(",")[tripIdCol] ?? ""),
(idx) => idx.get("trip_id") ?? -1,
);
// ── Step 3: filter trips.txt ───────────────────────────────────────────────
const validRouteIds = new Set<string>();
const validServiceIds = new Set<string>();
const validShapeIds = new Set<string>();
filterSmallCsv(
path.join(feedDir, "trips.txt"),
(idx, fields) => validTripIds.has(fields[idx.get("trip_id") ?? -1] ?? ""),
(idx, fields) => {
validRouteIds.add(fields[idx.get("route_id") ?? -1] ?? "");
validServiceIds.add(fields[idx.get("service_id") ?? -1] ?? "");
const shapeId = fields[idx.get("shape_id") ?? -1] ?? "";
if (shapeId) validShapeIds.add(shapeId);
},
);
// ── Step 4: filter remaining files ────────────────────────────────────────
filterSmallCsv(
path.join(feedDir, "routes.txt"),
(idx, fields) => validRouteIds.has(fields[idx.get("route_id") ?? -1] ?? ""),
);
for (const name of ["calendar.txt", "calendar_dates.txt"] as const) {
filterSmallCsv(
path.join(feedDir, name),
(idx, fields) => validServiceIds.has(fields[idx.get("service_id") ?? -1] ?? ""),
);
}
// shapes.txt can be large — stream it
if (validShapeIds.size > 0) {
await filterLargeCsv(
path.join(feedDir, "shapes.txt"),
(col, line) => validShapeIds.has(line.split(",")[col] ?? ""),
(idx) => idx.get("shape_id") ?? -1,
);
}
console.log(
`[download-gtfs-de] GTFS filter complete: ` +
`${validStopIds.size} stops, ${validTripIds.size} trips, ${validRouteIds.size} routes`,
);
}
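The stop_times.txt handling is the core of the algorithm: pass 1 collects the trip_ids that touch the area, pass 2 keeps every row of those trips (including their stops outside the bbox, so kept trips stay complete). A minimal in-memory sketch of that two-pass pattern, with the file replaced by an array of lines; the column layout mirrors GTFS but the rows are invented:

```typescript
// In-memory sketch of the two-pass filter used for stop_times.txt above.
// The real code streams via readline; here the "file" is an array of lines.
function twoPassFilter(
  lines: string[],          // header + data rows
  matchCol: number,         // column checked in pass 1 (e.g. stop_id)
  keyCol: number,           // column collected and kept (e.g. trip_id)
  validMatches: Set<string>,
): string[] {
  // Pass 1: collect keys whose row matches.
  const validKeys = new Set<string>();
  for (const line of lines.slice(1)) {
    const f = line.split(",");
    if (validMatches.has(f[matchCol] ?? "")) validKeys.add(f[keyCol] ?? "");
  }
  // Pass 2: keep every row of a kept key, even rows whose stop is
  // outside the bbox, so kept trips stay complete.
  return [lines[0], ...lines.slice(1).filter(
    (line) => validKeys.has(line.split(",")[keyCol] ?? ""),
  )];
}

const stopTimes = [
  "trip_id,stop_id,stop_sequence",
  "T1,S1,1", "T1,S9,2",   // T1 touches S1 (inside bbox) → keep both rows
  "T2,S9,1", "T2,S8,2",   // T2 never enters the bbox → drop
];
console.log(twoPassFilter(stopTimes, 1, 0, new Set(["S1"])));
// → ["trip_id,stop_id,stop_sequence", "T1,S1,1", "T1,S9,2"]
```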
// ─── Job handler ──────────────────────────────────────────────────────────────
export async function handleDownloadGtfsDe(job: Job<DownloadGtfsDeData>): Promise<void> {
const { url, force = false, bboxes } = job.data;
const effectiveSource = "gtfs-de";
// ── Idempotency check ──────────────────────────────────────────────────────
//
// Skip entirely when source is unchanged AND data is present AND the existing
// filter already covers all requested bboxes.
//
// Filter-only (no re-download) when data is present with the same source but
// the existing data is unfiltered (marker has no bboxes) while bboxes are now
// requested. The unfiltered data on disk is the superset we need.
//
// Re-download when source changes OR the existing filter bbox set no longer
// covers all requested bboxes (e.g. a new city was added outside the
// previously covered area).
const existingMarker = readSourceMarker();
const sourceChanged = existingMarker?.source !== effectiveSource;
const dataExists = existsSync(GTFS_FEED_DIR) &&
readdirSync(GTFS_FEED_DIR).some((f) => f.endsWith(".txt"));
if (!force && !sourceChanged && dataExists) {
const existingBboxes = existingMarker?.bboxes;
// Does the existing filtered data cover all requested bboxes?
const bboxesCovered = !bboxes?.length || allBboxesCovered(existingBboxes, bboxes);
if (bboxesCovered) {
// Marker already reflects desired filtering?
const markerOk = !bboxes?.length || (existingBboxes && existingBboxes.length > 0);
if (markerOk) {
console.log(`[download-gtfs-de] GTFS feed up to date (source=${effectiveSource}), skipping`);
await job.updateProgress({
stage: "Downloading GTFS",
pct: 100,
message: "GTFS data already present and up to date.",
} satisfies JobProgress);
return;
}
// Data is unfiltered but bboxes are now requested — filter in place.
console.log(`[download-gtfs-de] Applying bbox filter to existing GTFS data`);
await job.updateProgress({
stage: "Downloading GTFS",
pct: 10,
message: "Filtering existing GTFS feed to city areas…",
} satisfies JobProgress);
await filterGtfsByBboxes(GTFS_FEED_DIR, bboxes!);
writeSourceMarker(effectiveSource, bboxes);
await job.updateProgress({
stage: "Downloading GTFS",
pct: 100,
message: "GTFS feed filtered to city areas.",
} satisfies JobProgress);
return;
}
// Existing filter too small — need fresh data from network.
console.log(`[download-gtfs-de] Existing GTFS filter too small for new areas — re-downloading`);
}
if (sourceChanged) {
console.log(
`[download-gtfs-de] Source changed ` +
    `(${existingMarker?.source ?? "none"} → ${effectiveSource}), re-downloading`,
);
}
mkdirSync(GTFS_DATA_DIR, { recursive: true });
// ── Download ───────────────────────────────────────────────────────────────
await job.updateProgress({
stage: "Downloading GTFS",
pct: 5,
message: `Downloading GTFS feed (source: ${effectiveSource})…`,
} satisfies JobProgress);
const response = await fetch(url, { signal: AbortSignal.timeout(600_000) });
if (!response.ok || !response.body) {
throw new Error(`Failed to download GTFS: HTTP ${response.status} ${response.statusText}`);
}
const totalBytes = Number(response.headers.get("content-length") ?? 0);
let downloadedBytes = 0;
let lastReportedPct = 5;
const nodeReadable = Readable.fromWeb(response.body as Parameters<typeof Readable.fromWeb>[0]);
nodeReadable.on("data", (chunk: Buffer) => {
downloadedBytes += chunk.length;
if (totalBytes > 0) {
const pct = Math.min(55, 5 + Math.round((downloadedBytes / totalBytes) * 50));
if (pct > lastReportedPct + 4) {
lastReportedPct = pct;
void job.updateProgress({
stage: "Downloading GTFS",
pct,
message: `Downloading… ${(downloadedBytes / 1024 / 1024).toFixed(1)} / ${(totalBytes / 1024 / 1024).toFixed(1)} MB`,
bytesDownloaded: downloadedBytes,
totalBytes,
} satisfies JobProgress);
}
}
});
await pipeline(nodeReadable, createWriteStream(GTFS_ZIP_PATH));
console.log(`[download-gtfs-de] Downloaded ${(downloadedBytes / 1024 / 1024).toFixed(1)} MB`);
// ── Extract ────────────────────────────────────────────────────────────────
await job.updateProgress({
stage: "Downloading GTFS",
pct: 60,
message: "Extracting GTFS feed…",
} satisfies JobProgress);
if (existsSync(GTFS_FEED_DIR)) rmSync(GTFS_FEED_DIR, { recursive: true, force: true });
mkdirSync(GTFS_FEED_DIR, { recursive: true });
const zip = unzipper.Parse({ forceStream: true });
createReadStream(GTFS_ZIP_PATH).pipe(zip);
for await (const entry of zip) {
const e = entry as unzipper.Entry;
const destPath = path.join(GTFS_FEED_DIR, path.basename(e.path));
if (e.type === "Directory") { e.autodrain(); continue; }
await mkdir(path.dirname(destPath), { recursive: true });
await pipeline(e as unknown as NodeJS.ReadableStream, createWriteStream(destPath));
}
const extractedFiles = readdirSync(GTFS_FEED_DIR);
console.log(`[download-gtfs-de] Extracted ${extractedFiles.length} files to ${GTFS_FEED_DIR}`);
rmSync(GTFS_ZIP_PATH, { force: true });
// ── Bbox filter ────────────────────────────────────────────────────────────
if (bboxes && bboxes.length > 0) {
await job.updateProgress({
stage: "Downloading GTFS",
pct: 65,
message: `Filtering GTFS feed to ${bboxes.length} city area(s)…`,
} satisfies JobProgress);
await filterGtfsByBboxes(GTFS_FEED_DIR, bboxes);
}
writeSourceMarker(effectiveSource, bboxes?.length ? bboxes : undefined);
await job.updateProgress({
stage: "Downloading GTFS",
pct: 100,
message: bboxes?.length
? `GTFS feed ready and filtered to ${bboxes.length} city area(s) (source: ${effectiveSource}).`
: `GTFS feed ready: ${extractedFiles.length} files (source: ${effectiveSource}).`,
} satisfies JobProgress);
}


@@ -26,7 +26,10 @@ export async function handleDownloadPbf(
mkdirSync(OSM_DATA_DIR, { recursive: true });
const outputPath = `${OSM_DATA_DIR}/${citySlug}-latest.osm.pbf`;
// Use job.id in the tmp path so two concurrent download-pbf jobs for the
// same city (one under extract-pois, one under build-valhalla) don't write
// to the same file and corrupt each other.
const tmpPath = `${outputPath}.${job.id}.tmp`;
// Idempotency: skip if a complete file is already on disk (supports
// parallel download-pbf instances for the same city PBF).


@@ -10,6 +10,8 @@ export type RefreshCityData = {
citySlug: string;
geofabrikUrl: string;
resolutionM?: number;
/** Set after flow.add() — the ID of the enqueued compute-scores job. */
computeScoresJobId?: string;
};
const OSM_DATA_DIR = process.env.OSM_DATA_DIR ?? "/data/osm";
@@ -29,21 +31,45 @@ export async function handleRefreshCity(
// Read the user-specified bbox from the database (set at city creation time).
// If present, it will be passed to extract-pois to clip the PBF before import.
// Also read ALL city bboxes for the GTFS filter: each city gets its own bbox
// (with a small buffer) so valhalla_ingest_transit only processes relevant stops.
const [bboxRows, allCityBboxRows] = await Promise.all([
Promise.resolve(sql<{
minlng: number; minlat: number; maxlng: number; maxlat: number;
}[]>`
SELECT
ST_XMin(bbox)::float AS minlng,
ST_YMin(bbox)::float AS minlat,
ST_XMax(bbox)::float AS maxlng,
ST_YMax(bbox)::float AS maxlat
FROM cities WHERE slug = ${citySlug} AND bbox IS NOT NULL
`),
Promise.resolve(sql<{
minlng: number; minlat: number; maxlng: number; maxlat: number;
}[]>`
SELECT
ST_XMin(bbox)::float AS minlng,
ST_YMin(bbox)::float AS minlat,
ST_XMax(bbox)::float AS maxlng,
ST_YMax(bbox)::float AS maxlat
FROM cities WHERE bbox IS NOT NULL
`),
]);
const bbox: [number, number, number, number] | undefined =
bboxRows.length > 0
? [bboxRows[0].minlng, bboxRows[0].minlat, bboxRows[0].maxlng, bboxRows[0].maxlat]
: undefined;
// ~10 km buffer for GTFS stop coverage near city edges (0.09° of latitude ≈ 10 km)
const GTFS_BUFFER = 0.09;
const gtfsBboxes: [number, number, number, number][] = allCityBboxRows.map((r) => [
r.minlng - GTFS_BUFFER,
r.minlat - GTFS_BUFFER,
r.maxlng + GTFS_BUFFER,
r.maxlat + GTFS_BUFFER,
]);
await job.updateProgress({
stage: "Orchestrating pipeline",
pct: 0,
@@ -65,15 +91,16 @@ export async function handleRefreshCity(
opts: JOB_OPTIONS["download-pbf"],
});
// For NI cities: ingest-boris-ni is dispatched in Phase 1 of compute-scores.
const niApplicable = !!(bbox && isInNiedersachsen(...bbox));
// Parallel pipeline DAG (bottom-up — leaves execute first):
//
// download-pbf ──────┬─→ extract-pois ────────────────────┐
// │ ├─→ generate-grid → compute-scores
// download-pbf ──┐ └─→ build-valhalla ──────────────────┘
// └──→ build-valhalla (waits for both ↑)
// download-gtfs-de ──┘
//
// compute-scores Phase 1 also dispatches ingest-boris-ni (NI cities only)
// as a child alongside the routing jobs, so it runs during routing.
@@ -83,7 +110,7 @@
data: {
type: "compute-scores" as const,
citySlug,
modes: ["walking", "cycling", "driving", "transit"] as const,
thresholds: [5, 10, 15, 20, 30],
ingestBorisNi: niApplicable,
},
@@ -120,7 +147,24 @@
...(bbox ? { bbox } : {}),
},
opts: JOB_OPTIONS["build-valhalla"],
children: [
downloadNode(),
// Download GTFS feed before building tiles so valhalla_build_transit
// runs during this build. The job is idempotent — it skips immediately
// if the feed is already present, so subsequent refreshes are cheap.
{
name: "download-gtfs-de",
queueName: "valhalla",
data: {
type: "download-gtfs-de" as const,
url: "https://download.gtfs.de/germany/nv_free/latest.zip",
// Per-city bboxes (with ~10 km buffer) so valhalla_ingest_transit
// only processes stops/trips relevant to the known cities.
...(gtfsBboxes.length > 0 ? { bboxes: gtfsBboxes } : {}),
},
opts: JOB_OPTIONS["download-gtfs-de"],
},
],
},
],
},
@@ -128,12 +172,21 @@
};
const flow = new FlowProducer({ connection: createBullMQConnection() });
let computeScoresJobId: string | undefined;
try {
const jobNode = await flow.add(rootNode);
// jobNode.job is the root (compute-scores) job. Store its ID in this
// refresh-city job's data so the SSE stream can match by exact job ID
// rather than by citySlug (which would match stale completed jobs).
computeScoresJobId = jobNode.job.id;
} finally {
await flow.close();
}
if (computeScoresJobId) {
await job.updateData({ ...job.data, computeScoresJobId });
}
await job.updateProgress({
stage: "Orchestrating pipeline",
pct: 100,


@@ -3,6 +3,7 @@ import { spawn, type ChildProcess } from "child_process";
import { existsSync } from "fs";
import { createBullMQConnection } from "./redis.js";
import { handleBuildValhalla } from "./jobs/build-valhalla.js";
import { handleDownloadGtfsDe } from "./jobs/download-gtfs-de.js";
const VALHALLA_CONFIG = process.env.VALHALLA_CONFIG ?? "/data/valhalla/valhalla.json";
@@ -62,6 +63,10 @@ const worker = new Worker(
startValhallaService();
}
if (job.data.type === "download-gtfs-de") {
await handleDownloadGtfsDe(job as any);
return;
}
await handleBuildValhalla(job as any, restartService);
},
{


@@ -6,6 +6,71 @@ const COSTING: Record<"walking" | "cycling" | "driving", string> = {
driving: "auto",
};
// Standard contour times used for transit isochrones.
// Must match the scoring thresholds used in compute-scores.
export const TRANSIT_CONTOUR_MINUTES = [5, 10, 15, 20, 30] as const;
// Fixed weekday morning departure for reproducible transit scores.
// GTFS schedules repeat weekly, so the exact date doesn't matter as long as
// it falls inside the feed's calendar validity window; any such Tuesday works.
const TRANSIT_DEPARTURE = "2024-01-16T08:00";
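As a quick sanity check on the constant above, `Date.prototype.getDay()` returns 2 for Tuesday (a timestamp without a timezone offset is parsed as local time, so the local weekday is stable across timezones):

```typescript
// Verify the fixed departure constant falls on a Tuesday (getDay() === 2).
const TRANSIT_DEPARTURE = "2024-01-16T08:00";
const weekday = new Date(TRANSIT_DEPARTURE).getDay(); // 0 = Sunday … 6 = Saturday
console.log(weekday); // 2 → Tuesday
```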
export interface TransitContour {
minutes: number;
/** GeoJSON Polygon or MultiPolygon geometry of the reachable area */
geojson: object;
}
/**
* Fetch a transit isochrone for a point using Valhalla's multimodal costing.
* Returns an array of contour polygons sorted from smallest to largest,
* or null if transit routing fails (e.g. no GTFS data loaded in Valhalla).
*/
export async function fetchTransitIsochrone(
source: LatLng,
): Promise<TransitContour[] | null> {
const body = {
locations: [{ lat: source.lat, lon: source.lng }],
costing: "multimodal",
contours: TRANSIT_CONTOUR_MINUTES.map((t) => ({ time: t })),
polygons: true,
costing_options: {
transit: { use_bus: 1.0, use_rail: 1.0, use_transfers: 1.0 },
},
    date_time: { type: 1, value: TRANSIT_DEPARTURE }, // type 1 = depart at
};
let resp: Response;
try {
resp = await fetch(`${VALHALLA_URL}/isochrone`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
signal: AbortSignal.timeout(30_000),
});
} catch {
return null;
}
if (!resp.ok) return null;
let data: { features?: Array<{ properties: { contour: number }; geometry: object }>; error?: unknown; error_code?: unknown };
try {
data = await resp.json() as typeof data;
} catch {
return null;
}
if (data.error || data.error_code || !Array.isArray(data.features)) return null;
const contours: TransitContour[] = [];
for (const minutes of TRANSIT_CONTOUR_MINUTES) {
const feature = data.features.find((f) => f.properties?.contour === minutes);
if (feature?.geometry) contours.push({ minutes, geojson: feature.geometry });
}
return contours.length >= 2 ? contours : null;
}
export interface LatLng {
lat: number;
lng: number;