Jobhunt - Architecture
Visual map of how a job moves from "posted on a site somewhere" to "Jonny sees it in the dashboard / Slack / email digest". The dashed boxes mark every point where this chat-session Claude (me) differs from the production scorer (the agent).
End-to-end flow
flowchart TD
classDef source fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
classDef store fill:#e0e7ff,stroke:#4338ca,color:#312e81
classDef gate fill:#fef3c7,stroke:#b45309,color:#78350f
classDef llm fill:#fee2e2,stroke:#b91c1c,color:#7f1d1d
classDef notify fill:#dcfce7,stroke:#15803d,color:#14532d
classDef diff fill:#fce7f3,stroke:#be185d,color:#831843,stroke-dasharray: 4 3
subgraph SOURCES["🌐 Scrapers (every 15-240 min depending on source)"]
UP[Upwork]:::source
LI[LinkedIn]:::source
IN[Indeed via FlareSolverr]:::source
RR[RemoteRocketship]:::source
CO[Consortia]:::source
IP[IntelligentPeople]:::source
GE[Generic engine: CWJobs, CV-Library, Built In, Hays, Robert Walters, Michael Page, Sphere, Cranberry Panda, Jobserve]:::source
end
SOURCES --> DB[(SQLite jobs table)]:::store
DB --> SUFFCHECK{"🚪 Sufficiency gate<br/>(>=500 chars, no read-more,<br/>terminal punct, structural signal)"}:::gate
SUFFCHECK -->|insufficient| FETCH[Per-source detail fetch:<br/>LinkedIn / Indeed / generic HTTP]
FETCH --> SUFFCHECK
SUFFCHECK -->|still insufficient<br/>after 5 attempts| UNSCORABLE[status='unscorable'<br/>surfaces in dashboard<br/>for paste-and-rescore]:::store
SUFFCHECK -->|sufficient| SYSPROMPT["📜 System prompt<br/>criteria.md + portfolio.md<br/>(with PART B/C routing by source)"]
SYSPROMPT --> LLM["🧠 Scorer<br/>Claude Sonnet 4.6 via tool-use<br/>returns score + location_mode + engagement"]:::llm
LLM --> CAPS["⚙️ _apply_caps()<br/>Python hard-clamp:<br/>onsite -> 3, hybrid_heavy -> 6, unknown -> 7"]:::gate
CAPS --> SCORED[(scored job in DB)]:::store
SCORED --> ROUTE{score band}
ROUTE -->|9-10| SLACK[Slack instant ping]:::notify
ROUTE -->|7-8| EMAIL[Daily 07:00 Madrid email digest]:::notify
ROUTE -->|all| DASH[Flask dashboard localhost:5001<br/>filter by source/identity/status/score]:::notify
%% Where I differ from the agent
ME1[💬 Me in chat: WebFetch live URL<br/>often sees full spec the agent's DB row lacks]:::diff
ME2[💬 Me in chat: full conversation history<br/>your specific corrections, founder gap acceptance,<br/>fintech subdomain distinctions, etc]:::diff
ME3[💬 Me in chat: apply caps by judgement<br/>not via Python clamp]:::diff
ME4[💬 Me in chat: can read criteria + reason explicitly<br/>without LLM softening pull]:::diff
ME1 -.-> SUFFCHECK
ME2 -.-> SYSPROMPT
ME3 -.-> CAPS
ME4 -.-> LLM
Where the agent differs from this chat (and how we close the gap)
| Difference | What I have | What the agent has | Mitigation |
|---|---|---|---|
| Job-spec visibility | I can WebFetch live page anytime | Agent reads description column - empty until detail-fetch runs | ✅ Sufficiency gate added today. Description must be ≥500 chars + structural signals before scoring. Backfill running for historical empty rows. |
| Conversation context | Full session history - corrections, founder-gap acceptance, payments≠retail banking distinction | Stateless per call | Codify everything in criteria.md + portfolio.md. Anything I know that should affect scoring goes in the system prompt. No verbal-only context. |
| Cap enforcement | Apply caps by judgement, can't be "softened" | LLM tendency to soften caps for great-looking roles | Python _apply_caps() hard-clamps after LLM returns. Code, not prose. |
| Reasoning visibility | I think out loud, you can correct me mid-stream | Single shot, returns just a score + 120-char reason | Add chain-of-thought scaffold to prompt so the LLM has to enumerate hard gates + count pass/fail BEFORE the score. Done in Task #11. |
| Tools | WebFetch, WebSearch, Bash, Read | None - just the criteria + the DB row | Wire detail-fetchers in (done today). No live web search needed if the spec is fully in the DB. |
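The sufficiency gate from the flowchart and the first table row can be sketched roughly as below. Only the 500-character threshold, the four named checks, the 5-attempt limit and the `unscorable` status come from this document; the regex patterns, the function names and the other two status strings are assumptions made up for illustration.

```python
import re
from typing import Optional

MIN_CHARS = 500          # threshold stated in the flowchart
MAX_FETCH_ATTEMPTS = 5   # after this many detail fetches the row is 'unscorable'

# Assumed heuristics: the real patterns are not shown in this document.
READ_MORE = re.compile(r"(read more|show more|see more)\s*(\.{3}|…)?\s*$", re.IGNORECASE)
STRUCTURAL = re.compile(
    r"(^\s*[-*•]\s+)|\b(responsibilities|requirements|qualifications)\b",
    re.IGNORECASE | re.MULTILINE,
)


def is_sufficient(description: Optional[str]) -> bool:
    """Return True only if the description passes all four gate checks."""
    text = (description or "").strip()
    if len(text) < MIN_CHARS:
        return False                      # too short to be a full spec
    if READ_MORE.search(text):
        return False                      # truncated teaser ending in "read more"
    if text[-1] not in ".!?)":
        return False                      # no terminal punctuation: likely cut off
    return bool(STRUCTURAL.search(text))  # needs a bullet or section keyword


def next_status(description: Optional[str], fetch_attempts: int) -> str:
    """Decide what happens to a row; only 'unscorable' is a name from the doc."""
    if is_sufficient(description):
        return "ready_to_score"           # hypothetical status name
    if fetch_attempts >= MAX_FETCH_ATTEMPTS:
        return "unscorable"
    return "needs_detail_fetch"           # hypothetical status name
```

Keeping the gate as a pure function of the description text means the same check can run before the first detail fetch and again after each retry.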
Components reference
Scrapers (src/scrapers/)
- upwork.py - find-work pages (logged-in profile, SSR-fast)
- linkedin.py - guest-mode search + auth-walled detail fetch
- indeed.py - via FlareSolverr (Cloudflare-pass sidecar)
- remoterocketship.py - headed Chromium under Xvfb
- consortia.py - JS-rendered, settle-and-parse pattern
- intelligentpeople.py - JS-rendered, skips filled vacancies
- generic.py - one engine, per-site YAML config (CWJobs, CV-Library, Built In, Hays, Robert Walters, Michael Page, Sphere, Cranberry Panda, Jobserve)
- detail_fetcher.py - generic HTTP fallback for sources without per-source fetcher
Scorer (src/scoring/)
- prompt.py - assembles system prompt from criteria.md + portfolio.md
- scorer.py - Claude tool-use call + _apply_caps() post-process
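A minimal sketch of the hard-clamp idea behind `_apply_caps()`, using the three cap values from the flowchart. The real function's signature is not shown here, so the name `apply_caps_sketch` and its arguments are assumptions.

```python
# Cap values taken from the flowchart; any other location_mode is left uncapped.
CAPS = {"onsite": 3, "hybrid_heavy": 6, "unknown": 7}


def apply_caps_sketch(score: int, location_mode: str) -> int:
    """Clamp the LLM's score to the cap for its location_mode, if one exists."""
    cap = CAPS.get(location_mode)
    if cap is None:
        return score          # e.g. a fully remote role: no clamp applied
    return min(score, cap)    # never raises a score, only lowers it
```

Because the clamp runs after the model returns, a great-looking onsite role scored 9 still lands at 3 regardless of how the model reasoned about it.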
Storage (src/storage/)
- db.py - SQLite wrapper, idempotent migrations
- schema.sql - jobs + search_runs tables
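One common way to make SQLite migrations idempotent is to track the applied version in `PRAGMA user_version`, sketched below. This is an illustration of the pattern, not necessarily how `db.py` does it, and the column list is hypothetical.

```python
import sqlite3

# Ordered migration steps. The table layout is an assumption for illustration.
MIGRATIONS = [
    "CREATE TABLE IF NOT EXISTS jobs ("
    "id INTEGER PRIMARY KEY, source TEXT NOT NULL, url TEXT UNIQUE, description TEXT)",
    "ALTER TABLE jobs ADD COLUMN status TEXT DEFAULT 'new'",
    "ALTER TABLE jobs ADD COLUMN score INTEGER",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply any migrations newer than the stored version; safe to call repeatedly."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS, start=1):
        if version > current:
            conn.execute(statement)
            # PRAGMA does not accept bound parameters; version is a trusted int.
            conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Calling `migrate()` on every startup is then harmless: already-applied steps are skipped, so the `ALTER TABLE` statements never run twice.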
Notifications (src/notifications/)
- slack.py - 9-10 instant pings via webhook
- email.py - daily 07:00 Madrid digest of 7-8 scores
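The score-band routing can be summarised in a few lines. The bands (9-10 to Slack, 7-8 to the email digest, everything to the dashboard) are from the flowchart; the function name and channel strings are hypothetical.

```python
from typing import List


def channels_for(score: int) -> List[str]:
    """Return the notification channels a scored job should reach."""
    channels = ["dashboard"]             # every scored job shows in the dashboard
    if score >= 9:
        channels.append("slack")         # 9-10: instant ping
    elif score >= 7:
        channels.append("email_digest")  # 7-8: daily 07:00 Madrid digest
    return channels
```

For example, `channels_for(8)` gives `["dashboard", "email_digest"]`, while anything below 7 is visible only in the dashboard.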
Dashboard (src/dashboard/)
- app.py - Flask on :5001 with filter/sort, manual paste-and-rescore for unscorable rows
Scheduler (scripts/main.py / run_scrape.py / score_jobs.py / send_digest.py)
- APScheduler with cron + interval triggers
- scrape every 15 min (per-source cadence overrides apply)
- score every 5 min
- digest at 07:00 Europe/Madrid
Cadences (per config/searches.yaml)
| Source | Default cadence |
|---|---|
| Upwork | 15 min (high volume, fresh-first) |
| LinkedIn | 120 min (anti-bot polite) |
| Indeed | 60 min (FlareSolverr cycle is slow) |
| RemoteRocketship | 120 min |
| Consortia | 120 min |
| IntelligentPeople | 120 min |
| Generic (all) | 240 min |
Scoring runs every 5 min on unscored backlog. Slack notify runs every 5 min on un-notified 9-10s. Digest is daily at 07:00 Europe/Madrid.
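A rough sketch of how a 15-minute scheduler tick could honour per-source cadence overrides. The dictionary holds a subset of the table above; the key names, the `is_due` helper and the fallback to the 15-minute default are assumptions, and the real values live in config/searches.yaml.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_MINUTES = 15  # base scrape interval from the scheduler section

# Subset of the cadence table; key spelling is hypothetical.
CADENCE_MINUTES = {
    "upwork": 15,
    "indeed": 60,
    "consortia": 120,
    "generic": 240,
}


def is_due(source: str, last_run: Optional[datetime], now: datetime) -> bool:
    """True if the source has never run or its cadence has fully elapsed."""
    if last_run is None:
        return True
    minutes = CADENCE_MINUTES.get(source, DEFAULT_MINUTES)
    return now - last_run >= timedelta(minutes=minutes)
```

With this shape, the scheduler can fire every 15 minutes and simply skip any source whose own cadence has not yet elapsed.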