Jobhunt - Architecture

Visual map of how a job moves from "posted on a site somewhere" to "Jonny sees it in the dashboard / Slack / email digest". The dashed pink boxes mark each point where this chat-session Claude (me) differs from the production scorer (the agent).

End-to-end flow

flowchart TD
  classDef source fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
  classDef store fill:#e0e7ff,stroke:#4338ca,color:#312e81
  classDef gate fill:#fef3c7,stroke:#b45309,color:#78350f
  classDef llm fill:#fee2e2,stroke:#b91c1c,color:#7f1d1d
  classDef notify fill:#dcfce7,stroke:#15803d,color:#14532d
  classDef diff fill:#fce7f3,stroke:#be185d,color:#831843,stroke-dasharray: 4 3

  subgraph SOURCES["🌐 Scrapers (every 15-240 min depending on source)"]
    UP[Upwork]:::source
    LI[LinkedIn]:::source
    IN[Indeed via FlareSolverr]:::source
    RR[RemoteRocketship]:::source
    CO[Consortia]:::source
    IP[IntelligentPeople]:::source
    GE[Generic engine: CWJobs, CV-Library, Built In, Hays, Robert Walters, Michael Page, Sphere, Cranberry Panda, Jobserve]:::source
  end

  SOURCES --> DB[(SQLite jobs table)]:::store

  DB --> SUFFCHECK{"🚪 Sufficiency gate<br/>(>=500 chars, no read-more,<br/>terminal punct, structural signal)"}:::gate

  SUFFCHECK -->|insufficient| FETCH[Per-source detail fetch:<br/>LinkedIn / Indeed / generic HTTP]
  FETCH --> SUFFCHECK

  SUFFCHECK -->|still insufficient<br/>after 5 attempts| UNSCORABLE[status='unscorable'<br/>surfaces in dashboard<br/>for paste-and-rescore]:::store

  SUFFCHECK -->|sufficient| SYSPROMPT["📜 System prompt<br/>criteria.md + portfolio.md<br/>(with PART B/C routing by source)"]
  SYSPROMPT --> LLM["🧠 Scorer<br/>Claude Sonnet 4.6 via tool-use<br/>returns score + location_mode + engagement"]:::llm

  LLM --> CAPS["⚙️ _apply_caps()<br/>Python hard-clamp:<br/>onsite -> 3, hybrid_heavy -> 6, unknown -> 7"]:::gate

  CAPS --> SCORED[(scored job in DB)]:::store

  SCORED --> ROUTE{score band}

  ROUTE -->|9-10| SLACK[Slack instant ping]:::notify
  ROUTE -->|7-8| EMAIL[Daily 07:00 Madrid email digest]:::notify
  ROUTE -->|all| DASH[Flask dashboard localhost:5001<br/>filter by source/identity/status/score]:::notify

  %% Where I differ from the agent
  ME1[💬 Me in chat: WebFetch live URL<br/>often sees full spec the agent's DB row lacks]:::diff
  ME2[💬 Me in chat: full conversation history<br/>your specific corrections, founder gap acceptance,<br/>fintech subdomain distinctions, etc]:::diff
  ME3[💬 Me in chat: apply caps by judgement<br/>not via Python clamp]:::diff
  ME4[💬 Me in chat: can read criteria + reason explicitly<br/>without LLM softening pull]:::diff

  ME1 -.-> SUFFCHECK
  ME2 -.-> SYSPROMPT
  ME3 -.-> CAPS
  ME4 -.-> LLM
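
The sufficiency gate in the flowchart could be sketched as below. Only the four checks (at least 500 characters, no read-more marker, terminal punctuation, a structural signal) come from the diagram. The name `is_sufficient`, the regexes, and the definition of a "structural signal" are assumptions for illustration, not the production code.

```python
import re
from typing import Optional

MIN_CHARS = 500  # threshold taken from the diagram

# Assumption: truncation markers that listing-page snippets tend to end with.
READ_MORE = re.compile(r"(read more|show more|see more|\.\.\.|…)\s*$", re.IGNORECASE)

# Assumption: a "structural signal" is a bullet line or a common section heading.
STRUCTURE = re.compile(
    r"(^\s*[-*•]\s+\S)|(responsibilities|requirements|qualifications|about the role)",
    re.IGNORECASE | re.MULTILINE,
)


def is_sufficient(description: Optional[str]) -> bool:
    """Return True if the description looks like a complete job spec."""
    text = (description or "").strip()
    if len(text) < MIN_CHARS:
        return False  # too short to be a full spec
    if READ_MORE.search(text):
        return False  # ends in a truncation marker
    if text[-1] not in ".!?":
        return False  # no terminal punctuation, probably cut off
    return bool(STRUCTURE.search(text))
```

Rows that fail this check would go to the per-source detail fetch rather than to the scorer.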

Where the agent differs from this chat (and how we close it)

| Difference | What I have | What the agent has | Mitigation |
|---|---|---|---|
| Job-spec visibility | I can WebFetch live page anytime | Agent reads description column - empty until detail-fetch runs | ✅ Sufficiency gate added today. Description must be ≥500 chars + structural signals before scoring. Backfill running for historical empty rows. |
| Conversation context | Full session history - corrections, founder-gap acceptance, payments≠retail banking distinction | Stateless per call | Codify everything in criteria.md + portfolio.md. Anything I know that should affect scoring goes in the system prompt. No verbal-only context. |
| Cap enforcement | Apply caps by judgement, can't be "softened" | LLM tendency to soften caps for great-looking roles | Python _apply_caps() hard-clamps after LLM returns. Code, not prose. |
| Reasoning visibility | I think out loud, you can correct me mid-stream | Single shot, returns just a score + 120-char reason | Add chain-of-thought scaffold to prompt so the LLM has to enumerate hard gates + count pass/fail BEFORE the score. Done in Task #11. |
| Tools | WebFetch, WebSearch, Bash, Read | None - just the criteria + the DB row | Wire detail-fetchers in (done today). No live web search needed if the spec is fully in the DB. |
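
The "code, not prose" cap enforcement can be illustrated with a minimal sketch. The three ceilings (onsite 3, hybrid_heavy 6, unknown 7) come from the flowchart. The function signature and the behaviour for any other location mode (left uncapped here) are assumptions, since the real `_apply_caps()` is not shown in this document.

```python
# Ceilings from the diagram; modes not listed are assumed to be uncapped.
LOCATION_CAPS = {"onsite": 3, "hybrid_heavy": 6, "unknown": 7}


def apply_caps(score: int, location_mode: str) -> int:
    """Clamp the LLM's score to the ceiling for its location mode."""
    cap = LOCATION_CAPS.get(location_mode)
    # min() only ever lowers a score, so a weak onsite role keeps its low score.
    return score if cap is None else min(score, cap)


print(apply_caps(9, "onsite"))        # 3
print(apply_caps(9, "hybrid_heavy"))  # 6
print(apply_caps(9, "remote"))        # 9 ("remote" is a hypothetical mode)
```

Because the clamp runs after the model returns, no amount of enthusiasm in the model's reasoning can push a capped role above its ceiling.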

Components reference

Scrapers (src/scrapers/)

- upwork.py - find-work pages (logged-in profile, SSR-fast)
- linkedin.py - guest-mode search + auth-walled detail fetch
- indeed.py - via FlareSolverr (Cloudflare-pass sidecar)
- remoterocketship.py - headed Chromium under Xvfb
- consortia.py - JS-rendered, settle-and-parse pattern
- intelligentpeople.py - JS-rendered, skips filled vacancies
- generic.py - one engine, per-site YAML config (CWJobs, CV-Library, Built In, Hays, Robert Walters, Michael Page, Sphere, Cranberry Panda, Jobserve)
- detail_fetcher.py - generic HTTP fallback for sources without per-source fetcher
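
One way to picture how the per-source detail fetchers and the generic fallback fit together is a small dispatch table plus a status rule. This is a sketch only: the fetchers are stubs that make no network calls, the registry and function names are hypothetical, and of the status strings only `unscorable` and the limit of 5 attempts come from the flowchart.

```python
from typing import Callable, Dict, Optional

MAX_ATTEMPTS = 5  # from the diagram: unscorable after 5 attempts

Fetcher = Callable[[str], Optional[str]]


def fetch_linkedin(url: str) -> Optional[str]:
    """Stub for a per-source fetcher (hypothetical, no network call)."""
    return None


def fetch_generic(url: str) -> Optional[str]:
    """Stub for the generic HTTP fallback (hypothetical, no network call)."""
    return None


# Hypothetical registry of sources that have a dedicated fetcher.
PER_SOURCE: Dict[str, Fetcher] = {"linkedin": fetch_linkedin}


def pick_fetcher(source: str) -> Fetcher:
    """Use the dedicated fetcher if one exists, else the generic fallback."""
    return PER_SOURCE.get(source, fetch_generic)


def next_status(description_ok: bool, attempts: int) -> str:
    """Decide what happens to a row after the sufficiency gate runs."""
    if description_ok:
        return "ready_to_score"  # assumed status name
    if attempts >= MAX_ATTEMPTS:
        return "unscorable"  # surfaces in the dashboard for paste-and-rescore
    return "needs_fetch"  # assumed status name
```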

Scorer (src/scoring/)

- prompt.py - assembles system prompt from criteria.md + portfolio.md
- scorer.py - Claude tool-use call + _apply_caps() post-process

Storage (src/storage/)

- db.py - SQLite wrapper, idempotent migrations
- schema.sql - jobs + search_runs tables
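
"Idempotent migrations" here means the migration step can run on every startup without failing or duplicating work. A minimal sketch of that pattern with the standard `sqlite3` module follows. The column set is an assumption for illustration and is not taken from schema.sql.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Bring the schema up to date; safe to call repeatedly."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT UNIQUE,"
        " description TEXT)"
    )
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
    # Add later columns only when missing, so a rerun is a no-op.
    for name, decl in (("score", "INTEGER"), ("status", "TEXT DEFAULT 'new'")):
        if name not in existing:
            conn.execute(f"ALTER TABLE jobs ADD COLUMN {name} {decl}")
    conn.commit()


conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # second run finds every column present and changes nothing
```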

Notifications (src/notifications/)

- slack.py - 9-10 instant pings via webhook
- email.py - daily 07:00 Madrid digest of 7-8 scores
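
The score-band routing reduces to a small function. The bands (9-10 to Slack, 7-8 to the email digest, everything to the dashboard) are from this document. The function name and channel labels are hypothetical.

```python
from typing import List


def channels_for(score: int) -> List[str]:
    """Return the channels a scored job should reach, given its capped score."""
    channels = ["dashboard"]  # every scored job is visible in the dashboard
    if score >= 9:
        channels.append("slack")  # instant ping
    elif score >= 7:
        channels.append("email_digest")  # picked up by the daily digest
    return channels


print(channels_for(10))  # ['dashboard', 'slack']
print(channels_for(8))   # ['dashboard', 'email_digest']
print(channels_for(5))   # ['dashboard']
```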

Dashboard (src/dashboard/)

- app.py - Flask on :5001 with filter/sort, manual paste-and-rescore for unscorable rows

Scheduler (scripts/main.py / run_scrape.py / score_jobs.py / send_digest.py)

- APScheduler with cron + interval triggers
- scrape every 15 min (per-source cadence overrides apply)
- score every 5 min
- digest at 07:00 Europe/Madrid

Cadences (per config/searches.yaml)

| Source | Default cadence |
|---|---|
| Upwork | 15 min (high volume, fresh-first) |
| LinkedIn | 120 min (anti-bot polite) |
| Indeed | 60 min (FlareSolverr cycle is slow) |
| RemoteRocketship | 120 min |
| Consortia | 120 min |
| IntelligentPeople | 120 min |
| Generic (all) | 240 min |

Scoring runs every 5 min on unscored backlog. Slack notify runs every 5 min on un-notified 9-10s. Digest is daily at 07:00 Europe/Madrid.
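
The way a 15-minute scrape tick combines with per-source cadence overrides might look like the sketch below: each tick asks which sources are due. The intervals are the ones listed in this section. The function name, the lowercase source keys, and the `last_run` mapping are assumptions, and the real scheduler may rely on APScheduler triggers instead.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Minutes between runs, from the cadence list in this section.
CADENCE_MIN: Dict[str, int] = {
    "upwork": 15,
    "linkedin": 120,
    "indeed": 60,
    "remoterocketship": 120,
    "consortia": 120,
    "intelligentpeople": 120,
}
GENERIC_CADENCE_MIN = 240  # applies to every generic-engine site


def due_sources(last_run: Dict[str, Optional[datetime]], now: datetime) -> List[str]:
    """Return the sources whose cadence has elapsed (or that never ran)."""
    due = []
    for source, last in last_run.items():
        interval = timedelta(minutes=CADENCE_MIN.get(source, GENERIC_CADENCE_MIN))
        if last is None or now - last >= interval:
            due.append(source)
    return due
```

A source missing from `CADENCE_MIN` is treated as a generic-engine site, which keeps a new YAML-configured site on the slowest cadence by default.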