ArchiveBox

Self-hosted web archiving for bookmarks and links

26.3k stars
1.4k forks
Last commit: 3d ago
Repo age: 9y old

ArchiveBox is a self-hosted web archiving application that turns lists of URLs (bookmarks, RSS/Atom, browser exports, text files) into a local, browsable archive. It captures pages using multiple methods to increase resilience against link rot, and provides a web UI and CLI for managing collections.

Key Features

  • Ingest URLs from browser bookmarks/exports, RSS/Atom feeds, Pocket/Pinboard-style lists, and plain text
  • Multi-method archiving pipeline (e.g., raw HTML, single-file snapshot, screenshots, PDF, readability/text extraction, media downloads) to improve long-term preservation
  • Full-text search and filtering in the web UI, with tagging/metadata for organizing large collections
  • Scheduled/automatic archiving and re-archiving via cron/queue-style workflows
  • CLI-first operation plus a web interface for browsing, searching, and replaying saved content
  • Extensible “extractor” architecture to enable/disable capture methods and integrate external tools
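As a rough illustration of the plain-text ingestion path listed above, the sketch below pulls URLs out of free-form notes and de-duplicates them before they are handed to an archiver. The helper name `extract_urls` and the regular expression are assumptions made for this example; they are not part of ArchiveBox, whose own parsers may behave differently.

```python
import re

# Hypothetical helper for illustration only; not ArchiveBox's own parser.
# Matches http(s) URLs up to the next whitespace, quote, or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> list[str]:
    """Return the unique URLs found in text, in first-seen order."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        # Trim sentence punctuation that often trails a pasted link.
        url = match.rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = """
    Read later: https://example.com/article, also https://example.org/feed.xml
    Duplicate: https://example.com/article
    """
    for url in extract_urls(sample):
        print(url)
```

Writing the result out one URL per line gives a plain text list of the kind the feature list describes as a supported input.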

Use Cases

  • Personal “read-it-later” vault that keeps offline copies of important articles and references
  • Team or research group evidence collection for sources, citations, and compliance records
  • Preserving documentation, vendor pages, or incident-related URLs for future auditing

Limitations and Considerations

  • Archive quality depends on target-site complexity; JavaScript-heavy apps may require headless-browser-based capture for acceptable fidelity
  • Storage can grow quickly when enabling media/video, PDFs, and screenshots across many links

ArchiveBox is well-suited for users who want durable, searchable link preservation beyond a bookmark manager. Its layered capture approach and automation options make it a practical tool for building long-lived web archives.

Similar Services

RSSHub

An extensible RSS feed generator for websites and platforms

41k stars
9k forks
Last commit: 17h ago

Generate RSS/Atom/JSON feeds from websites and services that lack native feeds, with hundreds of built-in routes and easy extensibility.

Alternative to: Feedly (+4 more)

Glance

A fast, minimal, self-hosted dashboard for your feeds and services

31.1k stars
1.2k forks
Last commit: 1mo ago

Glance is a lightweight self-hosted startpage/dashboard that aggregates RSS/Atom feeds and service widgets (e.g., weather, markets, GitHub) into a single customizable homepage.

Alternative to: Protopage (+5 more)

Karakeep

A self-hosted bookmark manager with archiving and full-text search

22.6k stars
1k forks
Last commit: 5d ago

Self-hosted bookmarking and read-it-later app with tagging, archiving, and full-text search for saved web pages.

Alternative to: Pocket (+4 more)

Invidious

Privacy-focused front-end for YouTube

18.4k stars
2.1k forks
Last commit: 18d ago

Invidious is an alternative YouTube web front-end that reduces tracking and improves performance, offering RSS feeds, subscriptions, and video playback without a Google account.

Alternative to: YouTube (+1 more)

Linkwarden

Self-hosted bookmark manager with archival snapshots

16.7k stars
656 forks
Last commit: 4d ago

A self-hosted bookmark manager that organizes links with tags/collections and preserves content via screenshots and readable archives for reliable long-term reference.

Alternative to: Pocket (+2 more)

FreshRSS

A lightweight, self-hosted RSS/Atom feed aggregator

13.6k stars
1.1k forks
Last commit: 21h ago

Web-based RSS/Atom reader with multi-user support, powerful filters, and integrations for mobile and desktop clients.

Alternative to: Feedly (+4 more)