February 18, 2019

Remove duplicates from Instapaper HTML export.

Instapaper

Instapaper turns web content – articles, stories, posts, videos, and even long emails – into a great reading experience.

Over the course of your day, you'll encounter things you want to save for later. With Instapaper, you simply push a button in your browser, or choose “send to Instapaper” in a linked mobile app. Instapaper then saves it for you and makes it available in a beautiful, uncluttered, reading-optimized format on your mobile device or in your browser.

Hickory

Hickory parses HTML into Clojure data structures, so you can analyze, transform, and output back to HTML. HTML can be parsed into hiccup vectors, or into a map-based DOM-like format very similar to that used by clojure.xml. It can be used from both Clojure and ClojureScript. – Hickory (by David Santiago)

When you add links to Instapaper, duplicates are sometimes stored, and those duplicates also end up in the Instapaper HTML export file.

A tool (built with Hickory) to remove duplicate hyperlinks from Instapaper HTML export files – instapaper (by Mari Donkers)
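The tool itself is written in Clojure with Hickory, and its source is not reproduced here. As a rough illustration of the idea only, the following is a Python sketch using the standard library's html.parser module. It assumes (not verified here) that the export groups links under <h1> section headings and that each saved item is an <a href="..."> inside a list item. The names ExportParser and dedupe_export are hypothetical. The sketch keeps the first occurrence of each exact href across the whole file and writes a fresh, minimal HTML document rather than editing the original markup in place.

```python
import sys
from html import escape
from html.parser import HTMLParser


class ExportParser(HTMLParser):
    """Collect (heading, [(href, title), ...]) sections from an export file.

    Assumption: sections start with <h1> and items are <a href="..."> links.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.sections = []   # list of [heading, [(href, title), ...]]
        self._in_h1 = False
        self._href = None    # href of the currently open <a>, if any
        self._text = []      # text collected inside the current h1 or a

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self._text = []
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self.sections.append(["".join(self._text).strip(), []])
            self._in_h1 = False
        elif tag == "a" and self._href is not None:
            if not self.sections:  # links before any heading
                self.sections.append(["", []])
            title = "".join(self._text).strip()
            self.sections[-1][1].append((self._href, title))
            self._href = None

    def handle_data(self, data):
        if self._in_h1 or self._href is not None:
            self._text.append(data)


def dedupe_export(html_text):
    """Return a new HTML document keeping only the first link per exact href."""
    parser = ExportParser()
    parser.feed(html_text)
    parser.close()
    seen = set()
    out = ["<!DOCTYPE html>", "<html>",
           '<head><meta charset="utf-8"></head>', "<body>"]
    for heading, links in parser.sections:
        if heading:
            out.append(f"<h1>{escape(heading)}</h1>")
        out.append("<ol>")
        for href, title in links:
            if href in seen:  # duplicate: skip every later occurrence
                continue
            seen.add(href)
            out.append(f'<li><a href="{escape(href)}">{escape(title)}</a></li>')
        out.append("</ol>")
    out.extend(["</body>", "</html>"])
    return "\n".join(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Same argument order as the jar: input file, then output file.
    with open(sys.argv[1], encoding="utf-8") as src:
        result = dedupe_export(src.read())
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write(result)
```

Saved as a hypothetical dedupe_instapaper.py, it would be run as python dedupe_instapaper.py inputfile.html outputfile.html. Matching on the exact href string is a deliberate simplification: URLs that differ only by scheme, a trailing slash, or tracking parameters are treated as distinct links.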

Binary download

The ready-to-use binary can be downloaded here: instapaper.jar

Execute via the following command (Java must be installed on your machine):

java -jar instapaper.jar inputfile.html outputfile.html

(Here inputfile.html is the Instapaper HTML export file, which can be generated via Instapaper -> Settings -> Export -> Download HTML file, and outputfile.html is the file to which the deduplicated export is written.)

Tags: Software Computer Clojure Mobile Web Internet Functional