A Haskell program that converts the Paperspan HTML export format to the Instapaper CSV import format, with automatic, configuration-file-driven designation of links to folders. The HXT library is used to parse the Paperspan HTML file, and the CSV result is written to standard output.
Usage: see the Makefile.
Existing Paperspan folders are carried over by the conversion program. If the Read Later folder is encountered, its links are automatically designated to folders via regular expression rules provided in a configuration file. See the next section for details.
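The folder decision described above can be sketched as follows, assuming each parsed link records the Paperspan folder it was exported from. The `Link` type, its field names and `targetFolder` are hypothetical illustrations, not code from the actual program; the rule-based designator is passed in as a function and may find no match.

```haskell
-- Hypothetical shape of a link parsed from the Paperspan export.
data Link = Link
  { linkUrl    :: String
  , linkTitle  :: String
  , linkFolder :: String  -- Paperspan folder the link was found in
  } deriving (Show, Eq)

-- Existing folders are carried over unchanged; only links found in
-- "Read Later" are handed to the rule-based designator, which returns
-- Nothing when no rule matches (an undesignated link).
targetFolder :: (Link -> Maybe String) -> Link -> Maybe String
targetFolder designate link
  | linkFolder link == "Read Later" = designate link
  | otherwise                       = Just (linkFolder link)
```

Passing the designator as a parameter keeps the carry-over logic independent of how the rules are matched.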
Automatic designation to folders
For the Paperspan HTML export to Instapaper CSV import, the folders.yaml configuration file contains the Instapaper target folder names (used in the output file) together with regular expressions (PCRE) that are matched against text in the Paperspan export (the input). Each selector in the configuration file (I have hundreds) is matched against the URL or text of the Paperspan link being imported until a match is found, and the associated folder is then designated to that link. This is very useful when your Paperspan account holds a lot of unorganized links that you have not yet moved to a folder.
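A minimal sketch of this first-match designation, under stated assumptions: `Rule`, `selectors`, `designate` and `substringMatcher` are hypothetical names (only `folderName` is known to appear in folders.yaml). Because the real program uses PCRE, the matcher is a parameter; to keep the example dependency-free, `substringMatcher` swaps in a case-insensitive substring test, which is not a regular expression.

```haskell
import Data.Char (toLower)
import Data.List (find, isInfixOf)

-- One hypothetical entry from folders.yaml: a target Instapaper folder
-- and the selectors that send a link there.
data Rule = Rule
  { folderName :: String
  , selectors  :: [String]
  } deriving Show

-- Try the rules in file order and return the folder of the first rule
-- that has a selector matching either the URL or the link text.
-- The matcher takes a selector and a subject and reports a match.
designate :: (String -> String -> Bool) -> [Rule] -> String -> String -> Maybe String
designate matches rules url text = folderName <$> find ruleMatches rules
  where
    ruleMatches rule = any hit (selectors rule)
    hit sel = matches sel url || matches sel text

-- Stand-in matcher for this sketch: case-insensitive substring search,
-- NOT a PCRE regular expression.
substringMatcher :: String -> String -> Bool
substringMatcher sel subject = map toLower sel `isInfixOf` map toLower subject
```

For example, with a hypothetical rule, `designate substringMatcher [Rule "Biology Health" ["covid"]] "https://news360.com/article/563394549" "Stop prescribing hydroxychloroquine for Covid-19"` evaluates to `Just "Biology Health"`. With the regex-pcre package, a PCRE matcher could be passed instead, for instance `\sel subject -> subject =~ sel :: Bool` using `(=~)` from `Text.Regex.PCRE`.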
For example, the Paperspan link to https://news360.com/article/563394549
is matched with the following selector from folders.yaml:
This results in designation to the Biology Health folder via the rule's folderName (also in folders.yaml). The resulting CSV line is:
https://news360.com/article/563394549, "Stop prescribing hydroxychloroquine for Covid-19, warn researchers | Stop News – India TV", https://news360.com/article/563394549, "Biology Health", 1630495255000
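Rendering such a row can be sketched as below, following the field order of the example line (URL, title, the URL repeated, folder, timestamp) and assuming RFC 4180-style quoting, where text fields are wrapped in double quotes and embedded quotes are doubled. The `Row` type and function names are hypothetical. Unlike the example line, this sketch writes no space after each comma, since CSV readers generally treat such spaces as part of the field.

```haskell
import Data.List (intercalate)

-- Hypothetical output row; the timestamp is passed through unchanged.
data Row = Row
  { rowUrl       :: String
  , rowTitle     :: String
  , rowFolder    :: String
  , rowTimestamp :: Integer
  } deriving Show

-- Quote a text field: wrap it in double quotes and double any
-- embedded double quote.
quote :: String -> String
quote s = '"' : concatMap esc s ++ "\""
  where
    esc '"' = "\"\""
    esc c   = [c]

-- Render one row in the order URL, title, URL (repeated), folder,
-- timestamp. URLs are left unquoted as in the example line; a URL
-- containing a comma or quote would need quoting as well.
renderRow :: Row -> String
renderRow r = intercalate ","
  [ rowUrl r
  , quote (rowTitle r)
  , rowUrl r
  , quote (rowFolder r)
  , show (rowTimestamp r)
  ]

-- Write all rows to standard output, one per line.
writeRows :: [Row] -> IO ()
writeRows = mapM_ (putStrLn . renderRow)
```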
The source code for the converter program is on GitHub: maridonkers/paperspan2instapaper.
Disclaimer: this is a ‘one-shot’ program (excuse my Haskell) that I’ve used only once, to import an export of my 27,689 Paperspan article links into Instapaper. Update: 2,140 links are still undesignated, so I am refining the program and adding more rules.
Earlier, 2019-02-18-instapaper-export was used to convert an Instapaper HTML export for import into Paperspan.
(Hopping back and forth between these excellent read-later/archiving solutions.)