As I was falling asleep last night I had an idea about how to maintain a healthy consumption habit for stuff like Hacker News. I don’t have a working title yet, but the idea is simple: a weekly feed of HN posts in an ebook format that can be loaded onto a Kindle.

What I have in mind is a cover page that reads HN Book - {week number} or something of the sort. Then, like any good book, a table of contents. This would be just the title of each article, much like what we see on the HN front page.

Each article, then, becomes a “chapter” in the book. I will need some way to extract the text from each link, and it will obviously not work well for non-static media types. The end of each chapter would then include the comments (or perhaps a separate, aptly named chapter immediately after?).

The difficult part here is how to present the comments, given that there are multiple threads that can be displayed. Additionally, truncating the comments might be a good idea, since some posts reach into the thousands of comments. One way to think about the structure of the comments is as a tree, where each top-level comment is a branch, which can itself have 0, 1, or multiple branches. At the end, each comment without further replies is a leaf.
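This tree structure can be sketched with a small recursive type. This is only a sketch of the idea; the names (`Comment`, `count_comments`, `truncate`) are my own, and depth-based truncation is just one arbitrary way to cap huge threads:

```python
from dataclasses import dataclass, field

@dataclass
class Comment:
    text: str
    replies: list["Comment"] = field(default_factory=list)

def count_comments(root: Comment) -> int:
    """Total comments in a thread, including the root."""
    return 1 + sum(count_comments(r) for r in root.replies)

def truncate(root: Comment, max_depth: int) -> Comment:
    """Drop replies nested deeper than max_depth - one crude way
    to keep thousand-comment posts from swallowing the book."""
    if max_depth <= 0:
        return Comment(root.text, [])
    return Comment(root.text, [truncate(r, max_depth - 1) for r in root.replies])

# A thread with one top-level comment and a nested reply chain.
thread = Comment("top", [Comment("reply", [Comment("nested reply")])])
print(count_comments(thread))               # 3
print(count_comments(truncate(thread, 1)))  # 2 - the deepest reply is dropped
```

Truncating by depth keeps the start of every conversation; truncating by total count would instead keep whole threads but drop later ones. Either could work here.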

Selection of entries

An interesting issue here is the selection of entries that will be included. In all cases, I suspect a subset of what makes it to the front page would be more than sufficient. However, there are still choices to be made: do I base the selection on comment count, or perhaps on how long a submission stays on the front page?

Let’s walk through an example: assuming I use comment count for this, I can pull X articles into the system. How many articles should be pulled? And if the process runs, say, on Sunday mornings, will it then generally prefer older articles? An article submitted on Saturday evening might be very good, but it has not had a chance to receive many comments yet.

One way to complicate the system could be to look at the current and previous weeks - a rolling window - and keep track of which entries are allowed. Say, for example, that in the previous week there were three articles:

  1. Python Tutorial - 2 comments
  2. Rust Tutorial - 300 comments
  3. Java Tutorial - 150 comments

On that particular week, assuming we load 2 articles, the Python Tutorial would be discarded. The following week we see:

  1. Python Tutorial - 900 comments
  2. Kotlin Tutorial - 500 comments
  3. Rust Tutorial - 1500 comments

Again, keeping a limit of 2 articles per week, the system would pick Python and Rust. However, since it has seen the Rust tutorial before, it discards that and grabs the next best one - Kotlin, in this case.

Perhaps another way to think about this is to always look at the preceding two weeks - that way, there is always at least one week of overlap. It does, again, require some sort of DB that tracks which links have been included (this alone would be quite interesting).
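The dedup-and-pick step above could be sketched roughly like this, where the `seen` set stands in for the DB of already-included links (the function name and dict fields are my own invention):

```python
def pick_articles(candidates: list[dict], seen: set[str], limit: int = 2) -> list[dict]:
    """Pick the top `limit` articles by comment count, skipping
    anything already included in a previous edition."""
    ranked = sorted(candidates, key=lambda a: a["comments"], reverse=True)
    picked = []
    for article in ranked:
        if article["url"] in seen:
            continue
        picked.append(article)
        seen.add(article["url"])
        if len(picked) == limit:
            break
    return picked

# The two example weeks from above.
week1 = [
    {"url": "python-tutorial", "comments": 2},
    {"url": "rust-tutorial", "comments": 300},
    {"url": "java-tutorial", "comments": 150},
]
week2 = [
    {"url": "python-tutorial", "comments": 900},
    {"url": "kotlin-tutorial", "comments": 500},
    {"url": "rust-tutorial", "comments": 1500},
]

seen = set()
print([a["url"] for a in pick_articles(week1, seen)])  # ['rust-tutorial', 'java-tutorial']
print([a["url"] for a in pick_articles(week2, seen)])  # ['python-tutorial', 'kotlin-tutorial']
```

Rust wins week one, so in week two it is skipped and Kotlin takes the second slot - matching the walkthrough above.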

Rewarding RSS

Part of the problem with such a system is the parsing of the sites themselves. Many blogs provide RSS feeds that contain the full text of the article. Many more, however, either truncate their RSS feeds or simply don’t have them.

Since the program is supposed to severely cut the amount of stuff flashing in front of my eyes, a selection criterion could be to include only sites that provide full-content RSS feeds, instead of everything. Alternatively, include only sites that can be parsed with relative ease and don’t require unique, custom code.
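Detecting a full-content feed is fuzzy, but a crude length heuristic on the parsed entry might be enough. This is a sketch under my own assumptions: the entry is a plain dict with `content`/`summary` fields, and the 1500-character threshold is an arbitrary guess, not a standard:

```python
def has_full_content(entry: dict, min_chars: int = 1500) -> bool:
    """Heuristic: treat a feed entry as full-content when its body is
    substantially longer than a typical teaser. min_chars is an
    arbitrary cutoff, not anything the RSS spec defines."""
    body = entry.get("content") or entry.get("summary") or ""
    return len(body) >= min_chars

teaser = {"summary": "Read more on our site..."}
full = {"content": "word " * 400}  # ~2000 characters of article body
print(has_full_content(teaser))  # False
print(has_full_content(full))    # True
```

A truncated feed would then simply fail the check and fall into the “skip it” bucket, in line with the no-custom-parsers rule above.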

Parting with FOMO

There is another option, which is to keep the system naive. Who cares if not every article is included? After all, this is already just a subset of the front page, which is a subset of all entries. The default experience is already to miss out on the vast majority of articles. The primary motivator here is to engage a bit more closely with each article, rather than skim through hundreds of entries.

Comment management

Comments are part of the charm of HN - the discussions tend to be more interesting than their counterparts on Reddit, etc. If the article selection is based primarily on comment count, then the comments need to be included in each edition of the book.

  • Comment 1
  • Comment 2
    • Comment 2a
    • Comment 2b
      • Comment 2b1
      • Comment 2b2
    • Comment 2c
  • Comment 3

Assuming a comment structure like the one above, how does one parse and format the information into a print medium so that it remains readable?

On a computer, one could pin each parent comment to the top of the screen as the user scrolls down until a new parent comment is found (in other words, until a new thread begins). Or one could hide each comment after reading it, much like it already works on HN.

In a book, however, this does not work. Are there printed forums? Threads that have been converted to paper? One possibility is to treat each top-level comment as a chapter, and each sub-comment as a section within that chapter.
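That chapter-and-section idea could be sketched as hierarchical numbering, where nesting depth becomes dotted section numbers that survive on paper (the `render_thread` function and dict shape are my own, hypothetical):

```python
def render_thread(comment: dict, prefix: str) -> list[str]:
    """Render a comment tree as numbered sections, one line per
    comment, so the reply structure survives in print: 2, 2.1, 2.2.1, ..."""
    lines = [f"{prefix}. {comment['text']}"]
    for i, reply in enumerate(comment["replies"], start=1):
        lines.extend(render_thread(reply, f"{prefix}.{i}"))
    return lines

# The "Comment 2" branch from the outline above.
thread = {
    "text": "Comment 2",
    "replies": [
        {"text": "Comment 2a", "replies": []},
        {"text": "Comment 2b", "replies": [
            {"text": "Comment 2b1", "replies": []},
        ]},
    ],
}
for line in render_thread(thread, "2"):
    print(line)
# 2. Comment 2
# 2.1. Comment 2a
# 2.2. Comment 2b
# 2.2.1. Comment 2b1
```

Deep threads would produce unwieldy numbers like 2.1.4.3.2, which is another argument for the depth truncation discussed earlier.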

Previous examples

A few years ago I did something similar to this using RSS-bridge. The problem with that setup was the parsing of articles and comments - having a list of links in an ebook isn’t all that useful if you still have to go online and load each individual webpage.