Look at this site.
- Loads super fast. Less than 30KB of data is transmitted for this page (before anybody adds any comments), and less than 5KB for the index page.
- Responsive. Works on your feature phones.
- From my side – super convenient to publish. Just type
- Write in
- Supports hierarchical comment threads.
- Doesn't look bad?
So let's go over it.
Update: I found out that somebody had figured out I didn't validate empty names. Just fixed it. Guess how I fixed it? I fixed it in the REPL! Yes, I run my server in a REPL. Most modern people have forgotten about this kind of ancient magic. XD
Update 2: Somebody actually discovered a bug with the reply # (though not an obvious one), many thanks! And finally an ad spam bot visited this site. I'm doing some anti-spam work now, but don't be surprised if something funny shows up in the comments.
The desired feature set:
- Post blogs.
- Comment board. IRC-like, no authorization required (Following the idea described in Why no HTTPS at this site: Morality instead of Barriers).
- Better not look bad.
It seems hard to choose a blog engine: some of them are bloated and others seem too elementary. But given the goals of the system, it's not that hard to choose our tools.
- We just need some HTML generator. org-static-blog seems like a good choice. It's a simple org-to-html generator and is super easy to use. One caveat is that the generated index page contains the whole body of each post. That's not a hard problem to fix.
- Write some Common Lisp. We always need an HTTP server to run a site anyway, so just add some more things besides publish-directory. We use the awesome framework AllegroServe.
- Write some CSS.
M-x package-install org-static-blog, play around with it and look at its source. Use
The function used for generating the index page is org-static-blog-assemble-index, which calls org-static-blog-assemble-multipost-page, which does most of the actual work.
org-static-blog-assemble-multipost-page gets content of the posts from
Bingo! To alter the appearance of the index page we just need to replace this call with another function.
(defun org-static-blog-get-preview (post-filename)
  (with-temp-buffer
    (insert-file-contents (org-static-blog-matching-publish-filename post-filename))
    (let ((title-start)
          (paragraph-end)
          (post-start)
          (post-end))
      (goto-char (point-min))
      (setq title-start (search-forward "<div id=\"content\">"))
      (search-forward "<h1 class=\"post-title\">")
      (replace-match "<h2 class=\"post-title\">")
      (search-forward "</h1>")
      (replace-match "</h2>")
      (when (search-forward "<p>" nil t)
        (search-forward "</p>"))
      (setq paragraph-end (point))
      (goto-char (point-max))
      (search-backward "<div id=\"postamble\" class=\"status\">")
      (setq post-end (search-backward "</div>"))
      (search-backward "<div class=\"taglist\">")
      (search-backward ">") ;; eat the returns/white spaces
      (setq post-start (+ (point) 1))
      (concat (buffer-substring-no-properties title-start paragraph-end)
              (if (equal paragraph-end post-start) "" "(...)")
              (buffer-substring-no-properties post-start post-end)))))
All that searching and replacing does not look elegant (it is used everywhere in org-static-blog anyway), but since this is just a small site-generating tool it's OK. The code snippet above basically searches a temporary Emacs buffer containing the full HTML of a post to find its first paragraph and its taglist. Then it concatenates them to generate a preview.
Managing the data
OK, here comes the big part. Lots of things (like a DBMS) came to mind…
But is this that complicated? Let's look at the goal:
- Comment board. IRC-like, no authorization required.
This is actually dead simple! No authorization normally means users cannot delete or edit comments (this also makes some sense on systems with authorization, because it might make users feel more responsible for their comments). This naturally fits the intended use case of a plain file. You need a DB only when you need frequent insertion (or a high-performance system, which is not really the case here), but if users only add comments, we can simply append to a log file, and for a single server this is close to the optimal implementation. Solving the problem by arguing the problem does not exist, yeah!
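The append-only idea can be sketched in a few lines of portable Common Lisp. This is only an illustration, not the code this site runs: the record layout (a list of ID, parent ID and content) and the names append-comment-record and read-comment-records are assumptions made up for this example.

```lisp
;; Hypothetical record layout: (id parent-id content), one READ-able form per comment.
(defun append-comment-record (log-path id parent-id content)
  "Append one comment record to the end of the log file at LOG-PATH."
  (with-open-file (out log-path
                       :direction :output
                       :if-exists :append
                       :if-does-not-exist :create)
    (with-standard-io-syntax
      ;; PRIN1 escapes quotes and backslashes, so READ gets the same string back.
      (prin1 (list id parent-id content) out)
      (terpri out))))

(defun read-comment-records (log-path)
  "Return all records in LOG-PATH, oldest first; NIL if the file does not exist."
  (with-open-file (in log-path :if-does-not-exist nil)
    (when in
      (with-standard-io-syntax
        ;; Never evaluate #. forms that come from a data file.
        (let ((*read-eval* nil))
          (loop for record = (read in nil :eof)
                until (eq record :eof)
                collect record))))))
```

Because records are only ever appended, reading the file from top to bottom yields the comments in the order they were posted.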
So now let's think about the implementation details.
Since each post is independent of the others, we can store their log files separately. For the in-memory cache, we can just make a hashtable mapping from the name of a post to its own data structure. Now the problem reduces to handling comments for one particular post.
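A minimal sketch of such a per-post cache is shown here. The names *post-tables* and ensure-post-table are hypothetical and chosen for this example only; the point worth noting is the EQUAL test, which string keys require.

```lisp
;; Post names are strings, so the outer table needs an EQUAL test;
;; the default EQL test would not match two distinct string objects.
(defvar *post-tables* (make-hash-table :test 'equal)
  "Hypothetical cache: post name -> that post's own comment table.")

(defun ensure-post-table (post-name)
  "Return the comment table for POST-NAME, creating an empty one on first use."
  (or (gethash post-name *post-tables*)
      (setf (gethash post-name *post-tables*)
            ;; Comment IDs are integers, so the default EQL test is fine here.
            (make-hash-table))))
```

Calling (ensure-post-table "some-post.html") twice returns the same table object, even if the two name strings are different objects in memory.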
We'd like to make hierarchical comment threads, which means each comment can reply to another comment and they are displayed as a tree. This means
- each comment has an
- we'd like to traverse the tree of the children of a comment when doing formatting
For 2. it is quite intuitive to store the list of direct children for each parent comment. We can then store all the lists of child comments in a hashtable with their parent-id as the key.
A schematic diagram of the data structure looks like:
                  +-----------------+
                  |    Hashtable    |
Key: parent ID    | 1 - 2 - 3 ...   |
                  +-+----+----+-----+
                    |    |    |
                    |   ...  ...
                    |
                  +----+   +----+   +----+
List of children: |cons|---|cons|---|cons|-...
                  +----+   +----+   +----+
                    |        |        |
                    |       ...      ...
                    |
                  +----+   +------------+
Child content     |cons|---|HTML Content|
                  +----+   +------------+
                    |
                  +---+
Child ID          | 3 |
                  +---+
This is illustrated in the following code to insert a comment into our data structure:
(defun comment-table-insert (comment-table num-id num-parent content)
  (multiple-value-bind (child-list exist) (gethash num-parent comment-table)
    (if exist
        (progn
          (push (cons num-id content) (gethash num-parent comment-table))
          (setf (gethash num-id comment-table) '()))
        (format t "ERROR: parent ~D does not exist.~%" num-parent))))
This simple design actually has more going for it than one might think!
- Posting a new comment has time complexity O(1). Baseline. (not true if somebody uses Python dict to write a server)
- The traversal has time complexity O(n), which is already optimal.
- The time complexity of creating the data structure from the log file is O(n), which is also optimal.
- When the server restarts (or after the GC has reclaimed the in-memory data structure, if we add weak pointers in the future), loading from the log file is guaranteed to restore the system to the state it was in at shutdown. Just reuse the code for posting comments! Reading the log file replays the comments in their original time order and so repeats the process that built our data structure.
- When formatting HTML, if we just recursively traverse the list of children in order, then we naturally get new posts at the top of the page. (Pushing to the child list preserves time order!)
Now there's a subtle problem: what to do with the "standalone comments" that have no parent? The solution is simple and beautiful: add an imaginary comment with ID 0 (the root comment), and make any "standalone comment" a child of the root. Then we can just reuse all of our code for formatting child comments.
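To make the root-comment trick and the replay argument concrete, here is a small self-contained sketch that can be tried at the REPL. It is an illustration under assumptions, not the site's code: make-comment-table, insert-comment, replay-records and print-thread are hypothetical names, the record layout (id parent-id content) is invented for the example, and plain indented text stands in for the HTML output.

```lisp
(defun make-comment-table ()
  "Hypothetical constructor: an empty table that already contains the root (ID 0)."
  (let ((table (make-hash-table)))
    (setf (gethash 0 table) '())
    table))

;; Same logic as comment-table-insert above, repeated so the sketch runs on its own.
(defun insert-comment (table id parent-id content)
  (if (nth-value 1 (gethash parent-id table))
      (progn (push (cons id content) (gethash parent-id table))
             (setf (gethash id table) '()))
      (format t "ERROR: parent ~D does not exist.~%" parent-id)))

(defun replay-records (records)
  "Rebuild a table from RECORDS, a list of (id parent-id content), oldest first."
  (let ((table (make-comment-table)))
    (dolist (record records table)
      (destructuring-bind (id parent-id content) record
        (insert-comment table id parent-id content)))))

(defun print-thread (table &optional (parent-id 0) (depth 0)
                             (stream *standard-output*))
  "Print the children of PARENT-ID depth-first, indenting two spaces per level."
  (dolist (child (gethash parent-id table))
    (format stream "~A#~D ~A~%"
            (make-string (* 2 depth) :initial-element #\Space)
            (car child)
            (cdr child))
    (print-thread table (car child) (1+ depth) stream)))
```

For example, (print-thread (replay-records '((1 0 "hello") (2 1 "reply to 1") (3 0 "second top-level")))) prints comment #3 first, then #1, then #2 indented under #1: the newest top-level comment comes first because each insert pushes onto the front of its parent's child list.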
As we've described, when serving GET requests, we just need to recursively traverse the comment tree and write to the HTML stream.
(defun format-comment-list (comment-table content child-list)
  (html
    ((:div class "comment")
     (:princ content)
     (mapc (lambda (child)
             (format-comment-list comment-table
                                  (cdr child)
                                  (gethash (car child) comment-table)))
           child-list))))

(defun format-comments (filename)
  (touch-comment-table filename)
  (let* ((comment-table (gethash filename *comment-table*))
         (root-list (gethash 0 comment-table)))
    (if (null root-list)
        (html "No comments yet.")
        (mapc (lambda (child)
                (format-comment-list comment-table
                                     (cdr child)
                                     (gethash (car child) comment-table)))
              root-list))))
Here we use htmlgen from the AllegroServe framework. Check its documentation. Be careful about the difference between a "list beginning with a keyword symbol" and a "list beginning with a list beginning with a keyword symbol". If you mysteriously get some UNBOUND-VARIABLE error, you've probably messed up those list (aka parenthesis) structures, because the macro will then try to evaluate some part of your markup as Lisp expressions.
In fact, evaluating part of the markup tree is a very powerful feature. The documentation says the values of such expressions are thrown away, which means that if you want to generate output from those inline Lisp expressions, just use a nested html macro. A simple example of conditionals:
(let ((content-stream (make-string-output-stream)))
  (html-stream content-stream
    (:p (:b (:princ-safe (format nil "#~D" (gethash filename *next-comment-id-table*))))
        (:princ-safe (format nil " by ~a<" nickname))
        (if empty-contact
            (html "CIA top secret")
            (html ((:a href (concatenate 'string "mailto:" contact))
                   (:princ-safe contact))))
        (:princ-safe ">"))
    (:p (unless empty-uri
          (html ((:a href uri) (:princ-safe uri))))
        (unless (or empty-uri empty-text)
          (html :br))
        (unless empty-text
          (html (:princ-safe text)))))
  (get-output-stream-string content-stream))
Serving the clients
To generate the full HTML page, we just look for a special line and replace it. A simple task, no need for a template system.
(defun response-with-comments (req ent filename info)
  (with-open-file (src-stream filename)
    (loop for line = (read-line src-stream nil)
          while line
          do (if (string-equal line (format nil "<!--%comments%-->"))
                 (format-comments filename)
                 (html (:princ line) :newline)))
    t))
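The same line-substitution idea can be tried at the REPL without a running server. The sketch below is a dependency-free stand-in, not part of the site: substitute-marker-line is a hypothetical helper that reads the page from a string and writes to an ordinary stream instead of the htmlgen output stream.

```lisp
(defun substitute-marker-line (page marker replacement-fn
                               &optional (out *standard-output*))
  "Copy PAGE (a string) to OUT line by line. Where a line equals MARKER
(compared case-insensitively), call REPLACEMENT-FN with OUT instead of
copying the line."
  (with-input-from-string (in page)
    (loop for line = (read-line in nil)
          while line
          do (if (string-equal line marker)
                 (funcall replacement-fn out)
                 (write-line line out)))))
```

Passing a function rather than a ready-made string mirrors the code above, where the comments are written straight to the output stream as the tree is traversed.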
There's a subtle problem with handling a user POST. If we directly return the updated page, the form might be resubmitted when the user hits refresh, the back button, or whatever. A common practice is the Post/Redirect/Get (PRG) pattern. We do it here.
(publish-directory
 :prefix "/"
 :destination *document-root*
 :filter
 (lambda (req ent filename info)
   (if (string-equal "text/html"
                     (gethash (pathname-type (pathname filename)) *mime-types*))
       (case (request-method req)
         (:post
          (let ((nickname (request-query-value "nickname" req))
                (contact (request-query-value "contact" req))
                (parent (request-query-value "rep" req))
                (text (request-query-value "text" req))
                (uri (request-query-value "url" req)))
            (if (and nickname contact parent text uri)
                (post-new-comment req filename nickname contact parent text uri)
                (failed-request req)))
          (with-http-response (req ent :response *response-found*)
            (setf (reply-header-slot-value req :location) (request-uri req)) ;;redirect to same page
            (with-http-body (req ent)))
          t)
         (:get
          (with-http-response (req ent)
            (with-http-body (req ent)
              (response-with-comments req ent filename info)))
          t)
         (otherwise (failed-request req)))
       nil)))
That's basically how we make a comment system! The full code is at https://github.com/BlueFlo0d/site-server. There are some caveats, e.g., there is no cache for the formatted comments and the log file is opened each time a comment is posted. I'll probably fix this, but it's not very likely, because none of it affects the asymptotic complexity, and who cares about constants? BTW, the file opening problem is the fault of modern OSs, which impose too much overhead on opening files, not my fault. (I'm kidding lol)
Make it look better
Write some CSS. See CSS for this site.