Hi Emacs community,
I'm an elisp noob, and I recently wrote a function to get the references on a Wikipedia page. I plan on using it with org-mode/org-roam so I can do research faster (even though there's probably already a package for that sort of thing). Unfortunately, it's probably not as robust as I'd like to think, since some of the DOIs/ISBNs appear to be missing for some of the Wikipedia pages I've tested. Here it is for reference:
(defun get-wikipedia-references (subject)
  "Gets references for a wikipedia article"
  (let ((wikipedia-prefix-url "https://en.wikipedia.org/wiki/"))
    (with-current-buffer
        (url-retrieve-synchronously (concat wikipedia-prefix-url subject))
      (let* ((html-start (progn (goto-char (point-min))
                                (re-search-forward "^$")))
             (dom (libxml-parse-html-region (1+ (point)) (point-max)))
             (result))
        (dolist (cite-tag (dom-by-tag dom 'cite) result)
          (let ((cite-class (dom-attr cite-tag 'class)))
            (cond ((string-search "journal" cite-class)
                   (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "https://doi.org" (dom-attr tag 'href))))))
                     (setq result (cons (cons (concat "doi:" (dom-text a-tag))
                                              (let* ((cite-texts (dom-texts cite-tag))
                                                     (title-beg (1+ (string-search "\"" cite-texts)))
                                                     (title-end (string-search "\"" cite-texts (1+ title-beg))))
                                                (substring cite-texts title-beg title-end)))
                                        result))))
                  ((string-search "book" cite-class)
                   (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "/wiki/Special:BookSources" (dom-attr tag 'href))))))
                     (setq result (cons (cons (concat "isbn:" (dom-text (dom-child-by-tag a-tag 'bdi)))
                                              (dom-text (dom-child-by-tag cite-tag 'i)))
                                        result))))
                  (t
                   (let ((a-tag (assoc 'a cite-tag)))
                     (setq result (cons (cons (dom-attr a-tag 'href) (dom-text a-tag)) result)))))))))))
(get-wikipedia-references "Graph_traversal")
(("doi:10.1109/SFCS.1979.34" . "Random walks, universal traversal sequences, and the complexity of maze problems")
 ("doi:10.1016/j.tcs.2015.11.017" . "Lower and upper competitive bounds for online directed graph exploration")
 ("doi:10.1016/j.tcs.2020.06.007" . "Online graph exploration on a restricted graph class: Optimal solutions for tadpole graphs")
 ("doi:10.1587/transinf.E92.D.1620" . "The Online Graph Exploration Problem on Restricted Graphs")
 ("doi:10.1016/j.tcs.2021.04.003" . "An improved lower bound for competitive graph exploration")
 ("doi:10.1137/0206041" . "An Analysis of Several Heuristics for the Traveling Salesman Problem"))
And yes, I know that I could probably use a library like s, dash, seq, or cl, but I try to keep my elisp functions free of those kinds of things. I would appreciate any criticism from the Emacs community about my elisp!
You don't have anything to guard against a bad response from the server. e.g.
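One way such a guard might look, as a rough sketch rather than the only approach: check that a buffer came back at all, then check the HTTP status line before parsing anything. The names my/http-response-ok-p and my/fetch-or-error are made up for illustration, and the 10 second timeout is an arbitrary choice.

```elisp
(require 'url)
(require 'url-http)

(defun my/http-response-ok-p ()
  "Return non-nil if the current buffer holds a 2xx HTTP response.
Hypothetical helper; expects a raw response buffer as produced by url.el."
  ;; `url-http-parse-response' reads the status code from the status line.
  (let ((status (save-excursion (url-http-parse-response))))
    (and (integerp status) (<= 200 status 299))))

(defun my/fetch-or-error (url)
  "Return the response buffer for URL, signalling an error on failure.
Hypothetical helper, not part of the posted function."
  ;; Arguments after URL are SILENT, INHIBIT-COOKIES and TIMEOUT (seconds).
  (let ((buffer (url-retrieve-synchronously url t nil 10)))
    (unless buffer
      (error "No response from %s" url))
    (unless (with-current-buffer buffer (my/http-response-ok-p))
      (kill-buffer buffer)
      (error "Request to %s did not return a 2xx status" url))
    buffer))
```

The posted function could then wrap (my/fetch-or-error ...) in with-current-buffer instead of calling url-retrieve-synchronously directly.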
To position point at the end of the headers:
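A minimal sketch of this, assuming the buffer came from url.el: url-http sets the buffer-local variable url-http-end-of-headers in its response buffers, so point can be moved there directly. The helper name my/goto-response-body and the blank-line fallback are illustrative additions.

```elisp
(require 'url-http)

(defun my/goto-response-body ()
  "Move point past the headers in a url.el response buffer and return point.
Hypothetical helper.  When `url-http-end-of-headers' is not set, fall
back to searching for the blank line that terminates the headers."
  (if (bound-and-true-p url-http-end-of-headers)
      (goto-char url-http-end-of-headers)
    (goto-char (point-min))
    (re-search-forward "^\r?\n"))
  (point))
```

The value returned can be passed straight to libxml-parse-html-region as the start of the region.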
This:
Is more clearly expressed as:
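Assuming this refers to the (setq result (cons ... result)) calls in the posted function, the usual shorter spelling is the built-in push macro. A minimal sketch with made-up data:

```elisp
(let ((result '())
      (item '("doi:10.1000/example" . "Example title"))) ; made-up entry
  ;; Assumed "before" form, as used in the posted function:
  (setq result (cons item result))
  ;; Same effect using the built-in `push' macro:
  (push item result)
  ;; RESULT now holds two copies of ITEM, one added by each form.
  result)
```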
Better yet, you could map over the elements you're interested in and accumulate the results via mapcar or cl-loop. That would obviate the need for the "result" variable.
You could probably shorten things by using the dom-elements function to search directly for the hrefs you're interested in, in combination with dom-parent to get at the parent elements.
Overall your function gets a 65 out of 130 ERU (elisp rating units).
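As a rough sketch of the mapcar plus dom-elements idea, limited to the DOI case: dom-elements finds every node whose href matches a regexp, and dom-parent walks back up to the enclosing element. The function name my/doi-references is made up, and pairing each DOI with the full text of its parent node (rather than the quoted title) is a simplification.

```elisp
(require 'dom)

(defun my/doi-references (dom)
  "Return a list of (\"doi:ID\" . PARENT-TEXT) pairs for DOI links in DOM.
Hypothetical helper; PARENT-TEXT is the text of the link's parent node."
  (mapcar (lambda (a-tag)
            (cons (concat "doi:" (dom-text a-tag))
                  ;; `dom-parent' searches DOM for the node containing A-TAG.
                  (dom-texts (dom-parent dom a-tag))))
          ;; Every node whose href attribute starts with the DOI resolver URL.
          (dom-elements dom 'href "\\`https://doi\\.org/")))
```

Because mapcar returns the new list directly, no accumulator variable is needed. Links without a matching href never enter the list, so there are no empty "doi:" entries to filter out afterwards.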