Recommendation engines for scientific papers

Updated Feb 24th, with new info on PubChase

Netflix tells you what movies you should watch, Twitter tells you who to follow, and OkCupid tells you who you should be dating. Wouldn’t it be nice if you knew what scientific papers you should be reading?

Despite the well-understood theory behind recommendation systems, and their origins in academia, few recommendation systems exist for scientific papers. Here I compare the three recommendation systems that I could find: Mendeley, Google Scholar, and PubChase.

Scores are based on:

  • Relevance – how well the recommendations match the user’s interests
  • Novelty – how novel the recommendations are to the user

I’ll also consider two other practical factors:

  • Cost
  • Ease of use

Mendeley

Mendeley, a citation manager, offers a recommendation system in its offline version. The recommendation system is only available to premium users (minimum cost of $5 a month). It uses a collaborative filtering approach to find relevant papers for you based on the content of your library of citations.
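To make "collaborative filtering" concrete, here is a minimal sketch of the general idea. It is not Mendeley's actual algorithm, which isn't public as far as I know. It assumes each user's library is just a set of paper IDs; the function names and the toy data are made up for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two sets of paper IDs: 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(my_library, other_libraries, top_n=10):
    """Rank papers held by users with similar libraries.

    Papers already in my_library are skipped, so the output only
    contains papers that are new to me.
    """
    scores = defaultdict(float)
    for library in other_libraries:
        similarity = jaccard(my_library, library)
        if similarity == 0.0:
            continue  # nothing in common with this user
        for paper in library - my_library:
            scores[paper] += similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [paper for paper, _ in ranked[:top_n]]


# Hypothetical toy data: paper IDs are invented labels.
mine = {"olshausen2005", "adelson1985", "rust2005"}
others = [
    {"olshausen2005", "adelson1985", "carandini2012"},
    {"rust2005", "carandini2012", "gutig2006"},
    {"unrelated2010"},
]
print(recommend(mine, others))
```

The `library - my_library` step is the part that keeps already-owned papers out of the results.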

I ponied up $5 to test it out. I imported my Zotero library – containing about 1,100 citations – and let it do its magic. Here’s what it recommended to me:

  • How close are we to understanding v1? Bruno A Olshausen, David J Field in Neural computation (2005)
  • Dimensionality reduction in neural models: an information-theoretic generalization of spike-triggered average and covariance analysis. Jonathan W Pillow, Eero P Simoncelli in Journal of vision (2006)
  • Dynamics of orientation selectivity in the primary visual cortex and the importance of cortical inhibition. Robert Shapley, Michael Hawken, Dario L Ringach in Neuron (2003)
  • The “independent components” of natural scenes are edge filters. A J Bell, T J Sejnowski in Vision research (1997)
  • Spatiotemporal elements of macaque v1 receptive fields. Nicole C Rust, Odelia Schwartz, J Anthony Movshon, Eero P Simoncelli in Neuron (2005)
  • Normalization as a canonical neural computation. M Carandini, DJ Heeger (2012)
  • Decoding the activity of neuronal populations in macaque primary visual cortex. Arnulf B A Graf, Adam Kohn, Mehrdad Jazayeri, J Anthony Movshon in Nature neuroscience (2011)
  • Structure and function of visual area MT. Richard T Born, David C Bradley in Annual review of neuroscience (2005)
  • Neuronal adaptation to visual motion in area MT of the macaque. Adam Kohn, J Anthony Movshon in Neuron (2003)
  • Spatiotemporal energy models for the perception of motion. E H Adelson, J R Bergen in Journal of the Optical Society of America. A, Optics and image science (1985)

Well, I’ll give it points for relevance: all of these are excellent papers, and they’re relevant to my interests in vision and computational neuroscience. They’re so relevant that I have, in fact, already read each and every one of these papers (10/10!). Indeed, many of these papers were already in my library; Mendeley doesn’t do a good job of filtering what’s already in my library. It also seems like it likes NYU A LOT. Tony’s great, but, you know, there are other visual neuroscientists. As far as novelty goes, it’s underwhelming.

No matter, you can tell Mendeley that you already have some papers in your library. I did that, and ran the tool again:

  • Space and time in visual context. Odelia Schwartz, Anne Hsu, Peter Dayan in Nature reviews. Neuroscience (2007)
  • Analyzing neural responses to natural signals: maximally informative dimensions. Tatyana Sharpee, Nicole C Rust, William Bialek in Neural computation (2004)
  • Contrast dependence of contextual effects in primate visual cortex. J B Levitt, J S Lund in Nature (1997)
  • Adaptation changes the direction tuning of macaque MT neurons. Adam Kohn, J Anthony Movshon in Nature neuroscience (2004)
  • Functional mechanisms shaping lateral geniculate responses to artificial and natural stimuli. Valerio Mante, Vincent Bonin, Matteo Carandini in Neuron (2008)
  • Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons. James R Cavanaugh, Wyeth Bair, J Anthony Movshon in Journal of neurophysiology (2002)
  • Spike-triggered neural characterization. Odelia Schwartz, Jonathan W Pillow, Nicole C Rust, Eero P Simoncelli in Journal of vision (2006)
  • Adaptive temporal integration of motion in direction-selective neurons in macaque visual cortex. Wyeth Bair, J Anthony Movshon in The Journal of neuroscience : the official journal of the Society for Neuroscience (2004)
  • Adaptation of the simple or complex nature of V1 receptive fields to visual statistics. Julien Fournier, Cyril Monier, Marc Pananceau, Yves Frégnac in Nature neuroscience (2011)
  • The tempotron: a neuron that learns spike timing-based decisions. Robert Gütig, Haim Sompolinsky in Nature neuroscience (2006)

Here again, all relevant, and this time I’d only read 5/10. The tempotron paper is the kind of recommendation I’m looking for: not necessarily in my field exactly, but related to my interests. If it kept spitting out tempotron papers, I’d be happy. Unfortunately, when I ran it one more time, it showed only 5 suggestions; I ran it once more after that, and now it recommends nothing at all.

  • Relevance: 5/5
  • Novelty: 1/5
  • Ease of use: 5/5
  • Cost: $5 a month
  • Verdict: has potential, but currently useless

Google Scholar

Google Scholar is a search engine for papers. By registering yourself, and identifying the papers that you’ve previously published, Scholar can recommend newly published papers that may be of interest to you. Google explains it thus:

We analyze your articles (as identified in your Scholar profile), scan the entire web looking for new articles relevant to your research, and then show you the most relevant articles when you visit Scholar.  We determine relevance using a statistical model that incorporates what your work is about, the citation graph between articles, the fact that interests can change over time, and the authors you work with and cite.

Scholar is therefore a content-based recommendation system. Here are some of the latest recommendations I received:

  • Adaptation Disrupts Motion Integration in the Primate Dorsal Stream, CA Patterson, SC Wissig, A Kohn – Neuron, 2014
  • Cascaded Effects of Spatial Adaptation in the Early Visual System. NT Dhruv, M Carandini – Neuron, 2014
  • Semantic Control of Feature Extraction from Natural Scenes. P Neri – The Journal of Neuroscience, 2014
  • Transforming Visual Percepts into Memories, U Rutishauser – Current Biology, 2014
  • Computational principles of microcircuits for visual object processing in the macaque temporal cortex, T Hirabayashi, Y Miyashita – Trends in Neurosciences, 2014
  • Simultaneous multi-channel spikes and inverted spikes in focal epileptic ECoG are more after offset than during the seizure, K Majumdar – Biomedical Signal Processing and Control, 2014
  • Using human neuroimaging to examine top-down modulation of visual perception, TC Sprague, JT Serences – serenceslab.ucsd.edu
  • Predicting human gaze beyond pixels, J Xu, M Jiang, S Wang, MS Kankanhalli, Q Zhao – Journal of vision, 2014
  • Regularized Brain Reading with Shrinkage and Smoothing, L Wehbe, A Ramdas, RC Steorts, CR Shalizi – arXiv
  • Mixing of Chromatic and Luminance Retinal Signals in Primate Area V1, X Li, Y Chen, R Lashgari, Y Bereshpolova… – Cerebral Cortex, 2014

I would say I’m interested in about 9/10 of these suggestions. Interestingly, only 3-4 of the suggested papers are from groups whose work I’m familiar with; this strikes me as a good balance between relevance and novelty.

  • Relevance: 4.5/5
  • Novelty: 4/5
  • Ease of use: 3/5*
  • Cost: Free (as in beer)
  • Verdict: Scholar is a good recommendation engine. *To make good predictions, however, it does require you to have published a sizable number of papers, which limits its usability for new researchers (e.g. 2nd-year PhD students). It also only recommends new papers, never classics.

PubChase

PubChase is a new recommendation engine that indexes a library of citations – either your Mendeley library or a BibTeX file, which can be generated by e.g. EndNote and Zotero – and cross-references them with new articles in PubMed. There are web, iOS, and Android versions. I have yet to try the tablet versions, but I hear they are very good.

The pitch video includes soothing background music and cartoon characters:

I had a chat with one of the people behind PubChase, Matthew Davis – also a postdoc in a molecular biology lab – who developed the core recommendation engine. PubChase combines collaborative filtering and content-based approaches to recommendations. The user’s library is matched up against other users’ libraries to cast a wider net on potential matches. Papers are recommended based on different factors, including the impact factor of the journal it’s published in, its plain-text content, and especially authorship.
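For the curious, here is a rough sketch of what a hybrid score along those lines could look like. This is my own illustration, not PubChase's code: the weights, the bag-of-words text similarity, the `hybrid_score` function, and the dict layout for papers are all assumptions, and I've left journal information out entirely.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for a title or abstract."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(candidate, library, cf_score, weights=(0.3, 0.3, 0.4)):
    """Blend collaborative, text, and authorship signals into one score.

    candidate: dict with "text" (str) and "authors" (list of str)
    library:   list of dicts with the same keys
    cf_score:  collaborative-filtering score in [0, 1], computed elsewhere
    weights:   hypothetical; authorship gets the largest weight
    """
    w_cf, w_text, w_author = weights
    library_words = Counter()
    library_authors = set()
    for paper in library:
        library_words.update(bag_of_words(paper["text"]))
        library_authors.update(paper["authors"])
    text_score = cosine(bag_of_words(candidate["text"]), library_words)
    authors = set(candidate["authors"])
    # Fraction of the candidate's authors who already appear in my library.
    author_score = len(authors & library_authors) / len(authors) if authors else 0.0
    return w_cf * cf_score + w_text * text_score + w_author * author_score


# Toy example with titles standing in for full text.
library = [{"text": "Neuronal adaptation to visual motion in area MT",
            "authors": ["Kohn A", "Movshon JA"]}]
close = {"text": "Adaptation disrupts motion integration in the dorsal stream",
         "authors": ["Patterson CA", "Wissig SC", "Kohn A"]}
far = {"text": "Genetic resolutions of brain convolutions",
       "authors": ["Rash BG", "Rakic P"]}
print(hybrid_score(close, library, 0.5) > hybrid_score(far, library, 0.5))
```

With the same collaborative score for both candidates, the one that shares words and an author with the library comes out ahead.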

Indeed, Matthew described an algorithm that’s used by PubChase to map author names – which are non-unique plain text entities – to unique authors based on the pattern of co-occurrence of authors in papers. This is especially helpful to resolve Chinese names – Li, Liu, Wu, Zhang, etc. – which are difficult to discriminate once mapped to the Latin alphabet.
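I don't know the details of PubChase's algorithm, but the co-occurrence idea can be sketched in a few lines. In this toy version (entirely my own assumption), two papers signed with the same name string are attributed to the same person if they share at least one coauthor, directly or through a chain of other papers. The `disambiguate` function and all names in the data are hypothetical.

```python
def disambiguate(name, papers):
    """Split the papers signed `name` into groups, one per presumed person.

    papers: list of (paper_id, [author names]) tuples.
    Two papers land in the same group when their coauthor sets are
    linked, directly or through other papers in the group.
    """
    clusters = []  # list of (paper_ids, coauthors) pairs
    for paper_id, authors in papers:
        if name not in authors:
            continue
        coauthors = set(authors) - {name}
        merged_papers, merged_coauthors = [paper_id], set(coauthors)
        remaining = []
        for cluster_papers, cluster_coauthors in clusters:
            if cluster_coauthors & coauthors:
                # Shared coauthor: fold this cluster into the new one.
                merged_papers += cluster_papers
                merged_coauthors |= cluster_coauthors
            else:
                remaining.append((cluster_papers, cluster_coauthors))
        clusters = remaining + [(merged_papers, merged_coauthors)]
    return sorted(sorted(paper_ids) for paper_ids, _ in clusters)


# Hypothetical data: "Li W" is shared by two different people.
papers = [
    ("p1", ["Li W", "Coauthor A"]),
    ("p2", ["Li W", "Coauthor A", "Coauthor B"]),
    ("p3", ["Li W", "Coauthor C"]),
    ("p4", ["Li W", "Coauthor B"]),
    ("p5", ["Someone Else", "Coauthor C"]),
]
print(disambiguate("Li W", papers))
```

Here p1, p2, and p4 are chained together through Coauthors A and B, while p3 stands alone. The obvious weakness is that someone who changes labs and collaborators completely would be split in two, so a real system presumably needs more signals than this.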

Authorship-based recommendation is especially helpful to keep track of people who published good papers as postdocs as part of a big lab once they’ve set up their own lab.

I tried importing my Zotero library, via Mendeley AND via BibTeX. BibTeX didn’t work, and the Mendeley method only successfully imported about a third of my references. Matthew looked at my BibTeX library and it appears that the issue was not in the library, nor in PubChase’s software, but rather in Amazon EC2 instances dropping requests; growing pains from too many new users, basically.

A couple of days later, it started spitting out recommendations; it now sends me an email every week with top recommendations. Very neat.

There are some kinks to work out, for sure, but the recommendations it does give are encouraging:

  • Cascaded effects of spatial adaptation in the early visual system, Neuron, Dhruv NT, Carandini M
  • Adaptation disrupts motion integration in the primate dorsal stream, Neuron, Patterson CA, Wissig SC, Kohn A
  • The projective field of retinal bipolar cells and its modulation by visual context, Neuron, Asari H, Meister M
  • Temporal Responses of C. elegans Chemosensory Neurons Are Preserved in Behavioral Dynamics, Neuron, Kato S, Xu Y, Cho CE, Abbott LF, Bargmann CI
  • Learning by the dendritic prediction of somatic spiking, Neuron, Urbanczik R, Senn W
  • Structured Synaptic Connectivity between Hippocampal Regions, Neuron, Druckmann S, Feng L, Lee B, Yook C, Zhao T, Magee JC, Kim J
  • Deprivation-Induced Strengthening of Presynaptic and Postsynaptic Inhibitory Transmission in Layer 4 of Visual Cortex during the Critical Period, The Journal Of Neuroscience, Nahmani M, Turrigiano GG

The first two recommendations, you will notice, also appear in Scholar. Relevance-wise, I would say that about 5/7 of these papers interest me; OTOH, the novelty is pretty high; the C. elegans and dendritic prediction papers, while not in my exact field, seem interesting. A week later, its recommendations were:

  • Neural Circuit Components of the Drosophila OFF Motion Vision Pathway, Current Biology, Meier M, Serbe E, Maisak MS, Haag J, Dickson BJ, Borst A
  • Equation-oriented specification of neural models for simulations, Frontiers In Neuroinformatics, Stimberg M, Goodman DF, Benichoux V, Brette R
  • Grid-layout and theta-modulation of layer 2 pyramidal neurons in medial entorhinal cortex, Science, Ray S, Naumann R, Burgalossi A, Tang Q, Schmidt H, Brecht M
  • Modulatory effects of inhibition on persistent activity in a cortical microcircuit model, Frontiers In Neural Circuits, Konstantoudaki X, Papoutsi A, Chalkiadaki K, Poirazi P, Sidiropoulou K
  • Genetic resolutions of brain convolutions, Science, Rash BG, Rakic P
  • Island cells control temporal association memory, Science, Kitamura T, Pignatelli M, Suh J, Kohara K, Yoshiki A, Abe K, Tonegawa S

Very cool. The balance of novelty and relevance seems about right to me.

  • Relevance: 4/5
  • Novelty: 5/5
  • Ease of use: 4/5*
  • Cost: Free (as in beer)
  • Verdict: PubChase is a worthy alternative to Scholar recommendations, especially for people who haven’t published a lot yet, or whose research interests are not entirely reflected in their publication history. It does have a few kinks here and there, and I would like it if, like Scholar, it integrated with Zotero so that I could add recommended papers directly in my library rather than in their online library. Nevertheless, an excellent addition to the short list of recommendation engines for papers.

Conclusion

Scholar and PubChase do a good job of finding new papers, but they can’t find old papers. What would be nice is if these recommendation engines worked with old papers as well, perhaps leveraging collaborative filtering or citation networks to do so. Then they could present the results Netflix-style, e.g. “Papers about vision”; “Classics”; “Random cog.sci. paper to remind yourself why you’re doing ‘real science’”; “Papers featuring a title with a really bad pun”; etc.

Matthew Davis of PubChase mentioned that they’re working on such a feature, and it will be nice to see how useful this is. In any case, I’m happy to see people like PubChase take on the Google behemoth; as much as I like Scholar, it really hasn’t evolved that much in 8 years, so perhaps some competition will get them to stop resting on their laurels. Exciting times.


19 thoughts on “Recommendation engines for scientific papers”

  1. Hei! ReadCube (www.readcube.com) also provides a recommended reading list based on your current library. One can also choose to get recommendations on specific tags instead of the whole library. Furthermore, one can choose how far back one wants to search for recommended articles.

  2. Another quite clumsy but efficient way is to set email alerts on citations of your own key publications and of other key publications in your field that would most likely be cited by papers that you can’t afford to miss. This is probably built into all these algorithms anyway, but it is another line of defence.

  3. Ah, one correction regarding PubChase’s integration of journal information. While we do consider the journal as a factor in our algorithm, Impact Factor is not used explicitly or implicitly. It’s really simply not a good predictor.

      1. Oh that’s cool, I’ll try it and report on it later. I totally get what you’re saying – I don’t *need* more papers to read either – but oftentimes I find that I focus too much on reading papers which are immediately relevant to my current research, while reading more widely would be better in the long run. Hence a recommendation engine to streamline the process.

  4. I don’t find GoogleScholar that useful. What I find interesting to read is not the same as what I publish. GS provides a lot of “I should read that at some point” but it’s not of interest to me right now. Since I do somewhat interdisciplinary work I need to read widely and not just papers that are very similar to what I produce.

    1. That’s a good point. Hence the suggestion of having Netflix-style recommendations, so that you get both recommendations for stuff you are currently working on AND recommendations about broader research.

  5. F1000 also have a recommendation system that works pretty well.

    Following which papers cite the most important papers in your field is always a good source of information.

    Finally, papers from the Annual Review series always attract citations from many papers that are not directly in your field. I found that following these citations does a pretty good job of finding interesting papers.

    1. You’re talking about F1000 prime, which is $10 a month, right? I’ll sign up for it now and evaluate it later.

      Do you know of any way of following which new paper cites classic papers in your field? Or is this a manual search?

    1. Hmmm, I’ll ask one of the new guys in the lab – one who doesn’t have any papers published – to sign up for Scholar and upload some citations to their library. We should be able to see whether Scholar generates any recommendations for him.
