List comprehension in R

Thomas Mailund
7 min readAug 13, 2017

First of all, my apologies for showing code examples as screenshots in this post. I couldn’t get Medium and Wordpress to show the code correctly. If you want to copy the actual code to play with it, you can get it here: https://github.com/mailund/dsl/blob/master/R/lcomp/lcomp.R

Right! To it!

If you are familiar with Python programming you will also be used to list comprehension. It is syntactic sugar you use to construct lists from other lists (or any structure you can iterate through in general). It isn’t actually an invention from Python — you have it in a lot of programming languages — but Python is probably the most used of these, so you are more likely to know it from there.

List comprehension gives you syntactic sugar to write expressions that combine filtering and mapping. You can construct a list by evaluating an expression for each element in another list and combine it with a predicate that must be true for the element you want to include in the result. Using it, you can write algorithms very succinctly. For example, a quick sort in Python can be implemented in a few lines if you use list comprehension:

R doesn’t have list comprehension, but it does have excellent support for meta-programming that will let you add it if you write your own domain specific language — which will be the topic for one of my upcoming books.

I wanted to implement it as a test for what I want to write about, so I played around with it a bit over the weekend, and came up with a solution that lets me, at least, implement quick sort as simply as I can in Python:

If it works, and it does, it works like this:

Here’s a solution I came up with. It isn’t perfect, but it is a starting point for playing around with it:

I use the rlang package for it, rather than working with raw quotes and evals, but you could just as easily do that. I just want to use the excuse of writing the next book to get more familiar with that package and tidy eval, so that is what I use.

This implementation is a bit more general than Python’s list comprehension — I can handle more than one list and more than one predicate. I Python, you might think you can handle more than one list, but you’d be wrong. You can write expressions that involves more than one, but those are just nested versions of list comprehension. You get an outer product of the lists rather than an inner product. I get do that as well by nesting my solution, but I can handle more.

Anyway, let me walk you through my solution.

The lcomp function takes an arbitrary number of arguments, the first of which must be an expression, the rest captured by the “…” special argument. I translate the first argument into a quoted expression using the enquo function. This works like the quote function, in a way, but it first substitute the parameter the user actually wrote into the expression and then quotes it. Similar to if you used the substitute function, but it handles environments correctly so we can evaluate the expression in the context where the user wrote it. The remaining elements, and there can be an arbitrary number of those, are quoted in a similar way using the quos function.

We set the expression aside for a little bit and consider the other arguments. Those will be the lists we should iterate over and the predicates for them.

The way I intend the function to be used, I will require that you name lists — see the “x = …” argument in the quick sort example — but that you provide the predicates as boolean expressions — the third argument in the calls in the quick sort example.

Named arguments will have a name in the quoted list of expressions, the others won’t, so I can use this to distinguish between lists provided to the function and the predicates over them.

I evaluate the lists in the context where the user defined them using the eval_tidy function and I just get the raw quoted predicates for the predicates using the UQE function.

After these statements, I have in “lists” the named values of the lists provided, and in “predicates” I have expressions with no environment associated. The latter might be a problem that I have to deal with at some point, but I’ll get back to that.

Anyway, I have the lists so I should be able to compute the expression in the first argument to lcomp in the context of each element in the lists. This is something that calls for mapping a function over the lists, but first we must create such a function.

It isn’t entirely trivial how we should evaluate the body of this function. Obviously, the body of the function must be the expression provided as the first argument to lcomp, but this expression could refer to both elements in the lists and variables in the calling context.

The way I chose to implement this is this:

The body of the function I set to be the raw expression that was provided to the lcomp function. This looses all information about the context in which the expression was provided, but I handle that by making that the enclosing environment of the function. Anything that isn’t provided as arguments to the function will be available through this environment, then. For the arguments, I just shove in the lists. This makes the function take the right arguments, and it will have the lists as defaults. The expression could be evaluated just from these default arguments if it is a vector expression and all the arguments are vectors — something we could make sure that the quick sort algorithm would provide — but it won’t work if we are combining lists and vectors and the expression is not vectorised, so I map the function over the lists instead.

Now I have all the values for the output list, but I have ignored the predicates. I did that because the predicates are a bit harder to handle.

The way I handle the predicates is similar to how I compute the values, though. I create functions that evaluate each predicate over the list, and then I “and” them together to get a boolean vector I can use to pick out the values I should return.

Getting the values where all predicates are satisfied is now only a question of returning the values where the “keep_index” vector is true:

This really isn’t an ideal solution. I compute values that I don’t need for the final result, and worse, I might end up raising all kinds of exceptions if some of the predicates are preventing me from evaluating an expression on illegal input.

Ideally, I should compute the predicates first and then use that to pick out the elements in the lists I should evaluate the expression on. But by now, there was a bottle of wine on the table and dinner was ready, so I put that aside for a later play session.

I plan to make list comprehension a chapter in my next R book, so keep an eye out if for it if you want to know what I come up with.

I will put up a draft of that book up on Gumroad as soon as I get something written. As always, if you buy it there early, you get it cheap, and you will get updates up until I finish it. I will try to raise enough to pay a copy editor to go through it before I try to sell it — for Functional Data Structures in R I have made exactly $12.75 which won’t pay for many pages of editing (it might pay for salt on the bread but not the bread), so it didn’t happen there, but if you help me out you will get a proof read book. If you send me comments and suggestions, though, I will try to get you a printed copy if I manage to sell the book. I only get ten copies, though, so I can only send a copy to those that really help.

--

--

Thomas Mailund

Associate professor in bioinformatics and author of a bunch of text books