C R & SQL

In the recent years, SQL has wound its way deeper and deeper in R workflows. Many data analysts switch between the two every day.

It turns out that within RStudio, there are actually (at least) two methods of previewing SQL queries. These two methods also have the added benefit of SQL code with syntax highlighting.

C.1 Preview a .sql file

When you open a new .sql file in RStudio, it automatically populates the file with the following code:

If you’ve run the previous dplyr/dbplyr code chunk in the same session, you’ll have both the connection and the data you need to test out this preview function yourself! Just create a .sql file with the following code:

`– !preview conn=src_memdb()$con

SELECT * FROM storms LIMIT 5`

Note that unlike in the tbl() example above, you have to pull out the connection from the src_memdb() using $con. This seems to be a special case; other connections seem to work without extra fussing. For the rest of this post, I’ll refer to this connection as con with one “n” (with the exception of the gif below, where I gave it the lengthier name conn_isolated).

con <- src_memdb()$con

C.2 SQL chunks in RMarkdown

While you can use SQL chunks with this setting, there is NO chunk preview option. You must trust your queries and knit the file to make sure everything runs. You get the syntax highlighting razzle-dazzle but alas– no preview.

It is in this very specific case where inline mode wins big time. SQL previews magically become an option, allowing you to interact with your beautifully colored SQL code.

C.3 Passing vars to/from SQL chunks

When mixing R and SQL, you often want to (a) get data from a database and then continue manipulating it in R, or (b) modify the query by passing in variables from R.

When using SQL chunks, you can specify an output variable using the output.var chunk option with the variable name as a string.

In inline mode, the preview will no longer appear when running the SQL chunk, but the variable will appear in your environment.

C.4 Query parameter

Adding a single variable is quite simple. Add ? before the variable in the SQL chunk and just make sure that the R variable has the same name.

Sometimes, you cannot simply pass in a parameter. The glue_sql() function is great at managing some of the magic for you. To pass in a column name as a parameter, for example, you can just wrap the string in glue_sql() and it just works.

C.5 Multiple parameters

What if you want to provide multiple parameters? Adding an * to the end of the variable collapses the vector into a single comma-separated expression, rather than outputting a vector of SQL expressions. This behavior is somewhat reminiscent of args and kwargs in Python, which uses * as an unpacking operator.

Of course, to pass these parameters to our SQL chunk, we need to make sure we have the appropriate parentheses around ?storm_status for the IN to run.

C.6 SQL FILES & CHUNKS

What if you have a gigantic SQL query that you want to store in a separate file but you also want to use chunks?

When you knit the file, the imported SQL (and its comments) is included in the output by default.