MonetDBLite for R
MonetDBLite for R is a SQL database that runs inside the R environment for statistical computing and does not require the installation of any external software. MonetDBLite is based on free and open-source MonetDB, a product of the Centrum Wiskunde & Informatica.
MonetDBLite is similar in functionality to RSQLite, but typically completes queries faster thanks to its columnar storage architecture and bulk query processing model. Since both of these embedded SQL options rely on the R DBI interface, converting legacy RSQLite project syntax to MonetDBLite code should be straightforward.
MonetDBLite works seamlessly with the dplyr grammar of data manipulation. For a detailed tutorial of how to work with database-backed dplyr commands, see the dplyr databases vignette. To reproduce this vignette using MonetDBLite rather than RSQLite, simply replace the functions ending with *_sqlite with their *_monetdblite counterparts.
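Install the latest released version from CRAN with install.packages("MonetDBLite"), or the latest development version from GitHub using, for example, devtools::install_github("hannesmuehleisen/MonetDBLite-R").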
If you encounter a bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use Stack Overflow with the tag monetdblite. The development version of MonetDBLite is tested continuously on both Unix and Windows machines.
MonetDBLite outperforms all other SQL databases currently accessible from the R language and ranks competitively among other high-performance computing options available to R users. For more detail, see Szilard Pafka's benchmarks.
If you want to store a database permanently (or to reconnect to a previously-initiated one), set the dbdir to some folder path on your local machine. A new database that you would like to store permanently should be directed to an empty folder:

    library(DBI)
    dbdir <- "C:/path/to/database_directory"
    con <- dbConnect(MonetDBLite::MonetDBLite(), dbdir)
To create a temporary server, create a DBI connection as follows:

    library(DBI)
    con <- dbConnect(MonetDBLite::MonetDBLite())
Note that the above temporary server command is equivalent to initiating the server in the tempdir() of your R session:

    library(DBI)
    dbdir <- tempdir()
    con <- dbConnect(MonetDBLite::MonetDBLite(), dbdir)
Note that MonetDB may hiccup when using network drives; use databases stored on the same machine as the R session.
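As a rough illustration of the persistent setup described above, the sketch below writes a table into a database directory, shuts the embedded server down, and reconnects to check that the table is still there. The directory name `monetdblite_demo` is a placeholder placed under `tempdir()` so the example cleans up with the session; a real project would use a stable local path. It assumes a fresh R session with no other embedded server running, and it relies on the `shutdown = TRUE` argument to `dbDisconnect`.

```r
library(DBI)

# hypothetical database directory (placeholder name, lives under tempdir())
dbdir <- file.path(tempdir(), "monetdblite_demo")
dir.create(dbdir, showWarnings = FALSE)

# first connection: start the embedded server and store a table
con <- dbConnect(MonetDBLite::MonetDBLite(), dbdir)
dbWriteTable(con, "mtcars", mtcars, overwrite = TRUE)
dbDisconnect(con, shutdown = TRUE)

# second connection: reopen the same directory and look for the table
con <- dbConnect(MonetDBLite::MonetDBLite(), dbdir)
table_survived <- "mtcars" %in% dbListTables(con)
print(table_survived)
dbDisconnect(con, shutdown = TRUE)
```

Keeping the directory path in a single variable makes it easy to point the same script at a permanent location later.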
Versatile Data Importation
To efficiently copy a data.frame object into a table within the MonetDBLite database, use dbWriteTable:

    # directly copy a data.frame object to a table within the database
    dbWriteTable(con, "mtcars", mtcars)
To load a CSV file into a table within the database, provide the local file path of a .csv file to dbWriteTable:

    # construct an example CSV file on the local disk
    csvfile <- tempfile()
    write.csv(mtcars, csvfile, row.names = FALSE)

    # directly copy a csv file to a table within the database
    dbWriteTable(con, "mtcars2", csvfile)

    # append the same table to the bottom of the previous table
    dbWriteTable(con, "mtcars2", csvfile, append=TRUE)

    # overwrite the table with a new table
    dbWriteTable(con, "mtcars2", csvfile, overwrite=TRUE)
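One way to confirm what the append and overwrite options do is to count rows after each step. The following is a minimal sketch under a few assumptions: the table name `mtcars_csv` and the helper `count_rows` are hypothetical names introduced only for this illustration, and the connection is a temporary one.

```r
library(DBI)
con <- dbConnect(MonetDBLite::MonetDBLite())

# example CSV file on the local disk
csvfile <- tempfile()
write.csv(mtcars, csvfile, row.names = FALSE)

# hypothetical helper: number of rows currently stored in a table
count_rows <- function(con, tablename) {
  dbGetQuery(con, paste("SELECT COUNT(*) FROM", tablename))[[1]]
}

# load the file, replacing any table left over from an earlier run
dbWriteTable(con, "mtcars_csv", csvfile, overwrite = TRUE)
n_initial <- count_rows(con, "mtcars_csv")

# appending the same file again should double the row count
dbWriteTable(con, "mtcars_csv", csvfile, append = TRUE)
n_appended <- count_rows(con, "mtcars_csv")

# overwriting should bring the table back to a single copy
dbWriteTable(con, "mtcars_csv", csvfile, overwrite = TRUE)
n_overwritten <- count_rows(con, "mtcars_csv")
```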
The SQL interface of MonetDBLite can also be used to manually create a table and import data:

    # construct an example CSV file on the local disk
    csvfile <- tempfile()
    write.csv(mtcars, csvfile, row.names = FALSE)

    # start a SQL transaction
    dbBegin(con)

    # construct an empty table within the database, using a manually-defined structure
    dbSendQuery(con, "CREATE TABLE mtcars3 (mpg DOUBLE PRECISION, cyl INTEGER, disp DOUBLE PRECISION, hp INTEGER, drat DOUBLE PRECISION, wt DOUBLE PRECISION, qsec DOUBLE PRECISION, vs INTEGER, am INTEGER, gear INTEGER, carb INTEGER)")

    # copy the contents of a CSV file into the database, using the MonetDB COPY INTO command
    dbSendQuery(con, paste0("COPY OFFSET 2 INTO mtcars3 FROM '", csvfile, "' USING DELIMITERS ',','\n','\"' NULL as ''"))

    # finalize the SQL transaction
    dbCommit(con)
Note how we wrap the two commands in a transaction using dbBegin and dbCommit. This creates all-or-nothing semantics: either both the table creation and the bulk load take effect, or neither does. See the MonetDB documentation for details on how to create a table and how to perform bulk input.
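The all-or-nothing behavior can be seen from the other side by abandoning a transaction instead of committing it. This is a small sketch rather than a recommended workflow; the table name `rollback_demo` is made up for the example, and it uses the standard DBI `dbRollback` function in place of `dbCommit`.

```r
library(DBI)
con <- dbConnect(MonetDBLite::MonetDBLite())

# start a SQL transaction
dbBegin(con)

# create and fill a table inside the open transaction
dbSendQuery(con, "CREATE TABLE rollback_demo (id INTEGER)")
dbSendQuery(con, "INSERT INTO rollback_demo VALUES (1)")

# abandon the transaction instead of committing it
dbRollback(con)

# the table creation is undone together with the insert
exists_after_rollback <- dbExistsTable(con, "rollback_demo")
print(exists_after_rollback)
```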
Reading and Writing (Queries and Updates)
This section reviews how to pass SQL queries to an embedded server session and then pull those results into R. If you are interested in learning SQL syntax, perhaps review the w3schools SQL tutorial or the MonetDB SQL Reference Manual.
The dbGetQuery function sends a SELECT statement to the server, then returns the result as a data.frame:

    # calculate the average miles per gallon, grouped by number of cylinders
    dbGetQuery(con, "SELECT cyl, AVG(mpg) FROM mtcars GROUP BY cyl")

    # calculate the number of records in the mtcars table
    dbGetQuery(con, "SELECT COUNT(*) FROM mtcars")
The dbSendQuery function submits a read-only query and returns a result object without pulling the data into R. Once initiated, the res object below can then be read in pieces with repeated fetch calls:

    res <- dbSendQuery(con, "SELECT wt, gear FROM mtcars")
    first_sixteen_records <- fetch(res, n=16)
    dbHasCompleted(res)
    second_sixteen_records <- fetch(res, n=16)
    dbHasCompleted(res)
    dbClearResult(res)
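When the number of rows is not known in advance, the same fetch pattern can be wrapped in a loop that runs until the result set is exhausted. The sketch below assumes a temporary connection and a hypothetical table name `mtcars_chunks`; the chunk size of 10 is arbitrary, and `dbFetch` is the DBI generic behind `fetch`.

```r
library(DBI)
con <- dbConnect(MonetDBLite::MonetDBLite())
dbWriteTable(con, "mtcars_chunks", mtcars, overwrite = TRUE)

res <- dbSendQuery(con, "SELECT wt, gear FROM mtcars_chunks")

rows_seen <- 0
while (!dbHasCompleted(res)) {
  # pull the next chunk of at most 10 records
  chunk <- dbFetch(res, n = 10)
  # guard against looping forever if nothing more is delivered
  if (nrow(chunk) == 0) break
  rows_seen <- rows_seen + nrow(chunk)
}

# release the result set once all chunks have been read
dbClearResult(res)
print(rows_seen)
```

Processing each chunk inside the loop keeps only one piece of the result in R memory at a time.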
The dbSendQuery function should also be used to make edits to tables within the database:

    # add a new column of kilometers per liter
    dbSendQuery(con, "ALTER TABLE mtcars ADD COLUMN kpl DOUBLE PRECISION")

    # populate that new column with kilometers per liter
    dbSendQuery(con, "UPDATE mtcars SET kpl = mpg * 0.425144")
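The conversion factor in the UPDATE statement can be checked in plain R. This short calculation assumes US gallons (the mtcars documentation describes mpg as miles per US gallon) and uses the exact definitions of the mile and the US gallon; the variable names are introduced here for illustration.

```r
# exact unit definitions
km_per_mile <- 1.609344
liters_per_us_gallon <- 3.785411784

# kilometers per liter for each mile per (US) gallon
kpl_per_mpg <- km_per_mile / liters_per_us_gallon
print(round(kpl_per_mpg, 6))
```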
Glamorous Data Export
The contents of an entire table within the database can be transferred to an R data.frame object with dbReadTable. Since MonetDBLite is most useful for the storage and analysis of large datasets, there might be limited utility to copying an entire table into working RAM in R. The dbReadTable function and a SQL SELECT * FROM tablename command are equivalent:

    # directly copy a table within the database to an R data.frame object
    x <- dbReadTable(con, "mtcars")

    # directly copy a table within the database to an R data.frame object
    y <- dbGetQuery(con, "SELECT * FROM mtcars")
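For tables too large to copy whole, an alternative is to let the database do the filtering and export only the rows and columns that are needed. A minimal sketch, assuming a temporary connection and a hypothetical table name `mtcars_export`:

```r
library(DBI)
con <- dbConnect(MonetDBLite::MonetDBLite())
dbWriteTable(con, "mtcars_export", mtcars, overwrite = TRUE)

# export two columns for the four-cylinder cars only
four_cyl <- dbGetQuery(con, "SELECT mpg, wt FROM mtcars_export WHERE cyl = 4")
print(dim(four_cyl))
```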
Special database informational functions
Certain administrative commands can be sent using either dbSendQuery or with a custom DBI function:

    # remove the table `mtcars2` from the database
    dbSendQuery(con, "DROP TABLE mtcars2")

    # remove the table `mtcars3` from the database
    dbRemoveTable(con, "mtcars3")
Other administrative commands can be sent using dbGetQuery or with a custom DBI function:

    # list the column names of the mtcars table within the database
    names(dbGetQuery(con, "SELECT * FROM mtcars LIMIT 1"))

    # list the column names of the mtcars table within the database
    dbListFields(con, "mtcars")
Still other administrative tasks are much easier to perform with the custom DBI function:

    # print the names of all tables within the current database
    dbListTables(con)
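These informational functions combine naturally into small helpers. The sketch below defines a hypothetical `drop_if_exists` function (not part of DBI or MonetDBLite) that removes a table only when it is present, which avoids an error from dropping a table that was never created. The table name `scratch_table` is also made up for the example.

```r
library(DBI)

# hypothetical helper: drop a table only if it exists,
# and report whether the table is absent afterwards
drop_if_exists <- function(con, tablename) {
  if (dbExistsTable(con, tablename)) {
    dbRemoveTable(con, tablename)
  }
  invisible(!dbExistsTable(con, tablename))
}

con <- dbConnect(MonetDBLite::MonetDBLite())
dbWriteTable(con, "scratch_table", mtcars, overwrite = TRUE)

# first call removes the table, second call is a harmless no-op
gone_first <- drop_if_exists(con, "scratch_table")
gone_second <- drop_if_exists(con, "scratch_table")
```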
MonetDBLite allows multiple concurrent connections to a single database, but does not allow more than one concurrent embedded server session (actively-running database). This is not an issue for most users since a single database can store thousands of individual tables. To switch between databases, however, the first server must be shut down before the second can be opened. To shut down a server, include the shutdown=TRUE parameter when disconnecting, as in dbDisconnect(con, shutdown=TRUE).
To globally shut down the embedded server session without the con connection object, use MonetDBLite::monetdblite_shutdown().
MonetDBLite does not allow multiple R sessions to connect to a single database concurrently. As soon as a single R session loads an embedded server, that server is locked down and inaccessible to other R consoles.
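Switching between two databases within one R session might look like the following sketch. The directory names `database_a` and `database_b` and the table names are placeholders, and the example assumes a fresh session in which no embedded server is running yet.

```r
library(DBI)

# hypothetical directories for two separate databases
dir_a <- file.path(tempdir(), "database_a")
dir_b <- file.path(tempdir(), "database_b")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)

# work in the first database, then shut its server down completely
con <- dbConnect(MonetDBLite::MonetDBLite(), dir_a)
dbWriteTable(con, "only_in_a", head(mtcars), overwrite = TRUE)
dbDisconnect(con, shutdown = TRUE)

# only now can a server for the second database be started
con <- dbConnect(MonetDBLite::MonetDBLite(), dir_b)
dbWriteTable(con, "only_in_b", head(mtcars), overwrite = TRUE)
tables_in_b <- dbListTables(con)
print(tables_in_b)
dbDisconnect(con, shutdown = TRUE)
```

Each database keeps its own set of tables, so the listing from the second directory should not include the table written to the first.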
0.5.2 - Jun 19, 2018
This has never been released to CRAN but is the last hurrah of the 0.5.x line with some bugs fixed. Kept here for future reference.
v0.6.0-6 - Jun 15, 2018
Pre-release of 0.6.0, based on MonetDB Mar2018
- Main new feature is integer64 support via the bit64 package. If you pass integer64 columns. See https://github.com/hannesmuehleisen/MonetDBLite-R/issues/2 for an example. Supports Zero-Copy for reading.
See https://www.monetdb.org/Downloads/ReleaseNotes for the MonetDB changelog. Main changes from there:
- A column default value can be used in UPDATE statements: UPDATE tname SET cname = DEFAULT, and in INSERT statements: INSERT INTO tname VALUES (..., DEFAULT, ...)
- Support for the creation of ordered indices, e.g.
CREATE ORDERED INDEX idxname ON mtcars (mpg, cyl);
- Support for non-cascading deletions, e.g.
DROP SCHEMA my_schema RESTRICT
- Support for schema comments
- Support for TRUNCATE statements conforming to the SQL:2008 standard:
TRUNCATE [ TABLE ] qname [ CONTINUE IDENTITY | RESTART IDENTITY ] [ RESTRICT | CASCADE ]
v0.5.1-6 - Jan 2, 2018
v0.5.0-4 - Nov 13, 2017
- Prepared statements
dbFetch(dbBind(dbSendQuery(con, "PREPARE select model from mtcars where cyl = ?"), list(4)), -1)
- Experimental in-memory mode
- Faster builds
- Fix protection issue with query results
- Fix memory leaks & segfaults in shutdown
- Stripped more unused code
- Various dplyr fixes
- Separated C library and R/Python/... frontends