Ph.D. student Mark Raasveldt has tried to bridge the gap between relational databases and data science. His Ph.D. defense takes place on 9 June 2020.
In his work, Raasveldt addresses the gap between the relational database research community and data scientists. He argues that this gap leads to inefficient use of databases in data science.
In his thesis, Raasveldt states the problem as follows: most data scientists use analytical tools such as R, Python and C/C++ for their research, and these tools are difficult to integrate with current database systems. The result is slow and cumbersome data analysis.
"Data scientists have opted to reinvent database systems by developing a zoo of data management alternatives that perform similar tasks to classical database management systems, but have many of the problems that were solved in the database field decades ago," says Raasveldt.
Raasveldt's work focuses on ways to make the integration of analytical tools and relational database management systems efficient and painless.
This work led him to a new data management system, DuckDB, purpose-built for efficient and painless integration with R, Python and other analytical tools. DuckDB is intended to be a mature database system, not only a research prototype.
"In DuckDB, we take all the lessons that we have learned investigating database-client integrations and create an easy-to-use and highly efficient embedded database." Raasveldt will stay on as a postdoc at CWI, where he will continue developing DuckDB.