Main page Technology MariaDB ColumnStore performance optimization exercise

MariaDB ColumnStore performance optimization exercise

We are dealing with both relational and NoSQL databases and we are often involved in advising our customers on choosing better product for their unique circumstances. One of our favorites, MariaDB ColumnStore has number of immediate advantages for utilization in BI systems:

For companies already using MySQL it may offer additional benefit of not leaving MySQL universe.

However, there are other good NoSQL vendors around and we are always on the lookout for better product or features. This time we noticed number of discussions and blogs that claimed MariaDB ColumnStore as having significant performance issues comparing to competition.

For example, here they compare the following vendors (and MariaDB ColumnStore appears the slowest):

We decided to make our own test – and if necessary, to find a solution. Below is the description of our experiment.

We always approach optimization process using two principal methods:

  • Data Model optimization
  • Parameters optimization

Data Model Optimization

We looked at the tables used in the original tests and decided to replace VARCHAR and DECIMAL to INT UNSIGNED wherever possible.

Our environment: Centos 7 Guest on Oracle Virtual Box, MacBook Pro.

Original TableOur Table
CREATE TABLE dim_links ( movieId DECIMAL(25) , imdbId DECIMAL(25) , tmdbId DECIMAL(25) );CREATE TABLE `dim_links` ( `movieId` int(10) unsigned DEFAULT NULL, `imdbId` int(10) unsigned DEFAULT NULL, `tmdbId` bigint(20) unsigned DEFAULT NULL ) ENGINE=Columnstore DEFAULT CHARSET=latin1;
CREATE TABLE dim_movies ( movieId DECIMAL(20) , title VARCHAR(200) , genres VARCHAR(120) );CREATE TABLE `dim_movies` ( `movieId` int(10) unsigned DEFAULT NULL, `title` varchar(200) DEFAULT NULL, `genres` varchar(120) DEFAULT NULL ) ENGINE=Columnstore DEFAULT CHARSET=latin1;
CREATE TABLE dim_tags ( userId DECIMAL(20) , movieId DECIMAL(20) , tag VARCHAR(500) , “timestamp” VARCHAR(50) )CREATE TABLE `dim_tags` ( `userId` int(10) unsigned DEFAULT NULL, `movieId` int(10) unsigned DEFAULT NULL, `tag` varchar(100) DEFAULT NULL, `timestamp` bigint(20) DEFAULT NULL ) ENGINE=Columnstore DEFAULT CHARSET=latin1;
CREATE TABLE “fact_ratings”( “fact_id” BIGINT , “userId” DECIMAL , “movieId” DECIMAL , “rating” DOUBLE , “timestamp” VARCHAR(40) )CREATE TABLE `fact_ratings` ( `userId` int(11) DEFAULT NULL, `movieId` int(11) DEFAULT NULL, `rating` decimal(2,1) DEFAULT NULL, `timestamp` bigint(20) DEFAULT NULL ) ENGINE=Columnstore DEFAULT CHARSET=latin1;

Results

Test 1

SQL statement: SELECT AVG(fact_id) FROM fact_ratings;

VendorExecutuon time
MonetDB1.40 sec
HPE Vertica0.39 sec
MariaDB ColumnStore28.37 sec

Our result for MariaDB ColumnStore: 1.34 sec.

Test 2

SQL statement: SELECT rating, count(*) from fact_ratings group by rating order by rating;

VendorExecutuon time
MonetDB1.22 sec
HPE Vertica0.95 sec
MariaDB ColumnStore29.24 sec

Our result for MariaDB ColumnStore: 2.02 sec.

Test 3

SQL statement: select dim_movies.title, avg(fact_ratings.rating) from dim_movies join fact_ratings on dim_movies.movieId = fact_ratings.”movieId” group by dim_movies.title DESC order by dim_movies.title limit 20;

VendorExecutuon time
MonetDB6.53 sec
HPE Vertica14.94 sec
MariaDB ColumnStore5 min 22 sec

Our result for MariaDB ColumnStore: 10.98 sec.

Test 4

SQL statement: select dim_movies.title, fact_ratings.rating, dim_links.imdbid from dim_movies left outer join fact_ratings on dim_movies.movieId = fact_ratings.”movieId” left outer join dim_links on dim_movies.movieId = dim_links.movieid where fact_ratings.rating > (select avg(fact_ratings.rating) from fact_ratings) order by fact_ratings.rating DESC limit 20;

VendorExecutuon time
MonetDB8.47 sec
HPE Vertica10.12 sec
MariaDB ColumnStore6 min 4 sec

Our result for MariaDB ColumnStore: 17.55 sec.

Conclusion

We experimented using very simple optimization method and common hardware – and still results are pretty good. If we were using the same hardware as described in the original comparison or applied parameter optimization – we tend to believe our results would be even better.

We are glad to be able to confirm MariaDB ColumnStore capabilities and will continue recommending it to our customers.