MariaDB ColumnStore performance optimization exercise
We are dealing with both relational and NoSQL databases and we are often involved in advising our customers on choosing better product for their unique circumstances. One of our favorites, MariaDB ColumnStore has number of immediate advantages for utilization in BI systems:
- Transaction support
- Powerful MySQL API
- Analytical functions
- Open Source
For companies already using MySQL it may offer additional benefit of not leaving MySQL universe.
However, there are other good NoSQL vendors around and we are always on the lookout for better product or features. This time we noticed number of discussions and blogs that claimed MariaDB ColumnStore as having significant performance issues comparing to competition.
For example, here they compare the following vendors (and MariaDB ColumnStore appears the slowest):
- MonetDB
- HPE Vertica
- MariaDB ColumnStore
We decided to make our own test – and if necessary, to find a solution. Below is the description of our experiment.
We always approach optimization process using two principal methods:
- Data Model optimization
- Parameters optimization
Data Model Optimization
We looked at the tables used in the original tests and decided to replace VARCHAR and DECIMAL to INT UNSIGNED wherever possible.
Our environment: Centos 7 Guest on Oracle Virtual Box, MacBook Pro.
Original Table | Our Table |
---|---|
CREATE TABLE dim_links ( movieId DECIMAL(25) , imdbId DECIMAL(25) , tmdbId DECIMAL(25) ); | CREATE TABLE `dim_links` ( `movieId` int(10) unsigned DEFAULT NULL, `imdbId` int(10) unsigned DEFAULT NULL, `tmdbId` bigint(20) unsigned DEFAULT NULL ) ENGINE=Columnstore DEFAULT CHARSET=latin1; |
CREATE TABLE dim_movies ( movieId DECIMAL(20) , title VARCHAR(200) , genres VARCHAR(120) ); | CREATE TABLE `dim_movies` ( `movieId` int(10) unsigned DEFAULT NULL, `title` varchar(200) DEFAULT NULL, `genres` varchar(120) DEFAULT NULL ) ENGINE=Columnstore DEFAULT CHARSET=latin1; |
CREATE TABLE dim_tags ( userId DECIMAL(20) , movieId DECIMAL(20) , tag VARCHAR(500) , “timestamp” VARCHAR(50) ) | CREATE TABLE `dim_tags` ( `userId` int(10) unsigned DEFAULT NULL, `movieId` int(10) unsigned DEFAULT NULL, `tag` varchar(100) DEFAULT NULL, `timestamp` bigint(20) DEFAULT NULL ) ENGINE=Columnstore DEFAULT CHARSET=latin1; |
CREATE TABLE “fact_ratings”( “fact_id” BIGINT , “userId” DECIMAL , “movieId” DECIMAL , “rating” DOUBLE , “timestamp” VARCHAR(40) ) | CREATE TABLE `fact_ratings` ( `userId` int(11) DEFAULT NULL, `movieId` int(11) DEFAULT NULL, `rating` decimal(2,1) DEFAULT NULL, `timestamp` bigint(20) DEFAULT NULL ) ENGINE=Columnstore DEFAULT CHARSET=latin1; |
Results
Test 1
SQL statement: SELECT AVG(fact_id) FROM fact_ratings;
Vendor | Executuon time |
---|---|
MonetDB | 1.40 sec |
HPE Vertica | 0.39 sec |
MariaDB ColumnStore | 28.37 sec |
Our result for MariaDB ColumnStore: 1.34 sec.
Test 2
SQL statement: SELECT rating, count(*) from fact_ratings group by rating order by rating;
Vendor | Executuon time |
---|---|
MonetDB | 1.22 sec |
HPE Vertica | 0.95 sec |
MariaDB ColumnStore | 29.24 sec |
Our result for MariaDB ColumnStore: 2.02 sec.
Test 3
SQL statement: select dim_movies.title, avg(fact_ratings.rating) from dim_movies join fact_ratings on dim_movies.movieId = fact_ratings.”movieId” group by dim_movies.title DESC order by dim_movies.title limit 20;
Vendor | Executuon time |
---|---|
MonetDB | 6.53 sec |
HPE Vertica | 14.94 sec |
MariaDB ColumnStore | 5 min 22 sec |
Our result for MariaDB ColumnStore: 10.98 sec.
Test 4
SQL statement: select dim_movies.title, fact_ratings.rating, dim_links.imdbid from dim_movies left outer join fact_ratings on dim_movies.movieId = fact_ratings.”movieId” left outer join dim_links on dim_movies.movieId = dim_links.movieid where fact_ratings.rating > (select avg(fact_ratings.rating) from fact_ratings) order by fact_ratings.rating DESC limit 20;
Vendor | Executuon time |
---|---|
MonetDB | 8.47 sec |
HPE Vertica | 10.12 sec |
MariaDB ColumnStore | 6 min 4 sec |
Our result for MariaDB ColumnStore: 17.55 sec.
Conclusion
We experimented using very simple optimization method and common hardware – and still results are pretty good. If we were using the same hardware as described in the original comparison or applied parameter optimization – we tend to believe our results would be even better.
We are glad to be able to confirm MariaDB ColumnStore capabilities and will continue recommending it to our customers.