Friday 8 May 2015

How is Tableau so fast when working with databases?

Tableau compiles the elements of your visual canvas into a SQL or MDX query for the remote database to process. Since a database typically runs on more powerful hardware than the laptops and workstations analysts use, you should generally expect it to process queries much faster than in-memory BI applications that are limited by end-user hardware. Tableau's ability to push computation (queries) close to the data is increasingly important for large data sets, which may reside on a fast cluster and may be too large to bring into memory.
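To make "compile the canvas into a query" concrete, here is a minimal Python sketch. The function `compile_to_sql`, the `orders` table and its columns are hypothetical, and an in-memory SQLite database stands in for the remote server. Tableau's real query generator is far more involved. The point is only that the aggregation runs inside the database: 30,000 rows stay there, and three rows come back.

```python
import sqlite3

ALLOWED_AGGREGATES = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def compile_to_sql(table, dimension, measure, agg="SUM"):
    """Hypothetical compiler: one dimension on a shelf plus one aggregated measure.

    Identifiers are assumed to be trusted table/column names.
    """
    if agg not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {agg}")
    return (
        f'SELECT "{dimension}", {agg}("{measure}") AS value '
        f'FROM "{table}" GROUP BY "{dimension}" ORDER BY "{dimension}"'
    )


def build_demo_db():
    """In-memory SQLite database standing in for the remote server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, sales REAL)")
    rows = [(region, float(i))
            for i in range(10_000)
            for region in ("East", "North", "West")]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    sql = compile_to_sql("orders", "region", "sales")
    # The GROUP BY runs inside the database; only one row per region comes back.
    result = conn.execute(sql).fetchall()
    print(sql)
    print(result)  # 3 rows, not 30,000
```

The client only ever sees the aggregated rows. The heavy lifting (scanning and grouping) happens on whatever hardware the database runs on.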
Another factor in performance is data transfer, or in Tableau's case, result-set transfer. Since Tableau visualizations are designed for human consumption, they are tailored to the capabilities and limits of human perception. Visualizations focus on aggregation and filtering to identify trends and outliers, so a query's result set is generally small relative to the size of the underlying data. Small result sets need little network bandwidth, so Tableau can fetch and render them very quickly. And, as Ross mentioned, Tableau caches query results for fast reuse.
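A query-result cache can be sketched in a few lines. This is a minimal illustration, not Tableau's cache. `CachedConnection` is a hypothetical name, the cache key is assumed to be simply the SQL text plus its parameters, and invalidation is a manual `invalidate()` call. Because the result sets are small aggregates, holding them in a dictionary costs little memory.

```python
import sqlite3


class CachedConnection:
    """Hypothetical wrapper that reuses result sets for identical queries."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        # Aggregated result sets are small, so keeping them around is cheap.
        # Stored as a tuple so callers cannot mutate the cached copy.
        rows = tuple(self.conn.execute(sql, params).fetchall())
        self._cache[key] = rows
        return rows

    def invalidate(self):
        """Drop all cached results, e.g. after the underlying data changes."""
        self._cache.clear()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, sales REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("East", 10.0), ("West", 5.0), ("East", 2.5)])
    db = CachedConnection(conn)
    sql = "SELECT region, SUM(sales) FROM orders GROUP BY region ORDER BY region"
    first = db.query(sql)   # goes to the database
    second = db.query(sql)  # served from the cache
    print(first, db.hits, db.misses)
```

Keying on the exact query text is the simplest possible choice. A real cache would also need to decide when data has gone stale.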
The last factor, as Eriglen mentioned, is Tableau's ability to use in-memory acceleration when needed (for example, when working with very slow databases or text files). Tableau's Data Engine uses memory-mapped I/O, so while it takes advantage of in-memory acceleration, it can still work with large data sets that cannot fit in memory. The Data Engine works only with the subsets of on-disk data that a given query needs, and maps those subsets into memory as needed.
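Memory-mapped I/O is easy to try with Python's standard `mmap` module. The sketch below assumes a hypothetical on-disk layout (one column stored as a flat array of native 64-bit integers) that is not the Data Engine's actual file format. The helpers `write_column` and `sum_range` are illustrative names. The sketch only shows that a query can read the slice of rows it needs, with the operating system paging in just that part of the file.

```python
import array
import mmap
import os
import tempfile


def write_column(path, values):
    """Hypothetical column file: a flat array of native 64-bit integers."""
    with open(path, "wb") as f:
        array.array("q", values).tofile(f)


def sum_range(path, start, stop):
    """Sum rows [start, stop) by mapping the file and touching only that slice."""
    itemsize = array.array("q").itemsize
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Slicing the map copies only this byte range; the pages backing the
        # rest of the file are never read for this query.
        chunk = array.array("q")
        chunk.frombytes(mm[start * itemsize: stop * itemsize])
        return sum(chunk)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sales.col")
        write_column(path, range(1_000_000))
        print(sum_range(path, 10, 20))  # 10 + 11 + ... + 19 = 145
```

The file can be much larger than available RAM. Only the pages a query touches need to be resident.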
OR
Tableau does use some in-memory storage to increase speed (when data is extracted), but much of its speed actually comes from not having to hold all of the data in memory.
This is because Tableau keeps only the data relevant to your queries in memory, whereas other solutions load the entire data set into memory, which can take longer.
OR
Tableau's main feature, the "Data Engine", is a really cool feature. If you work with a large amount of data, it takes some time to import it, create indexes and sort it, but after that everything speeds up. The Tableau Data Engine is not really an in-memory technology: once imported, the data is stored on disk, and RAM is hardly used. This design delivers the desired performance.
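The trade-off described here, paying for sorting and indexing once at import so that later queries are fast, can be illustrated with a sorted column and binary search. `SortedColumn` is a hypothetical name, and this is not how the Data Engine stores data. It just shows a range filter answered with two O(log n) lookups instead of a full scan.

```python
import bisect
import random


class SortedColumn:
    """Hypothetical extract column: sorted once at import, filtered many times."""

    def __init__(self, values):
        # One-time import cost: an O(n log n) sort.
        self.values = sorted(values)

    def count_between(self, lo, hi):
        """Count values with lo <= v <= hi using two binary searches (O(log n))."""
        left = bisect.bisect_left(self.values, lo)
        right = bisect.bisect_right(self.values, hi)
        return right - left


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.randint(0, 1_000) for _ in range(100_000)]
    col = SortedColumn(data)
    fast = col.count_between(100, 200)
    # A full scan of the unsorted data gives the same answer but costs O(n) per query.
    slow = sum(1 for v in data if 100 <= v <= 200)
    print(fast, slow, fast == slow)
```

The sort is paid once. Every subsequent filter on that column avoids touching most of the data.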
