Sunday, June 18, 2006

Why SQL Performance Tuning is 'teh suck'.

At work of late I've been doing performance tuning in SQL. In postgresql to be precise. A friend of mine said that its great and you learn a lot. I'd say the second part is true, but the first is wrong. SQL Performance tuning, especially in postgresql is truly an abysmal process!

Part of the problem is that you have to work with a poor mans toolkit. In postgresql, your best friend for tuning your queries is explain. And thats it.
There is a visual version, but for the really complex queries (which are usually the ones that NEED the tuning in the first place), this visual version will choke and die as it requires just too much power to generate an visual display of these often frightening queries.

Explain basically tells you what the database 'planner' is probably going to do to answer your query. I.e, what tables it will hit, what merging, connecting, sorting etc it will do to answer your demands. This planner is quite intelligent in most cases, except with complex databases like the one I'm working on. This is where you have to dabble in the black arts. Eye of newt, and key of foreign table style.

Things that help the planner

One thing that really helps the planner is making your foreign keys explict in joins. So sometimes if a table has a foreign key back and forth, just mention them both and it can often improve things dramatically.

Beware of joining views which have unions defined in them. Unions seem to create temporary tables which means bye bye indexing. Which leads me to the next point. When it comes to searching, indexes are king. It can vastly improve the speed of your joins if you use indexes, so never sacrifice (or forget indexes). Sequential scans are very costly, but index scans can be much faster. But do be aware, that when the table is too small, indexes won't be used (seq scans are faster), and when the table is too large, sometimes it won't use indexes either!

Avoid using tables which you know are going to be huge, but which you can avoid. Postgresql is particularly relevant here. Since it allows using inheritance, your smaller specialised 'sub tables' could be more useful for the join than the big table since it can be scanned faster. It also be fine to scan each of those sub tables separately and union them at the end before joining (because you'd have got smaller filtered sets to work with).

Relative sizes also matter. Say if you are joining Table A to Table B to Table C, and you know A and B are generally smaller than C. Its a lot faster to join A and B first, and then join the results of this to C. Rather than joining all at the same time.

I'm working in Postgresql 7.3 and granted this is getting a bit old. 8.0+ has apparently made a lot of improvements and hence half the work I've done is probably no longer necessary. Which begs the question, why aren't we using pgsql 8.0. Well dear reader, thats a question for another day :)

No comments: