Tag Archives: tuning

how quickly does google pick up blog posts?

I got dragged into a usenet discussion. Just as a demonstration of how fast google can be if you have your affairs more or less in order:


On 2008-05-11, Teun <teun19900909@hotmail.com> wrote:

1) No, no, that is comparing apples and oranges, and wrong on top of that.
http://www.nutz.nl/ is an existing site, not a new one.

The posting is new.

2) The ‘In cache’ date is 5 May 2008 13:10:23 GMT and it is now 11 May. That means the spider hasn't been by for 6 days. It is pure coincidence that your last update was added after only 46 hours.

The spider only comes back when I publish a new posting; sitemap.xml.gz takes care of that. Since my last posting I've shouted 304 NOT MODIFIED at a google machine five times or so. When I post this, the spider will come by shortly and index the article. I'll demonstrate that right now, as an illustration.


Harrison’s Postulate: For every action, there is an equal and opposite criticism.
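The 304 NOT MODIFIED replies mentioned above are HTTP conditional GETs: the crawler sends an If-Modified-Since header with the date of the copy it already has, and the server answers 304 instead of resending the page when nothing has changed since. A minimal sketch of that decision in Python, assuming the server knows (as a timezone-aware datetime) when the page last changed; `conditional_status` is a hypothetical helper, not part of any blog software:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, otherwise 200.

    Assumes last_modified is timezone-aware.
    """
    if not if_modified_since:
        return 200  # no conditional header: send the full page
    try:
        client_copy = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if client_copy.tzinfo is None:
        # A '-0000' zone parses to a naive datetime; treat it as UTC.
        client_copy = client_copy.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= client_copy:
        return 304
    return 200


# Hypothetical example: the page last changed on 5 May 2008. A crawler holding
# that version gets a 304; one holding an older copy gets the full page.
changed = datetime(2008, 5, 5, 13, 10, 23, tzinfo=timezone.utc)
print(conditional_status(changed, format_datetime(changed, usegmt=True)))  # 304
older = datetime(2008, 5, 1, tzinfo=timezone.utc)
print(conditional_status(changed, format_datetime(older, usegmt=True)))  # 200
```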

When I click ‘publish’ now, a new sitemap.xml.gz is generated, google is notified, and usually a spider comes by just a few minutes later. Let's see whether it goes as smoothly this time.
(Edit: roughly an hour, then.)
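Generating a sitemap.xml.gz like the one mentioned above boils down to writing a small XML file in the sitemaps.org format and gzipping it. A minimal Python sketch using only the standard library; `build_sitemap_gz` and any URLs you feed it are hypothetical, and the step that notifies google is left out:

```python
import gzip
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_gz(entries: Iterable[Tuple[str, datetime]]) -> bytes:
    """Return a gzip-compressed sitemap for (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # W3C datetime format (e.g. 2008-05-11T12:00:00+00:00) for <lastmod>.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return gzip.compress(xml_bytes)
```

Writing the returned bytes to the web root as sitemap.xml.gz after each new post keeps the lastmod dates current, which is the signal this post relies on to bring the spider back.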

tuning mailservers: go for the low-hanging fruit first

Enough about spammers, let’s tackle tuning a mail-server. After installing munin on a couple of boxes I quickly found some low-hanging fruit to pick. My mail-servers use mysql for many things. A couple of my mysql-servers had their query-cache disabled, which seems to be the default for the linux-distro they run. If your pdns- and postfix-box is glued together with mysql you might have a problem on your hands. Many queries are repeated over and over. Not caching them means every repeat is executed again, so queries run longer, which leads to a larger number of parallel queries, which in turn means more memory use and more parallel tasks. Your box will eat memory, processes and file-handles. Load will increase, throughput will suffer, and within days you’re close to deciding that your cluster needs another box.
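The chain above (slower queries, so more of them in flight at once) is Little's law at work: the average number of queries in flight equals the arrival rate times the average time each query takes. A tiny Python illustration; the numbers (200 queries per second, 5 ms per query when cached versus 50 ms when every query waits for the disk) are made up for the example:

```python
def average_in_flight(arrival_rate_qps: float, mean_latency_s: float) -> float:
    """Little's law: mean number of queries in flight = arrival rate * mean latency."""
    return arrival_rate_qps * mean_latency_s


# Hypothetical numbers: the same 200 queries/s, but each query takes ten times
# as long once it has to wait for the disk.
fast = average_in_flight(200, 0.005)  # about 1 query in flight
slow = average_in_flight(200, 0.050)  # about 10 queries in flight
print(f"cached: {fast:.1f} in flight, uncached: {slow:.1f} in flight")
```

Ten times the latency at the same request rate means roughly ten times as many connections open at once, each holding its own process or thread, buffers and file-handles.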

Since it was the weekend, I decided to tune whatever I could, and the results are amazing, even without going deep into high-end tuning. Just picking some low-hanging fruit was sufficient. Before the weekend the mail-server in question was always swapping, had a load above 15, spent most of its CPU-time waiting for disks to catch up, and was slow in a most unpleasant way. Two hours later it’s mostly idle, I’ve got memory to spare and the box feels like a Xeon should: snappy.

The biggest improvement came from enabling various caches in mysql. Pdns makes loads of queries, and caching those is always good. Some customers get loads of mail, and caching the lookups for their mail also helps. After some tuning I got a query-cache hit rate of 90%, which roughly translates to a tenth of the disk accesses mysql needed before. Less waiting on disk access means fewer parallel queries, so I could lower the number of allowed parallel connections too, saving lots of memory, processes and file-handles.
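The hit rate can be computed from mysql's status counters: a query answered from the query cache increments Qcache_hits instead of Com_select, so the hit rate is Qcache_hits / (Qcache_hits + Com_select), and one minus that is the fraction of SELECTs that still have to be executed. A small Python sketch that parses tab-separated output such as `mysql -B -N -e "SHOW GLOBAL STATUS"` produces; the function names and sample counters are made up. Note that the query cache only exists up to MySQL 5.7; it was removed in MySQL 8.0.

```python
from typing import Dict, Mapping


def parse_status(text: str) -> Dict[str, int]:
    """Parse 'name<TAB>value' lines, keeping only numeric values."""
    status: Dict[str, int] = {}
    for line in text.splitlines():
        name, _, value = line.partition("\t")
        value = value.strip()
        if value.isdigit():
            status[name.strip()] = int(value)
    return status


def query_cache_hit_rate(status: Mapping[str, int]) -> float:
    """Fraction of SELECTs answered from the query cache.

    Com_select only counts SELECTs that were actually executed (cache misses),
    so the total number of SELECTs is Qcache_hits + Com_select.
    """
    hits = status.get("Qcache_hits", 0)
    executed = status.get("Com_select", 0)
    total = hits + executed
    return hits / total if total else 0.0


# Made-up sample counters in the tab-separated batch format.
sample = "Com_select\t1000\nQcache_hits\t9000\nUptime\t86400\n"
rate = query_cache_hit_rate(parse_status(sample))
print(f"hit rate {rate:.0%}, SELECTs still executed {1 - rate:.0%}")
```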

Next up: check your syslog-configuration. Mine was full of duplicates, by which I mean things logged in two different files. If you haven’t touched syslog.conf and just installed packages, this might be the case for you too. I found out my predecessors did even worse: every POP3 and IMAP log-in was written to courier.log, mail.log, messages.log and secure.log, which counts as quadruple logging. Realise that *every* log-in incurs four writes to log files instead of one, and you’ll see why checking makes sense.
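A quick way to spot this kind of duplicate logging is to take a chunk of each log file and see which lines show up in more than one of them. A rough Python sketch, assuming syslog writes the same message verbatim to every file it is routed to; `duplication_report` and the sample lines are made up for illustration:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


def duplication_report(logs: Dict[str, Iterable[str]]) -> Counter:
    """Count distinct log lines per combination of files they were written to.

    Only lines that appear in two or more files are counted.
    """
    files_per_line: Dict[str, Set[str]] = defaultdict(set)
    for filename, lines in logs.items():
        for line in lines:
            files_per_line[line.rstrip("\n")].add(filename)
    return Counter(
        frozenset(files) for files in files_per_line.values() if len(files) > 1
    )


# Made-up excerpts; in practice you would read the tail of each file.
login = "May 11 10:00:01 mx1 imapd: LOGIN, user=alice"
logs = {
    "courier.log": [login],
    "mail.log": [login, "May 11 10:00:02 mx1 postfix/smtpd[123]: connect"],
    "messages.log": [login],
    "secure.log": [login],
}
for files, count in duplication_report(logs).most_common():
    print(count, sorted(files))
```

In practice you'd feed it the last few thousand lines of each file under /var/log; a combination of all four files with a high count is exactly the quadruple logging described above.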

For my tuning I found two more or less overlapping tools to quickly analyze your mysql. The first is mysqltuner, a perl script that analyzes a running mysql instance and gives some advice. The second is tuning-primer.sh, which is slightly more verbose: it explains why you should set certain values and offers links to explanations. Both offer a degree of insight which might be hard to find otherwise. For instance, many people over-allocate resources to their mysql-server. You might be tempted to allow 500 parallel connections, but you probably didn’t realise that each connection requires its own memory, so you can end up allocating more than the machine’s physical memory. That will bite you when the machine is heavily loaded and part of your running processes end up in swap, slowing it to the point of unusability.
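To see how 500 connections can add up to more than the physical memory, here is a rough worst-case estimate in Python: server-wide buffers plus per-connection buffers times max_connections. This is a simplified sketch of the kind of estimate these tools report, not their actual code; the lists of variables are an assumption and incomplete (temporary tables, for example, can add more), and all values are made up:

```python
from typing import Mapping

# Buffers allocated once for the whole server (assumed list, not exhaustive).
GLOBAL_BUFFERS = ("key_buffer_size", "innodb_buffer_pool_size",
                  "innodb_log_buffer_size", "query_cache_size")
# Buffers each connection may allocate (assumed list, not exhaustive).
PER_CONNECTION_BUFFERS = ("sort_buffer_size", "join_buffer_size",
                          "read_buffer_size", "read_rnd_buffer_size", "thread_stack")

MiB = 1024 * 1024


def worst_case_memory(variables: Mapping[str, int]) -> int:
    """Rough upper bound in bytes: global buffers + per-connection buffers * max_connections."""
    global_bytes = sum(variables.get(name, 0) for name in GLOBAL_BUFFERS)
    per_connection = sum(variables.get(name, 0) for name in PER_CONNECTION_BUFFERS)
    return global_bytes + per_connection * variables.get("max_connections", 0)


# Hypothetical settings for a box with 2 GiB of RAM.
settings = {
    "key_buffer_size": 256 * MiB,
    "innodb_buffer_pool_size": 1024 * MiB,
    "innodb_log_buffer_size": 8 * MiB,
    "query_cache_size": 64 * MiB,
    "sort_buffer_size": 2 * MiB,
    "join_buffer_size": 1 * MiB,
    "read_buffer_size": 512 * 1024,
    "read_rnd_buffer_size": 256 * 1024,
    "thread_stack": 256 * 1024,
    "max_connections": 500,
}
physical_ram = 2048 * MiB
need = worst_case_memory(settings)
print(f"worst case {need // MiB} MiB vs {physical_ram // MiB} MiB of RAM")
```

With these made-up values each connection can claim 4 MiB, so 500 connections push the worst case to 3352 MiB on a 2048 MiB box; dropping max_connections to 100 brings it down to 1752 MiB.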

The effects of adding query-cache to your mysql