We’re an enterprisey shop. Our department has no public-facing services. Our shop’s customers are engineer employees at our company, and you could basically consider our department a large lab operations group at a deep-but-slow think tank. This will be a pretty moronic post to anyone involved in providing performant web services for a salary, but I’m always readily willing to look moronic to tell a story.
As an aside, in an unrelated ticket, a customer mentioned slow page load times for one of the pages served by our GForge Advanced Server instance. He indicated that this particular page always takes nearly 20 seconds to load and offered that it might be tied to the fact that he is a member of roughly 20 projects hosted via that GForge server.
This aside from the customer was a remarkable coincidence, as I had created an internal task ticket 2-3 weeks ago for us to implement metrics gathering for this service because “we should”. We already were performing the most basic of up/down monitoring for the host and service via Nagios. Now we get to have a problematic baseline of metrics and watch things improve from here.
After tweaking our Apache logging to log request service time (%D) via mod_log_config, we noticed some (too many) problematic pages for certain users and projects. One of those pages was, of course, the page the customer had reported.
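For anyone who hasn't used it, a logging tweak along these lines would do the job. This is a sketch rather than our exact configuration: the format nickname `timed` and the log path are made up for illustration. The one firm detail is that mod_log_config's `%D` records the time taken to serve the request, in microseconds.

```apache
# Assumed nickname "timed" and log path; adjust to your own layout.
# Common Log Format plus %D (time taken to serve the request, in microseconds).
LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
CustomLog "logs/access_log" timed
```

With the service time as the last field on each line, sorting the log by that field surfaces the slowest requests quickly.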
So far, we’ve instrumented metric gathering for each block of PHP (…) code in the most commonly accessed problematic page and tracked down the specific section where the slowness happens. The metric gathering is simplistic: for each major block of execution in the PHP file, store a start time, store an end time, and calculate total seconds to execute that block. Finally, syslog() the accumulated metrics as one line.
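Here is a minimal sketch of that approach. It is not the code we actually dropped into GForge: the helper functions, block names, page path, and syslog ident are all hypothetical, and the `usleep()` calls merely stand in for real page logic.

```php
<?php
// Hypothetical helpers for illustration; none of these names come from GForge.

// Store how long a named block took, in seconds.
function record_block(array &$metrics, $name, $start, $end)
{
    $metrics[$name] = $end - $start;
}

// Render every timing as one "key=value" line suitable for a single log entry.
function format_metrics($page, array $metrics)
{
    $parts = array('page=' . $page);
    foreach ($metrics as $name => $seconds) {
        $parts[] = sprintf('%s=%.3f', $name, $seconds);
    }
    return implode(' ', $parts);
}

$metrics = array();

// First major block: store a start time, run the block, store an end time.
$start = microtime(true);
usleep(1000); // stand-in for the first block of page logic
record_block($metrics, 'load_projects', $start, microtime(true));

// Second major block, timed the same way.
$start = microtime(true);
usleep(1000); // stand-in for the second block of page logic
record_block($metrics, 'render', $start, microtime(true));

// Finally, emit the accumulated metrics as one syslog line per request.
openlog('page-timing', LOG_PID, LOG_USER); // assumed ident and facility
syslog(LOG_INFO, format_metrics('/example/page.php', $metrics));
closelog();
```

Keeping everything on one line per request means a single grep pulls up the full timing breakdown for any slow page load.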
Anyway, that’s not the point of this post. The point is: Never stand up a service that works overall and assume your users will complain of slowness. Turns out 10-20 users have been quietly suffering through seriously long page load times for years now without saying a word to us. According to other admins in other departments (who responded to our nice-but-essentially “WTF people?” message to all customers of the GForge service), they’ve experienced the same phenomenon.