Re: Welcome back! [September 2014 reboot]

Fri Feb 26, 2016 9:56 pm

GSlob wrote:And today something similar had happened again, the site was temporarily unavailable. Were it up to me, I would bar the external search engines. It is not like we are getting a stream of new members from their google- or other searches.

No stream-punks?

Re: Welcome back! [September 2014 reboot]

Mon May 02, 2016 12:44 pm

After some research into the inner workings of the phpBB software (which powers DarwinCentral and a huge number of other internet forums), I found something unexpected.

As you may recall, we've been having problems with high CPU demand for a while now: the forum auto-offlines itself when CPU hits a certain maximum level. Since search bots seemed to account for a good part of the load, we turned most of them (other than Google) off in the Admin Control Panel.

Still, CPU usage stayed high for a while and got really bad a few days ago, until I did some digging and found what was really going on. It's counter-intuitive, and I wonder how many other phpBB admins aren't aware of this (and are doing the wrong thing).

Here's the bot page from the Admin Control Panel:
AdminBotPage.PNG

See that "Deactivate" option on each bot? Gee, you'd think that would disable that bot from visiting your site and using forum resources, right? What I found is that a) no it doesn't, and b) deactivating a bot entry actually puts *more* load on the forum. ](*,)

Nerdy explanation: Every request to the forum is tracked in a "session", and sessions are managed in the phpbb3_sessions database table. If you're logged in as a member, all of your activity over a long period (several hours) is handled by a single long-lived session record.

But forum requests by non-logged-in users generate a fresh session for every single web "hit" -- if they look at 20 threads even over just a minute or two, that generates 20 session records in the forum database (each of which is assigned to user_id "1", which is the "anonymous" user record). Here are the most recent records in the sessions table (I've truncated the session_ip_address column to protect privacy):
DC_Sessions.PNG

These session records (like all others) hang around the database for several hours until they expire of "old age". When there get to be too many of them, forum CPU goes up and response time goes down, because just about everything anyone does with the forum requires at least a search (and sometimes an insertion/cleanup) of the sessions table records. The more records that have to be waded through, the longer that operation takes, and if forum hits are coming in faster than sessions table operations can be completed, a traffic jam occurs and things just spiral out of control.
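For anyone who wants to see how quickly the table fills up, here is a rough Python sketch (not phpBB code). It assumes every anonymous hit inserts one row and that rows are dropped once they reach a fixed age; the function name and all the numbers are made up for illustration.

```python
from collections import deque

def simulate_session_table(hits_per_minute, lifetime_minutes, total_minutes):
    """Return the number of live session rows after each simulated minute.

    Toy model: every anonymous hit inserts one row, and a row is
    removed once it is lifetime_minutes old.
    """
    created = deque()  # creation minute of each live row, oldest first
    sizes = []
    for minute in range(total_minutes):
        # Expire rows that have reached the session lifetime.
        while created and minute - created[0] >= lifetime_minutes:
            created.popleft()
        # Each anonymous hit in this minute creates a fresh row.
        created.extend([minute] * hits_per_minute)
        sizes.append(len(created))
    return sizes

# Hypothetical load: 10 anonymous hits a minute, 4-hour session lifetime.
sizes = simulate_session_table(hits_per_minute=10, lifetime_minutes=240, total_minutes=600)
print(sizes[0], sizes[-1])
```

In this model the table levels off at roughly (hits per minute) x (lifetime in minutes), so the only ways to shrink it are fewer rows per visitor or a shorter lifetime.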

Here's a post of mine from last year describing the same problem, back when bad "forum cookie" settings were causing other issues along with a growth in the sessions table (because every page hit even by members looked like a new session): viewtopic.php?p=1198741#p1198741

This also explains why the server would go back to low CPU usage for a while after I rebooted it, which I often did when CPU usage was getting out of control. It worked, but I didn't understand exactly why until now -- when the forum software reboots, it clears the sessions table and starts fresh. However it wouldn't take long for the size to grow too large again (and CPU usage to rise), due to the following issue.

Long story short (yeah, I know, too late), what the "bots" page actually does is help the forum software recognize multiple hits by a particular bot, so that it can consolidate all of that bot's activity into a single re-usable session record (as if it were a pseudo-member of the forum) rather than generating a new session record for each of its many, many hits on the forum. It can do this because every hit on the forum carries a string identifying what kind of browser is making the request (see the "session_browser" column in the above screenshot). Normally a web server uses that string to decide how best to format the resulting page for what your browser can handle, but well-behaved bots use it to basically say "Hi I'm the Google bot [or whatever]". See for example the second row in the above screenshot: that's the "AhrefsBot" visiting, which also announces that it's compatible with a Mozilla browser.

The bots table here contains a list of search strings to match with a particular visiting bot (e.g. "AhrefsBot"). A problem arises when new bots get unleashed on the world, which aren't already "recognized" by the pre-loaded "bots table". Those don't get recognized as a bot, and each and every one of their forum hits generates a new sessions table record, and we're back to the problem of tens of thousands of records in the sessions table which slows down the forum until the traffic jam occurs again -- not an overload of bandwidth traffic, but an overload of forum database churning.
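Here is a minimal Python sketch of that kind of matching. It is not phpBB's actual code: the table contents, the function name, and the case-insensitive substring comparison are all assumptions for illustration.

```python
from typing import Dict, Optional

# Hypothetical bot table: display name -> search string expected
# somewhere in the browser string the visitor sends.
BOT_TABLE: Dict[str, str] = {
    "AhrefsBot": "AhrefsBot",
    "Google": "Googlebot",
}

def match_bot(user_agent: str, bot_table: Dict[str, str] = BOT_TABLE) -> Optional[str]:
    """Return the name of the first bot entry whose search string
    appears in user_agent, or None if the visitor is unrecognized."""
    ua = user_agent.lower()
    for name, needle in bot_table.items():
        if needle.lower() in ua:
            return name
    return None

print(match_bot("Mozilla/5.0 (compatible; AhrefsBot/5.1; +http://ahrefs.com/robot/)"))
print(match_bot("Mozilla/5.0 (compatible; SomeNewBot/1.0)"))
```

A visitor that comes back as None is treated like any other anonymous guest, which is exactly the case that floods the sessions table.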

Once I figured this all out I sifted through the sessions table to find all the unrecognized bot activity, and I created new "bot table" entries for them so that from now on they'll have their traffic folded into single sessions table records, which will keep the sessions table down to a manageable size. Check out the before-and-after in this graph of DarwinCentral CPU activity over the past two weeks; it's pretty obvious where I added the new "bot recognizers":
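The sifting itself boils down to counting anonymous session rows per browser string and looking at whatever floats to the top. A hedged Python sketch, assuming the rows have already been exported as dictionaries; the "session_user_id" key and the sample rows are assumptions for illustration, not real data:

```python
from collections import Counter

ANONYMOUS_USER_ID = 1  # the "anonymous" user record mentioned above

def top_anonymous_agents(session_rows, limit=10):
    """Count anonymous session rows per browser string, busiest first."""
    counts = Counter(
        row["session_browser"]
        for row in session_rows
        if row["session_user_id"] == ANONYMOUS_USER_ID
    )
    return counts.most_common(limit)

# Made-up sample rows standing in for an export of the sessions table.
rows = [
    {"session_user_id": 1, "session_browser": "Mozilla/5.0 (compatible; SomeNewBot/1.0)"},
    {"session_user_id": 1, "session_browser": "Mozilla/5.0 (compatible; SomeNewBot/1.0)"},
    {"session_user_id": 1, "session_browser": "Mozilla/5.0 (compatible; SomeNewBot/1.0)"},
    {"session_user_id": 1, "session_browser": "Mozilla/5.0 (Windows NT 6.1) Firefox/45.0"},
    {"session_user_id": 2, "session_browser": "Mozilla/5.0 (Windows NT 6.1) Firefox/45.0"},
]
for agent, count in top_anonymous_agents(rows):
    print(count, agent)
```

Any browser string with a suspiciously large count and a "bot"-looking name is a candidate for a new bot entry.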
DC_CPU2.PNG

I also added a bot entry for the "pagespeed_mod" feature, which isn't technically a bot. It's a server-side add-on that caches recently viewed forum images as people view them, under the theory that it's likely that other people will be viewing the same pages soon and the images can be served up from the in-memory copy rather than hitting the disk again. For some reason (I can think of several that would make sense) this add-on fetches the images by making its own "web request" via the forum itself rather than just reading them directly off the disk. That means it generates session records as well, one session per image fetch, which is the same as the "unrecognized bot" problem. And unlike bots, which don't visit all the time, the pagespeed_mod gets triggered (often multiple times) whenever someone views a page that has image(s). So I made a "bot" entry for it, which lowered the size of the sessions table by about another 2000 records. I'm surprised this isn't a standard thing for phpBB forums since they use that add-on a lot.

And now for the counter-intuitive thing. You'd think that using the "deactivate" option on a bot entry would block that bot, but it doesn't. All it does is stop the forum from recognizing the bot anymore, which means that when it does visit it generates thousands of session table records instead of just one, which means that deactivating a bot entry *increases* forum CPU load, not decreases it. :roll:
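To put numbers on it, here is a toy Python model of the behavior described above (again, not phpBB's code). Each bot entry is a hypothetical (search string, active) pair; a hit that matches an active entry reuses one session per bot, and every other hit gets its own row.

```python
def count_session_rows(hits, bot_entries):
    """Count the session rows created by a list of browser strings.

    hits: browser strings, one per request.
    bot_entries: (search_string, active) pairs; only active entries match.
    """
    recognized = set()   # one shared session row per recognized bot
    anonymous_rows = 0   # one row per unrecognized hit
    for user_agent in hits:
        ua = user_agent.lower()
        match = next(
            (needle for needle, active in bot_entries
             if active and needle.lower() in ua),
            None,
        )
        if match is None:
            anonymous_rows += 1
        else:
            recognized.add(match)
    return len(recognized) + anonymous_rows

# 1000 hypothetical hits from the same bot.
hits = ["Mozilla/5.0 (compatible; AhrefsBot/5.1)"] * 1000
print(count_session_rows(hits, [("AhrefsBot", True)]))   # entry active
print(count_session_rows(hits, [("AhrefsBot", False)]))  # entry deactivated
```

In this model the bot makes the same 1000 requests either way; the only thing the "deactivate" switch changes is how many session rows those requests leave behind.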

There is in theory a way to actually block a bot, but it's messy and doesn't always work, since it relies on the bot itself looking for and then honoring the "please leave us alone" request. This is done via the "robots.txt" file that can be placed on a website, which contains a list of bots you want to stop visiting your site. But setting it up is a pain, and it's not worth doing as long as the actual amount of bot traffic isn't abusive, which it doesn't seem to be. As long as we keep the size of the sessions table down, our CPU level should be just fine.
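For the curious, here is roughly what that looks like from the bot's side, using Python's standard urllib.robotparser module. The robots.txt contents below are a made-up example (not our actual file), and the `allowed` helper is just for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask AhrefsBot to stay away, allow everyone else.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """Return True if robots.txt permits this agent to fetch path."""
    return parser.can_fetch(agent, path)

print(allowed("AhrefsBot", "/viewtopic.php"))
print(allowed("Googlebot", "/viewtopic.php"))
```

Note that this check runs inside the bot, not on our server, which is why it only works for bots that bother to ask.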

Re: Welcome back! [September 2014 reboot]

Mon May 02, 2016 2:49 pm

magic.....

Re: Welcome back! [September 2014 reboot]

Mon May 02, 2016 2:51 pm

That's some fine detective work, Ichny!

Re: Welcome back! [September 2014 reboot]

Fri Jul 29, 2016 1:36 pm

Ichneumon wrote:Let me know if you spot anything that doesn't work like it did at the original site.


It looks similar, but I don't remember the old site well enough to make other comment.

Re: Welcome back! [September 2014 reboot]

Fri Jul 29, 2016 1:52 pm

Ned Ludlam wrote:
Ichneumon wrote:Let me know if you spot anything that doesn't work like it did at the original site.


It looks similar, but I don't remember the old site well enough to make other comment.


Well, then it's time for you to spend more time here. :D

Re: Welcome back! [September 2014 reboot]

Sat Jul 30, 2016 7:04 pm

Are we back? Were we gone?

Re: Welcome back! [September 2014 reboot]

Sat Jul 30, 2016 7:52 pm

Old name is dead, long live the new DNS name!

Re: Welcome back! [September 2014 reboot]

Sat Jul 30, 2016 7:56 pm

Aand we're back again, with no obvious loss of data. Sorry for the 22-hour outage.

I noticed the outage about ten minutes after it began. Normally rebooting the AWS (Amazon Web Services, where we're hosted) instance fixes any hangups, but this time it not only didn't help, AWS itself was flagging a "reachability" issue, which I've never seen before.

Long story short, apparently the hardware we were running on exploded or something. Here's part of the reply I got back when I opened a support ticket:
Thank you for contacting AWS support. I understand you are having an issue with the reachability of one of your Instances i-055cd8ee. I took a look at the Instance and the underlying hardware. The underlying hardware became unhealthy at Sat Jul 30 02:00:00 UTC 2016. We try to maintain a highly available infrastructure however as with any technology there will always be failures.

Moving forward you can stop/start the Instance and it will move to new host.

I was kind of surprised; I had always assumed that software instances were run on a virtual machine, which would be shuffled around on AWS's massive servers as necessary to balance loads and keep them running.

Apparently that's not the case (at least for the old-fashioned instance types like we're using), and an instance runs on a dedicated hunk of hardware. When that crashes, you have to fully close down the old instance and spawn off a new copy, which assigns it to new hardware (and a new IP address).

Once I did that everything was happy again, which was a relief because I had never done a complete restart before and wasn't entirely sure what might be lost in translation (the AWS documentation gives cryptic warnings about losing "ephemeral storage" and such). As far as I can tell nothing was lost aside from our old IP address (which isn't an issue since I was using AWS's IP redirection service, thus I could re-point the location instantly without even having to propagate DNS server tables).

If I had known the Stop/Start would work without a hitch, we could have been back up last night in about 2 hours. But I spent a lot of time triple-verifying things first: I spawned backup volumes of the DC hard drives and used them to build brand new clone instances of the site, to ensure I had spare working copies of everything before doing a Full Stop on the old instance, which had been operating continuously since September 2014. I wanted that safety net in case the Stop process ate something, especially since I've been sort of lax in making off-site backups to my own computer lately.

I'm pretty impressed with AWS's service and the promptness of their support people. When the instance hung while trying to do the Stop, I updated the ticket and the issue was manually resolved and replied to within six minutes.

Let me know if anything seems amiss, but so far it looks like everything carried over to the new machine just fine.

Re: Welcome back! [September 2014 reboot]

Sat Jul 30, 2016 9:51 pm

I don't know exactly how clouds work, but yeah, typically the VM is pretty tightly coupled to the hardware and if the hardware goes, so does the VM. Typically it's 3 VMs per box.

Re: Welcome back! [September 2014 reboot]

Sat Jul 30, 2016 10:52 pm

Desty wrote:I don't know exactly how clouds work, but yeah, typically the VM is pretty tightly coupled to the hardware and if the hardware goes, so does the VM. Typically it's 3 VMs per box.

Well, I've looked at clouds from both sides now, but I really don't know clouds at all.

Re: Welcome back! [September 2014 reboot]

Sun Jul 31, 2016 12:38 pm

NicknamedBob wrote:
Desty wrote:I don't know exactly how clouds work, but yeah, typically the VM is pretty tightly coupled to the hardware and if the hardware goes, so does the VM. Typically it's 3 VMs per box.

Well, I've looked at clouds from both sides now, but I really don't know clouds at all.

Me neither, but in my case, why should clouds be different?

Re: Welcome back! [September 2014 reboot]

Mon Aug 08, 2016 1:55 pm

furball4paws wrote:
Ned Ludlam wrote:
Ichneumon wrote:Let me know if you spot anything that doesn't work like it did at the original site.
It looks similar, but I don't remember the old site well enough to make other comment.

Well, then it's time for you to spend more time here. :D


I see that happening, of course, that may not be such a great deal for DC. ;)

Re: Welcome back! [September 2014 reboot]

Mon Aug 08, 2016 2:01 pm

Okay, so, the signatures don't work? I've tried it manually and on by default, bupkis.

Re: Welcome back! [September 2014 reboot]

Mon Aug 08, 2016 2:49 pm

Oh, okay, I found the magic radio button.

Re: Welcome back! [September 2014 reboot]

Mon Aug 08, 2016 3:31 pm

Ned Ludlam wrote:Oh, okay, I found the magic radio button.

:lol: