On Jul 17, 2014, at 10:24 AM, Scott Dellinger wrote:
[Jim Lawson and I were discussing yesterday afternoon who to speak with about the landscape project’s maintenance, so I’ll let this serve tentatively as confirmation of who belongs on this thread. 🙂 ]
The landscape code for image searches performs some very inefficient searches against the database in ways that would not be so apparent with fewer images, but start quickly making queries pile up as the image count increases and multiple search engines start accessing your web pages simultaneously. Despite there only being 70 some thousand rows in your image tables in the database, these types of database queries in large numbers have recently been seriously impacting performance of WebDB services for other users as well as your own. Short of temporarily banning your user account from access, yesterday our best option in the short term was (further) restricting the number of simultaneous connections permitted by the “glcp” WebDB user, to 15. This is why you encountered an error about exceeding max_user_connections.
As a matter of scalability, 70 thousand rows is not a lot. Hundreds of millions of rows would be. However, database servers are not designed to efficiently handle queries such as “list all images without restriction and toss out all but one in the middle”, which is the main culprit type of query occurring here. This is very, very different from a conceptual “show me image number 5,002”, which can be extremely fast.
Because these queries are hurting performance on WebDB for other users at this point, we cannot raise the concurrent connection limit again unless these problems are resolved in some way. Obviously, the current state is not optimal for you or your users. I am not sure how this software is being regularly maintained at present, but here are the non-mutually-exclusive options I can think of, going forward:• Move the landscape database onto a MySQL service on your existing dedicated web server. You would not have a connection limit this way and would not impact anyone else. This will not solve performance issues, which will continue getting worse over time.• Put a caching proxy server (probably Varnish) in front of your web server. For pages that are accessed frequently, this would query the database however many times one page requires, then not again for that page until the page expires from the cache. A basic Varnish setup for a web application that is not overly complicated can be easy to set up. This would not solve the database issues, but might put a big dent in their impact.• Rewrite the inefficient database queries to access data more efficiently. Selecting a single row at a time from a table should not involve querying half the table before picking out one row. There must either be some reasonable way to search by a specific identifier to retrieve one row in the first place, or at least have a periodic job that performs the regular searches and caches the results in a way that itself can be queried by an identifier.
Who is currently maintaining the landscape project code? SAA would be happy to offer advice, but cannot do more than that at present.
So while on vacation…
- I added a nicer error message
- I edited http://www.uvm.edu/landscape/search such that it no longer performs a “select all” before offering to “Browse all”
- I disabled custom database session handling and restored php file based session handling
- will these changes help? I don’t know.
- I went back to vacation mode. Things are swell on my deck in Starksboro