All add-ons currently require PHP 7.4 or greater.

On July 4th 2024 PHP 8.2 will be the new minimum requirement for all add-ons. Expect any add-on released after that date to require 8.2 or greater. Some releases may not immediately take advantage of 8.x specific features in PHP, which means you might, be able to continue using new releases in PHP 7.4, however, if you experience an error the first thing you should do is update to PHP 8.2 then create a support ticket if the error persists.

Please read about the changes to BoldMinded add-on licensing.

Ticket: When Cached Items are above 15,000 Add-On page and Cache Breaking Fail

Status	Resolved
Add-on / Version	Speedy 1.8.0
Severity
EE Version	6.3.5

Patmos Inc

Jan 18, 2023

So, I am not an expert on caching to say the least so this may or may not be a bug, but at the least, I would like to get some advice on how to optimize speedy.

So, I have a site that has a lot of urls because it has 100s of thousands of old news articles. So, in the span of 6 hours roughly up to 30,000 cached items can be created.
To complicate things further we have allowed bots to create cache since some of the pages they keep indexing are poorly designed using low_reorder which is pretty expensive, with a large exp_channel_data table and many relationships. So, turning this off for the time being is not really an option.

So to get to the main problem: when we have above 20,000 cache items, the add-on page for speedy times out around 15 seconds with “Mysql Server has gone away”. Can’t remember the exact query there will get more details on that. But the same issue also happens when trying to save entries that have been set up with cache_breaking. The error is as shown in the image. From the image, it is possible to see the cause which is counting the number of rows in the exp_channel_data table which has 77,000 rows.

Now, this seems to be caused by the function _breakEntryCache() in the service file CacheBreaker. On the 156, there is the following code:

$entries = $this->normalizeEntriesCollection($entries);

if (count($entries) === 0) {
    return;
}

// This is a potentially long running operation, disable the time limit
// so PHP won't timeout.
set_time_limit(0);

Now, the part that reads count($entries) seems to be causing this long running query. Seems like the set_timeout_limit never gets called because it is waiting for the count function. Any ideas why this only becomes a problem when I have a large cache? And any fixes possible for this issue? I am assuming their is a similar call on the settings page for the addon.

Just wondering what my options are. Do I need to increase my php timeout on the servers (not ideal because this could cause other issues)? should I just set my cache limit lower like 2-3 hours (would prefer to have a longer cache life than this)?

BoldMinded (Brian)

What cache driver are you using? File, Database? Redis? Memcache?

Patmos Inc

Sorry I am using Redis

BoldMinded (Brian)

Static pages or are you using the fragment caching?

Patmos Inc

I am using fragment caching

BoldMinded (Brian)

So this is mostly an issue in the control panel, front-end operates fine with that many cache items?

How big is the $entries array? Before that normalizeEntriesCollection call did you try adding var_dump(count($entries)); ?

Patmos Inc

Yes, it is only an issue in the control panel. I am not sure how large the count is, and it may be difficult for me to var_dump on a live site, but I will run that query in the database and determine how many rows it returns.

Patmos Inc

I have a staging website, so I can run the var_dump there. Give me a sec.

Patmos Inc

I added another image, but it just returns 1 for me. So that doesn’t seem to be the issue. Although it is some count statement somewhere causing this from the first image.

BoldMinded (Brian)

Do you have a dev/staging site I can login to and take a look at? Would need FTP and CP access.

#10

Patmos Inc

Asking our senior developer….

#11

Patmos Inc

Working on setting up sftp access for you.

#12

Patmos Inc

Comment has been marked private.

#13

Patmos Inc

I also added another image with the error when trying to access the settings page. There was 17,000 cache items in redis. Clearing all fixed the issue.

#14

BoldMinded (Brian)

Thanks I’ll try to take a look before the weekend. If I had to guess it’s not using the optimal redis command to count the number of keys.

#15

Patmos Inc

Thanks, I set my cache low for now until it gets figured out, but thanks for all your help. Have a good week.

#16

Patmos Inc

Okay, so I am pretty certain I found the offending script/s in Redis Driver:

// line 96
public function countItems()
{
  return count($this->getItemsFromPath('/'));
}

//which gets called by countItems() on line 208
public function getItemsFromPath($path)
{
   $path = $this->prefix . '/' . ltrim($path, '/');
   $path = rtrim($path, '/') . '/';

   $pattern = sprintf('~^%s~', preg_quote($path, '~'));
   $iterator = $this->getIterator($pattern);

   $items = [];
   $path_len = strlen($path);

   // Strip the $path prefix from each filename
   foreach ($iterator as $key => $data) {
       $items[] = substr($key, $path_len);
   }

   sort($items, SORT_STRING);

   return $items;
}

// which gets called by getItemsFromPath() on line 235
private function getIterator($pattern = null)
    {
        if ($pattern === null) {
            $pattern = sprintf('~^%s~', preg_quote($this->prefix, '~'));
        }

        $iterator = null;
        $store = [];

        try {
            while($keys = $this->client->scan($iterator, '*')) {
                foreach($keys as $key) {
                    if (preg_match($pattern, $key)) {
                        $store[$key] = $this->client->hGetAll($key);
                    }
                }
            }
        } catch (RedisException $exception) {
            // Fail silently...
        }

        return new \ArrayIterator($store);
    }

This series of functions grabs all items with ‘*’ then it has a nested for loop that checks path pattern, then it loops through and strips path, then it sorts, and then it counts. I am not great at calculating runtime analyses so I won’t attempt, but I would guess that very much is not a scalable function. This is for sure what is causing the issue. Here is a much better way to count redis using the php redis library: https://www.php.net/manual/en/function.substr.php.

<pre><code> public function countItems() { return $this->client->dbSize(); } [\code]

There are still a lot of instances where this spider web of nested loops is getting called. Such as deleteItem() or clear(). These should all be rewritten. I pushed my above change to my server to help out, but will watch for more bugs.

#17

Patmos Inc

// on line 85
 public function clear()
    {
        $keys = $this->getItemsFromPath('/');

        foreach($keys as $key) {
            $this->client->del($this->getKeyPath($key));
        }

        return true;
    }

// rewrite to 
public function clear()
    {
        $this->client->flushDb();

        return true;
    }

#18

BoldMinded (Brian)

Yeah I was reading a bit last night and found several mentions of not using scan or keys function in production especially on a large site, so yeah this needs some optimization. I looked at the dbsize function and while it might work, it also assumes that every key in redis is scoped to the site and to Speedy itself. A site and something other than Speedy could potentially be using Redis for something else, so dbsize might no reflect how many keys Speedy is actually managing.

#19

Patmos Inc

Yeah, I saw some of those comments as well. Hmm, the comment about dbSize totally makes sense. What about selecting a different redis database on install that isn’t the default, and then using that exclusively for speedy?. Using this command,

$redis->select(INT);

I guess the problem with that alone is that you cannot tell which database is used by other services. Theoretically, all 15 default databases could be used by different servers, and also the redis configuration could be restricted to a smaller value. So possiblle bugs there. But it is possible to see all the connected clients from this command.

$redis->info();
// which returns the followiing value
//# Keyspace
//db0:keys=407,expires=407,avg_ttl=14369899

#20

BoldMinded (Brian)

I added 500,000 keys to my local instance last night and the Speedy settings page loaded a little slower, but it eventually loaded… just took a few seconds. I’m surprised 15,000 is causing problems for you. Most of the info I’m finding about using keys() or scan() with large data sets are usually when taking about millions of keys.

#21

BoldMinded (Brian)

I tried logging into your CP with the user/pass provided but it said I was not authorized.

#22

BoldMinded (Brian)

In the RedisDriver.php file change the countItems function to this:

public function countItems()
    {
        $info = $this->client->info();
        preg_match('/^keys=(\d+),/', $info['db0'], $matches);

        return $matches[1] ?? 0;
    }

It might not be a perfect count, but it might be suitable for the time being.

#23

Patmos Inc

Alright, Looks like I did not make you part of a member role with access to the control panel. I changed that. so you

Now, it was a mysql server went away error, so it might have to do with the combination of entries in channel_titles with the exp_speedy_tags table. Did you test the 500,000 keys with entries in exp_speedy_tags? I had one instance where I cleared all the keys in redis, but speedy_tags table was still full, and I was still getting the same error.

I will try out that script. Thanks so much.

Thanks

#24

Patmos Inc

From the first sentence above *so you should have access now. I just tested that you did”

#25

Patmos Inc

Okay, I added 18,000 keys to my staging environment with the speedy tags, and it did not crash on the panel. Using the old countItems command. When I added 500,000 it broke with “504 timeout”. I wonder if this has to do with php timeout configuration.

#26

Patmos Inc

Both have the same settings for max_execution_time=30, memory_limit=500M, max_input_time=30

#27

Patmos Inc

Do you think that it has something to do with faulty installation?

#28

BoldMinded (Brian)

No, prob not a faulty installation. My CP page took about a minute to load with 500k redis keys, so a 30 second timeout like you have set makes sense.

Just loading the main Speedy settings page shouldn’t care about the number of tags in the table.

#29

Patmos Inc

Yeah, I was wondering why it would crash on my production environment as opposed to my staging environment. Also, why the mysql errors as shown in the pictures? flushing database definitely solved the issue in those cases, so it has something to do with the cache size. I’ll keep digging.

#30

Patmos Inc

Are there possible problems with speedy if I directly flush db? I have a pipeline that flushes redis database and does not go through the speedy addon.

#31

Patmos Inc

Okay found this error thrown today:

Caught Exception (500): SQLSTATE[HY000]: General error: 2006 MySQL server has gone away:

SELECT speedy_m_Tag_speedy_tags.id as speedy_m_Tag__id
FROM (`exp_speedy_tags` as speedy_m_Tag_speedy_tags)
WHERE  ( 
`speedy_m_Tag_speedy_tags`.`key` = 'speedy/default_site/global/more-stories-embed'
) 
LIMIT 18446744073709551615 in /var/www/joandearc.churchmilitant.com/panel/ee/legacy/database/drivers/mysqli/mysqli_connection.php:114
}

#32

BoldMinded (Brian)

> Are there possible problems with speedy if I directly flush db?

No, you should be able to clear Redis outside of EE/Speedy’s methods just fine. If you do you might have some rows in the tags table that don’t match up to a cached item, but if the cached item is just going to get regenerated again then it shouldn’t be a big deal.

What were you doing when you got that tags query error? I’ve never seen that before.

#33

Patmos Inc

Oh okay, good to know about clearing without updating the table. In regards to that last error that I posted. That was a user on the front end who I traced using datadog. They received a 500 error with that message after a long timeout. So, I had some more thoughts over the weekend.

I think that I may have started getting us lost in the weeds here; I am not entirely sure why my caching would cause timeouts with only 17,000 items in the cache, but because I tested it out on our staging environment with the new redis driver count method, and was not able to recreate a timeout error until I reached 400,000 items, I would conclude that there are other environment specific errors that may not have to do with the addon contributing to these problems. I noticed a problem with our environment: There are indexing bots hitting ee pagination links with infinite possibilities which were causing cache to explode at too fast a rate, so I created conditionals to stop this, and this has brought my cache down to a much more manageable limit; around 2000. This was setup from the earlier ce caching system and was not a problem before, so I was unable to identify before switching to speedy.

So, I do not want to drag this ticket out longer than necessary, as I think your work to make the redis driver will probably ameliorate most of these issues. If after that, I am still having more issues, it likely will be easier to identify, and I can submit a separate ticket. Thanks for all your work, and I appreciate the fast response times.

Thanks, Thomas at Patmos.

#34

Patmos Inc

So, another update on this issue that might be revealing.

We were having issues with saving an entry in the open status due to the size of the exp_speedy_tags table. Basically, the entry would timeout when trying to save. However, after looking at the exp_speedy_tags table had grown to 125,000 entries even thought my cache was below 2000 items. After deleting the entries in this table, saving entries was lightening fast. This would explain why when I added a large amount of cache items on my staging environment, it was working correctly.

This is specifically if a cache breaking rule is set on a channel with a large amount of entries in the exp_speedy_tags, and it does not have an effect on overall clearing of the cache nor actual users loading urls on the front-end. The issue here seems to be that if a site has many urls and a decent amount of traffic, the amount of tags will eventually become to large for cache breaking.

So, the exp_speedy_tags table obviously serves the purpose of keeping track of tags for clearing on the add-on overview panel and also for cache breaking rules, and possibly for identifying if a cache item already exists without contacting the redis server, but it doesn’t seem obvious that it is really is necessary in the end. Why not just add segments to the end of a cached item like so,

speedy/default_site/local/news/article/a-human-future/article-a-human-future/tags/news/news-article

Then all searches for tags could simply grab the items by the end segments with a regex check. Also, a redis command to check for that key item could be done very simply on front end page loads. Sometimes, this may mean that cache items are decently long, but I don’t see why that would be a performance issue. This would solve the issue that I am facing since the cache would never get large enough to cause issues.

Will do some more investigation to see if this would be possible, and whether there are other reasons for not doing this, but throwing it up here for the time being.

#35

Patmos Inc

Or another easy solution is to remove the items from exp_speedy_tags table when cache breaking or clearing cache in general.

#36

BoldMinded (Brian)

The buttons at the top mention how they handle tags. Are you finding that tags are still around even after clicking “Clear All Cache”? Also what are your cache breaking rules for the channels? If you define the rules it should also clear the tags relevant to that channel and entry.

Clear All Cache This will clear all cache items from enabled drivers, and related tags.

Clear Driver This will clear a specific driver’s cache, but leave tags.

Refresh Cache This will refresh all expired cache items (based on their ttl) from enabled drivers, but leave tags.

#37

Patmos Inc

The clear all cache button for me times out with 504 gateway error. Although it does clear all tags in the table and cache items. Generally speaking, the team does not clear cache from this panel, but relies on cache breaking rules. However, we do clear cache from our pipeline by a direct redis command which may cause inconsistencies between the cache items and the table.

You should have credentials to log into our staging environment, if you want to see the cache breaking rules.

#38

Patmos Inc

Also as a side note, I changed Redis Clear function from

public function clear()
    {
        $keys = $this->getItemsFromPath('/');

        foreach($keys as $key) {
            $this->client->del($this->getKeyPath($key));
        }

        return true;
    }

To the following:

public function clear()
    {
        $this->client->flushDb();

        return true;
    }

Besides that and the countItems function. There are no other hacks.

#39

BoldMinded (Brian)

Changing it to flushDb() obv deletes everything, and in this case Speedy is namespaced to the current site name, and it’ll also hit everything else in the Redis db that Speedy might not be controlling, so I don’t think I want to make that change.

>However, we do clear cache from our pipeline by a direct redis command which may cause inconsistencies between the cache items and the table.

Right, so it sounds like your pipeline might also need to handle the tags table, if that is how you want to end up clearing the cache.

If I’m understanding correctly, what I’m hearing is that Speedy is functioning fine and this might be more of an issue with how you’re clearing the cache with some custom pipeline. I have customers with large datasets and I haven’t heard of similar issues.

#40

Patmos Inc

Cool will be looking into clearing table as well. Thanks for the help.

I was not suggesting you change the code to flushDB() was just pointing out that on my environment that is how I have it set to avoid the getItemsFromPath function. If I am only using my redis server for speedy, this shouldn’t be a problem for us?

#41

BoldMinded (Brian)

Yeah, if you want to modify that call to use flushDB() go for it. Maybe I’ll consider adding a config variable override to use that call instead.

#42

Patmos Inc

Cool, I am good to close this one. I think you were right about the pipeline causing issues going to try to use the action call for breaking all cache instead of a redis flushall. Also, another issue seemed to be that our client and their team has been clearing the redis driver specifically and not the clear all button which explains why the exp_speedy_tags table kept getting so big. Going to restrict access to avoid this issue.

Anyways, thanks so much for your help. I really appreciate it, and have a good weekend.

Thanks Patmos