
Ticket: When Cached Items Are Above 15,000, the Add-On Page and Cache Breaking Fail

Status Open
Add-on / Version Speedy 1.8.0
Severity
EE Version 6.3.5

Patmos Inc

Jan 18, 2023

I am not an expert on caching, to say the least, so this may or may not be a bug, but at the very least I would like some advice on how to optimize Speedy.

I have a site with a huge number of URLs because it hosts hundreds of thousands of old news articles, so in the span of roughly 6 hours up to 30,000 cached items can be created.
To complicate things further, we have allowed bots to create cache, since some of the pages they keep indexing are poorly designed: they use Low Reorder, which is fairly expensive given our large exp_channel_data table and many relationships. Turning that off is not really an option for the time being.

To get to the main problem: when we have more than 20,000 cached items, the Speedy add-on page times out after roughly 15 seconds with “MySQL server has gone away”. I can’t remember the exact query; I will get more details on that. The same issue also happens when trying to save entries that have cache breaking set up. The error is shown in the attached image, and from it you can see the cause: a query counting rows in the exp_channel_data table, which has 77,000 rows.

This seems to be caused by the _breakEntryCache() function in the CacheBreaker service file. On line 156 there is the following code:

$entries = $this->normalizeEntriesCollection($entries);

if (count($entries) === 0) {
    return;
}

// This is a potentially long running operation, disable the time limit
// so PHP won't timeout.
set_time_limit(0);

The count($entries) part seems to be what triggers this long-running operation, and it looks like set_time_limit() never gets called because PHP is still waiting on that count. Any idea why this only becomes a problem when the cache is large? And is there a possible fix? I am assuming there is a similar call on the add-on’s settings page.
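
For illustration, the reordering I have in mind would look something like this (just a sketch on my end, not the shipped code):

// Sketch only: raise the limit before any potentially slow work, so the
// normalize/count step cannot run into max_execution_time first.
set_time_limit(0);

$entries = $this->normalizeEntriesCollection($entries);

if (count($entries) === 0) {
    return;
}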

Just wondering what my options are. Do I need to increase the PHP timeout on the servers (not ideal, because that could cause other issues)? Should I just set my cache life lower, to 2-3 hours (I would prefer a longer cache life than that)?

#1

BoldMinded (Brian)

Jan 18, 2023

What cache driver are you using? File, Database? Redis? Memcache?

#2

Patmos Inc

Jan 18, 2023

Sorry I am using Redis

#3

BoldMinded (Brian)

Jan 18, 2023

Static pages or are you using the fragment caching?

#4

Patmos Inc

Jan 18, 2023

I am using fragment caching

#5

BoldMinded (Brian)

Jan 18, 2023

So this is mostly an issue in the control panel? The front end operates fine with that many cache items?

How big is the $entries array? Did you try adding var_dump(count($entries)); before that normalizeEntriesCollection call?
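
Something like this, right at the top of _breakEntryCache() (just a throwaway debug line, not a permanent change):

// Temporary debugging only: see how many entries come in before normalization.
var_dump(count($entries));

$entries = $this->normalizeEntriesCollection($entries);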

#6

Patmos Inc

Jan 18, 2023

Yes, it is only an issue in the control panel. I am not sure how large the count is, and it may be difficult for me to var_dump on a live site, but I will run that query in the database and determine how many rows it returns.

#7

Patmos Inc

Jan 18, 2023

I have a staging website, so I can run the var_dump there. Give me a sec.

#8

Patmos Inc

Jan 18, 2023

I added another image, but the var_dump just returns 1 for me, so that doesn’t seem to be the issue. Judging from the first image, though, it is still some count statement somewhere that is causing this.

#9

BoldMinded (Brian)

Jan 18, 2023

Do you have a dev/staging site I can login to and take a look at? Would need FTP and CP access.

#10

Patmos Inc

Jan 18, 2023

Asking our senior developer….

#11

Patmos Inc

Jan 18, 2023

Working on setting up sftp access for you.

#12

Patmos Inc

Jan 18, 2023

Comment has been marked private.

#13

Patmos Inc

Jan 18, 2023

I also added another image with the error that appears when trying to access the settings page. There were 17,000 cache items in Redis; clearing them all fixed the issue.

#14

BoldMinded (Brian)

Jan 18, 2023

Thanks, I’ll try to take a look before the weekend. If I had to guess, it’s not using the optimal Redis command to count the number of keys.

#15

Patmos Inc

Jan 18, 2023

Thanks, I have set my cache life low for now until this gets figured out. I appreciate all your help; have a good week.

#16

Patmos Inc

Jan 19, 2023

Okay, so I am pretty certain I found the offending functions in the Redis driver:

// line 96
public function countItems()
{
    return count($this->getItemsFromPath('/'));
}

// which gets called by countItems(), on line 208
public function getItemsFromPath($path)
{
    $path = $this->prefix . '/' . ltrim($path, '/');
    $path = rtrim($path, '/') . '/';

    $pattern = sprintf('~^%s~', preg_quote($path, '~'));
    $iterator = $this->getIterator($pattern);

    $items = [];
    $path_len = strlen($path);

    // Strip the $path prefix from each filename
    foreach ($iterator as $key => $data) {
        $items[] = substr($key, $path_len);
    }

    sort($items, SORT_STRING);

    return $items;
}

// which gets called by getItemsFromPath(), on line 235
private function getIterator($pattern = null)
{
    if ($pattern === null) {
        $pattern = sprintf('~^%s~', preg_quote($this->prefix, '~'));
    }

    $iterator = null;
    $store = [];

    try {
        while ($keys = $this->client->scan($iterator, '*')) {
            foreach ($keys as $key) {
                if (preg_match($pattern, $key)) {
                    $store[$key] = $this->client->hGetAll($key);
                }
            }
        }
    } catch (RedisException $exception) {
        // Fail silently…
    }

    return new \ArrayIterator($store);
}

This chain of functions grabs every item with ‘*’, then a nested loop checks each key against the path pattern and calls hGetAll() on every match, then it loops again to strip the path prefix, then it sorts, and only then does it count. I am not great at runtime analysis so I won’t attempt it, but I would guess that is not a scalable way to count items, and it is almost certainly what is causing the issue. A much cheaper way to count keys with the phpredis library is dbSize():

public function countItems()
{
    return $this->client->dbSize();
}

There are still a lot of places where this spider web of nested loops gets called, such as deleteItem() and clear(). These should all be rewritten. I pushed the change above to my server to help out, but I will watch for more bugs.
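
For example, here is a hypothetical deleteItem() that goes straight to the key instead of scanning everything first (just a sketch; it assumes deleteItem() receives the relative item key, the way the existing clear() loop suggests):

public function deleteItem($key)
{
    // Hypothetical sketch, not the shipped driver code: delete the one key
    // directly instead of scanning and hydrating every cached item first.
    $this->client->del($this->getKeyPath($key));

    return true;
}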

#17

Patmos Inc

Jan 19, 2023

// on line 85
public function clear()
{
    $keys = $this->getItemsFromPath('/');

    foreach ($keys as $key) {
        $this->client->del($this->getKeyPath($key));
    }

    return true;
}

// rewrite to
public function clear()
{
    $this->client->flushDb();

    return true;
}

#18

BoldMinded (Brian)

Jan 19, 2023

Yeah, I was reading a bit last night and found several mentions of not using the scan or keys functions in production, especially on a large site, so this definitely needs some optimization. I looked at the dbSize function, and while it might work, it also assumes that every key in Redis belongs to the site and to Speedy itself. The site, or something other than Speedy, could be using Redis for other things too, so dbSize might not reflect how many keys Speedy is actually managing.
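
If it comes to that, a prefix-scoped count is still possible without pulling every value back. Roughly something like this against the phpredis client, reusing the driver’s existing $this->prefix (a rough sketch, not tested code):

public function countItems()
{
    // Rough sketch: count only keys under Speedy's prefix, and never call hGetAll().
    $count = 0;
    $iterator = null;

    // SCAN with a MATCH pattern so unrelated keys in the same database are skipped.
    while ($keys = $this->client->scan($iterator, $this->prefix . '/*', 1000)) {
        $count += count($keys);
    }

    return $count;
}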

#19

Patmos Inc

Jan 19, 2023

Yeah, I saw some of those comments as well.
Hmm, the point about dbSize makes total sense. What about selecting a non-default Redis database on install and using it exclusively for Speedy? Using this command:

$redis->select(INT);

I guess the problem with that alone is that you cannot tell which databases are already in use by other services. Theoretically, all 16 default databases could be in use, and the Redis configuration could also be restricted to a smaller number, so there are possible bugs there. But it is possible to see the keyspace usage per database with this command:

$redis->info();
// which returns, among other things, the following:
// # Keyspace
// db0:keys=407,expires=407,avg_ttl=14369899
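
Something like this is what I’m picturing, just as a rough sketch (the database index is an arbitrary placeholder, not a Speedy setting):

// Hypothetical: give Speedy its own Redis database so dbSize()/flushDb()
// only ever touch its keys.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379); // placeholder connection details
$redis->select(5);                  // arbitrary non-default database index
echo $redis->dbSize();              // now counts only the keys in that database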


#20

BoldMinded (Brian)

Jan 20, 2023

I added 500,000 keys to my local instance last night and the Speedy settings page loaded a little slower, but it did eventually load; it just took a few seconds. I’m surprised 15,000 is causing problems for you. Most of the info I’m finding about keys() or scan() struggling with large data sets is usually talking about millions of keys.

#21

BoldMinded (Brian)

Jan 20, 2023

I tried logging into your CP with the user/pass provided but it said I was not authorized.

#22

BoldMinded (Brian)

Jan 20, 2023

In the RedisDriver.php file change the countItems function to this:

public function countItems()
    {
        $info = $this->client->info();
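        // Note: this reads the db0 keyspace line, so it assumes Speedy's keys live in the default database.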
        preg_match('/^keys=(\d+),/', $info['db0'], $matches);

        return $matches[1] ?? 0;
    }

It might not be a perfect count, but it might be suitable for the time being.

#23

Patmos Inc

Jan 20, 2023

Alright, it looks like I did not put you in a member role with access to the control panel. I changed that, so you should have access now.

Now, it was a “MySQL server has gone away” error, so it might have to do with the combination of entries in channel_titles and the exp_speedy_tags table.
Did you test the 500,000 keys with entries in exp_speedy_tags?
I had one instance where I cleared all the keys in Redis but the speedy_tags table was still full, and I was still getting the same error.

I will try out that script. Thanks so much.

#24

Patmos Inc

Jan 20, 2023

Finishing the first sentence above: you should have access now. I just tested that you do.

#25

Patmos Inc

Jan 20, 2023

Okay, I added 18,000 keys to my staging environment along with the Speedy tags, using the old countItems command, and the panel did not crash. When I added 500,000 keys it broke with a 504 timeout. I wonder if this has to do with the PHP timeout configuration.

#26

Patmos Inc

Jan 20, 2023

Both environments have the same settings: max_execution_time=30, memory_limit=500M, max_input_time=30.

#27

Patmos Inc

Jan 20, 2023

Do you think it has something to do with a faulty installation?

#28

BoldMinded (Brian)

Jan 20, 2023

No, probably not a faulty installation. My CP page took about a minute to load with 500k Redis keys, so with your 30-second limit a timeout makes sense.

Just loading the main Speedy settings page shouldn’t care about the number of tags in the table.

#29

Patmos Inc

Jan 20, 2023

Yeah, I was wondering why it would crash on my production environment but not on staging. Also, why the MySQL errors shown in the pictures? Flushing the database definitely solved the issue in those cases, so it has something to do with the cache size. I’ll keep digging.

#30

Patmos Inc

Jan 20, 2023

Are there possible problems with Speedy if I flush the database directly? I have a pipeline that flushes the Redis database without going through the Speedy add-on.

#31

Patmos Inc

Jan 20, 2023

Okay, I found this error thrown today:

Caught Exception (500): SQLSTATE[HY000]: General error: 2006 MySQL server has gone away:

SELECT speedy_m_Tag_speedy_tags.id as speedy_m_Tag__id
FROM (`exp_speedy_tags` as speedy_m_Tag_speedy_tags)
WHERE  ( 
`speedy_m_Tag_speedy_tags`.`key` = 'speedy/default_site/global/more-stories-embed'
) 
LIMIT 18446744073709551615 in /var/www/joandearc.churchmilitant.com/panel/ee/legacy/database/drivers/mysqli/mysqli_connection.php:114

#32

BoldMinded (Brian)

Jan 22, 2023

> Are there possible problems with speedy if I directly flush db?

No, you should be able to clear Redis outside of EE/Speedy’s methods just fine. If you do, you might end up with some rows in the tags table that don’t match up to a cached item, but if the cached item is just going to get regenerated anyway, it shouldn’t be a big deal.

What were you doing when you got that tags query error? I’ve never seen that before.

#33

Patmos Inc

Jan 24, 2023

Oh okay, good to know about clearing without updating the table. Regarding that last error I posted: it came from a front-end user I traced with Datadog; they received a 500 error with that message after a long timeout. I also had some more thoughts over the weekend.

I think I may have started getting us lost in the weeds here. I am not entirely sure why my caching would cause timeouts with only 17,000 items in the cache, but since I tested the new Redis driver count method on our staging environment and could not recreate a timeout until I reached 400,000 items, I have to conclude that other environment-specific problems, not necessarily related to the add-on, are contributing here. I did notice one problem with our environment: indexing bots were hitting EE pagination links with effectively infinite variations, which made the cache grow far too fast. I created conditionals to stop this, and it has brought the cache down to a much more manageable size of around 2,000 items. This setup dates from our earlier CE Cache system and was never a problem before, so I was unable to identify it until we switched to Speedy.

I do not want to drag this ticket out longer than necessary, as I think your work on the Redis driver will probably resolve most of these issues. If I am still having problems after that, they will likely be easier to identify and I can submit a separate ticket. Thanks for all your work, and I appreciate the fast response times.

Thanks,
Thomas at Patmos.
