In-Portal Issue Tracker - In-Portal CMS
1123 [In-Portal CMS] Caching System bug report always 2011-09-19 15:41 2012-07-25 05:33
closed 5.1.3  
none 5.2.0-B1
Fixes server load exponential raise on cache reset
0001123: Parallel cache rebuild problem could cause high server load
Some background info:
To rebuild a cache, user_a first deletes it. The first user to request the missing cache (not necessarily user_a, who deleted it) then builds it.

All is well until user_b visits the website while user_a is still building the cache. By the logic described above, user_b will also start building the cache, in PARALLEL with user_a. The same happens with every new visitor until user_a completes the cache building process. But user_a won't be able to build the cache as quickly as planned, since the other users, who are building the same cache in PARALLEL, slow him down.

As a result of these parallel calculations, server load rises exponentially. On a dedicated server this might not be a big problem, but on shared hosting it could bring down the whole server.

Here are the preconditions that can cause the exponential server load explained above:

In-Commerce module installed
Unit Config Cache build time - 5 seconds
5+ RPS (requests per second to the site)
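
To see why these numbers are enough to cause trouble, here is a back-of-the-envelope sketch in Python. It is not taken from the In-Portal code: the function names and the single-CPU, fair-sharing model are assumptions made purely for illustration. The first function counts how many requests arrive while one undisturbed build is running. The second estimates when the first build would finish if all parallel builds shared one CPU equally.

```python
import math


def requests_during_one_build(build_seconds, requests_per_second):
    """Requests arriving during one undisturbed build; each starts its own build."""
    return build_seconds * requests_per_second


def first_build_finish_time(build_seconds, requests_per_second):
    """Idealized finish time of the first build (assumed model, not a measurement).

    Assumptions: one CPU shared equally by all running builds, every build
    needs `build_seconds` of CPU time, and requests arrive as a smooth stream.
    With n(t) = 1 + rate * t builds running at time t, the first build
    progresses at speed 1 / n(t), which gives
    T = (exp(rate * build_seconds) - 1) / rate.
    """
    if requests_per_second == 0:
        return float(build_seconds)
    return math.expm1(requests_per_second * build_seconds) / requests_per_second


if __name__ == "__main__":
    print(requests_during_one_build(5, 5))
    print(first_build_finish_time(5, 5))
```

With the preconditions above (5 seconds, 5 RPS), 25 requests arrive during a single undisturbed build. Under the assumed model the finish time grows exponentially with build time multiplied by request rate, which is consistent with the load behavior described in this report.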

Fix concept:
Each cache can be in one of 2 states:

we have a cache, but it's outdated and needs to be rebuilt
we don't have any cache and need to build it from scratch

Here is what I propose:
when we have an outdated cache, let user_a rebuild the cache while other users use the outdated cache version
when we don't have a cache, let user_a rebuild the cache while other users wait (a predefined number of seconds) for him to finish and then use the cache once it's ready
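
The proposal above can be summarized as a small decision table. The sketch below is only an illustration in Python: the names CacheState, Action and decide are hypothetical and do not come from the In-Portal code.

```python
from enum import Enum


class CacheState(Enum):
    FRESH = "fresh"
    OUTDATED = "outdated"
    MISSING = "missing"


class Action(Enum):
    USE_CACHE = "use cache"
    USE_OUTDATED = "use outdated cache"
    BUILD = "build cache"
    WAIT = "wait for the builder"


def decide(state, someone_is_building):
    """Pick what a visitor should do, following the two proposed rules."""
    if state is CacheState.FRESH:
        return Action.USE_CACHE
    if not someone_is_building:
        # This visitor plays the role of user_a and builds the cache.
        return Action.BUILD
    if state is CacheState.OUTDATED:
        # Somebody else is rebuilding: serve the outdated version meanwhile.
        return Action.USE_OUTDATED
    # Nothing to serve yet: wait (for a limited time) for the builder.
    return Action.WAIT
```

In both non-fresh states only one visitor ever gets Action.BUILD, which is the point of the fix: the number of parallel builds stays at one regardless of traffic.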

To implement the proposed idea, we always need a way to get the outdated cache version to return to other users while user_a is building.

This is not always possible with the current automatic cache key expiration scheme. For example, cache key "sample_key[%LangSerial%]" (which automatically expires when the LangSerial cache key changes) is stored in the cache under the name "sample_key[%LangSerial:1%]" (":1" added). When the LangSerial cache key changes, the key name in the cache becomes different, and the cache stored under the previous name effectively expires (since nobody knows how to access it any more). This works well, but it leaves no way to get the old cache key name in order to return the old cache to all users except the one who is building the new cache.
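
The current scheme can be illustrated with a few lines of Python. This is a sketch of the idea only, not the In-Portal implementation: the expand_key helper, the regular expression and the use of plain dictionaries for the cache and the serial values are all assumptions.

```python
import re

# Matches serial placeholders such as "[%LangSerial%]".
SERIAL_PATTERN = re.compile(r"\[%(\w+)%\]")


def expand_key(key, serials):
    """Insert the current serial value into each placeholder of the key name."""
    return SERIAL_PATTERN.sub(
        lambda m: f"[%{m.group(1)}:{serials[m.group(1)]}%]", key
    )


cache = {}
serials = {"LangSerial": 1}

# The value is stored under the expanded name.
cache[expand_key("sample_key[%LangSerial%]", serials)] = "some cached data"

# Changing the serial changes the name that every reader computes ...
serials["LangSerial"] += 1
new_name = expand_key("sample_key[%LangSerial%]", serials)

# ... so the old entry is still in the cache, but nobody looks it up any more.
old_entry_reachable = new_name in cache
```

After the serial changes, a reader only knows the new expanded name, so the old data cannot be handed out while the new cache is being built.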

To solve this issue, I propose storing an additional cache key alongside each stored cache key, and no longer replacing serial cache keys (the ones between "[%" and "%]") within the cache key name. The additional cache key will hold the variable part of the cache key. This way the original cache key always stays the same, and expiration can be detected by comparing the cached and current values of the additional cache key.
For example:

Current scheme:
key: "sample_key[%LangSerial%]" (actually stored key is: "sample_key[%LangSerial:1%]"), value: "some cached data"

Proposed scheme:
key: "sample_key[%LangSerial%]" (actually stored key is: "sample_key[%LangSerial%]"), value: "some cached data"
key: "sample_key[%LangSerial%]_serials" ("_serials" added to original cache key name), value: "sample_key[%LangSerial:1%]"
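
A minimal Python sketch of the proposed storage scheme follows. The helper names (expand_key, store_value, lookup) and the dictionary-based cache are assumptions for illustration; only the "_serials" suffix and the key formats come from the example above.

```python
import re

SERIAL_PATTERN = re.compile(r"\[%(\w+)%\]")


def expand_key(key, serials):
    """Build the variable part, e.g. "sample_key[%LangSerial:1%]"."""
    return SERIAL_PATTERN.sub(
        lambda m: f"[%{m.group(1)}:{serials[m.group(1)]}%]", key
    )


def store_value(cache, key, value, serials):
    """Store the value under the unchanged key plus a "_serials" companion key."""
    cache[key] = value
    cache[key + "_serials"] = expand_key(key, serials)


def lookup(cache, key, serials):
    """Return (value, outdated). A missing key yields (None, True)."""
    if key not in cache:
        return None, True
    outdated = cache.get(key + "_serials") != expand_key(key, serials)
    return cache[key], outdated
```

Because the data stays under the same name, an outdated value is still reachable and can be returned to other users while one user rebuilds it.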

To implement the described scheme we need to:

make the "getCache" method wait for the cache (if it's totally missing) or return the outdated cache (when the cache is being built by another user)
make the "setCache" method reset any cache building indicators (set by the "rebuildCache" method, see below)
create a "rebuildCache" method that makes it possible to indicate that:
  the cache will be rebuilt right away (e.g. set "<cache_key>_rebuilding" cache key, so other users will know that somebody is rebuilding the cache)
  the cache must be rebuilt on the next user visit (e.g. set "<cache_key>_rebuild" cache key, so the next user will know that the cache must be rebuilt)
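
The interplay of the three methods can be sketched in Python as follows. This is an in-memory illustration under stated assumptions, not the code from the attached patches: the class name, the mode constants, the polling loop and the decision to build the cache ourselves after the wait times out are all hypothetical. Only the method roles and the "_rebuilding" / "_rebuild" suffixes come from the description above.

```python
import time


class RebuildAwareCache:
    """In-memory sketch; a real shared cache would need atomic flag updates."""

    REBUILD_NOW = 1    # somebody is rebuilding the cache right away
    REBUILD_LATER = 2  # the next visitor must rebuild the cache

    def __init__(self, max_wait=2.0, poll_interval=0.05):
        self.data = {}
        self.max_wait = max_wait
        self.poll_interval = poll_interval

    def rebuild_cache(self, key, mode):
        suffix = "_rebuilding" if mode == self.REBUILD_NOW else "_rebuild"
        self.data[key + suffix] = True

    def set_cache(self, key, value):
        self.data[key] = value
        # A finished build clears both indicators.
        self.data.pop(key + "_rebuilding", None)
        self.data.pop(key + "_rebuild", None)

    def get_cache(self, key):
        """Return the cached value, or None if the caller must build it."""
        if self.data.pop(key + "_rebuild", None):
            # This visitor is the "next user": claim the rebuild.
            self.rebuild_cache(key, self.REBUILD_NOW)
            return None
        if key in self.data:
            # Fresh value, or outdated value while somebody else rebuilds.
            return self.data[key]
        if key + "_rebuilding" in self.data:
            # Cache is missing but being built: wait a limited time for it.
            deadline = time.monotonic() + self.max_wait
            while time.monotonic() < deadline:
                if key in self.data:
                    return self.data[key]
                time.sleep(self.poll_interval)
            # Assumption: after the timeout, build it ourselves.
            return None
        # Cache is missing and nobody is building it: claim the build.
        self.rebuild_cache(key, self.REBUILD_NOW)
        return None
```

The bounded wait matters: if the builder never calls set_cache, an unbounded wait would block every other visitor forever. With a shared backend, the flag would also have to be claimed atomically so that two visitors cannot both become the builder.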
related to 0000107 closed (5.1.0) alex Implement "MemCached" functionality
parent of 0001231 closed (5.2.0) alex Bug in parallel cache rebuild protection
patch server_load_raise_on_cache_rebuild_core_520.patch (31,771) 2011-09-19 15:58
patch server_load_raise_on_cache_rebuild_modules_520.patch (1,589) 2011-09-19 15:58
patch install_step2_infinite_cache_waiting.patch (2,157) 2011-09-20 12:31
vsd Parallel cache rebuild prevention.vsd (54,784) 2011-09-25 04:48
png Parallel cache rebuild prevention.png (33,857) 2011-09-25 04:48
Issue History
2012-07-25 05:33 alex Note Added: 0005080
2012-07-25 05:33 alex Status resolved => closed
2012-03-24 15:38 alex Relationship added parent of 0001231
2012-01-24 08:55 alex Changeset attached 5.2.x r15096
2012-01-23 09:31 alex Changeset attached 5.2.x r15095
2011-10-22 05:39 alex Estimate Points => 3
2011-09-25 04:48 alex File Added: Parallel cache rebuild prevention.png
2011-09-25 04:48 alex File Added: Parallel cache rebuild prevention.vsd
2011-09-20 12:34 alex Changeset attached 5.2.x r14575
2011-09-20 12:32 alex Note Added: 0003875
2011-09-20 12:31 alex File Added: install_step2_infinite_cache_waiting.patch
2011-09-19 16:01 alex Note Added: 0003769
2011-09-19 16:01 alex Status reviewed and tested => resolved
2011-09-19 16:01 alex Fixed in Version => 5.2.0-B1
2011-09-19 16:01 alex Resolution open => fixed
2011-09-19 16:01 alex Assigned To !COMMUNITY => alex
2011-09-19 16:01 alex Changeset attached 5.2.x r14560
2011-09-19 16:00 alex Changeset attached 5.2.x r14559
2011-09-19 15:59 alex Note Added: 0003768
2011-09-19 15:59 alex Status needs testing => reviewed and tested
2011-09-19 15:59 alex Relationship added related to 0000107
2011-09-19 15:58 alex Assigned To => !COMMUNITY
2011-09-19 15:58 alex Developer => alex
2011-09-19 15:58 alex Status active => needs testing
2011-09-19 15:58 alex File Added: server_load_raise_on_cache_rebuild_modules_520.patch
2011-09-19 15:58 alex File Added: server_load_raise_on_cache_rebuild_core_520.patch
2011-09-19 15:41 alex New Issue
2011-09-19 15:41 alex Reference =>
2011-09-19 15:41 alex Change Log Message => Fixes server load exponential raise on cache reset

2011-09-19 15:59   
Marking as tested right away, since everything will be tested together later.
2011-09-19 16:01   
Fix committed to 5.2.x branch. Commit Message:

Fixes 0001123: Parallel cache rebuild problem could cause high server load
2011-09-20 12:32   
The unit config cache isn't stored to database/memory during installation. Because of that, the "cache rebuilding" mark stayed set indefinitely and the user was never able to proceed to the 2nd step of the installation wizard.

Patch "install_step2_infinite_cache_waiting.patch" fixes that.
2012-07-25 05:33   
Closed, since version 5.2.0 was released.