Environment
- YugabyteDB - DocDB layer
Issue
- The Master server logs warning messages with the stack trace shown below.
- Printing these stack traces causes high CPU usage, and the server becomes unresponsive while the following appears in the logs every few seconds:
scoped_leader_shared_lock.cc:114] Long lock of catalog manager: 0.123s
@ 0x7fc7b2c52ad0 yb::master::ScopedLeaderSharedLock::Unlock()
@ 0x7fc7b2c4a912 yb::master::CatalogManagerBgTasks::Run()
@ 0x7fc7a87dae6f yb::Thread::SuperviseThread()
@ 0x7fc7a3f12694 start_thread
@ 0x7fc7a364f41d __clone
@ (nil) (unknown)
Resolution
- These warning messages are printed when the master leader shared lock is held for longer than 0.1 sec, and they can be ignored. An open GitHub issue tracks increasing the default interval: https://github.com/yugabyte/yugabyte-db/issues/7620
- If the lock duration reported with the stack trace is longer than the default value, check the operations currently running on the Master server to see whether they are creating contention or are blocked or waiting on another Master operation.
- If the stack traces printed in the logs contribute to high CPU because of frequent logging, increase the following configuration flags and restart the master process:
  - master_log_lock_warning_ms from 100ms to 1000ms
  - master_leader_lock_stack_trace_ms from 1000ms to 3000ms
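As an illustration, the two values above might be set as follows. This is a sketch that assumes the master process is started with gflags-style command-line flags or a flag file; how flags are actually passed depends on your deployment, so treat the form below as an assumption and not as the only supported method.

```
# Assumed gflags flag-file syntax; adapt to however your deployment passes master flags.
--master_log_lock_warning_ms=1000
--master_leader_lock_stack_trace_ms=3000
```

The new values take effect after the master process is restarted.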
Overview
The Catalog Manager of the master tracks the state and location of tables and tablets in the cluster. Various operations in the cluster require LockforRead(), which obtains a read lock on the Master's catalog; examples of such operations are the GetSchema() and GetTableLocations() calls. For long-running Master operations, if the leader shared lock is held longer than master_log_lock_warning_ms, a warning message is printed in the logs, which can be ignored. If the shared lock is held longer than master_leader_lock_stack_trace_ms, a stack trace is printed in the logs as well.
./yugabyte-db/src/yb/master/scoped_leader_shared_lock.cc:

    void ScopedLeaderSharedLock::Unlock() {
      if (leader_shared_lock_.owns_lock()) {
        {
          decltype(leader_shared_lock_) lock;
          lock.swap(leader_shared_lock_);
        }
        auto finish = std::chrono::steady_clock::now();
        bool need_stack_trace = finish > start_ + 1ms * FLAGS_master_leader_lock_stack_trace_ms;
        bool need_warning =
            need_stack_trace || (finish > start_ + 1ms * FLAGS_master_log_lock_warning_ms);
        if (need_warning) {
          LOG(WARNING)
              << "Long lock of catalog manager (" << file_name_ << ":" << line_number_ << ", "
              << function_name_ << "): " << AsString(finish - start_)
              << (need_stack_trace ? "\n" + GetStackTrace() : "");
        }
      }
    }
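
To see how the two thresholds interact, the check in Unlock() can be reduced to a small standalone function. The sketch below is not YugabyteDB code: the names LockLogAction and ClassifyLockHold are hypothetical, and the flag values are passed as plain parameters instead of being read from gflags.

```cpp
#include <chrono>
#include <cstdint>

// What gets logged when the leader shared lock is released.
enum class LockLogAction { kNone, kWarning, kWarningWithStackTrace };

// Hypothetical helper (not part of YugabyteDB). It mirrors the threshold
// comparisons in ScopedLeaderSharedLock::Unlock(), taking the hold time and
// both thresholds (in milliseconds) as arguments.
LockLogAction ClassifyLockHold(std::chrono::milliseconds held,
                               int32_t warning_ms,
                               int32_t stack_trace_ms) {
  using std::chrono::milliseconds;
  // A stack trace is needed only when the hold time exceeds the larger
  // threshold (master_leader_lock_stack_trace_ms in the real code).
  const bool need_stack_trace = held > milliseconds(stack_trace_ms);
  // A warning is logged if either threshold is exceeded.
  const bool need_warning =
      need_stack_trace || held > milliseconds(warning_ms);
  if (!need_warning) {
    return LockLogAction::kNone;
  }
  return need_stack_trace ? LockLogAction::kWarningWithStackTrace
                          : LockLogAction::kWarning;
}
```

With thresholds of 100 ms and 1000 ms, a 123 ms hold yields a warning without a stack trace and a 1500 ms hold yields a warning with a stack trace. After raising the thresholds to 1000 ms and 3000 ms, the 123 ms hold is no longer logged and the 1500 ms hold produces only the one-line warning.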