Issue
The Prometheus process on the Yugabyte Anywhere host stops abruptly, or is suspected of running out of memory.
Steps
In a web browser, navigate to <YB Anywhere Hostname or IP>:9090 and complete the following steps:
1. Click Status > TSDB Status and save the output to a single HTML file, or print it as a PDF. This contains summary information about the Prometheus metrics for the universes monitored by the Yugabyte Anywhere instance.
2. On the Prometheus menu bar, select Graph and provide ONLY the Table output of the following queries:
count ({job=~".+"}) by (job)
count ({job=~".+"}) by (node_prefix)
count ({job="yugabyte", table_id=~".+"}) by (node_prefix)
count (count ({job="yugabyte", table_id=~".+"}) by (node_prefix, table_id)) by (node_prefix)
count (count ({job="yugabyte", table_id=~".+"}) by (node_prefix, __name__)) by (node_prefix)
IMPORTANT: In a memory-constrained situation, selecting the Graph view instead of the Table view for these queries can cause Prometheus to stop unexpectedly.
Provide an HTML or PDF printout of the table information.
3. On the Prometheus menu bar, select Graph and provide the graph of the following PromQL query:
sum(process_resident_memory_bytes{job="prometheus"}) by (__name__) or
sum(go_memstats_heap_inuse_bytes{job="prometheus"}) by (instance) or
sum(go_memstats_heap_idle_bytes{job="prometheus"}) by (job)
- Graph this over the last 2 weeks, so that the time range covers the Prometheus outage.
- Provide an HTML or PDF printout of the graph information.
4. Provide the logs associated with the Prometheus Docker container by running the following on the Yugabyte Anywhere node:
sudo docker logs prometheus
5. On the Yugabyte Anywhere host, provide the output of the following command:
dmesg -T
6. If the Yugabyte Anywhere host is running RHEL or CentOS, run a sosreport.
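When a browser is not convenient, the table queries from step 2, the container logs from step 4, and the kernel log from step 5 can be gathered from a shell on the Yugabyte Anywhere host. The following is a minimal sketch, not an official Yugabyte tool. It assumes that Prometheus is reachable at http://localhost:9090 and that curl is installed, and it uses the standard Prometheus HTTP API endpoint /api/v1/query, which returns instant-query results (the same data as the Table view) as JSON. The names PROM_URL, OUT_DIR, run_query, collect_artifacts, and the output file names are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: collect Prometheus troubleshooting artifacts from the command line.
# Assumptions (not from the article): PROM_URL, OUT_DIR, and the file names
# below are hypothetical; adjust them to your environment.

PROM_URL="${PROM_URL:-http://localhost:9090}"
OUT_DIR="${OUT_DIR:-./prometheus_artifacts}"

# Run one instant PromQL query through the Prometheus HTTP API
# and print the JSON response on stdout.
run_query() {
  curl -sS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Save the step 2 table queries, the step 4 container logs,
# and the step 5 kernel log into OUT_DIR.
collect_artifacts() {
  mkdir -p "${OUT_DIR}"
  local i=0
  local q
  for q in \
    'count ({job=~".+"}) by (job)' \
    'count ({job=~".+"}) by (node_prefix)' \
    'count ({job="yugabyte", table_id=~".+"}) by (node_prefix)' \
    'count (count ({job="yugabyte", table_id=~".+"}) by (node_prefix, table_id)) by (node_prefix)' \
    'count (count ({job="yugabyte", table_id=~".+"}) by (node_prefix, __name__)) by (node_prefix)'
  do
    i=$((i + 1))
    run_query "$q" > "${OUT_DIR}/query_${i}.json"
  done
  # docker logs writes to both stdout and stderr, so capture both.
  sudo docker logs prometheus > "${OUT_DIR}/prometheus.log" 2>&1
  dmesg -T > "${OUT_DIR}/dmesg.txt"
}
```

After loading these functions into a shell, running collect_artifacts writes one JSON file per query plus the two log files. The TSDB Status page from step 1 and the memory graph from step 3 are not covered by this sketch and still need to be saved from the browser.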
Resolution
- If out-of-memory conditions are encountered, consider increasing the available system memory on the Yugabyte Anywhere host.
- Provide the artifacts to Yugabyte Support for further analysis.
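To check quickly whether out-of-memory conditions occurred before sending the artifacts, the dmesg output can be filtered for the messages that the Linux kernel typically logs when its OOM killer runs. This is a rough sketch under stated assumptions: the function name find_oom_events is hypothetical, and the patterns cover common kernel message wording ("invoked oom-killer", "Out of memory", "oom-kill") that can vary between kernel versions, so an empty result does not rule out memory pressure.

```shell
#!/usr/bin/env bash
# Sketch: filter kernel log lines on stdin for common OOM killer messages.
# find_oom_events is a hypothetical helper name, not part of any Yugabyte tool.
find_oom_events() {
  grep -Ei 'out of memory|oom-kill|invoked oom-killer'
}

# Example usage on the Yugabyte Anywhere host:
#   dmesg -T | find_oom_events
```

Lines that name the prometheus process suggest that the kernel stopped it to reclaim memory, which supports increasing the memory available on the host.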