
Brian Fehrle: Why is PostgreSQL Running Slow? Tips & Tricks to Get to the Source


As a PostgreSQL Database Administrator, there are the everyday expectations to check on backups, apply DDL changes, make sure the logs don’t have any game-breaking ERRORs, and answer panicked calls from developers whose reports are running twice as long as normal and who have a meeting in ten minutes.

Even with a good understanding of the health of managed databases, there will always be new cases and new issues popping up relating to performance and how the database “feels”. Whether it arrives as a panicked email or an open ticket for “the database feels slow”, this common task can generally be approached with a few steps to check whether there is a problem with PostgreSQL, and what that problem may be.

This is by no means an exhaustive guide, nor do the steps need to be done in any specific order; rather, it’s a set of initial steps that can be taken to help find the common offenders quickly, as well as gain new insight into what the issue may be. A developer may know how the application acts and responds, but the Database Administrator knows how the database acts and responds to the application, and together the issue can be found.

NOTE: The queries to be executed should be run as a superuser, such as ‘postgres’, or any database user granted superuser permissions. Limited users will either be denied or have data omitted.
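
(On PostgreSQL 10 and later, a non-superuser can alternatively be granted read access to these statistics views through the built-in pg_monitor role; the user name below is a hypothetical example.)

GRANT pg_monitor TO monitoring_user;  -- monitoring_user is a hypothetical example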

Step 0 - Information Gathering

Get as much information as possible from whoever says the database seems slow; specific queries, applications connected, timeframes of the performance slowness, etc. The more information they give the easier it will be to find the issue.

Step 1 - Check pg_stat_activity

The request may come in many different forms, but if “slowness” is the general issue, checking pg_stat_activity is the first step to understanding just what’s going on. The pg_stat_activity view (documentation for every column in this view can be found here) contains a row for every server process / connection to the database from a client. This view holds a handful of useful pieces of information that can help.

NOTE: pg_stat_activity has been known to change structure over time, refining the data it presents. An understanding of the columns themselves will help when building queries dynamically as needed in the future.

Notable columns in pg_stat_activity are (a sample query pulling them together follows this list):

  1. query: a text column showing the query that’s currently being executed, waiting to be executed, or was last executed (depending on the state). This can help identify what query / queries a developer may be reporting as running slowly.
  2. client_addr: The IP address from which this connection and query originated. If empty (or NULL), it originated from localhost.
  3. backend_start, xact_start, query_start: These three provide a timestamp of when each started, respectively. backend_start represents when the connection to the database was established, xact_start is when the current transaction started, and query_start is when the current (or last) query started.
  4. state: The state of the connection to the database. ‘active’ means it’s currently executing a query, ‘idle’ means it’s waiting for further input from the client, and ‘idle in transaction’ means it’s waiting for further input from the client while holding an open transaction. (There are other states, but they are rare; consult the documentation for more information.)
  5. datname: The name of the database the connection is currently connected to. In clusters with multiple databases, this can help isolate problematic connections.
  6. wait_event_type and wait_event: These columns will be NULL when a query isn’t waiting; if it is waiting, they will contain information on why the query is waiting, and exploring pg_locks can identify what it’s waiting on. (PostgreSQL 9.5 and before only has a boolean column called ‘waiting’: true if waiting, false if not.)
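
As a minimal sketch (for PostgreSQL 9.6 or later, where the wait_event columns exist), a query pulling these columns together might look like this:

SELECT pid, datname, client_addr, state, wait_event_type, wait_event,
       backend_start, xact_start, query_start, query
FROM pg_stat_activity
ORDER BY query_start;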

1.1. Is the query waiting / blocked?

If there is a specific query or queries that are “slow” or “hung”, check to see if they are waiting for another query to complete. Due to relation locking, other queries can lock a table and not let any other queries access or change data until that query or transaction is done.

PostgreSQL 9.5 and earlier:

SELECT * FROM pg_stat_activity WHERE waiting = TRUE;

PostgreSQL 9.6:

SELECT * FROM pg_stat_activity WHERE wait_event IS NOT NULL;

PostgreSQL 10 and later (where the backend_type column lets background processes be filtered out):

SELECT * FROM pg_stat_activity WHERE wait_event IS NOT NULL AND backend_type = 'client backend';

The results of this query will show any connections currently waiting on another connection to release locks on a relation that is needed.

If the query is blocked by another connection, there are some ways to find out just what they are. In PostgreSQL 9.6 and later, the function pg_blocking_pids() allows the input of a process ID that’s being blocked, and it will return an array of process ID’s that are responsible for blocking it.

PostgreSQL 9.6 and later:

SELECT * FROM pg_stat_activity 
WHERE pid = ANY(pg_blocking_pids(<pid of blocked query>));

PostgreSQL 9.5 and earlier:

SELECT blocked_locks.pid     AS blocked_pid,
         blocked_activity.usename  AS blocked_user,
         blocking_locks.pid     AS blocking_pid,
         blocking_activity.usename AS blocking_user,
         blocked_activity.query    AS blocked_statement,
         blocking_activity.query   AS current_statement_in_blocking_process
   FROM  pg_catalog.pg_locks         blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks         blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
   WHERE NOT blocked_locks.granted;

(Available from the PostgreSQL Wiki).

These queries will point to whatever is blocking the specific PID that’s provided. With that, a decision can be made to kill the blocking query or connection, or let it run.
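
If the decision is to kill it, PostgreSQL provides two built-in functions for the job: pg_cancel_backend() cancels only the backend’s current query, while pg_terminate_backend() closes the entire connection. A sketch, using a hypothetical blocking PID of 12345:

-- Cancel just the running query, leaving the connection open
SELECT pg_cancel_backend(12345);

-- Terminate the entire connection (12345 is a hypothetical example PID)
SELECT pg_terminate_backend(12345);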

Step 2 - If the queries are running, why are they taking so long?

2.1. Is the planner running queries efficiently?

If a query (or set of queries) in question has a state of ‘active’, then it’s actually running. If the whole query isn’t available in pg_stat_activity, fetch it from the developers or the PostgreSQL log and start exploring the query planner.

EXPLAIN SELECT * FROM postgres_stats.table_stats t JOIN hosts h ON (t.host_id = h.host_id) WHERE logged_date >= '2018-02-01' AND logged_date < '2018-02-04' AND t.india_romeo = 569;
Nested Loop  (cost=0.280..1328182.030 rows=2127135 width=335)
  ->  Index Scan using six on victor_oscar echo  (cost=0.280..8.290 rows=1 width=71)
          Index Cond: (india_romeo = 569)
  ->  Append  (cost=0.000..1306902.390 rows=2127135 width=264)
        ->  Seq Scan on india_echo romeo  (cost=0.000..0.000 rows=1 width=264)
                Filter: ((logged_date >= '2018-02-01'::timestamp with time zone) AND (logged_date < '2018-02-04'::timestamp with time zone) AND (india_romeo = 569))
        ->  Seq Scan on juliet victor_echo  (cost=0.000..437153.700 rows=711789 width=264)
                Filter: ((logged_date >= '2018-02-01'::timestamp with time zone) AND (logged_date < '2018-02-04'::timestamp with time zone) AND (india_romeo = 569))
        ->  Seq Scan on india_papa quebec_bravo  (cost=0.000..434936.960 rows=700197 width=264)
                Filter: ((logged_date >= '2018-02-01'::timestamp with time zone) AND (logged_date < '2018-02-04'::timestamp with time zone) AND (india_romeo = 569))
        ->  Seq Scan on two oscar  (cost=0.000..434811.720 rows=715148 width=264)
                Filter: ((logged_date >= '2018-02-01'::timestamp with time zone) AND (logged_date < '2018-02-04'::timestamp with time zone) AND (india_romeo = 569))

This example shows a query plan for a two table join that also hits a partitioned table. We’re looking for anything that can cause the query to be slow, and in this case the planner is doing several Sequential Scans on partitions, suggesting that they are missing indexes. Adding an index on the ‘india_romeo’ column of these tables should dramatically improve this query.
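
As a sketch, using the obfuscated partition names from the plan above, the fix could look like the following; CREATE INDEX CONCURRENTLY avoids holding a write-blocking lock during the build, at the cost of taking longer (and it cannot run inside a transaction block):

-- One index per partition hit by the Append node above
CREATE INDEX CONCURRENTLY ON juliet (india_romeo);
CREATE INDEX CONCURRENTLY ON india_papa (india_romeo);
CREATE INDEX CONCURRENTLY ON two (india_romeo);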

Things to look for are sequential scans, nested loops, expensive sorts, etc. Understanding the query planner is crucial to making sure queries perform the best way possible; the official documentation can be read for more information here.
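
When it’s safe to actually execute the query, EXPLAIN ANALYZE goes a step further, printing real row counts and timings next to the planner’s estimates; note that it truly runs the query, so wrap any data-modifying statement in a transaction and roll it back. The BUFFERS option adds shared-buffer hit and read counts:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM postgres_stats.table_stats t
JOIN hosts h ON (t.host_id = h.host_id)
WHERE logged_date >= '2018-02-01' AND logged_date < '2018-02-04'
  AND t.india_romeo = 569;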

2.2. Are the tables involved bloated?

If the queries are still feeling slow without the query planner pointing at anything obvious, it’s time to check the health of the tables involved. Are they too big? Are they bloated?

SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'mytable';
n_live_tup  | n_dead_tup
------------+------------
      15677 |    8275431
(1 row)

Here we see that there are many times more dead rows than live rows, which means that to find the correct rows, the engine must sift through a large amount of dead data that’s not even relevant. A vacuum (or vacuum full) on this table should increase performance significantly.
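
A minimal sketch, reusing the hypothetical ‘mytable’ from above; note that VACUUM FULL rewrites the entire table under an ACCESS EXCLUSIVE lock, blocking all reads and writes while it runs:

-- Reclaim dead tuples for reuse and refresh planner statistics;
-- runs alongside normal reads and writes
VACUUM ANALYZE mytable;

-- Return disk space to the operating system; requires an ACCESS
-- EXCLUSIVE lock, so schedule it for a maintenance window
VACUUM FULL mytable;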

Step 3 - Check the logs

If the issue still can’t be found, check the logs for any clues.

FATAL / ERROR messages:

Look for messages that may be causing issues, such as deadlocks or long wait times to gain a lock.
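
A quick way to surface these is to grep the logs. The log location varies by distribution and by the log_directory / log_filename settings, so the path below is only an example:

grep -E 'ERROR|FATAL|deadlock' /var/log/postgresql/postgresql.log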

Checkpoints

Hopefully log_checkpoints is set to on, which will write checkpoint information to the logs. There are two types of checkpoints: timed and requested (forced). If checkpoints are being forced, then dirty buffers in memory must be written to disk before processing more queries, which can give a database system an overall feeling of “slowness”. Increasing checkpoint_segments or max_wal_size (depending on the database version) will give the checkpointer more room to work with, as well as help the background writer take some of the writing load.
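
A sketch of the relevant postgresql.conf settings; the values shown are illustrative starting points, not recommendations:

# postgresql.conf
log_checkpoints = on
max_wal_size = 4GB          # PostgreSQL 9.5 and later; replaces checkpoint_segments
checkpoint_timeout = 15min  # maximum time between timed checkpoints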

Step 4 - What’s the health of the host system?

If there are no clues in the database itself, perhaps the host is overloaded or having issues. Anything from an overloaded I/O channel to disk, memory overflowing to swap, or even a failing drive could be the culprit, and none of these issues would be apparent from anything we looked at before. Assuming the database is running on a *nix based operating system, here are a few things that can help.

4.1. System load

Using ‘top’, look at the load average for the host. If the number is approaching or exceeding the number of cores on the system, it could simply be too many concurrent connections hitting the database, bringing it to a crawl as it tries to catch up.

load average: 3.43, 5.25, 4.85
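
To put that number in context, compare it to the core count; ‘nproc’ reports the cores available (the output below is illustrative):

$ nproc
8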

4.2. System memory and SWAP

Using ‘free’, check to see if SWAP has been used at all. Memory overflowing to SWAP in a PostgreSQL database environment is extremely bad for performance, and many DBAs will even eliminate SWAP from database hosts entirely, as many prefer an ‘out of memory’ error to a sluggish system.

If SWAP is being used, a reboot of the system will clear it out, and increasing total system memory or re-configuring memory usage for PostgreSQL (such as lowering shared_buffers or work_mem) may be in order.

[postgres@livedb1 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7986         225        1297          12        6462        7473
Swap:          7987        2048        5939
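
As an alternative to a full reboot, swap can usually be drained back into RAM (provided enough free memory exists to hold it) by cycling swap off and on as root; a sketch:

$ sudo swapoff -a && sudo swapon -a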

4.3. Disk access

PostgreSQL attempts to do a lot of its work in memory and spread writing to disk out over time to minimize bottlenecks, but on an overloaded system with heavy writing, heavy reads and writes can easily cause the whole system to slow as it catches up on the demands. Faster disks, more disks, and more I/O channels are some ways to increase the amount of work that can be done.

Tools like ‘iostat’ or ‘iotop’ can help pinpoint if there is a disk bottleneck, and where it may be coming from.
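
Example invocations (‘iostat’ ships in the sysstat package, and ‘iotop’ typically requires root):

$ iostat -x 5      # extended per-device statistics, refreshed every 5 seconds
$ sudo iotop -o    # show only the processes currently performing I/O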

4.4. Check the logs

If all else fails, or even if not, the logs should always be checked to see if the system is reporting anything that’s not right. We already discussed checking the PostgreSQL logs, but the system logs can give information about issues such as failing disks, failing memory, network problems, etc. Any one of these issues can cause the database to act slowly and unpredictably, so a good understanding of the system in perfect health can help find these issues.


Step 5 - Something still doesn’t make sense?

Even the most seasoned administrators will run into something new that doesn’t make sense. That’s where the global PostgreSQL community can come in to help out. Much like step #0, the clearer the information given to the community, the easier it is for them to help.

5.1. PostgreSQL Mailing Lists

Since PostgreSQL is developed and managed by the open source community, there are thousands of people who talk through the mailing lists to discuss countless topics including features, errors, and performance issues. The mailing lists can be found here, with pgsql-admin and pgsql-performance being the most relevant when looking for help with performance issues.

5.2. IRC

Freenode hosts several PostgreSQL channels with developers and administrators all over the world, and it’s not hard to find a helpful person to track down where issues may be coming from. More information can be found on the PostgreSQL IRC page.


Spinnaker Hull Automatic Collection


Spinnaker has built a reputation in the watch world by offering uniquely designed, purpose-driven, and affordable timepieces. The Spinnaker design language has always aimed to incorporate unique and seldom seen aspects, and their latest release, the Spinnaker Hull, is no different. The Spinnaker Hull...



'Solo' Just Canonized Teräs Käsi, A Forgotten 'Star Wars' PlayStation Game


If you didn’t catch every single obscure Star Wars reference in Solo, you’re totally forgiven. Though the tone of the movie is easily the lightest and most accessible Star Wars film in years, some of the easter eggs are seemingly designed only for hardcore fans over the age of 30. Which is why, if you didn’t know what Qi’ra (Emilia Clarke) was talking about when she mentioned a certain kind of martial arts she knew, that’s because it was a callback to an all-but-forgotten 1997 Star Wars video game for the very first PlayStation: Masters of Teräs Käsi. Spoilers ahead for Solo: A Star Wars Story.


Blower and Charcoal


I made a blower and some charcoal at the new area in order to create higher temperatures for advancing my material technology. I took fan palm leaves and fashioned them into an impeller (about 25 cm in diameter) held in a split stick as a rotor. I then built a housing from clay (slightly more than 25 cm in diameter, with inlet and outlet openings about 8 cm in diameter) and assembled the blower. I opted not to make a bow or cord mechanism as I’ve done before, due to the complexity and lower portability of such a device. The lighter impeller material (leaf instead of the previous bark) made it easier to spin by hand anyway, as it has lower momentum. Each stroke of the spindle with the hand produces 4 rotations, so about 2 strokes per second gives 480 rpm. The blower increases the heat of a fire when blowing into it, and I would guess it’s more effective than a blow pipe and lungs, but I don’t know how it would compare to a primitive pot or bag bellows for air supply. A small furnace was made and then fired with wood fuel. The wood was wet but managed to fuse and partially melt sand in the furnace.

To get better performance, I made charcoal from the poor quality wood, using a reusable charcoal retort I built for the purpose. This was different from the previous reusable mound, as it consisted of a mud cylinder with air holes around the base. To use it, it was stacked with wood and the top was covered with mud, as opposed to the previous design, which had a side door. The fire was lit from the top as usual, and when the fire reached the air entries at the base (after an hour or two) the holes were sealed and the mound left to cool. The top was then broken open the next day and the charcoal removed. Another batch was made using significantly less effort, as the main structure of the mound did not need to be rebuilt each time, only the top.

Iron bacteria was again used to test the furnace. Charcoal and ore were placed in the furnace and the blower utilised. After an hour of operation the furnace was left to cool. The next day the furnace was opened and only slag was found, with no metallic iron this time. I think increasing the ratio of charcoal to ore might increase the temperature so that the slag flows better. Further experiments will be needed before I get used to the new materials here.

The new area I’m in is significantly wetter than the old area, and this has affected the order in which I create my pyrotechnology. The old spot was a dry eucalypt forest with an abundant source of energy-dense firewood. As a result, I developed kilns early on, powered with wood fuel and a natural draft, before developing charcoal-fuelled forced air furnaces. In contrast, the new area is a wet tropical rainforest, where wood rots nearly as soon as it falls off the tree in the damp conditions. Wood is also more difficult to collect here because of hordes of mosquitoes (away from the fire) and unpleasant, spiky plants. Because of this I developed a forge blower first, as it allows higher temperatures from a lower quantity and quality of fuel.

This poor quality wood can further be improved by converting it to charcoal first. In future, it may be necessary to cut firewood green and dry it, as opposed to picking it up off the ground dead as was preferable in the Eucalyptus forest I came from. The blower is also handy for stoking a tired campfire back into flames: I simply scrape the coals into a small mound around the nose of the tuyere and spin the impeller. I use the blower each day I’m at the hut for this purpose, to save blowing on hot coals each time I need a fire for something.




Inter-American Development Bank Publishes Report on Fintech Sandboxes


The Inter-American Development Bank (IDB) is out with a report covering the topic of Fintech sandboxes. This new step in fostering financial innovation was first launched in the UK by the Financial Conduct Authority (FCA), but the concept has since been replicated in a growing number...



Apple co-founder Steve Wozniak quits Facebook over data abuse scandal
