PgSQL parallel query series: introduction

Modern CPUs have a large number of cores, and for many years database applications have been sending queries to the database concurrently. When a query processes many rows of a table, using multiple cores can measurably improve performance. PostgreSQL 9.6 introduced parallel query, which can speed up such queries considerably.

1. Limitations

1. Do not enable parallel query if all CPU cores are already saturated: parallel execution steals CPU time from other queries and increases their response time.
2. Parallel processing significantly increases memory use, so watch the work_mem value: each hash join or sort operation can use up to work_mem of memory per process.
3. Low-latency OLTP queries gain nothing from parallel execution; queries that return a single row perform notably worse with it enabled.
4. Parallel execution only supports SELECT queries without lock predicates.
5. Cursors and queries that can be suspended are not supported.
6. Windowed functions and ordered-set aggregate functions are non-parallel.
7. It brings no benefit to workloads that are already IO-bound.
8. There is no parallel sort algorithm, although queries with sorts can still be parallel in some respects.
9. Replace a CTE (WITH ...) with a sub-select to enable parallel execution.
10. Foreign data wrappers (FDW) do not support parallel execution yet (a later version may add this; check which one).
11. FULL OUTER JOIN is not supported.
12. Parallel execution is disabled for clients that set max_rows.
13. If a query uses a function that is not marked PARALLEL SAFE, it is executed single-threaded.
14. The SERIALIZABLE transaction isolation level disables parallel execution.

2. Parallel sequential scan

A parallel sequential scan is fast not necessarily because of parallel reads, but because the data access is spread across many CPUs. Modern operating systems provide good buffering for PostgreSQL data files: read-ahead allows fetching blocks from storage beyond the ones PostgreSQL has actually requested. As a result, query performance is usually not limited by IO; it burns CPU cycles reading rows one by one from table pages and comparing row values against the WHERE condition. Let's execute a simple query:

tpch=# explain analyze select l_quantity as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Seq Scan on lineitem (cost=0.00..1964772.00 rows=58856235 width=5) (actual time=0.014..16951.669 rows=58839715 loops=1)
   Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
   Rows Removed by Filter: 1146337
 Planning Time: 0.203 ms
 Execution Time: 19035.100 ms

A sequential scan without aggregation produces too many rows, so the query is executed by a single CPU core. After adding SUM() it is clear that two worker processes help the query:

explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=1589702.14..1589702.15 rows=1 width=32) (actual time=8553.365..8553.365 rows=1 loops=1)
   ->  Gather (cost=1589701.91..1589702.12 rows=2 width=32) (actual time=8553.241..8555.067 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate (cost=1588701.91..1588701.92 rows=1 width=32) (actual time=8547.546..8547.546 rows=1 loops=3)
               ->  Parallel Seq Scan on lineitem (cost=0.00..1527393.33 rows=24523431 width=5) (actual time=0.038..5998.417 rows=19613238 loops=3)
                     Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
                     Rows Removed by Filter: 382112
 Planning Time: 0.241 ms
 Execution Time: 8555.131 ms

Performance improved 2.2x.

3. Parallel aggregation

The "Parallel Seq Scan" node produces rows for partial aggregation. The "Partial Aggregate" node reduces these rows with SUM() first. Then the "Gather" node collects each worker's partial sum, and "Finalize Aggregate" performs the final calculation. If you use your own aggregate functions, don't forget to mark them "parallel safe".

4. Number of worker processes

The number of parallel workers can be increased without a server restart:

alter system set max_parallel_workers_per_gather = 4;
select * from pg_reload_conf();
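Before rerunning the query, the reloaded value can be double-checked; a quick sanity check (my addition, not part of the original demo) could be:

```sql
-- Confirm the reloaded value is visible in the current session:
show max_parallel_workers_per_gather;
-- The same information from the catalog view:
select setting from pg_settings where name = 'max_parallel_workers_per_gather';
```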
Now, there are 4 workers in explain output:
tpch=# explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=1440213.58..1440213.59 rows=1 width=32) (actual time=5152.072..5152.072 rows=1 loops=1)
   ->  Gather (cost=1440213.15..1440213.56 rows=4 width=32) (actual time=5151.807..5153.900 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate (cost=1439213.15..1439213.16 rows=1 width=32) (actual time=5147.238..5147.239 rows=1 loops=5)
               ->  Parallel Seq Scan on lineitem (cost=0.00..1402428.00 rows=14714059 width=5) (actual time=0.037..3601.882 rows=11767943 loops=5)
                     Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
                     Rows Removed by Filter: 229267
 Planning Time: 0.218 ms
 Execution Time: 5153.967 ms

We changed the number of workers from 2 to 4, yet the query got only 1.6599x faster. The arithmetic explains it: we previously had 2 workers plus one leader (3 processes); after the change it is 4+1 (5 processes). The maximum improvement from this change is therefore 5/3 = 1.66x.

5. How it works

Query execution always starts in the "leader" process, which performs all non-parallel work. Other processes running the same query are called "worker" processes. Parallel execution is built on the Dynamic Background Workers infrastructure (introduced in 9.4). Since the leader also works on the query, a query that launches 3 worker processes can be up to 4x faster than traditional execution.

Worker processes communicate with the leader through message queues based on shared memory. Each process has 2 queues: one for errors and one for tuples.

6. How many workers are used

1) max_parallel_workers_per_gather is the lowest-level limit on the number of workers;
2) the workers used by a query are further limited by max_parallel_workers;
3) the top-level limit is max_worker_processes, the total number of background processes.

A failed worker allocation leads to single-process execution. The query planner increases the number of workers with the size of the table or index; min_parallel_table_scan_size and min_parallel_index_scan_size control this behavior:

set min_parallel_table_scan_size = '8MB';
8MB table => 1 worker
24MB table => 2 workers
72MB table => 3 workers
x => log(x / min_parallel_table_scan_size) / log(3) + 1 worker

For every table 3 times larger than min_parallel_(index|table)_scan_size, PostgreSQL adds one more worker. The number of workers is not cost-based: a circular dependency would make the implementation complicated, so the planner uses simple rules instead. You can also set the parallel degree for a particular table with ALTER TABLE ... SET (parallel_workers = N).

7. Why isn't parallel execution used?

Besides the limitations above, PostgreSQL also checks costs:

parallel_setup_cost avoids parallel execution of short queries. It models the time spent on memory setup, process start-up, and initial communication.
parallel_tuple_cost: communication between leader and workers can take a long time, proportional to the number of tuples the workers send. This parameter models that communication cost.

8. Nested Loop join

PgSQL 9.6+ can execute a "Nested loop" in parallel:

explain (costs off) select c_custkey, count(o_orderkey)
from customer left outer join orders on
    c_custkey = o_custkey and o_comment not like '%special%deposits%'
group by c_custkey;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: customer.c_custkey
   ->  Gather Merge
         Workers Planned: 4
         ->  Partial GroupAggregate
               Group Key: customer.c_custkey
               ->  Nested Loop Left Join
                     ->  Parallel Index Only Scan using customer_pkey on customer
                     ->  Index Scan using idx_orders_custkey on orders
                           Index Cond: (customer.c_custkey = o_custkey)
                           Filter: ((o_comment)::text !~~ '%special%deposits%'::text)

Gather happens at the last stage, so "Nested Loop Left Join" is a parallel operation. "Parallel Index Only Scan", available since version 10, works similarly to a parallel sequential scan. The c_custkey = o_custkey condition reads orders for each customer row, so it is not parallel.

9. Hash Join

Before PostgreSQL 11, each worker built its own copy of the hash table, so 4+ workers could not improve performance. The new implementation uses a single shared hash table, and each worker can use its work_mem to help build it:

select
    l_shipmode,
    sum(case
        when o_orderpriority = '1-URGENT'
            or o_orderpriority = '2-HIGH'
            then 1
        else 0
    end) as high_line_count,
    sum(case
        when o_orderpriority <> '1-URGENT'
            and o_orderpriority <> '2-HIGH'
            then 1
        else 0
    end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('MAIL', 'AIR')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1996-01-01'
    and l_receiptdate < date '1996-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode
LIMIT 1;
                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=1964755.66..1964961.44 rows=1 width=27) (actual time=7579.592..7922.997 rows=1 loops=1)
   ->  Finalize GroupAggregate (cost=1964755.66..1966196.11 rows=7 width=27) (actual time=7579.590..7579.591 rows=1 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Gather Merge (cost=1964755.66..1966195.83 rows=28 width=27) (actual time=7559.593..7922.319 rows=6 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Partial GroupAggregate (cost=1963755.61..1965192.44 rows=7 width=27) (actual time=7548.103..7564.592 rows=2 loops=5)
                     Group Key: lineitem.l_shipmode
                     ->  Sort (cost=1963755.61..1963935.20 rows=71838 width=27) (actual time=7530.280..7539.688 rows=62519 loops=5)
                           Sort Key: lineitem.l_shipmode
                           Sort Method: external merge  Disk: 2304kB
                           Worker 0:  Sort Method: external merge  Disk: 2064kB
                           Worker 1:  Sort Method: external merge  Disk: 2384kB
                           Worker 2:  Sort Method: external merge  Disk: 2264kB
                           Worker 3:  Sort Method: external merge  Disk: 2336kB
                           ->  Parallel Hash Join (cost=382571.01..1957960.99 rows=71838 width=27) (actual time=7036.917..7499.692 rows=62519 loops=5)
                                 Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                 ->  Parallel Seq Scan on lineitem (cost=0.00..1552386.40 rows=71838 width=19) (actual time=0.583..4901.063 rows=62519 loops=5)
                                       Filter: ((l_shipmode = ANY ('{MAIL,AIR}'::bpchar[])) AND (l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND (l_receiptdate >= '1996-01-01'::date) AND (l_receiptdate < '1997-01-01 00:00:00'::timestamp without time zone))
                                       Rows Removed by Filter: 11934691
                                 ->  Parallel Hash (cost=313722.45..313722.45 rows=3750045 width=20) (actual time=2011.518..2011.518 rows=3000000 loops=5)
                                       Buckets: 65536  Batches: 256  Memory Usage: 3840kB
                                       ->  Parallel Seq Scan on orders (cost=0.00..313722.45 rows=3750045 width=20) (actual time=0.029..995.948 rows=3000000 loops=5)
 Planning Time: 0.977 ms
 Execution Time: 7923.770 ms

Query 12 from TPC-H is a good example of a parallel hash join: every process helps build the shared hash table.

10. Merge Join

Due to the nature of merge join, it cannot be made parallel. Don't worry if a merge join is the last stage of query execution: the rest of the query can still run in parallel.

-- Query 2 from TPC-H
explain (costs off) select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 36
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'AMERICA'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp, supplier, nation, region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'AMERICA'
    )
order by s_acctbal desc, n_name, s_name, p_partkey
LIMIT 100;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Limit
   ->  Sort
         Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
         ->  Merge Join
               Merge Cond: (part.p_partkey = partsupp.ps_partkey)
               Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
               ->  Gather Merge
                     Workers Planned: 4
                     ->  Parallel Index Scan using part_pkey on part
                           Filter: (((p_type)::text ~~ '%BRASS'::text) AND (p_size = 36))
               ->  Materialize
                     ->  Sort
                           Sort Key: partsupp.ps_partkey
                           ->  Nested Loop
                                 ->  Nested Loop
                                       Join Filter: (nation.n_regionkey = region.r_regionkey)
                                       ->  Seq Scan on region
                                             Filter: (r_name = 'AMERICA'::bpchar)
                                       ->  Hash Join
                                             Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
                                             ->  Seq Scan on supplier
                                             ->  Hash
                                                   ->  Seq Scan on nation
                                 ->  Index Scan using idx_partsupp_suppkey on partsupp
                                       Index Cond: (ps_suppkey = supplier.s_suppkey)
               SubPlan 1
                 ->  Aggregate
                       ->  Nested Loop
                             Join Filter: (nation_1.n_regionkey = region_1.r_regionkey)
                             ->  Seq Scan on region region_1
                                   Filter: (r_name = 'AMERICA'::bpchar)
                             ->  Nested Loop
                                   ->  Nested Loop
                                         ->  Index Scan using idx_partsupp_partkey on partsupp partsupp_1
                                               Index Cond: (part.p_partkey = ps_partkey)
                                         ->  Index Scan using supplier_pkey on supplier supplier_1
                                               Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
                                   ->  Index Scan using nation_pkey on nation nation_1
                                         Index Cond: (n_nationkey = supplier_1.s_nationkey)

The "Merge Join" node sits above the "Gather Merge", so the merge itself does not use parallel execution. But the "Parallel Index Scan" node on part_pkey still helps.

11. Partition-wise join

PostgreSQL 11 disables the partition-wise join feature by default because of its high planning cost. Tables partitioned in the same way can be joined partition by partition, which lets the executor use smaller hash tables, and each per-partition join can run in parallel:

tpch=# set enable_partitionwise_join = t;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
                    QUERY PLAN
---------------------------------------------------
 Append
   ->  Hash Join
         Hash Cond: (t2.b = t1.a)
         ->  Seq Scan on prt2_p1 t2
               Filter: ((b >= 0) AND (b <= 10000))
         ->  Hash
               ->  Seq Scan on prt1_p1 t1
                     Filter: (b = 0)
   ->  Hash Join
         Hash Cond: (t2_1.b = t1_1.a)
         ->  Seq Scan on prt2_p2 t2_1
               Filter: ((b >= 0) AND (b <= 10000))
         ->  Hash
               ->  Seq Scan on prt1_p2 t1_1
                     Filter: (b = 0)
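The plan above joins matching partitions pairwise (prt1_p1 with prt2_p1, prt1_p2 with prt2_p2). The source does not show how prt1 and prt2 are defined; a minimal sketch of definitions that could produce such a plan, with assumed column types and partition bounds, might look like:

```sql
-- Hypothetical schema: two tables range-partitioned on the join keys,
-- with identical partition bounds so the planner can match partitions.
create table prt1 (a int, b int) partition by range (a);
create table prt1_p1 partition of prt1 for values from (0) to (250000);
create table prt1_p2 partition of prt1 for values from (250000) to (500000);

create table prt2 (a int, b int) partition by range (b);
create table prt2_p1 partition of prt2 for values from (0) to (250000);
create table prt2_p2 partition of prt2 for values from (250000) to (500000);
```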
tpch=# set parallel_setup_cost = 1;
tpch=# set parallel_tuple_cost = 0.01;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
                         QUERY PLAN
-----------------------------------------------------------
 Gather
   Workers Planned: 4
   ->  Parallel Append
         ->  Parallel Hash Join
               Hash Cond: (t2_1.b = t1_1.a)
               ->  Parallel Seq Scan on prt2_p2 t2_1
                     Filter: ((b >= 0) AND (b <= 10000))
               ->  Parallel Hash
                     ->  Parallel Seq Scan on prt1_p2 t1_1
                           Filter: (b = 0)
         ->  Parallel Hash Join
               Hash Cond: (t2.b = t1.a)
               ->  Parallel Seq Scan on prt2_p1 t2
                     Filter: ((b >= 0) AND (b <= 10000))
               ->  Parallel Hash
                     ->  Parallel Seq Scan on prt1_p1 t1
                           Filter: (b = 0)

Partition-wise joins can use parallel execution only if the partitions are large enough.

12. Parallel Append

Parallel Append usually shows up with UNION ALL. Its drawback is lower parallelism, because every worker ultimately serves a single subquery. Even with 4 workers enabled, only 2 are planned here:

tpch=# explain (costs off) select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day union all select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '2000-12-01' - interval '105' day;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Gather
   Workers Planned: 2
   ->  Parallel Append
         ->  Aggregate
               ->  Seq Scan on lineitem
                     Filter: (l_shipdate <= '2000-08-18 00:00:00'::timestamp without time zone)
         ->  Aggregate
               ->  Seq Scan on lineitem lineitem_1
                     Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)

13. The most important variables

work_mem limits the memory each process can use: a single query may need up to work_mem * processes * joins, which can add up to a lot of memory.
max_parallel_workers_per_gather: how many workers the executor uses to run one plan node in parallel.
max_worker_processes: adjust the total number of background processes to the number of CPU cores on the server.
max_parallel_workers: the same idea, but for the number of parallel workers.

14. Summary

Starting with 9.6, parallel query execution can significantly improve the performance of complex queries that scan many rows or index records. Don't forget to disable parallel execution on servers with a heavy OLTP workload: a sequential or index scan still consumes a lot of resources. If you are not running a report over the whole data set, adding missing indexes or using proper partitioning may improve query performance just as well.

Source: https://www.percona.com/blog/parallel-queries-in-postgresql/