聚合查询
> 文档中心 > 文档中心 > INFINI Easysearch > 功能手册 > SQL 查询 > 聚合查询

聚合查询 #

介绍 #

聚合函数作用于一组值。它们通常与GROUP BY子句一起使用,将值分组为子集。

GROUP BY 子句 #

GROUP BY 表达式可以是:

  1. 标识符:Identifier
  2. 序数:Ordinal
  3. 表达式:Expression

标识符 #

group by 表达式可以是标识符:

os> SELECT gender, sum(age) FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+------------+
| gender   | sum(age)   |
|----------+------------|
| F        | 28         |
| M        | 101        |
+----------+------------+

序数 #

group by 表达式可以是序数:

os> SELECT gender, sum(age) FROM accounts GROUP BY 1;
fetched rows / total rows = 2/2
+----------+------------+
| gender   | sum(age)   |
|----------+------------|
| F        | 28         |
| M        | 101        |
+----------+------------+

group by 表达式可以是一个表达式。

os> SELECT abs(account_number), sum(age) FROM accounts GROUP BY abs(account_number);
fetched rows / total rows = 4/4
+-----------------------+------------+
| abs(account_number)   | sum(age)   |
|-----------------------+------------|
| 1                     | 32         |
| 13                    | 28         |
| 18                    | 33         |
| 6                     | 36         |
+-----------------------+------------+

聚合 #

  1. 聚合可以用于select。
  2. 聚合可以作为表达式的参数。
  3. 聚合可以包含表达式作为参数。

查询中的聚合 #

聚合可以用于select。

os> SELECT gender, sum(age) FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+------------+
| gender   | sum(age)   |
|----------+------------|
| F        | 28         |
| M        | 101        |
+----------+------------+

聚合表达式 #

聚合可以用作表达式的参数:

os> SELECT gender, sum(age) * 2 as sum2 FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender   | sum2   |
|----------+--------|
| F        | 56     |
| M        | 202    |
+----------+--------+

表达式作为聚合参数 #

聚合可以将表达式作为参数:

os> SELECT gender, sum(age * 2) as sum2 FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender   | sum2   |
|----------+--------|
| F        | 56     |
| M        | 202    |
+----------+--------+

COUNT 聚合 #

除了常规标识符,COUNT 聚合函数还接受诸如 * 或字面量如 1 的参数。这些不同形式的含义如下:

  1. COUNT(field) 只有当给定字段(或表达式)在输入行中不为 null 或缺失时才会计数。
  2. COUNT(*) 将计算其所有输入行的数量。
  3. COUNT(1)COUNT(*) 相同,因为任何非 null 的字面量都会被计数。

聚合函数 #

COUNT #

用法:返回 SELECT 语句检索到的行中 expr 的计数。

Example:

os> SELECT gender, count(*) as countV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+----------+
| gender   | countV   |
|----------+----------|
| F        | 1        |
| M        | 3        |
+----------+----------+

SUM #

用法:SUM(expr). 返回 expr 的总和。

Example:

os> SELECT gender, sum(age) as sumV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender   | sumV   |
|----------+--------|
| F        | 28     |
| M        | 101    |
+----------+--------+

AVG #

用法: AVG(expr). 返回 expr 的平均值。

Example:

os> SELECT gender, avg(age) as avgV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------------------+
| gender   | avgV               |
|----------+--------------------|
| F        | 28.0               |
| M        | 33.666666666666664 |
+----------+--------------------+

MAX #

用法: MAX(expr). 返回 expr 的最大值。

Example:

os> SELECT max(age) as maxV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| maxV   |
|--------|
| 36     |
+--------+

MIN #

用法: MIN(expr). 返回 expr 的最小值。

Example:

os> SELECT min(age) as minV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| minV   |
|--------|
| 28     |
+--------+

VAR_POP #

用法:VAR_POP(expr). 返回 expr 的总体标准方差。

Example:

os> SELECT var_pop(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| varV   |
|--------|
| 8.1875 |
+--------+

VAR_SAMP #

用法:VAR_SAMP(expr). 返回 expr 的样本方差。

Example:

os> SELECT var_samp(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| varV               |
|--------------------|
| 10.916666666666666 |
+--------------------+

VARIANCE #

用法:VARIANCE(expr). 返回 expr 的总体标准方差。VARIANCE() 是 VAR_POP() 函数的同义词。

Example:

os> SELECT variance(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| varV   |
|--------|
| 8.1875 |
+--------+

STDDEV_POP #

用法:STDDEV_POP(expr). 返回 expr 的总体标准差。

Example:

os> SELECT stddev_pop(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV            |
|--------------------|
| 2.8613807855648994 |
+--------------------+

STDDEV_SAMP #

用法:STDDEV_SAMP(expr). 返回 expr 的样本标准差。

Example:

os> SELECT stddev_samp(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+-------------------+
| stddevV           |
|-------------------|
| 3.304037933599835 |
+-------------------+

STD #

用法:STD(expr). 返回 expr 的总体标准差。STD() 是STDDEV_POP() 函数的同义词。

Example:

os> SELECT stddev_pop(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV            |
|--------------------|
| 2.8613807855648994 |
+--------------------+

STDDEV #

用法:STDDEV(expr). 返回 expr 的总体标准差。STDDEV() 是 STDDEV_POP() 函数的同义词。

Example:

os> SELECT stddev(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV            |
|--------------------|
| 2.8613807855648994 |
+--------------------+

DISTINCT COUNT 聚合 #

要获取一个字段不同值的计数,可以在count聚合中的字段前添加关键字 DISTINCT

Example:

os> SELECT COUNT(DISTINCT gender), COUNT(gender) FROM accounts;
fetched rows / total rows = 1/1
+--------------------------+-----------------+
| COUNT(DISTINCT gender)   | COUNT(gender)   |
|--------------------------+-----------------|
| 2                        | 4               |
+--------------------------+-----------------+

HAVING 子句 #

HAVING子句可以作为聚合过滤器,过滤掉不满足给定条件表达式的聚合值。

带 GROUP BY 的 HAVING #

SELECT 子句中定义的聚合表达式或其别名可以在 HAVING 条件中使用。

  1. 尽管在 HAVING 子句中允许使用非聚合表达式,但建议在 WHERE 中使用非聚合表达式。
  2. HAVING 子句中的聚合并不一定与选择列表中的聚合相同。作为对 SQL 标准的扩展,它也不限于只涉及 group by 列表上的标识符。

以下是一个典型使用 HAVING 子句的示例:

os> SELECT
...  gender, sum(age)
... FROM accounts
... GROUP BY gender
... HAVING sum(age) > 100;
fetched rows / total rows = 1/1
+----------+------------+
| gender   | sum(age)   |
|----------+------------|
| M        | 101        |
+----------+------------+

以下是另一个在 HAVING 条件中使用别名的例子。请注意,如果一个标识符是模糊的,例如既作为选择别名又作为索引字段出现,那么优先考虑别名。这意味着标识符将被 SELECT 子句中的别名表达式替换:

os> SELECT
...  gender, sum(age) AS s
... FROM accounts
... GROUP BY gender
... HAVING s > 100;
fetched rows / total rows = 1/1
+----------+-----+
| gender   | s   |
|----------+-----|
| M        | 101 |
+----------+-----+

不带 GROUP BY 的 HAVING #

此外,HAVING 子句也可以在没有GROUP BY 子句的情况下使用。这很有用,因为聚合表达式不能出现在 WHERE 子句中。

os> SELECT
...  'Total of age > 100'
... FROM accounts
... HAVING sum(age) > 100;
fetched rows / total rows = 1/1
+------------------------+
| 'Total of age > 100'   |
|------------------------|
| Total of age > 100     |
+------------------------+

FILTER 子句 #

FILTER 子句可以按照语法 aggregation_function(expr) FILTER(WHERE condition_expr) 为当前聚合存储桶设置特定条件。如果指定了过滤器,则只有过滤器子句中的条件计算结果为 true 的输入行才会馈送到聚合函数;其他行将被丢弃。 带筛选器子句的聚合只能在 SELECT 子句中使用。

带 GROUP BY 的 FILTER #

带有 FILTER 子句的 group by 聚合可以为每个聚合桶设置不同的条件。以下是在 group by 聚合中使用 FILTER 的一个例子:

os> SELECT avg(age) FILTER(WHERE balance > 10000) AS filtered, gender FROM accounts GROUP BY gender
fetched rows / total rows = 2/2
+------------+----------+
| filtered   | gender   |
|------------+----------|
| 28.0       | F        |
| 32.0       | M        |
+------------+----------+

不带 GROUP BY 的 FILTER #

FILTER 子句也可以在没有 GROUP BY 的情况下用于聚合函数。例如:

os> SELECT
...   count(*) AS unfiltered,
...   count(*) FILTER(WHERE age > 34) AS filtered
... FROM accounts
fetched rows / total rows = 1/1
+--------------+------------+
| unfiltered   | filtered   |
|--------------+------------|
| 4            | 1          |
+--------------+------------+

带有 FILTER 的 Distinct count 聚合

FILTER 子句也用在 distinct count 中,在计算特定字段的不同值之前进行过滤。例如:

os> SELECT COUNT(DISTINCT firstname) FILTER(WHERE age > 30) AS distinct_count FROM accounts
fetched rows / total rows = 1/1
+------------------+
| distinct_count   |
|------------------|
| 3                |
+------------------+