# Aggregation Functions

Functions that activate auto aggregation.

## Data Setup

```
LOAD 'age';
SET search_path TO ag_catalog;
SELECT create_graph('graph_name');
SELECT * FROM cypher('graph_name', $$
CREATE (a:Person {name: 'A', age: 13}),
(b:Person {name: 'B', age: 33, eyes: "blue"}),
(c:Person {name: 'C', age: 44, eyes: "blue"}),
(d1:Person {name: 'D', eyes: "brown"}),
(d2:Person {name: 'D'}),
(a)-[:KNOWS]->(b),
(a)-[:KNOWS]->(c),
(a)-[:KNOWS]->(d1),
(b)-[:KNOWS]->(d2),
(c)-[:KNOWS]->(d2)
$$) as (a agtype);
```

## min

`min()`

returns the minimum value in a set of values.

Syntax: `min(expression)`

Returns:

```
A property type, or a list, depending on the values returned by expression.
```

Arguments:

Name | Description |

expression | An expression returning a set containing any combination of property types and lists thereof. |

Considerations:

Any null values are excluded from the calculation.

In a mixed set, any string value is always considered to be lower than any numeric value, and any list is always considered to be lower than any string.

Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.

`min(null)`

returns null.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (v:Person)
RETURN min(v.age)
$$) as (min_age agtype);
```

The lowest of all the values in the property age is returned.

Result:

min_age |

13 |

1 row(s) returned |

### Using `min()`

with Lists

Data Setup:

To clarify the following example, assume the next three commands are run first:

```
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:'d'})
$$) as (result agtype);
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:['a', 'b', 23]})
$$) as (result agtype);
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:[1, 'b', 23]})
$$) as (result agtype);
```

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (v:min_test)
RETURN min(v.val)
$$) as (min_val agtype);
```

The lowest of all the values in the set—in this case, the list [‘a’, ‘b’, 23]—is returned, as (i) the two lists are considered to be lower values than the string “d”, and (ii) the string “a” is considered to be a lower value than the numerical value 1.

Result:

min_age |

["a", "b", 23] |

1 row(s) returned |

## max

`max()`

returns the maximum value in a set of values.

Syntax: `max(expression)`

Returns:

```
A property type, or a list, depending on the values returned by expression.
```

Arguments:

Name | Description |

expression | An expression returning a set containing any combination of property types and lists thereof. |

Considerations:

Any null values are excluded from the calculation.

In a mixed set, any numeric value is always considered to be higher than any string value, and any string value is always considered to be higher than any list.

Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.

`max(null)`

returns null.

Query:

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN max(n.age)
$$) as (max_age agtype);
```

The highest of all the values in the property age is returned.

Result:

min_age |

44 |

1 row(s) returned |

## stDev

`stDev()`

returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N - 1 as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard deviation of the entire population is being calculated, `stDevP`

should be used.

Syntax: `stDev(expression)`

Returns:

```
An agtype float.
```

Arguments:

Name | Description |

expression | An agtype number expression |

Considerations:

Any null values are excluded from the calculation.

`stDev(null)`

returns 0.0 (zero).

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN stDev(n.age)
$$) as (stdev_age agtype);
```

The standard deviation of the values in the property age is returned.

Result:

stdev_age |

15.716233645501712 |

1 row(s) returned |

## stDevP

`stDevP()`

returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard deviation of only a sample of the population is being calculated, `stDev`

should be used.

Syntax: `stDevP(expression)`

Returns:

```
An agtype float.
```

Arguments:

Name | Description |

expression | An agtype number expression |

Considerations:

Any null values are excluded from the calculation.

`stDevP(null)`

returns 0.0 (zero).

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN stDevP(n.age)
$$) as (stdevp_age agtype);
```

The population standard deviation of the values in the property age is returned.

Result:

stdevp_age |

12.832251036613439 |

1 row(s) returned |

## percentileCont

`percentileCont()`

returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values if the desired percentile lies between them. For nearest values using a rounding method, see `percentileDisc`

.

Syntax: `percentileCont(expression, percentile)`

Returns:

```
An agtype float.
```

Arguments:

Name | Description |

expression | An agtype number expression |

percentile | An agtype number value between 0.0 and 1.0 |

Considerations:

Any null values are excluded from the calculation.

`percentileCont(null, percentile)`

returns null.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);
```

The 40th percentile of the values in the property age is returned, calculated with a weighted average. In this case, 0.4 is the median, or 40th percentile.

Result:

percentile_cont_age |

29.0 |

1 row(s) returned |

## percentileDisc

`percentileDisc()`

returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method and calculates the nearest value to the percentile. For interpolated values, see `percentileCont`

.

Syntax: `percentileDisc(expression, percentile)`

Returns:

```
An agtype float.
```

Arguments:

Name | Description |

expression | An agtype number expression |

percentile | An agtype number value between 0.0 and 1.0 |

Considerations:

Any null values are excluded from the calculation.

`percentileDisc(null, percentile)`

returns null.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);
```

The 50th percentile of the values in the property age is returned.

Result:

percentile_cont_age |

33.0 |

1 row(s) returned |

## count

`count()`

returns the number of values or records, and appears in two variants:

`count(*)`

returns the number of matching records`count(expr)`

returns the number of non-null values returned by an expression.

Syntax: `count(expression)`

Returns:

```
An agtype integer.
```

Arguments:

Name | Description |

expression | An expression |

Considerations:

`count(*)`

includes records returning null.`count(expr)`

ignores null values.`count(null)`

returns 0 (zero).`count(*)`

can be used to return the number of nodes; for example, the number of nodes connected to some node n.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[]->(x)
RETURN n.age, count(*)
$$) as (age agtype, number_of_people agtype);
```

The age property of the start node n (with a name value of ‘A’) and the number of nodes related to n are returned.

Result:

age | number_of_people |

13 | 3 |

1 row(s) returned |

Using `count(*)`

can be used to group and count relationship types, returning the number of relationships of each type.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[r]->()
RETURN type(r), count(*)
$$) as (label agtype, count agtype);
```

The relationship type and the number of relationships with that type are returned.

Result:

label | count |

“KNOWS” | 3 |

1 row(s) returned |

### Using `count(expression)`

to return the number of values

Instead of simply returning the number of records with `count(*)`

, it may be more useful to return the actual number of values returned by an expression.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[]->(x)
RETURN count(x)
$$) as (count agtype);
```

The number of nodes connected to the start node n is returned.

Result:

count |

3 |

1 row(s) returned |

### Counting non-null values

`count(expression)`

can be used to return the number of non-null values returned by the expression.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN count(n.age)
$$) as (count agtype);
```

The number of nodes with the label `Person`

that have a non-null value for the age property is returned.

Result:

count |

3 |

1 row(s) returned |

### Counting with and without duplicates

In this example we are trying to find all our friends of friends, and count them:

The first aggregate function,

`count(DISTINCT friend_of_friend)`

, will only count a`friend_of_friend`

once, as`DISTINCT`

removes the duplicates.The second aggregate function,

`count(friend_of_friend)`

, will consider the same`friend_of_friend`

multiple times.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (me:Person)-[]->(friend:Person)-[]->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)
$$) as (friend_of_friends_distinct agtype, friend_of_friends agtype);
```

Both B and C know D and thus D will get counted twice when not using `DISTINCT`

.

Result:

friend_of_friends_distinct | friend_of_friends |

1 | 2 |

1 row |

## avg

`avg()`

returns the average of a set of numeric values.

Syntax: `avg(expression)`

Returns:

```
An agtype integer
```

Arguments:

Name | Description |

expression | An expression returning a set of numeric values. |

Considerations:

Any null values are excluded from the calculation.

`avg(null)`

returns null.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN avg(n.age)
$$) as (avg_age agtype);
```

The average of all the values in the property age is returned.

Result:

avg_age |

30.0 |

1 row(s) returned |

## sum

`sum()`

returns the sum of a set of numeric values.

Syntax: `sum(expression)`

Returns:

```
An agtype float
```

Arguments:

Name | Description |

expression | An expression returning a set of numeric values. |

Considerations:

Any null values are excluded from the calculation.

`sum(null)`

returns null.

Query

```
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN sum(n.age)
$$) as (total_age agtype);
```

The sum of all the values in the property age is returned.

Result:

total_age |

90 |

1 row(s) returned |