Aggregation Functions
Functions that activate auto aggregation.
Data Setup
LOAD 'age';
SET search_path TO ag_catalog;
SELECT create_graph('graph_name');
SELECT * FROM cypher('graph_name', $$
CREATE (a:Person {name: 'A', age: 13}),
(b:Person {name: 'B', age: 33, eyes: "blue"}),
(c:Person {name: 'C', age: 44, eyes: "blue"}),
(d1:Person {name: 'D', eyes: "brown"}),
(d2:Person {name: 'D'}),
(a)-[:KNOWS]->(b),
(a)-[:KNOWS]->(c),
(a)-[:KNOWS]->(d1),
(b)-[:KNOWS]->(d2),
(c)-[:KNOWS]->(d2)
$$) as (a agtype);
min
min()
returns the minimum value in a set of values.
Syntax: min(expression)
Returns:
A property type, or a list, depending on the values returned by expression.
Arguments:
Name | Description |
expression | An expression returning a set containing any combination of property types and lists thereof. |
Considerations:
Any null values are excluded from the calculation.
In a mixed set, any string value is always considered to be lower than any numeric value, and any list is always considered to be lower than any string.
Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.
min(null)
returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (v:Person)
RETURN min(v.age)
$$) as (min_age agtype);
The lowest of all the values in the property age is returned.
Result:
min_age |
13 |
1 row(s) returned |
Using min()
with Lists
Data Setup:
To clarify the following example, assume the next three commands are run first:
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:'d'})
$$) as (result agtype);
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:['a', 'b', 23]})
$$) as (result agtype);
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:[1, 'b', 23]})
$$) as (result agtype);
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (v:min_test)
RETURN min(v.val)
$$) as (min_val agtype);
The lowest of all the values in the set—in this case, the list [‘a’, ‘b’, 23]—is returned, as (i) the two lists are considered to be lower values than the string “d”, and (ii) the string “a” is considered to be a lower value than the numerical value 1.
Result:
min_age |
["a", "b", 23] |
1 row(s) returned |
max
max()
returns the maximum value in a set of values.
Syntax: max(expression)
Returns:
A property type, or a list, depending on the values returned by expression.
Arguments:
Name | Description |
expression | An expression returning a set containing any combination of property types and lists thereof. |
Considerations:
Any null values are excluded from the calculation.
In a mixed set, any numeric value is always considered to be higher than any string value, and any string value is always considered to be higher than any list.
Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.
max(null)
returns null.
Query:
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN max(n.age)
$$) as (max_age agtype);
The highest of all the values in the property age is returned.
Result:
min_age |
44 |
1 row(s) returned |
stDev
stDev()
returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N - 1 as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard deviation of the entire population is being calculated, stDevP
should be used.
Syntax: stDev(expression)
Returns:
An agtype float.
Arguments:
Name | Description |
expression | An agtype number expression |
Considerations:
Any null values are excluded from the calculation.
stDev(null)
returns 0.0 (zero).
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN stDev(n.age)
$$) as (stdev_age agtype);
The standard deviation of the values in the property age is returned.
Result:
stdev_age |
15.716233645501712 |
1 row(s) returned |
stDevP
stDevP()
returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard deviation of only a sample of the population is being calculated, stDev
should be used.
Syntax: stDevP(expression)
Returns:
An agtype float.
Arguments:
Name | Description |
expression | An agtype number expression |
Considerations:
Any null values are excluded from the calculation.
stDevP(null)
returns 0.0 (zero).
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN stDevP(n.age)
$$) as (stdevp_age agtype);
The population standard deviation of the values in the property age is returned.
Result:
stdevp_age |
12.832251036613439 |
1 row(s) returned |
percentileCont
percentileCont()
returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values if the desired percentile lies between them. For nearest values using a rounding method, see percentileDisc
.
Syntax: percentileCont(expression, percentile)
Returns:
An agtype float.
Arguments:
Name | Description |
expression | An agtype number expression |
percentile | An agtype number value between 0.0 and 1.0 |
Considerations:
Any null values are excluded from the calculation.
percentileCont(null, percentile)
returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);
The 40th percentile of the values in the property age is returned, calculated with a weighted average. In this case, 0.4 is the median, or 40th percentile.
Result:
percentile_cont_age |
29.0 |
1 row(s) returned |
percentileDisc
percentileDisc()
returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method and calculates the nearest value to the percentile. For interpolated values, see percentileCont
.
Syntax: percentileDisc(expression, percentile)
Returns:
An agtype float.
Arguments:
Name | Description |
expression | An agtype number expression |
percentile | An agtype number value between 0.0 and 1.0 |
Considerations:
Any null values are excluded from the calculation.
percentileDisc(null, percentile)
returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);
The 50th percentile of the values in the property age is returned.
Result:
percentile_cont_age |
33.0 |
1 row(s) returned |
count
count()
returns the number of values or records, and appears in two variants:
count(*)
returns the number of matching recordscount(expr)
returns the number of non-null values returned by an expression.
Syntax: count(expression)
Returns:
An agtype integer.
Arguments:
Name | Description |
expression | An expression |
Considerations:
count(*)
includes records returning null.count(expr)
ignores null values.count(null)
returns 0 (zero).count(*)
can be used to return the number of nodes; for example, the number of nodes connected to some node n.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[]->(x)
RETURN n.age, count(*)
$$) as (age agtype, number_of_people agtype);
The age property of the start node n (with a name value of ‘A’) and the number of nodes related to n are returned.
Result:
age | number_of_people |
13 | 3 |
1 row(s) returned |
Using count(*)
can be used to group and count relationship types, returning the number of relationships of each type.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[r]->()
RETURN type(r), count(*)
$$) as (label agtype, count agtype);
The relationship type and the number of relationships with that type are returned.
Result:
label | count |
“KNOWS” | 3 |
1 row(s) returned |
Using count(expression)
to return the number of values
Instead of simply returning the number of records with count(*)
, it may be more useful to return the actual number of values returned by an expression.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[]->(x)
RETURN count(x)
$$) as (count agtype);
The number of nodes connected to the start node n is returned.
Result:
count |
3 |
1 row(s) returned |
Counting non-null values
count(expression)
can be used to return the number of non-null values returned by the expression.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN count(n.age)
$$) as (count agtype);
The number of nodes with the label Person
that have a non-null value for the age property is returned.
Result:
count |
3 |
1 row(s) returned |
Counting with and without duplicates
In this example we are trying to find all our friends of friends, and count them:
The first aggregate function,
count(DISTINCT friend_of_friend)
, will only count afriend_of_friend
once, asDISTINCT
removes the duplicates.The second aggregate function,
count(friend_of_friend)
, will consider the samefriend_of_friend
multiple times.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (me:Person)-[]->(friend:Person)-[]->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)
$$) as (friend_of_friends_distinct agtype, friend_of_friends agtype);
Both B and C know D and thus D will get counted twice when not using DISTINCT
.
Result:
friend_of_friends_distinct | friend_of_friends |
1 | 2 |
1 row |
avg
avg()
returns the average of a set of numeric values.
Syntax: avg(expression)
Returns:
An agtype integer
Arguments:
Name | Description |
expression | An expression returning a set of numeric values. |
Considerations:
Any null values are excluded from the calculation.
avg(null)
returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN avg(n.age)
$$) as (avg_age agtype);
The average of all the values in the property age is returned.
Result:
avg_age |
30.0 |
1 row(s) returned |
sum
sum()
returns the sum of a set of numeric values.
Syntax: sum(expression)
Returns:
An agtype float
Arguments:
Name | Description |
expression | An expression returning a set of numeric values. |
Considerations:
Any null values are excluded from the calculation.
sum(null)
returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN sum(n.age)
$$) as (total_age agtype);
The sum of all the values in the property age is returned.
Result:
total_age |
90 |
1 row(s) returned |