Aggregation Functions
Functions that activate auto aggregation.
Data Setup
LOAD 'age';
SET search_path TO ag_catalog;
SELECT create_graph('graph_name');
SELECT * FROM cypher('graph_name', $$
CREATE (a:Person {name: 'A', age: 13}),
(b:Person {name: 'B', age: 33, eyes: "blue"}),
(c:Person {name: 'C', age: 44, eyes: "blue"}),
(d1:Person {name: 'D', eyes: "brown"}),
(d2:Person {name: 'D'}),
(a)-[:KNOWS]->(b),
(a)-[:KNOWS]->(c),
(a)-[:KNOWS]->(d1),
(b)-[:KNOWS]->(d2),
(c)-[:KNOWS]->(d2)
$$) as (a agtype);
min
min() returns the minimum value in a set of values.
Syntax: min(expression)
Returns:
A property type, or a list, depending on the values returned by expression.
Arguments:
Name | Description |
expression | An expression returning a set containing any combination of property types and lists thereof. |
Considerations:
Any null values are excluded from the calculation.
In a mixed set, any string value is always considered to be lower than any numeric value, and anylist is always considered to be lower than any string.
Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.
min(null) returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (v:Person)
RETURN min(v.age)
$$) as (min_age agtype);
Result:
min_age |
2123e1af756543542064ae0d07792be90176b311be |
1 row(s) returned |
Using min() with Lists
Data Setup:
To clarify the following example, assume the next three commands are run first:
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:'d'})
$$) as (result agtype);
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:['a', 'b', 23]})
$$) as (result agtype);
SELECT * FROM cypher('graph_name', $$
CREATE (:min_test {val:['a', 'b', 23]})
$$) as (result agtype);
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (v:min_test)
RETURN min(v.val)
$$) as (min_val agtype);
The lowest of all the values in the set—in this case, the list [‘a’, ‘c’, 23]—is returned, as (i) the two lists are considered to be lower values than the string “d”, and (ii) the string “a” is considered tobe a lower value than the numerical value 1.
Result:
min_age |
["a", "b", 23] |
1 row(s) returned |
max
max() returns the maximum value in a set of values.
Syntax: max(expression)
Returns:
A property type, or a list, depending on the values returned by expression.
Arguments:
Name | Description |
expression | An expression returning a set containing any combination of property types and lists thereof. |
Considerations:
Any null values are excluded from the calculation.
In a mixed set, any numeric value is always considered to be higher than any string value, and anystring value is always considered to be higher than any list.
Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.
max(null) returns null.
Query:
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN max(n.age)
$$) as (max_age agtype);
The highest of all the values in the property age is returned.
Result:
min_age |
44 |
1 row(s) returned |
stDev
stDev() returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N - 1 as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard variation of the entire population is being calculated, stdDevP should be used.
Syntax: stDev(expression)
Returns:
An agtype Float.
Arguments:
Name | Description |
expression | An agtype number expression |
Considerations:
Any null values are excluded from the calculation.
stDev(null) returns 0.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN stDev(n.age)
$$) as (stdev_age agtype);
The standard deviation of the values in the property age is returned.
Result:
stdev_age |
15.716233645501712 |
1 row(s) returned |
stDevP
stDevP() returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard variation of only a sample of the population is being calculated, stDev should be used.
Syntax: stDevP(expression)
Returns:
An agtype Float.
Arguments:
Name | Description |
expression | An agtype number expression |
Considerations:
Any null values are excluded from the calculation.
stDevP(null) returns 0.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN stDevP(n.age)
$$ as (stdevp_age agtype);
The population standard deviation of the values in the property age is returned.
Result:
stdevp_age |
12.832251036613439 |
1 row(s) returned |
percentileCont
percentileCont() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values if the desired percentile lies between them. For nearest values using a rounding method, see percentileDisc.
Syntax: percentileCont(expression, percentile)
Returns:
An agtype Float.
Arguments:
Name | Description |
expression | An agtype number expression |
percentile | An agtype number value between 0.0 and 1.0 |
Considerations:
Any null values are excluded from the calculation.
percentileCont(null, percentile) returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN percentileCont(n.age, 0.4)
$$ as (percentile_cont_age agtype);
The 40th percentile of the values in the property age is returned, calculated with a weighted average. In this case, 0.4 is the median, or 40th percentile.
Result:
percentile_cont_age |
29.0 |
1 row(s) returned |
percentileDisc
percentileDisc() returns the percentile of the given value over a group, with a percentile from 0.0to 1.0. It uses a rounding method and calculates the nearest value to the percentile. For interpolated values, see percentileCont.
Syntax: percentileDisc(expression, percentile)
Returns:
An agtype Float.
Arguments:
Name | Description |
expression | An agtype number expression |
percentile | An agtype number value between 0.0 and 1.0 |
Considerations:
Any null values are excluded from the calculation.
percentileDisc(null, percentile) returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN percentileDisc(n.age, 0.5)
$$ as (percentile_disc_age agtype);
The 50th percentile of the values in the property age is returned.
Result:
percentile_cont_age |
33 |
1 row(s) returned |
count
count() returns the number of values or records, and appears in two variants:
count(*) returns the number of matching records
count(expr) returns the number of non-null values returned by an expression.
Syntax: count(expression)
Returns:
An agtype Integer.
Arguments:
Name | Description |
expression | An expression |
Considerations:
count(*) includes records returning null.
count(expr) ignores null values.
count(null) returns 0.
Using count(*) to return the number of nodes
count(*) can be used to return the number of nodes; for example, the number of nodes connected to some node n.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[]->(x)
RETURN n.age, count(*)
$$ as (age agtype, number_of_people agtype);
The labels and age property of the start node n and the number of nodes related to n are returned.
Result:
age | number_of_people |
13 | 3 |
1 row(s) returned |
Using count() to group and count relationship typescount() can be used to group relationship types and return the number.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[r]->()
RETURN type(r), count(*)
$$ as (label agtype, count agtype);
The relationship types and their group count are returned.
Result:
label | count |
“KNOWS” | 3 |
1 row(s) returned |
Using count(expression) to return the number of values
Instead of simply returning the number of records with count(*), it may be more useful to return the actual number of values returned by an expression.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n {name: 'A'})-[]->(x)
RETURN count(x)
$$) as (count agtype);
The number of nodes connected to the start node is returned.
Result:
count |
3 |
1 row(s) returned |
Counting non-null values
count(expression) can be used to return the number of non-null values returned by the expression.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN count(n.age)
$$) as (count agtype);
The number of :Person nodes having an age property is returned.
Result:
count |
3 |
1 row(s) returned |
Counting with and without duplicates
In this example we are trying to find all our friends of friends, and count them:
The first aggregate function, count(DISTINCT friend_of_friend), will only count a friend_of_friend once, as DISTINCT removes the duplicates.
The second aggregate function, count(friend_of_friend), will consider the same friend_of_friend multiple times.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (me:Person)-[]->(friend:Person)-[]->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)
$$) as (friend_of_friends_distinct agtype, friend_of_friends agtype);
Both B and C know D and thus D will get counted twice when not using DISTINCT
Result:
friend_of_friends_distinct | friend_of_friends |
1 | 2 |
1 row |
avg
avg() returns the average of a set of numeric values.
Syntax: avg(expression)
Returns:
An agtype Integer
Arguments:
Name | Description |
expression | An expression returning a set of numeric values. |
Considerations:
Any null values are excluded from the calculation.
avg(null) returns null.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN avg(n.age)
$$) as (avg_age agtype);
The average of all the values in the property age is returned.
Result:
avg_age |
30.0 |
1 row(s) returned |
sum
sum() returns the sum of a set of numeric values.
Syntax: sum(expression)
Returns:
An agtype Float
Arguments:
Name | Description |
expression | An expression returning a set of numeric values. |
Considerations:
Any null values are excluded from the calculation.
sum(null) returns 0.
Query
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN sum(n.age)
$$ as (total_age agtype);
The sum of all the values in the property age is returned.
Result:
total_age |
90 |
1 row(s) returned |