Aggregation Functions

Functions that activate auto aggregation.

Data Setup

LOAD 'age';
SET search_path TO ag_catalog;

SELECT create_graph('graph_name');

SELECT * FROM cypher('graph_name', $$
	CREATE (a:Person {name: 'A', age: 13}),
	(b:Person {name: 'B', age: 33, eyes: "blue"}),
	(c:Person {name: 'C', age: 44, eyes: "blue"}),
	(d1:Person {name: 'D', eyes: "brown"}),
	(d2:Person {name: 'D'}),
	(a)-[:KNOWS]->(b),
	(a)-[:KNOWS]->(c),
	(a)-[:KNOWS]->(d1),
	(b)-[:KNOWS]->(d2),
	(c)-[:KNOWS]->(d2)
$$) as (a agtype);

min

min() returns the minimum value in a set of values.

Syntax: min(expression)

Returns:

A property type, or a list, depending on the values returned by expression.

Arguments:

Name Description
expression An expression returning a set containing any combination of property types and lists thereof.

Considerations:

  • Any null values are excluded from the calculation.

  • In a mixed set, any string value is always considered to be lower than any numeric value, and any list is always considered to be lower than any string.

  • Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.

  • min(null) returns null.

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (v:Person)
    RETURN min(v.age)
$$) as (min_age agtype);

The lowest of all the values in the property age is returned.

Result:

min_age
13
1 row(s) returned

Using min() with Lists

Data Setup:

To clarify the following example, assume the next three commands are run first:

SELECT * FROM cypher('graph_name', $$ 
    CREATE (:min_test {val:'d'})
$$) as (result agtype);

SELECT * FROM cypher('graph_name', $$
    CREATE (:min_test {val:['a', 'b', 23]})
$$) as (result agtype);

SELECT * FROM cypher('graph_name', $$ 
    CREATE (:min_test {val:[1, 'b', 23]})
$$) as (result agtype);

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (v:min_test)
    RETURN min(v.val)
$$) as (min_val agtype);

The lowest of all the values in the set—in this case, the list [‘a’, ‘b’, 23]—is returned, as (i) the two lists are considered to be lower values than the string “d”, and (ii) the string “a” is considered to be a lower value than the numerical value 1.

Result:

min_age
["a", "b", 23]
1 row(s) returned

max

max() returns the maximum value in a set of values.

Syntax: max(expression)

Returns:

A property type, or a list, depending on the values returned by expression.

Arguments:

Name Description
expression An expression returning a set containing any combination of property types and lists thereof.

Considerations:

  • Any null values are excluded from the calculation.

  • In a mixed set, any numeric value is always considered to be higher than any string value, and any string value is always considered to be higher than any list.

  • Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.

  • max(null) returns null.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN max(n.age)
$$) as (max_age agtype);

The highest of all the values in the property age is returned.

Result:

min_age
44
1 row(s) returned

stDev

stDev() returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N - 1 as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard deviation of the entire population is being calculated, stDevP should be used.

Syntax: stDev(expression)

Returns:

An agtype float.

Arguments:

Name Description
expression An agtype number expression

Considerations:

  • Any null values are excluded from the calculation.

  • stDev(null) returns 0.0 (zero).

Query

SELECT *
FROM cypher('graph_name', $$
   MATCH (n:Person)
   RETURN stDev(n.age)
$$) as (stdev_age agtype);

The standard deviation of the values in the property age is returned.

Result:

stdev_age
15.716233645501712
1 row(s) returned

stDevP

stDevP() returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard deviation of only a sample of the population is being calculated, stDev should be used.

Syntax: stDevP(expression)

Returns:

An agtype float.

Arguments:

Name Description
expression An agtype number expression

Considerations:

  • Any null values are excluded from the calculation.

  • stDevP(null) returns 0.0 (zero).

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN stDevP(n.age)
$$) as (stdevp_age agtype);

The population standard deviation of the values in the property age is returned.

Result:

stdevp_age
12.832251036613439
1 row(s) returned

percentileCont

percentileCont() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values if the desired percentile lies between them. For nearest values using a rounding method, see percentileDisc.

Syntax: percentileCont(expression, percentile)

Returns:

An agtype float.

Arguments:

Name Description
expression An agtype number expression
percentile An agtype number value between 0.0 and 1.0

Considerations:

  • Any null values are excluded from the calculation.

  • percentileCont(null, percentile) returns null.

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);

The 40th percentile of the values in the property age is returned, calculated with a weighted average. In this case, 0.4 is the median, or 40th percentile.

Result:

percentile_cont_age
29.0
1 row(s) returned

percentileDisc

percentileDisc() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method and calculates the nearest value to the percentile. For interpolated values, see percentileCont.

Syntax: percentileDisc(expression, percentile)

Returns:

An agtype float.

Arguments:

Name Description
expression An agtype number expression
percentile An agtype number value between 0.0 and 1.0

Considerations:

  • Any null values are excluded from the calculation.

  • percentileDisc(null, percentile) returns null.

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);

The 50th percentile of the values in the property age is returned.

Result:

percentile_cont_age
33.0
1 row(s) returned

count

count() returns the number of values or records, and appears in two variants:

  • count(*) returns the number of matching records

  • count(expr) returns the number of non-null values returned by an expression.

Syntax: count(expression)

Returns:

An agtype integer.

Arguments:

Name Description
expression An expression

Considerations:

  • count(*) includes records returning null.

  • count(expr) ignores null values.

  • count(null) returns 0 (zero).

  • count(*) can be used to return the number of nodes; for example, the number of nodes connected to some node n.

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (n {name: 'A'})-[]->(x)
    RETURN n.age, count(*)
$$) as (age agtype, number_of_people agtype);

The age property of the start node n (with a name value of ‘A’) and the number of nodes related to n are returned.

Result:

age number_of_people
13 3
1 row(s) returned

Using count(*) can be used to group and count relationship types, returning the number of relationships of each type.

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (n {name: 'A'})-[r]->()
    RETURN type(r), count(*)
$$) as (label agtype, count agtype);

The relationship type and the number of relationships with that type are returned.

Result:

label count
“KNOWS” 3
1 row(s) returned

Using count(expression) to return the number of values

Instead of simply returning the number of records with count(*), it may be more useful to return the actual number of values returned by an expression.

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (n {name: 'A'})-[]->(x)
    RETURN count(x)
$$) as (count agtype);

The number of nodes connected to the start node n is returned.

Result:

count
3
1 row(s) returned

Counting non-null values

count(expression) can be used to return the number of non-null values returned by the expression.

Query

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN count(n.age)
$$) as (count agtype);

The number of nodes with the label Person that have a non-null value for the age property is returned.

Result:

count
3
1 row(s) returned

Counting with and without duplicates

In this example we are trying to find all our friends of friends, and count them:

  • The first aggregate function, count(DISTINCT friend_of_friend), will only count a friend_of_friend once, as DISTINCT removes the duplicates.

  • The second aggregate function, count(friend_of_friend), will consider the same friend_of_friend multiple times.

Query

SELECT *
FROM cypher('graph_name', $$
	MATCH (me:Person)-[]->(friend:Person)-[]->(friend_of_friend:Person)
	WHERE me.name = 'A'
	RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)
$$) as (friend_of_friends_distinct agtype, friend_of_friends agtype);

Both B and C know D and thus D will get counted twice when not using DISTINCT.

Result:

friend_of_friends_distinct friend_of_friends
1 2
1 row

avg

avg() returns the average of a set of numeric values.

Syntax: avg(expression)

Returns:

An agtype integer

Arguments:

Name Description
expression An expression returning a set of numeric values.

Considerations:

  • Any null values are excluded from the calculation.

  • avg(null) returns null.

Query

SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN avg(n.age)
$$) as (avg_age agtype);

The average of all the values in the property age is returned.

Result:

avg_age
30.0
1 row(s) returned

sum

sum() returns the sum of a set of numeric values.

Syntax: sum(expression)

Returns:

An agtype float

Arguments:

Name Description
expression An expression returning a set of numeric values.

Considerations:

  • Any null values are excluded from the calculation.

  • sum(null) returns null.

Query

SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN sum(n.age)
$$) as (total_age agtype);

The sum of all the values in the property age is returned.

Result:

total_age
90
1 row(s) returned