IBM Cloud Docs
DataPrime aggregation expressions

DataPrime aggregation expressions

This guide provides a glossary of IBM® Cloud Logs DataPrime expressions that you can use for aggregation.

any_value

Returns any non-null expression value in the group. If expression is not defined, it defaults to the $data object.

any_value(expression: any?)

Returns null if all expression values in the group are null.

Example:

groupby $m.severity calculate any_value($d.url)

avg

Calculates the average value of a numerical expression in the group.

avg(expression: number)

Example:

groupby $m.severity calculate avg($d.duration) as average_duration

count

Counts non-null expression values. If expression is not defined, all rows will be counted.

count(expression: any?) [into <keypath>]

An alias can be provided to override the keypath the result will be written to.

For example, the following part of a query

count() into $d.num_rows

will result in a single row of the following form:

{ "num_rows": 7532 }

count_if

Counts non-null expression values on rows which satisfy the condition. If expression is not defined, all rows that satisfy condition will be counted.

count_if(condition: bool, expression: any?)

Example:

groupby $m.severity calculate count_if($d.duration > 500) as $d.high_duration_logs
groupby $m.severity calculate count_if($d.duration > 500, $d.company_id) as $d.high_duration_logs

distinct_count

Counts non-null distinct expression values.

distinct_count(expression: any)

Example:

groupby $l.applicationname calculate distinct_count($d.username) as active_users

distinct_count_if

Counts non-null distinct expression values on rows which satisfy condition.

distinct_count_if(condition: bool, expression: any)

Example:

groupby $l.applicationname calculate distinct_count_if($m.severity == 'Error', $d.username) as users_with_errors

max

Calculates the maximum value of a numerical expression in the group.

max(expression: number | timestamp)

Example:

groupby $m.severity calculate max($d.duration)

min

Calculates the minimum value of a numerical expression in the group.

min(expression: number | timestamp)

Example:

groupby $m.severity calculate min($d.duration)

percentile

Calculates the approximate n-th percentile value of a numerical expression in the group.

percentile(percentile: number, expression: number, error_threshold: number?)

Since the percentile calculation is approximate, the accuracy may be controlled with the error_threshold parameter which ranges from 0 to 1 (defaults to 0.01). A lower value will result in better accuracy at the cost of longer query times.

Example:

groupby $m.severity calculate percentile(0.99, $d.duration) as p99_latency

sample_stddev

Computes the sample standard deviation of a numerical expression in the group.

sample_stddev(expression: number)

Example:

groupby $m.severity calculate sample_stddev($d.duration)

sample_variance

Computes the variance of a numerical expression in the group.

sample_variance(expression: number)

Example:

groupby $m.severity calculate sample_variance($d.duration)

stddev

Computes the standard deviation of a numerical expression in the group.

stddev(expression: number)

Example:

groupby $m.severity calculate stddev($d.duration)

sum

Calculates the sum of a numerical expression in the group.

sum(expression: number)

Example:

groupby $m.severity calculate sum($d.duration) as total_duration

variance

Computes the variance of a numerical expression in the group.

variance(expression: number)

Example:

groupby $m.severity calculate variance($d.duration)

DataPrime Expressions in Aggregations

When querying with the groupby operator, you can apply an aggregation function (such asavg, max, sum) to the bucket of results. This feature gives you the ability to manipulate an aggregation expression inside the expression itself, allowing you to calculate and manipulate your data simultaneously.

Example 1

This examples takes logs which have some connect_duration and batch_duration fields, and calculates the ratio between the averages of those durations, per region.

# Query
source logs
  | groupby region calculate avg(connect_duration) / avg(batch_duration)

Example 2

This query calculates the percentage of logs which don’t have a kubernetes_pod_name out of the total number of logs. The calculation is done per subsystem.

# Query
source logs
| groupby $l.subsystemname calculate
  sum(if(kubernetes.pod_name != null,1,0)) / count() as pct_without_pod_name

Example 3

This query calculates the ratio between the maximum and minimum salary per department, and provides a Based on N Employees string as an additional column per row.

# Query
source logs
| groupby department_id calculate
    max(salary) / min(salary) as salary_ratio
    `Based on {count()} Employees`

Example 4

This query calculates the ratio between error logs and info logs.

source logs
| groupby $m.timestamp / 1h as hour calculate
    count_if($m.severity == '5') / count_if($m.severity == '3') as error_to_info_ratio