DataPrime aggregation expressions
This guide provides a glossary of IBM® Cloud Logs DataPrime expressions that you can use for aggregation.
any_value
Returns any non-null expression value in the group. If expression is not defined, it defaults to the $data object.
any_value(expression: any?)
Returns null if all expression values in the group are null.
Example:
groupby $m.severity calculate any_value($d.url)
avg
Calculates the average value of a numerical expression in the group.
avg(expression: number)
Example:
groupby $m.severity calculate avg($d.duration) as average_duration
count
Counts non-null expression values. If expression is not defined, all rows will be counted.
count(expression: any?) [into <keypath>]
An alias can be provided to override the keypath the result will be written to.
For example, the following part of a query
count() into $d.num_rows
will result in a single row of the following form:
{ "num_rows": 7532 }
count_if
Counts non-null expression values on rows which satisfy the condition. If expression is not defined, all rows that satisfy condition will be counted.
count_if(condition: bool, expression: any?)
Example:
groupby $m.severity calculate count_if($d.duration > 500) as $d.high_duration_logs
groupby $m.severity calculate count_if($d.duration > 500, $d.company_id) as $d.high_duration_logs
distinct_count
Counts non-null distinct expression values.
distinct_count(expression: any)
Example:
groupby $l.applicationname calculate distinct_count($d.username) as active_users
distinct_count_if
Counts non-null distinct expression values on rows which satisfy condition.
distinct_count_if(condition: bool, expression: any)
Example:
groupby $l.applicationname calculate distinct_count_if($m.severity == 'Error', $d.username) as users_with_errors
max
Calculates the maximum value of a numerical expression in the group.
max(expression: number | timestamp)
Example:
groupby $m.severity calculate max($d.duration)
min
Calculates the minimum value of a numerical expression in the group.
min(expression: number | timestamp)
Example:
groupby $m.severity calculate min($d.duration)
percentile
Calculates the approximate n-th percentile value of a numerical expression in the group.
percentile(percentile: number, expression: number, error_threshold: number?)
Since the percentile calculation is approximate, the accuracy may be controlled with the error_threshold parameter which ranges from 0 to 1 (defaults to 0.01). A lower value will result in better accuracy at the cost of longer query
times.
Example:
groupby $m.severity calculate percentile(0.99, $d.duration) as p99_latency
sample_stddev
Computes the sample standard deviation of a numerical expression in the group.
sample_stddev(expression: number)
Example:
groupby $m.severity calculate sample_stddev($d.duration)
sample_variance
Computes the variance of a numerical expression in the group.
sample_variance(expression: number)
Example:
groupby $m.severity calculate sample_variance($d.duration)
stddev
Computes the standard deviation of a numerical expression in the group.
stddev(expression: number)
Example:
groupby $m.severity calculate stddev($d.duration)
sum
Calculates the sum of a numerical expression in the group.
sum(expression: number)
Example:
groupby $m.severity calculate sum($d.duration) as total_duration
variance
Computes the variance of a numerical expression in the group.
variance(expression: number)
Example:
groupby $m.severity calculate variance($d.duration)
DataPrime Expressions in Aggregations
When querying with the groupby operator, you can apply an aggregation function (such asavg, max, sum) to the bucket of results. This feature gives you the ability to manipulate an aggregation expression
inside the expression itself, allowing you to calculate and manipulate your data simultaneously.
Example 1
This examples takes logs which have some connect_duration and batch_duration fields, and calculates the ratio between the averages of those durations, per region.
# Query
source logs
| groupby region calculate avg(connect_duration) / avg(batch_duration)
Example 2
This query calculates the percentage of logs which don’t have a kubernetes_pod_name out of the total number of logs. The calculation is done per subsystem.
# Query
source logs
| groupby $l.subsystemname calculate
sum(if(kubernetes.pod_name != null,1,0)) / count() as pct_without_pod_name
Example 3
This query calculates the ratio between the maximum and minimum salary per department, and provides a Based on N Employees string as an additional column per row.
# Query
source logs
| groupby department_id calculate
max(salary) / min(salary) as salary_ratio
`Based on {count()} Employees`
Example 4
This query calculates the ratio between error logs and info logs.
source logs
| groupby $m.timestamp / 1h as hour calculate
count_if($m.severity == '5') / count_if($m.severity == '3') as error_to_info_ratio