Data Science Functions

AnzoGraph DB offers an extension package of pre-built Data Science functions that you can use in the same way as other native, built-in analytic functions. In addition, Cambridge Semantics offers an Apache Zeppelin Docker image, which includes a collection of notebooks that provide details and example usage of each of the Data Science functions. See Zeppelin Notebook Integration for more information on installing the custom Apache Zeppelin Docker image.

If you use the Installer to install AnzoGraph DB on RHEL/CentOS, these functions are included only if you choose to install the optional C++ extensions and related dependencies. If you do not install the C++ extensions, this package of functions is not included in the deployment. For more information, see Installing AnzoGraph DB.

The Data Science functions are organized into the following categories:

  • Correlation: Determine the relationship between elements.
  • Distribution: Calculate the probability of a given value over a random distribution.
  • Entropy: Determine variance and probability density across a given distribution.
  • Feature Exploration: Classify values in a distribution using techniques such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and Singular Value Decomposition (SVD).
  • Linear Algebra: Create product vectors or a matrix from a given collection of random variables.
  • Profiling Metrics: Produce statistical metrics such as percentile, geometric mean, or skew on a given dataset.
  • Sketches: Use highly efficient streaming algorithms to approximate calculations such as count distinct, quantiles, most frequent items, joins, and matrix computations, and return the results as data sketches.
  • Matrix Utilities: Return information on various attributes of vector space mapping and related matrix tensors.

Correlation

Canonical Correlation (CANCOR)

This aggregate calculates the overall correlation between two sets of variables.

Wikipedia Reference: Canonical correlation function

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:cancor(lc, m, x1, x2, ..., xn, y1, y2, ..., yn)
Parameter Data Type Description
lc int Number of linear combinations for the first canonical correlation
m int Number of columns in the first set
x1–xn double Feature columns from the first dataset
y1–yn double Feature columns from the second dataset

Returns

Data Type Description
string Canonical correlation
string Square of the canonical correlation
string Canonical coefficient

Covariance (COVARIANCE)

This aggregate provides a measure of the strength of the correlation between two or more sets of random variables (or variates).

Wikipedia Reference: Covariance

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:covariance(x1, x2, ..., xn) 
Parameter Data Type Description
x1–xn double Feature columns from the dataset

Returns

Data Type Description
http://anzograph.com/matrices#tensor Covariance matrix
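
As an illustration of the calculation only (not AnzoGraph DB's implementation), the following Python sketch builds a covariance matrix from feature columns. The helper name covariance_matrix is hypothetical, and the sample (n - 1) normalization is an assumption; the source does not state which normalization the aggregate uses.

```python
def covariance_matrix(columns):
    """Return the covariance matrix for a list of equal-length feature columns.

    Assumption: sample covariance (divide by n - 1).
    """
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    size = len(columns)
    return [
        [
            sum(
                (a - means[i]) * (b - means[j])
                for a, b in zip(columns[i], columns[j])
            ) / (n - 1)
            for j in range(size)
        ]
        for i in range(size)
    ]


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]
cov = covariance_matrix([x1, x2])
```

Entry [i][j] of the result is the covariance between column i and column j, so the diagonal holds each column's variance and the matrix is symmetric.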

Matthews Correlation Coefficient (MCC)

This aggregate returns a coefficient value between observed and predicted binary classifications.

Wikipedia Reference: Matthews correlation coefficient

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:mcc(x, y)
Parameter Data Type Description
x boolean First variable column data
y boolean Second variable column data

Returns

Data Type Description
double Coefficient value that shows the extent to which observed and predicted binary classifications are related
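
To make the coefficient concrete, here is a minimal Python sketch of the standard Matthews correlation formula computed from two boolean columns. The helper name mcc and the choice to return 0.0 when the denominator is zero are assumptions made for this example; they are not taken from AnzoGraph DB.

```python
import math


def mcc(observed, predicted):
    """Matthews correlation coefficient for two equal-length boolean sequences."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # true positives
    tn = sum(1 for o, p in pairs if not o and not p)  # true negatives
    fp = sum(1 for o, p in pairs if not o and p)      # false positives
    fn = sum(1 for o, p in pairs if o and not p)      # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


observed = [True, True, True, False, False, False]
predicted = [True, True, False, False, False, True]
score = mcc(observed, predicted)
```

A value of 1 indicates perfect agreement, -1 perfect disagreement, and 0 no better than random prediction.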

Pearson Correlation Coefficient (PCC)

This aggregate determines the extent to which two variables are linearly related: positive, negative, or no relationship.

Wikipedia Reference: Pearson correlation coefficient

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:pcc(x, y)
Parameter Data Type Description
x double First variable column data
y double Second variable column data

Returns

Data Type Description
double Coefficient that shows the extent to which two variables are linearly related
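
The following Python sketch shows the textbook Pearson formula for two numeric columns. It is an illustration of the math rather than AnzoGraph DB's code, and the helper name pearson is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    co = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co / math.sqrt(ss_x * ss_y)


r_positive = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_negative = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Values near 1 indicate a positive linear relationship, values near -1 a negative one, and values near 0 no linear relationship.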

Spearman Correlation Coefficient (SCC)

This aggregate determines how well the relationship between two variables can be described using a monotonic function.

Wikipedia Reference: Spearman's Correlation Coefficient

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:scc(rank_X, rank_Y) 
Parameter Data Type Description
rank_X double First set of ranked data
rank_Y double Second set of ranked data

Returns

Data Type Description
double Coefficient between ranked datasets
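
Because the function takes data that is already ranked, the coefficient can be sketched with the classic rank-difference formula. The Python below is a minimal illustration that assumes there are no tied ranks; the helper name spearman_from_ranks is hypothetical and the source does not say how AnzoGraph DB handles ties.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman coefficient from two equal-length rank sequences (no ties assumed)."""
    n = len(rank_x)
    # Sum of squared rank differences.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * d_squared) / (n * (n * n - 1))


same_order = spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reverse_order = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

Identical rankings give 1, fully reversed rankings give -1.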

Distribution

Cumulative Distribution Functions (CDF)

A Cumulative Distribution function calculates the probability of a random variable X taking on a value less than or equal to Y.

Wikipedia Reference: Cumulative Distribution Function (CDF)

The following functions produce cumulative distribution calculations:

  • Binomial Distribution (BINOMDIST): Calculates the probability for X successes in N trials given a probability of success P for each trial.
  • Chi-Squared Distribution (CHISQDIST): Calculates a probability that is often used in hypothesis testing to compare an observed distribution with a theoretical one. Also provides a way to show a relationship between two categorical variables.
  • Continuous Uniform Distribution (CONUNIDIST): Calculates probability using a continuous probability distribution concerned with events that are equally likely to occur.
  • Discrete Uniform Distribution (DISCUNIDIST): Calculates probability using a symmetric probability distribution where a finite number of values are equally likely to be observed and every one of n values has equal probability.
  • Exponential Distribution (EXPDIST): Calculates probability using a distribution that describes time between events in a Poisson point process (where events occur continuously and independently at a constant average rate).
  • Laplace Distribution (LAPLACEDIST): Calculates probability using a distribution that represents differences between two independent variables that have identical exponential distributions (also called double exponential distribution).
  • Log Normal Distribution (LOGNORDIST): Calculates probability using a distribution of a random variable whose logarithm follows a normal distribution. Log normal distributions are widely used in risk analysis.
  • Negative Binomial Distribution (NEGBINDIST): Calculates probability using a discrete probability distribution that concerns the number of trials which must occur in order to have a predetermined number of successes.
  • Normal Distribution (NORMDIST): Calculates probability using a continuous probability distribution of data in which the majority of data points are relatively similar, within a small range of values having few outliers.
  • Poisson Distribution (POISDIST): Calculates probability using a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, given that those events occur at a known constant rate and independently of the time since the last event.
  • Student's T-Distribution (TDIST): Calculates probability using the Student's t-distribution and associated t scores. Often used in hypothesis testing when the sample size is small and/or when the population variance is unknown.
  • TDigest Metric (TDIGEST): Creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.
  • Weibull Distribution (WEIBULDIST): Calculates probability from a continuous probability distribution that is commonly used to assess product reliability, analyze product life data and failure times.

Binomial Distribution (BINOMDIST)

This aggregate calculates the probability for X successes in N trials given a probability of success P for each trial.

Wikipedia Reference: Binomial Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:binomdist(data, n, k, "success_string")  
Parameter Data Type Description
data string Column data
n long Number of trials
k long Number of successes in n trials
success_string string Defines the success string

Returns

Data Type Description
double Probability mass function value
double Lower cumulative distribution: probability (<=k) under the area of distribution
double Upper cumulative distribution: probability (>k) under the area of distribution
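
The three returned values can be illustrated with a short Python sketch of the binomial formulas. This is not AnzoGraph DB's implementation: the helper names are hypothetical, and estimating the success probability p as the fraction of rows equal to the success string is an assumption about how the data column is used.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_summary(k, n, p):
    """Return (pmf, lower cumulative P(X <= k), upper cumulative P(X > k))."""
    pmf = binom_pmf(k, n, p)
    lower = sum(binom_pmf(i, n, p) for i in range(k + 1))
    return pmf, lower, 1 - lower


# Assumption: p is the share of rows that match the success string.
data = ["yes", "no", "yes", "no"]
p = data.count("yes") / len(data)

summary = binom_summary(3, 10, p)
print(summary)  # (0.1171875, 0.171875, 0.828125)
```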

Chi-Squared Distribution (CHISQDIST)

This aggregate calculates probability that is often used in hypothesis testing to compare an observed distribution with a theoretical one. It also provides a way to show a relationship between two categorical variables.

Wikipedia Reference: Chi-Squared Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:chisqdist(data, s)
Parameter Data Type Description
data double Sample data
s double Population standard deviation

Returns

Data Type Description
double Mean of the distribution
double Standard deviation of the distribution
double Variance of the distribution
double Chi-squared statistic: [(n - 1) * s^2] / d^2, where n is the sample size, s is the standard deviation of the sample, and d is the standard deviation of the population (the value passed as the s parameter)
long Number of samples: the degrees of freedom (k) is (count - 1)
double Probability density function value
double Cumulative distribution: the probability for <= the chi-squared statistic
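
The chi-squared statistic formula listed in the Returns table can be worked through in Python. The sketch below only illustrates that formula; the helper name chi_squared_statistic is hypothetical and nothing here reflects AnzoGraph DB internals.

```python
import statistics


def chi_squared_statistic(sample, population_sd):
    """Return ((n - 1) * s^2 / d^2, degrees of freedom) for a sample.

    s^2 is the sample variance and d is the population standard deviation.
    """
    n = len(sample)
    sample_variance = statistics.variance(sample)  # divides by n - 1
    statistic = (n - 1) * sample_variance / population_sd ** 2
    return statistic, n - 1


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
statistic, dof = chi_squared_statistic(sample, population_sd=2.0)
```

For this sample the squared deviations from the mean sum to 32, so with a population standard deviation of 2 the statistic is 32 / 4 = 8 with 7 degrees of freedom.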

Continuous Uniform Distribution (CONUNIDIST)

This aggregate calculates probability using a continuous probability distribution concerned with events that are equally likely to occur.

Wikipedia Reference: Continuous Uniform Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:conunidist(data, a, b)
Parameter Data Type Description
data double Column data
a double Minimum value of the probability interval
b double Maximum value of the probability interval

Returns

Data Type Description
double Cumulative distribution: probability under the area of distribution
double Probability density function value
double Differential entropy in nats

Discrete Uniform Distribution (DISCUNIDIST)

This aggregate calculates probability using a symmetric probability distribution where a finite number of values are equally likely to be observed and every one of n values has equal probability.

Wikipedia Reference: Discrete Uniform Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:discunidist(data, k) 
Parameter Data Type Description
data long Column data
k long The number of outcomes

Returns

Data Type Description
double Cumulative distribution: probability under the area of distribution
double Probability mass function value
double Differential entropy in nats

Exponential Distribution (EXPDIST)

This aggregate calculates probability using a distribution that describes time between events in a Poisson point process (where events occur continuously and independently at a constant average rate).

Wikipedia Reference: Exponential Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:expdist(data, x)
Parameter Data Type Description
data long Column data
x double The probability for the interval

Returns

Data Type Description
double Lower cumulative distribution: probability (<=x) under the area of distribution
double Upper cumulative distribution: probability (>x) under the area of distribution
double Probability density function value
double Differential entropy in nats

Laplace Distribution (LAPLACEDIST)

This aggregate calculates probability using a distribution that represents differences between two independent variables that have identical exponential distributions (also called double exponential distribution).

Wikipedia Reference: Laplace Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:laplacedist(data, "c", x1, x2)
Parameter Data Type Description
data double Column data
c string "below", "above", "bet" (between), or "out" (outside)
x1 double Lower number (>0) to find the probability
x2 double Upper number (>0) to find the probability

Returns

Data Type Description
double Mean of the distribution
double Scale parameter of the distribution
double Standard deviation of the distribution
double Variance of the distribution
double Differential entropy in nats
double Cumulative distribution: probability under the area of distribution
double Probability density function value for x1
double Probability density function value for x2

Log Normal Distribution (LOGNORDIST)

This aggregate calculates probability using a distribution of a random variable whose logarithm follows a normal distribution. The log normal distribution is widely used in risk analysis.

Wikipedia Reference: Log Normal Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:lognordist(data, "c", x1, x2)
Parameter Data Type Description
data double Column data
c string "below", "above", "bet" (between), or "out" (outside)
x1 double Lower number (>0) to find the probability
x2 double Upper number (>0) to find the probability

Returns

Data Type Description
double Mean of the distribution of natural logarithms
double Standard deviation of the distribution of natural logarithms
double Variance of the distribution
double Differential entropy in nats
double Cumulative distribution: probability under the area of distribution
double Probability density function value for x1
double Probability density function value for x2

Negative Binomial Distribution (NEGBINDIST)

This aggregate calculates probability using a discrete probability distribution that concerns the number of trials which must occur in order to have a predetermined number of successes.

Wikipedia Reference: Negative Binomial Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:negbindist("data", k, r, "success_string")
Parameter Data Type Description
data string Column data
k long Number of successes
r long Number of failures
success_string string Defines the success string

Returns

Data Type Description
double Probability mass function value
double Lower cumulative distribution: probability (<=k) under the area of distribution
double Upper cumulative distribution: probability (>k) under the area of distribution

Normal Distribution (NORMDIST)

This aggregate calculates probability using a continuous probability distribution of data in which the majority of data points are relatively similar, within a small range of values with few outliers.

Wikipedia Reference: Normal Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:normdist(data, "c", x1, x2) 
Parameter Data Type Description
data double Column data
c string "below", "above", "bet" (between), or "out" (outside)
x1 double Lower number (>0) to find the probability
x2 double Upper number (>0) to find the probability

Returns

Data Type Description
double Mean of the distribution
double Standard deviation of the distribution
double Variance of the distribution
double Differential entropy in nats
double Cumulative distribution: probability under the area of distribution
double Probability density function value for x1
double Probability density function value for x2
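
The four values of the c argument can be illustrated with the normal CDF. The Python sketch below is one reading of those options, not the documented behavior: the helper names are hypothetical, and the mean and standard deviation are passed in directly, whereas the aggregate would presumably estimate them from the data column.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def normal_probability(mean, sd, c, x1, x2=None):
    """Probability for the region selected by c.

    Assumption: "below" and "above" use only x1; "bet" and "out" use x1 < x2.
    """
    if c == "below":
        return normal_cdf(x1, mean, sd)
    if c == "above":
        return 1 - normal_cdf(x1, mean, sd)
    inside = normal_cdf(x2, mean, sd) - normal_cdf(x1, mean, sd)
    if c == "bet":
        return inside
    if c == "out":
        return 1 - inside
    raise ValueError(f"unknown region: {c!r}")


within_one_sd = normal_probability(0.0, 1.0, "bet", -1.0, 1.0)
```

For a standard normal distribution, the "bet" region from -1 to 1 covers roughly 68% of the probability mass, and "out" covers the remainder.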

Poisson Distribution (POISDIST)

This function calculates probability using a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, given that these events occur at a known constant rate and independently of the time since the last event.

Wikipedia Reference: Poisson Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:poisdist(data, k)
Parameter Data Type Description
data long Column data
k long Probability of observing k events in an interval

Returns

Data Type Description
double Probability mass function value
double Lower cumulative distribution: probability (<=k) under the area of distribution
double Upper cumulative distribution: probability (>k) under the area of distribution
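
A short Python sketch of the Poisson formulas shows how the three returned values relate. It is illustrative only: the helper names are hypothetical, and estimating the rate as the mean of the data column is an assumption that the source does not confirm.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing exactly k events for the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def poisson_summary(data, k):
    """Return (pmf, lower cumulative P(X <= k), upper cumulative P(X > k))."""
    rate = sum(data) / len(data)  # assumption: rate is the mean event count
    pmf = poisson_pmf(k, rate)
    lower = sum(poisson_pmf(i, rate) for i in range(k + 1))
    return pmf, lower, 1 - lower


pmf, lower, upper = poisson_summary([2, 2, 2, 2], k=2)
```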

Student's T-Distribution (TDIST)

This function calculates probability using the Student's t-distribution (and associated t scores), which is often used in hypothesis testing when the sample size is small or when the population variance is unknown.

Wikipedia Reference: Student's t-distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:tdist(data, M) 
Parameter Data Type Description
data double Sample data
M double Population mean

Returns

Data Type Description
double Mean of the distribution
double Standard deviation of the distribution
double Variance of the distribution
double T-statistic: t = [ u - M ] / [ s / sqrt( N ) ], where u is the sample mean, M is the population mean, s is the standard deviation of the sample, and N is the sample size
long Number of samples: the degrees of freedom is (count-1)
double Probability density function value
double Cumulative distribution: the probability for <= t-statistics
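
The t-statistic formula in the Returns table can be checked with a few lines of Python. The sketch covers only that formula and the degrees of freedom; the helper name t_statistic is hypothetical.

```python
import math
import statistics


def t_statistic(sample, population_mean):
    """Return (t, degrees of freedom) with t = (u - M) / (s / sqrt(N))."""
    n = len(sample)
    u = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)  # sample standard deviation (n - 1)
    return (u - population_mean) / (s / math.sqrt(n)), n - 1


t, dof = t_statistic([1.0, 2.0, 3.0, 4.0, 5.0], population_mean=2.0)
```

Here the sample mean is 3 and s / sqrt(N) is sqrt(0.5), so t equals sqrt(2) with 4 degrees of freedom.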

TDigest Metric (TDIGEST)

This function creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.

Wikipedia Reference: Computing Quantiles using T-Digests

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/statistics/sketch#>
sketch:tdigest(data, p, q, cdf)
Parameter Data Type Description
data double Column data
p double The percentile (0 - 100) to compute
q double The quantile (0.0 - 1.0) to compute
cdf double The CDF to use

Returns

Data Type Description
double Percentile: the value below which a given percentage of observations falls
double Quantile: cut point dividing the observations in a sample
double The computation of F(x) where F is the CDF of the distribution

Weibull Distribution (WEIBULDIST)

This function calculates probability from a continuous probability distribution commonly used to assess product reliability and analyze product life data and failure times.

Wikipedia Reference: Weibull Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:weibuldist(data, k, x)
Parameter Data Type Description
data double Sample data
k double The initial starting value for the shape parameter. A good guess is crucial to quick convergence.
x double The probability for a random variable

Returns

Data Type Description
double The mean of the distribution
double The standard deviation of the distribution
double The variance of the distribution
long The count of the number of samples
double The estimated shape parameter(k) of the distribution from the mean and variance using the root finding method
double The estimated scale parameter(a) of the distribution from the mean and variance using the root finding method
double Differential entropy in nats
double Probability density function value
double Lower cumulative distribution: probability (<=x) under the area of distribution
double Upper cumulative distribution: probability (>x) under the area of distribution
long The actual number of iterations performed to get an estimate of the k value
double The mean calculated using estimated values of k and a
double The variance calculated using estimated values of k and a

Bernoulli Distribution (BERNDIST)

The Bernoulli Distribution function determines the probability of success or failure (or Yes or No) in tests that have only two possible outcomes.

Wikipedia Reference: Bernoulli Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:berndist("data", prob, "success_string")
Parameter Data Type Description
data string Column data
prob boolean Probability of success (true) or failure (false)
success_string string The success message

Returns

Data Type Description
double The Bernoulli distribution probability

Beta-Binomial Distribution (BETABINDIST)

The Beta-Binomial Distribution function computes probability using a combination of both binomial and beta probability distributions.

Wikipedia Reference: Beta-Binomial Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:betabindist(k, n, alpha, beta)  
Parameter Data Type Description
k double The probability for the number
n double The number of trials
alpha, beta double Shape parameters

Returns

Data Type Description
double The probability of occurrence k for a beta binomial n, alpha, beta

Hypergeometric Distribution (HYPGEODIST)

The Hypergeometric Distribution function calculates probability from a distribution often used to predict the outcome of a process in which different elements are randomly drawn from a collection and not replaced.

Wikipedia Reference: HyperGeometric Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:hypgeodist("data", n, k, "success_string") 
Parameter Data Type Description
data string Column data
n int The number of trials
k int The number of successes in n trials
success_string string The success message

Returns

Data Type Description
double The hypergeometric distribution probability

Logarithmic (Series) Distribution (LOGSERDIST)

This function calculates probability using a discrete probability distribution derived from the Maclaurin series expansion.

Wikipedia Reference: Logarithmic (Series) Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:logserdist("data", k, "success_string")
Parameter Data Type Description
data string Column data
k long The probability for the number
success_string string The success message

Returns

Data Type Description
double The logarithmic distribution probability

Skellam Distribution (SKELLAMDIST)

This function calculates probability using the Skellam distribution which models the difference between two independent Poisson distributed variables.

Wikipedia Reference: Skellam Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:skellamdist(n1_data, n2_data, k) 
Parameter Data Type Description
n1_data long N1 column data
n2_data long N2 column data
k long Probability for the number

Returns

Data Type Description
double The Skellam probability value

Entropy

Cross Entropy (CROSSENTROPY)

This function computes cross-entropy, which is commonly used to quantify the difference between two probability distributions.

Wikipedia Reference: Cross Entropy

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:crossentropy(p, q)  
Parameter Data Type Description
p double True probabilities for x
q double Predicted probabilities for x

Returns

Data Type Description
double The cross entropy value
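
As a sketch of the calculation, the Python below computes cross-entropy from a column of true probabilities p and a column of predicted probabilities q. The helper name cross_entropy and the use of the natural logarithm (a result in nats) are assumptions; the source does not state the logarithm base.

```python
import math


def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum(p_i * ln(q_i)), skipping terms with p_i = 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


true_probs = [1.0, 0.0]
good_guess = cross_entropy(true_probs, [0.9, 0.1])
poor_guess = cross_entropy(true_probs, [0.5, 0.5])
```

The closer the predicted distribution is to the true one, the smaller the value, so good_guess is lower than poor_guess in this example.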

Discrete Entropy Metric (DISCENTROPY)

This function calculates entropy for maps on finite sets, referred to as discrete entropy.

ScienceDirect Reference: Discrete Entropy

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:discentropy("data")
Parameter Data Type Description
data string Column data

Returns

Data Type Description
double The discrete entropy value
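
For intuition, the following Python sketch computes the Shannon entropy of the empirical distribution of a column of string values. The helper name discrete_entropy and the natural logarithm are assumptions made for the example, not details taken from AnzoGraph DB.

```python
import math
from collections import Counter


def discrete_entropy(values):
    """Entropy (in nats) of the empirical distribution of the given values."""
    n = len(values)
    counts = Counter(values)
    # Each distinct value contributes -p * ln(p), where p is its relative frequency.
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


balanced = discrete_entropy(["a", "b", "a", "b"])
constant = discrete_entropy(["a", "a", "a"])
```

A column with a single repeated value has zero entropy, while a column split evenly between two values has entropy ln 2.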

Differential Entropy or Continuous Entropy Metrics

Differential entropy (also referred to as continuous entropy) is entropy that can be computed for distributions with a continuous random variable.

Wikipedia Reference: Differential entropy

The following functions produce entropy calculations:

Feature Exploration

Principal Component Analysis (PCA)

Applies principal component analysis (PCA) to create combined eigenvalues and eigenvectors that highlight patterns in a dataset, making it easier to explore and visualize data.

Wikipedia Reference: Principal Component Analysis

There are several options available for principal component analysis:

pca::create

Applies PCA to create combined eigenvalues and eigenvectors.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::create(x1, x2, ..., xn) 
Parameter Data Type Description
x1–xn double Feature column datasets

Returns

Data Type Description
http://anzograph.com/matrices#feature_result PCA data containing eigenvalues and eigenvectors

pca::get_eigvec

Retrieves PCA's eigenvectors as a matrix from the PCA data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_eigvec(pca_data)
Parameter Data Type Description
pca_data http://anzograph.com/matrices#feature_result Principal Component Analysis data

Returns

Data Type Description
http://anzograph.com/matrices#tensor Eigenvectors as a matrix

pca::get_eigval

Retrieves PCA's eigenvalues as a column vector from PCA data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_eigval(pca_data)
Parameter Data Type Description
pca_data http://anzograph.com/matrices#feature_result Principal Component Analysis data

Returns

Data Type Description
http://anzograph.com/matrices#tensor Eigenvalues in descending order as column vectors

transform

Applies PCA to transform samples onto the new subspace.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:transform(pca_data, d, x1, x2, ..., xn) 
Parameter Data Type Description
pca_data http://anzograph.com/matrices#feature_result Principal Component Analysis data
d int The number of eigenvectors to consider from the end
x1–xn double Feature column datasets

Returns

Data Type Description
double Sample data transformed into the tuple of lower dimensional space

dump_tensor

Returns a string representation of a vector or matrix in row-wise or column-wise order.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])
Parameter Data Type Description
m http://anzograph.com/matrices#tensor A tensor of matrix/row vector/column vector
type int Optional argument that specifies the type of tensor: 0=row vector, 1=column vector, 2=matrix. Default is 2.
isRowWise boolean Optional argument that indicates whether the matrix is displayed column-wise or row-wise: false=column-wise, true=row-wise. Default is true.

Returns

Data Type Description
string String representation of the vector or matrix in row-wise or column-wise order

pca::get_raw_eigval

Gets the PCA's unsorted eigenvalues from the PCA data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_raw_eigval(pca_data)
Parameter Data Type Description
pca_data http://anzograph.com/matrices#feature_result Principal Component Analysis data

Returns

Data Type Description
http://anzograph.com/matrices#tensor Eigenvalues in unsorted order as column vectors

Singular Value Decomposition (SVD)

A matrix factorization method, singular value decomposition (SVD), creates combined singular values and right singular vectors.

Wikipedia Reference: Singular Value Decomposition

There are several options available for singular value decomposition:

svd::create

Applies SVD to create combined singular values and right singular vectors.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::create(x1, x2, ..., xn) 
Parameter Data Type Description
x1–xn double Feature column datasets

Returns

Data Type Description
http://anzograph.com/matrices#feature_result SVD data containing singular values and right singular vectors

svd::get_sigval

Gets SVD's singular values as a column vector from the SVD data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::get_sigval(svd_data)
Parameter Data Type Description
svd_data http://anzograph.com/matrices#feature_result SVD data

Returns

Data Type Description
http://anzograph.com/matrices#tensor Singular values in descending order as a column vector

svd::get_sigvec

Gets SVD's right singular vectors as a matrix from the SVD data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::get_sigvec(svd_data)
Parameter Data Type Description
svd_data http://anzograph.com/matrices#feature_result SVD data

Returns

Data Type Description
http://anzograph.com/matrices#tensor Right singular vectors as a matrix

transform

Applies PCA or SVD to transform samples onto the new subspace.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:transform(svd_data, d, x1, x2, ..., xn)
Parameter Data Type Description
svd_data http://anzograph.com/matrices#feature_result SVD data
d int The number of singular vectors to consider from the start
x1–xn double Feature column datasets

Returns

Data Type Description
string Sample data transformed into the tuple of lower dimensional space

dump_tensor

Returns a string representation of a vector or matrix in row-wise or column-wise order.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])
Parameter Data Type Description
m http://anzograph.com/matrices#tensor A tensor of matrix/row vector/column vector
type int Optional argument that specifies the type of tensor: 0=row vector, 1=column vector, 2=matrix. Default is 2.
isRowWise boolean Optional argument that indicates whether the matrix is displayed column-wise or row-wise: false=column-wise, true=row-wise. Default is true.

Returns

Data Type Description
string String representation of the vector or matrix in row-wise or column-wise order

Linear Discriminant Analysis (LDA)

This function applies linear discriminant analysis (LDA) to create combined eigenvalues and eigenvectors that characterize or separate two or more classes of objects or events.

Wikipedia Reference: Linear Discriminant Analysis

There are several options available for linear discriminant analysis:

lda::create

Applies LDA to create combined eigenvalues and eigenvectors.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::create(y, x1, x2, ..., xn)
Parameter Data Type Description
y double Class of feature tuple
x1–xn double Feature column datasets

Returns

Data Type Description
http://anzograph.com/matrices#lda_result Combined eigenvalues, eigenvectors, class mean, count, and class map

lda::get_eigvec

Given LDA data, this function gets LDA's eigenvectors as a matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_eigvec(lda_data) 
Parameter Data Type Description
lda_data http://anzograph.com/matrices#lda_result Linear discriminant analysis data

Returns

Data Type Description
http://anzograph.com/matrices#tensor Eigenvectors as a matrix

lda::get_eigval

Given LDA data, this function gets LDA's eigenvalues as a column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_eigval(lda_data)
Parameter Data Type Description
lda_data http://anzograph.com/matrices#lda_result Linear discriminant analysis data

Returns

Data Type Description
http://anzograph.com/matrices#tensor Eigenvalues in descending order as a column vector

lda::transform

Applies LDA to transform samples onto the new subspace.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::transform(lda_data, d, x1, x2, ..., xn)
Parameter Data Type Description
lda_data http://anzograph.com/matrices#lda_result Linear discriminant analysis data
d int The number of eigenvectors to consider from the start
x1–xn double Feature column datasets

Returns

Data Type Description
double Original data transformed into the tuple of lower dimensional space

dump_tensor

Returns a string representation of a vector or matrix in row-wise or column-wise order.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])
Parameter Data Type Description
m http://anzograph.com/matrices#tensor A tensor of matrix/row vector/column vector
type int Optional argument that specifies the type of tensor: 0=row vector, 1=column vector, 2=matrix. Default is 2.
isRowWise boolean Optional argument that indicates whether the matrix is displayed column-wise or row-wise: false=column-wise, true=row-wise. Default is true.

Returns

Data Type Description
string String representation of the vector or matrix in row-wise or column-wise order

lda::predict

Predicts the class for the samples using LDA as the classifier.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::predict(lda_data, p1, p2, ..., pn)
Parameter Data Type Description
lda_data http://anzograph.com/matrices#lda_result LDA data
p1–pn double Data sample whose class to predict

Returns

Data Type Description
string The name of the class to which the data tuple belongs

lda::get_raw_eigval

Given LDA data, this function gets LDA's unsorted eigenvalues.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_raw_eigval(lda_data) 
Parameter Data Type Description
lda_data http://anzograph.com/matrices#lda_result LDA data

Returns

Data Type Description
http://anzograph.com/matrices#tensor Eigenvalues in unsorted order as a column vector

Linear Algebra

Gramian Matrix (GRAMIAN)

This function creates a Gramian matrix commonly used to compute linear independence.

Wikipedia Reference: Gramian Matrix

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:gramian(x1, x2, ..., xn)
Parameter Data Type Description
x1–xn double Feature column datasets

Returns

Data Type Description
http://anzograph.com/matrices#tensor Gramian matrix
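
The link between a Gramian matrix and linear independence can be sketched outside the database. The following Python sketch is only an illustration of the underlying math, not the AnzoGraph DB implementation: the helper name gramian and the sample columns are hypothetical. Entry (i, j) of the Gramian is the dot product of feature columns i and j, and the columns are linearly independent exactly when the determinant of the Gramian is non-zero.

```python
def gramian(*columns):
    """Return the Gram matrix G, where G[i][j] is the dot product of column i and column j."""
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all feature columns must have the same length")
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


# Hypothetical sample data: x2 is exactly 2 * x1, so the two columns are linearly dependent.
x1 = [1.0, 2.0, 3.0]
x2 = [2.0, 4.0, 6.0]
G = gramian(x1, x2)

# Determinant of the 2 x 2 Gramian; zero signals linear dependence.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
print(G, det)
```

With independent columns, such as [1, 0] and [0, 1], the same determinant is non-zero.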

Profiling Metrics

Discrete Probability Metrics

These functions calculate a discrete probability distribution of values.

Wikipedia Reference: Discrete Probability

The following functions produce discrete probability metrics:

Geometric Mean Metric (GMEAN)

This function calculates the geometric mean, defined as the nth root of the product of n positive numbers.

Wikipedia Reference: Geometric Mean

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:gmean(data)
Parameter Data Type Description
data double Column data

Returns

Data Type Description
double The geometric mean value
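
As a rough illustration of the definition above, the following Python sketch computes a geometric mean by averaging logarithms, which avoids overflow in the intermediate product. The helper name gmean and the sample values are hypothetical, and nothing here is taken from the AnzoGraph DB implementation.

```python
import math


def gmean(data):
    """Geometric mean: the nth root of the product of n positive numbers."""
    values = list(data)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, and all values must be positive")
    # exp(mean(log(v))) equals the nth root of the product of the values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# The product of 2 and 8 is 16, and the square root of 16 is 4.
print(gmean([2.0, 8.0]))
```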

Percentile Metric (PERCENTILE)

This function calculates the value at a given percentile of a set of numeric values.

Wikipedia Reference: Percentile Metric

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:percentile(data, p)  
Parameter Data Type Description
data double Dataset
p double The percentile (0 - 100) to compute

Returns

Data Type Description
double The percentile value
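
Several conventions exist for computing a percentile, and this page does not state which one AnzoGraph DB uses. The Python sketch below assumes linear interpolation between the two closest ranks, which is one common choice. The helper name percentile and the sample data are hypothetical.

```python
def percentile(data, p):
    """Value at percentile p (0 to 100), using linear interpolation between closest ranks.

    The interpolation rule is an assumption made for illustration only.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    values = sorted(data)
    if not values:
        raise ValueError("data must not be empty")
    pos = (len(values) - 1) * p / 100.0   # fractional position in the sorted data
    lo = int(pos)                         # index at or below the position
    hi = min(lo + 1, len(values) - 1)     # index above it, clamped to the last item
    return values[lo] + (values[hi] - values[lo]) * (pos - lo)


print(percentile([1, 2, 3, 4], 50))   # halfway between 2 and 3
```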

Skew Metric (SKEWCOEFF)

This function calculates Pearson’s coefficients of skewness on numeric values.

Wikipedia Reference: Skewness

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:skewcoeff(data, dp) 
Parameter Data Type Description
data double Dataset
dp int Number of decimal places to consider for the input data

Returns

Data Type Description
double The mode (value that appears most frequently)
double The median number in an ordered set of data
double The average value
double The standard deviation
double Pearson mode skewness or first skewness coefficient
double Pearson median skewness or second skewness coefficient
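
The six returned values are related by two standard formulas: Pearson's first coefficient is (mean - mode) / standard deviation, and the second is 3 * (mean - median) / standard deviation. The Python sketch below works through them. It rests on assumptions that this page does not confirm: that dp rounds the input before the mode is taken, and that the population standard deviation is used. The helper name skew_metrics and the sample data are hypothetical.

```python
import statistics


def skew_metrics(data, dp):
    """Return (mode, median, mean, stdev, first skewness, second skewness)."""
    # Assumption: dp is the number of decimal places kept before computing the statistics.
    values = [round(v, dp) for v in data]
    mode = statistics.mode(values)
    median = statistics.median(values)
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)   # assumption: population standard deviation
    if stdev == 0:
        raise ValueError("skewness is undefined when all values are equal")
    first = (mean - mode) / stdev        # Pearson mode skewness
    second = 3 * (mean - median) / stdev  # Pearson median skewness
    return mode, median, mean, stdev, first, second


print(skew_metrics([1.0, 2.0, 2.0, 3.0, 7.0], 2))
```

For this sample the mode and median are both 2 and the mean is 3, so both coefficients are positive, reflecting the long right tail introduced by the value 7.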

Sketches

  • Cardinality Metric (HLL): Uses Apache DataSketches HyperLogLog (HLL) to calculate cardinality estimates for a dataset.
  • Frequent Items (FI): Collection of functions used to create frequency sketches and obtain information about frequent items.
  • Quantile/Rank Sketch (KLL): Collection of functions that use the KLL sketch computation model to approximate minimum and maximum items in a dataset, the quantile and rank of items, the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF).
  • Theta Sketch (THETA): Collection of functions that use the Theta Sketch framework to compute estimates of the cardinality, union, intersection, and difference set operations and return a Theta Sketch.

Cardinality Metric (HLL)

This aggregate calculates cardinality estimates for a dataset using Apache DataSketches HyperLogLog (HLL).

Reference: Cardinality Prominence Metric

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:hll(data [, log_base_2_K ] [, hll_target_type ]) 
Parameter Data Type Description
data byte, short, int, long, float, double, string, URI The dataset
log_base_2_K int Optional argument that specifies the log base 2 of K, where K is the number of buckets or slots for the sketch. Must be between 4 and 21 (inclusive). Default value is 12.
hll_target_type int Optional argument that specifies the target type for the HLL sketch. Supported values are 4 (HLL_4), 6 (HLL_6), or 8 (HLL_8). Default value is 4.

Returns

Data Type Description
double The cardinality metric value

Frequent Items (FI)

The FI aggregate is used to estimate the frequency of items in a dataset, the upper and lower bounds of the items, the number of active items, and the total stream weight. FI returns a binary stream (Frequent Items Sketch) containing all of the computed values. Values can be retrieved from the sketch using the get_estimates, get_active_items_total_weights, get_top_items, and get_top_strings functions.

For more information about frequency sketches, see Frequency Sketches Overview.

FI Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi(values [, weight ])
Parameter Data Type Description
values short, int, long, float, double, string The dataset
weight long Optional argument that specifies the weight of each value. The default value is 1.

Returns

Data Type Description
http://anzograph.com/statistics#fi_sketch Binary Frequent Items Sketch
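
To make the quantities that a Frequent Items Sketch estimates more concrete, the following Python sketch computes them exactly for a small in-memory stream. It is not a sketch algorithm and not the AnzoGraph DB implementation: an exact counter keeps every item, whereas a frequency sketch keeps a bounded number of counters and reports lower and upper bounds around each estimate. The helper name exact_frequencies and the sample stream are hypothetical.

```python
from collections import Counter


def exact_frequencies(items, weights=None):
    """Accumulate the total weight per item; each item has weight 1 unless weights are given."""
    items = list(items)
    if weights is None:
        weights = [1] * len(items)
    counts = Counter()
    for item, weight in zip(items, weights):
        counts[item] += weight
    return counts


stream = ["a", "b", "a", "c", "a", "b"]
counts = exact_frequencies(stream)

print(counts.most_common(2))               # the two most frequent items: [('a', 3), ('b', 2)]
print(len(counts), sum(counts.values()))   # distinct items and total stream weight: 3 6
```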

Frequent Items Sketch Retrieval Functions

The following functions are available for retrieving values from a Frequent Items Sketch:

fi::get_estimates

Returns the frequency estimate and the lower and upper bounds for the given item in a sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_estimates(fi_sketch, item) 
Parameter Data Type Description
fi_sketch http://anzograph.com/statistics#fi_sketch Frequent Items Sketch
item Object Item for which to get estimates

Returns

Data Type Description
long Frequency estimate for the item
long Lower bound estimate for the item
long Upper bound estimate for the item

fi::get_active_items_total_weights

Returns the number of active items and the estimated total stream weight from a sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_active_items_total_weights(fi_sketch)
Parameter Data Type Description
fi_sketch http://anzograph.com/statistics#fi_sketch Frequent Items Sketch

Returns

Data Type Description
long The estimated number of active items
long The estimated total stream weight

fi::get_top_items

Returns the most frequent items and their corresponding frequency.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_top_items(fi_sketch)
Parameter Data Type Description
fi_sketch http://anzograph.com/statistics#fi_sketch Frequent Items Sketch

Returns

Data Type Description
double The item with the highest frequency
long Frequency estimate of the first item
double The item with the second highest frequency
long Frequency estimate of the second item
double The item with the nth highest frequency
long Frequency estimate of the nth item

fi::get_top_strings

Returns the most frequent strings and their corresponding frequencies.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_top_strings(fi_sketch)
Parameter Data Type Description
fi_sketch http://anzograph.com/statistics#fi_sketch Frequent Items Sketch

Returns

Data Type Description
string The string with the highest frequency
long Frequency estimate of the first string
string The string with the second highest frequency
long Frequency estimate of the second string
string The string with the nth highest frequency
long Frequency estimate of the nth string

Quantile/Rank Sketch (KLL)

The KLL aggregate uses the KLL Sketch computation model to calculate the approximate minimum and maximum items in a dataset, the quantile and rank of items, the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF). KLL returns a binary stream (KLL Sketch) containing all of the computed values. Values can be retrieved from the sketch using various functions.

For more information about KLL sketches, see KLL Sketch.

KLL Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll(values [, k ])
Parameter Data Type Description
values short, int, long, float, double, string The dataset
k int Optional argument that configures the size of the sketch and its estimation error. Can be any value between 8 and 65535 (inclusive). The default value is 200, which results in a normalized rank error of about 1.65%. Higher values will have a smaller error but the sketch will be larger (and slower).

Returns

Data Type Description
http://anzograph.com/statistics#kll_sketch Binary KLL sketch

KLL Sketch Retrieval Functions

The following functions are available for retrieving values from a KLL sketch:

kll::get_min_value

Returns the minimum value in a KLL sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_min_value(kll_sketch)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch

Returns

Data Type Description
double The minimum value in the sketch
string If the input is a string, the minimum string is returned

kll::get_max_value

Returns the maximum value in a KLL sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_max_value(kll_sketch)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch

Returns

Data Type Description
double The maximum value in the sketch
string If the input is a string, the maximum string is returned

kll::get_n

Returns the length of a KLL sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_n(kll_sketch)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch

Returns

Data Type Description
long The length of the sketch

kll::get_num_retained

Returns the number of retained items (samples) in a sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_num_retained(kll_sketch)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch

Returns

Data Type Description
long The number of retained items (samples) in the sketch

kll::get_rank

Returns an approximation of the normalized (fractional) rank of the given item.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_rank(kll_sketch, v)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch
v double The item to retrieve the rank for

Returns

Data Type Description
double The approximate rank of the item, from 0 to 1 (inclusive)

kll::get_quantile

Returns an approximation of the value of the item at the given normalized (fractional) rank.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantile(kll_sketch, fraction)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch
fraction double The fractional position in the hypothetical sorted stream

Returns

Data Type Description
double An approximation of the value of the item that would be preceded by the given fraction of a hypothetical sorted version of the sketch
string An approximation of the string when the input is a string
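
Rank and quantile are inverse lookups on the sorted data, which the following Python sketch shows using exact computation on a small list. A KLL sketch approximates these answers without storing every item. The helper names and the sample data are hypothetical, and the rank convention used here (the fraction of items strictly less than the given value) is an assumption rather than something this page states.

```python
from bisect import bisect_left


def exact_rank(sorted_values, v):
    """Normalized rank of v: the fraction of items strictly less than v (assumed convention)."""
    return bisect_left(sorted_values, v) / len(sorted_values)


def exact_quantile(sorted_values, fraction):
    """Item that is preceded by the given fraction of the sorted values."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    index = min(int(fraction * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


values = sorted(float(i) for i in range(1, 101))   # 1.0, 2.0, ..., 100.0

print(exact_rank(values, 51.0))      # 50 of the 100 items are below 51.0, so 0.5
print(exact_quantile(values, 0.5))   # the item preceded by half of the data, 51.0
```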

kll::get_quantiles

Provides a more efficient, multiple-query version of kll::get_quantile that enables you to specify a number of evenly spaced fractional ranks.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantiles(kll_sketch, f1, f2, ..., f10)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch
f1–f10 double Normalized or fractional ranks in the hypothetical sorted stream. The ranks must be in the interval 0.0 - 1.0 (inclusive).

Returns

Data Type Description
double An approximation of the values in the same order as the given fractional positions

kll::get_quantiles_str

Provides an approximation to the strings when the input is a string type.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantiles_str(kll_sketch, f1, f2, ..., f10)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch
f1–f10 double Normalized or fractional ranks in the hypothetical sorted stream. The ranks must be in the interval 0.0 - 1.0 (inclusive).

Returns

Data Type Description
string An approximation of the strings

kll::get_pmf

Provides an approximation to the Probability Mass Function (PMF) of the input stream.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_pmf(kll_sketch, v1, v2, ..., v10)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch
v1–v10 Object Input values between the minimum and maximum values of the input stream. Values must be unique and monotonically increasing.

Returns

Data Type Description
double PMF values corresponding to the input

kll::get_cdf

Provides an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF of the input stream.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_cdf(kll_sketch, v1, v2, ..., v10)
Parameter Data Type Description
kll_sketch http://anzograph.com/statistics#kll_sketch KLL sketch
v1–v10 Object Input values between the minimum and maximum values of the input stream. Values must be unique and monotonically increasing.

Returns

Data Type Description
double CDF values corresponding to the input
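
The relationship between the PMF and the CDF can be illustrated with an exact computation over a small sorted list. The Python sketch below follows the Apache DataSketches convention in which m split points define m+1 intervals, so both functions return m+1 values; whether AnzoGraph DB returns the values in the same shape is an assumption. The helper names and the sample data are hypothetical.

```python
from bisect import bisect_left


def _check_split_points(split_points):
    # Split points must be unique and monotonically increasing.
    if any(b <= a for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be unique and increasing")


def exact_cdf(sorted_values, split_points):
    """Fraction of items below each split point, followed by 1.0 for the whole stream."""
    _check_split_points(split_points)
    n = len(sorted_values)
    return [bisect_left(sorted_values, s) / n for s in split_points] + [1.0]


def exact_pmf(sorted_values, split_points):
    """Fraction of items in each interval; differences of consecutive CDF values."""
    cdf = exact_cdf(sorted_values, split_points)
    return [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])]


values = sorted(float(i) for i in range(1, 101))
splits = [26.0, 51.0, 76.0]

print(exact_cdf(values, splits))   # [0.25, 0.5, 0.75, 1.0]
print(exact_pmf(values, splits))   # [0.25, 0.25, 0.25, 0.25]
```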

Theta Sketch (THETA)

The THETA aggregate uses the Theta Sketch framework to compute estimates of the cardinality, union, intersection, and difference set operations and return a binary stream (Theta Sketch) containing the computed values. Values can be retrieved from the sketch using the cardinality, union, intersection, and difference functions.

Theta Sketches are a generalization of the well-known Kth Minimum Value (KMV) sketches.

THETA Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta(values)
Parameter Data Type Description
values short, int, long, float, double, string The dataset to operate on

Returns

Data Type Description
http://anzograph.com/statistics#theta_sketch Binary Theta Sketch
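
The Kth Minimum Value idea that Theta Sketches generalize can be sketched in a few lines of Python. This is a simplified illustration only, not the Theta Sketch framework and not the AnzoGraph DB implementation: each item is hashed to a number between 0 and 1, the k smallest distinct hashes are kept, and the count of distinct items is estimated as (k - 1) divided by the kth smallest hash. The helper names, the choice of SHA-256, and the default k of 256 are all assumptions, and a real sketch would process the stream incrementally rather than sorting every hash.

```python
import hashlib


def _unit_hash(item):
    """Map an item to a pseudo-random number in the unit interval."""
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_estimate(items, k=256):
    """Estimate the number of distinct items from the k smallest hash values."""
    smallest = sorted({_unit_hash(x) for x in items})[:k]
    if len(smallest) < k:
        # Fewer than k distinct hashes were seen, so the count is exact.
        return float(len(smallest))
    # The kth smallest hash plays the role of theta.
    return (k - 1) / smallest[-1]


print(kmv_estimate(list(range(100)) * 3))   # fewer than k distinct items: exact count
print(kmv_estimate(range(20000)))           # an estimate near 20000, not an exact count
```

A larger k gives a tighter estimate at the cost of a larger sketch.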

Theta Sketch Retrieval Functions

The following functions are available for retrieving values from a Theta Sketch:

theta::cardinality

Retrieves the estimated count of values in a Theta Sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::cardinality(theta_sketch)
Parameter Data Type Description
theta_sketch http://anzograph.com/statistics#theta_sketch Binary Theta Sketch

Returns

Data Type Description
double The count of items in the sketch

theta::union

Retrieves the estimate of the number of items that are in the union of two or more Theta Sketches.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::union(theta_sketch1, theta_sketch2 [, theta_sketchN ])
Parameter Data Type Description
theta_sketch1–N http://anzograph.com/statistics#theta_sketch Any number of Theta Sketches

Returns

Data Type Description
double The estimated number of items in the union

theta::intersection

Retrieves the estimate of the number of items that are in the intersection between two or more Theta Sketches.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::intersection(theta_sketch1, theta_sketch2 [, theta_sketchN ])
Parameter Data Type Description
theta_sketch1–N http://anzograph.com/statistics#theta_sketch Any number of Theta Sketches

Returns

Data Type Description
double The estimated number of items that intersect in the sketches

theta::difference

Retrieves the estimate of the number of items that are in the difference between two Theta Sketches, i.e., the number of items that are in the first sketch but not in the second sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::difference(a, b)
Parameter Data Type Description
a http://anzograph.com/statistics#theta_sketch The first Theta Sketch
b http://anzograph.com/statistics#theta_sketch The Theta Sketch to compare to sketch a

Returns

Data Type Description
double The estimated number of items in the difference between the sketches
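
The three set operations can be illustrated with exact Python sets, which give the counts that the Theta Sketch functions estimate. The helper name set_operation_counts and the sample sets are hypothetical. Note that the difference is not symmetric: it counts items in the first set that are absent from the second.

```python
def set_operation_counts(a, b):
    """Exact counts for the union, intersection, and difference (a not b) of two sets."""
    return {
        "union": len(a | b),
        "intersection": len(a & b),
        "difference": len(a - b),
    }


a = {1, 2, 3, 4}
b = {3, 4, 5}

print(set_operation_counts(a, b))   # {'union': 5, 'intersection': 2, 'difference': 2}
print(set_operation_counts(b, a))   # {'union': 5, 'intersection': 2, 'difference': 1}
```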

Matrix Utilities

This collection of functions returns information on various attributes of vector space mapping and related matrix tensors.

make_matrix

Creates a matrix of doubles with the given dimensions and values.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_matrix(m, n [, v1, v2, ..., vn ])
Parameter Data Type Description
m int Number of rows for the new matrix
n int Number of columns for the new matrix
v1–vn double Optional arguments that specify the row-wise matrix elements to include. Default value is 0 for all elements.

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation for m x n matrix of doubles
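
Row-wise element order means that the values fill the first row from left to right, then the second row, and so on. The following Python sketch mimics that layout with nested lists. It is an illustration only: the Python function is a hypothetical stand-in, and the requirement that either no values or exactly m x n values be supplied is an assumption made to keep the example unambiguous.

```python
def make_matrix(m, n, *values):
    """Build an m x n matrix of floats from row-wise values; all zeros when none are given."""
    if values and len(values) != m * n:
        raise ValueError("expected either no values or exactly m * n values")
    if not values:
        values = (0.0,) * (m * n)
    # Element (r, c) is found at position r * n + c of the row-wise list.
    return [[float(values[r * n + c]) for c in range(n)] for r in range(m)]


print(make_matrix(2, 3, 1, 2, 3, 4, 5, 6))   # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(make_matrix(2, 2))                     # [[0.0, 0.0], [0.0, 0.0]]
```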

make_tensor_from_string

Constructs a tensor from the given dimensions in a string.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_tensor_from_string(s [, n ])
Parameter Data Type Description
s string String that contains the row-wise elements for constructing the tensor
n int Optional argument that specifies the number of columns to include in the tensor. The default value is 0, which constructs a row vector. A value of 1 constructs a column vector. A value that is greater than 1 constructs a matrix with the specified number of columns.

Returns

Data Type Description
http://anzograph.com/matrices#tensor A tensor of doubles

make_vec

Constructs a row vector that stores the given value at the given index.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_vec(n, v)
Parameter Data Type Description
n int The index into the vector
v double The value to be stored in the vector at the nth index

Returns

Data Type Description
http://anzograph.com/matrices#tensor A row vector

dump_tensor

Displays the Armadillo header and the first few elements of the matrix or vector as a string.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(b [, type ] [, isRowWise ]) 
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor of matrix/row vector/column vector
type int Optional argument that specifies the type of tensor: 0=row vector, 1=column vector, 2=matrix. Default is 2.
isRowWise boolean Optional argument that indicates whether the matrix is displayed column-wise or row-wise: false=column-wise, true=row-wise. Default is true.

Returns

Data Type Description
string Row-wise or column-wise string representation of the vector or matrix

dump_vec

Returns the string representation of a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_vec(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor Tensor representation of a row or column vector

Returns

Data Type Description
string String representation of the row or column vector

get_rows

Returns the number of rows present in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_rows(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
long The number of rows

get_cols

Gets the number of columns present in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_cols(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
long The number of columns

get_slices

Gets the number of slices present in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_slices(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
long The number of slices

get_order

Gets the tensor order.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_order(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
long The tensor order

get_total_elem

Gets the total number of elements that are present in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_total_elem(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
long The total number of elements

get_nonzero

Gets the number of non-zero elements that are present in a sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_nonzero(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
long The number of non-zero elements

get_elem

Accesses one or more elements that are stored in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_elem(b, i [, j ] [, k ])
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
i long Row index of the element
j long Optional argument that specifies the column index of the element
k long Optional argument that specifies the slice index of the element

Returns

Data Type Description
double The element value

get_diag

Extracts a diagonal from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_diag(b [, k ])
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
k long Optional diagonal number. By default, the main diagonal is accessed (k=0). For k>0, the kth super-diagonal is accessed (toward the top right corner). For k<0, the kth sub-diagonal is accessed (toward the bottom left corner).

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of the diagonal as a column vector
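
The effect of the diagonal number k can be sketched with nested Python lists. This is a hypothetical stand-in for illustration, not the AnzoGraph DB implementation, and it returns a plain list rather than a column-vector tensor.

```python
def get_diag(matrix, k=0):
    """Return the kth diagonal: k=0 main, k>0 above the main diagonal, k<0 below it."""
    rows, cols = len(matrix), len(matrix[0])
    if k >= 0:
        # Super-diagonal: start at column k of the first row.
        return [matrix[i][i + k] for i in range(min(rows, cols - k))]
    # Sub-diagonal: start at row -k of the first column.
    return [matrix[i - k][i] for i in range(min(rows + k, cols))]


m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

print(get_diag(m))       # [1, 5, 9]
print(get_diag(m, 1))    # [2, 6]
print(get_diag(m, -1))   # [4, 8]
```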

getmax_val

Retrieves the maximum value from a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:getmax_val(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
double The maximum value in the tensor

getmin_val

Retrieves the minimum value from a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:getmin_val(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
double The minimum value in the tensor

get_subvec

Extracts a range of elements from a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_subvec(b, i, j)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
i long Start index
j long End index

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a row or column vector

subvec_head

Extracts starting elements from a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subvec_head(b, n)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
n long The number of elements to extract from the beginning of the vector

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a row or column vector with elements from 0 to n-1

subvec_tail

Extracts trailing elements from a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subvec_tail(b, n)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
n long The number of elements to extract from the end of the vector

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a row or column vector with n elements from the tail

cosine_similarity

This function calculates the cosine similarity between two row vectors.

The cosine_similarity function is not compatible with column vectors or matrices. The input must be row vectors.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:cosine_similarity(m, n)
Parameter Data Type Description
m http://anzograph.com/matrices#tensor A row vector
n http://anzograph.com/matrices#tensor The row vector to compare to the vector in argument m

Returns

Data Type Description
double Results range from -1 to 1: -1 is perfectly dissimilar and 1 is perfectly similar
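
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The following Python sketch illustrates the calculation on plain lists. The function here is a hypothetical stand-in, not the AnzoGraph DB implementation, and rejecting zero-length vectors is a choice made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of elements")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))    # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # perpendicular: 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))   # opposite direction: -1.0
```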

subview_col

Extracts a column from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_col(b, n)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
n long Column index

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a column vector

subview_row

Extracts a row from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_row(b, n)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
n long Row index

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a row vector

subview_rows

Extracts a range of rows from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_rows(b, r1, ..., rn)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
r1–rn long The start row index to the end row index

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of the matrix with rows from r1 to rn

subview_cols

Extracts a range of columns from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_cols(b, c1, ..., cn)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
c1–cn long The start column index to the end column index

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of the matrix with columns from c1 to cn

subview_mat

Extracts a submatrix from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_mat(b, r1, ..., rn, c1, ..., cn)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
r1–rn long The start row index to the end row index
c1–cn long The start column index to the end column index

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a matrix of size [1+(rn-r1)] x [1+(cn-c1)]
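
The result size of [1+(rn-r1)] x [1+(cn-c1)] follows from both end indexes being inclusive. The Python sketch below shows this with nested lists. It is a hypothetical stand-in that takes only the start and end indexes, and it assumes zero-based indexing, which this page implies but does not state outright.

```python
def subview_mat(matrix, r1, rn, c1, cn):
    """Rows r1..rn and columns c1..cn, with both ends inclusive (zero-based, assumed)."""
    # Python slices exclude the end index, so add 1 to include it.
    return [row[c1:cn + 1] for row in matrix[r1:rn + 1]]


m = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]

sub = subview_mat(m, 0, 1, 1, 3)
print(sub)                        # [[1, 2, 3], [5, 6, 7]]
print(len(sub), len(sub[0]))      # 2 rows and 3 columns: 1+(1-0) and 1+(3-1)
```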

subview_head_rows

Extracts starting rows from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_head_rows(b, n)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
n long The number of rows to extract from the beginning of the matrix

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a matrix with rows from 0 to n-1

subview_head_cols

Extracts starting columns from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_head_cols(b, n)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
n long The number of columns to extract from the beginning of the matrix

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a matrix with columns from 0 to n-1

subview_tail_rows

Extracts trailing rows from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_tail_rows(b, n)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
n long The number of rows to extract from the end of the matrix

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a matrix with n rows from the tail

subview_tail_cols

Extracts trailing columns from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_tail_cols(b, n)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
n long The number of columns to extract from the end of the matrix

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a matrix with n columns from the tail

flatten_as_col

Returns a flattened version of a matrix as a column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:flatten_as_col(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of the matrix as a column vector

flatten_as_row

Returns a flattened version of a matrix as a row vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:flatten_as_row(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of the matrix as a row vector

is_vec

Evaluates whether the given matrix is a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_vec(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean True if the matrix can be interpreted as a column or row vector. False if the matrix does not have exactly one column or one row.

is_rowvec

Evaluates whether the given matrix is a row vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_rowvec(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean True if the matrix can be interpreted as a row vector. False if the matrix does not have exactly one row.

is_colvec

Evaluates whether the given matrix is a column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_colvec(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean True if the matrix can be interpreted as a column vector. False if the matrix does not have exactly one column.

is_sorted

Evaluates whether a vector or matrix is sorted.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_sorted(b [, t ] [, d ])
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
t boolean Optional argument that specifies the sort dimension for the matrix. Set to true if elements are sorted row-wise and false if they are sorted column-wise. Default is false.
d int Optional argument that specifies the sort direction for the matrix. Allowed arguments are:
  • 0: ascend (default). Elements are ascending; consecutive elements can be equal.
  • 1: descend. Elements are descending; consecutive elements can be equal.
  • 2: strictascend. Elements are strictly ascending; consecutive elements cannot be equal.
  • 3: strictdescend. Elements are strictly descending; consecutive elements cannot be equal.

Returns

Data Type Description
boolean True if the elements are sorted. False if they are not.
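
The direction codes can be read as pairwise comparisons between consecutive elements. The Python sketch below illustrates that reading. It assumes that t = false checks each column independently and t = true checks each row independently; the names `by_row`, `direction`, and `_COMPARE` are hypothetical and not part of AnzoGraph DB.

```python
import operator

# Direction codes from the parameter table mapped to pairwise comparisons.
_COMPARE = {
    0: operator.le,  # ascend: equal neighbors allowed
    1: operator.ge,  # descend: equal neighbors allowed
    2: operator.lt,  # strictascend
    3: operator.gt,  # strictdescend
}


def is_sorted(matrix, by_row=False, direction=0):
    """Check every row (by_row=True) or every column (by_row=False)."""
    compare = _COMPARE[direction]
    # zip(*matrix) yields the columns of a list-of-rows matrix.
    lines = matrix if by_row else list(zip(*matrix))
    return all(
        compare(a, b)
        for line in lines
        for a, b in zip(line, line[1:])
    )


m = [[1, 1],
     [2, 3]]
```

For `m`, both columns are strictly ascending, while the first row contains equal neighbors and therefore passes only the non-strict ascending check.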

is_tri_mat_upper

Evaluates whether a matrix is upper triangular, i.e., the matrix is square and all elements below the main diagonal are zero.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_tri_mat_upper(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean Returns true if the matrix is upper triangular and false if it is not

is_tri_mat_lower

Evaluates whether a matrix is lower triangular, i.e., the matrix is square and all elements above the main diagonal are zero.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_tri_mat_lower(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean Returns true if the matrix is lower triangular and false if it is not
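
The two triangular checks differ only in which side of the main diagonal must be zero. A minimal Python sketch of the stated definitions, using a list-of-rows matrix, might look like the following. It is an illustration rather than the AnzoGraph DB implementation, and treating an empty matrix as not square is an arbitrary choice made for this example.

```python
def is_square(matrix):
    # Assumption for this sketch: an empty matrix is treated as not square.
    return len(matrix) > 0 and all(len(row) == len(matrix) for row in matrix)


def is_tri_mat_upper(matrix):
    """Square, and every element below the main diagonal is zero."""
    return is_square(matrix) and all(
        matrix[i][j] == 0
        for i in range(len(matrix))
        for j in range(i)  # columns to the left of the diagonal
    )


def is_tri_mat_lower(matrix):
    """Square, and every element above the main diagonal is zero."""
    return is_square(matrix) and all(
        matrix[i][j] == 0
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))  # columns to the right of the diagonal
    )


upper = [[1, 2],
         [0, 3]]
```

A diagonal square matrix satisfies both checks, and a non-square matrix satisfies neither.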

is_diag_mat

Evaluates whether a matrix is diagonal, i.e., all elements outside of the main diagonal are zero.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_diag_mat(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean Returns true if the matrix is diagonal and false if it is not
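
The diagonal condition can be sketched in Python as a scan over every off-diagonal position. The description above states no squareness requirement, so this sketch does not impose one; whether AnzoGraph DB accepts non-square input is not stated here and should be treated as an open question.

```python
def is_diag_mat(matrix):
    """True when every element off the main diagonal (i != j) is zero."""
    return all(
        value == 0
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    )


d = [[1, 0],
     [0, 2]]
not_d = [[1, 3],
         [0, 2]]
```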

is_square

Evaluates whether a matrix is square, i.e., the number of rows is equal to the number of columns.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_square(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean Returns true if the matrix is square and false if it is not

is_symmetric

Evaluates whether a matrix is symmetrical, i.e., the matrix is square and equal to its transpose.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_symmetric(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean Returns true if the matrix is symmetrical and false if it is not
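
A matrix is symmetrical when it is square and each element equals its mirror image across the main diagonal. The Python sketch below illustrates that check together with the squareness test from is_square. It compares values exactly; whether AnzoGraph DB applies a numeric tolerance is not stated here.

```python
def is_square(matrix):
    # Same number of rows and columns.
    return all(len(row) == len(matrix) for row in matrix)


def is_symmetric(matrix):
    """Square, and matrix[i][j] == matrix[j][i] for every pair (i, j)."""
    if not is_square(matrix):
        return False
    n = len(matrix)
    return all(
        matrix[i][j] == matrix[j][i]
        for i in range(n)
        for j in range(i + 1, n)  # only the upper half needs checking
    )


s = [[1, 2],
     [2, 1]]
```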

is_hermitian

Checks whether a matrix is Hermitian (self-adjoint), i.e., the matrix is square and equal to its own conjugate transpose.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_hermitian(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean Returns true if the matrix is Hermitian and false if it is not
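
For reference, a self-adjoint matrix equals its own conjugate transpose, which also forces the diagonal entries to be real. The Python sketch below illustrates the mathematical definition only. Python's built-in complex type is used purely for illustration and says nothing about how AnzoGraph DB tensors store values.

```python
def is_hermitian(matrix):
    """Square, and matrix[i][j] equals the complex conjugate of matrix[j][i]."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    return all(
        matrix[i][j] == complex(matrix[j][i]).conjugate()
        for i in range(n)
        for j in range(i, n)  # j == i requires a real diagonal entry
    )


h = [[2, 1 + 1j],
     [1 - 1j, 3]]
```

For matrices with only real entries, this check reduces to the symmetry check.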

has_nan

Evaluates whether any element of a matrix is not a number (NaN).

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:has_nan(b)
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor

Returns

Data Type Description
boolean Returns true if at least one of the elements is NaN and false if all elements are numbers
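
The check succeeds as soon as a single NaN element is found. A minimal Python sketch of that behavior on a list-of-rows matrix (an illustration, not the AnzoGraph DB implementation):

```python
import math


def has_nan(matrix):
    """True if at least one element is NaN."""
    return any(math.isnan(value) for row in matrix for value in row)


with_nan = [[1.0, float("nan")],
            [2.0, 3.0]]
clean = [[1.0, 2.0]]
```

Note that infinity is not NaN, so in this sketch a matrix containing only infinite values still returns false.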

vec_all

Evaluates whether all elements in a row or column vector are non-zero or satisfy the specified relational condition.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:vec_all(b [, c ] [, val ])
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
c int Optional argument that specifies the relational condition to test:
  • 0 (default): not equal
  • 1: greater than
  • 2: less than
  • 3: equal
  • 4: greater than or equal to
  • 5: less than or equal to
val double Optional argument that specifies the value to apply the condition (c) to. Default is 0.

Returns

Data Type Description
boolean Returns true if all elements are non-zero or satisfy the condition and false if not

vec_any

Evaluates whether any elements in a row or column vector are non-zero or satisfy the specified relational condition.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:vec_any(b [, c ] [, val ])
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
c int Optional argument that specifies the relational condition to test:
  • 0 (default): not equal
  • 1: greater than
  • 2: less than
  • 3: equal
  • 4: greater than or equal to
  • 5: less than or equal to
val double Optional argument that specifies the value to apply the condition (c) to. Default is 0.

Returns

Data Type Description
boolean Returns true if any elements are non-zero or satisfy the condition and false if not
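
With the defaults (c = 0, val = 0), the condition "not equal to 0" is the same as the non-zero test, which is why one mechanism covers both uses. The Python sketch below illustrates how the condition codes for vec_all and vec_any can be read. The vector is modeled as a flat list, and `_CONDITIONS` is a hypothetical name introduced for the example.

```python
import operator

# Condition codes from the parameter table mapped to comparison operators.
_CONDITIONS = {
    0: operator.ne,  # not equal
    1: operator.gt,  # greater than
    2: operator.lt,  # less than
    3: operator.eq,  # equal
    4: operator.ge,  # greater than or equal to
    5: operator.le,  # less than or equal to
}


def vec_all(vector, condition=0, val=0.0):
    """True if every element satisfies `element <condition> val`."""
    test = _CONDITIONS[condition]
    return all(test(x, val) for x in vector)


def vec_any(vector, condition=0, val=0.0):
    """True if at least one element satisfies `element <condition> val`."""
    test = _CONDITIONS[condition]
    return any(test(x, val) for x in vector)


v = [1.0, 2.0, 0.0]
```

For `v`, the default call to `vec_all` fails because of the trailing zero, while `vec_all(v, 4, 0.0)` succeeds because every element is greater than or equal to 0.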

mat_all

Evaluates whether all elements in a matrix are non-zero or satisfy the specified relational condition.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:mat_all(b [, d ] [, c ] [, val ])
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
d boolean Optional argument that indicates whether to check rows or columns. Set to true for rows or false for columns. Default is false.
c int Optional argument that specifies the relational condition to test:
  • 0 (default): not equal
  • 1: greater than
  • 2: less than
  • 3: equal
  • 4: greater than or equal to
  • 5: less than or equal to
val double Optional argument that specifies the value to apply the condition (c) to. Default is 0.

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a row vector with each element as 0 or 1, indicating whether all elements of the corresponding row or column are non-zero or satisfy the condition

mat_any

Evaluates whether any elements in a matrix are non-zero or satisfy the specified relational condition.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:mat_any(b [, d ] [, c ] [, val ])
Parameter Data Type Description
b http://anzograph.com/matrices#tensor A tensor
d boolean Optional argument that indicates whether to check rows or columns. Set to true for rows or false for columns. Default is false.
c int Optional argument that specifies the relational condition to test:
  • 0 (default): not equal
  • 1: greater than
  • 2: less than
  • 3: equal
  • 4: greater than or equal to
  • 5: less than or equal to
val double Optional argument that specifies the value to apply the condition (c) to. Default is 0.

Returns

Data Type Description
http://anzograph.com/matrices#tensor Tensor representation of a row vector with each element as 0 or 1, indicating whether any elements of the corresponding row or column are non-zero or satisfy the condition
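
Unlike the vector variants, mat_all and mat_any return one 0/1 flag per row or column rather than a single boolean. The Python sketch below illustrates that reduction. It assumes that d = false produces one flag per column and d = true produces one flag per row, with the flags returned as a single row in both cases; the names `_CONDITIONS`, `_reduce`, and `by_row` are hypothetical.

```python
import operator

# Condition codes from the parameter table mapped to comparison operators.
_CONDITIONS = {
    0: operator.ne, 1: operator.gt, 2: operator.lt,
    3: operator.eq, 4: operator.ge, 5: operator.le,
}


def _reduce(matrix, reducer, by_row, condition, val):
    test = _CONDITIONS[condition]
    # zip(*matrix) yields the columns of a list-of-rows matrix.
    lines = matrix if by_row else zip(*matrix)
    flags = [1 if reducer(test(x, val) for x in line) else 0 for line in lines]
    return [flags]  # a row vector: one row of 0/1 flags


def mat_all(matrix, by_row=False, condition=0, val=0.0):
    return _reduce(matrix, all, by_row, condition, val)


def mat_any(matrix, by_row=False, condition=0, val=0.0):
    return _reduce(matrix, any, by_row, condition, val)


m = [[1, 0],
     [2, 3]]
```

With the defaults, `mat_all(m)` flags the first column only, because the second column contains a zero, while `mat_any(m)` flags both columns.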