Data Science Functions

AnzoGraph DB offers an extension package of pre-built Data Science functions that you can use in the same way as other native, built-in analytic functions. In addition, Cambridge Semantics offers an Apache Zeppelin Docker image, which includes a collection of notebooks that provide details and example usage of each of the Data Science functions. See Zeppelin Notebook Integration for more information on installing the custom Apache Zeppelin Docker image.

If you use the Installer to install AnzoGraph DB on RHEL/CentOS, these functions are included only if you choose to install the optional C++ extensions and related dependencies. If you do not install the C++ extensions, this package of functions is not included in the deployment. For more information, see Installing AnzoGraph DB.

The Data Science functions are organized into the following categories:

Correlation: Determine the relationship between elements.
Distribution: Calculate the probability of a given value over a random distribution.
Entropy: Determine variance and probability density across a given distribution.
Feature Exploration: Classify values in a distribution using techniques such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and Singular Value Decomposition (SVD).
Linear Algebra: Create product vectors or a matrix from a given collection of random variables.
Profiling Metrics: Produce statistical metrics such as percentile, geometric mean, or skew on a given dataset.
Sketches: Use efficient streaming algorithms to approximate calculations such as count distinct, quantiles, most frequent items, joins, and matrix computations, and return data sketches.
Matrix Utilities: Return information on various attributes of vector space mapping and related matrix tensors.

Correlation

Canonical Correlation (CANCOR): Calculates the overall correlation between two sets of variables.
Covariance (COVARIANCE): Provides a measure of the strength of the correlation between two or more sets of random variables (or variates).
Matthews Correlation Coefficient (MCC): Provides a measure of the quality of binary classifications of a condition with observed versus predicted scoring.
Pearson Correlation Coefficient (PCC): Determines the extent to which two variables are linearly related: positive, negative, or no relationship.
Spearman Correlation Coefficient (SCC): Determines how well the relationship between two variables can be described using a monotonic function.

Canonical Correlation (CANCOR)

This aggregate calculates the overall correlation between two sets of variables.

Wikipedia Reference: Canonical correlation function

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:cancor(lc, m, x1, x2, ..., xn, y1, y2, ..., yn)

Parameter	Data Type	Description
lc	int	Number of linear combinations for the first canonical correlation
m	int	Number of columns in the first set
x1–xn	double	Feature columns from the first dataset
y1–yn	double	Feature columns from the second dataset

Returns

Data Type	Description
string	Canonical correlation
string	Square of the canonical correlation
string	Canonical coefficient

Covariance (COVARIANCE)

This aggregate provides a measure of the strength of the correlation between two or more sets of random variables (or variates).

Wikipedia Reference: Covariance

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:covariance(x1, x2, ..., xn)

Parameter	Data Type	Description
x1–xn	double	Feature columns from the dataset

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Covariance matrix
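
To illustrate what a covariance matrix contains, the following Python sketch computes one from plain lists. It is not AnzoGraph code: the helper name `covariance_matrix` is hypothetical, and the sample (n - 1) denominator is an assumption, because the normalization used by `matrices:covariance` is not stated here.

```python
def covariance_matrix(columns):
    """Sample covariance matrix for a list of equal-length numeric columns.

    Assumption: uses the (n - 1) denominator; the engine may normalize differently.
    """
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [
            sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
            for cb, mb in zip(columns, means)
        ]
        for ca, ma in zip(columns, means)
    ]


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]
cov = covariance_matrix([x1, x2])
```

Entry `cov[i][j]` holds the covariance between column i and column j, so the diagonal holds each column's variance.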

Matthews Correlation Coefficient (MCC)

This aggregate returns a coefficient value between observed and predicted binary classifications.

Wikipedia Reference: Matthews correlation coefficient

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:mcc(x, y)

Parameter	Data Type	Description
x	boolean	First variable column data
y	boolean	Second variable column data

Returns

Data Type	Description
double	Coefficient value that shows the extent to which observed and predicted binary classifications are related
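
The coefficient can be sketched in Python from the four confusion-matrix counts. This is an illustration of the standard Matthews formula rather than AnzoGraph's implementation; the function name `mcc` and the choice to return 0.0 when the denominator is zero are assumptions.

```python
import math


def mcc(observed, predicted):
    """Matthews correlation coefficient for two equal-length boolean sequences."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # true positives
    tn = sum(1 for o, p in pairs if not o and not p)  # true negatives
    fp = sum(1 for o, p in pairs if not o and p)      # false positives
    fn = sum(1 for o, p in pairs if o and not p)      # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


perfect = mcc([True, True, False, False], [True, True, False, False])
inverted = mcc([True, True, False, False], [False, False, True, True])
```

A value of 1 indicates perfect agreement, -1 indicates total disagreement, and 0 indicates no better than random prediction.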

Pearson Correlation Coefficient (PCC)

This aggregate determines the extent to which two variables are linearly related: positive, negative, or no relationship.

Wikipedia Reference: Pearson correlation coefficient

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:pcc(x, y)

Parameter	Data Type	Description
x	double	First variable column data
y	double	Second variable column data

Returns

Data Type	Description
double	Coefficient that shows the extent to which two variables are linearly related
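
For reference, the textbook Pearson formula can be sketched in Python as follows. The function name `pearson` is hypothetical, and the sketch is meant only to show what the returned coefficient measures, not how AnzoGraph computes it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


positive = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
negative = pearson([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```

Values near 1 indicate a positive linear relationship, values near -1 a negative one, and values near 0 no linear relationship.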

Spearman Correlation Coefficient (SCC)

This aggregate determines how well the relationship between two variables can be described using a monotonic function.

Wikipedia Reference: Spearman's Correlation Coefficient

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:scc(rank_X, rank_Y)

Parameter	Data Type	Description
rank_X	double	First set of ranked data
rank_Y	double	Second set of ranked data

Returns

Data Type	Description
double	Coefficient between ranked datasets
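
Because the function takes data that is already ranked, the coefficient can be illustrated with the classic rank-difference formula. The Python sketch below is an assumption-laden illustration: the name `spearman_from_ranks` is hypothetical, and the shortcut formula is exact only when there are no tied ranks.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman coefficient from two equal-length rank sequences (no ties)."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # squared rank differences
    return 1 - 6 * d2 / (n * (n * n - 1))


same_order = spearman_from_ranks([1, 2, 3, 4], [1, 2, 3, 4])
reverse_order = spearman_from_ranks([1, 2, 3, 4], [4, 3, 2, 1])
```

Identical rankings give 1, and exactly reversed rankings give -1.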

Distribution

Cumulative Distribution Functions (CDF): Calculate the probability of a random variable X taking on a value less than or equal to Y.
Bernoulli Distribution (BERNDIST): Determines the probability of a specific event occurring, or not occurring, in tests that have only two possible outcomes: success (1) or failure (0).
Beta-Binomial Distribution (BETABINDIST): Computes probability using a combination of both binomial and beta probability distributions.
Hypergeometric Distribution (HYPGEODIST): Calculates probability from a distribution that is often used to predict the outcome of a process in which different elements are randomly drawn from a collection and not replaced.
Logarithmic (Series) Distribution (LOGSERDIST): Calculates probability using a discrete probability distribution derived from the Maclaurin series expansion.
Skellam Distribution (SKELLAMDIST): Calculates probability using the Skellam distribution, which models the difference between two independent Poisson-distributed variables.

Cumulative Distribution Functions (CDF)

A cumulative distribution function calculates the probability of a random variable X taking on a value less than or equal to Y.

Wikipedia Reference: Cumulative Distribution Function (CDF)

The following functions produce cumulative distribution calculations:

Binomial Distribution (BINOMDIST): Calculates the probability for X successes in N trials given a probability of success P for each trial.
Chi-Squared Distribution (CHISQDIST): Calculates probability often used in hypothesis testing to compare an observed distribution with a theoretical one. Also provides a way to show a relationship between two categorical variables.
Continuous Uniform Distribution (CONUNIDIST): Calculates probability using a continuous probability distribution concerned with events that are equally likely to occur.
Discrete Uniform Distribution (DISCUNIDIST): Calculates probability using a symmetric probability distribution where a finite number of values are equally likely to be observed and every one of n values has equal probability.
Exponential Distribution (EXPDIST): Calculates probability using a distribution that describes time between events in a Poisson point process (where events occur continuously and independently at a constant average rate).
Laplace Distribution (LAPLACEDIST): Calculates probability using a distribution that represents differences between two independent variables that have identical exponential distributions (also called double exponential distribution).
Log Normal Distribution (LOGNORDIST): Calculates probability using a distribution of a random variable whose logarithm follows a normal distribution. Log normal distributions are widely used in risk analysis.
Negative Binomial Distribution (NEGBINDIST): Calculates probability using a discrete probability distribution that concerns the number of trials which must occur in order to have a predetermined number of successes.
Normal Distribution (NORMDIST): Calculates probability using a continuous probability distribution of data in which the majority of data points are relatively similar, within a small range of values having few outliers.
Poisson Distribution (POISDIST): Calculates probability using a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, given that those events occur at a known constant rate and independently of the time since the last event.
Student's T-Distribution (TDIST): Calculates probability using the Student's t-distribution and associated t scores. Often used in hypothesis testing when the sample size is small and/or when the population variance is unknown.
TDigest Metric (TDIGEST): Creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.
Weibull Distribution (WEIBULDIST): Calculates probability from a continuous probability distribution that is commonly used to assess product reliability and analyze product life data and failure times.

Binomial Distribution (BINOMDIST)

This aggregate calculates the probability for X successes in N trials given a probability of success P for each trial.

Wikipedia Reference: Binomial Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:binomdist(data, n, k, "success_string")

Parameter	Data Type	Description
data	string	Column data
n	long	Number of trials
k	long	Number of successes in `n` trials
success_string	string	Defines the success string

Returns

Data Type	Description
double	Probability mass function value
double	Lower cumulative distribution: probability (<=k) under the area of distribution
double	Upper cumulative distribution: probability (>k) under the area of distribution
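
The three returned values can be illustrated with a short Python sketch. It takes the success probability `p` as an explicit argument, which is an assumption made for clarity: the description above does not say how the function derives `p` from the `data` column and `success_string`. The name `binomial_summary` is hypothetical.

```python
from math import comb


def binomial_summary(n, k, p):
    """Return (pmf at k, P(X <= k), P(X > k)) for a Binomial(n, p) variable."""
    def pmf(i):
        return comb(n, i) * p**i * (1 - p) ** (n - i)

    lower = sum(pmf(i) for i in range(k + 1))  # lower cumulative distribution
    return pmf(k), lower, 1.0 - lower


# 2 successes in 4 trials with a 50% chance of success per trial.
pmf_value, lower_cdf, upper_cdf = binomial_summary(4, 2, 0.5)
```

The lower and upper cumulative values always sum to 1.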

Chi-Squared Distribution (CHISQDIST)

This aggregate calculates probability that is often used in hypothesis testing to compare an observed distribution with a theoretical one. It also provides a way to show a relationship between two categorical variables.

Wikipedia Reference: Chi-Squared Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:chisqdist(data, s)

Parameter	Data Type	Description
data	double	Sample data
s	double	Population standard deviation

Returns

Data Type	Description
double	Mean of the distribution
double	Standard deviation of the distribution
double	Variance of the distribution
double	Chi-squared statistic: `[(n - 1) * s^2] / d^2` where `d` is the standard deviation of the population (supplied through the `s` parameter), `s` is the standard deviation of the sample, and `n` is the sample size
long	Number of samples: the degrees of freedom (`k`) is `(count-1)`
double	Probability density function value
double	Cumulative distribution: the probability for <= the chi-squared statistic
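
The chi-squared statistic in the table can be reproduced with a few lines of Python. This sketch covers only the statistic and the degrees of freedom; the function name `chi_squared_statistic` and the argument name `population_sd` are hypothetical, and the sample standard deviation is assumed to use the (n - 1) denominator.

```python
import statistics


def chi_squared_statistic(sample, population_sd):
    """Return ((n - 1) * s^2 / d^2, degrees of freedom) for a numeric sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    statistic = (n - 1) * s**2 / population_sd**2
    return statistic, n - 1


stat, dof = chi_squared_statistic([2, 4, 4, 4, 5, 5, 7, 9], 2.0)
```

Here the sample variance is 32/7, so the statistic works out to 7 * (32/7) / 4 = 8 with 7 degrees of freedom.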

Continuous Uniform Distribution (CONUNIDIST)

This aggregate calculates probability using a continuous probability distribution concerned with events that are equally likely to occur.

Wikipedia Reference: Continuous Uniform Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:conunidist(data, a, b)

Parameter	Data Type	Description
data	double	Column data
a	double	Minimum value of the probability interval
b	double	Maximum value of the probability interval

Returns

Data Type	Description
double	Cumulative distribution: probability under the area of distribution
double	Probability density function value
double	Differential entropy in nats
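
For a continuous uniform distribution on the interval [a, b], the three returned quantities have simple closed forms. The Python sketch below evaluates them at a single point `x`; the function name `uniform_summary` and the single-point interface are assumptions made for illustration, since the function above operates on a column of data.

```python
import math


def uniform_summary(x, a, b):
    """Return (cdf at x, pdf at x, differential entropy in nats) for Uniform(a, b)."""
    if x < a:
        cdf = 0.0
    elif x > b:
        cdf = 1.0
    else:
        cdf = (x - a) / (b - a)
    pdf = 1.0 / (b - a) if a <= x <= b else 0.0
    entropy = math.log(b - a)  # natural log, so the unit is nats
    return cdf, pdf, entropy


cdf_value, pdf_value, entropy_nats = uniform_summary(2.5, 0.0, 10.0)
```

The entropy depends only on the interval width, not on the point being evaluated.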

Discrete Uniform Distribution (DISCUNIDIST)

This aggregate calculates probability using a symmetric probability distribution where a finite number of values are equally likely to be observed and every one of n values has equal probability.

Wikipedia Reference: Discrete Uniform Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:discunidist(data, k)

Parameter	Data Type	Description
data	long	Column data
k	long	The number of outcomes

Returns

Data Type	Description
double	Cumulative distribution: probability under the area of distribution
double	Probability density function value
double	Differential entropy in nats

Exponential Distribution (EXPDIST)

This aggregate calculates probability using a distribution that describes time between events in a Poisson point process (where events occur continuously and independently at a constant average rate).

Wikipedia Reference: Exponential Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:expdist(data, x)

Parameter	Data Type	Description
data	long	Column data
x	double	The probability for the interval

Returns

Data Type	Description
double	Lower cumulative distribution: probability (<=x) under the area of distribution
double	Upper cumulative distribution: probability (>x) under the area of distribution
double	Probability density function value
double	Differential entropy in nats
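
The four returned quantities follow directly from the rate of the exponential distribution. The Python sketch below assumes that the rate is estimated as the reciprocal of the sample mean of the data; that estimation step is an assumption, as is the function name `exponential_summary`.

```python
import math


def exponential_summary(data, x):
    """Return (P(X <= x), P(X > x), pdf at x, differential entropy in nats)."""
    rate = len(data) / sum(data)  # assumption: rate = 1 / sample mean
    upper = math.exp(-rate * x)   # survival function
    lower = 1.0 - upper           # cumulative distribution function
    pdf = rate * upper
    entropy = 1.0 - math.log(rate)
    return lower, upper, pdf, entropy


lower_cdf, upper_cdf, density, entropy_nats = exponential_summary([1, 2, 3], 2.0)
```

With a sample mean of 2 the estimated rate is 0.5, so the upper cumulative value at x = 2 is exp(-1).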

Laplace Distribution (LAPLACEDIST)

This aggregate calculates probability using a distribution that represents differences between two independent variables that have identical exponential distributions (also called double exponential distribution).

Wikipedia Reference: Laplace Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:laplacedist(data, "c", x1, x2)

Parameter	Data Type	Description
data	double	Column data
c	string	"below", "above", "bet" (between), or "out" (outside)
x1	double	Lower number (>0) to find the probability
x2	double	Upper number (>0) to find the probability

Returns

Data Type	Description
double	Mean of the distribution
double	Scale parameter of the distribution
double	Standard deviation of the distribution
double	Variance of the distribution
double	Differential entropy in nats
double	Cumulative distribution: probability under the area of distribution
double	Probability density function value for x1
double	Probability density function value for x2

Log Normal Distribution (LOGNORDIST)

This aggregate calculates probability using the distribution of a random variable whose logarithm follows a normal distribution. The log normal distribution is widely used in risk analysis.

Wikipedia Reference: Log Normal Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:lognordist(data, "c", x1, x2)

Parameter	Data Type	Description
data	double	Column data
c	string	"below", "above", "bet" (between), or "out" (outside)
x1	double	Lower number (>0) to find the probability
x2	double	Upper number (>0) to find the probability

Returns

Data Type	Description
double	Mean of the distribution of natural logarithms
double	Standard deviation of the distribution of natural logarithms
double	Variance of the distribution
double	Differential entropy in nats
double	Cumulative distribution: probability under the area of distribution
double	Probability density function value for x1
double	Probability density function value for x2

Negative Binomial Distribution (NEGBINDIST)

This aggregate calculates probability using a discrete probability distribution that concerns the number of trials which must occur in order to have a predetermined number of successes.

Wikipedia Reference: Negative Binomial Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:negbindist("data", k, r, "success_string")

Parameter	Data Type	Description
data	string	Column data
k	long	Number of successes
r	long	Number of failures
success_string	string	Defines the success string

Returns

Data Type	Description
double	Probability mass function value
double	Lower cumulative distribution: probability (<=k) under the area of distribution
double	Upper cumulative distribution: probability (>k) under the area of distribution

Normal Distribution (NORMDIST)

This aggregate calculates probability using a continuous probability distribution of data in which the majority of data points are relatively similar, within a small range of values with few outliers.

Wikipedia Reference: Normal Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:normdist(data, "c", x1, x2)

Parameter	Data Type	Description
data	double	Column data
c	string	"below", "above", "bet" (between), or "out" (outside)
x1	double	Lower number (>0) to find the probability
x2	double	Upper number (>0) to find the probability

Returns

Data Type	Description
double	Mean of the distribution
double	Standard deviation of the distribution
double	Variance of the distribution
double	Differential entropy in nats
double	Cumulative distribution: probability under the area of distribution
double	Probability density function value for x1
double	Probability density function value for x2
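
The role of the `c` selector can be illustrated with Python's standard library. This sketch takes the mean and standard deviation as explicit arguments and assumes that "below" and "above" are evaluated at `x1`; both choices are assumptions, because the description above does not specify them. The function name `normal_probability` is hypothetical.

```python
from statistics import NormalDist


def normal_probability(mu, sigma, c, x1, x2):
    """Probability mass of Normal(mu, sigma) in the region selected by c."""
    dist = NormalDist(mu, sigma)
    between = dist.cdf(x2) - dist.cdf(x1)
    if c == "below":
        return dist.cdf(x1)        # assumption: evaluated at x1
    if c == "above":
        return 1.0 - dist.cdf(x1)  # assumption: evaluated at x1
    if c == "bet":
        return between
    if c == "out":
        return 1.0 - between
    raise ValueError(f"unknown selector: {c!r}")


within_one_sd = normal_probability(10.0, 2.0, "bet", 8.0, 12.0)
outside_one_sd = normal_probability(10.0, 2.0, "out", 8.0, 12.0)
```

The "bet" and "out" results are complements, so they sum to 1 for the same pair of bounds.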

Poisson Distribution (POISDIST)

This function calculates probability using a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, given that these events occur at a known constant rate and independently of the time since the last event.

Wikipedia Reference: Poisson Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:poisdist(data, k)

Parameter	Data Type	Description
data	long	Column data
k	long	Number of events whose probability of being observed in an interval is computed

Returns

Data Type	Description
double	Probability mass function value
double	Lower cumulative distribution: probability (<=k) under the area of distribution
double	Upper cumulative distribution: probability (>k) under the area of distribution
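
The three returned values can be sketched in Python for a known event rate. Passing the rate `lam` explicitly is an assumption made for clarity; the description above does not state how the rate is derived from the data column. The function name `poisson_summary` is hypothetical.

```python
import math


def poisson_summary(lam, k):
    """Return (pmf at k, P(X <= k), P(X > k)) for a Poisson(lam) variable."""
    def pmf(i):
        return lam**i * math.exp(-lam) / math.factorial(i)

    lower = sum(pmf(i) for i in range(k + 1))  # lower cumulative distribution
    return pmf(k), lower, 1.0 - lower


# Probability of observing exactly 1 event when 2 are expected per interval.
pmf_value, lower_cdf, upper_cdf = poisson_summary(2.0, 1)
```

For a rate of 2 and k = 1, the mass function value is 2 * exp(-2) and the lower cumulative value is 3 * exp(-2).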

Student's T-Distribution (TDIST)

This function calculates probability using the Student's t-distribution (and associated t scores), which is often used in hypothesis testing when the sample size is small and/or when the population variance is unknown.

Wikipedia Reference: Student's t-distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:tdist(data, M)

Parameter	Data Type	Description
data	double	Sample data
M	double	Population mean

Returns

Data Type	Description
double	Mean of the distribution
double	Standard deviation of the distribution
double	Variance of the distribution
double	T-statistic: `t = [ u - M ] / [ s / sqrt( N ) ]` where `u` is the sample mean, `M` is the population mean, `s` is the standard deviation of the sample, and `N` is the sample size
long	Number of samples: the degrees of freedom is `(count-1)`
double	Probability density function value
double	Cumulative distribution: the probability for <= the t-statistic
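
The t-statistic formula in the table can be checked with a short Python sketch. It covers only the statistic and the degrees of freedom; the function name `t_statistic` is hypothetical, and the sample standard deviation is assumed to use the (N - 1) denominator.

```python
import math
import statistics


def t_statistic(sample, population_mean):
    """Return ((u - M) / (s / sqrt(N)), degrees of freedom) for a numeric sample."""
    n = len(sample)
    u = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (N - 1 denominator)
    t = (u - population_mean) / (s / math.sqrt(n))
    return t, n - 1


t_value, dof = t_statistic([2, 4, 4, 4, 5, 5, 7, 9], 4.0)
```

For this sample the mean is 5 and the sample variance is 32/7, so the statistic equals the square root of 1.75 with 7 degrees of freedom.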

TDigest Metric (TDIGEST)

This function creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.

Wikipedia Reference: Computing Quantiles using T-Digests

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/statistics/sketch#>
sketch:tdigest(data, p, q, cdf)

Parameter	Data Type	Description
data	double	Column data
p	double	The percentile (0 - 100) to compute
q	double	The quantile (0.0 - 1.0) to compute
cdf	double	The CDF to use

Returns

Data Type	Description
double	Percentile: the value below which a given percentage of observations falls
double	Quantile: cut point dividing the observations in a sample
double	The computation of F(x) where F is the CDF of the distribution

Weibull Distribution (WEIBULDIST)

This function calculates probability from a continuous probability distribution commonly used to assess product reliability and analyze product life data and failure times.

Wikipedia Reference: Weibull Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:weibuldist(data, k, x)

Parameter	Data Type	Description
data	double	Sample data
k	double	The initial starting value for the shape parameter. A good guess is crucial to quick convergence.
x	double	The probability for a random variable

Returns

Data Type	Description
double	The mean of the distribution
double	The standard deviation of the distribution
double	The variance of the distribution
long	The count of the number of samples
double	The estimated shape parameter (k) of the distribution from the mean and variance using the root finding method
double	The estimated scale parameter (a) of the distribution from the mean and variance using the root finding method
double	Differential entropy in nats
double	Probability density function value
double	Lower cumulative distribution: probability (<=x) under the area of distribution
double	Upper cumulative distribution: probability (>x) under the area of distribution
long	The actual number of iterations performed to get an estimate of the k value
double	The mean calculated using estimated values of k and a
double	The variance calculated using estimated values of k and a

Bernoulli Distribution (BERNDIST)

The Bernoulli Distribution function determines the probability of success or failure (or Yes or No) in tests that have only two possible outcomes.

Wikipedia Reference: Bernoulli Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:berndist("data", prob, "success_string")

Parameter	Data Type	Description
data	string	Column data
prob	boolean	Whether to return the probability of success (true) or failure (false)
success_string	string	The success message

Returns

Data Type	Description
double	The Bernoulli distribution probability

Beta-Binomial Distribution (BETABINDIST)

The Beta-Binomial Distribution function computes probability using a combination of both binomial and beta probability distributions.

Wikipedia Reference: Beta-Binomial Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:betabindist(k, n, alpha, beta)

Parameter	Data Type	Description
k	double	The value `k` for which to compute the probability
n	double	The number of trials
alpha, beta	double	Shape parameters

Returns

Data Type	Description
double	The probability of occurrence k for a beta binomial n, alpha, beta

Hypergeometric Distribution (HYPGEODIST)

The Hypergeometric Distribution function calculates probability from a distribution often used to predict the outcome of a process in which different elements are randomly drawn from a collection and not replaced.

Wikipedia Reference: HyperGeometric Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:hypgeodist("data", n, k, "success_string")

Parameter	Data Type	Description
data	string	Column data
n	int	The number of trials
k	int	The number of successes in n trials
success_string	string	The success message

Returns

Data Type	Description
double	The hypergeometric distribution probability

Logarithmic (Series) Distribution (LOGSERDIST)

This function calculates probability using a discrete probability distribution derived from the Maclaurin series expansion.

Wikipedia Reference: Logarithmic (Series) Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:logserdist("data", k, "success_string")

Parameter	Data Type	Description
data	string	Column data
k	long	The value `k` for which to compute the probability
success_string	string	The success message

Returns

Data Type	Description
double	The logarithmic distribution probability

Skellam Distribution (SKELLAMDIST)

This function calculates probability using the Skellam distribution, which models the difference between two independent Poisson-distributed variables.

Wikipedia Reference: Skellam Distribution

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:skellamdist(n1_data, n2_data, k)

Parameter	Data Type	Description
n1_data	long	N1 column data
n2_data	long	N2 column data
k	long	The value `k` for which to compute the probability

Returns

Data Type	Description
double	The Skellam probability value

Entropy

Cross Entropy (CROSSENTROPY): Computes cross-entropy, which is commonly used to quantify the difference between two probability distributions.
Discrete Entropy Metric (DISCENTROPY): Calculates discrete entropy for maps on finite sets.
Differential Entropy or Continuous Entropy Metrics: Compute differential entropy (also referred to as continuous entropy), which is entropy defined for distributions with a continuous random variable.

Cross Entropy (CROSSENTROPY)

This function computes cross-entropy, which is commonly used to quantify the difference between two probability distributions.

Wikipedia Reference: Cross Entropy

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:crossentropy(p, q)

Parameter	Data Type	Description
p	double	True probabilities for x
q	double	Predicted probabilities for x

Returns

Data Type	Description
double	The cross entropy value
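
Cross-entropy between a true distribution p and a predicted distribution q is the negative sum of p(x) * log q(x). The Python sketch below uses the natural logarithm, which is an assumption about the unit, and the function name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(p, q):
    """Cross-entropy in nats between true probabilities p and predictions q."""
    # Terms with p_i == 0 contribute nothing and are skipped.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


certain_truth = cross_entropy([1.0, 0.0], [0.5, 0.5])
good_prediction = cross_entropy([1.0, 0.0], [0.9, 0.1])
```

The value shrinks as the predicted probabilities move toward the true ones, which is why cross-entropy is a common loss for classifiers.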

Discrete Entropy Metric (DISCENTROPY)

This function calculates entropy for maps on finite sets, referred to as discrete entropy.

ScienceDirect Reference: Discrete Entropy

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:discentropy("data")

Parameter	Data Type	Description
data	string	Column data

Returns

Data Type	Description
double	The discrete entropy value
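
Discrete (Shannon) entropy can be sketched in Python by counting how often each distinct value occurs. The natural logarithm is assumed here, so the result is in nats; the unit used by `stats:discentropy` is not stated above, and the function name `discrete_entropy` is hypothetical.

```python
import math
from collections import Counter


def discrete_entropy(values):
    """Shannon entropy in nats of the empirical distribution of values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


balanced = discrete_entropy(["a", "b", "a", "b"])
constant = discrete_entropy(["a", "a", "a"])
```

A column with a single repeated value has zero entropy, and entropy is largest when all distinct values are equally frequent.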

Differential Entropy or Continuous Entropy Metrics

Differential entropy (also referred to as continuous entropy) is entropy that can be computed for distributions with a continuous random variable.

Wikipedia Reference: Differential entropy

The following functions produce entropy calculations:

Continuous Uniform Distribution (CONUNIDIST)
Discrete Uniform Distribution (DISCUNIDIST)
Exponential Distribution (EXPDIST)
Laplace Distribution (LAPLACEDIST)
Log Normal Distribution (LOGNORDIST)
Normal Distribution (NORMDIST)
Weibull Distribution (WEIBULDIST)

Feature Exploration

Principal Component Analysis (PCA): Reduces a high-dimensional dataset into fewer dimensions while retaining important information, which makes it easier to explore and visualize data.
Singular Value Decomposition (SVD): Similar to PCA, except that the factorization for SVD is done on the data matrix, whereas the factorization is done on the covariance matrix with PCA.
Linear Discriminant Analysis (LDA): Uses dimensionality reduction and a classifier to make predictions.

Principal Component Analysis (PCA)

Applies principal component analysis (PCA) to create combined eigenvalues and eigenvectors that highlight patterns in a dataset, making it easier to explore and visualize data.

Wikipedia Reference: Principal Component Analysis

There are several options available for principal component analysis:

pca::create
pca::get_eigvec
pca::get_eigval
transform
dump_tensor
pca::get_raw_eigval

pca::create

Applies PCA to create combined eigenvalues and eigenvectors.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::create(x1, x2, ..., xn)

Parameter	Data Type	Description
x1–xn	double	Feature column datasets

Returns

Data Type	Description
http://anzograph.com/matrices#feature_result	PCA data containing eigenvalues and eigenvectors

pca::get_eigvec

Retrieves PCA's eigenvectors as a matrix from the PCA data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_eigvec(pca_data)

Parameter	Data Type	Description
pca_data	http://anzograph.com/matrices#feature_result	Principal Component Analysis data

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Eigenvectors as a matrix

pca::get_eigval

Retrieves PCA's eigenvalues as a column vector from PCA data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_eigval(pca_data)

Parameter	Data Type	Description
pca_data	http://anzograph.com/matrices#feature_result	Principal Component Analysis data

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Eigenvalues in descending order as column vectors

transform

Applies PCA to transform samples onto the new subspace.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:transform(pca_data, d, x1, x2, ..., xn)

Parameter	Data Type	Description
pca_data	http://anzograph.com/matrices#feature_result	Principal Component Analysis data
d	int	The number of eigenvectors to consider from the end
x1–xn	double	Feature column datasets

Returns

Data Type	Description
double	Sample data transformed into the tuple of lower dimensional space

dump_tensor

Gets a string representation of a vector or matrix in row-wise or column-wise order.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])

Parameter	Data Type	Description
m	http://anzograph.com/matrices#tensor	A tensor of matrix/row vector/column vector
type	int	Optional argument that specifies the type of tensor: `0`=row vector, `1`=column vector, `2`=matrix. Default is `2`.
isRowWise	boolean	Optional argument that indicates whether the matrix is displayed column-wise or row-wise: `false`=column-wise, `true`=row-wise. Default is `true`.

Returns

Data Type	Description
string	String representation of the vector or matrix in row-wise or column-wise order

pca::get_raw_eigval

Gets the PCA's unsorted eigenvalues from the PCA data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_raw_eigval(pca_data)

Parameter	Data Type	Description
pca_data	http://anzograph.com/matrices#feature_result	Principal Component Analysis data

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Eigenvalues in unsorted order as column vectors

Singular Value Decomposition (SVD)

A matrix factorization method, singular value decomposition (SVD), creates combined singular values and right singular vectors.

Wikipedia Reference: Singular Value Decomposition

There are several options available for singular value decomposition:

svd::create
svd::get_sigval
svd::get_sigvec
transform
dump_tensor

svd::create

Applies SVD to create combined singular values and right singular vectors.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::create(x1, x2, ..., xn)

Parameter	Data Type	Description
x1–xn	double	Feature column datasets

Returns

Data Type	Description
http://anzograph.com/matrices#feature_result	SVD data containing singular values and right singular vectors

svd::get_sigval

Gets SVD's singular values as a column vector from the SVD data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::get_sigval(svd_data)

Parameter	Data Type	Description
svd_data	http://anzograph.com/matrices#feature_result	SVD data

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Singular values in descending order as a column vector

svd::get_sigvec

Gets SVD's right singular vectors as a matrix from the SVD data.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::get_sigvec(svd_data)

Parameter	Data Type	Description
svd_data	http://anzograph.com/matrices#feature_result	SVD data

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Right singular vectors as a matrix

transform

Applies PCA or SVD to transform samples onto the new subspace.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:transform(svd_data, d, x1, x2, ..., xn)

Parameter	Data Type	Description
svd_data	http://anzograph.com/matrices#feature_result	SVD data
d	int	The number of singular vectors to consider from the start
x1–xn	double	Feature column datasets

Returns

Data Type	Description
string	Sample data transformed into the tuple of lower dimensional space

dump_tensor

Returns a string representation of a vector or matrix in row-wise or column-wise order.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])

Parameter	Data Type	Description
m	http://anzograph.com/matrices#tensor	A tensor of matrix/row vector/column vector
type	int	Optional argument that specifies the type of tensor: `0`=row vector, `1`=column vector, `2`=matrix. Default is `2`.
isRowWise	boolean	Optional argument that indicates whether the display matrix is column- or row-wise: `false`=column-wise, `true`=row-wise. Default is `true`.

Returns

Data Type	Description
string	String representation of the vector or matrix in row-wise or column-wise order

Linear Discriminant Analysis (LDA)

This function applies linear discriminant analysis (LDA) to create combined eigenvalues and vectors that characterize or separate two or more classes of objects or events.

Wikipedia Reference: Linear Discriminant Analysis

There are several options available for linear discriminant analysis:

lda::create
lda::get_eigvec
lda::get_eigval
lda::transform
dump_tensor
lda::predict
lda::get_raw_eigval

lda::create

Applies LDA to create combined eigenvalues and eigenvectors.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::create(y, x1, x2, ..., xn)

Parameter	Data Type	Description
y	double	Class of feature tuple
x1–xn	double	Feature column datasets

Returns

Data Type	Description
http://anzograph.com/matrices#lda_result	Combined eigenvalues, eigenvectors, class mean, count, and class map

lda::get_eigvec

Given LDA data, this function gets LDA's eigenvectors as a matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_eigvec(lda_data)

Parameter	Data Type	Description
lda_data	http://anzograph.com/matrices#lda_result	Linear discriminant analysis data

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Eigenvectors as a matrix

lda::get_eigval

Given LDA data, this function gets LDA's eigenvalues as a column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_eigval(lda_data)

Parameter	Data Type	Description
lda_data	http://anzograph.com/matrices#lda_result	Linear discriminant analysis data

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Eigenvalues in descending order as a column vector

lda::transform

Applies LDA to transform samples onto the new subspace.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::transform(lda_data, d, x1, x2, ..., xn)

Parameter	Data Type	Description
lda_data	http://anzograph.com/matrices#lda_result	Linear discriminant analysis data
d	int	The number of eigenvectors to consider from the start
x1–xn	double	Feature column datasets

Returns

Data Type	Description
double	Original data transformed into the tuple of lower dimensional space

dump_tensor

Returns a string representation of a vector or matrix in row- or column-wise order.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])

Parameter	Data Type	Description
m	http://anzograph.com/matrices#tensor	A tensor of matrix/row vector/column vector
type	int	Optional argument that specifies the type of tensor: `0`=row vector, `1`=column vector, `2`=matrix. Default is `2`.
isRowWise	boolean	Optional argument that indicates whether the display matrix is column- or row-wise: `false`=column-wise, `true`=row-wise. Default is `true`.

Returns

Data Type	Description
string	String representation of the vector or matrix in row-wise or column-wise order

lda::predict

Predicts the class for the samples using LDA as the classifier.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::predict(lda_data, p1, p2, ..., pn)

Parameter	Data Type	Description
lda_data	http://anzograph.com/matrices#lda_result	LDA data
p1–pn	double	Data sample whose class to predict

Returns

Data Type	Description
string	The name of the class to which the data tuple belongs

lda::get_raw_eigval

Given LDA data, this function gets LDA's unsorted eigenvalues.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_raw_eigval(lda_data)

Parameter	Data Type	Description
lda_data	http://anzograph.com/matrices#lda_result	LDA data

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Eigenvalues in unsorted order as a column vector

Linear Algebra

Gramian Matrix (GRAMIAN): Creates a Gramian matrix commonly used to compute linear independence.

Gramian Matrix (GRAMIAN)

This function creates a Gramian matrix commonly used to compute linear independence.

Wikipedia Reference: Gramian Matrix

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:gramian(x1, x2, ..., xn)

Parameter	Data Type	Description
x1–xn	double	Feature column datasets

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Gramian matrix
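
To show why a Gramian matrix is useful for testing linear independence, the following Python sketch builds the matrix of pairwise dot products for a few feature columns and checks its determinant. The helper names `gramian` and `det2` and the sample columns are hypothetical, and the sketch assumes that each feature column is treated as one vector.

```python
def gramian(*columns):
    """Return the Gram matrix G where G[i][j] is the dot product of columns i and j."""
    length = len(columns[0])
    if any(len(c) != length for c in columns):
        raise ValueError("all columns must have the same length")
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

def det2(g):
    """Determinant of a 2 x 2 matrix."""
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]

x1 = [1.0, 2.0, 3.0]
x2 = [2.0, 4.0, 6.0]   # a multiple of x1, so the pair is linearly dependent
x3 = [1.0, 0.0, -1.0]  # not a multiple of x1

print(det2(gramian(x1, x2)))  # zero determinant: dependent columns
print(det2(gramian(x1, x3)))  # non-zero determinant: independent columns
```

A set of vectors is linearly independent exactly when the determinant of its Gramian matrix is non-zero.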

Profiling Metrics

Discrete Probability Metrics: Calculate a discrete probability distribution of values.
Geometric Mean Metric (GMEAN): Calculates the geometric mean, defined as the nth root of the product of n positive numbers.
Percentile Metric (PERCENTILE): Calculates the percentile (1 to 100) of numeric values.
Skew Metric (SKEWCOEFF): Calculates Pearson’s coefficient of skewness on numeric values.
TDigest Metric (TDIGEST): Creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.

Discrete Probability Metrics

These functions calculate a discrete probability distribution of values.

Wikipedia Reference: Discrete Probability

The following functions produce discrete probability metrics:

Bernoulli Distribution (BERNDIST)
Binomial Distribution (BINOMDIST)
Negative Binomial Distribution (NEGBINDIST)
Poisson Distribution (POISDIST)

Geometric Mean Metric (GMEAN)

This function calculates the geometric mean, defined as the nth root of the product of n positive numbers.

Wikipedia Reference: Geometric Mean

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:gmean(data)

Parameter	Data Type	Description
data	double	Column data

Returns

Data Type	Description
double	The geometric mean value
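
The definition above can be sketched in a few lines of Python. The function name `gmean` simply mirrors the name in this section. The sketch averages logarithms instead of multiplying the values directly, a common way to avoid overflow on long columns; it is not a statement about how `stats:gmean` is implemented.

```python
import math

def gmean(values):
    """Geometric mean: the nth root of the product of n positive numbers."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    # exp(mean(log(v))) equals the nth root of the product of the values.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# 1 * 4 * 16 = 64, and the cube root of 64 is 4.
print(gmean([1.0, 4.0, 16.0]))
```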

Percentile Metric (PERCENTILE)

This function calculates the percentile (1 to 100) of numeric values.

Wikipedia Reference: Percentile Metric

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:percentile(data, p)

Parameter	Data Type	Description
data	double	Dataset
p	double	The percentile (0 - 100) to compute

Returns

Data Type	Description
double	The percentile value
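
The following Python sketch shows one common way to compute a percentile: sort the data and interpolate linearly between the two closest ranks. The interpolation rule is an assumption made for illustration. This section does not state which rule `stats:percentile` applies, so results for small datasets may differ.

```python
def percentile(data, p):
    """Return the p-th percentile (0 to 100) using linear interpolation."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)
    # Fractional position of the requested percentile in the sorted data.
    pos = (len(ordered) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

print(percentile([5, 1, 3, 2, 4], 50))  # the median of the five values
print(percentile([1, 2, 3, 4], 25))     # falls between the first two values
```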

Skew Metric (SKEWCOEFF)

This function calculates Pearson’s coefficient of skewness on numeric values.

Wikipedia Reference: Skewness

Syntax

PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:skewcoeff(data, dp)

Parameter	Data Type	Description
data	double	Dataset
dp	int	Number of decimal points to consider for the input data

Returns

Data Type	Description
double	The mode (value that appears most frequently)
double	The median number in an ordered set of data
double	The average value
double	The standard deviation
double	Pearson mode skewness or first skewness coefficient
double	Pearson median skewness or second skewness coefficient
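
The six returned values are related by the standard formulas for Pearson's skewness coefficients: the first coefficient is (mean - mode) / standard deviation, and the second is 3 * (mean - median) / standard deviation. The Python sketch below computes all six under two assumptions that this section does not confirm: the input is rounded to `dp` decimal places before anything is computed, and the standard deviation is the population standard deviation. The function name `skew_metrics` is hypothetical.

```python
import statistics

def skew_metrics(data, dp):
    """Return (mode, median, mean, stdev, mode skewness, median skewness)."""
    # Assumption: values are rounded to dp decimal places before any statistic is computed.
    values = [round(float(x), dp) for x in data]
    mode = statistics.mode(values)
    median = statistics.median(values)
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation (an assumption)
    if stdev == 0:
        raise ValueError("skewness is undefined when all values are equal")
    mode_skew = (mean - mode) / stdev          # first skewness coefficient
    median_skew = 3 * (mean - median) / stdev  # second skewness coefficient
    return mode, median, mean, stdev, mode_skew, median_skew

result = skew_metrics([1, 2, 2, 3, 7], 0)
print(result)
```

Both coefficients are positive for this sample because the large value 7 pulls the mean above the mode and the median.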

Sketches

Cardinality Metric (HLL): Uses Apache DataSketches HyperLogLog (HLL) to calculate cardinality estimates for a dataset.
Frequent Items (FI): Collection of functions used to create frequency sketches and obtain information about frequent items.
Quantile/Rank Sketch (KLL): Collection of functions that use the KLL sketch computation model to approximate minimum and maximum items in a dataset, the quantile and rank of items, the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF).
Theta Sketch (THETA): Collection of functions that use the Theta Sketch framework to compute estimates of the cardinality, union, intersection, and difference set operations and return a Theta Sketch.

Cardinality Metric (HLL)

This aggregate calculates cardinality estimates for a dataset using Apache DataSketches HyperLogLog (HLL).

Reference: Cardinality Prominence Metric

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:hll(data [, log_base_2_K ] [, hll_target_type ])

Parameter	Data Type	Description
data	byte, short, int, long, float, double, string, URI	The dataset
log_base_2_K	int	Optional argument that specifies the log base 2 of K, where K is the number of buckets or slots for the sketch. Must be between 4 and 21 (inclusive). Default value is `12`.
hll_target_type	int	Optional argument that specifies the target type for the HLL sketch. Supported values are `4` (HLL_4), `6` (HLL_6), or `8` (HLL_8). Default value is `4`.

Returns

Data Type	Description
double	The cardinality metric value

Frequent Items (FI)

The FI aggregate is used to estimate the frequency of items in a dataset, the upper and lower bounds of the items, the number of active items, and the total stream weight. FI returns a binary stream (Frequent Items Sketch) containing all of the computed values. Values can be retrieved from the sketch using the get_estimates, get_active_items_total_weights, get_top_items, and get_top_strings functions.

For more information about frequency sketches, see Frequency Sketches Overview.

FI Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi(values [, weight ])

Parameter	Data Type	Description
values	short, int, long, float, double, string	The dataset
weight	long	Optional argument that specifies the weight of each item in `values`. The default value is `1`.

Returns

Data Type	Description
http://anzograph.com/statistics#fi_sketch	Binary Frequent Items Sketch

Frequent Items Sketch Retrieval Functions

The following functions are available for retrieving values from a Frequent Items Sketch:

fi::get_estimates
fi::get_active_items_total_weights
fi::get_top_items
fi::get_top_strings

fi::get_estimates

Returns the estimates for the frequency and lower and upper bound of the given item in a sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_estimates(fi_sketch, item)

Parameter	Data Type	Description
fi_sketch	http://anzograph.com/statistics#fi_sketch	Frequent Items Sketch
item	Object	Item for which to get estimates

Returns

Data Type	Description
long	Frequency estimate for the item
long	Lower bound estimate for the item
long	Upper bound estimate for the item
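
To give a feel for why a frequency sketch reports a lower and an upper bound alongside each estimate, the Python sketch below uses the textbook Misra-Gries algorithm. This is a simplified stand-in chosen for illustration, not necessarily the algorithm behind the FI aggregate. The names `misra_gries` and `get_estimates`, the `capacity` parameter, and the choice to report the lower bound as the estimate are all assumptions of this sketch.

```python
def misra_gries(stream, capacity):
    """Summarise item frequencies using at most `capacity` counters."""
    counters = {}
    offset = 0  # number of times every counter has been decremented
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop those that reach zero.
            offset += 1
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, offset

def get_estimates(summary, item):
    """Return (estimate, lower bound, upper bound) for the item's frequency."""
    counters, offset = summary
    lower = counters.get(item, 0)
    upper = lower + offset
    return lower, lower, upper

stream = ["a"] * 5 + ["b"] * 3 + ["c"]
summary = misra_gries(stream, 2)
print(get_estimates(summary, "a"))  # the true frequency 5 lies within the bounds
print(get_estimates(summary, "c"))  # "c" was never tracked, so its lower bound is 0
```

The gap between the bounds grows with the number of decrements, which is why a sketch with more counters gives tighter estimates.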

fi::get_active_items_total_weights

Returns the number of active items and the estimated total stream weight from a sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_active_items_total_weights(fi_sketch)

Parameter	Data Type	Description
fi_sketch	http://anzograph.com/statistics#fi_sketch	Frequent Items Sketch

Returns

Data Type	Description
long	The estimated number of active items
long	The estimated total stream weight

fi::get_top_items

Returns the most frequent items and their corresponding frequency.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_top_items(fi_sketch)

Parameter	Data Type	Description
fi_sketch	http://anzograph.com/statistics#fi_sketch	Frequent Items Sketch

Returns

Data Type	Description
double	The item with the highest frequency
long	Frequency estimate of the first item
double	The item with the second highest frequency
long	Frequency estimate of the second item
double	The item with the nth highest frequency
long	Frequency estimate of the nth item

fi::get_top_strings

Returns the most frequent strings and their corresponding frequencies.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_top_strings(fi_sketch)

Parameter	Data Type	Description
fi_sketch	http://anzograph.com/statistics#fi_sketch	Frequent Items Sketch

Returns

Data Type	Description
string	The string with the highest frequency
long	Frequency estimate of the first string
string	The string with the second highest frequency
long	Frequency estimate of the second string
string	The string with the nth highest frequency
long	Frequency estimate of the nth string

Quantile/Rank Sketch (KLL)

The KLL aggregate uses the KLL Sketch computation model to calculate the approximate minimum and maximum items in a dataset, the quantile and rank of items, the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF). KLL returns a binary stream (KLL Sketch) containing all of the computed values. Values can be retrieved from the sketch using various functions.

For more information about KLL sketches, see KLL Sketch.

KLL Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll(values [, k ])

Parameter	Data Type	Description
values	short, int, long, float, double, string	The dataset
k	int	Optional argument that configures the size of the sketch and its estimation error. Can be any value between 8 and 65535 (inclusive). The default value is `200`, which results in a normalized rank error of about 1.65%. Higher values will have a smaller error but the sketch will be larger (and slower).

Returns

Data Type	Description
http://anzograph.com/statistics#kll_sketch	Binary KLL sketch

KLL Sketch Retrieval Functions

The following functions are available for retrieving values from a KLL sketch:

kll::get_min_value
kll::get_max_value
kll::get_n
kll::get_num_retained
kll::get_rank
kll::get_quantile
kll::get_quantiles
kll::get_quantiles_str
kll::get_pmf
kll::get_cdf

kll::get_min_value

Returns the minimum value in a KLL sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_min_value(kll_sketch)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch

Returns

Data Type	Description
double	The minimum value in the sketch
string	If the input is a string, the minimum string is returned

kll::get_max_value

Returns the maximum value in a KLL sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_max_value(kll_sketch)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch

Returns

Data Type	Description
double	The maximum value in the sketch
string	If the input is a string, the maximum string is returned

kll::get_n

Returns the length of a KLL sketch, that is, the number of items in the input stream that the sketch summarizes.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_n(kll_sketch)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch

Returns

Data Type	Description
long	The length of the sketch

kll::get_num_retained

Returns the number of retained items (samples) in a sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_num_retained(kll_sketch)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch

Returns

Data Type	Description
long	The number of retained items (samples) in the sketch

kll::get_rank

Returns an approximation of the normalized (fractional) rank of the given item.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_rank(kll_sketch, v)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch
v	double	The item to retrieve the rank for

Returns

Data Type	Description
double	The approximate rank of the item from 0 - 1 (inclusive)

kll::get_quantile

Returns an approximation of the item value at the given normalized (fractional) rank.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantile(kll_sketch, fraction)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch
fraction	double	The fractional position in the hypothetical sorted stream

Returns

Data Type	Description
double	An approximation of the value of the item that would be preceded by the given fraction of a hypothetical sorted version of the sketch
string	An approximation of the string when the input is a string
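
Rank and quantile are inverse lookups on the sorted stream, which the following Python sketch computes exactly on a small list. A KLL sketch approximates the same answers without keeping every item. The function names are hypothetical, and the sketch assumes that the rank of a value is the fraction of items strictly less than it; the convention used by `kll::get_rank` is not stated in this section.

```python
import bisect

def exact_rank(sorted_values, v):
    """Fraction of items strictly less than v, between 0 and 1."""
    return bisect.bisect_left(sorted_values, v) / len(sorted_values)

def exact_quantile(sorted_values, fraction):
    """Item preceded by the given fraction of the sorted values."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    index = min(int(fraction * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]

values = [float(i) for i in range(1, 101)]  # 1.0, 2.0, ..., 100.0, already sorted

print(exact_rank(values, 51.0))     # half of the items are below 51
print(exact_quantile(values, 0.5))  # the item preceded by half of the stream
```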

kll::get_quantiles

Provides a more efficient, multiple-query version of kll::get_quantile that enables you to specify a number of evenly spaced fractional ranks.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantiles(kll_sketch, f1, f2, ..., f10)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch
f1–f10	double	Normalized or fractional ranks in the hypothetical sorted stream. The ranks must be in the interval 0.0 - 1.0 (inclusive).

Returns

Data Type	Description
double	An approximation of the values in the same order as the given fractional positions

kll::get_quantiles_str

Provides the string counterpart of kll::get_quantiles: returns approximations of the strings at the given fractional ranks when the input is a string type.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantiles_str(kll_sketch, f1, f2, ..., f10)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch
f1–f10	double	Normalized or fractional ranks in the hypothetical sorted stream. The ranks must be in the interval 0.0 - 1.0 (inclusive).

Returns

Data Type	Description
string	An approximation of the strings

kll::get_pmf

Provides an approximation to the Probability Mass Function (PMF) of the input stream.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_pmf(kll_sketch, v1, v2, ..., v10)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch
v1–v10	Object	Input values between the minimum and maximum values of the input stream. Values must be unique and monotonically increasing.

Returns

Data Type	Description
double	PMF values corresponding to the input

kll::get_cdf

Provides an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF of the input stream.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_cdf(kll_sketch, v1, v2, ..., v10)

Parameter	Data Type	Description
kll_sketch	http://anzograph.com/statistics#kll_sketch	KLL sketch
v1–v10	Object	Input values between the minimum and maximum values of the input stream. Values must be unique and monotonically increasing.

Returns

Data Type	Description
double	CDF values corresponding to the input
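
The Python sketch below computes an exact PMF and CDF for a set of split points, which is the quantity that `kll::get_pmf` and `kll::get_cdf` approximate. It follows the Apache DataSketches convention in which m split points divide the stream into m + 1 intervals. Whether the AnzoGraph DB functions also return the value for the final interval is an assumption here, and the function names are hypothetical.

```python
import bisect

def exact_pmf(values, split_points):
    """Fraction of items falling in each interval defined by the split points."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of items strictly below each split point, followed by the total count.
    bounds = [bisect.bisect_left(ordered, s) for s in split_points] + [n]
    masses, previous = [], 0
    for b in bounds:
        masses.append((b - previous) / n)
        previous = b
    return masses

def exact_cdf(values, split_points):
    """Running total of the PMF values."""
    total, out = 0.0, []
    for mass in exact_pmf(values, split_points):
        total += mass
        out.append(total)
    return out

data = list(range(1, 11))  # the integers 1 through 10
print(exact_pmf(data, [3, 7]))
print(exact_cdf(data, [3, 7]))
```

The split points must be unique and increasing, as required for the `v1`–`v10` arguments, because each interval is defined by two neighboring split points.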

Theta Sketch (THETA)

The THETA aggregate uses the Theta Sketch framework to compute estimates of the cardinality, union, intersection, and difference set operations and return a binary stream (Theta Sketch) containing the computed values. Values can be retrieved from the sketch using the cardinality, union, intersection, and difference functions.

Theta Sketches are a generalization of the well-known Kth Minimum Value (KMV) sketches.

THETA Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta(values)

Parameter	Data Type	Description
values	short, int, long, float, double, string	The dataset to operate on

Returns

Data Type	Description
http://anzograph.com/statistics#theta_sketch	Binary Theta Sketch
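
Because Theta Sketches generalize Kth Minimum Value (KMV) sketches, a small KMV example can help build intuition. The Python sketch below hashes each value to a number between 0 and 1, keeps the k smallest distinct hashes, and estimates the distinct count from the kth smallest one. It is a simplified, non-streaming illustration rather than the Theta Sketch implementation, and every function name in it is hypothetical.

```python
import hashlib

def hash01(value):
    """Map a value to a pseudo-random float in the unit interval using SHA-256."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0 ** 64

def kmv_sketch(values, k):
    """Keep the k smallest distinct hash values."""
    return sorted({hash01(v) for v in values})[:k]

def kmv_cardinality(sketch, k):
    """Estimate the number of distinct values summarised by the sketch."""
    if len(sketch) < k:
        return float(len(sketch))   # fewer than k distinct hashes: the count is exact
    return (k - 1) / sketch[k - 1]  # classic KMV estimator

def kmv_union(a, b, k):
    """Sketch of the union: the k smallest hashes found in either sketch."""
    return sorted(set(a) | set(b))[:k]

k = 256
left = kmv_sketch(range(0, 6000), k)
right = kmv_sketch(range(4000, 10000), k)
merged = kmv_union(left, right, k)
print(kmv_cardinality(merged, k))  # an estimate of the 10000 distinct values
```

The union is computed from the two sketches alone, without revisiting the original data, which is what makes set operations on sketches inexpensive.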

Theta Sketch Retrieval Functions

The following functions are available for retrieving values from a Theta Sketch:

theta::cardinality
theta::union
theta::intersection
theta::difference

theta::cardinality

Retrieves the estimated count of values in a Theta Sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::cardinality(theta_sketch)

Parameter	Data Type	Description
theta_sketch	http://anzograph.com/statistics#theta_sketch	Binary Theta Sketch

Returns

Data Type	Description
double	The count of items in the sketch

theta::union

Retrieves the estimate of the number of items that are in the union of two or more Theta Sketches.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::union(theta_sketch1, theta_sketch2 [, theta_sketchN ])

Parameter	Data Type	Description
theta_sketch1–N	http://anzograph.com/statistics#theta_sketch	Any number of Theta Sketches

Returns

Data Type	Description
double	The estimated number of items in the union

theta::intersection

Retrieves the estimate of the number of items that are in the intersection between two or more Theta Sketches.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::intersection(theta_sketch1, theta_sketch2 [, theta_sketchN ])

Parameter	Data Type	Description
theta_sketch1–N	http://anzograph.com/statistics#theta_sketch	Any number of Theta Sketches

Returns

Data Type	Description
double	The estimated number of items that intersect in the sketches

theta::difference

Retrieves the estimate of the number of items that are in the difference between two Theta Sketches, i.e., the number of items that are in the first sketch but not in the second sketch.

Syntax

PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::difference(a, b)

Parameter	Data Type	Description
a	http://anzograph.com/statistics#theta_sketch	The first Theta Sketch
b	http://anzograph.com/statistics#theta_sketch	The Theta Sketch to compare to sketch `a`

Returns

Data Type	Description
double	The estimated number of items in the difference between the sketches

Matrix Utilities

This collection of functions returns information on various attributes of vector space mapping and related matrix tensors.

make_matrix
make_tensor_from_string
make_vec
dump_tensor
dump_vec
get_rows
get_cols
get_slices
get_order
get_total_elem
get_nonzero
get_elem
get_diag
getmax_val
getmin_val
get_subvec
subvec_head
subvec_tail
cosine_similarity
subview_col
subview_row
subview_rows
subview_cols
subview_mat
subview_head_rows
subview_head_cols
subview_tail_rows
subview_tail_cols
flatten_as_col
flatten_as_row
is_vec
is_rowvec
is_colvec
is_sorted
is_tri_mat_upper
is_tri_mat_lower
is_diag_mat
is_square
is_symmetric
is_hermitian
has_nan
vec_all
vec_any
mat_all
mat_any

make_matrix

Creates a matrix of doubles with the given dimensions and values.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_matrix(m, n [, v1, v2, ..., vn ])

Parameter	Data Type	Description
m	int	Number of rows for the new matrix
n	int	Number of columns for the new matrix
v1–vn	double	Optional arguments that specify the row-wise matrix elements to include. Default value is `0` for all elements.

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation for m x n matrix of doubles
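
The row-wise ordering of the `v1`–`vn` arguments can be pictured with the following Python sketch, which fills an m x n list of lists from a flat sequence of values. It assumes that either no values or exactly m * n values are supplied; how `make_matrix` handles a partial list is not covered here. The Python function only mimics the argument order and is not the AnzoGraph DB implementation.

```python
def make_matrix(m, n, *values):
    """Build an m x n matrix of floats, filling rows first."""
    if values and len(values) != m * n:
        raise ValueError("expected either no values or exactly m * n values")
    if not values:
        values = [0.0] * (m * n)  # default: every element is 0
    # Element (r, c) comes from position r * n + c of the flat value list.
    return [[float(values[r * n + c]) for c in range(n)] for r in range(m)]

print(make_matrix(2, 3, 1, 2, 3, 4, 5, 6))  # first row 1, 2, 3 and second row 4, 5, 6
print(make_matrix(2, 2))                    # all zeros
```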

make_tensor_from_string

Constructs a tensor from a string of row-wise elements and an optional number of columns.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_tensor_from_string(s [, n ])

Parameter	Data Type	Description
s	string	String that contains the row-wise elements for constructing the tensor
n	int	Optional argument that specifies the number of columns to include in the tensor. The default value is `0`, which constructs a row vector. A value of `1` constructs a column vector. A value that is greater than 1 constructs a matrix with the specified number of columns.

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	A tensor of doubles

make_vec

Constructs a row vector that stores the given value at the given index.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_vec(n, v)

Parameter	Data Type	Description
n	int	The index into the vector
v	double	The value to be stored in the vector at the nth index

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	A row vector

dump_tensor

Displays the Armadillo header and the first few elements of the matrix or vector as a string.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(b [, type ] [, isRowWise ])

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor of matrix/row vector/column vector.
type	int	Optional argument that specifies the type of tensor: `0`=row vector, `1`=column vector, `2`=matrix. Default is `2`.
isRowWise	boolean	Optional argument that indicates whether the display matrix is column- or row-wise: `false`=column-wise, `true`=row-wise. Default is `true`.

Returns

Data Type	Description
string	Row- or column-wise string representation of the vector or matrix

dump_vec

Returns the string representation of a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_vec(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	Tensor representation of a row or column vector

Returns

Data Type	Description
string	String representation of the row or column vector

get_rows

Returns the number of rows present in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_rows(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
long	The number of rows

get_cols

Gets the number of columns present in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_cols(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
long	The number of columns

get_slices

Gets the number of slices present in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_slices(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
long	The number of slices

get_order

Gets the tensor order.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_order(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
long	The tensor order

get_total_elem

Gets the total number of elements that are present in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_total_elem(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
long	The total number of elements

get_nonzero

Gets the number of non-zero elements that are present in a sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_nonzero(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
long	The number of non-zero elements

get_elem

Accesses one or more elements that are stored in a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_elem(b, i [, j ] [, k ])

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
i	long	The element stored in the ith row
j	long	Optional argument that lists the element stored in the jth column
k	long	Optional argument that lists the element stored in the kth slice

Returns

Data Type	Description
double	The element value

get_diag

Extracts a diagonal from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_diag(b [, k ])

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
k	long	Optional diagonal number. By default, the main diagonal is accessed (`k=0`). For `k>0`, the `k`th super-diagonal is accessed (top right corner). For `k<0`, the `k`th sub-diagonal is accessed (bottom left corner).

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of the diagonal as a column vector
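
The meaning of the diagonal number `k` can be illustrated with a short Python sketch that reads element (i, i + k) from each row of a list-of-lists matrix. Zero-based row and column positions are an assumption of this sketch, and the Python function is only a stand-in for `get_diag`.

```python
def get_diag(matrix, k=0):
    """Return the k-th diagonal: k=0 main, k>0 above it, k<0 below it."""
    cols = len(matrix[0])
    # Row i contributes element (i, i + k) when that column exists.
    return [row[i + k] for i, row in enumerate(matrix) if 0 <= i + k < cols]

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

print(get_diag(m))      # main diagonal
print(get_diag(m, 1))   # first super-diagonal, toward the top right corner
print(get_diag(m, -1))  # first sub-diagonal, toward the bottom left corner
```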

getmax_val

Retrieves the maximum value from a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:getmax_val(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
double	The maximum value in the tensor

getmin_val

Retrieves the minimum value from a tensor.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:getmin_val(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
double	The minimum value in the tensor

get_subvec

Extracts a range of elements from a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_subvec(b, i, j)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
i	long	Start index
j	long	End index

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a row or column vector

subvec_head

Extracts starting elements from a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subvec_head(b, n)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
n	long	The number of elements to extract from the beginning of the vector

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a row or column vector with elements from `0` to `n-1`

subvec_tail

Extracts trailing elements from a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subvec_tail(b, n)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
n	long	The number of elements to extract from the end of the vector

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a row or column vector with `n` elements from the tail

cosine_similarity

This function calculates the cosine similarity between two row vectors.

The cosine_similarity function is not compatible with column vectors or matrices. The input must be row vectors.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:cosine_similarity(m, n)

Parameter	Data Type	Description
m	http://anzograph.com/matrices#tensor	A row vector
n	http://anzograph.com/matrices#tensor	The row vector to compare to the vector in argument `m`

Returns

Data Type	Description
double	Results range from `-1` to `1`: -1 is perfectly dissimilar and 1 is perfectly similar
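
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The Python sketch below applies that definition to plain lists standing in for row vectors. It illustrates the formula only and is not the AnzoGraph DB implementation.

```python
import math

def cosine_similarity(m, n):
    """Cosine of the angle between two vectors of equal length."""
    if len(m) != len(n):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(m, n))
    norm_m = math.sqrt(sum(a * a for a in m))
    norm_n = math.sqrt(sum(b * b for b in n))
    if norm_m == 0 or norm_n == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_m * norm_n)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # perpendicular
```

Because the result depends only on direction, scaling either vector by a positive factor does not change it.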

subview_col

Extracts a column from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_col(b, n)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
n	long	Column index

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a column vector

subview_row

Extracts a row from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_row(b, n)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
n	long	Row index

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a row vector

subview_rows

Extracts a range of rows from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_rows(b, r1, ..., rn)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
r1–rn	long	The start row index to the end row index

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of the matrix with rows from `r1` to `rn`

subview_cols

Extracts a range of columns from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_cols(b, c1, ..., cn)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
c1–cn	long	The start column index to the end column index

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of the matrix with columns from `c1` to `cn`

subview_mat

Extracts a submatrix from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_mat(b, r1, ..., rn, c1, ..., cn)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
r1–rn	long	The start row index to the end row index
c1–cn	long	The start column index to the end column index

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a matrix of size `[1+(rn-r1)] x [1+(cn-c1)]`
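
The size formula above implies that both ends of each range are inclusive. The following plain Python sketch shows that reading on a nested list. It assumes zero-based indexes (consistent with the `0` to `n-1` ranges described for the head functions) and a hypothetical four-index call shape; it is not the AnzoGraph implementation.

```python
def subview_mat(matrix, r1, rn, c1, cn):
    """Return rows r1..rn and columns c1..cn, both ends inclusive, zero-based."""
    return [row[c1:cn + 1] for row in matrix[r1:rn + 1]]

b = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Rows 1..2 and columns 0..2 give a [1+(2-1)] x [1+(2-0)] = 2 x 3 result.
sub = subview_mat(b, 1, 2, 0, 2)
shape = (len(sub), len(sub[0]))
```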

subview_head_rows

Extracts starting rows from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_head_rows(b, n)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
n	long	The number of rows to extract from the beginning of the matrix

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a matrix with rows from `0` to `n-1`

subview_head_cols

Extracts starting columns from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_head_cols(b, n)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
n	long	The number of columns to extract from the beginning of the matrix

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a matrix with columns from `0` to `n-1`

subview_tail_rows

Extracts trailing rows from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_tail_rows(b, n)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
n	long	The number of rows to extract from the end of the matrix

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a matrix with `n` rows from the tail

subview_tail_cols

Extracts trailing columns from a matrix or sparse matrix.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_tail_cols(b, n)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
n	long	The number of columns to extract from the end of the matrix

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a matrix with `n` columns from the tail
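
The four head and tail functions above can be pictured as simple slices. The sketch below uses plain Python lists in place of tensors; the helper names are hypothetical and only mirror the documented behavior.

```python
def head_rows(matrix, n):
    """First n rows (rows 0 to n-1)."""
    return matrix[:n]

def tail_rows(matrix, n):
    """Last n rows."""
    return matrix[len(matrix) - n:]

def head_cols(matrix, n):
    """First n columns (columns 0 to n-1) of every row."""
    return [row[:n] for row in matrix]

def tail_cols(matrix, n):
    """Last n columns of every row."""
    return [row[len(row) - n:] for row in matrix]

b = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
top = head_rows(b, 2)
bottom = tail_rows(b, 1)
left = head_cols(b, 1)
right = tail_cols(b, 2)
```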

flatten_as_col

Returns a flattened version of a matrix as a column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:flatten_as_col(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of the matrix as a column vector

flatten_as_row

Returns a flattened version of a matrix as a row vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:flatten_as_row(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of the matrix as a row vector

is_vec

Evaluates whether the given matrix is a row or column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_vec(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	True if the matrix can be interpreted as a column or row vector. False if the matrix does not have exactly one column or one row.

is_rowvec

Evaluates whether the given matrix is a row vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_rowvec(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	True if the matrix can be interpreted as a row vector. False if the matrix does not have exactly one row.

is_colvec

Evaluates whether the given matrix is a column vector.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_colvec(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	True if the matrix can be interpreted as a column vector. False if the matrix does not have exactly one column.

is_sorted

Evaluates whether a vector or matrix is sorted.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_sorted(b [, t ] [, d ])

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
t	boolean	Optional argument that specifies the sort dimension for the matrix. Set to `true` if elements are sorted row-wise and `false` if they are sorted column-wise. Default is `false`.
d	int	Optional argument that specifies the sort direction for the matrix. Allowed values are: `0`: ascend (default), elements are ascending and consecutive elements can be equal; `1`: descend, elements are descending and consecutive elements can be equal; `2`: strictascend, elements are strictly ascending and consecutive elements cannot be equal; `3`: strictdescend, elements are strictly descending and consecutive elements cannot be equal.

Returns

Data Type	Description
boolean	True if the elements are sorted. False if they are not.
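
To make the four direction codes for `d` concrete, here is a minimal plain Python sketch for a single vector. It covers only the direction argument (not the row-wise or column-wise option `t`), and the helper is an illustration rather than the AnzoGraph implementation.

```python
import operator

# Direction codes as listed for parameter d: each maps to the comparison
# that must hold between every pair of consecutive elements.
COMPARATORS = {
    0: operator.le,  # ascend: equal neighbors allowed
    1: operator.ge,  # descend: equal neighbors allowed
    2: operator.lt,  # strictascend: equal neighbors not allowed
    3: operator.gt,  # strictdescend: equal neighbors not allowed
}

def is_sorted(values, d=0):
    """Check a list against the sort direction selected by d."""
    compare = COMPARATORS[d]
    return all(compare(a, b) for a, b in zip(values, values[1:]))

ascending_with_ties = is_sorted([1, 2, 2, 3])       # default d=0
strictly_ascending = is_sorted([1, 2, 2, 3], 2)     # ties break strict order
descending_with_ties = is_sorted([3, 2, 2, 1], 1)
```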

is_tri_mat_upper

Evaluates whether a matrix is upper triangular, i.e., the matrix is square and all elements below the main diagonal are zero.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_tri_mat_upper(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	Returns `true` if the matrix is upper triangular and `false` if it is not

is_tri_mat_lower

Evaluates whether a matrix is lower triangular, i.e., the matrix is square and all elements above the main diagonal are zero.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_tri_mat_lower(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	Returns `true` if the matrix is lower triangular and `false` if it is not

is_diag_mat

Evaluates whether a matrix is diagonal, i.e., all elements outside of the main diagonal are zero.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_diag_mat(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	Returns `true` if the matrix is diagonal and `false` if it is not

is_square

Evaluates whether a matrix is square, i.e., the number of rows is equal to the number of columns.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_square(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	Returns `true` if the matrix is square and `false` if it is not
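
The shape checks above (upper triangular, lower triangular, diagonal, and square) follow directly from their definitions. The sketch below restates those definitions in plain Python on nested lists; the helpers are illustrative and are not the AnzoGraph implementation.

```python
def is_square(m):
    """Number of rows equals number of columns."""
    return all(len(row) == len(m) for row in m)

def is_tri_mat_upper(m):
    """Square, and every element below the main diagonal is zero."""
    return is_square(m) and all(
        m[i][j] == 0 for i in range(len(m)) for j in range(i)
    )

def is_tri_mat_lower(m):
    """Square, and every element above the main diagonal is zero."""
    return is_square(m) and all(
        m[i][j] == 0 for i in range(len(m)) for j in range(i + 1, len(m))
    )

def is_diag_mat(m):
    """Every element off the main diagonal is zero."""
    return all(
        value == 0
        for i, row in enumerate(m)
        for j, value in enumerate(row)
        if i != j
    )

upper = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]
diagonal = [[1, 0], [0, 2]]
```

A diagonal matrix passes both triangular checks, while `upper` passes only the upper triangular one.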

is_symmetric

Evaluates whether a matrix is symmetrical.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_symmetric(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	Returns `true` if the matrix is symmetrical and `false` if it is not

is_hermitian

Evaluates whether a matrix is Hermitian (self-adjoint), i.e., the matrix is equal to its own conjugate transpose.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_hermitian(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	Returns `true` if the matrix is Hermitian and `false` if it is not
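
The difference between the symmetric and Hermitian checks only shows up for complex entries; for real matrices the two conditions coincide. The sketch below illustrates the mathematical definitions in plain Python. Whether AnzoGraph tensors can hold complex values is not stated here, so the complex example is an assumption made purely to show the distinction.

```python
def is_symmetric(m):
    """Square and equal to its transpose."""
    n = len(m)
    return all(len(row) == n for row in m) and all(
        m[i][j] == m[j][i] for i in range(n) for j in range(i)
    )

def is_hermitian(m):
    """Square and equal to its conjugate transpose (diagonal must be real)."""
    n = len(m)
    return all(len(row) == n for row in m) and all(
        m[i][j] == complex(m[j][i]).conjugate()
        for i in range(n)
        for j in range(i + 1)
    )

real_symmetric = [[1, 2], [2, 1]]
complex_hermitian = [[2, 1 + 1j], [1 - 1j, 3]]
```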

has_nan

Evaluates whether a matrix contains any elements that are not a number (NaN).

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:has_nan(b)

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor

Returns

Data Type	Description
boolean	Returns `true` if at least one of the elements is NaN and `false` if all elements are numbers

vec_all

Evaluates whether all elements in a row or column vector are non-zero or satisfy the specified relational condition.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:vec_all(b [, c ] [, val ])

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
c	int	Optional argument that specifies the relational condition to test: `0` (default): not equal; `1`: greater than; `2`: less than; `3`: equal; `4`: greater than or equal to; `5`: less than or equal to.
val	double	Optional argument that specifies the value to apply the condition (`c`) to. Default is `0`.

Returns

Data Type	Description
boolean	Returns `true` if all elements are non-zero or satisfy the condition and `false` if not

vec_any

Evaluates whether any elements in a row or column vector are non-zero or satisfy the specified relational condition.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:vec_any(b [, c ] [, val ])

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
c	int	Optional argument that specifies the relational condition to test: `0` (default): not equal; `1`: greater than; `2`: less than; `3`: equal; `4`: greater than or equal to; `5`: less than or equal to.
val	double	Optional argument that specifies the value to apply the condition (`c`) to. Default is `0`.

Returns

Data Type	Description
boolean	Returns `true` if any elements are non-zero or satisfy the condition and `false` if not
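
The condition codes for `c` combine with `val` to form a test applied to each element; with the defaults (`c` of 0 and `val` of 0) the test is "not equal to zero". The following plain Python sketch models `vec_all` and `vec_any` on lists. It is an illustration of the documented semantics under that reading, not the AnzoGraph implementation.

```python
import operator

# Condition codes as listed for parameter c.
CONDITIONS = {
    0: operator.ne,  # not equal (default)
    1: operator.gt,  # greater than
    2: operator.lt,  # less than
    3: operator.eq,  # equal
    4: operator.ge,  # greater than or equal to
    5: operator.le,  # less than or equal to
}

def vec_all(values, c=0, val=0.0):
    """True if every element satisfies the condition against val."""
    test = CONDITIONS[c]
    return all(test(x, val) for x in values)

def vec_any(values, c=0, val=0.0):
    """True if at least one element satisfies the condition against val."""
    test = CONDITIONS[c]
    return any(test(x, val) for x in values)

all_nonzero = vec_all([1, 2, 3])            # defaults: every element != 0
has_zero = vec_all([1, 0, 3])               # one element equals 0
any_at_least_three = vec_any([1, 2, 3], 4, 3)
```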

mat_all

Evaluates whether all elements in a matrix are non-zero or satisfy the specified relational condition.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:mat_all(b [, d ] [, c ] [, val ])

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
d	boolean	Optional argument that indicates whether to check rows or columns. Set to `true` for rows or `false` for columns. Default is `false`.
c	int	Optional argument that specifies the relational condition to test: `0` (default): not equal; `1`: greater than; `2`: less than; `3`: equal; `4`: greater than or equal to; `5`: less than or equal to.
val	double	Optional argument that specifies the value to apply the condition (`c`) to. Default is `0`.

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a row vector with each element as 0 or 1, indicating whether all elements in the corresponding row or column are non-zero or satisfy the condition

mat_any

Evaluates whether any elements in a matrix are non-zero or satisfy the specified relational condition.

Syntax

PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:mat_any(b [, d ] [, c ] [, val ])

Parameter	Data Type	Description
b	http://anzograph.com/matrices#tensor	A tensor
d	boolean	Optional argument that indicates whether to check rows or columns. Set to `true` for rows or `false` for columns. Default is `false`.
c	int	Optional argument that specifies the relational condition to test: `0` (default): not equal; `1`: greater than; `2`: less than; `3`: equal; `4`: greater than or equal to; `5`: less than or equal to.
val	double	Optional argument that specifies the value to apply the condition (`c`) to. Default is `0`.

Returns

Data Type	Description
http://anzograph.com/matrices#tensor	Tensor representation of a row vector with each element as 0 or 1, indicating whether any elements in the corresponding row or column are non-zero or satisfy the condition
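
Unlike the vector variants, `mat_all` and `mat_any` return one 0 or 1 flag per row or per column rather than a single boolean. The sketch below models that in plain Python on nested lists, assuming that the default `d` of `false` means each column is tested and `true` means each row is tested. The helpers are illustrative and are not the AnzoGraph implementation.

```python
import operator

# Condition codes as listed for parameter c.
CONDITIONS = {
    0: operator.ne, 1: operator.gt, 2: operator.lt,
    3: operator.eq, 4: operator.ge, 5: operator.le,
}

def _lines(matrix, d):
    """Rows when d is True, columns when d is False."""
    return matrix if d else [list(col) for col in zip(*matrix)]

def mat_all(matrix, d=False, c=0, val=0.0):
    """One flag per row/column: 1 if all its elements satisfy the condition."""
    test = CONDITIONS[c]
    return [int(all(test(x, val) for x in line)) for line in _lines(matrix, d)]

def mat_any(matrix, d=False, c=0, val=0.0):
    """One flag per row/column: 1 if any of its elements satisfies the condition."""
    test = CONDITIONS[c]
    return [int(any(test(x, val) for x in line)) for line in _lines(matrix, d)]

b = [
    [1, 0],
    [2, 3],
]
all_by_column = mat_all(b)          # columns (1, 2) and (0, 3)
all_by_row = mat_all(b, True)       # rows (1, 0) and (2, 3)
any_by_column = mat_any(b)
```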