Data Science Functions
AnzoGraph DB offers an extension package of pre-built Data Science functions that you can use in the same way as other native, built-in analytic functions. In addition, Cambridge Semantics offers an Apache Zeppelin Docker image, which includes a collection of notebooks that provide details and example usage of each of the Data Science functions. See Zeppelin Notebook Integration for more information on installing the custom Apache Zeppelin Docker image.
If you use the Installer to install AnzoGraph DB on RHEL/CentOS, these functions are included only if you choose to install the optional C++ extensions and related dependencies. If you do not install the C++ extensions, this package of functions is not included in the deployment. For more information, see Installing AnzoGraph DB.
The Data Science functions are organized into the following categories:
- Correlation: Determine the relationship between elements.
- Distribution: Calculate the probability of a given value over a random distribution.
- Entropy: Determine variance and probability density across a given distribution.
- Feature Exploration: Classify values in a distribution using techniques such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and Singular Value Decomposition (SVD).
- Linear Algebra: Create product vectors or a matrix from a given collection of random variables.
- Profiling Metrics: Produce statistical metrics such as percentile, geometric mean, or skew on a given dataset.
- Sketches: Use highly efficient streaming algorithms to approximate calculations such as count distinct, quantiles, most frequent items, joins, and matrix computations, and return data sketches.
- Matrix Utilities: Return information on various attributes of vector space mapping and related matrix tensors.
Correlation
Canonical Correlation (CANCOR)
This aggregate calculates the overall correlation between two sets of variables.
Wikipedia Reference: Canonical correlation function
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:cancor(lc, m, x1, x2, ..., xn, y1, y2, ..., yn)
Parameter | Data Type | Description
lc | int | Number of linear combinations for the first canonical correlation
m | int | Number of columns in the first set
x1–xn | double | Feature columns from the first dataset
y1–yn | double | Feature columns from the second dataset
Returns
Data Type | Description
string | Canonical correlation
string | Square of the canonical correlation
string | Canonical coefficient
Covariance (COVARIANCE)
This aggregate measures the joint variability of two or more random variables (or variates) and returns the result as a covariance matrix.
Wikipedia Reference: Covariance
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:covariance(x1, x2, ..., xn)
Parameter | Data Type | Description
x1–xn | double | Feature columns from the dataset
Returns
Data Type | Description
http://anzograph.com/matrices#tensor | Covariance matrix
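To make the returned matrix concrete, the following Python sketch computes a covariance matrix by hand for two feature columns. It illustrates the statistic only and is not AnzoGraph DB's implementation; the helper name `covariance_matrix` is hypothetical, and the sample (n - 1) denominator is an assumption, since the source does not state which normalization `matrices:covariance` applies.

```python
from statistics import fmean


def covariance_matrix(columns):
    """Sample covariance matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [fmean(col) for col in columns]
    return [
        [
            # Assumption: sample covariance, so divide by n - 1.
            sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
            for cj, mj in zip(columns, means)
        ]
        for ci, mi in zip(columns, means)
    ]


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]
cov = covariance_matrix([x1, x2])
```

The diagonal holds each column's variance and the off-diagonal entries hold the pairwise covariances, so the matrix is symmetric.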
Matthews Correlation Coefficient (MCC)
This aggregate returns a coefficient value between observed and predicted binary classifications.
Wikipedia Reference: Matthews correlation coefficient
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:mcc(x, y)
Parameter | Data Type | Description
x | boolean | First variable column data
y | boolean | Second variable column data
Returns
Data Type | Description
double | Coefficient value that shows the extent to which observed and predicted binary classifications are related
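As a worked illustration of the coefficient itself, the sketch below derives it from the four confusion-matrix counts in plain Python. The function name `mcc` and the convention of returning 0.0 when the denominator is zero are assumptions made for this example; the source does not describe how `stats:mcc` handles that case.

```python
from math import sqrt


def mcc(observed, predicted):
    """Matthews correlation coefficient for two equal-length boolean columns."""
    pairs = list(zip(observed, predicted))
    tp = sum(o and p for o, p in pairs)              # true positives
    tn = sum((not o) and (not p) for o, p in pairs)  # true negatives
    fp = sum((not o) and p for o, p in pairs)        # false positives
    fn = sum(o and (not p) for o, p in pairs)        # false negatives
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


observed = [True, True, True, False, False, False]
predicted = [True, True, False, False, False, True]
score = mcc(observed, predicted)
```

A value of 1 means perfect agreement, 0 means no better than chance, and -1 means total disagreement.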
Pearson Correlation Coefficient (PCC)
This aggregate determines the extent to which two variables are linearly related: positive, negative, or no relationship.
Wikipedia Reference: Pearson correlation coefficient
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:pcc(x, y)
Parameter | Data Type | Description
x | boolean | First variable column data
y | boolean | Second variable column data
Returns
Data Type | Description
double | Coefficient that shows the extent to which two variables are linearly related
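The following Python sketch shows how the coefficient is conventionally computed from two columns of values. It is an illustration of the textbook formula, not AnzoGraph DB's code, and the helper name `pearson` and the numeric sample data are assumptions made for the example.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_positive = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly increasing
r_negative = pearson([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing
r_partial = pearson([1, 2, 3, 4], [1, 3, 2, 4])   # mostly increasing
```

Values near 1 indicate a positive linear relationship, values near -1 a negative one, and values near 0 no linear relationship.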
Spearman Correlation Coefficient (SCC)
This aggregate determines how well the relationship between two variables can be described using a monotonic function.
Wikipedia Reference: Spearman's Correlation Coefficient
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:scc(rank_X, rank_Y)
Parameter | Data Type | Description
rank_X | double | First set of ranked data
rank_Y | double | Second set of ranked data
Returns
Data Type | Description
double | Coefficient between ranked datasets
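Because the inputs are already ranked, the coefficient can be illustrated with the classic rank-difference formula. The sketch below assumes the ranks contain no ties (with ties, the Pearson coefficient of the ranks is the usual alternative); the name `spearman_from_ranks` is hypothetical and the code is not taken from AnzoGraph DB.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman coefficient for two tie-free rank columns of equal length."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # Assumption: no tied ranks, so 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    return 1 - 6 * d_squared / (n * (n * n - 1))


rho = spearman_from_ranks([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
rho_reversed = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```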
Distribution
Cumulative Distribution Functions (CDF)
A Cumulative Distribution function calculates the probability of a random variable X taking on a value less than or equal to Y.
Wikipedia Reference: Cumulative Distribution Function (CDF)
The following functions produce cumulative distribution calculations:
- Binomial Distribution (BINOMDIST): Calculates the probability for X successes in N trials given a probability of success P for each trial.
- Chi-Squared Distribution (CHISQDIST): Calculates probability using a distribution that is often used in hypothesis testing to compare an observed distribution with a theoretical one. Also provides a way to show a relationship between two categorical variables.
- Continuous Uniform Distribution (CONUNIDIST): Calculates probability using a continuous probability distribution concerned with events that are equally likely to occur.
- Discrete Uniform Distribution (DISCUNIDIST): Calculates probability using a symmetric probability distribution in which each of a finite number n of values is equally likely to be observed.
- Exponential Distribution (EXPDIST): Calculates probability using a distribution that describes time between events in a Poisson point process (where events occur continuously and independently at a constant average rate).
- Laplace Distribution (LAPLACEDIST): Calculates probability using a distribution that represents differences between two independent variables that have identical exponential distributions (also called double exponential distribution).
- Log Normal Distribution (LOGNORDIST): Calculates probability using a distribution of a random variable whose logarithm follows a normal distribution. Log normal distributions are widely used in risk analysis.
- Negative Binomial Distribution (NEGBINDIST): Calculates probability using a discrete probability distribution that concerns the number of trials which must occur in order to have a predetermined number of successes.
- Normal Distribution (NORMDIST): Calculates probability using a continuous probability distribution of data in which the majority of data points are relatively similar, within a small range of values having few outliers.
- Poisson Distribution (POISDIST): Calculates probability using a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, given that these events occur at a known constant rate and independently of the time since the last event.
- Student's T-Distribution (TDIST): Calculates probability using the Student's t-distribution and associated t scores. Often used in hypothesis testing when the sample size is small and/or when the population variance is unknown.
- TDigest Metric (TDIGEST): Creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.
- Weibull Distribution (WEIBULDIST): Calculates probability from a continuous probability distribution that is commonly used to assess product reliability, analyze product life data and failure times.
Binomial Distribution (BINOMDIST)
This aggregate calculates the probability for X successes in N trials given a probability of success P for each trial.
Wikipedia Reference: Binomial Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:binomdist(data, n, k, "success_string")
Parameter | Data Type | Description
data | string | Column data
n | long | Number of trials
k | long | Number of successes in n trials
success_string | string | Defines the success string
Returns
Data Type | Description
double | Probability mass function value
double | Lower cumulative distribution: probability (<=k) under the area of distribution
double | Upper cumulative distribution: probability (>k) under the area of distribution
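The three return values can be illustrated with a short Python sketch. It assumes that the success probability P is estimated as the share of values in the column data that equal `success_string`; the source does not state how P is derived, so treat that step and the helper name `binomial_summary` as assumptions rather than a description of `stats:binomdist`.

```python
from math import comb


def binomial_summary(data, n, k, success_string):
    """Return (P(X = k), P(X <= k), P(X > k)) for n trials."""
    # Assumption: p is the observed share of successes in the column data.
    p = data.count(success_string) / len(data)

    def pmf(i):
        return comb(n, i) * p**i * (1 - p) ** (n - i)

    lower = sum(pmf(i) for i in range(k + 1))
    return pmf(k), lower, 1.0 - lower


pmf_k, lower_cdf, upper_cdf = binomial_summary(["yes", "no", "yes", "no"], 4, 2, "yes")
```

With P = 0.5, N = 4, and X = 2, the mass is 6/16 and the lower and upper cumulative values sum to 1.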
Chi-Squared Distribution (CHISQDIST)
This aggregate calculates probability using a distribution that is often used in hypothesis testing to compare an observed distribution with a theoretical one. It also provides a way to show a relationship between two categorical variables.
Wikipedia Reference: Chi-Squared Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:chisqdist(data, s)
Parameter | Data Type | Description
data | double | Sample data
s | double | Population standard deviation
Returns
Data Type | Description
double | Mean of the distribution
double | Standard deviation of the distribution
double | Variance of the distribution
double | Chi-squared statistic: [(n - 1) * s^2] / d^2 where d is the standard deviation of the population, s is the standard deviation of the sample, and n is the sample size
long | Number of samples: the degrees of freedom (k) is (count - 1)
double | Probability density function value
double | Cumulative distribution: the probability for <= the chi-squared statistic
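The chi-squared statistic formula given in the return values can be checked by hand. The Python sketch below applies [(n - 1) * s^2] / d^2 to a small sample; the helper name `chi_squared_statistic` and the sample values are illustrative assumptions, and the sketch covers only the statistic and degrees of freedom, not the density or cumulative values.

```python
from statistics import stdev


def chi_squared_statistic(sample, population_sd):
    """Return ((n - 1) * s^2 / d^2, degrees of freedom)."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (n - 1) * s**2 / population_sd**2, n - 1


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
statistic, dof = chi_squared_statistic(sample, population_sd=2.0)
```

Here the squared deviations from the sample mean of 5 sum to 32, so the statistic is 32 / 2^2 = 8 with 7 degrees of freedom.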
Continuous Uniform Distribution (CONUNIDIST)
This aggregate calculates probability using a continuous probability distribution concerned with events that are equally likely to occur.
Wikipedia Reference: Continuous Uniform Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:conunidist(data, a, b)
Parameter | Data Type | Description
data | double | Column data
a | double | Minimum value of the probability interval
b | double | Maximum value of the probability interval
Returns
Data Type | Description
double | Cumulative distribution: probability under the area of distribution
double | Probability density function value
double | Differential entropy in nats
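For a single value, the three quantities have simple closed forms, sketched below in Python. This is the textbook definition on the interval [a, b] and not a description of how `stats:conunidist` aggregates column data; the function name `continuous_uniform` is hypothetical.

```python
from math import log


def continuous_uniform(x, a, b):
    """Return (CDF at x, PDF at x, differential entropy in nats) on [a, b]."""
    width = b - a
    cdf = min(max((x - a) / width, 0.0), 1.0)  # clamp outside the interval
    pdf = 1.0 / width if a <= x <= b else 0.0
    entropy = log(width)                       # natural log gives nats
    return cdf, pdf, entropy


cdf, pdf, entropy = continuous_uniform(1.0, 0.0, 4.0)
```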
Discrete Uniform Distribution (DISCUNIDIST)
This aggregate calculates probability using a symmetric probability distribution in which each of a finite number n of values is equally likely to be observed.
Wikipedia Reference: Discrete Uniform Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:discunidist(data, k)
Parameter | Data Type | Description
data | long | Column data
k | long | The number of outcomes
Returns
Data Type | Description
double | Cumulative distribution: probability under the area of distribution
double | Probability density function value
double | Differential entropy in nats
Exponential Distribution (EXPDIST)
This aggregate calculates probability using a distribution that describes time between events in a Poisson point process (where events occur continuously and independently at a constant average rate).
Wikipedia Reference: Exponential Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:expdist(data, x)
Parameter | Data Type | Description
data | long | Column data
x | double | The probability for the interval
Returns
Data Type | Description
double | Lower cumulative distribution: probability (<=x) under the area of distribution
double | Upper cumulative distribution: probability (>x) under the area of distribution
double | Probability density function value
double | Differential entropy in nats
Laplace Distribution (LAPLACEDIST)
This aggregate calculates probability using a distribution that represents differences between two independent variables that have identical exponential distributions (also called double exponential distribution).
Wikipedia Reference: Laplace Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:laplacedist(data, "c", x1, x2)
Parameter | Data Type | Description
data | double | Column data
c | string | "below", "above", "bet" (between), or "out" (outside)
x1 | double | Lower number (>0) to find the probability
x2 | double | Upper number (>0) to find the probability
Returns
Data Type | Description
double | Mean of the distribution
double | Scale parameter of the distribution
double | Standard deviation of the distribution
double | Variance of the distribution
double | Differential entropy in nats
double | Cumulative distribution: probability under the area of distribution
double | Probability density function value for x1
double | Probability density function value for x2
Log Normal Distribution (LOGNORDIST)
This aggregate calculates probability using the distribution of a random variable whose logarithm follows a normal distribution. The log normal distribution is widely used in risk analysis.
Wikipedia Reference: Log Normal Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:lognordist(data, "c", x1, x2)
Parameter | Data Type | Description
data | double | Column data
c | string | "below", "above", "bet" (between), or "out" (outside)
x1 | double | Lower number (>0) to find the probability
x2 | double | Upper number (>0) to find the probability
Returns
Data Type | Description
double | Mean of the distribution of natural logarithms
double | Standard deviation of the distribution of natural logarithms
double | Variance of the distribution
double | Differential entropy in nats
double | Cumulative distribution: probability under the area of distribution
double | Probability density function value for x1
double | Probability density function value for x2
Negative Binomial Distribution (NEGBINDIST)
This aggregate calculates probability using a discrete probability distribution that concerns the number of trials which must occur in order to have a predetermined number of successes.
Wikipedia Reference: Negative Binomial Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:negbindist("data", k, r, "success_string")
Parameter | Data Type | Description
data | string | Column data
k | long | Number of successes
r | long | Number of failures
success_string | string | Defines the success string
Returns
Data Type | Description
double | Probability mass function value
double | Lower cumulative distribution: probability (<=k) under the area of distribution
double | Upper cumulative distribution: probability (>k) under the area of distribution
Normal Distribution (NORMDIST)
This aggregate calculates probability using a continuous probability distribution of data in which the majority of data points are relatively similar, within a small range of values with few outliers.
Wikipedia Reference: Normal Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:normdist(data, "c", x1, x2)
Parameter | Data Type | Description
data | double | Column data
c | string | "below", "above", "bet" (between), or "out" (outside)
x1 | double | Lower number (>0) to find the probability
x2 | double | Upper number (>0) to find the probability
Returns
Data Type | Description
double | Mean of the distribution
double | Standard deviation of the distribution
double | Variance of the distribution
double | Differential entropy in nats
double | Cumulative distribution: probability under the area of distribution
double | Probability density function value for x1
double | Probability density function value for x2
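One plausible reading of the four region keywords is sketched below in Python, using the standard library's `statistics.NormalDist`. The mapping is an assumption: "below" is taken as P(X <= x1), "above" as P(X > x1), "bet" as P(x1 < X <= x2), and "out" as its complement. Fitting the distribution with the sample standard deviation is likewise an assumption, and the helper name `normal_probability` is hypothetical.

```python
from statistics import NormalDist


def normal_probability(dist, c, x1, x2=None):
    """Probability of the region named by c under a fitted normal distribution."""
    if c == "below":
        return dist.cdf(x1)
    if c == "above":
        return 1.0 - dist.cdf(x1)
    between = dist.cdf(x2) - dist.cdf(x1)
    if c == "bet":
        return between
    if c == "out":
        return 1.0 - between
    raise ValueError(f"unknown region: {c!r}")


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
dist = NormalDist.from_samples(data)  # mean and sample standard deviation
p_between = normal_probability(dist, "bet", 3.0, 7.0)
p_outside = normal_probability(dist, "out", 3.0, 7.0)
p_below_mean = normal_probability(dist, "below", dist.mean)
```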
Poisson Distribution (POISDIST)
This function calculates probability using a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, given that these events occur at a known constant rate and independently of the time since the last event.
Wikipedia Reference: Poisson Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:poisdist(data, k)
Parameter | Data Type | Description
data | long | Column data
k | long | Number of events k for which to calculate the probability of observation in an interval
Returns
Data Type | Description
double | Probability mass function value
double | Lower cumulative distribution: probability (<=k) under the area of distribution
double | Upper cumulative distribution: probability (>k) under the area of distribution
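The return values can be reproduced by hand for a small dataset. The Python sketch below assumes the rate parameter is estimated as the mean of the column data, which the source does not confirm; the helper name `poisson_summary` is hypothetical.

```python
from math import exp, factorial
from statistics import fmean


def poisson_summary(data, k):
    """Return (P(X = k), P(X <= k), P(X > k)) for event counts in data."""
    lam = fmean(data)  # assumption: rate is the mean observed count

    def pmf(i):
        return lam**i * exp(-lam) / factorial(i)

    lower = sum(pmf(i) for i in range(k + 1))
    return pmf(k), lower, 1.0 - lower


pmf_k, lower_cdf, upper_cdf = poisson_summary([1, 2, 3, 2, 2], 2)
```

With a mean rate of 2 and k = 2, the mass is 2e^-2 and the lower cumulative value is 5e^-2.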
Student's T-Distribution (TDIST)
This function calculates probability using the Student's t-distribution (and associated t scores), which is often used in hypothesis testing when the sample size is small, the population variance is unknown, or both.
Wikipedia Reference: Student's t-distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:tdist(data, M)
Parameter | Data Type | Description
data | double | Sample data
M | double | Population mean
Returns
Data Type | Description
double | Mean of the distribution
double | Standard deviation of the distribution
double | Variance of the distribution
double | T-statistic: t = [ u - M ] / [ s / sqrt( N ) ] where u is the sample mean, M is the population mean, s is the standard deviation of the sample, and N is the sample size
long | Number of samples: the degrees of freedom is (count - 1)
double | Probability density function value
double | Cumulative distribution: the probability for <= the t-statistic
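The t-statistic formula in the return values, t = [ u - M ] / [ s / sqrt( N ) ], can be traced with a few lines of Python. The helper name `t_statistic` and the sample values are illustrative assumptions; the sketch computes only the statistic and the degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev


def t_statistic(sample, population_mean):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    u = fmean(sample)  # sample mean
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (u - population_mean) / (s / sqrt(n)), n - 1


t, dof = t_statistic([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], 4.0)
```

For this sample, u = 5 and s^2 = 32/7, so s / sqrt(N) = sqrt(4/7) and t = sqrt(7)/2.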
TDigest Metric (TDIGEST)
This function creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.
Wikipedia Reference: Computing Quantiles using T-Digests
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/statistics/sketch#>
sketch:tdigest(data, p, q, cdf)
Parameter | Data Type | Description
data | double | Column data
p | double | The percentile (0 - 100) to compute
q | double | The quantile (0.0 - 1.0) to compute
cdf | double | The CDF to use
Returns
Data Type | Description
double | Percentile: the value below which a given percentage of observations falls
double | Quantile: cut point dividing the observations in a sample
double | The computation of F(x) where F is the CDF of the distribution
Weibull Distribution (WEIBULDIST)
This function calculates probability from a continuous probability distribution commonly used to assess product reliability and analyze product life data and failure times.
Wikipedia Reference: Weibull Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:weibuldist(data, k, x)
Parameter | Data Type | Description
data | double | Sample data
k | double | The initial starting value for the shape parameter. A good guess is crucial to quick convergence.
x | double | The probability for a random variable
Returns
Data Type | Description
double | The mean of the distribution
double | The standard deviation of the distribution
double | The variance of the distribution
long | The count of the number of samples
double | The estimated shape parameter (k) of the distribution from the mean and variance using the root finding method
double | The estimated scale parameter (a) of the distribution from the mean and variance using the root finding method
double | Differential entropy in nats
double | Probability density function value
double | Lower cumulative distribution: probability (<=x) under the area of distribution
double | Upper cumulative distribution: probability (>x) under the area of distribution
long | The actual number of iterations performed to get an estimate of the k value
double | The mean calculated using estimated values of k and a
double | The variance calculated using estimated values of k and a
Bernoulli Distribution (BERNDIST)
The Bernoulli Distribution function determines the probability of success or failure (or Yes or No) in tests that have only two possible outcomes.
Wikipedia Reference: Bernoulli Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:berndist("data", prob, "success_string")
Parameter | Data Type | Description
data | string | Column data
prob | boolean | Probability of success (true) or failure (false)
success_string | string | The success message
Returns
Data Type | Description
double | The Bernoulli distribution probability
Beta-Binomial Distribution (BETABINDIST)
The Beta-Binomial Distribution function computes probability using a combination of both binomial and beta probability distributions.
Wikipedia Reference: Beta-Binomial Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:betabindist(k, n, alpha, beta)
Parameter | Data Type | Description
k | double | The probability for the number
n | double | The number of trials
alpha, beta | double | Shape parameters
Returns
Data Type | Description
double | The probability of occurrence k for a beta-binomial distribution with parameters n, alpha, and beta
Hypergeometric Distribution (HYPGEODIST)
The Hypergeometric Distribution function calculates probability from a distribution often used to predict the outcome of a process in which different elements are randomly drawn from a collection and not replaced.
Wikipedia Reference: HyperGeometric Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:hypgeodist("data", n, k, "success_string")
Parameter | Data Type | Description
data | string | Column data
n | int | The number of trials
k | int | The number of successes in n trials
success_string | string | The success message
Returns
Data Type | Description
double | The hypergeometric distribution probability
Logarithmic (Series) Distribution (LOGSERDIST)
This function calculates probability using a discrete probability distribution derived from the Maclaurin series expansion.
Wikipedia Reference: Logarithmic (Series) Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:logserdist("data", k, "success_string")
Parameter | Data Type | Description
data | string | Column data
k | long | The probability for the number
success_string | string | The success message
Returns
Data Type | Description
double | The logarithmic distribution probability
Skellam Distribution (SKELLAMDIST)
This function calculates probability using the Skellam distribution which models the difference between two independent Poisson distributed variables.
Wikipedia Reference: Skellam Distribution
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:skellamdist(n1_data, n2_data, k)
Parameter | Data Type | Description
n1_data | long | N1 column data
n2_data | long | N2 column data
k | long | Probability for the number
Returns
Data Type | Description
double | The Skellam probability value
Entropy
Cross Entropy (CROSSENTROPY)
This function computes cross-entropy, which is commonly used to quantify the difference between two probability distributions.
Wikipedia Reference: Cross Entropy
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:crossentropy(p, q)
Parameter | Data Type | Description
p | double | True probabilities for x
q | double | Predicted probabilities for x
Returns
Data Type | Description
double | The cross entropy value
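A minimal Python sketch of the standard definition, H(p, q) = -sum(p * ln q), is shown below. The natural logarithm (result in nats) is an assumption, since the source does not state the logarithm base used by `stats:crossentropy`; the function name `cross_entropy` is hypothetical.

```python
from math import log


def cross_entropy(p, q):
    """Cross entropy of predicted probabilities q against true probabilities p."""
    # Terms with p_i == 0 contribute nothing and are skipped.
    # Assumption: natural log, so the result is in nats.
    return -sum(pi * log(qi) for pi, qi in zip(p, q) if pi > 0)


h_mismatch = cross_entropy([0.5, 0.5], [0.25, 0.75])
h_match = cross_entropy([0.5, 0.5], [0.5, 0.5])
```

The value is smallest when the predicted distribution matches the true one, in which case it equals the entropy of the true distribution (ln 2 in this example).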
Discrete Entropy Metric (DISCENTROPY)
This function calculates entropy for maps on finite sets, referred to as discrete entropy.
ScienceDirect Reference: Discrete Entropy
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:discentropy("data")
Parameter | Data Type | Description
data | string | Column data
Returns
Data Type | Description
double | The discrete entropy value
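One common way to compute an entropy for a column of categorical values is the Shannon entropy of their empirical frequencies, sketched below in Python. Whether `stats:discentropy` uses exactly this definition, and which logarithm base it uses, is not stated in the source; both are assumptions here, as is the helper name `discrete_entropy`.

```python
from collections import Counter
from math import log


def discrete_entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of values."""
    n = len(values)
    # Assumption: each distinct string is one outcome, weighted by frequency.
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


h_even = discrete_entropy(["a", "b", "a", "b"])  # two equally likely values
h_single = discrete_entropy(["a", "a", "a"])     # one value, no uncertainty
```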
Differential Entropy or Continuous Entropy Metrics
Differential entropy (also referred to as continuous entropy) is entropy that can be computed for distributions with a continuous random variable.
Wikipedia Reference: Differential entropy
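The following functions, described under Distribution above, include differential entropy (in nats) in their return values:
- Continuous Uniform Distribution (CONUNIDIST)
- Discrete Uniform Distribution (DISCUNIDIST)
- Exponential Distribution (EXPDIST)
- Laplace Distribution (LAPLACEDIST)
- Log Normal Distribution (LOGNORDIST)
- Normal Distribution (NORMDIST)
- Weibull Distribution (WEIBULDIST)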
Feature Exploration
Principal Component Analysis (PCA)
Applies principal component analysis (PCA) to create combined eigenvalues and eigenvectors that highlight patterns in a dataset, making it easier to explore and visualize data.
Wikipedia Reference: Principal Component Analysis
There are several options available for principal component analysis:
pca::create
Applies PCA to create combined eigenvalues and eigenvectors.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::create(x1, x2, ..., xn)
Parameter | Data Type | Description
x1–xn | double | Feature column datasets
Returns
Data Type | Description
http://anzograph.com/matrices#feature_result | PCA data containing eigenvalues and eigenvectors
pca::get_eigvec
Retrieves PCA's eigenvectors as a matrix from the PCA data.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_eigvec(pca_data)
Parameter | Data Type | Description
pca_data | http://anzograph.com/matrices#feature_result | Principal Component Analysis data
Returns
Data Type | Description
http://anzograph.com/matrices#tensor | Eigenvectors as a matrix
pca::get_eigval
Retrieves PCA's eigenvalues as a column vector from PCA data.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_eigval(pca_data)
Parameter | Data Type | Description
pca_data | http://anzograph.com/matrices#feature_result | Principal Component Analysis data
Returns
Data Type | Description
http://anzograph.com/matrices#tensor | Eigenvalues in descending order as column vectors
transform
Applies PCA to transform samples onto the new subspace.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:transform(pca_data, d, x1, x2, ..., xn)
Parameter | Data Type | Description
pca_data | http://anzograph.com/matrices#feature_result | Principal Component Analysis data
d | int | The number of eigenvectors to consider from the end
x1–xn | double | Feature column datasets
Returns
Data Type | Description
double | Sample data transformed into the tuple of lower dimensional space
dump_tensor
Gets a string representation of vector or matrix in row-wise/column-wise order.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])
Parameter | Data Type | Description
m | http://anzograph.com/matrices#tensor | A tensor of matrix/row vector/column vector
type | int | Optional argument that specifies the type of tensor: 0 = row vector, 1 = column vector, 2 = matrix. Default is 2.
isRowWise | boolean | Optional argument that indicates whether the display matrix is column- or row-wise: false = column-wise, true = row-wise. Default is true.
Returns
Data Type | Description
string | String representation of the vector or matrix in row-wise or column-wise order
pca::get_raw_eigval
Gets the PCA's unsorted eigenvalues from the PCA data.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:pca::get_raw_eigval(pca_data)
Parameter | Data Type | Description
pca_data | http://anzograph.com/matrices#feature_result | Principal Component Analysis data
Returns
Data Type | Description
http://anzograph.com/matrices#tensor | Eigenvalues in unsorted order as column vectors
Singular Value Decomposition (SVD)
A matrix factorization method, singular value decomposition (SVD), creates combined singular values and right singular vectors.
Wikipedia Reference: Singular Value Decomposition
There are several options available for singular value decomposition:
svd::create
Applies SVD to create combined singular values and right singular vectors.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::create(x1, x2, ..., xn)
Parameter | Data Type | Description
x1–xn | double | Feature column datasets
Returns
Data Type | Description
http://anzograph.com/matrices#feature_result | SVD data containing singular values and right singular vectors
svd::get_sigval
Gets SVD's singular values as a column vector from the SVD data.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::get_sigval(svd_data)
Parameter | Data Type | Description
svd_data | http://anzograph.com/matrices#feature_result | SVD data
Returns
Data Type | Description
http://anzograph.com/matrices#tensor | Singular values in descending order as a column vector
svd::get_sigvec
Gets SVD's right singular vectors as a matrix from the SVD data.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:svd::get_sigvec(svd_data)
Parameter | Data Type | Description
svd_data | http://anzograph.com/matrices#feature_result | SVD data
Returns
Data Type | Description
http://anzograph.com/matrices#tensor | Right singular vectors as a matrix
transform
Applies PCA or SVD to transform samples onto the new subspace.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:transform(svd_data, d, x1, x2, ..., xn)
Parameter | Data Type | Description
svd_data | http://anzograph.com/matrices#feature_result | SVD data
d | int | The number of singular vectors to consider from the start
x1–xn | double | Feature column datasets
Returns
Data Type | Description
string | Sample data transformed into the tuple of lower dimensional space
dump_tensor
Returns a string representation of a vector or matrix in row- or column- wise order.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])
Parameter | Data Type | Description
m | http://anzograph.com/matrices#tensor | A tensor of matrix/row vector/column vector
type | int | Optional argument that specifies the type of tensor: 0 = row vector, 1 = column vector, 2 = matrix. Default is 2.
isRowWise | boolean | Optional argument that indicates whether the display matrix is column- or row-wise: false = column-wise, true = row-wise. Default is true.
Returns
Data Type | Description
string | String representation of the vector or matrix in row-wise or column-wise order
Linear Discriminant Analysis (LDA)
This function applies linear discriminant analysis (LDA) to create combined eigenvalues and eigenvectors that characterize or separate two or more classes of objects or events.
Wikipedia Reference: Linear Discriminant Analysis
There are several options available for linear discriminant analysis:
lda::create
Applies LDA to create combined eigenvalues and eigenvectors.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::create(y, x1, x2, ..., xn)
Parameter | Data Type | Description
y | double | Class of feature tuple
x1–xn | double | Feature column datasets
Returns
Data Type | Description
http://anzograph.com/matrices#lda_result | Combined eigenvalues, eigenvectors, class mean, count, and class map
lda::get_eigvec
Given LDA data, this function gets LDA's eigenvectors as a matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_eigvec(lda_data)
Parameter | Data Type | Description
lda_data | http://anzograph.com/matrices#lda_result | Linear discriminant analysis data
Returns
Data Type | Description
http://anzograph.com/matrices#tensor | Eigenvectors as a matrix
lda::get_eigval
Given LDA data, this function gets LDA's eigenvalues as a column vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_eigval(lda_data)
Parameter |
Data Type |
Description |
lda_data |
http://anzograph.com/matrices#lda_result |
Linear discriminant analysis data |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Eigenvalues in descending order as a column vector |
lda::transform
Applies LDA to transform samples onto the new subspace.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::transform(lda_data, d, x1, x2, ..., xn)
Parameter |
Data Type |
Description |
lda_data |
http://anzograph.com/matrices#lda_result |
Linear discriminant analysis data |
d |
int |
The number of eigenvectors to consider from the start |
x1–xn |
double |
Feature column datasets |
Returns
Data Type |
Description |
double |
Original data transformed into the tuple of lower dimensional space |
dump_tensor
Returns a string representation of a vector or matrix in row-wise or column-wise order.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(m [, type ] [, isRowWise ])
Parameter |
Data Type |
Description |
m |
http://anzograph.com/matrices#tensor |
A tensor of matrix/row vector/column vector |
type |
int |
Optional argument that specifies the type of tensor: 0 = row vector, 1 = column vector, 2 = matrix. Default is 2. |
isRowWise |
boolean |
Optional argument that indicates whether the matrix is displayed column-wise or row-wise: false = column-wise, true = row-wise. Default is true. |
Returns
Data Type |
Description |
string |
String representation of the vector or matrix in row-wise or column-wise order |
lda::predict
Predicts the class for the samples using LDA as the classifier.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::predict(lda_data, p1, p2, ..., pn)
Parameter |
Data Type |
Description |
lda_data |
http://anzograph.com/matrices#lda_result |
LDA data |
p1–pn |
double |
Data sample whose class to predict |
Returns
Data Type |
Description |
string |
The class name to which data tuple belongs |
lda::get_raw_eigval
Given LDA data, this function gets LDA's unsorted eigenvalues.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:lda::get_raw_eigval(lda_data)
Parameter |
Data Type |
Description |
lda_data |
http://anzograph.com/matrices#lda_result |
LDA data |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Eigenvalues in unsorted order as a column vector |
Linear Algebra
Gramian Matrix (GRAMIAN)
This function creates a Gramian matrix commonly used to compute linear independence.
Wikipedia Reference: Gramian Matrix
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:gramian(x1, x2, ..., xn)
Parameter |
Data Type |
Description |
x1–xn |
double |
Feature column datasets |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Gramian matrix |
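To illustrate what a Gramian matrix contains, the following Python sketch computes the matrix whose entry (i, j) is the inner product of feature columns i and j. It is not AnzoGraph DB's implementation. The helper name `gramian` and the sample data are hypothetical, and treating each argument as one column is an assumption.

```python
def gramian(*columns):
    """Return the Gram matrix of the given equal-length columns.

    Entry [i][j] is the inner product of column i and column j.
    """
    if not columns:
        raise ValueError("at least one column is required")
    length = len(columns[0])
    if any(len(col) != length for col in columns):
        raise ValueError("all columns must have the same length")
    return [
        [sum(a * b for a, b in zip(ci, cj)) for cj in columns]
        for ci in columns
    ]


# Hypothetical feature columns x1 and x2 (three observations each).
x1 = [1.0, 2.0, 3.0]
x2 = [0.0, 1.0, 1.0]
G = gramian(x1, x2)
# G[0][0] = x1.x1 = 14, G[0][1] = G[1][0] = x1.x2 = 5, G[1][1] = x2.x2 = 2
print(G)
```

The columns are linearly independent exactly when the determinant of the Gramian is non-zero. For this sample the determinant is 14 * 2 - 5 * 5 = 3, so x1 and x2 are independent.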
Profiling Metrics
Discrete Probability Metrics
These functions calculate a discrete probability distribution of values.
Wikipedia Reference: Discrete Probability
The following functions produce discrete probability metrics:
Geometric Mean Metric (GMEAN)
This function calculates geometric mean, defined as the nth root of the product of n positive numbers.
Wikipedia Reference: Geometric Mean
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:gmean(data)
Parameter |
Data Type |
Description |
data |
double |
Column data |
Returns
Data Type |
Description |
double |
The geometric mean value |
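The definition above can be sketched in a few lines of Python. This is an illustration of the formula only, not the AnzoGraph DB implementation, and the helper name `gmean` is hypothetical.

```python
import math


def gmean(values):
    """Geometric mean: the nth root of the product of n positive numbers."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    # Averaging logarithms avoids overflow from multiplying many values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# The product of 1, 2, and 4 is 8, and the cube root of 8 is 2.
print(gmean([1.0, 2.0, 4.0]))
```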
Percentile Metric (PERCENTILE)
This function calculates the value at the requested percentile of a set of numeric values.
Wikipedia Reference: Percentile Metric
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:percentile(data, p)
Parameter |
Data Type |
Description |
data |
double |
Dataset |
p |
double |
The percentile (0 - 100) to compute |
Returns
Data Type |
Description |
double |
The percentile value |
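Several percentile definitions are in common use, and this page does not state which one AnzoGraph DB applies. The Python sketch below assumes linear interpolation between the two closest ranks, purely to show how `data` and `p` relate to the result. The helper name `percentile` is hypothetical.

```python
def percentile(data, p):
    """Value at percentile p (0 to 100), using linear interpolation."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)
    # Fractional position of the percentile in the sorted data.
    position = (len(ordered) - 1) * p / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


print(percentile([15, 20, 35, 40, 50], 50))  # the middle value, 35
```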
Skew Metric (SKEWCOEFF)
This function calculates Pearson's mode and median coefficients of skewness on numeric values.
Wikipedia Reference: Skewness
Syntax
PREFIX stats: <http://cambridgesemantics.com/anzograph/statistics#>
stats:skewcoeff(data, dp)
Parameter |
Data Type |
Description |
data |
double |
Data set |
dp |
int |
Number of decimal places to consider for the input data |
Returns
Data Type |
Description |
double |
The mode (value that appears most frequently) |
double |
The median number in an ordered set of data |
double |
The average value |
double |
The standard deviation |
double |
Pearson mode skewness or first skewness coefficient |
double |
Pearson median skewness or second skewness coefficient |
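The six returned values are related by two standard formulas: mode skewness is (mean - mode) / standard deviation, and median skewness is 3 * (mean - median) / standard deviation. The Python sketch below computes them with the standard library. It assumes that `dp` rounds the input before the mode is taken and that the population standard deviation is used. Neither detail is confirmed by this page, and the helper name `skew_metrics` is hypothetical.

```python
import statistics


def skew_metrics(data, dp):
    """Return (mode, median, mean, stddev, mode skewness, median skewness)."""
    # Round to dp decimal places so near-equal values share one mode.
    rounded = [round(x, dp) for x in data]
    mode = statistics.mode(rounded)
    median = statistics.median(rounded)
    mean = statistics.fmean(rounded)
    stddev = statistics.pstdev(rounded)  # population std dev (assumption)
    if stddev == 0:
        raise ValueError("skewness is undefined when all values are equal")
    mode_skew = (mean - mode) / stddev
    median_skew = 3 * (mean - median) / stddev
    return mode, median, mean, stddev, mode_skew, median_skew


print(skew_metrics([1.0, 2.0, 2.0, 3.0, 7.0], 1))
```

In this sample the mean (3) exceeds both the mode and the median (2), so both coefficients are positive, which indicates a right-skewed distribution.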
Sketches
- Cardinality Metric (HLL): Uses Apache DataSketches HyperLogLog (HLL) to calculate cardinality estimates for a dataset.
- Frequent Items (FI): Collection of functions used to create frequency sketches and obtain information about frequent items.
- Quantile/Rank Sketch (KLL): Collection of functions that use the KLL sketch computation model to approximate minimum and maximum items in a dataset, the quantile and rank of items, the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF).
- Theta Sketch (THETA): Collection of functions that use the Theta Sketch framework to compute estimates of the cardinality, union, intersection, and difference set operations and return a Theta Sketch.
Cardinality Metric (HLL)
This aggregate calculates cardinality estimates for a dataset using Apache DataSketches HyperLogLog (HLL).
Reference: Cardinality Prominence Metric
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:hll(data [, log_base_2_K ] [, hll_target_type ])
Parameter |
Data Type |
Description |
data |
byte, short, int, long, float, double, string, URI |
The dataset |
log_base_2_K |
int |
Optional argument that specifies the log base 2 of K, where K is the number of buckets or slots for the sketch. Must be between 4 and 21 (inclusive). Default value is 12. |
hll_target_type |
int |
Optional argument that specifies the target type for the HLL sketch. Supported values are 4 (HLL_4), 6 (HLL_6), or 8 (HLL_8). Default value is 4. |
Returns
Data Type |
Description |
double |
The cardinality metric value |
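The `log_base_2_K` argument trades memory for accuracy. The Python sketch below converts it to a bucket count and applies the textbook HyperLogLog relative standard error of roughly 1.04 / sqrt(K). That constant comes from the original HyperLogLog analysis and is used here as an assumption, so the error of the Apache DataSketches implementation may differ. Both helper names are hypothetical.

```python
import math


def hll_buckets(log_base_2_k):
    """Number of buckets K for a given log_base_2_K (valid range 4 to 21)."""
    if not 4 <= log_base_2_k <= 21:
        raise ValueError("log_base_2_K must be between 4 and 21")
    return 1 << log_base_2_k


def classic_hll_relative_error(log_base_2_k):
    """Textbook HyperLogLog relative standard error, about 1.04 / sqrt(K)."""
    return 1.04 / math.sqrt(hll_buckets(log_base_2_k))


# Smallest, default, and largest settings.
for lg_k in (4, 12, 21):
    print(lg_k, hll_buckets(lg_k), classic_hll_relative_error(lg_k))
```

With the default of 12 the sketch uses 4096 buckets, and the textbook error is about 1.6 percent.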
Frequent Items (FI)
The FI aggregate estimates the frequency of items in a dataset, the upper and lower bounds of the items, the number of active items, and the total stream weight. FI returns a binary stream (Frequent Items Sketch) containing all of the computed values. Values can be retrieved from the sketch using the get_estimates, get_active_items_total_weights, get_top_items, and get_top_strings functions.
FI Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi(values [, weight ])
Parameter |
Data Type |
Description |
values |
short, int, long, float, double, string |
The dataset |
weight |
long |
Optional argument that specifies the weight of each value. The default value is 1. |
Returns
Data Type |
Description |
http://anzograph.com/statistics#fi_sketch |
Binary Frequent Items Sketch |
Frequent Items Sketch Retrieval Functions
The following functions are available for retrieving values from a Frequent Items Sketch:
fi::get_estimates
Returns the estimates for the frequency and lower and upper bound of the given item in a sketch.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_estimates(fi_sketch, item)
Parameter |
Data Type |
Description |
fi_sketch |
http://anzograph.com/statistics#fi_sketch |
Frequent Items Sketch |
item |
Object |
Item for which to get estimates |
Returns
Data Type |
Description |
long |
Frequency estimate for the item |
long |
Lower bound estimate for the item |
long |
Upper bound estimate for the item |
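To show why a frequency sketch reports a lower and an upper bound alongside each estimate, the Python sketch below implements a simplified Misra-Gries counter summary. Misra-Gries is a classic frequent-items technique used here as a stand-in, not the algorithm that AnzoGraph DB is documented to use. All names are hypothetical, and using the counter itself as the estimate is an assumption.

```python
def misra_gries(stream, max_counters):
    """Summarize a stream with at most max_counters counters.

    Returns (counters, offset). For any item, counters.get(item, 0) is a
    lower bound on its true frequency, and that value plus offset is an
    upper bound.
    """
    counters = {}
    offset = 0  # number of decrement rounds performed so far
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < max_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            offset += 1
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, offset


def get_estimates(counters, offset, item):
    """Return (estimate, lower bound, upper bound) for one item."""
    lower = counters.get(item, 0)
    return lower, lower, lower + offset


stream = ["a", "a", "a", "b", "c", "a", "d"]
counters, offset = misra_gries(stream, max_counters=2)
print(get_estimates(counters, offset, "a"))  # true frequency of "a" is 4
```

Each decrement round can cost any item at most one count, so the true frequency always lies between the two bounds.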
fi::get_active_items_total_weights
Returns the number of active items and the estimated total stream weight from a sketch.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_active_items_total_weights(fi_sketch)
Parameter |
Data Type |
Description |
fi_sketch |
http://anzograph.com/statistics#fi_sketch |
Frequent Items Sketch |
Returns
Data Type |
Description |
long |
The estimated number of active items |
long |
The estimated total stream weight |
fi::get_top_items
Returns the most frequent items and their corresponding frequency.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_top_items(fi_sketch)
Parameter |
Data Type |
Description |
fi_sketch |
http://anzograph.com/statistics#fi_sketch |
Frequent Items Sketch |
Returns
Data Type |
Description |
double |
The item with the highest frequency |
long |
Frequency estimate of the first item |
double |
The item with the second highest frequency |
long |
Frequency estimate of the second item |
double |
The item with the nth highest frequency |
long |
Frequency estimate of the nth item |
fi::get_top_strings
Returns the most frequent strings and their corresponding frequencies.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:fi::get_top_strings(fi_sketch)
Parameter |
Data Type |
Description |
fi_sketch |
http://anzograph.com/statistics#fi_sketch |
Frequent Items Sketch |
Returns
Data Type |
Description |
string |
The string with the highest frequency |
long |
Frequency estimate of the first string |
string |
The string with the second highest frequency |
long |
Frequency estimate of the second string |
string |
The string with the nth highest frequency |
long |
Frequency estimate of the nth string |
Quantile/Rank Sketch (KLL)
The KLL aggregate uses the KLL Sketch computation model to calculate the approximate minimum and maximum items in a dataset, the quantile and rank of items, the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF). KLL returns a binary stream (KLL Sketch) containing all of the computed values. Values can be retrieved from the sketch using the functions described below.
For more information about KLL sketches, see KLL Sketch.
KLL Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll(values [, k ])
Parameter |
Data Type |
Description |
values |
short, int, long, float, double, string |
The dataset |
k |
int |
Optional argument that configures the size of the sketch and its estimation error. Can be any value between 8 and 65535 (inclusive). The default value is 200, which results in a normalized rank error of about 1.65%. Higher values produce a smaller error, but the sketch is larger (and slower). |
Returns
Data Type |
Description |
http://anzograph.com/statistics#kll_sketch |
Binary KLL sketch |
KLL Sketch Retrieval Functions
The following functions are available for retrieving values from a KLL sketch:
kll::get_min_value
Returns the minimum value in a KLL sketch.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_min_value(kll_sketch)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
Returns
Data Type |
Description |
double |
The minimum value in the sketch |
string |
If the input is a string, the minimum string is returned |
kll::get_max_value
Returns the maximum value in a KLL sketch.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_max_value(kll_sketch)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
Returns
Data Type |
Description |
double |
The maximum value in the sketch |
string |
If the input is a string, the maximum string is returned |
kll::get_n
Returns the length of a KLL sketch.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_n(kll_sketch)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
Returns
Data Type |
Description |
long |
The length of the sketch |
kll::get_num_retained
Returns the number of retained items (samples) in a sketch.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_num_retained(kll_sketch)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
Returns
Data Type |
Description |
long |
The number of retained items (samples) in the sketch |
kll::get_rank
Returns an approximation of the normalized (fractional) rank of the given item.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_rank(kll_sketch, v)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
v |
double |
The item to retrieve the rank for |
Returns
Data Type |
Description |
double |
The approximate rank of the item from 0 - 1 (inclusive) |
kll::get_quantile
Returns an approximation of the item value at the given normalized (fractional) rank.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantile(kll_sketch, fraction)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
fraction |
double |
The fractional position in the hypothetical sorted stream |
Returns
Data Type |
Description |
double |
An approximation of the value of the item that would be preceded by the given fraction of a hypothetical sorted version of the sketch |
string |
An approximation of the string when the input is a string |
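Rank and quantile are inverse lookups on the sorted data. The Python sketch below computes both exactly on a small list, which is the quantity a KLL sketch approximates without storing every item. The helper names are hypothetical, and counting only items strictly less than `v` toward the rank is an assumption.

```python
def exact_rank(values, v):
    """Normalized rank of v: fraction of items strictly less than v."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for x in values if x < v) / len(values)


def exact_quantile(values, fraction):
    """Item preceded by the given fraction of the sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # Clamp so that a fraction of 1.0 maps to the largest item.
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]


values = list(range(1, 11))         # the items 1 through 10
print(exact_rank(values, 6))        # five of ten items are below 6
print(exact_quantile(values, 0.5))  # the item preceded by half the data
```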
kll::get_quantiles
Provides a more efficient, multiple-query version of kll::get_quantile that enables you to specify a number of evenly spaced fractional ranks.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantiles(kll_sketch, f1, f2, ..., f10)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
f1–f10 |
double |
Normalized or fractional ranks in the hypothetical sorted stream. The ranks must be in the interval 0.0 - 1.0 (inclusive). |
Returns
Data Type |
Description |
double |
An approximation of the values in the same order as the given fractional positions |
kll::get_quantiles_str
Provides an approximation to the strings when the input is a string type.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_quantiles_str(kll_sketch, f1, f2, ..., f10)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
f1–f10 |
double |
Normalized or fractional ranks in the hypothetical sorted stream. The ranks must be in the interval 0.0 - 1.0 (inclusive). |
Returns
Data Type |
Description |
string |
An approximation of the strings |
kll::get_pmf
Provides an approximation to the Probability Mass Function (PMF) of the input stream.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_pmf(kll_sketch, v1, v2, ..., v10)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
v1–v10 |
Object |
Input values between the minimum and maximum values of the input stream. Values must be unique and monotonically increasing. |
Returns
Data Type |
Description |
double |
PMF values corresponding to the input |
kll::get_cdf
Provides an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF of the input stream.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:kll::get_cdf(kll_sketch, v1, v2, ..., v10)
Parameter |
Data Type |
Description |
kll_sketch |
http://anzograph.com/statistics#kll_sketch |
KLL sketch |
v1–v10 |
Object |
Input values between the minimum and maximum values of the input stream. Values must be unique and monotonically increasing. |
Returns
Data Type |
Description |
double |
CDF values corresponding to the input |
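The PMF and CDF are both defined over the intervals created by the input values, which act as split points. The Python sketch below computes exact versions on a small list to show how the two relate. Returning one extra value for the interval above the last split point is a convention common in sketch libraries and is an assumption here, as are the helper names.

```python
def exact_cdf(values, split_points):
    """Cumulative fraction of items below each split point, then 1.0."""
    if not values:
        raise ValueError("values must not be empty")
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be unique and increasing")
    n = len(values)
    cdf = [sum(1 for x in values if x < s) / n for s in split_points]
    return cdf + [1.0]


def exact_pmf(values, split_points):
    """Fraction of items that fall in each interval between split points."""
    cdf = exact_cdf(values, split_points)
    # Each mass is the difference between consecutive cumulative values.
    return [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]


values = [1, 2, 3, 4, 5, 6, 7, 8]
print(exact_cdf(values, [3, 5]))
print(exact_pmf(values, [3, 5]))
```

The PMF entries sum to 1, and the running total of the PMF reproduces the CDF.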
Theta Sketch (THETA)
The THETA aggregate uses the Theta Sketch framework to compute estimates of the cardinality, union, intersection, and difference set operations and return a binary stream (Theta Sketch) containing the computed values. Values can be retrieved from the sketch using the cardinality, union, intersection, and difference functions.
Theta Sketches are a generalization of the well-known Kth Minimum Value (KMV) sketches.
THETA Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta(values)
Parameter |
Data Type |
Description |
values |
short, int, long, float, double, string |
The dataset to operate on |
Returns
Data Type |
Description |
http://anzograph.com/statistics#theta_sketch |
Binary Theta Sketch |
Theta Sketch Retrieval Functions
The following functions are available for retrieving values from a Theta Sketch:
theta::cardinality
Retrieves the estimated count of distinct values in a Theta Sketch.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::cardinality(theta_sketch)
Parameter |
Data Type |
Description |
theta_sketch |
http://anzograph.com/statistics#theta_sketch |
Binary Theta Sketch |
Returns
Data Type |
Description |
double |
The count of items in the sketch |
theta::union
Retrieves the estimate of the number of items that are in the union of two or more Theta Sketches.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::union(theta_sketch1, theta_sketch2 [, theta_sketchN ])
Parameter |
Data Type |
Description |
theta_sketch1–N |
http://anzograph.com/statistics#theta_sketch |
Any number of Theta Sketches |
Returns
Data Type |
Description |
double |
The estimated number of items in the union |
theta::intersection
Retrieves the estimate of the number of items that are in the intersection between two or more Theta Sketches.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::intersection(theta_sketch1, theta_sketch2 [, theta_sketchN ])
Parameter |
Data Type |
Description |
theta_sketch1–N |
http://anzograph.com/statistics#theta_sketch |
Any number of Theta Sketches |
Returns
Data Type |
Description |
double |
The estimated number of items that intersect in the sketches |
theta::difference
Retrieves the estimate of the number of items that are in the difference between two Theta Sketches, i.e., the number of items that are in the first sketch but not in the second sketch.
Syntax
PREFIX sketch: <http://cambridgesemantics.com/anzograph/sketch#>
sketch:theta::difference(a, b)
Parameter |
Data Type |
Description |
a |
http://anzograph.com/statistics#theta_sketch |
The first Theta Sketch |
b |
http://anzograph.com/statistics#theta_sketch |
The Theta Sketch to compare to sketch a |
Returns
Data Type |
Description |
double |
The estimated number of items in the difference between the sketches |
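Because Theta Sketches generalize Kth Minimum Value (KMV) sketches, a small KMV sketch helps show how cardinality and union estimates can be derived from hashed values. The Python code below is a teaching sketch under stated assumptions, not the Theta Sketch implementation. The function names, the SHA-256 hash, and the default `k` of 256 are all hypothetical choices.

```python
import hashlib


def _unit_hash(value):
    """Map a value deterministically to a float in the unit interval."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(values, k=256):
    """Keep the k smallest distinct hash values of the input.

    For brevity this hashes everything and then truncates. A streaming
    version would keep only k hashes in memory at any time.
    """
    return sorted({_unit_hash(v) for v in values})[:k]


def kmv_cardinality(sketch, k=256):
    """Estimate the number of distinct values summarized by the sketch."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k distinct values: exact
    # The kth smallest hash shrinks as the distinct count grows.
    return (k - 1) / sketch[k - 1]


def kmv_union(sketch_a, sketch_b, k=256):
    """Sketch of the union: merge the hashes and keep the k smallest."""
    return sorted(set(sketch_a) | set(sketch_b))[:k]


a = kmv_sketch(range(0, 60))
b = kmv_sketch(range(40, 100))
print(kmv_cardinality(kmv_union(a, b)))          # 100 distinct values
print(kmv_cardinality(kmv_sketch(range(20000))))  # an estimate near 20000
```

Intersection and difference estimates can be built on the same retained hashes, but they are omitted here to keep the example short.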
Matrix Utilities
This collection of functions returns information on various attributes of vector space mapping and related matrix tensors.
make_matrix
Creates a matrix of doubles with the given dimensions and values.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_matrix(m, n [, v1, v2, ..., vn ])
Parameter |
Data Type |
Description |
m |
int |
Number of rows for the new matrix |
n |
int |
Number of columns for the new matrix |
v1–vn |
double |
Optional arguments that specify the row-wise matrix elements to include. Default value is 0 for all elements. |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation for m x n matrix of doubles |
make_tensor_from_string
Constructs a tensor from a string of row-wise elements and an optional column count.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_tensor_from_string(s [, n ])
Parameter |
Data Type |
Description |
s |
string |
String that contains the row-wise elements for constructing the tensor |
n |
int |
Optional argument that specifies the number of columns to include in the tensor. The default value is 0, which constructs a row vector. A value of 1 constructs a column vector. A value that is greater than 1 constructs a matrix with the specified number of columns. |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
A tensor of doubles |
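The effect of the `n` argument on the resulting shape can be sketched in Python with nested lists. The element separator accepted by AnzoGraph DB is not stated on this page, so the code assumes numbers separated by spaces or commas. The helper name `tensor_from_string` is hypothetical.

```python
def tensor_from_string(s, n=0):
    """Build a nested-list tensor from row-wise elements in a string."""
    if n < 0:
        raise ValueError("n must not be negative")
    # Assumed input format: numbers separated by spaces or commas.
    elements = [float(token) for token in s.replace(",", " ").split()]
    if n == 0:
        return [elements]  # a single row: row vector
    if len(elements) % n != 0:
        raise ValueError("element count must be a multiple of n")
    # n == 1 gives a column vector; n > 1 gives a matrix with n columns.
    return [elements[i:i + n] for i in range(0, len(elements), n)]


text = "1 2 3 4 5 6"
print(tensor_from_string(text))      # 1 x 6 row vector
print(tensor_from_string(text, 1))   # 6 x 1 column vector
print(tensor_from_string(text, 3))   # 2 x 3 matrix
```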
make_vec
Constructs a row vector in which the given value is stored at the given index.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:make_vec(n, v)
Parameter |
Data Type |
Description |
n |
int |
The index into the vector |
v |
double |
The value to be stored in the vector at the nth index |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
A row vector |
dump_tensor
Displays the Armadillo header and the first few elements of the matrix or vector as a string.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_tensor(b [, type ] [, isRowWise ])
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor of matrix/row vector/column vector. |
type |
int |
Optional argument that specifies the type of tensor: 0 = row vector, 1 = column vector, 2 = matrix. Default is 2. |
isRowWise |
boolean |
Optional argument that indicates whether the matrix is displayed column-wise or row-wise: false = column-wise, true = row-wise. Default is true. |
Returns
Data Type |
Description |
string |
Row- or column- wise string representation of the vector or matrix |
dump_vec
Returns the string representation of a row or column vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:dump_vec(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
Tensor representation of a row or column vector |
Returns
Data Type |
Description |
string |
String representation of the row or column vector |
get_rows
Returns the number of rows present in a tensor.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_rows(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
long |
The number of rows |
get_cols
Gets the number of columns present in a tensor.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_cols(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
long |
The number of columns |
get_slices
Gets the number of slices present in a tensor.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_slices(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
long |
The number of slices |
get_order
Gets the tensor order.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_order(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
long |
The tensor order |
get_total_elem
Gets the total number of elements that are present in a tensor.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_total_elem(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
long |
The total number of elements |
get_nonzero
Gets the number of non-zero elements that are present in a sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_nonzero(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
long |
The number of non-zero elements |
get_elem
Accesses one or more elements that are stored in a tensor.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_elem(b, i [, j ] [, k ])
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
i |
long |
Row index of the element to access |
j |
long |
Optional argument that specifies the column index of the element |
k |
long |
Optional argument that specifies the slice index of the element |
Returns
Data Type |
Description |
double |
The element value |
get_diag
Extracts a diagonal from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_diag(b [, k ])
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
k |
long |
Optional diagonal number. By default, the main diagonal is accessed (k=0). For k>0, the kth super-diagonal is accessed (top right corner). For k<0, the kth sub-diagonal is accessed (bottom left corner). |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of the diagonal as a column vector |
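The meaning of the diagonal number `k` is easiest to see on a small matrix. The Python sketch below walks the requested diagonal of a nested-list matrix. It illustrates the indexing rule only, and the helper name `diagonal_of` is hypothetical.

```python
def diagonal_of(matrix, k=0):
    """Return the kth diagonal of a nested-list matrix as a list."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # k > 0 starts k columns to the right; k < 0 starts |k| rows down.
    row, col = (0, k) if k >= 0 else (-k, 0)
    diagonal = []
    while row < rows and col < cols:
        diagonal.append(matrix[row][col])
        row += 1
        col += 1
    return diagonal


m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(diagonal_of(m))      # main diagonal: [1, 5, 9]
print(diagonal_of(m, 1))   # first super-diagonal: [2, 6]
print(diagonal_of(m, -1))  # first sub-diagonal: [4, 8]
```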
getmax_val
Retrieves the maximum value from a tensor.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:getmax_val(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
double |
The maximum value in the tensor |
getmin_val
Retrieves the minimum value from a tensor.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:getmin_val(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
double |
The minimum value in the tensor |
get_subvec
Extracts a range of elements from a row or column vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:get_subvec(b, i, j)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
i |
long |
Start index |
j |
long |
End index |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a row or column vector |
subvec_head
Extracts starting elements from a row or column vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subvec_head(b, n)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
n |
long |
The number of elements to extract from the beginning of the vector |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a row or column vector with elements from 0 to n-1 |
subvec_tail
Extracts trailing elements from a row or column vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subvec_tail(b, n)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
n |
long |
The number of elements to extract from the end of the vector |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a row or column vector with n elements from the tail |
cosine_similarity
This function calculates the cosine similarity between two row vectors.
The cosine_similarity function is not compatible with column vectors or matrices. Both inputs must be row vectors.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:cosine_similarity(m, n)
Parameter |
Data Type |
Description |
m |
http://anzograph.com/matrices#tensor |
A row vector |
n |
http://anzograph.com/matrices#tensor |
The row vector to compare to the vector in argument m |
Returns
Data Type |
Description |
double |
Cosine similarity in the range -1 to 1, where -1 is perfectly dissimilar and 1 is perfectly similar |
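Cosine similarity is the dot product of two vectors divided by the product of their lengths. The Python sketch below applies that formula to plain lists, as an illustration rather than the AnzoGraph DB implementation. Rejecting zero-length vectors is an assumption, because the result is undefined for them.

```python
import math


def cosine_similarity(m, n):
    """Cosine of the angle between two equal-length row vectors."""
    if len(m) != len(n):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(m, n))
    norm_m = math.sqrt(sum(a * a for a in m))
    norm_n = math.sqrt(sum(b * b for b in n))
    if norm_m == 0 or norm_n == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_m * norm_n)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 5.0]))   # orthogonal
print(cosine_similarity([1.0, 0.0], [-3.0, 0.0]))  # opposite direction
```

Only direction matters: scaling either vector by a positive factor leaves the result unchanged.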
subview_col
Extracts a column from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_col(b, n)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
n |
long |
Column index |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a column vector |
subview_row
Extracts a row from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_row(b, n)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
n |
long |
Row index |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a row vector |
subview_rows
Extracts a range of rows from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_rows(b, r1, ..., rn)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
r1–rn |
long |
The start row index to the end row index |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of the matrix with rows from r1 to rn |
subview_cols
Extracts a range of columns from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_cols(b, c1, ..., cn)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
c1–cn |
long |
The start column index to the end column index |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of the matrix with columns from c1 to cn |
subview_mat
Extracts a submatrix from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_mat(b, r1, ..., rn, c1, ..., cn)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
r1–rn |
long |
The start row index to the end row index |
c1–cn |
long |
The start column index to the end column index |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a matrix of size [1+(rn-r1)] x [1+(cn-c1)] |
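The result size [1+(rn-r1)] x [1+(cn-c1)] follows from treating both index ranges as inclusive. The Python sketch below extracts such a submatrix from a nested list. It assumes zero-based indexes and takes only the start and end index of each range. The helper name `submatrix` is hypothetical.

```python
def submatrix(matrix, r1, rn, c1, cn):
    """Rows r1 to rn and columns c1 to cn (zero-based, inclusive)."""
    if not 0 <= r1 <= rn < len(matrix):
        raise IndexError("row range is out of bounds")
    if not 0 <= c1 <= cn < len(matrix[0]):
        raise IndexError("column range is out of bounds")
    # Python slices exclude the end index, so add 1 to include it.
    return [row[c1:cn + 1] for row in matrix[r1:rn + 1]]


m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
# Rows 1 to 2 and columns 0 to 2 give a (1+1) x (1+2) = 2 x 3 result.
print(submatrix(m, 1, 2, 0, 2))
```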
subview_head_rows
Extracts starting rows from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_head_rows(b, n)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
n |
long |
The number of rows to extract from the beginning of the matrix |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a matrix with rows from 0 to n-1 |
subview_head_cols
Extracts starting columns from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_head_cols(b, n)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
n |
long |
The number of columns to extract from the beginning of the matrix |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a matrix with columns from 0 to n-1 |
subview_tail_rows
Extracts trailing rows from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_tail_rows(b, n)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
n |
long |
The number of rows to extract from the end of the matrix |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a matrix with n rows from the tail |
subview_tail_cols
Extracts trailing columns from a matrix or sparse matrix.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:subview_tail_cols(b, n)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
n |
long |
The number of columns to extract from the end of the matrix |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a matrix with n columns from the tail |
flatten_as_col
Returns a flattened version of a matrix as a column vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:flatten_as_col(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of the matrix as a column vector |
flatten_as_row
Returns a flattened version of a matrix as a row vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:flatten_as_row(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of the matrix as a row vector |
is_vec
Evaluates whether the given matrix is a row or column vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_vec(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
True if the matrix can be interpreted as a column or row vector. False if the matrix does not have exactly one column or one row. |
is_rowvec
Evaluates whether the given matrix is a row vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_rowvec(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
True if the matrix can be interpreted as a row vector. False if the matrix does not have exactly one row. |
is_colvec
Evaluates whether the given matrix is a column vector.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_colvec(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
True if the matrix can be interpreted as a column vector. False if the matrix does not have exactly one column. |
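The three vector tests differ only in which dimension must equal one. The following Python sketch models that relationship under the assumption that a matrix is a non-empty list of equal-length rows; it is not AnzoGraph DB code, and the helper names simply mirror the function names above.

```python
# Illustrative model only; not AnzoGraph DB code.
# A matrix is modelled as a non-empty list of equal-length rows.

def is_rowvec(m):
    """True when the matrix has exactly one row."""
    return len(m) == 1

def is_colvec(m):
    """True when the matrix has exactly one column."""
    return len(m) > 0 and all(len(row) == 1 for row in m)

def is_vec(m):
    """True when the matrix is a row vector or a column vector."""
    return is_rowvec(m) or is_colvec(m)

row = [[1, 2, 3]]        # 1 x 3
col = [[1], [2], [3]]    # 3 x 1
mat = [[1, 2], [3, 4]]   # 2 x 2

print(is_vec(row), is_rowvec(row), is_colvec(row))  # True True False
print(is_vec(col), is_rowvec(col), is_colvec(col))  # True False True
print(is_vec(mat))                                  # False
```

Under this model a 1 x 1 matrix satisfies both is_rowvec and is_colvec.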
is_sorted
Evaluates whether a vector or matrix is sorted.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_sorted(b [, t ] [, d ])
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
t |
boolean |
Optional argument that specifies the sort dimension for the matrix. Set to true to test whether elements are sorted row-wise, or false to test whether they are sorted column-wise. Default is false. |
d |
int |
Optional argument that specifies the sort direction for the matrix. Allowed arguments are:
- 0: ascend (default). Elements are ascending; consecutive elements can be equal.
- 1: descend. Elements are descending; consecutive elements can be equal.
- 2: strictascend. Elements are strictly ascending; consecutive elements cannot be equal.
- 3: strictdescend. Elements are strictly descending; consecutive elements cannot be equal.
|
Returns
Data Type |
Description |
boolean |
True if the elements are sorted. False if they are not. |
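The four direction codes map onto four pairwise comparisons between consecutive elements. The Python sketch below models this; it is not AnzoGraph DB code. It assumes that t = true means each row is checked from left to right and t = false means each column is checked from top to bottom, which is one reading of "row-wise" and "column-wise" that the source does not spell out.

```python
import operator

# Illustrative model only; not AnzoGraph DB code.
# Direction codes follow the parameter table above.
_DIRECTIONS = {
    0: operator.le,  # ascend: consecutive elements can be equal
    1: operator.ge,  # descend: consecutive elements can be equal
    2: operator.lt,  # strictascend
    3: operator.gt,  # strictdescend
}

def _sequence_is_sorted(seq, d):
    """Compare every element with its successor using the chosen direction."""
    compare = _DIRECTIONS[d]
    return all(compare(a, b) for a, b in zip(seq, seq[1:]))

def is_sorted(m, t=False, d=0):
    """Assumption: t=True checks each row, t=False checks each column."""
    lines = m if t else [list(col) for col in zip(*m)]
    return all(_sequence_is_sorted(line, d) for line in lines)

m = [[1, 2, 2],
     [3, 4, 5]]

print(is_sorted(m))               # True: every column is ascending
print(is_sorted(m, True))         # True: every row is ascending
print(is_sorted(m, True, 2))      # False: the first row repeats 2
```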
is_tri_mat_upper
Evaluates whether a matrix is upper triangular, i.e., the matrix is square sized and all elements below the main diagonal are zero.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_tri_mat_upper(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
Returns true if the matrix is upper triangular and false if it is not |
is_tri_mat_lower
Evaluates whether a matrix is lower triangular, i.e., the matrix is square sized and all elements above the main diagonal are zero.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_tri_mat_lower(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
Returns true if the matrix is lower triangular and false if it is not |
is_diag_mat
Evaluates whether a matrix is diagonal, i.e., all elements outside of the main diagonal are zero.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_diag_mat(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
Returns true if the matrix is diagonal and false if it is not |
is_square
Evaluates whether a matrix is square, i.e., the number of rows is equal to the number of columns.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_square(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
Returns true if the matrix is square and false if it is not |
is_symmetric
Evaluates whether a matrix is symmetric, i.e., the matrix is square and equal to its own transpose.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_symmetric(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
Returns true if the matrix is symmetrical and false if it is not |
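The structural tests above (triangular, diagonal, square, and symmetric) can be modelled in a few lines of Python. This is an illustrative sketch, not AnzoGraph DB code: a list of rows stands in for the tensor, and the helper names mirror the function names for readability. The diagonal test follows the description literally and does not require a square matrix, which is an assumption about the actual behavior.

```python
# Illustrative model only; not AnzoGraph DB code.

def is_square(m):
    """Number of rows equals number of columns."""
    return all(len(row) == len(m) for row in m)

def is_tri_mat_upper(m):
    """Square, and every element below the main diagonal is zero."""
    return is_square(m) and all(
        m[i][j] == 0 for i in range(len(m)) for j in range(i))

def is_tri_mat_lower(m):
    """Square, and every element above the main diagonal is zero."""
    return is_square(m) and all(
        m[i][j] == 0 for i in range(len(m)) for j in range(i + 1, len(m)))

def is_diag_mat(m):
    """Every element outside the main diagonal is zero."""
    return all(v == 0
               for i, row in enumerate(m)
               for j, v in enumerate(row) if i != j)

def is_symmetric(m):
    """Square, and element (i, j) equals element (j, i)."""
    return is_square(m) and all(
        m[i][j] == m[j][i] for i in range(len(m)) for j in range(i))

upper = [[1, 2], [0, 3]]
diag = [[1, 0], [0, 2]]
wide = [[1, 2, 3], [4, 5, 6]]

print(is_tri_mat_upper(upper), is_tri_mat_lower(upper))  # True False
print(is_diag_mat(diag), is_symmetric(diag))             # True True
print(is_square(wide))                                   # False
```

A diagonal matrix is both upper and lower triangular in this model, so all three tests return true for `diag`.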
is_hermitian
Evaluates whether a matrix is Hermitian (self-adjoint), i.e., the matrix is square and equal to its own conjugate transpose.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:is_hermitian(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
Returns true if the matrix is Hermitian and false if it is not |
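For reference, the mathematical definition of a Hermitian matrix (element (i, j) equals the complex conjugate of element (j, i)) can be checked with Python's built-in complex numbers. This sketch illustrates the definition only; it is not AnzoGraph DB code and makes no claim about how complex values are stored in a tensor.

```python
# Illustrative model of the mathematical definition; not AnzoGraph DB code.

def is_hermitian(m):
    """Square, and m[i][j] equals the complex conjugate of m[j][i]."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    return all(m[i][j] == m[j][i].conjugate()
               for i in range(n) for j in range(i + 1))

h = [[2, 1 + 1j],
     [1 - 1j, 3]]
not_h = [[2, 1 + 1j],
         [1 + 1j, 3]]

print(is_hermitian(h))      # True
print(is_hermitian(not_h))  # False
```

For a matrix with only real elements, this test reduces to the symmetry test, because a real number is its own conjugate.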
has_nan
Evaluates whether a matrix contains any elements that are not a number (NaN).
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:has_nan(b)
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
Returns
Data Type |
Description |
boolean |
Returns true if at least one of the elements is NaN and false if all elements are numbers |
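A minimal Python model of this test is shown below. It is not AnzoGraph DB code; it only illustrates that the result is true as soon as a single element is NaN.

```python
import math

# Illustrative model only; not AnzoGraph DB code.

def has_nan(m):
    """True if at least one element of the matrix is NaN."""
    return any(math.isnan(v) for row in m for v in row)

clean = [[1.0, 2.0], [3.0, 4.0]]
dirty = [[1.0, float("nan")], [3.0, 4.0]]

print(has_nan(clean))  # False
print(has_nan(dirty))  # True
```

The sketch uses `math.isnan` rather than an equality comparison because NaN never compares equal to any value, including itself.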
vec_all
Evaluates whether all elements in a row or column vector are non-zero or satisfy the specified relational condition.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:vec_all(b [, c ] [, val ])
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
c |
int |
Optional argument that specifies the relational condition to test:
- 0 (default): not equal
- 1: greater than
- 2: less than
- 3: equal
- 4: greater than or equal to
- 5: less than or equal to
|
val |
double |
Optional argument that specifies the value that each element is compared against using the condition (c). Default is 0. |
Returns
Data Type |
Description |
boolean |
Returns true if all elements are non-zero or satisfy the condition and false if not |
vec_any
Evaluates whether any elements in a row or column vector are non-zero or satisfy the specified relational condition.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:vec_any(b [, c ] [, val ])
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
c |
int |
Optional argument that specifies the relational condition to test:
- 0 (default): not equal
- 1: greater than
- 2: less than
- 3: equal
- 4: greater than or equal to
- 5: less than or equal to
|
val |
double |
Optional argument that specifies the value that each element is compared against using the condition (c). Default is 0. |
Returns
Data Type |
Description |
boolean |
Returns true if any elements are non-zero or satisfy the condition and false if not |
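The condition codes and the default behavior fit together: with the defaults c = 0 (not equal) and val = 0, the test "element not equal to 0" is exactly the non-zero check. The Python sketch below models vec_all and vec_any on that basis. It is not AnzoGraph DB code, and a flat Python list is used as a hypothetical stand-in for a row or column vector.

```python
import operator

# Illustrative model only; not AnzoGraph DB code.
# Condition codes follow the parameter table above.
_CONDITIONS = {
    0: operator.ne,  # not equal (default)
    1: operator.gt,  # greater than
    2: operator.lt,  # less than
    3: operator.eq,  # equal
    4: operator.ge,  # greater than or equal to
    5: operator.le,  # less than or equal to
}

def vec_all(v, c=0, val=0.0):
    """True if every element satisfies `element <condition c> val`."""
    test = _CONDITIONS[c]
    return all(test(x, val) for x in v)

def vec_any(v, c=0, val=0.0):
    """True if at least one element satisfies `element <condition c> val`."""
    test = _CONDITIONS[c]
    return any(test(x, val) for x in v)

print(vec_all([1, 2, 3]))          # True: all elements are non-zero
print(vec_all([1, 0, 3]))          # False: one element is zero
print(vec_any([0, 0, 3]))          # True: one element is non-zero
print(vec_all([1, 2, 3], 1, 0.5))  # True: all elements are greater than 0.5
print(vec_any([1, 2, 3], 2, 1))    # False: no element is less than 1
```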
mat_all
Evaluates, for each row or column of a matrix, whether all elements are non-zero or satisfy the specified relational condition.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:mat_all(b [, d ] [, c ] [, val ])
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
d |
boolean |
Optional argument that indicates whether to check rows or columns. Set to true for rows or false for columns. Default is false. |
c |
int |
Optional argument that specifies the relational condition to test:
- 0 (default): not equal
- 1: greater than
- 2: less than
- 3: equal
- 4: greater than or equal to
- 5: less than or equal to
|
val |
double |
Optional argument that specifies the value that each element is compared against using the condition (c). Default is 0. |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a row vector with each element as 0 or 1, indicating whether all elements in the corresponding row or column are non-zero or satisfy the condition |
mat_any
Evaluates, for each row or column of a matrix, whether any elements are non-zero or satisfy the specified relational condition.
Syntax
PREFIX matrices: <http://cambridgesemantics.com/anzograph/matrices#>
matrices:mat_any(b [, d ] [, c ] [, val ])
Parameter |
Data Type |
Description |
b |
http://anzograph.com/matrices#tensor |
A tensor |
d |
boolean |
Optional argument that indicates whether to check rows or columns. Set to true for rows or false for columns. Default is false. |
c |
int |
Optional argument that specifies the relational condition to test:
- 0 (default): not equal
- 1: greater than
- 2: less than
- 3: equal
- 4: greater than or equal to
- 5: less than or equal to
|
val |
double |
Optional argument that specifies the value that each element is compared against using the condition (c). Default is 0. |
Returns
Data Type |
Description |
http://anzograph.com/matrices#tensor |
Tensor representation of a row vector with each element as 0 or 1, indicating whether any elements in the corresponding row or column are non-zero or satisfy the condition |
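Unlike vec_all and vec_any, the matrix variants return one 0 or 1 flag per row or column instead of a single boolean. The Python sketch below models that behavior. It is not AnzoGraph DB code: a list of rows stands in for the tensor, a flat list stands in for the returned row vector, and it assumes that d = true produces one flag per row and d = false produces one flag per column.

```python
import operator

# Illustrative model only; not AnzoGraph DB code.
# Condition codes follow the parameter tables above.
_CONDITIONS = {
    0: operator.ne, 1: operator.gt, 2: operator.lt,
    3: operator.eq, 4: operator.ge, 5: operator.le,
}

def _lines(m, d):
    """Assumption: d=True yields the rows, d=False yields the columns."""
    return m if d else [list(col) for col in zip(*m)]

def mat_all(m, d=False, c=0, val=0.0):
    """One flag per row or column: 1 if every element passes the test."""
    test = _CONDITIONS[c]
    return [int(all(test(x, val) for x in line)) for line in _lines(m, d)]

def mat_any(m, d=False, c=0, val=0.0):
    """One flag per row or column: 1 if at least one element passes the test."""
    test = _CONDITIONS[c]
    return [int(any(test(x, val) for x in line)) for line in _lines(m, d)]

m = [[1, 0],
     [2, 3]]

print(mat_all(m))                 # [1, 0]: only the first column is all non-zero
print(mat_all(m, True))           # [0, 1]: only the second row is all non-zero
print(mat_any(m))                 # [1, 1]: both columns contain a non-zero element
print(mat_any(m, False, 1, 2.5))  # [0, 1]: only the second column has a value above 2.5
```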