Additional Data Science Functions (Preview)

AnzoGraph offers an additional PREVIEW package of pre-built data science functions that you can use in the same way as other native, built-in analytic functions. In addition, Cambridge Semantics offers an Apache Zeppelin Docker image, which includes a collection of notebooks that describe each of the AnzoGraph Data Science functions and show example usage. The Docker image also includes a custom SPARQL interpreter, which lets you connect securely to AnzoGraph and either run the queries in the notebooks or write your own queries against AnzoGraph data. See Zeppelin Notebook Integration for more information on installing the custom Apache Zeppelin Docker image.

This additional collection of Preview functions is subject to change based on feedback from users and, in particular, Data Science professionals who seek improvement or changes to individual functions, their signatures, or their operation.

The additional data science functions are organized into the following categories:

Correlation – determine the relationship between different elements.
Distribution – calculate the probability of a given X value over a random distribution.
Entropy – determine variance and probability density across a given distribution.
Feature Exploration – classify values in a distribution using techniques such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), or Singular Value Decomposition (SVD).
Linear Algebra – create product vectors or a matrix from a given collection of random variables.
Profiling – produce different statistical metrics such as percentile, geometric mean, or skew on a given data population.
Sketching – estimate or determine frequency of items in a data distribution.
Utility – return information on various attributes of vector space mapping and related matrix tensors.

The topics in this section provide details about each of the additional data science functions that are available.

Category/Function	Description
Correlation:
Canonical Correlation (CANCOR)	Calculates the overall correlation between two sets of variables.
Covariance (COVARIANCE)	Provides a measure of the strength of the correlation between two or more sets of random variables (or variates).
Matthews Correlation Coefficient (MCC)	Provides a measure of the quality of binary classifications of a condition with observed versus predicted scoring.
Pearson Correlation Coefficient (PCC)	Determines the extent to which two variables are linearly related: positive, negative, or no relationship.
Spearman Correlation Coefficient (SCC)	Determines how well the relationship between two variables can be described using a monotonic function.
Distribution:
Cumulative Distribution Function (CDF)	Calculates the probability of a random variable X taking on a value less than or equal to Y. Various other AnzoGraph distribution functions provide this calculation. Refer to the description and signatures of the following functions that can produce cumulative distribution calculations: Binomial Distribution (BINOMDIST), Chi-Squared Distribution (CHISQDIST), Continuous Uniform Distribution (CONUNIDIST), Discrete Uniform Distribution (DISCUNIDIST), Exponential Distribution (EXPDIST), Laplace Distribution (LAPLACEDIST), Log Normal Distribution (LOGNORDIST), Negative Binomial Distribution (NEGBINDIST), Normal Distribution (NORMDIST), Poisson Distribution (POISDIST), Student's T-Distribution (TDIST), Weibull Distribution (WEIBULDIST), TDigest Metric (TDIGEST)
Bernoulli Distribution (BERNDIST)	Determines the probability of a specific event occurring, or not occurring, in tests that have only two possible outcomes (1 - Success or 0 - Failure).
Beta-Binomial Distribution (BETABINDIST)	Computes probability using a combination of both binomial and beta probability distributions.
Binomial Distribution (BINOMDIST)	Calculates the probability for X successes in N trials given a probability of success P for each trial.
Chi-Squared Distribution (CHISQDIST)	Calculates probability often used in hypothesis testing to compare an observed distribution with a theoretical one. Also provides a way to show a relationship between two categorical variables.
Continuous Uniform Distribution (CONUNIDIST)	Calculates probability using continuous probability distribution concerned with events that are equally likely to occur.
Discrete Uniform Distribution (DISCUNIDIST)	Calculates probability using symmetric probability distribution where a finite number of values are equally likely to be observed and every one of n values has equal probability.
Exponential Distribution (EXPDIST)	Calculates probability using a distribution that describes time between events in a Poisson point process (where events occur continuously and independently at a constant average rate).
Hypergeometric Distribution (HYPGEODIST)	Calculates probability from a distribution that is often used to predict the outcome of a process in which different elements are randomly drawn from a collection and not replaced.
Laplace Distribution (LAPLACEDIST)	Calculates probability using a distribution that represents differences between two independent variables that have identical exponential distributions (also called double exponential distribution).
Log Normal Distribution (LOGNORDIST)	Calculates probability using a distribution of a random variable whose logarithm follows a normal distribution. Log normal distributions are widely used in risk analysis.
Logarithmic (Series) Distribution (LOGSERDIST)	Calculates probability using a discrete probability distribution derived from the Maclaurin series expansion.
Negative Binomial Distribution (NEGBINDIST)	Calculates probability using a discrete probability distribution that concerns the number of trials which must occur in order to have a predetermined number of successes.
Normal Distribution (NORMDIST)	Calculates probability using a continuous probability distribution of data in which the majority of data points are relatively similar, within a small range of values having few outliers.
Poisson Distribution (POISDIST)	Calculates probability using a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space and those events occur with a known constant rate and occur independently of the time since the last event.
Skellam Distribution (SKELLAMDIST)	Calculates probability using the Skellam distribution which models the difference between two independent Poisson distributed variables.
Student's T-Distribution (TDIST)	Calculates probability using the Student's t-distribution and associated t scores. Often used in hypothesis testing when the sample size is small and/or when the population variance is unknown.
Weibull Distribution (WEIBULDIST)	Calculates probability from a continuous probability distribution that is commonly used to assess product reliability, analyze product life data and failure times.
Entropy:
Cross Entropy (CROSSENTROPY)	Computes cross-entropy, which is commonly used to quantify the difference between two probability distributions.
Discrete Entropy Metric (DISCENTROPY)	Calculates discrete entropy for maps on finite sets.
Differential Entropy or Continuous Entropy Metric	Computes differential entropy (also referred to as continuous entropy), which is entropy defined for distributions with a continuous random variable. Various other AnzoGraph distribution functions can provide this calculation. Refer to the description and signatures of the following functions that can produce entropy calculations: Normal Distribution (NORMDIST), Log Normal Distribution (LOGNORDIST), Exponential Distribution (EXPDIST), Discrete Uniform Distribution (DISCUNIDIST), Continuous Uniform Distribution (CONUNIDIST), Laplace Distribution (LAPLACEDIST), Weibull Distribution (WEIBULDIST)
Feature Exploration:
Principal Component Analysis (PCA)	Reduces a high-dimensional dataset into fewer dimensions while retaining important information, which makes it easier to explore and visualize data.
Singular Value Decomposition (SVD)	Similar to PCA, except that the factorization for SVD is done on the data matrix, whereas the factorization is done on the covariance matrix with PCA.
Linear Discriminant Analysis (LDA)	Uses dimensionality reduction and classifier to make predictions.
Linear Algebra:
Gramian Matrix (GRAMIAN)	Creates a Gramian matrix commonly used to compute linear independence.
Profiling Metric:
Discrete Probability Metric	Calculates a discrete probability distribution of values. Various other AnzoGraph distribution functions provide a discrete probability metric. Refer to the description and signatures of the following functions that can produce discrete probability metrics: Binomial Distribution (BINOMDIST), Poisson Distribution (POISDIST), Negative Binomial Distribution (NEGBINDIST), Bernoulli Distribution (BERNDIST)
Geometric Mean Metric (GMEAN)	Calculates geometric mean, defined as the nth root of the product of n positive numbers.
Percentile Metric (PERCENTILE)	Calculates 1 to 100 percentile of numeric values.
Skew Metric (SKEWCOEFF)	Calculates Pearson’s coefficient of skewness on numeric values.
TDigest Metric (TDIGEST)	Creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.
Sketches:
Cardinality Metric (HLL)	Calculates cardinality estimates of a data set.
Frequent Items (FI)	Collection of function signatures used to create sketches and obtain the most frequent items from a stream of items.
Quantile/Rank Sketch (KLL)	Collection of signatures used to calculate the quantile/rank from a stream of items using the KLL sketch computation model.
Theta Sketch (THETA)	Collection of signatures used to perform estimates of set operations, Union, Intersection, and Difference, all using the Theta Sketch framework. There are several different function signatures available for Theta Sketch estimate calculations.
Miscellaneous:
Matrix Utilities	Collection of functions that return information on various attributes of vector space mapping and related matrix tensors.

The following sections provide additional detail on each available data science function (listed in alphabetical order) as well as the syntax or signature of each function call.

Bernoulli Distribution (BERNDIST)

The Bernoulli Distribution function determines the probability of success or failure (or Yes or No) in tests that have only two possible outcomes.

Wikipedia Reference: Bernoulli Distribution

The general signature for calling the Bernoulli Distribution function is the following:

prefix:berndist(data : String, prob : bool, SuccessIs : String)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input :
data	String	Column data.
prob	bool	Probability of success (true) or failure (false).
SuccessIs	String	Success string.
Output :
Probability	double	Bernoulli distribution probability value.

Beta-Binomial Distribution (BETABINDIST)

The Beta-Binomial Distribution function computes probability using a combination of both binomial and beta probability distributions.

Wikipedia Reference: Beta-Binomial Distribution

The general signature for calling the Beta-Binomial Distribution function is the following:

prefix:betabindist(k : double, n : long, alpha : double, beta : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
k	double	Number of occurrences for which to find the probability.
n	long	Number of trials.
alpha, beta	double	Shape parameters.
Output:
probability	double	The probability of occurrence k for a beta binomial n, alpha, beta.
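
As an illustration of what the returned probability represents, the following Python sketch evaluates the standard beta-binomial probability mass function for the same four inputs. It is not AnzoGraph's implementation; the helper names `log_beta` and `betabinom_pmf` are hypothetical and chosen only for this example.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, n, alpha, beta):
    """Textbook beta-binomial probability of k occurrences in n trials."""
    # log of the binomial coefficient C(n, k), written with lgamma
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose
               + log_beta(k + alpha, n - k + beta)
               - log_beta(alpha, beta))
```

With alpha = beta = 1 the distribution is uniform over 0..n, so every k has probability 1 / (n + 1), which is a convenient sanity check.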

Binomial Distribution (BINOMDIST)

The Binomial Distribution function calculates the probability for X successes in N trials given a probability of success P for each trial.

Wikipedia Reference: Binomial Distribution

The general signature for calling the Binomial Distribution function is the following:

prefix:binomdist(data : String, n : long, k : long, SuccessIs : String)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	String	Column data.
n	long	Number of trials.
k	long	Number of successes in n trials.
SuccessIs	String	Define success string among the column data.
Output:
probability	double	Probability value which is the value for probability mass function.
cdfLower	double	Cumulative distribution function which is probability (<=k) under the area of distribution.
cdfUpper	double	Cumulative distribution function which is probability (>k) under the area of distribution.
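
The three outputs can be related to each other with a short Python sketch. It assumes, for illustration only, that the per-trial success probability P is the fraction of column values equal to the success string; the documentation above does not state how AnzoGraph derives P, and the function names here are hypothetical.

```python
from math import comb


def success_rate(data, success_is):
    """Fraction of column values equal to the success string (an assumption)."""
    return sum(1 for value in data if value == success_is) / len(data)


def binomial_summary(p, n, k):
    """Return (probability, cdf_lower, cdf_upper) for k successes in n trials."""
    def pmf(i):
        return comb(n, i) * p ** i * (1 - p) ** (n - i)

    probability = pmf(k)                            # P(X = k)
    cdf_lower = sum(pmf(i) for i in range(k + 1))   # P(X <= k)
    cdf_upper = 1.0 - cdf_lower                     # P(X > k)
    return probability, cdf_lower, cdf_upper
```

For example, `binomial_summary(success_rate(["yes", "no", "yes", "no"], "yes"), 4, 2)` evaluates the case of 2 successes in 4 trials with P = 0.5.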

Canonical Correlation (CANCOR)

The Canonical Correlation function calculates the overall correlation between two sets of variables.

Wikipedia Reference: Canonical correlation function

The general signature for calling the Canonical Correlation function is the following:

prefix:cancor(lc : int, m : int, x1, x2,...,xm : double, y1, y2,...,yn : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/matrices#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
lc	int	Display linear combinations for only the first 'lc' canonical correlations.
m	int	Number of columns in first set.
x1, x2,...,xm	double	Feature columns in the first dataset.
y1, y2,...,yn	double	Feature columns in the second dataset.
Output:
CanonicalCorrelations	String	Canonical Correlation.
SquaredCanonicalCorrelations	String	Square of Canonical Correlation.
CanonicalCoefficients	String	Canonical Coefficient.

Cardinality Metric (HLL)

This function calculates cardinality estimates of a data set using the HyperLogLog (HLL) sketch from Apache DataSketches.

Wikipedia Reference: Cardinality Prominence Metric

The general signature for calling the Cardinality function is the following:

prefix:hll(data : Object, lgConfigK : int, TgtHllType : int)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/sketch#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	Object	Data set (input can be a mix of all types).
lgConfigK	int	Log-base-2 of K, where K is the number of buckets or slots for the sketch. This value must be between 4 and 21, inclusive. This parameter is optional; the default value is 12.
TgtHllType	int	Specifies the target type of HLL sketch to be created. Its value must be 4 (for HLL_4), 6 (HLL_6), or 8 (HLL_8). This parameter is optional; the default value is 4.
Output:
cardinality	double	Cardinality metric value of a data set.

Chi-Squared Distribution (CHISQDIST)

The Chi-Squared Distribution function calculates probability often used in hypothesis testing to compare an observed distribution with a theoretical one. It also provides a way to show a relationship between two categorical variables.

Wikipedia Reference: Chi-Squared Distribution

The general signature for calling the Chi-Squared Distribution function is the following:

prefix:chisqdist(data : double, S : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Sample data.
S	double	Population standard deviation.
Output:
mean	double	Mean of the distribution.
stdDev	double	Standard deviation of the distribution.
variance	double	Variance of the distribution.
chi-squareStatistic	double	[(n - 1) * s^2] / d^2, where d is the standard deviation of the population, s is the standard deviation of the sample, and n is the sample size.
count	long	Number of samples; the degrees of freedom (k) is count - 1.
pdf	double	Probability density function value.
cdf	double	Cumulative distribution function which is probability for <= chi-squareStatistic.
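
The chi-square statistic formula in the table can be worked through in a few lines of Python. This is a sketch of the arithmetic only, assuming that s is the sample standard deviation computed with an n - 1 denominator; the function name `chi_square_statistic` is hypothetical.

```python
from statistics import stdev


def chi_square_statistic(sample, population_std_dev):
    """Compute [(n - 1) * s^2] / d^2 for a list of sample values."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (n - 1) * s ** 2 / population_std_dev ** 2
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 and a population standard deviation of 2, the squared deviations from the mean sum to 32, so the statistic is 32 / 4 = 8 with 7 degrees of freedom.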

Continuous Entropy Metric

This function is also referred to as Differential Entropy. See Differential Entropy or Continuous Entropy Metric for function call parameters and details.

Continuous Uniform Distribution (CONUNIDIST)

This function calculates probability using a continuous probability distribution concerned with events that are equally likely to occur.

Wikipedia Reference: Continuous Uniform Distribution

The general signature for calling the Continuous Uniform Distribution function is the following:

prefix:conunidist(data : double, a : double, b : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Column data.
a	double	Minimum value of the probability interval.
b	double	Maximum value of the probability interval.
Output:
cdf	double	Cumulative distribution function which is probability under the area of distribution.
pdf	double	Probability density function value.
diffEntropy	double	Differential Entropy in nats.
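
For a single value x on the interval [a, b], the three outputs follow from closed-form expressions. The Python sketch below shows those textbook formulas; how AnzoGraph aggregates the column data is not described here, so the per-value function `continuous_uniform` is a hypothetical illustration.

```python
from math import log


def continuous_uniform(x, a, b):
    """Return (cdf, pdf, diff_entropy) for a uniform distribution on [a, b]."""
    width = b - a
    pdf = 1.0 / width if a <= x <= b else 0.0
    cdf = min(max((x - a) / width, 0.0), 1.0)  # clamp to [0, 1] outside the interval
    diff_entropy = log(width)                   # natural log, so the unit is nats
    return cdf, pdf, diff_entropy
```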

Covariance (COVARIANCE)

The Covariance function provides a measure of the strength of the correlation between two or more sets of random variables (or variates).

Wikipedia Reference: Covariance

The general signature for calling the Covariance function is the following:

prefix:covariance(x1 : double, x2 : double,...,xn : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/matrices#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
x1, x2,...,xn	double	Feature column datasets.
Output:
covariance_matrix	"http://anzograph.com/matrices#tensor"	Co-variance matrix.
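
To make the shape of the result concrete, the following Python sketch builds a covariance matrix from a list of feature columns. It assumes the sample covariance (n - 1 denominator), which this page does not confirm, and it returns a plain nested list rather than AnzoGraph's tensor type.

```python
def covariance_matrix(columns):
    """Sample covariance matrix for a list of equal-length feature columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [
            sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
            for cj, mj in zip(columns, means)
        ]
        for ci, mi in zip(columns, means)
    ]
```

Entry (i, j) is the covariance between column i and column j, so the matrix is symmetric and its diagonal holds each column's variance.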

Cross Entropy (CROSSENTROPY)

This function computes cross-entropy, which is commonly used to quantify the difference between two probability distributions.

Wikipedia Reference: Cross Entropy

The general signature for calling the Cross Entropy function is the following:

prefix:crossentropy(p : double, q : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
p	double	True probabilities for x.
q	double	Predicted probabilities for x.
Output:
cross_entropy	double	Cross entropy value.
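
The usual definition of cross-entropy is H(p, q) = -Σ p(x) · log q(x). The Python sketch below evaluates that sum for two aligned lists of probabilities; the natural logarithm is an assumption, since the log base that AnzoGraph uses is not stated here, and the function name is hypothetical.

```python
from math import log


def cross_entropy(p, q):
    """Cross-entropy of predicted probabilities q relative to true probabilities p."""
    # Terms with p(x) == 0 contribute nothing and are skipped.
    return -sum(pi * log(qi) for pi, qi in zip(p, q) if pi > 0)
```

When q equals p the result reduces to the entropy of p; the further q departs from p, the larger the value.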

Cumulative Distribution Function (CDF)

A Cumulative Distribution function calculates the probability of a random variable X taking on a value less than or equal to Y.

Wikipedia Reference: Cumulative Distribution Function (CDF)

Various other AnzoGraph distribution functions provide this calculation. Refer to the description and signatures of the following functions that can produce cumulative distribution calculations:

Binomial Distribution (BINOMDIST)
Chi-Squared Distribution (CHISQDIST)
Continuous Uniform Distribution (CONUNIDIST)
Discrete Uniform Distribution (DISCUNIDIST)
Exponential Distribution (EXPDIST)
Laplace Distribution (LAPLACEDIST)
Log Normal Distribution (LOGNORDIST)
Negative Binomial Distribution (NEGBINDIST)
Normal Distribution (NORMDIST)
Poisson Distribution (POISDIST)
Student's T-Distribution (TDIST)
Weibull Distribution (WEIBULDIST)
TDigest Metric (TDIGEST)

Differential Entropy or Continuous Entropy Metric

Differential entropy (also referred to as continuous entropy) is entropy that can be computed for distributions with a continuous random variable.

Wikipedia Reference: Differential entropy

Various other AnzoGraph distribution functions can provide this calculation. Refer to the description and signatures of the following functions that can produce entropy calculations:

Normal Distribution (NORMDIST)
Log Normal Distribution (LOGNORDIST)
Exponential Distribution (EXPDIST)
Discrete Uniform Distribution (DISCUNIDIST)
Continuous Uniform Distribution (CONUNIDIST)
Laplace Distribution (LAPLACEDIST)
Weibull Distribution (WEIBULDIST)

Discrete Entropy Metric (DISCENTROPY)

This function calculates entropy for maps on finite sets, referred to as discrete entropy.

ScienceDirect Reference: Discrete Entropy

The general signature for calling the Discrete Entropy Metric function is the following:

prefix:discentropy(data : String)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	String	Column data.
Output:
discrete_entropy	double	Discrete entropy value.
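
One common way to compute a discrete entropy for a column of string values is the Shannon entropy of the observed value frequencies. The Python sketch below takes that approach as an assumption; this page does not state the exact formula or log base that AnzoGraph applies, and the function name is hypothetical.

```python
from collections import Counter
from math import log


def discrete_entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * log(c / total) for c in counts.values())
```

A column with a single repeated value has entropy 0, and a column split evenly between two values has entropy ln 2.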

Discrete Probability Metric

This function calculates a discrete probability distribution of values.

Wikipedia Reference: Discrete Probability

Various other AnzoGraph distribution functions can provide a discrete probability metric. Refer to the description and signatures of the following functions that can produce discrete probability metrics:

Binomial Distribution (BINOMDIST)
Poisson Distribution (POISDIST)
Negative Binomial Distribution (NEGBINDIST)
Bernoulli Distribution (BERNDIST)

Discrete Uniform Distribution (DISCUNIDIST)

This function calculates probability using symmetric probability distribution where a finite number of values are equally likely to be observed and every one of n values has equal probability.

Wikipedia Reference: Discrete Uniform Distribution

The general signature for calling the Discrete Uniform Distribution function is the following:

prefix:discunidist(data : long, k : long)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	long	Column data.
k	long	Value k for which to find the CDF (finite number of outcomes).
Output:
cdf	double	Cumulative distribution function which is probability under the area of distribution.
pdf	double	Probability density function value.
diffEntropy	double	Differential Entropy in nats.

Exponential Distribution (EXPDIST)

The Exponential Distribution function calculates probability using a distribution that describes time between events in a Poisson point process (where events occur continuously and independently at a constant average rate).

Wikipedia Reference: Exponential Distribution

The general signature for calling the Exponential Distribution function is the following:

prefix:expdist(data : long, x : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	long	Column data.
x	double	Interval value for which to find the probability.
Output:
cdfLower	double	Cumulative distribution function which is probability (<=x) under the area of distribution.
cdfUpper	double	Cumulative distribution function which is probability (>x) under the area of distribution.
pdf	double	Probability density function value.
diffEntropy	double	Differential Entropy in nats.
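
The four outputs correspond to closed-form expressions of the exponential distribution. The Python sketch below evaluates them for an explicit rate parameter; how AnzoGraph derives the rate from the column data is not stated on this page, so `rate` is passed in directly and the function name is hypothetical.

```python
from math import exp, log


def exponential_summary(rate, x):
    """Return (cdf_lower, cdf_upper, pdf, diff_entropy) for rate > 0 and x >= 0."""
    survival = exp(-rate * x)
    cdf_lower = 1.0 - survival      # P(X <= x)
    cdf_upper = survival            # P(X > x)
    pdf = rate * survival           # density at x
    diff_entropy = 1.0 - log(rate)  # natural log, so the unit is nats
    return cdf_lower, cdf_upper, pdf, diff_entropy
```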

Frequent Items (FI)

This function is used to create sketches and obtain the most frequent items from a stream of items. There are several different function signatures available for frequent items discovery.

Reference: Frequent Items

The prefix shown in the function signatures below points to the URI <http://cambridgesemantics.com/anzograph/sketch#> location of the AnzoGraph data science functions.

fi – Creates frequent items sketches.

prefix:fi(val : Object, weight : long)

Parameter	Data Type	Description
Input:
val	Object	Data set (supporting short, int, long, float, double and string).
weight	long	Weight of the corresponding 'val'. This parameter is optional; the default value is 1.
Output:
fi_sketch	"http://anzograph.com/statistics#fi_sketch"	Binary stream containing sketch data type and its frequency sketch.

fi::get_estimates – Gets the estimate for the frequency, lower and upper bound of the given item.

prefix:fi::get_estimates(fi_sketch : "http://anzograph.com/statistics#fi_sketch",
  item : Object)

Parameter	Data Type	Description
Input:
fi_sketch	"http://anzograph.com/statistics#fi_sketch"	Binary stream containing sketch data type and its frequency sketch.
item	Object	Data item whose frequency is to be estimated.
Output:
frequency	long	Frequency estimates of the given item.
lower_bound	long	Lower bound frequency estimates of the given item.
upper_bound	long	Upper bound frequency estimates of the given item.

fi::get_active_items_total_weights – Gets the number of active items in the sketch and the estimated total stream weight.

prefix:fi::get_active_items_total_weights(fi_sketch : "http://anzograph.com/statistics#fi_sketch")

Parameter	Data Type	Description
Input:
fi_sketch	"http://anzograph.com/statistics#fi_sketch"	Binary stream containing sketch data type and its frequency sketch.
Output:
num_active_items	long	The number of active items in the sketch.
total_weights	long	The estimated total stream weight.

fi::get_top_items – Get top frequent items and their corresponding frequency.

prefix:fi::get_top_items(fi_sketch : "http://anzograph.com/statistics#fi_sketch")

Parameter	Data Type	Description
Input:
fi_sketch	"http://anzograph.com/statistics#fi_sketch"	Binary stream containing sketch data type and its frequency sketch.
Output:
Item1	double	Item with the highest frequency.
Item1_frequency	long	Frequency estimate of the first item.
Item2	double	Item with the second highest frequency.
Item2_frequency	long	Frequency estimate of the second item.
...Item5	double	Item with the 5th highest frequency.
...Item5_frequency	long	Frequency estimate of the 5th item.

fi::get_top_strings – Get top frequent strings and their corresponding frequency.

prefix:fi::get_top_strings(fi_sketch : "http://anzograph.com/statistics#fi_sketch")

Parameter	Data Type	Description
Input:
fi_sketch	"http://anzograph.com/statistics#fi_sketch"	Binary stream containing sketch data type and its frequency sketch.
Output:
Item1	string	String with the highest frequency.
Item1_frequency	long	Frequency estimate of the first string.
Item2	string	String with the second highest frequency.
Item2_frequency	long	Frequency estimate of the second string.
...Item5	string	String with the 5th highest frequency.
...Item5_frequency	long	Frequency estimate of the 5th string.

Geometric Mean Metric (GMEAN)

This function calculates geometric mean, defined as the nth root of the product of n positive numbers.

Wikipedia Reference: Geometric Mean

The general signature for calling the Geometric Mean function is the following:

prefix:gmean(data : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Column data
Output:
geometric_mean	double	Geometric mean value
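
The definition above, the nth root of the product of n positive numbers, can be computed stably by averaging logarithms. The Python sketch below is an illustration of the formula, not AnzoGraph's implementation, and the function name is hypothetical.

```python
from math import exp, log


def geometric_mean(values):
    """nth root of the product of n positive numbers, computed via logs."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive numbers")
    # exp(mean(log(v))) equals the nth root of the product without overflow.
    return exp(sum(log(v) for v in values) / len(values))
```

For the values 1, 3, and 9 the product is 27, and its cube root is 3.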

Gramian Matrix (GRAMIAN)

This function creates a Gramian matrix commonly used to compute linear independence.

Wikipedia Reference: Gramian Matrix

The general signature for calling the Gramian Matrix function is the following:

prefix:gramian(x1 : double, x2 : double,...,xn : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/matrices#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
x1, x2,...,xn	double	Feature column data sets.
Output:
gramian_matrix	"http://anzograph.com/matrices#tensor"	Gramian matrix.
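
A Gramian matrix holds the inner product of every pair of vectors. The Python sketch below treats each feature column as one vector, which is an assumption about how the inputs are arranged, and returns a nested list rather than AnzoGraph's tensor type.

```python
def gramian(columns):
    """Gram matrix G where G[i][j] is the dot product of columns i and j."""
    return [
        [sum(a * b for a, b in zip(ci, cj)) for cj in columns]
        for ci in columns
    ]
```

The columns are linearly independent exactly when the determinant of this matrix is nonzero, which is why the Gramian is used for independence checks.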

Hypergeometric Distribution (HYPGEODIST)

The Hypergeometric Distribution function calculates probability from a distribution often used to predict the outcome of a process in which different elements are randomly drawn from a collection and not replaced.

Wikipedia Reference: HyperGeometric Distribution

The general signature for calling the Hypergeometric Distribution function is the following:

prefix:hypgeodist(data : String, n : int, k : int, SuccessIs : String)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	String	Column data.
n	int	Number of trials.
k	int	Number of successes in n trials.
SuccessIs	String	Success string.
Output:
probability	double	Hypergeometric distribution probability value.

Laplace Distribution (LAPLACEDIST)

The Laplace Distribution function calculates probability using a distribution that represents differences between two independent variables that have identical exponential distributions (also called double exponential distribution).

Wikipedia Reference: Laplace Distribution

The general signature for calling the Laplace Distribution function is the following:

prefix:laplacedist(data : double, c : String, x1 : double, x2 : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Column data.
c	String	User choice: 'below'; 'above'; 'bet'(Between); 'out'(Outside).
x1	double	Lower number x1 to find the probability.
x2	double	Upper number x2 to find the probability.
Output:
mean	double	Mean of the distribution.
scaleParam	double	Scale parameter of the distribution.
stdDev	double	Standard deviation of the distribution.
variance	double	Variance of the distribution.
diffEntropy	double	Differential Entropy in nats.
cdf	double	Cumulative distribution function which is probability under the area of distribution.
pdfLower	double	Probability density function value for x1.
pdfUpper	double	Probability density function value for x2.
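
Several of these outputs have simple closed forms once the location (mean) and scale parameter are known. The Python sketch below shows the textbook expressions for a Laplace distribution with location `mu` and scale `b`; how AnzoGraph estimates those parameters from the column data, and how the choice `c` combines x1 and x2, is not specified here, so the 'bet' calculation in `laplace_between` is an assumed reading. All function names are hypothetical.

```python
from math import exp, log


def laplace_pdf(x, mu, b):
    """Probability density at x."""
    return exp(-abs(x - mu) / b) / (2.0 * b)


def laplace_cdf(x, mu, b):
    """P(X <= x)."""
    if x < mu:
        return 0.5 * exp((x - mu) / b)
    return 1.0 - 0.5 * exp(-(x - mu) / b)


def laplace_variance(b):
    """Variance is 2 * b^2; the standard deviation is its square root."""
    return 2.0 * b * b


def laplace_diff_entropy(b):
    """Differential entropy in nats: 1 + ln(2b)."""
    return 1.0 + log(2.0 * b)


def laplace_between(x1, x2, mu, b):
    """Probability that X falls between x1 and x2 (assumed meaning of 'bet')."""
    return laplace_cdf(x2, mu, b) - laplace_cdf(x1, mu, b)
```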

Linear Discriminant Analysis (LDA)

This function applies linear discriminant analysis (LDA) to create combined eigen values and vectors that characterize or separate two or more classes of objects or events.

Wikipedia Reference: Linear Discriminant Analysis

There are several different function signatures available for Linear Discriminant analysis.

The prefix shown in the function signatures below points to the URI <http://cambridgesemantics.com/anzograph/matrices#> location of the AnzoGraph LDA data science functions.

lda::create – Apply Linear Discriminant Analysis (LDA) to create combined eigenvalues and eigenvectors.

prefix:lda::create(y : double, x1 : double, x2 : double,...,xn : double)

Parameter	Data Type	Description
Input:
y	double	Class of feature tuple.
x1, x2,...,xn	double	Feature column data sets.
Output:
eigen_values_vectors_mean	"http://anzograph.com/matrices#lda_result"	Combined eigenvalues, eigenvectors, class mean, count and class map.

lda::get_eigvec – Get LDA's eigen vectors as a matrix from LDA data.

prefix:lda::get_eigvec(lda_data : "http://anzograph.com/matrices#lda_result")

Parameter	Data Type	Description
Input:
lda_data	"http://anzograph.com/matrices#lda_result"	Linear Discriminant Analysis data.
Output:
eigen_vectors	"http://anzograph.com/matrices#tensor"	Eigen vectors as a matrix.

lda::get_eigval – Get LDA's eigen values as a column vector from LDA data.

prefix:lda::get_eigval(lda_data : "http://anzograph.com/matrices#lda_result")

Parameter	Data Type	Description
Input:
lda_data	"http://anzograph.com/matrices#lda_result"	LDA data.
Output:
eigen_values	"http://anzograph.com/matrices#tensor"	Eigen values in the descending order as column vector.

lda::transform – Apply Linear Discriminant Analysis (LDA) to transform the samples onto the new subspace.

prefix:lda::transform(lda_data : "http://anzograph.com/matrices#lda_result",
  d : int, x1 : double, x2 : double,...,xn : double)

Parameter	Data Type	Description
Input:
lda_data	"http://anzograph.com/matrices#lda_result"	LDA data.
d	int	Number of eigen vectors to consider from the start.
x1, x2,...,xn	double	Feature column data sets.
Output:
transformed_data	double	Original data transformed into the tuple of lower dimensional space.

dump_tensor – Get string representation of vector or matrix in row-wise/column-wise order.

prefix:dump_tensor(m : "http://anzograph.com/matrices#tensor",
  type : int, isRowWize : Boolean)

Parameter	Data Type	Description
Input:
m	"http://anzograph.com/matrices#tensor"	A tensor of matrix/row vector/column vector.
type	Int	Type of tensor: 0-Row vector, 1-Column Vector, 2-Matrix. Optional, default is 2.
isRowWize	Boolean	False if the display matrix is column-wise. Optional, default is true.
Output:
dump	String	String representation of vector or matrix in row-wise/column-wise.

lda::predict – Predict the class for the samples using Linear Discriminant Analysis (LDA) as a classifier.

prefix:lda::predict(lda_data : "http://anzograph.com/matrices#lda_result",
   p1 : double, p2 : double,...,pn : double)

Parameter	Data Type	Description
Input:
lda_data	"http://anzograph.com/matrices#lda_result"	LDA data.
p1, p2,...,pn	double	Data sample whose class to predict.
Output:
class_name	String	Class name to which data tuple belongs.

lda::get_raw_eigval – Get LDA's unsorted eigen values from LDA data.

prefix:lda::get_raw_eigval(lda_data : "http://anzograph.com/matrices#lda_result")

Parameter	Data Type	Description
Input:
lda_data	"http://anzograph.com/matrices#lda_result"	LDA data.
Output:
eigen_values	"http://anzograph.com/matrices#tensor"	Eigen values in unsorted order as a column vector.

Log Normal Distribution (LOGNORDIST)

This function calculates probability using the distribution of a random variable whose logarithm follows a normal distribution. The log normal distribution is widely used in risk analysis.

Wikipedia Reference: Log Normal Distribution

The general signature for calling the Log Normal Distribution function is the following:

prefix:lognordist(data : double, c : String, x1 : double, x2 : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Column data.
c	String	User choice: 'below', 'above', 'bet'(Between), 'out'(Outside), where user can choose probability below, above, between or outside.
x1	double	Lower number x1(>0) to find the probability.
x2	double	Upper number x2(>0) to find the probability.
Output:
mean	double	Mean of the distribution of the natural logarithms.
stdDev	double	Standard deviation of the distribution of the natural logarithms.
variance	double	Variance of the distribution.
diffEntropy	double	Differential Entropy in nats.
cdf	double	Cumulative distribution function which is probability under the area of distribution.
pdfLower	double	Probability density function value for x1.
pdfUpper	double	Probability density function value for x2.
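
As a rough cross-check of the mean, stdDev, and cdf outputs described above, the following plain-Python sketch fits a log normal distribution to positive samples and evaluates the four probability choices. The name lognormal_summary and the use of the population variance are assumptions made for illustration; this is not AnzoGraph's implementation.

```python
import math

def lognormal_summary(data, choice, x1, x2=None):
    """Fit a log normal distribution to positive samples and return a probability."""
    logs = [math.log(v) for v in data]  # every sample must be > 0
    n = len(logs)
    mean = sum(logs) / n
    # Population variance of the logarithms (assumption; n - 1 is also plausible).
    variance = sum((v - mean) ** 2 for v in logs) / n
    std_dev = math.sqrt(variance)

    def cdf(x):
        # P(X <= x) for a log normal variable, written with the error function.
        z = (math.log(x) - mean) / (std_dev * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))

    if choice == "below":
        prob = cdf(x1)
    elif choice == "above":
        prob = 1.0 - cdf(x1)
    elif choice == "bet":
        prob = cdf(x2) - cdf(x1)
    elif choice == "out":
        prob = 1.0 - (cdf(x2) - cdf(x1))
    else:
        raise ValueError("choice must be 'below', 'above', 'bet', or 'out'")
    return {"mean": mean, "stdDev": std_dev, "cdf": prob}
```

The 'bet' and 'out' choices are complements of each other, so their probabilities for the same x1 and x2 sum to 1.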

Logarithmic (Series) Distribution (LOGSERDIST)

This function calculates probability using a discrete probability distribution derived from the Maclaurin series expansion of -ln(1 - p).

Wikipedia Reference: Logarithmic (Series) Distribution

The general signature for calling the Logarithmic (Series) Distribution function is the following:

prefix:logserdist(data : String, k : long, SuccessIs : String)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	String	Column data.
k	long	Value k for which to find the probability.
SuccessIs	String	String value in the column data that counts as a success.
Output:
probability	double	Logarithmic distribution probability value.
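
The probability mass function of the logarithmic (series) distribution can be sketched in plain Python as shown below. The helper names are hypothetical, and estimating p as the fraction of rows equal to the success string is an assumption about how the column data is used, not documented behavior.

```python
import math

def success_fraction(data, success_is):
    """Fraction of rows equal to the success string (assumed way to estimate p)."""
    return sum(1 for v in data if v == success_is) / len(data)

def logser_pmf(k, p):
    """P(X = k) for the logarithmic (series) distribution, k = 1, 2, 3, ..."""
    if k < 1 or not 0.0 < p < 1.0:
        raise ValueError("need k >= 1 and 0 < p < 1")
    return -1.0 / math.log(1.0 - p) * p ** k / k
```

Because the terms p^k / k sum to -ln(1 - p), the probabilities over all k add up to 1.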

Matrix Utilities

This collection of functions returns information on various attributes of vector space mapping and related matrix tensors.

There are several different matrix utility functions available.

The prefix shown in the function signatures below points to the URI <http://cambridgesemantics.com/anzograph/matrices#> location of the AnzoGraph matrix utility functions.

get_rows – Get number of rows present in tensor.

```
prefix:get_rows(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
n	long	Number of rows.

get_cols – Get number of columns present in tensor.

```
prefix:get_cols(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
n	long	Number of columns.

get_slices – Get number of slices present in tensor.

```
prefix:get_slices(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
n	long	Number of slices.

get_order – Get tensor order.

```
prefix:get_order(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
n	long	Tensor order.

get_total_elem – Get total number of elements present in tensor.

```
prefix:get_total_elem(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
n	long	Total number of elements.

get_nonzero – Get number of non-zero elements present in sparse matrix.

```
prefix:get_nonzero(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
n	long	Number of non-zero elements present in sparse matrix.

get_elem – Access the individual element stored in tensor.

prefix:get_elem(b : "http://anzograph.com/matrices#tensor",
   i : long, j : long, k : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
i	long	Element stored at ith row.
j	long	Element stored at jth column; optional parameter.
k	long	Element stored at kth slice; optional parameter.
Output:
v	double	Element value.

dump_tensor – Display the Armadillo header and the first few elements of the matrix or vector data as a string.

prefix:dump_tensor(b : "http://anzograph.com/matrices#tensor", type : int, isRowWise : boolean)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A matrix/row vector/column vector.
type	int	Type of tensor: 0 - Row vector; 1 - Column vector; 2 - Matrix. This parameter is optional; the default value is 2.
isRowWise	boolean	False to display the matrix column-wise. This parameter is optional; the default value is true.
Output:
s	String	Row-wise or column-wise string representation of vector or matrix.

make_matrix – Create a matrix of doubles with the given dimensions and values.

prefix:make_matrix(m : int, n : int, v : double, ...)

Parameter	Data Type	Description
Input:
m	int	The number of rows in the new matrix.
n	int	The number of columns in the new matrix.
v	double	Matrix elements to fill in row-wise; optional repeatable parameter. The default value is 0 for all elements.
Output:
b	"http://anzograph.com/matrices#tensor"	Tensor representation for m x n element matrix of doubles.

subview_col – Extract a column from matrix or sparse matrix.

```
prefix:subview_col(b : "http://anzograph.com/matrices#tensor", n : long)
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
n	long	Column index.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of column vector.

subview_row – Extract a row from matrix or sparse matrix.

```
prefix:subview_row(b : "http://anzograph.com/matrices#tensor", n : long)
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
n	long	Row index.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of row vector.

dump_vec – Display the row or column vector data as a string.

```
prefix:dump_vec(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	Row or column vector.
Output:
s	String	String representation of row or column vector.

subview_rows – Extract a range of rows from matrix or sparse matrix.

prefix:subview_rows(b : "http://anzograph.com/matrices#tensor",
   r1 : long, ... rn : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
r1, r2,...,rn	long	Start row index (inclusive) to end row index (inclusive).
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of matrix with rows from r1 to rn.

subview_cols – Extract a range of columns from matrix or sparse matrix.

prefix:subview_cols(b : "http://anzograph.com/matrices#tensor", c1 : long,... cn : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
c1, c2,...,cn	long	Start column index (inclusive) to end column index (inclusive).
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of matrix with columns from c1 to cn.

subview_mat – Extract a submatrix from matrix or sparse matrix.

prefix:subview_mat(b : "http://anzograph.com/matrices#tensor",
   r1 : long, c1 : long,... rn : long, cn : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
r1, r2,...,rn	long	Start row index (inclusive) to end row index (inclusive).
c1, c2,...,cn	long	Start column index (inclusive) to end column index (inclusive).
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of matrix of [1+(rn-r1)] x [1+(cn-c1)] size.
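
The inclusive index ranges and the resulting [1+(rn-r1)] x [1+(cn-c1)] size can be illustrated with a small plain-Python sketch over nested lists. It assumes zero-based indexes and takes only the start and end index of each range; it models the semantics only and is not the AnzoGraph function.

```python
def subview_mat(matrix, r1, c1, rn, cn):
    """Return rows r1..rn and columns c1..cn of a nested-list matrix, ends inclusive."""
    # Python slices exclude the end index, so add 1 to keep rn and cn inclusive.
    return [row[c1:cn + 1] for row in matrix[r1:rn + 1]]

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
sub = subview_mat(m, 0, 1, 1, 2)  # rows 0..1, columns 1..2
```

Here sub has 1 + (1 - 0) = 2 rows and 1 + (2 - 1) = 2 columns.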

subview_head_rows – Extract starting rows from matrix or sparse matrix.

prefix:subview_head_rows(b : "http://anzograph.com/matrices#tensor",
   n : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
n	long	Number of rows from the start.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of matrix with rows from 0 to n-1.

subview_head_cols – Extract starting columns from matrix or sparse matrix.

prefix:subview_head_cols(b : "http://anzograph.com/matrices#tensor",
   n : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
n	long	Number of columns from the start.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of matrix with columns from 0 to n-1.

subview_tail_rows – Extract trailing rows from matrix or sparse matrix.

prefix:subview_tail_rows(b : "http://anzograph.com/matrices#tensor",
  n : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
n	long	Number of rows from the tail.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of matrix with n rows from tail.

subview_tail_cols – Extract trailing columns from matrix or sparse matrix.

prefix:subview_tail_cols(b : "http://anzograph.com/matrices#tensor",
   n : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
n	long	Number of columns from the tail.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of matrix with n columns from tail.

get_subvec – Extract range of elements from a row or column vector.

prefix:get_subvec(b : "http://anzograph.com/matrices#tensor",
   i : long, j : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
i	long	Start index.
j	long	End index.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of row or column vector.

subvec_head – Extract starting elements from a row or column vector.

prefix:subvec_head(b : "http://anzograph.com/matrices#tensor",
   n : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
n	long	Number of elements from the start.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of row or column vector having elements from the start.

subvec_tail – Extract trailing elements from a row or column vector.

prefix:subvec_tail(b : "http://anzograph.com/matrices#tensor",
   n : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
n	long	Number of elements from the tail.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of row or column vectors having elements from the tail.

get_diag – Extract a diagonal from matrix or sparse matrix.

prefix:get_diag(b : "http://anzograph.com/matrices#tensor",
   k : long)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
k	long	Optional diagonal number parameter; by default, the main diagonal is accessed (k=0). For k > 0, the kth super-diagonal is accessed (top-right corner). For k < 0, the kth sub-diagonal is accessed (bottom-left corner).
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of the diagonal as a column vector.
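
The meaning of the diagonal number k can be shown with a short plain-Python sketch over nested lists. Zero-based indexing is an assumption here, and the code only illustrates which elements each k selects.

```python
def get_diag(matrix, k=0):
    """Return the k-th diagonal of a nested-list matrix as a flat list."""
    rows, cols = len(matrix), len(matrix[0])
    if k >= 0:
        # The k-th super-diagonal starts at row 0, column k.
        return [matrix[i][i + k] for i in range(min(rows, cols - k))]
    # The k-th sub-diagonal starts at row -k, column 0.
    return [matrix[i - k][i] for i in range(min(rows + k, cols))]

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
main_diag = get_diag(m)       # k = 0
super_diag = get_diag(m, 1)   # toward the top-right corner
sub_diag = get_diag(m, -1)    # toward the bottom-left corner
```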

flatten_as_col – Get a flattened version of the matrix as a column vector.

prefix:flatten_as_col(b : "http://anzograph.com/matrices#tensor")

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of a flattened version of the matrix as a column vector.

flatten_as_row – Get a flattened version of the matrix as a row vector.

prefix:flatten_as_row(b : "http://anzograph.com/matrices#tensor")

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	"http://anzograph.com/matrices#tensor"	Tensor representation of a flattened version of the matrix as a row vector.

getmax_val – Get the maximum value in the tensor.

```
prefix:getmax_val(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
n	double	Maximum value in the tensor.

getmin_val – Get the minimum value in the tensor.

```
prefix:getmin_val(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
n	double	Minimum value in the tensor.

is_vec – Check whether a matrix is a vector.

prefix:is_vec(b : "http://anzograph.com/matrices#tensor")

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix can be interpreted as a vector (either column or row vector). False if the matrix does not have exactly one column or one row.

is_rowvec – Check whether matrix is a row vector.

prefix:is_rowvec(b : "http://anzograph.com/matrices#tensor")

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix can be interpreted as a row vector. False if the matrix does not have exactly one row.

is_colvec – Check whether matrix is a column vector.

prefix:is_colvec(b : "http://anzograph.com/matrices#tensor")

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix can be interpreted as a column vector. False if the matrix does not have exactly one column.

is_sorted – Check whether vector or matrix is sorted.

prefix:is_sorted(b : "http://anzograph.com/matrices#tensor",
   t : boolean, d : int)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
t	boolean	Sort dimension for matrix. This parameter is optional; the default is false; True if elements are sorted in each row. False if elements are sorted in each column.
d	int	Optional argument specifying sort direction for matrix; the default is 0 (ascend). Allowed arguments are: 0-ascend (default), 1-descend 2-strictascend 3-strictdescend "ascend" - elements are ascending; consecutive elements can be equal; this is the default. "descend" - elements are descending; consecutive elements can be equal. "strictascend" - elements are strictly ascending; consecutive elements cannot be equal. "strictdescend" - elements are strictly descending; consecutive elements cannot be equal.
Output:
v	boolean	True if the elements are sorted, else false.
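
The four sort-direction codes differ only in the comparison applied to each pair of consecutive elements. A minimal plain-Python sketch for a single vector (the matrix case and the t parameter are left out) might look like this; it is an illustration of the codes, not the AnzoGraph implementation.

```python
import operator

# Direction codes from the table: 0-ascend, 1-descend, 2-strictascend, 3-strictdescend.
_DIRECTIONS = {0: operator.le, 1: operator.ge, 2: operator.lt, 3: operator.gt}

def is_sorted(values, d=0):
    """True if every consecutive pair in values satisfies the chosen direction."""
    in_order = _DIRECTIONS[d]
    return all(in_order(a, b) for a, b in zip(values, values[1:]))
```

For example, [1, 2, 2, 3] passes code 0 (ascend) but fails code 2 (strictascend) because of the repeated 2.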

is_tri_mat_upper – Check whether matrix is upper triangular.

prefix:is_tri_mat_upper(b : "http://anzograph.com/matrices#tensor")

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix is upper triangular, that is, the matrix is square sized and all elements below the main diagonal are zero; otherwise, returns false.

is_tri_mat_lower – Check whether matrix is lower triangular.

prefix:is_tri_mat_lower(b : "http://anzograph.com/matrices#tensor")

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix is lower triangular, that is, the matrix is square sized and all elements above the main diagonal are zero; otherwise, returns false.

is_diag_mat – Check whether a matrix is diagonal.

prefix:is_diag_mat(b : "http://anzograph.com/matrices#tensor")

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix is diagonal, that is, all elements outside of the main diagonal are zero; otherwise, returns false.

is_square – Check whether matrix is square-sized.

```
prefix:is_square(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix is square, that is, the number of rows is equal to the number of columns.

is_symmetric – Check whether the matrix is symmetric.

```
prefix:is_symmetric(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix is symmetric.

is_hermitian – Check whether matrix is hermitian.

```
prefix:is_hermitian(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if the matrix is hermitian (self-adjoint).

has_nan – Check whether a matrix contains NaN elements.

```
prefix:has_nan(b : "http://anzograph.com/matrices#tensor")
```

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
Output:
v	boolean	True if at least one of the elements of the object is NaN (not-a-number).

vec_all – Check whether all elements of a row or column vector are non-zero or satisfy a relational condition.

prefix:vec_all(b : "http://anzograph.com/matrices#tensor",
   c : int, val : double)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
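c	int	Relation condition: 0-not equal, 1-greater than, 2-less than, 3-equal, 4-greater than or equal, 5-less than or equal. This parameter is optional; the default is 0.
val	double	Value against which to apply the condition c. This parameter is optional; the default is 0.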
Output:
v	boolean	True if all elements of the vector are non-zero or satisfy a relational condition.
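
The relational condition codes map directly onto comparison operators. The plain-Python sketch below shows that mapping for vec_all; it assumes val defaults to 0, as it does in the neighboring functions, so the default call checks that every element is non-zero.

```python
import operator

# Condition codes: 0-not equal, 1-greater than, 2-less than,
# 3-equal, 4-greater than or equal, 5-less than or equal.
_CONDITIONS = {
    0: operator.ne, 1: operator.gt, 2: operator.lt,
    3: operator.eq, 4: operator.ge, 5: operator.le,
}

def vec_all(values, c=0, val=0.0):
    """True if every element satisfies condition c against val."""
    satisfies = _CONDITIONS[c]
    return all(satisfies(v, val) for v in values)
```

With the defaults (c = 0, val = 0), the check reduces to "all elements are non-zero".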

mat_all – Check whether all elements are non-zero, or satisfy a relational condition in a matrix.

prefix:mat_all(b : "http://anzograph.com/matrices#tensor",
   d : boolean, c : int, val : double)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
d	boolean	Check rows or columns. Optional parameter; default is to check all columns.
c	int	Relation condition: 0-not equal, 1-greater than, 2-less than, 3-equal, 4-greater than or equal, 5-less than or equal. This parameter is optional; the default is 0.
val	double	Value against which to apply the condition c. This parameter is optional; the default is 0.
Output:
v	"http://anzograph.com/matrices#tensor"	A Tensor representation of a row vector, with each element (0 or 1) indicating whether the corresponding row/column has all non-zero elements.

vec_any – Check whether any element of a row or column vector is non-zero or satisfies a relational condition.

prefix:vec_any(b : "http://anzograph.com/matrices#tensor",
   c : int, val : double)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
c	int	Relation condition: 0-not equal, 1-greater than, 2-less than, 3-equal, 4-greater than or equal, 5-less than or equal. This parameter is optional; the default is 0.
val	double	Value against which to apply the condition c. This parameter is optional; the default is 0.
Output:
v	"http://anzograph.com/matrices#tensor"	True if any element of the vector is non-zero or satisfies a relational condition.

mat_any – Check whether any element of a matrix is non-zero or satisfies a relational condition.

prefix:mat_any(b : "http://anzograph.com/matrices#tensor",
   d : boolean, c : int, val : double)

Parameter	Data Type	Description
Input:
b	"http://anzograph.com/matrices#tensor"	A Tensor.
d	boolean	Check rows or columns. Optional parameter; default is to check all columns.
c	int	Relation condition: 0-not equal, 1-greater than, 2-less than, 3-equal, 4-greater than or equal, 5-less than or equal. Optional parameter; default is 0.
val	double	Value against which to apply the condition c. This parameter is optional; the default is 0.
Output:
v	"http://anzograph.com/matrices#tensor"	A Tensor representation of a row vector, with each element (0 or 1) indicating whether the corresponding row/column has any non-zero elements.

Matthews Correlation Coefficient (MCC)

The Matthews Correlation Coefficient function returns a coefficient that measures the correlation between observed and predicted binary classifications.

Wikipedia Reference: Matthews correlation coefficient.

The general signature for calling the Matthews Correlation Coefficient function is the following:

prefix:mcc(x : boolean, y : boolean)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
x	boolean	1st variable column data.
y	boolean	2nd variable column data.
Output:
coefficient	double	Extent to which observed and predicted binary classifications are related.
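
For reference, the coefficient follows from the four confusion-matrix counts. The plain-Python sketch below computes it from two boolean columns; returning 0.0 when the denominator is zero is a common convention and an assumption here, not documented AnzoGraph behavior.

```python
import math

def mcc(observed, predicted):
    """Matthews correlation coefficient of two equal-length boolean sequences."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # true positives
    tn = sum(1 for o, p in pairs if not o and not p)  # true negatives
    fp = sum(1 for o, p in pairs if not o and p)      # false positives
    fn = sum(1 for o, p in pairs if o and not p)      # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A value of 1 indicates perfect agreement, -1 total disagreement, and 0 no better than chance.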

Negative Binomial Distribution (NEGBINDIST)

This function calculates probability using a discrete probability distribution that concerns the number of trials which must occur in order to have a predetermined number of successes.

Wikipedia Reference: Negative Binomial Distribution

The general signature for calling the Negative Binomial Distribution function is the following:

prefix:negbindist(data : String, k : long, r : long, SuccessIs : String)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	String	Column data.
k	long	Number of successes.
r	long	Number of failures.
SuccessIs	String	Define success string among the column data.
Output:
probability	double	Probability value which is the value for probability mass function.
cdfLower	double	Cumulative distribution function which is probability (<=k) under the area of distribution.
cdfUpper	double	Cumulative distribution function which is probability (>k) under the area of distribution.

Normal Distribution (NORMDIST)

This function calculates probability using a continuous probability distribution of data in which the majority of data points are relatively similar, within a small range of values with few outliers.

Wikipedia Reference: Normal Distribution

The general signature for calling the Normal Distribution function is the following:

prefix:normdist(data : double, c : String, x1 : double, x2 : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Column data.
c	String	User choice: 'below'; 'above'; 'bet'(Between); 'out'(Outside) where user can choose probability below, above, between, or outside.
x1	double	Lower number x1 to find the probability.
x2	double	Upper number x2 to find the probability.
Output:
mean	double	Mean of the distribution.
stdDev	double	Standard deviation of the distribution.
variance	double	Variance of the distribution.
diffEntropy	double	Differential Entropy in nats.
cdf	double	Cumulative distribution function which is probability under the area of distribution.
pdfLower	double	Probability density function value for x1.
pdfUpper	double	Probability density function value for x2.
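
Several of the outputs above have closed forms once the mean and variance are known. The plain-Python sketch below computes the differential entropy in nats and the density at x1; the function name and the use of the population variance are assumptions for illustration only.

```python
import math

def normal_outputs(data, x1):
    """Mean, variance, differential entropy (nats), and pdf at x1 for fitted data."""
    n = len(data)
    mean = sum(data) / n
    # Population variance (assumption; a sample variance would divide by n - 1).
    variance = sum((v - mean) ** 2 for v in data) / n
    std_dev = math.sqrt(variance)
    # Differential entropy of a normal distribution: 0.5 * ln(2 * pi * e * variance).
    diff_entropy = 0.5 * math.log(2.0 * math.pi * math.e * variance)
    z = (x1 - mean) / std_dev
    pdf_lower = math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2.0 * math.pi))
    return {"mean": mean, "stdDev": std_dev, "variance": variance,
            "diffEntropy": diff_entropy, "pdfLower": pdf_lower}
```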

Pearson Correlation Coefficient (PCC)

The Pearson Correlation Coefficient function determines the extent to which two variables are linearly related: positive, negative, or no relationship.

Wikipedia Reference: Pearson correlation coefficient

The general signature for calling the Pearson Correlation Coefficient function is the following:

prefix:pcc(x : double, y : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
x	double	1st variable column data.
y	double	2nd variable column data.
Output:
coefficient	double	Extent to which two variables are linearly related.
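
The coefficient is the covariance of the two columns divided by the product of their standard deviations. A plain-Python sketch of that calculation, given only as a reference for checking results, is:

```python
import math

def pcc(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Values near 1 indicate a positive linear relationship, values near -1 a negative one, and values near 0 no linear relationship. The sketch raises ZeroDivisionError if either column is constant.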

Percentile Metric (PERCENTILE)

This function calculates percentile (1 to 100) of numeric values.

Wikipedia Reference: Percentile Metric

The general signature for calling the Percentile function is the following:

prefix:percentile(data : double, p : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Data set.
p	double	Percentile to compute, in the range [0, 100].
Output:
percentile	double	Percentile value.
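
Percentiles can be computed with several interpolation rules, and the source does not say which one AnzoGraph applies. The plain-Python sketch below uses linear interpolation between the two nearest ranks, which is an assumption, so results may differ slightly from the function's output.

```python
def percentile(data, p):
    """p-th percentile (0 <= p <= 100) using linear interpolation between ranks."""
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    ordered = sorted(data)
    # Fractional position of the requested percentile in the sorted data.
    pos = (len(ordered) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)
```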

Poisson Distribution (POISDIST)

This function calculates probability using a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, given that these events occur with a known constant rate and independently of the time since the last event.

Wikipedia Reference: Poisson Distribution

The general signature for calling the Poisson Distribution function is the following:

prefix:poisdist(data : long, k : long)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	long	Column data.
k	long	Number of events k for which to find the probability of observing them in an interval.
Output:
probability	double	Probability value which is the value for probability mass function.
cdfLower	double	Cumulative distribution function which is probability (<=k) under the area of distribution.
cdfUpper	double	Cumulative distribution function which is probability (>k) under the area of distribution.
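
The three outputs are related: cdfLower accumulates the probability mass function up to and including k, and cdfUpper is its complement. The plain-Python sketch below shows this; estimating the rate as the mean of the column data is an assumption about how the input is used.

```python
import math

def poisson_outputs(data, k):
    """Poisson pmf at k plus the lower (<= k) and upper (> k) cumulative probabilities."""
    rate = sum(data) / len(data)  # assumed estimate of the event rate

    def pmf(i):
        return math.exp(-rate) * rate ** i / math.factorial(i)

    cdf_lower = sum(pmf(i) for i in range(k + 1))
    return {"probability": pmf(k), "cdfLower": cdf_lower, "cdfUpper": 1.0 - cdf_lower}
```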

Principal Component Analysis (PCA)

Applies principal component analysis (PCA) to create combined eigen values and vectors that highlight patterns in a dataset, making it easier to explore and visualize data.

Wikipedia Reference: Principal Component Analysis

There are several different function signatures available for Principal Component analysis.

The prefix shown in the function signatures below points to the URI <http://cambridgesemantics.com/anzograph/matrices#> location of the AnzoGraph PCA data science functions.

pca::create – Apply Principal Component Analysis (PCA) to create combined eigenvalues and eigenvectors.

prefix:pca::create(x1 : double, x2 : double,...,xn : double)

Parameter	Data Type	Description
Input:
x1, x2,...,xn	double	Feature column datasets.
Output:
eigen_values_vectors	"http://anzograph.com/matrices#feature_result"	PCA data containing eigenvalues and eigenvectors.

pca::get_eigvec – Get PCA's eigen vectors as a matrix from the PCA data.

prefix:pca::get_eigvec(pca_data: "http://anzograph.com/matrices#feature_result")

Parameter	Data Type	Description
Input:
pca_data	"http://anzograph.com/matrices#feature_result"	Principal Component Analysis data.
Output:
eigen_vectors	"http://anzograph.com/matrices#tensor"	Eigen vectors as a matrix.

pca::get_eigval – Get PCA's eigen values as a column vector from PCA data.

prefix:pca::get_eigval(pca_data : "http://anzograph.com/matrices#feature_result")

Parameter	Data Type	Description
Input:
pca_data	"http://anzograph.com/matrices#feature_result"	Principal Component Analysis data.
Output:
eigen_values	"http://anzograph.com/matrices#tensor"	Eigen values in descending order as a column vector.

transform – Apply Principal Component Analysis (PCA) to transform the samples onto the new subspace.

prefix:transform(pca_data : "http://anzograph.com/matrices#feature_result",
   d : int, x1 : double, x2 : double,...,xn : double)

Parameter	Data Type	Description
Input:
pca_data	"http://anzograph.com/matrices#feature_result"	PCA data.
d	int	Number of eigen vectors to consider from the end.
x1, x2,...,xn	double	Feature column data sets.
Output:
transformed_data	double	Sample data transformed into the tuple of lower dimensional space.

dump_tensor – Get string representation of vector or matrix in row-wise/column-wise order.

prefix:dump_tensor(m : "http://anzograph.com/matrices#tensor",
   type : int, isRowWise : boolean)

Parameter	Data Type	Description
Input:
m	"http://anzograph.com/matrices#tensor"	A tensor of matrix/row vector/column vector.
type	int	Type of tensor: 0 - Row vector; 1 - Column vector; 2 - Matrix. This parameter is optional; the default value is 2.
isRowWise	boolean	False to display the matrix column-wise. This parameter is optional; the default value is true.
Output:
dump	String	String representation of vector or matrix in row-wise/column-wise.

pca::get_raw_eigval – Get PCA's unsorted eigen values from the PCA data.

prefix:pca::get_raw_eigval(pca_data : "http://anzograph.com/matrices#feature_result")

Parameter	Data Type	Description
Input:
pca_data	"http://anzograph.com/matrices#feature_result"	Principal Component Analysis data.
Output:
eigen_values	"http://anzograph.com/matrices#tensor"	Eigen values in unsorted order as column vector.
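
To make the eigenvalue outputs concrete, the plain-Python sketch below works through the two-feature case, where the eigenvalues of the covariance matrix have a closed form. The function name and the n - 1 denominator are assumptions; general PCA on more features needs a full eigen decomposition.

```python
import math

def pca_eigenvalues_2d(xs, ys):
    """Eigenvalues, in descending order, of the covariance matrix of two features."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] (n - 1 is an assumption).
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # A symmetric 2x2 matrix has eigenvalues half_trace +/- root.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return [half_trace + root, half_trace - root]
```

When the two features are perfectly correlated, the second eigenvalue is zero, which means a single principal component captures all of the variance.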

Quantile/Rank Sketch (KLL)

This function is used to calculate the quantile/rank from a stream of items using the KLL sketch computation model. There are several different signatures available for the Quantile/Rank Sketch function.

Reference: KLL Sketch

The prefix shown in the function signatures below points to the URI <http://cambridgesemantics.com/anzograph/sketch#> location of the AnzoGraph data science functions.

kll – Creates binary image for the KLL sketch.

prefix:kll(val : Object, k : int)

Parameter	Data Type	Description
Input:
val	Object	Input data stream (supporting short, int, long, float, double and string).
k	int	Sketch configuration parameter, which affects the size of the sketch and its estimation error. This parameter is optional; the default value is 200. k can be any value between 8 and 65535, inclusive. The default k = 200 results in a normalized rank error of about 1.65%. Higher values of k yield a smaller error, but the sketch is larger (and slower).
Output:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.

kll::get_min_value – Gets the minimum value of the stream.

prefix:kll::get_min_value(kll_sketch : "http://anzograph.com/statistics#kll_sketch")

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
Output:
double_val	double	The min value of the stream.
string_val	String	The string having min value when input stream is of string type.

kll::get_max_value – Gets the maximum value of the stream.

prefix:kll::get_max_value(kll_sketch : "http://anzograph.com/statistics#kll_sketch")

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
Output:
double_val	double	The max value of the stream.
string_val	String	The string having max value when input stream is of string type.

kll::get_n – Gets the stream length.

prefix:kll::get_n(kll_sketch : "http://anzograph.com/statistics#kll_sketch")

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
Output:
n	long	The length of the input stream.

kll::get_num_retained – Gets the number of retained items (samples) in the sketch.

prefix:kll::get_num_retained(kll_sketch : "http://anzograph.com/statistics#kll_sketch")

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
Output:
n	long	The number of retained items (samples) in the sketch.

kll::get_rank – Gets an approximation to the normalized (fractional) rank of the given value from 0 to 1, inclusive.

prefix:kll::get_rank(kll_sketch : "http://anzograph.com/statistics#kll_sketch",
   v : double)

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
v	double	Item to be ranked.
Output:
r	double	An approximate rank of the given item.

kll::get_quantile – Gets an approximation to the value of the data item from the rank.

prefix:kll::get_quantile(kll_sketch : "http://anzograph.com/statistics#kll_sketch", 
   fraction : double)

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
fraction	double	The specified fractional position in the hypothetical sorted stream.
Output:
v	double	An approximation to the value of the data item that would be preceded by the given fraction of a hypothetical sorted version of the input stream so far.
stringVal	String	An approximation to the string when input stream is of string type.
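
The relationship between kll::get_rank and kll::get_quantile can be illustrated with exact computations on a small, fully sorted list. The Python sketch below is not a KLL sketch and is not AnzoGraph code: exact_rank and exact_quantile are hypothetical helpers, and the rank convention used (fraction of items strictly less than the value) is an assumption. A KLL sketch approximates these answers within the normalized rank error determined by k.

```python
import bisect

def exact_rank(sorted_items, v):
    """Normalized rank in [0, 1]: fraction of items strictly less than v."""
    return bisect.bisect_left(sorted_items, v) / len(sorted_items)

def exact_quantile(sorted_items, fraction):
    """Item preceded by roughly `fraction` of the sorted stream."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0.0, 1.0]")
    # Clamp so that fraction = 1.0 returns the largest item.
    index = min(int(fraction * len(sorted_items)), len(sorted_items) - 1)
    return sorted_items[index]

# Illustrative data only.
stream = sorted([7.0, 1.0, 5.0, 3.0, 9.0, 2.0, 8.0, 4.0, 6.0, 10.0])
median = exact_quantile(stream, 0.5)
rank_of_median = exact_rank(stream, median)
```

In this sketch the two helpers are inverses of each other at the sampled points: asking for the quantile at a fraction and then ranking the returned item gives back that fraction.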

kll::get_quantiles – Provides a more efficient, multiple-query version of kll::get_quantile() and allows the caller to specify the number of evenly spaced fractional ranks.

prefix:kll::get_quantiles(kll_sketch : "http://anzograph.com/statistics#kll_sketch", 
  f1 : double, f2 : double, ..., f10 : double)

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
f1, f2, ... f10	double	Given fractional positions in the hypothetical sorted stream. These are also called normalized ranks or fractional ranks. These fractions must be in the interval [0.0, 1.0], inclusive.
Output:
v1, v2, ...v10	double	An approximation to the values in the same order as the given fractional positions.

kll::get_quantiles_str – Provides an approximation to the strings when the input stream is of string type.

prefix:kll::get_quantiles_str(kll_sketch : "http://anzograph.com/statistics#kll_sketch", 
   f1 : double, f2 : double, ..., f10 : double)

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
f1, f2, ... f10	double	Given fractional positions in the hypothetical sorted stream. These are also called normalized ranks or fractional ranks. These fractions must be in the interval [0.0, 1.0], inclusive.
Output:
v1, v2, ...v10	String	An approximation to the strings when input stream is of string type.

kll::get_pmf – Provides an approximation to the Probability Mass Function (PMF) of the input stream given the values.

prefix:kll::get_pmf(kll_sketch : "http://anzograph.com/statistics#kll_sketch", 
   v1 : Object, v2 : Object, ..., v10 : Object)

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
v1, v2, ...v10	Object	Input values between the min and max values of the input stream. Values must be unique and monotonically increasing.
Output:
r1, r2, ...r10	double	PMF values corresponding to the input.

kll::get_cdf – Provides an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF of the input stream, given the values.

prefix:kll::get_cdf(kll_sketch : "http://anzograph.com/statistics#kll_sketch", 
   v1 : Object, v2 : Object, ..., v10 : Object)

Parameter	Data Type	Description
Input:
kll_sketch	"http://anzograph.com/statistics#kll_sketch"	Binary stream containing KLL sketch data.
v1, v2, ...v10	Object	Input values between the min and max values of the input stream. Values must be unique and monotonically increasing.
Output:
r1, r2, ...r10	double	CDF values corresponding to the input.
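
To show how the split-point values relate PMF and CDF, the following Python sketch computes both exactly on a small sorted list. It is an illustration under stated assumptions, not the KLL algorithm: exact_cdf and exact_pmf are hypothetical helpers, and reporting the mass above the last split point as an extra final entry is a convention chosen for this example (this reference does not state how AnzoGraph reports that remainder).

```python
import bisect

def exact_cdf(sorted_items, split_points):
    """Fraction of items strictly below each split point, then 1.0."""
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be unique and increasing")
    n = len(sorted_items)
    cdf = [bisect.bisect_left(sorted_items, s) / n for s in split_points]
    return cdf + [1.0]

def exact_pmf(sorted_items, split_points):
    """Mass in each interval defined by the split points."""
    cdf = exact_cdf(sorted_items, split_points)
    # Each PMF entry is the difference between consecutive CDF entries.
    return [hi - lo for lo, hi in zip([0.0] + cdf, cdf)]

# Illustrative data only.
values = [float(i) for i in range(1, 11)]
cdf = exact_cdf(values, [3.0, 6.0, 9.0])
pmf = exact_pmf(values, [3.0, 6.0, 9.0])
```

The CDF is the running total of the PMF, which is why both functions accept the same monotonically increasing split points.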

Singular Value Decomposition (SVD)

Singular value decomposition (SVD), a matrix factorization method, creates combined singular values and right singular vectors.

Wikipedia Reference: Singular Value Decomposition

There are several different function signatures available for Singular Value Decomposition.

The prefix shown in the function signatures below points to the URI <http://cambridgesemantics.com/anzograph/matrices#> location of the AnzoGraph SVD data science functions.

svd::create – Apply Singular Value Decomposition (SVD) to create combined singular values and right singular vectors.

prefix:svd::create(x1 : double, x2 : double,...,xn : double)

Parameter	Data Type	Description
Input:
x1, x2,...,xn	double	Feature column data sets.
Output:
svd_data	"http://anzograph.com/matrices#feature_result"	SVD data containing singular values and right singular vectors.

svd::get_sigval – Get SVD's singular values as a column vector from the SVD data.

prefix:svd::get_sigval(svd_data : "http://anzograph.com/matrices#feature_result")

Parameter	Data Type	Description
Input:
svd_data	"http://anzograph.com/matrices#feature_result"	SVD data.
Output:
singular_values	"http://anzograph.com/matrices#tensor"	Singular values in the descending order as a column vector.

svd::get_sigvec – Get SVD's right singular vectors as a matrix from the SVD data.

prefix:svd::get_sigvec(svd_data : "http://anzograph.com/matrices#feature_result")

Parameter	Data Type	Description
Input:
svd_data	"http://anzograph.com/matrices#feature_result"	SVD data.
Output:
singular_vector	"http://anzograph.com/matrices#tensor"	Right singular vectors as a matrix.

transform – Apply PCA or SVD to transform the samples onto the new subspace.

prefix:transform(svd_data : "http://anzograph.com/matrices#feature_result", 
   d : int, x1, x2,...,xn : double)

Parameter	Data Type	Description
Input:
svd_data	"http://anzograph.com/matrices#feature_result"	SVD data.
d	int	Number of singular vectors to consider from the start.
x1, x2,...,xn	double	Feature column data sets.
Output:
transformed_data	String	Sample data transformed into the tuple of lower dimensional space.

dump_tensor – Return string representation of vector or matrix in row-wise/column-wise order.

prefix:dump_tensor(m : "http://anzograph.com/matrices#tensor", type : int, isRowWise : boolean)

Parameter	Data Type	Description
Input:
m	"http://anzograph.com/matrices#tensor"	A matrix/row vector/column vector.
type	Int	Type of tensor: 0-Row vector; 1-Column Vector; 2-Matrix.
isRowWise	boolean	True to display the matrix row-wise, false to display it column-wise.
Output:
dump	String	String representation of vector or matrix in row-wise/column-wise.

Skellam Distribution (SKELLAMDIST)

This function calculates probability using the Skellam distribution, which models the difference between two independent Poisson-distributed variables.

Wikipedia Reference: Skellam Distribution

The general signature for calling the Skellam Distribution function is the following:

prefix:skellamdist(N1_data : long, N2_data : long, k : long)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
N1_data	long	N1 Column data.
N2_data	long	N2 Column data.
k	long	Find probability for the number.
Output:
probability	double	Skellam probability value.
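
As a rough illustration of the probability being computed, the Python sketch below evaluates the Skellam probability P(N1 - N2 = k) by summing products of Poisson probabilities. The helper names and the truncation length are hypothetical, and the example takes the two Poisson rates (mu1, mu2) directly; how AnzoGraph derives those rates from N1_data and N2_data is not stated in this reference (using the column means would be one plausible choice).

```python
import math

def poisson_pmf(n, mu):
    """P(N = n) for a Poisson variable with rate mu > 0, computed in log space."""
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

def skellam_pmf(k, mu1, mu2, terms=200):
    """P(N1 - N2 = k) as a truncated sum over the value n taken by N2."""
    start = max(0, -k)  # N1 = n + k must be non-negative
    return sum(
        poisson_pmf(n + k, mu1) * poisson_pmf(n, mu2)
        for n in range(start, start + terms)
    )

# Illustrative rates only.
p_zero = skellam_pmf(0, 3.0, 2.0)
total = sum(skellam_pmf(k, 3.0, 2.0) for k in range(-60, 61))
mean = sum(k * skellam_pmf(k, 3.0, 2.0) for k in range(-60, 61))
```

The probabilities sum to approximately 1 over a wide enough range of k, and the mean of the distribution equals mu1 - mu2.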

Skew Metric (SKEWCOEFF)

This function calculates Pearson’s coefficients of skewness on numeric values.

Wikipedia Reference: Skewness

The general signature for calling the Skew Metric function is the following:

prefix:skewcoeff(data : double, dp : int)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Data set.
dp	int	Number of decimal points to consider for the input data.
Output:
mode	double	Value that appears most often (highest frequency).
median	double	The middle number in an ordered set of data.
mean	double	Average value.
stdDev	double	Standard deviation.
modeSkewnessCoef	double	Pearson mode skewness or first skewness coefficient.
medianSkewnessCoef	double	The Pearson median skewness or second skewness coefficient.
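
The two Pearson coefficients in the output can be sketched in Python as follows. This is a minimal illustration, not AnzoGraph's implementation: pearson_skewness is a hypothetical helper, applying dp by rounding the input before computing the statistics is an assumption, and so is the use of the population standard deviation (this reference does not say whether the sample or population form is used).

```python
import statistics

def pearson_skewness(data, dp):
    """Pearson mode and median skewness on data rounded to dp decimals."""
    rounded = [round(x, dp) for x in data]
    mode = statistics.mode(rounded)
    median = statistics.median(rounded)
    mean = statistics.fmean(rounded)
    std_dev = statistics.pstdev(rounded)  # assumption: population form
    return {
        "mode": mode,
        "median": median,
        "mean": mean,
        "stdDev": std_dev,
        # First coefficient: (mean - mode) / standard deviation
        "modeSkewnessCoef": (mean - mode) / std_dev,
        # Second coefficient: 3 * (mean - median) / standard deviation
        "medianSkewnessCoef": 3 * (mean - median) / std_dev,
    }

# Illustrative data only; 2.04 and 1.96 both round to 2.0 with dp = 1.
result = pearson_skewness([2.04, 1.96, 3.0, 5.0, 8.0], 1)
```

Both coefficients are positive here because the mean lies above the mode and the median, which indicates a right-skewed sample.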

Spearman Correlation Coefficient (SCC)

The Spearman Correlation Coefficient function determines how well the relationship between two variables can be described using a monotonic function.

Wikipedia Reference: Spearman's Correlation Coefficient

The general signature for calling the Spearman Correlation Coefficient function is the following:

prefix:scc(rank_X : double, rank_Y : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
rank_X	double	X ranked data.
rank_Y	double	Y ranked data.
Output:
coefficient	double	Coefficient between ranked data.
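
Because the inputs are already ranked, the Spearman coefficient is the Pearson correlation of the two rank columns. The Python sketch below shows that calculation; spearman_from_ranks is a hypothetical helper for illustration and is not AnzoGraph code.

```python
import math

def spearman_from_ranks(rank_x, rank_y):
    """Pearson correlation computed on data that is already ranked."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of equal length >= 2")
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(rank_x, rank_y))
    var_x = sum((x - mean_x) ** 2 for x in rank_x)
    var_y = sum((y - mean_y) ** 2 for y in rank_y)
    return cov / math.sqrt(var_x * var_y)

# Illustrative ranks only.
same_order = spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reversed_order = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
mostly_agree = spearman_from_ranks([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

A coefficient of 1 means the two rankings agree completely, -1 means they are exactly reversed, and values in between reflect partial monotonic agreement.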

Student's T-Distribution (TDIST)

This function calculates probability using the Student's t-distribution (and associated t-scores), which is often used in hypothesis testing when the sample size is small or when the population variance is unknown.

Wikipedia Reference: Student's t-distribution

The general signature for calling the Student's t-distribution function is the following:

prefix:tdist(data : double, M : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Sample data.
M	double	Population mean.
Output:
mean	double	Mean of the distribution.
stdDev	double	Standard deviation of the distribution.
variance	double	Variance of the distribution.
t-statistics	double	t = [ u - M ] / [ s / sqrt( N ) ] where u is the sample mean, M is the population mean, s is the standard deviation of the sample, and N is the sample size.
count	double	Number of samples; the degrees of freedom are (count - 1).
pdf	double	Value of the probability density function at the t-statistic.
cdf	double	Value of the cumulative distribution function, which is the probability of a value <= the t-statistic.
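
The t-statistic formula in the table, together with the density of the t-distribution at that statistic, can be sketched in Python as follows. This is a minimal illustration, not AnzoGraph's implementation: t_summary is a hypothetical helper, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
import math
import statistics

def t_summary(data, population_mean):
    """One-sample t-statistic and the t-distribution density at that value."""
    n = len(data)
    mean = statistics.fmean(data)
    std_dev = statistics.stdev(data)  # assumption: sample form (n - 1)
    # t = [ u - M ] / [ s / sqrt( N ) ]
    t = (mean - population_mean) / (std_dev / math.sqrt(n))
    dof = n - 1
    # Density of Student's t with `dof` degrees of freedom, in log space.
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    pdf = math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))
    return {"mean": mean, "stdDev": std_dev, "count": n, "t": t, "pdf": pdf}

# Illustrative sample only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
at_mean = t_summary(sample, 5.0)
below_mean = t_summary(sample, 4.0)
```

When the population mean M equals the sample mean, the t-statistic is 0 and the density is at its peak; moving M away from the sample mean increases the magnitude of t and lowers the density.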

TDigest Metric (TDIGEST)

This function creates an estimate of the median (and more generally, any percentile) from either distributed data or streaming data, using a t-Digest probabilistic data structure.

Wikipedia Reference: Computing Quantiles using T-Digests

The general signature for calling the T-Digest function is the following:

prefix:tdigest(data : double, p : double, q : double, cdf : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics/sketch#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Column data.
p	double	Percentile to compute, specified as a value in [0, 100].
q	double	Quantile to compute, specified as a value in [0.0, 1.0].
cdf	double	Value at which to compute the CDF.
Output:
percentile	double	Value below which a given percentage of observations in a group of observations falls.
quantile	double	Cut point that divides the observations in a sample.
cdf	double	F(x), where F denotes the CDF of the distribution.

Theta Sketch (THETA)

This function is used to estimate the results of set operations (union, intersection, and difference) using the Theta Sketch framework. There are several different function signatures available for Theta Sketch estimate calculations. Theta Sketches are a generalization of the well-known K^th Minimum Value (KMV) sketches.

Reference: The Theta Sketch Framework

The prefix shown in the function signatures below points to the URI <http://cambridgesemantics.com/anzograph/sketch#> location of the AnzoGraph data science functions.

theta – Creates a binary image for the theta sketch.

prefix:theta(val : Object)

Parameter	Data Type	Description
Input:
val	Object	Data set (supporting short, int, long, float, double and string).
Output:
theta_sketch	"http://anzograph.com/statistics#theta_sketch"	Binary stream containing theta sketch data.

theta::cardinality – Gets the Cardinality estimate of the input stream.

prefix:theta::cardinality(theta_sketch : "http://anzograph.com/statistics#theta_sketch")

Parameter	Data Type	Description
Input:
theta_sketch	"http://anzograph.com/statistics#theta_sketch"	Binary stream containing theta sketch data.
Output:
n	double	Sketch's best estimate of the cardinality of the input stream.

theta::union – Gets the Cardinality estimate of the union of the input streams.

prefix:theta::union(theta_sketch... : "http://anzograph.com/statistics#theta_sketch")

Parameter	Data Type	Description
Input:
theta_sketch...	"http://anzograph.com/statistics#theta_sketch"	Binary stream containing theta sketch data. You can provide as many sketches as you want into the input as indicated by '...' in the signature.
Output:
n	double	Sketch's best estimate of the union of the input streams.

theta::intersection – Get the Cardinality estimate of the intersection of the input streams.

prefix:theta::intersection(theta_sketch... : "http://anzograph.com/statistics#theta_sketch")

Parameter	Data Type	Description
Input:
theta_sketch...	"http://anzograph.com/statistics#theta_sketch"	Binary stream containing theta sketch data. You can provide as many sketches as you want into the input as indicated by '...' in the signature.
Output:
n	double	Sketch's best estimate of the intersection of the input streams.

theta::difference – Gets the cardinality estimate of the set difference A and not B.

prefix:theta::difference(a : "http://anzograph.com/statistics#theta_sketch", 
   b : "http://anzograph.com/statistics#theta_sketch")

Parameter	Data Type	Description
Input:
a	"http://anzograph.com/statistics#theta_sketch"	Binary stream containing theta sketch for data set A.
b	"http://anzograph.com/statistics#theta_sketch"	Binary stream containing theta sketch for data set B.
Output:
n	double	Sketch's best estimate of the set difference A and not B.
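
Since Theta Sketches generalize K^th Minimum Value (KMV) sketches, a small KMV sketch gives a feel for how cardinality and union estimates work. The Python code below is a simplified illustration, not the Theta Sketch framework and not AnzoGraph code: the helper names, the SHA-256 hash, and k = 256 are all assumptions made for this example. For clarity it hashes every item before keeping the k smallest values, whereas a streaming implementation would retain only k values at a time.

```python
import hashlib

def kmv_sketch(items, k=256):
    """Keep the k smallest distinct hash values, each mapped into [0, 1)."""
    hashes = set()
    for item in items:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        hashes.add(int.from_bytes(digest[:8], "big") / 2**64)
    return sorted(hashes)[:k]

def kmv_cardinality(sketch, k=256):
    """Estimate the number of distinct items from the k-th smallest hash."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k distinct items: exact count
    return (k - 1) / sketch[k - 1]

def kmv_union(a, b, k=256):
    """Sketch of the union: the k smallest hashes across both sketches."""
    return sorted(set(a) | set(b))[:k]

# Illustrative data only: two overlapping ranges of integers.
sketch_a = kmv_sketch(range(0, 6000))
sketch_b = kmv_sketch(range(4000, 10000))
union_estimate = kmv_cardinality(kmv_union(sketch_a, sketch_b))
small_exact = kmv_cardinality(kmv_sketch(range(100)))
```

The estimate relies on the idea that, when hashes are spread uniformly over [0, 1), the k-th smallest hash shrinks as the number of distinct items grows. Larger k values reduce the estimation error at the cost of a larger sketch.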

Weibull Distribution (WEIBULDIST)

This function calculates probability from a continuous probability distribution commonly used to assess product reliability, analyze product life data and failure times.

Wikipedia Reference: Weibull Distribution

The general signature for calling the Weibull Distribution function is the following:

prefix:weibuldist(data : double, k : double, x : double)

Where prefix points to the URI <http://cambridgesemantics.com/anzograph/statistics#> location of the AnzoGraph data science functions.

Parameter	Data Type	Description
Input:
data	double	Sample data.
k	double	The initial starting value for shape parameter. A good guess is crucial to quick convergence.
x	double	Find the probability for a random variable.
Output:
mean	double	Mean of the distribution.
stdDev	double	Standard deviation of the distribution.
variance	double	Variance of the distribution.
count	long	Number of samples.
ShapeParam	double	Estimated shape parameter (k) of the distribution from mean and variance using root finding method.
ScaleParam	double	Estimated scale parameter (a) of the distribution from mean and variance using root finding method.
diffEntropy	double	Differential Entropy in nats.
pdf	double	Probability value which is the value for probability density function.
cdfLower	double	Lower-tail value of the cumulative distribution function: the probability of a value <= x.
cdfUpper	double	Upper-tail value of the cumulative distribution function: the probability of a value > x.
maxit	long	Actual number of iterations performed to get an estimate of the k value.
estimatedMean	double	Mean calculated using estimated values of k and a.
estimatedVariance	double	Variance calculated using estimated values of k and a.
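
Estimating the shape and scale parameters from the mean and variance can be sketched in Python as shown below. This is an illustration under stated assumptions, not AnzoGraph's implementation: the helper names are hypothetical, and plain bisection over a fixed bracket is swapped in as the root-finding method, whereas the function described above starts from the caller's initial guess for k.

```python
import math

def weibull_fit_from_moments(mean, variance, lo=0.1, hi=50.0, tol=1e-12):
    """Solve for shape k by bisection, then derive scale a from the mean."""
    target = variance / mean**2  # squared coefficient of variation

    def cv2(k):
        g1 = math.gamma(1 + 1 / k)
        g2 = math.gamma(1 + 2 / k)
        return g2 / g1**2 - 1

    # cv2(k) decreases as k grows, so the root stays inside [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    a = mean / math.gamma(1 + 1 / k)
    return k, a

def weibull_pdf(x, k, a):
    """Probability density at x; treated as 0 for x <= 0 in this sketch."""
    if x <= 0:
        return 0.0
    return (k / a) * (x / a) ** (k - 1) * math.exp(-((x / a) ** k))

def weibull_cdf(x, k, a):
    """Lower-tail probability P(X <= x)."""
    if x <= 0:
        return 0.0
    return 1 - math.exp(-((x / a) ** k))

# Illustrative moments only.
k_hat, a_hat = weibull_fit_from_moments(1.8, 1.5)
cdf_lower = weibull_cdf(2.0, k_hat, a_hat)
cdf_upper = 1 - cdf_lower
```

The fit uses the fact that the ratio of variance to squared mean depends only on k, so k can be solved for first and the scale a then follows from the mean.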