Sketch Library
                                            The sketch library provides extremely efficient streaming algorithms that approximate calculations, such as count distinct, quantiles, most frequent items, joins, and matrix computations, and return data sketches. This topic describes each of the sketch functions.
                                            
                                                The URI for the sketch functions is <http://cambridgesemantics.com/anzograph/statistics/sketch#>. For readability, the syntax for each function below includes the prefix sketch:, defined as PREFIX sketch: <http://cambridgesemantics.com/anzograph/statistics/sketch#>.
                                             
                                            
- Cardinality Metric (HLL): Uses Apache DataSketches HyperLogLog (HLL) to calculate cardinality estimates for a dataset.
- Frequent Items (FI): Collection of functions used to create frequency sketches and obtain information about frequent items.
- Quantile/Rank Sketch (KLL): Collection of functions that use the KLL sketch computation model to approximate minimum and maximum items in a dataset, the quantile and rank of items, the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF).
- Theta Sketch (THETA): Collection of functions that use the Theta Sketch framework to estimate cardinality and the results of union, intersection, and difference set operations, and that return a Theta Sketch.
Cardinality Metric (HLL)
                                             This aggregate calculates cardinality estimates for a dataset using Apache DataSketches HyperLogLog (HLL).
                                            Reference: Cardinality Prominence Metric
                                            Syntax
sketch:hll(data [, log_base_2_K ] [, hll_target_type ]) 
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | data | byte, short, int, long, float, double, string, URI | The dataset. | 
                                                    
                                                        | log_base_2_K | int | Optional argument that specifies the log base 2 of K, where K is the number of buckets or slots for the sketch. Must be between 4 and 21 (inclusive). Default value is 12. | 
                                                    
                                                        | hll_target_type | int | Optional argument that specifies the target type for the HLL sketch. Supported values are 4 (HLL_4), 6 (HLL_6), or 8 (HLL_8). Default value is 4. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The cardinality metric value. | 
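
The relationship between log_base_2_K and the number of buckets K can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the argument table, not AnzoGraph or DataSketches code; the helper name `hll_bucket_count` is hypothetical.

```python
def hll_bucket_count(log_base_2_k: int = 12) -> int:
    """Return K, the number of buckets implied by log_base_2_K.

    Hypothetical helper: it only mirrors the documented range (4 to 21)
    and default (12) of the log_base_2_K argument.
    """
    if not 4 <= log_base_2_k <= 21:
        raise ValueError("log_base_2_K must be between 4 and 21 (inclusive)")
    return 2 ** log_base_2_k


print(hll_bucket_count())    # default of 12 gives 4096 buckets
print(hll_bucket_count(4))   # smallest allowed value gives 16 buckets
print(hll_bucket_count(21))  # largest allowed value gives 2097152 buckets
```

Each increment of log_base_2_K doubles the number of buckets, so larger values trade memory for a tighter estimate.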
                                                
                                            
                                            Frequent Items (FI)
                                            The FI aggregate estimates the frequency of items in a dataset, the upper and lower bounds of those frequencies, the number of active items, and the total stream weight. FI returns a binary stream (Frequent Items Sketch) containing all of the computed values. Values can be retrieved from the sketch using the Frequent Items Sketch Retrieval Functions: get_estimates, get_active_items_total_weights, get_top_items, and get_top_strings. 
                                            
                                            FI Syntax
sketch:fi(values [, weight ])
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | values | short, int, long, float, double, string | The dataset. | 
                                                    
                                                        | weight | long | Optional argument that specifies the weight of each value. The default value is 1. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | http://anzograph.com/statistics#fi_sketch | Binary Frequent Items Sketch. | 
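
To make the role of the optional weight argument concrete, the following Python sketch computes the exact weighted frequencies and total stream weight that a Frequent Items Sketch approximates. It is a plain reference computation over a small made-up stream, not the sketch algorithm itself; the function name and the sample data are hypothetical.

```python
from collections import Counter


def exact_weighted_frequencies(pairs):
    """Sum the weights per item; pairs is an iterable of (item, weight)."""
    totals = Counter()
    for item, weight in pairs:
        totals[item] += weight
    return totals


# Hypothetical stream: each tuple is (value, weight).
# A weight of 1 mirrors the documented default.
stream = [("red", 1), ("blue", 1), ("red", 3), ("green", 1), ("red", 1)]

totals = exact_weighted_frequencies(stream)
total_stream_weight = sum(totals.values())

print(totals["red"])         # exact weighted frequency of "red"
print(total_stream_weight)   # sum of all weights in the stream
```

A sketch keeps only a bounded number of counters, so it returns estimates of these values rather than the exact totals computed here.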
                                                
                                            
                                            Frequent Items Sketch Retrieval Functions
                                            The following functions are available for retrieving values from a Frequent Items Sketch:
                                            
                                            fi::get_estimates
                                            Returns the estimates for the frequency and lower and upper bound of the given item in a sketch.
                                            Syntax
sketch:fi::get_estimates(fi_sketch, item) 
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | fi_sketch | http://anzograph.com/statistics#fi_sketch | Frequent Items Sketch. | 
                                                    
                                                        | item | Object | Item for which to get estimates. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | long | Frequency estimate for the item. | 
                                                    
                                                        | long | Lower bound estimate for the item. | 
                                                    
                                                        | long | Upper bound estimate for the item. | 
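
The idea of a lower and upper bound around a frequency can be illustrated with the classic Misra-Gries summary, written here in plain Python. This is a swapped-in textbook technique chosen for illustration; it is not claimed to be the algorithm behind sketch:fi, and the names `misra_gries` and `frequency_bounds` are hypothetical.

```python
from collections import Counter


def misra_gries(stream, capacity):
    """Summarize a stream with at most `capacity` counters (Misra-Gries)."""
    counters = {}
    offset = 0  # number of decrement rounds; caps the undercount of any item
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Summary is full: decrement every counter and drop the new item.
            offset += 1
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, offset


def frequency_bounds(counters, offset, item):
    """Return (lower_bound, upper_bound) for the true frequency of item."""
    lower = counters.get(item, 0)
    return lower, lower + offset


stream = ["a", "b", "a", "c", "a", "d", "a"]
counters, offset = misra_gries(stream, capacity=2)
true_counts = Counter(stream)

for item in sorted(true_counts):
    lower, upper = frequency_bounds(counters, offset, item)
    # The true count always lies between the two bounds.
    print(item, lower, true_counts[item], upper)
```

With only two counters, the summary still brackets every true frequency: the counter value never overcounts, and the number of decrement rounds limits how far it can undercount.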
                                                
                                            
                                            fi::get_active_items_total_weights
                                            Returns the number of active items and the estimated total stream weight from a sketch.
                                            Syntax
sketch:fi::get_active_items_total_weights(fi_sketch)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | fi_sketch | http://anzograph.com/statistics#fi_sketch | Frequent Items Sketch. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | long | The estimated number of active items. | 
                                                    
                                                        | long | The estimated total stream weight. | 
                                                
                                            
                                            fi::get_top_items
                                            Returns the most frequent items and their corresponding frequency.
                                            Syntax
sketch:fi::get_top_items(fi_sketch)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | fi_sketch | http://anzograph.com/statistics#fi_sketch | Frequent Items Sketch. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The item with the highest frequency. | 
                                                        | long | Frequency estimate of the first item. | 
                                                        | double | The item with the second highest frequency. | 
                                                        | long | Frequency estimate of the second item. | 
                                                        | double | The item with the nth highest frequency. | 
                                                        | long | Frequency estimate of the nth item. | 
                                                
                                            
                                            fi::get_top_strings
                                            Returns the most frequent strings and their corresponding frequencies.
                                            Syntax
sketch:fi::get_top_strings(fi_sketch)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | fi_sketch | http://anzograph.com/statistics#fi_sketch | Frequent Items Sketch. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | string | The string with the highest frequency. | 
                                                        | long | Frequency estimate of the first string. | 
                                                        | string | The string with the second highest frequency. | 
                                                        | long | Frequency estimate of the second string. | 
                                                        | string | The string with the nth highest frequency. | 
                                                        | long | Frequency estimate of the nth string. | 
                                                
                                            
                                            Quantile/Rank Sketch (KLL)
                                            The KLL aggregate uses the KLL Sketch computation model to calculate the approximate minimum and maximum items in a dataset, the quantile and rank of items, the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF). KLL returns a binary stream (KLL Sketch) containing all of the computed values. Values can be retrieved from the sketch using the KLL Sketch Retrieval Functions. 
                                            
                                                For more information about KLL sketches, see KLL Sketch.
                                             
                                            KLL Syntax
sketch:kll(values [, k ])
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | values | short, int, long, float, double, string | The dataset. | 
                                                    
                                                        | k | int | Optional argument that configures the size of the sketch and its estimation error. Can be any value between 8 and 65535 (inclusive). The default value is 200, which results in a normalized rank error of about 1.65%. Higher values give a smaller error, but the sketch is larger (and slower). | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | http://anzograph.com/statistics#kll_sketch | Binary KLL sketch. | 
                                                
                                            
                                            KLL Sketch Retrieval Functions
                                            The following functions are available for retrieving values from a KLL sketch:
                                            
                                            kll::get_min_value
                                            Returns the minimum value in a KLL sketch.
                                            Syntax
sketch:kll::get_min_value(kll_sketch)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The minimum value in the sketch. | 
                                                    
                                                        | string | If the input is a string, the minimum string is returned. | 
                                                
                                            
                                            kll::get_max_value
                                            Returns the maximum value in a KLL sketch.
                                            Syntax
sketch:kll::get_max_value(kll_sketch)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The maximum value in the sketch. | 
                                                    
                                                        | string | If the input is a string, the maximum string is returned. | 
                                                
                                            
                                            kll::get_n
                                            Returns the length of the input stream that a KLL sketch has processed, that is, the number of items added to the sketch.
                                            Syntax
sketch:kll::get_n(kll_sketch)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | long | The length of the input stream processed by the sketch. | 
                                                
                                            
                                            kll::get_num_retained
                                            Returns the number of retained items (samples) in a sketch.
                                            Syntax
sketch:kll::get_num_retained(kll_sketch)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | long | The number of retained items (samples) in the sketch. | 
                                                
                                            
                                            kll::get_rank
                                            Returns an approximation of the normalized (fractional) rank of the given item.
                                            Syntax
sketch:kll::get_rank(kll_sketch, v)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                    
                                                        | v | double | The item to retrieve the rank for. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The approximate rank of the item from 0 - 1 (inclusive). | 
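
For intuition, the exact normalized rank that kll::get_rank approximates can be computed directly on a small list. The Python sketch below assumes that rank means the fraction of values strictly less than the given item; some implementations instead count values less than or equal to the item, so treat that choice, and the helper name `exact_rank`, as assumptions.

```python
from bisect import bisect_left


def exact_rank(values, v):
    """Fraction of values strictly less than v (assumed rank definition)."""
    ordered = sorted(values)
    return bisect_left(ordered, v) / len(ordered)


values = [50, 10, 40, 20, 30]  # hypothetical input stream

print(exact_rank(values, 30))  # two of five values are below 30
print(exact_rank(values, 5))   # nothing is below the minimum
print(exact_rank(values, 60))  # everything is below a value above the maximum
```

A KLL sketch answers the same question from a bounded sample, so its result can differ from the exact rank by roughly the normalized rank error set by k.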
                                                
                                            
                                            kll::get_quantile
                                            Returns an approximation of the value found at the given normalized (fractional) rank.
                                            Syntax
sketch:kll::get_quantile(kll_sketch, fraction)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                    
                                                        | fraction | double | The fractional position in the hypothetical sorted stream. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | An approximation of the value of the item that would be preceded by the given fraction of a hypothetical sorted version of the sketch. | 
                                                    
                                                        | string | An approximation of the string when the input is a string. | 
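
The quantile lookup is the inverse of the rank lookup. The Python sketch below computes an exact quantile on a small list so that the meaning of the fraction argument is easy to see. The indexing rule (take the item at position fraction times n in sorted order, clamped to the last item) is one common convention and is an assumption here, as is the helper name `exact_quantile`.

```python
def exact_quantile(values, fraction):
    """Return the item preceded by roughly `fraction` of the sorted values."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0 (inclusive)")
    ordered = sorted(values)
    # Clamp so that fraction == 1.0 maps to the largest item.
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]


values = [50, 10, 40, 20, 30]  # hypothetical input stream

print(exact_quantile(values, 0.0))  # smallest item
print(exact_quantile(values, 0.5))  # median
print(exact_quantile(values, 1.0))  # largest item
```

Calling the function once per fraction gives the same answers as a multi-fraction query, only less efficiently.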
                                                
                                            
                                            kll::get_quantiles
                                            Provides a more efficient, multiple-query version of kll::get_quantile that enables you to specify a number of evenly spaced fractional ranks.
                                            Syntax
sketch:kll::get_quantiles(kll_sketch, f1, f2, ..., f10)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                    
                                                        | f1–f10 | double | Normalized or fractional ranks in the hypothetical sorted stream. The ranks must be in the interval 0.0 to 1.0 (inclusive). | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | An approximation of the values in the same order as the given fractional positions. | 
                                                
                                            
                                            kll::get_quantiles_str
                                            Provides the string version of kll::get_quantiles: returns approximate quantile strings for the given fractional ranks when the sketch was built from string input.
                                            Syntax
sketch:kll::get_quantiles_str(kll_sketch, f1, f2, ..., f10)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                    
                                                        | f1–f10 | double | Normalized or fractional ranks in the hypothetical sorted stream. The ranks must be in the interval 0.0 to 1.0 (inclusive). | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | string | An approximation of the strings. | 
                                                
                                            
                                            kll::get_pmf
                                            Provides an approximation to the Probability Mass Function (PMF) of the input stream.
                                            Syntax
sketch:kll::get_pmf(kll_sketch, v1, v2, ..., v10)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                    
                                                        | v1–v10 | Object | Input values between the minimum and maximum values of the input stream. Values must be unique and monotonically increasing. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | PMF values corresponding to the input. | 
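
As a rough illustration of what a PMF over split points means, the Python sketch below computes the exact fraction of a stream that falls into each interval defined by the input values. It assumes the Apache DataSketches convention, in which m split points define m + 1 intervals (below the first point, between consecutive points, and at or above the last point). Whether `sketch:kll::get_pmf` returns m or m + 1 values is not stated here, and `exact_pmf` is a hypothetical helper, not part of the library.

```python
from bisect import bisect_left


def exact_pmf(values, split_points):
    """Exact fraction of values in each interval defined by the split points."""
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be unique and monotonically increasing")
    ordered = sorted(values)
    n = len(ordered)
    # Number of values strictly below each split point.
    below = [bisect_left(ordered, s) for s in split_points]
    edges = [0] + below + [n]
    # Mass of each interval: [min, s1), [s1, s2), ..., [sm, max].
    return [(hi - lo) / n for lo, hi in zip(edges, edges[1:])]


stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pmf = exact_pmf(stream, [3, 7])
print(pmf)  # [0.2, 0.4, 0.4]
```

A KLL sketch produces approximations of these masses from a compact summary instead of the sorted stream.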
                                                
                                            
                                            kll::get_cdf
                                            Provides an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF of the input stream.
                                            Syntax
sketch:kll::get_cdf(kll_sketch, v1, v2, ..., v10)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | kll_sketch | http://anzograph.com/statistics#kll_sketch | KLL sketch. | 
                                                    
                                                        | v1–v10 | Object | Input values between the minimum and maximum values of the input stream. Values must be unique and monotonically increasing. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | CDF values corresponding to the input. | 
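
The relationship between the CDF and the PMF can be sketched in a few lines of Python: the CDF at each split point is the running total of the interval masses up to that point. This is an exact computation for illustration, under the same assumed convention of m + 1 intervals for m split points. `exact_cdf` is a hypothetical helper and does not reflect how AnzoGraph or the KLL sketch computes the result.

```python
from bisect import bisect_left
from itertools import accumulate


def exact_cdf(values, split_points):
    """Exact cumulative fraction of values below each split point, ending at 1.0."""
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be unique and monotonically increasing")
    ordered = sorted(values)
    n = len(ordered)
    edges = [0] + [bisect_left(ordered, s) for s in split_points] + [n]
    # Per-interval counts are the (unnormalized) PMF; their running sum is the CDF.
    bin_counts = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return [running / n for running in accumulate(bin_counts)]


stream = list(range(1, 11))
cdf = exact_cdf(stream, [3, 7])
print(cdf)  # [0.2, 0.6, 1.0]
```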
                                                
                                            
                                            Theta Sketch (THETA)
                                            The THETA aggregate uses the Theta Sketch framework to compute estimates of cardinality and of the union, intersection, and difference set operations, and returns a binary stream (Theta Sketch) containing the computed values. Values can be retrieved from the sketch using the following functions: cardinality, union, intersection, and difference.
                                            
                                                Theta Sketches are a generalization of the well-known Kth Minimum Value (KMV) sketches.
                                                
                                             
                                            THETA Syntax
sketch:theta(values)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | values | short, int, long, float, double, string | The dataset to operate on. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | http://anzograph.com/statistics#theta_sketch | Binary Theta Sketch | 
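
Because Theta Sketches generalize KMV sketches, a toy KMV estimator may help convey the idea behind the aggregate. The Python sketch below is a conceptual illustration only and is not the Theta Sketch binary format or AnzoGraph's implementation. The names `unit_hash`, `kmv_sketch`, and `kmv_estimate` are hypothetical, and the SHA-256-based hash is an arbitrary choice. Each value is hashed to a number in [0, 1), only the k smallest distinct hashes are kept, and the distinct count is estimated as (k - 1) divided by the k-th smallest hash.

```python
import hashlib


def unit_hash(value):
    """Deterministically hash a value to a float in [0, 1)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    # Use the top 53 bits so the result is an exact double strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def kmv_sketch(values, k):
    """Return the k smallest distinct hashes, sorted ascending.

    This toy version hashes the whole input before truncating; a streaming
    implementation would hold only k hashes at any time.
    """
    return sorted({unit_hash(v) for v in values})[:k]


def kmv_estimate(sketch, k):
    """Estimate the number of distinct values from a KMV sketch."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k distinct hashes: the count is exact
    return (k - 1) / sketch[-1]


# 30,000 items containing 10,000 distinct values.
stream = [i % 10000 for i in range(30000)]
estimate = kmv_estimate(kmv_sketch(stream, 1024), 1024)
print(round(estimate))  # close to 10000; the error shrinks as k grows
```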
                                                
                                            
                                            Theta Sketch Retrieval Functions
                                            The following functions are available for retrieving values from a Theta Sketch:
                                            
                                            theta::cardinality
                                            Retrieves the estimated count of distinct values in a Theta Sketch.
                                            Syntax
sketch:theta::cardinality(theta_sketch)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | theta_sketch | http://anzograph.com/statistics#theta_sketch | Binary Theta Sketch | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The estimated count of items in the sketch. | 
                                                
                                            
                                            theta::union
                                            Retrieves the estimate of the number of items that are in the union of two or more Theta Sketches.
                                            Syntax
sketch:theta::union(theta_sketch1, theta_sketch2 [, theta_sketchN ])
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | theta_sketch1–N | http://anzograph.com/statistics#theta_sketch | Two or more Theta Sketches. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The estimated number of items in the union. | 
                                                
                                            
                                            theta::intersection
                                            Retrieves the estimate of the number of items that are in the intersection between two or more Theta Sketches.
                                            Syntax
sketch:theta::intersection(theta_sketch1, theta_sketch2 [, theta_sketchN ])
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | theta_sketch1–N | http://anzograph.com/statistics#theta_sketch | Two or more Theta Sketches. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The estimated number of items in the intersection of the sketches. | 
                                                
                                            
                                            theta::difference
                                            Retrieves the estimate of the number of items that are in the difference between two Theta Sketches, i.e., the number of items that are in the first sketch but not in the second sketch.
                                            Syntax
sketch:theta::difference(a, b)
                                            
                                                
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | a | http://anzograph.com/statistics#theta_sketch | The first Theta Sketch. | 
                                                    
                                                        | b | http://anzograph.com/statistics#theta_sketch | The second Theta Sketch, whose items are excluded from sketch a. | 
                                                
                                            
                                            Returns
                                            
                                                
                                                
                                                
                                                    
                                                
                                                
                                                    
                                                        | double | The estimated number of items in the difference between the sketches. |
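
The union, intersection, and difference estimates described in this section can be illustrated with a simplified theta-style sketch in Python. This is a conceptual model under stated assumptions, not the Apache DataSketches or AnzoGraph implementation: every name below is hypothetical, the hash is an arbitrary SHA-256-based choice, and the result of a union is not trimmed back to k entries as a production sketch would do. Each sketch is a pair of a threshold theta and the set of hashes below it. A set operation uses the smaller of the two thresholds, applies the ordinary set operation to the retained hashes below that threshold, and divides the resulting count by the threshold.

```python
import hashlib


def unit_hash(value):
    """Deterministically hash a value to a float in [0, 1)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def theta_sketch(values, k):
    """Return (theta, entries): the k smallest hashes and the threshold above them."""
    hashes = sorted({unit_hash(v) for v in values})
    if len(hashes) <= k:
        return 1.0, set(hashes)        # exact mode: nothing was discarded
    return hashes[k], set(hashes[:k])  # theta is the (k+1)-th smallest hash


def theta_estimate(sketch):
    theta, entries = sketch
    return len(entries) / theta


def _combine(a, b, set_op):
    theta = min(a[0], b[0])
    # Below the shared threshold both sketches are complete, so set logic is exact.
    return theta, {h for h in set_op(a[1], b[1]) if h < theta}


def theta_union(a, b):
    return _combine(a, b, set.union)


def theta_intersection(a, b):
    return _combine(a, b, set.intersection)


def theta_difference(a, b):
    """Items in sketch a that are not in sketch b."""
    return _combine(a, b, set.difference)


a = theta_sketch([1, 2, 3], k=1024)
b = theta_sketch([2, 3, 4], k=1024)
print(theta_estimate(theta_union(a, b)))         # 4.0
print(theta_estimate(theta_intersection(a, b)))  # 2.0
print(theta_estimate(theta_difference(a, b)))    # 1.0
```

With inputs smaller than k, as here, the sketches stay in exact mode and the results are exact counts. With larger inputs the same functions return estimates whose error depends on k and on how small the result set is relative to the union.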