Highlighting Elasticsearch Results

By including the highlight property in ElasticSource GDI queries, you can configure the response to include highlights for search results. For general information about highlighting Elasticsearch responses, see Highlighting in the Elasticsearch documentation. Highlight property usage is described below.

Highlight Syntax

es:highlight [
   es:boundaryChars "string" ;
   es:boundaryMaxScan int ;
   es:boundaryScannerLocale "string" ;
   es:boundaryScannerType "string" ;
   es:field "string" ;
   es:forceSource boolean ;
   es:fragmentSize int ;
   es:fragmenter "string" ;
   es:highlightFilter boolean ;
   es:highlightQuery "string" | [ rdf_list ] ;
   es:highlighterType "string" ;
   es:noMatchSize int ;
   es:numberOfFragments int ;
   es:order "string" ;
   es:phraseLimit int ;
   es:postTags "string" ;
   es:preTags "string" ;
   es:requireFieldMatch boolean ;
] ;
Option | Data Type | Description
boundaryChars | string | This property can be used to define the boundary characters to look for. Defaults to .,!? \t\n.
boundaryMaxScan | int | This property can be used to place a limit on the number of characters to scan when looking for boundary characters. Defaults to 20.
boundaryScannerLocale | string | This property defines the language tag (such as "en-US" or "fr-FR") to apply when searching for sentence and word boundaries.
boundaryScannerType | string | If highlighterType is unified or fvh, this property can be used to specify how to break the highlighted fragments. This property is ignored when the highlighter type is plain. The list below describes the valid values:
  • chars: Valid only with the fast vector highlighter (es:highlighterType "fvh"). Specifies that the highlighting boundaries are the characters specified by boundaryChars. The boundaryMaxScan value controls how far to scan for boundary characters. This is the default value for fvh.
  • sentence: This is the default value for the unified highlighter. It configures highlighted fragments to break at the next sentence boundary. You can specify the locale to use with boundaryScannerLocale. When used with the unified highlighter, the sentence scanner splits sentences bigger than fragmentSize at the first word boundary next to fragmentSize. You can set fragmentSize to 0 to avoid splitting sentences.
  • word: Configures highlighted fragments to break at the next word boundary. You can specify the locale to use with boundaryScannerLocale.
field | string or variable | This property specifies the field to retrieve highlights for. It can include a ?variable (which the GDI maps to the full path of the field in the Elasticsearch document), a field name, or a field name pattern. For example:
es:highlight [
  es:field ?actor ;
  es:field "film.actor" ;
  es:field "film.*" ;
  es:field "*" ;
]
forceSource | boolean | This property controls whether to highlight based on the source even if the field is stored separately. Defaults to false.
fragmentSize | int | This property specifies the number of characters to include in highlighted fragments. Defaults to 100.
fragmenter | string | If highlighterType is plain, this property can be used to specify how to break up text in highlight snippets. The list below describes the valid values:
  • simple: Breaks text into fragments that are the same size (as specified by fragmentSize).
  • span: The default value. Breaks text into fragments that are the same size but tries to avoid breaking text between highlighted terms.
highlightFilter | boolean | This property controls whether to highlight filter results.
highlightQuery | string or object | This property specifies the highlight query. The value can be a string or a query object that maps to the Elasticsearch query DSL.
highlighterType | string | This property defines the type of highlighter to use: "plain", "unified", or "fvh".
noMatchSize | int | This property specifies the number of characters to return from the beginning of the field if there are no matching fragments to highlight. Defaults to 0 (nothing is returned).
numberOfFragments | int | This property sets the maximum number of fragments to generate. Defaults to 5. If this property is set to 0, no fragments are returned; instead, the entire field contents are highlighted and returned, and fragmentSize is ignored. This can be useful if you want to highlight short text (such as a title or address) for which fragmentation is not required.
order | string | This property can be included to sort highlighted fragments by score. When es:order is "score", the most relevant fragments are output first. Defaults to "none", in which case fragments are output in the order they appear in the field.
phraseLimit | int | If highlighterType is fvh, this property can be used to limit the number of matching phrases to consider. Limiting the number of phrases prevents the fvh highlighter from analyzing too many phrases and consuming too much memory. Defaults to 256.
postTags | string | This property is used in conjunction with preTags to define the HTML tags to use for the highlighted elements. This property defines the closing tag to use after the highlighted text. Defaults to </em>.
preTags | string | This property is used in conjunction with postTags to define the HTML tags to use for the highlighted elements. This property defines the opening tag to use before the highlighted text. Defaults to <em>.
requireFieldMatch | boolean | This property controls whether to highlight only the fields that contain a query match. Defaults to true. If false, all fields are highlighted.
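
As a rough illustration of how several of these options can be combined, the fragment below sketches a highlight block that uses the unified highlighter with sentence boundaries and sorts fragments by score. The field name "film.subject" and all option values are assumptions chosen for illustration, not settings taken from a tested query.

```sparql
es:highlight [
  es:field "film.subject" ;             # hypothetical field name
  es:highlighterType "unified" ;
  es:boundaryScannerType "sentence" ;   # the unified default, stated explicitly
  es:boundaryScannerLocale "en-US" ;    # locale used to find sentence boundaries
  es:fragmentSize 150 ;                 # illustrative value (default is 100)
  es:numberOfFragments 3 ;              # illustrative value (default is 5)
  es:order "score" ;                    # most relevant fragments first
] ;
```

Because boundaryScannerType is ignored by the plain highlighter, a block like this would only be expected to change fragment boundaries when highlighterType is "unified" or "fvh".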

Highlight Examples

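The following example configures highlighting for the actor field. It requests up to 10 fragments of up to 200 characters each and wraps each highlighted match in <mark> tags instead of the default <em> tags.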

PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX es: <http://elastic.co/search/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT *
WHERE {
  SERVICE TOPDOWN <http://cambridgesemantics.com/services/DataToolkit>
  {
    ?data a es:ElasticSource ;
      es:url "http://localhost:9200/" ;
      es:index "films" ;
      es:html false ;
      es:query "Clint" ;
      es:field ?actor, ?director ;
      es:highlight [
        es:field ?actor ;
        es:highlighterType "plain" ;
        es:fragmentSize 200 ;
        es:numberOfFragments 10 ;
        es:preTags "<mark hit='true'>" ;
        es:postTags "</mark>" ;
      ] ;
      s:selector "film" ;
      ?actor (xsd:string) ;
      ?awards (xsd:string) ;
      ?director (xsd:string) ;
      ?image (xsd:string) ;
      ?length (xsd:long) ;
      ?popularity (xsd:long) ;
      ?subject (xsd:string) ;
      ?title (xsd:string) ;
      ?year (xsd:long) ;
      ?score () ;
      ?id () ;
      ?highlights [
        ?field () ;
        ?fragment () ;
      ] .
    FILTER(?year = 1990 || ?length > 103)
    FILTER(REGEX(?title, "Manhattan", "q") || REGEX(?subject, "Comedy", "q") || REGEX(?subject, "Drama", "q"))
  }
}
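
To highlight a short field in full rather than in fragments, the highlight block in the example above could be adapted along the following lines. This is a sketch only: it assumes the ?title variable is bound as in the query above, and the tag values are illustrative.

```sparql
es:highlight [
  es:field ?title ;
  es:numberOfFragments 0 ;     # highlight and return the entire field; fragmentSize is ignored
  es:preTags "<strong>" ;      # illustrative opening tag (default is <em>)
  es:postTags "</strong>" ;    # illustrative closing tag (default is </em>)
] ;
```

Setting numberOfFragments to 0 suits short values such as titles, where splitting the text into fragments adds nothing.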