java.lang.Object
io.fluxzero.common.SearchUtils
Utility class for search-related functionality such as term normalization, path transformation, glob pattern
matching, and primitive value extraction.
This class is used extensively during document indexing, querying, and filtering. It also includes utilities for path conversion between dot and slash notation, JSON normalization, and field escaping/unescaping.
Field Summary
Fields
Modifier and Type | Field | Description
static final DateTimeFormatter | ISO_FULL | Date-time formatter used to serialize or deserialize full ISO-8601 instant values with millisecond precision.
static final String | letterOrNumber | A character class used in regex patterns that includes letters and digits.
static final Pattern | termPattern | Pattern for extracting search terms and quoted phrases from a string.
Constructor Summary
Constructors
SearchUtils()
Method Summary
Modifier and Type | Method | Description
static Object | asIntegerOrString(String fieldName) | Attempts to convert a numeric string to an Integer, falling back to the original string otherwise.
static Object | asPrimitive(Object value) | Converts any non-primitive value to its string form.
static String | escapeFieldName(String fieldName) | Escapes slashes and quotes in field names for safe indexing.
 | getGlobMatcher(String pattern) | Converts a standard POSIX Shell globbing pattern into a regular expression pattern.
static boolean | isInteger | Checks whether the input string is a parseable integer.
static String | normalize | Normalizes a string to lowercase and removes diacritics and leading/trailing whitespace.
static byte[] | normalizeJson(byte[] data) | Normalizes all string values in a JSON byte array by stripping accents and lowercasing.
static String | normalizePath(String queryPath) | Replaces unescaped dots in field paths with slashes.
 | splitInTerms(String query) | Extracts quoted phrases and standalone terms from a free-form query string.
static String | unescapeFieldName(String fieldName) | Unescapes slashes and quotes in field names.
Field Details
letterOrNumber
A character class used in regex patterns that includes letters and digits.
termPattern
Pattern for extracting search terms and quoted phrases from a string.
ISO_FULL
Date-time formatter used to serialize or deserialize full ISO-8601 instant values with millisecond precision.
Constructor Details
SearchUtils
public SearchUtils()
Method Details
normalize
Normalizes a string by converting it to lowercase, removing diacritics, and trimming leading and trailing whitespace.
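The description above can be illustrated with standard JDK classes. The following is a minimal sketch, not the library's implementation: the class name NormalizeSketch and the use of java.text.Normalizer with NFD decomposition are assumptions made for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

class NormalizeSketch {
    // Hypothetical helper (not SearchUtils itself): trim, strip diacritics, lowercase.
    static String normalize(String text) {
        // NFD splits accented characters into a base character plus combining marks
        String decomposed = Normalizer.normalize(text.trim(), Normalizer.Form.NFD);
        // \p{M} matches the combining marks, which are then dropped
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "  Café Zürich " written with Unicode escapes; prints "cafe zurich"
        System.out.println(normalize("  Caf\u00e9 Z\u00fcrich "));
    }
}
```

Lowercasing with Locale.ROOT keeps the result independent of the JVM's default locale, which matters when indexed terms and query terms are normalized on different machines.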
normalizeJson
public static byte[] normalizeJson(byte[] data)
Normalizes all string values in a JSON byte array by stripping accents and lowercasing. This is typically used during document indexing to enable consistent full-text search.
Parameters:
data - a raw JSON-encoded byte array
Returns:
a normalized version of the input data
getGlobMatcher
Converts a standard POSIX Shell globbing pattern into a regular expression pattern. The result can be used with the standard java.util.regex API to recognize strings which match the glob pattern.
From ...
See also, the POSIX Shell language: ...
Parameters:
pattern - A glob pattern.
Returns:
A regex pattern to recognize the given glob pattern.
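To make the glob-to-regex idea concrete, here is a simplified sketch that handles only the `*` and `?` wildcards and treats every other character as a literal. It is not the conversion used by getGlobMatcher: the class GlobSketch, the method globToRegex, and its Pattern return type are assumptions, and bracket expressions and backslash escapes from the POSIX rules are deliberately left out.

```java
import java.util.regex.Pattern;

class GlobSketch {
    // Hypothetical, reduced glob conversion: '*' = any run of characters, '?' = one character.
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*':
                    regex.append(".*");
                    break;
                case '?':
                    regex.append('.');
                    break;
                default:
                    // Quote everything else so regex metacharacters like '.' stay literal
                    regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern p = globToRegex("*.json");
        System.out.println(p.matcher("order.json").matches()); // true
        System.out.println(p.matcher("order.xml").matches());  // false
    }
}
```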
isInteger
Checks whether the input string is a parseable integer.
asIntegerOrString
Attempts to convert a numeric string to an Integer, falling back to the original string otherwise.
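The behavior described for isInteger and asIntegerOrString might look roughly like the sketch below. It assumes that "parseable" means accepted by Integer.parseInt; the class IntegerOrStringSketch is hypothetical and the actual methods may treat edge cases (signs, overflow, null) differently.

```java
class IntegerOrStringSketch {
    // Assumption: a string is an integer if Integer.parseInt accepts it.
    static boolean isInteger(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false; // non-numeric text or a value outside the int range
        }
    }

    // Returns an Integer when the input parses, otherwise the unchanged string.
    static Object asIntegerOrString(String s) {
        return isInteger(s) ? (Object) Integer.valueOf(s) : s;
    }

    public static void main(String[] args) {
        System.out.println(asIntegerOrString("42").getClass().getSimpleName()); // Integer
        System.out.println(asIntegerOrString("4x").getClass().getSimpleName()); // String
    }
}
```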
splitInTerms
Extracts quoted phrases and standalone terms from a free-form query string.
Parameters:
query - the raw search string
Returns:
a list of normalized search terms
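As an illustration of splitting a query into quoted phrases and standalone terms, the sketch below uses a small regular expression. The pattern is an assumption and is not the library's termPattern; normalization is reduced to trimming and lowercasing, and the class SplitTermsSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitTermsSketch {
    // Assumed pattern: a double-quoted phrase (group 1) or a run of non-whitespace (group 2)
    private static final Pattern TERMS = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> splitInTerms(String query) {
        List<String> terms = new ArrayList<>();
        Matcher m = TERMS.matcher(query);
        while (m.find()) {
            String term = m.group(1) != null ? m.group(1) : m.group(2);
            term = term.trim().toLowerCase(Locale.ROOT); // simplified normalization
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [red, blue whale, fish]
        System.out.println(splitInTerms("Red \"Blue Whale\" fish"));
    }
}
```

Keeping the quoted phrase as a single term lets a caller match it as an exact phrase while treating the remaining words independently.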
asPrimitive
Converts any non-primitive value to its string form.
normalizePath
Replaces unescaped dots in field paths with slashes. This standardizes paths used in documents (e.g. a.b.c → a/b/c).
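A dot-to-slash conversion that skips escaped dots can be sketched with a negative lookbehind. The escape character is not stated here, so the sketch assumes a backslash; the class PathSketch is hypothetical and the actual escaping rules may differ.

```java
class PathSketch {
    // Assumption: a dot preceded by a backslash counts as escaped and is left alone.
    static String normalizePath(String queryPath) {
        // (?<!\\) is a negative lookbehind for a single backslash
        return queryPath.replaceAll("(?<!\\\\)\\.", "/");
    }

    public static void main(String[] args) {
        System.out.println(normalizePath("a.b.c"));    // a/b/c
        System.out.println(normalizePath("a\\.b.c"));  // a\.b/c
    }
}
```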
escapeFieldName
Escapes slashes and quotes in field names for safe indexing.
unescapeFieldName
Unescapes slashes and quotes in field names.
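The pairing of escapeFieldName and unescapeFieldName suggests a round trip. The sketch below shows one possible scheme, assuming a backslash prefix for slashes and double quotes; the actual escape sequences used by the library are not given here, and the class FieldNameEscapeSketch is hypothetical.

```java
class FieldNameEscapeSketch {
    // Assumption: '/' becomes '\/' and '"' becomes '\"'.
    static String escape(String fieldName) {
        return fieldName.replace("/", "\\/").replace("\"", "\\\"");
    }

    // Reverses the assumed scheme above.
    static String unescape(String fieldName) {
        return fieldName.replace("\\/", "/").replace("\\\"", "\"");
    }

    public static void main(String[] args) {
        String raw = "price/unit";
        String escaped = escape(raw);
        System.out.println(escaped);           // price\/unit
        System.out.println(unescape(escaped)); // price/unit
    }
}
```

Escaping matters because the slash doubles as the path separator after path normalization, so a literal slash inside a field name would otherwise be read as a nesting level.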