Class CharacterValidationConstants

java.lang.Object
de.cuioss.http.security.validation.CharacterValidationConstants

public final class CharacterValidationConstants extends Object
RFC-compliant character set definitions for HTTP component validation.

This utility class provides pre-computed BitSet instances containing allowed characters for different HTTP components according to RFC 3986 (URI) and RFC 7230 (HTTP) specifications. All character sets are optimized for high-performance validation with O(1) character lookups.

Design Principles

  • RFC Compliance - Strict adherence to HTTP and URI specifications
  • Performance Optimized - Pre-computed BitSets for O(1) character validation
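  • Thread Safety - Populated once during class initialization and not modified afterwards, so concurrent reads are safe; callers must not mutate the shared instances, because java.util.BitSet itself is mutable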
  • Memory Efficient - Shared instances reduce memory overhead
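
As a rough illustration of the pre-computation principle, the sketch below builds a set equivalent to the RFC 3986 unreserved characters in a static initializer. The class name UnreservedSketch and its members are hypothetical stand-ins, not the library's actual source.

```java
import java.util.BitSet;

// Hypothetical stand-in for illustration; not the library's actual source.
final class UnreservedSketch {

    // Built once during class initialization and shared by every caller.
    static final BitSet UNRESERVED = new BitSet(128);

    static {
        UNRESERVED.set('A', 'Z' + 1); // range form: from inclusive, to exclusive
        UNRESERVED.set('a', 'z' + 1);
        UNRESERVED.set('0', '9' + 1);
        for (char c : "-._~".toCharArray()) {
            UNRESERVED.set(c);
        }
    }

    private UnreservedSketch() {
    }

    // Constant-time membership test: a single bit lookup per character.
    static boolean isUnreserved(char c) {
        return UNRESERVED.get(c);
    }
}
```

Because the set is filled before any caller can see it, each later check is a single BitSet.get() call with no per-request setup.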

Character Set Categories

  • RFC3986_UNRESERVED - Basic unreserved characters from RFC 3986
  • RFC3986_PATH_CHARS - Characters allowed in URL paths
  • RFC3986_QUERY_CHARS - Characters allowed in URL query parameters
  • RFC7230_HEADER_CHARS - Characters allowed in HTTP headers
  • HTTP_BODY_CHARS - Characters allowed in HTTP request/response bodies

Usage Examples

 // Get character set for URL path validation
 BitSet pathChars = CharacterValidationConstants.getCharacterSet(ValidationType.URL_PATH);

 // Check if character is allowed in URL paths
 char ch = '/';
 boolean isAllowed = pathChars.get(ch); // Returns true

 // Validate string characters
 String input = "/api/users";
 for (int i = 0; i < input.length(); i++) {
     char c = input.charAt(i);
     if (!pathChars.get(c)) {
         throw new IllegalArgumentException("Invalid character: " + c);
     }
 }
 

Performance Characteristics

  • O(1) character lookup time using BitSet.get()
  • Minimal memory footprint - shared across all validators
  • No per-request computation - all sets are built once during class initialization
  • Concurrent read access without synchronization, provided no caller modifies the shared sets
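
Since java.util.BitSet is mutable, a caller that needs a slightly different character set would probably want to work on a copy rather than on a shared instance. The following is a minimal sketch of that idea; DerivedSetSketch, SHARED, and withExtra are hypothetical names used only for illustration.

```java
import java.util.BitSet;

// Hypothetical helper for illustration; not part of the documented API.
final class DerivedSetSketch {

    // Stand-in for one of the shared constants (lower-case letters only).
    static final BitSet SHARED = new BitSet(128);

    static {
        SHARED.set('a', 'z' + 1);
    }

    private DerivedSetSketch() {
    }

    // Returns a private copy extended with extra characters; the shared set is untouched.
    static BitSet withExtra(BitSet base, String extra) {
        BitSet copy = (BitSet) base.clone();
        for (int i = 0; i < extra.length(); i++) {
            copy.set(extra.charAt(i));
        }
        return copy;
    }
}
```

Cloning keeps the shared set read-only in practice, which is what makes unsynchronized concurrent reads safe.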

RFC References

  • RFC 3986 - Uniform Resource Identifier (URI) character definitions
  • RFC 7230 - HTTP/1.1 Message Syntax and Routing header field definitions

Security Note: These character sets define allowed characters only. Additional security validation (pattern matching, length limits, etc.) should be applied by higher-level validation stages.
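
To illustrate why character checks alone are not sufficient, the sketch below layers a length limit and a simple pattern check around a character-set lookup. Everything here is an assumption made for the example: the class LayeredPathCheckSketch, the 2048-character limit, and the ".." segment rule are not taken from the library, and PATH_CHARS is a local stand-in rebuilt from the documented list for RFC3986_PATH_CHARS.

```java
import java.util.BitSet;

// Hypothetical layered check for illustration; limits and rules are assumptions.
final class LayeredPathCheckSketch {

    static final int MAX_LENGTH = 2048; // assumed limit, not taken from the library

    // Stand-in for RFC3986_PATH_CHARS, rebuilt from the documented character list.
    static final BitSet PATH_CHARS = new BitSet(128);

    static {
        PATH_CHARS.set('A', 'Z' + 1);
        PATH_CHARS.set('a', 'z' + 1);
        PATH_CHARS.set('0', '9' + 1);
        for (char c : "-._~/@:!$&'()*+,;=".toCharArray()) {
            PATH_CHARS.set(c);
        }
    }

    private LayeredPathCheckSketch() {
    }

    static boolean isAcceptablePath(String input) {
        // Stage 1: length limit.
        if (input == null || input.isEmpty() || input.length() > MAX_LENGTH) {
            return false;
        }
        // Stage 2: character-set membership.
        for (int i = 0; i < input.length(); i++) {
            if (!PATH_CHARS.get(input.charAt(i))) {
                return false;
            }
        }
        // Stage 3: pattern check. Every character of ".." is allowed,
        // so only this stage can reject a parent-directory segment.
        for (String segment : input.split("/", -1)) {
            if (segment.equals("..")) {
                return false;
            }
        }
        return true;
    }
}
```

A path such as "/api/../secret" consists solely of allowed characters and is rejected only by the third stage, which is the kind of gap the security note refers to.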

Implements: Task V5 from HTTP verification specification

Since:
1.0
  • Field Details

    • RFC3986_UNRESERVED

      public static final BitSet RFC3986_UNRESERVED
      RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".

      These are the basic safe characters allowed in URIs without percent-encoding.

    • RFC3986_PATH_CHARS

      public static final BitSet RFC3986_PATH_CHARS
      RFC 3986 path characters: the unreserved set plus path-specific characters.

      Includes all unreserved characters plus: / @ : ! $ & ' ( ) * + , ; =

    • RFC3986_QUERY_CHARS

      public static final BitSet RFC3986_QUERY_CHARS
      RFC 3986 query characters: the unreserved set plus query-specific characters.

      Includes all unreserved characters plus: ? & = ! $ ' ( ) * + , ;

    • RFC7230_HEADER_CHARS

      public static final BitSet RFC7230_HEADER_CHARS
      RFC 7230 header field characters (printable ASCII plus horizontal tab).

      Includes space through tilde (32-126) plus tab character.

    • HTTP_BODY_CHARS

      public static final BitSet HTTP_BODY_CHARS
      HTTP body content characters (permissive for JSON, XML, text, etc.).

      Includes printable ASCII (32-126), tab, LF, CR, and extended ASCII (128-255).
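
The ranges described for the header and body sets can be sketched as follows. This is a reconstruction from the field descriptions above under stated assumptions, not the library's source: HeaderBodySketch, HEADER, BODY, and firstDisallowed are hypothetical names.

```java
import java.util.BitSet;

// Hypothetical reconstruction from the field descriptions; not the library's source.
final class HeaderBodySketch {

    static final BitSet HEADER = new BitSet(256);
    static final BitSet BODY = new BitSet(256);

    static {
        HEADER.set(32, 127);  // space (32) through tilde (126); upper bound is exclusive
        HEADER.set('\t');

        BODY.set(32, 127);
        BODY.set('\t');
        BODY.set('\n');
        BODY.set('\r');
        BODY.set(128, 256);   // extended range 128-255
    }

    private HeaderBodySketch() {
    }

    // Index of the first character outside the allowed set, or -1 if all are allowed.
    static int firstDisallowed(String input, BitSet allowed) {
        for (int i = 0; i < input.length(); i++) {
            if (!allowed.get(input.charAt(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

Under these definitions CR and LF are accepted in bodies but rejected in headers, and any character above 255 is rejected by both sets, because BitSet.get() returns false for an index beyond the highest bit that was set.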

  • Method Details