- java.lang.Object
-
- net.disy.oss.weburl.WebUrl
-
public final class WebUrl extends Object
A uniform resource locator (URL) with a scheme of either http or https. Use this class to compose and decompose Internet addresses. For example, this code will compose and print a URL for Google search:

    WebUrl url = new WebUrl.Builder()
        .scheme("https")
        .host("www.google.com")
        .addPathSegment("search")
        .addQueryParameter("q", "polar bears")
        .build();
    System.out.println(url);

which prints:

    https://www.google.com/search?q=polar%20bears

As another example, this code prints the human-readable query parameters of a Twitter search:

    WebUrl url = WebUrl.get("https://twitter.com/search?q=cute%20%23puppies&f=images");
    for (int i = 0, size = url.querySize(); i < size; i++) {
      System.out.println(url.queryParameterName(i) + ": " + url.queryParameterValue(i));
    }

which prints:

    q: cute #puppies
    f: images

In addition to composing URLs from their component parts and decomposing URLs into their component parts, this class implements relative URL resolution: what address you'd reach by clicking a relative link on a specified page. For example:

    WebUrl base = WebUrl.get("https://www.youtube.com/user/WatchTheDaily/videos");
    WebUrl link = base.resolve("../../watch?v=cbP2N1BQdYc");
    System.out.println(link);

which prints:

    https://www.youtube.com/watch?v=cbP2N1BQdYc

What's in a URL?
A URL has several components.
Scheme
Sometimes referred to as protocol, a URL's scheme describes what mechanism should be used to retrieve the resource. Although URLs have many schemes (mailto, file, ftp), this class only supports http and https. Use java.net.URI for URLs with arbitrary schemes.
Username and Password
Username and password are either present, or the empty string "" if absent. This class offers no mechanism to differentiate empty from absent. Neither of these components is popular in practice. Typically HTTP applications use other mechanisms for user identification and authentication.
Host
The host identifies the webserver that serves the URL's resource. It is either a hostname like square.com or localhost, an IPv4 address like 192.168.0.1, or an IPv6 address like ::1.
Usually a webserver is reachable with multiple identifiers: its IP addresses, registered domain names, and even localhost when connecting from the server itself. Each of a webserver's names is a distinct URL and they are not interchangeable. For example, even if http://square.github.io/dagger and http://google.github.io/dagger are served by the same IP address, the two URLs identify different resources.
Port
The port used to connect to the webserver. By default this is 80 for HTTP and 443 for HTTPS. This class never returns -1 for the port: if no port is explicitly specified in the URL then the scheme's default is used.
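The default-port rule above can be sketched with the JDK alone. This is a minimal illustration, not this class's implementation: the class name EffectivePortSketch and the helper effectivePort are hypothetical, and java.net.URI stands in for the parsed URL because it reports -1 when no port is written.

```java
import java.net.URI;

class EffectivePortSketch {
    // Mirrors the documented contract: 80 for "http", 443 for "https", -1 otherwise.
    static int defaultPort(String scheme) {
        if ("http".equals(scheme)) return 80;
        if ("https".equals(scheme)) return 443;
        return -1;
    }

    // java.net.URI returns -1 for an absent port, so substitute the scheme's default.
    static int effectivePort(URI uri) {
        int port = uri.getPort();
        return port != -1 ? port : defaultPort(uri.getScheme());
    }

    public static void main(String[] args) {
        System.out.println(effectivePort(URI.create("http://host/")));
        System.out.println(effectivePort(URI.create("https://square.com:8443/")));
        System.out.println(effectivePort(URI.create("https://square.com/")));
    }
}
```

With this fallback in place, callers never have to special-case a missing port.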
Path
The path identifies a specific resource on the host. Paths have a hierarchical structure like "/square/okhttp/issues/1486" and decompose into a list of segments like ["square", "okhttp", "issues", "1486"].
This class offers methods to compose and decompose paths by segment. It composes each path from a list of segments by alternating between "/" and the encoded segment. For example the segments ["a", "b"] build "/a/b" and the segments ["a", "b", ""] build "/a/b/".
If a path's last segment is the empty string then the path ends with "/". This class always builds non-empty paths: if the path is omitted it defaults to "/". The default path's segment list is a single empty string: [""].
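The alternation between "/" and segments described above can be sketched as follows. The class PathSegmentsSketch and its two helpers are hypothetical names for illustration; they operate on already-encoded segments and do no percent encoding or decoding.

```java
import java.util.Arrays;
import java.util.List;

class PathSegmentsSketch {
    // Emits "/" before every segment, so a trailing empty segment yields a trailing "/".
    static String buildPath(List<String> encodedSegments) {
        StringBuilder path = new StringBuilder();
        for (String segment : encodedSegments) {
            path.append('/').append(segment);
        }
        return path.toString();
    }

    // Drops the leading "/" and keeps trailing empty segments (limit -1).
    static List<String> splitPath(String encodedPath) {
        return Arrays.asList(encodedPath.substring(1).split("/", -1));
    }

    public static void main(String[] args) {
        System.out.println(buildPath(Arrays.asList("a", "b")));
        System.out.println(buildPath(Arrays.asList("a", "b", "")));
        System.out.println(splitPath("/").size());
    }
}
```

Because every segment is preceded by exactly one "/", the number of segments always equals the number of slashes in the path.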
Query
The query is optional: it can be null, empty, or non-empty. For many HTTP URLs the query string is subdivided into a collection of name-value parameters. This class offers methods to set the query as a single string, or as individual name-value parameters. With name-value parameters the values are optional and names may be repeated.
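To make the name-value subdivision concrete, here is a rough sketch that splits an encoded query into pairs. QueryParametersSketch is a hypothetical helper, not part of this class; it performs no percent decoding and represents a missing value as null.

```java
import java.util.ArrayList;
import java.util.List;

class QueryParametersSketch {
    // Returns {name, value} pairs in order of appearance; value is null when there is no "=".
    static List<String[]> split(String encodedQuery) {
        List<String[]> result = new ArrayList<>();
        if (encodedQuery == null) return result; // no query at all: zero parameters
        for (String pair : encodedQuery.split("&", -1)) {
            int equals = pair.indexOf('=');
            if (equals == -1) {
                result.add(new String[] {pair, null});
            } else {
                result.add(new String[] {pair.substring(0, equals), pair.substring(equals + 1)});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] pair : split("a=apple&a=apricot&b")) {
            System.out.println(pair[0] + ": " + pair[1]);
        }
    }
}
```

Under this scheme an empty query still yields one parameter with an empty name, and a non-null query always yields one more parameter than it has "&" separators.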
Fragment
The fragment is optional: it can be null, empty, or non-empty. Unlike host, port, path, and query the fragment is not sent to the webserver: it's private to the client.
Encoding
Each component must be encoded before it is embedded in the complete URL. As we saw above, the string cute #puppies is encoded as cute%20%23puppies when used as a query parameter value.
Percent encoding
Percent encoding replaces a character (like 🍩) with its UTF-8 hex bytes (like %F0%9F%8D%A9). This approach works for whitespace characters, control characters, non-ASCII characters, and characters that already have another meaning in a particular context.
Percent encoding is used in every URL component except for the hostname. But the set of characters that need to be encoded is different for each component. For example, the path component must escape all of its ? characters, otherwise it could be interpreted as the start of the URL's query. But within the query and fragment components, the ? character doesn't delimit anything and doesn't need to be escaped.
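As a rough illustration of both points, the sketch below percent-encodes UTF-8 bytes against a per-component set of reserved characters. The class PercentEncodingSketch and the two reserved sets are assumptions made for this example; they are deliberately simplified and are not the tables this class uses.

```java
import java.nio.charset.StandardCharsets;

class PercentEncodingSketch {
    // Assumed, simplified escape sets for illustration only.
    static final String PATH_SEGMENT_RESERVED = "/?#%";
    static final String QUERY_VALUE_RESERVED = "&=#%+";

    // Escapes whitespace, control and non-ASCII bytes, plus the component's reserved characters.
    static String percentEncode(String input, String reserved) {
        StringBuilder out = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xff;
            boolean escape = unsigned <= 0x20 || unsigned >= 0x7f
                || reserved.indexOf(unsigned) != -1;
            if (escape) {
                out.append(String.format("%%%02X", unsigned));
            } else {
                out.append((char) unsigned);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\uD83C\uDF69" is the doughnut emoji written as a surrogate pair.
        System.out.println(percentEncode("\uD83C\uDF69", PATH_SEGMENT_RESERVED));
        System.out.println(percentEncode("_Who?_", PATH_SEGMENT_RESERVED));
        System.out.println(percentEncode("_Who?_", QUERY_VALUE_RESERVED));
        System.out.println(percentEncode("cute #puppies", QUERY_VALUE_RESERVED));
    }
}
```

The same input produces different output depending on the component: the ? is escaped in a path segment but left alone in a query value.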
For example, this code uses the same string as a path segment, as the query, and as the fragment:

    WebUrl url = WebUrl.get("http://who-let-the-dogs.out").newBuilder()
        .addPathSegment("_Who?_")
        .query("_Who?_")
        .fragment("_Who?_")
        .build();
    System.out.println(url);

This prints:

    http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_

When parsing URLs that lack percent encoding where it is required, this class will percent encode the offending characters.
IDNA Mapping and Punycode encoding
Hostnames have different requirements and use a different encoding scheme. It consists of IDNA mapping and Punycode encoding.
In order to avoid confusion and discourage phishing attacks, IDNA Mapping transforms names to avoid confusing characters. This includes basic case folding: transforming shouting SQUARE.COM into cool and casual square.com. It also handles more exotic characters. For example, the Unicode trademark sign (™) could be confused for the letters "TM" in http://ho™mail.com. To mitigate this, the single character (™) maps to the string (tm). There is a similar policy for all of the 1.1 million Unicode code points. Note that some code points such as "🍩" are not mapped and cannot be used in a hostname.
Punycode converts a Unicode string to an ASCII string to make international domain names work everywhere. For example, "σ" encodes as "xn--4xa". The encoded string is not human readable, but can be used with classes like InetAddress to establish connections.
Why another URL model?
Java includes both java.net.URL and java.net.URI. We offer a new URL model to address problems that the others don't.
Different URLs should be different
Although they have different content, java.net.URL considers the following two URLs equal, and the equals() method between them returns true:
- http://square.github.io/
- http://google.github.io/
This is because java.net.URL compares hosts by the IP addresses they resolve to, which makes java.net.URL unusable for many things. It shouldn't be used as a Map key or in a Set. Doing so is both inefficient because equality may require a DNS lookup, and incorrect because unequal URLs may compare as equal because of how they are hosted.
Equal URLs should be equal
These two URLs are semantically identical, but java.net.URI disagrees:
- http://host:80/
- http://host
Both the unnecessary port specification (:80) and the absent trailing slash (/) cause URI to bucket the two URLs separately. This harms URI's usefulness in collections. Any application that stores information-per-URL will need to either canonicalize manually, or suffer unnecessary redundancy for such URLs.
Because they don't attempt canonical form, these classes are surprisingly difficult to use securely. Suppose you're building a webservice that checks that incoming paths are prefixed "/static/images/" before serving the corresponding assets from the filesystem.

    String attack = "http://example.com/static/images/../../../../../etc/passwd";
    System.out.println(new URL(attack).getPath());
    System.out.println(new URI(attack).getPath());
    System.out.println(WebUrl.get(attack).encodedPath());

By not canonicalizing the input paths, java.net.URL and java.net.URI are complicit in directory traversal attacks. Code that checks only the path prefix may suffer! The code above prints:

    /static/images/../../../../../etc/passwd
    /static/images/../../../../../etc/passwd
    /etc/passwd

If it works on the web, it should work in your application
The java.net.URI class is strict around what URLs it accepts. It rejects URLs like "http://example.com/abc|def" because the '|' character is unsupported. This class is more forgiving: it will automatically percent-encode the '|', yielding "http://example.com/abc%7Cdef". This kind of behavior is consistent with web browsers. WebUrl prefers consistency with major web browsers over consistency with obsolete specifications.
Paths and Queries should decompose
Neither of the built-in URL models offers direct access to path segments or query parameters. Manually using StringBuilder to assemble these components is cumbersome: do '+' characters get silently replaced with spaces? If a query parameter contains a '&', does that get escaped? By offering methods to read and write individual query parameters directly, this class saves application developers from the hassles of encoding and decoding.
Plus a modern API
The URL (JDK1.0) and URI (Java 1.4) classes predate builders and instead use telescoping constructors. For example, there's no API to compose a URI with a custom port without also providing a query and fragment.
Instances of WebUrl are well-formed and always have a scheme, host, and path. With java.net.URL it's possible to create an awkward URL like http:/ with scheme and path but no hostname. Building APIs that consume such malformed values is difficult!
This class has a modern API. It avoids nulls and checked exceptions:
- get(String) returns a WebUrl or throws an IllegalArgumentException on invalid input
- parse(String) returns an Optional
- from(URI) and from(URL) return an Optional
You can even be explicit about whether each component has been encoded already.
-
-
Nested Class Summary
    Modifier and Type    Class
    static class         WebUrl.Builder
-
Method Summary
Each entry lists the modifier and type, then the method, with its description on the following line.
static WebUrl.Builder    builder()
static int    defaultPort(String scheme)
    Returns 80 if scheme.equals("http"), 443 if scheme.equals("https") and -1 otherwise.
String    encodedFragment()
    Returns this URL's encoded fragment, like "abc" for http://host/#abc.
String    encodedPassword()
    Returns the password, or an empty string if none is set.
String    encodedPath()
    Returns the entire path of this URL encoded for use in HTTP resource resolution.
List<String>    encodedPathSegments()
    Returns a list of encoded path segments like ["a", "b", "c"] for the URL http://host/a/b/c.
String    encodedQuery()
    Returns the query of this URL, encoded for use in HTTP resource resolution.
String    encodedUsername()
    Returns the username, or an empty string if none is set.
boolean    equals(Object other)
String    fragment()
    Returns this URL's fragment, like "abc" for http://host/#abc.
static Optional<WebUrl>    from(URI uri)
    Returns an Optional for uri if its protocol is http or https, or Optional.empty() if it has any other protocol.
static WebUrl    get(String url)
    Returns a new WebUrl representing url.
int    hashCode()
String    host()
    Returns the host address suitable for use with InetAddress.getAllByName(String).
boolean    isHttps()
WebUrl.Builder    newBuilder()
WebUrl.Builder    newBuilder(String link)
    Returns a builder for the URL that would be retrieved by following link from this URL, or null if the resulting URL is not well-formed.
static WebUrl    of(URI uri)
    Returns a WebUrl for uri.
static Optional<WebUrl>    parse(String url)
    Returns a new Optional representing url if it is a well-formed HTTP or HTTPS URL, or Optional.empty() if it isn't.
String    password()
    Returns the decoded password, or an empty string if none is present.
List<String>    pathSegments()
    Returns a list of path segments like ["a", "b", "c"] for the URL http://host/a/b/c.
int    pathSize()
    Returns the number of segments in this URL's path.
int    port()
    Returns the explicitly-specified port if one was provided, or the default port for this URL's scheme.
String    query()
    Returns this URL's query, like "abc" for http://host/?abc.
String    queryParameter(String name)
    Returns the first query parameter named name decoded using UTF-8, or null if there is no such query parameter.
String    queryParameterName(int index)
    Returns the name of the query parameter at index.
Set<String>    queryParameterNames()
    Returns the distinct query parameter names in this URL, like ["a", "b"] for http://host/?a=apple&b=banana.
String    queryParameterValue(int index)
    Returns the value of the query parameter at index.
List<String>    queryParameterValues(String name)
    Returns all values for the query parameter name ordered by their appearance in this URL.
int    querySize()
    Returns the number of query parameters in this URL, like 2 for http://host/?a=apple&b=banana.
String    redact()
    Returns a string containing this URL with its username, password, query, and fragment stripped, and its path replaced with /....
WebUrl    resolve(String link)
    Returns the URL that would be retrieved by following link from this URL, or null if the resulting URL is not well-formed.
String    scheme()
    Returns either "http" or "https".
String    topPrivateDomain()
    Returns the domain name of this URL's host() that is one level beneath the public suffix by consulting the public suffix list.
String    toString()
URI    uri()
    Returns this URL as a java.net.URI.
URL    url()
    Returns this URL as a java.net.URL.
String    username()
    Returns the decoded username, or an empty string if none is present.
-
-
-
Method Detail
-
url
public URL url()
Returns this URL as a java.net.URL.
-
uri
public URI uri()
Returns this URL as a java.net.URI. Because URI is more strict than this class, the returned URI may be semantically different from this URL:
- Characters forbidden by URI like [ and | will be escaped.
- Invalid percent-encoded sequences like %xx will be encoded like %25xx.
- Whitespace and control characters in the fragment will be stripped.
These differences may have a significant consequence when the URI is interpreted by a webserver. For this reason the URI class and this method should be avoided.
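The strictness that makes this escaping necessary can be observed with the JDK alone. The sketch below uses a hypothetical helper, acceptedByUri, to show that java.net.URI rejects an unescaped '|' and accepts the percent-encoded form; it does not call this class.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriStrictnessSketch {
    // Returns true if java.net.URI can parse the string without throwing.
    static boolean acceptedByUri(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptedByUri("http://example.com/abc|def"));
        System.out.println(acceptedByUri("http://example.com/abc%7Cdef"));
    }
}
```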
-
scheme
public String scheme()
Returns either "http" or "https".
-
isHttps
public boolean isHttps()
-
encodedUsername
public String encodedUsername()
Returns the username, or an empty string if none is set.
    URL                               encodedUsername()
    http://host/                      ""
    http://username@host/             "username"
    http://username:password@host/    "username"
    http://a%20b:c%20d@host/          "a%20b"
-
username
public String username()
Returns the decoded username, or an empty string if none is present.
    URL                               username()
    http://host/                      ""
    http://username@host/             "username"
    http://username:password@host/    "username"
    http://a%20b:c%20d@host/          "a b"
-
encodedPassword
public String encodedPassword()
Returns the password, or an empty string if none is set.
    URL                               encodedPassword()
    http://host/                      ""
    http://username@host/             ""
    http://username:password@host/    "password"
    http://a%20b:c%20d@host/          "c%20d"
-
password
public String password()
Returns the decoded password, or an empty string if none is present.
    URL                               password()
    http://host/                      ""
    http://username@host/             ""
    http://username:password@host/    "password"
    http://a%20b:c%20d@host/          "c d"
-
host
public String host()
Returns the host address suitable for use with InetAddress.getAllByName(String). May be:
- A regular host name, like android.com.
- An IPv4 address, like 127.0.0.1.
- An IPv6 address, like ::1. Note that there are no square brackets.
- An encoded IDN, like xn--n3h.net.
    URL                    host()
    http://android.com/    "android.com"
    http://127.0.0.1/      "127.0.0.1"
    http://[::1]/          "::1"
    http://xn--n3h.net/    "xn--n3h.net"
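To see how an encoded IDN relates to its Unicode form, the JDK's java.net.IDN can be used as a stand-in. This is only a sketch: PunycodeSketch and its helpers are hypothetical, and java.net.IDN follows the older IDNA 2003 rules, so its mapping may differ from the one this class applies. It uses the "σ" example from the class overview.

```java
import java.net.IDN;

class PunycodeSketch {
    // Converts a Unicode host name to its ASCII-compatible (Punycode) form.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // Converts an ASCII-compatible host name back to Unicode for display.
    static String toDisplayHost(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    public static void main(String[] args) {
        String ascii = toAsciiHost("\u03C3"); // Greek small letter sigma
        System.out.println(ascii);
        System.out.println(toDisplayHost(ascii));
    }
}
```

The ASCII form is what should be handed to InetAddress; the Unicode form is only for showing to people.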
-
port
public int port()
Returns the explicitly-specified port if one was provided, or the default port for this URL's scheme. For example, this returns 8443 for https://square.com:8443/ and 443 for https://square.com/. The result is in [1..65535].
    URL                  port()
    http://host/         80
    http://host:8000/    8000
    https://host/        443
-
defaultPort
public static int defaultPort(String scheme)
Returns 80 if scheme.equals("http"), 443 if scheme.equals("https") and -1 otherwise.
-
pathSize
public int pathSize()
Returns the number of segments in this URL's path. This is also the number of slashes in the URL's path, like 3 in http://host/a/b/c. This is always at least 1.
    URL                   pathSize()
    http://host/          1
    http://host/a/b/c     3
    http://host/a/b/c/    4
-
encodedPath
public String encodedPath()
Returns the entire path of this URL encoded for use in HTTP resource resolution. The returned path will start with "/".
    URL                       encodedPath()
    http://host/              "/"
    http://host/a/b/c         "/a/b/c"
    http://host/a/b%20c/d     "/a/b%20c/d"
-
encodedPathSegments
public List<String> encodedPathSegments()
Returns a list of encoded path segments like ["a", "b", "c"] for the URL http://host/a/b/c. This list is never empty though it may contain a single empty string.
    URL                       encodedPathSegments()
    http://host/              [""]
    http://host/a/b/c         ["a", "b", "c"]
    http://host/a/b%20c/d     ["a", "b%20c", "d"]
-
pathSegments
public List<String> pathSegments()
Returns a list of path segments like ["a", "b", "c"] for the URL http://host/a/b/c. This list is never empty though it may contain a single empty string.
    URL                       pathSegments()
    http://host/              [""]
    http://host/a/b/c         ["a", "b", "c"]
    http://host/a/b%20c/d     ["a", "b c", "d"]
-
encodedQuery
public String encodedQuery()
Returns the query of this URL, encoded for use in HTTP resource resolution. The returned string may be null (for URLs with no query), empty (for URLs with an empty query) or non-empty (all other URLs).
    URL                                 encodedQuery()
    http://host/                        null
    http://host/?                       ""
    http://host/?a=apple&k=key+lime     "a=apple&k=key+lime"
    http://host/?a=apple&a=apricot      "a=apple&a=apricot"
    http://host/?a=apple&b              "a=apple&b"
-
query
public String query()
Returns this URL's query, like "abc" for http://host/?abc. Most callers should prefer queryParameterName(int) and queryParameterValue(int) because these methods offer direct access to individual query parameters.
    URL                                 query()
    http://host/                        null
    http://host/?                       ""
    http://host/?a=apple&k=key+lime     "a=apple&k=key lime"
    http://host/?a=apple&a=apricot      "a=apple&a=apricot"
    http://host/?a=apple&b              "a=apple&b"
-
querySize
public int querySize()
Returns the number of query parameters in this URL, like 2 for http://host/?a=apple&b=banana. If this URL has no query this returns 0. Otherwise it returns one more than the number of "&" separators in the query.
    URL                                 querySize()
    http://host/                        0
    http://host/?                       1
    http://host/?a=apple&k=key+lime     2
    http://host/?a=apple&a=apricot      2
    http://host/?a=apple&b              2
-
queryParameter
public String queryParameter(String name)
Returns the first query parameter named name decoded using UTF-8, or null if there is no such query parameter.
    URL                                 queryParameter("a")
    http://host/                        null
    http://host/?                       null
    http://host/?a=apple&k=key+lime     "apple"
    http://host/?a=apple&a=apricot      "apple"
    http://host/?a=apple&b              "apple"
-
queryParameterNames
public Set<String> queryParameterNames()
Returns the distinct query parameter names in this URL, like ["a", "b"] for http://host/?a=apple&b=banana. If this URL has no query this returns the empty set.
    URL                                 queryParameterNames()
    http://host/                        []
    http://host/?                       [""]
    http://host/?a=apple&k=key+lime     ["a", "k"]
    http://host/?a=apple&a=apricot      ["a"]
    http://host/?a=apple&b              ["a", "b"]
-
queryParameterValues
public List<String> queryParameterValues(String name)
Returns all values for the query parameter name ordered by their appearance in this URL. For example this returns ["banana"] for queryParameterValues("b") on http://host/?a=apple&b=banana.
    URL                                 queryParameterValues("a")    queryParameterValues("b")
    http://host/                        []                           []
    http://host/?                       []                           []
    http://host/?a=apple&k=key+lime     ["apple"]                    []
    http://host/?a=apple&a=apricot      ["apple", "apricot"]         []
    http://host/?a=apple&b              ["apple"]                    [null]
-
queryParameterName
public String queryParameterName(int index)
Returns the name of the query parameter at index. For example this returns "a" for queryParameterName(0) on http://host/?a=apple&b=banana. This throws if index is not less than the query size.
    URL                                 queryParameterName(0)    queryParameterName(1)
    http://host/                        exception                exception
    http://host/?                       ""                       exception
    http://host/?a=apple&k=key+lime     "a"                      "k"
    http://host/?a=apple&a=apricot      "a"                      "a"
    http://host/?a=apple&b              "a"                      "b"
-
queryParameterValue
public String queryParameterValue(int index)
Returns the value of the query parameter at index. For example this returns "apple" for queryParameterValue(0) on http://host/?a=apple&b=banana. This throws if index is not less than the query size.
    URL                                 queryParameterValue(0)    queryParameterValue(1)
    http://host/                        exception                 exception
    http://host/?                       null                      exception
    http://host/?a=apple&k=key+lime     "apple"                   "key lime"
    http://host/?a=apple&a=apricot      "apple"                   "apricot"
    http://host/?a=apple&b              "apple"                   null
-
encodedFragment
public String encodedFragment()
Returns this URL's encoded fragment, like "abc" for http://host/#abc. This returns null if the URL has no fragment.
    URL                     encodedFragment()
    http://host/            null
    http://host/#           ""
    http://host/#abc        "abc"
    http://host/#abc|def    "abc|def"
-
fragment
public String fragment()
Returns this URL's fragment, like "abc" for http://host/#abc. This returns null if the URL has no fragment.
    URL                     fragment()
    http://host/            null
    http://host/#           ""
    http://host/#abc        "abc"
    http://host/#abc|def    "abc|def"
-
redact
public String redact()
Returns a string containing this URL with its username, password, query, and fragment stripped, and its path replaced with /.... For example, redacting http://username:password@example.com/path returns http://example.com/....
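The redaction rule can be approximated with java.net.URI, as in the sketch below. RedactSketch is a hypothetical helper and not this method's implementation; in particular, keeping an explicitly written port in the output is an assumption of the sketch.

```java
import java.net.URI;

class RedactSketch {
    // Keeps only scheme, host and (assumed) explicit port; everything else becomes "/...".
    static String redact(String url) {
        URI uri = URI.create(url);
        StringBuilder out = new StringBuilder();
        out.append(uri.getScheme()).append("://").append(uri.getHost());
        if (uri.getPort() != -1) {
            out.append(':').append(uri.getPort());
        }
        return out.append("/...").toString();
    }

    public static void main(String[] args) {
        System.out.println(redact("http://username:password@example.com/path"));
    }
}
```

A redacted string like this is suitable for log messages, where credentials and query parameters should not leak.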
-
resolve
public WebUrl resolve(String link)
Returns the URL that would be retrieved by following link from this URL, or null if the resulting URL is not well-formed.
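For readers who want to experiment with relative resolution without this class, java.net.URI offers a comparable operation. The sketch below is only an analogue: ResolveSketch and follow are hypothetical names, and java.net.URI throws on malformed input instead of returning null.

```java
import java.net.URI;

class ResolveSketch {
    // Resolves link against base the way a browser follows a relative link.
    static String follow(String base, String link) {
        return URI.create(base).resolve(link).toString();
    }

    public static void main(String[] args) {
        System.out.println(follow(
            "https://www.youtube.com/user/WatchTheDaily/videos",
            "../../watch?v=cbP2N1BQdYc"));
    }
}
```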
-
newBuilder
public WebUrl.Builder newBuilder()
-
newBuilder
public WebUrl.Builder newBuilder(String link)
Returns a builder for the URL that would be retrieved by following link from this URL, or null if the resulting URL is not well-formed.
-
parse
public static Optional<WebUrl> parse(String url)
Returns a new Optional representing url if it is a well-formed HTTP or HTTPS URL, or Optional.empty() if it isn't.
-
get
public static WebUrl get(String url)
Returns a new WebUrl representing url. This method is intended for inputs that are known to be valid. If the validity of the input string is unknown, use parse(String) instead.
Throws:
- IllegalArgumentException - If url is not a well-formed HTTP or HTTPS URL.
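The division of labor between parse(String) and get(String) follows a common pattern: one entry point returns an Optional for untrusted input, and the other throws for input assumed to be valid. The sketch below shows that pattern with java.net.URI as a stand-in; ParseOrGetSketch is hypothetical and its validity check is far simpler than this class's.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

class ParseOrGetSketch {
    // For untrusted input: an empty Optional instead of null or a checked exception.
    static Optional<URI> parse(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return web && uri.getHost() != null ? Optional.of(uri) : Optional.empty();
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    // For input known to be valid: fail fast with an unchecked exception.
    static URI get(String url) {
        return parse(url).orElseThrow(
            () -> new IllegalArgumentException("Not a well-formed HTTP or HTTPS URL: " + url));
    }

    public static void main(String[] args) {
        System.out.println(parse("https://example.com/").isPresent());
        System.out.println(parse("mailto:user@example.com").isPresent());
    }
}
```

Callers handling user input can branch on the Optional, while code working with constants can call get and treat a failure as a programming error.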
-
of
public static WebUrl of(URI uri)
Returns a WebUrl for uri. This method is intended for inputs that are known to be valid. If the validity of the input URI is unknown, use from(URI) instead.
Throws:
- IllegalArgumentException - If uri is not a well-formed HTTP or HTTPS URL.
-
from
public static Optional<WebUrl> from(URI uri)
Returns an Optional for uri if its protocol is http or https, or Optional.empty() if it has any other protocol.
-
builder
public static WebUrl.Builder builder()
-
topPrivateDomain
public String topPrivateDomain()
Returns the domain name of this URL's host() that is one level beneath the public suffix by consulting the public suffix list. Returns null if this URL's host() is an IP address or is considered a public suffix by the public suffix list.
In general this method should not be used to test whether a domain is valid or routable. Instead, DNS is the recommended source for that information.
    URL                             topPrivateDomain()
    http://google.com               "google.com"
    http://adwords.google.co.uk     "google.co.uk"
    http://square                   null
    http://co.uk                    null
    http://localhost                null
    http://127.0.0.1                null
-
-