URL Regex Validation: what can go wrong?
Many developers rely on regular expressions to validate URLs they receive as user input from forms or other APIs, but hand-written URL regex patterns are easy to get wrong, and the mistakes can result in security vulnerabilities.
It's common to use regex to validate URLs, for example when users submit their social media profiles or provide images via URL links. Let's take a deep dive into a CVE identified in an npm library that used a regex pattern to validate URLs, and learn from this mistake.
URL Regex Validation
The node-forge npm package gets almost 20 million downloads a week and provides a native implementation for TLS and other cryptographic functions in JavaScript.
This npm package also includes a parseUrl utility function, as follows (in the vulnerable code version):
util.parseUrl = function(str) {
  var regex = /^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$/g;
  regex.lastIndex = 0;
  var m = regex.exec(str);
  var url = (m === null) ? null : {
    full: str,
    scheme: m[1],
    host: m[2],
    port: m[3],
    path: m[4]
  };
  if(url) {
    url.fullHost = url.host;
    if(url.port) {
      if(url.port !== 80 && url.scheme === 'http') {
        url.fullHost += ':' + url.port;
      } else if(url.port !== 443 && url.scheme === 'https') {
        url.fullHost += ':' + url.port;
      }
    } else if(url.scheme === 'http') {
      url.port = 80;
    } else if(url.scheme === 'https') {
      url.port = 443;
    }
    url.full = url.scheme + '://' + url.fullHost;
  }
  return url;
};
What could go wrong with this URL regex validation code?
👋 Just a quick break
I'm Liran Tal and I'm the author of the newest series of expert Node.js Secure Coding books. Check it out and level up your JavaScript
The Vulnerability: CVE-2022-0122
The parseUrl function from the node-forge package uses a regex pattern that is vulnerable to URL redirection. If an attacker provides a URL with a backslash after the protocol, such as https:/\/\/\, the parseUrl function treats what should be the host as part of a relative path, and an application that trusts the result can end up redirecting the user to an untrusted site.
The vulnerable regex pattern used in the parseUrl function, ^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$, creates room for URL confusion and redirection to an untrusted site because of how it handles the backslash after the protocol in the attacker-provided URL.
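To make the four capture groups concrete, here is a minimal sketch that runs the same pattern against a well-formed URL and against one with an extra slash after the protocol. The helper name captureGroups and the example.com inputs are assumptions made for illustration and are not part of node-forge.

```javascript
// Same pattern as the vulnerable parseUrl, without the `g` flag,
// which is not needed for a single exec() call.
const urlPattern = /^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$/;

// Hypothetical helper (not a node-forge API): returns the raw capture groups.
function captureGroups(input) {
  const m = urlPattern.exec(input);
  return m === null
    ? null
    : { scheme: m[1], host: m[2], port: m[3], path: m[4] };
}

// A well-formed URL: host, port and path land in the expected groups.
const normal = captureGroups('https://example.com:8443/foo');

// An extra slash after "https://": the host group stops at the first "/",
// so it matches the empty string and the real host falls into the path group.
const tripleSlash = captureGroups('https:///example.com/foo');

console.log(normal);
console.log(tripleSlash);
```

The host group, ([^:&^\/]*), is allowed to match zero characters, so the pattern still succeeds when nothing sits between the protocol and the next slash.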
Based on the proof-of-concept provided by the researcher, we can explore a scenario where the parseUrl function is used to parse a URL provided by an attacker:
var forge = require("node-forge");
var url = forge.util.parseUrl("https:/\/\/\www.github.com/foo/bar");
console.log(url);
What do you expect the url output to be?
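One detail worth checking before answering: inside a JavaScript string literal the backslash is an escape character, so the PoC string as written here may not contain any backslashes at runtime. The sketch below only inspects the literal. The variable name poc is an assumption for illustration.

```javascript
// The PoC literal exactly as it appears in the source code.
const poc = "https:/\/\/\www.github.com/foo/bar";

// "\/" is an escaped "/" and "\w" is just "w", so the runtime value
// has three slashes after the protocol and no backslash characters.
console.log(poc);
console.log(poc.includes('\\'));
```

In other words, when the URL is supplied as a literal like this, what reaches parseUrl is a run of extra slashes after the protocol.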
If we use the native new URL() constructor in JavaScript to parse the URL from the PoC above, we get the following output:
var url = new URL("https:/\/\/\www.github.com/foo/bar");
console.log(url);
// Output:
URL {
  href: 'https://www.github.com/foo/bar',
  origin: 'https://www.github.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'www.github.com',
  hostname: 'www.github.com',
  port: '',
  pathname: '/foo/bar',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
The new URL() constructor parses the provided URL correctly. The parseUrl function in the node-forge npm package, however, fails to parse it correctly, which can lead to redirecting the user to an untrusted site:
{
  full: 'https://',
  scheme: 'https',
  host: '',
  port: 443,
  path: '/www.github.com/foo/bar',
  fullHost: ''
}
In the above example, path should be set to /foo/bar. Instead it is set to /www.github.com/foo/bar and host is left empty, which is incorrect and leads to URL redirection to an untrusted site.
URL Regex Validation Impact
The impact of this vulnerability is that an attacker can provide a malicious URL with a backslash after the protocol. The result is not just an open redirect: it can enable phishing and social engineering attacks, or even a Server-Side Request Forgery (SSRF) attack if the developer relies on the parseUrl function to parse and validate a URL that is then used to make requests to internal services.
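As a contrast to regex-based validation, here is a minimal sketch that leans on the native new URL() constructor shown earlier and then checks the parsed result against an allowlist. The helper name parseTrustedUrl and the contents of ALLOWED_HOSTS are assumptions for illustration. This is one possible approach under those assumptions, not the fix that node-forge shipped.

```javascript
// Hypothetical allowlist, for illustration only.
const ALLOWED_HOSTS = new Set(['www.github.com']);

// Hypothetical helper: returns a parsed URL object if the input is an
// absolute https URL pointing at an allowed host, otherwise null.
function parseTrustedUrl(input) {
  let parsed;
  try {
    // Throws a TypeError for relative or malformed input.
    parsed = new URL(input);
  } catch {
    return null;
  }
  if (parsed.protocol !== 'https:') {
    return null;
  }
  // Compare the hostname computed by the parser, never a substring of the raw input.
  if (!ALLOWED_HOSTS.has(parsed.hostname)) {
    return null;
  }
  return parsed;
}

const ok = parseTrustedUrl('https:///www.github.com/foo/bar');
const rejected = parseTrustedUrl('https://evil.example/www.github.com');

console.log(ok && ok.href);
console.log(rejected);
```

When the check passes, it is safer to use the normalized value (ok.href) for any redirect or outgoing request rather than the raw input string, so that the code making the request sees the same URL the validator saw.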