URL Regex Validation: what can go wrong?

Are you using regex to validate URLs? Learn from a CVE identified in the node-forge npm package that was using a regex pattern to validate URLs and resulted in a security vulnerability.

Many developers rely on regular expressions to validate URLs they receive as user input from forms or other APIs, but hand-written URL regex patterns are easy to get wrong, and a flawed pattern can result in a security vulnerability.

It is common to use regex to validate URLs, for example when users submit their social media profiles or provide images via URL links. Let’s dive into a CVE that was identified in an npm library that used a regex pattern to validate URLs, and learn from this mistake.

URL Regex Validation

The node-forge npm package gets almost 20 million downloads a week and provides a native implementation of TLS and other cryptographic functions in JavaScript.

This npm package also includes a utility parseUrl function, as follows (in the vulnerable code version):

util.parseUrl = function(str) {
  var regex = /^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$/g;
  regex.lastIndex = 0;
  var m = regex.exec(str);
  var url = (m === null) ? null : {
    full: str,
    scheme: m[1],
    host: m[2],
    port: m[3],
    path: m[4]
  };
  if(url) {
    url.fullHost = url.host;
    if(url.port) {
      if(url.port !== 80 && url.scheme === 'http') {
        url.fullHost += ':' + url.port;
      } else if(url.port !== 443 && url.scheme === 'https') {
        url.fullHost += ':' + url.port;
      }
    } else if(url.scheme === 'http') {
      url.port = 80;
    } else if(url.scheme === 'https') {
      url.port = 443;
    }
    url.full = url.scheme + '://' + url.fullHost;
  }
  return url;
};

What could go wrong with this URL regex validation code?

👋 Just a quick break

I'm Liran Tal and I'm the author of the newest series of expert Node.js Secure Coding books. Check it out and level up your JavaScript

Node.js Secure Coding: Defending Against Command Injection Vulnerabilities
Node.js Secure Coding: Prevention and Exploitation of Path Traversal Vulnerabilities

The Vulnerability: CVE-2022-0122

The parseUrl function in the node-forge package uses a regex pattern that is vulnerable to URL redirection. If an attacker provides a URL with a backslash after the protocol, such as https:/\/\/\, parseUrl returns an empty host and treats the real hostname as part of the path. An application that trusts this result can end up redirecting the user to an untrusted site.

The vulnerable regex pattern, ^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$, leaves room for URL confusion: the host group [^:&^\/]* is allowed to match zero characters, so whatever follows the protocol's :// can fall into the trailing (.*) path group instead.

Based on the proof-of-concept provided by the researcher, we can explore a scenario where the parseUrl function is used to parse a URL provided by an attacker:

var forge = require("node-forge");
var url = forge.util.parseUrl("https:/\/\/\www.github.com/foo/bar");
console.log(url);

What do you expect the url output to be?

If we use the built-in URL constructor, new URL(), to parse the URL from the above POC example, we get the following output:

var url = new URL("https:/\/\/\www.github.com/foo/bar");
console.log(url);

// Output:
URL {
  href: 'https://www.github.com/foo/bar',
  origin: 'https://www.github.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'www.github.com',
  hostname: 'www.github.com',
  port: '',
  pathname: '/foo/bar',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

new URL() parses the provided URL correctly. The parseUrl function in the node-forge npm package does not: it returns an empty host and pushes the real hostname into the path:

{
  full: 'https://',
  scheme: 'https',
  host: '',
  port: 443,
  path: '/www.github.com/foo/bar',
  fullHost: ''
}

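In the above example, host should be www.github.com and path should be /foo/bar. Instead, host is empty and path is set to /www.github.com/foo/bar, so parseUrl and new URL() disagree about where the URL points. That mismatch is what makes redirection to an untrusted site possible.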
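To see where each part of the input ends up, the sketch below runs the same pattern (without the g flag) directly against the proof-of-concept string. It is only an illustration of the capture groups, not code from node-forge. One detail worth noting: in a JavaScript string literal, \/ is just an escaped forward slash and \w is a plain w, so the POC string evaluates to https:///www.github.com/foo/bar.

```javascript
// The pattern from the vulnerable parseUrl, minus the g flag.
const pattern = /^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$/;

// The proof-of-concept literal, and the string it actually evaluates to.
const poc = "https:/\/\/\www.github.com/foo/bar";
const input = "https:///www.github.com/foo/bar";
console.log(poc === input); // true

// Groups: 1 = scheme, 2 = host, 3 = port, 4 = path.
const [, scheme, host, port, path] = pattern.exec(input);

// The host group stops at the third "/", so it matches nothing,
// and the hostname is swallowed by the path group.
console.log({ scheme, host, port, path });
// { scheme: 'https', host: '', port: '', path: '/www.github.com/foo/bar' }
```

Because port is captured as an empty string, parseUrl then falls through to its default and sets port to 443, which matches the output shown above.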

URL Regex Validation Impact

The impact of this vulnerability is that an attacker can provide a malicious URL with a backslash after the protocol and cause more than an open redirect. It can enable phishing and social engineering attacks, or even a Server-Side Request Forgery (SSRF) attack if the developer relies on parseUrl to parse and validate a URL that is then used to make requests to internal services.
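As a minimal sketch of a safer approach, the check below parses input with the built-in URL constructor and compares the resulting hostname against an allowlist, instead of using a hand-written regex. The isAllowedUrl function and the ALLOWED_HOSTS set are hypothetical names used for illustration; they are not part of node-forge, and a real application would need its own policy.

```javascript
// Hypothetical allowlist for illustration only.
const ALLOWED_HOSTS = new Set(["www.github.com"]);

// Returns true only for absolute https URLs whose parsed hostname
// is on the allowlist.
function isAllowedUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") {
    return false;
  }
  return ALLOWED_HOSTS.has(url.hostname);
}

console.log(isAllowedUrl("https:///www.github.com/foo/bar")); // true
console.log(isAllowedUrl("https:///evil.example/foo/bar"));   // false
console.log(isAllowedUrl("https://www.github.com@evil.example/")); // false
console.log(isAllowedUrl("not a url"));                        // false
```

If the URL passes the check, it is probably wise to redirect or make the request using the normalized url.href rather than the original input string, so that the value that was validated is the same value that gets used.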
