Regex for capturing all the URLs in a paragraph except for a specific domain
See here for the original answer.
This should be sufficient for your use case:
/(?<!\S)(?:https?:\/\/)?(?:(?:(?!example)\w+[.-])+[a-z]{2,11})(?!\S)/gi
See here for a demonstration of the regex at work. Below is a rough explanation of what the regex is doing:
- The leading `(?<!\S)` and trailing `(?!\S)` require whitespace (or the start or end of the string) on either side of a match, which essentially splits the string into whitespace-separated segments, newlines included
- The `?:` syntax makes each set of parentheses it appears in a non-capturing group, which avoids storing the captured text and can slightly reduce memory use and execution time
- `(?:https?:\/\/)?` optionally matches `http://` or `https://` at the start of a URL, without allowing the invalid characters `:` and `/` anywhere else in the URL
- `(?:(?!example)\w+[.-])+` looks for one or more words that do not begin with `example`, each followed by either a hyphen or a period
- `[a-z]{2,11}` matches the final domain extension, e.g. `com`, `org`, or `enterprises`
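To see how the pieces fit together, here is a minimal sketch that rebuilds the same pattern from its parts and runs it against a sample paragraph. The helper name `buildUrlRegex`, its `excludedLabel` parameter, and the sample text are assumptions made for illustration; they are not part of the original answer. It also assumes a JavaScript engine with lookbehind support, such as a recent Node.js.

```javascript
// Hypothetical helper (not from the original answer): assembles the pattern
// piece by piece so each part lines up with the explanation above.
function buildUrlRegex(excludedLabel) {
  // Escape regex metacharacters so the excluded text is matched literally.
  const escaped = excludedLabel.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  const start = "(?<!\\S)";                     // no non-whitespace character before the match
  const protocol = "(?:https?:\\/\\/)?";        // optional http:// or https://
  const labels = `(?:(?!${escaped})\\w+[.-])+`; // labels that do not begin with the excluded text
  const extension = "[a-z]{2,11}";              // final domain extension, e.g. com or org
  const end = "(?!\\S)";                        // no non-whitespace character after the match

  return new RegExp(start + protocol + "(?:" + labels + extension + ")" + end, "gi");
}

// Made-up sample paragraph for illustration.
const sample =
  "Visit https://google.com and example.com or http://www.example.org plus sub.domain.net today";

// match() with the g flag returns every match, or null when there are none.
const urls = sample.match(buildUrlRegex("example")) || [];

console.log(urls); // logs the two non-example URLs: https://google.com and sub.domain.net
```

Two behaviours are worth keeping in mind with this pattern. The lookahead only checks how each label begins, so a domain such as `examples.com` is skipped as well, while `myexample.com` is still matched. And because of the trailing `(?!\S)`, a URL followed directly by a path or by punctuation is not matched at all.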