Regular expression to extract inner text from anchor tags

by bryian 8. February 2011 18:24

Several days ago, someone on the forum asked how to extract the text from a hyperlink while preserving the other HTML tags. It sounded interesting, so I did some research but couldn't find a direct solution. I decided to put together a simple regular expression to do the job.

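Regular Expression: (<[a|A][^>]*>|</[a|A]>)

<[a|A][^>]*> -- Matches an opening anchor tag with any attributes, e.g. <a href="a.aspx">
</[a|A]> -- Matches a closing </a> or </A> tag

Replacing every match with an empty string removes the anchor tags and leaves their inner text in place. Two caveats: inside a character class, | is a literal character, so [a|A] matches a, A or | ([aA] would be enough); and because [^>]* follows the tag name directly, the first alternative also matches any other opening tag whose name starts with a, such as <abbr> or <address>.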

Example 1:

string str1 = "<a href=\"\" class=\"someclass\">Mastering Regular Expressions</a> "
    + "-- <A href=\"\">CNN</a> <div><a href=\"\"></a></div>";

str1 = System.Text.RegularExpressions.Regex.Replace(str1, "(<[a|A][^>]*>|</[a|A]>)", "");

Result: Mastering Regular Expressions -- CNN <div></div>

Example 2:

string str2 = "<div><a href=\"\" class=\"someclass\">ysatech</a></div>";

str2 = System.Text.RegularExpressions.Regex.Replace(str2, "(<[a|A][^>]*>|</[a|A]>)", "");

Result: <div>ysatech</div>
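
The pattern used in the examples strips the opening tag of any element whose name starts with a (for instance <abbr>), because nothing in it marks where the tag name ends. A stricter variant might look like the sketch below. It assumes that attribute values never contain a literal > character, and the AnchorStripper class and StripAnchors method are hypothetical names introduced only for illustration; they are not part of the original solution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, shown as a sketch rather than a complete HTML sanitizer.
public static class AnchorStripper
{
    // </?    an optional slash, so one pattern covers opening and closing tags
    // a\b    the tag name "a" followed by a word boundary, so <abbr> is left alone
    // [^>]*  any attributes up to the closing bracket
    //        (assumption: no attribute value contains '>')
    private static readonly Regex AnchorTag =
        new Regex(@"</?a\b[^>]*>", RegexOptions.IgnoreCase);

    // Removes <a ...> and </a> tags but keeps the text between them.
    public static string StripAnchors(string html)
    {
        return AnchorTag.Replace(html, "");
    }
}
```

For example, AnchorStripper.StripAnchors("<div><A href=\"\">ysatech</a> <abbr title=\"x\">HTML</abbr></div>") returns <div>ysatech <abbr title="x">HTML</abbr></div>. RegexOptions.IgnoreCase replaces the [a|A] character class, and the \b word boundary is what stops the match from spilling over into longer tag names.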
