Regex: How to remove all HTML tags

The example below demonstrates the C# regular expression library (System.Text.RegularExpressions). You can use it as a starting point to remove HTML and special tags from any string.

Here is the code:

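using System;
using System.Text.RegularExpressions;

class Program
    {
        // Sample input: list-style HTML with Word-generated Mso* classes and styles.
        static string whom =
        @"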
        <p class=""MsoListParagraphCxSpFirst"" style=""text-indent: -0.25in; margin: 0in 0in 0pt 0.5in; mso-list: l0 level1 lfo1"">
            <span style=""font-family: symbol; mso-list: ignore; mso-fareast-font-family: symbol; mso-bidi-font-family: symbol"">
                ·
            </span>
            A
        </p>

        <p class=""MsoListParagraphCxSpMiddle"" style=""text-indent: -0.25in; margin: 0in 0in 0pt 0.5in; mso-list: l0 level1 lfo1"">
            <span style=""font-family: symbol; mso-list: ignore; mso-fareast-font-family: symbol; mso-bidi-font-family: symbol"">
                ·
                <span style='font: 7pt ""times new roman""'>
                </span>
            </span>
            B
        </p>

        <p class=""MsoListParagraphCxSpLast"" style=""text-indent: -0.25in; margin: 0in 0in 10pt 0.5in; mso-list: l0 level1 lfo1"">
            <span style=""font-family: symbol; mso-list: ignore; mso-fareast-font-family: symbol; mso-bidi-font-family: symbol"">
                ·
                <span style='font: 7pt ""times new roman""'>
                </span>
            </span>
            C
        </p>

        <p class="""" style=""margin: 0in 0in 10pt"">
            <font color=""#000000"" size=""3"" face=""Calibri"">
                Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun
                    <b style=""mso-bidi-font-weight: normal"">
                        Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tar
                    </b>
                un Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun Hello this is tarun
            </font>
        </p>";

        // (span|font) is an alternation of tag names; [span|font]* would be a character class.
        static string from = @"<(span|font)\b[^>]*>(.*?)</\1>";

        static void Main(string[] args)
        {
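            // Turn each Word list paragraph into a list item.
            from = @"<(p) class=""MsoListParagraph[^""]*""[^>]*>(.*?)</\1>";
            whom = Regex.Replace(whom, from, "<li>$2</li>", RegexOptions.Singleline);

            // Unwrap <span> and <font> tags, keeping their inner text.
            // Tags not named here, such as <p> and <b>, are left in place;
            // add them to the alternation to strip them too.
            from = @"<(span|font)\b[^>]*>(.*?)</\1>";
            // Each pass unwraps one level of nesting, so repeat for nested tags.
            whom = Regex.Replace(whom, from, "$2", RegexOptions.Singleline);
            whom = Regex.Replace(whom, from, "$2", RegexOptions.Singleline);
            whom = Regex.Replace(whom, from, "$2", RegexOptions.Singleline);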

            Console.WriteLine(whom);

            Console.ReadLine();
        }
    }
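The code above calls Regex.Replace three times because each pass only unwraps one level of nested tags. As a rough sketch of an alternative, the loop below keeps replacing until a pass changes nothing, so the nesting depth does not need to be known in advance. The TagStripper class and its Unwrap method are hypothetical names made up for this illustration, and the tag names passed in are assumed to be plain names such as span or font.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagStripper
{
    // Hypothetical helper (not part of the original post): unwraps the
    // named tags and keeps their inner text. Tag names are assumed to be
    // plain names with no regex metacharacters.
    public static string Unwrap(string html, params string[] tagNames)
    {
        // Builds a pattern such as <(span|font)\b[^>]*>(.*?)</\1>
        string names = string.Join("|", tagNames);
        var pattern = new Regex(@"<(" + names + @")\b[^>]*>(.*?)</\1>",
                                RegexOptions.Singleline | RegexOptions.IgnoreCase);

        string previous;
        do
        {
            previous = html;
            html = pattern.Replace(html, "$2");
        } while (html != previous); // stop once a pass changes nothing

        return html;
    }
}

class Demo
{
    static void Main()
    {
        string html = "<span>a<span>b<font size=\"3\">c</font></span></span>";
        Console.WriteLine(TagStripper.Unwrap(html, "span", "font")); // prints "abc"
    }
}
```

Every successful replacement makes the string shorter, so the loop always terminates. Like the original code, this is a regex-based approach and may not cope with malformed or unusual markup.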

Let me know if you run into any difficulty with this.

Tarun Juneja

4 thoughts on “Regex How to remove all html tags”

  1. One easy example to remove HTML tags:

     public class RegexDemo
     {
         public static void Main(string[] args)
         {
             string html = "<h1> <span>mayank </span> INDIA <b> tarun </b> TEST </h1>";
             // In a verbatim string, \b and \1 take a single backslash.
             string regEx = @"<(h1|span)\b[^>]*>(.*)</\1>";
             // Two passes: the first unwraps <h1>, the second unwraps <span>.
             html = Regex.Replace(html, regEx, "$2", RegexOptions.Singleline);
             html = Regex.Replace(html, regEx, "$2", RegexOptions.Singleline);
             Console.WriteLine(html);
         }
     }
