C# and wiki markup parsing

Wiki markup is the lightweight syntax that wiki websites use to format a web page. For example, you can wrap a word in *asterisks* to make it bold or in _underscores_ to make it italic; this convention is common and easy to adopt.

Examples:

  • _italic_ or {{italic}}
  • *bold* or ((bold))


Wikipedia uses a different style of wiki markup, while Google Project Hosting (as well as Google+) uses the same style demonstrated above.
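To make the idea concrete, here is a minimal sketch of how a single markup rule can be turned into HTML with one `Regex.Replace` call. The class and method names (`MarkupDemo`, `BoldToHtml`, `ItalicToHtml`) are hypothetical and used only for illustration; note that the asterisk and the braces have to be escaped because they are special characters in a regular expression.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: one regular expression per markup rule.
public static class MarkupDemo
{
    // *text* -> <b>text</b>. "\*" is a literal asterisk; "(.*?)" lazily captures the text between.
    public static string BoldToHtml(string input) =>
        Regex.Replace(input, @"\*(.*?)\*", "<b>$1</b>");

    // {{text}} -> <i>text</i>. The braces are escaped so they are matched literally.
    public static string ItalicToHtml(string input) =>
        Regex.Replace(input, @"\{\{(.*?)\}\}", "<i>$1</i>");
}
```

The `$1` in the replacement string refers to the text captured by `(.*?)`.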

Implementation in C#
I use regular expressions (the Regex class) to implement wiki markup in C#.

Here is the method that does the actual parsing. It handles any markup in which the same delimiter opens and closes the formatted text, such as *bold* or _italic_.

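using System.Collections.Generic;
using System.Text.RegularExpressions;

// Extension method: must be declared inside a non-nested static class.
private static string ReplaceWikiMarkup(this string inputString, IList<string> pairs,
                                        IList<string> startTag, IList<string> endTag)
{
    for (var i = 0; i < pairs.Count; i++)
    {
        // Regex.Escape makes the delimiter match literally, even if it contains regex
        // metacharacters such as ( or [. "(.*?)" lazily captures the text in between.
        var r = new Regex(Regex.Escape(pairs[i]) + "(.*?)" + Regex.Escape(pairs[i]));
        foreach (Match match in r.Matches(inputString))
        {
            // Groups[0] is the whole match including delimiters; Groups[1] is the inner text.
            inputString = inputString.Replace(match.Groups[0].Value,
                                              startTag[i] + match.Groups[1].Value.Trim() + endTag[i]);
        }
    }
    return inputString;
}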

The method above is called by a public overload that supplies the delimiters and tags for bold and italic:

public static string ReplaceWikiMarkup(this string inputString)
{
    /* The asterisk (*) is a regex quantifier, so swap it for a placeholder first */
    inputString = inputString.Replace("*", "-b-");
    return inputString.ReplaceWikiMarkup(
                new[] { "-b-", "_" },
                new[] { "<b>", "<i>" },        /* SEO: use <strong> instead */
                new[] { "</b>", "</i>" })
            .Replace("-b-", "*");              /* restore any unpaired asterisks */
}
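A self-contained sketch that places the two methods above in a static class and calls them. The class name `WikiMarkupExtensions` is an assumption; any non-nested static class works, since C# requires extension methods to be declared in one. The logic mirrors the original code, including the `-b-` placeholder for the asterisk.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container class for the extension methods.
public static class WikiMarkupExtensions
{
    private static string ReplaceWikiMarkup(this string inputString, IList<string> pairs,
                                            IList<string> startTag, IList<string> endTag)
    {
        for (var i = 0; i < pairs.Count; i++)
        {
            // Same delimiter on both sides; "(.*?)" captures the text in between.
            var r = new Regex(pairs[i] + "(.*?)" + pairs[i]);
            foreach (Match match in r.Matches(inputString))
            {
                inputString = inputString.Replace(match.Value,
                    startTag[i] + match.Groups[1].Value.Trim() + endTag[i]);
            }
        }
        return inputString;
    }

    public static string ReplaceWikiMarkup(this string inputString)
    {
        // "*" is a regex quantifier, so it is swapped for the placeholder "-b-" first.
        inputString = inputString.Replace("*", "-b-");
        return inputString.ReplaceWikiMarkup(
            new[] { "-b-", "_" },
            new[] { "<b>", "<i>" },
            new[] { "</b>", "</i>" });
    }
}
```

Calling `"This is *bold* and _italic_.".ReplaceWikiMarkup()` returns `This is <b>bold</b> and <i>italic</i>.`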

Understanding the method
The method takes four arguments:
  • inputString - the text to convert; because this is an extension method, it is the string the method is called on
  • pairs - the wiki markup delimiters, such as * (asterisk), _ (underscore), or any other string you choose; the same delimiter opens and closes the text
  • startTag - the HTML opening tag for each delimiter, either bare or with any number of attributes. Examples: <b>, <i>, <strong>, <h2 class="something">, <pre style="something: something;">
  • endTag - the matching HTML closing tag. Examples: </b>, </i>, </strong>, </h2>, </pre>
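Inside the method, each delimiter is turned into the pattern delimiter + "(.*?)" + delimiter. The `?` makes the quantifier lazy, so it matches as little text as possible and each pair of delimiters is converted separately; a greedy `(.*)` would instead run from the first delimiter to the last one on the line. A small illustrative comparison (the `LazyVsGreedy` class is hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class comparing lazy and greedy capture groups.
public static class LazyVsGreedy
{
    // Lazy: "(.*?)" stops at the first closing underscore it can find.
    public static string Lazy(string input) =>
        Regex.Replace(input, "_(.*?)_", "<i>$1</i>");

    // Greedy: "(.*)" extends to the last underscore on the line.
    public static string Greedy(string input) =>
        Regex.Replace(input, "_(.*)_", "<i>$1</i>");
}
```

For the input `_one_ and _two_`, the lazy version produces `<i>one</i> and <i>two</i>`, while the greedy version produces `<i>one_ and _two</i>`.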
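Be careful with the ordering of the HTML elements in startTag and endTag. The lists are matched by index, so if the first element of startTag is <b>, the first element of endTag must be </b> (and the first element of pairs must be the bold delimiter), and so on.

Extending the method
The method above assumes that the same delimiter opens and closes the text. Suppose you also want to let end users inject code with {{{ }}} or <code></code>, or use other markup whose opening and closing keywords differ: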
  • Code: {{{ --- }}} or <code> --- </code>
  • Heading: [[ Heading ]]
  • Bold: (( Bold ))

public static string ReplaceWikiMarkup(this string inputString,
                                IList<string> startKeywords, /* {{{ or <code> or <h> */
                                IList<string> endKeywords,   /* }}} or </code> or </h> */
                                IList<string> startTag,      /* HTML equivalent starting tag */
                                IList<string> endTag)        /* HTML equivalent ending tag */
{
    for (var i = 0; i < startKeywords.Count; i++)
    {
        // Escape both keywords so delimiters such as (( or [[ are matched literally.
        var r = new Regex(Regex.Escape(startKeywords[i]) + "(.*?)" + Regex.Escape(endKeywords[i]));
        foreach (Match match in r.Matches(inputString))
        {
            // Replace the whole match with the inner text wrapped in the HTML tags.
            inputString = inputString.Replace(match.Groups[0].Value,
                                              startTag[i] + match.Groups[1].Value.Trim() + endTag[i]);
        }
    }
    return inputString;
}
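Two details matter once the delimiters can differ. First, characters such as `(`, `[` and `{` have special meaning in a regular expression, so a keyword like `((` or `[[` should go through `Regex.Escape` before it is concatenated into the pattern. Second, `.` does not match a newline by default, so a multi-line `{{{ ... }}}` code block is only matched when `RegexOptions.Singleline` is set. A sketch of a single-rule helper that handles both (the names `BlockMarkup` and `ConvertBlock` are hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for one start/end keyword pair.
public static class BlockMarkup
{
    // Converts text between startKeyword and endKeyword into startTag ... endTag.
    // Regex.Escape turns "[[" into "\[\[" so the keyword is matched literally.
    // RegexOptions.Singleline lets "." match newlines, so multi-line blocks are converted.
    public static string ConvertBlock(string input, string startKeyword, string endKeyword,
                                      string startTag, string endTag)
    {
        var pattern = Regex.Escape(startKeyword) + "(.*?)" + Regex.Escape(endKeyword);
        return Regex.Replace(input, pattern,
                             m => startTag + m.Groups[1].Value.Trim() + endTag,
                             RegexOptions.Singleline);
    }
}
```

Applying Singleline to every rule would also let _italic_ span several lines, so it may be better to enable it only for code blocks.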

Now we can call the extended method. This overload replaces the earlier parameterless one, because the two have the same signature:

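public static string ReplaceWikiMarkup(this string inputString)
{
    /* The asterisk (*) is a regex quantifier, so swap it for a placeholder first */
    inputString = inputString.Replace("*", "-b-");
    return inputString.ReplaceWikiMarkup(
                new[] { "-b-", "_", "<h>", "<code>" },    /* start keywords */
                new[] { "-b-", "_", "</h>", "</code>" },  /* end keywords */
                new[] { "<b>", "<i>", "<h2>", "<pre>" },  /* HTML start tags */
                new[] { "</b>", "</i>", "</h2>", "</pre>" })
            .Replace("-b-", "*");                          /* restore any unpaired asterisks */
}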
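Because the four lists are matched only by position, it is easy to get them out of step. One alternative, sketched here under the assumption that a small rule type is acceptable in your code base (it uses a C# 9 record), is to keep each keyword pair and its tags together so they cannot drift apart. The `MarkupRule` record, `WikiMarkupRules` class and `ToHtml` method are hypothetical names, not part of the original code; this version also escapes the keywords, which removes the need for the `-b-` placeholder.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical rule type: one keyword pair and the HTML tags that replace it.
public sealed record MarkupRule(string Start, string End, string StartTag, string EndTag);

public static class WikiMarkupRules
{
    // Hypothetical default rule set mirroring the call above.
    public static readonly IReadOnlyList<MarkupRule> Default = new[]
    {
        new MarkupRule("*", "*", "<b>", "</b>"),
        new MarkupRule("_", "_", "<i>", "</i>"),
        new MarkupRule("<h>", "</h>", "<h2>", "</h2>"),
        new MarkupRule("<code>", "</code>", "<pre>", "</pre>"),
    };

    // Applies each rule in order; Regex.Escape means "*" needs no placeholder.
    public static string ToHtml(this string input, IEnumerable<MarkupRule> rules)
    {
        foreach (var rule in rules)
        {
            var pattern = Regex.Escape(rule.Start) + "(.*?)" + Regex.Escape(rule.End);
            input = Regex.Replace(input, pattern,
                                  m => rule.StartTag + m.Groups[1].Value.Trim() + rule.EndTag);
        }
        return input;
    }
}
```

With this design, `"2 * 3".ToHtml(WikiMarkupRules.Default)` leaves the lone asterisk untouched, because the pattern `\*(.*?)\*` needs two asterisks to match.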