Friday 22 January 2010

Manage C# threads easily using an array of BackgroundWorker class objects


I recently built a piece of software that processed files on a server.  There were hundreds of thousands of files, so a single pass took a long time, and I decided to use threads to process multiple files at once.

I wanted to control the number of threads through a single variable.  That way I could use more or fewer threads depending on the power of the machine running the process, and I could benchmark different thread counts to find the optimum.

The solution I came up with uses an array of .NET 2.0 BackgroundWorker objects, whose length is set by the integer variable "maxThreads".

The following code sets up the variables and initialises the array:

static int maxThreads = 20;  //Make bigger or smaller, it's up to you!
private BackgroundWorker[] threadArray = new BackgroundWorker[maxThreads];
static int _numberBackGroundThreads ;  //Just for fun

// Set up the BackgroundWorker object by 
// attaching event handlers. 
private void InitializeBackgroundWorkers()
{
    for (int f = 0; f < maxThreads; f++)
    {
        threadArray[f] = new BackgroundWorker();
        threadArray[f].DoWork +=
            new DoWorkEventHandler(backgroundWorkerFiles_DoWork);
        threadArray[f].RunWorkerCompleted +=
            new RunWorkerCompletedEventHandler(backgroundWorkerFiles_RunWorkerCompleted);
        threadArray[f].ProgressChanged +=
            new ProgressChangedEventHandler(backgroundWorkerFiles_ProgressChanged);
        threadArray[f].WorkerReportsProgress = true;
        threadArray[f].WorkerSupportsCancellation = true;

    }
}


Each BackgroundWorker object has three event handlers assigned to it:
  1. backgroundWorkerFiles_DoWork - this delegate method runs the process on a background thread
  2. backgroundWorkerFiles_RunWorkerCompleted - this delegate method is called once the "DoWork" method has completed
  3. backgroundWorkerFiles_ProgressChanged - this delegate method is used to pass information back to the calling thread, for example to report progress to the GUI thread.

These delegate methods are discussed at length on the MSDN site, so I won't go into them in detail here other than to show you this simple code outline:

private void backgroundWorkerFiles_DoWork(object sender, DoWorkEventArgs e)
{
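    // Just for fun - increment the count of the number of threads we are currently using.  Can show this number in the GUI.
    // DoWork runs on a background thread, so use Interlocked rather than a plain ++.
    Interlocked.Increment(ref _numberBackGroundThreads);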
    
    // Get argument from DoWorkEventArgs argument.  Can use any type here with a cast
    int myProcessArgument = (int)e.Argument;

    // "ProcessFile" is the name of my method that does the main work.  Replace with your own method!
    // Can return results from this method, i.e. a status (OK, FAIL etc)
    e.Result = ProcessFile(myProcessArgument);
}

private void backgroundWorkerFiles_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    // Use this method to report progress to GUI
}

private void backgroundWorkerFiles_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    // First, handle the case where an exception was thrown.
    if (e.Error != null)
    {
        MessageBox.Show(e.Error.Message);
    }
    else
    {
        // For fun - print out the result of the ProcessFile() method.
        // (Reading e.Result when e.Error is set would throw, hence the else.)
        Debug.Print(e.Result.ToString());
    }

    // Just for fun - decrement the count of threads
    Interlocked.Decrement(ref _numberBackGroundThreads);
}
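
The ProgressChanged handler above is left empty, so here is a rough sketch of how it could be wired up. The worker calls ReportProgress() from inside DoWork, and the handler reads e.ProgressPercentage. The class name "ProgressSketch", the Run() method and the simulated work (a short sleep per step) are my own assumptions for illustration, and the sketch assumes a console application rather than a GUI.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Threading;

// Hypothetical demo class, not part of the file-processing code above.
public static class ProgressSketch
{
    // Runs one worker through `steps` simulated units of work and returns
    // every percentage the ProgressChanged handler received, sorted ascending.
    public static List<int> Run(int steps)
    {
        var received = new List<int>();
        // One signal per progress report, plus one for completion.
        var pending = new CountdownEvent(steps + 1);

        var worker = new BackgroundWorker();
        worker.WorkerReportsProgress = true; // ReportProgress throws without this

        worker.DoWork += (sender, e) =>
        {
            var self = (BackgroundWorker)sender;
            for (int i = 1; i <= steps; i++)
            {
                Thread.Sleep(10); // stand-in for real work
                self.ReportProgress(i * 100 / steps);
            }
        };

        worker.ProgressChanged += (sender, e) =>
        {
            // In a GUI app this runs on the GUI thread, so a progress bar could
            // be updated here. In a console app it runs on a pool thread and
            // reports may arrive out of order, hence the lock.
            lock (received) { received.Add(e.ProgressPercentage); }
            pending.Signal();
        };

        worker.RunWorkerCompleted += (sender, e) => pending.Signal();

        worker.RunWorkerAsync();
        pending.Wait();   // all reports and the completion event have arrived
        received.Sort();
        return received;
    }
}
```

Calling ProgressSketch.Run(4) should hand back the four percentages reported by the worker. In a real GUI you would update a control inside the handler instead of collecting values in a list.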

OK, now comes the fun part. The following code shows a loop through a large number of items. Rather than run ProcessFile() for each item in turn, we choose an unused thread to run it in. This lets the loop step on to the next item, which is in turn allocated a free thread.

// Some process with many iterations
for(int f = 0; f < 100000; f++)
{

    //Use the thread array to process each iteration
    //choose the first unused thread.
    bool fileProcessed = false;
    while (!fileProcessed)
    {
        for (int threadNum = 0; threadNum < maxThreads; threadNum++)
        {
            if (!threadArray[threadNum].IsBusy)
            {   // This thread is available
                Debug.Print("Starting thread: " + threadNum);
        
                //Call the "RunWorkerAsync()" method of the thread.  
                //This will call the delegate method "backgroundWorkerFiles_DoWork()" method defined above.  
                //The parameter passed (the loop counter "f") will be available through the delegate's argument "e" through the ".Argument" property.
                threadArray[threadNum].RunWorkerAsync(f);
                fileProcessed = true;
                break;
            }
        }
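        //If all threads are being used, sleep awhile before checking again.
        //(Run this loop off the GUI thread: in a WinForms app the completion callbacks that
        //clear IsBusy are posted to the GUI thread and cannot run while it is sleeping.)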
        if (!fileProcessed)
        {
            Thread.Sleep(50);
        }
    }
}

Using this technique, it's easy to create 2, 10, or even 100 threads to process the loop iterations asynchronously, which can speed up execution considerably.  Simply change the value of "maxThreads" to whatever you need, and benchmark to find the best value for your machine!
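
To compare different values of "maxThreads", something like the following console sketch could be used. It repeats the dispatch loop above in a self-contained form and times a batch with a Stopwatch. The names "WorkerPoolSketch", "RunBatch" and "TimeBatchMs" are hypothetical, ProcessFile() here only simulates work with a short sleep, and the sketch assumes a console application with no GUI thread.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Threading;

// Hypothetical benchmark harness; ProcessFile only simulates real work.
public static class WorkerPoolSketch
{
    // Stand-in for the real per-file work: sleeps briefly, returns the item number.
    static int ProcessFile(int item)
    {
        Thread.Sleep(20);
        return item;
    }

    // Processes `itemCount` items on `maxThreads` workers and returns
    // how many of them completed without an exception.
    public static int RunBatch(int maxThreads, int itemCount)
    {
        int finished = 0;   // every completed run, successful or not
        int succeeded = 0;  // runs that ended without an exception

        var workers = new BackgroundWorker[maxThreads];
        for (int i = 0; i < maxThreads; i++)
        {
            workers[i] = new BackgroundWorker();
            workers[i].DoWork += (sender, e) =>
                e.Result = ProcessFile((int)e.Argument);
            workers[i].RunWorkerCompleted += (sender, e) =>
            {
                if (e.Error == null) Interlocked.Increment(ref succeeded);
                Interlocked.Increment(ref finished);
            };
        }

        for (int item = 0; item < itemCount; item++)
        {
            bool started = false;
            while (!started)
            {
                // Hand the item to the first worker that is not busy.
                foreach (BackgroundWorker worker in workers)
                {
                    if (!worker.IsBusy)
                    {
                        worker.RunWorkerAsync(item);
                        started = true;
                        break;
                    }
                }
                if (!started) Thread.Sleep(5); // all busy, check again shortly
            }
        }

        // The loop above only starts the work, so wait for the last runs to end.
        while (Volatile.Read(ref finished) < itemCount) Thread.Sleep(5);
        return succeeded;
    }

    // Returns the wall-clock time in milliseconds for one batch.
    public static long TimeBatchMs(int maxThreads, int itemCount)
    {
        Stopwatch watch = Stopwatch.StartNew();
        RunBatch(maxThreads, itemCount);
        return watch.ElapsedMilliseconds;
    }
}
```

Calling TimeBatchMs() with, say, 1, 4 and 16 threads for the same item count gives a first impression of where adding threads stops paying off. The real numbers depend on the machine and on what ProcessFile() actually does, so treat any timings from the simulated work as illustrative only.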

Wednesday 6 January 2010

What is the difference between a URI and a URL?

If you get confused between URLs (Uniform Resource Locators) and URIs (Uniform Resource Identifiers), then simply think of a URL as a special type of URI: one that says where a resource is and how to reach it, such as an address on the WWW.

The term "URL" is now deprecated ("deprecated" being computing jargon for "marked for removal"), and "URI" is now the accepted term in computer science. However, with the rise of the internet, "URL" has become entrenched in the English language and is widely used and understood by people who are not computer scientists. It's not possible for any group of individuals to deprecate words (although plenty have tried...), so "URL" will probably long outlive its technical definition.

At the end of the day, I expect that if you should use the term "URL" when you really mean "URI" everybody will know what you mean!
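
To make the distinction concrete, here is a small sketch using the .NET Uri class, which accepts any absolute URI and not just web addresses. The class name "UriVersusUrl", its two methods and the sample identifiers in the note below are made up for illustration.

```csharp
using System;

// Hypothetical demo class; names are illustrative only.
public static class UriVersusUrl
{
    // Returns the scheme of any absolute URI, whether or not it is a locator.
    public static string SchemeOf(string uriText)
    {
        return new Uri(uriText).Scheme;
    }

    // True when the URI is a web URL, i.e. its scheme is http or https.
    public static bool IsWebUrl(string uriText)
    {
        Uri uri = new Uri(uriText);
        return uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
    }
}
```

Both "http://www.example.com/index.html" and "urn:isbn:0451450523" are URIs and both parse, but only the first tells you where to fetch the resource, so only the first passes IsWebUrl().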

Tuesday 5 January 2010

Strip and manipulate a URL by breaking it into segments (.NET 2.0)

I recently required a method that would take a string that contains a URL (href), and another string that contains a root section of this URL (root), and return a string that contains the remainder of the URL (i.e. the section of href that remains once the root has been removed).

To make matters more complicated, the href parameter could have some unusual features.  Because this URL was pointing to content created by users in a Content Management System (CMS), some segments of the URL contained trailing or leading whitespace (segments being the bits of the URL between the slashes).  This whitespace is fine in the CMS, but my method had to strip it in order to return a canonical URL.

Fortunately .NET 2.0 onwards provides us with the Uri class.  This has lots of fabulous methods and properties, but in this example I shall use it to:
  1. turn the parameter strings "root" and "href" URLs into canonical URIs
  2. break down the "href" parameter into segments, 
  3. ignore the segments that exist in the "root" parameter,
  4. strip leading and trailing whitespace from the remaining segments
  5. return the canonicalised section of the URL
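
Before the method itself, a quick sketch of what steps 2 and 4 involve. The Segments property returns each piece of the path in escaped form and with its trailing slash still attached, so both have to be dealt with before trimming. The class "SegmentsSketch" and its helper names are hypothetical, as is the sample URL in the note below.

```csharp
using System;

// Hypothetical helper class for illustration only.
public static class SegmentsSketch
{
    // Returns the path segments of an absolute URL exactly as Uri reports them.
    public static string[] SegmentsOf(string url)
    {
        return new Uri(url).Segments;
    }

    // Unescapes one segment, trims surrounding whitespace, and puts back
    // the trailing slash if the segment had one.
    public static string CleanSegment(string segment)
    {
        string text = Uri.UnescapeDataString(segment);
        bool hasSlash = text.EndsWith("/");
        return text.TrimEnd('/').Trim() + (hasSlash ? "/" : "");
    }
}
```

For "http://example.com/docs/my folder /report.txt" the third segment comes back as "my%20folder%20/", and CleanSegment() turns it into "my folder/". Trimming the escaped segment directly would leave it unchanged, because the whitespace is encoded as "%20" and sits in front of the slash.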


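string GetRootStrippedURI(string root, string href)
{
    Uri fileUri = new Uri(Uri.UnescapeDataString(href));
    Uri rootUri = new Uri(Uri.UnescapeDataString(root));

    // Build the return string from the segments that follow the root
    string strippedExtension = "";

    // Loop through segments not in the root and clean them up.
    // Note: the "- 1" bound leaves out the final segment (usually the file name);
    // remove it if the file name should be part of the result.
    for (int i = rootUri.Segments.Length; i < fileUri.Segments.Length - 1; i++)
    {
        // Segments come back escaped ("%20" for a space) and keep their trailing
        // slash, so unescape and set the slash aside before trimming whitespace.
        // The returned string is therefore unescaped.
        string segment = Uri.UnescapeDataString(fileUri.Segments[i]);
        bool hasSlash = segment.EndsWith("/");
        strippedExtension += segment.TrimEnd('/').Trim() + (hasSlash ? "/" : "");
    }
    return strippedExtension;
}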