Wednesday, October 9, 2013

QA Automation a la Westinghouse

No single technological advance meant more for a maturing railroad industry than the invention of the air brake…. http://explorepahistory.com/hmarker.php?markerId=1-A-1A9

This, like the story of the invention of the C++ programming language, is an occasion where automation made a hugely significant impact; both are also stories that are relatively open, providing great insight into the minds of great automators.

George Westinghouse, like Nikola Tesla (whom Westinghouse employed), did not invent from nothing or merely work hard. Where Tesla decided upon AC after seeing how obviously poor the performance of the brush-based DC solutions must be, Westinghouse first decided upon an engineer-driven railroad braking system: much as a horse-drawn carriage driver pulls the reins, Westinghouse envisioned the engineer pulling reins of some sort to apply brakes on every car (this couldn't be done with a physical connection, i.e., multiple levers pulling a metal shaft per car). Westinghouse's great epiphany toward solving the problems of the manual solution, brakemen running atop the cars, the scaling cost, the cost in lives, and the poor stopping performance, did not come from banging his head on a train-specific problem; it came upon hearing news of an air-driven drill employed in Italy to excavate minerals. The simplicity of the invention is, in hindsight, awesome. The length of time to implement it, less so. The end result, though: the train industry scaled, and in the expansion it was able to remove the need for men to run atop the train cars, saving lives and, more importantly to the owners, increasing the number of cars at the command of the engineer, who was nearest the upcoming rail obstacles.

With QA being the engineers with the clearest sight of obstacles (and opportunities since we aren’t on a rail), automation is our salvation.

Saturday, August 17, 2013

Interview Questions - Teaching, !Yet Another For Loop

Q: How would you teach loops to someone who is new to programming?
After the likely first answer:
Action<int> loopAction =
  i => { Console.WriteLine("Current Value: {0}", i); };
for (var i = 0; i <= 10; i++) {
  loopAction(i);
}
Prompt for a loop that uses a non-numerical loop-control variable (one possible answer is sketched below).  This should give the developer the opportunity to show that they truly understand the nuts and bolts of what we do.
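One possible answer, as a rough sketch (this snippet is mine, not from any candidate; note the loop-control variable is a string, not a number):

using System;
using System.IO;

class LoopTeachingExample
{
  static void Main()
  {
    using (var reader = new StringReader("alpha\nbeta\ngamma"))
    {
      // 'line' controls the loop and is never a number; the loop ends when the reader runs dry.
      for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
      {
        Console.WriteLine("Current Value: {0}", line);
      }
    }
  }
}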

Friday, August 9, 2013

Interview Questions - i18n and Beyond YU

Q: While updating an ASP.NET Commerce Starter Kit (CSK) implementation, you encounter the following:
    <asp:dropdownlist id="ddlCountry" runat="server">
        <%/* snip - other countries, for brevity snip */%>
        <asp:listitem value="YU">Yugoslavia</asp:listitem>
    </asp:dropdownlist>
Similarly, the following is in the code behind:
    public enum Country
    {
        /* snip - other countries, for brevity snip */
        [Description("Yugoslavia")]
        YU = 235,
    }
What would make this okay?
If it is not okay, what would you propose to correct this?
Background: Beginning in 1991, Yugoslavia disintegrated, and with it went its status as an internationally recognized country.
The CSK implementation was originally contracted after 1991. Does this fact change the way you will approach the solution? If so, how?
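One possible correction, sketched only as a starting point (the CountryOption type and CountryListBuilder class are hypothetical, not part of the CSK): derive the country list from the framework's ISO 3166 region data instead of a hard-coded enum, so that a country appearing or disappearing becomes a data update rather than a code change.

using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class CountryOption
{
  public string TwoLetterCode { get; set; }
  public string DisplayName { get; set; }
}

public static class CountryListBuilder
{
  // Builds the drop-down contents from the cultures the framework knows about.
  public static IEnumerable<CountryOption> BuildCountryList()
  {
    return CultureInfo.GetCultures(CultureTypes.SpecificCultures)
      .Select(culture => new RegionInfo(culture.Name))
      .GroupBy(region => region.TwoLetterISORegionName)
      .Select(group => new CountryOption
      {
        TwoLetterCode = group.Key,
        DisplayName = group.First().EnglishName
      })
      .OrderBy(option => option.DisplayName);
  }
}

This only addresses the drop-down; what to do with the YU = 235 values already persisted is the more interesting half of the question.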

Friday, June 14, 2013

Worthy as in Ladybugs

"Interesting and helpful information. If at all you are free any time, would like to understand more on AWS side."
- Contractor who is charging the company I work for $$ per hour

Programmatic access to ephemeral ~hardware resources that has been available to the public since 2004, with documentation well written and hedged by the company and community, hmm, yes, lemme spend some of my free time regurgitating enough of it to be dangerous.

There is a part in all of us, our inner blowhard, that takes pride in receiving adoration or praise for presenting knowledge, whether it is our own or, as in this case, the worthy product of quite a few others.  We see this in tech companies' efforts to re-publish the web on corporate wikis (write the novel bits instead).  We see this in hallway "soapbox" sessions.  There is no denying the satisfaction that comes from people listening, and earnestly, to you.

But free?!  Which free is intended here?  Afaic the exchange of third-person knowledge is neither free as in beer nor free as in speech.  Re-presenting a non-novel concept in a manner that rescues people from their own inability to seek and acquire information for themselves is quite costly.

But there is a place for such a thing, and it lies in a similar juxtaposition: the exchange of knowledge should not be "Worthy as in Snickers", but "Worthy as in ladybugs".

Snickers satisfies.  If you don't like chocolate, peanuts, nougat, mouth-watering caramel, and the care that goes into making this product, then substitute for Snickers another product that is made with quality ingredients and is the labor of a skilled team of artisans, yet available to the masses: substitute hand-crafted ale.

Ladybugs are beneficial.  If you are squeamish about "bugs" (and who in software isn't; I minored in Entomology, so not I :D ) and so can't appreciate the self-propagating, aphid-eating beauties that are the "gateway insect" for so many children who grow up to have a healthy relationship with their natural world, okay, substitute yeast.

In the exchange of pure information, with no working product involved, worthy as in ladybugs should clearly stand out as the one able to meet the "free" price tag.  So when should I give my ladybugs freely?  Afaic, when they are going to a good garden: one that may be overrun in one corner with aphids, but not one whose gardener allowed the whole garden to be overrun, nor one who refuses to devote time, experiment, or read the literature (in other words, one who is clearly not a gardener), and surely to one who is worthy, one who will go on to ladybug another gardener :)

Thursday, January 31, 2013

IEnumerable Still Misunderstood

With .NET 4.5 released, adopted, and leaking Task all over the code, it shocked me enough to write when a recent set of interviews showed that IEnumerable, the basis for .NET 3.5's biggest feature, LINQ, is still misunderstood.  Three out of three recent contract candidates equated IEnumerable with List and, when pressed, failed over to "Well, then Collection."  If you fail to understand what is wrong with this belief, read on.

The interview question that consistently brings out the mistaken belief that IEnumerable is a data structure follows:
We are building a distributed counter system and to reduce client messages the system receives messages such as "INC 10 /exceptions/nullReferenceExceptions" which will INCrement by 10 the counters for "exceptions" and "nullReferenceExceptions".  Don't worry about the overall implementation, we just need the bit where the counter "path" is multiplexed (if needed: meaning that a single message turns into multiple messages).  How would you code the method to extract the parts of the path?  Let me start by providing the method signature:
string[] ExtractPathParts(string path);

Besides highlighting whether the candidate prefers for or foreach, the answer also highlights whether the candidate understands IEnumerable and is willing to argue for changing the method signature.  Unfortunately, even with prompting, the candidates fail to understand (yes, this could be due to selection bias :/ ).

So what is IEnumerable?  IEnumerable is far from a data structure; it is an adapter that lets data structures, or plain code, appear like a data structure.  The latter was more the reason for the development of LINQ and its core bit, IEnumerable.  This was the most significant early push in .NET Framework development toward bringing functional programming more to the fore.
The following is an example where code, not an existing data structure, uses IEnumerable to act like a data structure:

public IEnumerable<string> ExtractPathParts(string path)
{
  if (!string.IsNullOrEmpty(path))
  {
    const char slash = '/';
    const char whack = '\\';

    var startIndex = 0;
    var pathLength = path.Length;
    for (var i = 0; i < pathLength; ++i)
    {
      var ch = path[i];
      if (ch == slash || ch == whack)
      {
        var subLength = i - startIndex; //<< length of the part before this separator
        if (subLength > 0) //<< skips empty parts (leading or doubled separators)
        {
          yield return path.Substring(startIndex, subLength);
        }
        startIndex = i + 1; //<< resume scanning just past the separator
      }
    }

    if (startIndex < pathLength) //<< trailing part with no separator after it
    {
      yield return path.Substring(startIndex);
    }
  }
}

The above, while not DRY, is focused on yield return, which is the important bit in understanding IEnumerable and its role in LINQ.  An alternative implementation follows:

public string[] ExtractPathParts(string path)
{
  if (string.IsNullOrEmpty(path)) return new string[] {};
  return path.Split('/', '\\').Where(it => !string.IsNullOrEmpty(it)).ToArray();
}

The above uses LINQ, but misses the point.  LINQ was developed to allow developers to reduce memory allocations, trading CPU, which is increasingly plentiful with the growth in processors per commodity server and per consumer desktop machine.  Memory cost is not what physically bounds consumers or systems engineers, and 64-bit processors are making memory more addressable, so more available; but neither of these is the reason memory allocations should be avoided where possible.  The reason to avoid memory allocations is that .NET, like other garbage-collection-based memory management models, suffers a blocking point when garbage collection occurs.
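As a small illustration of that point (my own sketch, assuming the iterator version above and a using System.Linq; directive), the iterator defers all work until enumeration, so pulling just the first part never builds an intermediate array:

var path = "/exceptions/nullReferenceExceptions";

// Scans only far enough to produce the first part; no array is allocated.
var firstPart = ExtractPathParts(path).First(); // "exceptions"

// Enumerating the rest still streams one part at a time.
foreach (var part in ExtractPathParts(path))
{
  Console.WriteLine(part); // exceptions, then nullReferenceExceptions
}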

I hope that this article helps to make .NET Framework 3.5 understood more broadly.  If not, please pose questions.

Friday, December 14, 2012

Task is NOT a Panacea - Cancelled in the void


In a recent code review, I spotted the following code pattern, which, like a deadlock, is hard to pin down but surely prone to issue:
protected virtual void putIntoLocalCache(string cacheKey, T obj)
{
  //put object in local cache
  var task = Task.Factory.StartNew((object item) =>
  {
    localCache.Put(cacheKey, item);
  }, obj);
}

For background's sake, this method is within a library that is likely called from within IIS, on either an ASP.NET worker thread or a thread from WCF's I/O thread pool, either of which can be cancelled, thus cancelling any task it controls (especially ones without anyone waiting on them).

Imagine that there is an FxCop rule that will fail the build any time a method starts a task and does not either wait on it, return it, or return a task that waits on it, such as through Task.WhenAll().

Failing to heed this rule will result in tasks being cancelled, and thus in data losses or inconsistencies that we will spend a great deal of time on.  AND this, like a deadlock issue, only gets worse with scale.

The guidance that this code brings up follows:
  • First, do no harm (to the data).  If an optimization can introduce data loss, do not perform the optimization.
  • Yes, you can optimize for the writer, but first, optimize for the reader.  Generally the writer has a much larger vested interest in the data, so is more apt to wait for the certainty of writing that data.
  • Wherever the writer is willing to fire and forget, write asynchronously, but in a manner that is visible and manageable, i.e., a higher-level task or a queue-entered workflow.
  • Read-through cache does NOT mean write to all caching layers.  Read-through means write to the remote store, then invalidate the caching layers (inverting these steps results in a race condition); writing to cache concurrently with writing to the source of truth is a race-condition-prone implementation.  That said, if the remote is very distant in terms of time between the write and the eventual read back, an optimization may be introduced to write to the caching layers, but it must be accompanied by a short TTL, which should be equivalent to the eventual-consistency SLA timespan (see the sketch after this list).
  • Ensure that the cache supports TTL or forces a short-lived TTL.  Cache is not a permanent store.  Readthrough to the source of truth must be fast enough, meeting the SLA.  Cache is just an optimization.
  • Ensure that the cache can be flushed by a reconciliation process (a single-item delete is exposed).
  • If the writer is not willing to fire and forget and a write operation exceeds 3s (pluggable SLA number) under expected running conditions, break the write down into multiple component writes, using statuses on the component resources to determine the status of the composite resource.  This is basically a user-interactive workflow, ie a wizard.
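To make the read-through bullet above concrete, here is a rough sketch; every type in it (ISourceOfTruth, ILocalCache, Payload) is a hypothetical stand-in, not anything from the reviewed code:

using System;
using System.Threading.Tasks;

public class Payload { public string Data { get; set; } }

public interface ISourceOfTruth
{
  Task WriteAsync(string key, Payload value);
  Task<Payload> ReadAsync(string key);
}

public interface ILocalCache
{
  bool TryGet(string key, out Payload value);
  void Put(string key, Payload value, TimeSpan ttl);
  void Remove(string key);
}

public class ReadThroughCache
{
  private readonly ISourceOfTruth store;
  private readonly ILocalCache cache;
  // Short TTL, roughly the eventual-consistency SLA; the cache is never the permanent store.
  private static readonly TimeSpan ShortTtl = TimeSpan.FromSeconds(30);

  public ReadThroughCache(ISourceOfTruth store, ILocalCache cache)
  {
    this.store = store;
    this.cache = cache;
  }

  public async Task WriteAsync(string key, Payload value)
  {
    await store.WriteAsync(key, value); // 1) write the source of truth first
    cache.Remove(key);                  // 2) then invalidate; inverting these steps invites a race
  }

  public async Task<Payload> ReadAsync(string key)
  {
    Payload cached;
    if (cache.TryGet(key, out cached)) return cached;

    var value = await store.ReadAsync(key);
    cache.Put(key, value, ShortTtl);    // read-through populates the cache on the way back
    return value;
  }
}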

In this case, I would simply await the cache write.  The cache write is fast enough.  And if it is not, that is the problem that should be addressed.  If this was writing to a remote store that is slow, ie AWS Glacier storage, then I’d figure a way to async it better.

The alternative is to bleed Task everywhere.  The ICache interface (well, actually the public method that calls putIntoLocalCache()) should not return void here; it should return Task, with the caller waiting only when a sync point needs to be introduced, i.e., when the result is needed.  I'm not a big fan of bleeding internals, but we'll save that for another day.
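A minimal sketch of that shape, assuming the same localCache field as the reviewed code (the Async suffix and the commented-out caller are mine):

protected virtual Task putIntoLocalCacheAsync(string cacheKey, T obj)
{
  // Return the task instead of dropping it; the caller decides where the sync point is.
  return Task.Run(() => localCache.Put(cacheKey, obj));
}

// Caller, introducing the sync point only where completion actually matters:
// await putIntoLocalCacheAsync(cacheKey, obj);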

Monday, May 2, 2011

The Batch is Back - Open Scheduled Task Engine - Resources + Processes


Resources

Company

Name: string (unique)
QoSLevel: integer (ascending, sets the number of times the company is slotted for execution)

Solution

Company: CompanyRef
Name: string (unique { Company, Name })
QoSLevel: integer (ascending, sets the number of times the solution is slotted for execution)
SuccessUri: Uri to send Success messages to
SuccessPeriod: number of days to retain SuccessfulActivity records in the system; when the period has expired, the batch of activities is sent to the SuccessUri
DeadLetterUri: Uri to send DeadLetter messages to
DeadLetterPeriod: number of days to retain FailedActivity (dead letter) records in the system; when the period has expired, the batch of activities is sent to the DeadLetterUri

ActivityType

Solution: SolutionRef
Name: string (unique { Solution, Name })
Verb: HTTP verb { POST, PUT, DELETE, GET }
UriPattern: Uri pattern, e.g., "http://api.myspace.com/music/song/{SongId}/republish"; when executing, fields of the Parameter fill the pattern
ParameterExpected: description of the parameter expected, used for documenting solutions
Retries: number of times to retry failures of the activity; when the retry count reaches or drops below zero, the activity is considered a dead letter
ObjectIdRef: field or fields of the Parameter that identify the object, used for batch operations, e.g., updating the Uri of all Activities referencing an object

RecurringActivity

Year: year
Month: month
Day: day of month
DayOfWeek: day of week { 0 = Sunday, 1 = Monday, ..., 6 = Saturday }
Hour: hour
Minute: minute
Second: second
ActivityType: ActivityTypeRef
Parameter: object to fill the Activity UriPattern and be PUT or POST'd
Retries: number of times to retry execution of the Activity

Date/time algebra a la cron is used.  See http://www.scrounge.org/linux/cron.html

PointInTimeActivity

Date: DateTime at which to execute the activity
ActivityType: ActivityTypeRef
Parameter: object to fill the Activity UriPattern and be PUT or POST'd
Retries: number of times to retry execution of the Activity

SuccessfulActivity

DateExecuted: DateTime at which the activity was executed
ActivityType: ActivityTypeRef
Parameter: object to fill the Activity UriPattern and be PUT or POST'd
LogDetails: details about the Activity execution

FailedActivity

DateExecuted: DateTime at which the activity was executed
ActivityType: ActivityTypeRef
Parameter: object to fill the Activity UriPattern and be PUT or POST'd
LogDetails: details about the Activity execution

Processes

PointInTimeActivity Executor

Point-in-time activities are polled every 10 seconds; when an activity's scheduled execution time has come or passed, the activity is executed.  Activity execution polling is distributed, using company/solution/activityType to partition the work.
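One way that partitioning could look, as an illustrative sketch only (the Partitioner class and its names are mine, not part of the design above):

using System;

public static class Partitioner
{
  // Deterministically assigns an activity to one of N pollers by hashing its
  // company/solution/activityType key.
  public static int PartitionFor(string company, string solution, string activityType, int pollerCount)
  {
    var key = company + "/" + solution + "/" + activityType;
    unchecked
    {
      var hash = 17;
      foreach (var ch in key)
      {
        hash = hash * 31 + ch;
      }
      return Math.Abs(hash % pollerCount);
    }
  }
}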

RecurringActivity Executor

A RecurringActivity -> PointInTimeActivity processor.  The recurring activity date/time algebra is applied, creating point-in-time activities based on the recurrence schedule.  This process is executed every minute.
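A hedged sketch of the per-minute match such an executor could perform; the RecurringActivity class below mirrors the resource fields above, with null standing in for cron's '*', and is illustrative only:

using System;

public class RecurringActivity
{
  // null means "any value", like '*' in cron
  public int? Year, Month, Day, Hour, Minute;
  public DayOfWeek? DayOfWeek;
}

public static class Recurrence
{
  // Called once per minute; a match means "create a PointInTimeActivity for this occurrence".
  public static bool Matches(RecurringActivity r, DateTime now)
  {
    return (r.Year      == null || r.Year      == now.Year)
        && (r.Month     == null || r.Month     == now.Month)
        && (r.Day       == null || r.Day       == now.Day)
        && (r.DayOfWeek == null || r.DayOfWeek == now.DayOfWeek)
        && (r.Hour      == null || r.Hour      == now.Hour)
        && (r.Minute    == null || r.Minute    == now.Minute);
    // Seconds are omitted because the executor itself only runs once per minute.
  }
}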

DeadLetter LogRoller

FailedActivity records are polled every day; when an activity's execution has aged beyond the retention period (DeadLetterPeriod), the activity and its log details are sent to the configured DeadLetterUri.

Success LogRoller

SuccessfulActivity records are polled every day; when an activity's execution has aged beyond the retention period (SuccessPeriod), the activity and its log details are sent to the configured SuccessUri.