Monday, May 2, 2011

The Batch is Back - Open Scheduled Task Engine - Resources + Processes


Resources

Company

Name
string (unique)
QoSLevel
integer (ascending, sets number of times the company is slotted for execution)
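One way to read "number of times slotted for execution" is as a weight in a scheduling round. The sketch below assumes that reading: each company appears in a round as many times as its QoSLevel, interleaved so that a high-QoS company does not monopolize the start of the round. The function name `build_slots` and the company names are hypothetical.

```python
def build_slots(qos_levels: dict[str, int]) -> list[str]:
    """Build one scheduling round where each name appears QoSLevel times."""
    remaining = dict(qos_levels)
    slots = []
    # Take one slot per name per pass until every name has used its share.
    while any(count > 0 for count in remaining.values()):
        for name in sorted(remaining):
            if remaining[name] > 0:
                slots.append(name)
                remaining[name] -= 1
    return slots


round_of_slots = build_slots({"acme": 1, "myspace": 3})
```

The same idea would apply one level down, to the QoSLevel of each Solution within a Company's slot.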

Solution

Company
CompanyRef
Name
string (unique { Company, Name } )
QoSLevel
integer (ascending, sets number of times the solution is slotted for execution)
SuccessUri
Uri to send Success messages
SuccessPeriod
Number of days to retain a SuccessfulActivity in the system.  When the period has expired, the batch of activities is sent to the SuccessUri.
DeadLetterUri
Uri to send DeadLetter messages
DeadLetterPeriod
Number of days to retain a dead-letter FailedActivity in the system.  When the period has expired, the batch of activities is sent to the DeadLetterUri.

ActivityType

Solution
SolutionRef
Name
string (unique { Solution, Name } )
Verb
HTTP verb { POST, PUT, DELETE, GET }
UriPattern
Uri pattern, e.g. “http://api.myspace.com/music/song/{SongId}/republish”.  When executing, fields of the Parameter fill the pattern.
ParameterExpected
Description of parameter expected, used for documenting solutions
Retries
Number of times to retry failures of the activity.  When the retry count reaches or drops below zero, the activity is considered a dead letter.
ObjectIdRef
Field or fields of the Parameter that identify the object, used for batch operations, e.g. updating the Uri of all Activities referencing an object.
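Filling a UriPattern from a Parameter could look roughly like the following. This is a minimal sketch, not the engine's actual code: the function name `fill_uri_pattern` is hypothetical, the Parameter is assumed to be a flat dictionary, and percent-encoding the substituted values is an assumption.

```python
import re
from urllib.parse import quote

# Matches placeholders such as {SongId}.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_uri_pattern(pattern: str, parameter: dict) -> str:
    """Replace each {Field} in the pattern with the matching Parameter field."""

    def substitute(match: re.Match) -> str:
        field = match.group(1)
        if field not in parameter:
            raise KeyError(f"Parameter is missing field {field!r}")
        # Percent-encode the value so it is safe inside a URI path segment.
        return quote(str(parameter[field]), safe="")

    return _PLACEHOLDER.sub(substitute, pattern)


uri = fill_uri_pattern(
    "http://api.myspace.com/music/song/{SongId}/republish",
    {"SongId": 42},
)
# uri is "http://api.myspace.com/music/song/42/republish"
```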

RecurringActivity

Year
Year
Month
Month
Day
Day of month
DayOfWeek
Day of Week { 0 = Sunday, 1 = Monday, ..., 6 = Saturday }
Hour
Hour
Minute
Minute
Second
Second
ActivityType
ActivityTypeRef
Parameter
Object to fill the Activity UriPattern and be PUT or POST’d.
Retries
Number of times to retry execution of the Activity
Date/time algebra a la cron is used.  See http://www.scrounge.org/linux/cron.html
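As an illustration of cron-style matching against these fields, the sketch below checks whether a given moment satisfies a recurrence. It assumes each field is a string holding `*`, a single number, a comma-separated list, or a `low-high` range; the real field syntax is not specified here. Unlike classic cron, it requires Day and DayOfWeek to both match. The function names are hypothetical.

```python
from datetime import datetime


def field_matches(spec: str, value: int) -> bool:
    """Return True if value satisfies a '*', 'n', 'a,b,c' or 'low-high' spec."""
    if spec == "*":
        return True
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def recurrence_matches(recurrence: dict, moment: datetime) -> bool:
    """Check every RecurringActivity field; a missing field is treated as '*'."""
    actual = {
        "Year": moment.year,
        "Month": moment.month,
        "Day": moment.day,
        # datetime.weekday() uses Monday = 0; the resource uses Sunday = 0.
        "DayOfWeek": (moment.weekday() + 1) % 7,
        "Hour": moment.hour,
        "Minute": moment.minute,
        "Second": moment.second,
    }
    return all(
        field_matches(str(recurrence.get(name, "*")), value)
        for name, value in actual.items()
    )


# Every Monday at 03:00:00.
mondays_at_three = {"DayOfWeek": "1", "Hour": "3", "Minute": "0", "Second": "0"}
```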

PointInTimeActivity

Date
DateTime to execute the activity
ActivityType
ActivityTypeRef
Parameter
Object to fill the Activity UriPattern and be PUT or POST’d.
Retries
Number of times to retry execution of the Activity

SuccessfulActivity

DateExecuted
DateTime the activity was executed
ActivityType
ActivityTypeRef
Parameter
Object to fill the Activity UriPattern and be PUT or POST’d.
LogDetails
Details about the Activity execution

FailedActivity

DateExecuted
DateTime the activity was executed
ActivityType
ActivityTypeRef
Parameter
Object to fill the Activity UriPattern and be PUT or POST’d.
LogDetails
Details about the Activity execution

Processes

PointInTimeActivity Executor

Point-in-time activities are polled every 10 seconds.  When an activity’s scheduled execution time has come or passed, the activity is executed.  Activity execution polling is distributed, using company/solution/activityType to partition.
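A single polling pass might be sketched as follows. Several details are assumptions made for illustration: the partition is chosen by hashing the company/solution/activityType key across a fixed number of workers, the `execute` callback returns True on success, and the retry count is decremented on each failure, with the activity becoming a dead letter once the count reaches zero or below. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PointInTimeActivity:
    company: str
    solution: str
    activity_type: str
    date: datetime
    retries: int


def partition_for(activity: PointInTimeActivity, worker_count: int) -> int:
    """Map company/solution/activityType to a worker with a stable hash."""
    key = f"{activity.company}/{activity.solution}/{activity.activity_type}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


def poll_once(pending, now, worker_index, worker_count, execute):
    """Run this worker's due activities once; return (successful, dead, pending)."""
    successful, dead_letters, still_pending = [], [], []
    for activity in pending:
        mine = partition_for(activity, worker_count) == worker_index
        if not mine or activity.date > now:
            still_pending.append(activity)  # not ours, or not yet due
        elif execute(activity):
            successful.append(activity)
        else:
            activity.retries -= 1
            if activity.retries <= 0:
                dead_letters.append(activity)  # retries exhausted
            else:
                still_pending.append(activity)  # retried on a later poll
    return successful, dead_letters, still_pending
```

A worker would call `poll_once` on its 10-second timer, passing a callback that issues the HTTP request for the activity.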

RecurringActivity Executor

RecurringActivity -> PointInTimeActivity processor.  Recurring activity date time algebra is applied, creating point-in-time activities based on the recurrence schedule.  This process is executed every minute.

DeadLetter LogRoller

FailedActivity records are polled every day.  When an activity was executed longer ago than the DeadLetterPeriod, the activity and its log details are sent to the configured DeadLetterUri.

Success LogRoller

SuccessfulActivity records are polled every day.  When an activity was executed longer ago than the SuccessPeriod, the activity and its log details are sent to the configured SuccessUri.
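Both log rollers share the same selection step, which could be sketched as below. The function name `roll_log` and the dictionary shape of an activity record are assumptions. Sending the expired batch to the configured Uri is left to the caller.

```python
from datetime import datetime, timedelta


def roll_log(activities: list, period_days: int, now: datetime):
    """Split activity records into (expired, retained) by retention period."""
    cutoff = now - timedelta(days=period_days)
    expired = [a for a in activities if a["DateExecuted"] < cutoff]
    retained = [a for a in activities if a["DateExecuted"] >= cutoff]
    return expired, retained


records = [
    {"DateExecuted": datetime(2011, 4, 20), "LogDetails": "timeout"},
    {"DateExecuted": datetime(2011, 4, 30), "LogDetails": "HTTP 500"},
]
# With a 7-day period on May 2, 2011, only the April 20 record has expired.
expired, retained = roll_log(records, period_days=7, now=datetime(2011, 5, 2))
```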

The Batch is Back - Open Scheduled Task Engine - Case Study, Myspace Music


Myspace's architecture provides for real-time publication of the music catalog to internal systems via a queue-based solution, Populator processors.  Changes to Artist, Album, and Song within the database are queued for publication to the cache.  The state of an object in cache closely resembles the record and related records in the database.  Business rules are applied in real-time within the Music API to ensure that the state of the object is rendered properly with regard to the viewer's territory and the current time.
Complexity within the Music API is mostly derived from the fact that the state of an Album and Song held within cache includes not only the currently valid state, but also upcoming states.  While rendering a single Album or Song, filtering down to the currently valid rights is not complex in itself, but is spread across multiple components owned by different teams.
Storing not only current, but also upcoming valid objects in cache is further complicated when rendering multiple results, e.g. an Artist's Songs or search results.  Complications arise because the indexes must either store all factors related to an object's validity or render holes within results.  The factors require SQL-like filtering, which renders the indexes moot.  Holes in results require skipping code, which is further complicated by mixed page-based and ordinal-based indexes.
In addition to internal systems, Myspace Music performs catalog syncs with multiple business partners.  While it is reasonable and somewhat performant to require internal systems to apply runtime rendering rules, it is not performant to require external systems to call back to Myspace to determine whether an object is valid.
By publishing only the currently valid state of Album and Song, Myspace internal and external systems need only apply business rules at publication time.  Internal indexes contain only valid objects, so queries carry no filtering cost, reducing CPU and increasing performance.  External systems likewise contain only valid objects, which not only removes the filtering cost and reduces CPU, but also requires no network call, increasing performance and reducing (read: removing) links to 404s, i.e. Myspace pages for Albums and Songs that cannot play in the end-user's territory at the time.
To accomplish this improvement to Album and Song publication, the existing Myspace Populator processors for add and update were altered to store only the currently valid state of the object in cache.  They were also altered to schedule point-in-time execution of a re-publication at the times that the object's state would change, the critical dates being when the rights change for each territory.  In addition to altering the Populator processors, endpoints were created, using Myspace ServiceLayer, for the activity of re-publishing an object.
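Deriving the re-publication times from an object's rights can be sketched as collecting every future rights boundary across territories. The rights record shape (`territory`, `start`, `end`) and the function name are hypothetical and stand in for whatever the catalog actually stores.

```python
from datetime import datetime


def republish_times(rights_windows: list, now: datetime) -> list:
    """Return the distinct future moments at which any territory's rights change."""
    times = set()
    for window in rights_windows:
        for boundary in (window["start"], window["end"]):
            # An open-ended window has no end boundary to schedule.
            if boundary is not None and boundary > now:
                times.add(boundary)
    return sorted(times)


windows = [
    {"territory": "US", "start": datetime(2011, 1, 1), "end": datetime(2011, 6, 1)},
    {"territory": "GB", "start": datetime(2011, 6, 1), "end": None},
    {"territory": "DE", "start": datetime(2011, 7, 15), "end": datetime(2012, 1, 1)},
]
schedule = republish_times(windows, now=datetime(2011, 5, 2))
```

Each returned time would become one PointInTimeActivity targeting the re-publish endpoint for the object.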

The Batch is Back - Open Scheduled Task Engine - Features at a Glance


Open Scheduled Task Engine (OSTE) is a service that enables solutions to defer execution of arbitrary activities which are executable through a single HTTP request.  OSTE allows for programmatic scheduling of activities at a point-in-time or at a recurrence pattern.  OSTE activities are segregated by company/activityType, allowing for differing quality of service (QoS) requirements and for audit, throttling, and isolation of activity execution.  OSTE activities can be defined to be retried n times.  On failure, an OSTE activity and its failure details are stored and accessible via the OSTE management interface.  Failed activities can be re-enqueued (after the cause of failure has been resolved).  Activities can be updated, for instance by updating the target URI.  Activities are defined by an HTTP verb and a URI pattern.  When instantiated, the URI pattern is filled with the object's properties and the object is PUT or POST'd (GET and DELETE do not provide for a message body).
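The verb and body rule can be illustrated with Python's standard `urllib.request.Request`. This is a sketch only: serializing the object as JSON is an assumption, the function name `build_request` is hypothetical, and the request is built but not sent.

```python
import json
from urllib.request import Request

VERBS_WITH_BODY = {"POST", "PUT"}
VERBS_WITHOUT_BODY = {"GET", "DELETE"}


def build_request(verb: str, uri: str, parameter: dict) -> Request:
    """Build the single HTTP request that executes an activity."""
    verb = verb.upper()
    if verb in VERBS_WITH_BODY:
        # The object is sent as the message body (JSON is assumed here).
        body = json.dumps(parameter).encode("utf-8")
        headers = {"Content-Type": "application/json"}
        return Request(uri, data=body, headers=headers, method=verb)
    if verb in VERBS_WITHOUT_BODY:
        return Request(uri, method=verb)
    raise ValueError(f"Unsupported verb: {verb}")


request = build_request(
    "PUT",
    "http://api.myspace.com/music/song/42/republish",
    {"SongId": 42},
)
```

Executing the activity would then amount to passing the request to `urllib.request.urlopen` and treating a non-success response as a failure.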

The Batch is Back - Open Scheduled Task Engine - Background


Job schedulers were the norm for enterprises' internal and especially external long-running and business-critical processes.  Processes executed by job schedulers are often termed batch processes.  They fell out of fashion when near-real-time and real-time processing became possible through queue-based services exposed as web services.
While real-time processing has succeeded in removing some use of job schedulers, all operating systems, most database management systems, and some middleware provide implementers job schedulers.  Job schedulers act, in this way, as a recurring triggering point for execution of activities when the driving force is the passage of time, not user-interaction.
Job schedulers provided by operating systems and database management systems are bound to a single machine, so they do not provide an apparent means to distribute execution across a set of machines.  Additionally, activity execution via traditional job schedulers is bound to the available commands of the execution environment, e.g. a Bash script via cron or a SQL command via SQL Server's job scheduler.  Job schedulers also do not provide an apparent means to schedule point-in-time execution, nor do they provide an API for programmatic scheduling.
The Open Scheduled Task Engine extends the capabilities of traditional job schedulers, making distributed execution and point-in-time scheduling apparent and available for programmatic access via a RESTful API.  The Open Scheduled Task Engine is open in that it does not execute processes itself; processes are distributed and executed by sending HTTP requests to services.