Monday, May 2, 2011

The Batch is Back - Open Scheduled Task Engine - Case Study, Myspace Music


Myspace's architecture publishes the music catalog to internal systems in real time through a queue-based solution, the Populator processors.  Changes to Artist, Album, and Song records in the database are queued for publication to the cache.  The state of an object in cache closely mirrors the record and its related records in the database.  Business rules are then applied at runtime within the Music API so that the object is rendered correctly for the viewer's territory and the current time.
Most of the complexity within the Music API comes from the fact that the cached state of an Album or Song includes not only the currently valid state, but also upcoming states.  When rendering a single Album or Song, filtering down to the currently valid rights is not complex in itself, but the logic is spread across multiple components owned by different teams.
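As a rough illustration of the single-object case, the filter might look like the sketch below. The RightsWindow type, its fields, and the function names are hypothetical and are not taken from the Music API; the sketch only assumes that each right applies to one territory and has a start time and an optional end time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class RightsWindow:
    """Hypothetical shape of one rights record held with a cached Album or Song."""
    territory: str
    starts_at: datetime
    ends_at: Optional[datetime]  # None means the right is open-ended


def current_windows(windows: List[RightsWindow], territory: str,
                    now: datetime) -> List[RightsWindow]:
    """Keep only the windows that apply to this territory at this moment."""
    return [
        w for w in windows
        if w.territory == territory
        and w.starts_at <= now
        and (w.ends_at is None or now < w.ends_at)
    ]


def is_playable(windows: List[RightsWindow], territory: str,
                now: datetime) -> bool:
    """An object is treated as valid if at least one window is in effect."""
    return bool(current_windows(windows, territory, now))
```

The check itself is small; the cost described above comes from every consumer having to repeat it, each with its own copy of the upcoming states.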
Storing upcoming as well as current states in cache complicates matters further when rendering multiple results, for example an Artist's Songs or search results.  The indexes must either store every factor that determines an object's validity or return results with holes.  Those factors require SQL-like filtering, which renders the indexes moot.  Results with holes require skipping code, which is further complicated by a mix of page-based and ordinal-based indexes.
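To make the "skipping code" concrete, here is a minimal sketch of filling one page of results from an index that may contain invalid entries. The function name and the in-memory list standing in for the index are assumptions made for illustration, not the actual Myspace implementation.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fetch_valid_page(index: Sequence[T], is_valid: Callable[[T], bool],
                     offset: int, page_size: int) -> Tuple[List[T], int]:
    """Walk the raw index from `offset`, skipping invalid entries, until the
    page is full or the index is exhausted.

    Returns the valid items plus the raw offset to resume from, because the
    position of the next page cannot be computed from the page number alone
    once holes are skipped.
    """
    results: List[T] = []
    pos = offset
    while len(results) < page_size and pos < len(index):
        item = index[pos]
        pos += 1
        if is_valid(item):
            results.append(item)
    return results, pos
```

The caller has to carry the returned raw offset between requests, which is where page-based and ordinal-based indexes start to disagree with each other.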
In addition to internal systems, Myspace Music performs catalog syncs with multiple business partners.  While it is reasonable and somewhat performant to require internal systems to apply runtime rendering rules, it is not performant to require external systems to call back to Myspace to determine whether an object is valid.
By publishing only the currently valid state of Album and Song, Myspace internal and external systems need only apply business rules at publication time.  Internal indexes contain only valid objects, so no validity filtering is needed at query time, which reduces CPU and increases performance.  External systems likewise contain only valid objects, so they incur no filtering cost and no network callback, which increases performance and reduces (read: removes) links to 404s, that is, Myspace pages for Albums and Songs that cannot play in the end user's territory at that time.
To accomplish this improvement to Album and Song publication, the existing Myspace Populator processors for add and update were altered in two ways.  First, they store only the currently valid state of the object in cache.  Second, they schedule point-in-time execution of a re-publication at each time the object's state would change; the critical dates are those on which the rights change for each territory.  In addition to altering the Populator processors, endpoints for re-publishing an object were created using Myspace ServiceLayer.
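The scheduling step can be sketched roughly as follows. This is an illustration only: the tuple layout for rights windows, the function names, and the in-memory heap standing in for the scheduled task engine are all assumptions, and the real Populator processors and ServiceLayer endpoints are not shown.

```python
import heapq
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical layout: (territory, starts_at, ends_at or None for open-ended)
Window = Tuple[str, datetime, Optional[datetime]]
Task = Tuple[datetime, str]  # (when to re-publish, object id)


def critical_dates(windows: Iterable[Window], now: datetime) -> List[datetime]:
    """Collect every future moment at which a right starts or ends in any territory."""
    dates = set()
    for _territory, starts_at, ends_at in windows:
        for boundary in (starts_at, ends_at):
            if boundary is not None and boundary > now:
                dates.add(boundary)
    return sorted(dates)


def schedule_republication(queue: List[Task], object_id: str,
                           windows: Iterable[Window], now: datetime) -> None:
    """Queue one re-publication task per critical date for this object."""
    for when in critical_dates(windows, now):
        heapq.heappush(queue, (when, object_id))


def due_tasks(queue: List[Task], now: datetime) -> List[Task]:
    """Pop every task whose scheduled time has arrived, earliest first."""
    due = []
    while queue and queue[0][0] <= now:
        due.append(heapq.heappop(queue))
    return due
```

In this sketch, boundaries shared by several territories collapse into a single task, since one re-publication at that moment can recompute the state for all of them.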
