Tuesday, April 16, 2013

Job Control Table - Part 3 - Preventing Simultaneous Job Execution

So you're in the management console and click execute on a job, look away for a few seconds and then can't remember if you actually clicked execute or not. So you click it again, only to realize you now have two instances of the same job running simultaneously.

As the one instance is loading data the other is dropping data, and before you know it you've corrupted your data warehouse and end up doing a full reload.

Or perhaps you have a client who wants near real time loads of their data warehouse  On average the job takes 8 minutes to run so you set it to run every 10 minutes. Which is fine except for those occasions where more data than usual needs to be processed and the job takes 11 minutes to run. The next job starts before the previous one is finished and ends up again corrupting your data.

So how can we stop data services from running more than one instance of a job at a time?

The easy way is to check the SYS_AUDIT table as a first step in any ETL load. If there is a row for the job in the tablewithout an end time, then there is an instance of that job still running, and we'll want to prevent any further executions of the job until the current execution is complete.

So to do this I have created a function called IsAnotherJobRunning. And the code for it looks like this:


print('****************************************************');
print('** ');
print('** Checking for existing jobs.');
print('** ');

#First, find the status of the last job.
$L_LastJobStatus = SQL('DATAMART_SQL_TARGET_DS','SELECT STATUS FROM SYS_AUDIT_JOB WHERE AUDIT_ID = (SELECT MAX(AUDIT_ID) FROM SYS_AUDIT_JOB where JOB_NAME = {job_name()})');
print('** Last job status: {$L_LastJobStatus}.');

$L_LastEndTime = SQL('DATAMART_SQL_TARGET_DS','SELECT JOB_END_DATETIME FROM SYS_AUDIT_JOB WHERE AUDIT_ID = (SELECT MAX(AUDIT_ID) FROM SYS_AUDIT_JOB where JOB_NAME = {job_name()})');
print('** Last end time: {$L_LastEndTime}.');

#We assume that if there is an end time then the job has finished.
if($L_LastEndTime IS NULL)
begin
print('** Running job found. ');
print('** ');
print('****************************************************');
return 1; 
end
else
begin
print('** Last job ended: {$L_LastEndTime}.');
print('** No running jobs found. ');
print('** ');
print('****************************************************');
return 0;
end

I then wrap all my jobs in a conditional which calls this function.








This is how the function is called:







If the function returns 0 there are no other jobs running and the execution can proceed.

If the function finds another job running it returns a 1 and the Conditional executes the Else. In my jobs I create a simple script that just returns some basic information to the trace file.

Eg.


print('****************************************************');
print('** ');
print('** Another ' || Job_Name() || ' job appears to already be running - this job must be aborted to prevent corruption or inconsistencies.');
print('** ');
print('****************************************************');


I put this in as standard in all jobs I create as it has saved me a few times. It's also handy when the client decides that nightly loads aren't enough and they want multiple loads during the day; you can just increase the load frequency and not worry about the loads crashing into one another.

No comments:

Post a Comment