Wednesday, May 29, 2013

Tip of the day - How to tell if your job has completed

If you are still staring at your trace file trying to figure out of the job you ran in designer has finished yet, then you may find today's post very helpful.

Go to Tools --> Options...

Then break out Designer and click on General.

Then click the box that says: Show Dialog when job is completed.


Now whenever your job completes, you'll get a little dialog box popping up to let you know.







Friday, May 24, 2013

Tip of the day - Seeing the full name of your object

One of the annoying defaults in Data Services is that all the names in workflows or dataflows are cut off after 17 characters.



 So to fix this go to Tools -> Options

Then break out Designer and click on General.

Where is says: Number of characters in workspace name. Change the number 17 to 100. Click OK when it asks you if you want to overwrite the job server parameters.



Now you be able to see the full names of objects in your workspace.





Tuesday, May 7, 2013

Creating a schedule with a file dependency

So last post I talked about how to create schedules where they were dependent upon other jobs being completed first.

In this post I will show you how to make you job dependent on a file. This is really useful where your ETL needs to wait for an external process to complete first before it can run.

To do this I created a function called WaitForFileDependency which you can take a look at below:

#First check if the overide parameter has been set. If it hasn't been set then assume it should be false.

$P_Override = nvl($P_Override,'FALSE');

#Print some details about the file we are looking for in the trace file
print('File Exists: ' || file_exists($P_PreRequisiteFile));
print('File Create Date: ' || get_file_create_date( $P_PreRequisiteFile , '1900.01.01 00:00:0000'));
print('Last Job Run Date: ' || $P_LastRunDate) ;
print('Time Out Time: ' || $P_TimeOut);

#If the override variable was false, then start checking to see if the file is present.
if ($P_Override = 'FALSE')
begin
print('Dependency file ' || $P_PreRequisiteFile );
                while (
(
                                #If the file is not yet present
(file_exists($P_PreRequisiteFile) = 0) 
   or 
                                #Or the file is present but its from a previous run and so is an old file
(get_file_create_date( $P_PreRequisiteFile , '1900.01.01 00:00:0000') < $P_LastRunDate) 
 )
                and 
                                        #And we have not yet reached our timeout time
systime() < $P_TimeOut
)
begin
print('Waiting for file ' || $P_PreRequisiteFile || ' to arrive.');
                                #The the file hasn't arrive yet and you should wait a while and then check again.
sleep(300000);
end

               #If we reach the timeout time and the file has not appeared yet then fail the job
if (systime() > $P_TimeOut)
begin
print('Current Time: ' || systime() || '> Timeout time: ' || $P_TimeOut);
raise_exception( 'File not found before timout reached. Job will now fail');
end

                #If the file exists, and it was created since the last time we ran the job, then you can start running the current job
if ((file_exists($P_PreRequisiteFile) = 1) 
   AND
(get_file_create_date( $P_PreRequisiteFile , '1900.01.01 00:00:0000') > $P_LastRunDate) )
begin
print('File ' || $P_PreRequisiteFile || ' found');
print ('Starting current job.');
end
end 
else 
begin
                #If the override variable was set to true, then just start running the job without waiting for the file to appear
print ('Dependency Override in effect. Will begin job immediately.');
end
Return 1;

The call to the function looks like this:

WaitForFileDependency($G_PreRequisiteFile,$G_Override_Dependency,$L_LastCompleteDate,to_date('1900.01.01' || $G_TimeOut,'YYYY.MM.DD HH24:MI'));


  • $G_PreRequisiteFile - This is the file, with its path, that you are waiting to appear
  • $G_Override_Dependency - If its TRUE then start the job immediatly, but if its FALSE then wait for the file to appear
  • $L_LastCompleteDate - This is the last time the job was executed
  • $G_TimeOut - This is the time of day you want to stop looking for the file and fail the job. If you normally expect the file to arrive by 5:00am, and its still not present by 6:00pm, then you probably want to stop waiting for the file to arrive and alert people to this fact. 

Friday, May 3, 2013

Job Control Table - Part 5 - Sequencing Jobs

One of the things missing from Data Services is the ability to chain jobs together so that as one job finishes the next one starts. While you could put the functionality of each job into a workflow and then create a large job that executes the whole thing, what if you don't want that?

Often it's safer to have a few little jobs, that way if one little job fails the whole process is not brought down, just that one job. Also, we might have a job that loads all the dimensions, and many jobs for the various fact tables. Another scenario could be if you need to load an ODS first, and then build data marts on that ODS data.

Well as the DS scheduler doesn't give us the option of creating job dependencies, I have created my own method, using the SYS_AUDIT_JOB table as my guide.

As seen in the previous posts on the job control table, we can always tell by looking at the SYS_AUDIT_JOB table whether or not a job has run on a particular day. We have a start time, and end time, and we have a job status of STARTED, COMPLETE or FAILED. So we can use this information to create functions that will allow us to make a job wait until a previous job has completed.

The main function I have created is called WaitForJobDependency. This stops a job from executing until its prior jobs has completed. I call it in my Initialise script like this:


#Wait for prerequisite jobs to complete
WaitForJobDependency('JOB_ODS',$G_Override_Dependency );
WaitForJobDependency('JOB_DIMENSIONS',$G_Override_Dependency );

So the above is tell my job to wait for 2 jobs to finish before it runs. The global variable is there in case I want to override the wait, and make my job run immediately.

The function looks likes this:

#First check if the override value is set, if its been left off them assume its false. If true jump to the else part of the if statement
$P_Override = nvl($P_Override,'FALSE');

if ($P_Override = 'FALSE')
begin
#If the status of the previous job is not COMPLETE, FAILED or ERRORS then wait another 5 minutes and check again. The job will continue checking indefinitely until one of the job statuses are met.
while ( PreRequisiteJobStatus($P_PreRequisiteJob) != 'COMPLETE'
   and PreRequisiteJobStatus($P_PreRequisiteJob) != 'FAILED'
and PreRequisiteJobStatus($P_PreRequisiteJob) != 'ERRORS')
begin
print('Dependency job ' || $P_PreRequisiteJob || ' status: ' || PreRequisiteJobStatus($P_PreRequisiteJob));
print('Waiting for job ' || $P_PreRequisiteJob || ' to complete.');
sleep(300000);
end

#If the job status of the previous job was FAILED or ERRORS then it failed, and this job will fail too to prevent you further corrupting that data by running your job of of incomplete data.
if (PreRequisiteJobStatus($P_PreRequisiteJob) = 'FAILED' OR PreRequisiteJobStatus($P_PreRequisiteJob) = 'ERRORS')
begin
print('Job dependency ' || $P_PreRequisiteJob || ' failed  or contained errors.');
print('Aborting current job.');
raise_exception( 'Job dependency failed');
end

#If the previous job came back as complete, then you are good to start your current job.
if (PreRequisiteJobStatus($P_PreRequisiteJob) = 'COMPLETE')
begin
print('Dependency job ' || $P_PreRequisiteJob || ' status: ' || PreRequisiteJobStatus($P_PreRequisiteJob));
print ('Starting current job.');
end
end
else

#This just lets the user now that they put the override dependency into effect and the job will start immediately without checking for any previous jobs.
begin
print ('Dependency Override in effect. Will begin job immediately.');
end

Return 1;



You May have noticed i referenced a function called PreRequisiteJobStatus above. This a fairly simple function that just returns the status of the prerequisite job like this:


$L_Status = nvl(sql('DATAMART_SQL_TARGET_DS','
Select STATUS from dbo.sys_audit_job where JOB_NAME = {$P_PreRequisiteJob}
                                             and cast(JOB_END_DATETIME as DATE) = cast(GETDATE()as DATE)
and AUDIT_ID = (Select Max(AUDIT_ID) from sys_audit_job where JOB_NAME = {$P_PreRequisiteJob})
'), 'WAITING');
Return $L_Status;


Now all you need to do is schedule all your jobs to start around the same time, and these functions will make sure that they run in the correct order.