One of my standard interview questions that stumps nearly every-one is-
Which two objects have the 'Execute Only Once' property, and what does this do?
And of the very few people that get this question right, the next question completely stumps them -
Why would you want to set an object to 'Execute Only Once'?
So, the first part of the question is easy to answer, in that data flows and work flows have the execute only once property. Right click on any work flow or data flow and take a look:
So what does this do? Well, pretty much what it says on the box, it only allows an object to execute once within a single run of a job.
So if we set this property to true for the DF_CUSTOMERS_Load data flow, it would only execute once, despite being in the job six times:
I know you would never create a job like this in real life, so when would be a good time to use this property?
Well lets say you have a job that loads 3 fact tables, eg FACT_SALES, FACT_RETURNS and FACT_ENQUIRIES, and each one of these facts shares a common dimension called DIM_CUSTOMER.
Now you could build your job by running all the staging tables, then all the dimension tables and then all the fact tables:
This method of building jobs has 3 disadvantages:
- You can't start building any fact until ALL the dimensions have been loaded. It may be that your slowest fact table build only requires the fastest dimension build to complete before it can start, so you are wasting time by making that fact table wait for all the dimensions to be built first before it can run.
- Lets say you want to split out the 3 fact tables into separate jobs? You'd have to go through each fact data flow and click 'View Where Used' on each source table to make sure you get all the right dimensions and their associated data flows out of the job and into the new job.
- When it comes to debugging, if you have a bug in your fact table, you'll need to run all the staging and dimension tables, even if they aren't needed for the fact you are debugging.
So how should you do it differently and how does the 'Execute only once' property help?
Instead of having 3 big workflows for all the staging, then dimension and then fact builds, I would suggest creating a workflow for each complete table build and calling that workflow a component. Within each component will be two further work flows, one containing all the dependencies for that table ie the components for the dimension tables, and one containing the actual data flows needed to build that table.
Perhaps a few pictures would help.
So first off I'd create 3 work flow components called C_FACT_SALES, C_FACT_RETURNS and C_FACT_ENQUIRIES. My definition of a component is a work flow containing everything that is required to build the table it refers to.
Within each component create two work flows. One of these work flows will contain all the data flows needed to build the table itself, and the other will contain all the components needed to build the tables that need to be built first before the table build can start - usually all the staging and dimension tables needed for that fact table.
So in the above example the work flow WF_FACT_SALES_Dependencies would look like this:
Each of the components above will have the '
Execute only once' option set. So the C_DIM_CUSTOMER component can be in each of the fact table components, but it will only get executed once.
So in this example the data flows to build the FACT_SALES table will be able to run as soon as the 4 component work flows above have completed. We can also now also run the C_FACT_SALES component on its own and get everything we need to build the FACT_SALES table.
I create all my Data Services using this component methodology, taking full advantage of the '
Execute only once' option. It means that my jobs:
- Run faster
- Allow for a quick view of the dependencies required to build a table
- Are easier to split into separate jobs
- And are easier to debug as I can just run the offending component, rather than the entire job