Monday, October 20, 2014

Files as Data Sources - Introduction

When files are the source of your data there are a few things to take into consideration:

  • When will the files arrive?
  • How will you keep track of the files processed?
  • How do you relate data in your target tables with the original file?
  • What if multiple files arrive for the same data set?
  • What should you do with files once they've been processed
  • What happens if there is an error with the file?
  • What should you do if a file doesn't arrive?
For this blog series I am going to go through all the points above. Most of what I discuss will be relevant for both standard text files as well as xml and Excel files.

Here is the standard structure I use when processing a file through Data Services:




So firstly I used a conditional to check whether a file has arrived or not. I've created my own function, IsFileAvailable, to look out for the file arrival. Yes, I could have used the built in function wait_for_file, but there is additional functionality that you might find useful that I have built into IsFileAvailable.

If the file arrives I print that out to the trace file. You don't have to do this, but I just find it easier to see whats going on that way.

Then I place the data flow that will process the file within a try catch block. This is so that I can handle any file errors without bringing down the entire job. Within that error handler I can report the file errors to the owner\s of the file and move the file to an error file location.

In the else section of the conditional I place some code to handle the file not arriving.

Over the next few posts I'll break out the detail of how each of the above pieces works.


1 comment: