SSIS-Reading datetime stamped flat files

One of the most often asked interview question that I have encountered goes like this –

I have list of flat files in a folder with datetime stamp in it and  I would like to read only current month’s file. How would you do it? A similar variation of the question was posed in StackOverFlow and I thought why not just write a post on it. Though this particular question slightly differs from the problem statement i.e. in this instance the ask is to iterate through all the files from oldest time stamped file to the newest time-stamped file which is the logic I have written for.

At the high level, idea is to –

  1. Get all the files present in the directory.
  2. Loop through each file and store it’s metadata in a dictionary. Basically dictionary is a collections object that stores values as a Key/Value pair. In our case the Key would be ‘Date Time’ part of the file name and the Value would be full file path of the file name.
  3. Load all the Keys from the dictionary to a List.
  4. Order the List values.
  5. Loop through each value in the List and get the corresponding Value of that Key from Dictionary and store them in another List say orderedFiles.
  6. Assign the orderedFiles value to a SSIS object.
  7. Use this SSIS object as the new reference to loop in the For Each Loop container by using the Enumerator ‘Foreach From Variable Enumerator’

The Steps(1) – (6) are achieved via Script Task transformation. Let’s see how this works.

Firstly I created bunch of files with time-stamp in the suffix of the form YYYY_MM_DD.txt as shown below –

0_Sample Files.png

Create a New Package and drag Script Task Transform. Create three variables as shown below –

7_Variables Declaration

  • objOrderedFiles – Stores the list of files containing file names in a sorted order i.e oldest to newest.
  • strFilePath – Path where the files are stored.
  • strFullFilePath – Full file path of the file.

In my Script Task editor, I passed ‘strFilesPath’ as a ReadOnlyVariable and ‘objOrderedFiles’ as a ReadWriteVariable as shown below-

4_PassingVariablesToScriptTask

Now click on the ‘Edit Script’ button. In the Window that opens, we need to ensure that the target framework is 3.5 for our code to work. Click on ‘Project’ tab and select the last menu option that ends with ‘Properties…’ as shown below –

1_Changing Reference

In Application tab under Target Framework select the option ‘.NET Framework 3.5’ as shown below –

2_Changing Reference

You will get the following dialog box, click ‘Yes’

3_Changing Reference

Now in the ‘ScriptMain.cs’ file, under the main() paste the following code –


string strFilesPath = Dts.Variables["User::strFilesPath"].Value.ToString();

Dictionary<DateTime, string> dictionary = new Dictionary<DateTime, string>();
List<string> orderedFiles = new List<string>();
string[] files = {} ;

// get the all files metadata from the directory
if(Directory.Exists(strFilesPath))
files = Directory.GetFiles(strFilesPath, "bookreview_daily_*.txt", SearchOption.AllDirectories);

string strFilePath;
string strFileName;
DateTime dt;

// get all the file names and datetime present in it as a Key Value pair in the Dictionary
foreach (string file in files)
{
strFilePath = file;
strFileName = Path.GetFileNameWithoutExtension(file);

// this below gets the file name in required date time format by getting the suffix
// into datetime. ex - gets the value 2016-01-02 as date from bookreview_daily_2016_01_02
dt = DateTime.Parse(strFileName.Replace("bookreview_daily_", "").Replace("_", "-"));

dictionary.Add(dt, strFilePath);
}

// get keys into list
var list = dictionary.Keys.ToList();
// sort the list
list.Sort();

foreach (var val in list)
{
// loop through the list and add the ordered file data
// into the list
orderedFiles.Add(dictionary[val]);
}

//assign it to the user defined object
Dts.Variables["User::objOrderedFiles"].Value = orderedFiles;

Dts.TaskResult = (int)ScriptResults.Success;

Drag the ‘Foreach loop transform’ and connect it to the above script task. Double-click the transform to edit the options. Go to Collection and set the Enumeration Type as ‘Foreach from Variable Enumerator’ and under ‘Variable’ chose User::objOrderedFiles as seen below –

5_ForEachLoop container Choosing Enumerator

In the Variable Mappings window, select the variable User::strFullFilePath as seen below –

6_ForEachLoop container assigning variable.png

That’s it. Now the all the files that gets looped would be from the ‘Oldest to Newest’. I put another Script Task transformation that just pops up the file that is being read.

Here is the outline of the package –

8_PackageOutline

Once you run, you see that the first file that’s popped is the oldest one –

9_PackageResult

Now, if the question is about only getting the current months file, then we just need to modify the logic during the looping of the files in the list to check if it’s a current month’s date and pretty much everything else is the same.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s