SSIS-Reading datetime stamped flat files

One of the most often asked interview question that I have encountered goes like this –

I have list of flat files in a folder with datetime stamp in it and  I would like to read only current month’s file. How would you do it? A similar variation of the question was posed in StackOverFlow and I thought why not just write a post on it. Though this particular question slightly differs from the problem statement i.e. in this instance the ask is to iterate through all the files from oldest time stamped file to the newest time-stamped file which is the logic I have written for.

At the high level, idea is to –

  1. Get all the files present in the directory.
  2. Loop through each file and store it’s metadata in a dictionary. Basically dictionary is a collections object that stores values as a Key/Value pair. In our case the Key would be ‘Date Time’ part of the file name and the Value would be full file path of the file name.
  3. Load all the Keys from the dictionary to a List.
  4. Order the List values.
  5. Loop through each value in the List and get the corresponding Value of that Key from Dictionary and store them in another List say orderedFiles.
  6. Assign the orderedFiles value to a SSIS object.
  7. Use this SSIS object as the new reference to loop in the For Each Loop container by using the Enumerator ‘Foreach From Variable Enumerator’

The Steps(1) – (6) are achieved via Script Task transformation. Let’s see how this works.

Firstly I created bunch of files with time-stamp in the suffix of the form YYYY_MM_DD.txt as shown below –

0_Sample Files.png

Create a New Package and drag Script Task Transform. Create three variables as shown below –

7_Variables Declaration

  • objOrderedFiles – Stores the list of files containing file names in a sorted order i.e oldest to newest.
  • strFilePath – Path where the files are stored.
  • strFullFilePath – Full file path of the file.

In my Script Task editor, I passed ‘strFilesPath’ as a ReadOnlyVariable and ‘objOrderedFiles’ as a ReadWriteVariable as shown below-

4_PassingVariablesToScriptTask

Now click on the ‘Edit Script’ button. In the Window that opens, we need to ensure that the target framework is 3.5 for our code to work. Click on ‘Project’ tab and select the last menu option that ends with ‘Properties…’ as shown below –

1_Changing Reference

In Application tab under Target Framework select the option ‘.NET Framework 3.5’ as shown below –

2_Changing Reference

You will get the following dialog box, click ‘Yes’

3_Changing Reference

Now in the ‘ScriptMain.cs’ file, under the main() paste the following code –


string strFilesPath = Dts.Variables["User::strFilesPath"].Value.ToString();

Dictionary<DateTime, string> dictionary = new Dictionary<DateTime, string>();
List<string> orderedFiles = new List<string>();
string[] files = {} ;

// get the all files metadata from the directory
if(Directory.Exists(strFilesPath))
files = Directory.GetFiles(strFilesPath, "bookreview_daily_*.txt", SearchOption.AllDirectories);

string strFilePath;
string strFileName;
DateTime dt;

// get all the file names and datetime present in it as a Key Value pair in the Dictionary
foreach (string file in files)
{
strFilePath = file;
strFileName = Path.GetFileNameWithoutExtension(file);

// this below gets the file name in required date time format by getting the suffix
// into datetime. ex - gets the value 2016-01-02 as date from bookreview_daily_2016_01_02
dt = DateTime.Parse(strFileName.Replace("bookreview_daily_", "").Replace("_", "-"));

dictionary.Add(dt, strFilePath);
}

// get keys into list
var list = dictionary.Keys.ToList();
// sort the list
list.Sort();

foreach (var val in list)
{
// loop through the list and add the ordered file data
// into the list
orderedFiles.Add(dictionary[val]);
}

//assign it to the user defined object
Dts.Variables["User::objOrderedFiles"].Value = orderedFiles;

Dts.TaskResult = (int)ScriptResults.Success;

Drag the ‘Foreach loop transform’ and connect it to the above script task. Double-click the transform to edit the options. Go to Collection and set the Enumeration Type as ‘Foreach from Variable Enumerator’ and under ‘Variable’ chose User::objOrderedFiles as seen below –

5_ForEachLoop container Choosing Enumerator

In the Variable Mappings window, select the variable User::strFullFilePath as seen below –

6_ForEachLoop container assigning variable.png

That’s it. Now the all the files that gets looped would be from the ‘Oldest to Newest’. I put another Script Task transformation that just pops up the file that is being read.

Here is the outline of the package –

8_PackageOutline

Once you run, you see that the first file that’s popped is the oldest one –

9_PackageResult

Now, if the question is about only getting the current months file, then we just need to modify the logic during the looping of the files in the list to check if it’s a current month’s date and pretty much everything else is the same.

 

Adding Record Count to flat file

I am writing this post in response to one of the questions posed in StackOverFlow. The task is to add record count info a generated flat file with some static text.

Let’s create one simple package to start with.

Firstly we will need two variables –

  • intRecordCount – Data Type – Int – to store the record count
  • strDestinationFullFilePath – Data Type – String – this holds the full file path of the destination file

I have two tasks in it as shown below –

1_Control Flow Tasks

  1. Data Flow Task – Get Sample Data – This task has three Data Flow Tasks in it-
    • OLE DB Source – OLE_SRC – Get Data. The source data is a simple static value that I have hardcoded which returns Id and Name as shown below
    • Row Count – assign the value to the variable intRecordCount.
    • Flat File Destination – FFD_Load Data
  2. Script Task  – Append Record Count. – This task does the job of reading through the file and appending the record count.

SELECT 1 AS 'Id', 'Karthik' AS 'Name'
UNION ALL
SELECT 2 AS 'Id', 'Pratap' AS 'Name'
UNION ALL
SELECT 3 AS 'Id', 'Ravi' AS 'Name'
UNION ALL
SELECT 4 AS 'Id', 'Gautam' AS 'Name'

 

 

Let’s look bit more into the script task as that is where we need to do the actual stuff. The first step is to first have the two variables as part of ReadOnlyVariables list as shown below –

2_ScriptTask_AddingVariables

Now in the the actual script, add the library System.IO in the using section. Now under Main() paste the following code –


public void Main()
{
string strDestinationFullFilePath = Dts.Variables["User::strDestinationFullFilePath"].Value.ToString();
string strRecordCount = Dts.Variables["User::intRecordCount"].Value.ToString();

using (StreamWriter sw = File.AppendText(strDestinationFullFilePath))
{
sw.WriteLine("Total Records - " + strRecordCount);
}

Dts.TaskResult = (int)ScriptResults.Success;
}

That’s it. Now execute the package and check the results by opening the resultant data. That’s how you can append the trailer records with the record count at the end of the text.

3_Results