Recently as part of my project I had a chance to finally get a hands on Informatica and how it does the standard ETL. The work that I was given was mostly to validate the counts by opening up the transformations and the actual counts that were obtained after the workflows were run. It really piqued my interest and I wanted to get a real hang of it.
So I took up the course ‘Informatica PowerCenter 9.X Developer & Admin’ in Edureka thanks to sponsorship by my current company. Over 2 weeks and with 3 hrs per week I felt I have grasped sufficient information. We had a good trainer that talked us through various aspects of the tool. He kept a use case, explained us what is it we are trying to achieve and then give a walk-through of the whole process which I could then replicate on my own.
At the end of the course we were given a small project to load a data warehouse with some dimensions of type 1 and some of type 2 into a single fact table which I completed comfortably.
Having over a decade of experience in SSIS, my mind was constantly comparing both and at the moment I feel performing certain simple tasks in Informatica are very hard compared to SSIS. So here goes some of the mental notes –
Tool Overview –
- You need, brace for it, 4 separate tools to develop, deploy and monitor the ‘workflows’ a.k.a packages in SSIS.
- Informatica PowerCentre Repository Manager- The place where you need to create the project similar to how one creates project in SSIS. That’s where it ends though. One cannot do any development here. This is the tool that one mainly uses to either import/export objects. Additionally I can see that it also gives a detailed info on various objects (will talk about it in future posts) such as Data Type, Precision, Key Type etc.
- Informatica PowerCentre Designer – This is the tool where the actual development happens. Here is where you create your sources, destination, transformations/mappings. The end product out of designer is a mapplet which is a culmination of Source, Transformation and Destination. In SSIS parlance, developing a package with only Data Flow tab available.
- Informatica PowerCentre Workflow Manager – All the mappings that were created in the designer need to be brought into the workflow. Prior to that we would first create a session, select the appropriate mapping. In SSIS parlance, one can say having just the Control Flow tab available.
- Informatica PowerCentre Monitor – Once you create the workflow, here is where you need to connect and run the workflows, monitor the output, restart if needed. This plays the role of SQL Server Agent.
- It is a metadata driven tool i.e. when the tool is installed you would need to specify the database on which it needs to be hosted where it creates all the tables, stored procedures, views etc. This is called Informatica PowerCenter Administrator console.
Dislikes –
- No ‘Test Connection’ – Unlike in SSIS where sources and destinations are intrinsic to the package, here they are separate components of the project. Starting from SSIS 2012 with the advent of Project Parameters, this feature should sound familiar. The pain point though is that when you create OLE DB Sources/ Destinations there is no ‘Test Connection’ button to validate.
- Re-creation of Source/Destination – As I was detailing earlier, your core development happens in PowerCenter Designer where. Once you are done you need to open the PowerCenter Workflow, create a session for the mapplet from Designer. Here again when it comes to OLE DB Sources/ Destination you would need to create again! It doesn’t come right away.
- Disconnect-Reconnect – Let’s say you start with a mapplet. You create a workflow for it. So you will have two tools open – Designer as well as a Workflow. From the Workflow Manager you execute it and that’s about it. Let’s say you start to work on new mapplet. From the Workflow Manager you would need to disconnect and then reconnect to see the newly created workflow.
- Dynamic File Naming – It’s just so hard. In SSIS, it’s all about getting the ConnectionString attached to a variable and that’s about. Here you would need at least 4 variables to do that.
- No Loop Containers – There is absolutely no looping containers such as For Loop, Foreach Loop containers available. At this moment I don’t even know what the replacements are.
- No Stored Procedure data as a source – Out of the box this just can’t be done. Again like above it doesn’t have a straightforward way.
- No Data Viewers
Likes –
- Component Re-usability – One can create a component and have it re-used across various mapping. This is what they call Transformations.
- Ability to connect to any data source – Unlike SSIS, which is a beauty when it comes to SQL Server Destination and a beast when dealing with others, Informatica makes it a breeze no matter what source and destinations there are
The journey has just begun. In my next post I will give a detail on how to start one simple package and get it executed in Informatica in comparison with SSIS. Meanwhile I am also looking to see who I can get to work on Looping / Stored Procedure.