For a long time I had wanted to be involved in meetups or user groups, to meet like-minded people, hear great talks and gain knowledge. While searching around I found the group ‘Sydney Microsoft Analytics and Data Science’ and immediately joined it; its next meetup was on a topic that had a bit of mystique around it – Power BI and Machine Learning.
The mere mention of the words Machine Learning conjures up images of unending machines in a warehouse in an underground bunker, manned by security guards, with a rag-tag team of scientists milling around outside, watching giant screens and monitoring people. Something similar to the way Mr. Robot sends some code and kaboom, a company goes down, or the way the ‘Machine’ of Person of Interest is programmed by Mr. Finch. Anyway, I am drifting. The point I am getting at is that Machine Learning seems a very abstract topic that I still need to get my head around.
So having Power BI and Machine Learning in one place intrigued me. The session was organized by Grant Paisley, who somehow reminded me of one of the important characters from Outlander. I reached the place ten minutes before the scheduled time and could see a lot of people already there.
Grant started off with a bit of an apology: the laptop holding his presentation had a problem, so he would be using one belonging to a friend who was also an attendee. He then outlined what the talk would cover, the end result we were to expect, and how to go about getting it.
The Power BI application showed the variance of residential real estate prices, with about 20 different parameters a user could analyze on, the chief filter being the location hierarchy. It was a very impressive dashboard, with a lot of detailed analysis across multiple pages, and it looked pretty complex. Grant told us this was a project he had done for a banking client to give them an understanding of their customers’ expectations.
The next steps went as follows –
- An overview of the raw data was given. It had at least 600 columns, each with more than 1,000 rows. Every 3 to 4 columns formed a group, so there were roughly 200 groups.
- A setup Excel sheet was demoed, containing the various parameters that Power BI would use as input to dice the data. An excellent idea which, until then, I hadn’t thought was possible. I had always assumed the built-in filters were the only way to play with the data in Power BI, so it was a bit of news to me that a Power BI dashboard can be passed parameters.
- The raw data was first loaded into Power BI Desktop.
- A copy of the raw data was saved as a new dataset. Apart from the headers, all the detail row data was deleted. A new header row was added, so each of the 600-odd original header columns got a generic name, from Column1 to Column600. The data was then transposed, i.e. unpivoted, giving rows of the form Column1, Header1, Attribute1.
- A similar approach was followed next, but this time all the detail data was retained and the headers were deleted. A link was then established between the header data and the detail data, along with a date dimension.
- A flat file was then generated as output from Power BI and fed into Azure Machine Learning in the cloud. This was the part that went over my head: they used some data mining algorithms to dissect the data further, with an explanation of each step.
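Though the demo was done entirely inside Power BI’s query editor, the header/detail split and unpivot described above can be sketched in pandas as an analogy. This is a minimal sketch under my own assumptions – the toy data, column names and grouping are mine, not from the demo:

```python
import pandas as pd

# Hypothetical stand-in for the raw extract: the first row of each
# generically named column is a descriptive header, the rest is detail data.
raw = pd.DataFrame({
    "Column1": ["Price", 100, 200],
    "Column2": ["Bedrooms", 3, 4],
})

# Header dataset: keep only the header row, unpivoted to (Column, Header).
headers = raw.iloc[[0]].melt(var_name="Column", value_name="Header")

# Detail dataset: drop the header row, keep a row id, then unpivot
# to (RowId, Column, Value) - roughly what "Unpivot Columns" produces.
detail = raw.iloc[1:].reset_index(drop=True)
detail.index.name = "RowId"
detail = detail.reset_index().melt(id_vars="RowId",
                                   var_name="Column", value_name="Value")

# Link the two datasets on the generic column name, much as the demo
# linked the header data and the detail data.
linked = detail.merge(headers, on="Column")

print(linked)
```

The payoff of this shape is that every attribute of every row becomes a single record, which is exactly the kind of long, flat file that can then be exported for a downstream service.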
Grant then explained on the whiteboard how the whole process fit together. By that time the clock had ticked past an hour.
The session was very engaging, and the unpivoting of the data and the ability to dissect non-tabular raw data gave me a very good insight into the range of possibilities Power BI offers. I intend to make my first foray into strengthening my knowledge of it. The audience was captivated throughout, and I had a very good time.