DIY Machine Learning – Supervised Learning

When I first heard about supervised learning, I had a picture in my head of a kindergarten class with a teacher trying to get the small humans to read. And perhaps that isn't a bad analogy when talking about Machine Learning in general, as it is based on the same principles as school: repetition and trial. After that the analogy falls apart, though, when you get to the specific criteria needed for Supervised Learning. Machine learning breaks down into two broad categories, the binary choice of Supervised and Unsupervised, which means you only have to know the one set of criteria, the criteria for supervised learning, to determine which type you need.

Training Data

A problem solved with supervised learning will have a well-defined set of variables for its sample data and a known outcome to choose from. Unsupervised learning has an undefined set of variables, as the task is to find structure in data where it is not apparent, and the type of outcome is not known either. An example of supervised learning would be determining whether email is spam or not. You have a set of training emails, and using elements of each email such as recipient, sender, IP, topic, number of recipients, field masking and other criteria, you can determine whether or not the email should be placed in the spam folder. Supervised learning is very dependent upon the training data, as the training data is what it uses to derive its results. Train too much on the same data, and your experiment starts to memorize the answers rather than developing a technique for deriving solutions from them.
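
To make the spam example concrete, here is a minimal sketch in Python using scikit-learn rather than Azure ML; the tiny labeled data set and the feature choice are invented purely for illustration.

    # A minimal supervised learning sketch: known outcomes (labels) plus
    # sample variables (the words in each email) train a classifier.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    emails = [
        "WIN a free prize, click now",       # spam
        "meeting moved to 3 pm tomorrow",    # not spam
        "cheap meds, no prescription",       # spam
        "quarterly SSIS load finished ok",   # not spam
    ]
    labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam: the known outcomes

    # Hold labeled data back so the model is scored on emails it never saw,
    # which guards against memorizing the answers (overfitting).
    X_train, X_test, y_train, y_test = train_test_split(
        emails, labels, test_size=0.5, stratify=labels, random_state=0)

    vectorizer = CountVectorizer()
    model = LogisticRegression()
    model.fit(vectorizer.fit_transform(X_train), y_train)
    print("accuracy on unseen emails:",
          model.score(vectorizer.transform(X_test), y_test))

The held-out test emails play the same role in this sketch that a representative test set plays in an Azure ML experiment: the score exposes whether the model learned a technique or just memorized the training answers.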

When Supervised Learning Should Be Employed in a Machine Learning Experiment

As the field of data science continues to proliferate, more people are becoming interested in Machine Learning. Having the ability to learn with a free tool like Azure Machine Learning helps too. Like many tools, there are many things you can do with it, so knowing when you should do something is a big step in the right direction. While unsupervised learning provides a wide canvas for making a decision, creating a successful experiment can take more time, as there are so many concepts to explore. If you have a good set of training data and a limited amount of time to come up with an answer, the better solution is to create a supervised learning experiment. The next step in the plan is to figure out what category the problem uses, a topic I plan to explore in depth in a later post.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Configuring Power BI Data Refresh on a Local SQL Server Database

Recently I have been tasked with setting up, and assisting others in setting up, Office 365 Power BI Automatic Data Refresh. If you scroll through this post, you'll see that there are detailed steps required to accomplish a successful Power BI Data Refresh with an on-site version of SQL Server. There are a lot of people who are very frustrated trying to configure Automated Data Refresh, and considering what is involved, I understand why. Follow these steps, and Automated Data Refresh for Power BI Office 365 with a locally hosted SQL Server will work.

TL;DR Steps for Power BI Automated Data Refresh

This is a long document, because making sure every point of confusion about how to make this work was addressed took a lot of words. If you don't want to read them all, here are the steps needed to complete a successful data refresh. Some people may not need the full level of detail and may figure it out after reading only this far.

  • Install the Data Management Gateway application on your SQL Server box.
  • Add a gateway in the Power BI Admin Center and configure the Data Management Gateway.
  • Create a connection string and a Power BI workbook which uses that connection string.
  • Create a Data Source in the Power BI Admin Center using the connection string created in the previous step.
  • Schedule Data Refresh for the uploaded Power BI workbook, which must use a matching connection string.

 

Power BI Office 365 Automated Data Refresh Steps

Here in detail is how I get Automated Data Refresh to work every time. If you have figured out another way to make an on-premises SQL Server data refresh work without using these steps, please let me know how you did it. I know these steps work every time, but I am hoping that there is an easier way.

  1. Install the Microsoft Data Management Gateway on your SQL Server database server. Here is more information on the Data Management Gateway. Technically, you don't have to install it here, as Microsoft says it is supported installed in other places, but if you want to make sure it works, install it on your database server, which must have internet access available. Here is the link for the software, and here are the instructions from Microsoft on installing it.

 

  2. The next step is to configure the gateway for the Power BI site and for your local server. Go to your Power BI for Office 365 site and navigate to the Power BI Admin Center. On the left menu you will see an option for Gateway. Click on the plus and add a gateway. You will generate a key as part of this process; copy this key and provide it to the application installed on the database server. If you have more than one database server, you will need more than one key. Here's a link to the instructions for creating a gateway in the Power BI Admin Center, which provide more detail. When you are done, look on the Power BI Admin screen: your gateway name will have a green check and the status Running next to it. If you don't see the green check, wait about 10 minutes to see if the status changes. If the status still does not show as Running, there are two things you might want to do: review the instructions, or look at the Office 365 Admin Center. The Office 365 Admin Center can be accessed via the grid button on the top left hand side of the screen, under the Admin icon. Select the Service Health option on the left hand side of the screen, open it, and select Service Health. I have included a copy of that screen here, which shows that there are no issues today. If you are experiencing issues, this screen may not reflect a problem until sometime later, so continue to review it if the steps just don't work for some reason.
  3. Create a local data connection within Excel to be used to create a Power BI data source. Start a new worksheet in Excel. Click on the Data tab (not the Power Pivot tab) and click on the left-most icon on the ribbon: Get External Data->From Other Sources->From SQL Server. Enter the server name. Either Windows Authentication or SQL Server Authentication works, so pick either one and enter the id and password. Click on Next. Select the database and then the tables you want. There is no need to select more than one table, as you can always go back and add tables once this is complete, and it will save loading time. Click on the Next button.
     The Data Connection Wizard screen that follows is the most important screen of the entire process. The Friendly Name listed here will be used on the Power BI site to create a Data Source. The Friendly Name must exactly match the name of the Data Source in Power BI, which may mean that you will want to edit the name from the default listed here. Do not check Always attempt to use this file to refresh data. After you are satisfied with the Friendly Name, click on the Browse… button. In a later step, the connection string listed in this file will be used to create the connection string in Power BI, and as we will be opening the file listed here in Notepad later, it is important to remember this file path. Click on the Save button in the File Save window, then click on the Finish button in the Data Connection Wizard. Generally speaking, you will want your database information to be loaded to a Pivot Table Report, so click on the OK button. If you do this, you have the added benefit of having a report to test. When your pivot table is completed, it might be a good idea to go to the Power Pivot tab and take a look at the model. Feel free to modify it as much as you like, but it is not necessary to do so to test the data refresh. If you need to add additional tables, make sure that you do so by using the existing data connection created earlier. Save and close your new Excel workbook.

     

  4. Using the information from the previous step, we can now create a data source in Power BI. Go back to the Power BI Admin Center on the web, and from the left hand menu select Data Sources. On that screen, click on the + (plus sign) to add a new data source, and select SQL Server. Enable Cloud Access must be checked. If you want to make items from the database searchable from Power Query, you will need to check Searchable in Power Query and Enable OData Feed. If you select Enable OData Feed, you will be prompted later to select a series of tables to expose for viewing. Click Next to get to the connection info screen. Completing the connection info screen correctly is what will allow you to do a data refresh. The information needed to complete the screen can be found in the *.odc file you created earlier. In Notepad, open the *.odc file using the path you remembered from the previous step (or see the short script after these steps). Look for the text <meta name=Catalog content= . The name listed on the other side of the equal sign MUST be the name you enter in the Name field of the Power BI connection info. Put a description in if you wish. The dropdown box for the Gateway will list the gateway you created in the previous step as well as any others you have created previously; select the appropriate gateway. By default, the Connection Properties option will be selected. Unfortunately, if you use this option you will never be able to perform an Automatic Data Refresh in Power BI. Select Connection String instead. When you do, the connection provider will change to .NET Framework Data Provider OLE DB. This is supposed to happen. For the connection string information, go back to your *.odc file, look for the text <odc:ConnectionString>, and copy everything between that text and </odc:ConnectionString> into the connection string window. Wait until the Set Credentials button turns blue. If it does not turn blue, there is something wrong with Power BI and you will want to look at the Office 365 Admin Service Health site referenced earlier. Click on the Set Credentials button. A little window will show up indicating it is loading, then a pop-up window will appear prompting you to add your user id and password. Make sure you test the connection before clicking on the OK button. The last step here is to click the Save button.

     

  5. Upon the successful completion of the previous steps, Scheduled Data Refresh can now be configured. Upload the Excel workbook created earlier in step 3. Once it is loaded, click on … (the ellipsis) and select Scheduled Data Refresh. When the screen loads, the first thing you will need to do is click on the button at the top of the screen to turn the feature on, as it defaults to off. All of the connection information included inside your Power BI report will be listed. The connection name(s) listed in the report must exactly match the connection(s) in the Power BI Admin Data Connections. If there is not a match, you will not be able to schedule an automatic data refresh. It is not possible to change the names from this screen, as it lists the connections within the Power BI document; you may need to fix the connections within your Power BI workbook. If the connections are not valid, you will see a status icon and a message about this status.
     The Configure Refresh Schedule defaults to a daily frequency. Select the time and the zone for the data refresh. Unfortunately, you can only schedule refreshes for 3 months, so the schedule will need to be updated every 3 months, as there is no way to make it continue in perpetuity. Send Notifications defaults to the email address of the admin. Note you will be notified of errors, not of completions of the data refresh. Click on the Save and Refresh button to test the data refresh process, which refreshes the report immediately. Once the data refresh schedule is completed, the top of the screen will have two menu items, History and Settings, which you can review at any time.
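
If you would rather not fish the values out of Notepad, here is a minimal Python sketch of extracting the two pieces of information steps 3 and 4 rely on from the *.odc file. The file path is a placeholder for the one you noted in the Data Connection Wizard, and the parsing assumes a typical *.odc layout.

    # Pull the catalog name (which must match the Power BI data source name)
    # and the connection string out of the *.odc file created in step 3.
    import re

    odc_path = r"C:\Users\me\Documents\My Data Sources\MyServer MyDatabase.odc"
    with open(odc_path, encoding="utf-8", errors="ignore") as f:
        odc = f.read()

    # The value after <meta name=Catalog content= is the name to enter on
    # the Power BI connection info screen.
    catalog = re.search(r'<meta name=Catalog content=([^>/]+)', odc)
    print("Data source name:", catalog.group(1).strip('" >'))

    # Everything between the ConnectionString tags goes into the Connection
    # String box in the Power BI Admin Center.
    conn = re.search(r'<odc:ConnectionString>(.*?)</odc:ConnectionString>',
                     odc, re.DOTALL)
    print("Connection string:", conn.group(1).strip())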

     

Data Refresh Conclusion

Looking online, there are a lot of people struggling with this topic, so I hope you find this post helpful in some way. If you think that there should be a better way, or you want to tell me there is an easier way, I would love to hear about it. I look forward to hearing what you have to say on this topic.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Upcoming and Recent Events

The PASS organization is a professional organization which sponsors a number of different events in the technical community and provides a lot of great resources for improving knowledge of all things SQL Server and related technologies. Recently, I have been honored to be selected to speak at not one but two events hosted by PASS. The PASS Business Intelligence Virtual Chapter provides training on all things related to Business Intelligence via the web, and I was selected to talk at the last meeting in May. Thank you to all of the people who were able to attend my talk on Top 10 SSIS Tuning Tricks live. If you had to work, no problem: all of the talks hosted by the PASS Business Intelligence Virtual Chapter are recorded and available on www.Youtube.com. The recording of my Top 10 SSIS Tuning Tricks session is available here.

24 Hours of PASS

Periodically PASS hosts a 24-hour training session on SQL-related topics, delivering live training to every time zone in the world. As this event is watched by people around the world, it is a real honor to be selected for it. This time the speakers were selected from people who had not yet spoken at the PASS Summit, as the theme was Growing Our Community. The theme is just another way the PASS organization is working to improve people's skills: not only does it provide the opportunity to learn all things data, it also provides professional development by offering many avenues to practice and grow speaking skills.

Data Analytics with Azure Machine Learning

My abstract on Improving Data Analytics with Azure Machine Learning was selected for 24 Hours of PASS. As readers of my blog are aware, I have been working on Azure Machine Learning [ML] this year and look forward to discussing how to integrate Azure ML into current environments. Data analytics with ML is yet another way to derive meaning from the data being collected and stored. I find the application of data analytics fascinating, and hope to show you why if you are able to attend. There are a number of wonderful talks scheduled at this event, so I encourage you to check out the schedule and attend as many as you can. To be sure, I'll be signing up for a number of sessions as well.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Azure ML, SSIS and the Modern Data Warehouse

Recently I was afforded the opportunity to speak at several different events, all of which I thoroughly enjoyed. I was able to speak on Azure Machine Learning first at the Arizona SQL Server Users Group meeting. I really appreciate all who attended, as we had quite a crowd. Since the meeting is held practically on Arizona State University's Tempe campus, it was great to see a number of students attending, most likely due to Ram's continued marketing efforts on meetup.com. After talking to him about it, I was impressed at his success at improving attendance by promoting the event on Meetup, and wonder if many SQL Server User Groups have experienced the same benefits. If you have, please let me know. Thanks Joe for taking a picture of the event too.

Modern Data Warehousing Precon

The second event where I had the opportunity to talk about technology was the precon at SQL Saturday in Huntington Beach, where I spoke about Modern Data Warehousing. It was a real honor to be selected for this event, and I really enjoyed interacting with all of the attendees. Special thanks to Alan Faulkner for his assistance. We discussed the changing data environment, including cloud-based storage, analytics, Hadoop, handling ever-increasing amounts of data from different sources, and the increasing demands of users, and we reviewed technology solutions that demonstrate ways to resolve these issues in attendees' environments.

Talking and More Importantly Listening

The following day was SQL Saturday #389 in Huntington Beach. Thanks to Andrew, Laurie, Thomas and the rest of the volunteers for making this a great event, as I know a little bit about the work that goes into planning and pulling off an event like this. My two sessions, the Azure ML talk Predicting the Future with Machine Learning and Top 10 SSIS Tuning Tricks, were both selected, and I had a great turnout at both. To follow up on a question I received during my SSIS session: Balanced Data Distributor was first released as a new SSIS transform for SQL Server 2008 and 2008 R2, so you can use it for versions prior to SQL Server 2012. I've posted more information about it here. I also got a chance to meet a real live data scientist, the first time that has happened.

Not only did I get a chance to speak, but also a chance to listen. I really enjoyed the sessions from Steve Hughes on Building a Modern Data Warehouse and Analytics Solution in Azure, from Kevin Kline, and from Julie Koesmarno on Interactive & Actionable Data Visualisation With Power View. As always, it's wonderful to get a chance to visit in person with the people whose technical expertise I read. In addition to listening to technical jokes which people outside of the SQL community would not find humorous, it's great to discuss technology with other practitioners. Thanks to Mr. Smith for asking me a question to which I didn't know the answer, which I now feel compelled to go find; I'll be investigating the scalability of Azure ML and R so that I will have an answer for him next time I see him. I really enjoy the challenge of not only investigating and applying new technology but figuring out how to explain what I've learned. I look forward to the opportunity to present again, and when I do I'll be sure to update this site, so hopefully I get a chance to meet the people who read this.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

What is a Modern Data Warehouse?

As I was honored enough to be selected to give a precon on the Internals of the Modern Data Warehouse, I thought that I would take the time to explain why I felt drawn to the topic. There are a lot of places that haven't given much thought to the changes in technology which have happened over the last few years. The major feature upgrades to SQL Server in 2012 and 2014 mean that they can use columnstore indexes, which make queries faster, and perhaps get better High Availability. While those things are certainly valuable improvements, there is a lot more that you can do to derive value from your data, and companies want more than just a well-organized, running data warehouse.

Data is a Valuable Asset

In 2011, Borders Group Inc. was allowed by the Federal Trade Commission to sell its customer information to Barnes and Noble as part of the bankruptcy sale of its assets. In 2015, RadioShack is doing the same thing. Businesses understand that data is valuable and they are interested in using it to drive decision making. Amazon, Netflix and Target are well known for their use of customer information to drive sales, but they are far from the only ones doing this. This is one of the bigger trends identified recently in the business press. The heads of companies are now looking for their data teams to do more with their data so that they too can have the dream information systems they are reading about.

Total Destruction of the Existing DW is Not Required

While a lot of the time it might be nice to level everything and start over, that is not always an option. The major reason for this is that the data warehouse environment already in place has a lot of value. You want to add to the value already there, not destroy what you have. Also, it would take a long time to recreate the environment, and no one is patient enough to wait for that. Alternatively, you could expand into areas of new technology as your data grows. Perhaps this means you archive some of your data from your database to a Hadoop cluster instead of backing up the data in some far-off location; this would allow you to use Sqoop to bring the data back when you need it, providing ready access to the data. Perhaps you want to provide the users more self-service BI capabilities, moving the data analysis into the hands of the people who are most familiar with the data. You could add the capabilities of Power View in Excel, Power BI Designer or Tableau to your environment.

Incorporating Social Media Information

The business world no longer operates only on a batch cycle. More and more companies want to know what is being said about them so they can respond appropriately. With tools like Azure Event Hubs, Data Factory, Stream Analytics, and Machine Learning, this isn't as hard to do as it might sound. We'll review these products so that attendees will understand how these tools can provide greater insight not only into their own data, but into the data building up about them outside of the company firewall.

For More Information

I really hope you can join me in Huntington Beach on April 10 for a full day of exploring these concepts. I always look forward to events like the precon and, of course, SQL Saturday #389 – Huntington Beach, which is the following day.

 

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Complex Data Analysis and Azure Machine Learning Presentation Wrap Up

Thank you to all of the people who signed up for my webinar on Data Analysis with Azure Machine Learning [ML]. I hope after watching it that you find reasons to agree that the most important thing you need to know to get started in Machine Learning is not math, but having good knowledge of the data you want to analyze. There's no reason not to investigate, as Azure Machine Learning is free. In order to take more time with the questions than the webinar format allowed, I am posting my answers here, where I am able to answer them in greater detail.

How would one choose a subset of data to “train” the model? For example, would I choose a random 1000 rows from my data set?

It is important to select a subset of data which is representative of the data you wish to evaluate. Sometimes a random 1000 rows will do that, and other times you will need to use other criteria, such as transactions throughout a given date range, to get a more representative sample. It all comes down to knowing your data well enough to know that the data used for testing is similar to what you will ultimately be using for analysis.
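
As an illustration, here is a minimal sketch of both sampling strategies in pandas; the file name, column name, and date range are placeholders for your own data.

    # Two ways to carve out a representative training subset.
    import pandas as pd

    df = pd.read_csv("transactions.csv", parse_dates=["transaction_date"])

    # Option 1: a simple random sample of 1,000 rows.
    random_sample = df.sample(n=1000, random_state=42)

    # Option 2: every transaction in a date range you know is typical, which
    # can be more representative than a blind random draw.
    date_sample = df[df["transaction_date"].between("2015-01-01", "2015-03-31")]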

Do you have to rerun or does it save results?

Each run of an experiment re-processes the data; Azure ML does not save the results between runs, so you do have to rerun.

Does Azure ML use the same logic as data mining?

In a word, no. If you look at the algorithms used for data mining you will see they overlap with some of the models available in Azure ML. Azure ML provides a richer set of models, plus a greater ability to either call models created by others or write custom models.

How much does Azure ML cost?

There is no cost for Azure ML. You can sign up and use it for free.  Click here for more information on Azure ML.

If I am using Data Factory, can I use Azure ML?

Data Factory added the ability to call Azure ML in December, providing another place to incorporate Azure ML analytics. When an Azure ML experiment is complete, it is published as a web service so that the experiment can be called by any program which chooses to call it. Using Azure ML experiments directly within Data Factory decreases the need to write custom code, while allowing the logic to be incorporated into routine data collection processes.

http://azure.microsoft.com/blog/2014/12/16/azure-data-factory-updates-integration-with-azure-machine-learning-2/
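
Since a published experiment is exposed as a REST endpoint, any language can call it, not only Data Factory. Here is a minimal Python sketch; the URL, API key, and input schema are placeholders you would copy from your own experiment's API help page.

    # Call a published Azure ML web service over REST.
    import json
    import urllib.request

    url = ("https://ussouthcentral.services.azureml.net/workspaces/"
           "<workspace-id>/services/<service-id>/execute?api-version=2.0")
    api_key = "<your-api-key>"

    body = json.dumps({
        "Inputs": {
            "input1": {
                "ColumnNames": ["feature1", "feature2"],  # your input columns
                "Values": [["1.0", "2.0"]],               # one row to score
            }
        },
        "GlobalParameters": {},
    }).encode("utf-8")

    request = urllib.request.Request(url, body, headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    })
    print(urllib.request.urlopen(request).read().decode("utf-8"))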

If you have more questions about Azure ML, or would like to see me present on the topic live and in person in Southern California, I hope you can attend SQL Saturday #389 – Huntington Beach, where I will be presenting on Azure ML and Top 10 SSIS Tuning Tricks. I hope to see you there.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Math and Machine Learning

I had an interesting conversation with someone at SQL Saturday Phoenix, an event that I am happy I was able to attend, regarding knowing math and getting started in Machine Learning. As someone who had majored in math in college, he was sure that you had to know a lot of math to do Machine Learning. While I know that having really good math skills can always be helpful when creating statistical models based on probability, a big part of Machine Learning, I do not believe that you need to know a lot of math to do Azure Machine Learning [ML].

Azure Machine Learning and Throwing Spaghetti Against the Wall

For those of you who cook, you may have heard of an old-school way of testing to see if spaghetti is done: you throw the spaghetti against the wall, and if it sticks, the pasta is done. If it falls right off, keep the spaghetti in the pot for a while longer. Testing machine learning models is similar, but instead of throwing the computer against the wall, you keep on testing using the large number of models available in Azure ML. Once you have determined the category of your problem, there are a number of different models for that category which you can try without knowing all of the statistical formulas behind each one. I have listed all of the models from Azure ML here so that you can take a look at how many are available. By taking a representative sample of your data and testing all of the related models, determining which one will provide a result is not terribly difficult. The reason it is not very hard is that you do not have to understand the underlying math needed to run the model; instead you need to learn how to read a ROC curve, which I included in my last blog post. While you can pick the appropriate model by having a deep understanding of the formula behind each model, you can achieve similar results by running all of the models and selecting the model based on the data.
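
To show the spaghetti method outside of Azure ML, here is a hedged sketch in Python with scikit-learn: fit several candidate classifiers on the same representative sample and let the AUC numbers pick the winner. The generated data set is a stand-in for your own.

    # Fit several models on the same data and compare ROC AUC scores.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    candidates = {
        "logistic regression": LogisticRegression(),
        "decision tree": DecisionTreeClassifier(),
        "random forest": RandomForestClassifier(),
        "naive bayes": GaussianNB(),
    }

    # The closer the mean AUC is to 1.0, the further the model sits above
    # the coin-toss diagonal (0.5).
    for name, model in candidates.items():
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        print(f"{name}: mean AUC = {auc:.3f}")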

Advanced Statistical Analysis and Azure ML

While Azure ML contains a lot of good tools to get started if you do not have a data scientist background, which recruiters lament not enough people do, why would you use Azure ML if you have already coded a bunch of R modules to analyze your data? Because you can use Azure ML to call those modules as well, and it provides a framework to raise visibility and share those modules with people within your organization, or the world if you prefer.

How to Pick the Right Model

I am going to demonstrate how to pick the right model in an upcoming webinar, as it is probably easier to explain in that fashion than in a blog post. If you want to see how to determine which model to use without knowing a lot of math, I hope you take the time to attend. Azure ML offers the ability to integrate analysis into your data environment without having to be a data scientist, while providing advanced features to accommodate those really good at math, which I will be talking about in an upcoming preconvention event for SQL Saturday in Huntington Beach. If you happen to be in Southern California on April 10th, I hope you will be able to attend that event.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Getting Started with Machine Learning – Result Analysis

Recently I've started working with Azure Machine Learning, and looking at what I consider the most challenging part: picking the right analysis. For those people who haven't ventured into Azure Machine Learning, an experiment looks a lot like a data flow in SSIS. After building one, you need to train, or more to the point evaluate, which model works best, and the answer to that question takes a while. What kind of data do you have? Are you looking to find errors? Determine whether data classified in a certain way can predict a result? Perform a regression analysis of data over time? Group data together to identify trends?

Is your Model better than a Monkey throwing Darts?

While you can analyze your variables and rank them to determine the chance that the variables indicate a result, there is another method that is also used to determine an outcome: the coin toss. This lowly method of analysis is right half the time. If you have more than two outcomes, or to speak the language of Machine Learning, the outcome is not binary, there is another benchmark used to judge the accuracy of predictions: monkeys. I have read about the various skills of monkeys in both literature and financial analysis. Think about it for a minute and you may remember reading or hearing that monkeys typing on keyboards would, given enough time, eventually produce Shakespeare, or a blog post. This is known as the Infinite Monkey Theorem. Another thing monkeys have been known to do is throw darts; various financial publications have been measuring the success of mutual funds against monkeys throwing darts at stocks since the last century. The goal, of course, is to create a model that has better success than a monkey or a quarter. The question is how?

Probability of Picking the Right Model

ROC [Receiver Operating Characteristic] curves are used to ensure the machine learning model generated is better than a monkey throwing darts. Your goal is a perfect game of golf: a curve that hugs the top left corner. Chances are your ROC curve will be somewhere between the two. In looking at the ROC curve generated here, you can see three lines: a light grey one, a red one and a blue one.

[Figure: ROC curve chart with a light grey diagonal and a red and a blue model curve]

The diagonal line represents a coin toss. If you were able to get one of your scored datasets to score a true positive rate of 1 every time, you would have played a perfect game of golf. Chances are you will instead have two lines like I do here, where one of them, in this example the blue line, has a higher true positive rate than the red line, so the results generated by that model are more accurate.
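
For anyone who wants to reproduce a chart like this outside Azure ML, here is a minimal Python sketch with scikit-learn and matplotlib; the generated data set and the two model choices are stand-ins for illustration.

    # Plot two models' ROC curves against the coin-toss diagonal.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    for name, model in [("blue model", LogisticRegression()),
                        ("red model", DecisionTreeClassifier(max_depth=2))]:
        scores = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
        fpr, tpr, _ = roc_curve(y_test, scores)
        plt.plot(fpr, tpr, label=name)

    plt.plot([0, 1], [0, 1], "--", color="grey", label="coin toss")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()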

More ML More of the Time

I find myself spending more time with Azure ML, meaning that I will be devoting a lot more future blog posts to this topic. I am also speaking on Azure ML, both as part of a pre-convention event on the Modern Data Warehouse and at SQL Saturday in Phoenix. If you happen to be in Phoenix, I would love to meet you. SQL Saturdays are great learning events, and I am happy that I was selected to participate in this one.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Tips on SSIS at SQL Saturday Albuquerque

On February 7, I was fortunate enough to be selected to speak at SQL Saturday in Albuquerque, New Mexico on Top 10 SSIS Tuning Tricks. Having worked with SSIS for a number of years, I've needed to research the best methods to employ to ensure my SSIS ETL was running optimally. I've compiled the most valuable items, with examples of course, into this presentation. I'm assuming that everyone attending has already been using SSIS for a while, so I will skip straight into more in-depth ways of tuning SSIS. One of the questions that I know I have heard most often is "When should I do X in SQL or SSIS?" If you are able to attend this session, you will have the answer to that question.

I really enjoy the opportunity to speak on data-related topics and to meet people who may have come upon my blog in the past. Having spoken at this event last year, I know what a good job Keith, Chris, Meredith and friends do organizing this event. I want to take the time to say thank you for all of your hard work, as I really appreciate it. These events are a great place to learn and keep up with a lot of the changes going on in the industry, and I anticipate there will be many lively discussions both before and after the event. That reminds me: if you get a chance, there are two great precons scheduled on Friday, February 6th, Powershell Basics with Mike Fal and Query Tuning, Troubleshooting and Execution Plans with Jason Kassay. Having been fortunate enough to meet both of them, I know that they are both extremely knowledgeable in their respective topics, and if you are in Albuquerque I encourage you to sign up for either of them, as I am sure both will be excellent.

I hope that you will be able to attend as I know I will enjoy seeing you there.

 

Yours Always
Ginger Grant
Data aficionado et SQL Raconteur

Presentation Follow Up to Data Analytics and Distribution with Power BI

Thank you to all of the people who took the time to view my session on Data Analytics and Distribution with Power BI. I really enjoy getting the chance to decrease the confusion I hear regarding Power BI, and hope that you will find the question and answer section helpful if you are trying to learn more about the product.

Questions and Answers

 In Power BI Designer, is it possible to manipulate the colors of the charts?

While this may change as Power BI Designer is still in preview mode, currently the colors are assigned automatically. As you might guess, this is a feature that other people are interested in so it is on Microsoft’s list of things to add. If Power BI Designer color selections are implemented like they are in Power View, which is very similar to Power BI Designer, it is likely themes of colors will be available, rather than the ability to pick each color like in Report Designer, but we will have to wait and see.

Are Power BI designer/dashboard changes specific to each user?

If you have created a Power BI Designer Dashboard, you have the ability to share it with people in your organization. There are a couple of things that need to happen for this to work. The people that you share it with must have Power BI accounts, and they have to be in the same domain as you are. When you share the reports they are only able to read, not edit them. For more information regarding security and Power BI, see Microsoft’s guide here.

Can I grant access to users outside of our domain?

Power BI’s security model is a separate tenant from the security model for SharePoint in the Office365 cloud, but they are related as you can only grant access to Power BI if those users are able to access your version of Office 365 SharePoint. As stated in the previous question, for Power BI Designer Dashboard, the users must be part of the same domain.

How does Power BI perform predictions? Is it the same logic which is used in Data mining?

Power BI uses the Forecasting and Hindcasting features to perform predictive analytics. There are a number of different analytical categories, and the kind used in Power BI is time series analysis. As the name suggests, time series models analyze a set of measurements taken over time to find patterns in the past which can be used as guides going forward. Data mining looks at variables, which may or may not include time, as it looks for patterns throughout the data. The underlying statistical models are not the same.
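
As a toy illustration of the time series idea, here is a minimal Python sketch of exponential smoothing, one of the classic techniques in this category; the data and smoothing factor are invented, and real forecasting features layer seasonality and confidence bands on top of this.

    # Project the next value from an ordered series of past measurements.
    monthly_sales = [100, 104, 110, 108, 115, 121, 119, 126]

    def exponential_smoothing_forecast(history, alpha=0.5):
        """Weight recent observations more heavily, return the next estimate."""
        level = history[0]
        for value in history[1:]:
            level = alpha * value + (1 - alpha) * level
        return level

    print("Next period's estimate:", exponential_smoothing_forecast(monthly_sales))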

Does Power BI have Power Map feature?

Power BI definitely contains Power Map. In fact, Power Map is only supported in preview mode if you do not have Power BI; this link can provide more information about the limits of Power Map in Excel. Power Map is designed to be run as a movie, providing a directed look at the data on the map rather than the interactive drill-down mapping features which are available in Power View. You can share a Power Map by saving it as an mp4 video file and posting it anywhere; there are a number of Power Maps on YouTube if you care to search there.

How does “R” play here?

The R language is completely agnostic as to the source of its data, so you can use Excel as a data source if you want to. If you want to use R within Excel, try the RExcel add-in, which is available here.

Is the PowerPivot where the data is stored for Power BI Designer?

No. Data can come from anywhere, not just from Power Pivot in Excel. For example, if you want to use a website as a data source, you can do that too; there are a number of different data sources available, and Power Pivot is just one of many.

Do you have to use Power BI Designer on the Web?

While Microsoft has designed Power BI Designer as a web project, so that you can create Power BI Designer dashboards as part of the preview, there is also an application available for download here. The desktop application works very similarly to the web version, with the exception, of course, that you will need to upload your work and select the Power BI Designer file as your data source. Should you wish to modify the dashboard once it is loaded, you can do so.

How would you determine anomalies or freak instances in data versus true trends?

The problem of determining anomalies is one with which the practitioners of predictive analytics constantly struggle. For trending to occur, the number of what was previously considered an anomaly needs to increase. Forecasting within Power BI applies one of the more standard methods of accounting for anomalies: looking at standard deviations and probability. The likelihood that a number will fall within a certain range is based upon the number of times this has happened in the past, which is graphed as a bell curve. The values at the far sides of the bell curve are discarded based on how much variance they represent, which in Forecasting in Power BI is expressed in standard deviations, shown as 1σ; the number increases with the amount of variance you wish to include.
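
Here is a minimal Python sketch of the standard-deviation idea described above: values outside the mean plus or minus k standard deviations are flagged as the freak instances. The series and the choice of k are invented for illustration.

    # Flag values outside mean ± k standard deviations as anomalies.
    import statistics

    values = [100, 104, 110, 108, 115, 300, 119, 126]  # 300 is the freak instance
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)

    k = 1  # raise k (2, 3, ...) to tolerate more variance before flagging
    anomalies = [v for v in values if abs(v - mean) > k * sigma]
    print(f"mean={mean:.1f}, sigma={sigma:.1f}, anomalies={anomalies}")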

Does SharePoint on premise support Power BI Designer?

No. Power BI Designer is currently in preview version in the US for Power BI subscribers. You can download the application to play with it.

Will Power BI be available in next version of SharePoint?

While I cannot speak for Microsoft, I can tell you that it isn’t there now. For more information on SharePoint, check out their website here.

Does this work in Office 365 SharePoint? And does this replace the BI features offered in SharePoint in the cloud?

Since I do not work for Microsoft, I am hesitant to talk much about how their licensing plans really work. For more information, please check out their website.

If we want to start learning Power BI, where do we start?

There are a number of great places to learn about Power BI, the best and most up to date being here. I have also included some other places where you might want to go to learn more about Power BI.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur