Which Power BI Gateway: Personal or Enterprise?

UPDATE: Power BI now has one Gateway, with the option to use a personal gateway. Check out this post on the new gateway for more information.

Power BI has two data gateways, Personal and Enterprise. Since I assist companies with Power BI, the name Personal Gateway made no sense to me, especially when I used a personal gateway to update an Organizational Content Pack. While that is a valid reason for a name change, after all Power Query is now called Get & Transform, so why not just rename the Personal Gateway? I digress. The Enterprise Gateway is not a replacement for the Personal Gateway. It is partially an upgrade of the Power BI Analysis Services Connector, as it contains the features of that app and more. In the future, the Enterprise Gateway is going to be a way to manage all of the data connections within Power BI, and I look forward to writing about that once it is available. The Enterprise Gateway, which I should mention is a Preview Release, supports three different data source types: SQL Server, SAP HANA, and Analysis Services. One enterprise feature which is already available is the ability to add users to the gateway you just created, rather than relying on one ID to grant access to everyone. For connections to SQL Server or SAP HANA, the connection to the database is made via the user you entered, so that user should have appropriately limited reporting access to the data source.

Power BI Data Access Based on User Security

Please note that right now, data access based upon user credentials only works for Analysis Services. For reports with an Analysis Services data source, the information passed to the server is the user name of the person accessing the report. Using Active Directory, this user is granted the same access to the data on the server that they have on the on-premises network. Here's an example. Let's say Jason is the sales manager for the Eastern Region and doesn't have access to the Western Region within Analysis Services security. Jennifer is the sales manager for the Western Region and has only been granted the ability to see Western Region information in Analysis Services. If a sales report is created in Power BI which uses Analysis Services as its data connection via the Enterprise Gateway, Jason will only see the Eastern Region information on the Power BI report, and Jennifer will see the same report with only the Western Region information. If Jason gets promoted to National Sales Manager and needs to see everything, he will see everything once the security in Analysis Services is updated to grant him access to all sales regions. Unfortunately, if you are running Analysis Services on SQL Server 2008 R2, or on the Standard Edition of SQL Server 2012 or later, you won't be able to connect to the server via the Enterprise Gateway.

Factors for Selecting Power BI's Personal Gateway

The Personal Gateway takes the data and imports it into Power BI. If you want to extract data from a variety of different places, such as an Oracle database and Excel spreadsheets, the Personal Gateway will support this and the Enterprise Gateway won't. Remember, the Enterprise Gateway only connects to three data sources, and Excel and Oracle are not on that list. If you want to manage the connection and refresh of the data as the administrator, or provide access to the data to everyone who needs it, use the Personal Gateway.

When might one want to use Power BI’s Enterprise Gateway?

All of the connections via the Enterprise Gateway are live connections to the underlying server, so there is no need for a scheduled refresh; the data is accessed live every time a report is viewed. The Enterprise Gateway imports nothing, so if you are reporting on really large databases, it probably makes sense to use the Enterprise Gateway, as nothing gets copied. If you have a security policy which forbids storing data in the cloud, the Enterprise Gateway meets that requirement, as all of the data is stored locally and is merely accessed when needed, like a web page.

Future Plans for the Enterprise Gateway

Microsoft is planning on providing the ability to monitor and audit all of the data sources in the Enterprise Gateway. That would make it truly enterprise, as it will provide the ability to see what data is being used throughout Power BI. When that happens, it might be time to get rid of the Personal Gateway. Right now, if the Personal Gateway is working for you for connecting to SQL Server, Analysis Services, or SAP HANA, and the data refreshes are working, I would hold off upgrading. It's not terribly easy to see what refresh methods are configured in Power BI right now, which is also something that I hope gets fixed in the future as part of the Admin features. When Microsoft releases new versions of the Enterprise Gateway, I will definitely discuss them here. To get future updates, please subscribe to my blog to be notified when they happen.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Custom Visualizations in Power BI – Use with Caution

*UPDATE* The Power BI team has resolved this problem. To read about how Power BI works now, check out my blog post Time to Use Custom Visualizations in Power BI.


After looking at some of the neat custom visualizations for Power BI, like the fish or the Synoptic Panel, I was sold. Time to start using these neat visualizations in my reports in Power BI. I rather regret the decision. Why? I've provided my reasons here.

Warning Messages Every Time the Power BI Report is Viewed

Neither I nor the client is interested in seeing a message which looks like something sketchy is in the report. Reports are supposed to answer questions, not leave people wondering what is wrong every time they look at them. The message looks like some kind of legal cover just in case the code is really bad. Worse, the report is broken, as the visualization will not show until the yellow button is clicked. Yes, I can click on the button and the message will go away, and the report will look like the first image shown above with the fish, but not for long. The message never goes away for good; it appears every time I or someone else opens the report.

Good Luck Getting Rid of the Custom Visualization in Power BI

If you were smart enough to add the custom visualization using Power BI Desktop, you are in luck. I don't have that kind of luck: I loaded it via the web application. Why does it matter where you loaded the visualization? Because you can't get rid of it in the web version. Check out this link from Microsoft and scroll down to this part: "Once you import a custom visual you cannot remove it from the report." Even when I removed every use of the custom visualization from the report, the link to the custom visualization was still there, and the warning message never went away. If you added the custom visualization in Power BI Desktop, you are in luck, as Teo Lachev has posted instructions for getting rid of the visualization in Power BI Desktop. It's not a straightforward process, but at least there it is possible.

So Long and Thanks for All the Fish

Until I have the ability to change my mind on design issues, which I thought was sort of a prerogative, I'll be staying away from custom visualizations. If I can't turn the warning message off, I probably will never use them. Hopefully you will read this before you have to do what I did: delete the report and start over.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

How do I Update my Power BI files?

If you have been working in Power BI and use Power BI Desktop to create a data model and perhaps some of your reports, chances are that from time to time you will want to change the Power BI Desktop file. Over time the data model created in the file, commonly known as a pbix file after the file suffix, may need to change. Updating the pbix file may not work how you expect. If you want to be able to update pbix files, you need to save them on your OneDrive. If you want multiple people to be able to maintain the Power BI files, the best way to accomplish this is to use the Power BI workspace's group OneDrive. While you can share a OneDrive file with other people without creating a workspace, that file will not be visible to others from within Power BI, as the location isn't exposed to other people in the web interface.

What features do you get with a Workspace and an Exchange License?

I heard recently from someone who listened to my PASS BA webinar and didn't understand why I said that one needs to have an Office 365 Exchange client to use workspaces in Power BI. It is possible to create workspaces without an Exchange license, as the only thing required to create workspaces is a Power BI Pro license. While one can create workspaces without having an Office 365 Exchange client, chances are that isn't all you want for a team to be able to work together. When a workspace is created, oftentimes it is created because you want a team of people to collaborate on the model and the reports. Collaboration can include modification of existing reports, data models, or shared content packs. To modify the data model, everyone needs to be able to edit the pbix file containing the data model and view the files from within Power BI. If a group of people want to edit each other's pbix files, a OneDrive location which Power BI's web client and all of the members of the workspace can access is necessary to make this collaboration happen. If you already have an Exchange server, you can still continue to use that, as you don't need to use the mail features of the Exchange license. However, since the ability to create groups which share OneDrives is an Exchange feature, not a Power BI feature, you will need to purchase a license for each user in the group in order to have a OneDrive area to share files. If you do not have an Exchange license, members of the group will be able to share Content Packs and Power BI reports, but not the Power BI data model created in Power BI Desktop.

Let me know if you find this information helpful or if you have any other questions regarding team collaboration with Power BI. Feel free to contact me on twitter or by posting a comment on the blog.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

2015: Year End Wrap up for Releases and More

As 2015 draws to a close, I started thinking back about everything that has happened this year. Technically this has been a big year, as many new applications were released. Here are just some of them, with links included to provide more detail.

This short list could be a lot longer, as it doesn't count the number of updates released to Power BI, which occur several times a month, the CTP releases for SQL Server 2016, the new web version of BIML, or PowerShell. It's really hard to keep up with everything that is changing. It's a good thing that so many people are willing to help others through speaking and blogs, which make learning new things easier.

Community Involvement in 2015

Keeping up with all of these events is difficult, especially given the pace of releases. I spend a lot of time reading various blogs, watching videos, and going to hear people speak. I also have been able to talk about topics of particular interest, mainly Power BI and Machine Learning. This year I spoke at a number of different events, including Speaker Idol, two different user groups, seven webinars, five SQL Saturdays, and other tech events. I've got a number of engagements on the books for next year, including PASS BA Con and SQL Saturday #461 – Austin. 2016 is shaping up to be busy too, and hopefully our paths will cross. I list all of my speaking events on my Engagement Page, and I hope that you might take a look at it from time to time if you are interested in catching up in person sometime. Next year I am hoping my list of speaking engagements changes somewhat, as I plan on trying harder to get accepted to speak at events where I submitted and was turned down in 2015. On a more positive note, views of my blog are up 1000%, and the number of website subscribers has more than doubled. Thank you very much for continuing to read this site, and I hope you find my thoughts helpful. I posted once a week this year, which I thought was pretty good until I talked to Ken Fischer b | t, who blogs twice a week. I'll have to try harder next year. If you think of a topic which would make a good blog post, let me know, as I am always interested in feedback.

Keeping Up the Pace in 2016

Next year there will be no slowdown in the things to learn, as SQL Server 2016 is going to be released. Although the exact date has not been announced, my sources tell me to look for it around May or June. The next release of SQL Server is going to be huge, as it will include new tools Microsoft added to integrate Big Data and open source platforms with SQL Server. PolyBase, JSON support, and R are all going to be part of SQL Server. Personally, I find the R integration the most interesting. Datazen and SSRS are going to be integrated in the next release too, which should really increase the implementation of mobile reporting visualizations.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Power BI Content Packs and Workspaces

Recently I was talking with a client who had questions regarding how to integrate Power BI Organizational Content Packs and workspaces. Working on the principle that he was not alone in wondering how these Power BI features work together, I have provided that information here, along with some links which you may find useful.

Organizational Content Packs in Power BI

Content Packs were released this summer for Power BI, and if you are thinking about using them, remember that you will need a Pro license for every user who accesses them. If you are interested in providing users with the ability to access the data via a read-only dashboard, containing data and pre-built links to access it, Content Packs can provide that capability. Consuming the data source provided in a content pack does not allow users to modify anything about the data source, such as the refresh schedule or the ID used to access the data. Any user can create a content pack and publish it to the entire organization or to a list of emails, both of which can be somewhat unwieldy. A better way to distribute content packs is to expose them via workspaces.

Workspaces in Power BI

I talked about workspaces in a previous post, but I wanted to include a shortened version here. If you do not have an Office 365 Exchange client, you will not be able to create workspaces, also known as distribution groups, as these are created as part of Outlook and used by Power BI, but are not a native Power BI feature. Workspaces make sense when you have groups of users, such as a team of people working on reports or looking at the same data. For example, the accounting department, HR, and Operations may all want to look at different kinds of data, so it might make sense to create a workspace for each of them and store each group's specific reports in its workspace.

Releasing Read Only Dashboards and Data Security

When releasing a dashboard via a content pack, people who have been granted access to the content pack are able to view a read-only copy of the dashboard. If they wish to modify the dashboard, they will have to make a copy of it and then modify the copied version. Everyone who looks at the dashboard, since they are using the data connection in the content pack, will have the ability to look at all of the data in that connection. If instead you want to use a security model created at the source for each user or group of users, the data source must be an on-site SSAS tabular model. Only by using an SSAS tabular model can you provide different data access to people viewing a dashboard with content packs. (Update: the Power BI Enterprise Gateway will use local security. For more information see this link.) Only the person who created the content pack can delete it. No other person can delete it, but they can break the content pack by removing or altering the data connection it uses. If the person who created the content pack doesn't work at your company anymore, it is possible to have orphaned content packs that no one can delete. I hope at some point this is fixed, as I think an administrator should be able to delete a content pack, but Power BI does not work that way now.

If you have any other questions regarding these features or what I’ve posted here, please feel free to respond to this blog as I would be happy to hear from you.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

R – Why Learn it?

I was talking to someone who is looking to change careers to pursue a technical field. She has been talking to people from various tech companies and has signed up for an intensive training bootcamp designed to teach enough programming to get a job after the class. She has been hearing all about various types of open source languages, but one which had never come up in previous conversations was R. That surprised me, since R is on the list of the most popular programming languages (SQL didn't make the list) as well as the list of languages in high demand. She asked me why I thought R would be a good language to learn. Since I had the same question myself when I started to learn R, I thought I'd answer it. R helps bring meaning to data through its ability to combine data analysis and visualization. Data is important because nearly every application, from FitBit to various flashlight apps, is for better or worse all about data.

Combining Analysis with Visualization

To get started learning R, I took a MOOC class on it. While this appeared to be a good idea, after being in the class for five weeks, I had no idea why the language was considered useful. All we did was load arrays of data into memory and then write some code which approximated aggregation and selection which could be done in SQL. After five weeks, I dropped the class as I had other time commitments and it didn’t seem worth it. I was still interested in finding out what the big deal was. After playing around with R and watching other people play around with it who could make it do a lot more than I could, I got it.

A Picture Is Worth a Thousand Words

The cliché about a picture being worth a thousand words is often repeated because it holds true. I have taught a number of Power BI classes, and during the class we review a giant spreadsheet of data, and I ask the class to tell me what the impact is over time. With a giant spreadsheet, you can't readily determine the answer. Once we create a visualization of the data, it's easy to see the answer to that question. R has been providing graphical answers to data questions for years. A number of different companies are realizing the value of R as well. Microsoft bought Revolution Analytics, a leading R provider, in 2015 and is rapidly incorporating R into other tools, like Power BI and SQL Server 2016. I'm convinced now and will be talking more about R in my blog in the future.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


I’m Not Good at Math

How many times have you heard someone say, "I'm not good at math"? Oftentimes this statement is used as a reason why something technical cannot possibly be pursued. It's a self-inflicted limitation, a justification for closing off entire areas of study. If you have ever said this, stop it. Don't repeat it even if you believe you are not good at math. Why? Because while you may not be good at math now, there is no reason why that should stop you from learning it.

Math, Music and Programming

Years ago, back in the days before PCs and, more importantly, before computer science degrees were offered by major universities, IBM was working on developing mainframe computers and needed people to help build them. Since there were no computer science degrees being offered at that time, they hired people with degrees in math and music. Music? Why music? Music uses the same part of the brain as math. This is one of the reasons educators think that music should be taught to small children, as it has been shown to improve math scores. Personally, I have found it interesting to ask technical people if they play or have played an instrument. Ask around yourself and you may be surprised at the large number of people in technical fields who play or have played a musical instrument. Musicians have the brain training needed to be good technical people, regardless of their math skills.

Learning Limits

There are no limits to what you can learn, other than the limits you put on yourself. The brain is very complex, and there are infinite ways to train it. Generally speaking, one is not good at math because one hasn't learned it. Oddly enough, discouraging one's ability to learn often starts in school. If this sounds familiar, remember life isn't school. Oftentimes a school setting isn't the best way to learn anything. Performance in class is not indicative of one's ability to learn; it may reflect the instructor's ability to teach or your willingness to focus at that time. I am willing to bet you don't view the world the way you did when you were sixteen, so why would you judge your ability to learn with that same filter?

Machine Learning is a Skill Which Can Be Learned

I know a very smart developer who told me recently that he wasn't good at math, so he couldn't possibly do machine learning. Really. PowerShell, networking, T-SQL, C#, SSIS, MDX, and DAX you could learn, but you can't teach yourself Machine Learning? I am not going to say it is easy, but I wouldn't say that about any of the other development and IT tasks either. If you can learn one of those, you can learn Machine Learning too, whatever your belief in your math skills. There is no reason why not. I think Yoda said it best: "Do or do not. There is no try." There is nothing really stopping you.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Azure Stream Analytics Hopping – Part 3

When incorporating streaming data from Azure Stream Analytics, it is important to select the data needed to accomplish the tasks at hand. The sample company Eosh is streaming a lot of sensor data and has a variety of questions which the data will answer. Eosh is a concrete delivery company which is streaming data back from its vehicles to the central dispatch system using Microsoft's Stream Analytics. There's a much more detailed description of what Eosh is streaming and their data needs in the first post in this series. After reviewing when Tumbling Windows and Sliding Windows are used, in this post we are going to discuss another option for streaming data: Hopping Windows.

When to Use Hopping Windows

Eosh wants to use Hopping Windows to determine the previous action when water is added to the concrete mix. There is a flow meter sensor in the water tank which detects when the driver flips the switch to add more water. There are a number of different reasons for adding water. One is that the pour is complete and the driver is washing out the remaining concrete. Another could be that the driver is stuck in traffic and the water is added to keep the concrete from setting up within the mixer. Depending on the type of concrete in the mixer, if too much water is added, the concrete will no longer have the required strength and can't be used to create a load-bearing structure. It is very important that structural concrete be created according to specification, as concrete mixed incorrectly will crumble over time, something commonly seen in Detroit. If too much water is added, the vehicle may be routed to a different location so the concrete can be used for a non-load-bearing purpose, like creating sidewalks.

Overlapping Hops

By design, all hops contain an overlapping previous time slice. The picture provides a good visualization of how the data slices are created. Eosh wants to look at the events which happened in the 5 minutes prior so that the adding-water event can be appropriately categorized. The following Stream Analytics query can provide that data.

 

SELECT System.TimeStamp AS OutTime, VehicleID, COUNT(*)
FROM Input TIMESTAMP BY WaterStartPour
GROUP BY VehicleID, HoppingWindow(minute, 10, 5)

This query will create 10-minute slices of time which hop forward 5 minutes at a time, so each new window includes the last 5 minutes of the previous one. By slicing the data in this way, the context around adding water can be evaluated to determine what kind of water-add event took place. Eosh can then use this data to determine if the concrete can be delivered to the original location or if it needs to be rerouted. This later processing will be accomplished via machine learning, which I will talk about in a later post.
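For readers who find the three-argument shorthand above hard to parse, Stream Analytics also supports an explicit Duration and Hop form of the same window. Here is a minimal sketch of an equivalent query, using the same Input stream and WaterStartPour timestamp as above; the WaterAdds column alias is mine, added purely for readability:

--10-minute windows that hop forward every 5 minutes
SELECT System.TimeStamp AS OutTime, VehicleID, COUNT(*) AS WaterAdds
FROM Input TIMESTAMP BY WaterStartPour
GROUP BY VehicleID, HoppingWindow(Duration(minute, 10), Hop(minute, 5))

Either form produces the same windows; the explicit version just makes it harder to mix up which number is the window size and which is the hop.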

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Non-technical Issues Impacting Data Based Decision Making

Having worked with a number of clients to implement Power BI in their respective environments, I noticed that one factor appeared to be common to all: the success of the project depended greatly upon the relationship between the business analysts and the database team. Since this issue greatly impacts the ability to implement data-based decision making, I decided to talk about it in my recent PASS BA Marathon webinar. Too often I see companies which decide to join data together in an analytics platform, such as Power BI, and fail to take advantage of the separate skill sets in the organization. The data team has spent a considerable amount of effort and energy determining the best ways to combine datasets. Logically one would assume that this expertise would be leveraged to help the business team analyze data. Instead, the business teams are tasked with joining data together. While this approach can work, it will take longer to train the business in areas with which they may not be familiar, and the results will be mixed, especially when considering scalability and maintenance needs over time. To leverage the capabilities of a self-service business tool (which tool doesn't really matter, as the same issues exist in Tableau as well as Power BI), the data team needs to be engaged. The skills they have gathered over time allow them to design and plan a data model which can be refreshed automatically without causing issues.

Using Areas of Expertise

Business analysts' time is best spent using the unique skills they have gathered over time, too. Their familiarity with the data values allows them to determine at a glance how the business is doing. Codifying this knowledge into meaningful reports which disseminate this information throughout the organization provides the basis for data-based decision making. To make them successful, they need a data model which contains all of the information they need and is well documented, so that they can find the values required to provide meaningful data visualizations. Too often the report generation is left to the data team, and many times there is a reporting backlog, as there are not enough resources to provide all of the information a business needs.

Team Collaboration

Data-based decision making should be an organizational goal, as it has been shown to be a major tool for business success. When the data team and business analysts work collaboratively, using their specialized skills to create and implement a solution, that solution will be successful. The result will be a model which provides a path for the business analyst to continue to use the data to answer everything from routine questions, such as "How successful was the business last month?", to more obscure questions, such as "What happened to sales volumes after a bad story in the press?". These and many other questions are answered using the model and tools like Power BI to implement an enterprise-wide solution.

Implementing Successful Data Analytics Management Practices

There is more to implementing a self-service BI tool such as Power BI than merely knowing how to make the tool work. A process and a commitment to work across teams are required as well. I enjoyed the opportunity to talk about integrating the tools with company data management policies at the BA Marathon. If you would like to know more about this topic, please come join me at the PASS Business Analytics Conference in San Jose, May 2-4, as I will be going into more depth than was possible in the webinar.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Limiting the Number of Results with TABLESAMPLE

In the world of exploding piles of data, there are times you just need less, not more. If you are working on a proof of concept, just want to test out some code, or want to pass a selection of data to a machine learning experiment, you don't need several million records; having that much data will just keep you from getting to the answer in a reasonable period of time. There are a number of standard ways people limit the data, the most common being some kind of date filter. Using a date range, though, often does not provide the variability needed. This is a particular problem with data used in a machine learning experiment, as it is designed to create an algorithm based on data patterns extrapolated over time. For example, if you are doing any kind of regression analysis on a retail client and you either exclude or include the Christmas shopping season, the algorithm created will not be correct. The goal is to have less data for analysis, pulled from the entire set of data. Fortunately, SQL Server has had several methods for selecting random data since 2005.

Using TABLESAMPLE

Until recently, I hadn't used the Transact-SQL TABLESAMPLE clause, but I ran into a situation where I needed to test some things and not wait all year for a result. TABLESAMPLE to the rescue. There are a couple of situations where it won't work: if you have derived tables, tables from a linked server (bad idea), or are writing a view, you can't use TABLESAMPLE to return random data. If those conditions do not apply, you can use it.

The random sample that TABLESAMPLE provides is based on the number of data pages, not the number of records. If you want the number of rows to be specifically limited, you will need to use TOP(n) as well. I've written all my samples based upon AdventureWorksDW, so you can run them for yourself later. I've listed a variety of ways to call TABLESAMPLE and shown the number of records returned.

SELECT *
FROM [dbo].[FactInternetSales] --60398 rows returned prior to TABLESAMPLE
TABLESAMPLE(10 PERCENT)

(6073 row(s) affected)

Let's say you want to return the same sample set multiple times. For that you will need the REPEATABLE option and a seed value. I picked 11, but you could pick any other number you like.

SELECT *
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]
TABLESAMPLE(10 PERCENT)
REPEATABLE (11)

(6489 row(s) affected)

When looking at the number of records returned, the values are not exactly 10 percent, nor are they particularly consistent from run to run. If you only want 6039 records returned, you can try the following code, but it doesn't really do what it says.

SELECT *
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]  
TABLESAMPLE(6039 ROWS)

(5640 row(s) affected)

This code will provide the exact number of records desired. Notice I upped the number of rows in the TABLESAMPLE clause in order to get 6039 rows back; if the sample is exactly 6039 rows, you cannot guarantee that enough rows are returned for TOP to trim.

SELECT top 6039 *
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales] --60398
TABLESAMPLE(6200 ROWS)

(6039 row(s) affected)

Really Random Sample

TABLESAMPLE has some really useful functionality, but for machine learning I need a truly random sample, which TABLESAMPLE does not provide. For that I need NEWID(). The following sample returns approximately 1% (0.01) of the 60398 rows.

SELECT *
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), [ProductKey]) & 0x7fffffff AS float) / CAST(0x7fffffff AS int)

 

Just for fun I ran the same code 3 times and got a different number of rows back each time.

(600 row(s) affected)
(607 row(s) affected)
(622 row(s) affected)

The ProductKey is added so that the NEWID() function calculates a value for each row. The WHERE clause turns that value into a random float between 0 and 1 and keeps the row when the float falls at or below the threshold. This truly gives me the random sample I would need for a machine learning experiment.
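If you need both true randomness and an exact row count, and the table is small enough that a full scan and sort is acceptable, the old standby of ordering by NEWID() also works. A minimal sketch, not one of the original examples above:

SELECT TOP (6039) *
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]
ORDER BY NEWID() --every row gets a random GUID, so the sort order is random

The CHECKSUM(NEWID()) filter shown earlier is much cheaper on large tables because it avoids sorting the entire table, which is why I would prefer it whenever an approximate row count is good enough.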

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur