Power BI and R Links

I presented Power BI and R today, and for those who were not able to attend, when available my engagement page will have the recording as all of my recorded presentations are always listed

Links for Ropensourcerlogo

There were a lot of links used both for R and for R and Power BI integration. All items needed are included here as well as a brief description.

Comprehensive Resource Archive Network [CRAN] is where one can download Open Source R, packages and contains lots of information about R.

Microsoft R Open which is a fully CRAN compatible version created using the Intel MKL for improved performance can be downloaded here.

Microsoft R Client used with SQL Server 2016. R Server is included in SQL Server 2016. The R Client is needed to connect to R Server even if they are on the same computer.

R Studio is the most popular IDE for developing R.

Visual Studio R Tools are required for Visual Studio 2015 to become and R development environment

CRAN Package List from Microsoft not only contains the list of all of the packages, but the ability to go back in time to look at previous versions of the packages

Power BI Custom R Visuals are found in a tab separate from the other Power BI Visuals on this page. There are six visualizations available.

CRAN R Library for Forecasting model is needed for Power BI Visualizations and the download is required for this visual.

I hope that you found this one stop location for everything you need to use R helpful. For more information on how to use these tools, please take a look at all of my R posts where I describe how to use them.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Data Hierarchies and Drill Through in Power BI

Hierarchies provide a method of organizing data in a table to recognize that one value encompasses all of the values beneath it. One very common hierarchy is a date hierarchy, which is used to show data summarized by year, then all the values for the quarters, then the values for three months in each quarter and each month which at the lowest level includes all the values for dates. There are other hierarchies which may also exist in data, such as sales regions. A sales region could include countries which include states or provinces which include cities which include actual addresses. Because this is how data is categorized, visualizations need to reflect this organization by containing hierarches.

Creating a Data Hierarchy in Power BI

Finding where to create hierarchies is the hardest part of creating them in Power BI, especially if one has ever created datahierarchiespowerbihierarchies in Excel Power Pivot as they are not it the same place. Hierarchies are not in the Relationships data view, instead they are found in the Report view. Right clicking on the ellipse next to any field in a table displays a menu, and the second item on the menu is New hierarchy. Hierarchies can also be created by clicking and dragging a field on top of another field, which also will create a hierarchy. Once the hierarchy has been created, to add another field to the hierarchy, drag a new value on top of the value with the hierarchy icon. If the value added is not added to the location you want it, click on the ellipse next to the field named and move the field up or down as you wish.

Drill Through Reporting in Power BI

There are two ways to do drill through in Power BI, either by adding fields to the group section on a visualization or by adding a hierarchy as a field. If I create a date hierarchy and add the hierarchy to the axis of a bar chart visualization, the top Right and left corners will have arrows in them. Drilling down and back up occurs by selecting either the up and down Arrow keys, then clicking on a bar.powerbibardrilldown

For example, if the down arrow in the left corner is selected, clicking on the bar for 2013 will show data for the 2013 quarters. Repeated clicking will provide data down to the month, then the days in that month. The double arrow buttons provide the ability to show the data in the next level slightly differently. Clicking on the left double arrow will drill down to the next level. Clicking on it would provide a bar visualization for 4 quarters with data from all years, then 12 months of the combined years then the days. The double arrow with the line connecting the arrow only works on the highest level of visualization. When the double arrow line button is selected initially, it will show the quarters listed for every year, then the months and years, then the day, month and year. I hope you found this post helpful in explaining some of the features natively included in Power BI. If you are interested in learning how to expand the visualization capabilities of Power BI by including R visuals, please attend my upcoming webinar.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

DIY – Using Custom Power BI R-Powered Visuals

In a recent post I introduced using R custom Visuals in Power BI, and this post details how to use the correlation visual in a Power BI Report. The first step in the process is to download R, if it is not on the computer already. if SQL Server 2016 with R integration is installed, there is no need to download R, as it was installed already. If the computer does not have R installed, click here. Once R is installed, go to the Power BI Custom Visualization page and select the R tab to pick one of the six R visualizations. I picked the correlation plot. To use the R visualization, if the machine does not already contain the packages used in the visualization, as screen will prompt you to install some packages also. This may take a little while to do and when it is complete, a window will appear showing the packages were successfully installed. Now the custom Visual can be used in Power BI.

Using the Correlation Visualization in Power BI

Power BI will show the new visual which you can place on the report. It is important to understand a little about R to better understand the error messages received. For example if you are using the fcoorelationplotorecasting tool and have selected the year value instead of the date, you may receive an error about an invalid time series. The underlying code is expecting to receive a date value, and a year is not a date, meaning you have to reference a date field in order to make it work.

Differences between R Visualizations and other Power BI Visualizations

Interacting with R visuals works differently than with other report visualizations as you cannot click on elements within the visualization and filter other items on the page. Other visuals on the page will filter the data contained within the R visual. For example, let’s say my report contains a total field, a slicer which contains years and a correlation plot which contains products. If the slicker is changed to select a year, total field and the data within the R visual will change to reflect that. If on the other hand, I choose to click on the R visual to select one of the product categories, the total field will not change and the R visual will not change. The R visual’s appearance will not change in any way.

nointernetrmessage

One interesting thing to note is if you have created an R visual in Power BI and are working without an internet connection, the report will throw an error when trying to open the report. This occurred when it was a report I created or a sample report, so it appears with these reports an internet connection is required.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Downloading Power BI Reports and the Impact on Report Creation Process

savepowerbireportWhile the official announcement has not been released, on November 24th the Power BI feature to Download report became available. It is now possible to download Power BI Reports from the Power BI Service. To access the feature, click on the dropdown file menu of a report, and there you will see the option to download a report. If your report was not created recently, this feature may not be enabled. If the report was created November 24th or later, the download menu item should be enabled. After the download option is selected, a message pops up letting you know the report is being downloaded. The message is accurate as it took a good three plus minutes for the report to be downloaded.

savepowerbirptmessage

There are a couple of points worth mentioning regarding the downloading feature. Downloading is for reports, not dashboards or datasets. **UPDATE: on 11-29 Microsoft released the information about the download feature. It can be turned off if on the Admin Portal exporting data is turned off. Unfortunately there isn’t a way to just turn off downloading the reports, as that one option controls both downloading reports and data.

Impact of Report Downloading on Templates and Content Packs

Power BI’s report download feature has the ability to modify the development process in Power BI as it is now possible to use Power BI Desktop and provide the ability to create report versioning. Unfortunately, a report created on the service, cannot be downloaded as only reports which originated as desktop files can be downloaded. This is a problem as these files cannot be sent to SharePoint Online to provide versioning. One has to hope no one broke a report created on the service as there is no method to rollback changes to a previous because no file exists. This limits the use of Organizational Content Packs. A Content Pack is designed to be a starting point for report development as it contains everything needed to create a report, including a connection to a dataset, plus dashboards and perhaps a report or two to serve as a style guide. Content Packs are accessed by the arrow at the bottom of the Power BI Menu or by selecting Get Data, then selecting My organization to use content packs created for local data access.

Now that anyone can use a report which has been uploaded to the service as a starting point for a new report, there may be a decreased use of the template feature. Any report created on the desktop can be saved as a template, by selecting save on a desktop report file and changing the file type to a template. Unfortunately, templates do not contain links to the datasource used. The person creating the report must determine what data to use and if there was a dataset presently used which is refreshing the data, or create a dataset and it’s respective refresh features as part of creating a report.  Content packs provide the connections to datasets, but since the reports cannot be saved as a file for versioning, this feature is not often used instead of templates. Downloading the file and then modifying it is does resolve the issue as the starting point is then a working report with a connection to an existing dataset.

Personally I have been waiting for Microsoft to release the download feature for a long time and am very happy to be able to use it. I hope that an some point the download feature will be expanded to reports created on the service as this feature will  improve the report creation process.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Admin and Auditing in Power BI

On October 24, 2016 Microsoft has created a new access right in Office 365, the Power BI Admin. This means that no longer does one have to be a Office 365 Administrator, a role with a ton of rights which have nothing to do with Power BI to be a Power BI Administrator. To be a Power BI Adminstrator, the Office 365 Adminstrator needs to run a PowerShell script to make a user a Power BI Administrator. Just change the email address and run this script.

Add-MsolRoleMember -RoleMemberEmailAddress “user@useremailaddress.com" -RoleName "Power BI Service Administrator"

After running this script you will almost have everything you need to be a Power BI Administrator. The one thing that the newly minted Power BI Administrator cannot do is review the Audit Logs.

Audit Logs in Power BI

The Audit Logs are the third menu item in the Power BI Admin Portal. As you can tell by looking at a copy of the screen below, Audit Logs are not really part of Power BI. Yes the ability to log all of the content in Power BI exists in the Audit Logs, but so does the ability to review the audit logs for things like Exchange Mailbox Activities and User Administration Activities.

Power BI Admin Screen

If the Office 365 Administrator has granted a user Power BI Administration rights, this is what the newly minted Power BI Administrator will see when trying to access any search activities. It appears that you the user has rights, until that user tries to do anything on the screen. At that point, this error window appears.

error-message-for-power-bi

Granting a Power BI Administrator the ability to view Power BI Audit Logs

To view audit logs, additional Office 365 access features are required. I did not figure out a way to turn on the audit feature unless one already was an Office 365 Administrator. If you want to view the audit logs, the first step should be to find out if they are turned on, because the steps I am going to show won’t do that. Within the Office 365 Admin center, go to the Exchange admin center and select permissions.

office365portal

Select the user and then add two roles Compliance Management and Organization Management. It is also possible to create a new role with Compliance Manager rights and add the Audit Logs and View Only Audit Log permissions, since those are the required permissions. None of this has anything to do with Power BI, which means when the permissions granted, users will have the ability to see all of the audit logs. After the new role permissions have been applied, the error message will go away and users will have the ability to see activity logged in Power BI as well as all of the Power BI Administration features.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Using Power BI to Display R Visuals

 Microsoft has in the past few months increased the support for R Visuals in Power BI. In the July release, Power BI Service started displaying interactive visuals and in October, six R visuals were made

rvisualsavailable in Power BI which do not require any knowledge of R. Six visuals is a far step away from the wide number of visuals available in R. According to the Comprehensive R [CRAN] as of today there are 9557 packages available for R, and that does not count all of the packages on GitHub.

R Packages Supported in the Power BI Service

According to a September 8th post on the Power BI Blog, there are currently 352 R packages supported in the Power BI Service. For those of you wondering if your favorite R library is included (yes dplyr, ggplot2 and cluster are there) I have included the list of packages here.

abc abc.data acepack actuar ade4 AdMit andrews
aod apcluster ape aplpack approximator arm arules
arulesViz assertthat BACCO BaM BAS base64enc bayesm
bayesQR bayesSurv BayesTree BayHaz bbemkr BCBCSF BCE
bclust BenfordTests bisoreg bit bit64 bitops BMA
Bmix BMS0 bnlearn, boa boot bootstrap bqtl
BradleyTerry2 brew brglm, bspec bspmma BVS C50
calibrator car caret, catnet caTools cclust class
clue cluster coda coin CORElearn corpcor corrplot
crayon cslogistic cubature cvTools data.table date dbscan
deal deepnet deldir dendextend DEoptimR deSolve DiagrammeR
dichromat digest dlm DMwR dplyr DPpackage dse
dtw e1071 earth EbayesThresh ebdbNet ellipse emulator
ensembleBMA entropy EvalEst evaluate evdbayes exactLoglinTest expm
extremevalues FactoMineR factorQR faoutlier fBasics fields filehash
fitdistrplus flashClust FME fmsb foreach forecast Formula
fracdiff fTrading futile.logger gam gamlr gclus gdata
gee genetics geoRglm geosphere ggdendro ggmcmc ggplot2
ggthemes glmmBUGS glmnet gmodels gmp gnm GPArotation
gridBase gridExtra growcurves grpreg gss gsubfn gtable
gtools haplo.stats hash hbsae hdrcde heavy HH
HI Hmisc HSAUR ifultools intervals ipred irlba
irr iterators jpeg kernlab KernSmooth KFKSDS kinship2
kknn klaR knitr labeling lars lattice latticeExtra
lava lavaan lazyeval leaps LearnBayes LiblineaR LICORS
limSolve lme4 lmm lmtest locfit locpol LogicReg
lsa lubridate magic magrittr mapdata mapproj maps
maptree MASS MasterBayes Matrix matrixcalc MatrixModels maxent
maxLik mboost mda memoise mi mice microbenchmark
mime misc3d miscTools mitools mixtools mlbench mnormt
MNP modeltools mombf monomvn MSBVAR msm multcomp
munsell mvtnorm NbClust ncvreg nlme nloptr NLP
NMF nnet nnls nortest numbers numDeriv OutlierDC
pacbpred party partykit PAWL pbivnorm pcaPP pdc
PerformanceAnalytics plotmo plotrix pls plyr png polynom
predmixcor PresenceAbsence profdpm proto proxy pryr pscl
psych qap qdapRegex quadprog quantreg qvcalc R.oo
.utils ramps RandomFieldsUtils RColorBrewer Rcpp RcppArmadillo RcppEigen
registry reliampo relimp reshape reshape2 RGraphics rjson
RJSONIO Rmpfr rms robustbase ROCR rpart.plot rrcov
rscproxy RSGHB RTextTools rworldmap SampleSizeMeans SampleSizeProportions sbgcop
scales scatterplot3d sciplot segmented seriation setRNG sfsmisc
SIS SixSigma skmeans slam smoothSurv sna snow
SnowballC snowFT sp spacetime spam SparseM spatial
spBayes spikeslab splancs spls spTimer sqldf sROC
stabledist stabs stepPlr stringdist stringr stsm stsm.class
survival tau tcltk2 tensorA textcat textir tfplot
TH.data tidyr timeDate timeSeries tm topicmodels tree
tseries tsfa tsoutliers TSP vcd VennDiagram VGAM
VIM whisker wordcloud xgboost XML xts yaml
zipfR zoo

 

 

I hope that some day that this list becomes much longer, but it is a good start. If your company has lots R visuals and you wish to migrate them to Power BI, chances are some of the libraries you are using are not here. If you are interested in having your library added to the list of 352, go to the Ideas page of Power BI and request that your library be added, as Microsoft I know looks at this page to determine what to release in the future. Someone has requested that igraph be added, and since it hasn’t received a lot of votes yet (hint) it is probably low on the priority list.

Release Plans for Additional R Library Support in Power BI

In reviewing the Power BI documentation online, it is not known what Microsoft’s plans are for future R packages. Does Microsoft plan on adding a lot more in the future? Will new libraries be added as part of upcoming releases? All questions where the web presently does not provide answers.

The current package support in Power BI does provide the R developer with a method to more securely and widely distribute R visualizations. In my upcoming posts I will review how to use the six R visualizations which do not require R and also how to use R with Power BI.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Questions and Answers on Machine Learning with R

opensourcerlogoRecently, I did a webinar on Machine Learning and R. I received a number of questions during the presentation. Due to time constraints, I was unable to answer all of them, so I have provided the Question and Answers here.

Question: Can I Use R in SQL Server to plot non-linear regression curves? We use IC50 and others in Michaelis-Menten kinetics for bio-chemical work.

R running on SQL Server provides the functionality of standard CRAN R packages with the additional capability to run the SCALER functions provided by SQL Server’s implementation of R. Any other functionality performed in R can therefore also be performed on SQL Server. Like all R code, you will need to install the required R libraries in the appropriate subdirectory. The directory is something like C:\Program Files\Microsoft SQL Server\MSSQL13.<instancename>\R_SERVICES\library

Question: Can you execute/invoke an external R script from T-SQL?

Yes. With SQL Server 2016 it is possible to run an R script from within T-SQL by using the sp_execute_external_script command. This link should help you with more detailed information on how to execute R scripts within T-SQL

Question: Is it possible to run R processes in diffrent boxes other than SQL Server itself for scalability reasons?

You have the option of installing the R Server on another server. Just keep in mind that you do have to account for the additional overhead of moving all the data over the network, which needs to weigh in on your decision to move processing to a different server.

Question: Can we join data generated from an R script to SQL data directly or does it have to be inserted into a table first?

Data generated from an R script interacts with the data the same way other data in a stored procedure does. It is not necessary to write the data to a table for the data to be processed.

Question: When would you use R Server versus R on SQL Server?

R on SQL server is an implementation of R server. When running R on SQL Server, R Server is running.

Question: What is the maximum file size I can load on R server?

R Server uses SCALR which provides the ability to not only use memory but also use disk storage, providing nearly limitless capability for file size processing.

Question: Is knitr integrated or can it be integrated into R for SQL Server?

While you can install the knitr library on SQL Server to implement the functionality provided, there is no direct integration path for incorporating the wide functionality knitr provides within SQL Server. For more information about knitr, please see this link.

Question: Does MS R Open have all the functionalities of CRAN R? Does it support all packages?

MS R Open is fully CRAN R compliant. All code which runs in CRAN R can run in MS R Open and all packages are supported. Since MS R Open was rewritten to use the Intel Math Kernel Libraries, it provides technical advantages to CRAN R. MS R Open is partially Multi-threaded, instead of single threaded like CRAN R, and is up to 38% faster

Question: Do you have a link for the Microsoft Machine learning cheat sheet?

Sure. The link can be found here.

Question: Why Use Microsoft R Open over other Analytics Tools such as Open R, Python, Matlab?

Microsoft R Open is fully CRAN R compliant, and also multi-threaded and faster, providing a clear benefit over CRAN R. Python is a great tool for data analytics, but unlike R, it is not designed solely for statistically analysis but has a wider functionality scope. R is focused solely on providing statistical data analysis. Matlab is a great tool, but given its complicated user interface and high licensing cost, many users may gravitate towards R, as it is free and there is a lot of good support for learning R available online as well.

Question: Can R be used with older versions of SQL Server or only 2016?

While it is possible to use R with other versions of SQL Server using tools such as RODBC, the R Server and full integration of R in SQL Server is only available in SQL Server 2016.

Question: Do I need all R server and R client and Microsoft R Open to be installed to run R from SQL Server?

To run R on SQL Server, R Server needs to be installed. For more information on how to install R on SQL Server please see this link. To connect to the server and use the SCALR functionality, the client machine will need to have R Client installed.

Question: What are the benefits of Microsoft Standalone R Server?

R Standalone Server is available to connect to data stored in HDFS and Teradata in addition to SQL Server. To incorporate data from other data sources and provide the ability to process more data than one has memory, it might make a lot of sense to deploy a standalone R Server.

Question: Are there any memory limitations in R when handling big data?

R is designed to run in memory, meaning if you have more data than memory you may run out of memory. When running using the SCALR functions, memory and disk are used to provide nearly limitless ability to process data.

Question: Can one call R from Python?

While it may be possible to create Python code which calls R, I am unaware of the syntax. This functionality would not be supported in the context of SQL Server.

Question: Can you use R charts in SSRS?

There are two different ways to incorporate R charts in SQL Server. Since a stored procedure can provide and R visualization, it is possible to use SSRS to call a stored procedure which would create a graphic to be rendered in SSRS. Also since Microsoft announced the incorporation of Power BI with SSRS, and Power BI provides the ability to create R graphics, incorporating Power BI which renders R would be another way to use charges in SSRS

Question: Any classes like “R for the nervous DBA” that you can recommend? I have a statistician who is running R on my SQL server.

I do not know of any R classes specifically designed for SQL Server DBAs, but I am working on a post for additional configuration information which you might find helpful. Please check back in a few weeks for updates.

Question: What is a Pirate’s favorite language?

RRRRRR Mad Props to Phillip for sending me a pirate joke. I love pirate jokes. I feel compelled to tell a few when I am presenting R and I have been told all of mine are really bad.

If you have any additional questions about SQL Server and R or any pirate jokes, please comment on this post or send me a message via twitter @desertislesql.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

PolyBase – Another Method for Creating a Stretch Database in SQL Server 2016

PolyBase, which was released with SQL Server 2016, provides another method to access live data either locally or in the cloud, very similar to the SQL Server Stretch database feature. Polybase can also provide the ability to provide a more cost-effective availability for cold data, streamlines on-premises data maintenance, and keeps data secure even during migration. Polybase differs from Stretch database in a few ways, as the SQL must be different, the speed is noticeably slower, and it is a lot less expensive. The cost is significantly less because storing data in a Azure blob store starts at 1 cent a month and Stretch database starts at $2.50 an hour. In this post,I will show how to take data which was archived due to the age of the data, which was created in 2012 and store it in an Azure Blob Storage file which will be available via Polybase when I needed.

Implementing a PolyBase Stretch Database

PolyBase is an optional feature of SQL Server 2016, and the Instance Feature PolyBase Query Service for External Data needs to be selected as part of the installation process. Two services are ssmsexternalinstalled with the feature, SQL Server PolyBase Engine and SQL Server PolyBase Data Movement. Both of these services must be running and TCP/IP must be enabled for PolyBase to work. Either check for those services or run the query SELECT SERVERPROPERTY (‘IsPolybaseInstalled’) AS IsPolybaseInstalled; which will return a 1 when PolyBase has been installed. The next step is to tell SQL Server what the external source is by configuring which Hadoop Connectivity will be used.


EXEC sp_configure ‘hadoop connectivity’, 7;
GO
RECONFIGURE;

To use Azure Blob Storage, there are 3 different options, 1,4 and 7, which include the ability to access blob storage. These options also allow for various other Hadoop engines to be accessed as well. I have selected 7, which also allows me to use Hortonworks on Linux. For more information on the connectivity options, check out this link. A secure database ke is required to make the connection a well. The next step is to let SQL Server know where exactly the blob storage data is stored, which creates an entry in the External Data Sources. PolyBase needs to know how the data is formatted, which will be stored in an External File Format. By definition, Hadoop data has no schema, so a schema is going to have to be created and stored in an external table. The location for each of these items is stored within SQL Server Management studio as shown here.

This code will create a secure key using a password I made up

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'P@ssW0rdPolybase'

A PolyBase will need the Azure Blob Storage key to be able to make a connection.

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCred
WITH IDENTITY = 'user', Secret = 'lEGL66LiK2KE2U0WEb435PH15BwFOInrHqQ1AJigjoRVfEOrOge+TLbBNu861cqbC+a26io92o5sw/b+OTD/C4w=='; --Note: Replace with your Blob Storage key


CREATE EXTERNAL DATA SOURCE AzureBlobStorage with (
TYPE = HADOOP,
LOCATION ='wasbs://gingeriottest@gingertestiotblobstore.blob.core.windows.net', --This is the location of the folder inside blob storage containing my data
CREDENTIAL = AzureStorageCred
);

The data is stored in blob storage as a CSV, so I will need to create that file Format

CREATE EXTERNAL FILE FORMAT CSV WITH (
FORMAT_TYPE = DELIMITEDTEXT
,FORMAT_OPTIONS (FIELD_TERMINATOR =',', DATE_FORMAT='MM/dd/yyyy') )

Lastly, a table definition must be created for the file so that it can be accessed as a table

CREATE External TABLE OntimePerformance2012sampleall(
[DimAirlineKey] int,
[DimOriginAirportKey] int,
[DimArrivalAirportKey] int,
[DimCancellationReasonKey] int,
[DimDelayLengthKey] int,
[DimDepartureBlockKey] int,
[DimArrivalBlockKey] int,
[DimDistanceGroupKey] int,
[FlightDateKey] int,
[FlightNumber] varchar(50),
[ScheduleDepartureTime] int,
[ActualDepartureTime] int,
[DepartureDelayInMinutes] int,
[TaxiOutTime] int,
[TaxiInTime] int,
[ScheduleArrivalTime] int,
[ActualArrivalTime] int,
[ArrivalDelayInMinutes] int,
[ScheduleElapsedTime] int,
[ActualElapsedTime] int ,
[DistanceInMiles] int,
[CarrierDelayInMinutes] int,
[WeatherDelayInMinutes] int,
[NASDelayInMinutes] int,
[SecurityDelayInMinutes] int
)

WITH
(   LOCATION = '/OntimePerformance2012.csv',
DATA_SOURCE = AzureBlobStorage,
FILE_FORMAT = CSV
)

 

All of the external tables created can be found in the DMV sys.external_tables or by looking at sys.tables where the new column is_external is set to 1.  Once the setup of PolyBase is complete, I can query the external and internal tables all at the same time, like I do here.

 

SELECT count(*)
, r.cancellationReason
, d.CalendarYear
FROM dbo.CancellationReason r
JOIN dbo.OntimePerformance2012sampleall c
ON c.dimcancellationReasonKey = r.CancellationReasonKey
JOIN dbo.DimDate d
ON c.flightDateKey = d.DateKey
GROUP BY r.cancellationReason, d.CalendarYear

PolyBase Table Considerations

It is not possible to edit the data in PolyBase, as the data is stored read only. Also since the data is being pointed to externally, it cannot be indexed and querying it is not terribly quick. All of the data will be brought down and then filtered, unless PolyBase is configured to improve performance, which I will discuss in a future blog post.

PolyBase and Azure Blob Storage provides the ability to access data stored offsite in a secure location in a cost effective manner. If you saw my presentation on this topic at PASS Summit or have questions how to make it work in your environment, please contact me on twitter or write a comment on my blog.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Monitoring Reports for R on SQL Server 2016

Whenever I talk about R, in the question and answer period I get a lot of questions from DBAs on how to monitor the impact of running R code on SQL Server. Fortunately Microsoft recently created a series of custom reports for SQL Server which provide a lot of useful information on how R is running on SQL Server. The custom reports are available on github here in a folder called SSMS-Custom-Reports. There are 8 different reports: R Services – Active Sessions, R Services – Configuration, R Services – Configure Instance, R Services – Execution Statistics, R Services – Extended Events, R Services – Packages, R Services – Resource Usage, R Services – Script. If it has been a while since you have added a custom report to SQL Server, this link should help you get the reports installed.

R Services Reports

The titles for the reports provide a good indication of what they do. Active Sessions will show how many users currently have their compute context set to run on the server. Configuration shows the current R Configuration, which is very useful information for checking on the R components on the server.

rconfigrpt

The report R Services – Configure Instance is for configuring R for the first time. If you have R setup already, chances are you won’t need this report, as it is really not a report but a script you need to run to configure R. R Services – Execution Statistics is a really neat way of monitoring the ongoing use of R. It keeps track of every time R is run on the server. If you run an R job then refresh the report, the execution count and time will go up.

sqlserverrrpt

R Services – Extended Events is also not a report but a list of all the extended events that are available for R Services. This is a handy bit of information, which can be a great reference tool for extended events monitoring. R Services – Packages lists the packages which are currently installed on SQL Server. When people write R, many lot of different packages are used within the script. Prior to running a package, check the information on this report to ensure the libraries used are installed on SQL Server. If the library is missing the code will not work. R Services – Resource Usage is a great way to see how R has been configured to run on the server. Notice I have created an External Pool for R. This is a configuration recommended by Microsoft to better monitor your R Services.

resourceusagerpt

The last report R Services – TSQL Script is a bit of a mystery. The screen contains this line of text. This is the T-SQL script for generating the data in report: No Report . I don’t know what was the intent of the report. The Readme.md in github provides this documentation about the reports.
SSMS Custom Reports

Custom reports for SQL Server R Services built using SQL Server Report Builder. The reports can be added to SQL Server Management Studio to view various configuration and runtime information about R scripts execution in SQL Server.

That doesn’t provide any clues to me as to what the intent of the report is, so you may want to skip installing it altogether. If anyone does figure out how this report can be used, please drop me a line as I am curious. I was really excited when I found these reports and I think that they provide a lot of valuable insight for monitoring R on SQL Server 2016.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Machine Learning with R

While there are a number of different Applications designed to implement Machine Learning, such as Azure Machine Learning, Matlab and Octave, a specific package to perform Machine Learning is not required.  The algorithms used to generate machine learning experiments, can be applied in other languages, such as R.

Machine Learning Algorithms

machinelearningLearning is often described as a method of applying rules to situations. “Don’t put your finger on the stove.  The stove is hot and will burn you”.  A child can extrapolate this to irons, fire and other hot things after being told about stoves.  Computers process learning a little differently, by applying rules or algorithms to data to determine a result.  A great example of this was the Kaggle competition to determine from looking at a picture, which picture was a cat, and which picture was a dog. The computer reviewed a number of different pictures where there was a label on the picture, indicating that it was a cat or a dog and applied those rules where the pictures were not labeled.  The winning algorithm was right 98.914% on identifying dogs and cats.  Sorting pictures into groups is a classification function, one of the common functions used in Machine Learning. Other popular functions include anomaly detection, regression and clustering.  Once experiments are created, there are a number of different methods used to determine their effectiveness, such as the Receiver Operating Characteristic [ROC] graphs or a Confusion Matrix.

Algorithm Determination

Often times determining which algorithm to use can take a while.  Here is a pretty good flowchart for determining which algorithm should be used given some examples of what the desired outcomes and data contain. The diagram lists the algorithms, which are implemented in Azure ML.  The same algorithms can be implemented in R.  In R there are libraries to help with nearly every task.  Here’s a list of libraries and their accompanying links which can be used in Machine Learning.  This list is no means comprehensive as there are libraries and functions other than the ones listed here, but if you are trying to write a Machine Learning Experiment in R, and are looking at the flowchart, these R functions and Libraries will provide the tools to do the types of Machine Learning Analysis listed.

Drawing ROC CurvesROCR

Anomaly Detection

Regression

There is a really good list of all of the R regression functions here

Clustering

Binary Classification

Multi-Class Classification

 

Applied Machine Learning

Hopefully this list of R libraries will help you apply machine learning to data within R. To see how R can be used in Machine Learning, please join me on my upcoming webinar on Machine Learning with R and SQL Server 2016  where I will show how an R program can be created and applied to a production environment.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur