Syncing Contact Attributes to B2C Users for PowerApp Portals with Cloud Flows

Note: The original article goes through this process from a Logic Apps perspective, but in my case I needed to do it in a Flow; thanks to the original author.

If you have ever used Power Apps Portals with the most common authentication mechanism (Azure AD B2C), you may have come across this problem: portal users frequently change their email address or contact information, and less frequently their names, either from within the portal or through backend users editing these attributes directly in the model-driven app.

The problem is that the Azure AD B2C user directory doesn't respect those changes: they don't flow back from Dataverse into B2C. The common solution has been custom code, i.e. a plugin that calls the Graph API to update the Azure AD B2C directory. I will follow a similar approach but use a Cloud Flow instead of a plugin. This makes future changes easier, and we can add some nice extras such as retry logic if the request fails or an email notification to the administrator.

Assumptions:

  1. I assume that you have a portal ready and using Azure AD B2C as its authentication mechanism. If you don’t, check out this very new feature on how to set that up using a wizard.
  2. I also assume that you have administrative privileges to add and configure an application registration in the B2C tenant.

The Solution:

We will build a daemon flow that runs on update of the contact entity (a portal user is a contact). This Flow will execute an HTTP request to the Graph API to update that user record in B2C with the new values from Dataverse. Before building the flow, we need a way to authenticate it; the best way to do that is to create an app registration that handles authentication and use that app registration in our flow.

Part 1: Creating the App Registration

Since we don't have a handy connector for the Graph API that does what we want, we need to issue HTTP requests to the API. To issue these HTTP requests, we need to authenticate with the Graph API. Instead of hardcoding a username and password inside the flow, we will use the application registration approach.

Navigate to portal.azure.com and open the B2C directory that you use to authenticate the portal (it is important that you are in the B2C directory and not your default one). Make note of the B2C directory name as we will need it later. Search for the Azure AD B2C service and open it.

1. Under App Registrations, click New registration.

2. Give your app a name and make sure to select the last account-type option, since we are already using user flows to authenticate portal users who will mostly be from outside our organization. Click Register when done.

3. After creating the application, take note of the Client ID and Tenant ID shown on the app registration's Overview page; we will need them later.

4. We need to create a secret for this app, as the flow needs it for the HTTP request authorization. Click Certificates & secrets in the left pane, and under Client secrets click New client secret. Give the secret a name and an expiry and click Add. Once the secret is added, you have one chance to copy its value, as it will be hidden once you navigate away from this page. So, for now, we have the B2C directory name, Client ID, Tenant ID, and the secret value saved on the side.

5. This app is almost ready; we just need to grant it the permissions that make it capable of managing the aspects of the B2C tenant needed to solve our problem. Click API permissions in the left pane and then click Add a permission.

From the dialog on the right, choose Application permissions (our Flow will run as a background process) and add the following permissions (you can search by permission name):

User.ManageIdentities.All, User.ReadWrite.All, Directory.Read.All

After adding these permissions, make sure to click "Grant admin consent for YOUR-TENANT-NAME". Now we are all set; let's move to the Flow side.

Part 2: Creating the Flow

The flow itself is really simple: use the current environment trigger, configure it to run on Update, and scope it to Organization. The variables shown will hold the values we collected previously; you should have everything handy by now except for the Auth Audience URI, which should be https://graph.microsoft.com. Since we only want this flow to work for contacts who are actually portal users, I check the User Name field on the contact entity. In Dataverse, the contact record has a field called User Name which holds a GUID representing that contact's Azure AD B2C ID.

Now, for the HTTP action, make sure that the method is PATCH since we will only be updating parts of the user record. For the URI, you need to use https://graph.microsoft.com/v1.0/users/{Contact Record User Name}.

For the Headers, we need the Authorization header and the Content-Type header. The Authorization is basically the secret we got from the previous steps. The Content-Type is hard-coded as application/json.

We are not done with authentication yet: if you click on advanced settings at the bottom of the HTTP action, you need to configure the authentication as shown below using the same values we collected before.

Now for the last part: the body of the PATCH request. In my case, I'm interested in the email (mail), first name (givenName) and last name (surname), because those are the only fields I mapped when I configured the B2C authentication for my portal. This means that if a contact changes their name or email, the corresponding B2C user will pick up those changes within a few seconds as well. For more info on the schema of the JSON in the body field, and if you are interested in updating other fields, refer to this documentation.
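For reference, here is roughly what the equivalent Graph call looks like outside of Power Automate. This is only an illustrative Python sketch, not part of the Flow itself; the tenant name, client ID, secret and user GUID are placeholders for the values we collected earlier, and the dictionary passed as the json argument is essentially what goes into the Flow's Body field.

import requests

# Placeholder values collected from the app registration and the contact record
tenant = "YOURB2CTENANT.onmicrosoft.com"
client_id = "YOUR-CLIENT-ID"
client_secret = "YOUR-CLIENT-SECRET"
b2c_user_id = "GUID-FROM-THE-CONTACT-USER-NAME-FIELD"

# Acquire an app-only token for Microsoft Graph using the client credentials grant
token_response = requests.post(
    f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    },
)
access_token = token_response.json()["access_token"]

# PATCH only the attributes we care about on the B2C user
requests.patch(
    f"https://graph.microsoft.com/v1.0/users/{b2c_user_id}",
    headers={"Authorization": f"Bearer {access_token}",
             "Content-Type": "application/json"},
    json={"mail": "new.email@example.com",
          "givenName": "NewFirstName",
          "surname": "NewLastName"},
)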

That's it! We are now good to go: update a portal user's email, first name or last name in the portal profile page or directly in the model-driven app, and within a few moments the changes should be reflected in the B2C directory. Here is a very short demo of how this works:


Mount an Azure Data Lake Gen 2 In Azure Databricks Using a Service Principal

Mounting an Azure Data Lake in Azure Databricks can be done in several ways, but the most secure and recommended way is through a service principal that has its credentials stored in an Azure Key Vault. In my attempt to set that up, I found the information scattered across many different articles, so I decided to write this blog to cover everything you need to get it up and running.

More specifically, I will discuss the following topics:

1- Create a secret scope in Azure Databricks that is backed by an Azure Key Vault instance.
2- Set up permissions for your service principal on the data lake account.
3- Store the credentials necessary for your service principal in a key vault.
4- Build a function to mount your data lake.

Assumptions:
This article assumes the following:
1- That you have already created a service principal and know the ID and the secret key for that principal.
2- That you already have the Data Lake Gen 2, Key Vault and Databricks resources created.

1- Create a secret scope in Azure Databricks that is backed by an Azure Key Vault instance

First things first, we need to authenticate Azure Databricks to the Key Vault instance so that it is at least able to read/list the secrets in that key vault. To do that through the UI, you need to go to the following URL:
https://<databricks-instance>#secrets/createScope

See the example below of what this URL will look like (please make sure that you use your own Databricks instance, which appears in the URL when you launch Databricks from the Azure portal):
https://adb-XXXXXXXXXXXXXXXX.X.azuredatabricks.net/#secrets/createScope
Please note that this URL is case sensitive: the S in Scope must be capitalized as shown above.

Once you launch this Url you will see a screen as shown below. You will need to enter the following information:
– The scope name (any name of your choice). Let’s call it key-vault-secret-scope for now. Remember this name as we will use it later in our mounting function.
– DNS Name and Resource ID for your azure key vault instance.

The DNS Name and Resource ID are available under the Properties window of your Azure Key Vault resource. From there, use the Vault URI as the DNS Name and the Resource ID as the Resource ID, as shown below:

You can verify that this worked by going to your key vault instance –> Access policies –> and checking that the AzureDatabricks application has been added with Get/List permissions granted for secrets (in an earlier version, this strategy granted Databricks 16 extra permissions that it didn't really need, so make sure you only grant it Get/List permissions on secrets, or more than that only if required).
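You can also verify things from the Databricks side with the built-in dbutils secrets utilities. A quick sketch, assuming the scope name key-vault-secret-scope used above:

# List the secret scopes visible to this workspace; the key vault backed scope should appear here
dbutils.secrets.listScopes()

# List the secrets exposed through the key vault backed scope
dbutils.secrets.list("key-vault-secret-scope")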

2- Set up permissions for your service principal on the data lake account

There are also two ways of doing this: it can be set up either through role-based access control (RBAC) or through access control lists (ACLs). I will discuss the RBAC route for simplicity.

Ideally, you want to grant your service principal the ability to read/write/delete data on the data lake storage account. For that, you can grant it the Storage Blob Data Contributor role, which should give it enough permissions to manipulate data and containers in the storage account. This can be done by going to your storage account –> Access Control (IAM) –> Add –> Add role assignment –> select the Storage Blob Data Contributor role and then select the service principal that you want to grant access to.

3- Store credentials necessary for your service principal in a key vault.

To avoid hardcoding your service principal information in Databricks code, it is highly recommended that you store the information in your key vault. This is particularly important for the client secret of your service principal. So, for the purpose of this exercise, let's create a secret called sp-appid to store the application ID and a secret called sp-secret to store the service principal client secret. Your key vault would look something like this:

4- Build a function to mount your data lake.

Finally, we are here. Once you have all of that set up, you should be ready for the Databricks function. Your function should look like this:

def mount_data_lake(mount_path,
                    container_name,
                    storage_account_name,
                    service_principal_client_id,
                    azure_ad_directory_id,
                    service_principal_client_secret_key):
  """
  mount_path: the path that you want to mount to. For example, /mnt/mydata
  container_name: name of the container on the data lake.
  storage_account_name: name of the storage account that you want to mount.
  service_principal_client_id: application (client) id for the service principal.
  azure_ad_directory_id: tenant id.
  service_principal_client_secret_key: the secret key for your service principal.
  """
  configs = {"fs.azure.account.auth.type": "OAuth",
             "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
             "fs.azure.account.oauth2.client.id": service_principal_client_id,
             "fs.azure.account.oauth2.client.secret": service_principal_client_secret_key,
             "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + azure_ad_directory_id + "/oauth2/token"}
  try:
    storage_account_url = "abfss://" + container_name + "@" + storage_account_name + ".dfs.core.windows.net/"
    dbutils.fs.mount(
      source = storage_account_url,
      mount_point = mount_path,
      extra_configs = configs
    )
  except Exception as ex:
    if 'already mounted' in str(ex.args):
      print(f"Mount: {mount_path} is already mounted")
    else:
      raise Exception(f"Unable to mount {mount_path} on {storage_account_name}. Error message: {ex}")

And now you can simply call your function on any storage account/mount point that you like. For example:

service_principal_client_id = dbutils.secrets.get(scope="key-vault-secret-scope", key="sp-appid")
service_principal_client_secret_key = dbutils.secrets.get(scope="key-vault-secret-scope", key="sp-secret")
azure_ad_directory_id = "your tenant id"
storage_account_name = "name of your storage account"
container_name = "name of your container"
mount_path = "your mount point"

mount_data_lake(mount_path,
                container_name,
                storage_account_name,
                service_principal_client_id,
                azure_ad_directory_id,
                service_principal_client_secret_key)

Well, that's it. Congrats! You are ready to start reading from and writing to your Data Lake Gen 2 using this mount point. For example, if you named your mount point /mnt/mydata, and the container you mounted has a folder called MyFolder containing a file called MyFile.csv, then you can read the CSV file like this:

file_path = '/mnt/mydata/MyFolder/MyFile.csv'
mydata = spark.read.format('csv').options(header = 'true', inferSchema = 'true').load(file_path)
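And since the paragraph above also mentions writing, here is a small sketch of writing results back through the same mount point (the output folder name is just an assumed example):

# Write the dataframe back to the mounted lake as parquet
output_path = '/mnt/mydata/MyFolder/output'
mydata.write.mode('overwrite').parquet(output_path)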

Hope this helps. Please let me know in the comments below if you have any questions.

Run Same Databricks Notebook for Multiple Times In Parallel (Concurrently) Using Python

In this blog, I would like to discuss how you can use Python to run a Databricks notebook multiple times in parallel. The whole purpose of a service like Databricks is to execute code on multiple nodes (the workers) in parallel, but there are times when you need to implement your own parallelism logic to fit your needs.

To follow along, you need a Databricks workspace, a Databricks cluster and two notebooks. The parent notebook orchestrates the parallelism and the child notebook is the one executed in parallel. The idea is that the parent notebook passes a parameter to the child notebook, and the child notebook uses that parameter to execute a given task. Without further ado, let's get to it.

For simplicity, let's design a child notebook that takes a number as input and prints that number multiplied by 10. Also, to make sure we really test our parallelism logic, we will introduce a 20-second sleep in the child notebook. So open up your child notebook and enter the following code in Command 1 (this code lets us pass a parameter from the parent notebook):

# Create a text widget so the parent notebook can pass in a parameter
dbutils.widgets.text("numberToProcess", "", "")
# Read the widget value and convert it to an integer
numberToProcess = int(dbutils.widgets.get("numberToProcess"))

Open up a new command in the child notebook and enter the following code, which calculates 10 times our number of interest, sleeps for 20 seconds, and then prints the output:

import time
outputNumber = numberToProcess * 10
time.sleep(20) # sleep for 20 seconds 
print('The desired output is {}'.format(outputNumber))

So your child notebook will look something like this

Now that you have your child notebook set up, go to your parent notebook and paste the following code to import the multithreading classes:

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

Then we define the function that executes the child notebook, passing in the number parameter:

def processAnIntegerNumber(numberToProcess):
  dbutils.notebook.run(path = "/Shared/ChildNotebook",
                       timeout_seconds = 300,
                       arguments = {"numberToProcess": numberToProcess})
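As a side note, dbutils.notebook.run returns whatever string the child notebook passes to dbutils.notebook.exit. So if you also want the parent to collect a result from each run, a small variation could look like the sketch below; it assumes you end the child notebook with dbutils.notebook.exit(str(outputNumber)), which is not part of the child notebook above.

def processAnIntegerNumber(numberToProcess):
  # Returns the string the child notebook passed to dbutils.notebook.exit
  return dbutils.notebook.run(path = "/Shared/ChildNotebook",
                              timeout_seconds = 300,
                              arguments = {"numberToProcess": numberToProcess})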

Finally, the magic command, which takes a list of numbers, spins up a multithreading executor and then maps the list of numbers onto that executor. This will execute the child notebook five times (because the list contains five numbers) in parallel.

listOfNumbers = [87,90,12,34,78]
with ThreadPoolExecutor() as executor:
  results = executor.map(processAnIntegerNumber, listOfNumbers)

When you execute the parent notebook, you will notice that five Databricks jobs run concurrently, each one executing the child notebook with one of the numbers in the list. This is a snapshot of the parent notebook after execution:

Notice how the overall time to execute the five jobs is about 40 seconds. Now, if we open the link to any one of these jobs, we will notice that its time was also about 32 seconds, indicating that the jobs ran in parallel.

Hope this blog helps you run your jobs faster and satisfies your “Need for Speed”.

You can follow this blog for machine learning, data engineering, DevOps, Dynamics and Power Apps tips and tricks.

Mount a Blob Storage in Azure DataBricks Only if Not Mounted Already (Using Python)

As discussed in this article by Databricks, during your work in a notebook you can mount a Blob Storage container, or a folder inside a container, to the Databricks File System. The whole point of mounting a blob storage container is simply to use an abbreviated path to your data through the Databricks File System, rather than having to refer to the full URL of your blob container every time you need to read/write data from it. More details on mounting and its usage can be found in the article referenced above.

The purpose of this article is to suggest a way, using Python, to check whether the mount point has already been created and only attempt to create it if it doesn't exist.

This can simply be done if we know how to list existing mount points using Python. Luckily, Databricks offers this via the dbutils.fs.mounts() command. To access the actual mount points we can do something like this:

for mount in dbutils.fs.mounts():
  print (mount.mountPoint)

Knowing how to access mount points enables us to write some Python that only mounts if the mount point doesn't exist. The code should look like the following:

storageAccountName = "your storage account name"
storageAccountAccessKey = "your storage account access key"
blobContainerName = "your blob container name"
if not any(mount.mountPoint == '/mnt/FileStore/MountFolder/' for mount in dbutils.fs.mounts()):
  try:
    dbutils.fs.mount(
      source = "wasbs://{}@{}.blob.core.windows.net".format(blobContainerName, storageAccountName),
      mount_point = "/mnt/FileStore/MountFolder/",
      extra_configs = {'fs.azure.account.key.' + storageAccountName + '.blob.core.windows.net': storageAccountAccessKey}
    )
  except Exception as e:
    print("already mounted. Try to unmount first")

Alternatively, you can rely on an error handler alone to print a message if the blob is mounted already, like this:

storageAccountName = "your storage account name"
storageAccountAccessKey = "your storage account access key"
blobContainerName = "your blob container name"

try:
  dbutils.fs.mount(
    source = "wasbs://{}@{}.blob.core.windows.net".format(blobContainerName, storageAccountName),
    mount_point = "/mnt/FileStore/MountFolder/",
    extra_configs = {'fs.azure.account.key.' + storageAccountName + '.blob.core.windows.net': storageAccountAccessKey}
  )
except Exception as e:
  print("already mounted. Try to unmount first")
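And if you ever do want to "unmount first", as the message above suggests (for example after rotating the storage account key), dbutils.fs.unmount does that. A quick sketch using the same mount point:

# Remove the existing mount point so it can be recreated with fresh credentials
dbutils.fs.unmount("/mnt/FileStore/MountFolder/")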

Enable a Service Principal to Create Access Policies for an Azure Resource Using DevOps

If your deployment strategy is going well, you might have gotten to the point where you are using a service principal in DevOps to do your deployments. One of the deployment tasks I needed to do was to create an access policy using a service principal with PowerShell. I simply wanted to do that with the Set-AzKeyVaultAccessPolicy command; more specifically, I wanted to grant an access policy so that my data factory could get/list secrets in my key vault. Running my code with my Windows account, everything worked as expected. Running it with the service principal, it failed with the following error:

Exception: An error occurred: System.Exception: Insufficient privileges to complete the operation.at Set-KeyVaultAccessPolicy,

This occurred even though my service principal had been granted Contributor access to the resource group where the key vault lives, which was the same level of permissions my own account had. So why was my account able to run Set-AzKeyVaultAccessPolicy while my service principal didn't have enough privileges to do so?

It turns out that in order for a service principal to create an access policy, it needs to validate the object ID of your resource against Azure Active Directory. To fix this, you can try one of the following two solutions:

Solution 1: Change the API permissions for your service principal to allow it to read the Azure Active Directory graph.

This can be done by going to your service principal's app registration and selecting the API permissions blade. Click Add a permission, as can be seen in the image below.

Once there, scroll all the way to the bottom and select "Azure Active Directory Graph". Then grant the Directory.Read.All permission under Application permissions, as can be seen below.

This will enable your service principal to verify the object ID and therefore create the necessary access policies.

Solution 2: Add the BypassObjectIdValidation flag to your command

This makes the command skip the object ID validation step and create the access policy (even though this solution has worked for many people, it didn't work for me). Your command will look something like this:

Set-AzKeyVaultAccessPolicy -ResourceGroupName $resourceGroupName `
                           -VaultName $keyVaultName `
                           -ObjectId $servicePrincipal.Id  `
                           -PermissionsToSecrets get,list `
                           -ErrorAction Stop `
                           -BypassObjectIdValidation

Hope one of these two approaches helps you solve this problem.

Select Top n Records For Each Group In Python (Pandas)

Say that you have a dataframe in Pandas and you are interested in finding the top n records for each group. Depending on your need, top n can be defined based on a numeric column in your dataframe or it can simply be based on the count of occurrences for the rows in that group.

For example, suppose (god forbid) that I have a retail store with four branches (A, B, C, D). Also, suppose that for each day, I would like to get the three branches that sold the most items. So, I would like to take the day as a group, count the number of items sold in each branch for that day, and then pick the three branches with the highest counts. That is our problem; now, without further ado, let's see the code for our example.

import pandas as pd

sales = {
    'branch_name': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D',
                    'A', 'A', 'B', 'B', 'C', 'D', 'D', 'D', 'D'],

    'date': ['01/01/2020','01/01/2020','01/01/2020','01/01/2020','01/01/2020','01/01/2020',
             '01/01/2020','01/01/2020','01/01/2020','02/01/2020','02/01/2020','02/01/2020',
             '02/01/2020','02/01/2020','02/01/2020','02/01/2020','02/01/2020','02/01/2020'],

    'item_no': ['I1','I2','I3','I4','I5','I6','I7','I8','I9',
                'I10','I11','I12','I13','I14','I15','I16','I17','I18']
}

sales = pd.DataFrame(sales)
sales['date'] = pd.to_datetime(sales['date'], format = '%d/%m/%Y')

Sweet, let's see what our dataframe looks like:

        branch_name 	date 	item_no
0 	   A       	2020-01-01 	I1
1 	   A       	2020-01-01 	I2
2 	   B       	2020-01-01 	I3
3 	   B       	2020-01-01 	I4
4 	   C       	2020-01-01 	I5
5 	   C       	2020-01-01 	I6
6 	   C       	2020-01-01 	I7
7 	   C       	2020-01-01 	I8
8 	   D       	2020-01-01 	I9
9 	   A       	2020-01-02 	I10
10 	   A       	2020-01-02 	I11
11 	   B       	2020-01-02 	I12
12 	   B       	2020-01-02 	I13
13 	   C       	2020-01-02 	I14
14 	   D       	2020-01-02 	I15
15 	   D       	2020-01-02 	I16
16 	   D       	2020-01-02 	I17
17 	   D       	2020-01-02 	I18

Great. From the dataframe above, you can see that for 2020-01-01 the most sales come from branches A, B and C, whereas for 2020-01-02 the most sales come from branches A, B and D. Let's write the code that returns what we just concluded by eye.

Solution

The solution proceeds as follows:
1- We need to count the number of items sold for each day and each branch.

sales_agg = sales.groupby(['date', 'branch_name']).agg({'item_no':'nunique'}).reset_index()
print(sales_agg)
      date 	   branch_name 	item_no
0 	2020-01-01 	A 	         2
1 	2020-01-01 	B 	         2
2 	2020-01-01 	C 	         4
3 	2020-01-01 	D 	         1
4 	2020-01-02 	A 	         2
5 	2020-01-02 	B 	         2
6 	2020-01-02 	C 	         1
7 	2020-01-02 	D 	         4

2- For each day, we need to sort the branches by the number of items sold in descending order

sales_sorted = sales_agg.groupby(['date']).apply(lambda x: x.sort_values(['item_no'],ascending = False)).reset_index(drop = True)
print(sales_sorted)
 	date 	    branch_name 	item_no
0 	2020-01-01   	C 	         4
1 	2020-01-01   	A 	         2
2 	2020-01-01   	B 	         2
3 	2020-01-01   	D 	         1
4 	2020-01-02   	D 	         4
5 	2020-01-02   	A 	         2
6 	2020-01-02   	B 	         2
7 	2020-01-02   	C 	         1

3- Now for each date, we need to pick the top n records (3 in our case)

sales_sorted.groupby(['date']).head(3)
      date 	   branch_name 	item_no
0 	2020-01-01 	   C 	      4
1 	2020-01-01 	   A 	      2
2 	2020-01-01 	   B 	      2
4 	2020-01-02 	   D 	      4
5 	2020-01-02 	   A 	      2
6 	2020-01-02 	   B 	      2

Great, we are done. This has returned the top 3 branches based on total number of items sold for each day. Hope this helps you solve your own problem.
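As a side note, steps 2 and 3 can be collapsed into a single expression if you prefer something more compact. A sketch that should return the same rows, although ties between branches may be broken in a different order:

# Sort all rows by item_no descending, then take the first 3 rows of each date group
sales_agg.sort_values('item_no', ascending = False).groupby('date').head(3)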

Read a CSV file stored in blob container using python in DataBricks

Let's say that you have a CSV file, a blob container, and access to a Databricks workspace. The purpose of this mini blog is to show how easy the process is from having a file on your local computer to reading the data into Databricks. I will go through the process of uploading the CSV file manually to an Azure blob container and then reading it in Databricks using Python code.

Step 1: Upload the file to your blob container

This can be done simply by navigating to your blob container. From there, you can click the Upload button and select the file you are interested in. Once selected, you need to click the Upload button in the upload blade. See the screenshot below.

Once uploaded, you will be able to see the file available in your blob container as shown below:

Step 2: Get credentials necessary for databricks to connect to your blob container

From the Azure portal, you need to navigate to All resources, select your blob storage account, and under Settings select Access keys. Once there, copy the key under key1 to a local notepad.

Step 3: Configure DataBricks to read the file

Here, you need to navigate to your Databricks workspace (create one if you don't have one already) and launch it. Once launched, go to Workspace and create a new Python notebook.

To start reading the data, you first need to configure your Spark session to use the credentials for your blob container. This can simply be done through the spark.conf.set command. More precisely, we start with the following:

storage_account_name = 'nameofyourstorageaccount' 
storage_account_access_key = 'thekeyfortheblobcontainer'
spark.conf.set('fs.azure.account.key.' + storage_account_name + '.blob.core.windows.net', storage_account_access_key)


Once done, we need to build the file path in the blob container and read the file as a spark dataframe.

blob_container = 'yourblobcontainername'
filePath = "wasbs://" + blob_container + "@" + storage_account_name + ".blob.core.windows.net/Sales/SalesFile.csv"
salesDf = spark.read.format("csv").load(filePath, inferSchema = True, header = True)

And congrats, we are done. You can use the display command to have a sneak peek at our data, as shown below.
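For completeness, a minimal sketch of that last step (display is the built-in Databricks notebook function for rendering a dataframe):

# Preview the loaded dataframe in the notebook
display(salesDf)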