In this tutorial, I am going to zip files in Azure Data Factory.
To follow along, we are going to create some files and upload them to a Blob Storage container.
To complete the task, follow the steps below.
Steps to zip files in Azure Data Factory v2.
Step 1: Create a Linked Service to connect to the Blob Storage.
To begin, in ADF (Azure Data Factory) go to Manage, then under Linked Services click on New and select Azure Blob Storage.
Then fill in the required fields and click on “Test connection”. If everything is configured correctly, it returns a “Connection Successful” message.
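If you want to see what this step produces behind the UI, the following is a minimal sketch of the linked service JSON, built as a Python dictionary. The name "BlobStorageLS" and the connection string placeholders are hypothetical, and I am assuming connection string authentication; your linked service may use a different authentication method.

```python
import json


def blob_linked_service(name: str, connection_string: str) -> dict:
    """Build the JSON body of an Azure Blob Storage linked service."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {"connectionString": connection_string},
        },
    }


# "BlobStorageLS" and the <account>/<key> placeholders are hypothetical values.
linked_service = blob_linked_service(
    "BlobStorageLS",
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>",
)
print(json.dumps(linked_service, indent=2))
```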
Step 2: Create Datasets
Next, we need to create two datasets in ADF: one for reading the Blob Storage files and another for writing the zipped file.
First, we are going to create the dataset for zipping the blob files.
To create a dataset, click on “New dataset”.
After the “New dataset” window opens, select Azure Blob Storage, then click on Continue.
As we are going to copy txt files as they are, select “Binary” as the format, then click on Continue.
When setting the properties, select the Linked Service that we created and, on File Path, select the blob storage container that you are using for this tutorial. Then click on OK.
After creating the dataset, set “Compression type” and “Compression level”.
You can set these options as you wish; the compression setting on this dataset is what makes zipping files possible in ADF.
After that, we are going to add two parameters: “DirectoryPath”, which defines the directory where the zip file is going to be placed, and “FilePath”, which defines the zip file name.
Click on the Parameters tab, click on “New”, and add each parameter.
Then go back to the Connection tab and use those parameters in the Directory and File fields of “File Path”.
You can add the parameters by clicking on “Add dynamic content” under the Directory or File field of “File Path”.
Then, in the “Add dynamic content” window, add the expression for the parameter, for example “@dataset().DirectoryPath” for Directory. If you are not familiar with ADF dynamic content, go to the Parameters part of that window and click on the parameter that you want to add as dynamic content.
Result should look like this.
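For reference, here is a rough sketch of the JSON that the dataset's code view could show after these changes, again built as a Python dictionary. I am assuming the compression type is "ZipDeflate" with level "Optimal" (the values that produce a .zip archive in my understanding), that the two parameters are named DirectoryPath and FilePath, and that the container and linked service names ("mycontainer", "BlobStorageLS") are hypothetical.

```python
import json


def zip_sink_dataset(name, linked_service, container,
                     compression_type="ZipDeflate", level="Optimal"):
    """Build a parameterized Binary dataset that writes a compressed file."""
    return {
        "name": name,
        "properties": {
            "type": "Binary",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            # The two dataset parameters added on the Parameters tab.
            "parameters": {
                "DirectoryPath": {"type": "string"},
                "FilePath": {"type": "string"},
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    # Dynamic content set on the Connection tab.
                    "folderPath": {"value": "@dataset().DirectoryPath",
                                   "type": "Expression"},
                    "fileName": {"value": "@dataset().FilePath",
                                 "type": "Expression"},
                },
                # Assumed values; pick the type and level you set in the UI.
                "compression": {"type": compression_type, "level": level},
            },
        },
    }


# "BlobStorageLS" and "mycontainer" are hypothetical names.
sink_dataset = zip_sink_dataset("ZipToBlobFiles", "BlobStorageLS", "mycontainer")
print(json.dumps(sink_dataset, indent=2))
```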
After that, we are going to create the dataset for reading the blob storage files.
This is also a Binary dataset; its options should match those of the previous dataset, as we are going to read and compress files in the same blob container.
This time we do not change the compression type, as this dataset only reads the original files.
We will not add parameters to this dataset, as the directory and file are going to be defined in the Copy activity.
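By contrast, a sketch of the reading dataset is much smaller: no compression block and no parameters, only the container. As before, "BlobStorageLS" and "mycontainer" are hypothetical names, and this is an approximation of what the code view shows rather than an exact export.

```python
import json


def source_dataset(name, linked_service, container):
    """Build a plain Binary dataset that points at a blob container."""
    return {
        "name": name,
        "properties": {
            "type": "Binary",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                # Folder and file are left out on purpose: the Copy activity
                # supplies them through its wildcard settings.
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                },
            },
        },
    }


# "BlobStorageLS" and "mycontainer" are hypothetical names.
blob_files = source_dataset("BlobFiles", "BlobStorageLS", "mycontainer")
print(json.dumps(blob_files, indent=2))
```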
Step 3: Create a Pipeline for zipping files.
Finally, we need to create the pipeline that copies the data: create a pipeline and add a “Copy data” activity.
On the Copy activity, open the Source tab and pick BlobFiles as the Source dataset; then, on the “File path type” option, pick “Wildcard file path” and fill in the fields.
I filled “Wildcard folder path” with “FilesToZip”, as that is the directory that contains the files I want to zip, and in “Wildcard file name” I wrote “Test*.txt” to pick only files that start with Test and end with .txt.
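To get a feel for which files a pattern like “Test*.txt” selects, you can try it locally. This sketch uses Python's fnmatch module as an approximation of the wildcard matching; it is not how ADF implements it, I am assuming the match is case-sensitive, and the file names in the list are made up for illustration.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern="Test*.txt"):
    """Return the names that match the wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical contents of the FilesToZip directory.
blob_names = ["Test1.txt", "Test_report.txt", "Other.txt", "Test2.csv", "MyTest.txt"]
selected = select_files(blob_names)
print(selected)  # ['Test1.txt', 'Test_report.txt']
```

Only names that both start with Test and end with .txt are kept, so “MyTest.txt” and “Test2.csv” are left out.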
After that, go to the Sink tab and select the “ZipToBlobFiles” dataset. Then set DirectoryPath, which defines the directory where the zip file is going to be placed; in my case that is “zipfiles”.
In FilePath, write the name of the final zipped file; I am going to use “data.zip”.
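Put together, the Source and Sink settings above correspond roughly to the pipeline JSON sketched below. The pipeline name "ZipFilesPipeline" and the activity name "ZipFiles" are hypothetical, the "recursive" flag is an assumption on my side, and the exact JSON exported by your factory may contain additional properties.

```python
import json


def zip_pipeline(name, source_dataset, sink_dataset,
                 folder, file_pattern, zip_directory, zip_name):
    """Build a pipeline with one Copy activity that zips matching blobs."""
    copy_activity = {
        "name": "ZipFiles",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{
            "referenceName": sink_dataset,
            "type": "DatasetReference",
            # Values for the sink dataset parameters.
            "parameters": {"DirectoryPath": zip_directory, "FilePath": zip_name},
        }],
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": True,  # assumed setting
                    "wildcardFolderPath": folder,
                    "wildcardFileName": file_pattern,
                },
            },
            "sink": {
                "type": "BinarySink",
                "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
            },
        },
    }
    return {"name": name, "properties": {"activities": [copy_activity]}}


pipeline = zip_pipeline(
    "ZipFilesPipeline",  # hypothetical pipeline name
    "BlobFiles", "ZipToBlobFiles",
    "FilesToZip", "Test*.txt",
    "zipfiles", "data.zip",
)
print(json.dumps(pipeline, indent=2))
```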
After all changes are published, we run the pipeline.
Then you can check the blob container for the zipped file.
Finally, when we check the contents of the zipped file, we notice that it only contains files that start with Test, as we expected.
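If you download the zip file, the same check can be scripted. This is a minimal sketch using Python's zipfile module; it assumes you already have the archive locally (here a small in-memory archive stands in for “data.zip”), and it compares only the file name of each entry, since the archive may contain folder prefixes.

```python
import io
import zipfile
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def unexpected_entries(zip_source, pattern="Test*.txt"):
    """Return archive entries whose file name does not match the pattern."""
    with zipfile.ZipFile(zip_source) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and not fnmatchcase(PurePosixPath(info.filename).name, pattern)
        ]


def build_demo_zip(names):
    """Create an in-memory zip that stands in for the downloaded archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "sample")
    buffer.seek(0)
    return buffer


# An empty list means every entry matched the wildcard used in the pipeline.
print(unexpected_entries(build_demo_zip(["Test1.txt", "Test2.txt"])))
```

To check the real archive, pass the path of the downloaded file instead of the demo buffer.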