Introduction to the Azure Machine Learning Workbench

Following the announcement post published a few days ago here, we will dig deeper into this new tool, the Workbench. It is also called AML Workbench, which is shorter, and I will use that term from now on to refer to Azure Machine Learning Workbench (glad about the acronym, as I do not want to type that again :P)

But what is the AML Workbench?

It is a desktop application for Windows and macOS. It has built-in data preparation that learns the data preparation steps as we perform them, and it is able to take advantage of the best open source frameworks, including TensorFlow, Cognitive Toolkit, Spark ML and scikit-learn.

This also means that if you have a GPU that supports AI (read my earlier blog post on the topic here https://joslat.blog/2017/10/15/give-me-power-pegasus-or-the-state-of-hardware-in-ai/ ) you will benefit from that power head-on.

Oh, it also has a command-line interface for those who like them 😀

Sounds interesting? Then let’s get started!

Concepts first!

AML – Azure Machine Learning

This needs to be described early on, as it might be a bit confusing. It is a solution from Microsoft that encompasses different components and services to provide an integrated end-to-end solution for data science and advanced analytics.

With it we can prepare data, develop experiments and deploy models at cloud scale (read: massive scalability).

AML consists of a few components:

AML Workbench – Desktop tool to “do it all” from a single location.
AML Experimentation Service – I “suppose” this will enable us to validate hypotheses in a protected scenario.
AML Model Management Service – I suppose this will enable us to manage our models.
Microsoft Machine Learning Libraries for Apache Spark (MML Spark Library) – I read Spark/Hadoop integration here, probably with Azure servers.
Visual Studio Code Tools for AI – I read R & Python integration with Visual Studio here.

This picture showcases how AML Workbench fits in the Microsoft AI Ecosystem:

It is worth saying that AML fully integrates with open source (OS) initiatives such as scikit-learn, TensorFlow, Microsoft Cognitive Toolkit or Spark ML.

The experiments we create can be run in managed environments such as Docker containers and clusters running Hadoop with Spark (I was wondering why Microsoft only mentions Spark there if they work together. OK! Since Spark was built as an improvement over MapReduce, it can also run standalone in the cloud, that’s why!). They can also use advanced hardware like GPU-enabled VMs in Azure.

AML is built on top of the following technologies:

Jupyter Notebooks
Apache Spark
Docker
Kubernetes
Python
Conda

AML Workbench (yeah, finally!)

A desktop application with a command-line interface for Windows and macOS to manage ML solutions through the entire data science life cycle:

ETL
Model development and experiment management
Model Deployment

It provides the following functionalities:

Data Preparation that can learn by example (Wow!)
Data source abstraction
Python SDK for invoking visually constructed data preparation packages (SSIS anyone?)
Built-in Jupyter Notebook service and client UX (like Anaconda?)
Experiment monitoring and management
Role-based access to support sharing and collaboration
Automatic project snapshots for each run and version control (traceability on “experiments” at last!) along with Git integration
Integration with popular Python IDEs

Let’s install it!

First things first: do you have a computer with Windows 10 or macOS Sierra? (I guess you don’t have a Windows Server 2016 at home, do you?) If so, proceed. Otherwise, go update https://www.microsoft.com/en-us/store/b/windows 😉

Oh, well… before installing we need to set up an ML experimentation account.

Go log in to the Azure portal here https://portal.azure.com/

Click on the new button (+) in the top left corner of the Azure portal, type “machine learning” and select “Machine Learning experimentation (preview)”.

Click create and fill in the nice form:

Be sure to select “DevTest” as the Model Management pricing tier to save costs; it appears as “0.00”. Otherwise you might forget and get an unpleasant surprise…

As I am not that much into playing with Azure at a personal level (mostly HOLs and learning), I deleted all my resources, including a DB I created at a HOL that suddenly had a cost (luckily a very low one), and created all the required elements from the ground up: resource group, experimentation account, storage account, workspace, model management account, account name… My recommendation is that you keep that data safe and close to you, as this is all protected by Microsoft’s Azure security.

Oh, and click “Create” to create all these components in the cloud. We should see a “Deployment in progress..” message, which should be over in a couple of minutes, as shown in the picture below.

We should also see the details of the resources created (storage, resource group, etc.) along with some useful tools and the link to download (at last) the AML Workbench.

We can also download it from here https://aka.ms/azureml-wb-msi

Then double-click it, or right-click and select “Install”.

After the installer loads we should see the gorgeous installer…

It’s clean, it’s Metro.. oops! I meant modern!

As usual, click continue and you will be presented with the dependencies and installation path, shown next.

What I did not like there is that I could not change the installation path, so we can only click the install button and cross our fingers that this does not create any conflict with our Anaconda installation, as this is clearly a preview (read: at your own risk).

Oh, I do NOT recommend that you sit and wait for the installation to finish…

…go watch a series or hit the gym (as I did, in fact). The installation takes about half an hour to download and install all the required components…

Now AML Workbench (preview) is installed on your computer, congratulations!!

Note that you can find it at C:\Users\<user>\AppData\Local\AmlWorkbench

Oh, and get used to this icon, I have the feeling we will be visiting it for a while 😉

But let’s continue, we are not yet finished!!

First steps!

So, let’s do something! Baby steps of course..

Start it and log in to your Azure/Microsoft account. You should automatically see the workspace we recently created in the Azure portal.

Click on the plus sign next to the “Projects” panel in the top left or, in the text menu, select File and then “New Project…”

We will give our project a name, like “Iris”, then select a directory on our local computer to save our Azure Machine Learning projects in, and add a description.

We have to select our workspace, which by default will be the one we just created in the first place.

We will select the template “Classifying Iris” and click on the “Create” button below. This template is a companion sample project for Azure Machine Learning which uses the iris flower dataset.

Once the project has been created we will see the dashboard of the recently created project.

We can see several sections from our dashboard: the home section of our project, the data sources, notebooks, runs and Files.

On the project dashboard panel we can see a description of our project with instructions on how to set it up and follow the Quick start and tutorials, as well as an execution section.

The Data panel showcases the data sources and the preparations for obtaining them. This is a pretty special section with truly amazing features that can only be found in the AML Workbench, but we will look at it in a later post.

It is worth noting that the Notebook panel is basically a Jupyter notebook container, as the installation included a custom Anaconda installation, which also did not seem to tamper with my own installation of Anaconda…

We can also open the project in Visual Studio Code or another configured IDE.

If we do not have it, we can install it now from here https://code.visualstudio.com/

On the text menu, select File, then “Configure Project IDE”, and enter a name and the path for your IDE. I selected VS Code, as we can see next:

Once this is installed, we should install Python support for VS Code, so we go to the extensions menu and select one of the Python extensions. In my case I selected Python 0.7.0 from Don Jayamanne, as this extension seems the most complete.

Once this is set up, we can go to the text menu, click on File, then on “Open Project”; next to it our configured IDE should appear in parentheses, “(Visual Studio Code)”. We can see VS Code with the project loaded, and we can click on one of the Python source files, for example iris_sklearn.py. We should see syntax highlighting and IntelliSense at work, among other features.

Now, let’s execute it. We can go to the project dashboard panel, select “local”, then the source file “iris_sklearn.py”, add “0.01” in the arguments and click run.

We could also execute it from the Files panel by selecting “iris_sklearn.py” there.

On the right side of AML Workbench, the Jobs pane should appear and showcase the execution(s) that we have started.

While we are at it, we could try some other executions changing the argument to range from 0.001 to 10.

What we are doing is executing a logistic regression algorithm which uses the scikit-learn library.
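To give an idea of what that means in code, here is a minimal sketch of a scikit-learn logistic regression on the iris dataset. This is not the template's actual iris_sklearn.py. The function name `train_iris`, the 35% test split, and the idea that the “0.01” argument is a regularization rate passed as the inverse of scikit-learn's `C` are all my assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_iris(reg_rate=0.01, seed=0):
    """Hypothetical helper: train on iris and return (model, test accuracy)."""
    # Load the 150-sample iris dataset (4 features, 3 species).
    X, y = load_iris(return_X_y=True)
    # Hold out part of the data for evaluation (split size is arbitrary).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.35, random_state=seed, stratify=y)
    # Assumption: the argument is a regularization rate. scikit-learn's C
    # is the inverse of the regularization strength, hence 1 / reg_rate.
    model = LogisticRegression(C=1.0 / reg_rate, max_iter=5000)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


model, accuracy = train_iris(0.01)
print(f"regularization rate 0.01 -> test accuracy {accuracy:.3f}")
```

Under that assumption, a smaller argument means weaker regularization, so the model is allowed to fit the training data more closely.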

Once the different executions have finished, we could go to the Runs panel. There, select “iris_sklearn.py” on the run list and the run history dashboard should show the different runs.
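The run history is essentially a table of runs and their metrics. As a rough local equivalent, the sketch below trains a scikit-learn logistic regression on iris for several argument values between 0.001 and 10 and collects the accuracy of each run. It assumes the argument is a regularization rate (the inverse of scikit-learn's `C`); the variable names and the split are mine and are not taken from the sample project.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=0, stratify=y)

# Assumed argument values, spanning the 0.001 to 10 range.
rates = [0.001, 0.01, 0.1, 1, 10]

history = []  # one (rate, accuracy) entry per "run"
for rate in rates:
    # Assumption: rate is a regularization rate, so C = 1 / rate.
    model = LogisticRegression(C=1.0 / rate, max_iter=5000)
    model.fit(X_train, y_train)
    history.append((rate, model.score(X_test, y_test)))

# Print a small table, similar in spirit to a run list.
for rate, acc in history:
    print(f"rate={rate:<6} accuracy={acc:.3f}")
```

Comparing the rows side by side is the same exercise the Runs panel does for us, with charts on top.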

We can click on the different executions and see the details.

By now we have grasped the concepts of AML and its ecosystem, configured our environment in Azure, installed the Workbench, created and configured a sample project, and executed it locally.

Hope this was a good introduction and you enjoyed it.

Give me power, Pegasus! – or the state of Hardware in AI

A bit of history..

It’s no wonder that many years ago, about six (many in computer terms, that is), some companies started to provide specialized hardware and software solutions to improve the performance of AI and Machine Learning algorithms, like NVIDIA with its CUDA platform. This has been really important in the AI/ML industry, as this graph shows:

Basically, an improvement of 33 times over the speed of a normal PC..

But if this graphic was not enough to motivate you to learn more (and get to the end of this article), see this other one:

This is a graph made in 2016 showing the evolution of AI processing power since 2012. The 1X at the bottom is already based on a GPU accelerated for AI processing; it was set as the baseline in 2012 by Alex Krizhevsky’s study on using a deep convolutional neural network that learned automatically to recognize images from 1 million examples, with only five to six days of training on two NVIDIA GTX 580 GPUs. The study was named “ImageNet Classification with Deep Convolutional Neural Networks”.

BANG!!

It’s a BANG! – A big one, which many are calling the new industrial revolution – AI. Many companies listened and adopted this technology: Baidu, Google, Facebook and Microsoft used it for pattern recognition, and soon for more..

Between 2011 and 2012, a lot happened in AI. The Google Brain project achieved amazing results, being able to recognize cats and people by watching movies (though using 2,000 CPUs at Google’s giant data center). Then the same result was achieved with just 12 NVIDIA GPUs. This feat was performed by Bryan Catanzaro from NVIDIA along with (my teacher!) Andrew Ng’s team at Stanford (Yay! I did your course so I can call you teacher :D)

Later in 2012, Alex Krizhevsky from the University of Toronto won the 2012 ImageNet computer image recognition competition by a HUGE margin, beating image recognition experts. He did NOT write computer vision code. Instead, using Deep Learning, his computer learned to recognize images by itself. The neural network was named AlexNet and was trained with a million example images. This AI bested the best human-coded software.

The AI race was on…

Later on, by 2015, Microsoft and Google beat the best human score in the ImageNet challenge. This means that a DNN (Deep Neural Network) was developed that bested human-level accuracy.

2012 – Deep Learning beats human coded software.

2015 – Deep Learning beats human-level accuracy, basically acquiring “superhuman” levels of perception.

To have an idea, the following graphic shows the acquired accuracy of both Computer Vision and Deep Learning algorithms/models:

Related to this, I wanted to highlight the milestone achieved by Microsoft’s research team in 2016. But before that, let me mention what Microsoft’s chief scientist of speech, Xuedong Huang, said in December 2015: “In the next four to five years, computers will be as good as humans” at recognizing the words that come out of your mouth.

Well, in October 2016, Microsoft announced a system that can transcribe the contents of a phone call with the same or fewer errors than actual human professionals trained in transcription… Again, human perception has been beaten..

The Microsoft research speech recognition team

These advancements are made possible mainly by improvements in Deep Learning, which are driven by massive computing power, like the 2,000 servers of Google Brain or, as of now, just a few NVIDIA GPUs… This delivers results, and results drive the industry, make it trust a technology and, more importantly, bet on it. This is what has been happening over these years…

Our current AI/ML/DL “Boosters”:

They are essential tools to boost AI (ML, Deep Learning, etc.) and are supported by a growing number of tools and libraries (Caffe, Theano, Torch7, TensorFlow, Keras, MATLAB, etc.), and many companies use them (Microsoft, Google, Baidu, Amazon, Flickr, IBM, Facebook, Netflix, Pinterest, Adobe,… )

An example of this is the Titan Z with 5,760 CUDA cores, 12GB memory and 8 Teraflops

Comparatively, “Google Brain” has 1 billion connections spread over 16,000 cores. This is achievable for $12K with three computers with Titan Z, consuming “just” 2,000 W of power. Oh, and if this sounds amazing, this is data from 2014… yeah, I was just teasing you 😉
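As a quick back-of-the-envelope check of that comparison, using only the numbers quoted above (and assuming one Titan Z card per computer, which is my reading of “three computers with Titan Z”):

```python
# Figures quoted in the text.
TITAN_Z_CUDA_CORES = 5760
TITAN_Z_TERAFLOPS = 8
GOOGLE_BRAIN_CORES = 16000

# Assumption: one Titan Z per computer, three computers.
NUM_CARDS = 3

total_cuda_cores = TITAN_Z_CUDA_CORES * NUM_CARDS
total_teraflops = TITAN_Z_TERAFLOPS * NUM_CARDS

print(total_cuda_cores)                       # 17280
print(total_teraflops)                        # 24
print(total_cuda_cores > GOOGLE_BRAIN_CORES)  # True
```

Keep in mind that a CUDA core and a CPU core are not equivalent units, so this only shows that the core counts are in the same ballpark, not that the two setups perform identically.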

It gets better…

As of today, we have some solutions already on the consumer market, which you might have in your home computer, like the NVIDIA Pascal-based graphics cards:

NVIDIA GTX 1080 with 10 Gbps memory speed, 2560 NVIDIA CUDA cores and 8 GB GDDR5X memory

NVIDIA Titan Xp with 11 Gbps with 3584 NVIDIA CUDA cores and 12 GB GDDR5X memory

Here is a picture of the beautifully crafted NVIDIA GTX 1080, launched at the end of May 2016:

And it’s my current graphics card, from when I decided to focus on Machine Learning and Data Science, by the end of 2016 😉 – I am getting ready for you baby! (currently learning Python)

Similarly, we have the Quadro family, focused on professional graphics workstations. Its flagship is the Quadro P6000, with 3840 CUDA cores, 12 teraflops and 24 GB GDDR5X.

And this just got better and better…

I could not help being reminded of this scene from Iron Sky

Announced recently, on the 10th of October 2017, we have the NVIDIA DRIVE PX Pegasus, the supercomputer for fully autonomous driving, with a passively cooled 10-watt mobile CPU and four high-performance AI processors. Altogether they are able to deliver 320 trillion operations per second (TOPS).

Pegasus! – I personally love the name (I think Mr. Jensen Huang must like the “Zodiac Cavalliers” very much! – as a good geek should ^.^)

I believe two of these AI processors are the newest Xavier system-on-a-chip processors, each coupled with an embedded GPU based on the NVIDIA Volta architecture. The other two seem to be next-generation discrete GPUs with hardware explicitly created for accelerating Deep Learning and computer vision algorithms. All in the size of a license plate.. not bad!

Here is a pic of the enormous “Pegasus” powerhorse:

Cute, right?

This is huge – again yeah. Think that this is basically putting 100 high-end servers in the size of a license plate.. Servers on current Hardware, that is..

And this is powered by…

Volta!

Did I say volta?

This is NVIDIA’s GPU architecture, which is meant to bring industrialization to AI, and a wide range of their products support this platform. NVIDIA Volta is meant for healthcare, financial, big data & gaming..

This hardware architecture consists of 640 Tensor cores which deliver over 100 teraflops, 5x the previous generation of NVIDIA’s architecture (Pascal).

DGX systems – AI supercomputers “à la carte”, based on the just-mentioned Volta architecture: a system with 4x Tesla V100, or the rack-based supercomputer DGX-1 with up to 8 Tesla V100, with one Intel Xeon for every 4 V100. Oh, and all the other hardware is boosted to support this massive digital brainpower..

Here are some comparative pictures to put things in the proper perspective…

Here, in the hands of Jensen Huang, NVIDIA’s co-founder and CEO, is a Volta V100, if you were wondering:

Smaller than the 100x servers it can beat, right?

The V100 family, along with the Volta architecture, was presented just recently this year at Computex, at the end of May.

Oh, and the market responded extremely well…

They are also empowering IoT solutions for embedded systems, targeting small devices like drones, robots, etc. to perform video analytics and autonomous AI, which is now starting to become a trend in consumer products..

This family of products is called NVIDIA Jetson, with the TX2 as its flagship, having 256 CUDA cores and 8 GB of 128-bit LPDDR4 memory along with two CPUs (HMP Dual Denver + Quad ARM)

As you can see, the race is on and continues to accelerate, and who knows where it will take us..

Hope you enjoyed this post. If you liked it, please subscribe 🙂