Introduction to the Azure Machine Learning Workbench

Following the announcement post published a few days ago here, we will dig deeper into this new tool, the Workbench. It is also called AML Workbench, which is shorter, and I will use that term from now on to refer to Azure Machine Learning Workbench (glad about the acronym, as I do not want to type that again :P)

 

But what is the AML Workbench?

It is a desktop application for Windows and macOS with built-in data preparation that learns the preparation steps as we perform them, and it is able to take advantage of the best open-source frameworks, including TensorFlow, Cognitive Toolkit, Spark ML and scikit-learn.

This also means that if you have a GPU suited to AI workloads (read my earlier blog post on the topic here https://joslat.blog/2017/10/15/give-me-power-pegasus-or-the-state-of-hardware-in-ai/ ), you will benefit from that power directly.

Oh, it also has a command-line interface for those who like them 😀

 

Sounds interesting? Then let’s get started!

 

Concepts first!

AML – Azure Machine Learning

This needs to be described first, as it might be a bit confusing. AML is a Microsoft offering that encompasses different components and services to provide an integrated, end-to-end solution for data science and advanced analytics.

With it we can prepare data, develop experiments and deploy models at cloud scale (read: massive scalability).

AML consists of a few components:

  • AML Workbench – Desktop tool to “do it all” from a single location.
  • AML Experimentation Service – I “suppose” this will enable us to run experiments and validate hypotheses in a controlled environment.
  • AML Model Management Service – I suppose this will enable us to manage our models.
  • Microsoft Machine Learning Library for Apache Spark (MMLSpark) – I read Spark/Hadoop integration here, probably with Azure clusters.
  • Visual Studio Code Tools for AI – I read Python (and possibly R) integration with Visual Studio Code here.

 

This picture shows how AML Workbench fits into the Microsoft AI ecosystem:

[Figure: AML high-level architecture]

It is worth saying that AML fully integrates with open-source initiatives such as scikit-learn, TensorFlow, Microsoft Cognitive Toolkit or Spark ML.

The experiments we create can run in managed environments such as Docker containers and Hadoop clusters running Spark (I was wondering why Microsoft only mentions Spark there if they work together. OK! Spark was built as an improvement over MapReduce and can also run standalone in the cloud, that’s why!). They can also use advanced hardware such as GPU-enabled VMs in Azure.

AML is built on top of the following technologies:

  • Jupyter Notebooks
  • Apache Spark
  • Docker
  • Kubernetes
  • Python
  • Conda

 

AML Workbench (yeah, finally!)

A desktop application with a command-line interface for Windows and macOS to manage ML solutions through the entire data science life cycle:

  • ETL
  • Model development and experiment management
  • Model Deployment

 

It provides the following functionalities:

  • Data Preparation that can learn by example (Wow!)
  • Data source abstraction
  • Python SDK for invoking visually constructed data preparation packages (SSIS anyone?)
  • Built-in Jupyter Notebook service and client UX (like Anaconda?)
  • Experiment monitoring and management
  • Role-based access to support sharing and collaboration
  • Automatic project snapshots for each run, and version control (traceability of “experiments” at last!) with Git integration
  • Integration with popular Python IDEs

 

Let’s install it!

First things first: do you have a computer with Windows 10 or macOS Sierra? (I guess you don’t have Windows Server 2016 at home, do you?) If so, proceed... otherwise, go update: https://www.microsoft.com/en-us/store/b/windows 😉

 

Oh, well… before installing, we need to set up an ML experimentation account.

Log in to the Azure portal here: https://portal.azure.com/

Click the New button (+) in the top-left corner of the Azure portal, type “machine learning” and select “Machine Learning Experimentation (preview)”.

[Screenshot: selecting “Machine Learning Experimentation (preview)” in the Azure portal]

Click Create and fill in the nice form:

[Screenshot: the Machine Learning Experimentation creation form]

Be sure to select “DevTest” as the cost-saving Model Management pricing tier; any other tier has a cost. DevTest appears as “0.00”. If you forget, you might get an unpleasant surprise…

[Screenshot: selecting the DevTest Model Management pricing tier]

As I do not use Azure much at a personal level (mostly hands-on labs (HOLs) and learning), I deleted all my resources, including a DB I had created during a HOL that suddenly started to have a cost (luckily very low), and created all the required elements from the ground up: resource group, experimentation account, storage account, workspace, model management account, account name… My recommendation is that you keep that data safe and close at hand, even though all of it is protected by Microsoft’s Azure security.

Oh, and click “Create” to create all these components in the cloud. We should see a “Deployment in progress...” message, which should be over in a couple of minutes, as shown in the picture below.

[Screenshot: deployment in progress]

We should also see the details of the created resources (storage, resource group, etc.) along with some useful links to download (at last!) the AML Workbench.

We can also download it from here: https://aka.ms/azureml-wb-msi

Then double-click it, or right-click it and select “Install”.

Once it loads, we should see the gorgeous installer…

[Screenshot: AML Workbench installer]

It’s clean, it’s Metro... oops! I meant modern!

As usual, click Continue and you will be presented with the dependencies and the installation path, shown next.

[Screenshot: installer dependencies and installation path]

What I did not like is that I could not change the installation path… so we can only click the Install button and cross our fingers that this does not conflict with an existing Anaconda installation, as this is clearly a preview (read: at your own risk).

Oh, I do NOT recommend waiting for the installation to finish…

…go watch a series or go to the gym (as I did, in fact): the installation takes about half an hour to download and install all the required components…

 

Now AML Workbench (preview) is installed on your computer, congratulations!!

[Screenshot: AML Workbench installed]

Note that you can find it at C:\Users\<user>\AppData\Local\AmlWorkbench

Oh, and get used to this icon; I have a feeling we will be visiting it for a while 😉

But let’s continue, we are not yet finished!!

 

First steps!

So, let’s do something! Baby steps, of course.

Start it and log in with your Azure/Microsoft account. You should automatically see the workspace we just created in the Azure portal.

 

Click the plus sign next to the “Projects” panel in the top left or, in the text menu, select File and then “New Project…”.

We will give our project a name, like “Iris”, then select a directory on your local computer to save your Azure Machine Learning projects in, and add a description.

We have to select our workspace, which by default will be the one we just created.

We will select the template “Classifying Iris” and click on the “Create” button below. This template is a companion sample project for Azure Machine Learning which uses the iris flower dataset.

[Screenshot: creating the “Iris” project from the “Classifying Iris” template]

Once the project has been created, we will see its dashboard.

We can see several sections on our dashboard: the home section of our project, the data sources, notebooks, runs and files.

On the project dashboard panel, we can see a description of our project with instructions on how to set it up and follow the quickstart and tutorials, as well as an execution section.

The Data panel shows the data sources and the data preparations applied to them. This is a pretty special section with truly amazing features that can only be found in the AML Workbench, but we will look at it in a future post.

It is worth noting that the Notebooks panel is basically a Jupyter notebook container: the installer sets up its own custom Anaconda installation, which did not seem to tamper with my existing Anaconda installation…
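If you want to check, from a notebook cell or a script, whether you are running on the Workbench's own Python rather than your personal Anaconda, a small sketch like the one below can help. It is only a heuristic built on an assumption I have not verified against the installer: that the bundled interpreter lives somewhere below the AmlWorkbench install folder. The helper name `running_on_workbench_python` is hypothetical.

```python
import sys
from pathlib import Path


def running_on_workbench_python(marker: str = "AmlWorkbench") -> bool:
    """Return True if the active interpreter's path contains a folder named `marker`.

    Hypothetical helper. Assumption: the Workbench's own conda environment is
    installed below the AmlWorkbench install folder, so that folder name
    appears in the interpreter path.
    """
    parts = (part.lower() for part in Path(sys.executable).parts)
    return marker.lower() in parts


if __name__ == "__main__":
    # Print the interpreter in use so you can compare it with your Anaconda install.
    print("Interpreter:", sys.executable)
    print("Python version:", sys.version.split()[0])
    print("Workbench Python?", running_on_workbench_python())
```

Printing `sys.executable` is the reliable part; the folder-name check is just a convenience on top of it.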

We can also open the project in Visual Studio Code or another configured IDE.

If we do not have it, we can install it now from here: https://code.visualstudio.com/

In the text menu, select File, then “Configure Project IDE”, and enter a name and the path to your IDE (I selected VS Code), as we can see next:

[Screenshot: configuring VS Code as the project IDE]

Once this is done, we should install Python support for VS Code, so we go to the Extensions view and select one of the Python extensions. In my case I selected Python 0.7.0 by Don Jayamanne, as this extension seems to be the most complete.

[Screenshot: Python extension in VS Code]

Once this is set up, we can go to the text menu, click File, then “Open Project”; our configured IDE should appear next to it in parentheses, “(Visual Studio Code)”. VS Code opens with the project loaded, and we can click one of the Python source files, for example iris_sklearn.py. We should see syntax highlighting and IntelliSense at work, among other features.

[Screenshot: iris_sklearn.py open in VS Code]

Now, let’s execute it: go to the project dashboard panel, select “local”, then the source file “iris_sklearn.py”, enter “0.01” as the argument and click Run.

We could also execute it from the Files panel by selecting “iris_sklearn.py”.

On the right side of AML Workbench, the Jobs pane should appear and show the execution(s) we have started.

While we are at it, we could try a few more executions, varying the argument between 0.001 and 10.

[Screenshot: Jobs pane showing the executions]

What we are doing is running a logistic regression algorithm from the scikit-learn library; the argument we pass is its regularization rate.

Once the executions have finished, we can go to the Runs panel. There, select “iris_sklearn.py” in the run list, and the run history dashboard should show the different runs.
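To get a feel for what such a script does, here is a minimal sketch of a logistic regression on the iris dataset with scikit-learn that, like the run above, takes a single numeric argument (we passed 0.01) and treats it as the regularization rate. This is not the template's actual iris_sklearn.py: the mapping C = 1 / rate, the 35% test split, the `random_state` value and the function name `train_iris` are my own assumptions for illustration.

```python
import sys

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_iris(reg_rate: float = 0.01, seed: int = 0) -> float:
    """Train a logistic regression on iris and return its test accuracy.

    Assumption: the regularization rate is turned into scikit-learn's C
    parameter (inverse regularization strength) as C = 1 / reg_rate.
    """
    if reg_rate <= 0:
        raise ValueError("regularization rate must be positive")
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.35, random_state=seed, stratify=y
    )
    # max_iter raised so the default solver converges for small rates (large C).
    clf = LogisticRegression(C=1.0 / reg_rate, max_iter=1000)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)  # mean accuracy on the held-out split


if __name__ == "__main__":
    # e.g. `python iris_sketch.py 0.01`; defaults to 0.01 like the run above.
    rate = float(sys.argv[1]) if len(sys.argv) > 1 else 0.01
    print(f"Regularization rate: {rate}")
    print(f"Accuracy: {train_iris(rate):.3f}")
```

A larger rate means a smaller C and therefore stronger regularization, which is why varying the argument is an easy way to see its effect on accuracy.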

[Screenshot: run history dashboard]

We can click on the different executions and see the details.
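The Runs panel essentially lets us compare metrics across several executions. As a rough local stand-in (this is not how the Workbench records run history), the sketch below sweeps the regularization rate over the 0.001 to 10 range suggested above, collects the test accuracy per value and prints a small comparison. The specific rate values, the split settings, the C = 1 / rate mapping and the function name `sweep_regularization` are my assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical sweep values spanning the 0.001-10 range tried above.
RATES = [0.001, 0.01, 0.1, 1.0, 10.0]


def sweep_regularization(rates, seed: int = 0) -> dict:
    """Fit one model per regularization rate and return {rate: test accuracy}."""
    X, y = load_iris(return_X_y=True)
    # Same split for every "run" so the accuracies are comparable.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.35, random_state=seed, stratify=y
    )
    results = {}
    for rate in rates:
        clf = LogisticRegression(C=1.0 / rate, max_iter=1000)  # assumed C = 1 / rate
        clf.fit(X_train, y_train)
        results[rate] = clf.score(X_test, y_test)
    return results


if __name__ == "__main__":
    results = sweep_regularization(RATES)
    for rate, acc in results.items():
        print(f"rate={rate:<6} accuracy={acc:.3f}")
    best = max(results, key=results.get)
    print(f"Best rate in this sweep: {best}")
```

Keeping the split fixed across values is the design choice that matters here: otherwise differences between "runs" would mix the effect of the rate with the effect of a different train/test split.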

By now we have grasped the concepts of AML and its ecosystem, configured our environment in Azure, installed and configured the Workbench, and created a sample project and executed it locally.

I hope this was a good introduction and that you enjoyed it.

 

 


Machine Learning / Data Science / AI / Big Data… There I go!!

Updated 29/11/2017: I am adding AI programming to ramp up my Python skills, with some focus on a gamification site, codingame.com. I have updated the article to reflect this.

Call it what you want... it is a very fuzzy topic, and there are many discussions about the names and concepts 😉

For some time, after the “death” of Silverlight, I had an empty space… what was missing was the DRIVE: something exciting that gets me engaged, that pushes and motivates me to go further… it’s when you are at a hackathon and you get this feeling of…

This is it!!

And even though .NET Core is exciting with its .NET Standard compliance, and Azure is pretty exciting and improving day by day, they still did not bring that “shiny” Silverlight factor that pushed me to play with and explore that technology and make it my playground… to devour design and interaction books, as well as physics programming, just to optimize resources and do magic in the UI… what times!!

So, I had two candidates: ML/DS (Machine Learning / Data Science) and AR/VR/MR… As the second is still not mature enough (and it was impossible to get a HoloLens too), I decided earlier this year to go for Machine Learning 🙂, though you have probably figured that out from the title...

I have set up a path through this vast topic of Data Science, Machine Learning and AI, and, along this path, I want to learn the best tools for the task at hand.

That said, I have already worked 2+ years in ETL (Extract, Transform, Load), preparing data at a big publishing house, as well as in BI & reporting… knowledge from my experience that I can leverage.

But what is Data Science exactly? (as well as those other buzzwords)

As far as I understand, these are their meanings/areas:

  • Data Science – The “all goes in” discipline: collecting the data, organizing it and preparing it for searching for patterns, in order to perform advanced “tasks” on it such as predictions, classification, etc. Usually these tasks are the work of a Machine Learning model that does the magic. This profile usually has a decent background in data management and in defining data flows that integrate the data into a repository where the automated analysis can be done. This role also requires math and statistics skills.
  • Machine Learning – The science of creating (or adapting/tuning) algorithms that learn on their own from data (read: can be trained to perform better). Usually a mixed profile of mathematician & coder fits this position best. Note that ML is a subset of Data Science.
  • Deep Learning – To most people, this is a subset of Machine Learning, as it is in fact an ML technique (neural networks) that has had a lot of success on certain problems and is becoming a discipline of its own.
  • AI – Subfield of Computer Science that programs computers to solve human tasks, so they can perform planning, moving, recognizing objects, etc.: basically any task. This includes ML, as making a prediction on a set of data is, basically, a task, which makes ML a subset of AI. The goal of ML is to make computers learn from data by themselves, so they can make predictions.

And even though I believe this is a clear description, people are still discussing these definitions… Some articles discuss this topic in much more detail, like this one. If you want to understand how wide the possibilities for a “data scientist” are, read this.

People have many different but similar opinions, and if you have time, you can read some of them. But…

I want to feel the power of DS/ML at my fingertips: to know from top to bottom how to get things done, understanding every single step, to be able to design, code and tune complex models that provide accurate results... and to be able to explain those models through proper visualizations that give clear insight into the decisions taken by the model. And for this,

I have a plan…

Here is my path forward for DS-ML-DL…

Step A: become a Data Science / ML “beginner”

Goal: to become knowledgeable about what is “out there”: what people are using and what the main technologies are, and to get a feeling for them. Also, I love UI and believe that proper presentation greatly helps understanding, so I want to invest a good deal in data presentation skills.

  1. Andrew Ng’s Machine Learning – done! Great base, but everything is done in MATLAB… and without much explanation, as the exercises were pre-prepared.
  2. Udemy introduction to Data Science – done
  3. edX program from Microsoft for Data Science – in progress (4 out of 11 courses)
  4. Tableau A to Z (done)

Step B: become a proficient, or at least intermediate, ML developer and DS practitioner:

Goal: to become competent in programming with a hands-on, practical approach, both in R and in Python, even though I believe I will dig deeper into Python, as there is a lot more material for it.

  1. Datacamp.com: practicing with some Python courses, 2 modules completed.
  2. codingame.com: practicing to polish my AI agent coding skills (in Python), currently working on the “intermediate” challenges.
  3. Python A-Z (Udemy, Kirill Eremenko) (Done!)
  4. R A-Z (Udemy, Kirill Eremenko)
  5. Machine Learning A-Z, hands-on Python & R (Udemy)
  6. Taming Big Data with Apache Spark & Python (Udemy)

Step C: become an intermediate to advanced ML developer and get some experience:

Goal: do I need to explain? 😉

  1. Ensemble ML
  2. Start digging into Kaggle examples and tutorials to get up to speed, and compete in at least one Data Science contest. Ref: https://www.kaggle.com/
    Kaggle is an ML “professional” racing competition, so I want to have some basic skills and “driving” experience before joining a competition.
  3. I want this experience to consolidate my learnings all together with hands on experience, with a goal.
  4. Tableau expert top visualization techniques (to get some better knowledge of Tableau)

 

Step D: Get DEEP.

Goal: to get into the deepest and most complex topic in today’s Machine Learning landscape. Deep Learning, with the new computational advances, seems to be key in implementing new approaches to predictive systems, and more: it is being used to develop AI systems able to devise strategies that beat the best humans at a task, and to be as creative as humans can be, but without our limitations (limited CPU power, limited ability to learn, and procrastination). I have set up the following courses:

  1. Deep Learning A-Z
  2. Artificial Intelligence A-Z
  3. Join some Kaggle challenges on DL and/or AI development.
  4. Deep Learning: GANs and variational Autoencoders
  5. Bayesian Machine Learning in Python
  6. Cluster Analysis and unsupervised ML

Obviously this is a vast topic, and things can evolve or change…

Regarding Kaggle, I believe it is in the right spots. I consider it a way to consolidate the learned skills and also to gain some valuable experience. See this Quora post:
https://www.quora.com/Can-I-learn-Machine-Learning-completely-with-Kaggle
Also, I love hackathons and coding competitions… participating in these events always brings out the best in me and pushes me to develop even further than I expected, which is the biggest win. That said, winning or getting a top place does not feel bad at all 😉

And what about Microsoft tech?

Well, I do plan to get up to date on all things Microsoft, as, on top of that, there is the Microsoft Data Science Orientation course, and I have already been playing with Azure Machine Learning Studio, even participating in some competitions while I was taking the Andrew Ng course… I’d like to get hands-on and create some content. I am thinking of some articles on the fundamental usage of Azure ML, to show the full usage of AML (create a data integration “data science” workflow, create a model and tune it, create a service and consume it from .NET, for example)…
So, do you think such an article (or several articles that show how to get this done) would be fun/useful?
And… what do you think of the plan? Let me know any suggestions you might have to improve it; I would really appreciate that, as I am just beginning 🙂
Update: I forgot to mention that I am spicing up the plan with a jewel of a site I found thanks to the Microsoft Data Science course I am currently taking: http://www.datacamp.com, so some of their trainings will fit in here and there. I might also consider some of the Udacity specializations later on; I heard from somebody taking the courses that some of the nanodegrees “have it all”… so that could be an option too… 😉