XLKitLearn

Click here to see current version number.

I wrote XLKitLearn to give students access to the full power of scikit-learn using an Excel interface. Students use it to fit random forests, boosted trees, and carry out Latent Dirichlet Allocation on large datasets, all in Excel. It has changed the way I teach data science and analytics, in my Business Analytics 2 class.

Why design a brand new tool when there are others out there? See below for my rationale. Ready to give it a try? See the installation instructions (PC and Mac) and quick demos below.

Here are the highlights

You are welcome to use XLKitLearn for free for any purpose, but every run will be logged, together with your email address, on a server. If you want to avoid this log, either reach out to me or simply run it while disconnected from the internet.

The code is available for your perusal and copyrighted (for now, until I figure out what the right open source license is). As soon as I have a second I'll create a GitHub repo with the code in a more easily accessible format).

Installing the add-in

XLKitLearn will likely require you to have administrative rights, both for installation and use. I have also heard anecdotal evidence that antivirus/antimalware software can prevent the addin from running. If you find that Excel shuts down every time you try to run it, feel free to contact xlkitlearn@guetta.com to see if it's an issue we've encoutered before.

Jump to installation instructions for PC or Mac.

Installation instructions - PC

  1. Download the XLKitLearn installer and install it.

    Note: if you have an older computer, you will need to check whether you have a 32 bit or 64 bit version of Windows. To do this, click here to open the system settings, scroll down to "System type" and check whether it says "64-bit" or "32-bit". If it says "32-bit", do not download and install the file above - instead, use the 32 bit installer.

  2. Download XLKitLearn.xlsm. This file contains the add-in itself. Every time you want to use the add-in, create a copy of this file, and bring your data into it.

  3. Open the file XLKitLearn.xlsm. Upon opening it, the file will likely be in protected mode. Unprotect it by clicking "Enable Editing": Click on Enable Editing You will then be asked to enable Macros. Make sure you do: Click on Enable Content

  4. Once the file opens, enter your email address in the lower part of the screen. Then, click on the first run button near the top of the add-in:

    The button should turn grey and the sheet should display the addin's progress. Let it run until a new sheet appears in your Excel. If you haven't used the add-in for a while, or are using it for the first time, it is normal for Python to take a little while to load, but the entire process should take no more than 2 minutes at most.

  5. If the add-in runs successfully, the following screen should show up (note that the contents of cell D13 have been obscured. If you are installing this add-in as part of a class, your instructor might ask you to report the value in that cell to verify you've installed the add-in correctly):

  6. Finally, go back to the "Add-in" sheet, and click on the other button ("Edit Settings"). Make sure you click on the first, upper “Edit Settings” button – not the lower one. The following window should pop-up:

If you reached this point, congratulations - you're done! You may want to look at the demo videos below to discover what the add-in can do.

Installation instructions - Mac

  1. Bring up Spotlight Search by pressing Command + Space, and type “terminal” in the search bar. Click on the terminal icon

  2. Copy and paste the following words exactly into the terminal

    curl -sSL https://danguetta.github.io/files/xlkitlearn/installer.sh | bash

    and then press enter (note: do not modify the above line - you must use bash even on newer mac OS systems). You will be asked for your password, which you should enter (this is the password you use to log in to your computer):

    The command should take between 2 and 5 minutes to run, and end with the text Successfully installed XLKitLearn! in green. When it has finished running, it will look something like this:

  3. Download XLKitLearn.xlsm. This file contains the add-in itself. Every time you want to use the add-in, create a copy of this file, and bring your data into it.

  4. Open the file XLKitLearn.xlsm. Upon opening it, you should be asked to enable Macros:

  5. You will then be asked to grant access to up to three directories on your computer - to do this, click on "Select" in the window that pops up, and then click on "Grant Access". Once you have granted access to these three directories, you shouldn't need to do it again.

  6. Once the file opens, enter your email address in the lower part of the screen. Then, click on the first run button near the top of the add-in:

    The button should turn grey and the sheet should display the addin's progress. Let it run until a new sheet appears in your Excel. If you haven't used the add-in for a while, or are using it for the first time, it is normal for Python to take a little while to load, but the entire process should take no more than 2 minutes at most.

  7. If you get an error at this stage that says the developer cannot be verified, click here.

  8. If the add-in runs successfully, the following screen should show up (note that the contents of cell D13 have been obscured. If you are installing this add-in as part of a class, your instructor might ask you to report the value in that cell to verify you've installed the add-in correctly):

  9. Finally, go back to the "Add-in" sheet, and click on the other button ("Edit Settings"). Make sure you click on the first, upper “Edit Settings” button – not the lower one. The following window should pop-up:

If you reached this point, congratulations - you're done! You may want to look at the demo videos below to discover what the add-in can do.

Two quick demos

Here are two short videos that demonstrate the basics of the predictive analytics and text analytics capabilities of XLKitLearn.

You might also be interested in this introductory video I use to introduce the add-in in my classes; it discusses the general mechanics of changing the add-in settings and running it.

A more detailed user manual is forthcoming!

Why design a brand new tool?

Before designing XLKitLearn, I did a broad search to see what other approaches existed to teach non-technical students data science. I found three approaches, but none met my needs exactly, hence my decision to create something new.

I have also found that even for technical students who know how to code, using a tool that allows them to focus on the data science without worrying about the syntax can be invaluable. XLKitLearn's code output can then be used to seamlessly transition to scikit-learn.