Thursday, December 10, 2015

Theano on Windows - The Easy Way

I have recently been using a Windows Server 2008 R2 VM, and I wanted to use Theano in the scripts I was running there. Rather than creating a virtual machine within my virtual machine (which I could not do anyway, as it raised exceptions in the virtualized environment), I set out to find the quickest way to get Theano running directly on Windows.

First, install the 32-bit version of Anaconda for Python 2.7 (even if you have a 64-bit system), not any version of Python 3. I recommend version 2.1 of Anaconda, which can be found via the zip files on their site.

Once installed, run "conda install mingw libpython" from the command line. This retrieves the MinGW compiler toolchain and libpython, which Theano needs in order to compile the C code it generates.
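To verify that the compiler step worked, one quick (hypothetical) check using only the standard library is to ask whether g++ is findable, assuming MinGW's bin directory ended up on your PATH:

import distutils.spawn

# Theano needs g++ to compile its generated C code; this should
# print the path to MinGW's g++ if the install worked, or None.
print(distutils.spawn.find_executable('g++'))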

Before installing Theano, I suggest installing the LAPACK resources. A Windows tutorial linked in the sources below (LAPACK resources) describes the process better than I would be able to.
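As a sanity check after that step, NumPy can report which BLAS/LAPACK libraries it was built against; this standard introspection call only reflects NumPy's own linkage, not every library on the system:

import numpy as np

# Prints the BLAS/LAPACK build information for this NumPy install;
# sections marked NOT AVAILABLE suggest LAPACK is not being picked up.
np.__config__.show()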

Theano can be installed in two ways. "pip install Theano" installs the current stable release, but that release sometimes breaks Windows installs when you actually use it. If you try "pip install Theano" and cannot import Theano without errors or crashes, then I suggest this alternate method: "pip install git+git://github.com/Theano/Theano.git" (note the "git+" prefix, which pip requires for git URLs). This installs the bleeding-edge development version of Theano, which may not be perfect, which is why it is the alternate method. Before installing the bleeding-edge version, either uninstall the Theano you have or upgrade over it: "pip uninstall theano" followed by "pip install git+git://github.com/Theano/Theano.git", or "pip install git+git://github.com/Theano/Theano.git --upgrade", respectively.
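Once installed, a quick way to confirm Theano works end-to-end is to compile and run a trivial function (this is the standard scalar example from the Theano tutorial, and it exercises the C-compilation path that usually breaks on Windows):

import theano
import theano.tensor as T

# Build a symbolic expression and compile it into a callable.
x = T.dscalar('x')
y = x ** 2
f = theano.function([x], y)
print(f(3.0))  # should print 9.0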

That should be all you need to get going with Theano.

Why Anaconda? I love Anaconda because I'm a command-line fanatic due to constant SSH-ing (most of my editing is done through Vim - shoutout to my automated vimrc creation tool), so I wanted to avoid IDEs. Anaconda also packages nice machine learning utilities, such as numpy, scipy, and scikit-learn, and the "conda install" feature is clean and useful.

Sources:
Theano site: http://deeplearning.net/software/theano/install.html
Why 2.1 Anaconda: http://stackoverflow.com/questions/31050976/python-exe-crashes-when-importing-theano
The issue in action: https://groups.google.com/forum/#!topic/theano-users/p77HXTvjNxc
LAPACK resources: http://icl.cs.utk.edu/lapack-for-windows/lapack/index.html#libraries
Theano Github: https://github.com/Theano/Theano

Tuesday, November 10, 2015

ICDM 2015

This coming weekend, Cassios Marques and I will be attending ICDM 2015 (the IEEE International Conference on Data Mining) in Atlantic City, NJ. I am attending this conference rather than presenting at it, which is actually out of the ordinary for me. Information is available at: http://icdm2015.stonybrook.edu/

We arrive the night of November 14th (the conference runs from the 15th to the 17th) and will need to head to sleep right away, as our technical arrival time will be on the 15th.

Tentative Schedule:
Saturday, November 14th Schedule
9:00am - 12:30pm ~ Morning Workshop: The 2015 IEEE ICDM Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH)
2:00pm - 6:00pm ~ Afternoon Workshop: The 2015 IEEE ICDM Workshop on Sentiment Elicitation from Natural Text for Information Retrieval and Extraction (SENTIRE)

Sunday, November 15th Schedule
9:00am - 10:15am ~ Keynote 1 (Robert F. Engle): Dynamic Conditional Beta and Global Financial Instability
10:30am - 12:30pm ~ Session 1A: Deep Learning and Representation Learning
2:00pm - 3:15pm ~ Keynote 2 (Michael I. Jordan): On Computational Thinking, Inferential Thinking and Big Data
3:30pm - 4:40pm ~ Session 2A: Big Data 1
4:50pm - 6:00pm ~ Session 3A: Big Data 2 (Not sure if this is a repeat or extension; Clustering 2 is another option)

Monday, November 16th Schedule
9:00am - 10:15am ~ Session 4C: Dimension Reduction and Feature Selection
10:30am - 12:30pm ~ One of:

  • Session 5A: Ensemble Methods
  • Session 5B: Applications 2
  • Session 5C: Network Mining 1
This day is short, as there is an excursion from 2:00pm to 7:00pm.

Tuesday, November 17th Schedule
9:00am - 10:15am ~ Keynote 3 (Lada Adamic): Information in Social Networks
10:30am - 12:30pm ~ One of:

  • Session 6B: Graph Mining
  • Session 6C: Mining Sequential Data
2:00pm - 3:15pm ~ Session 7B: Mining Text and Unstructured Data

There is one last session from 3:30pm to 5:00pm, but we will have to leave early in order to catch a taxi to catch a train to catch our flight, so we need to leave room for any holdups in that chain.

Wednesday, June 10, 2015

Numpy ValueError: Output array is read-only

I recently received this cryptic error when working with a numpy implementation of a neural network and was having trouble finding a ready-made solution for this problem.

A similar error can be reproduced by running commands such as:

import numpy as np
a = np.arange(6)
a.setflags(write=False)

a[2] = 42
# ValueError: assignment destination is read-only

This is intended behavior.

I am currently working on a neural network which utilizes gradient descent and pushes updates through a client-server relationship. When processing a new mini-batch, the network started to throw the titular ValueError while updating the weights on the second mini-batch.

I have seen many posts on Stack Overflow and other sites attributing this error to the array being non-contiguous in memory. More precisely, the array's WRITEABLE flag has been cleared, which can happen when the array is a view onto a buffer that numpy does not own (for instance, data deserialized from a client-server exchange); this is behavior of numpy's underlying C implementation rather than of your Python code.

If you receive an error such as this, then the easiest way to circumvent it is simply to make a fresh, writeable (and contiguous) copy of the array:

my_discontiguous_array = np.array(my_discontiguous_array)

This performs an effective deep copy, re-loading the data into contiguous, writeable memory for your use (np.array copies by default, so a trailing .copy() would be redundant). Once nothing references the old version of the array, the garbage collector reclaims it.
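Applied to the read-only example from above, a minimal sketch of the fix:

import numpy as np

a = np.arange(6)
a.setflags(write=False)   # simulate the read-only array

b = np.array(a)           # fresh, writeable, contiguous copy
b[2] = 42                 # no ValueError this time
print(b.flags.writeable)  # True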

Monday, March 23, 2015

AMIA 2015 Joint Summits on Translational Science

I previously mentioned that the research I have been working on was accepted to a conference. In all of my busyness I have not been able to update this blog effectively (being a GRA is a bit of a time-sink). I'm attaching the poster being presented at the AMIA 2015 Joint Summits on Translational Science to this post as a reference to the work we have been conducting for the past ~year; it is being presented this afternoon. The poster is set up in a landscape format, so it is shrunk to fit this blog. If you're interested and would like to see a better version of it, please contact me.




Thursday, February 5, 2015

Deep Learning and Natural Language Processing

First off, I would like to say that an abstract I co-authored, titled Improving Lupus Phenotyping Using Natural Language Processing, has been accepted to the 2015 Summit on Translational Bioinformatics. The conference is in San Francisco in late March. I will not be attending, as I will be busy with classes (two of the PIs with whom I am working will attend), but I am still working heavily on material to be presented at the poster symposium.

My most recent advances in this research have brought me to attempting to unravel the intricacies behind deep learning. Our goal is to classify each patient's lupus status (effectively, present or not) based on the digitized doctors' notes for that patient. Such research would enable quicker, easier, and more accurate recruitment for clinical trials, and should outperform classification based solely on ICD-9 billing codes.
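For a sense of the task framing (not our actual model), here is a minimal bag-of-words baseline sketch using scikit-learn, which ships with Anaconda; the notes and labels below are hypothetical placeholders:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one free-text clinical note per patient,
# with label 1 = lupus present, 0 = absent.
notes = ["patient presents with malar rash and joint pain ...",
         "routine follow-up, no new complaints ..."]
labels = [1, 0]

# Turn each note into a sparse TF-IDF vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(notes)

# Fit a linear classifier and score an unseen note.
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["new note text ..."])))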

Some sources consulted for research:
  • http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
  • http://deeplearning.stanford.edu/tutorial/
  • https://www.youtube.com/watch?v=n1ViNeWhC24
  • http://arxiv.org/pdf/1206.5533.pdf
  • http://www.socher.org/uploads/Main/PaulusSocherManning_NIPS2014.pdf
  • http://nlp.stanford.edu/~socherr/thesis.pdf
  • http://nlp.stanford.edu/~socherr/SocherChenManningNg_NIPS2013.pdf
  • http://www.aclweb.org/anthology/P/P12/P12-1092.pdf
  • http://www.aaai.org/Papers/JAIR/Vol37/JAIR-3705.pdf