Monday, February 24, 2014

Capstone: Introduction to Learn2Mine

I'd like to open this post by making a note about the state of my blog over the next several weeks. For my capstone, I am required to keep up a blog and write posts specifically about my research and the capstone paper itself. Any post prefaced with "Capstone:" will be in direct reference to that. Anyone who wants to skip those posts can do so, and those who are interested can read them if they wish. This is mainly so that my software engineering work and my own research are compiled in one place, rather than spread across multiple blogs.

So my project is Learn2Mine. But before I even tell you what that is about, you need to have some prerequisite knowledge, or at least an inkling of an idea about certain topics. So let's get to it.

Data science is the first of these topics. Data science is an interdisciplinary field that crosses the realms of statistics, computer science, and a domain field (e.g. biology, business, geology, etc.). To the right is a very popular image that highlights the cross-discipline nature of data science.

Data science is not taught to its fullest nowadays, though. People who are data scientists tend to be primarily mathematicians (i.e. statisticians), computer scientists, or people with substantive expertise in a scientific or business-related field (see Best Source of New Data Science Talent below). If you have traditional training in one of these fields, you tend to try to teach yourself the important skills of the other fields. So maybe a biologist will try to learn the algorithms (e.g. the Smith-Waterman algorithm) that align nucleotides or amino acids in gene sequences. Being able to use these algorithms only at the most basic level, without understanding how to tweak them or what is really going on inside them, means that your computer science expertise is not enough to mold you into a data scientist, or at least into what we would like data scientists to be. A better representation of this can be seen in the image below.
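To make the point about "tweaking" concrete, here is a minimal sketch of Smith-Waterman local alignment in Python. This is my own illustration, not code from Learn2Mine. The function name, the scoring values (match, mismatch, gap), and the simple linear gap penalty are all assumptions I picked for the example; real tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    The scoring parameters are illustrative assumptions; changing them
    changes which alignment is considered best.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a fresh alignment here
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

Calling the function is the easy part. Knowing why the zero in the `max` makes the alignment local, or what happens to your results when the gap penalty changes, is the kind of understanding I am talking about.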


So what are we going to do? How do we make sure that the influx of data scientists that we desperately need in academia and industry can get the proper training? The answer to that question is one that has been under development for quite some time now: Learn2Mine. You may be wondering what Learn2Mine is or how it is going to achieve this incredible goal. Well, you may have heard of sites like Codecademy, Rosalind, and/or O'Reilly. Soon Learn2Mine will join this list as the preferred place for students and scientists alike to learn and master the skills and techniques a data scientist needs.
