Big Data is the buzzword of the decade; everyone is claiming to be in the Big Data game. To be honest, Big Data has always been with us; it has always been a part of real life: population migration, stock market movements, motor traffic patterns, television viewing figures and ratings, pathogen transmission and mutation, and so on.

The problem has always been making sense of that data in a uniform manner. All of this information was coming from different channels and was heterogeneous in nature; you could make sense of one data source, but that did not necessarily give any insight into another (the number of people who watch a certain TV program on a Friday has no bearing on the mutation rate of the HIV virus), so contextualizing Big Data across diverse fields was almost impossible.

Machines that understand human language

Now that the largest data channel, regardless of context, is the Internet (convergence), and the language used to describe that data is human-readable text, there is a common channel and a common protocol. We as humans have started to receive information from that single channel and can manually make sense of it. We can't necessarily relate the Friday-night TV viewing population directly to a pathogen mutation, but because we are getting both sets of information from the same channel in the same language, we can interpret diverse fields in a common fashion.

To take this analysis beyond what humans can do manually and begin seeing possible patterns and associations across these data sets, we need scale. We need machines that can understand human language, which means they need to be contextually capable and semantically aware: able to derive meaning directly from the surface data.

Up until this decade we needed to create data about data (meta-data) so that machines could understand context, but there is far too much data now for that to be feasible; one would spend considerably more time creating the meta-data than creating the original data it is meant to describe. So we need the machines (algorithms) to understand semantic information instead of meta-information.

Algorithms need to understand human language and make associative and dissociative decisions similar to those the human brain makes by default. Deriving context purely from text analysis is therefore the order of the day. This is done with code that counts how many times certain words are used in a piece of text, measures the closeness of those words to other words, and weighs the complexity of statements and the importance of words (where they appear, which verbs precede them, what grammatical constructions follow them), and so on. All of this allows machines to start extracting meaning and related data from any piece of textual information.

Many companies are now building algorithms that take inputs from the user's screen, or user behavior expressed through textual interaction, run keyword-based searches, and build data relationships from those interactions, displaying back to the user what is 'reckoned' to be contextually aligned with the user's inputs. This can be inaccurate at first, but it improves greatly over time as the code takes into account which recommendations the user does or does not subsequently interact with (or rate), and notes which recommendations other users with similar inputs or behaviors are choosing. These algorithms are loosely classed as collaborative filtering and content-based filtering.
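To make the text-analysis and content-based side of this concrete, here is a minimal sketch (not Fishtree's actual implementation): it counts the meaningful words in a piece of text to build a rough term profile, then recommends the candidate documents whose profiles are most similar to what the user typed. The function names, stop-word list and sample documents are illustrative assumptions; a collaborative layer would further re-weight these scores using what similar users went on to choose.

```python
# Sketch of simple term-counting plus content-based recommendation.
# All names and data below are illustrative, not Fishtree's real code.

import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "on", "for", "with"}

def term_profile(text):
    """Count how often each meaningful word appears in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def cosine_similarity(profile_a, profile_b):
    """Score how contextually aligned two term profiles are (0.0 to 1.0)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[w] * profile_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in profile_a.values()))
    norm_b = math.sqrt(sum(c * c for c in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user_text, documents, top_n=3):
    """Rank candidate documents by similarity to what the user typed or viewed."""
    user_profile = term_profile(user_text)
    scored = [(cosine_similarity(user_profile, term_profile(body)), title)
              for title, body in documents.items()]
    return sorted(scored, reverse=True)[:top_n]

# Example: suggest resources based on what a teacher is currently writing.
documents = {
    "Cell biology basics": "Cells divide and mutate; viruses hijack cell machinery.",
    "Fractions for grade 5": "Adding fractions with unlike denominators step by step.",
    "Virus transmission": "How pathogens spread and mutate as they move between hosts.",
}
print(recommend("lesson on how viruses mutate and spread", documents))
```

In a real system the raw counts would typically be replaced by weighted scores (for example, discounting very common words and boosting words that appear in titles or headings), and the rankings would be adjusted over time by the user feedback described above.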

Making sense of data

No two companies are analyzing the same data in the same way, so even though they are using the same channel and the same protocol, they are looking for different patterns. Ideally they should not have to recreate the mechanism each time they come upon a slightly different data set, so it is safe to say that the basic algorithms need similar structures but different behaviors. Some companies have tried to build generic algorithms that do this kind of data analysis, personalization and recommendation, and some have even open-sourced their efforts, but in many cases one needs to build these things from the ground up for them to be accurate, efficient and maintainable within one's own business domain. It is quite often far more effort to take complex software that has already been created, adapt it to your own needs and then maintain it, than to start from scratch.

Fishtree uses algorithms to do what we do: education-based content search and recommendation, grade leveling, content alignment, auto-generation of assessments, and personalization. We are looking at different types of data for different data patterns in our effort to drive better learning outcomes through data and better workflow. These algorithms and better workflows, paired with simple and attractive user interfaces, are proving extremely useful in personalizing the experience for every student while empowering the teacher in a whole new way.

By Jim Butler

P.S. Teachers use Fishtree to plan lessons, find standards-aligned teaching resources, create assessments and see students' performance, all in one place! If you're a teacher, use Fishtree to prepare for your class and understand how well each student is learning. What's more, it's safe, secure, collaborative and easy! Try the next generation learning platform or contact us for a demo.

Image credits: NetWork (Playing Futures: Applied Nomadology) / CC BY 2.0