What is Big Data? Or: What is Big Data to me? The dot-to-dot story

Well, Big Data may be overhyped, but it is here. I still often meet customers and peers who don't understand the concept and throw around buzzwords: "data is an asset", Volume, Velocity and probably another 14 new V definitions.
Therefore, I will try to explain what Big Data is to me and why it is so cool to deal with analytics and big problems. Well, in the beginning there was data...
Forget about ACID and the CAP theorem; they are fundamental to understand, but they are not enough.

When my son was 5 years old, he enjoyed fun activities, and one of his favorites (before the DS came into our lives) was solving dot-to-dot puzzles. We bought him a 1-30 dot-to-dot workbook, and he enjoyed practicing numbers and discovering the fun picture.
One of the benefits of dot-to-dot activities is that they improve children's motor skills and eye-hand coordination (something I discovered when I researched it).


One day, my lovely wife noticed that his workbook was almost finished, so she went on Amazon and ordered a new dot-to-dot workbook with a dinosaur theme (again, this was before the Ninjago and Pokémon times).

When the workbook arrived, he was so excited. While I was sitting in my home office working, he came to me with tears in his eyes and said: "Daddy, it is too hard, I need your help."
When I looked at his workbook, I was amazed: the workbook he got was Extreme Dot-to-Dot, and every two-page spread had over 1,400 dots.

I laughed so hard and asked my wife what in the world she was thinking when she bought it, but she was just as surprised: she hadn't noticed, because she was only looking for a dinosaur dot-to-dot.


We explained to Jonathan that this workbook is for ages 8 and up and he is not supposed to do it, and that whether he is 8, 18 or 38, we will make him do it if he won't behave.

That night, while reading and working on one of my projects dealing with analytics, NoSQL and all the good stuff, it hit me: Dot-to-Dot!!! This is what Big Data is for me.

Since that day, I carry the two dot-to-dot workbooks with me. In the sessions I present, I show the 1-30 dots workbook and ask the audience to solve one picture, or to estimate how much time it will take them. The common answer is 20-30 seconds.

Then I hand over the 1,400-dot puzzle and ask them to solve it or estimate how much time it will take them. Let's say it takes more than one minute :-)

With Big Data, we expect to solve the 1,400 dots in the same time it takes us to solve the 30 dots (mmmm, did someone mention MapReduce?). The logic is the same simple logic of following sequential numbers, but the insight is different (a different picture for every problem).
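To make the MapReduce hint a bit more concrete, here is a minimal sketch in Python, assuming a puzzle is just a list of numbered dots with (x, y) positions. The function names and the grid layout are hypothetical, made up for this analogy. The puzzle is split into chunks, each chunk is connected independently (the "map" step), and the partial drawings are merged back in order (the "reduce" step). Threads only illustrate the split/merge shape here; on a real cluster each chunk would be processed on a different machine.

```python
from concurrent.futures import ThreadPoolExecutor

# A dot is (number, (x, y)); a segment is (from_number, from_xy, to_xy).

def connect_chunk(chunk):
    """Map step: draw the lines between consecutive dots inside one chunk."""
    return [(a[0], a[1], b[1]) for a, b in zip(chunk, chunk[1:])]

def connect_dots(dots, workers=4, chunk_size=100):
    """Split the puzzle, connect each piece in parallel, then merge the pieces."""
    ordered = sorted(dots)  # the "simple logic": follow the numbers
    # Each chunk overlaps the next by one dot, so the line that crosses
    # a chunk boundary is drawn exactly once.
    chunks = [ordered[i:i + chunk_size + 1]
              for i in range(0, len(ordered) - 1, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(connect_chunk, chunks))
    # Reduce step: merge all partial results into one ordered picture.
    segments = [seg for part in partials for seg in part]
    segments.sort(key=lambda seg: seg[0])
    return segments

# Hypothetical 1,400-dot puzzle laid out on a 40-column grid.
puzzle = [(n, (n % 40, n // 40)) for n in range(1, 1401)]
print(len(connect_dots(puzzle)))  # 1399 lines for 1,400 dots
```

The same code handles the 30-dot workbook and the 1,400-dot one. Only the number of chunks changes, which is the point of the analogy: the logic stays simple, and the scale is handled by splitting the work.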

To me it is symbolic, and I use this analogy a lot. Some argue that the problem is complex but is not Big Data, since it doesn't involve velocity. My answer is that the dot-to-dot problem is an N-dimensional problem with K dots that requires you to discover something new and wait for the A-ha moment!


Get me the damn Use case

So, assuming you have started your journey in the Big Data world, you have probably realized by now that one of the most important Vs is Value. Yep, you mine petabytes of data, run crazy hypotheses, and invest a great amount of time and probably money to harvest your unstructured data. What we call Value is really just another word for: use case.

Yep, we need to show ROI and a business model, as all the great business schools and consulting firms will tell you. Well, gling gling, here is your reality check: forget about the use case, just start!
Don't let anyone fool you with a good marketing pitch. Predicting the future is not an easy task.
Advanced analytics can help the business and drive the strategy, but you must set expectations with your marketing department.

A use case is a journey. It requires discipline, and it requires great resources (not good, but great). Get smart folks to do this job, or identify the talented people in your organization and make them your new data scientists, or whatever the newest hyped role in the industry is called.

There are two great books that I recommend reading:
1. Win business with AA
2. Data agile

These two books were written by different authors from different angles, but I think they complement each other.
The first one will give you the business acumen.
The second will make you understand that you still have a lot to learn, and that big data technology is pretty cool.

“Hello Big Data World”

So here it is.

I am often asked questions like: How long does it take to learn Hadoop? How much time do I need to spend on training before I can start my professional career in this area?

My first answer is always: "No free lunch."
I might be old school, but I honestly believe that in order to be great at something, you must have the passion.

My short answer is: 3-5 months, assuming that:


  • You have a strong Java programming background (Hadoop was written in Java). I have heard about guys who knew only JavaScript and Perl,…
  • You don't just read; you actually set up your own cluster with 2-3 nodes and code. Yes, code.
  • You get 2-3 books, read them and practice.
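The traditional first program on a new Hadoop cluster is word count. Before writing it against Hadoop's Java API, it can help to see the three phases in isolation. The following is a minimal plain-Python sketch of the map, shuffle/sort and reduce steps. It does not use Hadoop at all, and the function names are hypothetical. On a real cluster the framework performs the shuffle step and runs the mappers and reducers across the nodes.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort: group the emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Reduce: add up the counts for one word."""
    return key, sum(values)

def word_count(lines):
    pairs = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(pairs))

print(word_count(["Hello Big Data World", "hello big data"]))
# {'big': 2, 'data': 2, 'hello': 2, 'world': 1}
```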

Next time I will elaborate on my short answer.





Big Guy on Big Data

The Hype?!

One of my best friends, who is the CEO of a young startup, met me several weeks back and was mad at me because I used the term "big data" at least 10 times in a 5-minute call. It was probably a side effect of all my reading and exploring in this area.
Meeting Steve Brobst, the CTO of Teradata, had a great impact on me. It was one of those great lectures where you have the "Ah" moments.

Steve Brobst talked about the fact that we are living in a SPIME world (hoping I say it right). The point is that we look at both the "space and time" dimensions when we generate and consume data. I might need to write a post just on this specific topic.

There is great buzz/hype around big data; it reminds me of cloud computing a few years back. To me it is hyped, but it is here to stay for many, many years. However, there is great confusion regarding analytics/Big Data in general. I have many definitions for all of it, but I won't share them, as that would create a long discussion like this innocent question posted by Joana on LinkedIn :-)

In this blog I will try to cover many of the aspects and areas I encounter and learn about at work, and most important, I will use the term "big data" fewer than 10 times in every post.