The past 9 years in the field of Data Science have been a wonderful journey for me: I started as a facilitator in an engineering college and later transitioned into the role of Machine Learning Engineer. I work at a company that has been capturing data and delivering business insights to retail giants, malls and airports for the past 25 years, and that has ventured into full-fledged Data Science relatively recently in its history.
I am delighted to share my journey in this field, which has interested me since before the term “Data Science” came into widespread use; it was then called Data Mining. I am writing about my experience in the hope that the valuable lessons I learnt throughout my career will be helpful to a rookie who plans to pursue a career in this field, and to an amateur Data Scientist, who may get a premonition of the mistakes he or she is likely to make in the near future. I hope this will also appeal to the serious practitioner of Data Science, as it explores the journey of a fellow practitioner.
Way back in 2010, after obtaining my postgraduate degree in Computer Science and Engineering, I developed a fascination with decision sciences. The field was particularly interesting to me because the physical world and the digital world complement each other in it. As a facilitator in an engineering college, I discovered that I was a passionate public speaker who could train and mentor others and help them attain their objectives. Later in my career, I realized that this attitude would be very useful in data science, where a practitioner is required, day in and day out, to identify and solve clients' business pains that the clients themselves had not identified in the first place.
At the beginning of my career, I was fervent about my job as a facilitator because the very nature of the profession is to have a profound impact on many young minds. I delivered lectures on Data Structures, Database Management Systems, Data Mining, Design and Analysis of Algorithms, Operations Research, and Probability and Queuing Theory. I strongly believe that this laid the foundation for my Data Science quest. As the saying goes,
“In learning, you will teach, in teaching, you will learn”
I learnt a lot during my teaching career. I was lucky to be involved in Data Mining projects during my time in academia, where I worked on problems that were predominantly approached as forecasting problems. In academia there were no strict deadlines, so I could immerse myself in exploring the nuances of forecasting techniques ranging from exponential smoothing to ARIMA.
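For readers new to forecasting, the simplest of these techniques, simple exponential smoothing, can be sketched in a few lines of Python. This is only a minimal illustration: the function name, the sample series and the smoothing factor are assumptions made up for this sketch and do not come from any actual project.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    alpha is the smoothing factor in (0, 1]; a higher value gives
    more weight to recent observations.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    # Initialise with the first observation (one common convention).
    smoothed = [series[0]]
    for value in series[1:]:
        # New level = weighted average of the new value and the previous level.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical data, purely for illustration.
observations = [10.0, 12.0, 11.0, 15.0]
levels = simple_exponential_smoothing(observations, alpha=0.5)

# The one-step-ahead forecast is simply the last smoothed level.
forecast = levels[-1]
print(levels, forecast)
```

ARIMA builds on the same idea of learning from past values but adds differencing and autoregressive and moving-average terms, which is where a dedicated library is usually the better choice.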
I was more concerned with the technical aspects of the problem. I felt like an expert solving time series problems, standing on the shoulders of giants (the people who developed the forecasting algorithms). The technical side was fun, and my belief then was that it was the only way to solve any problem. Having solved problems using legacy algorithms, I then started exploring statistics, which constitutes the bigger part of the Data Science equation. Getting started with statistics was a fun exercise: I was still teaching in college, so I had plenty of time and easy access to the books I needed through the college library. The fact that I was already delivering lectures on probability and queuing theory helped a lot in my statistics learning. I then moved on to machine learning algorithms such as linear regression, logistic regression, support vector machines and nearest neighbours. I was more than happy to use all the techniques and tools I had mastered over the years to solve problems using the available data. From that experience, I learnt four important lessons:
- Ideas drawn from various fields such as Data Preparation, Statistics and Machine Learning form the foundations of Data Science.
- 70% of the time is spent on the dirty work of data science, such as data cleaning, integration, transformation and selection.
- Exploring data visually and analyzing it with statistical tests are key to data understanding.
- Feature engineering should be carried out carefully before building machine learning models: done correctly, it tremendously improves the performance of the algorithm; done poorly, it has the opposite effect.
I entered the industry in early 2015, as I wanted to explore more and get a taste of the commercial side of the field. Soon after that, I started exploring the possibilities of using Deep Learning. Courses by Jeremy Howard and Andrew Ng helped me learn and implement deep learning models to solve problems involving computer vision. That was my first deep learning assignment in the industry.
Not all the projects I worked on involved computer vision; the majority of real-world business use cases were solved with traditional methods, where the learning and experience I brought from academia helped me immensely. I was good at solving problems with the given tools.
There was a blind spot that I had not paid attention to until then: I never bothered about the business pain that leads to the problem statement in the first place. I was lucky enough to work on some great projects in domains such as Retail, Retail Real Estate and Telecom, but I neither had the chance nor bothered to look into the business side of Data Science problems. As I have already mentioned, I was happy with my technical skills. Recently, my mentor involved me in the business side of Data Science projects, which opened up a world of possibilities for me. I was awestruck by the business knowledge used to identify the business pain and formulate the problem. I started to understand the definition of the umbrella term Data Science, i.e. the right mix of Domain expertise, Programming and Mathematical skills, shown clearly in the picture below.
Since then, I have appreciated the importance of Domain expertise and have approached every problem keeping in mind the underlying business pain to be addressed. I started taking part in discussions with the stakeholders to understand the nuances of the business I was working on, which helped me tailor the solutions to the business needs. From my initial years of experience in the industry, I learnt four important lessons:
- Concepts are way more important than tools. You can solve a problem using R, Python, SPSS Modeler, SAS or any other tool, but the underlying algorithms are the same irrespective of the tool used.
- Domain expertise is the key to problem formulation and providing solutions to any Data Science project.
- Presenting technical findings to the business (non-technical) audience in a way that lets them appreciate the value added by the solution is a critical step. It plays a vital role in the success of any Data Science project, as it counts directly towards the Return on Investment (ROI).
- Whatever I have learnt so far in my journey is vividly captured in the famous saying below, which explains the qualities a Data Science practitioner should possess to begin, survive and excel in this career.
“The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn.”
In addition to the personal experience shared above, I would also like to highlight a few pointers from the 2019 O’Reilly Media survey results, published a few weeks ago, which reaffirm my learnings:
The biggest skill gap reported for evaluating AI/ML is in ML modelers and Data Scientists. But the critical need at this stage should be for those who can ‘Understand business use cases’ to identify the right AI project. Maybe this reflects misplaced priorities in the industry, which often affect an AI project’s success or ROI.
Though the big breakthroughs and popular conversations in Deep Learning are around applications in computer vision and text analytics, about 86% of its use was on structured data, which is interesting.
Failures and learning have been part and parcel of almost all the projects I have been involved in during my career to date. Some were successful after initial failures, whereas others were failures till the very end. Irrespective of success or failure, learning was always my primary objective. I have consciously avoided discussing the technical challenges I have faced and the failures I have encountered in this article. I am saving those for my future articles.
Author: Balaji Muthukrishnan