Preludes to my red pill as a data scientist.

When I started my career, I was naive and my ego was pushing me even further into my naivety. I adhered to everything that was circulating on the Internet about data and the recommendations of some of its major players.

By Tarek Samaali. Published 4 years ago. 13 min read.

It took me a long process of reflection and self-adjustment to transcend all that. In this article, I detail the course of my first five years of experience, which started with grotesque innocence and led me to a satisfying form of maturity at the end.

1. The first year's mistake: Trying to absorb as much technology as possible.

My engineering background gave me a fairly solid and rich arsenal in algorithms, mathematics and computer science. I was trained to model a complex problem in whichever domain I was involved with. Until then, my only interlocutors had been my professors. Against my will, I trained myself to associate problems with complexity. Each time I was confronted with a new problem in a different subject, I added a new layer of complexity, with a view to maximizing my gratification in the form of a grade. The more a problem pushed us to think, and the more our reflections leaned towards abstraction and complex thinking, the closer we got to the solution and the more gratification we gained. In some modules, we were even required to keep track of the number of days taken to complete a project. In my perception, rewards were very strongly correlated with the resources consumed in the solving process.
This perception was the backbone of my reasoning when I started my graduate internship. Beforehand, I had filtered the titles and descriptions of many internship offers according to how much complexity they seemed to promise. On the first day, I installed the first core dependency on my machine: tensorflow. I was hearing about it quite often at that time, and I thought that in order not to flunk out, I had to start by mastering the technology and the mathematical pillars that gave rise to it. My internship supervisors were understanding and rather sympathetic, and they picked up on my obsession immediately. Although the challenge of my internship was clear, I could hardly see how I could start otherwise than with tensorflow and the other Python libraries (numpy, pandas, other built-in modules). I let myself be carried away by the hype and did what scientific fashion compelled me to do: convince myself that everything comes down to seeing the problem through the lens of neural networks, and therefore to designing neural networks.
For my part, I had fun reproducing different architectures thanks to the flexibility that Keras (a simplifying layer on top of tensorflow) offers its users. My experiments covered several designs that should normally be chosen according to the type of problem at hand: merging two CNNs, hybrid neural networks (of the radial basis type), deep CNNs, RNNs, LSTMs, … I could already see the power of the tool, and I put aside the whole context of my internship. One month later, I was asked to present my findings and my ideas for the project I was working on. Confident and happy with the heavy content I had been acquiring, I launched into a presentation of the latest state-of-the-art findings in deep learning and of the different tests I had been able to conduct. Much to my surprise, my team was skeptical. In my novice mind, a topical and novel presentation deserved a minimum of recognition. Their message was unambiguous: they could not see how all this range of choices could be adapted to their problem, and instead of what I had shown they would have liked a study of their business context and a clear roadmap for solving it. Because of my immaturity, I came to the ridiculous conclusion that the people I was working with were old-fashioned and averse to innovation. I was so convinced of this - even though I adjusted my course so as not to swim against the current - that I kept filling my browser with bookmarks that were essentially documentation of the latest achievements of the big data players (torch, caffe, mxnet, ...). I was jumping on everything that moved. My thirst for learning made me a predator: I couldn't control myself, and most of the time I was guided solely by my instinct. To fully earn and honor my title as an engineer, my toolbox had to be as complete as possible.
I won't say more about how my internship ended except that I proudly kept my stubbornness until the end and didn't change my mind. It was obvious to me; the company I would belong to after presenting my internship's work in front of the jury would have to align with my obsession. I didn't see my career development any other way. In any case, it was the first blue pill that cut me off from the reality of the field.

2. Second year: A choking urge for gratification, a craving for recognition, at all costs, for all the work that had been done.

My stressful race to acquire as many skills as possible intensified more than ever during the first week of my first job. The company I had joined met all my criteria and focused all its energy and investments on innovation and technological conquests. It also gave me complete freedom to proceed in any way I wanted.

I started my first real project with data collection, feature engineering and machine learning, all within an iterative process. I was lucky enough to work with experienced people who knew how to decipher my energy and channel it to meet the contracts' requirements. Needless to say, I was still far from understanding the product-oriented approach. All I worried about was the modeling and implementation aspects of the project. I was still trapped in the spiral of pure technicality, and I was testing several tools (some of which are even extinct today) to improve my metrics. In my book, the business was completely separable from the implementation. I was already happy enough to have at hand data that I had struggled to collect with Scrapy. I would look at my script, almost 50 lines of which were imports, and feel immensely proud of immersing myself in topical frameworks. I wielded my CSVs with grace and followed all the generic steps that mentors, internet aficionados and coaches recommended: Two correlated variables? Eliminate one. Not enough variables? Build new ones that are linear or non-linear combinations of those already established. Unbalanced classes? Proceed with sampling. Targets too general and evasive? Build new, more precise ones. Missing values? Plug in the mean of the variable's distribution. Outliers? Keep them out of the way. I could have stepped back, but the path forward seemed more than obvious. Reasoning about the problem in front of me had become an obstacle.
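To make the point concrete, here is a minimal sketch of what that recipe looks like once it is written down. It is only an illustration: the function names, the 0.95 correlation threshold and the 3-sigma outlier cutoff are assumptions chosen for the example, not something taken from a real project. What it shows is that every step runs without a single question being asked about the business problem.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:  # nothing to average, leave the column untouched
        return list(column)
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / norm if norm else 0.0


def drop_correlated(table, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept.

    The threshold is an arbitrary assumption for this sketch.
    """
    kept = {}
    for name, col in table.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept


def outlier_rows(column, z_max=3.0):
    """Indices of values lying more than z_max standard deviations from the mean."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(column) if abs(v - mu) / sigma > z_max]


def rote_recipe(table):
    """Apply the three steps mechanically: impute, de-correlate, drop outlier rows."""
    filled = {name: impute_mean(col) for name, col in table.items()}
    reduced = drop_correlated(filled)
    bad = {i for col in reduced.values() for i in outlier_rows(col)}
    return {
        name: [v for i, v in enumerate(col) if i not in bad]
        for name, col in reduced.items()
    }
```

Calling `rote_recipe` on a dictionary of columns returns a cleaned dictionary, and nothing in it ever needs to know what the columns mean. That is exactly the trap described above.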

I no longer questioned the consequences of anything I was learning. For a given problem, I had to test every element of my toolbox, and if I unfortunately ran out of leads, I would look for another library that would let me apply, without much conviction, some supplementary technique that might work. As soon as I had, I restarted my processing chain and watched for any change in my confusion matrix and my other metrics. After a while, my work settled into a routine. A pattern that repeated itself in the way I dealt with each problem became ever more palpable. I was systematically ignoring everything that was business-related. I failed to react to business issues. I was the crude example of a soldier going to war with heavy artillery, without any idea of who his enemy was or where he should be headed. On another level, I had grown used to the adrenaline rush I felt with each discovery, and it no longer gave me any gratification. I was already questioning my career and my aspirations as an unyielding dilemma arose before my eyes: should I pick up the pace to be as up to date as possible with current events, or should I slow down and reflect on where I stand in my profession?

The question was hard, especially since I kept observing and absorbing all the incessant flow of information because it was also a question of being up to date.

As critical as my eyes were, I couldn't tell what was necessary from what was not. There was always a blur in my vision. I thought I was efficient (I beg the pardon of the heavens and the real data scientists). I was certainly learning, but there was always a huge gap between what I was doing and the expectations of the professional in front of me, which got bigger as the project progressed and the requests were added and the information piled up. Without hiding it, I was lost. Fortunately, in my company, some very clear-sighted people had been there to reorient me every time I deviated from the right path. Nevertheless, I knew that something was wrong and that some things had to be surgically reviewed. This was the first prelude to the red pill.

3. Third year: A slight detachment from work, far more importance given to personal life. A temporary absence of ambition.

I am not talking about burnout. The detachment I experienced was the result of several self-confrontations and was a softened form of giving up. For two years, I had kept my student mindset and perpetuated my student thinking and pace. I hoped to get "grades" as a reward. I hadn't realized that I was a professional and that, if I hoped to collect any form of gratification, it would come from a client who paid for my service even though he would be unhappy most of the time. I understood then that, when I had to deliver a result, there were no clear metrics to be found in the literature. My previously repetitive approach was beginning to hit its limits. It would certainly have worked for developers, for whom the evaluation of the solution is known in advance. When developing a solution, it is advisable not to rethink a problem already posed and solved by other specialists. Reusability is an article of faith in the world of software engineering, even though it has been the subject of much debate. I am not suggesting that the scientific and engineering approaches to data are similar. In data science, however, reusability is a double-edged sword. Applied properly, its consequences can only be highly positive. I was no longer able to cope with the complexity of the world. I told myself that, until further notice and pending a black swan that would tip the market, I had to step back. The motivation that had been fueled by refreshing novelty and the conquest of technologies took a big hit. Please believe me, I still loved my job. I still enjoyed hearing others talk about descriptive statistics, modules to be developed, GPUs and ML/DL. However, I couldn't find my place in all this anymore. I stopped working on weekends, which I had previously seen as necessary in order to progress. I stopped filling my YouTube account with playlists full of tutorials, courses and demonstrations. I was enjoying my free time with my family again.
Instead, I adopted a passive posture. I told myself that the future would take me where I deserved to be, that my potential would be recognized by the universe one of these days, but not right now. My obsession was already taking shape in other aspects of life: spirituality, beauty and harmony. I read a lot. My objectives were placed elsewhere than my work. I was planning to visit hidden Parisian treasures that provided an unparalleled serenity. I read everything that had to do with reflection, disruption, innovation, fiction, but nothing about personal development (that would have bored me to death, sorry). I was learning things that I immediately tried to transpose to my reality. Undoubtedly, I felt some change. This was the second prelude to the red pill.

4. Fourth year: Deep reflection on the issue of moderation, and on ambition. More serious thoughts about specialization and what should be important in my career.

The detachment did not last long. As all romantic experiences tend to bring out the past, I could see that the year of calm and tranquility was not in sync with my initial desire. The career I had decided to embrace required sacrifice. The efforts I had put in the previous year were not enough. To survive in the data field, you have to align yourself with the complexity and demands of the era, and this was no easy task for the beginner that I was. The shock came brutally during one of my contracts: the machine learning models with which I thought I could create marvelous pieces of software were of no use to me. I was expected to come up with things that would profoundly amaze the client, but this time I could not. I was faced with a customer who knew a thing or two about the subject, had a technical profile and had fairly considerable experience in the field of AI. He was able to get a head start on our performance with simple rules that reflected his deep understanding of his business. The project was an IoT intrusion detection problem. The consolidated signal recordings we were given certainly contained useful information for solving it. Where I struggled to read the signals, he did so with a disconcerting ease that allowed him to look beyond the common vision of most data professionals. Of course, the rules helped us a lot and were adopted as decision variables, without which the deliverable would never have reached the expected quality.

To my mind, this experience was the missing piece of the puzzle, the missing link in a whole chain of intertwined confusions. I realized that all the hard work I was doing, all the energy I was putting in, was going the wrong way. In fact, I was working in reverse. I was using the tools I had at my disposal to see how the problem could be positioned. If I got stuck, I assumed it was because the problem had been badly framed and scoped, so the formulation process had to be repeated, even if it meant doing it several times. What I had not realized was that, as a starting point, I had to look at the problem, the whole problem and nothing but the problem; until then, this had been as clear as mud to me. If there are rules that solve it at minimum cost and are easy to put into production, it's all good and, as I would say in French, "c'est tout bénéf" (it's all upside). If instead of rules we have examples of collected data to rely on to automate, classify, estimate or even optimize, and if the rules for doing so cannot reasonably be extracted by hand, then by all means I will use Machine Learning as a last resort. If there is no sound, text or images, I will probably never think about Deep Learning. Unless it's a perception problem (vision or hearing), Deep Learning does not enter my considerations.
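The order of preference described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a reconstruction of the actual project: the names `rule_accuracy` and `choose_approach`, the 0.9 target, the `packets_per_second` field and the toy records are all invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Rule = Callable[[Record], bool]


def rule_accuracy(rule: Rule, labeled: List[Tuple[Record, bool]]) -> float:
    """Share of labeled examples on which the hand-written rule is right."""
    hits = sum(rule(record) == label for record, label in labeled)
    return hits / len(labeled)


def choose_approach(
    rule: Rule,
    labeled: List[Tuple[Record, bool]],
    target: float = 0.9,          # hypothetical acceptance threshold
    perceptual_data: bool = False,  # True for sound, text or images
) -> str:
    """Start from the business rule; escalate only if it falls short."""
    if rule_accuracy(rule, labeled) >= target:
        return "rules"
    # Rules are not enough: learn from examples, and reserve deep
    # learning for perception-style inputs.
    return "deep learning" if perceptual_data else "machine learning"


# Toy, invented records: (features, is_intrusion)
examples = [
    ({"packets_per_second": 1500.0}, True),
    ({"packets_per_second": 40.0}, False),
    ({"packets_per_second": 2200.0}, True),
    ({"packets_per_second": 90.0}, False),
]


def expert_rule(record: Record) -> bool:
    """A rule of the kind a domain expert might write (threshold is made up)."""
    return record["packets_per_second"] > 1000.0


print(choose_approach(expert_rule, examples))
```

The design choice is the point: the model is the fallback, and the expert's rule is both the first candidate solution and the baseline any model has to beat.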

The conclusions I reached had given me a huge boost of confidence, enthusiasm, and ambition. They allowed me to start a new learning cycle just as intense as the beginning, except my vision was now clearer. I was now able to optimize my thinking and resources. I also felt I could see the essence of a data science project. That was the moment when I decided to take the red pill.

5. Fifth year: Clarity of vision and thinking. A deep desire to pass on emotional and professional experience to more junior people and show them the essentials for a good start.

It took me a while. Adding value to customers had become an acceptable metric to measure my performance. There were already enough challenges to deal with. The value delivered was completely independent of the technology used. I no longer saw any harm in reusing an existing solution. I was no longer in the mindset of trying to re-implement solution bricks to satisfy my ego (thank God for open source and thank God for giving life to Richard Stallman).

I also don't see any harm in working on weekends if it fits in with my career plan. My personal readings and experiments have become more in tune with my original ambitions. I now think in terms of use cases, and this has had a strong impact on the type of resources I consult. My interests have shifted towards project feedback that answers questions such as: What problem was addressed? In which business context? What information do we have at our disposal? What were the constraints set by the client? Was he willing to relax some of them and open the door to compromise? Only once these questions have been identified and asked in clear and precise terms does my curiosity move on to the choice of technologies used.

The part of my thinking that involved pure technicality and invaded the whole of my interests has been revised downwards. It is now no more than 40% of my mental effort.

I have begun to observe in other people the same pattern I once observed in myself. I intervene whenever it is possible for me to do so. I feel a responsibility to share my experience with people who need it. I think it is also the duty of every person who has experienced such a metamorphosis.

My discipline is subject to many career changes as evidenced by the market. I look at roadmaps concocted by professionals in the field that recommend lists of must-have skills that every newcomer must master. Data science would then look like amateur coding.

Please, the purpose of data science is to solve problems with data. The rest are gadgets that will serve us to do so, period.

Closing thoughts:

The data market is still in turmoil and this can rub off on individuals. Confusion is rampant. The world is very complex. Today's AI topics are full of buzzwords of all kinds and colors. Maturity is a major asset for surviving in a world like today's. If I had some advice to give to people who have just started in the field, I would suggest that they find more seasoned colleagues to accompany them. I would suggest that they respect them, trust their guidance and, above all, not try to impress them with exuberant knowledge and skills: it will backfire right away. Having a mentor by your side helps you control yourself and allows you to gain a double experience. This was the case for me, and I got much more out of it than I could have imagined on my own, especially in a deceptive realm like ours.
