
1. What is the AdaBoost Algorithm?
AdaBoost also called Adaptive Boosting is a technique in Machine Learning used as an Ensemble Method. The most common algorithm used with AdaBoost is decision trees with one level that means with Decision trees with only 1 split. These trees are also called Decision Stumps. What this algorithm does is that it builds a model and gives equal weights to all the data points. It then assigns higher weights to points that are wrongly classified. Now all the points which have higher weights are given more importance in the next model. It will keep training models until and unless a lower error is received.
2. What is the Sliding Window method for Time Series Forecasting?
Time series can be phrased as supervised learning. Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem.
In the sliding window method, the previous time steps can be used as input variables, and the next time steps can be used as the output variable.
In statistics and time series analysis, this is called a lag or lag method. The number of previous time steps is called the window width or size of the lag. This sliding window is the basis for how we can turn any time series dataset into a supervised learning problem.
3. What do you understand by sub-queries in SQL?
A subquery is a query inside another query where a query is defined to retrieve data or information back from the database. In a subquery, the outer query is called as the main query whereas the inner query is called subquery. Subqueries are always executed first and the result of the subquery is passed on to the main query. It can be nested inside a SELECT, UPDATE or any other query. A subquery can also use any comparison operators such as >,< or =.
4. Explain the Difference Between Tableau Worksheet, Dashboard, Story, and Workbook?
Tableau uses a workbook and sheet file structure, much like Microsoft Excel.
A workbook contains sheets, which can be a worksheet, dashboard, or a story.
A worksheet contains a single view along with shelves, legends, and the Data pane.
A dashboard is a collection of views from multiple worksheets.
A story contains a sequence of worksheets or dashboards that work together to convey information.
5. How is a Random Forest related to Decision Trees?
Random forest is an ensemble learning method that works by constructing a multitude of decision trees. A random forest can be constructed for both classification and regression tasks.
Random forest outperforms decision trees, and it also does not have the habit of overfitting the data as decision trees do.
A decision tree trained on a specific dataset will become very deep and cause overfitting. To create a random forest, decision trees can be trained on different subsets of the training dataset, and then the different decision trees can be averaged with the goal of decreasing the variance.
6. What are some disadvantages of using Naive Bayes Algorithm?
Some disadvantages of using Naive Bayes Algorithm are:
It relies on a very big assumption that the independent variables are not related to each other.
It is generally not suitable for datasets with large numbers of numerical attributes.
It has been observed that if a rare case is not in the training dataset but is in the testing dataset, then it will most definitely be wrong.
Prompt Engineering in itself does not warrant a separate job.
Most of the things you see online related to prompts (especially things said by people selling courses) is mostly just writing some crazy text to get ChatGPT to do some specific task. Most of these prompts are just been found by serendipity and are never used in any company. They may be fine for personal usage but no company is going to pay a person to try out prompts ๐ . Also a lot of these prompts don't work for any other LLMs apart from ChatGPT.
You have mostly two types of jobs in this field nowadays, one is more focused on training, optimizing and deploying models. For this knowing the architecture of LLMs is critical and a strong background in PyTorch, Jax and HuggingFace is required. Other engineering skills like System Design and building APIs is also important for some jobs. This is the work you would find in companies like OpenAI, Anthropic, Cohere etc.
The other is jobs where you build applications using LLMs (this comprises of majority of the companies that do LLM related work nowadays, both product based and service based). Roles in these companies are called Applied NLP Engineer or ML Engineer, sometimes even Data Scientist roles. For this you mostly need to understand how LLMs can be used for different applications as well as know the necessary frameworks for building LLM applications (Langchain/LlamaIndex/Haystack). Apart from this, you need to know LLM specific techniques for applications like Vector Search, RAG, Structured Text Generation. This is also where some part of your role involves prompt engineering. Its not the most crucial bit, but it is important in some cases, especially when you are limited in the other techniques.
Learn Data Science in 2024
๐ญ. ๐๐ฝ๐ฝ๐น๐ ๐ฃ๐ฎ๐ฟ๐ฒ๐๐ผ'๐ ๐๐ฎ๐ ๐๐ผ ๐๐ฒ๐ฎ๐ฟ๐ป ๐๐๐๐ ๐๐ป๐ผ๐๐ด๐ต ๐
Pareto's Law states that "that 80% of consequences come from 20% of the causes".
This law should serve as a guiding framework for the volume of content you need to know to be proficient in data science.
Often rookies make the mistake of overspending their time learning algorithms that are rarely applied in production. Learning about advanced algorithms such as XLNet, Bayesian SVD++, and BiLSTMs, are cool to learn.
But, in reality, you will rarely apply such algorithms in production (unless your job demands research and application of state-of-the-art algos).
For most ML applications in production - especially in the MVP phase, simple algos like logistic regression, K-Means, random forest, and XGBoost provide the biggest bang for the buck because of their simplicity in training, interpretation and productionization.
So, invest more time learning topics that provide immediate value now, not a year later.
๐ฎ. ๐๐ถ๐ป๐ฑ ๐ฎ ๐ ๐ฒ๐ป๐๐ผ๐ฟ โก๏ธ
Thereโs a Japanese proverb that says โBetter than a thousand days of diligent study is one day with a great teacher.โ This proverb directly applies to learning data science quickly.
Mentors can teach you about how to build a model in production and how to manage stakeholders - stuff that you donโt often read about in courses and books.
So, find a mentor who can teach you practical knowledge in data science.
๐ฏ. ๐๐ฒ๐น๐ถ๐ฏ๐ฒ๐ฟ๐ฎ๐๐ฒ ๐ฃ๐ฟ๐ฎ๐ฐ๐๐ถ๐ฐ๐ฒ โ๏ธ
If you are serious about growing your excelling in data science, you have to put in the time to nurture your knowledge. This means that you need to spend less time watching mindless videos on TikTok and spend more time reading books and watching video lectures.




Comments
There are no comments for this story
Be the first to respond and start the conversation.