Machine Learning

AdaBoost Algorithm

By Bahati MulishiPublished about a year ago • 5 min read

1. What is the AdaBoost Algorithm?

AdaBoost also called Adaptive Boosting is a technique in Machine Learning used as an Ensemble Method. The most common algorithm used with AdaBoost is decision trees with one level that means with Decision trees with only 1 split. These trees are also called Decision Stumps. What this algorithm does is that it builds a model and gives equal weights to all the data points. It then assigns higher weights to points that are wrongly classified. Now all the points which have higher weights are given more importance in the next model. It will keep training models until and unless a lower error is received.

2. What is the Sliding Window method for Time Series Forecasting?

Time series can be phrased as supervised learning. Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem.

In the sliding window method, the previous time steps can be used as input variables, and the next time steps can be used as the output variable.

In statistics and time series analysis, this is called a lag or lag method. The number of previous time steps is called the window width or size of the lag. This sliding window is the basis for how we can turn any time series dataset into a supervised learning problem.

3. What do you understand by sub-queries in SQL?

A subquery is a query inside another query where a query is defined to retrieve data or information back from the database. In a subquery, the outer query is called as the main query whereas the inner query is called subquery. Subqueries are always executed first and the result of the subquery is passed on to the main query. It can be nested inside a SELECT, UPDATE or any other query. A subquery can also use any comparison operators such as >,< or =.

4. Explain the Difference Between Tableau Worksheet, Dashboard, Story, and Workbook?

Tableau uses a workbook and sheet file structure, much like Microsoft Excel.

A workbook contains sheets, which can be a worksheet, dashboard, or a story.

A worksheet contains a single view along with shelves, legends, and the Data pane.

A dashboard is a collection of views from multiple worksheets.

A story contains a sequence of worksheets or dashboards that work together to convey information.

5. How is a Random Forest related to Decision Trees?

Random forest is an ensemble learning method that works by constructing a multitude of decision trees. A random forest can be constructed for both classification and regression tasks.

Random forest outperforms decision trees, and it also does not have the habit of overfitting the data as decision trees do.

A decision tree trained on a specific dataset will become very deep and cause overfitting. To create a random forest, decision trees can be trained on different subsets of the training dataset, and then the different decision trees can be averaged with the goal of decreasing the variance.

6. What are some disadvantages of using Naive Bayes Algorithm?

Some disadvantages of using Naive Bayes Algorithm are:

It relies on a very big assumption that the independent variables are not related to each other.

It is generally not suitable for datasets with large numbers of numerical attributes.

It has been observed that if a rare case is not in the training dataset but is in the testing dataset, then it will most definitely be wrong.

Prompt Engineering in itself does not warrant a separate job.

Most of the things you see online related to prompts (especially things said by people selling courses) is mostly just writing some crazy text to get ChatGPT to do some specific task. Most of these prompts are just been found by serendipity and are never used in any company. They may be fine for personal usage but no company is going to pay a person to try out prompts 😅. Also a lot of these prompts don't work for any other LLMs apart from ChatGPT.

You have mostly two types of jobs in this field nowadays, one is more focused on training, optimizing and deploying models. For this knowing the architecture of LLMs is critical and a strong background in PyTorch, Jax and HuggingFace is required. Other engineering skills like System Design and building APIs is also important for some jobs. This is the work you would find in companies like OpenAI, Anthropic, Cohere etc.

The other is jobs where you build applications using LLMs (this comprises of majority of the companies that do LLM related work nowadays, both product based and service based). Roles in these companies are called Applied NLP Engineer or ML Engineer, sometimes even Data Scientist roles. For this you mostly need to understand how LLMs can be used for different applications as well as know the necessary frameworks for building LLM applications (Langchain/LlamaIndex/Haystack). Apart from this, you need to know LLM specific techniques for applications like Vector Search, RAG, Structured Text Generation. This is also where some part of your role involves prompt engineering. Its not the most crucial bit, but it is important in some cases, especially when you are limited in the other techniques.

Learn Data Science in 2024

𝟭. 𝗔𝗽𝗽𝗹𝘆 𝗣𝗮𝗿𝗲𝘁𝗼'𝘀 𝗟𝗮𝘄 𝘁𝗼 𝗟𝗲𝗮𝗿𝗻 𝗝𝘂𝘀𝘁 𝗘𝗻𝗼𝘂𝗴𝗵 📚

Pareto's Law states that "that 80% of consequences come from 20% of the causes".

This law should serve as a guiding framework for the volume of content you need to know to be proficient in data science.

Often rookies make the mistake of overspending their time learning algorithms that are rarely applied in production. Learning about advanced algorithms such as XLNet, Bayesian SVD++, and BiLSTMs, are cool to learn.

But, in reality, you will rarely apply such algorithms in production (unless your job demands research and application of state-of-the-art algos).

For most ML applications in production - especially in the MVP phase, simple algos like logistic regression, K-Means, random forest, and XGBoost provide the biggest bang for the buck because of their simplicity in training, interpretation and productionization.

So, invest more time learning topics that provide immediate value now, not a year later.

𝟮. 𝗙𝗶𝗻𝗱 𝗮 𝗠𝗲𝗻𝘁𝗼𝗿 ⚡️

There’s a Japanese proverb that says “Better than a thousand days of diligent study is one day with a great teacher.” This proverb directly applies to learning data science quickly.

Mentors can teach you about how to build a model in production and how to manage stakeholders - stuff that you don’t often read about in courses and books.

So, find a mentor who can teach you practical knowledge in data science.

𝟯. 𝗗𝗲𝗹𝗶𝗯𝗲𝗿𝗮𝘁𝗲 𝗣𝗿𝗮𝗰𝘁𝗶𝗰𝗲 ✍️

If you are serious about growing your excelling in data science, you have to put in the time to nurture your knowledge. This means that you need to spend less time watching mindless videos on TikTok and spend more time reading books and watching video lectures.

how to

About the Creator

Bahati Mulishi

Practical advice on remote work, IT careers, and professional skills to help you stay work-ready anywhere in the world.

Reader insights

Be the first to share your insights about this piece.

How does it work?

Add your insights

Comments

There are no comments for this story

Be the first to respond and start the conversation.

Keep reading

More stories from Bahati Mulishi and writers in Geeks and other communities.

Machine Learning

AdaBoost Algorithm

About the Creator

Bahati Mulishi

Reader insights

Be the first to share your insights about this piece.

Comments

Keep reading

The Data Science skill no one talks about

A Knight of the Seven Kingdoms Series Review (Season 1)

Spongebob Review

Tatt~Master