
History Of Toloka

History Of Companies - 97

By TheNaeth · Published 11 months ago · 3 min read

Toloka is a company that provides crowdsourcing and generative artificial intelligence (generative AI) services. Its administrative headquarters is in Amsterdam.

The company supports AI development from model training through evaluation, and offers services for large language models and generative AI.

Toloka was founded in 2014 by Olga Megorskaya, who at the time served on the board of directors of Yandex International. The platform was designed to make crowdsourcing and microtasking accessible; it was built for data markup, with the primary goal of improving search engines and machine learning algorithms.

With the rise of generative AI, the platform evolved to meet the needs of developers building applications on AI models of various generations.

These changes included the provision of professional data labeling. In 2024, the company's Russian operations were sold to Russian investors.

Toloka offers a diverse range of services in generative AI, one of its main areas of operation.

These include model fine-tuning, reinforcement learning from human feedback, evaluation, and ad hoc datasets. Delivering these services requires a large pool of highly skilled annotators.

On Toloka, trainers are tasked with determining whether the material they review contains specified items, following requirements provided by algorithms.

They also assess how relevant and engaging a chatbot's replies are to the conversations presented. Another task, translation verification, involves evaluating the accuracy of translations supplied by many individual annotators.
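Crowdsourced judgments like these are typically collected from several annotators per item and then aggregated. As a minimal illustrative sketch (not Toloka's actual pipeline; the labels and item names here are invented), a simple majority-vote aggregation in Python might look like this:

```python
from collections import Counter

def majority_vote(annotations):
    """Return the most common label among one item's annotator labels."""
    counts = Counter(annotations)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical data: each item was labeled by three annotators.
items = {
    "item_1": ["relevant", "relevant", "irrelevant"],
    "item_2": ["correct", "correct", "correct"],
}

# Keep the majority label for each item.
aggregated = {item: majority_vote(labels) for item, labels in items.items()}
```

Real platforms use more sophisticated aggregation that weights annotators by estimated skill, but majority voting captures the basic idea of turning many noisy judgments into one label.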

Fine-tuning large language models (LLMs) requires experts to create context-based prompts. These prompts can be single-turn or multi-turn, span a wide variety of domains and purposes, and come from many different sources.

In natural language processing (NLP), Toloka streamlines optical character recognition and classification, sentiment analysis, named-entity recognition, and search-relevance evaluation. It also offers audio transcription and audio classification.

In May 2019, Toloka's research team began releasing datasets for academic and non-commercial use.

The goal was to support the scientific community and attract researchers to Toloka. The datasets are intended for researchers in a broad range of fields, including, but not limited to, linguistics, computer vision, testing of result-aggregation methods, and chatbot training.

Toloka's research has been presented at conferences including the Conference on Neural Information Processing Systems, the International Conference on Machine Learning, and the International Conference on Very Large Data Bases.

At the AAAI Conference on Artificial Intelligence in February 2024, Toloka presented a tutorial on aligning large language models with native languages using limited resources.

Toloka also participated in BigCode, a collaborative scientific project led by HuggingFace and ServiceNow, serving as the project's key data partner during that period.


About the Creator

TheNaeth

Sometimes Poet, Broker and Crypto Degen

Horror Storyteller

Please Follow Our Channel



    © 2026 Creatd, Inc. All Rights Reserved.