Artificial Intelligence: A Broad Perspective
AI: Imagination Unlimited

Good afternoon, ladies and gentlemen. Today we are going to talk about blending reality and imagination. But before we dive in, let me take you back to 2001. As a fifteen-year-old in India, I developed a deep fascination with computer graphics and visual effects. Naturally, at that age, my efforts amounted to somewhat amateurish videos like the one shown here. Still, this marked the beginning of a theme that has run through my life ever since: the quest to blend reality and imagination. That quest has stayed with me across a ten-year career in technology, through roles as a product manager at companies such as Google and as a content creator on platforms like YouTube and TikTok. Today, let us unpack it together and see how artificial intelligence is taking it to new heights.
Let us begin with the reality side. You are probably familiar with photogrammetry: the art and science of measuring the physical world using photographs and other sensors. A process that required data centers and teams of experts in the 2000s had become accessible to consumers by the 2010s. Then machine learning arrived and transformed the field with techniques such as neural radiance fields, or NeRFs. The demonstration here shows an AI model building a full volumetric 3D representation from nothing but 2D images. Unlike earlier reality-capture methods, NeRFs excel at preserving the complexity and subtlety of the real world; they capture its essence. And within a year, you could do all of this on a device as ordinary as an iPhone, using apps like Luma. It is like taking three-dimensional snapshots of the physical world: capture an object or a place once, and you can remix it endlessly in future creative work. That is the reality side.

As NeRFs gained prominence last year, generative AI surged as well, with Midjourney, DALL-E 2, and Stable Diffusion all arriving on the market at roughly the same time. But it was the technique of inpainting that truly captivated me. Inpainting lets you augment existing imagery with whatever you can describe, with photorealistic, astonishing results. It floored me because tasks that would have taken hours in traditional workflows could now be finished in minutes. Still, I wanted more.
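At the heart of the NeRF approach mentioned above is a simple volume-rendering rule: each pixel's color is obtained by marching a ray through the learned 3D volume and alpha-compositing the density and color sampled along it. As a rough illustration only (not the full trained model, which also involves a neural network mapping positions to densities and colors), here is a minimal numpy sketch of that compositing step:

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Alpha-composite density/color samples along one camera ray,
    following the volume-rendering equation NeRF-style methods use.

    sigmas: (N,) volume densities at each sample point
    colors: (N, 3) RGB emitted at each sample point
    deltas: (N,) distances between adjacent samples
    """
    # Opacity of each segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: how much light survives to reach sample i unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    # Final pixel color is the transmittance-weighted sum of sample colors
    return (weights[:, None] * colors).sum(axis=0)

# A ray through empty space, then a dense red region: the pixel comes out red.
sigmas = np.array([0.0, 0.0, 50.0, 50.0])
colors = np.array([[0, 0, 0], [0, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
deltas = np.full(4, 0.1)
print(composite_ray(sigmas, colors, deltas))
```

Training a NeRF simply optimizes the densities and colors so that rays rendered this way reproduce the captured 2D photos, which is why a handful of ordinary pictures is enough to recover a volumetric scene.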
Enter ControlNet, a technique developed by researchers at Stanford University that lets you steer AI image generation with a range of input conditions. In my case, I can extract depth and texture information from my 3D scans and use it to re-skin reality. This is more than a visually appealing trick; there are plenty of practical applications. For instance, I took a 3D scan of my parents' drawing room and restyled it into various kinds of Indian decor while preserving the spatial context and layout of the interior. It is easy to see how this could reshape architecture and interior design. You could take a 2016 scan of a Buddha statue and turn it into a magnificent golden statue, executing camera moves that would otherwise be impossible. Vacation footage from a trip to Tokyo could be reimagined, bringing cherry blossoms to life in a new way. Natural landscapes can be heightened too, such as a waterfall that looks like it belongs on another planet. Nor is this limited to static scenes; the same approach applies to video. The prospect of this running at 30 frames per second is exciting, because it could transform augmented reality and 3D rendering. It is not far-fetched to imagine a future where we switch between realities superimposed on the real world.
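The "extract depth and re-skin" step above hinges on one piece of preprocessing: turning a raw depth buffer from a 3D scan into the 8-bit grayscale conditioning image that depth-conditioned diffusion models expect, with nearer surfaces brighter and farther surfaces darker. The exact pipeline used in the talk is not specified, so this is just a minimal sketch of that normalization step, assuming a per-pixel depth array exported from the scan:

```python
import numpy as np

def depth_to_conditioning(depth, near=None, far=None):
    """Convert a raw depth buffer (e.g. exported from a 3D scan) into an
    8-bit grayscale conditioning image: near = white, far = black.

    depth: (H, W) array of distances from the camera.
    near/far: clipping range; defaults to the buffer's own min/max.
    """
    near = depth.min() if near is None else near
    far = depth.max() if far is None else far
    # Normalize to [0, 1], clip anything outside the range, then invert
    # so that closer surfaces map to brighter pixel values.
    norm = np.clip((depth - near) / (far - near), 0.0, 1.0)
    return ((1.0 - norm) * 255).astype(np.uint8)

# Toy 2x2 depth buffer: the closest pixel maps to 255, the farthest to 0.
depth = np.array([[1.0, 2.0], [3.0, 4.0]])
print(depth_to_conditioning(depth))
```

An image like this is then fed, together with a text prompt, to a depth-conditioned ControlNet so the generated imagery respects the scanned geometry, which is why the restyled drawing room keeps its original layout.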
And just as reality capture has been democratized, last year's tools are becoming even easier to use. Instead of spending hours stitching tools together, platforms such as Runway and Kaiber deliver the same results in a few clicks. Want to go from day to night? Not a problem. Want to recreate the retro 90s aesthetic of the television show "Full House"? You can do that too. The advances extend beyond reality capture. Companies like Wonder Dynamics are turning ordinary video into seamless performance capture, letting anyone embody fantastical creatures with nothing more than a smartphone. This is something James Cameron could only dream of in the 2000s. And now it is possible with an iPhone? That still astonishes me. Looking back on two decades and the patchwork of tools I had to learn, I feel optimistic about what lies ahead for the next generation of creators. Today's fifteen-year-olds do not have to wrestle with that complexity. All they need is a creative vision and the ability to collaborate with these AI models, which are a true distillation of human knowledge and creativity. That is the future that excites me: one where reality and imagination blend seamlessly, with a reliable AI co-pilot at your side.

