Sumedh Datar FBCS is a senior software engineer with expertise in computer vision. He explores how an absence of data need not prevent AI from correctly identifying new objects.

In computer vision, data is often hailed as the lifeblood that fuels the development and performance of models. Computer vision models rely on abundant high-quality data, from recognising faces to detecting objects in real-time. However, not all scenarios are blessed with an abundance of labelled data — or data in general. Data scarcity poses a significant challenge in many real-world situations, limiting the potential of computer vision applications.

Why data scarcity is a challenge

The availability of large, labelled datasets ushered in the golden era of deep learning in computer vision. Models, especially deep neural networks, thrive on vast amounts of data, using it to fine-tune their parameters and achieve remarkable accuracy. But what happens when such data isn't available?

First, without sufficient data, models tend to overfit. They might perform exceptionally well on the training data but fail to generalise to new, unseen data. This lack of generalisation is critical, especially in applications where reliability is very important.

Furthermore, collecting and labelling data is neither cheap nor swift. Acquiring even a few hundred labelled examples can be prohibitively expensive or time-consuming in many domains. This bottleneck severely restricts deploying advanced computer vision solutions in such areas.

Constructing datasets in data-scarce environments

With data frequently lacking, computer vision modellers have devised innovative methods for data acquisition, such as:

1. Leveraging pre-trained models

Given their training on extensive standard datasets, pre-trained open-source models excel at identifying common objects. Many custom object detection or classification tasks share similarities with these standard objects. As an initial step, integrating a pre-trained model into the solution can provide a foundation, even if the initial predictions are suboptimal. These preliminary results can serve as a mechanism to gather pertinent data.
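As a rough illustration, the sketch below uses an off-the-shelf ImageNet-pretrained classifier from torchvision to propose labels for incoming images, routing low-confidence cases to human review. The choice of ResNet-50 and the confidence threshold are illustrative assumptions rather than anything prescribed above.

```python
# A minimal sketch of bootstrapping data collection with a pre-trained model.
# The model (ResNet-50 on ImageNet) and the 0.6 threshold are illustrative assumptions.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()  # the resize/crop/normalise pipeline the weights expect

def propose_label(image_path, threshold=0.6):
    """Return a (label, confidence) proposal, or None if the model is unsure."""
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    confidence, idx = probs.max(dim=0)
    if confidence.item() < threshold:
        return None  # route to manual review rather than trusting a weak prediction
    return weights.meta["categories"][idx.item()], confidence.item()
```

Even suboptimal proposals like these can speed up curation: reviewers confirm or correct suggested labels instead of annotating from a blank slate.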

2. Data augmentation

Obtaining adequate data can be challenging in specific scenarios, even with pre-trained models and dedicated platforms. For instance, in healthcare settings, most patients might present with typical results, making abnormal data samples sparse. This scarcity poses a challenge when training models to recognise rare or atypical patterns. Data augmentation offers a solution to this dilemma. By programmatically applying techniques such as rotation, scaling and brightness adjustments, the volume of abnormal or varied data can be artificially increased, enhancing the diversity and richness of the dataset.
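As a simple sketch of these transformations, the pipeline below uses torchvision; the specific ranges are assumptions and, particularly in settings like medical imaging, should be chosen so that augmented samples remain plausible.

```python
# A minimal augmentation pipeline covering the techniques mentioned above:
# rotation, scaling and brightness adjustments. Ranges are illustrative assumptions.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                      # small random rotations
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),   # mild rescaling/cropping
    transforms.ColorJitter(brightness=0.2),                     # brightness shifts
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Example usage with a folder-per-class dataset:
# from torchvision.datasets import ImageFolder
# train_data = ImageFolder("data/train", transform=augment)
```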

3. Synthetic data generation

In the contemporary AI landscape, generative AI (GenAI) and generative adversarial networks (GANs) have emerged as frontrunners in synthesising realistic images. These tools are invaluable when conventional data collection methods fail to capture edge cases. By generating such rare yet crucial data points, they equip models to handle uncommon situations. As a strategy in data-scarce environments, leveraging GenAI and GANs fills the data void and ensures that models are robustly trained to tackle typical and atypical challenges.
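As a highly simplified sketch, the snippet below shows how synthetic images would be drawn from a GAN-style generator. The architecture, latent size and output resolution are all assumptions for illustration; in practice the generator would first be trained (or a pre-trained generative model reused) on the target domain before its samples were added to the dataset.

```python
# An illustrative sketch of sampling synthetic images from a GAN-style generator.
# The architecture and latent dimension are placeholder assumptions.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),  # 3-channel 16x16 output
        )

    def forward(self, z):
        return self.net(z)

generator = Generator()                 # in practice: load trained weights here
z = torch.randn(8, 100, 1, 1)           # a batch of random latent vectors
synthetic_images = generator(z)         # shape: (8, 3, 16, 16)
```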

4. Manually labelling data

Manual data labelling involves assigning a human the responsibility of verifying the model's output and meticulously curating each piece of data from an unindexed collection according to specific guidelines. This careful curation is instrumental in forming reliable datasets. Several companies and platforms that specialise in this area, such as Labelbox, Sama AI, Superb AI, Scale AI and Amazon Mechanical Turk (MTurk), help create valuable datasets that can be used to train models to perform tasks.

Training computer vision models for data-scarce environments

1. Transfer learning

Transfer learning is a cornerstone in machine learning, especially when data is sparse. It capitalises on models like VGG, ResNet and Inception that are trained on vast datasets like ImageNet, capturing a spectrum of features from basic textures to intricate patterns. Instead of starting from scratch, transfer learning repurposes these pre-trained models, leveraging their foundational features. The real magic unfolds during fine-tuning: the last layers of the model, tailored for the specific task, are adjusted using the target dataset. This approach not only accelerates development but also economises computational resources. By building on the robust feature representations of pre-trained models and adapting them to specialised tasks, transfer learning adeptly navigates the challenges of limited data, ensuring impressive model performance.
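As a brief sketch of this fine-tuning step, the snippet below freezes an ImageNet-pretrained ResNet backbone and retrains only a new classification head; the model choice, number of classes and learning rate are placeholder assumptions.

```python
# A minimal transfer learning sketch: freeze the pre-trained backbone,
# replace and train only the final layer. Hyperparameters are placeholders.
import torch.nn as nn
import torch.optim as optim
from torchvision import models

num_classes = 5  # e.g. five custom categories in the target task

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the backbone so its learned feature representations are kept intact.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for the new task.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head's parameters are optimised.
optimiser = optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Standard training loop over the (small) labelled target dataset:
# for images, labels in dataloader:
#     optimiser.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimiser.step()
```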

2. Zero-shot learning

Zero-shot learning is an advanced machine learning technique designed to recognise and classify objects or entities without seeing examples of those specific classes during training.

Instead of relying on direct experience, zero-shot learning leverages semantic relationships. Typically, it uses semantic attribute spaces, where each seen and unseen class is associated with a set of attributes or descriptors. For instance, even if a model hasn't been trained on a specific bird species, knowing attributes like ‘has wings’ or ‘can fly’ allows it to make educated guesses about new bird species based on these descriptors.

By understanding the relationships between known classes and their attributes, the model can infer the characteristics of unseen classes, allowing it to make predictions in scenarios where direct training data is absent.
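The toy example below sketches this attribute-space idea with made-up attributes and class descriptors; in a real system the attribute scores would come from a trained predictor (or an image-text encoder such as CLIP would be used instead), so treat the numbers purely as illustration.

```python
# A toy sketch of attribute-based zero-shot classification.
# Attributes, class descriptors and the predicted scores are all made up.
import numpy as np

attributes = ["has wings", "can fly", "has feathers", "lives in water"]

# Attribute descriptors for classes, including one never seen during training.
class_attributes = {
    "sparrow":  np.array([1, 1, 1, 0]),
    "penguin":  np.array([1, 0, 1, 1]),   # unseen class, known only by its attributes
    "goldfish": np.array([0, 0, 0, 1]),
}

def zero_shot_classify(predicted_attributes):
    """Match predicted attribute scores to the most similar class description."""
    scores = {
        name: float(np.dot(predicted_attributes, desc) /
                    (np.linalg.norm(predicted_attributes) * np.linalg.norm(desc)))
        for name, desc in class_attributes.items()
    }
    return max(scores, key=scores.get)

# Suppose an attribute predictor outputs these scores for a new image.
predicted = np.array([0.9, 0.1, 0.8, 0.7])
print(zero_shot_classify(predicted))   # -> "penguin"
```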

3. Meta-learning

Meta-learning, often described as ‘learning to learn’, is a paradigm in machine learning where models are trained to adapt quickly to new tasks with minimal data. Instead of being designed for a specific job, meta-learners are trained across various tasks, gaining a broader understanding of learning itself. The primary goal is to leverage the knowledge from previous tasks to facilitate faster and more effective learning of new tasks. For instance, in few-shot learning scenarios, where only a handful of examples are available, meta-learning models can draw upon their prior ‘meta-knowledge’ to achieve impressive performance. By encapsulating the broader patterns and structures of learning, meta-learning models aim to generalise across tasks, making them particularly suited for environments where rapid adaptability is crucial.
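The sketch below illustrates the few-shot setting in the style of prototypical networks, one common meta-learning approach (not necessarily the one the author has in mind): class prototypes are averaged embeddings of a few support examples, and a query is assigned to its nearest prototype. The embedding network here is an untrained placeholder; a real meta-learner would train it across many sampled few-shot episodes.

```python
# A minimal prototypical-network-style few-shot sketch with placeholder data.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))  # placeholder embedder

def classify_query(support_images, support_labels, query_image, n_classes):
    """Assign the query to the class whose prototype embedding is closest."""
    with torch.no_grad():
        support_emb = embed(support_images)             # (n_support, 64)
        query_emb = embed(query_image.unsqueeze(0))     # (1, 64)
        # One prototype per class: the mean embedding of its support examples.
        prototypes = torch.stack([
            support_emb[support_labels == c].mean(dim=0) for c in range(n_classes)
        ])
        distances = torch.cdist(query_emb, prototypes)  # (1, n_classes)
        return distances.argmin(dim=1).item()

# Example: a 3-way, 2-shot task with random stand-in images.
support = torch.randn(6, 1, 28, 28)
labels = torch.tensor([0, 0, 1, 1, 2, 2])
query = torch.randn(1, 28, 28)
print(classify_query(support, labels, query, n_classes=3))
```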

Limitations

While the techniques above have proven instrumental in constructing robust predictive models and addressing real-world challenges, they are not without their constraints. Their efficacy diminishes when the data diverges significantly from the source domain, making domain translation challenging. These neural networks are adept at recognising patterns based on everyday objects; when the data aligns closely with these familiar patterns, the models excel. However, when the data presents starkly different visuals or features, there is often a need to retrain the models from the ground up, a scenario that arises approximately 3 to 5% of the time.

Emerging trends

To address challenges in domain translation and find solutions without training models from scratch, cutting-edge techniques such as neural architecture search (NAS), self-supervised learning, and transformer-based attention mechanisms are employed. NAS tailors the neural network architecture to fit custom data, optimising its structure for specific tasks. Self-supervised learning, on the other hand, autonomously generates labels from the data, eliminating the need for manual annotation. Attention mechanisms, particularly those rooted in transformers, enhance model interpretability by highlighting focus areas to ensure the model concentrates on pertinent regions. A prime example of this innovation is Meta's ‘Segment Anything’ model, which leverages transformers to segment objects of interest adeptly.
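As a small illustration of the self-supervised idea, the snippet below implements a classic rotation-prediction pretext task, where the training labels are generated from the data itself. The backbone, image size and choice of pretext task are assumptions for illustration rather than anything specified in the article.

```python
# A self-supervised sketch: labels come from the data itself (rotation prediction).
# Backbone and image size are placeholder assumptions.
import torch
import torch.nn as nn

def make_rotation_task(images):
    """Create (rotated_image, rotation_label) pairs with no manual annotation."""
    rotated, labels = [], []
    for img in images:
        k = torch.randint(0, 4, (1,)).item()            # 0, 90, 180 or 270 degrees
        rotated.append(torch.rot90(img, k, dims=(1, 2)))
        labels.append(k)
    return torch.stack(rotated), torch.tensor(labels)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
rotation_head = nn.Linear(128, 4)                       # predicts one of four rotations

images = torch.randn(16, 3, 32, 32)                     # stand-in unlabelled images
x, y = make_rotation_task(images)
loss = nn.CrossEntropyLoss()(rotation_head(backbone(x)), y)
loss.backward()  # pre-trains the backbone without any human-provided labels
```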

Conclusion

Navigating data-scarce environments in computer vision is undeniably challenging, yet it's a crucible that has spurred remarkable innovation. In an era where data is often dubbed the ‘new oil’, its scarcity has compelled researchers and practitioners to think beyond traditional paradigms. Techniques like transfer learning, few-shot learning and GenAI have emerged as beacons, illuminating the path forward. Furthermore, integrating advanced algorithms with cost-effective deployment strategies has proven that it's not always about having more data, but about harnessing it intelligently. As we continue to push the boundaries of what's possible in computer vision, the lessons learned from data-scarce environments will undoubtedly shape the future, emphasising adaptability, efficiency and the relentless pursuit of innovation.

About the Author

Sumedh Datar FBCS is a senior software engineer with deep expertise in computer vision. He works as a research engineer and has experience across different domains, including retail, healthcare and asset management. He holds several patents, and his solutions have impacted many people worldwide. Sumedh holds a Master's degree in computer science, specialising in AI.