Publication

Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images

Related concepts (21)

Aerial photography (or airborne imagery) is the taking of photographs from an aircraft or other airborne platforms. When taking motion pictures, it is also known as aerial videography. Platforms for aerial photography include fixed-wing aircraft, helicopters, unmanned aerial vehicles (UAVs or "drones"), balloons, blimps and dirigibles, rockets, pigeons, kites, or using action cameras while skydiving or wingsuiting. Handheld cameras may be manually operated by the photographer, while mounted cameras are usually remotely operated or triggered automatically.

Aerial photographic and satellite image interpretation

Aerial photographic and satellite image interpretation, or just image interpretation when in context, is the act of examining s, particularly airborne and , to identify objects and judging their significance. This is commonly used in military aerial reconnaissance, using photographs taken from reconnaissance aircraft and reconnaissance satellites. The principles of image interpretation have been developed empirically for more than 150 years.

Unmanned aerial vehicle

An unmanned aerial vehicle (UAV), commonly known as a drone, is an aircraft without any human pilot, crew, or passengers on board. UAVs were originally developed through the twentieth century for military missions too "dull, dirty or dangerous" for humans, and by the twenty-first, they had become essential assets to most militaries. As control technologies improved and costs fell, their use expanded to many non-military applications.

Feature learning

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process.

Transformer (machine learning model)

A transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper titled 'Attention Is All You Need' by Ashish Vaswani et al., Google Brain team. It is notable for requiring less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variation has been prevalently adopted for training large language models on large (language) datasets, such as the Wikipedia corpus and Common Crawl, by virtue of the parallelized processing of input sequence.

Aerial reconnaissance

Aerial reconnaissance is reconnaissance for a military or strategic purpose that is conducted using reconnaissance aircraft. The role of reconnaissance can fulfil a variety of requirements including artillery spotting, the collection of , and the observation of enemy maneuvers. Aerial photography#History and Imagery intelligence#History Espionage balloon After the French Revolution, the new rulers became interested in using the balloon to observe enemy manoeuvres and appointed scientist Charles Coutelle to conduct studies using the balloon L'Entreprenant, the first military reconnaissance aircraft.

Generative pre-trained transformer

Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. The first GPT was introduced in 2018 by OpenAI. GPT models are artificial neural networks that are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

Deep learning

Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

Attention (machine learning)

Machine learning-based attention is a mechanism mimicking cognitive attention. It calculates "soft" weights for each word, more precisely for its embedding, in the context window. It can do it either in parallel (such as in transformers) or sequentially (such as recursive neural networks). "Soft" weights can change during each runtime, in contrast to "hard" weights, which are (pre-)trained and fine-tuned and remain frozen afterwards. Multiple attention heads are used in transformer-based large language models.

Satellite imagery

Satellite images (also Earth observation imagery, spaceborne photography, or simply satellite photo) are s of Earth collected by imaging satellites operated by governments and businesses around the world. Satellite imaging companies sell images by licensing them to governments and businesses such as Apple Maps and Google Maps. The first images from space were taken on sub-orbital flights. The U.S-launched V-2 flight on October 24, 1946, took one image every 1.5 seconds.

Explainable artificial intelligence

Explainable AI (XAI), also known as Interpretable AI, or Explainable Machine Learning (XML), either refers to an AI system over which it is possible for humans to retain intellectual oversight, or to the methods to achieve this. The main focus is usually on the reasoning behind the decisions or predictions made by the AI which are made more understandable and transparent. XAI counters the "black box" tendency of machine learning, where even the AI's designers cannot explain why it arrived at a specific decision.

Artificial intelligence

Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of human beings or animals. AI applications include advanced web search engines (e.g., Google Search), recommendation systems (used by YouTube, Amazon, and Netflix), understanding human speech (such as Siri and Alexa), self-driving cars (e.g., Waymo), generative or creative tools (ChatGPT and AI art), and competing at the highest level in strategic games (such as chess and Go).

Large language model

A large language model (LLM) is a language model characterized by its large size. Their size is enabled by AI accelerators, which are able to process vast amounts of text data, mostly scraped from the Internet. The artificial neural networks which are built can contain from tens of millions and up to billions of weights and are (pre-)trained using self-supervised learning and semi-supervised learning. Transformer architecture contributed to faster training.

Deep reinforcement learning

Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs (e.g.

Machine learning

Machine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithms. Recently, generative artificial neural networks have been able to surpass results of many previous approaches.

Q-learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.

Foundation models

A foundation model (also called base model) is a large machine learning (ML) model trained on a vast quantity of data at scale (often by self-supervised learning or semi-supervised learning) such that it can be adapted to a wide range of downstream tasks. Foundation models have helped bring about a major transformation in how artificial intelligence (AI) systems are built, such as by powering prominent chatbots and other user-facing AI.

Extended precision

Extended precision refers to floating-point number formats that provide greater precision than the basic floating-point formats. Extended precision formats support a basic format by minimizing roundoff and overflow errors in intermediate values of expressions on the base format. In contrast to extended precision, arbitrary-precision arithmetic refers to implementations of much larger numeric types (with a storage count that usually is not a power of two) using special software (or, rarely, hardware).

Energy–momentum relation

In physics, the energy–momentum relation, or relativistic dispersion relation, is the relativistic equation relating total energy (which is also called relativistic energy) to invariant mass (which is also called rest mass) and momentum. It is the extension of mass–energy equivalence for bodies or systems with non-zero momentum. It can be written as the following equation: This equation holds for a body or system, such as one or more particles, with total energy E, invariant mass m0, and momentum of magnitude p; the constant c is the speed of light.

Data wrangling

Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. The goal of data wrangling is to assure quality and useful data. Data analysts typically spend the majority of their time in the process of data wrangling compared to the actual analysis of the data.