Publication

Encoder-Decoder Models for Human Segmentation and Motion Analysis

Related concepts

Image segmentation

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects (sets of pixels). The goal of segmentation is to simplify or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
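As a minimal sketch of this per-pixel labelling, the Python below assumes a small grayscale image stored as a list of rows. It separates foreground from background with a fixed global threshold (an arbitrary illustrative choice; practical systems often use adaptive methods or learned models) and then gives each 4-connected foreground region its own label. `threshold` and `label_components` are hypothetical helper names, not part of any library.

```python
from collections import deque


def threshold(image, t):
    """Binary mask: 1 where intensity >= t, else 0 (t is an illustrative choice)."""
    return [[1 if px >= t else 0 for px in row] for row in image]


def label_components(mask):
    """Label every pixel: 0 for background, 1..n for each 4-connected foreground region.

    Regions are numbered in row-major order of their first pixel.
    Returns (labels, number_of_regions).
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and labels[r][c] == 0:
                # Start a new region and flood-fill it breadth-first.
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label


image = [
    [200, 210, 10, 10],
    [205, 190, 12, 15],
    [11, 13, 14, 220],
    [10, 12, 230, 240],
]
labels, count = label_components(threshold(image, 128))
# labels == [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 2], [0, 0, 2, 2]], count == 2
```

Here pixels sharing label 1 form one bright, connected object and pixels with label 2 another, so "share certain characteristics" concretely means "bright and connected" in this toy setup.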

Motion capture

Motion capture (sometimes referred to as mo-cap or mocap, for short) is the process of recording the movement of objects or people. It is used in military, entertainment, sports, and medical applications, and for validating computer vision systems and robots. In filmmaking and video game development, it refers to recording the actions of human actors and using that information to animate digital character models in 2D or 3D computer animation. When it includes the face and fingers or captures subtle expressions, it is often referred to as performance capture.

Image analysis

Image analysis or imagery analysis is the extraction of meaningful information from images, mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar-coded tags or as sophisticated as identifying a person from their face. Computers are indispensable for analyzing large amounts of data, for tasks that require complex computation, and for extracting quantitative information.

Medical image computing

Medical image computing (MIC) is an interdisciplinary field at the intersection of computer science, information engineering, electrical engineering, physics, mathematics and medicine. This field develops computational and mathematical methods for solving problems pertaining to medical images and their use for biomedical research and clinical care. The main goal of MIC is to extract clinically relevant information or knowledge from medical images.

Activity recognition

Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several computer science communities due to its strength in providing personalized support for many different applications and its connection to many different fields of study such as medicine, human-computer interaction, or sociology.

Self-driving car

A self-driving car, also known as an autonomous car, driverless car, or robotic car (robo-car), is a car that is capable of traveling without human input. Self-driving cars perceive their surroundings with sensors such as optical and thermographic cameras, radar, lidar, ultrasound/sonar, GPS, odometry, and inertial measurement units. Control systems interpret this sensory information to create a three-dimensional model of the vehicle's surroundings.

Vehicular automation

Vehicular automation involves the use of mechatronics, artificial intelligence, and multi-agent systems to assist the operator of a vehicle (car, aircraft, watercraft, or otherwise). These features and the vehicles employing them may be labeled as intelligent or smart. A vehicle using automation for difficult tasks, especially navigation, to ease but not entirely replace human input, may be referred to as semi-autonomous, whereas a vehicle relying solely on automation is called robotic or autonomous.

Self-driving truck

A self-driving truck, also known as an autonomous truck or robo-truck, is an application of self-driving technology aiming to create trucks that can operate without human input. In addition to light-, medium-, and heavy-duty trucks, many companies are developing self-driving technology for semi trucks to automate highway driving in the delivery process. In September 2022, Guidehouse Insights listed Waymo, Aurora, TuSimple, Gatik, PlusAI, Kodiak Robotics, Daimler Truck, Einride, Locomation, and Embark as the top 10 vendors in automated trucking.

Autonomous robot

An autonomous robot is a robot that acts without recourse to human control. The first autonomous robots were Elmer and Elsie, constructed in the late 1940s by W. Grey Walter. They were the first robots in history programmed to "think" the way biological brains do and were meant to have free will. Elmer and Elsie were often called tortoises because of their shape and the way they moved. They were capable of phototaxis, the movement that occurs in response to a light stimulus.

Computer vision

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world to produce numerical or symbolic information, e.g. in the form of decisions. Understanding in this context means the transformation of visual images (the input to the retina in the human analog) into descriptions of the world that make sense to thought processes and can elicit appropriate action.

Pose tracking

In virtual reality (VR) and augmented reality (AR), a pose tracking system detects the precise pose of head-mounted displays, controllers, other objects, or body parts within Euclidean space. Pose tracking is often referred to as 6DOF tracking, after the six degrees of freedom in which the pose is usually tracked. It is sometimes also called positional tracking, but the two are distinct: pose tracking includes orientation, whereas positional tracking does not.
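The difference between pose and position can be made concrete with a small data structure. This Python sketch assumes a pose is stored as a 3-D position plus a unit quaternion in (w, x, y, z) order, in a right-handed frame with z pointing up; `Pose6DoF`, `quat_from_axis_angle`, and the 1.6 m head height are illustrative assumptions, not the API of any VR runtime.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def quat_from_axis_angle(axis: Vec3, angle_rad: float) -> Quat:
    """Unit quaternion for a rotation of angle_rad about a unit-length axis."""
    half = angle_rad / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q, computed as q * (0, v) * conj(q)."""
    w, x, y, z = q
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return (rx, ry, rz)


@dataclass
class Pose6DoF:
    position: Vec3     # 3 translational degrees of freedom
    orientation: Quat  # 3 rotational degrees of freedom

    def to_world(self, local_point: Vec3) -> Vec3:
        """Map a point from the device's local frame into world coordinates."""
        rx, ry, rz = rotate(self.orientation, local_point)
        px, py, pz = self.position
        return (rx + px, ry + py, rz + pz)


# Headset 1.6 m above the floor, turned 90 degrees about the vertical (z) axis.
head = Pose6DoF(position=(0.0, 0.0, 1.6),
                orientation=quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2))
forward_in_world = head.to_world((1.0, 0.0, 0.0))  # approximately (0, 1, 1.6)
```

The orientation component is what turns the device-local "forward" axis into a world direction; a purely positional tracker, which knows only `position`, cannot provide that.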

Deep learning

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be supervised, semi-supervised, or unsupervised.
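To illustrate what "multiple layers" means, here is a minimal forward pass through a small fully connected network in plain Python. The weights are hand-picked placeholders (a trained network would learn them, for example by gradient descent, which is not shown), and ReLU is assumed as the nonlinearity between layers.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer: `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def forward(x, layers):
    """Apply each (weights, bias) layer in turn, with ReLU after every hidden layer."""
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if i < len(layers) - 1:  # no activation on the output layer
            x = relu(x)
    return x


# Placeholder weights for a 2 -> 2 -> 2 -> 1 network.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer 1
    ([[2.0, 0.0], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer 2
    ([[1.0, 2.0]], [0.5]),                    # output layer
]
y = forward([1.0, 2.0], layers)  # [1.5]
```

Each layer re-represents the output of the one before it; stacking several such layers, rather than using a single one, is what makes a network "deep".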

Waymo

Waymo LLC, formerly known as the Google Self-Driving Car Project, is an American autonomous driving technology company headquartered in Mountain View, California. It is a subsidiary of Alphabet Inc., the parent company of Google. Google's development of self-driving technology began in January 2009, at the company's Google X lab run by co-founder Sergey Brin. The project was launched by Sebastian Thrun, director of the Stanford Artificial Intelligence Laboratory (SAIL) and Anthony Levandowski, founder of 510 Systems and Anthony's Robots.

Motion

In physics, motion is the phenomenon by which an object changes its position with respect to time. Motion is described mathematically in terms of displacement, distance, velocity, acceleration, and speed, measured in a frame of reference attached to an observer: the change in the body's position relative to that frame over time. The branch of physics describing the motion of objects without reference to their cause is called kinematics, while the branch studying forces and their effect on motion is called dynamics.
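The kinematic quantities named above can be estimated from sampled positions. This sketch assumes one-dimensional positions recorded at a fixed time step `dt` and uses forward finite differences, one simple choice among several (central differences are often more accurate). It also contrasts displacement (net change in position) with distance (total path length).

```python
def velocities(positions, dt):
    """Forward-difference velocity estimate between consecutive samples."""
    return [(x1 - x0) / dt for x0, x1 in zip(positions, positions[1:])]


def accelerations(positions, dt):
    """Second forward difference: change in estimated velocity per time step."""
    v = velocities(positions, dt)
    return [(v1 - v0) / dt for v0, v1 in zip(v, v[1:])]


def displacement(positions):
    """Net change in position, independent of the path taken."""
    return positions[-1] - positions[0]


def distance(positions):
    """Total path length travelled between samples."""
    return sum(abs(x1 - x0) for x0, x1 in zip(positions, positions[1:]))


# x(t) = t**2 (constant acceleration of 2 units/s^2), sampled once per second.
samples = [0.0, 1.0, 4.0, 9.0, 16.0]
vel = velocities(samples, dt=1.0)     # [1.0, 3.0, 5.0, 7.0]
acc = accelerations(samples, dt=1.0)  # [2.0, 2.0, 2.0]

# Moving 3 units forward, then 2 units back.
trip = [0.0, 3.0, 1.0]
```

For `trip`, `displacement(trip)` is 1.0 while `distance(trip)` is 5.0, which is why the two quantities are listed separately.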

Video content analysis

Video content analysis or video content analytics (VCA), also known as video analysis or video analytics (VA), is the capability of automatically analyzing video to detect and determine temporal and spatial events. This technical capability is used in a wide range of domains including entertainment, video retrieval and video browsing, health-care, retail, automotive, transport, home automation, flame and smoke detection, safety, and security. The algorithms can be implemented as software on general-purpose machines, or as hardware in specialized video processing units.
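One of the simplest ways to detect a temporal event in video is frame differencing. The sketch below is an illustrative assumption, not a description of any particular VCA product: frames are small grayscale images stored as lists of rows, and a frame is flagged when the fraction of pixels whose intensity changed by more than `pixel_threshold` exceeds `area_threshold` (both defaults are arbitrary).

```python
def changed_fraction(prev, curr, pixel_threshold):
    """Fraction of pixels whose absolute intensity change exceeds pixel_threshold."""
    total = changed = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / total


def detect_motion_events(frames, pixel_threshold=25, area_threshold=0.1):
    """Return indices of frames that differ enough from the previous frame."""
    return [
        i for i in range(1, len(frames))
        if changed_fraction(frames[i - 1], frames[i], pixel_threshold) > area_threshold
    ]


base = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
noisy = [[10, 10, 10], [10, 12, 10], [10, 10, 10]]           # small sensor noise
with_object = [[200, 200, 10], [200, 12, 10], [10, 10, 10]]  # bright object appears
frames = [base, noisy, with_object, with_object]
events = detect_motion_events(frames)  # [2]
```

The small change in frame 1 falls below `pixel_threshold` and is ignored, while the object appearing in frame 2 is reported. Real systems typically build background modelling and filtering on top of this basic idea.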

Stop motion

Stop motion is an animated filmmaking technique in which objects are physically manipulated in small increments between individually photographed frames so that they will appear to exhibit independent motion or change when the series of frames is played back. Any kind of object can thus be animated, but puppets with movable joints (puppet animation) or plasticine figures (clay animation or claymation) are most commonly used. Puppets, models or clay figures built around an armature are used in model animation.

Finger tracking

In the field of gesture recognition and image processing, finger tracking is a high-resolution technique, developed in 1969, for determining the successive positions of a user's fingers and hence representing objects in 3D. Finger tracking can also serve as a computer input device, similar to a keyboard or a mouse. Finger tracking systems focus on user-data interaction: the user manipulates virtual data by handling the volume of a 3D object with their fingers.

Motion controller

In video games and entertainment systems, a motion controller is a type of game controller that uses accelerometers or other sensors to track motion and provide input. Accelerometer-based motion controllers were popularized from 2006 onward by the Wii Remote for Nintendo's Wii console, which uses accelerometers to detect its approximate orientation and acceleration and includes an image sensor so that it can be used as a pointing device.
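When an accelerometer-based controller is held still, the only acceleration it measures is gravity, so its tilt can be estimated from the direction of that vector. The sketch assumes readings in units of g and an axis convention in which a controller lying flat reads (0, 0, 1); the roll/pitch formulas are a common approximation, and `tilt_from_accelerometer` is a hypothetical name.

```python
import math


def tilt_from_accelerometer(ax, ay, az):
    """Estimate (roll, pitch) in degrees from a static accelerometer reading in g.

    Assumes the only measured acceleration is gravity and that a controller
    lying flat reads (0, 0, 1). Yaw cannot be recovered from gravity alone.
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return math.degrees(roll), math.degrees(pitch)


flat = tilt_from_accelerometer(0.0, 0.0, 1.0)     # no tilt
on_side = tilt_from_accelerometer(0.0, 1.0, 0.0)  # roll of about 90 degrees
```

Rotating the controller about the vertical axis (yaw) leaves the gravity vector unchanged, so a static accelerometer reading alone cannot reveal it; other sensors are needed for that.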

Transformer (machine learning model)

A transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper 'Attention Is All You Need' by Ashish Vaswani et al. of the Google Brain team. Because it processes the input sequence in parallel, it requires less training time than earlier recurrent neural architectures such as long short-term memory (LSTM), and later variants have been widely adopted for training large language models on large text datasets such as the Wikipedia corpus and Common Crawl.
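The core operation inside multi-head attention is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The pure-Python sketch below implements it and then runs several heads over separate slices of the feature dimension. As a simplification, it omits the learned per-head projection matrices and the output projection that a real transformer applies; the names `attention` and `multi_head_attention` are illustrative.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def multi_head_attention(Q, K, V, num_heads):
    """Attend independently on equal feature slices, then concatenate the heads.

    Simplification: no learned projections W_Q, W_K, W_V, or W_O.
    """
    d_model = len(Q[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):  # iterations are independent, so they could run in parallel
        cols = slice(h * d_head, (h + 1) * d_head)
        heads.append(attention([q[cols] for q in Q], [k[cols] for k in K], [v[cols] for v in V]))
    # Concatenate per-head outputs for each query position.
    return [[x for head in heads for x in head[t]] for t in range(len(Q))]


Q = [[1.0, 0.0, 0.0, 1.0]]
K = [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
V = [[2.0, 4.0, 6.0, 8.0], [4.0, 8.0, 10.0, 12.0]]
# Identical keys give uniform weights, so each output is the mean of V's rows.
out = multi_head_attention(Q, K, V, num_heads=2)  # [[3.0, 6.0, 8.0, 10.0]]
```

Because no head depends on another's result, all heads (and all query positions) can be computed at once on parallel hardware, unlike the step-by-step recurrence of an LSTM.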

3D modeling

In 3D computer graphics, 3D modeling is the process of developing a mathematical coordinate-based representation of any surface of an object (inanimate or living) in three dimensions via specialized software by manipulating edges, vertices, and polygons in a simulated 3D space. Three-dimensional (3D) models represent a physical body using a collection of points in 3D space, connected by various geometric entities such as triangles, lines, curved surfaces, etc.
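A common concrete form of such a model is a triangle mesh: a list of vertices plus faces that index into it. As a sketch assuming that representation, the code below stores a tetrahedron and computes its surface area by summing, for each triangle, half the magnitude of the cross product of two edge vectors. The helper names are illustrative.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(v):
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def mesh_surface_area(vertices, faces):
    """Sum of triangle areas; each face is a triple of vertex indices."""
    total = 0.0
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        total += 0.5 * norm(cross(sub(b, a), sub(c, a)))
    return total


# Tetrahedron: three right-triangle faces of area 0.5 plus one equilateral face.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
area = mesh_surface_area(vertices, faces)  # 1.5 + sqrt(3)/2, about 2.366
```

Storing faces as indices lets adjacent triangles share vertices, so editing one vertex updates every face that uses it.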